# vllm-mindspore

**Repository Path**: anyrenwei/vllm-mindspore

## Basic Information

- **Project Name**: vllm-mindspore
- **Description**: A vLLM plugin for MindSpore that supports deploying inference services for MindSpore models on the vLLM framework.
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 120
- **Created**: 2025-03-04
- **Last Updated**: 2025-03-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# vllm_mindspore

## Overview

`vllm-mindspore` is an integration for running vLLM on the MindSpore framework. It is the recommended solution for supporting MindSpore within the vLLM community. It provides deep integration with MindSpore, offering efficient computation and optimization support so that vLLM runs seamlessly on the framework.

With `vllm-mindspore`, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly for training and inference on the MindSpore framework.

---

## Prerequisites

- Hardware: Atlas A2/A3
- Software:
  - Python >= 3.9
  - CANN >= 8.0.0
  - MindSpore >= 2.5.0

---

## Getting Started

### Installation

#### Installation from source code

```shell
# 1. Uninstall torch-related packages due to msadapter limitations
pip3 uninstall torch torch-npu torchvision

# 2. Install vllm_mindspore
git clone https://gitee.com/mindspore/vllm_mindspore.git
cd vllm_mindspore
pip install .
```

#### Set up using Docker

##### Pre-built images

```shell
docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:v1.0
```

##### Build image from source

```shell
docker build --network=host .
```

### Inference and Serving

#### Offline Inference

You can run vllm_mindspore in your own code on a list of prompts.

```python
import vllm_mindspore  # Add this line at the top of the script.
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "I am",
    "Today is",
    "Llama is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95)

# Create an LLM.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

#### Serving (OpenAI-Compatible)

You can start the server via the vllm_mindspore command:

`python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "meta-llama/Llama-2-7b-hf"`

To call the server, you can use `curl` or any other HTTP client.

```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-hf",
        "prompt": "Llama is",
        "max_tokens": 120,
        "temperature": 0
    }'
```

The same request can also be issued from Python; a minimal sketch using the OpenAI client is included at the end of this README.

## Contributing

We welcome and value any contributions and collaborations:

- Please feel free to share comments about your usage of vllm_mindspore.
- Please let us know if you encounter a bug by filing an issue.

## License

Apache License 2.0, as found in the [LICENSE](https://gitee.com/mindspore/vllm_mindspore/blob/master/LICENSE) file.
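
As referenced in the Serving section, the completions request shown with `curl` can also be sent from Python. This is a minimal sketch, assuming the server started above is listening on `localhost:8000` and the `openai` Python package (>= 1.0) is installed; the API key value is arbitrary, since the server does not verify it by default.

```python
# Minimal sketch of querying the OpenAI-compatible server from Python.
# Assumes the server from the Serving section is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",
    prompt="Llama is",
    max_tokens=120,
    temperature=0,
)

# Print the generated continuation of the prompt.
print(completion.choices[0].text)
```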