diff --git a/README.md b/README.md
index 5ea56601b6088bb737fde97ca01261d5bce9f4ae..f6db521b2ec32901c9d464b2d49646983ddacf73 100644
--- a/README.md
+++ b/README.md
@@ -1,114 +1,109 @@
-# vllm-mindspore
-
-## Overview
-
-The `vllm-mindspore`is a integration for running vLLM on the MindSpore framework.
-
-This is the recommended solution for supporting the MindSpore within the vLLM community. It provides deep integration with the MindSpore framework, offering efficient computation and optimization support for vLLM, enabling seamless operation on MindSpore.
-
-By using the `vllm-mindspore`, popular open-source models, can run seamlessly for training and inference on the MindSpore framework.
-
----
-
-## Prerequisites
-
-- Hardware: Atlas A2/A3
-- Software:
-    - Python >= 3.9
-    - CANN >= 8.0.0
-    - MindSpore >=2.5.0
-
----
-
-## Getting Started
-
-### Installation
-
-#### Installation from source code
-
-Install from source code. [Wiki Installation.](https://gitee.com/mindspore/vllm-mindspore/wikis/Getting%20Started/Installation)
-
-#### Set up using Docker
-
-##### Pre-built images
-
-```shell
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:v1.0
-```
-
-##### Build image from source
-
-```shell
-docker build --network=host .
-```
-
-### Inference and Serving
-
-#### Offline Inference
-
-You can run vllm_mindspore in your own code on a list of prompts.
-
-```bash
-export ASCEND_TOTAL_MEMORY_GB=64 # Based on the ascend device.
-```
-
-```python
-
-import vllm_mindspore # Add this line on the top of script.
-
-from vllm import LLM, SamplingParams
-
-# Sample prompts.
-prompts = [
-    "I am",
-    "Today is",
-    "What is"
-]
-
-# Create a sampling params object.
-sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
-
-# Create an LLM.
-llm = LLM(model="Qwen/Qwen2.5-32B-Instruct", tensor_parallel_size=8)
-# Generate texts from the prompts. The output is a list of RequestOutput objects
-# that contain the prompt, generated text, and other information.
-outputs = llm.generate(prompts, sampling_params)
-# Print the outputs.
-for output in outputs:
-    prompt = output.prompt
-    generated_text = output.outputs[0].text
-    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
-
-```
-
-#### Serving(OpenAI-Compatible)
-
-You can start the server via the vllm_mindspore command:
-
-`python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-32B-Instruct" --tensor_parallel_size=8`
-
-To call the server, you can use `curl` or any other HTTP client.
-
-```shell
-
-curl http://localhost:8000/v1/completions \
-    -H "Content-Type: application/json" \
-    -d '{
-        "model": "Qwen/Qwen2.5-32B-Instruct",
-        "prompt": "MindSpore is",
-        "max_tokens": 120,
-        "temperature": 0
-    }'
-
-```
-
-## Contributing
-
-We welcome and value any contributions and collaborations:
-
-- Please feel free comments about your usage of vllm_mindspore.
-- Please let us know if you encounter a bug by filing an issue.
-
-## License
-
-Apache License 2.0, as found in the [LICENSE](https://gitee.com/mindspore/vllm_mindspore/blob/master/LICENSE) file.
+# vLLM-MindSpore
+
+## Overview
+
+vLLM-MindSpore is an efficient large language model inference framework built on MindSpore. It combines the efficient inference capabilities of vLLM with the strengths of the MindSpore deep learning framework, and supports fast deployment and inference for a wide range of large language models.
+
+## Installation
+
+### Install from source
+
+```bash
+git clone https://gitee.com/your-repo/vllm-mindspore.git
+cd vllm-mindspore
+pip install -e .
+```
+
+### Install with Docker
+
+#### Pre-built image
+
+```bash
+docker pull your-dockerhub/vllm-mindspore:latest
+```
+
+#### Build the image from source
+
+```bash
+docker build -t your-dockerhub/vllm-mindspore:latest .
+```
+
+## Inference and Serving
+
+### Offline Inference
+
+The following is a simple offline inference example:
+
+```python
+import vllm_mindspore  # Add this line at the top of the script.
+
+from vllm import LLM, SamplingParams
+
+# Sample prompts.
+prompts = [
+    "Hello, my name is",
+    "The president of the United States is",
+    "The capital of France is",
+    "The future of AI is",
+]
+
+# Create a sampling params object.
+sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
+
+# Create an LLM.
+llm = LLM(model="path/to/model")
+
+# Generate texts from the prompts. The output is a list of RequestOutput
+# objects that contain the prompt, generated text, and other information.
+outputs = llm.generate(prompts, sampling_params)
+
+# Print the outputs.
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
+### Serving (OpenAI-Compatible)
+
+You can deploy the model as an OpenAI-API-compatible service:
+
+```bash
+python -m vllm_mindspore.entrypoints.openai.api_server --model path/to/model --host 0.0.0.0 --port 8000
+```
+
+Then send inference requests over HTTP, for example with `curl`:
+
+```bash
+curl http://localhost:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "path/to/model",
+    "prompt": "Hello, my name is",
+    "max_tokens": 50
+  }'
+```
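+
+Because the server exposes an OpenAI-compatible API, any OpenAI client library can also call it. Below is a minimal sketch using the `openai` Python package (an extra dependency assumed for this example, not part of vllm-mindspore); it targets the server started above, reuses the placeholder model name, and assumes no authentication is configured:
+
+```python
+from openai import OpenAI
+
+# The api_key value is a placeholder; this assumes the server was started
+# without authentication.
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+completion = client.completions.create(
+    model="path/to/model",  # must match the --model value used to launch the server
+    prompt="Hello, my name is",
+    max_tokens=50,
+)
+print(completion.choices[0].text)
+```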
+
+## Contributing
+
+Contributions to code and documentation are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for more information.
+
+## License
+
+This project is released under the Apache-2.0 license. See the [LICENSE](LICENSE) file for details.
\ No newline at end of file