diff --git a/README.md b/README.md
index 5ea56601b6088bb737fde97ca01261d5bce9f4ae..7cd23cac9ed01759dc011ed5a967504de04b5cb4 100644
--- a/README.md
+++ b/README.md
@@ -1,114 +1,64 @@
-# vllm-mindspore
+

+vLLM MindSpore
+

-## Overview
+

+| About MindSpore | #vLLM MindSpore SIG | Issue Feedback |
+

-The `vllm-mindspore`is a integration for running vLLM on the MindSpore framework.
-
-This is the recommended solution for supporting the MindSpore within the vLLM community. It provides deep integration with the MindSpore framework, offering efficient computation and optimization support for vLLM, enabling seamless operation on MindSpore.
-
-By using the `vllm-mindspore`, popular open-source models, can run seamlessly for training and inference on the MindSpore framework.
+

+English | 中文
+

---
+*Latest News* 🔥
-## Prerequisites
-
-- Hardware: Atlas A2/A3
-- Software:
-    - Python >= 3.9
-    - CANN >= 8.0.0
-    - MindSpore >=2.5.0
+- [Coming Soon🏃] Adaptation to vLLM [v0.8.3](https://github.com/vllm-project/vllm/releases/tag/v0.8.3), adding support for the vLLM V1 architecture and the Qwen3 model.
+- [2025/04] Completed adaptation to vLLM [v0.7.3](https://github.com/vllm-project/vllm/releases/tag/v0.7.3), adding support for features such as Automatic Prefix Caching, Chunked Prefill, Multi-step Scheduling, and MTP. Together with the openEuler community and Shanghai Jiao Tong University, we delivered a full-stack open-source, single-node DeepSeek inference deployment; you can read the full report [here](https://www.openeuler.org/zh/news/openEuler/20240421-jd/20240421-jd.html).
+- [2025/03] Completed adaptation to vLLM [v0.6.6.post1](https://github.com/vllm-project/vllm/releases/tag/v0.6.6.post1), supporting deployment of MindSpore-based inference services for models such as DeepSeek-V3/R1 and Qwen2.5 through vllm.entrypoints. Together with the openEuler community and Peking University, we released a full-stack open-source DeepSeek inference solution; you can read the full report [here](https://news.pku.edu.cn/xwzh/e13046c47d03471c8cebb950bd1f4598.htm).
+- [2025/02] The MindSpore community officially created the [mindspore/vllm-mindspore](https://gitee.com/mindspore/vllm-mindspore) repository, aiming to bring MindSpore LLM inference capabilities into vLLM.
---
-## Getting Started
-
-### Installation
-
-#### Installation from source code
-
-Install from source code. [Wiki Installation.](https://gitee.com/mindspore/vllm-mindspore/wikis/Getting%20Started/Installation)
-
-#### Set up using Docker
-
-##### Pre-built images
-
-```shell
-docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:v1.0
-```
-
-##### Build image from source
-
-```shell
-docker build --network=host .
-```
-
-### Inference and Serving
-
-#### Offline Inference
-
-You can run vllm_mindspore in your own code on a list of prompts.
-
-```bash
-export ASCEND_TOTAL_MEMORY_GB=64 # Based on the ascend device.
-```
-
-```python
-
-import vllm_mindspore # Add this line on the top of script.
-
-from vllm import LLM, SamplingParams
-
-# Sample prompts.
-prompts = [
-    "I am",
-    "Today is",
-    "What is"
-]
+# Introduction
-# Create a sampling params object.
-sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
+The vLLM MindSpore plugin (`vllm-mindspore`) is a vLLM backend plugin incubated by the [MindSpore community](https://www.mindspore.cn/). It aims to bring MindSpore-based LLM inference capabilities into [vLLM](https://github.com/vllm-project/vllm), combining the technical strengths of MindSpore and vLLM to provide a full-stack open-source, high-performance, and easy-to-use LLM inference solution.
-# Create an LLM.
-llm = LLM(model="Qwen/Qwen2.5-32B-Instruct", tensor_parallel_size=8)
-# Generate texts from the prompts. The output is a list of RequestOutput objects
-# that contain the prompt, generated text, and other information.
-outputs = llm.generate(prompts, sampling_params)
-# Print the outputs.
-for output in outputs:
-    prompt = output.prompt
-    generated_text = output.outputs[0].text
-    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+The goal of the vLLM MindSpore plugin is to integrate MindSpore LLMs into vLLM and enable service-oriented deployment. It follows these design principles:
-```
+- Interface compatibility: support vLLM's native APIs and service deployment interfaces, avoiding new configuration files or interfaces, to reduce the learning curve and ensure ease of use.
+- Minimal invasive changes: avoid invasive modifications to vLLM code wherever possible, to keep the system maintainable and evolvable.
+- Component decoupling: minimize and standardize the coupling surface between MindSpore model components and vLLM service components, to ease the integration of multiple MindSpore model suites.
-#### Serving(OpenAI-Compatible)
+Based on these design principles, vLLM MindSpore adopts the system architecture shown in the figure below, connecting vLLM and MindSpore by component category:
-You can start the server via the vllm_mindspore command:
+- Service components: PyTorch API calls in service components such as the LLM Engine and Scheduler are mapped to MindSpore capabilities, inheriting service features including Continuous Batching and PagedAttention.
+- Model components: models, network layers, custom operators, and other components are registered or replaced, integrating MindSpore model suites such as MindSpore Transformers and MindSpore One, as well as custom models, into vLLM.
-`python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-32B-Instruct" --tensor_parallel_size=8`
+![Architecture](docs/arch.cn.png)
+vLLM MindSpore uses the plugin mechanism recommended by the vLLM community to register its capabilities, and intends to follow the principles described in [[RPC] Multi-framework support for vllm](https://gitee.com/mindspore/vllm-mindspore/issues/IBTNRG).
-To call the server, you can use `curl` or any other HTTP client.
+# Environment Setup
-```shell
+- Hardware: Atlas 800I A2 or Atlas 800T A2 inference server, with the required drivers installed and Internet access
+- OS: openEuler or Ubuntu Linux
+- Software:
+    - Python >= 3.9, < 3.12
+    - CANN >= 8.0.0.beta1
+    - MindSpore (version matched to vllm-mindspore)
+    - vLLM (version matched to vllm-mindspore)
-curl http://localhost:8000/v1/completions \
-    -H "Content-Type: application/json" \
-    -d '{
-        "model": "Qwen/Qwen2.5-32B-Instruct",
-        "prompt": "MindSpore is",
-        "max_tokens": 120,
-        "temperature": 0
-    }'
+# Quick Start
-```
+See the [Quick Start](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md) and the [Installation Guide](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md) to learn more.
-## Contributing
+# Contributing
-We welcome and value any contributions and collaborations:
+See the [CONTRIBUTING](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md) document for details on setting up a development environment, feature testing, and PR submission conventions.
-- Please feel free comments about your usage of vllm_mindspore.
-- Please let us know if you encounter a bug by filing an issue.
+We welcome and value any form of contribution and collaboration. Please use [Issues](https://gitee.com/mindspore/vllm-mindspore/issues) to report any bugs you encounter, or to submit your feature requests, improvement suggestions, and technical proposals.
-## License
+# SIG
-Apache License 2.0, as found in the [LICENSE](https://gitee.com/mindspore/vllm_mindspore/blob/master/LICENSE) file.
+- Join the LLM Inference Serving SIG to take part in open-source co-development and industry collaboration: [https://www.mindspore.cn/community/SIG](https://www.mindspore.cn/community/SIG)
+- SIG meetings: biweekly, Friday or Saturday evening, 20:00 - 21:00 (UTC+8, [check your time zone](https://dateful.com/convert/gmt8?t=15))
diff --git a/docs/arch.cn.png b/docs/arch.cn.png
new file mode 100644
index 0000000000000000000000000000000000000000..b2c2d0aedfbb3bad25e50071d8070e1f5c3f447d
Binary files /dev/null and b/docs/arch.cn.png differ
diff --git a/docs/arch.png b/docs/arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..fc3b524ca3487ae92431c58157175b4ddcb42725
Binary files /dev/null and b/docs/arch.png differ
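The new README says vLLM MindSpore registers its capabilities through the plugin mechanism recommended by the vLLM community. As a hedged illustration of how such backend plugins are discovered, the snippet below lists Python entry points; the group name `vllm.general_plugins` follows vLLM's documented plugin convention and is an assumption here, not something this patch specifies:

```python
from importlib.metadata import entry_points

# vLLM discovers backend plugins through Python entry points: an installed
# plugin package advertises a registration callable under a known group,
# and vLLM invokes it at startup. Listing the group shows what would be
# picked up in the current environment (empty if no plugin is installed).
plugins = entry_points(group="vllm.general_plugins")
for ep in plugins:
    print(f"{ep.name} -> {ep.value}")
print(f"{len(plugins)} plugin(s) registered")
```

This keeps the integration non-invasive: vLLM's core stays unchanged, and the MindSpore backend is activated simply by being installed.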
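The environment requirements in the new README bound the interpreter version (Python >= 3.9, < 3.12). A minimal standard-library sketch of a pre-install sanity check, with the bounds taken from the README's software list:

```python
import sys

# Compare the running interpreter against the README's stated Python bounds.
lower, upper = (3, 9), (3, 12)
version = sys.version_info[:2]
ok = lower <= version < upper
print(f"Python {version[0]}.{version[1]}: {'OK' if ok else 'unsupported'}")
```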
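The quick-start links replace the serving example this patch removes. For readers who want the gist inline, the removed curl call maps to a standard-library request like the following; the endpoint, model name, and payload are copied from the removed example, and a server started via vllm-mindspore must already be listening for an actual request to succeed:

```python
import json
from urllib import request

# Mirror of the OpenAI-compatible /v1/completions curl call from the
# previous README; request.urlopen(req) would send it to a running server.
payload = {
    "model": "Qwen/Qwen2.5-32B-Instruct",
    "prompt": "MindSpore is",
    "max_tokens": 120,
    "temperature": 0,
}
req = request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```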