From ab8ea070a833dfedfeb1b87fe9ff2e3914b46547 Mon Sep 17 00:00:00 2001
From: horcam
Date: Thu, 29 May 2025 20:47:38 +0800
Subject: [PATCH] update vllm-mindspore

---
 .../installation/installation.md              | 50 ++++++-------------
 .../quick_start/quick_start.md                | 15 +++----
 .../qwen2.5_32b_multiNPU.md                   | 18 +++----
 .../qwen2.5_7b_singleNPU.md                   | 16 +++---
 .../docs/source_zh_cn/index.rst               |  2 +-
 .../supported_features/operations/npu_ops.md  |  2 +
 .../models_list/models_list.md                |  5 +-
 7 files changed, 42 insertions(+), 66 deletions(-)

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
index 46ec5290d9..f76ce96408 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
@@ -22,7 +22,7 @@
 |[MindSpore Transformers](https://gitee.com/mindspore/mindformers)|1.6 | br_infer_deepseek_os |
 |[Golden Stick](https://gitee.com/mindspore/golden-stick)|1.1.0 | r1.1.0 |
 |[vLLM](https://github.com/vllm-project/vllm) | 0.8.3 | v0.8.3 |
-|[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.2 | develop |
+|[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.2 | master |
 
 ## Environment Configuration
 
@@ -106,8 +106,8 @@ pip install vllm_mindspore
 
 ### Installation from Source
 
-- Install CANN and MindSpore
-  For the matching versions of CANN and MindSpore and their installation steps, see the [MindSpore installation guide](https://www.mindspore.cn/install) and [CANN Community Edition software installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit).
+- Install CANN
+  For CANN installation steps and version matching, see [CANN Community Edition software installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit). If you run into problems while installing CANN, consult the [Ascend FAQ](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html).
 
   By default, CANN is installed under `/usr/local/Ascend`. After installing CANN, configure its environment variables with the following commands:
 
@@ -117,40 +117,11 @@ pip install vllm_mindspore
   export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
   ```
 
-  After installation, you can verify that CANN and MindSpore were installed successfully with the following command:
+- Install the vLLM prerequisites
+  For vLLM environment configuration and installation, see the [vLLM installation guide](https://docs.vllm.ai/en/v0.8.3/getting_started/installation/cpu.html). vLLM requires `gcc/g++ >= 12.3.0`, which can be installed with:
 
   ```bash
-  python -c "import mindspore;mindspore.set_context(device_target='Ascend');mindspore.run_check();exit()"
-  ```
-
-  If it prints the following result, MindSpore has been installed successfully:
-
-  ```text
-  The result of multiplication calculation is correct, MindSpore has been installed on platform [Ascend] successfully!
-  ```
-
-  If you encounter problems while installing CANN or MindSpore, see the [MindSpore FAQ](https://www.mindspore.cn/docs/zh-CN/r2.6.0/faq/) and the [Ascend FAQ](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html).
-
-- Install vLLM
-  For vLLM environment configuration and installation, see the [vLLM installation guide](https://docs.vllm.ai/en/v0.8.3/getting_started/installation/cpu.html). It requires `gcc/g++ >= 12.3.0`; once that is available, fetch the vLLM source with:
-
-  ```bash
-  git clone https://github.com/vllm-project/vllm.git vllm_source
-  cd vllm_source
-  ```
-
-  Install the Python packages required by the vLLM CPU backend:
-
-  ```bash
-  pip install --upgrade pip
-  pip install "cmake>=3.26" wheel packaging ninja "setuptools-scm>=8" numpy
-  pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
-  ```
-
-  Finally, build and install the vLLM CPU backend:
-
-  ```bash
-  VLLM_TARGET_DEVICE=cpu python setup.py install
+  yum install -y gcc gcc-c++
   ```
 
 - Install vLLM MindSpore
@@ -169,6 +140,15 @@ pip install vllm_mindspore
   pip install .
   ```
 
+  After the above commands finish, a `mindformers-dev` folder is generated under `vllm-mindspore/install_depend_pkgs`. Add it to the environment variables:
+
+  ```bash
+  export MF_PATH=$(pwd)/install_depend_pkgs/mindformers-dev
+  export PYTHONPATH=$MF_PATH:$PYTHONPATH
+  ```
+
+  If MindSpore Transformers was built and installed from the `br_infer_deepseek_os` branch, a `mindformers-os` folder is generated under `vllm-mindspore/install_depend_pkgs` instead, and `MF_PATH` must point to `$(pwd)/install_depend_pkgs/mindformers-os`.
+
 ### Quick Verification
 
 You can verify the installation with a simple offline-inference scenario. Taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, run offline inference with the following Python script:
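The verification script referenced above sits outside this hunk, so the patch does not show it. For orientation, here is a minimal offline-inference sketch in the spirit of that section; the exact script in installation.md may differ, and the import-`vllm_mindspore`-before-`vllm` ordering and the model id are assumptions carried over from the quick start, not part of this patch:

```python
# Illustrative sketch only; the actual script in installation.md may differ.
# Assumption: vllm_mindspore must be imported before vllm so that the
# MindSpore backend is registered as a vLLM plugin.
import vllm_mindspore  # noqa: F401
from vllm import LLM, SamplingParams

prompts = ["I am"]
# Match the quick-start request: greedy decoding, 20 new tokens.
sampling_params = SamplingParams(temperature=0.0, max_tokens=20)

# Assumption: Hugging Face model id from the quick start; a local
# checkpoint path works as well.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, generated: {output.outputs[0].text!r}")
```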
- ``` - - 若用户在安装CANN与MindSpore过程中遇到问题,可参考[MindSpore常见问题](https://www.mindspore.cn/docs/zh-CN/r2.6.0/faq/)与[昇腾常见问题](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html)进行解决。 - -- 安装vLLM - vLLM的环境配置与安装方法,请参考[vLLM安装教程](https://docs.vllm.ai/en/v0.8.3/getting_started/installation/cpu.html)。其依赖`gcc/g++ >= 12.3.0`的版本,在准备好该依赖后,执行以下命令拉取vLLM源码: - - ```bash - git clone https://github.com/vllm-project/vllm.git vllm_source - cd vllm_source - ``` - - 安装vLLM CPU后端所需Python依赖包: - - ```bash - pip install --upgrade pip - pip install "cmake>=3.26" wheel packaging ninja "setuptools-scm>=8" numpy - pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu - ``` - - 最后,编译安装vLLM CPU: - - ```bash - VLLM_TARGET_DEVICE=cpu python setup.py install + yum install -y gcc gcc-c++ ``` - 安装vLLM MindSpore @@ -169,6 +140,15 @@ pip install vllm_mindspore pip install . ``` + 上述命令执行完毕之后,将在`vllm-mindspore/install_depend_pkgs`目录下生成`mindformers-dev`文件夹,将其加入到环境变量中: + + ```bash + export MF_PATH=`pwd install_depend_pkgs/mindformers-dev` + export PYTHONPATH=$MF_PATH:$PYTHONPATH + ``` + + 若MindSpore Transformers是由`br_infer_deepseek_os`分支编译安装,则会在`vllm-mindspore/install_depend_pkgs`目录下生成`mindformers-os`文件夹,则环境变量`MF_PATH`需调整为`pwd install_depend_pkgs/mindformers-os`。 + ### 快速验证 用户可以创建一个简单的离线推理场景,验证安装是否成功。下面以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) 为例,用户可以使用如下Python脚本,进行模型的离线推理: diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md index 16e5663c23..60941bdc89 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md @@ -202,24 +202,24 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 #### 发送请求 -使用如下命令发送请求。其中`$PROMPT`为模型输入: +使用如下命令发送请求。其中`prompt`字段为模型输入: ```bash PROMPT="I am" -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "$PROMPT", "max_tokens": 120, "temperature": 0}' +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' ``` 若请求处理成功,将获得以下的推理结果: ```text { - "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion", - "create":1747398389, + "id":"cmpl-bac2b14c726b48b9967bcfc724e7c2a8","object":"text_completion", + "create":1748485893, "model":"Qwen2.5-7B-Instruct", "choices":[ { "index":0, - "trying to create a virtual environment for my Python project, but I am encountering some", + "trying to create a virtual environment for my Python project, but I am encountering some issues with setting up", "logprobs":null, "finish_reason":"length", "stop_reason":null, @@ -228,8 +228,8 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ], "usage":{ "prompt_tokens":2, - "total_tokens":17, - "completion_tokens":15, + "total_tokens":22, + "completion_tokens":20, "prompt_tokens_details":null } } diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 5484b56c40..4e4fbdc76a 100644 --- 
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 5484b56c40..4e4fbdc76a 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -171,25 +171,23 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0
 
 ### Sending a Request
 
-Send a request with the following command, where `$PROMPT` is the model input:
+Send a request with the following command, where the `prompt` field is the model input:
 
 ```bash
-PROMPT="请介绍Qwen2.5-32B模型"
-MAX_TOKEN=64
-curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "$PROMPT", "max_tokens": $MAX_TOKEN, "temperature": 0}'
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
 ```
 
 If the request succeeds, the following inference result is returned:
 
 ```text
 {
-    "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
-    "create":1747398389,
+    "id":"cmpl-11fe2898c77d4ff18c879f57ae7aa9ca","object":"text_completion",
+    "created":1748568696,
     "model":"Qwen2.5-32B-Instruct",
     "choices":[
         {
             "index":0,
-            "text":"的使用方法\nQwen2.5-32B 是一个大型的自然语言处理模型,通常用于生成文本、回答问题、进行对话等任务。以下是使用 Qwen2.5-32B 模型的一般步骤:\n\n### 1. 环境准备\n-",
+            "text":"trying to create a virtual environment in Python using venv, but I am encountering some issues with setting",
             "logprobs":null,
             "finish_reason":"length",
             "stop_reason":null,
@@ -197,9 +195,9 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d
         }
     ],
     "usage":{
-        "prompt_tokens":12,
-        "total_tokens":76,
-        "completion_tokens":64,
+        "prompt_tokens":2,
+        "total_tokens":22,
+        "completion_tokens":20,
         "prompt_tokens_details":null
     }
 }

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index 72ca0cc4cf..75cafa0566 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -206,25 +206,23 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0
 
 #### Sending a Request
 
-Send a request with the following command, where `$PROMPT` is the model input:
+Send a request with the following command, where the `prompt` field is the model input:
 
 ```bash
-PROMPT="I am"
-MAX_TOKEN=120
-curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "$PROMPT", "max_tokens": $MAX_TOKEN, "temperature": 0}'
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
 ```
 
 If the request succeeds, the following inference result is returned:
 
 ```text
 {
-    "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
-    "create":1747398389,
+    "id":"cmpl-bac2b14c726b48b9967bcfc724e7c2a8","object":"text_completion",
+    "created":1748485893,
     "model":"Qwen2.5-7B-Instruct",
     "choices":[
         {
             "index":0,
-            "trying to create a virtual environment for my Python project, but I am encountering some",
+            "text":"trying to create a virtual environment for my Python project, but I am encountering some issues with setting up",
             "logprobs":null,
             "finish_reason":"length",
             "stop_reason":null,
@@ -233,8 +231,8 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d
     ],
     "usage":{
         "prompt_tokens":2,
-        "total_tokens":17,
-        "completion_tokens":15,
+        "total_tokens":22,
+        "completion_tokens":20,
         "prompt_tokens_details":null
     }
 }
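Both tutorials use single-shot completions; the same endpoint can also stream tokens as they are generated. A hedged sketch follows, assuming the server implements the OpenAI-style `stream` flag and Server-Sent Events framing (`data: {...}` lines terminated by `data: [DONE]`), which vLLM's API server does:

```python
# Hedged sketch: streaming the same completion token by token.
import json

import requests

payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "prompt": "I am",
    "max_tokens": 20,
    "temperature": 0,
    "stream": True,  # assumption: OpenAI-compatible streaming is enabled
}
with requests.post("http://localhost:8000/v1/completions", json=payload,
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames look like: data: {"choices":[{"text":"..."}], ...}
        if not line or not line.startswith("data: "):
            continue
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        print(json.loads(chunk)["choices"][0]["text"], end="", flush=True)
print()
```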
"prompt_tokens_details":null } } diff --git a/docs/vllm_mindspore/docs/source_zh_cn/index.rst b/docs/vllm_mindspore/docs/source_zh_cn/index.rst index 82db28882a..0045c52e4b 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/index.rst +++ b/docs/vllm_mindspore/docs/source_zh_cn/index.rst @@ -28,7 +28,7 @@ vLLM MindSpore插件以将MindSpore大模型接入vLLM,并实现服务化部 -vLLM MindSpore采用vLLM社区推荐的插件机制,实现能力注册。未来期望遵循 `RPC Multi-framework support for vllm `_ 所述原则,推动上游vLLM社区通过抽象和解耦AI框架,支持接入包括PaddlePaddle、JAX等多类型AI框架推理能力。 +vLLM MindSpore采用vLLM社区推荐的插件机制,实现能力注册。未来期望遵循 `RPC Multi-framework support for vllm `_ 所述原则。 代码仓地址: diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/operations/npu_ops.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/operations/npu_ops.md index efb2a2e5cb..8c7c4e853f 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/operations/npu_ops.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/operations/npu_ops.md @@ -6,6 +6,8 @@ 实际开发中,可根据项目需求扩展更多功能,算子实现细节可参考 [MindSpore 自定义算子实现方式](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/op_customopbuilder.html)。 +**目前该特性只支持动态图场景。** + ## 文件组织结构 接入自定义算子需要在 vLLM MindSpore 项目的 `vllm_mindspore/ops` 目录下添加代码,目录结构如下: diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md index bd21b8fc07..67d32e620c 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md @@ -5,17 +5,16 @@ | 模型 | 是否支持 | 模型下载链接 | 模型后端 | |-------| --------- | ---- | ---- | | Qwen2.5 | √ | [Qwen2.5-7B](https://modelers.cn/models/AI-Research/Qwen2.5-7B)、[Qwen2.5-32B](https://modelers.cn/models/AI-Research/Qwen2.5-32B) 等 | MINDFORMER_MODELS | -| Qwen2.5-VL | √ | [Qwen2.5-VL-7B](https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)、[Qwen2.5-VL-72B](https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct) 等 | NATIVE_MODELS | | Qwen3 | √ | [Qwen3-8B](https://modelers.cn/models/MindSpore-Lab/Qwen3-8B)、[Qwen3-32B](https://modelers.cn/models/MindSpore-Lab/Qwen3-32B) 等 | MINDFORMER_MODELS | | DeepSeek V3 | √ | [DeepSeek-V3](https://modelers.cn/models/MindSpore-Lab/DeepSeek-V3) 等 | MINDFORMER_MODELS | | DeepSeek R1 | √ | [DeepSeek-R1](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1)、[Deepseek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-r1-w8a8) 等 | MINDFORMER_MODELS | -其中,“模型后端”指模型的来源是来自于MindSpore Transformers和vllm-mindspore原生模型,使用环境变量`vLLM_MODEL_BACKEND`进行指定: +其中,“模型后端”指模型的来源是来自于MindSpore Transformers和vLLM MindSpore原生模型,使用环境变量`vLLM_MODEL_BACKEND`进行指定: - 模型来源为MindSpore Transformers时,则取值为`MINDFORMER_MODELS`; - 模型来源为vllm-mindspore时,则取值为`NATIVE_MODELS`; -该值默认原生模型,当需要更改模型后端时,使用如下命令: +该值默认`NATIVE_MODELS`,当需要更改模型后端时,使用如下命令: ```bash export vLLM_MODEL_BACKEND=MINDFORMER_MODELS -- Gitee