From f05c519f95d1122353d0f7ab793f9c204ee1f747 Mon Sep 17 00:00:00 2001 From: horcam Date: Thu, 19 Jun 2025 16:19:45 +0800 Subject: [PATCH] fix doc for vLLM MindSpore --- .../operations/npu_ops.md | 4 +-- .../docs/source_en/faqs/faqs.md | 12 +-------- .../installation/installation.md | 17 ++++++++++--- .../quick_start/quick_start.md | 8 ++++++ .../deepseek_r1_671b_w8a8_dp4_tp4_ep4.md | 18 +++++++++++-- .../qwen2.5_32b_multiNPU.md | 2 +- docs/vllm_mindspore/docs/source_en/index.rst | 2 +- .../environment_variables.md | 25 +++++++------------ .../supported_features/benchmark/benchmark.md | 20 +++++++++------ .../quantization/quantization.md | 4 +-- .../models_list/models_list.md | 16 ++++++------ .../operations/npu_ops.md | 8 +++--- .../docs/source_zh_cn/faqs/faqs.md | 13 +--------- .../installation/installation.md | 15 +++++++++-- .../quick_start/quick_start.md | 8 ++++++ .../deepseek_r1_671b_w8a8_dp4_tp4_ep4.md | 14 +++++++++++ .../qwen2.5_32b_multiNPU.md | 2 +- .../qwen2.5_7b_singleNPU.md | 2 +- .../docs/source_zh_cn/index.rst | 2 +- .../environment_variables.md | 25 +++++++------------ .../supported_features/benchmark/benchmark.md | 20 +++++++++------ .../quantization/quantization.md | 2 +- .../models_list/models_list.md | 16 ++++++------ 23 files changed, 149 insertions(+), 106 deletions(-) rename docs/vllm_mindspore/docs/source_en/{user_guide/supported_features => developer_guide}/operations/npu_ops.md (96%) rename docs/vllm_mindspore/docs/source_zh_cn/{user_guide/supported_features => developer_guide}/operations/npu_ops.md (93%) diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md b/docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md similarity index 96% rename from docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md rename to docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md index b186332f26..fc83bd2418 100644 --- 
a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md +++ b/docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md @@ -1,6 +1,6 @@ # Custom Operator Integration -[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md) +[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md) This document introduces how to integrate a new custom operator into the vLLM MindSpore project, with the **`adv_step_flash`** operator as an example. The following sections focus on the integration process; for an introduction to operator implementation, users can refer to the official MindSpore tutorial: [Dynamic Graph Custom Operator Integration](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder.html). @@ -89,7 +89,7 @@ MS_EXTENSION_MODULE(my_custom_op) { ### Operator Compilation and Testing 1. **Code Integration**: Merge the code into the vLLM MindSpore project. -2. **Project Compilation**: Build and install the whl package containing the custom operator. +2. **Project Compilation**: Run `pip install .` in the vllm-mindspore directory to build and install vLLM MindSpore. 3. **Operator Testing**: Invoke the operator in Python: ```python diff --git a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md index 2d27c2c672..e032d1d0ef 100644 --- a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md +++ b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md @@ -60,16 +60,6 @@ Check whether the CANN and MindSpore versions are correctly matched.
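The FAQ entries around this hunk (version matching, and the `torch` metadata error below) all come down to inspecting installed package metadata. A quick, generic way to check what is actually installed (a sketch using only the standard library; the package names are examples, not an official diagnostic):

```python
from importlib import metadata

def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Example packages relevant to this FAQ (adjust names as needed).
for pkg in ("mindspore", "vllm", "torch"):
    print(pkg, "->", installed_version(pkg) or "not installed")
```

This is the same lookup that raises `PackageNotFoundError: No package metadata was found for torch` in the FAQ below, wrapped so a missing package degrades to `None` instead of an exception.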
-### `resolve_transformers_fallback` Import Error When Running Qwen3 - -- Key error message: - - ```text - ImportError: cannot import name 'resolve_transformers_fallback' from 'vllm.model_executor.model_loader.utils' - ``` - - Try switching `vllm` to version `0.7.3`. - ### `torch` Not Found When Importing `vllm_mindspore` - Key error message: @@ -78,7 +68,7 @@ importlib.metadata.PackageNotFoundError: No package metadata was found for torch ``` - Execute the following commands to reinstall torch-related components: + Execute the following commands to uninstall torch-related components: ```bash pip uninstall torch diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md index 972dd8be0f..9e56504c28 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md @@ -142,20 +142,31 @@ pip install vllm_mindspore After executing the above commands, `mindformers-dev` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. Add this folder to the environment variables: ```bash - export MF_PATH=`pwd install_depend_pkgs/mindformers-dev` + export MF_PATH=`realpath install_depend_pkgs/mindformers-dev` export PYTHONPATH=$MF_PATH:$PYTHONPATH ``` If MindSpore Transformers was compiled and installed from the `br_infer_deepseek_os` branch, `mindformers-os` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. 
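The `pwd` → `realpath` change above is the substantive fix: in bash, `pwd` ignores a stray path argument and prints only the current directory, so the old command set `MF_PATH` one level too high, while `realpath` resolves the relative path to its absolute form. A Python analogue of the fix, using throwaway demo paths rather than a real checkout:

```python
import os
import tempfile

# A throwaway tree standing in for the vllm-mindspore checkout (demo paths only).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "install_depend_pkgs", "mindformers-dev"))
os.chdir(root)

# The old command, `pwd install_depend_pkgs/mindformers-dev`, printed only the
# current directory; resolving the relative path, as `realpath` does, yields
# the absolute package path suitable for PYTHONPATH:
mf_path = os.path.realpath("install_depend_pkgs/mindformers-dev")

expected_tail = os.path.join("install_depend_pkgs", "mindformers-dev")
print(mf_path.endswith(expected_tail))
```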
In this case, adjust the `MF_PATH` environment variable to: ```bash - export MF_PATH=`pwd install_depend_pkgs/mindformers-os` + export MF_PATH=`realpath install_depend_pkgs/mindformers-os` export PYTHONPATH=$MF_PATH:$PYTHONPATH ``` ### Quick Verification -To verify the installation, run a simple offline inference test with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): +Users can verify the installation with a simple offline inference test. First, configure the environment variables with the following commands: + +```bash +export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. +export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend. +export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage; the remaining device memory is used for KV cache allocation. +export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. +``` + +For more details about these environment variables, users can also refer to [Setting Environment Variables](../quick_start/quick_start.md#setting-environment-variables). + +Users can run the following Python script to verify the installation with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): ```python import vllm_mindspore # Add this line at the top of the script. diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md index 277383bf69..57a0300f18 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md @@ -136,6 +136,14 @@ Here is an explanation of these environment variables: - `vLLM_MODEL_MEMORY_USE_GB`: The memory reserved for model loading. Adjust this value if an insufficient memory error occurs during model loading.
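The comment on `vLLM_MODEL_MEMORY_USE_GB` above implies a simple budget: total device memory minus the model reservation is what remains for the KV cache. A small sketch of that arithmetic (variable names mirror the environment variables; the calculation is illustrative, not part of the vLLM MindSpore runtime):

```python
import os

# Defaults match the example values above; a real run reads `npu-smi info`.
os.environ.setdefault("ASCEND_TOTAL_MEMORY_GB", "64")
os.environ.setdefault("vLLM_MODEL_MEMORY_USE_GB", "32")

total_gb = int(os.environ["ASCEND_TOTAL_MEMORY_GB"])
model_gb = int(os.environ["vLLM_MODEL_MEMORY_USE_GB"])
kv_cache_gb = total_gb - model_gb  # remainder available for KV cache allocation
print(f"KV cache budget: {kv_cache_gb} GB")
```

Raising `vLLM_MODEL_MEMORY_USE_GB` helps when model loading runs out of memory, at the cost of a smaller KV cache and thus fewer concurrent sequences.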
- `MINDFORMERS_MODEL_CONFIG`: The model configuration file. +Additionally, users need to ensure that MindSpore Transformers is available on the Python path. Users can add it by running the following command: + +```bash +export PYTHONPATH=/path/to/mindformers:$PYTHONPATH +``` + +This will include MindSpore Transformers in the Python path. + ### Offline Inference Taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, users can perform offline inference with the following Python script: diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md index 17257f7638..f07b0ccd17 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md @@ -140,8 +140,16 @@ parallel_config: model_parallel: 16 pipeline_stage: 1 expert_parallel: 1 +``` + +Additionally, users need to ensure that MindSpore Transformers is available on the Python path. Users can add it by running the following command: + +```bash +export PYTHONPATH=/path/to/mindformers:$PYTHONPATH ``` +This will include MindSpore Transformers in the Python path. + ### Starting Ray for Multi-Node Cluster Management On Ascend, the pyACL package must be installed to support Ray. Additionally, the CANN dependency versions on all nodes must be consistent.
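The `PYTHONPATH` export added above works because entries on `PYTHONPATH` are prepended to `sys.path`, which is where Python's import machinery searches for top-level packages. A self-contained sketch of that mechanism using a dummy package (the real step points the path at the actual mindformers checkout instead):

```python
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a dummy importable package standing in for mindformers.
    pkg_dir = os.path.join(root, "mindformers_stub")
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()

    sys.path.insert(0, root)  # same effect as putting `root` on PYTHONPATH
    found = importlib.util.find_spec("mindformers_stub") is not None
    sys.path.remove(root)

print("package resolvable:", found)
```

`importlib.util.find_spec` returning a spec (rather than `None`) is exactly the check that fails with `ModuleNotFoundError` at import time when the path is missing.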
@@ -156,8 +164,14 @@ In the corresponding environment, obtain the Ascend-cann-nnrt installation packa ./Ascend-cann-nnrt_8.0.RC1_linux-aarch64.run --noexec --extract=./ cd ./run_package ./Ascend-pyACL_8.0.RC1_linux-aarch64.run --full --install-path= -export PYTHONPATH=/CANN-/python/site-packages/:$PYTHONPATH -``` +export PYTHONPATH=/CANN-/python/site-packages/:$PYTHONPATH +``` + +If you encounter permission issues during installation, you can grant permissions using: + +```bash +chmod -R 777 ./Ascend-pyACL_8.0.RC1_linux-aarch64.run +``` Download the Ascend runtime package from the [Ascend homepage](https://www.hiascend.cn/developer/download/community/result?module=cann&version=8.0.RC1.beta1). diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 6142a006d9..b1ab69b18a 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md @@ -81,7 +81,7 @@ Users can download the model using either [Python Tools](#downloading-with-pytho Execute the following Python script to download the [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) weights and files from [Hugging Face](https://huggingface.co/): ```python -from openmind_hub import snapshot_downloadfrom huggingface_hub import snapshot_download +from openmind_hub import snapshot_download snapshot_download( repo_id="Qwen/Qwen2.5-32B-Instruct", local_dir="/path/to/save/Qwen2.5-32B-Instruct", diff --git a/docs/vllm_mindspore/docs/source_en/index.rst b/docs/vllm_mindspore/docs/source_en/index.rst index 4f1f599040..836b284c20 100644 --- a/docs/vllm_mindspore/docs/source_en/index.rst +++ b/docs/vllm_mindspore/docs/source_en/index.rst @@ -113,7 +113,6 @@ Apache License 2.0, 
as found in the `LICENSE /CANN-/python/site-packages/:$PYTHONPATH ``` +If permission issues occur during installation, you can grant permissions with the following command: + +```bash +chmod -R 777 ./Ascend-pyACL_8.0.RC1_linux-aarch64.run +``` + The Ascend runtime package can be downloaded from the Ascend homepage. For example, the runtime package matching version [8.0.RC1.beta1](https://www.hiascend.cn/developer/download/community/result?module=cann&version=8.0.RC1.beta1) can be downloaded there. #### Multi-Node Cluster diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 4e4fbdc76a..30e8851e1c 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md @@ -81,7 +81,7 @@ docker exec -it $DOCKER_NAME bash Execute the following Python script to download the [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) weights and files from the [Hugging Face community](https://huggingface.co/): ```python -from openmind_hub import snapshot_downloadfrom huggingface_hub import snapshot_download +from openmind_hub import snapshot_download snapshot_download( repo_id="Qwen/Qwen2.5-32B-Instruct", local_dir="/path/to/save/Qwen2.5-32B-Instruct", diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md index 75cafa0566..5c32271c4a 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md @@ -81,7 +81,7 @@ docker exec -it $DOCKER_NAME bash Execute the following Python script to download the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) weights and files from the [Hugging Face community](https://huggingface.co/): ```python -from openmind_hub import snapshot_downloadfrom huggingface_hub import snapshot_download +from openmind_hub import snapshot_download snapshot_download( repo_id="Qwen/Qwen2.5-7B-Instruct", local_dir="/path/to/save/Qwen2.5-7B-Instruct", diff --git a/docs/vllm_mindspore/docs/source_zh_cn/index.rst b/docs/vllm_mindspore/docs/source_zh_cn/index.rst index 7213a68b71..98e178d032 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/index.rst +++ b/docs/vllm_mindspore/docs/source_zh_cn/index.rst @@ -113,7 +113,6 @@ Apache License 2.0, as found in the `LICENSE