From f05c519f95d1122353d0f7ab793f9c204ee1f747 Mon Sep 17 00:00:00 2001 From: horcam Date: Thu, 19 Jun 2025 16:19:45 +0800 Subject: [PATCH] fix doc for vLLM MindSpore --- .../operations/npu_ops.md | 4 +-- .../docs/source_en/faqs/faqs.md | 12 +-------- .../installation/installation.md | 17 ++++++++++--- .../quick_start/quick_start.md | 8 ++++++ .../deepseek_r1_671b_w8a8_dp4_tp4_ep4.md | 18 +++++++++++-- .../qwen2.5_32b_multiNPU.md | 2 +- docs/vllm_mindspore/docs/source_en/index.rst | 2 +- .../environment_variables.md | 25 +++++++------------ .../supported_features/benchmark/benchmark.md | 20 +++++++++------ .../quantization/quantization.md | 4 +-- .../models_list/models_list.md | 16 ++++++------ .../operations/npu_ops.md | 8 +++--- .../docs/source_zh_cn/faqs/faqs.md | 13 +--------- .../installation/installation.md | 15 +++++++++-- .../quick_start/quick_start.md | 8 ++++++ .../deepseek_r1_671b_w8a8_dp4_tp4_ep4.md | 14 +++++++++++ .../qwen2.5_32b_multiNPU.md | 2 +- .../qwen2.5_7b_singleNPU.md | 2 +- .../docs/source_zh_cn/index.rst | 2 +- .../environment_variables.md | 25 +++++++------------ .../supported_features/benchmark/benchmark.md | 20 +++++++++------ .../quantization/quantization.md | 2 +- .../models_list/models_list.md | 16 ++++++------ 23 files changed, 149 insertions(+), 106 deletions(-) rename docs/vllm_mindspore/docs/source_en/{user_guide/supported_features => developer_guide}/operations/npu_ops.md (96%) rename docs/vllm_mindspore/docs/source_zh_cn/{user_guide/supported_features => developer_guide}/operations/npu_ops.md (93%) diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md b/docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md similarity index 96% rename from docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md rename to docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md index b186332f26..fc83bd2418 100644 --- 
a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md +++ b/docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md @@ -1,6 +1,6 @@ # Custom Operator Integration -[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md) +[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/developer_guide/operations/npu_ops.md) This document introduces how to integrate a new custom operator into the vLLM MindSpore project, with the **`adv_step_flash`** operator as an example. The following sections focus on the integration process; for an introduction to operator implementation, users can refer to the official MindSpore tutorial: [Dynamic Graph Custom Operator Integration](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder.html). @@ -89,7 +89,7 @@ MS_EXTENSION_MODULE(my_custom_op) { ### Operator Compilation and Testing 1. **Code Integration**: Merge the code into the vLLM MindSpore project. -2. **Project Compilation**: Build and install the whl package containing the custom operator. +2. **Project Compilation**: Run `pip install .` in the vllm-mindspore directory to build and install vLLM MindSpore. 3. **Operator Testing**: Invoke the operator in Python: ```python diff --git a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md index 2d27c2c672..e032d1d0ef 100644 --- a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md +++ b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md @@ -60,16 +60,6 @@ Check whether the CANN and MindSpore versions are correctly matched.
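The FAQ entries around this hunk (version matching, and the `torch` metadata error below) all come down to inspecting installed package metadata. A quick, generic way to check what is actually installed (a sketch using only the standard library; the package names are examples, not an official diagnostic):

```python
from importlib import metadata

def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Example packages relevant to this FAQ (adjust names as needed).
for pkg in ("mindspore", "vllm", "torch"):
    print(pkg, "->", installed_version(pkg) or "not installed")
```

This is the same lookup that raises `PackageNotFoundError: No package metadata was found for torch` in the FAQ below, wrapped so a missing package degrades to `None` instead of an exception.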
-### `resolve_transformers_fallback` Import Error When Running Qwen3 - -- Key error message: - - ```text - ImportError: cannot import name 'resolve_transformers_fallback' from 'vllm.model_executor.model_loader.utils' - ``` - - Try switching `vllm` to version `0.7.3`. - ### `torch` Not Found When Importing `vllm_mindspore` - Key error message: @@ -78,7 +68,7 @@ importlib.metadata.PackageNotFoundError: No package metadata was found for torch ``` - Execute the following commands to reinstall torch-related components: + Execute the following commands to uninstall torch-related components: ```bash pip uninstall torch diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md index 972dd8be0f..9e56504c28 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md @@ -142,20 +142,31 @@ pip install vllm_mindspore After executing the above commands, `mindformers-dev` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. Add this folder to the environment variables: ```bash - export MF_PATH=`pwd install_depend_pkgs/mindformers-dev` + export MF_PATH=`realpath install_depend_pkgs/mindformers-dev` export PYTHONPATH=$MF_PATH:$PYTHONPATH ``` If MindSpore Transformers was compiled and installed from the `br_infer_deepseek_os` branch, `mindformers-os` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. 
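The `pwd` → `realpath` change above is the substantive fix: in bash, `pwd` ignores a stray path argument and prints only the current directory, so the old command set `MF_PATH` one level too high, while `realpath` resolves the relative path to its absolute form. A Python analogue of the fix, using throwaway demo paths rather than a real checkout:

```python
import os
import tempfile

# A throwaway tree standing in for the vllm-mindspore checkout (demo paths only).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "install_depend_pkgs", "mindformers-dev"))
os.chdir(root)

# The old command, `pwd install_depend_pkgs/mindformers-dev`, printed only the
# current directory; resolving the relative path, as `realpath` does, yields
# the absolute package path suitable for PYTHONPATH:
mf_path = os.path.realpath("install_depend_pkgs/mindformers-dev")

expected_tail = os.path.join("install_depend_pkgs", "mindformers-dev")
print(mf_path.endswith(expected_tail))
```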
In this case, adjust the `MF_PATH` environment variable to: ```bash - export MF_PATH=`pwd install_depend_pkgs/mindformers-os` + export MF_PATH=`realpath install_depend_pkgs/mindformers-os` export PYTHONPATH=$MF_PATH:$PYTHONPATH ``` ### Quick Verification -To verify the installation, run a simple offline inference test with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): +Users can verify the installation with a simple offline inference test. First, configure the environment variables with the following commands: + +```bash +export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. +export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend. +export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage; the remaining device memory is used for KV cache allocation. +export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. +``` + +For more details about these environment variables, users can also refer to [Setting Environment Variables](../quick_start/quick_start.md#setting-environment-variables). + +Users can run the following Python script to verify the installation with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): ```python import vllm_mindspore # Add this line at the top of the script. diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md index 277383bf69..57a0300f18 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md @@ -136,6 +136,14 @@ Here is an explanation of these environment variables: - `vLLM_MODEL_MEMORY_USE_GB`: The memory reserved for model loading. Adjust this value if an insufficient memory error occurs during model loading.
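The comment on `vLLM_MODEL_MEMORY_USE_GB` above implies a simple budget: total device memory minus the model reservation is what remains for the KV cache. A small sketch of that arithmetic (variable names mirror the environment variables; the calculation is illustrative, not part of the vLLM MindSpore runtime):

```python
import os

# Defaults match the example values above; a real run reads `npu-smi info`.
os.environ.setdefault("ASCEND_TOTAL_MEMORY_GB", "64")
os.environ.setdefault("vLLM_MODEL_MEMORY_USE_GB", "32")

total_gb = int(os.environ["ASCEND_TOTAL_MEMORY_GB"])
model_gb = int(os.environ["vLLM_MODEL_MEMORY_USE_GB"])
kv_cache_gb = total_gb - model_gb  # remainder available for KV cache allocation
print(f"KV cache budget: {kv_cache_gb} GB")
```

Raising `vLLM_MODEL_MEMORY_USE_GB` helps when model loading runs out of memory, at the cost of a smaller KV cache and thus fewer concurrent sequences.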
- `MINDFORMERS_MODEL_CONFIG`: The model configuration file. +Additionally, users need to ensure that MindSpore Transformers is available on the Python path. Users can add it by running the following command: + +```bash +export PYTHONPATH=/path/to/mindformers:$PYTHONPATH +``` + +This will include MindSpore Transformers in the Python path. + ### Offline Inference Taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, users can perform offline inference with the following Python script: diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md index 17257f7638..f07b0ccd17 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md @@ -140,8 +140,16 @@ parallel_config: model_parallel: 16 pipeline_stage: 1 expert_parallel: 1 +``` + +Additionally, users need to ensure that MindSpore Transformers is available on the Python path. Users can add it by running the following command: + +```bash +export PYTHONPATH=/path/to/mindformers:$PYTHONPATH ``` +This will include MindSpore Transformers in the Python path. + ### Starting Ray for Multi-Node Cluster Management On Ascend, the pyACL package must be installed to support Ray. Additionally, the CANN dependency versions on all nodes must be consistent.
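The `PYTHONPATH` export added above works because entries on `PYTHONPATH` are prepended to `sys.path`, which is where Python's import machinery searches for top-level packages. A self-contained sketch of that mechanism using a dummy package (the real step points the path at the actual mindformers checkout instead):

```python
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a dummy importable package standing in for mindformers.
    pkg_dir = os.path.join(root, "mindformers_stub")
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()

    sys.path.insert(0, root)  # same effect as putting `root` on PYTHONPATH
    found = importlib.util.find_spec("mindformers_stub") is not None
    sys.path.remove(root)

print("package resolvable:", found)
```

`importlib.util.find_spec` returning a spec (rather than `None`) is exactly the check that fails with `ModuleNotFoundError` at import time when the path is missing.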
@@ -156,8 +164,14 @@ In the corresponding environment, obtain the Ascend-cann-nnrt installation packa ./Ascend-cann-nnrt_8.0.RC1_linux-aarch64.run --noexec --extract=./ cd ./run_package ./Ascend-pyACL_8.0.RC1_linux-aarch64.run --full --install-path= -export PYTHONPATH=/CANN-/python/site-packages/:$PYTHONPATH -``` +export PYTHONPATH=/CANN-/python/site-packages/:$PYTHONPATH +``` + +If you encounter permission issues during installation, you can grant permissions using: + +```bash +chmod -R 777 ./Ascend-pyACL_8.0.RC1_linux-aarch64.run +``` Download the Ascend runtime package from the [Ascend homepage](https://www.hiascend.cn/developer/download/community/result?module=cann&version=8.0.RC1.beta1). diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 6142a006d9..b1ab69b18a 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md @@ -81,7 +81,7 @@ Users can download the model using either [Python Tools](#downloading-with-pytho Execute the following Python script to download the [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) weights and files from [Hugging Face](https://huggingface.co/): ```python -from openmind_hub import snapshot_downloadfrom huggingface_hub import snapshot_download +from openmind_hub import snapshot_download snapshot_download( repo_id="Qwen/Qwen2.5-32B-Instruct", local_dir="/path/to/save/Qwen2.5-32B-Instruct", diff --git a/docs/vllm_mindspore/docs/source_en/index.rst b/docs/vllm_mindspore/docs/source_en/index.rst index 4f1f599040..836b284c20 100644 --- a/docs/vllm_mindspore/docs/source_en/index.rst +++ b/docs/vllm_mindspore/docs/source_en/index.rst @@ -113,7 +113,6 @@ Apache License 2.0, 
as found in the `LICENSE /CANN-/python/site-packages/:$PYTHONPATH ``` +If permission issues occur during installation, you can grant permissions with the following command: + +```bash +chmod -R 777 ./Ascend-pyACL_8.0.RC1_linux-aarch64.run +``` + The Ascend runtime package can be downloaded from the Ascend homepage. For example, the runtime package matching version [8.0.RC1.beta1](https://www.hiascend.cn/developer/download/community/result?module=cann&version=8.0.RC1.beta1) can be downloaded there. #### Multi-Node Cluster diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 4e4fbdc76a..30e8851e1c 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md @@ -81,7 +81,7 @@ docker exec -it $DOCKER_NAME bash Execute the following Python script to download the [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) weights and files from the [Hugging Face community](https://huggingface.co/): ```python -from openmind_hub import snapshot_downloadfrom huggingface_hub import snapshot_download +from openmind_hub import snapshot_download snapshot_download( repo_id="Qwen/Qwen2.5-32B-Instruct", local_dir="/path/to/save/Qwen2.5-32B-Instruct", diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md index 75cafa0566..5c32271c4a 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md @@ -81,7 +81,7 @@ docker exec -it $DOCKER_NAME bash Execute the following Python script to download the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) weights and files from the [Hugging Face community](https://huggingface.co/): ```python -from openmind_hub import snapshot_downloadfrom huggingface_hub import snapshot_download +from openmind_hub import snapshot_download snapshot_download( repo_id="Qwen/Qwen2.5-7B-Instruct", local_dir="/path/to/save/Qwen2.5-7B-Instruct", diff --git a/docs/vllm_mindspore/docs/source_zh_cn/index.rst b/docs/vllm_mindspore/docs/source_zh_cn/index.rst index 7213a68b71..98e178d032 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/index.rst +++ b/docs/vllm_mindspore/docs/source_zh_cn/index.rst @@ -113,7 +113,6 @@ Apache License 2.0, as found in the `LICENSE