diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md index 73851332d61d4d9f33665d31dfb44915d1071bb9..7299d810a19425a57052642c19cb664345246e7d 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md @@ -4,8 +4,7 @@ This document describes the steps to install the vLLM MindSpore environment. Three installation methods are provided: -- [Docker Installation](#docker-installation): Suitable for quick deployment scenarios. -- [Pip Installation](#pip-installation): Suitable for scenarios requiring specific versions. +- [Docker Installation](#docker-installation): Suitable for quick deployment scenarios. - [Source Code Installation](#source-code-installation): Suitable for incremental development of vLLM MindSpore. ## Version Compatibility @@ -14,19 +13,19 @@ This document describes the steps to install the vLLM MindSpore environment. Thr - Python: 3.9 / 3.10 / 3.11 - Software version compatibility - | Software | Version | Corresponding Branch | - | -------- | ------- | -------------------- | - | [CANN](https://www.hiascend.com/developer/download/community/result?module=cann) | 8.1 | - | - | [MindSpore](https://www.mindspore.cn/install/) | 2.7 | master | - | [MSAdapter](https://git.openi.org.cn/OpenI/MSAdapter) | 0.2 | master | - | [MindSpore Transformers](https://gitee.com/mindspore/mindformers) | 1.6 | dev | - | [Golden Stick](https://gitee.com/mindspore/golden-stick) | 1.1.0 | r1.1.0 | - | [vLLM](https://github.com/vllm-project/vllm) | 0.9.1 | v0.9.1 | - | [vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.3 | master | + | Software | Version And Links | + | ----- | ----- | + |[CANN](https://www.hiascend.com/developer/download/community/result?module=cann) | [8.1.RC1](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Debian&Software=cannToolKit) | + |[MindSpore](https://www.mindspore.cn/install/) | [2.7.0](https://repo.mindspore.cn/mindspore/mindspore/version/202508/20250814/master_20250814091143_7548abc43af03319bfa528fc96d0ccd3917fcc9c_newest/unified/) | + |[MSAdapter](https://git.openi.org.cn/OpenI/MSAdapter)| [0.5.0](https://repo.mindspore.cn/mindspore/msadapter/version/202508/20250814/master_20250814010018_4615051c43eef898b6bbdc69768656493b5932f8_newest/any/) | + |[MindSpore Transformers](https://gitee.com/mindspore/mindformers)| [1.6.0](https://gitee.com/mindspore/mindformers) | + |[Golden Stick](https://gitee.com/mindspore/golden-stick)| [1.2.0](https://repo.mindspore.cn/mindspore/golden-stick/version/202508/20250814/master_20250814010017_2713821db982330b3bcd6d84d85a3b337d555f27_newest/any/) | + |[vLLM](https://github.com/vllm-project/vllm) | [0.9.1](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/) | + |[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | [0.3.0](https://gitee.com/mindspore/vllm-mindspore/) | ## Environment Setup -This section introduces three installation methods: [Docker Installation](#docker-installation), [Pip Installation](#pip-installation), [Source Code Installation](#source-code-installation), and [Quick Verification](#quick-verification) example to check the installation. 
+This section introduces two installation methods: [Docker Installation](#docker-installation), [Source Code Installation](#source-code-installation), and [Quick Verification](#quick-verification) example to check the installation. ### Docker Installation @@ -106,56 +105,109 @@ docker exec -it $DOCKER_NAME bash ### Source Code Installation -- **CANN Installation** +#### CANN Installation - For CANN installation methods and environment configuration, please refer to [CANN Community Edition Installation Guide](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit). If you encounter any issues during CANN installation, please consult the [Ascend FAQ](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html) for troubleshooting. +For CANN installation methods and environment configuration, please refer to [CANN Community Edition Installation Guide](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit). If you encounter any issues during CANN installation, please consult the [Ascend FAQ](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html) for troubleshooting. - The default installation path for CANN is `/usr/local/Ascend`. After completing CANN installation, configure the environment variables with the following commands: +The default installation path for CANN is `/usr/local/Ascend`. After completing CANN installation, configure the environment variables with the following commands: - ```bash - LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package - source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh - export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit - ``` +```bash +LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package +source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh +export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit +``` -- **vLLM Prerequisites Installation** +#### vLLM Prerequisites Installation For vLLM environment configuration and installation methods, please refer to the [vLLM Installation Guide](https://docs.vllm.ai/en/v0.9.1/getting_started/installation/cpu.html). In vllM installation, `gcc/g++ >= 12.3.0` is required, and it could be installed by the following command: - ```bash - yum install -y gcc gcc-c++ - ``` +```bash +yum install -y gcc gcc-c++ +``` + +#### vLLM MindSpore Installation + +vLLM MindSpore can be installed in the following two ways. **vLLM MindSpore One-click Installation** is suitable for scenarios where users need quick deployment and usage. **vLLM MindSpore Manual Installation** is suitable for scenarios where users require custom modifications to the components. + +- **vLLM MindSpore One-click Installation** + + To install vLLM MindSpore, user needs to pull the vLLM MindSpore source code and then runs the following command to install the dependencies: + + ```bash + git clone https://gitee.com/mindspore/vllm-mindspore.git + cd vllm-mindspore + bash install_depend_pkgs.sh + ``` + + Compile and install vLLM MindSpore: + + ```bash + pip install . + ``` + + After executing the above commands, `mindformers` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. 
Add this folder to the environment variables: + + ```bash + export PYTHONPATH=$MF_PATH:$PYTHONPATH + ``` + +- **vLLM MindSpore Manual Installation** + + If user need to modify the components or use other versions, components need to be manually installed in a specific order. Version compatibility of vLLM MindSpore can be found [Version Compatibility](#version-compatibility), abd vLLM MindSpore requires the following installation sequence: + + 1. Install vLLM + + ```bash + pip install /path/to/vllm-*.whl + ``` + + 2. Uninstall Torch-related components + + ```bash + pip uninstall torch torch-npu torchvision torchaudio -y + ``` + + 3. Install MindSpore + + ```bash + pip install /path/to/mindspore-*.whl + ``` + + 4. Clone the MindSpore Transformers repository and add it to `PYTHONPATH` + + ```bash + git clone https://gitee.com/mindspore/mindformers.git + export PYTHONPATH=$MF_PATH:$PYTHONPATH + ``` -- **vLLM MindSpore Installation** + 5. Install Golden Stick - To install vLLM MindSpore, user needs to pull the vLLM MindSpore source code and then runs the following command to install the dependencies: + ```bash + pip install /path/to/mindspore_gs-*.whl + ``` - ```bash - git clone https://gitee.com/mindspore/vllm-mindspore.git - cd vllm-mindspore - bash install_depend_pkgs.sh - ``` + 6. Install MSAdapter - Compile and install vLLM MindSpore: + ```bash + pip install /path/to/msadapter-*.whl + ``` - ```bash - pip install . - ``` + 7. Install vLLM MindSpore - After executing the above commands, `mindformers` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. Add this folder to the environment variables: + User needs to pull source of vLLM MindSpore, and run installation. - ```bash - export PYTHONPATH=$MF_PATH:$PYTHONPATH - ``` + ```bash + git clone https://gitee.com/mindspore/vllm-mindspore.git + cd vllm-mindspore + pip install . + ``` -### Quick Verification +## Quick Verification User can verify the installation with a simple offline inference test. First, user need to configure the environment variables with the following command: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. ``` diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md index c0eaf16c347299abc75285ffc1d57c50403d5fee..91a88e814de3e6d37ed6edb5b4a37615dc58ffac 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md @@ -131,18 +131,14 @@ git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct Before launching the model, user need to set the following environment variables: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation. 
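+# The $YAML_PATH in the next line is a placeholder; for the Qwen2.5-7B example in this guide it could be a hypothetical
+# local copy such as /path/to/mindformers/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml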
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. ``` Here is an explanation of these environment variables: -- `ASCEND_TOTAL_MEMORY_GB`: The memory size of each card. User can check the memory by using `npu-smi info`, where the value corresponds to `HBM-Usage(MB)` in the query results. - `vLLM_MODEL_BACKEND`: The backend of the model to run. User could find supported models and backends for vLLM MindSpore in the [Model Support List](../../user_guide/supported_models/models_list/models_list.md). -- `vLLM_MODEL_MEMORY_USE_GB`: The memory reserved for model loading. Adjust this value if insufficient memory error occurs during model loading. -- `MINDFORMERS_MODEL_CONFIG`: The model configuration file. +- `MINDFORMERS_MODEL_CONFIG`: The model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5). For Qwen2.5-7B, the YAML file is [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml). Additionally, users need to ensure that MindSpore Transformers is installed. Users can add it by running the following command: @@ -202,7 +198,7 @@ Use the model `Qwen/Qwen2.5-7B-Instruct` and start the vLLM service with the fol python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-7B-Instruct" ``` -If the service starts successfully, similar output will be obtained: +User can also set the local model path by `--model` argument. If the service starts successfully, similar output will be obtained: ```text INFO: Started server process [6363] @@ -224,6 +220,8 @@ Use the following command to send a request, where `prompt` is the model input: curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 15, "temperature": 0}' ``` +User needs to ensure that the `"model"` field matches the `--model` in the service startup, and the request can successfully match the model. + If the request is processed successfully, the following inference result will be returned: ```text diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md index 7a3a9fb4a83667a8d74de50b205d55a4a8a7a928..9c369446a24fd0f0cd955eeb0a369d4293266e07 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md @@ -52,8 +52,8 @@ Execute the following Python script to download the MindSpore-compatible DeepSee ```python from openmind_hub import snapshot_download -snapshot_download(repo_id="MindSpore-Lab/DeepSeek-R1-W8A8", - local_dir="/path/to/save/deepseek_r1_w8a8", +snapshot_download(repo_id="MindSpore-Lab/DeepSeek-R1-0528-A8W8", + local_dir="/path/to/save/deepseek_r1_0528_a8w8", local_dir_use_symlinks=False) ``` @@ -78,7 +78,7 @@ If the tool is unavailable, install [git-lfs](https://git-lfs.com) first. 
Refer Once confirmed, download the weights by executing the following command: ```shell -git clone https://modelers.cn/MindSpore-Lab/DeepSeek-R1-W8A8.git +git clone https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-0528-A8W8.git ``` ## TP16 Tensor Parallel Inference @@ -241,18 +241,20 @@ Execution example: ```bash # Master node: -vllm-mindspore serve --model="/path/to/save/deepseek_r1_w8a8" --trust-remote-code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 16 --distributed-executor-backend=ray +vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 16 --distributed-executor-backend=ray ``` -In tensor parallel scenarios, the `--tensor-parallel-size` parameter overrides the `model_parallel` configuration in the model YAML file. +In tensor parallel scenarios, the `--tensor-parallel-size` parameter overrides the `model_parallel` configuration in the model YAML file. User can also set the local model path by `--model` argument. #### Sending Requests Use the following command to send requests, where `prompt` is the model input: ```bash -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "/path/to/save/deepseek_r1_w8a8", "prompt": "I am", "max_tokens": 20, "temperature": 0, "top_p": 1.0, "top_k": 1, "repetition_penalty": 1.0}' -``` +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "MindSpore-Lab/DeepSeek-R1-0528-A8W8", "prompt": "I am", "max_tokens": 20, "temperature": 0, "top_p": 1.0, "top_k": 1, "repetition_penalty": 1.0}' +``` + +User needs to ensure that the `"model"` field matches the `--model` in the service startup, and the request can successfully match the model. ## Hybrid Parallel Inference @@ -301,6 +303,8 @@ parallel_config: ### Online Inference +#### Starting the Service + `vllm-mindspore` can deploy online inference using the OpenAI API protocol. Below is the workflow for launching the service: ```bash @@ -321,22 +325,24 @@ vllm-mindspore serve --data-parallel-address [Master node communication IP] --data-parallel-rpc-port [Master node communication port] --enable-expert-parallel # Enable expert parallelism -``` +``` -Execution example: +User can also set the local model path by `--model` argument. 
The following is an execution example: ```bash # Master node: -vllm-mindspore serve --model="/path/to/save/deepseek_r1_w8a8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 0 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel +vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 0 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel # Worker node: -vllm-mindspore serve --headless --model="/path/to/save/deepseek_r1_w8a8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel -``` +vllm-mindspore serve --headless --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel +``` -## Sending Requests +#### Sending Requests Use the following command to send requests, where `prompt` is the model input: ```bash -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "/path/to/save/deepseek_r1_w8a8", "prompt": "I am", "max_tokens": 20, "temperature": 0}' +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "MindSpore-Lab/DeepSeek-R1-0528-A8W8", "prompt": "I am", "max_tokens": 20, "temperature": 0}' ``` + +User needs to ensure that the `"model"` field matches the `--model` in the service startup, and the request can successfully match the model. diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 24d4d4a2cac790dc5e9e8f5d5145266b896d32f5..40ad4a1597b1a469f47fbdcd06d036219c27b71a 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md @@ -127,18 +127,14 @@ For [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct), the followi ```bash #set environment variables -export ASCEND_TOTAL_MEMORY_GB=64 # Use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore TransFormers as the model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Adjust based on the model's maximum usage, with the remaining allocated for KV cache. export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model YAML file. 
``` Here is an explanation of these environment variables: -- `ASCEND_TOTAL_MEMORY_GB`: The memory size of each compute card. Query using `npu-smi info`, corresponding to `HBM-Usage(MB)` in the results. - `vLLM_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md). -- `vLLM_MODEL_MEMORY_USE_GB`: Memory reserved for model loading. Adjust this if encountering insufficient memory. -- `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2_5). For Qwen2.5-32B, the YAML file is [predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml). +- `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5). For Qwen2.5-32B, the YAML file is [predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml). Users can check memory usage with `npu-smi info` and set the NPU cards for inference using the following example (assuming cards 4,5,6,7 are used): @@ -160,7 +156,7 @@ export MAX_MODEL_LEN=1024 python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-32B-Instruct" --trust_remote_code --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_MODEL_LEN ``` -Here, `TENSOR_PARALLEL_SIZE` specifies the number of NPU cards, and `MAX_MODEL_LEN` sets the maximum output token length. +Here, `TENSOR_PARALLEL_SIZE` specifies the number of NPU cards, and `MAX_MODEL_LEN` sets the maximum output token length. User can also set the local model path by `--model` argument. If the service starts successfully, similar output will be obtained: @@ -181,9 +177,11 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 Use the following command to send a request, where `prompt` is the model input: ```bash -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' ``` +User needs to ensure that the `"model"` field matches the `--model` in the service startup, and the request can successfully match the model. 
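+
+For example, if the service was launched with a hypothetical local weight path via `--model "/path/to/Qwen2.5-32B-Instruct"`, the request must use exactly the same string in its `"model"` field:
+
+```bash
+# Hypothetical local path; it must match the --model value used when starting the service.
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "/path/to/Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
+```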
+ If processed successfully, the inference result will be: ```text diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md index 2c360e28f8010720b10792648cd758d2fc54acab..79ed73f81296beb24f42b95e2723377dcd116040 100644 --- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md +++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md @@ -127,17 +127,13 @@ For [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), the following ```bash #set environment variables -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore TransFormers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. ``` Here is an explanation of these variables: -- `ASCEND_TOTAL_MEMORY_GB`: The memory size of each compute card. Query using `npu-smi info`, corresponding to `HBM-Usage(MB)` in the results. - `vLLM_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md). -- `vLLM_MODEL_MEMORY_USE_GB`: Memory reserved for model loading. Adjust this if encountering insufficient memory. - `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5). For Qwen2.5-7B, the YAML file is [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml). User can check memory usage with `npu-smi info` and set the compute card for inference using: @@ -196,7 +192,7 @@ Use the model `Qwen/Qwen2.5-7B-Instruct` and start the vLLM service with the fol python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-7B-Instruct" ``` -If the service starts successfully, similar output will be obtained: +User can also set the local model path by `--model` argument. If the service starts successfully, similar output will be obtained: ```text INFO: Started server process [6363] @@ -218,6 +214,8 @@ Use the following command to send a request, where `prompt` is the model input: curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 15, "temperature": 0}' ``` +User needs to ensure that the `"model"` field matches the `--model` in the service startup, and the request can successfully match the model. 
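+
+If users are unsure which model name the server has registered, they can first query the OpenAI-compatible model list endpoint (assuming the default port 8000 used in the examples above); the returned model `id` is the value to put in the `"model"` field:
+
+```bash
+curl http://localhost:8000/v1/models
+```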
+ If the request is processed successfully, the following inference result will be returned: ```text diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md b/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md index c1b71616260505fa7705916b0598ded2aa098674..036834798eb4e55ba82c5ff34667ff7e83b54e5f 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md @@ -11,6 +11,13 @@ | `HCCL_SOCKET_IFNAME` | Specifies the network interface name for inter-machine communication using HCCL. | String | Interface name (e.g., `enp189s0f0`). | Used in multi-machine scenarios. The interface name can be found via `ifconfig` by matching the IP address. | | `ASCEND_RT_VISIBLE_DEVICES` | Specifies which devices are visible to the current process, supporting one or multiple Device IDs. | String | Device IDs as a comma-separated string (e.g., `"0,1,2,3,4,5,6,7"`). | Recommended for Ray usage scenarios. | | `HCCL_BUFFSIZE` | Controls the buffer size for data sharing between two NPUs. | int | Buffer size in MB (e.g., `2048`). | Usage reference: [HCCL_BUFFSIZE](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1beta1/maintenref/envvar/envref_07_0080.html). Example: For DeepSeek hybrid parallelism (Data Parallel: 32, Expert Parallel: 32) with `max-num-batched-tokens=256`, set `export HCCL_BUFFSIZE=2048`. | -| MS_MEMPOOL_BLOCK_SIZE | Set the size of the memory pool block in PyNative mode for devices | String | String of positive number, and the unit is GB. | | -| vLLM_USE_NPU_ADV_STEP_FLASH_OP | Whether to use Ascend operation `adv_step_flash` | String | `on`: Use;`off`:Not use | If the variable is set to `off`, model will use the implement of small operations. | -| VLLM_TORCH_PROFILER_DIR | Enables profiling data collection and takes effect when a data save path is configured. | String | The path to save profiling data. | | +| `MS_MEMPOOL_BLOCK_SIZE` | Set the size of the memory pool block in PyNative mode for devices | String | String of positive number, and the unit is GB. | | +| `vLLM_USE_NPU_ADV_STEP_FLASH_OP` | Whether to use Ascend operation `adv_step_flash` | String | `on`: Use;`off`:Not use | If the variable is set to `off`, model will use the implement of small operations. | +| `VLLM_TORCH_PROFILER_DIR` | Enables profiling data collection and takes effect when a data save path is configured. | String | The path to save profiling data. 
| | + +More environment variable information can be referred in the following links: + +- [CANN Environment Variable List](https://www.hiascend.com/document/detail/en/CANNCommunityEdition/81RC1beta1/index/index.html) +- [MindSpore Environment Variable List](https://www.mindspore.cn/docs/en/master/api_python/env_var_list.html) +- [MindSpore Transformers Environment Variable List](https://www.mindspore.cn/mindformers/docs/en/master/index.html) +- [vLLM Environment Variable List](https://docs.vllm.ai/en/v0.8.4/serving/env_vars.html) diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md index b704a49f442c313bdc3b4a38029672b11ab39734..9ab03ff8ac88bd2612a18421d204a026c63597f5 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md @@ -9,9 +9,7 @@ The benchmark tool of vLLM MindSpore is inherited from vLLM. You can refer to th For single-card inference, we take [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example. You can prepare the environment by following the guide [Single-Card Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#online-inference), set the environment variables: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. ``` @@ -104,9 +102,7 @@ P99 ITL (ms): .... For offline performance benchmark, take [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example. Prepare the environment by following the guide [Single-Card Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#offline-inference). User need to set the environment variables: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. 
``` diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md index b24b541e59a4a4184de1e3619164b7951fcddd11..897f1ec0c848a9d7608a0100a6e4411989ffe5b8 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md @@ -40,7 +40,7 @@ curl -X POST http://127.0.0.1:8000/start_profile curl http://localhost:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ - "model": "/home/DeepSeekV3", + "model": "Qwen/Qwen2.5-32B-Instruct", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0 diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md index 401768ca1af91c1442bee6a3b3d8960921fcea92..fa1b8f89c339d94737f7a9a5329e25c587691e8d 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md @@ -16,7 +16,7 @@ We employ [MindSpore Golden Stick's PTQ algorithm](https://gitee.com/mindspore/g ### Downloading Quantized Weights -We have uploaded the quantized DeepSeek-R1 to [ModelArts Community](https://modelers.cn): [MindSpore-Lab/DeepSeek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-W8A8). Refer to the [ModelArts Community documentation](https://modelers.cn/docs/en/openmind-hub-client/0.9/basic_tutorial/download.html) to download the weights locally. +We have uploaded the quantized DeepSeek-R1 to [ModelArts Community](https://modelers.cn): [MindSpore-Lab/DeepSeek-R1-0528-A8W8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-0528-A8W8). Refer to the [ModelArts Community documentation](https://modelers.cn/docs/en/openmind-hub-client/0.9/basic_tutorial/download.html) to download the weights locally. ## Quantized Model Inference @@ -27,9 +27,7 @@ After obtaining the DeepSeek-R1 W8A8 weights, ensure they are stored in the rela Refer to the [Installation Guide](../../../getting_started/installation/installation.md) to set up the vLLM MindSpore environment. User need to set the following environment variables: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. 
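+# For the DeepSeek-R1 W8A8 weights, the $YAML_PATH above would typically be the W8A8 prediction config from
+# MindSpore Transformers, e.g. a hypothetical /path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml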
``` diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md index ba825bcae5a769b9217aae4ba8fb808e819b4f85..3d9b49bd6e3f3bd9a9f9bb1cd614cc201c8d1fa9 100644 --- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md +++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md @@ -6,7 +6,7 @@ |-------| --------- | ---- | | DeepSeek-V3 | Supported | [DeepSeek-V3](https://modelers.cn/models/MindSpore-Lab/DeepSeek-V3) | | DeepSeek-R1 | Supported | [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-V3) | -| DeepSeek-R1 W8A8 | Supported | [Deepseek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-r1-w8a8) | +| DeepSeek-R1 W8A8 | Supported | [Deepseek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-0528-A8W8) | | Qwen2.5 | Supported | [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct), [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) | | Qwen3-32B | Supported | [Qwen3-32B](https://modelers.cn/models/MindSpore-Lab/Qwen3-32B) | | Qwen3-235B-A22B | Supported | [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) | diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md index 8121916024cadc82ada5d9bbcfe0b86451ada08b..67269d941984af0fea94ca8b829ff08c93fe5f50 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md @@ -13,19 +13,19 @@ - Python:3.9 / 3.10 / 3.11 - 软件版本配套 - | 软件 | 版本 | 对应分支 | - | ----- | ----- | ----- | - |[CANN](https://www.hiascend.com/developer/download/community/result?module=cann) | 8.1 | - | - |[MindSpore](https://www.mindspore.cn/install/) | 2.7 | master | - |[MSAdapter](https://git.openi.org.cn/OpenI/MSAdapter)| 0.2 | master | - |[MindSpore Transformers](https://gitee.com/mindspore/mindformers)|1.6 | dev | - |[Golden Stick](https://gitee.com/mindspore/golden-stick)|1.1.0 | r1.1.0 | - |[vLLM](https://github.com/vllm-project/vllm) | 0.9.1 | v0.9.1 | - |[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.3 | master | + | 软件 | 配套版本与下载链接 | + | ----- | ----- | + |[CANN](https://www.hiascend.com/developer/download/community/result?module=cann) | [8.1.RC1](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Debian&Software=cannToolKit) | + |[MindSpore](https://www.mindspore.cn/install/) | [2.7.0](https://repo.mindspore.cn/mindspore/mindspore/version/202508/20250814/master_20250814091143_7548abc43af03319bfa528fc96d0ccd3917fcc9c_newest/unified/) | + |[MSAdapter](https://git.openi.org.cn/OpenI/MSAdapter)| [0.5.0](https://repo.mindspore.cn/mindspore/msadapter/version/202508/20250814/master_20250814010018_4615051c43eef898b6bbdc69768656493b5932f8_newest/any/) | + |[MindSpore 
Transformers](https://gitee.com/mindspore/mindformers)| [1.6.0](https://gitee.com/mindspore/mindformers) | + |[Golden Stick](https://gitee.com/mindspore/golden-stick)| [1.2.0](https://repo.mindspore.cn/mindspore/golden-stick/version/202508/20250814/master_20250814010017_2713821db982330b3bcd6d84d85a3b337d555f27_newest/any/) | + |[vLLM](https://github.com/vllm-project/vllm) | [0.9.1](https://repo.mindspore.cn/mirrors/vllm/version/202505/20250514/v0.8.4.dev0_newest/any/) | + |[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | [0.3.0](https://gitee.com/mindspore/vllm-mindspore/) | ## 配置环境 -在本章节中,我们将介绍[docker安装](#docker安装)、[pip安装](#pip安装)、[源码安装](#源码安装)三种安装方式,以及[快速验证](#快速验证)用例,用于验证安装是否成功。 +在本章节中,我们将介绍[docker安装](#docker安装)、[源码安装](#源码安装)两种安装方式,以及[快速验证](#快速验证)用例,用于验证安装是否成功。 ### docker安装 @@ -105,29 +105,33 @@ docker exec -it $DOCKER_NAME bash ### 源码安装 -- **CANN安装** +#### CANN安装 - CANN安装方法与环境配套,请参考[CANN社区版软件安装](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit),若用户在安装CANN过程中遇到问题,可参考[昇腾常见问题](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html)进行解决。 +CANN安装方法与环境配套,请参考[CANN社区版软件安装](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit),若用户在安装CANN过程中遇到问题,可参考[昇腾常见问题](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html)进行解决。 - CANN默认安装路径为`/usr/local/Ascend`。用户在安装CANN完毕后,使用如下命令,为CANN配置环境变量: +CANN默认安装路径为`/usr/local/Ascend`。用户在安装CANN完毕后,使用如下命令,为CANN配置环境变量: - ```bash - LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package - source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh - export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit - ``` +```bash +LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package +source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh +export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit +``` -- **vLLM前置依赖安装** +#### vLLM前置依赖安装 vLLM的环境配置与安装方法,请参考[vLLM安装教程](https://docs.vllm.ai/en/v0.9.1/getting_started/installation/cpu.html)。其依赖`gcc/g++ >= 12.3.0`版本,可通过以下命令完成安装: - ```bash - yum install -y gcc gcc-c++ - ``` +```bash +yum install -y gcc gcc-c++ +``` + +#### vLLM MindSpore安装 -- **vLLM MindSpore安装** +vLLM MindSpore有以下两种安装方式。**vLLM MindSpore一键式安装**适用于用户快速使用与部署的场景。**vLLM MindSpore手动安装**适用于用户对组件有自定义修改的场景。 - 安装vLLM MindSpore,需要在拉取vLLM MindSpore源码后,执行以下命令,安装依赖包: +- **vLLM MindSpore一键式安装** + + 采用一键式安装脚本来安装vLLM MindSpore,需要在拉取vLLM MindSpore源码后,执行以下命令,安装依赖包: ```bash git clone https://gitee.com/mindspore/vllm-mindspore.git @@ -147,14 +151,63 @@ docker exec -it $DOCKER_NAME bash export PYTHONPATH=$MF_PATH:$PYTHONPATH ``` -### 快速验证 +- **vLLM MindSpore手动安装** + + 若用户对组件有修改,或者需使用其他版本,则用户需要按照特定顺序,手动安装组件。vLLM MindSpore软件配套下载地址可以参考[版本配套](#版本配套),且对组件的安装顺序要求如下: + + 1. 安装vLLM + + ```bash + pip install /path/to/vllm-*.whl + ``` + + 2. 卸载torch相关组件 + + ```bash + pip uninstall torch torch-npu torchvision torchaudio -y + ``` + + 3. 安装MindSpore + + ```bash + pip install /path/to/mindspore-*.whl + ``` + + 4. 引入MindSpore Transformers仓,加入到`PYTHONPATH`中 + + ```bash + git clone https://gitee.com/mindspore/mindformers.git + export PYTHONPATH=$MF_PATH:$PYTHONPATH + ``` + + 5. 安装Golden Stick + + ```bash + pip install /path/to/mindspore_gs-*.whl + ``` + + 6. 安装MSAdapter + + ```bash + pip install /path/to/msadapter-*.whl + ``` + + 7. 
安装vLLM MindSpore + + 需要先拉取vLLM MindSpore源码,再执行安装 + + ```bash + git clone https://gitee.com/mindspore/vllm-mindspore.git + cd vllm-mindspore + pip install . + ``` + +## 快速验证 用户可以创建一个简单的离线推理场景,验证安装是否成功。下面以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) 为例。首先用户需要执行以下命令,设置环境变量: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. ``` diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md index 2bf629908a752b2c386f752e33355339b9ec6069..addd3951d0d120bf3d7a2e3e237b20ddad6a32d4 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md @@ -131,18 +131,14 @@ git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct 用户在拉起模型前,需设置以下环境变量: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. ``` 以下是对上述环境变量的解释: -- `ASCEND_TOTAL_MEMORY_GB`: 每一张计算卡的显存大小。用户可使用`npu-smi info`命令进行查询,该值对应查询结果中的`HBM-Usage(MB)`; - `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM MindSpore所支持的模型与模型后端,可在[模型支持列表](../../user_guide/supported_models/models_list/models_list.md)中进行查询; -- `vLLM_MODEL_MEMORY_USE_GB`:模型加载时所用空间,根据用户所使用的模型进行设置。若用户在模型加载过程中遇到显存不足时,可适当增大该值并重试; -- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。 +- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-7B为例,则其yaml文件为[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml)。 另外,用户需要确保MindSpore Transformers已安装。用户可通过 @@ -202,7 +198,7 @@ vLLM MindSpore可使用OpenAI的API协议,进行在线推理部署。以下是 python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-7B-Instruct" ``` -若服务成功拉起,则可以获得类似的执行结果: +用户可以通过`--model`参数,指定模型保存的本地路径。若服务成功拉起,则可以获得类似的执行结果: ```text INFO: Started server process [6363] @@ -224,7 +220,7 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' ``` -若请求处理成功,将获得以下的推理结果: +其中,用户需确认`"model"`字段与启动服务中`--model`一致,请求才能成功匹配到模型。若请求处理成功,将获得以下推理结果: ```text { diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md index 047e7b4aad2f9239abc18c111f8f576867c3a8be..813a0dd588763cd4c46b31331bec7e88edd158c5 100644 --- 
a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md @@ -94,8 +94,8 @@ docker exec -it $DOCKER_NAME bash ```python from openmind_hub import snapshot_download -snapshot_download(repo_id="MindSpore-Lab/DeepSeek-R1-W8A8", - local_dir="/path/to/save/deepseek_r1_w8a8", +snapshot_download(repo_id="MindSpore-Lab/DeepSeek-R1-0528-A8W8", + local_dir="/path/to/save/deepseek_r1_0528_a8w8", local_dir_use_symlinks=False) ``` @@ -120,7 +120,7 @@ Git LFS initialized. 工具确认可用后,执行以下命令,下载权重: ```shell -git clone https://modelers.cn/MindSpore-Lab/DeepSeek-R1-W8A8.git +git clone https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-0528-A8W8.git ``` ## TP16 张量并行推理 @@ -284,19 +284,21 @@ vllm-mindspore serve ```bash # 主节点: -vllm-mindspore serve --model="/path/to/save/deepseek_r1_w8a8" --trust-remote-code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 16 --distributed-executor-backend=ray +vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 16 --distributed-executor-backend=ray ``` -张量并行场景下,`--tensor-parallel-size`参数会覆盖模型yaml文件中`parallel_config`的`model_parallel`配置。 +张量并行场景下,`--tensor-parallel-size`参数会覆盖模型yaml文件中`parallel_config`的`model_parallel`配置。用户可以通过`--model`参数,指定模型保存的本地路径。 #### 发起请求 使用如下命令发送请求。其中`prompt`字段为模型输入: ```bash -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "/path/to/save/deepseek_r1_w8a8", "prompt": "I am", "max_tokens": 20, "temperature": 0, "top_p": 1.0, "top_k": 1, "repetition_penalty": 1.0}' +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "MindSpore-Lab/DeepSeek-R1-0528-A8W8", "prompt": "I am", "max_tokens": 20, "temperature": 0, "top_p": 1.0, "top_k": 1, "repetition_penalty": 1.0}' ``` +用户需确认`"model"`字段与启动服务中`--model`一致,请求才能成功匹配到模型。 + ## 混合并行推理 vLLM 通过 Ray 对多个节点资源进行管理和运行。该样例对应以下并行策略场景: @@ -344,6 +346,8 @@ parallel_config: ### 在线推理 +#### 启动服务 + `vllm-mindspore`可使用OpenAI的API协议部署在线推理。以下是在线推理的拉起流程: ```bash @@ -366,20 +370,22 @@ vllm-mindspore serve --enable-expert-parallel # 使能专家并行 ``` -执行示例: +用户可以通过`--model`参数,指定模型保存的本地路径。以下为执行示例: ```bash # 主节点: -vllm-mindspore serve --model="/path/to/save/deepseek_r1_w8a8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 0 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel +vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 0 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel # 从节点: -vllm-mindspore serve --headless --model="/path/to/save/deepseek_r1_w8a8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 
--tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel +vllm-mindspore serve --headless --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel ``` -## 发送请求 +#### 发送请求 使用如下命令发送请求。其中`prompt`字段为模型输入: ```bash -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "/path/to/save/deepseek_r1_w8a8", "prompt": "I am, "max_tokens": 120, "temperature": 0}' +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "MindSpore-Lab/DeepSeek-R1-0528-A8W8", "prompt": "I am, "max_tokens": 120, "temperature": 0}' ``` + +用户需确认`"model"`字段与启动服务中`--model`一致,请求才能成功匹配到模型。 diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md index 811009f6471bea2dad284efdfad79184b3fed3a8..f0901e2e0f9c461a047645f91b508d6278605e03 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md @@ -128,18 +128,14 @@ git clone https://huggingface.co/Qwen/Qwen2.5-32B-Instruct ```bash #set environment variables -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore TransFormers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. 
``` 以下是对上述环境变量的解释: -- `ASCEND_TOTAL_MEMORY_GB`: 每一张计算卡的显存大小。用户可使用`npu-smi info`命令进行查询,该值对应查询结果中的`HBM-Usage(MB)`。 - `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM MindSpore所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。 -- `vLLM_MODEL_MEMORY_USE_GB`:模型加载时所用空间,根据用户所使用的模型进行设置。若用户在模型加载过程中遇到显存不足时,可适当增大该值并重试。 -- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-32B为例,则其yaml文件为[predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml) 。 +- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-32B为例,则其yaml文件为[predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml) 。 用户可通过`npu-smi info`查看显存占用情况,并可以使用如下环境变量,设置用于推理的计算卡。以下例子为假设用户使用4,5,6,7卡进行推理: @@ -163,7 +159,7 @@ python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model 其中,`TENSOR_PARALLEL_SIZE`为用户指定的卡数,`MAX_MODEL_LEN`为模型最大输出token数。 -若服务成功拉起,则可以获得类似的执行结果: +用户可以通过`--model`参数,指定模型保存的本地路径。若服务成功拉起,则可以获得类似的执行结果: ```text INFO: Started server process [6363] @@ -182,10 +178,10 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 使用如下命令发送请求。其中`prompt`字段为模型输入: ```bash -curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' +curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' ``` -若请求处理成功,将获得以下的推理结果: +其中,用户需确认`"model"`字段与启动服务中`--model`一致,请求才能成功匹配到模型。若请求处理成功,将获得以下推理结果: ```text { diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md index ffc82071b2e9a98e315268756ad2a48ec9e12246..c7d8426f6dc0c258addf291672a440d3fe93af07 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md @@ -128,18 +128,14 @@ git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct ```bash #set environment variables -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore TransFormers as model backend. -export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file. 
``` 以下是对上述环境变量的解释: -- `ASCEND_TOTAL_MEMORY_GB`: 每一张计算卡的显存大小。用户可使用`npu-smi info`命令进行查询,该值对应查询结果中的`HBM-Usage(MB)`; - `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM MindSpore所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询; -- `vLLM_MODEL_MEMORY_USE_GB`:模型加载时所用空间,根据用户所使用的模型进行设置。若用户在模型加载过程中遇到显存不足时,可适当增大该值并重试; -- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-7B为例,则其yaml文件为[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml) 。 +- `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-7B为例,则其yaml文件为[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml) 。 用户可通过`npu-smi info`查看显存占用情况,并可以使用如下环境变量,设置用于推理的计算卡: @@ -198,7 +194,7 @@ vLLM MindSpore可使用OpenAI的API协议,部署为在线推理。以下是以 python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-7B-Instruct" ``` -若服务成功拉起,则可以获得类似的执行结果: +用户可以通过`--model`参数,指定模型保存的本地路径。若服务成功拉起,则可以获得类似的执行结果: ```text INFO: Started server process [6363] @@ -220,7 +216,7 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0 curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}' ``` -若请求处理成功,将获得以下的推理结果: +其中,用户需确认`"model"`字段与启动服务中`--model`一致,请求才能成功匹配到模型。若请求处理成功,将获得以下推理结果: ```text { diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md index 7fd53b3ff3ee7ca8084c66a5942184e2a1fdff73..b5e2aefd2d27953a0ec22dc8cd05a96117ae8bdd 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md @@ -4,13 +4,20 @@ | 环境变量 | 功能 | 类型 | 取值 | 说明 | | ------ | ------- | ------ | ------ | ------ | -| vLLM_MODEL_BACKEND | 用于指定模型后端。使用vLLM MindSpore原生模型后端时无需指定;使用模型为vLLM MindSpore外部后端时则需要指定。 | String | `MindFormers`: 模型后端为MindSpore Transformers。 | 原生模型后端当前支持Qwen2.5系列;MindSpore Transformers模型后端支持Qwen系列、DeepSeek、Llama系列模型,使用时需配置环境变量:`export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH`。 | -| MINDFORMERS_MODEL_CONFIG | MindSpore Transformers模型的配置文件。使用Qwen2.5系列、DeepSeek系列模型时,需要配置文件路径。 | String | 模型配置文件路径。 | **该环境变量在后续版本会被移除。** 样例:`export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml`。 | -| GLOO_SOCKET_IFNAME | 用于多机之间使用gloo通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 | -| TP_SOCKET_IFNAME | 用于多机之间使用TP通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 | -| HCCL_SOCKET_IFNAME | 用于多机之间使用HCCL通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 | -| ASCEND_RT_VISIBLE_DEVICES | 指定哪些Device对当前进程可见,支持一次指定一个或多个Device ID。 | String | 为Device ID,逗号分割的字符串,例如"0,1,2,3,4,5,6,7"。 | ray使用场景建议使用。 | -| HCCL_BUFFSIZE | 此环境变量用于控制两个NPU之间共享数据的缓存区大小。 | int | 缓存区大小,大小为MB。例如:`2048`。 | 
使用方法参考:[HCCL_BUFFSIZE](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1beta1/maintenref/envvar/envref_07_0080.html)。例如DeepSeek 混合并行(数据并行数为32,专家并行数为32),且`max-num-batched-tokens`为256时,则`export HCCL_BUFFSIZE=2048`。 | -| MS_MEMPOOL_BLOCK_SIZE | 设置PyNative模式下设备内存池的块大小。 | String | 正整数string,单位为GB。 | | -| vLLM_USE_NPU_ADV_STEP_FLASH_OP | 是否使用昇腾`adv_step_flash`算子。 | String | `on`: 使用;`off`:不使用 | 取值为`off`时,将使用小算子实现替代`adv_step_flash`算子。 | -| VLLM_TORCH_PROFILER_DIR | 开启profiling采集数据,当配置了采集数据保存路径后生效 | String | Profiling数据保存路径。| | +| `vLLM_MODEL_BACKEND` | 用于指定模型后端。使用vLLM MindSpore原生模型后端时无需指定;使用模型为vLLM MindSpore外部后端时则需要指定。 | String | `MindFormers`: 模型后端为MindSpore Transformers。 | 原生模型后端当前支持Qwen2.5系列;MindSpore Transformers模型后端支持Qwen系列、DeepSeek、Llama系列模型,使用时需配置环境变量:`export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH`。 | +| `MINDFORMERS_MODEL_CONFIG` | MindSpore Transformers模型的配置文件。使用Qwen2.5系列、DeepSeek系列模型时,需要配置文件路径。 | String | 模型配置文件路径。 | **该环境变量在后续版本会被移除。** 样例:`export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml`。 | +| `GLOO_SOCKET_IFNAME` | 用于多机之间使用gloo通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 | +| `TP_SOCKET_IFNAME` | 用于多机之间使用TP通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 | +| `HCCL_SOCKET_IFNAME` | 用于多机之间使用HCCL通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 | +| `ASCEND_RT_VISIBLE_DEVICES` | 指定哪些Device对当前进程可见,支持一次指定一个或多个Device ID。 | String | 为Device ID,逗号分割的字符串,例如"0,1,2,3,4,5,6,7"。 | ray使用场景建议使用。 | +| `HCCL_BUFFSIZE` | 此环境变量用于控制两个NPU之间共享数据的缓存区大小。 | Integer | 缓存区大小,大小为MB。例如:`2048`。 | 使用方法参考:[HCCL_BUFFSIZE](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1beta1/maintenref/envvar/envref_07_0080.html)。例如DeepSeek 混合并行(数据并行数为32,专家并行数为32),且`max-num-batched-tokens`为256时,则`export HCCL_BUFFSIZE=2048`。 | +| `MS_MEMPOOL_BLOCK_SIZE` | 设置PyNative模式下设备内存池的块大小。 | String | 正整数string,单位为GB。 | | +| `vLLM_USE_NPU_ADV_STEP_FLASH_OP` | 是否使用昇腾`adv_step_flash`算子。 | String | `on`: 使用;`off`:不使用 | 取值为`off`时,将使用小算子实现替代`adv_step_flash`算子。 | +| `VLLM_TORCH_PROFILER_DIR` | 开启profiling采集数据,当配置了采集数据保存路径后生效 | String | Profiling数据保存路径。| | + +更多的环境变量信息,请查看: + +- [CANN 环境变量列表](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1beta1/index/index.html) +- [MindSpore 环境变量列表](https://www.mindspore.cn/docs/zh-CN/master/api_python/env_var_list.html) +- [MindSpore Transformers 环境变量列表](https://www.mindspore.cn/mindformers/docs/zh-CN/master/index.html) +- [vLLM 环境变量列表](https://docs.vllm.ai/en/v0.8.4/serving/env_vars.html) diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md index 15f104069996c5c0c4e32fbfdf909a2269f4074a..6d28a6ff7c60cb8c1677aeb7fe116534da81802a 100644 --- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md +++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md @@ -9,9 +9,7 @@ vLLM MindSpore的性能测试能力,继承自vLLM所提供的性能测试能 若用户使用单卡推理,以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)为例,可按照文档[单卡推理(Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#在线推理)进行环境准备,设置以下环境变量: ```bash -export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory. export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend. 
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
index 15f104069996c5c0c4e32fbfdf909a2269f4074a..6d28a6ff7c60cb8c1677aeb7fe116534da81802a 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
@@ -9,9 +9,7 @@ The performance testing capability of vLLM MindSpore is inherited from vLLM
 For single-card inference, taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, users can prepare the environment following [Single-Card Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#在线推理) and set the following environment variables:

 ```bash
-export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
 export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
-export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```

@@ -104,9 +102,7 @@ P99 ITL (ms): ....
 For offline performance testing, taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, users can prepare the environment following [Single-Card Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#离线推理) and set the following environment variables:

 ```bash
-export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
 export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
-export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/profiling/profiling.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/profiling/profiling.md
index 4dc4f2ccee29c5c3d3842ba62e5381c4e01834d4..eb90283188cdae258793d3614d88431cd2063f0b 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/profiling/profiling.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/profiling/profiling.md
@@ -40,7 +40,7 @@ curl -X POST http://127.0.0.1:8000/start_profile
 curl http://localhost:8000/v1/completions \
 -H "Content-Type: application/json" \
 -d '{
-    "model": "/home/DeepSeekV3",
+    "model": "Qwen/Qwen2.5-32B-Instruct",
     "prompt": "San Francisco is a",
     "max_tokens": 7,
     "temperature": 0
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md
index 54ad35032dbdcb594ef4ae2b847beb5d41f701fe..22a83475eff16c8a311a7d7c6a12cc62dcb8f483 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md
@@ -16,7 +16,7 @@
 ### Downloading Quantized Weights Directly

-The quantized DeepSeek-R1 has been uploaded to the [Modelers community](https://modelers.cn): [MindSpore-Lab/DeepSeek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-W8A8). Refer to the [Modelers community documentation](https://modelers.cn/docs/zh/openmind-hub-client/0.9/basic_tutorial/download.html) to download the weights to a local directory.
+The quantized DeepSeek-R1 has been uploaded to the [Modelers community](https://modelers.cn): [MindSpore-Lab/DeepSeek-R1-0528-A8W8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-0528-A8W8). Refer to the [Modelers community documentation](https://modelers.cn/docs/zh/openmind-hub-client/0.9/basic_tutorial/download.html) to download the weights to a local directory.

 ## Quantized Model Inference

@@ -27,9 +27,7 @@
 Users can set up the vLLM MindSpore environment by referring to the [Installation Guide](../../../getting_started/installation/installation.md) and need to set the following environment variables:

 ```bash
-export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
 export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
-export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
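Assuming the quantized weights above have been downloaded to a local directory (the path below is a placeholder), serving them follows the same pattern as the earlier examples: point `--model` at the weight directory and keep the request's `"model"` field identical. This is only a sketch; a checkpoint of this size additionally needs the multi-card parallel setup described in its YAML configuration, which is not shown here:

```bash
export vLLM_MODEL_BACKEND=MindFormers
export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml

# Start the service from the locally downloaded quantized weights (placeholder path).
python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model /path/to/DeepSeek-R1-W8A8

# Query it with a "model" field that matches the --model argument.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "/path/to/DeepSeek-R1-W8A8", "prompt": "San Francisco is a", "max_tokens": 7, "temperature": 0}'
```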
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md
index 2e504c0fecd265842d02a1474e0f77fdfe151eac..c64725c9e1e448999d189ec22aeab0942f012299 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md
@@ -6,7 +6,7 @@
 |-------| --------- | ---- |
 | DeepSeek-V3 | Supported | [DeepSeek-V3](https://modelers.cn/models/MindSpore-Lab/DeepSeek-V3) |
 | DeepSeek-R1 | Supported | [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-V3) |
-| DeepSeek-R1 W8A8 | Supported | [Deepseek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-r1-w8a8) |
+| DeepSeek-R1 W8A8 | Supported | [Deepseek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1-0528-A8W8) |
 | Qwen2.5 | Supported | [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct), [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) |
 | Qwen3-32B | Supported | [Qwen3-32B](https://modelers.cn/models/MindSpore-Lab/Qwen3-32B) |
 | Qwen3-235B-A22B | Supported | [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) |