diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
index 20315c3d3b31d0fb75f9929027442d40420dce3e..a3b73899ab6f471eb4e8d247832a034e3e494151 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
@@ -203,7 +203,7 @@ vLLM-MindSpore Plugin can be installed in the following two ways. **vLLM-MindSpo
 User can verify the installation with a simple offline inference test. First, user need to configure the environment variables with the following command:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
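For readers applying the rename by hand, a minimal sketch of the verification shell setup after this change follows. The MindFormers checkout path and the YAML path are placeholders (assumptions for illustration, not values taken from this patch):

```bash
# Assumed placeholder paths -- substitute your own MindFormers checkout and model YAML.
export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH
export VLLM_MS_MODEL_BACKEND=MindFormers   # renamed from vLLM_MODEL_BACKEND by this patch
export MINDFORMERS_MODEL_CONFIG=/path/to/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml

# Sanity check: only the new variable name should appear in the environment.
env | grep -E '(vLLM|VLLM_MS)_MODEL_BACKEND'
```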
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index 9080b67487908939da1d82b6a212341c274042b7..602c3e2decb9c71780689e78382bea5927150c7d 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -101,7 +101,7 @@ export MS_ENABLE_LCCL=off
 export HCCL_OP_EXPANSION_MODE=AIV
 export MS_ALLOC_CONF=enable_vmm:true
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-export vLLM_MODEL_BACKEND=MindFormers
+export VLLM_MS_MODEL_BACKEND=MindFormers
 export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml
 ```
@@ -114,7 +114,7 @@ Environment variable descriptions:
 - `HCCL_OP_EXPANSION_MODE`: Configure the communication algorithm expansion location to the AI Vector Core (AIV) computing unit on the device side.
 - `MS_ALLOC_CONF`: Set the memory policy. Refer to the [MindSpore documentation](https://www.mindspore.cn/docs/en/master/api_python/env_var_list.html).
 - `ASCEND_RT_VISIBLE_DEVICES`: Configure the available device IDs for each node. Use the `npu-smi info` command to check.
-- `vLLM_MODEL_BACKEND`: The backend of the model to run. Currently supported models and backends for vLLM-MindSpore Plugin can be found in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
+- `VLLM_MS_MODEL_BACKEND`: The backend of the model to run. Currently supported models and backends for vLLM-MindSpore Plugin can be found in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
 - `MINDFORMERS_MODEL_CONFIG`: Model configuration file. Users can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/deepseek3/deepseek_r1_671b), such as [predict_deepseek_r1_671b_w8a8.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml). The model parallel strategy is specified in the `parallel_config` of the configuration file.
 
 For example, the TP16 tensor parallel configuration is as follows:
@@ -275,7 +275,7 @@ export MS_ENABLE_LCCL=off
 export HCCL_OP_EXPANSION_MODE=AIV
 export MS_ALLOC_CONF=enable_vmm:true
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-export vLLM_MODEL_BACKEND=MindFormers
+export VLLM_MS_MODEL_BACKEND=MindFormers
 export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml
 ```
@@ -285,7 +285,7 @@ Environment variable descriptions:
 - `HCCL_OP_EXPANSION_MODE`: Configure the communication algorithm expansion location to the AI Vector Core (AIV) computing unit on the device side.
 - `MS_ALLOC_CONF`: Set the memory policy. Refer to the [MindSpore documentation](https://www.mindspore.cn/docs/en/master/api_python/env_var_list.html).
 - `ASCEND_RT_VISIBLE_DEVICES`: Configure the available device IDs for each node. Use the `npu-smi info` command to check.
-- `vLLM_MODEL_BACKEND`: The backend of the model to run. Currently supported models and backends for vLLM-MindSpore Plugin can be found in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
+- `VLLM_MS_MODEL_BACKEND`: The backend of the model to run. Currently supported models and backends for vLLM-MindSpore Plugin can be found in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
 - `MINDFORMERS_MODEL_CONFIG`: Model configuration file. Users can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/deepseek3/deepseek_r1_671b), such as [predict_deepseek_r1_671b_w8a8_ep4tp4.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml). The model parallel strategy is specified in the `parallel_config` of the configuration file.
 
 For example, the hybrid parallel configuration is as follows:
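The hunks above end just before the YAML excerpts they introduce. To confirm which parallel strategy a given configuration file actually encodes, one can inspect its `parallel_config` block directly; a minimal sketch, reusing the placeholder path from this patch:

```bash
# Show the parallel strategy block of the configured MindSpore Transformers YAML.
export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml
grep -n -A 8 'parallel_config' "$MINDFORMERS_MODEL_CONFIG"
```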
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 5976859caae0ac95456128fab31c35e8d38b832e..5f1c6261931f54944e354f91bbfd5311c7548a23 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -127,13 +127,13 @@ For [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct), the followi
 
 ```bash
 #set environment variables
-export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore TransFormers as the model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model YAML file.
 ```
 
 Here is an explanation of these environment variables:
 
-- `vLLM_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
+- `VLLM_MS_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
 - `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5). For Qwen2.5-32B, the YAML file is [predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml).
 
 Users can check memory usage with `npu-smi info` and set the NPU cards for inference using the following example (assuming cards 4,5,6,7 are used):
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index ce7e679dcfc58e045b865c01dc0f6066968cb768..f8e86baa792d09225cef2b9db8b342272f51c3b0 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -127,13 +127,13 @@ For [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), the following
 
 ```bash
 #set environment variables
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore TransFormers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
 
 Here is an explanation of these variables:
 
-- `vLLM_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
+- `VLLM_MS_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
 - `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5). For Qwen2.5-7B, the YAML file is [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml).
 
 User can check memory usage with `npu-smi info` and set the compute card for inference using:
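Both tutorial hunks stop right before the device-selection snippet they announce. A plausible sketch of that step, assuming the card IDs named in the 32B tutorial and a single free card for the 7B case:

```bash
# Inspect NPU occupancy first, then expose only the chosen cards to inference.
npu-smi info
export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7   # multi-card case (Qwen2.5-32B)
# Single-card case (Qwen2.5-7B): expose one free device instead.
# export ASCEND_RT_VISIBLE_DEVICES=0
```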
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md b/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md
index 054328a0ff81c21355adebd30230661fb5ef4127..268e1497f14c5c2bdd815d9486b695d26cbd9e6f 100644
--- a/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md
@@ -4,7 +4,7 @@
 | Environment Variable | Function | Type | Values | Description |
 |----------------------|----------|------|--------|-------------|
-| `vLLM_MODEL_BACKEND` | Specifies the model backend. Not Required when using vLLM-MindSpore Plugin native models, and required when using an external vLLM-MindSpore Plugin models. | String | `MindFormers`: Model source is MindSpore Transformers. | vLLM-MindSpore Plugin native model backend supports Qwen2.5 series. MindSpore Transformers model backend supports Qwen/DeepSeek/Llama series models, and the environment variable: `export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH` needs to be set. |
+| `VLLM_MS_MODEL_BACKEND` | Used to specify the model backend. If this variable is not set, the backend will be automatically selected in the priority order: MindFormers > Native > MindONE; if set, the specified backend will be used. | String | `MindFormers`: Model backend is MindSpore Transformers. `Native`: Model backend is Native. `MindONE`: Model backend is MindONE. | The native model backend currently supports the Qwen2.5, Qwen2.5VL, Qwen3 and Llama series; the MindSpore Transformers backend supports Qwen, DeepSeek and TeleChat models. When using MindSpore Transformers, set the environment variable: `export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH`. |
 | `MINDFORMERS_MODEL_CONFIG` | Configuration file for MindSpore Transformers models. Required for Qwen2.5 series or DeepSeek series models. | String | Path to the model configuration file | **This environment variable will be removed in future versions.** Example: `export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml`. |
 | `GLOO_SOCKET_IFNAME` | Specifies the network interface name for inter-machine communication using gloo. | String | Interface name (e.g., `enp189s0f0`). | Used in multi-machine scenarios. The interface name can be found via `ifconfig` by matching the IP address. |
 | `TP_SOCKET_IFNAME` | Specifies the network interface name for inter-machine communication using TP. | String | Interface name (e.g., `enp189s0f0`). | Used in multi-machine scenarios. The interface name can be found via `ifconfig` by matching the IP address. |
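The rewritten table row describes two usage modes; a short sketch of both, with the MindFormers path as a placeholder:

```bash
# Mode 1: pin the backend explicitly. The MindFormers backend also needs the
# repository on PYTHONPATH, as the description column notes.
export VLLM_MS_MODEL_BACKEND=MindFormers
export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH

# Mode 2: leave the variable unset; the plugin then probes backends in
# priority order MindFormers > Native > MindONE.
unset VLLM_MS_MODEL_BACKEND
```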
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
index e72a1fee5400497cfac63acb948619f3d4a76daa..16400d1097248adbc7b5d489da81bca95370d05f 100644
--- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
@@ -9,7 +9,7 @@ The benchmark tool of vLLM-MindSpore Plugin is inherited from vLLM. You can refe
 For single-card inference, we take [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example. You can prepare the environment by following the guide [Single-Card Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#online-inference), set the environment variables:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
@@ -102,7 +102,7 @@ P99 ITL (ms): ....
 For offline performance benchmark, take [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example. Prepare the environment by following the guide [Single-Card Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#offline-inference). User need to set the environment variables:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md
index fb135bfe73a1767dca700fdddbf8c22fe86e6c9a..1b0c6aaea6a9f2535b15797455f37ded267a3464 100644
--- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md
@@ -27,7 +27,7 @@ After obtaining the DeepSeek-R1 W8A8 weights, ensure they are stored in the rela
 Refer to the [Installation Guide](../../../getting_started/installation/installation.md) to set up the vLLM-MindSpore Plugin environment. User need to set the following environment variables:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md
index 3d9b49bd6e3f3bd9a9f9bb1cd614cc201c8d1fa9..712eb68a5cbae0a84a5d091d8e2d1fe028a46ce0 100644
--- a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md
@@ -17,4 +17,4 @@
 | Llama3.2 | Testing | [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
 | DeepSeek-V2 | Testing | [DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) |
 
-Note: refer to [Environment Variable List](../../environment_variables/environment_variables.md), and set the model backend by environment variable `vLLM_MODEL_BACKEND`.
+Note: refer to [Environment Variable List](../../environment_variables/environment_variables.md), and set the model backend by environment variable `VLLM_MS_MODEL_BACKEND`.
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
index a979696d105be3b9a9936d5030f3c7262123d807..f29f8b85513dbb4b3921a6edb6c34fe41cfcc813 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
@@ -203,7 +203,7 @@ vLLM-MindSpore插件有以下两种安装方式。**vLLM-MindSpore插件快速
 用户可以创建一个简单的离线推理场景,验证安装是否成功。下面以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) 为例。首先用户需要执行以下命令,设置环境变量:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index 87eb81a0a13c62b94bc89191982b163df7b596e1..6315735c2e42a3fb97f6f59f71c1dba77ac0a12b 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -143,7 +143,7 @@ export MS_ENABLE_LCCL=off
 export HCCL_OP_EXPANSION_MODE=AIV
 export MS_ALLOC_CONF=enable_vmm:true
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-export vLLM_MODEL_BACKEND=MindFormers
+export VLLM_MS_MODEL_BACKEND=MindFormers
 export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml
 ```
@@ -156,7 +156,7 @@ export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/pre
 - `HCCL_OP_EXPANSION_MODE`: 配置通信算法的编排展开位置为Device侧的AI Vector Core计算单元。
 - `MS_ALLOC_CONF`: 设置内存策略。可参考[MindSpore官网文档](https://www.mindspore.cn/docs/zh-CN/master/api_python/env_var_list.html)。
 - `ASCEND_RT_VISIBLE_DEVICES`: 配置每个节点可用device id。用户可使用`npu-smi info`命令进行查询。
-- `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。
+- `VLLM_MS_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。
 - `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/deepseek3/deepseek_r1_671b)中,找到对应模型的yaml文件[predict_deepseek_r1_671b_w8a8.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml) 。
 
 模型并行策略通过配置文件中的`parallel_config`指定,例如TP16 张量并行配置如下所示:
@@ -318,7 +318,7 @@ export MS_ENABLE_LCCL=off
 export HCCL_OP_EXPANSION_MODE=AIV
 export MS_ALLOC_CONF=enable_vmm:true
 export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-export vLLM_MODEL_BACKEND=MindFormers
+export VLLM_MS_MODEL_BACKEND=MindFormers
 export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml
 ```
@@ -328,7 +328,7 @@ export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/pre
 - `HCCL_OP_EXPANSION_MODE`: 配置通信算法的编排展开位置为Device侧的AI Vector Core计算单元。
 - `MS_ALLOC_CONF`: 设置内存策略。可参考[MindSpore官网文档](https://www.mindspore.cn/docs/zh-CN/r2.6.0/api_python/env_var_list.html)。
 - `ASCEND_RT_VISIBLE_DEVICES`: 配置每个节点可用device id。用户可使用`npu-smi info`命令进行查询。
-- `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。
+- `VLLM_MS_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。
 - `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/deepseek3/deepseek_r1_671b)中,找到对应模型的yaml文件[predict_deepseek_r1_671b_w8a8_ep4tp4.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml)。
 
 模型并行策略通过配置文件中的`parallel_config`指定,例如混合并行配置如下所示:
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 67ea031dd1957444f8acd798dc68d96c16cab041..8cefb6eaed7a48161381ea8220c640efaa732207 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -128,13 +128,13 @@ git clone https://huggingface.co/Qwen/Qwen2.5-32B-Instruct
 
 ```bash
 #set environment variables
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore TransFormers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
 
 以下是对上述环境变量的解释:
 
-- `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。
+- `VLLM_MS_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询。
 - `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-32B为例,则其yaml文件为[predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml) 。
 
 用户可通过`npu-smi info`查看显存占用情况,并可以使用如下环境变量,设置用于推理的计算卡。以下例子为假设用户使用4,5,6,7卡进行推理:
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index 4302202913b5ffeddb4468478dc2443a069c3a1b..c1542d41799cfe67400ce4c90fe1be36726443d0 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -128,13 +128,13 @@ git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
 
 ```bash
 #set environment variables
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore TransFormers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
 
 以下是对上述环境变量的解释:
 
-- `vLLM_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询;
+- `VLLM_MS_MODEL_BACKEND`:所运行的模型后端。目前vLLM-MindSpore插件所支持的模型与模型后端,可在[模型支持列表](../../../user_guide/supported_models/models_list/models_list.md)中进行查询;
 - `MINDFORMERS_MODEL_CONFIG`:模型配置文件。用户可以在[MindSpore Transformers工程](https://gitee.com/mindspore/mindformers/tree/master/research/qwen2_5)中,找到对应模型的yaml文件。以Qwen2.5-7B为例,则其yaml文件为[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/master/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml) 。
 
 用户可通过`npu-smi info`查看显存占用情况,并可以使用如下环境变量,设置用于推理的计算卡:
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md
index d3f594eb29758a95e60a7035a42234fa2582d42e..4eb332bb5708cace688d4b7476bb6ea17a05ae10 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/environment_variables/environment_variables.md
@@ -4,7 +4,7 @@
 | 环境变量 | 功能 | 类型 | 取值 | 说明 |
 | ------ | ------- | ------ | ------ | ------ |
-| `vLLM_MODEL_BACKEND` | 用于指定模型后端。使用vLLM-MindSpore插件原生模型后端时无需指定;使用模型为vLLM-MindSpore插件外部后端时则需要指定。 | String | `MindFormers`: 模型后端为MindSpore Transformers。 | 原生模型后端当前支持Qwen2.5系列;MindSpore Transformers模型后端支持Qwen系列、DeepSeek、Llama系列模型,使用时需配置环境变量:`export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH`。 |
+| `VLLM_MS_MODEL_BACKEND` | 用于指定模型后端。如果不配置变量,会按照 MindFormers > 原生模型 > MindONE 的优先级自动寻找支持的后端;配置之后则按指定后端执行。 | String | `MindFormers`: 模型后端为MindSpore Transformers。 `Native`: 模型后端为原生模型。 `MindONE`: 模型后端为MindONE。 | 原生模型后端当前支持Qwen2.5、Qwen2.5VL、Qwen3、Llama系列;MindSpore Transformers模型后端支持Qwen系列、DeepSeek、TeleChat系列模型,使用时需配置环境变量:`export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH`。 |
 | `MINDFORMERS_MODEL_CONFIG` | MindSpore Transformers模型的配置文件。使用Qwen2.5系列、DeepSeek系列模型时,需要配置文件路径。 | String | 模型配置文件路径。 | **该环境变量在后续版本会被移除。** 样例:`export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8.yaml`。 |
 | `GLOO_SOCKET_IFNAME` | 用于多机之间使用gloo通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 |
 | `TP_SOCKET_IFNAME` | 用于多机之间使用TP通信时的网口名称。 | String | 网口名称,例如enp189s0f0。 | 多机场景使用,可通过`ifconfig`查找ip对应网卡的网卡名。 |
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
index 390f5e6291ca0b7048edd83bf165e1f08ba78e42..19cebf9bd46035fc730fa05e603d1297230b5eb0 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/benchmark/benchmark.md
@@ -9,7 +9,7 @@ vLLM-MindSpore插件的性能测试能力,继承自vLLM所提供的性能测
 若用户使用单卡推理,以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)为例,可按照文档[单卡推理(Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#在线推理)进行环境准备,设置以下环境变量:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
@@ -102,7 +102,7 @@ P99 ITL (ms): ....
 用户使用离线性能测试时,以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)为例,可按照文档[单卡推理(Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#离线推理)进行环境准备,设置以下环境变量:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md
index 0f11e66c83578fbef359737f054a10a1835da3b8..20557bd387b2637b69dcbe1034df08949aa5d5c9 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_features/quantization/quantization.md
@@ -27,7 +27,7 @@
 用户可以参考[安装指南](../../../getting_started/installation/installation.md),进行vLLM-MindSpore插件的环境搭建。用户需设置以下环境变量:
 
 ```bash
-export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export VLLM_MS_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
 export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
 ```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md
index c64725c9e1e448999d189ec22aeab0942f012299..f0ede7769b977b22b1fe0b4b451a7822514563bf 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/user_guide/supported_models/models_list/models_list.md
@@ -17,4 +17,4 @@
 | Llama3.2 | 测试中 | [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)、[Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
 | DeepSeek-V2 | 测试中 | [DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2) |
 
-注:用户可参考[环境变量章节](../../environment_variables/environment_variables.md),通过环境变量`vLLM_MODEL_BACKEND`,指定模型后端。
+注:用户可参考[环境变量章节](../../environment_variables/environment_variables.md),通过环境变量`VLLM_MS_MODEL_BACKEND`,指定模型后端。
diff --git a/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md b/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
index 50fe80c9dab1cbb8bbf013e2d86e3b4bbbae94d3..bd10beb8dcadc8d012c28f15c7811b4466d7f6e2 100644
--- a/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
+++ b/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
@@ -116,7 +116,7 @@ export VLLM_MODEL_MEMORY_USE_GB=26
 export VLLM_MASTER_IP=127.0.0.1
 export VLLM_RPC_PORT=12390
 export VLLM_HTTP_PORT=8080
-unset vLLM_MODEL_BACKEND
+unset VLLM_MS_MODEL_BACKEND
 
 # model envs
 export MODEL_ID="/path/to/model/Qwen2-7B-Instruct"
diff --git a/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md b/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md
index 7585a8f3fe521a164d48e800d80f76fcee4a2da5..6ee26515e3427c36639c298facb486c4b1872b60 100644
--- a/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md
+++ b/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md
@@ -116,7 +116,7 @@ export VLLM_MODEL_MEMORY_USE_GB=26
 export VLLM_MASTER_IP=127.0.0.1
 export VLLM_RPC_PORT=12390
 export VLLM_HTTP_PORT=8080
-unset vLLM_MODEL_BACKEND
+unset VLLM_MS_MODEL_BACKEND
 
 # model envs
 export MODEL_ID="/path/to/model/Qwen2-7B-Instruct"
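Because the old spelling silently stops taking effect once this patch lands, user-side scripts that still export `vLLM_MODEL_BACKEND` should be migrated in one pass. A hedged sketch using only standard grep/sed; `./scripts` is an assumed search root, and matches should be reviewed before running the substitution:

```bash
# Blind textual rename of the environment variable across local shell scripts.
# List the affected files first (grep -rl), then rewrite them in place.
grep -rl 'vLLM_MODEL_BACKEND' ./scripts | xargs sed -i 's/vLLM_MODEL_BACKEND/VLLM_MS_MODEL_BACKEND/g'
```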