diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index 9080b67487908939da1d82b6a212341c274042b7..dee8d2d13bde51cd80d98bcefbfb4d4c5728011c 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -264,7 +264,16 @@ vLLM manages and operates resources across multiple nodes through Ray. This exam
- Tensor Parallelism (TP): 4;
- Expert Parallelism (EP): 4.

-### Setting Environment Variables
+The data parallel deployment backend can be selected with `--data-parallel-backend`, which accepts either `mp` or `ray`. By default, data parallel is deployed with the `mp` backend.
+
+`--data-parallel-backend` options:
+
+- `mp`: Deploy with the multiprocessing backend;
+- `ray`: Deploy with the Ray backend.
+
+### Deploy DP with `mp` Backend
+
+#### Setting Environment Variables

Configure the following environment variables on the master and worker nodes:

@@ -301,11 +310,11 @@ parallel_config:

`data_parallel` and `model_parallel` specify the parallelism strategy for the attention and feed-forward dense layers, while `expert_parallel` specifies the expert routing parallelism strategy for MoE layers. Ensure that `data_parallel` * `model_parallel` is divisible by `expert_parallel`.

-### Online Inference
+#### Online Inference

-#### Starting the Service
+**Starting the Service**

-`vllm-mindspore` can deploy online inference using the OpenAI API protocol. Below is the workflow for launching the service:
+vLLM-MindSpore Plugin can deploy online inference using the OpenAI API protocol. Below is the workflow for launching the service:

```bash
# Parameter explanations for service launch
@@ -337,7 +346,83 @@ vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remot
vllm-mindspore serve --headless --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel
```

-#### Sending Requests
+**Sending Requests**
+
+Use the following command to send requests, where `prompt` is the model input:
+
+```bash
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "MindSpore-Lab/DeepSeek-R1-0528-A8W8", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
+```
+
+Users need to ensure that the `"model"` field matches the `--model` argument used when starting the service, so that the request can be matched to the served model.
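+
+Since the service follows the OpenAI API protocol, the names of the currently served models can usually be listed by querying the `/v1/models` endpoint. Below is a minimal check, assuming the service listens on the default address `http://localhost:8000`:
+
+```bash
+# List the served models; the returned "id" values are what the "model" field
+# of a request has to match (default host and port assumed).
+curl http://localhost:8000/v1/models
+```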
+
+### Deploy DP with `ray` Backend
+
+#### Setting Environment Variables
+
+Configure the following environment variables on the master and worker nodes:
+
+```bash
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+
+export MS_ENABLE_LCCL=off
+export HCCL_OP_EXPANSION_MODE=AIV
+export MS_ALLOC_CONF=enable_vmm:true
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export vLLM_MODEL_BACKEND=MindFormers
+export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml
+
+export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
+export GLOO_SOCKET_IFNAME=enp189s0f0
+export HCCL_SOCKET_IFNAME=enp189s0f0
+export TP_SOCKET_IFNAME=enp189s0f0
+```
+
+Extra environment variable descriptions:
+
+- `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION`: Specifies the protocol buffers implementation; it must be set to `python`.
+- `GLOO_SOCKET_IFNAME`: Network interface used by the GLOO backend. Use `ifconfig` to find the interface name corresponding to the node IP.
+- `HCCL_SOCKET_IFNAME`: Network interface used by HCCL. Use `ifconfig` to find the interface name corresponding to the node IP.
+- `TP_SOCKET_IFNAME`: Network interface used for TP communication. Use `ifconfig` to find the interface name corresponding to the node IP.
+
+The model parallel strategy is specified in the `parallel_config` of the configuration file.
+
+For more information on environment variables and parallel strategy configuration, please refer to [Deploy DP with `mp` Backend](#setting-environment-variables-1).
+
+#### Starting Ray for Multi-Node Cluster Management
+
+Refer to [Starting Ray for Multi-Node Cluster Management for TP](#starting-ray-for-multi-node-cluster-management).
+
+#### Online Inference
+
+**Starting the Service**
+
+vLLM-MindSpore Plugin can deploy online inference using the OpenAI API protocol. Below is the workflow for launching the service:
+
+```bash
+# Parameter explanations for service launch
+vllm-mindspore serve
+ --model=[Model Config/Weights Path]
+ --trust-remote-code # Use locally downloaded model files
+ --max-num-seqs [Maximum Batch Size]
+ --max-model-len [Maximum Input/Output Length]
+ --max-num-batched-tokens [Maximum Tokens per Iteration, recommended: 4096]
+ --block-size [Block Size, recommended: 128]
+ --gpu-memory-utilization [GPU Memory Utilization, recommended: 0.9]
+ --tensor-parallel-size [TP Parallelism Degree]
+ --data-parallel-size [DP Parallelism Degree]
+ --data-parallel-size-local [DP count on the current service node, sum across all nodes equals data-parallel-size]
+ --enable-expert-parallel # Enable expert parallelism
+ --data-parallel-backend=ray # Set the DP deployment backend to ray
+```
+
+User can also set the local model path by `--model` argument. The following is an execution example:
+
+```bash
+vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --enable-expert-parallel --data-parallel-backend=ray
+```
+
+**Sending Requests**

Use the following command to send requests, where `prompt` is the model input:

diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
index 87eb81a0a13c62b94bc89191982b163df7b596e1..690b13196436c770014669e66b8eada82b2adf06 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_parallel/deepseek_r1_671b_w8a8_dp4_tp4_ep4.md
@@ -307,7 +307,16 @@ vLLM 通过 Ray 对多个节点资源进行管理和运行。该样例对应以
- 张量并行(TP)为4;
- 专家并行(EP)为4。

-### 设置环境变量
+数据并行可以通过 `--data-parallel-backend` 设置部署方式，可选项为 `mp` 和 `ray`。默认行为下，DP 将以 mp 方式部署。
+
+`--data-parallel-backend` 选项：
+
+- `mp`: 以多进程的方式部署;
+- `ray`: 以 ray 方式部署。
+
+### 以 mp 方式部署 DP
+
+#### 设置环境变量

分别在主从节点配置如下环境变量：

@@ -344,11 +353,11 @@ parallel_config:

`data_parallel`及`model_parallel`指定attn及ffn-dense部分的并行策略，`expert_parallel`指定moe部分路由专家并行策略，且需满足`data_parallel` * `model_parallel`可被`expert_parallel`整除。

-### 在线推理
+#### 在线推理

-#### 启动服务
+**启动服务**

-`vllm-mindspore`可使用OpenAI的API协议部署在线推理。以下是在线推理的拉起流程：
+vLLM-MindSpore插件可使用OpenAI的API协议部署在线推理。以下是在线推理的拉起流程：

```bash
# 启动配置参数说明
@@ -380,7 +389,83 @@ vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remot
vllm-mindspore serve --headless --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --data-parallel-start-rank 2 --data-parallel-address 192.10.10.10 --data-parallel-rpc-port 12370 --enable-expert-parallel
```

-#### 发送请求
+**发送请求**
+
+使用如下命令发送请求。其中`prompt`字段为模型输入：
+
+```bash
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "MindSpore-Lab/DeepSeek-R1-0528-A8W8", "prompt": "I am", "max_tokens": 120, "temperature": 0}'
+```
+
+用户需确认`"model"`字段与启动服务中`--model`一致，请求才能成功匹配到模型。
+
+### 以 ray 方式部署 DP
+
+#### 设置环境变量
+
+分别在主从节点配置如下环境变量：
+
+```bash
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+
+export MS_ENABLE_LCCL=off
+export HCCL_OP_EXPANSION_MODE=AIV
+export MS_ALLOC_CONF=enable_vmm:true
+export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+export vLLM_MODEL_BACKEND=MindFormers
+export MINDFORMERS_MODEL_CONFIG=/path/to/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b_w8a8_ep4tp4.yaml
+
+export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
+export GLOO_SOCKET_IFNAME=enp189s0f0
+export HCCL_SOCKET_IFNAME=enp189s0f0
+export TP_SOCKET_IFNAME=enp189s0f0
+```
+
+额外环境变量说明：
+
+- `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION`: 指定 protocol buffers 的实现方式，需要设置为 `python`。
+- `GLOO_SOCKET_IFNAME`: GLOO后端使用的网卡。可通过`ifconfig`查找ip对应网卡的网卡名。
+- `HCCL_SOCKET_IFNAME`: 配置HCCL使用的网卡。可通过`ifconfig`查找ip对应网卡的网卡名。
+- `TP_SOCKET_IFNAME`: 配置TP通信使用的网卡。可通过`ifconfig`查找ip对应网卡的网卡名。
+
+模型并行策略通过配置文件中的`parallel_config`指定。
+
+更多环境变量与并行策略配置方式请参考 [mp 部署 DP](#设置环境变量-1)。
+
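+作为参考，下面给出一个查找网卡名的示例命令（此处以示例 IP 192.10.10.10 为例，实际请替换为当前节点的 IP），其输出中的网卡名即可用于上述 `GLOO_SOCKET_IFNAME`、`HCCL_SOCKET_IFNAME` 与 `TP_SOCKET_IFNAME`：
+
+```bash
+# 示例：查询指定 IP 对应的网卡名（输出第二列即为网卡名，例如 enp189s0f0）
+ip -o -4 addr show | grep "192.10.10.10"
+```
+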
+#### 启动 Ray 进行多节点集群管理
+
+参考 [TP 中启动 ray 进行多节点集群管理](#启动-ray-进行多节点集群管理)。
+
+#### 在线推理
+
+**启动服务**
+
+vLLM-MindSpore插件可使用OpenAI的API协议部署在线推理。以下是在线推理的拉起流程：
+
+```bash
+# 启动配置参数说明
+vllm-mindspore serve
+ --model=[模型Config/权重路径]
+ --trust-remote-code # 使用本地下载的model文件
+ --max-num-seqs [最大Batch数]
+ --max-model-len [输入输出最大长度]
+ --max-num-batched-tokens [单次迭代最大支持token数, 推荐4096]
+ --block-size [Block Size 大小, 推荐128]
+ --gpu-memory-utilization [显存利用率, 推荐0.9]
+ --tensor-parallel-size [TP 并行数]
+ --data-parallel-size [DP 并行数]
+ --data-parallel-size-local [当前服务节点中的DP数，所有节点求和等于data-parallel-size]
+ --enable-expert-parallel # 使能专家并行
+ --data-parallel-backend=ray # 指定 dp 部署方式为 ray
+```
+
+用户可以通过`--model`参数，指定模型保存的本地路径。以下为执行示例：
+
+```bash
+vllm-mindspore serve --model="MindSpore-Lab/DeepSeek-R1-0528-A8W8" --trust-remote-code --max-num-seqs=256 --max-model-len=32768 --max-num-batched-tokens=4096 --block-size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 4 --data-parallel-size 4 --data-parallel-size-local 2 --enable-expert-parallel --data-parallel-backend=ray
+```
+
+**发送请求**

使用如下命令发送请求。其中`prompt`字段为模型输入：