diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index bc0f3a482a567a2a66abe41abf47d64c03f98a27..7010953d1812712de64734c6aa96b77dc5a703a0 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -145,7 +145,7 @@ export ASCEND_RT_VISIBLE_DEVICES=$NPU_VISIBE_DEVICES
 
 ## Offline Inference
 
-Taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, user can perform offline inference with the following Python code:  
+After setting up the vLLM-MindSpore Plugin environment, users can run the following Python code to perform offline inference on the model:
 
 ```python  
 import vllm_mindspore # Add this line on the top of script.  
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/conf.py b/docs/vllm_mindspore/docs/source_zh_cn/conf.py
index 4106f41dc2f941215f7f7e8ddf54e5d72e6a95f4..fa85c0f1323bf0f599a501d9a8a8b02b5a5a653f 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/conf.py
+++ b/docs/vllm_mindspore/docs/source_zh_cn/conf.py
@@ -23,9 +23,9 @@ from sphinx.ext import autodoc as sphinx_autodoc
 
 # -- Project information -----------------------------------------------------
 
-project = 'vLLM-MindSpore插件'
+project = 'vLLM-MindSpore Plugin'
 copyright = 'MindSpore'
-author = 'vLLM-MindSpore插件'
+author = 'vLLM-MindSpore Plugin'
 
 # The full version, including alpha/beta/rc tags
 release = '0.3.0'
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/operations/npu_ops.md b/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/operations/npu_ops.md
index d3857f58d7ea546e084be3062b04fd457f52b956..c19b73358663193d295418c8e26ec56b469d351c 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/operations/npu_ops.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/operations/npu_ops.md
@@ -92,8 +92,8 @@ MS_EXTENSION_MODULE(my_custom_op) {
 
 ### 算子编译并测试
 
-1. **代码集成**：将代码集成至 vllm-mindspore 项目。
-2. **编译项目**：于vllm-mindspore工程中，执行`pip install .`，编译安装vLLM-MindSpore插件。
+1. **代码集成**：将代码集成至vLLM-MindSpore插件项目。
+2. **编译项目**：在项目代码根目录vllm-mindspore下，执行`pip install .`，编译安装vLLM-MindSpore插件。
 3. **测试算子接口**：使用 Python 调用注册的算子接口：
 
     ```python
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index bae4be1f21f3b8c32bef0ad70ba67c73a9d7b876..0e728941f7d55eafd38770df7ccafd08f2686686 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -146,7 +146,7 @@ export ASCEND_RT_VISIBLE_DEVICES=$NPU_VISIBE_DEVICES
 
 ## 离线推理
 
-vllm MindSprore环境搭建之后，用户可以使用如下Python代码，进行模型的离线推理：
+vLLM-MindSpore插件环境搭建之后，用户可以使用如下Python代码，进行模型的离线推理：
 
 ```python
 import vllm_mindspore # Add this line on the top of script.
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/index.rst b/docs/vllm_mindspore/docs/source_zh_cn/index.rst
index 7fe2761a461e52526a17e1bfbbadca7abbbd4a93..a5bd24f15b192838b47f8f1c432d6b15799a136b 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/index.rst
+++ b/docs/vllm_mindspore/docs/source_zh_cn/index.rst
@@ -7,7 +7,7 @@ vLLM-MindSpore插件（`vllm-mindspore`）是一个由 `MindSpore社区 <https:/
 
 vLLM是由加州大学伯克利分校Sky Computing Lab创建的社区开源项目，已广泛用于学术研究和工业应用。vLLM以Continuous Batching调度机制和PagedAttention Key-Value缓存管理为基础，提供了丰富的推理服务功能，包括投机推理、Prefix Caching、Multi-LoRA等。同时，vLLM已支持种类丰富的开源大模型，包括Transformer类（如LLaMa）、混合专家类（如DeepSeek）、Embedding类（如E5-Mistral）、多模态类（如LLaVA）等。由于vLLM选用PyTorch构建大模型和管理计算存储资源，此前无法使用其部署基于MindSpore大模型的推理服务。
 
-vLLM-MindSpore插件以将MindSpore大模型接入vLLM，并实现服务化部署为功能目标。其遵循以下设计原则：
+vLLM-MindSpore插件以将MindSpore大模型接入vLLM并实现服务化部署为功能目标。其遵循以下设计原则：
 
 - 接口兼容：支持vLLM原生的API和服务部署接口，避免新增配置文件或接口，降低用户学习成本和确保易用性。
 - 最小化侵入式修改：尽可能避免侵入式修改vLLM代码，以保障系统的可维护性和可演进性。
diff --git a/tools/generate_html/base_version.json b/tools/generate_html/base_version.json
index 521656f42c2903fb093299a1d8fd54ef3770f5e4..03b6b02e827e69ec8fe9aca728cdc888e8e232c0 100644
--- a/tools/generate_html/base_version.json
+++ b/tools/generate_html/base_version.json
@@ -123,8 +123,8 @@
     {
         "version": "r0.3.0",
         "label": {
-            "zh": "vLLM MindSpore",
-            "en": "vLLM MindSpore"
+            "zh": "vLLM-MindSpore Plugin",
+            "en": "vLLM-MindSpore Plugin"
         },
         "repo_name": "vllm_mindspore",
         "theme": "theme-docs"
diff --git a/tutorials/source_en/model_infer/introduction.md b/tutorials/source_en/model_infer/introduction.md
index 484db52c93260571e06df14dcfc5885788d82d26..aab84a6f0e14d027451a4ebda61adda32f2d0c03 100644
--- a/tutorials/source_en/model_infer/introduction.md
+++ b/tutorials/source_en/model_infer/introduction.md
@@ -62,7 +62,7 @@ The following figure shows the key technology stack of MindSpore inference.
 
 - **Inference with a framework**: In scenarios with abundant computing resources, only Python APIs are provided. You need to use Python scripts to build models and perform inference. Service-oriented components are not mandatory.
 
-    - **vLLM&vLLM-MindSpore**: The service-oriented capability of the inference solution with a framework is provided. The popular vLLM service-oriented inference capability in the open-source community is used to seamlessly connect the service-oriented capability of the community to the MindSpore inference ecosystem.
+    - **vLLM & vLLM-MindSpore Plugin**: Provides service-oriented capabilities for the inference solution with a framework. It uses the popular vLLM service-oriented inference capability from the open-source community and seamlessly connects that capability to the MindSpore inference ecosystem.
 
     - **Python API**: MindSpore provides Python APIs, including mint operator APIs (consistent with PyTorch semantics), nn APIs, and parallel APIs.
 
diff --git a/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md b/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
index bed0c3d7a615211d2ff9ff90186d3a0c840fe1cb..486eb3f504d1842d62867a24ac8b8350a6e364d5 100644
--- a/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
+++ b/tutorials/source_en/model_infer/ms_infer/ms_infer_model_serving_infer.md
@@ -41,13 +41,13 @@ As an efficient service-oriented model inference backend, it should provide the
 
 ## Inference Tutorial
 
-MindSpore inference works with the vLLM community solution to provide users with full-stack end-to-end inference service capabilities. The vLLM MindSpore adaptation layer implements seamless interconnection of the vLLM community service capabilities in the MindSpore framework. For details, see [vLLM MindSpore](https://www.mindspore.cn/vllm_mindspore/docs/en/master/index.html).
+MindSpore inference works with the vLLM community solution to provide users with full-stack end-to-end inference service capabilities. The vLLM-MindSpore Plugin adaptation layer implements seamless interconnection of the vLLM community service capabilities in the MindSpore framework. For details, see [vLLM-MindSpore Plugin](https://www.mindspore.cn/vllm_mindspore/docs/en/master/index.html).
 
-This section describes the basic usage of vLLM MindSpore service-oriented inference.
+This section describes the basic usage of vLLM-MindSpore Plugin service-oriented inference.
 
 ### Setting Up the Environment
 
-The vLLM MindSpore adaptation layer provides an environment installation script. You can run the following commands to create a vLLM MindSpore operating environment:
+The vLLM-MindSpore Plugin adaptation layer provides an environment installation script. You can run the following commands to create a vLLM-MindSpore Plugin operating environment:
 
 ```shell
 # download vllm-mindspore code
@@ -69,11 +69,11 @@ bash install_depend_pkgs.sh
 python setup.py install
 ```
 
-After the vLLM MindSpore operating environment is created, you need to install the following dependency packages:
+After the vLLM-MindSpore Plugin operating environment is created, you need to install the following dependency packages:
 
 - **mindspore**: MindSpore development framework, which is the basis for model running.
 
-- **vLLM**: vLLM service software.
+- **vllm**: vLLM service software.
 
 - **vllm-mindspore**: vLLM extension that adapts to the MindSpore framework. It is required for running MindSpore models.
 
@@ -85,14 +85,14 @@ After the vLLM MindSpore operating environment is created, you need to install t
 
 ### Preparing a Model
 
-The service-oriented vLLM MindSpore supports the direct running of the native Hugging Face model. Therefore, you can directly download the model from the Hugging Face official website. The following uses the Qwen2-7B-Instruct model as an example:
+The service-oriented vLLM-MindSpore Plugin supports running native Hugging Face models directly. Therefore, you can download the model directly from the official Hugging Face website. The following uses the Qwen2-7B-Instruct model as an example:
 
 ```shell
 git lfs install
 git clone https://huggingface.co/Qwen/Qwen2-7B-Instruct
 ```
 
-If `git lfs install` fails during the pull process, refer to the vLLM MindSpore FAQ for a solution.
+If `git lfs install` fails during the pull process, refer to the vLLM-MindSpore Plugin [FAQ](https://www.mindspore.cn/vllm_mindspore/docs/en/master/faqs/faqs.html) for a solution.
 
 ### Starting a Service
 
@@ -122,7 +122,7 @@ unset vLLM_MODEL_BACKEND
 export MODEL_ID="/path/to/model/Qwen2-7B-Instruct"
 ```
 
-Run the following command to start the vLLM MindSpore service backend:
+Run the following command to start the vLLM-MindSpore Plugin service backend:
 
 ```shell
 vllm-mindspore serve --model=${MODEL_ID} --port=${VLLM_HTTP_PORT} --trust_remote_code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block_size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 1 --data-parallel-size 1 --data-parallel-size-local 1 --data-parallel-start-rank 0  --data-parallel-address ${VLLM_MASTER_IP} --data-parallel-rpc-port ${VLLM_RPC_PORT} &> vllm-mindspore.log &
diff --git a/tutorials/source_zh_cn/model_infer/introduction.md b/tutorials/source_zh_cn/model_infer/introduction.md
index a91bba18c01801df77939db2ca99fc351f9060f3..f383b2f91487d029105a7bd2789d62b6266ea756 100644
--- a/tutorials/source_zh_cn/model_infer/introduction.md
+++ b/tutorials/source_zh_cn/model_infer/introduction.md
@@ -62,7 +62,7 @@ MindSpore框架提供多种模型推理方式，以方便用户在面对不同
 
 - **带框架推理**：面向丰富计算资源场景，只提供Python API接口，用户需要通过Python脚本构建模型并推理，其中服务化组件不是必备的。
 
-    - **vLLM&vLLM-MindSpore**：提供带框架推理方案上的服务化能力，使用当前开源社区热门的vLLM推理服务化能力，实现社区的服务化能力无缝衔接到MindSpore推理生态。
+    - **vLLM & vLLM-MindSpore插件**：提供带框架推理方案上的服务化能力，使用当前开源社区热门的vLLM推理服务化能力，实现社区的服务化能力无缝衔接到MindSpore推理生态。
 
     - **Python API**：MindSpore框架提供Python API接口，其中包括mint算子接口（和PyTorch语义一致）、nn接口、parallel接口等。
 
diff --git a/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md b/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md
index f0a9a82dc01741fdfc49124ee925be5ef72daf76..81c135b13e7fa3f5d71f707bdc5f86da9c998369 100644
--- a/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md
+++ b/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_serving_infer.md
@@ -41,13 +41,13 @@ print(generate_text)
 
 ## 推理教程
 
-MindSpore推理结合vLLM社区方案，为用户提供了全栈端到端的推理服务化能力，通过vLLM MindSpore适配层，实现vLLM社区的服务化能力在MindSpore框架下的无缝对接，具体可以参考[vLLM MindSpore文档](https://www.mindspore.cn/vllm_mindspore/docs/zh-CN/master/index.html)。
+MindSpore推理结合vLLM社区方案，为用户提供了全栈端到端的推理服务化能力，通过vLLM-MindSpore插件适配层，实现vLLM社区的服务化能力在MindSpore框架下的无缝对接，具体可以参考[vLLM-MindSpore插件文档](https://www.mindspore.cn/vllm_mindspore/docs/zh-CN/master/index.html)。
 
-本章主要简单介绍vLLM MindSpore服务化推理的基础使用。
+本章主要简单介绍vLLM-MindSpore插件服务化推理的基础使用。
 
 ### 环境准备
 
-vLLM MindSpore适配层提供了环境安装脚本，用户可以执行如下命令创建一个vLLM MindSpore的运行环境：
+vLLM-MindSpore插件适配层提供了环境安装脚本，用户可以执行如下命令创建一个vLLM-MindSpore插件的运行环境：
 
 ```shell
 # download vllm-mindspore code
@@ -69,7 +69,7 @@ bash install_depend_pkgs.sh
 python setup.py install
 ```
 
-vLLM MindSpore的运行环境创建后，还需要安装以下依赖包：
+vLLM-MindSpore插件的运行环境创建后，还需要安装以下依赖包：
 
 - **mindspore**：MindSpore开发框架，模型运行基础。
 
@@ -85,14 +85,14 @@ vLLM MindSpore的运行环境创建后，还需要安装以下依赖包：
 
 ### 模型准备
 
-vllm-mindspore服务化支持原生Hugging Face的模型直接运行，因此直接从Hugging Face官网下载模型即可，此处我们仍然以Qwen2-7B-Instruct模型为例。
+vLLM-MindSpore插件服务化支持原生Hugging Face的模型直接运行，因此直接从Hugging Face官网下载模型即可，此处我们仍然以Qwen2-7B-Instruct模型为例。
 
 ```shell
 git lfs install
 git clone https://huggingface.co/Qwen/Qwen2-7B-Instruct
 ```
 
-若在拉取过程中，执行`git lfs install失败`，可以参考vLLM MindSpore FAQ 进行解决。
+若在拉取过程中，执行`git lfs install`失败，可以参考vLLM-MindSpore插件 [FAQ](https://www.mindspore.cn/vllm_mindspore/docs/zh-CN/master/faqs/faqs.html) 进行解决。
 
 ### 启动服务
 
@@ -122,7 +122,7 @@ unset vLLM_MODEL_BACKEND
 export MODEL_ID="/path/to/model/Qwen2-7B-Instruct"
 ```
 
-执行如下命令可以启动vLLM MindSpore的服务后端。
+执行如下命令可以启动vLLM-MindSpore插件的服务后端。
 
 ```shell
 vllm-mindspore serve --model=${MODEL_ID} --port=${VLLM_HTTP_PORT} --trust_remote_code --max-num-seqs=256 --max_model_len=32768 --max-num-batched-tokens=4096 --block_size=128 --gpu-memory-utilization=0.9 --tensor-parallel-size 1 --data-parallel-size 1 --data-parallel-size-local 1 --data-parallel-start-rank 0  --data-parallel-address ${VLLM_MASTER_IP} --data-parallel-rpc-port ${VLLM_RPC_PORT} &> vllm-mindspore.log &