From 4a5939a4d891c2ad7139176959554bc5a904c239 Mon Sep 17 00:00:00 2001
From: huan <3174348550@qq.com>
Date: Mon, 28 Jul 2025 19:14:26 +0800
Subject: [PATCH] docs: rename MindFormers to MindSpore Transformers and fix typos
---
docs/mindformers/docs/source_en/guide/pre_training.md | 2 +-
docs/mindformers/docs/source_en/introduction/models.md | 2 +-
.../docs/source_en/developer_guide/contributing.md | 2 +-
.../qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md | 2 +-
.../qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md | 2 +-
.../docs/source_zh_cn/developer_guide/contributing.md | 2 +-
.../qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md | 2 +-
.../qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md | 2 +-
tutorials/source_en/parallel/comm_fusion.md | 2 +-
tutorials/source_en/parallel/dynamic_cluster.md | 2 +-
.../source_en/parallel/high_dimension_tensor_parallel.md | 4 ++--
tutorials/source_en/parallel/split_technique.md | 9 +++++----
.../model_infer/ms_infer/ms_infer_model_infer.rst | 2 +-
tutorials/source_zh_cn/parallel/dynamic_cluster.md | 2 +-
.../parallel/high_dimension_tensor_parallel.md | 2 +-
tutorials/source_zh_cn/parallel/split_technique.md | 9 +++++----
16 files changed, 25 insertions(+), 23 deletions(-)
diff --git a/docs/mindformers/docs/source_en/guide/pre_training.md b/docs/mindformers/docs/source_en/guide/pre_training.md
index bdb7f658fb..0901086ea8 100644
--- a/docs/mindformers/docs/source_en/guide/pre_training.md
+++ b/docs/mindformers/docs/source_en/guide/pre_training.md
@@ -120,4 +120,4 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
## More Information
-For more training examples of different models, see [the models supported by MindFormers](https://www.mindspore.cn/mindformers/docs/en/master/introduction/models.html).
+For more training examples of different models, see [the models supported by MindSpore Transformers](https://www.mindspore.cn/mindformers/docs/en/master/introduction/models.html).
diff --git a/docs/mindformers/docs/source_en/introduction/models.md b/docs/mindformers/docs/source_en/introduction/models.md
index 6abdc0c7e5..835308efb6 100644
--- a/docs/mindformers/docs/source_en/introduction/models.md
+++ b/docs/mindformers/docs/source_en/introduction/models.md
@@ -2,7 +2,7 @@
[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/introduction/models.md)
-The following table lists models supported by MindFormers.
+The following table lists models supported by MindSpore Transformers.
| Model | Specifications | Model Type | Latest Version |
|:--------------------------------------------------------------------------------------------------------|:------------------------------|:----------------:|:----------------------:|
diff --git a/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md b/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md
index c8b74ca3cf..e912018c23 100644
--- a/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md
+++ b/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md
@@ -65,7 +65,7 @@ Follow these guidelines for community code review, maintenance, and development.
To contribute by reporting issues, follow these guidelines:
-- Specify your environment versions (vLLM MindSpore, MindFormers, MindSpore, OS, Python, etc.).
+- Specify your environment versions (vLLM MindSpore, MindSpore Transformers, MindSpore, OS, Python, etc.).
- Indicate whether it's a bug report or feature request.
- Label the issue type for visibility on the issue board.
- Describe the problem and expected resolution.
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 71e7bee046..18be7a9d75 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -120,7 +120,7 @@ For [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct), the followi
```bash
#set environment variables
export ASCEND_TOTAL_MEMORY_GB=64 # Use `npu-smi info` to check the memory.
-export vLLM_MODEL_BACKEND=MindFormers # Use MindFormers as the model backend.
+export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Adjust based on the model's maximum usage, with the remaining allocated for KV cache.
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model YAML file.
```
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index f17c8e589a..4b50576663 100644
--- a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -120,7 +120,7 @@ For [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), the following
```bash
#set environment variables
export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
-export vLLM_MODEL_BACKEND=MindFormers # use MindFormers as model backend.
+export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md b/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md
index f44b6d798e..aef574b259 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/developer_guide/contributing.md
@@ -71,7 +71,7 @@
报告issue时,请参考以下格式:
-- 说明您使用的环境版本(vLLM MindSpore、MindFormers、MindSpore、OS、Python等);
+- 说明您使用的环境版本(vLLM MindSpore、MindSpore Transformers、MindSpore、OS、Python等);
- 说明是错误报告还是功能需求;
- 说明issue类型,添加标签可以在issue板上突出显示该issue;
- 问题是什么;
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index c39ff38372..12904d51e9 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -121,7 +121,7 @@ git clone https://huggingface.co/Qwen/Qwen2.5-32B-Instruct
```bash
#set environment variables
export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
-export vLLM_MODEL_BACKEND=MindFormers # use MindFormers as model backend.
+export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index 0c9449f718..4337bda722 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -121,7 +121,7 @@ git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
```bash
#set environment variables
export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
-export vLLM_MODEL_BACKEND=MindFormers # use MindFormers as model backend.
+export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
```
diff --git a/tutorials/source_en/parallel/comm_fusion.md b/tutorials/source_en/parallel/comm_fusion.md
index bc4ba83916..ae41ee4d5d 100644
--- a/tutorials/source_en/parallel/comm_fusion.md
+++ b/tutorials/source_en/parallel/comm_fusion.md
@@ -22,7 +22,7 @@ As shown in the figure below, each node backs up the complete neural network mod
#### The Necessity of Communication Fusion
-The time overhead of network communication can be measured by the following equation, where $m$ is the size of the data transmission, $\alpha$ is the network transmission rate, and $\beta$ is the inherent overhead of network startup. As can be seen, when the number of transmitted messages becomes larger, the inherent overhead share of network shartup will decrease, transmitting small messages does not make efficient use of network bandwidth resources. Even communication primitives in the HPC domain, such as `AllReduce` and `AllGather`, follow this principle. Therefore, communication fusion technology can effectively improve network resource utilization and reduce network synchronization delay.
+The time overhead of network communication can be measured by the following equation, where $m$ is the size of the data transmission, $\alpha$ is the network transmission rate, and $\beta$ is the inherent overhead of network startup. As can be seen, as the transmitted message becomes larger, the share of the inherent network-startup overhead decreases; transmitting small messages therefore does not make efficient use of network bandwidth resources. Even communication primitives in the HPC domain, such as `AllReduce` and `AllGather`, follow this principle. Therefore, communication fusion technology can effectively improve network resource utilization and reduce network synchronization delay.
$$t = \alpha m+\beta$$
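A quick back-of-the-envelope comparison follows directly from this equation: sending $k$ small messages of size $m$ separately pays the startup overhead $\beta$ a total of $k$ times, whereas one fused transfer of size $km$ pays it only once (the figures are illustrative, not measured):

$$t_{\text{separate}} = k(\alpha m + \beta), \qquad t_{\text{fused}} = \alpha km + \beta, \qquad t_{\text{separate}} - t_{\text{fused}} = (k-1)\beta$$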
diff --git a/tutorials/source_en/parallel/dynamic_cluster.md b/tutorials/source_en/parallel/dynamic_cluster.md
index bf627f114a..31c7a418dd 100644
--- a/tutorials/source_en/parallel/dynamic_cluster.md
+++ b/tutorials/source_en/parallel/dynamic_cluster.md
@@ -380,7 +380,7 @@ do
done
```
-> In a multi-machine task, you need to set a different hostname for each host node, otherwise you will get an error reporting `deivce id` out of bounds. Refer to [FAQ](https://www.mindspore.cn/docs/en/master/faq/distributed_parallel.html#q-when-starting-distributed-framework-using-dynamic-cluster-or-msrun-in-multi-machine-scenario,-an-error-is-reported-that-device-id-is-out-of-range-how-can-we-solve-it?).
+> In a multi-machine task, you need to set a different hostname for each host node, otherwise you will get an error reporting `device id` out of bounds. Refer to [FAQ](https://www.mindspore.cn/docs/en/master/faq/distributed_parallel.html#q-when-starting-distributed-framework-using-dynamic-cluster-or-msrun-in-multi-machine-scenario,-an-error-is-reported-that-device-id-is-out-of-range-how-can-we-solve-it?).
>
> In a multi-machine task, `MS_WORKER_NUM` should be the total number of Worker nodes in the cluster.
>
diff --git a/tutorials/source_en/parallel/high_dimension_tensor_parallel.md b/tutorials/source_en/parallel/high_dimension_tensor_parallel.md
index f5a51e0abd..9887ad351a 100644
--- a/tutorials/source_en/parallel/high_dimension_tensor_parallel.md
+++ b/tutorials/source_en/parallel/high_dimension_tensor_parallel.md
@@ -40,7 +40,7 @@ The 2D tensor parallelism slices both the activation bsh and the weight he by tw
A comprehensive comparison of the theoretical computation, storage, and communication overheads for 1D/2D/3D is as follows:
-| TP Type | Compution | Memory(parameters) | Memory(activation) | Communication Volume(Single Device) |
+| TP Type | Computation | Memory(parameters) | Memory(activation) | Communication Volume(Single Device) |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| 1D tensor parallel computing | O(1/P) | O(1/P) | O(1) | 2(P-1)bsh/P |
| 2D tensor parallel computing | O(1/xy) | O(1/xy) | O(1/xy) | 2bs[e(x-1)+h (y-1)]/xy |
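As a rough numeric check of the table, take $P = xy = 8$ devices with $x = 4$, $y = 2$, and assume $e \approx h$ purely for illustration; the per-device communication volume then drops from about $1.75\,bsh$ for 1D to roughly $bsh$ for 2D:

$$V_{1D} = \frac{2(P-1)bsh}{P} = 1.75\,bsh, \qquad V_{2D} = \frac{2bs\left[e(x-1)+h(y-1)\right]}{xy} \approx \frac{2bs(3h+h)}{8} = bsh$$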
@@ -61,7 +61,7 @@ With the above switch turned on, shard slicing determines whether 2D or 3D paral
2. 3D tensor parallel in_strategy configurations, mainly limiting the activation tensor and the last two dimensions of the weight tensor: `mindspore.ops.MatMul().shard(in_strategy = (layout(("z","y"),"x" ), layout(("x","z"), "y")))`
> 1. The x, y, z in the above slicing rule, i.e., the number of slicing devices for high-dimensional TP in different dimensions, should be determined by the user according to the shape of the tensor involved in the computation, and the principle of evenly slicing the weight tensor configuration has a better performance gain.
-> 2. If MatMul / BatchMatMul has transpose_a or trainspose_b turned on, the slice layout involved in the high-dimensional TP is also switched to the corresponding position.
+> 2. If MatMul / BatchMatMul has transpose_a or transpose_b turned on, the slice layout involved in the high-dimensional TP is also switched to the corresponding position.
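A minimal sketch of the 3D configuration quoted above is shown below, assuming eight devices arranged as a 2x2x2 device matrix named z/y/x and that the high-dimension tensor parallel switch introduced earlier is already enabled; the `Layout` construction is illustrative and should be adapted to the actual cluster:

```python
import mindspore as ms
from mindspore import ops

# Illustrative 2x2x2 device matrix; the alias names match the x/y/z used above.
layout = ms.Layout((2, 2, 2), ("z", "y", "x"))

# 3D tensor-parallel MatMul: the activation's first dimension is split across ("z", "y")
# and its second across "x"; the weight is split across ("x", "z") and "y".
matmul = ops.MatMul().shard(in_strategy=(layout(("z", "y"), "x"), layout(("x", "z"), "y")))
```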
## Operation Practice
diff --git a/tutorials/source_en/parallel/split_technique.md b/tutorials/source_en/parallel/split_technique.md
index 4466845c37..bb27d80afa 100644
--- a/tutorials/source_en/parallel/split_technique.md
+++ b/tutorials/source_en/parallel/split_technique.md
@@ -8,7 +8,7 @@ For a new model using `Sharding Propagation` to configure the parallelization st
### Configuring Operators Involving Weights
-The sharding strategy for parameter weights is very important, especially for large models, as the memory consumption caused by parameter weights accounts for a large portion of the total memory consumption for model training. Therefore, operators involving weights usually need to explicitly configure the sharding strategy. In the two examples below, the Gather and MatMul operators involving weights are configured with sharding strategy, while the other operators are not. These correspond the data-parallel VocabEmbedding layer and hybrid-parallel FeedForward Layer in [MindFormers](https://gitee.com/mindspore/mindformers/blob/master/mindformers/modules/transformer/transformer.py), respectively.
+The sharding strategy for parameter weights is very important, especially for large models, as the memory consumption caused by parameter weights accounts for a large portion of the total memory consumption for model training. Therefore, operators involving weights usually need to explicitly configure the sharding strategy. In the two examples below, the Gather and MatMul operators involving weights are configured with sharding strategies, while the other operators are not. These correspond to the data-parallel VocabEmbedding layer and the hybrid-parallel FeedForward layer in [MindSpore Transformers](https://gitee.com/mindspore/mindformers/blob/master/mindformers/modules/transformer/transformer.py), respectively.

@@ -32,7 +32,7 @@ Users working with strategy propagation need to have some understanding not only
## Configuring Code Samples
-Taking the encapsulated class [RowParallelLinear](https://gitee.com/mindspore/mindformers/blob/master/mindformers/experimental/graph/tensor_parallel/layers.py) in MindFormers as an example:
+Taking the encapsulated class RowParallelLinear as an example:
@@ -78,7 +78,8 @@ class RowParallelLinear(nn.Cell):
-The other example is [CoreAttention](https://gitee.com/mindspore/mindformers/blob/master/mindformers/experimental/graph/transformer/transformer.py). Configure it as above:
+The other example is CoreAttention. Configure it as above:
+
@@ -159,7 +160,7 @@ class FlashAttention(Cell):
|
-If classes that are open source and already paired with a strategy in MindFormers are used directly, the external network does not need to configure the shard strategy for the operator again, e.g., [LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/master/mindformers/models/llama/llama.py).
+If open-source classes in MindSpore Transformers that already have their strategies configured are used directly, the external network does not need to configure the shard strategy for the operators again, e.g., [LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/master/mindformers/models/llama/llama.py).
diff --git a/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_infer.rst b/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_infer.rst
index b803feaad5..cca47ae04f 100644
--- a/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_infer.rst
+++ b/tutorials/source_zh_cn/model_infer/ms_infer/ms_infer_model_infer.rst
@@ -383,7 +383,7 @@ MindSpore大语言模型带框架推理主要依赖MindSpore开源软件,用
模型并行
~~~~~~~~
-对于模型参数比较多的大语言模型,如Llama2-70B、Qwen2-72B,由于其参数规模通常会超过一张GPU或者NPU的内存容量,因此需要采用多卡并行推理。MindSpore大语言模型推理支持将原始大语言模型切分成N份可并行的子模型,使其能够分别在多卡上并行执行,在实现超大模型推理同时,也利用多卡中更多的资源提升性能。MindFormers模型套件提供的模型脚本天然支持将模型切分成多卡模型执行。
+对于模型参数比较多的大语言模型,如Llama2-70B、Qwen2-72B,由于其参数规模通常会超过一张GPU或者NPU的内存容量,因此需要采用多卡并行推理。MindSpore大语言模型推理支持将原始大语言模型切分成N份可并行的子模型,使其能够分别在多卡上并行执行,在实现超大模型推理同时,也利用多卡中更多的资源提升性能。MindSpore Transformers模型套件提供的模型脚本天然支持将模型切分成多卡模型执行。
当前,主流的模型并行方法包含以下几类:
diff --git a/tutorials/source_zh_cn/parallel/dynamic_cluster.md b/tutorials/source_zh_cn/parallel/dynamic_cluster.md
index 8677e34622..60c3383b6a 100644
--- a/tutorials/source_zh_cn/parallel/dynamic_cluster.md
+++ b/tutorials/source_zh_cn/parallel/dynamic_cluster.md
@@ -380,7 +380,7 @@ do
done
```
-> 在多机器任务中,需要为每个主机节点设置不同的主机名,否则会出现报错`deivce id`越界。可参考[FAQ](https://www.mindspore.cn/docs/zh-CN/master/faq/distributed_parallel.html#q-多机场景使用动态组网或msrun启动分布式任务时报错device-id越界如何解决)。
+> 在多机器任务中,需要为每个主机节点设置不同的主机名,否则会出现报错`device id`越界。可参考[FAQ](https://www.mindspore.cn/docs/zh-CN/master/faq/distributed_parallel.html#q-多机场景使用动态组网或msrun启动分布式任务时报错device-id越界如何解决)。
>
> 在多机任务中,`MS_WORKER_NUM`应当为集群中Worker节点总数。
>
diff --git a/tutorials/source_zh_cn/parallel/high_dimension_tensor_parallel.md b/tutorials/source_zh_cn/parallel/high_dimension_tensor_parallel.md
index 4081a93c90..104c534b22 100644
--- a/tutorials/source_zh_cn/parallel/high_dimension_tensor_parallel.md
+++ b/tutorials/source_zh_cn/parallel/high_dimension_tensor_parallel.md
@@ -61,7 +61,7 @@
2. 3D张量并行in_strategy配置,主要限定激活张量和权重张量的最后两维的切分: `mindspore.ops.MatMul().shard(in_strategy = (layout(("z","y"),"x" ), layout(("x","z"), "y")))`
> 1. 上述切分规则中的x、y、z即高维TP在不同维度上的切分设备数,需用户根据参与计算的张量的shape自行确定,原则将权重张量均匀切分的配置有更好的性能收益
-> 2. 如果MatMul / BatchMatMul开启了transpose_a或trainspose_b,则高维TP所涉及的切分layout也要调换到对应位置
+> 2. 如果MatMul / BatchMatMul开启了transpose_a或transpose_b,则高维TP所涉及的切分layout也要调换到对应位置
## 操作实践
diff --git a/tutorials/source_zh_cn/parallel/split_technique.md b/tutorials/source_zh_cn/parallel/split_technique.md
index 0e3d4dba24..0f20170311 100644
--- a/tutorials/source_zh_cn/parallel/split_technique.md
+++ b/tutorials/source_zh_cn/parallel/split_technique.md
@@ -8,7 +8,7 @@
### 配置涉及权重的算子
-参数权重的切分策略是十分重要的,尤其对大模型来说,因为参数权重引起的内存消耗占据模型训练总内存消耗的大部分。因此,涉及权重的算子通常需要显式地配置切分策略。在下图的两个例子中,涉及权重的Gather和MatMul算子配置了切分策略,而其他算子没有配置。这分别对应[MindFormers](https://gitee.com/mindspore/mindformers/blob/master/mindformers/modules/transformer/transformer.py)中的数据并行VocabEmbedding层和混合并行FeedForward层。
+参数权重的切分策略是十分重要的,尤其对大模型来说,因为参数权重引起的内存消耗占据模型训练总内存消耗的大部分。因此,涉及权重的算子通常需要显式地配置切分策略。在下图的两个例子中,涉及权重的Gather和MatMul算子配置了切分策略,而其他算子没有配置。这分别对应[MindSpore Transformers](https://gitee.com/mindspore/mindformers/blob/master/mindformers/modules/transformer/transformer.py)中的数据并行VocabEmbedding层和混合并行FeedForward层。

@@ -32,7 +32,7 @@
## 配置代码样例
-以MindFormers中封装的类[RowParallelLinear](https://gitee.com/mindspore/mindformers/blob/master/mindformers/experimental/graph/tensor_parallel/layers.py)为例:
+以类RowParallelLinear为例:
@@ -78,7 +78,8 @@ class RowParallelLinear(nn.Cell):
-另一个例子是[CoreAttention](https://gitee.com/mindspore/mindformers/blob/master/mindformers/experimental/graph/transformer/transformer.py),根据上述原则配置:
+另一个例子是CoreAttention,根据上述原则配置:
+
@@ -159,7 +160,7 @@ class FlashAttention(Cell):
|
-若直接使用MindFormers中开源且已经配好策略的类,则外部网络无需对算子再配置shard策略,如[LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/master/mindformers/models/llama/llama.py)。
+若直接使用MindSpore Transformers中开源且已经配好策略的类,则外部网络无需对算子再配置shard策略,如[LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/master/mindformers/models/llama/llama.py)。
|