From 1d07ad5b9f6479f899c786c11d510e1678b1284c Mon Sep 17 00:00:00 2001
From: xiao_yao1994
Date: Tue, 29 Jul 2025 20:03:28 +0800
Subject: [PATCH] modify pipeline docs

---
 .../source_en/features/parallel/pipeline_parallel.md    | 4 ++--
 .../source_zh_cn/features/parallel/pipeline_parallel.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
index a57757f2b4..e42b37461b 100644
--- a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
@@ -12,7 +12,7 @@ Related interfaces:
 
 1. [mindspore.parallel.auto_parallel.AutoParallel(network, parallel_mode="semi_auto")](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.auto_parallel.AutoParallel.html): Encapsulates the specified parallel mode via static graph parallelism, where `network` is the top-level `Cell` or function to be encapsulated, and `parallel_mode` takes the value `semi_auto`, indicating a semi-automatic parallel mode. The interface returns a `Cell` encapsulated with parallel configuration.
 
-2. [mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.auto_parallel.AutoParallel.html#mindspore.parallel.auto_parallel.AutoParallel.pipeline): Configures pipeline parallelism settings. `stages` specifies the total number of partitions for pipeline parallelism. If using `WithLossCell` to encapsulate `net`, the name of the `Cell` will be changed and the `_backbone` prefix will be added. `output_broadcast` determines whether to broadcast the output of the final pipeline stage to all other stages during inference. `interleave` shows that whether to enable interleaving scheduling.`scheduler` defines the pipeline scheduling strategy. Supported values: `gpipe` and `1f1b`.
+2. [mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.auto_parallel.AutoParallel.html#mindspore.parallel.auto_parallel.AutoParallel.pipeline): Configures pipeline parallelism settings. `stages` specifies the total number of partitions for pipeline parallelism. If using `WithLossCell` to encapsulate `net`, the name of the `Cell` will be changed and the `_backbone` prefix will be added. `output_broadcast` determines whether to broadcast the output of the final pipeline stage to all other stages during inference. `interleave` indicates whether to enable interleaved scheduling. `scheduler` defines the pipeline scheduling strategy. Supported values: `gpipe`/`1f1b`/`seqpipe`/`seqvpp`/`seqsmartvpp`/`zero_bubble_v`.
 
 3. [mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.nn.Pipeline.html): Pipeline parallelism requires wrapping the `network` with an additional layer of `Pipeline`, `micro_size` specifies the number of MicroBatch, which are finer-grained splits of a MiniBatch to improve hardware utilization. If using `WithLossCell` to encapsulate `network`, the name of the `Cell` will be changed and the `_backbone` prefix will be added. The final loss is the accumulation of losses from all MicroBatches. `stage_config` indicates the stage assignment for each Cell in the network. `micro_size` must be greater than or equal to the number of `stages`.
 
@@ -68,7 +68,7 @@ MindSpore has made memory optimization based on Megatron LM interleaved pipeline
 
 ### zero_bubble_v Pipeline Scheduler
 
-As shown in Figure 6, zero_bubble_v pipeline parallelism further improves pipeline parallel efficiency and reduces bubble rate by dividing the backward computation into gradient computation and parameter update. For consecutive model layers, the stage value first increases and then decreases. For example, for 8 layers, when the stage size is 4, stage 0 has layer 0 and 7, stage 1 has layer 1 and 6, stage 2 has 2 and 5, stage 3 has layer 3 and 4.
+As shown in Figure 6, zero_bubble_v pipeline parallelism further improves pipeline parallel efficiency and reduces the bubble rate by dividing the backward computation into gradient computation and parameter update. For consecutive model layers, the stage value first increases and then decreases; the pipeline_segment of the first half of the layers is 0, and the pipeline_segment of the second half is 1. For example, with 8 layers and a stage size of 4, stage 0 has layer 0 and layer 7, stage 1 has layer 1 and layer 6, stage 2 has layer 2 and layer 5, and stage 3 has layer 3 and layer 4; the pipeline_segment of layer 0 to layer 3 is 0, and the pipeline_segment of layer 4 to layer 7 is 1.
 
 ![mpp2.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/features/parallel/images/zero_bubble_v.png)
 
diff --git a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
index b377c3894b..dd8393cf49 100644
--- a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
@@ -12,7 +12,7 @@
 
 1. [mindspore.parallel.auto_parallel.AutoParallel(network, parallel_mode="semi_auto")](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.auto_parallel.AutoParallel.html):通过静态图并行封装指定并行模式,其中`network`是待封装的顶层`Cell`或函数,`parallel_mode`取值`semi_auto`,表示半自动并行模式。该接口返回封装后包含并行配置的`Cell`。
 
-2. [mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.auto_parallel.AutoParallel.html#mindspore.parallel.auto_parallel.AutoParallel.pipeline):设置流水线并行配置。`stages`表示流水线并行需要设置的切分总数,`output_broadcast`表示流水线并行推理时,最后一个stage的结果是否广播给其他stage,`interleave`表示是否开启interleave优化策略,`scheduler`表示流水线并行的调度策略,当前支持`gpipe`和`1f1b`。
+2. [mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.auto_parallel.AutoParallel.html#mindspore.parallel.auto_parallel.AutoParallel.pipeline):设置流水线并行配置。`stages`表示流水线并行需要设置的切分总数,`output_broadcast`表示流水线并行推理时,最后一个stage的结果是否广播给其他stage,`interleave`表示是否开启interleave优化策略,`scheduler`表示流水线并行的调度策略,当前支持`gpipe`/`1f1b`/`seqpipe`/`seqvpp`/`seqsmartvpp`/`zero_bubble_v`。
 
 3. [mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.nn.Pipeline.html):流水线并行需要需要在`network`外再添加一层`Pipeline`,并通过`micro_size`指定MicroBatch的个数,以及指出网络中各Cell在哪个`stage`中执行。如果对于`network`使用`nn.WithLossCell`封装,则会改变`Cell`的名称,并增加`_backbone`前缀。为了提升机器的利用率,MindSpore将MiniBatch切分成了更细粒度的MicroBatch,最终的loss则是所有MicroBatch计算的loss值累加。其中,micro_size必须大于等于stages的数量。
 
@@ -68,7 +68,7 @@ MindSpore在Megatron-LM的interleaved pipeline调度的基础上做了内存优
 
 ### zero_bubble_v pipeline调度
 
-zero_bubble_v pipeline调度通过将反向计算过程拆分为梯度计算与参数更新,进一步提升流水线并行的效率,减少Bubble率,如图6所示。在zero_bubble_v pipeline调度中,对于连续的模型层,stage的数值会先增大后减小。例如:对于有8个连续层,stage为4时,stage0有第0和第7层,stage1有第1和第6层,stage2有第2和第5层,stage3有第3和第4层。
+zero_bubble_v pipeline调度通过将反向计算过程拆分为梯度计算与参数更新,进一步提升流水线并行的效率,减少Bubble率,如图6所示。在zero_bubble_v pipeline调度中,对于连续的模型层,stage的数值会先增大后减小,前一半层的pipeline_segment为0,后一半层的pipeline_segment为1。例如:对于有8个连续层,stage为4时,stage0有第0和第7层,stage1有第1和第6层,stage2有第2和第5层,stage3有第3和第4层,第0层到第3层的pipeline_segment为0,第4层到第7层的pipeline_segment为1。
 
 ![zero_bubble_v.png](images/zero_bubble_v.png)
--
Gitee
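As a supplement to the interface descriptions updated in the English hunk above, the following minimal sketch shows how `AutoParallel`, `AutoParallel.pipeline`, and `Pipeline` might be combined. The import paths, the hypothetical `MyNet` cell, the shapes, and the chosen parameter values are illustrative assumptions rather than excerpts from the documentation; consult the linked API pages for the exact usage.

```python
# Minimal sketch, assuming the import paths below and a hypothetical MyNet cell.
import mindspore.nn as nn
from mindspore.parallel import Pipeline                    # assumed import path per the doc's interface name
from mindspore.parallel.auto_parallel import AutoParallel  # assumed import path per the doc's interface name


class MyNet(nn.Cell):
    """Hypothetical two-cell network, named to match the stage_config example in the docs."""
    def __init__(self):
        super().__init__()
        self.cell1 = nn.Dense(16, 16)   # assigned to stage 0 via stage_config below
        self.cell2 = nn.Dense(16, 4)    # assigned to stage 1 via stage_config below

    def construct(self, x):
        return self.cell2(self.cell1(x))


# Wrap the network with Pipeline: micro_size must be >= the number of stages,
# and stage_config maps each Cell to its pipeline stage.
net = Pipeline(MyNet(), micro_size=4, stage_config={"cell1": 0, "cell2": 1})

# Encapsulate with semi-automatic parallelism and configure pipeline scheduling.
parallel_net = AutoParallel(net, parallel_mode="semi_auto")
parallel_net.pipeline(stages=2, output_broadcast=False, interleave=False,
                      scheduler="1f1b")  # other documented values: gpipe/seqpipe/seqvpp/seqsmartvpp/zero_bubble_v
```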
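The zero_bubble_v paragraphs above describe a V-shaped layer-to-stage mapping. The helper below is not a MindSpore API; it is an illustrative sketch that reproduces the documented 8-layer, 4-stage example, under the assumption that the number of layers is exactly twice the number of stages.

```python
def zero_bubble_v_placement(num_layers, num_stages):
    """Return one (stage, pipeline_segment) pair per layer for the V-shaped mapping."""
    # Sketch only covers the documented case where each stage holds exactly two layers.
    assert num_layers == 2 * num_stages
    placement = []
    for layer in range(num_layers):
        if layer < num_stages:
            placement.append((layer, 0))                       # stage value rises, segment 0
        else:
            placement.append((2 * num_stages - 1 - layer, 1))  # stage value falls, segment 1
    return placement


# 8 layers on 4 stages: stage 0 holds layers 0 and 7, stage 1 holds 1 and 6,
# stage 2 holds 2 and 5, stage 3 holds 3 and 4; layers 0-3 are segment 0, layers 4-7 segment 1.
print(zero_bubble_v_placement(8, 4))
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1), (1, 1), (0, 1)]
```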