diff --git a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
index ad447e5954e974dcc5270b24ab9c6d114e37efea..fd29dce4cb9e57d22f4493a64a200fa57a0973f5 100644
--- a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
@@ -66,4 +66,10 @@ MindSpore has made memory optimization based on Megatron LM interleaved pipeline
 *Figure 5: MindSpore Scheduler of Interleaved Pipeline*
+### zero_bubble_v Pipeline Scheduler
+As shown in Figure 6, the zero_bubble_v pipeline scheduler further improves pipeline-parallel efficiency and reduces the bubble rate by splitting the backward computation into gradient computation and parameter update. The stage index assigned to consecutive model layers first increases and then decreases. For example, with 8 layers and 4 stages, stage 0 holds layers 0 and 7, stage 1 holds layers 1 and 6, stage 2 holds layers 2 and 5, and stage 3 holds layers 3 and 4.
+
+![zero_bubble_v.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/features/parallel/images/zero_bubble_v.png)
+
+*Figure 6: zero_bubble_v Pipeline Scheduler*
diff --git a/docs/mindspore/source_zh_cn/features/parallel/images/zero_bubble_v.png b/docs/mindspore/source_zh_cn/features/parallel/images/zero_bubble_v.png
new file mode 100644
index 0000000000000000000000000000000000000000..ce4a75f259fa1d9e14e6f5388aa617fed85a4021
Binary files /dev/null and b/docs/mindspore/source_zh_cn/features/parallel/images/zero_bubble_v.png differ
diff --git a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
index 32057ee1b11d0823044c35654473ec3e27c26046..74d6fb59880c0e15e7868aacf6dccdb2b284850b 100644
--- a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
@@ -66,4 +66,10 @@ MindSpore has made memory optimization based on Megatron-LM interleaved pipeline
 *Figure 5: MindSpore Scheduler of Interleaved Pipeline*
+### zero_bubble_v Pipeline Scheduler
+The zero_bubble_v pipeline scheduler splits the backward computation into gradient computation and parameter update, which further improves pipeline-parallel efficiency and reduces the bubble rate, as shown in Figure 6. Under this scheduler, the stage index assigned to consecutive model layers first increases and then decreases. For example, with 8 consecutive layers and 4 stages, stage 0 holds layers 0 and 7, stage 1 holds layers 1 and 6, stage 2 holds layers 2 and 5, and stage 3 holds layers 3 and 4.
+
+![zero_bubble_v.png](images/zero_bubble_v.png)
+
+*Figure 6: zero_bubble_v Pipeline Scheduler*
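
Note on the layer-to-stage assignment described in the added section: the "first increases, then decreases" rule can be illustrated with a small, framework-independent sketch. The function names `v_shape_stage` and `layers_on_stage` are hypothetical and are not MindSpore APIs. The handling of more than one layer per chunk (contiguous blocks of equal size) is an assumption that goes beyond the 8-layer, 4-stage example in the text.

```python
def v_shape_stage(layer_id: int, num_layers: int, num_stages: int) -> int:
    """Return the stage index of a layer under a V-shaped assignment.

    Hypothetical helper for illustration only, not a MindSpore API.
    Assumption: layers are grouped into 2 * num_stages equal contiguous chunks.
    """
    if num_layers % (2 * num_stages) != 0:
        raise ValueError("num_layers must be a multiple of 2 * num_stages")
    if not 0 <= layer_id < num_layers:
        raise ValueError("layer_id out of range")

    layers_per_chunk = num_layers // (2 * num_stages)
    chunk = layer_id // layers_per_chunk  # chunk index in 0 .. 2 * num_stages - 1

    # First half of the chunks: stage index increases (0, 1, ..., num_stages - 1).
    if chunk < num_stages:
        return chunk
    # Second half: stage index decreases back down to 0.
    return 2 * num_stages - 1 - chunk


def layers_on_stage(stage: int, num_layers: int, num_stages: int) -> list[int]:
    """List the layers that a given stage holds under the same assignment."""
    return [
        layer
        for layer in range(num_layers)
        if v_shape_stage(layer, num_layers, num_stages) == stage
    ]


if __name__ == "__main__":
    # The example from the text: 8 layers, 4 stages.
    for stage in range(4):
        print(f"stage {stage}: layers {layers_on_stage(stage, 8, 4)}")
```

With 8 layers and 4 stages, each chunk is a single layer, so the sketch reproduces the pairing given in the text (stage 0 with layers 0 and 7, through stage 3 with layers 3 and 4).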