diff --git a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
index b24f7c20422917abf8b9b751dd075b6ceddc0978..a9c557b23c1a2860c050ca393dd6c947c921519a 100644
--- a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
@@ -14,7 +14,7 @@ Related interfaces:

 2. `mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')`: Configures pipeline parallelism settings. `stages` specifies the total number of partitions for pipeline parallelism. `output_broadcast` determines whether to broadcast the output of the final pipeline stage to all other stages during inference. `interleave` shows that whether to enable interleaving scheduling.`scheduler` defines the pipeline scheduling strategy. Supported values: `gpipe` and `1f1b`.

-3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`: Pipeline parallelism requires wrapping the network with an additional layer of `Pipeline`, `micro_size` specifies the number of MicroBatch, which are finer-grained splits of a MiniBatch to improve hardware utilization. The final loss is the accumulation of losses from all MicroBatches. `stage_config` indicates the stage assignment for each Cell in the network. `micro_size` must be greater than or equal to the number of `stages`.
+3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`: Pipeline parallelism requires wrapping the network with an additional layer of `Pipeline`, `micro_size` specifies the number of MicroBatch, which are finer-grained splits of a MiniBatch to improve hardware utilization. If `nn.WithLossCell` is used to encapsulate `net`, the `Cell` names will be changed and the `_backbone` prefix will be added. The final loss is the accumulation of losses from all MicroBatches. `stage_config` indicates the stage assignment for each Cell in the network. `micro_size` must be greater than or equal to the number of `stages`.

 4. `mindspore.parallel.PipelineGradReducer(parameters, scale_sense=1.0, opt_shard=None)`: pipeline parallelism requires using `PipelineGradReducer` for gradient reduction. Because the output of pipeline parallelism is derived by the addition of several micro-batch outputs, as the gradient do.

diff --git a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
index d677c936ac83409a48c32c798a16e6b427bde1d7..2e280f4b06beb08ec02e87794985d33441996ded 100644
--- a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
@@ -14,7 +14,7 @@

 2. `mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')`:设置流水线并行配置。`stages`表示流水线并行需要设置的切分总数,`output_broadcast`表示流水线并行推理时,最后一个stage的结果是否广播给其他stage,`interleave`表示是否开启interleave优化策略,`scheduler`表示流水线并行的调度策略,当前支持`gpipe`和`1f1b`。

-3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`:流水线并行需要需要在network外再添加一层`Pipeline`,并通过`micro_size`指定MicroBatch的个数,以及指出网络中各Cell在哪个`stage`中执行。为了提升机器的利用率,MindSpore将MiniBatch切分成了更细粒度的MicroBatch,最终的loss则是所有MicroBatch计算的loss值累加。其中,micro_size必须大于等于stages的数量。
+3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`:流水线并行需要在network外再添加一层`Pipeline`,并通过`micro_size`指定MicroBatch的个数,以及指出网络中各Cell在哪个`stage`中执行。如果对于`net`使用`nn.WithLossCell`封装,则会改变`Cell`的名称,并增加`_backbone`前缀。为了提升机器的利用率,MindSpore将MiniBatch切分成了更细粒度的MicroBatch,最终的loss则是所有MicroBatch计算的loss值累加。其中,micro_size必须大于等于stages的数量。

 4. `mindspore.parallel.PipelineGradReducer(parameters, scale_sense=1.0, opt_shard=None)`:流水线并行需要使用`PipelineGradReducer`来完成梯度聚合。这是因为流水线并行中,其输出是由多个`MicroBatch`的结果相加得到,因此其梯度也需要进行累加。

diff --git a/tutorials/source_zh_cn/model_migration/model_migration.md b/tutorials/source_zh_cn/model_migration/model_migration.md
index c353364ce554f50424f2c091b8d703255178d75e..0667a065b1dec28bbdd976a2403c864eaa01be89 100644
--- a/tutorials/source_zh_cn/model_migration/model_migration.md
+++ b/tutorials/source_zh_cn/model_migration/model_migration.md
@@ -362,10 +362,7 @@ class Trainer:
         loss = self.loss_scale.unscale(loss)
         grads = self.loss_scale.unscale(grads)
         grads = self.grad_reducer(grads)
-        state = all_finite(grads)
-        if state:
-            self.opt(grads)
-
+        self.opt(grads)
         return loss, loss1, loss2

     def train(self, epochs):
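The `_backbone` prefix described in the doc changes above is the detail most likely to trip users up when filling in `stage_config`. Below is a minimal sketch of how the renaming can be checked and used; the toy network, its block names, the `micro_size` value, and the stage assignment are all illustrative assumptions rather than values from this patch, and the actual prefixed names should be confirmed with `cells_and_names()` before writing the config.

```python
import mindspore.nn as nn
from mindspore import parallel

# Toy backbone: the attribute names ("block1", "block2") become the Cell names
# that stage_config keys refer to. This network is purely illustrative.
class ToyNet(nn.Cell):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Dense(16, 16)
        self.block2 = nn.Dense(16, 2)

    def construct(self, x):
        return self.block2(self.block1(x))

net = ToyNet()
net_with_loss = nn.WithLossCell(net, nn.CrossEntropyLoss())

# WithLossCell stores the original network under the `_backbone` attribute, so
# the sub-cell names it exposes are prefixed accordingly. Print them to confirm
# the exact keys that stage_config expects:
for name, _ in net_with_loss.cells_and_names():
    print(name)  # e.g. "_backbone", "_backbone.block1", "_backbone.block2"

# Wrap with Pipeline using the prefixed names; micro_size (4 here, an arbitrary
# choice) must be >= the number of stages configured via AutoParallel.pipeline().
pipeline_net = parallel.Pipeline(
    net_with_loss,
    micro_size=4,
    stage_config={"_backbone.block1": 0, "_backbone.block2": 1},
)

# Gradients accumulated across MicroBatches are then reduced with PipelineGradReducer.
pipeline_grad_reducer = parallel.PipelineGradReducer(pipeline_net.trainable_params())
```

If the printed names carry no `_backbone` prefix (for example, when `Pipeline` wraps the bare backbone without a loss cell), the unprefixed names should be used as the `stage_config` keys instead.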