diff --git a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
index b24f7c20422917abf8b9b751dd075b6ceddc0978..a9c557b23c1a2860c050ca393dd6c947c921519a 100644
--- a/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_en/features/parallel/pipeline_parallel.md
@@ -14,7 +14,7 @@ Related interfaces:

 2. `mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')`: Configures pipeline parallelism settings. `stages` specifies the total number of partitions for pipeline parallelism. `output_broadcast` determines whether to broadcast the output of the final pipeline stage to all other stages during inference. `interleave` shows that whether to enable interleaving scheduling.`scheduler` defines the pipeline scheduling strategy. Supported values: `gpipe` and `1f1b`.

-3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`: Pipeline parallelism requires wrapping the network with an additional layer of `Pipeline`, `micro_size` specifies the number of MicroBatch, which are finer-grained splits of a MiniBatch to improve hardware utilization. The final loss is the accumulation of losses from all MicroBatches. `stage_config` indicates the stage assignment for each Cell in the network. `micro_size` must be greater than or equal to the number of `stages`.
+3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`: Pipeline parallelism requires wrapping the network with an additional layer of `Pipeline`, `micro_size` specifies the number of MicroBatch, which are finer-grained splits of a MiniBatch to improve hardware utilization. If `nn.WithLossCell` is used to encapsulate `net`, the `Cell` names will be changed and the `_backbone` prefix will be added. The final loss is the accumulation of losses from all MicroBatches. `stage_config` indicates the stage assignment for each Cell in the network. `micro_size` must be greater than or equal to the number of `stages`.

 4. `mindspore.parallel.PipelineGradReducer(parameters, scale_sense=1.0, opt_shard=None)`: pipeline parallelism requires using `PipelineGradReducer` for gradient reduction. Because the output of pipeline parallelism is derived by the addition of several micro-batch outputs, as the gradient do.

diff --git a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
index d677c936ac83409a48c32c798a16e6b427bde1d7..2e280f4b06beb08ec02e87794985d33441996ded 100644
--- a/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
+++ b/docs/mindspore/source_zh_cn/features/parallel/pipeline_parallel.md
@@ -14,7 +14,7 @@

 2. `mindspore.parallel.auto_parallel.AutoParallel.pipeline(stages=1, output_broadcast=False, interleave=False, scheduler='1f1b')`:设置流水线并行配置。`stages`表示流水线并行需要设置的切分总数,`output_broadcast`表示流水线并行推理时,最后一个stage的结果是否广播给其他stage,`interleave`表示是否开启interleave优化策略,`scheduler`表示流水线并行的调度策略,当前支持`gpipe`和`1f1b`。

-3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`:流水线并行需要需要在network外再添加一层`Pipeline`,并通过`micro_size`指定MicroBatch的个数,以及指出网络中各Cell在哪个`stage`中执行。为了提升机器的利用率,MindSpore将MiniBatch切分成了更细粒度的MicroBatch,最终的loss则是所有MicroBatch计算的loss值累加。其中,micro_size必须大于等于stages的数量。
+3. `mindspore.parallel.Pipeline(network, micro_size=1, stage_config={"cell1":0, "cell2":1})`:流水线并行需要在network外再添加一层`Pipeline`,并通过`micro_size`指定MicroBatch的个数,以及指出网络中各Cell在哪个`stage`中执行。如果对于`net`使用`nn.WithLossCell`封装,则会改变`Cell`的名称,并增加`_backbone`前缀。为了提升机器的利用率,MindSpore将MiniBatch切分成了更细粒度的MicroBatch,最终的loss则是所有MicroBatch计算的loss值累加。其中,micro_size必须大于等于stages的数量。

 4. `mindspore.parallel.PipelineGradReducer(parameters, scale_sense=1.0, opt_shard=None)`:流水线并行需要使用`PipelineGradReducer`来完成梯度聚合。这是因为流水线并行中,其输出是由多个`MicroBatch`的结果相加得到,因此其梯度也需要进行累加。

diff --git a/tutorials/source_zh_cn/model_migration/model_migration.md b/tutorials/source_zh_cn/model_migration/model_migration.md
index c353364ce554f50424f2c091b8d703255178d75e..0667a065b1dec28bbdd976a2403c864eaa01be89 100644
--- a/tutorials/source_zh_cn/model_migration/model_migration.md
+++ b/tutorials/source_zh_cn/model_migration/model_migration.md
@@ -362,10 +362,7 @@ class Trainer:
         loss = self.loss_scale.unscale(loss)
         grads = self.loss_scale.unscale(grads)
         grads = self.grad_reducer(grads)
-        state = all_finite(grads)
-        if state:
-            self.opt(grads)
-
+        self.opt(grads)
         return loss, loss1, loss2

     def train(self, epochs):
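The `_backbone` prefix described in the doc changes above is the detail most likely to trip users up when filling in `stage_config`. Below is a minimal sketch of how the renaming can be checked and used; the toy network, its block names, the `micro_size` value, and the stage assignment are all illustrative assumptions rather than values from this patch, and the actual prefixed names should be confirmed with `cells_and_names()` before writing the config.

```python
import mindspore.nn as nn
from mindspore import parallel

# Toy backbone: the attribute names ("block1", "block2") become the Cell names
# that stage_config keys refer to. This network is purely illustrative.
class ToyNet(nn.Cell):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Dense(16, 16)
        self.block2 = nn.Dense(16, 2)

    def construct(self, x):
        return self.block2(self.block1(x))

net = ToyNet()
net_with_loss = nn.WithLossCell(net, nn.CrossEntropyLoss())

# WithLossCell stores the original network under the `_backbone` attribute, so
# the sub-cell names it exposes are prefixed accordingly. Print them to confirm
# the exact keys that stage_config expects:
for name, _ in net_with_loss.cells_and_names():
    print(name)  # e.g. "_backbone", "_backbone.block1", "_backbone.block2"

# Wrap with Pipeline using the prefixed names; micro_size (4 here, an arbitrary
# choice) must be >= the number of stages configured via AutoParallel.pipeline().
pipeline_net = parallel.Pipeline(
    net_with_loss,
    micro_size=4,
    stage_config={"_backbone.block1": 0, "_backbone.block2": 1},
)

# Gradients accumulated across MicroBatches are then reduced with PipelineGradReducer.
pipeline_grad_reducer = parallel.PipelineGradReducer(pipeline_net.trainable_params())
```

If the printed names carry no `_backbone` prefix (for example, when `Pipeline` wraps the bare backbone without a loss cell), the unprefixed names should be used as the `stage_config` keys instead.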