diff --git a/docs/mindspore/migration_guide/source_en/migration_script.md b/docs/mindspore/migration_guide/source_en/migration_script.md
index bf8879dfd9af83a095313380fff0ec77a15ceaf8..b9c054ea0a65048a807b7c5a85a0c05bcb51da88 100644
--- a/docs/mindspore/migration_guide/source_en/migration_script.md
+++ b/docs/mindspore/migration_guide/source_en/migration_script.md
@@ -12,7 +12,7 @@ This document describes how to migrate network scripts from the TensorFlow or Py
Migrate scripts by reading the TensorBoard graphs.
-1. The [PoseNet](https://arxiv.org/pdf/1505.07427v4.pdf) implemented by TensorFlow is used as an example to show how to use TensorBoard to read graphs, write mindspore code, and migrate [TensorFlow Models](https://github.com/kentsommer/tensorflow-posenet) to MindSpore.
+1. The [PoseNet](https://arxiv.org/pdf/1505.07427v4.pdf) implemented by TensorFlow is used as an example to show how to use TensorBoard to read graphs, write MindSpore code, and migrate [TensorFlow Models](https://github.com/kentsommer/tensorflow-posenet) to MindSpore.
> The PoseNet code mentioned here is based on Python2. You need to make some syntax changes to run on Python3. Details are not described here.
@@ -36,7 +36,7 @@ Migrate scripts by reading the TensorBoard graphs。
Step 2, the result of step 1 and the second and third inputs are used to calculate the loss in the loss subnet.
- Step 3, construct the reverse network by using `TrainOneStepCell` automatic differentiation. Use the Adam optimizer and attributes provided by TensorFlow to write the corresponding Mindspore optimizer to update parameters. The network backbone can write as follows:
+ Step 3, construct the backward network by using `TrainOneStepCell` automatic differentiation. Use the Adam optimizer and attributes provided by TensorFlow to write the corresponding MindSpore optimizer to update parameters. The network backbone can be written as follows:
```python
import mindspore
@@ -152,7 +152,7 @@ Migrate scripts by reading the TensorBoard graphs。
return output
```
- The Mindspore subnet is defined as follows:
+ The MindSpore subnet is defined as follows:
```python
from mindspore import nn
@@ -453,7 +453,7 @@ Read the PyTorch script to migrate directly.
model.train(epoch_size, dataset)
```
-PyTorch and mindspore have similar definitions of some basic APIs, such as [mindspore.nn.SequentialCell](https://www.mindspore.cn/docs/api/en/master/api_python/nn/mindspore.nn.SequentialCell.html#mindspore.nn.SequentialCell) and [torch.nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential). In addition, some operator APIs may be not the same. This section lists some common API comparisons. For more information, see the [MindSpore and PyTorch API mapping](https://www.mindspore.cn/docs/note/en/master/index.html#operator_api) on Mindspore's official website.
+PyTorch and MindSpore have similar definitions of some basic APIs, such as [mindspore.nn.SequentialCell](https://www.mindspore.cn/docs/api/en/master/api_python/nn/mindspore.nn.SequentialCell.html#mindspore.nn.SequentialCell) and [torch.nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential). However, some operator APIs may not be the same. This section lists some common API comparisons. For more information, see the [MindSpore and PyTorch API mapping](https://www.mindspore.cn/docs/note/en/master/index.html#operator_api) on MindSpore's official website.
| PyTorch | MindSpore |
| :-------------------------------: | :------------------------------------------------: |
@@ -465,4 +465,4 @@ PyTorch and mindspore have similar definitions of some basic APIs, such as [mind
| torch.nn.Linear | mindspore.nn.Dense |
| torch.nn.PixelShuffle | mindspore.ops.operations.DepthToSpace |
-It should be noticed that although `torch.nn.MaxPool2d` and `mindspore.nn.MaxPool2d` are similar in interface definition, and Mindspore actually invokes the `MaxPoolWithArgMax` operator during training on Ascend. The function of this operator is the same as that of TensorFlow, during the migration, and the MindSpore output after the MaxPool layer is inconsistent with that of PyTorch. Theoretically, it's not affect the final training result.
+It should be noted that although `torch.nn.MaxPool2d` and `mindspore.nn.MaxPool2d` are similar in interface definition, MindSpore actually invokes the `MaxPoolWithArgMax` operator during training on Ascend. This operator behaves the same as its TensorFlow counterpart, so during the migration the MindSpore output after the MaxPool layer may be inconsistent with that of PyTorch. Theoretically, this does not affect the final training result.
diff --git a/docs/mindspore/migration_guide/source_en/neural_network_debug.md b/docs/mindspore/migration_guide/source_en/neural_network_debug.md
index 5a6f1e61461a14ba5369a549739cbfeb27ce7cb3..4b2ca75bc4e7a75fa733093f750ffdf753679666 100644
--- a/docs/mindspore/migration_guide/source_en/neural_network_debug.md
+++ b/docs/mindspore/migration_guide/source_en/neural_network_debug.md
@@ -37,23 +37,23 @@ This section introduces the problems and solutions during Network Debugging proc
For script development and network process debugging, we recommend using the PyNative mode for debugging. The PyNative mode supports executing single operators, normal functions and networks, as well as separate operations for computing gradients. In PyNative mode, you can easily set breakpoints and get intermediate results of network execution, and you can also debug the network by means of pdb.
-By default, MindSpore is in PyNative mode, which can also be defined explicitly via `context.set_context(mode=context.PYNATIVE_MODE)`. Related examples can be found in [Debugging With PyNative Mode](https://www.mindspore.cn/docs/programming_guide/en/master/debug_in_pynative_mode.html#pynative).
+By default, MindSpore is in Graph mode, and you can switch to PyNative mode via `context.set_context(mode=context.PYNATIVE_MODE)`. Related examples can be found in [Debugging With PyNative Mode](https://www.mindspore.cn/docs/programming_guide/en/master/debug_in_pynative_mode.html#pynative).
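+
+For example, a minimal sketch of switching to PyNative mode (the `device_target` value here is an assumption and depends on your environment):
+
+```python
+import numpy as np
+from mindspore import context, Tensor
+
+# switch to PyNative mode so that single operations execute eagerly and can be stepped through with pdb
+context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU")
+
+x = Tensor(np.ones((2, 3)).astype(np.float32))
+y = Tensor(np.ones((2, 3)).astype(np.float32))
+print(x + y)  # the result of the single operation is available immediately
+```
+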
#### Getting More Error Messages
-During the network process debugging, if you need to get more information about error reports, you can get it by the following ways:
+During the network process debugging, if you need to get more information about error messages, you can do so in the following ways:
- Using pdb for debugging in PyNative mode, and using pdb to print relevant stack and contextual information to help locate problems.
- Using Print operator to print more contextual information. Related examples can be found in [Print Operator Features](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#print).
-- Adjusting the log level to get more error information, MindSpore can easily adjust the log level through environment variables. Related examples can be found in [Logging-related Environment Variables And Configurations](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#id6).
+- Adjusting the log level to get more error information. MindSpore can easily adjust the log level through environment variables. Related examples can be found in [Logging-related Environment Variables And Configurations](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#id6).
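+
+For instance, a minimal sketch of printing intermediate results with the `Print` operator mentioned above (the toy network below is a hypothetical example):
+
+```python
+import numpy as np
+import mindspore.nn as nn
+import mindspore.ops as ops
+from mindspore import Tensor, context
+
+context.set_context(mode=context.GRAPH_MODE)
+
+class PrintNet(nn.Cell):
+    """A toy network that prints an intermediate result at runtime."""
+    def __init__(self):
+        super(PrintNet, self).__init__()
+        self.print = ops.Print()
+        self.relu = nn.ReLU()
+
+    def construct(self, x):
+        out = self.relu(x)
+        self.print("relu output:", out)  # printed during graph execution
+        return out
+
+net = PrintNet()
+net(Tensor(np.array([-1.0, 0.0, 2.0]).astype(np.float32)))
+```
+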
#### Common Errors
During network process debugging, the common errors are as follows:
-- The operator execution reports an error.
+- Operator execution errors.
- During the network process debugging, errors are often reported in the execution of arithmetic such as shape mismatch and unsupported dtype. Then, according to the error message, you should check whether the arithmetic is used correctly and whether the shape of the input data is consistent with the expectation and make corresponding modifications.
+ During network process debugging, operator execution errors such as shape mismatch and unsupported dtype are often reported. According to the error message, you should check whether the operator is used correctly and whether the shape of the input data is consistent with the expectation, and then make corresponding modifications.
Support for related operators and API introductions can be found in [Operator Support List](https://www.mindspore.cn/docs/note/en/master/operator_list.html) and [Operators Python API](https://www.mindspore.cn/docs/api/en/master/index.html).
@@ -67,11 +67,11 @@ During network process debugging, the common errors are as follows:
### Loss Value Comparison
-Having a benchmark script, the loss values run by the benchmark script can be compared with those run by the MindSpore script which can be used to verify the correctness of the overall network structure and the accuracy of the operator.
+With a benchmark script, the loss values run by the benchmark script can be compared with those run by the MindSpore script, which can be used to verify the correctness of the overall network structure and the accuracy of the operator.
#### Main Steps
-1. Guaranteed Identical Input
+1. Guaranteeing Identical Input
It is necessary to ensure that the inputs are the same in both networks, so that they can have the same network output in the same network structure. The same inputs can be guaranteed in the following ways:
@@ -81,21 +81,21 @@ Having a benchmark script, the loss values run by the benchmark script can be co
input = Tensor(np.random.randint(0, 10, size=(3, 5, 10)).astype(np.float32))
```
- - Using the same dataset for computation. MindSpore supports the use of the TFRecord dataset, which can be read using the `mindspore.dataset.TFRecordDataset` interface.
+ - Using the same dataset for computation. MindSpore supports the use of the TFRecord dataset, which can be read by using the `mindspore.dataset.TFRecordDataset` interface.
-2. Removing The Influence Of Randomness In The Network
+2. Removing the Influence of Randomness in the Network
- The main methods to remove the effect of randomness in the network are to set the same randomness seed, turn off the data shuffle, modify the code to remove the effect of dropout, initializer and other operators with randomness in the network, etc.
+ The main methods to remove the effect of randomness in the network are to set the same random seed, turn off data shuffling, and modify the code to remove the effect of random operators in the network such as dropout and initializer.
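+
+ A brief sketch of fixing the random seeds (the seed value is arbitrary):
+
+ ```python
+ import mindspore
+ import mindspore.dataset as ds
+
+ mindspore.set_seed(1)   # fixes the seed used by random operators and weight initializers
+ ds.config.set_seed(1)   # fixes the seed used by dataset shuffling
+ ```
+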
-3. Ensuring The Same Settings For The Relevant Hyperparameters
+3. Ensuring the Same Settings for the Relevant Hyperparameters
It is necessary to ensure the same settings for the hyperparameters in the network in order to guarantee the same input and the same output of the operator.
-4. Running the network and comparing the output loss values. Generally, the error of the loss value is about 1%. Because the operator itself has a certain accuracy error. As the number of steps increases, the error will have a certain accumulation.
+4. Running the network and comparing the output loss values. Generally, the error of the loss value is about 1‰, because the operator itself has a certain accuracy error, and the error accumulates as the number of steps increases.
#### Related Issues Locating
-If the loss errors are large, the problem locating can be done using following ways:
+If the loss errors are large, the problem can be located in the following ways:
- Checking whether the input and hyperparameter settings are the same, and whether the randomness effect is completely removed.
@@ -127,7 +127,7 @@ If the loss errors are large, the problem locating can be done using following w
- [Callback Function](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#callback)
- MindSpore has provided ModelCheckpoint, LossMonitor, SummaryCollector and other Callback classes for saving model parameters, monitoring loss values, saving training process information, etc. Users can also customize Callback functions like starting and ending runs at each epoch and step, please refer to [Custom Callback](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#id3) for specific examples.
+ MindSpore provides ModelCheckpoint, LossMonitor, SummaryCollector and other Callback classes for saving model parameters, monitoring loss values, saving training process information, etc. Users can also customize Callback functions to run custom logic at the beginning and end of each epoch and step; please refer to [Custom Callback](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#id3) for specific examples.
- [MindSpore Metrics Function](https://www.mindspore.cn/docs/programming_guide/en/master/custom_debugging_info.html#mindspore-metrics)
@@ -139,7 +139,7 @@ If the loss errors are large, the problem locating can be done using following w
- Customized Learning Rate
- MindSpore provides some common implementations of dynamic learning rate and some common optimizers with adaptive learning rate adjustment functions, referring to [Dynamic Learning Rate](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.nn.html#dynamic-learning-rate) and [Optimizer Functions](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.nn.html#optimizer-functions) in the API documentation.
+ MindSpore provides some common implementations of dynamic learning rate and some common optimizers with adaptive learning rate adjustment. For details, see [Dynamic Learning Rate](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.nn.html#dynamic-learning-rate) and [Optimizer Functions](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.nn.html#optimizer-functions) in the API documentation.
At the same time, the user can implement a customized dynamic learning rate, as exemplified by WarmUpLR:
@@ -166,7 +166,7 @@ If the loss errors are large, the problem locating can be done using following w
#### Hyper-Parameter Optimization with MindOptimizer
-MindSpore provides MindOptimizer tools to help users perform hyper-parameter optimization conveniently, please refer to [Hyper-Parameter Optimization With MindOptimizer](https://www.mindspore.cn/mindinsight/docs/en/master/hyper_parameters_auto_tuning.html) for detailed examples and usage methods.
+MindSpore provides the MindOptimizer tool to help users perform hyper-parameter optimization conveniently. Detailed examples and usage methods can be found in [Hyper-Parameter Optimization With MindOptimizer](https://www.mindspore.cn/mindinsight/docs/en/master/hyper_parameters_auto_tuning.html).
#### Loss Value Anomaly Locating
@@ -176,11 +176,11 @@ For cases where the loss value is INF, NAN, or the loss value does not converge,
In the scenario of using loss_scale with mixed precision, the situation that the loss value is INF and NAN may be caused by the scale value being too large. If it is dynamic loss_scale, the scale value will be adjusted automatically; if it is static loss_scale, the scale value needs to be reduced.
- If the `scale=1` case still has a loss value of INF, NAN, then there should be an overflow of operators in the network and further investigation for locating the problem is needed.
+ If the loss value is still INF or NAN when `scale=1`, there is probably an operator overflow in the network, and further investigation is needed to locate the problem.
2. Abnormal loss values may be caused by abnormal input data, operator overflow, gradient vanishing, gradient explosion, etc.
- To check the intermediate value of the network such as operator overflow, gradient of 0, abnormal weight, gradient disappearance and gradient explosion, it is recommended to use [MindInsight Debugger](https://www.mindspore.cn/mindinsight/docs/en/master/debugger.html) to set the corresponding detection points for detection and debugging, which can locate the problem in a more comprehensive way with stronger debuggability.
+ To check intermediate values of the network such as operator overflow, zero gradients, abnormal weights, gradient vanishing and gradient explosion, it is recommended to use the [MindInsight Debugger](https://www.mindspore.cn/mindinsight/docs/en/master/debugger.html) to set the corresponding detection points for detection and debugging, which can locate the problem in a more comprehensive way with stronger debuggability.
The following are a few simple initial troubleshooting methods:
@@ -215,7 +215,7 @@ For cases where the loss value is INF, NAN, or the loss value does not converge,
print('same params num: ', same)
```
- - Checking whether there is NAN, INF abnormal data in the weight value, you can also load the Checkpoint file for a brief judgment. In general, if there is NAN, INF in the weight value, then there is also NAN, INF in the gradient calculation, and there may be an overflow situation. The relevant code reference is as follows:
+ - To check whether there is NAN or INF abnormal data in the weight values, you can also load the Checkpoint file for a quick judgment. In general, if there is NAN or INF in the weight values, there is also NAN or INF in the gradient calculation, and an overflow may have occurred. The relevant code reference is as follows:
```python
import mindspore
diff --git a/docs/mindspore/migration_guide/source_en/overview.md b/docs/mindspore/migration_guide/source_en/overview.md
index e99101a2be54e192bff7c47bd60b04563a48d117..d14434ee34e864be0b515d2361866a3297a4b584 100644
--- a/docs/mindspore/migration_guide/source_en/overview.md
+++ b/docs/mindspore/migration_guide/source_en/overview.md
@@ -4,7 +4,7 @@
This migration guide describes the complete steps for migrating neural networks from other machine learning frameworks to MindSpore.
-To prepare for the migration process, configure the necessary environment and then analyze the operators contained in the network script. MindSpore script development starts from data processing code, uses MindConverter to build a network to obtain the migrated network script, and finally migrates the inference execution script. After the build is complete, the optimization process includes development and debugging of missing operators and optimization of network performance and accuracy. The migration guide provides solutions to common problems in the migration process and complete network migration examples. Examples are provided in each chapter for reference. The following figure shows the migration process.
+To prepare for the migration process, configure the necessary environment and then analyze the operators contained in the network script. MindSpore script development starts from data processing code, uses MindConverter for network building and obtains the migrated network script, and finally migrates the inference execution script. After the build is complete, the optimization process includes development and debugging of missing operators and optimization of network performance and accuracy. The migration guide provides solutions to common problems in the migration process and complete network migration examples. Examples are provided in each chapter for reference. The following figure shows the migration process.

@@ -22,15 +22,15 @@ After the network script analysis is complete, you can use MindSpore to develop
## Operator Development and Debugging
-Some operators are not supported when the network is migrated to the MindSpore framework. You can provide feedback to the MindSpore developer community or develop custom MindSpore operators. This chapter includes tutorials and examples for operator development, as well as common debugging skills.
+Some operators are not supported when the network is migrated to the MindSpore framework. In addition to providing feedback to the MindSpore developer community, you can develop customized MindSpore operators. This chapter includes tutorials and examples for operator development, as well as common debugging skills.
## Network Debugging
-After the network script is developed and the operator is supplemented, you need to debug the model to ensure that the output result is correct. This chapter describes the common network debugging ideas: single-step debugging and multi-round iterative debugging. Common debugging methods include comparing subnet output results in PyNative mode. MindSpore also supports custom debugging information. At last, the solutions to common problems are provided.
+After the network script is developed and the operators are supplemented, you need to debug the model to ensure that the output result is correct. This chapter describes the common network debugging ideas: single-step debugging and multi-round iterative debugging. Common debugging methods include comparing subnet output results in PyNative mode. MindSpore also supports customized debugging information. Finally, the solutions to common problems are provided.
## Accuracy and Performance Tuning
-After the network script debugging is complete and the result can be successfully output, you need to tune the model to achieve the expected performance. MindSpore provides developers with the profiler tool which provides easy-to-use and abundant tuning functions in terms of operator performance, iteration performance, and data processing performance, helping users quickly locate and solve performance problems. The tutorials are classified into tuning on the Ascend platform and that on the GPU platform, and three examples of using the profiler tool are provided.
+After the network script debugging is complete and the result can be successfully output, you need to tune the model to achieve the expected performance. MindSpore provides developers with the Profiler tool which provides easy-to-use and abundant tuning functions in terms of operator performance, iteration performance, and data processing performance, to help users quickly locate and solve performance problems. The tutorials are classified into tuning on the Ascend platform and on the GPU platform, and three examples of using the Profiler tool are provided.
## Inference Execution
@@ -38,8 +38,8 @@ MindSpore can execute inference tasks on different hardware platforms based on t
## Network Migration Debugging Example
-This chapter provides a complete network migration example. Using ResNet-50 as an example, this chapter describes how to analyze and reproduce the benchmark network, how to develop scripts, and how to debug and optimize the accuracy. In addition, this chapter lists common problems and corresponding optimization methods during the migration, for example, multi-node synchronization problems and framework performance problems.
+This chapter provides a complete network migration example. Using ResNet-50 as an example, this chapter describes how to analyze and reproduce the benchmark network, how to develop scripts, and how to debug and optimize the accuracy. In addition, this chapter lists common problems and corresponding optimization methods during the migration, for example, multi-node synchronization problems and framework performance problems.
## FAQs
-This chapter lists the frequently asked questions (FAQs) and solutions during network migration.
+This chapter lists the frequently asked questions and solutions during network migration.
diff --git a/docs/mindspore/migration_guide/source_en/performance_optimization.md b/docs/mindspore/migration_guide/source_en/performance_optimization.md
index b6c45c2519305e3d195d3c7b747d04cb8567b691..551f2d68eec78853bf6e6ab3392832ff27ab34db 100644
--- a/docs/mindspore/migration_guide/source_en/performance_optimization.md
+++ b/docs/mindspore/migration_guide/source_en/performance_optimization.md
@@ -1,10 +1,10 @@
-# Using Performance Profiling Tool
+# Performance Profiling
Profiler provides performance tuning ability for MindSpore, and provides easy-to-use and rich debugging functions in operator performance, data processing performance, etc., helping users quickly locate and solve performance problems.
-This chapter introduces the common methods and cases of performance tuning in neural networks, as well as the resolution of some common problems.
+This chapter introduces the common methods and cases of performance tuning, as well as the solutions to some common problems.
## Quick Start
@@ -20,31 +20,32 @@ This section will introduce the common use of MindSpore Profiler through three t
### Case 1: Long Step Interval
-We run ResNet50 training script in MindSpore [ModelZoo](https://gitee.com/mindspore/models/tree/master ) with batch size set to 32, and we find that each step cost almost 90ms.
-As we observed on the MindInsight UI page, the step interval in the step trace is too long, which may indicate that performance can be optimized in the dataset processing process.
+We run the ResNet50 training script in MindSpore [ModelZoo](https://gitee.com/mindspore/models/tree/master ) with batch size set to 32, and we find that each step costs almost 90ms, which is poor performance.
+As we observed on the MindInsight UI page, the step interval in the step trace is too long, which may indicate that data is the performance bottleneck.

*Figure 1: Long Step Interval in Step Trace*
-Looking at the ```Step Interval``` tab in ```Data Preparation details``` page, we can see that the ratio of full queues in ```Host Queue``` is low, which can be preliminarily determined that the performance related to dataset processing can be improved.
+Looking at the iteration gap tab on the data preparation details page, we observe that the data queue holds more data in the early stage and the amount of data later drops to 0. This is because the loading and augmentation of the dataset already starts during the graph compilation stage, so multiple pieces of data are cached in the queue.
+
+After normal training begins in the later stage, the data in the queue is consumed faster than it is produced, so the data queue gradually becomes empty, indicating that data becomes the bottleneck at this point. The same is true when observing the host queue.
+
+Based on this comprehensive analysis, data processing is the performance bottleneck during normal training. Therefore, you need to go to the data processing tab on the data preparation details page to see the specific issue.

*Figure 2: Data Preparation Details -- Step Interval*
-Switch to the ```Data Processing``` tab to find which operator is slower.
+By observing the `queue relationship between operators` in the Data Processing tab, we find that the usage of `Queue_3` and the subsequent queues is low, that is, `MapOp_3`, as the producer of the data, is relatively slow. Therefore, we can determine that there is still room to optimize the performance of `MapOp_3`, and we try to optimize this operator.

*Figure 3: Data Preparation Details -- Data Processing*
-By observing the ```Queue relationship between operators```, we find that the average usage of ```Queue_3``` is relatively inefficient.
-
-Therefore, it can be determined that we can adjust the corresponding dataset operators, ```MapOp_3```, to achieve better performance.
We can refer to [Optimizing the Data Processing](https://www.mindspore.cn/docs/programming_guide/en/master/optimize_data_processing.html ) to adjust dataset operators to improve dataset performance.
-We observe that the ```num_parallel_workers``` parameter of map operator is 1(default value) in ResNet50 training script, code is shown below:
+By observing the data processing code in the ResNet50 training script, we find that the `num_parallel_workers` parameter of the map operator is 1 (the default value). The code is shown below:
```python
if do_train:
@@ -66,13 +67,13 @@ else:
data_set = data_set.map(operations=trans, input_columns="image")
```
-Therefore we try to increase the 'num_parallel_workers' parameter to 12 and run training script again. Optimization code is shown below:
+Therefore, we try to increase the `num_parallel_workers` parameter to 12 and run the training script again. The optimized code is shown below:
```python
data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=12)
```
-We see on the MindInsight UI page that step interval is shorten from 72.8ms to 0.25ms.
+By observing the step trace on the MindInsight performance analysis page, we can see that the step interval is shortened from 72.8ms to 0.25ms, and the time of each step is shortened from 90ms to 18.07ms.

@@ -80,22 +81,21 @@ We see on the MindInsight UI page that step interval is shorten from 72.8ms to 0
### Case 2: Long Forward Propagation Interval
-We run VGG16 eval script in MindSpore [ModelZoo](https://gitee.com/mindspore/models/tree/master ) , and each step cost almost 113.79ms.
+We run the VGG16 inference script in MindSpore [ModelZoo](https://gitee.com/mindspore/models/tree/master ), and each step costs almost 113.79ms, which is poor performance.
-As we observed on the MindInsight UI page, the forward propagation in the step trace is too long, which may indicate that operators performance can be optimized.
+As we observed on the MindInsight UI page, the forward propagation in the step trace is too long, which may indicate that operator performance can be optimized. In a single-card training or inference process, when the forward pass is time-consuming, we usually consider whether there is a time-consuming operator that can be optimized.

*Figure 5: Long FP interval in Step Trace*
-From the details page of ```Operator Time Consumption Ranking``` we find that ```MatMul``` operator is time-consuming.
+Opening the details page of Operator Time Consumption Ranking, we find that the MatMul operator is time-consuming.

*Figure 6: Finding operators that can be optimized via the details page of Operator Time Consumption Ranking*
-Usually float16 type can be used to improve operator performance if there is no difference in accuracy between float16 and float32 type. We can refer to
-[Enabling Mixed Precision](https://www.mindspore.cn/docs/programming_guide/en/master/enable_mixed_precision.html ) to improve operators performance.
+For operator time consumption optimization, the float16 type, which requires less computation, can usually be used to improve operator performance if there is no accuracy difference between float16 and float32. We can refer to [Enabling Mixed Precision](https://www.mindspore.cn/docs/programming_guide/en/master/enable_mixed_precision.html ) to improve operator performance.
Optimization code is shown below:
@@ -106,7 +106,7 @@ network = vgg16(config.num_classes, config, phase="test")
network.add_flags_recursive(fp16=True)
```
-We run eval script again after set ```fp16``` flag, and the forward propagation interval is shorten from 82.45ms to 16.89ms.
+After the float16 format is set, the inference script is run again. Observing the step trace on the MindInsight performance analysis page, we can see that the forward propagation interval is shortened from 82.45ms to 16.89ms and the time consumption of each step is also shortened, as shown in the following figure:

@@ -114,21 +114,17 @@ We run eval script again after set ```fp16``` flag, and the forward propagation
### Case 3: Optimize The Step Tail
-We run ResNet50 training script with 8 processes in MindSpore [ModelZoo](https://gitee.com/mindspore/models/tree/master ) , set batch size to 32, and each step cost about 23.6ms.
-We still want to improve the performance.
+We run the ResNet50 training script with 8 processes in MindSpore [ModelZoo](https://gitee.com/mindspore/models/tree/master ), set the batch size to 32, and each step costs about 23.6ms. We still want to further reduce the time of each step.
-As we observed on the MindInsight UI page, step interval and FP/BP interval can not be improved more, so we try to optimize step tail.
+As we observed from the step trace on the MindInsight UI page, the step interval and the FP/BP interval cannot be improved much further, so we try to optimize the step tail.

*Figure 8: Step Trace with Long Step Tail*
-Step Tail is the duration for performing parameter aggregation and update operations in parallel training.
-Normally, AllReduce gradient synchronization waits until all the inverse operators are finished, i.e., all the gradients of all weights are computed before synchronizing the gradients of all machines at once, but with AllReduce tangent,
-we can synchronize the gradients of some weights as soon as they are computed, so that the gradient synchronization and the gradient computation of the remaining operators can be performed in parallel,
-hiding this part of the AllReduce gradient synchronization time. The slicing strategy is usually a manual attempt to find an optimal solution (supporting slicing greater than two segments).
-As an example, ResNet50 network has 160 weights, and [85, 160] indicates that the gradient synchronization is performed immediately after the gradient is calculated for the 0th to 85th weights,
-and the gradient synchronization is performed after the gradient is calculated for the 86th to 160th weights.
+The Step Tail duration contains AllReduce gradient synchronization, parameter update and other operations. Normally, AllReduce gradient synchronization waits until all the backward operators are finished, i.e., the gradients of all weights are computed before synchronizing the gradients across all machines at once. With AllReduce slicing, we can synchronize the gradients of some weights as soon as they are computed, so that gradient synchronization and the gradient computation of the remaining operators can be performed in parallel, hiding this part of the AllReduce synchronization time.
+
+The slicing strategy is usually found by manual attempts to obtain an optimal solution (slicing into more than two segments is supported). As an example, the ResNet50 network has 160 weights, and [85, 160] indicates that gradient synchronization is performed immediately after the gradients of the 0th to 85th weights are calculated, and again after the gradients of the 86th to 160th weights are calculated. Here there are two segments, so gradient synchronization needs to be performed twice.
Optimization code is shown below:
@@ -146,7 +142,7 @@ else:
init()
```
-We run ResNet50 8P script again after set the ```all_reduce_fusion_config``` parameter and see that the step tail is shorten from 6.15ms to 4.20ms.
+We run the ResNet50 8P script again after AllReduce is sliced. Observing the step trace on the MindInsight performance analysis page, we can see that the step tail is shortened from 6.15ms to 4.20ms, as shown in the following figure:

diff --git a/docs/mindspore/migration_guide/source_en/sample_code.md b/docs/mindspore/migration_guide/source_en/sample_code.md
index 17bd1637916be4b3add538bf945f603fbe374854..0c65643a7a905d3720d27906984c2a8e8ba4b387 100644
--- a/docs/mindspore/migration_guide/source_en/sample_code.md
+++ b/docs/mindspore/migration_guide/source_en/sample_code.md
@@ -8,30 +8,30 @@ This chapter will introduce the basic steps of network migration, common tools,
Here we take the classical network ResNet50 as an example and introduce the network migration method in detail with code.
-## Analysis and Reproduce of the Network
+## Analysis and Reproduce of the Benchmark Network
-### Determine the Migration Target
+### Determining the Migration Target
-The first step of network migration is to determine the migration goal. Usually the delivery goal of a deep neural network includes the following four parts.
+The first step of network migration is to determine the migration goal, that is, first find a proper and achievable standard. Usually the delivery goal of a deep neural network includes the following four parts.
1. network implementation: this is the most basic part of the migration goal. Sometimes a single neural network may have different versions, a single version may be implemented differently, or a single neural network may adopt different configurations of hyperparameters, and these differences will have some impacts on the final convergence accuracy and performance. Usually, we take the neural network author's own implementation as the standard, but we can also refer to the official implementations of different frameworks (e.g., TensorFlow, PyTorch, etc.) or other mainstream open source toolkits (e.g., MMDetection).
-2. dataset: the same neural network and parameters often vary greatly in datasets, so we need to confirm the dataset used for the migration network. The data content of some datasets will be updated frequently, and it is necessary to pay attention to the version of the dataset, the ratio of training data to test data division, etc. when using the dataset.
-3. convergence accuracy: different frameworks, GPU models, and whether the training is distributed will have an impact on the accuracy, so we need to analyze the framework, hardware and other information of the counterpart when determining the migration target.
-4. training performance: the same as convergence accuracy, training performance is mainly affected by the network script, framework performance, GPU hardware itself and whether the training is distributed or not.
+2. dataset: the same neural network with the same hyperparameters often performs very differently on different datasets, so we need to confirm the dataset used by the migration network. The data content of some datasets is updated frequently, so it is necessary to pay attention to the dataset version, the split ratio between training data and test data, etc. when determining the dataset.
+3. convergence accuracy: different frameworks, different GPU models, and whether the training is distributed all have an impact on accuracy, so we need to analyze the framework, hardware and other information of the benchmark when determining the migration target.
+4. training performance: similar to convergence accuracy, training performance is mainly affected by the network script, framework performance, the GPU hardware itself, and whether the training is distributed.
#### ResNet50 Migration Example
ResNet50 is a classic deep neural network in CV, which has attracted wide attention and frequent reproduction from developers. The syntax of PyTorch is relatively similar to that of MindSpore, so we choose PyTorch as the benchmark framework.
-The official PyTorch implementation script can be found at [torchvision model](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) or [Nvidia PyTorch implementation script](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5), which includes implementations of the mainstream ResNet family of networks (ResNet18, ResNet18, ResNet18, ResNet18, and ResNet18). (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152). The dataset used for ResNet50 is ImageNet2012, and the convergence accuracy can be found in [PyTorch Hub](https://pytorch.org/hub/) pytorch_vision_resnet/#model-description).
+The official PyTorch implementation script can be found at [torchvision model](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) or [Nvidia PyTorch implementation script](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v1.5), which includes implementations of the mainstream ResNet family of networks (ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152). The dataset used for ResNet50 is ImageNet2012, and the convergence accuracy can be found in [PyTorch Hub](https://pytorch.org/hub/pytorch_vision_resnet/#model-description).
-Developers can run PyTorch-based ResNet50 scripts directly on the benchmark hardware environment and then evaluate the performance of the model, or they can refer to the official data on the same hardware environment. For example, when we benchmark the Nvidia DGX-1 32GB (8x V100 32GB) hardware, we can refer to [Nvidia's official ResNet50 performance data](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v15#training-performance-nvidia-dgx-1-32gb-8x-v100-32gb).
+Developers can run the PyTorch-based ResNet50 script directly on the benchmark hardware environment and then compute the performance data, or they can refer to the official data on the same hardware environment. For example, when we benchmark the Nvidia DGX-1 32GB (8x V100 32GB) hardware, we can refer to [Nvidia's official ResNet50 performance data](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/resnet50v15#training-performance-nvidia-dgx-1-32gb-8x-v100-32gb).
### Reproduce the Migration Target
-Once the network migration target is determined, the next thing to do is to reproduce the metrics. When there is an accuracy/performance gap between the network we developed in MindSpore and the benchmark script, we often use the benchmark data as a base line to analyze the difference between the migration script and the benchmark script step by step. If the benchmark script cannot reproduce the metrics, then the MindSpore scripts we develop based on the benchmark will not be able to achieve the migration goals. When reproducing migration metrics, it is not only important to reproduce the training phase, but also the inference phase.
+Once the network migration target is determined, the next thing to do is to reproduce the metrics. Reproducing the benchmark data is essential for subsequent accuracy and performance tuning. When there is an accuracy/performance gap between the network we developed in MindSpore and the benchmark script, we often use the benchmark data as a baseline to analyze the difference between the migration script and the benchmark script step by step. If the benchmark script cannot reproduce the metrics, the MindSpore scripts we develop based on the benchmark will not be able to achieve the migration goals. When reproducing migration metrics, it is important to reproduce not only the training phase but also the inference phase.
-It is important to note that for some networks, using the same hardware environment and scripts, the final convergence accuracy and performance may be slightly different from the results presented by the original authors, which is a normal range of fluctuation and should be taken into account when migrating the network.
+It is important to note that for some networks, using the same hardware environment and scripts, the final convergence accuracy and performance may be slightly different from the results presented by the original authors, which is a normal range of fluctuation. The fluctuation should be taken into account when migrating the network.
### Reproduce the Single Step Results
@@ -43,11 +43,11 @@ The main purpose of reproducing the single Step results is for the next script d
Before starting the actual script development, a benchmark script analysis is performed. The purpose of the script analysis is to identify missing operators or features in MindSpore compared to the benchmark framework. The methodology can be found in the [Script Evaluation Tutorial](https://www.mindspore.cn/docs/migration_guide/en/master/script_analysis.html).
-MindSpore already supports most of the common [functions](https://www.mindspore.cn/docs/programming_guide/en/master/index.html) and [operators](https://www.mindspore.cn/docs/note/en/master/operator_list.html). MindSpore supports both dynamic graph (PyNative) mode and static graph (Graph) mode, dynamic graph mode is flexible and easy to debug, so dynamic graph mode is mainly used for network debugging. Static graph mode has good performance and is mainly used for whole network training. When analyzing missing operators and functions, these two modes should be analyzed separately.
+MindSpore supports most of the common [functions](https://www.mindspore.cn/docs/programming_guide/en/master/index.html) and [operators](https://www.mindspore.cn/docs/note/en/master/operator_list.html). MindSpore supports both dynamic graph (PyNative) mode and static graph (Graph) mode. Dynamic graph mode is flexible and easy to debug, so dynamic graph mode is mainly used for network debugging. Static graph mode has good performance and is mainly used for whole network training. When analyzing missing operators and functions, these two modes should be analyzed separately.
-If missing operators and functions are found, we can first consider combining the missing operators and functions based on the current operators or functions, and for mainstream CV and NLP networks, new missing operators can generally be solved by combining existing operators.
+If missing operators and functions are found, we can first consider combining the missing operators and functions based on the current operators or functions. For mainstream CV and NLP networks, missing operators can generally be implemented by combining existing operators.
-The combined operator can be implemented by means of a cell, which is the case in MindSpore for [nn class operator](https://gitee.com/mindspore/mindspore/tree/master/mindspore/python/mindspore/nn). For example, the following `ReduceSumExp` operator is a combination of the existing `Exp`, `ReduceSum`, and `Log` suboperators.
+The combined operator can be implemented by means of a Cell. In MindSpore, the [nn class operators](https://gitee.com/mindspore/mindspore/tree/master/mindspore/python/mindspore/nn) are implemented in this way. For example, the following `ReduceLogSumExp` operator is a combination of the existing `Exp`, `ReduceSum`, and `Log` suboperators.
```python
class ReduceLogSumExp(Cell):
@@ -67,7 +67,7 @@ class ReduceLogSumExp(Cell):
return logsumexp
```
-If the missing functions and operators cannot be circumvented, or if the performance of the combined operators is poor and seriously affects the training and inference of the network, you can contact [MindSpore Community](https://gitee.com/mindspore/mindspore/issues) for feedback and we will have a dedicated staff to solve it for you.
+If the missing functions and operators cannot be circumvented, or if the performance of the combined operators is poor, which seriously affects the training and inference of the network, you can contact [MindSpore Community](https://gitee.com/mindspore/mindspore/issues) for feedback and we will have a dedicated staff to solve it for you.
#### ResNet50 Migration Example
@@ -77,9 +77,9 @@ The following is the structure of the ResNet family of networks.
The PyTorch implementation of the ResNet50 script is referenced in the [torchvision model](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py).
-We can analyze it based on both operator and functional aspects.
+We can analyze it based on both operator and function aspects.
-- Algorithm analysis
+- Operator analysis
| PyTorch operator | MindSpore operator | supported |
| ---------------------- | ------------------ | ---------------------- |
@@ -93,7 +93,7 @@ We can analyze it based on both operator and functional aspects.
Note: For PyTorch scripts, MindSpore provides the [PyTorch operator mapping tool](https://www.mindspore.cn/docs/programming_guide/en/master/index.html#operator_api ), which can directly query whether the operator is supported.
-- Feature Analysis
+- Function Analysis
| Pytorch Features | MindSpore Features |
| ------------------------- | ------------------------------------- |
@@ -106,24 +106,24 @@ Note: For PyTorch scripts, MindSpore provides the [PyTorch operator mapping tool
(Since the interface design of MindSpore and PyTorch are not exactly the same, only the key functions are listed here for comparison)
-After the operator and function analysis, we found that compared to PyTorch, MindSpore has no missing functions, but the missing operator `nn.AdaptiveAvgPool` is missing. In the ResNet50 network, the input image shape is fixed and uniform as `N,3,224,224`, where N is the batch size, 3 is the number of channels, 224 and 224 are the width and height of the image, respectively, and the operators that change the image size in the network are `Conv2d` and `Maxpool2d`, the effect of these two operators on the shape is fixed, so the input and output shapes of `nn.AdaptiveAvgPool2D` can be determined in advance, as long as we calculate the input and output shapes of `nn.AvgPool` or `nn.ReduceMean`, so the absence of this operator is replaceable and does not affect the training of the network.
+After the operator and function analysis, we found that compared with PyTorch, MindSpore has no missing functions, but the operator `nn.AdaptiveAvgPool` is missing. Therefore, we need to further analyze whether the missing operator can be replaced. In the ResNet50 network, the input image shape is fixed and unified as `N,3,224,224`, where N is the batch size, 3 is the number of channels, and 224 and 224 are the width and height of the image respectively. The operators that change the image size in the network are `Conv2d` and `Maxpool2d`, and their effect on the shape is fixed, so the input and output shapes of `nn.AdaptiveAvgPool2D` can be determined in advance. As long as we calculate the input and output shapes of `nn.AdaptiveAvgPool2D`, it can be implemented via `nn.AvgPool` or `ops.ReduceMean`. The absence of this operator is therefore replaceable and does not affect the training of the network.
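+
+As a concrete sketch of the replacement (assuming the final ResNet50 feature map has the standard shape `N,2048,7,7`):
+
+```python
+import numpy as np
+import mindspore.nn as nn
+import mindspore.ops as ops
+from mindspore import Tensor
+
+# with a fixed 7x7 feature map, AdaptiveAvgPool2d((1, 1)) can be replaced by
+# an average pooling with kernel_size=7 or by a mean over the spatial axes
+avg_pool = nn.AvgPool2d(kernel_size=7)
+reduce_mean = ops.ReduceMean(keep_dims=True)
+
+x = Tensor(np.random.randn(1, 2048, 7, 7).astype(np.float32))
+out1 = avg_pool(x)             # shape: (1, 2048, 1, 1)
+out2 = reduce_mean(x, (2, 3))  # shape: (1, 2048, 1, 1)
+```
+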
### Data Preprocessing
-To understand the implementation of a neural network, it is necessary to know the input data of the network first, so data preprocessing is the first part of the script development.MindSpore has designed a module dedicated to data processing - MindData, and data preprocessing with MindData consists of the following steps.
+To understand the implementation of a neural network, it is necessary to know the input data of the network first, so data preprocessing is the first part of the script development. MindSpore has designed a module dedicated to data processing - MindData, and data preprocessing with MindData consists of the following steps:
-1. Importing the data path and reading the data file.
+1. importing the data path and reading the data file.
2. parsing the data.
3. data processing (e.g. common data slicing, shuffle, data augmentation, etc.).
4. data distribution (distribution of data in batch_size units, distributed training involves multi-machine distribution).
In the process of reading and parsing data, MindSpore provides a more friendly data format - [MindRecord](https://www.mindspore.cn/docs/programming_guide/en/master/convert_dataset.html). Users can convert the dataset in regular format to MindSpore data format, i.e. MindRecord, so that it can be easily loaded into MindSpore for training. At the same time, MindSpore is optimized for performance in some scenarios, and better performance can be obtained by using the MindRecord data format.
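+
+As a brief sketch, converting raw data to MindRecord and reading it back might look like the following (the file name and schema fields are hypothetical):
+
+```python
+import numpy as np
+import mindspore.dataset as ds
+from mindspore.mindrecord import FileWriter
+
+# write two toy samples into a MindRecord file
+writer = FileWriter(file_name="demo.mindrecord", shard_num=1)
+writer.add_schema({"data": {"type": "float32", "shape": [2]}, "label": {"type": "int32"}}, "demo_schema")
+writer.write_raw_data([{"data": np.array([1.0, 2.0], dtype=np.float32), "label": 0},
+                       {"data": np.array([3.0, 4.0], dtype=np.float32), "label": 1}])
+writer.commit()
+
+# load the MindRecord file for training
+data_set = ds.MindDataset("demo.mindrecord")
+```
+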
-Data processing is usually the most time-consuming phase of data preparation, and most of the operations on data are included in this step, such as Resize, Rescale, Crop, etc. in CV-like networks. MindSpore provides a set of common data processing integration interfaces, which can be called directly by users without implementing them. These integration interfaces not only improve the user-friendliness, but also improve the performance of data preprocessing and reduce the time consuming data preparation during training. For details, please refer to the [Data Preprocessing Tutorial](https://www.mindspore.cn/docs/programming_guide/en/master/optimize_data_processing.html).
+Data processing is usually the most time-consuming phase of data preparation, and most of the operations on data are included in this step, such as Resize, Rescale, Crop, etc. in CV-like networks. MindSpore provides a set of common data processing integration interfaces, which can be called directly by users without implementing them. These integration interfaces not only improve the user-friendliness, but also improve the performance of data preprocessing and reduce the time consumption of data preparation during training. For details, please refer to the [Data Preprocessing Tutorial](https://www.mindspore.cn/docs/programming_guide/en/master/optimize_data_processing.html).
-In the data distribution, MindData provides an extremely simple API, which can be used to combine and repeat data by directly calling batch and repeat operations.
+For data distribution, MindData provides an extremely simple API: data can be batched and repeated by directly calling the batch and repeat operations, as sketched below.
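+
+A minimal sketch of the batch and repeat calls (the batch size and repeat count are illustrative):
+
+```python
+import numpy as np
+import mindspore.dataset as ds
+
+# a small in-memory dataset used only to illustrate data distribution
+data_set = ds.NumpySlicesDataset(np.arange(100).astype(np.float32), column_names=["data"])
+data_set = data_set.batch(batch_size=32, drop_remainder=True)  # distribute data in batch_size units
+data_set = data_set.repeat(1)                                  # repeat the dataset for one epoch
+```
+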
-When the above 4 steps are completed, we can theoretically get the exact same data after processing the dataset using MindSpore script and alignment script (if there are operations that introduce random cases need to be removed).
+When the above 4 steps are completed, we can theoretically get exactly the same data after processing the dataset with the MindSpore script and with the benchmark script (operations that introduce randomness need to be removed).
#### ResNet50 Migration Example
@@ -144,7 +144,7 @@ input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
```
-By looking at the above code, we find that the data preprocessing of ResNet50 mainly does Resize, CenterCrop, and Normalize operations, and there are two ways to implement these operations in MindSpore, one is to use MindSpore's data processing module MindData to call the encapsulated MindSpore's data processing module MindData to call the encapsulated data preprocessing interface, or through [Custom Dataset](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_loading.html#loading-user-defined-dataset). Here it is more recommended for developers to choose the first way, which not only can reduce the development of repetitive code and the introduction of errors, but also can get better data processing performance. For more information about MindData data processing, please refer to the Data Pipeline section in [Programming Guide](https://www.mindspore.cn/docs/programming_guide/en/master/index.html).
+By looking at the above code, we find that the data preprocessing of ResNet50 mainly does Resize, CenterCrop, and Normalize operations. There are two ways to implement these operations in MindSpore: one is to use MindSpore's data processing module MindData and call the encapsulated data preprocessing interfaces, and the other is to load the data through a [Custom Dataset](https://www.mindspore.cn/docs/programming_guide/en/master/dataset_loading.html#loading-user-defined-dataset). It is recommended that developers choose the first way, which not only reduces the development of repetitive code and the introduction of errors, but also achieves better data processing performance. For more information about MindData data processing, please refer to the Data Pipeline section in the [Programming Guide](https://www.mindspore.cn/docs/programming_guide/en/master/index.html).
The following data processing functions are developed based on MindData:
@@ -197,7 +197,7 @@ def create_dataset(dataset_path, batch_size=32, rank_size=1, rank_id=0, do_train
return data_set
```
-In the above code we can find that for common classical datasets (e.g. ImageNet2012), MindData also provides us with `ImageFolderDataset` interface to read the raw data directly, which saves the workload of reading files by hand-written code. Note that MindData creates datasets with different parameters for single-machine training and multi-machine distributed training, and distributed training requires two additional parameters `num_shard` and `shard_id`.
+In the above code, we can see that for common classical datasets (e.g. ImageNet2012), MindData provides the `ImageFolderDataset` interface to read the raw data directly, which saves the workload of reading files with hand-written code. It should be noted that MindData creates datasets with different parameters for single-machine training and multi-machine distributed training: distributed training requires the two additional parameters `num_shards` and `shard_id`.
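+
+A minimal sketch of the difference (the dataset path, shard number and shard ID below are placeholders):
+
+```python
+import mindspore.dataset as ds
+
+# single-machine training: every process reads the whole dataset
+data_set = ds.ImageFolderDataset("/path/to/imagenet/train", shuffle=True)
+
+# distributed training: each of the 8 workers only reads its own shard
+data_set = ds.ImageFolderDataset("/path/to/imagenet/train", shuffle=True,
+                                 num_shards=8, shard_id=0)
+```
+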
### Subnet Development
@@ -211,12 +211,12 @@ Analyzing the ResNet50 network code, it can be divided into the following main s
- conv1x1, conv3x3: convolution with different kernel_size is defined.
- BasicBlock: the smallest subnet of ResNet18 and ResNet34 in the ResNet family of networks, consisting of Conv, BN, ReLU and residuals.
-- BottleNeck: The smallest sub-network of ResNet50, ResNet101 and ResNet152 in the ResNet family of networks, with an additional layer of Conv, BN and ReLU compared to BasicBlock, and the convolution position of downsampling has been changed.
+- BottleNeck: The smallest subnet of ResNet50, ResNet101 and ResNet152 in the ResNet family of networks, with an additional layer of Conv, BN and ReLU compared to BasicBlock, and the position of the downsampling convolution has been changed.
- ResNet: A network that encapsulates the BasicBlock, BottleNeck and Layer structures; different ResNet series networks can be constructed by passing different parameters. In this structure, some PyTorch user-defined weight initialization functions are also used.
Based on the above subnet division, we redevelop the above structures in conjunction with MindSpore syntax.
-Weight initialization directly using [MindSpore's defined weight initialization methods](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.common.initializer.html).
+For weight initialization, directly use [MindSpore's defined weight initialization methods](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.common.initializer.html).
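+For example, a minimal sketch of passing an initializer into a layer (the layer shapes below are illustrative):
+```python
+from mindspore import nn
+from mindspore.common.initializer import Normal, XavierUniform
+
+# Pass an Initializer instance (or its string alias) through weight_init.
+conv = nn.Conv2d(64, 128, kernel_size=3, weight_init=XavierUniform())
+fc = nn.Dense(2048, 1000, weight_init=Normal(0.01))
+```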
Redeveloping conv3x3 and conv1x1
@@ -324,7 +324,7 @@ class ResidualBlock(nn.Cell):
return out
```
-Redevelopment of the whole ResNet family of nets.
+Redevelopment of the whole ResNet family of networks.
```python
class ResNet(nn.Cell):
@@ -460,7 +460,7 @@ def resnet50(class_num=10):
class_num)
```
-After the above steps, the MindSpore-based ResNet50 whole network structure and each sub-network structure have been developed, and the next step is to develop other modules.
+After the above steps, the whole MindSpore-based ResNet50 network structure and each subnet structure have been developed, and the next step is to develop the other modules.
### Other Modules
@@ -475,39 +475,7 @@ For additional training configurations, see [Configuration Information for NVIDI
- The cosine LR schedule is used.
- Label Smoothing is used.
-Implemented cosine LR schedule.
-
-```python
-def _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
- """
- Applies cosine decay to generate learning rate array.
-
- Args:
- lr_init(float): init learning rate.
- lr_end(float): end learning rate
- lr_max(float): max learning rate.
- total_steps(int): all steps in training.
- warmup_steps(int): all steps in warmup epochs.
-
- Returns:
- np.array, learning rate array.
- """
- decay_steps = total_steps - warmup_steps
- lr_each_step = []
- for i in range(total_steps):
- if i < warmup_steps:
- lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
- lr = float(lr_init) + lr_inc * (i + 1)
- else:
- linear_decay = (total_steps - i) / decay_steps
- cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
- decayed = linear_decay * cosine_decay + 0.00001
- lr = lr_max * decayed
- lr_each_step.append(lr)
- return lr_each_step
-```
-
-Implemented SGD optimizer with Momentum, and applied WeightDecay to all weights except gamma and bias of BN.
+Implement the SGD optimizer with Momentum, and apply WeightDecay to all weights except the gamma and bias of BN:
```python
# define opt
@@ -525,7 +493,7 @@ group_params = [{'params': decayed_params, 'weight_decay': weight_decay},
opt = Momentum(group_params, lr, momentum)
```
-Develop cosine LR schedule, reference on [MindSpore Cosine Decay LR](https://www.mindspore.cn/docs/api/en/master/api_python/nn/mindspore.nn.cosine_decay_lr.html)
+For implementing the cosine LR schedule, refer to [MindSpore Cosine Decay LR](https://www.mindspore.cn/docs/api/en/master/api_python/nn/mindspore.nn.cosine_decay_lr.html).
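+A minimal sketch of generating a per-step learning rate list with this interface (the epoch and step counts below are illustrative, not the benchmark configuration):
+```python
+from mindspore import nn
+
+# Cosine-decay the learning rate from max_lr down to min_lr over decay_epoch epochs.
+lr = nn.cosine_decay_lr(min_lr=0.0, max_lr=0.1, total_step=90 * 625,
+                        step_per_epoch=625, decay_epoch=90)
+```
+The resulting list can then be passed as the learning rate to the `Momentum` optimizer defined above.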
Define Loss Function and implement Label Smoothing.
@@ -697,9 +665,9 @@ if __name__ == '__main__':
model.train(config.epoch_size, dataset, callbacks=cb, sink_size=step_size, dataset_sink_mode=False)
```
-Note: For codes in other files in the directory, refer to MindSpore model_zoo's [ResNet50 implementation](https://gitee.com/mindspore/models/tree/master/official/cv/resnet)(this script incorporates other ResNet family networks and ResNet-SE networks, and the specific implementation may differ from the benchmark script).
+Note: For the code in other files in the directory, refer to MindSpore ModelZoo's [ResNet50 implementation](https://gitee.com/mindspore/models/tree/master/official/cv/resnet) (this script incorporates other ResNet family networks and ResNet-SE networks, and the specific implementation may differ from the benchmark script).
-### Distributed training
+### Distributed Training
Compared with stand-alone training, distributed training has no impact on the network structure. It can be enabled by modifying the stand-alone script to call the distributed training interfaces provided by MindSpore, as described in the [Distributed Training Tutorial](https://www.mindspore.cn/docs/programming_guide/en/master/distributed_training.html).
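+A minimal sketch of the stand-alone modifications needed for data-parallel training (the device target below is illustrative):
+```python
+from mindspore import context
+from mindspore.context import ParallelMode
+from mindspore.communication.management import init, get_rank, get_group_size
+
+context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
+init()                        # initialize the communication service
+rank_id = get_rank()          # index of the current device
+rank_size = get_group_size()  # total number of devices
+context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL,
+                                  gradients_mean=True, device_num=rank_size)
+```
+The `rank_size` and `rank_id` obtained here are the two extra values that the `create_dataset` call below receives to shard the data.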
@@ -745,9 +713,9 @@ dataset = create_dataset(args_opt.dataset_path, config.batch_size, rank_size, ra
The inference process differs from training in the following ways.
-- No need to define losses and optimizers.
-- No need for repeat operations when constructing the dataset.
-- Need to load trained CheckPoint after network definition.
+- No need to define optimizers.
+- No need to initialize the weights.
+- Need to load the trained CheckPoint after the network is defined.
- Define the metric for computing inference accuracy.
#### ResNet50 Migration Example
@@ -815,7 +783,7 @@ You may encounter some interruptions in the training during the process, you can
For the full example, you can refer to the link:
-## Precision tuning
+## Precision Tuning
After the whole training and inference flow runs through, you can obtain the network training accuracy from the training and inference steps. It is usually difficult to reproduce the accuracy of the benchmark script at once, and we need to improve the accuracy gradually through accuracy tuning, which is less intuitive, less efficient, and more laborious than performance tuning.
@@ -860,12 +828,14 @@ Single-Step performance jitter and data queues that remain empty for a period of
When the data processing speed is slow, the queue is gradually drained from full to empty, and the training process starts waiting for the empty queue to be filled with data; the network continues the single-Step training only after new data arrives. Since there is no queue to buffer the data processing, the performance jitter of data processing is directly reflected in the single-Step performance, so it also causes single-Step performance jitter.
+For MindData performance issues, refer to the [Data Profiling](https://www.mindspore.cn/mindinsight/docs/en/master/performance_profiling_ascend.html#data-preparation-performance-analysis) section of the MindInsight component, which describes common MindData performance problems and their solutions.
+
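+To collect the data preparation information referenced above, a minimal sketch of enabling the profiler around training looks as follows (the output path is a placeholder):
+```python
+from mindspore.profiler import Profiler
+
+profiler = Profiler(output_path="./profiler_data")  # start collecting performance data
+# ... run training here, e.g. model.train(...)
+profiler.analyse()                                  # parse the collected data for MindInsight
+```
+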
#### Multi-machine Synchronization Performance
-When distributed training is performed, after the forward propagation and gradient computation are completed during a Step, each machine starts to synchronize the AllReduce gradient, and the AllReduce synchronization time is mainly affected by the number of weights and machines.
+When distributed training is performed, after the forward propagation and gradient computation are completed during a Step, each machine starts to synchronize the gradients through AllReduce, and the AllReduce synchronization time is mainly affected by the number of weights and the number of machines. For more complex networks trained at a larger machine scale, the AllReduce gradient synchronization time is longer, in which case we can split the AllReduce operation to optimize this part of the time.
-Normally, AllReduce gradient synchronization waits until all the inverse operators are finished, i.e., all the gradients of all weights are computed before synchronizing the gradients of all machines at once, but with AllReduce tangent, we can synchronize the gradients of some weights as soon as they are computed, so that the gradient synchronization and the gradient computation of the remaining operators can be This way, the gradient synchronization and the gradient computation of the remaining operators can be performed in parallel, hiding this part of the AllReduce gradient synchronization time. The slicing strategy is usually a manual attempt to find an optimal solution (supporting slicing greater than two segments).
-As an example, [ResNet50 network](https://gitee.com/mindspore/models/blob/master/official/cv/resnet/train.py) has 160 weights and [85, 160] means that the gradient synchronization is performed immediately after the gradient is calculated for the 0th to 85th weights, and the gradient synchronization is performed after the gradient is calculated for the 86th to 160th weights. The code implementation is as follows:
+Normally, AllReduce gradient synchronization waits until all the backward operators are finished, i.e., the gradients of all weights are computed before synchronizing the gradients of all machines at once. With AllReduce splitting, we can synchronize the gradients of some weights as soon as they are computed, so that the gradient synchronization and the gradient computation of the remaining operators are performed in parallel, hiding this part of the AllReduce gradient synchronization time. The splitting strategy is usually found by manual attempts to search for an optimal solution (splitting into more than two segments is supported).
+As an example, the [ResNet50 network](https://gitee.com/mindspore/models/blob/master/official/cv/resnet/train.py) has 160 weights, and [85, 160] means that gradient synchronization is performed immediately after the gradients of the 0th to 85th weights are computed, and another synchronization is performed after the gradients of the 86th to 160th weights are computed. Here the weights are split into two segments, so two gradient synchronizations are required. The code implementation is as follows:
```python
device_id = int(os.getenv('DEVICE_ID', '0'))
@@ -888,7 +858,7 @@ The situation that a single operator takes a long time and the performance of th
1. Use less computationally intensive data types. For example, there is no significant difference in precision between float16 and float32 for the same operator, so use the less computationally intensive float16 format (a brief sketch follows this list).
2. Use other operators with the same algorithm to circumvent it.
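+A minimal sketch of the first technique, assuming a hypothetical Cell (the `nn.Conv2d` below stands in for any layer whose precision tolerates float16):
+```python
+from mindspore import nn
+import mindspore.common.dtype as mstype
+
+net = nn.Conv2d(64, 128, kernel_size=3)  # placeholder for any Cell in the network
+net.to_float(mstype.float16)             # run this Cell's computation in float16
+```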
-If you find any arithmetic with poor performance, we suggest you contact [MindSpore Community](https://gitee.com/mindspore/mindspore/issues) for feedback, and we will optimize it as soon as we confirm the performance problem.
+If you find an operator with poor performance, we suggest you contact the [MindSpore Community](https://gitee.com/mindspore/mindspore/issues) for feedback, and we will optimize it as soon as the performance problem is confirmed.
#### Framework Performance
diff --git a/docs/mindspore/migration_guide/source_en/script_analysis.md b/docs/mindspore/migration_guide/source_en/script_analysis.md
index 692731786ac8fcdd8d12bf04cf6857a1cf820a23..74c34e064bea5048921aa3197a3507ecf2e2b4d0 100644
--- a/docs/mindspore/migration_guide/source_en/script_analysis.md
+++ b/docs/mindspore/migration_guide/source_en/script_analysis.md
@@ -6,7 +6,7 @@
### MindSpore Operator Design
-The process of using the MindSpore framework to build a neural network is similar to other frameworks (e.g., TensorFlow/PyTorch), but the supported operators are different. It is necessary to find out the missing operators in the MindSpore framework when performing network migration (e.g., migrating from TensorFlow to the MindSpore-ascend platform).
+The process of using the MindSpore framework to build a neural network is similar to other frameworks (TensorFlow/PyTorch), but the supported operators are different. It is necessary to find out the operators missing from the MindSpore framework when performing network migration (e.g., migrating from TensorFlow to the MindSpore Ascend platform).
MindSpore API is composed of various Python/C++ API operators, which can be roughly divided into:
@@ -22,11 +22,11 @@ MindSpore API is composed of various Python/C++ API operators, which can be roug
Including convolution and normalization operators used in network construction, such as `mindspore.nn.Conv2d`, `mindspore.nn.Dense`, etc.
-The surface layer of the network structure operator is the MindSpore operator (hereinafter referred as ME operator), which is the operator API called by the user (e.g., `mindspore.nn.Softmax`), and the ME operator is implemented by calling the TBE operator (C/C++) at the bottom layer.
+The surface layer of the network structure operator is the ME (MindSpore) operator, that is, the operator API called by the user (e.g., `mindspore.nn.Softmax`); the ME operator is implemented by calling the TBE operator (C/C++) at the bottom layer.
-When counting missing ME operators, you need to find out the corresponding operators of all operators in the source script (including data framework classes, data preprocessing, and network structure operators) in the MindSpore framework (e.g.,`tf.nn.relu` corresponds to MindSpore operator `mindspore.nn.ReLU`). If there is no corresponding operator in MindSpore, it will be counted as missing.
+When counting missing ME operators, you need to find the corresponding MindSpore operators for all operators in the source script (including data framework classes, data preprocessing, and network structure operators); for example, `tf.nn.relu` corresponds to the MindSpore operator `mindspore.nn.ReLU`. If there is no corresponding operator in MindSpore, it is counted as missing.
-### Query Operator Mapping Table
+### Querying the Operator Mapping Table
Find the network structure and the Python file that implements the training function in the code library (the name is generally train.py, model.py, etc.), find all relevant operators in the script file (including data framework classes, data preprocessing, network structure operators, etc.), and compare them with the [MindSpore Operator API](https://www.mindspore.cn/docs/note/en/master/operator_list_ms.html) to find the platform support status of each operator under `mindspore.nn` or `mindspore.ops`.
@@ -37,18 +37,18 @@ If the source code is a PyTorch script, you can directly query [MindSpore and Py
### Missing Operator Processing Strategy
1. Consider replacing it with other operators: It is necessary to analyze the implementation formula of the operator and examine whether existing MindSpore operators can be superimposed to achieve the expected goal (a brief sketch follows this list).
-2. Consider temporary circumvention solutions: For example, if a certain loss is not supported, it can be replaced with a loss operator of the same kind that has been supported.
-3. Consider using Custom operators: see [Custom Operators (Custom based)](https://www.mindspore.cn/docs/programming_guide/en/master/custom_operator_custom.html).
-4. Consider using third-party operators by Custom operators: see [Use Third-Party Operators by Custom Operators](https://www.mindspore.cn/docs/migration_guide/en/master/use_third_party_op.html).
+2. Consider using Customized operators: see [Custom Operators (Custom based)](https://www.mindspore.cn/docs/programming_guide/en/master/custom_operator_custom.html).
+3. Consider using third-party operators by Customized operators: see [Use Third-Party Operators by Custom Operators](https://www.mindspore.cn/docs/migration_guide/en/master/use_third_party_op.html).
+4. Consider temporary circumvention solutions: For example, if a certain loss is not supported, it can be replaced with a loss operator of the same kind that has been supported.
5. Submit suggestions in [MindSpore Community](https://gitee.com/mindspore/mindspore/issues) to develop missing operators.
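+As a sketch of the first strategy, an activation that is hypothetically missing on the target platform can often be composed from existing operators:
+```python
+import mindspore.ops as ops
+from mindspore import nn
+
+class Swish(nn.Cell):
+    """Emulate x * sigmoid(x) by superimposing existing MindSpore operators."""
+    def __init__(self):
+        super(Swish, self).__init__()
+        self.sigmoid = ops.Sigmoid()
+
+    def construct(self, x):
+        return x * self.sigmoid(x)
+```
+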
## Grammar Assessment
MindSpore provides two modes: `GRAPH_MODE` and `PYNATIVE_MODE`.
-In PyNative mode, the behavior of the model for **Evaluation** is same as that of in the general Python code.
+In PyNative mode, the behavior of the model for **Inference** is the same as that in general Python code.
-When using `GRAPH_MODE`, or when using `PYNATIVE_MODE` for **Training**, there are usually grammatical restrictions. In these two cases, it is necessary to perform graph compilation operations on the Python code. In this step, MindSpore has not yet been able to support the complete set of Python syntax, so there will be some restrictions on the implementation of the `construct` function. For specific restrictions, please refer to [MindSpore static graph syntax support](https://www.mindspore.cn/docs/note/en/master/static_graph_syntax_support.html).
+When using `GRAPH_MODE`, or when using `PYNATIVE_MODE` for **Training**, there are usually syntax restrictions. In these two cases, graph compilation needs to be performed on the Python code. In this step, MindSpore does not yet support the complete set of Python syntax, so there are some restrictions on the implementation of the `construct` function. For specific restrictions, please refer to [MindSpore static graph syntax support](https://www.mindspore.cn/docs/note/en/master/static_graph_syntax_support.html).
### Common Restriction Principles
@@ -61,5 +61,5 @@ Compared with the specific syntax description, the common restrictions can be su
### Common Processing Strategies
1. Use the operators provided by MindSpore to replace the functions of other Python libraries. The processing of constants can be moved to the `__init__` stage (a brief sketch follows this list).
-2. Use basic types for combination, you can consider increasing the amount of function parameters. There are no restrictions on the input parameters of the function, and variable length input can be used.
-3. Avoid multithreading in the network.
+2. Use combinations of basic types, and consider increasing the number of function parameters. There are no restrictions on the input parameters of a function, and variable-length input can be used.
+3. Avoid multi-threading in the network.
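+A minimal sketch of the first strategy, using a hypothetical `Scale` cell: constants are computed with numpy in `__init__`, and `construct` only uses MindSpore operators:
+```python
+import numpy as np
+import mindspore.ops as ops
+from mindspore import nn, Tensor
+
+class Scale(nn.Cell):
+    def __init__(self, channels):
+        super(Scale, self).__init__()
+        # constant processing happens here, outside graph compilation
+        self.factor = Tensor(np.full((channels,), 0.5, dtype=np.float32))
+        self.mul = ops.Mul()
+
+    def construct(self, x):
+        # only MindSpore operators are used inside construct
+        return self.mul(x, self.factor)
+```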
diff --git a/docs/mindspore/migration_guide/source_en/use_third_party_op.md b/docs/mindspore/migration_guide/source_en/use_third_party_op.md
index 55b71ef5f9bd13f1841e020cbbaf239619701da8..1289ec9fe7f8f4de7e246de8efa7d5d7ca9f6c3b 100644
--- a/docs/mindspore/migration_guide/source_en/use_third_party_op.md
+++ b/docs/mindspore/migration_guide/source_en/use_third_party_op.md
@@ -1,37 +1,37 @@
-# Use Third-Party Operators by Custom Operators
+# Call Third-Party Operators by Customized Operators
## Overview
-When built-in operators cannot meet requirements during network development, you can call the Python API [Custom](https://www.mindspore.cn/docs/api/en/master/api_python/ops/mindspore.ops.Custom.html#mindspore-ops-custom) primitive defined in MindSpore to quickly create different types of custom operators for use.
+When built-in operators cannot meet requirements during network development, you can call the Python API [Custom](https://www.mindspore.cn/docs/api/en/master/api_python/ops/mindspore.ops.Custom.html#mindspore-ops-custom) primitive defined in MindSpore to quickly create different types of customized operators for use.
-You can choose different custom operator defining methods base on needs.
-See: [custom_operator_custom](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/custom_operator_custom.html).
+You can choose different customized operator developing methods based on your needs.
+See: [custom_operator_custom](https://www.mindspore.cn/docs/programming_guide/en/master/custom_operator_custom.html).
-There is a defining method called `aot` which has a special use. It could load a dynamic library and use cpp/cuda functions in it. When third-party library provide some API, we could try to use these APIs in the dynamic library.
+There is a defining method called `aot` which has a special use: the `aot` mode calls the corresponding `cpp`/`cuda` function by loading a precompiled `so`. Therefore, when a third-party library provides a `cpp`/`cuda` function API, you can try to call its function interface from an `so`.
-Here is an example of how to use PyTorch Aten by Custom operator.
+Here is an example of how to use the `Aten` library of PyTorch.
-## Use PyTorch Aten operators by Custom operator
+## Interfacing with PyTorch Aten Operators
-When migrate a network script which using PyTorch Aten ops, we could use `Custom` operator to reuse Pytorch Aten operators if Mindspore missing the operator.
+When migrating a network that uses PyTorch Aten operators and the required built-in operators are missing, we can use the `aot` development method of the `Custom` operator to call PyTorch Aten operators for fast verification.
-PyTorch provides a mechanism called C++ extensions that allow users to create PyTorch operators defined out-of-source. It makes user easy to use Pytorch data structure to write cpp/cuda code, and compile it into dynamic library. See:.
+PyTorch provides a way to include PyTorch's header files so that `cpp/cuda` code can be written with its associated data structures and compiled into an `so`. See:.
-So Custom operator can use this mechanism to call PyTorch Aten operators. Here is an example of usage:
+Using a combination of the two mechanisms, the customized operator can call PyTorch Aten operators as follows:
-### 1. Download the Project files
+### 1. Downloading the Project files
Users can download the project files from [here](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/migration_guide/test_custom_pytorch.tar).
-Use `tar` to extract files into folder `test_custom_pytorch`:
+Use the following command to extract files and find the folder `test_custom_pytorch`:
```bash
tar xvf test_custom_pytorch.tar
```
-Folder `test_custom_pytorch`include files:
+The folder includes the following files:
```text
test_custom_pytorch
@@ -49,20 +49,26 @@ test_custom_pytorch
└── test_gpu_op.py # a test file to run Aten GPU operator on GPU device
```
-### 2. Write a CPP/CU file to use PyTorch Aten operators
+Using the PyTorch Aten operators mainly involves env.sh, setup.py, leaky_relu.cpp/cu, and test_*.py.
-The custom operator of aot type adopts the AOT compilation method, which requires network developers to hand-write the source code file of the operator implementation based on a specific interface, and compile the source code file into a dynamic library in advance, and then the framework will automatically call and run the function defined in the dynamic library. In terms of the development language of the operator implementation, the GPU platform supports CUDA, and the CPU platform supports C and C++. The interface specification of the operator implementation in the source file is as follows:
+Among them, env.sh is used to set environment variables, setup.py is used to compile the so, leaky_relu.cpp/cu is the reference source code that calls the PyTorch Aten operators, and test_*.py is the reference code that calls the Custom operator.
+
+### 2. Writing and calling the Source Code File of PyTorch Aten Operators
+
+Refer to leaky_relu.cpp/cu to write a source code file that calls the PyTorch Aten operator.
+
+The customized operator of `aot` type adopts the `AOT` compilation method, which requires network developers to hand-write the source code file of the operator implementation based on a specific interface and compile the source code file into a dynamic link library in advance; the framework then automatically calls the function defined in the dynamic link library. In terms of the development language of the operator implementation, the `GPU` platform supports `CUDA`, and the `CPU` platform supports `C` and `C++`. The interface specification of the operator implementation function in the source code file is as follows:
```cpp
extern "C" int func_name(int nparam, void **params, int *ndims, int64_t **shapes, const char **dtypes, void *stream, void *extra);
```
-Take leaky_relu.cpp as an example for `cpu` backend:
+If the `cpu` operator is called, taking `leaky_relu.cpp` as an example, the file provides the function `LeakyRelu` required by `AOT`, which calls the `torch::leaky_relu_out` function of PyTorch Aten:
```cpp
#include
-#include
+#include // Header file reference section
#include "ms_ext.h"
extern "C" int LeakyRelu(
@@ -77,7 +83,7 @@ extern "C" int LeakyRelu(
auto at_input = tensors[0];
auto at_output = tensors[1];
torch::leaky_relu_out(at_output, at_input);
- // a case which need copy output
+ // If you are using a version without output, the code is as follows:
// torch::Tensor output = torch::leaky_relu(at_input);
// at_output.copy_(output);
return 0;
@@ -85,7 +91,7 @@ extern "C" int LeakyRelu(
```
-Take leaky_relu.cu as an example for `gpu` backend:
+If the `gpu` operator is called, take `leaky_relu.cu` as an example:
```cpp
#include
@@ -110,24 +116,24 @@ extern "C" int LeakyRelu(
}
```
-PyTorch Aten provides 300+ operator APIs with/without output tensors.
-
-`torch::*_out` is with output tensors which do not need memory copy.
+PyTorch Aten provides operator function versions with output and versions without output. Operator functions with output have the `_out` suffix, and PyTorch Aten provides 300+ APIs of common operators.
-The APIs Without output tensors need use `torch.Tensor.copy_` to copy return value to kernel output.
+When `torch::*_out` is called, no output copy is needed. When a version without the `_out` suffix is called, the API `torch.Tensor.copy_` needs to be called to copy the result to the kernel output.
-For more details of APIs, see: `python*/site-packages/torch/include/ATen/CPUFunctions_inl.h` and `python*/site-packages/torch/include/ATen/CUDAFunctions_inl.h`.
+To see which PyTorch Aten functions can be called, refer to the PyTorch installation path `python*/site-packages/torch/include/ATen/CPUFunctions_inl.h` for the `CPU` version, and `python*/site-packages/torch/include/ATen/CUDAFunctions_inl.h` for the corresponding `GPU` version.
-A brief introduction of project APIs in ms_ext.h:
+The APIs provided by ms_ext.h are used in the above use case and are briefly described here:
```cpp
// Convert MindSpore kernel's inputs/outputs to PyTorch Aten's Tensor
std::vector get_torch_tensors(int nparam, void** params, int* ndims, int64_t** shapes, const char** dtypes, c10::Device device) ;
```
-### 3. Use `setup.py` to compile source code into dynamic library
+### 3. Using the compilation script `setup.py` to generate so
+
+setup.py uses the `cpp_extension` provided by PyTorch to compile the above `c++/cuda` source code into an `so` file.
-Install Pytorch first.
+Before execution, you need to make sure that PyTorch is installed.
```bash
pip install torch
@@ -146,14 +152,13 @@ cpu: python setup.py leaky_relu.cpp leaky_relu_cpu.so
gpu: python setup.py leaky_relu.cu leaky_relu_gpu.so
```
-Then the needed dynamic library will be created.
+Then the so files that we need can be obtained.
-### 4. Use the Custom operator
+### 4. Using the Customized Operator
-Take CPU backend as an example to use PyTorch Aten operator in Custom operator:
+Taking CPU as an example, use the `Custom` operator to call the above PyTorch Aten operator; see the code in test_cpu_op.py:
```python
-# test_cpu_op.py
import numpy as np
from mindspore import context, Tensor
from mindspore.nn import Cell
@@ -201,7 +206,7 @@ context.set_context(device_target="GPU")
op = ops.Custom("./leaky_relu_gpu.so:LeakyRelu", out_shape=lambda x : x, out_dtype=lambda x : x, func_type="aot")
```
-When using a PyTorch Aten `CPU` operator and `device_target` is `"GPU"`, should add prim attr like this:
+When using a PyTorch Aten `CPU` operator and `device_target` is `"GPU"`, the settings that need to be added are as follows:
```python
context.set_context(device_target="GPU")
@@ -209,6 +214,6 @@ op = ops.Custom("./leaky_relu_cpu.so:LeakyRelu", out_shape=lambda x : x, out_dty
op.add_prim_attr("primitive_target", "CPU")
```
-> 1. Check compile tools exist and have right version when using cpp extension.
-> 2. Make sure the build folder created by cpp extension is clean when using cpp extnsion first time.
-> 3. Tested by PyTorch 1.9.1,cuda11.1,python3.7,download link:, PyTorch cuda and local cuda should be same.
\ No newline at end of file
+> 1. Compiling the so with cpp_extension requires a compiler version that meets the tool's needs; check for the presence of gcc/clang/nvcc.
+> 2. Compiling the so with cpp_extension generates a build folder in the script path, which stores the so. The script copies the so outside of build, but cpp_extension skips compilation if it finds that there is already an so in build, so if you are compiling a new so, remember to empty the so under build first.
+> 3. The tests above are based on PyTorch 1.9.1, cuda11.1, and python3.7. The download link:. The cuda version supported by PyTorch Aten needs to be consistent with the local cuda version, and whether other versions are supported needs to be explored by the user.
\ No newline at end of file