diff --git a/tutorials/source_en/advanced_use/dataset_conversion.md b/tutorials/source_en/advanced_use/dataset_conversion.md index e76a8cba882488ac0943aec4a2a3b692058d5e49..b784c2a73e3204d3e1a7b87ead1cca2084a5db57 100644 --- a/tutorials/source_en/advanced_use/dataset_conversion.md +++ b/tutorials/source_en/advanced_use/dataset_conversion.md @@ -7,7 +7,7 @@ - [Convert Dataset to MindRecord](#convert-dataset-to-mindrecord) - [Overview](#overview) - [Basic Concepts](#basic-concepts) - - [Convert Dataset to MindRecord](#convert-dataset-to-mindrecord) + - [Convert Dataset to MindRecord](#convert-dataset-to-mindrecord-1) - [Load MindRecord Dataset](#load-mindrecord-dataset) diff --git a/tutorials/source_en/advanced_use/optimize_the_performance_of_data_preparation.md b/tutorials/source_en/advanced_use/optimize_the_performance_of_data_preparation.md index 5e4b3ce2f15ef363639668b09986cd442ec06172..81a6da7ec887bf4bff4d823b2b1647f323b9a276 100644 --- a/tutorials/source_en/advanced_use/optimize_the_performance_of_data_preparation.md +++ b/tutorials/source_en/advanced_use/optimize_the_performance_of_data_preparation.md @@ -93,7 +93,7 @@ MindSpore provides multiple data loading methods, including common dataset loadi Suggestions on data loading performance optimization are as follows: - Built-in loading operators are preferred for supported dataset formats. For details, see [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution. For details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorial/en/master/advanced_use/optimize_the_performance_of_data_preparation.html#multi-thread-optimization-solution). -- For a dataset format that is not supported, convert the format to MindSpore data format and then use the `MindDataset` class to load the dataset. 
For details, see [Convert Dataset to MindRecord](https://www.mindspore.cn/api/en/master/programming_guide/dataset_conversion.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution, for details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorial/en/master/advanced_use/optimize_the_performance_of_data_preparation.html#multi-thread-optimization-solution). +- For a dataset format that is not supported, convert the format to MindSpore data format and then use the `MindDataset` class to load the dataset. For details, see [MindSpore Data Format Conversion](https://www.mindspore.cn/api/en/master/programming_guide/dataset_conversion.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution. For details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorial/en/master/advanced_use/optimize_the_performance_of_data_preparation.html#multi-thread-optimization-solution). - For dataset formats that are not supported, the user-defined `GeneratorDataset` class is preferred for implementing fast algorithm verification. If the performance cannot meet the requirements, the multi-process concurrency solution can be used. For details, see [Multi-process Optimization Solution](https://www.mindspore.cn/tutorial/en/master/advanced_use/optimize_the_performance_of_data_preparation.html#multi-process-optimization-solution). ### Code Example @@ -172,7 +172,7 @@ Based on the preceding suggestions of data loading performance optimization, the ## Optimizing the Shuffle Performance -The shuffle operation is used to shuffle ordered datasets or repeated datasets. MindSpore provides the `shuffle` function for users. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources. This API allows users to shuffle the data at any time during the entire pipeline process. 
For details, see [Shuffle Processing](https://www.mindspore.cn/api/en/master/programming_guide/pipeline.html#shuffle). However, because the underlying implementation methods are different, the performance of this method is not as good as that of setting the `shuffle` parameter to directly shuffle data by referring to the [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html). +The shuffle operation is used to shuffle ordered datasets or repeated datasets. MindSpore provides the `shuffle` function for users. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources. This API allows users to shuffle the data at any time during the entire pipeline process. For details, see [Shuffle](https://www.mindspore.cn/api/en/master/programming_guide/pipeline.html#shuffle). However, because the underlying implementation methods are different, the performance of this method is not as good as that of shuffling data directly by setting the `shuffle` parameter of the [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html). ### Performance Optimization Solution @@ -256,7 +256,7 @@ During image classification training, especially when the dataset is small, user - Use the built-in Python operator (`py_transforms` module) to perform data augmentation. - Users can define Python functions as needed to perform data augmentation. -For details, see [Data Augmentation](https://www.mindspore.cn/api/en/master/programming_guide/augmentation.html). The performance varies according to the underlying implementation methods. +For details, see [Augmentation](https://www.mindspore.cn/api/en/master/programming_guide/augmentation.html). The performance varies according to the underlying implementation methods. 
| Module | Underlying API | Description | | :----: | :----: | :----: | @@ -394,7 +394,7 @@ For details, see [Built-in Loading Operators](https://www.mindspore.cn/api/en/ma ### Multi-process Optimization Solution During data processing, operators implemented by Python support the multi-process mode. For example: -- By default, the `GeneratorDataset` class is in multi-process mode. The `num_parallel_workers` parameter indicates the number of enabled processes. The default value is 1. For details, see [Generator Dataset](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html#mindspore.dataset.GeneratorDataset) +- By default, the `GeneratorDataset` class is in multi-process mode. The `num_parallel_workers` parameter indicates the number of enabled processes. The default value is 1. For details, see [GeneratorDataset](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html#mindspore.dataset.GeneratorDataset). - If the user-defined Python function or the `py_transforms` module is used to perform data augmentation and the `python_multiprocessing` parameter of the `map` function is set to True, the `num_parallel_workers` parameter indicates the number of processes and the default value of the `python_multiprocessing` parameter is False. In this case, the `num_parallel_workers` parameter indicates the number of threads. For details, see [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html). ### Compose Optimization Solution @@ -405,7 +405,7 @@ Map operators can receive the Tensor operator list and apply all these operators ### Operator Fusion Optimization Solution -Some fusion operators are provided to aggregate the functions of two or more operators into one operator. For details, see [Data Augmentation Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.vision.html). 
Compared with the pipelines of their components, such fusion operators provide better performance. As shown in the figure: +Some fusion operators are provided to aggregate the functions of two or more operators into one operator. For details, see [Augmentation Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.vision.html). Compared with pipelines built from their component operators, such fusion operators provide better performance, as shown in the following figure: ![title](./images/operator_fusion.png)
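The shuffle paragraph touched by the second hunk describes a `buffer_size` trade-off: a larger buffer gives a more thorough shuffle at a higher memory and time cost. A minimal plain-Python sketch of buffer-based shuffling, purely illustrative and not MindSpore's actual implementation:

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a partially shuffled order.

    Illustrative sketch of a shuffle buffer: once the buffer holds
    buffer_size elements, a random one is emitted for each new element
    read. A larger buffer_size means each output position can draw from
    a wider window of the input, i.e. a higher shuffling degree.
    """
    rng = random.Random(seed)
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) > buffer_size:
            # Emit a random buffered element once the buffer is full.
            yield buf.pop(rng.randrange(len(buf)))
    # Drain the remaining buffered items in random order.
    rng.shuffle(buf)
    yield from buf

shuffled = list(buffered_shuffle(range(10), buffer_size=4))
```

With `buffer_size` equal to the dataset size this degenerates to a full in-memory shuffle, which is why the buffer size directly controls both shuffle quality and resource consumption.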