diff --git a/tutorials/lite/source_en/quick_start/quick_start.md b/tutorials/lite/source_en/quick_start/quick_start.md
index c910dd9a72ef9c71185498ae0ec95198b2cb9d0d..40a337274f2c804108d2582a5480893e8500cdca 100644
--- a/tutorials/lite/source_en/quick_start/quick_start.md
+++ b/tutorials/lite/source_en/quick_start/quick_start.md
@@ -43,7 +43,7 @@ After you retrain a model provided by MindSpore, export the model in the [.mindi
 Take the mobilenetv2 model as an example. Execute the following script to convert a model into a MindSpore Lite model for on-device inference.
 
 ```bash
-./converter_lite --fmk=MS --modelFile=mobilenetv2.mindir --outputFile=mobilenetv2.ms
+./converter_lite --fmk=MINDIR --modelFile=mobilenetv2.mindir --outputFile=mobilenetv2.ms
 ```
 
 ## Deploying an Application
diff --git a/tutorials/lite/source_zh_cn/use/post_training_quantization.md b/tutorials/lite/source_zh_cn/use/post_training_quantization.md
index 49d3fbbdcea91bf64afb9e8f054fc0f19caa7f77..a72d7c8571e43e21ed3e0db4a82d3244e9724b92 100644
--- a/tutorials/lite/source_zh_cn/use/post_training_quantization.md
+++ b/tutorials/lite/source_zh_cn/use/post_training_quantization.md
@@ -108,7 +108,7 @@ MindSpore Lite训练后量化分为两类:
 校准数据集可以选择测试数据集的子集,要求`/dir/images`目录下存放的每个文件均是预处理好的输入数据,每个文件都可以直接用于推理的输入。
 3. 以MindSpore模型为例,执行全量化的模型转换命令:
    ```
-   ./converter_lite --fmk=MS --modelFile=lenet.ms --outputFile=lenet_quant --quantType=PostTraining --config_file=config.cfg
+   ./converter_lite --fmk=MINDIR --modelFile=lenet.mindir --outputFile=lenet_quant --quantType=PostTraining --config_file=config.cfg
    ```
 4. 上述命令执行成功后,便可得到量化后的模型`lenet_quant.ms`,通常量化后的模型大小会下降到FP32模型的1/4。
diff --git a/tutorials/training/source_en/advanced_use/custom_debugging_info.md b/tutorials/training/source_en/advanced_use/custom_debugging_info.md
index 8417be92e517b20e9f8db40ab08ea18cbdf3e8be..046513d95b9119c56688b5447c96f65a097d962c 100644
--- a/tutorials/training/source_en/advanced_use/custom_debugging_info.md
+++ b/tutorials/training/source_en/advanced_use/custom_debugging_info.md
@@ -294,7 +294,7 @@ The input and output of the operator can be saved for debugging through the data
 - `net_name`:net name eg:ResNet50.
 - `iteration`:Specify the iterations to dump. All kernels in graph will be dumped.
 - `input_output`:0:dump input and output of kernel, 1:dump input of kernel, 2:dump output of kernel.
- - `kernels`:full name of kernel. Enable `context.set_context(save_graphs=True)` and get full name of kernel from `hwopt_d_end_graph_{graph_id}.ir`.
+ - `kernels`:full name of kernel. Enable `context.set_context(save_graphs=True)` and get the full name from the generated `ir` file: from `hwopt_d_end_graph_{graph_id}.ir` when `device_target` is `Ascend`, or from `hwopt_pm_7_getitem_tuple.ir` when `device_target` is `GPU`.
 - `support_device`:support devices, default setting is `[0,1,2,3,4,5,6,7]`. You can specify specific device ids to dump specific device data.
 - `enable`:enable synchronous dump.
 - `trans_flag`:enable trans flag. Transform the device data format into NCHW.
@@ -310,6 +310,8 @@ The input and output of the operator can be saved for debugging through the data
 
 3. Execute the training script to dump data.
 
+   You can set `context.set_context(reserve_class_name_in_scope=False)` in your training script to avoid dump failures caused by overly long file names.
+
 4. Parse the Dump file
 
    Call `numpy.fromfile` to parse dump data file.
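+
+   For example, a minimal parsing sketch is shown below (the file name, dtype, and shape are hypothetical; they depend on the dumped kernel and on whether `trans_flag` converted the data to NCHW):
+
+   ```python
+   import numpy as np
+
+   # Hypothetical dump file name; real names encode the kernel's full name.
+   dump_file = "Default--network--Conv2D-op1_output_0.bin"
+
+   # The file stores a flat binary array; dtype and shape must match the dumped tensor.
+   data = np.fromfile(dump_file, dtype=np.float32)
+   data = data.reshape(32, 64, 112, 112)  # e.g. NCHW, known from the network definition
+   print(data.shape, data.mean())
+   ```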
diff --git a/tutorials/training/source_en/advanced_use/cv_resnet50_second_order_optimizer.md b/tutorials/training/source_en/advanced_use/cv_resnet50_second_order_optimizer.md
index e41bf2c448d9ebb38a51ae27d16fb84179bbd95c..1bc9c68d82f271d82d8f1de58246bca84535d0a7 100644
--- a/tutorials/training/source_en/advanced_use/cv_resnet50_second_order_optimizer.md
+++ b/tutorials/training/source_en/advanced_use/cv_resnet50_second_order_optimizer.md
@@ -163,7 +163,7 @@ def create_dataset(dataset_path, do_train, repeat_num=1, batch_size=32, target="
     return ds
 ```
 
-> MindSpore supports multiple data processing and augmentation operations, which are usually combined. For details, see [Data Processing](https://www.mindspore.cn/tutorial/training/en/r1.0/use/data_preparation.html) and [Augmentation](https://www.mindspore.cn/doc/programming_guide/en/r1.0/augmentation.html).
+> MindSpore supports multiple data processing and augmentation operations, which are usually combined. For details, see [Data Processing](https://www.mindspore.cn/doc/programming_guide/en/r1.0/pipeline.html) and [Augmentation](https://www.mindspore.cn/doc/programming_guide/en/r1.0/augmentation.html).
 
 ## Defining the Network
diff --git a/tutorials/training/source_en/advanced_use/dashboard.md b/tutorials/training/source_en/advanced_use/dashboard.md
index 04ef40c9f0737c644ce3fe574d97dab86d0eb7fd..4d960866e7064695126f11d4706069b804dbe7a9 100644
--- a/tutorials/training/source_en/advanced_use/dashboard.md
+++ b/tutorials/training/source_en/advanced_use/dashboard.md
@@ -20,7 +20,7 @@
 
 ## Overview
 
-Training dashboard is an important part of mindinsight's visualization component, and its tags include scalar visualization, parameter distribution visualization, computational visualization, data visualization, image visualization and tensor visualization.
+Training dashboard is an important part of MindInsight's visualization component, and its tags include scalar visualization, parameter distribution visualization, computational graph visualization, data graph visualization, image visualization and tensor visualization.
 
 Access the Training Dashboard by selecting a specific training from the training list.
diff --git a/tutorials/training/source_en/advanced_use/distributed_training_ascend.md b/tutorials/training/source_en/advanced_use/distributed_training_ascend.md
index c9b3f60e3580c5e0ec4e049eb0e9ecc23827a94c..48351012618f1066e51165558e70925a7c1e683d 100644
--- a/tutorials/training/source_en/advanced_use/distributed_training_ascend.md
+++ b/tutorials/training/source_en/advanced_use/distributed_training_ascend.md
@@ -521,7 +521,7 @@ to:
 ckpt_config = CheckpointConfig(keep_checkpoint_max=1, integrated_save=False)
 ```
 
-It should be noted that if users chooses this checkpoint saving policy, users need to save and load the segmented checkpoint for subsequent reasoning or retraining. Specific usage can refer to(https://www.mindspore.cn/tutorial/en/master/advanced_use/checkpoint_for_hybrid_parallel.html#integrating-the-saved-checkpoint-files)。
+It should be noted that if users choose this checkpoint saving policy, they need to save and load the segmented checkpoints for subsequent inference or retraining. For specific usage, refer to [Integrating the Saved Checkpoint Files](https://www.mindspore.cn/tutorial/training/en/r1.0/advanced_use/save_load_model_hybrid_parallel.html#integrating-the-saved-checkpoint-files).
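+
+A hedged sketch of wiring this configuration into training (the `model`, `epoch_size`, and `dataset` objects are assumed to exist as in the preceding sections):
+
+```python
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig
+
+# With integrated_save=False, each device saves only its own parameter slices,
+# so every rank produces a separate checkpoint that is merged later.
+ckpt_config = CheckpointConfig(keep_checkpoint_max=1, integrated_save=False)
+ckpt_cb = ModelCheckpoint(prefix="resnet", directory="./ckpt", config=ckpt_config)
+model.train(epoch_size, dataset, callbacks=[ckpt_cb])
+```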
 ### Hybrid Parallel Mode
diff --git a/tutorials/training/source_en/advanced_use/hub_tutorial.md b/tutorials/training/source_en/advanced_use/hub_tutorial.md
index d39f6fa8eabce54aef314ab5599c47343548980c..6f9be4a7a28dbcf166321f1c0ad2914e03349d44 100644
--- a/tutorials/training/source_en/advanced_use/hub_tutorial.md
+++ b/tutorials/training/source_en/advanced_use/hub_tutorial.md
@@ -13,7 +13,7 @@
 
-
+
 
 ### Overview
 
@@ -27,7 +27,7 @@ We accept publishing models to MindSpore Hub via PR in [hub](https://gitee.com/m
 
 1. Host your pre-trained model in a storage location where we are able to access.
 
-2. Add a model generation python file called `mindspore_hub_conf.py` in your own repo using this [template](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/googlenet/mindspore_hub_conf.py). The location of the `mindspore_hub_conf.py` file is shown below:
+2. Add a model generation python file called `mindspore_hub_conf.py` in your own repo using this [template](https://gitee.com/mindspore/mindspore/blob/r1.0/model_zoo/official/cv/googlenet/mindspore_hub_conf.py). The location of the `mindspore_hub_conf.py` file is shown below:
 
    ```shell script
   googlenet
 
@@ -57,7 +57,7 @@ We accept publishing models to MindSpore Hub via PR in [hub](https://gitee.com/m
 | └── md_validator.py
 ```
 
-   Note that it is required to fill in the `{model_name}_{model_version}_{dataset}.md` template by providing `file-format`、`asset-link` and `asset-sha256` below, which refers to the model file format, model storage location from step 1 and model hash value, respectively. The MindSpore Hub supports multiple model file formats including [MindSpore CKPT](https://www.mindspore.cn/tutorial/en/master/use/saving_and_loading_model_parameters.html#checkpoint-configuration-policies), [AIR](https://www.mindspore.cn/tutorial/en/master/use/multi_platform_inference.html), [MindIR](https://www.mindspore.cn/tutorial/en/master/use/saving_and_loading_model_parameters.html#export-mindir-model), [ONNX](https://www.mindspore.cn/tutorial/en/master/use/multi_platform_inference.html) and [MSLite](https://www.mindspore.cn/lite/tutorial/en/master/use/converter_tool.html).
+   Note that it is required to fill in the `{model_name}_{model_version}_{dataset}.md` template by providing `file-format`, `asset-link` and `asset-sha256` below, which refer to the model file format, the model storage location from step 1, and the model hash value, respectively. The MindSpore Hub supports multiple model file formats including [MindSpore CKPT](https://www.mindspore.cn/tutorial/training/en/r1.0/use/save_and_load_model.html#checkpoint-configuration-policies), [AIR](https://www.mindspore.cn/tutorial/training/en/r1.0/use/multi_platform_inference.html), [MindIR](https://www.mindspore.cn/tutorial/training/en/r1.0/use/save_and_load_model.html#export-mindir-model), [ONNX](https://www.mindspore.cn/tutorial/training/en/r1.0/use/multi_platform_inference.html) and [MSLite](https://www.mindspore.cn/tutorial/lite/en/r1.0/use/converter_tool.html).
 
   ```shell script
   file-format: ckpt
 
@@ -113,7 +113,7 @@ Once your PR is merged into master branch here, your model will show up in [Mind
   # ...
   ```
 
-- After loading the model, you can use MindSpore to do inference. You can refer to [here](https://www.mindspore.cn/tutorial/en/master/use/multi_platform_inference.html).
+- After loading the model, you can use MindSpore to do inference. You can refer to [here](https://www.mindspore.cn/tutorial/training/en/r1.0/use/multi_platform_inference.html).
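+
+  As a hedged illustration of such inference (the model uid and input shape are assumptions; substitute the asset you actually loaded):
+
+  ```python
+  import numpy as np
+  import mindspore_hub as mshub
+  from mindspore import Tensor, context
+
+  context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
+
+  # Illustrative uid; use the uid of the model you published.
+  network = mshub.load("mindspore/ascend/1.0/googlenet_v1_cifar10", num_classes=10)
+  network.set_train(False)
+
+  # Dummy input just to show the call pattern; the shape depends on the model.
+  inputs = Tensor(np.random.randn(1, 3, 224, 224).astype(np.float32))
+  print(network(inputs).shape)
+  ```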
 ### Model Fine-tuning
diff --git a/tutorials/training/source_en/advanced_use/migrate_3rd_scripts_mindconverter.md b/tutorials/training/source_en/advanced_use/migrate_3rd_scripts_mindconverter.md
index d74e0380fa40e89f33510930dbecfb58779e2612..41fa5639a9e32f4c769ca754ce77a2ed1cc6c16f 100644
--- a/tutorials/training/source_en/advanced_use/migrate_3rd_scripts_mindconverter.md
+++ b/tutorials/training/source_en/advanced_use/migrate_3rd_scripts_mindconverter.md
@@ -99,9 +99,7 @@ For the second demand, the Graph mode is recommended. As the computational graph
 Some typical image classification networks such as ResNet and VGG have been tested for the Graph mode. Note that:
 
 > 1. Currently, the Graph mode does not support models with multiple inputs. Only models with a single input and single output are supported.
-
-> 2. The Dropout operator will be lost after conversion because the inference mode is used to load the PyTorch model. Manually re-implement is necessary.
-
-> 3. The Graph-based mode will be continuously developed and optimized with further updates.
diff --git a/tutorials/training/source_en/advanced_use/optimize_data_processing.md b/tutorials/training/source_en/advanced_use/optimize_data_processing.md
index 3fa96619a299d21517fc6fb3d6160a1dfbb8c236..bf604ea383a278036de19f638a622046be64e77b 100644
--- a/tutorials/training/source_en/advanced_use/optimize_data_processing.md
+++ b/tutorials/training/source_en/advanced_use/optimize_data_processing.md
@@ -98,7 +98,7 @@ In the preceding information:
 
 ## Optimizing the Data Loading Performance
 
-MindSpore provides multiple data loading methods, including common dataset loading, user-defined dataset loading, and MindSpore data format loading. For details, see [Loading Datasets](https://www.mindspore.cn/tutorial/en/master/use/data_preparation/loading_the_datasets.html). The dataset loading performance varies depending on the underlying implementation method.
+MindSpore provides multiple data loading methods, including common dataset loading, user-defined dataset loading, and MindSpore data format loading. For details, see [Loading Datasets](https://www.mindspore.cn/doc/programming_guide/en/r1.0/dataset_loading.html). The dataset loading performance varies depending on the underlying implementation method.
 
 | | Common Dataset | User-defined Dataset | MindRecord Dataset |
 | :----: | :----: | :----: | :----: |
 
@@ -110,8 +110,8 @@ MindSpore provides multiple data loading methods, including common dataset loadi
 ![title](./images/data_loading_performance_scheme.png)
 
 Suggestions on data loading performance optimization are as follows:
 
-- Built-in loading operators are preferred for supported dataset formats. For details, see [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution. For details, see [Multi-thread Optimization Solution](#multi-thread-optimization-solution).
-- For a dataset format that is not supported, convert the format to MindSpore data format and then use the `MindDataset` class to load the dataset. For details, see [Converting Datasets into MindSpore Data Format](https://www.mindspore.cn/tutorial/en/master/use/data_preparation/converting_datasets.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution, for details, see [Multi-thread Optimization Solution](#multi-thread-optimization-solution).
+- Built-in loading operators are preferred for supported dataset formats. For details, see [Built-in Loading Operators](https://www.mindspore.cn/doc/api_python/en/r1.0/mindspore/mindspore.dataset.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution. For details, see [Multi-thread Optimization Solution](#multi-thread-optimization-solution).
+- For a dataset format that is not supported, convert the format to MindSpore data format and then use the `MindDataset` class to load the dataset. For details, see [Converting Datasets into MindSpore Data Format](https://www.mindspore.cn/doc/programming_guide/en/r1.0/dataset_conversion.html). If the performance cannot meet the requirements, use the multi-thread concurrency solution; for details, see [Multi-thread Optimization Solution](#multi-thread-optimization-solution).
 - For dataset formats that are not supported, the user-defined `GeneratorDataset` class is preferred for implementing fast algorithm verification. If the performance cannot meet the requirements, the multi-process concurrency solution can be used. For details, see [Multi-process Optimization Solution](#multi-process-optimization-solution).
 
 ### Code Example
 
@@ -191,7 +191,7 @@ Based on the preceding suggestions of data loading performance optimization, the
 
 ## Optimizing the Shuffle Performance
 
-The shuffle operation is used to shuffle ordered datasets or repeated datasets. MindSpore provides the `shuffle` function for users. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources. This API allows users to shuffle the data at any time during the entire pipeline process. For details, see [Shuffle Processing](https://www.mindspore.cn/tutorial/en/master/use/data_preparation/data_processing_and_augmentation.html#shuffle). However, because the underlying implementation methods are different, the performance of this method is not as good as that of setting the `shuffle` parameter to directly shuffle data by referring to the [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html).
+The shuffle operation is used to shuffle ordered datasets or repeated datasets. MindSpore provides the `shuffle` function for users. A larger value of `buffer_size` indicates a higher shuffling degree, consuming more time and computing resources. This API allows users to shuffle the data at any time during the entire pipeline process. For details, see [Shuffle Processing](https://www.mindspore.cn/doc/programming_guide/en/r1.0/pipeline.html#shuffle). However, because the underlying implementation methods are different, the performance of this method is not as good as that of setting the `shuffle` parameter to directly shuffle data by referring to the [Built-in Loading Operators](https://www.mindspore.cn/doc/api_python/en/r1.0/mindspore/mindspore.dataset.html).
 
 ### Performance Optimization Solution
 
@@ -275,7 +275,7 @@ During image classification training, especially when the dataset is small, user
 
 - Use the built-in Python operator (`py_transforms` module) to perform data augmentation.
 - Users can define Python functions as needed to perform data augmentation.
 
-For details, see [Data Augmentation](https://www.mindspore.cn/tutorial/en/master/use/data_preparation/data_processing_and_augmentation.html#id3). The performance varies according to the underlying implementation methods.
+For details, see [Data Augmentation](https://www.mindspore.cn/doc/programming_guide/en/r1.0/augmentation.html). The performance varies according to the underlying implementation methods.
 
 | Module | Underlying API | Description |
 | :----: | :----: | :----: |
 
@@ -368,13 +368,13 @@ During the data pipeline process, the number of threads for related operators ca
 
 - During data augmentation, the `num_parallel_workers` parameter in the `map` function is used to set the number of threads.
 - During batch processing, the `num_parallel_workers` parameter in the `batch` function is used to set the number of threads.
 
-For details, see [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html).
+For details, see [Built-in Loading Operators](https://www.mindspore.cn/doc/api_python/en/r1.0/mindspore/mindspore.dataset.html).
 
 ### Multi-process Optimization Solution
 
 During data processing, operators implemented by Python support the multi-process mode. For example:
 
-- By default, the `GeneratorDataset` class is in multi-process mode. The `num_parallel_workers` parameter indicates the number of enabled processes. The default value is 1. For details, see [Generator Dataset](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html#mindspore.dataset.GeneratorDataset)
-- If the user-defined Python function or the `py_transforms` module is used to perform data augmentation and the `python_multiprocessing` parameter of the `map` function is set to True, the `num_parallel_workers` parameter indicates the number of processes and the default value of the `python_multiprocessing` parameter is False. In this case, the `num_parallel_workers` parameter indicates the number of threads. For details, see [Built-in Loading Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.html).
+- By default, the `GeneratorDataset` class is in multi-process mode. The `num_parallel_workers` parameter indicates the number of enabled processes. The default value is 1. For details, see [Generator Dataset](https://www.mindspore.cn/doc/api_python/en/r1.0/mindspore/mindspore.dataset.html#mindspore.dataset.GeneratorDataset)
+- If the user-defined Python function or the `py_transforms` module is used to perform data augmentation and the `python_multiprocessing` parameter of the `map` function is set to True, the `num_parallel_workers` parameter indicates the number of processes; the default value of the `python_multiprocessing` parameter is False, in which case the `num_parallel_workers` parameter indicates the number of threads. For details, see [Built-in Loading Operators](https://www.mindspore.cn/doc/api_python/en/r1.0/mindspore/mindspore.dataset.html).
 
 ### Compose Optimization Solution
 
@@ -384,6 +384,6 @@ Map operators can receive the Tensor operator list and apply all these operators
 
 ### Operator Fusion Optimization Solution
 
-Some fusion operators are provided to aggregate the functions of two or more operators into one operator. For details, see [Data Augmentation Operators](https://www.mindspore.cn/api/en/master/api/python/mindspore/mindspore.dataset.vision.html). Compared with the pipelines of their components, such fusion operators provide better performance. As shown in the figure:
+Some fusion operators are provided to aggregate the functions of two or more operators into one operator. For details, see [Data Augmentation Operators](https://www.mindspore.cn/doc/api_python/en/r1.0/mindspore/mindspore.dataset.vision.html). Compared with the pipelines of their components, such fusion operators provide better performance. As shown in the figure:
 
 ![title](./images/operator_fusion.png)
diff --git a/tutorials/training/source_en/advanced_use/test_model_security_fuzzing.md b/tutorials/training/source_en/advanced_use/test_model_security_fuzzing.md
index b7009637ad6274e12fb5d06e592844337c8a87d3..f85e600781fce5ac7104fc6cb002553f1ef4cd89 100644
--- a/tutorials/training/source_en/advanced_use/test_model_security_fuzzing.md
+++ b/tutorials/training/source_en/advanced_use/test_model_security_fuzzing.md
@@ -12,7 +12,7 @@
 
 - [Fuzz Testing Application](#fuzz-testing-application)
 
-
+
 
 ## Overview
 
@@ -22,7 +22,7 @@ The fuzz testing module of MindArmour uses the neuron coverage rate as the test
 
 The LeNet model and MNIST dataset are used as an example to describe how to use Fuzz testing.
 
-> This example is for CPUs, GPUs, and Ascend 910 AI processors. You can download the complete sample code at .
+> This example is for CPUs, GPUs, and Ascend 910 AI processors. You can download the complete sample code at .
 
 ## Implementation
 
@@ -60,7 +60,7 @@ For details about the API configuration, see the `context.set_context`.
 
 ### Fuzz testing Application
 
-1. Create a LeNet model and load the MNIST dataset. The operation is the same as that for [Model Security]().
+1. Create a LeNet model and load the MNIST dataset. The operation is the same as that for [Model Security]().
 
    ```python
   ...
 
@@ -101,7 +101,7 @@ For details about the API configuration, see the `context.set_context`.
 
   The data mutation method must include the method based on the image pixel value changes.
 
-  The first two image transform methods support user-defined configuration parameters and randomly generated parameters by algorithms. For user-defined configuration parameters see the class methods corresponding to https://gitee.com/mindspore/mindarmour/blob/master/mindarmour/fuzz_testing/image_transform.py. For randomly generated parameters by algorithms you can set method's params to `'auto_param': [True]`. The mutation parameters are randomly generated within the recommended range.
+  The first two image transform methods support user-defined configuration parameters as well as parameters randomly generated by algorithms. For user-defined configuration parameters, see the corresponding class methods in https://gitee.com/mindspore/mindarmour/blob/r1.0/mindarmour/fuzz_testing/image_transform.py. For randomly generated parameters, set the method's params to `'auto_param': [True]`; the mutation parameters are then randomly generated within the recommended range.
 
   For details about how to set parameters based on the attack defense method, see the corresponding attack method class.
diff --git a/tutorials/training/source_en/quick_start/quick_start.md b/tutorials/training/source_en/quick_start/quick_start.md
index 7713d5744068a1470bddd7336384f3f4e7cd6b2a..fb7af9e4600ee456d82af54b8f7b4cfac068541d 100644
--- a/tutorials/training/source_en/quick_start/quick_start.md
+++ b/tutorials/training/source_en/quick_start/quick_start.md
@@ -180,7 +180,7 @@ In the preceding information:
 
 Perform the shuffle and batch operations, and then perform the repeat operation to ensure that data is unique during one epoch.
 
-> MindSpore supports multiple data processing and augmentation operations, which are usually used in combined. For details, see section [Data Processing](https://www.mindspore.cn/tutorial/training/en/r1.0/use/data_preparation.html) and [Augmentation](https://www.mindspore.cn/doc/programming_guide/en/r1.0/augmentation.html) in the MindSpore Tutorials.
+> MindSpore supports multiple data processing and augmentation operations, which are usually used in combination. For details, see section [Data Processing](https://www.mindspore.cn/doc/programming_guide/en/r1.0/pipeline.html) and [Augmentation](https://www.mindspore.cn/doc/programming_guide/en/r1.0/augmentation.html) in the MindSpore Tutorials.
 
 ## Defining the Network
diff --git a/tutorials/training/source_zh_cn/advanced_use/converse_dataset.md b/tutorials/training/source_zh_cn/advanced_use/converse_dataset.md
index 0c337a304ad7e9a5fda5758516d4671553e2f5ad..fda3ca2825af0f10717c860b8a5feeed23129c53 100644
--- a/tutorials/training/source_zh_cn/advanced_use/converse_dataset.md
+++ b/tutorials/training/source_zh_cn/advanced_use/converse_dataset.md
@@ -9,6 +9,7 @@
 
 - [概述](#概述)
 - [基本概念](#基本概念)
 - [将数据集转换为MindRecord](#将数据集转换为mindrecord)
+- [读取MindRecord数据集](#读取mindrecord数据集)
 
 
@@ -97,13 +98,15 @@ MindSpore数据格式的目标是归一化用户的数据集,并进一步通
 
 5. 创建`FileWriter`对象,传入文件名及分片数量,然后添加Schema文件及索引,调用`write_raw_data`接口写入数据,最后调用`commit`接口生成本地数据文件。
 
    ```python
-   writer = FileWriter(file_name="testWriter.mindrecord", shard_num=4)
+   writer = FileWriter(file_name="test.mindrecord", shard_num=4)
    writer.add_schema(cv_schema_json, "test_schema")
    writer.add_index(indexes)
    writer.write_raw_data(data)
    writer.commit()
    ```
 
+   该示例会生成 `test.mindrecord0`,`test.mindrecord0.db`,`test.mindrecord1`,`test.mindrecord1.db`,`test.mindrecord2`,`test.mindrecord2.db`,`test.mindrecord3`,`test.mindrecord3.db` 共8个文件,称为MindRecord数据集。`test.mindrecord0` 和 `test.mindrecord0.db` 称为1个MindRecord文件,其中:`test.mindrecord0`为数据文件,`test.mindrecord0.db`为索引文件。
+
    **接口说明:**
    - `write_raw_data`:将数据写入到内存之中。
    - `commit`:将最终内存中的数据写入到磁盘。
 
@@ -111,7 +114,28 @@ MindSpore数据格式的目标是归一化用户的数据集,并进一步通
 
 6. 如果需要在现有数据格式文件中增加新数据,可以调用`open_for_append`接口打开已存在的数据文件,继续调用`write_raw_data`接口写入新数据,最后调用`commit`接口生成本地数据文件。
 
    ```python
-   writer = FileWriter.open_for_append("testWriter.mindrecord0")
+   writer = FileWriter.open_for_append("test.mindrecord0")
    writer.write_raw_data(data)
    writer.commit()
    ```
+
+## 读取MindRecord数据集
+
+下面将简单演示如何将MindRecord数据集读取为Dataset。
+
+1. 导入读取类`MindDataset`。
+
+   ```python
+   import mindspore.dataset as ds
+   ```
+
+2. 使用`MindDataset`读取MindRecord数据集。
+
+   ```python
+   data_set = ds.MindDataset(dataset_file="test.mindrecord0")  # Read full data set
+   count = 0
+   for item in data_set.create_dict_iterator(output_numpy=True):
+       print("sample: {}".format(item))
+       count += 1
+   print("Got {} samples".format(count))
+   ```
diff --git a/tutorials/training/source_zh_cn/advanced_use/custom_debugging_info.md b/tutorials/training/source_zh_cn/advanced_use/custom_debugging_info.md
index 4165625db71e829029404b5a9b31cca192f08e05..146916d8c58fd845f4fe6f56bbeec652e430841d 100644
--- a/tutorials/training/source_zh_cn/advanced_use/custom_debugging_info.md
+++ b/tutorials/training/source_zh_cn/advanced_use/custom_debugging_info.md
@@ -298,7 +298,7 @@ val:[[1 1]
 - `net_name`:自定义的网络名称,例如:"ResNet50"。
 - `iteration`:指定需要Dump的迭代,若设置成0,表示Dump所有的迭代。
 - `input_output`:设置成0,表示Dump出算子的输入和算子的输出;设置成1,表示Dump出算子的输入;设置成2,表示Dump出算子的输出。
- - `kernels`:算子的全称,可以通过开启IR保持开关`context.set_context(save_graphs=True)`执行用例,从生成的`hwopt_d_end_graph_{graph_id}.ir`文件获取。
+ - `kernels`:算子的全称,可以通过开启IR保存开关`context.set_context(save_graphs=True)`执行用例,从生成的`ir`文件获取。例如,`device_target`为`Ascend`时,可以从`hwopt_d_end_graph_{graph_id}.ir`中获取算子全称,`device_target`为`GPU`时,可以从`hwopt_pm_7_getitem_tuple.ir`中获取算子全称。
 - `support_device`:支持的设备,默认设置成0到7即可;在分布式训练场景下,需要dump个别设备上的数据,可以只在`support_device`中指定需要Dump的设备Id。
 - `enable`:开启E2E Dump。
 - `trans_flag`:开启格式转换。将设备上的数据格式转换成NCHW格式。
 
@@ -313,7 +313,7 @@ val:[[1 1]
 
 - 在分布式场景下,Dump环境变量需要调用`mindspore.communication.management.init`之前配置。
 
 3. 执行用例Dump数据。
-
+   可以在训练脚本中设置`context.set_context(reserve_class_name_in_scope=False)`,避免Dump文件名称过长导致Dump数据文件生成失败。
 
 4. 解析Dump数据。
 
   通过`numpy.fromfile`读取Dump数据文件即可解析。
diff --git a/tutorials/training/source_zh_cn/advanced_use/cv_resnet50.md b/tutorials/training/source_zh_cn/advanced_use/cv_resnet50.md
index 10364968aaea356672f6fcd1ce32fff9462a50c9..ceab7ad2d14e70f10dafaa739892533eafda5a44 100644
--- a/tutorials/training/source_zh_cn/advanced_use/cv_resnet50.md
+++ b/tutorials/training/source_zh_cn/advanced_use/cv_resnet50.md
@@ -39,7 +39,7 @@ def classify(image):
 
 选择合适的model是关键。这里的model一般指的是深度卷积神经网络,如AlexNet、VGG、GoogLeNet、ResNet等等。
 
-MindSpore实现了典型的卷积神经网络,开发者可以参考[model_zoo](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official)。
+MindSpore实现了典型的卷积神经网络,开发者可以参考[model_zoo](https://gitee.com/mindspore/mindspore/tree/r1.0/model_zoo/official)。
 
 MindSpore当前支持的图像分类网络包括:典型网络LeNet、AlexNet、ResNet。
 
@@ -148,7 +148,7 @@ tar -zvxf cifar-10-binary.tar.gz
 
 ResNet通常是较好的选择。首先,它足够深,常见的有34层,50层,101层。通常层次越深,表征能力越强,分类准确率越高。其次,可学习,采用了残差结构,通过shortcut连接把低层直接跟高层相连,解决了反向传播过程中因为网络太深造成的梯度消失问题。此外,ResNet网络的性能很好,既表现为识别的准确率,也包括它本身模型的大小和参数量。
 
-MindSpore Model Zoo中已经实现了ResNet模型,可以采用[ResNet-50](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet/src/resnet.py)。调用方法如下:
+MindSpore Model Zoo中已经实现了ResNet模型,可以采用[ResNet-50](https://gitee.com/mindspore/mindspore/blob/r1.0/model_zoo/official/cv/resnet/src/resnet.py)。调用方法如下:
 
 ```python
 network = resnet50(class_num=10)
diff --git a/tutorials/training/source_zh_cn/advanced_use/cv_resnet50_second_order_optimizer.md b/tutorials/training/source_zh_cn/advanced_use/cv_resnet50_second_order_optimizer.md
index e319100ec1a5d126b69859e58f23952c921d297e..06d420ac55f50a1de09fa65cd1fd9c62bd9f000d 100644
--- a/tutorials/training/source_zh_cn/advanced_use/cv_resnet50_second_order_optimizer.md
+++ b/tutorials/training/source_zh_cn/advanced_use/cv_resnet50_second_order_optimizer.md
@@ -41,7 +41,7 @@ MindSpore开发团队在现有的自然梯度算法的基础上,对FIM矩阵
 
 本篇教程将主要介绍如何在Ascend 910以及GPU上,使用MindSpore提供的二阶优化器THOR训练ResNet50-v1.5网络和ImageNet数据集。
 > 你可以在这里下载完整的示例代码:
- 。
+ 。
 
 示例代码目录结构
 
@@ -164,11 +164,11 @@ def create_dataset(dataset_path, do_train, repeat_num=1, batch_size=32, target="
     return ds
 ```
 
-> MindSpore支持进行多种数据处理和增强的操作,各种操作往往组合使用,具体可以参考[数据处理](https://www.mindspore.cn/tutorial/training/zh-CN/r1.0/use/data_preparation.html)和[数据增强](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/augmentation.html)章节。
+> MindSpore支持进行多种数据处理和增强的操作,各种操作往往组合使用,具体可以参考[数据处理](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/pipeline.html)和[数据增强](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/augmentation.html)章节。
 
 ## 定义网络
 
-本示例中使用的网络模型为ResNet50-v1.5,先定义[ResNet50网络](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet/src/resnet.py),然后使用二阶优化器自定义的算子替换`Conv2d`和
+本示例中使用的网络模型为ResNet50-v1.5,先定义[ResNet50网络](https://gitee.com/mindspore/mindspore/blob/r1.0/model_zoo/official/cv/resnet/src/resnet.py),然后使用二阶优化器自定义的算子替换`Conv2d`
 和`Dense`算子。定义好的网络模型在源码`src/resnet_thor.py`脚本中,自定义的算子`Conv2d_thor`和`Dense_thor`在`src/thor_layer.py`脚本中。
 
 - 使用`Conv2d_thor`替换原网络模型中的`Conv2d`
diff --git a/tutorials/training/source_zh_cn/advanced_use/distributed_training_ascend.md b/tutorials/training/source_zh_cn/advanced_use/distributed_training_ascend.md
index 6518c7570a7a58ac83e0b090fb2a455eb5d08182..a57e1d8ef47637c78aa33b606cd510837c952e86 100644
--- a/tutorials/training/source_zh_cn/advanced_use/distributed_training_ascend.md
+++ b/tutorials/training/source_zh_cn/advanced_use/distributed_training_ascend.md
@@ -525,8 +525,8 @@ ckpt_config = CheckpointConfig(keep_checkpoint_max=1)
 ckpt_config = CheckpointConfig(keep_checkpoint_max=1, integrated_save=False)
 ```
 
-需要注意的是,如果用户选择了这种checkpoint保存方式,那么就需要用户自己对切分的checkpoint进行保存和加载,以便进行后续的推理或再训练。具体用法可参考(https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/checkpoint_for_hybrid_parallel.html#checkpoint)。
+需要注意的是,如果用户选择了这种checkpoint保存方式,那么就需要用户自己对切分的checkpoint进行保存和加载,以便进行后续的推理或再训练。具体用法可参考[对保存的checkpoint文件做合并处理](https://www.mindspore.cn/tutorial/training/zh-CN/r1.0/advanced_use/save_load_model_hybrid_parallel.html#checkpoint)。
 
 ### 手动混合并行模式
 
-手动混合并行模式(Hybrid Parallel)的模型参数保存和加载请参考[手动设置并行场景模型参数的保存和加载](https://www.mindspore.cn/tutorial/training/zh-CN/r1.0/advanced_use/checkpoint_for_hybrid_parallel.html)。
+手动混合并行模式(Hybrid Parallel)的模型参数保存和加载请参考[手动设置并行场景模型参数的保存和加载](https://www.mindspore.cn/tutorial/training/zh-CN/r1.0/advanced_use/save_load_model_hybrid_parallel.html)。
diff --git a/tutorials/training/source_zh_cn/advanced_use/migrate_3rd_scripts.md b/tutorials/training/source_zh_cn/advanced_use/migrate_3rd_scripts.md
index 5eddd3faaebb54f7eabb8b5f5481035f6ad5ed17..284144aa45474ed29b4169d863a4650d3bb4593d 100644
--- a/tutorials/training/source_zh_cn/advanced_use/migrate_3rd_scripts.md
+++ b/tutorials/training/source_zh_cn/advanced_use/migrate_3rd_scripts.md
@@ -57,7 +57,7 @@
 
 MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差别,迁移前需要对原脚本有较为清晰的了解,明确地知道每一层的shape等信息。
 
-> 你也可以使用[MindConverter工具](https://gitee.com/mindspore/mindinsight/tree/master/mindinsight/mindconverter)实现PyTorch网络定义脚本到MindSpore网络定义脚本的自动转换。
+> 你也可以使用[MindConverter工具](https://gitee.com/mindspore/mindinsight/tree/r1.0/mindinsight/mindconverter)实现PyTorch网络定义脚本到MindSpore网络定义脚本的自动转换。
 
 下面,我们以ResNet-50的迁移,并在Ascend 910上训练为例:
 
@@ -79,7 +79,7 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差
             num_shards=device_num, shard_id=rank_id)
     ```
 
-   然后对数据进行了数据增强、数据清洗和批处理等操作。代码详见。
+   然后对数据进行了数据增强、数据清洗和批处理等操作。代码详见。
 
 3. 构建网络。
 
@@ -212,7 +212,7 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差
 
 6. 构造整网。
 
-   将定义好的多个子网连接起来就是整个[ResNet-50](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet/src/resnet.py)网络的结构了。同样遵循先定义后使用的原则,在`__init__`中定义所有用到的子网,在`construct`中连接子网。
+   将定义好的多个子网连接起来就是整个[ResNet-50](https://gitee.com/mindspore/mindspore/blob/r1.0/model_zoo/official/cv/resnet/src/resnet.py)网络的结构了。同样遵循先定义后使用的原则,在`__init__`中定义所有用到的子网,在`construct`中连接子网。
 
 7. 定义损失函数和优化器。
 
@@ -271,4 +271,4 @@ MindSpore与TensorFlow、PyTorch在网络结构组织方式上,存在一定差
 
 1. [常用数据集读取样例](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/dataset_loading.html)
 
-2. [Model Zoo](https://gitee.com/mindspore/mindspore/tree/master/model_zoo)
+2. [Model Zoo](https://gitee.com/mindspore/mindspore/tree/r1.0/model_zoo)
diff --git a/tutorials/training/source_zh_cn/advanced_use/mindinsight_commands.md b/tutorials/training/source_zh_cn/advanced_use/mindinsight_commands.md
index 0383711471a70801b5d341f5bd3556ee1b5557cb..5fcdcbd5cd96cdc4eb39bc1ef72cf2226dcbeb99 100644
--- a/tutorials/training/source_zh_cn/advanced_use/mindinsight_commands.md
+++ b/tutorials/training/source_zh_cn/advanced_use/mindinsight_commands.md
@@ -13,7 +13,7 @@
 
-
+
 
 ## 查看命令帮助信息
diff --git a/tutorials/training/source_zh_cn/advanced_use/optimize_data_processing.md b/tutorials/training/source_zh_cn/advanced_use/optimize_data_processing.md
index 61c1ca2cc4231fa34a2065a68b2423e9e947f635..b1ca01a4a9b5e87842779960f00711ad15968fa5 100644
--- a/tutorials/training/source_zh_cn/advanced_use/optimize_data_processing.md
+++ b/tutorials/training/source_zh_cn/advanced_use/optimize_data_processing.md
@@ -410,7 +410,7 @@ Map算子可以接收Tensor算子列表,并将按照顺序应用所有的这
 
 ### 算子融合优化方案
 
-提供某些融合算子,这些算子将两个或多个算子的功能聚合到一个算子中。具体内容请参考[数据增强算子](https://www.mindspore.cn/api/zh-CN/master/api/python/mindspore/mindspore.dataset.vision.html),与它们各自组件的流水线相比,这种融合算子提供了更好的性能。如图所示:
+提供某些融合算子,这些算子将两个或多个算子的功能聚合到一个算子中。具体内容请参考[数据增强算子](https://www.mindspore.cn/doc/api_python/zh-CN/r1.0/mindspore/mindspore.dataset.vision.html),与它们各自组件的流水线相比,这种融合算子提供了更好的性能。如图所示:
 
 ![title](./images/operator_fusion.png)
diff --git a/tutorials/training/source_zh_cn/advanced_use/protect_user_privacy_with_differential_privacy.md b/tutorials/training/source_zh_cn/advanced_use/protect_user_privacy_with_differential_privacy.md
index a5f7bad2e7d89127104b83aba086d2ac47545554..dd029a44ceba8b0e9d145e955f61a34cfe252cfd 100644
--- a/tutorials/training/source_zh_cn/advanced_use/protect_user_privacy_with_differential_privacy.md
+++ b/tutorials/training/source_zh_cn/advanced_use/protect_user_privacy_with_differential_privacy.md
@@ -45,7 +45,7 @@ MindArmour的差分隐私模块Differential-Privacy,实现了差分隐私优
 
 这里以LeNet模型,MNIST 数据集为例,说明如何在MindSpore上使用差分隐私优化器训练神经网络模型。
 
-> 本例面向Ascend 910 AI处理器,你可以在这里下载完整的样例代码:
+> 本例面向Ascend 910 AI处理器,你可以在这里下载完整的样例代码:
 
 ## 实现阶段
 
@@ -83,7 +83,7 @@ TAG = 'Lenet5_train'
 
 ### 参数配置
 
-1. 设置运行环境、数据集路径、模型训练参数、checkpoint存储参数、差分隐私参数,`data_path`数据路径替换成你的数据集所在路径。更多配置可以参考。
+1. 设置运行环境、数据集路径、模型训练参数、checkpoint存储参数、差分隐私参数,`data_path`数据路径替换成你的数据集所在路径。更多配置可以参考。
 
    ```python
    cfg = edict({
diff --git a/tutorials/training/source_zh_cn/advanced_use/test_model_security_membership_inference.md b/tutorials/training/source_zh_cn/advanced_use/test_model_security_membership_inference.md
index 4473f1146c22e9ef50162b820563c345bf087313..8ab51825fa52d9630392d649974b5890669b8a85 100644
--- a/tutorials/training/source_zh_cn/advanced_use/test_model_security_membership_inference.md
+++ b/tutorials/training/source_zh_cn/advanced_use/test_model_security_membership_inference.md
@@ -25,7 +25,7 @@
 
 >本例面向Ascend 910处理器,您可以在这里下载完整的样例代码:
->
+>
 
 ## 实现阶段
diff --git a/tutorials/training/source_zh_cn/quick_start/quick_start.md b/tutorials/training/source_zh_cn/quick_start/quick_start.md
index fed3d18221851d39fd82c36bc7d75b31f1e9f1cc..f00c5ca35456cff1a4f2b6b1f0817030fa29a9f0 100644
--- a/tutorials/training/source_zh_cn/quick_start/quick_start.md
+++ b/tutorials/training/source_zh_cn/quick_start/quick_start.md
@@ -185,7 +185,7 @@ def create_dataset(data_path, batch_size=32, repeat_size=1,
 
 先进行shuffle、batch操作,再进行repeat操作,这样能保证1个epoch内数据不重复。
 
-> MindSpore支持进行多种数据处理和增强的操作,各种操作往往组合使用,具体可以参考[数据处理](https://www.mindspore.cn/tutorial/training/zh-CN/r1.0/use/data_preparation.html)和与[数据增强](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/augmentation.html)章节。
+> MindSpore支持进行多种数据处理和增强的操作,各种操作往往组合使用,具体可以参考[数据处理](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/pipeline.html)和[数据增强](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/augmentation.html)章节。
 
 ## 定义网络
diff --git a/tutorials/training/source_zh_cn/quick_start/quick_video/saving_and_loading_model_parameters.md b/tutorials/training/source_zh_cn/quick_start/quick_video/saving_and_loading_model_parameters.md
index 0c8baefbb691a375895d6eb7a64b82994f5b1823..2658722528b5b0fa4ce49cf899dd197fcf48daff 100644
--- a/tutorials/training/source_zh_cn/quick_start/quick_video/saving_and_loading_model_parameters.md
+++ b/tutorials/training/source_zh_cn/quick_start/quick_video/saving_and_loading_model_parameters.md
@@ -6,4 +6,4 @@
 
-**查看完整教程**:
\ No newline at end of file
+**查看完整教程**:
\ No newline at end of file
diff --git a/tutorials/training/source_zh_cn/use/load_dataset_text.md b/tutorials/training/source_zh_cn/use/load_dataset_text.md
index fae731732d29a09dc52da9a0f01ce24b31906a69..006a6cb928293dbf8a4afd9926a43d4640fe363c 100644
--- a/tutorials/training/source_zh_cn/use/load_dataset_text.md
+++ b/tutorials/training/source_zh_cn/use/load_dataset_text.md
@@ -47,7 +47,7 @@ MindSpore提供的`mindspore.dataset`库可以帮助用户构建数据集对象
 
 ## 加载数据集
 
-MindSpore目前支持加载文本领域常用的经典数据集和多种数据存储格式下的数据集,用户也可以通过构建自定义数据集类实现自定义方式的数据加载。各种数据集的详细加载方法,可参考编程指南中[数据集加载](https://www.mindspore.cn/api/zh-CN/master/programming_guide/dataset_loading.html)章节。
+MindSpore目前支持加载文本领域常用的经典数据集和多种数据存储格式下的数据集,用户也可以通过构建自定义数据集类实现自定义方式的数据加载。各种数据集的详细加载方法,可参考编程指南中[数据集加载](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/dataset_loading.html)章节。
 
 下面演示使用`mindspore.dataset`中的`TextFileDataset`类加载数据集。
 
@@ -154,7 +154,7 @@ MindSpore目前支持的数据处理算子及其详细使用方法,可参考
 
 ## 数据分词
 
-MindSpore目前支持的数据分词算子及其详细使用方法,可参考编程指南中[分词器](https://www.mindspore.cn/api/zh-CN/master/programming_guide/tokenizer.html)章节。
+MindSpore目前支持的数据分词算子及其详细使用方法,可参考编程指南中[分词器](https://www.mindspore.cn/doc/programming_guide/zh-CN/r1.0/tokenizer.html)章节。
 
 下面演示使用`WhitespaceTokenizer`分词器来分词,该分词是按照空格来进行分词。
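+
+A hedged sketch of this tokenization step (the file path is illustrative, and the `text` column name follows the earlier `TextFileDataset` example; adjust both to your data):
+
+```python
+import mindspore.dataset as ds
+import mindspore.dataset.text as text
+
+# Load whitespace-separated sentences from a text file (illustrative path).
+dataset = ds.TextFileDataset("tokenizer.txt", shuffle=False)
+
+# WhitespaceTokenizer splits each sample on spaces.
+tokenizer = text.WhitespaceTokenizer()
+dataset = dataset.map(operations=tokenizer)
+
+for item in dataset.create_dict_iterator(output_numpy=True):
+    print(item["text"])
+```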