From 5139bcb16fcf011c5f06eb301a1dcd4f6f1e2faa Mon Sep 17 00:00:00 2001 From: huan <3174348550@qq.com> Date: Mon, 30 Jun 2025 17:24:27 +0800 Subject: [PATCH] add api links --- tutorials/source_en/beginner/autograd.md | 4 ++-- tutorials/source_en/beginner/dataset.md | 6 +++--- tutorials/source_en/beginner/mixed_precision.md | 8 ++++---- tutorials/source_en/beginner/save_load.md | 4 ++-- tutorials/source_en/beginner/train.md | 2 +- tutorials/source_en/dataset/record.ipynb | 4 ++-- tutorials/source_en/dataset/sampler.md | 4 ++-- tutorials/source_en/parallel/data_parallel.md | 4 ++-- tutorials/source_en/parallel/msrun_launcher.md | 4 ++-- tutorials/source_en/parallel/operator_parallel.md | 4 ++-- tutorials/source_en/parallel/pipeline_parallel.md | 6 +++--- tutorials/source_zh_cn/beginner/autograd.ipynb | 4 ++-- tutorials/source_zh_cn/beginner/dataset.ipynb | 6 +++--- tutorials/source_zh_cn/beginner/mixed_precision.ipynb | 8 ++++---- tutorials/source_zh_cn/beginner/save_load.ipynb | 4 ++-- tutorials/source_zh_cn/beginner/train.ipynb | 2 +- tutorials/source_zh_cn/dataset/record.ipynb | 4 ++-- tutorials/source_zh_cn/dataset/sampler.ipynb | 4 ++-- tutorials/source_zh_cn/parallel/data_parallel.md | 4 ++-- tutorials/source_zh_cn/parallel/msrun_launcher.md | 4 ++-- tutorials/source_zh_cn/parallel/operator_parallel.md | 4 ++-- tutorials/source_zh_cn/parallel/pipeline_parallel.md | 6 +++--- 22 files changed, 50 insertions(+), 50 deletions(-) diff --git a/tutorials/source_en/beginner/autograd.md b/tutorials/source_en/beginner/autograd.md index d2dbace125..9042ab805d 100644 --- a/tutorials/source_en/beginner/autograd.md +++ b/tutorials/source_en/beginner/autograd.md @@ -6,7 +6,7 @@ The training of the neural network mainly uses the back propagation algorithm. Model predictions (logits) and the correct labels are fed into the loss function to obtain the loss, and then the back propagation calculation is performed to obtain the gradients, which are finally updated to the model parameters. Automatic differentiation is able to calculate the value of the derivative of a derivable function at a point and is a generalization of the backpropagation algorithm. The main problem solved by automatic differentiation is to decompose a complex mathematical operation into a series of simple basic operations. The function shields the user from a large number of derivative details and processes, which greatly reduces the threshold of using the framework. -MindSpore uses the design philosophy of functional auto-differentiation to provide auto-differentiation interfaces `grad` and `value_and_grad` that are closer to the mathematical semantics. We introduce it below by using a simple single-level linear transform model. +MindSpore uses the design philosophy of functional auto-differentiation to provide auto-differentiation interfaces [mindspore.grad](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.grad.html) and [mindspore.value_and_grad](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.value_and_grad.html) that are closer to the mathematical semantics. We introduce it below by using a simple single-level linear transform model. ```python import numpy as np @@ -114,7 +114,7 @@ print(grads) [ 1.06568694e+00, 1.05373347e+00, 1.30146706e+00]]), Tensor(shape=[3], dtype=Float32, value= [ 1.06568694e+00, 1.05373347e+00, 1.30146706e+00])) ``` -You can see that the gradient values corresponding to $w$ and $b$ have changed. 
At this point, if you want to block out the effect of z on the gradient, i.e., still only find the derivative of the parameter with respect to loss, you can use the `ops.stop_gradient` interface to truncate the gradient here. We add the `function` implementation to `stop_gradient` and execute it. +You can see that the gradient values corresponding to $w$ and $b$ have changed. At this point, if you want to block out the effect of z on the gradient, i.e., still only find the derivative of the parameter with respect to loss, you can use the [mindspore.ops.stop_gradient](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.stop_gradient.html) interface to truncate the gradient here. We add the `function` implementation to `stop_gradient` and execute it. ```python def function_stop_gradient(x, y, w, b): diff --git a/tutorials/source_en/beginner/dataset.md b/tutorials/source_en/beginner/dataset.md index 6f6f592a43..9e44abce6c 100644 --- a/tutorials/source_en/beginner/dataset.md +++ b/tutorials/source_en/beginner/dataset.md @@ -31,11 +31,11 @@ import matplotlib.pyplot as plt ## Loading a Dataset -The `mindspore.dataset` module provides loading APIs for custom datasets, standard format datasets, and commonly used publicly datasets. +The [mindspore.dataset](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.html) module provides loading APIs for custom datasets, standard format datasets, and commonly used publicly datasets. ### Customizing Dataset -For those datasets that MindSpore does not support yet, it is suggested to load data by constructing customized classes or customized generators. `GeneratorDataset` can help to load dataset based on the logic inside these classes/functions. +For those datasets that MindSpore does not support yet, it is suggested to load data by constructing customized classes or customized generators. [GeneratorDataset](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html) can help to load dataset based on the logic inside these classes/functions. `GeneratorDataset` supports constructing customized datasets from random-accessible objects, iterable objects and Python generator, which are explained in detail below. @@ -150,7 +150,7 @@ for d in dataset: ### Standard-format Dataset -For those datasets that MindSpore does not support yet, it is suggested to convert the dataset into `MindRecord` format and load it through the **MindDataset** interface. +For those datasets that MindSpore does not support yet, it is suggested to convert the dataset into `MindRecord` format and load it through the [mindspore.dataset.MindDataset](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.MindDataset.html) interface. Firstly, create a new `MindRecord` format dataset using the `MindRecord` format interface **FileWriter**, where each sample contains three fields: `filename`, `label`, and `data`. diff --git a/tutorials/source_en/beginner/mixed_precision.md b/tutorials/source_en/beginner/mixed_precision.md index f7ee25d70e..9b78d8e34d 100644 --- a/tutorials/source_en/beginner/mixed_precision.md +++ b/tutorials/source_en/beginner/mixed_precision.md @@ -184,7 +184,7 @@ class NetworkFP16Manual(nn.Cell): ## Loss Scaling -Two implementations of Loss Scale are provided in MindSpore, `StaticLossScaler` and `DynamicLossScaler`, whose difference is whether the loss scale value is dynamically adjusted. 
The following is an example of `DynamicLossScalar`, which implements the neural network training logic according to the mixed precision calculation process. +Two implementations of Loss Scale are provided in MindSpore, [mindspore.amp.StaticLossScaler](https://www.mindspore.cn/docs/en/master/api_python/amp/mindspore.amp.StaticLossScaler.html) and [mindspore.amp.DynamicLossScaler](https://www.mindspore.cn/docs/en/master/api_python/amp/mindspore.amp.DynamicLossScaler.html), whose difference is whether the loss scale value is dynamically adjusted. The following is an example of `DynamicLossScaler`, which implements the neural network training logic according to the mixed precision calculation process. First, instantiate the LossScaler and manually scale up the loss value when defining the forward network. @@ -261,9 +261,9 @@ It can be seen that the loss convergence is normal and there is no overflow prob ## Automatic Mixed Precision for `Cell` Configuration -MindSpore supports a programming paradigm that uses Cell to encapsulate the full computational graph. When the `mindspore.amp.build_train_network` interface can be used to automatically perform the type conversion and pass in the Loss Scale as part of the full graph computation. At this point, you only need to configure the mixed precision level and `LossScaleManager` to get the computational graph with the configured automatic mixed precision. +MindSpore supports a programming paradigm that uses Cell to encapsulate the full computational graph. In this case, the [mindspore.amp.build_train_network](https://www.mindspore.cn/docs/en/master/api_python/amp/mindspore.amp.build_train_network.html) interface can be used to automatically perform the type conversion and pass in the Loss Scale as part of the full graph computation. At this point, you only need to configure the mixed precision level and `LossScaleManager` to get the computational graph with the configured automatic mixed precision. -`FixedLossScaleManager` and `DynamicLossScaleManager` are the Loss scale management interfaces for configuring the automatic mixed precision with `Cell`, corresponding to `StaticLossScalar` and `DynamicLossScalar`, respectively. For detailed information, refer to [mindspore.amp](https://www.mindspore.cn/docs/en/master/api_python/mindspore.amp.html). +[mindspore.amp.FixedLossScaleManager](https://www.mindspore.cn/docs/en/master/api_python/amp/mindspore.amp.FixedLossScaleManager.html) and [mindspore.amp.DynamicLossScaleManager](https://www.mindspore.cn/docs/en/master/api_python/amp/mindspore.amp.DynamicLossScaleManager.html) are the Loss scale management interfaces for configuring the automatic mixed precision with `Cell`, corresponding to `StaticLossScaler` and `DynamicLossScaler`, respectively. For detailed information, refer to [mindspore.amp](https://www.mindspore.cn/docs/en/master/api_python/mindspore.amp.html). > Automated mixed precision training with `Cell` configuration supports only `GPU` and `Ascend`. @@ -278,7 +278,7 @@ model = build_train_network(model, optimizer, loss_fn, level="O2", loss_scale_ma ## `Model` Configures Automatic Mixed Precision -`mindspore.train.Model` is a high level encapsulation for fast training of neural networks, which encapsulates `mindspore.amp.build_train_network`, so again, only the mixed precision level and `LossScaleManager` need to be configured for automatic mixed precision training. 
+[mindspore.train.Model](https://www.mindspore.cn/docs/en/master/api_python/train/mindspore.train.Model.html) is a high level encapsulation for fast training of neural networks, which encapsulates `mindspore.amp.build_train_network`, so again, only the mixed precision level and `LossScaleManager` need to be configured for automatic mixed precision training. > Automated mixed precision training with `Model` configuration supports only `GPU` and `Ascend`. diff --git a/tutorials/source_en/beginner/save_load.md b/tutorials/source_en/beginner/save_load.md index 19304aecce..434f7ce05a 100644 --- a/tutorials/source_en/beginner/save_load.md +++ b/tutorials/source_en/beginner/save_load.md @@ -27,7 +27,7 @@ def network(): ## Saving and Loading the Model Weight -Saving model by using the `save_checkpoint` interface, and the specified saving path of passing in the network: +Saving model by using the [mindspore.save_checkpoint](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.save_checkpoint.html) interface, and the specified saving path of passing in the network: ```python model = network() @@ -63,7 +63,7 @@ mindspore.export(model, inputs, file_name="model", file_format="MINDIR") > MindIR saves both Checkpoint and model structure, so it needs to define the input Tensor to get the input shape. -The existing MindIR model can be easily loaded through the `load` interface and passed into `nn.GraphCell` for inference. +The existing MindIR model can be easily loaded through the `load` interface and passed into [mindspore.nn.GraphCell](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.GraphCell.html) for inference. > `nn.GraphCell` only supports graph mode. diff --git a/tutorials/source_en/beginner/train.md b/tutorials/source_en/beginner/train.md index 77ee933406..39c1b0987c 100644 --- a/tutorials/source_en/beginner/train.md +++ b/tutorials/source_en/beginner/train.md @@ -108,7 +108,7 @@ learning_rate = 1e-2 The loss function is used to evaluate the error between the model's predictions (logits) and targets (targets). When training a model, a randomly initialized neural network model starts to predict the wrong results. The loss function evaluates how different the predicted results are from the targets, and the goal of model training is to reduce the error obtained by the loss function. -Common loss functions include `nn.MSELoss` (mean squared error) for regression tasks and `nn.NLLLoss` (negative log-likelihood) for classification. `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss` to normalize logits and calculate prediction errors. +Common loss functions include [mindspore.nn.MSELoss](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.MSELoss.html) (mean squared error) for regression tasks and [mindspore.nn.NLLLoss](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.NLLLoss.html) (negative log-likelihood) for classification. [mindspore.nn.CrossEntropyLoss](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.CrossEntropyLoss.html) combines [mindspore.nn.LogSoftmax](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.LogSoftmax.html) and [mindspore.nn.NLLLoss](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.NLLLoss.html) to normalize logits and calculate prediction errors. 
```python loss_fn = nn.CrossEntropyLoss() diff --git a/tutorials/source_en/dataset/record.ipynb b/tutorials/source_en/dataset/record.ipynb index c773c10b32..54e94f419e 100644 --- a/tutorials/source_en/dataset/record.ipynb +++ b/tutorials/source_en/dataset/record.ipynb @@ -15,7 +15,7 @@ "id": "7fbc6b7a", "metadata": {}, "source": [ - "In MindSpore, the dataset used to train the network model can be converted into MindSpore-specific data format (MindSpore Record), making it easier to save and load data. The goal is to normalize the user's dataset and further enable the reading of the data through the `MindDataset` interface and use it during the training process.\n", + "In MindSpore, the dataset used to train the network model can be converted into MindSpore-specific data format (MindSpore Record), making it easier to save and load data. The goal is to normalize the user's dataset and further enable the reading of the data through the [mindspore.dataset.MindDataset](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.MindDataset.html) interface and use it during the training process.\n", "\n", "![conversion](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_en/dataset/images/data_conversion_concept.png)\n", "\n", @@ -286,7 +286,7 @@ "\n", "### Dumping the CIFAR-10 Dataset\n", "\n", - "Users can convert CIFAR-10 raw data to MindSpore Record and read it using the `MindDataset` interface via the `Dataset.save` class.\n", + "Users can convert CIFAR-10 raw data to MindSpore Record and read it using the `MindDataset` interface via the [mindspore.dataset.Dataset.save](https://www.mindspore.cn/docs/en/master/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.save.html) method.\n", "\n", "1. Download the [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) and use `Cifar10Dataset` to load." ] diff --git a/tutorials/source_en/dataset/sampler.md b/tutorials/source_en/dataset/sampler.md index 9009bbb302..6b37d6a306 100644 --- a/tutorials/source_en/dataset/sampler.md +++ b/tutorials/source_en/dataset/sampler.md @@ -8,7 +8,7 @@ Data is the foundation of training. The `mindspore.dataset` module provides APIs ### Customizing Dataset -MindSpore supports loading data by constructing customized classes or customized generators. `GeneratorDataset` can help to load dataset based on the logic inside these classes/functions. +MindSpore supports loading data by constructing customized classes or customized generators. [mindspore.dataset.GeneratorDataset](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html) can help to load dataset based on the logic inside these classes/functions. `GeneratorDataset` supports constructing customized datasets from random-accessible objects, iterable objects and Python generator, which are explained in detail below. @@ -184,7 +184,7 @@ plt.show() To meet training requirements and solve problems such as too large datasets or uneven distribution of sample categories, MindSpore provides multiple samplers for different purposes to help users sample datasets. Users only need to import the sampler object when loading the dataset to implement data sampling. -MindSpore provides multiple samplers, such as `RandomSampler`, `WeightedRandomSampler`, and `SubsetRandomSampler`. In addition, users can customize sampler classes as required. 
+MindSpore provides multiple samplers, such as [mindspore.dataset.RandomSampler](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.RandomSampler.html), [mindspore.dataset.WeightedRandomSampler](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.WeightedRandomSampler.html), and [mindspore.dataset.SubsetRandomSampler](https://www.mindspore.cn/docs/en/master/api_python/dataset/mindspore.dataset.SubsetRandomSampler.html). In addition, users can customize sampler classes as required. > For details about how to use the sampler, see [Sampler API](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.loading.html#sampler-1). diff --git a/tutorials/source_en/parallel/data_parallel.md b/tutorials/source_en/parallel/data_parallel.md index a9c45ce258..cd2b08db96 100644 --- a/tutorials/source_en/parallel/data_parallel.md +++ b/tutorials/source_en/parallel/data_parallel.md @@ -53,7 +53,7 @@ rank_size = get_group_size() dataset = ds.MnistDataset(dataset_path, num_shards=rank_size, shard_id=rank_id) ``` -Unlike single-card, the `num_shards` and `shard_id` parameters need to be passed in the dataset interface, corresponding to the number of cards and the logical serial number, respectively, and it is recommended to obtain them through the `mindspore.communication` interface: +Unlike single-card, the `num_shards` and `shard_id` parameters need to be passed in the dataset interface, corresponding to the number of cards and the logical serial number, respectively, and it is recommended to obtain them through the following interfaces of the [mindspore.communication](https://www.mindspore.cn/docs/en/master/api_python/mindspore.communication.html) module: - `get_rank`: Obtain the ID of the current device in the cluster. - `get_group_size`: Obtain the number of clusters. @@ -115,7 +115,7 @@ net = Network() ## Training Network -In this step, we need to define the loss function, the optimizer, and the training process. The difference with single-card model is that the data parallel mode also requires the addition of the `mindspore.nn.DistributedGradReducer()` interface to aggregate the gradients of all cards. The first parameter of the network is the network parameter to be updated: +In this step, we need to define the loss function, the optimizer, and the training process. The difference with single-card model is that the data parallel mode also requires the addition of the [mindspore.nn.DistributedGradReducer()](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.DistributedGradReducer.html) interface to aggregate the gradients of all cards. The first parameter of the network is the network parameter to be updated: ```python from mindspore import nn diff --git a/tutorials/source_en/parallel/msrun_launcher.md b/tutorials/source_en/parallel/msrun_launcher.md index 017d9c16a1..f62375ef2e 100644 --- a/tutorials/source_en/parallel/msrun_launcher.md +++ b/tutorials/source_en/parallel/msrun_launcher.md @@ -453,9 +453,9 @@ if get_rank() == 7: ms.set_seed(1) ``` -> The `mindspore.communication.get_rank()` interface needs to be called after the `mindspore.communication.init()` interface has completed its distributed initialization to get the rank information properly, otherwise `get_rank()` returns 0 by default. 
+> The [mindspore.communication.get_rank()](https://www.mindspore.cn/docs/en/master/api_python/communication/mindspore.communication.get_rank.html) interface needs to be called after the [mindspore.communication.init()](https://www.mindspore.cn/docs/en/master/api_python/communication/mindspore.communication.init.html) interface has completed its distributed initialization to get the rank information properly, otherwise `get_rank()` returns 0 by default. -After a breakpoint operation on a rank, it will cause the execution of that rank process to stop at the breakpoint and wait for subsequent interactions, while other unbroken rank processes will continue to run, which may lead to inconsistent running speed, so you can use the `mindspore.communication.comm_func.barrier()` operator and the `mindspore.communication.api._pynative_executor.sync()` to synchronize the running of all ranks, ensuring that other ranks block and wait, and that the stops of other ranks are released once the debugging rank continues to run. For example, in a standalone 8-card task, only rank 7 is broken and all other ranks are blocked: +After a breakpoint operation on a rank, it will cause the execution of that rank process to stop at the breakpoint and wait for subsequent interactions, while other unbroken rank processes will continue to run, which may lead to inconsistent running speed, so you can use the [mindspore.communication.comm_func.barrier()](https://www.mindspore.cn/docs/en/master/api_python/communication/mindspore.communication.comm_func.barrier.html) operator and the `mindspore.communication.api._pynative_executor.sync()` to synchronize the running of all ranks, ensuring that other ranks block and wait, and that the stops of other ranks are released once the debugging rank continues to run. For example, in a standalone 8-card task, only rank 7 is broken and all other ranks are blocked: ```python import pdb diff --git a/tutorials/source_en/parallel/operator_parallel.md b/tutorials/source_en/parallel/operator_parallel.md index ea1588669a..788386d257 100644 --- a/tutorials/source_en/parallel/operator_parallel.md +++ b/tutorials/source_en/parallel/operator_parallel.md @@ -107,7 +107,7 @@ The `ops.MatMul()` and `ops.ReLU()` operators for the above networks are configu #### Training Network Definition -In this step, we need to define the loss function, the optimizer, and the training process. Note that due to the huge number of parameters of the large model, the graphics memory will be far from sufficient if parameter initialization is performed when defining the network on a single card. Therefore, delayed initialization is required when defining the network in conjunction with the `no_init_parameters` interface to delay parameter initialization until the parallel multicard phase. Here both network and optimizer definitions need to be delayed initialized. +In this step, we need to define the loss function, the optimizer, and the training process. Note that due to the huge number of parameters of the large model, the graphics memory will be far from sufficient if parameter initialization is performed when defining the network on a single card. Therefore, delayed initialization is required when defining the network in conjunction with the [mindspore.nn.utils.no_init_parameters](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.utils.no_init_parameters.html) interface to delay parameter initialization until the parallel multicard phase. 
Here both network and optimizer definitions need to be delayed initialized. ```python from mindspore.nn.utils import no_init_parameters @@ -250,7 +250,7 @@ data_set = create_dataset(32) #### Defining the Network -In the current mint operator parallel mode, the network needs to be defined with mint operators. Since the mint operators, as a functional interface, does not directly expose its operator type (Primitive), it is impossible to directly configure the slicing strategy for the operator. Instead, users need to manually configure the slicing strategy for mint operators by using `mindspore.parallel.shard` interface based on a single-card network, e.g., the network structure after configuring the strategy is: +In the current mint operator parallel mode, the network needs to be defined with mint operators. Since the mint operators, as a functional interface, does not directly expose its operator type (Primitive), it is impossible to directly configure the slicing strategy for the operator. Instead, users need to manually configure the slicing strategy for mint operators by using [mindspore.parallel.shard](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.shard.html) interface based on a single-card network, e.g., the network structure after configuring the strategy is: ```python import mindspore as ms diff --git a/tutorials/source_en/parallel/pipeline_parallel.md b/tutorials/source_en/parallel/pipeline_parallel.md index 0580987529..f6f1ee36f3 100644 --- a/tutorials/source_en/parallel/pipeline_parallel.md +++ b/tutorials/source_en/parallel/pipeline_parallel.md @@ -122,11 +122,11 @@ class Network(nn.Cell): ### Training Network Definition -In this step, we need to define the loss function, the optimizer, and the training process. It should be noted that the definitions of both the network and the optimizer here require deferred initialization. Besides, the interface `PipelineGradReducer` is needed to handle gradient of pipeline parallelism, the first parameter of this interface is the network parameter to be updated, and the second one is whether to use optimizer parallelism. +In this step, we need to define the loss function, the optimizer, and the training process. It should be noted that the definitions of both the network and the optimizer here require deferred initialization. Besides, the interface [mindspore.parallel.nn.PipelineGradReducer](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.nn.PipelineGradReducer.html) is needed to handle gradient of pipeline parallelism, the first parameter of this interface is the network parameter to be updated, and the second one is whether to use optimizer parallelism. Unlike the single-card model, two interfaces need to be called in this section to configure the pipeline parallel: -- First define the LossCell. In this case the `nn.WithLossCell` interface is called to encapsulate the network and loss functions. +- First define the LossCell. In this case the [mindspore.nn.WithLossCell](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.WithLossCell.html) interface is called to encapsulate the network and loss functions. - Finally, wrap the LossCell with `Pipeline`, and specify the size of MicroBatch. Configure the `pipeline_stage` for each `Cell` containing training parameters via `stage_config`. 
```python @@ -353,7 +353,7 @@ In the previous step, the parameter `embed` is shared by `self.word_embedding` a We need to further set up the parallelism-related configuration by wrapping the network again with `AutoParallel`, specifying the parallelism mode `semi-auto` as semi-automatic parallelism, in addition to turning on pipeline parallelism, configuring `pipeline`, and specifying the total number of stages by configuring the number of `stages`. If `device_target` is not set here, it will be automatically specified as the backend hardware device corresponding to the MindSpore package (default is Ascend). `output_broadcast=True` indicates that the result of the last stage will be broadcast to the remaining stages when pipelined parallel inference is performed, which can be used in autoregressive inference scenarios. -Before inference, executing `parallel_net.compile()` and `sync_pipeline_shared_parameters(parallel_net)`, the framework will synchronize the shared parameter between stages automatically. +Before inference, executing `parallel_net.compile()` and [mindspore.parallel.sync_pipeline_shared_parameters(parallel_net)](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.sync_pipeline_shared_parameters.html), the framework will synchronize the shared parameter between stages automatically. ```python diff --git a/tutorials/source_zh_cn/beginner/autograd.ipynb b/tutorials/source_zh_cn/beginner/autograd.ipynb index 4728b1a89e..cf6b1256c3 100644 --- a/tutorials/source_zh_cn/beginner/autograd.ipynb +++ b/tutorials/source_zh_cn/beginner/autograd.ipynb @@ -17,7 +17,7 @@ "\n", "神经网络的训练主要使用反向传播算法,模型预测值(logits)与正确标签(label)送入损失函数(loss function)获得loss,然后进行反向传播计算,求得梯度(gradients),最终更新至模型参数(parameters)。自动微分能够计算可导函数在某点处的导数值,是反向传播算法的一般化。自动微分主要解决的问题是将一个复杂的数学运算分解为一系列简单的基本运算,该功能对用户屏蔽了大量的求导细节和过程,大大降低了框架的使用门槛。\n", "\n", - "MindSpore使用函数式自动微分的设计理念,提供更接近于数学语义的自动微分接口`grad`和`value_and_grad`。下面我们使用一个简单的单层线性变换模型进行介绍。" + "MindSpore使用函数式自动微分的设计理念,提供更接近于数学语义的自动微分接口[mindspore.grad](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.grad.html)和[mindspore.value_and_grad](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.value_and_grad.html)。下面我们使用一个简单的单层线性变换模型进行介绍。" ] }, { @@ -225,7 +225,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "可以看到求得$w$、$b$对应的梯度值发生了变化。此时如果想要屏蔽掉z对梯度的影响,即仍只求参数对loss的导数,可以使用`ops.stop_gradient`接口,将梯度在此处截断。我们将`function`实现加入`stop_gradient`,并执行。" + "可以看到求得$w$、$b$对应的梯度值发生了变化。此时如果想要屏蔽掉z对梯度的影响,即仍只求参数对loss的导数,可以使用[mindspore.ops.stop_gradient](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.stop_gradient.html)接口,将梯度在此处截断。我们将`function`实现加入`stop_gradient`,并执行。" ] }, { diff --git a/tutorials/source_zh_cn/beginner/dataset.ipynb b/tutorials/source_zh_cn/beginner/dataset.ipynb index eed11ee459..a8ca89666a 100644 --- a/tutorials/source_zh_cn/beginner/dataset.ipynb +++ b/tutorials/source_zh_cn/beginner/dataset.ipynb @@ -55,11 +55,11 @@ "source": [ "## 数据集加载\n", "\n", - "`mindspore.dataset`模块提供了自定义数据集、标准格式数据集和一些常用的公开数据集的加载API。\n", + "[mindspore.dataset](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.dataset.html)模块提供了自定义数据集、标准格式数据集和一些常用的公开数据集的加载API。\n", "\n", "### 自定义数据集\n", "\n", - "对于MindSpore暂不支持直接加载的数据集,可以构造自定义数据加载类或自定义数据集生成函数的方式来生成数据集,然后通过`GeneratorDataset`接口实现自定义方式的数据集加载。\n", + "对于MindSpore暂不支持直接加载的数据集,可以构造自定义数据加载类或自定义数据集生成函数的方式来生成数据集,然后通过[GeneratorDataset](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html)接口实现自定义方式的数据集加载。\n", 
"\n", "`GeneratorDataset`支持通过可随机访问数据集对象、可迭代数据集对象和生成器(generator)构造自定义数据集,下面分别对其进行介绍。\n", "\n", @@ -248,7 +248,7 @@ "source": [ "### 标准格式数据集\n", "\n", - "对于MindSpore暂不支持直接加载的数据集,可以将数据集转换成**MindRecord格式**数据集,然后通过`MindDataset`接口实现数据集加载。" + "对于MindSpore暂不支持直接加载的数据集,可以将数据集转换成**MindRecord格式**数据集,然后通过[mindspore.dataset.MindDataset](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.MindDataset.html)接口实现数据集加载。" ] }, { diff --git a/tutorials/source_zh_cn/beginner/mixed_precision.ipynb b/tutorials/source_zh_cn/beginner/mixed_precision.ipynb index 5b32a38f30..cf32f97e6d 100644 --- a/tutorials/source_zh_cn/beginner/mixed_precision.ipynb +++ b/tutorials/source_zh_cn/beginner/mixed_precision.ipynb @@ -289,7 +289,7 @@ "source": [ "## 损失缩放\n", "\n", - "MindSpore中提供了两种Loss Scale的实现,分别为`StaticLossScaler`和`DynamicLossScaler`,其差异为损失缩放值scale value是否进行动态调整。下面以`DynamicLossScalar`为例,根据混合精度计算流程实现神经网络训练逻辑。\n", + "MindSpore中提供了两种Loss Scale的实现,分别为[mindspore.amp.StaticLossScaler](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.StaticLossScaler.html)和[mindspore.amp.DynamicLossScalar](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.DynamicLossScalar.html),其差异为损失缩放值scale value是否进行动态调整。下面以`DynamicLossScalar`为例,根据混合精度计算流程实现神经网络训练逻辑。\n", "\n", "首先,实例化LossScaler,并在定义前向网络时,手动放大loss值。" ] @@ -448,10 +448,10 @@ "source": [ "## `Cell`配置自动混合精度\n", "\n", - "MindSpore支持使用Cell封装完整计算图的编程范式,此时可以使用`mindspore.amp.build_train_network`接口,自动进行类型转换,并将Loss Scale传入,作为整图计算的一部分。\n", + "MindSpore支持使用Cell封装完整计算图的编程范式,此时可以使用[mindspore.amp.build_train_network](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.build_train_network.html)接口,自动进行类型转换,并将Loss Scale传入,作为整图计算的一部分。\n", "此时仅需要配置混合精度等级和`LossScaleManager`即可获得配置好自动混合精度的计算图。\n", "\n", - "`FixedLossScaleManager`和`DynamicLossScaleManager`是`Cell`配置自动混合精度的Loss scale管理接口,分别与`StaticLossScalar`和`DynamicLossScalar`对应,具体详见[mindspore.amp](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.amp.html)。\n", + "[mindspore.amp.FixedLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.FixedLossScaleManager.html)和[mindspore.amp.DynamicLossScaleManager](https://www.mindspore.cn/docs/zh-CN/master/api_python/amp/mindspore.amp.DynamicLossScaleManager.html)是`Cell`配置自动混合精度的Loss scale管理接口,分别与`StaticLossScalar`和`DynamicLossScalar`对应,具体详见[mindspore.amp](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.amp.html)。\n", "\n", "> 使用`Cell`配置自动混合精度训练仅支持`GPU`和`Ascend`。" ] @@ -484,7 +484,7 @@ "id": "d5c8adf4-71bb-4061-8801-9a4f97168dc5", "metadata": {}, "source": [ - "`mindspore.train.Model`是神经网络快速训练的高阶封装,其将`mindspore.amp.build_train_network`封装在内,因此同样只需要配置混合精度等级和`LossScaleManager`,即可进行自动混合精度训练。\n", + "[mindspore.train.Model](https://www.mindspore.cn/docs/zh-CN/master/api_python/train/mindspore.train.Model.html)是神经网络快速训练的高阶封装,其将`mindspore.amp.build_train_network`封装在内,因此同样只需要配置混合精度等级和`LossScaleManager`,即可进行自动混合精度训练。\n", "\n", "> 使用`Model`配置自动混合精度训练仅支持`GPU`和`Ascend`。" ] diff --git a/tutorials/source_zh_cn/beginner/save_load.ipynb b/tutorials/source_zh_cn/beginner/save_load.ipynb index 53da365e99..c3f6fe06fc 100644 --- a/tutorials/source_zh_cn/beginner/save_load.ipynb +++ b/tutorials/source_zh_cn/beginner/save_load.ipynb @@ -63,7 +63,7 @@ "source": [ "## 保存和加载模型权重\n", "\n", - "保存模型使用`save_checkpoint`接口,传入网络和指定的保存路径:" + "保存模型使用[mindspore.save_checkpoint](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.save_checkpoint.html)接口,传入网络和指定的保存路径:" ] }, { @@ 
-162,7 +162,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "已有的MindIR模型可以方便地通过`load`接口加载,传入`nn.GraphCell`即可进行推理。\n", + "已有的MindIR模型可以方便地通过`load`接口加载,传入[mindspore.nn.GraphCell](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.GraphCell.html)即可进行推理。\n", "\n", "> `nn.GraphCell`仅支持图模式。" ] diff --git a/tutorials/source_zh_cn/beginner/train.ipynb b/tutorials/source_zh_cn/beginner/train.ipynb index 70b27668f5..9e434cc218 100644 --- a/tutorials/source_zh_cn/beginner/train.ipynb +++ b/tutorials/source_zh_cn/beginner/train.ipynb @@ -194,7 +194,7 @@ "\n", "损失函数(loss function)用于评估模型的预测值(logits)和目标值(targets)之间的误差。训练模型时,随机初始化的神经网络模型开始时会预测出错误的结果。损失函数会评估预测结果与目标值的相异程度,模型训练的目标即为降低损失函数求得的误差。\n", "\n", - "常见的损失函数包括用于回归任务的`nn.MSELoss`(均方误差)和用于分类的`nn.NLLLoss`(负对数似然)等。 `nn.CrossEntropyLoss` 结合了`nn.LogSoftmax`和`nn.NLLLoss`,可以对logits 进行归一化并计算预测误差。" + "常见的损失函数包括用于回归任务的[mindspore.nn.MSELoss](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.MSELoss.html)(均方误差)和用于分类的[mindspore.nn.NLLLoss](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.NLLLoss.html)(负对数似然)等。[mindspore.nn.CrossEntropyLoss](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.CrossEntropyLoss.html) 结合了[mindspore.nn.LogSoftmax](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.LogSoftmax.html)和[mindspore.nn.NLLLoss](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.NLLLoss.html),可以对logits 进行归一化并计算预测误差。" ] }, { diff --git a/tutorials/source_zh_cn/dataset/record.ipynb b/tutorials/source_zh_cn/dataset/record.ipynb index 019e285ad7..5206b545b6 100644 --- a/tutorials/source_zh_cn/dataset/record.ipynb +++ b/tutorials/source_zh_cn/dataset/record.ipynb @@ -10,7 +10,7 @@ "[![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_download_code.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/zh_cn/dataset/mindspore_record.py) \n", "[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/dataset/record.ipynb)\n", "\n", - "MindSpore可以把用于训练网络模型的数据集转换为MindSpore特定的数据格式(MindSpore Record),从而更加方便地保存和加载数据。其目标是归一化用户的数据集,并进一步通过`MindDataset`接口实现数据的读取,并用于训练过程。\n", + "MindSpore可以把用于训练网络模型的数据集转换为MindSpore特定的数据格式(MindSpore Record),从而更加方便地保存和加载数据。其目标是归一化用户的数据集,并进一步通过[mindspore.dataset.MindDataset](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.MindDataset.html)接口实现数据的读取,并用于训练过程。\n", "\n", "![conversion](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_zh_cn/dataset/images/data_conversion_concept.png)\n", "\n", @@ -273,7 +273,7 @@ "\n", "### 转存CIFAR-10数据集\n", "\n", - "用户可以通过`Dataset.save`类,将CIFAR-10原始数据转换为MindSpore Record,并使用`MindDataset`接口读取。\n", + "用户可以通过[mindspore.dataset.Dataset.save](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/dataset_method/operation/mindspore.dataset.Dataset.save.html)方法,将CIFAR-10原始数据转换为MindSpore Record,并使用`MindDataset`接口读取。\n", "\n", "1. 
下载[CIFAR-10数据集](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz),并使用`Cifar10Dataset`加载。" ] diff --git a/tutorials/source_zh_cn/dataset/sampler.ipynb b/tutorials/source_zh_cn/dataset/sampler.ipynb index 5e3b576855..d122138b18 100644 --- a/tutorials/source_zh_cn/dataset/sampler.ipynb +++ b/tutorials/source_zh_cn/dataset/sampler.ipynb @@ -29,7 +29,7 @@ "\n", "### 自定义数据集\n", "\n", - "MindSpore可以构造自定义数据加载类或自定义数据集生成函数的方式来生成数据集,然后通过GeneratorDataset接口实现自定义方式的数据集加载。\n", + "MindSpore可以构造自定义数据加载类或自定义数据集生成函数的方式来生成数据集,然后通过[mindspore.dataset.GeneratorDataset](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.GeneratorDataset.html)接口实现自定义方式的数据集加载。\n", "\n", "`GeneratorDataset` 支持通过可随机访问数据集对象、可迭代数据集对象和生成器(generator)构造自定义数据集,下面分别对其进行介绍。\n", "\n", @@ -249,7 +249,7 @@ "\n", "为满足训练需求,解决诸如数据集过大或样本类别分布不均等问题,MindSpore提供了多种不同用途的采样器(Sampler),帮助用户对数据集进行不同形式的采样。用户只需在加载数据集时传入采样器对象,即可实现数据的采样。\n", "\n", - "MindSpore目前提供了如`RandomSampler`、`WeightedRandomSampler`、`SubsetRandomSampler`等多种采样器。此外,用户也可以根据需要实现自定义的采样器类。\n", + "MindSpore目前提供了如[mindspore.dataset.RandomSampler](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.RandomSampler.html)、[mindspore.dataset.WeightedRandomSampler](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.WeightedRandomSampler.html)、[mindspore.dataset.SubsetRandomSampler](https://www.mindspore.cn/docs/zh-CN/master/api_python/dataset/mindspore.dataset.SubsetRandomSampler.html)等多种采样器。此外,用户也可以根据需要实现自定义的采样器类。\n", "\n", "> 更多采样器的使用方法参见[采样器API文档](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.dataset.loading.html#%E9%87%87%E6%A0%B7%E5%99%A8-1)。\n", "\n", diff --git a/tutorials/source_zh_cn/parallel/data_parallel.md b/tutorials/source_zh_cn/parallel/data_parallel.md index 04eb60a3ca..c93abf6cac 100644 --- a/tutorials/source_zh_cn/parallel/data_parallel.md +++ b/tutorials/source_zh_cn/parallel/data_parallel.md @@ -53,7 +53,7 @@ rank_size = get_group_size() dataset = ds.MnistDataset(dataset_path, num_shards=rank_size, shard_id=rank_id) ``` -其中,与单卡不同的是,在数据集接口需要传入`num_shards`和`shard_id`参数,分别对应卡的数量和逻辑序号,建议通过`mindspore.communication`接口获取: +其中,与单卡不同的是,在数据集接口需要传入`num_shards`和`shard_id`参数,分别对应卡的数量和逻辑序号,建议通过[mindspore.communication](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.communication.html)模块的以下接口获取: - `get_rank`:获取当前设备在集群中的ID。 - `get_group_size`:获取集群数量。 @@ -115,7 +115,7 @@ net = Network() ## 训练网络 -在这一步,我们需要定义损失函数、优化器以及训练过程。与单卡模型不同的地方在于,数据并行模式还需要增加`mindspore.nn.DistributedGradReducer()`接口,来对所有卡的梯度进行聚合,该接口第一个参数为需要更新的网络参数: +在这一步,我们需要定义损失函数、优化器以及训练过程。与单卡模型不同的地方在于,数据并行模式还需要增加[mindspore.nn.DistributedGradReducer()](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.DistributedGradReducer.html)接口,来对所有卡的梯度进行聚合,该接口第一个参数为需要更新的网络参数: ```python from mindspore import nn diff --git a/tutorials/source_zh_cn/parallel/msrun_launcher.md b/tutorials/source_zh_cn/parallel/msrun_launcher.md index 5cb77133b2..162fea5384 100644 --- a/tutorials/source_zh_cn/parallel/msrun_launcher.md +++ b/tutorials/source_zh_cn/parallel/msrun_launcher.md @@ -453,9 +453,9 @@ if get_rank() == 7: ms.set_seed(1) ``` -> `mindspore.communication.get_rank()`接口需要在调用`mindspore.communication.init()`接口完成分布式初始化后才能正常获取rank信息,否则`get_rank()`默认返回0。 +> 
[mindspore.communication.get_rank()](https://www.mindspore.cn/docs/zh-CN/master/api_python/communication/mindspore.communication.get_rank.html)接口需要在调用[mindspore.communication.init()](https://www.mindspore.cn/docs/zh-CN/master/api_python/communication/mindspore.communication.init.html)接口完成分布式初始化后才能正常获取rank信息,否则`get_rank()`默认返回0。 -在对某一rank进行断点操作之后,会导致该rank进程执行停止在断点处等待后续交互操作,而其他未断点rank进程会继续运行,这样可能会导致快慢卡的情况,所以可以使用`mindspore.communication.comm_func.barrier()`算子和`mindspore.common.api._pynative_executor.sync()`来同步所有rank的运行,确保其他rank阻塞等待,且一旦调试的rank继续运行则其他rank的停止会被释放。比如在单机八卡任务中,仅针对rank 7进行断点调试且阻塞所有其他rank: +在对某一rank进行断点操作之后,会导致该rank进程执行停止在断点处等待后续交互操作,而其他未断点rank进程会继续运行,这样可能会导致快慢卡的情况,所以可以使用[mindspore.communication.comm_func.barrier()](https://www.mindspore.cn/docs/zh-CN/master/api_python/communication/mindspore.communication.comm_func.barrier.html)算子和`mindspore.common.api._pynative_executor.sync()`来同步所有rank的运行,确保其他rank阻塞等待,且一旦调试的rank继续运行则其他rank的停止会被释放。比如在单机八卡任务中,仅针对rank 7进行断点调试且阻塞所有其他rank: ```python import pdb diff --git a/tutorials/source_zh_cn/parallel/operator_parallel.md b/tutorials/source_zh_cn/parallel/operator_parallel.md index 592e4b3623..68cbd0e3e5 100644 --- a/tutorials/source_zh_cn/parallel/operator_parallel.md +++ b/tutorials/source_zh_cn/parallel/operator_parallel.md @@ -107,7 +107,7 @@ class Network(nn.Cell): #### 训练网络定义 -在这一步,我们需要定义损失函数、优化器以及训练过程。需要注意的是,由于大模型的参数量巨大,在单卡上定义网络时如果进行参数初始化,显存将远远不够。因此在定义网络时需要配合`no_init_parameters`接口进行延迟初始化,将参数初始化延迟到并行多卡阶段。这里包括网络和优化器的定义都需要延后初始化。 +在这一步,我们需要定义损失函数、优化器以及训练过程。需要注意的是,由于大模型的参数量巨大,在单卡上定义网络时如果进行参数初始化,显存将远远不够。因此在定义网络时需要配合[mindspore.nn.utils.no_init_parameters](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.utils.no_init_parameters.html)接口进行延迟初始化,将参数初始化延迟到并行多卡阶段。这里包括网络和优化器的定义都需要延后初始化。 ```python from mindspore.nn.utils import no_init_parameters @@ -250,7 +250,7 @@ data_set = create_dataset(32) #### 定义网络 -在当前mint算子并行模式下,需要用mint算子定义网络。由于mint算子作为函数式接口,并不直接对外暴露其算子类型原语(Primitive),因此无法直接为算子配置并行策略,而需要用户在单卡网络的基础上使用`mindspore.parallel.shard`接口手动配置mint算子的切分策略,例如配置策略后的网络结构为: +在当前mint算子并行模式下,需要用mint算子定义网络。由于mint算子作为函数式接口,并不直接对外暴露其算子类型原语(Primitive),因此无法直接为算子配置并行策略,而需要用户在单卡网络的基础上使用[mindspore.parallel.shard](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.shard.html)接口手动配置mint算子的切分策略,例如配置策略后的网络结构为: ```python import mindspore as ms diff --git a/tutorials/source_zh_cn/parallel/pipeline_parallel.md b/tutorials/source_zh_cn/parallel/pipeline_parallel.md index 0972dae422..e38b87ae28 100644 --- a/tutorials/source_zh_cn/parallel/pipeline_parallel.md +++ b/tutorials/source_zh_cn/parallel/pipeline_parallel.md @@ -122,11 +122,11 @@ class Network(nn.Cell): ### 训练网络定义 -在这一步,我们需要定义损失函数、优化器以及训练过程。需要注意的是,这里对网络和优化器的定义都需要延后初始化。除此之外,还需要增加 `PipelineGradReducer` 接口,用于处理流水线并行下的梯度,该接口的第一个参数为需要更新的网络参数,第二个为是否使用优化器并行。 +在这一步,我们需要定义损失函数、优化器以及训练过程。需要注意的是,这里对网络和优化器的定义都需要延后初始化。除此之外,还需要增加 [mindspore.parallel.nn.PipelineGradReducer](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.nn.PipelineGradReducer.html) 接口,用于处理流水线并行下的梯度,该接口的第一个参数为需要更新的网络参数,第二个为是否使用优化器并行。 与单卡模型不同,在这部分需要调用两个接口来配置流水线并行: -- 首先需要定义LossCell,本例中调用了`nn.WithLossCell`接口封装网络和损失函数。 +- 首先需要定义LossCell,本例中调用了[mindspore.nn.WithLossCell](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.WithLossCell.html)接口封装网络和损失函数。 - 然后需要在LossCell外包一层`Pipeline`,并指定MicroBatch的size,并通过`stage_config`配置每个包含训练参数的`Cell`的`pipeline_stage`。 ```python @@ -353,7 +353,7 @@ net.head.pipeline_stage = 3 
我们需要进一步设置并行有关的配置,用`AutoParallel`再包裹一次network,指定并行模式`semi_auto`为半自动并行模式,此外,还需开启流水线并行,配置`pipeline`,并通过配置`stages`数来指定stage的总数。此处不设置`device_target`会自动指定为MindSpore包对应的后端硬件设备(默认为Ascend)。`output_broadcast=True`表示流水线并行推理时,将最后一个stage的结果广播给其余stage,可以用于自回归推理场景。 -在执行推理前,先编译计算图`parallel_net.compile()`,再调用`sync_pipeline_shared_parameters(parallel_net)`接口,框架自动同步stage间的共享权重。 +在执行推理前,先编译计算图`parallel_net.compile()`,再调用[mindspore.parallel.sync_pipeline_shared_parameters(parallel_net)](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.sync_pipeline_shared_parameters.html)接口,框架自动同步stage间的共享权重。 ```python -- Gitee
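The interfaces linked by this patch can be exercised with a few short sketches. The first covers the functional differentiation interfaces from the autograd tutorial, `mindspore.value_and_grad` and `mindspore.ops.stop_gradient`. It follows that tutorial's single-layer linear transform; the shapes, the parameter names `w` and `b`, and the choice of `grad_position=(2, 3)` are illustrative assumptions rather than requirements of the API.

```python
import numpy as np
import mindspore
from mindspore import ops, Tensor, Parameter

x = Tensor(np.ones(5, np.float32))    # input
y = Tensor(np.zeros(3, np.float32))   # expected output
w = Parameter(Tensor(np.random.randn(5, 3), mindspore.float32), name='w')
b = Parameter(Tensor(np.random.randn(3,), mindspore.float32), name='b')

def forward(x, y, w, b):
    z = ops.matmul(x, w) + b
    loss = ops.binary_cross_entropy_with_logits(z, y, ops.ones_like(z), ops.ones_like(z))
    # stop_gradient detaches z, so returning it does not add its
    # contribution to the gradients of w and b
    return loss, ops.stop_gradient(z)

# differentiate with respect to the 3rd and 4th positional arguments (w and b)
grad_fn = mindspore.value_and_grad(forward, grad_position=(2, 3))
(loss, z), (dw, db) = grad_fn(x, y, w, b)
print(loss, dw.shape, db.shape)
```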
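The dataset pages link `mindspore.dataset.GeneratorDataset`. The sketch below follows the random-accessible pattern those pages describe (a class exposing `__getitem__` and `__len__`); the column names and array shapes are placeholders.

```python
import numpy as np
from mindspore.dataset import GeneratorDataset

class RandomAccessDataset:
    """Random-accessible source: GeneratorDataset reads it via __getitem__/__len__."""
    def __init__(self):
        self._data = np.ones((5, 2), dtype=np.float32)
        self._label = np.zeros((5, 1), dtype=np.float32)

    def __getitem__(self, index):
        return self._data[index], self._label[index]

    def __len__(self):
        return len(self._data)

dataset = GeneratorDataset(source=RandomAccessDataset(), column_names=["data", "label"])
for data, label in dataset:
    print(data, label)
```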
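The mixed-precision pages link the static and dynamic loss scalers (the class itself is spelled `DynamicLossScaler`). Below is a minimal sketch of the scale → unscale → overflow-check → adjust cycle the tutorial text describes, assuming PyNative mode; the tiny `nn.Dense` network, the SGD optimizer, and the random data are stand-ins.

```python
import numpy as np
import mindspore
from mindspore import nn, Tensor
from mindspore.amp import DynamicLossScaler, all_finite

net = nn.Dense(16, 4)                                  # stand-in network
loss_fn = nn.MSELoss()
optimizer = nn.SGD(net.trainable_params(), learning_rate=0.01)
loss_scaler = DynamicLossScaler(scale_value=2**16, scale_factor=2, scale_window=50)

def forward_fn(data, label):
    loss = loss_fn(net(data), label)
    return loss_scaler.scale(loss)                     # scale the loss up before differentiation

grad_fn = mindspore.value_and_grad(forward_fn, None, net.trainable_params())

def train_step(data, label):
    scaled_loss, grads = grad_fn(data, label)
    grads = loss_scaler.unscale(grads)                 # bring gradients back to the true scale
    is_finite = all_finite(grads)                      # overflow check
    if is_finite:
        optimizer(grads)                               # update only when no overflow occurred
    loss_scaler.adjust(is_finite)                      # grow or shrink scale_value dynamically
    return loss_scaler.unscale(scaled_loss)

data = Tensor(np.random.randn(8, 16), mindspore.float32)
label = Tensor(np.random.randn(8, 4), mindspore.float32)
print(train_step(data, label))
```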
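The save-and-load page links `mindspore.save_checkpoint`. A short sketch of the checkpoint round trip it describes, with a stand-in `SequentialCell` model and a hypothetical file name:

```python
import mindspore
from mindspore import nn

net = nn.SequentialCell(nn.Flatten(), nn.Dense(28 * 28, 512), nn.ReLU(), nn.Dense(512, 10))

# save the weights of the network to a checkpoint file
mindspore.save_checkpoint(net, "model.ckpt")

# load them back into a network with the same structure
param_dict = mindspore.load_checkpoint("model.ckpt")
param_not_load, _ = mindspore.load_param_into_net(net, param_dict)
print(param_not_load)   # an empty list means every parameter was matched and loaded
```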
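The same page links `mindspore.nn.GraphCell` for MindIR inference. The sketch below exports a stand-in network to MindIR and loads it back; since `nn.GraphCell` only supports graph mode, the context is switched first. The file name and input shape are illustrative assumptions.

```python
import numpy as np
import mindspore
from mindspore import nn, Tensor

model = nn.SequentialCell(nn.Flatten(), nn.Dense(28 * 28, 512), nn.ReLU(), nn.Dense(512, 10))
inputs = Tensor(np.ones([1, 1, 28, 28]).astype(np.float32))

# MindIR stores the graph structure as well, so a sample input defines the input shape
mindspore.export(model, inputs, file_name="model", file_format="MINDIR")

mindspore.set_context(mode=mindspore.GRAPH_MODE)   # nn.GraphCell only supports graph mode
graph = mindspore.load("model.mindir")
net = nn.GraphCell(graph)
print(net(inputs).shape)
```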