From 8f2ed959cffe86f2f0cd533c527df0729695c741 Mon Sep 17 00:00:00 2001 From: zhangyi Date: Tue, 9 Aug 2022 10:52:13 +0800 Subject: [PATCH] update the translation --- tutorials/source_en/advanced/network.rst | 12 + .../advanced/network/control_flow.md | 384 ++++++++++++ .../source_en/advanced/network/derivation.md | 592 ++++++++++++++++++ .../source_en/advanced/network/forward.md | 472 ++++++++++++++ tutorials/source_en/advanced/network/loss.md | 322 ++++++++++ tutorials/source_en/advanced/network/optim.md | 292 +++++++++ .../source_en/advanced/network/parameter.md | 314 ++++++++++ tutorials/source_en/index.rst | 3 +- 8 files changed, 2390 insertions(+), 1 deletion(-) create mode 100644 tutorials/source_en/advanced/network.rst create mode 100644 tutorials/source_en/advanced/network/control_flow.md create mode 100644 tutorials/source_en/advanced/network/derivation.md create mode 100644 tutorials/source_en/advanced/network/forward.md create mode 100644 tutorials/source_en/advanced/network/loss.md create mode 100644 tutorials/source_en/advanced/network/optim.md create mode 100644 tutorials/source_en/advanced/network/parameter.md diff --git a/tutorials/source_en/advanced/network.rst b/tutorials/source_en/advanced/network.rst new file mode 100644 index 0000000000..e94141feb8 --- /dev/null +++ b/tutorials/source_en/advanced/network.rst @@ -0,0 +1,12 @@ +Network Building +=================== + +.. toctree:: + :maxdepth: 1 + + network/parameter + network/loss + network/optim + network/forward + network/derivation + network/control_flow diff --git a/tutorials/source_en/advanced/network/control_flow.md b/tutorials/source_en/advanced/network/control_flow.md new file mode 100644 index 0000000000..a32b8a4fd8 --- /dev/null +++ b/tutorials/source_en/advanced/network/control_flow.md @@ -0,0 +1,384 @@ +# Process Control Statements + + + +Currently, there are two execution modes of a mainstream deep learning framework: a static graph mode (`GRAPH_MODE`) and a dynamic graph mode (`PYNATIVE_MODE`). + +In `PYNATIVE_MODE`, MindSpore fully supports process control statements of the native Python syntax. In `GRAPH_MODE`, MindSpore performance is optimized during build. Therefore, there are some special constraints on using process control statements when during network definition. Other constraints are the same as those in the native Python syntax. + +When switching the running mode from dynamic graph to static graph, pay attention to the [static graph syntax support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html#static-graph-syntax-support). The following describes how to use process control statements when defining a network in `GRAPH_MODE`. + +## Constant and Variable Conditions + +When a network is defined in `GRAPH_MODE`, MindSpore classifies condition expressions in process control statements into constant and variable conditions. During graph build, a condition expression that can be determined to be either true or false is a constant condition, while a condition expression that cannot be determined to be true or false is a variable condition. **MindSpore generates control flow operators on a network only when the condition expression is a variable condition.** + +It should be noted that, when a control flow operator exists in a network, the network is divided into multiple execution subgraphs, and process jumping and data transmission between the subgraphs cause performance loss to some extent. 
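This classification, and the control flow operators it may introduce, apply only to `GRAPH_MODE`; in `PYNATIVE_MODE` the native Python control flow runs as is. The examples below are therefore assumed to run in graph mode. A minimal sketch of enabling it (assuming the `mindspore.set_context` interface; older versions expose the same option through `mindspore.context.set_context`):

```python
import mindspore as ms

# Run the following examples in static graph mode.
ms.set_context(mode=ms.GRAPH_MODE)
```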
+ +### Constant Conditions + +Check methods: + +- The condition expression does not contain tensors or any list, tuple, or dict whose elements are of the tensor type. +- The condition expression contains tensors or a list, tuple, or dict of the tensor type, but the condition expression result is not affected by the tensor value. + +Examples: + +- `for i in range(0,10)`: `i` is a scalar. The result of the potential condition expression `i < 10` can be determined during graph build. Therefore, it is a constant condition. + +- `self.flag`: A scalar of the Boolean type. Its value is determined when the Cell object is built. + +- `x + 1 < 10`: `x` is a scalar: The value of `x + 1` is uncertain when the Cell object is built. MindSpore computes the results of all scalar expressions during graph build. Therefore, the value of the expression is determined during build. + +- `len(my_list) < 10`: `my_list` is a list object whose element is of the tensor type. This condition expression contains tensors, but the expression result is not affected by the tensor value and is related only to the number of tensors in `my_list`. + +### Variable Conditions + +Check method: + +- The condition expression contains tensors or a list, tuple, or dict of the tensor type, and the condition expression result is affected by the tensor value. + +Examples: + +- `x < y`: `x` and `y` are operator outputs. + +- `x in list`: `x` is the operator output. + +The operator output can be determined only when each step is executed. Therefore, the preceding two conditions are variable conditions. + +## if Statement + +When defining a network in `GRAPH_MODE` using the `if` statement, pay attention to the following: **When the condition expression is a variable condition, the same variable in different branches must be assigned the same data type.** + +### if Statement Under a Variable Condition + +In the following code, shapes of tensors assigned to the `out` variable in the `if` and `else` branches are `()` and `(2,)`, respectively. The shape of the tensor returned by the network is determined by the condition `x < y`. The result of `x < y` cannot be determined during graph build. Therefore, whether the `out` shape is `()` or `(2,)` cannot be determined during graph build. MindSpore throws an exception due to type derivation failure. + +```python +import numpy as np +import mindspore as ms +from mindspore import nn + +class SingleIfNet(nn.Cell): + + def construct(self, x, y, z): + # Build an if statement whose condition expression is a variable condition. + if x < y: + out = x + else: + out = z + out = out + 1 + return out + +forward_net = SingleIfNet() + +x = ms.Tensor(np.array(0), dtype=ms.int32) +y = ms.Tensor(np.array(1), dtype=ms.int32) +z = ms.Tensor(np.array([1, 2]), dtype=ms.int32) + +output = forward_net(x, y, z) +``` + +Execute the preceding code. The error information is as follows: + +```text +ValueError: mindspore/ccsrc/pipeline/jit/static_analysis/static_analysis.cc:800 ProcessEvalResults] Cannot join the return values of different branches, perhaps you need to make them equal. +Shape Join Failed: shape1 = (), shape2 = (2). +``` + +### if Statement Under a Constant Condition + +When the condition expression in the `if` statement is a constant condition, the usage of the condition expression is the same as that of the native Python syntax, and there is no additional constraint. 
In the following code, the condition expression `x < y + 1` of the `if` statement is a constant condition (because `x` and `y` are scalar constants). During graph build, the `out` variable is of the scalar `int` type. The network can be built and executed properly, and the correct result `1` is displayed. + +```python +import numpy as np +import mindspore as ms +from mindspore import nn + +class SingleIfNet(nn.Cell): + + def construct(self, z): + x = 0 + y = 1 + + # Build an if statement whose condition expression is a constant condition. + if x < y + 1: + out = x + else: + out = z + out = out + 1 + + return out + +z = ms.Tensor(np.array([0, 1]), dtype=ms.int32) +forward_net = SingleIfNet() + +output = forward_net(z) +print("output:", output) +``` + +```text + output: 1 +``` + +## for Statement + +The `for` statement expands the loop body. Therefore, the number of subgraphs and operators of the network that uses the `for` statement depends on the number of loops of the `for` statement. If the number of operators or subgraphs is too large, more hardware resources are consumed. + +In the following sample code, the loop body in the `for` statement is executed for three times, and the output is `5`. + +```python +import numpy as np +from mindspore import nn +import mindspore as ms + +class IfInForNet(nn.Cell): + + def construct(self, x, y): + out = 0 + + # Build a for statement whose condition expression is a constant condition. + for i in range(0, 3): + # Build an if statement whose condition expression is a variable condition. + if x + i < y: + out = out + x + else: + out = out + y + out = out + 1 + + return out + +forward_net = IfInForNet() + +x = ms.Tensor(np.array(0), dtype=ms.int32) +y = ms.Tensor(np.array(1), dtype=ms.int32) + +output = forward_net(x, y) +print("output:", output) +``` + +```text + output: 5 +``` + +The `for` statement expands the loop body. Therefore, the preceding code is equivalent to the following code: + +```python +import numpy as np +from mindspore import nn +import mindspore as ms + +class IfInForNet(nn.Cell): + def construct(self, x, y): + out = 0 + + # Loop: 0 + if x + 0 < y: + out = out + x + else: + out = out + y + out = out + 1 + # Loop: 1 + if x + 1 < y: + out = out + x + else: + out = out + y + out = out + 1 + # Loop: 2 + if x + 2 < y: + out = out + x + else: + out = out + y + out = out + 1 + + return out + +forward_net = IfInForNet() + +x = ms.Tensor(np.array(0), dtype=ms.int32) +y = ms.Tensor(np.array(1), dtype=ms.int32) + +output = forward_net(x, y) +print("output:", output) +``` + +```text + output: 5 +``` + +According to the preceding sample code, using the `for` statement may cause too many subgraphs in some scenarios. To reduce hardware resource overhead and improve network build performance, you can convert the `for` statement to the `while` statement whose condition expression is a variable condition. + +## while Statement + +The `while` statement is more flexible than the `for` statement. When the condition of `while` is a constant, `while` processes and expands the loop body in a similar way as `for`. + +When the condition expression of `while` is a variable condition, the `while` statement does not expand the loop body. Instead, a control flow operator is generated during graph execution. Therefore, the problem of too many subgraphs caused by the `for` loop can be avoided. 
+ +### while Statement Under a Constant Condition + +In the following sample code, the loop body in the `for` statement is executed for three times, and the output result is `5`, which is essentially the same as the sample code in the `for` statement. + +```python +import numpy as np +from mindspore import nn +import mindspore as ms + +class IfInWhileNet(nn.Cell): + + def construct(self, x, y): + i = 0 + out = x + # Build a while statement whose condition expression is a constant condition. + while i < 3: + # Build an if statement whose condition expression is a variable condition. + if x + i < y: + out = out + x + else: + out = out + y + out = out + 1 + i = i + 1 + return out + +forward_net = IfInWhileNet() +x = ms.Tensor(np.array(0), dtype=ms.int32) +y = ms.Tensor(np.array(1), dtype=ms.int32) + +output = forward_net(x, y) +print("output:", output) +``` + +```text + output: 5 +``` + +### while Statement Under a Variable Condition + +1. Constraint 1: **When the condition expression in the while statement is a variable condition, the while loop body cannot contain computation operations of non-tensor types, such as scalar, list, and tuple.** + + To avoid too many control flow operators, you can use the `while` statement whose condition expression is a variable condition to rewrite the preceding code. + + ```python + import numpy as np + from mindspore import nn + import mindspore as ms + + class IfInWhileNet(nn.Cell): + + def construct(self, x, y, i): + out = x + # Build a while statement whose condition expression is a variable condition. + while i < 3: + # Build an if statement whose condition expression is a variable condition. + if x + i < y: + out = out + x + else: + out = out + y + out = out + 1 + i = i + 1 + return out + + forward_net = IfInWhileNet() + i = ms.Tensor(np.array(0), dtype=ms.int32) + x = ms.Tensor(np.array(0), dtype=ms.int32) + y = ms.Tensor(np.array(1), dtype=ms.int32) + + output = forward_net(x, y, i) + print("output:", output) + ``` + + ```text + output: 5 + ``` + + It should be noted that in the preceding code, the condition expression of the `while` statement is a variable condition, and the `while` loop body is not expanded. The expressions in the `while` loop body are computed during the running of each step. In addition, the following constraints are generated: + + > When the condition expression in the `while` statement is a variable condition, the `while` loop body cannot contain computation operations of non-tensor types, such as scalar, list, and tuple. + + These types of computation operations are completed during graph build, which conflicts with the computation mechanism of the `while` loop body during execution. The following uses sample code as an example: + + ```Python + import numpy as np + from mindspore import nn + import mindspore as ms + class IfInWhileNet(nn.Cell): + + def __init__(self): + super().__init__() + self.nums = [1, 2, 3] + + def construct(self, x, y, i): + j = 0 + out = x + + # Build a while statement whose condition expression is a variable condition. + while i < 3: + if x + i < y: + out = out + x + else: + out = out + y + out = out + self.nums[j] + i = i + 1 + # Build scalar computation in the loop body of the while statement whose condition expression is a variable condition. 
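                # j is a plain Python scalar, so this addition is evaluated during graph build,
                # which conflicts with the while loop being evaluated at run time and causes the IndexError shown below.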
+ j = j + 1 + + return out + + forward_net = IfInWhileNet() + i = ms.Tensor(np.array(0), dtype=ms.int32) + x = ms.Tensor(np.array(0), dtype=ms.int32) + y = ms.Tensor(np.array(1), dtype=ms.int32) + + output = forward_net(x, y, i) + ``` + + In the preceding code, the `while` loop body of the condition expression `i < 3` contains scalar computation `j = j + 1`. As a result, an error occurs during graph build. The following error information is displayed during code execution: + + ```text + IndexError: mindspore/core/abstract/prim_structures.cc:127 InferTupleOrListGetItem] list_getitem evaluator index should be in range[-3, 3), but got 3. + ``` + +2. Constraint 2: **When the condition expression in the while statement is a variable condition, the input shape of the operator cannot be changed in the loop body.** + + MindSpore requires that the input shape of the same operator on the network be determined during graph build. However, changing the input shape of the operator in the `while` loop body takes effect during graph execution. + + The following uses sample code as an example: + + ```Python + import numpy as np + from mindspore import nn + import mindspore as ms + from mindspore import ops + + class IfInWhileNet(nn.Cell): + + def __init__(self): + super().__init__() + self.expand_dims = ops.ExpandDims() + + def construct(self, x, y, i): + out = x + # Build a while statement whose condition expression is a variable condition. + while i < 3: + if x + i < y: + out = out + x + else: + out = out + y + out = out + 1 + # Change the input shape of an operator. + out = self.expand_dims(out, -1) + i = i + 1 + return out + + forward_net = IfInWhileNet() + i = ms.Tensor(np.array(0), dtype=ms.int32) + x = ms.Tensor(np.array(0), dtype=ms.int32) + y = ms.Tensor(np.array(1), dtype=ms.int32) + + output = forward_net(x, y, i) + ``` + + In the preceding code, the `ExpandDims` operator in the `while` loop body of the condition expression `i < 3` changes the input shape of the expression `out = out + 1` in the next loop. As a result, an error occurs during graph build. The following error information is displayed during code execution: + + ```text + ValueError: mindspore/ccsrc/pipeline/jit/static_analysis/static_analysis.cc:800 ProcessEvalResults] Cannot join the return values of different branches, perhaps you need to make them equal. + Shape Join Failed: shape1 = (1), shape2 = (1, 1). + ``` diff --git a/tutorials/source_en/advanced/network/derivation.md b/tutorials/source_en/advanced/network/derivation.md new file mode 100644 index 0000000000..30b8c521af --- /dev/null +++ b/tutorials/source_en/advanced/network/derivation.md @@ -0,0 +1,592 @@ +# Automatic Derivation + + + +The `GradOperation` API provided by the `mindspore.ops` module can be used to generate a gradient of a network model. The following describes how to use the `GradOperation` API to perform first-order and second-order derivations and how to stop gradient computation. + +> For details about `GradOperation`, see [API](https://mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.GradOperation.html#mindspore.ops.GradOperation). + +## First-order Derivation + +Method: `mindspore.ops.GradOperation()`. The parameter usage is as follows: + +- `get_all`: If this parameter is set to `False`, the derivation is performed only on the first input. If this parameter is set to `True`, the derivation is performed on all inputs. +- `get_by_list`: If this parameter is set to `False`, the weight derivation is not performed. 
If this parameter is set to `True`, the weight derivation is performed. +- `sens_param`: The output value of the network is scaled to change the final gradient. Therefore, the dimension is the same as the output dimension. + +The [MatMul](https://mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.MatMul.html#mindspore.ops.MatMul) operator is used to build a customized network model `Net`, and then perform first-order derivation on the model. The following formula is an example to describe how to use the `GradOperation` API: + +$$f(x, y)=(x * z) * y \tag{1}$$ + +First, define the network model `Net`, input `x`, and input `y`. + +```python +import numpy as np +import mindspore.nn as nn +import mindspore.ops as ops +import mindspore as ms + +# Define the inputs x and y. +x = ms.Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=ms.float32) +y = ms.Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=ms.float32) + +class Net(nn.Cell): + """Define the matrix multiplication network Net.""" + + def __init__(self): + super(Net, self).__init__() + self.matmul = ops.MatMul() + self.z = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='z') + + def construct(self, x, y): + x = x * self.z + out = self.matmul(x, y) + return out +``` + +### Computing the Input Derivative + +Compute the input derivative. The code is as follows: + +```python +class GradNetWrtX(nn.Cell): + """Define the first-order derivation of network input.""" + + def __init__(self, net): + super(GradNetWrtX, self).__init__() + self.net = net + self.grad_op = ops.GradOperation() + + def construct(self, x, y): + gradient_function = self.grad_op(self.net) + return gradient_function(x, y) + +output = GradNetWrtX(Net())(x, y) +print(output) +``` + +```text + [[4.5099998 2.7 3.6000001] + [4.5099998 2.7 3.6000001]] +``` + +The preceding result is explained as follows. To facilitate analysis, the preceding inputs `x` and `y`, and weight `z` are expressed as follows: + +```text +x = ms.Tensor([[x1, x2, x3], [x4, x5, x6]]) +y = ms.Tensor([[y1, y2, y3], [y4, y5, y6], [y7, y8, y9]]) +z = ms.Tensor([z]) +``` + +The following forward result can be obtained based on the definition of the MatMul operator: + +$$output = [[(x_1 \cdot y_1 + x_2 \cdot y_4 + x_3 \cdot y_7) \cdot z, (x_1 \cdot y_2 + x_2 \cdot y_5 + x_3 \cdot y_8) \cdot z, (x_1 \cdot y_3 + x_2 \cdot y_6 + x_3 \cdot y_9) \cdot z],$$ + +$$[(x_4 \cdot y_1 + x_5 \cdot y_4 + x_6 \cdot y_7) \cdot z, (x_4 \cdot y_2 + x_5 \cdot y_5 + x_6 \cdot y_8) \cdot z, (x_4 \cdot y_3 + x_5 \cdot y_6 + x_6 \cdot y_9) \cdot z]] \tag{2}$$ + +MindSpore uses the reverse-mode automatic differentiation mechanism during gradient computation. The output result is summed and then the derivative of the input `x` is computed. + +1. Sum formula: + + $$\sum{output} = [(x_1 \cdot y_1 + x_2 \cdot y_4 + x_3 \cdot y_7) + (x_1 \cdot y_2 + x_2 \cdot y_5 + x_3 \cdot y_8) + (x_1 \cdot y_3 + x_2 \cdot y_6 + x_3 \cdot y_9)$$ + + $$+ (x_4 \cdot y_1 + x_5 \cdot y_4 + x_6 \cdot y_7) + (x_4 \cdot y_2 + x_5 \cdot y_5 + x_6 \cdot y_8) + (x_4 \cdot y_3 + x_5 \cdot y_6 + x_6 \cdot y_9)] \cdot z \tag{3}$$ + +2. Derivation formula: + + $$\frac{\mathrm{d}(\sum{output})}{\mathrm{d}x} = [[(y_1 + y_2 + y_3) \cdot z, (y_4 + y_5 + y_6) \cdot z, (y_7 + y_8 + y_9) \cdot z],$$ + + $$[(y_1 + y_2 + y_3) \cdot z, (y_4 + y_5 + y_6) \cdot z, (y_7 + y_8 + y_9) \cdot z]] \tag{4}$$ + +3. 
Computation result: + + $$\frac{\mathrm{d}(\sum{output})}{\mathrm{d}x} = [[4.51 \quad 2.7 \quad 3.6] [4.51 \quad 2.7 \quad 3.6]] \tag{5}$$ + + > If the derivatives of the `x` and `y` inputs are considered, you only need to set `self.grad_op = GradOperation(get_all=True)` in `GradNetWrtX`. + +### Computing the Weight Derivative + +Compute the weight derivative. The sample code is as follows: + +```python +class GradNetWrtZ(nn.Cell): + """Define the first-order derivation of network weight."" + + def __init__(self, net): + super(GradNetWrtZ, self).__init__() + self.net = net + self.params = ms.ParameterTuple(net.trainable_params()) + self.grad_op = ops.GradOperation(get_by_list=True) + + def construct(self, x, y): + gradient_function = self.grad_op(self.net, self.params) + return gradient_function(x, y) + +output = GradNetWrtZ(Net())(x, y) +print(output[0]) +``` + +```text + [21.536] +``` + +The following formula is used to explain the preceding result. A derivation formula for the weight is: + +$$\frac{\mathrm{d}(\sum{output})}{\mathrm{d}z} = (x_1 \cdot y_1 + x_2 \cdot y_4 + x_3 \cdot y_7) + (x_1 \cdot y_2 + x_2 \cdot y_5 + x_3 \cdot y_8) + (x_1 \cdot y_3 + x_2 \cdot y_6 + x_3 \cdot y_9)$$ + +$$+ (x_4 \cdot y_1 + x_5 \cdot y_4 + x_6 \cdot y_7) + (x_4 \cdot y_2 + x_5 \cdot y_5 + x_6 \cdot y_8) + (x_4 \cdot y_3 + x_5 \cdot y_6 + x_6 \cdot y_9) \tag{6}$$ + +Computation result: + +$$\frac{\mathrm{d}(\sum{output})}{\mathrm{d}z} = [2.1536e+01] \tag{7}$$ + +### Gradient Value Scaling + +You can use the `sens_param` parameter to control the scaling of the gradient value. + +```python +class GradNetWrtN(nn.Cell): + """Define the first-order derivation of the network and control gradient value scaling.""" + def __init__(self, net): + super(GradNetWrtN, self).__init__() + self.net = net + self.grad_op = ops.GradOperation(sens_param=True) + + # Define gradient value scaling. + self.grad_wrt_output = ms.Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=ms.float32) + + def construct(self, x, y): + gradient_function = self.grad_op(self.net) + return gradient_function(x, y, self.grad_wrt_output) + +output = GradNetWrtN(Net())(x, y) +print(output) +``` + +```text + [[2.211 0.51 1.49 ] + [5.588 2.68 4.07 ]] +``` + +To facilitate the explanation of the preceding result, `self.grad_wrt_output` is recorded as follows: + +```text +self.grad_wrt_output = ms.Tensor([[s1, s2, s3], [s4, s5, s6]]) +``` + +The output value after scaling is the product of the original output value and the element corresponding to `self.grad_wrt_output`. The formula is as follows: + +$$output = [[(x_1 \cdot y_1 + x_2 \cdot y_4 + x_3 \cdot y_7) \cdot z \cdot s_1, (x_1 \cdot y_2 + x_2 \cdot y_5 + x_3 \cdot y_8) \cdot z \cdot s_2, (x_1 \cdot y_3 + x_2 \cdot y_6 + x_3 \cdot y_9) \cdot z \cdot s_3], $$ + +$$[(x_4 \cdot y_1 + x_5 \cdot y_4 + x_6 \cdot y_7) \cdot z \cdot s_4, (x_4 \cdot y_2 + x_5 \cdot y_5 + x_6 \cdot y_8) \cdot z \cdot s_5, (x_4 \cdot y_3 + x_5 \cdot y_6 + x_6 \cdot y_9) \cdot z \cdot s_6]] \tag{8}$$ + +The derivation formula is changed to compute the derivative of the sum of the output values to each element of `x`. 
+ +$$\frac{\mathrm{d}(\sum{output})}{\mathrm{d}x} = [[(s_1 \cdot y_1 + s_2 \cdot y_2 + s_3 \cdot y_3) \cdot z, (s_1 \cdot y_4 + s_2 \cdot y_5 + s_3 \cdot y_6) \cdot z, (s_1 \cdot y_7 + s_2 \cdot y_8 + s_3 \cdot y_9) \cdot z],$$ + +$$[(s_4 \cdot y_1 + s_5 \cdot y_2 + s_6 \cdot y_3) \cdot z, (s_4 \cdot y_4 + s_5 \cdot y_5 + s_6 \cdot y_6) \cdot z, (s_4 \cdot y_7 + s_5 \cdot y_8 + s_6 \cdot y_9) \cdot z]] \tag{9}$$ + +Computation result: + +$$\frac{\mathrm{d}(\sum{output})}{\mathrm{d}x} = [[2.211 \quad 0.51 \quad 1.49][5.588 \quad 2.68 \quad 4.07]] \tag{10}$$ + +### Stopping Gradient Computation + +You can use `stop_gradient` to stop computing the gradient of a specified operator to eliminate the impact of the operator on the gradient. + +Based on the matrix multiplication network model used for the first-order derivation, add an operator `out2` and disable the gradient computation to obtain the customized network `Net2`. Then, check the derivation result of the input. + +The sample code is as follows: + +```python +class Net(nn.Cell): + + def __init__(self): + super(Net, self).__init__() + self.matmul = ops.MatMul() + + def construct(self, x, y): + out1 = self.matmul(x, y) + out2 = self.matmul(x, y) + out2 = ops.stop_gradient(out2) # Stop computing the gradient of the out2 operator. + out = out1 + out2 + return out + +class GradNetWrtX(nn.Cell): + + def __init__(self, net): + super(GradNetWrtX, self).__init__() + self.net = net + self.grad_op = ops.GradOperation() + + def construct(self, x, y): + gradient_function = self.grad_op(self.net) + return gradient_function(x, y) + +output = GradNetWrtX(Net())(x, y) +print(output) +``` + +```text + [[4.5099998 2.7 3.6000001] + [4.5099998 2.7 3.6000001]] +``` + +According to the preceding information, `stop_gradient` is set for `out2`. Therefore, `out2` does not contribute to gradient computation. The output result is the same as that when `out2` is not added. + +Delete `out2 = stop_gradient(out2)` and check the output. An example of the code is as follows: + +```python +class Net(nn.Cell): + def __init__(self): + super(Net, self).__init__() + self.matmul = ops.MatMul() + + def construct(self, x, y): + out1 = self.matmul(x, y) + out2 = self.matmul(x, y) + # out2 = stop_gradient(out2) + out = out1 + out2 + return out + +class GradNetWrtX(nn.Cell): + def __init__(self, net): + super(GradNetWrtX, self).__init__() + self.net = net + self.grad_op = ops.GradOperation() + + def construct(self, x, y): + gradient_function = self.grad_op(self.net) + return gradient_function(x, y) + +output = GradNetWrtX(Net())(x, y) +print(output) +``` + +```text + [[9.0199995 5.4 7.2000003] + [9.0199995 5.4 7.2000003]] +``` + +According to the printed result, after the gradient of the `out2` operator is computed, the gradients generated by the `out2` and `out1` operators are the same. Therefore, the value of each item in the result is twice the original value (accuracy error exists). + +### Customized Backward Propagation Function + +When MindSpore is used to build a neural network, the `nn.Cell` class needs to be inherited. When there are some operations that do not define backward propagation rules on the network, or when you want to control the gradient computation process of the entire network, you can use the function of customizing the backward propagation function of the `nn.Cell` object. The format is as follows: + +```python +def bprop(self, ..., out, dout): + return ... +``` + +- Input parameters: Input parameters in the forward porpagation plus `out` and `dout`. 
`out` indicates the computation result of the forward porpagation, and `dout` indicates the gradient returned to the `nn.Cell` object. +- Return values: Gradient of each input in the forward porpagation. The number of return values must be the same as the number of inputs in the forward porpagation. + +A complete example is as follows: + +```python +import mindspore.nn as nn +import mindspore as ms +import mindspore.ops as ops + +class Net(nn.Cell): + def __init__(self): + super(Net, self).__init__() + self.matmul = ops.MatMul() + + def construct(self, x, y): + out = self.matmul(x, y) + return out + + def bprop(self, x, y, out, dout): + dx = x + 1 + dy = y + 1 + return dx, dy + + +class GradNet(nn.Cell): + def __init__(self, net): + super(GradNet, self).__init__() + self.net = net + self.grad_op = ops.GradOperation(get_all=True) + + def construct(self, x, y): + gradient_function = self.grad_op(self.net) + return gradient_function(x, y) + + +x = ms.Tensor([[0.5, 0.6, 0.4], [1.2, 1.3, 1.1]], dtype=ms.float32) +y = ms.Tensor([[0.01, 0.3, 1.1], [0.1, 0.2, 1.3], [2.1, 1.2, 3.3]], dtype=ms.float32) +out = GradNet(Net())(x, y) +print(out) +``` + +```text + (Tensor(shape=[2, 3], dtype=Float32, value= + [[ 1.50000000e+00, 1.60000002e+00, 1.39999998e+00], + [ 2.20000005e+00, 2.29999995e+00, 2.09999990e+00]]), Tensor(shape=[3, 3], dtype=Float32, value= + [[ 1.00999999e+00, 1.29999995e+00, 2.09999990e+00], + [ 1.10000002e+00, 1.20000005e+00, 2.29999995e+00], + [ 3.09999990e+00, 2.20000005e+00, 4.30000019e+00]])) +``` + +Constraints + +- If the number of return values of the `bprop` function is 1, the return value must be written in the tuple format, that is, `return (dx,)`. +- In graph mode, the `bprop` function needs to be converted into a graph IR. Therefore, the static graph syntax must be complied with. For details, see [Static Graph Syntax Support](https://www.mindspore.cn/docs/en/master/note/static_graph_syntax_support.html). +- Only the gradient of the forward porpagation input can be returned. The gradient of the `Parameter` cannot be returned. +- `Parameter` cannot be used in `bprop`. + +## High-order Derivation + +High-order differentiation is used in domains such as AI-supported scientific computing and second-order optimization. For example, in the molecular dynamics simulation, when the potential energy is trained using the neural network, the derivative of the neural network output to the input needs to be computed in the loss function, and then the second-order cross derivative of the loss function to the input and the weight exists in backward propagation. + +In addition, the second-order derivative of the output to the input exists in a differential equation solved by AI (such as PINNs). Another example is that in order to enable the neural network to converge quickly in the second-order optimization, the second-order derivative of the loss function to the weight needs to be computed using the Newton method. + +MindSpore can support high-order derivatives by computing derivatives for multiple times. The following uses several examples to describe how to compute derivatives. 
+ +### Single-input Single-output High-order Derivative + +For example, the formula of the Sin operator is as follows: + +$$f(x) = sin(x) \tag{1}$$ + +The first derivative is: + +$$f'(x) = cos(x) \tag{2}$$ + +The second derivative is: + +$$f''(x) = cos'(x) = -sin(x) \tag{3}$$ + +The second derivative (-Sin) is implemented as follows: + +```python +import numpy as np +import mindspore.nn as nn +import mindspore.ops as ops +import mindspore as ms + +class Net(nn.Cell): + """Feedforward network model""" + def __init__(self): + super(Net, self).__init__() + self.sin = ops.Sin() + + def construct(self, x): + out = self.sin(x) + return out + +class Grad(nn.Cell): + """First-order derivation""" + def __init__(self, network): + super(Grad, self).__init__() + self.grad = ops.GradOperation() + self.network = network + + def construct(self, x): + gout = self.grad(self.network)(x) + return gout + +class GradSec(nn.Cell): + """Second order derivation""" + def __init__(self, network): + super(GradSec, self).__init__() + self.grad = ops.GradOperation() + self.network = network + + def construct(self, x): + gout = self.grad(self.network)(x) + return gout + +x_train = ms.Tensor(np.array([3.1415926]), dtype=ms.float32) + +net = Net() +firstgrad = Grad(net) +secondgrad = GradSec(firstgrad) +output = secondgrad(x_train) + +# Print the result. +result = np.around(output.asnumpy(), decimals=2) +print(result) +``` + +```text + [-0.] +``` + +The preceding print result shows that the value of `-sin(3.1415926)` is close to `0`. + +### Single-input Multi-output High-order Derivative + +Compute the derivation of the following formula: + +$$f(x) = (f_1(x), f_2(x)) \tag{1}$$ + +Where: + +$$f_1(x) = sin(x) \tag{2}$$ + +$$f_2(x) = cos(x) \tag{3}$$ + +MindSpore uses the reverse-mode automatic differentiation mechanism during gradient computation. The output result is summed and then the derivative of the input is computed. Therefore, the first derivative is: + +$$f'(x) = cos(x) -sin(x) \tag{4}$$ + +The second derivative is: + +$$f''(x) = -sin(x) - cos(x) \tag{5}$$ + +```python +import numpy as np +import mindspore.nn as nn +import mindspore.ops as ops +import mindspore as ms + +class Net(nn.Cell): + """Feedforward network model""" + def __init__(self): + super(Net, self).__init__() + self.sin = ops.Sin() + self.cos = ops.Cos() + + def construct(self, x): + out1 = self.sin(x) + out2 = self.cos(x) + return out1, out2 + +class Grad(nn.Cell): + """First-order derivation""" + def __init__(self, network): + super(Grad, self).__init__() + self.grad = ops.GradOperation() + self.network = network + + def construct(self, x): + gout = self.grad(self.network)(x) + return gout + +class GradSec(nn.Cell): + """Second order derivation""" + def __init__(self, network): + super(GradSec, self).__init__() + self.grad = ops.GradOperation() + self.network = network + + def construct(self, x): + gout = self.grad(self.network)(x) + return gout + +x_train = ms.Tensor(np.array([3.1415926]), dtype=ms.float32) + +net = Net() +firstgrad = Grad(net) +secondgrad = GradSec(firstgrad) +output = secondgrad(x_train) + +# Print the result. +result = np.around(output.asnumpy(), decimals=2) +print(result) +``` + +```text + [1.] +``` + +The preceding print result shows that the value of `-sin(3.1415926) - cos(3.1415926)` is close to `1`. 
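As a quick cross-check of this analytic result with plain NumPy (independent of MindSpore), evaluating `-sin(x) - cos(x)` at `x = 3.1415926` indeed gives approximately `1`:

```python
import numpy as np

x = 3.1415926
# Analytic second derivative of sin(x) + cos(x), evaluated near pi: -0 - (-1) = 1.
print(np.around(-np.sin(x) - np.cos(x), decimals=2))
```

```text
 1.0
```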
+ +### Multiple-Input Multiple-Output High-Order Derivative + +Compute the derivation of the following formula: + +$$f(x, y) = (f_1(x, y), f_2(x, y)) \tag{1}$$ + +Where: + +$$f_1(x, y) = sin(x) - cos(y) \tag{2}$$ + +$$f_2(x, y) = cos(x) - sin(y) \tag{3}$$ + +MindSpore uses the reverse-mode automatic differentiation mechanism during gradient computation. The output result is summed and then the derivative of the input is computed. + +Sum: + +$$\sum{output} = sin(x) + cos(x) - sin(y) - cos(y) \tag{4}$$ + +The first derivative of output sum with respect to input $x$ is: + +$$\dfrac{\mathrm{d}\sum{output}}{\mathrm{d}x} = cos(x) - sin(x) \tag{5}$$ + +The second derivative of output sum with respect to input $x$ is: + +$$\dfrac{\mathrm{d}\sum{output}^{2}}{\mathrm{d}^{2}x} = -sin(x) - cos(x) \tag{6}$$ + +The first derivative of output sum with respect to input $y$ is: + +$$\dfrac{\mathrm{d}\sum{output}}{\mathrm{d}y} = -cos(y) + sin(y) \tag{7}$$ + +The second derivative of output sum with respect to input $y$ is: + +$$\dfrac{\mathrm{d}\sum{output}^{2}}{\mathrm{d}^{2}y} = sin(y) + cos(y) \tag{8}$$ + +```python +import numpy as np +import mindspore.nn as nn +import mindspore.ops as ops +import mindspore as ms + +class Net(nn.Cell): + """Feedforward network model""" + def __init__(self): + super(Net, self).__init__() + self.sin = ops.Sin() + self.cos = ops.Cos() + + def construct(self, x, y): + out1 = self.sin(x) - self.cos(y) + out2 = self.cos(x) - self.sin(y) + return out1, out2 + +class Grad(nn.Cell): + """First-order derivation""" + def __init__(self, network): + super(Grad, self).__init__() + self.grad = ops.GradOperation(get_all=True) + self.network = network + + def construct(self, x, y): + gout = self.grad(self.network)(x, y) + return gout + +class GradSec(nn.Cell): + """Second order derivation""" + def __init__(self, network): + super(GradSec, self).__init__() + self.grad = ops.GradOperation(get_all=True) + self.network = network + + def construct(self, x, y): + gout = self.grad(self.network)(x, y) + return gout + +x_train = ms.Tensor(np.array([3.1415926]), dtype=ms.float32) +y_train = ms.Tensor(np.array([3.1415926]), dtype=ms.float32) + +net = Net() +firstgrad = Grad(net) +secondgrad = GradSec(firstgrad) +output = secondgrad(x_train, y_train) + +# Print the result. +print(np.around(output[0].asnumpy(), decimals=2)) +print(np.around(output[1].asnumpy(), decimals=2)) +``` + +```text + [1.] + [-1.] +``` + +According to the preceding result, the value of the second derivative `-sin(3.1415926) - cos(3.1415926)` of the output to the input $x$ is close to `1`, and the value of the second derivative `sin(3.1415926) + cos(3.1415926)` of the output to the input $y$ is close to `-1`. + +> The accuracy may vary depending on the computing platform. Therefore, the execution results of the code in this section vary slightly on different platforms. diff --git a/tutorials/source_en/advanced/network/forward.md b/tutorials/source_en/advanced/network/forward.md new file mode 100644 index 0000000000..b6e31ee1d3 --- /dev/null +++ b/tutorials/source_en/advanced/network/forward.md @@ -0,0 +1,472 @@ +# Building a Network + + + +The `Cell` class of MindSpore is the base class for setting up all networks and the basic unit of a network. When customizing a network, you need to inherit the `Cell` class. The following describes the basic network unit `Cell` and customized feedforward network. + +The following describes the build of the feedforward network model and the basic units of the network model. 
Because training is not involved, there is no backward propagation or backward graph. + +![learningrate.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_zh_cn/advanced/network/images/introduction3.png) + +## Network Basic Unit: Cell + +In order to customize a network, you need to inherit the `Cell` class and overwrite the `__init__` and `construct` methods. Loss functions, optimizers, and model layers are parts of the network structure and can implement functions by only inheriting the `Cell` class. You can also customize them based on service requirements. + +The following describes the key member functions of `Cell`. + +### construct + +The `Cell` class overrides the `__call__` method. When the `Cell` class instance is called, the `construct` method is executed. The network structure is defined in the `construct` method. + +In the following example, a simple convolutional network is built. The convolutional network is defined in `__init__`. Input data `x` is transferred to the `construct` method to perform convolution computation and the computation result is returned. + +```python +from mindspore import nn + + +class Net(nn.Cell): + def __init__(self): + super(Net, self).__init__() + self.conv = nn.Conv2d(10, 20, 3, has_bias=True, weight_init='normal') + + def construct(self, x): + out = self.conv(x) + return out +``` + +### Obtaining Network Parameters + +In `nn.Cell`, methods that return parameters are `parameters_dict`, `get_parameters`, and `trainable_params`. + +- `parameters_dict`: obtains all parameters in the network structure and returns an OrderedDict with `key` as the parameter name and `value` as the parameter value. +- `get_parameters`: obtains all parameters in the network structure and returns the iterator of the Parameter in the Cell. +- `trainable_params`: obtains the attributes whose `requires_grad` is True in Parameter and returns the list of trainable parameters. + +The following examples use the preceding methods to obtain and print network parameters. + +```python +net = Net() + +# Obtain all parameters in the network structure. +result = net.parameters_dict() +print("parameters_dict of result:\n", result) + +# Obtain all parameters in the network structure. +print("\nget_parameters of result:") +for m in net.get_parameters(): + print(m) + +# Obtain the list of trainable parameters. +result = net.trainable_params() +print("\ntrainable_params of result:\n", result) +``` + +```text +parameters_dict of result: +OrderedDict([('conv.weight', Parameter (name=conv.weight, shape=(20, 10, 3, 3), dtype=Float32, requires_grad=True)), ('conv.bias', Parameter (name=conv.bias, shape=(20,), dtype=Float32, requires_grad=True))]) + +get_parameters of result: +Parameter (name=conv.weight, shape=(20, 10, 3, 3), dtype=Float32, requires_grad=True) +Parameter (name=conv.bias, shape=(20,), dtype=Float32, requires_grad=True) + +trainable_params of result: +[Parameter (name=conv.weight, shape=(20, 10, 3, 3), dtype=Float32, requires_grad=True), Parameter (name=conv.bias, shape=(20,), dtype=Float32, requires_grad=True)] +``` + +### Related Attributes + +1. cells_and_names + + The `cells_and_names` method is an iterator that returns the name and content of each `Cell` on the network. A code example is as follows: + + ```python + net = Net() + for m in net.cells_and_names(): + print(m) + ``` + + ```text + ('', Net< + (conv): Conv2d + >) + ('conv', Conv2d) + ``` + +2. 
set_grad + + `set_grad` is used to specify whether the network needs to compute the gradient. If no parameter is transferred, `requires_grad` is set to True by default. When the feedforward network is executed, a backward network for computing gradients is built. The `TrainOneStepCell` and `GradOperation` APIs do not need to use `set_grad` because they have been implemented internally. If you need to customize the APIs of this training function, you need to set `set_grad` internally or externally. + + ```python + class CustomTrainOneStepCell(nn.Cell): + def __init__(self, network, optimizer, sens=1.0): + """There are three input parameters: training network, optimizer, and backward propagation scaling ratio.""" + super(CustomTrainOneStepCell, self).__init__(auto_prefix=False) + self.network = network # Feedforward network + self.network.set_grad() # Build a backward network for computing gradients. + self.optimizer = optimizer # Optimizer + ``` + + For details about the `CustomTrainOneStepCell` code, see [Customized Training and Evaluation Networks](https://www.mindspore.cn/tutorials/en/master/advanced/train/train_eval.html). + +3. set_train + + The `set_train` API determines whether the model is in training mode. If no parameter is transferred, the `mode` attribute is set to True by default. + + When implementing a network with different training and inference structures, you can use the `training` attribute to distinguish the training scenario from the inference scenario. When `mode` is set to True, the training scenario is used. When `mode` is set to False, the inference scenario is used. + + For the `nn.Dropout` operator in MindSpore, two execution logics are distinguished based on the `mode` attribute of the `Cell`. If the value of `mode` is False, the input is directly returned. If the value of `mode` is True, the operator is executed. + + ```python + import numpy as np + import mindspore as ms + + x = ms.Tensor(np.ones([2, 2, 3]), ms.float32) + net = nn.Dropout(keep_prob=0.7) + + # Start training. + net.set_train() + output = net(x) + print("training result:\n", output) + + # Start inference. + net.set_train(mode=False) + output = net(x) + print("\ninfer result:\n", output) + ``` + + ```text + training result: + [[[1.4285715 1.4285715 1.4285715] + [1.4285715 0. 0. ]] + + [[1.4285715 1.4285715 1.4285715] + [1.4285715 1.4285715 1.4285715]]] + + infer result: + [[[1. 1. 1.] + [1. 1. 1.]] + + [[1. 1. 1.] + [1. 1. 1.]]] + ``` + +4. to_float + + The `to_float` API recursively configures the forcible conversion types of the current `Cell` and all sub-`Cell`s so that the current network structure uses a specific float type. This API is usually used in mixed precision scenarios. + + The following example uses the float32 and float16 types to compute the `nn.dense` layer and prints the data type of the output result. + + ```python + import numpy as np + from mindspore import nn + import mindspore as ms + + # float32 is used for computation. + x = ms.Tensor(np.ones([2, 2, 3]), ms.float32) + net = nn.Dense(3, 2) + output = net(x) + print(output.dtype) + + # float16 is used for computation. 
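    # to_float(ms.float16) below recursively casts net1 and any of its sub-Cells, so the Dense computation runs in float16.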
+ net1 = nn.Dense(3, 2) + net1.to_float(ms.float16) + output = net1(x) + print(output.dtype) + ``` + + ```text + Float32 + Float16 + ``` + +## Building a Network + +When building a network, you can inherit the `nn.Cell` class, declare the definition of each layer in the `__init__` constructor, and implement the connection relationship between layers in `construct` to complete the build of the feedforward neural network. + +The `mindspore.ops` module provides the implementation of basic operators, such as neural network operators, array operators, and mathematical operators. + +The `mindspore.nn` module further encapsulates basic operators. You can flexibly use different operators as required. + +In addition, to better build and manage complex networks, `mindspore.nn` provides two types of containers to manage submodules or model layers on the network: `nn.CellList` and `nn.SequentialCell`. + +### ops-based Network Build + +The [mindspore.ops](https://www.mindspore.cn/docs/en/master/api_python/mindspore.ops.html) module provides the implementation of basic operators, such as neural network operators, array operators, and mathematical operators. + +You can use operators in `mindspore.ops` to build a simple algorithm $f(x)=x^2+w$. The following is an example: + +```python +import numpy as np +import mindspore as ms +from mindspore import nn, ops + +class Net(nn.Cell): + def __init__(self): + super(Net, self).__init__() + self.mul = ops.Mul() + self.add = ops.Add() + self.weight = ms.Parameter(ms.Tensor(np.array([2, 2, 2]), ms.float32)) + + def construct(self, x): + return self.add(self.mul(x, x), self.weight) + +net = Net() +input = ms.Tensor(np.array([1, 2, 3]), ms.float32) +output = net(input) + +print(output) +``` + +```text + [ 3. 6. 11.] +``` + +### nn-based Network Build + +Although various operators provided by the `mindspore.ops` module can basically meet network build requirements, [mindspore.nn](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#) further encapsulates the `mindspore.ops` operator to provide more convenient and easy-to-use APIs in complex deep networks. + +The `mindspore.nn` module mainly includes a convolutional layer (such as `nn.Conv2d`), a pooling layer (such as `nn.MaxPool2d`), and a non-linear activation function (such as `nn.ReLU`), a loss functions (such as `nn.LossBase`) and an optimizer (such as `nn.Momentum`) that are commonly used in a neural network to facilitate user operations. + +In the following sample code, the `mindspore.nn` module is used to build a Conv + Batch Normalization + ReLu model network. + +```python +import numpy as np +from mindspore import nn + +class ConvBNReLU(nn.Cell): + def __init__(self): + super(ConvBNReLU, self).__init__() + self.conv = nn.Conv2d(3, 64, 3) + self.bn = nn.BatchNorm2d(64) + self.relu = nn.ReLU() + + def construct(self, x): + x = self.conv(x) + x = self.bn(x) + out = self.relu(x) + return out + +net = ConvBNReLU() +print(net) +``` + +```text + ConvBNReLU< + (conv): Conv2d + (bn): BatchNorm2d + (relu): ReLU<> + > +``` + +### Container-based Network Build + +To facilitate managing and forming a more complex network, `mindspore.nn` provides containers to manage submodel blocks or model layers on the network through either `nn.CellList` and `nn.SequentialCell`. + +1. CellList-based Network Build + + A Cell built using `nn.CellList` can be either a model layer or a built network subblock. `nn.CellList` supports the `append`, `extend`, and `insert` methods. 
+ + When running the network, you can use the for loop in the construct method to obtain the output result. + + - `append(cell)`: adds a cell to the end of the list. + - `extend (cells)`: adds cells to the end of the list. + - `insert(index, cell)`: inserts a given cell before the given index in the list. + + The following uses `nn.CellList` to build and execute a network that contains a previously defined model subblock ConvBNReLU, a Conv2d layer, a BatchNorm2d layer, and a ReLU layer in sequence: + + ```python + import numpy as np + import mindspore as ms + from mindspore import nn + + class MyNet(nn.Cell): + + def __init__(self): + super(MyNet, self).__init__() + layers = [ConvBNReLU()] + # Use CellList to manage the network. + self.build_block = nn.CellList(layers) + + # Use the append method to add the Conv2d and ReLU layers. + self.build_block.append(nn.Conv2d(64, 4, 4)) + self.build_block.append(nn.ReLU()) + + # Use the insert method to insert BatchNorm2d between the Conv2d layer and the ReLU layer. + self.build_block.insert(-1, nn.BatchNorm2d(4)) + + def construct(self, x): + # Use the for loop to execute the network. + for layer in self.build_block: + x = layer(x) + return x + + net = MyNet() + print(net) + ``` + + ```text + MyNet< + (build_block): CellList< + (0): ConvBNReLU< + (conv): Conv2d + (bn): BatchNorm2d + (relu): ReLU<> + > + (1): Conv2d + (2): BatchNorm2d + (3): ReLU<> + > + > + ``` + + Input data into the network model. + + ```python + input = ms.Tensor(np.ones([1, 3, 64, 32]), ms.float32) + output = net(input) + print(output.shape) + ``` + + ```text + (1, 4, 64, 32) + ``` + +2. SequentialCell-based Network Build + + Use `nn.SequentialCell` to build a Cell sequence container. Submodules can be input in List or OrderedDict format. + + Different from `nn.CellList`, the `nn.SequentialCell` class implements the `construct` method and can directly output results. + + The following example uses `nn.SequentialCell` to build a network. The input is in List format. The network structure contains a previously defined model subblock ConvBNReLU, a Conv2d layer, a BatchNorm2d layer, and a ReLU layer in sequence. + + ```python + import numpy as np + import mindspore as ms + from mindspore import nn + + class MyNet(nn.Cell): + + def __init__(self): + super(MyNet, self).__init__() + + layers = [ConvBNReLU()] + layers.extend([nn.Conv2d(64, 4, 4), + nn.BatchNorm2d(4), + nn.ReLU()]) + self.build_block = nn.SequentialCell(layers) # Use SequentialCell to manage the network. + + def construct(self, x): + return self.build_block(x) + + net = MyNet() + print(net) + ``` + + ```text + MyNet< + (build_block): SequentialCell< + (0): ConvBNReLU< + (conv): Conv2d + (bn): BatchNorm2d + (relu): ReLU<> + > + (1): Conv2d + (2): BatchNorm2d + (3): ReLU<> + > + > + ``` + + Input data into the network model. + + ```python + input = ms.Tensor(np.ones([1, 3, 64, 32]), ms.float32) + output = net(input) + print(output.shape) + ``` + + ```text + (1, 4, 64, 32) + ``` + + The following example uses `nn.SequentialCell` to build a network. The input is in OrderedDict format. + + ```python + import numpy as np + import mindspore as ms + from mindspore import nn + from collections import OrderedDict + + class MyNet(nn.Cell): + + def __init__(self): + super(MyNet, self).__init__() + layers = OrderedDict() + + # Add cells to the dictionary. 
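            # The OrderedDict keys become the names of the sub-Cells shown when the network is printed.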
+ layers["ConvBNReLU"] = ConvBNReLU() + layers["conv"] = nn.Conv2d(64, 4, 4) + layers["norm"] = nn.BatchNorm2d(4) + layers["relu"] = nn.ReLU() + + # Use SequentialCell to manage the network. + self.build_block = nn.SequentialCell(layers) + + def construct(self, x): + return self.build_block(x) + + net = MyNet() + print(net) + + input = ms.Tensor(np.ones([1, 3, 64, 32]), ms.float32) + output = net(input) + print(output.shape) + ``` + + ```text + MyNet< + (build_block): SequentialCell< + (ConvBNReLU): ConvBNReLU< + (conv): Conv2d + (bn): BatchNorm2d + (relu): ReLU<> + > + (conv): Conv2d + (norm): BatchNorm2d + (relu): ReLU<> + > + > + (1, 4, 64, 32) + ``` + +## Relationship Between nn and ops + +The `mindspore.nn` module is a model component implemented by Python. It encapsulates low-level APIs, including various model layers, loss functions, and optimizers related to neural network models. + +In addition, `mindspore.nn` provides some APIs with the same name as the `mindspore.ops` operators to further encapsulate the `mindspore.ops` operators and provide more friendly APIs. You can also use the `mindspore.ops` operators to customize a network based on the actual situation. + +The following example uses the `mindspore.ops.Conv2D` operator to implement the convolution computation function, that is, the `nn.Conv2d` operator function. + +```python +import mindspore.nn as nn +import mindspore.ops as ops +import mindspore as ms +from mindspore.common.initializer import initializer + + +class Net(nn.Cell): + def __init__(self, in_channels=10, out_channels=20, kernel_size=3): + super(Net, self).__init__() + self.conv2d = ops.Conv2D(out_channels, kernel_size) + self.bias_add = ops.BiasAdd() + self.weight = ms.Parameter( + initializer('normal', [out_channels, in_channels, kernel_size, kernel_size]), + name='conv.weight') + self.bias = ms.Parameter(initializer('normal', [out_channels]), name='conv.bias') + + def construct(self, x): + """Input data x.""" + output = self.conv2d(x, self.weight) + output = self.bias_add(output, self.bias) + return output +``` diff --git a/tutorials/source_en/advanced/network/loss.md b/tutorials/source_en/advanced/network/loss.md new file mode 100644 index 0000000000..b3379eb5c8 --- /dev/null +++ b/tutorials/source_en/advanced/network/loss.md @@ -0,0 +1,322 @@ +# Loss Function + + + +A loss function is also called objective function and is used to measure the difference between a predicted value and an actual value. + +In deep learning, model training is a process of reducing the loss function value through continuous iteration. Therefore, it is very important to select a loss function in a model training process, and a good loss function can effectively improve model performance. + +The `mindspore.nn` module provides many [general loss functions](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#loss-function), but these functions cannot meet all requirements. In many cases, you need to customize the required loss functions. The following describes how to customize loss functions. + +![lossfun.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_zh_cn/advanced/network/images/loss_function.png) + +## Built-in Loss Functions + +The following introduces [loss functions](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#loss-function) built in the `mindspore.nn` module. + +For example, use `nn.L1Loss` to compute the mean absolute error between the predicted value and the target value. 
+ +$$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad \text{with } l_n = \left| x_n - y_n \right|$$ + +N is the value of `batch_size` in the dataset. + +$$\ell(x, y) = + \begin{cases} + \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ + \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} + \end{cases}$$ + +A value of the `reduction` parameter in `nn.L1Loss` may be `mean`, `sum`, or `none`. If `reduction` is set to `mean` or `sum`, a scalar tensor (dimension reduced) after mean or sum is output. If `reduction` is set to `none`, the shape of the output tensor is the broadcast shape. + +```python +import numpy as np +import mindspore.nn as nn +import mindspore as ms + +# Output a mean loss value. +loss = nn.L1Loss() +# Output a sum loss value. +loss_sum = nn.L1Loss(reduction='sum') +# Output the original loss value. +loss_none = nn.L1Loss(reduction='none') + +input_data = ms.Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32)) +target_data = ms.Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32)) + +print("loss:", loss(input_data, target_data)) +print("loss_sum:", loss_sum(input_data, target_data)) +print("loss_none:\n", loss_none(input_data, target_data)) +``` + +```text + loss: 1.5 + loss_sum: 9.0 + loss_none: + [[1. 0. 2.] + [1. 2. 3.]] +``` + +## Customized Loss Functions + +You can customize a loss function by defining the loss function based on either `nn.Cell` or `nn.LossBase`. `nn.LossBase` is inherited from `nn.Cell` and provides the `get_loss` method. The `reduction` parameter is used to obtain a sum or mean loss value and output a scalar. + +The following describes how to define the mean absolute error (MAE) function by inheriting `Cell` and `LossBase`. The formula of the MAE algorithm is as follows: + +$$ loss= \frac{1}{m}\sum_{i=1}^m\lvert y_i-f(x_i) \rvert$$ + +In the preceding formula, $f(x)$ indicates the predicted value, $y$ indicates the actual value of the sample, and $loss$ indicates the mean distance between the predicted value and the actual value. + +### `nn.Cell`-based Loss Function Build + +`nn.Cell` is the base class of MindSpore. It can be used to build networks and define loss functions. The process of defining a loss function using `nn.Cell` is similar to that of defining a common network. The difference is that the execution logic is to compute the error between the feedforward network output and the actual value. + +The following describes how to customize the loss function `MAELoss` based on `nn.Cell`. + +```python +import mindspore.ops as ops + +class MAELoss(nn.Cell): + """Customize the loss function MAELoss.""" + + def __init__(self): + """Initialize.""" + super(MAELoss, self).__init__() + self.abs = ops.Abs() + self.reduce_mean = ops.ReduceMean() + + def construct(self, base, target): + """Call the operator.""" + x = self.abs(base - target) + return self.reduce_mean(x) + +loss = MAELoss() + +input_data = ms.Tensor(np.array([0.1, 0.2, 0.3]).astype(np.float32)) # Generate a predicted value. +target_data = ms.Tensor(np.array([0.1, 0.2, 0.2]).astype(np.float32)) # Generate the actual value. + +output = loss(input_data, target_data) +print(output) +``` + +```text + 0.033333335 +``` + +### `nn.LossBase`-based Loss Function Build + +The process of building the loss function `MAELoss` based on [nn.LossBase](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.LossBase.html#mindspore.nn.LossBase) is similar to that of building the loss function based on `nn.Cell`. 
The `__init__` and `construct` methods need to be rewritten.
+
+`nn.LossBase` provides the `get_loss` method, which applies `reduction` to the loss computation.
+
+```python
+class MAELoss(nn.LossBase):
+    """Customize the loss function MAELoss."""
+
+    def __init__(self, reduction="mean"):
+        """Initialize and compute the mean loss value."""
+        super(MAELoss, self).__init__(reduction)
+        self.abs = ops.Abs()  # Compute the absolute value.
+
+    def construct(self, base, target):
+        x = self.abs(base - target)
+        return self.get_loss(x)  # Return the mean loss value.
+
+loss = MAELoss()
+
+input_data = ms.Tensor(np.array([0.1, 0.2, 0.3]).astype(np.float32))  # Generate a predicted value.
+target_data = ms.Tensor(np.array([0.1, 0.2, 0.2]).astype(np.float32))  # Generate the actual value.
+
+output = loss(input_data, target_data)
+print(output)
+```
+
+```text
+ 0.033333335
+```
+
+## Loss Function and Model Training
+
+After the loss function `MAELoss` is customized, you can use the `train` API of the MindSpore [Model](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.Model.html#mindspore.Model) to train a model. When building a model, you need to transfer the feedforward network, loss function, and optimizer. The `Model` associates them internally to generate a network model that can be used for training.
+
+In `Model`, the feedforward network and loss function are associated through [nn.WithLossCell](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.WithLossCell.html#mindspore.nn.WithLossCell). `nn.WithLossCell` supports two inputs: `data` and `label`.
+
+```python
+import mindspore as ms
+from mindspore import dataset as ds
+from mindspore.common.initializer import Normal
+from mindvision.engine.callback import LossMonitor
+
+def get_data(num, w=2.0, b=3.0):
+    """Generate data and corresponding labels."""
+    for _ in range(num):
+        x = np.random.uniform(-10.0, 10.0)
+        noise = np.random.normal(0, 1)
+        y = x * w + b + noise
+        yield np.array([x]).astype(np.float32), np.array([y]).astype(np.float32)
+
+def create_dataset(num_data, batch_size=16):
+    """Load the dataset."""
+    dataset = ds.GeneratorDataset(list(get_data(num_data)), column_names=['data', 'label'])
+    dataset = dataset.batch(batch_size)
+    return dataset
+
+class LinearNet(nn.Cell):
+    """Define the linear regression network."""
+    def __init__(self):
+        super(LinearNet, self).__init__()
+        self.fc = nn.Dense(1, 1, Normal(0.02), Normal(0.02))
+
+    def construct(self, x):
+        return self.fc(x)
+
+ds_train = create_dataset(num_data=160)
+net = LinearNet()
+loss = MAELoss()
+opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)
+
+# Use the model API to associate the network, loss function, and optimizer.
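+# `Model` wraps the network and the loss function with `nn.WithLossCell` internally,
+# so each (data, label) pair produced by ds_train flows through LinearNet first and
+# then into MAELoss. The 0.005 passed to LossMonitor below is only the learning-rate
+# value shown in the training logs; the callback does not change the optimizer.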
+model = ms.Model(net, loss, opt) +model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor(0.005)]) +``` + +```text + Epoch:[ 0/ 1], step:[ 1/ 10], loss:[9.169/9.169], time:365.966 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 2/ 10], loss:[5.861/7.515], time:0.806 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 3/ 10], loss:[8.759/7.930], time:0.768 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 4/ 10], loss:[9.503/8.323], time:1.080 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 5/ 10], loss:[8.541/8.367], time:0.762 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 6/ 10], loss:[9.158/8.499], time:0.707 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 7/ 10], loss:[9.168/8.594], time:0.900 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 8/ 10], loss:[6.828/8.373], time:1.184 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 9/ 10], loss:[7.149/8.237], time:0.962 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 10/ 10], loss:[6.342/8.048], time:1.273 ms, lr:0.00500 + Epoch time: 390.358 ms, per step time: 39.036 ms, avg loss: 8.048 +``` + +## Multi-label Loss Function and Model Training + +A simple mean absolute error loss function `MAELoss` is defined above. However, datasets of many deep learning applications are relatively complex. For example, data of an object detection network Faster R-CNN includes a plurality of labels, instead of simply one piece of data corresponding to one label. In this case, the definition and usage of the loss function are slightly different. + +The following describes how to define a multi-label loss function in a multi-label dataset scenario and use a model for model training. + +### Multi-label Dataset + +In the following example, two groups of linear data $y1$ and $y2$ are fitted by using the `get_multilabel_data` function. The fitting target function is: + +$$f(x)=2x+3$$ + +The final dataset should be randomly distributed around the function. The dataset is generated according to the following formula, where `noise` is a random value that complies with the standard normal distribution. The `get_multilabel_data` function returns data $x$, $y1$, and $y2$. + +$$f(x)=2x+3+noise$$ + +Use `create_multilabel_dataset` to generate a multi-label dataset and set `column_names` in `GeneratorDataset` to ['data', 'label1', 'label2']. The returned dataset is in the format that one piece of `data` corresponds to two labels `label1` and `label2`. + +```python +import numpy as np +from mindspore import dataset as ds + +def get_multilabel_data(num, w=2.0, b=3.0): + for _ in range(num): + x = np.random.uniform(-10.0, 10.0) + noise1 = np.random.normal(0, 1) + noise2 = np.random.normal(-1, 1) + y1 = x * w + b + noise1 + y2 = x * w + b + noise2 + yield np.array([x]).astype(np.float32), np.array([y1]).astype(np.float32), np.array([y2]).astype(np.float32) + +def create_multilabel_dataset(num_data, batch_size=16): + dataset = ds.GeneratorDataset(list(get_multilabel_data(num_data)), column_names=['data', 'label1', 'label2']) + dataset = dataset.batch(batch_size) # Each batch has 16 pieces of data. + return dataset +``` + +### Multi-label Loss Function + +Define the multi-label loss function `MAELossForMultiLabel` for the multi-label dataset created in the previous step. 
+ +$$ loss1= \frac{1}{m}\sum_{i=1}^m\lvert y1_i-f(x_i) \rvert$$ + +$$ loss2= \frac{1}{m}\sum_{i=1}^m\lvert y2_i-f(x_i) \rvert$$ + +$$ loss = \frac{(loss1 + loss2)}{2}$$ + +In the preceding formula, $f(x)$ is the predicted value of the sample label, $y1$ and $y2$ are the actual values of the sample label, and $loss1$ is the mean distance between the predicted value and the actual value $y1$, $loss2$ is the mean distance between the predicted value and the actual value $y2$, and $loss$ is the mean value of the loss value $loss1$ and the loss value $loss2$. + +The `construct` method in `MAELossForMultiLabel` has three inputs: predicted value `base`, actual values `target1` and `target2`. In `construct`, compute the errors between the predicted value and the actual value `target1` and between the predicted value and the actual value `target2`, the mean value of the two errors is used as the final loss function value. + +The sample code is as follows: + +```python +class MAELossForMultiLabel(nn.LossBase): + def __init__(self, reduction="mean"): + super(MAELossForMultiLabel, self).__init__(reduction) + self.abs = ops.Abs() + + def construct(self, base, target1, target2): + x1 = self.abs(base - target1) + x2 = self.abs(base - target2) + return (self.get_loss(x1) + self.get_loss(x2))/2 +``` + +### Multi-label Model Training + +When a `Model` is used to associate a specified feedforward network, loss function, and optimizer, `nn.WithLossCell` used in the `Model` by default accepts only two inputs: `data` and `label`. Therefore, it is not applicable to multi-label scenarios. + +In the multi-label scenario, if you want to use a `Model` for model training, you need to associate the feedforward network with the multi-label loss function in advance, that is, customize the loss network. + +- Define a loss network. + + The following example shows how to define the loss network `CustomWithLossCell`. The `backbone` and `loss_fn` parameters of the `__init__` method indicate the feedforward network and loss function, respectively. The input of the `construct` method is the sample input `data` and the sample actual labels `label1` and `label2`, respectively. Transfer the sample input `data` to the feedforward network `backbone`, and transfer the predicted value and two label values to the loss function `loss_fn`. + + ```python + class CustomWithLossCell(nn.Cell): + def __init__(self, backbone, loss_fn): + super(CustomWithLossCell, self).__init__(auto_prefix=False) + self._backbone = backbone + self._loss_fn = loss_fn + + def construct(self, data, label1, label2): + output = self._backbone(data) + return self._loss_fn(output, label1, label2) + ``` + +- Define and train the network model. + + When `Model` is used to connect the feedforward network, multi-label loss function, and optimizer, the `network` of `Model` is specified as the customized loss network `loss_net`, the loss function `loss_fn` is not specified, and the optimizer is still `Momentum`. + + If `loss_fn` is not specified, the `Model` considers that the logic of the loss function has been implemented in the `network` by default, and does not use `nn.WithLossCell` to associate the feedforward network with the loss function. + + ```python + ds_train = create_multilabel_dataset(num_data=160) + net = LinearNet() + + # Define a multi-label loss function. + loss = MAELossForMultiLabel() + + # Define the loss network. Connect the feedforward network and multi-label loss function. + loss_net = CustomWithLossCell(net, loss) + + # Define the optimizer. 
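+    # Note that the optimizer is still built from the backbone's trainable parameters:
+    # CustomWithLossCell only chains `net` and `loss` together and adds no trainable
+    # parameters of its own.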
+ opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9) + + # Define a Model. In the multi-label scenario, the loss function does not need to be specified for the Model. + model = ms.Model(network=loss_net, optimizer=opt) + + model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor(0.005)]) + ``` + + ```text + Epoch:[ 0/ 1], step:[ 1/ 10], loss:[10.329/10.329], time:290.788 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 2/ 10], loss:[10.134/10.231], time:0.813 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 3/ 10], loss:[9.862/10.108], time:2.410 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 4/ 10], loss:[11.182/10.377], time:1.154 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 5/ 10], loss:[8.571/10.015], time:1.137 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 6/ 10], loss:[7.763/9.640], time:0.928 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 7/ 10], loss:[7.542/9.340], time:1.001 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 8/ 10], loss:[8.644/9.253], time:1.156 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 9/ 10], loss:[5.815/8.871], time:1.908 ms, lr:0.00500 + Epoch:[ 0/ 1], step:[ 10/ 10], loss:[5.086/8.493], time:1.575 ms, lr:0.00500 + Epoch time: 323.467 ms, per step time: 32.347 ms, avg loss: 8.493 + ``` + +The preceding describes how to define a loss function and use a Model for model training in the multi-label dataset scenario. In many other scenarios, this method may also be used for model training. diff --git a/tutorials/source_en/advanced/network/optim.md b/tutorials/source_en/advanced/network/optim.md new file mode 100644 index 0000000000..e6d666d86c --- /dev/null +++ b/tutorials/source_en/advanced/network/optim.md @@ -0,0 +1,292 @@ +# Optimizer + + + +During model training, the optimizer is used to compute gradients and update network parameters. A proper optimizer can effectively reduce the training time and improve model performance. + +The most basic optimizer is the stochastic gradient descent (SGD) algorithm. Many optimizers are improved based on the SGD to achieve the target function to converge to the global optimal point more quickly and effectively. The `nn` module in MindSpore provides common optimizers, such as `nn.SGD`, `nn.Adam`, and `nn.Momentum`. The following describes how to configure the optimizer provided by MindSpore and how to customize the optimizer. + +![learningrate.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_zh_cn/advanced/network/images/learning_rate.png) + +> For details about the optimizer provided by MindSpore, see [Optimizer API](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#optimizer). + +## Configuring the Optimizer + +When using the optimizer provided by MindSpore, you need to specify the network parameter `params` to be optimized, and then set other main parameters of the optimizer, such as `learning_rate` and `weight_decay`. + +If you want to set options for different network parameters separately, for example, set different learning rates for convolutional and non-convolutional parameters, you can use the parameter grouping method to set the optimizer. + +### Parameter Configuration + +When building an optimizer instance, you need to use the optimizer parameter `params` to configure the weights to be trained and updated on the model network. + +`Parameter` contains a Boolean class attribute `requires_grad`, which is used to indicate whether network parameters in the model need to be updated. 
The default value of `requires_grad` of most network parameters is True, while the default value of `requires_grad` of a few network parameters is False, for example, `moving_mean` and `moving_variance` in BatchNorm. + +The `trainable_params` method in MindSpore shields the attribute whose `requires_grad` is False in `Parameter`. When configuring the input parameter `params` for the optimizer, you can use the `net.trainable_params()` method to specify the network parameters to be optimized and updated. + +```python +import numpy as np +import mindspore.ops as ops +from mindspore import nn +import mindspore as ms + +class Net(nn.Cell): + def __init__(self): + super(Net, self).__init__() + self.matmul = ops.MatMul() + self.conv = nn.Conv2d(1, 6, 5, pad_mode="valid") + self.param = ms.Parameter(ms.Tensor(np.array([1.0], np.float32))) + + def construct(self, x): + x = self.conv(x) + x = x * self.param + out = self.matmul(x, x) + return out + +net = Net() + +# Parameters to be updated for the configuration optimizer +optim = nn.Adam(params=net.trainable_params()) +print(net.trainable_params()) +``` + +```text + [Parameter (name=param, shape=(1,), dtype=Float32, requires_grad=True), Parameter (name=conv.weight, shape=(6, 1, 5, 5), dtype=Float32, requires_grad=True)] +``` + +You can manually change the default value of the `requires_grad` attribute of `Parameter` in the network weight to determine which parameters need to be updated. + +As shown in the following example, use the `net.get_parameters()` method to obtain all parameters on the network and manually change the `requires_grad` attribute of the convolutional parameter to False. During the training, only non-convolutional parameters are updated. + +```python +conv_params = [param for param in net.get_parameters() if 'conv' in param.name] +for conv_param in conv_params: + conv_param.requires_grad = False +print(net.trainable_params()) +optim = nn.Adam(params=net.trainable_params()) +``` + +```text + [Parameter (name=param, shape=(1,), dtype=Float32, requires_grad=True)] +``` + +### Learning Rate + +As a common hyperparameter in machine learning and deep learning, the learning rate has an important impact on whether the target function can converge to the local minimum value and when to converge to the minimum value. If the learning rate is too high, the target function may fluctuate greatly and it is difficult to converge to the optimal value. If the learning rate is too low, the convergence process takes a long time. In addition to setting a fixed learning rate, MindSpore also supports setting a dynamic learning rate. These methods can significantly improve the convergence efficiency on a deep learning network. + +#### Fixed Learning Rate + +When a fixed learning rate is used, the `learning_rate` input by the optimizer is a floating-point tensor or scalar tensor. + +Take `nn.Momentum` as an example. The fixed learning rate is 0.01. The following is an example: + +```python +# Set the learning rate to 0.01. +optim = nn.Momentum(params=net.trainable_params(), learning_rate=0.01, momentum=0.9) +``` + +#### Dynamic Learning Rate + +`mindspore.nn` provides the dynamic learning rate module, which is classified into the Dynamic LR function and LearningRateSchedule class. The Dynamic LR function pre-generates a learning rate list whose length is `total_step` and transfers the list to the optimizer for use. During training, the value of the ith learning rate is used as the learning rate of the current step in step `i`. 
The value of `total_step` cannot be less than the total number of training steps. The LearningRateSchedule class transfers the instance to the optimizer, and the optimizer computes the current learning rate based on the current step. + +- Dynamic LR function + + Currently, the [Dynamic LR function](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#dynamic-lr-function) can compute the learning rate (`nn.cosine_decay_lr`) based on the cosine decay function, the learning rate (`nn.exponential_decay_lr`) based on the exponential decay function, the learning rate (`nn.inverse_decay_lr`) based on the counterclockwise decay function, and the learning rate (`nn.natural_exp_decay_lr`) based on the natural exponential decay function, the piecewise constant learning rate (`nn.piecewise_constant_lr`), the learning rate (`nn.polynomial_decay_lr`) based on the polynomial decay function, and the warm-up learning rate (`nn.warmup_lr`). + + The following uses `nn.piecewise_constant_lr` as an example: + + ```python + from mindspore import nn + + milestone = [1, 3, 10] + learning_rates = [0.1, 0.05, 0.01] + lr = nn.piecewise_constant_lr(milestone, learning_rates) + + # Print the learning rate. + print(lr) + + net = Net() + # The optimizer sets the network parameters to be optimized and the piecewise constant learning rate. + optim = nn.SGD(net.trainable_params(), learning_rate=lr) + ``` + + ```text + [0.1, 0.05, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01] + ``` + +- LearningRateSchedule Class + + Currently, the [LearningRateSchedule class](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#learningrateschedule-class) can compute the learning rate (`nn.CosineDecayLR`) based on the cosine decay function, the learning rate (`nn.ExponentialDecayLR`) based on the exponential decay function, the learning rate (`nn.InverseDecayLR`) based on the counterclockwise decay function, the learning rate (`nn.NaturalExpDecayLR`) based on the natural exponential decay function, the learning rate (`nn.PolynomialDecayLR`) based on the polynomial decay function, and warm-up learning rate (`nn.WarmUpLR`). + + In the following example, the learning rate `nn.ExponentialDecayLR` is computed based on the exponential decay function. + + ```python + import mindspore as ms + + learning_rate = 0.1 # Initial value of the learning rate + decay_rate = 0.9 # Decay rate + decay_steps = 4 #Number of decay steps + step_per_epoch = 2 + + exponential_decay_lr = nn.ExponentialDecayLR(learning_rate, decay_rate, decay_steps) + + for i in range(decay_steps): + step = ms.Tensor(i, ms.int32) + result = exponential_decay_lr(step) + print(f"step{i+1}, lr:{result}") + + net = Net() + + # The optimizer sets the learning rate and computes the learning rate based on the exponential decay function. + optim = nn.Momentum(net.trainable_params(), learning_rate=exponential_decay_lr, momentum=0.9) + ``` + + ```text + step1, lr:0.1 + step2, lr:0.097400375 + step3, lr:0.094868325 + step4, lr:0.09240211 + ``` + +### Weight Decay + +Weight decay, also referred to as L2 regularization, is a method for mitigating overfitting of a deep neural network. + +Generally, the value range of `weight_decay` is $ [0,1) $, and the default value is 0.0, indicating that the weight decay policy is not used. + +```python +net = Net() +optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01, + momentum=0.9, weight_decay=0.9) +``` + +In addition, MindSpore supports dynamic weight decay. 
In this case, `weight_decay` is a customized Cell called `weight_decay_schedule`. During training, the optimizer calls the instance of the Cell and passes `global_step` to it to compute the `weight_decay` value of the current step. `global_step` is an internally maintained variable whose value increases by 1 each time a step is trained. Note that the `construct` of the customized `weight_decay_schedule` accepts only one input. The following is an example of exponentially decaying the weight decay during training.
+
+```python
+from mindspore.nn import Cell
+from mindspore import ops, nn
+import mindspore as ms
+
+class ExponentialWeightDecay(Cell):
+
+    def __init__(self, weight_decay, decay_rate, decay_steps):
+        super(ExponentialWeightDecay, self).__init__()
+        self.weight_decay = weight_decay
+        self.decay_rate = decay_rate
+        self.decay_steps = decay_steps
+        self.pow = ops.Pow()
+        self.cast = ops.Cast()
+
+    def construct(self, global_step):
+        # The `construct` can have only one input. During training, the global step is automatically passed in for computation.
+        p = self.cast(global_step, ms.float32) / self.decay_steps
+        return self.weight_decay * self.pow(self.decay_rate, p)
+
+net = Net()
+
+weight_decay = ExponentialWeightDecay(weight_decay=0.0001, decay_rate=0.1, decay_steps=10000)
+optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01,
+                        momentum=0.9, weight_decay=weight_decay)
+```
+
+### Hyperparameter Grouping
+
+The optimizer can also set options for different parameters separately. In this case, a list of dictionaries is passed in instead of a single set of arguments. Each dictionary corresponds to a group of parameters. Available keys in the dictionary include `params`, `lr`, `weight_decay`, and `grad_centralization`, and the values specify the corresponding settings.
+
+`params` is mandatory, and the other keys are optional. If a key other than `params` is not configured, the values set when the optimizer is defined are used. During grouping, the learning rate can be a fixed learning rate or a dynamic learning rate, and `weight_decay` can be a fixed value.
+
+In the following example, different learning rates and weight decay values are set for the convolutional and non-convolutional parameters.
+
+```python
+net = Net()
+
+# Convolutional parameters
+conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
+# Non-convolutional parameters
+no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
+
+# Fixed learning rate
+fix_lr = 0.01
+
+# Learning rate computed based on the polynomial decay function
+polynomial_decay_lr = nn.PolynomialDecayLR(learning_rate=0.1,      # Initial learning rate
+                                           end_learning_rate=0.01, # Final learning rate
+                                           decay_steps=4,          # Number of decay steps
+                                           power=0.5)              # Polynomial power
+
+# The convolutional parameters use a fixed learning rate of 0.01, and the weight decay is 0.01.
+# The non-convolutional parameters use a dynamic learning rate, and the weight decay is 0.0.
+group_params = [{'params': conv_params, 'weight_decay': 0.01, 'lr': fix_lr},
+                {'params': no_conv_params, 'lr': polynomial_decay_lr}]
+
+optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
+```
+
+> Except for a few optimizers (such as AdaFactor and FTRL), MindSpore supports grouping of learning rates. For details, see [Optimizer API](https://www.mindspore.cn/docs/en/master/api_python/mindspore.nn.html#optimizer).
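+
+To check what the grouped settings translate to at run time, you can query the dynamic learning rate of the non-convolutional group step by step. The sketch below is only illustrative: it assumes the `fix_lr` and `polynomial_decay_lr` objects from the grouping example above are still in scope, and it evaluates the schedule the same way the `LearningRateSchedule` example earlier in this section does.
+
+```python
+import mindspore as ms
+
+# The convolutional group keeps the fixed rate for every step, while the
+# non-convolutional group follows the polynomial decay schedule.
+for i in range(4):
+    step = ms.Tensor(i, ms.int32)
+    print(f"step{i + 1}, conv lr: {fix_lr}, no_conv lr: {polynomial_decay_lr(step)}")
+```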
+ +## Customized Optimizer + +In addition to the optimizers provided by MindSpore, you can customize optimizers. + +When customizing an optimizer, you need to inherit the optimizer base class [nn.Optimizer](https://www.mindspore.cn/docs/en/master/api_python/nn/mindspore.nn.Optimizer.html#mindspore.nn.Optimizer) and rewrite the `__init__` and `construct` methods to set the parameter update policy. + +The following example implements the customized optimizer Momentum (SGD algorithm with momentum): + +$$ v_{t+1} = v_t×u+grad \tag{1} $$ + +$$p_{t+1} = p_t - lr*v_{t+1} \tag{2} $$ + +$grad$, $lr$, $p$, $v$, and $u$ respectively represent a gradient, a learning rate, a weight parameter, a momentum parameter, and an initial speed. + +```python +import mindspore as ms +from mindspore import nn, ops + +class Momentum(nn.Optimizer): + """Define the optimizer.""" + def __init__(self, params, learning_rate, momentum=0.9): + super(Momentum, self).__init__(learning_rate, params) + self.momentum = ms.Parameter(ms.Tensor(momentum, ms.float32), name="momentum") + self.moments = self.parameters.clone(prefix="moments", init="zeros") + self.assign = ops.Assign() + + def construct(self, gradients): + """The input of construct is gradient. Gradients are automatically transferred during training.""" + lr = self.get_lr() + params = self.parameters # Weight parameter to be updated + + for i in range(len(params)): + # Update the moments value. + self.assign(self.moments[i], self.moments[i] * self.momentum + gradients[i]) + update = params[i] - self.moments[i] * lr # SGD algorithm with momentum + self.assign(params[i], update) + return params + +net = Net() +# Set the parameter to be optimized and the learning rate of the optimizer to 0.01. +opt = Momentum(net.trainable_params(), 0.01) +``` + +`mindSpore.ops` also encapsulates optimizer operators for users to define optimizers, such as `ops.ApplyCenteredRMSProp`, `ops.ApplyMomentum`, and `ops.ApplyRMSProp`. The following example uses the `ApplyMomentum` operator to customize the optimizer Momentum: + +```python +class Momentum(nn.Optimizer): + """Define the optimizer.""" + def __init__(self, params, learning_rate, momentum=0.9): + super(Momentum, self).__init__(learning_rate, params) + self.moments = self.parameters.clone(prefix="moments", init="zeros") + self.momentum = momentum + self.opt = ops.ApplyMomentum() + + def construct(self, gradients): + # Weight parameter to be updated + params = self.parameters + success = None + for param, mom, grad in zip(params, self.moments, gradients): + success = self.opt(param, mom, self.learning_rate, grad, self.momentum) + return success + +net = Net() +# Set the parameter to be optimized and the learning rate of the optimizer to 0.01. +opt = Momentum(net.trainable_params(), 0.01) +``` diff --git a/tutorials/source_en/advanced/network/parameter.md b/tutorials/source_en/advanced/network/parameter.md new file mode 100644 index 0000000000..ff7cd03b34 --- /dev/null +++ b/tutorials/source_en/advanced/network/parameter.md @@ -0,0 +1,314 @@ +# Network Arguments + + + +MindSpore provides initialization modules for parameters and network arguments. You can initialize network arguments by encapsulating operators to call character strings, Initializer subclasses, or customized tensors. + +In the following figure, a blue box indicates a specific execution operator, and a green box indicates a tensor. 
As the data in the neural network model, the tensor continuously flows in the network, including the data input of the network model and the input and output data of the operator. A red box indicates a parameter which is used as a attribute of the network model or operators in the model or as an intermediate parameter and temporary parameter generated in the backward graph. + +![parameter.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_zh_cn/advanced/network/images/parameter.png) + +The following describes the data type (`dtype`), parameter (`Parameter`), parameter tuple (`ParameterTuple`), network initialization method, and network argument update. + +## dtype + +MindSpore tensors support different data types, including int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64, and Boolean. These data types correspond to those of NumPy. For details about supported data types, visit [mindspore.dtype](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.dtype.html#mindspore.dtype). + +In the computation process of MindSpore, the int data type in Python is converted into the defined int64 type, and the float data type is converted into the defined float32 type. + +In the following code, the data type of MindSpore is int32. + +```python +import mindspore as ms + +data_type = ms.int32 +print(data_type) +``` + +```text + Int32 +``` + +### Data Type Conversion API + +MindSpore provides the following APIs for conversion between NumPy data types and Python built-in data types: + +- `dtype_to_nptype`: converts the data type of MindSpore to the corresponding data type of NumPy. +- `dtype_to_pytype`: converts the data type of MindSpore to the corresponding built-in data type of Python. +- `pytype_to_dtype`: converts the built-in data type of Python to the corresponding data type of MindSpore. + +The following code implements the conversion between different data types and prints the converted type. + +```python +import mindspore as ms + +np_type = ms.dtype_to_nptype(ms.int32) +ms_type = ms.pytype_to_dtype(int) +py_type = ms.dtype_to_pytype(ms.float64) + +print(np_type) +print(ms_type) +print(py_type) +``` + +```text + + Int64 + +``` + +## Parameter + +A [Parameter](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.Parameter.html#mindspore.Parameter) of MindSpore indicates an argument that needs to be updated during network training. For example, the most common parameters of the `nn.conv` operator during forward computation include `weight` and `bias`. During backward graph build and backward propagation computation, many intermediate parameters are generated to temporarily store first-step information and intermediate output values. + +### Parameter Initialization + +There are many methods for initializing `Parameter`, which can receive different data types such as `Tensor` and `Initializer`. + +- `default_input`: input data. Four data types are supported: `Tensor`, `Initializer`, `int`, and `float`. +- `name`: name of a parameter, which is used to distinguish the parameter from other parameters on the network. +- `requires_grad`: indicates whether to compute the argument gradient during network training. If the argument gradient does not need to be computed, set `requires_grad` to `False`. 
+ +In the following sample code, the `int` or `float` data type is used to directly create a parameter: + +```python +import mindspore as ms + +x = ms.Parameter(default_input=2.0, name='x') +y = ms.Parameter(default_input=5.0, name='y') +z = ms.Parameter(default_input=5, name='z', requires_grad=False) + +print(type(x)) +print(x, "value:", x.asnumpy()) +print(y, "value:", y.asnumpy()) +print(z, "value:", z.asnumpy()) +``` + +```text + + Parameter (name=x, shape=(), dtype=Float32, requires_grad=True) value: 2.0 + Parameter (name=y, shape=(), dtype=Float32, requires_grad=True) value: 5.0 + Parameter (name=z, shape=(), dtype=Int32, requires_grad=False) value: 5 +``` + +In the following code, a MindSpore `Tensor` is used to create a parameter: + +```python +import numpy as np +import mindspore as ms + +my_tensor = ms.Tensor(np.arange(2 * 3).reshape((2, 3))) +x = ms.Parameter(default_input=my_tensor, name="tensor") + +print(x) +``` + +```text + Parameter (name=tensor, shape=(2, 3), dtype=Int64, requires_grad=True) +``` + +In the following code example, `Initializer` is used to create a parameter: + +```python +from mindspore.common.initializer import initializer as init +import mindspore as ms + +x = ms.Parameter(default_input=init('ones', [1, 2, 3], ms.float32), name='x') +print(x) +``` + +```text + Parameter (name=x, shape=(1, 2, 3), dtype=Float32, requires_grad=True) +``` + +### Attribute + +The default attributes of a `Parameter` include `name`, `shape`, `dtype`, and `requires_grad`. + +The following example describes how to initialize a `Parameter` by using a `Tensor` and obtain the attributes of the `Parameter`. The sample code is as follows: + +```python +my_tensor = ms.Tensor(np.arange(2 * 3).reshape((2, 3))) +x = ms.Parameter(default_input=my_tensor, name="x") + +print("x: ", x) +print("x.data: ", x.data) +``` + +```text + x: Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True) + x.data: Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True) +``` + +### Parameter Operations + +1. `clone`: clones a tensor `Parameter`. After the cloning is complete, you can specify a new name for the new `Parameter`. + + ```python + x = ms.Parameter(default_input=init('ones', [1, 2, 3], ms.float32)) + x_clone = x.clone() + x_clone.name = "x_clone" + + print(x) + print(x_clone) + ``` + + ```text + Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True) + Parameter (name=x_clone, shape=(1, 2, 3), dtype=Float32, requires_grad=True) + ``` + +2. `set_data`: modifies the data or `shape` of the `Parameter`. + + The `set_data` method has two input parameters: `data` and `slice_shape`. The `data` indicates the newly input data of the `Parameter`. The `slice_shape` indicates whether to change the `shape` of the `Parameter`. The default value is False. + + ```python + x = ms.Parameter(ms.Tensor(np.ones((1, 2)), ms.float32), name="x", requires_grad=True) + print(x, x.asnumpy()) + + y = x.set_data(ms.Tensor(np.zeros((1, 2)), ms.float32)) + print(y, y.asnumpy()) + + z = x.set_data(ms.Tensor(np.ones((1, 4)), ms.float32), slice_shape=True) + print(z, z.asnumpy()) + ``` + + ```text + Parameter (name=x, shape=(1, 2), dtype=Float32, requires_grad=True) [[1. 1.]] + Parameter (name=x, shape=(1, 2), dtype=Float32, requires_grad=True) [[0. 0.]] + Parameter (name=x, shape=(1, 4), dtype=Float32, requires_grad=True) [[1. 1. 1. 1.]] + ``` + +3. `init_data`: In parallel scenarios, the shape of a argument changes. 
You can call the `init_data` method of `Parameter` to obtain the original data. + + ```python + x = ms.Parameter(ms.Tensor(np.ones((1, 2)), ms.float32), name="x", requires_grad=True) + + print(x.init_data(), x.init_data().asnumpy()) + ``` + + ```text + Parameter (name=x, shape=(1, 2), dtype=Float32, requires_grad=True) [[1. 1.]] + ``` + +### Updating Parameters + +MindSpore provides the network argument update function. You can use `nn.ParameterUpdate` to update network arguments. The input argument type must be tensor, and the tensor `shape` must be the same as the original network argument `shape`. + +The following is an example of updating the weight arguments of a network: + +```python +import numpy as np +import mindspore as ms +from mindspore import nn + +# Build a network. +network = nn.Dense(3, 4) + +# Obtain the weight argument of a network. +param = network.parameters_dict()['weight'] +print("Parameter:\n", param.asnumpy()) + +# Update the weight argument. +update = nn.ParameterUpdate(param) +weight = ms.Tensor(np.arange(12).reshape((4, 3)), ms.float32) +output = update(weight) +print("Parameter update:\n", output) +``` + +```text + Parameter: + [[-0.0164615 -0.01204428 -0.00813806] + [-0.00270927 -0.0113328 -0.01384139] + [ 0.00849093 0.00351116 0.00989969] + [ 0.00233028 0.00649209 -0.0021333 ]] + Parameter update: + [[ 0. 1. 2.] + [ 3. 4. 5.] + [ 6. 7. 8.] + [ 9. 10. 11.]] +``` + +## Parameter Tuple + +The [ParameterTuple](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.ParameterTuple.html#mindspore.ParameterTuple) is used to store multiple `Parameter`s. It is inherited from the `tuple` and provides the clone function. + +The following example describes how to create a `ParameterTuple`: + +```python +import numpy as np +import mindspore as ms +from mindspore.common.initializer import initializer + +# Create. +x = ms.Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name="x") +y = ms.Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y') +z = ms.Parameter(default_input=2.0, name='z') +params = ms.ParameterTuple((x, y, z)) + +# Clone from params and change the name to "params_copy". +params_copy = params.clone("params_copy") + +print(params) +print(params_copy) +``` + +```text + (Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True)) + (Parameter (name=params_copy.x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True)) +``` + +## Initializing Network Arguments + +MindSpore provides multiple network argument initialization modes and encapsulates the argument initialization function in some operators. The following uses the `Conv2d` operator as an example to describe how to use the `Initializer` subclass, character string, and customized `Tensor` to initialize network arguments. + +### Initializer + +Use `Initializer` to initialize network arguments. The sample code is as follows: + +```python +import numpy as np +import mindspore.nn as nn +import mindspore as ms +from mindspore.common import initializer as init + +ms.set_seed(1) + +input_data = ms.Tensor(np.ones([1, 3, 16, 50], dtype=np.float32)) +# Convolutional layer. 
The number of input channels is 3, the number of output channels is 64, the size of the convolution kernel is 3 x 3, and the weight argument is a random number generated in normal distribution. +net = nn.Conv2d(3, 64, 3, weight_init=init.Normal(0.2)) +# Network output +output = net(input_data) +``` + +### Character String Initialization + +Use a character string to initialize network arguments. The content of the character string must be the same as the `Initializer` name (case insensitive). If the character string is used for initialization, the default arguments in the `Initializer` class are used. For example, using the character string `Normal` is equivalent to using `Normal()` of `Initializer`. The following is an example: + +```python +import numpy as np +import mindspore.nn as nn +import mindspore as ms + +ms.set_seed(1) + +input_data = ms.Tensor(np.ones([1, 3, 16, 50], dtype=np.float32)) +net = nn.Conv2d(3, 64, 3, weight_init='Normal') +output = net(input_data) +``` + +### Tensor Initialization + +You can also customize a `Tensor` to initialize the arguments of operators in the network model. The sample code is as follows: + +```python +import numpy as np +import mindspore.nn as nn +import mindspore as ms + +init_data = ms.Tensor(np.ones([64, 3, 3, 3]), dtype=ms.float32) +input_data = ms.Tensor(np.ones([1, 3, 16, 50], dtype=np.float32)) + +net = nn.Conv2d(3, 64, 3, weight_init=init_data) +output = net(input_data) +``` diff --git a/tutorials/source_en/index.rst b/tutorials/source_en/index.rst index 265c050907..fadb0118f1 100644 --- a/tutorials/source_en/index.rst +++ b/tutorials/source_en/index.rst @@ -27,4 +27,5 @@ MindSpore Tutorial :caption: Advanced advanced/dataset - advanced/train \ No newline at end of file + advanced/network + advanced/train -- Gitee