diff --git a/tutorials/source_en/beginner/autograd.md b/tutorials/source_en/beginner/autograd.md index 31c200b3eba5fee274af1509cfb082cfd17b1372..49af3948fc30479bf23715aa9a8dc780062c77a8 100644 --- a/tutorials/source_en/beginner/autograd.md +++ b/tutorials/source_en/beginner/autograd.md @@ -1,195 +1,220 @@ # Automatic Differentiation -`Ascend` `GPU` `CPU` `Beginner` `Model Development` + - +Automatic differentiation computes the value of the derivative of a differentiable function at a given point and is a generalization of the backpropagation algorithm. It decomposes a complex mathematical operation into a series of simple basic operations, which hides most of the details of differentiation from the user and greatly lowers the barrier to using the framework. -Automatic differentiation is commonly used when implementing machine learning algorithms such as backpropagation for training neural networks. By using automatic differentiation, multi-layer composite functions could be divided into several simple computational steps, thereby helping users avoid implementing complex derivation codes. As a result, automatic differentiation enables ease of use of MindSpore. +MindSpore uses `ops.GradOperation` to calculate first-order derivatives. Its attributes are as follows: -The first-order derivative method of MindSpore is `mindspore.ops.GradOperation (get_all=False, get_by_list=False, sens_param=False)`. When `get_all` is set to `False`, the first input derivative is computed. When `get_all` is set to `True`, all input derivatives are computed. When `get_by_list` is set to `False`, weight derivatives are not computed. When `get_by_list` is set to `True`, the weight derivative is computed. `sens_param` scales the output value of the network to change the final gradient. The following uses the MatMul operator derivative for in-depth analysis. 
+- `get_all`: Whether to compute derivatives with respect to all inputs rather than only the first one. The default value is `False`. +- `get_by_list`: Whether to compute derivatives with respect to the weight parameters. The default value is `False`. +- `sens_param`: Whether to scale the output value of the network to change the final gradient. The default value is `False`. -Import the required modules and APIs: - -```python -import numpy as np -import mindspore.nn as nn -import mindspore.ops as ops -from mindspore import Tensor -from mindspore import ParameterTuple, Parameter -from mindspore import dtype as mstype -``` +This chapter uses `ops.GradOperation` in MindSpore to find first-order derivatives of the function $f(x)=wx+b$. ## First-order Derivative of the Input -To compute the input derivative, you need to define a network requiring a derivative. The following uses a network $f(x,y)=z *x* y$ formed by the MatMul operator as an example. - -The network structure is as follows: +Before differentiating with respect to the input, define the function: +$$ +f(x)=wx+b \tag {1} +$$ +The example code below expresses Equation (1). Because MindSpore follows a functional programming style, every computational formula is expressed as a function. ```python +import numpy as np +import mindspore.nn as nn +from mindspore import Parameter, Tensor + class Net(nn.Cell): def __init__(self): super(Net, self).__init__() - self.matmul = ops.MatMul() - self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z') + self.w = Parameter(np.array([6.0]), name='w') + self.b = Parameter(np.array([1.0]), name='b') - def construct(self, x, y): - x = x * self.z - out = self.matmul(x, y) - return out + def construct(self, x): + f = self.w * x + self.b + return f ``` -Define the network requiring the derivative. In the `__init__` function, define the `self.net` and `ops.GradOperation` networks. In the `construct` function, compute the derivative of `self.net`. - -The network structure is as follows: +Define the derivative class `GradNet`. 
In the `__init__` function, define the `self.net` and `ops.GradOperation` networks. In the `construct` function, compute the derivative of `self.net`. Internally, MindSpore then evaluates the derivative given by formula (2): +$$ +f^{'}(x)=w\tag {2} +$$ ```python -class GradNetWrtX(nn.Cell): +from mindspore import dtype as mstype +import mindspore.ops as ops + +class GradNet(nn.Cell): def __init__(self, net): - super(GradNetWrtX, self).__init__() + super(GradNet, self).__init__() self.net = net self.grad_op = ops.GradOperation() - def construct(self, x, y): + def construct(self, x): gradient_function = self.grad_op(self.net) - return gradient_function(x, y) + return gradient_function(x) ``` -Define the input and display the output: +Finally, with the weight parameter defined as w, compute the first-order derivative with respect to the input x of formula (1). In the code above, the weight in formula (1) is 6 and the bias is 1, that is: +$$ +f(x)=wx+b=6x+1 \tag {3} +$$ +Differentiating the above equation gives: +$$ +f^{'}(x)=w=6 \tag {4} +$$ ```python -x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32) -y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32) -output = GradNetWrtX(Net())(x, y) +x = Tensor([100], dtype=mstype.float32) +output = GradNet(Net())(x) + print(output) ``` ```text - [[4.5099998 2.7 3.6000001] - [4.5099998 2.7 3.6000001]] +[6.] ``` -If the derivatives of the `x` and `y` inputs are considered, you only need to set `self.grad_op = GradOperation(get_all=True)` in `GradNetWrtX`. +MindSpore computes first-order derivatives with `ops.GradOperation(get_all=False, get_by_list=False, sens_param=False)`. When `get_all` is `False`, only the derivative with respect to the first input is computed; when it is `True`, the derivatives with respect to all inputs are computed. ## First-order Derivative of the Weight To compute weight derivatives, you need to set `get_by_list` in `ops.GradOperation` to `True`. 
-The `GradNetWrtX` structure is as follows: - ```python -class GradNetWrtX(nn.Cell): +from mindspore import ParameterTuple + +class GradNet(nn.Cell): def __init__(self, net): - super(GradNetWrtX, self).__init__() + super(GradNet, self).__init__() self.net = net self.params = ParameterTuple(net.trainable_params()) - self.grad_op = ops.GradOperation(get_by_list=True) + self.grad_op = ops.GradOperation(get_by_list=True) # Compute first-order derivatives with respect to the weight parameters - def construct(self, x, y): + def construct(self, x): gradient_function = self.grad_op(self.net, self.params) - return gradient_function(x, y) + return gradient_function(x) ``` -Run and display the output: +Next, differentiate the function. Since $\partial f/\partial w = x$ and $\partial f/\partial b = 1$, the weight gradient equals the input value: ```python -output = GradNetWrtX(Net())(x, y) -print(output) +# Compute the derivatives of the function with respect to w and b +x = Tensor([100], dtype=mstype.float32) +fx = GradNet(Net())(x) + +# Print the results +print(fx) +print(f"wgrad: {fx[0]}\nbgrad: {fx[1]}") ``` ```text -(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),) +(Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+02]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00])) +wgrad: [100.] +bgrad: [1.] ``` If computation of certain weight derivatives is not required, set `requires_grad` to `False` when defining the network requiring derivatives. ```Python -self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z', requires_grad=False) -``` +class Net(nn.Cell): + def __init__(self): + super(Net, self).__init__() + self.w = Parameter(Tensor(np.array([6], np.float32)), name='w') + self.b = Parameter(Tensor(np.array([1.0], np.float32)), name='b', requires_grad=False) -## Gradient Value Scaling + def construct(self, x): + out = x * self.w + self.b + return out -You can use the `sens_param` parameter to scale the output value of the network to change the final gradient. Set `sens_param` in `ops.GradOperation` to `True` and determine the scaling index. 
The dimension must be the same as the output dimension. +class GradNet(nn.Cell): + def __init__(self, net): + super(GradNet, self).__init__() + self.net = net + self.params = ParameterTuple(net.trainable_params()) + self.grad_op = ops.GradOperation(get_by_list=True) -The scaling index `self.grad_wrt_output` may be in the following format: + def construct(self, x): + gradient_function = self.grad_op(self.net, self.params) + return gradient_function(x) -```python -self.grad_wrt_output = Tensor([[s1, s2, s3], [s4, s5, s6]]) +# Construct a derivative network +x = Tensor([5], dtype=mstype.float32) +fw = GradNet(Net())(x) + +print(fw) ``` -The `GradNetWrtX` structure is as follows: +```text +(Tensor(shape=[1], dtype=Float32, value= [ 5.00000000e+00]),) +``` + +## Gradient Value Scaling + +You can use the `sens_param` parameter to scale the output value of the network to change the final gradient. Set `sens_param` in `ops.GradOperation` to `True` and determine the scaling index. The dimension must be the same as the output dimension. ```python -class GradNetWrtX(nn.Cell): +class GradNet(nn.Cell): def __init__(self, net): - super(GradNetWrtX, self).__init__() + super(GradNet, self).__init__() self.net = net + # Derivative operation self.grad_op = ops.GradOperation(sens_param=True) - self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32) + # Scale index + self.grad_wrt_output = Tensor([0.1], dtype=mstype.float32) - def construct(self, x, y): + def construct(self, x): gradient_function = self.grad_op(self.net) - return gradient_function(x, y, self.grad_wrt_output) + return gradient_function(x, self.grad_wrt_output) + +x = Tensor([6], dtype=mstype.float32) +output = GradNet(Net())(x) -output = GradNetWrtX(Net())(x, y) print(output) ``` ```text -[[2.211 0.51 1.49 ] - [5.588 2.68 4.07 ]] +[0.6] ``` -## Stop Gradient +## Stopping Gradient We can use `stop_gradient` to disable calculation of gradient for certain operators. 
For example: ```python -import numpy as np -import mindspore.nn as nn -import mindspore.ops as ops -from mindspore import Tensor -from mindspore import ParameterTuple, Parameter -from mindspore import dtype as mstype from mindspore.ops import stop_gradient class Net(nn.Cell): def __init__(self): super(Net, self).__init__() - self.matmul = ops.MatMul() + self.w = Parameter(Tensor(np.array([6], np.float32)), name='w') + self.b = Parameter(Tensor(np.array([1.0], np.float32)), name='b') - def construct(self, x, y): - out1 = self.matmul(x, y) - out2 = self.matmul(x, y) - out2 = stop_gradient(out2) - out = out1 + out2 + def construct(self, x): + out = x * self.w + self.b + # Stops updating the gradient, and out does not contribute to gradient calculations + out = stop_gradient(out) return out -class GradNetWrtX(nn.Cell): +class GradNet(nn.Cell): def __init__(self, net): - super(GradNetWrtX, self).__init__() + super(GradNet, self).__init__() self.net = net - self.grad_op = ops.GradOperation() + self.params = ParameterTuple(net.trainable_params()) + self.grad_op = ops.GradOperation(get_by_list=True) - def construct(self, x, y): - gradient_function = self.grad_op(self.net) - return gradient_function(x, y) + def construct(self, x): + gradient_function = self.grad_op(self.net, self.params) + return gradient_function(x) -x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32) -y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32) -output = GradNetWrtX(Net())(x, y) -print(output) -``` +x = Tensor([100], dtype=mstype.float32) +output = GradNet(Net())(x) -```text - [[4.5, 2.7, 3.6], - [4.5, 2.7, 3.6]] +print(f"wgrad: {output[0]}\nbgrad: {output[1]}") ``` -Here, we set `stop_gradient` to `out2`, so this operator does not have any contribution to gradient. If we delete `out2 = stop_gradient(out2)`, the result is: - ```text - [[9.0, 5.4, 7.2], - [9.0, 5.4, 7.2]] +wgrad: [0.] +bgrad: [0.] 
``` - -After we do not set `stop_gradient` to `out2`, it will make the same contribution to gradient as `out1`. So we can see that each result has doubled. \ No newline at end of file diff --git a/tutorials/source_en/beginner/dataset.md b/tutorials/source_en/beginner/dataset.md index 35de7d86320b8ba6418225c3072942f2b601dc73..dde8d300eae0b0dd154216b44b554d7af4f6f2a8 100644 --- a/tutorials/source_en/beginner/dataset.md +++ b/tutorials/source_en/beginner/dataset.md @@ -1,252 +1,174 @@ -# Loading and Processing Data +# Data Processing -`Ascend` `GPU` `CPU` `Beginner` `Data Preparation` + - +Data is the foundation of deep learning, and high-quality input data plays an important role throughout the training of a deep neural network. -MindSpore provides APIs for loading common datasets and datasets in standard formats. You can directly use the corresponding dataset loading class in mindspore.dataset to load data. The dataset class provides common data processing APIs for users to quickly process data. +[mindspore.dataset](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.dataset.html) provides loading interfaces for some commonly used datasets and for datasets in standard formats, enabling users to quickly perform data processing operations. For image datasets, users can use `mindvision.dataset` to load and process datasets. This chapter first describes how to load and process the CIFAR-10 dataset by using the `mindvision.dataset.Cifar10` interface, and then describes how to use `mindspore.dataset.GeneratorDataset` to implement custom dataset loading. -## Data Preparation +> `mindvision.dataset` is a dataset interface developed on the basis of `mindspore.dataset`. In addition to dataset loading, `mindvision.dataset` also provides dataset download, data processing, and data augmentation capabilities. -Execute the following command to download and decompress the CIFAR-10 and MNIST dataset to the specified location. 
+## Data Process -```python -import os -import requests -import tarfile -import zipfile - -def download_dataset(url, target_path): - """download and decompress dataset""" - if not os.path.exists(target_path): - os.makedirs(target_path) - download_file = url.split("/")[-1] - if not os.path.exists(download_file): - res = requests.get(url, stream=True, verify=False) - if download_file.split(".")[-1] not in ["tgz","zip","tar","gz"]: - download_file = os.path.join(target_path, download_file) - with open(download_file, "wb") as f: - for chunk in res.iter_content(chunk_size=512): - if chunk: - f.write(chunk) - if download_file.endswith("zip"): - z = zipfile.ZipFile(download_file, "r") - z.extractall(path=target_path) - z.close() - if download_file.endswith(".tar.gz") or download_file.endswith(".tar") or download_file.endswith(".tgz"): - t = tarfile.open(download_file) - names = t.getnames() - for name in names: - t.extract(name, target_path) - t.close() - -download_dataset("https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz", "./datasets") -download_dataset("https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/MNIST_Data.zip", "./datasets") -``` +In the network training and inference process, the raw data is generally stored in a disk or database. The raw data needs to be read into the memory space through the data loading step, converted into a framework-common tensor format, and then mapped to a more easy-to-learn space through the data processing and augmentation steps. While the number of samples and generalization is increased, the data finally enters the network for calculation. 
-The directory structure of the CIFAR-10 dataset file is as follows: +The overall process is shown in the following figure: -```text -./datasets/cifar-10-batches-bin -├── batches.meta.txt -├── data_batch_1.bin -├── data_batch_2.bin -├── data_batch_3.bin -├── data_batch_4.bin -├── data_batch_5.bin -├── readme.html -└── test_batch.bin -``` +![dataset_pipeline](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_zh_cn/beginner/images/dataset_pipeline.png) -Refer to [Quick Start](https://www.mindspore.cn/tutorials/en/master/quick_start.html#downloading-the-dataset) for the directory structure of MINIST dataset files. +### Dataset -## Loading the Dataset +A dataset is a collection of samples, and a row of a dataset is a sample that contains one or more features, and may further contain a label. The dataset needs to meet certain specification requirements to make it easier to evaluate the effectiveness of the model. -In the following example, the CIFAR-10 dataset is loaded through the `Cifar10Dataset` API, and the first five samples are obtained using the sequential sampler. +Dataset supports multiple format datasets, including MindRecord, a MindSpore self-developed data format, commonly used public image datasets and text datasets, user-defined datasets, etc. -```python -import mindspore.dataset as ds +### Dataset Loading -DATA_DIR = "./datasets/cifar-10-batches-bin" -sampler = ds.SequentialSampler(num_samples=5) -dataset = ds.Cifar10Dataset(DATA_DIR, sampler=sampler) -``` +Dataset loading allows the model to be continuously acquired for training during training. Dataset provides corresponding classes for a variety of commonly used datasets to load datasets. For data files in different storage formats, Dataset also has corresponding classes for data loading. -## Iterating Dataset +Dataset provides multiple uses of the sampler (Sampler), and the sampler is responsible for generating the read index sequence. 
The Dataset is responsible for reading the corresponding data according to the index, which lets users sample the dataset in different ways to meet training needs and helps with problems such as an overly large dataset or an uneven distribution of sample classes. +> Note that the sampler only filters and reorders samples; it does not perform the Batch operation. -```python -for data in dataset.create_dict_iterator(): - print("Image shape: {}".format(data['image'].shape), ", Label: {}".format(data['label'])) -``` +### Data Processing -```text - Image shape: (32, 32, 3) , Label: 6 - Image shape: (32, 32, 3) , Label: 9 - Image shape: (32, 32, 3) , Label: 9 - Image shape: (32, 32, 3) , Label: 4 - Image shape: (32, 32, 3) , Label: 1 -``` +After the Dataset loads the data into memory, the data is organized as Tensors. Tensor is also the basic data structure used in data augmentation operations. -## Customizing Datasets +## Loading the Dataset -For datasets that cannot be directly loaded by MindSpore, you can build a custom dataset class and use the `GeneratorDataset` API to customize data loading. +In the following example, the CIFAR-10 dataset is loaded through the `mindvision.dataset.Cifar10` interface. The CIFAR-10 dataset contains 60,000 32*32 color images in 10 categories, with 6,000 images per category; 50,000 are training images and 10,000 are test images. The `Cifar10` interface provides the ability to download and load the CIFAR-10 dataset. -```python -import numpy as np +![cifar10](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_zh_cn/beginner/images/cifar10.jpg) -np.random.seed(58) +- `path`: The location of the dataset root directory. 
+- `split`: Which subset of the dataset to use: `train`, `test` or `infer`. The default is `train`. +- `download`: Whether to download the dataset. When set to `True`, the dataset is downloaded and extracted if it does not already exist. The default is `False`. -class DatasetGenerator: - def __init__(self): - self.data = np.random.sample((5, 2)) - self.label = np.random.sample((5, 1)) +```python +from mindvision.dataset import Cifar10 - def __getitem__(self, index): - return self.data[index], self.label[index] +# Dataset root directory +data_dir = "./datasets" - def __len__(self): - return len(self.data) +# Download, extract and load the CIFAR-10 training dataset +dataset = Cifar10(path=data_dir, split='train', batch_size=6, resize=32, download=True) +dataset = dataset.run() ``` -You need to customize the following class functions: - -- **\_\_init\_\_** - - When a dataset object is instantiated, the `__init__` function is called. You can perform operations such as data initialization. +The directory structure of the CIFAR-10 dataset files is as follows: - ```python - def __init__(self): - self.data = np.random.sample((5, 2)) - self.label = np.random.sample((5, 1)) - ``` - -- **\_\_getitem\_\_** - - Define the `__getitem__` function of the dataset class to support random access and obtain and return data in the dataset based on the specified `index` value. - - The return value of the `__getitem__` function needs to be a tuple of numpy arrays. When returning a single numpy array, it can be written as `return (np_array_1,)`. 
- - ```python - def __getitem__(self, index): - return self.data[index], self.label[index] - ``` - -- **\_\_len\_\_** +```text +datasets/ +├── cifar-10-batches-py +│ ├── batches.meta +│ ├── data_batch_1 +│ ├── data_batch_2 +│ ├── data_batch_3 +│ ├── data_batch_4 +│ ├── data_batch_5 +│ ├── readme.html +│ └── test_batch +└── cifar-10-python.tar.gz +``` - Define the `__len__` function of the dataset class and return the number of samples in the dataset. +## Iterating Dataset - ```python - def __len__(self): - return len(self.data) - ``` +You can use `create_dict_iterator` interface to create a data iterator to iteratively access data. The data type of the access is `Tensor` by default, and if `output_numpy=True` is set, the data type of the access is `Numpy`. -After the dataset class is defined, the `GeneratorDataset` API can be used to load and access dataset samples in the user-defined mode. +The following shows the corresponding access data type, and the image shapes and labels. ```python -dataset_generator = DatasetGenerator() -dataset = ds.GeneratorDataset(dataset_generator, ["data", "label"], shuffle=False) +data = next(dataset.create_dict_iterator()) +print(f"Data type:{type(data['image'])}\nImage shape: {data['image'].shape}, Label: {data['label']}") -for data in dataset.create_dict_iterator(): - print('{}'.format(data["data"]), '{}'.format(data["label"])) +data = next(dataset.create_dict_iterator(output_numpy=True)) +print(f"Data type:{type(data['image'])}\nImage shape: {data['image'].shape}, Label: {data['label']}") ``` ```text - [0.36510558 0.45120592] [0.78888122] - [0.49606035 0.07562207] [0.38068183] - [0.57176158 0.28963401] [0.16271622] - [0.30880446 0.37487617] [0.54738768] - [0.81585667 0.96883469] [0.77994068] +Data type: +Image shape: (6, 3, 32, 32), Label: [7 1 2 8 7 8] +Data type: +Image shape: (6, 3, 32, 32), Label: [8 0 0 2 6 1] ``` ## Data Processing and Augmentation -### Processing Data +### Data Processing -The dataset APIs provided by 
MindSpore support common data processing methods. You only need to call the corresponding function APIs to quickly process data. +The `mindvision.dataset.Cifar10` interface provides data processing capabilities. The data can be processed by simply setting the corresponding attributes. -In the following example, the datasets are shuffled, and then two samples form a batch. +- `shuffle`: Whether to shuffle the dataset. When set to `True`, the order of the samples is shuffled. The default is `False`. +- `batch_size`: The number of samples in each batch. For example, `batch_size=2` puts 2 samples in each batch. The default is 32. +- `repeat_num`: The number of times the dataset is repeated. `repeat_num=1` means the dataset is used once. The default is 1. ```python -ds.config.set_seed(58) +import numpy as np +import matplotlib.pyplot as plt + +import mindspore.dataset.vision.c_transforms as transforms + +trans = [transforms.HWC2CHW()] +dataset = Cifar10(data_dir, batch_size=6, resize=32, repeat_num=1, shuffle=True, transform=trans) +data = dataset.run() +data = next(data.create_dict_iterator()) -# Shuffle the data sequence. -dataset = dataset.shuffle(buffer_size=10) -# Perform batch operations on datasets. 
-dataset = dataset.batch(batch_size=2) +images = data["image"].asnumpy() +labels = data["label"].asnumpy() +print(f"Image shape: {images.shape}, Label: {labels}") -for data in dataset.create_dict_iterator(): - print("data: {}".format(data["data"])) - print("label: {}".format(data["label"])) +plt.figure() +for i in range(1, 7): + plt.subplot(2, 3, i) + image_trans = np.transpose(images[i-1], (1, 2, 0)) + plt.title(f"{dataset.index2label[labels[i-1]]}") + plt.imshow(image_trans, interpolation="None") +plt.show() ``` ```text - data: [[0.36510558 0.45120592] - [0.57176158 0.28963401]] - label: [[0.78888122] - [0.16271622]] - data: [[0.30880446 0.37487617] - [0.49606035 0.07562207]] - label: [[0.54738768] - [0.38068183]] - data: [[0.81585667 0.96883469]] - label: [[0.77994068]] +Image shape: (6, 3, 32, 32), Label: [9 3 8 9 6 8] ``` -Where, - -`buffer_size`: size of the buffer for shuffle operations in the dataset. - -`batch_size`: number of data records in each group. Currently, each group contains 2 data records. - ### Data Augmentation -If the data volume is too small or the sample scenario is simple, the model training effect is affected. You can perform the data augmentation operation to expand the sample diversity and improve the generalization capability of the model. +Problems such as too small amount of data or single sample scene will affect the training effect of the model, and users can expand the diversity of samples through data augmentation operations to improve the generalization ability of the model. The `mindvision.dataset.Cifar10` interface uses the default data augmentation feature, which allows users to perform data augmentation operations by setting attribute `transform` and `target_transform`. -The following example uses the operators in the `mindspore.dataset.vision.c_transforms` module to perform data argumentation on the MNIST dataset. +- `transform`: augment dataset image data. +- `target_transform`: process the dataset label data. 
-Import the `c_transforms` module and load the MNIST dataset. +This section describes data augmentation of the CIFAR-10 dataset by using operators in the `mindspore.dataset.vision.c_transforms` module. ```python +import numpy as np import matplotlib.pyplot as plt -from mindspore.dataset.vision import Inter -import mindspore.dataset.vision.c_transforms as c_vision - -DATA_DIR = './datasets/MNIST_Data/train' - -mnist_dataset = ds.MnistDataset(DATA_DIR, num_samples=6, shuffle=False) - -# View the original image data. -mnist_it = mnist_dataset.create_dict_iterator() -data = next(mnist_it) -plt.imshow(data['image'].asnumpy().squeeze(), cmap=plt.cm.gray) -plt.title(data['label'].asnumpy(), fontsize=20) +import mindspore.dataset.vision.c_transforms as transforms + +# Image augmentation +trans = [ + transforms.RandomCrop((32, 32), (4, 4, 4, 4)), # Randomly crop a 32x32 region after padding 4 pixels on each side + transforms.RandomHorizontalFlip(prob=0.5), # Flip the image horizontally with probability 0.5 + transforms.HWC2CHW(), # Convert (h, w, c) to (c, h, w) +] + +dataset = Cifar10(data_dir, batch_size=6, resize=32, transform=trans) +data = dataset.run() +data = next(data.create_dict_iterator()) +images = data["image"].asnumpy() +labels = data["label"].asnumpy() +print(f"Image shape: {images.shape}, Label: {labels}") + +plt.figure() +for i in range(1, 7): + plt.subplot(2, 3, i) + image_trans = np.transpose(images[i-1], (1, 2, 0)) + plt.title(f"{dataset.index2label[labels[i-1]]}") + plt.imshow(image_trans, interpolation="None") plt.show() ``` -![png](./images/output_13_0.PNG) - -Define the data augmentation operator, perform the `Resize` and `RandomCrop` operations on the dataset, and insert the dataset into the data processing pipeline through `map` mapping. 
- -```python -resize_op = c_vision.Resize(size=(200,200), interpolation=Inter.LINEAR) -crop_op = c_vision.RandomCrop(150) -transforms_list = [resize_op, crop_op] -mnist_dataset = mnist_dataset.map(operations=transforms_list, input_columns=["image"]) -``` - -View the data augmentation effect. - -```python -mnist_dataset = mnist_dataset.create_dict_iterator() -data = next(mnist_dataset) -plt.imshow(data['image'].asnumpy().squeeze(), cmap=plt.cm.gray) -plt.title(data['label'].asnumpy(), fontsize=20) -plt.show() +```text +Image shape: (6, 3, 32, 32), Label: [7 6 7 4 5 3] ``` -![png](./images/output_17_0.PNG) - -For more information, see [Data augmentation](https://www.mindspore.cn/docs/programming_guide/en/master/augmentation.html). diff --git a/tutorials/source_en/beginner/images/Introduction4.png b/tutorials/source_en/beginner/images/Introduction4.png new file mode 100644 index 0000000000000000000000000000000000000000..0f897ef607fdd55924be9f8ec8ced24b1a6ea73e Binary files /dev/null and b/tutorials/source_en/beginner/images/Introduction4.png differ diff --git a/tutorials/source_en/beginner/images/introduction2.png b/tutorials/source_en/beginner/images/introduction2.png index 61ff0d91e4ee5a54f1a2a05e4e3f000b1af30c58..afb59aced75a3505f209794731c97973feebbf13 100644 Binary files a/tutorials/source_en/beginner/images/introduction2.png and b/tutorials/source_en/beginner/images/introduction2.png differ diff --git a/tutorials/source_en/beginner/introduction.ipynb b/tutorials/source_en/beginner/introduction.ipynb index db36c0734a4613ae36ab0a2fa6d7d1f0b46cd7ff..5705e245147e26350c61233bc2ec0ab355414227 100644 --- a/tutorials/source_en/beginner/introduction.ipynb +++ b/tutorials/source_en/beginner/introduction.ipynb @@ -5,26 +5,47 @@ "source": [ "# Overview\n", "\n", - "`Ascend` `GPU` `CPU` `Device` `Beginner`\n", + 
"[![View-Source](https://gitee.com/mindspore/docs/raw/tutorials-develop/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/tutorials-develop/tutorials/source_en/beginner/introduction.ipynb)\n", "\n", "The following describes the Huawei AI full-stack solution and introduces the position of MindSpore in the solution. Developers who are interested in MindSpore can visit the [MindSpore community](https://gitee.com/mindspore/mindspore) and click [Watch, Star, and Fork](https://gitee.com/mindspore/mindspore).\n", "\n", "## MindSpore Introduction\n", "\n", - "MindSpore is a deep learning framework in all scenarios, aiming to achieve easy development, efficient execution, and all-scenario coverage. Easy development features friendly APIs and easy debugging. Efficient execution is reflected in computing, data preprocessing, and distributed training. All-scenario coverage means that the framework supports cloud, edge, and device scenarios.\n", + "MindSpore is a deep learning framework in all scenarios, aiming to achieve easy development, efficient execution, and all-scenario coverage.\n", + "\n", + "Easy development features friendly APIs and easy debugging. Efficient execution is reflected in computing, data preprocessing, and distributed training. 
All-scenario coverage means that the framework supports cloud, edge, and device scenarios.\n", "\n", "The following figure shows the overall MindSpore architecture:\n", "\n", - "![MindSpore](images/introduction2.png)\n", + "![MindSpore](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_en/beginner/images/introduction2.png)\n", "\n", "- **ModelZoo**: ModelZoo provides available deep learning algorithm networks, and more developers are welcome to contribute new networks ([ModelZoo](https://gitee.com/mindspore/models)).\n", - "- **MindSpore Extend**: The expansion package of MindSpore expands the support of new fields, such as GNN/deep probabilistic programming/reinforcement learning, etc. We look forward to more developers to contribute and build together.\n", - "- **MindScience**:MindScience is a scientific computing kits for various industries based on the converged MindSpore framefork. It contains the industry-leading datasets, basic network structures, high-precision pre-trained models, and pre-and post-processing tools that accelerate application development of the scientific computing ([More Information](https://mindspore.cn/mindscience/docs/en/master/index.html)).\n", - "- **MindExpression**: Python-based frontend expression and programming interfaces. In the future, more frontends based on C/C++ will be provided. Cangjie, Huawei's self-developed programming language frontend, is now in the pre-research phase. 
In addition, Huawei is working on interconnection with third-party frontends to introduce more third-party ecosystems.\n", - "- **MindCompiler**: The core compiler of the layer, which implements three major functions based on the unified device-cloud MindIR, including hardware-independent optimization (type derivation, automatic differentiation, and expression simplification), hardware-related optimization (automatic parallelism, memory optimization, graph kernel fusion, and pipeline execution) and optimization related to deployment and inference (quantification and pruning).\n", - "- **MindRT**: MindSpore runtime system, which covers the cloud-side host-side runtime system, the device-side and the lightweight runtime system of the smaller IoT.\n", - "- **MindInsight**: Provides MindSpore's visual debugging and tuning tools, and supports users to debug and tune the training network ([More Information](https://mindspore.cn/mindinsight/docs/en/master/index.html)).\n", - "- **MindArmour**: For enterprise-level applications, security and privacy protection related enhancements, such as anti-robustness, model security testing, differential privacy training, privacy leakage risk assessment, data drift detection, etc. technology ([More Information](https://mindspore.cn/mindarmour/docs/en/master/index.html)).\n", + "- **Extend**: The expansion package of MindSpore expands the support of new fields, such as GNN/deep probabilistic programming/reinforcement learning, etc. We look forward to more developers contributing and building together.\n", + "- **Science**: MindScience is a scientific computing kit for various industries based on the converged MindSpore framework. 
It contains the industry-leading datasets, basic network structures, high-precision pre-trained models, and pre- and post-processing tools that accelerate the development of scientific computing applications ([More Information](https://mindspore.cn/mindscience/docs/en/master/index.html)).\n", + "- **Expression**: Python-based frontend expression and programming interfaces. In the future, more frontends based on C/C++ will be provided. Cangjie, Huawei's self-developed programming language frontend, is now in the pre-research phase. In addition, Huawei is working on interconnection with third-party frontends to introduce more third-party ecosystems.\n", + "- **Data**: Provides efficient data processing, loading of common datasets, and programming interfaces, and lets users flexibly define processing registration and pipeline parallel optimization.\n", + "- **Compiler**: The core compiler of the layer, which implements three major functions based on the unified device-cloud MindIR, including hardware-independent optimization (type derivation, automatic differentiation, and expression simplification), hardware-related optimization (automatic parallelism, memory optimization, graph kernel fusion, and pipeline execution) and optimization related to deployment and inference (quantization and pruning).\n", + "- **Runtime**: MindSpore runtime system, which covers the cloud-side host-side runtime system, the device-side and the lightweight runtime system of the smaller IoT.\n", + "- **Insight**: Provides MindSpore's visual debugging and tuning tools, and helps users debug and tune the training network ([More Information](https://mindspore.cn/mindinsight/docs/en/master/index.html)).\n", + "- **Armour**: For enterprise-level applications, security and privacy protection related enhancements, such as adversarial robustness, model security testing, differential privacy training, privacy leakage risk assessment, data drift detection, etc. 
technology ([More Information](https://mindspore.cn/mindarmour/docs/en/master/index.html)).\n", + "\n", + "### Execution Process\n", + "\n", + "With an understanding of the overall architecture of Ascend MindSpore, we can look at the overall coordination relationship between the various modules, as shown in the figure:\n", + "\n", + "![MindSpore](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_en/beginner/images/introduction4.png)\n", + "\n", + "As a full-scenario AI framework, MindSpore supports different series of hardware for end (mobile phone and IOT device), edge (base station and routing device), and cloud (server) scenarios, including Ascend series products, NVIDIA series products, Arm series Qualcomm Snapdragon, Huawei Kirin chips and other series of products.\n", + "\n", + "The blue box on the left is the MindSpore main framework, which mainly provides the basic API functions related to the training and verification of neural networks, and also provides automatic differentiation and automatic parallelism by default.\n", + "\n", + "Below the blue box is the MindSpore Data module, which can be used for data preprocessing, including data sampling, data iteration, data format conversion and other different data operations. 
Many debugging and tuning problems arise during training, so the MindSpore Insight module visualizes debugging and tuning data such as loss curves, operator execution, and weight parameter variables to help users debug and optimize during training.\n", + "\n", + "The simplest way to think about AI security is from the perspective of attack and defense. For example, an attacker may introduce malicious data during the training stage to degrade the inference ability of the AI model, so MindSpore provides the MindSpore Armour module to offer AI security mechanisms for MindSpore.\n", + "\n", + "The content above the blue box is closer to users working on algorithm development. It includes ModelZoo, which stores a large number of AI algorithm models; MindSpore DevKit, a development tool suite for different fields; and MindSpore Extend, a high-level expansion library. Worth mentioning within MindSpore Extend is the scientific computing suite MindScience. For the first time, MindSpore explores the combination of scientific computing and deep learning, the combination of numerical computing and deep learning, and support for electromagnetic simulation, drug molecule simulation and so on through deep learning.\n", + "\n", + "After the neural network model is trained, you can export the model or load a model that has been trained from MindSpore Hub. MindIR then provides a unified IR format for device and cloud, which defines the logical structure of the network and the properties of the operators, and decouples the model file in the MindIR format from the hardware platform so that a model trained once can be deployed many times. 
Therefore, as shown in the figure, the model is exported to different modules through IR to perform inference.\n", "\n", "### Design Philosophy\n", "\n", @@ -40,7 +61,7 @@ "\n", " At present, there are two execution modes of mainstream deep learning frameworks, they are Graph mode and PyNative mode. Graph mode has high training performance but is difficult to debug. Although the Pynative mode is easier to debug than the static graph mode, it is difficult to execute efficiently. MindSpore provides a unified coding method for PyNative graphs and static graphs, which greatly increases their compatibility. Users do not need to develop multiple sets of codes, and can switch the mode only by changing one line of code. For example, users can set `context.set_context(mode=context.PYNATIVE_MODE)` to switch to the PyNative mode and set `context.set_context(mode=context.GRAPH_MODE)` to switch to the Graph mode, users can have an easier development, debugging and performance experience.\n", "\n", - "- Use functional differentiable programming architecture, allowing users to focus on the mathematical native expression of model algorithms\n", + "- Use functional differentiable programming architecture and allow users to focus on the mathematical native expression of model algorithms\n", "\n", " Neural network models are usually trained based on gradient descent algorithms, but the manual derivation process is complicated and the results are prone to errors. The Automatic Differentiation mechanism based on Source Code Transformation of MindSpore adopts a functional differentiable programming architecture, and provides a Python programming interface at the interface layer, including the expression of control flow. 
Users can focus on the mathematically native expression of the model algorithm without manual derivation.\n", "\n", @@ -52,7 +73,7 @@ "\n", "To support network building, entire graph execution, subgraph execution, and single-operator execution, MindSpore provides users with three levels of APIs. In ascending order, these are Low-Level Python API, Medium-Level Python API, and High-Level Python API.\n", "\n", - "![MindSpore API](images/introduction3.png)\n", + "![MindSpore API](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_zh_cn/beginner/images/introduction3.png)\n", "\n", "- High-Level Python API\n", "\n", diff --git a/tutorials/source_en/beginner/model.md b/tutorials/source_en/beginner/model.md index d5a2ba96fd7c8e6200d76985cc614f231d8297e3..5db7bb9a7fee5bbea721a951aa31ce11984c91ac 100644 --- a/tutorials/source_en/beginner/model.md +++ b/tutorials/source_en/beginner/model.md @@ -1,41 +1,51 @@ # Building a Neural Network -`Ascend` `GPU` `CPU` `Beginner` `Model Development` + - +A neural network model consists of multiple data operation layers. `mindspore.nn` provides various basic network modules. The following uses LeNet-5 as an example to first describe how to build a neural network model by using `mindspore.nn` , and then describes how to build a LeNet-5 network model by using `mindvision.classification.models`. -A neural network model consists of multiple data operation layers. `mindspore.nn` provides various basic network modules. +> `mindvision.classification.models` is a network model interface developed based on `mindspore.nn`, providing some classic and commonly used network models for the convenience of users. -The following uses LeNet as an example to describe how MindSpore builds a neural network model. 
+## LeNet-5 model -Import the required modules and APIs: +[LeNet-5](https://ieeexplore.ieee.org/document/726791) is a typical convolutional neural network proposed by Professor Yann LeCun in 1998, which achieves 99.4% accuracy on the MNIST dataset and is the first classic in the field of CNN. The model structure is shown in the following figure: -```python -import numpy as np -import mindspore -import mindspore.nn as nn -from mindspore import Tensor -``` +![LeNet-5](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_zh_cn/beginner/images/lenet.png) + +According to the network structure of LeNet, there are 7 layers of LeNet removal input layer, including 3 convolutional layers, 2 sub-sampling layers, and 3 fully-connected layers. ## Defining a Model Class +In the above figure, C is used to represent the convolutional layer, S to represent the sampling layer, and F to represent the fully-connected layer. + +The input size of the picture is fixed at 32∗32. In order to get a good convolution effect, the number is required in the center of the picture, so the size at 32∗32 is actually the result of the picture at 28∗28 after filled. In addition, unlike the input picture of the three channels of the CNN network, the input of the LeNet picture is only a normalized binary image. The output of the network is a prediction probability of ten digits 0\~9, which can be understood as the probability that the input image belongs to 0\~9 digits. + The `Cell` class of MindSpore is the base class for building all networks and the basic unit of a network. When a neural network is required, you need to inherit the `Cell` class and overwrite the `__init__` and `construct` methods. ```python +import mindspore.nn as nn + class LeNet5(nn.Cell): """ Lenet network structure """ def __init__(self, num_class=10, num_channel=1): super(LeNet5, self).__init__() - # Define the required operation. 
+ # Convolutional layer, the number of input channels is num_channel, the number of output channels is 6, and the convolutional kernel size is 5*5 self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid') + # Convolutional layer, the number of input channels is 6, the number of output channels is 16, and the convolutional kernel size is 5 * 5 self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid') + # Fully connected layer, the number of inputs is 16*5*5, and the number of outputs is 120 self.fc1 = nn.Dense(16 * 5 * 5, 120) + # Fully-connected layer, the number of inputs is 120, and the number of outputs is 84 self.fc2 = nn.Dense(120, 84) + # Fully connected layer, the number of inputs is 84, and the number of classifications is num_class self.fc3 = nn.Dense(84, num_class) + # ReLU Activation function self.relu = nn.ReLU() + # Pooling layer self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) + # Multidimensional arrays are flattened into one-dimensional arrays self.flatten = nn.Flatten() def construct(self, x): @@ -55,23 +65,50 @@ class LeNet5(nn.Cell): return x ``` +Next, build the neural network model defined above and look at the structure of the network model. + +```python +model = LeNet5() + +print(model) +``` + +```text +LeNet5< + (conv1): Conv2d + (conv2): Conv2d + (fc1): Dense + (fc2): Dense + (fc3): Dense + (relu): ReLU<> + (max_pool2d): MaxPool2d + (flatten): Flatten<> + > +``` + ## Model Layers -The following describes the key member functions of the `Cell` class used in LeNet, and then describes how to use the `Cell` class to access model parameters through the instantiation network. +The following describes the key member functions of the `Cell` class used in LeNet, and then describes how to use the `Cell` class to access model parameters through the instantiation network. For more `cell` class contents, refer to [mindspore.nn interface](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.nn.html). 
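Before looking at the layers one by one, it may help to trace how the spatial size shrinks through the feature-extraction part of the network, since this explains why `fc1` takes `16 * 5 * 5` inputs. The following is a minimal sketch in plain Python (no MindSpore required). It assumes the usual LeNet-5 order of conv1, pooling, conv2, pooling, and it assumes that convolution and pooling with `pad_mode='valid'` produce `floor((size - kernel) / stride) + 1` outputs per dimension. The helper names `output_size` and `lenet5_feature_shapes` are hypothetical and are not part of MindSpore.

```python
def output_size(size, kernel, stride=1, pad_mode="valid"):
    """Spatial output size of a convolution or pooling window (illustrative helper)."""
    if pad_mode == "same":
        # 'same' padding keeps ceil(size / stride) positions
        return -(-size // stride)
    # 'valid' uses no padding, so only complete windows are counted
    return (size - kernel) // stride + 1


def lenet5_feature_shapes(height=32, width=32):
    """Return (stage name, (channels, height, width)) after each stage."""
    # (name, output channels or None for pooling, kernel size, stride)
    stages = [
        ("conv1", 6, 5, 1),
        ("max_pool2d", None, 2, 2),
        ("conv2", 16, 5, 1),
        ("max_pool2d", None, 2, 2),
    ]
    channels = 1
    shapes = []
    for name, out_channels, kernel, stride in stages:
        if out_channels is not None:
            channels = out_channels  # pooling keeps the channel count
        height = output_size(height, kernel, stride)
        width = output_size(width, kernel, stride)
        shapes.append((name, (channels, height, width)))
    return shapes


shapes = lenet5_feature_shapes()
for name, shape in shapes:
    print(name, shape)

channels, height, width = shapes[-1][1]
print("flattened features:", channels * height * width)
```

Under these assumptions the last stage is 16 x 5 x 5, which flattens to the 400 inputs expected by `fc1`.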
### nn.Conv2d Add the `nn.Conv2d` layer and add a convolution function to the network to help the neural network extract features. ```python -conv2d = nn.Conv2d(1, 6, 5, has_bias=False, weight_init='normal', pad_mode='valid') -input_x = Tensor(np.ones([1, 1, 32, 32]), mindspore.float32) +import numpy as np + +from mindspore import Tensor +from mindspore import dtype as mstype + +# 1 input channel, 6 output channels, a 5*5 convolution kernel, weights initialized from a normal distribution, and no padding of the input pixels +conv2d = nn.Conv2d(1, 6, 5, has_bias=False, weight_init='normal', pad_mode='valid') +input_x = Tensor(np.ones([1, 1, 32, 32]), mstype.float32) print(conv2d(input_x).shape) ``` ```text - (1, 6, 28, 28) +(1, 6, 28, 28) ``` ### nn.ReLU @@ -80,9 +117,10 @@ Add the `nn.ReLU` layer and add a non-linear activation function to the network ```python relu = nn.ReLU() -input_x = Tensor(np.array([-1, 2, -3, 2, -1]), mindspore.float16) -output = relu(input_x) +input_x = Tensor(np.array([-1, 2, -3, 2, -1]), mstype.float16) + +output = relu(input_x) print(output) ``` @@ -95,19 +133,19 @@ print(output) Initialize the `nn.MaxPool2d` layer and down-sample the 6 x 28 x 28 array to a 6 x 14 x 14 array. ```python -max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) -input_x = Tensor(np.ones([1, 6, 28, 28]), mindspore.float32) +max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) +input_x = Tensor(np.ones([1, 6, 28, 28]), mstype.float32) print(max_pool2d(input_x).shape) ``` ```text - (1, 6, 14, 14) + (1, 6, 14, 14) ``` ### nn.Flatten -Initialize the `nn.Flatten` layer and convert the 16 x 5 x 5 array into 400 consecutive arrays. +Initialize the `nn.Flatten` layer and convert the 1 x 16 x 5 x 5 array into a 1 x 400 array. 
```python flatten = nn.Flatten() @@ -127,7 +165,7 @@ Initialize the `nn.Dense` layer and perform linear transformation on the input m ```python dense = nn.Dense(400, 120, weight_init='normal') -input_x = Tensor(np.ones([1, 400]), mindspore.float32) +input_x = Tensor(np.ones([1, 400]), mstype.float32) output = dense(input_x) print(output.shape) @@ -139,23 +177,46 @@ print(output.shape) ## Model Parameters -The convolutional layer and fully-connected layer in the network will have weights and offsets after being instantiated, and these weight and offset parameters are optimized in subsequent training. In `nn.Cell`, the `parameters_and_names()` method is used to access all parameters. +The convolutional layers and fully-connected layers in the network have weight and bias parameters after being instantiated, and these parameters are optimized in subsequent training. During training, you can use `get_parameters()` to view information about each parameter, such as its name, shape, data type, and whether it requires gradient computation. 
+ +```python +for m in model.get_parameters(): + print(f"layer:{m.name}, shape:{m.shape}, dtype:{m.dtype}, requires_grad:{m.requires_grad}") +``` + +```text +layer:conv1.weight, shape:(6, 1, 5, 5), dtype:Float32, requires_grad:True +layer:conv2.weight, shape:(16, 6, 5, 5), dtype:Float32, requires_grad:True +layer:fc1.weight, shape:(120, 400), dtype:Float32, requires_grad:True +layer:fc1.bias, shape:(120,), dtype:Float32, requires_grad:True +layer:fc2.weight, shape:(84, 120), dtype:Float32, requires_grad:True +layer:fc2.bias, shape:(84,), dtype:Float32, requires_grad:True +layer:fc3.weight, shape:(10, 84), dtype:Float32, requires_grad:True +layer:fc3.bias, shape:(10,), dtype:Float32, requires_grad:True +``` + +## Quickly Build a LeNet-5 Network Model -In the example, we traverse each parameter and display the name and attribute of each layer in the network. +The above describes the use of `mindspore.nn.Cell` to build a LeNet-5 network model. A ready-made network model interface is available in `mindvision.classification.models`, and the LeNet-5 network model can be built directly using the `lenet` interface. 
```python -model = LeNet5() -for m in model.parameters_and_names(): - print(m) +from mindvision.classification.models import lenet + +# num_classes is the number of classes, and pretrained indicates whether to load pretrained weights +model = lenet(num_classes=10, pretrained=False) + +for m in model.get_parameters(): + print(f"layer:{m.name}, shape:{m.shape}, dtype:{m.dtype}, requires_grad:{m.requires_grad}") ``` ```text -('conv1.weight', Parameter (name=conv1.weight, shape=(6, 1, 5, 5), dtype=Float32, requires_grad=True)), -('conv2.weight', Parameter (name=conv2.weight, shape=(16, 6, 5, 5), dtype=Float32, requires_grad=True)), -('fc1.weight', Parameter (name=fc1.weight, shape=(120, 400), dtype=Float32, requires_grad=True)), -('fc1.bias', Parameter (name=fc1.bias, shape=(120,), dtype=Float32, requires_grad=True)), -('fc2.weight', Parameter (name=fc2.weight, shape=(84, 120), dtype=Float32, requires_grad=True)), -('fc2.bias', Parameter (name=fc2.bias, shape=(84,), dtype=Float32, requires_grad=True)), -('fc3.weight', Parameter (name=fc3.weight, shape=(10, 84), dtype=Float32, requires_grad=True)), -('fc3.bias', Parameter (name=fc3.bias, shape=(10,), dtype=Float32, requires_grad=True)) +layer:backbone.conv1.weight, shape:(6, 1, 5, 5), dtype:Float32, requires_grad:True +layer:backbone.conv2.weight, shape:(16, 6, 5, 5), dtype:Float32, requires_grad:True +layer:backbone.fc1.weight, shape:(120, 400), dtype:Float32, requires_grad:True +layer:backbone.fc1.bias, shape:(120,), dtype:Float32, requires_grad:True +layer:backbone.fc2.weight, shape:(84, 120), dtype:Float32, requires_grad:True +layer:backbone.fc2.bias, shape:(84,), dtype:Float32, requires_grad:True +layer:backbone.fc3.weight, shape:(10, 84), dtype:Float32, requires_grad:True +layer:backbone.fc3.bias, shape:(10,), dtype:Float32, requires_grad:True ``` + diff --git a/tutorials/source_en/beginner/model_train.md b/tutorials/source_en/beginner/model_train.md deleted file mode 100644 
index b283b65c4bc70213a64bad6085a2d04c3a391cbb..0000000000000000000000000000000000000000 --- a/tutorials/source_en/beginner/model_train.md +++ /dev/null @@ -1,186 +0,0 @@ -# Training the Model - -`Ascend` `GPU` `CPU` `Beginner` `Model Development` - - - -After learning how to create a model and build a dataset in the preceding tutorials, you can start to learn how to set hyperparameters and optimize model parameters. - -## Hyperparameters - -Hyperparameters can be adjusted to control the model training and optimization process. Different hyperparameter values may affect the model training and convergence speed. - -Generally, the following hyperparameters are defined for training: - -- Epoch: specifies number of times that the dataset is traversed during training. -- Batch size: specifies the size of each batch of data to be read. -- Learning rate: If the learning rate is low, the convergence speed slows down. If the learning rate is high, unpredictable results such as no training convergence may occur. - -```python -epochs = 5 -batch_size = 64 -learning_rate = 1e-3 -``` - -## Loss Functions - -The **loss function** is used to evaluate the difference between **predicted value** and **actual value** of a model. Here, the absolute error loss function `L1Loss` is used. `mindspore.nn.loss` provides many common loss functions, such as `SoftmaxCrossEntropyWithLogits`, `MSELoss`, and `SmoothL1Loss`. - -The output value and target value are provided to compute the loss value. The method is as follows: - -```python -import numpy as np -import mindspore.nn as nn -from mindspore import Tensor - -loss = nn.L1Loss() -output_data = Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32)) -target_data = Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32)) -print(loss(output_data, target_data)) -``` - -```text - 1.5 -``` - -## Optimizer - -An optimizer is used to compute and update the gradient. 
The selection of the model optimization algorithm directly affects the performance of the final model. A poor effect may be caused by the optimization algorithm instead of the feature or model design. All optimization logic of MindSpore is encapsulated in the `Optimizer` object. Here, the Momentum optimizer is used. `mindspore.nn` provides many common optimizers, such as `Adam` and `Momentum`. - -You need to build an `Optimizer` object. This object can retain the current parameter status and update parameters based on the computed gradient. - -To build an `Optimizer`, we need to provide an iterator that contains parameters (must be variable objects) to be optimized. For example, set `params` to `net.trainable_params()` for all `parameter` that can be trained on the network. Then, you can set the `Optimizer` parameter options, such as the learning rate and weight attenuation. - -A code example is as follows: - -```python -from mindspore import nn - -optim = nn.Momentum(net.trainable_params(), 0.1, 0.9) -``` - -## Training - -A model training process is generally divided into four steps. - -1. Define a neural network. -2. Build a dataset. -3. Define hyperparameters, a loss function, and an optimizer. -4. Enter the epoch and dataset for training. - -Execute the following command to download and decompress the dataset to the specified location. 
- -```python -import os -import requests -import tarfile -import zipfile - -def download_dataset(url, target_path): - """download dataset""" - if not os.path.exists(target_path): - os.makedirs(target_path) - download_file = url.split("/")[-1] - if not os.path.exists(download_file): - res = requests.get(url, stream=True, verify=False) - if download_file.split(".")[-1] not in ["tgz","zip","tar","gz"]: - download_file = os.path.join(target_path, download_file) - with open(download_file, "wb") as f: - for chunk in res.iter_content(chunk_size=512): - if chunk: - f.write(chunk) - if download_file.endswith("zip"): - z = zipfile.ZipFile(download_file, "r") - z.extractall(path=target_path) - z.close() - if download_file.endswith(".tar.gz") or download_file.endswith(".tar") or download_file.endswith(".tgz"): - t = tarfile.open(download_file) - names = t.getnames() - for name in names: - t.extract(name, target_path) - t.close() - -download_dataset("https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz", "./datasets") -``` - -The code example for model training is as follows: - -```python -import mindspore.dataset as ds -import mindspore.dataset.transforms.c_transforms as C -import mindspore.dataset.vision.c_transforms as CV -from mindspore import nn, Tensor, Model -from mindspore import dtype as mstype -from mindspore.train.callback import LossMonitor - -DATA_DIR = "./datasets/cifar-10-batches-bin" - -# Define a neural network. 
-class Net(nn.Cell): - def __init__(self, num_class=10, num_channel=3): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid') - self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid') - self.fc1 = nn.Dense(16 * 5 * 5, 120) - self.fc2 = nn.Dense(120, 84) - self.fc3 = nn.Dense(84, num_class) - self.relu = nn.ReLU() - self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) - self.flatten = nn.Flatten() - - def construct(self, x): - x = self.conv1(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.conv2(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.flatten(x) - x = self.fc1(x) - x = self.relu(x) - x = self.fc2(x) - x = self.relu(x) - x = self.fc3(x) - return x - -net = Net() -epochs = 5 -batch_size = 64 -learning_rate = 1e-3 - -# Build a dataset. -sampler = ds.SequentialSampler(num_samples=128) -dataset = ds.Cifar10Dataset(DATA_DIR, sampler=sampler) - -# Convert the data type. -type_cast_op_image = C.TypeCast(mstype.float32) -type_cast_op_label = C.TypeCast(mstype.int32) -HWC2CHW = CV.HWC2CHW() -dataset = dataset.map(operations=[type_cast_op_image, HWC2CHW], input_columns="image") -dataset = dataset.map(operations=type_cast_op_label, input_columns="label") -dataset = dataset.batch(batch_size) - -# Define hyperparameters, a loss function, and an optimizer. -optim = nn.Momentum(net.trainable_params(), learning_rate, 0.9) -loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') -cb = LossMonitor() - -# Enter the epoch and dataset for training. 
-model = Model(net, loss_fn=loss, optimizer=optim) -model.train(epoch=epochs, train_dataset=dataset, callbacks=cb) -``` - -The output is as follows: - -```text -epoch: 1 step: 1, loss is 2.3025818 -epoch: 1 step: 2, loss is 2.3025775 -epoch: 2 step: 1, loss is 2.3025408 -epoch: 2 step: 2, loss is 2.3025331 -epoch: 3 step: 1, loss is 2.3024616 -epoch: 3 step: 2, loss is 2.302457 -epoch: 4 step: 1, loss is 2.3023522 -epoch: 4 step: 2, loss is 2.3023558 -epoch: 5 step: 1, loss is 2.3022182 -epoch: 5 step: 2, loss is 2.3022337 -``` \ No newline at end of file diff --git a/tutorials/source_en/beginner/save_load.md b/tutorials/source_en/beginner/save_load.md new file mode 100644 index 0000000000000000000000000000000000000000..7bf065834e94bb79b45feb065ba7b67adb72cf77 --- /dev/null +++ b/tutorials/source_en/beginner/save_load.md @@ -0,0 +1,196 @@ +# Saving and Loading the Model + + + +The previous chapter mainly introduced how to adjust hyperparameters and train network models. During training, we usually want to save the intermediate and final results for fine-tuning and for subsequent model deployment and inference. This chapter describes how to save and load the model. + +## Model Training + +The following are the basic steps and code for network model training, with sample code as follows: + +```python +import mindspore.nn as nn +from mindspore.train import Model + +from mindvision.classification.dataset import Mnist +from mindvision.classification.models import lenet +from mindvision.engine.callback import LossMonitor + +epochs = 10 # Number of training epochs + +# 1. Build a dataset +download_train = Mnist(path="./mnist", split="train", batch_size=32, repeat_num=1, shuffle=True, resize=32, download=True) +dataset_train = download_train.run() + +# 2. 
Define a neural network +network = lenet(num_classes=10, pretrained=False) +# 3.1 Define a loss function +net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') +# 3.2 Defines an optimizer function +net_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9) +# 3.3 Initialize model parameters +model = Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={'accuracy'}) + +# 4. Perform training on the neural network +model.train(epochs, dataset_train, callbacks=[LossMonitor(0.01, 1875)]) +``` + +```text +Epoch:[ 0/ 10], step:[ 1875/ 1875], loss:[0.148/1.210], time:2.021 ms, lr:0.01000 +Epoch time: 4251.808 ms, per step time: 2.268 ms, avg loss: 1.210 +Epoch:[ 1/ 10], step:[ 1875/ 1875], loss:[0.049/0.081], time:2.048 ms, lr:0.01000 +Epoch time: 4301.405 ms, per step time: 2.294 ms, avg loss: 0.081 +Epoch:[ 2/ 10], step:[ 1875/ 1875], loss:[0.014/0.050], time:1.992 ms, lr:0.01000 +Epoch time: 4278.799 ms, per step time: 2.282 ms, avg loss: 0.050 +Epoch:[ 3/ 10], step:[ 1875/ 1875], loss:[0.035/0.038], time:2.254 ms, lr:0.01000 +Epoch time: 4380.553 ms, per step time: 2.336 ms, avg loss: 0.038 +Epoch:[ 4/ 10], step:[ 1875/ 1875], loss:[0.130/0.031], time:1.932 ms, lr:0.01000 +Epoch time: 4287.547 ms, per step time: 2.287 ms, avg loss: 0.031 +Epoch:[ 5/ 10], step:[ 1875/ 1875], loss:[0.003/0.027], time:1.981 ms, lr:0.01000 +Epoch time: 4377.000 ms, per step time: 2.334 ms, avg loss: 0.027 +Epoch:[ 6/ 10], step:[ 1875/ 1875], loss:[0.004/0.023], time:2.167 ms, lr:0.01000 +Epoch time: 4687.250 ms, per step time: 2.500 ms, avg loss: 0.023 +Epoch:[ 7/ 10], step:[ 1875/ 1875], loss:[0.004/0.020], time:2.226 ms, lr:0.01000 +Epoch time: 4685.529 ms, per step time: 2.499 ms, avg loss: 0.020 +Epoch:[ 8/ 10], step:[ 1875/ 1875], loss:[0.000/0.016], time:2.275 ms, lr:0.01000 +Epoch time: 4651.129 ms, per step time: 2.481 ms, avg loss: 0.016 +Epoch:[ 9/ 10], step:[ 1875/ 1875], loss:[0.022/0.015], time:2.177 ms, lr:0.01000 +Epoch 
time: 4623.760 ms, per step time: 2.466 ms, avg loss: 0.015 +``` + +As the printed results above show, the loss values tend to converge as the number of training epochs increases. + +## Saving the Model + +After training the network, the following describes how to save and load the model. There are two main ways to save the model: + +1. One is to save the network model directly, which can be done before or after training. The advantage is that the interface is simple and easy to use, but only the state of the network model at the moment the command is executed is retained; + +2. The other is to save the model during training. During training, MindSpore automatically saves the parameters at the epoch and step intervals that you configure, so the intermediate weight parameters generated during training are also saved, which facilitates network fine-tuning and resuming interrupted training. + +### Saving the Model Directly + +Use the `save_checkpoint` interface provided by MindSpore to save the model, passing in the network and the save path: + +```python +import mindspore as ms + +# The defined network model is network; this is generally used before or after training +ms.save_checkpoint(network, "./MyNet.ckpt") +``` + +Here, `network` is the training network, and `"./MyNet.ckpt"` is the saving path of the network model. + +### Saving the Model During Training + +During model training, use the `callbacks` parameter of `model.train` to pass in a [ModelCheckpoint](https://mindspore.cn/docs/api/en/master/api_python/mindspore.train.html#mindspore.train.callback.ModelCheckpoint) object that saves the model (generally used together with [CheckpointConfig](https://mindspore.cn/docs/api/en/master/api_python/mindspore.train.html#mindspore.train.callback.CheckpointConfig)), which saves the model parameters and generates CheckPoint (abbreviated as ckpt) files. 
+ +You can configure the checkpoint policies as required. The following describes the usage: + +```python +from mindspore.train.callback import ModelCheckpoint, CheckpointConfig + +# Set the number of training epochs +epoch_num = 5 + +# Set the model saving parameters +config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10) + +# Apply the model saving parameters +ckpoint = ModelCheckpoint(prefix="lenet", directory="./lenet", config=config_ck) +model.train(epoch_num, dataset_train, callbacks=ckpoint) +``` + +In the preceding code, you need to initialize a `CheckpointConfig` class object to set the saving policy. + +- `save_checkpoint_steps` indicates the interval (in steps) for saving the checkpoint file. +- `keep_checkpoint_max` indicates the maximum number of checkpoint files that can be retained. +- `prefix` indicates the prefix of the generated checkpoint file. +- `directory` indicates the directory for storing files. + +Create a `ModelCheckpoint` object and pass it to the `model.train` method. Checkpoints are then saved during training. + +The generated checkpoint files are as follows: + +```text +lenet-graph.meta # Computational graph after compilation. +lenet-1_1875.ckpt # The extension of the checkpoint file is .ckpt. +lenet-2_1875.ckpt # The file name encodes the epoch and step at which the parameters were saved. These are the model parameters for the 1875th step of the 2nd epoch. +lenet-3_1875.ckpt # The model parameters saved at the 1875th step of the 3rd epoch. +... +``` + +If you use the same prefix and run the training script multiple times, checkpoint files with the same name may be generated. To help users distinguish the files generated by each run, MindSpore appends an underscore "_" and a digit to the user-defined prefix. If you want to delete the `.ckpt` file, delete the `.meta` file at the same time. 
+ +For example, `lenet_3-2_1875.ckpt` indicates the CheckPoint file saved at the 1875th step of the 2nd epoch when the script is run for the 3rd time. + +## Loading the Model + +To load the model weights, you need to create an instance of the same model and then use the `load_checkpoint` and `load_param_into_net` methods to load the parameters. + +The sample code is as follows: + +```python +from mindspore import load_checkpoint, load_param_into_net + +from mindvision.classification.dataset import Mnist +from mindvision.classification.models import lenet + +# Load the model parameters saved during the training process above into a parameter dictionary +param_dict = load_checkpoint("./lenet/lenet-1_1875.ckpt") + +# Redefine a LeNet neural network +network = lenet(num_classes=10, pretrained=False) + +# Load parameters to the network +load_param_into_net(network, param_dict) + +# Rebuild the optimizer so that it tracks the parameters of the new network instance +net_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9) +model = Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={"accuracy"}) +``` + +- The `load_checkpoint` method loads the network parameters in the parameter file to the `param_dict` dictionary. +- The `load_param_into_net` method loads the parameters in the `param_dict` dictionary to the network or optimizer. After loading, the parameters in the network are those stored in the checkpoint. + +### Validating the Model + +After the parameters are loaded into the network, you can call the `eval` function for inference verification. The sample code is as follows: + +```python +# Call eval() for inference. 
+download_eval = Mnist(path="./mnist", split="test", batch_size=32, resize=32, download=True) +dataset_eval = download_eval.run() +acc = model.eval(dataset_eval) + +print("{}".format(acc)) +``` + +```text +{'accuracy': 0.9857772435897436} +``` + +### For Transfer Learning + +To resume training after an interruption or to fine-tune the model, you can call the `train` function for transfer learning. 
The sample code is as follows: + +```python +# Define a training dataset. +download_train = Mnist(path="./mnist", split="train", batch_size=32, repeat_num=1, shuffle=True, resize=32, download=True) +dataset_train = download_train.run() + +# Network model calls train() for training. +model.train(epoch_num, dataset_train, callbacks=[LossMonitor(0.01, 1875)]) +``` + +```text +Epoch:[ 0/ 5], step:[ 1875/ 1875], loss:[0.000/0.010], time:2.193 ms, lr:0.01000 +Epoch time: 4106.620 ms, per step time: 2.190 ms, avg loss: 0.010 +Epoch:[ 1/ 5], step:[ 1875/ 1875], loss:[0.000/0.009], time:2.036 ms, lr:0.01000 +Epoch time: 4233.697 ms, per step time: 2.258 ms, avg loss: 0.009 +Epoch:[ 2/ 5], step:[ 1875/ 1875], loss:[0.000/0.010], time:2.045 ms, lr:0.01000 +Epoch time: 4246.248 ms, per step time: 2.265 ms, avg loss: 0.010 +Epoch:[ 3/ 5], step:[ 1875/ 1875], loss:[0.000/0.008], time:2.001 ms, lr:0.01000 +Epoch time: 4235.036 ms, per step time: 2.259 ms, avg loss: 0.008 +Epoch:[ 4/ 5], step:[ 1875/ 1875], loss:[0.002/0.008], time:2.039 ms, lr:0.01000 +Epoch time: 4354.482 ms, per step time: 2.322 ms, avg loss: 0.008 +``` + + diff --git a/tutorials/source_en/beginner/save_load_model.md b/tutorials/source_en/beginner/save_load_model.md deleted file mode 100644 index aa810586e1fae3f0a6a78c7ded81c9c734b093eb..0000000000000000000000000000000000000000 --- a/tutorials/source_en/beginner/save_load_model.md +++ /dev/null @@ -1,174 +0,0 @@ -# Saving and Loading the Model - -`Ascend` `GPU` `CPU` `Beginner` `Model Export` `Model Loading` - - - -In the previous tutorial, you learn how to train the network. In this tutorial, you will learn how to save and load a model, and how to export a saved model in a specified format to different platforms for inference. - -## Saving the Model - -There are two main ways to save the interface of the model: - -1. One is to simply save the network model, which can be saved before and after training. 
The advantage is that the interface is simple and easy to use, but only the state of the network model when the command is executed is retained; - -2. The other one is to save the interface during network model training. In the process of network model training, MindSpore automatically saves the parameters of the epoch number and step number set during training, that is, the intermediate weight parameters generated during the model training process are also saved to facilitate network fine-tuning and stop training. - -### Saving the Model Directly - -Use the save_checkpoint provided by MindSpore to save the model, pass it to the network and save the path: - -```python -import mindspore as ms - -# The defined network model is net, which is generally used before or after training -ms.save_checkpoint(net, "./MyNet.ckpt") -``` - -Here, `net` is the training network, and the definition method can be referred to [Building a Neural Network](https://www.mindspore.cn/tutorials/en/master/model.html). - -### Saving the Model During Training - -In the process of model training, use the `callbacks` parameter in `model.train` to pass in the object `ModelCheckpoint` that saves the model, which can save the model parameters and generate CheckPoint (abbreviated as ckpt) files. - -```python -from mindspore.train.callback import ModelCheckpoint - -ckpt_cb = ModelCheckpoint() -model.train(epoch_num, dataset, callbacks=ckpt_cb) -``` - -Here, `epoch_num` is the number of times that the dataset is traversed during training. The definition method can be referred to [Training the Model](https://www.mindspore.cn/tutorials/en/master/optimization.html). `dataset` is the dataset to be loaded. The definition method can be referred to [Loading and Processing Data](https://www.mindspore.cn/tutorials/en/master/dataset.html). - -You can configure the checkpoint policies as required. 
The following describes the usage: - -```python -from mindspore.train.callback import ModelCheckpoint, CheckpointConfig - -config_ck = CheckpointConfig(save_checkpoint_steps=32, keep_checkpoint_max=10) -ckpt_cb = ModelCheckpoint(prefix='resnet50', directory=None, config=config_ck) -model.train(epoch_num, dataset, callbacks=ckpt_cb) -``` - -In the preceding code, you need to initialize a `CheckpointConfig` class object to set the saving policy. - -- `save_checkpoint_steps` indicates the interval (in steps) for saving the checkpoint file. -- `keep_checkpoint_max` indicates the maximum number of checkpoint files that can be retained. -- `prefix` indicates the prefix of the generated checkpoint file. -- `directory` indicates the directory for storing files. - -Create a `ModelCheckpoint` object and pass it to the `model.train` method. Then the checkpoint function can be used during training. - -The generated checkpoint file is as follows: - -```text -resnet50-graph.meta # Computational graph after build. -resnet50-1_32.ckpt # The extension of the checkpoint file is .ckpt. -resnet50-2_32.ckpt # The file name format contains the epoch and step correspond to the saved parameters. -resnet50-3_32.ckpt # The file name indicates that the model parameters generated during the 32nd step of the third epoch are saved. -... -``` - -If you use the same prefix and run the training script for multiple times, checkpoint files with the same name may be generated. To help users distinguish files generated each time, MindSpore adds underscores (_) and digits to the end of the user-defined prefix. If you want to delete the `.ckpt` file, delete the `.meta` file at the same time. - -For example, `resnet50_3-2_32.ckpt` indicates the checkpoint file generated during the 32nd step of the second epoch after the script is executed for the third time. 
- -## Loading the Model - -To load the model weight, you need to create an instance of the same model and then use the `load_checkpoint` and `load_param_into_net` methods to load parameters. - -The sample code is as follows: - -```python -from mindspore import load_checkpoint, load_param_into_net - -resnet = ResNet50() -# Store model parameters in the parameter dictionary. -param_dict = load_checkpoint("resnet50-2_32.ckpt") -# Load parameters to the network. -load_param_into_net(resnet, param_dict) -model = Model(resnet, loss, metrics={"accuracy"}) -``` - -- The `load_checkpoint` method loads the network parameters in the parameter file to the `param_dict` dictionary. -- The `load_param_into_net` method loads the parameters in the `param_dict` dictionary to the network or optimizer. After the loading, parameters in the network are stored by the checkpoint. - -### Validating the Model - -In the inference-only scenario, parameters are directly loaded to the network for subsequent inference and validation. The sample code is as follows: - -```python -# Define a validation dataset. -dataset_eval = create_dataset(os.path.join(mnist_path, "test"), 32, 1) - -# Call eval() for inference. -acc = model.eval(dataset_eval) -``` - -### For Transfer Learning - -You can load network parameters and optimizer parameters to the model in the case of task interruption, retraining, and fine-tuning. The sample code is as follows: - -```python -# Set the number of training epochs. -epoch = 1 -# Define a training dataset. -dataset = create_dataset(os.path.join(mnist_path, "train"), 32, 1) -# Call train() for training. -model.train(epoch, dataset) -``` - -## Exporting the Model - -During model training, you can add checkpoints to save model parameters for inference and retraining. If you want to perform inference on different hardware platforms, you can generate MindIR, AIR, or ONNX files based on the network and checkpoint files. 
- -The following describes how to save a checkpoint file and export a MindIR, AIR, or ONNX file. - -> MindSpore is an all-scenario AI framework that uses MindSpore IR to unify intermediate representation of network models. Therefore, you are advised to export files in MindIR format. - -### Exporting a MindIR File - -If you want to perform inference across platforms or hardware (such as the Ascend AI Processors, MindSpore devices, or GPUs) after obtaining a checkpoint file, you can define the network and checkpoint to generate a model file in MINDIR format. Currently, the inference network export based on static graphs is supported and does not contain control flow semantics. An example of the code for exporting the file is as follows: - -```python -from mindspore import export, load_checkpoint, load_param_into_net -from mindspore import Tensor -import numpy as np - -resnet = ResNet50() -# Store model parameters in the parameter dictionary. -param_dict = load_checkpoint("resnet50-2_32.ckpt") - -# Load parameters to the network. -load_param_into_net(resnet, param_dict) -input = np.random.uniform(0.0, 1.0, size=[32, 3, 224, 224]).astype(np.float32) -export(resnet, Tensor(input), file_name='resnet50-2_32', file_format='MINDIR') -``` - -> - `input` specifies the input shape and data type of the exported model. If the network has multiple inputs, you need to pass them to the `export` method. Example: `export(network, Tensor(input1), Tensor(input2), file_name='network', file_format='MINDIR')` -> - If `file_name` does not contain the ".mindir" suffix, the system will automatically add the ".mindir" suffix to it. - -### Exporting in Other Formats - -#### Exporting an AIR File - -If you want to perform inference on the Ascend AI Processor after obtaining a checkpoint file, use the network and checkpoint to generate a model file in AIR format. 
An example of the code for exporting the file is as follows: - -```python -export(resnet, Tensor(input), file_name='resnet50-2_32', file_format='AIR') -``` - -> - `input` specifies the input shape and data type of the exported model. If the network has multiple inputs, you need to pass them to the `export` method. Example: `export(network, Tensor(input1), Tensor(input2), file_name='network', file_format='AIR')` -> - If `file_name` does not contain the ".air" suffix, the system will automatically add the ".air" suffix to it. - -#### Exporting an ONNX File - -If you want to perform inference on other third-party hardware after obtaining a checkpoint file, use the network and checkpoint to generate a model file in ONNX format. An example of the code for exporting the file is as follows: - -```python -export(resnet, Tensor(input), file_name='resnet50-2_32', file_format='ONNX') -``` - -> - `input` specifies the input shape and data type of the exported model. If the network has multiple inputs, you need to pass them to the `export` method. Example: `export(network, Tensor(input1), Tensor(input2), file_name='network', file_format='ONNX')` -> - If `file_name` does not contain the ".onnx" suffix, the system will automatically add the ".onnx" suffix to it. -> - Currently, only the ONNX format export of ResNet series networks and BERT are supported. 
diff --git a/tutorials/source_en/beginner/tensor.ipynb b/tutorials/source_en/beginner/tensor.ipynb index 81ce058371652560a20679179833c839912a1712..4a869794849656861cf2062e3e54662a76209bf8 100644 --- a/tutorials/source_en/beginner/tensor.ipynb +++ b/tutorials/source_en/beginner/tensor.ipynb @@ -5,33 +5,22 @@ "source": [ "# Tensor\n", "\n", - "`Ascend` `GPU` `CPU` `Beginner`\n", + "[![View-Source](https://gitee.com/mindspore/docs/raw/tutorials-develop/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/tutorials-develop/tutorials/source_en/beginner/tensor.ipynb)\n", "\n", - "[Tensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.Tensor.html) is a basic data structure in the MindSpore network computing. This chapter mainly introduces the properties and usage of tensors and sparse tensors.\n", + "Tensor is a multilinear function that can be used to represent linear relationships between vectors , scalars , and other tensors. The basic examples of these linear relations are the inner product , the outer product , the linear map , and the Cartesian product. In the $n$ dimensional space, its coordinates have $n^{r}$ components, where each component is a function of coordinates, and these components are also linearly transformed according to certain rules when the coordinates are transformed. $r$ is called the rank or order of this tensor (neither has anything to do with the rank and order of the matrix).\n", "\n", - "First, prepare the environment required to run this document. In this document, the mode is set to Graph mode, and the hardware is set to CPU. Users can configure it according to the actual environment." + "Tensor is a special data structure that is very similar to arrays and matrices. 
[Tensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.Tensor.html) is the basic data structure in MindSpore network operations, and this chapter mainly introduces the attributes and usage of the tensor and the sparse tensor." ], "metadata": {} }, - { - "cell_type": "code", - "execution_count": 1, - "source": [ - "from mindspore import context\n", - "\n", - "context.set_context(mode=context.GRAPH_MODE, device_target=\"CPU\")" - ], - "outputs": [], - "metadata": {} - }, { "cell_type": "markdown", "source": [ - "## Initializing a Tensor\n", + "## Building the Tensor\n", "\n", - "There are multiple methods for initializing tensors. When building a tensor, you can pass the Tensor with `float`, `int`, `bool`, `tuple`, `list`, and `NumPy.array` types.\n", + "There are multiple methods for initializing the tensor. When building the tensor, you can pass the Tensor with `float`, `int`, `bool`, `tuple`, `list`, and `NumPy.array` types.\n", "\n", - "- **Generate a tensor based on data.**\n", + "- **Generating a tensor based on data.**\n", "\n", "You can create a tensor based on data. The data type can be set or automatically inferred." ], @@ -51,7 +40,7 @@ { "cell_type": "markdown", "source": [ - "- **Generate a tensor from the NumPy array.**\n", + "- **Generating a tensor from the NumPy array.**\n", "\n", "You can create a tensor from the NumPy array." 
], @@ -64,11 +53,23 @@ "import numpy as np\n", "\n", "arr = np.array([1, 0, 1, 0])\n", - "x_np = Tensor(arr)" + "tensor_arr = Tensor(arr)\n", + "\n", + "print(type(arr))\n", + "print(type(tensor_arr))" ], "outputs": [], "metadata": {} }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "" + ] + }, { "cell_type": "markdown", "source": [ @@ -79,13 +80,13 @@ { "cell_type": "markdown", "source": [ - "- **Generate a tensor from the init**\n", + "- **Generating a tensor by using the init**\n", "\n", - "You can create a tensor with the `init`, `shape` and `dtype`.\n", + "When using the `init` to initialize a tensor, the parameters that support passing are `init`, `shape` and `dtype`.\n", "\n", - "- `init`: Supported subclasses of incoming Subclass of [initializer](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.common.initializer.html).\n", - "- `shape`: Supported subclasses of incoming `list`, `tuple`, `int`.\n", - "- `dtype`: Supported subclasses of incoming [mindspore.dtype](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.html#mindspore.dtype)." + "- `init`: Support passing the subclass of [initializer](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.common.initializer.html).\n", + "- `shape`: Support passing `list`, `tuple`, `int`.\n", + "- `dtype`: Support passing [mindspore.dtype](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore.html#mindspore.dtype)." ], "metadata": {} }, @@ -102,17 +103,20 @@ "\n", "tensor1 = Tensor(shape=(2, 2), dtype=mstype.float32, init=One())\n", "tensor2 = Tensor(shape=(2, 2), dtype=mstype.float32, init=Normal())\n", - "print(tensor1)\n", - "print(tensor2)" + "\n", + "print(\"tensor1:\\n\", tensor1)\n", + "print(\"tensor2:\\n\", tensor2)" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "[[1. 1.]\n", + "tensor1:\n", + " [[1. 1.]\n", " [1. 
1.]]\n", - "[[-0.00128023 -0.01392901]\n", + "tensor2:\n", + " [[-0.00128023 -0.01392901]\n", " [ 0.0130886 -0.00107818]]\n" ] } @@ -122,9 +126,9 @@ { "cell_type": "markdown", "source": [ - "The `init` is used for delayed initialization in parallel mode. Usually, it is not recommended to use `init` interface to initialize parameters in other conditions.\n", + "The `init` is used for delayed initialization in parallel mode. Usually, it is not recommended to use `init` interface to initialize parameters.\n", "\n", - "- **Inherit attributes of another tensor to form a new tensor.**" + "- **Inheriting attributes of another tensor to form a new tensor.**" ], "metadata": {} }, @@ -137,7 +141,10 @@ "oneslike = ops.OnesLike()\n", "x = Tensor(np.array([[0, 1], [2, 1]]).astype(np.int32))\n", "output = oneslike(x)\n", - "print(output)" + "\n", + "print(output)\n", + "print(\"input shape:\", x.shape)\n", + "print(\"output shape:\", output.shape)" ], "outputs": [ { @@ -145,7 +152,9 @@ "name": "stdout", "text": [ "[[1 1]\n", - " [1 1]]\n" + " [1 1]]\n", + "input shape: (2, 2)\n", + "output shape: (2, 2)\n" ] } ], @@ -154,7 +163,7 @@ { "cell_type": "markdown", "source": [ - "- **Output a constant tensor of a specified size.**\n", + "- **Outputting a constant tensor of a specified size.**\n", "\n", "`shape` is the size tuple of a tensor, which determines the dimension of the output tensor." 
], @@ -167,7 +176,6 @@ "shape = (2, 2)\n", "ones = ops.Ones()\n", "output = ones(shape, mstype.float32)\n", - "print(output)\n", "\n", "zeros = ops.Zeros()\n", "output = zeros(shape, mstype.float32)\n", @@ -190,25 +198,25 @@ { "cell_type": "markdown", "source": [ - "During `Tensor` initialization, dtype can be specified to, for example, `mstype.int32`, `mstype.float32` or `mstype.bool_`.\n", + "During `Tensor` initialization, dtype can be specified, for example, `mstype.int32`, `mstype.float32` or `mstype.bool_`.\n", "\n", "## Tensor Attributes\n", "\n", "Tensor attributes include shape, data type, transposed tensor, item size, number of bytes occupied, dimension, size of elements, and stride per dimension.\n", "\n", - "- shape: the shape of `Tensor`, it is a tuple.\n", + "- shape: the shape of `Tensor`, a tuple.\n", "\n", - "- dtype: the dtype of `Tensor`, it is a data type of MindSpore.\n", + "- dtype: the dtype of `Tensor`, a data type of MindSpore.\n", "\n", - "- T: the Transpose of `Tensor`, it is also a `Tensor`.\n", + "- T: the Transpose of `Tensor`, also a `Tensor`.\n", "\n", "- itemsize: the number of bytes occupied by each element in `Tensor` is an integer.\n", "\n", - "- nbytes: the total number of bytes occupied by `Tensor`, as an integer.\n", + "- nbytes: the total number of bytes occupied by `Tensor`, an integer.\n", "\n", - "- ndim: the rank of a `Tensor`, which is len(tensor.shape), is an integer.\n", + "- ndim: the rank of a `Tensor`, which is len(tensor.shape), an integer.\n", "\n", - "- size: the number of all elements in `Tensor`, it is an integer.\n", + "- size: the number of all elements in `Tensor`, an integer.\n", "\n", "- strides: the number of bytes to traverse in each dimension of `Tensor`." 
], @@ -252,7 +260,7 @@ { "cell_type": "markdown", "source": [ - "## Index\n", + "## Tensor Indexing\n", "\n", "Tensor indexing is similar to Numpy indexing, indexing starts from 0, negative indexing means indexing in reverse order, and colons `:` and `...` are used for slicing." ], @@ -263,6 +271,7 @@ "execution_count": 8, "source": [ "tensor = Tensor(np.array([[0, 1], [2, 3]]).astype(np.float32))\n", + "\n", "print(\"First row: {}\".format(tensor[0]))\n", "print(\"value of top right corner: {}\".format(tensor[1, 1]))\n", "print(\"Last column: {}\".format(tensor[:, -1]))\n", @@ -289,7 +298,7 @@ "\n", "There are many operations between tensors, including arithmetic, linear algebra, matrix processing (transposing, indexing, and slicing), and sampling. The following describes several operations. The usage of tensor computation is similar to that of NumPy.\n", "\n", - "Common arithmetic operations include: addition (+), subtraction (-), multiplication (\\*), division (/), modulo (%), power (\\*\\*), and exact division (//)." + ">Common arithmetic operations include: addition (+), subtraction (-), multiplication (\\*), division (/), modulo (%), power (\\*\\*), and exact division (//)." 
], "metadata": {} }, @@ -297,15 +306,14 @@ "cell_type": "code", "execution_count": 9, "source": [ - "x = Tensor(np.array([1.0, 2.0, 3.0]), mstype.float32)\n", - "y = Tensor(np.array([4.0, 5.0, 6.0]), mstype.float32)\n", + "x = Tensor(np.array([1, 2, 3]), mstype.int32)\n", + "y = Tensor(np.array([4, 5, 6]), mstype.int32)\n", "\n", "output_add = x + y\n", "output_sub = x - y\n", "output_mul = x * y\n", "output_div = y / x\n", - "output_mod = x % y\n", - "output_pow = x ** 2\n", + "output_mod = y % x\n", "output_floordiv = y // x\n", "\n", "print(\"add:\", output_add)\n", @@ -313,7 +321,6 @@ "print(\"mul:\", output_mul)\n", "print(\"div:\", output_div)\n", "print(\"mod:\", output_mod)\n", - "print(\"pow:\", output_pow)\n", "print(\"floordiv:\", output_floordiv)" ], "outputs": [ @@ -321,13 +328,12 @@ "output_type": "stream", "name": "stdout", "text": [ - "add: [5. 7. 9.]\n", - "sub: [-3. -3. -3.]\n", - "mul: [ 4. 10. 18.]\n", - "div: [4. 2.5 2. ]\n", - "mod: [1. 2. 3.]\n", - "pow: [1. 4. 9.]\n", - "floordiv: [4. 2. 2.]\n" + "add: [5 7 9]\n", + "sub: [-3 -3 -3]\n", + "mul: [ 4 10 18]\n", + "div: [4 2 2]\n", + "mod: [0 1 0]\n", + "floordiv: [4 2 2]\n" ] } ], @@ -348,7 +354,9 @@ "data2 = Tensor(np.array([[4, 5], [6, 7]]).astype(np.float32))\n", "op = ops.Concat()\n", "output = op((data1, data2))\n", - "print(output)" + "\n", + "print(output)\n", + "print(\"shape:\\n\", output.shape)" ], "outputs": [ { @@ -358,7 +366,9 @@ "[[0. 1.]\n", " [2. 3.]\n", " [4. 5.]\n", - " [6. 7.]]\n" + " [6. 7.]]\n", + "shape:\n", + " (4, 2)\n" ] } ], @@ -379,7 +389,9 @@ "data2 = Tensor(np.array([[4, 5], [6, 7]]).astype(np.float32))\n", "op = ops.Stack()\n", "output = op([data1, data2])\n", - "print(output)" + "\n", + "print(output)\n", + "print(\"shape:\\n\", output.shape)" ], "outputs": [ { @@ -390,7 +402,9 @@ " [2. 3.]]\n", "\n", " [[4. 5.]\n", - " [6. 7.]]]\n" + " [6. 
7.]]]\n", + "shape:\n", + " (2, 2, 2)\n" ] } ], @@ -414,8 +428,10 @@ "execution_count": 12, "source": [ "zeros = ops.Zeros()\n", + "\n", "output = zeros((2, 2), mstype.float32)\n", "print(\"output: {}\".format(type(output)))\n", + "\n", "n_output = output.asnumpy()\n", "print(\"n_output: {}\".format(type(n_output)))" ], @@ -446,6 +462,7 @@ "source": [ "output = np.array([1, 0, 1, 0])\n", "print(\"output: {}\".format(type(output)))\n", + "\n", "t_output = Tensor(output)\n", "print(\"t_output: {}\".format(type(t_output)))" ], @@ -467,15 +484,17 @@ "source": [ "## Sparse Tensor\n", "\n", - "`SparseTensor` is a special kind of tensor which most of the elements are zero. In some scenario (e.g., Recommendation Systems, Molecular Dynamics, Graph Neural Networks), the data is sparse. If we use common dense tensors to represent the data, we may introduce many\n", + "The sparse tensor is a special kind of tensor which most of the elements are zero.\n", + "\n", + "In some scenario (e.g., Recommendation Systems, Molecular Dynamics, Graph Neural Networks), the data is sparse. If we use common dense tensors to represent the data, we may introduce many\n", "unnecessary calculations, storage and communication costs. In this situation, it is better to use sparse tensor to\n", - "represent the data. Common sparse data formats includes `COO`, `CSR`, `CSC`, `DIA`, etc. Different sparse formats are better fit for different scenario. For now, MindSpore supports both `CSR` and `COO`.\n", + "represent the data.\n", "\n", - "The common structure of sparse tensor is ``. `indices` means index of\n", - "non-zero elements, `values` means the values of these non-zero elements and `dense_shape` means the dense shape of\n", - "the sparse tensor. 
Using this structure, we define data structure `CSRTensor`, `COOTensor` and `RowTensor`.\n", + "MindSpore now supports the two most commonly used `CSR` and `COO` sparse data formats.\n", "\n", - "> Both `COOTensor` and `CSRTensor` support Graph Mode and PyNative Mode." + "The common structure of the sparse tensor is ``. `indices` means index of\n", + "non-zero elements, `values` means the values of these non-zero elements and shape means the dense shape of\n", + "the sparse tensor. Using this structure, we define data structure `CSRTensor`, `COOTensor` and `RowTensor`." ] }, { @@ -484,17 +503,19 @@ "source": [ "### CSRTensor\n", "\n", - "`CSR`(Compressed Sparse Row) is efficient in both storage and computation. All the non-zeros values are stored in `values`, and their positions are stored in `indptr`(row position) and `indices` (column position).\n", + "`CSR`(Compressed Sparse Row) is efficient in both storage and computation. All the non-zero values are stored in `values`, and their positions are stored in `indptr`(row position) and `indices` (column position).\n", + "\n", + "- `indptr`: 1-D integer tensor, indicating the start and end points of the non-zero element of each row of the sparse data in `values`. Index data type only supports int32 for now.\n", "\n", - "- `indptr`: 1-D Tensor of size shape[0] + 1, which indicates the start and end point for `values` in each row. Data type only supports `int32` for now.\n", + "- `indices`: 1-D integer tensor, indicating the position of the sparse tensor non-zero element in the column and has the same length as `values`. Index data type only supports int32 for now.\n", "\n", - "- `indices`: 1-D Tensor, which has the same length as `values`. `indices` indicates the which column values should be placed. 
Data type only supports `int32` for now.\n", + "- `values`: 1-D tensor, indicating that the value of the non-zero element corresponding to the `CSRTensor` and has the same length as `indices`.\n", "\n", - "- `values`: 1-D Tensor, which has the same length as `indices`. `values` stores the data for CSRTensor.\n", + "- `shape`: indicating that the shape of a compressed sparse tensor. The data type is `Tuple`, and currently only 2-D `CSRTensor` is supported.\n", "\n", - "- `shape`: A tuple indicates the shape of the CSRTensor, its length must be 2, as only 2-D CSRTensor is currently supported, and shape[0] must equal to indptr[0] - 1, which all equal to number of rows of the CSRTensor.\n", + ">For more details of the `CSRTensor`, please see [mindspore.CSRTensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.CSRTensor.html).\n", "\n", - "For more details, please see [mindspore.CSRTensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.CSRTensor.html)." + "Here are some examples of how CSRTensor can be used:\n" ] }, { @@ -511,14 +532,15 @@ } ], "source": [ - "# constructs CSRTensor\n", "import mindspore as ms\n", "from mindspore import Tensor, CSRTensor\n", "\n", - "indptr = Tensor([0, 1, 2], dtype=mstype.int32)\n", - "indices = Tensor([0, 1], dtype=mstype.int32)\n", + "indptr = Tensor([0, 1, 2])\n", + "indices = Tensor([0, 1])\n", "values = Tensor([1, 2], dtype=ms.float32)\n", "shape = (2, 4)\n", + "\n", + "# constructs CSRTensor\n", "csr_tensor = CSRTensor(indptr, indices, values, shape)\n", "\n", "print(csr_tensor.astype(ms.float64).dtype)" @@ -530,16 +552,16 @@ "source": [ "### COOTensor\n", "\n", - "`COO`(Coordinate Format) represents a set of nonzero elememts from a tensor at given indices. If the number of non-zero elements\n", + "`COOTensor` is used to compress Tensors with irregular distribution of non-zero elements. 
If the number of non-zero elements\n", "is `N` and the dense shape of the sparse tensor is `ndims`:\n", "\n", - "- `indices`: A 2-D integer Tensor of shape `[N, ndims]`. Each line represents the index of non-zero elements. Data type only supports `int32` for now.\n", - "- `values`: A 1-D tensor of any type and shape `[N]`. Represents the value of non-zero elements.\n", - "- `shape`: A integer tuple of size `ndims`, which specifies the dense shape of the sparse tensor, Currently only supports 2-D `COOTensor`.\n", + "- `indices`: 2-D integer Tensor and each row indicates a non-zero element subscript. Shape: `[N, ndims]`. Index data type only supports int32 for now.\n", + "- `values`: 1-D tensor of any type, indicating the value of non-zero elements. Shape: `[N]`.\n", + "- `shape`: indicating a dense shape of the sparse tensor, currently only 2-D `COOTensor` is supported.\n", "\n", - "For more details, please see [mindspore.COOTensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.COOTensor.html).\n", + ">For more details for `COOTensor`, please see [mindspore.COOTensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.COOTensor.html).\n", "\n", - "Some code examples are given below:" + "Here are some examples of how COOTensor can be used:" ] }, { @@ -560,7 +582,6 @@ } ], "source": [ - "# Constructs COOTensor\n", "import mindspore as ms\n", "import mindspore.nn as nn\n", "from mindspore import Tensor, COOTensor\n", @@ -568,22 +589,31 @@ "indices = Tensor([[0, 1], [1, 2]])\n", "values = Tensor([1, 2], dtype=ms.float32)\n", "shape = (3, 4)\n", + "\n", + "# constructs COOTensor\n", "coo_tensor = COOTensor(indices, values, shape)\n", "\n", "print(coo_tensor.values)\n", "print(coo_tensor.indices)\n", "print(coo_tensor.shape)\n", - "# COOTensor cast to another data type\n", - "print(coo_tensor.astype(ms.float64).dtype)" + "print(coo_tensor.astype(ms.float64).dtype) # COOTensor cast to another data type" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ - "The codes above produce a `COOTensor`:\n", - "\n", - "![COOTensor](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_en/beginner/images/tensor_1.PNG)" + "The codes above produce a `COOTensor` as following:\n", + "\n", + "$$\n", + " \\left[\n", + " \\begin{matrix}\n", + " 0 & 1 & 0 & 0 \\\\\n", + " 0 & 0 & 2 & 0 \\\\\n", + " 0 & 0 & 0 & 0\n", + " \\end{matrix}\n", + " \\right]\n", + "$$" ] }, { @@ -594,13 +624,13 @@ "\n", "`RowTensor` is used to compress tensors that are sparse in the zeroth dimension. If the dimension of `RowTensor` is `[L0, D1, D2, ..., DN ]`. The number of non-zero elements in the zeroth dimension is `D0`, then `L0 >> D0`.\n", "\n", - "- `indices`: one-dimensional integer tensor representing the position of non-zero elements in the zeroth dimension of the sparse tensor, shape: `[D0]`.\n", + "- `indices`: 1-D integer tensor, indicating the position of non-zero elements in the zeroth dimension of the sparse tensor, shape: `[D0]`.\n", "\n", - "- `values`: represents the value of the corresponding non-zero element, shape: `[D0, D1, D2, ..., DN]`.\n", + "- `values`: indicating the value of the corresponding non-zero element, shape: `[D0, D1, D2, ..., DN]`.\n", "\n", - "- `dense_shape`: represents the shape of the compressed sparse tensor.\n", + "- `dense_shape`: indicating the shape of the compressed sparse tensor.\n", "\n", - "> `RowTensor` can only be used in the constructor of `Cell`." + "> `RowTensor` can only be used in the constructor of `Cell`. 
For the detailed contents, refer to the code example in [mindspore.RowTensor](https://www.mindspore.cn/docs/api/en/master/api_python/mindspore/mindspore.RowTensor.html)" ] }, { @@ -619,14 +649,14 @@ } ], "source": [ - "from mindspore import RowTensor, Tensor\n", - "from mindspore.common import dtype as mstype\n", + "from mindspore import RowTensor\n", "import mindspore.nn as nn\n", "\n", "class Net(nn.Cell):\n", " def __init__(self, dense_shape):\n", " super(Net, self).__init__()\n", " self.dense_shape = dense_shape\n", + "\n", " def construct(self, indices, values):\n", " x = RowTensor(indices, values, self.dense_shape)\n", " return x.values, x.indices, x.dense_shape\n", @@ -634,6 +664,7 @@ "indices = Tensor([0])\n", "values = Tensor([[1, 2]], dtype=mstype.float32)\n", "out = Net((3, 2))(indices, values)\n", + "\n", "print(\"non-zero values:\", out[0])\n", "print(\"non-zero indices:\", out[1])\n", "print(\"shape:\", out[2])" diff --git a/tutorials/source_en/beginner/train.md b/tutorials/source_en/beginner/train.md new file mode 100644 index 0000000000000000000000000000000000000000..092a56c00012bfd4283d81889781375919d3edc6 --- /dev/null +++ b/tutorials/source_en/beginner/train.md @@ -0,0 +1,132 @@ +# Training the Model + + + +After learning how to create a model and build a dataset in the preceding tutorials, you can start to learn how to set hyperparameters and optimize model parameters. + +## Hyperparameters + +Hyperparameters can be adjusted to control the model training and optimization process. Different hyperparameter values may affect the model training and convergence speed. At present, deep learning models are mostly optimized by a batch stochastic gradient descent algorithm, and the principle of the stochastic gradient descent algorithm is as follows: +$$ +w_{t+1}=w_{t}-\eta \frac{1}{n} \sum_{x \in \mathcal{B}} \nabla l\left(x, w_{t}\right) +$$ +where $n$ is the batch size, and $η$ is a learning rate. 
In addition, $w_{t}$ is the weight parameter at training step $t$, and $\nabla l$ is the gradient of the loss function. Besides the gradient itself, the batch size and the learning rate directly determine the weight update, so from the optimization point of view they are the hyperparameters that most affect how the model converges. Generally, the following hyperparameters are defined for training: + +- Epoch: the number of times that the dataset is traversed during training. +- Batch size: the number of samples read in each training batch. If the batch size is too small, training takes a long time and the gradient oscillates strongly, which hinders convergence. If the batch size is too large, the gradient direction barely changes between batches, and training can easily get stuck in a local minimum. An appropriate batch size therefore needs to be chosen to improve model accuracy and convergence. +- Learning rate: controls the step size of each parameter update during gradient descent, a widely used optimization algorithm that estimates the model parameters by iteratively reducing the loss function. If the learning rate is too low, convergence slows down. If the learning rate is too high, unpredictable results may occur, such as training that fails to converge. + +![learning-rate](https://gitee.com/mindspore/docs/raw/tutorials-develop/tutorials/source_zh_cn/beginner/images/learning_rate.png) + +```python +epochs = 10 +batch_size = 32 +momentum = 0.9 +learning_rate = 1e-2 +``` + +## Loss Functions + +The **loss function** is used to evaluate the difference between the **predicted value** and the **target value** of a model. 
Here, the absolute error loss function `L1Loss` is used. With its default `mean` reduction, it averages the absolute errors over all $n$ elements: +$$ +\text { L1 Loss Function }=\frac{1}{n}\sum_{i=1}^{n}\left|y_{true, i}-y_{predicted, i}\right| +$$ + `mindspore.nn.loss` provides many common loss functions, such as `SoftmaxCrossEntropyWithLogits`, `MSELoss`, and `SmoothL1Loss`. + +Given the predicted value and the target value, the loss function calculates the error (loss value) between them. It is used as follows: + +```python +import numpy as np +import mindspore.nn as nn +from mindspore import Tensor + +loss = nn.L1Loss() +output_data = Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32)) +target_data = Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32)) +print(loss(output_data, target_data)) +``` + +```text + 1.5 +``` + +## Optimizer Functions + +An optimizer uses the computed gradients to update the model parameters. The choice of optimization algorithm directly affects the performance of the final model. A poor result may be caused by the optimization algorithm rather than by the features or the model design. + +All optimization logic of MindSpore is encapsulated in the `Optimizer` object. Here, the Momentum optimizer is used. `mindspore.nn` provides many common optimizers, such as `Adam`, `SGD` and `RMSProp`. + +You need to build an `Optimizer` object. This object retains the current parameter state and updates the parameters based on the computed gradients. To build an `Optimizer`, we need to provide an iterable that contains the parameters to be optimized (they must be `Parameter` objects). For example, pass `net.trainable_params()` to optimize all trainable parameters of the network. + +Then, you can set the `Optimizer` options, such as the learning rate and weight decay. 
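The update that a momentum optimizer applies can be sketched in plain NumPy. This is a minimal illustration of one common formulation (accumulate a velocity, then step against it), not the MindSpore implementation. The helper `momentum_step` and the toy objective $f(w)=\frac{1}{2}w^2$ are assumptions made for this example.

```python
import numpy as np

def momentum_step(w, v, grad, learning_rate=1e-2, momentum=0.9):
    """Hypothetical helper: one momentum update on weights w with velocity v."""
    v_new = momentum * v + grad          # accumulate past gradients
    w_new = w - learning_rate * v_new    # step against the accumulated direction
    return w_new, v_new

# Toy objective f(w) = 0.5 * w**2, whose gradient is simply w.
w = np.array([1.0])
v = np.zeros_like(w)
for _ in range(300):
    w, v = momentum_step(w, v, grad=w)

print(w)  # close to the minimum at 0
```

The velocity `v` is the extra state that the optimizer keeps between steps, which is why an optimizer object has to retain information beyond the parameters themselves.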
+ +A code example is as follows: + +```python +from mindspore import nn +from mindvision.classification.models import lenet + +net = lenet(num_classes=10, pretrained=False) +optim = nn.Momentum(net.trainable_params(), learning_rate, momentum) +``` + +## Model Training + +A model training process is generally divided into four steps. + +1. Build a dataset. +2. Define a neural network. +3. Define hyperparameters, a loss function, and an optimizer. +4. Pass the number of epochs and the dataset to the model for training. + +The model training sample code is as follows: + +```python +import mindspore.nn as nn +from mindspore.train import Model + +from mindvision.classification.dataset import Mnist +from mindvision.classification.models import lenet +from mindvision.engine.callback import LossMonitor + +# 1. Build a dataset +download_train = Mnist(path="./mnist", split="train", batch_size=batch_size, repeat_num=1, shuffle=True, resize=32, download=True) +dataset_train = download_train.run() + +# 2. Define a neural network +network = lenet(num_classes=10, pretrained=False) +# 3.1 Define a loss function +net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') +# 3.2 Define an optimizer +net_opt = nn.Momentum(network.trainable_params(), learning_rate=learning_rate, momentum=momentum) +# 3.3 Wrap the network, loss function, and optimizer in a Model +model = Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={'acc'}) + +# 4. 
Perform training on the neural network +model.train(epochs, dataset_train, callbacks=[LossMonitor(learning_rate, 1875)]) +``` + +```text +Epoch:[ 0/ 10], step:[ 1875/ 1875], loss:[0.189/1.176], time:2.254 ms, lr:0.01000 +Epoch time: 4286.163 ms, per step time: 2.286 ms, avg loss: 1.176 +Epoch:[ 1/ 10], step:[ 1875/ 1875], loss:[0.085/0.080], time:1.895 ms, lr:0.01000 +Epoch time: 4064.532 ms, per step time: 2.168 ms, avg loss: 0.080 +Epoch:[ 2/ 10], step:[ 1875/ 1875], loss:[0.021/0.054], time:1.901 ms, lr:0.01000 +Epoch time: 4194.333 ms, per step time: 2.237 ms, avg loss: 0.054 +Epoch:[ 3/ 10], step:[ 1875/ 1875], loss:[0.284/0.041], time:2.130 ms, lr:0.01000 +Epoch time: 4252.222 ms, per step time: 2.268 ms, avg loss: 0.041 +Epoch:[ 4/ 10], step:[ 1875/ 1875], loss:[0.003/0.032], time:2.176 ms, lr:0.01000 +Epoch time: 4216.039 ms, per step time: 2.249 ms, avg loss: 0.032 +Epoch:[ 5/ 10], step:[ 1875/ 1875], loss:[0.003/0.027], time:2.205 ms, lr:0.01000 +Epoch time: 4400.771 ms, per step time: 2.347 ms, avg loss: 0.027 +Epoch:[ 6/ 10], step:[ 1875/ 1875], loss:[0.000/0.024], time:1.973 ms, lr:0.01000 +Epoch time: 4554.252 ms, per step time: 2.429 ms, avg loss: 0.024 +Epoch:[ 7/ 10], step:[ 1875/ 1875], loss:[0.008/0.022], time:2.048 ms, lr:0.01000 +Epoch time: 4361.135 ms, per step time: 2.326 ms, avg loss: 0.022 +Epoch:[ 8/ 10], step:[ 1875/ 1875], loss:[0.000/0.018], time:2.130 ms, lr:0.01000 +Epoch time: 4547.597 ms, per step time: 2.425 ms, avg loss: 0.018 +Epoch:[ 9/ 10], step:[ 1875/ 1875], loss:[0.008/0.017], time:2.135 ms, lr:0.01000 +Epoch time: 4601.861 ms, per step time: 2.454 ms, avg loss: 0.017 +``` + +The loss value is printed during training. It fluctuates from step to step, but the overall trend is that the loss decreases and the accuracy improves. Because training involves randomness, the loss values differ from run to run and will not necessarily match the output above exactly. \ No newline at end of file
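The mini-batch update rule from the Hyperparameters section, $w_{t+1}=w_{t}-\eta \frac{1}{n} \sum_{x \in \mathcal{B}} \nabla l\left(x, w_{t}\right)$, can be sketched in plain NumPy. This is an illustrative example only: the helper `sgd_step`, the toy data, the squared-error loss, and the hyperparameter values are assumptions chosen for the sketch, not the settings used in the tutorial.

```python
import numpy as np

def sgd_step(w, batch_x, batch_y, learning_rate):
    """Hypothetical helper: one mini-batch gradient descent step for y ~ w * x."""
    # Per-sample gradient of l(x, w) = 0.5 * (w * x - y)**2 with respect to w.
    grads = (w * batch_x - batch_y) * batch_x
    # Average over the n samples in the batch, then step against the gradient.
    return w - learning_rate * grads.mean()

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=256)
y = 3.0 * x  # toy target: the true weight is 3.0

w = 0.0
batch_size, learning_rate, epochs = 32, 0.5, 20  # illustrative values
for _ in range(epochs):
    order = rng.permutation(len(x))              # reshuffle every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        w = sgd_step(w, x[idx], y[idx], learning_rate)

print(w)  # approaches the true weight 3.0
```

Changing `batch_size` or `learning_rate` in this sketch is a quick way to observe how the two hyperparameters affect how fast `w` approaches the target.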