From b07efd216b2ee89674bdfe4a7509b3c21c5def26 Mon Sep 17 00:00:00 2001 From: ZhangHui Date: Thu, 22 Apr 2021 17:39:38 +0800 Subject: [PATCH 1/3] Fix Issue #I3NOR1 --- .../quick_start-checkpoint.ipynb | 488 ++++++++++++++++++ tutorials/source_zh_cn/quick_start.ipynb | 17 +- 2 files changed, 495 insertions(+), 10 deletions(-) create mode 100644 tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb diff --git a/tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb b/tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb new file mode 100644 index 0000000000..97350b5acf --- /dev/null +++ b/tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb @@ -0,0 +1,488 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 初学入门\n", + "\n", + "[![](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/quick_start.ipynb) [![](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/master/quick_start/mindspore_quick_start.ipynb) [![](https://gitee.com/mindspore/docs/raw/master/tutorials/training/source_zh_cn/_static/logo_modelarts.png)](https://console.huaweicloud.com/modelarts/?region=cn-north-4#/notebook/loading?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svbW9kZWxhcnRzL3F1aWNrX3N0YXJ0L21pbmRzcG9yZV9xdWlja19zdGFydC5pcHluYg==&image_id=65f636a0-56cf-49df-b941-7d2a07ba8c8c)\n", + "\n", + "本节贯穿MindSpore的基础功能,实现深度学习中的常见任务,请参考各节链接进行更加深入的学习。\n", + "\n", + "## 配置运行信息\n", + "\n", + "MindSpore通过`context.set_context`来配置运行需要的信息,如运行模式、后端信息、硬件等信息。\n", + "\n", + "导入`context`模块,配置运行需要的信息。" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import argparse\n", + 
"from mindspore import context\n", + "\n", + "parser = argparse.ArgumentParser(description='MindSpore LeNet Example')\n", + "parser.add_argument('--device_target', type=str, default=\"CPU\", choices=['Ascend', 'GPU', 'CPU'])\n", + "\n", + "args = parser.parse_known_args()[0]\n", + "context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "在样例中,我们配置样例运行使用图模式。根据实际情况配置硬件信息,譬如代码运行在Ascend AI处理器上,则`--device_target`选择`Ascend`,代码运行在CPU、GPU同理。详细参数说明,请参见[context.set_context](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/mindspore.context.html)接口说明。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 下载数据集\n", + "\n", + "我们示例中用到的MNIST数据集是由10类28∗28的灰度图片组成,训练数据集包含60000张图片,测试数据集包含10000张图片。\n", + "\n", + "你可以从[MNIST数据集下载页面](http://yann.lecun.com/exdb/mnist/)下载,并按下方目录结构放置,如运行环境为Linux,还可以直接运行如下命令完成下载和放置:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./datasets/MNIST_Data\n", + "├── test\n", + "│   ├── t10k-images-idx3-ubyte\n", + "│   └── t10k-labels-idx1-ubyte\n", + "└── train\n", + " ├── train-images-idx3-ubyte\n", + " └── train-labels-idx1-ubyte\n", + "\n", + "2 directories, 4 files\n" + ] + } + ], + "source": [ + "!mkdir -p ./datasets/MNIST_Data/train ./datasets/MNIST_Data/test\n", + "!wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-labels-idx1-ubyte\n", + "!wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-images-idx3-ubyte\n", + "!wget -NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-labels-idx1-ubyte\n", + "!wget -NP ./datasets/MNIST_Data/test 
https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-images-idx3-ubyte\n", + "!tree ./datasets/MNIST_Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 数据处理\n", + "\n", + "数据集对于模型训练非常重要,好的数据集可以有效提高训练精度和效率。\n", + "MindSpore提供了用于数据处理的API模块 `mindspore.dataset` ,用于存储样本和标签。在加载数据集前,我们通常会对数据集进行一些处理,`mindspore.dataset`也集成了常见的数据处理方法。\n", + "\n", + "首先导入MindSpore中`mindspore.dataset`和其他相应的模块。" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import mindspore.dataset as ds\n", + "import mindspore.dataset.transforms.c_transforms as C\n", + "import mindspore.dataset.vision.c_transforms as CV\n", + "from mindspore.dataset.vision import Inter\n", + "from mindspore import dtype as mstype" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "数据集处理主要分为四个步骤:\n", + "\n", + "1. 定义函数`create_dataset`来创建数据集。\n", + "2. 定义需要进行的数据增强和处理操作,为之后进行map映射做准备。\n", + "3. 使用map映射函数,将数据操作应用到数据集。\n", + "4. 
进行数据shuffle、batch操作。" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_dataset(data_path, batch_size=32, repeat_size=1,\n", + " num_parallel_workers=1):\n", + " # 定义数据集\n", + " mnist_ds = ds.MnistDataset(data_path)\n", + " resize_height, resize_width = 32, 32\n", + " rescale = 1.0 / 255.0\n", + " shift = 0.0\n", + " rescale_nml = 1 / 0.3081\n", + " shift_nml = -1 * 0.1307 / 0.3081\n", + "\n", + " # 定义所需要操作的map映射\n", + " resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)\n", + " rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)\n", + " rescale_op = CV.Rescale(rescale, shift)\n", + " hwc2chw_op = CV.HWC2CHW()\n", + " type_cast_op = C.TypeCast(mstype.int32)\n", + " \n", + " # 使用map映射函数,将数据操作应用到数据集\n", + " mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns=\"label\", num_parallel_workers=num_parallel_workers)\n", + " mnist_ds = mnist_ds.map(operations=resize_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", + " mnist_ds = mnist_ds.map(operations=rescale_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", + " mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", + " mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", + " \n", + " # 进行shuffle、batch操作\n", + " buffer_size = 10000\n", + " mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)\n", + " mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)\n", + "\n", + " return mnist_ds" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "其中,`batch_size`为每组包含的数据个数,现设置每组包含32个数据。\n", + "\n", + "> 
MindSpore支持进行多种数据处理和增强的操作,具体可以参考[数据处理](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/pipeline.html)和[数据增强](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/augmentation.html)章节。\n", + "\n", + "## 创建模型\n", + "\n", + "使用MindSpore定义神经网络需要继承`mindspore.nn.Cell`。`Cell`是所有神经网络(如`Conv2d-relu-softmax`等)的基类。\n", + "\n", + "神经网络的各层需要预先在`__init__`方法中定义,然后通过定义`construct`方法来完成神经网络的前向构造。按照LeNet的网络结构,定义网络各层如下:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import mindspore.nn as nn\n", + "from mindspore.common.initializer import Normal\n", + "\n", + "class LeNet5(nn.Cell):\n", + " \"\"\"\n", + " Lenet网络结构\n", + " \"\"\"\n", + " def __init__(self, num_class=10, num_channel=1):\n", + " super(LeNet5, self).__init__()\n", + " # 定义所需要的运算\n", + " self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')\n", + " self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')\n", + " self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))\n", + " self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))\n", + " self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))\n", + " self.relu = nn.ReLU()\n", + " self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n", + " self.flatten = nn.Flatten()\n", + "\n", + " def construct(self, x):\n", + " # 使用定义好的运算构建前向网络\n", + " x = self.conv1(x)\n", + " x = self.relu(x)\n", + " x = self.max_pool2d(x)\n", + " x = self.conv2(x)\n", + " x = self.relu(x)\n", + " x = self.max_pool2d(x)\n", + " x = self.flatten(x)\n", + " x = self.fc1(x)\n", + " x = self.relu(x)\n", + " x = self.fc2(x)\n", + " x = self.relu(x)\n", + " x = self.fc3(x)\n", + " return x\n", + "\n", + "# 实例化网络\n", + "net = LeNet5()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + ">阅读更多有关[在MindSpore中构建神经网络](https://www.mindspore.cn/tutorial/training/zh-CN/master/use/defining_the_network.html)的信息。\n", + "\n", + "## 优化模型参数\n", + "\n", + "要训练神经网络模型,需要定义损失函数和优化器。\n", + "\n", 
+ "MindSpore支持的损失函数有`SoftmaxCrossEntropyWithLogits`、`L1Loss`、`MSELoss`等。这里使用交叉熵损失函数`SoftmaxCrossEntropyWithLogits`。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# 定义损失函数\n", + "net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + ">阅读更多有关[在MindSpore中使用损失函数](https://www.mindspore.cn/tutorial/zh-CN/master/optimization.html#损失函数)的信息。\n", + "\n", + "MindSpore支持的优化器有`Adam`、`AdamWeightDecay`、`Momentum`等。这里使用`Momentum`优化器为例。" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# 定义优化器\n", + "net_opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + ">阅读更多有关[在MindSpore中使用优化器](https://www.mindspore.cn/tutorial/zh-CN/master/optimization.html#优化器)的信息。\n", + "\n", + "## 训练及保存模型\n", + "\n", + "MindSpore提供了回调Callback机制,可以在训练过程中执行自定义逻辑,这里以使用框架提供的`ModelCheckpoint`为例。\n", + "`ModelCheckpoint`可以保存网络模型和参数,以便进行后续的Fine-tuning(微调)操作。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from mindspore.train.callback import ModelCheckpoint, CheckpointConfig\n", + "# 设置模型保存参数\n", + "config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)\n", + "# 应用模型保存参数\n", + "ckpoint = ModelCheckpoint(prefix=\"checkpoint_lenet\", config=config_ck)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "通过MindSpore提供的`model.train`接口可以方便地进行网络的训练,`LossMonitor`可以监控训练过程中`loss`值的变化。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# 导入模型训练需要的库\n", + "from mindspore.nn import Accuracy\n", + "from mindspore.train.callback import LossMonitor\n", + "from mindspore import Model" + ] + }, + { + 
"cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def train_net(args, model, epoch_size, data_path, repeat_size, ckpoint_cb, sink_mode):\n", + " \"\"\"定义训练的方法\"\"\"\n", + " # 加载训练数据集\n", + " ds_train = create_dataset(os.path.join(data_path, \"train\"), 32, repeat_size)\n", + " model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor(125)], dataset_sink_mode=sink_mode)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "其中,`dataset_sink_mode`用于控制数据是否下沉,数据下沉是指数据通过通道直接传送到Device上,可以加快训练速度,`dataset_sink_mode`为True表示数据下沉,否则为非下沉。\n", + "\n", + "通过模型运行测试数据集得到的结果,验证模型的泛化能力。\n", + "\n", + "1. 使用`model.eval`接口读入测试数据集。\n", + "2. 使用保存后的模型参数进行推理。" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "def test_net(network, model, data_path):\n", + " \"\"\"定义验证的方法\"\"\"\n", + " ds_eval = create_dataset(os.path.join(data_path, \"test\"))\n", + " acc = model.eval(ds_eval, dataset_sink_mode=False)\n", + " print(\"{}\".format(acc))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "这里把`train_epoch`设置为1,对数据集进行1个迭代的训练。在`train_net`和 `test_net`方法中,我们加载了之前下载的训练数据集,`mnist_path`是MNIST数据集路径。" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "train_epoch = 1\n", + "mnist_path = \"./datasets/MNIST_Data\"\n", + "dataset_size = 1\n", + "model = Model(net, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n", + "train_net(args, model, train_epoch, mnist_path, dataset_size, ckpoint, False)\n", + "test_net(net, model, mnist_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "使用以下命令运行脚本:\n", + "\n", + "```bash\n", + "python lenet.py --device_target=CPU\n", + "```\n", + "\n", + "其中,\n", + "\n", + "`lenet.py`:可以把前面的代码都粘贴到lenet.py中(不包含“下载数据集”的代码)。一般情况下,可将import部分移到代码头部,类、函数和方法的定义放在之后,最后在main方法中将前面的操作串起来即可。\n", + "\n", 
+ "`--device_target=CPU`:指定运行硬件平台,参数为`CPU`、`GPU`或者`Ascend`,根据你的实际运行硬件平台来指定。\n", + "\n", + "训练过程中会打印loss值,类似下图。loss值会波动,但总体来说loss值会逐步减小,精度逐步提高。每个人运行的loss值有一定随机性,不一定完全相同。\n", + "训练过程中loss打印示例如下:\n", + "\n", + "```bash\n", + "epoch: 1 step: 125, loss is 2.3083377\n", + "epoch: 1 step: 250, loss is 2.3019726\n", + "...\n", + "epoch: 1 step: 1500, loss is 0.028385757\n", + "epoch: 1 step: 1625, loss is 0.0857362\n", + "epoch: 1 step: 1750, loss is 0.05639569\n", + "epoch: 1 step: 1875, loss is 0.12366105\n", + "{'Accuracy': 0.9663477564102564}\n", + "```\n", + "\n", + "可以在打印信息中看出模型精度数据,示例中精度数据达到96.6%,模型质量良好。随着网络迭代次数`train_epoch`增加,模型精度会进一步提高。\n", + "\n", + "## 加载模型\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "from mindspore.train.serialization import load_checkpoint, load_param_into_net\n", + "# 加载已经保存的用于测试的模型\n", + "param_dict = load_checkpoint(\"checkpoint_lenet-1_1875.ckpt\")\n", + "# 加载参数到网络中\n", + "load_param_into_net(net, param_dict)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + ">阅读更多有关[MindSpore加载模型](https://www.mindspore.cn/tutorial/zh-CN/master/save_load_model.html#id3)的信息。\n", + "\n", + "## 验证模型\n", + "\n", + "我们使用生成的模型进行单个图片数据的分类预测,具体步骤如下:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Predicted: \"6\", Actual: \"6\"\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "from mindspore import Tensor\n", + "\n", + "# 定义测试数据集,batch_size设置为1,则取出一张图片\n", + "ds_test = create_dataset(os.path.join(mnist_path, \"test\"), batch_size=1).create_dict_iterator()\n", + "data = next(ds_test)\n", + "\n", + "# images为测试图片,labels为测试图片的实际分类\n", + "images = data[\"image\"].asnumpy()\n", + "labels = data[\"label\"].asnumpy()\n", + "\n", + "# 使用函数model.predict预测image对应分类\n", + "output = model.predict(Tensor(data['image']))\n", + "predicted = 
np.argmax(output.asnumpy(), axis=1)\n", + "\n", + "# 输出预测分类与实际分类\n", + "print(f'Predicted: \"{predicted[0]}\", Actual: \"{labels[0]}\"')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "MindSpore-1.1.1", + "language": "python", + "name": "mindspore-1.1.1" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/tutorials/source_zh_cn/quick_start.ipynb b/tutorials/source_zh_cn/quick_start.ipynb index 49b072e3be..97350b5acf 100644 --- a/tutorials/source_zh_cn/quick_start.ipynb +++ b/tutorials/source_zh_cn/quick_start.ipynb @@ -12,7 +12,7 @@ "\n", "## 配置运行信息\n", "\n", - "MindSpore通过`context.set_context`来配置运行需要的信息,譬如运行模式、后端信息、硬件等信息。\n", + "MindSpore通过`context.set_context`来配置运行需要的信息,如运行模式、后端信息、硬件等信息。\n", "\n", "导入`context`模块,配置运行需要的信息。" ] @@ -21,8 +21,7 @@ "cell_type": "code", "execution_count": 1, "metadata": {}, - "outputs": [ - ], + "outputs": [], "source": [ "import os\n", "import argparse\n", @@ -50,7 +49,7 @@ "\n", "我们示例中用到的MNIST数据集是由10类28∗28的灰度图片组成,训练数据集包含60000张图片,测试数据集包含10000张图片。\n", "\n", - "你可以从[MNIST数据集下载页面](http://yann.lecun.com/exdb/mnist/)下载,并按下方目录结构放置,或直接运行如下命令完成下载和放置:" + "你可以从[MNIST数据集下载页面](http://yann.lecun.com/exdb/mnist/)下载,并按下方目录结构放置,如运行环境为Linux,还可以直接运行如下命令完成下载和放置:" ] }, { @@ -360,8 +359,7 @@ "cell_type": "code", "execution_count": 12, "metadata": {}, - "outputs": [ - ], + "outputs": [], "source": [ "train_epoch = 1\n", "mnist_path = \"./datasets/MNIST_Data\"\n", @@ -384,7 +382,7 @@ "\n", "其中,\n", "\n", - "`lenet.py`:为你根据教程编写的脚本文件。\n", + "`lenet.py`:可以把前面的代码都粘贴到lenet.py中(不包含“下载数据集”的代码)。一般情况下,可将import部分移到代码头部,类、函数和方法的定义放在之后,最后在main方法中将前面的操作串起来即可。\n", "\n", "`--device_target=CPU`:指定运行硬件平台,参数为`CPU`、`GPU`或者`Ascend`,根据你的实际运行硬件平台来指定。\n", "\n", @@ -411,8 
+409,7 @@ "cell_type": "code", "execution_count": 13, "metadata": {}, - "outputs": [ - ], + "outputs": [], "source": [ "from mindspore.train.serialization import load_checkpoint, load_param_into_net\n", "# 加载已经保存的用于测试的模型\n", @@ -488,4 +485,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file -- Gitee From c2b09095092e6130194812e11bf79949b5fb3cd9 Mon Sep 17 00:00:00 2001 From: ZhangHui Date: Thu, 22 Apr 2021 17:45:54 +0800 Subject: [PATCH 2/3] Fix Issue #I3NOR1 2 --- .../quick_start-checkpoint.ipynb | 488 ------------------ 1 file changed, 488 deletions(-) delete mode 100644 tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb diff --git a/tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb b/tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb deleted file mode 100644 index 97350b5acf..0000000000 --- a/tutorials/source_zh_cn/.ipynb_checkpoints/quick_start-checkpoint.ipynb +++ /dev/null @@ -1,488 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 初学入门\n", - "\n", - "[![](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/quick_start.ipynb) [![](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/master/quick_start/mindspore_quick_start.ipynb) [![](https://gitee.com/mindspore/docs/raw/master/tutorials/training/source_zh_cn/_static/logo_modelarts.png)](https://console.huaweicloud.com/modelarts/?region=cn-north-4#/notebook/loading?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svbW9kZWxhcnRzL3F1aWNrX3N0YXJ0L21pbmRzcG9yZV9xdWlja19zdGFydC5pcHluYg==&image_id=65f636a0-56cf-49df-b941-7d2a07ba8c8c)\n", - "\n", - "本节贯穿MindSpore的基础功能,实现深度学习中的常见任务,请参考各节链接进行更加深入的学习。\n", - "\n", - "## 配置运行信息\n", 
- "\n", - "MindSpore通过`context.set_context`来配置运行需要的信息,如运行模式、后端信息、硬件等信息。\n", - "\n", - "导入`context`模块,配置运行需要的信息。" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import argparse\n", - "from mindspore import context\n", - "\n", - "parser = argparse.ArgumentParser(description='MindSpore LeNet Example')\n", - "parser.add_argument('--device_target', type=str, default=\"CPU\", choices=['Ascend', 'GPU', 'CPU'])\n", - "\n", - "args = parser.parse_known_args()[0]\n", - "context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "在样例中,我们配置样例运行使用图模式。根据实际情况配置硬件信息,譬如代码运行在Ascend AI处理器上,则`--device_target`选择`Ascend`,代码运行在CPU、GPU同理。详细参数说明,请参见[context.set_context](https://www.mindspore.cn/doc/api_python/zh-CN/master/mindspore/mindspore.context.html)接口说明。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 下载数据集\n", - "\n", - "我们示例中用到的MNIST数据集是由10类28∗28的灰度图片组成,训练数据集包含60000张图片,测试数据集包含10000张图片。\n", - "\n", - "你可以从[MNIST数据集下载页面](http://yann.lecun.com/exdb/mnist/)下载,并按下方目录结构放置,如运行环境为Linux,还可以直接运行如下命令完成下载和放置:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "./datasets/MNIST_Data\n", - "├── test\n", - "│   ├── t10k-images-idx3-ubyte\n", - "│   └── t10k-labels-idx1-ubyte\n", - "└── train\n", - " ├── train-images-idx3-ubyte\n", - " └── train-labels-idx1-ubyte\n", - "\n", - "2 directories, 4 files\n" - ] - } - ], - "source": [ - "!mkdir -p ./datasets/MNIST_Data/train ./datasets/MNIST_Data/test\n", - "!wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-labels-idx1-ubyte\n", - "!wget -NP ./datasets/MNIST_Data/train https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/train-images-idx3-ubyte\n", - "!wget 
-NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-labels-idx1-ubyte\n", - "!wget -NP ./datasets/MNIST_Data/test https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/mnist/t10k-images-idx3-ubyte\n", - "!tree ./datasets/MNIST_Data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 数据处理\n", - "\n", - "数据集对于模型训练非常重要,好的数据集可以有效提高训练精度和效率。\n", - "MindSpore提供了用于数据处理的API模块 `mindspore.dataset` ,用于存储样本和标签。在加载数据集前,我们通常会对数据集进行一些处理,`mindspore.dataset`也集成了常见的数据处理方法。\n", - "\n", - "首先导入MindSpore中`mindspore.dataset`和其他相应的模块。" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import mindspore.dataset as ds\n", - "import mindspore.dataset.transforms.c_transforms as C\n", - "import mindspore.dataset.vision.c_transforms as CV\n", - "from mindspore.dataset.vision import Inter\n", - "from mindspore import dtype as mstype" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "数据集处理主要分为四个步骤:\n", - "\n", - "1. 定义函数`create_dataset`来创建数据集。\n", - "2. 定义需要进行的数据增强和处理操作,为之后进行map映射做准备。\n", - "3. 使用map映射函数,将数据操作应用到数据集。\n", - "4. 
进行数据shuffle、batch操作。" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "def create_dataset(data_path, batch_size=32, repeat_size=1,\n", - " num_parallel_workers=1):\n", - " # 定义数据集\n", - " mnist_ds = ds.MnistDataset(data_path)\n", - " resize_height, resize_width = 32, 32\n", - " rescale = 1.0 / 255.0\n", - " shift = 0.0\n", - " rescale_nml = 1 / 0.3081\n", - " shift_nml = -1 * 0.1307 / 0.3081\n", - "\n", - " # 定义所需要操作的map映射\n", - " resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)\n", - " rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)\n", - " rescale_op = CV.Rescale(rescale, shift)\n", - " hwc2chw_op = CV.HWC2CHW()\n", - " type_cast_op = C.TypeCast(mstype.int32)\n", - " \n", - " # 使用map映射函数,将数据操作应用到数据集\n", - " mnist_ds = mnist_ds.map(operations=type_cast_op, input_columns=\"label\", num_parallel_workers=num_parallel_workers)\n", - " mnist_ds = mnist_ds.map(operations=resize_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", - " mnist_ds = mnist_ds.map(operations=rescale_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", - " mnist_ds = mnist_ds.map(operations=rescale_nml_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", - " mnist_ds = mnist_ds.map(operations=hwc2chw_op, input_columns=\"image\", num_parallel_workers=num_parallel_workers)\n", - " \n", - " # 进行shuffle、batch操作\n", - " buffer_size = 10000\n", - " mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)\n", - " mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)\n", - "\n", - " return mnist_ds" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "其中,`batch_size`为每组包含的数据个数,现设置每组包含32个数据。\n", - "\n", - "> 
MindSpore支持进行多种数据处理和增强的操作,具体可以参考[数据处理](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/pipeline.html)和[数据增强](https://www.mindspore.cn/doc/programming_guide/zh-CN/master/augmentation.html)章节。\n", - "\n", - "## 创建模型\n", - "\n", - "使用MindSpore定义神经网络需要继承`mindspore.nn.Cell`。`Cell`是所有神经网络(如`Conv2d-relu-softmax`等)的基类。\n", - "\n", - "神经网络的各层需要预先在`__init__`方法中定义,然后通过定义`construct`方法来完成神经网络的前向构造。按照LeNet的网络结构,定义网络各层如下:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import mindspore.nn as nn\n", - "from mindspore.common.initializer import Normal\n", - "\n", - "class LeNet5(nn.Cell):\n", - " \"\"\"\n", - " Lenet网络结构\n", - " \"\"\"\n", - " def __init__(self, num_class=10, num_channel=1):\n", - " super(LeNet5, self).__init__()\n", - " # 定义所需要的运算\n", - " self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')\n", - " self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')\n", - " self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))\n", - " self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))\n", - " self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))\n", - " self.relu = nn.ReLU()\n", - " self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n", - " self.flatten = nn.Flatten()\n", - "\n", - " def construct(self, x):\n", - " # 使用定义好的运算构建前向网络\n", - " x = self.conv1(x)\n", - " x = self.relu(x)\n", - " x = self.max_pool2d(x)\n", - " x = self.conv2(x)\n", - " x = self.relu(x)\n", - " x = self.max_pool2d(x)\n", - " x = self.flatten(x)\n", - " x = self.fc1(x)\n", - " x = self.relu(x)\n", - " x = self.fc2(x)\n", - " x = self.relu(x)\n", - " x = self.fc3(x)\n", - " return x\n", - "\n", - "# 实例化网络\n", - "net = LeNet5()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - ">阅读更多有关[在MindSpore中构建神经网络](https://www.mindspore.cn/tutorial/training/zh-CN/master/use/defining_the_network.html)的信息。\n", - "\n", - "## 优化模型参数\n", - "\n", - "要训练神经网络模型,需要定义损失函数和优化器。\n", - "\n", 
- "MindSpore支持的损失函数有`SoftmaxCrossEntropyWithLogits`、`L1Loss`、`MSELoss`等。这里使用交叉熵损失函数`SoftmaxCrossEntropyWithLogits`。\n" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "# 定义损失函数\n", - "net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - ">阅读更多有关[在MindSpore中使用损失函数](https://www.mindspore.cn/tutorial/zh-CN/master/optimization.html#损失函数)的信息。\n", - "\n", - "MindSpore支持的优化器有`Adam`、`AdamWeightDecay`、`Momentum`等。这里使用`Momentum`优化器为例。" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# 定义优化器\n", - "net_opt = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - ">阅读更多有关[在MindSpore中使用优化器](https://www.mindspore.cn/tutorial/zh-CN/master/optimization.html#优化器)的信息。\n", - "\n", - "## 训练及保存模型\n", - "\n", - "MindSpore提供了回调Callback机制,可以在训练过程中执行自定义逻辑,这里以使用框架提供的`ModelCheckpoint`为例。\n", - "`ModelCheckpoint`可以保存网络模型和参数,以便进行后续的Fine-tuning(微调)操作。\n" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "from mindspore.train.callback import ModelCheckpoint, CheckpointConfig\n", - "# 设置模型保存参数\n", - "config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10)\n", - "# 应用模型保存参数\n", - "ckpoint = ModelCheckpoint(prefix=\"checkpoint_lenet\", config=config_ck)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "通过MindSpore提供的`model.train`接口可以方便地进行网络的训练,`LossMonitor`可以监控训练过程中`loss`值的变化。\n" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "# 导入模型训练需要的库\n", - "from mindspore.nn import Accuracy\n", - "from mindspore.train.callback import LossMonitor\n", - "from mindspore import Model" - ] - }, - { - 
"cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "def train_net(args, model, epoch_size, data_path, repeat_size, ckpoint_cb, sink_mode):\n", - " \"\"\"定义训练的方法\"\"\"\n", - " # 加载训练数据集\n", - " ds_train = create_dataset(os.path.join(data_path, \"train\"), 32, repeat_size)\n", - " model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor(125)], dataset_sink_mode=sink_mode)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "其中,`dataset_sink_mode`用于控制数据是否下沉,数据下沉是指数据通过通道直接传送到Device上,可以加快训练速度,`dataset_sink_mode`为True表示数据下沉,否则为非下沉。\n", - "\n", - "通过模型运行测试数据集得到的结果,验证模型的泛化能力。\n", - "\n", - "1. 使用`model.eval`接口读入测试数据集。\n", - "2. 使用保存后的模型参数进行推理。" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def test_net(network, model, data_path):\n", - " \"\"\"定义验证的方法\"\"\"\n", - " ds_eval = create_dataset(os.path.join(data_path, \"test\"))\n", - " acc = model.eval(ds_eval, dataset_sink_mode=False)\n", - " print(\"{}\".format(acc))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "这里把`train_epoch`设置为1,对数据集进行1个迭代的训练。在`train_net`和 `test_net`方法中,我们加载了之前下载的训练数据集,`mnist_path`是MNIST数据集路径。" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "train_epoch = 1\n", - "mnist_path = \"./datasets/MNIST_Data\"\n", - "dataset_size = 1\n", - "model = Model(net, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()})\n", - "train_net(args, model, train_epoch, mnist_path, dataset_size, ckpoint, False)\n", - "test_net(net, model, mnist_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "使用以下命令运行脚本:\n", - "\n", - "```bash\n", - "python lenet.py --device_target=CPU\n", - "```\n", - "\n", - "其中,\n", - "\n", - "`lenet.py`:可以把前面的代码都粘贴到lenet.py中(不包含“下载数据集”的代码)。一般情况下,可将import部分移到代码头部,类、函数和方法的定义放在之后,最后在main方法中将前面的操作串起来即可。\n", - "\n", 
- "`--device_target=CPU`:指定运行硬件平台,参数为`CPU`、`GPU`或者`Ascend`,根据你的实际运行硬件平台来指定。\n", - "\n", - "训练过程中会打印loss值,类似下图。loss值会波动,但总体来说loss值会逐步减小,精度逐步提高。每个人运行的loss值有一定随机性,不一定完全相同。\n", - "训练过程中loss打印示例如下:\n", - "\n", - "```bash\n", - "epoch: 1 step: 125, loss is 2.3083377\n", - "epoch: 1 step: 250, loss is 2.3019726\n", - "...\n", - "epoch: 1 step: 1500, loss is 0.028385757\n", - "epoch: 1 step: 1625, loss is 0.0857362\n", - "epoch: 1 step: 1750, loss is 0.05639569\n", - "epoch: 1 step: 1875, loss is 0.12366105\n", - "{'Accuracy': 0.9663477564102564}\n", - "```\n", - "\n", - "可以在打印信息中看出模型精度数据,示例中精度数据达到96.6%,模型质量良好。随着网络迭代次数`train_epoch`增加,模型精度会进一步提高。\n", - "\n", - "## 加载模型\n" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "from mindspore.train.serialization import load_checkpoint, load_param_into_net\n", - "# 加载已经保存的用于测试的模型\n", - "param_dict = load_checkpoint(\"checkpoint_lenet-1_1875.ckpt\")\n", - "# 加载参数到网络中\n", - "load_param_into_net(net, param_dict)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - ">阅读更多有关[MindSpore加载模型](https://www.mindspore.cn/tutorial/zh-CN/master/save_load_model.html#id3)的信息。\n", - "\n", - "## 验证模型\n", - "\n", - "我们使用生成的模型进行单个图片数据的分类预测,具体步骤如下:" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Predicted: \"6\", Actual: \"6\"\n" - ] - } - ], - "source": [ - "import numpy as np\n", - "from mindspore import Tensor\n", - "\n", - "# 定义测试数据集,batch_size设置为1,则取出一张图片\n", - "ds_test = create_dataset(os.path.join(mnist_path, \"test\"), batch_size=1).create_dict_iterator()\n", - "data = next(ds_test)\n", - "\n", - "# images为测试图片,labels为测试图片的实际分类\n", - "images = data[\"image\"].asnumpy()\n", - "labels = data[\"label\"].asnumpy()\n", - "\n", - "# 使用函数model.predict预测image对应分类\n", - "output = model.predict(Tensor(data['image']))\n", - "predicted = 
np.argmax(output.asnumpy(), axis=1)\n", - "\n", - "# 输出预测分类与实际分类\n", - "print(f'Predicted: \"{predicted[0]}\", Actual: \"{labels[0]}\"')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "MindSpore-1.1.1", - "language": "python", - "name": "mindspore-1.1.1" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} \ No newline at end of file -- Gitee From 851db5180bf090667910beff1bec45258e3f9093 Mon Sep 17 00:00:00 2001 From: ZhangHui Date: Thu, 22 Apr 2021 17:39:38 +0800 Subject: [PATCH 3/3] Fix Issue #I3NOR1 --- tutorials/source_zh_cn/quick_start.ipynb | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/tutorials/source_zh_cn/quick_start.ipynb b/tutorials/source_zh_cn/quick_start.ipynb index 49b072e3be..97350b5acf 100644 --- a/tutorials/source_zh_cn/quick_start.ipynb +++ b/tutorials/source_zh_cn/quick_start.ipynb @@ -12,7 +12,7 @@ "\n", "## 配置运行信息\n", "\n", - "MindSpore通过`context.set_context`来配置运行需要的信息,譬如运行模式、后端信息、硬件等信息。\n", + "MindSpore通过`context.set_context`来配置运行需要的信息,如运行模式、后端信息、硬件等信息。\n", "\n", "导入`context`模块,配置运行需要的信息。" ] @@ -21,8 +21,7 @@ "cell_type": "code", "execution_count": 1, "metadata": {}, - "outputs": [ - ], + "outputs": [], "source": [ "import os\n", "import argparse\n", @@ -50,7 +49,7 @@ "\n", "我们示例中用到的MNIST数据集是由10类28∗28的灰度图片组成,训练数据集包含60000张图片,测试数据集包含10000张图片。\n", "\n", - "你可以从[MNIST数据集下载页面](http://yann.lecun.com/exdb/mnist/)下载,并按下方目录结构放置,或直接运行如下命令完成下载和放置:" + "你可以从[MNIST数据集下载页面](http://yann.lecun.com/exdb/mnist/)下载,并按下方目录结构放置,如运行环境为Linux,还可以直接运行如下命令完成下载和放置:" ] }, { @@ -360,8 +359,7 @@ "cell_type": "code", "execution_count": 12, "metadata": {}, - "outputs": [ - ], + "outputs": [], "source": [ "train_epoch = 1\n", "mnist_path = \"./datasets/MNIST_Data\"\n", 
@@ -384,7 +382,7 @@ "\n", "其中,\n", "\n", - "`lenet.py`:为你根据教程编写的脚本文件。\n", + "`lenet.py`:可以把前面的代码都粘贴到lenet.py中(不包含“下载数据集”的代码)。一般情况下,可将import部分移到代码头部,类、函数和方法的定义放在之后,最后在main方法中将前面的操作串起来即可。\n", "\n", "`--device_target=CPU`:指定运行硬件平台,参数为`CPU`、`GPU`或者`Ascend`,根据你的实际运行硬件平台来指定。\n", "\n", @@ -411,8 +409,7 @@ "cell_type": "code", "execution_count": 13, "metadata": {}, - "outputs": [ - ], + "outputs": [], "source": [ "from mindspore.train.serialization import load_checkpoint, load_param_into_net\n", "# 加载已经保存的用于测试的模型\n", @@ -488,4 +485,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file -- Gitee
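
The constants in the notebook touched by this series can be sanity-checked without MindSpore. Below is a minimal sketch in plain Python, assuming `CV.Rescale(rescale, shift)` computes `pixel * rescale + shift` and that `pad_mode='valid'` adds no border. The helper names are illustrative and come neither from the patch nor from MindSpore.

```python
# Sanity checks for the constants used in quick_start.ipynb.
# All helper names here are hypothetical; nothing below calls MindSpore.


def normalize_two_step(pixel):
    """Apply the two Rescale ops in the order create_dataset maps them."""
    rescale, shift = 1.0 / 255.0, 0.0
    rescale_nml, shift_nml = 1 / 0.3081, -1 * 0.1307 / 0.3081
    scaled = pixel * rescale + shift            # rescale_op: [0, 255] -> [0, 1]
    return scaled * rescale_nml + shift_nml     # rescale_nml_op: standardize


def normalize_direct(pixel, mean=0.1307, std=0.3081):
    """The same transform written as (x / 255 - mean) / std."""
    return (pixel / 255.0 - mean) / std


def conv_valid(size, kernel):
    """Output side of a stride-1 convolution with no padding."""
    return size - kernel + 1


def max_pool(size, kernel=2, stride=2):
    """Output side of a max-pool with no padding."""
    return (size - kernel) // stride + 1


def lenet_flatten_size(side=32, channels=16):
    """Feature count entering fc1 for a side x side input image."""
    side = max_pool(conv_valid(side, 5))   # 32 -> 28 -> 14
    side = max_pool(conv_valid(side, 5))   # 14 -> 10 -> 5
    return channels * side * side


# 60000 training images, batch_size=32, drop_remainder=True
steps_per_epoch = 60000 // 32

if __name__ == "__main__":
    print(normalize_two_step(128), normalize_direct(128))
    print(lenet_flatten_size())
    print(steps_per_epoch)
```

Under these assumptions the flatten size agrees with `nn.Dense(16 * 5 * 5, 120)`, and the step count agrees with `save_checkpoint_steps=1875` and with the `1_1875` suffix of the checkpoint file loaded in the notebook.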