diff --git a/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.ipynb b/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..9963b40de1e5c724f314e581342be4f26fcf1e27 --- /dev/null +++ b/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.ipynb @@ -0,0 +1,1006 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d8f28388", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:06:23.745016Z", + "start_time": "2022-03-02T09:06:21.533915Z" + } + }, + "source": [ + "# PyNative模式应用\n", + "\n", + "`Ascend` `GPU` `CPU` `模型运行`\n", + "\n", + "[![下载Notebook](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/master/programming_guide/zh_cn/mindspore_debug_in_pynative_mode.ipynb) [![下载样例代码](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/master/programming_guide/zh_cn/mindspore_debug_in_pynative_mode.py) [![查看源文件](https://gitee.com/mindspore/docs/raw/master/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.ipynb)\n", + "\n", + "## 概述\n", + "\n", + "MindSpore支持两种运行模式,分别在调试和运行方面做了不同的优化:\n", + "\n", + "- PyNative模式:也称动态图模式,将神经网络中的各个算子逐一下发执行,方便用户编写和调试神经网络模型。\n", + "- Graph模式:也称静态图模式或者图模式,将神经网络模型编译成一整张图,然后下发执行。该模式利用图优化等技术提高运行性能,同时有助于规模部署和跨平台运行。\n", + "\n", + "默认情况下,MindSpore处于Graph模式,可以通过`context.set_context(mode=context.PYNATIVE_MODE)`切换为PyNative模式;同样地,MindSpore处于PyNative模式时,可以通过`context.set_context(mode=context.GRAPH_MODE)`切换为Graph模式。\n", + "\n", + "PyNative模式下,支持执行单算子、普通函数和网络,以及单独求梯度的操作。下面将详细介绍使用方法和注意事项。\n", + "\n", + "> PyNative模式下为了提升性能,算子在device上采用了异步执行方式,因此当算子执行出错时,错误信息可能要等到程序执行到最后才显示。为此,PyNative模式下增加了pynative_synchronize设置,用于控制算子在device上是否异步执行。\n", + ">\n", + "> 下述例子中,参数初始化使用了随机值,在具体执行中输出的结果可能与本地执行输出的结果不同;如果需要稳定输出固定的值,可以设置固定的随机种子,设置方法请参考[mindspore.set_seed()](https://www.mindspore.cn/docs/api/zh-CN/master/api_python/mindspore/mindspore.set_seed.html)。\n", + "\n", + "## 设置模式" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1f073178", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:15:13.240496Z", + "start_time": "2022-03-02T09:15:13.237903Z" + } + }, + "outputs": [], + "source": [ + "from mindspore import context\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)" + ] + }, + { + "cell_type": "markdown", + "id": "7c2d61c8", + "metadata": {}, + "source": [ + "## 执行单算子" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "02da4fc9", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:08:29.337515Z", + "start_time": "2022-03-02T09:08:29.322592Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[[2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]]\n", + "\n", + " [[2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]]\n", + "\n", + " [[2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]\n", + " [2. 2. 2. 2. 2.]]]]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore.ops as ops\n", + "from mindspore import context, Tensor\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "x = Tensor(np.ones([1, 3, 5, 5]).astype(np.float32))\n", + "y = Tensor(np.ones([1, 3, 5, 5]).astype(np.float32))\n", +
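"# PyNative模式下算子逐一下发、立即执行,执行结果可直接通过asnumpy()查看\n", +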
"z = ops.add(x, y)\n", + "print(z.asnumpy())" + ] + }, + { + "cell_type": "markdown", + "id": "cb199d5a", + "metadata": {}, + "source": [ + "## 执行函数" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ae94d0c8", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:08:53.065585Z", + "start_time": "2022-03-02T09:08:53.058016Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[3. 3. 3.]\n", + " [3. 3. 3.]\n", + " [3. 3. 3.]]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "from mindspore import context, Tensor\n", + "import mindspore.ops as ops\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "def add_func(x, y):\n", + " z = ops.add(x, y)\n", + " z = ops.add(z, x)\n", + " return z\n", + "\n", + "x = Tensor(np.ones([3, 3], dtype=np.float32))\n", + "y = Tensor(np.ones([3, 3], dtype=np.float32))\n", + "output = add_func(x, y)\n", + "print(output.asnumpy())" + ] + }, + { + "cell_type": "markdown", + "id": "4750f0de", + "metadata": {}, + "source": [ + "## 执行网络\n", + "\n", + "在construct中定义网络结构。具体运行时,如下例所示,执行net(x, y)会从construct函数开始执行。" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "b27a3b82", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:09:16.498705Z", + "start_time": "2022-03-02T09:09:16.490549Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ 4. 10. 18.]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore.nn as nn\n", + "import mindspore.ops as ops\n", + "from mindspore import context, Tensor\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "class Net(nn.Cell):\n", + " def __init__(self):\n", + " super(Net, self).__init__()\n", + " self.mul = ops.Mul()\n", + "\n", + " def construct(self, x, y):\n", + " return self.mul(x, y)\n", + "\n", + "x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32))\n", + "y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32))\n", + "\n", + "net = Net()\n", + "print(net(x, y))" + ] + },
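+ { + "cell_type": "markdown", + "id": "3fa7cc10", + "metadata": {}, + "source": [ + "## 单独求梯度\n", + "\n", + "PyNative模式下还支持单独求梯度的操作。下面是一个使用`GradOperation`对函数求梯度的简单示意(其中`mul_fn`是为演示而命名的函数,求其输出对各输入的梯度,预期得到`(y, x)`即`(2, 1)`):\n", + "\n", + "```python\n", + "import mindspore\n", + "from mindspore import Tensor\n", + "from mindspore.ops import GradOperation\n", + "\n", + "def mul_fn(x, y):\n", + "    return x * y\n", + "\n", + "# get_all=True表示对所有输入求梯度\n", + "grad_all = GradOperation(get_all=True)\n", + "output = grad_all(mul_fn)(Tensor(1, mindspore.float32), Tensor(2, mindspore.float32))\n", + "print(output)\n", + "```" + ] + },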
+ { + "cell_type": "markdown", + "id": "e8fd0a3f", + "metadata": {}, + "source": [ + "## 构建网络\n", + "\n", + "可以在网络初始化时明确定义网络所需要的各个部分,并在construct中定义网络结构。" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "587ec18c", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:09:54.055532Z", + "start_time": "2022-03-02T09:09:54.047474Z" + } + }, + "outputs": [], + "source": [ + "import mindspore.nn as nn\n", + "from mindspore.common.initializer import Normal\n", + "\n", + "class LeNet5(nn.Cell):\n", + " def __init__(self, num_class=10, num_channel=1, include_top=True):\n", + " super(LeNet5, self).__init__()\n", + " self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')\n", + " self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')\n", + " self.relu = nn.ReLU()\n", + " self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n", + " self.include_top = include_top\n", + " if self.include_top:\n", + " self.flatten = nn.Flatten()\n", + " self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))\n", + " self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))\n", + " self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))\n", + "\n", + "\n", + " def construct(self, x):\n", + " x = self.conv1(x)\n", + " x = self.relu(x)\n", + " x = self.max_pool2d(x)\n", + " x = self.conv2(x)\n", + " x = self.relu(x)\n", + " x = self.max_pool2d(x)\n", + " if not self.include_top:\n", + " return x\n", + " x = self.flatten(x)\n", + " x = self.relu(self.fc1(x))\n", + " x = self.relu(self.fc2(x))\n", + " x = self.fc3(x)\n", + " return x" + ] + }, + { + "cell_type": "markdown", + "id": "63cd2da8", + "metadata": {}, + "source": [ + "## 设置Loss函数及优化器\n", + "\n", + "在PyNative模式下,由优化器根据每个参数对应的梯度对参数进行更新。\n", + "\n", + "```python\n", + "net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction=\"mean\")\n", + "net_opt = nn.Momentum(network.trainable_params(), config.lr, config.momentum)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "12463c2b", + "metadata": {}, + "source": [ + "## 保存模型参数\n", + "\n", + "保存模型时,可以通过定义CheckpointConfig来指定保存策略。\n", + "\n", + "`save_checkpoint_steps`:每隔多少个step保存一次参数;`keep_checkpoint_max`:最多保留多少份模型参数文件。详细使用方式请参考[保存模型](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/save_model.html)。\n", + "\n", + "```python\n", + "config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps,\n", + " keep_checkpoint_max=config.keep_checkpoint_max)\n", + "ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_lenet\", directory=config.ckpt_path, config=config_ck)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "efe5d247", + "metadata": {}, + "source": [ + "## 训练网络\n", + "\n", + "```python\n", + "context.set_context(mode=context.PYNATIVE_MODE, device_target=config.device_target)\n", + "ds_train = create_dataset(os.path.join(config.data_path, \"train\"), config.batch_size)\n", + "network = 
LeNet5(config.num_classes)\n", + "net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction=\"mean\")\n", + "net_opt = nn.Momentum(network.trainable_params(), config.lr, config.momentum)\n", + "time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())\n", + "config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps,\n", + " keep_checkpoint_max=config.keep_checkpoint_max)\n", + "ckpoint_cb = ModelCheckpoint(prefix=\"checkpoint_lenet\", directory=config.ckpt_path, config=config_ck)\n", + "\n", + "model = Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()}, amp_level=\"O2\")\n", + "```\n", + "\n", + "完整的运行代码可以到ModelZoo下载[lenet](https://gitee.com/mindspore/models/tree/master/official/cv/lenet),在train.py中修改为:`context.set_context(mode=context.PYNATIVE_MODE, device_target=config.device_target)`。\n", + "\n", + "## 提升PyNative性能\n", + "\n", + "为了提高PyNative模式下的前向计算任务执行速度,MindSpore提供了ms_function功能,该功能可以在PyNative模式下将Python函数或者Python类的方法编译成计算图,通过图优化等技术提高运行速度,如下例所示。" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "642d8765", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:11:12.072192Z", + "start_time": "2022-03-02T09:11:12.033104Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[3. 3. 3. 3.]\n", + " [3. 3. 3. 3.]\n", + " [3. 3. 3. 3.]\n", + " [3. 3. 3. 3.]]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore.nn as nn\n", + "from mindspore import context, Tensor\n", + "import mindspore.ops as ops\n", + "from mindspore import ms_function\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "class TensorAddNet(nn.Cell):\n", + " def __init__(self):\n", + " super(TensorAddNet, self).__init__()\n", + " self.add = ops.Add()\n", + "\n", + " @ms_function\n", + " def construct(self, x, y):\n", + " res = self.add(x, y)\n", + " return res\n", + "\n", + "x = Tensor(np.ones([4, 4]).astype(np.float32))\n", + "y = Tensor(np.ones([4, 4]).astype(np.float32))\n", + "net = TensorAddNet()\n", + "\n", + "z = net(x, y) # Staging mode\n", + "add = ops.Add()\n", + "res = add(x, z) # PyNative mode\n", + "print(res.asnumpy())" + ] + }, + { + "cell_type": "markdown", + "id": "d3acddf1", + "metadata": {}, + "source": [ + "上述示例代码中,在`TensorAddNet`类的`construct`之前加装了`ms_function`装饰器,该装饰器会将`construct`方法编译成计算图,在给定输入之后,以图的形式下发执行,而上一示例代码中的`add`会直接以普通的PyNative的方式执行。\n", + "\n", + "需要说明的是,加装了`ms_function`装饰器的函数中,如果包含不需要进行参数训练的算子(如`pooling`、`add`等算子),则这些算子可以在被装饰的函数中直接调用,如下例所示。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "0d5f8edc", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:11:37.208346Z", + "start_time": "2022-03-02T09:11:37.193631Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[2. 2. 2. 2.]\n", + " [2. 2. 2. 2.]\n", + " [2. 2. 2. 2.]\n", + " [2. 2. 2. 
2.]]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore.nn as nn\n", + "from mindspore import context, Tensor\n", + "import mindspore.ops as ops\n", + "from mindspore import ms_function\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "add = ops.Add()\n", + "\n", + "@ms_function\n", + "def add_fn(x, y):\n", + " res = add(x, y)\n", + " return res\n", + "\n", + "x = Tensor(np.ones([4, 4]).astype(np.float32))\n", + "y = Tensor(np.ones([4, 4]).astype(np.float32))\n", + "z = add_fn(x, y)\n", + "print(z.asnumpy())" + ] + }, + { + "cell_type": "markdown", + "id": "51aba8b2", + "metadata": {}, + "source": [ + "如果被装饰的函数中包含了需要进行参数训练的算子(如`Convolution`、`BatchNorm`等算子),则这些算子必须在被装饰的函数之外完成实例化操作,如下例所示。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "b768cd62", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:12:01.721275Z", + "start_time": "2022-03-02T09:12:01.704982Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[[-0.02816578 0.00726602 -0.0243537 ]\n", + " [-0.03901478 0.00598676 0.02190213]\n", + " [-0.0077684 -0.04365154 0.05111694]]\n", + "\n", + " [[-0.06303091 0.08491095 -0.02060323]\n", + " [ 0.04463107 0.04558043 0.04766568]\n", + " [ 0.06608325 -0.0319737 -0.02196906]]\n", + "\n", + " [[-0.02822576 -0.06556191 -0.01271653]\n", + " [ 0.08373221 -0.02165455 0.11897318]\n", + " [-0.00130219 -0.05717571 -0.02532942]]\n", + "\n", + " [[ 0.02029888 -0.02036209 -0.01771532]\n", + " [-0.00645351 -0.06793338 -0.10712627]\n", + " [ 0.07224355 0.11416516 0.04358198]]]\n", + "\n", + "\n", + " [[[ 0.01887683 0.07527123 -0.04069028]\n", + " [ 0.06689087 0.07286955 0.02803389]\n", + " [ 0.03525448 -0.01206777 0.01026574]]\n", + "\n", + " [[-0.04634263 0.09183761 -0.0099204 ]\n", + " [-0.01378078 0.07725186 0.01579553]\n", + " [ 0.01003464 -0.01863192 -0.00155336]]\n", + "\n", + " [[ 0.01095701 -0.04136867 0.01759142]\n", + " [ 0.03386753 -0.02528351 0.0391456 ]\n", + " [ 0.00925638 -0.00656884 -0.00192295]]\n", + "\n", + " [[ 0.01381767 -0.03718629 0.05333562]\n", + " [ 0.03779759 0.03554788 -0.01645133]\n", + " [-0.0903965 -0.01663967 0.05952255]]]]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore.nn as nn\n", + "from mindspore import context, Tensor\n", + "from mindspore import ms_function\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "conv_obj = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, stride=2, padding=0)\n", + "conv_obj.init_parameters_data()\n", + "@ms_function\n", + "def conv_fn(x):\n", + " res = conv_obj(x)\n", + " return res\n", + "\n", + "input_data = np.random.randn(2, 3, 6, 6).astype(np.float32)\n", + "z = conv_fn(Tensor(input_data))\n", + "print(z.asnumpy())" + ] + }, + { + "cell_type": "markdown", + "id": "2c22412b", + "metadata": {}, + "source": [ + "更多ms_function的功能可以参考[ms_function文档](https://mindspore.cn/docs/programming_guide/zh-CN/master/ms_function.html)。\n", + "\n", + "## PyNative下同步执行\n", + "\n", + "PyNative模式下算子默认为异步执行,可以通过设置context来控制是否异步执行,当算子执行失败时,可以方便地通过调用栈看到出错的代码位置。\n", + "\n", + "设置为同步执行:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "733a67bc", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:12:28.228442Z", + "start_time": "2022-03-02T09:12:28.225330Z" + } + }, + "outputs": [], + "source": [ + "context.set_context(pynative_synchronize=True)" + ] + }, + { + "cell_type": "markdown", + "id": 
"198071c8", + "metadata": {}, + "source": [ + "示例代码:\n", + "\n", + "```python\n", + "import numpy as np\n", + "import mindspore.context as context\n", + "import mindspore.nn as nn\n", + "from mindspore import Tensor\n", + "from mindspore import dtype as mstype\n", + "import mindspore.ops as ops\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE, pynative_synchronize=True)\n", + "\n", + "class Net(nn.Cell):\n", + " def __init__(self):\n", + " super(Net, self).__init__()\n", + " self.get_next = ops.GetNext([mstype.float32], [(1, 1)], 1, \"test\")\n", + "\n", + " def construct(self, x1,):\n", + " x = self.get_next()\n", + " x = x + x1\n", + " return x\n", + "\n", + "context.set_context()\n", + "x1 = np.random.randn(1, 1).astype(np.float32)\n", + "net = Net()\n", + "output = net(Tensor(x1))\n", + "print(output.asnumpy())\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "bc31de88", + "metadata": {}, + "source": [ + "输出:此时算子为同步执行,当算子执行错误时,可以看到完整的调用栈,找到出错的代码行。\n", + "\n", + "```text\n", + "Traceback (most recent call last):\n", + " File \"test_pynative_sync_control.py\", line 41, in \n", + " output = net(Tensor(x1))\n", + " File \"mindspore/mindspore/nn/cell.py\", line 406, in \n", + " output = self.run_construct(cast_inputs, kwargs)\n", + " File \"mindspore/mindspore/nn/cell.py\", line 348, in \n", + " output = self.construct(*cast_inputs, **kwargs)\n", + " File \"test_pynative_sync_control.py\", line 33, in \n", + " x = self.get_next()\n", + " File \"mindspore/mindspore/ops/primitive.py\", line 247, in \n", + " return _run_op(self, self.name, args)\n", + " File \"mindspore/mindspore/common/api.py\", line 77, in \n", + " results = fn(*arg, **kwargs)\n", + " File \"mindspore/mindspore/ops/primitive.py\", line 677, in _run_op\n", + " output = real_run_op(obj, op_name, args)\n", + "RuntimeError: mindspore/ccsrc/runtime/device/kernel_runtime.cc:1006 DebugStreamSync] Op Default/GetNext-op0 run failed!\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "76c1c4d4", + "metadata": {}, + "source": [ + "## Hook功能\n", + "\n", + "调试深度学习网络是每一个深度学习领域的从业者需要面对,且投入大量精力的工作。由于深度学习网络隐藏了中间层算子的输入、输出数据以及反向梯度,只提供网络输入数据(特征量、权重)的梯度,导致无法准确地感知中间层算子的数据变化,从而影响调试效率。为了方便用户准确、快速地对深度学习网络进行调试,MindSpore在PyNative模式下设计了Hook功能。使用Hook功能可以捕获中间层算子的输入、输出数据以及反向梯度。目前,PyNative模式下提供了四种形式的Hook功能,分别是:HookBackward算子和在Cell对象上进行注册的register_forward_pre_hook、register_forward_hook、register_backward_hook功能。\n", + "\n", + "### HookBackward算子\n", + "\n", + "HookBackward将Hook功能以算子的形式实现。用户初始化一个HookBackward算子,将其安插到深度学习网络中需要捕获梯度的位置。在网络正向执行时,HookBackward算子将输入数据不做任何修改地原样输出;在网络反向传播梯度时,在HookBackward上注册的Hook函数将会捕获反向传播至此的梯度。用户可以在Hook函数中自定义对梯度的操作,比如打印梯度,或者返回新的梯度。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "7c478a5b", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:13:38.660885Z", + "start_time": "2022-03-02T09:13:38.645597Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(Tensor(shape=[], dtype=Float32, value= 2),)\n", + "(Tensor(shape=[], dtype=Float32, value= 4), Tensor(shape=[], dtype=Float32, value= 4))\n" + ] + } + ], + "source": [ + "import mindspore\n", + "from mindspore import ops\n", + "from mindspore import Tensor\n", + "from mindspore import context\n", + "from mindspore.ops import GradOperation\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "def hook_fn(grad_out):\n", + " print(grad_out)\n", + "\n", + "grad_all = GradOperation(get_all=True)\n", + "hook = ops.HookBackward(hook_fn)\n", + 
"def hook_test(x, y):\n", + " z = x * y\n", + " z = hook(z)\n", + " z = z * y\n", + " return z\n", + "\n", + "def net(x, y):\n", + " return grad_all(hook_test)(x, y)\n", + "\n", + "output = net(Tensor(1, mindspore.float32), Tensor(2, mindspore.float32))\n", + "print(output)" + ] + }, + { + "cell_type": "markdown", + "id": "7c591014", + "metadata": {}, + "source": [ + "更多HookBackward算子的说明可以参考[API文档](https://mindspore.cn/docs/api/zh-CN/master/api_python/ops/mindspore.ops.HookBackward.html)。\n", + "\n", + "### Cell对象的register_forward_pre_hook功能\n", + "\n", + "用户可以在Cell对象上使用`register_forward_pre_hook`函数来注册一个自定义的Hook函数,用来捕获正向传入该Cell对象的数据。`register_forward_pre_hook`函数接收Hook函数作为入参,并返回一个与Hook函数一一对应的`handle`对象。用户可以通过调用`handle`对象的`remove()`函数来删除与之对应的Hook函数。 Hook函数应该按照以下的方式进行定义。\n", + "\n", + "示例代码:\n", + "\n", + "```python\n", + "def forward_pre_hook_fn(cell_id, inputs):\n", + " print(\"forward inputs: \", inputs)\n", + "```\n", + "\n", + "这里的cell_id是Cell对象的名称以及ID信息,inputs是正向传入到Cell对象的数据。因此,用户可以使用register_forward_pre_hook函数来捕获网络中某一个Cell对象的正向输入数据。用户可以在Hook函数中自定义对输入数据的操作,比如打印数据,或者返回新的输入数据给当前的Cell对象。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "be24bd1e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]))\n", + "(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]))\n", + "(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]))\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore\n", + "import mindspore.nn as nn\n", + "from mindspore import Tensor\n", + "from mindspore import context\n", + "from mindspore.ops import GradOperation\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "def forward_pre_hook_fn(cell_id, inputs):\n", + " print(\"forward inputs: \", inputs)\n", + "\n", + "class Net(nn.Cell):\n", + " def __init__(self):\n", + " super(Net, self).__init__()\n", + " self.mul = nn.MatMul()\n", + " self.handle = self.mul.register_forward_pre_hook(forward_pre_hook_fn)\n", + "\n", + " def construct(self, x, y):\n", + " x = x + x\n", + " x = self.mul(x, y)\n", + " return x\n", + "\n", + "grad = GradOperation(get_all=True)\n", + "net = Net()\n", + "output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32)))\n", + "print(output)\n", + "net.handle.remove()\n", + "output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32)))\n", + "print(output)" + ] + }, + { + "cell_type": "markdown", + "id": "7527515d", + "metadata": {}, + "source": [ + "为了避免切换到图模式时脚本运行失败,不建议将register_forward_pre_hook函数直接写在Cell对象的construct函数中。\n", + "\n", + "### Cell对象的register_forward_hook功能\n", + "\n", + "用户可以在Cell对象上使用`register_forward_hook`函数来注册一个自定义的Hook函数,用来捕获正向传入Cell对象的数据和Cell对象的输出数据。`register_forward_hook`函数接收Hook函数作为入参,并返回一个与Hook函数一一对应的handle对象。用户可以通过调用handle对象的`remove()`函数来删除与之对应的Hook函数。 Hook函数应该按照以下的方式进行定义。\n", + "\n", + "示例代码:\n", + "\n", + "```python\n", + "def forward_hook_fn(cell_id, inputs, outputs):\n", + " print(\"forward inputs: \", inputs)\n", + " print(\"forward outputs: \", outputs)\n", + "```\n", + "\n", + 
"这里的`cell_id`是Cell对象的名称以及ID信息,`inputs`是正向传入到Cell对象的数据,`outputs`是Cell对象的正向输出数据。因此,用户可以使用`register_forward_hook`函数来捕获网络中某一个Cell对象的正向输入数据和输出数据。用户可以在Hook函数中自定义对输入、输出数据的操作,比如打印数据,或者返回新的输出数据。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bf9a9441", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]))\n", + "forward outputs: 2.0\n", + "(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]))\n", + "(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]))\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore\n", + "import mindspore.nn as nn\n", + "from mindspore import Tensor\n", + "from mindspore import context\n", + "from mindspore.ops import GradOperation\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "def forward_hook_fn(cell_id, inputs, outputs):\n", + " print(\"forward inputs: \", inputs)\n", + " print(\"forward outputs: \", outputs)\n", + "\n", + "class Net(nn.Cell):\n", + " def __init__(self):\n", + " super(Net, self).__init__()\n", + " self.mul = nn.MatMul()\n", + " self.handle = self.mul.register_forward_hook(forward_hook_fn)\n", + "\n", + " def construct(self, x, y):\n", + " x = x + x\n", + " x = self.mul(x, y)\n", + " return x\n", + "\n", + "grad = GradOperation(get_all=True)\n", + "net = Net()\n", + "output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32)))\n", + "print(output)\n", + "net.handle.remove()\n", + "output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32)))\n", + "print(output)" + ] + }, + { + "cell_type": "markdown", + "id": "6c43f47d", + "metadata": {}, + "source": [ + "为了避免切换到图模式时脚本运行失败,不建议将register_forward_hook函数直接写在Cell对象的construct函数中。" + ] + }, + { + "cell_type": "markdown", + "id": "785b3be4", + "metadata": {}, + "source": [ + "### Cell对象的register_backward_hook功能\n", + "\n", + "用户可以在Cell对象上使用`register_backward_hook`接口来注册一个自定义的Hook函数,用来捕获网络反向传播时与Cell对象相关联的梯度。`register_backward_hook`函数接收Hook函数作为入参,并返回一个与Hook函数一一对应的`handle`对象。用户可以通过调用`handle`对象的`remove()`函数来删除与之对应的Hook函数。\n", + "\n", + "与HookBackward算子所使用的自定义Hook函数有所不同,`register_backward_hook`使用的Hook函数的入参中包含了表示Cell对象名称与id信息的`cell_id`、反向传入到Cell对象的梯度、以及Cell对象的反向输出的梯度。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "9826935c", + "metadata": {}, + "outputs": [], + "source": [ + "def cell_hook_function(cell_id, grad_input, grad_output):\n", + " print(grad_input)\n", + " print(grad_output)" + ] + }, + { + "cell_type": "markdown", + "id": "acb32235", + "metadata": {}, + "source": [ + "这里的`grad_input`是网络反向传播时,传入到Cell对象的梯度,它对应于正向过程中下一个算子的反向输出梯度;`grad_output`是Cell对象反向输出的梯度。因此,用户可以使用`register_backward_hook`函数来捕获网络中某一个Cell对象的反向传入和反向输出梯度。用户可以在Hook函数中自定义对梯度的操作,比如打印梯度,或者返回新的输出梯度。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "6aed31a5", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:14:26.523389Z", + "start_time": "2022-03-02T09:14:26.506784Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(Tensor(shape=[1, 2, 1, 1], dtype=Float32, value=\n", + "[[[[ 1.00000000e+00]],\n", + " [[ 
1.00000000e+00]]]]),)\n", + "(Tensor(shape=[1, 2, 1, 1], dtype=Float32, value=\n", + "[[[[ 9.99994993e-01]],\n", + " [[ 9.99994993e-01]]]]),)\n", + "(Tensor(shape=[1, 1, 2, 2], dtype=Float32, value=\n", + "[[[[ 1.99998999e+00, 1.99998999e+00],\n", + " [ 1.99998999e+00, 1.99998999e+00]]]]),)\n", + "(Tensor(shape=[1, 1, 2, 2], dtype=Float32, value=\n", + "[[[[ 1.99998999e+00, 1.99998999e+00],\n", + " [ 1.99998999e+00, 1.99998999e+00]]]]),)\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import mindspore\n", + "import mindspore.nn as nn\n", + "from mindspore import Tensor\n", + "from mindspore import context\n", + "from mindspore.ops import GradOperation\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "def backward_hook_function(cell_id, grad_input, grad_output):\n", + " print(grad_input)\n", + " print(grad_output)\n", + "\n", + "class Net(nn.Cell):\n", + " def __init__(self):\n", + " super(Net, self).__init__()\n", + " self.conv = nn.Conv2d(1, 2, kernel_size=2, stride=1, padding=0, weight_init=\"ones\", pad_mode=\"valid\")\n", + " self.bn = nn.BatchNorm2d(2, momentum=0.99, eps=0.00001, gamma_init=\"ones\")\n", + " self.handle = self.bn.register_backward_hook(backward_hook_function)\n", + " self.relu = nn.ReLU()\n", + "\n", + " def construct(self, x):\n", + " x = self.conv(x)\n", + " x = self.bn(x)\n", + " x = self.relu(x)\n", + " return x\n", + "\n", + "net = Net()\n", + "grad_all = GradOperation(get_all=True)\n", + "output = grad_all(net)(Tensor(np.ones([1, 1, 2, 2]).astype(np.float32)))\n", + "print(output)\n", + "net.handle.remove()\n", + "output = grad_all(net)(Tensor(np.ones([1, 1, 2, 2]).astype(np.float32)))\n", + "print(output)" + ] + }, + { + "cell_type": "markdown", + "id": "5a0eb08c", + "metadata": {}, + "source": [ + "为了避免切换到图模式时脚本运行失败,不建议将register_backward_hook函数直接写在Cell对象的construct函数中。更多关于Cell对象的register_backward_hook功能的说明可以参考[API文档](https://mindspore.cn/docs/api/zh-CN/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_backward_hook)。\n", + "\n", + "## 自定义bprop功能\n", + "\n", + "用户可以自定义nn.Cell对象的反向传播(计算)函数,从而控制nn.Cell对象梯度计算的过程,定位梯度问题。自定义bprop函数的使用方法是:在定义的nn.Cell对象里面增加一个用户自定义的bprop函数。训练的过程中会使用用户自定义的bprop函数来生成反向图。\n", + "\n", + "示例代码:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "342c346e", + "metadata": { + "ExecuteTime": { + "end_time": "2022-03-02T09:14:55.896896Z", + "start_time": "2022-03-02T09:14:55.881233Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(Tensor(shape=[], dtype=Float32, value= 3), Tensor(shape=[], dtype=Float32, value= 2))\n" + ] + } + ], + "source": [ + "import mindspore\n", + "import mindspore.nn as nn\n", + "from mindspore import Tensor\n", + "from mindspore import context\n", + "from mindspore.ops import GradOperation\n", + "\n", + "context.set_context(mode=context.PYNATIVE_MODE)\n", + "\n", + "class Net(nn.Cell):\n", + " def construct(self, x, y):\n", + " z = x * y\n", + " z = z * y\n", + " return z\n", + "\n", + " def bprop(self, x, y, out, dout):\n", + " x_dout = x + y\n", + " y_dout = x * y\n", + " return x_dout, y_dout\n", + "\n", + "grad_all = GradOperation(get_all=True)\n", + "output = grad_all(Net())(Tensor(1, mindspore.float32), Tensor(2, mindspore.float32))\n", + "print(output)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "MindSpore", + "language": "python", + "name": "mindspore" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + 
"mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.md b/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.md deleted file mode 100644 index 8ec084f047866aee3ecacd38e05e63d35ee180ff..0000000000000000000000000000000000000000 --- a/docs/mindspore/programming_guide/source_zh_cn/debug_in_pynative_mode.md +++ /dev/null @@ -1,685 +0,0 @@ -# PyNative模式应用 - -`Ascend` `GPU` `CPU` `模型运行` - -   - - -## 概述 - -MindSpore支持两种运行模式,在调试或者运行方面做了不同的优化: - -- PyNative模式:也称动态图模式,将神经网络中的各个算子逐一下发执行,方便用户编写和调试神经网络模型。 -- Graph模式:也称静态图模式或者图模式,将神经网络模型编译成一整张图,然后下发执行。该模式利用图优化等技术提高运行性能,同时有助于规模部署和跨平台运行。 - -默认情况下,MindSpore处于Graph模式,可以通过`context.set_context(mode=context.PYNATIVE_MODE)`切换为PyNative模式;同样地,MindSpore处于PyNative模式时,可以通过`context.set_context(mode=context.GRAPH_MODE)`切换为Graph模式。 - -PyNative模式下,支持执行单算子、普通函数和网络,以及单独求梯度的操作。下面将详细介绍使用方法和注意事项。 - -> PyNative模式下为了提升性能,算子在device上使用了异步执行方式,因此在算子执行错误的时候,错误信息可能会在程序执行到最后才显示。因此在PyNative模式下,增加了一个pynative_synchronize的设置来控制算子device上是否使用异步执行。 -> -> 下述例子中,参数初始化使用了随机值,在具体执行中输出的结果可能与本地执行输出的结果不同;如果需要稳定输出固定的值,可以设置固定的随机种子,设置方法请参考[mindspore.set_seed()](https://www.mindspore.cn/docs/api/zh-CN/master/api_python/mindspore/mindspore.set_seed.html)。 - -## 设置模式 - -```python -context.set_context(mode=context.PYNATIVE_MODE) -``` - -## 执行单算子 - -```python -import numpy as np -import mindspore.ops as ops -from mindspore import context, Tensor - -context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU") - -x = Tensor(np.ones([1, 3, 5, 5]).astype(np.float32)) -y = Tensor(np.ones([1, 3, 5, 5]).astype(np.float32)) -z = ops.add(x, y) -print(z.asnumpy()) -``` - -输出: - -```text -[[[[2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.]] - - [[2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.]] - - [[2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.] - [2. 2. 2. 2. 2.]]]] -``` - -## 执行函数 - -```python -import numpy as np -from mindspore import context, Tensor -import mindspore.ops as ops - -context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU") - -def add_func(x, y): - z = ops.add(x, y) - z = ops.add(z, x) - return z - -x = Tensor(np.ones([3, 3], dtype=np.float32)) -y = Tensor(np.ones([3, 3], dtype=np.float32)) -output = add_func(x, y) -print(output.asnumpy()) -``` - -输出: - -```text -[[3. 3. 3.] - [3. 3. 3.] - [3. 3. 3.]] -``` - -## 执行网络 - -在construct中定义网络结构,在具体运行时,下例中,执行net(x, y)时,会从construct函数中开始执行。 - -```python -import numpy as np -import mindspore.nn as nn -import mindspore.ops as ops -from mindspore import context, Tensor - -context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU") - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.mul = ops.Mul() - - def construct(self, x, y): - return self.mul(x, y) - -x = Tensor(np.array([1.0, 2.0, 3.0]).astype(np.float32)) -y = Tensor(np.array([4.0, 5.0, 6.0]).astype(np.float32)) - -net = Net() -print(net(x, y)) -``` - -输出: - -```text -[ 4. 10. 18.] 
-``` - -## 构建网络 - -可以在网络初始化时,明确定义网络所需要的各个部分,在construct中定义网络结构。 - -```python -import mindspore.nn as nn -from mindspore.common.initializer import Normal - -class LeNet5(nn.Cell): - def __init__(self, num_class=10, num_channel=1, include_top=True): - super(LeNet5, self).__init__() - self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid') - self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid') - self.relu = nn.ReLU() - self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) - self.include_top = include_top - if self.include_top: - self.flatten = nn.Flatten() - self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02)) - self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02)) - self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02)) - - - def construct(self, x): - x = self.conv1(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.conv2(x) - x = self.relu(x) - x = self.max_pool2d(x) - if not self.include_top: - return x - x = self.flatten(x) - x = self.relu(self.fc1(x)) - x = self.relu(self.fc2(x)) - x = self.fc3(x) - return x -``` - -## 设置Loss函数及优化器 - -在PyNative模式下,通过优化器针对每个参数对应的梯度进行参数更新。 - -```python -net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") -net_opt = nn.Momentum(network.trainable_params(), config.lr, config.momentum) -``` - -## 保存模型参数 - -保存模型可以通过定义CheckpointConfig来指定模型保存的参数。 - -save_checkpoint_steps:每多少个step保存一下参数;keep_checkpoint_max:最多保存多少份模型参数。详细使用方式请参考[保存模型](https://www.mindspore.cn/docs/programming_guide/zh-CN/master/save_model.html)。 - -```python -config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps, - keep_checkpoint_max=config.keep_checkpoint_max) -ckpoint_cb = ModelCheckpoint(prefix="checkpoint_lenet", directory=config.ckpt_path, config=config_ck) -``` - -## 训练网络 - -```python -context.set_context(mode=context.PYNATIVE_MODE, device_target=config.device_target) -ds_train = create_dataset(os.path.join(config.data_path, "train"), config.batch_size) -network = LeNet5(config.num_classes) -net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean") -net_opt = nn.Momentum(network.trainable_params(), config.lr, config.momentum) -time_cb = TimeMonitor(data_size=ds_train.get_dataset_size()) -config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_steps, - keep_checkpoint_max=config.keep_checkpoint_max) -ckpoint_cb = ModelCheckpoint(prefix="checkpoint_lenet", directory=config.ckpt_path, config=config_ck) - -model = Model(network, net_loss, net_opt, metrics={"Accuracy": Accuracy()}, amp_level="O2") -``` - -完整的运行代码可以到ModelZoo下载[lenet](https://gitee.com/mindspore/models/tree/master/official/cv/lenet),在train.py中修改为context.set_context(mode=context.PYNATIVE_MODE, device_target=config.device_target)。 - -## 提升PyNative性能 - -为了提高PyNative模式下的前向计算任务执行速度,MindSpore提供了ms_function功能,该功能可以在PyNative模式下将Python函数或者Python类的方法编译成计算图,通过图优化等技术提高运行速度,如下例所示。 - -```python -import numpy as np -import mindspore.nn as nn -from mindspore import context, Tensor -import mindspore.ops as ops -from mindspore import ms_function - -context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU") - -class TensorAddNet(nn.Cell): - def __init__(self): - super(TensorAddNet, self).__init__() - self.add = ops.Add() - - @ms_function - def construct(self, x, y): - res = self.add(x, y) - return res - -x = Tensor(np.ones([4, 4]).astype(np.float32)) -y = Tensor(np.ones([4, 4]).astype(np.float32)) -net = TensorAddNet() - -z = net(x, y) # Staging mode -add = ops.Add() -res = add(x, z) # PyNative mode -print(res.asnumpy()) -``` - 
-输出: - -```text -[[3. 3. 3. 3.] - [3. 3. 3. 3.] - [3. 3. 3. 3.] - [3. 3. 3. 3.]] -``` - -上述示例代码中,在`TensorAddNet`类的`construct`之前加装了`ms_function`装饰器,该装饰器会将`construct`方法编译成计算图,在给定输入之后,以图的形式下发执行,而上一示例代码中的`add`会直接以普通的PyNative的方式执行。 - -需要说明的是,加装了`ms_function`装饰器的函数中,如果包含不需要进行参数训练的算子(如`pooling`、`add`等算子),则这些算子可以在被装饰的函数中直接调用,如下例所示。 - -示例代码: - -```python -import numpy as np -import mindspore.nn as nn -from mindspore import context, Tensor -import mindspore.ops as ops -from mindspore import ms_function - -context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU") - -add = ops.Add() - -@ms_function -def add_fn(x, y): - res = add(x, y) - return res - -x = Tensor(np.ones([4, 4]).astype(np.float32)) -y = Tensor(np.ones([4, 4]).astype(np.float32)) -z = add_fn(x, y) -print(z.asnumpy()) -``` - -输出: - -```text -[[2. 2. 2. 2.] - [2. 2. 2. 2.] - [2. 2. 2. 2.] - [2. 2. 2. 2.]] -``` - -如果被装饰的函数中包含了需要进行参数训练的算子(如`Convolution`、`BatchNorm`等算子),则这些算子必须在被装饰的函数之外完成实例化操作,如下例所示。 - -示例代码: - -```python -import numpy as np -import mindspore.nn as nn -from mindspore import context, Tensor -from mindspore import ms_function - -context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU") - -conv_obj = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, stride=2, padding=0) -conv_obj.init_parameters_data() -@ms_function -def conv_fn(x): - res = conv_obj(x) - return res - -input_data = np.random.randn(2, 3, 6, 6).astype(np.float32) -z = conv_fn(Tensor(input_data)) -print(z.asnumpy()) -``` - -输出: - -```text -[[[[ 0.10377571 -0.0182163 -0.05221086] -[ 0.1428334 -0.01216263 0.03171652] -[-0.00673915 -0.01216291 0.02872104]] - -[[ 0.02906547 -0.02333629 -0.0358406 ] -[ 0.03805163 -0.00589525 0.04790922] -[-0.01307234 -0.00916951 0.02396654]] - -[[ 0.01477884 -0.06549098 -0.01571796] -[ 0.00526886 -0.09617482 0.04676902] -[-0.02132788 -0.04203424 0.04523344]] - -[[ 0.04590619 -0.00251453 -0.00782715] -[ 0.06099087 -0.03445276 0.00022781] -[ 0.0563223 -0.04832596 -0.00948266]]] - -[[[ 0.08444098 -0.05898955 -0.039262 ] -[ 0.08322686 -0.0074796 0.0411371 ] -[-0.02319113 0.02128408 -0.01493311]] - -[[ 0.02473745 -0.02558945 -0.0337843 ] -[-0.03617039 -0.05027632 -0.04603915] -[ 0.03672804 0.00507637 -0.08433761]] - -[[ 0.09628943 0.01895323 -0.02196114] -[ 0.04779419 -0.0871575 0.0055248 ] -[-0.04382382 -0.00511185 -0.01168541]] - -[[ 0.0534859 0.02526264 0.04755395] -[-0.03438103 -0.05877855 0.06530266] -[ 0.0377498 -0.06117418 0.00546303]]]] -``` - -更多ms_function的功能可以参考[ms_function文档](https://mindspore.cn/docs/programming_guide/zh-CN/master/ms_function.html) - -## PyNative下同步执行 - -PyNative模式下算子默认为异步执行,可以通过设置context来控制是否异步执行,当算子执行失败时,可以方便地通过调用栈看到出错的代码位置。 - -设置为同步执行: - -```python -context.set_context(pynative_synchronize=True) -``` - -示例代码: - -```python -import numpy as np -import mindspore.context as context -import mindspore.nn as nn -from mindspore import Tensor -from mindspore import dtype as mstype -import mindspore.ops as ops - -context.set_context(mode=context.PYNATIVE_MODE, device_target="Ascend", pynative_synchronize=True) - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.get_next = ops.GetNext([mstype.float32], [(1, 1)], 1, "test") - - def construct(self, x1,): - x = self.get_next() - x = x + x1 - return x - -context.set_context() -x1 = np.random.randn(1, 1).astype(np.float32) -net = Net() -output = net(Tensor(x1)) -print(output.asnumpy()) -``` - -输出:此时算子为同步执行,当算子执行错误时,可以看到完整的调用栈,找到出错的代码行。 - -```text -Traceback (most recent call last): - File 
"test_pynative_sync_control.py", line 41, in - output = net(Tensor(x1)) - File "mindspore/mindspore/nn/cell.py", line 406, in - output = self.run_construct(cast_inputs, kwargs) - File "mindspore/mindspore/nn/cell.py", line 348, in - output = self.construct(*cast_inputs, **kwargs) - File "test_pynative_sync_control.py", line 33, in - x = self.get_next() - File "mindspore/mindspore/ops/primitive.py", line 247, in - return _run_op(self, self.name, args) - File "mindspore/mindspore/common/api.py", line 77, in - results = fn(*arg, **kwargs) - File "mindspore/mindspore/ops/primitive.py", line 677, in _run_op - output = real_run_op(obj, op_name, args) -RuntimeError: mindspore/ccsrc/runtime/device/kernel_runtime.cc:1006 DebugStreamSync] Op Default/GetNext-op0 run failed! -``` - -## Hook功能 - -调试深度学习网络是每一个深度学习领域的从业者需要面对,且投入大量精力的工作。由于深度学习网络隐藏了中间层算子的输入、输出数据以及反向梯度,只提供网络输入数据(特征量、权重)的梯度,导致无法准确地感知中间层算子的数据变化,从而影响调试效率。为了方便用户准确、快速地对深度学习网络进行调试,MindSpore在PyNative模式下设计了Hook功能。使用Hook功能可以捕获中间层算子的输入、输出数据以及反向梯度。目前,PyNative模式下提供了四种形式的Hook功能,分别是:HookBackward算子和在Cell对象上进行注册的register_forward_pre_hook、register_forward_hook、register_backward_hook功能。 - -### HookBackward算子 - -HookBackward将Hook功能以算子的形式实现。用户初始化一个HookBackward算子,将其安插到深度学习网络中需要捕获梯度的位置。在网络正向执行时,HookBackward算子将输入数据不做任何修改地原样输出;在网络反向传播梯度时,在HookBackward上注册的Hook函数将会捕获反向传播至此的梯度。用户可以在Hook函数中自定义对梯度的操作,比如打印梯度,或者返回新的梯度。 - -示例代码: - -```python -import mindspore -from mindspore import ops -from mindspore import Tensor -from mindspore import context -from mindspore.ops import GradOperation - -context.set_context(mode=context.PYNATIVE_MODE) - -def hook_fn(grad_out): - print(grad_out) - -grad_all = GradOperation(get_all=True) -hook = ops.HookBackward(hook_fn) -def hook_test(x, y): - z = x * y - z = hook(z) - z = z * y - return z - -def net(x, y): - return grad_all(hook_test)(x, y) - -output = net(Tensor(1, mindspore.float32), Tensor(2, mindspore.float32)) -print(output) -``` - -输出: - -```python -(Tensor(shape=[], dtype=Float32, value= 2),) -(Tensor(shape=[], dtype=Float32, value= 4), Tensor(shape=[], dtype=Float32, value= 4)) -``` - -更多HookBackward算子的说明可以参考[API文档](https://mindspore.cn/docs/api/zh-CN/master/api_python/ops/mindspore.ops.HookBackward.html)。 - -### Cell对象的register_forward_pre_hook功能 - -用户可以在Cell对象上使用register_forward_pre_hook函数来注册一个自定义的Hook函数,用来捕获正向传入该Cell对象的数据。register_forward_pre_hook函数接收Hook函数作为入参,并返回一个与Hook函数一一对应的`handle`对象。用户可以通过调用`handle`对象的`remove()`函数来删除与之对应的Hook函数。 -Hook函数应该按照以下的方式进行定义。 - -示例代码: - -```python -def forward_pre_hook_fn(cell_id, inputs): - print("forward inputs: ", inputs) -``` - -这里的`cell_id`是Cell对象的名称以及ID信息,`inputs`是正向传入到Cell对象的数据。因此,用户可以使用register_forward_pre_hook函数来捕获网络中某一个Cell对象的正向输入数据。用户可以在Hook函数中自定义对输入数据的操作,比如打印数据,或者返回新的输入数据给当前的Cell对象。 - -示例代码: - -```python -import numpy as np -import mindspore -import mindspore.nn as nn -from mindspore import Tensor -from mindspore import context -from mindspore.ops import GradOperation - -context.set_context(mode=context.PYNATIVE_MODE) - -def forward_pre_hook_fn(cell_id, inputs): - print("forward inputs: ", inputs) - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.mul = nn.MatMul() - self.handle = self.mul.register_forward_pre_hook(forward_pre_hook_fn) - - def construct(self, x, y): - x = x + x - x = self.mul(x, y) - return x - -grad = GradOperation(get_all=True) -net = Net() -output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32))) -print(output) -net.handle.remove() -output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), 
Tensor(np.ones([1]).astype(np.float32))) -print(output) -``` - -输出: - -```python -forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00])) -(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00])) -(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00])) -``` - -为了避免切换到图模式时脚本运行失败,不建议将register_forward_pre_hook函数直接写在Cell对象的`construct`函数中。 - -### Cell对象的register_forward_hook功能 - -用户可以在Cell对象上使用register_forward_hook函数来注册一个自定义的Hook函数,用来捕获正向传入Cell对象的数据和Cell对象的输出数据。register_forward_hook函数接收Hook函数作为入参,并返回一个与Hook函数一一对应的`handle`对象。用户可以通过调用`handle`对象的`remove()`函数来删除与之对应的Hook函数。 -Hook函数应该按照以下的方式进行定义。 - -示例代码: - -```python -def forward_hook_fn(cell_id, inputs, outputs): - print("forward inputs: ", inputs) - print("forward outputs: ", outputs) -``` - -这里的`cell_id`是Cell对象的名称以及ID信息,`inputs`是正向传入到Cell对象的数据,`outputs`是Cell对象的正向输出数据。因此,用户可以使用register_forward_hook函数来捕获网络中某一个Cell对象的正向输入数据和输出数据。用户可以在Hook函数中自定义对输入、输出数据的操作,比如打印数据,或者返回新的输出数据。 - -示例代码: - -```python -import numpy as np -import mindspore -import mindspore.nn as nn -from mindspore import Tensor -from mindspore import context -from mindspore.ops import GradOperation - -context.set_context(mode=context.PYNATIVE_MODE) - -def forward_hook_fn(cell_id, inputs, outputs): - print("forward inputs: ", inputs) - print("forward outputs: ", outputs) - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.mul = nn.MatMul() - self.handle = self.mul.register_forward_hook(forward_hook_fn) - - def construct(self, x, y): - x = x + x - x = self.mul(x, y) - return x - -grad = GradOperation(get_all=True) -net = Net() -output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32))) -print(output) -net.handle.remove() -output = grad(net)(Tensor(np.ones([1]).astype(np.float32)), Tensor(np.ones([1]).astype(np.float32))) -print(output) -``` - -输出: - -```python -forward inputs: (Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00])) -forward outputs: 2.0 -(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00])) -(Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00])) -``` - -为了避免切换到图模式时脚本运行失败,不建议将register_forward_hook函数直接写在Cell对象的`construct`函数中。 - -### Cell对象的register_backward_hook功能 - -用户可以在Cell对象上使用register_backward_hook接口来注册一个自定义的Hook函数,用来捕获网络反向传播时与Cell对象相关联的梯度。register_backward_hook函数接收Hook函数作为入参,并返回一个与Hook函数一一对应的`handle`对象。用户可以通过调用`handle`对象的`remove()`函数来删除与之对应的Hook函数。 - -与HookBackward算子所使用的自定义Hook函数有所不同,register_backward_hook使用的Hook函数的入参中包含了表示Cell对象名称与id信息的cell_id、反向传入到Cell对象的梯度、以及Cell对象的反向输出的梯度。 - -示例代码: - -```python -def backward_hook_function(cell_id, grad_input, grad_output): - print(grad_input) - print(grad_output) -``` - -这里的`grad_input`是网络反向传播时,传入到Cell对象的梯度,它对应于正向过程中下一个算子的反向输出梯度;`grad_output`是Cell对象反向输出的梯度。因此,用户可以使用register_backward_hook函数来捕获网络中某一个Cell对象的反向传入和反向输出梯度。用户可以在Hook函数中自定义对梯度的操作,比如打印梯度,或者返回新的输出梯度。 - -示例代码: - -```python -import numpy as np -import mindspore -import mindspore.nn as nn -from mindspore import Tensor -from mindspore import context -from mindspore.ops import GradOperation - -context.set_context(mode=context.PYNATIVE_MODE) - -def backward_hook_function(cell_id, grad_input, grad_output): - 
print(grad_input) - print(grad_output) - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.conv = nn.Conv2d(1, 2, kernel_size=2, stride=1, padding=0, weight_init="ones", pad_mode="valid") - self.bn = nn.BatchNorm2d(2, momentum=0.99, eps=0.00001, gamma_init="ones") - self.handle = self.bn.register_backward_hook(backward_hook_function) - self.relu = nn.ReLU() - - def construct(self, x): - x = self.conv(x) - x = self.bn(x) - x = self.relu(x) - return x - -net = Net() -grad_all = GradOperation(get_all=True) -output = grad_all(net)(Tensor(np.ones([1, 1, 2, 2]).astype(np.float32))) -print(output) -net.handle.remove() -output = grad_all(net)(Tensor(np.ones([1, 1, 2, 2]).astype(np.float32))) -print(output) -``` - -输出: - -```python -(Tensor(shape=[1, 2, 1, 1], dtype=Float32, value= -[[[[ 1.00000000e+00]], - [[ 1.00000000e+00]]]]),) -(Tensor(shape=[1, 2, 1, 1], dtype=Float32, value= -[[[[ 9.99994993e-01]], - [[ 9.99994993e-01]]]]),) -(Tensor(shape=[1, 1, 2, 2], dtype=Float32, value= -[[[[ 1.99998999e+00, 1.99998999e+00], - [ 1.99998999e+00, 1.99998999e+00]]]]),) -(Tensor(shape=[1, 1, 2, 2], dtype=Float32, value= -[[[[ 1.99998999e+00, 1.99998999e+00], - [ 1.99998999e+00, 1.99998999e+00]]]]),) -``` - -为了避免切换到图模式时脚本运行失败,不建议将register_backward_hook函数直接写在Cell对象的`construct`函数中。更多关于Cell对象的register_backward_hook功能的说明可以参考[API文档](https://mindspore.cn/docs/api/zh-CN/master/api_python/nn/mindspore.nn.Cell.html#mindspore.nn.Cell.register_backward_hook)。 - -## 自定义bprop功能 - -用户可以自定义Cell对象的反向传播(计算)函数,从而控制Cell对象梯度计算的过程,定位梯度问题。自定义bprop函数的使用方法是:在定义的Cell对象里面增加一个用户自定义的bprop函数。训练的过程中会使用用户自定义的bprop函数来生成反向图。 - -示例代码: - -```python -import mindspore -import mindspore.nn as nn -from mindspore import Tensor -from mindspore import context -from mindspore.ops import GradOperation - -context.set_context(mode=context.PYNATIVE_MODE, device_target="GPU") - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - - def construct(self, x, y): - z = x * y - z = z * y - return z - - def bprop(self, x, y, out, dout): - x_dout = x + y - y_dout = x * y - return x_dout, y_dout - -grad_all = GradOperation(get_all=True) -output = grad_all(Net())(Tensor(1, mindspore.float32), Tensor(2, mindspore.float32)) -print(output) -``` - -输出: - -```python -(Tensor(shape=[], dtype=Float32, value= 3), Tensor(shape=[], dtype=Float32, value= 2)) -```