diff --git a/tutorials/experts/source_en/index.rst b/tutorials/experts/source_en/index.rst index a67e32d20f3d86befbd3f470180c5958fe64f488..d32087d7d9a00c6fd361568cc2f59af3936bd057 100644 --- a/tutorials/experts/source_en/index.rst +++ b/tutorials/experts/source_en/index.rst @@ -40,7 +40,6 @@ For Experts :maxdepth: 1 :caption: Custom Operator - operation/op_ascend operation/op_custom .. toctree:: diff --git a/tutorials/experts/source_en/operation/op_ascend.md b/tutorials/experts/source_en/operation/op_ascend.md deleted file mode 100644 index 192300b08c10a051c5d017d2877ad5c7a1333e4a..0000000000000000000000000000000000000000 --- a/tutorials/experts/source_en/operation/op_ascend.md +++ /dev/null @@ -1,372 +0,0 @@ -# Custom Operators (Ascend) - - - -## Overview - -When built-in operators cannot meet requirements during network development, you can call the Python API of MindSpore to quickly extend custom operators of the Ascend AI processor. - -To add a custom operator, you need to register the operator primitive, implement the operator, and register the operator information. - -The related concepts are as follows: - -- Operator primitive: defines the frontend API prototype of an operator on the network. It is the basic unit for forming a network model and includes the operator name, attribute (optional), input and output names, output shape inference method, and output dtype inference method. -- Operator implementation: describes the implementation of the internal computation logic for an operator through the DSL API provided by the Tensor Boost Engine (TBE). The TBE supports the development of custom operators based on the Ascend AI chip. -- Operator information: describes basic information about a TBE operator, such as the operator name and supported input and output types. It is the basis for the backend to select and map operators. - -This section takes a Square operator as an example to describe how to customize an operator. 
- -> For details, see cases in [tests/st/ops/custom_ops_tbe](https://gitee.com/mindspore/mindspore/tree/master/tests/st/ops/custom_ops_tbe) in the MindSpore source code. - -## Registering the Operator Primitive - -The primitive of an operator is a subclass inherited from `PrimitiveWithInfer`. The type name of the subclass is the operator name. - -The definition of the custom operator primitive is the same as that of the built-in operator primitive. - -- The attribute is defined by the input parameter of the constructor function `__init__`. The operator in this test case has no attribute. Therefore, `__init__` has no extra input parameters. For details about test cases in which operators have attributes, see [custom add3](https://gitee.com/mindspore/mindspore/blob/master/tests/st/ops/custom_ops_tbe/cus_add3.py) in the MindSpore source code. -- The input and output names are defined by the `init_prim_io_names` function. -- The shape inference method of the output tensor is defined in the `infer_shape` function, and the dtype inference method of the output tensor is defined in the `infer_dtype` function. - -The only difference between a custom operator and a built-in operator is that the operator implementation function must be imported in the `__init__` function (`from square_impl import CusSquareImpl`) to register the operator implementation with the backend. In this test case, the operator implementation and information are defined in `square_impl.py`, and the definition will be described in the following parts. - -The following code takes the Square operator primitive `cus_square.py` as an example: - -```python -from mindspore.ops import prim_attr_register, PrimitiveWithInfer -import mindspore.ops as ops -# y = x^2 -class CusSquare(PrimitiveWithInfer): - """ - The definition of the CusSquare primitive. 
- """ - @prim_attr_register - def __init__(self): - self.init_prim_io_names(inputs=['x'], outputs=['y']) - from square_impl import CusSquareImpl # Import the entry function of the kernel implementation from relative path or PYTHONPATH. - - def infer_shape(self, data_shape): - return data_shape - - def infer_dtype(self, data_dtype): - return data_dtype -``` - -## Implementing a TBE Operator and Registering the Operator Information - -### Implementing a TBE Operator - -To write an operator implementation, you usually need to write a computable function and an entry function. - -The computable function of an operator is mainly used to encapsulate the computation logic of the operator for the main function to call. The computation logic is implemented by combining calls to the TBE APIs. - -The entry function of an operator describes the internal process of compiling the operator. The process is as follows: - -1. Prepare placeholders to be input. A placeholder will return a tensor object that represents a group of input data. -2. Call the computable function. The computable function uses the API provided by the TBE to describe the computation logic of the operator. -3. Call the Schedule scheduling module. The module tiles the operator data based on the scheduling description and specifies the data transfer process to ensure optimal hardware execution. By default, the automatic scheduling module (`auto_schedule`) can be used. -4. Call `cce_build_code` to compile and generate an operator binary file. - -> The input parameters of the entry function must be, in order: the information about each operator input, the information about each operator output, operator attributes (optional), and `kernel_name` (name of the generated operator binary file). The input and output information is passed in as dictionaries that contain the actual input and output shape and dtype when the operator is called on the network. 
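As a rough illustration of the argument convention described above, the sketch below mimics only how an entry function reads its dictionary arguments. It is plain Python and makes no TBE calls; the helper name `describe_entry_args` and the sample dictionaries are hypothetical and are not part of MindSpore or TBE.

```python
def describe_entry_args(input_x, output_y, kernel_name="CusSquareImpl"):
    """Read shape and dtype the way an entry function would (hypothetical helper, no TBE calls)."""
    # Each input and output is described by a dictionary carrying the shape and
    # dtype that the operator receives when it is called on the network.
    shape = tuple(input_x.get("shape"))
    dtype = input_x.get("dtype").lower()
    return {
        "kernel_name": kernel_name,
        "shape": shape,
        "dtype": dtype,
        "output_dtype": output_y.get("dtype").lower(),
    }


# Sample dictionaries (assumed values, for illustration only).
info = describe_entry_args({"shape": [3], "dtype": "Float32"},
                           {"shape": [3], "dtype": "Float32"})
print(info)
```

The order of the parameters mirrors the rule in the note: inputs first, then outputs, then `kernel_name`.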
- -For details about TBE operator development, visit the [TBE website](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_10_0063.html). For details about how to debug and optimize the TBE operator, visit the [Mind Studio website](https://support.huaweicloud.com/usermanual-mindstudioc73/atlasmindstudio_02_0043.html). - -### Registering the Operator Information - -The operator information is key for the backend to select the operator implementation and guides the backend to insert appropriate type and format conversion operators. It uses the `TBERegOp` API for definition and uses the `op_info_register` decorator to bind the operator information to the entry function of the operator implementation. When the .py operator implementation file is imported, the `op_info_register` decorator registers the operator information to the operator information library at the backend. For details about how to use the operator information, see comments for the member method of `TBERegOp`. For the specific field meaning of the operator information, visit the [TBE website](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_10_0096.html). - -> - The numbers and sequences of the input and output information defined in the operator information must be the same as those in the parameters of the entry function of the operator implementation and those listed in the operator primitive. -> -> - If an operator has attributes, use `attr` to describe the attribute information in the operator information. The attribute names must be the same as those in the operator primitive definition. - -### Example - -The following takes the TBE implementation `square_impl.py` of the `Square` operator as an example. `square_compute` is a computable function of the operator implementation. It describes the computation logic of `x * x` by calling the API provided by `te.lang.cce`. `cus_square_op_info` is the operator information, which is defined by `TBERegOp`. 
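The consistency rules noted above (the same number and order of inputs and outputs in the primitive and in the operator information, and one `dtype_format` entry per input and output) can be sketched as a small check. This is a plain-Python illustration under stated assumptions; `check_io_consistency` is a hypothetical helper and not how the MindSpore backend validates registrations.

```python
def check_io_consistency(prim_inputs, prim_outputs, info_inputs, info_outputs, dtype_formats):
    """Return a list of problems; an empty list means the descriptions agree (hypothetical helper)."""
    problems = []
    # Names and order must match between the primitive and the operator information.
    if list(prim_inputs) != list(info_inputs):
        problems.append("input names or order differ")
    if list(prim_outputs) != list(info_outputs):
        problems.append("output names or order differ")
    # Each dtype_format lists one entry per input followed by one per output.
    expected = len(info_inputs) + len(info_outputs)
    for fmt in dtype_formats:
        if len(fmt) != expected:
            problems.append(f"dtype_format {fmt} needs {expected} entries")
    return problems


# Values taken from the CusSquare example: one input 'x', one output 'y'.
problems = check_io_consistency(
    ["x"], ["y"], ["x"], ["y"],
    [("F32_Default", "F32_Default"), ("F16_Default", "F16_Default")],
)
print(problems)  # []
```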
- -Note the following parameters when setting `TBERegOp`: - -- `CusSquare` in `TBERegOp("CusSquare")` is the registered operator name and must be the same as the operator name. -- `OPAQUE` in `fusion_type("OPAQUE")` indicates that the custom operator uses the non-fusion strategy. -- `CusSquareImpl` in `kernel_name("CusSquareImpl")` must be the same as the name of the operator entry function. -- `dtype_format` is used to describe data types supported by the operator. In the following example, two entries are registered, indicating that the operator supports two data types. Each entry describes the supported formats in the order of inputs and outputs. The first `dtype_format` indicates that input0 is in F32_Default format and output0 is in F32_Default format. The second `dtype_format` indicates that input0 is in F16_Default format and output0 is in F16_Default format. -- For details about the `auto_schedule` and `cce_build_code` interfaces, see the TBE documents [auto_schedule](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_07_0071.html) and [cce_build_code](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_07_0072.html). - -```python -from __future__ import absolute_import -from te import tvm -from topi import generic -import te.lang.cce -from topi.cce import util -from mindspore.ops import op_info_register, TBERegOp, DataType - -def square_compute(input_x): - """ - The compute function of the CusSquare implementation. - """ - res = te.lang.cce.vmul(input_x, input_x) - return res - -# Define the kernel info of CusSquare. 
-cus_square_op_info = TBERegOp("CusSquare") \ - .fusion_type("OPAQUE") \ - .partial_flag(True) \ - .async_flag(False) \ - .binfile_name("square.so") \ - .compute_cost(10) \ - .kernel_name("CusSquareImpl") \ - .input(0, "x", False, "required", "all") \ - .output(0, "y", False, "required", "all") \ - .dtype_format(DataType.F32_Default, DataType.F32_Default) \ - .dtype_format(DataType.F16_Default, DataType.F16_Default) \ - .get_op_info() - -# Binding kernel info with the kernel implementation. -@op_info_register(cus_square_op_info) -def CusSquareImpl(input_x, output_y, kernel_name="CusSquareImpl"): - """ - The entry function of the CusSquare implementation. - """ - shape = input_x.get("shape") - dtype = input_x.get("dtype").lower() - - shape = util.shape_refine(shape) - data = tvm.placeholder(shape, name="data", dtype=dtype.lower()) - - with tvm.target.cce(): - res = square_compute(data) - sch = generic.auto_schedule(res) - - config = {"print_ir": False, - "name": kernel_name, - "tensor_list": [data, res]} - - te.lang.cce.cce_build_code(sch, config) -``` - -The usage of custom operators is the same as that of built-in operators in the network. The operators can be directly used by importing primitives. The following takes the single-operator network test of `CusSquare` as an example. - -Define the network in the `test_square.py` file. - -```python -import numpy as np -import mindspore.nn as nn -import mindspore as ms -# Import the definition of the CusSquare primitive. -from cus_square import CusSquare -ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend") - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.square = CusSquare() - - def construct(self, data): - return self.square(data) - -def test_net(): - x = np.array([1.0, 4.0, 9.0]).astype(np.float32) - square = Net() - output = square(ms.Tensor(x)) - print("x: ", x) - print("output: ", output) -``` - -Execute the test case. 
- -```bash -pytest -s tests/st/ops/custom_ops_tbe/test_square.py::test_net -``` - -The execution result is as follows: - -```text -x: [1. 4. 9.] -output: [1. 16. 81.] -``` - -## Implementing an AICPU Operator and Registering the Operator Information - -### Implementing an AICPU Operator - -Developing an AICPU operator based on CANN involves defining the operator prototype, implementing the operator code, and defining the operator information library. For the specific development steps, see [CANN AICPU Custom Operator Development](https://support.huaweicloud.com/usermanual-mindstudio303/atlasms_02_0194.html). - -After development is complete, the operator is compiled into a file with a specified name, such as `libmindspore_aicpu_kernels.so` or `libcust_reshape.so`. Such a dynamic library can contain one or more AICPU operator implementations. Put the file into the lib directory under the MindSpore installation or compilation directory, and MindSpore can then load it according to the custom operator registration information described later. - -> The dynamic library file of the operator implementation needs to be placed in the lib directory of MindSpore. For example, if MindSpore is installed in the virtual environment `/home/conda/envs/aicpu/lib/python3.7/site-packages/mindspore`, the AICPU .so file needs to be placed in the `/home/conda/envs/aicpu/lib/python3.7/site-packages/mindspore/lib/` directory so that it can be loaded normally. - -For more information on debugging and performance optimization of AICPU operators, see [MindStudio Documentation](https://support.huaweicloud.com/usermanual-mindstudioc73/atlasmindstudio_02_0043.html). - -### Registering the AICPU Custom Operator Information - -After completing the previous step, as with the TBE operator, we need to supplement the operator information. 
The AICPU operator information is defined through the `AiCPURegOp` interface and bound to the entry function of the operator implementation through the `op_info_register` decorator. When the .py file of the operator implementation is imported, the `op_info_register` decorator registers the operator information in the operator information library at the backend. For details about how to use the operator information, see the comments on the member methods of `AiCPURegOp`. For the meaning of the operator information fields, see the [AICPU Documentation](https://support.huaweicloud.com/usermanual-mindstudio303/atlasms_02_0194.html). - -> - The number and order of the inputs and outputs defined in the operator information, in the parameters of the entry function of the operator implementation, and in the input and output name lists of the operator primitive must be exactly the same. -> - If the operator has attributes, use `attr` to describe the attribute information in the operator information. The attribute names must be the same as those in the operator primitive definition. - -Note that, in addition to the basic registration information, we need to add an extra `attr("cust_aicpu", "str")` attribute, which is used to obtain the name of the .so file that contains the operator implementation. Taking the `RandomChoiceWithMask` operator as an example, assume that the operator primitive has been defined and the operator implementation has been compiled to `librandom_choice_with_mask.so`. We then only need to add `attr("cust_aicpu", "str")` to the operator information and, when the operator is defined, set the attribute value to `"random_choice_with_mask"` to register the operator in the custom AICPU operator list. 
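The `cust_aicpu` value is the library file name without the `lib` prefix and the `.so` suffix. A minimal sketch of that naming rule follows; the helper name `cust_aicpu_value` is hypothetical and is not a MindSpore API.

```python
def cust_aicpu_value(so_filename):
    """Derive the `cust_aicpu` string from a library file name (hypothetical helper)."""
    prefix, suffix = "lib", ".so"
    if not (so_filename.startswith(prefix) and so_filename.endswith(suffix)):
        raise ValueError(f"expected a name like 'lib<name>.so', got {so_filename!r}")
    # Strip the leading 'lib' and the trailing '.so'.
    return so_filename[len(prefix):-len(suffix)]


print(cust_aicpu_value("librandom_choice_with_mask.so"))  # random_choice_with_mask
print(cust_aicpu_value("libmindspore_aicpu_kernels.so"))  # mindspore_aicpu_kernels
```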
- -> The value of `cust_aicpu` is a string: the name of the operator .so file with the `lib` prefix and the `.so` suffix removed. For example, for `libmindspore_aicpu_kernels.so`, set it to `"mindspore_aicpu_kernels"`. - -```python -from mindspore.ops import op_info_register, AiCPURegOp, DataType - -random_choice_with_mask_op_info = AiCPURegOp("RandomChoiceWithMask") \ - .fusion_type("OPAQUE") \ - .input(0, "x", "required") \ - .output(0, "y", "required") \ - .output(1, "mask", "required") \ - .attr("count", "int") \ - .attr("seed", "int") \ - .attr("seed2", "int") \ - .attr("cust_aicpu", "str") \ - .dtype_format(DataType.BOOL_Default, DataType.I32_Default, DataType.BOOL_Default) \ - .get_op_info() - -@op_info_register(random_choice_with_mask_op_info) -def _random_choice_with_mask_aicpu(): - """RandomChoiceWithMask AiCPU register""" - return -``` - -### Example - -The following takes the AICPU implementation of the `Dropout2D` operator as an example and goes through four steps: operator implementation, operator primitive registration, operator information registration, and operator call: - -1. Operator implementation: Following [Implementing an AICPU Operator](#implementing-an-aicpu-operator), we compile the operator to `libmindspore_aicpu_kernels.so`. -2. Operator primitive registration: Following [Registering the Operator Primitive](#registering-the-operator-primitive), we define a Dropout2D operator. -3. Operator information registration: Following [Registering the AICPU Custom Operator Information](#registering-the-aicpu-custom-operator-information), we define the operator information of Dropout2D and add the `"cust_aicpu"` attribute. -4. Operator call: We can call the Dropout2D operator as usual in a single-operator network and set the value of the `"cust_aicpu"` attribute to `mindspore_aicpu_kernels`. 
- -```python -import numpy as np -from mindspore.ops import prim_attr_register, PrimitiveWithInfer -import mindspore as ms -from mindspore.ops import op_info_register, AiCPURegOp, DataType -import mindspore.nn as nn -import mindspore.ops as ops -ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend") - -class Dropout2D(PrimitiveWithInfer): - @prim_attr_register - def __init__(self, keep_prob=0.5): - """Initialize Dropout2D.""" - pass - - def infer_shape(self, x_shape): - return x_shape, x_shape - - def infer_dtype(self, x_dtype): - mask_dtype = ms.tensor_type(ms.bool_) - return x_dtype, mask_dtype - -dropout2d_op_info = AiCPURegOp("Dropout2D") \ - .fusion_type("OPAQUE") \ - .input(0, "x", "required") \ - .output(0, "y", "required") \ - .output(1, "mask", "required") \ - .attr("keep_prob", "float") \ - .attr("cust_aicpu", "str") \ - .dtype_format(DataType.BOOL_Default, DataType.BOOL_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I8_Default, DataType.I8_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I16_Default, DataType.I16_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I32_Default, DataType.I32_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I64_Default, DataType.I64_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U8_Default, DataType.U8_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U16_Default, DataType.U16_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U32_Default, DataType.U32_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U64_Default, DataType.U64_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.F16_Default, DataType.F16_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.F32_Default, DataType.F32_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.F64_Default, DataType.F64_Default, DataType.BOOL_Default) \ - .get_op_info() - -@op_info_register(dropout2d_op_info) -def _dropout2d_aicpu(): - """Dropout2D AiCPU register""" - return - -class 
NetDropout2D(nn.Cell): - def __init__(self, keep_prob=0.5): - super(NetDropout2D, self).__init__() - self.op = Dropout2D(keep_prob) - self.op.add_prim_attr("cust_aicpu", "mindspore_aicpu_kernels") - - def construct(self, inputs): - return self.op(inputs) - -if __name__ == "__main__": - input_tensor = ms.Tensor(np.ones([1, 1, 2, 3]), ms.float32) - dropout2d_nn = NetDropout2D(0.5) - output, mask = dropout2d_nn(input_tensor) - print("output: ", output) - print("mask: ", mask) -``` - -The execution result is as follows: - -```text -output: [[[[0.0.0.] - [0.0.0.]]]] -mask: [[[[False False False] - [False False False]]]] -``` - -## Defining the bprop Function for an Operator - -If an operator needs to support automatic differentiation, the bprop function needs to be defined in the primitive of the operator. In the bprop function, you need to describe the backward computation logic that uses the forward input, forward output, and output gradients to obtain the input gradients. The backward computation logic can be composed of built-in operators or custom backward operators. - -Note the following points when defining the bprop function: - -- The input parameter sequence of the bprop function is the forward input, forward output, and output gradients. For a multi-output operator, the forward output and output gradients are provided in the form of tuples. -- The return value of the bprop function is tuples consisting of input gradients. The sequence of elements in a tuple is the same as that of the forward input parameters. Even if there is only one input gradient, the return value must be a tuple. 
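The two conventions above can be illustrated without MindSpore. The sketch below is plain Python operating on lists, not the `CusSquare` primitive itself; `square_forward` and `square_bprop` are hypothetical names used only to show the argument order and the tuple return value.

```python
def square_forward(x):
    """Forward pass: y = x^2, element-wise on a list of floats."""
    return [v * v for v in x]


def square_bprop(data, out, dout):
    """Backward pass; arguments follow the order (forward input, forward output, output gradient)."""
    # d(x^2)/dx = 2x, scaled by the incoming gradient.
    dx = [2.0 * v * g for v, g in zip(data, dout)]
    # One gradient per forward input, returned as a tuple even for a single input.
    return (dx,)


x = [1.0, 4.0, 9.0]
sens = [1.0, 1.0, 1.0]
(dx,) = square_bprop(x, square_forward(x), sens)
print(dx)  # [2.0, 8.0, 18.0]
```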
- -For example, the `CusSquare` primitive after the bprop function is added is as follows: - -```python -class CusSquare(PrimitiveWithInfer): - @prim_attr_register - def __init__(self): - """init CusSquare""" - self.init_prim_io_names(inputs=['x'], outputs=['y']) - from square_impl import CusSquareImpl - - def infer_shape(self, data_shape): - return data_shape - - def infer_dtype(self, data_dtype): - return data_dtype - - def get_bprop(self): - def bprop(data, out, dout): - twos_like = ops.OnesLike()(data) * 2.0 - gradient = ops.Mul()(data, twos_like) - dx = ops.Mul()(gradient, dout) - return (dx,) - return bprop -``` - -Define backward cases in the `test_square.py` file. - -```python -import mindspore.ops as ops -def test_grad_net(): - x = np.array([1.0, 4.0, 9.0]).astype(np.float32) - sens = np.array([1.0, 1.0, 1.0]).astype(np.float32) - square = Net() - grad = ops.GradOperation(sens_param=True) - dx = grad(square)(ms.Tensor(x), ms.Tensor(sens)) - print("x: ", x) - print("dx: ", dx) -``` - -Execute the test case. - -```bash -pytest -s tests/st/ops/custom_ops_tbe/test_square.py::test_grad_net -``` - -The execution result is as follows: - -```text -x: [1. 4. 9.] -dx: [2. 8. 18.] -``` diff --git a/tutorials/experts/source_zh_cn/index.rst b/tutorials/experts/source_zh_cn/index.rst index 2652c85cc598a2fdb10e134207e9030d8b7e9f8c..5f14387f7b0032c718bf3bb81fc5f1d50bd71fa2 100644 --- a/tutorials/experts/source_zh_cn/index.rst +++ b/tutorials/experts/source_zh_cn/index.rst @@ -40,7 +40,6 @@ :maxdepth: 1 :caption: 自定义算子 - operation/op_ascend operation/op_custom .. 
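toctree:: diff --git a/tutorials/experts/source_zh_cn/operation/op_ascend.md b/tutorials/experts/source_zh_cn/operation/op_ascend.md deleted file mode 100644 index 0cfc6e316f7460326832cca3fe3fe809f2437214..0000000000000000000000000000000000000000 --- a/tutorials/experts/source_zh_cn/operation/op_ascend.md +++ /dev/null @@ -1,375 +0,0 @@ -# Custom Operators (Ascend) - - - -## Overview - -When built-in operators cannot meet your requirements during network development, you can use the Python API of MindSpore to quickly and conveniently extend custom operators for the Ascend AI processor. - -Adding a custom operator involves three tasks: registering the operator primitive, implementing the operator, and registering the operator information. - -Specifically: - -- Operator primitive: defines the frontend API prototype of an operator in the network. It is also the basic unit for building a network model and mainly includes the operator name, attributes (optional), input and output names, output shape inference method, and output dtype inference method. -- Operator implementation: describes the implementation of the internal computation logic of an operator through the domain-specific language APIs provided by the Tensor Boost Engine (TBE). TBE provides the capability to develop custom operators for the Ascend AI chip. -- Operator information: describes the basic information about a TBE operator, such as the operator name and the supported input and output types. It is the basis on which the backend selects and maps operators. - -This document takes a custom Square operator as an example to describe the steps for customizing an operator. - -> For more details, see the test cases under [tests/st/ops/custom_ops_tbe](https://gitee.com/mindspore/mindspore/tree/master/tests/st/ops/custom_ops_tbe) in the MindSpore source code. 
- -## Registering the Operator Primitive - -The primitive of each operator is a subclass of `PrimitiveWithInfer`, and its type name is the operator name. - -The interface definition of a custom operator primitive is exactly the same as that of a built-in operator primitive: - -- Attributes are defined by the input parameters of the constructor `__init__`. The operator in this example has no attributes, so `__init__` has no extra input parameters. For an example with attributes, see the [custom add3](https://gitee.com/mindspore/mindspore/blob/master/tests/st/ops/custom_ops_tbe/cus_add3.py) case in the MindSpore source code. -- The input and output names are defined through the `init_prim_io_names` function. -- The shape inference method of the output tensor is defined in the `infer_shape` function, and the dtype inference method of the output tensor is defined in the `infer_dtype` function. - -The only difference between a custom operator and a built-in operator is that the operator implementation function must be imported in the `__init__` function (`from square_impl import CusSquareImpl`) to register the operator implementation with the backend. In this example, the operator implementation and operator information are defined in `square_impl.py`, which is described later. - -The following sample code takes the Square operator primitive `cus_square.py` as an example. - -```python -from mindspore.ops import prim_attr_register, PrimitiveWithInfer -import mindspore.ops as ops -# y = x^2 -class CusSquare(PrimitiveWithInfer): - """ - The definition of the CusSquare primitive. - """ - @prim_attr_register - def __init__(self): - self.init_prim_io_names(inputs=['x'], outputs=['y']) - from square_impl import CusSquareImpl # Import the entry function of the kernel implementation from relative path or PYTHONPATH. 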
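- - def infer_shape(self, data_shape): - return data_shape - - def infer_dtype(self, data_dtype): - return data_dtype -``` - -## Implementing a TBE Operator and Registering the Operator Information - -### Implementing a TBE Operator - -To write an operator implementation, you usually need to write a computable function and an entry function. - -The computable function of an operator mainly encapsulates the computation logic of the operator for the main function to call. Internally, it implements the computation logic by combining calls to TBE APIs. - -The entry function of an operator describes the internal process of compiling the operator, which generally consists of the following steps: - -1. Prepare the input placeholders. A placeholder returns a tensor object that represents a group of input data. -2. Call the computable function. The computable function uses the APIs provided by TBE to describe the internal computation logic of the operator. -3. Call the Schedule module. The scheduling module tiles the operator data according to its scheduling description and specifies the data transfer process to ensure optimal execution on the hardware. By default, the automatic scheduling module (`auto_schedule`) can be used. -4. Call `cce_build_code` to compile and generate the operator binary. - -> The input parameters of the entry function have special requirements. They must be, in order: the information about each operator input, the information about each operator output, the operator attributes (optional), and `kernel_name` (the name of the generated operator binary). The input and output information is passed in as dictionaries that contain the actual input and output shape and dtype when the operator is called in the network. - -For more information about developing operators with TBE, see the [TBE documentation](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_10_0063.html). 
For debugging and performance optimization of TBE operators, see the [MindStudio documentation](https://support.huaweicloud.com/usermanual-mindstudioc73/atlasmindstudio_02_0043.html). - -### Registering the Operator Information - -The operator information is the key information that guides the backend in selecting an operator implementation, and it also guides the backend in inserting appropriate type and format conversions for the operator. It is defined through the `TBERegOp` interface and bound to the entry function of the operator implementation through the `op_info_register` decorator. When the .py file of the operator implementation is imported, the `op_info_register` decorator registers the operator information in the operator information library at the backend. For details about how to use the operator information, see the comments on the member methods of `TBERegOp`. For the meaning of the operator information fields, see the [TBE documentation](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_10_0096.html). - -> - The number and order of the inputs and outputs defined in the operator information, in the parameters of the entry function of the operator implementation, and in the input and output name lists of the operator primitive must be exactly the same. -> - If the operator has attributes, use `attr` to describe the attribute information in the operator information. The attribute names must be the same as those in the operator primitive definition. - -### Example - -The following takes the TBE implementation `square_impl.py` of the `Square` operator as an example. `square_compute` is the computable function of the operator implementation. It describes the computation logic of `x * x` by calling the APIs provided by `te.lang.cce`. `cus_square_op_info` is the operator information, which is defined through `TBERegOp`. 
- -Note the following points when setting `TBERegOp`: - -- `CusSquare` in `TBERegOp("CusSquare")` is the registered operator name and must be the same as the operator name. -- `OPAQUE` in `fusion_type("OPAQUE")` indicates that the custom operator uses the non-fusion strategy. -- `CusSquareImpl` in `kernel_name("CusSquareImpl")` must be the same as the name of the operator entry function. -- `dtype_format` describes the data types supported by the operator. In the following example, two entries are registered, indicating that the operator supports two data types. Each entry describes the supported formats in the order of inputs and outputs. The first `dtype_format` indicates that input0 is in F32_Default format and output0 is in F32_Default format. The second `dtype_format` indicates that input0 is in F16_Default format and output0 is in F16_Default format. -- 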
For descriptions of TBE interfaces such as `auto_schedule` and `cce_build_code`, see the detailed descriptions of [auto_schedule](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_07_0071.html) and [cce_build_code](https://support.huaweicloud.com/odevg-A800_3000_3010/atlaste_07_0072.html) in the TBE documentation. - -```python -from __future__ import absolute_import -from te import tvm -from topi import generic -import te.lang.cce -from topi.cce import util -from mindspore.ops import op_info_register, TBERegOp, DataType - -def square_compute(input_x): - """ - The compute function of the CusSquare implementation. - """ - res = te.lang.cce.vmul(input_x, input_x) - return res - -# Define the kernel info of CusSquare. -cus_square_op_info = TBERegOp("CusSquare") \ - .fusion_type("OPAQUE") \ - .partial_flag(True) \ - .async_flag(False) \ - .binfile_name("square.so") \ - .compute_cost(10) \ - .kernel_name("CusSquareImpl") \ - .input(0, "x", False, "required", "all") \ - .output(0, "y", False, "required", "all") \ - .dtype_format(DataType.F32_Default, DataType.F32_Default) \ - .dtype_format(DataType.F16_Default, DataType.F16_Default) \ - .get_op_info() - -# Binding kernel info with the kernel implementation. -@op_info_register(cus_square_op_info) -def CusSquareImpl(input_x, output_y, kernel_name="CusSquareImpl"): - """ - The entry function of the CusSquare implementation. - """ - shape = input_x.get("shape") - dtype = input_x.get("dtype").lower() - - shape = util.shape_refine(shape) - data = tvm.placeholder(shape, name="data", dtype=dtype.lower()) - - with tvm.target.cce(): - res = square_compute(data) - sch = generic.auto_schedule(res) - - config = {"print_ir": False, - "name": kernel_name, - "tensor_list": [data, res]} - - te.lang.cce.cce_build_code(sch, config) -``` - -Custom operators are used in a network in the same way as built-in operators: import the primitive and use it directly. 
The following takes the single-operator network test of `CusSquare` as an example. - -Define the network in the `test_square.py` file. - -```python -import numpy as np -import mindspore.nn as nn -import mindspore as ms -# Import the definition of the CusSquare primitive. 
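-from cus_square import CusSquare -ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend") - -class Net(nn.Cell): - def __init__(self): - super(Net, self).__init__() - self.square = CusSquare() - - def construct(self, data): - return self.square(data) - -def test_net(): - x = np.array([1.0, 4.0, 9.0]).astype(np.float32) - square = Net() - output = square(ms.Tensor(x)) - print("x: ", x) - print("output: ", output) -``` - -Execute the test case: - -```bash -pytest -s tests/st/ops/custom_ops_tbe/test_square.py::test_net -``` - -The execution result is as follows: - -```text -x: [1. 4. 9.] -output: [1. 16. 81.] -``` - -## Implementing an AICPU Operator and Registering the Operator Information - -### Implementing an AICPU Operator - -Developing an AICPU operator based on CANN involves defining the operator prototype, implementing the operator code, and defining the operator information library. For the specific development steps, see [CANN AICPU Custom Operator Development](https://support.huaweicloud.com/usermanual-mindstudio303/atlasms_02_0194.html). - -After development is complete, the operator is compiled into a file with a specified name, such as `libmindspore_aicpu_kernels.so` or `libcust_reshape.so`. Such a dynamic library can contain one or more AICPU operator implementations. Put the file into the lib directory under the MindSpore installation or compilation directory, and MindSpore can then load it according to the custom operator registration information described later. - -> The dynamic library file of the operator implementation needs to be placed in the lib directory of MindSpore. For example, if MindSpore is installed in the virtual environment `/home/conda/envs/aicpu/lib/python3.7/site-packages/mindspore`, the AICPU .so file needs to be placed in the `/home/conda/envs/aicpu/lib/python3.7/site-packages/mindspore/lib/` directory so that it can be loaded normally. 
- -For more information on debugging and performance optimization of AICPU operators, see the [MindStudio documentation](https://support.huaweicloud.com/usermanual-mindstudioc73/atlasmindstudio_02_0043.html). - -### Registering the AICPU Custom Operator Information - -After completing the previous step, as with the TBE operator, we need to supplement the operator information. The AICPU operator information is defined through the `AiCPURegOp` interface and bound to the entry function of the operator implementation through the `op_info_register` decorator. When the .py file of the operator implementation is imported, the `op_info_register` decorator registers the operator information in the operator information library at the backend. For details about how to use the operator information, see the comments on the member methods of `AiCPURegOp`. For the meaning of the operator information fields, see the [AICPU documentation](https://support.huaweicloud.com/usermanual-mindstudio303/atlasms_02_0194.html). - -> - The number and order of the inputs and outputs defined in the operator information, in the parameters of the entry function of the operator implementation, and in the input and output name lists of the operator primitive must be exactly the same. -> - If the operator has attributes, use `attr` to describe the attribute information in the operator information. The attribute names must be the same as those in the operator primitive definition. - -Note that, in addition to the basic registration information, we need to add an extra `attr("cust_aicpu", "str")` attribute, which is used to obtain the name of the .so file that contains the operator implementation. Taking the `RandomChoiceWithMask` operator as an example, assume that the operator primitive has been defined and the operator implementation has been compiled to `librandom_choice_with_mask.so`. 
We then only need to add `attr("cust_aicpu", "str")` to the operator information and, when the operator is defined, set the attribute value to `"random_choice_with_mask"` to register the operator in the custom AICPU operator list. - -> The value of `cust_aicpu` is a string: the name of the operator .so file with the `lib` prefix and the `.so` suffix removed. For example, for `libmindspore_aicpu_kernels.so`, set it to `"mindspore_aicpu_kernels"`. - -```python -from mindspore.ops import op_info_register, AiCPURegOp, DataType - -random_choice_with_mask_op_info = AiCPURegOp("RandomChoiceWithMask") \ - .fusion_type("OPAQUE") \ - .input(0, "x", "required") \ - .output(0, "y", "required") \ - .output(1, "mask", "required") \ - .attr("count", "int") \ - .attr("seed", "int") \ - .attr("seed2", "int") \ - .attr("cust_aicpu", "str") \ - .dtype_format(DataType.BOOL_Default, DataType.I32_Default, DataType.BOOL_Default) \ - .get_op_info() - -@op_info_register(random_choice_with_mask_op_info) -def _random_choice_with_mask_aicpu(): - """RandomChoiceWithMask AiCPU register""" - return -``` - -### Example - -The following takes the AICPU implementation of the `Dropout2D` operator as an example and goes through four steps: operator implementation, operator primitive registration, operator information registration, and operator call: - -1. Operator implementation: Following [Implementing an AICPU Operator](#implementing-an-aicpu-operator), we compile the operator to `libmindspore_aicpu_kernels.so`. -2. Operator primitive registration: Following [Registering the Operator Primitive](#registering-the-operator-primitive), we define a Dropout2D operator. -3. Operator information registration: Following [Registering the AICPU Custom Operator Information](#registering-the-aicpu-custom-operator-information), we define the operator information of Dropout2D and add the `"cust_aicpu"` attribute. -4. 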
Operator call: We can call the Dropout2D operator as usual in a single-operator network and set the value of the `"cust_aicpu"` attribute to `mindspore_aicpu_kernels`. - -```python -import numpy as np -from mindspore.ops import prim_attr_register, PrimitiveWithInfer -from mindspore.ops import op_info_register, AiCPURegOp, DataType -import mindspore.nn as nn -import mindspore.ops as ops -import mindspore as ms -ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend") - -class Dropout2D(PrimitiveWithInfer): - @prim_attr_register - def __init__(self, keep_prob=0.5): - """Initialize Dropout2D.""" - pass - - def infer_shape(self, x_shape): - return x_shape, x_shape - - def infer_dtype(self, x_dtype): - mask_dtype = ms.tensor_type(ms.bool_) - return x_dtype, mask_dtype - -dropout2d_op_info = AiCPURegOp("Dropout2D") \ - .fusion_type("OPAQUE") \ - .input(0, "x", "required") \ - .output(0, "y", "required") \ - .output(1, "mask", "required") \ - .attr("keep_prob", "float") \ - .attr("cust_aicpu", "str") \ - .dtype_format(DataType.BOOL_Default, DataType.BOOL_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I8_Default, DataType.I8_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I16_Default, DataType.I16_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I32_Default, DataType.I32_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.I64_Default, DataType.I64_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U8_Default, DataType.U8_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U16_Default, DataType.U16_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U32_Default, DataType.U32_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.U64_Default, DataType.U64_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.F16_Default, DataType.F16_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.F32_Default, DataType.F32_Default, DataType.BOOL_Default) \ - .dtype_format(DataType.F64_Default, 
DataType.F64_Default, DataType.BOOL_Default) \ - .get_op_info() - 
-@op_info_register(dropout2d_op_info) -def _dropout2d_aicpu(): - """Dropout2D AiCPU register""" - return - -class NetDropout2D(nn.Cell): - def __init__(self, keep_prob=0.5): - super(NetDropout2D, self).__init__() - self.op = Dropout2D(keep_prob) - self.op.add_prim_attr("cust_aicpu", "mindspore_aicpu_kernels") - - def construct(self, inputs): - return self.op(inputs) - -if __name__ == "__main__": - input_tensor = ms.Tensor(np.ones([1, 1, 2, 3]), ms.float32) - dropout2d_nn = NetDropout2D(0.5) - output, mask = dropout2d_nn(input_tensor) - print("output: ", output) - print("mask: ", mask) -``` - -The execution result is as follows: - -```text -output: [[[[0.0.0.] - [0.0.0.]]]] -mask: [[[[False False False] - [False False False]]]] -``` - -## Defining the bprop Function for an Operator - -If an operator needs to support automatic differentiation, its backpropagation function (bprop) must be defined in its primitive. In bprop, you describe the backward computation logic that obtains the input gradients from the forward inputs, forward outputs, and output gradients. The backward computation logic can be composed of built-in operators or custom backward operators. - -Note the following points when defining the bprop function: - -- The input parameters of the bprop function are, in order, the forward inputs, the forward outputs, and the output gradients. For a multi-output operator, the forward outputs and output gradients are provided as tuples. -- The return value of the bprop function is a tuple of input gradients. The order of the elements in the tuple is the same as that of the forward input parameters. Even if there is only one input gradient, the return value must be a tuple. 
- -For example, the `CusSquare` primitive after bprop is added is as follows: - -```python -from mindspore.ops import prim_attr_register, PrimitiveWithInfer -import mindspore.ops as ops - -class CusSquare(PrimitiveWithInfer): - @prim_attr_register - def __init__(self): - """init CusSquare""" - self.init_prim_io_names(inputs=['x'], outputs=['y']) - from square_impl import CusSquareImpl - - def infer_shape(self, data_shape): - return data_shape - - def infer_dtype(self, data_dtype): - return data_dtype - - def get_bprop(self): - def bprop(data, out, dout): - twos_like = ops.OnesLike()(data) * 2.0 - gradient = ops.Mul()(data, twos_like) - dx = ops.Mul()(gradient, dout) - return (dx,) - return bprop -``` - -Define the backward test case in the `test_square.py` file. - -```python -import mindspore.ops as ops -def test_grad_net(): - x = np.array([1.0, 4.0, 9.0]).astype(np.float32) - sens = np.array([1.0, 1.0, 1.0]).astype(np.float32) - square = Net() - grad = ops.GradOperation(sens_param=True) - dx = grad(square)(ms.Tensor(x), ms.Tensor(sens)) - print("x: ", x) - 
print("dx: ", dx) -``` - -Execute the test case: - -```bash -pytest -s tests/st/ops/custom_ops_tbe/test_square.py::test_grad_net -``` - -The execution result is as follows: - -```text -x: [1. 4. 9.] -dx: [2. 8. 18.] -```