diff --git a/tutorials/source_en/custom_program/op_custom.rst b/tutorials/source_en/custom_program/op_custom.rst
index cc50424a27daf70d96d1ae3dfd0f6fe4ad886564..00bff33121e1dab6c61f3b824ec87e87e5b589d8 100644
--- a/tutorials/source_en/custom_program/op_custom.rst
+++ b/tutorials/source_en/custom_program/op_custom.rst
@@ -16,7 +16,6 @@ Custom Operators
     operation/op_customopbuilder
     operation/cpp_api_for_custom_ops
     operation/op_customopbuilder_atb
-    operation/op_customopbuilder_function
 
 When built-in operators cannot meet requirements during network development, you can use MindSpore's custom operator functionality to integrate your operators. Currently, MindSpore provides two approaches for integrating custom operators: `Custom primitive-based integration `_ and `CustomOpBuilder-based integration `_ .
 
diff --git a/tutorials/source_en/custom_program/operation/op_customopbuilder.md b/tutorials/source_en/custom_program/operation/op_customopbuilder.md
index 550a35b89eaf9809f2aa1d7609bce7638e75281f..08e005acadc8cbe6a453662d1baae8c3d03e23d1 100644
--- a/tutorials/source_en/custom_program/operation/op_customopbuilder.md
+++ b/tutorials/source_en/custom_program/operation/op_customopbuilder.md
@@ -215,4 +215,3 @@ Running the above script produces the following result:
 ## More Usage Scenarios
 
 - [Integrating ATB Operators Using AtbOpRunner](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder_atb.html): Introduces methods for quickly integrating ATB operators as custom operators.
-- [Developing Forward and Backward Operators Using the Function Interface](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder_function.html): Introduces the method of defining custom operator forward and backward propagation functions.
diff --git a/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md b/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md
index 5347fbf93aa74b2c0bfc8aefb4efdea0d8bbc44e..2026436a49974a227bb7d202cd2c2a798e7fd586 100644
--- a/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md
+++ b/tutorials/source_en/custom_program/operation/op_customopbuilder_atb.md
@@ -153,7 +153,7 @@ auto pyboost_npu_swiglu(const ms::Tensor &x, int32_t dim) {
 }
 
 PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
-  m.def("npu_swiglu", &pyboost_npu_swiglu, "swiglu realization", pybind11::arg("x"), pybind11::arg("dim") = -1);
+  m.def("swiglu", &pyboost_npu_swiglu, "swiglu realization", pybind11::arg("x"), pybind11::arg("dim") = -1);
 }
 ```
 
@@ -162,8 +162,10 @@ PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
 
 Save the above C++ code as a file named `atb_activation.cpp`, and then compile it using the Python interface `CustomOpBuilder`.
 
```python +import mindspore +import numpy as np x = mindspore.Tensor(np.random.rand(2, 32).astype(np.float16)) -my_ops = CustomOpBuilder("atb_activation", "atb_activation.cpp", enable_atb=True).load() +my_ops = mindspore.ops.CustomOpBuilder("atb_activation", "atb_activation.cpp", enable_atb=True).load() y = my_ops.swiglu(x, -1) print(y) ``` diff --git a/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md b/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md deleted file mode 100644 index 9a15147ac8f855d7235e22343ce9a4d1a599616e..0000000000000000000000000000000000000000 --- a/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md +++ /dev/null @@ -1,238 +0,0 @@ -# CustomOpBuilder: Develop Forward and Backward Operators with the Function Interface - -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md) - -## Overview - -In [Defining the bprop Function for Operators](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_custom_adv.html#defining-the-bprop-function-for-operators), MindSpore provides a method for customizing backward functions. This method requires defining two Custom operators and binding the backward operator to the forward Custom operator in Python. The development process is relatively lengthy. - -For dynamic graphs, MindSpore offers another method for customizing backward functions. Using a `Function` interface, the backward and forward propagation functions of the operator can be defined together, with `AutogradContext` used to pass information from the forward function to the backward function. This approach is more in line with common programming practices. Backward operators defined in this way are automatically registered during the execution of the forward operator, requiring no additional operations. - -The following is an example illustrating how to use the `Function` interface: - -This guide demonstrates a multiplication operator implementation on the Ascend platform. For related code and more examples, see [Repository Code](https://gitee.com/mindspore/mindspore/blob/master/tests/st/pynative/grad/test_custom_cpp_function_grad.py). - -**Note:** The `BaseTensorPtr` referenced in this guide is an internal data structure of MindSpore. In future versions, these interfaces will be refactored into interfaces based on `ms::Tensor`. - -## Operator Definition - -To define a dynamic graph custom operator, users need to implement a C++ computation function and map it to Python via pybind11. Below is an example of a custom operator's computation function. 
-
-```cpp
-#include <algorithm>
-#include <memory>
-#include <string>
-#include "ms_extension.h"
-
-namespace mindspore::pynative {
-namespace autograd {
-ShapeVector BroadcastInferShape(const BaseTensorPtr &t1, const BaseTensorPtr &t2) {
-  ShapeVector s1 = t1->shape();
-  ShapeVector s2 = t2->shape();
-  ShapeVector out_shape(std::max(s1.size(), s2.size()), 1LL);
-  if (out_shape.empty()) {
-    return out_shape;
-  }
-  for (size_t i = out_shape.size(); i > 0; i--) {
-    if (i <= s1.size() && s1[s1.size() - i] > 1) {
-      out_shape[out_shape.size() - i] = s1[s1.size() - i];
-    } else if (i <= s2.size() && s2[s2.size() - i] > 1) {
-      out_shape[out_shape.size() - i] = s2[s2.size() - i];
-    }
-  }
-  return out_shape;
-}
-
-class CustomMul : public Function<CustomMul> {
- public:
-  static BaseTensorPtr Forward(AutogradContext *ctx, const BaseTensorPtr &x, const BaseTensorPtr &y) {
-    auto output = std::make_shared<tensor::BaseTensor>(x->data_type(), BroadcastInferShape(x, y));
-    custom::CustomLaunchAclnn("aclnnMul", {x, y}, {output});
-    bool x_require_grad = ctx->NeedGrad(x);
-    bool y_require_grad = ctx->NeedGrad(y);
-    if (x_require_grad || y_require_grad) {
-      ctx->SaveForBackward({x_require_grad ? y : nullptr, y_require_grad ? x : nullptr});
-    }
-    return output;
-  }
-
-  static BaseTensorPtrList Backward(AutogradContext *ctx, BaseTensorPtrList grad_outputs) {
-    auto saved = ctx->GetSavedTensors();
-    auto dout = grad_outputs[0];
-
-    BaseTensorPtr grad_x = nullptr;
-    BaseTensorPtr grad_y = nullptr;
-
-    if (ctx->NeedsInputGrad(0)) {
-      grad_x = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[0]));
-      custom::CustomLaunchAclnn("aclnnMul", {dout, saved[0]}, {grad_x});
-    }
-    if (ctx->NeedsInputGrad(1)) {
-      grad_y = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[1]));
-      custom::CustomLaunchAclnn("aclnnMul", {dout, saved[1]}, {grad_y});
-    }
-
-    return {grad_x, grad_y};
-  }
-};
-
-BaseTensorPtr run_custom_mul(const tensor::BaseTensorPtr &x, const tensor::BaseTensorPtr &y) {
-  return CustomMul::Apply(x, y);
-}
-
-}  // namespace autograd
-}  // namespace mindspore::pynative
-
-PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
-  m.def("mul", &mindspore::pynative::autograd::run_custom_mul, "Calculate the value x multiplied by y.");
-}
-```
-
-The computation function class `CustomMul` is built from the `Function` class template, and its `Apply` method is used to define the computation function `run_custom_mul`. Finally, `PYBIND11_MODULE` binds the C++ function `run_custom_mul` to the Python function `mul`, creating the custom operator.
-
-### Data Structures and Interfaces
-
-To facilitate user-defined operators, MindSpore provides foundational data structures and interfaces:
-
-- `Function`: Computation function class template. Every custom operator's computation function class derives from it.
-- `BaseTensor`: Tensor type. `BaseTensorPtr` and `BaseTensorPtrList` are its pointer and pointer-list forms.
-- `AutogradContext`: Autodiff context, described in detail below.
-- `CustomLaunchAclnn`: Interface for launching aclnn operators.
-
-Note that in order to use the data structures provided by MindSpore, the custom operator code must include the header file `ms_extension.h` and define the computation function class and computation function inside the namespace `mindspore::pynative::autograd`, as in the example above.
-
-### Computation Function Class
-
-To simplify implementing a custom operator and its backward function, MindSpore provides the computation function class template `Function`. Based on the operator class name of their choice, users define the computation function class:
-
-```c++
-class CustomMul : public Function<CustomMul>
-```
-
-This class requires two methods: `Forward` (forward computation) and `Backward` (backward computation).
-
-#### Forward Computation
-
-The user implements the forward computation of a custom operator through the `Forward` method. Consider the following function prototype. The first input is fixed to `AutogradContext *`; the remaining inputs support `BaseTensorPtr`, `std::string`, and other basic types, and their number is determined by the number of operator inputs.
-
-```c++
-static BaseTensorPtr Forward(AutogradContext *ctx, const BaseTensorPtr &x, const BaseTensorPtr &y)
-```
-
-The following is the computation part of the forward function. The user first creates a tensor with data type `x->data_type()` and shape `BroadcastInferShape(x, y)`, and then invokes the `aclnnMul` operator via `CustomLaunchAclnn`. For background on compiling aclnn operators, refer to the relevant sections of [AOT type custom operators (Ascend platform)](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_custom_ascendc.html#offline-compilation-and-deployment).
-
-```c++
-auto output = std::make_shared<tensor::BaseTensor>(x->data_type(), BroadcastInferShape(x, y));
-custom::CustomLaunchAclnn("aclnnMul", {x, y}, {output});
-```
-
-Finally, save the forward inputs that the differentiation algorithm depends on for the backward function. This is where the `AutogradContext` class is used. First, the `NeedGrad` interface determines whether a given input requires a gradient. If any input needs a backward computation, the relevant information is recorded via `SaveForBackward`. For the multiplication here, if `x` requires a gradient, `y` must be saved in the context, and vice versa.
-
-```c++
-bool x_require_grad = ctx->NeedGrad(x);
-bool y_require_grad = ctx->NeedGrad(y);
-if (x_require_grad || y_require_grad) {
-  ctx->SaveForBackward({x_require_grad ? y : nullptr, y_require_grad ? x : nullptr});
-}
-```
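-
-Whether `ctx->NeedGrad(x)` is true is determined by how the operator is used on the Python side. The following is a minimal sketch of that relationship, assuming the compiled module is loaded as `my_ops` (as in the usage section below) and that `NeedGrad` reflects which inputs the surrounding call differentiates:
-
-```python
-import mindspore as ms
-
-# Differentiate with respect to x only: ctx->NeedGrad(x) is then expected to
-# be true and ctx->NeedGrad(y) false on the C++ side, so SaveForBackward
-# only needs to save y.
-grad_x_fn = ms.grad(lambda x, y: my_ops.mul(x, y), grad_position=0)
-```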
-
-#### Backward Computation
-
-The user implements the backward computation of a custom operator through the `Backward` method. Consider the following function prototype. Its first input is fixed to `AutogradContext *`, and its second input is fixed to `BaseTensorPtrList`.
-
-```c++
-static BaseTensorPtrList Backward(AutogradContext *ctx, BaseTensorPtrList grad_outputs)
-```
-
-First, obtain the tensors used by the backward computation. They come from two sources: the tensor list saved in the context and the backward inputs.
-The saved tensors are obtained through the `AutogradContext::GetSavedTensors` interface and correspond to the tensor list recorded by `SaveForBackward` in the forward function. Here the forward function recorded the list `{x_require_grad ? y : nullptr, y_require_grad ? x : nullptr}`, so `saved` has two elements.
-The backward inputs are the gradients of the forward outputs, corresponding one-to-one to the outputs of the forward function. Here the forward function has only one output, so `dout` has only one element.
-
-```c++
-auto saved = ctx->GetSavedTensors();
-auto dout = grad_outputs[0];
-```
-
-Then compute the gradient of each forward input. To minimize the amount of computation, `ctx->NeedsInputGrad(i)` is checked first to determine whether the i-th input requires a gradient; only then is the actual computation performed. As in the forward function, the computation can call an aclnn operator.
-
-```c++
-if (ctx->NeedsInputGrad(0)) {
-  grad_x = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[0]));
-  custom::CustomLaunchAclnn("aclnnMul", {dout, saved[0]}, {grad_x});
-}
-if (ctx->NeedsInputGrad(1)) {
-  grad_y = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[1]));
-  custom::CustomLaunchAclnn("aclnnMul", {dout, saved[1]}, {grad_y});
-}
-```
-
-### Computation Function and Python Binding
-
-After creating the computation function class `CustomMul` and its `Forward`/`Backward` methods, implement the custom operator's computation function `run_custom_mul`. It uses the `Apply` method of the `CustomMul` class, whose inputs must correspond one-to-one to all inputs of the `CustomMul::Forward` signature except `AutogradContext`.
-
-```c++
-BaseTensorPtr run_custom_mul(const tensor::BaseTensorPtr &x, const tensor::BaseTensorPtr &y) {
-  return CustomMul::Apply(x, y);
-}
-```
-
-The C++ function `run_custom_mul` is bound to the Python function `mul` via `PYBIND11_MODULE`. Here, the inputs to `m.def` are:
-
-- `'mul'`: The Python function name.
-- `&mindspore::pynative::autograd::run_custom_mul`: The C++ function pointer.
-- `"Calculate the value x multiplied by y."`: The Python function docstring.
-
-```c++
-PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
-  m.def("mul", &mindspore::pynative::autograd::run_custom_mul, "Calculate the value x multiplied by y.");
-}
-```
-
-## Operator Usage
-
-To make custom operators easy to use, MindSpore provides the Python class `CustomOpBuilder`, which handles automatic compilation and the running of custom operators. An example of using a custom operator is shown below.
-
-```python
-import numpy as np
-import mindspore as ms
-from mindspore import Tensor, Parameter, nn
-from mindspore.ops import CustomOpBuilder
-
-class MyNet(nn.Cell):
-    def __init__(self):
-        super().__init__()
-        self.p = Parameter(2.0, requires_grad=True)
-        self.my_ops = CustomOpBuilder("my_ops", ['./custom_src/function_ops.cpp'], backend="Ascend").load()
-
-    def construct(self, x, y):
-        z = self.my_ops.mul(x, y)
-        return self.my_ops.mul(z, self.p)
-
-
-x = Tensor(1.0, ms.float32) * 2
-y = Tensor(1.0, ms.float32) * 3
-net = MyNet()
-grad_op = ms.value_and_grad(net, grad_position=(0, 1), weights=net.trainable_params())
-out, grads = grad_op(x, y)
-print('out:', out)
-print('grads[0]:', grads[0])
-print('grads[1]:', grads[1])
-```
-
-Here, the user defines the custom operator module `self.my_ops = CustomOpBuilder("my_ops", ['./custom_src/function_ops.cpp'], backend="Ascend").load()`, where the `CustomOpBuilder` parameters mean:
-
-- `"my_ops"`: The custom operator module name.
-- `['./custom_src/function_ops.cpp']`: The paths of the custom operator C++ files. If there are several C++ files, all of them must be listed.
-- `backend="Ascend"`: The backend on which the custom operator runs.
-
-Note that after defining a custom operator with `CustomOpBuilder`, the `load` method must be called to automatically compile and load the operator.
-
-The custom operator is invoked in the script via `self.my_ops.mul(x, y)`, where `mul` is the Python function name defined in `PYBIND11_MODULE` above.
-
-Running the above script produces the following result:
-
-```text
-out: 12.0
-grads[0]: (Tensor(shape=[], dtype=Float32, value= 6), Tensor(shape=[], dtype=Float32, value= 4))
-grads[1]: (Tensor(shape=[], dtype=Float32, value= 6),)
-```
-
-In the above result, `out` is the forward output, the two `Tensor`s in `grads[0]` are the derivatives of the inputs `x` and `y`, respectively, and the single `Tensor` in `grads[1]` is the derivative of the `Parameter` `p`.
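-
-As a quick sanity check, the expected values can be reproduced with built-in operators. Below is a minimal sketch; the `reference` function is a reimplementation of the network above with built-in operators and is not part of the custom module:
-
-```python
-import mindspore as ms
-from mindspore import Tensor
-
-# Reference implementation of the network above using built-in operators.
-def reference(x, y, p):
-    return x * y * p
-
-x = Tensor(2.0, ms.float32)
-y = Tensor(3.0, ms.float32)
-p = Tensor(2.0, ms.float32)
-out, grads = ms.value_and_grad(reference, grad_position=(0, 1, 2))(x, y, p)
-print(out)    # 12.0, matching `out` above
-print(grads)  # derivatives (6, 4, 6), matching grads[0] and grads[1] above
-```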
\ No newline at end of file
diff --git a/tutorials/source_zh_cn/custom_program/op_custom.rst b/tutorials/source_zh_cn/custom_program/op_custom.rst
index 095aaa5ccdd77e3029e9e9e3e8ea42c31f7e67d6..13f5a13fff31a0fb9e343a7f8e147281b8b95bb6 100644
--- a/tutorials/source_zh_cn/custom_program/op_custom.rst
+++ b/tutorials/source_zh_cn/custom_program/op_custom.rst
@@ -16,7 +16,6 @@
     operation/op_customopbuilder
     operation/cpp_api_for_custom_ops
     operation/op_customopbuilder_atb
-    operation/op_customopbuilder_function
 
 When built-in operators cannot meet requirements during network development, you can use MindSpore's custom operator functionality to integrate your operators. Currently, MindSpore provides two approaches for integrating custom operators: `Custom primitive-based integration `_ and `CustomOpBuilder-based integration `_ .
 
diff --git a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md
index 63c837bc097a5271b474761af3bc9dd7eb12980f..9622958823e22478bc56ef24f20a44f7f83aa598 100644
--- a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md
+++ b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder.md
@@ -215,4 +215,3 @@ print(out)
 ## More Usage Scenarios
 
 - [Integrating ATB Operators Using AtbOpRunner](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/op_customopbuilder_atb.html): Introduces methods for quickly integrating ATB operators as custom operators.
-- [Developing Forward and Backward Operators Using the Function Interface](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/op_customopbuilder_function.html): Introduces the method of defining custom operator forward and backward propagation functions.
diff --git a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md
index a2dfc37f0dc304f312267485bc8fbe6d1f92330b..7b0bdb3277144ff4648afbb7f31e26cc6cc85981 100644
--- a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md
+++ b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_atb.md
@@ -153,7 +153,7 @@ auto pyboost_npu_swiglu(const ms::Tensor &x, int32_t dim) {
 }
 
 PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
-  m.def("npu_swiglu", &pyboost_npu_swiglu, "swiglu realization", pybind11::arg("x"), pybind11::arg("dim") = -1);
+  m.def("swiglu", &pyboost_npu_swiglu, "swiglu realization", pybind11::arg("x"), pybind11::arg("dim") = -1);
 }
 ```
 
@@ -162,8 +162,10 @@ PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
 
 Save the above C++ code as a file named `atb_activation.cpp`, and then compile it using the Python interface `CustomOpBuilder`.
 
 ```python
+import mindspore
+import numpy as np
 x = mindspore.Tensor(np.random.rand(2, 32).astype(np.float16))
-my_ops = CustomOpBuilder("atb_activation", "atb_activation.cpp", enable_atb=True).load()
+my_ops = mindspore.ops.CustomOpBuilder("atb_activation", "atb_activation.cpp", enable_atb=True).load()
 y = my_ops.swiglu(x, -1)
 print(y)
 ```
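
Since the binding above declares `pybind11::arg("dim") = -1`, the `dim` parameter has a default value and may be passed positionally or by keyword. A minimal usage sketch, assuming the module compiled above:

```python
# Equivalent calls, given the default argument declared in the binding.
y1 = my_ops.swiglu(x)
y2 = my_ops.swiglu(x, dim=-1)
y3 = my_ops.swiglu(x, -1)
```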
diff --git a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md
deleted file mode 100644
index a58e2b436d17dea6c9a9fe3ed77b6e358fbbb73b..0000000000000000000000000000000000000000
--- a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md
+++ /dev/null
@@ -1,239 +0,0 @@
-# CustomOpBuilder: Developing Forward and Backward Operators with the Function Interface
-
-[![View Source File](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md)
-
-## Overview
-
-In [Defining the bprop Function for Custom Operators](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/op_custom_adv.html#定义算子反向传播函数), MindSpore provides one method for customizing a backward function. That method requires defining two Custom operators and binding the backward operator to the forward Custom operator in Python, which makes the development process rather lengthy.
-
-For dynamic graphs, MindSpore provides another method for customizing backward functions. The `Function` interface defines the operator's backward and forward propagation functions together, and `AutogradContext` passes information from the forward function to the backward function, which better matches common programming habits. A backward operator defined this way is registered automatically when the forward operator executes, and no extra steps are needed.
-
-The following example illustrates how to use the `Function` interface:
-
-This guide demonstrates a multiplication operator implemented on the Ascend platform. For the related code and more examples, see the [repository code](https://gitee.com/mindspore/mindspore/blob/master/tests/st/pynative/grad/test_custom_cpp_function_grad.py).
-
-**Note: The `BaseTensorPtr` used in this guide is an internal MindSpore data structure. In later versions, these interfaces will be refactored into interfaces based on `ms::Tensor`.**
-
-## Operator Definition
-
-To define a dynamic graph custom operator, the user defines a C++ computation function and then maps it to Python via pybind11 so it can be used as a MindSpore operator. Below is an example of a custom operator's computation function.
-
-```cpp
-#include <algorithm>
-#include <memory>
-#include <string>
-#include "ms_extension.h"
-
-namespace mindspore::pynative {
-namespace autograd {
-ShapeVector BroadcastInferShape(const BaseTensorPtr &t1, const BaseTensorPtr &t2) {
-  ShapeVector s1 = t1->shape();
-  ShapeVector s2 = t2->shape();
-  ShapeVector out_shape(std::max(s1.size(), s2.size()), 1LL);
-  if (out_shape.empty()) {
-    return out_shape;
-  }
-  for (size_t i = out_shape.size(); i > 0; i--) {
-    if (i <= s1.size() && s1[s1.size() - i] > 1) {
-      out_shape[out_shape.size() - i] = s1[s1.size() - i];
-    } else if (i <= s2.size() && s2[s2.size() - i] > 1) {
-      out_shape[out_shape.size() - i] = s2[s2.size() - i];
-    }
-  }
-  return out_shape;
-}
-
-class CustomMul : public Function<CustomMul> {
- public:
-  static BaseTensorPtr Forward(AutogradContext *ctx, const BaseTensorPtr &x, const BaseTensorPtr &y) {
-    auto output = std::make_shared<tensor::BaseTensor>(x->data_type(), BroadcastInferShape(x, y));
-    custom::CustomLaunchAclnn("aclnnMul", {x, y}, {output});
-    bool x_require_grad = ctx->NeedGrad(x);
-    bool y_require_grad = ctx->NeedGrad(y);
-    if (x_require_grad || y_require_grad) {
-      ctx->SaveForBackward({x_require_grad ? y : nullptr, y_require_grad ? x : nullptr});
-    }
-    return output;
-  }
-
-  static BaseTensorPtrList Backward(AutogradContext *ctx, BaseTensorPtrList grad_outputs) {
-    auto saved = ctx->GetSavedTensors();
-    auto dout = grad_outputs[0];
-
-    BaseTensorPtr grad_x = nullptr;
-    BaseTensorPtr grad_y = nullptr;
-
-    if (ctx->NeedsInputGrad(0)) {
-      grad_x = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[0]));
-      custom::CustomLaunchAclnn("aclnnMul", {dout, saved[0]}, {grad_x});
-    }
-    if (ctx->NeedsInputGrad(1)) {
-      grad_y = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[1]));
-      custom::CustomLaunchAclnn("aclnnMul", {dout, saved[1]}, {grad_y});
-    }
-
-    return {grad_x, grad_y};
-  }
-};
-
-BaseTensorPtr run_custom_mul(const tensor::BaseTensorPtr &x, const tensor::BaseTensorPtr &y) {
-  return CustomMul::Apply(x, y);
-}
-
-}  // namespace autograd
-}  // namespace mindspore::pynative
-
-PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
-  m.def("mul", &mindspore::pynative::autograd::run_custom_mul, "Calculate the value x multiplied by y.");
-}
-```
-
-The `Function` class template is used here to build the computation function class `CustomMul`, whose `Apply` method defines the computation function; finally, `PYBIND11_MODULE` binds the C++ function `run_custom_mul` to the Python function `mul` to build the custom operator.
-
-### Data Structures and Interfaces
-
-To make it easier to define operators, MindSpore provides basic data structures and interfaces, including:
-
-- `Function`: Computation function class template. Every custom operator's computation function class derives from it.
-- `BaseTensor`: Tensor type. `BaseTensorPtr` is the corresponding pointer type, and `BaseTensorPtrList` is a list of such pointers.
-- `AutogradContext`: Autodiff context. Its usage is described in detail below.
-- `CustomLaunchAclnn`: Interface for launching aclnn operators.
-
-Note that in order to use the data structures provided by MindSpore, the custom operator code must include the header file `ms_extension.h` and define the computation function class and computation function inside the namespace `mindspore::pynative::autograd`, as in the example above.
-
-### Computation Function Class
-
-To make it easier to implement a custom operator and its backward function, MindSpore provides the computation function class template `Function`. Based on the operator class name of their choice, users define the computation function class:
-
-```c++
-class CustomMul : public Function<CustomMul>
-```
-
-For this computation class, the user only needs to define two methods, corresponding to the operator's forward and backward computations.
-
-#### Forward Computation
-
-The user implements the forward computation of the custom operator through the `Forward` method. Consider the following function prototype. The first input is fixed to `AutogradContext *`; the remaining inputs support `BaseTensorPtr`, `std::string`, and other basic types, and their number is determined by the number of operator inputs.
-
-```c++
-static BaseTensorPtr Forward(AutogradContext *ctx, const BaseTensorPtr &x, const BaseTensorPtr &y)
-```
-
-The following is the computation part of the forward function. The user first creates a `Tensor` with data type `x->data_type()` and shape `BroadcastInferShape(x, y)`, and then invokes the `aclnnMul` operator via `CustomLaunchAclnn`. For background on compiling aclnn operators, refer to the relevant sections of [AOT-type custom operators (Ascend platform)](https://www.mindspore.cn/tutorials/zh-CN/master/custom_program/operation/op_custom_ascendc.html#编译与部署方法).
-
-```c++
-auto output = std::make_shared<tensor::BaseTensor>(x->data_type(), BroadcastInferShape(x, y));
-custom::CustomLaunchAclnn("aclnnMul", {x, y}, {output});
-```
-
-Finally, save the forward inputs that the differentiation algorithm depends on for the backward function. This uses the `AutogradContext` class. First, the `NeedGrad` interface determines whether a given input requires a gradient. If any input needs a backward computation, the relevant information is recorded via `SaveForBackward`. For the multiplication here, if `x` requires a gradient, `y` must be saved in the context, and vice versa.
-
-```c++
-bool x_require_grad = ctx->NeedGrad(x);
-bool y_require_grad = ctx->NeedGrad(y);
-if (x_require_grad || y_require_grad) {
-  ctx->SaveForBackward({x_require_grad ? y : nullptr, y_require_grad ? x : nullptr});
-}
-```
-
-#### Backward Computation
-
-The user implements the backward computation of the custom operator through the `Backward` method. Consider the following function prototype. The first input is fixed to `AutogradContext *`, and the second input is fixed to `BaseTensorPtrList`.
-
-```c++
-static BaseTensorPtrList Backward(AutogradContext *ctx, BaseTensorPtrList grad_outputs)
-```
-
-First, obtain the tensors used by the backward computation. They come from two sources: the tensor list saved in the context and the backward inputs.
-The saved tensors are obtained through the `AutogradContext::GetSavedTensors` interface and correspond to the tensor list recorded by `SaveForBackward` in the forward function. Here the forward function recorded the list `{x_require_grad ? y : nullptr, y_require_grad ? x : nullptr}`, so `saved` has two elements.
-The backward inputs are the gradients of the forward outputs, corresponding one-to-one to the outputs of the forward function. Here the forward function has only one output, so `dout` has only one element.
-
-```c++
-auto saved = ctx->GetSavedTensors();
-auto dout = grad_outputs[0];
-```
-
-Then compute the gradient of each forward input. To minimize the amount of computation, `ctx->NeedsInputGrad(i)` is checked first to determine whether the i-th input requires a gradient; only then is the actual computation performed. As in the forward function, the computation can call an aclnn operator.
-
-```c++
-if (ctx->NeedsInputGrad(0)) {
-  grad_x = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[0]));
-  custom::CustomLaunchAclnn("aclnnMul", {dout, saved[0]}, {grad_x});
-}
-if (ctx->NeedsInputGrad(1)) {
-  grad_y = std::make_shared<tensor::BaseTensor>(dout->data_type(), BroadcastInferShape(dout, saved[1]));
-  custom::CustomLaunchAclnn("aclnnMul", {dout, saved[1]}, {grad_y});
-}
-```
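-
-Which `ctx->NeedsInputGrad(i)` branches are taken likewise depends on which gradients the Python side requests. A minimal sketch, assuming the compiled module is loaded as `my_ops` (as in the usage section below):
-
-```python
-import mindspore as ms
-
-# Request only y's gradient: ctx->NeedsInputGrad(0) is then expected to be
-# false in Backward, so the computation of grad_x is skipped.
-grad_y_fn = ms.grad(lambda x, y: my_ops.mul(x, y), grad_position=1)
-```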
-
-### Computation Function and Python Binding
-
-After creating the computation function class `CustomMul` and its `Forward`/`Backward` methods, implement the custom operator's computation function `run_custom_mul`. It uses the `Apply` method of the `CustomMul` class, whose inputs must correspond one-to-one to all inputs of the `CustomMul::Forward` signature except `AutogradContext`.
-
-```c++
-BaseTensorPtr run_custom_mul(const tensor::BaseTensorPtr &x, const tensor::BaseTensorPtr &y) {
-  return CustomMul::Apply(x, y);
-}
-```
-
-Then `PYBIND11_MODULE` binds the C++ function `run_custom_mul` to the Python function `mul`. Here, the inputs to `m.def` are:
-
-- `'mul'`: The Python function name.
-- `&mindspore::pynative::autograd::run_custom_mul`: The C++ function pointer.
-- `"Calculate the value x multiplied by y."`: The Python function docstring.
-
-```c++
-PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
-  m.def("mul", &mindspore::pynative::autograd::run_custom_mul, "Calculate the value x multiplied by y.");
-}
-```
-
-## Operator Usage
-
-To make custom operators easy to use, MindSpore provides the Python class `CustomOpBuilder`, which handles automatic compilation and the running of custom operators. An example of using a custom operator is shown below.
-
-```python
-import numpy as np
-import mindspore as ms
-from mindspore import Tensor, Parameter, nn
-from mindspore.ops import CustomOpBuilder
-
-class MyNet(nn.Cell):
-    def __init__(self):
-        super().__init__()
-        self.p = Parameter(2.0, requires_grad=True)
-        self.my_ops = CustomOpBuilder("my_ops", ['./custom_src/function_ops.cpp'], backend="Ascend").load()
-
-    def construct(self, x, y):
-        z = self.my_ops.mul(x, y)
-        return self.my_ops.mul(z, self.p)
-
-
-x = Tensor(1.0, ms.float32) * 2
-y = Tensor(1.0, ms.float32) * 3
-net = MyNet()
-grad_op = ms.value_and_grad(net, grad_position=(0, 1), weights=net.trainable_params())
-out, grads = grad_op(x, y)
-print('out:', out)
-print('grads[0]:', grads[0])
-print('grads[1]:', grads[1])
-```
-
-Here, the user defines the custom operator module `self.my_ops = CustomOpBuilder("my_ops", ['./custom_src/function_ops.cpp'], backend="Ascend").load()`, where the `CustomOpBuilder` parameters mean:
-
-- `"my_ops"`: The custom operator module name.
-- `['./custom_src/function_ops.cpp']`: The paths of the custom operator C++ files. If there are several C++ files, all of them must be listed.
-- `backend="Ascend"`: The backend on which the custom operator runs.
-
-Note that after defining a custom operator with `CustomOpBuilder`, the `load` method must be called to automatically compile and load the operator.
-
-The script invokes the custom operator via `self.my_ops.mul(x, y)`, where `mul` is the Python function name defined in `PYBIND11_MODULE` above.
-
-Running the above script produces the following result:
-
-```text
-out: 12.0
-grads[0]: (Tensor(shape=[], dtype=Float32, value= 6), Tensor(shape=[], dtype=Float32, value= 4))
-grads[1]: (Tensor(shape=[], dtype=Float32, value= 6),)
-```
-
-In the above result, `out` is the forward output, the two `Tensor`s in `grads[0]` are the derivatives of the inputs `x` and `y`, respectively, and the single `Tensor` in `grads[1]` is the derivative of the `Parameter` `p`.
-
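-Since the `Function`-based approach targets dynamic graphs, scripts such as the one above are meant to run in PyNative mode. A minimal sketch:
-
-```python
-import mindspore as ms
-
-# Operators defined via the Function interface register their backward
-# functions when the forward operator executes, i.e., in the dynamic graph
-# (PyNative) flow.
-ms.set_context(mode=ms.PYNATIVE_MODE)
-```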