diff --git a/docs/mindspore/source_en/design/all_scenarios.md b/docs/mindspore/source_en/design/all_scenarios.md
deleted file mode 100644
index e9a961a4949c260a82422c276f95abcd85e2ba7a..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_en/design/all_scenarios.md
+++ /dev/null
@@ -1,198 +0,0 @@
-# Full-scenarios Unified Architecture
-
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/all_scenarios.md)
-
-MindSpore is designed as a device-edge-cloud full-scenario AI framework that can be deployed in different hardware environments across device, edge, and cloud to meet their differentiated needs, such as lightweight deployment on the device side and rich training features on the cloud side, including automatic differentiation, mixed precision, and easy model programming.
-
-> The cloud side includes NVIDIA GPU, Huawei Ascend, Intel x86, etc., and the device side includes Arm, Qualcomm, Kirin, etc.
-
-![intro](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png)
-
-## Important Features of Full-scenarios
-
-Several important features of MindSpore full scenarios:
-
-1. The unified C++ inference interface across device, edge, and cloud allows algorithm code to be quickly migrated to different hardware environments for execution, for example, [device-side training based on the C++ interface](https://mindspore.cn/lite/docs/en/master/quick_start/train_lenet.html).
-2. Model unification. The device and cloud use the same model format and definition, and the software architecture is consistent. MindSpore supports execution on Ascend, GPU, CPU (x86, Arm) and other hardware, enabling one-time training and multi-scenario deployment.
-3. Diverse computing power support. A unified southbound interface supports quickly adding new hardware.
-4. Model miniaturization techniques, such as quantization compression, adapted to the requirements of different hardware environments and business scenarios.
-5. Rapid application of device-edge-cloud collaboration technologies such as [Federated Learning](https://mindspore.cn/federated/docs/en/master/index.html) and [End-side Training](https://mindspore.cn/lite/docs/en/master/use/runtime_train.html).
-
-## Full-scenarios Support Mode
-
-As shown above, the model files trained on MindSpore can be deployed in cloud services via Serving and executed on servers, device-side hardware, and other devices via Lite. Lite also supports offline optimization of the model via the standalone `convert` tool, keeping the inference framework lightweight while achieving high model execution performance.
-
-MindSpore abstracts a uniform operator interface across hardware, so that the programming code for a network model can be consistent across different hardware environments, and the same model file can be loaded and executed efficiently on any hardware supported by MindSpore.
-
-For inference, considering that a large number of users program in C/C++, MindSpore provides a C++ inference programming interface, whose form is close to the style of the Python interface.
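-
-For example, a trained network can be exported to the unified MindIR model format for deployment. The snippet below is a minimal sketch with a toy `nn.Dense` network and an assumed input shape; `mindspore.export` writes a `.mindir` file that can then be served in the cloud or converted for Lite on the device side.
-
-```python
-import numpy as np
-import mindspore as ms
-from mindspore import nn
-
-net = nn.Dense(16, 10)  # toy network standing in for a trained model
-inputs = ms.Tensor(np.ones([1, 16]).astype(np.float32))  # example input defining the graph's shape
-ms.export(net, inputs, file_name="dense", file_format="MINDIR")  # produces dense.mindir
-```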
-
-At the same time, by providing a custom offline optimization registration mechanism and a custom operator registration mechanism for third-party hardware, MindSpore enables fast interconnection with new hardware, while the external model programming interface and model files remain unchanged.
-
-## MindSpore IR (MindIR)
-
-### Overview
-
-An intermediate representation (IR) is a representation of a program between the source and target languages, which facilitates program analysis and optimization for the compiler. Therefore, the IR design needs to consider the difficulty of converting the source language to the target language, as well as the ease of use and performance of program analysis and optimization.
-
-MindSpore IR (MindIR) is a function-style IR based on graph representation. Its core purpose is to serve the automatic differentiation transformation. Automatic differentiation uses a transformation method based on the functional programming framework; therefore, the IR adopts semantics close to those of ANF functions. In addition, a representation based on an explicit dependency graph is used, drawing on the excellent designs of Sea of Nodes[1] and Thorin[2]. For a specific introduction to ANF-IR, please refer to [MindSpore IR Syntax](https://www.mindspore.cn/docs/en/master/design/all_scenarios.html#syntax).
-
-When a model compiled using MindSpore runs in graph mode (`set_context(mode=GRAPH_MODE)`) and the environment variable `MS_DEV_SAVE_GRAPHS` is set to 1, some intermediate files, called IR files, are generated during graph compilation. When more information about the backend procedure needs to be analyzed, the environment variable `MS_DEV_SAVE_GRAPHS` can be set to 2. When more advanced information is required, such as visualized computational graphs or more detailed frontend IR graphs, it can be set to 3. Currently, there are two kinds of IR files:
-
-- .ir file: An IR file that describes the model structure in text format and can be viewed directly using any text editor.
-
-- .dot file: An IR file that describes the topological relationships between different nodes. You can feed this file to [graphviz](http://graphviz.org) as input to generate images for viewing the model structure.
-
-### Syntax
-
-ANF is a simple IR commonly used in functional programming. The ANF syntax is defined as follows:
-
-```text
-<aexp> ::= NUMBER | STRING | VAR | BOOLEAN | PRIMOP
-          | (lambda (VAR …) <exp>)
-<cexp> ::= (<aexp> <aexp> …)
-          | (if <aexp> <exp> <exp>)
-<exp> ::= (let ([VAR <cexp>]) <exp>) | <cexp> | <aexp>
-```
-
-Expressions in ANF are classified into atomic expressions (aexp) and compound expressions (cexp). An atomic expression indicates a constant value, a variable, or an anonymous function. A compound expression consists of multiple atomic expressions, indicating an anonymous function call or a primitive function call. The first input expression of a compound expression is the called function, and the other input expressions are the call parameters.
-
-The syntax of MindIR is inherited from ANF and is defined as follows:
-
-```text
-<ANode> ::= <ValueNode> | <ParameterNode>
-<ParameterNode> ::= Parameter
-<ValueNode> ::= Scalar | Named | Tensor | Type | Shape
-               | Primitive | MetaFuncGraph | FuncGraph
-<CNode> ::= (<AnfNode> …)
-<AnfNode> ::= <CNode> | <ANode>
-```
-
-ANode in a MindIR corresponds to the atomic expression of ANF. ANode has two subclasses: ValueNode and ParameterNode.
-
-- ValueNode refers to a constant node, which can carry a constant value (such as a scalar, symbol, tensor, type, or shape), a primitive function (Primitive), a metafunction (MetaFuncGraph), or a common function (FuncGraph). In functional programming, the function definition itself is a value.
-- ParameterNode refers to a parameter node, which indicates the formal parameter of a function.
-
-CNode in a MindIR corresponds to the compound expression of ANF, indicating a function call.
-
-During automatic differentiation in MindSpore, the gradient contributions of ParameterNodes and CNodes are calculated, and the final gradients of ParameterNodes are returned. The gradients of ValueNodes are not calculated.
-
-### Example
-
-The following uses a program code segment as an example to help you understand MindIR.
-
-```python
-import mindspore as ms
-
-def func(x, y):
-    return x / y
-
-@ms.jit
-def test_f(x, y):
-    a = x - 1
-    b = a + y
-    c = b * func(a, b)
-    return c
-```
-
-The ANF corresponding to the Python code is as follows:
-
-```text
-lambda (x, y)
-    let a = x - 1 in
-    let b = a + y in
-    let func = lambda (x, y)
-        let ret = x / y in
-        ret end in
-    let %1 = func(a, b) in
-    let c = b * %1 in
-    c end
-```
-
-The corresponding MindIR is [ir.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/images/ir/ir.dot).
-
-![image](./images/ir/ir.png)
-
-In a MindIR, a function graph (FuncGraph) indicates the definition of a common function. A function graph is usually a directed acyclic graph (DAG) consisting of ParameterNodes, ValueNodes, and CNodes, which clearly shows the calculation process from parameters to return values. As shown in the preceding figure, the `test_f` and `func` functions in the Python code are converted into two function graphs. The `x` and `y` parameters are converted into ParameterNodes in the function graphs, and each expression is converted into a CNode. The first input of a CNode links to the called function, for example, `add`, `func`, and `return` in the figure. It should be noted that these nodes are all `ValueNode` because they are considered constant function values. The other inputs of a CNode link to the call parameters, whose values can come from ParameterNodes, ValueNodes, and other CNodes.
-
-In ANF, each expression is bound as a variable by using the let expression, and the dependency on the expression output is represented by referencing the variable. In a MindIR, each expression is bound as a node, and the dependency is represented by the directed edges between nodes.
-
-### Function-Style Semantics
-
-Compared with traditional computational graphs, MindIR can not only express data dependencies between operators, but also express rich function-style semantics.
-
-#### Higher-Order Functions
-
-In a MindIR, a function is defined by a subgraph. However, the function itself can be passed as the input or output of other higher-order functions.
-In the following simple example, the `f` function is passed as a parameter into the `g` function. Therefore, the `g` function is a higher-order function that receives a function as input, and the actual call site of the `f` function is inside the `g` function.
-
-```python
-import mindspore as ms
-
-@ms.jit
-def hof(x):
-    def f(x):
-        return x + 3
-    def g(function, x):
-        return function(x) * function(x)
-    res = g(f, x)
-    return res
-```
-
-The corresponding MindIR is [hof.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/images/ir/hof.dot).
-
-![image](./images/ir/hof.png)
-
-In actual network training scripts, the automatic differentiation generic function `grad`, as well as `Partial` and `HyperMap`, which are commonly used in the optimizer, are typical higher-order functions. Higher-order semantics greatly improve the flexibility and simplicity of MindSpore representations.
-
-#### Control Flows
-
-In a MindIR, control flows are expressed in the form of higher-order function selection and calling. This form transforms a control flow into a data flow of higher-order functions, making the automatic differentiation algorithm more powerful. It not only supports automatic differentiation of data flows, but also supports automatic differentiation of control flows such as conditional jumps, loops, and recursion.
-
-The following uses a simple Fibonacci function as an example.
-
-```python
-import mindspore as ms
-
-@ms.jit
-def fibonacci(n):
-    if n < 1:
-        return 0
-    if n == 1:
-        return 1
-    return fibonacci(n-1) + fibonacci(n-2)
-```
-
-The corresponding MindIR is [cf.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/images/ir/cf.dot).
-
-![image](./images/ir/cf.png)
-
-`fibonacci` is the top-level function graph. Two function graphs at the top level are selected and called by `switch`. `✓fibonacci` is the True branch of the first `if`, and `✗fibonacci` is the False branch of the first `if`. `✓✗fibonacci` called in `✗fibonacci` is the True branch of `elif`, and `✗✗fibonacci` is the False branch of `elif`. The key point is that, in a MindIR, conditional jumps and recursion are represented in the form of higher-order control flows. For example, `✓✗fibonacci` and `✗fibonacci` are passed in as parameters of the `switch` operator, and `switch` selects one function as the return value based on the condition parameter. In this way, `switch` performs a binary selection on the input functions as ordinary values and does not call them; the real function call is completed by the CNode following `switch`.
-
-#### Free Variables and Closures
-
-Closure is a programming language feature that refers to the combination of a code block and its scope environment. A free variable is a variable from the scope environment referenced in a code block rather than a local variable. In a MindIR, a code block is represented as a function graph. The scope environment can be considered as the context where the function is called. The capture method for free variables is value copy instead of reference.
-
-A typical closure example is as follows:
-
-```python
-import mindspore as ms
-
-@ms.jit
-def func_outer(a, b):
-    def func_inner(c):
-        return a + b + c
-    return func_inner
-
-@ms.jit
-def ms_closure():
-    closure = func_outer(1, 2)
-    out1 = closure(1)
-    out2 = closure(2)
-    return out1, out2
-```
-
-The corresponding MindIR is [closure.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/images/ir/closure.dot).
-
-![image](./images/ir/closure.png)
-
-In the example, `a` and `b` are free variables because the variables `a` and `b` in `func_inner` are parameters defined in the referenced parent graph `func_outer`. The variable `closure` is a closure, which is the combination of the function `func_inner` and its context `func_outer(1, 2)`. Therefore, the result of `out1` is 4, which is equivalent to `1+2+1`, and the result of `out2` is 5, which is equivalent to `1+2+2`.
-
-### References
-
-[1] C. Click and M. Paleczny. A simple graph-based intermediate representation. SIGPLAN Not., 30:35-49, March 1995.
-
-[2] Roland Leißa, Marcel Köster, and Sebastian Hack. A graph-based higher-order intermediate representation. In Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 202-212. IEEE Computer Society, 2015.
diff --git a/docs/mindspore/source_en/design/distributed_training_design.md b/docs/mindspore/source_en/design/distributed_training_design.md
deleted file mode 100644
index 432a367e893bdbdd08aa73db5312290d41ac7dde..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_en/design/distributed_training_design.md
+++ /dev/null
@@ -1,293 +0,0 @@
-# Distributed Parallel Native
-
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/distributed_training_design.md)
-
-## Background
-
-With the rapid development of deep learning, the size of datasets and the number of parameters are growing exponentially to improve the accuracy and generalization capability of neural networks. Distributed parallel training has become a development trend for resolving the performance bottlenecks of ultra-large-scale networks.
-
-To cope with the problem of oversized datasets, MindSpore introduces the data parallelism mode, which utilizes the computing resources of multiple devices to process more training data simultaneously and speed up model training. When the data or the model is too large to be loaded on a single compute node for training, model parallelism needs to be introduced: each compute node only needs to load part of the model and data, which reduces memory usage and improves training efficiency.
-
-In the evolution of the distributed parallel programming paradigm, traditional manual parallelism requires users to manually slice the model onto multiple nodes by coding against communication primitives, and to be aware of graph slicing, operator slicing, and the cluster topology to achieve optimal performance. This programming paradigm sets a high bar for engineers, so it evolved into semi-automatic parallelism, which decouples parallel logic from algorithmic logic. Users write algorithmic code in a single-card serial way, with the parallel logic serving as configuration of the algorithm: users only need to configure parallel strategies to achieve automatic parallel slicing, without writing additional code or being aware of the distribution of model slices and the cluster topology. The fully automatic parallel training programming paradigm goes a step further: the user only needs to write single-card serial algorithms, and a better shard strategy is generated automatically through a search algorithm.
-
-MindSpore implements data communication and synchronization during parallel training by means of collective communication, which relies on the Huawei Collective Communication Library (HCCL) on Ascend chips and the NVIDIA Collective Communication Library (NCCL) on GPUs.
-
-MindSpore currently uses a synchronous training mode, which ensures that the parameters are consistent across all devices and are synchronized on all devices before each training iteration begins.
-
-This design document focuses on the design principles of several parallel training methods and guides users in custom development.
-
-## Data Parallelism
-
-This section describes how the data parallel mode `ParallelMode.DATA_PARALLEL` works in MindSpore.
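-
-Before diving into the principle, the snippet below is a minimal configuration sketch (assuming the job is launched with a distributed launcher such as `mpirun` or `msrun`, and using a hypothetical dataset path); it combines the communication initialization, parallel-mode configuration, and dataset sharding described in the following subsections:
-
-```python
-import mindspore as ms
-import mindspore.dataset as ds
-from mindspore.communication import init, get_rank, get_group_size
-
-init()  # initialize communication resources; creates the WORLD_COMM_GROUP group
-ms.set_auto_parallel_context(parallel_mode="data_parallel", gradients_mean=True)
-
-# shard the dataset along the sample dimension: each device reads 1/N of the data
-dataset = ds.MindDataset("/path/to/train.mindrecord",  # hypothetical dataset path
-                         num_shards=get_group_size(), shard_id=get_rank())
-```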
-
-### Principle of Data Parallelism
-
-![Data Parallel Description](./images/data_parallel.png)
-
-1. Environment dependencies
-
-    Each time before parallel training starts, the [mindspore.communication.init](https://www.mindspore.cn/docs/en/master/api_python/communication/mindspore.communication.init.html) API is called to initialize communication resources, and the global communication group `WORLD_COMM_GROUP` is automatically created.
-
-2. Data distribution
-
-    The key to data parallelism is to split the dataset along the sample dimension and deliver the splits to different devices. Each dataset loading API provided by the [mindspore.dataset](https://www.mindspore.cn/docs/en/master/api_python/mindspore.dataset.html) module has the `num_shards` and `shard_id` parameters, which are used to split a dataset into multiple shards, perform cyclic sampling, and collect `batch`-sized data for each device. When the data volume is insufficient, the sampling restarts from the beginning.
-
-3. Network structure
-
-    A data parallel network is scripted in the same way as a standalone network. This is because, although the model on each device executes independently during forward and backward propagation, all devices maintain the same network structure. To ensure synchronous training between devices, the initial values of the corresponding network parameters must be the same. You are advised to enable `parameter_broadcast` to broadcast the weight values in `DATA_PARALLEL` and `HYBRID_PARALLEL` modes. In `AUTO_PARALLEL` and `SEMI_AUTO_PARALLEL` modes, the sharded dimensions of weights are processed automatically, and consistent weight initialization across the devices belonging to the same data parallel dimension is ensured by setting random seeds.
-
-4. Gradient aggregation
-
-    Theoretically, the training effect of a data parallel network should be the same as that of the standalone network. To ensure consistent calculation logic, the [AllReduce](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.AllReduce.html) operator is inserted after gradient calculation to aggregate gradients between devices. You can enable `mean` to average the summed gradient values, or regard `mean` as a hyperparameter: enabling `mean` is equivalent to scaling down the learning rate by a factor of the number of devices.
-
-5. Parameter update
-
-    Because the gradient aggregation operation is introduced, the model on each device performs parameter updates with the same gradient values. Therefore, MindSpore implements a synchronous data parallel training mode. Theoretically, the models trained on all devices are the same. However, if a reduce operation over samples is involved in the network, the network outputs may differ; this is determined by the sharding nature of data parallelism.
-
-### Data Parallel Code
-
-1. Collective communication
-
-    - [management.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/communication/management.py): This file covers the `helper` function APIs commonly used during the collective communication process, for example, the APIs for obtaining the cluster size and the device ID. When collective communication is executed on Ascend chips, the framework loads the `libhccl.so` library file in the environment and uses it to call the communication APIs from the Python layer down to the underlying layer.
-    - [comm_ops.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/ops/operations/comm_ops.py): MindSpore encapsulates the supported collective communication operations as operators and stores them in this file, including [AllReduce](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.AllReduce.html), [AllGather](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.AllGather.html), [ReduceScatter](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.ReduceScatter.html) and [Broadcast](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.Broadcast.html). `PrimitiveWithInfer` defines the attributes required by the operators, as well as the `shape` and `dtype` inference methods from the input to the output during graph composition.
-
-2. Gradient aggregation
-
-    - [grad_reducer.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/nn/wrap/grad_reducer.py): This file implements the gradient aggregation process. After the input parameter `grads` is expanded by `HyperMap`, the `AllReduce` operator is inserted, using the global communication group. You can also perform custom development based on your network requirements by referring to this file. In MindSpore, standalone and distributed execution share one set of network encapsulation APIs. In the `Cell`, `ParallelMode` is used to determine whether to perform gradient aggregation.
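-
-The same aggregation can be reproduced in a custom training step. The sketch below is illustrative (toy network and loss; it assumes `init()` has been called in a multi-device job) and mirrors what `grad_reducer.py` does internally:
-
-```python
-import mindspore as ms
-from mindspore import nn
-from mindspore.communication import init, get_group_size
-
-init()
-net = nn.Dense(16, 10)                      # toy network
-loss_fn = nn.MAELoss()
-opt = nn.SGD(net.trainable_params(), 1e-2)
-
-def forward_fn(x, y):
-    return loss_fn(net(x), y)
-
-grad_fn = ms.value_and_grad(forward_fn, None, opt.parameters)
-# inserts AllReduce over the global communication group, averaging across devices
-grad_reducer = nn.DistributedGradReducer(opt.parameters, mean=True, degree=get_group_size())
-
-def train_step(x, y):
-    loss, grads = grad_fn(x, y)
-    grads = grad_reducer(grads)             # gradient aggregation between devices
-    opt(grads)
-    return loss
-```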
-
-## Semi-automatic Parallelism
-
-This subsection describes how the `ParallelMode.SEMI_AUTO_PARALLEL` semi-automatic parallel mode works in MindSpore.
-
-### Principle of Semi-automatic Parallelism
-
-![Automatic Parallel Description](./images/auto_parallel.png)
-
-1. Distributed operators and tensor distribution models
-
-    In the above architecture diagram, the automatic parallel process traverses the single-machine forward computation graph (ANF graph) and models tensor slicing in terms of distributed operators, representing how the input and output tensors of an operator are distributed to the cards of the cluster (tensor layout). This model fully expresses the mapping relationship between tensors and devices; the user does not need to know on which device each slice of the model runs, and the framework automatically schedules the assignment.
-
-    In order to obtain the tensor distribution model, each operator has a shard strategy, which represents how each input of the operator is sliced in the corresponding dimension. In general, any dimension of a tensor can be sliced, as long as it is split evenly into a power-of-two number of slices. The following figure is an example of a three-dimensional matrix multiplication (BatchMatMul) operation, whose shard strategy consists of two tuples representing the shard forms of `input` and `weight`, respectively. The elements in each tuple correspond one-to-one to the tensor dimensions: `2^N` is the number of slices, and `1` means no slicing. When the user wants to represent a data parallel shard strategy, i.e., the `batch` dimension of `input` is sliced and the other dimensions are not, it can be expressed as `strategy=((2^N, 1, 1),(1, 1, 1))`. When representing a model parallel shard strategy, i.e., slicing non-`batch` dimensions of `weight` (here the `channel` dimension is sliced as an example, and the other dimensions are not), it can be expressed as `strategy=((1, 1, 1),(1, 1, 2^N))`.
-    When representing a mixed parallelism shard strategy, one possible shard strategy is `strategy=((2^N, 1, 1),(1, 1, 2^N))`.
-
-    ![Operator slice definition](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_en/design/images/operator_split.png)
-
-    Based on the shard strategy, the method of deriving the distribution model of the operator's input and output tensors is defined in the distributed operator. The distribution model consists of `device_matrix`, `tensor_shape`, and `tensor_map`, which represent the device matrix shape, the tensor shape, and the mapping relationship between device and tensor dimensions, respectively. The distributed operator further determines whether to insert extra computation and communication operations in the graph according to the tensor distribution model, to ensure that the operator's operation logic remains correct.
-
-2. Tensor distribution transformation
-
-    When the output tensor model of the former operator and the input tensor model of the latter operator are inconsistent, computation and communication operations need to be introduced to convert between tensor layouts. The automatic parallel process introduces the tensor redistribution algorithm, which can derive an arbitrary communication conversion between tensors. The following three examples represent the parallel computation of the formula `Z = (X × W) × V`, i.e., two two-dimensional matrix multiplications, and show how different parallel approaches are converted into each other.
-
-    In sample 1, the output of the first data parallel matrix multiplication is sliced in the row direction, while the second model parallel matrix multiplication requires the full tensor as input. The framework automatically inserts the `AllGather` operator to implement the redistribution.
-
-    ![tensor-redistribution1](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_en/design/images/tensor_redistribution1.png)
-
-    In sample 2, the output of the first model parallel matrix multiplication is sliced in the column direction, while the input of the second data parallel matrix multiplication is sliced in the row direction. The framework automatically inserts a communication operator equivalent to the `AlltoAll` operation in collective communication to implement the redistribution.
-
-    ![tensor-redistribution2](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_en/design/images/tensor_redistribution2.png)
-
-    In sample 3, the output slice of the first hybrid parallel matrix multiplication is the same as the input slice of the second hybrid parallel matrix multiplication, so no redistribution needs to be introduced. However, since the relevant dimensions of the two inputs of the second matrix multiplication are sliced, the `AllReduce` operator needs to be inserted to guarantee the correctness of the operation.
-
-    ![tensor-redistribution3](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_en/design/images/tensor_redistribution3.png)
-
-    In summary, samples 1 and 2 are the basis of the automatic parallelism implementation. Overall, this distributed representation breaks the boundary between data parallelism and model parallelism and easily achieves hybrid parallelism. From the script level, the user only needs to construct a single-machine network to express the parallel algorithm logic, and the framework automatically slices the entire graph.
-
-3. Distributed auto-differentiation
-
-    Traditional manual model slicing requires attention not only to forward-network communication but also to the parallel operation of the backward network. By encapsulating communication operations as operators and relying on the framework's original auto-differentiation, MindSpore automatically generates the backward communication operators. Even in distributed training, the user therefore only needs to focus on the forward propagation of the network, which realizes truly fully automatic parallel training.
-
-4. Support for multi-dimensional hybrid parallelism
-
-    Semi-automatic parallelism supports the automatic mixing of multiple parallel modes:
-
-    **Operator-level parallelism**: Operator-level parallelism takes the operators in a neural network and slices their input tensors onto multiple devices for computation. In this way, data samples and model parameters can be distributed among different devices, making it possible to train large-scale deep learning models and to use cluster resources for parallel computing to improve the overall speed. The user can set a shard strategy for each operator, and the framework models the slicing of each operator and its input tensors according to that strategy while maintaining mathematical equivalence. This approach effectively reduces the load on individual devices and improves computational efficiency, and is suitable for training large-scale deep neural networks. For more details, please refer to [operator-level parallelism](https://www.mindspore.cn/tutorials/en/master/parallel/operator_parallel.html).
-
-    **Pipeline parallelism**: When the number of cluster devices is large, if only operator-level parallelism is used, communication must take place over the communication domain of the entire cluster, which may make communication inefficient and reduce overall performance. Pipeline parallelism slices the neural network structure into multiple stages, each running on a subset of the devices, which limits the communication domain of collective communication to that subset, while point-to-point communication is used between stages. The advantages of pipeline parallelism are improved communication efficiency and easy handling of layer-stacked neural network structures. The disadvantage is that some nodes may be idle at times. For detailed information, refer to [pipeline parallelism](https://www.mindspore.cn/tutorials/en/master/parallel/pipeline_parallel.html).
-
-    **MoE parallelism**: MoE distributes the experts to different workers, and each worker takes on a different batch of training data. For the non-MoE layers, expert parallelism is the same as data parallelism. In the MoE layer, the tokens in the sequence are sent via all-to-all communication to the workers hosting their matching experts. After the computation of the corresponding expert completes, the result is passed back to the original worker by all-to-all and reorganized into the original sequence for the computation of the next layer. Since MoE models usually have a large number of experts, the degree of expert parallelism grows with the model size more than that of model parallelism.
-
-    ![MoE Parallelism](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/MoE.png)
-
-    **Multi-copy parallelism**: The input data of the model is sliced along the batch-size dimension, turning the existing single copy into multiple copies, so that while one copy communicates at the underlying layer, the other copies can compute without waiting. This ensures that the computation and communication times of the copies overlap one another, improving model performance. In addition, splitting the data into multiple copies reduces the size of the operator inputs, which reduces the computation time of individual operators and further helps model performance.
-
-    ![multi-copy Parallelism](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/multi_copy.png)
-
-    **Optimizer parallelism**: When training in data parallel or operator-level parallel mode, the same copy of a model parameter may exist on multiple devices, which makes the optimizer perform redundant computation on these devices when updating that weight. In this case, the optimizer computation can be spread over the devices through optimizer parallelism. Its advantages are reduced static memory consumption and reduced computation within the optimizer; its disadvantage is increased communication overhead. For detailed information, refer to [Optimizer Parallelism](https://www.mindspore.cn/tutorials/en/master/parallel/optimizer_parallel.html).
-
-### Semi-automatic Parallel Code
-
-1. Tensor layout model
-
-    [tensor_layout](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/tensor_layout): This directory contains the definitions and implementations of functions related to the tensor distribution model. `tensor_layout.h` declares the member variables `tensor_map_origin_`, `tensor_shape_`, and `device_arrangement_` required by a tensor distribution model. In `tensor_redistribution.h`, the related methods for implementing the transformation between the `from_origin_` and `to_origin_` tensor distributions are declared. The deduced redistribution operations are stored in `operator_list_` and returned; in addition, the communication cost `comm_cost_`, memory cost `memory_cost_`, and computation cost `computation_cost_` required for the redistribution are calculated.
-
-2. Distributed operators
-
-    [ops_info](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/ops_info): This directory contains the implementation of distributed operators. In `operator_info.h`, the base class `OperatorInfo` of the distributed operator implementation is defined. A distributed operator to be developed shall inherit this base class and explicitly implement the relevant virtual functions. The `InferTensorInfo`, `InferTensorMap`, and `InferDevMatrixShape` functions define the algorithms for deriving the distribution model of the operator's input and output tensors. The `InferForwardCommunication` and `InferMirrorOps` functions define the extra computation and communication operations to be inserted for operator sharding. The `CheckStrategy` and `GenerateStrategies` functions define the validation and generation of shard strategies for the operator. According to the shard strategy, `SetCostUnderStrategy` generates the parallel cost `operator_cost_` of the distributed operator.
-
-3. Device management
-
-    [device_manager.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/device_manager.h): This file is used to create and manage cluster device communication groups. The device matrix model is defined in `device_matrix.h`, and the communication domains are managed by `group_manager.h`.
-
-4. Entire graph sharding
-
-    [step_auto_parallel.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/step_auto_parallel.h) and [step_parallel.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/step_parallel.h): These two files contain the core implementation of the automatic parallel process. `step_auto_parallel.h` calls the strategy search process and generates the `OperatorInfo` of the distributed operators. Then, in `step_parallel.h`, processes such as operator sharding and tensor redistribution are handled to reconstruct the standalone computational graph in distributed mode.
-
-5. Backward propagation of communication operators
-
-    [grad_comm_ops.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/ops/_grad_experimental/grad_comm_ops.py): This file defines the backward propagation of communication operators, such as `AllReduce` and `AllGather`.
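-
-At the script level, all of this machinery is driven by per-operator shard strategies. The following is a minimal sketch (toy shapes, assuming an 8-card job launched with a distributed launcher), not a full training script:
-
-```python
-import mindspore as ms
-from mindspore import nn, ops
-from mindspore.communication import init
-
-init()
-ms.set_auto_parallel_context(parallel_mode="semi_auto_parallel", device_num=8)
-
-class Net(nn.Cell):
-    def __init__(self):
-        super().__init__()
-        # slice the row dimension of the input over 2 devices and the
-        # column dimension of the weight over 4 devices: hybrid parallelism
-        self.matmul = ops.MatMul().shard(((2, 1), (1, 4)))
-
-    def construct(self, x, w):
-        return self.matmul(x, w)
-```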
-
-## Fully Automatic Parallelism
-
-Semi-automatic parallelism frees users from the complexity of distributed code development and greatly reduces the difficulty of developing distributed AI large models. Although users no longer need to consider data storage and communication between devices, they still need to specify an appropriate shard strategy for each operator, because the training performance of different shard strategies varies greatly. Users must still have appropriate parallelism knowledge and perform computational analysis based on the network structure, cluster topology, and so on, in order to define suitable parallel strategies in a huge search space. In reality, the main users of AI frameworks are AI researchers and engineers, who may not have specialized parallelism knowledge. Furthermore, facing a huge search space, finding a good parallel strategy for a large model can require months of manual tuning and still does not guarantee an optimal strategy. For example, the expert-customized strategies for transformer-like networks in DeepSpeed, Megatron, and others still require user-defined configuration of dp, mp, pp, and so on, not to mention that real network models contain more than the transformer structure. For these two reasons, MindSpore provides a variety of automatic hybrid parallel strategy generation schemes to minimize the user's exposure to parallel configuration and allow users to train large models quickly, efficiently, and easily.
-
-This subsection describes how the `ParallelMode.AUTO_PARALLEL` fully automatic parallel mode works in MindSpore.
-
-### Feature Design
-
-Fully automatic parallelism is built on the MindSpore semi-automatic framework, replacing the expert configuration of parallel strategies with automatic hybrid parallel strategy generation algorithms. The following figure shows the process of using MindSpore to train or run inference on a neural network in a distributed manner. Users develop their neural network models in Python (or import MindIR), which MindSpore parses into computational graphs (ANF graphs). The automatic hybrid parallel strategy generation module searches for a good strategy through its algorithms and passes it to the semi-automatic parallel module, which performs tensor distribution analysis, distributed operator analysis, and device management, slices the whole graph, and passes the result to the backend for computation.
-
-In fact, the hybrid parallel strategy generation module is responsible for finding a suitable parallel shard strategy for a given neural network model and cluster configuration. The key technology is a strategy search algorithm based on a cost model: a cost model is constructed to describe the computation cost and communication cost of the distributed training scenario, and, with memory cost as a constraint, a better-performing parallel strategy is searched for efficiently through a computational-graph search algorithm.
-
-![Fully automatic Parallelism](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto.png)
-
-### Three Search Algorithms
-
-Fully automatic parallelism is very difficult to implement, and MindSpore classifies the provided strategy generation algorithms into L1 and L2 levels according to the degree of user intervention required (here we take the manually configured full-graph strategy, SEMI_AUTO, as L0 level, and a scheme requiring no user participation at all as L3 level).
-
-The strategy generation algorithm at the L1 level is called Strategy Broadcast (Sharding Propagation). In this mode, the user only needs to manually define the strategies of a few key operators, and the strategies of the remaining operators in the computational graph are generated automatically by the algorithm. Because the strategies of the key operators are already defined, the cost model of the algorithm mainly describes the redistribution cost between operators, and the optimization objective is to minimize the redistribution cost of the whole graph. Since the main operator strategies are given, the search space is effectively compressed and the search time is short; however, the resulting performance depends on the definition of the key operator strategies, so the user still needs the ability to analyze and define strategies. Refer to [Sharding Propagation](https://www.mindspore.cn/docs/en/master/features/parallel/auto_parallel.html) for detailed information.
-
-There are two L2-level strategy generation algorithms: Dynamic Programming and the Symbolic Automatic Parallel Planner (SAPP for short). Both methods have their advantages and disadvantages. The dynamic programming algorithm is able to find the optimal strategy described by its cost model, but it takes a long time to search parallel strategies for huge networks. The SAPP algorithm is able to generate optimal strategies instantaneously for huge networks and large-scale slicing.
-The core idea of the dynamic programming algorithm is to build a cost model of the full graph, including computation cost and communication cost, to describe the absolute delay of the distributed training process, and to compress the search time using equivalence methods such as edge elimination and point elimination. However, the search space actually grows exponentially with the number of devices and operators, so it is not efficient for large models on large clusters.
-SAPP is modeled on parallelism principles: it builds an abstract machine describing the hardware cluster topology and optimizes the cost model through symbolic simplification. Its cost model compares not the predicted absolute delays but the relative costs of different parallel strategies, so it can greatly compress the search space and guarantees minute-level search times for 100-card clusters.
-
-Sharding Propagation and SAPP currently support manually defined pipeline parallelism combined with automatic operator-level parallelism, and they can be used together with optimizations such as recomputation and optimizer parallelism. The Dynamic Programming algorithm only supports automatic operator-level parallelism.
-
-### Fully Automatic Parallelism Code
-
-**Strategy Search Algorithm**: The [auto_parallel](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/auto_parallel) directory implements the algorithms for strategy search. `graph_costmodel.h` defines the graph composition information, where each vertex represents an operator `OperatorInfo` and each directed edge (`edge_costmodel.h`) represents the input-output relation between operators together with the redistribution cost. The cost model of each operator is defined in `operator_costmodel.h`, including the computation cost, communication cost, and memory cost. The data structures for the costs and the graph operations are defined in `costmodel.h`.
-
-- **dynamic_programming**: [dp_algo_costmodel.cc](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/auto_parallel/dp_algo_costmodel.cc) mainly describes the main flow of the dynamic programming algorithm, which consists of a series of graph operations.
-- **sharding_propagation**: [graph_costmodel.cc](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/auto_parallel/graph_costmodel.cc) implements Strategy Broadcast (Sharding Propagation), which mainly uses a BFS traversal to propagate the strategies of the few configured operators outward to the whole graph.
-- **symbolic_automatic_parallel_planner**: The [rec_core](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/auto_parallel/rec_core) directory implements the Symbolic Automatic Parallel Planner (SAPP).
-
-## Heterogeneous Parallelism
-
-The heterogeneous parallel training method analyzes the memory occupation and computational intensity of the operators in the graph, slices operators with huge memory consumption or those suitable for CPU logic processing into the CPU subgraph, and slices computation-intensive operators with low memory consumption into the hardware accelerator subgraph. The framework coordinates the different subgraphs for network training, so that subgraphs on different hardware with no dependencies between them can execute in parallel.
-
-### Computational Process
-
-A typical computational process of MindSpore heterogeneous parallel training is shown in the following figure:
-
-![heterogeneous-heter](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/heter.png)
-
-1. Users set the backend for network execution
-
-    ```python
-    import mindspore as ms
-    ms.set_device(device_target="GPU")
-    ```
-
-2. Users set the execution backend of specific operators
-
-    ```python
-    from mindspore import ops
-
-    prim = ops.Add()
-
-    prim.set_device("CPU")
-    ```
-
-3. The framework slices the computational graph according to the operator flags.
-4. The framework schedules the different backend execution subgraphs.
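-
-Putting the two settings together, a minimal sketch (toy network; operator and API usage exactly as in the steps above) looks like this:
-
-```python
-import mindspore as ms
-from mindspore import nn, ops
-
-ms.set_device(device_target="GPU")       # default backend for the network
-
-class HeterNet(nn.Cell):
-    def __init__(self):
-        super().__init__()
-        self.matmul = ops.MatMul()       # stays in the GPU subgraph
-        self.add = ops.Add()
-        self.add.set_device("CPU")       # sliced into the CPU subgraph
-
-    def construct(self, x, w, b):
-        return self.add(self.matmul(x, w), b)
-```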
-
-A current scenario that typically uses heterogeneous parallel computing is optimizer heterogeneity.
-
-### Optimizer Heterogeneity
-
-During the training of large models such as PanGu or GPT-3, the optimizer state takes up a large amount of memory, which in turn limits the size of the model that can be trained. With optimizer heterogeneity, assigning the optimizer to the CPU for execution can greatly scale up the trainable model:
-
-![heterogeneous-heter-opt](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/heter-opt.png)
-
-As shown in the figure, configuring the Adam operator to run on the CPU while specifying the accelerator for FP16 computation reduces the parameter memory footprint to 1/3 of the original.
-
-1. Configure the optimizer operators to run on the CPU
-2. Initialize the weight parameters in FP16 and the optimizer state variables in FP32
-3. Convert the gradients input to the optimizer to FP16 (if the gradients are already FP16, skip this step)
-4. Convert the weights and gradients to FP32 to participate in the optimizer computation
-5. Assign the updated FP32 weights back to the FP16 weights
-
-Sample code of the optimizer heterogeneity is as follows:
-
-```python
-import numpy as np
-import mindspore as ms
-import mindspore.ops as ops
-from mindspore.common.initializer import initializer
-from mindspore.nn import Optimizer
-
-_adam_opt = ops.MultitypeFuncGraph("adam_opt")
-host_assign = ops.Assign()
-host_assign.set_device("CPU")   # the assignment runs on the CPU (host)
-host_cast = ops.Cast()
-host_cast.set_device("CPU")     # host-side cast between FP16 and FP32
-device_cast = ops.Cast()        # device-side cast
-
-@_adam_opt.register("Function", "Tensor", "Tensor", "Tensor", "Tensor", "Number", "Tensor", "Tensor", "Tensor",
-                    "Tensor", "Bool", "Bool")
-def _update_run_kernel(opt, beta1, beta2, eps, lr, weight_decay, param, m, v, gradient, decay_flags, optim_filter):
-    """
-    Update parameters by the AdamWeightDecay op.
-    """
-    success = True
-    if optim_filter:
-        param32 = host_cast(param, ms.float32)          # step 4: FP16 weight -> FP32
-        gradient = device_cast(gradient, ms.float32)
-        if decay_flags:
-            next_param = opt(param32, m, v, lr, beta1, beta2, eps, weight_decay, gradient)
-        else:
-            next_param = opt(param32, m, v, lr, beta1, beta2, eps, 0.0, gradient)
-        ret = host_assign(param, host_cast(ops.depend(param32, next_param), ops.dtype(param)))  # step 5: FP32 -> FP16
-        return ops.depend(success, ret)
-    return success
-
-class AdamWeightDecayOp(Optimizer):
-    def __init__(self, params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.0):
-        super(AdamWeightDecayOp, self).__init__(learning_rate, params, weight_decay)
-        self.beta1 = ms.Tensor(np.array([beta1]).astype(np.float32))
-        self.beta2 = ms.Tensor(np.array([beta2]).astype(np.float32))
-        self.eps = ms.Tensor(np.array([eps]).astype(np.float32))
-        self.moments1 = self.clone_param32(prefix="adam_m", init='zeros')   # FP32 optimizer states
-        self.moments2 = self.clone_param32(prefix="adam_v", init='zeros')
-        self.opt = ops.AdamWeightDecay()
-        self.hyper_map = ops.HyperMap()
-        self.opt.set_device("CPU")      # step 1: the optimizer operator executes on the CPU
-
-    def construct(self, gradients):
-        """AdamWeightDecayOp"""
-        lr = self.get_lr()
-        if self.is_group:
-            if self.is_group_lr:
-                optim_result = self.map_reverse(ops.partial(_adam_opt, self.opt, self.beta1, self.beta2, self.eps),
-                                                lr, self.weight_decay, self.parameters, self.moments1, self.moments2,
-                                                gradients, self.decay_flags, self.optim_filter)
-            else:
-                optim_result = self.map_reverse(ops.partial(_adam_opt, self.opt, self.beta1, self.beta2, self.eps, lr),
-                                                self.weight_decay, self.parameters, self.moments1, self.moments2,
-                                                gradients, self.decay_flags, self.optim_filter)
-        else:
-            optim_result = self.map_reverse(ops.partial(_adam_opt, self.opt, self.beta1, self.beta2, self.eps, lr,
-                                                        self.weight_decay), self.parameters, self.moments1, self.moments2,
-                                            gradients, self.decay_flags, self.optim_filter)
-        return optim_result
-
-    def clone_param32(self, prefix, init=None):
-        new = []
-        for old_param in self.parameters:
-            param_init = init
-            if init is None:
-                param_init = old_param.init
-            new_state = old_param.clone()
-            new_state.set_dtype(ms.float32)
-            new_state.set_data(initializer(param_init, shape=old_param.shape, dtype=ms.float32))
-            new_state.name = prefix + '.' + new_state.name
-            new.append(new_state)
-        return ms.ParameterTuple(new)
-```
-
-Steps 4 and 5 can also be fused directly into the optimizer operator for further optimization. The complete optimizer heterogeneous training process can be found at: .
-
-### Constraints
-
-Currently, the user needs to specify the backend on which an operator executes; the backend does not support automatic configuration based on the network.
\ No newline at end of file
diff --git a/docs/mindspore/source_en/design/index.rst b/docs/mindspore/source_en/design/index.rst
deleted file mode 100644
index 05252bf9572ffcc35ada8af343c3f162fa1c5ca6..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_en/design/index.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-Design Concept
-=========================
-
-.. toctree::
-   :glob:
-   :maxdepth: 1
-
-   overview
-   programming_paradigm
-   distributed_training_design
-   data_engine
-   multi_level_compilation
-   all_scenarios
-   pluggable_device
diff --git a/docs/mindspore/source_en/design/programming_paradigm.md b/docs/mindspore/source_en/design/programming_paradigm.md
deleted file mode 100644
index 7603fbc1547353d1154422de6bb48f1fc78fd2fd..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_en/design/programming_paradigm.md
+++ /dev/null
@@ -1,457 +0,0 @@
-# Functional and Object-Oriented Fusion Programming Paradigm
-
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/programming_paradigm.md)
-
-Programming paradigm refers to the programming style or approach of a programming language. Typically, AI frameworks rely on the programming paradigm of the language used by the front-end programming interface for constructing and training neural networks. As a framework that unifies AI and scientific computing, MindSpore provides object-oriented programming and functional programming support for AI and scientific computing scenarios, respectively. To enhance flexibility and ease of use, MindSpore further proposes a fused functional + object-oriented programming paradigm, which effectively exploits the advantages of the functional automatic differentiation mechanism.
-
-The following describes each of the three programming paradigms supported by MindSpore, with simple examples.
-
-## Object-oriented Programming
-
-Object-oriented programming (OOP) is a programming method that decomposes programs into modules (classes) that encapsulate data and related operations, with objects being instances of classes. OOP uses objects as the basic unit of a program and encapsulates the program and data to improve the reusability, flexibility, and extensibility of software; the code in an object can access, and often modify, the data associated with that object.
-
-In a general programming scenario, code and data are the two core components. Object-oriented programming designs data structures for specific objects to define classes. A class usually consists of the following two parts, corresponding to code and data, respectively:
-
-- Methods
-- Attributes
-
-For different objects instantiated from the same class, the methods and attributes are the same; what differs is the values of the attributes. The different attribute values determine the internal state of an object, so OOP is well suited to state management.
-
-The following is an example of a simple class constructed in Python:
-
-```python
-class Sample: # class declaration
-    def __init__(self, name): # class constructor (code)
-        self.name = name # attribute (data)
-
-    def set_name(self, name): # method declaration (code)
-        self.name = name # method implementation (code)
-```
-
-For constructing a neural network, the primary component is the network layer (Layer), and a neural network layer contains the following components:
-
-- Tensor operations
-- Weights
-
-These two correspond exactly to the methods and attributes of a class, and the weights themselves are the internal state of the neural network layer, so using classes to construct layers naturally fits their definition.
-In addition, when programming we wish to stack neural network layers to construct deep neural networks, and with OOP, new Layer classes can be easily constructed by combining Layer objects.
-
-The following is an example of a neural network class constructed by using MindSpore:
-
-```python
-import mindspore
-from mindspore import nn, Parameter
-from mindspore.common.initializer import initializer
-import mindspore.ops as ops
-
-class Linear(nn.Cell):
-    def __init__(self, in_features, out_features, has_bias): # class constructor (code)
-        super().__init__()
-        self.weight = Parameter(initializer('normal', [out_features, in_features], mindspore.float32), 'weight') # layer weight (data)
-        self.bias = Parameter(initializer('zeros', [out_features], mindspore.float32), 'bias') # layer weight (data)
-
-    def construct(self, inputs): # method declaration (code)
-        output = ops.matmul(inputs, self.weight.transpose(0, 1)) # tensor transformation (code)
-        output = output + self.bias # tensor transformation (code)
-        return output
-```
-
-In addition to constructing neural network layers with the object-oriented programming paradigm, MindSpore supports pure object-oriented programming for constructing the neural network training logic, where the forward computation, back propagation, gradient optimization, and other operations of the neural network are all constructed using classes. The following is an example of pure object-oriented programming:
-
-```python
-import mindspore
-import mindspore.nn as nn
-from mindspore import value_and_grad
-
-class TrainOneStepCell(nn.Cell):
-    def __init__(self, network, optimizer):
-        super().__init__()
-        self.network = network
-        self.optimizer = optimizer
-        self.grad_fn = value_and_grad(self.network, None, self.optimizer.parameters)
-
-    def construct(self, *inputs):
-        loss, grads = self.grad_fn(*inputs)
-        self.optimizer(grads)
-        return loss
-
-network = nn.Dense(5, 3)
-loss_fn = nn.BCEWithLogitsLoss()
-network_with_loss = nn.WithLossCell(network, loss_fn)
-optimizer = nn.SGD(network.trainable_params(), 0.001)
-trainer = TrainOneStepCell(network_with_loss, optimizer)
-```
-
-At this point, both the neural network and its training process are managed by classes that inherit from `nn.Cell`, which can be easily compiled and accelerated as a computational graph.
-
-## Functional Programming
-
-Functional programming is a programming paradigm that treats computation as the evaluation of functions and avoids the use of program state and mutable objects.
-
-In functional programming, functions are treated as first-class citizens, which means they can be bound to names (including local identifiers), passed as arguments, and returned from other functions, just like any other data type. This allows programs to be written in a declarative and composable style, where small functions are combined in a modular fashion. Functional programming is sometimes seen as synonymous with pure functional programming, a subset of functional programming that treats all functions as deterministic mathematical functions, or pure functions. When a pure function is called with some given parameters, it always returns the same result and is not affected by any mutable state or other side effects.
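-
-As a toy illustration (plain Python, not MindSpore-specific), compare a pure function with one that depends on mutable state:
-
-```python
-state = {"scale": 2.0}
-
-def impure_scale(x):
-    return x * state["scale"]   # result changes if the external state mutates
-
-def pure_scale(x, scale):
-    return x * scale            # depends only on its inputs; same inputs, same output
-```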
-
-Functional programming has two core features that make it well suited to the needs of scientific computing:
-
-1. The semantics of a programming function are exactly equivalent to the semantics of a mathematical function.
-2. Determinism: given the same input, it returns the same output, with no side effects.
-
-Because of this determinism and the limiting of side effects, programs have fewer errors, are easier to debug and test, and are better suited to formal verification.
-
-MindSpore provides pure functional programming support. With the numerical computation interfaces provided by [mindspore.numpy](https://www.mindspore.cn/docs/en/master/api_python/mindspore.numpy.html) and [mindspore.scipy](https://www.mindspore.cn/docs/en/master/api_python/mindspore.scipy.html), you can easily program scientific computations. The following is an example of using functional programming:
-
-```python
-import mindspore.numpy as mnp
-from mindspore import grad
-
-grad_tanh = grad(mnp.tanh)
-print(grad_tanh(2.0))
-# 0.070650816
-
-print(grad(grad(mnp.tanh))(2.0))
-print(grad(grad(grad(mnp.tanh)))(2.0))
-# -0.13621868
-# 0.25265405
-```
-
-In line with the needs of the functional programming paradigm, MindSpore provides a variety of function transformation interfaces, including automatic differentiation, automatic vectorization, automatic parallelism, just-in-time compilation and data sinking, briefly described below:
-
-- Automatic differentiation: [grad](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.grad.html) and [value_and_grad](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.value_and_grad.html), providing differential function transformations.
-- Automatic vectorization: [vmap](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.vmap.html), a higher-order function for mapping a function fn along a parameter axis.
-- Automatic parallelism: [shard](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.Primitive.html#mindspore.ops.Primitive.shard), functional operator sharding, which specifies the distribution strategy of the function's input/output tensors.
-- Just-in-time compilation: [jit](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.jit.html), which compiles a Python function into a callable MindSpore graph.
-- Data sinking: [data_sink](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.data_sink.html), which transforms the input function into one that can use the data sinking pattern.
-
-Based on these function transformation interfaces, function transformations can be applied quickly and efficiently to implement complex features under the functional programming paradigm.
-
-## Functional Differential Programming
-
-### Automatic Differentiation
-
-Modern AI algorithms, such as deep learning, use large amounts of data to learn and fit an optimized model with parameters. This training process often uses loss back-propagation to update parameters, and **automatic differentiation (AD)** is one of its key techniques.
-
-Automatic differentiation is a derivation method between numerical differentiation and symbolic differentiation. The key idea of AD is to decompose the computation of a program into a finite set of basic operations whose derivatives are all known. After computing the derivatives of all the basic operations, AD uses the chain rule to combine them and obtain the final gradient.
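-
-To make this decomposition-plus-chain-rule idea concrete, here is a minimal plain-Python sketch of the forward-mode flavor of AD using dual numbers; the `Dual` class and the `sin` helper below are hypothetical illustrations of the idea, not MindSpore internals:
-
-```python
-import math
-
-class Dual:
-    """Carries a value together with its derivative through basic operations."""
-    def __init__(self, val, dot):
-        self.val, self.dot = val, dot  # f(x) and f'(x)
-
-    def __add__(self, other):  # sum rule: (u + v)' = u' + v'
-        return Dual(self.val + other.val, self.dot + other.dot)
-
-    def __mul__(self, other):  # product rule: (u * v)' = u'v + uv'
-        return Dual(self.val * other.val,
-                    self.dot * other.val + self.val * other.dot)
-
-def sin(u):  # chain rule for a basic operation: sin(u)' = cos(u) * u'
-    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)
-
-x = Dual(2.0, 1.0)   # seed with dx/dx = 1
-y = x * x + sin(x)   # y = x^2 + sin(x)
-print(y.val, y.dot)  # y(2) and y'(2) = 2*2 + cos(2)
-```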
-
-The chain rule is:
-
-$$
-(f\circ g)^{'}(x)=f^{'}(g(x))g^{'}(x) \tag{1}
-$$
-
-Based on how the gradients of the basic components are connected, AD can be divided into **forward mode AD** and **reverse mode AD**:
-
-- Forward automatic differentiation (also known as tangent linear mode AD), or forward cumulative gradient (forward mode).
-- Reverse automatic differentiation (also known as adjoint mode AD), or reverse cumulative gradient (reverse mode).
-
-Let's take formula (2) as an example to introduce the concrete calculation of forward and reverse differentiation:
-
-$$
-y=f(x_{1},x_{2})=\ln(x_{1})+x_{1}x_{2}-\sin(x_{2}) \tag{2}
-$$
-
-When we use forward automatic differentiation to compute $\frac{\partial y}{\partial x_{1}}$ of formula (2) at $x_{1}=2, x_{2}=5$, the direction of differentiation is consistent with the evaluation direction of the original function, and the original function result and the differential result can be obtained at the same time.
-
-![image](./images/forward_ad.png)
-
-When using reverse automatic differentiation, the direction of differentiation is opposite to the evaluation direction of the original function, and the differential result depends on the running result of the original function.
-
-![image](./images/backward_ad.png)
-
-MindSpore first developed automatic differentiation based on the reverse mode, and implemented forward differentiation on the basis of this method.
-
-To further explain the differences between forward mode AD and reverse mode AD, we generalize the derived function to $F$, with $N$ inputs and $M$ outputs:
-
-$$
-(Y_{1},Y_{2},...,Y_{M})=F(X_{1},X_{2},...,X_{N}) \tag{3}
-$$
-
-The gradient of the function $F$ is a Jacobian matrix:
-
-$$
-\left[
-    \begin{matrix}
-        \frac{\partial Y_{1}}{\partial X_{1}}& ... & \frac{\partial Y_{1}}{\partial X_{N}} \\
-        ... & ... & ... \\
-        \frac{\partial Y_{M}}{\partial X_{1}} & ... & \frac{\partial Y_{M}}{\partial X_{N}}
-    \end{matrix}
-\right]
-\tag{4}
-$$
-
-#### Forward Mode AD
-
-In forward mode AD, the calculation of the gradient starts from the inputs. So, for each pass, we get the gradient of all outputs with respect to one input, which is one column of the Jacobian matrix:
-
-$$
-\left[
-    \begin{matrix}
-        \frac{\partial Y_{1}}{\partial X_{1}}\\
-        ... \\
-        \frac{\partial Y_{M}}{\partial X_{1}}
-    \end{matrix}
-\right]
-\tag{5}
-$$
-
-To get this value, AD divides the program into a series of basic operations whose gradient rules are known. Each basic operation can itself be represented as a function $f$ with $n$ inputs and $m$ outputs:
-
-$$
-(y_{1},y_{2},...,y_{m})=f(x_{1},x_{2},...,x_{n}) \tag{6}
-$$
-
-Since we have defined the gradient rule of $f$, its Jacobian matrix is known. We can therefore calculate the Jacobian-vector-product (Jvp) and use the chain rule to get the gradient output:
-
-$$
-\left[
-    \begin{matrix}
-        \frac{\partial y_{1}}{\partial X_{i}}\\
-        ... \\
-        \frac{\partial y_{m}}{\partial X_{i}}
-    \end{matrix}
-\right]=\left[
-    \begin{matrix}
-        \frac{\partial y_{1}}{\partial x_{1}}& ... & \frac{\partial y_{1}}{\partial x_{n}} \\
-        ... & ... & ... \\
-        \frac{\partial y_{m}}{\partial x_{1}} & ... & \frac{\partial y_{m}}{\partial x_{n}}
-    \end{matrix}
-\right]\left[
-    \begin{matrix}
-        \frac{\partial x_{1}}{\partial X_{i}}\\
-        ... \\
-        \frac{\partial x_{n}}{\partial X_{i}}
-    \end{matrix}
-\right]
-\tag{7}
-$$
-
-#### Reverse Mode AD
-
-In reverse mode AD, the calculation of the gradient starts from the outputs. So, for each pass, we get the gradient of one output with respect to all inputs, which is one row of the Jacobian matrix:
-
-$$
-\left[
-    \begin{matrix}
-        \frac{\partial Y_{1}}{\partial X_{1}}& ... & \frac{\partial Y_{1}}{\partial X_{N}} \\
-    \end{matrix}
-\right]
-\tag{8}
-$$
-
-To get this value, AD divides the program into a series of basic operations whose gradient rules are known. Each basic operation can itself be represented as a function $f$ with $n$ inputs and $m$ outputs:
-
-$$
-(y_{1},y_{2},...,y_{m})=f(x_{1},x_{2},...,x_{n}) \tag{9}
-$$
-
-Since we have defined the gradient rule of $f$, its Jacobian matrix is known. We can therefore calculate the Vector-Jacobian-product (Vjp) and use the chain rule to get the gradient output:
-
-$$
-\left[
-    \begin{matrix}
-        \frac{\partial Y_{j}}{\partial x_{1}}& ... & \frac{\partial Y_{j}}{\partial x_{n}} \\
-    \end{matrix}
-\right]=\left[
-    \begin{matrix}
-        \frac{\partial Y_{j}}{\partial y_{1}}& ... & \frac{\partial Y_{j}}{\partial y_{m}} \\
-    \end{matrix}
-\right]\left[
-    \begin{matrix}
-        \frac{\partial y_{1}}{\partial x_{1}}& ... & \frac{\partial y_{1}}{\partial x_{n}} \\
-        ... & ... & ... \\
-        \frac{\partial y_{m}}{\partial x_{1}} & ... & \frac{\partial y_{m}}{\partial x_{n}}
-    \end{matrix}
-\right]
-\tag{10}
-$$
-
-### `grad` Implementation
-
-`grad` uses reverse mode AD, which calculates gradients starting from the network outputs.
-
-#### `grad` Design
-
-Assume that the original function defining the model is as follows:
-
-$$
-f(g(x, y, z)) \tag{11}
-$$
-
-Then the gradient of $f$ with respect to $x$ is:
-
-$$
-\frac{df}{dx}=\frac{df}{dg}\frac{dg}{dx}\frac{dx}{dx}+\frac{df}{dg}\frac{dg}{dy}\frac{dy}{dx}+\frac{df}{dg}\frac{dg}{dz}\frac{dz}{dx}\tag{12}
-$$
-
-The formulas for $\frac{df}{dy}$ and $\frac{df}{dz}$ are similar to that for $\frac{df}{dx}$.
-
-Based on the chain rule, we define a gradient function `bprop: dout->(df, dinputs)` for every function (including operators and graphs). Here, `df` denotes the gradients with respect to free variables (variables defined outside the function) and `dinputs` denotes the gradients with respect to the function inputs. We then use the total derivative rule to accumulate `(df, dinputs)` onto the corresponding variables.
-
-MindIR supports branching, loops and closures, so as long as we define the gradient rules correctly, we obtain the correct gradient.
-
-Defining the operator K, reverse mode AD can be represented as:
-
-```text
-v = (func, inputs)
-F(v): {
-    (result, bprop) = K(func)(inputs)
-    df, dinputs = bprop(dout)
-    v.df += df
-    v.dinputs += dinputs
-}
-```
-
-#### `grad` Algorithm Implementation
-
-In the `grad` process, the function whose gradient needs to be calculated is taken out and used as the input of the automatic differentiation module.
-
-The AD module maps the input function to a gradient function `fprop`.
-
-The output gradient function has the form `fprop = (forward_result, bprop)`. `forward_result` is the output node of the original function. `bprop` is the gradient function; it relies on the closure object of `fprop` and has only one input, `dout`, while `inputs` and `outputs` are the inputs and outputs of `fprop` at its call site.
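-
-The shape of `fprop` can be illustrated with a small plain-Python sketch; `fprop_sin` below is a hypothetical illustration of the `(forward_result, bprop)` form, not MindSpore internals:
-
-```python
-import math
-
-def fprop_sin(x):
-    """Returns the forward result together with its gradient closure."""
-    result = math.sin(x)
-
-    def bprop(dout):
-        # d(sin x)/dx = cos(x), scaled by the incoming gradient dout
-        return (dout * math.cos(x),)  # tuple of gradients w.r.t. the inputs
-
-    return result, bprop
-
-out, bprop = fprop_sin(2.0)
-print(out)         # sin(2.0)
-print(bprop(1.0))  # (cos(2.0),)
-```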
-
-At the framework level, the AD module performs the following steps:
-
-```c++
-MapObject(); // Map ValueNode/Parameter/FuncGraph/Primitive objects
-MapMorphism(); // Map the CNode morphism
-res = k_graph(); // res is the fprop object of the gradient function
-```
-
-When generating the gradient function object, we need to perform a series of mappings from the original function to the gradient function. These mappings generate the gradient function nodes, which are then connected according to the reverse mode AD rules.
-
-For each subgraph of the original function, we create a `DFunctor` object that maps the original function object to a gradient function object. `DFunctor` runs `MapObject` and `MapMorphism` to perform the mapping.
-
-`MapObject` implements the mapping from original function nodes to gradient function nodes, including the mapping of free variables, parameter nodes, and ValueNodes.
-
-```c++
-MapFvObject(); // map free variables
-MapParamObject(); // map parameters
-MapValueObject(); // map ValueNodes
-```
-
-- `MapFvObject` maps free variables.
-- `MapParamObject` maps parameter nodes.
-- `MapValueObject` mainly maps `Primitive` and `FuncGraph` objects.
-
-For a `FuncGraph`, we need to create another `DFunctor` object and perform the mapping, which is a recursive process. A `Primitive` defines the type of an operator, and we need to define a gradient function for every `Primitive`.
-
-MindSpore defines these gradient functions in Python; take the `sin` operator as an example:
-
-```python
-import mindspore.ops as ops
-from mindspore.ops._grad.grad_base import bprop_getters
-
-@bprop_getters.register(ops.Sin)
-def get_bprop_sin(self):
-    """Grad definition for `Sin` operation."""
-    cos = ops.Cos()
-
-    def bprop(x, out, dout):
-        dx = dout * cos(x)
-        return (dx,)
-
-    return bprop
-```
-
-`x` is the input of the original function object `sin`, `out` is the output of the original function object `sin`, and `dout` is the gradient accumulated so far.
-
-When `MapObject` has completed the mapping of the above nodes, `MapMorphism` recursively implements the state injection of `CNode` starting from the output node of the original function, establishes back-propagation links between nodes, and realizes gradient accumulation.
-
-#### `grad` Example
-
-Let's build a simple network to represent the formula:
-
-$$
-f(x) = \cos(\sin(x)) \tag{13}
-$$
-
-And differentiate formula (13) with respect to the input `x`:
-
-$$
-f'(x) = -\sin(\sin(x)) * \cos(x) \tag{14}
-$$
-
-The network in formula (13) is implemented in MindSpore as follows:
-
-```python
-import mindspore.nn as nn
-import mindspore.ops as ops
-
-class Net(nn.Cell):
-    def __init__(self):
-        super(Net, self).__init__()
-        self.sin = ops.Sin()
-        self.cos = ops.Cos()
-
-    def construct(self, x):
-        a = self.sin(x)
-        out = self.cos(a)
-        return out
-```
-
-The structure of the forward network is:
-
-![auto-gradient-foward](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png)
-
-After reverse differentiation of the network, the resulting differential network structure is:
-
-![auto-gradient-forward2](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png)
-
-### Forward Automatic Differentiation Implementation
-
-Besides `grad`, MindSpore provides the forward mode automatic differentiation method `jvp` (Jacobian-Vector-Product).
-
-Compared to reverse mode AD, forward mode AD is better suited to networks whose input dimension is smaller than their output dimension.
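-
-As a usage sketch, forward mode AD can be invoked through the functional `jvp` interface roughly as follows; treat the exact import path and signature as an assumption that may vary across MindSpore versions:
-
-```python
-import numpy as np
-from mindspore import Tensor, jvp
-import mindspore.ops as ops
-
-def fn(x):
-    return ops.sin(x)
-
-x = Tensor(np.array(2.0, np.float32))
-v = Tensor(np.array(1.0, np.float32))  # tangent vector seeding the input
-
-# returns the primal output together with the Jacobian-vector product
-out, tangent = jvp(fn, x, v)
-print(out, tangent)  # sin(2.0), cos(2.0)
-```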
MindSpore forward mode AD is developed on the basis of the reverse mode `grad` function.
-
-![auto-gradient-jvp](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png)
-
-The network in black is the original function. Taking the first derivative with respect to one input $x$ yields the network in blue; differentiating the blue network with respect to the vector $v$ then yields the network in yellow.
-
-This yellow network is the forward mode AD gradient network of the black network. Since the blue network is linear in the vector $v$, there are no edges from the blue network to the yellow network, so all the blue nodes are dangling nodes. The gradient can therefore be computed using only the blue and yellow nodes.
-
-#### References
-
-[1] Baydin, A.G. et al., 2018. [Automatic differentiation in machine learning: A survey](https://arxiv.org/abs/1502.05767). arXiv.org. [Accessed September 1, 2021].
-
-## Functional + Object-Oriented Fusion Programming
-
-Taking into account the flexibility and ease of use of neural network construction and training, and combining them with MindSpore's functional automatic differentiation mechanism, MindSpore has designed a functional + object-oriented fusion programming paradigm for AI model training that combines the advantages of object-oriented and functional programming. The same automatic differentiation mechanism serves both deep learning back propagation and scientific computing automatic differentiation, supporting the compatibility of AI and scientific computing modeling from the ground up. The following is a typical workflow of functional + object-oriented fusion programming:
-
-1. Construct the neural network with classes.
-2. Instantiate the neural network object.
-3. Construct the forward function, connecting the neural network and the loss function.
-4. Use function transformations to obtain the gradient calculation (back propagation) function.
-5. Construct the training step function.
-6. Call the training step function in a loop to train.
-
-The following is a simple example of functional + object-oriented fusion programming:
-
-```python
-# Class definition
-class Net(nn.Cell):
-    def __init__(self):
-        ......
-    def construct(self, inputs):
-        ......
-
-# Object instantiation
-net = Net() # network
-loss_fn = nn.CrossEntropyLoss() # loss function
-optimizer = nn.Adam(net.trainable_params(), lr) # optimizer
-
-# define forward function
-def forward_fn(inputs, targets):
-    logits = net(inputs)
-    loss = loss_fn(logits, targets)
-    return loss, logits
-
-# get grad function
-grad_fn = value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)
-
-# define train step function
-def train_step(inputs, targets):
-    (loss, logits), grads = grad_fn(inputs, targets) # get values and gradients
-    optimizer(grads) # update parameters
-    return loss, logits
-
-for i in range(epochs):
-    for inputs, targets in dataset():
-        loss, logits = train_step(inputs, targets)
-```
-
-As in the example above, object-oriented programming is used to construct the neural network, with the layers assembled in a manner consistent with the conventions of AI programming. For forward computation and back propagation, MindSpore uses functional programming: the forward computation is expressed as a function, `grad_fn` is obtained by function transformation, and the gradients with respect to the weights are obtained by executing `grad_fn`.
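-
-For reference, a self-contained, runnable version of the schematic above might look as follows; the network size, learning rate, epoch count and synthetic data are illustrative assumptions, not recommendations:
-
-```python
-import numpy as np
-import mindspore.nn as nn
-from mindspore import Tensor, value_and_grad
-
-class Net(nn.Cell):
-    def __init__(self):
-        super().__init__()
-        self.dense = nn.Dense(4, 3)  # a minimal stand-in for a real model
-
-    def construct(self, inputs):
-        return self.dense(inputs)
-
-net = Net()
-loss_fn = nn.CrossEntropyLoss()
-optimizer = nn.Adam(net.trainable_params(), 1e-3)
-
-def forward_fn(inputs, targets):
-    logits = net(inputs)
-    loss = loss_fn(logits, targets)
-    return loss, logits
-
-grad_fn = value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)
-
-def train_step(inputs, targets):
-    (loss, logits), grads = grad_fn(inputs, targets)
-    optimizer(grads)
-    return loss, logits
-
-# synthetic data in place of a real dataset
-inputs = Tensor(np.random.randn(8, 4).astype(np.float32))
-targets = Tensor(np.random.randint(0, 3, (8,)).astype(np.int32))
-for epoch in range(3):
-    loss, _ = train_step(inputs, targets)
-    print(loss)
-```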
-
-The functional + object-oriented fusion programming paradigm preserves the ease of use of class-based neural network construction while improving the flexibility of training steps such as forward computation and back propagation; it is the default programming paradigm recommended by MindSpore.
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_example.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_example.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_example.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_example.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_exec_order.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_exec_order.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_exec_order.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_exec_order.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_framework.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_framework.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_framework.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_framework.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_kernelselect.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_kernelselect.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_kernelselect.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_kernelselect.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_lazyinline.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_lazyinline.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_lazyinline.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_lazyinline.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_memory_manage.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_memory_manage.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_memory_manage.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_memory_manage.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_memory_pool.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_memory_pool.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_memory_pool.png
rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_memory_pool.png
diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_multi_stream.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_multi_stream.png
similarity index 100%
rename from
docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_multi_stream.png rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_multi_stream.png diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_no_task.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_no_task.png similarity index 100% rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_no_task.png rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_no_task.png diff --git a/docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_rt_pipeline.png b/docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_rt_pipeline.png similarity index 100% rename from docs/mindspore/source_en/design/images/multi_level_compilation/jit_level_rt_pipeline.png rename to docs/mindspore/source_en/features/compile/images/multi_level_compilation/jit_level_rt_pipeline.png diff --git a/docs/mindspore/source_en/design/multi_level_compilation.md b/docs/mindspore/source_en/features/compile/multi_level_compilation.md similarity index 99% rename from docs/mindspore/source_en/design/multi_level_compilation.md rename to docs/mindspore/source_en/features/compile/multi_level_compilation.md index f37b95816a434dafed7e124401ef646d188e07b5..c6282d3283987a6624a08b93a5d0b6cb795611cc 100644 --- a/docs/mindspore/source_en/design/multi_level_compilation.md +++ b/docs/mindspore/source_en/features/compile/multi_level_compilation.md @@ -1,6 +1,6 @@ # Multi-Level Compilation Architecture -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/multi_level_compilation.md) +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/features/compile/multi_level_compilation.md) ## Background diff --git a/docs/mindspore/source_en/design/data_engine.md b/docs/mindspore/source_en/features/data_engine.md similarity index 99% rename from docs/mindspore/source_en/design/data_engine.md rename to docs/mindspore/source_en/features/data_engine.md index 3d05c642b8f8069a40a85503f719bb21be36d45a..ba9000f81187abfcdd973b06632ac33610a5299c 100644 --- a/docs/mindspore/source_en/design/data_engine.md +++ b/docs/mindspore/source_en/features/data_engine.md @@ -1,6 +1,6 @@ # High Performance Data Processing Engine -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/data_engine.md) +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/features/data_engine.md) ## Background Introduction diff --git a/docs/mindspore/source_en/design/images/arch_en.png b/docs/mindspore/source_en/features/images/arch_en.png similarity index 100% rename from docs/mindspore/source_en/design/images/arch_en.png rename to docs/mindspore/source_en/features/images/arch_en.png diff --git a/docs/mindspore/source_en/design/images/auto_parallel.png 
b/docs/mindspore/source_en/features/images/auto_parallel.png similarity index 100% rename from docs/mindspore/source_en/design/images/auto_parallel.png rename to docs/mindspore/source_en/features/images/auto_parallel.png diff --git a/docs/mindspore/source_en/design/images/backward_ad.png b/docs/mindspore/source_en/features/images/backward_ad.png similarity index 100% rename from docs/mindspore/source_en/design/images/backward_ad.png rename to docs/mindspore/source_en/features/images/backward_ad.png diff --git a/docs/mindspore/source_en/design/images/data/data_engine_en.png b/docs/mindspore/source_en/features/images/data/data_engine_en.png similarity index 100% rename from docs/mindspore/source_en/design/images/data/data_engine_en.png rename to docs/mindspore/source_en/features/images/data/data_engine_en.png diff --git a/docs/mindspore/source_en/design/images/data_parallel.png b/docs/mindspore/source_en/features/images/data_parallel.png similarity index 100% rename from docs/mindspore/source_en/design/images/data_parallel.png rename to docs/mindspore/source_en/features/images/data_parallel.png diff --git a/docs/mindspore/source_en/design/images/forward_ad.png b/docs/mindspore/source_en/features/images/forward_ad.png similarity index 100% rename from docs/mindspore/source_en/design/images/forward_ad.png rename to docs/mindspore/source_en/features/images/forward_ad.png diff --git a/docs/mindspore/source_en/design/images/ir/cf.dot b/docs/mindspore/source_en/features/images/ir/cf.dot similarity index 100% rename from docs/mindspore/source_en/design/images/ir/cf.dot rename to docs/mindspore/source_en/features/images/ir/cf.dot diff --git a/docs/mindspore/source_en/design/images/ir/cf.png b/docs/mindspore/source_en/features/images/ir/cf.png similarity index 100% rename from docs/mindspore/source_en/design/images/ir/cf.png rename to docs/mindspore/source_en/features/images/ir/cf.png diff --git a/docs/mindspore/source_en/design/images/ir/closure.dot b/docs/mindspore/source_en/features/images/ir/closure.dot similarity index 100% rename from docs/mindspore/source_en/design/images/ir/closure.dot rename to docs/mindspore/source_en/features/images/ir/closure.dot diff --git a/docs/mindspore/source_en/design/images/ir/closure.png b/docs/mindspore/source_en/features/images/ir/closure.png similarity index 100% rename from docs/mindspore/source_en/design/images/ir/closure.png rename to docs/mindspore/source_en/features/images/ir/closure.png diff --git a/docs/mindspore/source_en/design/images/ir/hof.dot b/docs/mindspore/source_en/features/images/ir/hof.dot similarity index 100% rename from docs/mindspore/source_en/design/images/ir/hof.dot rename to docs/mindspore/source_en/features/images/ir/hof.dot diff --git a/docs/mindspore/source_en/design/images/ir/hof.png b/docs/mindspore/source_en/features/images/ir/hof.png similarity index 100% rename from docs/mindspore/source_en/design/images/ir/hof.png rename to docs/mindspore/source_en/features/images/ir/hof.png diff --git a/docs/mindspore/source_en/design/images/ir/ir.dot b/docs/mindspore/source_en/features/images/ir/ir.dot similarity index 100% rename from docs/mindspore/source_en/design/images/ir/ir.dot rename to docs/mindspore/source_en/features/images/ir/ir.dot diff --git a/docs/mindspore/source_en/design/images/ir/ir.png b/docs/mindspore/source_en/features/images/ir/ir.png similarity index 100% rename from docs/mindspore/source_en/design/images/ir/ir.png rename to docs/mindspore/source_en/features/images/ir/ir.png diff --git 
a/docs/mindspore/source_en/design/images/operator_split.png b/docs/mindspore/source_en/features/images/operator_split.png similarity index 100% rename from docs/mindspore/source_en/design/images/operator_split.png rename to docs/mindspore/source_en/features/images/operator_split.png diff --git a/docs/mindspore/source_en/design/images/tensor_redistribution1.png b/docs/mindspore/source_en/features/images/tensor_redistribution1.png similarity index 100% rename from docs/mindspore/source_en/design/images/tensor_redistribution1.png rename to docs/mindspore/source_en/features/images/tensor_redistribution1.png diff --git a/docs/mindspore/source_en/design/images/tensor_redistribution2.png b/docs/mindspore/source_en/features/images/tensor_redistribution2.png similarity index 100% rename from docs/mindspore/source_en/design/images/tensor_redistribution2.png rename to docs/mindspore/source_en/features/images/tensor_redistribution2.png diff --git a/docs/mindspore/source_en/design/images/tensor_redistribution3.png b/docs/mindspore/source_en/features/images/tensor_redistribution3.png similarity index 100% rename from docs/mindspore/source_en/design/images/tensor_redistribution3.png rename to docs/mindspore/source_en/features/images/tensor_redistribution3.png diff --git a/docs/mindspore/source_en/features/index.rst b/docs/mindspore/source_en/features/index.rst index ad841b81d8da250af203ff9c511315bc4401f9e0..e1b343d2cfd052bba97411752937d2f2e24a7ed5 100644 --- a/docs/mindspore/source_en/features/index.rst +++ b/docs/mindspore/source_en/features/index.rst @@ -1,148 +1,22 @@ -Feature Description -================================= +Developer Notes +========================= .. toctree:: :glob: :maxdepth: 1 - :hidden: - :caption: Programming Forms - - program_form/overview - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: Data Processing - - dataset/overview - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: Distributed Parallelism + overview parallel/data_parallel parallel/operator_parallel parallel/optimizer_parallel parallel/pipeline_parallel parallel/auto_parallel - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: Compile - + compile/multi_level_compilation compile/graph_construction compile/graph_optimization - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: Runtime - runtime/memory_manager runtime/multilevel_pipeline runtime/multistream_concurrency runtime/pluggable_backend - -.. raw:: html - - - -.. 
raw:: html
-
-
-
\ No newline at end of file
+   runtime/pluggable_device
+   data_engine
\ No newline at end of file
diff --git a/docs/mindspore/source_en/design/overview.md b/docs/mindspore/source_en/features/overview.md
similarity index 99%
rename from docs/mindspore/source_en/design/overview.md
rename to docs/mindspore/source_en/features/overview.md
index 045abb5e20819ee586aeac99f2607889298fa14b..9a6f425b186b4958e94e6036d90394a324acd603 100644
--- a/docs/mindspore/source_en/design/overview.md
+++ b/docs/mindspore/source_en/features/overview.md
@@ -1,6 +1,6 @@
 # MindSpore Design Overview
 
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/overview.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/features/overview.md)
 
 ## Introduction
 
diff --git a/docs/mindspore/source_en/features/program_form/overview.md b/docs/mindspore/source_en/features/program_form/overview.md
deleted file mode 100644
index ca0b844c88674915ffde8c18663d54e081b66542..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_en/features/program_form/overview.md
+++ /dev/null
@@ -1,11 +0,0 @@
-# Programming Forms Overview
-
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/features/program_form/overview.md)
-
-MindSpore is an AI framework designed for "end-edge-cloud" full-scenario use. It provides users with interfaces for AI model development, training, and inference, and supports developing and debugging neural networks in native Python syntax. It offers dynamic graphs, static graphs, and a unified dynamic/static programming form, so that developers can balance development efficiency and execution performance.
-
-For development flexibility and ease of use, MindSpore supports a dynamic graph programming model. Based on the functional and `nn.Cell` interfaces provided by MindSpore, users can flexibly assemble the required network; the relevant interfaces are interpreted and executed in the manner of a Python function library and support differentiation, which makes them easy to debug and develop with. These interfaces can also dispatch work asynchronously to accelerator hardware to achieve heterogeneous acceleration.
-
-Meanwhile, on top of the dynamic graph mode, MindSpore provides the @jit decorator optimization capability: the function to be optimized is specified with the @jit decorator. The decorated part is parsed as a whole, constructed into a computational graph on the C++ side, globally analyzed, compiled and optimized, thereby accelerating the overall execution of the decorated part. This is called static acceleration.
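-
-As a minimal usage sketch (the operator choice and tensor shapes are illustrative assumptions, not prescriptions), a function on the critical path can be marked for graph compilation as follows:
-
-```python
-import numpy as np
-from mindspore import Tensor, jit
-import mindspore.ops as ops
-
-@jit  # the decorated function is parsed, compiled and optimized as one graph
-def dense_forward(x, w, b):
-    return ops.relu(ops.matmul(x, w) + b)
-
-x = Tensor(np.random.randn(4, 8).astype(np.float32))
-w = Tensor(np.random.randn(8, 2).astype(np.float32))
-b = Tensor(np.zeros(2).astype(np.float32))
-print(dense_forward(x, w, b).shape)  # (4, 2)
-```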
-
-In addition to the dynamic graph mode, MindSpore further provides a programming mode based on [static graphs](https://www.mindspore.cn/tutorials/en/master/compile/static_graph.html). The interfaces related to MindSpore model construction remain unchanged, and no @jit decoration needs to be added: the MindSpore framework compiles and parses, as a whole, everything defined in the `construct` function of an `nn.Cell` class, and constructs a complete static graph for the network to perform whole-graph compilation optimization and execution. This enables model-level optimization of the entire network, based on the characteristics of AI model training and inference, to obtain higher execution performance.
\ No newline at end of file
diff --git a/docs/mindspore/source_en/design/images/pluggable_device_arch.png b/docs/mindspore/source_en/features/runtime/images/pluggable_device_arch.png
similarity index 100%
rename from docs/mindspore/source_en/design/images/pluggable_device_arch.png
rename to docs/mindspore/source_en/features/runtime/images/pluggable_device_arch.png
diff --git a/docs/mindspore/source_en/design/pluggable_device.md b/docs/mindspore/source_en/features/runtime/pluggable_device.md
similarity index 94%
rename from docs/mindspore/source_en/design/pluggable_device.md
rename to docs/mindspore/source_en/features/runtime/pluggable_device.md
index e452fb85c6f15a7fc62bf111a717a3196ab99424..64e070e11bc9f274033e3c2694e5d3e8e63e3448 100644
--- a/docs/mindspore/source_en/design/pluggable_device.md
+++ b/docs/mindspore/source_en/features/runtime/pluggable_device.md
@@ -1,6 +1,6 @@
 # Third-Party Hardware Interconnection
 
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/design/pluggable_device.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/features/runtime/pluggable_device.md)
 
 MindSpore supports plug-in, standardized, low-cost and rapid interconnection of third-party chips through an open architecture:
 
@@ -11,7 +11,7 @@ MindSpore supports plug-in, standardized, low-cost and rapid interconnection of
 MindSpore overall architecture and components related to the backend are shown in the following figure:
 
-![image](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_en/design/images/pluggable_device_arch.png)
+![image](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_en/features/runtime/images/pluggable_device_arch.png)
 
 The overall MindSpore architecture consists of the following major components, which have interdependencies with each other:
 
@@ -46,7 +46,7 @@ The generic architecture Kernel mode requires the following aspects to be implem
 - Memory management. DeviceAddress is the abstraction of memory, and third-party chip vendors need to implement the function of copying between Host and Device. They also need to provide memory request and destruction functions. To facilitate third-party chip vendors, MindSpore provides a set of memory pool implementations and an efficient memory reuse algorithm, SOMAS, in the Common component.
 - Stream management. If the chip to be docked has the concept of streams, it needs to provide creation and destruction functions. If not, it will run in single-stream mode.
-![image](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/pluggable_device_kernel.png) +![image](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_kernel.png) ## Graph Mode Interconnection @@ -56,4 +56,4 @@ If the chip vendor's software stack can provide completely high level APIs, or i - Graph execution. The third-party chip vendor needs to understand MindSpore Tensor format or transform it into a format that can be understood, and call the execution of the ready graph and transform the result of the execution into MindSpore Tensor format. -![image](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/pluggable_device_graph.png) +![image](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_graph.png) diff --git a/docs/mindspore/source_zh_cn/design/all_scenarios.ipynb b/docs/mindspore/source_zh_cn/design/all_scenarios.ipynb deleted file mode 100644 index 11eb5f0c4b2afd48cb432f5b2bfb445845d354ca..0000000000000000000000000000000000000000 --- a/docs/mindspore/source_zh_cn/design/all_scenarios.ipynb +++ /dev/null @@ -1,282 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 全场景统一架构\n", - "\n", - "[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/all_scenarios.ipynb)\n", - "\n", - "MindSpore旨在提供端边云全场景的AI框架。MindSpore可部署于端、边、云不同的硬件环境,满足不同环境的差异化需求,如支持端侧的轻量化部署,支持云侧丰富的训练功能如自动微分、混合精度、模型易用编程等。\n", - "\n", - "> 云侧包括NVIDIA GPU、Huawei Ascend、Intel x86等,端侧包括Arm、Qualcomm、Kirin等。\n", - "\n", - "![intro](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png)\n", - "\n", - "## 全场景重要特性\n", - "\n", - "MindSpore全场景的几个重要特性:\n", - "\n", - "1. 端边云统一的C++推理接口。支持算法代码可快速迁移到不同硬件环境执行,如[基于C++接口实现端侧训练](https://mindspore.cn/lite/docs/zh-CN/master/quick_start/train_lenet.html)。\n", - "2. 模型统一。端云使用相同的模型格式和定义,软件架构一致。MindSpore支持Ascend、GPU、CPU(x86、Arm)等多种硬件的执行,一次训练多处部署使用。\n", - "3. 多样化算力支持。提供统一的南向接口,支持新硬件的快捷添加使用。\n", - "4. 模型小型化技术。适配不同硬件环境和业务场景的要求,如量化压缩等。\n", - "5. 
端边云协同技术的快速应用。如[联邦学习](https://mindspore.cn/federated/docs/zh-CN/master/index.html)、[端侧训练](https://mindspore.cn/lite/docs/zh-CN/master/use/runtime_train.html)等新技术。\n", - "\n", - "## 全场景支持模式\n", - "\n", - "![train-process](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/all_scenarios_train_process.png)\n", - "\n", - "如上图所示,在MindSpore上训练出来的模型文件,可通过Serving部署在云服务中执行,也可用过Lite执行在服务器、端侧等设备上。同时,Lite支持通过独立工具convert进行模型的离线优化,实现推理时框架的轻量化以及模型执行高性能的目标。\n", - "\n", - "MindSpore抽象各个硬件下的统一算子接口。因此,在不同硬件环境下,网络模型的编程代码可以保持一致。同时,加载相同的模型文件,在MindSpore支持的各个不同硬件上均能有效执行推理。\n", - "\n", - "推理方面考虑到大量用户使用C++/C编程方式,提供了C++的推理编程接口,相关编程接口在形态上与Python接口的风格较接近。\n", - "\n", - "同时,通过提供第三方硬件的自定义离线优化注册,第三方硬件的自定义算子注册机制,实现快速对接新的硬件,且对外的模型编程接口以及模型文件保持不变。\n", - "\n", - "## 中间表示MindIR\n", - "\n", - "### 简介\n", - "\n", - "中间表示(IR)是程序编译过程中介于源语言和目标语言之间的程序表示,以方便编译器进行程序分析和优化。因此,IR的设计需要考虑从源语言到目标语言的转换难度,同时考虑程序分析和优化的易用性和性能。\n", - "\n", - "MindIR是一种基于图表示的函数式IR,其最核心的目的是服务于自动微分变换。自动微分采用的是基于函数式编程框架的变换方法,因此,IR采用了接近于ANF函数式的语义。此外,借鉴Sea of Nodes[1]和Thorin[2]的优秀设计,采用了一种基于显性依赖图的表示方式。关于ANF-IR的具体介绍,可以参考[MindSpore IR文法定义](https://www.mindspore.cn/docs/zh-CN/master/design/all_scenarios.html#文法定义)。\n", - "\n", - "在图模式`set_context(mode=GRAPH_MODE)`下运行用MindSpore编写的模型时,若设置了环境变量`MS_DEV_SAVE_GRAPHS`的值为1,运行时会输出一些图编译过程中生成的一些中间文件,我们称为IR文件。当需要分析更多后端流程相关的ir文件时,可以设置环境变量`MS_DEV_SAVE_GRAPHS`的值为2。当需要更多进阶的信息比如可视化计算图,或者更多详细前端ir图时,可以设置环境变量`MS_DEV_SAVE_GRAPHS`的值为3。当前主要有两种格式的IR文件:\n", - "\n", - "- ir后缀结尾的IR文件:一种比较直观易懂的以文本格式描述模型结构的文件,可以直接用文本编辑软件查看。\n", - "- dot后缀结尾的IR文件:描述了不同节点间的拓扑关系,可以用[graphviz](http://graphviz.org)将此文件作为输入生成图片,方便用户直观地查看模型结构。\n", - "\n", - "### 文法定义\n", - "\n", - "ANF是函数式编程中常用且简洁的中间表示,其文法定义如下所示:\n", - "\n", - "```text\n", - " ::= NUMBER | STRING | VAR | BOOLEAN | PRIMOP\n", - " | (lambda (VAR …) )\n", - " ::= ( …)\n", - " | (if )\n", - " ::= (let ([VAR ]) ) | | \n", - "\n", - "```\n", - "\n", - "ANF中表达式分为原子表达式(aexp)和复合表达式(cexp),原子表达式表示一个常数值或一个变量或一个匿名函数;复合表达式由多个原子表达式复合组成,表示一个匿名函数或原语函数调用,组合的第一个输入是调用的函数,其余输入是调用的参数。\n", - "\n", - "MindIR文法继承于ANF,其定义如下所示:\n", - "\n", - "```text\n", - " ::= | \n", - " ::= Parameter\n", - " ::= Scalar | Named | Tensor | Type | Shape\n", - " | Primitive | MetaFuncGraph | FuncGraph\n", - " ::= ( …)\n", - " ::= | \n", - "```\n", - "\n", - "MindIR中的ANode对应于ANF的原子表达式,ANode有两个子类分别为ValueNode和ParameterNode,其中:\n", - "\n", - "- ValueNode表示常数节点,可承载一个常数值(标量、符号、张量、类型、维度等),也可以是一个原语函数(Primitive)或一个元函数(MetaFuncGraph)或一个普通函数(FuncGraph),因为在函数式编程中函数定义本身也是一个值。\n", - "- ParameterNode是参数节点,表示函数的形参。\n", - "\n", - "MindIR中的CNode对应于ANF的复合表达式,表示一次函数调用。\n", - "\n", - "在MindSpore自动微分时,会计算ParameterNode和CNode的梯度贡献,并返回最终ParameterNode的梯度,而不计算ValueNode的梯度。\n", - "\n", - "### 示例\n", - "\n", - "下面以一段程序作为示例,对比理解MindIR。" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "def func(x, y):\n", - " return x / y\n", - "\n", - "@ms.jit\n", - "def test_f(x, y):\n", - " a = x - 1\n", - " b = a + y\n", - " c = b * func(a, b)\n", - " return c" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "这段Python代码对应的ANF表达为:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```text\n", - "lambda (x, y)\n", - " let a = x - 1 in\n", - " let b = a + y in\n", - " let func = lambda (x, y)\n", - " let ret = x / y in\n", - " ret end in\n", - " let %1 = func(a, b) in\n", - " let c = b * %1 in\n", - " c end\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - 
"对应的MindIR为[ir.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/images/ir/ir.dot):\n", - "\n", - "![image1](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/ir/ir.png)\n", - "\n", - "在MindIR中,一个函数图(FuncGraph)表示一个普通函数的定义,函数图一般由ParameterNode、ValueNode和CNode组成有向无环图,可以清晰地表达出从参数到返回值的计算过程。在上图中可以看出,python代码中两个函数`test_f`和`func`转换成了两个函数图,其参数`x`和`y`转换为函数图的ParameterNode,每一个表达式转换为一个CNode。CNode的第一个输入链接着调用的函数,例如图中的`add`、`func`、`return`。值得注意的是这些节点均是`ValueNode`,因为它们被理解为常数函数值。CNode的其他输入链接这调用的参数,参数值可以来自于ParameterNode、ValueNode和其他CNode。\n", - "\n", - "在ANF中每个表达式都用let表达式绑定为一个变量,通过对变量的引用来表示对表达式输出的依赖,而在MindIR中每个表达式都绑定为一个节点,通过节点与节点之间的有向边表示依赖关系。\n", - "\n", - "### 函数式语义\n", - "\n", - "MindIR较传统计算图的一个重要特性是不仅可以表达算子之间的数据依赖,还可以表达丰富的函数式语义。\n", - "\n", - "#### 高阶函数\n", - "\n", - "在MindIR中,函数的定义是由一个子图来定义,但其本身可以是一个被传递的值,作为其他高阶函数的输入或输出。\n", - "例如下面一个简单的示例中,函数`f`作为参数传入了函数`g`,因此函数`g`是一个接收函数输入的高阶函数,函数`f`真正的调用点是在函数`g`内部。" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "@ms.jit\n", - "def hof(x):\n", - " def f(x):\n", - " return x + 3\n", - " def g(function, x):\n", - " return function(x) * function(x)\n", - " res = g(f, x)\n", - " return res" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "对应的MindIR为[hof.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/images/ir/hof.dot):\n", - "\n", - "![image2](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/ir/hof.png)\n", - "\n", - "在实际网络训练脚本中,自动求导泛函`grad`和优化器中常用到的`Partial`和`HyperMap`都是典型的高阶函数。高阶语义极大地提升了MindSpore表达的灵活性和简洁性。\n", - "\n", - "#### 控制流\n", - "\n", - "控制流在MindIR中是以高阶函数选择调用的形式表达。这样的形式把控制流转换为高阶函数的数据流,从而使得自动微分算法更加强大。不仅可以支持数据流的自动微分,还可以支持条件跳转、循环和递归等控制流的自动微分。\n", - "\n", - "下面以一个简单的斐波那契用例来演示说明。" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "@ms.jit\n", - "def fibonacci(n):\n", - " if n < 1:\n", - " return 0\n", - " if n == 1:\n", - " return 1\n", - " return fibonacci(n-1) + fibonacci(n-2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "对应的MindIR为[cf.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/images/ir/cf.dot):\n", - "\n", - "![image3](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/ir/cf.png)\n", - "\n", - "其中,`fibonacci`是顶层函数图,在顶层中有两个函数图被`switch`选择调用。`✓fibonacci`是第一个`if`的True分支,`✗fibonacci`是第一个`if`的False分支。在`✗fibonacci`中被调用的`✓✗fibonacci`是`elif`的True分支,`✗✗fibonacci`是`elif`的False分支。这里需要理解的关键是在MindIR中,条件跳转和递归是以高阶控制流的形式表达的。例如,`✓fibonacci`和`✗fibonacci`是作为`switch`算子的参数传入,`switch`根据条件参数选择哪一个函数作为返回值。因此,`switch`是把输入的函数当成普通的值做了一个二元选择操作,并没有调用,而真正的函数调用是在紧随`switch`后的CNode上完成。\n", - "\n", - "#### 自由变量和闭包\n", - "\n", - "闭包(closure)是一种编程语言特性,它指的是代码块和作用域环境的结合。自由变量(free variable)是指在代码块中引用作用域环境中的变量而非局部变量。在MindIR中,代码块是以函数图呈现的,而作用域环境可以理解为该函数被调用时的上下文环境,自由变量的捕获方式是值拷贝而非引用。\n", - "\n", - "一个典型的闭包用例如下:" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "@ms.jit\n", - "def func_outer(a, b):\n", - " def func_inner(c):\n", - " return a + b + c\n", - " return func_inner\n", - "\n", - "@ms.jit\n", - "def ms_closure():\n", - " closure = func_outer(1, 2)\n", - " out1 = closure(1)\n", - " out2 = closure(2)\n", - " return out1, out2" - ] - }, - 
{ - "cell_type": "markdown", - "metadata": {}, - "source": [ - "对应的MindIR为[closure.dot](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/images/ir/closure.dot):\n", - "\n", - "![image4](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/ir/closure.png)\n", - "\n", - "在例子中,`a`和`b`是自由变量,因为`func_inner`中变量`a`和`b`是引用的其父图`func_outer`中定义的参数。变量`closure`是一个闭包,它是函数`func_inner`与其上下文`func_outer(1, 2)`的结合。因此,`out1`的结果是4,因为其等价于`1+2+1`,`out2`的结果是5,因为其等价于`1+2+2`。\n", - "\n", - "### 参考文献\n", - "\n", - "[1] C. Click and M. Paleczny. A simple graph-based intermediate representation.\n", - "SIGPLAN Not., 30:35-49, March 1995.\n", - "\n", - "[2] Roland Leißa, Marcel Köster, and Sebastian Hack. A graph-based higher-order intermediate\n", - "representation. In Proceedings of the 13th Annual IEEE/ACM International Symposium on\n", - "Code Generation and Optimization, pages 202-212. IEEE Computer Society, 2015." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "MindSpore", - "language": "python", - "name": "mindspore" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.3" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/docs/mindspore/source_zh_cn/design/distributed_training_design.ipynb b/docs/mindspore/source_zh_cn/design/distributed_training_design.ipynb deleted file mode 100644 index dbe52eb3b4181be0d0e2e3c15c940ce089d7666d..0000000000000000000000000000000000000000 --- a/docs/mindspore/source_zh_cn/design/distributed_training_design.ipynb +++ /dev/null @@ -1,385 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 分布式并行原生\n", - "\n", - "[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/zh_cn/design/mindspore_distributed_training_design.ipynb) [![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_download_code.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/zh_cn/design/mindspore_distributed_training_design.py) [![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/distributed_training_design.ipynb)\n", - "\n", - "## 背景\n", - "\n", - "随着深度学习的快速发展,为了提升神经网络的精度和泛化能力,数据集和参数量都在呈指数级向上攀升。分布式并行训练成为一种解决超大规模网络性能瓶颈的发展趋势。\n", - "\n", - "为了应对数据集过大的问题,MindSpore引入了数据并行模式,利用多个设备的计算资源,同时处理更多的训练数据,加快模型训练速度。同时,当数据过大或模型过大而无法在单个计算节点上加载训练时,通过引入模型并行,使每个计算节点只需要加载部分模型和数据,从而减少内存占用,提高训练效率。\n", - "\n", - "在分布式并行编程的演进过程中,传统的手动并行要求用户基于通信原语通过编码,手动把模型切分到多个节点上并行,用户需要感知图切分、算子切分、集群拓扑,才能实现最优性能。此种编程范式对于工程师存在一定的门槛要求,于是演进出了半自动并行:并行逻辑和算法逻辑解耦,用户按单卡串行的方式写算法代码,并行逻辑作为算法配置。用户只需要配置并行策略实现自动并行切分,无需额外编写代码;用户无需感知模型切片的调度及集群拓扑。全自动并行训练编程范式则更进一步,用户只需要写单卡串行算法,通过搜索算法来自动生成较优的切分策略。\n", - "\n", - "MindSpore通过集合通信的方式来实现并行训练过程中的数据通信和同步操作,在Ascend芯片上它依赖于华为集合通信库HCCL,在GPU上它依赖于英伟达集合通信库NCCL。\n", - "\n", - "MindSpore目前采用的是同步训练模式,同步模式能够保证所有设备上的参数保持一致,在每个训练迭代开始前所有设备上的参数都被同步。\n", - "\n", - "本篇设计文档将会集中介绍几种并行训练方式的设计原理,同时指导用户进行自定义开发。\n", - "\n", - "## 数据并行\n", - "\n", - 
"这个小节介绍了在MindSpore中`ParallelMode.DATA_PARALLEL`数据并行模式是如何工作的。\n", - "\n", - "### 数据并行原理\n", - "\n", - "![数据并行图解](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/data_parallel.png)\n", - "\n", - "1. 环境依赖\n", - "\n", - " 每次开始并行训练前,通过调用[mindspore.communication.init](https://www.mindspore.cn/docs/zh-CN/master/api_python/communication/mindspore.communication.init.html)接口初始化通信资源,并自动创建全局通信组`WORLD_COMM_GROUP`。\n", - "\n", - "2. 数据分发(Data distribution)\n", - "\n", - " 数据并行的核心在于将数据集在样本维度拆分并下发到不同的卡上。在[mindspore.dataset](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.dataset.html)模块提供的所有数据集加载接口中都有`num_shards`和`shard_id`两个参数,它们用于将数据集拆分为多份并循环采样的方式,采集`batch`大小的数据到各自的卡上,当出现数据量不足的情况时将会从头开始采样。\n", - "\n", - "3. 网络构图\n", - "\n", - " 数据并行网络的书写方式与单机网络没有差别,这是因为在正反向传播(Forward propagation & Backward Propagation)过程中各卡的模型间是独立执行的,只是保持了相同的网络结构。唯一需要特别注意的是:为了保证各卡间训练同步,相应的网络参数初始化值应当是一致的,在`DATA_PARALLEL`和`HYBRID_PARALLEL`模式下建议通过使能`parameter_broadcast`达到权重广播的目的;在`AUTO_PARALLEL`和`SEMI_AUTO_PARALLEL`模式下,框架内部会自动分析参数的并行度,并设置相应的随机数种子,保证在数据并行维度的设备上参数初始化值一致。\n", - "\n", - "4. 梯度聚合(Gradient aggregation)\n", - "\n", - " 数据并行理论上应该实现和单机一致的训练效果。为了保证计算逻辑的一致性,在梯度计算完成后插入[AllReduce](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.AllReduce.html)算子实现各卡间的梯度聚合操作。MindSpore设置了`mean`开关,用户可以选择是否要对求和后的梯度值进行求平均操作,也可以将其视为超参项,打开开关等价于学习率倍数缩小。\n", - "\n", - "5. 参数更新(Parameter update)\n", - "\n", - " 因为引入了梯度聚合操作,所以各卡的模型会以相同的梯度值一起进入参数更新步骤。因此,MindSpore实现的是一种同步数据并行训练方式。理论上最终每卡训练出来的模型是相同的,如果网络中含有在样本维度的归约类型操作,网络的输出可能会有所差别,这是由数据并行的切分性质决定的。\n", - "\n", - "### 数据并行代码\n", - "\n", - "1. 集合通信\n", - "\n", - " - [management.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/communication/management.py):这个文件中涵盖了集合通信过程中常用的`helper`函数接口,例如获取集群数量和卡的序号等。当在Ascend芯片上执行时,框架会加载环境上的`libhccl.so`库文件,通过它来完成从Python层到底层的通信接口调用。\n", - " - [comm_ops.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/ops/operations/comm_ops.py):MindSpore将支持的集合通信操作都封装为算子的形式放在这个文件下,包括[AllReduce](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.AllReduce.html)、[AllGather](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.AllGather.html)、[ReduceScatter](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.ReduceScatter.html)和[Broadcast](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.Broadcast.html)等。`PrimitiveWithInfer`中除了定义算子所需属性外,还包括构图过程中输入到输出的`shape`和`dtype`推导。\n", - "\n", - "2. 梯度聚合\n", - "\n", - " - [grad_reducer.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/nn/wrap/grad_reducer.py):这个文件实现了梯度聚合的过程。对入参`grads`用`HyperMap`展开后插入`AllReduce`算子,这里采用的是全局通信组,用户也可以根据自己网络的需求仿照这个模块进行自定义开发。MindSpore中单机和分布式执行共用一套网络封装接口,在`Cell`内部通过`ParallelMode`来区分是否要对梯度做聚合操作。\n", - "\n", - "## 半自动并行\n", - "\n", - "这个小节介绍了在MindSpore中`ParallelMode.SEMI_AUTO_PARALLEL`半自动并行模式是如何工作的。\n", - "\n", - "### 半自动并行原理\n", - "\n", - "![自动并行图解](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_parallel.png)\n", - "\n", - "1. 
分布式算子和张量排布模型\n", - "\n", - " 在上面的架构图中,自动并行流程会对单机的正向计算图(ANF Graph)进行遍历,以分布式算子(Distributed Operator)为单位对张量进行切分建模,表示一个算子的输入输出张量如何分布到集群各个卡上(Tensor Layout)。这种模型充分地表达了张量和设备间的映射关系,用户无需感知模型各切片放到哪个设备上运行,框架会自动调度分配。\n", - "\n", - " 为了得到张量的排布模型,每个算子都具有切分策略(Shard Strategy),它表示算子的各个输入在相应维度的切分情况。通常情况下,只要满足以2为基、均匀分配的原则,张量的任意维度均可切分。\n", - "\n", - " 以下图为例,这是一个三维矩阵乘(BatchMatMul)操作,它的切分策略由两个元组构成,分别表示`input`和`weight`的切分形式。其中元组中的元素与张量维度一一对应,`2^N`为切分份数,`1`表示不切。当用户想表示一个数据并行切分策略时,即`input`的`batch`维度切分,其他维度不切,可以表达为`strategy=((2^N, 1, 1),(1, 1, 1))`;当表示一个模型并行切分策略时,即`weight`的非`batch`维度切分,这里以`channel`维度切分为例,其他维度不切,可以表达为`strategy=((1, 1, 1),(1, 1, 2^N))`;当表示一个混合并行切分策略时,其中一种切分策略为`strategy=((2^N, 1, 1),(1, 1, 2^N))`。\n", - "\n", - " ![算子切分定义](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/operator_split.png)\n", - "\n", - " 依据切分策略,分布式算子中定义了推导算子输入张量和输出张量的排布模型的方法。这个排布模型由`device_matrix`,`tensor_shape`和`tensor_map`组成,分别表示设备矩阵形状、张量形状、设备和张量维度间的映射关系。分布式算子会进一步根据张量排布模型判断是否要在图中插入额外的计算、通信操作,以保证算子运算逻辑正确。\n", - "\n", - "2. 张量排布变换\n", - "\n", - " 当前一个算子的输出张量模型和后一个算子的输入张量模型不一致时,就需要引入计算、通信操作的方式实现张量排布间的变化。自动并行流程引入了张量重排布算法(Tensor Redistribution),可以推导得到任意排布的张量间通信转换方式。下面三个样例表示公式`Z=(X×W)×V`的并行计算过程,即两个二维矩阵乘操作,体现了不同并行方式间如何转换。\n", - "\n", - " 在样例一中,第一个数据并行矩阵乘的输出在行方向上存在切分,而第二个模型并行矩阵乘的输入需要全量张量,框架将会自动插入`AllGather`算子实现排布变换。\n", - "\n", - " ![tensor-redistribution1](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/tensor_redistribution1.png)\n", - "\n", - " 在样例二中,第一个模型并行矩阵乘的输出在列方向上存在切分,而第二个数据并行矩阵乘的输入在行方向上存在切分,框架将会自动插入等价于集合通信中`AlltoAll`操作的通信算子实现排布变换。\n", - "\n", - " ![tensor-redistribution2](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/tensor_redistribution2.png)\n", - "\n", - " 在样例三中,第一个混合并行矩阵乘的输出切分方式和第二个混合并行矩阵乘的输入切分方式一致,所以不需要引入重排布变换。但由于第二个矩阵乘操作中,两个输入的相关维度存在切分,所以需要插入`AllReduce`算子保证运算正确性。\n", - "\n", - " ![tensor-redistribution3](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/tensor_redistribution3.png)\n", - "\n", - " 综上,1、2两点是自动并行实现的基础,总体来说这种分布式表达打破了数据并行和模型并行的边界,轻松实现混合并行。从脚本层面上,用户仅需构造单机网络,即可表达并行算法逻辑,框架将自动实现对整图切分。\n", - "\n", - "3. 分布式自动微分\n", - "\n", - " 传统的手动模型切分除了需要关注正向网络通信还需要考虑网络反向的并行运算,MindSpore通过将通信操作包装为算子,并利用框架原有的自动微分操作自动生成通信算子反向,所以即便在进行分布式训练时,用户同样只需关注网络的前向传播,真正实现训练的全自动并行。\n", - "\n", - "4. 
支持多维混合并行\n", - "\n", - " 半自动并行支持多种并行模式的自动混合使用,分别有:\n", - "\n", - " **算子级并行**:算子级并行以神经网络中的算子为单位,将输入张量切分到多个设备上进行计算。通过这种方式,可以实现数据样本和模型参数在不同设备之间的分配,从而训练大规模的深度学习模型,并利用集群资源进行并行计算,提高整体速度。用户可以设置每个算子的切分策略,框架会根据算子的切分策略对每个算子及其输入张量进行切分建模,以保持数学等价性。这种方法可以有效地减少单个设备的负载,提高计算效率,适用于大规模深度神经网络的训练。详情参考:[算子级并行](https://www.mindspore.cn/docs/zh-CN/master/features/parallel/operator_parallel.html)\n", - "\n", - " **流水线并行**:当集群设备数很多时,如果仅采用算子级并行的方式,则需要在整个集群的通信域上进行通信,这可能使得通信效率低,从而降低整体性能。而流水线并行能将神经网络结构切分成多个stage,每个stage跑在一部分设备内,将集合通信的通信域限定在这部分设备范围内,而stage间采用点对点通信。流水线并行的优点在于:能提升通信效率、能方便的处理按层堆叠的神经网络结构。缺点在于:同一时刻内,有些节点可能处于空闲状态。详情参考:[流水线并行](https://www.mindspore.cn/docs/zh-CN/master/features/parallel/pipeline_parallel.html)\n", - "\n", - " **MoE并行**:MoE是将专家分布到不同的worker上,并且每个worker承担不同批次的训练数据。对于非MoE层来说,专家并行和数据并行一样。在MoE层中,序列中的token通过all-to-all通信被发送到它们相匹配的专家所对应的worker。在完成对应专家的计算后,再通过all-to-all重新传回到原来的worker,组织成原始序列,用于下一层的计算。由于MoE模型通常有大量的专家,专家并行度比模型并行度更能随模型规模的增大而增大。\n", - "\n", - " ![MoE并行](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/MoE.png)\n", - "\n", - " **多副本并行**:将输入模型的数据按照batchsize维度进行切分,从而将现有的单副本形式修改成多副本的形式,使其底层在通信的时候,另一副本进行计算操作,无需等待,这样就能保证多副本的计算和通信的时间相互互补,提升模型性能,同时将数据拆成多副本的形式还能减少算子输入的参数量,从而减少单个算子的计算时间,对提升模型性能有很大帮助。\n", - "\n", - " ![多副本并行](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/multi_copy.png)\n", - "\n", - " **优化器并行**:在数据并行或算子级并行训练时,模型的参数可能在多个设备上存在同一份副本。这使得优化器在更新该权重之时,在多个设备间存在冗余计算。在此情况下,可以通过优化器并行将优化器的计算量分散到多个设备上。它的优点在于:能减少静态内存消耗、减少优化器内的计算量。缺点在于:增加了通信开销。详情参考:[优化器并行](https://www.mindspore.cn/docs/zh-CN/master/features/parallel/optimizer_parallel.html)\n", - "\n", - "### 半自动并行代码\n", - "\n", - "1. 张量排布模型\n", - " - [tensor_layout](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/tensor_layout):这个目录下包含了张量排布模型相关功能的定义及实现。其中`tensor_layout.h`中声明了一个张量排布模型需要具备的成员变量`tensor_map_origin_`,`tensor_shape_`和`device_arrangement_`等。在`tensor_redistribution.h`中声明了实现张量排布间`from_origin_`和`to_origin_`变换的相关方法,将推导得到的重排布操作保存在`operator_list_`中返回,并计算得到重排布所需的通信开销`comm_cost_`, 内存开销`memory_cost_`及计算开销`computation_cost_`。\n", - "\n", - "2. 分布式算子\n", - " - [ops_info](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/ops_info):这个目录下包含了分布式算子的具体实现。在`operator_info.h`中定义了分布式算子实现的基类`OperatorInfo`,开发一个分布式算子需要继承于这个基类并显式实现相关的虚函数。其中`InferTensorInfo`,`InferTensorMap`和`InferDevMatrixShape`函数定义了推导该算子输入、输出张量排布模型的算法。`InferForwardCommunication`,`InferMirrorOps`等函数定义了切分该算子需要插入的额外计算、通信操作。`CheckStrategy`和`GenerateStrategies`函数定义了算子切分策略校验和生成。根据切分策略`SetCostUnderStrategy`将会产生该策略下分布式算子的并行开销值`operator_cost_`。\n", - "\n", - "3. 设备管理\n", - " - [device_manager.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/device_manager.h):这个文件实现了集群设备通信组的创建及管理。其中设备矩阵模型由`device_matrix.h`定义,通信域由`group_manager.h`管理。\n", - "\n", - "4. 整图切分\n", - " - [step_auto_parallel.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/step_auto_parallel.h), [step_parallel.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/step_parallel.h):这两个文件包含了自动并行流程的核心实现。首先由`step_auto_parallel.h`调用策略搜索流程并产生分布式算子的`OperatorInfo`,然后在`step_parallel.h`中处理算子切分和张量重排布等流程,对单机计算图进行分布式改造。\n", - "\n", - "5. 
- "\n",
- "### Semi-Automatic Parallel Code\n",
- "\n",
- "1. Tensor layout model\n",
- "    - [tensor_layout](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/tensor_layout): this directory defines and implements the tensor layout model. `tensor_layout.h` declares the member variables a layout model needs, such as `tensor_map_origin_`, `tensor_shape_`, and `device_arrangement_`. `tensor_redistribution.h` declares the methods that transform between the `from_origin_` and `to_origin_` layouts; the derived redistribution operations are returned in `operator_list_`, together with the communication cost `comm_cost_`, memory cost `memory_cost_`, and computation cost `computation_cost_` of the redistribution.\n",
- "\n",
- "2. Distributed operators\n",
- "    - [ops_info](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/ops_info): this directory contains the concrete implementations of distributed operators. `operator_info.h` defines their base class, `OperatorInfo`; developing a new distributed operator means inheriting from it and implementing the relevant virtual functions. `InferTensorInfo`, `InferTensorMap`, and `InferDevMatrixShape` define the algorithms that infer the operator's input and output tensor layouts. `InferForwardCommunication`, `InferMirrorOps`, and similar functions define the extra computation and communication operations that sharding the operator requires. `CheckStrategy` and `GenerateStrategies` define shard-strategy validation and generation. Given a shard strategy, `SetCostUnderStrategy` produces the operator's parallel cost `operator_cost_` under that strategy.\n",
- "\n",
- "3. Device management\n",
- "    - [device_manager.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/device_manager.h): this file implements the creation and management of the cluster's device communication groups. The device matrix model is defined in `device_matrix.h`, and communication domains are managed by `group_manager.h`.\n",
- "\n",
- "4. Whole-graph sharding\n",
- "    - [step_auto_parallel.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/step_auto_parallel.h), [step_parallel.h](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/step_parallel.h): these two files contain the core of the automatic parallelism flow. `step_auto_parallel.h` first invokes strategy search and produces each distributed operator's `OperatorInfo`; `step_parallel.h` then handles operator sharding and tensor redistribution, transforming the single-device computation graph into a distributed one.\n",
- "\n",
- "5. Backward communication operators\n",
- "    - [grad_comm_ops.py](https://gitee.com/mindspore/mindspore/blob/master/mindspore/python/mindspore/ops/_grad_experimental/grad_comm_ops.py): this file defines the backward operations of communication operators such as `AllReduce` and `AllGather`.\n",
- "\n",
- "## Fully Automatic Parallelism\n",
- "\n",
- "Semi-automatic parallelism frees users from developing complex distributed code and greatly lowers the difficulty of building large distributed AI models, but several problems remain:\n",
- "\n",
- "- Training performance differs greatly across shard strategies; although users no longer manage inter-device storage and communication, they still have to pick a suitable strategy for every operator.\n",
- "- Users still need parallelism expertise and must analyze the network structure, cluster topology, and so on to define a good strategy within a huge search space. In practice, the main users of AI frameworks are AI researchers and engineers, who do not necessarily have that expertise.\n",
- "- Given the huge search space, finding a good strategy for a large model can take months of manual tuning, with no guarantee of optimality. Even the expert-crafted recipes for transformer networks in DeepSpeed and Megatron still require the user to set configurations such as dp, mp, and pp, and transformers are not the only network structure.\n",
- "\n",
- "For these reasons, MindSpore provides several schemes for automatically generating hybrid parallel strategies, minimizing how much parallel configuration users must be aware of so they can train large models quickly, efficiently, and easily.\n",
- "\n",
- "This section describes how MindSpore's fully automatic parallel mode, `ParallelMode.AUTO_PARALLEL`, works.\n",
- "\n",
- "### Feature Design\n",
- "\n",
- "Fully automatic parallelism builds on MindSpore's semi-automatic framework, replacing expert-configured strategies with an algorithm that generates a hybrid parallel strategy. The figure below shows distributed training or inference with MindSpore: the user develops a neural network model in Python (or imports MindIR); MindSpore parses it into a computation graph (ANF graph); the automatic strategy-generation module searches for a good strategy and passes it to the semi-automatic module, which performs tensor layout analysis, distributed operator analysis, device management, and whole-graph sharding before handing the graph to the backend for execution.\n",
- "\n",
- "Concretely, the strategy-generation module finds a suitable parallel shard strategy for a given neural network model and cluster configuration. Its key technique is cost-model-based strategy search: a cost model describes the computation cost and communication cost incurred in distributed training, memory cost acts as a constraint, and a graph search algorithm efficiently finds a strategy with good performance.\n",
- "\n",
- "![Fully automatic parallelism](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto.png)\n",
- "\n",
- "### Three Search Algorithms\n",
- "\n",
- "Fully automatic parallelism is very hard to realize. Based on how much user involvement is required, MindSpore classifies its strategy-generation algorithms into levels L1 and L2 (taking manually configured whole-graph strategies, SEMI_AUTO, as L0, and a scheme requiring no user involvement at all as L3).\n",
- "\n",
- "The L1 algorithm is sharding propagation: the user manually defines strategies for a few key operators, and the algorithm generates strategies for the remaining operators in the graph. Because the key operators' strategies are fixed, the cost model mainly describes the redistribution cost between operators, and the optimization objective is to minimize the total redistribution cost of the graph. Fixing the main operators' strategies effectively compresses the search space, so search time is short, but strategy quality depends on how the key operators are configured, so users still need the ability to analyze and define strategies. See: [Sharding Propagation](https://www.mindspore.cn/docs/zh-CN/master/features/parallel/auto_parallel.html).\n",
- "\n",
- "There are two L2 algorithms: dynamic programming and the Symbolic Automatic Parallel Planner (SAPP). Each has strengths and weaknesses: dynamic programming can find the strategy that is optimal under its cost model, but searching a huge network takes a long time, while SAPP can generate a good strategy almost instantly even for huge networks and large-scale sharding.\n",
- "\n",
- "The core idea of dynamic programming is to build a whole-graph cost model, covering computation and communication costs, that describes the absolute latency of distributed training, and to compress search time with equivalent transformations such as edge elimination and node elimination. The search space still grows exponentially with the number of devices and operators, however, so it is inefficient for large models on large clusters.\n",
- "\n",
- "SAPP models parallelism from first principles, describes the hardware cluster topology with an abstract machine, and simplifies the cost model symbolically. Its cost model compares the relative costs of parallel strategies rather than estimated absolute latency, which shrinks the search space dramatically and guarantees minute-level search times for hundred-card clusters.\n",
- "\n",
- "Sharding propagation and SAPP currently support manually defined pipeline parallelism combined with automatic operator-level parallelism, and can be used together with optimizations such as recomputation and optimizer parallelism. The dynamic programming algorithm supports only automatic operator-level parallelism.\n",
- "\n",
- "### Fully Automatic Parallel Code\n",
- "\n",
- "**Strategy search algorithms**: the [auto_parallel](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/auto_parallel) directory implements strategy search. `graph_costmodel.h` defines how the graph is built: each node is an operator `OperatorInfo`, and the directed edges of `edge_costmodel.h` represent the input/output relations between operators and the cost of redistribution. `operator_costmodel.h` defines each operator's cost model, covering computation, communication, and memory costs. `costmodel.h` defines the data structures for costs and graph operations.\n",
- "\n",
- "- **dynamic_programming**: [dp_algo_costmodel.cc](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/auto_parallel/dp_algo_costmodel.cc) describes the main flow of the dynamic programming algorithm, which consists of a series of graph operations.\n",
- "- **sharding_propagation**: [graph_costmodel.cc](https://gitee.com/mindspore/mindspore/blob/master/mindspore/ccsrc/frontend/parallel/auto_parallel/graph_costmodel.cc) implements sharding propagation, which mainly uses BFS traversal to spread the strategies of a few configured operators from those points to the whole graph.\n",
- "- **symbolic_automatic_parallel_planner**: the [rec_core](https://gitee.com/mindspore/mindspore/tree/master/mindspore/ccsrc/frontend/parallel/auto_parallel/rec_core) directory implements the Symbolic Automatic Parallel Planner.\n",
- "\n",
- "## Heterogeneous Parallelism\n",
- "\n",
- "Heterogeneous parallel training analyzes the memory footprint and computational intensity of the operators in the graph, places operators with huge memory consumption or logic suited to CPU processing into a CPU subgraph, and places memory-light, compute-intensive operators into an accelerator subgraph. The framework coordinates the subgraphs during training so that subgraphs on different hardware with no mutual dependencies can execute in parallel.\n",
- "\n",
- "### Computation Flow\n",
- "\n",
- "A typical MindSpore heterogeneous parallel training flow looks as follows:\n",
- "\n",
- "![heterogeneous-heter](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/heter.png)\n",
- "\n",
- "1. The user sets the backend for network execution"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2022-01-05T02:15:56.790220Z",
- "start_time": "2022-01-05T02:15:55.114811Z"
- }
- },
- "outputs": [],
- "source": [
- "import mindspore as ms\n",
- "ms.set_device(device_target=\"GPU\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "2. The user sets the execution backend for specific operators"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2022-01-05T09:02:10.573036Z",
- "start_time": "2022-01-05T09:02:09.034905Z"
- }
- },
- "outputs": [],
- "source": [
- "from mindspore import ops\n",
- "\n",
- "prim = ops.Add()\n",
- "\n",
- "prim.set_device(\"CPU\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "3. The framework partitions the graph according to the operators' backend tags\n",
- "4. The framework dispatches the subgraphs to their backends for execution"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The typical current use case for heterogeneous parallel computation is the heterogeneous optimizer.\n",
- "\n",
- "### Heterogeneous Optimizer\n",
- "\n",
- "During training of large models such as PanGu or GPT-3, optimizer state occupies a large amount of memory, which in turn limits the size of the model that can be trained. Running the optimizer heterogeneously, i.e. pinning it to the CPU, greatly extends the trainable model size:\n",
- "\n",
- "![heterogeneous-heter-opt](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/heter-opt.png)\n",
- "\n",
- "As the figure shows, placing the Adam operator on the CPU while the accelerator computes in FP16 reduces parameter memory to one third of the original. The steps:\n",
- "\n",
- "1. Configure the optimizer operators to execute on the CPU\n",
- "2. Initialize FP16 weight parameters and FP32 optimizer state variables\n",
- "3. Cast the gradients fed to the optimizer to FP16 (skip this step if they are already FP16)\n",
- "4. Cast the weights and gradients to FP32 for the optimizer computation\n",
- "5. 
Assign the updated FP32 weights back to the FP16 weights\n",
- "\n",
- "A code sample for the heterogeneous optimizer follows:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {
- "ExecuteTime": {
- "end_time": "2022-01-05T09:02:10.635821Z",
- "start_time": "2022-01-05T09:02:10.574494Z"
- }
- },
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "import mindspore as ms\n",
- "import mindspore.ops as ops\n",
- "from mindspore.common.initializer import initializer\n",
- "from mindspore.nn import Optimizer\n",
- "_adam_opt = ops.MultitypeFuncGraph(\"adam_opt\")\n",
- "host_assign = ops.Assign()\n",
- "host_assign.set_device(\"CPU\")\n",
- "host_cast = ops.Cast()\n",
- "host_cast.set_device(\"CPU\")\n",
- "device_cast = ops.Cast()\n",
- "\n",
- "@_adam_opt.register(\"Function\", \"Tensor\", \"Tensor\", \"Tensor\", \"Tensor\", \"Number\", \"Tensor\", \"Tensor\", \"Tensor\",\n",
- "                    \"Tensor\", \"Bool\", \"Bool\")\n",
- "def _update_run_kernel(opt, beta1, beta2, eps, lr, weight_decay, param, m, v, gradient, decay_flags, optim_filter):\n",
- "    \"\"\"\n",
- "    Update parameters by AdamWeightDecay op.\n",
- "    \"\"\"\n",
- "    success = True\n",
- "    if optim_filter:\n",
- "        param32 = host_cast(param, ms.float32)\n",
- "        gradient = device_cast(gradient, ms.float32)\n",
- "        if decay_flags:\n",
- "            next_param = opt(param32, m, v, lr, beta1, beta2, eps, weight_decay, gradient)\n",
- "        else:\n",
- "            next_param = opt(param32, m, v, lr, beta1, beta2, eps, 0.0, gradient)\n",
- "        ret = host_assign(param, host_cast(ops.depend(param32, next_param), ops.dtype(param)))\n",
- "        return ops.depend(success, ret)\n",
- "    return success\n",
- "\n",
- "class AdamWeightDecayOp(Optimizer):\n",
- "    def __init__(self, params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.0):\n",
- "        super(AdamWeightDecayOp, self).__init__(learning_rate, params, weight_decay)\n",
- "        self.beta1 = ms.Tensor(np.array([beta1]).astype(np.float32))\n",
- "        self.beta2 = ms.Tensor(np.array([beta2]).astype(np.float32))\n",
- "        self.eps = ms.Tensor(np.array([eps]).astype(np.float32))\n",
- "        self.moments1 = self.clone_param32(prefix=\"adam_m\", init='zeros')\n",
- "        self.moments2 = self.clone_param32(prefix=\"adam_v\", init='zeros')\n",
- "        self.opt = ops.AdamWeightDecay()\n",
- "        self.hyper_map = ops.HyperMap()\n",
- "        self.opt.set_device(\"CPU\")\n",
- "\n",
- "    def construct(self, gradients):\n",
- "        \"\"\"AdamWeightDecayOp\"\"\"\n",
- "        lr = self.get_lr()\n",
- "        if self.is_group:\n",
- "            if self.is_group_lr:\n",
- "                optim_result = self.map_reverse(ops.partial(_adam_opt, self.opt, self.beta1, self.beta2, self.eps),\n",
- "                                                lr, self.weight_decay, self.parameters, self.moments1, self.moments2,\n",
- "                                                gradients, self.decay_flags, self.optim_filter)\n",
- "            else:\n",
- "                optim_result = self.map_reverse(ops.partial(_adam_opt, self.opt, self.beta1, self.beta2, self.eps, lr),\n",
- "                                                self.weight_decay, self.parameters, self.moments1, self.moments2,\n",
- "                                                gradients, self.decay_flags, self.optim_filter)\n",
- "        else:\n",
- "            optim_result = self.map_reverse(ops.partial(_adam_opt, self.opt, self.beta1, self.beta2, self.eps, lr,\n",
- "                                                        self.weight_decay), self.parameters, self.moments1, self.moments2,\n",
- "                                            gradients, self.decay_flags, self.optim_filter)\n",
- "        return optim_result\n",
- "\n",
- "    def clone_param32(self, prefix, init=None):\n",
- "        new = []\n",
- "        for old_param in self.parameters:\n",
- "            param_init = init\n",
- "            if init is None:\n",
- "                param_init = old_param.init\n",
- "            new_state = old_param.clone()\n",
- "            new_state.set_dtype(ms.float32)\n",
- "            new_state.set_data(initializer(param_init, shape=old_param.shape, dtype=ms.float32))\n",
- "            new_state.name = prefix + '.' + new_state.name\n",
- "            new.append(new_state)\n",
- "        return ms.ParameterTuple(new)"
- ]
- },
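- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A rough usage sketch (the network is a stand-in; any `nn.Cell` works): the CPU-offloaded optimizer is constructed exactly like a built-in one.\n",
- "\n",
- "```python\n",
- "from mindspore import nn\n",
- "\n",
- "net = nn.Dense(16, 16)  # illustrative network\n",
- "optimizer = AdamWeightDecayOp(net.trainable_params(), learning_rate=1e-3, weight_decay=0.01)\n",
- "```"
- ]
- },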
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Steps 4 and 5 can also be fused directly into the optimizer operator for further optimization. For the complete heterogeneous optimizer training flow, refer to: \n",
- "\n",
- "### Constraints\n",
- "\n",
- "Currently the user must specify the execution backend for operators; automatic configuration based on the network is not supported."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "MindSpore",
- "language": "python",
- "name": "mindspore"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.5"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
diff --git a/docs/mindspore/source_zh_cn/design/index.rst b/docs/mindspore/source_zh_cn/design/index.rst
deleted file mode 100644
index 107efbfad2b8813c72001c495c42f72e829dbf07..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_zh_cn/design/index.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-Design Philosophy
-=========================
-
-.. toctree::
-   :glob:
-   :maxdepth: 1
-
-   overview
-   programming_paradigm
-   distributed_training_design
-   data_engine
-   multi_level_compilation
-   all_scenarios
-   pluggable_device
\ No newline at end of file
diff --git a/docs/mindspore/source_zh_cn/design/programming_paradigm.ipynb b/docs/mindspore/source_zh_cn/design/programming_paradigm.ipynb
deleted file mode 100644
index 958090b8e9a6245f2fa41e60cd09bafe233936d9..0000000000000000000000000000000000000000
--- a/docs/mindspore/source_zh_cn/design/programming_paradigm.ipynb
+++ /dev/null
@@ -1,608 +0,0 @@
-{
- "cells": [
- {
- "attachments": {},
- "cell_type": "markdown",
- "id": "2579fa00-0dc6-4672-8a96-5a6bb315881f",
- "metadata": {},
- "source": [
- "# Fused Functional and Object-Oriented Programming Paradigm\n",
- "\n",
- "[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/programming_paradigm.ipynb)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "65f5d9f4-b38f-4c3a-bf28-5a4bc9510b6e",
- "metadata": {},
- "source": [
- "A programming paradigm is the style or way of programming embodied by a programming language. AI frameworks generally rely on the paradigm of the language used by their front-end interfaces to construct and train neural networks. As a framework that fuses AI and scientific computing, MindSpore supports object-oriented programming and functional programming for AI and scientific-computing scenarios respectively. To improve flexibility and ease of use, it further proposes a fused functional plus object-oriented paradigm that effectively exploits the strengths of its functional automatic differentiation mechanism.\n",
- "\n",
- "The three programming paradigms supported by MindSpore, with simple examples of each, are introduced below."
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "id": "62dc35ab-780a-484d-a6bb-113b8e5af1d4",
- "metadata": {},
- "source": [
- "## Object-Oriented Programming\n",
- "\n",
- "Object-oriented programming (OOP) decomposes a program into modules (classes) that encapsulate data and the operations on it; objects are instances of classes. OOP treats objects as the basic unit of a program, encapsulating program and data within them to improve software reusability, flexibility, and extensibility; the code inside an object can access, and often modify, the data associated with that object."
- ]
- },
- {
- "attachments": {},
- "cell_type": "markdown",
- "id": "d2f9816d-fa7a-4524-accb-4fa3c30a2dc4",
- "metadata": {},
- "source": [
- "In typical programming scenarios, code and data are the two core components. OOP designs data structures for specific objects and defines classes. A class usually consists of two parts, corresponding to code and data respectively:\n",
- "\n",
- "- Methods\n",
- "- Attributes\n",
- "\n",
- "For different objects instantiated from the same class, the methods and attributes are the same; what differs is the attribute values. Those values determine an object's internal state, so OOP lends itself well to state management.\n",
- "\n",
- "Here is a simple Python class:\n",
- "\n",
- "```python\n",
- "class Sample: #class declaration\n",
- "    def __init__(self, name): # class constructor (code)\n",
- "        self.name = name  # attribute (data)\n",
- "\n",
- "    def set_name(self, name): # method declaration (code)\n",
- "        self.name = name  # method implementation (code)\n",
- "```"
- ]
- },
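- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A tiny usage sketch (names illustrative): instances share methods but keep independent attribute values, which is exactly what makes OOP good at state management.\n",
- "\n",
- "```python\n",
- "a = Sample(\"alice\")\n",
- "b = Sample(\"bob\")\n",
- "b.set_name(\"carol\")   # mutates b's state only\n",
- "print(a.name, b.name)  # alice carol\n",
- "```"
- ]
- },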
- {
- "cell_type": "markdown",
- "id": "b5584f1a-2bf6-4ea1-a393-99aa015926d8",
- "metadata": {},
- "source": [
- "For building neural networks, the first component is the network layer (Layer). A layer contains:\n",
- "\n",
- "- Tensor operations (Operation)\n",
- "- Weights\n",
- "\n",
- "These correspond one-to-one to a class's methods and attributes, and since the weights are precisely the layer's internal state, using a class to build a layer fits the definition naturally. Moreover, we want to stack layers into deep neural networks, and with OOP it is easy to compose Layer objects into new Layer classes.\n",
- "\n",
- "Here is a neural network class built with MindSpore:\n",
- "\n",
- "```python\n",
- "import mindspore\n",
- "from mindspore import nn, Parameter\n",
- "from mindspore.common.initializer import initializer\n",
- "import mindspore.ops as ops\n",
- "\n",
- "class Linear(nn.Cell):\n",
- "    def __init__(self, in_features, out_features, has_bias=True): # class constructor (code)\n",
- "        super().__init__()\n",
- "        self.weight = Parameter(initializer('normal', [out_features, in_features], mindspore.float32), 'weight') # layer weight (data)\n",
- "        self.bias = Parameter(initializer('zeros', [out_features], mindspore.float32), 'bias') if has_bias else None # layer weight (data)\n",
- "\n",
- "    def construct(self, inputs): # method declaration (code)\n",
- "        output = ops.matmul(inputs, self.weight.transpose(1, 0)) # tensor transformation (code)\n",
- "        if self.bias is not None:\n",
- "            output = output + self.bias # tensor transformation (code)\n",
- "        return output\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "72d6cbde",
- "metadata": {},
- "source": [
- "Beyond building layers with OOP, MindSpore also supports constructing the entire training logic in a purely object-oriented way, in which the network's forward computation, backpropagation, and gradient updates are all expressed as classes. Here is a purely object-oriented example:\n",
- "\n",
- "```python\n",
- "import mindspore.nn as nn\n",
- "from mindspore import value_and_grad\n",
- "\n",
- "class TrainOneStepCell(nn.Cell):\n",
- "    def __init__(self, network, optimizer):\n",
- "        super().__init__()\n",
- "        self.network = network\n",
- "        self.optimizer = optimizer\n",
- "        self.grad_fn = value_and_grad(self.network, None, self.optimizer.parameters)\n",
- "\n",
- "    def construct(self, *inputs):\n",
- "        loss, grads = self.grad_fn(*inputs)\n",
- "        self.optimizer(grads)\n",
- "        return loss\n",
- "\n",
- "network = nn.Dense(5, 3)\n",
- "loss_fn = nn.BCEWithLogitsLoss()\n",
- "network_with_loss = nn.WithLossCell(network, loss_fn)\n",
- "optimizer = nn.SGD(network.trainable_params(), 0.001)\n",
- "trainer = TrainOneStepCell(network_with_loss, optimizer)\n",
- "```\n",
- "\n",
- "Here both the network and its training step are managed by classes inheriting from `nn.Cell`, which makes them easy to compile and accelerate as computation graphs. A rough invocation sketch follows."
- ]
- },
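- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A rough usage sketch (batch size and data are illustrative): the one-step trainer is called like any other cell and returns the loss for that step.\n",
- "\n",
- "```python\n",
- "import numpy as np\n",
- "import mindspore as ms\n",
- "\n",
- "inputs = ms.Tensor(np.random.randn(4, 5), ms.float32)         # 4 samples, 5 features\n",
- "labels = ms.Tensor(np.random.rand(4, 3).round(), ms.float32)  # 0/1 targets for BCE\n",
- "loss = trainer(inputs, labels)  # one forward/backward/update step\n",
- "```"
- ]
- },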
- {
- "cell_type": "markdown",
- "id": "eea3afc9-7172-4175-a836-374f97589b71",
- "metadata": {},
- "source": [
- "## Functional Programming\n",
- "\n",
- "Functional programming is a paradigm that treats computation as the evaluation of functions and avoids program state and mutable objects.\n",
- "\n",
- "In functional programming, functions are first-class citizens: they can be bound to names (including local identifiers), passed as arguments, and returned from other functions, just like any other data type. This allows programs to be written in a declarative, composable style in which small functions are combined in a modular way. Functional programming is sometimes equated with pure functional programming, a subset that treats all functions as deterministic mathematical functions, i.e. pure functions. A pure function called with given arguments always returns the same result and is unaffected by mutable state or other side effects.\n",
- "\n",
- "Functional programming has two core traits that make it a very good fit for scientific computing:\n",
- "\n",
- "1. The semantics of a program function are exactly those of a mathematical function.\n",
- "2. Determinism: the same input always yields the same output, with no side effects.\n",
- "\n",
- "Thanks to determinism and the restriction of side effects, programs have fewer bugs, are easier to debug and test, and are better suited to formal verification.\n",
- "\n",
- "MindSpore supports pure functional programming; combined with the numerical interfaces of [mindspore.numpy](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.numpy.html) and [mindspore.scipy](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.scipy.html), it makes scientific-computing programming convenient. Here is a functional-programming example:"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "edf6610b-aa3f-4073-880c-fc88f416f26d",
- "metadata": {},
- "source": [
- "```python\n",
- "import mindspore.numpy as mnp\n",
- "from mindspore import grad\n",
- "\n",
- "grad_tanh = grad(mnp.tanh)\n",
- "print(grad_tanh(2.0))\n",
- "# 0.070650816\n",
- "\n",
- "print(grad(grad(mnp.tanh))(2.0))\n",
- "print(grad(grad(grad(mnp.tanh)))(2.0))\n",
- "# -0.13621868\n",
- "# 0.25265405\n",
- "```\n",
- "\n",
- "To serve the functional paradigm, MindSpore provides a range of function-transformation interfaces covering automatic differentiation, automatic vectorization, automatic parallelism, just-in-time compilation, data sinking, and more (a small combined sketch follows this list):\n",
- "\n",
- "- Automatic differentiation: [grad](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.grad.html), [value_and_grad](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.value_and_grad.html), which provide differentiation as a function transformation;\n",
- "- Automatic vectorization: [vmap](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.vmap.html), a higher-order function that maps a function fn over argument axes;\n",
- "- Automatic parallelism: [shard](https://www.mindspore.cn/docs/zh-CN/master/api_python/ops/mindspore.ops.Primitive.html#mindspore.ops.Primitive.shard), functional operator sharding that specifies the distribution strategies of a function's input/output tensors;\n",
- "- Just-in-time compilation: [jit](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.jit.html), which compiles a Python function into a callable MindSpore graph;\n",
- "- Data sinking: [data_sink](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.data_sink.html), which transforms a function into one that can run in data-sinking mode.\n",
- "\n",
- "Built on these function-transformation interfaces, complex functionality can be implemented quickly and efficiently in the functional style."
- ]
- },
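- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "A small sketch (function and values illustrative) combining two of the transforms above, `jit` and `vmap`:\n",
- "\n",
- "```python\n",
- "import mindspore.numpy as mnp\n",
- "from mindspore import vmap, jit\n",
- "\n",
- "@jit  # compile the per-vector function into a graph\n",
- "def scaled_norm(x):\n",
- "    return mnp.sqrt((x * x).sum()) * 2.0\n",
- "\n",
- "# vmap lifts the per-vector function to a batch of vectors along axis 0.\n",
- "batched_norm = vmap(scaled_norm, in_axes=0)\n",
- "print(batched_norm(mnp.ones((4, 3))))  # four identical norms\n",
- "```"
- ]
- },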
- {
- "cell_type": "markdown",
- "id": "96fe37fa",
- "metadata": {},
- "source": [
- "## Functional Differentiable Programming\n",
- "\n",
- "### Automatic Differentiation\n",
- "\n",
- "Modern AI algorithms such as deep learning use large amounts of data to learn and fit an optimized parametric model. The learning algorithms involved mostly update the model's parameters by backpropagating the model's empirical error on real data, and **automatic differentiation (AD)** is the key technique that makes this possible.\n",
- "\n",
- "Automatic differentiation is a differentiation method that sits between numerical and symbolic differentiation. Its core idea is to decompose the operations of a computer program into a finite set of basic operations whose differentiation rules are all known. After differentiating each basic operation, the chain rule combines the results into the derivative of the whole program.\n",
- "\n",
- "The chain rule:\n",
- "\n",
- "$$\n",
- "(f\\circ g)^{'}(x)=f^{'}(g(x))g^{'}(x) \\tag{1}\n",
- "$$\n",
- "\n",
- "Depending on how the basic-operation derivatives are combined via the chain rule, AD splits into a **forward mode** and a **reverse mode**:\n",
- "\n",
- "- Forward automatic differentiation (also called tangent linear mode AD), i.e. forward gradient accumulation (forward mode).\n",
- "\n",
- "- Reverse automatic differentiation (also called adjoint mode AD), i.e. reverse gradient accumulation (reverse mode).\n",
- "\n",
- "We use formula (2) to illustrate how forward and reverse differentiation compute:\n",
- "\n",
- "$$\n",
- "y=f(x_{1},x_{2})=ln(x_{1})+x_{1}x_{2}-sin(x_{2}) \\tag{2}\n",
- "$$\n",
- "\n",
- "When forward AD computes the derivative $\\frac{\\partial y}{\\partial x_{1}}$ of formula (2) at $x_{1}=2,x_{2}=5$, differentiation proceeds in the same direction as the original evaluation, so the function value and the derivative are obtained simultaneously.\n",
- "\n",
- "![forward](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward.png)\n",
- "\n",
- "With reverse AD, differentiation proceeds in the direction opposite to the original evaluation, so the derivative depends on the results of the forward run.\n",
- "\n",
- "![backward](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_backward.png)\n",
- "\n",
- "MindSpore first built reverse-mode AD, and implemented forward-mode differentiation on top of it.\n",
- "\n",
- "To clarify the difference between the two modes, we generalize the differentiated function to a function F with N inputs and M outputs:\n",
- "\n",
- "$$\n",
- "(Y_{1},Y_{2},...,Y_{M})=F(X_{1},X_{2},...,X_{N}) \\tag{3}\n",
- "$$\n",
- "\n",
- "The derivative of $F()$ is itself a Jacobian matrix:\n",
- "\n",
- "$$\n",
- "\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial Y_{1}}{\\partial X_{1}}& ... & \\frac{\\partial Y_{1}}{\\partial X_{N}} \\\\\n",
- "    ... & ... & ... \\\\\n",
- "    \\frac{\\partial Y_{M}}{\\partial X_{1}} & ... & \\frac{\\partial Y_{M}}{\\partial X_{N}}\n",
- "    \\end{matrix}\n",
- "    \\right]\n",
- "\\tag{4}\n",
- "$$\n",
- "\n",
- "#### Forward Automatic Differentiation\n",
- "\n",
- "In forward AD we compute from the inputs toward the outputs, so each pass yields the derivatives of all outputs with respect to one input, i.e. one column of the Jacobian:\n",
- "\n",
- "$$\n",
- "\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial Y_{1}}{\\partial X_{1}}\\\\\n",
- "    ... \\\\\n",
- "    \\frac{\\partial Y_{M}}{\\partial X_{1}}\n",
- "    \\end{matrix}\n",
- "    \\right]\n",
- "\\tag{5}\n",
- "$$\n",
- "\n",
- "To obtain that column, AD decomposes the program into a series of basic operations with known differentiation rules; each basic operation can in turn be generalized as a function $f$ with $n$ inputs and $m$ outputs:\n",
- "\n",
- "$$\n",
- "(y_{1},y_{2},...,y_{m})=f(x_{1},x_{2},...,x_{n}) \\tag{6}\n",
- "$$\n",
- "\n",
- "Since the differentiation rule of the basic function $f$, i.e. its Jacobian, is known, we can compute the Jacobian-vector product (JVP) for $f$ and apply the chain rule to obtain the derivative:\n",
- "\n",
- "$$\n",
- "\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial y_{1}}{\\partial X_{i}}\\\\\n",
- "    ... \\\\\n",
- "    \\frac{\\partial y_{m}}{\\partial X_{i}}\n",
- "    \\end{matrix}\n",
- "    \\right]=\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial y_{1}}{\\partial x_{1}}& ... & \\frac{\\partial y_{1}}{\\partial x_{n}} \\\\\n",
- "    ... & ... & ... \\\\\n",
- "    \\frac{\\partial y_{m}}{\\partial x_{1}} & ... & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
- "    \\end{matrix}\n",
- "    \\right]\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial x_{1}}{\\partial X_{i}}\\\\\n",
- "    ... \\\\\n",
- "    \\frac{\\partial x_{n}}{\\partial X_{i}}\n",
- "    \\end{matrix}\n",
- "    \\right]\n",
- "\\tag{7}\n",
- "$$\n",
- "\n",
- "#### Reverse Automatic Differentiation\n",
- "\n",
- "In reverse AD we compute from the outputs toward the inputs, so each pass yields the derivatives of one output with respect to all inputs, i.e. one row of the Jacobian:\n",
- "\n",
- "$$\n",
- "\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial Y_{1}}{\\partial X_{1}}& ... & \\frac{\\partial Y_{1}}{\\partial X_{N}} \\\\\n",
- "    \\end{matrix}\n",
- "    \\right]\n",
- "\\tag{8}\n",
- "$$\n",
- "\n",
- "To obtain that row, AD decomposes the program into a series of basic operations with known differentiation rules, each generalizable as a function $f$ with n inputs and m outputs:\n",
- "\n",
- "$$\n",
- "(y_{1},y_{2},...,y_{m})=f(x_{1},x_{2},...,x_{n}) \\tag{9}\n",
- "$$\n",
- "\n",
- "Since the differentiation rule of the basic function $f$, i.e. its Jacobian, is known, we can compute the vector-Jacobian product (VJP) for $f$ and apply the chain rule to obtain the derivative:\n",
- "\n",
- "$$\n",
- "\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial Y_{j}}{\\partial x_{1}}& ... & \\frac{\\partial Y_{j}}{\\partial x_{N}} \\\\\n",
- "    \\end{matrix}\n",
- "    \\right]=\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial Y_{j}}{\\partial y_{1}}& ... & \\frac{\\partial Y_{j}}{\\partial y_{m}} \\\\\n",
- "    \\end{matrix}\n",
- "    \\right]\\left[\n",
- "    \\begin{matrix}\n",
- "    \\frac{\\partial y_{1}}{\\partial x_{1}}& ... & \\frac{\\partial y_{1}}{\\partial x_{n}} \\\\\n",
- "    ... & ... & ... \\\\\n",
- "    \\frac{\\partial y_{m}}{\\partial x_{1}} & ... & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
- "    \\end{matrix}\n",
- "    \\right]\n",
- "\\tag{10}\n",
- "$$"
- ]
- },
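- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Both modes are exposed directly as function transforms. A small sketch (values illustrative, assuming the `mindspore.jvp`/`mindspore.vjp` interfaces) applies them to the function from Eq. (2):\n",
- "\n",
- "```python\n",
- "import mindspore as ms\n",
- "import mindspore.numpy as mnp\n",
- "from mindspore import Tensor, jvp, vjp\n",
- "\n",
- "def f(x1, x2):\n",
- "    return mnp.log(x1) + x1 * x2 - mnp.sin(x2)\n",
- "\n",
- "x1 = Tensor(2.0, ms.float32)\n",
- "x2 = Tensor(5.0, ms.float32)\n",
- "\n",
- "# Forward mode: push the tangent (1, 0) through f to get dy/dx1 at (2, 5).\n",
- "y, dy_dx1 = jvp(f, (x1, x2), (Tensor(1.0, ms.float32), Tensor(0.0, ms.float32)))\n",
- "\n",
- "# Reverse mode: vjp returns f's value plus a pullback for a given cotangent.\n",
- "y2, grad_fn = vjp(f, x1, x2)\n",
- "grads = grad_fn(Tensor(1.0, ms.float32))  # (dy/dx1, dy/dx2) in one pass\n",
- "```"
- ]
- },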
- {
- "cell_type": "markdown",
- "id": "b4dd1b5f",
- "metadata": {},
- "source": [
- "### `grad` Implementation\n",
- "\n",
- "MindSpore's `grad` uses reverse-mode AD: it computes gradients starting from the output of the forward network.\n",
- "\n",
- "#### `grad` Algorithm Design\n",
- "\n",
- "Let the model's original function be:\n",
- "\n",
- "$$\n",
- "f(g(x, y, z)) \\tag{11}\n",
- "$$\n",
- "\n",
- "Then the gradient of $f()$ with respect to $x$ is:\n",
- "\n",
- "$$\n",
- "\\frac{df}{dx}=\\frac{df}{dg}\\frac{dg}{dx}\\frac{dx}{dx}+\\frac{df}{dg}\\frac{dg}{dy}\\frac{dy}{dx}+\\frac{df}{dg}\\frac{dg}{dz}\\frac{dz}{dx}\\tag{12}\n",
- "$$\n",
- "\n",
- "$\\frac{df}{dy}$ and $\\frac{df}{dz}$ are analogous to $\\frac{df}{dx}$.\n",
- "\n",
- "Applying the chain rule, a gradient function `bprop: dout->(df, dinputs)` is defined for every function (operators and graphs alike), where `df` is the gradient with respect to free variables (variables defined outside the function) and `dinputs` is the gradient with respect to the function's inputs. The total-differential rule then accumulates `(df, dinputs)` onto the corresponding variables.\n",
- "\n",
- "MindIR expresses branches, loops, and closures as function expressions, so implementing correct backward rules for the corresponding operators suffices to obtain the gradient function of any input function.\n",
- "\n",
- "Defining an operator K, the reverse-mode AD algorithm can be summarized as:\n",
- "\n",
- "```text\n",
- "v = (func, inputs)\n",
- "F(v): {\n",
- "    (result, bprop) = K(func)(inputs)\n",
- "    df, dinputs = bprop(dout)\n",
- "    v.df += df\n",
- "    v.dinputs += dinputs\n",
- "}\n",
- "```\n",
- "\n",
- "#### `grad` Algorithm Implementation\n",
- "\n",
- "In the AD flow, the function to be differentiated is extracted, fed to the AD module as its input, and the corresponding gradient graph is produced as output.\n",
- "\n",
- "MindSpore's AD module converts an original function object into a gradient function object of `fprop` form, where `fprop = (forward_result, bprop)`: `forward_result` is the output node of the forward computation graph, and `bprop` is the gradient function generated as a closure over `fprop`. It has the single parameter `dout`, while `inputs` and `outputs` refer to `fprop`'s inputs and outputs.\n",
- "\n",
- "```c++\n",
- "    MapObject();   // maps ValueNode/Parameter/FuncGraph/Primitive objects\n",
- "    MapMorphism(); // maps the morphism of CNodes\n",
- "    res = k_graph(); // res is the fprop object of the gradient function\n",
- "```\n",
- "\n",
- "Generating the gradient function object requires a series of mappings from the original function to the gradient function: for every node in the original function, generate the corresponding node of the gradient function, then connect those nodes according to the reverse-mode AD rules to form the gradient graph.\n",
- "\n",
- "Each subgraph of the original function object gets a `DFunctor` object responsible for mapping it to a gradient function object. `DFunctor` realizes this mapping mainly through two steps, `MapObject` and `MapMorphism`.\n",
- "\n",
- "`MapObject` maps the original function's nodes to gradient-function nodes, covering free variables, parameter nodes, and ValueNodes.\n",
- "\n",
- "```c++\n",
- "MapFvObject();    // maps free variables\n",
- "MapParamObject(); // maps parameter nodes\n",
- "MapValueObject(); // maps ValueNodes\n",
- "```\n",
- "\n",
- "- `MapFvObject` maps free variables;\n",
- "\n",
- "- `MapParamObject` maps parameter nodes;\n",
- "\n",
- "- `MapValueObject` mainly maps `Primitive` and `FuncGraph` objects.\n",
- "\n",
- "Mapping a `FuncGraph` requires creating a `DFunctor` for that subgraph as well, so the process is recursive. A `Primitive` describes the kind of an operator; to support AD, each `Primitive` needs a corresponding backward differentiation rule.\n",
- "\n",
- "MindSpore defines these rules on the Python side; taking the `sin` operator as an example:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "12584611",
- "metadata": {},
- "outputs": [],
- "source": [
- "import mindspore.ops as ops\n",
- "from mindspore.ops._grad.grad_base import bprop_getters\n",
- "\n",
- "@bprop_getters.register(ops.Sin)\n",
- "def get_bprop_sin(self):\n",
- "    \"\"\"Grad definition for `Sin` operation.\"\"\"\n",
- "    cos = ops.Cos()\n",
- "\n",
- "    def bprop(x, out, dout):\n",
- "        dx = dout * cos(x)\n",
- "        return (dx,)\n",
- "\n",
- "    return bprop"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "22e53fe7",
- "metadata": {},
- "source": [
- "`x` is the input of the original `sin`, `out` is its output, and `dout` is the accumulated gradient flowing in.\n",
- "\n",
- "Once `MapObject` has mapped the nodes above, `MapMorphism` recursively maps the `CNode`s starting from the original function's output node, establishing the backpropagation links between nodes and realizing gradient accumulation.\n",
- "\n",
- "#### `grad` Example\n",
- "\n",
- "We build a simple network for the formula:\n",
- "\n",
- "$$\n",
- "f(x) = cos(sin(x)) \\tag{13}\n",
- "$$\n",
- "\n",
- "and differentiate formula (13) with respect to its input `x`:\n",
- "\n",
- "$$\n",
- "f'(x) = -sin(sin(x)) * cos(x) \\tag{14}\n",
- "$$\n",
- "\n",
- "The network for formula (13) is implemented in MindSpore as:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "2fa10dc1",
- "metadata": {},
- "outputs": [],
- "source": [
- "import mindspore.nn as nn\n",
- "import mindspore.ops as ops\n",
- "\n",
- "class Net(nn.Cell):\n",
- "    def __init__(self):\n",
- "        super(Net, self).__init__()\n",
- "        self.sin = ops.Sin()\n",
- "        self.cos = ops.Cos()\n",
- "\n",
- "    def construct(self, x):\n",
- "        a = self.sin(x)\n",
- "        out = self.cos(a)\n",
- "        return out"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "425eeb1b",
- "metadata": {},
- "source": [
- "The structure of the forward network:\n",
- "\n",
- "![auto-gradient-foward](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png)\n",
- "\n",
- "After reverse differentiation, the resulting differentiated network has the structure:\n",
- "\n",
- "![auto-gradient-forward2](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "3898186c",
- "metadata": {},
- "source": [
- "### Forward Automatic Differentiation Implementation\n",
- "\n",
- "Besides `grad` for reverse-mode AD, MindSpore also implements forward-mode AD through `jvp` (Jacobian-vector product).\n",
- "\n",
- "Compared with reverse mode, forward mode is better suited to networks whose input dimension is smaller than their output dimension. MindSpore's forward-mode AD is built on top of the reverse-mode interface `grad`.\n",
- "\n",
- "![auto-gradient-jvp](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png)\n",
- "\n",
- "Black is the network's forward flow. The first differentiation is with respect to $x$ and yields the blue graph; the second differentiates the blue graph with respect to $v$ and yields the yellow graph.\n",
- "\n",
- "The yellow graph is exactly the forward-mode AD result we need. Since the blue graph can be viewed as a linear function of $v$, no edges connect blue nodes to yellow nodes; the blue nodes are all dangling and get eliminated, so only the original function's nodes and the forward-differentiation nodes actually run. The method therefore adds no extra runtime overhead.\n",
- "\n",
- "#### References\n",
- "\n",
- "[1] Baydin, A.G. et al., 2018. [Automatic differentiation in machine learning: A survey](https://arxiv.org/abs/1502.05767). arXiv.org. \\[Accessed September 1, 2021\\].\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "680306ff-ecba-4b1f-be44-df6ac1935f91",
- "metadata": {},
- "source": [
- "## Fused Functional and Object-Oriented Programming"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d74658ba-41e7-4178-932b-ae8cd888902a",
- "metadata": {},
- "source": [
- "To meet the flexibility and usability needs of building and training neural network models, and drawing on its own functional automatic differentiation mechanism, MindSpore designed a fused functional plus object-oriented paradigm for AI model training. It combines the strengths of both paradigms and uses a single automatic differentiation mechanism for both deep-learning backpropagation and scientific-computing AD, supporting compatible AI and scientific-computing modeling from the ground up. The typical fused workflow is:\n",
- "\n",
- "1. Build the neural network with classes;\n",
- "2. Instantiate the network object;\n",
- "3. Construct a forward function connecting the network and the loss function;\n",
- "4. Use function transformation to obtain the gradient (backpropagation) function;\n",
- "5. Construct the training-step function;\n",
- "6. Call the training-step function in a loop to train.\n",
- "\n",
- "Here is a simple example of the fused paradigm:"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "673cec96-9965-46da-b5a4-2d705ee869e3",
- "metadata": {},
- "source": [
- "```python\n",
- "import mindspore.nn as nn\n",
- "from mindspore import value_and_grad\n",
- "\n",
- "# Class definition\n",
- "class Net(nn.Cell):\n",
- "    def __init__(self):\n",
- "        ......\n",
- "    def construct(self, inputs):\n",
- "        ......\n",
- "\n",
- "# Object instantiation\n",
- "net = Net() # network\n",
- "loss_fn = nn.CrossEntropyLoss() # loss function\n",
- "optimizer = nn.Adam(net.trainable_params(), lr) # optimizer\n",
- "\n",
- "# define forward function\n",
- "def forward_fn(inputs, targets):\n",
- "    logits = net(inputs)\n",
- "    loss = loss_fn(logits, targets)\n",
- "    return loss, logits\n",
- "\n",
- "# get grad function\n",
- "grad_fn = value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)\n",
- "\n",
- "# define train step function\n",
- "def train_step(inputs, targets):\n",
- "    (loss, logits), grads = grad_fn(inputs, targets) # get values and gradients\n",
- "    optimizer(grads) # update parameters\n",
- "    return loss, logits\n",
- "\n",
- "for i in range(epochs):\n",
- "    for inputs, targets in dataset():\n",
- "        loss, logits = train_step(inputs, targets)\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "49aadacc-d515-47a9-945b-f255e02508ae",
- "metadata": {},
- "source": [
- "As the example shows, the network is built with object-oriented programming, matching the habits of AI development, while forward computation and backpropagation use functional programming: the forward computation is constructed as a function, function transformation yields `grad_fn`, and executing `grad_fn` produces the gradients of the weights.\n",
- "\n",
- "The fused functional plus object-oriented paradigm keeps network construction easy while making training steps such as forward computation and backpropagation more flexible; it is MindSpore's recommended default programming paradigm."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "MindSpore",
- "language": "python",
- "name": "mindspore"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.5"
- },
- "vscode": {
- "interpreter": {
- "hash": "8c9da313289c39257cb28b126d2dadd33153d4da4d524f730c81a4aaccbd2ca7"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/docs/mindspore/source_zh_cn/design/images/graphkernel.png b/docs/mindspore/source_zh_cn/features/compile/images/graphkernel.png
similarity index 100%
rename from docs/mindspore/source_zh_cn/design/images/graphkernel.png
rename to docs/mindspore/source_zh_cn/features/compile/images/graphkernel.png
diff --git a/docs/mindspore/source_zh_cn/design/images/graphkernel_akg_overview.png b/docs/mindspore/source_zh_cn/features/compile/images/graphkernel_akg_overview.png
similarity index 100%
rename from docs/mindspore/source_zh_cn/design/images/graphkernel_akg_overview.png
rename to docs/mindspore/source_zh_cn/features/compile/images/graphkernel_akg_overview.png
diff --git a/docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_example.png b/docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_example.png
similarity index 100%
rename from docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_example.png
rename to docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_example.png
diff --git a/docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_exec_order.png b/docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_exec_order.png
similarity index 100%
rename from docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_exec_order.png
rename to 
docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_exec_order.png diff --git a/docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_framework.png b/docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_framework.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_framework.png rename to docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_framework.png diff --git a/docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_kernelselect.png b/docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_kernelselect.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_kernelselect.png rename to docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_kernelselect.png diff --git a/docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_lazyinline.png b/docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_lazyinline.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_lazyinline.png rename to docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_lazyinline.png diff --git a/docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_no_task.png b/docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_no_task.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/multi_level_compilation/jit_level_no_task.png rename to docs/mindspore/source_zh_cn/features/compile/images/multi_level_compilation/jit_level_no_task.png diff --git a/docs/mindspore/source_zh_cn/design/multi_level_compilation.md b/docs/mindspore/source_zh_cn/features/compile/multi_level_compilation.md similarity index 99% rename from docs/mindspore/source_zh_cn/design/multi_level_compilation.md rename to docs/mindspore/source_zh_cn/features/compile/multi_level_compilation.md index db91877291b25bccd9a63ac9a60f179a4278519f..dd640eaff985b5172c732dd6918e263907799c08 100644 --- a/docs/mindspore/source_zh_cn/design/multi_level_compilation.md +++ b/docs/mindspore/source_zh_cn/features/compile/multi_level_compilation.md @@ -1,6 +1,6 @@ # 多级编译架构 -[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/multi_level_compilation.md) +[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/features/compile/multi_level_compilation.md) ## 背景 diff --git a/docs/mindspore/source_zh_cn/design/data_engine.md b/docs/mindspore/source_zh_cn/features/data_engine.md similarity index 99% rename from docs/mindspore/source_zh_cn/design/data_engine.md rename to docs/mindspore/source_zh_cn/features/data_engine.md index ee1543645baafe6b6a91b55955ddc246974d901a..19c621bd760a108f2ff0f0de349a5befe46bad78 100644 --- a/docs/mindspore/source_zh_cn/design/data_engine.md +++ b/docs/mindspore/source_zh_cn/features/data_engine.md @@ -1,6 +1,6 @@ # 高性能数据处理引擎 
-[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/data_engine.md) +[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/features/data_engine.md) ## 背景介绍 diff --git a/docs/mindspore/source_zh_cn/design/images/MoE.png b/docs/mindspore/source_zh_cn/features/images/MoE.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/MoE.png rename to docs/mindspore/source_zh_cn/features/images/MoE.png diff --git a/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.graffle b/docs/mindspore/source_zh_cn/features/images/all_scenarios_intro.graffle similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.graffle rename to docs/mindspore/source_zh_cn/features/images/all_scenarios_intro.graffle diff --git a/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png b/docs/mindspore/source_zh_cn/features/images/all_scenarios_intro.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png rename to docs/mindspore/source_zh_cn/features/images/all_scenarios_intro.png diff --git a/docs/mindspore/source_zh_cn/design/images/all_scenarios_train_process.graffle b/docs/mindspore/source_zh_cn/features/images/all_scenarios_train_process.graffle similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/all_scenarios_train_process.graffle rename to docs/mindspore/source_zh_cn/features/images/all_scenarios_train_process.graffle diff --git a/docs/mindspore/source_zh_cn/design/images/all_scenarios_train_process.png b/docs/mindspore/source_zh_cn/features/images/all_scenarios_train_process.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/all_scenarios_train_process.png rename to docs/mindspore/source_zh_cn/features/images/all_scenarios_train_process.png diff --git a/docs/mindspore/source_zh_cn/design/images/arch_zh.png b/docs/mindspore/source_zh_cn/features/images/arch_zh.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/arch_zh.png rename to docs/mindspore/source_zh_cn/features/images/arch_zh.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto.png b/docs/mindspore/source_zh_cn/features/images/auto.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto.png rename to docs/mindspore/source_zh_cn/features/images/auto.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_backward.png b/docs/mindspore/source_zh_cn/features/images/auto_gradient_backward.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_backward.png rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_backward.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward.png b/docs/mindspore/source_zh_cn/features/images/auto_gradient_forward.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_forward.png rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_forward.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.graffle b/docs/mindspore/source_zh_cn/features/images/auto_gradient_forward2.graffle similarity index 100% rename from 
docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.graffle rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_forward2.graffle diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png b/docs/mindspore/source_zh_cn/features/images/auto_gradient_forward2.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_forward2.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.graffle b/docs/mindspore/source_zh_cn/features/images/auto_gradient_foward.graffle similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.graffle rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_foward.graffle diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png b/docs/mindspore/source_zh_cn/features/images/auto_gradient_foward.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_foward.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.graffle b/docs/mindspore/source_zh_cn/features/images/auto_gradient_jvp.graffle similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.graffle rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_jvp.graffle diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png b/docs/mindspore/source_zh_cn/features/images/auto_gradient_jvp.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png rename to docs/mindspore/source_zh_cn/features/images/auto_gradient_jvp.png diff --git a/docs/mindspore/source_zh_cn/design/images/auto_parallel.png b/docs/mindspore/source_zh_cn/features/images/auto_parallel.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/auto_parallel.png rename to docs/mindspore/source_zh_cn/features/images/auto_parallel.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/architecture.png b/docs/mindspore/source_zh_cn/features/images/data/architecture.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/architecture.png rename to docs/mindspore/source_zh_cn/features/images/data/architecture.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/auto_augment.png b/docs/mindspore/source_zh_cn/features/images/data/auto_augment.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/auto_augment.png rename to docs/mindspore/source_zh_cn/features/images/data/auto_augment.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/auto_shape.png b/docs/mindspore/source_zh_cn/features/images/data/auto_shape.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/auto_shape.png rename to docs/mindspore/source_zh_cn/features/images/data/auto_shape.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/callback.png b/docs/mindspore/source_zh_cn/features/images/data/callback.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/callback.png rename to docs/mindspore/source_zh_cn/features/images/data/callback.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/connector.png b/docs/mindspore/source_zh_cn/features/images/data/connector.png similarity index 100% rename 
from docs/mindspore/source_zh_cn/design/images/data/connector.png rename to docs/mindspore/source_zh_cn/features/images/data/connector.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/data_engine.png b/docs/mindspore/source_zh_cn/features/images/data/data_engine.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/data_engine.png rename to docs/mindspore/source_zh_cn/features/images/data/data_engine.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/data_process.png b/docs/mindspore/source_zh_cn/features/images/data/data_process.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/data_process.png rename to docs/mindspore/source_zh_cn/features/images/data/data_process.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/ir.png b/docs/mindspore/source_zh_cn/features/images/data/ir.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/ir.png rename to docs/mindspore/source_zh_cn/features/images/data/ir.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/multi_process.png b/docs/mindspore/source_zh_cn/features/images/data/multi_process.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/multi_process.png rename to docs/mindspore/source_zh_cn/features/images/data/multi_process.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/operation.png b/docs/mindspore/source_zh_cn/features/images/data/operation.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/operation.png rename to docs/mindspore/source_zh_cn/features/images/data/operation.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/pipeline.png b/docs/mindspore/source_zh_cn/features/images/data/pipeline.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/pipeline.png rename to docs/mindspore/source_zh_cn/features/images/data/pipeline.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/predict.png b/docs/mindspore/source_zh_cn/features/images/data/predict.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/predict.png rename to docs/mindspore/source_zh_cn/features/images/data/predict.png diff --git a/docs/mindspore/source_zh_cn/design/images/data/queue.png b/docs/mindspore/source_zh_cn/features/images/data/queue.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data/queue.png rename to docs/mindspore/source_zh_cn/features/images/data/queue.png diff --git a/docs/mindspore/source_zh_cn/design/images/data_parallel.png b/docs/mindspore/source_zh_cn/features/images/data_parallel.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/data_parallel.png rename to docs/mindspore/source_zh_cn/features/images/data_parallel.png diff --git a/docs/mindspore/source_zh_cn/design/images/heter-opt.png b/docs/mindspore/source_zh_cn/features/images/heter-opt.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/heter-opt.png rename to docs/mindspore/source_zh_cn/features/images/heter-opt.png diff --git a/docs/mindspore/source_zh_cn/design/images/heter.png b/docs/mindspore/source_zh_cn/features/images/heter.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/heter.png rename to docs/mindspore/source_zh_cn/features/images/heter.png diff --git a/docs/mindspore/source_zh_cn/design/images/ir/cf.dot 
b/docs/mindspore/source_zh_cn/features/images/ir/cf.dot similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/cf.dot rename to docs/mindspore/source_zh_cn/features/images/ir/cf.dot diff --git a/docs/mindspore/source_zh_cn/design/images/ir/cf.png b/docs/mindspore/source_zh_cn/features/images/ir/cf.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/cf.png rename to docs/mindspore/source_zh_cn/features/images/ir/cf.png diff --git a/docs/mindspore/source_zh_cn/design/images/ir/closure.dot b/docs/mindspore/source_zh_cn/features/images/ir/closure.dot similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/closure.dot rename to docs/mindspore/source_zh_cn/features/images/ir/closure.dot diff --git a/docs/mindspore/source_zh_cn/design/images/ir/closure.png b/docs/mindspore/source_zh_cn/features/images/ir/closure.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/closure.png rename to docs/mindspore/source_zh_cn/features/images/ir/closure.png diff --git a/docs/mindspore/source_zh_cn/design/images/ir/hof.dot b/docs/mindspore/source_zh_cn/features/images/ir/hof.dot similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/hof.dot rename to docs/mindspore/source_zh_cn/features/images/ir/hof.dot diff --git a/docs/mindspore/source_zh_cn/design/images/ir/hof.png b/docs/mindspore/source_zh_cn/features/images/ir/hof.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/hof.png rename to docs/mindspore/source_zh_cn/features/images/ir/hof.png diff --git a/docs/mindspore/source_zh_cn/design/images/ir/ir.dot b/docs/mindspore/source_zh_cn/features/images/ir/ir.dot similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/ir.dot rename to docs/mindspore/source_zh_cn/features/images/ir/ir.dot diff --git a/docs/mindspore/source_zh_cn/design/images/ir/ir.png b/docs/mindspore/source_zh_cn/features/images/ir/ir.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/ir/ir.png rename to docs/mindspore/source_zh_cn/features/images/ir/ir.png diff --git a/docs/mindspore/source_zh_cn/design/images/multi_copy.png b/docs/mindspore/source_zh_cn/features/images/multi_copy.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/multi_copy.png rename to docs/mindspore/source_zh_cn/features/images/multi_copy.png diff --git a/docs/mindspore/source_zh_cn/design/images/operator_split.png b/docs/mindspore/source_zh_cn/features/images/operator_split.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/operator_split.png rename to docs/mindspore/source_zh_cn/features/images/operator_split.png diff --git a/docs/mindspore/source_zh_cn/design/images/tensor_redistribution1.png b/docs/mindspore/source_zh_cn/features/images/tensor_redistribution1.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/tensor_redistribution1.png rename to docs/mindspore/source_zh_cn/features/images/tensor_redistribution1.png diff --git a/docs/mindspore/source_zh_cn/design/images/tensor_redistribution2.png b/docs/mindspore/source_zh_cn/features/images/tensor_redistribution2.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/tensor_redistribution2.png rename to docs/mindspore/source_zh_cn/features/images/tensor_redistribution2.png diff --git a/docs/mindspore/source_zh_cn/design/images/tensor_redistribution3.png 
b/docs/mindspore/source_zh_cn/features/images/tensor_redistribution3.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/tensor_redistribution3.png rename to docs/mindspore/source_zh_cn/features/images/tensor_redistribution3.png diff --git a/docs/mindspore/source_zh_cn/features/index.rst b/docs/mindspore/source_zh_cn/features/index.rst index 73ae24d7d5b93c862a2a7f900bf36bdc256204ef..383f4cad09de20d0fac025113999a549f4e23877 100644 --- a/docs/mindspore/source_zh_cn/features/index.rst +++ b/docs/mindspore/source_zh_cn/features/index.rst @@ -1,148 +1,22 @@ -特性介绍 +Developer Notes ========================= .. toctree:: :glob: :maxdepth: 1 - :hidden: - :caption: 编程形态 - - program_form/overview - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: 数据处理 - - dataset/overview - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: 分布式并行 + overview parallel/data_parallel parallel/operator_parallel parallel/optimizer_parallel parallel/pipeline_parallel parallel/auto_parallel - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: 编译 - + compile/multi_level_compilation compile/graph_construction compile/graph_optimization - -.. toctree:: - :glob: - :maxdepth: 1 - :hidden: - :caption: 运行时 - runtime/memory_manager runtime/multilevel_pipeline runtime/multistream_concurrency runtime/pluggable_backend - -.. raw:: html - - - -.. raw:: html - - - + runtime/pluggable_device + data_engine diff --git a/docs/mindspore/source_zh_cn/design/overview.md b/docs/mindspore/source_zh_cn/features/overview.md similarity index 99% rename from docs/mindspore/source_zh_cn/design/overview.md rename to docs/mindspore/source_zh_cn/features/overview.md index 95118f6cdca99fcdb99febd18d473a5dbfc89921..8cf8b017ba16cef9e4e23dccd27411499b030465 100644 --- a/docs/mindspore/source_zh_cn/design/overview.md +++ b/docs/mindspore/source_zh_cn/features/overview.md @@ -1,6 +1,6 @@ # MindSpore设计概览 -[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/overview.md) +[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/features/overview.md) ## 概述 diff --git a/docs/mindspore/source_zh_cn/features/program_form/overview.md b/docs/mindspore/source_zh_cn/features/program_form/overview.md deleted file mode 100644 index abcf4afa3075d489296f2b9cc9ec2aff109fd4bb..0000000000000000000000000000000000000000 --- a/docs/mindspore/source_zh_cn/features/program_form/overview.md +++ /dev/null @@ -1,11 +0,0 @@ -# 编程形态概述 - -[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/features/program_form/overview.md) - -MindSpore是面向“端-边-云”全场景设计的AI框架,为用户提供AI模型开发、训练、推理的接口,支持用户用原生Python语法开发和调试神经网络,其提供动态图、静态图、动静统一的编程形态,使开发者可以兼顾开发效率和执行性能。 - -考虑开发灵活性、易用性,MindSpore支持动态图的编程模式,基于MindSpore提供的functional和nn.cell接口,用户可以灵活组装构建所需网络,相关接口按照Python函数库的形态解释执行,并支持微分求导能力。从而易于调试和开发。相关接口按配置支持加速硬件的异步下发执行从而实现异构加速。 - -同时,基于动态图模式,MindSpore提供@jit的装饰器优化能力,可以指定函数通过@jit装饰优化,装饰部分会被整体解析,构建成C++计算图,进行全局分析,编译优化,从而加速被装饰部分的整体执行性能。这一过程我们也称之为静态化加速。 - 
-除了动态图模式,MindSpore进一步提供了[静态图](https://www.mindspore.cn/tutorials/zh-CN/master/compile/static_graph.html)的编程模式,相关MindSpore模型构建接口不变,无需添加@jit装饰,MindSpore框架会针对所有开发在nn.cell类中construct函数的定义内容,整体编译解析,构建针对网络的完整静态图,进行模型整图级编译优化与执行。这样能针对整网,基于AI模型训练、推理的特点,进行模型级专有的优化,获取更高的执行性能。 diff --git a/docs/mindspore/source_zh_cn/design/images/pluggable_device_arch.png b/docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_arch.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/pluggable_device_arch.png rename to docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_arch.png diff --git a/docs/mindspore/source_zh_cn/design/images/pluggable_device_graph.png b/docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_graph.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/pluggable_device_graph.png rename to docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_graph.png diff --git a/docs/mindspore/source_zh_cn/design/images/pluggable_device_kernel.png b/docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_kernel.png similarity index 100% rename from docs/mindspore/source_zh_cn/design/images/pluggable_device_kernel.png rename to docs/mindspore/source_zh_cn/features/runtime/images/pluggable_device_kernel.png diff --git a/docs/mindspore/source_zh_cn/design/pluggable_device.md b/docs/mindspore/source_zh_cn/features/runtime/pluggable_device.md similarity index 97% rename from docs/mindspore/source_zh_cn/design/pluggable_device.md rename to docs/mindspore/source_zh_cn/features/runtime/pluggable_device.md index 2f17977eace563507d1b66aa1d61c552f9a08fcb..0f6c3beca60ec85678b44ddedd532bd28b267450 100644 --- a/docs/mindspore/source_zh_cn/design/pluggable_device.md +++ b/docs/mindspore/source_zh_cn/features/runtime/pluggable_device.md @@ -1,59 +1,59 @@ -# 三方硬件对接 - -[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/design/pluggable_device.md) - -MindSpore通过开放式架构,支持第三方芯片插件化、标准化、低成本快速对接: - -- 后端架构解耦,快速支持新芯片插件化对接; -- 抽象硬件类型建模,对接流程标准化; -- 抽象算子封装,多芯片异构算子统一选择; -- 支持第三方图IR接入,充分发挥芯片架构优势。 - -MindSpore整体架构及后端相关组件如下图所示: - -![image](./images/pluggable_device_arch.png) - -MindSpore整体架构包括如下几个主要组件,它们之间存在相互的依赖关系: - -- Python API:提供了基于Python的前端表达与编程接口,支撑用户进行网络构建、整图执行、子图执行以及单算子执行,并通过pybind11接口调用到C++模块,C++模块分为前端、后端、MindData、Core等; -- MindExpression前端表达:负责编译流程控制和硬件无关的优化如类型推导、自动微分、表达式化简等; -- MindData数据组件:MindData提供高效的数据处理、常用数据集加载等功能和编程接口,支持用户灵活的定义处理注册和pipeline并行优化; -- MindIR:包含了ANF IR数据结构、日志、异常等端、云共用的数据结构与算法。 - -第三方芯片对接MindSpore的过程主要涉及MindSpore的后端,后端也分为多个组件,整体上分为两大类: - -- 一类与硬件无关,如MemoryManager、MemoryPool、DeviceAddress等常用数据结构及相关算法以及包括GraphCompiler、GraphScheduler在内的能够调度整个流程、具有对图或单算子的初步处理和调度能力的组件; -- 另一类与硬件相关,这部分通过对硬件的抽象,提供了多个接口,第三方芯片可以根据情况选择对接,实现硬件平台上特有的算子、图优化、内存分配、流分配等逻辑,并封装成动态库,程序运行时作为插件加载。第三方芯片对接时可以参考MindSpore默认内置的CPU/GPU/Ascend插件。 - -为了方便第三方硬件对接,在MindSpore中提供了硬件抽象层,定义了标准化的硬件对接接口,抽象层被上层统一运行时中的GraphCompiler和GraphScheduler两个模块调用: - -- GraphCompiler负责提供默认的控制流、异构图拆分逻辑,不同阶段的图优化,调用抽象层提供的算子选择/算子编译、内存分配和流分配等; -- GraphScheduler负责将编译完成的图转化为Actor模型并加入到线程池中,并执行调度这些Actor。 - -同时,在框架中也提供了公共数据结构与算法,如debug工具、默认的内存池实现、数百个对Anf IR的常见操作、由MindSpore研发高效内存复用算法SOMAS等。 - -硬件抽象层提供了Graph模式(GraphExecutor)和Kernel模式(KernelExecutor)用于两种对接方式,分别面向DSA架构(如NPU、XPU等)和通用架构的芯片(如GPU、CPU等)提供分类的对接接口。芯片厂商可以继承某种或两种抽象类并实现,根据对接方式的不同,如果对接Kernel模式还需实现DeviceResManager、KernelMod、DeviceAddress等接口。 - -## Kernel模式对接 - 
-通用架构Kernel模式需要在插件中实现以下几个方面的功能: - -- 自定义图拆分逻辑,可以低成本实现框架提供的控制流、异构等高级特性,如果不使用这些特性,可以空实现; - -- 自定义图优化,可以根据硬件的特性对某些算子进行拆分与融合,以及其他自定义的对图的修改; - -- 算子选择和算子编译; -- 内存管理,DeviceAddress是对内存的抽象,第三方芯片厂商需要实现Host与Device之间拷贝的功能。还需要提供内存申请、销毁的功能。为了方便第三方芯片厂商,MindSpore在Common组件中提供了一套内存池的实现和高效内存复用算法SOMAS; -- 流管理,如果待对接的芯片有流的概念,需要提供创建与销毁的功能,如果没有,则将会以单流模式运行。 - -![image](./images/pluggable_device_kernel.png) - -## Graph模式对接 - -若芯片厂商的软件栈较完整能够提供High level的API,或DSA架构芯片的软件栈与Kernel模式存在差异,可以对接Graph模式。Graph模式将整个图视为一个由第三方软件栈实现的大算子(SuperKernel),需要由第三方软件栈实现以下两个功能: - -- 图编译,第三方芯片厂商需要将MindSpore的Anf IR转换成第三方IR图表达,并执行第三方图编译流程将该图编译至可执行的就绪状态; - -- 图执行,第三方芯片厂商需要理解MindSpore的Tensor格式或将其转换成可被理解的格式,并调用执行已就绪的图,并将执行的结果转换成MindSpore的Tensor格式。 - -![image](./images/pluggable_device_graph.png) +# 三方硬件对接 + +[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/features/runtime/pluggable_device.md) + +MindSpore通过开放式架构,支持第三方芯片插件化、标准化、低成本快速对接: + +- 后端架构解耦,快速支持新芯片插件化对接; +- 抽象硬件类型建模,对接流程标准化; +- 抽象算子封装,多芯片异构算子统一选择; +- 支持第三方图IR接入,充分发挥芯片架构优势。 + +MindSpore整体架构及后端相关组件如下图所示: + +![image](./images/pluggable_device_arch.png) + +MindSpore整体架构包括如下几个主要组件,它们之间存在相互的依赖关系: + +- Python API:提供了基于Python的前端表达与编程接口,支撑用户进行网络构建、整图执行、子图执行以及单算子执行,并通过pybind11接口调用到C++模块,C++模块分为前端、后端、MindData、Core等; +- MindExpression前端表达:负责编译流程控制和硬件无关的优化如类型推导、自动微分、表达式化简等; +- MindData数据组件:MindData提供高效的数据处理、常用数据集加载等功能和编程接口,支持用户灵活的定义处理注册和pipeline并行优化; +- MindIR:包含了ANF IR数据结构、日志、异常等端、云共用的数据结构与算法。 + +第三方芯片对接MindSpore的过程主要涉及MindSpore的后端,后端也分为多个组件,整体上分为两大类: + +- 一类与硬件无关,如MemoryManager、MemoryPool、DeviceAddress等常用数据结构及相关算法以及包括GraphCompiler、GraphScheduler在内的能够调度整个流程、具有对图或单算子的初步处理和调度能力的组件; +- 另一类与硬件相关,这部分通过对硬件的抽象,提供了多个接口,第三方芯片可以根据情况选择对接,实现硬件平台上特有的算子、图优化、内存分配、流分配等逻辑,并封装成动态库,程序运行时作为插件加载。第三方芯片对接时可以参考MindSpore默认内置的CPU/GPU/Ascend插件。 + +为了方便第三方硬件对接,在MindSpore中提供了硬件抽象层,定义了标准化的硬件对接接口,抽象层被上层统一运行时中的GraphCompiler和GraphScheduler两个模块调用: + +- GraphCompiler负责提供默认的控制流、异构图拆分逻辑,不同阶段的图优化,调用抽象层提供的算子选择/算子编译、内存分配和流分配等; +- GraphScheduler负责将编译完成的图转化为Actor模型并加入到线程池中,并执行调度这些Actor。 + +同时,在框架中也提供了公共数据结构与算法,如debug工具、默认的内存池实现、数百个对Anf IR的常见操作、由MindSpore研发高效内存复用算法SOMAS等。 + +硬件抽象层提供了Graph模式(GraphExecutor)和Kernel模式(KernelExecutor)用于两种对接方式,分别面向DSA架构(如NPU、XPU等)和通用架构的芯片(如GPU、CPU等)提供分类的对接接口。芯片厂商可以继承某种或两种抽象类并实现,根据对接方式的不同,如果对接Kernel模式还需实现DeviceResManager、KernelMod、DeviceAddress等接口。 + +## Kernel模式对接 + +通用架构Kernel模式需要在插件中实现以下几个方面的功能: + +- 自定义图拆分逻辑,可以低成本实现框架提供的控制流、异构等高级特性,如果不使用这些特性,可以空实现; + +- 自定义图优化,可以根据硬件的特性对某些算子进行拆分与融合,以及其他自定义的对图的修改; + +- 算子选择和算子编译; +- 内存管理,DeviceAddress是对内存的抽象,第三方芯片厂商需要实现Host与Device之间拷贝的功能。还需要提供内存申请、销毁的功能。为了方便第三方芯片厂商,MindSpore在Common组件中提供了一套内存池的实现和高效内存复用算法SOMAS; +- 流管理,如果待对接的芯片有流的概念,需要提供创建与销毁的功能,如果没有,则将会以单流模式运行。 + +![image](./images/pluggable_device_kernel.png) + +## Graph模式对接 + +若芯片厂商的软件栈较完整能够提供High level的API,或DSA架构芯片的软件栈与Kernel模式存在差异,可以对接Graph模式。Graph模式将整个图视为一个由第三方软件栈实现的大算子(SuperKernel),需要由第三方软件栈实现以下两个功能: + +- 图编译,第三方芯片厂商需要将MindSpore的Anf IR转换成第三方IR图表达,并执行第三方图编译流程将该图编译至可执行的就绪状态; + +- 图执行,第三方芯片厂商需要理解MindSpore的Tensor格式或将其转换成可被理解的格式,并调用执行已就绪的图,并将执行的结果转换成MindSpore的Tensor格式。 + +![image](./images/pluggable_device_graph.png) diff --git a/docs/mindspore/source_en/features/dataset/overview.md b/tutorials/source_en/dataset/overview.md similarity index 99% rename from docs/mindspore/source_en/features/dataset/overview.md rename to tutorials/source_en/dataset/overview.md index 8cc2106eff71d4c9142679911831ea476c65e159..c0e100a01b17b959b68992d92f8b8dba670c49ac 100644 --- 
diff --git a/docs/mindspore/source_en/features/dataset/overview.md b/tutorials/source_en/dataset/overview.md
similarity index 99%
rename from docs/mindspore/source_en/features/dataset/overview.md
rename to tutorials/source_en/dataset/overview.md
index 8cc2106eff71d4c9142679911831ea476c65e159..c0e100a01b17b959b68992d92f8b8dba670c49ac 100644
--- a/docs/mindspore/source_en/features/dataset/overview.md
+++ b/tutorials/source_en/dataset/overview.md
@@ -1,6 +1,6 @@
 # Data Processing Overview
 
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_en/features/dataset/overview.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_en/dataset/overview.md)
 
 MindSpore Dataset provides two types of data processing capabilities: pipeline mode and lightweight mode.
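The moved overview opens by naming the two processing modes. A small illustrative sketch of the difference, with made-up data and shapes: pipeline mode builds an asynchronous dataset graph, while lightweight mode applies a transform eagerly to one sample.

```python
import numpy as np
import mindspore.dataset as ds
import mindspore.dataset.vision as vision

images = [np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8) for _ in range(8)]

# Pipeline mode: declare a source, chain map/batch, then iterate; the
# pipeline runs data loading and transforms in the background.
pipeline = ds.NumpySlicesDataset(images, column_names=["image"], shuffle=False)
pipeline = pipeline.map(operations=vision.Resize((16, 16)), input_columns=["image"])
pipeline = pipeline.batch(4)
for batch in pipeline.create_dict_iterator(output_numpy=True):
    print(batch["image"].shape)  # (4, 16, 16, 3)

# Lightweight mode: call the same transform eagerly on a single sample.
single = vision.Resize((16, 16))(images[0])
print(single.shape)  # (16, 16, 3)
```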
diff --git a/tutorials/source_en/index.rst b/tutorials/source_en/index.rst
index 9aea3c9585ce65e831a95e9d8b08194fb585a8d8..5bfcd984d2833decf4abf5f9e3c7223da902cb6d 100644
--- a/tutorials/source_en/index.rst
+++ b/tutorials/source_en/index.rst
@@ -28,10 +28,11 @@ MindSpore Tutorial
    :caption: Data Processing
    :hidden:
 
+   dataset/overview
    dataset/sampler
    dataset/eager
    dataset/record
-   dataset/optimize
+   dataset/optimize
 
 .. toctree::
    :glob:
diff --git a/docs/mindspore/source_zh_cn/features/dataset/overview.ipynb b/tutorials/source_zh_cn/dataset/overview.ipynb
similarity index 96%
rename from docs/mindspore/source_zh_cn/features/dataset/overview.ipynb
rename to tutorials/source_zh_cn/dataset/overview.ipynb
index 90bafb8c4a47256dd2bd1fb402936ecc252af71e..8189b518237fe811b4cd0de7a3dff58207778ed7 100644
--- a/docs/mindspore/source_zh_cn/features/dataset/overview.ipynb
+++ b/tutorials/source_zh_cn/dataset/overview.ipynb
@@ -13,7 +13,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/zh_cn/features/dataset/mindspore_overview.ipynb) [![Download Sample Code](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_download_code.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/zh_cn/features/dataset/mindspore_overview.py) [![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindspore/source_zh_cn/features/dataset/overview.ipynb)"
+    "[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/zh_cn/dataset/mindspore_overview.ipynb) [![Download Sample Code](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_download_code.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/zh_cn/dataset/mindspore_overview.py) [![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/dataset/overview.ipynb)"
    ]
   },
   {
diff --git a/tutorials/source_zh_cn/index.rst b/tutorials/source_zh_cn/index.rst
index 4493ad476236bbef21514902e7b22458a123c839..4ed77cb60ebba9acc5e2592685b467f52e71e505 100644
--- a/tutorials/source_zh_cn/index.rst
+++ b/tutorials/source_zh_cn/index.rst
@@ -28,6 +28,7 @@ MindSpore Tutorials
    :caption: Data Processing
    :hidden:
 
+   dataset/overview
    dataset/sampler
    dataset/eager
    dataset/record