diff --git a/tutorials/experts/source_en/model_infer/inference.md b/tutorials/experts/source_en/model_infer/inference.md
new file mode 100644
index 0000000000000000000000000000000000000000..caee10b2a51c182883fa25c21318b7c6c19c583b
--- /dev/null
+++ b/tutorials/experts/source_en/model_infer/inference.md
@@ -0,0 +1,59 @@
+# Inference Model Overview
+
+`Ascend` `GPU` `CPU` `Inference Application`
+
+
+
+MindSpore can execute inference tasks on different hardware platforms based on trained models.
+
+## Model Files
+
+MindSpore can save two types of data: training parameters and network models that contain parameter information.
+
+- Training parameters are stored in the checkpoint format.
+- Network models are stored in the MindIR, AIR, or ONNX format.
+
+Basic concepts and application scenarios of these formats are as follows:
+
+- Checkpoint
+    - Checkpoint uses the Protocol Buffers format and stores all network parameter values.
+    - It is generally used to resume training after a training task is interrupted, or to execute a fine-tuning task after training.
+- MindSpore IR (MindIR)
+    - MindIR is a function-style IR of MindSpore based on graph representation, and it defines scalable graph structures and operator IRs.
+    - It eliminates model differences between different backends and is generally used to perform inference tasks across hardware platforms.
+- Open Neural Network Exchange (ONNX)
+    - ONNX is an open format built to represent machine learning models.
+    - It is generally used to transfer models between different frameworks or to run them on an inference engine such as [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html).
+    - At present, MindSpore only supports exporting ONNX models and does not support loading ONNX models for inference. Currently, the models that support ONNX export are ResNet-50, YOLOv3-DarkNet53, YOLOv4, and BERT. These models can be used on [ONNX Runtime](https://onnxruntime.ai/).
+- Ascend Intermediate Representation (AIR)
+    - AIR is an open file format defined by Huawei for machine learning.
+    - It is well adapted to Huawei AI processors and is generally used to execute inference tasks on Ascend 310.
+
+## Inference Execution
+
+Inference can be classified into the following two modes based on the application environment:
+
+1. Local inference
+
+    Load a checkpoint file generated during network training and call the `model.predict` API for inference and validation. For details, see [Online Inference with Checkpoint](https://www.mindspore.cn/docs/programming_guide/en/master/online_inference.html).
+
+2. Cross-platform inference
+
+    Use a network definition and a checkpoint file, call the `export` API to export a model file, and perform inference on different platforms. Currently, MindIR, ONNX, and AIR (only on Ascend AI Processors) models can be exported. For details, see [Saving Models](https://www.mindspore.cn/docs/programming_guide/en/master/save_model.html). A minimal sketch of both modes is shown below.
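+
+The following is a minimal, hedged sketch of the two modes. The toy network, checkpoint file name, and input shape are assumptions made only for this example; the `load_checkpoint`, `load_param_into_net`, `model.predict`, and `export` APIs are used as introduced above.
+
+```python
+import numpy as np
+import mindspore.nn as nn
+from mindspore import Tensor, Model, export, load_checkpoint, load_param_into_net, save_checkpoint
+
+
+class Net(nn.Cell):
+    """A toy network used only for illustration; replace it with your own network definition."""
+    def __init__(self):
+        super(Net, self).__init__()
+        self.dense = nn.Dense(32, 10)
+
+    def construct(self, x):
+        return self.dense(x)
+
+
+net = Net()
+# "net.ckpt" stands in for a checkpoint produced by a real training run; here one is created on the spot.
+save_checkpoint(net, "net.ckpt")
+
+# 1. Local inference: load the checkpoint parameters into the network and call `model.predict`.
+load_param_into_net(net, load_checkpoint("net.ckpt"))
+input_data = Tensor(np.ones([1, 32]).astype(np.float32))
+model = Model(net)
+print(model.predict(input_data).shape)
+
+# 2. Cross-platform inference: export the network and its parameters as a MindIR model file.
+export(net, input_data, file_name="net", file_format="MINDIR")
+```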
+
+## Introduction to MindIR
+
+MindSpore defines logical network structures and operator attributes through a unified IR, and decouples model files in MindIR format from hardware platforms to implement one-time training and multiple-time deployment.
+
+1. Overview
+
+    As a unified model file of MindSpore, MindIR stores both the network structure and the weight parameter values. In addition, it can be deployed on the on-cloud Serving platform and the on-device Lite platform to execute inference tasks.
+
+    The same MindIR file supports deployment on multiple hardware forms:
+
+    - On-cloud deployment and inference on Serving: After MindSpore trains and generates a MindIR model file, the file can be directly sent to MindSpore Serving for loading and inference. No additional model conversion is required, so the same model file remains unified across hardware such as Ascend, GPU, and CPU.
+    - On-device inference and deployment on Lite: MindIR can be directly used for Lite deployment. In addition, to meet the lightweight requirements on devices, model miniaturization and conversion functions are provided: an original MindIR model file can be converted from the Protocol Buffers format to the FlatBuffers format for storage, and the network structure is made more lightweight to better meet the performance and memory requirements on devices.
+
+2. Application Scenarios
+
+    Use a network definition and a checkpoint file to export a MindIR model file, and then execute inference based on different requirements, for example, [Inference Using the MindIR Model on Ascend 310 AI Processors](https://www.mindspore.cn/docs/programming_guide/en/master/multi_platform_inference_ascend_310_mindir.html), [MindSpore Serving-based Inference Service Deployment](https://www.mindspore.cn/serving/docs/en/master/serving_example.html), and [Inference on Devices](https://www.mindspore.cn/lite/docs/en/master/index.html). A quick local check of an exported MindIR file is sketched below.
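+
+The following hedged sketch shows one way to verify an exported MindIR file inside MindSpore itself; it is an illustration rather than one of the deployment paths listed above, and the single-layer network and file name are placeholders. It relies on `mindspore.export`, `mindspore.load`, and `nn.GraphCell`.
+
+```python
+import numpy as np
+import mindspore as ms
+import mindspore.nn as nn
+from mindspore import Tensor
+
+# Export a toy single-layer network as a MindIR file, then load it back for inference.
+net = nn.Dense(32, 10)
+input_data = Tensor(np.ones([1, 32]).astype(np.float32))
+ms.export(net, input_data, file_name="toy", file_format="MINDIR")
+
+# `ms.load` reads the MindIR graph and `nn.GraphCell` wraps it as a callable cell.
+graph = ms.load("toy.mindir")
+loaded_net = nn.GraphCell(graph)
+print(loaded_net(input_data).shape)
+```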
diff --git a/tutorials/experts/source_en/model_infer/inference_ascend_310.rst b/tutorials/experts/source_en/model_infer/inference_ascend_310.rst
new file mode 100644
index 0000000000000000000000000000000000000000..d5a000cb36434a2780967100ca634ae05efcb5a6
--- /dev/null
+++ b/tutorials/experts/source_en/model_infer/inference_ascend_310.rst
@@ -0,0 +1,14 @@
+Inference on Ascend 310
+===============================
+
+Ascend 310 is a highly efficient and highly integrated AI processor for edge scenarios. It supports inference on models in the MindIR and AIR formats.
+
+The MindIR format can be exported by MindSpore on CPU, GPU, and Ascend 910, and can run on GPU, Ascend 910, and Ascend 310. No manual model conversion is required before inference. MindSpore must be installed for inference, and the MindSpore C++ API is called to perform it.
+
+The AIR format can only be exported by MindSpore on Ascend 910, and only Ascend 310 can run inference on it. Before inference, the atc tool in Ascend CANN must be used for model conversion. MindSpore is not required for inference; only the Ascend CANN software package is required.
+
+.. toctree::
+  :maxdepth: 1
+
+  inference_ascend_310_mindir
+  inference_ascend_310_air
diff --git a/tutorials/experts/source_en/operation/op_overload.md b/tutorials/experts/source_en/operation/op_overload.md
new file mode 100644
index 0000000000000000000000000000000000000000..bb1818d5a6f57eda78168bcd063a97f3cc5f4e8e
--- /dev/null
+++ b/tutorials/experts/source_en/operation/op_overload.md
@@ -0,0 +1,78 @@
+# Operation Overloading
+
+`Ascend` `GPU` `CPU` `Model Development`
+
+
+
+## Overview
+
+`mindspore.ops.composite` provides some operator combinations related to graph transformation, such as `MultitypeFuncGraph` and `HyperMap`.
+
+## MultitypeFuncGraph
+
+`MultitypeFuncGraph` is used to generate overloaded functions that support different types of input. Users can use `MultitypeFuncGraph` to define a group of overloaded functions whose implementations vary according to the input types. First initialize a `MultitypeFuncGraph` object, then use `register`, with the input types as arguments, as the decorator of the function to be registered, so that the object can be called with different types of inputs. For more instructions, see [MultitypeFuncGraph](https://www.mindspore.cn/docs/api/en/master/api_python/ops/mindspore.ops.MultitypeFuncGraph.html).
+
+A code example is as follows:
+
+```python
+import numpy as np
+from mindspore.ops import MultitypeFuncGraph
+from mindspore import Tensor
+import mindspore.ops as ops
+
+add = MultitypeFuncGraph('add')
+@add.register("Number", "Number")
+def add_scalar(x, y):
+    return ops.scalar_add(x, y)
+
+@add.register("Tensor", "Tensor")
+def add_tensor(x, y):
+    return ops.add(x, y)
+
+tensor1 = Tensor(np.array([[1.2, 2.1], [2.2, 3.2]]).astype('float32'))
+tensor2 = Tensor(np.array([[1.2, 2.1], [2.2, 3.2]]).astype('float32'))
+print('tensor', add(tensor1, tensor2))
+print('scalar', add(1, 2))
+```
+
+The following information is displayed:
+
+```text
+tensor [[2.4 4.2]
+ [4.4 6.4]]
+scalar 3
+```
+
+## HyperMap
+
+`HyperMap` can apply a specified operation to one or more input sequences, and it can be used together with `MultitypeFuncGraph`. For example, after defining a group of overloaded `add` functions, we can apply the `add` operation to several groups of inputs of different types. Unlike `Map`, `HyperMap` can be used on nested structures to perform the specified operation on the input in a sequence or nested sequence. For more instructions, see [HyperMap](https://www.mindspore.cn/docs/api/en/master/api_python/ops/mindspore.ops.HyperMap.html).
+
+A code example is as follows:
+
+```python
+from mindspore import dtype as mstype
+from mindspore import Tensor
+from mindspore.ops import MultitypeFuncGraph, HyperMap
+import mindspore.ops as ops
+
+add = MultitypeFuncGraph('add')
+@add.register("Number", "Number")
+def add_scalar(x, y):
+    return ops.scalar_add(x, y)
+
+@add.register("Tensor", "Tensor")
+def add_tensor(x, y):
+    return ops.tensor_add(x, y)
+
+add_map = HyperMap(add)
+output = add_map((Tensor(1, mstype.float32), Tensor(2, mstype.float32), 1), (Tensor(3, mstype.float32), Tensor(4, mstype.float32), 2))
+print("output =", output)
+```
+
+The following information is displayed:
+
+```text
+output = (Tensor(shape=[], dtype=Float32, value= 4), Tensor(shape=[], dtype=Float32, value= 6), 3)
+```
+
+In this example, the input of `add_map` contains two sequences. `HyperMap` takes the corresponding elements from the two sequences as `x` and `y` for `add`, in the form of `operation(args[0][i], args[1][i])`, for example, `add(Tensor(1, mstype.float32), Tensor(3, mstype.float32))`.
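+
+Because `HyperMap` also descends into nested sequences, the same overloaded `add` can be applied to inputs that are nested tuples. The following sketch extends the example above; the nested input structure is an assumption made purely for illustration and is not part of the original example.
+
+```python
+from mindspore import dtype as mstype
+from mindspore import Tensor
+from mindspore.ops import MultitypeFuncGraph, HyperMap
+import mindspore.ops as ops
+
+add = MultitypeFuncGraph('add')
+@add.register("Number", "Number")
+def add_scalar(x, y):
+    return ops.scalar_add(x, y)
+
+@add.register("Tensor", "Tensor")
+def add_tensor(x, y):
+    return ops.tensor_add(x, y)
+
+# The two inputs share the same nested structure; HyperMap matches them element
+# by element and descends into the inner tuple before dispatching `add`.
+add_map = HyperMap(add)
+nested_output = add_map(
+    ((Tensor(1, mstype.float32), Tensor(2, mstype.float32)), 1),
+    ((Tensor(3, mstype.float32), Tensor(4, mstype.float32)), 2))
+print("nested output =", nested_output)
+```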
diff --git a/tutorials/experts/source_en/parallel/images/sharding_propagation.png b/tutorials/experts/source_en/parallel/images/sharding_propagation.png
new file mode 100644
index 0000000000000000000000000000000000000000..9ee70d8287d986794163a8be4904acb239fcf802
Binary files /dev/null and b/tutorials/experts/source_en/parallel/images/sharding_propagation.png differ
diff --git a/tutorials/experts/source_en/parallel/images/tensor_layout.png b/tutorials/experts/source_en/parallel/images/tensor_layout.png
new file mode 100644
index 0000000000000000000000000000000000000000..b4cc8f92dd1aceb482afc4b8b71fb99bf427ac08
Binary files /dev/null and b/tutorials/experts/source_en/parallel/images/tensor_layout.png differ
diff --git a/tutorials/experts/source_en/parallel/images/tensor_redistribution.png b/tutorials/experts/source_en/parallel/images/tensor_redistribution.png
new file mode 100644
index 0000000000000000000000000000000000000000..2275c38bd31bcadf4fd2b60c743900b3e3464a68
Binary files /dev/null and b/tutorials/experts/source_en/parallel/images/tensor_redistribution.png differ
diff --git a/tutorials/experts/source_zh_cn/data_engine/images/cache_dataset.png b/tutorials/experts/source_zh_cn/data_engine/images/cache_dataset.png
new file mode 100644
index 0000000000000000000000000000000000000000..665ed25a9a721c74c7c12bdfa5650f6ef792bf81
Binary files /dev/null and b/tutorials/experts/source_zh_cn/data_engine/images/cache_dataset.png differ
diff --git a/tutorials/experts/source_zh_cn/data_engine/images/cache_processed_data.png b/tutorials/experts/source_zh_cn/data_engine/images/cache_processed_data.png
new file mode 100644
index 0000000000000000000000000000000000000000..11327cba87a190137070b4823546407614ff3c92
Binary files /dev/null and b/tutorials/experts/source_zh_cn/data_engine/images/cache_processed_data.png differ
diff --git a/tutorials/experts/source_zh_cn/data_engine/images/eager_mode_en.jpeg b/tutorials/experts/source_zh_cn/data_engine/images/eager_mode_en.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..5f8e8c13c122c36c6adb6c5516630f0ac953f682
Binary files /dev/null and b/tutorials/experts/source_zh_cn/data_engine/images/eager_mode_en.jpeg differ
diff --git a/tutorials/experts/source_zh_cn/parallel/optimizer parallel.md b/tutorials/experts/source_zh_cn/parallel/optimizer_parallel.md
similarity index 100%
rename from tutorials/experts/source_zh_cn/parallel/optimizer parallel.md
rename to tutorials/experts/source_zh_cn/parallel/optimizer_parallel.md