diff --git a/tutorials/requirements.txt b/tutorials/requirements.txt index 13396200381cc893c7a1ee023cbc0341eeea9f87..595e6d8df76bf21707fa108959ad7d8e91c20b52 100644 --- a/tutorials/requirements.txt +++ b/tutorials/requirements.txt @@ -1,5 +1,4 @@ -sphinx >= 2.2.1, <= 2.4.4 +sphinx recommonmark sphinx-markdown-tables -sphinx_rtd_theme -jieba +sphinx_rtd_theme \ No newline at end of file diff --git a/tutorials/source_en/compile.md b/tutorials/source_en/compile.md new file mode 100644 index 0000000000000000000000000000000000000000..9f7ae53c96ac66900cc3a590cd2cfb224f2ad3f7 --- /dev/null +++ b/tutorials/source_en/compile.md @@ -0,0 +1,129 @@ +# Compile + + + +- [compilation](#compilation) + - [Environment Requirements](#environment-requirements) + - [Compilation Options](#compilation-options) + - [Output Description](#output-description) + - [Compilation Example](#compilation-example) + + + + + +This document describes how to quickly install MindSpore Lite on the Ubuntu system. + +## Environment Requirements + +- The compilation environment supports Linux x86_64 only. Ubuntu 18.04.02 LTS is recommended. + +- Compilation dependencies (basics): + - [CMake](https://cmake.org/download/) >= 3.14.1 + - [GCC](https://gcc.gnu.org/releases.html) >= 7.3.0 + - [Android_NDK r20b](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip) + + > - `Android_NDK` needs to be installed only when the Arm version is compiled. Skip this dependency when the x86_64 version is compiled. + > - To install and use `Android_NDK`, you need to configure environment variables. The command example is `export ANDROID_NDK={$NDK_PATH}/android-ndk-r20b`. + +- Compilation dependencies (additional dependencies required by the MindSpore Lite model conversion tool, which is required only for compilation of the x86_64 version) + - [Autoconf](http://ftp.gnu.org/gnu/autoconf/) >= 2.69 + - [Libtool](https://www.gnu.org/software/libtool/) >= 2.4.6 + - [LibreSSL](http://www.libressl.org/) >= 3.1.3 + - [Automake](https://www.gnu.org/software/automake/) >= 1.11.6 + - [Libevent](https://libevent.org) >= 2.0 + - [M4](https://www.gnu.org/software/m4/m4.html) >= 1.4.18 + - [OpenSSL](https://www.openssl.org/) >= 1.1.1 + + +## Compilation Options + +MindSpore Lite provides multiple compilation options. You can select different compilation options as required. + +| Parameter | Parameter Description | Value Range | Mandatory or Not | +| -------- | ----- | ---- | ---- | +| -d | If this parameter is set, the debug version is compiled. Otherwise, the release version is compiled. | - | No | +| -i | If this parameter is set, incremental compilation is performed. Otherwise, full compilation is performed. | - | No | +| -j[n] | Sets the number of threads used during compilation. Otherwise, the number of threads is set to 8 by default. | - | No | +| -I | Selects an applicable architecture. | arm64, arm32, or x86_64 | Yes | +| -e | In the Arm architecture, select the backend operator and set the `gpu` parameter. The built-in GPU operator of the framework is compiled at the same time. | GPU | No | +| -h | Displays the compilation help information. | - | No | + +> When the `-I` parameter changes, that is, the applicable architecture is changed, the `-i` parameter cannot be used for incremental compilation. + +## Output Description + +After the compilation is complete, go to the `mindspore/output` directory of the source code to view the file generated after compilation. The file is named `mindspore-lite-{version}-{function}-{OS}.tar.gz`. 
After decompression, the tool package named `mindspore-lite-{version}-{function}-{OS}` can be obtained.
+
+> version: version of the output, consistent with the version of MindSpore.
+>
+> function: function of the output. `converter` indicates the output of the conversion tool and `runtime` indicates the output of the inference framework.
+>
+> OS: OS on which the output will be deployed.
+
+```bash
+tar -xvf mindspore-lite-{version}-{function}-{OS}.tar.gz
+```
+
+For the x86 architecture, you can obtain the output of both the conversion tool and the inference framework; for the Arm architecture, you can obtain only the output of the inference framework.
+
+Generally, the compiled output files include the following types. The architecture selection affects the types of output files.
+
+> For the Arm 64-bit architecture, you can obtain the output of the `arm64-cpu` inference framework. If `-e gpu` is added, you can obtain the output of the `arm64-gpu` inference framework. The compilation for Arm 64-bit is the same as that for Arm 32-bit.
+
+| Directory | Description | converter | runtime |
+| --- | --- | --- | --- |
+| include | Inference framework header file | No | Yes |
+| lib | Inference framework dynamic library | No | Yes |
+| benchmark | Benchmark test tool | No | Yes |
+| time_profiler | Time consumption analysis tool at the model network layer | No | Yes |
+| converter | Model conversion tool | Yes | No |
+| third_party | Header file and library of the third-party library | Yes | Yes |
+
+Take the 0.7.0-beta version and CPU as an example. The contents of `third_party` and `lib` vary depending on the architecture as follows:
+- `mindspore-lite-0.7.0-converter-ubuntu`: `third_party` includes `protobuf` (the Protobuf dynamic library).
+- `mindspore-lite-0.7.0-runtime-x86-cpu`: `third_party` includes `flatbuffers` (the FlatBuffers header file), and `lib` includes `libmindspore-lite.so` (the dynamic library of the MindSpore Lite inference framework).
+- `mindspore-lite-0.7.0-runtime-arm64-cpu`: `third_party` includes `flatbuffers` (the FlatBuffers header file), and `lib` includes `libmindspore-lite.so` (the dynamic library of the MindSpore Lite inference framework) and `liboptimize.so` (the dynamic library of MindSpore Lite advanced operators).
+
+> `liboptimize.so` exists only in the runtime-arm64 output and can be used only on CPUs that support ARMv8.2 and FP16.
+
+> Before running the tools in the `converter`, `benchmark`, or `time_profiler` directory, you need to configure environment variables and add the paths of the MindSpore Lite and Protobuf dynamic libraries to the system dynamic library path. The following uses the 0.7.0-beta version as an example: `export LD_LIBRARY_PATH=./mindspore-lite-0.7.0/lib:./mindspore-lite-0.7.0/third_party/protobuf/lib:${LD_LIBRARY_PATH}`.
+
+## Compilation Example
+
+First, download the source code from the MindSpore code repository.
+ +```bash +git clone https://gitee.com/mindspore/mindspore.git +``` + +Then, run the following commands in the root directory of the source code to compile MindSpore Lite of different versions: + +- Debug version of the x86_64 architecture: + ```bash + bash build.sh -I x86_64 -d + ``` + +- Release version of the x86_64 architecture, with the number of threads set: + ```bash + bash build.sh -I x86_64 -j32 + ``` + +- Release version of the Arm 64-bit architecture in incremental compilation mode, with the number of threads set: + ```bash + bash build.sh -I arm64 -i -j32 + ``` + +- Release version of the Arm 64-bit architecture in incremental compilation mode, with the built-in GPU operator compiled: + ```bash + bash build.sh -I arm64 -e gpu + ``` + +> - In the `build.sh` script, run the `git clone` command to obtain the code in the third-party dependency library. Ensure that the network settings of Git are correct. + +Take the 0.7.0-beta version as an example. After the release version of the x86_64 architecture is compiled, go to the `mindspore/output` directory and run the following decompression command to obtain the output files `include`, `lib`, `benchmark`, `time_profiler`, `converter`, and `third_party`: + +```bash +tar -xvf mindspore-lite-0.7.0-converter-ubuntu.tar.gz +tar -xvf mindspore-lite-0.7.0-runtime-x86-cpu.tar.gz +``` diff --git a/tutorials/source_en/conf.py b/tutorials/source_en/conf.py index 0a00ad8da18607c9f0ac88017972211d04c763c0..a341838e9f6e67d53b005bd741fcc22fff7dec32 100644 --- a/tutorials/source_en/conf.py +++ b/tutorials/source_en/conf.py @@ -16,9 +16,9 @@ import sys # -- Project information ----------------------------------------------------- -project = 'MindSpore' -copyright = '2020, MindSpore' -author = 'MindSpore' +project = 'MindSpore Lite' +copyright = '2020, MindSpore Lite' +author = 'MindSpore Lite' # The full version, including alpha/beta/rc tags release = 'master' diff --git a/tutorials/source_en/images/lite_quick_start_app_result.jpg b/tutorials/source_en/images/lite_quick_start_app_result.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9287aad111992c39145c70f6a473818e31402bc7 Binary files /dev/null and b/tutorials/source_en/images/lite_quick_start_app_result.jpg differ diff --git a/tutorials/source_en/images/lite_quick_start_home.png b/tutorials/source_en/images/lite_quick_start_home.png new file mode 100644 index 0000000000000000000000000000000000000000..c48cf581b33afbc15dbf27be495215b999e1be60 Binary files /dev/null and b/tutorials/source_en/images/lite_quick_start_home.png differ diff --git a/tutorials/source_en/images/lite_quick_start_project_structure.png b/tutorials/source_en/images/lite_quick_start_project_structure.png new file mode 100644 index 0000000000000000000000000000000000000000..ade37a61ef97a479401240215e302011c014824c Binary files /dev/null and b/tutorials/source_en/images/lite_quick_start_project_structure.png differ diff --git a/tutorials/source_en/images/lite_quick_start_run_app.PNG b/tutorials/source_en/images/lite_quick_start_run_app.PNG new file mode 100644 index 0000000000000000000000000000000000000000..2557b6293de5b3d7fefe7f6e58b57c03deabb55d Binary files /dev/null and b/tutorials/source_en/images/lite_quick_start_run_app.PNG differ diff --git a/tutorials/source_en/images/lite_quick_start_sdk.png b/tutorials/source_en/images/lite_quick_start_sdk.png new file mode 100644 index 0000000000000000000000000000000000000000..1fcb8acabc9ba9d289efbe7e82ee5e2da8bfe073 Binary files /dev/null and 
b/tutorials/source_en/images/lite_quick_start_sdk.png differ diff --git a/tutorials/source_en/index.rst b/tutorials/source_en/index.rst index 66f092f0928d71d777150fe0ebf7125f72963689..569f33def4647002337f602cf29a5341f136a05f 100644 --- a/tutorials/source_en/index.rst +++ b/tutorials/source_en/index.rst @@ -1,75 +1,25 @@ .. MindSpore documentation master file, created by - sphinx-quickstart on Thu Mar 24 09:00:00 2020. + sphinx-quickstart on Thu Aug 17 09:00:00 2020. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. -MindSpore Tutorials -=================== +MindSpore Lite Tutorials +======================== .. toctree:: :glob: :maxdepth: 1 :caption: Quick Start + compile quick_start/quick_start - quick_start/quick_video .. toctree:: :glob: :maxdepth: 1 :caption: Use - use/data_preparation/data_preparation - use/defining_the_network - use/saving_and_loading_model_parameters - use/multi_platform_inference - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: Application - - advanced_use/computer_vision_application - advanced_use/nlp_application - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: Model Optimization - - advanced_use/debugging_in_pynative_mode - advanced_use/customized_debugging_information - advanced_use/visualization_tutorials - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: Performance Optimization - - advanced_use/distributed_training_tutorials - advanced_use/mixed_precision - advanced_use/graph_kernel_fusion - advanced_use/quantization_aware - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: Usage on Device - - advanced_use/on_device_inference - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: Network Migration - - advanced_use/network_migration - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: AI Security and Privacy - - advanced_use/model_security - advanced_use/differential_privacy - + use/converter_tool + use/runtime + use/benchmark_tool + use/timeprofiler_tool diff --git a/tutorials/source_en/quick_start/quick_start.md b/tutorials/source_en/quick_start/quick_start.md index 4e37ef22432b2afc587d9ea73ff92c9b41d08e34..349d7a4ddeae35ca90d1dc172503cd4e136693d9 100644 --- a/tutorials/source_en/quick_start/quick_start.md +++ b/tutorials/source_en/quick_start/quick_start.md @@ -1,452 +1,337 @@ -# Implementing an Image Classification Application - -`Ascend` `GPU` `CPU` `Whole Process` `Beginner` `Intermediate` `Expert` +# Quick Start - -- [Implementing an Image Classification Application](#implementing-an-image-classification-application) +- [Quick Start ](#quick-start) - [Overview](#overview) - - [Preparations](#preparations) - - [Downloading the Dataset](#downloading-the-dataset) - - [Importing Python Libraries and Modules](#importing-python-libraries-and-modules) - - [Configuring the Running Information](#configuring-the-running-information) - - [Processing Data](#processing-data) - - [Defining the Dataset and Data Operations](#defining-the-dataset-and-data-operations) - - [Defining the Network](#defining-the-network) - - [Defining the Loss Function and Optimizer](#defining-the-loss-function-and-optimizer) - - [Basic Concepts](#basic-concepts) - - [Defining the Loss Function](#defining-the-loss-function) - - [Defining the Optimizer](#defining-the-optimizer) - - [Training the Network](#training-the-network) - - [Saving the Configured Model](#saving-the-configured-model) - - [Configuring the Network Training](#configuring-the-network-training) - - [Running and Viewing the 
Result](#running-and-viewing-the-result) - - [Validating the Model](#validating-the-model) + - [Selecting a Model](#selecting-a-model) + - [Converting a Model](#converting-a-model) + - [Deploying an Application](#deploying-an-application) + - [Running Dependencies](#running-dependencies) + - [Building and Running](#building-and-running) + - [Detailed Description of the Sample Program](#detailed-description-of-the-sample-program) + - [Sample Program Structure](#sample-program-structure) + - [Configuring MindSpore Lite Dependencies](#configuring-mindspore-lite-dependencies) + - [Downloading and Deploying a Model File](#downloading-and-deploying-a-model-file) + - [Compiling On-Device Inference Code](#compiling-on-device-inference-code) - + ## Overview -This document uses a practice example to demonstrate the basic functions of MindSpore. For common users, it takes 20 to 30 minutes to complete the practice. - -During the practice, a simple image classification function is implemented. The overall process is as follows: -1. Process the required dataset. The MNIST dataset is used in this example. -2. Define a network. The LeNet network is used in this example. -3. Define the loss function and optimizer. -4. Load dataset, perform training. After the training is complete, check the result and save the model file. -5. Load the saved model for inference. -6. Validate the model, load the test dataset and trained model, and validate the result accuracy. - -> You can find the complete executable sample code at . - -This is a simple and basic application process. For other advanced and complex applications, extend this basic process as needed. - -## Preparations - -Before you start, check whether MindSpore has been correctly installed. If no, install MindSpore on your computer by visiting [MindSpore installation page](https://www.mindspore.cn/install). - -In addition, you shall have basic mathematical knowledge such as Python coding basics, probability, and matrix. +It is recommended that you start from the image classification demo on the Android device to understand how to build the MindSpore Lite application project, configure dependencies, and use related APIs. + +This tutorial demonstrates the on-device deployment process based on the image classification sample program on the Android device provided by the MindSpore team. -Start your MindSpore experience now. +1. Select an image classification model. +2. Convert the model into a MindSpore Lite model. +3. Use the MindSpore Lite inference model on the device. The following describes how to use the MindSpore Lite C++ APIs (Android JNIs) and MindSpore Lite image classification models to perform on-device inference, classify the content captured by a device camera, and display the most possible classification result on the application's image preview screen. + +> Click to find [Android image classification models](https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite) and [sample code](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/lite/image_classification). -### Downloading the Dataset +## Selecting a Model -The `MNIST` dataset used in this example consists of 10 classes of 28 x 28 pixels grayscale images. It has a training set of 60,000 examples, and a test set of 10,000 examples. +The MindSpore team provides a series of preset device models that you can use in your application. 
+Click [here](https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/mobilenetv2.ms) to download image classification models in MindSpore ModelZoo. +In addition, you can use the preset model to perform migration learning to implement your image classification tasks. -> Download the MNIST dataset at . This page provides four download links of dataset files. The first two links are required for data training, and the last two links are required for data test. +## Converting a Model -Download the files, decompress them, and store them in the workspace directories `./MNIST_Data/train` and `./MNIST_Data/test`. +After you retrain a model provided by MindSpore, export the model in the [.mindir format](https://www.mindspore.cn/tutorial/en/master/use/saving_and_loading_model_parameters.html#mindir). Use the MindSpore Lite [model conversion tool](https://www.mindspore.cn/lite/tutorial/en/master/use/converter_tool.html) to convert the .mindir model to a .ms model. -The directory structure is as follows: - -``` -└─MNIST_Data - ├─test - │ t10k-images.idx3-ubyte - │ t10k-labels.idx1-ubyte - │ - └─train - train-images.idx3-ubyte - train-labels.idx1-ubyte -``` -> For ease of use, we added the function of automatically downloading datasets in the sample script. - -### Importing Python Libraries and Modules - -Before start, you need to import Python libraries. - -Currently, the `os` libraries are required. For ease of understanding, other required libraries will not be described here. - - -```python -import os +Take the mobilenetv2 model as an example. Execute the following script to convert a model into a MindSpore Lite model for on-device inference. +```bash +./converter_lite --fmk=MS --modelFile=mobilenetv2.mindir --outputFile=mobilenetv2.ms ``` -For details about MindSpore modules, search on the [MindSpore API Page](https://www.mindspore.cn/api/en/master/index.html). +## Deploying an Application -### Configuring the Running Information +The following section describes how to build and execute an on-device image classification task on MindSpore Lite. -Before compiling code, you need to learn basic information about the hardware and backend required for MindSpore running. +### Running Dependencies -You can use `context.set_context` to configure the information required for running, such as the running mode, backend information, and hardware information. +- Android Studio 3.2 or later (Android 4.0 or later is recommended.) +- Native development kit (NDK) 21.3 +- CMake 3.10.2 +- Android software development kit (SDK) 26 or later +- OpenCV 4.0.0 or later (included in the sample code) -Import the `context` module and configure the required information. +### Building and Running -```python -import argparse -from mindspore import context +1. Load the sample source code to Android Studio and install the corresponding SDK. (After the SDK version is specified, Android Studio automatically installs the SDK.) -if __name__ == "__main__": - parser = argparse.ArgumentParser(description='MindSpore LeNet Example') - parser.add_argument('--device_target', type=str, default="CPU", choices=['Ascend', 'GPU', 'CPU'], - help='device where the code will be implemented (default: CPU)') - args = parser.parse_args() - context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target) - dataset_sink_mode = not args.device_target == "CPU" - ... -``` - -This example runs in graph mode. You can configure hardware information based on site requirements. 
For example, if the code runs on the Ascend AI processor, set `--device_target` to `Ascend`. This rule also applies to the code running on the CPU and GPU. For details about parameters, see the API description for `context.set_context`. - -## Processing Data - -Datasets are important for training. A good dataset can effectively improve training accuracy and efficiency. Generally, before loading a dataset, you need to perform some operations on the dataset. - -### Defining the Dataset and Data Operations - -Define the `create_dataset` function to create a dataset. In this function, define the data augmentation and processing operations to be performed. - -1. Define the dataset. -2. Define parameters required for data augmentation and processing. -3. Generate corresponding data augmentation operations according to the parameters. -4. Use the `map` mapping function to apply data operations to the dataset. -5. Process the generated dataset. - -```python -import mindspore.dataset as ds -import mindspore.dataset.transforms.c_transforms as C -import mindspore.dataset.transforms.vision.c_transforms as CV -from mindspore.dataset.transforms.vision import Inter -from mindspore.common import dtype as mstype - -def create_dataset(data_path, batch_size=32, repeat_size=1, - num_parallel_workers=1): - """ create dataset for train or test - Args: - data_path: Data path - batch_size: The number of data records in each group - repeat_size: The number of replicated data records - num_parallel_workers: The number of parallel workers - """ - # define dataset - mnist_ds = ds.MnistDataset(data_path) - - # define operation parameters - resize_height, resize_width = 32, 32 - rescale = 1.0 / 255.0 - shift = 0.0 - rescale_nml = 1 / 0.3081 - shift_nml = -1 * 0.1307 / 0.3081 - - # define map operations - resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR) # resize images to (32, 32) - rescale_nml_op = CV.Rescale(rescale_nml, shift_nml) # normalize images - rescale_op = CV.Rescale(rescale, shift) # rescale images - hwc2chw_op = CV.HWC2CHW() # change shape from (height, width, channel) to (channel, height, width) to fit network. - type_cast_op = C.TypeCast(mstype.int32) # change data type of label to int32 to fit network - - # apply map operations on images - mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=resize_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers) + ![start_home](../images/lite_quick_start_home.png) - # apply DatasetOps - buffer_size = 10000 - mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size) # 10000 as in LeNet train script - mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True) - mnist_ds = mnist_ds.repeat(repeat_size) + Start Android Studio, click `File > Settings > System Settings > Android SDK`, and select the corresponding SDK. As shown in the following figure, select an SDK and click `OK`. Android Studio automatically installs the SDK. - return mnist_ds + ![start_sdk](../images/lite_quick_start_sdk.png) -``` - -In the preceding information: -`batch_size`: number of data records in each group. 
Currently, each group contains 32 data records. -`repeat_size`: number of replicated data records. + (Optional) If an NDK version issue occurs during the installation, manually download the corresponding [NDK version](https://developer.android.com/ndk/downloads) (the version used in the sample code is 21.3). Specify the SDK location in `Android NDK location` of `Project Structure`. -Perform the shuffle and batch operations, and then perform the repeat operation to ensure that data during an epoch is unique. + ![project_structure](../images/lite_quick_start_project_structure.png) -> MindSpore supports multiple data processing and augmentation operations, which are usually combined. For details, see section "Data Processing and Augmentation" in the MindSpore Tutorials (https://www.mindspore.cn/tutorial/en/master/use/data_preparation/data_processing_and_augmentation.html). +2. Connect to an Android device and runs the image classification application. + Connect to the Android device through a USB cable for debugging. Click `Run 'app'` to run the sample project on your device. -## Defining the Network + ![run_app](../images/lite_quick_start_run_app.PNG) -The LeNet network is relatively simple. In addition to the input layer, the LeNet network has seven layers, including two convolutional layers, two down-sample layers (pooling layers), and three full connection layers. Each layer contains different numbers of training parameters, as shown in the following figure: + For details about how to connect the Android Studio to a device for debugging, see . -![LeNet-5](./images/LeNet_5.jpg) - -> For details about the LeNet network, visit . +3. Continue the installation on the Android device. After the installation is complete, you can view the content captured by a camera and the inference result. -You need to initialize the full connection layers and convolutional layers. + ![result](../images/lite_quick_start_app_result.jpg) -`TruncatedNormal`: parameter initialization method. MindSpore supports multiple parameter initialization methods, such as `TruncatedNormal`, `Normal`, and `Uniform`. For details, see the description of the `mindspore.common.initializer` module of the MindSpore API. -The following is the sample code for initialization: +## Detailed Description of the Sample Program -```python -import mindspore.nn as nn -from mindspore.common.initializer import TruncatedNormal +This image classification sample program on the Android device includes a Java layer and a JNI layer. At the Java layer, the Android Camera 2 API is used to enable a camera to obtain image frames and process images. At the JNI layer, the model inference process is completed in [Runtime](https://www.mindspore.cn/lite/tutorial/en/master/use/runtime.html). -def weight_variable(): - """ - weight initial - """ - return TruncatedNormal(0.02) +> This following describes the JNI layer implementation of the sample program. At the Java layer, the Android Camera 2 API is used to enable a device camera and process image frames. Readers are expected to have the basic Android development knowledge. 
-def conv(in_channels, out_channels, kernel_size, stride=1, padding=0): - """ - conv layer weight initial - """ - weight = weight_variable() - return nn.Conv2d(in_channels, out_channels, - kernel_size=kernel_size, stride=stride, padding=padding, - weight_init=weight, has_bias=False, pad_mode="valid") +### Sample Program Structure -def fc_with_initialize(input_channels, out_channels): - """ - fc layer weight initial - """ - weight = weight_variable() - bias = weight_variable() - return nn.Dense(input_channels, out_channels, weight, bias) ``` - -To use MindSpore for neural network definition, inherit `mindspore.nn.cell.Cell`. `Cell` is the base class of all neural networks (such as `Conv2d`). - -Define each layer of a neural network in the `__init__` method in advance, and then define the `construct` method to complete the forward construction of the neural network. According to the structure of the LeNet network, define the network layers as follows: - -```python -import mindspore.ops.operations as P - -class LeNet5(nn.Cell): - """ - Lenet network structure - """ - #define the operator required - def __init__(self): - super(LeNet5, self).__init__() - self.conv1 = conv(1, 6, 5) - self.conv2 = conv(6, 16, 5) - self.fc1 = fc_with_initialize(16 * 5 * 5, 120) - self.fc2 = fc_with_initialize(120, 84) - self.fc3 = fc_with_initialize(84, 10) - self.relu = nn.ReLU() - self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) - self.reshape = P.Reshape() - - #use the preceding operators to construct networks - def construct(self, x): - x = self.conv1(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.conv2(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.reshape(x, (self.batch_size, -1)) - x = self.fc1(x) - x = self.relu(x) - x = self.fc2(x) - x = self.relu(x) - x = self.fc3(x) - return x +app +| +├── libs # library files that store MindSpore Lite dependencies +│ └── arm64-v8a +│ ├── libopencv_java4.so +│ └── libmindspore-lite.so +│ +├── opencv # dependency files related to OpenCV +│ └── ... +| +├── src/main +│ ├── assets # resource files +| | └── model.ms # model file +│ | +│ ├── cpp # main logic encapsulation classes for model loading and prediction +| | ├── include # header files related to MindSpore calling +| | | └── ... +│ | | +| | ├── MindSporeNetnative.cpp # JNI methods related to MindSpore calling +│ | └── MindSporeNetnative.h # header file +│ | +│ ├── java # application code at the Java layer +│ │ └── com.huawei.himindsporedemo +│ │ ├── gallery.classify # implementation related to image processing and MindSpore JNI calling +│ │ │ └── ... +│ │ └── obejctdetect # implementation related to camera enabling and drawing +│ │ └── ... +│ │ +│ ├── res # resource files related to Android +│ └── AndroidManifest.xml # Android configuration file +│ +├── CMakeList.txt # CMake compilation entry file +│ +├── build.gradle # Other Android configuration file +└── ... ``` -## Defining the Loss Function and Optimizer - -### Basic Concepts +### Configuring MindSpore Lite Dependencies -Before definition, this section briefly describes concepts of loss function and optimizer. +When MindSpore C++ APIs are called at the Android JNI layer, related library files are required. You can use MindSpore Lite [source code compilation](https://www.mindspore.cn/lite/tutorial/en/master/compile.html) to generate the `libmindspore-lite.so` library file. -- Loss function: It is also called objective function and is used to measure the difference between a predicted value and an actual value. 
Deep learning reduces the value of the loss function by continuous iteration. Defining a good loss function can effectively improve the model performance. -- Optimizer: It is used to minimize the loss function, improving the model during training. +In Android Studio, place the compiled `libmindspore-lite.so` library file (which can contain multiple compatible architectures) in the `app/libs/ARM64-V8a` (Arm64) or `app/libs/armeabi-v7a` (Arm32) directory of the application project. In the `build.gradle` file of the application, configure the compilation support of CMake, `arm64-v8a`, and `armeabi-v7a`.   -After the loss function is defined, the weight-related gradient of the loss function can be obtained. The gradient is used to indicate the weight optimization direction for the optimizer, improving model performance. - -### Defining the Loss Function - -Loss functions supported by MindSpore include `SoftmaxCrossEntropyWithLogits`, `L1Loss`, `MSELoss`. The loss function `SoftmaxCrossEntropyWithLogits` is used in this example. - -```python -from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits ``` - -Call the defined loss function in the `__main__` function. - -```python -if __name__ == "__main__": - ... - #define the loss function - net_loss = SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction='mean') - ... +android{ + defaultConfig{ + externalNativeBuild{ + cmake{ + arguments "-DANDROID_STL=c++_shared" + } + } + + ndk{ + abiFilters'armeabi-v7a', 'arm64-v8a' + } + } +} ``` -### Defining the Optimizer +Create a link to the `.so` library file in the `app/CMakeLists.txt` file: -Optimizers supported by MindSpore include `Adam`, `AdamWeightDecay` and `Momentum`. - -The popular Momentum optimizer is used in this example. - -```python -if __name__ == "__main__": - ... - #learning rate setting - lr = 0.01 - momentum = 0.9 - #create the network - network = LeNet5() - #define the optimizer - net_opt = nn.Momentum(network.trainable_params(), lr, momentum) - ... ``` - -## Training the Network - -### Saving the Configured Model - -MindSpore provides the callback mechanism to execute customized logic during training. `ModelCheckpoint` provided by the framework is used in this example. -`ModelCheckpoint` can save network models and parameters for subsequent fine-tuning. - -```python -from mindspore.train.callback import ModelCheckpoint, CheckpointConfig - -if __name__ == "__main__": +# Set MindSpore Lite Dependencies. +include_directories(${CMAKE_SOURCE_DIR}/src/main/cpp/include/MindSpore) +add_library(mindspore-lite SHARED IMPORTED ) +set_target_properties(mindspore-lite PROPERTIES + IMPORTED_LOCATION "${CMAKE_SOURCE_DIR}/libs/libmindspore-lite.so") + +# Set OpenCV Dependecies. +include_directories(${CMAKE_SOURCE_DIR}/opencv/sdk/native/jni/include) +add_library(lib-opencv SHARED IMPORTED ) +set_target_properties(lib-opencv PROPERTIES + IMPORTED_LOCATION "${CMAKE_SOURCE_DIR}/libs/libopencv_java4.so") + +# Link target library. +target_link_libraries( ... - # set parameters of check point - config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10) - # apply parameters of check point - ckpoint_cb = ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck) + mindspore-lite + lib-opencv ... +) ``` -### Configuring the Network Training - -Use the `model.train` API provided by MindSpore to easily train the network. `LossMonitor` can monitor the changes of the `loss` value during training. 
-In this example, set `epoch_size` to 1 to train the dataset for five iterations. - -```python -from mindspore.nn.metrics import Accuracy -from mindspore.train.callback import LossMonitor -from mindspore.train import Model - -... -def train_net(args, model, epoch_size, mnist_path, repeat_size, ckpoint_cb, sink_mode): - """define the training method""" - print("============== Starting Training ==============") - #load training dataset - ds_train = create_dataset(os.path.join(mnist_path, "train"), 32, repeat_size) - model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor()], dataset_sink_mode=sink_mode) # train -... - -if __name__ == "__main__": - ... - - epoch_size = 1 - mnist_path = "./MNIST_Data" - repeat_size = 1 - model = Model(network, net_loss, net_opt, metrics={"Accuracy": Accuracy()}) - train_net(args, model, epoch_size, mnist_path, repeat_size, ckpoint_cb, dataset_sink_mode) - ... -``` -In the preceding information: -In the `train_net` method, we loaded the training dataset, `MNIST path` is MNIST dataset path. -## Running and Viewing the Result -Run the script using the following command: -``` -python lenet.py --device_target=CPU -``` -In the preceding information: -`Lenet. Py`: the script file you wrote. -`--device_target CPU`: Specify the hardware platform.The parameters are 'CPU', 'GPU' or 'Ascend'. +In this example, the download.gradle File configuration auto download ` libmindspot-lite.so `and `libopencv_ Java4.so` library file, placed in the 'app / libs / arm64-v8a' directory. -Loss values are output during training, as shown in the following figure. Although loss values may fluctuate, they gradually decrease and the accuracy gradually increases in general. Loss values displayed each time may be different because of their randomicity. +Note: if the automatic download fails, please manually download the relevant library files and put them in the corresponding location. -The following is an example of loss values output during training: +libmindspore-lite.so [libmindspore-lite.so]( https://download.mindspore.cn/model_zoo/official/lite/lib/mindspore%20version%200.7/libmindspore-lite.so) -```bash -... -epoch: 1 step: 262, loss is 1.9212162 -epoch: 1 step: 263, loss is 1.8498616 -epoch: 1 step: 264, loss is 1.7990671 -epoch: 1 step: 265, loss is 1.9492403 -epoch: 1 step: 266, loss is 2.0305142 -epoch: 1 step: 267, loss is 2.0657792 -epoch: 1 step: 268, loss is 1.9582214 -epoch: 1 step: 269, loss is 0.9459006 -epoch: 1 step: 270, loss is 0.8167224 -epoch: 1 step: 271, loss is 0.7432692 -... -``` +libmindspore-lite include [libmindspore-lite include]( https://download.mindspore.cn/model_zoo/official/lite/lib/mindspore%20version%200.7/include.zip) -The following is an example of model files saved after training: +libopencv_java4.so [libopencv_java4.so](https://download.mindspore.cn/model_zoo/official/lite/lib/opencv%204.4.0/libopencv_java4.so) -```bash -checkpoint_lenet-1_1875.ckpt -``` +libopencv include [libopencv include]( https://download.mindspore.cn/model_zoo/official/lite/lib/opencv%204.4.0/include.zip) -In the preceding information: -`checkpoint_lenet-1_1875.ckpt`: saved model parameter file. The following refers to saved files as well. The file name format is checkpoint_*network name*-*epoch No.*_*step No.*.ckpt. -## Validating the Model -After get the model file, we verify the generalization ability of the model. 
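+If the automatic download configured in `download.gradle` fails, the library files can also be fetched and placed manually. The following is a minimal sketch, assuming the commands are run from the root of the application project and that the Arm64 target directory is `app/libs/arm64-v8a` as described above; adjust the paths if your project layout differs.
+
+```bash
+# Download the MindSpore Lite and OpenCV runtime libraries (same URLs as listed above).
+wget https://download.mindspore.cn/model_zoo/official/lite/lib/mindspore%20version%200.7/libmindspore-lite.so
+wget https://download.mindspore.cn/model_zoo/official/lite/lib/opencv%204.4.0/libopencv_java4.so
+
+# Place the shared libraries where the Arm64 build expects them.
+mkdir -p app/libs/arm64-v8a
+mv libmindspore-lite.so libopencv_java4.so app/libs/arm64-v8a/
+```
+
+The header archives (`include.zip` for MindSpore Lite and OpenCV) are assumed to unpack into the include directories referenced by `app/CMakeLists.txt`.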
+### Downloading and Deploying a Model File +In this example, the download.gradle File configuration auto download `mobilenetv2.ms `and placed in the 'app / libs / arm64-v8a' directory. -```python -from mindspore.train.serialization import load_checkpoint, load_param_into_net +Note: if the automatic download fails, please manually download the relevant library files and put them in the corresponding location. -def test_net(args,network,model,mnist_path): - """define the evaluation method""" - print("============== Starting Testing ==============") - #load the saved model for evaluation - param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt") - #load parameter to the network - load_param_into_net(network, param_dict) - #load testing dataset - ds_eval = create_dataset(os.path.join(mnist_path, "test")) # test - acc = model.eval(ds_eval, dataset_sink_mode=False) - print("============== Accuracy:{} ==============".format(acc)) +mobilenetv2.ms [mobilenetv2.ms]( https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/mobilenetv2.ms) -if __name__ == "__main__": - ... - test_net(args, network, model, mnist_path) -``` -In the preceding information: -`load_checkpoint`: This API is used to load the CheckPoint model parameter file and return a parameter dictionary. -`checkpoint_lenet-3_1404.ckpt`: name of the saved CheckPoint model file. -`load_param_into_net`: This API is used to load parameters to the network. +### Compiling On-Device Inference Code -Run the script using the following command: -``` -python lenet.py --device_target=CPU -``` -In the preceding information: -`Lenet. Py`: the script file you wrote. -`--device_target CPU`: Specify the hardware platform.The parameters are 'CPU', 'GPU' or 'Ascend'. +Call MindSpore Lite C++ APIs at the JNI layer to implement on-device inference. -Command output similar to the following is displayed: +The inference code process is as follows. For details about the complete code, see `src/cpp/MindSporeNetnative.cpp`. -``` -============== Starting Testing ============== -============== Accuracy:{'Accuracy': 0.9742588141025641} ============== -``` +1. Load the MindSpore Lite model file and build the context, session, and computational graph for inference. -The model accuracy data is displayed in the output content. In the example, the accuracy reaches 97.4%, indicating a good model quality. + - Load a model file. Create and configure the context for model inference. + ```cpp + // Buffer is the model data passed in by the Java layer + jlong bufferLen = env->GetDirectBufferCapacity(buffer); + char *modelBuffer = CreateLocalModelBuffer(env, buffer); + ``` + + - Create a session. + ```cpp + void **labelEnv = new void *; + MSNetWork *labelNet = new MSNetWork; + *labelEnv = labelNet; + + // Create context. + lite::Context *context = new lite::Context; + + context->device_ctx_.type = lite::DT_CPU; + context->thread_num_ = numThread; //Specify the number of threads to run inference + + // Create the mindspore session. + labelNet->CreateSessionMS(modelBuffer, bufferLen, "device label", context); + delete(context); + + ``` + + - Load the model file and build a computational graph for inference. 
+ ```cpp + void MSNetWork::CreateSessionMS(char* modelBuffer, size_t bufferLen, std::string name, mindspore::lite::Context* ctx) + { + CreateSession(modelBuffer, bufferLen, ctx); + session = mindspore::session::LiteSession::CreateSession(ctx); + auto model = mindspore::lite::Model::Import(modelBuffer, bufferLen); + int ret = session->CompileGraph(model); + } + ``` + +2. Convert the input image into the Tensor format of the MindSpore model. + + Convert the image data to be detected into the Tensor format of the MindSpore model. + + ```cpp + // Convert the Bitmap image passed in from the JAVA layer to Mat for OpenCV processing + BitmapToMat(env, srcBitmap, matImageSrc); + // Processing such as zooming the picture size. + matImgPreprocessed = PreProcessImageData(matImageSrc); + + ImgDims inputDims; + inputDims.channel = matImgPreprocessed.channels(); + inputDims.width = matImgPreprocessed.cols; + inputDims.height = matImgPreprocessed.rows; + float *dataHWC = new float[inputDims.channel * inputDims.width * inputDims.height] + + // Copy the image data to be detected to the dataHWC array. + // The dataHWC[image_size] array here is the intermediate variable of the input MindSpore model tensor. + float *ptrTmp = reinterpret_cast(matImgPreprocessed.data); + for(int i = 0; i < inputDims.channel * inputDims.width * inputDims.height; i++){ + dataHWC[i] = ptrTmp[i]; + } + + // Assign dataHWC[image_size] to the input tensor variable. + auto msInputs = mSession->GetInputs(); + auto inTensor = msInputs.front(); + memcpy(inTensor->MutableData(), dataHWC, + inputDims.channel * inputDims.width * inputDims.height * sizeof(float)); + delete[] (dataHWC); + ``` + +3. Perform inference on the input tensor based on the model, obtain the output tensor, and perform post-processing. + + - Perform graph execution and on-device inference. + + ```cpp + // After the model and image tensor data is loaded, run inference. + auto status = mSession->RunGraph(); + ``` + + - Obtain the output data. + ```cpp + auto msOutputs = mSession->GetOutputMapByNode(); + std::string retStr = ProcessRunnetResult(msOutputs, ret); + ``` + + - Perform post-processing of the output data. + ```cpp + std::string ProcessRunnetResult(std::unordered_map> msOutputs, + int runnetRet) { + + // Get model output results. + std::unordered_map>::iterator iter; + iter = msOutputs.begin(); + auto brach1_string = iter->first; + auto branch1_tensor = iter->second; + + int OUTPUTS_LEN = branch1_tensor[0]->ElementsNum(); + + float *temp_scores = static_cast(branch1_tensor[0]->MutableData()); + + float scores[RET_CATEGORY_SUM]; + for (int i = 0; i < RET_CATEGORY_SUM; ++i) { + scores[i] = temp_scores[i]; + } + + // Converted to text information that needs to be displayed in the APP. 
+ std::string retStr = ""; + if (runnetRet == 0) { + for (int i = 0; i < RET_CATEGORY_SUM; ++i) { + if (scores[i] > 0.3){ + retStr += g_labels_name_map[i]; + retStr += ":"; + std::string score_str = std::to_string(scores[i]); + retStr += score_str; + retStr += ";"; + } + } + else { + MS_PRINT("MindSpore run net failed!"); + for (int i = 0; i < RET_CATEGORY_SUM; ++i) { + retStr += " :0.0;"; + } + } + return retStr; + } + ``` \ No newline at end of file diff --git a/tutorials/source_en/use/benchmark_tool.md b/tutorials/source_en/use/benchmark_tool.md new file mode 100644 index 0000000000000000000000000000000000000000..78cd34120988addfd60b85883f7f400681160319 --- /dev/null +++ b/tutorials/source_en/use/benchmark_tool.md @@ -0,0 +1,97 @@ +# Benchmark Tool + + + +- [Benchmark Tool](#benchmark-tool) + - [Overview](#overview) + - [Environment Preparation](#environment-preparation) + - [Parameter Description](#parameter-description) + - [Example](#example) + - [Performance Test](#performance-test) + - [Accuracy Test](#accuracy-test) + + + + + +## Overview + +The Benchmark tool is used to perform benchmark testing on a MindSpore Lite model and is implemented using the C++ language. It can not only perform quantitative analysis (performance) on the forward inference execution duration of a MindSpore Lite model, but also perform comparative error analysis (accuracy) based on the output of the specified model. + +## Environment Preparation + +To use the Benchmark tool, you need to prepare the environment as follows: + +- Compilation: Install compilation dependencies and perform compilation. The code of the Benchmark tool is stored in the `mindspore/lite/tools/benchmark` directory of the MindSpore source code. For details about the compilation operations, see the [Environment Requirements](https://www.mindspore.cn/lite/docs/en/master/deploy.html#id2) and [Compilation Example](https://www.mindspore.cn/lite/docs/en/master/deploy.html#id5) in the compilation document. + +- Run: Obtain the `Benchmark` tool and configure environment variables. For details, see [Output Description](https://www.mindspore.cn/lite/docs/zh-CN/master/compile.html#id4) in the compilation document. + +## Parameter Description + +The command used for benchmark testing based on the compiled Benchmark tool is as follows: + +```bash +./benchmark --modelPath= [--accuracyThreshold=] + [--calibDataPath=] [--cpuBindMode=] + [--device=] [--help] [--inDataPath=] + [--inDataType=] [--loopCount=] + [--numThreads=] [--omModelPath=] + [--resizeDims=] [--warmUpLoopCount=] + [--fp16Priority=] +``` + +The following describes the parameters in detail. + +| Parameter | Attribute | Function | Parameter Type | Default Value | Value Range | +| ----------------- | ---- | ------------------------------------------------------------ | ------ | -------- | ---------------------------------- | +| `--modelPath=` | Mandatory | Specifies the file path of the MindSpore Lite model for benchmark testing. | String | Null | - | +| `--accuracyThreshold=` | Optional | Specifies the accuracy threshold. | Float | 0.5 | - | +| `--calibDataPath=` | Optional | Specifies the file path of the benchmark data. The benchmark data, as the comparison output of the tested model, is output from the forward inference of the tested model under other deep learning frameworks using the same input. | String | Null | - | +| `--cpuBindMode=` | Optional | Specifies the type of the CPU core bound to the model inference program. | Integer | 1 | −1: medium core
1: large core
0: not bound | +| `--device=` | Optional | Specifies the type of the device on which the model inference program runs. | String | CPU | CPU or GPU | +| `--help` | Optional | Displays the help information about the `benchmark` command. | - | - | - | +| `--inDataPath=` | Optional | Specifies the file path of the input data of the tested model. If this parameter is not set, a random value will be used. | String | Null | - | +| `--inDataType=` | Optional | Specifies the file type of the input data of the tested model. | String | Bin | Img: The input data is an image. Bin: The input data is a binary file.| +| `--loopCount=` | Optional | Specifies the number of forward inference times of the tested model when the Benchmark tool is used for the benchmark testing. The value is a positive integer. | Integer | 10 | - | +| `--numThreads=` | Optional | Specifies the number of threads for running the model inference program. | Integer | 2 | - | +| `--omModelPath=` | Optional | Specifies the file path of the OM model. This parameter is optional only when the `device` type is NPU. | String | Null | - | +| `--resizeDims=` | Optional | Specifies the size to be adjusted for the input data of the tested model. | String | Null | - | +| `--warmUpLoopCount=` | Optional | Specifies the number of preheating inference times of the tested model before multiple rounds of the benchmark test are executed. | Integer | 3 | - | +| `--fp16Priority=` | Optional | Specifies whether the float16 operator is preferred. | Bool | false | true, false | + +## Example + +When using the Benchmark tool to perform benchmark testing on different MindSpore Lite models, you can set different parameters to implement different test functions. The testing is classified into performance test and accuracy test. + +### Performance Test + +The main test indicator of the performance test performed by the Benchmark tool is the duration of a single forward inference. In a performance test, you do not need to set benchmark data parameters such as `calibDataPath`. For example: + +```bash +./benchmark --modelPath=./models/test_benchmark.ms +``` + +This command uses a random input, and other parameters use default values. After this command is executed, the following statistics are displayed. The statistics include the minimum duration, maximum duration, and average duration of a single inference after the tested model runs for the specified number of inference rounds. + +``` +Model = test_benchmark.ms, numThreads = 2, MinRunTime = 72.228996 ms, MaxRuntime = 73.094002 ms, AvgRunTime = 72.556000 ms +``` + +### Accuracy Test + +The accuracy test performed by the Benchmark tool is to verify the accuracy of the MinSpore model output by setting benchmark data. In an accuracy test, in addition to the `modelPath` parameter, the `calibDataPath` parameter must be set. For example: + +```bash +./benchmark --modelPath=./models/test_benchmark.ms --inDataPath=./input/test_benchmark.bin --device=CPU --accuracyThreshold=3 --calibDataPath=./output/test_benchmark.out +``` + +This command specifies the input data and benchmark data of the tested model, specifies that the model inference program runs on the CPU, and sets the accuracy threshold to 3%. After this command is executed, the following statistics are displayed, including the single input data of the tested model, output result and average deviation rate of the output node, and average deviation rate of all nodes. 
+ +``` +InData0: 139.947 182.373 153.705 138.945 108.032 164.703 111.585 227.402 245.734 97.7776 201.89 134.868 144.851 236.027 18.1142 22.218 5.15569 212.318 198.43 221.853 +================ Comparing Output data ================ +Data of node age_out : 5.94584e-08 6.3317e-08 1.94726e-07 1.91809e-07 8.39805e-08 7.66035e-08 1.69285e-07 1.46246e-07 6.03796e-07 1.77631e-07 1.54343e-07 2.04623e-07 8.89609e-07 3.63487e-06 4.86876e-06 1.23939e-05 3.09981e-05 3.37098e-05 0.000107102 0.000213932 0.000533579 0.00062465 0.00296401 0.00993984 0.038227 0.0695085 0.162854 0.123199 0.24272 0.135048 0.169159 0.0221256 0.013892 0.00502971 0.00134921 0.00135701 0.000383242 0.000163475 0.000136294 9.77864e-05 8.00793e-05 5.73874e-05 3.53858e-05 2.18535e-05 2.04467e-05 1.85286e-05 1.05075e-05 9.34751e-06 6.12732e-06 4.55476e-06 +Mean bias of node age_out : 0% +Mean bias of all nodes: 0% +======================================================= +``` diff --git a/tutorials/source_en/use/converter_tool.md b/tutorials/source_en/use/converter_tool.md new file mode 100644 index 0000000000000000000000000000000000000000..7afc0f0a76e7319183b3500c4656583b03ec1ff4 --- /dev/null +++ b/tutorials/source_en/use/converter_tool.md @@ -0,0 +1,108 @@ +# Converter Tool + + + +- [Model Conversion Tool](#model-conversion-tool) + - [Overview](#overview) + - [Environment Preparation](#environment-preparation) + - [Parameter Description](#parameter-description) + - [Example](#example) + + + + + +## Overview + +MindSpore Lite provides a tool for offline model conversion. It supports conversion of multiple types of models. The converted models can be used for inference. The command line parameters contain multiple personalized options, providing a convenient conversion method for users. + +Currently, the following input formats are supported: MindSpore, TensorFlow Lite, Caffe, and ONNX. + +## Environment Preparation + +To use the MindSpore Lite model conversion tool, you need to prepare the environment as follows: + +- Compilation: Install basic and additional compilation dependencies and perform compilation. The compilation version is x86_64. The code of the model conversion tool is stored in the `mindspore/lite/tools/converter` directory of the MindSpore source code. For details about the compilation operations, see the [Environment Requirements] (https://www.mindspore.cn/lite/docs/zh-CN/master/compile.html#id2) and [Compilation Example] (https://www.mindspore.cn/lite/docs/zh-CN/master/deploy.html#id5) in the compilation document. + +- Run: Obtain the `converter` tool and configure environment variables by referring to [Output Description](https://www.mindspore.cn/lite/docs/zh-CN/master/compile.html#id4) in the compilation document. + +## Parameter Description + +You can use `./converter_lite ` to complete the conversion. In addition, you can set multiple parameters as required. +You can enter `./converter_lite --help` to obtain help information in real time. + +The following describes the parameters in detail. + + +| Parameter | Mandatory or Not | Parameter Description | Value Range | Default Value | +| -------- | ------- | ----- | --- | ---- | +| `--help` | No | Prints all help information. | - | - | +| `--fmk=` | Yes | Original format of the input model. | MS, CAFFE, TFLITE, or ONNX | - | +| `--modelFile=` | Yes | Path of the input model. | - | - | +| `--outputFile=` | Yes | Path of the output model. (If the path does not exist, a directory will be automatically created.) The suffix `.ms` can be automatically generated. 
| - | - | +| `--weightFile=` | Yes (for Caffe models only) | Path of the weight file of the input model. | - | - | +| `--quantType=` | No | Sets the quant type of the model. | PostTraining: quantization after training
AwareTraining: perceptual quantization | - | +|`--inputInferenceType=` | No(supported by aware quant models only) | Sets the input data type of the converted model. If the type is different from the origin model, the convert tool will insert data type convert op before the model to make sure the input data type is same as the input of origin model. | FLOAT or INT8 | FLOAT | +|`--inferenceType= `| No(supported by aware quant models only) | Sets the output data type of the converted model. If the type is different from the origin model, the convert tool will insert data type convert op before the model to make sure the output data type is same as the input of origin model. | FLOAT or INT8 | FLOAT | +|`--stdDev=`| No(supported by aware quant models only) | Sets the standard deviation of the input data. | (0,+∞) | 128 | +|`--mean=`| No(supported by aware quant models only) | Sets the mean value of the input data. | [-128, 127] | -0.5 | + +> - The parameter name and parameter value are separated by an equal sign (=) and no space is allowed between them. +> - The Caffe model is divided into two files: model structure `*.prototxt`, corresponding to the `--modelFile` parameter; model weight `*.caffemodel`, corresponding to the `--weightFile` parameter + + +## Example + +First, in the root directory of the source code, run the following command to perform compilation. For details, see `compile.md`. +```bash +bash build.sh -I x86_64 +``` +> Currently, the model conversion tool supports only the x86_64 architecture. + +The following describes how to use the conversion command by using several common examples. + +- Take the Caffe model LeNet as an example. Run the following conversion command: + + ```bash + ./converter_lite --fmk=CAFFE --modelFile=lenet.prototxt --weightFile=lenet.caffemodel --outputFile=lenet + ``` + + In this example, the Caffe model is used. Therefore, the model structure and model weight files are required. Two more parameters `fmk` and `outputFile` are also required. + + The output is as follows: + ``` + INFO [converter/converter.cc:190] Runconverter] CONVERTER RESULT: SUCCESS! + ``` + This indicates that the Caffe model is successfully converted into the MindSpore Lite model and the new file `lenet.ms` is generated. + +- The following uses the MindSpore, TensorFlow Lite, ONNX and perception quantization models as examples to describe how to run the conversion command. + + - MindSpore model `model.mindir` + ```bash + ./converter_lite --fmk=MS --modelFile=model.mindir --outputFile=model + ``` + + - TensorFlow Lite model `model.tflite` + ```bash + ./converter_lite --fmk=TFLITE --modelFile=model.tflite --outputFile=model + ``` + + - ONNX model `model.onnx` + ```bash + ./converter_lite --fmk=ONNX --modelFile=model.onnx --outputFile=model + ``` + + - TensorFlow Lite aware quantization model `model_quant.tflite` + ```bash + ./converter_lite --fmk=TFLITE --modelFile=model.tflite --outputFile=model --quantType=AwareTraining + ``` + - TensorFlow Lite aware quantization model `model_quant.tflite` set the input and output data type to be int8 + ```bash + ./converter_lite --fmk=TFLITE --modelFile=model.tflite --outputFile=model --quantType=AwareTraining --inputInferenceType=INT8 --inferenceType=INT8 + ``` + + In the preceding scenarios, the following information is displayed, indicating that the conversion is successful. In addition, the target file `model.ms` is obtained. + ``` + INFO [converter/converter.cc:190] Runconverter] CONVERTER RESULT: SUCCESS! 
+ ``` + \ No newline at end of file diff --git a/tutorials/source_en/use/runtime.md b/tutorials/source_en/use/runtime.md new file mode 100644 index 0000000000000000000000000000000000000000..fe1fa8694aeb3750f199f251f86e68839128dafe --- /dev/null +++ b/tutorials/source_en/use/runtime.md @@ -0,0 +1,3 @@ +# Runtime + + diff --git a/tutorials/source_en/use/timeprofiler_tool.md b/tutorials/source_en/use/timeprofiler_tool.md new file mode 100644 index 0000000000000000000000000000000000000000..506da6162e506367cd09935a050dc254f32de773 --- /dev/null +++ b/tutorials/source_en/use/timeprofiler_tool.md @@ -0,0 +1,93 @@ +# TimeProfiler Tool + + + +- [TimeProfiler Tool](#timeprofiler-tool) + - [Overview](#overview) + - [Environment Preparation](#environment-preparation) + - [Parameter Description](#parameter-description) + - [Example](#example) + + + + + +## Overview + +The TimeProfiler tool can be used to analyze the time consumption of forward inference at the network layer of a MindSpore Lite model. The analysis is implemented using the C++ language. + +## Environment Preparation + +To use the TimeProfiler tool, you need to prepare the environment as follows: + +- Compilation: Install compilation dependencies and perform compilation. The code of the TimeProfiler tool is stored in the `mindspore/lite/tools/time_profiler` directory of the MindSpore source code. For details about the compilation operations, see the [Environment Requirements](https://www.mindspore.cn/lite/docs/en/master/compile.html#id2) and [Compilation Example](https://www.mindspore.cn/lite/docs/en/master/compile.html#id5) in the compilation document. + +- Run: Obtain the `time_profiler` tool and configure environment variables by referring to [Output Description](https://www.mindspore.cn/lite/docs/zh-CN/master/compile.html#id4) in the compilation document. + +## Parameter Description + +The command used for analyzing the time consumption of forward inference at the network layer based on the compiled TimeProfiler tool is as follows: + +```bash +./timeprofiler --modelPath= [--help] [--loopCount=] [--numThreads=] [--cpuBindMode=] [--inDataPath=] [--fp16Priority=] +``` + +The following describes the parameters in detail. + +| Parameter | Attribute | Function | Parameter Type | Default Value | Value Range | +| ----------------- | ---- | ------------------------------------------------------------ | ------ | -------- | ---------------------------------- | +| `--help` | Optional | Displays the help information about the `timeprofiler` command. | - | - | - | +| `--modelPath= ` | Mandatory | Specifies the file path of the MindSpore Lite model for time consumption analysis. | String | Null | - | +| `--loopCount=` | Optional | Specifies the number of times that model inference is executed when the TimeProfiler tool is used for time consumption analysis. The value is a positive integer. | Integer | 100 | - | +| `--numThreads=` | Optional | Specifies the number of threads for running the model inference program. | Integer | 4 | - | +| `--cpuBindMode=` | Optional | Specifies the type of the CPU core bound to the model inference program. | Integer | 1 | −1: medium core
1: large core
0: not bound | +| `--inDataPath=` | Optional | Specifies the file path of the input data of the specified model. If this parameter is not set, a random value will be used. | String | Null | - | +| `--fp16Priority=` | Optional | Specifies whether the float16 operator is preferred. | Bool | false | true, false | + +## Example + +Take the `test_timeprofiler.ms` model as an example and set the number of model inference cycles to 10. The command for using TimeProfiler to analyze the time consumption at the network layer is as follows: + +```bash +./timeprofiler --modelPath=./models/test_timeprofiler.ms --loopCount=10 +``` + +After this command is executed, the TimeProfiler tool outputs the statistics on the running time of the model at the network layer. In this example, the command output is as follows: The statistics are displayed by`opName` and `optype`. `opName` indicates the operator name, `optype` indicates the operator type, and `avg` indicates the average running time of the operator per single run, `percent` indicates the ratio of the operator running time to the total operator running time, `calledTimess` indicates the number of times that the operator is run, and `opTotalTime` indicates the total time that the operator is run for a specified number of times. Finally, `total time` and `kernel cost` show the average time consumed by a single inference operation of the model and the sum of the average time consumed by all operators in the model inference, respectively. + +``` +----------------------------------------------------------------------------------------- +opName avg(ms) percent calledTimess opTotalTime +conv2d_1/convolution 2.264800 0.824012 10 22.648003 +conv2d_2/convolution 0.223700 0.081390 10 2.237000 +dense_1/BiasAdd 0.007500 0.002729 10 0.075000 +dense_1/MatMul 0.126000 0.045843 10 1.260000 +dense_1/Relu 0.006900 0.002510 10 0.069000 +max_pooling2d_1/MaxPool 0.035100 0.012771 10 0.351000 +max_pooling2d_2/MaxPool 0.014300 0.005203 10 0.143000 +max_pooling2d_2/MaxPool_nchw2nhwc_reshape_1/Reshape_0 0.006500 0.002365 10 0.065000 +max_pooling2d_2/MaxPool_nchw2nhwc_reshape_1/Shape_0 0.010900 0.003966 10 0.109000 +output/BiasAdd 0.005300 0.001928 10 0.053000 +output/MatMul 0.011400 0.004148 10 0.114000 +output/Softmax 0.013300 0.004839 10 0.133000 +reshape_1/Reshape 0.000900 0.000327 10 0.009000 +reshape_1/Reshape/shape 0.009900 0.003602 10 0.099000 +reshape_1/Shape 0.002300 0.000837 10 0.023000 +reshape_1/strided_slice 0.009700 0.003529 10 0.097000 +----------------------------------------------------------------------------------------- +opType avg(ms) percent calledTimess opTotalTime +Activation 0.006900 0.002510 10 0.069000 +BiasAdd 0.012800 0.004657 20 0.128000 +Conv2D 2.488500 0.905401 20 24.885004 +MatMul 0.137400 0.049991 20 1.374000 +Nchw2Nhwc 0.017400 0.006331 20 0.174000 +Pooling 0.049400 0.017973 20 0.494000 +Reshape 0.000900 0.000327 10 0.009000 +Shape 0.002300 0.000837 10 0.023000 +SoftMax 0.013300 0.004839 10 0.133000 +Stack 0.009900 0.003602 10 0.099000 +StridedSlice 0.009700 0.003529 10 0.097000 + +total time : 2.90800 ms, kernel cost : 2.74851 ms + +----------------------------------------------------------------------------------------- +``` \ No newline at end of file diff --git a/tutorials/source_zh_cn/compile.md b/tutorials/source_zh_cn/compile.md new file mode 100644 index 0000000000000000000000000000000000000000..c558bb6e089f8caecc7e4f5e056058bfafe15dd1 --- /dev/null +++ b/tutorials/source_zh_cn/compile.md @@ -0,0 +1,228 @@ +# 编译 + + + +- [编译](#编译) + - 
[Linux环境编译](#linux环境编译) + - [环境要求](#环境要求) + - [编译选项](#编译选项) + - [编译示例](#编译示例) + - [编译输出](#编译输出) + - [模型转换工具converter目录结构说明](#模型转换工具converter目录结构说明) + - [模型推理框架runtime及其他工具目录结构说明](#模型推理框架runtime及其他工具目录结构说明) + - [Windows环境编译](#windows环境编译) + - [环境要求](#环境要求-1) + - [编译选项](#编译选项-1) + - [编译示例](#编译示例-1) + + + + + + +本章节介绍如何在Ubuntu系统上快速编译出MindSpore Lite,其包含的模块如下: + +| 模块 | 支持平台 | 说明 | +| --- | ---- | ---- | +| converter | Linux、Windows | 模型转换工具 | +| runtime | Linux、Android | 模型推理框架 | +| benchmark | Linux、Android | 基准测试工具 | +| time_profiler | Linux、Android | 性能分析工具 | + +## Linux环境编译 + +### 环境要求 + +- 系统环境:Linux x86_64,推荐使用Ubuntu 18.04.02LTS + +- runtime、benchmark、time_profiler编译依赖 + - [CMake](https://cmake.org/download/) >= 3.14.1 + - [GCC](https://gcc.gnu.org/releases.html) >= 7.3.0 + - [Android_NDK](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip) >= r20 + - [Git](https://git-scm.com/downloads) >= 2.28.0 + +- converter编译依赖 + - [CMake](https://cmake.org/download/) >= 3.14.1 + - [GCC](https://gcc.gnu.org/releases.html) >= 7.3.0 + - [Android_NDK](https://dl.google.com/android/repository/android-ndk-r20b-linux-x86_64.zip) >= r20 + - [Git](https://git-scm.com/downloads) >= 2.28.0 + - [Autoconf](http://ftp.gnu.org/gnu/autoconf/) >= 2.69 + - [Libtool](https://www.gnu.org/software/libtool/) >= 2.4.6 + - [LibreSSL](http://www.libressl.org/) >= 3.1.3 + - [Automake](https://www.gnu.org/software/automake/) >= 1.11.6 + - [Libevent](https://libevent.org) >= 2.0 + - [M4](https://www.gnu.org/software/m4/m4.html) >= 1.4.18 + - [OpenSSL](https://www.openssl.org/) >= 1.1.1 + +> 编译脚本中会执行`git clone`获取第三方依赖库的代码,请提前确保git的网络设置正确可用。 + +### 编译选项 + +MindSpore Lite提供编译脚本`build.sh`用于一键式编译,位于MindSpore根目录下,该脚本可用于MindSpore训练及推理的编译。下面对MindSpore Lite的编译选项进行说明。 + +| 选项 | 参数说明 | 取值范围 | 是否必选 | +| -------- | ----- | ---- | ---- | +| **-I** | **选择适用架构,编译MindSpore Lite此选项必选** | **arm64、arm32、x86_64** | **是** | +| -d | 设置该参数,则编译Debug版本,否则编译Release版本 | 无 | 否 | +| -i | 设置该参数,则进行增量编译,否则进行全量编译 | 无 | 否 | +| -j[n] | 设定编译时所用的线程数,否则默认设定为8线程 | Integer | 否 | +| -e | 选择除CPU之外的其他内置算子类型,仅在ARM架构下适用,当前仅支持GPU | gpu | 否 | +| -h | 显示编译帮助信息 | 无 | 否 | + +> 在`-I`参数变动时,如`-I x86_64`变为`-I arm64`,添加`-i`参数进行增量编译不生效。 + +### 编译示例 + +首先,在进行编译之前,需从MindSpore代码仓下载源码。 + +```bash +git clone https://gitee.com/mindspore/mindspore.git +``` + +然后,在源码根目录下执行如下命令,可编译不同版本的MindSpore Lite。 + +- 编译x86_64架构Debug版本。 + ```bash + bash build.sh -I x86_64 -d + ``` + +- 编译x86_64架构Release版本,同时设定线程数。 + ```bash + bash build.sh -I x86_64 -j32 + ``` + +- 增量编译ARM64架构Release版本,同时设定线程数。 + ```bash + bash build.sh -I arm64 -i -j32 + ``` + +- 编译ARM64架构Release版本,同时编译内置的GPU算子。 + ```bash + bash build.sh -I arm64 -e gpu + ``` + +### 编译输出 + +编译完成后,进入`mindspore/output/`目录,可查看编译后生成的文件。文件分为两部分: +- `mindspore-lite-{version}-converter-{os}.tar.gz`:包含模型转换工具converter。 +- `mindspore-lite-{version}-runtime-{os}-{device}.tar.gz`:包含模型推理框架runtime、基准测试工具benchmark和性能分析工具time_profiler。 + +> version:输出件版本号,与所编译的分支代码对应的版本一致。 +> +> device:当前分为cpu(内置CPU算子)和gpu(内置CPU和GPU算子)。 +> +> os:输出件应部署的操作系统。 + +执行解压缩命令,获取编译后的输出件: + +```bash +tar -xvf mindspore-lite-{version}-converter-{os}.tar.gz +tar -xvf mindspore-lite-{version}-runtime-{os}-{device}.tar.gz +``` + +#### 模型转换工具converter目录结构说明 + +转换工具仅在`-I x86_64`编译选项下获得,内容包括以下几部分: + +``` +| +├── mindspore-lite-{version}-converter-{os} +│ └── converter # 模型转换工具 +│ └── third_party # 第三方库头文件和库 +│ ├── protobuf # Protobuf的动态库 + +``` + +#### 模型推理框架runtime及其他工具目录结构说明 + +推理框架可在`-I x86_64`、`-I arm64`和`-I arm32`编译选项下获得,内容包括以下几部分: + +- 当编译选项为`-I x86_64`时: + ``` + | + 
├── mindspore-lite-{version}-runtime-x86-cpu + │ └── benchmark # 基准测试工具 + │ └── lib # 推理框架动态库 + │ ├── libmindspore-lite.so # MindSpore Lite推理框架的动态库 + │ └── third_party # 第三方库头文件和库 + │ ├── flatbuffers # FlatBuffers头文件 + + ``` + +- 当编译选项为`-I arm64`时: + ``` + | + ├── mindspore-lite-{version}-runtime-arm64-cpu + │ └── benchmark # 基准测试工具 + │ └── lib # 推理框架动态库 + │ ├── libmindspore-lite.so # MindSpore Lite推理框架的动态库 + │ ├── liboptimize.so # MindSpore Lite算子性能优化库 + │ └── third_party # 第三方库头文件和库 + │ ├── flatbuffers # FlatBuffers头文件 + │ └── include # 推理框架头文件 + │ └── time_profiler # 模型网络层耗时分析工具 + + ``` + +- 当编译选项为`-I arm32`时: + ``` + | + ├── mindspore-lite-{version}-runtime-arm64-cpu + │ └── benchmark # 基准测试工具 + │ └── lib # 推理框架动态库 + │ ├── libmindspore-lite.so # MindSpore Lite推理框架的动态库 + │ └── third_party # 第三方库头文件和库 + │ ├── flatbuffers # FlatBuffers头文件 + │ └── include # 推理框架头文件 + │ └── time_profiler # 模型网络层耗时分析工具 + + ``` + +> 1. `liboptimize.so`仅在runtime-arm64的输出包中存在,仅在ARMv8.2和支持fp16特性的CPU上使用。 +> 2. 编译ARM64默认可获得arm64-cpu的推理框架输出件,若添加`-e gpu`则获得arm64-gpu的推理框架输出件,此时包名为`mindspore-lite-{version}-runtime-arm64-gpu.tar.gz`,编译ARM32同理。 +> 3. 运行converter、benchmark或time_profiler目录下的工具前,都需配置环境变量,将MindSpore Lite和Protobuf的动态库所在的路径配置到系统搜索动态库的路径中。以0.7.0-beta版本下编译CPU为例:配置converter:`export LD_LIBRARY_PATH=./mindspore-lite-0.7.0-converter-ubuntu/third_party/protobuf/lib`;配置benchmark和time_profiler:`export LD_LIBRARY_PATH=./mindspore-lite-0.7.0-runtime-x86-cpu/lib` + + +## Windows环境编译 + +### 环境要求 + +- 支持的编译环境为:Windows 10,64位。 + +- 编译依赖 + - [CMake](https://cmake.org/download/) >= 3.14.1 + - [MinGW GCC](https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Personal%20Builds/mingw-builds/7.3.0/threads-posix/seh/x86_64-7.3.0-release-posix-seh-rt_v5-rev0.7z/download) >= 7.3.0 + - [Python](https://www.python.org/) >= 3.7.5 + - [Git](https://git-scm.com/downloads) >= 2.28.0 + +> 编译脚本中会执行`git clone`获取第三方依赖库的代码,请提前确保git的网络设置正确可用。 + +### 编译选项 + +MindSpore Lite的编译选项如下。 + +| 参数 | 参数说明 | 是否必选 | +| -------- | ----- | ---- | +| **lite** | **设置该参数,则对Mindspore Lite工程进行编译** | **是** | +| [n] | 设定编译时所用的线程数,否则默认设定为6线程 | 否 | + +### 编译示例 + +首先,使用git工具从MindSpore代码仓下载源码。 + +```bash +git clone https://gitee.com/mindspore/mindspore.git +``` + +然后,使用cmd工具在源码根目录下,执行如下命令即可编译MindSpore Lite。 + +- 以默认线程数(6线程)编译Windows版本。 + ```bash + call build.bat lite + ``` +- 以指定线程数8编译Windows版本。 + ```bash + call build.bat lite 8 + ``` + +编译完成之后,进入`mindspore/output/`目录,解压后即可获取输出件`mindspore-lite-0.7.0-converter-win-cpu.zip`,其中含有转换工具可执行文件。 diff --git a/tutorials/source_zh_cn/conf.py b/tutorials/source_zh_cn/conf.py index c2d1fb828249e37f44afaa4cda7e45d641784785..c2ba20320a75ee8ae812f03dbc21cb39f30827a8 100644 --- a/tutorials/source_zh_cn/conf.py +++ b/tutorials/source_zh_cn/conf.py @@ -16,9 +16,9 @@ import sys # -- Project information ----------------------------------------------------- -project = 'MindSpore' -copyright = '2020, MindSpore' -author = 'MindSpore' +project = 'MindSpore Lite' +copyright = '2020, MindSpore Lite' +author = 'MindSpore Lite' # The full version, including alpha/beta/rc tags release = 'master' @@ -58,6 +58,4 @@ html_theme = 'sphinx_rtd_theme' html_search_language = 'zh' -html_search_options = {'dict': '../resource/jieba.txt'} - html_static_path = ['_static'] \ No newline at end of file diff --git a/tutorials/source_zh_cn/images/lite_quick_start_app_result.jpg b/tutorials/source_zh_cn/images/lite_quick_start_app_result.jpg new file mode 100644 index 
0000000000000000000000000000000000000000..9287aad111992c39145c70f6a473818e31402bc7 Binary files /dev/null and b/tutorials/source_zh_cn/images/lite_quick_start_app_result.jpg differ diff --git a/tutorials/source_zh_cn/images/lite_quick_start_home.png b/tutorials/source_zh_cn/images/lite_quick_start_home.png new file mode 100644 index 0000000000000000000000000000000000000000..c48cf581b33afbc15dbf27be495215b999e1be60 Binary files /dev/null and b/tutorials/source_zh_cn/images/lite_quick_start_home.png differ diff --git a/tutorials/source_zh_cn/images/lite_quick_start_install.png b/tutorials/source_zh_cn/images/lite_quick_start_install.png new file mode 100644 index 0000000000000000000000000000000000000000..cc66708f0633c537e111d65a4b4e8a411a9322af Binary files /dev/null and b/tutorials/source_zh_cn/images/lite_quick_start_install.png differ diff --git a/tutorials/source_zh_cn/images/lite_quick_start_project_structure.png b/tutorials/source_zh_cn/images/lite_quick_start_project_structure.png new file mode 100644 index 0000000000000000000000000000000000000000..ade37a61ef97a479401240215e302011c014824c Binary files /dev/null and b/tutorials/source_zh_cn/images/lite_quick_start_project_structure.png differ diff --git a/tutorials/source_zh_cn/images/lite_quick_start_run_app.PNG b/tutorials/source_zh_cn/images/lite_quick_start_run_app.PNG new file mode 100644 index 0000000000000000000000000000000000000000..2557b6293de5b3d7fefe7f6e58b57c03deabb55d Binary files /dev/null and b/tutorials/source_zh_cn/images/lite_quick_start_run_app.PNG differ diff --git a/tutorials/source_zh_cn/images/lite_quick_start_sdk.png b/tutorials/source_zh_cn/images/lite_quick_start_sdk.png new file mode 100644 index 0000000000000000000000000000000000000000..1fcb8acabc9ba9d289efbe7e82ee5e2da8bfe073 Binary files /dev/null and b/tutorials/source_zh_cn/images/lite_quick_start_sdk.png differ diff --git a/tutorials/source_zh_cn/images/side_infer_process.png b/tutorials/source_zh_cn/images/side_infer_process.png new file mode 100644 index 0000000000000000000000000000000000000000..eb63d0858cbfb92acab10bc62a0ca1ce6a09e512 Binary files /dev/null and b/tutorials/source_zh_cn/images/side_infer_process.png differ diff --git a/tutorials/source_zh_cn/index.rst b/tutorials/source_zh_cn/index.rst index c1551225334e0208d3589dd2bfec89b39b26cbaa..52f49f3366457fc17b653fa30071d5cb67c82963 100644 --- a/tutorials/source_zh_cn/index.rst +++ b/tutorials/source_zh_cn/index.rst @@ -1,88 +1,26 @@ .. MindSpore documentation master file, created by - sphinx-quickstart on Thu Mar 24 09:00:00 2020. + sphinx-quickstart on Thu Aug 17 09:00:00 2020. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. -MindSpore教程 -============= +MindSpore端侧教程 +================== .. toctree:: :glob: :maxdepth: 1 :caption: 快速入门 - quick_start/linear_regression + deploy quick_start/quick_start - quick_start/quick_video .. toctree:: :glob: :maxdepth: 1 :caption: 使用指南 - use/data_preparation/data_preparation - use/defining_the_network - use/saving_and_loading_model_parameters - use/multi_platform_inference - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: 应用实践 - - advanced_use/computer_vision_application - advanced_use/nlp_application - advanced_use/second_order_optimizer_for_resnet50_application - advanced_use/synchronization_training_and_evaluation - advanced_use/bert_poetry - -.. 
toctree:: - :glob: - :maxdepth: 1 - :caption: 模型调优 - - advanced_use/debugging_in_pynative_mode - advanced_use/customized_debugging_information - advanced_use/visualization_tutorials - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: 性能优化 - - advanced_use/distributed_training_tutorials - advanced_use/mixed_precision - advanced_use/graph_kernel_fusion - advanced_use/quantization_aware - advanced_use/gradient_accumulation - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: 推理服务 - - advanced_use/serving - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: 端云使用 - - advanced_use/use_on_the_cloud - advanced_use/on_device_inference - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: 网络迁移 - - advanced_use/network_migration - -.. toctree:: - :glob: - :maxdepth: 1 - :caption: AI安全和隐私 - - advanced_use/model_security - advanced_use/differential_privacy - advanced_use/fuzzer + use/converter_tool + use/runtime + use/benchmark_tool + use/timeprofiler_tool + use/post_training_quantization diff --git a/tutorials/source_zh_cn/quick_start/quick_start.md b/tutorials/source_zh_cn/quick_start/quick_start.md index 2cc7e51cff73c29199ed76c82c50bc28257f15f4..f7bb41de2df45a84d477ab2d61a8fe032d3a24ff 100644 --- a/tutorials/source_zh_cn/quick_start/quick_start.md +++ b/tutorials/source_zh_cn/quick_start/quick_start.md @@ -1,456 +1,329 @@ -# 实现一个图片分类应用 - -`Ascend` `GPU` `CPU` `全流程` `初级` `中级` `高级` +# 快速入门 -- [实现一个图片分类应用](#实现一个图片分类应用) +- [快速入门](#快速入门) - [概述](#概述) - - [准备环节](#准备环节) - - [下载数据集](#下载数据集) - - [导入Python库&模块](#导入python库模块) - - [配置运行信息](#配置运行信息) - - [数据处理](#数据处理) - - [定义数据集及数据操作](#定义数据集及数据操作) - - [定义网络](#定义网络) - - [定义损失函数及优化器](#定义损失函数及优化器) - - [基本概念](#基本概念) - - [定义损失函数](#定义损失函数) - - [定义优化器](#定义优化器) - - [训练网络](#训练网络) - - [配置模型保存](#配置模型保存) - - [配置训练网络](#配置训练网络) - - [运行并查看结果](#运行并查看结果) - - [验证模型](#验证模型) + - [选择模型](#选择模型) + - [转换模型](#转换模型) + - [部署应用](#部署应用) + - [运行依赖](#运行依赖) + - [构建与运行](#构建与运行) + - [示例程序详细说明](#示例程序详细说明) + - [示例程序结构](#示例程序结构) + - [配置MindSpore Lite依赖项](#配置mindspore-lite依赖项) + - [下载及部署模型文件](#下载及部署模型文件) + - [编写端侧推理代码](#编写端侧推理代码) - -   - + ## 概述 -下面我们通过一个实际样例,带领大家体验MindSpore基础的功能,对于一般的用户而言,完成整个样例实践会持续20~30分钟。 - -本例子会实现一个简单的图片分类的功能,整体流程如下: -1. 处理需要的数据集,这里使用了MNIST数据集。 -2. 定义一个网络,这里我们使用LeNet网络。 -3. 定义损失函数和优化器。 -4. 加载数据集并进行训练,训练完成后,查看结果及保存模型文件。 -5. 加载保存的模型,进行推理。 -6. 验证模型,加载测试数据集和训练后的模型,验证结果精度。 - -> 你可以在这里找到完整可运行的样例代码: 。 - - -这是简单、基础的应用流程,其他高级、复杂的应用可以基于这个基本流程进行扩展。 - -## 准备环节 - -在动手进行实践之前,确保,你已经正确安装了MindSpore。如果没有,可以通过[MindSpore安装页面](https://www.mindspore.cn/install)将MindSpore安装在你的电脑当中。 - -同时希望你拥有Python编码基础和概率、矩阵等基础数学知识。 - -那么接下来,就开始MindSpore的体验之旅吧。 - -### 下载数据集 +我们推荐你从端侧Android图像分类demo入手,了解MindSpore Lite应用工程的构建、依赖项配置以及相关API的使用。 + +本教程基于MindSpore团队提供的Android“端侧图像分类”示例程序,演示了端侧部署的流程。 +1. 选择图像分类模型。 +2. 将模型转换成MindSpore Lite模型格式。 +3. 
在端侧使用MindSpore Lite推理模型。详细说明如何在端侧利用MindSpore Lite C++ API(Android JNI)和MindSpore Lite图像分类模型完成端侧推理,实现对设备摄像头捕获的内容进行分类,并在APP图像预览界面中,显示出最可能的分类结果。 + +> 你可以在这里找到[Android图像分类模型](https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite)和[示例代码](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/lite/image_classification)。 -我们示例中用到的`MNIST`数据集是由10类28*28的灰度图片组成,训练数据集包含60000张图片,测试数据集包含10000张图片。 +## 选择模型 -> MNIST数据集下载页面:。页面提供4个数据集下载链接,其中前2个文件是训练数据需要,后2个文件是测试结果需要。 - -将数据集下载并解压到本地路径下,这里将数据集解压分别存放到工作区的`./MNIST_Data/train`、`./MNIST_Data/test`路径下。 - -目录结构如下: - -``` -└─MNIST_Data - ├─test - │ t10k-images.idx3-ubyte - │ t10k-labels.idx1-ubyte - │ - └─train - train-images.idx3-ubyte - train-labels.idx1-ubyte -``` -> 为了方便样例使用,我们在样例脚本中添加了自动下载数据集的功能。 - -### 导入Python库&模块 - -在使用前,需要导入需要的Python库。 - -目前使用到`os`库,为方便理解,其他需要的库,我们在具体使用到时再说明。 - - -```python -import os -``` +MindSpore团队提供了一系列预置终端模型,你可以在应用程序中使用这些预置的终端模型。 +MindSpore Model Zoo中图像分类模型可[在此下载]((https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/mobilenetv2.ms))。 +同时,你也可以使用预置模型做迁移学习,以实现自己的图像分类任务。 -详细的MindSpore的模块说明,可以在[MindSpore API页面](https://www.mindspore.cn/api/zh-CN/master/index.html)中搜索查询。 +## 转换模型 -### 配置运行信息 - -在正式编写代码前,需要了解MindSpore运行所需要的硬件、后端等基本信息。 - -可以通过`context.set_context`来配置运行需要的信息,譬如运行模式、后端信息、硬件等信息。 - -导入`context`模块,配置运行需要的信息。 - -```python -import argparse -from mindspore import context - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description='MindSpore LeNet Example') - parser.add_argument('--device_target', type=str, default="CPU", choices=['Ascend', 'GPU', 'CPU'], - help='device where the code will be implemented (default: CPU)') - args = parser.parse_args() - context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target) - dataset_sink_mode = not args.device_target == "CPU" - ... -``` - -在样例中我们配置样例运行使用图模式。根据实际情况配置硬件信息,譬如代码运行在Ascend AI处理器上,则`--device_target`选择`Ascend`,代码运行在CPU、GPU同理。详细参数说明,请参见`context.set_context`接口说明。 - -## 数据处理 - -数据集对于训练非常重要,好的数据集可以有效提高训练精度和效率。在加载数据集前,我们通常会对数据集进行一些处理。 - -### 定义数据集及数据操作 - -我们定义一个函数`create_dataset`来创建数据集。在这个函数中,我们定义好需要进行的数据增强和处理操作: - -1. 定义数据集。 -2. 定义进行数据增强和处理所需要的一些参数。 -3. 根据参数,生成对应的数据增强操作。 -4. 使用`map`映射函数,将数据操作应用到数据集。 -5. 对生成的数据集进行处理。 - -```python -import mindspore.dataset as ds -import mindspore.dataset.transforms.c_transforms as C -import mindspore.dataset.transforms.vision.c_transforms as CV -from mindspore.dataset.transforms.vision import Inter -from mindspore.common import dtype as mstype - -def create_dataset(data_path, batch_size=32, repeat_size=1, - num_parallel_workers=1): - """ create dataset for train or test - Args: - data_path: Data path - batch_size: The number of data records in each group - repeat_size: The number of replicated data records - num_parallel_workers: The number of parallel workers - """ - # define dataset - mnist_ds = ds.MnistDataset(data_path) - - # define operation parameters - resize_height, resize_width = 32, 32 - rescale = 1.0 / 255.0 - shift = 0.0 - rescale_nml = 1 / 0.3081 - shift_nml = -1 * 0.1307 / 0.3081 - - # define map operations - resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR) # resize images to (32, 32) - rescale_nml_op = CV.Rescale(rescale_nml, shift_nml) # normalize images - rescale_op = CV.Rescale(rescale, shift) # rescale images - hwc2chw_op = CV.HWC2CHW() # change shape from (height, width, channel) to (channel, height, width) to fit network. 
- type_cast_op = C.TypeCast(mstype.int32) # change data type of label to int32 to fit network - - # apply map operations on images - mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=resize_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers) - mnist_ds = mnist_ds.map(input_columns="image", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers) - - # apply DatasetOps - buffer_size = 10000 - mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size) # 10000 as in LeNet train script - mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True) - mnist_ds = mnist_ds.repeat(repeat_size) - - return mnist_ds +如果预置模型已经满足你要求,请跳过本章节。 如果你需要对MindSpore提供的模型进行重训,重训完成后,需要将模型导出为[.mindir格式](https://www.mindspore.cn/tutorial/zh-CN/master/use/saving_and_loading_model_parameters.html#mindir)。然后使用MindSpore Lite[模型转换工具](https://www.mindspore.cn/lite/tutorial/zh-CN/master/use/converter_tool.html)将.mindir模型转换成.ms格式。 +以mobilenetv2模型为例,如下脚本将其转换为MindSpore Lite模型用于端侧推理。 +```bash +./converter_lite --fmk=MS --modelFile=mobilenetv2.mindir --outputFile=mobilenetv2.ms ``` -其中, -`batch_size`:每组包含的数据个数,现设置每组包含32个数据。 -`repeat_size`:数据集复制的数量。 +## 部署应用 -先进行shuffle、batch操作,再进行repeat操作,这样能保证1个epoch内数据不重复。 +接下来介绍如何构建和执行mindspore Lite端侧图像分类任务。 -> MindSpore支持进行多种数据处理和增强的操作,各种操作往往组合使用,具体可以参考[数据处理与数据增强](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/data_processing_and_augmentation.html)章节。 +### 运行依赖 +- Android Studio >= 3.2 (推荐4.0以上版本) +- NDK 21.3 +- CMake 3.10.2 +- Android SDK >= 26 +- OpenCV >= 4.0.0 (本示例代码已包含) -## 定义网络 +### 构建与运行 -我们选择相对简单的LeNet网络。LeNet网络不包括输入层的情况下,共有7层:2个卷积层、2个下采样层(池化层)、3个全连接层。每层都包含不同数量的训练参数,如下图所示: +1. 在Android Studio中加载本示例源码,并安装相应的SDK(指定SDK版本后,由Android Studio自动安装)。 -![LeNet-5](./images/LeNet_5.jpg) - -> 更多的LeNet网络的介绍不在此赘述,希望详细了解LeNet网络,可以查询。 + ![start_home](../images/lite_quick_start_home.png) -我们需要对全连接层以及卷积层进行初始化。 + 启动Android Studio后,点击`File->Settings->System Settings->Android SDK`,勾选相应的SDK。如下图所示,勾选后,点击`OK`,Android Studio即可自动安装SDK。 -`TruncatedNormal`:参数初始化方法,MindSpore支持`TruncatedNormal`、`Normal`、`Uniform`等多种参数初始化方法,具体可以参考MindSpore API的`mindspore.common.initializer`模块说明。 + ![start_sdk](../images/lite_quick_start_sdk.png) -初始化示例代码如下: + (可选)若安装时出现NDK版本问题,可手动下载相应的[NDK版本](https://developer.android.com/ndk/downloads?hl=zh-cn)(本示例代码使用的NDK版本为21.3),并在`Project Structure`的`Android NDK location`设置中指定SDK的位置。 -```python -import mindspore.nn as nn -from mindspore.common.initializer import TruncatedNormal + ![project_structure](../images/lite_quick_start_project_structure.png) -def weight_variable(): - """ - weight initial - """ - return TruncatedNormal(0.02) +2. 
连接Android设备,运行图像分类应用程序。 -def conv(in_channels, out_channels, kernel_size, stride=1, padding=0): - """ - conv layer weight initial - """ - weight = weight_variable() - return nn.Conv2d(in_channels, out_channels, - kernel_size=kernel_size, stride=stride, padding=padding, - weight_init=weight, has_bias=False, pad_mode="valid") + 通过USB连接Android设备调试,点击`Run 'app'`即可在你的设备上运行本示例项目。 -def fc_with_initialize(input_channels, out_channels): - """ - fc layer weight initial - """ - weight = weight_variable() - bias = weight_variable() - return nn.Dense(input_channels, out_channels, weight, bias) -``` + ![run_app](../images/lite_quick_start_run_app.PNG) -使用MindSpore定义神经网络需要继承`mindspore.nn.cell.Cell`。`Cell`是所有神经网络(`Conv2d`等)的基类。 - -神经网络的各层需要预先在`__init__`方法中定义,然后通过定义`construct`方法来完成神经网络的前向构造。按照LeNet的网络结构,定义网络各层如下: - -```python -class LeNet5(nn.Cell): - """ - Lenet network structure - """ - #define the operator required - def __init__(self): - super(LeNet5, self).__init__() - self.conv1 = conv(1, 6, 5) - self.conv2 = conv(6, 16, 5) - self.fc1 = fc_with_initialize(16 * 5 * 5, 120) - self.fc2 = fc_with_initialize(120, 84) - self.fc3 = fc_with_initialize(84, 10) - self.relu = nn.ReLU() - self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2) - self.flatten = nn.Flatten() - - #use the preceding operators to construct networks - def construct(self, x): - x = self.conv1(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.conv2(x) - x = self.relu(x) - x = self.max_pool2d(x) - x = self.flatten(x) - x = self.fc1(x) - x = self.relu(x) - x = self.fc2(x) - x = self.relu(x) - x = self.fc3(x) - return x -``` + Android Studio连接设备调试操作,可参考。 -## 定义损失函数及优化器 +3. 在Android设备上,点击“继续安装”,安装完即可查看到设备摄像头捕获的内容和推理结果。 -### 基本概念 + ![install](../images/lite_quick_start_install.png) -在进行定义之前,先简单介绍损失函数及优化器的概念。 + 识别结果如下图所示。 -- 损失函数:又叫目标函数,用于衡量预测值与实际值差异的程度。深度学习通过不停地迭代来缩小损失函数的值。定义一个好的损失函数,可以有效提高模型的性能。 -- 优化器:用于最小化损失函数,从而在训练过程中改进模型。 + ![result](../images/lite_quick_start_app_result.jpg) -定义了损失函数后,可以得到损失函数关于权重的梯度。梯度用于指示优化器优化权重的方向,以提高模型性能。 -### 定义损失函数 +## 示例程序详细说明 -MindSpore支持的损失函数有`SoftmaxCrossEntropyWithLogits`、`L1Loss`、`MSELoss`等。这里使用`SoftmaxCrossEntropyWithLogits`损失函数。 +本端侧图像分类Android示例程序分为JAVA层和JNI层,其中,JAVA层主要通过Android Camera 2 API实现摄像头获取图像帧,以及相应的图像处理等功能;JNI层在[Runtime](https://www.mindspore.cn/lite/tutorial/zh-CN/master/use/runtime.html)中完成模型推理的过程。 -```python -from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits -``` +> 此处详细说明示例程序的JNI层实现,JAVA层运用Android Camera 2 API实现开启设备摄像头以及图像帧处理等功能,需读者具备一定的Android开发基础知识。 -在`__main__`函数中调用定义好的损失函数: +### 示例程序结构 -```python -if __name__ == "__main__": - ... - #define the loss function - net_loss = SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction='mean') - ... ``` - -### 定义优化器 - -MindSpore支持的优化器有`Adam`、`AdamWeightDecay`、`Momentum`等。 - -这里使用流行的`Momentum`优化器。 - -```python -if __name__ == "__main__": - ... - #learning rate setting - lr = 0.01 - momentum = 0.9 - #create the network - network = LeNet5() - #define the optimizer - net_opt = nn.Momentum(network.trainable_params(), lr, momentum) - ... +app +| +├── libs # 存放MindSpore Lite依赖的库文件 +│ └── arm64-v8a +│ ├── libopencv_java4.so +│ └── libmindspore-lite.so +│ +├── opencv # opencv 相关依赖文件 +│ └── ... +| +├── src/main +│ ├── assets # 资源文件 +| | └── model.ms # 存放模型文件 +│ | +│ ├── cpp # 模型加载和预测主要逻辑封装类 +| | ├── .. +| | ├── MindSporeNetnative.cpp # MindSpore调用相关的JNI方法 +│ | └── MindSporeNetnative.h # 头文件 +│ | +│ ├── java # java层应用代码 +│ │ └── com.huawei.himindsporedemo +│ │ ├── gallery.classify # 图像处理及MindSpore JNI调用相关实现 +│ │ │ └── ... 
+│ │ └── obejctdetect # 开启摄像头及绘制相关实现 +│ │ └── ... +│ │ +│ ├── res # 存放Android相关的资源文件 +│ └── AndroidManifest.xml # Android配置文件 +│ +├── CMakeList.txt # cmake编译入口文件 +│ +├── build.gradle # 其他Android配置文件 +└── ... ``` -## 训练网络 +### 配置MindSpore Lite依赖项 -### 配置模型保存 +Android JNI层调用MindSpore C++ API时,需要相关库文件支持。可通过MindSpore Lite[源码编译](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html)生成`libmindspore-lite.so`库文件。 -MindSpore提供了callback机制,可以在训练过程中执行自定义逻辑,这里使用框架提供的`ModelCheckpoint`为例。 -`ModelCheckpoint`可以保存网络模型和参数,以便进行后续的fine-tuning(微调)操作。 +本示例中,bulid过程由download.gradle文件配置自动下载`libmindspore-lite.so`以及OpenCV的`libopencv_java4.so`库文件,并放置在`app/libs/arm64-v8a`目录下。 -```python -from mindspore.train.callback import ModelCheckpoint, CheckpointConfig +注: 若自动下载失败,请手动下载相关库文件并将其放在对应位置: -if __name__ == "__main__": - ... - # set parameters of check point - config_ck = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=10) - # apply parameters of check point - ckpoint_cb = ModelCheckpoint(prefix="checkpoint_lenet", config=config_ck) - ... -``` +libmindspore-lite.so [下载链接](https://download.mindspore.cn/model_zoo/official/lite/lib/mindspore%20version%200.7/libmindspore-lite.so) -### 配置训练网络 +libmindspore-lite include文件 [下载链接](https://download.mindspore.cn/model_zoo/official/lite/lib/mindspore%20version%200.7/include.zip) -通过MindSpore提供的`model.train`接口可以方便地进行网络的训练。`LossMonitor`可以监控训练过程中`loss`值的变化。 -这里把`epoch_size`设置为1,对数据集进行1个迭代的训练。 +libopencv_java4.so [下载链接](https://download.mindspore.cn/model_zoo/official/lite/lib/opencv%204.4.0/libopencv_java4.so) +libopencv include文件 [下载链接](https://download.mindspore.cn/model_zoo/official/lite/lib/opencv%204.4.0/include.zip) -```python -from mindspore.nn.metrics import Accuracy -from mindspore.train.callback import LossMonitor -from mindspore.train import Model -... -def train_net(args, model, epoch_size, mnist_path, repeat_size, ckpoint_cb, sink_mode): - """define the training method""" - print("============== Starting Training ==============") - #load training dataset - ds_train = create_dataset(os.path.join(mnist_path, "train"), 32, repeat_size) - model.train(epoch_size, ds_train, callbacks=[ckpoint_cb, LossMonitor()], dataset_sink_mode=sink_mode) -... -if __name__ == "__main__": - ... - - epoch_size = 1 - mnist_path = "./MNIST_Data" - repeat_size = 1 - model = Model(network, net_loss, net_opt, metrics={"Accuracy": Accuracy()}) - train_net(args, model, epoch_size, mnist_path, repeat_size, ckpoint_cb, dataset_sink_mode) - ... ``` -其中, -在`train_net`方法中,我们加载了之前下载的训练数据集,`mnist_path`是MNIST数据集路径。 - -## 运行并查看结果 - -使用以下命令运行脚本: +android{ + defaultConfig{ + externalNativeBuild{ + cmake{ + arguments "-DANDROID_STL=c++_shared" + } + } + + ndk{ + abiFilters'armeabi-v7a', 'arm64-v8a' + } + } +} ``` -python lenet.py --device_target=CPU -``` -其中, -`lenet.py`:为你根据教程编写的脚本文件。 -`--device_target CPU`:指定运行硬件平台,参数为`CPU`、`GPU`或者`Ascend`,根据你的实际运行硬件平台来指定。 -训练过程中会打印loss值,类似下图。loss值会波动,但总体来说loss值会逐步减小,精度逐步提高。每个人运行的loss值有一定随机性,不一定完全相同。 -训练过程中loss打印示例如下: +在`app/CMakeLists.txt`文件中建立`.so`库文件链接,如下所示。 -```bash -... -epoch: 1 step: 262, loss is 1.9212162 -epoch: 1 step: 263, loss is 1.8498616 -epoch: 1 step: 264, loss is 1.7990671 -epoch: 1 step: 265, loss is 1.9492403 -epoch: 1 step: 266, loss is 2.0305142 -epoch: 1 step: 267, loss is 2.0657792 -epoch: 1 step: 268, loss is 1.9582214 -epoch: 1 step: 269, loss is 0.9459006 -epoch: 1 step: 270, loss is 0.8167224 -epoch: 1 step: 271, loss is 0.7432692 -... 
-``` - -训练完后,即保存的模型文件,示例如下: - -```bash -checkpoint_lenet-1_1875.ckpt ``` - -其中, -`checkpoint_lenet-1_1875.ckpt`:指保存的模型参数文件。名称具体含义checkpoint_*网络名称*-*第几个epoch*_*第几个step*.ckpt。 - -## 验证模型 - -在得到模型文件后,通过模型运行测试数据集得到的结果,验证模型的泛化能力。 - -1. 使用`model.eval`接口读入测试数据集。 -2. 使用保存后的模型参数进行推理。 - -```python -from mindspore.train.serialization import load_checkpoint, load_param_into_net - -... -def test_net(args,network,model,mnist_path): - """define the evaluation method""" - print("============== Starting Testing ==============") - #load the saved model for evaluation - param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt") - #load parameter to the network - load_param_into_net(network, param_dict) - #load testing dataset - ds_eval = create_dataset(os.path.join(mnist_path, "test")) - acc = model.eval(ds_eval, dataset_sink_mode=False) - print("============== Accuracy:{} ==============".format(acc)) - -if __name__ == "__main__": +# Set MindSpore Lite Dependencies. +include_directories(${CMAKE_SOURCE_DIR}/src/main/cpp/include/MindSpore) +add_library(mindspore-lite SHARED IMPORTED ) +set_target_properties(mindspore-lite PROPERTIES + IMPORTED_LOCATION "${CMAKE_SOURCE_DIR}/libs/libmindspore-lite.so") + +# Set OpenCV Dependecies. +include_directories(${CMAKE_SOURCE_DIR}/opencv/sdk/native/jni/include) +add_library(lib-opencv SHARED IMPORTED ) +set_target_properties(lib-opencv PROPERTIES + IMPORTED_LOCATION "${CMAKE_SOURCE_DIR}/libs/libopencv_java4.so") + +# Link target library. +target_link_libraries( ... - test_net(args, network, model, mnist_path) -``` - -其中, -`load_checkpoint`:通过该接口加载CheckPoint模型参数文件,返回一个参数字典。 -`checkpoint_lenet-1_1875.ckpt`:之前保存的CheckPoint模型文件名称。 -`load_param_into_net`:通过该接口把参数加载到网络中。 - - -使用运行命令,运行你的代码脚本。 -```bash -python lenet.py --device_target=CPU -``` -其中, -`lenet.py`:为你根据教程编写的脚本文件。 -`--device_target CPU`:指定运行硬件平台,参数为`CPU`、`GPU`或者`Ascend`,根据你的实际运行硬件平台来指定。 - -运行结果示例如下: - -``` -... -============== Starting Testing ============== -============== Accuracy:{'Accuracy': 0.9742588141025641} ============== + mindspore-lite + lib-opencv + ... +) ``` -可以在打印信息中看出模型精度数据,示例中精度数据达到97.4%,模型质量良好。 +### 下载及部署模型文件 + +从MindSpore Model Hub中下载模型文件,本示例程序中使用的终端图像分类模型文件为`mobilenetv2.ms`,同样通过`download.gradle`脚本在APP构建时自动下载,并放置在`app/src/main/assets`工程目录下。 + +注:若下载失败请手工下载模型文件,mobilenetv2.ms [下载链接](https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/mobilenetv2.ms) + +### 编写端侧推理代码 + +在JNI层调用MindSpore Lite C++ API实现端测推理。 + +推理代码流程如下,完整代码请参见`src/cpp/MindSporeNetnative.cpp`。 + +1. 加载MindSpore Lite模型文件,构建上下文、会话以及用于推理的计算图。 + + - 加载模型文件:创建并配置用于模型推理的上下文 + ```cpp + // Buffer is the model data passed in by the Java layer + jlong bufferLen = env->GetDirectBufferCapacity(buffer); + char *modelBuffer = CreateLocalModelBuffer(env, buffer); + ``` + + - 创建会话 + ```cpp + void **labelEnv = new void *; + MSNetWork *labelNet = new MSNetWork; + *labelEnv = labelNet; + + // Create context. + lite::Context *context = new lite::Context; + context->device_ctx_.type = lite::DT_CPU; + context->thread_num_ = numThread; //Specify the number of threads to run inference + + // Create the mindspore session. 
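+ // CreateSessionMS (implemented below) creates the LiteSession from this Context, imports the model buffer and compiles the graph, so the Context can be deleted as soon as the call returns.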
+ labelNet->CreateSessionMS(modelBuffer, bufferLen, "device label", context); + delete(context); + + ``` + + - 加载模型文件并构建用于推理的计算图 + ```cpp + void MSNetWork::CreateSessionMS(char* modelBuffer, size_t bufferLen, std::string name, mindspore::lite::Context* ctx) + { + CreateSession(modelBuffer, bufferLen, ctx); + session = mindspore::session::LiteSession::CreateSession(ctx); + auto model = mindspore::lite::Model::Import(modelBuffer, bufferLen); + int ret = session->CompileGraph(model); + } + ``` + +2. 将输入图片转换为传入MindSpore模型的Tensor格式。 + + 将待检测图片数据转换为输入MindSpore模型的Tensor。 + + ```cpp + // Convert the Bitmap image passed in from the JAVA layer to Mat for OpenCV processing + BitmapToMat(env, srcBitmap, matImageSrc); + // Processing such as zooming the picture size. + matImgPreprocessed = PreProcessImageData(matImageSrc); + + ImgDims inputDims; + inputDims.channel = matImgPreprocessed.channels(); + inputDims.width = matImgPreprocessed.cols; + inputDims.height = matImgPreprocessed.rows; + float *dataHWC = new float[inputDims.channel * inputDims.width * inputDims.height] + + // Copy the image data to be detected to the dataHWC array. + // The dataHWC[image_size] array here is the intermediate variable of the input MindSpore model tensor. + float *ptrTmp = reinterpret_cast(matImgPreprocessed.data); + for(int i = 0; i < inputDims.channel * inputDims.width * inputDims.height; i++){ + dataHWC[i] = ptrTmp[i]; + } + + // Assign dataHWC[image_size] to the input tensor variable. + auto msInputs = mSession->GetInputs(); + auto inTensor = msInputs.front(); + memcpy(inTensor->MutableData(), dataHWC, + inputDims.channel * inputDims.width * inputDims.height * sizeof(float)); + delete[] (dataHWC); + ``` + +3. 对输入Tensor按照模型进行推理,获取输出Tensor,并进行后处理。 + + - 图执行,端测推理。 + + ```cpp + // After the model and image tensor data is loaded, run inference. + auto status = mSession->RunGraph(); + ``` + + - 获取输出数据。 + ```cpp + auto msOutputs = mSession->GetOutputMapByNode(); + std::string retStr = ProcessRunnetResult(msOutputs, ret); + ``` + + - 输出数据的后续处理。 + ```cpp + std::string ProcessRunnetResult(std::unordered_map> msOutputs, + int runnetRet) { + + // Get model output results. + std::unordered_map>::iterator iter; + iter = msOutputs.begin(); + auto brach1_string = iter->first; + auto branch1_tensor = iter->second; + + int OUTPUTS_LEN = branch1_tensor[0]->ElementsNum(); + + float *temp_scores = static_cast(branch1_tensor[0]->MutableData()); + float scores[RET_CATEGORY_SUM]; + for (int i = 0; i < RET_CATEGORY_SUM; ++i) { + scores[i] = temp_scores[i]; + } + + // Converted to text information that needs to be displayed in the APP. 
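+ // Each category whose score exceeds the 0.3 threshold below is appended to the returned string as a "label:score;" pair for the Java layer to display.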
+ std::string retStr = ""; + if (runnetRet == 0) { + for (int i = 0; i < RET_CATEGORY_SUM; ++i) { + if (scores[i] > 0.3){ + retStr += g_labels_name_map[i]; + retStr += ":"; + std::string score_str = std::to_string(scores[i]); + retStr += score_str; + retStr += ";"; + } + } + else { + MS_PRINT("MindSpore run net failed!"); + for (int i = 0; i < RET_CATEGORY_SUM; ++i) { + retStr += " :0.0;"; + } + } + + return retStr; + } + ``` diff --git a/tutorials/source_zh_cn/use/benchmark_tool.md b/tutorials/source_zh_cn/use/benchmark_tool.md new file mode 100644 index 0000000000000000000000000000000000000000..d7c86f7a03425fbb0ed1babbd2c48de111c00e2b --- /dev/null +++ b/tutorials/source_zh_cn/use/benchmark_tool.md @@ -0,0 +1,97 @@ +# Benchmark工具 + + + +- [Benchmark工具](#benchmark工具) + - [概述](#概述) + - [环境准备](#环境准备) + - [参数说明](#参数说明) + - [使用示例](#使用示例) + - [性能测试](#性能测试) + - [精度测试](#精度测试) + + + + + +## 概述 + +Benchmark工具是一款可以对MindSpore Lite模型进行基准测试的工具,由C++语言编码实现。它不仅可以对MindSpore Lite模型前向推理执行耗时进行定量分析(性能),还可以通过指定模型输出进行可对比的误差分析(精度)。 + +## 环境准备 + +使用Benchmark工具,需要进行如下环境准备工作。 + +- 编译:Benchmark工具代码在MindSpore源码的`mindspore/lite/tools/benchmark`目录中,参考部署文档中的[环境要求](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html#id2)和[编译示例](https://www.mindspore.cn/lite/tutorial/zh-CN/master/deploy.html#id5),安装编译依赖基本项,并执行编译。 + +- 运行:参考部署文档中的[输出件说明](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html#id4),获得`benchmark`工具,并配置环境变量。 + +## 参数说明 + +使用编译好的Benchmark工具进行模型的基准测试时,其命令格式如下所示。 + +```bash +./benchmark --modelPath= [--accuracyThreshold=] + [--calibDataPath=] [--cpuBindMode=] + [--device=] [--help] [--inDataPath=] + [--inDataType=] [--loopCount=] + [--numThreads=] [--omModelPath=] + [--resizeDims=] [--warmUpLoopCount=] + [--fp16Priority=] +``` + +下面提供详细的参数说明。 + +| 参数名 | 属性 | 功能描述 | 参数类型 | 默认值 | 取值范围 | +| ----------------- | ---- | ------------------------------------------------------------ | ------ | -------- | ---------------------------------- | +| `--modelPath=` | 必选 | 指定需要进行基准测试的MindSpore Lite模型文件路径。 | String | null | - | +| `--accuracyThreshold=` | 可选 | 指定准确度阈值。 | Float | 0.5 | - | +| `--calibDataPath=` | 可选 | 指定标杆数据的文件路径。标杆数据作为该测试模型的对比输出,是该测试模型使用相同输入并由其它深度学习框架前向推理而来。 | String | null | - | +| `--cpuBindMode=` | 可选 | 指定模型推理程序运行时绑定的CPU核类型。 | Integer | 1 | -1:表示中核
1:表示大核
0:表示不绑定 | +| `--device=` | 可选 | 指定模型推理程序运行的设备类型。 | String | CPU | CPU、GPU | +| `--help` | 可选 | 显示`benchmark`命令的帮助信息。 | - | - | - | +| `--inDataPath=` | 可选 | 指定测试模型输入数据的文件路径。如果未设置,则使用随机输入。 | String | null | - | +| `--inDataType=` | 可选 | 指定测试模型输入数据的文件类型。 | String | bin | img:表示输入数据的文件类型为图片
bin:表示输入数据的类型为二进制文件 | +| `--loopCount=` | 可选 | 指定Benchmark工具进行基准测试时,测试模型的前向推理运行次数,其值为正整数。 | Integer | 10 | - | +| `--numThreads=` | 可选 | 指定模型推理程序运行的线程数。 | Integer | 2 | - | +| `--omModelPath=` | 可选 | 指定OM模型的文件路径,此参数仅当`device`类型为NPU时可选设置。 | String | null | - | +| `--resizeDims=` | 可选 | 指定测试模型输入数据需要调整的尺寸大小。 | String | null | - | +| `--warmUpLoopCount=` | 可选 | 指定测试模型在执行基准测试运行轮数前进行的模型预热推理次数。 | Integer | 3 | - | +| `--fp16Priority=` | 可选 | 指定是否优先使用float16算子。 | Bool | false | true, false | + +## 使用示例 + +对于不同的MindSpore Lite模型,在使用Benchmark工具对其进行基准测试时,可通过设置不同的参数,实现对其不同的测试功能。主要分为性能测试和精度测试。 + +### 性能测试 + +Benchmark工具进行的性能测试主要的测试指标为模型单次前向推理的耗时。在性能测试任务中,不需要设置`calibDataPath`等标杆数据参数。例如: + +```bash +./benchmark --modelPath=./models/test_benchmark.ms +``` + +这条命令使用随机输入,其他参数使用默认值。该命令执行后会输出如下统计信息,该信息显示了测试模型在运行指定推理轮数后所统计出的单次推理最短耗时、单次推理最长耗时和平均推理耗时。 + +``` +Model = test_benchmark.ms, numThreads = 2, MinRunTime = 72.228996 ms, MaxRuntime = 73.094002 ms, AvgRunTime = 72.556000 ms +``` + +### 精度测试 + +Benchmark工具进行的精度测试主要是通过设置标杆数据来对比验证MindSpore Lite模型输出的精确性。在精确度测试任务中,除了需要设置`modelPath`参数以外,还必须设置`calibDataPath`参数。例如: + +```bash +./benchmark --modelPath=./models/test_benchmark.ms --inDataPath=./input/test_benchmark.bin --device=CPU --accuracyThreshold=3 --calibDataPath=./output/test_benchmark.out +``` + +这条命令指定了测试模型的输入数据、标杆数据,同时指定了模型推理程序在CPU上运行,并指定了准确度阈值为3%。该命令执行后会输出如下统计信息,该信息显示了测试模型的单条输入数据、输出节点的输出结果和平均偏差率以及所有节点的平均偏差率。 + +``` +InData0: 139.947 182.373 153.705 138.945 108.032 164.703 111.585 227.402 245.734 97.7776 201.89 134.868 144.851 236.027 18.1142 22.218 5.15569 212.318 198.43 221.853 +================ Comparing Output data ================ +Data of node age_out : 5.94584e-08 6.3317e-08 1.94726e-07 1.91809e-07 8.39805e-08 7.66035e-08 1.69285e-07 1.46246e-07 6.03796e-07 1.77631e-07 1.54343e-07 2.04623e-07 8.89609e-07 3.63487e-06 4.86876e-06 1.23939e-05 3.09981e-05 3.37098e-05 0.000107102 0.000213932 0.000533579 0.00062465 0.00296401 0.00993984 0.038227 0.0695085 0.162854 0.123199 0.24272 0.135048 0.169159 0.0221256 0.013892 0.00502971 0.00134921 0.00135701 0.000383242 0.000163475 0.000136294 9.77864e-05 8.00793e-05 5.73874e-05 3.53858e-05 2.18535e-05 2.04467e-05 1.85286e-05 1.05075e-05 9.34751e-06 6.12732e-06 4.55476e-06 +Mean bias of node age_out : 0% +Mean bias of all nodes: 0% +======================================================= +``` diff --git a/tutorials/source_zh_cn/use/converter_tool.md b/tutorials/source_zh_cn/use/converter_tool.md new file mode 100644 index 0000000000000000000000000000000000000000..42ea5d2bb4b58d223be0783f9c2b45473b63cea8 --- /dev/null +++ b/tutorials/source_zh_cn/use/converter_tool.md @@ -0,0 +1,187 @@ +# 模型转换工具 + + + +- [模型转换工具](#模型转换工具) + - [概述](#概述) + - [Linux环境使用说明](#linux环境使用说明) + - [环境准备](#环境准备) + - [参数说明](#参数说明) + - [使用示例](#使用示例) + - [Windows环境使用说明](#windows环境使用说明) + - [环境准备](#环境准备-1) + - [参数说明](#参数说明-1) + - [使用示例](#使用示例-1) + + + + + +## 概述 + +MindSpore Lite提供离线转换模型功能的工具,支持多种类型的模型转换,转换后的模型可用于推理。命令行参数包含多种个性化选项,为用户提供方便的转换途径。 + +目前支持的输入格式有:MindSpore、TensorFlow Lite、Caffe和ONNX。 + +## Linux环境使用说明 + +### 环境准备 + +使用MindSpore Lite模型转换工具,需要进行如下环境准备工作。 + +- 编译:模型转换工具代码在MindSpore源码的`mindspore/lite/tools/converter`目录中,参考部署文档中的[环境要求](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html#id2)和[编译示例](https://www.mindspore.cn/lite/tutorial/zh-CN/master/deploy.html#id5),安装编译依赖基本项与模型转换工具所需附加项,并编译x86_64版本。 + +- 运行:参考部署文档中的[输出件说明](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html#id4),获得`converter`工具,并配置环境变量。 + +### 参数说明 + +使用`./converter_lite 
`即可完成转换,同时提供了多种参数设置,用户可根据需要来选择使用。 +此外,用户可输入`./converter_lite --help`获取实时帮助。 + +下面提供详细的参数说明。 + +| 参数 | 是否必选 | 参数说明 | 取值范围 | 默认值 | +| -------- | ------- | ----- | --- | ---- | +| `--help` | 否 | 打印全部帮助信息。 | - | - | +| `--fmk=` | 是 | 输入模型的原始格式。 | MS、CAFFE、TFLITE、ONNX | - | +| `--modelFile=` | 是 | 输入模型的路径。 | - | - | +| `--outputFile=` | 是 | 输出模型的路径(不存在时将自动创建目录),不需加后缀,可自动生成`.ms`后缀。 | - | - | +| `--weightFile=` | 转换Caffe模型时必选 | 输入模型weight文件的路径。 | - | - | +| `--quantType=` | 否 | 设置模型的量化类型。 | PostTraining:训练后量化
AwareTraining:感知量化。 | - | +|` --inputInferenceType=` | 否 | 设置感知量化模型输入数据类型,如果和原模型不一致则转换工具会在模型前插转换算子,使得转换后的模型输入类型和inputInferenceType保持一致。 | FLOAT、INT8 | FLOAT | +| `--inferenceType=` | 否 | 设置感知量化模型输出数据类型,如果和原模型不一致则转换工具会在模型前插转换算子,使得转换后的模型输出类型和inferenceType保持一致。 | FLOAT、INT8 | FLOAT | +| `--stdDev= `| 否 | 感知量化模型转换时用于设置输入数据的标准差。 | (0,+∞) | 128 | +| `--mean=` | 否 | 感知量化模型转换时用于设置输入数据的均值。 | [-128, 127] | -0.5 | + +> - 参数名和参数值之间用等号连接,中间不能有空格。 +> - Caffe模型一般分为两个文件:`*.prototxt`模型结构,对应`--modelFile`参数;`*.caffemodel`模型权值,对应`--weightFile`参数。 + + +### 使用示例 + +首先,在源码根目录下,输入命令进行编译,可参考`compile.md`。 +```bash +bash build.sh -I x86_64 +``` +> 目前模型转换工具仅支持x86_64架构。 + +下面选取了几个常用示例,说明转换命令的使用方法。 + +- 以Caffe模型LeNet为例,执行转换命令。 + + ```bash + ./converter_lite --fmk=CAFFE --modelFile=lenet.prototxt --weightFile=lenet.caffemodel --outputFile=lenet + ``` + + 本例中,因为采用了Caffe模型,所以需要模型结构、模型权值两个输入文件。再加上其他必需的fmk类型和输出路径两个参数,即可成功执行。 + + 结果显示为: + ``` + INFO [converter/converter.cc:190] Runconverter] CONVERTER RESULT: SUCCESS! + ``` + 这表示已经成功将Caffe模型转化为MindSpore Lite模型,获得新文件`lenet.ms`。 + +- 以MindSpore、TensorFlow Lite、ONNX模型格式和感知量化模型为例,执行转换命令。 + + - MindSpore模型`model.mindir` + ```bash + ./converter_lite --fmk=MS --modelFile=model.mindir --outputFile=model + ``` + + - TensorFlow Lite模型`model.tflite` + ```bash + ./converter_lite --fmk=TFLITE --modelFile=model.tflite --outputFile=model + ``` + + - ONNX模型`model.onnx` + ```bash + ./converter_lite --fmk=ONNX --modelFile=model.onnx --outputFile=model + ``` + + - TensorFlow Lite感知量化模型`model_quant.tflite` + ```bash + ./converter_lite --fmk=TFLITE --modelFile=model_quant.tflite --outputFile=model --quantType=AwareTraining + ``` + + - 感知量化模型输入设置为int8,输出设置为int8 + + ```bash + ./converter_lite --fmk=TFLITE --modelFile=model_quant.tflite --outputFile=model --quantType=AwareTraining --inputInferenceType=INT8 --inferenceType=INT8 + ``` + 以上几种情况下,均显示如下转换成功提示,且同时获得`model.ms`目标文件。 + + ``` + INFO [converter/converter.cc:190] Runconverter] CONVERTER RESULT: SUCCESS! + ``` + + +## Windows环境使用说明 + +### 环境准备 + +使用MindSpore Lite模型转换工具,需要进行如下环境准备工作。 + +- 编译:模型转换工具代码在MindSpore源码的`mindspore/lite/tools/converter`目录中,参考部署文档中的[环境要求](https://www.mindspore.cn/lite/docs/zh-CN/master/compile.html#id7)和[编译示例](https://www.mindspore.cn/lite/docs/zh-CN/master/deploy.html#id10),安装编译依赖基本项与模型转换工具所需附加项,并编译Windows版本。 + +- 运行:参考部署文档中的[输出件说明](https://www.mindspore.cn/lite/docs/zh-CN/master/compile.html#id9),获得`converter`工具,并将MinGW/bin目录下的几个依赖文件(libgcc_s_seh-1.dll、libwinpthread-1.dll、libssp-0.dll、libstdc++-6.dll)拷贝至`converter`工具的主目录。 + +### 参数说明 + +参考Linux环境模型转换工具的[参数说明](https://www.mindspore.cn/lite/docs/zh-CN/master/converter_tool.html#id4) + + +### 使用示例 + +首先,使用cmd工具在源码根目录下,输入命令进行编译,可参考`compile.md`。 +```bash +call build.bat lite +``` + +然后,设置日志打印级别为INFO。 +```bash +set MSLOG=INFO +``` + +下面选取了几个常用示例,说明转换命令的使用方法。 + +- 以Caffe模型LeNet为例,执行转换命令。 + + ```bash + call converter_lite --fmk=CAFFE --modelFile=lenet.prototxt --weightFile=lenet.caffemodel --outputFile=lenet + ``` + + 本例中,因为采用了Caffe模型,所以需要模型结构、模型权值两个输入文件。再加上其他必需的fmk类型和输出路径两个参数,即可成功执行。 + + 结果显示为: + ``` + INFO [converter/converter.cc:190] Runconverter] CONVERTER RESULT: SUCCESS! 
+ ``` + 这表示已经成功将Caffe模型转化为MindSpore Lite模型,获得新文件`lenet.ms`。 + +- 以MindSpore、TensorFlow Lite、ONNX模型格式和感知量化模型为例,执行转换命令。 + + - MindSpore模型`model.mindir` + ```bash + call converter_lite --fmk=MS --modelFile=model.mindir --outputFile=model + ``` + + - TensorFlow Lite模型`model.tflite` + ```bash + call converter_lite --fmk=TFLITE --modelFile=model.tflite --outputFile=model + ``` + + - ONNX模型`model.onnx` + ```bash + call converter_lite --fmk=ONNX --modelFile=model.onnx --outputFile=model + ``` + + - TensorFlow Lite感知量化模型`model_quant.tflite` + ```bash + call converter_lite --fmk=TFLITE --modelFile=model_quant.tflite --outputFile=model --quantType=AwareTraining + ``` + + 以上几种情况下,均显示如下转换成功提示,且同时获得`model.ms`目标文件。 + ``` + INFO [converter/converter.cc:190] Runconverter] CONVERTER RESULT: SUCCESS! + ``` + diff --git a/tutorials/source_zh_cn/use/post_training_quantization.md b/tutorials/source_zh_cn/use/post_training_quantization.md new file mode 100644 index 0000000000000000000000000000000000000000..93ba0889434710d8f6143a0469b970b8e1feb2d7 --- /dev/null +++ b/tutorials/source_zh_cn/use/post_training_quantization.md @@ -0,0 +1,63 @@ +# 训练后量化 + + + +- [训练后量化](#训练后量化) + - [概述](#概述) + - [参数说明](#参数说明) + - [使用示例](#使用示例) + + + + + +## 概述 + +对于已经训练好的`float32`模型,通过训练后量化将模型转为`int8`模型,不仅能减小模型大小,而且能显著提高推理性能。在MindSpore端侧框架中,这部分功能集成在模型转换工具`conveter_lite`中,通过增加命令行参数,便能够转换得到量化后模型。 +目前训练后量化属于alpha阶段(支持部分网络,不支持多输入模型),正在持续完善中。 + +``` +./converter_lite --fmk=ModelType --modelFile=ModelFilePath --outputFile=ConvertedModelPath --quantType=PostTraining --config_file=config.cfg +``` +## 参数说明 + +| 参数 | 属性 | 功能描述 | 参数类型 | 默认值 | 取值范围 | +| -------- | ------- | ----- | ----- |----- | ----- | +| --quantType | 必选 | 设置为PostTraining,启用训练后量化 | String | - | 必须设置为PostTraining | +| --config_file | 必选 | 校准数据集配置文件路径 | String | - | - | + +为了计算激活值的量化参数,用户需要提供校准数据集。校准数据集最好来自真实推理场景,能表征模型的实际输入情况,数量在100个左右。 +校准数据集配置文件采用`key=value`的方式定义相关参数,需要配置的`key`如下: + +| 参数名 | 属性 | 功能描述 | 参数类型 | 默认值 | 取值范围 | +| -------- | ------- | ----- | ----- | ----- | ----- | +| image_path | 必选 | 存放校准数据集的目录 | String | - | 该目录存放可直接用于执行推理的输入数据。由于目前框架还不支持数据预处理,所有数据必须事先完成所需的转换,使得它们满足推理的输入要求。 | +| batch_count | 可选 | 使用的输入数目 | Integer | 100 | 大于0 | +| method_x | 可选 | 网络层输入输出数据量化算法 | String | KL | KL,MAX_MIN。 KL: 基于[KL散度](http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf)对数据范围作量化校准; MAX_MIN:基于最大值、最小值计算数据的量化参数。 在模型以及数据集比较较简单的情况下,推荐使用MAX_MIN | +| thread_num | 可选 | 使用校准数据集执行推理流程时的线程数 | Integer | 1 | 大于0 | + +## 使用示例 + +1. 正确编译出`converter_lite`可执行文件。 +2. 准备校准数据集,假设存放在`/dir/images`目录,编写配置文件`config.cfg`,内容如下: + ``` + image_path=/dir/images + batch_count=100 + method_x=MAX_MIN + thread_num=1 + ``` + 校准数据集可以选择测试数据集的子集,要求`/dir/images`目录下存放的每个文件均是预处理好的输入数据,每个文件都可以直接用于推理的输入。 +3. 以MindSpore模型为例,执行带训练后量化的模型转换命令: + ``` + ./converter_lite --fmk=MS --modelFile=lenet.ms --outputFile=lenet_quant --quantType=PostTraining --config_file=config.cfg + ``` +4. 
上述命令执行成功后,便可得到量化后的模型lenet_quant.ms,通常量化后的模型大小会下降到FP32模型的1/4。 + +## 部分模型精度结果 + + | 模型 | 测试数据集 | method_x | FP32模型精度 | 训练后量化精度 | 说明 | + | -------- | ------- | ----- | ----- | ----- | ----- | + | [Inception_V3](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_v3_2018_04_27.tgz) | [ImageNet](http://image-net.org/) | KL | 77.92% | 77.95% | 校准数据集随机选择ImageNet Validation数据集中的100张 | + | [Mobilenet_V1_1.0_224](https://torage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz) | [ImageNet](http://image-net.org/) | KL | 70.96% | 70.69% | 校准数据集随机选择ImageNet Validation数据集中的100张 | + +> 以上所有结果均在x86环境上测得。 diff --git a/tutorials/source_zh_cn/use/runtime.md b/tutorials/source_zh_cn/use/runtime.md new file mode 100644 index 0000000000000000000000000000000000000000..7444b59305fdbb3900255962a1ed6af82729b8d0 --- /dev/null +++ b/tutorials/source_zh_cn/use/runtime.md @@ -0,0 +1,382 @@ +# Runtime使用指南 + + + +- [Runtime使用指南](#runtime使用指南) + - [概述](#概述) + - [读取模型](#读取模型) + - [创建会话](#创建会话) + - [创建上下文](#创建上下文) + - [创建会话](#创建会话-1) + - [使用示例](#使用示例) + - [图编译](#图编译) + - [可变维度](#可变维度) + - [使用示例](#使用示例-1) + - [图编译](#图编译-1) + - [输入数据](#输入数据) + - [获取输入Tensor](#获取输入tensor) + - [数据拷贝](#数据拷贝) + - [使用示例](#使用示例-2) + - [图执行](#图执行) + - [执行会话](#执行会话) + - [绑核](#绑核) + - [回调运行](#回调运行) + - [使用示例](#使用示例-3) + - [获取输出](#获取输出) + - [获取输出Tensor](#获取输出tensor) + - [使用示例](#使用示例-4) + - [获取版本号](#获取版本号) + - [使用示例](#使用示例-5) + + + + + +## 概述 + +通过MindSpore Lite模型转换后,需在Runtime中完成模型的推理执行流程。 + +Runtime总体使用流程如下图所示: + +![img](../images/side_infer_process.png) + +包含的组件及功能如下所述: +- `Model`:MindSpore Lite使用的模型,通过用户构图或直接加载网络,来实例化算子原型的列表。 +- `Lite Session`:提供图编译的功能,并调用图执行器进行推理。 +- `Scheduler`:算子异构调度器,根据异构调度策略,为每一个算子选择合适的kernel,构造kernel list,并切分子图。 +- `Executor`:图执行器,执行kernel list,动态分配和释放Tensor。 +- `Operator`:算子原型,包含算子的属性,以及shape、data type和format的推导方法。 +- `Kernel`:算子库提供算子的具体实现,提供算子forward的能力。 +- `Tensor`:MindSpore Lite使用的Tensor,提供了Tensor内存操作的功能和接口。 + +## 读取模型 + +在MindSpore Lite中,模型文件是从模型转换工具转换得到的`.ms`文件。进行模型推理时,需要从文件系统加载模型,并进行模型解析,这部分操作主要在Model中实现。Model持有权重数据、算子属性等模型数据。 + +模型通过Model类的静态`Import`方法从内存数据中创建。函数返回的`Model`实例是一个指针,通过`new`创建,不再需要时,需要用户通过`delete`释放。 + +## 创建会话 + +使用MindSpore Lite执行推理时,Session是推理的主入口,通过Session我们可以进行图编译、图执行。 + +### 创建上下文 + +上下文会保存会话所需的一些基本配置参数,用于指导图编译和图执行,其定义如下: + +MindSpore Lite支持异构推理,推理时的主选后端由`Context`中的`device_ctx_`指定,默认为CPU。在进行图编译时,会根据主选后端进行算子选型调度。 + +MindSpore Lite内置一个进程共享的线程池,推理时通过`thread_num_`指定线程池的最大线程数,默认为2线程,推荐最多不超过4个线程,否则可能会影响性能。 + +MindSpore Lite支持动态内存分配和释放,如果没有指定`allocator`,推理时会生成一个默认的`allocator`,也可以通过`Context`方法在多个`Context`中共享内存分配器。 + +如果用户通过`new`创建`Context`,不再需要时,需要用户通过`delete`释放。一般在创建完Session后,Context即可释放。 + +### 创建会话 + +用上一步创建得到的`Context`,调用LiteSession的静态`CreateSession`方法来创建`LiteSession`。函数返回的`LiteSession`实例是一个指针,通过`new`创建,不再需要时,需要用户通过`delete`释放。 + +### 使用示例 + +下面示例代码演示了`Context`的创建,以及在两个`LiteSession`间共享内存池的功能: + +```cpp +auto context = new (std::nothrow) lite::Context; +if (context == nullptr) { + MS_LOG(ERROR) << "New context failed while running %s", modelName.c_str(); + return RET_ERROR; +} +// The preferred backend is GPU, which means, if there is a GPU operator, it will run on the GPU first, otherwise it will run on the CPU. +context->device_ctx_.type = lite::DT_GPU; +// The medium core takes priority in thread and core binding methods. This parameter will work in the BindThread interface. For specific binding effect, see the "Run Graph" section. 
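+// MID_CPU means mid-frequency cores are preferred when binding; HIGHER_CPU (large-core priority) and NO_BIND (no binding) are assumed to be the other CpuBindMode options in this version.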
+context->cpu_bind_mode_ = MID_CPU; +// Configure the number of worker threads in the thread pool to 2, including the main thread. +context->thread_num_ = 2; +// Allocators can be shared across multiple Contexts. +auto *context2 = new Context(context->thread_num_, context->allocator, context->device_ctx_); +context2->cpu_bind_mode_ = context->cpu_bind_mode_; +// Use Context to create Session. +auto session1 = session::LiteSession::CreateSession(context); +// After the LiteSession is created, the Context can be released. +delete (context); +if (session1 == nullptr) { + MS_LOG(ERROR) << "CreateSession failed while running %s", modelName.c_str(); + return RET_ERROR; +} +// session1 and session2 can share one memory pool. +auto session2 = session::LiteSession::CreateSession(context2); +delete (context2); +if (session == nullptr) { + MS_LOG(ERROR) << "CreateSession failed while running %s", modelName.c_str(); + return RET_ERROR; +} +``` + +## 图编译 + +### 可变维度 + +使用MindSpore Lite进行推理时,在已完成会话创建与图编译之后,如果需要对输入的shape进行Resize,则可以通过对输入的tensor重新设置shape,然后调用session的Resize()接口。 + +### 使用示例 + +下面代码演示如何对MindSpore Lite的输入进行Resize(): +```cpp +// Assume we have created a LiteSession instance named session. +auto inputs = session->GetInputs(); +std::vector resize_shape = {1, 128, 128, 3}; +// Assume the model has only one input,resize input shape to [1, 128, 128, 3] +inputs[0]->set_shape(resize_shape); +session->Resize(inputs); +``` + +### 图编译 + +在图执行前,需要调用`LiteSession`的`CompileGraph`接口进行图编译,进一步解析从文件中加载的Model实例,主要进行子图切分、算子选型调度。这部分会耗费较多时间,所以建议`ListSession`创建一次,编译一次,多次执行。 + +## 输入数据 + +### 获取输入Tensor + +在图执行前,需要将输入数据拷贝到模型的输入Tensor。 + +MindSpore Lite提供两种方法来获取模型的输入Tensor。 + +1. 使用`GetInputsByName`方法,根据模型输入节点的名称来获取模型输入Tensor中连接到该节点的Tensor的vector。 +2. 使用`GetInputs`方法,直接获取所有的模型输入Tensor的vector。 + +### 数据拷贝 + +当获取到模型的输入,就需要向Tensor中填入数据。通过`MSTensor`的`Size`方法来获取Tensor应该填入的数据大小,通过`data_type`方法来获取Tensor的数据类型,通过`MSTensor`的`MutableData`方法来获取可写的指针。 + +### 使用示例 + +下面示例代码演示了从`LiteSession`中获取整图输入`MSTensor`,并且向其中灌入模型输入数据的过程: + +```cpp +// Assume we have created a LiteSession instance named session. +auto inputs = session->GetInputs(); +// Assume that the model has only one input tensor. +auto in_tensor = inputs.front(); +if (in_tensor == nullptr) { + std::cerr << "Input tensor is nullptr" << std::endl; + return -1; +} +// It is omitted that users have read the model input file and generated a section of memory buffer: input_buf, as well as the byte size of input_buf: data_size. +if (in_tensor->Size() != data_size) { + std::cerr << "Input data size is not suit for model input" << std::endl; + return -1; +} +auto *in_data = in_tensor->MutableData(); +if (in_data == nullptr) { + std::cerr << "Data of in_tensor is nullptr" << std::endl; + return -1; +} +memcpy(in_data, input_buf, data_size); +// Users need to free input_buf. +// The elements in the inputs are managed by MindSpore Lite so that users do not need to free inputs. +``` + +需要注意的是: +- MindSpore Lite的模型输入Tensor中的数据排布必须是NHWC。 +- 模型的输入`input_buf`是用户从磁盘读取的,当拷贝给模型输入Tensor以后,用户需要自行释放`input_buf`。 +- `GetInputs`和`GetInputsByName`方法返回的vector不需要用户释放。 + +## 图执行 + +### 执行会话 + +MindSpore Lite会话在进行图编译以后,即可使用`LiteSession`的`RunGraph`进行模型推理。 + +### 绑核 + +MindSpore Lite内置线程池支持绑核、解绑操作,通过调用`BindThread`接口,可以将线程池中的工作线程绑定到指定CPU核,用于性能分析。绑核操作与创建`LiteSession`时用户指定的上下文有关,绑核操作会根据上下文中的绑核策略进行线程与CPU的亲和性设置。 + +需要注意的是,绑核是一个亲和性操作,不保证一定能绑定到指定的CPU核,会受到系统调度的影响。而且绑核后,需要在执行完代码后进行解绑操作,示例如下: + +```cpp +// Assume we have created a LiteSession instance named session. 
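+// BindThread(true) binds the worker threads of the shared thread pool to CPU cores according to the bind mode configured in the Context; it must be paired with BindThread(false) after RunGraph to unbind.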
+
+## 输入数据
+
+### 获取输入Tensor
+
+在图执行前,需要将输入数据拷贝到模型的输入Tensor。
+
+MindSpore Lite提供两种方法来获取模型的输入Tensor。
+
+1. 使用`GetInputsByName`方法,根据模型输入节点的名称来获取模型输入Tensor中连接到该节点的Tensor的vector。
+2. 使用`GetInputs`方法,直接获取所有的模型输入Tensor的vector。
+
+### 数据拷贝
+
+当获取到模型的输入Tensor后,就需要向Tensor中填入数据。通过`MSTensor`的`Size`方法来获取Tensor应该填入的数据大小,通过`data_type`方法来获取Tensor的数据类型,通过`MSTensor`的`MutableData`方法来获取可写的指针。
+
+### 使用示例
+
+下面示例代码演示了从`LiteSession`中获取整图输入`MSTensor`,并且向其中灌入模型输入数据的过程:
+
+```cpp
+// Assume we have created a LiteSession instance named session.
+auto inputs = session->GetInputs();
+// Assume that the model has only one input tensor.
+auto in_tensor = inputs.front();
+if (in_tensor == nullptr) {
+  std::cerr << "Input tensor is nullptr" << std::endl;
+  return -1;
+}
+// Assume that the user has already read the model input file into a memory buffer named input_buf, whose byte size is data_size.
+if (in_tensor->Size() != data_size) {
+  std::cerr << "Input data size does not match the model input" << std::endl;
+  return -1;
+}
+auto *in_data = in_tensor->MutableData();
+if (in_data == nullptr) {
+  std::cerr << "Data of in_tensor is nullptr" << std::endl;
+  return -1;
+}
+memcpy(in_data, input_buf, data_size);
+// Users need to free input_buf.
+// The elements in the inputs are managed by MindSpore Lite so that users do not need to free inputs.
+```
+
+需要注意的是:
+- MindSpore Lite的模型输入Tensor中的数据排布必须是NHWC。
+- 模型的输入`input_buf`是用户从磁盘读取的,当拷贝给模型输入Tensor以后,用户需要自行释放`input_buf`。
+- `GetInputs`和`GetInputsByName`方法返回的vector不需要用户释放。
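+
+针对上文提到的`GetInputsByName`接口,下面给出一段按输入节点名称获取输入`MSTensor`的示意代码,其中输入节点名称`input_node_name_0`仅为示例性假设:
+
+```cpp
+// Assume we have created a LiteSession instance named session.
+// Assume that the model has an input node named input_node_name_0. The node name here is only an example.
+auto input_vec = session->GetInputsByName("input_node_name_0");
+// Assume that the input node has only one input tensor.
+auto in_tensor = input_vec.front();
+if (in_tensor == nullptr) {
+  std::cerr << "Input tensor is nullptr" << std::endl;
+  return -1;
+}
+```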
+
+## 图执行
+
+### 执行会话
+
+MindSpore Lite会话在进行图编译以后,即可使用`LiteSession`的`RunGraph`进行模型推理。
+
+### 绑核
+
+MindSpore Lite内置线程池支持绑核、解绑操作,通过调用`BindThread`接口,可以将线程池中的工作线程绑定到指定CPU核,用于性能分析。绑核操作与创建`LiteSession`时用户指定的上下文有关,绑核操作会根据上下文中的绑核策略进行线程与CPU的亲和性设置。
+
+需要注意的是,绑核是一个亲和性操作,不保证一定能绑定到指定的CPU核,会受到系统调度的影响。而且绑核后,需要在执行完代码后进行解绑操作,示例如下:
+
+```cpp
+// Assume we have created a LiteSession instance named session.
+session->BindThread(true);
+auto ret = session->RunGraph();
+if (ret != mindspore::lite::RET_OK) {
+  std::cerr << "RunGraph failed" << std::endl;
+  delete session;
+  return -1;
+}
+session->BindThread(false);
+```
+
+> 绑核参数有两种选择:大核优先和中核优先。
+> 判定大核和中核的规则其实是根据CPU核的频率而不是根据CPU的架构,对于没有大中小核之分的CPU架构,在该规则下也可以区分大核和中核。
+> 绑定大核优先是指线程池中的线程从频率最高的核开始绑定,第一个线程绑定在频率最高的核上,第二个线程绑定在频率第二高的核上,以此类推。
+> 对于中核优先,中核的定义是根据经验来定义的,默认设定中核是第三和第四高频率的核,当绑定策略为中核优先时,会优先绑定到中核上,当中核不够用时,会往小核上进行绑定。
+
+### 回调运行
+
+MindSpore Lite可以在调用`RunGraph`时,传入两个`KernelCallBack`函数指针来回调推理模型,相比于一般的图执行,回调运行可以在运行过程中获取额外的信息,帮助开发者进行性能分析、Bug调试等。额外的信息包括:
+- 当前运行的节点名称
+- 推理当前节点前的输入输出Tensor
+- 推理当前节点后的输入输出Tensor
+
+### 使用示例
+
+下面示例代码演示了使用`LiteSession`进行图编译,并定义了两个回调函数作为前置回调指针和后置回调指针,传入到`RunGraph`接口进行回调推理,并演示了一次图编译,多次图执行的使用场景:
+
+```cpp
+// Assume we have created a LiteSession instance named session and a Model instance named model before.
+// For the methods of creating the model and the session, refer to the "Import Model" and "Create Session" sections.
+auto ret = session->CompileGraph(model);
+if (ret != RET_OK) {
+  std::cerr << "CompileGraph failed" << std::endl;
+  // session and model need to be released by users manually.
+  delete (session);
+  delete (model);
+  return ret;
+}
+// Copy input data into the input tensors. Users can refer to the "Input Data" section. We use random data here.
+auto inputs = session->GetInputs();
+for (auto in_tensor : inputs) {
+  if (in_tensor == nullptr) {
+    std::cerr << "Input tensor is nullptr" << std::endl;
+    return -1;
+  }
+  // When calling the MutableData method, if the data in MSTensor is not allocated, it will be malloced. After allocation, the data in MSTensor can be considered as random data.
+  (void) in_tensor->MutableData();
+}
+// Definition of the callback function before forwarding an operator.
+auto before_call_back_ = [&](const std::vector<tensor::MSTensor *> &before_inputs,
+                             const std::vector<tensor::MSTensor *> &before_outputs,
+                             const session::CallBackParam &call_param) {
+  std::cout << "Before forwarding " << call_param.name_callback_param << std::endl;
+  return true;
+};
+// Definition of the callback function after forwarding an operator.
+auto after_call_back_ = [&](const std::vector<tensor::MSTensor *> &after_inputs,
+                            const std::vector<tensor::MSTensor *> &after_outputs,
+                            const session::CallBackParam &call_param) {
+  std::cout << "After forwarding " << call_param.name_callback_param << std::endl;
+  return true;
+};
+// Pass the callback functions when performing the model inference process.
+ret = session->RunGraph(before_call_back_, after_call_back_);
+if (ret != RET_OK) {
+  MS_LOG(ERROR) << "Run graph failed.";
+  return RET_ERROR;
+}
+// CompileGraph costs much time, so a better solution is to call CompileGraph only once and RunGraph many times.
+for (size_t i = 0; i < 10; i++) {
+  ret = session->RunGraph();
+  if (ret != RET_OK) {
+    MS_LOG(ERROR) << "Run graph failed.";
+    return RET_ERROR;
+  }
+}
+// session and model need to be released by users manually.
+delete (session);
+delete (model);
+```
+
+## 获取输出
+
+### 获取输出Tensor
+
+MindSpore Lite在执行完推理后,就可以获取模型的推理结果。
+
+MindSpore Lite提供四种方法来获取模型的输出`MSTensor`。
+1. 使用`GetOutputsByNodeName`方法,根据模型输出节点的名称来获取模型输出`MSTensor`中连接到该节点的Tensor的vector。
+2. 使用`GetOutputMapByNode`方法,直接获取所有的模型输出节点的名称和连接到该节点的模型输出`MSTensor`的一个map。
+3. 使用`GetOutputByTensorName`方法,根据模型输出Tensor的名称来获取对应的模型输出`MSTensor`。
+4. 使用`GetOutputMapByTensor`方法,直接获取所有的模型输出`MSTensor`的名称和`MSTensor`指针的一个map。
+
+当获取到模型的输出Tensor后,就可以从中读取推理结果。通过`MSTensor`的`Size`方法来获取Tensor中数据的字节数,通过`data_type`方法来获取`MSTensor`的数据类型,通过`MSTensor`的`MutableData`方法来获取可读写的内存指针。
+
+### 使用示例
+
+下面示例代码演示了使用`GetOutputMapByNode`接口获取输出`MSTensor`,并打印了每个输出`MSTensor`的前十个数据或所有数据:
+
+```cpp
+// Assume we have created a LiteSession instance named session before.
+auto output_map = session->GetOutputMapByNode();
+// Assume that the model has only one output node.
+auto out_node_iter = output_map.begin();
+std::string name = out_node_iter->first;
+// Assume that the unique output node has only one output tensor.
+auto out_tensor = out_node_iter->second.front();
+if (out_tensor == nullptr) {
+  std::cerr << "Output tensor is nullptr" << std::endl;
+  return -1;
+}
+// Assume that the data format of the output data is float32.
+if (out_tensor->data_type() != mindspore::TypeId::kNumberTypeFloat32) {
+  std::cerr << "Output of lenet should be in float32" << std::endl;
+  return -1;
+}
+auto *out_data = reinterpret_cast<float *>(out_tensor->MutableData());
+if (out_data == nullptr) {
+  std::cerr << "Data of out_tensor is nullptr" << std::endl;
+  return -1;
+}
+// Print the first 10 float values or all output data of the output tensor.
+std::cout << "Output data: ";
+for (size_t i = 0; i < 10 && i < static_cast<size_t>(out_tensor->ElementsNum()); i++) {
+  std::cout << " " << out_data[i];
+}
+std::cout << std::endl;
+// The elements in outputs do not need to be freed by users, because outputs are managed by MindSpore Lite.
+```
+
+需要注意的是,`GetOutputsByNodeName`、`GetOutputMapByNode`、`GetOutputByTensorName`和`GetOutputMapByTensor`方法返回的vector或map不需要用户释放。
+
+下面示例代码演示了使用`GetOutputsByNodeName`接口获取输出`MSTensor`的方法:
+
+```cpp
+// Assume we have created a LiteSession instance named session before.
+// Assume that the model has an output node named output_node_name_0.
+auto output_vec = session->GetOutputsByNodeName("output_node_name_0");
+// Assume that the output node named output_node_name_0 has only one output tensor.
+auto out_tensor = output_vec.front();
+if (out_tensor == nullptr) {
+  std::cerr << "Output tensor is nullptr" << std::endl;
+  return -1;
+}
+```
+
+下面示例代码演示了使用`GetOutputMapByTensor`接口获取输出`MSTensor`的方法:
+
+```cpp
+// Assume we have created a LiteSession instance named session before.
+auto output_map = session->GetOutputMapByTensor();
+// Assume that the model has only one output tensor.
+auto out_tensor = output_map.begin()->second;
+if (out_tensor == nullptr) {
+  std::cerr << "Output tensor is nullptr" << std::endl;
+  return -1;
+}
+```
+
+下面示例代码演示了使用`GetOutputByTensorName`接口获取输出`MSTensor`的方法:
+
+```cpp
+// Assume we have created a LiteSession instance named session before.
+// Use the GetOutputTensorNames method to get the names of all output tensors of the model, which are in order.
+auto tensor_names = session->GetOutputTensorNames();
+// Use the output tensor names returned by GetOutputTensorNames as the key.
+for (auto tensor_name : tensor_names) {
+  auto out_tensor = session->GetOutputByTensorName(tensor_name);
+  if (out_tensor == nullptr) {
+    std::cerr << "Output tensor is nullptr" << std::endl;
+    return -1;
+  }
+}
+```
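+
+结合以上各节,下面给出一段从读取模型到获取输出的整体流程示意代码。该示例仅用于梳理调用顺序,为保持简短省略了模型文件读取与部分判空、错误处理,接口的具体签名请以实际头文件为准:
+
+```cpp
+// Assume model_buf and size hold the content and byte size of an .ms file, see the "读取模型" section.
+auto *model = mindspore::lite::Model::Import(model_buf, size);
+// Create the context and the session, see the "创建会话" section.
+auto *context = new (std::nothrow) lite::Context;
+auto *session = session::LiteSession::CreateSession(context);
+delete context;
+if (model == nullptr || session == nullptr) {
+  std::cerr << "Create model or session failed" << std::endl;
+  return -1;
+}
+// Compile the graph once.
+auto ret = session->CompileGraph(model);
+if (ret != mindspore::lite::RET_OK) {
+  std::cerr << "CompileGraph failed" << std::endl;
+  delete session;
+  delete model;
+  return ret;
+}
+// Fill the input tensors, see the "输入数据" section. Random data is used here for brevity.
+for (auto in_tensor : session->GetInputs()) {
+  (void) in_tensor->MutableData();
+}
+// Run inference.
+ret = session->RunGraph();
+if (ret != mindspore::lite::RET_OK) {
+  std::cerr << "RunGraph failed" << std::endl;
+  delete session;
+  delete model;
+  return ret;
+}
+// Read the outputs, see the "获取输出" section.
+for (auto &item : session->GetOutputMapByTensor()) {
+  auto *out_tensor = item.second;
+  std::cout << item.first << " size: " << out_tensor->Size() << std::endl;
+}
+// Release the session and the model when they are no longer used.
+delete session;
+delete model;
+```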
+
+## 获取版本号
+
+MindSpore Lite提供了`Version`方法可以获取版本号,包含在`include/version.h`头文件中,调用该方法可以得到版本号字符串。
+
+### 使用示例
+
+下面代码演示如何获取MindSpore Lite的版本号:
+
+```cpp
+#include "include/version.h"
+std::string version = mindspore::lite::Version();
+```
diff --git a/tutorials/source_zh_cn/use/timeprofiler_tool.md b/tutorials/source_zh_cn/use/timeprofiler_tool.md
new file mode 100644
index 0000000000000000000000000000000000000000..14bcfb006eda63a579220490ae1fa4bf10c34739
--- /dev/null
+++ b/tutorials/source_zh_cn/use/timeprofiler_tool.md
@@ -0,0 +1,93 @@
+# TimeProfiler工具
+
+- [TimeProfiler工具](#timeprofiler工具)
+    - [概述](#概述)
+    - [环境准备](#环境准备)
+    - [参数说明](#参数说明)
+    - [使用示例](#使用示例)
+
+## 概述
+
+TimeProfiler工具可以对MindSpore Lite模型网络层的前向推理进行耗时分析,由C++语言编码实现。
+
+## 环境准备
+
+使用TimeProfiler工具,需要进行如下环境准备工作。
+
+- 编译:TimeProfiler工具代码在MindSpore源码的`mindspore/lite/tools/time_profiler`目录中,参考部署文档中的[环境要求](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html#id2)和[编译示例](https://www.mindspore.cn/lite/tutorial/zh-CN/master/deploy.html#id5),安装编译依赖基本项,并执行编译。
+
+- 运行:参考部署文档中的[输出件说明](https://www.mindspore.cn/lite/tutorial/zh-CN/master/compile.html#id4),获得`time_profiler`工具,并配置环境变量。
+
+## 参数说明
+
+使用编译好的TimeProfiler工具进行模型网络层耗时分析时,其命令格式如下所示。
+
+```bash
+./timeprofiler --modelPath=<MODELPATH> [--help] [--loopCount=<LOOPCOUNT>] [--numThreads=<NUMTHREADS>] [--cpuBindMode=<CPUBINDMODE>] [--inDataPath=<INDATAPATH>] [--fp16Priority=<FP16PRIORITY>]
+```
+
+下面提供详细的参数说明。
+
+| 参数名 | 属性 | 功能描述 | 参数类型 | 默认值 | 取值范围 |
+| ----------------- | ---- | ------------------------------------------------------------ | ------ | -------- | ---------------------------------- |
+| `--help` | 可选 | 显示`timeprofiler`命令的帮助信息。 | - | - | - |
+| `--modelPath=<MODELPATH>` | 必选 | 指定需要进行耗时分析的MindSpore Lite模型的文件路径。 | String | null | - |
+| `--loopCount=<LOOPCOUNT>` | 可选 | 指定TimeProfiler工具进行耗时分析时,模型推理的运行次数,其值为正整数。 | Integer | 100 | - |
+| `--numThreads=<NUMTHREADS>` | 可选 | 指定模型推理程序运行的线程数。 | Integer | 4 | - |
+| `--cpuBindMode=<CPUBINDMODE>` | 可选 | 指定模型推理程序运行时绑定的CPU核类型。 | Integer | 1 | -1:表示中核<br>1:表示大核<br>0:表示不绑定 |
+| `--inDataPath=<INDATAPATH>` | 可选 | 指定模型输入数据的文件路径。如果未设置,则使用随机输入。 | String | null | - |
+| `--fp16Priority=<FP16PRIORITY>` | 可选 | 指定是否优先使用float16算子。 | Bool | false | true, false |
+
+## 使用示例
+
+使用TimeProfiler对`test_timeprofiler.ms`模型的网络层进行耗时分析,并且设置模型推理循环运行次数为10,则其命令代码如下:
+
+```bash
+./timeprofiler --modelPath=./models/test_timeprofiler.ms --loopCount=10
+```
+
+该条命令执行后,TimeProfiler工具会输出模型网络层运行耗时的相关统计信息。对于本例命令,输出的统计信息如下。其中统计信息按照`opName`和`opType`两种划分方式分别显示,`opName`表示算子名,`opType`表示算子类别,`avg`表示该算子的平均单次运行时间,`percent`表示该算子运行耗时占所有算子运行总耗时的比例,`calledTimess`表示该算子的运行次数,`opTotalTime`表示该算子运行指定次数的总耗时。最后,`total time`和`kernel cost`分别显示了该模型单次推理的平均耗时和模型推理中所有算子的平均耗时之和。
+
+```
+-----------------------------------------------------------------------------------------
+opName                                                  avg(ms)     percent     calledTimess    opTotalTime
+conv2d_1/convolution                                    2.264800    0.824012    10              22.648003
+conv2d_2/convolution                                    0.223700    0.081390    10              2.237000
+dense_1/BiasAdd                                         0.007500    0.002729    10              0.075000
+dense_1/MatMul                                          0.126000    0.045843    10              1.260000
+dense_1/Relu                                            0.006900    0.002510    10              0.069000
+max_pooling2d_1/MaxPool                                 0.035100    0.012771    10              0.351000
+max_pooling2d_2/MaxPool                                 0.014300    0.005203    10              0.143000
+max_pooling2d_2/MaxPool_nchw2nhwc_reshape_1/Reshape_0   0.006500    0.002365    10              0.065000
+max_pooling2d_2/MaxPool_nchw2nhwc_reshape_1/Shape_0     0.010900    0.003966    10              0.109000
+output/BiasAdd                                          0.005300    0.001928    10              0.053000
+output/MatMul                                           0.011400    0.004148    10              0.114000
+output/Softmax                                          0.013300    0.004839    10              0.133000
+reshape_1/Reshape                                       0.000900    0.000327    10              0.009000
+reshape_1/Reshape/shape                                 0.009900    0.003602    10              0.099000
+reshape_1/Shape                                         0.002300    0.000837    10              0.023000
+reshape_1/strided_slice                                 0.009700    0.003529    10              0.097000
+-----------------------------------------------------------------------------------------
+opType          avg(ms)     percent     calledTimess    opTotalTime
+Activation      0.006900    0.002510    10              0.069000
+BiasAdd         0.012800    0.004657    20              0.128000
+Conv2D          2.488500    0.905401    20              24.885004
+MatMul          0.137400    0.049991    20              1.374000
+Nchw2Nhwc       0.017400    0.006331    20              0.174000
+Pooling         0.049400    0.017973    20              0.494000
+Reshape         0.000900    0.000327    10              0.009000
+Shape           0.002300    0.000837    10              0.023000
+SoftMax         0.013300    0.004839    10              0.133000
+Stack           0.009900    0.003602    10              0.099000
+StridedSlice    0.009700    0.003529    10              0.097000
+
+total time : 2.90800 ms, kernel cost : 2.74851 ms
+
+-----------------------------------------------------------------------------------------
+```
\ No newline at end of file