From fe75a344de28e843ee1f27af950ae88fc05007ce Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=86=8A=E6=94=80?=
Date: Wed, 27 Aug 2025 16:08:17 +0800
Subject: [PATCH] =?UTF-8?q?=E6=B7=BB=E5=8A=A0model=20cache=E6=96=87?=
 =?UTF-8?q?=E6=A1=A3?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../api/source_zh_cn/api_cpp/mindspore.md     | 189 +++++++++++++++++-
 .../source_en/mindir/converter_tool_ascend.md |   6 +
 .../docs/source_en/mindir/runtime_python.md   |  57 +++++-
 .../mindir/converter_tool_ascend.md           |   6 +
 .../source_zh_cn/mindir/runtime_python.md     |  55 ++++-
 5 files changed, 303 insertions(+), 10 deletions(-)

diff --git a/docs/lite/api/source_zh_cn/api_cpp/mindspore.md b/docs/lite/api/source_zh_cn/api_cpp/mindspore.md
index 5e6d50b882..d30e973257 100644
--- a/docs/lite/api/source_zh_cn/api_cpp/mindspore.md
+++ b/docs/lite/api/source_zh_cn/api_cpp/mindspore.md
@@ -9,6 +9,8 @@
 | 类名 | 描述 | 云侧推理是否支持 | 端侧推理是否支持 |
 |--------------------------------------------------|---------------------------------------------------|--------|--------|
 | [Model](#model) | MindSpore中的模型,便于计算图管理。 | √ | √ |
+| [ModelExecutor](#modelexecutor) | 包装多个Model类,用于调度多个Model对象。 | √ | ✕ |
+| [MultiModelRunner](#multimodelrunner) | 包装多个ModelExecutor类,用于调度多个Model对象。 | √ | ✕ |
 
 ### 运行环境配置
 
@@ -1193,7 +1195,7 @@ using Key = struct MS_API Key {
   size_t len = 0;
   unsigned char key[32] = {0};
   Key() : len(0) {}
-  explicit Key(const char *dec_key, size_t key_len);
+  explicit Key(const char *dec_key, size_t key_len)
 };
 ```
 
@@ -2240,6 +2242,177 @@ Status Finalize()
 
   状态码。
 
+## ModelExecutor
+
+\#include <[multi_model_runner.h](https://gitee.com/mindspore/mindspore-lite/blob/master/mindspore-lite/include/api/multi_model_runner.h)>
+
+ModelExecutor定义了对Model的封装,用于调度多个Model的推理。
+
+### 构造函数
+
+```c++
+ModelExecutor()
+```
+
+```c++
+ModelExecutor(const std::vector<std::shared_ptr<ModelImpl>> &models, const std::vector<std::string> &executor_input_names,
+              const std::vector<std::string> &executor_output_names,
+              const std::vector<std::vector<std::string>> &subgraph_input_names)
+```
+
+- 参数
+
+    - `models`: 一个由ModelImplPtr组成的向量,用于在ModelExecutor中推理。
+    - `executor_input_names`: 由string组成的向量,当前ModelExecutor的输入名。
+    - `executor_output_names`: 由string组成的向量,当前ModelExecutor的输出名。
+    - `subgraph_input_names`: 由string向量组成的向量,当前ModelExecutor中所有模型的输入以及输出名。
+
+### 析构函数
+
+```c++
+~ModelExecutor()
+```
+
+### 公有成员函数
+
+| 函数 | 云侧推理是否支持 | 端侧推理是否支持 |
+|--------------------------------------------------------------------------------------------------------------------|---------|---------|
+| [Status Predict(const std::vector\<MSTensor\> &inputs, std::vector\<MSTensor\> *outputs)](#predict-2) | √ | ✕ |
+| [std::vector\<MSTensor\> GetInputs() const](#getinputs-1) | √ | ✕ |
+| [std::vector\<MSTensor\> GetOutputs() const](#getoutputs-1) | √ | ✕ |
+
+#### Predict
+
+```cpp
+Status Predict(const std::vector<MSTensor> &inputs, std::vector<MSTensor> *outputs)
+```
+
+ModelExecutor的推理接口。
+
+- 参数
+
+    - `inputs`: 模型输入按顺序排列的`vector<MSTensor>`。
+    - `outputs`: 输出参数,按顺序排列的`vector<MSTensor>`的指针,模型输出会按顺序填入该容器。
+
+- 返回值
+
+  状态码类`Status`对象,可以使用其公有函数`StatusCode`或`ToString`函数来获取具体错误码及错误信息。
+
+#### GetInputs
+
+```cpp
+std::vector<MSTensor> GetInputs() const
+```
+
+获取模型所有输入张量。
+
+- 返回值
+
+  包含模型所有输入张量的容器类型变量。
+
+#### GetOutputs
+
+```cpp
+std::vector<MSTensor> GetOutputs() const
+```
+
+获取模型所有输出张量。
+
+- 返回值
+
+  包含模型所有输出张量的容器类型变量。
+
+## MultiModelRunner
+
+\#include <[multi_model_runner.h](https://gitee.com/mindspore/mindspore-lite/blob/master/mindspore-lite/include/api/multi_model_runner.h)>
+
+MultiModelRunner用于加载包含多个Model的mindir文件,并提供调度多个模型进行推理的方式。
+
+### 构造函数
+
+```c++
+MultiModelRunner()
+```
+
+### 析构函数
+
+```c++
+~MultiModelRunner()
+```
+
+### 公有成员函数
+
+| 函数 | 云侧推理是否支持 | 端侧推理是否支持 |
+|--------------------------------------------------------------------------------------------------------------------|---------|---------|
+| [inline Status Build(const std::string &model_path, const ModelType &model_type, const std::shared_ptr\<Context\> &model_context = nullptr)](#build-5) | √ | ✕ |
+| [std::vector\<ModelExecutor\> GetModelExecutor() const](#getmodelexecutor) | √ | ✕ |
+| [inline Status LoadConfig(const std::string &config_path)](#loadconfig-1) | √ | ✕ |
+| [inline Status UpdateConfig(const std::string &section, const std::pair\<std::string, std::string\> &config)](#updateconfig-1) | √ | ✕ |
+
+#### Build
+
+```cpp
+inline Status Build(const std::string &model_path, const ModelType &model_type,
+                    const std::shared_ptr<Context> &model_context = nullptr)
+```
+
+根据路径读取加载模型,并将模型编译至可在Device上运行的状态。与Model接口中同名接口的差别在于传入的model_path指定的mindir文件中包含多个Model。
+
+- 参数
+
+    - `model_path`: 模型文件路径。
+    - `model_type`: `ModelType::kMindIR`,对应`mindir`模型(MindSpore导出或`converter_lite`工具导出)。
+    - `model_context`: 模型[Context](#context)。
+
+- 返回值
+
+  状态码类`Status`对象,可以使用其公有函数`StatusCode`或`ToString`函数来获取具体错误码及错误信息。
+
+#### GetModelExecutor
+
+```cpp
+std::vector<ModelExecutor> GetModelExecutor() const
+```
+
+获取MultiModelRunner中创建的多个ModelExecutor对象。
+
+- 返回值
+
+  包含MultiModelRunner中创建的所有ModelExecutor对象的列表。
+
+#### LoadConfig
+
+```cpp
+inline Status LoadConfig(const std::string &config_path)
+```
+
+根据路径读取配置文件。
+
+- 参数
+
+    - `config_path`: 配置文件路径。
+
+- 返回值
+
+  状态码类`Status`对象,可以使用其公有函数`StatusCode`或`ToString`函数来获取具体错误码及错误信息。
+
+#### UpdateConfig
+
+```cpp
+inline Status UpdateConfig(const std::string &section, const std::pair<std::string, std::string> &config)
+```
+
+刷新配置。读取配置文件相对比较耗时,如果只有少部分配置发生变化,可以通过该接口更新部分配置。
+
+- 参数
+
+    - `section`: 配置的章节名。
+    - `config`: 要更新的配置对。
+
+- 返回值
+
+  状态码类`Status`对象,可以使用其公有函数`StatusCode`或`ToString`函数来获取具体错误码及错误信息。
+
 ## MSTensor
 
 \#include <[types.h](https://gitee.com/mindspore/mindspore/blob/master/include/api/types.h)>
 
@@ -3186,7 +3359,7 @@ const SchemaVersion GetVersion()
 ### 构造函数
 
 ```cpp
-AbstractDelegate();
+AbstractDelegate()
 AbstractDelegate(const std::vector<mindspore::MSTensor> &inputs, const std::vector<mindspore::MSTensor> &outputs) : inputs_(inputs), outputs_(outputs)
 ```
 
@@ -3242,7 +3415,7 @@ std::vector<mindspore::MSTensor> outputs_
 ### 构造函数
 
 ```cpp
-IDelegate();
+IDelegate()
 IDelegate(const std::vector<mindspore::MSTensor> &inputs, const std::vector<mindspore::MSTensor> &outputs) : AbstractDelegate(inputs, outputs)
 ```
 
@@ -4380,8 +4553,8 @@ enum CompCode : uint32_t {
 ### 构造函数
 
 ```cpp
-  InputAndOutput();
-  InputAndOutput(const std::shared_ptr<CellBase> &cell, const std::vector<InputAndOutput> &prev, int32_t index);
+  InputAndOutput()
+  InputAndOutput(const std::shared_ptr<CellBase> &cell, const std::vector<InputAndOutput> &prev, int32_t index)
 ```
 
 - 参数
 
@@ -4803,9 +4976,9 @@ ModelParallelRunner()
 |-------------------------------------------------------------|---------|---------|
 | [inline Status Init(const std::string &model_path, const std::shared_ptr\<RunnerConfig\> &runner_config = nullptr)](#init) | √ | ✕ |
 | [Status Init(const void *model_data, const size_t data_size, const std::shared_ptr\<RunnerConfig\> &runner_config = nullptr)](#init-1) | √ | ✕ |
-| [std::vector\<MSTensor\> GetInputs()](#getinputs) | √ | ✕ |
-| [std::vector\<MSTensor\> GetOutputs()](#getoutputs) | √ | ✕ |
-| [Status Predict(const std::vector\<MSTensor\> &inputs, std::vector\<MSTensor\> *outputs, const MSKernelCallBack &before = nullptr, const MSKernelCallBack &after = nullptr)](#predict) | √ | ✕ |
+| [std::vector\<MSTensor\> GetInputs()](#getinputs-2) | √ | ✕ |
+| [std::vector\<MSTensor\> GetOutputs()](#getoutputs-2) | √ | ✕ |
+| [Status Predict(const std::vector\<MSTensor\> &inputs, std::vector\<MSTensor\> *outputs, const MSKernelCallBack &before = nullptr, const MSKernelCallBack &after = nullptr)](#predict-3) | √ | ✕ |
 
 #### Init
 
diff --git a/docs/lite/docs/source_en/mindir/converter_tool_ascend.md b/docs/lite/docs/source_en/mindir/converter_tool_ascend.md
index 82fec57ecf..6d4f92cc5e 100644
--- a/docs/lite/docs/source_en/mindir/converter_tool_ascend.md
+++ b/docs/lite/docs/source_en/mindir/converter_tool_ascend.md
@@ -86,6 +86,12 @@
 | `ge.externalWeight` | Optional | Do you want to save the weights of constant nodes separately in a file. | String | Options: `"1"`, `"0"` |
 | `ge.exec.exclude_engines` | Optional | Set the network model not to use one or some acceleration engines. | String | Options: `"AiCore"`, `"AiVec"`, `"AiCpu"` |
 
+Table 4: Configure [SplitGraph] parameter
+
+| Parameters | Attributes | Functions Description | Types | Values Description |
+| ----------------------------------- | ---- | ------------------------------------------------------------ | -------- | ------ |
+| `split_node_name` | Optional | Specify the subgraph partitioning nodes. | String | The format is `[[input nodes],[output nodes]]`. When the input nodes are empty, the subgraph uses the inputs of the entire graph. The output nodes cannot be empty. |
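+
+For reference, a minimal converter configuration file using this option might look like the following sketch. The node names `Conv2D_1` and `ReLU_5` are placeholders for illustration only; replace them with the actual node names in your model, and check the converter tool documentation of your release for the exact syntax.
+
+```
+[SplitGraph]
+split_node_name=[[Conv2D_1],[ReLU_5]]
+```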
 
 ## Dynamic Shape Configuration
 
 In some inference scenarios, such as detecting a target and then executing the target recognition network, the number of targets is not fixed resulting in a variable input BatchSize for the target recognition network. If each inference is computed at the maximum BatchSize or maximum resolution, it will result in wasted computational resources. Therefore, it needs to support dynamic BatchSize and dynamic resolution scenarios during inference. Lite inference on Ascend supports dynamic BatchSize and dynamic resolution scenarios. The dynamic_dims dynamic parameter in [ascend_context] is configured via configFile in the convert phase, and the model [Resize](https://www.mindspore.cn/lite/docs/en/master/mindir/runtime_cpp.html#dynamic-shape-input) is used during inference, to change the input shape.
 
diff --git a/docs/lite/docs/source_en/mindir/runtime_python.md b/docs/lite/docs/source_en/mindir/runtime_python.md
index 15928f2a7c..e95a5bdca3 100644
--- a/docs/lite/docs/source_en/mindir/runtime_python.md
+++ b/docs/lite/docs/source_en/mindir/runtime_python.md
@@ -4,7 +4,7 @@
 
 ## Overview
 
-This tutorial provides a sample program for MindSpore Lite to perform cloud-side inference, demonstrating the [Python interface](https://mindspore.cn/lite/api/en/master/mindspore_lite.html) to perform the basic process of cloud-side inference through file input, inference execution, and inference result printing, and enables users to quickly understand the use of MindSpore Lite APIs related to cloud-side inference execution. The related files are put in the directory [mindspore-lite/examples/cloud_infer/quick_start_python](https://gitee.com/mindspore/mindspore-lite/tree/master/mindspore-lite/examples/cloud_infer/quick_start_python).
+This tutorial provides a sample program for MindSpore Lite to perform cloud-side inference, demonstrating the [Python interface](https://mindspore.cn/lite/api/en/master/mindspore_lite.html) to perform the basic process of cloud-side inference through file input, inference execution, inference result printing, dynamic weight update, and subgraph splitting inference, and enables users to quickly understand the use of MindSpore Lite APIs related to cloud-side inference execution. The related files are put in the directory [mindspore-lite/examples/cloud_infer/quick_start_python](https://gitee.com/mindspore/mindspore-lite/tree/master/mindspore-lite/examples/cloud_infer/quick_start_python).
 
 MindSpore Lite cloud-side inference is supported to run in Linux environment deployment only. Atlas 200/300/500 inference product, Atlas inference series, Atlas training series and CPU hardware backends are supported.
 
@@ -16,6 +16,10 @@ The following is an example of how to use the Python Cloud-side Inference Demo o
 
 - For a description of the Python Cloud-side Inference Demo content, see the [Demo Content Description](#demo-content-description) section for details.
 
+- For a description of Weight Update content, see the [Dynamic Weight Update](#dynamic-weight-update) section for details.
+
+- For a description of Subgraph Splitting Inference content, see the [Subgraph Splitting Inference](#subgraph-splitting-inference) section for details.
+
 ## One-click Installation
 
 This session introduces the installation of MindSpore Lite for Python version 3.7 via pip on a Linux-x86_64 system with a CPU environment, taking the new Ubuntu 18.04 as an example.
 
@@ -202,3 +206,54 @@ new_weight = mslite.Tensor(data)
 new_weights = [new_weight]
 model.update_weights([new_weights])
 ```
+
+## Subgraph Splitting Inference
+
+When performing offline model conversion, if the `split_node_name` parameter under `[SplitGraph]` is configured in the configuration file, the subgraph splitting inference feature must be used to create and run the model. This feature splits the original model into multiple submodels according to the specified configuration during conversion. In this way, users can obtain the outputs of intermediate layers of the model, or provide inputs to layers in the middle of the model, so that only part of the model is executed during inference.
+
+### Creating MultiModelRunner
+
+Create a MultiModelRunner object as shown below:
+
+```python
+import mindspore_lite as mslite
+model_path = "path_to_model"
+context = mslite.Context()
+context.target = ["ascend"]
+context.ascend.device_id = 0
+runner = mslite.MultiModelRunner()
+runner.build_from_file(model_path, mslite.ModelType.MINDIR, context)
+```
+
+### Obtaining ModelExecutor
+
+A ModelExecutor can be understood as a subgraph exported during model conversion according to the user-specified inputs and outputs. When a MultiModelRunner is created, multiple ModelExecutor instances are generated at the same time for inference. Obtain the ModelExecutor instances as follows:
+
+```python
+execs = runner.get_model_executor()
+```
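+
+Before wiring data between subgraphs, it can help to print the input and output tensor names of each ModelExecutor to see how the subgraphs are connected. The following is a minimal sketch; it assumes, as in the inference example below, that the tensors returned by `get_inputs()` and `get_outputs()` expose a `name` attribute:
+
+```python
+# `runner` is the MultiModelRunner built in the previous section.
+for idx, executor in enumerate(runner.get_model_executor()):
+    input_names = [tensor.name for tensor in executor.get_inputs()]
+    output_names = [tensor.name for tensor in executor.get_outputs()]
+    print("executor", idx, "inputs:", input_names, "outputs:", output_names)
+```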
+
+### Executing ModelExecutor Inference
+
+When performing inference with a ModelExecutor, first identify its input and output names. The inputs of a ModelExecutor may come either from the inputs of the entire graph or from the outputs of other ModelExecutor instances. The inputs and outputs can be retrieved with the following methods:
+
+- `ModelExecutor.get_inputs()` returns the inputs of the current ModelExecutor.
+- `ModelExecutor.get_outputs()` returns the outputs of the current ModelExecutor.
+
+Note that the split subgraphs may have more inputs and outputs than specified in the conversion configuration file, and the number of subgraphs may also exceed the number specified there, because subgraph splitting introduces additional inputs taken from other subgraphs in order to avoid duplicating nodes across subgraphs.
+
+Refer to the following code for inference with ModelExecutor:
+
+```python
+import numpy as np
+
+dtype_map = {
+    mslite.DataType.FLOAT32: np.float32,
+    mslite.DataType.INT32: np.int32,
+    mslite.DataType.FLOAT16: np.float16,
+    mslite.DataType.INT8: np.int8
+}
+for executor in execs:
+    executor_inputs = executor.get_inputs()
+    executor_outputs = executor.get_outputs()
+    for tensor in executor_inputs:
+        print("input name:", tensor.name, " input.shape:", tensor.shape, " input.dtype:", tensor.dtype)
+        data = np.random.randn(*tensor.shape).astype(dtype_map[tensor.dtype])
+        tensor.set_data_from_numpy(data)
+    executor.predict(executor_inputs)
+```
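+
+The example above feeds random data into every input. When one ModelExecutor consumes the outputs of another, you can instead match the tensors by name and copy the produced data across. The following is only a sketch: it assumes that `executor.predict()` returns the output tensors (as `mslite.Model.predict()` does), that tensor names are unique within the graph, and it reuses `execs` and `dtype_map` from the example above:
+
+```python
+import numpy as np
+
+produced = {}  # tensor name -> numpy data produced by an earlier executor
+for executor in execs:
+    inputs = executor.get_inputs()
+    for tensor in inputs:
+        if tensor.name in produced:
+            # This input is the output of a previously executed subgraph.
+            tensor.set_data_from_numpy(produced[tensor.name])
+        else:
+            # This input belongs to the whole graph: feed user data here.
+            data = np.random.randn(*tensor.shape).astype(dtype_map[tensor.dtype])
+            tensor.set_data_from_numpy(data)
+    outputs = executor.predict(inputs)
+    for tensor in outputs:
+        produced[tensor.name] = tensor.get_data_to_numpy()
+```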
diff --git a/docs/lite/docs/source_zh_cn/mindir/converter_tool_ascend.md b/docs/lite/docs/source_zh_cn/mindir/converter_tool_ascend.md
index 4619cdc750..591132b0f1 100644
--- a/docs/lite/docs/source_zh_cn/mindir/converter_tool_ascend.md
+++ b/docs/lite/docs/source_zh_cn/mindir/converter_tool_ascend.md
@@ -86,6 +86,12 @@
 | `ge.externalWeight` | 可选 | 是否将常量节点的权重单独保存到文件中。 | String | 可选有`"1"`、`"0"` |
 | `ge.exec.exclude_engines` | 可选 | 设置网络模型不使用某个或者某些加速引擎。 | String | 可选有`"AiCore"`、`"AiVec"`、`"AiCpu"` |
 
+表4:配置[SplitGraph]参数
+
+| 参数 | 属性 | 功能描述 | 参数类型 | 取值说明 |
+| ----------------------------------- | ---- | ------------------------------------------------------------ | -------- | ------ |
+| `split_node_name` | 可选 | 指定子图切分节点。 | String | 格式为`[[输入节点],[输出节点]]`。输入节点为空时代表该子图输入使用整图输入,输出节点不可为空 |
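+
+作为参考,使用该参数的转换配置文件大致如下。示例仅为示意:其中`Conv2D_1`、`ReLU_5`为占位的节点名,实际使用时请替换为模型中真实的节点名,具体语法请以所用版本的转换工具文档为准。
+
+```
+[SplitGraph]
+split_node_name=[[Conv2D_1],[ReLU_5]]
+```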
 
 ## 动态shape配置
 
 在某些推理场景,如检测出目标后再执行目标识别网络,由于目标个数不固定导致目标识别网络输入BatchSize不固定。如果每次推理都按照最大的BatchSize或最大分辨率进行计算,会造成计算资源浪费。因此,推理需要支持动态BatchSize和动态分辨率的场景,Lite在Ascend上推理支持动态BatchSize和动态分辨率场景,在convert阶段通过configFile配置[ascend_context]中dynamic_dims动态参数,推理时使用model的[Resize](https://www.mindspore.cn/lite/docs/zh-CN/master/mindir/runtime_cpp.html#%E5%8A%A8%E6%80%81shape%E8%BE%93%E5%85%A5)功能,改变输入shape。
 
diff --git a/docs/lite/docs/source_zh_cn/mindir/runtime_python.md b/docs/lite/docs/source_zh_cn/mindir/runtime_python.md
index 0a485d0fee..52f660d48f 100644
--- a/docs/lite/docs/source_zh_cn/mindir/runtime_python.md
+++ b/docs/lite/docs/source_zh_cn/mindir/runtime_python.md
@@ -4,7 +4,7 @@
 
 ## 概述
 
-本教程提供了MindSpore Lite执行云侧推理的示例程序,通过文件输入、执行推理、打印推理结果的方式,演示了[Python接口](https://mindspore.cn/lite/api/zh-CN/master/mindspore_lite.html)进行云侧推理的基本流程,用户能够快速了解MindSpore Lite执行云侧推理相关API的使用。相关代码放置在[mindspore-lite/examples/cloud_infer/quick_start_python](https://gitee.com/mindspore/mindspore-lite/tree/master/mindspore-lite/examples/cloud_infer/quick_start_python)目录。
+本教程提供了MindSpore Lite执行云侧推理的示例程序,通过文件输入、执行推理、打印推理结果、动态权重更新以及子图切分推理的方式,演示了[Python接口](https://mindspore.cn/lite/api/zh-CN/master/mindspore_lite.html)进行云侧推理的基本流程,用户能够快速了解MindSpore Lite执行云侧推理相关API的使用。相关代码放置在[mindspore-lite/examples/cloud_infer/quick_start_python](https://gitee.com/mindspore/mindspore-lite/tree/master/mindspore-lite/examples/cloud_infer/quick_start_python)目录。
 
 MindSpore Lite云侧推理仅支持在Linux环境部署运行。支持Atlas 200/300/500推理产品、Atlas推理系列产品、Atlas训练系列产品和CPU硬件后端。
 
@@ -16,6 +16,10 @@ MindSpore Lite云侧推理仅支持在Linux环境部署运行。支持Atlas 200/
 
 - Python云侧推理Demo内容说明,详情参见[Demo内容说明](#demo内容说明)小节。
 
+- 动态权重更新内容说明,详情参见[动态权重更新](#动态权重更新)小节。
+
+- 子图切分推理内容说明,详情参见[子图切分推理](#子图切分推理)小节。
+
 ## 一键安装
 
 本环节以全新的Ubuntu 18.04为例,介绍在CPU环境的Linux-x86_64系统上,通过pip安装Python3.7版本的MindSpore Lite。
 
@@ -202,3 +206,52 @@ new_weight = mslite.Tensor(data)
 new_weights = [new_weight]
 model.update_weights([new_weights])
 ```
+
+## 子图切分推理
+
+在进行离线模型转换时,如果配置文件中配置了[SplitGraph]下的`split_node_name`参数,则需要使用子图切分推理功能来进行模型的创建与推理。该特性的作用是在转换时根据指定的配置将原本的模型切分为多个子模型,用户可以通过该方式获取模型中间层的输出,或者为模型中间的某些层提供输入,以实现只推理模型中的一部分。
+
+### 创建模型
+
+按照如下所示方式创建MultiModelRunner对象:
+
+```python
+import mindspore_lite as mslite
+model_path = "path_to_model"
+context = mslite.Context()
+context.target = ["ascend"]
+context.ascend.device_id = 0
+runner = mslite.MultiModelRunner()
+runner.build_from_file(model_path, mslite.ModelType.MINDIR, context)
+```
+
+### 获取ModelExecutor
+
+ModelExecutor可以理解为在模型转换时按照用户所指定的输入和输出导出的一个子图,在创建MultiModelRunner时会同时创建多个ModelExecutor用于推理。需要注意的是,切分后的子图输入与输出相比转换配置文件中指定的可能会更多,并且子图数量也可能比配置文件中指定的更多,这是因为在进行子图切分时,为了防止子图中存在重复节点,会存在一些来自于其他子图的额外输入。按照如下方式获取ModelExecutor:
+
+```python
+execs = runner.get_model_executor()
+```
+
+### 执行ModelExecutor推理
+
+使用ModelExecutor推理时,需要先查看ModelExecutor的输入名和输出名。每个ModelExecutor的输入可能来自于整图的输入,也可能来自于其他ModelExecutor的输出。可以使用`ModelExecutor.get_inputs()`方法获取当前ModelExecutor的输入,使用`ModelExecutor.get_outputs()`方法获取当前ModelExecutor的输出。参考如下代码进行ModelExecutor的推理:
+
+```python
+import numpy as np
+
+dtype_map = {
+    mslite.DataType.FLOAT32: np.float32,
+    mslite.DataType.INT32: np.int32,
+    mslite.DataType.FLOAT16: np.float16,
+    mslite.DataType.INT8: np.int8
+}
+for executor in execs:
+    executor_inputs = executor.get_inputs()
+    executor_outputs = executor.get_outputs()
+    for tensor in executor_inputs:
+        print("input name:", tensor.name, " input.shape:", tensor.shape, " input.dtype:", tensor.dtype)
+        data = np.random.randn(*tensor.shape).astype(dtype_map[tensor.dtype])
+        tensor.set_data_from_numpy(data)
+    executor.predict(executor_inputs)
+```
+
--
Gitee