diff --git a/docs/lite/api/source_zh_cn/api_cpp/mindspore.md b/docs/lite/api/source_zh_cn/api_cpp/mindspore.md index db845f92993a0777ec0d7a92a167cd03e9a4acc0..aec44f385345beecdd8cd35f4b55903a5b407286 100644 --- a/docs/lite/api/source_zh_cn/api_cpp/mindspore.md +++ b/docs/lite/api/source_zh_cn/api_cpp/mindspore.md @@ -1520,7 +1520,7 @@ Buffer Clone() const ## Model -\#include <[model.h](https://gitee.com/mindspore/mindspore/blob/master/include/api/model.h)> +\#include <[model.h](https://gitee.com/mindspore/mindspore-lite/blob/master/include/api/model.h)> Model定义了MindSpore中的模型,便于计算图管理。 @@ -1536,10 +1536,11 @@ Model() | 函数 | 云侧推理是否支持 | 端侧推理是否支持 | |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|---------| | [Status Build(const void *model_data, size_t data_size, ModelType model_type, const std::shared_ptr\ &model_context = nullptr)](#build) | √ | √ | -| [inline Status Build(const std::string &model_path, ModelType model_type, const std::shared_ptr\ &model_context = nullptr)](#build-1) | √ | √ | -| [inline Status Build(const void *model_data, size_t data_size, ModelType model_type, const std::shared_ptr\ &model_context, const Key &dec_key, const std::string &dec_mode, const std::string &cropto_lib_path)](#build-2) | √ | √ | -| [inline Status Build(const std::string &model_path, ModelType model_type, const std::shared_ptr\ &model_context, const Key &dec_key, const std::string &dec_mode, const std::string &cropto_lib_path)](#build-3) | √ | √ | -| [Status Build(GraphCell graph, const std::shared_ptr\ &model_context = nullptr, const std::shared_ptr\ &train_cfg = nullptr)](#build-4) | ✕ | √ | +| [Status Build(const void *model_data, size_t data_size, const void *weight_data, size_t weight_size, ModelType model_type, const std::shared_ptr\ &model_context = nullptr)](#build-1) | √ | ✕ | +| [inline Status Build(const std::string &model_path, ModelType model_type, const std::shared_ptr\ &model_context = nullptr)](#build-2) | √ | √ | +| [inline Status Build(const void *model_data, size_t data_size, ModelType model_type, const std::shared_ptr\ &model_context, const Key &dec_key, const std::string &dec_mode, const std::string &cropto_lib_path)](#build-3) | √ | √ | +| [inline Status Build(const std::string &model_path, ModelType model_type, const std::shared_ptr\ &model_context, const Key &dec_key, const std::string &dec_mode, const std::string &cropto_lib_path)](#build-4) | √ | √ | +| [Status Build(GraphCell graph, const std::shared_ptr\ &model_context = nullptr, const std::shared_ptr\ &train_cfg = nullptr)](#build-5) | ✕ | √ | | [Status BuildTransferLearning(GraphCell backbone, GraphCell head, const std::shared_ptr\ &context, const std::shared_ptr\ &train_cfg = nullptr)](#buildtransferlearning) | ✕ | √ | | [Status Resize(const std::vector\ &inputs, const std::vector\\> &dims)](#resize) | √ | √ | | [Status UpdateWeights(const std::vector\ &new_weights)](#updateweights) | ✕ | √ | @@ -1593,7 +1594,29 @@ Status Build(const void *model_data, size_t data_size, ModelType model_type, - `model_data`: 指向存储读入模型文件缓冲区的指针。 - `data_size`: 缓冲区大小。 - - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云端推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 + - `model_type`: 
模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云侧推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 + - `model_context`: 模型[Context](#context)。 + +- 返回值 + + 状态码类`Status`对象,可以使用其公有函数`StatusCode`或`ToString`函数来获取具体错误码及错误信息。 + +#### Build + +```cpp +Status Build(const void *model_data, size_t data_size, const void *weight_data, size_t weight_size, ModelType model_type, + const std::shared_ptr &model_context = nullptr) +``` + +从内存缓冲区加载模型和权重数据,并将模型编译至可在Device上运行的状态。 + +- 参数 + + - `model_data`: 指向存储读入模型文件缓冲区的指针。 + - `data_size`: 模型缓冲区大小。 + - `weight_data`: 指向存储读入权重文件缓冲区的指针。 + - `weight_size`: 权重缓冲区大小。 + - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云侧推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 - `model_context`: 模型[Context](#context)。 - 返回值 @@ -1612,7 +1635,7 @@ inline Status Build(const std::string &model_path, ModelType model_type, - 参数 - `model_path`: 模型文件路径。 - - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云端推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 + - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云侧推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 - `model_context`: 模型[Context](#context)。 - 返回值 @@ -1633,7 +1656,7 @@ inline Status Build(const void *model_data, size_t data_size, ModelType model_ty - `model_data`: 指向存储读入模型文件缓冲区的指针。 - `data_size`: 缓冲区大小。 - - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云端推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 + - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云侧推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 - `model_context`: 模型[Context](#context)。 - `dec_key`: 解密密钥,用于解密密文模型,密钥长度为16。 - `dec_mode`: 解密模式,可选有`AES-GCM`。 @@ -1656,7 +1679,7 @@ inline Status Build(const std::string &model_path, ModelType model_type, - 参数 - `model_path`: 模型文件路径。 - - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云端推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 + - `model_type`: 模型文件类型,可选有`ModelType::kMindIR_Lite`、`ModelType::kMindIR`,分别对应`ms`模型(`converter_lite`工具导出)和`mindir`模型(MindSpore导出或`converter_lite`工具导出)。在端侧和云侧推理包中,端侧推理只支持`ms`模型推理,该入参值被忽略。云侧推理支持`ms`和`mindir`模型推理,需要将该参数设置为模型对应的选项值。云侧推理对`ms`模型的支持,将在未来的迭代中删除,推荐通过`mindir`模型进行云侧推理。 - `model_context`: 模型[Context](#context)。 - `dec_key`: 解密密钥,用于解密密文模型,密钥长度为16。 - `dec_mode`: 解密模式,可选有`AES-GCM`。 @@ -2393,7 +2416,7 @@ MultiModelRunner() | 函数 | 云侧推理是否支持 | 端侧推理是否支持 | 
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|---------| -| [inline Status Build(const std::string &model_path, const ModelType &model_type, const std::shared_ptr\ &model_context = nullptr)](#build-5) | √ | ✕ | +| [inline Status Build(const std::string &model_path, const ModelType &model_type, const std::shared_ptr\ &model_context = nullptr)](#build-6) | √ | ✕ | | [std::vector\ GetModelExecutor() const](#getmodelexecutor) | √ | ✕ | | [inline Status LoadConfig(const std::string &config_path)](#loadconfig-1) | √ | ✕ | | [inline Status UpdateConfig(const std::string §ion, const std::pair\ &config)](#updateconfig-1) | √ | ✕ | diff --git a/docs/lite/api/source_zh_cn/index.rst b/docs/lite/api/source_zh_cn/index.rst index 8febcf73e1a90a61c38de530ed4be8ccf1c076c0..d8b966bc0a68109cefc6216b9f56c3e87547d4bd 100644 --- a/docs/lite/api/source_zh_cn/index.rst +++ b/docs/lite/api/source_zh_cn/index.rst @@ -160,6 +160,8 @@ MindSpore Lite API 支持情况汇总 +---------------------+---------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Model | 从内存缓冲区加载模型,并将模型编译至可在Device上运行的状态 | Status Build(const void \*model_data, size_t data_size, ModelType model_type, const std::shared_ptr &model_context = nullptr) | | +---------------------+---------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Model | 从内存缓冲区加载模型和权重数据,并将模型编译至可在Device上运行的状态 | Status Build(const void \*model_data, size_t data_size, const void \*weight_data, size_t weight_size, ModelType model_type, const std::shared_ptr &model_context = nullptr) | | 
++---------------------+---------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Model | 从内存缓冲区加载模型,并将模型编译至可在Device上运行的状态 | Status Build(const std::string &model_path, ModelType model_type, const std::shared_ptr &model_context = nullptr) | `Model.build_from_file `__ | +---------------------+---------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Model | 根据路径读取加载模型,并将模型编译至可在Device上运行的状态 | Status Build(const void \*model_data, size_t data_size, ModelType model_type, const std::shared_ptr &model_context, const Key &dec_key, const std::string &dec_mode, const std::string &crypto_lib_path) | | diff --git a/docs/lite/docs/source_en/mindir/runtime_cpp.md b/docs/lite/docs/source_en/mindir/runtime_cpp.md index 911f2dbb9c00e6cb392d177bbfae6603180de017..7c6340fc06c30e64f24c34622a3f8b57486f0cc4 100644 --- a/docs/lite/docs/source_en/mindir/runtime_cpp.md +++ b/docs/lite/docs/source_en/mindir/runtime_cpp.md @@ -594,7 +594,90 @@ ge.dynamicNodeType=1 ### Loading Models through Multiple Threads -When the backend is Ascend and the provider is the default, it supports loading multiple Ascend optimized models through multiple threads to improve model loading performance. Using the [Model converting tool](https://www.mindspore.cn/lite/docs/en/master/mindir/converter_tool.html), we can specify `--optimize=ascend_oriented` to convert `MindIR` models exported from MindSpore, third-party framework models such as TensorFlow and ONNX into Ascend optimized models. The `MindIR` models exported by MindSpore have not undergone Ascend optimization. For third-party framework models, the `MindIR` model generated by specifying `--optimize=none` in the converting tool has not undergone Ascend optimization. +When the backend is Ascend or CPU, it supports loading multiple models through multiple threads to improve model loading performance. + +Loading models through multiple threads is disabled by default. 
It can be enabled by setting `compile_graph_parallel` to `on` in the configuration file:
+
+```ini
+[common_context]
+compile_graph_parallel=on
+```
+
+To disable loading models through multiple threads:
+
+```ini
+[common_context]
+compile_graph_parallel=off
+```
+
+When loading models through multiple threads is disabled, models loaded from multiple threads are built serially.
+
+#### Example
+
+Loading models through multiple threads with C++:
+
+```c++
+#include <array>
+#include <memory>
+#include <string>
+#include <thread>
+#include <vector>
+#include "include/api/context.h"
+#include "include/api/model.h"
+
+using namespace mindspore;
+const char *MODEL_PATH = "/path/to/model";
+const char *CONFIG_PATH = "/path/to/config";
+ModelType MODEL_TYPE = ModelType::kMindIR;
+const int PARALLEL = 2;
+
+// Build one model in the current thread and store it in slot i of the shared array.
+int build_model(int i, std::string model_path, std::string config_path, ModelType model_type, std::array<std::unique_ptr<Model>, PARALLEL> *models) {
+  if (models == nullptr) { return -1; }
+  auto context = std::make_shared<Context>();
+  if (context == nullptr) { return -1; }
+  auto &device_list = context->MutableDeviceInfo();
+  device_list.push_back(std::make_shared<AscendDeviceInfo>());
+  auto model = std::make_unique<Model>();
+  if (model == nullptr) { return -1; }
+  if (!config_path.empty()) {
+    auto ret = model->LoadConfig(config_path);
+    if (ret != kSuccess) { return -1; }
+  }
+  auto ret = model->Build(model_path, model_type, context);
+  if (ret != kSuccess) { return -1; }
+
+  auto &models_ref = *models;
+  models_ref[i] = std::move(model);
+  return 0;
+}
+
+int main() {
+  std::array<std::unique_ptr<Model>, PARALLEL> models;
+  std::vector<std::thread> threads;
+  for (int i = 0; i < PARALLEL; i++) {
+    threads.emplace_back(build_model, i, MODEL_PATH, CONFIG_PATH, MODEL_TYPE, &models);
+  }
+  for (auto &thread : threads) { thread.join(); }
+  return 0;
+}
+```
+
+Loading models through multiple threads with Python:
+
+```python
+import mindspore_lite as mslite
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+MODEL_PATH = "/path/to/model"
+CONFIG_PATH = "/path/to/config"
+MODEL_TYPE = mslite.ModelType.MINDIR
+PARALLEL = 2
+
+def build_model(model_path, model_type, config_path):
+    try:
+        context = mslite.Context()
+        model = mslite.Model()
+        context.target = ["ascend"]
+        model.build_from_file(model_path, model_type, context, config_path=config_path)
+        return model
+    except Exception:
+        return None
+
+pool = ThreadPoolExecutor(max_workers=PARALLEL)
+tasks = [pool.submit(build_model, MODEL_PATH, MODEL_TYPE, CONFIG_PATH) for _ in range(PARALLEL)]
+models = [task.result() for task in as_completed(tasks)]
+```
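+
+After all threads have joined, each successfully built model can run inference on its own. The following is a minimal sketch of using one of the concurrently loaded models; preparing the actual input data is assumed to happen elsewhere:
+
+```python
+# Pick the first model that was built successfully (build_model returns None on failure).
+model = next(m for m in models if m is not None)
+inputs = model.get_inputs()
+# ... fill the input tensors with application data here ...
+outputs = model.predict(inputs)
+```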
 
 ### Multiple Models Sharing Weights
 
@@ -751,6 +834,31 @@ ge.externalWeight=1
 
 AddModel, CalMaxSizeOfWorkspace, and model.build need to be executed in child threads when the model on the ACL backend is active and shared for multithreading. ModelGroup and model need to use different contexts, and do not share the same context, That is, N contexts should be initialized for N models, and one context should be added for ModelGroup.
 
+### Inference Time Limit of ACL Offline Model
+
+ACL offline model inference supports limiting inference time. If the inference time exceeds the specified time, an error will be returned.
+
+Limiting inference time is disabled by default. It can be enabled by setting `timeout` in the `[ascend_context]` section:
+
+| Value | Description |
+|:--- |:--- |
+|-1 | Wait indefinitely |
+|>0 | Limit inference time, in milliseconds |
+
+Config of waiting indefinitely:
+
+```ini
+[ascend_context]
+timeout=-1
+```
+
+Config of waiting 50ms:
+
+```ini
+[ascend_context]
+timeout=50
+```
+
 ## Experimental feature
 
 ### multi-backend runtime

diff --git a/docs/lite/docs/source_zh_cn/mindir/runtime_cpp.md b/docs/lite/docs/source_zh_cn/mindir/runtime_cpp.md
index 991685f69e127efe7b988d862322b6e11e6794b5..9484ab31b93e12151d5ec031aeae46344e38b345 100644
--- a/docs/lite/docs/source_zh_cn/mindir/runtime_cpp.md
+++ b/docs/lite/docs/source_zh_cn/mindir/runtime_cpp.md
@@ -595,7 +595,90 @@ ge.dynamicNodeType=1
 
 ### 多线程加载模型
 
-硬件后端为Ascend,provider为默认时,支持多线程并发加载多个Ascend优化后模型,以提升模型加载性能。使用[模型转换工具](https://www.mindspore.cn/lite/docs/zh-CN/master/mindir/converter_tool.html),指定 `--optimize=ascend_oriented` 可将MindSpore导出的 `MindIR` 模型、TensorFlow和ONNX等第三方框架模型转换为Ascend优化后模型。MindSpore导出的 `MindIR` 模型未进行Ascend优化,对于第三方框架模型,转换工具中如果指定 `--optimize=none` 产生的 `MindIR` 模型也未进行Ascend优化。
+后端为ACL或CPU时,支持多线程并发加载模型,以提升模型加载性能。
+
+多线程并发加载模型默认配置为关闭,可通过配置文件开启,在`[common_context]`中设置`compile_graph_parallel`选项为`on`:
+
+```ini
+[common_context]
+compile_graph_parallel=on
+```
+
+关闭多线程并发加载模型:
+
+```ini
+[common_context]
+compile_graph_parallel=off
+```
+
+关闭多线程并发加载模型后,在多个线程中加载模型将仍以串行方式执行。
+
+#### 示例
+
+C++ 多线程并发加载模型:
+
+```c++
+#include <array>
+#include <memory>
+#include <string>
+#include <thread>
+#include <vector>
+#include "include/api/context.h"
+#include "include/api/model.h"
+
+using namespace mindspore;
+const char *MODEL_PATH = "/path/to/model";
+const char *CONFIG_PATH = "/path/to/config";
+ModelType MODEL_TYPE = ModelType::kMindIR;
+const int PARALLEL = 2;
+
+// Build one model in the current thread and store it in slot i of the shared array.
+int build_model(int i, std::string model_path, std::string config_path, ModelType model_type, std::array<std::unique_ptr<Model>, PARALLEL> *models) {
+  if (models == nullptr) { return -1; }
+  auto context = std::make_shared<Context>();
+  if (context == nullptr) { return -1; }
+  auto &device_list = context->MutableDeviceInfo();
+  device_list.push_back(std::make_shared<AscendDeviceInfo>());
+  auto model = std::make_unique<Model>();
+  if (model == nullptr) { return -1; }
+  if (!config_path.empty()) {
+    auto ret = model->LoadConfig(config_path);
+    if (ret != kSuccess) { return -1; }
+  }
+  auto ret = model->Build(model_path, model_type, context);
+  if (ret != kSuccess) { return -1; }
+
+  auto &models_ref = *models;
+  models_ref[i] = std::move(model);
+  return 0;
+}
+
+int main() {
+  std::array<std::unique_ptr<Model>, PARALLEL> models;
+  std::vector<std::thread> threads;
+  for (int i = 0; i < PARALLEL; i++) {
+    threads.emplace_back(build_model, i, MODEL_PATH, CONFIG_PATH, MODEL_TYPE, &models);
+  }
+  for (auto &thread : threads) { thread.join(); }
+  return 0;
+}
+```
+
+Python 多线程并发加载模型:
+
+```python
+import mindspore_lite as mslite
+from concurrent.futures import ThreadPoolExecutor, as_completed
+
+MODEL_PATH = "/path/to/model"
+CONFIG_PATH = "/path/to/config"
+MODEL_TYPE = mslite.ModelType.MINDIR
+PARALLEL = 2
+
+def build_model(model_path, model_type, config_path):
+    try:
+        context = mslite.Context()
+        model = mslite.Model()
+        context.target = ["ascend"]
+        model.build_from_file(model_path, model_type, context, config_path=config_path)
+        return model
+    except Exception:
+        return None
+
+pool = ThreadPoolExecutor(max_workers=PARALLEL)
+tasks = [pool.submit(build_model, MODEL_PATH, MODEL_TYPE, CONFIG_PATH) for _ in range(PARALLEL)]
+models = [task.result() for task in as_completed(tasks)]
+```
 
 ### 多模型共享权重
 
@@ -753,6 +836,31 @@ ge.externalWeight=1
 
 acl后端的模型在进行激活共享并且为多线程共享时AddModel,CalMaxSizeOfWorkspace,以及model.build需要在子线程中执行。ModelGroup和model需要使用不同的context实例,不要共用一个context,即N个模型要初始化N个context用于模型,再加一个context用于ModelGroup。
 
+### ACL离线模型推理时间限制
+
+ACL离线模型推理时,支持限制推理时间,推理时间超过指定时间将返回错误。
+
+推理时间限制默认配置为关闭,可通过配置文件开启,在`[ascend_context]`中设置`timeout`选项:
+
+|值 | 描述 |
+|:--- |:--- |
+|-1 | 表示永久等待 |
+|>0 | 限制推理时间,单位是毫秒 |
+
+永久等待配置:
+
+```ini
+[ascend_context]
+timeout=-1
+```
+
+50ms超时配置:
+
+```ini
+[ascend_context]
+timeout=50
+```
+
 ## 实验特性
 
 ### 多后端异构能力