diff --git a/operator_contrib/LinSpaceV2Sample/CMakeLists.txt b/operator_contrib/LinSpaceV2Sample/CMakeLists.txt new file mode 100755 index 0000000000000000000000000000000000000000..a6d4c477395e9720b3a13466fce0c4c758c2c642 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/CMakeLists.txt @@ -0,0 +1,25 @@ +add_ops_compile_options( + OP_NAME LinSpaceV2 + OPTIONS --cce-auto-sync=on + -Wno-deprecated-declarations + -Werror +) + +target_sources(op_host_aclnn PRIVATE +op_host/lin_space_v2.cpp +) + +target_sources(optiling PRIVATE + op_host/lin_space_v2.cpp +) + +target_include_directories(optiling PRIVATE + ${CMAKE_CURRENT_SOURCE_DIR}/op_host +) + +target_sources(opsproto PRIVATE + op_host/lin_space_v2.cpp +) + +install(FILES op_kernel/lin_space_v2.cpp + DESTINATION ${ASCEND_IMPL_OUT_DIR}/dynamic) \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/README.md b/operator_contrib/LinSpaceV2Sample/README.md new file mode 100644 index 0000000000000000000000000000000000000000..10f48130e1d502cb3ed28ee2b7b018c3da0816da --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/README.md @@ -0,0 +1,76 @@ +## `LinSpaceV2`自定义算子样例说明 +本样例通过`Ascend C`编程语言实现了`LinSpaceV2`算子。 + +### 算子描述 +`LinSpaceV2`算子在指定区间内生成均匀分布数值序列, 将一段范围等分成固定数量的点,适用于创建横坐标、采样区间或网格数据。`LinSpaceV2`算子生成一维张量,相邻两个点的差值(步长 step)为: + + $$ + step = \frac{stop−start}{num\_axes - 1} + $$ + +### 算子规格描述 + + + + + + + + + + + + + +
+<table>
+<tr><th align="center">算子类型(OpType)</th><th colspan="4" align="center">LinSpaceV2</th></tr>
+<tr><td align="center"></td><td align="center">name</td><td align="center">Type</td><td align="center">data type</td><td align="center">format</td></tr>
+<tr><td rowspan="3" align="center">算子输入</td><td align="center">start</td><td align="center">tensor</td><td align="center">float32、int8、uint8、int32、int16、float16、bfloat16</td><td align="center">ND</td></tr>
+<tr><td align="center">stop</td><td align="center">tensor</td><td align="center">float32、int8、uint8、int32、int16、float16、bfloat16</td><td align="center">ND</td></tr>
+<tr><td align="center">num_axes</td><td align="center">tensor</td><td align="center">int32</td><td align="center">ND</td></tr>
+<tr><td align="center">算子输出</td><td align="center">y</td><td align="center">tensor</td><td align="center">float32、int8、uint8、int32、int16、float16、bfloat16</td><td align="center">ND</td></tr>
+<tr><td align="center">核函数名</td><td colspan="4" align="center">lin_space_v2</td></tr>
+</table>
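按上式,步长与输出序列可以用几行 NumPy 做数值验证(示意性代码,start/stop/num 的取值与 examples 中 gen_data.py 的样例输入一致,并非算子实现本身):

```python
import numpy as np

# 示意:按 step = (stop - start) / (num_axes - 1) 生成等差序列
start, stop, num = 0.0, 125.0, 20
step = (stop - start) / (num - 1)
y = start + step * np.arange(num, dtype=np.float32)

# 与 np.linspace 的参考结果一致
assert np.allclose(y, np.linspace(start, stop, num, dtype=np.float32))
```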
+ +### 支持的产品型号 +本样例支持如下产品型号: +- Atlas A2训练系列产品 +- Atlas 200I/500 A2推理产品 + +### 目录结构介绍 +``` +├── docs                        // 算子文档目录 +├── example                     // 调用示例目录 +├── framework                   // 第三方框架适配目录 +├── op_host                     // host目录 +├── op_kernel                   // kernel目录 +├── opp_kernel_aicpu        // aicpu目录 +└── tests                       // 测试用例目录 +``` + +### 环境要求 +编译运行此样例前,请参考[《CANN软件安装指南》](https://hiascend.com/document/redirect/CannCommunityInstSoftware)完成开发运行环境的部署。 + +### 算子包编译部署 + - 进入到仓库目录 + + ```bash + cd ${git_clone_path}/cann-ops + ``` + + - 执行编译 + + ```bash + bash build.sh + ``` + + - 部署算子包 + + ```bash + bash build_out/CANN-custom_ops--linux..run + ``` + +### 算子调用 + + + + + +
+<table>
+<tr><th align="center">目录</th><th align="center">描述</th></tr>
+<tr><td align="center">AclNNInvocationNaive</td><td align="center">通过aclnn调用的方式调用LinSpaceV2算子。</td></tr>
+</table>
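作为补充,AclNNInvocationNaive 中 verify_result.py 的"绝对误差或相对误差不超过万分之一"的比对思路可以示意如下(假设性片段,real 实际应从算子输出 output/output.bin 读入):

```python
import numpy as np

LOSS = 1e-4       # 容忍偏差:fp32 要求绝对误差和相对误差均不超过万分之一
MINIMUM = 10e-10  # 防止除零

golden = np.linspace(0, 125, 20).astype(np.float32)  # 对应 gen_data.py 生成的期望结果
real = golden.copy()  # 示意;实际运行时由算子输出 output/output.bin 读入

abs_ok = np.less_equal(np.abs(real - golden), LOSS)
rel_ok = np.less_equal(
    np.abs(real - golden) / (np.maximum(np.abs(real), np.abs(golden)) + MINIMUM), LOSS)
assert (abs_ok | rel_ok).all()
```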
+ +## 更新说明 +| 时间 | 更新事项 | +|----|------| +| 2025/07/16 | 新增本readme | \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/docs/.gitkeep b/operator_contrib/LinSpaceV2Sample/docs/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/operator_contrib/LinSpaceV2Sample/docs/LinSpaceV2.md b/operator_contrib/LinSpaceV2Sample/docs/LinSpaceV2.md new file mode 100644 index 0000000000000000000000000000000000000000..4e3c12c2f7ff6b9721b36b5ad56a58cb1035065a --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/docs/LinSpaceV2.md @@ -0,0 +1,93 @@ +声明:本文使用[Creative Commons License version 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)许可协议,转载、引用或修改等操作请遵循此许可协议。 + +# LinSpaceV2 + +## 支持的产品型号 + +Atlas A2 训练系列产品/Atlas 200I/500 A2推理产品 + +产品形态详细说明请参见[昇腾产品形态说明](https://www.hiascend.com/document/redirect/CannCommunityProductForm)。 + +## 功能描述 + +- 算子功能:在指定区间内生成均匀分布数值序列, 将一段范围等分成固定数量的点,适用于创建横坐标、采样区间或网格数据。 +- LinSpaceV2生成一维张量,相邻两个点的差值(步长 step)为: + + $$ + step = \frac{stop−start}{num\_axes - 1} + $$ + + +## 实现原理 + +调用`Ascend C`的`API`接口`LinSpaceV2`进行实现。对于16位的数据类型将其通过`Cast`接口转换为32位浮点数进行计算。 + +## 算子执行接口 + +每个算子分为两段式接口,必须先调用“aclnnLinSpaceV2GetWorkspaceSize”接口获取计算所需workspace大小以及包含了算子计算流程的执行器,再调用“aclnnLinSpaceV2”接口执行计算。 + +* `aclnnStatus aclnnLinSpaceV2GetWorkspaceSize(const aclTensor* start, const aclTensor* stop, const aclTensor* num_axes, const aclTensor* output, uint64_t* workspaceSize, aclOpExecutor** executor)` +* `aclnnStatus aclnnLinSpaceV2(void* workspace, uint64_t workspaceSize, aclOpExecutor* executor, aclrtStream stream)` + +**说明**: + +- 算子执行接口对外屏蔽了算子内部实现逻辑以及不同代际NPU的差异,且开发者无需编译算子,实现了算子的精简调用。 +- 若开发者不使用算子执行接口的调用算子,也可以定义基于Ascend IR的算子描述文件,通过ATC工具编译获得算子om文件,然后加载模型文件执行算子,详细调用方法可参见《应用开发指南》的[单算子调用 > 单算子模型执行](https://hiascend.com/document/redirect/CannCommunityCppOpcall)章节。 + +### aclnnLinSpaceV2GetWorkspaceSize + +- **参数说明:** + + - 
start(aclTensor\*,计算输入):必选参数,Device侧的aclTensor,公式中的输入start,数据类型支持float32、int8、uint8、int32、int16、float16、bfloat16,数据格式支持ND。 + - stop(aclTensor\*,计算输入):必选参数,Device侧的aclTensor,公式中的输入stop,数据类型支持float32、int8、uint8、int32、int16、float16、bfloat16,数据格式支持ND。 + - num_axes(aclTensor\*,计算输入):必选参数,Device侧的aclTensor,公式中的输入num_axes,数据类型支持int32,数据格式支持ND。 + - output(aclTensor\*,计算输出):Device侧的aclTensor,公式中的输出output,数据类型支持float32、int8、uint8、int32、int16、float16、bfloat16,数据格式支持ND,输出维度是一维,元素数量与num_axes[0]一致。 + - workspaceSize(uint64\_t\*,出参):返回用户需要在Device侧申请的workspace大小。 + - executor(aclOpExecutor\*\*,出参):返回op执行器,包含了算子计算流程。 +- **返回值:** + + 返回aclnnStatus状态码,具体参见[aclnn返回码](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/aolapi/context/common/aclnn%E8%BF%94%E5%9B%9E%E7%A0%81_fuse.md)。 + + ``` + 第一段接口完成入参校验,若出现以下错误码,则对应原因为: + - 返回161001(ACLNN_ERR_PARAM_NULLPTR):如果传入参数是必选输入,输出或者必选属性,且是空指针,则返回161001。 + - 返回161002(ACLNN_ERR_PARAM_INVALID):x、out的数据类型和数据格式不在支持的范围内。 + ``` + +### aclnnLinSpaceV2 + +- **参数说明:** + + - workspace(void\*,入参):在Device侧申请的workspace内存起址。 + - workspaceSize(uint64\_t,入参):在Device侧申请的workspace大小,由第一段接口aclnnLinSpaceV2GetWorkspaceSize获取。 + - executor(aclOpExecutor\*,入参):op执行器,包含了算子计算流程。 + - stream(aclrtStream,入参):指定执行任务的AscendCL stream流。 +- **返回值:** + + 返回aclnnStatus状态码,具体参见[aclnn返回码](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/aolapi/context/common/aclnn%E8%BF%94%E5%9B%9E%E7%A0%81_fuse.md)。 + + +## 约束与限制 + +- start,stop, num_axes只有1个元素 +- start,stop的数据类型支持float32、int8、uint8、int32、int16、float16、bfloat16,数据格式只支持ND + +## 算子原型 + + + + + + + + + + + + + +
+<table>
+<tr><th align="center">算子类型(OpType)</th><th colspan="4" align="center">LinSpaceV2</th></tr>
+<tr><td align="center"></td><td align="center">name</td><td align="center">Type</td><td align="center">data type</td><td align="center">format</td></tr>
+<tr><td rowspan="3" align="center">算子输入</td><td align="center">start</td><td align="center">tensor</td><td align="center">float32、int8、uint8、int32、int16、float16、bfloat16</td><td align="center">ND</td></tr>
+<tr><td align="center">stop</td><td align="center">tensor</td><td align="center">float32、int8、uint8、int32、int16、float16、bfloat16</td><td align="center">ND</td></tr>
+<tr><td align="center">num_axes</td><td align="center">tensor</td><td align="center">int32</td><td align="center">ND</td></tr>
+<tr><td align="center">算子输出</td><td align="center">y</td><td align="center">tensor</td><td align="center">float32、int8、uint8、int32、int16、float16、bfloat16</td><td align="center">ND</td></tr>
+<tr><td align="center">核函数名</td><td colspan="4" align="center">lin_space_v2</td></tr>
+</table>
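对应"实现原理"中 16 位数据类型先通过 Cast 转为 32 位浮点数计算的做法,可用如下 NumPy 片段示意(假设性示例,以 float16 代表 16 位类型;bfloat16 在 NumPy 中无原生支持,此处不演示):

```python
import numpy as np

# 16 位输入:先 Cast 到 float32 计算 linspace,再 Cast 回原数据类型
start16 = np.float16(0.0)
stop16 = np.float16(1.0)
num = 5

y32 = np.linspace(np.float32(start16), np.float32(stop16), num, dtype=np.float32)
y16 = y32.astype(np.float16)  # Cast 回 16 位输出

assert y16.dtype == np.float16
assert y16.shape == (num,)
```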
+ +## 调用示例 + +详见[LinSpaceV2自定义算子样例说明算子调用章节](../README.md#算子调用) \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/examples/.gitkeep b/operator_contrib/LinSpaceV2Sample/examples/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/CMakeLists.txt b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..c95c7c00349a635adb84d00b0689494c1079da31 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/CMakeLists.txt @@ -0,0 +1,60 @@ +# CMake lowest version requirement +cmake_minimum_required(VERSION 3.5.1) + +# project information +project(acl_execute_lin_space_v2) + +# Compile options +add_compile_options(-std=c++11) + +set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "./") + +set(INC_PATH $ENV{DDK_PATH}) + +if (NOT DEFINED ENV{DDK_PATH}) + set(INC_PATH "/usr/local/Ascend/ascend-toolkit/latest") + message(STATUS "set default INC_PATH: ${INC_PATH}") +else () + message(STATUS "env INC_PATH: ${INC_PATH}") +endif() + +set(CUST_PKG_PATH "${INC_PATH}/opp/vendors/customize/op_api") + +set(LIB_PATH $ENV{NPU_HOST_LIB}) + +# Dynamic libraries in the stub directory can only be used for compilation +if (NOT DEFINED ENV{NPU_HOST_LIB}) + set(LIB_PATH "/usr/local/Ascend/ascend-toolkit/latest/acllib/lib64/stub/") + set(LIB_PATH1 "/usr/local/Ascend/ascend-toolkit/latest/atc/lib64/stub/") + message(STATUS "set default LIB_PATH: ${LIB_PATH}") +else () + message(STATUS "env LIB_PATH: ${LIB_PATH}") +endif() + +# Header path +include_directories( + ${INC_PATH}/runtime/include + ${INC_PATH}/atc/include + ${CUST_PKG_PATH}/include +) + +# add host lib path +link_directories( + ${LIB_PATH} + ${LIB_PATH1} + ${CUST_PKG_PATH}/lib +) + +add_executable(execute_lin_space_v2_op + main.cpp +) + 
+target_link_libraries(execute_lin_space_v2_op + ascendcl + cust_opapi + acl_op_compiler + nnopbase + stdc++ +) + +install(TARGETS execute_lin_space_v2_op DESTINATION ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}) diff --git a/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/README.md b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7caa5820f5ad7bb3dda80409122d6a89fe086c7c --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/README.md @@ -0,0 +1,62 @@ +## 概述 + +通过aclnn调用的方式调用LinSpaceV2算子。 + +## 目录结构介绍 +``` +├── AclNNInvocationNaive +│ ├── CMakeLists.txt // 编译规则文件 +│ ├── gen_data.py // 算子期望数据生成脚本 +│ ├── main.cpp // 单算子调用应用的入口 +│ ├── run.sh // 编译运行算子的脚本 +│ └── verify_result.py // 计算结果精度比对脚本 +``` +## 代码实现介绍 +完成自定义算子的开发部署后,可以通过单算子调用的方式来验证单算子的功能。main.cpp代码为单算子API执行方式。单算子API执行是基于C语言的API执行算子,无需提供单算子描述文件进行离线模型的转换,直接调用单算子API接口。 + +自定义算子编译部署后,会自动生成单算子API,可以直接在应用程序中调用。算子API的形式一般定义为“两段式接口”,形如: + ```cpp + aclnnStatus aclnnLinSpaceV2GetWorkspaceSize(const aclTensor* start, const aclTensor* stop, const aclTensor* num_axes, const aclTensor* output, uint64_t workspaceSize, aclOpExecutor **executor); + aclnnStatus aclnnLinSpaceV2(void *workspace, int64_t workspaceSize, aclOpExecutor **executor, aclrtStream stream); + ``` +其中aclnnLinSpaceV2GetWorkspaceSize为第一段接口,主要用于计算本次API调用计算过程中需要多少的workspace内存。获取到本次API计算需要的workspace大小之后,按照workspaceSize大小申请Device侧内存,然后调用第二段接口aclnnLinSpaceV2执行计算。具体参考[AscendCL单算子调用](https://hiascend.com/document/redirect/CannCommunityAscendCInVorkSingleOp)>单算子API执行 章节。 + +## 运行样例算子 + **请确保已根据算子包编译部署步骤完成本算子的编译部署动作。** + + - 进入样例代码所在路径 + + ```bash + cd ${git_clone_path}/cann-ops/src/math/lin_space_v2/examples/AclNNInvocationNaive + ``` + + - 环境变量配置 + + 需要设置环境变量,以arm为例 + + ```bash + export DDK_PATH=/usr/local/Ascend/ascend-toolkit/latest + export NPU_HOST_LIB=/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/devlib + ``` + - 样例执行 + + 
样例执行过程中会自动生成测试数据,然后编译与运行aclnn样例,最后打印运行结果。 + + ```bash + mkdir -p build + cd build + cmake .. && make + ./execute_lin_space_v2_op + ``` + + 用户亦可参考run.sh脚本进行编译与运行。 + + ```bash + bash run.sh + ``` + +## 更新说明 + +| 时间 | 更新事项 | +| ---------- | ------------ | +| 2025/07/16 | 新增本readme | \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/gen_data.py b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/gen_data.py new file mode 100644 index 0000000000000000000000000000000000000000..64a3c31e66428150bb5dd59db0894f08b0861876 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/gen_data.py @@ -0,0 +1,37 @@ +#!/usr/bin/python3 +# -*- coding:utf-8 -*- +# Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. +# This file is a part of the CANN Open Software. +# Licensed under CANN Open Software License Agreement Version 1.0 (the "License"). +# Please refer to the License for details. You may not use this file except in compliance with the License. +# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, +# INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. +# See LICENSE in the root of the software repository for the full text of the License. 
+# ====================================================================================================================== + +import os +import numpy as np +import torch + +def gen_golden_data_simple(): + + start = np.array([0]).astype(np.float32) + stop = np.array([125]).astype(np.float32) + num = np.array([20]).astype(np.int32) + + input_start = torch.from_numpy(start) + input_stop = torch.from_numpy(stop) + input_num = torch.from_numpy(num) + golden = torch.linspace(input_start[0], input_stop[0], input_num[0]).numpy().astype(input_start.numpy().dtype) + + os.system("mkdir -p input") + os.system("mkdir -p output") + start.tofile("./input/start.bin") + stop.tofile("./input/stop.bin") + num.tofile("./input/num.bin") + golden.tofile("./output/golden.bin") + + +if __name__ == "__main__": + gen_golden_data_simple() + diff --git a/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/main.cpp b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/main.cpp new file mode 100644 index 0000000000000000000000000000000000000000..bafcf02c39e5e08de532631ab36165f7740cc438 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/main.cpp @@ -0,0 +1,247 @@ +/** + * Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. + * This file is a part of the CANN Open Software. + * Licensed under CANN Open Software License Agreement Version 1.0 (the "License"). + * Please refer to the License for details. You may not use this file except in compliance with the License. + * THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, + * INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. + * See LICENSE in the root of the software repository for the full text of the License. 
+ */ + +/** + * @file main.cpp + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "acl/acl.h" +#include "aclnn_lin_space_v2.h" + +#define SUCCESS 0 +#define FAILED 1 + +#define INFO_LOG(fmt, args...) fprintf(stdout, "[INFO] " fmt "\n", ##args) +#define WARN_LOG(fmt, args...) fprintf(stdout, "[WARN] " fmt "\n", ##args) +#define ERROR_LOG(fmt, args...) fprintf(stderr, "[ERROR] " fmt "\n", ##args) + +#define CHECK_RET(cond, return_expr) \ + do { \ + if (!(cond)) { \ + return_expr; \ + } \ + } while (0) + +#define LOG_PRINT(message, ...) \ + do { \ + printf(message, ##__VA_ARGS__); \ + } while (0) + +bool ReadFile(const std::string &filePath, size_t fileSize, void *buffer, size_t bufferSize) +{ + struct stat sBuf; + int fileStatus = stat(filePath.data(), &sBuf); + if (fileStatus == -1) { + ERROR_LOG("failed to get file %s", filePath.c_str()); + return false; + } + if (S_ISREG(sBuf.st_mode) == 0) { + ERROR_LOG("%s is not a file, please enter a file", filePath.c_str()); + return false; + } + + std::ifstream file; + file.open(filePath, std::ios::binary); + if (!file.is_open()) { + ERROR_LOG("Open file failed. path = %s", filePath.c_str()); + return false; + } + + std::filebuf *buf = file.rdbuf(); + size_t size = buf->pubseekoff(0, std::ios::end, std::ios::in); + if (size == 0) { + ERROR_LOG("file size is 0"); + file.close(); + return false; + } + if (size > bufferSize) { + ERROR_LOG("file size is larger than buffer size"); + file.close(); + return false; + } + buf->pubseekpos(0, std::ios::in); + buf->sgetn(static_cast(buffer), size); + fileSize = size; + file.close(); + return true; +} + +bool WriteFile(const std::string &filePath, const void *buffer, size_t size) +{ + if (buffer == nullptr) { + ERROR_LOG("Write file failed. buffer is nullptr"); + return false; + } + + int fd = open(filePath.c_str(), O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWRITE); + if (fd < 0) { + ERROR_LOG("Open file failed. 
path = %s", filePath.c_str()); + return false; + } + + auto writeSize = write(fd, buffer, size); + (void) close(fd); + if (writeSize != size) { + ERROR_LOG("Write file Failed."); + return false; + } + + return true; +} + +int64_t GetShapeSize(const std::vector &shape) +{ + int64_t shapeSize = 1; + for (auto i : shape) { + shapeSize *= i; + } + return shapeSize; +} + +int Init(int32_t deviceId, aclrtStream *stream) +{ + // 固定写法,acl初始化 + auto ret = aclInit(nullptr); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclInit failed. ERROR: %d\n", ret); return FAILED); + ret = aclrtSetDevice(deviceId); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSetDevice failed. ERROR: %d\n", ret); return FAILED); + ret = aclrtCreateStream(stream); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtCreateStream failed. ERROR: %d\n", ret); return FAILED); + + return SUCCESS; +} + +template +int CreateAclTensor(const std::vector &hostData, const std::vector &shape, void **deviceAddr, + aclDataType dataType, aclTensor **tensor) +{ + auto size = GetShapeSize(shape) * sizeof(T); + // 调用aclrtMalloc申请device侧内存 + auto ret = aclrtMalloc(deviceAddr, size, ACL_MEM_MALLOC_HUGE_FIRST); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtMalloc failed. ERROR: %d\n", ret); return FAILED); + + // 调用aclrtMemcpy将host侧数据拷贝到device侧内存上 + ret = aclrtMemcpy(*deviceAddr, size, hostData.data(), size, ACL_MEMCPY_HOST_TO_DEVICE); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtMemcpy failed. ERROR: %d\n", ret); return FAILED); + + // 调用aclCreateTensor接口创建aclTensor + *tensor = aclCreateTensor(shape.data(), shape.size(), dataType, nullptr, 0, aclFormat::ACL_FORMAT_ND, shape.data(), + shape.size(), *deviceAddr); + return SUCCESS; +} + +int main(int argc, char **argv) +{ + // 1. (固定写法)device/stream初始化, 参考acl对外接口列表 + // 根据自己的实际device填写deviceId + int32_t deviceId = 0; + aclrtStream stream; + auto ret = Init(deviceId, &stream); + CHECK_RET(ret == 0, LOG_PRINT("Init acl failed. ERROR: %d\n", ret); return FAILED); + + // 2. 
构造输入与输出,需要根据API的接口自定义构造 + std::vector inputShape = {1}; + std::vector outputShape = {20}; + void *inputStartDeviceAddr = nullptr; + void *inputStopDeviceAddr = nullptr; + void *inputNumDeviceAddr = nullptr; + void *outputDeviceAddr = nullptr; + aclTensor *inputStart = nullptr; + aclTensor *inputStop = nullptr; + aclTensor *inputNum = nullptr; + aclTensor *output = nullptr; + size_t inputShapeSize_1 = inputShape[0]; + size_t outputShapeSize_1= outputShape[0]; + size_t dataType = 4; + std::vector inputStartHostData(inputShape[0]); + std::vector inputStopHostData(inputShape[0]); + std::vector inputNumHostData(inputShape[0]); + std::vector outputHostData(outputShape[0]); + + size_t fileSize = 0; + void** input1 = (void**)(&inputStartHostData); + void** input2 = (void**)(&inputStopHostData); + void** input3 = (void**)(&inputNumHostData); + //读取数据 + ReadFile("../input/start.bin", fileSize, *input1, inputShapeSize_1 * dataType); + ReadFile("../input/stop.bin", fileSize, *input2, inputShapeSize_1 * dataType); + ReadFile("../input/num.bin", fileSize, *input3, inputShapeSize_1 * dataType); + + INFO_LOG("Set input success"); + // 创建input aclTensor + ret = CreateAclTensor(inputStartHostData, inputShape, &inputStartDeviceAddr, aclDataType::ACL_FLOAT, &inputStart); + CHECK_RET(ret == ACL_SUCCESS, return FAILED); + ret = CreateAclTensor(inputStopHostData, inputShape, &inputStopDeviceAddr, aclDataType::ACL_FLOAT, &inputStop); + CHECK_RET(ret == ACL_SUCCESS, return FAILED); + ret = CreateAclTensor(inputNumHostData, inputShape, &inputNumDeviceAddr, aclDataType::ACL_INT32, &inputNum); + CHECK_RET(ret == ACL_SUCCESS, return FAILED); + // 创建output aclTensor + ret = CreateAclTensor(outputHostData, outputShape, &outputDeviceAddr, aclDataType::ACL_FLOAT, &output); + CHECK_RET(ret == ACL_SUCCESS, return FAILED); + + // 3. 
调用CANN自定义算子库API + uint64_t workspaceSize = 0; + aclOpExecutor *executor; + // 计算workspace大小并申请内存 + ret = aclnnLinSpaceV2GetWorkspaceSize(inputStart, inputStop, inputNum, output, &workspaceSize, &executor); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclnnLinSpaceV2GetWorkspaceSize failed. ERROR: %d\n", ret); return FAILED); + void *workspaceAddr = nullptr; + if (workspaceSize > 0) { + ret = aclrtMalloc(&workspaceAddr, workspaceSize, ACL_MEM_MALLOC_HUGE_FIRST); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("allocate workspace failed. ERROR: %d\n", ret); return FAILED;); + } + // 执行算子 + ret = aclnnLinSpaceV2(workspaceAddr, workspaceSize, executor, stream); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclnnLinSpaceV2 failed. ERROR: %d\n", ret); return FAILED); + + // 4. (固定写法)同步等待任务执行结束 + ret = aclrtSynchronizeStream(stream); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("aclrtSynchronizeStream failed. ERROR: %d\n", ret); return FAILED); + + // 5. 获取输出的值,将device侧内存上的结果拷贝至host侧,需要根据具体API的接口定义修改 + auto size = GetShapeSize(outputShape); + std::vector resultData(size, 0); + ret = aclrtMemcpy(resultData.data(), resultData.size() * sizeof(resultData[0]), outputDeviceAddr, + size * sizeof(float), ACL_MEMCPY_DEVICE_TO_HOST); + CHECK_RET(ret == ACL_SUCCESS, LOG_PRINT("copy result from device to host failed. ERROR: %d\n", ret); return FAILED); + void** output1 = (void**)(&resultData); + //写出数据 + WriteFile("../output/output.bin", *output1, outputShapeSize_1 * dataType); + INFO_LOG("Write output success"); + + // 6. 释放aclTensor,需要根据具体API的接口定义修改 + aclDestroyTensor(inputStart); + aclDestroyTensor(inputStop); + aclDestroyTensor(inputNum); + aclDestroyTensor(output); + + // 7. 
释放device资源,需要根据具体API的接口定义修改 + aclrtFree(inputStartDeviceAddr); + aclrtFree(inputStopDeviceAddr); + aclrtFree(inputNumDeviceAddr); + aclrtFree(outputDeviceAddr); + if (workspaceSize > 0) { + aclrtFree(workspaceAddr); + } + aclrtDestroyStream(stream); + aclrtResetDevice(deviceId); + aclFinalize(); + + return SUCCESS; +} \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/run.sh b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/run.sh new file mode 100644 index 0000000000000000000000000000000000000000..a9f2c5736625923c9ba40f0134b8401b01323e5a --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/run.sh @@ -0,0 +1,54 @@ +#!/bin/bash +# Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. +# This file is a part of the CANN Open Software. +# Licensed under CANN Open Software License Agreement Version 1.0 (the "License"). +# Please refer to the License for details. You may not use this file except in compliance with the License. +# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, +# INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. +# See LICENSE in the root of the software repository for the full text of the License. 
+# ====================================================================================================================== + +if [ -n "$ASCEND_INSTALL_PATH" ]; then + _ASCEND_INSTALL_PATH=$ASCEND_INSTALL_PATH +elif [ -n "$ASCEND_HOME_PATH" ]; then + _ASCEND_INSTALL_PATH=$ASCEND_HOME_PATH +else + if [ -d "$HOME/Ascend/ascend-toolkit/latest" ]; then + _ASCEND_INSTALL_PATH=$HOME/Ascend/ascend-toolkit/latest + else + _ASCEND_INSTALL_PATH=/usr/local/Ascend/ascend-toolkit/latest + fi +fi +source $_ASCEND_INSTALL_PATH/bin/setenv.bash +export DDK_PATH=$_ASCEND_INSTALL_PATH +export NPU_HOST_LIB=$_ASCEND_INSTALL_PATH/lib64 + +rm -rf $HOME/ascend/log/* +rm ./input/*.bin +rm ./output/*.bin + +python3 gen_data.py + +if [ $? -ne 0 ]; then + echo "ERROR: generate input data failed!" + return 1 +fi +echo "INFO: generate input data success!" +set -e +rm -rf build +mkdir -p build +cmake -B build +cmake --build build -j +( + cd build + ./execute_lin_space_v2_op +) +ret=`python3 verify_result.py output/output.bin output/golden.bin` +echo $ret +if [ "x$ret" == "xtest pass" ]; then + echo "" + echo "#####################################" + echo "INFO: you have passed the Precision!" + echo "#####################################" + echo "" +fi \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/verify_result.py b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/verify_result.py new file mode 100644 index 0000000000000000000000000000000000000000..668d52449b2f3da830f4c8bd0dbf475fb0055b25 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/examples/AclNNInvocationNaive/verify_result.py @@ -0,0 +1,37 @@ +#!/usr/bin/python3 +# -*- coding:utf-8 -*- +# Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. +# This file is a part of the CANN Open Software. +# Licensed under CANN Open Software License Agreement Version 1.0 (the "License"). +# Please refer to the License for details. 
You may not use this file except in compliance with the License. +# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, +# INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. +# See LICENSE in the root of the software repository for the full text of the License. +# ====================================================================================================================== + +import os +import sys +import numpy as np + +LOSS = 1e-4 # 容忍偏差,一般fp32要求绝对误差和相对误差均不超过万分之一 +MINIMUM = 10e-10 + + +def verify_result(real_result, golden): + dtype = np.float32 + real_result = np.fromfile(real_result, dtype=dtype) # 从bin文件读取实际运算结果 + golden = np.fromfile(golden, dtype=dtype) # 从bin文件读取预期运算结果 + result = np.abs(real_result - golden) # 计算运算结果和预期结果偏差 + deno = np.maximum(np.abs(real_result), np.abs(golden)) # 获取最大值并组成新数组 + result_atol = np.less_equal(result, LOSS) # 计算绝对误差 + result_rtol = np.less_equal(result / np.add(deno, MINIMUM), LOSS) # 计算相对误差 + if not result_rtol.all() and not result_atol.all(): + if np.sum(result_rtol == False) > real_result.size * LOSS and \ + np.sum(result_atol == False) > real_result.size * LOSS: # 误差超出预期时返回打印错误,返回对比失败 + print("[ERROR] result error") + return False + print("test pass") + return True + +if __name__ == '__main__': + verify_result(sys.argv[1], sys.argv[2]) diff --git a/operator_contrib/LinSpaceV2Sample/framework/CMakeLists.txt b/operator_contrib/LinSpaceV2Sample/framework/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..b6be9b492610f4d45b25bb7725648df9aac39a12 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/framework/CMakeLists.txt @@ -0,0 +1,11 @@ +if(NOT EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/mindspore") + if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/caffe_plugin") + add_subdirectory(caffe_plugin) + endif() + if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/tf_plugin") + add_subdirectory(tf_plugin) + endif() + 
if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/onnx_plugin") + add_subdirectory(onnx_plugin) + endif() +endif() diff --git a/operator_contrib/LinSpaceV2Sample/framework/tf_plugin/CMakeLists.txt b/operator_contrib/LinSpaceV2Sample/framework/tf_plugin/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..22edc5273ed352e79068ca180380924abe63ea9d --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/framework/tf_plugin/CMakeLists.txt @@ -0,0 +1,13 @@ +aux_source_directory(${CMAKE_CURRENT_SOURCE_DIR} plugin_srcs) +add_library(cust_tf_parsers SHARED ${plugin_srcs}) +target_compile_definitions(cust_tf_parsers PRIVATE google=ascend_private) +if(ENABLE_CROSS_COMPILE) + target_link_directories(cust_tf_parsers PRIVATE + ${CMAKE_COMPILE_COMPILER_LIBRARY} + ${CMAKE_COMPILE_RUNTIME_LIBRARY} + ) +endif() +target_link_libraries(cust_tf_parsers PRIVATE intf_pub graph) +install(TARGETS cust_tf_parsers + LIBRARY DESTINATION packages/vendors/${vendor_name}/framework/tensorflow +) diff --git a/operator_contrib/LinSpaceV2Sample/framework/tf_plugin/tensorflow_lin_space_v2_plugin.cc b/operator_contrib/LinSpaceV2Sample/framework/tf_plugin/tensorflow_lin_space_v2_plugin.cc new file mode 100644 index 0000000000000000000000000000000000000000..f8ee69d240c644d9e146dc699270bb414466040a --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/framework/tf_plugin/tensorflow_lin_space_v2_plugin.cc @@ -0,0 +1,19 @@ +/** + * Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. + * This file is a part of the CANN Open Software. + * Licensed under CANN Open Software License Agreement Version 1.0 (the "License"). + * Please refer to the License for details. You may not use this file except in compliance with the License. + * THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, + * INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. 
+ * See LICENSE in the root of the software repository for the full text of the License. + */ + +#include "register/register.h" + +namespace domi { +// register op info to GE +REGISTER_CUSTOM_OP("LinSpaceV2") + .FrameworkType(TENSORFLOW) // type: CAFFE, TENSORFLOW + .OriginOpType("LinSpaceV2") // name in tf module + .ParseParamsByOperatorFn(AutoMappingByOpFn); +} // namespace domi diff --git a/operator_contrib/LinSpaceV2Sample/op_host/lin_space_v2.cpp b/operator_contrib/LinSpaceV2Sample/op_host/lin_space_v2.cpp new file mode 100644 index 0000000000000000000000000000000000000000..996e0fe23cd4265133f1d9394b4c34340768ab58 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/op_host/lin_space_v2.cpp @@ -0,0 +1,105 @@ + +/* + * @file lin_space_v2.cpp + * + * Copyright (c) Huawei Technologies Co., Ltd. 2024. All rights reserved. + * This program is free software; you can redistribute it and/or modify + * it under the terms of the Apache License Version 2.0. + * You may not use this file except in compliance with the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+ */ + +#include "lin_space_v2_tiling.h" +#include "register/op_def_registry.h" +#include "tiling/platform/platform_ascendc.h" + +namespace optiling { +static ge::graphStatus TilingFunc(gert::TilingContext* context) +{ + LinSpaceV2TilingData tiling; + int dtype = context->GetInputDesc(0)->GetDataType(); + if(dtype == ge::DT_BF16) { + context->SetTilingKey(100); + } else if(dtype == ge::DT_INT8) { + context->SetTilingKey(200); + } else if(dtype == ge::DT_INT32) { + context->SetTilingKey(201); + } else if(dtype == ge::DT_UINT8) { + context->SetTilingKey(202); + } else if(dtype == ge::DT_INT16) { + context->SetTilingKey(203); + } else if(dtype == ge::DT_FLOAT16) { + context->SetTilingKey(300); + } else { + context->SetTilingKey(400); + } + + const gert::Tensor *num_tensor = context->GetInputTensor(2); + if (num_tensor == nullptr) { + return ge::GRAPH_FAILED; + } + auto numL = num_tensor->GetData(); + int32_t num = *numL; + if(num == 1) { + context->SetTilingKey(900); + } + + context->SetBlockDim(1); + tiling.SaveToBuffer(context->GetRawTilingData()->GetData(), context->GetRawTilingData()->GetCapacity()); + context->GetRawTilingData()->SetDataSize(tiling.GetDataSize()); + + return ge::GRAPH_SUCCESS; +} +} + + +namespace ge { +static ge::graphStatus InferShape(gert::InferShapeContext* context) +{ + const gert::Shape* x1_shape = context->GetInputShape(0); + gert::Shape* y_shape = context->GetOutputShape(0); + *y_shape = *x1_shape; + return GRAPH_SUCCESS; +} +} + + +namespace ops { +class LinSpaceV2 : public OpDef { +public: + explicit LinSpaceV2(const char* name) : OpDef(name) + { + this->Input("start") + .ParamType(REQUIRED) + .DataType({ge::DT_FLOAT, ge::DT_INT8, ge::DT_UINT8, ge::DT_INT32, ge::DT_INT16, ge::DT_FLOAT16, ge::DT_BF16}) + .Format({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}) + .UnknownShapeFormat({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, 
ge::FORMAT_ND}); + this->Input("stop") + .ParamType(REQUIRED) + .DataType({ge::DT_FLOAT, ge::DT_INT8, ge::DT_UINT8, ge::DT_INT32, ge::DT_INT16, ge::DT_FLOAT16, ge::DT_BF16}) + .Format({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}) + .UnknownShapeFormat({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}); + this->Input("num_axes") + .ParamType(REQUIRED) + .DataType({ge::DT_INT32, ge::DT_INT32, ge::DT_INT32, ge::DT_INT32, ge::DT_INT32, ge::DT_INT32, ge::DT_INT32}) + .Format({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}) + .UnknownShapeFormat({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}); + this->Output("output") + .ParamType(REQUIRED) + .DataType({ge::DT_FLOAT, ge::DT_INT8, ge::DT_UINT8, ge::DT_INT32, ge::DT_INT16, ge::DT_FLOAT16, ge::DT_BF16}) + .Format({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}) + .UnknownShapeFormat({ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND, ge::FORMAT_ND}); + + this->SetInferShape(ge::InferShape); + + this->AICore() + .SetTiling(optiling::TilingFunc); + this->AICore().AddConfig("ascend310b"); + } +}; + +OP_ADD(LinSpaceV2); +} diff --git a/operator_contrib/LinSpaceV2Sample/op_host/lin_space_v2_tiling.h b/operator_contrib/LinSpaceV2Sample/op_host/lin_space_v2_tiling.h new file mode 100644 index 0000000000000000000000000000000000000000..bb9b5e9001d568c54bfc265b6495825cad5ffbaf --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/op_host/lin_space_v2_tiling.h @@ -0,0 +1,26 @@ +/* + * @file lin_space_v2_tiling.h + * + * Copyright (c) Huawei Technologies Co., Ltd. 2024. All rights reserved. 
+ * This program is free software; you can redistribute it and/or modify + * it under the terms of the Apache License Version 2.0. + * You may not use this file except in compliance with the License. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + */ +#ifndef LIN_SPACE_V2_TILING_H +#define LIN_SPACE_V2_TILING_H + +#include "register/tilingdata_base.h" + +namespace optiling { +BEGIN_TILING_DATA_DEF(LinSpaceV2TilingData) + TILING_DATA_FIELD_DEF(uint32_t, CoreDataNum); +END_TILING_DATA_DEF; + +REGISTER_TILING_DATA_CLASS(LinSpaceV2, LinSpaceV2TilingData) +} + +#endif \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/op_kernel/lin_space_v2.cpp b/operator_contrib/LinSpaceV2Sample/op_kernel/lin_space_v2.cpp new file mode 100644 index 0000000000000000000000000000000000000000..28a59e51ede82919ce8e185ce1a3a462f93bdb13 --- /dev/null +++ b/operator_contrib/LinSpaceV2Sample/op_kernel/lin_space_v2.cpp @@ -0,0 +1,175 @@ + +/* +* @file lin_space_v2.cpp +* +* Copyright (c) Huawei Technologies Co., Ltd. 2024. All rights reserved. +* This program is free software; you can redistribute it and/or modify +* it under the terms of the Apache License Version 2.0. +* You may not use this file except in compliance with the License. +* +* This program is distributed in the hope that it will be useful, +* but WITHOUT ANY WARRANTY; without even the implied warranty of +* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+*/
+
+#include "kernel_operator.h"
+using namespace AscendC;
+
+// Fast path for num_axes == 1: the output is a single point equal to start.
+template <typename typeT> class KernelLinSpaceSP1 {
+public:
+    __aicore__ inline KernelLinSpaceSP1() {}
+    __aicore__ inline void Init(GM_ADDR start, GM_ADDR stop, GM_ADDR num_axes, GM_ADDR output) {
+        ASSERT(GetBlockNum() != 0 && "block dim can not be zero!");
+        startGm.SetGlobalBuffer((__gm__ typeT*)start, 1);
+        stopGm.SetGlobalBuffer((__gm__ typeT*)stop, 1);
+        num_axesGm.SetGlobalBuffer((__gm__ int32_t*)num_axes, 1);
+        this->num_axes = num_axesGm.GetValue(0);
+        outputGm.SetGlobalBuffer((__gm__ typeT*)output, this->num_axes);
+        typeT tmp = startGm.GetValue(0);
+        outputGm.SetValue(0, tmp);
+    }
+
+private:
+    GlobalTensor<typeT> startGm;
+    GlobalTensor<typeT> stopGm;
+    GlobalTensor<int32_t> num_axesGm;
+    GlobalTensor<typeT> outputGm;
+    int32_t num_axes;
+};
+
+template <typename typeT> class KernelLinSpaceV2 {
+public:
+    __aicore__ inline KernelLinSpaceV2() {}
+    __aicore__ inline void Init(GM_ADDR start, GM_ADDR stop, GM_ADDR num_axes, GM_ADDR output, TPipe* pipeIn) {
+        ASSERT(GetBlockNum() != 0 && "block dim can not be zero!");
+        this->pipe = pipeIn;
+        startGm.SetGlobalBuffer((__gm__ typeT*)start, 1);
+        stopGm.SetGlobalBuffer((__gm__ typeT*)stop, 1);
+        num_axesGm.SetGlobalBuffer((__gm__ int32_t*)num_axes, 1);
+        this->num_axes = num_axesGm.GetValue(0);
+        outputGm.SetGlobalBuffer((__gm__ typeT*)output, this->num_axes);
+
+        // Scratch buffers for the on-chip Cast chains used by the non-float types.
+        if constexpr (std::is_same_v<typeT, bfloat16_t>) {
+            pipe->InitBuffer(tmp1, 16 * sizeof(float));
+            pipe->InitBuffer(tmp2, 16 * sizeof(typeT));
+        } else if constexpr (std::is_same_v<typeT, int8_t> || std::is_same_v<typeT, uint8_t>) {
+            pipe->InitBuffer(tmp1, 16 * sizeof(float));
+            pipe->InitBuffer(tmp2, 16 * sizeof(typeT));
+            pipe->InitBuffer(tmp3, 16 * sizeof(half));
+        } else if constexpr (std::is_same_v<typeT, int16_t> || std::is_same_v<typeT, int32_t>) {
+            pipe->InitBuffer(tmp1, 16 * sizeof(float));
+            pipe->InitBuffer(tmp2, 16 * sizeof(typeT));
+        }
+    }
+    __aicore__ inline void Process() {
+        if constexpr (std::is_same_v<typeT, bfloat16_t>) {
+            LocalTensor<float> tmp1Local = tmp1.Get<float>();
+            LocalTensor<typeT> tmp2Local = tmp2.Get<typeT>();
+            typeT start = startGm.GetValue(0);
+            typeT stop = stopGm.GetValue(0);
+            outputGm.SetValue(0, start);
+            outputGm.SetValue(this->num_axes - 1, stop);
+            tmp2Local.SetValue(0, start);
+            tmp2Local.SetValue(1, stop);
+            Cast(tmp1Local, tmp2Local, RoundMode::CAST_NONE, 16);
+            float startfp = tmp1Local.GetValue(0);
+            float stopfp = tmp1Local.GetValue(1);
+            float interval = (stopfp - startfp) / (this->num_axes - 1);
+            for (int32_t k = 1; k < this->num_axes - 1; k++) {
+                float tmpvalue = startfp + (k * interval);
+                tmp1Local.SetValue(0, tmpvalue);
+                Cast(tmp2Local, tmp1Local, RoundMode::CAST_RINT, 16);
+                typeT tmpbf16 = tmp2Local.GetValue(0);
+                outputGm.SetValue(k, tmpbf16);
+            }
+        } else if constexpr (std::is_same_v<typeT, int8_t>) {
+            LocalTensor<float> tmp1Local = tmp1.Get<float>();
+            LocalTensor<typeT> tmp2Local = tmp2.Get<typeT>();
+            LocalTensor<half> tmp3Local = tmp3.Get<half>();
+            float start = startGm.GetValue(0);
+            float stop = stopGm.GetValue(0);
+            outputGm.SetValue(0, start);
+            outputGm.SetValue(this->num_axes - 1, stop);
+            float interval = (stop - start) / (this->num_axes - 1);
+            // Quantize the step to half precision before accumulating on-chip.
+            half tmpfp16 = interval;
+            interval = tmpfp16;
+            for (int32_t k = 1; k < this->num_axes - 1; k++) {
+                tmp3Local.SetValue(0, interval);
+                Muls(tmp3Local, tmp3Local, half(k), 1);
+                Adds(tmp3Local, tmp3Local, half(start), 1);
+                Cast(tmp2Local, tmp3Local, RoundMode::CAST_TRUNC, 16);
+                typeT tmpint = tmp2Local.GetValue(0);
+                outputGm.SetValue(k, tmpint);
+            }
+        } else if constexpr (std::is_same_v<typeT, uint8_t>) {
+            LocalTensor<float> tmp1Local = tmp1.Get<float>();
+            LocalTensor<typeT> tmp2Local = tmp2.Get<typeT>();
+            LocalTensor<half> tmp3Local = tmp3.Get<half>();
+            typeT startori = startGm.GetValue(0);
+            typeT stopori = stopGm.GetValue(0);
+            tmp2Local.SetValue(0, startori);
+            tmp2Local.SetValue(1, stopori);
+            Cast(tmp3Local, tmp2Local, RoundMode::CAST_NONE, 16);
+            float start = tmp3Local.GetValue(0);
+            float stop = tmp3Local.GetValue(1);
+            outputGm.SetValue(0, startori);
+            outputGm.SetValue(this->num_axes - 1, stopori);
+            float interval = (stop - start) / (this->num_axes - 1);
+            for (int32_t k = 1; k < this->num_axes - 1; k++) {
+                float tmpvalue = start + (k * interval);
+                tmp1Local.SetValue(0, tmpvalue);
+                Cast(tmp3Local, tmp1Local, RoundMode::CAST_RINT, 16);
+                Cast(tmp2Local, tmp3Local, RoundMode::CAST_TRUNC, 16);
+                typeT tmpint = tmp2Local.GetValue(0);
+                outputGm.SetValue(k, tmpint);
+            }
+        } else if constexpr (std::is_same_v<typeT, int16_t> || std::is_same_v<typeT, int32_t>) {
+            LocalTensor<float> tmp1Local = tmp1.Get<float>();
+            LocalTensor<typeT> tmp2Local = tmp2.Get<typeT>();
+            float start = startGm.GetValue(0);
+            float stop = stopGm.GetValue(0);
+            outputGm.SetValue(0, start);
+            outputGm.SetValue(this->num_axes - 1, stop);
+            float interval = (stop - start) / (this->num_axes - 1);
+            for (int32_t k = 1; k < this->num_axes - 1; k++) {
+                float tmpvalue = start + (k * interval);
+                tmp1Local.SetValue(0, tmpvalue);
+                Cast(tmp2Local, tmp1Local, RoundMode::CAST_TRUNC, 16);
+                typeT tmpint = tmp2Local.GetValue(0);
+                outputGm.SetValue(k, tmpint);
+            }
+        } else if constexpr (std::is_same_v<typeT, float> || std::is_same_v<typeT, half>) {
+            float start = startGm.GetValue(0);
+            float stop = stopGm.GetValue(0);
+            outputGm.SetValue(0, start);
+            outputGm.SetValue(this->num_axes - 1, stop);
+            float interval = (stop - start) / (this->num_axes - 1);
+            for (int32_t k = 1; k < this->num_axes - 1; k++) {
+                float tmpvalue = start + (k * interval);
+                outputGm.SetValue(k, tmpvalue);
+            }
+        }
+    }
+
+private:
+    TPipe* pipe;
+    TBuf<TPosition::VECCALC> tmp1, tmp2, tmp3;
+    GlobalTensor<typeT> startGm;
+    GlobalTensor<typeT> stopGm;
+    GlobalTensor<int32_t> num_axesGm;
+    GlobalTensor<typeT> outputGm;
+    int32_t num_axes;
+};
+
+extern "C" __global__ __aicore__ void lin_space_v2(GM_ADDR start, GM_ADDR stop, GM_ADDR num_axes, GM_ADDR output, GM_ADDR workspace, GM_ADDR tiling) {
+    GET_TILING_DATA(tiling_data, tiling);
+    TPipe pipe;
+    if (TILING_KEY_IS(900)) {
+        KernelLinSpaceSP1<DTYPE_OUTPUT> op;
+        op.Init(start, stop, num_axes, output);
+    } else if (TILING_KEY_IS(100) || TILING_KEY_IS(200) || TILING_KEY_IS(201) || TILING_KEY_IS(202) || TILING_KEY_IS(203) || TILING_KEY_IS(300) || TILING_KEY_IS(400)) {
+        KernelLinSpaceV2<DTYPE_OUTPUT> op;
+        op.Init(start, stop, num_axes, output,
&pipe); + op.Process(); + } +} \ No newline at end of file diff --git a/operator_contrib/LinSpaceV2Sample/opp_kernel_aicpu/.gitkeep b/operator_contrib/LinSpaceV2Sample/opp_kernel_aicpu/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/operator_contrib/LinSpaceV2Sample/tests/st/.gitkeep b/operator_contrib/LinSpaceV2Sample/tests/st/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/operator_contrib/LinSpaceV2Sample/tests/ut/.gitkeep b/operator_contrib/LinSpaceV2Sample/tests/ut/.gitkeep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/operator_contrib/README.md b/operator_contrib/README.md index da799262c54be193526c3682e60f8c2e119fbcdb..51e8ae5817a26ffea08aba64a369dff12feba873 100644 --- a/operator_contrib/README.md +++ b/operator_contrib/README.md @@ -23,6 +23,7 @@ | [LayerNormCustomSample](./LayerNormCustomSample) | 基于AscendC的LayerNorm自定义算子及调用样例 | Atlas A2训练系列产品 | 8.0.RC1.alpha003 | | [LerpSample](./LerpSample) | 基于Ascend C的Lerp自定义Vector算子及调用样例 | Atlas 200/500 A2 推理产品
Atlas A2训练系列产品/Atlas 800I A2推理产品 | 8.0.RC2.alpha003 | | [LessEqualSample](./LessEqualSample) | 基于Ascend C的LessEqual自定义Vector算子及调用样例 | Atlas 200/500 A2 推理产品 | 8.0.RC1.alpha003 | +| [LinSpaceV2Sample](./LinSpaceV2Sample) | 基于Ascend C的LinSpaceV2自定义Vector算子及调用样例 | Atlas 200/500 A2 推理产品
Atlas A2训练系列产品/Atlas 800I A2推理产品 | 8.0.0.alpha003 | | [LpNormV2CustomSample](./LpNormV2CustomSample) | 基于Ascend C的LpNormV2自定义Vector算子及调用样例 | Atlas 200/500 A2
Atlas A2训练系列产品/Atlas 800I A2推理产品 | 8.0.RC1.alpha003 | | [MoeSoftMaxTopkCustomSample](./MoeSoftMaxTopkCustomSample) | 基于AscendC的MoeSoftMaxTopk自定义算子及调用样例 | Atlas A2训练系列产品 | 8.0.RC1.alpha003 | | [MseLossGradSample](./MseLossGradSample) | 基于Ascend C的MseLossGrad自定义Vector算子及调用样例 | Atlas 200/500 A2 推理产品
Atlas A2训练系列产品/Atlas 800I A2推理产品 | 8.0.RC1.alpha003 |