# zDNN

**Repository Path**: mirrors_ibm/zDNN

## Basic Information

- **Project Name**: zDNN
- **Description**: IBM Z Deep Neural Network Library (zDNN) provides an interface for applications making use of Neural Network Processing Assist Facility (NNPA).
- **License**: Apache-2.0
- **Default Branch**: main
- **Created**: 2021-10-22
- **Last Updated**: 2025-09-06

## README

# zDNN API Reference

## Contacts

- Nicholas Marion
- Andreas Krebbel
- Steven Jones

## Version

1.2.0

## Table of Contents

1. [Overview](#overview)
2. [Environment](#environment)
3. [Building zDNN](#building-and-installing-zdnn)
4. [Common Data Types and Structs](#common-types-and-structs)
   - [Version Information](#common-version-info)
   - [zDNN zTensor](#common-ztensor)
     - [General zTensor Requirements](#gen-zten-reqs)
     - [Concatenated zTensor Requirements](#concat-zten-reqs)
     - [Quantized zTensor Requirements](#quan-zten-reqs)
   - [zDNN Tensor Descriptors](#common-descriptors)
   - [zDNN Data Layouts](#common-layouts)
   - [zDNN Data Formats](#common-formats)
   - [zDNN Data Types](#common-types)
   - [zDNN Quantized Transform Types](#quantized-transform-types)
   - [zDNN Statuses](#common-statuses)
5. [Runtime Environment Variables](#env-vars)
6. [Validating the Runtime Environment](#runtime-val)
7. [API Reference](#api-reference)
   - [Support Functions](#support-functions)
   - [Data Transformation](#data-transformation)
   - [Operations](#operations)
     - [Element-wise](#elwise-ops)
     - [Activation](#act-ops)
     - [Normalization](#norm-ops)
     - [Matmul with Operation](#zdnn_matmul_op)
     - [Matmul Broadcast with Operation](#zdnn_matmul_bcast_op)
     - [Matmul Transpose with Operation](#zdnn_matmul_transpose_op)
     - [Quantized Matmul Operation](#zdnn_quantized_matmul_op)
     - [LSTM](#zdnn_lstm)
     - [GRU](#zdnn_gru)
     - [Average Pool 2D](#zdnn_avgpool2d)
     - [Max Pool 2D](#zdnn_maxpool2d)
     - [Convolution 2D](#zdnn_conv2d)
   - [Convenience Functions](#convenience-functions)
8. [Usage Examples](#usage-examples)

## Overview

**Deep Learning Library** - the deep learning library support (zDNN) is the SW enablement technology provided by IBM to meet the following requirements:

- Specialized-function-assist instructions are intended to provide performance improvements for specific operations used in software libraries, utilities, and operating system (OS) services. The facilities and instructions described as specialized-function-assist instructions may be replaced or removed in the future. As such, the IBM recommendation for these instructions is that a software library or operating system function be used instead of directly accessing the instructions. This is the function provided by zDNN.
- zAIU has very complex data layout requirements; these requirements arrange the tensor to enhance the performance characteristics of the operations. zDNN will format the tensor appropriately on behalf of the caller, and it will do so using an optimized approach.
- For deep learning operations, zAIU requires the use of internal data types:
  - DLFLOAT16, a 2-byte data type supported in Telum I, which optimizes training and inference while minimizing the loss of accuracy at inference time (versus standard 4-byte formats),
  - INT8, a 1-byte data type supported with Telum II, which allows tensor quantization features.

The zDNN library provides a set of APIs that an exploiter will utilize to drive the desired request. zDNN will be available on both z/OS and Linux on Z; the inclusion of Linux on Z provides particular benefit, as it will allow us to enable acceleration in frameworks for z/OS via z/OS Container Extensions (zCX).

---

## Environment

z/OS:

- Problem state
- AMODE64
- XPLINK

### Alignment requirements

#### zAIU Op Limits

_This implies a zDNN limitation as well at this point._

- For all ops:
  - Number of elements in any dimension must not exceed the value returned by `zdnn_get_max_for_dim(uint8_t dimension)`
  - Total number of bytes required for storing a transformed tensor must not exceed the value returned by `zdnn_get_nnpa_max_tensor_size()`

### Application interfaces for zAIU Enterprise Neural Network Inference

#### zDNN General

The zDNN deep learning library provides the standard IBM Z software interface to the zAIU. This IBM-provided C library provides a set of functions that handle the data transformation requirements of the zAIU and provide wrapper functions for the NNPA instruction primitives.

The zDNN functions use the following criteria to determine if zAIU can be used to accelerate a deep learning primitive:

- Neural Network Processing Assist (NNPA) facility indicator in the system STFLE output.
- Output of the NNPA-QAF (Query Available Functions) request.

#### Using zDNN

To use the IBM-provided zDNN C library for the NNPA instruction, follow these steps:

1. Link or re-link applications to use the IBM-provided zDNN. The IBM-provided zDNN is a library file in the z/OS UNIX System Services file system and can be statically or dynamically linked into your applications.
   The paths for the zDNN archive file and the zDNN header files are:

**z/OS (LE required):** Path for 64-bit dynamic library files:

- `/lib/libzdnn.so`
- `/lib/libzdnn.x`

Path for the zDNN header files:

- `/usr/include/`

The XL C/C++ compiler and the z/OS Language Environment provide various environment variables to control processing, in addition to the variables provided by the zDNN library itself.

1. Use the environment variable `_CEE_RUNOPTS` to specify invocation Language Environment runtime options. For more information about using the environment variable `_CEE_RUNOPTS` and other C and LE variables, see z/OS XL C/C++ Programming Guide.
2. For environment variables accepted by the zDNN library, see [Runtime Environment Variables](#env-vars).

**Linux on Z:**

On Linux on Z we expect to ship source as well as a package-installable library and header. The library installation will conform to the standards of the packaging method chosen.

---

## Building and Installing zDNN

### Clone the Repository and Submodules

```
git clone --recurse-submodules git@github.com:IBM/zDNN.git
```

### Create configure script

To create the configure script:

```
autoreconf .
```

### Configure Build

Prepare the build and install environment and check for necessary dependencies using the `./configure` script.

```
./configure [OPTION]... [VAR=VALUE]...
```

#### Installation Options

- `--prefix=PREFIX`
  - Install architecture-independent files in PREFIX. Default location is `/usr/local`
- `--exec-prefix=EPREFIX`
  - Install architecture-dependent files in EPREFIX. Default location is `PREFIX`

_To explore all available configuration options and features, use `-h`_

### Build Library

Compile zDNN library using:

```
make build
```

### Run Tests

To run tests:

```
make test
```

#### Unity Requirement

_Please note that the Unity test framework source code is required to run unit tests. If you did not clone submodules along with the initial zDNN clone, please perform the following steps to set up Unity prior to issuing `make test`:_

1. Clone the source code from the [Throw The Switch - Unity](https://github.com/ThrowTheSwitch/Unity) repository.
2. Set the `UNITY_ROOT` environment variable to the folder containing the Unity source code.

#### Python Package Requirements

_Please note that `junit_xml` and `pyparsing` are required python packages in order to properly parse and format Unity test results. Follow standard python package installation practices to meet requirements._

### Install

Install zDNN library:

```
sudo make install
```

### Reference Commands

Configure help:

```
./configure -h
```

Make help:

```
make help
```

### Prerequisite Tools

Compilers:

- `GCC: GNU Compiler Collection (gcc)`

or

- `IBM XL C/C++: (xlc)`

Build Tools and Dependencies:

- `Autoconf`
- `Make`
- `Unity`
- `Python Packages` _For formatting test results_
  - junit_xml
  - pyparsing

---

## Common Types and Structs

Include Files: `zdnn.h`

### Version Information

[Back to Table of Contents](#TOC)

```C
#define ZDNN_VERSION "1.2.0"
#define ZDNN_VERNUM 0x010200 // 0x[major][minor][patch]
#define ZDNN_VER_MAJOR 1
#define ZDNN_VER_MINOR 2
#define ZDNN_VER_PATCH 0
```

1. zDNN major version (_ZDNN_VER_MAJOR_) will be incremented if any backwards incompatible changes are introduced to the API. It may also include minor and patch level changes. Patch and minor version will be reset to 0 when major version is incremented.
2. zDNN minor version (_ZDNN_VER_MINOR_) will be incremented if new, backwards compatible functionalities are introduced to the API or if any API functionalities are marked as deprecated. It may also include patch level changes. Patch version will be reset to 0 when minor version is incremented.
3. zDNN patch version (_ZDNN_VER_PATCH_) will be incremented if only backwards compatible bug fixes are introduced.
   A bug fix is defined as an internal change that fixes incorrect behavior.

Functions for checking version incompatibility with the zDNN load library are provided and described in the [Support Functions](#support-functions) section.

### zDNN zTensor

[Back to Table of Contents](#TOC)

```C
typedef struct zdnn_ztensor {
  zdnn_tensor_desc *pre_transformed_desc; // tensor's shape information before transformation
  zdnn_tensor_desc *transformed_desc;     // transformed tensor's shape information
  uint64_t buffer_size;                   // tensor size in bytes
  void *buffer;                           // pointer to the tensor in memory
  bool is_transformed;                    // indicator if data in buffer has been transformed
  char reserved[3];                       // not currently used, should contain zeros.
  float rec_scale;                        // the scale factor for quantization, stored as reciprocal
  float offset;                           // the offset for quantization
  char reserved2[20];                     // not currently used, should contain zeros.
} zdnn_ztensor;
```

#### General zTensor Requirements

[Back to Table of Contents](#TOC)

- `buffer` requirements:
  - Calling [zdnn_init_ztensor_with_malloc](#zdnn_init_ztensor_with_malloc) automatically allocates and sets a valid `buffer` for a tensor.
  - `buffer` field must point to allocated storage of sufficient size to contain the transformed tensor data described by its `transformed_desc` field.
    - Calling [zdnn_getsize_ztensor](#zdnn_getsize_ztensor) with the tensor's `transformed_desc` returns the required size.
  - Start of `buffer` field must be 4k aligned.
- `reserved` should contain zeros, otherwise the program may not operate compatibly in the future.
  - Calling [zdnn_init_ztensor](#zdnn_init_ztensor) or [zdnn_init_ztensor_with_malloc](#zdnn_init_ztensor_with_malloc) will set `reserved` to zeros.

#### Concatenated zTensor Requirements

[Back to Table of Contents](#TOC)

- For use with weights/biases/hidden-weights/hidden-biases RNN-gates tensors.
- You must use [zdnn_generate_transformed_desc_concatenated](#zdnn_generate_transformed_desc_concatenated) with the appropriate concatenation info
  - Do not use `zdnn_generate_transformed_desc` with concatenated tensors
- The pre-transformed shape dimensions should not include the concatenation.
  - Thus, the pre-transformed shape should be that of a single gate, not the shape of the combined gates
- Afterward transform with [zdnn_transform_ztensor](#zdnn_transform_ztensor) as normal
- Must follow [general tensor requirements](#gen-zten-reqs)

#### Quantized zTensor Requirements

[Back to Table of Contents](#TOC)

- Supported `transformed_desc` and `pre_transformed_desc` types for [zdnn_transform_quantized_ztensor](#zdnn_transform_quantized_ztensor) and [zdnn_generate_quantized_transformed_desc](#zdnn_generate_quantized_transformed_desc):
  - `ZDNN_FORMAT_4DFEATURE` format:
    - ZDNN_DLFLOAT16
      - FP16, FP32, BFLOAT
    - ZDNN_BINARY_INT8
      - INT8, FP16, FP32, BFLOAT
  - `ZDNN_FORMAT_4DWEIGHTS` format:
    - ZDNN_BINARY_INT8
      - INT8

### zDNN Tensor Descriptors

[Back to Table of Contents](#TOC)

```C
typedef struct zdnn_tensor_desc {
  zdnn_data_layouts layout; // data layout
  zdnn_data_formats format; // internal use only
  zdnn_data_types type;     // data type
  uint32_t dim4;            // number of elements in outermost dimension
  uint32_t dim3;            // ... outer dimension
  uint32_t dim2;            // ... inner dimension
  uint32_t dim1;            // number of elements in innermost dimension
} zdnn_tensor_desc;
```

#### Programming Notes

- Helper methods [zdnn_init_pre_transformed_desc](#zdnn_init_pre_transformed_desc) and [zdnn_generate_transformed_desc](#zdnn_generate_transformed_desc) or [zdnn_generate_transformed_desc_concatenated](#zdnn_generate_transformed_desc_concatenated) will set the correct dims based on the layout and format.
- The [layout](#common-layouts) of the tensor descriptor affects the expected order of the dims. For example:
  - For tensors with fewer than 4 dimensions, unspecified dims:
    - In the [pre_transformed_desc](#common-ztensor) are ignored. For example a [ZDNN_3D](#common-layouts) expects values in dim4, dim3, and dim2.
    - In the [transformed_desc](#common-ztensor) "unused" dims must be 1.
  - A [ZDNN_NHWC](#common-layouts) expects dims such that dim4 = N, dim3 = H, dim2 = W, dim1 = C
  - A [ZDNN_NCHW](#common-layouts) expects dims such that dim4 = N, dim3 = C, dim2 = H, dim1 = W
  - A [ZDNN_HWCK](#common-layouts) expects dims such that dim4 = H, dim3 = W, dim2 = C, dim1 = K
- The [format](#common-formats) changes the expected dims order for [ZDNN_4D](#common-layouts) tensor layouts
  - [ZDNN_FORMAT_4DFEATURE](#common-formats) expects dims such that dim4 = N, dim3 = H, dim2 = W, dim1 = C
  - [ZDNN_FORMAT_4DKERNEL](#common-formats) expects dims such that dim4 = H, dim3 = W, dim2 = C, dim1 = K

### zDNN Data Layouts

[Back to Table of Contents](#TOC)

The following are layouts for zDNN ztensor descriptors. These indicate the number and order of dimensions to expect for the ztensor data.

```C
typedef enum zdnn_data_layouts {
  ZDNN_1D,        // 1d tensor
  ZDNN_2D,        // 2d tensor
  ZDNN_2DS,       // represents special 2D tensors required by LSTM/GRU
  ZDNN_3D,        // 3d tensor
  ZDNN_3DS,       // represents special 3D tensors required by
                  // LSTM/GRU/Softmax/Matmul
  ZDNN_ZRH,       // represents (update, reset, hidden) used by GRU
  ZDNN_4D,        // 4d tensor
  ZDNN_4DS,       // represents special 4D tensors required by LSTM/GRU output
  ZDNN_NHWC,      // 4d feature tensor in NHWC
  ZDNN_NCHW,      // 4d feature tensor in NCHW
  ZDNN_FICO,      // represents (forget, input, cell, output) used by LSTM
  ZDNN_HWCK,      // 4d kernel CNN tensor
  ZDNN_BIDIR_ZRH, // ZRH variant to work with bidirectional LSTM/GRU output
  ZDNN_BIDIR_FICO // FICO variant to work with bidirectional LSTM/GRU output
} zdnn_data_layouts;
```

Some layouts also indicate special re-arrangement of the data during ztensor transformation.

- `ZDNN_2DS` - The outermost dimension of the original shape is promoted to dim4 during transformation. For example, a shape of (a, b) becomes [a, 1, 1, b] (dim4, dim3, dim2, dim1) in the `transformed_desc`
- `ZDNN_3DS` - The outermost dimension of the original shape is promoted to dim4 during transformation. For example, a shape of (a, b, c) becomes [a, 1, b, c] (dim4, dim3, dim2, dim1) in the `transformed_desc`
- `ZDNN_4DS` - Arrangement for RNN output tensor

The following are set automatically in `transformed_desc` based on `info` when calling `zdnn_generate_transformed_desc_concatenated()`:

- `ZDNN_ZRH/FICO` - During transformation, the RNN input gates data are concatenated on the innermost dimension. Supported with `pre_transformed_layout` of `ZDNN_2DS` or `ZDNN_3DS`.
- `ZDNN_BIDIR_ZRH/FICO` - Similar to `ZDNN_ZRH/FICO`, used when:
  1. transforming RNN input weight gate data, and
  2. the input tensor for the current RNN layer is a bidirectional RNN output from a previous RNN layer

### zDNN Data Formats

[Back to Table of Contents](#TOC)

```C
typedef enum zdnn_data_formats {
  ZDNN_FORMAT_4DFEATURE, // tensor in zAIU data layout format 0
  ZDNN_FORMAT_4DKERNEL,  // tensor in zAIU data layout format 1
  ZDNN_FORMAT_4DWEIGHTS, // tensor in zAIU data layout format 2
  ZDNN_FORMAT_4DGENERIC, // tensor in zAIU data layout format 31
} zdnn_data_formats;
```

### zDNN Data Types

[Back to Table of Contents](#TOC)

```C
typedef enum zdnn_data_types {
  ZDNN_DLFLOAT16,    // 16-bit deep learning format
  ZDNN_BINARY_FP32,  // 32-bit binary-floating-point format
  ZDNN_BINARY_INT8,  // 8-bit signed or unsigned binary integer
  ZDNN_BINARY_INT32, // 32-bit signed or unsigned binary integer
  INT8,              // 8-bit signed or unsigned binary integer format
  INT32,             // 32-bit signed or unsigned binary integer format
  BFLOAT,            // Brain floating point format
  FP16,              // 16-bit IEEE-754 floating point format
  FP32,              // 32-bit IEEE-754 floating point format
} zdnn_data_types;
```

### zDNN Quantized Transform Types

[Back to
Table of Contents](#TOC)

```C
typedef enum zdnn_quantized_transform_types {
  QUANTIZED_DLFLOAT16 = 0,   // quantized dlfloat16
  QUANTIZED_INT8 = 1,        // quantized int8
  QUANTIZED_WEIGHTS_INT8 = 2 // quantized weights
} zdnn_quantized_transform_types;
```

### zDNN Statuses

[Back to Table of Contents](#TOC)

| Mnemonic Constant | Value      | Meaning  |
| ----------------- | ---------- | -------- |
| ZDNN_OK           | 0x00000000 | Success. |

#### Warning Statuses

| Mnemonic Constant            | Value      | Meaning                                                            |
| ---------------------------- | ---------- | ------------------------------------------------------------------ |
| ZDNN_ELEMENT_RANGE_VIOLATION | 0x00020001 | zAIU operation resulted in data that was out of the normal range. |

_Note: ZDNN_ELEMENT_RANGE_VIOLATION indicates a **range violation** occurred for the zAIU operation based on the data in the tensors. This usually indicates an overflow of an NNPA internal data type, but can also be associated with operation specific errors, such as "divide by zero". See the "z/Architecture Principles of Operation" for information about range violation on the operation that encountered the violation._

#### General Failing Statuses

| Mnemonic Constant              | Value      | Meaning                                                                        |
| ------------------------------ | ---------- | ------------------------------------------------------------------------------ |
| ZDNN_INVALID_SHAPE\*           | 0x00040001 | Invalid shape information in one (or more) of the input/output tensor(s).      |
| ZDNN_INVALID_LAYOUT            | 0x00040002 | Invalid layout information in one (or more) of the input/output tensor(s).     |
| ZDNN_INVALID_TYPE\*            | 0x00040003 | Invalid type information in one (or more) of the input/output tensor(s).       |
| ZDNN_INVALID_FORMAT\*          | 0x00040004 | Invalid format information in one (or more) of the input/output tensor(s).     |
| ZDNN_INVALID_DIRECTION         | 0x00040005 | Invalid RNN direction.                                                         |
| ZDNN_INVALID_CONCAT_INFO       | 0x00040006 | Invalid concatenation info.                                                    |
| ZDNN_INVALID_STRIDE_PADDING\*  | 0x00040007 | Invalid padding type parameter for current strides.                            |
| ZDNN_INVALID_STRIDES\*         | 0x00040008 | Invalid stride height or width parameter.                                      |
| ZDNN_MISALIGNED_PARMBLOCK\*    | 0x00040009 | NNPA parameter block is not on double word boundary.                           |
| ZDNN_INVALID_CLIPPING_VALUE    | 0x0004000A | Invalid clipping for the specified operation.                                  |
| ZDNN_INVALID_ADJUSTMENT_FACTOR | 0x0004000B | Invalid adjustment for the specified operation.                                |
| ZDNN_INVALID_EPSILON           | 0x0004000C | Invalid epsilon for the specified operation.                                   |
| ZDNN_INVALID_TRANSFORM_TYPE    | 0x0004000D | Invalid transformation type.                                                   |
| ZDNN_INVALID_BETA              | 0x0004000E | Invalid beta value for the specified operation.                                |
| ZDNN_INVALID_GAMMA             | 0x0004000F | Invalid gamma value for the specified operation.                               |
| ZDNN_INVALID_BESSEL_CORRECTION | 0x00040010 | Invalid Bessel correction value for the specified operation.                   |
| ZDNN_INVALID_SCALE             | 0x00040011 | Invalid scale value for the specified operation.                               |
| ZDNN_INVALID_OFFSET            | 0x00040012 | Invalid offset value for the specified operation.                              |
| ZDNN_ALLOCATION_FAILURE        | 0x00100001 | Cannot allocate storage.                                                       |
| ZDNN_INVALID_BUFFER            | 0x00100002 | Buffer address is NULL or not on 4K-byte boundary or insufficient buffer size. |
| ZDNN_CONVERT_FAILURE           | 0x00100003 | Floating point data conversion failure.                                        |
| ZDNN_INVALID_STATE             | 0x00100004 | Invalid zTensor state.                                                         |
| ZDNN_UNSUPPORTED_AIU_EXCEPTION | 0x00100005 | zAIU operation returned an unexpected exception.                               |

_Note: \*In certain scenarios, these statuses are returned only if [ZDNN_ENABLE_PRECHECK](#env-vars) is enabled. When not enabled, these scenarios will lead to abnormal program termination._

#### Hardware Statuses

The following statuses indicate issues returned from the hardware.

| Mnemonic Constant          | Value      | Meaning                                                              |
| -------------------------- | ---------- | -------------------------------------------------------------------- |
| ZDNN_UNSUPPORTED_PARMBLOCK | 0x000C0001 | NNPA parameter block format is not supported by the model.           |
| ZDNN_UNAVAILABLE_FUNCTION  | 0x000C0002 | Specified NNPA function is not defined or installed on the machine.  |
| ZDNN_UNSUPPORTED_FORMAT    | 0x000C0010 | Specified tensor data layout format is not supported.                |
| ZDNN_UNSUPPORTED_TYPE      | 0x000C0011 | Specified tensor data type is not supported.                         |
| ZDNN_EXCEEDS_MDIS          | 0x000C0012 | Tensor dimension exceeds maximum dimension index size (MDIS).        |
| ZDNN_EXCEEDS_MTS           | 0x000C0013 | Total number of bytes in tensor exceeds maximum tensor size (MTS).   |
| ZDNN_MISALIGNED_TENSOR     | 0x000C0014 | Tensor address is not on 4K-byte boundary.                           |
| ZDNN_MISALIGNED_SAVEAREA   | 0x000C0015 | Function specific save area address is not on 4K-byte boundary.      |

The meaning of the following hardware statuses varies based on operation. See the operation that returned the status for the specific meaning.

| Mnemonic Constant | Value      | Meaning                                 |
| ----------------- | ---------- | --------------------------------------- |
| ZDNN_FUNC_RC_F000 | 0x000CF000 | Function specific response code (F000). |
| ZDNN_FUNC_RC_F001 | 0x000CF001 | Function specific response code (F001). |
| ZDNN_FUNC_RC_F002 | 0x000CF002 | Function specific response code (F002). |
| ZDNN_FUNC_RC_F003 | 0x000CF003 | Function specific response code (F003). |
| ZDNN_FUNC_RC_F004 | 0x000CF004 | Function specific response code (F004). |
| ZDNN_FUNC_RC_F005 | 0x000CF005 | Function specific response code (F005). |
| ZDNN_FUNC_RC_F006 | 0x000CF006 | Function specific response code (F006). |
| ZDNN_FUNC_RC_F007 | 0x000CF007 | Function specific response code (F007). |
| ZDNN_FUNC_RC_F008 | 0x000CF008 | Function specific response code (F008). |
| ZDNN_FUNC_RC_F009 | 0x000CF009 | Function specific response code (F009). |

---

## Runtime Environment Variables

[Back to Table of Contents](#TOC)

- `ZDNN_ENABLE_PRECHECK`: true/false
  - If set to `true`, tensor integrity prechecks are run before issuing NNPA operations.
  - Enabling precheck may impact performance.
  - Enable to debug issues which cause hardware exceptions that otherwise would result in abnormal program termination.
- `ZDNN_STATUS_DIAG`: nnnnnnnn (decimal) or 0xnnnnnnnn (hexadecimal)
  - Prints or produces diagnostic information whenever zDNN status code is equal to the specified value. Only one status value can be specified.

_The following are only available when the zDNN library was built with `ZDNN_CONFIG_DEBUG` enabled._

- `ZDNN_LOGLEVEL`: off/fatal/error/warn/info/debug/trace
  - Sets logging facility's output level
- `ZDNN_LOGMODULE`: module name(s)
  - Produces log output only when the issuer's module name is in the list. You may specify multiple module names by separating them with either commas or spaces.

### Programming Notes

- Environment variable settings are checked during initial library load by [zdnn_init](#zdnn_init).
- To change environment variable settings afterward, [zdnn_init](#zdnn_init) must be called again manually.

## Validating the environment at runtime

### Programming Notes

- Most API calls require a minimum zDNN library version and hardware level for the API to function. There are three zDNN APIs for validation of the zDNN runtime environment:
  - Validating the zDNN Library version:
    - This is the version of the libzdnn package installed on the host or embedded in the runtime application.
    - The zDNN library version is independent of the hardware available on the current system.
    - zDNN APIs introduced in newer versions of the zDNN library will not exist in older versions of the library. Attempting to call them will result in application crashes.
    - The zDNN library version is returned by [zdnn_get_library_version](#zdnn_get_library_version).
  - Validating the zDNN API version:
    - This is the version of zDNN APIs that are compatible on the current system and is separate from the zDNN library version.
    - Calling zDNN APIs while running on a system which does not support that zDNN API version will return a [hardware status](#hw-statuses) instead of [ZDNN_OK](#common-statuses).
    - The zDNN API version available is returned by [zdnn_get_max_runnable_version](#zdnn_get_max_runnable_version) and is reflected in the return value of [zdnn_is_version_runnable](#zdnn_is_version_runnable).
    - zDNN API 1.0.x indicates the API requires Telum I or greater.
    - zDNN API 1.1.x indicates the API requires Telum II or greater.
  - Validating NNPA availability:
    - This indicates if the current system has zAIU hardware present and enabled.
    - It is possible to be on a system with zAIU hardware where the feature is unavailable, such as z/VM when there is a mix of hardware levels.
    - This is returned by [zdnn_is_nnpa_installed](#zdnn_is_nnpa_installed)
- Examples:
  - Given a Telum I system with zDNN 1.1.0 installed:
    - [zdnn_get_library_version](#zdnn_get_library_version) will return `0x00010100` indicating zDNN library 1.1.0 is installed.
    - [zdnn_is_nnpa_installed](#zdnn_is_nnpa_installed) will return `true` (unless the zAIU feature is disabled for the system).
    - [zdnn_get_max_runnable_version](#zdnn_get_max_runnable_version) will return `0x000100FF` indicating zDNN APIs 1.0.x and below are available for use on the system.
    - Checking [zdnn_is_version_runnable(0x00010100)](#zdnn_is_version_runnable) (1.1.0) will return `false` as only zDNN APIs 1.0.x and below are available for use on the system.
    - Checking [zdnn_is_version_runnable(0x00010000)](#zdnn_is_version_runnable) (1.0.0) will return `true` as zDNN APIs 1.0.x and below are available for use on the system.
  - Given a Telum II system with zDNN 1.1.0 installed:
    - [zdnn_get_library_version](#zdnn_get_library_version) will return `0x00010100` indicating zDNN library 1.1.0 is installed.
    - [zdnn_is_nnpa_installed](#zdnn_is_nnpa_installed) will return `true` (unless the zAIU feature is disabled for the system).
    - [zdnn_get_max_runnable_version](#zdnn_get_max_runnable_version) will return `0x000101FF` indicating zDNN APIs 1.1.x and below are available for use on the system.
    - Checking [zdnn_is_version_runnable(0x00010100)](#zdnn_is_version_runnable) (1.1.0) will return `true` as zDNN APIs 1.1.x and below are available for use on the system.
    - Checking [zdnn_is_version_runnable(0x00010000)](#zdnn_is_version_runnable) (1.0.0) will return `true` as zDNN APIs 1.1.x and below are available for use on the system.
  - Given a Telum II system with zDNN 1.0.0 installed:
    - [zdnn_get_library_version](#zdnn_get_library_version) will return `0x00010000` indicating zDNN library 1.0.0 is installed.
    - [zdnn_is_nnpa_installed](#zdnn_is_nnpa_installed) will return `true` (unless the zAIU feature is disabled for the system).
    - [zdnn_get_max_runnable_version](#zdnn_get_max_runnable_version) will return `0x000100FF` indicating zDNN APIs 1.0.x and below are available for use on the system.
    - Checking [zdnn_is_version_runnable(0x00010100)](#zdnn_is_version_runnable) (1.1.0) will return `false` as only zDNN APIs 1.0.x and below are available for use on the system.
    - Checking [zdnn_is_version_runnable(0x00010000)](#zdnn_is_version_runnable) (1.0.0) will return `true` as zDNN APIs 1.0.x and below are available for use on the system.
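
The packed version numbers used in the examples above follow the `0x[major][minor][patch]` layout shown for `ZDNN_VERNUM`. As a rough sketch only, the comparison can be modeled in plain C without the library. The names `version_parts`, `decode_version`, and `model_is_version_runnable` are hypothetical, and the `<=` comparison is an assumption chosen because it reproduces the documented results; it is not taken from the implementation of `zdnn_is_version_runnable`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of a packed version number. */
typedef struct {
  uint8_t major;
  uint8_t minor;
  uint8_t patch;
} version_parts;

/* Split a packed 0x00[major][minor][patch] value into its parts. */
version_parts decode_version(uint32_t ver_num) {
  version_parts v = {(uint8_t)((ver_num >> 16) & 0xFF),
                     (uint8_t)((ver_num >> 8) & 0xFF),
                     (uint8_t)(ver_num & 0xFF)};
  return v;
}

/*
 * ASSUMPTION: a simplified model of the runnable check. A version is treated
 * as runnable when it does not exceed the maximum runnable version, which the
 * documentation reports with 0xFF in the patch byte (for example 0x000100FF).
 * This is not the library's implementation.
 */
bool model_is_version_runnable(uint32_t ver_num, uint32_t max_runnable) {
  return ver_num <= max_runnable;
}
```

Under this model, `model_is_version_runnable(0x00010100, 0x000100FF)` is false (API 1.1.0 on a system limited to 1.0.x), while `model_is_version_runnable(0x00010000, 0x000100FF)` is true. A real application would call `zdnn_get_max_runnable_version` and `zdnn_is_version_runnable` directly rather than re-implementing the check.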

---

## API Reference

[Back to Table of Contents](#TOC)

- [Support Functions](#support-functions)
- [Data Transformation](#data-transformation)
- [Operations](#operations)
- [Convenience Functions](#convenience-functions)

---

## Support Functions

[Back to Table of Contents](#TOC)

- [Initialization](#zdnn_init)
- [Get smallest of the max index size value from across all dimensions](#zdnn_get_nnpa_max_dim_idx_size)
- [Get max index for a given dimension](#zdnn_get_max_for_dim)
- [Get Size](#zdnn_getsize_ztensor)
- [Get Range](#zdnn_getrange_ztensor)
- [Get maximum limit for a given data type](#zdnn_get_max_limit)
- [Get minimum limit for a given data type](#zdnn_get_min_limit)
- [Initialize pre-transformed tensor descriptor](#zdnn_init_pre_transformed_desc)
- [Generate transformed tensor descriptor](#zdnn_generate_transformed_desc)
- [Generate quantized transformed tensor descriptor](#zdnn_generate_quantized_transformed_desc)
- [Generate concatenated transformed tensor descriptor](#zdnn_generate_transformed_desc_concatenated)
- [Initialize zTensor](#zdnn_init_ztensor)
- [Initialize zTensor with memory allocate](#zdnn_init_ztensor_with_malloc)
- [Initialize quantized zTensor](#zdnn_init_quantized_ztensor)
- [Initialize quantized zTensor with memory allocate](#zdnn_init_quantized_ztensor_with_malloc)
- [Reset zTensor](#zdnn_reset_ztensor)
- [Allocate memory for zTensor](#zdnn_allochelper_ztensor)
- [De-allocate memory for zTensor](#zdnn_free_ztensor_buffer)
- [Retrieve status message of the status code](#zdnn_get_status_message)
- [Reshape zTensor](#zdnn_reshape_ztensor)
- [Check if version is runnable](#zdnn_is_version_runnable)
- [Get maximum runnable version](#zdnn_get_max_runnable_version)

---

### zdnn_init

#### Description

Initialize the zDNN library. This sends an NNPA_QAF to query the NNPA and loads the current environment variable settings. This needs to be invoked at least once if zDNN library is statically-linked.
It is automatically invoked if zDNN library is dynamically loaded.

#### Format

```C
void zdnn_init();
```

#### Parameters

None

#### Returns

None

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.0.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

---

### zdnn_get_nnpa_max_dim_idx_size

#### Description

Retrieve the smallest of the maximum dimension index size values across all dimensions currently supported by the zAIU from zDNN's internal memory.

#### Format

```C
uint32_t zdnn_get_nnpa_max_dim_idx_size();
```

#### Parameters

None

#### Returns

Maximum dimension index size supported by the zAIU across all dimensions

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

---

### zdnn_get_max_for_dim

#### Description

Retrieve the maximum dimension index size value currently supported by the zAIU for a given dimension from zDNN's internal memory. These limits relate to ztensor's transformed descriptor values. Special care is required when using layouts with special re-arrangements of data. See [zDNN Data Layouts](#common-layouts) for more details.

#### Format

```C
uint32_t zdnn_get_max_for_dim(uint8_t dimension);
```

#### Parameters

- `uint8_t dimension`
  - dimension to get maximum index size for

#### Returns

Maximum dimension index size supported by the zAIU for a given dimension

#### Since

1.1.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).
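
The per-dimension limit check described above can be sketched as a small helper. This is only an illustration under stated assumptions: `dims_within_limits`, `max_for_dim_fn`, and `fake_max_for_dim` are hypothetical names, the limits returned by `fake_max_for_dim` are made-up values, and the sketch assumes the `dimension` argument is numbered 4 down to 1 to match `dim4` through `dim1`. The callback has the same signature as the documented `zdnn_get_max_for_dim(uint8_t dimension)` so that the library function could be passed in its place.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same signature as the documented zdnn_get_max_for_dim(uint8_t dimension). */
typedef uint32_t (*max_for_dim_fn)(uint8_t dimension);

/*
 * Returns true when every transformed dimension fits the limit reported for
 * it. dims[0] holds dim4 (outermost) and dims[3] holds dim1 (innermost).
 * ASSUMPTION: the dimension argument is numbered 4..1 to match dim4..dim1.
 */
bool dims_within_limits(const uint32_t dims[4], max_for_dim_fn get_max) {
  for (int i = 0; i < 4; i++) {
    uint8_t dimension = (uint8_t)(4 - i); /* 4, 3, 2, 1 */
    if (dims[i] > get_max(dimension)) {
      return false;
    }
  }
  return true;
}

/*
 * Hypothetical stand-in with made-up limits, used only so the helper can be
 * exercised without zAIU hardware or the zDNN library.
 */
uint32_t fake_max_for_dim(uint8_t dimension) {
  return dimension == 1 ? 2048u : 32u;
}
```

Because the limits apply to the transformed descriptor, the values passed in should come from `transformed_desc` rather than from the pre-transformed shape, which matters for layouts such as `ZDNN_2DS` or `ZDNN_3DS` that move dimensions during transformation.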

---

### zdnn_get_nnpa_max_tensor_size

#### Description

Retrieve the maximum tensor size value (number of bytes required for storing a transformed tensor) currently supported by the zAIU from zDNN's internal memory.

#### Format

```C
uint64_t zdnn_get_nnpa_max_tensor_size();
```

#### Parameters

None

#### Returns

Maximum tensor size supported by the zAIU

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.0.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

---

### zdnn_is_nnpa_installed

#### Description

Interrogates the hardware to determine if the NNPA and associated instructions are installed.

Use this function during application initialization to determine whether the zAIU hardware is available.

#### Format

```C
bool zdnn_is_nnpa_installed();
```

#### Parameters

- None.

#### Returns

`true` if NNPA and associated instructions are installed, `false` otherwise.

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.0.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

---

### zdnn_is_nnpa_function_installed

#### Description

Query, from zDNN internal memory, if requested NNPA functions are available.

#### Format

```C
bool zdnn_is_nnpa_function_installed(int count, ...);
```

#### Parameters

- `int count`
  - number of NNPA functions to check
- `...
(additional arguments)` - Function names separated by commas, e.g., _NNPA_MUL, NNPA_MIN_ ``` NNPA_QAF NNPA_ADD NNPA_SUB NNPA_MUL NNPA_DIV NNPA_MIN NNPA_MAX NNPA_LOG NNPA_EXP NNPA_RELU NNPA_TANH NNPA_SIGMOID NNPA_SOFTMAX NNPA_BATCHNORMALIZATION NNPA_MAXPOOL2D NNPA_AVGPOOL2D NNPA_LSTMACT NNPA_GRUACT NNPA_CONVOLUTION NNPA_MATMUL_OP NNPA_MATMUL_OP_BCAST23 NNPA_MATMUL_OP_BCAST1 NNPA_TRANSFORM ``` #### Returns `true` if all queried functions are installed or if `count` is zero, `false` otherwise. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_is_nnpa_parmblk_fmt_installed #### Description Query, from zDNN internal memory, if requested parameter block formats are installed. #### Format ```C bool zdnn_is_nnpa_parmblk_fmt_installed(int count, ...); ``` #### Parameters - `int count` - number of NNPA parameter block formats to check - `... (additional arguments)` - NNPA parameter block formats separated by commas ``` NNPA_PARMBLKFORMAT_0 NNPA_PARMBLKFORMAT_1 ``` #### Returns `true` if all queried formats are installed or if `count` is zero, `false` otherwise. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_is_nnpa_datatype_installed #### Description Query, from zDNN internal memory, if requested NNPA data types are installed.
#### Format ```C bool zdnn_is_nnpa_datatype_installed(uint16_t types_bitmask); ``` #### Parameters - `uint16_t types_bitmask` - OR'd type bitmasks as defined in zdnn_query_datatypes enum ``` QUERY_DATATYPE_INTERNAL1 QUERY_DATATYPE_BINARY_FP32 QUERY_DATATYPE_BINARY_INT8 QUERY_DATATYPE_BINARY_INT32 ``` #### Returns `true` if all queried data types are installed, `false` otherwise. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_is_nnpa_layout_fmt_installed #### Description Query, from zDNN internal memory, if requested NNPA data layout formats are installed. #### Format ```C bool zdnn_is_nnpa_layout_fmt_installed(uint32_t layout_bitmask); ``` #### Parameters - `uint32_t layout_bitmask` - OR'd layout bitmasks as defined in zdnn_query_layoutfmts enum ``` QUERY_LAYOUTFMT_4DFEATURE QUERY_LAYOUTFMT_4DKERNEL QUERY_LAYOUTFMT_4DWEIGHTS QUERY_LAYOUTFMT_4DGENERIC ``` #### Returns `true` if all queried data layouts are installed, `false` otherwise. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_is_nnpa_conversion_installed #### Description Query, from zDNN internal memory, if requested NNPA data-type to/from BFP format conversions are installed.
#### Format ```C bool zdnn_is_nnpa_conversion_installed(nnpa_data_type type, uint16_t format_bitmask); ``` #### Parameters - `nnpa_data_type type` - NNPA data-type number as defined in nnpa_data_type enum ``` NNPA_DATATYPE_1 ``` - `uint16_t format_bitmask` - OR'd BFP format bitmasks as defined in zdnn_query_bfpfmts enum ``` QUERY_BFPFMT_TINY (FP16) QUERY_BFPFMT_SHORT (FP32/BFLOAT) ``` #### Returns `true` if all queried conversions are installed, `false` otherwise. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_get_library_version #### Description Retrieve the library version number as a 32-bit hex value in the form `0x00[major][minor][patch]` where each segment is 1 byte. For example, zDNN 1.2.3 would return `0x00010203`. This is the version of the libzdnn package installed on the system or of the zDNN embedded in a runtime application. The zDNN library version is independent of the system that zDNN is running on. The library version indicates what zDNN APIs exist in that version of the zDNN library. It does **NOT** indicate whether those APIs are available for use. To check API availability at runtime, see [Validating the environment at runtime](#runtime-val). #### Format ```C uint32_t zdnn_get_library_version(); ``` #### Returns Library version number in `0x00[major][minor][patch]` format. #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_get_library_version_str #### Description Retrieve the library version number and build information as a string. #### Format ```C char *zdnn_get_library_version_str(); ``` #### Returns Library version number and build information as a string.
#### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_refresh_nnpa_query_result #### Description Refresh zDNN in-memory query result from zAIU. #### Format ```C zdnn_status zdnn_refresh_nnpa_query_result(); ``` #### Parameters None #### Programming Notes This is called automatically as a part of `zdnn_init` and should not need to be called directly. Manually refreshing query results before making other `zdnn_query_*` calls may noticeably impact performance. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_UNAVAILABLE_FUNCTION` #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_getsize_ztensor #### Description Used to determine the buffer size required for the transformed tensor (including concatenated) in zDNN transformed format. Requires tensor descriptor (`zdnn_tensor_desc`) with transformed shape information. #### Format ```C uint64_t zdnn_getsize_ztensor(const zdnn_tensor_desc *tfrmd_desc); ``` #### Parameters - `const zdnn_tensor_desc *tfrmd_desc` - Contains transformed information about the shape, layout and data type. #### Returns - required buffer size in bytes #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_getrange_ztensor #### Description Used to determine the minimum negative value and maximum positive value of the passed zdnn_ztensor, storing the results in min and max. #### Format ```C void zdnn_getrange_ztensor(const zdnn_ztensor *ztensor, float *min, float *max); ``` #### Parameters - `const zdnn_ztensor *ztensor` - The zdnn_ztensor to return the min and max value of. - `float *min` - Pointer to a float used to store minimum negative value. - If all values are positive, -0.0 will be used instead. - `float *max` - Pointer to a float used to store maximum positive value.
- If all values are negative, 0.0 will be used instead. #### Returns - None #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_get_max_limit #### Description Return the maximum representable value between a transformed and pre-transformed zdnn_data_type. #### Format ```C zdnn_status zdnn_get_max_limit(zdnn_data_types transformed_type, zdnn_data_types pre_transformed_type, void *limit); ``` #### Parameters - `zdnn_data_types transformed_type` - input zdnn transformed data type. - Restricted to the following transformed data types: - ZDNN_DLFLOAT16 - ZDNN_BINARY_INT8 - ZDNN_BINARY_INT32 - `zdnn_data_types pre_transformed_type` - input zdnn pre-transformed data type. - Restricted to the following pre-transformed data types: - INT32 - INT8 - FP32 - FP16 - BFLOAT - `void *limit` - pointer to max value between transformed_type and pre_transformed_type in data type of pre_transformed_type. #### Returns - `ZDNN_OK` - `ZDNN_INVALID_TYPE` - invalid transformed or pre_transformed `type` used and conversion could not be completed. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_get_min_limit #### Description Return the minimum representable value between a transformed and pre-transformed zdnn_data_type. #### Format ```C zdnn_status zdnn_get_min_limit(zdnn_data_types transformed_type, zdnn_data_types pre_transformed_type, void *limit); ``` #### Parameters - `zdnn_data_types transformed_type` - input zdnn transformed data type.
- Restricted to the following transformed data types: - ZDNN_DLFLOAT16 - ZDNN_BINARY_INT8 - ZDNN_BINARY_INT32 - `zdnn_data_types pre_transformed_type` - input zdnn pre-transformed data type. - Restricted to the following pre-transformed data types: - INT32 - INT8 - FP32 - FP16 - BFLOAT - `void *limit` - pointer to min value between transformed_type and pre_transformed_type in data type of pre_transformed_type. #### Returns - `ZDNN_OK` - `ZDNN_INVALID_TYPE` - invalid transformed or pre_transformed `type` used and conversion could not be completed. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_init_pre_transformed_desc #### Description Initialize tensor descriptor (`zdnn_tensor_desc`) struct with pre-transformed (original) shape information. #### Format ```C void zdnn_init_pre_transformed_desc(zdnn_data_layouts layout, zdnn_data_types type, zdnn_tensor_desc *pre_tfrmd_desc, ...); ``` #### Parameters - `zdnn_data_layouts layout` - data layout - `zdnn_data_types type` - data type - `zdnn_tensor_desc *pre_tfrmd_desc` - output zdnn_tensor_desc struct - `... (additional arguments)` - Variadic: number of elements in each dimension in accordance with the layout, in outermost to innermost order #### Returns - None #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_generate_transformed_desc #### Description Generate transformed tensor descriptor information based on supplied pre-transformed tensor descriptor.
#### Format ```C zdnn_status zdnn_generate_transformed_desc( const zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc); ``` #### Parameters - `const zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - `zdnn_tensor_desc *tfrmd_desc` - output `zdnn_tensor_desc` struct #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_TYPE` - pre-transformed `type` is not recognized or is a type only used for quantized ztensors. - `ZDNN_INVALID_LAYOUT` - pre-transformed `layout` is not recognized or is a layout only used for concatenated tensors. #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_generate_quantized_transformed_desc #### Description Generate quantized transformed tensor descriptor information based on supplied pre-transformed tensor descriptor and quantized transform type. #### Format ```C zdnn_status zdnn_generate_quantized_transformed_desc( const zdnn_tensor_desc *pre_tfrmd_desc, zdnn_quantized_transform_types transform_type, zdnn_tensor_desc *tfrmd_desc); ``` #### Parameters - `const zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - Has the following additional restrictions: - Only the following pre-transformed layouts are supported. - ZDNN_1D - ZDNN_2D - ZDNN_2DS - ZDNN_3D - ZDNN_3DS - ZDNN_4D - ZDNN_NHWC - `zdnn_quantized_transform_types transform_type` - Type of quantized transformation - QUANTIZED_DLFLOAT16 - QUANTIZED_INT8 - QUANTIZED_WEIGHTS_INT8 - `zdnn_tensor_desc *tfrmd_desc` - output `zdnn_tensor_desc` struct #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_TYPE` - pre-transformed `type` is not recognized or is not supported for quantized ztensors. See [Quantized zTensor Requirements](#quan-zten-reqs). - `ZDNN_INVALID_LAYOUT` - pre-transformed `layout` is not recognized, not supported for quantized ztensors, or is a layout only used for concatenated tensors.
- `ZDNN_INVALID_TRANSFORM_TYPE` - Invalid transformation type. See [Quantized zTensor Requirements](#quan-zten-reqs). #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_generate_transformed_desc_concatenated #### Description Generate concatenated transformed tensor descriptor information for RNN input-gates tensors based on a supplied pre-transformed tensor descriptor. #### Format ```C zdnn_status zdnn_generate_transformed_desc_concatenated( const zdnn_tensor_desc *pre_tfrmd_desc, zdnn_concat_info info, zdnn_tensor_desc *tfrmd_desc); ``` #### Parameters - `const zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - `zdnn_concat_info info` - Information about how the tensors will be concatenated, consists of the RNN_TYPE, PREV_LAYER and USAGE flags OR'd together: RNN_TYPE flags: - RNN_TYPE_LSTM - For LSTM - RNN_TYPE_GRU - For GRU PREV_LAYER flags: - PREV_LAYER_UNI - Previous RNN layer is uni-directional - PREV_LAYER_NONE - Previous layer is not an RNN layer - PREV_LAYER_BIDIR - Previous RNN layer is bi-directional USAGE flags: - USAGE_WEIGHTS - Concatenate as input weights - USAGE_HIDDEN_WEIGHTS - Concatenate as input hidden-weights - USAGE_BIASES - Concatenate as input biases - USAGE_HIDDEN_BIASES - Concatenate as input hidden-biases - `zdnn_tensor_desc *tfrmd_desc` - output `zdnn_tensor_desc` struct #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_TYPE` - pre-transformed `type` is not recognized or is not supported for concatenated tensors. - `ZDNN_INVALID_LAYOUT` - pre-transformed `layout` is not recognized or is not supported for concatenated tensors. - `ZDNN_INVALID_CONCAT_INFO` - invalid concatenation information.
#### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_init_ztensor #### Description Initialize a `zdnn_ztensor` struct using the pre-transformed and transformed tensor shape information #### Format ```C void zdnn_init_ztensor(zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc, zdnn_ztensor *output); ``` #### Parameters - `zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - `zdnn_tensor_desc *tfrmd_desc` - input tensor descriptor with transformed shape information - `zdnn_ztensor *output` - The `zdnn_ztensor` struct being initialized. #### Returns - None #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_init_ztensor_with_malloc #### Description Same functionality as `zdnn_init_ztensor`, and computes the size required for the tensor in the zDNN transformed format and allocates the storage for it. Sets `buffer` and `buffer_size` fields within `output`. #### Format ```C zdnn_status zdnn_init_ztensor_with_malloc(zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc, zdnn_ztensor *output); ``` #### Parameters - `zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - `zdnn_tensor_desc *tfrmd_desc` - input tensor descriptor with transformed shape information - `zdnn_ztensor *output` - The `zdnn_ztensor` struct being initialized. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_FORMAT` - `tfrmd_desc->format` is not recognized. - `ZDNN_INVALID_TYPE` - `tfrmd_desc->type` is not recognized or is a pre_tfrmd_desc type. 
- `ZDNN_INVALID_SHAPE` - (if any of the following are true) - One of `tfrmd_desc->dim*` dimensions is 0. - One of `tfrmd_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim). - Note: concatenation dimensions have a smaller maximum size. See [LSTM](#lstm-hid_sz) or [GRU](#gru-hid_sz). - The total number of tfrmd_desc elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_ALLOCATION_FAILURE` - Unable to allocate required memory on a 4K boundary. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_init_quantized_ztensor #### Description Initialize a `zdnn_ztensor` struct using the pre-transformed and quantized transformed tensor shape information along with scale and offset. #### Format ```C void zdnn_init_quantized_ztensor(zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc, float scale, float offset, zdnn_ztensor *output); ``` #### Parameters - `zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - `zdnn_tensor_desc *tfrmd_desc` - input tensor descriptor with quantized transformed shape information - `float scale` - scale for quantized ztensor, must not be 0. - `float offset` - offset for quantized ztensor - `zdnn_ztensor *output` - The `zdnn_ztensor` struct being initialized. #### Programming Notes - The reciprocal of the `scale` value is stored as `output->rec_scale` and is used within subsequent quantized calls with reduced precision. Due to this, large `scale` values will lead to a `output->rec_scale` that underflows to 0.0 and will result in an error in subsequent quantized calls. - The `offset` value is stored as `output->offset` and is used within subsequent quantized calls with reduced precision. 
Due to this, large `offset` values will overflow to infinity and will result in an error in subsequent quantized calls. #### Returns - None #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_init_quantized_ztensor_with_malloc #### Description Same functionality as `zdnn_init_quantized_ztensor`, and computes the size required for the tensor in the zDNN transformed format and allocates the storage for it. Sets `buffer` and `buffer_size` fields within `output`. #### Format ```C zdnn_status zdnn_init_quantized_ztensor_with_malloc( zdnn_tensor_desc *pre_tfrmd_desc, zdnn_tensor_desc *tfrmd_desc, float scale, float offset, zdnn_ztensor *output); ``` #### Parameters - `zdnn_tensor_desc *pre_tfrmd_desc` - input tensor descriptor with pre-transformed shape information - `zdnn_tensor_desc *tfrmd_desc` - input tensor descriptor with quantized transformed shape information - `float scale` - scale for quantized ztensor, must not be 0. - `float offset` - offset for quantized ztensor - `zdnn_ztensor *output` - The `zdnn_ztensor` struct being initialized. #### Programming Notes - The reciprocal of the `scale` value is stored as `output->rec_scale` and is used within subsequent quantized calls with reduced precision. Due to this, large `scale` values will lead to a `output->rec_scale` that underflows to 0.0 and will result in an error in subsequent quantized calls. - The `offset` value is stored as `output->offset` and is used within subsequent quantized calls with reduced precision. Due to this, large `offset` values will overflow to infinity and will result in an error in subsequent quantized calls. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_FORMAT` - `tfrmd_desc->format` is not recognized. 
- `ZDNN_INVALID_TYPE` - `tfrmd_desc->type` is not recognized or is a pre_tfrmd_desc type. - `ZDNN_INVALID_SHAPE` - (if any of the following are true) - One of `tfrmd_desc->dim*` dimensions is 0. - One of `tfrmd_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim). - The total number of tfrmd_desc elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_ALLOCATION_FAILURE` - Unable to allocate required memory on a 4K boundary. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_is_quantized_ztensor #### Description Check whether a given `zdnn_ztensor` represents a quantized ztensor. #### Format ```C bool zdnn_is_quantized_ztensor(zdnn_ztensor *ztensor); ``` #### Parameters - `zdnn_ztensor *ztensor` - The `zdnn_ztensor` being checked. #### Returns `true` if `zdnn_ztensor` represents a quantized ztensor, `false` if not. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_reset_ztensor #### Description Reset a `zdnn_ztensor` struct for reuse. _Note this operation does not set or reset the `buffer` and `buffer_size` fields nor free the transformed area storage._ #### Format ```C void zdnn_reset_ztensor(zdnn_ztensor *ztensor); ``` #### Parameters - `zdnn_ztensor *ztensor` - The `zdnn_ztensor` struct being reset. #### Returns - None #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_allochelper_ztensor #### Description Calculate the size required for the tensor in the zDNN transformed format and allocate the needed storage, satisfying alignment requirements.
Sets `buffer` and `buffer_size` fields within `ztensor`. _Note that the calling application assumes ownership of this storage and is responsible for freeing it._ #### Format ```C zdnn_status zdnn_allochelper_ztensor(zdnn_ztensor *ztensor); ``` #### Parameters - `zdnn_ztensor *ztensor` - A `zdnn_ztensor` struct that contains the transformed shape information in the `transformed_desc` field. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_FORMAT` - `ztensor->transformed_desc->format` is not recognized. - `ZDNN_INVALID_TYPE` - `ztensor->transformed_desc->type` is not recognized or is a pre_transformed_desc type. - `ZDNN_INVALID_LAYOUT` - `ztensor->transformed_desc->layout` is not recognized or is not a valid transformed_desc layout. - `ZDNN_INVALID_SHAPE` - (if any of the following are true) - One of `ztensor->transformed_desc->dim*` dimensions is 0. - One of `ztensor->transformed_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim). - Note: concatenation dimensions have a smaller maximum size. See [LSTM](#lstm-hid_sz) or [GRU](#gru-hid_sz). - The total number of transformed_desc elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_ALLOCATION_FAILURE` - Unable to allocate required memory on a 4K boundary. #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_free_ztensor_buffer #### Description Given an input zdnn_ztensor, zdnn_free_ztensor_buffer will free the transformed area storage associated with it. _Note that the routine does not free the storage allocated for the zdnn_ztensor struct itself._ #### Format ```C zdnn_status zdnn_free_ztensor_buffer(const zdnn_ztensor *ztensor); ``` #### Parameters - `const zdnn_ztensor *ztensor` - A `zdnn_ztensor` struct whose `buffer` field points to the allocated storage.
#### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_BUFFER` - `ztensor->buffer` is `NULL` #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_get_status_message #### Description Retrieve the status message for a status code. #### Format ```C const char *zdnn_get_status_message(zdnn_status status); ``` #### Parameters - `zdnn_status status` - Status code #### Returns Pointer to the description string or "(Status string is not defined.)" if `status` is not defined. #### Since 1.0.0 #### Requirements - Any System Z hardware level See [Validating the environment at runtime](#runtime-val). --- ### zdnn_reshape_ztensor #### Description Reshape and copy buffer content from source zTensor's buffer to destination zTensor's in accordance with destination zTensor's shape. The following conditions must be satisfied: - Both tensors' `transformed_desc` must be fully initialized - `dest->buffer` must be pre-allocated - `src` must be transformed - `dest` must not already be transformed - Both `transformed_desc->layout` must be the same and either NHWC or HWCK - Both zTensors must contain an equal number of elements #### Format ```C zdnn_status zdnn_reshape_ztensor(const zdnn_ztensor *src, zdnn_ztensor *dest); ``` #### Parameters - `src` - Source zTensor to copy from - `dest` - Destination zTensor to copy to #### Programming Notes - If `src` and `dest` have the same `transformed_desc->dim1` dimension size, the transformed data is directly copied to the destination without untransformation. - If `src` and `dest` have different `transformed_desc->dim1` dimension sizes, reshaping will internally un-transform the source and then re-transform the values into the destination. #### Returns - `ZDNN_OK` - `ZDNN_INVALID_SHAPE` - (if any of the following are true) - `src`'s and `dest`'s `transformed_desc->dim*` total to different numbers of elements. - One of `dest->transformed_desc->dim*` dimensions is 0.
- One of `dest->transformed_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim). - Note: concatenation dimensions have a smaller maximum size. See [LSTM](#lstm-hid_sz) or [GRU](#gru-hid_sz). - The total number of `dest->transformed_desc->dim*` elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_INVALID_LAYOUT` - (if any of the following are true) - `src`'s and `dest`'s `transformed_desc->layout` are not the same. - `transformed_desc->layout` is neither `ZDNN_NHWC` nor `ZDNN_HWCK`. - `src->pre_transformed_desc->layout` is not recognized or is not a valid pre_transformed_desc layout. - `dest->pre_transformed_desc->layout` is not recognized or is not a valid pre_transformed_desc layout. - `ZDNN_INVALID_STATE` - (if any of the following are true) - `src` is not already transformed. - `dest` is already transformed. - `ZDNN_INVALID_FORMAT` - `src->transformed_desc->format` is not `ZDNN_FORMAT_4DFEATURE`. - `ZDNN_INVALID_TYPE` (if any of the following are true) - `src->pre_transformed_desc->type` is not recognized or is a transformed_desc type. - `dest->pre_transformed_desc->type` is not recognized or is a transformed_desc type. - `dest->transformed_desc->type` is not recognized or is a pre_transformed_desc type. - `ZDNN_INVALID_BUFFER` (if any of the following are true) - `src->buffer` is `NULL`. - `src->buffer` is not on a 4K boundary. - `dest->buffer` is `NULL`. - `dest->buffer` is not on a 4K boundary. - `dest->buffer_size` is too small to hold transformed values. - `ZDNN_CONVERT_FAILURE` - Values failed to un-transform or transform. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val).
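
To make two of the reshape conditions concrete, the following sketch checks that source and destination shapes hold the same number of elements, and whether they share the same `dim1` (the case the Programming Notes describe as a direct copy). The `shape4d` struct and the helper names are hypothetical stand-ins for illustration; they are not part of zDNN and do not replace the checks `zdnn_reshape_ztensor` performs itself.

```C
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the four transformed dimensions (not zdnn_tensor_desc). */
typedef struct {
  uint32_t dim4, dim3, dim2, dim1;
} shape4d;

/* Total element count; assumes realistic sizes so the 64-bit product does not overflow. */
uint64_t shape_num_elements(const shape4d *s) {
  return (uint64_t)s->dim4 * s->dim3 * s->dim2 * s->dim1;
}

/* Condition: both tensors must contain an equal number of elements. */
bool reshape_counts_match(const shape4d *src, const shape4d *dest) {
  return shape_num_elements(src) == shape_num_elements(dest);
}

/* True when dim1 is unchanged, the case documented as a direct copy. */
bool reshape_keeps_dim1(const shape4d *src, const shape4d *dest) {
  return src->dim1 == dest->dim1;
}
```

For example, reshaping `{1, 2, 3, 4}` to `{1, 1, 6, 4}` keeps `dim1`, while reshaping it to `{1, 1, 3, 8}` has the same element count but a different `dim1`.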
--- ### zdnn_is_version_runnable #### Description Check if an application built for zDNN version `ver_num` can be run on the current zAIU hardware with the installed zDNN library. #### Format ```C bool zdnn_is_version_runnable(uint32_t ver_num); ``` #### Parameters - `ver_num` - Version number of the zDNN library the application was built for, in `0x00[major][minor][patch]` form. Typically this is the ZDNN_VERNUM used to compile the application. #### Returns - true/false #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_get_max_runnable_version #### Description Returns the maximum version number associated with the APIs supported by the hardware and zDNN software in the current environment. This can be compared with the version documented in the "Requirements" section of each programming interface to discern whether the interface is supported at run-time. The returned value is a version number in the `major`.`minor` format. APIs defined at that level and below will be supported in the current environment. #### Format ```C uint32_t zdnn_get_max_runnable_version(); ``` #### Parameters - None #### Returns - A 32-bit zDNN version number in `0x00[major][minor]FF` form. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val).
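
The version encoding described above can be unpacked with plain shifts and masks. The sketch below is one possible way to compare an API's required level against a value in the `0x00[major][minor]FF` form; the helper names are hypothetical and are not zDNN functions, and the comparison rule (major first, then minor, ignoring patch) is an assumption based on the "that level and below" wording.

```C
#include <stdbool.h>
#include <stdint.h>

/* Each segment of 0x00[major][minor][patch] occupies one byte. */
uint8_t ver_major(uint32_t v) { return (uint8_t)((v >> 16) & 0xFF); }
uint8_t ver_minor(uint32_t v) { return (uint8_t)((v >> 8) & 0xFF); }
uint8_t ver_patch(uint32_t v) { return (uint8_t)(v & 0xFF); }

/*
 * Hypothetical helper: true when an API documented at level `required`
 * is at or below `max_runnable` (a value in 0x00[major][minor]FF form).
 * Only major and minor are compared; the patch byte is ignored.
 */
bool api_level_supported(uint32_t required, uint32_t max_runnable) {
  if (ver_major(required) != ver_major(max_runnable)) {
    return ver_major(required) < ver_major(max_runnable);
  }
  return ver_minor(required) <= ver_minor(max_runnable);
}
```

In an application, `max_runnable` would come from `zdnn_get_max_runnable_version()`; for a check of the whole application against its build-time version, `zdnn_is_version_runnable` is the documented interface.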
--- ## Data Transformation [Back to Table of Contents](#TOC) - [Transform to zTensor](#zdnn_transform_ztensor) - [Transform to zTensor with saturation](#zdnn_transform_ztensor_with_saturation) - [Transform to quantized zTensor](#zdnn_transform_quantized_ztensor) - [Transform to Original](#zdnn_transform_origtensor) --- zAIU requires the tensor data to be arranged in a format that enhances the performance characteristics of the operations. In this documentation, it is referred to as "transformed format". In addition, data conversions are necessary from the common formats (FP32, FP16, BFLOAT) to the formats supported by the zAIU (DLFLOAT16, INT8). The following functions are provided: - `zdnn_transform_ztensor` and `zdnn_transform_ztensor_with_saturation` - These functions will transform the input tensor and convert the input data to the format required by the zAIU. The resulting transformed ztensor can be reused as many times as necessary. - See [zdnn_transform_ztensor](#zdnn_transform_ztensor) and [zdnn_transform_ztensor_with_saturation](#zdnn_transform_ztensor_with_saturation) for details and restrictions on transforming an input tensor to the internal format. - `zdnn_transform_origtensor` - zdnn_transform_origtensor transforms a ztensor (usually output from an operation or network) to the format and data types that are usable by the application. - See [zdnn_transform_origtensor](#zdnn_transform_origtensor) for details on transforming a ztensor back to the original format. --- ### zdnn_transform_ztensor #### Description Converts the input tensor to the supported transformed format for execution by zDNN operations. If transformation is successful, the `is_transformed` field within `ztensor` will be set to `true`; otherwise it is set to `false`. Transformation will fail if `is_transformed` was already `true`.
_Note that the tensor layout in memory, once in transformed format, is dependent on the content of the input tensor's descriptors (`zdnn_tensor_desc` fields). Once converted, a `zdnn_ztensor` should only be manipulated by zDNN API functions._ #### Format ```C zdnn_status zdnn_transform_ztensor(zdnn_ztensor *ztensor, ...); ``` #### Parameters - `zdnn_ztensor *ztensor` - The input `zdnn_ztensor` struct. `pre_transformed_desc` and `transformed_desc` must be set, `is_transformed` must be `false`. A 4K-aligned tensor storage must be pre-allocated by the caller (directly or by calling the zDNN allocation helper function) and field `buffer` must point to the storage. - `... (additional arguments)` - Variadic: list of pointers for input data to be transformed: - Non-concatenated: 1 data pointer - LSTM concatenated: 4 data pointers, one for each input gate in Forget, Input, Cell, Output (FICO) order - GRU concatenated: 3 data pointers, one for each input gate in (Z)update, Reset, Hidden (ZRH) order #### Programming Notes - This function clears the per-thread floating-point exception flags at entry, and may set `FE_UNDERFLOW` / `FE_INVALID` / `FE_INEXACT` / `FE_OVERFLOW` when it encounters errors during data conversion. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_FORMAT` - `zdnn_ztensor->transformed_desc->format` is not recognized. - `ZDNN_INVALID_LAYOUT` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->layout` is not recognized or is not a valid pre_transformed_desc layout. - `zdnn_ztensor->transformed_desc->layout` is not recognized or is not a valid transformed_desc layout. - `ZDNN_INVALID_TYPE` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->type` is not recognized or is a transformed_desc type. - `zdnn_ztensor->transformed_desc->type` is not recognized or is a pre_transformed_desc type. - `ZDNN_INVALID_BUFFER` (if any of the following are true) - `buffer` is `NULL`.
- `buffer` is not on a 4K boundary. - `buffer_size` is too small to hold transformed values. - `ZDNN_INVALID_SHAPE` - (if any of the following are true) - One of `zdnn_ztensor->transformed_desc->dim*` dimensions is 0. - One of `zdnn_ztensor->transformed_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim). - Note: concatenation dimensions have a smaller maximum size. See [LSTM](#lstm-hid_sz) or [GRU](#gru-hid_sz). - The total number of transformed_desc elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_INVALID_STATE` - Tensor is already transformed. - `ZDNN_CONVERT_FAILURE` - Values failed to transform. - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - unsupported transformation function. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_transform_ztensor_with_saturation #### Description Converts the input tensor to the supported transformed format for execution by zdnn operations. If during transformation, an element results in a value that exceeds the smallest or largest value that can be represented by DLFLOAT16, the resulting element will contain the smallest or largest value and no range-violation status will be triggered. If transformation is successful the `is_transformed` field within `ztensor` will be set to `true` otherwise it is set to `false`. Transformation will fail if `is_transformed` was already `true`. _Note that the tensor layout in memory, once in transformed format, is dependent on the content of the input tensor's descriptors (`zdnn_tensor_desc` fields). 
Once converted, a `zdnn_ztensor` should only be manipulated by zDNN API functions._ #### Format ```C zdnn_status zdnn_transform_ztensor_with_saturation(zdnn_ztensor *ztensor, ...); ``` #### Parameters - `zdnn_ztensor *ztensor` - The input `zdnn_ztensor` struct. `pre_transformed_desc` and `transformed_desc` must be set, `is_transformed` must be `false`. A 4K-aligned tensor storage must be pre-allocated by the caller (directly or by calling the zDNN allocation helper function) and field `buffer` must point to the storage. - Has the following additional restrictions: - Only non-quantized ztensors are supported. Use `zdnn_transform_quantized_ztensor` if required. - `... (additional arguments)` - Variadic: list of pointers for input data to be transformed: - 1 data pointer supported at this time. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_ELEMENT_RANGE_VIOLATION` - `ZDNN_INVALID_FORMAT` - `zdnn_ztensor->transformed_desc->format` is not `ZDNN_FORMAT_4DFEATURE`. - `ZDNN_INVALID_LAYOUT` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->layout` is not recognized or is not a valid pre_transformed_desc layout. - `zdnn_ztensor->transformed_desc->layout` is not recognized or is not a valid transformed_desc layout. - `ZDNN_INVALID_TYPE` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->type` is not recognized or is not a valid pre_transformed_desc type. - `zdnn_ztensor->transformed_desc->type` is not recognized or is not a valid transformed_desc type. - `ZDNN_INVALID_BUFFER` (if any of the following are true) - `buffer` is `NULL`. - `buffer` is not on a 4K boundary. - `buffer_size` is too small to hold transformed values. - `ZDNN_INVALID_SHAPE` - (if any of the following are true) - One of `zdnn_ztensor->transformed_desc->dim*` dimensions is 0. - One of `zdnn_ztensor->transformed_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim).
- The total number of transformed_desc elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_INVALID_STATE` - Tensor is already transformed. - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - Unsupported transformation function. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_transform_quantized_ztensor #### Description Converts the input tensor to the supported quantized transformed format for execution by zDNN operations. If transformation is successful, the `is_transformed` field within `ztensor` will be set to `true`; otherwise it is set to `false`. Transformation will fail if `is_transformed` was already `true`. _Note that the tensor layout in memory, once in transformed format, is dependent on the content of the input tensor's descriptors (`zdnn_tensor_desc` fields). Once converted, a `zdnn_ztensor` should only be manipulated by zDNN API functions._ #### Format ```C zdnn_status zdnn_transform_quantized_ztensor(zdnn_ztensor *ztensor, bool saturation_control, int8_t clip_min, int8_t clip_max, const void *data); ``` #### Parameters - `zdnn_ztensor *ztensor` - The input `zdnn_ztensor` struct. `pre_transformed_desc` and `transformed_desc` must be set, `is_transformed` must be `false`. A 4K-aligned tensor storage must be pre-allocated by the caller (directly or by calling the zDNN allocation helper function) and field `buffer` must point to the storage. - Has the following additional restrictions: - Only the following pre-transformed layouts are supported. - ZDNN_1D - ZDNN_2D - ZDNN_2DS - ZDNN_3D - ZDNN_3DS - ZDNN_4D - ZDNN_NHWC - Only NHWC transformed layout is supported. - See [Quantized zTensor Requirements](#quan-zten-reqs) for supported transform types.
- `bool saturation_control` - When enabled and an element results in a value that exceeds the smallest or largest value that can be represented by DLFLOAT16, the resulting element will contain the smallest or largest value and no range-violation status will be triggered. - Only applicable when all the following are true: - `zdnn_ztensor *ztensor` is of zdnn_quantized_transform_types QUANTIZED_DLFLOAT16. - The `pre_transformed_desc` `type` of the `zdnn_ztensor *ztensor` is FP32. - `int8_t clip_min` - Minimum clipping value - Only applicable when `zdnn_ztensor *ztensor` is of zdnn_quantized_transform_types QUANTIZED_INT8. - Must be less than `clip_max` - `int8_t clip_max` - Maximum clipping value - Only applicable when `zdnn_ztensor *ztensor` is of zdnn_quantized_transform_types QUANTIZED_INT8. - Must be greater than `clip_min` #### Programming Notes - This function clears the per-thread floating-point exception flags at entry, and may set `FE_UNDERFLOW` / `FE_INVALID` / `FE_INEXACT` / `FE_OVERFLOW` when it encounters errors during data conversion. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_FORMAT` - `zdnn_ztensor->transformed_desc->format` is not recognized. - `ZDNN_INVALID_LAYOUT` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->layout` is not recognized or is not a valid pre_transformed_desc layout. - `zdnn_ztensor->transformed_desc->layout` is not recognized or is not a valid transformed_desc layout. - `ZDNN_INVALID_TYPE` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->type` is not recognized or is a transformed_desc type: [Quantized zTensor Requirements](#quan-zten-reqs) - `zdnn_ztensor->transformed_desc->type` is not recognized or is a pre_transformed_desc type: [Quantized zTensor Requirements](#quan-zten-reqs) - `ZDNN_INVALID_BUFFER` (if any of the following are true) - `buffer` is `NULL`. - `buffer` is not on a 4K boundary. - `buffer_size` is too small to hold transformed values.
- `ZDNN_INVALID_SHAPE` - (if any of the following are true) - One of `zdnn_ztensor->transformed_desc->dim*` dimensions is 0. - One of `zdnn_ztensor->transformed_desc->dim*` dimensions is greater than [zdnn_get_max_for_dim](#zdnn_get_max_for_dim). - The total number of transformed_desc elements is larger than `zdnn_get_nnpa_max_tensor_size`. - `ZDNN_INVALID_STATE` - Tensor is already transformed. - `ZDNN_INVALID_CLIPPING_VALUE` - clip_min value is not less than clip_max value. - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - Unsupported transformation function. - `ZDNN_FUNC_RC_F001` - Either scale or offset is non-numeric or scale value is zero. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ### zdnn_transform_origtensor #### Description Converts the input tensor from the zDNN transformed format back to a standard non-transformed layout. The `is_transformed` field within `ztensor` must be `true`. All stick format tensors are supported, except: - Kernel tensors - Concatenated RNN input-gates tensors #### Format ```C zdnn_status zdnn_transform_origtensor(const zdnn_ztensor *ztensor, void *out_buf); ``` #### Parameters - `zdnn_ztensor *ztensor` - The input `zdnn_ztensor` struct. `pre_transformed_desc`, `transformed_desc` and `buffer` must be set, `is_transformed` must be `true`. - `void *out_buf` - The buffer for storing the standard non-transformed tensor data. Must be pre-allocated by the caller. #### Programming Notes - This function clears the per-thread floating-point exception flags at entry, and may set `FE_UNDERFLOW` / `FE_INVALID` / `FE_INEXACT` / `FE_OVERFLOW` when it encounters errors during data conversion. #### Returns zdnn_status indications - `ZDNN_OK` - `ZDNN_INVALID_FORMAT` - `ztensor->transformed_desc->format` is not `ZDNN_FORMAT_4DFEATURE`.
- `ZDNN_INVALID_LAYOUT` - (if any of the following are true) - `zdnn_ztensor->pre_transformed_desc->layout` is not recognized or is not a valid pre_transformed_desc layout. - `zdnn_ztensor->transformed_desc->layout` is not recognized or is not a valid transformed_desc layout required by this function. - `ZDNN_INVALID_TYPE` - `ztensor->pre_transformed_desc->type` is not recognized or is a transformed_desc type. - `ztensor->transformed_desc->type` is not recognized or is a pre_transformed_desc type. - `ZDNN_INVALID_BUFFER` (if any of the following are true) - `ztensor->buffer` is `NULL`. - `ztensor->buffer` is not on a 4K boundary. - `ZDNN_INVALID_STATE` - `ztensor` is not transformed. - `ZDNN_CONVERT_FAILURE` - Values failed to un-transform. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). --- ## Operations See [Table of Contents](#TOC) for operations list --- ## Element-wise Operations [Back to Table of Contents](#TOC) - [Addition](#zdnn_add) - [Subtraction](#zdnn_sub) - [Multiplication](#zdnn_mul) - [Division](#zdnn_div) - [Minimum](#zdnn_min) - [Maximum](#zdnn_max) - [Natural Logarithm](#zdnn_log) - [Exponential](#zdnn_exp) - [Square Root](#zdnn_sqrt) - [Inverse Square Root](#zdnn_invsqrt) --- ### zdnn_add - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given two input tensors in zDNN transformed format, performs element-wise addition and stores the result into the provided output zDNN tensor. _Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. 
As such, the input tensors must be of the same shape._ #### Format ```C zdnn_status zdnn_add(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input_a` - Tensor with addends to add to `input_b` tensor - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *input_b` - Tensor with addends to add to `input_a` tensor - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor to hold the result of the addition - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Addition](https://www.tensorflow.org/api_docs/python/tf/math/add) [ONNX Addition](https://onnx.ai/onnx/operators/onnx__Add.html#l-onnx-doc-add) --- ### zdnn_sub - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given two input tensors in zDNN transformed format, performs element-wise subtraction and stores the result into the provided output zDNN tensor. _Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape._ #### Format ```C zdnn_status zdnn_sub(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input_a` - Tensor with minuends that will be subtracted by `input_b` tensor. 
- Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *input_b` - Tensor with subtrahends to subtract from `input_a` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor to hold the result of the subtraction - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Subtraction](https://www.tensorflow.org/api_docs/python/tf/math/subtract) [ONNX Subtraction](https://onnx.ai/onnx/operators/onnx__Sub.html#l-onnx-doc-sub) --- ### zdnn_mul - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given two input tensors in zDNN transformed format, performs element-wise multiplication and stores the result into the provided output zDNN tensor. _Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape._ #### Format ```C zdnn_status zdnn_mul(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input_a` - Tensor with multiplicands that will be multiplied by `input_b` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *input_b` - Tensor with multipliers for `input_a` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor to hold the result of the multiplication.
- Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Multiplication](https://www.tensorflow.org/api_docs/python/tf/math/multiply) [ONNX Multiplication](https://onnx.ai/onnx/operators/onnx__Mul.html#l-onnx-doc-mul) --- ### zdnn_div - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given two input tensors in zDNN transformed format, performs element-wise division and stores the result into the provided output zDNN tensor. _Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape._ #### Format ```C zdnn_status zdnn_div(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input_a` - Tensor with dividends that will be divided by `input_b` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *input_b` - Tensor with divisors for `input_a` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor to hold the result of the division. 
- Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Division](https://www.tensorflow.org/api_docs/python/tf/math/divide) [ONNX Division](https://onnx.ai/onnx/operators/onnx__Div.html#l-onnx-doc-div) --- ### zdnn_min - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given two input tensors in zDNN transformed format, computes the element-wise minimum and stores the result into the provided output zDNN tensor. _Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape._ #### Format ```C zdnn_status zdnn_min(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input_a` - Tensor with values that will be compared with `input_b` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *input_b` - Tensor with values that will be compared with `input_a` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the smaller value from each comparison of the inputs. 
- Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Minimum](https://www.tensorflow.org/api_docs/python/tf/math/minimum) [ONNX Minimum](https://onnx.ai/onnx/operators/onnx__Min.html#l-onnx-doc-min) --- ### zdnn_max - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given two input tensors in zDNN transformed format, computes the element-wise maximum and stores the result into the provided output zDNN tensor. _Note that for zDNN use, broadcasting of the input tensor(s) must be performed by the caller. As such, the input tensors must be of the same shape._ #### Format ```C zdnn_status zdnn_max(const zdnn_ztensor *input_a, const zdnn_ztensor *input_b, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input_a` - Tensor with values that will be compared with `input_b` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *input_b` - Tensor with values that will be compared with `input_a` tensor. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the larger value from each comparison of the inputs. 
- Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Maximum](https://www.tensorflow.org/api_docs/python/tf/math/maximum) [ONNX Maximum](https://onnx.ai/onnx/operators/onnx__Max.html#l-onnx-doc-max) --- ### zdnn_log - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given an input tensor in zDNN transformed format, computes the natural logarithm element-wise and stores the result into the provided output zDNN tensor. #### Format ```C zdnn_status zdnn_log(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the calculated natural logarithm of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val).
#### Framework Examples [TensorFlow Natural Logarithm](https://www.tensorflow.org/api_docs/python/tf/math/log) [ONNX Natural Logarithm](https://onnx.ai/onnx/operators/onnx__Log.html#l-onnx-doc-log) --- ### zdnn_exp - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given an input tensor in zDNN transformed format, computes the exponential element-wise and stores the result into the provided output zDNN tensor. #### Format ```C zdnn_status zdnn_exp(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the calculated exponential of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Exponential](https://www.tensorflow.org/api_docs/python/tf/math/exp) [ONNX Exponential](https://onnx.ai/onnx/operators/onnx__Exp.html#l-onnx-doc-exp) --- ### zdnn_sqrt - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given an input tensor in zDNN transformed format, computes the square root element-wise and stores the result into the provided output zDNN tensor. #### Format ```C zdnn_status zdnn_sqrt(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. 
- Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the calculated square root of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Square Root](https://www.tensorflow.org/api_docs/python/tf/math/sqrt) [ONNX Square Root](https://onnx.ai/onnx/operators/onnx__Sqrt.html#l-onnx-doc-sqrt) --- ### zdnn_invsqrt - [Back to Table of Contents](#TOC) - [Back to Element-wise Operations](#elwise-ops) #### Description Given an input tensor in zDNN transformed format, computes the inverse square root element-wise and stores the result into the provided output zDNN tensor. #### Format ```C zdnn_status zdnn_invsqrt(const zdnn_ztensor *input, float epsilon, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `float epsilon` - A float value added to input prior to computation. 
- `zdnn_ztensor *output` - Tensor that holds the calculated inverse square root of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - `ZDNN_INVALID_EPSILON` - [hardware statuses](#hw-statuses) #### Programming Notes - On some models, if either or both an element and epsilon are very large, the addition of the two may result in a nonnumeric value, the inverse square root of which will also be nonnumeric. This may occur even though the inverse square root of an unconstrained sum would easily fit in the data type of an output-tensor element. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Reciprocal Square Root](https://www.tensorflow.org/api_docs/python/tf/math/rsqrt) --- ## Activation Operations [Back to Table of Contents](#TOC) - [Rectified Linear](#zdnn_relu) - [Leaky Rectified Linear](#zdnn_leaky_relu) - [Hyperbolic Tangent](#zdnn_tanh) - [Sigmoid](#zdnn_sigmoid) - [Softmax](#zdnn_softmax) - [Softmax with Mask](#zdnn_softmax_mask) - [Gaussian Error Linear Unit](#zdnn_gelu) --- ### zdnn_relu - [Back to Table of Contents](#TOC) - [Back to Activation Operations](#act-ops) #### Description Given an input tensor in zDNN transformed format, produces an output tensor where the rectified linear function, y = max(0, x), is applied to the input element-wise. If an optional clipping_value is provided, clipping is performed against the intermediate output where z = min(y, clipping_value).
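The two steps in the description above can be sketched as a plain C reference function. This is a minimal sketch and not zDNN code: `relu_ref` and `relu_ref_array` are hypothetical helpers that operate on ordinary `float` values rather than on transformed ztensors, and they assume the clipping value arrives as a pointer to an FP32 value, with NULL or a pointed-to value of 0 meaning that no clipping takes place.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not a zDNN API).
 * Step 1: y = max(0, x).
 * Step 2: z = min(y, clipping_value), only when clipping is active. */
static float relu_ref(float x, const float *clipping_value) {
  float y = x > 0.0f ? x : 0.0f;

  /* Assumption: NULL or 0 disables clipping. */
  if (clipping_value != NULL && *clipping_value != 0.0f) {
    assert(*clipping_value > 0.0f); /* a negative clipping value is invalid */
    if (y > *clipping_value) {
      y = *clipping_value;
    }
  }
  return y;
}

/* Applies relu_ref element-wise to a flat array of n values. */
static void relu_ref_array(const float *in, float *out, size_t n,
                           const float *clipping_value) {
  for (size_t i = 0; i < n; i++) {
    out[i] = relu_ref(in[i], clipping_value);
  }
}
```

Such a reference can be handy for comparing results read back from the library against expected values, keeping in mind that the hardware computes in a lower-precision internal data type.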
#### Format ```C zdnn_status zdnn_relu(const zdnn_ztensor *input, const void *clipping_value, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `void *clipping_value` - A pointer to an FP32 value, used to clip input tensor's elements. - If set to NULL or 0, no clipping will occur. - Must not be a negative value. - `zdnn_ztensor *output` - Tensor that holds the rectified linear function result of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - `ZDNN_INVALID_CLIPPING_VALUE` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Rectified Linear](https://www.tensorflow.org/api_docs/python/tf/nn/relu) [ONNX Rectified Linear](https://onnx.ai/onnx/operators/onnx__Relu.html#l-onnx-doc-relu) --- ### zdnn_leaky_relu - [Back to Table of Contents](#TOC) - [Back to Activation Operations](#act-ops) #### Description Given an input tensor in zDNN transformed format, produces an output tensor where the leaky rectified linear function is applied to the input element-wise. The calculation used depends on the input element. When negative, y = a \* x, where a is the adjustment factor. When 0 or positive, y = x. If an optional clipping_value is provided, clipping is performed against the intermediate output where z = min(y, clipping_value).
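The piecewise calculation described above can be written out as a small plain C reference. This is only a sketch under stated assumptions, not the library's implementation: `leaky_relu_ref` and `adjustment_factor_is_valid` are hypothetical names, the functions work on ordinary `float` values instead of transformed ztensors, and the valid range for the adjustment factor is assumed to be 0 through 1 inclusive.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical check (not a zDNN API): the adjustment factor is assumed
 * to be valid when it is neither negative nor greater than 1. */
static int adjustment_factor_is_valid(float adjustment_factor) {
  return adjustment_factor >= 0.0f && adjustment_factor <= 1.0f;
}

/* Hypothetical reference helper (not a zDNN API).
 * Negative input:        y = a * x
 * Zero or positive:      y = x
 * With clipping active:  z = min(y, clipping_value) */
static float leaky_relu_ref(float x, float adjustment_factor,
                            const float *clipping_value) {
  assert(adjustment_factor_is_valid(adjustment_factor));
  float y = x < 0.0f ? adjustment_factor * x : x;

  /* Assumption: NULL or 0 disables clipping. */
  if (clipping_value != NULL && *clipping_value != 0.0f) {
    if (y > *clipping_value) {
      y = *clipping_value;
    }
  }
  return y;
}
```

Because the clipping step is a minimum against a non-negative value, it only ever affects positive results; negative inputs pass through scaled by the adjustment factor.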
#### Format ```C zdnn_status zdnn_leaky_relu(const zdnn_ztensor *input, const void *clipping_value, float adjustment_factor, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `void *clipping_value` - A pointer to an FP32 value, used to clip input tensor's elements. - If set to NULL or 0, no clipping will occur. - Must not be a negative value. - `float adjustment_factor` - A float value multiplied with negative elements from input. - Must not be a negative value. - Must not be greater than 1. - `zdnn_ztensor *output` - Tensor that holds the leaky rectified linear function result of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - `ZDNN_INVALID_CLIPPING_VALUE` - `ZDNN_INVALID_ADJUSTMENT_FACTOR` - [hardware statuses](#hw-statuses) #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Leaky Rectified Linear](https://www.tensorflow.org/api_docs/python/tf/nn/leaky_relu) [ONNX Leaky Rectified Linear](https://onnx.ai/onnx/operators/onnx__LeakyRelu.html#l-onnx-doc-leakyrelu) --- ### zdnn_tanh - [Back to Table of Contents](#TOC) - [Back to Activation Operations](#act-ops) #### Description Given an input tensor in zDNN transformed format, produces an output tensor where the hyperbolic tangent is applied to the input element-wise. #### Format ```C zdnn_status zdnn_tanh(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate.
- Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the hyperbolic tangent result of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Hyperbolic Tangent](https://www.tensorflow.org/api_docs/python/tf/math/tanh) [ONNX Hyperbolic Tangent](https://onnx.ai/onnx/operators/onnx__Tanh.html#l-onnx-doc-tanh) --- ### zdnn_sigmoid - [Back to Table of Contents](#TOC) - [Back to Activation Operations](#act-ops) #### Description Given an input tensor in zDNN transformed format, produces an output tensor where the sigmoid function is applied to the input element-wise. #### Format ```C zdnn_status zdnn_sigmoid(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the sigmoid result of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). 
#### Framework Examples

[TensorFlow Sigmoid](https://www.tensorflow.org/api_docs/python/tf/math/sigmoid)

[ONNX Sigmoid](https://onnx.ai/onnx/operators/onnx__Sigmoid.html#l-onnx-doc-sigmoid)

---

### zdnn_softmax

- [Back to Table of Contents](#TOC)
- [Back to Activation Operations](#act-ops)

#### Description

Given an input tensor in zDNN transformed format, computes the softmax (normalized exponential) for each vector formed in dimension-1. If `act_func` is not `SOFTMAX_ACT_NONE`, the activation function is then applied to the results. The results are stored in the provided output zDNN tensor.

_Note: Other parameters, such as axis, are not supported._

#### Format

```C
zdnn_status zdnn_softmax(const zdnn_ztensor *input, void *save_area,
                         zdnn_softmax_act act_func, zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input`

  - [ZDNN_3DS](#common-layouts) tensor with pre-transformed shape [batch size, batch size, vector dimension size] or output from another operation that is of the correct shape.
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `void *save_area`

  - A preallocated memory address to use for temporary storage during internal operation processing.
  - The preallocated memory must be at least 8K bytes in size, aligned on a 4K boundary.
  - If set to NULL, the operation will determine, allocate and free storage automatically.

- `zdnn_softmax_act act_func`

  - Activation function to apply to the results.
  - `SOFTMAX_ACT_NONE` or `SOFTMAX_ACT_LOG`

- `zdnn_ztensor *output`
  - [ZDNN_3DS](#common-layouts) tensor with the same shape as `input` that holds the softmax result of each value from `input`.
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### Programming Notes

- If all elements of a dimension 1 vector are the largest magnitude negative number possible for the transformed data type, accuracy may be reduced.
- A `ZDNN_3DS` tensor is expected, where the `transformed_desc` dim1 describes the vector, and dim2 and dim4 are used to batch multiple vector requests together. Dim3 must always be 1. The `zdnn_softmax` operation is performed against the vector in dim1 repeating for each dim1 vector in the dim4 and dim2 dimensions. - Tensors that cannot be processed as vectors in dim1 or as batches of dim1 vectors must be coerced or reshaped by the caller. - When the entire tensor is to be processed by softmax, it can be coerced by simply creating an alternate descriptor prior to zDNN transformation. For example: - A 4D tensor with `pre_transformed_desc` dimensions 2x2x2x2 and a data array of 16 FP32 entries could have an alternate `ZDNN_3DS` layout `pre_transformed_desc` using dimensions 8x1x2 and use the same original data array prior to `zdnn_transform_ztensor`. After transformation, such a tensor would be valid for `zdnn_softmax`. - In another example, the 4D 2x2x2x2 tensor could be processed as 8 batches of 2 vectors using a `ZDNN_3DS` layout `pre_transformed_desc` with dimensions 1x8x2. - The inner-most dimension must remain the same during this coercion. #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - `ZDNN_ALLOCATION_FAILURE` - A preallocated `save_area` was not specified and internal allocation for the required memory failed. - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - input tensor `input->transformed_desc->dim3` was not 1. - `ZDNN_FUNC_RC_F001` - Invalid `act_func` #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). 
#### Framework Examples

[TensorFlow Softmax](https://www.tensorflow.org/api_docs/python/tf/nn/softmax)

[ONNX Softmax](https://onnx.ai/onnx/operators/onnx__Softmax.html#l-onnx-doc-softmax)

---

### zdnn_softmax_mask

- [Back to Table of Contents](#TOC)
- [Back to Activation Operations](#act-ops)

#### Description

Given an input tensor in zDNN transformed format, computes the softmax (normalized exponential) for each vector formed in dimension-1 (from element zero to mask - 1). If `act_func` is not `SOFTMAX_ACT_NONE`, the activation function is then applied to the results. The results are stored in the provided output zDNN tensor.

_Note: Other parameters, such as axis, are not supported._

#### Format

```C
zdnn_status zdnn_softmax_mask(const zdnn_ztensor *input, void *save_area,
                              zdnn_softmax_act act_func, uint32_t softmax_mask,
                              zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input`

  - [ZDNN_3DS](#common-layouts) tensor with pre-transformed shape [batch size, batch size, vector dimension size] or output from another operation that is of the correct shape.
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `void *save_area`

  - A preallocated memory address to use for temporary storage during internal operation processing.
  - The preallocated memory must be at least 8K bytes in size, aligned on a 4K boundary.
  - If set to NULL, the operation will determine, allocate and free storage automatically.

- `zdnn_softmax_act act_func`

  - Activation function to apply to the results.
  - `SOFTMAX_ACT_NONE` or `SOFTMAX_ACT_LOG`

- `uint32_t softmax_mask`

  - 32-bit unsigned binary integer that specifies a count of dimension-1 elements to be processed.
  - If 0, behavior matches `zdnn_softmax`
  - Must not exceed dimension 1 of input tensor.

- `zdnn_ztensor *output`
  - [ZDNN_3DS](#common-layouts) tensor with the same shape as `input` that holds the softmax result of each value from `input`.
- Must follow [general tensor requirements](#gen-zten-reqs) #### Programming Notes - If all elements of a dimension 1 vector are the largest magnitude negative number possible for the transformed data type, accuracy may be reduced. - A `ZDNN_3DS` tensor is expected, where the `transformed_desc` dim1 describes the vector, and dim2 and dim4 are used to batch multiple vector requests together. Dim3 must always be 1. The `zdnn_softmax_mask` operation is performed against the vector in dim1 repeating for each dim1 vector in the dim4 and dim2 dimensions. - Tensors that cannot be processed as vectors in dim1 or as batches of dim1 vectors must be coerced or reshaped by the caller. - When the entire tensor is to be processed by softmax, it can be coerced by simply creating an alternate descriptor prior to zDNN transformation. For example: - A 4D tensor with `pre_transformed_desc` dimensions 2x2x2x2 and a data array of 16 FP32 entries could have an alternate `ZDNN_3DS` layout `pre_transformed_desc` using dimensions 8x1x2 and use the same original data array prior to `zdnn_transform_ztensor`. After transformation, such a tensor would be valid for `zdnn_softmax_mask`. - In another example, the 4D 2x2x2x2 tensor could be processed as 8 batches of 2 vectors using a `ZDNN_3DS` layout `pre_transformed_desc` with dimensions 1x8x2. #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - `ZDNN_ALLOCATION_FAILURE` - A preallocated `save_area` was not specified and internal allocation for the required memory failed. - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - input tensor `input->transformed_desc->dim3` was not 1. - `ZDNN_FUNC_RC_F001` - Invalid `act_func` - `ZDNN_FUNC_RC_F002` - `softmax_mask` exceeds dimension 1 of input tensor. 
#### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Softmax](https://www.tensorflow.org/api_docs/python/tf/nn/softmax) [ONNX Softmax](https://onnx.ai/onnx/operators/onnx__Softmax.html#l-onnx-doc-softmax) --- ### zdnn_gelu - [Back to Table of Contents](#TOC) - [Back to Activation Operations](#act-ops) #### Description Given an input tensor in zDNN transformed format produce an output tensor where the Gaussian Error Linear Unit activation function, y = 0.5 \* x \* (1 + tanh(x \* 0.7978845608 \* (1 + 0.044715 \* x \* x))), is applied to the input element-wise. #### Format ```C zdnn_status zdnn_gelu(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Tensor with values to evaluate. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - Tensor that holds the Gaussian Error Linear Unit results of each value from `input` - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - [warning statuses](#warning-statuses) - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) #### Programming Notes - The range of certain input-element values may result in an error of greater than 1% in the output element, however the accuracy of properly conditioned models is not significantly degraded. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). 
#### Framework Examples [TensorFlow Gaussian Error Linear Unit](https://www.tensorflow.org/api_docs/python/tf/nn/gelu) [ONNX Gaussian Error Linear Unit](https://onnx.ai/onnx/operators/onnx__Gelu.html#l-onnx-doc-gelu) --- ## Normalization Operations [Back to Table of Contents](#TOC) - [Mean Reduce](#zdnn_meanreduce2d) - [Batch Norm](#zdnn_batchnorm) - [Normalization](#zdnn_norm) - [Moments](#zdnn_moments) - [Layer Normalization](#zdnn_layernorm) - [Reduce](#zdnn_reduce) --- ### zdnn_meanreduce2d - [Back to Table of Contents](#TOC) - [Back to Normalization Operations](#norm-ops) #### Description Given an input tensor in zDNN transformed format, produces a downsampled tensor reducing the middle dimensions to a size of 1 based on the mean of the original values and stores the result to the provided output zDNN tensor. #### Format ```C zdnn_status zdnn_meanreduce2d(const zdnn_ztensor *input, zdnn_ztensor *output); ``` #### Parameters - `zdnn_ztensor *input` - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [batch_Num, Height, Width, Channel]. - Height and Width dimension must be less than or equal to 1024. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *output` - The result tensor which will hold the result of the pooling operation in its buffer. - Shape: - `output` dimensions batch_Num and Channel must be the same as the respective input dimensions. - `output` dimensions Height and Width must be 1. - Must follow [general tensor requirements](#gen-zten-reqs) #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - `ZDNN_INVALID_SHAPE` - Shape of input or output tensor is invalid based on given kernel and stride parameters - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F001` - `input` tensor has a Height or Width dimension greater than allowed for `zdnn_meanreduce2d`. 
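The reduction performed by `zdnn_meanreduce2d` can be sketched as a host-side reference over an untransformed NHWC array. The function name `ref_meanreduce2d` and its `-1` return are hypothetical. The 1024 limit is taken from the parameter description above, and the sketch works on plain FP32 data rather than a transformed zTensor.

```C
#include <assert.h>
#include <stddef.h>

#define REF_MEANREDUCE_MAX_HW 1024 /* Height/Width limit stated above */

/* Hypothetical reference: reduce [n, h, w, c] to [n, 1, 1, c] by averaging
 * over the Height and Width dimensions.
 * Returns 0 on success, -1 if h or w is 0 or exceeds the limit. */
int ref_meanreduce2d(const float *in, float *out, size_t n, size_t h,
                     size_t w, size_t c) {
  assert(in != NULL && out != NULL);
  if (h == 0 || w == 0 || h > REF_MEANREDUCE_MAX_HW ||
      w > REF_MEANREDUCE_MAX_HW) {
    return -1;
  }
  for (size_t b = 0; b < n; b++) {
    for (size_t ch = 0; ch < c; ch++) {
      double sum = 0.0;
      for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
          /* NHWC: the channel index varies fastest. */
          sum += in[((b * h + y) * w + x) * c + ch];
        }
      }
      out[b * c + ch] = (float)(sum / (double)(h * w));
    }
  }
  return 0;
}
```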
#### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.0.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Reduce Mean] with `axis` set for the Height and Width axes and `keepdims` set to True. [tensorflow reduce mean]: https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean [ONNX Reduce Mean] [onnx reduce mean]: https://onnx.ai/onnx/operators/onnx__ReduceMean.html#l-onnx-doc-reducemean --- ### zdnn_batchnorm - [Back to Table of Contents](#TOC) - [Back to Normalization Operations](#norm-ops) #### Description Given three input zDNN tensors `input_a`, `input_b`, and `input_c`, computes the batch-normalized result for each vector formed in dimension-1 as follows: output = input_b \* input_a + input_c where `input_b` is a precomputed elementwise divide of scale and variance tensors, and `input_c` is a precomputed elementwise multiply of (-1) \* mean and 'input_b' + input bias tensors. 
#### Format

```C
zdnn_status zdnn_batchnorm(const zdnn_ztensor *input_a,
                           const zdnn_ztensor *input_b,
                           const zdnn_ztensor *input_c, zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input_a`

  - Must be a 4D [ZDNN_NHWC](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *input_b`

  - Must be a 1D [ZDNN_1D](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *input_c`

  - Must be a 1D [ZDNN_1D](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *output`
  - A zdnn_ztensor of the same size as `input_a` representing the computed value of the above formula
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- [warning statuses](#warning-statuses)
- `ZDNN_INVALID_SHAPE`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- [hardware statuses](#hw-statuses)

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.0.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow Batchnorm]

[tensorflow batchnorm]: https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization

[ONNX Batchnorm]

[onnx batchnorm]: https://onnx.ai/onnx/operators/onnx__BatchNormalization.html#l-onnx-doc-batchnormalization

---

### zdnn_norm

- [Back to Table of Contents](#TOC)
- [Back to Normalization Operations](#norm-ops)

#### Description

Given input_a and input_b tensors in zDNN transformed format, produces the norm of the difference of vectors.

Calculation is performed as follows:

1. Each element in dimension 1 of input_b is subtracted from the corresponding element of input_a.
2. The difference is squared.
3. The sum of the squared differences for dimension 1 is computed.
4. The square root of the sum is placed in the first element of dimension 1 of output tensor.

#### Format

```C
zdnn_status zdnn_norm(const zdnn_ztensor *input_a, zdnn_ztensor *input_b,
                      zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input_a`

  - Tensor with values to evaluate.
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *input_b`

  - Tensor with values to evaluate.
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *output`
  - Tensor with the result of the normalization operation.
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- `ZDNN_INVALID_SHAPE`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- [hardware statuses](#hw-statuses)

#### Since

1.1.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow Normalization]

[tensorflow normalization]: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization

ONNX Normalization: N/A

---

### zdnn_moments

- [Back to Table of Contents](#TOC)
- [Back to Normalization Operations](#norm-ops)

#### Description

Given an input tensor in zDNN transformed format and a Bessel correction type, produces the mean and variance of the input tensor.

#### Format

```C
zdnn_status zdnn_moments(const zdnn_ztensor *input,
                         zdnn_moments_bessel bessel_correction_type,
                         zdnn_ztensor *output_a, zdnn_ztensor *output_b);
```

#### Parameters

- `zdnn_ztensor *input`

  - Must be a 4D [ZDNN_NHWC](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_moments_bessel bessel_correction_type`

  - Bessel correction type to perform moments.
    - `MOMENTS_BESSEL_POPULATION`
    - `MOMENTS_BESSEL_SAMPLE`

- `zdnn_ztensor *output_a`

  - The output tensor that will hold the mean.
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *output_b`
  - The output tensor that will hold the variance.
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- [warning statuses](#warning-statuses)
- `ZDNN_INVALID_SHAPE`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_BESSEL_CORRECTION`
- [hardware statuses](#hw-statuses)

#### Since

1.1.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Programming Notes

- The `zdnn_moments` operation may be used in combination with the `zdnn_layernorm` operation. Please see [zdnn_layernorm](#zdnn_layernorm) for more guidance.
- When `MOMENTS_BESSEL_SAMPLE` is provided for the bessel correction type, all provided input dimensions of the input tensor must not be equal to 1.

#### Framework Examples

[TensorFlow Moments]

[tensorflow moments]: https://www.tensorflow.org/api_docs/python/tf/nn/moments

ONNX Moments: N/A

---

### zdnn_layernorm

- [Back to Table of Contents](#TOC)
- [Back to Normalization Operations](#norm-ops)

#### Description

Given input_a, input_b, and input_c tensors in zDNN transformed format, produces the layernorm of the given tensors.

Calculation is performed as follows:

1. Each element in dimension 1 of input_b is subtracted from the corresponding element of input_a.
2. A corresponding element of input_c is added to epsilon.
3. The square root of the sum from step 2 is computed.
4. The difference from step 1 is divided by the result of step 3.
5. The quotient from step 4 is multiplied by gamma.
6. The product from step 5 is added to beta.
7. Result is stored in the corresponding element of output.

The above calculation could be depicted as follows:

$$\mathrm{layernorm}(a) = \frac{a - b}{\sqrt{c + \epsilon}} \cdot \gamma + \beta$$

#### Format

```C
zdnn_status zdnn_layernorm(const zdnn_ztensor *input_a,
                           const zdnn_ztensor *input_b,
                           const zdnn_ztensor *input_c, float beta,
                           float gamma, float epsilon, zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input_a`

  - Must be a 4D [ZDNN_NHWC](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `zdnn_ztensor *input_b`

  - Must be a 4D [ZDNN_NHWC](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)
  - Contains arithmetic means ([Moments](#zdnn_moments) output_a)

- `zdnn_ztensor *input_c`

  - Must be a 4D [ZDNN_NHWC](#common-layouts) tensor
  - Must follow [general tensor requirements](#gen-zten-reqs)
  - Contains arithmetic variances ([Moments](#zdnn_moments) output_b)

- `float beta`

  - Final result adjustment addend.

- `float gamma`

  - Final result adjustment multiplier.

- `float epsilon`

  - Intermediate variance adjustment.

- `zdnn_ztensor *output`
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- [warning statuses](#warning-statuses)
- `ZDNN_INVALID_SHAPE`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_BETA`
- `ZDNN_INVALID_GAMMA`
- `ZDNN_INVALID_EPSILON`
- [hardware statuses](#hw-statuses)

#### Since

1.1.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Programming Notes

- `zdnn_layernorm` is intended to be used in combination with the `zdnn_moments` normalization operation. The `zdnn_moments` operation produces two output tensors containing the means and variances, respectively, of the dimension-4-index elements of the input tensor.
  The original input tensor to `zdnn_moments` is intended to be used as input-tensor 1 of `zdnn_layernorm`. Output-tensors 1 and 2 of `zdnn_moments` are intended to be used as input-tensor 2 and input-tensor 3 of the `zdnn_layernorm` operation.
- The beta and gamma values in the 4th and 5th parameters of `zdnn_layernorm` (also referred to as bias and gain) provide a learned offset and scale. The epsilon value in parameter 6 of `zdnn_layernorm` is intended to be a small value (for example, 0.001) to provide numerical stability.

#### Framework Examples

[TensorFlow Layernorm](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization)

[ONNX Layernorm](https://onnx.ai/onnx/operators/onnx__LayerNormalization.html#l-onnx-doc-layernormalization)

---

### zdnn_reduce

- [Back to Table of Contents](#TOC)
- [Back to Normalization Operations](#norm-ops)

#### Description

Given an input tensor in zDNN transformed format, produces an output tensor where the given reduction operation is performed.

#### Format

```C
zdnn_status zdnn_reduce(const zdnn_ztensor *input, void *save_area,
                        zdnn_reduce_ops op_type, zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input`

  - Tensor with values to evaluate.
  - Must follow [general tensor requirements](#gen-zten-reqs)

- `void *save_area`

  - A preallocated memory address to use for temporary storage during internal operation processing.
  - The preallocated memory must be at least 8K bytes in size, aligned on a 4K boundary.
  - If set to NULL, the operation will determine, allocate and free storage automatically.

- `zdnn_reduce_ops op_type`

  - Reduction Operation to perform on input tensor.
    - `REDUCE_OP_MINIMUM`
    - `REDUCE_OP_MINIMUM_IDX`
    - `REDUCE_OP_MAXIMUM`
    - `REDUCE_OP_MAXIMUM_IDX`

- `zdnn_ztensor *output`
  - Tensor that holds the reduction operation result of each value from `input`
  - Output dimension 1 must be 1
  - Must follow [general tensor requirements](#gen-zten-reqs)
  - Data Type must be as follows:
    - (FP32, FP16, BFLOAT) when `op_type` is `REDUCE_OP_MINIMUM` or `REDUCE_OP_MAXIMUM`.
    - INT32 when `op_type` is `REDUCE_OP_MINIMUM_IDX` or `REDUCE_OP_MAXIMUM_IDX`

The output when op_type is `REDUCE_OP_MINIMUM` or `REDUCE_OP_MAXIMUM` can be initialized using:

```C
zdnn_data_layouts input_layout = ZDNN_3DS;
zdnn_data_types input_type = FP32;
uint32_t dim4 = 4;
uint32_t dim2 = 5;
uint32_t dim1 = 6;

zdnn_tensor_desc input_pre_transformed_desc;
zdnn_init_pre_transformed_desc(input_layout, input_type,
                               &input_pre_transformed_desc, dim4, dim2, dim1);

// Output keeps the input data type; dimension 1 is reduced to 1.
zdnn_tensor_desc output_pre_transformed_desc;
zdnn_init_pre_transformed_desc(input_layout, input_type,
                               &output_pre_transformed_desc, dim4, dim2, 1);
```

The output when op_type is `REDUCE_OP_MINIMUM_IDX` or `REDUCE_OP_MAXIMUM_IDX` can be initialized using:

```C
zdnn_data_layouts input_layout = ZDNN_3DS;
zdnn_data_types input_type = FP32;
uint32_t dim4 = 4;
uint32_t dim2 = 5;
uint32_t dim1 = 6;

zdnn_tensor_desc input_pre_transformed_desc;
zdnn_init_pre_transformed_desc(input_layout, input_type,
                               &input_pre_transformed_desc, dim4, dim2, dim1);

// Index results are INT32; dimension 1 is reduced to 1.
zdnn_data_types output_type = INT32;
zdnn_tensor_desc output_pre_transformed_desc;
zdnn_init_pre_transformed_desc(input_layout, output_type,
                               &output_pre_transformed_desc, dim4, dim2, 1);
```

#### Programming Notes

- If a nonnumeric element is encountered in a dimension-1 vector of input-tensor 1, then the resulting element in dimension 1 of output-tensor 1 is unpredictable, and the range-violation status will be returned.
- When the reduction operation is `REDUCE_OP_MINIMUM_IDX`, the index of the first min value, from left to right, is returned when there are multiple elements with the same min value.
- When the reduction operation is `REDUCE_OP_MAXIMUM_IDX`, the index of the first max value, from left to right, is returned when there are multiple elements with the same max value.

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- [warning statuses](#warning-statuses)
- `ZDNN_INVALID_SHAPE`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_UNAVAILABLE_FUNCTION`
- `ZDNN_ALLOCATION_FAILURE` - A preallocated `save_area` was not specified and internal allocation for the required memory failed.
- [hardware statuses](#hw-statuses)
  - `ZDNN_FUNC_RC_F000` - Invalid `op_type`.

#### Since

1.1.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

TensorFlow Reduce Min and Max:

- [tensorflow reduce minimum](https://www.tensorflow.org/api_docs/python/tf/math/reduce_min)
- [tensorflow reduce maximum](https://www.tensorflow.org/api_docs/python/tf/math/reduce_max)

ONNX Reduce Min and Max:

- [onnx reduce minimum](https://onnx.ai/onnx/operators/onnx__ReduceMin.html#l-onnx-doc-reducemin)
- [onnx reduce maximum](https://onnx.ai/onnx/operators/onnx__ReduceMax.html#l-onnx-doc-reducemax)

---

### zdnn_matmul_op

[Back to Table of Contents](#TOC)

#### Description

Given three input zDNN tensors `input_a`, `input_b`, and `input_c`, determine the matrix multiplication of `input_a` \* `input_b` then perform one of the following operations, using `input_c` against the dot product, storing the result into the specified `output` zDNN tensor:

- Addition
- Compare - If dot product is greater than element.
- Compare - If dot product is greater than or equal to element.
- Compare - If dot product is equal to element.
- Compare - If dot product is not equal to element.
- Compare - If dot product is less than or equal to element.
- Compare - If dot product is less than element.

For an operation type of addition, `input_c` is added to the intermediate dot product. For operation types of comparison, the intermediate dot product is compared to `input_c` and if the comparison is true, the result is set to a value of 1; otherwise it is set to a value of 0.

The outermost dimension can optionally indicate that the inputs are stacks of matrices. The results for each matrix stack are independent of other stacks, but all stacks are calculated in a single call.

#### Format

```C
zdnn_status zdnn_matmul_op(const zdnn_ztensor *input_a,
                           const zdnn_ztensor *input_b,
                           const zdnn_ztensor *input_c,
                           zdnn_matmul_ops op_type, zdnn_ztensor *output);
```

#### Input / Output matmul tensor requirements

- See table in this section for `pre_transformed_desc` and shape requirements for each tensor.
- All tensors must either be stacked or unstacked.
- Must follow [general tensor requirements](#gen-zten-reqs)

| type      | input_a              | input_b              | input_c           | result               |
| --------- | -------------------- | -------------------- | ----------------- | -------------------- |
| unstacked | `ZDNN_2D` (m, n)     | `ZDNN_2D` (n, p)     | `ZDNN_1D` (p)     | `ZDNN_2D` (m, p)     |
| stacked   | `ZDNN_3DS` (s, m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) |

#### Parameters

- `zdnn_ztensor *input_a`

  - Input tensor with the first matrix for multiplication
  - pre_transformed shape and layout must match [matmul tensor requirements](#matmul-io-table)

- `zdnn_ztensor *input_b`

  - Input tensor with the second matrix for multiplication
  - pre_transformed shape and layout must match [matmul tensor requirements](#matmul-io-table)

- `zdnn_ztensor *input_c`

  - Input tensor that will have the requested operation performed against the intermediate dot product of `input_a` and `input_b`.
- pre_transformed shape and layout must match [matmul tensor requirements](#matmul-io-table) - `zdnn_matmul_ops op_type` - Operation to perform on dot product. - `MATMUL_OP_ADDITION` - `MATMUL_OP_GREATER` - `MATMUL_OP_GREATER_EQUAL` - `MATMUL_OP_EQUAL` - `MATMUL_OP_NOT_EQUAL` - `MATMUL_OP_LESSER_EQUAL` - `MATMUL_OP_LESSER` - `zdnn_ztensor *output` - The output tensor which will hold the result of the operation in its buffer. - pre_transformed shape and layout must match [matmul tensor requirements](#matmul-io-table) #### Programming Notes - Care must be exercised when comparing values for equality or inequality since the order of operations and rounding may produce, what appear to be, slightly different values when they are essentially the same value. #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - Invalid `op_type`. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow MatMul](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/mat-mul) [ONNX MatMul](https://onnx.ai/onnx/operators/onnx__MatMul.html#l-onnx-doc-matmul) --- ### zdnn_matmul_bcast_op [Back to Table of Contents](#TOC) #### Description Given three input zDNN tensors `input_a`, `input_b`, and `input_c`, determine the matrix multiplication of `input_a` \* `input_b`, then perform one of the following operations, using `input_c` against the dot product, storing the result into the specified `output` zDNN tensor: - Addition - Compare - If dot product is greater than element. - Compare - If dot product is greater or equal to element. - Compare - If dot product is equal to element. 
- Compare - If dot product is not equal to element.
- Compare - If dot product is less than or equal to element.
- Compare - If dot product is less than element.

When an input is `ZDNN_3DS`, the outermost dimension for that input can optionally indicate that the input is a stack of matrices. Likewise, when an input is `ZDNN_2DS`, the outermost dimension for that input can optionally indicate that the input is a stack of vectors.

For example, if `input_a` were `ZDNN_3DS`, each stack of `input_a` is multiplied by the same `input_b` matrix and `input_c` vector, which are broadcast over each stack of `input_a`. Results for each stack are returned in the corresponding stack index of `output`.

Likewise, if `input_b` were `ZDNN_3DS` and `input_c` were `ZDNN_2DS`, each stack of `input_b` is multiplied by the same `input_a` matrix, which is broadcast over each stack of `input_b` and `input_c`. Results for each stack are returned in the corresponding stack index of `output`.

#### Format

```C
zdnn_status zdnn_matmul_bcast_op(const zdnn_ztensor *input_a,
                                 const zdnn_ztensor *input_b,
                                 const zdnn_ztensor *input_c,
                                 zdnn_matmul_bcast_ops op_type,
                                 zdnn_ztensor *output);
```

#### Input / Output matmul broadcast tensor requirements

- See table in this section for `pre_transformed_desc` and shape requirements for each tensor.
- Must follow [general tensor requirements](#gen-zten-reqs) | type | input_a | input_b | input_c | result | | --------- | -------------------- | -------------------- | ----------------- | -------------------- | | unstacked | `ZDNN_2D` (m, n) | `ZDNN_2D` (n, p) | `ZDNN_1D` (p) | `ZDNN_2D` (m, p) | | stacked | `ZDNN_3DS` (s, m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) | | bcast1 | `ZDNN_2D` (m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) | | bcast23 | `ZDNN_3DS` (s, m, n) | `ZDNN_2D` (n, p) | `ZDNN_1D` (p) | `ZDNN_3DS` (s, m, p) | #### Parameters - `zdnn_ztensor *input_a` - Input tensor with the first matrix for multiplication. - pre_transformed shape and layout must match [matmul broadcast tensor requirements](#matmul-bcast-io-table) - `zdnn_ztensor *input_b` - Input tensor with the second matrix for multiplication. - pre_transformed shape and layout must match [matmul broadcast tensor requirements](#matmul-bcast-io-table) - `zdnn_ztensor *input_c` - Input tensor that will have the requested operation performed against the intermediate dot product for each "m" dimension in `output`. - pre_transformed shape and layout must match [matmul broadcast tensor requirements](#matmul-bcast-io-table) - `zdnn_matmul_bcast_ops op_type` - Operation to perform on dot product. - `MATMUL_BCAST_OP_ADDITION` - `MATMUL_BCAST_OP_GREATER` - `MATMUL_BCAST_OP_GREATER_EQUAL` - `MATMUL_BCAST_OP_EQUAL` - `MATMUL_BCAST_OP_NOT_EQUAL` - `MATMUL_BCAST_OP_LESSER_EQUAL` - `MATMUL_BCAST_OP_LESSER` - `zdnn_ztensor *output` - The output tensor which will hold the result of the operation in its buffer. - pre_transformed shape and layout must match [matmul broadcast tensor requirements](#matmul-bcast-io-table) #### Programming Notes - When `NNPA_PARMBLKFORMAT_1` is not installed, `zdnn_matmul_bcast_ops` only supports the `MATMUL_BCAST_OP_ADDITION` op_type. If any other op_type is provided, `ZDNN_UNAVAILABLE_FUNCTION` is returned. 
- `BCAST1` is not supported when `NNPA_PARMBLKFORMAT_1` is not installed and will return `ZDNN_UNAVAILABLE_FUNCTION`. - Care must be exercised when comparing values for equality or inequality since the order of operations and rounding may produce what appear to be slightly different values when they are essentially the same value. #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE` - `ZDNN_INVALID_FORMAT` - `ZDNN_UNAVAILABLE_FUNCTION` - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - Invalid `op_type`. - `ZDNN_FUNC_RC_F001` - Invalid input/output type or format combination. #### Since 1.0.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime for the following: - `zdnn_data_layouts` specification - bcast1 - `zdnn_matmul_bcast_ops` specifications: - `MATMUL_BCAST_OP_GREATER` - `MATMUL_BCAST_OP_GREATER_EQUAL` - `MATMUL_BCAST_OP_EQUAL` - `MATMUL_BCAST_OP_NOT_EQUAL` - `MATMUL_BCAST_OP_LESSER_EQUAL` - `MATMUL_BCAST_OP_LESSER` - the underlying hardware supports zDNN APIs 1.0.x or later at runtime. See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow MatMul](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/mat-mul) [ONNX MatMul](https://onnx.ai/onnx/operators/onnx__MatMul.html#l-onnx-doc-matmul) --- ### zdnn_matmul_transpose_op [Back to Table of Contents](#TOC) #### Description Given three input zDNN tensors `input_a`, `input_b`, and `input_c`, determine the matrix multiplication of `input_a` \* `input_b` then perform one of the following operations, using `input_c` against the dot product, storing the result into the specified `output` zDNN tensor: - Addition - Compare - If dot product is greater than element. - Compare - If dot product is greater or equal to element. - Compare - If dot product is equal to element. 
- Compare - If dot product is not equal to element.
- Compare - If dot product is less than or equal to element.
- Compare - If dot product is less than element.

For an operation type of addition, `input_c` is added to the intermediate dot product. For operation types of comparison, the intermediate dot product is compared to `input_c` and if the comparison is true, the result is set to a value of 1; otherwise it is set to a value of 0.

The outermost dimension can optionally indicate that the inputs are stacks of matrices. The results for each matrix stack are independent of other stacks, but all stacks are calculated in a single call.

#### Format

```C
zdnn_status zdnn_matmul_transpose_op(const zdnn_ztensor *input_a,
                                     const zdnn_ztensor *input_b,
                                     const zdnn_ztensor *input_c,
                                     bool transpose_a, bool transpose_b,
                                     zdnn_matmul_ops op_type,
                                     zdnn_ztensor *output);
```

#### Input / Output matmul transpose tensor requirements

- See table in this section for `pre_transformed_desc` and shape requirements for each tensor.
- All tensors must either be stacked or unstacked.
- Must follow [general tensor requirements](#gen-zten-reqs) | type | input_a | input_b | input_c | result | | --------- | -------------------- | -------------------- | ----------------- | -------------------- | | unstacked | `ZDNN_2D` (m, n) | `ZDNN_2D` (n, p) | `ZDNN_1D` (p) | `ZDNN_2D` (m, p) | | stacked | `ZDNN_3DS` (s, m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) | | bcast1 | `ZDNN_2D` (m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) | | bcast23 | `ZDNN_3DS` (s, m, n) | `ZDNN_2D` (n, p) | `ZDNN_1D` (p) | `ZDNN_3DS` (s, m, p) | #### Parameters - `zdnn_ztensor *input_a` - Input tensor with the first matrix for multiplication - pre_transformed shape and layout must match [matmul transpose tensor requirements](#matmul-transpose-io-table) - `zdnn_ztensor *input_b` - Input tensor with the second matrix for multiplication - pre_transformed shape and layout must match [matmul transpose tensor requirements](#matmul-transpose-io-table) - `zdnn_ztensor *input_c` - Input tensor that will have the requested operation performed against the intermediate dot product of `input_a` and `input_b`. - pre_transformed shape and layout must match [matmul transpose tensor requirements](#matmul-transpose-io-table) - `bool transpose_a` - Whether to transpose `input_a` prior to dot product. - If `true`, `input_a` should have the unstacked dimensions (n, m) or stacked dimensions (s, n, m) - `bool transpose_b` - Whether to transpose `input_b` prior to dot product. - If `true`, `input_b` should have the unstacked dimensions (p, n) or stacked dimensions (s, p, n) - `zdnn_matmul_ops op_type` - Operation to perform on dot product. - `MATMUL_OP_ADDITION` - `MATMUL_OP_GREATER` - `MATMUL_OP_GREATER_EQUAL` - `MATMUL_OP_EQUAL` - `MATMUL_OP_NOT_EQUAL` - `MATMUL_OP_LESSER_EQUAL` - `MATMUL_OP_LESSER` - `zdnn_ztensor *output` - The output tensor which will hold the result of the operation in its buffer. 
  - pre_transformed shape and layout must match [matmul transpose tensor requirements](#matmul-transpose-io-table)

#### Programming Notes

- `zdnn_matmul_transpose_op` is not supported when `NNPA_PARMBLKFORMAT_1` is not installed and will return `ZDNN_UNAVAILABLE_FUNCTION`.
- Care must be exercised when comparing values for equality or inequality since the order of operations and rounding may produce what appear to be slightly different values when they are essentially the same value.

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- `ZDNN_INVALID_SHAPE`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_UNAVAILABLE_FUNCTION`
- [hardware statuses](#hw-statuses)
  - `ZDNN_FUNC_RC_F000` - Invalid `op_type`.
  - `ZDNN_FUNC_RC_F001` - Invalid input/output type or format combination.

#### Since

1.1.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow MatMul](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/mat-mul)

[ONNX MatMul](https://onnx.ai/onnx/operators/onnx__MatMul.html#l-onnx-doc-matmul)

---

### zdnn_quantized_matmul_op

[Back to Table of Contents](#TOC)

#### Description

Given three input zDNN tensors `input_a`, `input_b`, and `input_c`, determine the matrix multiplication of `input_a` \* `input_b` then perform one of the following operations, using `input_c` against the dot product, storing the result into the specified `output` zDNN tensor:

- Addition
- Compare - If dot product is greater than element.
- Compare - If dot product is greater than or equal to element.
- Compare - If dot product is equal to element.
- Compare - If dot product is not equal to element.
- Compare - If dot product is less than or equal to element.
- Compare - If dot product is less than element.
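
To make the operation list above concrete, the following is a minimal plain C sketch of the unstacked case only. It is not zDNN code: it ignores quantization, clipping, stacking, and DLFLOAT16 rounding, and `ref_matmul_op` and `ref_matmul_op_unstacked` are hypothetical names introduced purely for illustration. It assumes row-major `float` arrays with `input_a` of shape (m, n), `input_b` of shape (n, p), and `input_c` of shape (p), and it writes 1 or 0 for the comparison operations.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum mirroring the operation list above; this is NOT the
 * zDNN zdnn_matmul_ops type. */
typedef enum {
  REF_OP_ADDITION,
  REF_OP_GREATER,
  REF_OP_GREATER_EQUAL,
  REF_OP_EQUAL,
  REF_OP_NOT_EQUAL,
  REF_OP_LESSER_EQUAL,
  REF_OP_LESSER
} ref_matmul_op;

/* Reference semantics for the unstacked case, row-major float arrays:
 * a is (m, n), b is (n, p), c is (p), out is (m, p).
 * Illustration only: no quantization, no stacking, no DLFLOAT16. */
void ref_matmul_op_unstacked(const float *a, const float *b, const float *c,
                             size_t m, size_t n, size_t p, ref_matmul_op op,
                             float *out) {
  assert(a != NULL && b != NULL && c != NULL && out != NULL);
  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < p; j++) {
      /* Intermediate dot product of row i of a and column j of b. */
      float dot = 0.0f;
      for (size_t k = 0; k < n; k++) {
        dot += a[i * n + k] * b[k * p + j];
      }
      /* c is broadcast over every row ("m" dimension) of the result. */
      float r = 0.0f;
      switch (op) {
      case REF_OP_ADDITION:      r = dot + c[j];                 break;
      case REF_OP_GREATER:       r = dot >  c[j] ? 1.0f : 0.0f;  break;
      case REF_OP_GREATER_EQUAL: r = dot >= c[j] ? 1.0f : 0.0f;  break;
      case REF_OP_EQUAL:         r = dot == c[j] ? 1.0f : 0.0f;  break;
      case REF_OP_NOT_EQUAL:     r = dot != c[j] ? 1.0f : 0.0f;  break;
      case REF_OP_LESSER_EQUAL:  r = dot <= c[j] ? 1.0f : 0.0f;  break;
      case REF_OP_LESSER:        r = dot <  c[j] ? 1.0f : 0.0f;  break;
      }
      out[i * p + j] = r;
    }
  }
}
```

For example, with `a = {1, 2, 3, 4}` and `b = {5, 6, 7, 8}` (both 2x2) and `c = {20, 22}`, the dot product is `{19, 22, 43, 50}`, so addition yields `{39, 44, 63, 72}` and greater-or-equal yields `{0, 1, 1, 1}`. The exact `==` and `!=` comparisons in this sketch are also where rounding differences can change a result, so equality-based operations deserve extra care.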
For an operation type of addition, `input_c` is added to the intermediate dot product. For operation types of comparison, the intermediate dot product is compared to `input_c` and if the comparison is true, the result is set to a value of 1; otherwise it is set to a value of 0.

The outermost dimension can optionally indicate that the inputs are stacks of matrices. The results for each matrix stack are independent of other stacks, but all stacks are calculated in a single call.

When `dequantize` is `true`, the output will be dequantized after computation.

When `pre_computed` is `true`, `input_c` is expected to already contain the pre-computed bias.

The pre-computed value of `input_c` for Addition can be achieved using:

```C
Za = input_a->offset;
Sa = 1 / input_a->rec_scale;

Zb = input_b->offset;
Sb = 1 / input_b->rec_scale;

Zc = input_c->offset;
Sc = 1 / input_c->rec_scale;

Zy = output->offset;
Sy = 1 / output->rec_scale;

N = input_b->pre_transformed_desc->dim2;

pre_computed = Zy - (Sc / Sy) * Zc - (Sc / Sy) * input_c + ((Sa * Sb) / Sy) * N * Za * Zb;
```

The pre-computed value of `input_c` for Compare can be achieved using:

```C
Za = input_a->offset;
Sa = 1 / input_a->rec_scale;

Zb = input_b->offset;
Sb = 1 / input_b->rec_scale;

Zc = input_c->offset;
Sc = 1 / input_c->rec_scale;

pre_computed = Sc / (Sa * Sb) * (input_c - Zc) + Za * sum(input_b, axis=-2)
```

#### Format

```C
zdnn_status zdnn_quantized_matmul_op(const zdnn_ztensor *input_a,
                                     const zdnn_ztensor *input_b,
                                     const zdnn_ztensor *input_c,
                                     zdnn_matmul_ops op_type,
                                     const int8_t clip_min,
                                     const int8_t clip_max,
                                     const bool disable_clipping,
                                     const bool dequantize,
                                     const bool pre_computed,
                                     void *work_area,
                                     zdnn_ztensor *output);
```

#### Input / Output quantized matmul tensor requirements

- See table in this section for `pre_transformed_desc` and shape requirements for each tensor.
- All tensors must either be stacked or unstacked.
- Must follow [general tensor requirements](#gen-zten-reqs) - All tensors should use `zdnn_generate_quantized_transformed_desc` when generating transformed descriptors, passing the appropriate `zdnn_quantized_transform_types`. - All quantized tensors should use `zdnn_init_quantized_ztensor` or `zdnn_init_quantized_ztensor_with_malloc` when initializing, passing the `scale` and `offset` quantization parameters. - `scale` must be in range ([-DLFLT_MAX](#zdnn_get_max_limit) <= scale <= [DLFLT_MAX](#zdnn_get_max_limit)) and scale != 0. - `offset` must be in range ([-DLFLT_MAX](#zdnn_get_max_limit) <= offset <= [DLFLT_MAX](#zdnn_get_max_limit)). - All quantized input tensors should use `zdnn_transform_quantized_ztensor` when transforming, passing the `clip_min` and `clip_max` quantization parameters. ##### zdnn_data_layouts | type | input_a | input_b | input_c | result | | --------- | -------------------- | -------------------- | ----------------- | -------------------- | | unstacked | `ZDNN_2D` (m, n) | `ZDNN_2D` (n, p) | `ZDNN_1D` (p) | `ZDNN_2D` (m, p) | | stacked | `ZDNN_3DS` (s, m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) | | bcast1 | `ZDNN_2D` (m, n) | `ZDNN_3DS` (s, n, p) | `ZDNN_2DS` (s, p) | `ZDNN_3DS` (s, m, p) | | bcast23 | `ZDNN_3DS` (s, m, n) | `ZDNN_2D` (n, p) | `ZDNN_1D` (p) | `ZDNN_3DS` (s, m, p) | ##### zdnn_quantized_transform_types | type | input_a | input_b | input_c | result | | ---------- | ------------------- | ---------------------- | -------------- | ------------------- | | normal | QUANTIZED_INT8 | QUANTIZED_WEIGHTS_INT8 | QUANTIZED_INT8 | QUANTIZED_DLFLOAT16 | | on-the-fly | QUANTIZED_DLFLOAT16 | QUANTIZED_WEIGHTS_INT8 | QUANTIZED_INT8 | QUANTIZED_DLFLOAT16 | #### Parameters - `zdnn_ztensor *input_a` - Input tensor with the first matrix for multiplication - pre_transformed shape and layout must match [quantized matmul tensor requirements](#quantized-matmul-io-table) - `zdnn_ztensor *input_b` - Input tensor with the 
second matrix for multiplication - pre_transformed shape and layout must match [quantized matmul tensor requirements](#quantized-matmul-io-table) - `zdnn_ztensor *input_c` - Input tensor that will have the requested operation performed against the intermediate dot product of `input_a` and `input_b`. - pre_transformed shape and layout must match [quantized matmul tensor requirements](#quantized-matmul-io-table) - `int8_t clip_min` - Minimum quantized value for `input_a` prior to dot product. - Only applicable when performing `on-the-fly` quantization. - Must be less than `clip_max`. - `int8_t clip_max` - Maximum quantized value for `input_a` prior to dot product. - Only applicable when performing `on-the-fly` quantization. - Must be greater than `clip_min`. - `bool disable_clipping` - When `true` disables clipping and rounding. - `bool dequantize` - Whether to dequantize returned ztensor. - `bool pre_computed` - Whether bias is already pre-computed. - `void *work_area` - A preallocated memory address to use for temporary storage during internal operation processing. - If set to NULL, the operation will determine, allocate and free storage automatically. - Amount of required storage is the same as `input_c->buffer_size`. - The start of the buffer must be 4k aligned. - `zdnn_matmul_ops op_type` - Operation to perform on dot product. - `MATMUL_OP_ADDITION` - `MATMUL_OP_GREATER` - `MATMUL_OP_GREATER_EQUAL` - `MATMUL_OP_EQUAL` - `MATMUL_OP_NOT_EQUAL` - `MATMUL_OP_LESSER_EQUAL` - `MATMUL_OP_LESSER` - `zdnn_ztensor *output` - The output tensor which will hold the result of the operation in its buffer. - pre_transformed shape and layout must match [quantized matmul tensor requirements](#quantized-matmul-io-table) #### Programming Notes - `zdnn_quantized_matmul_op` is not supported when `NNPA_PARMBLKFORMAT_1` is not installed and will return `ZDNN_UNAVAILABLE_FUNCTION`. 
- Care must be exercised when comparing values for equality or inequality since the order of operations and rounding may produce, what appear to be, slightly different values when they are essentially the same value. #### Returns (see [zDNN Statuses](#common-statuses) for descriptions) - `ZDNN_OK` - `ZDNN_INVALID_SHAPE` - `ZDNN_INVALID_TYPE`: [Quantized zTensor Requirements](#quan-zten-reqs) - `ZDNN_INVALID_FORMAT` - `ZDNN_INVALID_SCALE` - `ZDNN_INVALID_OFFSET` - `ZDNN_INVALID_CLIPPING_VALUE` - `ZDNN_UNAVAILABLE_FUNCTION` - [hardware statuses](#hw-statuses) - `ZDNN_FUNC_RC_F000` - Invalid `op_type`. - `ZDNN_FUNC_RC_F001` - Invalid input/output type or format combination. - `ZDNN_FUNC_RC_F002` - Invalid input/output scale. #### Since 1.1.0 #### Requirements This feature requires that: - `zdnn_is_nnpa_installed()` returns true - the underlying hardware supports zDNN APIs 1.1.x or later at runtime See [Validating the environment at runtime](#runtime-val). #### Framework Examples [TensorFlow Quantized MatMul](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/quantized-mat-mul) [ONNX Quantize Linear](https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html#l-onnx-doc-quantizelinear) --- ### zdnn_lstm [Back to Table of Contents](#TOC) #### Description Implements Long-Short Term Memory layer (LSTM - Hochreiter 1997). The following formula is computed for the input tensor input(t) for all time steps: (Default: f=Sigmoid, g=Tanh, h=Tanh): ```C - it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Wbi + Rbi) - ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Wbf + Rbf) - ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc) - Ct = ft (.) Ct-1 + it (.) ct - ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Wbo + Rbo) - Ht = ot (.) 
h(Ct) ``` #### Format ```C zdnn_status zdnn_lstm(const zdnn_ztensor *input, const zdnn_ztensor *h0, const zdnn_ztensor *c0, const zdnn_ztensor *weights, const zdnn_ztensor *biases, const zdnn_ztensor *hidden_weights, const zdnn_ztensor *hidden_biases, lstm_gru_direction direction, void *work_area, zdnn_ztensor *hn_output, zdnn_ztensor *cf_output); ``` Also see an [example](#example-of-an-application-calling-the-zdnn_lstm-api) in the usage example section. #### LSTM Input / Output requirements - `num_hidden` dimensions: - Any num_hidden dimension must be less than or equal to `zdnn_get_max_for_dim(2) / 4` elements. #### Parameters - `zdnn_ztensor *input` - Input must be a tensor with the shape (num_timesteps, num_batches, num_features) prior to transformation with the `zdnn_transform_ztensor` API. - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`. - Must follow [general tensor requirements](#gen-zten-reqs) - `zdnn_ztensor *h0` - Tensor containing the initial hidden state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the `zdnn_transform_ztensor` API. - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`. - Must follow [general tensor requirements](#gen-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - `zdnn_ztensor *c0` - Tensor containing the initial cell state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the `zdnn_transform_ztensor` API. - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`. - Must follow [general tensor requirements](#gen-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - `zdnn_ztensor *weights` - Tensor containing the concatenated input connection weights in Forget, Input, Cell, Output (FICO) order. - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_features, num_hidden) by the caller. - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`. 
- Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_LSTM` - `USAGE_WEIGHTS` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - `zdnn_ztensor *biases` - Tensor containing the concatenated input connection bias in Forget, Input, Cell, Output (FICO) order. - Prior to transformation, expects each gate needs to be shape (num_dirs, num_hidden). - Expects `pre_transformed_desc->layout` to be `ZDNN_2DS`. - Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_LSTM` - `USAGE_BIASES` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - `zdnn_ztensor *hidden_weights` - Tensor containing the concatenated hidden connection weights in Forget, Input, Cell, Output (FICO) order. - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_hidden, num_hidden) by the caller. - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`. 
- Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_LSTM` - `USAGE_HIDDEN_WEIGHTS` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - `zdnn_ztensor *hidden_biases` - Tensor containing the concatenated hidden connection bias in Forget, Input, Cell, Output (FICO) order. - Prior to transformation, expects each gate needs to be shape (num_dirs, num_hidden). - Expects `pre_transformed_desc->layout` to be `ZDNN_2DS`. - Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_LSTM` - `USAGE_HIDDEN_BIASES` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - `lstm_gru_direction direction` - Direction indicator of `lstm_gru_direction direction` type. Valid values: - `FWD` (forward) - `BWD` (backward) - `BIDIR` (bi-directional). - For input and output shapes, the num_dirs dimension should be: - `1` for unidirectional calls such as FWD or BWD - `2` for bidirectional calls such that: - dimension 0 contains FWD values. - dimension 1 contains BWD values. - `void *work_area` - A preallocated memory address to use for temporary storage during internal operation processing. - If set to NULL, the operation will determine, allocate and free storage automatically. 
- Amount of required storage can be determined given the LSTM timestep, batch, and num_hidden values. - The sample code below creates a ztensor descriptor that is an equivalent size of the required `work_area`. To use this sample code yourself, replace the `num_timesteps`, `num_batches`, and `num_hidden` variables with your own values. ```C zdnn_tensor_desc desc; desc.dim4 = (4 * num_timesteps) + 6; desc.dim3 = 1; desc.dim2 = num_batches; desc.dim1 = num_hidden; uint64_t work_area_size = zdnn_getsize_ztensor(&desc); ``` - For bidirectional, twice the amount of contiguous storage is required. - The start of the buffer must be 4k aligned. - `zdnn_ztensor *hn_output` - Output results of the hidden states - Expects pre_transformed_desc->layout to be `ZDNN_4DS`. - Must follow [general tensor requirements](#gen-zten-reqs) - Must follow [num_hidden requirements](#lstm-hid_sz) - Output pre-transformed shapes: - all timesteps: (num_timesteps, num_dirs, num_batches, num_hidden) - final timestep only: (1, num_dirs, num_batches, num_hidden) - For bidirectional (`BIDIR`) output: - Forward and backward results are concatenated on the innermost dimension. - Can be used directly as input for subsequent RNN layers without needing untransformation. - Can not be used directly as input for other non-RNN zDNN ops. - Untransformation is supported. - Note that for `BWD` and the backward component of `BIDIR` directions, the output order matches the order of the input, not the processing order. For example, the first input timestep is the last to be processed and its result is the first timestep of the output. - `zdnn_ztensor *cf_output` - Output results of the cell state for the last processed timestep - Expects pre_transformed_desc->layout to be `ZDNN_4DS`. 
  - Must follow [general tensor requirements](#gen-zten-reqs)
  - Must follow [num_hidden requirements](#lstm-hid_sz)
  - Output pre-transformed shapes:
    - (1, num_dirs, num_batches, num_hidden)
  - For bidirectional (`BIDIR`):
    - Forward and backward results are concatenated on the innermost dimension.
    - Cannot be used directly as input for other non-RNN zDNN ops.
    - Untransformation is supported.

#### Summary

|                | pre-transformed layout | pre-transformed shape |
| -------------- | ---------------------- | --------------------- |
| input          | `ZDNN_3DS`             | (num_timesteps, num_batches, num_features) |
| h0             | `ZDNN_3DS`             | (num_dirs, num_batches, num_hidden) |
| c0             | `ZDNN_3DS`             | (num_dirs, num_batches, num_hidden) |
| weights        | `ZDNN_3DS`             | (num_dirs, num_features, num_hidden) |
| biases         | `ZDNN_2DS`             | (num_dirs, num_hidden) |
| hidden_weights | `ZDNN_3DS`             | (num_dirs, num_hidden, num_hidden) |
| hidden_biases  | `ZDNN_2DS`             | (num_dirs, num_hidden) |
| hn_output      | `ZDNN_4DS`             | (num_timesteps, num_dirs, num_batches, num_hidden)<br>(last timestep only when `num_timesteps` = 1) |
| cf_output      | `ZDNN_4DS`             | (1, num_dirs, num_batches, num_hidden) |

|                | create transformed descriptor via |
| -------------- | --------------------------------- |
| input          | `zdnn_generate_transformed_desc` |
| h0             | `zdnn_generate_transformed_desc` |
| c0             | `zdnn_generate_transformed_desc` |
| weights        | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_LSTM` + `USAGE_WEIGHTS` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| biases         | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_LSTM` + `USAGE_BIASES` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| hidden_weights | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_LSTM` + `USAGE_HIDDEN_WEIGHTS` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| hidden_biases  | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_LSTM` + `USAGE_HIDDEN_BIASES` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| hn_output      | `zdnn_generate_transformed_desc` |
| cf_output      | `zdnn_generate_transformed_desc` |

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_SHAPE` - (if any of the following are not true)
  - `hn_output` timesteps dimension must be 1 or the same size as `input` timestep dimension.
  - All tensors with a direction dimension have the same direction dimension size.
  - `input` timestep dimension must be greater than or equal to 1.
  - Other general shape violations (exceeds MDIS, etc.)
- `ZDNN_INVALID_DIRECTION` - `direction` parameter was not a recognized `lstm_gru_direction`.
- `ZDNN_ALLOCATION_FAILURE` - A preallocated `work_area` was not specified and internal allocation for the required memory failed.
- [hardware statuses](#hw-statuses)

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.0.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell)

[ONNX LSTM](https://onnx.ai/onnx/operators/onnx__LSTM.html#l-onnx-doc-lstm)

---

### zdnn_gru

[Back to Table of Contents](#TOC)

#### Description

Implements Gated Recurrent Unit (Kyunghyun Cho 2014). Supports only reset after linear.

The following formula is computed for the input tensor input(t) for all time steps:

(Default: f=Sigmoid, g=Tanh):

```C
- zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
- rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
- ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh)
- Ht = (1 - zt) (.) ht + zt (.) Ht-1
```

#### Format

```C
zdnn_status zdnn_gru(const zdnn_ztensor *input, const zdnn_ztensor *h0,
                     const zdnn_ztensor *weights, const zdnn_ztensor *biases,
                     const zdnn_ztensor *hidden_weights,
                     const zdnn_ztensor *hidden_biases,
                     lstm_gru_direction direction, void *work_area,
                     zdnn_ztensor *hn_output);
```

Also see an [example](#example-of-an-application-calling-the-zdnn_gru-api) in the usage example section.

#### GRU Input / Output requirements

- `num_hidden` dimensions:
  - Any num_hidden dimension must be less than or equal to `zdnn_get_max_for_dim(2) / 3` elements.

#### Parameters

- `zdnn_ztensor *input`
  - Input must be a tensor with the shape (num_timesteps, num_batches, num_features) prior to transformation with the `zdnn_transform_ztensor` API.
  - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`.
  - Must follow [general tensor requirements](#gen-zten-reqs)
- `zdnn_ztensor *h0`
  - Tensor containing the initial hidden state with shape (num_dirs, num_batches, num_hidden) prior to transformation with the `zdnn_transform_ztensor` API.
  - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`.
  - Must follow [general tensor requirements](#gen-zten-reqs)
  - Must follow [num_hidden requirements](#gru-hid_sz)
- `zdnn_ztensor *weights`
  - Tensor containing the concatenated input connection weights in (Z)update, Reset, Hidden, (ZRH) order.
  - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_features, num_hidden) by the caller.
  - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`.
- Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_GRU` - `USAGE_WEIGHTS` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#gru-hid_sz) - `zdnn_ztensor *biases` - Tensor containing the concatenated input connection bias in (Z)update, Reset, Hidden, (ZRH) order. - Prior to transformation, expects each gate needs to be shape (num_dirs, num_hidden). - Expects `pre_transformed_desc->layout` to be `ZDNN_2DS`. - Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_GRU` - `USAGE_BIASES` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#gru-hid_sz) - `zdnn_ztensor *hidden_weights` - Tensor containing the concatenated hidden connection weights in (Z)update, Reset, Hidden, (ZRH) order. - Prior to transformation, each gate needs to be transposed to shape (num_dirs, num_hidden, num_hidden) by the caller. - Expects `pre_transformed_desc->layout` to be `ZDNN_3DS`. 
- Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_GRU` - `USAGE_HIDDEN_WEIGHTS` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#gru-hid_sz) - `zdnn_ztensor *hidden_biases` - Tensor containing the concatenated hidden connection bias in (Z)update, Reset, Hidden, (ZRH) order. - Prior to transformation, expects each gate needs to be shape (num_dirs, num_hidden). - Expects `pre_transformed_desc->layout` to be `ZDNN_2DS`. - Expects `zdnn_concat_info` having the following flags turned on: - `RNN_TYPE_GRU` - `USAGE_HIDDEN_BIASES` - Appropriate `PREV_LAYER` flag: - `PREV_LAYER_NONE` if `input` tensor is not from a previous RNN layer - `PREV_LAYER_UNI` if `input` tensor is uni-directional output from a previous RNN layer - `PREV_LAYER_BIDIR` if `input` tensor is bi-directional output from a previous RNN layer - Must follow [concatenated tensor requirements](#concat-zten-reqs) - Must follow [num_hidden requirements](#gru-hid_sz) - `lstm_gru_direction direction` - Direction indicator of `lstm_gru_direction direction` type. Valid values: - `FWD` (forward) - `BWD` (backward) - `BIDIR` (bi-directional). - For input shapes, the num_dirs dimension should be: - `1` for unidirectional calls such as FWD or BWD - `2` for bidirectional calls such that: - dimension 0 contains FWD values. - dimension 1 contains BWD values. - `void *work_area` - A preallocated memory address to use for temporary storage during internal operation processing. - If set to NULL, the operation will determine, allocate and free storage automatically. 
  - Amount of required storage can be determined given the GRU timestep, batch, and num_hidden values.
    - The sample code below creates a ztensor descriptor that is an equivalent size of the required `work_area`. To use this sample code yourself, replace the `num_timesteps`, `num_batches`, and `num_hidden` variables with your own values.

      ```C
      zdnn_tensor_desc desc;

      desc.dim4 = (3 * num_timesteps) + 5;
      desc.dim3 = 1;
      desc.dim2 = num_batches;
      desc.dim1 = num_hidden;

      uint64_t work_area_size = zdnn_getsize_ztensor(&desc);
      ```

  - For bidirectional, twice the amount of contiguous storage is required.
  - The start of the buffer must be 4k aligned.
- `zdnn_ztensor *hn_output`
  - Output results of the hidden states
  - Expects pre_transformed_desc->layout to be `ZDNN_4DS`.
  - Must follow [general tensor requirements](#gen-zten-reqs)
  - Must follow [num_hidden requirements](#gru-hid_sz)
  - Output pre-transformed shapes:
    - all timesteps: (num_timesteps, num_dirs, num_batches, num_hidden)
    - final timestep only: (1, num_dirs, num_batches, num_hidden)
  - For bidirectional (`BIDIR`) output:
    - Forward and backward results are concatenated on the innermost dimension.
    - Can be used directly as input for subsequent RNN layers without needing untransformation.
      - Cannot be used directly as input for other non-RNN zDNN ops.
    - Untransformation is supported.
  - Note that for `BWD` and the backward component of `BIDIR` directions, the output order matches the order of the input, not the processing order. For example, the first input timestep is the last to be processed and its result is the first timestep of the output.
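
The `work_area` requirements above (size taken from `zdnn_getsize_ztensor`, doubled for bidirectional calls, start address 4k aligned) can be sketched with standard C11 only. `gru_work_area_dim4` and `alloc_work_area` are hypothetical helpers introduced for illustration, not zDNN APIs, and the sketch assumes the single-direction size has already been obtained as in the sample code above.

```C
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* "4k aligned" as stated in the work_area requirements. */
#define WORK_AREA_ALIGNMENT 4096u

/* dim4 of the sizing descriptor for GRU, as in the sample code:
 * (3 * num_timesteps) + 5. */
uint32_t gru_work_area_dim4(uint32_t num_timesteps) {
  return (3u * num_timesteps) + 5u;
}

/* Hypothetical helper (not part of zDNN): allocates a 4k-aligned work area.
 * single_dir_size is the size for one direction, e.g. the value returned by
 * zdnn_getsize_ztensor() for the sizing descriptor. For bidirectional calls
 * twice that amount of contiguous storage is requested. The size actually
 * allocated is stored in *alloc_size when alloc_size is not NULL.
 * Returns NULL on allocation failure; release the buffer with free(). */
void *alloc_work_area(uint64_t single_dir_size, bool is_bidir,
                      uint64_t *alloc_size) {
  assert(single_dir_size > 0);
  uint64_t needed = is_bidir ? 2 * single_dir_size : single_dir_size;
  /* aligned_alloc() expects the size to be a multiple of the alignment,
   * so round up to the next 4k boundary. */
  uint64_t rounded = (needed + WORK_AREA_ALIGNMENT - 1) /
                     WORK_AREA_ALIGNMENT * WORK_AREA_ALIGNMENT;
  if (alloc_size != NULL) {
    *alloc_size = rounded;
  }
  return aligned_alloc(WORK_AREA_ALIGNMENT, (size_t)rounded);
}
```

The returned pointer would then be passed as the `work_area` argument of `zdnn_gru` and released with `free()` once the call has returned. Passing `NULL` instead remains the simpler choice when the library's automatic allocation is acceptable; a caller-owned buffer mainly helps when the same work area is reused across many calls.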
#### Summary

|                | pre-transformed layout | pre-transformed shape |
| -------------- | ---------------------- | --------------------- |
| input          | `ZDNN_3DS`             | (num_timesteps, num_batches, num_features) |
| h0             | `ZDNN_3DS`             | (num_dirs, num_batches, num_hidden) |
| weights        | `ZDNN_3DS`             | (num_dirs, num_features, num_hidden) |
| bias           | `ZDNN_2DS`             | (num_dirs, num_hidden) |
| hidden_weights | `ZDNN_3DS`             | (num_dirs, num_hidden, num_hidden) |
| hidden_biases  | `ZDNN_2DS`             | (num_dirs, num_hidden) |
| hn_output      | `ZDNN_4DS`             | (num_timesteps, num_dirs, num_batches, num_hidden)<br>(last timestep only when `num_timesteps` = 1) |

|                | create transformed descriptor via |
| -------------- | --------------------------------- |
| input          | `zdnn_generate_transformed_desc` |
| h0             | `zdnn_generate_transformed_desc` |
| weights        | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_GRU` + `USAGE_WEIGHTS` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| bias           | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_GRU` + `USAGE_BIASES` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| hidden_weights | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_GRU` + `USAGE_HIDDEN_WEIGHTS` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| hidden_biases  | `zdnn_generate_transformed_desc_concatenated` - `RNN_TYPE_GRU` + `USAGE_HIDDEN_BIASES` + one of the following:<br>`PREV_LAYER_NONE`/`PREV_LAYER_UNI`/`PREV_LAYER_BIDIR` |
| hn_output      | `zdnn_generate_transformed_desc` |

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_SHAPE` - (if any of the following are not true)
  - `hn_output` timesteps dimension must be 1 or the same size as `input` timestep dimension.
  - All tensors with a direction dimension have the same direction dimension size.
  - `input` timestep dimension must be greater than or equal to 1.
  - Other general shape violations (exceeds MDIS, etc.)
- `ZDNN_INVALID_DIRECTION` - `direction` parameter was not a recognized `lstm_gru_direction`.
- `ZDNN_ALLOCATION_FAILURE` - A preallocated `work_area` was not specified and internal allocation for the required memory failed.
- [hardware statuses](#hw-statuses)

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow GRU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRUCell)

[ONNX GRU](https://onnx.ai/onnx/operators/onnx__GRU.html#l-onnx-doc-gru)

---

### zdnn_avgpool2d

[Back to Table of Contents](#TOC)

#### Description

Given an input tensor in zDNN transformed format, padding type, kernel size and kernel stride, produces a downsampled tensor reducing the middle dimensions based on the mean values within the kernel window at each step and stores the results into the provided output zDNN tensor.

#### Format

```C
zdnn_status zdnn_avgpool2d(const zdnn_ztensor *input,
                           zdnn_pool_padding padding_type,
                           uint32_t kernel_height, uint32_t kernel_width,
                           uint32_t stride_height, uint32_t stride_width,
                           zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input`
  - Tensor with original values to be downsampled in the output tensor.
  - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See [Parameter Restrictions](#avgpool2d-parm-restrictions) below for information on the expected shape of the input tensor.
  - Must follow [general tensor requirements](#gen-zten-reqs)
- `padding_type`
  - The type of padding to use for the pooling operations.
  - Valid values are `SAME_PADDING` or `VALID_PADDING`.
  - See [Parameter Restrictions](#avgpool2d-parm-restrictions) below for information on the expected value of padding_type.
  - For information on "same" vs "valid" padding see: .
- `kernel_height`
  - Size of the kernel window that passes over the input's height dimension.
  - See [Parameter Restrictions](#avgpool2d-parm-restrictions) below for information on the expected value of kernel_height.
- `kernel_width`
  - Size of the kernel window that passes over the input's width dimension.
  - See [Parameter Restrictions](#avgpool2d-parm-restrictions) below for information on the expected value of kernel_width.
- `stride_height`
  - Number of positions the kernel moves over the input's height dimension at each step.
  - If `stride_height` is 0 then `stride_width` must also be 0.
  - If strides are greater than 0 then `stride_height` must be less than or equal to 30.
- `stride_width`
  - Number of positions the kernel moves over the input's width dimension at each step.
  - If `stride_height` is 0 then `stride_width` must also be 0.
  - If strides are greater than 0 then `stride_width` must be less than or equal to 30.
- `zdnn_ztensor *output`
  - The result tensor which will hold the result of the pooling operation in its buffer.
  - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See [Parameter Restrictions](#avgpool2d-parm-restrictions) below for information on the expected shape of the output tensor.
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### AvgPool2D Parameter Restrictions

Parameter restrictions may vary based on provided strides and padding_type.

- Input tensor batch_Num and Channel dimensions must always match the output tensor's respective dimensions.
- If strides are 0:
  - Both input tensor's Height dimension and the kernel_height must match and be less than or equal to 1024.
  - Both input tensor's Width dimension and the kernel_width must match and be less than or equal to 1024.
  - Output tensor's height and width dimensions must be 1.
  - padding_type must be `VALID_PADDING`.
- If strides are greater than zero:
  - kernel_width and kernel_height must be less than or equal to 64.
  - Input tensor's height or width dimension must not be greater than 1024.
  - If padding_type is `SAME_PADDING`:
    - Output tensor's height dimension must equal `ceil((float)input's height / stride_height)`.
    - Output tensor's width dimension must equal `ceil((float)input's width / stride_width)`.
  - If padding_type is `VALID_PADDING`:
    - Output tensor's height dimension must equal `ceil((float)(input's height - kernel_height + 1) / stride_height)`.
    - Output tensor's width dimension must equal `ceil((float)(input's width - kernel_width + 1) / stride_width)`.

#### Programming Notes

- If the magnitude of difference between elements of `input` is large (greater than 10), accuracy may be reduced.

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- `ZDNN_INVALID_SHAPE`
  - Shape of input or output tensor is invalid based on given kernel and stride parameters
  - Other general shape violations (exceeds MDIS, etc.)
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_STRIDE_PADDING`
- `ZDNN_INVALID_STRIDES` - One stride was non-zero, but not the other.
- [hardware statuses](#hw-statuses)
  - `ZDNN_EXCEEDS_MDIS` will also occur if any of the following conditions occur:
    - stride_height is larger than `zdnn_get_max_for_dim(3)`.
    - stride_width is larger than `zdnn_get_max_for_dim(2)`.
    - kernel_height is 0 or is larger than `zdnn_get_max_for_dim(3)`.
    - kernel_width is 0 or is larger than `zdnn_get_max_for_dim(2)`.
  - `ZDNN_FUNC_RC_F000` - Invalid `padding_type`
  - `ZDNN_FUNC_RC_F001` - `stride_height` = 0 and `stride_width` = 0, but a kernel parameter is greater than allowed (see `kernel_height` or `kernel_width` above)
  - `ZDNN_FUNC_RC_F002` - `stride_height` > 0 and `stride_width` > 0, but a kernel parameter is greater than allowed (see `kernel_height` or `kernel_width` above)
  - `ZDNN_FUNC_RC_F003` - `stride_height` > 0 and `stride_width` > 0, but a stride parameter is greater than allowed (see `stride_height` or `stride_width` above)
  - `ZDNN_FUNC_RC_F004` - `stride_height` > 0 and `stride_width` > 0, but either input tensor's height or width dimension is greater than 1024.

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow AvgPool](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/avg-pool)

[ONNX AvgPool](https://onnx.ai/onnx/operators/onnx__AveragePool.html#l-onnx-doc-averagepool)

---

### zdnn_maxpool2d

[Back to Table of Contents](#TOC)

#### Description

Given an input tensor in zDNN transformed format, padding type, kernel size and kernel stride, produces a downsampled tensor reducing the middle dimensions based on the maximum values within the kernel window at each step and stores the results into the provided output zDNN tensor.

#### Format

```C
zdnn_status zdnn_maxpool2d(const zdnn_ztensor *input,
                           zdnn_pool_padding padding_type,
                           uint32_t kernel_height, uint32_t kernel_width,
                           uint32_t stride_height, uint32_t stride_width,
                           zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input`
  - Tensor with original values to be downsampled in the output tensor.
  - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See [Parameter Restrictions](#maxpool2d-parm-restrictions) below for information on the expected shape of the input tensor.
  - Must follow [general tensor requirements](#gen-zten-reqs)
- `padding_type`
  - The type of padding to use for the pooling operations.
  - Valid values are `SAME_PADDING` or `VALID_PADDING`.
  - See [Parameter Restrictions](#maxpool2d-parm-restrictions) below for information on the expected value of padding_type.
  - For information on "same" vs "valid" padding see: .
- `kernel_height`
  - Size of the kernel window that passes over the input's height dimension.
  - See [Parameter Restrictions](#maxpool2d-parm-restrictions) below for information on the expected value of kernel_height.
- `kernel_width`
  - Size of the kernel window that passes over the input's width dimension.
  - See [Parameter Restrictions](#maxpool2d-parm-restrictions) below for information on the expected value of kernel_width.
- `stride_height`
  - Number of positions the kernel moves over the input's height dimension at each step.
  - If `stride_height` is 0 then `stride_width` must also be 0.
  - If strides are greater than 0 then `stride_height` must be less than or equal to 30.
- `stride_width`
  - Number of positions the kernel moves over the input's width dimension at each step.
  - If `stride_height` is 0 then `stride_width` must also be 0.
  - If strides are greater than 0 then `stride_width` must be less than or equal to 30.
- `zdnn_ztensor *output`
  - The result tensor which will hold the result of the pooling operation in its buffer.
  - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [batch_Num, Height, Width, Channel].
  - See [Parameter Restrictions](#maxpool2d-parm-restrictions) below for information on the expected shape of the output tensor.
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### MaxPool2D Parameter Restrictions

Parameter restrictions may vary based on provided strides and padding_type.

- Input tensor batch_Num and Channel dimensions must always match the output tensor's respective dimensions.
- If strides are 0:
  - Both input tensor's Height dimension and the kernel_height must match and be less than or equal to 1024.
  - Both input tensor's Width dimension and the kernel_width must match and be less than or equal to 1024.
  - Output tensor's height and width dimensions must be 1.
  - padding_type must be `VALID_PADDING`.
- If strides are greater than zero:
  - kernel_width and kernel_height must be less than or equal to 64.
  - Input tensor's height or width dimension must not be greater than 1024.
  - If padding_type is `SAME_PADDING`:
    - Output tensor's height dimension must equal `ceil((float)input's height / stride_height)`.
    - Output tensor's width dimension must equal `ceil((float)input's width / stride_width)`.
  - If padding_type is `VALID_PADDING`:
    - Output tensor's height dimension must equal `ceil((float)(input's height - kernel_height + 1) / stride_height)`.
    - Output tensor's width dimension must equal `ceil((float)(input's width - kernel_width + 1) / stride_width)`.

#### Programming Notes

- If the magnitude of difference between elements of `input` is large (greater than 10), accuracy may be reduced.

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- `ZDNN_INVALID_SHAPE`
  - Shape of input or output tensor is invalid based on given kernel and stride parameters
  - Other general shape violations (exceeds MDIS, etc.)
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_STRIDE_PADDING`
- `ZDNN_INVALID_STRIDES` - One stride was non-zero, but not the other.
- [hardware statuses](#hw-statuses)
  - `ZDNN_EXCEEDS_MDIS` will also occur if any of the following conditions occur:
    - stride_height is larger than `zdnn_get_max_for_dim(3)`.
    - stride_width is larger than `zdnn_get_max_for_dim(2)`.
    - kernel_height is 0 or is larger than `zdnn_get_max_for_dim(3)`.
    - kernel_width is 0 or is larger than `zdnn_get_max_for_dim(2)`.
  - `ZDNN_FUNC_RC_F000` - Invalid `padding_type`
  - `ZDNN_FUNC_RC_F001` - `stride_height` = 0 and `stride_width` = 0, but a kernel parameter is greater than allowed (see `kernel_height` or `kernel_width` above)
  - `ZDNN_FUNC_RC_F002` - `stride_height` > 0 and `stride_width` > 0, but a kernel parameter is greater than allowed (see `kernel_height` or `kernel_width` above)
  - `ZDNN_FUNC_RC_F003` - `stride_height` > 0 and `stride_width` > 0, but a stride parameter is greater than allowed (see `stride_height` or `stride_width` above)
  - `ZDNN_FUNC_RC_F004` - `stride_height` > 0 and `stride_width` > 0, but either input tensor's height or width dimension is greater than 1024.

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).

#### Framework Examples

[TensorFlow MaxPool](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/max-pool)

[ONNX MaxPool](https://onnx.ai/onnx/operators/onnx__MaxPool.html#l-onnx-doc-maxpool)

---

### zdnn_conv2d

[Back to Table of Contents](#TOC)

#### Description

Perform 2D convolution over an input tensor in zDNN transformed format.

First the `input` tensor is convolved with the `kernel` tensor. Then the `bias` tensor is added to the results. Then if `act_func` is not `CONV2D_ACT_NONE`, the activation function is applied to the results. Then if `act_func` is set to `CONV2D_ACT_RELU`, and clipping_value is not `NULL` or `0`, clipping is performed against the intermediate result where z = min(intermediate_result, clipping_value). Finally the results are stored into the provided output zDNN tensor.
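The order of the steps in the `zdnn_conv2d` description (add bias, optionally apply ReLU, then optionally clip) can be modeled for a single output element in plain C. This is a minimal sketch, assuming a hypothetical helper `conv2d_finish_element` that is not part of the zDNN API; it only illustrates the sequence of steps, not how zAIU computes them or the precision of its internal data types.

```C
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical scalar model of the post-convolution steps for one element.
// conv_sum:       result of convolving input with kernel at this position
// bias:           bias value for this output channel
// relu:           true models CONV2D_ACT_RELU, false models CONV2D_ACT_NONE
// clipping_value: NULL or pointer to 0 means "no clipping"
float conv2d_finish_element(float conv_sum, float bias, bool relu,
                            const float *clipping_value) {
  // The clipping value must not be negative.
  assert(clipping_value == NULL || *clipping_value >= 0.0f);

  float z = conv_sum + bias; // bias is added to the convolution result
  if (relu) {
    if (z < 0.0f) {
      z = 0.0f; // ReLU: max(z, 0)
    }
    // Clipping applies only together with ReLU: z = min(z, clipping_value)
    if (clipping_value != NULL && *clipping_value != 0.0f &&
        z > *clipping_value) {
      z = *clipping_value;
    }
  }
  return z;
}
```

With `relu` set to false the clipping value is ignored, which mirrors the note that the value is only used with `CONV2D_ACT_RELU`.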
#### Format

```C
zdnn_status zdnn_conv2d(const zdnn_ztensor *input,
                        const zdnn_ztensor *kernel,
                        const zdnn_ztensor *bias,
                        zdnn_pool_padding padding_type,
                        uint32_t stride_height, uint32_t stride_width,
                        zdnn_conv2d_act act_func,
                        const void *clipping_value, zdnn_ztensor *output);
```

#### Parameters

- `zdnn_ztensor *input`
  - Tensor with the original values to be convolved with the kernel.
  - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [num_batches, height_in, width_in, channels_in].
  - See [Convolution 2D Requirements](#convolution-2d-requirements) for requirements.
  - Must follow [general tensor requirements](#gen-zten-reqs)
- `zdnn_ztensor *kernel`
  - The kernel tensor to convolve with the input tensor.
  - Must be a [ZDNN_HWCK](#common-layouts) tensor with pre_transformed shape [kernel_height, kernel_width, channels_in, channels_out].
  - See [Convolution 2D Requirements](#convolution-2d-requirements) for requirements.
  - Must follow [general tensor requirements](#gen-zten-reqs)
- `zdnn_ztensor *bias`
  - The bias tensor to add to the convolution results.
  - Must be a [ZDNN_1D](#common-layouts) tensor with pre_transformed shape [channels_out].
  - See [Convolution 2D Requirements](#convolution-2d-requirements) for requirements.
  - Must follow [general tensor requirements](#gen-zten-reqs)
- `zdnn_pool_padding padding_type`
  - The type of padding to use for the convolution operation.
  - Valid values are `SAME_PADDING` or `VALID_PADDING`.
  - For information on "same" vs "valid" padding see: .
- `uint32_t stride_height`
  - Number of positions the kernel moves over the input's `dim3` dimension at each step.
  - See [Convolution 2D Requirements](#convolution-2d-requirements) for requirements.
- `uint32_t stride_width`
  - Number of positions the kernel moves over the input's `dim2` dimension at each step.
  - See [Convolution 2D Requirements](#convolution-2d-requirements) for requirements.
- `zdnn_conv2d_act act_func`
  - Activation function to apply to the results.
  - `CONV2D_ACT_NONE` or `CONV2D_ACT_RELU`
- `void *clipping_value`
  - A pointer to an FP32 value, used as the upper bound when clipping the intermediate results (see Description).
  - If set to NULL or 0, no clipping will occur.
  - Must not be a negative value.
  - Value is ignored if `act_func` is not set to `CONV2D_ACT_RELU`.
- `zdnn_ztensor *output`
  - The result tensor which will hold the results.
  - Must be a [ZDNN_NHWC](#common-layouts) tensor with pre_transformed shape [num_batches, height_out, width_out, channels_out].
  - See [Convolution 2D Requirements](#convolution-2d-requirements) for requirements.
  - Must follow [general tensor requirements](#gen-zten-reqs)

#### Convolution 2D Requirements

| strides and padding | input (num_batches, height_in, width_in, channels_in) | kernel (kernel_height, kernel_width, channels_in, channels_out) | bias (channels_out) | output (num_batches, height_out, width_out, channels_out) |
| ------------------- | ----------------------------------------------------- | --------------------------------------------------------------- | ------------------- | --------------------------------------------------------- |
| both strides > 0 and <= 13, SAME padding | | both kernel_height and kernel_width must be <= 64 | | height_out = ceil(height_in/stride_height)<br>width_out = ceil(width_in/stride_width) |
| both strides > 0 and <= 13, VALID padding | height_in must be >= kernel_height<br>width_in must be >= kernel_width | both kernel_height and kernel_width must be <= 64 | | height_out = ceil((height_in - kernel_height + 1)/stride_height)<br>width_out = ceil((width_in - kernel_width + 1)/stride_width) |
| both strides = 0, VALID padding | height_in must be = kernel_height<br>width_in must be = kernel_width | both kernel_height and kernel_width must be <= 448 | | both height_out and width_out must be 1 |

#### Returns (see [zDNN Statuses](#common-statuses) for descriptions)

- `ZDNN_OK`
- [warning statuses](#warning-statuses)
- `ZDNN_INVALID_SHAPE`
  - Shape of input or output tensor is invalid based on given kernel and stride parameters
  - Other general shape violations (exceeds MDIS, etc.)
- `ZDNN_INVALID_TYPE`
- `ZDNN_INVALID_FORMAT`
- `ZDNN_INVALID_STRIDE_PADDING`
- `ZDNN_INVALID_STRIDES`
- `ZDNN_INVALID_CLIPPING_VALUE`
- [hardware statuses](#hw-statuses)
  - `ZDNN_FUNC_RC_F000` - Invalid `padding_type`
  - `ZDNN_FUNC_RC_F001` - Invalid `act_func`
  - `ZDNN_FUNC_RC_F002` - `stride_height` = 0 and `stride_width` = 0, but either `kernel_height` or `kernel_width` > 448
  - `ZDNN_FUNC_RC_F003` - `stride_height` > 0 and `stride_width` > 0, but either `kernel_height` or `kernel_width` > 64
  - `ZDNN_FUNC_RC_F004` - Either `stride_height` or `stride_width` > 13

#### Since

1.0.0

#### Requirements

This feature requires that:

- `zdnn_is_nnpa_installed()` returns true
- the underlying hardware supports zDNN APIs 1.1.x or later at runtime

See [Validating the environment at runtime](#runtime-val).
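The output shape rules in the Convolution 2D Requirements table can be checked before the output descriptor is built. The following is a minimal sketch in plain C, assuming hypothetical helpers `ceil_div` and `conv2d_out_dim` (neither is a zDNN function). It handles one spatial dimension at a time, so the caller is assumed to have already verified that both strides are zero or both are non-zero.

```C
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Integer ceiling division; b must be non-zero.
uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Hypothetical helper: computes the expected output size for one spatial
// dimension (height or width). Returns false when the combination of
// input size, kernel size, stride and padding is not allowed by the table.
bool conv2d_out_dim(uint32_t in, uint32_t kernel, uint32_t stride,
                    bool same_padding, uint32_t *out) {
  assert(out != NULL);

  if (stride == 0) {
    // strides = 0: VALID padding only, kernel must match the input size
    if (same_padding || in != kernel || kernel > 448) {
      return false;
    }
    *out = 1;
    return true;
  }

  // strides > 0: stride at most 13, kernel at most 64
  if (stride > 13 || kernel > 64) {
    return false;
  }
  if (same_padding) {
    *out = ceil_div(in, stride);
    return true;
  }
  // VALID padding: the input must be at least as large as the kernel
  if (in < kernel) {
    return false;
  }
  *out = ceil_div(in - kernel + 1, stride);
  return true;
}
```

Calling the helper once with the height values and once with the width values yields `height_out` and `width_out` for the output tensor's pre-transformed shape.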
#### Framework Examples

[TensorFlow Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)

[ONNX Conv2D](https://onnx.ai/onnx/operators/onnx__Conv.html#l-onnx-doc-conv)

## Convenience Functions

[Back to Table of Contents](#TOC)

- None

---

## Usage Examples

### Example flow of an application calling the zDNN APIs

[Back to Table of Contents](#TOC)

```C
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// ***************************************************************************
// Sample:
//
// Create 2 zTensors a and b, and add them together via zdnn_add()
// ***************************************************************************
int main(int argc, char *argv[]) {
  zdnn_tensor_desc pre_tfrmd_desc, tfrmd_desc;
  zdnn_ztensor ztensor_a;
  zdnn_ztensor ztensor_b;
  zdnn_ztensor ztensor_out;
  zdnn_status status;

  uint32_t dim_n = 1, dim_h = 32, dim_w = 32, dim_c = 3;
  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes
  uint64_t num_elements = dim_n * dim_h * dim_w * dim_c;

  // allocate tensor data storage
  void *data1 = malloc(num_elements * element_size);
  void *data2 = malloc(num_elements * element_size);
  void *data_out = malloc(num_elements * element_size);

  // read input_data

  // check status for zAIU availability, supported ops, etc. here
  // status = zdnn_query();

  // set input tensor data to 0 to 127 sequentially and repeat
  for (uint64_t i = 0; i < num_elements; i++) {
    ((float *)data1)[i] = (float)(i & 0x7f);
    ((float *)data2)[i] = (float)(i & 0x7f);
  }

  zdnn_init_pre_transformed_desc(ZDNN_NHWC, type, &pre_tfrmd_desc, dim_n,
                                 dim_h, dim_w, dim_c);
  // generate transformed shape information
  status = zdnn_generate_transformed_desc(&pre_tfrmd_desc, &tfrmd_desc);
  assert(status == ZDNN_OK);

  // initialize zTensors and allocate 4k-aligned storage via helper function
  status =
      zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_a);
  assert(status == ZDNN_OK);
  status =
      zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc, &ztensor_b);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc,
                                         &ztensor_out);
  assert(status == ZDNN_OK);

  // transform the feature tensor
  status = zdnn_transform_ztensor(&ztensor_a, data1);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&ztensor_b, data2);
  assert(status == ZDNN_OK);

  // perform element-wise add between the two input tensors
  status = zdnn_add(&ztensor_a, &ztensor_b, &ztensor_out);
  assert(status == ZDNN_OK);

  // transform resultant zTensor back to original data format
  status = zdnn_transform_origtensor(&ztensor_out, data_out);
  assert(status == ZDNN_OK);

  for (uint64_t i = 0; i < num_elements; i++) {
    printf("out element %" PRIu64 " %f\n", i, ((float *)data_out)[i]);
  }

  // Free zTensors
  status = zdnn_free_ztensor_buffer(&ztensor_a);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&ztensor_b);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&ztensor_out);
  assert(status == ZDNN_OK);

  free(data1);
  free(data2);
  free(data_out);
}
```

---

### Example of calling the zdnn_quantized_matmul_op API (normal)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021, 2024
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: Quantized Matmul
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * Quantized Matmul:
   *
   * INPUTS --------------------------------------------------------------
   * input           | ZDNN_3DS  | (s, m, n)
   * weights         | ZDNN_3DS  | (s, n, p)
   * input_biases    | ZDNN_2DS  | (s, p)
   *
   * OUTPUTS -------------------------------------------------------------
   * output          | ZDNN_3DS  | (s, m, p)
   ***********************************************************************/
  uint32_t s = 2;
  uint32_t m = 3;
  uint32_t n = 4;
  uint32_t p = 5;

  short int8_size = 1;  // size of each int8 element in bytes
  short float_size = 4; // size of each float element in bytes

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/
  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &input_pre_tfrmd_desc, s, m,
                                 n);
  status = zdnn_generate_quantized_transformed_desc(
      &input_pre_tfrmd_desc, QUANTIZED_INT8, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  float input_scale = 1.f;
  float input_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                                   &input_tfrmd_desc,
                                                   input_scale, input_offset,
                                                   &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size = s * m * n * float_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_quantized_ztensor(&input, false, INT8_MIN, INT8_MAX,
                                            input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create weights zTensor
   ***********************************************************************/
  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, INT8, &weights_pre_tfrmd_desc, s, n,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &weights_pre_tfrmd_desc, QUANTIZED_WEIGHTS_INT8, &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  float weights_scale = 1.f;
  float weights_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                                   &weights_tfrmd_desc,
                                                   weights_scale,
                                                   weights_offset, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size = s * n * p * int8_size;
  void *weights_data = malloc(weights_data_size);

  status = zdnn_transform_quantized_ztensor(&weights, false, INT8_MIN,
                                            INT8_MAX, weights_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensor
   ***********************************************************************/
  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, FP32, &biases_pre_tfrmd_desc, s, p);
  status = zdnn_generate_quantized_transformed_desc(
      &biases_pre_tfrmd_desc, QUANTIZED_INT8, &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  float biases_scale = 1.f;
  float biases_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                                   &biases_tfrmd_desc,
                                                   biases_scale, biases_offset,
                                                   &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = s * p * float_size;
  void *biases_data = malloc(biases_data_size);

  status = zdnn_transform_quantized_ztensor(&biases, false, INT8_MIN, INT8_MAX,
                                            biases_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/
  zdnn_tensor_desc output_pre_tfrmd_desc, output_tfrmd_desc;
  zdnn_ztensor output;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &output_pre_tfrmd_desc, s, m,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &output_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &output_tfrmd_desc);
  assert(status == ZDNN_OK);

  float output_scale = 1.f;
  float output_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(&output_pre_tfrmd_desc,
                                                   &output_tfrmd_desc,
                                                   output_scale, output_offset,
                                                   &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/
  status = zdnn_quantized_matmul_op(&input, &weights, &biases,
                                    MATMUL_OP_ADDITION, INT8_MIN, INT8_MAX,
                                    NULL, &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/
  uint64_t output_data_size = s * m * p * float_size;
  void *output_data = malloc(output_data_size);

  status = zdnn_transform_origtensor(&output, output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&output);
  assert(status == ZDNN_OK);

  free(input_data);
  free(weights_data);
  free(biases_data);
  free(output_data);
}
```

---

### Example of calling the zdnn_quantized_matmul_op API (on-the-fly)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021, 2024
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: Quantized Matmul on-the-fly
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * Quantized Matmul on-the-fly:
   *
   * INPUTS --------------------------------------------------------------
   * input           | ZDNN_3DS  | (s, m, n)
   * weights         | ZDNN_3DS  | (s, n, p)
   * input_biases    | ZDNN_2DS  | (s, p)
   *
   * OUTPUTS -------------------------------------------------------------
   * output          | ZDNN_3DS  | (s, m, p)
   ***********************************************************************/
  uint32_t s = 2;
  uint32_t m = 3;
  uint32_t n = 4;
  uint32_t p = 5;

  short int8_size = 1;  // size of each int8 element in bytes
  short float_size = 4; // size of each float element in bytes

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/
  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &input_pre_tfrmd_desc, s, m,
                                 n);
  status = zdnn_generate_quantized_transformed_desc(
      &input_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  float input_scale = 1.f;
  float input_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                                   &input_tfrmd_desc,
input_scale, input_offset, &input); assert(status == ZDNN_OK); uint64_t input_data_size = s * m * n * float_size; void *input_data = malloc(input_data_size); status = zdnn_transform_ztensor(&input, input_data); assert(status == ZDNN_OK); /*********************************************************************** * Create weights zTensor ***********************************************************************/ zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc; zdnn_ztensor weights; zdnn_init_pre_transformed_desc(ZDNN_3DS, INT8, &weights_pre_tfrmd_desc, s, n, p); status = zdnn_generate_quantized_transformed_desc( &weights_pre_tfrmd_desc, QUANTIZED_WEIGHTS_INT8, &weights_tfrmd_desc); assert(status == ZDNN_OK); float weights_scale = 1.f; float weights_offset = 0.f; status = zdnn_init_quantized_ztensor_with_malloc(&weights_pre_tfrmd_desc, &weights_tfrmd_desc, weights_scale, weights_offset, &weights); assert(status == ZDNN_OK); uint64_t weights_data_size = s * n * p * int8_size; void *weights_data = malloc(weights_data_size); status = zdnn_transform_quantized_ztensor(&weights, false, INT8_MIN, INT8_MAX, weights_data); assert(status == ZDNN_OK); /*********************************************************************** * Create biases zTensor ***********************************************************************/ zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc; zdnn_ztensor biases; zdnn_init_pre_transformed_desc(ZDNN_2DS, FP32, &biases_pre_tfrmd_desc, s, p); status = zdnn_generate_quantized_transformed_desc( &biases_pre_tfrmd_desc, QUANTIZED_INT8, &biases_tfrmd_desc); assert(status == ZDNN_OK); float biases_scale = 1.f; float biases_offset = 0.f; status = zdnn_init_quantized_ztensor_with_malloc(&biases_pre_tfrmd_desc, &biases_tfrmd_desc, biases_scale, biases_offset, &biases); assert(status == ZDNN_OK); uint64_t biases_data_size = s * p * float_size; void *biases_data = malloc(biases_data_size); status = zdnn_transform_quantized_ztensor(&biases, false, 
      INT8_MIN, INT8_MAX, biases_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/
  zdnn_tensor_desc output_pre_tfrmd_desc, output_tfrmd_desc;
  zdnn_ztensor output;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &output_pre_tfrmd_desc, s, m,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &output_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &output_tfrmd_desc);
  assert(status == ZDNN_OK);

  float output_scale = 1.f;
  float output_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &output_pre_tfrmd_desc, &output_tfrmd_desc, output_scale, output_offset,
      &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/
  status = zdnn_quantized_matmul_op(&input, &weights, &biases,
                                    MATMUL_OP_ADDITION, INT8_MIN, INT8_MAX,
                                    NULL, &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/
  uint64_t output_data_size = s * m * p * float_size;
  void *output_data = malloc(output_data_size);

  status = zdnn_transform_origtensor(&output, output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&output);
  assert(status == ZDNN_OK);

  free(input_data);
  free(weights_data);
  free(biases_data);
  free(output_data);
}
```

---

### Example of calling the zdnn_quantized_matmul_op with pre_computed=true API (normal)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021, 2024
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: Quantized Matmul Pre-Computed
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * Quantized Matmul Pre-Computed:
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (s, m, n)
   * weights         |  ZDNN_3DS  | (s, n, p)
   * input_biases    |  ZDNN_2DS  | (s, p)
   *
   * OUTPUTS -------------------------------------------------------------
   * output          |  ZDNN_3DS  | (s, m, p)
   ***********************************************************************/
  uint32_t s = 2;
  uint32_t m = 3;
  uint32_t n = 4;
  uint32_t p = 5;

  short int8_size = 1;  // size of each int8 element in bytes
  short float_size = 4; // size of each float element in bytes

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/
  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &input_pre_tfrmd_desc, s, m,
                                 n);
  status = zdnn_generate_quantized_transformed_desc(
      &input_pre_tfrmd_desc, QUANTIZED_INT8, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  float input_scale = 1.f;
  float input_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &input_pre_tfrmd_desc, &input_tfrmd_desc,
      input_scale, input_offset, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size = s * m * n * float_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_quantized_ztensor(&input, false, INT8_MIN, INT8_MAX,
                                            input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create weights zTensor
   ***********************************************************************/
  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, INT8, &weights_pre_tfrmd_desc, s, n,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &weights_pre_tfrmd_desc, QUANTIZED_WEIGHTS_INT8, &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  float weights_scale = 1.f;
  float weights_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &weights_pre_tfrmd_desc, &weights_tfrmd_desc, weights_scale,
      weights_offset, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size = s * n * p * int8_size;
  void *weights_data = malloc(weights_data_size);

  status = zdnn_transform_quantized_ztensor(&weights, false, INT8_MIN, INT8_MAX,
                                            weights_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create pre-computed biases zTensor
   ***********************************************************************/
  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, FP32, &biases_pre_tfrmd_desc, s, p);
  status = zdnn_generate_quantized_transformed_desc(
      &biases_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  float biases_scale = 1.f;
  float biases_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &biases_pre_tfrmd_desc, &biases_tfrmd_desc, biases_scale, biases_offset,
      &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = s * p * float_size;
  void *biases_data = malloc(biases_data_size);

  status =
      zdnn_transform_ztensor(&biases, biases_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/
  zdnn_tensor_desc output_pre_tfrmd_desc, output_tfrmd_desc;
  zdnn_ztensor output;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &output_pre_tfrmd_desc, s, m,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &output_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &output_tfrmd_desc);
  assert(status == ZDNN_OK);

  float output_scale = 1.f;
  float output_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &output_pre_tfrmd_desc, &output_tfrmd_desc, output_scale, output_offset,
      &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/
  status = zdnn_quantized_matmul_op(&input, &weights, &biases,
                                    MATMUL_OP_ADDITION, INT8_MIN, INT8_MAX,
                                    false, true, NULL, &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/
  uint64_t output_data_size = s * m * p * float_size;
  void *output_data = malloc(output_data_size);

  status = zdnn_transform_origtensor(&output, output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&output);
  assert(status == ZDNN_OK);

  free(input_data);
  free(weights_data);
  free(biases_data);
  free(output_data);
}
```

---

### Example of calling the zdnn_quantized_matmul_op with pre_computed=true API (on-the-fly)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021, 2024
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: Quantized Matmul Pre-Computed on-the-fly
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * Quantized Matmul Pre-Computed on-the-fly:
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (s, m, n)
   * weights         |  ZDNN_3DS  | (s, n, p)
   * input_biases    |  ZDNN_2DS  | (s, p)
   *
   * OUTPUTS -------------------------------------------------------------
   * output          |  ZDNN_3DS  | (s, m, p)
   ***********************************************************************/
  uint32_t s = 2;
  uint32_t m = 3;
  uint32_t n = 4;
  uint32_t p = 5;

  short int8_size = 1;  // size of each int8 element in bytes
  short float_size = 4; // size of each float element in bytes

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/
  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &input_pre_tfrmd_desc, s, m,
                                 n);
  status = zdnn_generate_quantized_transformed_desc(
      &input_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  float input_scale = 1.f;
  float input_offset = 0.f;

  status =
      zdnn_init_quantized_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                              &input_tfrmd_desc, input_scale,
                                              input_offset, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size = s * m * n * float_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create weights zTensor
   ***********************************************************************/
  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, INT8, &weights_pre_tfrmd_desc, s, n,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &weights_pre_tfrmd_desc, QUANTIZED_WEIGHTS_INT8, &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  float weights_scale = 1.f;
  float weights_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &weights_pre_tfrmd_desc, &weights_tfrmd_desc, weights_scale,
      weights_offset, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size = s * n * p * int8_size;
  void *weights_data = malloc(weights_data_size);

  status = zdnn_transform_quantized_ztensor(&weights, false, INT8_MIN, INT8_MAX,
                                            weights_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create pre-computed biases zTensor
   ***********************************************************************/
  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, FP32, &biases_pre_tfrmd_desc, s, p);
  status = zdnn_generate_quantized_transformed_desc(
      &biases_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  float biases_scale = 1.f;
  float biases_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &biases_pre_tfrmd_desc, &biases_tfrmd_desc, biases_scale, biases_offset,
      &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = s * p * float_size;
  void
      *biases_data = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/
  zdnn_tensor_desc output_pre_tfrmd_desc, output_tfrmd_desc;
  zdnn_ztensor output;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, FP32, &output_pre_tfrmd_desc, s, m,
                                 p);
  status = zdnn_generate_quantized_transformed_desc(
      &output_pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &output_tfrmd_desc);
  assert(status == ZDNN_OK);

  float output_scale = 1.f;
  float output_offset = 0.f;

  status = zdnn_init_quantized_ztensor_with_malloc(
      &output_pre_tfrmd_desc, &output_tfrmd_desc, output_scale, output_offset,
      &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/
  status = zdnn_quantized_matmul_op(&input, &weights, &biases,
                                    MATMUL_OP_ADDITION, INT8_MIN, INT8_MAX,
                                    false, true, NULL, &output);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/
  uint64_t output_data_size = s * m * p * float_size;
  void *output_data = malloc(output_data_size);

  status = zdnn_transform_origtensor(&output, output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&output);
  assert(status == ZDNN_OK);

  free(input_data);
  free(weights_data);
  free(biases_data);
  free(output_data);
}
```

---

### Example of an application calling the zdnn_lstm API (forward)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: LSTM
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * LSTM (FWD/BWD):
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (num_timesteps, num_batches, num_features)
   * h0              |  ZDNN_3DS  | (1, num_batches, num_hidden)
   * c0              |  ZDNN_3DS  | (1, num_batches, num_hidden)
   * weights         |  ZDNN_3DS  | (1, num_features, num_hidden)
   * biases          |  ZDNN_2DS  | (1, num_hidden)
   * hidden_weights  |  ZDNN_3DS  | (1, num_hidden, num_hidden)
   * hidden_biases   |  ZDNN_2DS  | (1, num_hidden)
   *
   * OUTPUTS -------------------------------------------------------------
   * hn_output       |  ZDNN_4DS  | (num_timesteps, 1, num_batches, num_hidden)
   *                 |            | or (1, 1, num_batches, num_hidden)
   * cf_output       |  ZDNN_4DS  | (1, 1, num_batches, num_hidden)
   ***********************************************************************/

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/
  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;
  uint32_t num_hidden = 5;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = FWD;
  uint8_t num_dirs = 1;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create initial hidden and cell state zTensors
   ***********************************************************************/
  zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
  zdnn_ztensor h0, c0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &c0);
  assert(status == ZDNN_OK);

  uint64_t h0c0_data_size = num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0c0_data_size);
  void *cell_state_data = malloc(h0c0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&c0, cell_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/
  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type,
      &weights_pre_tfrmd_desc, num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  uint64_t weights_data_size = num_features * num_hidden * element_size;
  void *weights_data_f = malloc(weights_data_size);
  void *weights_data_i = malloc(weights_data_size);
  void *weights_data_c = malloc(weights_data_size);
  void *weights_data_o = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
                                  weights_data_c, weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/
  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = num_hidden * element_size;
  void *biases_data_f = malloc(biases_data_size);
  void *biases_data_i = malloc(biases_data_size);
  void *biases_data_c = malloc(biases_data_size);
  void *biases_data_o = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
                                  biases_data_c, biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/
  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size = num_hidden * num_hidden * element_size;
  void *hidden_weights_data_f = malloc(hidden_weights_data_size);
  void *hidden_weights_data_i = malloc(hidden_weights_data_size);
  void *hidden_weights_data_c = malloc(hidden_weights_data_size);
  void *hidden_weights_data_o = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
                                  hidden_weights_data_i, hidden_weights_data_c,
                                  hidden_weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/
  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size = num_hidden * element_size;
  void *hidden_biases_data_f =
      malloc(hidden_biases_data_size);
  void *hidden_biases_data_i = malloc(hidden_biases_data_size);
  void *hidden_biases_data_c = malloc(hidden_biases_data_size);
  void *hidden_biases_data_o = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
                                  hidden_biases_data_i, hidden_biases_data_c,
                                  hidden_biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/
  // get only the last timestep, thus hn and cf can share descriptor
  zdnn_tensor_desc hncf_pre_tfrmd_desc, hncf_tfrmd_desc;
  zdnn_ztensor hn_output_ztensor, cf_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hncf_pre_tfrmd_desc, 1, 1,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&hncf_pre_tfrmd_desc, &hncf_tfrmd_desc,
                                         &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/
  void *work_area = NULL;

  status = zdnn_lstm(&input, &h0, &c0, &weights, &biases, &hidden_weights,
                     &hidden_biases, dir, work_area, &hn_output_ztensor,
                     &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/
  uint64_t hncf_data_size = num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hncf_data_size);
  void *cf_output_data = malloc(hncf_data_size);

  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);
  status =
      zdnn_transform_origtensor(&cf_output_ztensor, cf_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&c0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(cell_state_data);
  free(weights_data_f);
  free(weights_data_i);
  free(weights_data_c);
  free(weights_data_o);
  free(hidden_weights_data_f);
  free(hidden_weights_data_i);
  free(hidden_weights_data_c);
  free(hidden_weights_data_o);
  free(biases_data_f);
  free(biases_data_i);
  free(biases_data_c);
  free(biases_data_o);
  free(hidden_biases_data_f);
  free(hidden_biases_data_i);
  free(hidden_biases_data_c);
  free(hidden_biases_data_o);
  free(hn_output_data);
  free(cf_output_data);
}
```

---

### Example of an application calling the zdnn_lstm API (bi-directional)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: LSTM BI-DIR
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * LSTM (BI-DIR):
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (num_timesteps, num_batches, num_features)
   * h0              |  ZDNN_3DS  | (2, num_batches, num_hidden)
   * c0              |  ZDNN_3DS  | (2, num_batches, num_hidden)
   * weights         |  ZDNN_3DS  | (2, num_features, num_hidden)
   * biases          |  ZDNN_2DS  | (2, num_hidden)
   * hidden_weights  |  ZDNN_3DS  | (2, num_hidden, num_hidden)
   * hidden_biases   |  ZDNN_2DS  | (2, num_hidden)
   *
   * OUTPUTS -------------------------------------------------------------
   * hn_output       |  ZDNN_4DS  | (num_timesteps, 2, num_batches, num_hidden)
   *                 |            | or (1, 2, num_batches, num_hidden)
   * cf_output       |  ZDNN_4DS  | (1, 2, num_batches, num_hidden)
   ***********************************************************************/

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/
  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;
  uint32_t num_hidden = 5;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = BIDIR;
  uint8_t num_dirs = 2;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size = num_timesteps * num_batches *
                             num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create initial hidden and cell state zTensors
   ***********************************************************************/
  zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
  zdnn_ztensor h0, c0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &c0);
  assert(status == ZDNN_OK);

  // each buffer holds the state for all num_dirs directions
  uint64_t h0c0_data_size =
      num_dirs * num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0c0_data_size);
  void *cell_state_data = malloc(h0c0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&c0, cell_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/
  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  // each gate buffer holds the weights for all num_dirs directions
  uint64_t weights_data_size =
      num_dirs * num_features * num_hidden * element_size;
  void *weights_data_f =
      malloc(weights_data_size);
  void *weights_data_i = malloc(weights_data_size);
  void *weights_data_c = malloc(weights_data_size);
  void *weights_data_o = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
                                  weights_data_c, weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/
  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_LSTM | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  // each gate buffer holds the biases for all num_dirs directions
  uint64_t biases_data_size = num_dirs * num_hidden * element_size;
  void *biases_data_f = malloc(biases_data_size);
  void *biases_data_i = malloc(biases_data_size);
  void *biases_data_c = malloc(biases_data_size);
  void *biases_data_o = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
                                  biases_data_c, biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/
  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &hidden_weights_pre_tfrmd_desc,
                                 num_dirs, num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  // each gate buffer holds the hidden weights for all num_dirs directions
  uint64_t hidden_weights_data_size =
      num_dirs * num_hidden * num_hidden * element_size;
  void *hidden_weights_data_f = malloc(hidden_weights_data_size);
  void *hidden_weights_data_i = malloc(hidden_weights_data_size);
  void *hidden_weights_data_c = malloc(hidden_weights_data_size);
  void *hidden_weights_data_o = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
                                  hidden_weights_data_i, hidden_weights_data_c,
                                  hidden_weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/
  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  // each gate buffer holds the hidden biases for all num_dirs directions
  uint64_t hidden_biases_data_size = num_dirs * num_hidden * element_size;
  void *hidden_biases_data_f = malloc(hidden_biases_data_size);
  void *hidden_biases_data_i = malloc(hidden_biases_data_size);
  void *hidden_biases_data_c = malloc(hidden_biases_data_size);
  void *hidden_biases_data_o = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
                                  hidden_biases_data_i, hidden_biases_data_c,
                                  hidden_biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/

  zdnn_tensor_desc hn_pre_tfrmd_desc, hn_tfrmd_desc, cf_pre_tfrmd_desc,
      cf_tfrmd_desc;
  zdnn_ztensor hn_output_ztensor, cf_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc,
                                 num_timesteps, 2, num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc);
  assert(status == ZDNN_OK);

  // cf output is 4-dimensional: (1, 2, num_batches, num_hidden)
  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &cf_pre_tfrmd_desc, 1, 2,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc,
                                         &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/

  void *work_area = NULL;

  status = zdnn_lstm(&input, &h0, &c0, &weights, &biases, &hidden_weights,
                     &hidden_biases, dir, work_area, &hn_output_ztensor,
                     &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  uint64_t hn_data_size =
      num_timesteps * 2 * num_batches * num_hidden * element_size;
  uint64_t cf_data_size = 2 * num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hn_data_size);
  void *cf_output_data = malloc(cf_data_size);

  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_origtensor(&cf_output_ztensor, cf_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status =
      zdnn_free_ztensor_buffer(&c0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(cell_state_data);
  free(weights_data_f);
  free(weights_data_i);
  free(weights_data_c);
  free(weights_data_o);
  free(hidden_weights_data_f);
  free(hidden_weights_data_i);
  free(hidden_weights_data_c);
  free(hidden_weights_data_o);
  free(biases_data_f);
  free(biases_data_i);
  free(biases_data_c);
  free(biases_data_o);
  free(hidden_biases_data_f);
  free(hidden_biases_data_i);
  free(hidden_biases_data_c);
  free(hidden_biases_data_o);
  free(hn_output_data);
  free(cf_output_data);
}
```

---

### Example of an application calling the zdnn_lstm API

#### Example of an application calling the zdnn_lstm API (multi-layer bi-directional)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#include "zdnn.h"

void do_bidir_layer(zdnn_ztensor *input, uint32_t num_hidden,
                    zdnn_ztensor *hn_output, bool is_prev_layer_bidir) {

  zdnn_status status;

  uint32_t num_batches = input->pre_transformed_desc->dim2;

  // if input is bidir output from previous layer then number of features for
  // this layer is 2x of hidden-state size (dim1) of the previous layer
  uint32_t num_features =
      input->pre_transformed_desc->dim1 * (is_prev_layer_bidir ? 2 : 1);

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = BIDIR;
  uint8_t num_dirs = 2;

  /***********************************************************************
   * Create initial hidden and cell state zTensors
   ***********************************************************************/

  zdnn_tensor_desc h0c0_pre_tfrmd_desc, h0c0_tfrmd_desc;
  zdnn_ztensor h0, c0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0c0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status =
      zdnn_generate_transformed_desc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);
  status = zdnn_init_ztensor_with_malloc(&h0c0_pre_tfrmd_desc, &h0c0_tfrmd_desc,
                                         &c0);
  assert(status == ZDNN_OK);

  // host buffers must hold the full pre-transformed shape
  // (num_dirs, num_batches, num_hidden)
  uint64_t h0c0_data_size =
      (uint64_t)num_dirs * num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0c0_data_size);
  void *cell_state_data = malloc(h0c0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);
  status = zdnn_transform_ztensor(&c0, cell_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  // if using previous layer bidir output as input then number of features of
  // this layer is 2x of hidden-state size (dim1) of the previous layer
  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_WEIGHTS |
          (is_prev_layer_bidir ? PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status == ZDNN_OK);

  // (num_dirs, num_features, num_hidden) elements per gate buffer
  uint64_t weights_data_size =
      (uint64_t)num_dirs * num_features * num_hidden * element_size;
  void *weights_data_f = malloc(weights_data_size);
  void *weights_data_i = malloc(weights_data_size);
  void *weights_data_c = malloc(weights_data_size);
  void *weights_data_o = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_f, weights_data_i,
                                  weights_data_c, weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_BIASES |
          (is_prev_layer_bidir ?
                               PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  // (num_dirs, num_hidden) elements per gate buffer
  uint64_t biases_data_size = (uint64_t)num_dirs * num_hidden * element_size;
  void *biases_data_f = malloc(biases_data_size);
  void *biases_data_i = malloc(biases_data_size);
  void *biases_data_c = malloc(biases_data_size);
  void *biases_data_o = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_f, biases_data_i,
                                  biases_data_c, biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type,
                                 &hidden_weights_pre_tfrmd_desc, num_dirs,
                                 num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_WEIGHTS |
          (is_prev_layer_bidir ?
                               PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                         &hidden_weights_tfrmd_desc,
                                         &hidden_weights);
  assert(status == ZDNN_OK);

  // (num_dirs, num_hidden, num_hidden) elements per gate buffer
  uint64_t hidden_weights_data_size =
      (uint64_t)num_dirs * num_hidden * num_hidden * element_size;
  void *hidden_weights_data_f = malloc(hidden_weights_data_size);
  void *hidden_weights_data_i = malloc(hidden_weights_data_size);
  void *hidden_weights_data_c = malloc(hidden_weights_data_size);
  void *hidden_weights_data_o = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_f,
                                  hidden_weights_data_i, hidden_weights_data_c,
                                  hidden_weights_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_LSTM | USAGE_HIDDEN_BIASES |
          (is_prev_layer_bidir ?
                               PREV_LAYER_BIDIR : PREV_LAYER_UNI),
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  // (num_dirs, num_hidden) elements per gate buffer
  uint64_t hidden_biases_data_size =
      (uint64_t)num_dirs * num_hidden * element_size;
  void *hidden_biases_data_f = malloc(hidden_biases_data_size);
  void *hidden_biases_data_i = malloc(hidden_biases_data_size);
  void *hidden_biases_data_c = malloc(hidden_biases_data_size);
  void *hidden_biases_data_o = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_f,
                                  hidden_biases_data_i, hidden_biases_data_c,
                                  hidden_biases_data_o);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create cf output zTensor
   ***********************************************************************/

  zdnn_tensor_desc cf_pre_tfrmd_desc, cf_tfrmd_desc;
  zdnn_ztensor cf_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &cf_pre_tfrmd_desc, 1, 2,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&cf_pre_tfrmd_desc, &cf_tfrmd_desc,
                                         &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/

  void *work_area = NULL;

  status =
      zdnn_lstm(input, &h0, &c0, &weights, &biases, &hidden_weights,
                &hidden_biases, dir, work_area, hn_output, &cf_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Cleanup and Return
   ***********************************************************************/

  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&c0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status
         == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&cf_output_ztensor);
  assert(status == ZDNN_OK);

  free(hidden_state_data);
  free(cell_state_data);
  free(weights_data_f);
  free(weights_data_i);
  free(weights_data_c);
  free(weights_data_o);
  free(hidden_weights_data_f);
  free(hidden_weights_data_i);
  free(hidden_weights_data_c);
  free(hidden_weights_data_o);
  free(biases_data_f);
  free(biases_data_i);
  free(biases_data_c);
  free(biases_data_o);
  free(hidden_biases_data_f);
  free(hidden_biases_data_i);
  free(hidden_biases_data_c);
  free(hidden_biases_data_o);
}

// Sample: LSTM multi-layer BIDIR
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  uint32_t num_hidden[2] = {5, 4};

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/

  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create 2 hn output zTensors
   ***********************************************************************/

  zdnn_tensor_desc hn_pre_tfrmd_desc[2], hn_tfrmd_desc[2];
  zdnn_ztensor hn_output[2];

  for (int i = 0; i < 2; i++) {
    zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc[i],
                                   num_timesteps, 2, num_batches,
                                   num_hidden[i]);
    status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc[i],
                                            &hn_tfrmd_desc[i]);
    assert(status == ZDNN_OK);

    status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc[i],
                                           &hn_tfrmd_desc[i], &hn_output[i]);
    assert(status == ZDNN_OK);
  }

  /***********************************************************************
   * Do the layers
   ***********************************************************************/

  // call the first layer with input, previous layer bidir = false, output goes
  // to hn_output[0]
  do_bidir_layer(&input, num_hidden[0], &hn_output[0], false);

  // call the second layer with hn_output[0] from layer 1, previous layer bidir
  // = true, output goes to hn_output[1]
  do_bidir_layer(&hn_output[0], num_hidden[1], &hn_output[1], true);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  void *hn_output_data[2];

  for (int i = 0; i < 2; i++) {
    uint64_t hn_output_data_size = (uint64_t)num_timesteps * num_batches *
                                   num_hidden[i] * 2 * element_size;
    hn_output_data[i] = malloc(hn_output_data_size);

    status = zdnn_transform_origtensor(&hn_output[i], hn_output_data[i]);
    assert(status == ZDNN_OK);
  }

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output[0]);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output[1]);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hn_output_data[0]);
  free(hn_output_data[1]);
}
```

---

### Example of an application calling the zdnn_gru API

#### Example of an application calling the zdnn_gru API (forward)

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2021
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#include "zdnn.h"

// Sample: GRU
int main(int argc, char *argv[]) {
  zdnn_status status;

#ifdef STATIC_LIB
  zdnn_init();
#endif

  /***********************************************************************
   *
   * GRU (FWD/BWD):
   *
   * INPUTS --------------------------------------------------------------
   * input           |  ZDNN_3DS  | (num_timesteps, num_batches, num_features)
   * h0              |  ZDNN_3DS  | (1, num_batches, num_hidden)
   * weights         |  ZDNN_3DS  | (1, num_features, num_hidden)
   * input_biases    |  ZDNN_2DS  | (1, num_hidden)
   * hidden_weights  |  ZDNN_3DS  | (1, num_hidden, num_hidden)
   * hidden_biases   |  ZDNN_2DS  | (1, num_hidden)
   *
   * OUTPUTS -------------------------------------------------------------
   * hn_output       |  ZDNN_4DS  | (num_timesteps, 1, num_batches, num_hidden)
   *                 |            | or (1, 1, num_batches, num_hidden)
   ***********************************************************************/

  /***********************************************************************
   * Create input zTensor
   ***********************************************************************/

  zdnn_tensor_desc input_pre_tfrmd_desc, input_tfrmd_desc;
  zdnn_ztensor input;

  uint32_t num_timesteps = 5;
  uint32_t num_batches = 3;
  uint32_t num_features = 32;
  uint32_t num_hidden = 5;

  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes

  lstm_gru_direction dir = FWD;
  uint8_t num_dirs = 1;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &input_pre_tfrmd_desc,
                                 num_timesteps, num_batches, num_features);
  status =
      zdnn_generate_transformed_desc(&input_pre_tfrmd_desc, &input_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&input_pre_tfrmd_desc,
                                         &input_tfrmd_desc, &input);
  assert(status == ZDNN_OK);

  uint64_t input_data_size =
      num_timesteps * num_batches * num_features * element_size;
  void *input_data = malloc(input_data_size);

  status = zdnn_transform_ztensor(&input, input_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create initial hidden zTensor
   ***********************************************************************/

  zdnn_tensor_desc h0_pre_tfrmd_desc, h0_tfrmd_desc;
  zdnn_ztensor h0;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &h0_pre_tfrmd_desc, num_dirs,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&h0_pre_tfrmd_desc, &h0_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&h0_pre_tfrmd_desc, &h0_tfrmd_desc,
                                         &h0);
  assert(status == ZDNN_OK);

  uint64_t h0_data_size = num_batches * num_hidden * element_size;
  void *hidden_state_data = malloc(h0_data_size);

  status = zdnn_transform_ztensor(&h0, hidden_state_data);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create input weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc weights_pre_tfrmd_desc, weights_tfrmd_desc;
  zdnn_ztensor weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type, &weights_pre_tfrmd_desc,
                                 num_dirs, num_features, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &weights_pre_tfrmd_desc, RNN_TYPE_GRU | USAGE_WEIGHTS | PREV_LAYER_NONE,
      &weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&weights_pre_tfrmd_desc,
                                         &weights_tfrmd_desc, &weights);
  assert(status
         == ZDNN_OK);

  uint64_t weights_data_size = num_features * num_hidden * element_size;
  void *weights_data_z = malloc(weights_data_size);
  void *weights_data_r = malloc(weights_data_size);
  void *weights_data_h = malloc(weights_data_size);

  status = zdnn_transform_ztensor(&weights, weights_data_z, weights_data_r,
                                  weights_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc biases_pre_tfrmd_desc, biases_tfrmd_desc;
  zdnn_ztensor biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &biases_pre_tfrmd_desc, RNN_TYPE_GRU | USAGE_BIASES | PREV_LAYER_NONE,
      &biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&biases_pre_tfrmd_desc,
                                         &biases_tfrmd_desc, &biases);
  assert(status == ZDNN_OK);

  uint64_t biases_data_size = num_hidden * element_size;
  void *biases_data_z = malloc(biases_data_size);
  void *biases_data_r = malloc(biases_data_size);
  void *biases_data_h = malloc(biases_data_size);

  status = zdnn_transform_ztensor(&biases, biases_data_z, biases_data_r,
                                  biases_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden weights zTensor
   * Resultant zTensor is concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_weights_pre_tfrmd_desc, hidden_weights_tfrmd_desc;
  zdnn_ztensor hidden_weights;

  zdnn_init_pre_transformed_desc(ZDNN_3DS, type,
                                 &hidden_weights_pre_tfrmd_desc, num_dirs,
                                 num_hidden, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_weights_pre_tfrmd_desc,
      RNN_TYPE_GRU | USAGE_HIDDEN_WEIGHTS | PREV_LAYER_NONE,
      &hidden_weights_tfrmd_desc);
  assert(status == ZDNN_OK);

  status =
      zdnn_init_ztensor_with_malloc(&hidden_weights_pre_tfrmd_desc,
                                    &hidden_weights_tfrmd_desc,
                                    &hidden_weights);
  assert(status == ZDNN_OK);

  uint64_t hidden_weights_data_size = num_hidden * num_hidden * element_size;
  void *hidden_weights_data_z = malloc(hidden_weights_data_size);
  void *hidden_weights_data_r = malloc(hidden_weights_data_size);
  void *hidden_weights_data_h = malloc(hidden_weights_data_size);

  status = zdnn_transform_ztensor(&hidden_weights, hidden_weights_data_z,
                                  hidden_weights_data_r, hidden_weights_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create hidden biases zTensors
   * Resultant zTensors are concatenated
   ***********************************************************************/

  zdnn_tensor_desc hidden_biases_pre_tfrmd_desc, hidden_biases_tfrmd_desc;
  zdnn_ztensor hidden_biases;

  zdnn_init_pre_transformed_desc(ZDNN_2DS, type, &hidden_biases_pre_tfrmd_desc,
                                 num_dirs, num_hidden);
  status = zdnn_generate_transformed_desc_concatenated(
      &hidden_biases_pre_tfrmd_desc,
      RNN_TYPE_GRU | USAGE_HIDDEN_BIASES | PREV_LAYER_NONE,
      &hidden_biases_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(
      &hidden_biases_pre_tfrmd_desc, &hidden_biases_tfrmd_desc, &hidden_biases);
  assert(status == ZDNN_OK);

  uint64_t hidden_biases_data_size = num_hidden * element_size;
  void *hidden_biases_data_z = malloc(hidden_biases_data_size);
  void *hidden_biases_data_r = malloc(hidden_biases_data_size);
  void *hidden_biases_data_h = malloc(hidden_biases_data_size);

  status = zdnn_transform_ztensor(&hidden_biases, hidden_biases_data_z,
                                  hidden_biases_data_r, hidden_biases_data_h);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Create output zTensor
   ***********************************************************************/

  // get only the last timestep
  zdnn_tensor_desc hn_pre_tfrmd_desc, hn_tfrmd_desc;
  zdnn_ztensor hn_output_ztensor;

  zdnn_init_pre_transformed_desc(ZDNN_4DS, type, &hn_pre_tfrmd_desc, 1, 1,
                                 num_batches, num_hidden);
  status = zdnn_generate_transformed_desc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc);
  assert(status == ZDNN_OK);

  status = zdnn_init_ztensor_with_malloc(&hn_pre_tfrmd_desc, &hn_tfrmd_desc,
                                         &hn_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Call the zAIU
   ***********************************************************************/

  void *work_area = NULL;

  status = zdnn_gru(&input, &h0, &weights, &biases, &hidden_weights,
                    &hidden_biases, dir, work_area, &hn_output_ztensor);
  assert(status == ZDNN_OK);

  /***********************************************************************
   * Output and Cleanup
   ***********************************************************************/

  uint64_t hn_data_size = num_batches * num_hidden * element_size;
  void *hn_output_data = malloc(hn_data_size);

  status = zdnn_transform_origtensor(&hn_output_ztensor, hn_output_data);
  assert(status == ZDNN_OK);

  status = zdnn_free_ztensor_buffer(&input);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&h0);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_weights);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hidden_biases);
  assert(status == ZDNN_OK);
  status = zdnn_free_ztensor_buffer(&hn_output_ztensor);
  assert(status == ZDNN_OK);

  free(input_data);
  free(hidden_state_data);
  free(weights_data_z);
  free(weights_data_r);
  free(weights_data_h);
  free(hidden_weights_data_z);
  free(hidden_weights_data_r);
  free(hidden_weights_data_h);
  free(biases_data_z);
  free(biases_data_r);
  free(biases_data_h);
  free(hidden_biases_data_z);
  free(hidden_biases_data_r);
  free(hidden_biases_data_h);
  free(hn_output_data);
}
```

---

### Example of an application creating a quantized ztensor

[Back to Table of Contents](#TOC)

```C
// SPDX-License-Identifier: Apache-2.0
/*
 * Copyright IBM Corp. 2023
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#include "zdnn.h"

// ***************************************************************************
// Sample:
//
// Create a quantized zTensor
// ***************************************************************************
int main(int argc, char *argv[]) {
  zdnn_tensor_desc pre_tfrmd_desc, tfrmd_desc;
  zdnn_ztensor ztensor;
  zdnn_status status;

  uint32_t dim_n = 1, dim_h = 32, dim_w = 32, dim_c = 3;
  zdnn_data_types type = FP32;
  short element_size = 4; // size of each element in bytes
  uint64_t num_elements = dim_n * dim_h * dim_w * dim_c;

  // allocate tensor data storage
  void *data1 = malloc(num_elements * element_size);

  // read input_data

  // check status for zAIU availability, supported ops, etc. here
  // status = zdnn_query();

  // set input tensor data to 0 to 127 sequentially and repeat
  for (uint64_t i = 0; i < num_elements; i++) {
    ((float *)data1)[i] = (float)(i & 0x7f);
  }

  zdnn_init_pre_transformed_desc(ZDNN_NHWC, type, &pre_tfrmd_desc, dim_n, dim_h,
                                 dim_w, dim_c);

  float scale = 3;
  float offset = 2;

  // generate transformed shape information
  status = zdnn_generate_quantized_transformed_desc(
      &pre_tfrmd_desc, QUANTIZED_DLFLOAT16, &tfrmd_desc);
  assert(status == ZDNN_OK);

  // initialize the zTensor and allocate 4k-aligned storage via helper function
  status = zdnn_init_quantized_ztensor_with_malloc(&pre_tfrmd_desc, &tfrmd_desc,
                                                   scale, offset, &ztensor);
  assert(status == ZDNN_OK);

  // transform the feature tensor
  status = zdnn_transform_ztensor(&ztensor, data1);
  assert(status == ZDNN_OK);

  // free the zTensor buffer and the host data
  status = zdnn_free_ztensor_buffer(&ztensor);
  assert(status == ZDNN_OK);

  free(data1);
}
```