diff --git a/docs/zh/server/_toc.yaml b/docs/zh/server/_toc.yaml
index af3220972c8ed0b3fb8e7901aed8ef0f69bdea94..03925b5f2456314959e537bcaf42464e412ed824 100644
--- a/docs/zh/server/_toc.yaml
+++ b/docs/zh/server/_toc.yaml
@@ -78,6 +78,7 @@ sections:
             upstream: https://gitee.com/openeuler/compiler-docs/blob/openEuler-24.03-LTS-SP2/docs/zh/bisheng_autotuner/_toc.yaml
       - href: ./development/ai4c/_toc.yaml
       - href: ./development/fangtian/_toc.yaml
+      - href: ./development/annc/_toc.yaml
   - label: HA高可用
     sections:
       - href: ./high_availability/ha/_toc.yaml
diff --git a/docs/zh/server/development/annc/_toc.yaml b/docs/zh/server/development/annc/_toc.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..ab3b9e9edca68e0b23dd4cd6d4334b0fdbef1b1d
--- /dev/null
+++ b/docs/zh/server/development/annc/_toc.yaml
@@ -0,0 +1,6 @@
label: ANNC User Manual
isManual: true
description: ANNC (Accelerated Neural Network Compiler) is a compiler dedicated to accelerating neural network computation. It focuses on computation-graph optimization, generation and integration of high-performance fused operators, and efficient code generation and optimization to improve the inference performance of recommendation models and large models, and it supports integration with mainstream open-source inference frameworks.
sections:
  - label: ANNC User Manual
    href: ./annc_user_manual.md
diff --git a/docs/zh/server/development/annc/annc_user_manual.md b/docs/zh/server/development/annc/annc_user_manual.md
new file mode 100644
index 0000000000000000000000000000000000000000..f8d229c2d11b838a5f7ee2e1ca7064993a44caaa
--- /dev/null
+++ b/docs/zh/server/development/annc/annc_user_manual.md
@@ -0,0 +1,248 @@
# ANNC User Manual

## 1 Introduction to ANNC

ANNC (Accelerated Neural Network Compiler) is a compiler dedicated to accelerating neural network computation. It focuses on computation-graph optimization, generation and integration of high-performance fused operators, and efficient code generation and optimization to improve the inference performance of recommendation models and large models, and it supports integration with mainstream open-source inference frameworks.

## 2 Installing and Building ANNC

### 2.1 Installing ANNC Directly

On the latest openEuler release (24.03-LTS-SP2), the `ANNC` package can be installed directly:

```shell
yum install -y ANNC
```

To use a different version of the `ANNC` features, or to install `ANNC` on another OS release, rebuild `ANNC` by following the steps below.

### 2.2 Building and Installing the RPM Package (Recommended)

1. As the root user, install rpm-build and rpmdevtools:

    ```bash
    # Install rpm-build
    yum install dnf-plugins-core rpm-build
    # Install rpmdevtools
    yum install rpmdevtools
    ```

2. Generate the rpmbuild directory tree under the home directory `/root`:

    ```bash
    rpmdev-setuptree
    # Check the generated directory structure
    ls ~/rpmbuild/
    BUILD BUILDROOT RPMS SOURCES SPECS SRPMS
    ```

3. Run `git clone -b openEuler-24.03-LTS-SP2 https://gitee.com/src-openeuler/ANNC.git` to pull the code from the `openEuler-24.03-LTS-SP2` branch of the packaging repository, and copy the target files into the corresponding rpmbuild directories:

    ```shell
    cp ANNC/*.tar.gz* ~/rpmbuild/SOURCES/
    cp ANNC/*.patch ~/rpmbuild/SOURCES/
    cp ANNC/ANNC.spec ~/rpmbuild/SPECS/
    ```

4. Generate and install the `ANNC` RPM package:

    ```bash
    # Install the build dependencies of ANNC
    yum-builddep ~/rpmbuild/SPECS/ANNC.spec
    # Build the ANNC package
    # If a check-rpaths error is reported, prepend QA_RPATHS=0x0002 to the rpmbuild command, for example:
    # QA_RPATHS=0x0002 rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
    rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
    # Install the RPM package
    cd ~/rpmbuild/RPMS/<arch>
    rpm -ivh ANNC-<version>-<release>.<arch>.rpm
    ```

    Note: if an older version of the ANNC RPM is already installed and causes file conflicts, resolve them in either of the following ways:

    ```bash
    # Option 1: force-install the new version
    rpm -ivh ANNC-<version>-<release>.<arch>.rpm --force
    # Option 2: upgrade the installed package
    rpm -Uvh ANNC-<version>-<release>.<arch>.rpm
    ```
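After installation (via `yum` or the locally built RPM), a quick sanity check can confirm that the package contents are in place. The sketch below is only an illustration: it assumes the RPM ships the `libannc.so` library, the headers under `/usr/include/annc`, and the `annc-opt` tool that later sections of this manual rely on; adjust the patterns to the actual package layout.

```bash
# Illustrative check: confirm the package is installed and list the files this manual relies on.
rpm -q ANNC
rpm -ql ANNC | grep -E 'libannc\.so|include/annc|annc-opt'
```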
### 2.3 Building and Installing from Source

ANNC source repository: <https://gitee.com/openeuler/ANNC>

Make sure the following dependencies are installed:

```shell
yum install -y gcc gcc-c++ bzip2 python3-devel python3-numpy python3-setuptools python3-wheel libstdc++-static java-11-openjdk java-11-openjdk-devel make
```

Install Bazel: download the `bazel-6.5.0-dist.zip` archive and build it from source:

```bash
unzip bazel-6.5.0-dist.zip -d bazel-6.5.0
cd bazel-6.5.0
env EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" bash ./compile.sh

export PATH=/path/to/bazel-6.5.0/output:$PATH
bazel --version
```

Install ANNC: clone the source from the repository above and build it:

```bash
git clone https://gitee.com/openeuler/ANNC.git

cd ANNC

bazel --output_user_root=./output build -c opt \
    --copt="-DANNC_ENABLE_GRAPH_OPT" \
    --copt="-DANNC_ENABLE_OPENBLAS" \
    annc/service/cpu:libannc.so

cp bazel-bin/annc/service/cpu/libannc.so /usr/lib64
mkdir -p /usr/include/annc
cp annc/service/cpu/kdnn_rewriter.h /usr/include/annc
cd python
python3 setup.py bdist_wheel
python3 -m pip install dist/*.whl
```

## 3 Usage Workflow

**Note:**

* ANNC users must deploy tf-serving in advance; the ANNC compilation-optimization extension suite is integrated through build options and code patches.

### 3.1 Graph Fusion Combined with Handwritten Large Operators

Download the baseline models:

```bash
git clone https://gitee.com/openeuler/sra_benchmark.git
```

Obtain the following target recommendation models from the baseline model repository: **DeepFM, DFFM, DLRM, and W&D**.

Perform graph fusion from the command line:

```bash
# Install dependencies
python3 -m pip install tensorflow==2.15.1

# Run the model conversion, using the DeepFM model as an example
annc-opt -I /path/to/model_DeepFM/1730800001/1 -O deepfm_new/1 dnn_sparse linear_sparse
cp -r /path/to/model_DeepFM/1730800001/1/variables deepfm_new/1
```

A new model file `saved_model.pbtxt` should be generated under the output directory `deepfm_new/1`; search it for `KPFusedSparseEmbedding` to confirm that the graph-fusion operator was generated correctly.

Then register the open-source operator library provided by ANNC with tf-serving:

```bash
# Enter the tf-serving directory and create a folder for the custom operators
cd /path/to/serving
mkdir tensorflow_serving/custom_ops

# Copy the ANNC operators into this directory
cp /usr/include/annc/fused*.cc tensorflow_serving/custom_ops/
```

Create the operator build file `tensorflow_serving/custom_ops/BUILD` and add the following content to it:

```python
package(
    default_visibility = [
        "//visibility:public",
    ],
    licenses = ["notice"],
)

cc_library(
    name = "recom_embedding_ops",
    srcs = [
        "fused_sparse_embedding.cc",
        "fused_linear_embedding_with_hash_bucket.cc",
        "fused_dnn_embedding_with_hash_bucket.cc",
    ],
    alwayslink = 1,
    deps = [
        "@org_tensorflow//tensorflow/core:framework",
    ],
)
```

Open `tensorflow_serving/model_servers/BUILD`, search for `SUPPORTED_TENSORFLOW_OPS`, and add the following entry to that list to register the operators:

```python
"//tensorflow_serving/custom_ops:recom_embedding_ops",
```

After the operators have been registered, rebuild tf-serving with the following command; a successful build means the operators were registered successfully:

```bash
bazel --output_user_root=./output build -c opt --distdir=./proxy \
    tensorflow_serving/model_servers:tensorflow_model_server
```

### 3.2 Automatic Graph Fusion and Operator Fusion

Enter the tf-serving directory and rebuild tf-serving with the following option to enable ANNC:

```bash
bazel --output_user_root=./output build -c opt --distdir=./proxy \
    --copt=-DANNC_ENABLED_KDNN \
    tensorflow_serving/model_servers:tensorflow_model_server
```

Start tf-serving, specify the target model, and confirm that the service starts normally.
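For reference, the sketch below shows one way to launch the rebuilt model server against the converted model from section 3.1 and check its status over the REST API. The port, model name, and model path are illustrative assumptions, not values required by ANNC.

```bash
# Launch the rebuilt server with the converted DeepFM model (path, name, and port are examples).
./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --rest_api_port=8501 \
    --model_name=deepfm \
    --model_base_path=/path/to/deepfm_new &

# Query the model status; a loaded model should report state AVAILABLE.
curl http://localhost:8501/v1/models/deepfm
```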
### 3.3 Operator Memory and Layout Optimization

Enter the tf-serving directory, apply the related patches, and rebuild:

```bash
# Apply the related patches; they are located in the ANNC directory
export ANNC_PATH=/path/to/ANNC
cp $ANNC_PATH/tfserver/llvm/llvm.sh /path/to/serving/output/{id}/external/llvm-raw
cp $ANNC_PATH/tfserver/xla/xla.sh /path/to/serving/output/{id}/external/org_tensorflow/third_party/xla

cd /path/to/serving/output/{id}/external/llvm-raw
bash ./llvm.sh
cd /path/to/serving/output/{id}/external/org_tensorflow/third_party/xla
bash ./xla.sh

# Rebuild
bazel --output_user_root=./output build -c opt --distdir=./proxy \
    --copt=-DANNC_ENABLED_KDNN \
    tensorflow_serving/model_servers:tensorflow_model_server
```

Set the environment variable to enable the optimization features:

```bash
export XLA_FLAGS="--xla_cpu_use_xla_runtime=true --xla_cpu_enable_concat_optimization=true --xla_cpu_enable_output_tensor_reuse=true --xla_cpu_enable_mlir_tiling_and_fusion=true"
```

### 3.4 Operator Selection and Operator-Library Integration

Enter the tf-serving directory and rebuild:

```bash
cd ./output/{id}/external/org_tensorflow
patch -p1 < /usr/include/tensorflow.patch

# Enable the build options for MatMul operator dispatch and operator-library integration
cd /path/to/serving
bazel --output_user_root=./output build -c opt --distdir=./proxy \
    --copt=-DANNC_ENABLED_KDNN \
    --copt=-DDISABLE_TF_MATMUL_FUSION \
    tensorflow_serving/model_servers:tensorflow_model_server
```
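For the environment-variable-controlled optimizations in section 3.3 to take effect at run time, `XLA_FLAGS` must be present in the environment of the serving process. A minimal sketch, reusing the illustrative launch parameters from the earlier example:

```bash
# Export the optimization flags, then start the rebuilt server from the same shell
# so that the variable is inherited by the serving process.
export XLA_FLAGS="--xla_cpu_use_xla_runtime=true --xla_cpu_enable_concat_optimization=true --xla_cpu_enable_output_tensor_reuse=true --xla_cpu_enable_mlir_tiling_and_fusion=true"

./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --rest_api_port=8501 \
    --model_name=deepfm \
    --model_base_path=/path/to/deepfm_new
```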