diff --git "a/AICore Profiling\345\267\245\345\205\267\344\275\277\347\224\250\346\214\207\345\257\274\344\271\246.md" "b/AICore Profiling\345\267\245\345\205\267\344\275\277\347\224\250\346\214\207\345\257\274\344\271\246.md"
new file mode 100644
index 0000000000000000000000000000000000000000..75f01c8e3161afc4a7982e5a1f25dce9db92ecf4
--- /dev/null
+++ "b/AICore Profiling\345\267\245\345\205\267\344\275\277\347\224\250\346\214\207\345\257\274\344\271\246.md"
@@ -0,0 +1,67 @@
+## 1 Background
+(1) Performance analysis is a large part of model debugging and tuning, so a usable profiling tool is essential.
+(2) A profiling tool already existed before the 630 release, but it was frustrating to use. The revised version is a major improvement: both data collection and data parsing are much easier to use.
+
+## 2 Data Collection (Training Job)
+Other methods are not covered here. This section describes only how to enable profiling by setting environment variables:
+```
+export PROFILING_MODE=true
+export PROFILING_OPTIONS='{"output": "./cann_profiling", "training_trace": "on", "task_trace": "on", "aicpu": "on", "fp_point": "", "bp_point": "", "aic_metrics": "PipeUtilization"}'
+```
+Here, output specifies the directory where the collected data is stored, and the directory must be created in advance; aicpu is the switch for AI CPU operators; aic_metrics turns on AI Core operator collection and names the metric group to collect (PipeUtilization here). For the other parameters, see [CANN V100R020C10 Development Auxiliary Tools Guide (Training) 01](https://support.huawei.com/enterprise/zh/doc/EDOC1100164832/6f4033fd).
+
+After setting the environment variables, run the training job. Profiling data is then generated under ./cann_profiling:
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig1.png)
+
+
+## 3 Data Parsing
+### 3.1 Parsing Tool
+Parsing the data requires the msprof tool under the Ascend software package installation directory. The default paths are typically:
+```
+x86 path:
+/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/toolkit/tools/profiler/profiler_tool/analysis/msprof/msprof.py
+
+ARM path:
+/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/toolkit/tools/profiler/profiler_tool/analysis/msprof/msprof.py
+```
+The commands below use an x86_64 environment as the example.
+
+### 3.2 Parsing Profiling Data
+```
+python3.7 /usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/toolkit/tools/profiler/profiler_tool/analysis/msprof/msprof.py import -dir ./cann_profiling/
+```
+Here, ./cann_profiling is the directory where the data collected above was saved.
+Execution output:
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig2.png)
+
+After the command finishes, sqlite and other data directories are generated under cann_profiling:
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig3.png)
+
+### 3.3 Exporting Timeline Data
+To export timeline data, run the following command:
+```
+python3.7 /usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/toolkit/tools/profiler/profiler_tool/analysis/msprof/msprof.py export timeline -dir ./cann_profiling/
+```
+To select a specific iteration (step), set the parameter: -iteration-id
+After the command finishes, a timeline folder is generated:
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig4.png)
+
+The JSON files inside can be viewed with chrome://tracing:
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig5.png)
+
+### 3.4 Exporting Summary Data
+To export summary data, run the following command:
+```
+python3.7 /usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/toolkit/tools/profiler/profiler_tool/analysis/msprof/msprof.py export summary -dir ./cann_profiling/
+```
+
+To select a specific iteration (step), set the parameter: -iteration-id
+After the command finishes, a summary folder is generated. The CSV files inside are the summary data, showing operator names, operator execution order, and operator execution time.
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig6.png)
+
+CSV file:
+![](https://gitee.com/zwx5317131/ascend-pytorch-crowdintelligence-doc/raw/master/figures/aicore_profiling_fig7.png)
+
+## 4 Outlook
+(1) The tool can also profile system-level performance data, such as PCIe, DVPP, and HBM.
+(2) Currently, operator input shapes with their dtype and format, and output shapes with their dtype and format (the task_info information), are not generated in the PyTorch scenario, although they are in the TensorFlow scenario. This is because GE does not report this information in the op_based scenario. A feature request has already been filed.
diff --git a/figures/aicore_profiling_fig1.png b/figures/aicore_profiling_fig1.png
new file mode 100644
index 0000000000000000000000000000000000000000..cccb56eab432c1ac8c363bfc3d48e582c4f4d67c
Binary files /dev/null and b/figures/aicore_profiling_fig1.png differ
diff --git a/figures/aicore_profiling_fig2.png b/figures/aicore_profiling_fig2.png
new file mode 100644
index 0000000000000000000000000000000000000000..f983683ff7fd2e15d2081ecdc8ea2410508e625a
Binary files /dev/null and b/figures/aicore_profiling_fig2.png differ
diff --git a/figures/aicore_profiling_fig3.png b/figures/aicore_profiling_fig3.png
new file mode 100644
index 0000000000000000000000000000000000000000..5e4ae0c5b57d36b334e168e182fb5e19aabc3251
Binary files /dev/null and b/figures/aicore_profiling_fig3.png differ
diff --git a/figures/aicore_profiling_fig4.png b/figures/aicore_profiling_fig4.png
new file mode 100644
index 0000000000000000000000000000000000000000..2192c7959db28c963dc69bd7c623e683ae02b69f
Binary files /dev/null and b/figures/aicore_profiling_fig4.png differ
diff --git a/figures/aicore_profiling_fig5.png b/figures/aicore_profiling_fig5.png
new file mode 100644
index 0000000000000000000000000000000000000000..9470f58b866d7e75f1adcaf230e55b228174d90b
Binary files /dev/null and b/figures/aicore_profiling_fig5.png differ
diff --git a/figures/aicore_profiling_fig6.png b/figures/aicore_profiling_fig6.png
new file mode 100644
index 0000000000000000000000000000000000000000..65440a7b8763b801a0d7da66022b5785e8dfd7d2
Binary files /dev/null and b/figures/aicore_profiling_fig6.png differ
diff --git a/figures/aicore_profiling_fig7.png b/figures/aicore_profiling_fig7.png
new file mode 100644
index 0000000000000000000000000000000000000000..f9f9bae08ef6674dfbba1738c3cd797ea98a39ac
Binary files /dev/null and b/figures/aicore_profiling_fig7.png differ