diff --git a/.gitee/PULL_REQUEST_TEMPLATE.zh-CN.md b/.gitee/PULL_REQUEST_TEMPLATE.zh-CN.md index fc9e09f35030f71a8b23b5bc9fe86b120820b8bc..e9cc1deb82ff0498f1a8267cd288ecde798f308c 100644 --- a/.gitee/PULL_REQUEST_TEMPLATE.zh-CN.md +++ b/.gitee/PULL_REQUEST_TEMPLATE.zh-CN.md @@ -17,6 +17,11 @@ --- +## 3. 分支合并要求 +- [ ] **代码合并**(请确保将 master 分支的最新代码同步合并至 poc 分支及 pre-research 分支,同时保证 poc 分支的代码也已正确合并到 pre-research 分支。) + +--- + ## 3. 代码检视 - **要求:** - 合入代码超过 200 行,需三人以上会议检视。 diff --git a/.gitignore b/.gitignore index 2f15a00811101c8743f981fecb6976c7066fb941..2417a7f3477ee3d635fb09975cbe0473f2637031 100644 --- a/.gitignore +++ b/.gitignore @@ -2,7 +2,6 @@ __pycache__/ *.py[cod] *$py.class -.idea # C extensions *.so @@ -143,4 +142,7 @@ cython_debug/ att_advisor*.html *.xlsx operator_tuning_file*.cfg -.ipynb_checkpoints/ \ No newline at end of file +.ipynb_checkpoints/ + +# pycharm settings +.idea \ No newline at end of file diff --git a/.gitmodules b/.gitmodules deleted file mode 100644 index b08433f072bf89f62edf88b3aff40d24c1040ea8..0000000000000000000000000000000000000000 --- a/.gitmodules +++ /dev/null @@ -1,3 +0,0 @@ -[submodule "dynolog_npu/third_party/dynolog"] - path = dynolog_npu/third_party/dynolog - url = https://github.com/facebookincubator/dynolog.git diff --git a/OWNERS b/OWNERS index 415d737ed907c577bc61e71c2839a485395b899c..2e949debf181a6e75fdb5b1e1e091ce7a39c7e69 100644 --- a/OWNERS +++ b/OWNERS @@ -1,6 +1,7 @@ approvers: - leo920320 - wo-wenjie +- ma-dongfang - xhahn - aerfaliang - wangchao285 @@ -10,14 +11,16 @@ approvers: - ly-qianxiao - blian - kun_8 +- binghamhuang reviewers: - lv-kaimeng +- litian_drinksnow +- binghamhuang - wo-wenjie - ly-qianxiao - leo920320 - sunboquan +- stby - Seanesmhxocism - TAJh -- czr9775 -- kali20gakki -- wjchuee \ No newline at end of file +- czr9775 \ No newline at end of file diff --git a/README.md b/README.md index 5ae0bf742fced7ed86452d03d013670cc3528316..dd25d20158d7a42bec57efc931d3fad5e838a73b 100644 --- a/README.md +++ b/README.md @@ -1,75 +1,96 @@ -# 🚨 重要通知 +# 变更通知 -**1. Ascend Training Tools 更名为 MindStudio Training Tools (mstt)。** +原Ascend Training Tools工具更名为MindStudio Training Tools,MindStudio训练工具链。变更计划如下: -**2. 本代码仓 URL 变更为 [https://gitee.com/ascend/mstt](https://gitee.com/ascend/mstt),原 URL 仍然可用(2024.07.04 )。** +1. 2024.06.25本代码仓名称变更为mstt。 +2. 2024.07.04 URL变更为[https://gitee.com/ascend/mstt](https://gitee.com/ascend/mstt),原始URL仍然可用,但建议使用新URL。 ---- +# MindStudio Training Tools -# 🧰 MindStudio Training Tools +MindStudio Training Tools,MindStudio训练工具链。针对训练&大模型场景,提供端到端命令行&可视化调试调优工具,帮助用户快速提高模型开发效率。 -![Build Status](https://img.shields.io/badge/build-passing-brightgreen) -![Commit Activity](https://img.shields.io/badge/commit%20activity-high-red) -![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue) +## 模型训练迁移全流程 +![输入图片说明](debug/resources/model_training_migration_process.png) -## [分析迁移工具](https://gitee.com/ascend/mstt/wikis/工具介绍/分析迁移工具/分析迁移工具介绍) +## 使用说明 + +### [分析迁移工具](https://gitee.com/ascend/mstt/wikis/工具介绍/分析迁移工具/分析迁移工具介绍) 1. [脚本分析工具](https://gitee.com/ascend/mstt/wikis/%E5%B7%A5%E5%85%B7%E4%BB%8B%E7%BB%8D/%E5%88%86%E6%9E%90%E8%BF%81%E7%A7%BB%E5%B7%A5%E5%85%B7/%E5%88%86%E6%9E%90%E5%B7%A5%E5%85%B7%E4%BD%BF%E7%94%A8%E6%8C%87%E5%AF%BC) - 脚本分析工具可以帮助用户在执行迁移操作前,分析基于 GPU 平台的 PyTorch 训练脚本中算子、三方库套件、API 亲和性以及动态 shape 的支持情况。 + 脚本分析工具提供分析脚本,帮助用户在执行迁移操作前,分析基于GPU平台的PyTorch训练脚本中算子、三方库套件、亲和API分析以及动态shape的支持情况。 2. 
[(推荐)自动迁移工具](https://gitee.com/ascend/mstt/wikis/%E5%B7%A5%E5%85%B7%E4%BB%8B%E7%BB%8D/%E5%88%86%E6%9E%90%E8%BF%81%E7%A7%BB%E5%B7%A5%E5%85%B7/%E8%87%AA%E5%8A%A8%E8%BF%81%E7%A7%BB%E5%B7%A5%E5%85%B7%E4%BD%BF%E7%94%A8%E6%8C%87%E5%AF%BC) - 自动迁移工具只需在训练脚本中导入库代码即可完成模型脚本的迁移,使用方式简单,且修改内容少。 + 自动迁移只需在训练脚本中导入库代码即可完成模型脚本迁移,使用方式较简单,且修改内容最少。 3. [脚本迁移工具](https://gitee.com/ascend/mstt/wikis/%E5%B7%A5%E5%85%B7%E4%BB%8B%E7%BB%8D/%E5%88%86%E6%9E%90%E8%BF%81%E7%A7%BB%E5%B7%A5%E5%85%B7/%E8%84%9A%E6%9C%AC%E8%BF%81%E7%A7%BB%E5%B7%A5%E5%85%B7%E4%BD%BF%E7%94%A8%E6%8C%87%E5%AF%BC) - 脚本迁移工具通过后端命令行,将 GPU 上训练的 PyTorch 脚本迁移至 NPU 上,得到新的训练脚本用于训练。 + 脚本迁移工具提供后端命令行用于将GPU上训练的PyTorch脚本迁移至NPU上,得到新的训练脚本用于训练。 + +4. [训推一体权重转换工具](https://gitee.com/Ascend/mstt/wikis/%E5%B7%A5%E5%85%B7%E4%BB%8B%E7%BB%8D/%E5%88%86%E6%9E%90%E8%BF%81%E7%A7%BB%E5%B7%A5%E5%85%B7/%E8%AE%AD%E6%8E%A8%E4%B8%80%E4%BD%93%E6%9D%83%E9%87%8D%E8%BD%AC%E6%8D%A2%E5%B7%A5%E5%85%B7%E4%BD%BF%E7%94%A8%E6%8C%87%E5%AF%BC) + + 训推一体权重转换工具,支持在GPU和NPU上训练好的模型转成加速推理支持的格式。 -## [精度工具](./debug/accuracy_tools/) +### [精度工具](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools) -[MindStudio Probe(msprobe,MindStudio 精度调试工具)](./debug/accuracy_tools/msprobe)。 +1. [api_accuracy_checker(Ascend模型精度预检工具)](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/api_accuracy_checker) -## [性能工具](./profiler/msprof_analyze) + 在昇腾NPU上扫描用户训练模型中所有API,进行API复现,给出精度情况的诊断和分析。 -1. [compare_tools(性能比对工具)](./profiler/msprof_analyze/compare_tools) +2. [ptdbg_ascend(PyTorch精度工具)](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/ptdbg_ascend) - 提供 NPU 与 GPU 性能拆解功能以及算子、通信、内存性能的比对功能。 + 进行PyTorch整网API粒度的数据dump、精度比对和溢出检测,从而定位PyTorch训练场景下的精度问题。 -2. [cluster_analyse(集群分析工具)](./profiler/msprof_analyze/cluster_analyse) +### [性能工具](https://gitee.com/ascend/mstt/tree/master/profiler) - 提供多机多卡的集群分析能力(基于通信域的通信分析和迭代耗时分析), 当前需要配合 MindStudio Insight 的集群分析功能使用。 +1. [compare_tools(性能比对工具)](https://gitee.com/ascend/mstt/tree/master/profiler/compare_tools) -3. [advisor](./profiler/msprof_analyze/advisor) + 提供NPU与GPU性能拆解功能以及算子、通信、内存性能的比对功能。 - 将 Ascend PyTorch Profiler 或者 msprof 采集的 PyTorch 场景性能数据进行分析,并输出性能调优建议。 +2. [cluster_analyse(集群分析工具)](https://gitee.com/ascend/mstt/tree/master/profiler/cluster_analyse) -4. [bind_core](./profiler/affinity_cpu_bind) + 提供多机多卡的集群分析能力(基于通信域的通信分析和迭代耗时分析), 当前需要配合MindStudio Insight的集群分析功能使用。 - 绑核脚本,支持非侵入修改工程代码,实现一键式绑核功能。 +3. [affinity_cpu_bind (亲和性cpu绑核工具) ](https://gitee.com/ascend/mstt/tree/master/profiler/affinity_cpu_bind) -## [Tensorboard](./plugins/tensorboard-plugins/tb_plugin) + 提供亲和性CPU绑核能力,改善host_bound调度问题。 -Tensorboard 支持 NPU 性能数据可视化插件 PyTorch Profiler TensorBoard NPU Plugin。 +### [Tensorboard](https://gitee.com/ascend/mstt/tree/master/plugins/tensorboard-plugins/tb_plugin) -支持将 Ascend 平台采集、解析的 PyTorch Profiling 数据可视化呈现,也兼容 GPU 数据采集、解析可视化。 +Tensorboard支持NPU性能数据可视化插件PyTorch Profiler TensorBoard NPU Plugin。 + +支持将Ascend平台采集、解析的Pytorch Profiling数据可视化呈现,也兼容GPU数据采集、解析可视化。 ## 分支维护策略 -1. 
MindStudio Training Tools 工具版本分支的维护阶段如下: +MindStudio Training Tools工具版本分支的维护阶段如下: + +| **状态** | **时间** | **说明** | +| ------------------- | -------- | ------------------------------------------------ | +| 计划 | 1—3 个月 | 计划特性 | +| 开发 | 3个月 | 开发特性 | +| 维护 | 6—12个月 | 合入所有已解决的问题并发布版本 | +| 无维护 | 0—3 个月 | 合入所有已解决的问题,无专职维护人员,无版本发布 | +| 生命周期终止(EOL) | N/A | 分支不再接受任何修改 | + +## 现有分支的维护状态 + +MindStudio Training Tools分支版本号命名规则如下: + +mstt仓每年发布4个版本,每个版本都将对应一个分支;以v6.0为例,其将对应v6.0.RC1、v6.0.RC2、v6.0.RC3以及v6.0.0四个版本,在仓库中将存在与之对应的分支。 + +| **分支** | **状态** | **发布日期** | **后续状态** | **EOL日期** | +| ------------- | -------- | ------------ | ------------------------ | ----------- | +| **v6.0.0** | 维护 | 2023/12/12 | 预计2024/12/12起无维护 | | - | **状态** | **时间** | **说明** | - | ------------------- | -------- | ------------------------------------------------ | - | 计划 | 1—3 个月 | 计划特性 | - | 开发 | 3个月 | 开发特性 | - | 维护 | 6—12个月 | 合入所有已解决的问题并发布版本 | - | 无维护 | 0—3 个月 | 合入所有已解决的问题,无专职维护人员,无版本发布 | - | 生命周期终止(EOL) | N/A | 分支不再接受任何修改 | +## 参与贡献 -2. MindStudio Training Tools 分支版本号命名规则如下: +1. Fork 本仓库 +2. 新建 xxx 分支 +3. 提交代码 +4. 新建 Pull Request - mstt 仓每年发布 4 个版本,每个版本都将对应一个分支;以 v6.0 为例,其将对应 v6.0.RC1、v6.0.RC2、v6.0.RC3 以及 v6.0.0 四个版本,在仓库中将存在与之对应的分支。 +## 版本过渡提示 - | **分支** | **状态** | **发布日期** | **后续状态** | **EOL日期** | - | ------------- | -------- | ------------ | ------------------------ | ----------- | - | **v6.0.0** | 维护 | 2023.12.12 | 预计 2024.12.12 起无维护 | | +当前版本预检和ptdbg维护到2024/09/30,准备于2024/09/30下线,相关目录mstt/debug/accuracy_tools/api_accuracy_checker和mstt/debug/accuracy_tools/ptdbg_ascend将于2024/09/30删除。新版本的预检和ptdbg已经合到mstt/debug/accuracy_tools/atat目录下。 diff --git a/debug/OWNERS b/debug/OWNERS index 0bda9243569f0b6bcd0ce761d7817d512b487ddd..bd8412edd67a6e36a3fa50cb44bc0fae757020c3 100644 --- a/debug/OWNERS +++ b/debug/OWNERS @@ -4,6 +4,8 @@ approvers: - wangchao285 - kun_8 - brightlyking +- wqc01202410 +- shawnzhu1 reviewers: - lv-kaimeng - TAJh @@ -12,5 +14,3 @@ reviewers: - zhengxinqian - louyujing - yang_chen_2001_02_14 -- shawnzhu1 -- wqc01202410 diff --git a/debug/accuracy_tools/msprobe/README.md b/debug/accuracy_tools/msprobe/README.md index 0e68d1f8d9bdaba93a2f65220f85d08eb45f8586..6b7d483078a6a744ce935591ced0971dea2f5b2f 100644 --- a/debug/accuracy_tools/msprobe/README.md +++ b/debug/accuracy_tools/msprobe/README.md @@ -44,6 +44,7 @@ export MSPROBE_LOG_LEVEL={x} - msprobe支持AscendPyTorch 1.11.0或更高版本,支持的PyTorch和CANN以及PyTorch和python软件版本配套关系请参见《[Ascend Extension for PyTorch插件](https://gitee.com/ascend/pytorch)》。 - msprobe支持MindSpore 2.4.0或更高版本,支持的MindSpore和CANN以及MindSpore和python软件版本配套关系请参见《[MindSpore版本发布列表](https://www.mindspore.cn/versions)》。 +- msprobe支持MSAdapter 2.1.0。 - msprobe支持的固件驱动版本与配套CANN软件支持的固件驱动版本相同,开发者可通过“[昇腾社区-固件与驱动](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fhardware%2Ffirmware-drivers%2Fcommunity%3Fproduct%3D2%26model%3D28%26cann%3D8.0.RC3.alpha003%26driver%3D1.0.25.alpha)”页面根据产品型号与CANN软件版本获取配套的固件与驱动。 @@ -69,35 +70,37 @@ export MSPROBE_LOG_LEVEL={x} ### 1 数据采集 -msprobe 通过在训练脚本中添加 PrecisionDebugger 接口的方式对 API 执行精度数据 dump 操作,对应 config.json 中的 task 为 statistics 或 tensor。 +msprobe 通过在训练脚本中添加 PrecisionDebugger 接口的方式对 API 执行精度数据 dump 操作。对应 config.json 中的 "statistics" 或 "tensor" task。 [PyTorch 场景的数据采集](./docs/05.data_dump_PyTorch.md) [MindSpore 场景的数据采集](./docs/06.data_dump_MindSpore.md) +[MSAdapter 场景的数据采集](./docs/29.data_dump_MSAdapter.md) + ### 2 精度预检 -精度预检旨在昇腾 NPU 上扫描训练模型中的所有 API 进行 API 复现,给出精度情况的诊断和分析。对应 config.json 中的 task 为 run_ut。 +精度预检旨在昇腾 NPU 上扫描训练模型中的所有 API 进行 API 复现,给出精度情况的诊断和分析。对应 config.json 中的 "run_ut" 
task。 PyTorch 场景的[离线预检](./docs/07.accuracy_checker_PyTorch.md)和[在线预检](./docs/08.accuracy_checker_online_PyTorch.md) MindSpore 动态图场景的[离线预检](./docs/09.accuracy_checker_MindSpore.md) -### 3 精度比对 +### 3 分级可视化构图比对 -该功能进行 PyTorch 整网 API 粒度的数据 dump、精度比对,进而定位训练场景下的精度问题。 +该功能将msprobe工具dump的精度数据进行解析,还原模型图结构,实现模型各个层级的精度数据比对,方便用户理解模型结构、分析精度问题。 -[PyTorch 场景的精度比对](./docs/10.accuracy_compare_PyTorch.md) +[PyTorch 场景的分级可视化构图比对](./docs/21.visualization_PyTorch.md) -[MindSpore 场景的精度比对](./docs/11.accuracy_compare_MindSpore.md) +[MindSpore 场景的分级可视化构图比对](./docs/22.visualization_MindSpore.md) -### 4 溢出检测与解析 +### 4 精度比对 -溢出检测与解析是在执行精度数据 dump 时,判断是否存在输入正常但输出存在溢出的 API,从而判断是否为正常溢出。对应 config.json 中的 overflow_check。 +该功能进行 PyTorch 整网 API 粒度的数据 dump、精度比对,进而定位训练场景下的精度问题。 -[PyTorch 场景的溢出检测与解析](./docs/12.overflow_check_PyTorch.md) +[PyTorch 场景的精度比对](./docs/10.accuracy_compare_PyTorch.md) -[MindSpore 场景的溢出检测与解析](./docs/13.overflow_check_MindSpore.md) +[MindSpore 场景的精度比对](./docs/11.accuracy_compare_MindSpore.md) ### 5 数据解析 @@ -129,26 +132,28 @@ MindSpore 动态图场景的[离线预检](./docs/09.accuracy_checker_MindSpore. [兼容 PyTorch 和 MindSpore 框架的训练状态监控](./docs/19.monitor.md) -### 10 分级可视化构图比对 +### 10 单算子API自动生成脚本 -该功能将msprobe工具dump的精度数据进行解析,还原模型图结构,实现模型各个层级的精度数据比对,方便用户理解模型结构、分析精度问题。 +该功能将msprobe工具dump的精度数据进行解析,自动生成单API脚本,用于复现整网中出现的算子问题,降低用户复现问题的成本,供开发分析算子问题。 -[PyTorch 场景的分级可视化构图比对](./docs/21.visualization_PyTorch.md) +[PyTorch 单算子API自动生成脚本](./docs/23.generate_operator_PyTorch.md) -[MindSpore 场景的分级可视化构图比对](./docs/22.visualization_MindSpore.md) +### 11 数码关联 +该功能只支持 MindSpore 静态图场景,用于将IR图与dump数据进行关联,获取dump数据和代码调用栈的关联关系。 -### 11 单算子API自动生成脚本 +[MindSpore 场景的数码关联](./docs/24.code_mapping_Mindspore.md) -该功能将msprobe工具dump的精度数据进行解析,自动生成单API脚本,用于复现整网中出现的算子问题,降低用户复现问题的成本,供开发分析算子问题。 +### 12 溢出检测与解析 -[PyTorch 单算子API自动生成脚本](./docs/23.generate_operator_PyTorch.md) +溢出检测用于采集溢出 API 或 模块的精度数据,而溢出解析则是通过对溢出数据的分析,进一步判断是否为正常溢出。对应 config.json 中的 "overflow_check" task。 +推荐直接使用[数据采集](#1-数据采集)功能采集统计量信息,检测溢出问题。 -### 12 数码关联 +[PyTorch 场景的溢出检测与解析](./docs/12.overflow_check_PyTorch.md) -该功能只支持 MindSpore 静态图场景,用于将IR图与dump数据进行关联,获取dump数据和代码调用栈的关联关系。 +[MindSpore 场景的溢出检测](./docs/13.overflow_check_MindSpore.md) -[MindSpore 场景的数码关联](./docs/24.code_mapping_Mindspore.md) +[MSAdapter 场景的溢出检测](./docs/30.overflow_check_MSAdapter.md) ## 📑 补充材料 diff --git a/debug/accuracy_tools/msprobe/ccsrc/base/DebuggerConfig.hpp b/debug/accuracy_tools/msprobe/ccsrc/base/DebuggerConfig.hpp index 15ea9e6fda47c0380d9718f135a1baf0658788eb..d56191443f8e6a7819c2bfbf402a5937bacd92ff 100644 --- a/debug/accuracy_tools/msprobe/ccsrc/base/DebuggerConfig.hpp +++ b/debug/accuracy_tools/msprobe/ccsrc/base/DebuggerConfig.hpp @@ -199,7 +199,7 @@ public: OverflowCheckCfg() = default; ~OverflowCheckCfg() = default; - uint32_t overflowNums{1}; + int32_t overflowNums{1}; DebuggerOpCheckLevel checkMode{DebuggerOpCheckLevel::CHECK_LEVEL_ALL}; private: diff --git a/debug/accuracy_tools/msprobe/ccsrc/core/AclDumpDataProcessor.cpp b/debug/accuracy_tools/msprobe/ccsrc/core/AclDumpDataProcessor.cpp index 0fe3443fa1f9286fe77c710c955d543d94c4b3a4..d26b1a6a2c341e0a60f0bc71b021f64ab6da5a1b 100644 --- a/debug/accuracy_tools/msprobe/ccsrc/core/AclDumpDataProcessor.cpp +++ b/debug/accuracy_tools/msprobe/ccsrc/core/AclDumpDataProcessor.cpp @@ -56,23 +56,30 @@ constexpr const char* kStatsHeaderShape = "Shape"; constexpr const char* kStatsHeaderMax = "Max Value"; constexpr const char* kStatsHeaderMin = "Min Value"; constexpr const char* kStatsHeaderAvg = "Avg Value"; -constexpr const char* kStatsHeaderL2Norm = "L2 Norm Value"; 
+constexpr const char* kStatsHeaderL2Norm = "l2norm"; +constexpr const char* kStatsHeaderL2NormInCsv = "L2Norm Value"; constexpr const char* kStatsHeaderMD5 = "MD5 Value"; constexpr const char* kStatsHeaderNan = "Nan Count"; +constexpr const char* kStatsHeaderNanInCsv = "NaN Count"; constexpr const char* kStatsHeaderNegInf = "Negative Inf Count"; constexpr const char* kStatsHeaderPosInf = "Positive Inf Count"; constexpr const char* kRankId = "RANK_ID"; constexpr const char* kDigitalNumbers = "0123456789"; -static const std::map summaryOptionHeaderStrMap = { - {DebuggerSummaryOption::MAX, kStatsHeaderMax}, - {DebuggerSummaryOption::MIN, kStatsHeaderMin}, - {DebuggerSummaryOption::MEAN, kStatsHeaderAvg}, - {DebuggerSummaryOption::L2NORM, kStatsHeaderL2Norm}, - {DebuggerSummaryOption::NAN_CNT, kStatsHeaderNan}, - {DebuggerSummaryOption::NEG_INF_CNT, kStatsHeaderNegInf}, - {DebuggerSummaryOption::POS_INF_CNT, kStatsHeaderPosInf}, - {DebuggerSummaryOption::MD5, kStatsHeaderMD5}, +static const std::map> summaryOptionHeaderStrMap = { + {DebuggerSummaryOption::MAX, {kStatsHeaderMax, kStatsHeaderMax}}, + {DebuggerSummaryOption::MIN, {kStatsHeaderMin, kStatsHeaderMin}}, + {DebuggerSummaryOption::MEAN, {kStatsHeaderAvg, kStatsHeaderAvg}}, + {DebuggerSummaryOption::L2NORM, {kStatsHeaderL2Norm, kStatsHeaderL2NormInCsv}}, + {DebuggerSummaryOption::NAN_CNT, {kStatsHeaderNan, kStatsHeaderNanInCsv}}, + {DebuggerSummaryOption::NEG_INF_CNT, {kStatsHeaderNegInf, kStatsHeaderNegInf}}, + {DebuggerSummaryOption::POS_INF_CNT, {kStatsHeaderPosInf, kStatsHeaderPosInf}}, + {DebuggerSummaryOption::MD5, {kStatsHeaderMD5, kStatsHeaderMD5}}, +}; + +const static std::map kDtypeTransMap = { + {AclDtype::DT_BF16, AclDtype::DT_FLOAT}, + {AclDtype::DT_INT4, AclDtype::DT_INT8}, }; class AclTensorStats { @@ -170,7 +177,7 @@ static std::map ParseTensorSummaryHeaderOrder(c for (uint32_t pos = 0; pos < segs.size(); ++pos) { const std::string& opt = segs[pos]; for (auto it = summaryOptionHeaderStrMap.begin(); it != summaryOptionHeaderStrMap.end(); ++it) { - if (opt == it->second) { + if (opt == it->second.first) { ret[pos] = it->first; break; } @@ -233,7 +240,7 @@ std::string AclTensorStats::GetCsvHeader() const ret.append("Op Type,Op Name,Task ID,Stream ID,Timestamp,Input/Output,Slot,Data Size,Data Type,Format,Shape"); for (auto it = stats.begin(); it != stats.end(); it++) { ret.append(","); - ret.append(summaryOptionHeaderStrMap.at(it->first)); + ret.append(summaryOptionHeaderStrMap.at(it->first).second); } ret.append("\n"); @@ -603,7 +610,7 @@ static std::string GenDataPath(const std::string& path) { inline std::string GetTensorInfoSuffix(AclTensorInfo& tensor) { return "." + tensor.inout + "." + std::to_string(tensor.slot) + - "." + DataUtils::GetFormatString(tensor.hostFmt) + "." + DataUtils::GetDTypeString(tensor.dtype); + "." + DataUtils::GetFormatString(tensor.hostFmt) + "." 
+ DataUtils::GetDTypeString(tensor.oriDtype); } static DebuggerErrno DumpOneAclTensorFmtBin(AclTensorInfo& tensor) @@ -640,10 +647,13 @@ static DebuggerErrno DumpOneAclTensorFmtNpy(AclTensorInfo& tensor) return DebuggerErrno::OK; } - if (tensor.dtype == AclDtype::DT_BF16) { - ret = AclTensor::TransDtype(tensor, AclDtype::DT_FLOAT); + auto it = kDtypeTransMap.find(tensor.dtype); + if (it != kDtypeTransMap.end()) { + AclDtype dstDtype = it->second; + ret = AclTensor::TransDtype(tensor, dstDtype); if (ret != DebuggerErrno::OK) { - LOG_ERROR(ret, tensor + ": Failed to transform dtype from bf16 to fp32."); + LOG_ERROR(ret, tensor + ": Failed to transform dtype from " + DataUtils::GetDTypeString(it->first) + " to " + + DataUtils::GetDTypeString(it->second)+ "."); return ret; } } @@ -705,6 +715,7 @@ static DebuggerErrno WriteOneTensorStatToDisk(const AclTensorStats& stat) if (i >= retry) { LOG_ERROR(DebuggerErrno::ERROR_SYSCALL_FAILED, "Failed to occupy file " + dumpfile); + close(fd); return DebuggerErrno::ERROR_SYSCALL_FAILED; } @@ -736,7 +747,9 @@ static DebuggerErrno DumpOneAclTensor(AclTensorInfo& tensor, std::vector overflowNums; +} + +void AclDumper::CountOverflowNumbers(const acldumpChunk* chunk) +{ + if (IsOverflowCompleted() || !isOverflowDump || !chunk->isLastChunk) { + return; + } + const std::string fileName = chunk->fileName; + auto separator = fileName.rfind("/"); + auto fileBaseName = fileName.substr(separator + 1); + if (fileBaseName.rfind("Opdebug.Node_OpDebug.") == 0) { + // count according to the first file: Node_OpDebug + realOverflowNums++; + } + return; +} + std::string AclDumper::GetDumpPath(uint32_t curStep) const { if (!initialized || foreDumpPath.empty()) { @@ -357,6 +377,11 @@ DebuggerErrno AclDumper::Initialize() void AclDumper::OnAclDumpCallBack(const acldumpChunk* chunk, int32_t len) { DEBUG_FUNC_TRACE(); + CountOverflowNumbers(chunk); + if (IsOverflowCompleted()) { + return; + } + std::string dumpPath = FileUtils::GetAbsPath(chunk->fileName); auto it = dataProcessors.find(dumpPath); if (it == dataProcessors.end()) { @@ -424,6 +449,8 @@ void AclDumper::SetDump(uint32_t rank, uint32_t curStep, ExtArgs& args) ret = AclDumpGenStatJson(statisticsCfg, rank, curStep, kernels); } else if (overflowCheckCfg != nullptr) { ret = AclDumpGenOverflowJson(overflowCheckCfg, rank, curStep); + overflowNums = overflowCheckCfg->overflowNums; + isOverflowDump = true; } if (ret != DebuggerErrno::OK) { diff --git a/debug/accuracy_tools/msprobe/ccsrc/core/AclDumper.hpp b/debug/accuracy_tools/msprobe/ccsrc/core/AclDumper.hpp index dcfad5fafcabdf944e1d4b0b0a3cd77251ce047d..6985df65e166101c08501e5e206e003bda494b9a 100644 --- a/debug/accuracy_tools/msprobe/ccsrc/core/AclDumper.hpp +++ b/debug/accuracy_tools/msprobe/ccsrc/core/AclDumper.hpp @@ -58,11 +58,17 @@ private: uint32_t curStep, const char** kernels); DebuggerErrno AclDumpGenOverflowJson(std::shared_ptr overflowCfg, uint32_t rank, uint32_t curStep); + void CountOverflowNumbers(const acldumpChunk* chunk); + bool IsOverflowCompleted(); + bool initialized{false}; bool aclDumpHasSet{false}; std::string foreDumpPath; std::vector hostAnalysisOpt; std::map> dataProcessors; + bool isOverflowDump{false}; + int32_t overflowNums{1}; + int32_t realOverflowNums{0}; }; void KernelInitDump(); diff --git a/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.cpp b/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.cpp index 45adff4962156f87f52c17166bc3b381f07f2978..4a5ec4c555198015603d7cc1446be66fda05765d 100644 --- 
a/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.cpp +++ b/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.cpp @@ -291,7 +291,11 @@ static inline void AssertDim(const AclShape& shape, size_t dim) static inline void AssertConsis(const AclTensorInfo& tensor) { - if (EleNumOfTensor(tensor, false) * SizeOfAclDType(tensor) != tensor.dataSize) { + size_t tensor_size = EleNumOfTensor(tensor, false) * SizeOfAclDType(tensor); + // Processing dtype whose size < 1 + // The ele num of quantization type(qint4*2) in MindSpore must be even. + if (tensor.dtype == AclDtype::DT_INT4) tensor_size = EleNumOfTensor(tensor, false) / 2; + if (tensor_size != tensor.dataSize) { throw std::runtime_error(tensor + ": The internal data of Tensor is inconsistent."); } } @@ -343,7 +347,7 @@ AclTensorInfo ParseAttrsFromDumpData(const std::string& dumpPath, const uint8_t* } int32_t subFormat = tensor.sub_format(); - return AclTensorInfo{dumpPath, data, dtype, dFmt, hFmt, dShape, hShape, dataSize, subFormat, io, slot, dumpOriginData}; + return AclTensorInfo{dumpPath, data, dtype, dtype, dFmt, hFmt, dShape, hShape, dataSize, subFormat, io, slot, dumpOriginData}; } template AclTensorInfo ParseAttrsFromDumpData( @@ -763,34 +767,80 @@ static void TransBf16ToFp32(const uint8_t* input, size_t num, uint8_t* output, s } } -DebuggerErrno TransDtype(AclTensorInfo& tensor, AclDtype to) +static void TransInt4ToInt8(const uint8_t* input, size_t elemNums, uint8_t* output, size_t bufferSize) { + if (bufferSize < elemNums * sizeof(int8_t)) { + LOG_ERROR(DebuggerErrno::ERROR_BUFFER_OVERFLOW, "Insufficient space for converting data from int4 to int8."); + return; + } + const int8_t *srcData = reinterpret_cast(input); + int8_t *dstData = reinterpret_cast(output); + size_t inputLength = elemNums / 2; + int maxValue = 7; + int minValue = -8; + int signBitShift = 3; + int signBitMask = 0x08; + for (size_t i = 0; i < inputLength; ++i) { + int8_t s = *srcData; + int8_t t = s & 0xf; + // keep the sign bit not change + int8_t signBit = (t & signBitMask) >> signBitShift; + if (signBit == 1) { + t = t | 0xf0; + } else { + t = t & 0x0f; + } + if (t < minValue || t > maxValue) { + LOG_ERROR(DebuggerErrno::ERROR_INVALID_VALUE, "Invalid int4 value."); + } + *dstData = t; + ++dstData; + + int highByteShift = 4; + t = s >> highByteShift; + signBit = (t & signBitMask) >> signBitShift; + if (signBit == 1) { + t = t | 0xf0; + } else { + t = t & 0x0f; + } + if (t < minValue || t > maxValue) { + LOG_ERROR(DebuggerErrno::ERROR_INVALID_VALUE, "Invalid int4 value."); + } + *dstData = t; + ++dstData; + ++srcData; + } + return; +} - const static std::set> kSupportedDtypeTrans = { - {AclDtype::DT_BF16, AclDtype::DT_FLOAT}, - }; +DebuggerErrno TransDtype(AclTensorInfo& tensor, AclDtype to) +{ if (tensor.dtype == to) { return DebuggerErrno::OK; } - if (kSupportedDtypeTrans.find({tensor.dtype, to}) == kSupportedDtypeTrans.end()) { - return DebuggerErrno::ERROR_UNKNOWN_TRANS; - } - + tensor.oriDtype = tensor.dtype; std::vector buffer; AssertConsis(tensor); size_t bufferSize = EleNumOfTensor(tensor) * SizeOfAclDType(to); - buffer.reserve(bufferSize); + buffer.resize(bufferSize); const uint8_t* input = tensor.transBuf.empty() ? 
tensor.aclData : tensor.transBuf.data(); uint8_t* output = buffer.data(); - /* 目前仅支持bf16->fp32,若有通用转换需求再用更泛化的方式重写 */ if (tensor.dtype == AclDtype::DT_BF16 && to == AclDtype::DT_FLOAT) { TransBf16ToFp32(input, EleNumOfTensor(tensor), output, bufferSize); + } else if (tensor.dtype == AclDtype::DT_INT4 && to == AclDtype::DT_INT8) { + TransInt4ToInt8(input, EleNumOfTensor(tensor), output, bufferSize); + } else { + LOG_ERROR(DebuggerErrno::ERROR_UNKNOWN_TRANS, tensor + ": Trans " + DataUtils::GetDTypeString(tensor.dtype) + + " to " + DataUtils::GetDTypeString(to) + " is not supported."); + return DebuggerErrno::ERROR_UNKNOWN_TRANS; } tensor.transBuf = std::move(buffer); + tensor.dtype = to; return DebuggerErrno::OK; } diff --git a/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.hpp b/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.hpp index 8b5ba5b06d935d5aaa2dff35e921b9072db6aa1a..f2ac429a7f14370ea1721369c7f9089cb971bb6e 100644 --- a/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.hpp +++ b/debug/accuracy_tools/msprobe/ccsrc/core/AclTensor.hpp @@ -40,6 +40,7 @@ struct AclTensorInfo { std::string dumpPath; const uint8_t* aclData; AclDtype dtype; + AclDtype oriDtype; AclFormat deviceFmt; AclFormat hostFmt; AclShape deviceShape; @@ -52,7 +53,7 @@ struct AclTensorInfo { std::vector transBuf; std::string ToString() const { - return "AclTensor(path=" + dumpPath + ",dtype=" + std::to_string(dtype) + ",inout=" + inout + ")"; + return "AclTensor(path=" + dumpPath + ",dtype=" + DataUtils::GetDTypeString(dtype) + ",inout=" + inout + ")"; } }; @@ -71,6 +72,7 @@ AclTensorInfo ParseAttrsFromDumpData(const std::string &dumpPath, const uint8_t* const std::string& io, uint32_t slot); DebuggerErrno TransFormatD2H(AclTensorInfo& tensor); DebuggerErrno TransDtype(AclTensorInfo& tensor, AclDtype to); +bool IsDtypeSupportTrans(AclDtype dtype); } } diff --git a/debug/accuracy_tools/msprobe/config.json b/debug/accuracy_tools/msprobe/config.json index 553b7f9ee3b89215647b00fb14b70af44ea5f00c..9bf9579b80770210bdda668b782a41540e7cb763 100644 --- a/debug/accuracy_tools/msprobe/config.json +++ b/debug/accuracy_tools/msprobe/config.json @@ -25,7 +25,9 @@ "run_ut": { "white_list": [], "black_list": [], - "error_data_path": "./" + "error_data_path": "./", + "master_ip": "127.0.0.1", + "master_port": "8888" }, "grad_probe": { "grad_level": "L1", diff --git a/debug/accuracy_tools/msprobe/core/common/const.py b/debug/accuracy_tools/msprobe/core/common/const.py index d9623b807121ea129484a535fe8a9e2293e662f3..ff9f29e4d296c2fef4c1574749b98df72fdbb55f 100644 --- a/debug/accuracy_tools/msprobe/core/common/const.py +++ b/debug/accuracy_tools/msprobe/core/common/const.py @@ -27,6 +27,8 @@ class Const: ipv4_pattern = "([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])(\.([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])){3}$" SEP = "." + COLON = ":" + DOUBLE_SLASH = "//" REGEX_PREFIX_MAX_LENGTH = 20 REGEX_PREFIX_PATTERN = r"^[a-zA-Z0-9_-]+$" REGEX_FORWARD_BACKWARD = r'\.(forward|backward)\.' 
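The `TransInt4ToInt8` routine added in `AclTensor.cpp` above unpacks two signed 4-bit values from each byte, low nibble first, and sign-extends each one by testing the `0x08` sign bit. A minimal Python sketch of that same unpacking scheme (illustrative only, not part of this patch; `unpack_int4` is a hypothetical helper):

```python
def unpack_int4(packed: bytes) -> list:
    """Unpack two signed int4 values per byte, low nibble first."""
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, (byte >> 4) & 0x0F):
            # Sign-extend: nibbles 8..15 encode -8..-1 (sign bit is 0x08),
            # mirroring the `t | 0xf0` branch in the C++ above.
            out.append(nibble - 16 if nibble & 0x08 else nibble)
    return out

# 0xF1 packs low nibble 0x1 (-> 1) and high nibble 0xF (-> -1).
assert unpack_int4(bytes([0xF1])) == [1, -1]
```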
@@ -77,6 +79,8 @@ class Const: NUMPY_SUFFIX = ".npy" NUMPY_PATTERN = "*.npy" PT_SUFFIX = ".pt" + PY_SUFFIX = ".py" + INIT_PY = "init.py" ONE_GB = 1073741824 # 1 * 1024 * 1024 * 1024 TEN_GB = 10737418240 # 10 * 1024 * 1024 * 1024 ONE_MB = 1048576 # 1 * 1024 * 1024 @@ -163,6 +167,7 @@ class Const: LEFT_MOVE_INDEX = -1 RIGHT_MOVE_INDEX = 1 LAST_INDEX = -1 + MAX_TRAVERSAL_DEPTH = 5 TOP_LAYER = "TopLayer" CELL = "Cell" @@ -206,12 +211,15 @@ class Const: TORCH_FLOAT32 = "torch.float32" TORCH_BFLOAT16 = "torch.bfloat16" + TYPE = 'type' DTYPE = 'dtype' SHAPE = 'shape' MAX = 'Max' MIN = 'Min' MEAN = 'Mean' NORM = 'Norm' + DATA_NAME = 'data_name' + TENSOR_STAT_INDEX = 'tensor_stat_index' CODE_STACK = 'Code Stack' OP_NAME = 'Op Name' @@ -224,12 +232,105 @@ class Const: SCOPE_SEPARATOR = "/" REPLACEMENT_CHARACTER = "_" + FORWARD_PATTERN = SEP + FORWARD + SEP + BACKWARD_PATTERN = SEP + BACKWARD + SEP + OPTIMIZER = "optimizer" CLIP_GRAD = "clip_grad" END_PREFIX = "end_" TENSOR_STAT_LEN = 2 + SUPPORT_API_FILE_NAME = "support_wrap_ops.yaml" + + PT_API_TYPE_FUNCTIONAL = "functional" + PT_API_TYPE_TENSOR = "tensor" + PT_API_TYPE_TORCH = "torch" + PT_API_TYPE_VF = "_VF" + PT_API_TYPE_NPU = "torch_npu" + PT_API_TYPE_ATEN = "aten" + PT_API_TYPE_DIST = "distributed" + PT_API_TYPE_NPU_DIST = "npu_distributed" + PT_API_TYPE_MINDSPEED = "mindspeed" + + MS_API_TYPE_OPS = "ops" + MS_API_TYPE_TENSOR = "tensor" + MS_API_TYPE_STUB_TENSOR = "stubtensor" + MS_API_TYPE_MINT = "mint.ops" + MS_API_TYPE_MINT_FUNC = "mint.nn.functional" + MS_API_TYPE_COM = "communication.comm_func" + + FUNCTIONAL_API_TYPE_PREFIX = "Functional" + TENSOR_API_TYPE_PREFIX = "Tensor" + DIST_API_TYPE_PREFIX = "Distributed" + + TORCH_API_TYPE_PREFIX = "Torch" + NPU_API_TYPE_PREFIX = "NPU" + ATEN_API_TYPE_PREFIX = "Aten" + VF_API_TYPE_PREFIX = "VF" + MINDSPEED_API_TYPE_PREFIX = "MindSpeed" + + MINT_API_TYPE_PREFIX = "Mint" + MINT_FUNC_API_TYPE_PREFIX = "MintFunctional" + + SUPPORT_API_DICT_KEY_MAP = { + PT_FRAMEWORK: { + PT_API_TYPE_FUNCTIONAL: PT_API_TYPE_FUNCTIONAL, + PT_API_TYPE_TENSOR: PT_API_TYPE_TENSOR, + PT_API_TYPE_TORCH: PT_API_TYPE_TORCH, + PT_API_TYPE_VF: PT_API_TYPE_VF, + PT_API_TYPE_NPU: PT_API_TYPE_NPU, + PT_API_TYPE_ATEN: PT_API_TYPE_ATEN, + PT_API_TYPE_DIST: PT_API_TYPE_DIST, + PT_API_TYPE_NPU_DIST: PT_API_TYPE_NPU_DIST, + PT_API_TYPE_MINDSPEED: PT_API_TYPE_MINDSPEED + }, + MS_FRAMEWORK: { + MS_API_TYPE_OPS: MS_API_TYPE_OPS, + MS_API_TYPE_TENSOR: MS_API_TYPE_TENSOR, + MS_API_TYPE_STUB_TENSOR: MS_API_TYPE_TENSOR, + MS_API_TYPE_MINT: MS_API_TYPE_MINT, + MS_API_TYPE_MINT_FUNC: MS_API_TYPE_MINT_FUNC, + MS_API_TYPE_COM: MS_API_TYPE_COM + }, + MT_FRAMEWORK: { + PT_API_TYPE_FUNCTIONAL: PT_API_TYPE_FUNCTIONAL, + PT_API_TYPE_TENSOR: PT_API_TYPE_TENSOR, + PT_API_TYPE_TORCH: PT_API_TYPE_TORCH, + PT_API_TYPE_NPU: PT_API_TYPE_NPU, + PT_API_TYPE_DIST: PT_API_TYPE_DIST + } + } + + API_DATA_PREFIX = { + PT_FRAMEWORK: { + PT_API_TYPE_FUNCTIONAL: FUNCTIONAL_API_TYPE_PREFIX, + PT_API_TYPE_TENSOR: TENSOR_API_TYPE_PREFIX, + PT_API_TYPE_TORCH: TORCH_API_TYPE_PREFIX, + PT_API_TYPE_VF: VF_API_TYPE_PREFIX, + PT_API_TYPE_NPU: NPU_API_TYPE_PREFIX, + PT_API_TYPE_ATEN: ATEN_API_TYPE_PREFIX, + PT_API_TYPE_DIST: DIST_API_TYPE_PREFIX, + PT_API_TYPE_NPU_DIST: DIST_API_TYPE_PREFIX, + PT_API_TYPE_MINDSPEED: MINDSPEED_API_TYPE_PREFIX + }, + MS_FRAMEWORK: { + MS_API_TYPE_OPS: FUNCTIONAL_API_TYPE_PREFIX, + MS_API_TYPE_TENSOR: TENSOR_API_TYPE_PREFIX, + MS_API_TYPE_STUB_TENSOR: TENSOR_API_TYPE_PREFIX, + MS_API_TYPE_MINT: MINT_API_TYPE_PREFIX, + MS_API_TYPE_MINT_FUNC: 
MINT_FUNC_API_TYPE_PREFIX, + MS_API_TYPE_COM: DIST_API_TYPE_PREFIX + }, + MT_FRAMEWORK: { + PT_API_TYPE_FUNCTIONAL: FUNCTIONAL_API_TYPE_PREFIX, + PT_API_TYPE_TENSOR: TENSOR_API_TYPE_PREFIX, + PT_API_TYPE_TORCH: TORCH_API_TYPE_PREFIX, + PT_API_TYPE_NPU: NPU_API_TYPE_PREFIX, + PT_API_TYPE_DIST: DIST_API_TYPE_PREFIX + } + } + class CompareConst: """ @@ -256,6 +357,7 @@ class CompareConst: MEAN_DIFF = "Mean diff" NORM_DIFF = "L2norm diff" COSINE = "Cosine" + EUC_DIST = "EucDist" MAX_ABS_ERR = "MaxAbsErr" MAX_RELATIVE_ERR = "MaxRelativeErr" MIN_RELATIVE_ERR = "MinRelativeErr" @@ -330,8 +432,8 @@ class CompareConst: ULP_ERR_STATUS = "ulp_err_status" COMPARE_RESULT_HEADER = [ - NPU_NAME, BENCH_NAME, NPU_DTYPE, BENCH_DTYPE, NPU_SHAPE, BENCH_SHAPE, COSINE, MAX_ABS_ERR, MAX_RELATIVE_ERR, - ONE_THOUSANDTH_ERR_RATIO, FIVE_THOUSANDTHS_ERR_RATIO, + NPU_NAME, BENCH_NAME, NPU_DTYPE, BENCH_DTYPE, NPU_SHAPE, BENCH_SHAPE, COSINE, EUC_DIST, + MAX_ABS_ERR, MAX_RELATIVE_ERR, ONE_THOUSANDTH_ERR_RATIO, FIVE_THOUSANDTHS_ERR_RATIO, NPU_MAX, NPU_MIN, NPU_MEAN, NPU_NORM, BENCH_MAX, BENCH_MIN, BENCH_MEAN, BENCH_NORM, ACCURACY, ERROR_MESSAGE ] @@ -357,18 +459,16 @@ class CompareConst: Const.MD5: MD5_COMPARE_RESULT_HEADER } - ALL_COMPARE_INDEX = [COSINE, MAX_ABS_ERR, MAX_RELATIVE_ERR, ONE_THOUSANDTH_ERR_RATIO, FIVE_THOUSANDTHS_ERR_RATIO] + ALL_COMPARE_INDEX = [COSINE, EUC_DIST, MAX_ABS_ERR, MAX_RELATIVE_ERR, ONE_THOUSANDTH_ERR_RATIO, + FIVE_THOUSANDTHS_ERR_RATIO] SUMMARY_COMPARE_INDEX = [MAX_DIFF, MIN_DIFF, MEAN_DIFF, NORM_DIFF, MAX_RELATIVE_ERR, MIN_RELATIVE_ERR, MEAN_RELATIVE_ERR, NORM_RELATIVE_ERR] # dtype match - MS_TYPE = [ - [Const.FLOAT16, Const.FLOAT32], [Const.FLOAT32, Const.FLOAT16], - [Const.FLOAT16, Const.BFLOAT16], [Const.BFLOAT16, Const.FLOAT16] - ] - TORCH_TYPE = [ - [Const.TORCH_FLOAT16, Const.TORCH_FLOAT32], [Const.TORCH_FLOAT32, Const.TORCH_FLOAT16], - [Const.TORCH_FLOAT16, Const.TORCH_BFLOAT16], [Const.TORCH_BFLOAT16, Const.TORCH_FLOAT16] + + DTYPE_MATCH_GROUPS = [ + {Const.FLOAT16, Const.FLOAT32, Const.BFLOAT16}, + {Const.TORCH_FLOAT16, Const.TORCH_FLOAT32, Const.TORCH_BFLOAT16} ] # read_op @@ -467,7 +567,7 @@ class CompareConst: BENCH_MEAN: None, BENCH_NORM: None, ACCURACY: '', ERROR_MESSAGE: '' } MS_GRAPH_NPY = { - COSINE: None, MAX_ABS_ERR: None, MAX_RELATIVE_ERR: None, ONE_THOUSANDTH_ERR_RATIO: None, + COSINE: None, EUC_DIST: None, MAX_ABS_ERR: None, MAX_RELATIVE_ERR: None, ONE_THOUSANDTH_ERR_RATIO: None, FIVE_THOUSANDTHS_ERR_RATIO: None } MS_GRAPH_STATISTIC = { @@ -504,6 +604,7 @@ class FileCheckConst: XLSX_SUFFIX = ".xlsx" YAML_SUFFIX = ".yaml" IR_SUFFIX = ".ir" + ZIP_SUFFIX = ".zip" MAX_PKL_SIZE = 1073741824 # 1 * 1024 * 1024 * 1024 MAX_NUMPY_SIZE = 10737418240 # 10 * 1024 * 1024 * 1024 MAX_JSON_SIZE = 1073741824 # 1 * 1024 * 1024 * 1024 @@ -512,6 +613,8 @@ class FileCheckConst: MAX_XLSX_SIZE = 1073741824 # 1 * 1024 * 1024 * 1024 MAX_YAML_SIZE = 1073741824 # 1 * 1024 * 1024 * 1024 MAX_IR_SIZE = 1073741824 # 1 * 1024 * 1024 * 1024 + MAX_ZIP_SIZE = 10737418240 # 10 * 1024 * 1024 * 1024 + MAX_FILE_IN_ZIP_SIZE = 1073741824 # 1 * 1024 * 1024 * 1024 COMMOM_FILE_SIZE = 1048576 # 1 * 1024 * 1024 DIR = "dir" FILE = "file" @@ -525,7 +628,8 @@ class FileCheckConst: CSV_SUFFIX: MAX_CSV_SIZE, XLSX_SUFFIX: MAX_XLSX_SIZE, YAML_SUFFIX: MAX_YAML_SIZE, - IR_SUFFIX: MAX_IR_SIZE + IR_SUFFIX: MAX_IR_SIZE, + ZIP_SUFFIX: MAX_ZIP_SIZE } CSV_BLACK_LIST = r'^[+-=%@\+\-=%@]|;[+-=%@\+\-=%@]' @@ -538,61 +642,6 @@ class OverflowConst: OVERFLOW_DEBUG_MODE = 1 -class MsCompareConst: - # api_info field - MINT = "Mint" - 
MINT_FUNCTIONAL = "MintFunctional" - TENSOR_API = "Tensor" - - API_NAME_STR_LENGTH = 4 - MAX_RECURSION_DEPTH = 20 - - # Mindtorch api_info field - MINDTORCH_TENSOR = "Tensor" - MINDTORCH = "Torch" - MINDTORCH_FUNC = "Functional" - MINDTORCH_NPU = "NPU" - MINDTORCH_DIST = "Distributed" - - - - MT_VALID_API_TYPES = [ - MINDTORCH, MINDTORCH_FUNC, MINDTORCH_TENSOR - ] - - TASK_FIELD = "task" - STATISTICS_TASK = "statistics" - FRAMEWORK = "framework" - TENSOR_TASK = "tensor" - DUMP_DATA_DIR_FIELD = "dump_data_dir" - DATA_FIELD = "data" - - # supported api yaml - SUPPORTED_API_LIST_FILE = "checker_support_api.yaml" - SUPPORTED_TENSOR_LIST_KEY = "tensor" - - # detail_csv - DETAIL_CSV_API_NAME = "API Name" - DETAIL_CSV_BENCH_DTYPE = "Bench Dtype" - DETAIL_CSV_TESTED_DTYPE = "Tested Dtype" - DETAIL_CSV_SHAPE = "Shape" - DETAIL_CSV_PASS_STATUS = "Status" - DETAIL_CSV_MESSAGE = "Message" - DETAIL_CSV_FILE_NAME = "accuracy_checking_details" - - # result_csv - RESULT_CSV_FORWARD_TEST_SUCCESS = "Forward Test Success" - RESULT_CSV_BACKWARD_TEST_SUCCESS = "Backward Test Success" - RESULT_CSV_FILE_NAME = "accuracy_checking_result" - - EPSILON = 1e-8 - - class ProcessStatus: - SUCCESS = "success" - API_NOT_FOUND = "api_not_found" - EXCEPTION_SKIP = "exception_skip" - - class MsgConst: """ Class for log messages const @@ -674,3 +723,65 @@ class MonitorConst: CSV = "csv" API = "api" HEADER_NAME = 'name' + + +class DistributedCheckConst: + API_FULL_NAME = "api_full_name" + API_NAME = "api_name" + GROUP = "group" + GROUP_RANKS = "group_ranks" + GROUP_INDEX = "group_index" + SRC = "src" + SRC_INDEX = "src_index" + OP = "op" + SCATTER_LIST = "scatter_list" + TORCH_PROCESS_GROUP = "torch.ProcessGroup" + ALL_ARGS = "all_args" + ALL_KWARGS = "all_kwargs" + RESULT_FILE_PATH = "result_file_path" + BENCHMARK_RESULT = "benchmark_result" + MASTER_IP = "master_ip" + MASTER_PORT = "master_port" + WORLD_SIZE = "world_size" + HCCL = "hccl" + TCP = "tcp" + BROADCAST = "broadcast" + REDUCE = "reduce" + ALL_REDUCE = "all_reduce" + SCATTER = "scatter" + GATHER = "gather" + ALL_GATHER = "all_gather" + ALL_TO_ALL = "all_to_all" + ALL_TO_ALL_SINGLE = "all_to_all_single" + BROADCAST_SRC_INDEX = 1 + FIRST_TENSOR_INDEX = 0 + MAX_CUMSUM_CHECK_NUM = 1000 + + REDOPTYPE_SUM = "RedOpType.SUM" + REDOPTYPE_PRODUCT = "RedOpType.PRODUCT" + REDOPTYPE_MIN = "RedOpType.MIN" + REDOPTYPE_MAX = "RedOpType.MAX" + REDOPTYPE_BAND = "RedOpType.BAND" + REDOPTYPE_BOR = "RedOpType.BOR" + REDOPTYPE_BXOR = "RedOpType.BXOR" + + API_ARGS_INDEX = { + "broadcast": { + "group": 2, + "src": 1 + }, + "reduce": { + "op": 2, + "dst": 1 + }, + "all_reduce": { + "reduce_op": 2 + }, + "scatter": { + "src": 2, + "scatter_list": 1 + }, + "gather": { + "dst": 2 + } + } diff --git a/debug/accuracy_tools/msprobe/core/common/exceptions.py b/debug/accuracy_tools/msprobe/core/common/exceptions.py index d71d30224b677fb19361f62de0ee25b2d32d389f..252860aee756c336ec7265207ead36a720780a98 100644 --- a/debug/accuracy_tools/msprobe/core/common/exceptions.py +++ b/debug/accuracy_tools/msprobe/core/common/exceptions.py @@ -28,12 +28,14 @@ class MsprobeException(CodedException): OVERFLOW_NUMS_ERROR = 1 RECURSION_LIMIT_ERROR = 2 INTERFACE_USAGE_ERROR = 3 + UNSUPPORTED_TYPE_ERROR = 4 err_strs = { INVALID_PARAM_ERROR: "[msprobe] 无效参数:", OVERFLOW_NUMS_ERROR: "[msprobe] 超过预设溢出次数 当前溢出次数:", RECURSION_LIMIT_ERROR: "[msprobe] 递归调用超过限制:", - INTERFACE_USAGE_ERROR: "[msprobe] Invalid interface usage: " + INTERFACE_USAGE_ERROR: "[msprobe] Invalid interface usage: ", + UNSUPPORTED_TYPE_ERROR: 
"[msprobe] Unsupported type: " } diff --git a/debug/accuracy_tools/msprobe/core/common/file_utils.py b/debug/accuracy_tools/msprobe/core/common/file_utils.py index fdc626ca6a1a90e9060cefa237f9d5d8d7e42844..38eb9cd3dafdff5c83015fce91a9c3bf02a6757a 100644 --- a/debug/accuracy_tools/msprobe/core/common/file_utils.py +++ b/debug/accuracy_tools/msprobe/core/common/file_utils.py @@ -20,6 +20,9 @@ import stat import json import re import shutil +import sys +import zipfile +import multiprocessing from datetime import datetime, timezone from dateutil import parser import yaml @@ -30,6 +33,8 @@ from msprobe.core.common.log import logger from msprobe.core.common.exceptions import FileCheckException from msprobe.core.common.const import FileCheckConst +proc_lock = multiprocessing.Lock() + class FileChecker: """ @@ -671,3 +676,121 @@ def read_xlsx(file_path): logger.error(f"The xlsx file failed to load. Please check the path: {file_path}.") raise RuntimeError(f"Read xlsx file {file_path} failed.") from e return result_df + + +def create_file_with_list(result_list, filepath): + check_path_before_create(filepath) + filepath = os.path.realpath(filepath) + try: + with FileOpen(filepath, 'w', encoding='utf-8') as file: + fcntl.flock(file, fcntl.LOCK_EX) + for item in result_list: + file.write(item + '\n') + fcntl.flock(file, fcntl.LOCK_UN) + except Exception as e: + logger.error(f'Save list to file "{os.path.basename(filepath)}" failed.') + raise RuntimeError(f"Save list to file {os.path.basename(filepath)} failed.") from e + change_mode(filepath, FileCheckConst.DATA_FILE_AUTHORITY) + + +def create_file_with_content(data, filepath): + check_path_before_create(filepath) + filepath = os.path.realpath(filepath) + try: + with FileOpen(filepath, 'w', encoding='utf-8') as file: + fcntl.flock(file, fcntl.LOCK_EX) + file.write(data) + fcntl.flock(file, fcntl.LOCK_UN) + except Exception as e: + logger.error(f'Save content to file "{os.path.basename(filepath)}" failed.') + raise RuntimeError(f"Save content to file {os.path.basename(filepath)} failed.") from e + change_mode(filepath, FileCheckConst.DATA_FILE_AUTHORITY) + + +def add_file_to_zip(zip_file_path, file_path, arc_path=None): + """ + Add a file to a ZIP archive, if zip does not exist, create one. + + :param zip_file_path: Path to the ZIP archive + :param file_path: Path to the file to add + :param arc_path: Optional path inside the ZIP archive where the file should be added + """ + check_file_suffix(zip_file_path, FileCheckConst.ZIP_SUFFIX) + check_file_size(file_path, FileCheckConst.MAX_FILE_IN_ZIP_SIZE) + zip_size = os.path.getsize(zip_file_path) if os.path.exists(zip_file_path) else 0 + if zip_size + os.path.getsize(file_path) > FileCheckConst.MAX_ZIP_SIZE: + raise RuntimeError(f"ZIP file size exceeds the limit of {FileCheckConst.MAX_ZIP_SIZE} bytes") + check_path_before_create(zip_file_path) + try: + proc_lock.acquire() + with zipfile.ZipFile(zip_file_path, 'a') as zip_file: + zip_file.write(file_path, arc_path) + except Exception as e: + logger.error(f'add file to zip "{os.path.basename(zip_file_path)}" failed.') + raise RuntimeError(f"add file to zip {os.path.basename(zip_file_path)} failed.") from e + finally: + proc_lock.release() + change_mode(zip_file_path, FileCheckConst.DATA_FILE_AUTHORITY) + + +def create_file_in_zip(zip_file_path, file_name, content): + """ + Create a file with content inside a ZIP archive. 
+ + :param zip_file_path: Path to the ZIP archive + :param file_name: Name of the file to create + :param content: Content to write to the file + """ + check_file_suffix(zip_file_path, FileCheckConst.ZIP_SUFFIX) + check_path_before_create(zip_file_path) + zip_size = os.path.getsize(zip_file_path) if os.path.exists(zip_file_path) else 0 + if zip_size + sys.getsizeof(content) > FileCheckConst.MAX_ZIP_SIZE: + raise RuntimeError(f"ZIP file size exceeds the limit of {FileCheckConst.MAX_ZIP_SIZE} bytes") + try: + proc_lock.acquire() + with zipfile.ZipFile(zip_file_path, 'a') as zip_file: + zip_info = zipfile.ZipInfo(file_name) + zip_info.compress_type = zipfile.ZIP_DEFLATED + zip_file.writestr(zip_info, content) + except Exception as e: + logger.error(f'Save content to file "{os.path.basename(zip_file_path)}" failed.') + raise RuntimeError(f"Save content to file {os.path.basename(zip_file_path)} failed.") from e + finally: + proc_lock.release() + change_mode(zip_file_path, FileCheckConst.DATA_FILE_AUTHORITY) + + +def extract_zip(zip_file_path, extract_dir): + """ + Extract the contents of a ZIP archive to a specified directory. + + :param zip_file_path: Path to the ZIP archive + :param extract_dir: Directory to extract the contents to + """ + check_file_suffix(zip_file_path, FileCheckConst.ZIP_SUFFIX) + try: + proc_lock.acquire() + with zipfile.ZipFile(zip_file_path, 'r') as zip_file: + total_size = 0 + if len(zip_file.infolist()) > FileCheckConst.MAX_FILE_IN_ZIP_SIZE: + raise ValueError(f"Too many files in {os.path.basename(zip_file_path)}") + for file_info in zip_file.infolist(): + if file_info.file_size > FileCheckConst.MAX_FILE_IN_ZIP_SIZE: + raise ValueError(f"File {file_info.filename} is too large to extract") + + total_size += file_info.file_size + if total_size > FileCheckConst.MAX_ZIP_SIZE: + raise ValueError(f"Total extracted size exceeds the limit of {FileCheckConst.MAX_ZIP_SIZE} bytes") + except Exception as e: + logger.error(f'Save content to file "{os.path.basename(zip_file_path)}" failed.') + raise RuntimeError(f"Save content to file {os.path.basename(zip_file_path)} failed.") from e + finally: + proc_lock.release() + with zipfile.ZipFile(zip_file_path, 'r') as zip_file: + zip_file.extractall(extract_dir) + + +def split_zip_file_path(zip_file_path): + check_file_suffix(zip_file_path, FileCheckConst.ZIP_SUFFIX) + zip_file_path = os.path.realpath(zip_file_path) + return os.path.dirname(zip_file_path), os.path.basename(zip_file_path) diff --git a/debug/accuracy_tools/msprobe/core/common/utils.py b/debug/accuracy_tools/msprobe/core/common/utils.py index c06b5b64927bf47da1573df3b1d4db34dfa24cb1..e08e40f1ee6198488e1fa0a68b01aa6f7233d0b3 100644 --- a/debug/accuracy_tools/msprobe/core/common/utils.py +++ b/debug/accuracy_tools/msprobe/core/common/utils.py @@ -75,6 +75,7 @@ class MsprobeBaseException(Exception): MERGE_COMPARE_RESULT_ERROR = 33 NAMES_STRUCTS_MATCH_ERROR = 34 INVALID_STATE_ERROR = 35 + INVALID_API_NAME_ERROR = 36 def __init__(self, code, error_info: str = ""): super(MsprobeBaseException, self).__init__() @@ -247,6 +248,10 @@ def md5_find(data): def detect_framework_by_dump_json(file_path): + json_data = load_json(file_path) + framework = json_data.get("framework", None) + if framework in [Const.PT_FRAMEWORK, Const.MS_FRAMEWORK]: + return framework pattern_ms = r'"type":\s*"mindspore' pattern_pt = r'"type":\s*"torch' with FileOpen(file_path, 'r') as file: @@ -279,7 +284,7 @@ def set_dump_path(input_param): npu_path_valid = npu_path is not None and 
npu_path.endswith("dump.json") bench_path_valid = bench_path is not None and bench_path.endswith("dump.json") if not npu_path_valid or not bench_path_valid: - logger.error(f"Please check the json path is valid. npu_path: {npu_path}, bench_path: {bench_path}") + logger.error(f"Please check the json path is valid and ensure that neither npu_path nor bench_path is None.") raise CompareException(CompareException.INVALID_PATH_ERROR) input_param['npu_dump_data_dir'] = os.path.join(os.path.dirname(npu_path), Const.DUMP_TENSOR_DATA) input_param['bench_dump_data_dir'] = os.path.join(os.path.dirname(bench_path), Const.DUMP_TENSOR_DATA) @@ -424,6 +429,15 @@ def get_real_step_or_rank(step_or_rank_input, obj): return real_step_or_rank +def check_init_step(step): + if not is_int(step): + raise MsprobeException(MsprobeException.INVALID_PARAM_ERROR, + f"{step} must be an integer") + if not step >= 0: + raise MsprobeException(MsprobeException.INVALID_PARAM_ERROR, + f"{step} must be greater than or equal to 0") + + def check_seed_all(seed, mode, rm_dropout): if is_int(seed): if seed < 0 or seed > Const.MAX_SEED_VALUE: @@ -472,13 +486,13 @@ recursion_depth = defaultdict(int) # 装饰一个函数,当函数递归调用超过限制时,抛出异常并打印函数信息。 -def recursion_depth_decorator(func_info): +def recursion_depth_decorator(func_info, max_depth=Const.MAX_DEPTH): def decorator(func): @wraps(func) def wrapper(*args, **kwargs): func_id = id(func) recursion_depth[func_id] += 1 - if recursion_depth[func_id] > Const.MAX_DEPTH: + if recursion_depth[func_id] > max_depth: msg = f"call {func_info} exceeds the recursion limit." logger.error_log_with_exp( msg, diff --git a/debug/accuracy_tools/msprobe/core/compare/acc_compare.py b/debug/accuracy_tools/msprobe/core/compare/acc_compare.py index 55229d72657c67428186bcb233371e3b9eee73e0..29b1bcd10208780fa7b02ada3e66f2f0e81b215c 100644 --- a/debug/accuracy_tools/msprobe/core/compare/acc_compare.py +++ b/debug/accuracy_tools/msprobe/core/compare/acc_compare.py @@ -282,6 +282,8 @@ class Comparator: result = [] bench_ops_all[CompareConst.N_A] = self._generate_na_data(bench_ops_all) for ms_op_name, bench_op_name in self.data_mapping_dict.items(): + check_op_str_pattern_valid(ms_op_name) + check_op_str_pattern_valid(bench_op_name) if ms_op_name in npu_ops_all and bench_op_name in bench_ops_all: npu_stack_info = npu_ops_all.get(ms_op_name).get("stack_info", None) bench_stack_info = bench_ops_all.get(bench_op_name).get("stack_info", None) @@ -311,9 +313,9 @@ class Comparator: ] if self.dump_mode == Const.SUMMARY: - result_item = base_result_item + [" "] * 8 + result_item = base_result_item + [" "] * 8 # 8个统计量数据情况的比对指标 else: - result_item = base_result_item + [" "] * 5 + result_item = base_result_item + [" "] * 6 # 6个真实数据情况的比对指标 npu_summary_data = npu_ops_all.get(ms_op_name).get("summary") result_item.extend(npu_summary_data) @@ -329,8 +331,11 @@ class Comparator: else: result_item.append(CompareConst.NONE) if self.dump_mode == Const.ALL: - result_item.append(npu_ops_all.get(ms_op_name).get("data_name", None)) + ms_data_name = npu_ops_all.get(ms_op_name).get("data_name", None) + pt_data_name = bench_ops_all.get(bench_op_name).get("data_name", None) + result_item.append([ms_data_name, pt_data_name]) result.append(result_item) + logger.info(f"{ms_op_name}, {bench_op_name} compared.") elif ms_op_name not in npu_ops_all: logger.warning(f'Can not find npu op name : `{ms_op_name}` in npu dump json file.') elif bench_op_name not in npu_ops_all: @@ -349,47 +354,48 @@ class Comparator: result_df = self.make_result_table(result) 
return result_df - def compare_by_op(self, npu_op_name, bench_op_name, op_name_mapping_dict, input_param, bench_data): + def compare_by_op(self, npu_op_name, bench_op_name, op_name_mapping_dict, input_param): """ :param npu_op_name: excel中的NPU_Name,例如:MintFunctional.conv2d.0.forward.input.3.0 :param bench_op_name: excel中的Bench_Name,例如:Functional.conv2d.0.forward.input.3.0 :param op_name_mapping_dict: op_name和npy或pt文件的映射关系 :param input_param: npu_json_path/bench_json_path/stack_json_path等参数 - :param bench_data: bench的dump数据中"data"字段 :return: result_list,包含余弦相似度、最大绝对误差、最大相对误差、千分之一误差率、千分之五误差率和错误信息 - 用于读取excel中的NPU_Name和Bench_Name,根据映射关系找到npy或pt文件,然后读取文件中的数据进行比较,计算余弦相似度、 + 用于读取excel中的NPU_Name和Bench_Name,根据映射关系找到npy或pt文件,然后读取文件中的数据进行比较,计算余弦相似度、欧式距离 最大绝对误差、最大相对误差、千分之一误差率、千分之五误差率并生成错误信息 """ - npu_bench_name_list = op_name_mapping_dict[npu_op_name] - data_name = safe_get_value(npu_bench_name_list, 1, "npu_bench_name_list") error_file, relative_err, error_flag = None, None, False - bench_data_name = get_bench_data_name(bench_op_name, bench_data) - if data_name == '-1' or data_name == -1: # 没有真实数据路径 - n_value, b_value = CompareConst.READ_NONE, CompareConst.READ_NONE - error_flag = True - elif not bench_data_name: + + data_name_pair = op_name_mapping_dict.get(npu_op_name) + npu_data_name = data_name_pair[0] + bench_data_name = data_name_pair[1] + + if str(npu_data_name) == '-1': # 没有npu真实数据 + n_value, b_value, error_flag = CompareConst.READ_NONE, CompareConst.READ_NONE, True + elif str(bench_data_name) == '-1': # 没有bench真实数据 n_value, b_value, error_flag = CompareConst.READ_NONE, CompareConst.READ_NONE, True error_file = 'no_bench_data' else: + npu_dir = input_param.get("npu_dump_data_dir") + bench_dir = input_param.get("bench_dump_data_dir") try: - read_npy_data = getattr(self, "read_npy_data") frame_name = getattr(self, "frame_name") + read_npy_data = getattr(self, "read_npy_data") if frame_name == "MSComparator": - n_value = read_npy_data(input_param.get("npu_dump_data_dir"), npu_op_name + Const.NUMPY_SUFFIX) + n_value = read_npy_data(npu_dir, npu_data_name) if self.cross_frame: - b_value = read_npy_data(input_param.get("bench_dump_data_dir"), bench_data_name, - load_pt_file=True) + b_value = read_npy_data(bench_dir, bench_data_name, load_pt_file=True) else: - b_value = read_npy_data(input_param.get("bench_dump_data_dir"), bench_data_name) + b_value = read_npy_data(bench_dir, bench_data_name) else: - n_value = read_npy_data(input_param.get("npu_dump_data_dir"), npu_op_name + Const.PT_SUFFIX) - b_value = read_npy_data(input_param.get("bench_dump_data_dir"), bench_data_name) + n_value = read_npy_data(npu_dir, npu_data_name) + b_value = read_npy_data(bench_dir, bench_data_name) except IOError as error: error_file = error.filename n_value, b_value = CompareConst.READ_NONE, CompareConst.READ_NONE error_flag = True except (FileCheckException, CompareException): - error_file = data_name + error_file = npu_data_name n_value, b_value = CompareConst.READ_NONE, CompareConst.READ_NONE error_flag = True @@ -427,7 +433,9 @@ class Comparator: logger.info("Please check whether the input data belongs to you. 
If not, there may be security risks.") file_name = add_time_with_xlsx("compare_result" + suffix) file_path = os.path.join(os.path.realpath(output_path), file_name) - remove_path(file_path) + if os.path.exists(file_path): + logger.warning(f"{file_path} will be recovered") + remove_path(file_path) highlight_dict = {"red_rows": set(), "yellow_rows": set(), "red_lines": [], "yellow_lines": []} npu_json = input_param.get("npu_json_path") @@ -456,21 +464,23 @@ class Comparator: def compare_ops(self, idx, dump_path_dict, result_df, lock, input_param): cos_result = [] + euc_dist_result = [] max_err_result = [] max_relative_err_result = [] - err_mess = [] one_thousand_err_ratio_result = [] five_thousand_err_ratio_result = [] + err_mess = [] + is_print_compare_log = input_param.get("is_print_compare_log") - bench_data = load_json(input_param.get("bench_json_path")).get('data') + for i in range(len(result_df)): npu_op_name = result_df.iloc[i, 0] bench_op_name = result_df.iloc[i, 1] if is_print_compare_log: logger.info("start compare: {}".format(npu_op_name)) - cos_sim, max_abs_err, max_relative_err, one_thousand_err_ratio, five_thousand_err_ratio, err_msg = \ - self.compare_by_op(npu_op_name, bench_op_name, dump_path_dict, input_param, bench_data) + cos_sim, euc_dist, max_abs_err, max_relative_err, one_thousand_err_ratio, five_thousand_err_ratio, err_msg \ + = self.compare_by_op(npu_op_name, bench_op_name, dump_path_dict, input_param) if is_print_compare_log: logger.info( @@ -479,71 +489,30 @@ class Comparator: "five_thousand_err_ratio {}".format(npu_op_name, cos_sim, max_abs_err, max_relative_err, err_msg, one_thousand_err_ratio, five_thousand_err_ratio)) cos_result.append(cos_sim) + euc_dist_result.append(euc_dist) max_err_result.append(max_abs_err) max_relative_err_result.append(max_relative_err) - err_mess.append(err_msg) one_thousand_err_ratio_result.append(one_thousand_err_ratio) five_thousand_err_ratio_result.append(five_thousand_err_ratio) + err_mess.append(err_msg) cr = ComparisonResult( cos_result=cos_result, + euc_dist_result=euc_dist_result, max_err_result=max_err_result, max_relative_err_result=max_relative_err_result, - err_msgs=err_mess, one_thousand_err_ratio_result=one_thousand_err_ratio_result, - five_thousand_err_ratio_result=five_thousand_err_ratio_result + five_thousand_err_ratio_result=five_thousand_err_ratio_result, + err_msgs=err_mess ) return _save_cmp_result(idx, cr, result_df, lock) - def do_multi_process(self, input_parma, result_df): + def do_multi_process(self, input_param, result_df): try: - result_df = _handle_multi_process(self.compare_ops, input_parma, result_df, + result_df = _handle_multi_process(self.compare_ops, input_param, result_df, multiprocessing.Manager().RLock()) return result_df except ValueError as e: logger.error('result dataframe is not found.') raise CompareException(CompareException.INVALID_DATA_ERROR) from e - - -def get_bench_data_name(bench_op_name, bench_data): - bench_name_list = re.split(r'\.(input|output|kwargs|parameters|parameters_grad)\.', bench_op_name) - if len(bench_name_list) > 1 and bench_name_list[1] == Const.PARAMS_GRAD: - bench_data_bundle = bench_data.get(bench_name_list[0] + Const.SEP + bench_name_list[1], {}) - else: - bench_data_bundle = bench_data.get(bench_name_list[0], {}) - if not bench_data_bundle or len(bench_name_list) < 3: - return None - layers = bench_name_list[2].split(Const.SEP) - - def _get(key, container): - if isinstance(container, dict): - return container.get(key) - if isinstance(container, list): - try: - 
return container[int(key)] - except (ValueError, IndexError): - return None - return None - - def get_by_layer(container, params_grad=False): - data = container - # dump.json中parameters_grad的结构为key:[{}], 如果存在key,有且只有一个列表元素,而op_name中只命名到了key,因此加'0' - if params_grad: - layers.append('0') - for layer in layers: - data = _get(layer, data) - return _get(CompareConst.DATA_NAME.lower(), data) - - if Const.INPUT == bench_name_list[1]: - return get_by_layer(bench_data_bundle.get(Const.INPUT, bench_data_bundle.get(Const.INPUT_ARGS))) - elif Const.KWARGS == bench_name_list[1]: - return get_by_layer(bench_data_bundle.get(Const.INPUT_KWARGS)) - elif Const.OUTPUT == bench_name_list[1]: - return get_by_layer(bench_data_bundle.get(Const.OUTPUT)) - elif Const.PARAMS == bench_name_list[1]: - return get_by_layer(bench_data_bundle.get(Const.PARAMS)) - elif Const.PARAMS_GRAD == bench_name_list[1]: - return get_by_layer(bench_data_bundle, params_grad=True) - else: - return None diff --git a/debug/accuracy_tools/msprobe/core/compare/check.py b/debug/accuracy_tools/msprobe/core/compare/check.py index 653823e20b29b14b6e7ede929f3bd2865bffaa18..9429d7ffa1a3c1feffb0bc68f5cde777e5f8d460 100644 --- a/debug/accuracy_tools/msprobe/core/compare/check.py +++ b/debug/accuracy_tools/msprobe/core/compare/check.py @@ -82,12 +82,8 @@ def check_type_shape_match(npu_struct, bench_struct): f'should both be 2, please check!') raise CompareException(CompareException.INDEX_OUT_OF_BOUNDS_ERROR) from error shape_match = npu_shape == bench_shape - type_match = npu_type == bench_type - if not type_match: - if ([npu_type, bench_type] in CompareConst.MS_TYPE) or ([npu_type, bench_type] in CompareConst.TORCH_TYPE): - type_match = True - else: - type_match = False + type_match = ((npu_type == bench_type) or + any(npu_type in group and bench_type in group for group in CompareConst.DTYPE_MATCH_GROUPS)) struct_match = shape_match and type_match if not struct_match: return False diff --git a/debug/accuracy_tools/msprobe/core/compare/highlight.py b/debug/accuracy_tools/msprobe/core/compare/highlight.py index cf3e1c4c03e9553f5566870b7c5ebe2d890e9774..1983313249f34680a8f25c3a2466d8871fe0a693 100644 --- a/debug/accuracy_tools/msprobe/core/compare/highlight.py +++ b/debug/accuracy_tools/msprobe/core/compare/highlight.py @@ -146,11 +146,13 @@ class HighlightRules: } # 用于比较输入和输出的规则 + # 真实数据检查规则 compare_rules = { "check_order_magnitude": CheckOrderMagnitude(), "check_one_thousand_error": CheckOneThousandErrorRatio(), "check_cosine_similarity": CheckCosineSimilarity() } + # 统计量数据检查规则 summary_compare_rules = { "check_order_magnitude": CheckOrderMagnitude(), "check_max_relative_diff": CheckMaxRelativeDiff(), diff --git a/debug/accuracy_tools/msprobe/core/compare/merge_result/merge_result.py b/debug/accuracy_tools/msprobe/core/compare/merge_result/merge_result.py index b605bd59fca0b2b3a510a7a686caa94383488bd2..44f968b763f7bfbba5aa4047999e658cd7f54198 100644 --- a/debug/accuracy_tools/msprobe/core/compare/merge_result/merge_result.py +++ b/debug/accuracy_tools/msprobe/core/compare/merge_result/merge_result.py @@ -1,4 +1,4 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -21,7 +21,8 @@ from functools import partial import pandas as pd from tqdm import tqdm -from msprobe.core.common.file_utils import load_yaml, logger, FileChecker, save_excel, read_xlsx, create_directory +from msprobe.core.common.file_utils import load_yaml, logger, FileChecker, save_excel, read_xlsx, create_directory, \ + remove_path from msprobe.core.common.const import FileCheckConst, Const, CompareConst from msprobe.core.common.utils import CompareException, add_time_with_xlsx from msprobe.core.compare.utils import table_value_is_valid @@ -63,6 +64,7 @@ def get_result_path(input_dir): for f in os.listdir(input_dir) if f.endswith(FileCheckConst.XLSX_SUFFIX)] filt_compare_result_path_list = [] for file_path in compare_result_path_list: + FileChecker(file_path, FileCheckConst.FILE, FileCheckConst.READ_ABLE).common_check() file_name = os.path.basename(file_path) if check_compare_result_name(file_name): compare_result_path_checker = FileChecker(file_path, FileCheckConst.FILE, FileCheckConst.READ_ABLE) @@ -329,6 +331,10 @@ def generate_merge_result(all_compare_index_dict_list, all_rank_num_list, all_co for i, df in enumerate(merge_df_list): # merge_df_list中df与compare_index_list中compare_index一一对应 final_result_df_list.append((df, compare_index_list[i])) + + if os.path.exists(output_path): + logger.warning(f"{output_path} will be recovered") + remove_path(output_path) save_excel(output_path, final_result_df_list) logger.info(f"The compare results of the multi-ranks are merged and saved in: {output_path}.") diff --git a/debug/accuracy_tools/msprobe/core/compare/multiprocessing_compute.py b/debug/accuracy_tools/msprobe/core/compare/multiprocessing_compute.py index c2c1461e452f9d2c7f4e0e2803dfe51be2a132c0..71b0f29d64f717adc87b74cf48e891652e9e753f 100644 --- a/debug/accuracy_tools/msprobe/core/compare/multiprocessing_compute.py +++ b/debug/accuracy_tools/msprobe/core/compare/multiprocessing_compute.py @@ -1,4 +1,4 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. 
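下方 multiprocessing_compute.py 的改动修复了一个典型的回调注册错误:原来的 `callback=update_progress(chunk_size, lock)` 在提交任务时就立刻执行了 `update_progress`,真正注册给 `apply_async` 的是其返回值 `None`,进度条不会随任务完成而推进;改用 `functools.partial` 之后,交给进程池的才是延迟调用的可调用对象。可独立运行的示意如下(任务函数为简化假设):

```python
# apply_async 回调注册方式的示意:Pool 在任务完成后调用回调,
# 并把任务结果作为最后一个位置参数传入(对应新签名中的 extra_param)
import multiprocessing
from functools import partial

def update_progress(size, lock, result=None):
    with lock:
        print(f"chunk of {size} rows done, result={result}")

if __name__ == "__main__":
    lock = multiprocessing.Manager().RLock()
    with multiprocessing.Pool(processes=2) as pool:
        tasks = [pool.apply_async(sum, args=([i, i + 1],),
                                  callback=partial(update_progress, 2, lock))
                 for i in range(3)]
        final_results = [t.get() for t in tasks]
    print(final_results)  # [1, 3, 5]
```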
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -15,14 +15,17 @@ import multiprocessing from dataclasses import dataclass +from functools import partial + import pandas as pd from tqdm import tqdm + from msprobe.core.common.log import logger from msprobe.core.common.utils import CompareException from msprobe.core.common.const import CompareConst -def _handle_multi_process(func, input_parma, result_df, lock): +def _handle_multi_process(func, input_param, result_df, lock): process_num = max(int((multiprocessing.cpu_count() + 1) // 4), 1) op_name_mapping_dict = read_dump_data(result_df) @@ -44,7 +47,7 @@ def _handle_multi_process(func, input_parma, result_df, lock): progress_bar = tqdm(total=len(result_df), desc="API/Module Item Compare Process", unit="row", ncols=100) - def update_progress(size, progress_lock): + def update_progress(size, progress_lock, extra_param=None): with progress_lock: progress_bar.update(size) @@ -52,10 +55,12 @@ def _handle_multi_process(func, input_parma, result_df, lock): idx = df_chunk_size * process_idx chunk_size = len(df_chunk) result = pool.apply_async(func, - args=(idx, op_name_mapping_dict, df_chunk, lock, input_parma), + args=(idx, op_name_mapping_dict, df_chunk, lock, input_param), error_callback=err_call, - callback=update_progress(chunk_size, lock)) + callback=partial(update_progress, chunk_size, lock) + ) results.append(result) + final_results = [r.get() for r in results] pool.close() pool.join() @@ -92,12 +97,12 @@ def _ms_graph_handle_multi_process(func, result_df, mode): def read_dump_data(result_df): try: npu_dump_name_list = result_df.iloc[0:, 0].tolist() - npu_dump_tensor_list = result_df.iloc[0:, -1].tolist() + dump_tensor_pair_list = result_df.iloc[0:, -1].tolist() op_name_mapping_dict = {} for index, _ in enumerate(npu_dump_name_list): npu_dump_name = npu_dump_name_list[index] - npu_dump_tensor = npu_dump_tensor_list[index] - op_name_mapping_dict[npu_dump_name] = [npu_dump_tensor, npu_dump_tensor] + dump_tensor_pair = dump_tensor_pair_list[index] + op_name_mapping_dict[npu_dump_name] = dump_tensor_pair return op_name_mapping_dict except ValueError as e: logger.error('result dataframe is not found.') @@ -110,11 +115,12 @@ def read_dump_data(result_df): @dataclass class ComparisonResult: cos_result: list + euc_dist_result: list max_err_result: list max_relative_err_result: list - err_msgs: list one_thousand_err_ratio_result: list five_thousand_err_ratio_result: list + err_msgs: list def _save_cmp_result(offset, result: ComparisonResult, result_df, lock): @@ -135,15 +141,16 @@ def _save_cmp_result(offset, result: ComparisonResult, result_df, lock): for i, _ in enumerate(result.cos_result): process_index = i + offset result_df.loc[process_index, CompareConst.COSINE] = result.cos_result[i] + result_df.loc[process_index, CompareConst.EUC_DIST] = result.euc_dist_result[i] result_df.loc[process_index, CompareConst.MAX_ABS_ERR] = result.max_err_result[i] result_df.loc[process_index, CompareConst.MAX_RELATIVE_ERR] = result.max_relative_err_result[i] - result_df.loc[process_index, CompareConst.ERROR_MESSAGE] = result.err_msgs[i] - result_df.loc[process_index, CompareConst.ACCURACY] = ( - check_accuracy(result.cos_result[i], result.max_err_result[i])) result_df.loc[process_index, CompareConst.ONE_THOUSANDTH_ERR_RATIO] = ( result.one_thousand_err_ratio_result)[i] result_df.loc[process_index, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] = ( result.five_thousand_err_ratio_result)[i] + result_df.loc[process_index, CompareConst.ACCURACY] = ( 
+ check_accuracy(result.cos_result[i], result.max_err_result[i])) + result_df.loc[process_index, CompareConst.ERROR_MESSAGE] = result.err_msgs[i] return result_df except ValueError as e: logger.error('result dataframe is not found.') diff --git a/debug/accuracy_tools/msprobe/core/compare/npy_compare.py b/debug/accuracy_tools/msprobe/core/compare/npy_compare.py index c551985780cb9b56e32573727f9bf88f274da24e..4103d361fec14284fc38f97e1418e5405e939cd9 100644 --- a/debug/accuracy_tools/msprobe/core/compare/npy_compare.py +++ b/debug/accuracy_tools/msprobe/core/compare/npy_compare.py @@ -1,4 +1,4 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -70,7 +70,7 @@ def get_error_flag_and_msg(n_value, b_value, error_flag=False, error_file=None): error_flag = True return CompareConst.NONE, CompareConst.NONE, error_flag, err_msg if not n_value.shape: # 判断数据是否为0维张量 - err_msg = (f"This is type of 0-d tensor, can not calculate '{CompareConst.COSINE}', " + err_msg = (f"This is type of 0-d tensor, can not calculate '{CompareConst.COSINE}', '{CompareConst.EUC_DIST}', " f"'{CompareConst.ONE_THOUSANDTH_ERR_RATIO}' and '{CompareConst.FIVE_THOUSANDTHS_ERR_RATIO}'. ") error_flag = False # 0-d tensor 最大绝对误差、最大相对误差仍然支持计算,因此error_flag设置为False,不做统一处理 return n_value, b_value, error_flag, err_msg @@ -168,8 +168,9 @@ def statistics_data_check(result_dict): class TensorComparisonBasic(abc.ABC): """NPU和bench中npy数据的比较模板""" + @abc.abstractmethod - def apply(self, n_value, b_value, relative_err): + def apply(self, n_value, b_value, relative_err, err_msg): raise NotImplementedError @@ -190,6 +191,7 @@ def get_relative_err(n_value, b_value): class GetCosineSimilarity(TensorComparisonBasic): """计算cosine相似度""" + @staticmethod def correct_data(result): if result == CompareConst.NAN: @@ -198,9 +200,9 @@ class GetCosineSimilarity(TensorComparisonBasic): return round(float(result), 6) return result - def apply(self, n_value, b_value, relative_err): - if not n_value.shape: - return CompareConst.UNSUPPORTED, "" + def apply(self, n_value, b_value, relative_err, err_msg): + if "This is type of 0-d tensor" in err_msg: + return CompareConst.UNSUPPORTED, err_msg with np.errstate(divide="ignore", invalid="ignore"): if len(n_value) == 1: @@ -224,9 +226,22 @@ class GetCosineSimilarity(TensorComparisonBasic): return result, "" +class GetEuclideanDistance(TensorComparisonBasic): + """计算欧式距离""" + + def apply(self, n_value, b_value, relative_err, err_msg): + if "This is type of 0-d tensor" in err_msg: + return CompareConst.UNSUPPORTED, err_msg + + distance = np.linalg.norm(n_value - b_value, ord=2) + + return distance, "" + + class GetMaxAbsErr(TensorComparisonBasic): """计算最大绝对误差""" - def apply(self, n_value, b_value, relative_err): + + def apply(self, n_value, b_value, relative_err, err_msg): temp_res = n_value - b_value max_value = np.max(np.abs(temp_res)) if np.isnan(max_value): @@ -237,7 +252,8 @@ class GetMaxAbsErr(TensorComparisonBasic): class GetMaxRelativeErr(TensorComparisonBasic): """计算最大相对误差""" - def apply(self, n_value, b_value, relative_err): + + def apply(self, n_value, b_value, relative_err, err_msg): max_relative_err = np.max(np.abs(relative_err)) if np.isnan(max_relative_err): msg = "Cannot compare by MaxRelativeError, the data contains nan/inf/-inf in dump data." 
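上方 npy_compare.py 新增的 `GetEuclideanDistance` 指标,即对 NPU 与 bench 数据的逐元素差值取 L2 范数;与只关注方向的余弦相似度不同,欧式距离对整体幅值偏差敏感,可作为互补指标。独立示意如下(此处显式展平,真实实现中数据已由 reshape_value 统一形状):

```python
# 欧式距离比对指标的独立示意
import numpy as np

def euclidean_distance(n_value: np.ndarray, b_value: np.ndarray) -> float:
    return float(np.linalg.norm(n_value.ravel() - b_value.ravel(), ord=2))

npu_data = np.array([1.0, 2.0, 3.0])
bench_data = np.array([1.0, 2.5, 2.0])
print(euclidean_distance(npu_data, bench_data))  # sqrt(0.5**2 + 1.0**2) ≈ 1.118
```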
@@ -247,12 +263,13 @@ class GetMaxRelativeErr(TensorComparisonBasic): class GetErrRatio(TensorComparisonBasic): """计算相对误差小于指定阈值(千分之一、千分之五)的比例""" + def __init__(self, threshold): self.threshold = threshold - def apply(self, n_value, b_value, relative_err): - if not n_value.shape: - return CompareConst.UNSUPPORTED, "" + def apply(self, n_value, b_value, relative_err, err_msg): + if "This is type of 0-d tensor" in err_msg: + return CompareConst.UNSUPPORTED, err_msg if not np.size(relative_err): return CompareConst.NAN, "" @@ -264,6 +281,7 @@ class GetErrRatio(TensorComparisonBasic): class CompareOps: compare_ops = { "cosine_similarity": GetCosineSimilarity(), + "euclidean_distance": GetEuclideanDistance(), "max_abs_error": GetMaxAbsErr(), "max_relative_error": GetMaxRelativeErr(), "one_thousand_err_ratio": GetErrRatio(CompareConst.THOUSAND_RATIO_THRESHOLD), @@ -295,7 +313,7 @@ def compare_ops_apply(n_value, b_value, error_flag, err_msg): n_value, b_value = reshape_value(n_value, b_value) for op in CompareOps.compare_ops.values(): - result, msg = op.apply(n_value, b_value, relative_err) + result, msg = op.apply(n_value, b_value, relative_err, err_msg) result_list.append(result) err_msg += msg return result_list, err_msg diff --git a/debug/accuracy_tools/msprobe/core/compare/utils.py b/debug/accuracy_tools/msprobe/core/compare/utils.py index a2edf57e5bb91400675fe01734ea7fbf0e1df893..66dc9ba94ee168f2ea7dbba15f4c76d5e6ef6f13 100644 --- a/debug/accuracy_tools/msprobe/core/compare/utils.py +++ b/debug/accuracy_tools/msprobe/core/compare/utils.py @@ -285,9 +285,9 @@ def result_item_init(n_info, b_info, dump_mode): md5_compare_result = CompareConst.PASS if n_info.struct[2] == b_info.struct[2] else CompareConst.DIFF result_item.extend([n_info.struct[2], b_info.struct[2], md5_compare_result]) elif dump_mode == Const.SUMMARY: - result_item.extend([" "] * 8) + result_item.extend([" "] * 8) # 8个统计量数据情况的比对指标 else: - result_item.extend([" "] * 5) + result_item.extend([" "] * 6) # 6个真实数据情况的比对指标 else: err_msg = "index out of bounds error will occur in result_item_init, please check!\n" \ f"npu_info_struct is {n_info.struct}\n" \ @@ -321,8 +321,8 @@ def get_accuracy(result, n_dict, b_dict, dump_mode): has_stack = npu_stack_info and bench_stack_info if dump_mode == Const.ALL: - npu_data_name = n_dict.get("data_name", None) - bench_data_name = b_dict.get("data_name", None) + npu_data_name_list = n_dict.get("data_name", None) + bench_data_name_list = b_dict.get("data_name", None) for index in range(min_len): n_name = safe_get_value(n_dict, n_start + index, "n_dict", key="op_name") @@ -353,7 +353,9 @@ def get_accuracy(result, n_dict, b_dict, dump_mode): result_item.append(err_msg) result_item = stack_column_process(result_item, has_stack, index, key, npu_stack_info) if dump_mode == Const.ALL: - result_item.append(safe_get_value(npu_data_name, n_start + index, "npu_data_name")) + npu_data_name = safe_get_value(npu_data_name_list, n_start + index, "npu_data_name_list") + bench_data_name = safe_get_value(bench_data_name_list, b_start + index, "bench_data_name_list") + result_item.append([npu_data_name, bench_data_name]) result.append(result_item) @@ -371,7 +373,7 @@ def get_accuracy(result, n_dict, b_dict, dump_mode): continue result_item = [ n_name, CompareConst.NAN, n_struct[0], CompareConst.NAN, n_struct[1], CompareConst.NAN, - " ", " ", " ", " ", " " + " ", " ", " ", " ", " ", " " ] summary_data = n_dict.get(CompareConst.SUMMARY)[n_start + index] result_item.extend(summary_data) @@ -388,7 +390,8 @@ def 
get_accuracy(result, n_dict, b_dict, dump_mode): result_item.append(err_msg) result_item = stack_column_process(result_item, has_stack, index, key, npu_stack_info) if dump_mode == Const.ALL: - result_item.append(safe_get_value(npu_data_name, n_start + index, "npu_data_name")) + npu_data_name = safe_get_value(npu_data_name_list, n_start + index, "npu_data_name_list") + result_item.append([npu_data_name, "-1"]) result.append(result_item) @@ -453,9 +456,9 @@ def get_un_match_accuracy(result, n_dict, dump_mode): result.append(result_item) continue if dump_mode == Const.SUMMARY: - result_item.extend([CompareConst.N_A] * 8) + result_item.extend([CompareConst.N_A] * 8) # 8个统计量数据情况的比对指标 if dump_mode == Const.ALL: - result_item.extend([CompareConst.N_A] * 5) + result_item.extend([CompareConst.N_A] * 6) # 6个真实数据情况的比对指标 npu_summary_data = safe_get_value(summary_reorder, index, "summary_reorder") bench_summary_data = [CompareConst.N_A] * 4 @@ -467,7 +470,7 @@ def get_un_match_accuracy(result, n_dict, dump_mode): result_item.append(err_msg) append_stack_info(result_item, npu_stack_info, index) if dump_mode == Const.ALL and result_item[1] == CompareConst.N_A: - result_item.extend(["-1"]) + result_item.extend([["-1", "-1"]]) result.append(result_item) @@ -542,10 +545,17 @@ def get_name_and_state(name): state type: input, output, kwargs, parameters, parameters_grad """ + if not isinstance(name, str): + logger.error(f'Invalid name: {name}, type should be string, please check.') + raise CompareException(CompareException.INVALID_API_NAME_ERROR) + if Const.PARAMS_GRAD in name.split(Const.SEP): return name.split(Const.PARAMS_GRAD)[0], Const.PARAMS_GRAD split = re.split(Const.REGEX_FORWARD_BACKWARD, name) + if len(split) < 3: + logger.error(f'Invalid name string: {name}, can not be split by forward/backward, please check.') + raise CompareException(CompareException.INVALID_API_NAME_ERROR) api = f'{split[0]}.{split[1]}.' state_str = split[2] match = re.match(r'^(\d+\.)?(input|output|kwargs|parameters)\..+$', state_str) diff --git a/debug/accuracy_tools/msprobe/core/data_dump/api_registry.py b/debug/accuracy_tools/msprobe/core/data_dump/api_registry.py new file mode 100644 index 0000000000000000000000000000000000000000..1bef962232e47bc1eed399093e6812baa8f18f9c --- /dev/null +++ b/debug/accuracy_tools/msprobe/core/data_dump/api_registry.py @@ -0,0 +1,176 @@ +# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
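上方 utils.py 对 `get_name_and_state` 增加了两重防御:非字符串输入、以及无法按 forward/backward 切分的名字,都会记录错误并抛出 INVALID_API_NAME_ERROR,避免异常在更深的正则处理处才暴露。下面是校验逻辑的简化可运行示意(正则与异常类型做了裁剪):

```python
import re

# 与 Const.REGEX_FORWARD_BACKWARD 等价的简化正则(此处为假设写法)
REGEX_FORWARD_BACKWARD = r"\.(forward|backward)\."

def get_name_and_state(name):
    if not isinstance(name, str):
        raise ValueError(f"Invalid name: {name}, type should be string.")
    split = re.split(REGEX_FORWARD_BACKWARD, name)
    # 合法名字切分后至少为 [api 前缀, 'forward'/'backward', 余下部分] 三段
    if len(split) < 3:
        raise ValueError(f"Invalid name string: {name}, can not be split by forward/backward.")
    api = f"{split[0]}.{split[1]}."
    match = re.match(r"^(\d+\.)?(input|output|kwargs|parameters)\..+$", split[2])
    return api, match.group(2) if match else None

print(get_name_and_state("Functional.conv2d.0.forward.input.0"))
# ('Functional.conv2d.0.forward.', 'input')
```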
+ +from typing import Dict, Any, Optional, Callable, Union, List, Tuple + +from msprobe.core.common.const import Const +from msprobe.core.common.file_utils import load_yaml + + +def _get_attr(module, attr_name): + if Const.SEP in attr_name: + sub_module_name, sub_attr = attr_name.rsplit(Const.SEP, 1) + sub_module = getattr(module, sub_module_name, None) + attr = getattr(sub_module, sub_attr, None) + else: + attr = getattr(module, attr_name, None) + return attr + + +class ApiWrapper: + def __init__( + self, api_types: Dict[str, Dict[str, Any]], + api_list_paths: Union[str, List[str], Tuple[str]] + ): + self.api_types = api_types + if not isinstance(api_list_paths, (list, tuple)): + api_list_paths = [api_list_paths] * len(self.api_types) + elif len(api_list_paths) != len(self.api_types): + raise RuntimeError("The number of api_list_paths must be equal to the number of frameworks in 'api_types', " + "when api_list_paths is a list or tuple.") + self.api_list_paths = api_list_paths + self.api_names = self._get_api_names() + self.wrapped_api_functions = dict() + + def wrap_api( + self, api_templates, hook_build_func: Optional[Callable] + ): + api_types_num = sum([len(v) for v in self.api_types.values()]) + if not isinstance(api_templates, (list, tuple)): + api_templates = [api_templates] * api_types_num + elif len(api_templates) != api_types_num: + raise RuntimeError("The number of api_templates must be equal to the number of api_types, " + "when api_templates is a list or tuple.") + + self.wrapped_api_functions.clear() + index = 0 + for framework, api_types in self.api_types.items(): + wrapped_functions_in_framework = dict() + for api_type, api_modules in api_types.items(): + wrapped_functions = dict() + name_prefix = Const.API_DATA_PREFIX.get(framework, {}).get(api_type, "API") + api_template = api_templates[index] + index += 1 + for api_name in self.api_names.get(framework, {}).get(api_type, []): + ori_api = _get_attr(api_modules[0], api_name) + if callable(ori_api): + def wrap_api_func(api_name, api_func, prefix, hook_build_func, api_template): + def api_function(*args, **kwargs): + return api_template(api_name, api_func, prefix, hook_build_func)(*args, **kwargs) + api_function.__name__ = api_name + return api_function + wrapped_functions[api_name] = wrap_api_func(api_name, ori_api, name_prefix, + hook_build_func, api_template) + wrapped_functions_in_framework[api_type] = wrapped_functions + self.wrapped_api_functions[framework] = wrapped_functions_in_framework + return self.wrapped_api_functions + + def _get_api_names(self): + api_names = dict() + + for index, framework in enumerate(self.api_types.keys()): + api_list = load_yaml(self.api_list_paths[index]) + valid_names = dict() + for api_type, api_modules in self.api_types.get(framework, {}).items(): + api_from_file = api_list.get(Const.SUPPORT_API_DICT_KEY_MAP.get(framework, {}).get(api_type), []) + names = set() + for api_name in api_from_file: + target_attr = api_name + target_module = api_modules[0] + if Const.SEP in api_name: + sub_module_name, target_attr = api_name.rsplit(Const.SEP, 1) + target_module = getattr(api_modules[0], sub_module_name, None) + if target_module and target_attr in dir(target_module): + names.add(api_name) + valid_names[api_type] = names + api_names[framework] = valid_names + + return api_names + + +class ApiRegistry: + """ + Base class for api registry. 
+ """ + + def __init__(self, api_types, inner_used_api, supported_api_list_path, api_templates): + self.ori_api_attr = dict() + self.wrapped_api_attr = dict() + self.inner_used_ori_attr = dict() + self.inner_used_wrapped_attr = dict() + self.api_types = api_types + self.inner_used_api = inner_used_api + self.supported_api_list_path = supported_api_list_path + self.api_templates = api_templates + + @staticmethod + def store_ori_attr(ori_api_group, api_list, api_ori_attr): + for api in api_list: + api_ori_attr[api] = _get_attr(ori_api_group, api) + + @staticmethod + def set_api_attr(api_group, attr_dict): + for api, api_attr in attr_dict.items(): + if Const.SEP in api: + sub_module_name, sub_op = api.rsplit(Const.SEP, 1) + sub_module = getattr(api_group, sub_module_name, None) + if sub_module is not None: + setattr(sub_module, sub_op, api_attr) + else: + setattr(api_group, api, api_attr) + + def register_all_api(self): + for framework, api_types in self.api_types.items(): + for api_type, api_modules in api_types.items(): + api_type_with_framework = framework + Const.SEP + api_type + for module in api_modules[1]: + self.set_api_attr(module, self.wrapped_api_attr.get(api_type_with_framework, {})) + + def register_inner_used_api(self): + for api_type in self.inner_used_api.keys(): + self.set_api_attr(self.inner_used_api.get(api_type)[0], self.inner_used_wrapped_attr.get(api_type, {})) + + def restore_all_api(self): + for framework, api_types in self.api_types.items(): + for api_type, api_modules in api_types.items(): + api_type_with_framework = framework + Const.SEP + api_type + for module in api_modules[1]: + self.set_api_attr(module, self.ori_api_attr.get(api_type_with_framework, {})) + + def restore_inner_used_api(self): + for api_type in self.inner_used_api.keys(): + self.set_api_attr(self.inner_used_api.get(api_type)[0], self.inner_used_ori_attr.get(api_type, {})) + + def initialize_hook(self, hook_build_func): + api_wrapper = ApiWrapper(self.api_types, self.supported_api_list_path) + wrapped_api_functions = api_wrapper.wrap_api(self.api_templates, hook_build_func) + + for framework, api_types in self.api_types.items(): + for api_type, api_modules in api_types.items(): + ori_attr = dict() + self.store_ori_attr(api_modules[0], api_wrapper.api_names.get(framework).get(api_type), ori_attr) + api_type_with_framework = framework + Const.SEP + api_type + self.ori_api_attr[api_type_with_framework] = ori_attr + self.wrapped_api_attr[api_type_with_framework] = wrapped_api_functions.get(framework).get(api_type) + + for inner_used_api_type, inner_used_api_list in self.inner_used_api.items(): + ori_attr = dict() + wrapped_attr = dict() + for api_name in inner_used_api_list[1:]: + if self.ori_api_attr.get(inner_used_api_type, {}).get(api_name): + ori_attr[api_name] = self.ori_api_attr.get(inner_used_api_type).get(api_name) + wrapped_attr[api_name] = self.wrapped_api_attr.get(inner_used_api_type).get(api_name) + self.inner_used_ori_attr[inner_used_api_type] = ori_attr + self.inner_used_wrapped_attr[inner_used_api_type] = wrapped_attr diff --git a/debug/accuracy_tools/msprobe/core/data_dump/data_collector.py b/debug/accuracy_tools/msprobe/core/data_dump/data_collector.py index 20e4489f89e4bd345595e6a1db1e39ab427d4908..622c441a27db9fa7a88eb23265c6176163d2aa92 100644 --- a/debug/accuracy_tools/msprobe/core/data_dump/data_collector.py +++ b/debug/accuracy_tools/msprobe/core/data_dump/data_collector.py @@ -213,9 +213,6 @@ class DataCollector: data_info = self.data_processor.analyze_params(grad_name, 
param_name, data) self.handle_data(grad_name, data_info, flush=self.data_processor.is_terminated) - def fill_stack_tensor_data(self): - self.data_writer.fill_stack_tensor_data() - def debug_data_collect_forward(self, variable, name_with_count): data_info = self.data_processor.analyze_debug_forward(variable, name_with_count) diff --git a/debug/accuracy_tools/msprobe/core/data_dump/data_processor/base.py b/debug/accuracy_tools/msprobe/core/data_dump/data_processor/base.py index 775a80b2418ef356867228b4ca09fad8c86cce25..962dc527c595b11e433ea2472f2fbd8280e5f2d1 100644 --- a/debug/accuracy_tools/msprobe/core/data_dump/data_processor/base.py +++ b/debug/accuracy_tools/msprobe/core/data_dump/data_processor/base.py @@ -79,12 +79,11 @@ class ModuleBackwardOutputs: class TensorStatInfo: - def __init__(self, max_val=None, min_val=None, mean_val=None, norm_val=None, stack_tensor_stat=None): + def __init__(self, max_val=None, min_val=None, mean_val=None, norm_val=None): self.max = max_val self.min = min_val self.mean = mean_val self.norm = norm_val - self.stack_tensor_stat = stack_tensor_stat class BaseDataProcessor: diff --git a/debug/accuracy_tools/msprobe/core/data_dump/data_processor/mindspore_processor.py b/debug/accuracy_tools/msprobe/core/data_dump/data_processor/mindspore_processor.py index 8c4542a1917b76809aad21971e148ec17bd6045e..b2d8c6111305713125573a6bc82d373bb4508657 100644 --- a/debug/accuracy_tools/msprobe/core/data_dump/data_processor/mindspore_processor.py +++ b/debug/accuracy_tools/msprobe/core/data_dump/data_processor/mindspore_processor.py @@ -26,7 +26,7 @@ from msprobe.core.data_dump.data_processor.base import (BaseDataProcessor, Tenso from msprobe.core.common.file_utils import path_len_exceeds_limit, save_npy from msprobe.mindspore.common.utils import convert_bf16_to_fp32, save_tensor_as_npy from msprobe.mindspore.common.log import logger -from msprobe.mindspore.dump.hook_cell.api_registry import api_register +from msprobe.mindspore.dump.hook_cell.api_register import get_api_register has_adump = True try: @@ -44,6 +44,7 @@ class MindsporeDataProcessor(BaseDataProcessor): "dtype": self.analyze_dtype_in_kwargs } self._async_dump_cache = {} + self.api_register = get_api_register() @staticmethod def get_md5_for_tensor(x): @@ -60,11 +61,10 @@ class MindsporeDataProcessor(BaseDataProcessor): def get_stat_info_sync(data): tensor_stat = TensorStatInfo() if data.dtype == ms.bool_: - data_np = data.asnumpy() - tensor_stat.max = np.max(data_np).item() - tensor_stat.min = np.min(data_np).item() + tensor_stat.max = mint.any(data) + tensor_stat.min = mint.all(data) elif not data.shape: - tensor_stat.max = tensor_stat.min = tensor_stat.mean = tensor_stat.norm = data.item() + tensor_stat.max = tensor_stat.min = tensor_stat.mean = tensor_stat.norm = data elif data.dtype == ms.complex64 or data.dtype == ms.complex128: data_abs = np.abs(data.asnumpy()) tensor_stat.max = np.max(data_abs).item() @@ -74,46 +74,32 @@ class MindsporeDataProcessor(BaseDataProcessor): else: if not ops.is_floating_point(data) or data.dtype == ms.float64: data = data.to(ms.float32) - api_register.norm_inner_op_set_ori_func() - get_max_value = api_register.mint_ops_ori_attr.get("max", mint.max) - get_min_value = api_register.mint_ops_ori_attr.get("min", mint.min) - get_mean_value = api_register.mint_ops_ori_attr.get("mean", mint.mean) - if hasattr(mint, "norm"): - get_norm_value = api_register.mint_ops_ori_attr.get("norm", mint.norm) - else: - get_norm_value = api_register.functional_ori_attr.get("norm", ops.norm) - 
tensor_stat.max = get_max_value(data).item() - tensor_stat.min = get_min_value(data).item() - tensor_stat.mean = get_mean_value(data).item() - tensor_stat.norm = get_norm_value(data).item() - api_register.norm_inner_op_set_hook_func() + get_norm_value = mint.norm if hasattr(mint, "norm") else ops.norm + tensor_stat.max = mint.max(data) + tensor_stat.min = mint.min(data) + tensor_stat.mean = mint.mean(data) + tensor_stat.norm = get_norm_value(data) return tensor_stat @staticmethod def get_stat_info_async(data): tensor_stat = TensorStatInfo() - stack_method = api_register.functional_ori_attr.get("stack", ms.ops.stack) - if data.dtype == ms.complex64 or data.dtype == ms.complex128: + if data.dtype == ms.bool_: + tensor_stat.max = mint.any(data) + tensor_stat.min = mint.all(data) + elif not data.shape: + tensor_stat.max = tensor_stat.min = tensor_stat.mean = tensor_stat.norm = data + elif data.dtype == ms.complex64 or data.dtype == ms.complex128: logger.warning("Async dump do not support complex data!") return tensor_stat - elif data.dtype == ms.bool_: - tensor_stat.stack_tensor_stat = (["Max", "Min"], stack_method([data.any(), data.all()])) - elif not data.shape: - tensor_stat.stack_tensor_stat = (["Max", "Min", "Mean", "Norm"], stack_method([data, data, data, data])) else: if not ops.is_floating_point(data) or data.dtype == ms.float64: data = data.to(ms.float32) - api_register.norm_inner_op_set_ori_func() - get_max_value = api_register.mint_ops_ori_attr.get("max", mint.max) - get_min_value = api_register.mint_ops_ori_attr.get("min", mint.min) - get_mean_value = api_register.mint_ops_ori_attr.get("mean", mint.mean) - if hasattr(mint, "norm"): - get_norm_value = api_register.mint_ops_ori_attr.get("norm", mint.norm) - else: - get_norm_value = api_register.functional_ori_attr.get("norm", ops.norm) - tensor_stat.stack_tensor_stat = (["Max", "Min", "Mean", "Norm"], stack_method( - [get_max_value(data), get_min_value(data), get_mean_value(data), get_norm_value(data)])) - api_register.norm_inner_op_set_hook_func() + get_norm_value = mint.norm if hasattr(mint, "norm") else ops.norm + tensor_stat.max = mint.max(data) + tensor_stat.min = mint.min(data) + tensor_stat.mean = mint.mean(data) + tensor_stat.norm = get_norm_value(data) return tensor_stat @staticmethod @@ -125,14 +111,17 @@ class MindsporeDataProcessor(BaseDataProcessor): return super().get_special_types() + cls.mindspore_special_type def get_stat_info(self, data): + self.api_register.restore_inner_used_api() tensor_stat = TensorStatInfo() if data.numel() == 0: - return tensor_stat + stat_info = tensor_stat else: if self.config.async_dump: - return MindsporeDataProcessor.get_stat_info_async(data) + stat_info = MindsporeDataProcessor.get_stat_info_async(data) else: - return MindsporeDataProcessor.get_stat_info_sync(data) + stat_info = MindsporeDataProcessor.get_stat_info_sync(data) + self.api_register.register_inner_used_api() + return stat_info def analyze_single_element(self, element, suffix_stack): if suffix_stack and suffix_stack[-1] in self.mindspore_object_key: @@ -159,13 +148,18 @@ class MindsporeDataProcessor(BaseDataProcessor): 'shape': tensor.shape } - if tensor_stat.stack_tensor_stat is None: - tensor_json.update({'Max': self.transfer_type(tensor_stat.max)}) - tensor_json.update({'Min': self.transfer_type(tensor_stat.min)}) - tensor_json.update({'Mean': self.transfer_type(tensor_stat.mean)}) - tensor_json.update({'Norm': self.transfer_type(tensor_stat.norm)}) - else: - tensor_json.update({'tensor_stat': 
tensor_stat.stack_tensor_stat}) + # 将统计值存入全局 buffer,并返回占位索引 + stat_values = [ + tensor_stat.max, + tensor_stat.min, + tensor_stat.mean, + tensor_stat.norm + ] + + placeholder_index = self.data_writer.append_stat_to_buffer(stat_values) + + tensor_json.update({Const.TENSOR_STAT_INDEX: placeholder_index}) + if self.config.summary_mode == Const.MD5 and not self.config.async_dump: tensor_md5 = self.get_md5_for_tensor(tensor) tensor_json.update({Const.MD5: tensor_md5}) @@ -191,7 +185,7 @@ class TensorDataProcessor(MindsporeDataProcessor): else: save_tensor_as_npy(tensor, file_path) return single_arg - + def _analyze_numpy(self, ndarray, suffix): dump_data_name, file_path = self.get_save_file_path(suffix) save_npy(ndarray, file_path) @@ -262,11 +256,20 @@ class OverflowCheckDataProcessor(MindsporeDataProcessor): self.cached_tensors_and_file_paths = {} def _analyze_maybe_overflow_tensor(self, tensor_json): - if tensor_json['Max'] is None: + tensor_stat_index = tensor_json.get(Const.TENSOR_STAT_INDEX) + if tensor_stat_index is None: + logger.warning("tensor_stat_index does not exist in tensor_json.") return - if np.isinf(tensor_json['Max']) or np.isnan(tensor_json['Max']): + max_tensor = self.data_writer.get_buffer_values_max(tensor_stat_index) + min_tensor = self.data_writer.get_buffer_values_min(tensor_stat_index) + if max_tensor is None or min_tensor is None: + return + + if mint.isinf(max_tensor) or mint.isnan(max_tensor): self.has_overflow = True - if np.isinf(tensor_json['Min']) or np.isnan(tensor_json['Min']): + return + + if mint.isinf(min_tensor) or mint.isnan(min_tensor): self.has_overflow = True def _analyze_tensor(self, tensor, suffix): diff --git a/debug/accuracy_tools/msprobe/core/data_dump/data_processor/pytorch_processor.py b/debug/accuracy_tools/msprobe/core/data_dump/data_processor/pytorch_processor.py index 64253aa4260cab608e5ca84a5d006b28b94a33ab..a54dbe60d4092dcaf7684173d4669686bca6fe24 100644 --- a/debug/accuracy_tools/msprobe/core/data_dump/data_processor/pytorch_processor.py +++ b/debug/accuracy_tools/msprobe/core/data_dump/data_processor/pytorch_processor.py @@ -24,14 +24,14 @@ from torch import distributed as dist from torch.distributed.distributed_c10d import _get_default_group from msprobe.core.common.const import Const +from msprobe.core.common.exceptions import MsprobeException from msprobe.core.common.file_utils import path_len_exceeds_limit from msprobe.core.common.log import logger -from msprobe.core.common.utils import convert_tuple +from msprobe.core.common.utils import convert_tuple, recursion_depth_decorator from msprobe.core.data_dump.data_processor.base import BaseDataProcessor, ModuleBackwardInputsOutputs, \ ModuleForwardInputsOutputs, TensorStatInfo -from msprobe.pytorch.common.utils import save_pt, load_pt +from msprobe.pytorch.common.utils import Const as PtConst, save_pt, is_hifloat8_tensor, is_float8_tensor from msprobe.pytorch.free_benchmark import FreeBenchmarkCheck, UnequalRow -from msprobe.core.common.utils import recursion_depth_decorator is_gpu = False try: @@ -92,6 +92,7 @@ class PytorchDataProcessor(BaseDataProcessor): def analyze_dtype_in_kwargs(element): return {"type": "torch.dtype", "value": str(element)} + @staticmethod def get_stat_info_async(data): tensor_stat = TensorStatInfo() @@ -99,19 +100,17 @@ class PytorchDataProcessor(BaseDataProcessor): logger.warning("Async dump do not support complex data!") return tensor_stat elif data.dtype == torch.bool: - tensor_stat.stack_tensor_stat = (["Max", "Min"], torch.stack( - [torch.any(data), 
torch.all(data)])) + tensor_stat.max = torch.any(data) + tensor_stat.min = torch.all(data) elif not data.shape: - tensor_stat.stack_tensor_stat = (["Max", "Min", "Mean", "Norm"], torch.stack([data, data, data, data])) + tensor_stat.max = tensor_stat.min = tensor_stat.mean = tensor_stat.norm = data else: - if not data.is_floating_point() or data.dtype == torch.float64: + if data.dtype == torch.float64 or not data.is_floating_point(): data = data.float() - tensor_stat.stack_tensor_stat = (["Max", "Min", "Mean", "Norm"], torch.stack([ - torch.max(data), - torch.min(data), - torch.mean(data), - torch.norm(data) - ])) + tensor_stat.max = torch.max(data) + tensor_stat.min = torch.min(data) + tensor_stat.mean = torch.mean(data) + tensor_stat.norm = torch.norm(data) return tensor_stat @staticmethod @@ -124,17 +123,17 @@ class PytorchDataProcessor(BaseDataProcessor): tensor_stat.min = np.min(data_abs).item() tensor_stat.mean = np.mean(data_abs).item() elif data.dtype == torch.bool: - tensor_stat.max = torch.any(data).item() - tensor_stat.min = torch.all(data).item() + tensor_stat.max = torch.any(data) + tensor_stat.min = torch.all(data) elif not data.shape: - tensor_stat.max = tensor_stat.min = tensor_stat.mean = tensor_stat.norm = data.item() + tensor_stat.max = tensor_stat.min = tensor_stat.mean = tensor_stat.norm = data else: - if not data.is_floating_point() or data.dtype == torch.float64: + if data.dtype == torch.float64 or not data.is_floating_point(): data = data.float() - tensor_stat.max = torch.max(data).item() - tensor_stat.min = torch.min(data).item() - tensor_stat.mean = torch.mean(data).item() - tensor_stat.norm = torch.norm(data).item() + tensor_stat.max = torch.max(data) + tensor_stat.min = torch.min(data) + tensor_stat.mean = torch.mean(data) + tensor_stat.norm = torch.norm(data) return tensor_stat @staticmethod @@ -143,7 +142,7 @@ class PytorchDataProcessor(BaseDataProcessor): if data.is_meta: return tensor_stat data_clone = data.detach() - if data_clone.numel() == 0: + if not data_clone.numel() or not data_clone.data_ptr(): return tensor_stat else: if data_clone.device.type == Const.CPU_LOWERCASE or not async_dump: @@ -214,6 +213,18 @@ class PytorchDataProcessor(BaseDataProcessor): logger.warning(f"Failed to get value of torch.distributed.ReduceOp with error info: {e}.") return {"type": "torch.distributed.ReduceOp", "value": op_type} + @staticmethod + def _cast_to_float_if_fp8(tensor): + dtype = str(tensor.dtype) + if is_float8_tensor(tensor): + dtype = PtConst.HIFLOAT8_TYPE if is_hifloat8_tensor(tensor) else dtype + logger.debug( + f"The {dtype} tensor analyzing/saving is unsupported in dump function." + f"Casting to float for processing." 
+ ) + tensor = tensor.float() + return tensor, dtype + @classmethod def get_special_types(cls): return super().get_special_types() + cls.pytorch_special_type @@ -228,7 +239,7 @@ class PytorchDataProcessor(BaseDataProcessor): if isinstance(element, dist.ProcessGroup): return self._analyze_process_group(element) if isinstance(element, dist.P2POp): - return self._analyze_p2pop(element) + return self._analyze_p2pop(element, Const.SEP.join([str(suffix) for suffix in suffix_stack])) if isinstance(element, dist.ReduceOp): return self._analyze_reduce_op(element) converted_numpy, numpy_type = self._convert_numpy_to_builtin(element) @@ -247,10 +258,10 @@ class PytorchDataProcessor(BaseDataProcessor): module_input_output.update_output_with_args_and_kwargs() return super().analyze_forward_output(name, module, module_input_output) - def _analyze_p2pop(self, arg): + def _analyze_p2pop(self, arg, suffix): p2pop_info = {"class_type": "torch.distributed.P2POp"} try: - tensor_info = self._analyze_tensor(arg.tensor, []) + tensor_info = self._analyze_tensor(arg.tensor, suffix) p2pop_info.update({"tensor": tensor_info}) p2pop_info.update({"op": arg.op.__name__}) p2pop_info.update({"peer": arg.peer}) @@ -263,27 +274,23 @@ class PytorchDataProcessor(BaseDataProcessor): return p2pop_info def _analyze_tensor(self, tensor, suffix): + tensor, dtype = self._cast_to_float_if_fp8(tensor) tensor_stat = self.get_stat_info(tensor, self.config.async_dump) tensor_json = {} tensor_json.update({'type': 'torch.Tensor'}) - tensor_json.update({'dtype': str(tensor.dtype)}) + tensor_json.update({'dtype': dtype}) tensor_json.update({"shape": tensor.shape}) - if tensor_stat.stack_tensor_stat is None: - tensor_json.update({"Max": tensor_stat.max}) - tensor_json.update({"Min": tensor_stat.min}) - tensor_json.update({"Mean": tensor_stat.mean}) - tensor_json.update({"Norm": tensor_stat.norm}) - tensor_json.update({"requires_grad": tensor.requires_grad}) - if tensor_stat.max is not None: - if np.isinf(tensor_stat.max) or np.isnan(tensor_stat.max): - tensor_json['Max_except_inf_nan'] = self.handle_tensor_extremum_nan_inf(tensor, "max") - if tensor_stat.min is not None: - if np.isinf(tensor_stat.min) or np.isnan(tensor_stat.min): - tensor_json['Min_except_inf_nan'] = self.handle_tensor_extremum_nan_inf(tensor, "min") - else: - tensor_json.update({"requires_grad": tensor.requires_grad}) - tensor_json.update({"tensor_stat": tensor_stat.stack_tensor_stat}) + stat_values = [ + tensor_stat.max, + tensor_stat.min, + tensor_stat.mean, + tensor_stat.norm + ] + placeholder_index = self.data_writer.append_stat_to_buffer(stat_values) + + tensor_json.update({Const.TENSOR_STAT_INDEX: placeholder_index}) + tensor_json.update({"requires_grad": tensor.requires_grad}) if self.config.summary_mode == Const.MD5 and not self.config.async_dump: tensor_md5 = self.get_md5_for_tensor(tensor) @@ -305,13 +312,14 @@ class TensorDataProcessor(PytorchDataProcessor): dump_data_name, file_path = self.get_save_file_path(suffix) single_arg = super()._analyze_tensor(tensor, suffix) single_arg.update({"data_name": dump_data_name}) + tensor, _ = self._cast_to_float_if_fp8(tensor) if self.config.async_dump: self._async_dump_cache[file_path] = tensor.clone().detach() else: saved_tensor = tensor.clone().contiguous().detach() save_pt(saved_tensor, file_path) return single_arg - + def _analyze_numpy(self, ndarray, suffix): dump_data_name, file_path = self.get_save_file_path(suffix) save_pt(torch.tensor(ndarray), file_path) @@ -383,7 +391,8 @@ class 
OverflowCheckDataProcessor(PytorchDataProcessor): self._analyze_maybe_overflow_flag() if self.has_overflow: for file_path, tensor in self.cached_tensors_and_file_paths.items(): - save_pt(tensor, file_path) + tensor, _ = self._cast_to_float_if_fp8(tensor) + save_pt(tensor.clone().contiguous().detach(), file_path) self.real_overflow_nums += 1 if self.overflow_nums != -1 and self.real_overflow_nums >= self.overflow_nums: logger.info(f"[{Const.TOOL_NAME}] Reached the preset overflow times, " @@ -409,10 +418,22 @@ class OverflowCheckDataProcessor(PytorchDataProcessor): raise RuntimeError(f"overflow check failed") from e def _analyze_maybe_overflow_tensor(self, tensor_json): - if tensor_json['Max'] is None or tensor_json['Min'] is None: + tensor_stat_index = tensor_json.get(Const.TENSOR_STAT_INDEX) + if tensor_stat_index is None: + logger.warning("tensor_stat_index does not exist in tensor_json.") return - self.has_overflow = np.isinf(tensor_json['Max']) or np.isnan(tensor_json['Max']) or \ - np.isinf(tensor_json['Min']) or np.isnan(tensor_json['Min']) + max_tensor = self.data_writer.get_buffer_values_max(tensor_stat_index) + min_tensor = self.data_writer.get_buffer_values_min(tensor_stat_index) + + if max_tensor is None or min_tensor is None: + return + + if torch.isinf(max_tensor) or torch.isnan(max_tensor): + self.has_overflow = True + return + + if torch.isinf(min_tensor) or torch.isnan(min_tensor): + self.has_overflow = True def _analyze_tensor(self, tensor, suffix): dump_data_name, file_path = self.get_save_file_path(suffix) @@ -508,11 +529,13 @@ class KernelDumpDataProcessor(PytorchDataProcessor): return if self.config.is_backward_kernel_dump: - self.forward_args = self.clone_and_detach_tensor(module_input_output.args) - self.forward_kwargs = self.clone_and_detach_tensor(module_input_output.kwargs) try: + self.forward_args = self.clone_and_detach_tensor(module_input_output.args) + self.forward_kwargs = self.clone_and_detach_tensor(module_input_output.kwargs) output = module.forward(*self.forward_args, **self.forward_kwargs) - except Exception: + except Exception as e: + if isinstance(e, MsprobeException): + logger.warning(str(e)) self._print_unsupported_log(name) self.enable_kernel_dump = False return @@ -557,6 +580,11 @@ class KernelDumpDataProcessor(PytorchDataProcessor): @recursion_depth_decorator("KernelDump: KernelDumpDataProcessor.clone_and_detach_tensor") def clone_and_detach_tensor(self, input_params): if isinstance(input_params, torch.Tensor): + if is_float8_tensor(input_params): + raise MsprobeException( + MsprobeException.UNSUPPORTED_TYPE_ERROR, + f"L2 backward dump does not support float8 type." 
+ ) if input_params.requires_grad: return input_params.clone().detach().requires_grad_() return input_params.clone() @@ -571,6 +599,8 @@ class KernelDumpDataProcessor(PytorchDataProcessor): def analyze_single_element(self, element, suffix_stack): if isinstance(element, torch.Tensor): + if is_float8_tensor(element): + return {} if not self.is_found_output_tensor: if element.requires_grad: self.forward_output_tensor = element diff --git a/debug/accuracy_tools/msprobe/core/data_dump/json_writer.py b/debug/accuracy_tools/msprobe/core/data_dump/json_writer.py index b1e26d16f9741765c1c9600a64efb112aa0f42d7..7e3152266767612a08d5e8ddd723893267538088 100644 --- a/debug/accuracy_tools/msprobe/core/data_dump/json_writer.py +++ b/debug/accuracy_tools/msprobe/core/data_dump/json_writer.py @@ -16,13 +16,16 @@ import csv import os import copy -import numpy as np +import threading from msprobe.core.common.const import Const, FileCheckConst from msprobe.core.common.file_utils import change_mode, FileOpen, save_json, load_json +from msprobe.core.common.utils import recursion_depth_decorator from msprobe.core.common.log import logger from msprobe.core.common.exceptions import MsprobeException +lock = threading.Lock() + class DataWriter: @@ -38,6 +41,7 @@ class DataWriter: self.cache_stack = {} self.cache_construct = {} self.cache_debug = {} + self.stat_stack_list = [] @staticmethod def write_data_to_csv(result: list, result_header: tuple, file_path: str): @@ -54,6 +58,46 @@ class DataWriter: if is_new_file: change_mode(file_path, FileCheckConst.DATA_FILE_AUTHORITY) + @recursion_depth_decorator("JsonWriter: DataWriter._replace_stat_placeholders") + def _replace_stat_placeholders(self, data, stat_result): + if isinstance(data, dict): + keys = list(data.keys()) # 获取当前所有键 + for key in keys: # 递归所有变量 + value = data[key] + if key == Const.TENSOR_STAT_INDEX and isinstance(value, int): + if value > 0: + idx = value + else: + return + stat_values = stat_result[idx] if idx < len(stat_result) else [None] * 4 + # 构建新字段并删除旧键 + new_entries = { + "type": data["type"], + "dtype": data["dtype"], + "shape": data["shape"], + "Max": stat_values[0], + "Min": stat_values[1], + "Mean": stat_values[2], + "Norm": stat_values[3] + } + del data[key] + + # 重构字典顺序 + updated_dict = {} + # 通过插入排序后字段保证字段写入json的有序 + updated_dict.update(new_entries) + # 遍历原字典其他字段(排除已删除的tensor_stat_index) + for k in data: + if k not in new_entries: + updated_dict[k] = data[k] + data.clear() + data.update(updated_dict) + else: + self._replace_stat_placeholders(value, stat_result) + elif isinstance(data, (list, tuple)): + for item in data: + self._replace_stat_placeholders(item, stat_result) + def reset_cache(self): self.cache_data = {} self.cache_stack = {} @@ -90,28 +134,32 @@ class DataWriter: self.write_json() def update_data(self, new_data): - if not isinstance(new_data, dict) or len(new_data.keys()) != 1: - logger.warning(f"The data info({new_data}) should be a dict with only one outer key.") - return - dump_data = self.cache_data.get(Const.DATA) - if not isinstance(dump_data, dict): - logger.warning(f"The dump data({dump_data}) should be a dict.") - return - - key = next(iter(new_data.keys())) - if key in dump_data: - dump_data.get(key).update(new_data.get(key)) - else: - dump_data.update(new_data) + with lock: + if not isinstance(new_data, dict) or len(new_data.keys()) != 1: + logger.warning(f"The data info({new_data}) should be a dict with only one outer key.") + return + dump_data = self.cache_data.get(Const.DATA) + if not isinstance(dump_data, dict): 
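本文件(json_writer.py)新增的 `stat_stack_list` 与前面各 processor 中的 `append_stat_to_buffer` 共同构成“占位索引 + 统一回填”的延迟同步机制:采集阶段统计值以 device 张量形式入队,dump 数据中先写入 `tensor_stat_index` 占位;write_json 时一次性 `.item()` 同步,再由 `_replace_stat_placeholders` 回填 Max/Min/Mean/Norm。端到端的极简示意如下(以 torch 张量代替框架内张量,省略线程锁与递归回填):

```python
# “占位索引 + 统一回填”机制的端到端简化示意
import torch

TENSOR_STAT_INDEX = "tensor_stat_index"
stat_stack_list = []

def append_stat_to_buffer(stat_values):
    stat_stack_list.append(stat_values)
    return len(stat_stack_list) - 1

data = torch.tensor([1.0, 2.0, 3.0])
entry = {"type": "torch.Tensor", "dtype": "torch.float32", "shape": list(data.shape)}
# 采集阶段:不调用 .item(),避免每个张量一次 device->host 同步
entry[TENSOR_STAT_INDEX] = append_stat_to_buffer(
    [data.max(), data.min(), data.mean(), data.norm()])

# 落盘阶段:一次性同步并回填统计字段
stat_result = [[x.item() for x in stats] for stats in stat_stack_list]
index = entry.pop(TENSOR_STAT_INDEX)
entry.update(dict(zip(["Max", "Min", "Mean", "Norm"], stat_result[index])))
print(entry)  # {'type': ..., 'Max': 3.0, 'Min': 1.0, 'Mean': 2.0, 'Norm': 3.74...}
```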
+ logger.warning(f"The dump data({dump_data}) should be a dict.") + return + + key = next(iter(new_data.keys())) + if key in dump_data: + dump_data.get(key).update(new_data.get(key)) + else: + dump_data.update(new_data) def update_stack(self, new_data): - self.cache_stack.update(new_data) + with lock: + self.cache_stack.update(new_data) def update_construct(self, new_data): - self.cache_construct.update(new_data) + with lock: + self.cache_construct.update(new_data) def update_debug(self, new_data): - self.cache_debug['data'].update(new_data) + with lock: + self.cache_debug['data'].update(new_data) def write_data_json(self, file_path): logger.info(f"dump.json is at {os.path.dirname(os.path.dirname(file_path))}. ") @@ -126,38 +174,60 @@ class DataWriter: def write_debug_info_json(self, file_path): save_json(file_path, self.cache_debug, indent=1) + def append_stat_to_buffer(self, stat_vector): + """ + 直接使用 Python list 存储 stat_vector, + 将 stat_vector 存入 self.stat_stack_list 的方式 + """ + self.stat_stack_list.append(stat_vector) + return len(self.stat_stack_list) - 1 + + def get_buffer_values_max(self, index): + if 0 <= index < len(self.stat_stack_list) and len(self.stat_stack_list[index]) >= 1: + return self.stat_stack_list[index][0] + else: + logger.warning(f"stat_stack_list[{index}] The internal data is incomplete," + f" and the maximum value cannot be obtained.") + return None + + def get_buffer_values_min(self, index): + if 0 <= index < len(self.stat_stack_list) and len(self.stat_stack_list[index]) >= 1: + return self.stat_stack_list[index][1] + else: + logger.warning(f"stat_stack_list[{index}] Internal data is incomplete" + f" and minimum values cannot be obtained.") + return None + + def flush_stat_stack(self): + """ + 在 flush 阶段,将所有存储的统计值从设备搬到 CPU, + 这里返回一个列表,每个元素是 [Max, Min, Mean, Norm] 的数值列表 + """ + if not self.stat_stack_list: + return [] + result = [ + [ + x.item() if hasattr(x, "item") else x + for x in stat_values + ] + for stat_values in self.stat_stack_list + ] + self.stat_stack_list = [] + return result + def write_json(self): - if self.cache_data: - self.write_data_json(self.dump_file_path) - if self.cache_stack: - self.write_stack_info_json(self.stack_file_path) - if self.cache_construct: - self.write_construct_info_json(self.construct_file_path) - if self.cache_debug: - self.write_debug_info_json(self.debug_file_path) - - def fill_stack_tensor_data(self): - self.process_stat_data_recursive(self.cache_data) - - def process_stat_data_recursive(self, data, depth=0): - if depth > Const.MAX_DEPTH: - logger.error(f"The maximum depth of recursive process stat data, {Const.MAX_DEPTH} is reached.") - raise MsprobeException(MsprobeException.RECURSION_LIMIT_ERROR) - if isinstance(data, dict): - if "tensor_stat" in data.keys(): - tensor_stat = data["tensor_stat"] - if len(tensor_stat) != Const.TENSOR_STAT_LEN or len(tensor_stat[0]) != len(tensor_stat[1]): - logger.warning("Some bad data in async dump") - else: - tensor_stat_index, tensor_stat_data = tensor_stat[0], tensor_stat[1] - if hasattr(tensor_stat_data, "device") and tensor_stat_data.device != Const.CPU_LOWERCASE: - tensor_stat_data = tensor_stat_data.cpu() - for index, stat in zip(tensor_stat_index, tensor_stat_data): - data.update({index: stat.item()}) - del data["tensor_stat"] - else: - for key in data.keys(): - self.process_stat_data_recursive(data[key], depth + 1) - elif isinstance(data, (list, tuple)): - for i in data: - self.process_stat_data_recursive(i, depth + 1) \ No newline at end of file + with lock: + # 在写 JSON 
前,统一获取统计值 + stat_result = self.flush_stat_stack() + # 遍历 cache_data,将占位符替换为最终统计值 + if stat_result: + self._replace_stat_placeholders(self.cache_data, stat_result) + if self.cache_data: + self.write_data_json(self.dump_file_path) + if self.cache_stack: + self.write_stack_info_json(self.stack_file_path) + if self.cache_construct: + self.write_construct_info_json(self.construct_file_path) + if self.cache_debug: + self.write_debug_info_json(self.debug_file_path) + diff --git a/debug/accuracy_tools/msprobe/core/overflow_check/level.py b/debug/accuracy_tools/msprobe/core/overflow_check/level.py index 2f40468f6551a3787bdae7f9d94a5f66599151a0..0848110178d0effa9b3bc40ae6d4437a800d3f04 100644 --- a/debug/accuracy_tools/msprobe/core/overflow_check/level.py +++ b/debug/accuracy_tools/msprobe/core/overflow_check/level.py @@ -1,22 +1,22 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from enum import Enum - - -class OverflowLevel(Enum): - MEDIUM = "medium" - HIGH = "high" - CRITICAL = "critical" +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from enum import Enum + + +class OverflowLevel(Enum): + MEDIUM = "medium" + HIGH = "high" + CRITICAL = "critical" diff --git a/debug/accuracy_tools/msprobe/docs/02.config_introduction.md b/debug/accuracy_tools/msprobe/docs/02.config_introduction.md index f134bd4536294d209e7b3e6e73fd80b9be61041d..d77a2a7a14c41c4e93bc1bd7af023beba371821f 100644 --- a/debug/accuracy_tools/msprobe/docs/02.config_introduction.md +++ b/debug/accuracy_tools/msprobe/docs/02.config_introduction.md @@ -12,45 +12,46 @@ | 参数 | 解释 | 是否必选 | | ----------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -------- | -| task | dump 的任务类型,str 类型。可选参数:
"statistics":仅采集统计信息,默认值;
"tensor":采集统计信息和完全复刻整网的真实数据;
"run_ut":精度预检,仅 PyTorch 场景支持,采集数据时勿选;
"overflow_check":溢出检测;
"free_benchmark":无标杆比对;
"grad_probe":梯度监控;
"structure":仅采集模型结构以及调用栈信息,不采集具体数据。
根据 task 参数取值的不同,可以配置不同场景参数,详见:
[1.2 task 配置为 statistics](#12-task-配置为-statistics),
[1.3 task 配置为 tensor](#13-task-配置为-tensor),
[1.4 task 配置为 run_ut](#14-task-配置为-run_ut),
[1.5 task 配置为 overflow_check](#15-task-配置为-overflow_check),
[1.6 task 配置为 free_benchmark](#16-task-配置为-free_benchmark),
[1.7 task 配置为 grad_probe](#17-task-配置为-grad_probe)。
**配置示例**:"task": "tensor"。 | 否 | +| task | dump 的任务类型,str 类型。可选参数:
"statistics":仅采集统计信息,默认值;
"tensor":采集统计信息和完全复刻整网的真实数据;
"run_ut":精度预检,仅 PyTorch 场景支持,采集数据时勿选;
"overflow_check":溢出检测;
"free_benchmark":无标杆比对,不支持 MSAdapter 场景;
"grad_probe":梯度监控, 不支持 MSAdapter 场景;
"structure":仅采集模型结构以及调用栈信息,不采集具体数据。
根据 task 参数取值的不同,可以配置不同场景参数,详见:
[1.2 task 配置为 statistics](#12-task-配置为-statistics),
[1.3 task 配置为 tensor](#13-task-配置为-tensor),
[1.4 task 配置为 run_ut](#14-task-配置为-run_ut),
[1.5 task 配置为 overflow_check](#15-task-配置为-overflow_check),
[1.6 task 配置为 free_benchmark](#16-task-配置为-free_benchmark),
[1.7 task 配置为 grad_probe](#17-task-配置为-grad_probe)。
**配置示例**:"task": "tensor"。 | 否 | | dump_path | 设置 dump 数据目录路径,str 类型。
**配置示例**:"dump_path": "./dump_path"。 | 是 | -| rank | 指定对某张卡上的数据进行采集,list[Union[int, str]] 类型,默认未配置(表示采集所有卡的数据),应配置元素为 ≥0 的整数或类似"4-6"的字符串,且须配置实际可用的 Rank ID。
PyTorch 场景: Rank ID 从 0 开始计数,最大取值为所有节点可用卡总数-1,若所配置的值大于实际训练所运行的卡的 Rank ID,则 dump 数据为空,比如当前环境 Rank ID 为 0 到 7,实际训练运行 0 到 3 卡,此时若配置 Rank ID 为 4 或不存在的 10 等其他值,dump 数据为空。
MindSpore 场景:所有节点的 Rank ID 均从 0 开始计数,最大取值为每个节点可用卡总数-1,config.json 配置一次 rank 参数对所有节点同时生效。
注意,单卡训练时,rank必须为[],即空列表,不能指定rank。
**配置示例**:"rank": [1, "4-6"]。 | 否 | +| rank | 指定对某张卡上的数据进行采集,list[Union[int, str]] 类型,默认未配置(表示采集所有卡的数据),应配置元素为 ≥0 的整数或类似"4-6"的字符串,且须配置实际可用的 Rank ID。
PyTorch 场景:Rank ID 从 0 开始计数,最大取值为所有节点可用卡总数-1。若所配置的值大于实际训练所运行卡的 Rank ID,则 dump 数据为空:例如当前环境 Rank ID 为 0 到 7,实际训练仅运行 0 到 3 卡,此时若配置 Rank ID 为 4 或不存在的 10 等值,dump 数据为空。<br>
MindSpore 场景:所有节点的 Rank ID 均从 0 开始计数,最大取值为每个节点可用卡总数-1,config.json 配置一次 rank 参数对所有节点同时生效。静态图 L0 级别 dump 暂不支持指定 rank。<br>
注意,单卡训练时,rank 必须为 [],即空列表,不能指定 rank。<br>
**配置示例**:"rank": [1, "4-6"]。 | 否 | | step | 指定采集某个 step 的数据,list[Union[int, str]] 类型。默认未配置,表示采集所有 step 数据。采集特定 step 时,须指定为训练脚本中存在的 step,可逐个配置,也可以指定范围。
**配置示例**:"step": [0, 1 , 2, "4-6"]。 | 否 | -| level | dump 级别,str 类型,根据不同级别采集不同数据。可选参数:
"L0":dump 模块级精度数据,仅 PyTorch 与 MindSpore 动态图场景支持,使用背景详见 [1.1.1 模块级精度数据 dump 说明](#111-模块级精度数据-dump-说明);
"L1":dump API 级精度数据,默认值,仅 PyTorch 与 MindSpore 动态图场景支持;
"L2":dump kernel 级精度数据,PyTorch场景详细介绍见 [PyTorch 场景的 kernel dump 说明](./04.kernel_dump_PyTorch.md);MindSpore场景详细介绍见 [MindSpore 场景的 kernel dump 说明](./28.kernel_dump_MindSpore.md);
"mix":dump module 模块级和 API 级精度数据,即"L0"+"L1",仅 PyTorch 与 MindSpore 动态图场景支持。
"debug":单点保存功能,细节详见[单点保存工具 README](./28.debugger_save_instruction.md)
**配置示例**:"level": "L1"。 | 否 | +| level | dump 级别,str 类型,根据不同级别采集不同数据。可选参数:
"L0":dump 模块级精度数据,使用背景详见 [1.1.1 模块级精度数据 dump 说明](#111-模块级精度数据-dump-说明);
"L1":dump API 级精度数据,默认值,仅 PyTorch、MSAdapter 以及 MindSpore 均支持;
"L2":dump kernel 级精度数据,PyTorch 场景详细介绍见 [PyTorch 场景的 kernel dump 说明](./04.kernel_dump_PyTorch.md);MindSpore 动态图场景详细介绍见 [MindSpore 动态图场景的 kernel dump 说明](./28.kernel_dump_MindSpore.md);MindSpore 静态图场景详细介绍见《MindSpore 场景的数据采集》中的 ["**8.1 静态图场景**"](./06.data_dump_MindSpore.md#81-静态图场景)小节;
"mix":dump module 模块级和 API 级精度数据,即"L0"+"L1",仅 PyTorch、MSAdapter 以及 MindSpore 动态图场景支持。
"debug":单点保存功能,细节详见[单点保存工具 README](./28.debugger_save_instruction.md)
**配置示例**:"level": "L1"。 | 否 | | enable_dataloader | 自动控制开关,bool 类型,仅 PyTorch 场景支持。可选参数 true(开启)或 false(关闭),默认为 false。配置为 true 后自动识别 step 参数指定的迭代,并在该迭代执行完成后退出训练,此时 start、stop 和 step 函数可不配置,开启该开关要求训练脚本是通过 torch.utils.data.dataloader 方式加载数据。仅支持 PyTorch 单卡训练使用,分布式训练场景下存在数据 dump 不全问题。 **这个特性下个版本将被废弃** | 否 | | async_dump | 异步 dump 开关,bool 类型。可选参数 true(开启)或 false(关闭),默认为 false。配置为 true 后开启异步 dump,即采集的精度数据会在当前 step 训练结束后统一落盘,训练过程中工具不触发同步操作。由于使用该模式有**显存溢出**的风险,当 task 配置为 tensor 时,即真实数据的异步dump模式,必须配置 [list](#13-task-配置为-tensor) 参数,指定需要 dump 的 tensor 。该模式暂不支持复数类型 tensor
的统计量计算。 | 否 | #### 1.1.1 模块级精度数据 dump 说明 -仅 PyTorch 与 MindSpore 动态图场景支持。 +PyTorch 与 MindSpore 均支持。 大模型场景下,通常不是简单的利用自动迁移能力实现从 GPU 到 NPU 的训练脚本迁移,而是会对 NPU 网络进行一系列针对性的适配,因此,常常会造成迁移后的 NPU 模型存在部分子结构不能与 GPU 原始模型完全对应。模型结构不一致导致 API 调用类型及数量不一致,若直接按照 API 粒度进行精度数据 dump 和比对,则无法完全比对所有的 API。 本小节介绍的功能是对模型中的大粒度模块进行数据 dump,使其比对时,对于无法以 API 粒度比对的模块可以直接以模块粒度进行比对。 -模块指的是继承 nn.Module 类(PyTorch场景)或 nn.Cell 类(MindSpore场景)的子类,通常情况下这类模块就是一个小模型,可以被视为一个整体,dump 数据时以模块为粒度进行 dump。 - +模块指的是继承 nn.Module 类(PyTorch 与 MSAdapter 场景)或 nn.Cell 类(MindSpore 场景)的子类,通常情况下这类模块就是一个小模型,可以被视为一个整体,dump 数据时以模块为粒度进行 dump。 ### 1.2 task 配置为 statistics - - - - + + - - - + + + + +
参数 | 解释 | 是否必选
scope | PyTorch 和 MindSpore 动态图场景 dump 范围,list[str] 类型,默认未配置(list 也未配置时表示 dump 所有 API 的数据)。该参数可以在 [ ] 内配置两个模块名或 API 名,要求列表长度必须为2,需要配置按照工具命名格式的完整模块名或API名称,用于锁定区间,dump 该范围内的数据。
配置示例: +
scope | PyTorch、MSAdapter 以及 MindSpore 动态图场景 dump 范围,list[str] 类型,默认未配置(list 也未配置时表示 dump 所有 API 的数据)。该参数可以在 [ ] 内配置两个模块名或 API 名,要求列表长度必须为2,需要配置按照工具命名格式的完整模块名或API名称,用于锁定区间,dump 该范围内的数据。
配置示例: "scope": ["Module.conv1.Conv2d.forward.0", "Module.fc2.Linear.forward.0"], 或 "scope": ["Cell.conv1.Conv2d.forward.0", "Cell.fc2.Dense.backward.0"], 或"scope": ["Tensor.add.0.forward", "Functional.square.2.forward"]。与 level 参数取值相关,level 为 L0 级别时,可配置模块名;level 为 L1 级别时,可配置 API 名, level为 mix 级别时,可配置为模块名或API名。
list | 自定义采集的算子列表,list[str] 类型,默认未配置(scope 也未配置时表示 dump 所有 API 的数据),包含以下配置方法:
PyTorch 和 MindSpore 动态图场景配置具体的 API 全称,dump 该 API 数据。在 PyTorch 场景,如果 level 配置成 L2,该配置为必填项。
配置示例:"list": ["Tensor.permute.1.forward", "Tensor.transpose.2.forward", "Torch.relu.3.backward"]。
PyTorch 和 MindSpore 动态图场景在level为 mix 级别时可以配置模块名称,dump该模块展开数据 (dump该模块从执行开始到执行结束期间的所有数据)。 +
PyTorch、MSAdapter 以及 MindSpore 动态图场景配置具体的 API 全称,dump 该 API 数据。在 PyTorch 场景,如果 level 配置成 L2,该配置为必填项。
配置示例:"list": ["Tensor.permute.1.forward", "Tensor.transpose.2.forward", "Torch.relu.3.backward"]。
PyTorch 和 MindSpore 动态图场景在level为 mix 级别时可以配置模块名称,dump该模块展开数据 (dump该模块从执行开始到执行结束期间的所有数据)。
配置示例:"list": ["Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0"], 或 "list": ["Cell.network_with_loss.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0"]
PyTorch 和 MindSpore 动态图场景指定某一类 API,dump 某一类的 API 级别输入输出数据。
配置示例:"list": ["relu"]。
PyTorch 和 MindSpore 动态图场景在level为 mix 级别时, 会dump名称中包含list中配置的字符串的API数据,还会将名称中包含list中配置的字符串的模块进行展开dump (dump该模块从执行开始到执行结束期间的所有数据)。
MindSpore 静态图场景配置 kernel_name,可以是算子的名称列表,也可以指定算子类型("level": "L2"时不支持),还可以配置算子名称的正则表达式(当字符串符合“name-regex(xxx)”格式时,后台则会将其作为正则表达式。
配置示例:list: ["name-regex(Default/.+)"]
可匹配算子名称以“Default/”开头的所有算子。
PyTorch、MSAdapter 以及 MindSpore 动态图场景指定某一类 API,dump 某一类的 API 级别输入输出数据。
配置示例:"list": ["relu"]。
PyTorch、MSAdapter 以及 MindSpore 动态图场景在level为 mix 级别时, 会dump名称中包含list中配置的字符串的API数据,还会将名称中包含list中配置的字符串的模块进行展开dump (dump该模块从执行开始到执行结束期间的所有数据)。
MindSpore 静态图场景配置 kernel_name,可以是算子的名称列表,也可以指定算子类型(jit_level=O2 时不支持),还可以配置算子名称的正则表达式(当字符串符合“name-regex(xxx)”格式时,后台则会将其作为正则表达式)。
配置示例:list: ["name-regex(Default/.+)"]
可匹配算子名称以“Default/”开头的所有算子。
data_mode | dump 数据过滤,str 类型。
PyTorch 与 MindSpore 动态图场景:支持"all"、"forward"、"backward"、"input"和"output",除"all"外,其余参数可以自由组合。默认为["all"],即保存所有 dump 的数据。
配置示例:"data_mode": ["backward"] (仅保存反向数据)或 "data_mode": ["forward", "input"](仅保存前向的输入数据)。
MindSpore 静态图场景:仅支持"all"、"input"和"output"参数,且各参数只能单独配置,不支持自由组合。
配置示例:"data_mode": ["all"]。
summary_mode | 控制 dump 文件输出的模式,str 类型,仅 PyTorch 与 MindSpore 动态图场景支持,可选参数:
md5:dump 输出包含 CRC-32 值以及 API 统计信息的 dump.json 文件,用于验证数据的完整性;
statistics:dump 仅输出包含 API 统计信息的 dump.json 文件,默认值。
配置示例:"summary_mode": "md5"。
MindSpore静态图jit_level=O2场景L2级dump,支持上述配置的同时额外支持配置统计项列表,可选统计项为max、min、mean、l2norm,可从中任意选取组合搭配。其中mean、l2norm的结果为float数据格式。
配置示例:"summary_mode": ["max", "min"]。
PyTorch、MSAdapter 以及 MindSpore 动态图场景:支持"all"、"forward"、"backward"、"input"和"output",除"all"外,其余参数可以自由组合。默认为["all"],即保存所有 dump 的数据。
配置示例:"data_mode": ["backward"] (仅保存反向数据)或 "data_mode": ["forward", "input"](仅保存前向的输入数据)。
MindSpore 静态图场景:L0 级别 dump 仅支持"all"、"forward"和"backward"参数;L2 级别 dump 仅支持"all"、"input"和"output"参数,且各参数只能单独配置,不支持自由组合。
配置示例:"data_mode": ["all"]。
summary_mode | 控制 dump 文件输出的模式,str 类型,支持 PyTorch、MSAdapter、MindSpore 动态图以及 MindSpore 静态图 jit_level=O2 场景。
PyTorch、MSAdapter 以及 MindSpore 动态图场景:可选参数为
md5:dump 输出包含 CRC-32 值以及 API 统计信息的 dump.json 文件,用于验证数据的完整性;
statistics:dump 仅输出包含 API 统计信息的 dump.json 文件,默认值。
配置示例:"summary_mode": "md5"。
MindSpore 静态图 jit_level=O2 场景:支持上述配置的同时额外支持配置统计项列表,可选统计项为max、min、mean、l2norm,可从中任意选取组合搭配。其中mean、l2norm的结果为float数据格式。
配置示例:"summary_mode": ["max", "min"]。
-**说明**:"summary_mode"配置为"md5"时,所使用的校验算法为CRC-32算法。 +**说明**:"summary_mode" 配置为 "md5" 时,所使用的校验算法为 CRC-32 算法。 ### 1.3 task 配置为 tensor @@ -86,16 +87,16 @@ ### 1.5 task 配置为 overflow_check -PyTorch 与 MindSpore 动态图场景下,"level"须为"L0"或"L1";MindSpore 静态图场景下,"level"须为"L2",且模型编译优化等级(jit_level)须为"O2"。 +PyTorch、MSAdapter 以及 MindSpore 动态图场景下,"level"须为"L0"或"L1";MindSpore 静态图场景下,"level"须为"L2",且模型编译优化等级(jit_level)须为"O2"。 | 参数 | 解释 | 是否必选 | | ------------- | ---------------------- | -------- | -| overflow_nums | 最大溢出次数,int 类型,默认为 1,仅 PyTorch 与 MindSpore 动态图场景支持。表示第 N 次溢出后,不再进行溢出检测。过程中检测到溢出 API 对应的 输入输出 数据均 dump。
**配置示例**:"overflow_nums": 3。配置为 -1 时,表示持续检测溢出直到训练结束。 | 否 | -| check_mode | 溢出类型,str 类型,仅 MindSpore 场景支持,可选参数:
"aicore":开启 AI Core 的溢出检测,不支持 MindSpore v2.3.0 以上版本;
"atomic":开启 Atomic 的溢出检测,不支持 MindSpore v2.3.0 以上版本;
"all":开启算子的溢出检测,默认值。
**配置示例**:"check_mode": "all"。 | 否 | +| overflow_nums | 最大溢出次数,int 类型,默认为 1,仅 PyTorch、MSAdapter 以及 MindSpore 动态图场景支持。表示第 N 次溢出后,不再进行溢出检测。过程中检测到溢出 API 对应的 输入输出 数据均 dump。
**配置示例**:"overflow_nums": 3。配置为 -1 时,表示持续检测溢出直到训练结束。 | 否 | +| check_mode | 溢出类型,str 类型,仅 MindSpore v2.3.0 以下版本的静态图场景支持,可选参数:
"aicore":开启 AI Core 的溢出检测;
"atomic":开启 Atomic 的溢出检测;
"all":开启算子的溢出检测,默认值。
**配置示例**:"check_mode": "all"。 | 否 | ### 1.6 task 配置为 free_benchmark -仅 PyTorch 场景与 MindSpore 动态图场景支持,且"level"为"L1"。 +仅 PyTorch 与 MindSpore 动态图场景支持,且"level"为"L1"。 - task 配置为 free_benchmark 时,开启**无标杆比对**,在 NPU 环境下通过对当前模型 API 的输入添加扰动因子,二次执行,将得到的输出与未添加扰动因子前的输出进行比对,从而**得出该模型中可能存在因迁移等变化导致精度降低的 API**。 diff --git a/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md b/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md index db9a989c9d1c731fd9099d311f3ab3b95e5c7d5d..be9386df1d906330f76b536f19ef1a13d1553754 100644 --- a/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md +++ b/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md @@ -183,10 +183,25 @@ save(variable, name, save_backward=True) **参数说明**: | 参数名称 | 参数含义 | 支持数据类型 | 是否必选| | ---------- | ------------------| ------------------- | ------------------- | -| variable | 需要保存的变量 |dict, list, torch.tensor, int, float, str | 是 | +| variable | 需要保存的变量 |dict, list, tuple, torch.tensor, int, float, str | 是 | | name | 指定的名称 | str | 是 | | save_backward | 是否保存反向数据 | boolean | 否 | +### 1.10 set_init_step + +**功能说明**:设置起始step数,step数默认从0开始计数,使用该接口后step从指定值开始计数。该函数需在 **start** 函数调用前使用,建议写在训练迭代的循环开始前。 + +**原型**: + +```Python +debugger.set_init_step(step) +``` + +**参数说明**: + +1.step: 指定的起始step数。 + + ## 2 示例代码 ### 2.1 快速上手 @@ -355,7 +370,7 @@ if __name__ == "__main__": ``` * `rank`:设备 ID,每张卡的数据保存在对应的 `rank{ID}` 目录下。非分布式场景下没有 rank ID,目录名称为 rank。 * `dump_tensor_data`:保存采集到的张量数据。 -* `dump.json`: 保存API或Module前反向数据的统计量信息。包含dump数据的API名称或Module名称,各数据的dtype、 shape、max、min、mean、L2norm(L2范数,平方根)统计信息以及当配置summary_mode="md5"时的CRC-32数据。具体介绍可参考[dump.json文件说明](./27.dump_json_instruction.md#1-dumpjson文件介绍pytorch)。 +* `dump.json`: 保存API或Module前反向数据的统计量信息。包含dump数据的API名称或Module名称,各数据的dtype、 shape、max、min、mean、L2norm(L2范数,平方根)统计信息以及当配置summary_mode="md5"时的CRC-32数据。具体介绍可参考[dump.json文件说明](./27.dump_json_instruction.md#1-PyTorch场景下的dump.json文件)。 * `stack.json`:API/Module的调用栈信息。 * `construct.json`:分层分级结构,level为L1时,construct.json内容为空。 diff --git a/debug/accuracy_tools/msprobe/docs/06.data_dump_MindSpore.md b/debug/accuracy_tools/msprobe/docs/06.data_dump_MindSpore.md index f7507facd2a92f3acbefdc92fa6cd808a155d6e3..ef04ea9cc0d5d8ff6922e328286448dc8c1fe148 100644 --- a/debug/accuracy_tools/msprobe/docs/06.data_dump_MindSpore.md +++ b/debug/accuracy_tools/msprobe/docs/06.data_dump_MindSpore.md @@ -30,8 +30,10 @@ dump 的"tensor"模式采集数据量大小,可以参考[数据量基线](data ## 5. 
场景介绍

-### 5.1 静态图场景
-在静态图场景下,msprobe 仅支持 **L2 Level** 的数据采集。
+### 5.1 静态图场景
+在静态图场景下,msprobe 支持 **L0 Level** 和 **L2 Level** 的数据采集。且当 MindSpore 版本高于 2.5.0 时,若需采集 **L2 Level** 数据,必须使用编包时添加了`--include-mod=adump`选项的 mindstudio-probe whl 包进行 msprobe 工具安装。
+- **L0 Level(Cell 级)** :采集 `Cell` 对象的数据,适用于需要分析特定网络模块的情况。
+- **L2 Level(Kernel 级)** :采集底层算子的输入输出数据,适用于深入分析算子级别的精度问题。

采集方式请参见[示例代码 > 静态图场景](#71-静态图场景)。详细介绍请参见[《config.json 配置文件介绍》](./02.config_introduction.md#11-通用配置)中的“level 参数”和[《config.json 配置示例》](./03.config_examples.md#2-mindspore-静态图场景) 中的“MindSpore 静态图场景”。

@@ -110,7 +112,7 @@ stop()

**功能说明**:结束一个 step 的数据采集,完成所有数据落盘并更新 dump 参数。在一个 step 结束的位置添加,且必须在 **stop** 函数之后的位置调用。该函数需要配合 **start** 和 **stop** 函数使用,尽量添加在反向计算代码之后,否则可能会导致反向数据丢失。

-**仅未使用 Model 高阶 API 的动态图场景支持。**
+**仅未使用 Model 高阶 API 的动态图和静态图场景支持。**

**原型**:

@@ -144,15 +146,28 @@ save(variable, name, save_backward=True)

**参数说明**:
| 参数名称 | 参数含义 | 支持数据类型 | 是否必选|
| ---------- | ------------------| ------------------- | ------------------- |
-| variable | 需要保存的变量 |dict, list, torch.tensor, int, float, str | 是 |
+| variable | 需要保存的变量 |dict, list, tuple, torch.tensor, int, float, str | 是 |
| name | 指定的名称 | str | 是 |
| save_backward | 是否保存反向数据 | boolean | 否 |

+#### 6.1.6 set_init_step
+
+**功能说明**:设置起始step数,step数默认从0开始计数,使用该接口后step从指定值开始计数。该函数需在 **start** 函数调用前使用,建议写在训练迭代的循环开始前。
+
+**原型**:
+
+```Python
+set_init_step(step)
+```
+
+**参数说明**:
+
+1. step:指定的起始step数。

-### 6.2 msprobe.mindspore.common.utils.MsprobeStep
+### 6.2 msprobe.mindspore.MsprobeStep

-**功能说明**:MindSpore Callback类,自动在每个step开始时调用start()接口,在每个step结束时调用stop()、step()接口。实现使用 Model 高阶 API 的动态图场景下 L0、L1、mix 级别的精度数据采集控制,控制粒度为单个 **Step** ,而 PrecisionDebugger.start, PrecisionDebugger.stop 接口的控制粒度任意训练代码段。
+**功能说明**:MindSpore Callback类,自动在每个step开始时调用start()接口,在每个step结束时调用stop()、step()接口。实现使用 Model 高阶 API 的动态图场景下 L0、L1、mix 级别,和静态图场景下 L0 级别的精度数据采集控制,控制粒度为单个 **Step** ,而 PrecisionDebugger.start, PrecisionDebugger.stop 接口的控制粒度为任意训练代码段。

**原型**:

@@ -164,7 +179,17 @@ MsprobeStep(debugger)

**参数说明**:

1. debugger:PrecisionDebugger对象。

-### 6.3 msprobe.mindspore.seed_all
+### 6.3 msprobe.mindspore.MsprobeInitStep
+
+**功能说明**:MindSpore Callback 类,自动获取并设置初始 step 值。仅适用于静态图 O0/O1 模式的断点续训场景。
+
+**原型**:
+
+```Python
+MsprobeInitStep()
+```
+
+### 6.4 msprobe.mindspore.seed_all

**功能说明**:用于固定网络中的随机性和开启确定性计算。

@@ -181,12 +206,59 @@ seed_all(seed=1234, mode=False, rm_dropout=True)

3. rm_dropout:控制dropout失效的开关。可配置 True 或 False,默认值:True,非必选。参数示例:rm_dropout=True。该参数设置为 True 后,将会使mindspore.ops.Dropout,mindspore.ops.Dropout2D,mindspore.ops.Dropout3D,mindspore.mint.nn.Dropout和mindspore.mint.nn.functional.dropout失效,以避免因随机dropout造成的网络随机性。建议在采集mindspore数据前开启。注意:通过rm_dropout控制dropout失效或生效需要在初始化Dropout实例前调用才能生效。

+## 7. 示例代码
+
+### 7.1 静态图场景
+#### 7.1.1 L0 级别
+**说明**: 静态图 L0 级别的Dump功能是基于mindspore.ops.TensorDump算子实现。在Ascend平台上的Graph模式下,可以通过设置环境变量 [MS_DUMP_SLICE_SIZE 和 MS_DUMP_WAIT_TIME](https://www.mindspore.cn/docs/zh-CN/r2.5.0/api_python/env_var_list.html) 解决在输出大Tensor或输出Tensor比较密集场景下算子执行失败的问题。

-## 7. 示例代码
+##### 7.1.1.1 未使用 Model 高阶 API

-### 7.1 静态图场景
+
+```python
+import mindspore as ms
+ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
+
+from msprobe.mindspore import PrecisionDebugger
+debugger = PrecisionDebugger(config_path="./config.json")
+
+# 模型、损失函数的定义以及初始化等操作
+# ...
+model = Network()
+# 数据集迭代的地方往往是模型开始训练的地方
+for data, label in data_loader:
+    debugger.start(model) # 进行 L0 级别下 Cell 对象的数据采集时调用
+    # 如下是模型每个 step 执行的逻辑
+    grad_net = ms.grad(model)(data)
+    # ...
+ debugger.step() # 更新迭代数 +``` + +##### 7.1.1.2 使用 Model 高阶 API + + +```python +import mindspore as ms +from mindspore.train import Model +ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend") + +from msprobe.mindspore import PrecisionDebugger +from msprobe.mindspore.common.utils import MsprobeStep +debugger = PrecisionDebugger(config_path="./config.json") + +# 模型、损失函数的定义以及初始化等操作 +# ... + +model = Network() +# 进行 L0 级别下 Cell 对象的数据采集时调用 +debugger.start(model) +trainer = Model(model, loss_fn=loss_fn, optimizer=optimizer, metrics={'accuracy'}) +trainer.train(1, train_dataset, callbacks=[MsprobeStep(debugger)]) +``` + +#### 7.1.2 L2 级别 ```python import mindspore as ms @@ -301,11 +373,14 @@ trainer.train(1, train_dataset) ### 8.1 静态图场景 -训练结束后,数据将保存在 `dump_path` 指定的目录下。 +训练结束后,数据将保存在 `dump_path` 指定的目录下。
+L0 级别 dump 的目录结构与动态图场景下目录结构一致。
+L2 级别 dump 的目录结构如下所示: -若jit_level=O2,且使用mindstudio-probe发布包或源码编包时添加了`--include-mod=adump`选项,目录结构示例如下: +若jit_level=O2,MindSpore 版本不低于 2.5.0,且使用mindstudio-probe发布包或源码编包时添加了`--include-mod=adump`选项,目录结构示例如下: ``` ├── dump_path +│ ├── acl_dump_{device_id}.json │ ├── rank_0 │ | ├── {timestamp} │ | │ ├── step_0 @@ -329,9 +404,9 @@ trainer.train(1, train_dataset) **说明** 1. 若配置文件中指定落盘npy格式,但是实际数据格式不在npy支持范围内(如bf16、int4等),则该tensor会以原始码流落盘,并不会转换为npy格式。 2. 若原始文件全名长度超过255个字符,则文件基础名会被转换为长度为32位的随机数字字符串,原始文件名与转换后文件名的对应关系会保存在同目录下的`mapping.csv`文件中。 +3. acl_dump_{device_id}.json 为在 Dump 接口调用过程中生成的中间文件,一般情况下无需关注。 - -其他场景请参见 MindSpore 官方文档中的[数据对象目录](https://www.mindspore.cn/docs/zh-CN/r2.4.0/model_train/debug/dump.html)。 +其他场景下,除 kernel_kbyk_dump.json(jit_level=O0/O1)、kernel_graph_dump.json(jit_level=O2)等无需关注的中间文件外的其他 dump 结果文件请参见 MindSpore 官方文档中的[ Ascend 下 O0/O1 模式 Dump 数据对象目录和数据文件介绍](https://www.mindspore.cn/docs/zh-CN/r2.5.0/model_train/debug/dump.html#%E6%95%B0%E6%8D%AE%E5%AF%B9%E8%B1%A1%E7%9B%AE%E5%BD%95%E5%92%8C%E6%95%B0%E6%8D%AE%E6%96%87%E4%BB%B6%E4%BB%8B%E7%BB%8D)与[ Ascend 下 O2 模式 Dump 数据对象目录和数据文件介绍](https://www.mindspore.cn/docs/zh-CN/r2.5.0/model_train/debug/dump.html#%E6%95%B0%E6%8D%AE%E5%AF%B9%E8%B1%A1%E7%9B%AE%E5%BD%95%E5%92%8C%E6%95%B0%E6%8D%AE%E6%96%87%E4%BB%B6%E4%BB%8B%E7%BB%8D-1)。 ### 8.2 动态图场景 @@ -372,7 +447,7 @@ dump 结果目录结构示例如下: * `rank`:设备 ID,每张卡的数据保存在对应的 `rank{ID}` 目录下。非分布式场景下没有 rank ID,目录名称为 rank。 * `dump_tensor_data`:保存采集到的张量数据。 -* `dump.json`: 保存API或Cell前反向数据的统计量信息。包含dump数据的API名称或Cell名称,各数据的dtype、 shape、max、min、mean、L2norm(L2范数,平方根)统计信息以及当配置summary_mode="md5"时的CRC-32数据。具体介绍可参考[dump.json文件说明](./27.dump_json_instruction.md#2-dumpjson文件示例mindspore)。 +* `dump.json`: 保存API或Cell前反向数据的统计量信息。包含dump数据的API名称或Cell名称,各数据的dtype、 shape、max、min、mean、L2norm(L2范数,平方根)统计信息以及当配置summary_mode="md5"时的CRC-32数据。具体介绍可参考[dump.json文件说明](./27.dump_json_instruction.md#2-MindSpore场景下的dump.json文件)。 * `stack.json`:API/Cell的调用栈信息。 * `construct.json`:分层分级结构,level为L1时,construct.json内容为空。 @@ -411,3 +486,6 @@ ops: - adaptive_avg_pool2d - adaptive_avg_pool3d ``` +### 9.2 不支持模型 + +静态图场景L0级暂不支持Yi模型。 \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/docs/07.accuracy_checker_PyTorch.md b/debug/accuracy_tools/msprobe/docs/07.accuracy_checker_PyTorch.md index b07568e25a2915a4e8e5c2157e7de4252410f38d..ba7f978b09a4ffbc25d6525492aeeda43a698279 100644 --- a/debug/accuracy_tools/msprobe/docs/07.accuracy_checker_PyTorch.md +++ b/debug/accuracy_tools/msprobe/docs/07.accuracy_checker_PyTorch.md @@ -107,7 +107,7 @@ msprobe -f pytorch multi_run_ut -api_info ./dump_path/step{step_number}/rank{ran | -save_error_data | 保存精度未达标的 API 输入输出数据。 | 否 | | -o 或 --out_path | 指定 run_ut 执行结果存盘路径,默认“./”。 | 否 | | -j 或 --jit_compile | 开启 jit 编译。 | 否 | -| -n | 同时执行 run_ut 线程的数量,默认为 8,最大支持 64,但每个 Device 最大支持 8 个线程。当指定多个线程和多个 Device 时,线程数在每张卡上均分。 | 否 | +| -n 或 --num_splits | 同时执行 run_ut 线程的数量,默认为 8,最大支持 64,但每个 Device 最大支持 8 个线程。当指定多个线程和多个 Device 时,线程数在每张卡上均分。 | 否 | | -d 或 --device | 指定 Device ID,选择 UT 代码运行所在的卡,默认值为 0,支持同时指定 0~7,共 8 个 Device。 | 否 | | -csv_path 或 --result_csv_path | 指定本次运行中断时生成的 `accuracy_checking_result_{timestamp}.csv` 文件路径,执行 run_ut 中断时,若想从中断处继续执行,配置此参数即可。需要指定为上次中断的 `accuracy_checking_result_{timestamp}.csv` 文件。详见 [3.3 断点续检](#33-断点续检)。 | run_ut 操作中断后继续执行场景下必须配置 | | -f 或 --filter_api | 过滤模型中除最大值和最小值以外其他参数和结构相同的 API。适用于模型较大且重复 API 较多的场景。 | 否 | diff --git a/debug/accuracy_tools/msprobe/docs/09.accuracy_checker_MindSpore.md b/debug/accuracy_tools/msprobe/docs/09.accuracy_checker_MindSpore.md index 
8e5ab781ce0652ea572e0a0e5fb053655c5f48ec..3bf65032edae2b8e35c5818d5c030c9ce4c79e95 100644
--- a/debug/accuracy_tools/msprobe/docs/09.accuracy_checker_MindSpore.md
+++ b/debug/accuracy_tools/msprobe/docs/09.accuracy_checker_MindSpore.md
@@ -2,7 +2,7 @@

## 1 简介

-**MindSpore 动态图精度预检**a通过扫描昇腾 NPU 上用户训练 MindSpore 模型中的所有 Mint API,输出精度情况的诊断和分析。工具以模型中所有 Mint API 前反向的 dump 结果为输入,构造相应的 API 单元测试,将 NPU 输出与标杆(CPU 高精度)比对,计算对应的精度指标,从而找出 NPU 中存在精度问题的 Mint API。本工具支持**随机生成模式和真实数据模式**b。
+**MindSpore 动态图精度预检**a通过扫描昇腾 NPU 上用户训练 MindSpore 模型中的所有 Mint API 以及 MSAdapter 场景下迁移的 MindSpore API,输出精度情况的诊断和分析。工具以模型中所有 API 前反向的 dump 结果为输入,构造相应的 API 单元测试,将 NPU 输出与标杆(CPU 高精度)比对,计算对应的精度指标,从而找出 NPU 中存在精度问题的 API。本工具支持**随机生成模式和真实数据模式**b。

a. 支持 Mindspore 版本:2.4/2.5;
diff --git a/debug/accuracy_tools/msprobe/docs/10.accuracy_compare_PyTorch.md b/debug/accuracy_tools/msprobe/docs/10.accuracy_compare_PyTorch.md
index b4525d738d849a17ca5049bd2214784c6f788d21..6f886215b0a389582bc3cc4c31943f76e6a414a3 100644
--- a/debug/accuracy_tools/msprobe/docs/10.accuracy_compare_PyTorch.md
+++ b/debug/accuracy_tools/msprobe/docs/10.accuracy_compare_PyTorch.md
@@ -257,11 +257,11 @@ PyTorch 精度比对是以 CPU 或 GPU 的计算结果为标杆,通过计算

统计量有 4 种:最大值(max)、最小值(min)、平均值(mean)和 L2-范数(L2 norm)。

-|dump 数据模式|Cosine (tensor 余弦相似度)|MaxAbsErr (tensor 最大绝对误差)|MaxRelativeErr (tensor 最大相对误差)|One Thousandth Err Ratio (tensor 相对误差小于千分之一的比例)|Five Thousandth Err Ratio (tensor 相对误差小于千分之五的比例)|NPU 和 bench 的统计量绝对误差 (max, min, mean, L2 norm) diff| NPU 和 bench 的统计量相对误差 (max, min, mean, L2 norm) RelativeErr |NPU 和 bench 的统计量 (max, min, mean, L2 norm)|NPU MD5 (NPU 数据 CRC-32 值)|BENCH MD5 (bench 数据 CRC-32 值)|Result (比对结果)|Accuracy Reached or Not (计算精度是否达标)|Err_message (错误信息提示)|NPU_Stack_Info (堆栈信息)|Data_Name (NPU 真实数据名)|
-|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|真实数据模式|√|√|√|√|√|||√||||√|√|√|√|
-|统计数据模式||||||√|√|√|||√||√|√||
-|MD5 模式|||||||||√|√|√|||√||
+|dump 数据模式|Cosine (tensor 余弦相似度)|EucDist (tensor 欧式距离)|MaxAbsErr (tensor 最大绝对误差)|MaxRelativeErr (tensor 最大相对误差)|One Thousandth Err Ratio (tensor 相对误差小于千分之一的比例)|Five Thousandth Err Ratio (tensor 相对误差小于千分之五的比例)|NPU 和 bench 的统计量绝对误差 (max, min, mean, L2 norm) diff| NPU 和 bench 的统计量相对误差 (max, min, mean, L2 norm) RelativeErr |NPU 和 bench 的统计量 (max, min, mean, L2 norm)|NPU MD5 (NPU 数据 CRC-32 值)|BENCH MD5 (bench 数据 CRC-32 值)|Result (比对结果)|Accuracy Reached or Not (计算精度是否达标)|Err_message (错误信息提示)|NPU_Stack_Info (堆栈信息)| Data_Name ([NPU真实数据名,Bench真实数据名]) |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---------------------------------:|
+|真实数据模式|√|√|√|√|√|√|||√||||√|√|√| √ |
+|统计数据模式|||||||√|√|√|||√||√|√| |
+|MD5 模式||||||||||√|√|√|||√| |

上表中NPU_Stack_Info字段需要配置-s参数生成。

@@ -320,7 +320,7 @@ MD5 模式:
5. "This is empty data, can not compare.":读取到的数据为空(真实数据模式);
6. "Shape of NPU and bench Tensor do not match. Skipped.":NPU 和 Bench 的数据结构不一致(真实数据模式);
7. "The Position of inf or nan in NPU and bench Tensor do not match.":NPU 和 Bench 的数据有 nan/inf(真实数据模式);
-8. "This is type of 0-d tensor, can not calculate 'Cosine', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'.":NPU 为0维张量(真实数据模式);
+8. "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'.":NPU 为0维张量(真实数据模式);
9. "Dtype of NPU and bench Tensor do not match.":NPU 和 Bench 数据的数据类型不同(真实数据模式);
10. "":除以上情况的其余情况(真实数据模式、统计数据模式)。

@@ -330,13 +330,15 @@ MD5 模式:

1. 
Cosine:通过计算两个向量的余弦值来判断其相似度,数值越接近于 1 说明计算出的两个张量越相似,实际可接受阈值为大于 0.99。在计算中可能会存在 nan,主要由于可能会出现其中一个向量为 0。 -2. MaxAbsErr:当最大绝对误差越接近 0 表示其计算的误差越小,实际可接受阈值为小于 0.001。 +2. EucDist:通过计算两个向量的欧式距离来判断其相似度,定义为多维空间中两个点之间的绝对距离。数值越接近0,张量越相似,数值越大,差异越大。 -3. MaxRelativeErr:当最大相对误差越接近 0 表示其计算的误差越小。 +3. MaxAbsErr:当最大绝对误差越接近 0 表示其计算的误差越小,实际可接受阈值为小于 0.001。 + +4. MaxRelativeErr:当最大相对误差越接近 0 表示其计算的误差越小。 当 dump 数据中存在 0 或 Nan 时,比对结果中最大相对误差则出现 inf 或 Nan 的情况,属于正常现象。 -4. One Thousandth Err Ratio(相对误差小于千分之一的元素比例)、Five Thousandths Err Ratio(相对误差小于千分之五的元素比例)精度指标:是指 NPU 的 Tensor 中的元素逐个与对应的标杆数据对比,相对误差小于千分之一、千分之五的比例占总元素个数的比例。该数据仅作为精度下降趋势的参考,并不参与计算精度是否通过的判定。 +5. One Thousandth Err Ratio(相对误差小于千分之一的元素比例)、Five Thousandths Err Ratio(相对误差小于千分之五的元素比例)精度指标:是指 NPU 的 Tensor 中的元素逐个与对应的标杆数据对比,相对误差小于千分之一、千分之五的比例占总元素个数的比例。该数据仅作为精度下降趋势的参考,并不参与计算精度是否通过的判定。 ## 4 多卡比对结果提取汇总通信算子数据 diff --git a/debug/accuracy_tools/msprobe/docs/11.accuracy_compare_MindSpore.md b/debug/accuracy_tools/msprobe/docs/11.accuracy_compare_MindSpore.md index 1b1824a774f15a86106585669d5f3412b3faca2e..c4e50f82bb740568903189df7336bd9341f61e97 100644 --- a/debug/accuracy_tools/msprobe/docs/11.accuracy_compare_MindSpore.md +++ b/debug/accuracy_tools/msprobe/docs/11.accuracy_compare_MindSpore.md @@ -16,8 +16,10 @@ msprobe精度比对工具主要用于如下场景: - MindSpore与PyTorch跨框架比对 - 通过对同一个网络模型,在整网环境下分别在MindSpore动态图和PyTorch环境下获得API dump数据,以PyTorch数据作为标杆,进行自动比对,从而实现跨框架的精度对比。 - 通过对同一个网络模型,在整网环境下分别在MindSpore动态图和PyTorch环境下获得cell dump数据,由用户指定可以比对的cell list,以PyTorch数据作为标杆,进行自动比对,从而实现跨框架的精度对比。 + - 通过对同一个网络模型,在整网环境下分别在MindSpore静态图和PyTorch环境下获得cell dump数据,由用户指定可以比对的cell list,以PyTorch数据作为标杆,进行自动比对,从而实现跨框架的精度对比。 - 通过对同一个网络模型,在整网环境下分别在MindSpore动态图和PyTorch环境下获得API或模块dump数据,由用户指定可以比对的API或模块,以PyTorch数据作为标杆,进行自动比对,从而实现跨框架的精度对比。 - 通过对同一个网络模型,在整网环境下分别在MindSpore动态图和PyTorch环境下获得API或模块dump数据,由用户指定可以比对的模型代码中的Layer层,以PyTorch数据作为标杆,进行自动比对,从而实现跨框架的精度对比。 + - 通过对同一个网络模型,在整网环境下分别在MindSpore静态图和PyTorch环境下获得模块dump数据,由用户指定可以比对的模型代码中的Layer层,以PyTorch数据作为标杆,进行自动比对,从而实现跨框架的精度对比。 执行精度比对操作需要安装msprobe工具。详见《[MindStudio精度调试工具](../README.md)》的“工具安装”章节。 @@ -35,17 +37,17 @@ msprobe -f mindspore compare -i ./compare.json -o ./output -s **完整参数说明** -| 参数名 | 说明 | 是否必选 | -| -------------------- | ------------------------------------------------------------ | -------- | -| -i或--input_path | 指定比对文件。比对文件内容及示例请参见[比对文件](#31-比对文件)或[比对文件(kernel)](#32-比对文件kernel)(比对文件(kernel)仅[不同版本下的全量kernel比对](#23-不同版本下的全量kernel比对)场景支持)。 | 是 | +| 参数名 | 说明 | 是否必选 | +| -------------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -------- | +| -i或--input_path | 指定比对文件。比对文件内容及示例请参见[比对文件](#31-比对文件)或[比对文件(kernel)](#32-比对文件kernel)(比对文件(kernel)仅[不同版本下的全量kernel比对](#23-不同版本下的全量kernel比对)场景支持)。 | 是 | | -o或--output_path | 配置比对结果文件存盘目录,默认会在当前目录创建output目录。文件名称基于时间戳自动生成,格式为:
`compare_result_{timestamp}.xlsx`
`compare_result_{rank_id}_{step_id}_{timestamp}.xlsx`(仅[不同版本下的全量kernel比对](#23-不同版本下的全量kernel比对)场景支持)。 | 否 | -| -s或--stack_mode | 比对结果展示调用栈信息(NPU_Stack_Info)的开关,bool 类型。单卡场景开启时,需要使用[比对文件](#31-比对文件)的单卡场景配置stack_path指定stack.json文件,才能生成详细调用栈信息,否则在比对时会报错;暂不支持多卡场景。通过直接配置该参数开启,默认未配置,表示关闭。 | 否 | -| -c或--compare_only | 仅比对开关,bool 类型。该参数默认未配置,会启用自动精度分析,工具自动针对比对结果进行分析,识别到第一个精度可能不达标节点(在比对结果文件中的 Accuracy Reached or Not 列显示为 No),并给出问题可能产生的原因(打屏展示并生成 `advisor_{timestamp}.txt` 文件)。通过配置该参数取消自动精度分析,仅输出比对结果表格。 | 否 | -| -f或--fuzzy_match | 模糊匹配。开启后,对于网络中同一层级且命名仅调用次数不同的API,可匹配并进行比对。通过直接配置该参数开启,默认未配置,表示关闭。 | 否 | -| -am或--api_mapping | 跨框架比对。配置该参数时表示开启跨框架API比对功能,可以指定自定义映射文件*.yaml,不指定映射文件时按照msprobe定义的默认映射关系进行比对。自定义映射文件的格式请参见[自定义映射文件(api_mapping)](#33-自定义映射文件api_mapping)。仅[跨框架的API比对](#25-跨框架的api比对)场景需要配置。 | 否 | -| -cm或--cell_mapping | 跨框架比对。配置该参数时表示开启跨框架cell模块比对功能,可以指定自定义映射文件*.yaml,不指定映射文件时按照msprobe定义的默认映射关系进行比对。自定义映射文件的格式请参见[自定义映射文件(cell_mapping)](#34-自定义映射文件cell_mapping)。仅[跨框架的cell模块比对](#26-跨框架的cell模块比对)场景需要配置。 | 否 | -| -dm或--data_mapping | 同框架或跨框架比对。通过映射文件指定两个具体参数的对应关系,可以在L0、L1或mix采集场景下使用。配置该参数的同时需要指定自定义映射文件*.yaml。自定义映射文件的格式请参见[自定义映射文件(data_mapping)](#35-自定义映射文件data_mapping)。 | 否 | -| -lm或--layer_mapping | 跨框架比对。配置该参数时表示开启跨框架Layer层的比对功能,指定模型代码中的Layer层后,可以识别对应dump数据中的模块或API。需要指定自定义映射文件*.yaml。自定义映射文件的格式请参见[自定义映射文件(Layer_mapping)](#36-自定义映射文件layer_mapping)。仅[跨框架的Layer层比对](#27-跨框架的layer层比对)场景需要配置。 | 否 | +| -s或--stack_mode | 比对结果展示调用栈信息(NPU_Stack_Info)的开关,bool 类型。单卡场景开启时,需要使用[比对文件](#31-比对文件)的单卡场景配置stack_path指定stack.json文件,才能生成详细调用栈信息,否则在比对时会报错;暂不支持多卡场景。通过直接配置该参数开启,默认未配置,表示关闭。 | 否 | +| -c或--compare_only | 仅比对开关,bool 类型。该参数默认未配置,会启用自动精度分析,工具自动针对比对结果进行分析,识别到第一个精度可能不达标节点(在比对结果文件中的 Accuracy Reached or Not 列显示为 No),并给出问题可能产生的原因(打屏展示并生成 `advisor_{timestamp}.txt` 文件)。通过配置该参数取消自动精度分析,仅输出比对结果表格。 | 否 | +| -f或--fuzzy_match | 模糊匹配。开启后,对于跨框架比对场景不再校验dtype与pytorch侧的一致性,可匹配并进行比对。通过直接配置该参数开启,默认未配置,表示关闭。 | 否 | +| -am或--api_mapping | 跨框架比对。配置该参数时表示开启跨框架API比对功能,可以指定自定义映射文件*.yaml,不指定映射文件时按照msprobe定义的默认映射关系进行比对。自定义映射文件的格式请参见[自定义映射文件(api_mapping)](#33-自定义映射文件api_mapping)。仅[跨框架的API比对](#25-跨框架的api比对)场景需要配置。 | 否 | +| -cm或--cell_mapping | 跨框架比对。配置该参数时表示开启跨框架cell模块比对功能,可以指定自定义映射文件*.yaml,不指定映射文件时按照msprobe定义的默认映射关系进行比对。自定义映射文件的格式请参见[自定义映射文件(cell_mapping)](#34-自定义映射文件cell_mapping)。仅[跨框架的cell模块比对](#26-跨框架的cell模块比对)场景需要配置。 | 否 | +| -dm或--data_mapping | 同框架或跨框架比对。通过映射文件指定两个具体参数的对应关系,可以在L0、L1或mix采集场景下使用。配置该参数的同时需要指定自定义映射文件*.yaml。自定义映射文件的格式请参见[自定义映射文件(data_mapping)](#35-自定义映射文件data_mapping)。 | 否 | +| -lm或--layer_mapping | 跨框架比对。配置该参数时表示开启跨框架Layer层的比对功能,指定模型代码中的Layer层后,可以识别对应dump数据中的模块或API。需要指定自定义映射文件*.yaml。自定义映射文件的格式请参见[自定义映射文件(Layer_mapping)](#36-自定义映射文件layer_mapping)。仅[跨框架的Layer层比对](#27-跨框架的layer层比对)场景需要配置。 | 否 | 动态图模式没有填写任何mapping时,按照同框架比对的方式进行比对,比对数据和标杆数据的Cell或Api名称需要完全相同才能匹配得上。 @@ -149,6 +151,11 @@ msprobe -f mindspore compare -i ./compare.json -o ./output -s cell_mapping.yaml文件配置请参见[自定义映射文件(cell_mapping)](#34-自定义映射文件cell_mapping)。 不传入cell_mapping.yaml的情况下仅将Cell改成Module后进行匹配;传入cell_mapping.yaml的情况下将按照cell_mapping.yaml的内容进行匹配。 + 如果跨框架比对场景不需要考虑dtype与pytorch侧的一致性,匹配并进行比对,可以开启-f或--fuzzy_match选项,例: + ```shell + msprobe -f mindspore compare -i ./compare.json -o ./output -s -f -cm cell_mapping.yaml + ``` + 此外,也可以通过data_mapping.yaml文件实现具体参数的匹配,例: ```shell msprobe -f mindspore compare -i ./compare.json -o ./output -s -dm data_mapping.yaml diff --git a/debug/accuracy_tools/msprobe/docs/12.overflow_check_PyTorch.md b/debug/accuracy_tools/msprobe/docs/12.overflow_check_PyTorch.md index 
97b049000c6aca9a69aeca66e1a27a4260b3d142..983477554e138f3e547f2d3efcf14fdfc4a991a0 100644 --- a/debug/accuracy_tools/msprobe/docs/12.overflow_check_PyTorch.md +++ b/debug/accuracy_tools/msprobe/docs/12.overflow_check_PyTorch.md @@ -28,7 +28,7 @@ msprobe 工具在 PyTorch 场景下提供溢出数据采集功能和溢出数据 溢出数据采集功能在昇腾 NPU 上支持饱和模式(仅支持 Atlas 训练系列产品)和 INF/NAN 模式。 -INF/NAN 模式遵循 IEEE 754 标准,根据定义输出 INF/NAN 的计算结果。与之对应的饱和模式在计算出现溢出时,饱和为浮点数极值(+-MAX)。对于 CANN 侧配置,Atlas 训练系列产品,默认为饱和模式,且不建议使用 INF/NAN 模式;Atlas A2 训练系列产品,默认为 INF/NAN 模式,且不建议使用饱和模式。 +INF/NAN 模式遵循 IEEE 754 标准,根据定义输出 INF/NAN 的计算结果。与之对应的饱和模式在计算出现溢出时,饱和为浮点数极值(+-MAX)。对于 CANN 侧配置,Atlas 训练系列产品,默认为饱和模式,且不支持使用 INF/NAN 模式;Atlas A2 训练系列产品,默认为 INF/NAN 模式,且不建议使用饱和模式。 INF/NAN 模式的使能方式如下: diff --git a/debug/accuracy_tools/msprobe/docs/13.overflow_check_MindSpore.md b/debug/accuracy_tools/msprobe/docs/13.overflow_check_MindSpore.md index 33ff4a0259aef02d122022402966c65358e8efff..3b674a35e40e8e79b37a43ade7525219a45ee38e 100644 --- a/debug/accuracy_tools/msprobe/docs/13.overflow_check_MindSpore.md +++ b/debug/accuracy_tools/msprobe/docs/13.overflow_check_MindSpore.md @@ -11,7 +11,7 @@ export INF_NAN_MODE_ENABLE=1 export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE" ``` -**a**:在处理浮点数计算溢出问题时,NPU 当前支持两种溢出模式:INF/NAN 模式与饱和模式。INF/NAN 模式遵循 IEEE 754 标准,根据定义输出 INF/NAN 的计算结果。与之对应的饱和模式在计算出现溢出时,饱和为浮点数极值(+-MAX)。对于 CANN 侧配置,Atlas 训练系列产品,默认为饱和模式,且不建议使用 INF/NAN 模式;Atlas A2训练系列产品,默认为 INF/NAN 模式,且不建议使用饱和模式。对于 MindSpore 框架侧配置,仅支持对 Atlas A2 训练系列产品进行设置,默认为 INF/NAN 模式。CANN 侧 与 MindSpore 框架侧配置须一致。 +**a**:在处理浮点数计算溢出问题时,NPU 当前支持两种溢出模式:INF/NAN 模式与饱和模式。INF/NAN 模式遵循 IEEE 754 标准,根据定义输出 INF/NAN 的计算结果。与之对应的饱和模式在计算出现溢出时,饱和为浮点数极值(+-MAX)。对于 CANN 侧配置,Atlas 训练系列产品,默认为饱和模式,且不支持使用 INF/NAN 模式;Atlas A2训练系列产品,默认为 INF/NAN 模式,且不建议使用饱和模式。对于 MindSpore 框架侧配置,仅支持对 Atlas A2 训练系列产品进行设置,默认为 INF/NAN 模式。CANN 侧 与 MindSpore 框架侧配置须一致。 溢出检测任务的配置示例见[MindSpore 静态图场景下 task 配置为 overflow_check](https://gitee.com/ascend/mstt/blob/master/debug/accuracy_tools/msprobe/docs/03.config_examples.md#23-task-%E9%85%8D%E7%BD%AE%E4%B8%BA-overflow_check)、[MindSpore 动态图场景下 task 配置为 overflow_check](https://gitee.com/ascend/mstt/blob/master/debug/accuracy_tools/msprobe/docs/03.config_examples.md#33-task-%E9%85%8D%E7%BD%AE%E4%B8%BA-overflow_check)。 @@ -28,4 +28,6 @@ export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE" ## 3 溢出检测结果文件介绍 -溢出检测结果文件目录结构与含义与数据采集任务一致,但仅保存溢出 API 或 kernel 的真实数据或统计信息。详见MindSpore 场景的精度数据采集中的["**3 dump 结果文件介绍**"](./06.data_dump_MindSpore.md#3-dump-结果文件介绍)章节。 +溢出检测结果文件目录结构与含义与数据采集任务一致,但仅保存溢出 API 或 kernel 的真实数据或统计信息。详见MindSpore 场景的精度数据采集中的["**8. 
dump 结果文件介绍**"](./06.data_dump_MindSpore.md#8-dump-结果文件介绍)章节。 + +**说明**:在静态图 O2 编译等级下,若 MindSpore 版本为 2.4,或者 MindSpore 版本为 2.5,且未使用编包时添加了`--include-mod=adump`选项的 mindstudio-probe whl 包,则会产生 kernel_graph_overflow_check.json 中间文件,一般情况下无需关注。 \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/docs/19.monitor.md b/debug/accuracy_tools/msprobe/docs/19.monitor.md index fa1b7d06d6c52b55c49f26352f823de41b28cb2d..099bba30c0dbbf7fce8b52891897b3ad48ae04a9 100644 --- a/debug/accuracy_tools/msprobe/docs/19.monitor.md +++ b/debug/accuracy_tools/msprobe/docs/19.monitor.md @@ -411,14 +411,13 @@ python3 -m msprobe.pytorch.monitor.anomaly_analyse -d $MONITOR_OUTPUT_DIR/anomal from msprobe.pytorch.monitor.csv2tb import csv2tensorboard_by_step # 前三个参数用来指定需要转换的一批文件,指定monitor输出目录及一个时间范围,会对这个范围内的文件进行转换 # process_num指定拉起的进程个数,默认为1,更多的进程个数可以加速转换 -# data_type_list是一个列表,指定需要转换的数据类型, 数据类型应来自输出件文件前缀,所有类型数据: -# ["actv", "actv_grad", "exp_avg", "exp_avg_sq", "grad_unreduced", "grad_reduced", "param"] -# 不指定就转换全部数据 -# output_dirpath可指定输出目录, 不传值时保存到"{curtime}_csv2tensorboard_by_step"文件夹,其中curtime为自动获取的当前时间戳 +# data_type_list是一个列表,指定需要转换的数据类型,默认转换全部数据,数据类型应来自输出件文件前缀,所有类型数据: +# ["actv", "actv_grad", "exp_avg", "exp_avg_sq", "grad_unreduced", "grad_reduced", "param"] +# output_dirpath可指定输出目录,默认保存到"{curtime}_csv2tensorboard_by_step"文件夹,其中curtime为自动获取的当前时间戳 csv2tensorboard_by_step( - monitor_path="~/monitor_output", - time_start="Dec03_21-34-40", - time_end="Dec03_21-34-42", + monitor_path="~/monitor_output", # 必填 + time_start="Dec03_21-34-40", # 必填 + time_end="Dec03_21-34-42", # 必填 process_num=8, data_type_list=["param"] ) @@ -500,7 +499,7 @@ csv2tensorboard_by_step(monitor_path, time_start, time_end, process_num=1, data_ | time_end | 结束时间戳。搭配time_start一起使用。指定一个时间范围,会对这个范围内的文件进行转换。左闭右闭的区间。 | 是 | | process_num | 指定拉起的进程个数,默认为1,更多的进程个数可以加速转换。 | 否 | | data_type_list | 指定需要转换的数据类型, 数据类型应来自输出件文件前缀,所有类型数据:
["actv", "actv_grad", "exp_avg", "exp_avg_sq", "grad_unreduced", "grad_reduced", "param"]。
不指定就转换全部数据。 | 否 | - +| output_dirpath | 指定转换后的输出路径,默认输出到"{curtime}_csv2tensorboard_by_step"文件夹,其中curtime为自动获取的当前时间戳。 | 否 | - 在模型任意位置获取当前参数**梯度**统计量 ```python TrainerMon.generate_wgrad_metrics() -> tuple[dict, dict] diff --git a/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md b/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md index 34cdc2aa99b8f6f1ab65a2b692424506f2563b56..fe7a431a738cf60cf5879f756f00ea4305d3a7e0 100644 --- a/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md +++ b/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md @@ -51,6 +51,7 @@ msprobe -f pytorch graph -i ./compare.json -o ./output | -oc 或 --overflow_check | 是否开启溢出检测模式,开启后会在输出vis文件中(`compare_{timestamp}.vis或build_{timestamp}.vis`)对每个溢出节点进行标记溢出等级,溢出等级说明参考[溢出等级说明](#312-溢出等级说明) | 否 | | -f 或 --fuzzy_match | 是否开启模糊匹配,bool类型。模糊匹配说明参考[匹配说明](#311-匹配说明) | 否 | | -cs 或 --complete_stack | 是否使用完整的堆栈信息,bool类型。默认使用精简的堆栈信息,数据量小有助于增加流畅度。完整堆栈和精简堆栈信息参考[堆栈信息说明](#72-堆栈信息说明) | 否 | +| -mm 或 --multi_mapping | 一对一、一对多、多对一、多对多节点映射,例如待调试侧若干小算子与标杆侧融合算子比对等场景,需要指定自定义映射文件*.yaml。自定义映射文件的格式请参见[自定义映射文件(multi)](#73-自定义映射文件multi) | 否 | #### 3.1.1 匹配说明 @@ -465,6 +466,27 @@ yaml文件中只需配置待调试侧与标杆侧模型代码中功能一致但 ] } +``` +### 7.3 自定义映射文件(multi) +支持一对一、一对多、多对一、多对多节点映射配置,**多个节点使用英文逗号,分隔开**。 + +配置多个节点时,如果待配置节点为Module.layer3.Linear.forward.0、Module.layer4.Linear.forward.0和Module.layer5.Linear.forward.0,则Module.layer4.Linear.forward.0无需配置,仅取首尾节点配置即可(Module.layer3.Linear.forward.0,Module.layer5.Linear.forward.0)。注意,**配置节点的先后顺序不能乱(construct.json中的节点名称顺序代表先后顺序,请参考[dump结果文件介绍](./05.data_dump_PyTorch.md#3-dump-结果文件介绍))**,Module.layer3.Linear.forward.0在前,就不能配置成Module.layer5.Linear.forward.0,Module.layer3.Linear.forward.0,会导致配置无效。 + +```yaml +# 一对一 +Module.layer.Linear.forward.0: Module.layer1.Linear.forward.0 +``` +```yaml +# 一对多 +Module.layer.Linear.forward.0: Module.layer1.Linear.forward.0,Module.layer2.Linear.forward.0 +``` +```yaml +# 多对一 +Module.layer1.Linear.forward.0,Module.layer2.Linear.forward.0: Module.layer.Linear.forward.0 +``` +```yaml +# 多对多 +Module.layer3.Linear.forward.0,Module.layer5.Linear.forward.0: Module.layer1.Linear.forward.0,Module.layer2.Linear.forward.0 ``` # FAQ 1. 图比对场景,节点呈现灰色,且没有精度比对数据,怎么处理? 
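补充:当上文 7.3 节的 multi 映射关系较多时,也可以用脚本批量生成 yaml 文件。下面是一段 Python 示意代码(其中节点名称均为假设的示例值,实际使用时需替换为 construct.json 中真实存在且先后顺序正确的节点名):

```python
# 批量生成 multi_mapping.yaml 的示意脚本(节点名均为假设值)
multi_mapping = {
    # 一对一
    "Module.layer.Linear.forward.0": "Module.layer1.Linear.forward.0",
    # 多对多:同侧多个节点用英文逗号分隔,且仅配置首尾节点
    "Module.layer3.Linear.forward.0,Module.layer5.Linear.forward.0":
        "Module.layer1.Linear.forward.0,Module.layer2.Linear.forward.0",
}

with open("multi_mapping.yaml", "w", encoding="utf-8") as f:
    for debug_nodes, bench_nodes in multi_mapping.items():
        f.write(f"{debug_nodes}: {bench_nodes}\n")
```

生成后按上文参数表通过 -mm 传入即可,例如:msprobe -f pytorch graph -i ./compare.json -o ./output -mm multi_mapping.yaml。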
diff --git a/debug/accuracy_tools/msprobe/docs/22.visualization_MindSpore.md b/debug/accuracy_tools/msprobe/docs/22.visualization_MindSpore.md index 12306b8be027e7cee715f99f75b00f7504ba8252..4da58b028c60184855982277e599852902b0b607 100644 --- a/debug/accuracy_tools/msprobe/docs/22.visualization_MindSpore.md +++ b/debug/accuracy_tools/msprobe/docs/22.visualization_MindSpore.md @@ -51,6 +51,7 @@ msprobe -f mindspore graph -i ./compare.json -o ./output | -oc 或 --overflow_check | 是否开启溢出检测模式,开启后会在输出vis文件中(`compare_{timestamp}.vis或build_{timestamp}.vis`)对每个溢出节点进行标记溢出等级,溢出等级说明参考[溢出等级说明](#312-溢出等级说明) | 否 | | -f 或 --fuzzy_match | 是否开启模糊匹配,bool类型。模糊匹配说明参考[匹配说明](#311-匹配说明) | 否 | | -cs 或 --complete_stack | 是否使用完整的堆栈信息,bool类型。默认使用精简的堆栈信息,数据量小有助于增加流畅度。完整堆栈和精简堆栈信息参考[堆栈信息说明](#72-堆栈信息说明) | 否 | +| -mm 或 --multi_mapping | 一对一、一对多、多对一、多对多节点映射,例如待调试侧若干小算子与标杆侧融合算子比对等场景,需要指定自定义映射文件*.yaml。自定义映射文件的格式请参见[自定义映射文件(multi)](#73-自定义映射文件multi) | 否 | #### 3.1.1 匹配说明 @@ -482,6 +483,27 @@ yaml文件中只需配置MindSpore与PyTorch模型代码中功能一致但名称 ] } ``` +### 7.3 自定义映射文件(multi) +支持一对一、一对多、多对一、多对多节点映射配置,**多个节点使用英文逗号,分隔开**。 + +配置多个节点时,如果待配置节点为Cell.layer3.Linear.forward.0、Cell.layer4.Linear.forward.0和Cell.layer5.Linear.forward.0,则Cell.layer4.Linear.forward.0无需配置,仅取首尾节点配置即可(Cell.layer3.Linear.forward.0,Cell.layer5.Linear.forward.0)。注意,**配置节点的先后顺序不能乱(construct.json中的节点名称顺序代表先后顺序,请参考[dump结果文件介绍](./06.data_dump_MindSpore.md#82-动态图场景))**,Cell.layer3.Linear.forward.0在前,就不能配置成Cell.layer5.Linear.forward.0,Cell.layer3.Linear.forward.0,会导致配置无效。 + +```yaml +# 一对一 +Cell.layer.Linear.forward.0: Cell.layer1.Linear.forward.0 +``` +```yaml +# 一对多 +Cell.layer.Linear.forward.0: Cell.layer1.Linear.forward.0,Cell.layer2.Linear.forward.0 +``` +```yaml +# 多对一 +Cell.layer1.Linear.forward.0,Cell.layer2.Linear.forward.0: Cell.layer.Linear.forward.0 +``` +```yaml +# 多对多 +Cell.layer3.Linear.forward.0,Cell.layer5.Linear.forward.0: Cell.layer1.Linear.forward.0,Cell.layer2.Linear.forward.0 +``` # FAQ 1. 图比对场景,节点呈现灰色,且没有精度比对数据,怎么处理? diff --git a/debug/accuracy_tools/msprobe/docs/27.dump_json_instruction.md b/debug/accuracy_tools/msprobe/docs/27.dump_json_instruction.md index f994dc2301bcae6b23dc7a7503297aa4fe5b3724..bf5998bce0b4cd174b9713d9417d1afb674c2b56 100644 --- a/debug/accuracy_tools/msprobe/docs/27.dump_json_instruction.md +++ b/debug/accuracy_tools/msprobe/docs/27.dump_json_instruction.md @@ -1,8 +1,8 @@ # dump.json文件说明及示例 -## 1. dump.json文件示例(PyTorch) +## 1. PyTorch 场景下的 dump.json 文件 -### 1.1 L0级别 +### 1.1 L0 级别 L0级别的dump.json文件包括模块的前反向的输入输出,以及模块的参数和参数梯度。以PyTorch的Conv2d模块为例,网络中模块调用代码为: `output = self.conv2(input) # self.conv2 = torch.nn.Conv2d(64, 128, 5, padding=2, bias=True)` @@ -168,7 +168,7 @@ dump.json文件中包含以下数据名称: } ``` -### 1.2 L1级别 +### 1.2 L1 级别 L1级别的dump.json文件包括API的前反向的输入输出。以PyTorch的relu函数为例,网络中API调用代码为: `output = torch.nn.functional.relu(input)` @@ -264,13 +264,13 @@ dump.json文件中包含以下数据名称: } ``` -### 1.3 mix级别 +### 1.3 mix 级别 mix级别的dump.json文件同时包括L0和L1级别的dump数据,文件格式与上述示例相同。 -## 2. dump.json文件示例(MindSpore) +## 2. MindSpore 场景下的 dump.json 文件 -### 2.1 L0级别 +### 2.1 L0 级别 L0级别的dump.json文件包括模块的前反向的输入输出,以及模块的参数和参数梯度。 以MindSpore的Conv2d模块为例,dump.json文件中使用的模块调用代码为: @@ -429,7 +429,7 @@ dump.json文件中包含以下数据名称: } ``` -### 2.2 L1级别 +### 2.2 L1 级别 L1级别的dump.json文件包括API的前反向的输入输出,以MindSpore的relu函数为例,网络中API调用代码为: `output = mindspore.ops.relu(input)` @@ -521,5 +521,275 @@ L1级别的dump.json文件包括API的前反向的输入输出,以MindSpore的 } ``` -### 2.3 mix级别 +### 2.3 mix 级别 + mix级别的dump.json文件同时包括L0和L1级别的dump数据,文件格式与上述示例相同。 + +## 3. 
MSAdapter 场景下的 dump.json 文件 + +### 3.1 L0 级别 + +L0 级别的 dump.json 文件包括模块的前反向的输入输出,以及模块的参数和参数梯度。以 Conv2d 模块为例,网络中模块调用代码为: +`output = self.conv2(input) # self.conv2 = torch.nn.Conv2d(64, 128, 5, padding=2, bias=True)` + +dump.json文件中包含以下数据名称: + +- `Module.conv2.Conv2d.forward.0`:模块的前向数据,其中input_args为模块的输入数据(位置参数),input_kwargs为模块的输入数据(关键字参数),output为模块的输出数据,parameters为模块的参数数据,包括权重(weight)和偏置(bias)。 +- `Module.conv2.Conv2d.parameters_grad`:模块的参数梯度数据,包括权重(weight)和偏置(bias)的梯度。 +- `Module.conv2.Conv2d.backward.0`:模块的反向数据,其中input为模块反向的输入梯度(对应前向输出的梯度),output为模块的反向输出梯度(对应前向输入的梯度)。 + +**说明**:当dump时传入的model参数为List[torch.nn.Module]或Tuple[torch.nn.Module]时,模块级数据的命名中包含该模块在列表中的索引index,命名格式为`{Module}.{index}.*`,*表示以上三种模块级数据的命名格式,例如:`Module.0.conv1.Conv2d.forward.0`。 + +```json +{ + "task": "tensor", + "level": "L0", + "framework": "mindtorch", + "dump_data_dir": "/dump/path", + "data": { + "Module.conv2.Conv2d.forward.0": { + "input_args": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 8, + 16, + 14, + 14 + ], + "Max": 1.638758659362793, + "Min": 0.0, + "Mean": 0.2544615864753723, + "Norm": 70.50277709960938, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.input.0.npy" + } + ], + "input_kwargs": {}, + "output": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 8, + 32, + 10, + 10 + ], + "Max": 1.6815717220306396, + "Min": -1.5120246410369873, + "Mean": -0.025344856083393097, + "Norm": 149.65576171875, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.output.0.npy" + } + ], + "parameters": { + "weight": { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32, + 16, + 5, + 5 + ], + "Max": 0.05992485210299492, + "Min": -0.05999220535159111, + "Mean": -0.0006165213999338448, + "Norm": 3.421217441558838, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.parameters.weight.npy" + }, + "bias": { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32 + ], + "Max": 0.05744686722755432, + "Min": -0.04894155263900757, + "Mean": 0.006410328671336174, + "Norm": 0.17263513803482056, + "requires_grad": true, + "data_name": "Module.conv2.Conv2d.forward.0.parameters.bias.npy" + } + } + }, + "Module.conv2.Conv2d.parameters_grad": { + "weight": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32, + 16, + 5, + 5 + ], + "Max": 0.018550323322415352, + "Min": -0.008627401664853096, + "Mean": 0.0006675920449197292, + "Norm": 0.26084786653518677, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.parameters_grad.weight.npy" + } + ], + "bias": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32 + ], + "Max": 0.014914230443537235, + "Min": -0.006656786892563105, + "Mean": 0.002657240955159068, + "Norm": 0.029451673850417137, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.parameters_grad.bias.npy" + } + ] + }, + "Module.conv2.Conv2d.backward.0": { + "input": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 8, + 32, + 10, + 10 + ], + "Max": 0.0015069986693561077, + "Min": -0.001139344065450132, + "Mean": 3.3215508210560074e-06, + "Norm": 0.020567523315548897, + "requires_grad": false, + "data_name": "Module.conv2.Conv2d.backward.0.input.0.npy" + } + ], + "output": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 8, + 16, + 14, + 14 + ], + "Max": 0.0007466732058674097, + "Min": -0.00044813455315306783, + "Mean": 6.814070275140693e-06, + "Norm": 0.01474067009985447, + 
"requires_grad": false, + "data_name": "Module.conv2.Conv2d.backward.0.output.0.npy" + } + ] + } + } +} +``` + +### 3.2 L1 级别 +L1级别的dump.json文件包括API的前反向的输入输出。以 relu API 为例,网络中 API 调用代码为: +`output = torch.nn.functional.relu(input)` + +dump.json文件中包含以下数据名称: +- `Functional.relu.0.forward`:API的前向数据,其中input_args为API的输入数据(位置参数),input_kwargs为API的输入数据(关键字参数),output为API的输出数据。 +- `Functional.relu.0.backward`:API的反向数据,其中input为API的反向输入梯度(对应前向输出的梯度),output为API的反向输出梯度(对应前向输入的梯度)。 + +```json +{ + "task": "tensor", + "level": "L1", + "framework": "mindtorch", + "dump_data_dir":"/dump/path", + "data": { + "Functional.relu.0.forward": { + "input_args": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 1.3864083290100098, + "Min": -1.3364859819412231, + "Mean": 0.03711778670549393, + "Norm": 236.20692443847656, + "requires_grad": true, + "data_name": "Functional.relu.0.forward.input.0.npy" + } + ], + "input_kwargs": {}, + "output": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 1.3864083290100098, + "Min": 0.0, + "Mean": 0.16849493980407715, + "Norm": 175.23345947265625, + "requires_grad": true, + "data_name": "Functional.relu.0.forward.output.0.npy" + } + ] + }, + "Functional.relu.0.backward": { + "input": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 0.0001815402356442064, + "Min": -0.00013352684618439525, + "Mean": 0.00011915402356442064, + "Norm": 0.007598237134516239, + "requires_grad": false, + "data_name": "Functional.relu.0.backward.input.0.npy" + } + ], + "output": [ + { + "type": "mindspore.Tensor", + "dtype": "Float32", + "shape": [ + 32, + 16, + 28, + 28 + ], + "Max": 0.0001815402356442064, + "Min": -0.00012117840378778055, + "Mean": 2.0098118724831693e-08, + "Norm": 0.006532244384288788, + "requires_grad": false, + "data_name": "Functional.relu.0.backward.output.0.npy" + } + ] + } + } +} +``` + +### 3.3 mix 级别 + +mix级别的dump.json文件同时包括L0和L1级别的dump数据,文件格式与上述示例相同。 \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/docs/28.kernel_dump_MindSpore.md b/debug/accuracy_tools/msprobe/docs/28.kernel_dump_MindSpore.md index 6b8cc558aa22526158033cfb35f31203d8b04278..4988586c0568b391739f7c14f1a9452461f1a6f1 100644 --- a/debug/accuracy_tools/msprobe/docs/28.kernel_dump_MindSpore.md +++ b/debug/accuracy_tools/msprobe/docs/28.kernel_dump_MindSpore.md @@ -1,4 +1,4 @@ -# MindSpore 场景的 kernel dump 说明 +# MindSpore 动态图场景的 kernel dump 说明 当使用 msprobe 数据采集功能时,level 配置为 "L2" 表示采集 kernel 层级的算子数据,仅支持昇腾 NPU 平台。 diff --git a/debug/accuracy_tools/msprobe/docs/29.data_dump_MSAdapter.md b/debug/accuracy_tools/msprobe/docs/29.data_dump_MSAdapter.md new file mode 100644 index 0000000000000000000000000000000000000000..f67b28af517f4552d297cf5fbe417a46bc8a6714 --- /dev/null +++ b/debug/accuracy_tools/msprobe/docs/29.data_dump_MSAdapter.md @@ -0,0 +1,229 @@ +# MSAdapter 场景的精度数据采集 + +MSAdapter 是一款 MindSpore 生态适配工具,可以将 PyTorch 训练脚本高效迁移至 MindSpore 框架执行,以实现在不改变原有 PyTorch 用户开发习惯的情况下,使得 PyTorch 代码能在昇腾上获得高效性能。 + +msprobe 工具主要通过在训练脚本内添加 dump 接口、启动训练的方式采集精度数据。 + +本工具提供固定的 API 支持列表,若需要删除或增加 dump 的 API,可以在 msprobe/pytorch/hook_module/support_wrap_ops.yaml 文件内手动修改,如下示例: + +```yaml +functional: # functional为算子类别,找到对应的类别,在该类别下按照下列格式删除或添加API + - conv1d + - conv2d + - conv3d +``` + +删除 API 的场景:部分模型代码逻辑会存在 API 原生类型校验,工具执行dump操作时,对封装后的模型 API 可能与模型的原生 API 类型不一致,此时可能引发校验失败,详见《[FAQ](FAQ.md)》中“异常情况”的第10和11条。 + +## 1. 
工具安装 + +请参见[《msprobe 工具安装指南》](./01.installation.md)。 + +## 2 接口介绍 + +### 2.1 msprobe.mindspore.PrecisionDebugger + +**功能说明**:通过加载 dump 配置文件的方式来确定 dump 操作的详细配置。 + +**原型**: + +```Python +PrecisionDebugger(config_path=None, task=None, dump_path=None, level=None, step=None) +``` + +**参数说明**: + +1. config_path:指定 dump 配置文件路径,string 类型。参数示例:"./config.json"。未配置该路径时,默认使用 [config.json](../config.json) 文件的默认配置,配置选项含义可见 [config.json 介绍](./02.config_introduction.md)。 + +2. 其他参数与 [config.json](../config.json) 文件中的同名配置字段含义相同,具体可见 [config.json 介绍](./02.config_introduction.md)。当参数值非None时,优先级高于 [config.json](../config.json) 文件中的同名配置。 + +#### 2.1.1 start + +**功能说明**:启动精度数据采集。需要与 [**stop**](#212-stop) 接口一起添加在训练迭代的 for 循环内。 + +**原型**: + +```Python +start(model=None) +``` + +**参数说明**: + +1. model:指定需要采集 Module 级数据的模型,支持传入 torch.nn.Module、list[torch.nn.Module]或Tuple[torch.nn.Module] 类型,默认未配置。level 配置为 "L0" 或 "mix" 时,必须在该接口中配置该参数。API级别("L1" level)dump 时,传入 model 可以采集 model 内包含 primitive op 对象在内的所有 API 数据,若不传入 model 参数,则只采集非 primitive op 的 API 数据。 + +#### 2.1.2 stop + +**功能说明**:停止精度数据采集。在 **start** 接口调用之后的任意位置添加。若 **stop** 接口添加在反向计算代码之后,则会采集 **start** 和该接口之间的前反向数据。 +若 **stop** 接口添加在反向计算代码之前,则需要将 [**step**](#213-step) 接口添加到反向计算代码之后,才能采集 **start** 和该接口之间的前反向数据。 + +**注意**:**stop** 接口必须调用,否则可能导致精度数据落盘不全。 + +**原型**: + +```Python +stop() +``` + +#### 2.1.3 step + +**功能说明**:进行训练 step 数的自增,完成当前 step 所有数据的落盘并更新 dump 参数。在一个 step 训练结束的位置添加,且必须在 **stop** 接口之后的位置调用。该接口需要配合 **start** 和 **stop** 函数使用,尽量添加在反向计算代码之后,否则可能会导致反向数据丢失。 + +**原型**: + +```Python +step() +``` + +#### 2.1.4 forward_backward_dump_end + +**功能说明**:停止精度数据采集。与 **stop** 接口功能相同,该函数在将来会被移除,建议使用 **stop** 接口。 + +**原型**: + +```Python +forward_backward_dump_end() +``` + +#### 2.1.5 save + +**功能说明**:单点保存网络执行过程中正反向数值,并以统计值/张量文件落盘。 + +**原型**: +```python +save(variable, name, save_backward=True) +``` + +**参数说明**: +| 参数名称 | 参数含义 | 支持数据类型 | 是否必选| +| ---------- | ------------------| ------------------- | ------------------- | +| variable | 需要保存的变量 |dict, list, tuple, torch.tensor, int, float, str | 是 | +| name | 指定的名称 | str | 是 | +| save_backward | 是否保存反向数据 | boolean | 否 | + +### 2.2 msprobe.mindspore.seed_all + +**功能说明**:用于固定网络中的随机性和开启确定性计算。 + +**原型**: +```python +seed_all(seed=1234, mode=False, rm_dropout=True) +``` + +**参数说明**: + +1. seed: 随机性种子,默认值:1234,非必选。参数示例: seed=1000。该参数用于 random、numpy.random, mindspore.common.Initializer、mindspore.nn.probability.distribution的随机数生成以及 Python 中 str、bytes、datetime 对象的 hash 算法。 + +2. mode:确定性计算使能,可配置 True 或 False,默认值:False,非必选。参数示例:mode=True。该参数设置为 True 后,将会开启算子确定性运行模式与归约类通信算子(AllReduce、ReduceScatter、Reduce)的确定性计算。注意:确定性计算会导致 API 执行性能降低,建议在发现模型多次执行结果不同的情况下开启。 + +3. 
rm_dropout:控制 dropout 失效的开关。可配置 True 或 False,默认值:True,非必选。参数示例:rm_dropout=True。该参数设置为 True 后,将会使 mindspore.ops.Dropout,mindspore.ops.Dropout2D,mindspore.ops.Dropout3D,mindspore.mint.nn.Dropout和mindspore.mint.nn.functional.dropout 失效,以避免因随机 dropout 造成的网络随机性。建议在采集数据前调用。 + +**注意**:通过 rm_dropout 控制 dropout 失效或生效需要在初始化 Dropout 实例前调用才能生效。 + +## 3 示例代码 + +以下为添加了 msprobe 工具 dump 接口的示例训练脚本。 + +```python +import mindspore as ms +import torch +import torch.nn as nn +import torch.nn.functional as F + +# 导入工具的数据采集接口 +from msprobe.mindspore import PrecisionDebugger + +# 在模型训练开始前实例化PrecisionDebugger +debugger = PrecisionDebugger(config_path='./config.json') + + +# 定义网络 +class Net(nn.Module): + def __init__(self) -> None: + super().__init__() + self.linear1 = nn.Linear(in_features=8, out_features=4) + self.linear2 = nn.Linear(in_features=4, out_features=2) + + def forward(self, x): + x1 = self.linear1(x) + x2 = self.linear2(x1) + logits = F.relu(x2) + return logits + + +net = Net() + + +def train_step(inputs): + return net(inputs) + + +if __name__ == "__main__": + data = (torch.randn(10, 8), torch.randn(10, 8), torch.randn(10, 8)) + grad_fn = ms.value_and_grad(train_step, grad_position=0) + + for inputs in data: + # 开启数据 dump + debugger.start(model=net) + + out, grad = grad_fn(inputs) + + # 停止数据 dump + debugger.stop() + # 更新 step 信息 + debugger.step() +``` + +## 4 dump 结果文件介绍 + +训练结束后,工具将 dump 的数据保存在 dump_path 参数指定的目录下。目录结构示例如下: + +```lua +├── dump_path +│ ├── step0 +│ | ├── rank0 +│ | │ ├── dump_tensor_data +| | | | ├── Tensor.permute.1.forward.npy +| | | | ├── Functional.linear.5.backward.output.npy # 命名格式为{api_type}.{api_name}.{API调用次数}.{forward/backward}.{input/output}.{参数序号}, 其中,“参数序号”表示该API的第n个输入或输出,例如1,则为第一个参数,若该参数为list格式,则根据list继续排序,例如1.1,表示该API的第1个参数的第1个元素。 +| | | | ... +| | | | ├── Module.conv1.Conv2d.forward.0.input.0.npy # 命名格式为{Module}.{module_name}.{class_name}.{forward/backward}.{调用次数}.{input/output}.{参数序号}, 其中,“参数序号”表示该Module的第n个参数,例如1,则为第一个参数,若该参数为list格式,则根据list继续排序,例如1.1,表示该Module的第1个参数的第1个元素。 +| | | | ├── Module.conv1.Conv2D.forward.0.parameters.bias.npy # 模块参数数据:命名格式为{Module}.{module_name}.{class_name}.forward.{调用次数}.parameters.{parameter_name}。 +| | | | └── Module.conv1.Conv2D.parameters_grad.weight.npy # 模块参数梯度数据:命名格式为{Module}.{module_name}.{class_name}.parameters_grad.{parameter_name}。因为同一模块的参数使用同一梯度进行更新,所以参数梯度文件名不包含调用次数。 +| | | | # 当dump时传入的model参数为List[torch.nn.Module]或Tuple[torch.nn.Module]时,模块级数据的命名中包含该模块在列表中的索引index,命名格式为{Module}.{index}.*,*表示以上三种模块级数据的命名格式,例如:Module.0.conv1.Conv2d.forward.0.input.0.npy。 +│ | | ├── dump.json +│ | | ├── stack.json +│ | | └── construct.json +│ | ├── rank1 +| | | ├── dump_tensor_data +| | | | └── ... +│ | | ├── dump.json +│ | | ├── stack.json +| | | └── construct.json +│ | ├── ... +│ | | +| | └── rank7 +│ ├── step1 +│ | ├── ... 
+│ ├── step2 +``` +* `rank`:设备 ID,每张卡的数据保存在对应的 `rank{ID}` 目录下。非分布式场景下没有 rank ID,目录名称为 rank。 +* `dump_tensor_data`:保存采集到的张量数据。 +* `dump.json`: 保存 API 或 Module 前反向数据的统计量信息。包含 dump 数据的 API 名称或 Module 名称,各数据的 dtype、 shape、max、min、mean、L2norm(L2范数,平方根)统计信息以及当配置 summary_mode="md5" 时的 CRC-32 数据。具体介绍可参考[dump.json文件说明](./27.dump_json_instruction.md#3-MSAdapter场景下的dump.json文件)。 +* `stack.json`:API/Module 的调用栈信息。 +* `construct.json`:分层分级结构,level 为 L1 时,construct.json 内容为空。 + + +当 task 为 tensor 时,dump 过程中,npy 文件在对应算子或者模块被执行后就会落盘,而 json 文件则需要在正常执行 PrecisionDebugger.stop() 后才会写入完整数据。因此如果程序异常终止,终止前被执行算子的相关 npy 文件得以保存,但 json 文件中的数据可能丢失。 + +其中 rank 为设备上各卡的 ID,每张卡上 dump 的数据会生成对应 dump 目录。非分布式场景下没有 rank ID,目录名称为 rank。 + +npy 文件名的前缀含义如下: + +| 前缀 | 含义 | +| ----------- | ---------------------------- | +| Tensor | torch.Tensor API数据 | +| Torch | torch API数据 | +| Functional | torch.nn.functional API数据 | +| NPU | NPU 亲和API数据 | +| Distributed | torch.distributed API数据 | +| Jit | 被 "jit" 装饰的模块或函数数据 | +| Module | torch.nn.Module 类(模块)数据 | \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/docs/30.overflow_check_MSAdapter.md b/debug/accuracy_tools/msprobe/docs/30.overflow_check_MSAdapter.md new file mode 100644 index 0000000000000000000000000000000000000000..01d64c808d40a1e5c4ea2190c028a7c389ffbdc4 --- /dev/null +++ b/debug/accuracy_tools/msprobe/docs/30.overflow_check_MSAdapter.md @@ -0,0 +1,31 @@ +# MSAdapter 场景的溢出检测 + +msprobe 工具提供 MSAdapter 场景下的溢出检测功能。其检测对象为 **API** 级别(除 Primitive 和 Jit 类 API)或**模块**级别,分别对应 config.json 配置中的 **"L1"** 、**"L0"** level。 + +需要注意,本工具仅支持在 INF/NAN 模式a下进行溢出检测。INF/NAN 模式的使能方式如下: + +```Shell +# 使能 CANN 侧 INF/NAN 模式 +export INF_NAN_MODE_ENABLE=1 +# 使能 MindSpore 框架侧 INF/NAN 模式 +export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE" +``` + +**a**:在处理浮点数计算溢出问题时,NPU 当前支持两种溢出模式:INF/NAN 模式与饱和模式。INF/NAN 模式遵循 IEEE 754 标准,根据定义输出 INF/NAN 的计算结果。与之对应的饱和模式在计算出现溢出时,饱和为浮点数极值(+-MAX)。对于 CANN 侧配置,Atlas 训练系列产品,默认为饱和模式,且不建议使用 INF/NAN 模式;Atlas A2训练系列产品,默认为 INF/NAN 模式,且不建议使用饱和模式。对于 MindSpore 框架侧配置,仅支持对 Atlas A2 训练系列产品进行设置,默认为 INF/NAN 模式。CANN 侧 与 MindSpore 框架侧配置须一致。 + +溢出检测任务的配置示例见["**MindSpore 动态图场景 task 配置为 overflow_check**"](./03.config_examples.md#33-task配置为overflow_check)小节。 + + +## 1 接口介绍 + +溢出检测功能提供的接口与数据采集任务一致,详见 MSAdapter 场景的精度数据采集中的["**2 接口介绍**"](./29.data_dump_MSAdapter.md#2-接口介绍)小节。 + +需要注意,目前暂不支持 "L1" level 下 primitive op 的溢出检测。 + +## 2 示例代码 + +溢出检测功能使用方式与数据采集任务一致,详见 MSAdapter 场景的精度数据采集中的["**3 示例代码**"](./29.data_dump_MSAdapter.md#3-示例代码)小节。 + +## 3 溢出检测结果文件介绍 + +溢出检测结果文件目录结构与含义与数据采集任务一致,但仅保存溢出 API 或 模块 的真实数据或统计信息。详见 MSAdapter 场景的精度数据采集中的["**4 dump 结果文件介绍**"](./29.data_dump_MSAdapter.md#4-dump-结果文件介绍)小节。 \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/docs/FAQ.md b/debug/accuracy_tools/msprobe/docs/FAQ.md index 833ca07a236f33e69b102d4acb45d35cd6fe7e3a..252fc94e97fb7ea74caa96f4e09851ecbccf8d88 100644 --- a/debug/accuracy_tools/msprobe/docs/FAQ.md +++ b/debug/accuracy_tools/msprobe/docs/FAQ.md @@ -58,11 +58,7 @@ 答:对于 fp16 的数据,CPU 会上升一个精度 fp32 去计算,这是和算子那边对齐的精度结论,CPU 用更高精度去计算会更接近真实值。 -6. 添加预检工具后截取操作报错:`IndexError: too many indices for tensor of dimension x` 或 `TypeError: len() of a 0-d tensor`。 - - 答:注释工具目录 `mstt/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml` 文件中 Tensor: 下的 `- __getitem__`,工具会跳过采集该 API。如果是需要 dump 关键位置 API 也可以考虑根据报错堆栈信息注释引发报错的类型检查。 - -7. Tensor 魔法函数具体对应什么操作? +6. Tensor 魔法函数具体对应什么操作? 答: @@ -202,15 +198,11 @@ def npu_forward_fused_softmax(self, input_, mask): 答:正常现象,dataloader 通过 raise 结束程序,堆栈信息可忽略。 -10. 
添加 msprobe 工具后截取操作报错:`IndexError: too many indices for tensor of dimension x` 或 `TypeError: len() of a 0-d tensor`。 - - 答:注释工具目录 `mstt/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml` 文件中 `Tensor: ` 下的 `- __getitem__`,工具会跳过采集该 API。如果是需要采集关键位置 API 也可以考虑根据报错堆栈信息注释引发报错的类型检查。 - -11. 使用 msprobe 工具数据采集功能后,模型出现报错,报错信息为:`activation_func must be F.gelu` 或 `ValueError(Only support fusion of gelu and swiglu)`。 +10. 使用 msprobe 工具数据采集功能后,模型出现报错,报错信息为:`activation_func must be F.gelu` 或 `ValueError(Only support fusion of gelu and swiglu)`。 答:这一类报错常见于 Megatron/MindSpeed/ModelLink 等加速库或模型仓中,原因是工具本身会封装 torch 的 API(API类型和地址会发生改变),而有些 API 在工具使能前类型和地址就已经确定,此时工具无法对这类 API 再进行封装,而加速库中会对某些 API 进行类型检查,即会把工具无法封装的原始的 API和工具封装之后的 API 进行判断,所以会报错。 规避方式有3种:①将PrecisionDebugger的实例化放在文件的开始位置,即导包后的位置,确保所有API都被封装;②注释 `mstt/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml` 文件中的 `-gelu` 或者 `-silu`,工具会跳过采集该 API。③ 可以考虑根据报错堆栈信息注释引发报错的类型检查。 -12. 添加 msprobe 工具后触发与 AsStrided 算子相关、或者编译相关的报错,如:`Failed to compile Op [AsStrided]`。 +11. 添加 msprobe 工具后触发与 AsStrided 算子相关、或者编译相关的报错,如:`Failed to compile Op [AsStrided]`。 答:注释工具目录 `mstt/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml` 文件中 `Tensor: `下的 `-t` 和 `- transpose`。 diff --git a/debug/accuracy_tools/msprobe/docs/img/compare_result.png b/debug/accuracy_tools/msprobe/docs/img/compare_result.png index 07cdb51707fe43d07723ed976275d99f55b50571..b6d7ec6dfcbc44b4b7056e1297a481f495ceb86e 100644 Binary files a/debug/accuracy_tools/msprobe/docs/img/compare_result.png and b/debug/accuracy_tools/msprobe/docs/img/compare_result.png differ diff --git a/debug/accuracy_tools/msprobe/mindspore/__init__.py b/debug/accuracy_tools/msprobe/mindspore/__init__.py index 089c29eb098ad4305edcca1306462f8924dd9291..cbdab34f0446ee12c07b2aba8b4f75018496eda6 100644 --- a/debug/accuracy_tools/msprobe/mindspore/__init__.py +++ b/debug/accuracy_tools/msprobe/mindspore/__init__.py @@ -17,12 +17,13 @@ import os try: from msprobe.lib import _msprobe_c - os.environ["MS_HOOK_ENABLE"] = "on" os.environ["HOOK_TOOL_PATH"] = _msprobe_c.__file__ except ImportError: from .common.log import logger logger.info("Module _msprobe_c has not been installed. 
L2-Dump may not work normally.") from msprobe.mindspore.debugger.precision_debugger import PrecisionDebugger -from msprobe.mindspore.common.utils import seed_all -from msprobe.mindspore.monitor.module_hook import TrainerMon \ No newline at end of file +from msprobe.mindspore.common.utils import seed_all, MsprobeStep, MsprobeInitStep +from msprobe.mindspore.monitor.module_hook import TrainerMon + +os.environ["MS_HOOK_ENABLE"] = "on" diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_accuracy_checker.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_accuracy_checker.py index 98c6b4b98530ec447c2e239c11b5d4d7b927d874..557d731e042913da3a622035219ec8dea0409ab4 100644 --- a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_accuracy_checker.py +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_accuracy_checker.py @@ -16,7 +16,7 @@ import os from tqdm import tqdm -from msprobe.core.common.const import Const, CompareConst, MsCompareConst +from msprobe.core.common.const import Const, CompareConst from msprobe.core.common.file_utils import FileOpen, create_directory, write_csv, load_json, load_yaml from msprobe.core.common.utils import add_time_as_suffix from msprobe.mindspore.api_accuracy_checker.api_info import ApiInfo @@ -25,6 +25,7 @@ from msprobe.mindspore.api_accuracy_checker.base_compare_algorithm import compar from msprobe.mindspore.api_accuracy_checker.data_manager import DataManager from msprobe.mindspore.api_accuracy_checker.utils import (check_and_get_from_json_dict, global_context, trim_output_compute_element_list) +from msprobe.mindspore.common.const import MsCompareConst from msprobe.mindspore.common.log import logger from msprobe.mindspore.api_accuracy_checker import torch_mindtorch_importer @@ -156,6 +157,7 @@ class ApiAccuracyChecker: real_api_str = Const.SEP.join(api_name_str_list[1:-2]) api_list = load_yaml(yaml_path) supported_tensor_api_list = api_list.get(MsCompareConst.SUPPORTED_TENSOR_LIST_KEY) + supported_fusion_api_list = MsCompareConst.SUPPORTED_FUSION_LIST if api_type_str in (MsCompareConst.MINT, MsCompareConst.MINT_FUNCTIONAL) \ and global_context.get_framework() == Const.MS_FRAMEWORK: return True @@ -165,6 +167,9 @@ class ApiAccuracyChecker: if api_type_str == MsCompareConst.TENSOR_API and real_api_str in supported_tensor_api_list \ and global_context.get_framework() == Const.MS_FRAMEWORK: return True + if api_type_str == MsCompareConst.FUNCTIONAL_API and real_api_str in supported_fusion_api_list \ + and global_context.get_framework() == Const.MS_FRAMEWORK: + return True return False def parse(self, api_info_path): diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_runner.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_runner.py index f42702be0b114e40e5e31dc4326bd9ca21f82202..36e506f67737cdea4452ba27f4fad0524d4c2884 100644 --- a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_runner.py +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/api_runner.py @@ -15,11 +15,13 @@ import mindspore from mindspore import ops -from msprobe.core.common.const import Const, MsCompareConst +from msprobe.core.common.const import Const from msprobe.core.common.exceptions import ApiAccuracyCheckerException from msprobe.mindspore.api_accuracy_checker.compute_element import ComputeElement from msprobe.mindspore.api_accuracy_checker.type_mapping import float_dtype_str_list, torch_dtype_to_dtype_str from 
msprobe.mindspore.api_accuracy_checker.utils import convert_to_tuple +from msprobe.mindspore.api_accuracy_checker.bench_functions.fusion_operator import fusion +from msprobe.mindspore.common.const import MsCompareConst from msprobe.mindspore.common.log import logger @@ -64,7 +66,9 @@ api_parent_module_mapping = { (MsCompareConst.MINDTORCH_FUNC, Const.MT_FRAMEWORK): mindtorch_func, (MsCompareConst.MINDTORCH_FUNC, Const.PT_FRAMEWORK): torch.nn.functional, (MsCompareConst.MINDTORCH_DIST, Const.MT_FRAMEWORK): mindtorch_dist, - (MsCompareConst.MINDTORCH_DIST, Const.PT_FRAMEWORK): torch.distributed + (MsCompareConst.MINDTORCH_DIST, Const.PT_FRAMEWORK): torch.distributed, + (MsCompareConst.FUNCTIONAL_API, Const.MS_FRAMEWORK): mindspore.ops, + (MsCompareConst.FUSION_API, Const.PT_FRAMEWORK): fusion } @@ -83,7 +87,9 @@ api_parent_module_str_mapping = { (MsCompareConst.MINDTORCH_FUNC, Const.MT_FRAMEWORK): "mindtorch_func", (MsCompareConst.MINDTORCH_FUNC, Const.PT_FRAMEWORK): "torch.nn.functional", (MsCompareConst.MINDTORCH_DIST, Const.MT_FRAMEWORK): "mindtorch_dist", - (MsCompareConst.MINDTORCH_DIST, Const.PT_FRAMEWORK): "torch.distributed" + (MsCompareConst.MINDTORCH_DIST, Const.PT_FRAMEWORK): "torch.distributed", + (MsCompareConst.FUNCTIONAL_API, Const.MS_FRAMEWORK): "mindspore.ops", + (MsCompareConst.FUSION_API, Const.PT_FRAMEWORK): "fusion" } @@ -125,7 +131,8 @@ class ApiRunner: err_msg = f"ApiRunner.get_info_from_name failed: api_name_str: {api_name_str} is not in defined format" logger.error_log_with_exp(err_msg, ApiAccuracyCheckerException(ApiAccuracyCheckerException.WrongValue)) api_type_str, api_sub_name = api_name_list[0], api_name_list[1] - if api_type_str not in [MsCompareConst.MINT, MsCompareConst.MINT_FUNCTIONAL, MsCompareConst.TENSOR_API] \ + if api_type_str not in [MsCompareConst.MINT, MsCompareConst.MINT_FUNCTIONAL, MsCompareConst.TENSOR_API, + MsCompareConst.FUNCTIONAL_API] \ and api_platform == Const.MS_FRAMEWORK: err_msg = f"ApiRunner.get_info_from_name failed: not mint, mint.nn.functional or Tensor api" logger.error_log_with_exp(err_msg, ApiAccuracyCheckerException(ApiAccuracyCheckerException.WrongValue)) @@ -139,9 +146,9 @@ class ApiRunner: def get_api_instance(api_type_str, api_sub_name, api_platform): """ Args: - api_type_str: str, Union["MintFunctional", "Mint", "Tensor"] + api_type_str: str, Union["MintFunctional", "Mint", "Tensor", "Functional"] api_sub_name: str, e.g. 
"relu" - api_platform: str: Union["mindpore", "torch"] + api_platform: str: Union["mindpore", "pytorch"] Return: api_instance: function object @@ -151,9 +158,12 @@ class ApiRunner: mindspore.mint.{api_sub_name} <--> torch.{api_sub_name} mindspore.mint.nn.functional.{api_sub_name} <--> torch.nn.functional.{api_sub_name} """ - - api_parent_module = api_parent_module_mapping.get((api_type_str, api_platform)) - api_parent_module_str = api_parent_module_str_mapping.get((api_type_str, api_platform)) + if api_sub_name in MsCompareConst.SUPPORTED_FUSION_LIST and api_platform == "pytorch": + api_parent_module = api_parent_module_mapping.get((MsCompareConst.FUSION_API, api_platform)) + api_parent_module_str = api_parent_module_str_mapping.get((MsCompareConst.FUSION_API, api_platform)) + else: + api_parent_module = api_parent_module_mapping.get((api_type_str, api_platform)) + api_parent_module_str = api_parent_module_str_mapping.get((api_type_str, api_platform)) full_api_name = api_parent_module_str + Const.SEP + api_sub_name if not hasattr(api_parent_module, api_sub_name): diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/base_compare_algorithm.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/base_compare_algorithm.py index ead03d25ea5c2e6bb0422486f1939c5b31ee589b..da2f8ad612fcf3a42083894ff1b8e56db757f919 100644 --- a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/base_compare_algorithm.py +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/base_compare_algorithm.py @@ -18,9 +18,10 @@ from abc import ABC, abstractmethod import mindspore import numpy as np import torch -from msprobe.core.common.const import CompareConst, MsCompareConst +from msprobe.core.common.const import CompareConst from msprobe.core.common.exceptions import ApiAccuracyCheckerException from msprobe.mindspore.common.log import logger +from msprobe.mindspore.common.const import MsCompareConst class CompareResult: diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/bench_functions/flash_attention_score.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/bench_functions/flash_attention_score.py new file mode 100644 index 0000000000000000000000000000000000000000..cb268efeae90a51465493c65caa948045bae4913 --- /dev/null +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/bench_functions/flash_attention_score.py @@ -0,0 +1,602 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+from collections import namedtuple
+import torch
+import torch.nn as nn
+import numpy as np
+
+from einops import rearrange
+
+
+from msprobe.pytorch.common.utils import logger
+
+GTYPE = torch.float64  # float64 is required on ARM hosts; float32 (or float64) is enough on x86. ARM is slow, so x86 is recommended for s=8k cases
+SOFTMAX_BUILD_MODE = "QKV"  # "MAX_SUM"
+
+FaForwardParams = namedtuple("FaForwardParams",
+                             ["q", "k", "v", "drop_mask", "attn_mask", "pse", "scalar_value", "keep_prob"])
+FaBackwardParams = namedtuple("FaBackwardParams",
+                              ["dx", "q", "k", "v", "softmax_res", "drop_mask", "pse", "scalar_value", "keep_prob"])
+RebuildSoftmaxParams = namedtuple("RebuildSoftmaxParams",
+                                  ["q", "k", "attn_mask", "pse", "scalar_value", "softmax_max", "softmax_sum"])
+
+
+def softmax_forward(x):
+    x_max = torch.max(x, dim=-1, keepdims=True)[0]
+    x_sub = x.sub(x_max)
+    y = torch.exp(x_sub)
+    x_sum = y.sum(dim=-1, keepdims=True)
+    res = y.div(x_sum)
+    return res, x_max, x_sum
+
+
+def softmax_grad(dp, softmax_res):
+    muls = dp * softmax_res
+    muls_r = muls.sum(dim=-1, keepdims=True)
+    sub_r = dp - muls_r
+    res = sub_r * softmax_res
+    return res
+
+
+def broadcast_kv(num_heads, num_kv_heads, kv_tensor, dtype):
+    if num_kv_heads == 0 or num_kv_heads > num_heads:
+        raise ValueError("num_kv_heads must be non-zero and not greater than num_heads.")
+
+    factor = num_heads // num_kv_heads
+    kv_shape = kv_tensor.shape
+    b = kv_shape[0]
+    s = kv_shape[2]
+    d = kv_shape[3]
+    kv_res = torch.zeros([b, num_heads, s, d]).to(dtype)
+    for i in range(num_heads):
+        j = i // factor
+        kv_res[:, i:i + 1, :, :] = kv_tensor[:, j:j + 1, :, :]
+    return kv_res
+
+
+def calculate_qk(q, k, attn_mask, pse, scalar_value):
+    if k.dim() == 3:
+        k = k.unsqueeze(1)  # expand the head dimension first so 3-D inputs pass the check below
+
+    if k.dim() != 4:
+        raise ValueError(f"k tensor dimension must be 4, but got {k.dim()} dimensions (shape: {k.shape})")
+
+    if pse is None or len(pse.shape) == 0:
+        qk = torch.matmul(q, k.permute(0, 1, 3, 2)).mul(scalar_value)
+    else:
+        qk = (torch.matmul(q, k.permute(0, 1, 3, 2)) + pse).mul(scalar_value)
+    if attn_mask is None or len(attn_mask.shape) == 0:
+        return qk
+    else:
+        qk = qk + attn_mask.bool() * (-40000.0)  # -10000
+        return qk
+
+
+def fusion_attention_forward(forward_params):
+    q = forward_params.q
+    k = forward_params.k
+    v = forward_params.v
+    drop_mask = forward_params.drop_mask
+    attn_mask = forward_params.attn_mask
+    pse = forward_params.pse
+    scalar_value = forward_params.scalar_value
+    keep_prob = forward_params.keep_prob
+
+    qk = calculate_qk(q, k, attn_mask, pse, scalar_value)
+    softmax_res, softmax_max, softmax_sum = softmax_forward(qk)
+    if drop_mask is None or len(drop_mask.shape) == 0:
+        drop_res = softmax_res
+    else:
+        drop_res = softmax_res * drop_mask * (1.0 / keep_prob)
+    y = torch.matmul(drop_res, v)
+    return y, softmax_max, softmax_sum
+
+
+def fusion_attention_backward(backward_params):
+    dx = backward_params.dx
+    q = backward_params.q
+    k = backward_params.k
+    v = backward_params.v
+    softmax_res = backward_params.softmax_res
+    drop_mask = backward_params.drop_mask
+    pse = backward_params.pse
+    scalar_value = backward_params.scalar_value
+    keep_prob = backward_params.keep_prob
+    dp = torch.matmul(dx, v.permute(0, 1, 3, 2))
+    if drop_mask is None or len(drop_mask.shape) == 0:
+        drop_res = softmax_res.permute(0, 1, 3, 2)
+        dp_drop = dp
+    else:
+        drop_res = softmax_res.mul(drop_mask).mul(1.0 / keep_prob).permute(0, 1, 3, 2)
+        dp_drop = dp * drop_mask * (1.0 / keep_prob)
+    dv = torch.matmul(drop_res, dx)
+    softmax_grad_res = (softmax_grad(dp_drop, softmax_res) * scalar_value)
+    dq 
= torch.matmul(softmax_grad_res, k) + dk = torch.matmul(softmax_grad_res.permute(0, 1, 3, 2), q) + return dq, dk, dv + + +def parse_bsnd_args(query, key, head_num, input_layout): + supported_input_layout = ["BSH", "SBH", "BSND", "BNSD", "TND"] + b, s1, s2, n1, n2, d, h1, h2 = None, None, None, head_num, None, None, None, None + + if not isinstance(input_layout, str) or input_layout not in supported_input_layout: + raise ValueError(f"Invalid input_layout arg which must be one of {supported_input_layout}.") + + if input_layout == "TND": + raise ValueError(f"input_layout {input_layout} does not supported for now.") + try: + if input_layout == "BSH": + b, s1, h1 = query.shape + _, s2, h2 = key.shape + d = h1 // n1 + n2 = h2 // d + elif input_layout == "SBH": + s1, b, h1 = query.shape + s2, _, h2 = key.shape + d = h1 // n1 + n2 = h2 // d + elif input_layout == "BSND": + b, s1, n1, d = query.shape + _, s2, n2, _ = key.shape + h1 = n1 * d + h2 = n2 * d + elif input_layout == "BNSD": + b, n1, s1, d = query.shape + _, n2, s2, _ = key.shape + h1 = n1 * d + h2 = n2 * d + except Exception as e: + raise ValueError(f"query.shape: {query.shape}, key.shape: {key.shape}, parse_bsnd_args error: {e}") from e + + if d == 0: + raise ValueError(f"Value d must be non-zero.") + _dtype = query.dtype + ret = (b, s1, s2, n1, n2, d, h1, h2, _dtype) + return ret + + +def convert_from_bnsd(_input, input_layout): + """ + transform qkv from bnsd to input_layout. + B: batch_size + S: sequence_length + N: num_heads + D: head_dim + Args: + _input (torch.Tensor): tensor of shape (B,N,S,D) + input_layout (str): "BSH" or "SBH" or "BSND" or "BNSD" or "TND" + Returns: + tensor of shape (B,N,S,D) or (B,S,N,D) or (S,B,H) or (B,S,H) + """ + if input_layout == "BSH": + # (B,N,S,D)=>(B,S,N*D) + out = rearrange(_input, 'b n s d -> b s (n d)').contiguous() + elif input_layout == "SBH": + # (B,N,S,D)=>(S,B,N*D) + out = rearrange(_input, 'b n s d -> s b (n d)').contiguous() + elif input_layout == "BSND": + # (B,N,S,D)=>(B,S,N,D) + out = rearrange(_input, 'b n s d -> b s n d').contiguous() + elif input_layout == "TND": + raise ValueError(f"input_layout {input_layout} does not supported for now.") + else: + out = _input + return out + + +def convert_to_bnsd(_input, n, input_layout): + """ + transform qkv from input_layout to bnsd. + B: batch_size + S: sequence_length + N: num_heads + D: head_dim + Args: + _input (torch.Tensor): tensor of shape (B,N,S,D) or (B,S,N,D) or (S,B,H) or (B,S,H) + n (int): num_heads + input_layout (str):"BSH" or "SBH" or "BSND" or "BNSD" or "TND" + Returns: + tensor of shape (B,N,S,D) + """ + if input_layout == "BSH": + # (B,S,N*D)=>(B,N,S,D) + out = rearrange(_input, 'b s (n d) -> b n s d', n=n) + elif input_layout == "SBH": + # (S,B,N*D)=>(B,N,S,D) + out = rearrange(_input, 's b (n d) -> b n s d', n=n) + elif input_layout == "BSND": + # (B,S,N,D)=>(B,N,S,D) + out = rearrange(_input, 'b s n d -> b n s d', n=n) + elif input_layout == "TND": + raise ValueError(f"input_layout {input_layout} does not supported for now.") + else: + out = _input + if out.dim() != 4: + raise ValueError(f"convert qkv format failed with input_layout {input_layout}.") + return out.to(GTYPE) + + +def convert_from_bsnd(_input, input_layout): + """ + transform qkv from bsnd to input_layout. 
+ B: batch_size + S: sequence_length + N: num_heads + D: head_dim + Args: + _input (torch.Tensor): tensor of shape (B,S,N,D) + input_layout (str): "BSH" or "SBH" or "BSND" or "BNSD" or "TND" + Returns: + tensor of shape (B,N,S,D) or (B,S,N,D) or (S,B,H) or (B,S,H) + """ + if input_layout == "BSH": + # (B,S,N,D)=>(B,S,N*D) + out = rearrange(_input, 'b s n d -> b s (n d)').contiguous() + elif input_layout == "SBH": + # (B,S,N,D)=>(S,B,N*D) + out = rearrange(_input, 'b s n d -> s b (n d)').contiguous() + elif input_layout == "BNSD": + # (B,S,N,D)=>(B,N,S,D) + out = rearrange(_input, 'b s n d -> b n s d').contiguous() + elif input_layout == "TND": + raise ValueError(f"input_layout {input_layout} does not supported for now.") + else: + out = _input + return out + + +def convert_to_bsnd(_input, n, input_layout): + """ + transform qkv from input_layout to bsnd. + B: batch_size + S: sequence_length + N: num_heads + D: head_dim + Args: + _input (torch.Tensor): tensor of shape (B,N,S,D) or (B,S,N,D) or (S,B,H) or (B,S,H) + n (int): num_heads + input_layout (str):"BSH" or "SBH" or "BSND" or "BNSD" or "TND" + Returns: + tensor of shape (B,S,N,D) + """ + if input_layout == "BSH": + # (B,S,N*D)=>(B,S,N,D) + out = rearrange(_input, 'b s (n d) -> b s n d', n=n) + elif input_layout == "SBH": + # (S,B,N*D)=>(B,S,N,D) + out = rearrange(_input, 's b (n d) -> b s n d', n=n) + elif input_layout == "BNSD": + # (B,N,S,D)=>(B,S,N,D) + out = rearrange(_input, 'b n s d -> b s n d', n=n) + elif input_layout == "TND": + raise ValueError(f"input_layout {input_layout} does not supported for now.") + else: + out = _input + if out.dim() != 4: + raise ValueError(f"convert qkv format failed with input_layout {input_layout}.") + return out + + +def generate_attn_mask(*args): + """ + # 当sparse_mode=2、3、4时小算子到融合算子会走这个优化,反过来看就要拆解回原来的基本实现 + ===> attn_mask = torch.from_numpy(np.triu(np.ones([2048, 2048]), k=1)).to(dtype) + """ + + sparse_mode, attn_mask, b, n1, s1, s2, pre_tocken, next_tocken, dtype = args + shape = [s1, s2] + + if attn_mask is not None: + # 当FA的输入已经包含attn_mask时,可以认为已经是转换之后的mask矩阵了,有三种特殊场景,即稀疏矩阵场景,需要进行逆向还原 + if sparse_mode == 2 or sparse_mode == 3 or sparse_mode == 4: + logger.info(f"s1: {s1}, s2:{s2}, attn_mask.shape:{attn_mask.shape}, attn_mask.dtype:{attn_mask.dtype}") + + if attn_mask.dim() == 2 and attn_mask.shape[0] == 2048 and attn_mask.shape[1] == 2048: + if attn_mask.equal(torch.from_numpy(np.triu(np.ones([2048, 2048]), k=1)).to(attn_mask.dtype)): + if sparse_mode == 2: + attn_mask = torch.from_numpy(np.triu(np.ones(shape), k=1)) + elif sparse_mode == 3: + attn_mask = torch.from_numpy(np.triu(np.ones(shape), k=s2 - s1 + 1)) + elif sparse_mode == 4: + attn_mask_u = torch.from_numpy(np.triu(np.ones(shape), k=next_tocken + 1)) + attn_mask_l = torch.from_numpy(np.tril(np.ones(shape), k=-pre_tocken - 1)) + attn_mask = attn_mask_u + attn_mask_l + logger.debug(f"反向转换attn_mask {attn_mask.shape}") + return attn_mask.to(dtype) + + return attn_mask.to(dtype) + + if attn_mask is not None: + if attn_mask.dim() == 2: + if attn_mask.shape[0] != s1 or attn_mask.shape[1] != s2: + raise ValueError(f"Invalid attn_mask shape `SS` {attn_mask.shape}") + shape = [s1, s2] + elif attn_mask.dim() == 4: + if attn_mask.shape[1] == 1: + shape = [b, 1, s1, s2] if b != 1 else [1, 1, s1, s2] + else: + shape = [b, n1, s1, s2] if b != 1 else [1, n1, s1, s2] + + if sparse_mode == 0: + attn_mask_u = torch.from_numpy(np.triu(np.ones(shape), k=next_tocken + 1)) + attn_mask_l = torch.from_numpy(np.tril(np.ones(shape), k=-pre_tocken - 1)) + 
attn_mask = attn_mask_u + attn_mask_l + elif sparse_mode == 1: # no sparse + attn_mask = torch.from_numpy(np.zeros(shape)) + elif sparse_mode == 2: + attn_mask = torch.from_numpy(np.triu(np.ones(shape), k=1)) + elif sparse_mode == 3: + attn_mask = torch.from_numpy(np.triu(np.ones(shape), k=s2 - s1 + 1)) + elif sparse_mode == 4: + attn_mask_u = torch.from_numpy(np.triu(np.ones(shape), k=next_tocken + 1)) + attn_mask_l = torch.from_numpy(np.tril(np.ones(shape), k=-pre_tocken - 1)) + attn_mask = attn_mask_u + attn_mask_l + # 注:不会出现sparse_mode=5的情况,该情况要求必须要传入attn_mask,且attn_mask矩阵数据格式须为BNSS或B1SS, + # 因此可以认为FA的输入已经是正确的attn_mask了 + return attn_mask.to(dtype) + + +def generate_kv(key, value, n1, n2): + # N不等长适配by cdy + if not (n1 == n2): + k_new = broadcast_kv(n1, n2, key, key.dtype) + v_new = broadcast_kv(n1, n2, value, value.dtype) + else: + k_new = key + v_new = value + return k_new, v_new + + +def rebuid_softmax_by_qkv(q, k, attn_mask, pse, scalar_value): + """ + attention = softmax(QK^T/sqrt(d))V + softmax(x_i) = e^(x_i - x_max) / sum(e^(x_i - x_max)) + """ + logger.info("Using QKV to rebuild original softmax") + qk = calculate_qk(q, k, attn_mask, pse, scalar_value) + softmax_res, _, _ = softmax_forward(qk) + return softmax_res + + +def rebuild_softmax_by_max_sum(softmax_params): + """ + attention = softmax(QK^T/sqrt(d))V + softmax(x_i) = e^(x_i - x_max_i) / x_sum_i) + """ + q = softmax_params.q + k = softmax_params.k + attn_mask = softmax_params.attn_mask + pse = softmax_params.pse + scalar_value = softmax_params.scalar_value + softmax_max = softmax_params.softmax_max + softmax_sum = softmax_params.softmax_sum + logger.info("Using softmax_max and softmax_sum to rebuild original softmax") + + qk = calculate_qk(q, k, attn_mask, pse, scalar_value) + if softmax_max.shape[-1] == 0: + raise ValueError(f"softmax_max.shape[-1] must be non-zero, softmax_max.shape: {softmax_max.shape}") + repeat_dim = qk.shape[-1] // softmax_max.shape[-1] + softmax_res = torch.exp(qk.sub(softmax_max.repeat(1, 1, 1, repeat_dim))).div( + softmax_sum.repeat(1, 1, 1, repeat_dim)) + return softmax_res + + +def get_head_num(*args, **kwargs): + if kwargs.get("head_num", None): + head_num = kwargs.get("head_num") + elif len(args) >= 4: + head_num = args[3] + else: + raise ValueError(f"Unsupported npu_fusion_attention args {args}.") + return head_num + + +def get_input_layout(*args, **kwargs): + if kwargs.get("input_layout", None): + input_layout = kwargs.get("input_layout") + elif len(args) >= 5: + input_layout = args[4] + else: + raise ValueError(f"Unsupported npu_fusion_attention args {args}.") + return input_layout + + +def npu_fusion_attention_forward_patch(*args, **kwargs): + if len(args) < 2: + raise RuntimeError("npu_fusion_attention_forward_patch: length of args should greater than or equal to 2.") + + # query, key, value, head_num, input_layout + head_num = get_head_num(*args, **kwargs) + input_layout = get_input_layout(*args, **kwargs) + + b, s1, s2, n1, n2, d, h1, h2, dtype = parse_bsnd_args(args[0], args[1], head_num, input_layout) + if n1 == n2 and s1 == s2: + logger.debug(f"running case : BNSD = {b}_{n1}_{s1}_{d}, sparse = {kwargs.get('sparse_mode', 0)}") + else: + logger.debug(f"running case: BNSD = {b}_{n1}({n2})_{s1}({s2})_{d}, sparse = {kwargs.get('sparse_mode', 0)}") + if not (n1 % n2 == 0 and n1 >= n2): + raise ValueError(f"N1与N2不匹配,请检查: n1 = {n1}, n2 = {n2}.") + + dims_kwargs = { + "b": b, "s1": s1, "s2": s2, "n1": n1, "n2": n2, + "d": d, "h1": h1, "h2": h2, "dtype": dtype + } + new_kwargs = { + 
"keep_prob": 1, + "scalar_value": kwargs.get("scalar_value", 1 / (d ** 0.5)), + "sparse_mode": kwargs.get("sparse_mode", 0), + "prefix": kwargs.get("prefix"), + "pre_tockens": kwargs.get("pre_tockens", 2147483647), + "next_tockens": kwargs.get("next_tockens", 2147483647), + "pse": kwargs.get("pse"), + "padding_mask": kwargs.get("padding_mask"), + "attn_mask": kwargs.get("attn_mask") + } + + return args, dims_kwargs, new_kwargs + + +def npu_fusion_attention_backward_patch(*args, **kwargs): + if len(args) != 6: + raise ValueError(f"Unsupported npu_fusion_attention_grad args {args}.") + + b, s1, s2, n1, n2, d, h1, h2, dtype = parse_bsnd_args(args[0], args[1], args[4], args[5]) + if n1 == n2 and s1 == s2: + logger.info(f"running case : bnsd = {b}_{n1}_{s1}_{d}, sparse = {kwargs.get('sparse_mode', 0)}") + else: + logger.info(f"running case: bnsd = {b}_{n1}({n2})_{s1}({s2})_{d}, sparse = {kwargs.get('sparse_mode', 0)}") + if not (n1 % n2 == 0 and n1 >= n2): + raise ValueError(f"N1与N2不匹配,请检查: n1 = {n1}, n2 = {n2}.") + + dims_kwargs = { + "b": b, "s1": s1, "s2": s2, "n1": n1, "n2": n2, + "d": d, "h1": h1, "h2": h2, "dtype": dtype + } + + new_kwargs = { + "keep_prob": 1, + "scalar_value_value": kwargs.get("scalar_value_value", 1 / (d ** 0.5)), + "sparse_mode": kwargs.get("sparse_mode", 0), + "prefix": kwargs.get("prefix"), + "pre_tockens": kwargs.get("pre_tockens", 2147483647), + "next_tockens": kwargs.get("next_tockens", 2147483647), + "pse": kwargs.get("pse"), + "padding_mask": kwargs.get("padding_mask"), + "softmax_max": kwargs.get("softmax_max"), + "softmax_sum": kwargs.get("softmax_sum"), + "softmax_in": kwargs.get("softmax_in"), + "attention_in": kwargs.get("attention_in"), + "seed": kwargs.get("seed", 0), + "offset": kwargs.get("offset", 0), + "numels": kwargs.get("numels", 0), + "attn_mask": kwargs.get("attn_mask") + } + + return args, dims_kwargs, new_kwargs + + +class FlashAttentionScore(nn.Module): + def __init__(self): + super(FlashAttentionScore, self).__init__() + # You can initialize any parameters here if necessary + + def forward(self, *inputs, **kwargs): + # Extract the inputs for the attention calculation + new_args, dims_kwargs, new_kwargs = npu_fusion_attention_forward_patch(*inputs, **kwargs) + query, key, value = new_args[0], new_args[1], new_args[2] + + input_layout = get_input_layout(*inputs, **kwargs) + + n1 = dims_kwargs.get("n1") + n2 = dims_kwargs.get("n2") + s1 = dims_kwargs.get("s1") + s2 = dims_kwargs.get("s2") + b = dims_kwargs.get("b") + dtype = dims_kwargs.get("dtype") + attn_mask = new_kwargs.get("attn_mask") + keep_prob = new_kwargs.get("keep_prob") + sparse_mode = new_kwargs.get("sparse_mode") + pre_tockens = new_kwargs.get("pre_tockens") + next_tockens = new_kwargs.get("next_tokens") + pse = new_kwargs.get("real_shift") + scalar_value = new_kwargs.get("scalar_value") + + args_temp = [sparse_mode, attn_mask, b, n1, s1, s2, pre_tockens, next_tockens, dtype] + + attn_mask = generate_attn_mask(*args_temp) + query = convert_to_bnsd(query, n1, input_layout) + key = convert_to_bnsd(key, n2, input_layout) + value = convert_to_bnsd(value, n2, input_layout) + + forward_params = FaForwardParams( + q=query, + k=key, + v=value, + drop_mask=None, + attn_mask=attn_mask, + pse=pse, + scalar_value=scalar_value, + keep_prob=keep_prob + ) + + out_golden, softmax_max, softmax_sum = fusion_attention_forward(forward_params) + + # If output dimension is 5, reshape accordingly + if out_golden.dim() == 5: + out_golden = out_golden.reshape(out_golden.size(0), + out_golden.size(1) * 
out_golden.size(2), + out_golden.size(3), out_golden.size(4)) + + out_golden = convert_from_bnsd(out_golden, input_layout) + + # Ensure the output matches the desired layout + out_golden = out_golden.cpu(), softmax_max.repeat(1, 1, 1, 8).cpu(), softmax_sum.repeat(1, 1, 1, 8).cpu() + + return out_golden + + def backward(self, *inputs, **kwargs): + # The backward pass will be similar to what was described for the gradient computation + new_args, dims_kwargs, new_kwargs = npu_fusion_attention_backward_patch(*inputs, **kwargs) + query, key, value, dx, input_layout = new_args[0], new_args[1], new_args[2], new_args[3], new_args[5] + n1 = dims_kwargs.get("n1") + n2 = dims_kwargs.get("n2") + s1 = dims_kwargs.get("s1") + s2 = dims_kwargs.get("s2") + b = dims_kwargs.get("b") + dtype = dims_kwargs.get("dtype") + attn_mask = new_kwargs.get("attn_mask") + keep_prob = new_kwargs.get("keep_prob") + sparse_mode = new_kwargs.get("sparse_mode") + pre_tockens = new_kwargs.get("pre_tockens") + next_tockens = new_kwargs.get("next_tockens") + pse = new_kwargs.get("pse") + softmax_max = new_kwargs.get("softmax_max") + softmax_sum = new_kwargs.get("softmax_sum") + scalar_value = new_kwargs.get("scalar_value") + + args_temp = [sparse_mode, attn_mask, b, n1, s1, s2, pre_tockens, next_tockens, dtype] + attn_mask = generate_attn_mask(*args_temp) + + query = convert_to_bnsd(query, n1, input_layout) + dx = convert_to_bnsd(dx, n1, input_layout) + key = convert_to_bnsd(key, n2, input_layout) + value = convert_to_bnsd(value, n2, input_layout) + + k_new, v_new = generate_kv(key, value, n1, n2) + + if SOFTMAX_BUILD_MODE == "QKV": + softmax_res = rebuid_softmax_by_qkv(query, k_new, attn_mask, pse, scalar_value) + else: + softmax_params = RebuildSoftmaxParams(query, k_new, attn_mask, pse, scalar_value, softmax_max, softmax_sum) + softmax_res = rebuild_softmax_by_max_sum(softmax_params) + + backward_params = FaBackwardParams(dx, query, k_new, v_new, softmax_res, None, pse, scalar_value, keep_prob) + dq, dk, dv = fusion_attention_backward(backward_params) + + # Reshape as needed + if dq.dim() == 5: + dq = dq.reshape(dq.size(0), dq.size(1) * dq.size(2), dq.size(3), dq.size(4)) + if dk.dim() == 5: + dk = dk.reshape(dk.size(0), dk.size(1) * dk.size(2), dk.size(3), dk.size(4)) + if dv.dim() == 5: + dv = dv.reshape(dv.size(0), dv.size(1) * dv.size(2), dv.size(3), dv.size(4)) + + dq = convert_from_bnsd(dq, input_layout) + dk = convert_from_bnsd(dk, input_layout) + dv = convert_from_bnsd(dv, input_layout) + + return dq.cpu(), dk.cpu(), dv.cpu() diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_tensor.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/bench_functions/fusion_operator.py similarity index 30% rename from debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_tensor.py rename to debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/bench_functions/fusion_operator.py index f93c09a12415f22d96306ebc9de919520c025236..e1344541e89c4dafd9d49d63e3fdea117366bdd9 100644 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_tensor.py +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/bench_functions/fusion_operator.py @@ -13,57 +13,29 @@ # See the License for the specific language governing permissions and # limitations under the License. 
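The golden implementation above rebuilds the softmax matrix in the backward pass either from QKV directly (`SOFTMAX_BUILD_MODE = "QKV"`) or from the `softmax_max`/`softmax_sum` statistics saved by the fused kernel. A small torch-only check of the identity that `rebuild_softmax_by_max_sum` relies on (shapes chosen arbitrarily for illustration):

```python
import torch

torch.manual_seed(0)
qk = torch.randn(1, 2, 4, 16, dtype=torch.float64)  # (B, N, S1, S2) attention scores

# The fused kernel keeps only the row-wise max and the exp-sum of each row.
softmax_max = qk.max(dim=-1, keepdim=True).values
softmax_sum = torch.exp(qk - softmax_max).sum(dim=-1, keepdim=True)

# Rebuild softmax as exp(qk - max) / sum, broadcasting the (S1, 1) statistics
# over the S2 axis; repeat(1, 1, 1, repeat_dim) mirrors rebuild_softmax_by_max_sum.
repeat_dim = qk.shape[-1] // softmax_max.shape[-1]
rebuilt = torch.exp(qk - softmax_max.repeat(1, 1, 1, repeat_dim)).div(
    softmax_sum.repeat(1, 1, 1, repeat_dim))

assert torch.allclose(rebuilt, torch.softmax(qk, dim=-1), atol=1e-12)
```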
-import os +from msprobe.mindspore.api_accuracy_checker.bench_functions.flash_attention_score import FlashAttentionScore -import torch -from msprobe.pytorch.hook_module.hook_module import HOOKModule -from msprobe.pytorch.common.utils import torch_device_guard, parameter_adapter -from msprobe.core.common.const import Const -from msprobe.core.common.file_utils import load_yaml +class FusionOperator: + """ + 所有融合算子的父类,定义了通用的接口和属性。 + """ + # 初始化操作符字典 + def __init__(self): + self.flash_attention_score = None # 用于存放 FlashAttentionScore 操作符 + self._register_operators() -cur_path = os.path.dirname(os.path.realpath(__file__)) -yaml_path = os.path.join(cur_path, "support_wrap_ops.yaml") + def __getattr__(self, name): + """ 动态获取算子类 """ + if hasattr(self, name): + return getattr(self, name) + else: + raise AttributeError(f"'FusionOperator' object has no attribute '{name}'") + def _register_operators(self): + """ 注册操作符到父类,以便通过 fusion.xxx 调用 """ + self.flash_attention_score = FlashAttentionScore() -def get_tensor_ops(): - _tensor_ops = dir(torch.Tensor) - yaml_data = load_yaml(yaml_path) - wrap_tensor_ops = yaml_data.get('tensor') - return set(wrap_tensor_ops) & set(_tensor_ops) - -TensorOps = {op: getattr(torch.Tensor, op) for op in get_tensor_ops()} - - -class HOOKTensor(object): - pass - - -class TensorOPTemplate(HOOKModule): - - def __init__(self, op_name, hook, need_hook=True): - self.op_name_ = op_name - self.prefix_op_name_ = "Tensor" + Const.SEP + str(op_name) + Const.SEP - if need_hook: - super().__init__(hook) - - @torch_device_guard - @parameter_adapter - def forward(self, *args, **kwargs): - return TensorOps[str(self.op_name_)](*args, **kwargs) - - -def wrap_tensor_op(op_name, hook): - - def tensor_op_template(*args, **kwargs): - return TensorOPTemplate(op_name, hook)(*args, **kwargs) - - return tensor_op_template - - -def wrap_tensor_ops_and_bind(hook): - _tensor_ops = get_tensor_ops() - for op_name in _tensor_ops: - setattr(HOOKTensor, "wrap_" + str(op_name), wrap_tensor_op(op_name, hook)) +fusion = FusionOperator() diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/data_manager.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/data_manager.py index 748adf7d02cafe3983fe1990b40b1e77e993698b..fc2680d68a5697dae165c70a276b21038f87fbe0 100644 --- a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/data_manager.py +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/data_manager.py @@ -16,12 +16,13 @@ import os import csv -from msprobe.core.common.const import Const, CompareConst, MsCompareConst +from msprobe.core.common.const import Const, CompareConst from msprobe.core.common.file_utils import FileOpen, create_directory, write_csv, read_csv from msprobe.core.common.utils import add_time_as_suffix, MsprobeBaseException from msprobe.mindspore.api_accuracy_checker.base_compare_algorithm import compare_algorithms from msprobe.core.common.file_utils import check_file_or_directory_path from msprobe.mindspore.common.log import logger +from msprobe.mindspore.common.const import MsCompareConst class ResultCsvEntry: diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/multi_api_accuracy_checker.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/multi_api_accuracy_checker.py index e764140badf4c107ea83044353aba19a1c412fe0..1913675ad162bf690fc0aed5fc84c245ae4f73ca 100644 --- a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/multi_api_accuracy_checker.py +++ 
b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/multi_api_accuracy_checker.py @@ -27,10 +27,11 @@ import numpy as np from tqdm import tqdm # 本地应用/库特定导入 -from msprobe.core.common.const import Const, CompareConst, MsCompareConst +from msprobe.core.common.const import Const, CompareConst from msprobe.mindspore.api_accuracy_checker.api_accuracy_checker import ApiAccuracyChecker, BasicInfoAndStatus from msprobe.mindspore.api_accuracy_checker.multi_data_manager import MultiDataManager from msprobe.mindspore.common.log import logger +from msprobe.mindspore.common.const import MsCompareConst class MultiApiAccuracyChecker(ApiAccuracyChecker): diff --git a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/torch_mindtorch_importer.py b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/torch_mindtorch_importer.py index 84f2706cc55fa3d0a1fba13d54ba8310371f1a43..7b319382eb4eba4abac3bd6894cc3b0262032d88 100644 --- a/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/torch_mindtorch_importer.py +++ b/debug/accuracy_tools/msprobe/mindspore/api_accuracy_checker/torch_mindtorch_importer.py @@ -19,7 +19,8 @@ import sys from pathlib import Path import mindspore from msprobe.mindspore.common.log import logger -from msprobe.core.common.const import Const, CompareConst, MsCompareConst +from msprobe.core.common.const import Const, CompareConst +from msprobe.mindspore.common.const import MsCompareConst import torch as mindtorch from torch import Tensor as mindtorch_tensor import torch.nn.functional as mindtorch_func diff --git a/debug/accuracy_tools/msprobe/mindspore/common/const.py b/debug/accuracy_tools/msprobe/mindspore/common/const.py index 9e8c79e51284b8e9696dde150481609f7da8b488..b41dc5ce012dc5353a2f62607eabc604fda4eb3a 100644 --- a/debug/accuracy_tools/msprobe/mindspore/common/const.py +++ b/debug/accuracy_tools/msprobe/mindspore/common/const.py @@ -61,6 +61,7 @@ class Const: DROPOUT_API_NAME_PREFIX = "dropout" GRAPH_DATA_MODE_LIST = [CoreConst.ALL, CoreConst.INPUT, CoreConst.OUTPUT] + GRAPH_CELL_DUMP_DATA_MODE_LIST = [CoreConst.ALL, CoreConst.FORWARD, CoreConst.BACKWARD] HOOK_MS_PREFIX_DICT = { OPS_DATA_PREFIX: OPS_PREFIX, @@ -70,6 +71,67 @@ class Const: } +class MsCompareConst: + # api_info field + MINT = "Mint" + MINT_FUNCTIONAL = "MintFunctional" + TENSOR_API = "Tensor" + FUNCTIONAL_API = "Functional" + FUSION_API = "FUSION" + + API_NAME_STR_LENGTH = 4 + MAX_RECURSION_DEPTH = 20 + + # Mindtorch api_info field + MINDTORCH_TENSOR = "Tensor" + MINDTORCH = "Torch" + MINDTORCH_FUNC = "Functional" + MINDTORCH_NPU = "NPU" + MINDTORCH_DIST = "Distributed" + + + + MT_VALID_API_TYPES = [ + MINDTORCH, MINDTORCH_FUNC, MINDTORCH_TENSOR + ] + SUPPORTED_FUSION_LIST = ["flash_attention_score"] + + + TASK_FIELD = "task" + STATISTICS_TASK = "statistics" + FRAMEWORK = "framework" + TENSOR_TASK = "tensor" + DUMP_DATA_DIR_FIELD = "dump_data_dir" + DATA_FIELD = "data" + + # supported api yaml + SUPPORTED_API_LIST_FILE = "checker_support_api.yaml" + SUPPORTED_TENSOR_LIST_KEY = "tensor" + + # detail_csv + DETAIL_CSV_API_NAME = "API Name" + DETAIL_CSV_BENCH_DTYPE = "Bench Dtype" + DETAIL_CSV_TESTED_DTYPE = "Tested Dtype" + DETAIL_CSV_SHAPE = "Shape" + DETAIL_CSV_PASS_STATUS = "Status" + DETAIL_CSV_MESSAGE = "Message" + DETAIL_CSV_FILE_NAME = "accuracy_checking_details" + + # result_csv + RESULT_CSV_FORWARD_TEST_SUCCESS = "Forward Test Success" + RESULT_CSV_BACKWARD_TEST_SUCCESS = "Backward Test Success" + RESULT_CSV_FILE_NAME = "accuracy_checking_result" + + EPSILON = 1e-8 + + class 
ProcessStatus:
+        SUCCESS = "success"
+        API_NOT_FOUND = "api_not_found"
+        EXCEPTION_SKIP = "exception_skip"
+
+
+
+
 
 class FreeBenchmarkConst:
     ADD_NOISE = "add_noise"
     BIT_NOISE = "bit_noise"
diff --git a/debug/accuracy_tools/msprobe/mindspore/common/utils.py b/debug/accuracy_tools/msprobe/mindspore/common/utils.py
index ded3faaa22b565ef35c17a7596782976ddf9125d..625842da589a3090cddc75c50175ac577f1777b6 100644
--- a/debug/accuracy_tools/msprobe/mindspore/common/utils.py
+++ b/debug/accuracy_tools/msprobe/mindspore/common/utils.py
@@ -28,6 +28,30 @@ from msprobe.core.common.const import Const
 from msprobe.core.common.utils import CompareException, check_seed_all
 
 
+class MsprobeStep(ms.train.Callback):
+    def __init__(self, debugger):
+        super(MsprobeStep, self).__init__()
+        self.debugger = debugger
+
+    def on_train_step_begin(self, run_context):
+        self.debugger.start()
+
+    def on_train_step_end(self, run_context):
+        self.debugger.stop()
+        self.debugger.step()
+
+
+class MsprobeInitStep(ms.train.Callback):
+    def on_train_begin(self, run_context):
+        try:
+            from mindspore._c_expression import _set_init_iter
+        except ImportError:
+            logger.warning('MsprobeInitStep does not work on this version of MindSpore.')
+            return
+        cb_params = run_context.original_args()
+        _set_init_iter(cb_params.cur_step_num)
+
+
 def get_rank_if_initialized():
     if ms.communication.GlobalComm.INITED:
         return ms.communication.get_rank()
@@ -93,20 +117,6 @@ def seed_all(seed=1234, mode=False, rm_dropout=True):
         remove_dropout()
 
 
-class MsprobeStep(ms.train.Callback):
-
-    def __init__(self, debugger):
-        super(MsprobeStep, self).__init__()
-        self.debugger = debugger
-
-    def on_train_step_begin(self, run_context):
-        self.debugger.start()
-
-    def on_train_step_end(self, run_context):
-        self.debugger.stop()
-        self.debugger.step()
-
-
 class Dropout(ops.Dropout):
     def __init__(self, keep_prob=0.5, seed0=0, seed1=1):
         super().__init__(1., seed0, seed1)
@@ -169,7 +179,7 @@ def set_register_backward_hook_functions():
     from msprobe.mindspore.mindtorch import (_call_impl, register_full_backward_pre_hook,
                                              register_full_backward_hook)
 
-    if not hasattr(torch, "register_full_backward_hook"):
+    if not hasattr(torch.nn.Module, "register_full_backward_hook"):
         setattr(torch.nn.Module, "_call_impl", _call_impl)
         setattr(torch.nn.Module, "register_full_backward_pre_hook", register_full_backward_pre_hook)
         setattr(torch.nn.Module, "register_full_backward_hook", register_full_backward_hook)
@@ -182,9 +192,9 @@ def set_register_backward_hook_functions():
 
 def check_save_param(variable, name, save_backward):
     # try catch this api to skip invalid call
-    if not isinstance(variable, (list, dict, ms.Tensor, int, float, str)):
+    if not isinstance(variable, (list, dict, tuple, ms.Tensor, int, float, str)):
         logger.warning("PrecisionDebugger.save variable type not valid, "
-                       "should be one of list, dict, ms.Tensor, int, float or string. "
+                       "should be one of list, dict, tuple, ms.Tensor, int, float or string. "
                        "Skip current save process.")
         raise ValueError
     if not isinstance(name, str):
@@ -196,4 +206,4 @@ def check_save_param(variable, name, save_backward):
         logger.warning("PrecisionDebugger.save_backward name not valid, "
                        "should be bool. 
" "Skip current save process.") - raise ValueError \ No newline at end of file + raise ValueError diff --git a/debug/accuracy_tools/msprobe/mindspore/compare/ms_compare.py b/debug/accuracy_tools/msprobe/mindspore/compare/ms_compare.py index 8509a7f38add0c2e8d3f3638f4c247895e07bd6d..843afa1a98fcd890a7a7391c33f7e1a7b911f72b 100644 --- a/debug/accuracy_tools/msprobe/mindspore/compare/ms_compare.py +++ b/debug/accuracy_tools/msprobe/mindspore/compare/ms_compare.py @@ -22,10 +22,10 @@ import pandas as pd from msprobe.core.common.const import CompareConst, Const from msprobe.core.common.exceptions import FileCheckException -from msprobe.core.common.file_utils import FileOpen, create_directory, load_json, load_npy, load_yaml +from msprobe.core.common.file_utils import create_directory, load_json, load_npy, load_yaml from msprobe.core.common.log import logger from msprobe.core.common.utils import CompareException, check_compare_param, check_configuration_param, \ - check_op_str_pattern_valid, get_dump_mode, set_dump_path + check_op_str_pattern_valid, get_dump_mode, set_dump_path, detect_framework_by_dump_json from msprobe.core.compare.acc_compare import Comparator, ModeConfig from msprobe.core.compare.check import dtype_mapping from msprobe.core.compare.layer_mapping import generate_data_mapping_by_layer_mapping @@ -78,6 +78,11 @@ class MSComparator(Comparator): raise TypeError(f"The type of parameter `data_mapping` must be dict, str or None, but got " f"{type(self.data_mapping)}") + @staticmethod + def process_data_name(result): + result['data_name_x'] = result.apply(lambda row: [row['data_name_x'], row['data_name_y']], axis=1) + return result + def calc_accuracy(self, result_df, header): condition_no_bench = result_df[CompareConst.BENCH_NAME] == CompareConst.N_A result_df[condition_no_bench] = result_df[condition_no_bench].fillna(CompareConst.N_A) @@ -125,7 +130,8 @@ class MSComparator(Comparator): result_df.loc[warning_flag, CompareConst.RESULT] = CompareConst.WARNING result_df.loc[warning_flag, CompareConst.ERROR_MESSAGE] = 'Need double check api accuracy.' 
else: - fill_cols = [CompareConst.COSINE, CompareConst.MAX_ABS_ERR, CompareConst.MAX_RELATIVE_ERR, + fill_cols = [CompareConst.COSINE, CompareConst.EUC_DIST, + CompareConst.MAX_ABS_ERR, CompareConst.MAX_RELATIVE_ERR, CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO, CompareConst.ERROR_MESSAGE] result_df.loc[~condition_no_bench, fill_cols] = '' @@ -139,6 +145,8 @@ class MSComparator(Comparator): header.append(CompareConst.STACK) if self.dump_mode == Const.ALL: header.append(CompareConst.DATA_NAME) + result = self.process_data_name(result) + result.rename(columns={'op_name_x': CompareConst.NPU_NAME, 'op_name_y': CompareConst.BENCH_NAME, 'dtype_x': CompareConst.NPU_DTYPE, @@ -169,6 +177,7 @@ class MSComparator(Comparator): result[npu_summary] = result['summary_x'].apply(set_summary).tolist() result[bench_summary] = result['summary_y'].apply(set_summary).tolist() + result_df = pd.DataFrame(columns=header) for h in header: if h in result.columns: @@ -260,7 +269,8 @@ class MSComparator(Comparator): npu_df[CompareConst.COMPARE_SHAPE] = npu_df[Const.SHAPE] bench_df[CompareConst.COMPARE_KEY] = bench_df[CompareConst.OP_NAME] bench_df[CompareConst.COMPARE_SHAPE] = bench_df[Const.SHAPE] - match_result = pd.merge(npu_df, bench_df, on=[CompareConst.COMPARE_KEY, CompareConst.COMPARE_SHAPE], + match_result = pd.merge(npu_df, bench_df, on=([CompareConst.COMPARE_KEY] if self.fuzzy_match + else [CompareConst.COMPARE_KEY, CompareConst.COMPARE_SHAPE]), how='outer') match_result = match_result[match_result['op_name_x'].notna()].fillna(CompareConst.N_A) @@ -269,17 +279,18 @@ class MSComparator(Comparator): bench_dtype = match_result['dtype_y'] if self.cross_frame: npu_dtype = npu_dtype.map(dtype_mapping).fillna(npu_dtype) - return ((npu_dtype == bench_dtype) | - ((npu_dtype == Const.FLOAT16) & (bench_dtype == Const.FLOAT32)) | - ((npu_dtype == Const.FLOAT32) & (bench_dtype == Const.FLOAT16)) | - ((npu_dtype == Const.FLOAT16) & (bench_dtype == Const.BFLOAT16)) | - ((npu_dtype == Const.BFLOAT16) & (bench_dtype == Const.FLOAT16)) | - ((npu_dtype == Const.TORCH_FLOAT16) & (bench_dtype == Const.TORCH_FLOAT32)) | - ((npu_dtype == Const.TORCH_FLOAT32) & (bench_dtype == Const.TORCH_FLOAT16)) | - ((npu_dtype == Const.TORCH_FLOAT16) & (bench_dtype == Const.TORCH_BFLOAT16)) | - ((npu_dtype == Const.TORCH_BFLOAT16) & (bench_dtype == Const.TORCH_FLOAT16))) - - match_result.loc[~gen_dtype_condition(), [i + '_y' for i in bench_df.columns]] = CompareConst.N_A + + equal_condition = npu_dtype == bench_dtype + match_condition = ( + (npu_dtype.isin(CompareConst.DTYPE_MATCH_GROUPS[0]) & bench_dtype.isin( + CompareConst.DTYPE_MATCH_GROUPS[0])) | + (npu_dtype.isin(CompareConst.DTYPE_MATCH_GROUPS[1]) & bench_dtype.isin( + CompareConst.DTYPE_MATCH_GROUPS[1])) + ) + return equal_condition | match_condition + + if not self.fuzzy_match: + match_result.loc[~gen_dtype_condition(), [i + '_y' for i in bench_df.columns]] = CompareConst.N_A return self.make_result_df(match_result) def modify_compare_data_with_user_mapping(self, npu_df, bench_df): @@ -382,12 +393,11 @@ class MSComparator(Comparator): def check_cross_framework(bench_json_path): - pattern = r'"data_name":\s*"[^"]+\.pt"' - with FileOpen(bench_json_path, 'r') as file: - for line in file: - if re.search(pattern, line): - return True - return False + framework = detect_framework_by_dump_json(bench_json_path) + if framework == Const.PT_FRAMEWORK: + return True + else: + return False def ms_compare(input_param, output_path, **kwargs): diff --git 
a/debug/accuracy_tools/msprobe/mindspore/compare/ms_graph_compare.py b/debug/accuracy_tools/msprobe/mindspore/compare/ms_graph_compare.py index 701988ba483de4e13d85892dbb42d62c7cc805b8..153f4fd655212b24904c33d29dad694ee1dd2c1f 100644 --- a/debug/accuracy_tools/msprobe/mindspore/compare/ms_graph_compare.py +++ b/debug/accuracy_tools/msprobe/mindspore/compare/ms_graph_compare.py @@ -195,11 +195,12 @@ class GraphMSComparator: if not error_flag: result_list, err_msg = compare_ops_apply(n_value, b_value, False, "") result_dict[CompareConst.COSINE] = result_list[0] - result_dict[CompareConst.MAX_ABS_ERR] = result_list[1] - result_dict[CompareConst.MAX_RELATIVE_ERR] = result_list[2] - result_dict[CompareConst.ONE_THOUSANDTH_ERR_RATIO] = result_list[3] - result_dict[CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] = result_list[4] - result_dict[CompareConst.ACCURACY] = check_accuracy(result_list[0], result_list[1]) + result_dict[CompareConst.EUC_DIST] = result_list[1] + result_dict[CompareConst.MAX_ABS_ERR] = result_list[2] + result_dict[CompareConst.MAX_RELATIVE_ERR] = result_list[3] + result_dict[CompareConst.ONE_THOUSANDTH_ERR_RATIO] = result_list[4] + result_dict[CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] = result_list[5] + result_dict[CompareConst.ACCURACY] = check_accuracy(result_list[0], result_list[2]) result_dict[CompareConst.ERROR_MESSAGE] = err_msg return pd.Series(result_dict) diff --git a/debug/accuracy_tools/msprobe/mindspore/debugger/debugger_config.py b/debug/accuracy_tools/msprobe/mindspore/debugger/debugger_config.py index 92155b4ec4ebd636477ef67f1c75b43e7a82b802..558df954326aa276ce640bbc8319237c02398885 100644 --- a/debug/accuracy_tools/msprobe/mindspore/debugger/debugger_config.py +++ b/debug/accuracy_tools/msprobe/mindspore/debugger/debugger_config.py @@ -42,6 +42,10 @@ class DebuggerConfig: self.framework = Const.MS_FRAMEWORK self.summary_mode = task_config.summary_mode self.async_dump = common_config.async_dump if common_config.async_dump else False + if hasattr(task_config, 'td_config_path'): + self.td_config_path = "" if not task_config.td_config_path else task_config.td_config_path + else: + self.td_config_path = "" self.check() create_directory(self.dump_path) diff --git a/debug/accuracy_tools/msprobe/mindspore/debugger/precision_debugger.py b/debug/accuracy_tools/msprobe/mindspore/debugger/precision_debugger.py index 7694d71dd98ae1c7c4611f9435a274ac018e5df6..453b181de49714208ea6acd740bdb021b705ec25 100644 --- a/debug/accuracy_tools/msprobe/mindspore/debugger/precision_debugger.py +++ b/debug/accuracy_tools/msprobe/mindspore/debugger/precision_debugger.py @@ -22,18 +22,19 @@ from mindspore._c_expression import MSContext from msprobe.core.common.const import Const, FileCheckConst, MsgConst from msprobe.core.common.exceptions import MsprobeException from msprobe.core.common.file_utils import FileChecker -from msprobe.core.common.utils import get_real_step_or_rank +from msprobe.core.common.utils import get_real_step_or_rank, check_init_step from msprobe.mindspore.cell_processor import CellProcessor from msprobe.mindspore.common.const import Const as MsConst from msprobe.mindspore.common.utils import set_register_backward_hook_functions, check_save_param from msprobe.mindspore.debugger.debugger_config import DebuggerConfig -from msprobe.mindspore.dump.hook_cell.api_registry import api_register +from msprobe.mindspore.dump.hook_cell.api_register import get_api_register from msprobe.mindspore.dump.hook_cell.hook_cell import HOOKCell from msprobe.mindspore.grad_probe.grad_monitor import 
GradientMonitor from msprobe.mindspore.ms_config import parse_json_config from msprobe.mindspore.runtime import Runtime from msprobe.mindspore.service import Service from msprobe.mindspore.task_handler_factory import TaskHandlerFactory +from msprobe.mindspore.dump.graph_mode_cell_dump import GraphModeCellDump try: from msprobe.lib import _msprobe_c @@ -84,7 +85,7 @@ class PrecisionDebugger: common_config.dump_path = dump_path if dump_path else common_config.dump_path self.config = DebuggerConfig(common_config, task_config) - if _msprobe_c: + if self._need_msprobe_c() and _msprobe_c: _msprobe_c._PrecisionDebugger(framework="MindSpore", config_path=config_path) self.config.execution_mode = self._get_execution_mode() @@ -151,7 +152,7 @@ class PrecisionDebugger: instance = cls._instance if not instance: raise Exception(MsgConst.NOT_CREATED_INSTANCE) - if _msprobe_c: + if cls._need_msprobe_c() and _msprobe_c: _msprobe_c._PrecisionDebugger().start() if instance.task in PrecisionDebugger.task_not_need_service: return @@ -163,8 +164,8 @@ class PrecisionDebugger: instance.service.start(model) else: if not instance.first_start: - api_register.api_set_ori_func() - handler = TaskHandlerFactory.create(instance.config) + get_api_register().restore_all_api() + handler = TaskHandlerFactory.create(instance.config, model) handler.handle() instance.first_start = True @@ -180,7 +181,7 @@ class PrecisionDebugger: instance = cls._instance if not instance: raise Exception(MsgConst.NOT_CREATED_INSTANCE) - if _msprobe_c: + if cls._need_msprobe_c() and _msprobe_c: _msprobe_c._PrecisionDebugger().stop() if instance.task == Const.GRAD_PROBE: instance.gm.stop() @@ -195,10 +196,13 @@ class PrecisionDebugger: instance = cls._instance if not instance: raise Exception(MsgConst.NOT_CREATED_INSTANCE) - if _msprobe_c: + if cls._need_msprobe_c() and _msprobe_c: _msprobe_c._PrecisionDebugger().step() if instance.task in PrecisionDebugger.task_not_need_service: return + if instance.config.execution_mode != MsConst.PYNATIVE_MODE and instance.config.level == MsConst.CELL: + GraphModeCellDump.step() + return if instance.service: instance.service.step() HOOKCell.cell_count = defaultdict(int) @@ -233,6 +237,14 @@ class PrecisionDebugger: instance.service = Service(instance.config) instance.service.save(variable, name, save_backward) + @classmethod + def set_init_step(cls, step): + instance = cls._instance + if not instance: + raise Exception(MsgConst.NOT_CREATED_INSTANCE) + check_init_step(step) + instance.service.init_step = step + @classmethod def _need_service(cls): instance = cls._instance @@ -241,4 +253,11 @@ class PrecisionDebugger: if instance.config.execution_mode != MsConst.PYNATIVE_MODE: return False else: - return instance.config.task != Const.FREE_BENCHMARK and not instance._is_graph_dump(instance.config) \ No newline at end of file + return instance.config.task != Const.FREE_BENCHMARK and not instance._is_graph_dump(instance.config) + + @classmethod + def _need_msprobe_c(cls): + instance = cls._instance + if not instance: + raise Exception(MsgConst.NOT_CREATED_INSTANCE) + return instance.config.level_ori == Const.LEVEL_L2 diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/cell_dump_process.py b/debug/accuracy_tools/msprobe/mindspore/dump/cell_dump_process.py new file mode 100644 index 0000000000000000000000000000000000000000..a77f3d4fe3747f378365a63027e57e479a4eae24 --- /dev/null +++ b/debug/accuracy_tools/msprobe/mindspore/dump/cell_dump_process.py @@ -0,0 +1,586 @@ +# Copyright (c) 2025-2025, Huawei Technologies 
Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import os +import time +import re +import json +import atexit +from multiprocessing import Pool + +import numpy as np +import mindspore as ms +from mindspore import nn, ops + +from msprobe.mindspore.common.log import logger +from msprobe.core.common.const import Const as CoreConst +from msprobe.core.common.file_utils import load_npy, save_json, remove_path, load_yaml +from msprobe.core.common.const import FileCheckConst + + +CONSTRUCT_FILE_NAME = "construct.json" +DEFAULT_RANK_DIR = "rank0" +KEY_LAYERS = "layers" +construct = {} +cell_list = [] +KEY_SIDE_EFFECT = "side_effect_io" +KEY_TOPLAYER = "TopLayer" +KEY_FORWARD = CoreConst.FORWARD +KEY_BACKWARD = CoreConst.BACKWARD +KEY_INPUT = CoreConst.INPUT +KEY_OUTPUT = CoreConst.OUTPUT +td = ops.TensorDump() +if (ms.__version__ >= "2.5.0"): + td_in = ops.TensorDump("in") +else: + td_in = ops.TensorDump() +td.add_prim_attr(KEY_SIDE_EFFECT, False) +td_in.add_prim_attr(KEY_SIDE_EFFECT, False) +np_ms_dtype_dict = { + "bool": ms.bool_, + "int8": ms.int8, + "byte": ms.byte, + "int16": ms.int16, + "short": ms.short, + "int32": ms.int32, + "intc": ms.intc, + "int64": ms.int64, + "intp": ms.intp, + "uint8": ms.uint8, + "ubyte": ms.ubyte, + "uint16": ms.uint16, + "ushort": ms.ushort, + "uint32": ms.uint32, + "uintc": ms.uintc, + "uint64": ms.uint64, + "uintp": ms.uintp, + "float16": ms.float16, + "half": ms.half, + "float32": ms.float32, + "single": ms.single, + "float64": ms.float64, + "double": ms.double, + "bfloat16": ms.bfloat16, + "complex64": ms.complex64, + "complex128": ms.complex128 +} + + +def gen_file_path(dump_path, cell_prefix, suffix, io_type, index): + step_path = os.path.join(dump_path, "{step}") + rank_path = os.path.join(step_path, "{rank}") + data_path = os.path.join(rank_path, CoreConst.DUMP_TENSOR_DATA) + file_name = CoreConst.SEP.join([cell_prefix, suffix, io_type, str(index)]) + return os.path.join(data_path, file_name) + + +def partial_func(func, dump_path, cell_prefix, index, io_type): + def newfunc(*args, **kwargs): + return func(dump_path, cell_prefix, index, io_type, *args, **kwargs) + return newfunc + + +def clip_gradient(dump_path, cell_prefix, index, io_type, dx): + if io_type == KEY_OUTPUT: + temp = td(gen_file_path(dump_path, cell_prefix, KEY_BACKWARD, io_type, index), dx) + dx = ops.depend(dx, temp) + if io_type == KEY_INPUT: + temp = td_in(gen_file_path(dump_path, cell_prefix, KEY_BACKWARD, io_type, index), dx) + dx = ops.depend(dx, temp) + return dx + + +def need_tensordump_in(cell_obj, attr): + return hasattr(cell_obj, attr) and getattr(cell_obj, attr) == "in" + + +def cell_construct_wrapper(func, self): + def new_construct(self, *args, **kwargs): + new_args = [] + out_list = [] + + index = 0 + item = None + # The inputs of the cell. 
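+        # Dump wiring for each tensor input before it enters the wrapped construct:
+        # in "backward"/"all" mode a clip hook (output_clips) is attached so the
+        # gradient flowing back through this input is dumped, and in "forward"/"all"
+        # mode the input value itself is saved via TensorDump, with ops.depend
+        # keeping the dump node alive in the graph.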
+ for index, item in enumerate(args): + if self.data_mode == "backward" or self.data_mode == "all": + if ops.is_tensor(item): + item = self.output_clips[index](item) + if self.data_mode == "forward" or self.data_mode == "all": + if ops.is_tensor(item): + if need_tensordump_in(self, 'input_dump_mode'): + temp = td_in( + gen_file_path(self.dump_path, self.cell_prefix, KEY_FORWARD, KEY_INPUT, index), + item + ) + else: + temp = td( + gen_file_path(self.dump_path, self.cell_prefix, KEY_FORWARD, KEY_INPUT, index), + item + ) + item = ops.depend(item, temp) + new_args.append(item) + + out = func(*new_args, **kwargs) + + # The outputs of the cell. + if isinstance(out, tuple): + for index, item in enumerate(out): + if self.data_mode == "backward" or self.data_mode == "all": + if ops.is_tensor(item): + item = self.input_clips[index](item) + if self.data_mode == "forward" or self.data_mode == "all": + if ops.is_tensor(item): + if need_tensordump_in(self, 'output_dump_mode'): + temp = td_in( + gen_file_path(self.dump_path, self.cell_prefix, KEY_FORWARD, KEY_OUTPUT, index), + item + ) + else: + temp = td( + gen_file_path(self.dump_path, self.cell_prefix, KEY_FORWARD, KEY_OUTPUT, index), + item + ) + item = ops.depend(item, temp) + out_list.append(item) + else: + out_list.append(item) + out_list = tuple(out_list) + return out_list + else: + if self.data_mode == "backward" or self.data_mode == "all": + out = self.input_clips[0](out) + if self.data_mode == "forward" or self.data_mode == "all": + if ops.is_tensor(out): + if need_tensordump_in(self, 'output_dump_mode'): + temp = td_in( + gen_file_path(self.dump_path, self.cell_prefix, KEY_FORWARD, KEY_OUTPUT, 0), + out + ) + else: + temp = td( + gen_file_path(self.dump_path, self.cell_prefix, KEY_FORWARD, KEY_OUTPUT, 0), + out + ) + out = ops.depend(out, temp) + return out + + return new_construct.__get__(self, type(self)) + + +# 获取目录下所有文件名并根据TensorDump落盘自增id从小到大排序 +def sort_filenames(path): + filenames = os.listdir(path) + id_pattern = re.compile(rf'{CoreConst.REPLACEMENT_CHARACTER}(\d+){CoreConst.NUMPY_SUFFIX}$') + filenames.sort(key=lambda x: int(id_pattern.findall(x)[0])) + return filenames + + +# 删除重复dump的文件:自定义文件名相同,并且数据相同 +def del_same_file(path, filenames): + result_list = [] + seen_prefixes = {} + for current_filename in filenames: + parts = current_filename.rsplit(CoreConst.REPLACEMENT_CHARACTER, 1) + prefix = parts[0] + if prefix not in seen_prefixes: + result_list.append(current_filename) + seen_prefixes[prefix] = current_filename + else: + current_file_path = os.path.join(path, current_filename) + current_file = load_npy(current_file_path) + prev_filename = seen_prefixes[prefix] + prev_file_path = os.path.join(path, prev_filename) + prev_file = load_npy(prev_file_path) + if np.array_equal(current_file, prev_file): + remove_path(current_file_path) + logger.warning(f"{current_file_path} is deleted!") + else: + result_list.append(current_filename) + return result_list + + +def rename_filename(path): + filenames = sort_filenames(path) + filenames = del_same_file(path, filenames) + + filename_dict = {} + for filename in filenames: + name_field = filename.rsplit(CoreConst.REPLACEMENT_CHARACTER, 1)[0] + + if name_field in filename_dict: + filename_dict[name_field] += 1 + else: + filename_dict[name_field] = 0 + + cell_index = filename_dict[name_field] + + # 修改文件名,增加重复调用Cell的序号 + if CoreConst.FORWARD_PATTERN in filename: + #Format: Cell.{cell_name}.{class_name}.{forward/backward}.{number}.{input/output}.{index}_{dtype}_{id}.npy + newFileName = 
filename.replace(CoreConst.FORWARD_PATTERN, CoreConst.FORWARD_PATTERN + str(cell_index) + CoreConst.SEP) + if CoreConst.BACKWARD_PATTERN in filename: + newFileName = filename.replace(CoreConst.BACKWARD_PATTERN, CoreConst.BACKWARD_PATTERN + str(cell_index) + CoreConst.SEP) + os.rename(os.path.join(path, filename), os.path.join(path, newFileName)) + logger.info(f"==========The rename_filename phase is Finished!==========") + + +# Extract the field between the first "." and the third to last ".", i.e. {cell_name} +def get_cell_name(str): + parts = str.split(CoreConst.SEP) + if len(parts) < 4: + return None + start_index = 1 + end_index = len(parts) - 3 + return CoreConst.SEP.join(parts[start_index:end_index]) + + +# Extract the field between the last "." and the second to last ".", i.e. {data_made} +def get_data_mode(str): + last_dot_index = str.rfind(CoreConst.SEP) + second_last_dot_index = str.rfind(CoreConst.SEP, 0, last_dot_index) + data_mode = str[second_last_dot_index + 1:last_dot_index] + return data_mode + + +# 判断二者之间是否存在父子关系 +def check_relation(cell_name, parent_cell_name): + layers_pattern = rf"{CoreConst.SEP}{KEY_LAYERS}{CoreConst.SEP}\d+$" + last_dot_index = cell_name.rfind(CoreConst.SEP) + if last_dot_index != -1: + # 如果cell_name最后一个'.'之前的字段等于parent_cell_name,则判定存在父子关系 + sub_cell_name = cell_name[:last_dot_index] + if sub_cell_name == parent_cell_name: + return True + elif re.search(layers_pattern, cell_name): + # 如果cell_name以".layer.{layer_id}"结尾,且去掉该字段后等于parent_cell_name,则判定存在父子关系 + sub_cell_name = re.sub(layers_pattern, '', cell_name) + if sub_cell_name == parent_cell_name: + return True + return False + + +def get_construct(cell_list_input): + for cell in cell_list_input: + cell_name = get_cell_name(cell) + cell_data_mode = get_data_mode(cell) + found_flag = False + for parent_cell in cell_list_input: + parent_cell_name = get_cell_name(parent_cell) + parent_data_mode = get_data_mode(parent_cell) + has_relation = check_relation(cell_name, parent_cell_name) + if has_relation and parent_data_mode == cell_data_mode: + construct.update({cell: parent_cell}) + found_flag = True + break + if not found_flag: + construct.update({cell: None}) + + +def generate_construct(path): + global construct + filenames = sort_filenames(path) + + # 提取文件名中Cell.{cell_name}.{class_name}.{data_mode}.{重复调用此cell的序号}字段,并存入cell_list + for filename in filenames: + point_position = 3 + mid_field = filename.rsplit(CoreConst.SEP, point_position)[0] + if KEY_INPUT in filename: + if mid_field in cell_list: + cell_list.remove(mid_field) + cell_list.append(mid_field) + else: + if mid_field not in cell_list: + index = filenames.index(filename) + output_field = mid_field + KEY_OUTPUT + find_flag = False + for filename_other in cell_list[index + 1:]: + if output_field in filename_other: + find_flag = True + if find_flag is False: + cell_list.append(mid_field) + + get_construct(cell_list) + + # 生成JSON文件 + rank_dir = os.path.dirname(path) + json_path = os.path.join(rank_dir, CONSTRUCT_FILE_NAME) + save_json(json_path, construct, indent=1) + + # 清空'construct'继续处理下一个路径下的数据 + construct = {} + logger.info(f"Construct data saved to {json_path}") + + +def process_file(file_path): + try: + # 读取.npy文件内容 + npy_content = load_npy(file_path) + logger.info(f"Loaded {file_path}: shape is {npy_content.shape}, dtype is {npy_content.dtype}") + + # 文件名举例:Cell.network._backbone.loss.CrossEntropyLoss.forward.0.input.0_float32_165.npy + parts = os.path.basename(file_path).split(CoreConst.SEP) + data_dtype = "" + # 
+def process_file(file_path):
+    try:
+        # Load the content of the .npy file.
+        npy_content = load_npy(file_path)
+        logger.info(f"Loaded {file_path}: shape is {npy_content.shape}, dtype is {npy_content.dtype}")
+
+        # Example file name: Cell.network._backbone.loss.CrossEntropyLoss.forward.0.input.0_float32_165.npy
+        parts = os.path.basename(file_path).split(CoreConst.SEP)
+        data_dtype = ""
+        # Extract float32 from a field like 0_float32_165 or 0_in_float32_165.
+        data_dtype_list = parts[-2].split('_')
+        if len(data_dtype_list) > 1:
+            data_dtype = data_dtype_list[-2]
+        # op_name is Cell.network._backbone.loss.CrossEntropyLoss.forward.0
+        op_name = CoreConst.SEP.join(parts[:-3])
+        ms_dtype = np_ms_dtype_dict.get(data_dtype)
+        if ms_dtype is None:
+            logger.warning(f"Get dtype None from file {file_path}")
+
+        # Rename the dumped file, stripping the dtype and auto-increment id fields added by TensorDump.
+        data_file_name = os.path.basename(file_path)
+        data_file_dir = os.path.dirname(file_path)
+        parts = data_file_name.split(CoreConst.SEP)
+        if len(parts) >= 2:
+            param_index = parts[-2].split(CoreConst.REPLACEMENT_CHARACTER)[0]
+            pre_parts = CoreConst.SEP.join(parts[:-2])
+            new_file_name = pre_parts + CoreConst.SEP + param_index + CoreConst.NUMPY_SUFFIX
+            os.rename(os.path.join(data_file_dir, data_file_name), os.path.join(data_file_dir, new_file_name))
+            logger.info(f"{data_file_name} is renamed to {new_file_name}")
+        else:
+            logger.warning(f"Failed to rename {data_file_name}.")
+            new_file_name = data_file_name
+
+        tensor_json = {
+            CoreConst.TYPE: 'mindspore.Tensor',
+            CoreConst.DTYPE: str(ms_dtype),
+            CoreConst.SHAPE: list(npy_content.shape),
+            CoreConst.MAX: npy_content.max().item(),
+            CoreConst.MIN: npy_content.min().item(),
+            CoreConst.MEAN: npy_content.mean().item(),
+            CoreConst.NORM: np.linalg.norm(npy_content).item(),
+            CoreConst.DATA_NAME: new_file_name
+        }
+
+        # The input/output field of the file name decides whether the entry
+        # is added to input_args or to output.
+        if parts[-3] == KEY_INPUT:
+            return op_name, CoreConst.INPUT_ARGS, tensor_json
+        elif parts[-3] == KEY_OUTPUT:
+            return op_name, KEY_OUTPUT, tensor_json
+        else:
+            return None, None, None
+
+    except Exception as e:
+        logger.error(f"Error reading {file_path}: {e}")
+        return None, None, None
+
+
+def custom_sort(item, key_to_index):
+    key = item[0]
+    return key_to_index.get(key, float('inf'))
+
+
+def generate_dump_info(path):
+    if not os.path.exists(path):
+        logger.error("The provided path does not exist.")
+        return
+
+    dump_data = {"task": "tensor", "level": "L0", "dump_data_dir": path, "data": {}}
+
+    with Pool(processes=10) as pool:
+        file_paths = []
+        for root, _, files in os.walk(path):
+            for file in files:
+                if file.endswith(FileCheckConst.NUMPY_SUFFIX):
+                    file_paths.append((os.path.join(root, file),))
+        file_paths.sort()
+        results = pool.starmap(process_file, file_paths)
+
+    # Collect the results.
+    for op_name, key, tensor_json in results:
+        if op_name:
+            if op_name not in dump_data.get(CoreConst.DATA, {}):
+                dump_data.get(CoreConst.DATA, {})[op_name] = {CoreConst.INPUT_ARGS: [],
+                                                              CoreConst.INPUT_KWARGS: {},
+                                                              KEY_OUTPUT: []}
+            if key not in dump_data.get(CoreConst.DATA, {}).get(op_name, {}):
+                dump_data.get(CoreConst.DATA, {}).get(op_name, {})[key] = []
+            dump_data.get(CoreConst.DATA, {}).get(op_name, {}).get(key, []).append(tensor_json)
+
+    # Sort according to cell_list.
+    data_dict = dump_data.get(CoreConst.DATA, {})
+    key_to_index = {key: index for index, key in enumerate(cell_list)}
+    sorted_data_dict = dict(sorted(data_dict.items(), key=lambda item: custom_sort(item, key_to_index)))
+    dump_data[CoreConst.DATA] = sorted_data_dict
+
+    # Write the data to dump.json.
+    json_path = os.path.join(os.path.dirname(path), 'dump.json')
+    save_json(json_path, dump_data, indent=1)
+
+    logger.info(f"Dump data saved to {json_path}")
+
+
+def generate_stack_info(path):
+    if not os.path.exists(path):
+        logger.error("The provided path does not exist.")
+        return
+
+    stack_data = {}
+    file_paths = []
+    # The given path is the tool-generated ./dump_tensor_data directory of npy files.
+    for root, _, files in os.walk(path):
+        for file in files:
+            if file.endswith(FileCheckConst.NUMPY_SUFFIX):
+                file_paths.append(os.path.join(root, file))
+    file_paths.sort()
+    for file_path in file_paths:
+        # Example file name: Cell.network._backbone.loss.CrossEntropyLoss.forward.0.input.0_float32_165.npy
+        parts = os.path.basename(file_path).split(CoreConst.SEP)
+        # op_name is Cell.network._backbone.loss.CrossEntropyLoss.forward.0
+        op_name = CoreConst.SEP.join(parts[:-3])
+        stack_data.update({op_name: []})
+
+    # Write the data to stack.json.
+    json_path = os.path.join(os.path.dirname(path), 'stack.json')
+    save_json(json_path, stack_data, indent=1)
+
+    logger.info(f"Stack data saved to {json_path}")
+
+
+def is_download_finished(directory, interval=3):
+    """
+    Check whether downloads into the given directory have finished after a short wait.
+    :param directory: path of the directory to check
+    :param interval: check interval in seconds, 3 by default
+    :return: True if the download has finished, False otherwise
+    """
+    # Check whether the directory exists.
+    if not os.path.exists(directory):
+        logger.warning(f"The specified directory {directory} does not exist.")
+        return False
+    initial_modification_time = os.path.getmtime(directory)
+    time.sleep(interval)
+    current_modification_time = os.path.getmtime(directory)
+    # Compare the initial and current modification times.
+    if current_modification_time > initial_modification_time:
+        return False
+    else:
+        return True
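+
+# Illustrative layout (hypothetical values): with dump_path="./dump" and RANK_ID=0,
+# process() below walks ./dump/{step_dir}/rank0/dump_tensor_data/*.npy, polling
+# is_download_finished() until the directory's mtime stops changing.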
+def process(dump_path):
+    rank_id = os.environ.get('RANK_ID')
+    rank_dir = DEFAULT_RANK_DIR
+    if rank_id is not None:
+        rank_dir = CoreConst.RANK + str(rank_id)
+
+    step_dir_list = os.listdir(dump_path)
+    for step_dir in step_dir_list:
+        step_path = os.path.join(dump_path, step_dir)
+        rank_path = os.path.join(step_path, rank_dir)
+        npy_path = os.path.join(rank_path, CoreConst.DUMP_TENSOR_DATA)
+        while True:
+            is_finished = is_download_finished(npy_path)
+            if not is_finished:
+                logger.info("There is data being downloaded in the specified directory, continue checking...")
+            else:
+                logger.info("There is no data being downloaded in the specified directory, stop checking.")
+                break
+        logger.info("==========Start processing data that has already been stored on the disk!==========")
+        rename_filename(npy_path)
+        generate_construct(npy_path)
+        generate_dump_info(npy_path)
+        generate_stack_info(npy_path)
+    logger.info("==========JSON file generation completed!==========")
+
+
+def get_yaml_keys(yaml_data):
+    keys = []
+    for key, _ in yaml_data.items():
+        keys.append(key)
+    return keys
+
+
+def get_tensordump_mode(input_str):
+    left_index = input_str.find('(')
+    right_index = input_str.find(')')
+
+    # Extract the string inside the parentheses.
+    if left_index != -1 and right_index != -1:
+        inner_str = input_str[left_index + 1:right_index]
+        # Split the string into a list of elements.
+        elements = inner_str.split(',')
+        if len(elements) >= 2:
+            # Strip whitespace around each element.
+            first_element = elements[0].strip()
+            second_element = elements[1].strip()
+            return first_element, second_element
+    return None, None
+
+
+def set_tensordump_mode(cell, input_str):
+    first_str, second_str = get_tensordump_mode(input_str)
+    if first_str and second_str:
+        cell.input_dump_mode = first_str
+        cell.output_dump_mode = second_str
+
+
+def start(net=None, dump_path="./", data_mode=CoreConst.ALL, td_config_path=''):
+    if net is None:
+        return
+
+    if td_config_path == "":
+        yaml_data = {}
+    else:
+        yaml_data = load_yaml(td_config_path)
+    first_layer_key = get_yaml_keys(yaml_data)
+
+    black_list = ["grad_reducer", ""]
+    for name, cell in net.cells_and_names():
+        class_name = cell.__class__.__name__
+        # Skip blacklisted cells.
+        if name in black_list:
+            logger.info(f"Cell {name}.{class_name} is skipped!")
+            continue
+        # Skip cells internal to the framework.
+        if class_name.startswith(CoreConst.REPLACEMENT_CHARACTER):
+            logger.info(f"Cell {name}.{class_name} is skipped!")
+            continue
+        else:
+            # Format: Cell.{cell_name}.{class_name}
+            cell.cell_prefix = CoreConst.SEP.join([CoreConst.CELL, name, cell.__class__.__name__])
+
+        # Set the TensorDump mode of each cell according to the yaml configuration file.
+        if class_name in first_layer_key:
+            layer_data = yaml_data.get(class_name)
+            if layer_data:
+                for child_name, child_cell in cell.cells_and_names():
+                    if child_name in layer_data:
+                        set_tensordump_mode(child_cell, layer_data[child_name])
+        top_layer_data = yaml_data.get(KEY_TOPLAYER)
+        if top_layer_data and name in top_layer_data:
+            set_tensordump_mode(cell, top_layer_data[name])
+
+        # Replace the construct function.
+        cell.construct = cell_construct_wrapper(cell.construct, cell)
+        logger.info(f"Cell {name}: construct function is wrapped!")
+        cell.dump_path = dump_path
+        cell.data_mode = data_mode
+        cell.input_clips = []
+        cell.output_clips = []
+        # It is assumed that each cell has a maximum of 50 outputs and 50 inputs.
+        for i in range(50):
+            cell.input_clips.append(
+                ops.InsertGradientOf(partial_func(clip_gradient, cell.dump_path, cell.cell_prefix, i, KEY_INPUT))
+            )
+            cell.output_clips.append(
+                ops.InsertGradientOf(partial_func(clip_gradient, cell.dump_path, cell.cell_prefix, i, KEY_OUTPUT))
+            )
+
+    logger.info("==========The cell_dump_process_start phase is Finished!==========")
+    atexit.register(process, dump_path=dump_path)
diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/dump_tool_factory.py b/debug/accuracy_tools/msprobe/mindspore/dump/dump_tool_factory.py
index 0ca63b4a84aee00127bca37b7da36888e905a5aa..66fa892599f6290e476287a6cf17ca3107e0685b 100644
--- a/debug/accuracy_tools/msprobe/mindspore/dump/dump_tool_factory.py
+++ b/debug/accuracy_tools/msprobe/mindspore/dump/dump_tool_factory.py
@@ -17,13 +17,14 @@ from msprobe.mindspore.common.const import Const
 from msprobe.mindspore.debugger.debugger_config import DebuggerConfig
 from msprobe.mindspore.dump.kernel_graph_dump import KernelGraphDump
 from msprobe.mindspore.dump.kernel_kbyk_dump import KernelKbykDump
+from msprobe.mindspore.dump.graph_mode_cell_dump import GraphModeCellDump


 class DumpToolFactory:

     tools = {
         Const.CELL: {
-            Const.GRAPH_KBYK_MODE: None,
-            Const.GRAPH_GE_MODE: None,
+            Const.GRAPH_KBYK_MODE: GraphModeCellDump,
+            Const.GRAPH_GE_MODE: GraphModeCellDump,
             Const.PYNATIVE_MODE: None
         },
         Const.API: {
@@ -39,9 +40,13 @@ class DumpToolFactory:
     }

     @staticmethod
-    def create(config: DebuggerConfig):
-        if len(config.data_mode) != 1 or config.data_mode[0] not in Const.GRAPH_DATA_MODE_LIST:
-            raise Exception("data_mode must be one of all, input, output.")
+    def create(config: DebuggerConfig, model=None):
+        if config.level == Const.CELL:
+            if len(config.data_mode) != 1 or config.data_mode[0] not in Const.GRAPH_CELL_DUMP_DATA_MODE_LIST:
+                raise Exception("data_mode must be one of all, forward, backward.")
+        else:
+            if len(config.data_mode) != 1 or config.data_mode[0] not in Const.GRAPH_DATA_MODE_LIST:
+                raise Exception("data_mode must be one of all, input, output.")
         tool = DumpToolFactory.tools.get(config.level)
         if not tool:
             raise Exception("Valid level is needed.")
@@ -49,4 +54,4 @@ class DumpToolFactory:
         if not tool:
             raise Exception(f"Data dump is not supported in {config.execution_mode} mode "
                             f"when dump level is {config.level}.")
-        return tool(config)
+        return tool(config, model) if tool == GraphModeCellDump else tool(config)
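A minimal sketch of how the updated factory dispatch behaves (the config and model objects here are hypothetical; the names follow the diff above):

    # Const.CELL combined with a graph execution mode now resolves to GraphModeCellDump,
    # the only tool whose constructor also takes the model object.
    tool = DumpToolFactory.tools.get(config.level, {}).get(config.execution_mode)
    dumper = tool(config, model) if tool == GraphModeCellDump else tool(config)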
diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/graph_mode_cell_dump.py b/debug/accuracy_tools/msprobe/mindspore/dump/graph_mode_cell_dump.py
new file mode 100644
index 0000000000000000000000000000000000000000..52e2d57af22940807d361272273e1d82f7c8520c
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/mindspore/dump/graph_mode_cell_dump.py
@@ -0,0 +1,86 @@
+# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+from msprobe.mindspore.common.log import logger
+from msprobe.mindspore.debugger.debugger_config import DebuggerConfig
+import mindspore as ms
+from mindspore.ops.primitive import _run_op
+from mindspore import hal, ops
+import msprobe.mindspore.dump.cell_dump_process as cellDumper
+from msprobe.mindspore.common.const import Const
+
+tensordump_flag = True
+try:
+    from mindspore._c_expression import _tensordump_set_step
+except ImportError:
+    tensordump_flag = False
+
+
+class GraphModeCellDump:
+    def __init__(self, config: DebuggerConfig, model):
+        self.net = model
+        self.white_list = []
+        self.black_list = []
+        self.dump_path = config.dump_path if config.dump_path else "./"
+        self.rank = config.rank
+        self.step = config.step
+        self.scope = config.scope
+        self.list = config.list
+        self.data_mode = config.data_mode
+        self.file_format = config.file_format
+        self.td_config_path = config.td_config_path
+        self.check_config()
+        self.set_step()
+
+    @staticmethod
+    def step():
+        hal.synchronize()
+        temp_tensor = ms.Tensor([1], dtype=ms.float32)
+        step_flag = ""
+        _run_op(ops.TensorDump(), "TensorDump", (step_flag, temp_tensor))
+        ops.tensordump(step_flag, temp_tensor)
+
+    def check_config(self):
+        if self.rank != []:
+            raise Exception("In graph mode, cell dump does not currently support specifying rank.")
+        if self.scope != []:
+            raise Exception("In graph mode, cell dump does not currently support specifying scope.")
+        if self.list != []:
+            raise Exception("In graph mode, cell dump does not currently support specifying list.")
+        if len(self.data_mode) != 1 or self.data_mode[0] not in Const.GRAPH_CELL_DUMP_DATA_MODE_LIST:
+            raise Exception("In graph mode and cell dump, data_mode must be one of all, forward, backward.")
+        if self.file_format != []:
+            logger.warning("In graph mode, cell dump does not currently support specifying file_format. The file will be stored in npy format.")
+        if not self.net:
+            raise Exception("The model is empty and cell dump is not enabled.")
+        return True
+
+    def set_step(self):
+        if tensordump_flag:
+            _tensordump_set_step(self.step)
+        else:
+            raise Exception(
+                "Importing _tensordump_set_step failed, "
+                "please use the latest version package of MindSpore."
+            )
+
+    def handle(self):
+        os.environ['MS_JIT_MODULES'] = 'msprobe'
+        cellDumper.start(
+            net=self.net,
+            dump_path=self.dump_path,
+            data_mode=self.data_mode[0],
+            td_config_path=self.td_config_path
+        )
\ No newline at end of file
diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/api_register.py b/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/api_register.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a5737662d4e6619d90a6744f975d49fe1784825
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/api_register.py
@@ -0,0 +1,142 @@
+# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+from mindspore import Tensor, ops, mint
+from mindspore.mint.nn import functional
+from mindspore.communication import comm_func
+
+from msprobe.core.common.file_utils import load_yaml
+from msprobe.core.common.utils import Const
+from msprobe.core.data_dump.api_registry import ApiRegistry
+from msprobe.mindspore.common.const import Const as MsConst
+from msprobe.mindspore.common.utils import is_mindtorch
+from msprobe.mindspore.dump.hook_cell.hook_cell import HOOKCell
+
+
+stub_tensor_existed = True
+try:
+    from mindspore.common._stub_tensor import StubTensor
+except ImportError:
+    stub_tensor_existed = False
+
+cur_path = os.path.dirname(os.path.realpath(__file__))
+if not is_mindtorch():
+    _api_types = {
+        Const.MS_FRAMEWORK: {
+            Const.MS_API_TYPE_OPS: (ops, (ops,)),
+            Const.MS_API_TYPE_TENSOR: (Tensor, (Tensor,)),
+            Const.MS_API_TYPE_MINT: (mint, (mint,)),
+            Const.MS_API_TYPE_MINT_FUNC: (functional, (functional,)),
+            Const.MS_API_TYPE_COM: (comm_func, (comm_func,))
+        }
+    }
+    if stub_tensor_existed:
+        _api_types.get(Const.MS_FRAMEWORK).update(
+            {Const.MS_API_TYPE_STUB_TENSOR: (StubTensor, (StubTensor,))}
+        )
+
+    _supported_api_list_path = (os.path.join(cur_path, MsConst.SUPPORTED_API_LIST_FILE),)
+else:
+    import torch
+    import torch_npu
+    _api_types = {
+        Const.MT_FRAMEWORK: {
+            Const.PT_API_TYPE_FUNCTIONAL: (torch.nn.functional, (torch.nn.functional,)),
+            Const.PT_API_TYPE_TENSOR: (torch.Tensor, (torch.Tensor,)),
+            Const.PT_API_TYPE_TORCH: (torch, (torch,)),
+            Const.PT_API_TYPE_NPU: (torch_npu, (torch_npu,)),
+            Const.PT_API_TYPE_DIST: (torch.distributed, (torch.distributed, torch.distributed.distributed_c10d))
+        }
+    }
+    _supported_api_list_path = (os.path.join(cur_path, '../../../pytorch/hook_module',
+                                             MsConst.SUPPORTED_API_LIST_FILE),)

+_inner_used_api = {
+    Const.MS_FRAMEWORK + Const.SEP + Const.MS_API_TYPE_OPS: (
+        ops, "norm", "square", "sqrt", "is_complex", "stack", "is_floating_point"
+    ),
+    Const.MS_FRAMEWORK + Const.SEP + Const.MS_API_TYPE_TENSOR: (
+        Tensor, "to", "numel"
+    ),
+    Const.MS_FRAMEWORK + Const.SEP + Const.MS_API_TYPE_MINT: (
+        mint, "max", "min", "mean", "norm"
+    )
+}
+
+
+class ApiTemplate(HOOKCell):
+    def __init__(self, api_name, api_func, prefix, hook_build_func):
+        self.api_name = api_name
+        self.api_func = api_func
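+        # Illustrative naming (hypothetical values): api_name "mint.functional.relu" with
+        # prefix "Mint" yields prefix_api_name "Mint.relu."; a per-call counter is assumed
+        # to be appended later by the hook machinery when naming dumped data.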
self.prefix_api_name = prefix + Const.SEP + str(api_name.split(Const.SEP)[-1]) + Const.SEP + super().__init__(hook_build_func) + + @staticmethod + def async_to_sync(output): + # Fake handle, used to return after the CommHandle executes the wait method + fake_handle = type("FakeHandle", (), {"wait": lambda self: None})() + if isinstance(output, tuple) and len(output) == 2 and hasattr(output[1], "wait"): + output[1].wait() + output = (output[0], fake_handle) + elif hasattr(output, "wait"): + output.wait() + output = fake_handle + return output + + def construct(self, *args, **kwargs): + if self.api_name.startswith(MsConst.DROPOUT_API_NAME_PREFIX): + return args[0] if args else kwargs.get(Const.INPUT) + + output = self.api_func(*args, **kwargs) + + if self.prefix_api_name.startswith(MsConst.DISTRIBUTED_DATA_PREFIX): + if kwargs.get("async_op") or self.api_name in ["isend", "irecv"]: + output = self.async_to_sync(output) + return output + + def forward(self, *args, **kwargs): + if self.api_name.startswith(MsConst.DROPOUT_API_NAME_PREFIX): + return args[0] if args else kwargs.get(Const.INPUT) + return self.api_func(*args, **kwargs) + + +api_register = None +stub_tensor_set = False + + +def get_api_register(return_new=False): + global stub_tensor_set + + def stub_method(method): + def wrapped_method(*args, **kwargs): + return method(*args, **kwargs) + return wrapped_method + if not is_mindtorch() and stub_tensor_existed and not stub_tensor_set: + api_names = load_yaml(_supported_api_list_path[0]).get(Const.MS_API_TYPE_TENSOR, []) + for attr_name in dir(StubTensor): + attr = getattr(StubTensor, attr_name) + if attr_name in api_names and callable(attr): + setattr(StubTensor, attr_name, stub_method(attr)) + stub_tensor_set = True + + if return_new: + return ApiRegistry(_api_types, _inner_used_api, _supported_api_list_path, ApiTemplate) + + global api_register + if api_register is None: + api_register = ApiRegistry(_api_types, _inner_used_api, _supported_api_list_path, ApiTemplate) + return api_register diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/api_registry.py b/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/api_registry.py deleted file mode 100644 index 7aee1deccd9689985c7a2e270648bd0877cd7cf3..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/api_registry.py +++ /dev/null @@ -1,207 +0,0 @@ -# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from mindspore import Tensor, ops, mint -from mindspore.mint.nn import functional -from mindspore.common._stub_tensor import StubTensor -from mindspore.communication import comm_func - -from msprobe.mindspore.dump.hook_cell.wrap_api import (HOOKTensor, HOOKStubTensor, HOOKFunctionalOP, - HOOKMintOP, HOOKMintNNFunctionalOP, HOOKDistributedOP, - HOOKTorchOP, HOOKTorchTensor, HOOKTorchFunctionalOP, - HOOKTorchDistributedOP, HOOKTorchNpuOP, - get_wrap_api_list, get_wrap_torch_api_list, setup_hooks) -from msprobe.core.common.utils import Const -from msprobe.mindspore.common.utils import is_mindtorch - -if is_mindtorch(): - import torch - import torch_npu - - -def stub_method(method): - def wrapped_method(*args, **kwargs): - return method(*args, **kwargs) - return wrapped_method - - -class ApiRegistry: - def __init__(self): - self.tensor_ori_attr = {} - self.stub_tensor_ori_attr = {} - self.functional_ori_attr = {} - self.mint_ops_ori_attr = {} - self.mint_func_ops_ori_attr = {} - self.distributed_ori_attr = {} - self.norm_inner_ops_ori_attr = {} - - self.torch_ori_attr = {} - self.torch_tensor_ori_attr = {} - self.torch_functional_ori_attr = {} - self.torch_distributed_ori_attr = {} - self.torch_npu_ori_attr = {} - - self.tensor_hook_attr = {} - self.stub_tensor_hook_attr = {} - self.functional_hook_attr = {} - self.mint_ops_hook_attr = {} - self.mint_func_ops_hook_attr = {} - self.distibuted_hook_attr = {} - self.norm_inner_ops_hook_attr = {} - - self.torch_hook_attr = {} - self.torch_tensor_hook_attr = {} - self.torch_functional_hook_attr = {} - self.torch_distributed_hook_attr = {} - self.torch_npu_hook_attr = {} - - self.norm_inner_ops = ["norm", "square", "sqrt", "is_complex"] - - @staticmethod - def store_ori_attr(ori_api_group, api_list, api_ori_attr): - for api in api_list: - if Const.SEP in api: - sub_module_name, sub_op = api.rsplit(Const.SEP, 1) - sub_module = getattr(ori_api_group, sub_module_name) - ori_api_func = getattr(sub_module, sub_op) - else: - ori_api_func = getattr(ori_api_group, api) - if ori_api_group == StubTensor: - api_ori_attr[api] = stub_method(ori_api_func) - continue - api_ori_attr[api] = ori_api_func - - @staticmethod - def set_api_attr(api_group, attr_dict): - for api, api_attr in attr_dict.items(): - if Const.SEP in api: - sub_module_name, sub_op = api.rsplit(Const.SEP, 1) - sub_module = getattr(api_group, sub_module_name, None) - if sub_module is not None: - setattr(sub_module, sub_op, api_attr) - else: - setattr(api_group, api, api_attr) - - def norm_inner_op_set_hook_func(self): - self.set_api_attr(ops, self.norm_inner_ops_hook_attr) - - def norm_inner_op_set_ori_func(self): - self.set_api_attr(ops, self.norm_inner_ops_ori_attr) - - def api_set_hook_func(self): - if is_mindtorch(): - self.set_api_attr(torch, self.torch_hook_attr) - self.set_api_attr(torch.Tensor, self.torch_tensor_hook_attr) - self.set_api_attr(torch.nn.functional, self.torch_functional_hook_attr) - self.set_api_attr(torch.distributed, self.torch_distributed_hook_attr) - self.set_api_attr(torch.distributed.distributed_c10d, self.torch_distributed_hook_attr) - self.set_api_attr(torch_npu, self.torch_npu_hook_attr) - else: - self.set_api_attr(Tensor, self.tensor_hook_attr) - self.set_api_attr(StubTensor, self.stub_tensor_hook_attr) - self.set_api_attr(ops, self.functional_hook_attr) - self.set_api_attr(mint, self.mint_ops_hook_attr) - self.set_api_attr(functional, self.mint_func_ops_hook_attr) - self.set_api_attr(comm_func, self.distibuted_hook_attr) - - def api_set_ori_func(self): - if 
is_mindtorch(): - self.set_api_attr(torch, self.torch_ori_attr) - self.set_api_attr(torch.Tensor, self.torch_tensor_ori_attr) - self.set_api_attr(torch.nn.functional, self.torch_functional_ori_attr) - self.set_api_attr(torch.distributed, self.torch_distributed_ori_attr) - self.set_api_attr(torch.distributed.distributed_c10d, self.torch_distributed_ori_attr) - self.set_api_attr(torch_npu, self.torch_npu_ori_attr) - else: - self.set_api_attr(Tensor, self.tensor_ori_attr) - self.set_api_attr(StubTensor, self.stub_tensor_ori_attr) - self.set_api_attr(ops, self.functional_ori_attr) - self.set_api_attr(mint, self.mint_ops_ori_attr) - self.set_api_attr(functional, self.mint_func_ops_ori_attr) - self.set_api_attr(comm_func, self.distributed_ori_attr) - - def initialize_hook(self, hook): - setup_hooks(hook) - if is_mindtorch(): - wrap_torch_api_name = get_wrap_torch_api_list() - self.store_ori_attr(torch, - wrap_torch_api_name.torch_api_names, self.torch_ori_attr) - self.store_ori_attr(torch.Tensor, - wrap_torch_api_name.tensor_api_names, self.torch_tensor_ori_attr) - self.store_ori_attr(torch.nn.functional, - wrap_torch_api_name.functional_api_names, self.torch_functional_ori_attr) - self.store_ori_attr(torch.distributed, - wrap_torch_api_name.distributed_api_names, self.torch_distributed_ori_attr) - self.store_ori_attr(torch_npu, - wrap_torch_api_name.npu_api_names, self.torch_npu_ori_attr) - for attr_name in dir(HOOKTorchOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.torch_hook_attr[api_name] = getattr(HOOKTorchOP, attr_name) - for attr_name in dir(HOOKTorchTensor): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.torch_tensor_hook_attr[api_name] = getattr(HOOKTorchTensor, attr_name) - for attr_name in dir(HOOKTorchFunctionalOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.torch_functional_hook_attr[api_name] = getattr(HOOKTorchFunctionalOP, attr_name) - for attr_name in dir(HOOKTorchDistributedOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.torch_distributed_hook_attr[api_name] = getattr(HOOKTorchDistributedOP, attr_name) - for attr_name in dir(HOOKTorchNpuOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.torch_npu_hook_attr[api_name] = getattr(HOOKTorchNpuOP, attr_name) - return - - wrap_api_name = get_wrap_api_list() - self.store_ori_attr(Tensor, wrap_api_name.tensor_api_names, self.tensor_ori_attr) - self.store_ori_attr(StubTensor, wrap_api_name.stub_tensor_api_names, self.stub_tensor_ori_attr) - self.store_ori_attr(ops, wrap_api_name.ops_api_names, self.functional_ori_attr) - self.store_ori_attr(mint, wrap_api_name.mint_api_names, self.mint_ops_ori_attr) - self.store_ori_attr(functional, wrap_api_name.mint_nn_func_api_names, self.mint_func_ops_ori_attr) - self.store_ori_attr(comm_func, wrap_api_name.distributed_api_names, self.distributed_ori_attr) - self.store_ori_attr(ops, self.norm_inner_ops, self.norm_inner_ops_ori_attr) - for attr_name in dir(HOOKTensor): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.tensor_hook_attr[api_name] = getattr(HOOKTensor, attr_name) - for attr_name in dir(HOOKStubTensor): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - 
self.stub_tensor_hook_attr[api_name] = getattr(HOOKStubTensor, attr_name) - for attr_name in dir(HOOKFunctionalOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.functional_hook_attr[api_name] = getattr(HOOKFunctionalOP, attr_name) - if api_name in self.norm_inner_ops: - self.norm_inner_ops_hook_attr[api_name] = getattr(HOOKFunctionalOP, attr_name) - for attr_name in dir(HOOKMintOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.mint_ops_hook_attr[api_name] = getattr(HOOKMintOP, attr_name) - for attr_name in dir(HOOKMintNNFunctionalOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.mint_func_ops_hook_attr[api_name] = getattr(HOOKMintNNFunctionalOP, attr_name) - for attr_name in dir(HOOKDistributedOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - api_name = attr_name[Const.ATTR_NAME_PREFIX_LEN:] - self.distibuted_hook_attr[api_name] = getattr(HOOKDistributedOP, attr_name) - - -api_register = ApiRegistry() diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/hook_cell.py b/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/hook_cell.py index b68a7d995a56497a219281c5a43d692c46cfac4d..7007992ca4540a06b1ebc85a068179e88ec589cc 100644 --- a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/hook_cell.py +++ b/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/hook_cell.py @@ -28,23 +28,22 @@ def get_cell_count(name): return HOOKCell.cell_count[name] -def __init__(self, build_hook) -> None: +def __init__(self, hook_build_func) -> None: super(HOOKCell, self).__init__() self.changed_status = False self.input_kwargs = {} - self.prefix = "" if not HOOKCell.g_stop_hook: HOOKCell.g_stop_hook = True self.changed_status = True - if hasattr(self, "prefix_api_name"): - self.prefix = self.prefix_api_name - self.forward_data_collected = False - forward_pre_hook, forward_hook, backward_hook, backward_pre_hook = build_hook(self.prefix) - self.register_forward_pre_hook(forward_pre_hook) - self.register_forward_hook(forward_hook) - register_backward_hook_functions["full"](self, backward_hook) - register_backward_hook_functions["pre"](self, backward_pre_hook) + + prefix = self.prefix_api_name if hasattr(self, "prefix_api_name") else "" + if callable(hook_build_func): + forward_pre_hook, forward_hook, backward_hook, backward_pre_hook = hook_build_func(prefix) + self.register_forward_pre_hook(forward_pre_hook) + self.register_forward_hook(forward_hook) + register_backward_hook_functions["full"](self, backward_hook) + register_backward_hook_functions["pre"](self, backward_pre_hook) # 重载call,加全局标志。 diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/wrap_api.py b/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/wrap_api.py deleted file mode 100644 index 0e97929ecd7f8444b19fd531efc49883d0df58de..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/mindspore/dump/hook_cell/wrap_api.py +++ /dev/null @@ -1,212 +0,0 @@ -# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os - -from mindspore import Tensor, mint, ops -from mindspore.common._stub_tensor import StubTensor -from mindspore.communication import comm_func -from mindspore.mint.nn import functional - -from msprobe.core.common.const import Const -from msprobe.core.common.file_utils import load_yaml -from msprobe.mindspore.common.const import Const as MsConst -from msprobe.mindspore.common.utils import is_mindtorch -from msprobe.mindspore.dump.hook_cell.hook_cell import HOOKCell - -if is_mindtorch(): - import torch - import torch_npu - -cur_path = os.path.dirname(os.path.realpath(__file__)) -yaml_path = os.path.join(cur_path, MsConst.SUPPORTED_API_LIST_FILE) -torch_yaml_path = os.path.join(cur_path, "../../../pytorch/hook_module", MsConst.SUPPORTED_API_LIST_FILE) - - -class HOOKTensor(object): - pass - - -class HOOKStubTensor(object): - pass - - -class HOOKFunctionalOP(object): - pass - - -class HOOKMintOP(object): - pass - - -class HOOKMintNNFunctionalOP(object): - pass - - -class HOOKDistributedOP(object): - pass - - -class HOOKTorchOP(object): - pass - - -class HOOKTorchTensor(object): - pass - - -class HOOKTorchFunctionalOP(object): - pass - - -class HOOKTorchDistributedOP(object): - pass - - -class HOOKTorchNpuOP(object): - pass - - -class ApiTemplate(HOOKCell): - def __init__(self, api_name, api_dict, prefix, hook): - self.api_name = api_name - self.api_func = api_dict[api_name] - self.prefix_api_name = prefix + str(api_name.split(Const.SEP)[-1]) + Const.SEP - super().__init__(hook) - - @staticmethod - def async_to_sync(output): - # Fake handle, used to return after the CommHandle executes the wait method - fake_handle = type("FakeHandle", (), {"wait": lambda self: None})() - if isinstance(output, tuple) and len(output) == 2 and hasattr(output[1], "wait"): - output[1].wait() - output = (output[0], fake_handle) - elif hasattr(output, "wait"): - output.wait() - output = fake_handle - return output - - def construct(self, *args, **kwargs): - if self.api_name.startswith(MsConst.DROPOUT_API_NAME_PREFIX): - return args[0] if args else kwargs.get(Const.INPUT) - - output = self.api_func(*args, **kwargs) - - if self.prefix_api_name.startswith(MsConst.DISTRIBUTED_DATA_PREFIX): - if kwargs.get("async_op") or self.api_name in ["isend", "irecv"]: - output = self.async_to_sync(output) - return output - - def forward(self, *args, **kwargs): - if self.api_name.startswith(MsConst.DROPOUT_API_NAME_PREFIX): - return args[0] if args else kwargs.get(Const.INPUT) - return self.api_func(*args, **kwargs) - - -class WrapApiName: - def __init__(self, tensor_api_names, stub_tensor_api_names, ops_api_names, mint_api_names, mint_nn_func_api_names, - distributed_api_names): - self.tensor_api_names = tensor_api_names - self.stub_tensor_api_names = stub_tensor_api_names - self.ops_api_names = ops_api_names - self.mint_api_names = mint_api_names - self.mint_nn_func_api_names = mint_nn_func_api_names - self.distributed_api_names = distributed_api_names - - -class WrapTorchApiName: - def __init__(self, torch_api_names, tensor_api_names, functional_api_names, distributed_api_names, npu_api_names): - 
self.torch_api_names = torch_api_names - self.tensor_api_names = tensor_api_names - self.functional_api_names = functional_api_names - self.distributed_api_names = distributed_api_names - self.npu_api_names = npu_api_names - - -def get_wrap_api_list(): - api_list = load_yaml(yaml_path) - tensor_api = api_list.get(MsConst.SUPPORTED_TENSOR_LIST_KEY) - ops_api = api_list.get(MsConst.SUPPORTED_OPS_LIST_KEY) - mint_api = api_list.get(MsConst.SUPPORTED_MINT_LIST_KEY) - mint_nn_func_api = api_list.get(MsConst.SUPPORTED__MINT_NN_FUNC_LIST_KEY) - distributed_api = api_list.get(MsConst.SUPPORTED_COMM_LIST_KEY) - wrap_api_name = WrapApiName(set(tensor_api) & set(dir(Tensor)), - set(tensor_api) & set(dir(StubTensor)), - set(ops_api) & set(dir(ops)), - set(mint_api) & set(dir(mint)), - set(mint_nn_func_api) & set(dir(functional)), - set(distributed_api) & set(dir(comm_func))) - return wrap_api_name - - -def get_wrap_torch_api_list(): - api_list = load_yaml(torch_yaml_path) - torch_api = api_list.get("torch") - tensor_api = api_list.get("tensor") - functional_api = api_list.get("functional") - distributed_api = api_list.get("distributed") - npu_api = api_list.get("torch_npu") - wrap_api_name = WrapTorchApiName(set(torch_api) & set(dir(torch)), - set(tensor_api) & set(dir(torch.Tensor)), - set(functional_api) & set(dir(torch.nn.functional)), - set(distributed_api) & set(dir(torch.distributed)), - set(npu_api) & set(dir(torch_npu))) - return wrap_api_name - - -def wrap_api_func(api_name, api_dict, prefix, hook): - def api_function(*args, **kwargs): - return ApiTemplate(api_name, api_dict, prefix, hook)(*args, **kwargs) - return api_function - - -def wrap_api_func_and_bind(api_list, api_dict, prefix, hook, hook_class): - for api_name in api_list: - if callable(api_dict[api_name]): - setattr(hook_class, Const.ATTR_NAME_PREFIX + api_name, wrap_api_func(api_name, api_dict, prefix, hook)) - - -def setup_hooks(hook): - if is_mindtorch(): - torch_wrap_api_name = get_wrap_torch_api_list() - wrap_api_func_and_bind(torch_wrap_api_name.torch_api_names, - {f: getattr(torch, f) for f in dir(torch)}, - MsConst.TORCH_DATA_PREFIX, hook, HOOKTorchOP) - wrap_api_func_and_bind(torch_wrap_api_name.tensor_api_names, - {f: getattr(torch.Tensor, f) for f in dir(torch.Tensor)}, - MsConst.TENSOR_DATA_PREFIX, hook, HOOKTorchTensor) - wrap_api_func_and_bind(torch_wrap_api_name.functional_api_names, - {f: getattr(torch.nn.functional, f) for f in dir(torch.nn.functional)}, - MsConst.OPS_DATA_PREFIX, hook, HOOKTorchFunctionalOP) - wrap_api_func_and_bind(torch_wrap_api_name.distributed_api_names, - {f: getattr(torch.distributed, f) for f in dir(torch.distributed)}, - MsConst.DISTRIBUTED_DATA_PREFIX, hook, HOOKTorchDistributedOP) - wrap_api_func_and_bind(torch_wrap_api_name.npu_api_names, {f: getattr(torch_npu, f) for f in dir(torch_npu)}, - MsConst.TORCH_NPU_DATA_PREFIX, hook, HOOKTorchNpuOP) - return - - wrap_api_name = get_wrap_api_list() - wrap_api_func_and_bind(wrap_api_name.tensor_api_names, {f: getattr(Tensor, f) for f in dir(Tensor)}, - MsConst.TENSOR_DATA_PREFIX, hook, HOOKTensor) - wrap_api_func_and_bind(wrap_api_name.stub_tensor_api_names, {f: getattr(StubTensor, f) for f in dir(StubTensor)}, - MsConst.STUB_TENSOR_DATA_PREFIX, hook, HOOKStubTensor) - wrap_api_func_and_bind(wrap_api_name.ops_api_names, {f: getattr(ops, f) for f in dir(ops)}, - MsConst.OPS_DATA_PREFIX, hook, HOOKFunctionalOP) - wrap_api_func_and_bind(wrap_api_name.mint_api_names, {f: getattr(mint, f) for f in dir(mint)}, - MsConst.MINT_DATA_PREFIX, hook, 
HOOKMintOP) - wrap_api_func_and_bind(wrap_api_name.mint_nn_func_api_names, {f: getattr(functional, f) for f in dir(functional)}, - MsConst.MINT_NN_FUNC_DATA_PREFIX, hook, HOOKMintNNFunctionalOP) - wrap_api_func_and_bind(wrap_api_name.distributed_api_names, {f: getattr(comm_func, f) for f in dir(comm_func)}, - MsConst.DISTRIBUTED_DATA_PREFIX, hook, HOOKDistributedOP) diff --git a/debug/accuracy_tools/msprobe/mindspore/dump/jit_dump.py b/debug/accuracy_tools/msprobe/mindspore/dump/jit_dump.py index 0a32200639a1f3805f815c37caaef5d3bb64c82f..634b15767528da447adadbe324aa4163adc14838 100644 --- a/debug/accuracy_tools/msprobe/mindspore/dump/jit_dump.py +++ b/debug/accuracy_tools/msprobe/mindspore/dump/jit_dump.py @@ -16,6 +16,7 @@ import os from collections import defaultdict +import mindspore from mindspore._c_expression import PyNativeExecutor_ try: from mindspore.common.api import _MindsporeFunctionExecutor @@ -25,7 +26,10 @@ except ImportError: from msprobe.core.common.log import logger from msprobe.core.common.const import Const from msprobe.core.data_dump.data_processor.base import ModuleForwardInputsOutputs, ModuleBackwardInputsOutputs -from msprobe.mindspore.dump.hook_cell.api_registry import api_register +from msprobe.mindspore.dump.hook_cell.api_register import get_api_register + + +_api_register = get_api_register() def dump_jit(name, in_feat, out_feat, is_forward): @@ -69,7 +73,7 @@ class JitDump(_MindsporeFunctionExecutor): def __call__(self, *args, **kwargs): if JitDump.jit_dump_switch: - api_register.api_set_ori_func() + _api_register.restore_all_api() out = super().__call__(*args, **kwargs) if JitDump.jit_dump_switch and len(args) > 0: if self.name and self.name != "construct": @@ -80,7 +84,7 @@ class JitDump(_MindsporeFunctionExecutor): elif len(args) == 0: logger.warning(f"The jit function {self.name} has no input arguments, nothing will be dumped.") if JitDump.jit_dump_switch: - api_register.api_set_hook_func() + _api_register.register_all_api() return out @classmethod @@ -101,9 +105,12 @@ class JitDump(_MindsporeFunctionExecutor): def grad(self, obj, grad, weights, grad_position, *args, **kwargs): if JitDump.jit_dump_switch and JitDump.jit_enable: - api_register.api_set_ori_func() - output = self._executor.grad(grad, obj, weights, grad_position, *args, *(kwargs.values())) + _api_register.restore_all_api() + if mindspore.__version__ >= "2.5": + output = self._executor.grad(grad, obj, weights, grad_position, False, *args, *(kwargs.values())) + else: + output = self._executor.grad(grad, obj, weights, grad_position, *args, *(kwargs.values())) if JitDump.jit_dump_switch and JitDump.jit_enable: dump_jit(obj, args, None, False) - api_register.api_set_hook_func() + _api_register.register_all_api() return output diff --git a/debug/accuracy_tools/msprobe/mindspore/free_benchmark/api_pynative_self_check.py b/debug/accuracy_tools/msprobe/mindspore/free_benchmark/api_pynative_self_check.py index 57b7de4fa567d73a19178256d79f5e4cbeb38864..da4821b3ac45a689fab5ba5c63515f88bd6e17c3 100644 --- a/debug/accuracy_tools/msprobe/mindspore/free_benchmark/api_pynative_self_check.py +++ b/debug/accuracy_tools/msprobe/mindspore/free_benchmark/api_pynative_self_check.py @@ -19,6 +19,7 @@ import os import traceback import mindspore as ms + from msprobe.core.common.const import Const from msprobe.core.common.exceptions import DistributedNotInitializedError from msprobe.core.common.file_utils import check_path_length, load_yaml @@ -27,7 +28,7 @@ from msprobe.mindspore.common.const import FreeBenchmarkConst 
from msprobe.mindspore.common.log import logger from msprobe.mindspore.common.utils import get_rank_if_initialized from msprobe.mindspore.debugger.debugger_config import DebuggerConfig -from msprobe.mindspore.dump.hook_cell.api_registry import api_register +from msprobe.mindspore.dump.hook_cell.api_register import get_api_register from msprobe.mindspore.dump.hook_cell.hook_cell import HOOKCell from msprobe.mindspore.free_benchmark.common.config import Config from msprobe.mindspore.free_benchmark.common.handler_params import HandlerParams @@ -37,6 +38,9 @@ from msprobe.mindspore.free_benchmark.perturbation.perturbation_factory import P from msprobe.mindspore.runtime import Runtime +_api_register = get_api_register() + + class ApiPyNativeSelfCheck: def __init__(self, config: DebuggerConfig): Config.is_enable = True @@ -60,8 +64,8 @@ class ApiPyNativeSelfCheck: self.store_original_func() def handle(self): - api_register.initialize_hook(self.build_hook) - api_register.api_set_hook_func() + _api_register.initialize_hook(self.build_hook) + _api_register.register_all_api() def build_hook(self, api_name): def pre_hook(cell, input_data): @@ -166,13 +170,13 @@ def check_self(api_name_with_id, output, ori_func, *args, **kwargs): return ret logger.info(f"[{api_name_with_id}] is {Config.handler_type}ing.") - api_register.api_set_ori_func() + _api_register.restore_all_api() try: perturbation = PerturbationFactory.create(api_name_with_id) params.fuzzed_result = perturbation.handle(params) if params.fuzzed_result is False: - api_register.api_set_hook_func() + _api_register.register_all_api() return ret if Config.stage == Const.BACKWARD: params.original_result = Tools.get_grad(params.original_func, *params.args, **params.kwargs) @@ -183,7 +187,7 @@ def check_self(api_name_with_id, output, ori_func, *args, **kwargs): logger.error(f"[{api_name_with_id}] Error: {str(e)}") logger.error(f"[{api_name_with_id}] Error detail: {traceback.format_exc()}") - api_register.api_set_hook_func() + _api_register.register_all_api() return ret diff --git a/debug/accuracy_tools/msprobe/mindspore/mindtorch/__init__.py b/debug/accuracy_tools/msprobe/mindspore/mindtorch/__init__.py index fc695d05ccc010f824b61db39a8ea77714d2d73b..13427188c913bff230fd798c1374c9839d7dd092 100644 --- a/debug/accuracy_tools/msprobe/mindspore/mindtorch/__init__.py +++ b/debug/accuracy_tools/msprobe/mindspore/mindtorch/__init__.py @@ -1,18 +1,18 @@ -# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from .mindtorch_adaptor import (_call_impl, - register_full_backward_pre_hook, - register_full_backward_hook) +# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from .mindtorch_adaptor import (_call_impl, + register_full_backward_pre_hook, + register_full_backward_hook) diff --git a/debug/accuracy_tools/msprobe/mindspore/monitor/utils.py b/debug/accuracy_tools/msprobe/mindspore/monitor/utils.py index a27172f19ead537f276c5ce0820b405d7abb6e25..c85e66a65ba26fdbc1d10a8e55c8273236409b36 100644 --- a/debug/accuracy_tools/msprobe/mindspore/monitor/utils.py +++ b/debug/accuracy_tools/msprobe/mindspore/monitor/utils.py @@ -258,7 +258,7 @@ def validate_config(config): step_interval = config.get('step_interval', 1) validate_step_interval(step_interval) - collect_times = config.get('collect_times', 1e8) + collect_times = config.get('collect_times', int(1e8)) validate_collect_times(collect_times) if not targets: diff --git a/debug/accuracy_tools/msprobe/mindspore/ms_config.py b/debug/accuracy_tools/msprobe/mindspore/ms_config.py index f20ed804c5bb8d8fbe4dba3e208060e8f52a3120..ff7fc28e76e25a6463b26fd49d2aea3d1900207e 100644 --- a/debug/accuracy_tools/msprobe/mindspore/ms_config.py +++ b/debug/accuracy_tools/msprobe/mindspore/ms_config.py @@ -28,6 +28,7 @@ class TensorConfig(BaseConfig): super().__init__(json_config) self.check_mode = None self.file_format = json_config.get("file_format") + self.td_config_path = json_config.get("td_config_path") self.check_config() self._check_config() diff --git a/debug/accuracy_tools/msprobe/mindspore/service.py b/debug/accuracy_tools/msprobe/mindspore/service.py index 5afbd046be4caf29c4b247a0f8fdd655c5208fd0..1d30b19e3112aa3421ca76d3477c5c824fa649e7 100644 --- a/debug/accuracy_tools/msprobe/mindspore/service.py +++ b/debug/accuracy_tools/msprobe/mindspore/service.py @@ -41,7 +41,7 @@ from msprobe.mindspore.cell_processor import CellProcessor from msprobe.mindspore.common.log import logger from msprobe.mindspore.common.utils import (get_rank_if_initialized, clean_input_kwargs, is_mindtorch, register_backward_hook_functions) -from msprobe.mindspore.dump.hook_cell.api_registry import api_register +from msprobe.mindspore.dump.hook_cell.api_register import get_api_register from msprobe.mindspore.dump.hook_cell.primitive_hooks import PrimitiveHookService from msprobe.mindspore.dump.jit_dump import JitDump from msprobe.mindspore.dump.hook_cell.hook_cell import HOOKCell @@ -63,6 +63,8 @@ class Service: self.inner_switch = False self.primitive_switch = False self.current_iter = 0 + self.loop = 0 + self.init_step = 0 self.first_start = True self.current_rank = None self.dump_iter_dir = None @@ -71,6 +73,7 @@ class Service: self.params_grad_info = {} self.hook_handle_dict = {} # 提前注册,确保注册尽可能多的API hook + self.api_register = get_api_register() self.register_api_hook() self.init_for_debug_level() @@ -271,16 +274,15 @@ class Service: def step(self): if self.config.level == Const.LEVEL_DEBUG: return - if self.config.async_dump: - self.data_collector.fill_stack_tensor_data() - if self.config.task == Const.TENSOR: - self.data_collector.data_processor.dump_async_data() + if self.config.async_dump and self.config.task == Const.TENSOR: + self.data_collector.data_processor.dump_async_data() self.data_collector.write_json() - 
self.current_iter += 1 - self.data_collector.update_iter(self.current_iter) + self.loop += 1 self.reset_status() def start(self, model=None): + self.current_iter = self.loop + self.init_step + self.data_collector.update_iter(self.current_iter) if self.config.level == Const.LEVEL_DEBUG: return self.start_call = True @@ -321,7 +323,7 @@ class Service: PIJitCaptureContext.__exit__ = self.empty self.first_start = False - api_register.api_set_hook_func() + self.api_register.register_all_api() self.switch = True self.primitive_switch = True logger.info(f"Dump switch is turned on at step {self.current_iter}. ") @@ -346,10 +348,8 @@ class Service: self.switch = False self.primitive_switch = False self.start_call = False - if self.config.async_dump: - self.data_collector.fill_stack_tensor_data() - if self.config.task == Const.TENSOR: - self.data_collector.data_processor.dump_async_data() + if self.config.async_dump and self.config.task == Const.TENSOR: + self.data_collector.data_processor.dump_async_data() self.data_collector.write_json() JitDump.jit_dump_switch = False @@ -410,8 +410,8 @@ class Service: def register_api_hook(self): if self.config.level in [Const.LEVEL_MIX, Const.LEVEL_L1, Const.LEVEL_L2]: logger.info(f"The api {self.config.task} hook function is successfully mounted to the model.") - api_register.initialize_hook(functools.partial(self.build_hook, BaseScope.Module_Type_API)) - api_register.api_set_hook_func() + self.api_register.initialize_hook(functools.partial(self.build_hook, BaseScope.Module_Type_API)) + self.api_register.register_all_api() def get_cells_and_names(self): cells_and_names_with_index = {} diff --git a/debug/accuracy_tools/msprobe/mindspore/task_handler_factory.py b/debug/accuracy_tools/msprobe/mindspore/task_handler_factory.py index a9cb5e6dd4037dcdeffe3c4d9584ad93c42022d6..10b74ea22b02d0668d0b3b17a569c5e1a67c1dd8 100644 --- a/debug/accuracy_tools/msprobe/mindspore/task_handler_factory.py +++ b/debug/accuracy_tools/msprobe/mindspore/task_handler_factory.py @@ -29,11 +29,14 @@ class TaskHandlerFactory: } @staticmethod - def create(config: DebuggerConfig): + def create(config: DebuggerConfig, model=None): task = TaskHandlerFactory.tasks.get(config.task) if not task: raise Exception("Valid task is needed.") - handler = task.create(config) + if task == DumpToolFactory: + handler = task.create(config, model) + else: + handler = task.create(config) if not handler: raise Exception("Can not find task handler") return handler diff --git a/debug/accuracy_tools/msprobe/msprobe.py b/debug/accuracy_tools/msprobe/msprobe.py index 8e0386fde6dccc071c3d9d8e1a86729a2c483c7c..127e042f65aba809e153787d09275f3bc699d077 100644 --- a/debug/accuracy_tools/msprobe/msprobe.py +++ b/debug/accuracy_tools/msprobe/msprobe.py @@ -51,6 +51,7 @@ def main(): graph_service_cmd_parser = subparsers.add_parser('graph') op_generate_cmd_parser = subparsers.add_parser('op_generate') merge_result_parser = subparsers.add_parser('merge_result') + config_checking_parser = subparsers.add_parser('config_checking') _compare_parser(compare_cmd_parser) _merge_result_parser(merge_result_parser) @@ -71,6 +72,8 @@ def main(): from msprobe.visualization.graph_service import _pt_graph_service_parser, _pt_graph_service_command from msprobe.pytorch.api_accuracy_checker.generate_op_script.op_generator import _op_generator_parser, \ _run_operator_generate_commond + from msprobe.pytorch.config_checking.config_checking import _config_checking_parser, \ + _run_config_checking_command _run_ut_parser(run_ut_cmd_parser) 
_run_ut_parser(multi_run_ut_cmd_parser) @@ -80,6 +83,7 @@ def main(): _run_overflow_check_parser(run_overflow_check_cmd_parser) _pt_graph_service_parser(graph_service_cmd_parser) _op_generator_parser(op_generate_cmd_parser) + _config_checking_parser(config_checking_parser) elif framework_args.framework == Const.MS_FRAMEWORK: from msprobe.mindspore.api_accuracy_checker.cmd_parser import add_api_accuracy_checker_argument from msprobe.visualization.graph_service import _ms_graph_service_parser, _ms_graph_service_command @@ -118,6 +122,8 @@ def main(): compare_cli(args) elif sys.argv[3] == "merge_result": merge_result_cli(args) + elif sys.argv[3] == "config_checking": + _run_config_checking_command(args) else: if not is_module_available(Const.MS_FRAMEWORK): logger.error("MindSpore does not exist, please install MindSpore library") diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/common/config.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/common/config.py index f2b2d6a30463c62846bcc02e147c9c319f55d1b8..588a1eb349a6223f1c86df04fe3ae590a4e2a1ca 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/common/config.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/common/config.py @@ -52,7 +52,9 @@ class Config: 'host': str, 'port': int, 'rank_list': list, - 'tls_path': str + 'tls_path': str, + 'master_ip': str, + 'master_port': str } if key not in validators: raise ValueError(f"{key} must be one of {validators.keys()}") @@ -72,6 +74,10 @@ class Config: RunUTConfig.check_nfs_path_config(value) if key == 'tls_path': RunUTConfig.check_tls_path_config(value) + if key == 'master_ip': + RunUTConfig.check_master_ip_config(value) + if key == 'master_port': + RunUTConfig.check_master_port_config(value) return value @@ -91,6 +97,8 @@ class CheckerConfig: self.port = msCheckerConfig.port self.rank_list = msCheckerConfig.rank_list self.tls_path = msCheckerConfig.tls_path + self.master_ip = msCheckerConfig.master_ip + self.master_port = msCheckerConfig.master_port if task_config: self.load_config(task_config) @@ -105,6 +113,8 @@ class CheckerConfig: self.port = task_config.port self.rank_list = task_config.rank_list self.tls_path = task_config.tls_path + self.master_ip = task_config.master_ip + self.master_port = task_config.master_port def get_online_config(self): return OnlineConfig( diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/algorithm.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/algorithm.py index ddee254c2b1085f9af96fe2774c53fb88c5821f4..abe8f2b4b3cd1cf8195fc86ed5c6a07e1daddf15 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/algorithm.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/algorithm.py @@ -261,3 +261,54 @@ def compare_bool_tensor(bench_output, device_output): error_rate = float(error_nums / bench_output.size) result = CompareConst.PASS if error_rate == 0 else CompareConst.ERROR return error_rate, result, "" + + +def maximize_kahan_loss(cumsum, addend, negative=False): + """ + Calculate the precision loss in Kahan summation and select the maximum or minimum loss. + + Parameters: + cumsum (torch.Tensor): The current cumulative sum. + addend (torch.Tensor): The value to be added in the current step. + negative (bool): Whether to select the negative direction of loss. + Default is False (select positive direction which minimizes the sum). 
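+
+    Example (illustrative, float32): with cumsum = tensor([1e8]) and
+    addend = tensor([1.0]), (cumsum + addend) - cumsum - addend evaluates to
+    -1.0: the entire addend is lost to rounding at that magnitude, and this
+    function reports that loss.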
+ + Returns: + loss_res (torch.Tensor): The selected maximum or minimum loss value. + mask (torch.Tensor): + A boolean mask indicating whether the loss value should be compensated. + """ + loss_all = (cumsum + addend) - cumsum - addend + if negative: + loss_res = torch.min(loss_all, dim=0)[0] + mask = loss_res <= 0 + else: + loss_res = torch.max(loss_all, dim=0)[0] + mask = loss_res >= 0 + return loss_res, mask + + +def kahan_range(tensors, negative=False): + """ + Perform Kahan summation on a list of tensors and track precision loss. + + Parameters: + tensors (list of torch.Tensor): The list of tensors to be summed. + negative (bool): Whether to select the negative direction of loss. + Default is False (select positive direction which minimizes the sum). + Returns: + sum_max: The summation results. + """ + if len(tensors) < 1: + raise ValueError("tensors should have at least 1 element") + cumsum_temp = torch.clone(tensors[0]).unsqueeze(dim=0) + sum_max = torch.clone(tensors[0]) + loss_max = torch.tensor(0) + + for tensor in tensors[1:]: + addend = tensor - loss_max + loss_max, mask = maximize_kahan_loss(cumsum_temp, addend, negative) + sum_max = sum_max + (addend - torch.where(mask, loss_max, 0)) + loss_max = torch.where(mask, 0, loss_max) + cumsum_temp = torch.cat((cumsum_temp, sum_max.unsqueeze(dim=0))) + return sum_max diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/api_precision_compare.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/api_precision_compare.py index 8f7db73b58f42a4a64728bb0f12d25cf6f9f9ebe..cd60d8bc15f5b1c2889c8bdf96d3e9490ce09498 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/api_precision_compare.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/compare/api_precision_compare.py @@ -430,6 +430,7 @@ def _api_precision_compare(parser=None): _api_precision_compare_parser(parser) args = parser.parse_args(sys.argv[1:]) _api_precision_compare_command(args) + logger.info("Compare task completed.") def _api_precision_compare_command(args): @@ -457,8 +458,3 @@ def _api_precision_compare_parser(parser): parser.add_argument("-o", "--out_path", dest="out_path", default="", type=str, help=" The api precision compare task result out path.", required=False) - - -if __name__ == '__main__': - _api_precision_compare() - logger.info("Compare task completed.") diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/config.yaml b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/config.yaml index 2ec9251009e61ef68dbfed987abe457d47b91e9a..30cea3b8e01f1c1a8a3a3d25620ba4bb2c9e709a 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/config.yaml +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/config.yaml @@ -8,3 +8,5 @@ host: "" port: -1 rank_list: [0] tls_path: "./" +master_ip: '127.0.0.1' +master_port: '2688' diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/generate_op_script/op_generator.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/generate_op_script/op_generator.py index 797210f09c3b55a64002a4aa84a3d39770ae803c..641eada030353ec67f6ce7b59bd3d14909a56e51 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/generate_op_script/op_generator.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/generate_op_script/op_generator.py @@ -183,6 +183,7 @@ class APIExtractor: self.update_data_name(v, dump_data_dir) return value + @recursion_depth_decorator("OpGenerator: 
APIExtractor.update_data_name") def update_data_name(self, data, dump_data_dir): if isinstance(data, list): for item in data: diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/data_generate.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/data_generate.py index 9d89b2de32f70c6fa7abf38add49b58a13531d7a..15e14b68c7da4f2c7fadd4e0285c79fec5fa78f1 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/data_generate.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/data_generate.py @@ -1,9 +1,7 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. # -# Licensed under the Apache License, Version 2.0 (the "License"); +# Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # @@ -15,20 +13,27 @@ # See the License for the specific language governing permissions and # limitations under the License. -import os import math -import torch +import os + import numpy +import torch -from msprobe.pytorch.api_accuracy_checker.run_ut.run_ut_utils import hf_32_standard_api +from msprobe.core.common.const import Const, FileCheckConst, CompareConst, DistributedCheckConst +from msprobe.core.common.file_utils import FileChecker, load_npy from msprobe.pytorch.api_accuracy_checker.common.utils import check_object_type, get_full_data_path, \ CompareException, get_module_and_atttribute_name, get_attribute -from msprobe.core.common.file_utils import FileChecker, load_npy +from msprobe.pytorch.api_accuracy_checker.run_ut.run_ut_utils import hf_32_standard_api from msprobe.pytorch.common.log import logger from msprobe.pytorch.common.utils import load_pt -from msprobe.core.common.const import Const, FileCheckConst, CompareConst +from msprobe.pytorch.hook_module.api_register import get_api_register +api_register = get_api_register(return_new=True) +api_register.initialize_hook(None) +distribute_api_key = Const.PT_FRAMEWORK + Const.SEP + Const.PT_API_TYPE_DIST +distribute_api_list = list(api_register.ori_api_attr.get(distribute_api_key, {}).keys()) + TORCH_TYPE = ["torch.device", "torch.dtype"] TENSOR_DATA_LIST = ["torch.Tensor", "torch.nn.parameter.Parameter"] FLOAT_TYPE = [ @@ -68,7 +73,7 @@ def gen_data(info, api_name, need_grad, convert_type, real_data_path=None): data = gen_random_tensor(info, convert_type) if api_name in hf_32_standard_api and data.dtype == torch.float32: data = fp32_to_hf32_to_fp32(data) - if info.get('requires_grad') and need_grad: + if info.get('requires_grad') and need_grad and api_name not in distribute_api_list: data.requires_grad_(True) temp_data = data * 1 data = temp_data.type_as(data) @@ -261,11 +266,14 @@ def gen_args(args_info, api_name, func_options): Function Description: Based on API basic information, generate input parameters: args, for API forward running Parameter: - api_info: API basic information. List + args_info: API basic information. DICT api_name: API name - need_grad: set Tensor grad for backward - convert_type: convert ori_type to dist_type flag. - real_data_path: the root directory for storing real data. + func_options: the options for generating args. Dict + need_grad: set Tensor grad for backward + convert_type: convert ori_type to dist_type flag. + real_data_path: the root directory for storing real data. 
+ depth: the depth of recursion. + kwargs_params: the input kwargs parameters. """ check_object_type(args_info, list) args_result = [] @@ -274,6 +282,7 @@ def gen_args(args_info, api_name, func_options): convert_type = func_options.get('convert_type', None) real_data_path = func_options.get('real_data_path', None) depth = func_options.get('depth', 0) + kwargs_params = func_options.get('input_kwargs', {}) if depth > Const.MAX_DEPTH: logger.error("The depth of args is too large, please check the input args.") @@ -284,7 +293,11 @@ def gen_args(args_info, api_name, func_options): func_options['depth'] = depth + 1 data = gen_args(arg, api_name, func_options) elif isinstance(arg, dict): - data = gen_data(arg, api_name, need_grad, convert_type, real_data_path) + if arg.get('type') == DistributedCheckConst.TORCH_PROCESS_GROUP: + data = None + kwargs_params[DistributedCheckConst.GROUP] = arg + else: + data = gen_data(arg, api_name, need_grad, convert_type, real_data_path) elif arg is None: data = None else: @@ -311,6 +324,8 @@ def gen_kwargs(api_info, api_name, convert_type=None, real_data_path=None): kwargs_params[key] = gen_list_kwargs(value, api_name, convert_type, real_data_path) elif value is None: kwargs_params[key] = None + elif key == DistributedCheckConst.GROUP and value.get('type') == DistributedCheckConst.TORCH_PROCESS_GROUP: + kwargs_params[key] = value elif key == 'atten_mask' and api_name == 'npu_fusion_attention': sparse_mode = kwargs_params.get('sparse_mode', {}) if isinstance(sparse_mode, dict): @@ -415,17 +430,19 @@ def gen_api_params(api_info, api_name, need_grad=True, convert_type=None, real_d if convert_type and convert_type not in Const.CONVERT: error_info = f"convert_type params not support {convert_type}." raise CompareException(CompareException.INVALID_PARAM_ERROR, error_info) - kwargs_params = gen_kwargs(api_info, api_name, convert_type, real_data_path) + func_options = { 'need_grad': need_grad, 'convert_type': convert_type, 'real_data_path': real_data_path, - 'depth': 0 + 'depth': 0, + 'input_kwargs': api_info.get("input_kwargs", {}) } if api_info.get("input_args"): args_params = gen_args(api_info.get("input_args"), api_name, func_options) else: logger.warning(f'Warning: No args in {api_info} ') args_params = [] + kwargs_params = gen_kwargs(api_info, api_name, convert_type, real_data_path) output_dtype = get_output_dtype(api_info) return args_params, kwargs_params, output_dtype diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_bench_function.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_bench_function.py new file mode 100644 index 0000000000000000000000000000000000000000..18ff05bc00c2c5271e965dbd91fd54be1d410876 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_bench_function.py @@ -0,0 +1,204 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import torch + +from msprobe.core.common.const import DistributedCheckConst +from msprobe.pytorch.api_accuracy_checker.common.utils import check_object_type +from msprobe.pytorch.api_accuracy_checker.compare.algorithm import kahan_range +from msprobe.pytorch.api_accuracy_checker.run_ut.run_ut_utils import get_distributed_args + + +def sort_all_input(inputs): + ranks = len(inputs) + if ranks <= 1: + return inputs + combined_tensor = torch.stack(inputs) + sorted_indices = torch.argsort(combined_tensor, descending=True, dim=0) + combined_tensor = torch.gather(combined_tensor, 0, sorted_indices) + sorted_inputs = [combined_tensor[i] for i in range(ranks)] + return sorted_inputs + + +def reduce_sum(tensors): + min_bound = torch.min( + kahan_range(tensors, negative=False), + kahan_range(tensors[::-1], negative=False), + ) + max_bound = torch.max( + kahan_range(tensors, negative=True), kahan_range(tensors[::-1], negative=True) + ) + tensors_sorted = sort_all_input(tensors) + min_sorted_bound = torch.min( + kahan_range(tensors_sorted, negative=False), + kahan_range(tensors_sorted[::-1], negative=False), + ) + max_sorted_bound = torch.max( + kahan_range(tensors_sorted, negative=True), + kahan_range(tensors_sorted[::-1], negative=True), + ) + return torch.min(min_bound, min_sorted_bound), torch.max( + max_bound, max_sorted_bound + ) + + +def reduce_product(tensors): + return torch.stack(tensors).prod(dim=0) + + +def reduce_min(tensors): + return torch.stack(tensors).min(dim=0).values + + +def reduce_max(tensors): + return torch.stack(tensors).max(dim=0).values + + +def reduce_band(tensors): + reduce_tensor = tensors[0].clone() + if len(tensors) > 1: + for t in tensors[1:]: + reduce_tensor &= t + return reduce_tensor + + +def reduce_bor(tensors): + reduce_tensor = tensors[0].clone() + if len(tensors) > 1: + for t in tensors[1:]: + reduce_tensor |= t + return reduce_tensor + + +def reduce_bxor(tensors): + reduce_tensor = tensors[0].clone() + if len(tensors) > 1: + for t in tensors[1:]: + reduce_tensor ^= t + return reduce_tensor + + +def mock_broadcast(api_name, input_args, input_kwargs): + check_object_type(input_args, list) + check_object_type(input_kwargs, list) + if len(input_args) < 1 or len(input_kwargs) < 1: + raise ValueError("input_args and input_kwargs should have at least 1 element") + + src = get_distributed_args(api_name, input_args[0], input_kwargs[0], DistributedCheckConst.SRC) + + group = get_distributed_args(api_name, input_args[0], input_kwargs[0], DistributedCheckConst.GROUP) + group_ranks = group.get(DistributedCheckConst.GROUP_RANKS, []) + if not group_ranks: + raise ValueError("group_ranks should not be empty") + real_src = src - min(group_ranks) + if len(input_args) <= real_src: + raise ValueError("input_args should have at least {} element".format(real_src + 1)) + + return input_args[real_src][0] + + +def mock_reduce(api_name, input_args, input_kwargs): + check_object_type(input_args, list) + check_object_type(input_kwargs, list) + if len(input_args) < 1 or len(input_kwargs) < 1: + raise ValueError("input_args and input_kwargs should have at least 1 element") + + reduce_op = get_distributed_args(api_name, input_args[0], input_kwargs[0], DistributedCheckConst.OP) + tensors = [] + for arg in input_args: + if len(arg) > 0: + tensors.append(arg[0]) + reduce_tensor = None + if not tensors: + return reduce_tensor + reduce_ops = { + DistributedCheckConst.REDOPTYPE_SUM: 
reduce_sum, + DistributedCheckConst.REDOPTYPE_PRODUCT: reduce_product, + DistributedCheckConst.REDOPTYPE_MIN: reduce_min, + DistributedCheckConst.REDOPTYPE_MAX: reduce_max, + DistributedCheckConst.REDOPTYPE_BAND: reduce_band, + DistributedCheckConst.REDOPTYPE_BOR: reduce_bor, + DistributedCheckConst.REDOPTYPE_BXOR: reduce_bxor, + } + if reduce_op not in reduce_ops: + raise ValueError(f"Unsupported reduce operation: {reduce_op}") + reduce_tensor = reduce_ops[reduce_op](tensors) + + return reduce_tensor + + +def mock_scatter(api_name, input_args, input_kwargs): + check_object_type(input_args, list) + check_object_type(input_kwargs, list) + if len(input_args) < 1 or len(input_kwargs) < 1: + raise ValueError("input_args and input_kwargs should have at least 1 element") + + src = get_distributed_args(api_name, input_args[0], input_kwargs[0], DistributedCheckConst.SRC) + group = get_distributed_args(api_name, input_args[0], input_kwargs[0], DistributedCheckConst.GROUP) + group_ranks = group.get(DistributedCheckConst.GROUP_RANKS, []) + if not group_ranks: + raise ValueError("group_ranks should not be empty") + real_src = src - min(group_ranks) + if len(input_args) <= real_src: + raise ValueError("input_args should have at least {} element".format(real_src + 1)) + scatter_list = get_distributed_args(api_name, input_args[real_src], input_kwargs[real_src], + DistributedCheckConst.SCATTER_LIST) + return scatter_list + + +def mock_all_gather(api_name, input_args, input_kwargs): + check_object_type(input_args, list) + check_object_type(input_kwargs, list) + gather_tensor = [] + for data in input_args: + if len(data) > 1: + gather_tensor.append(data[1]) + return gather_tensor + + +def mock_all_to_all(api_name, input_args, input_kwargs): + check_object_type(input_args, list) + check_object_type(input_kwargs, list) + input_tensor_list = [] + for data in input_args: + if len(data) >= 2: + input_tensor_list.append(data[1]) + world_size = len(input_tensor_list) + output_tensor_list = [] + for rank in range(world_size): + output_chunk = [] + for data in input_tensor_list: + if len(data) <= rank: + raise ValueError("input_tensor_list should have at least {} element".format(rank + 1)) + output_chunk.append(data[rank]) + output_tensor_list.append(output_chunk) + return output_tensor_list + + +def mock_all_to_all_single(api_name, input_args, input_kwargs): + check_object_type(input_args, list) + check_object_type(input_kwargs, list) + input_tensor_list = [] + for data in input_args: + if len(data) >= 2: + input_tensor_list.append(data[1]) + if not input_tensor_list: + return [] + input_tensor = torch.stack(input_tensor_list) + output_tensor = input_tensor.t() + output_tensor_list = [tensor.clone() for tensor in output_tensor] + return output_tensor_list diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_compare_function.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_compare_function.py new file mode 100644 index 0000000000000000000000000000000000000000..f7cf95a1d0d9060b75a45a360e6a4d5d8b087637 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_compare_function.py @@ -0,0 +1,116 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import itertools +import torch +import tqdm + +from msprobe.core.common.const import CompareConst, DistributedCheckConst + + +def cumulative_check(rank, inputs, output, min_bound, max_bound): + # 检查每个元素是否在最小值和最大值之间 + res = CompareConst.PASS + out_of_bounds = torch.nonzero((output < min_bound) | (output > max_bound)) + if out_of_bounds.shape[0] == 0: + return res + # 对超出范围的值进行累加序遍历检查 + perms = list(itertools.permutations(list(range(len(inputs))))) + if len(out_of_bounds) > DistributedCheckConst.MAX_CUMSUM_CHECK_NUM: + res = CompareConst.WARNING + out_of_bounds = out_of_bounds[: DistributedCheckConst.MAX_CUMSUM_CHECK_NUM] + pbar = tqdm.tqdm( + out_of_bounds, + position=rank + 1, + desc=f"Suspicious cumulative result check for rank{rank}", + ) + for indice in pbar: + indice_tuple = tuple(indice) + input_values = torch.stack([input_[indice_tuple] for input_ in inputs])[perms] + for i in range(1, len(inputs)): + input_values[:, 0] += input_values[:, i] + if output[indice_tuple] not in input_values[:, 0]: + res = CompareConst.ERROR + break + pbar.close() + return res + + +def compare_broadcast(device_out, bench_out, **kwargs): + if len(device_out) < 1: + raise ValueError("device_out should not be empty") + compare_result = torch.equal(device_out[0].cpu(), bench_out) + + return CompareConst.PASS if compare_result else CompareConst.ERROR + + +def compare_all_reduce(device_out, bench_out, **kwargs): + if len(device_out) < 1: + raise ValueError("device_out should not be empty") + if isinstance(bench_out, tuple): + rank = kwargs.get("local_rank", 0) + input_args = kwargs.get("input_args", []) + tensors = [] + for arg in input_args: + if len(arg) > 0: + tensors.append(arg[0]) + if len(tensors) < 1: + raise ValueError("input_args should have at least 1 element") + result = cumulative_check(rank, tensors, device_out[0].cpu(), *bench_out) + else: + compare_result = torch.equal(device_out[0].cpu(), bench_out) + result = CompareConst.PASS if compare_result else CompareConst.ERROR + return result + + +def compare_scatter(device_out, bench_out, **kwargs): + rank = kwargs.get("local_rank", 0) + if len(device_out) < 1: + raise ValueError("device_out should not be empty") + if len(bench_out) <= rank: + raise ValueError("bench_out should have at least rank+1 outputs") + compare_result = torch.equal(device_out[0].cpu(), bench_out[rank]) + + return CompareConst.PASS if compare_result else CompareConst.ERROR + + +def compare_all_gather(device_out, bench_out, **kwargs): + if len(device_out) < 1: + raise ValueError("device_out should not be empty") + device_out_cpu = [tensor.cpu() for tensor in device_out[0]] + compare_result = all(torch.equal(a, b) for a, b in zip(device_out_cpu, bench_out)) + + return CompareConst.PASS if compare_result else CompareConst.ERROR + + +def compare_all_to_all(device_out, bench_out, **kwargs): + rank = kwargs.get("local_rank", 0) + if len(device_out) < 1: + raise ValueError("device_out should not be empty") + device_out_cpu = [tensor.cpu() for tensor in device_out[0]] + compare_result = all(torch.equal(a, b) for a, b in zip(device_out_cpu, bench_out[rank])) + + return 
CompareConst.PASS if compare_result else CompareConst.ERROR + + +def compare_all_to_all_single(device_out, bench_out, **kwargs): + rank = kwargs.get("local_rank", 0) + if len(device_out) < 1: + raise ValueError("device_out should not be empty") + compare_result = torch.equal(device_out[0].cpu(), bench_out[rank]) + + return CompareConst.PASS if compare_result else CompareConst.ERROR diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_function_registry.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_function_registry.py new file mode 100644 index 0000000000000000000000000000000000000000..6758b4ff4f8b286477880f74cd34e3516060c3fb --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/distributed_function_registry.py @@ -0,0 +1,68 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from typing import Callable + +from msprobe.pytorch.api_accuracy_checker.run_ut.distributed_bench_function import \ + mock_broadcast, mock_reduce, mock_scatter, mock_all_gather, mock_all_to_all, \ + mock_all_to_all_single +from msprobe.pytorch.api_accuracy_checker.run_ut.distributed_compare_function import \ + compare_broadcast, compare_all_reduce, compare_scatter, \ + compare_all_gather, compare_all_to_all, compare_all_to_all_single +from msprobe.core.common.const import DistributedCheckConst + + +class DistributedFunctionRegistry: + def __init__(self): + self.compare_functions = {} + self.bench_functions = {} + self.support_api_list = [DistributedCheckConst.BROADCAST, DistributedCheckConst.ALL_REDUCE, + DistributedCheckConst.SCATTER, DistributedCheckConst.ALL_GATHER, + DistributedCheckConst.ALL_TO_ALL, DistributedCheckConst.ALL_TO_ALL_SINGLE] + + def register_compare_function(self, api_name: str, function: Callable): + self.compare_functions[api_name] = function + + def register_bench_function(self, api_name: str, function: Callable): + self.bench_functions[api_name] = function + + def register_functions(self, functions_dict): + for api_name, (bench_function, compare_function) in functions_dict.items(): + self.register_bench_function(api_name, bench_function) + self.register_compare_function(api_name, compare_function) + + def get_compare_function(self, api_name: str) -> Callable: + if not self.compare_functions.get(api_name): + raise Exception("No compare function registered for api: {}".format(api_name)) + return self.compare_functions.get(api_name) + + def get_bench_function(self, api_name: str) -> Callable: + if not self.bench_functions.get(api_name): + raise Exception("No benchmark function registered for api: {}".format(api_name)) + return self.bench_functions.get(api_name) + + +functions_map = { + DistributedCheckConst.BROADCAST: (mock_broadcast, compare_broadcast), + DistributedCheckConst.ALL_REDUCE: (mock_reduce, compare_all_reduce), + DistributedCheckConst.SCATTER: 
(mock_scatter, compare_scatter), + DistributedCheckConst.ALL_GATHER: (mock_all_gather, compare_all_gather), + DistributedCheckConst.ALL_TO_ALL: (mock_all_to_all, compare_all_to_all), + DistributedCheckConst.ALL_TO_ALL_SINGLE: (mock_all_to_all_single, compare_all_to_all_single) +} +distributed_func_registry = DistributedFunctionRegistry() +distributed_func_registry.register_functions(functions_map) diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/multi_run_ut.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/multi_run_ut.py index 498102b475f564564d6039a81e305fba3bceec17..1354e2dea17439d89a938ab660b60c9b514d31dc 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/multi_run_ut.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/multi_run_ut.py @@ -50,6 +50,9 @@ def split_json_file(input_file, num_splits, filter_api): backward_data[f"{data_name}.backward"] = backward_data.pop(data_name) input_data = load_json(input_file) + if "dump_data_dir" not in input_data.keys(): + logger.error("Invalid input file, 'dump_data_dir' field is missing") + raise CompareException("Invalid input file, 'dump_data_dir' field is missing") if input_data.get("data") is None: logger.error("Invalid input file, 'data' field is missing") raise CompareException("Invalid input file, 'data' field is missing") @@ -221,7 +224,3 @@ def main(): args = parser.parse_args() config = prepare_config(args) run_parallel_ut(config) - - -if __name__ == '__main__': - main() diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_distributed_check.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_distributed_check.py new file mode 100644 index 0000000000000000000000000000000000000000..54f3790bbc048a9265419a52e18519b77ab25de8 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_distributed_check.py @@ -0,0 +1,254 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
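The registry above maps each supported collective to a pair of functions: a CPU-side benchmark that reconstructs the expected result from every rank's recorded inputs, and a comparison that grades the device output against it. As a sketch of how another collective could be wired in (the `reduce_scatter` name and both stub bodies are hypothetical, not part of this change; a real extension would also add the name to `support_api_list`):

```python
from msprobe.pytorch.api_accuracy_checker.run_ut.distributed_function_registry import \
    distributed_func_registry


def mock_reduce_scatter(api_name, input_args, input_kwargs):
    # Hypothetical benchmark: build each rank's expected slice on CPU.
    return None


def compare_reduce_scatter(device_out, bench_out, **kwargs):
    # Hypothetical comparison: check the local rank's output against its slice.
    return "pass"


distributed_func_registry.register_functions(
    {"reduce_scatter": (mock_reduce_scatter, compare_reduce_scatter)}
)
```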
+ +import argparse +import os +import sys +import time +from collections import namedtuple +import copy + +import torch_npu +import torch.distributed as dist +import torch.multiprocessing as mp + +from msprobe.core.common.const import Const, FileCheckConst, DistributedCheckConst, CompareConst +from msprobe.core.common.file_utils import FileChecker, write_csv, create_directory +from msprobe.core.compare.utils import check_and_return_dir_contents +from msprobe.pytorch.api_accuracy_checker.common.config import CheckerConfig +from msprobe.pytorch.api_accuracy_checker.common.utils import extract_basic_api_segments +from msprobe.pytorch.api_accuracy_checker.run_ut.run_ut_utils import generate_device_params, get_group_info, \ + is_port_in_use +from msprobe.pytorch.api_accuracy_checker.run_ut.run_ut import get_api_info +from msprobe.pytorch.api_accuracy_checker.run_ut.distributed_function_registry import distributed_func_registry +from msprobe.pytorch.common.log import logger +from msprobe.pytorch.common.parse_json import parse_json_info_forward_backward +from msprobe.pytorch.hook_module.api_register import get_api_register +from msprobe.pytorch.pt_config import parse_json_config + + +api_register = get_api_register(return_new=True) +api_register.initialize_hook(None) +distribute_api_key = Const.PT_FRAMEWORK + Const.SEP + Const.PT_API_TYPE_DIST +distributed_func = api_register.ori_api_attr.get(distribute_api_key, {}) + +current_time = time.strftime("%Y%m%d%H%M%S") +RESULT_FILE_NAME = "accuracy_checking_result_" + current_time + ".csv" +RESULT_CSV_HEADER = [['API_NAME', 'RANK', 'COMPARE_RESULT', 'MESSAGE']] +DistributedCheckParams = namedtuple("DistributedCheckParams", ["api_full_name", "all_args", "all_kwargs", + "group_ranks", "result_file_path", "checker_config"]) +special_rank_api_list = [DistributedCheckConst.SCATTER, + DistributedCheckConst.ALL_TO_ALL, + DistributedCheckConst.ALL_TO_ALL_SINGLE] + + +def cleanup(): + dist.destroy_process_group() + + +def distributed_setup(rank, world_size, master_ip, master_port): + init_method = DistributedCheckConst.TCP + Const.COLON + Const.DOUBLE_SLASH + master_ip + Const.COLON + master_port + dist.init_process_group(backend=DistributedCheckConst.HCCL, init_method=init_method, + world_size=world_size, rank=rank) + + +def parse_distributed_api(forward_content): + distributed_api = {} + for api_full_name, api_info_dict in forward_content.items(): + split_name = api_full_name.split(Const.SEP)[0] + if split_name == Const.DISTRIBUTED: + distributed_api.update({api_full_name: api_info_dict}) + return distributed_api + + +def _run_distributed_parser(parser): + parser.add_argument("-api_info", "--api_info_dir", dest="api_info_dir", default="", type=str, + help=" The api param tool result dir: generate from api param tool. 
", + required=True) + parser.add_argument("-o", "--out_path", dest="out_path", default="", type=str, + help=" The ut task result out path.", + required=False) + parser.add_argument("-config", "--config_path", dest="config_path", default="", type=str, + help=" The path of config.json", required=False) + + +def _run_distributed(parser=None): + if parser is None: + parser = argparse.ArgumentParser() + _run_distributed_parser(parser) + args = parser.parse_args(sys.argv[1:]) + run_distributed_command(args) + + +def run_distributed_command(args): + input_checker = FileChecker(args.api_info_dir, FileCheckConst.DIR, ability=FileCheckConst.READ_ABLE) + api_info_dir = input_checker.common_check() + ranks = sorted(check_and_return_dir_contents(api_info_dir, Const.RANK)) + file_paths = [os.path.join(api_info_dir, rank, 'dump.json') for rank in ranks] + forward_contents = [] + real_data_paths = [] + for file_path in file_paths: + forward_content, _, real_data_path = parse_json_info_forward_backward(file_path) + if real_data_path: + dump_path = os.path.dirname(file_path) + real_data_path = os.path.join(dump_path, Const.DUMP_TENSOR_DATA) + distributed_api = parse_distributed_api(forward_content) + forward_contents.append(distributed_api) + real_data_paths.append(real_data_path) + + out_path = args.out_path if args.out_path else Const.DEFAULT_PATH + create_directory(out_path) + out_path_checker = FileChecker(out_path, FileCheckConst.DIR, ability=FileCheckConst.WRITE_ABLE) + out_path = out_path_checker.common_check() + result_file_path = os.path.join(out_path, RESULT_FILE_NAME) + write_csv(RESULT_CSV_HEADER, result_file_path) + if args.config_path: + config_path_checker = FileChecker(args.config_path, FileCheckConst.FILE, + FileCheckConst.READ_ABLE, FileCheckConst.JSON_SUFFIX) + checked_config_path = config_path_checker.common_check() + _, task_config = parse_json_config(checked_config_path, Const.RUN_UT) + checker_config = CheckerConfig(task_config) + else: + checker_config = CheckerConfig() + run_distributed_check(forward_contents, real_data_paths, result_file_path, checker_config) + + +def run_distributed_check(forward_contents, real_data_paths, result_file_path, checker_config): + for rank, forward_content in enumerate(forward_contents): + logger.info("Start to check distributed api in rank {}.".format(rank)) + + for api_full_name, api_info_dict in forward_content.items(): + _, api_name = extract_basic_api_segments(api_full_name) + + if api_name not in distributed_func_registry.support_api_list: + message = "The api {} doesn't support distributed check.".format(api_full_name) + logger.warning(message) + result_rows = [] + df_row = list([api_full_name, rank, CompareConst.SKIP, message]) + result_rows.append(df_row) + write_csv(result_rows, result_file_path) + continue + + if api_info_dict.get('used'): + continue + + group_ranks, group_id = get_group_info(api_full_name, api_name, api_info_dict) + if not group_ranks or not group_id: + logger.warning("The api {} doesn't support distributed check.".format(api_full_name)) + continue + all_args, all_kwargs = get_distributed_args_kwargs(forward_contents, api_full_name, + real_data_paths, group_ranks) + try: + distributed_check_params = DistributedCheckParams(api_full_name, all_args, all_kwargs, group_ranks, + result_file_path, checker_config) + distributed_check(distributed_check_params) + except Exception as e: + logger.error("The api {} in rank {} distributed check failed.".format(api_full_name, rank)) + result_rows = [] + df_row = list([api_full_name, rank, 
CompareConst.ERROR, str(e)]) + result_rows.append(df_row) + write_csv(result_rows, result_file_path) + + +def distributed_check(distributed_check_params): + api_full_name = distributed_check_params.api_full_name + all_args = distributed_check_params.all_args + all_kwargs = distributed_check_params.all_kwargs + group_ranks = distributed_check_params.group_ranks + result_file_path = distributed_check_params.result_file_path + checker_config = distributed_check_params.checker_config + + _, api_name = extract_basic_api_segments(api_full_name) + nprocs = len(group_ranks) + distributed_config = {} + distributed_config[DistributedCheckConst.API_FULL_NAME] = api_full_name + distributed_config[DistributedCheckConst.API_NAME] = api_name + distributed_config[DistributedCheckConst.GROUP_RANKS] = group_ranks + distributed_config[DistributedCheckConst.ALL_ARGS] = all_args + distributed_config[DistributedCheckConst.ALL_KWARGS] = all_kwargs + distributed_config[DistributedCheckConst.RESULT_FILE_PATH] = result_file_path + benchmark_function = distributed_func_registry.get_bench_function(api_name) + distributed_config[DistributedCheckConst.BENCHMARK_RESULT] = benchmark_function(api_name, all_args, all_kwargs) + distributed_config[DistributedCheckConst.MASTER_IP] = checker_config.master_ip + distributed_config[DistributedCheckConst.MASTER_PORT] = checker_config.master_port + distributed_config[DistributedCheckConst.WORLD_SIZE] = nprocs + + if is_port_in_use(checker_config.master_port, checker_config.master_ip): + raise ValueError( + f"Warning: Port {checker_config.master_port} on host " + f"{checker_config.master_ip} is already in use." + ) + logger.info(f"Port {checker_config.master_port} on host {checker_config.master_ip} is available.") + + mp.spawn(run_hccl, + args=(distributed_config,), + nprocs=nprocs) + + +def run_hccl(rank, distributed_config): + local_rank = distributed_config[DistributedCheckConst.GROUP_RANKS][rank] + torch_npu.npu.set_device(local_rank) + world_size = distributed_config[DistributedCheckConst.WORLD_SIZE] + master_ip = distributed_config[DistributedCheckConst.MASTER_IP] + master_port = distributed_config[DistributedCheckConst.MASTER_PORT] + distributed_setup(rank, world_size, master_ip, master_port) + api_full_name = distributed_config[DistributedCheckConst.API_FULL_NAME] + api_name = distributed_config[DistributedCheckConst.API_NAME] + input_args = distributed_config[DistributedCheckConst.ALL_ARGS] + rank_args = input_args[rank] + rank_kwargs = distributed_config[DistributedCheckConst.ALL_KWARGS][rank] + result_file_path = distributed_config[DistributedCheckConst.RESULT_FILE_PATH] + benchmark_result = distributed_config[DistributedCheckConst.BENCHMARK_RESULT] + device_args, _ = generate_device_params(rank_args, rank_kwargs, False, api_name) + logger.info("Start to check distributed api {} in rank {}.".format(api_full_name, local_rank)) + distributed_func.get(api_name)(*device_args) + dist.barrier() + if api_name in special_rank_api_list: + local_rank = rank + kwargs = { + "local_rank": local_rank, + "input_args": input_args + } + compare_function = distributed_func_registry.get_compare_function(api_name) + status = compare_function(device_args, benchmark_result, **kwargs) + message = '' + result_rows = [] + df_row = list([api_full_name, local_rank, status, message]) + result_rows.append(df_row) + write_csv(result_rows, result_file_path) + cleanup() + + +def get_distributed_args_kwargs(forward_contents, api_full_name, real_data_paths, group_ranks): + all_args, all_kwargs = [], [] + 
_, api_name = extract_basic_api_segments(api_full_name) + for group_rank in group_ranks: + target_api_info = forward_contents[group_rank].get(api_full_name) + if not target_api_info: + logger.warning("The api {} doesn't exist in rank {}.".format(api_full_name, group_rank)) + continue + if target_api_info.get('used'): + continue + target_api_info['used'] = True + args, kwargs, _ = get_api_info(target_api_info, api_name, real_data_paths[group_rank]) + all_args.append(args) + all_kwargs.append(kwargs) + return all_args, all_kwargs + + +if __name__ == '__main__': + logger.info("Start to run distributed ut task.") + _run_distributed() + logger.info("End to run distributed ut task.") diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_overflow_check.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_overflow_check.py index 6214d892906bef44d94474c6415674f39099357b..f0490ed62edbaef51d1be10fdb7010a56d174041 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_overflow_check.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_overflow_check.py @@ -161,6 +161,7 @@ def _run_overflow_check(parser=None): _run_overflow_check_parser(parser) args = parser.parse_args(sys.argv[1:]) _run_overflow_check_command(args) + logger.info("UT task completed.") def _run_overflow_check_command(args): @@ -175,8 +176,3 @@ def _run_overflow_check_command(args): logger.error(f"Set NPU device id failed. device id is: {args.device_id}") raise NotImplementedError from error run_overflow_check(api_info) - - -if __name__ == '__main__': - _run_overflow_check() - logger.info("UT task completed.") diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut.py index 905687c1bfc932883396481410c333a7566fd342..0f13f3a4980e0ae8b9f7ff5d0db8004788b29ebd 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut.py @@ -65,6 +65,7 @@ DETAILS_FILE_NAME = "accuracy_checking_details_" + current_time + ".csv" not_backward_list = ['repeat_interleave'] unsupported_backward_list = ['masked_select'] +unsupported_api_list = ["to"] tqdm_params = { @@ -83,6 +84,9 @@ tqdm_params = { } +seed_all() + + def run_ut(config): logger.info("start UT test") if config.online_config.is_online: @@ -218,6 +222,7 @@ def blacklist_and_whitelist_filter(api_name, black_list, white_list): If api is both in black_list and black_list, black_list first. 
return: False for exec api, True for not exec """ + black_list.extend(unsupported_api_list) if black_list and api_name in black_list: return True if white_list and api_name not in white_list: @@ -317,7 +322,8 @@ def run_torch_api_online(api_full_name, api_data, backward_content): if kwargs.get("device"): del kwargs["device"] - device_out = exec_api(api_type, api_name, Const.CUDA_LOWERCASE, args, kwargs) + device_exec_params = ExecParams(api_type, api_name, current_device, args, kwargs, False, None) + device_out = exec_api(device_exec_params) device_out = move2device_exec(device_out, "cpu") return UtDataInfo(None, None, out, device_out, None, in_fwd_data_list, None, rank=api_data.rank) @@ -579,9 +585,8 @@ def run_ut_command(args): } run_ut_config = checker_config.get_run_ut_config(**config_params) run_ut(run_ut_config) + logger.info("UT task completed.") if __name__ == '__main__': - seed_all() _run_ut() - logger.info("UT task completed.") diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut_utils.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut_utils.py index dc0174212e3f8f8cf70fa1701aadc664138dbcdf..cbd75da166b4d7ddaa89e9672fd1442cba87028e 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut_utils.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/run_ut/run_ut_utils.py @@ -1,9 +1,7 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. # -# Licensed under the Apache License, Version 2.0 (the "License"); +# Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # @@ -16,10 +14,11 @@ # limitations under the License. 
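Note how the pieces above fit together for `all_reduce(SUM)`: because the device may accumulate in any order, `reduce_sum` returns a `(min, max)` envelope built from Kahan-compensated sums over several orderings, and `compare_all_reduce` passes the device result only if every element falls inside it (out-of-envelope elements go through `cumulative_check`). A minimal sketch of the envelope test, with a plain CPU sum standing in for the NPU output and hypothetical four-rank inputs:

```python
import torch

from msprobe.pytorch.api_accuracy_checker.run_ut.distributed_bench_function import reduce_sum

# Hypothetical per-rank inputs for an all_reduce(SUM) across four ranks.
tensors = [torch.randn(8, dtype=torch.float32) for _ in range(4)]
min_bound, max_bound = reduce_sum(tensors)

device_out = torch.stack(tensors).sum(dim=0)  # stand-in for the NPU result
inside = ((device_out >= min_bound) & (device_out <= max_bound)).all()
print("within summation-order envelope:", bool(inside))
```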
import os +import socket from collections import namedtuple import re -import torch +import torch try: import torch_npu except ImportError: @@ -29,15 +28,13 @@ else: current_device = "npu" from torch_npu.npu.amp import autocast -from msprobe.core.common.const import FileCheckConst, Const, CompareConst +from msprobe.core.common.const import FileCheckConst, Const, CompareConst, DistributedCheckConst from msprobe.core.common.file_utils import FileChecker from msprobe.core.common.log import logger from msprobe.core.common.utils import CompareException +from msprobe.pytorch.hook_module.api_register import ApiTemplate, get_api_register from msprobe.pytorch.hook_module.wrap_aten import AtenOPTemplate -from msprobe.pytorch.hook_module.wrap_functional import FunctionalOPTemplate -from msprobe.pytorch.hook_module.wrap_npu_custom import NpuOPTemplate -from msprobe.pytorch.hook_module.wrap_tensor import TensorOPTemplate -from msprobe.pytorch.hook_module.wrap_torch import TorchOPTemplate + hf_32_standard_api = ["conv1d", "conv2d"] not_detach_set = {'resize_', 'resize_as_', 'set_', 'transpose_', 't_', 'squeeze_', 'unsqueeze_'} @@ -108,17 +105,30 @@ def exec_api(exec_params): kwargs = exec_params.kwargs is_autocast = exec_params.is_autocast autocast_dtype = exec_params.autocast_dtype - - if api_type == "Functional": - torch_api = FunctionalOPTemplate(api_name, str, False) - if api_type == "Tensor": - torch_api = TensorOPTemplate(api_name, str, False) - if api_type == "Torch": - torch_api = TorchOPTemplate(api_name, str, False) - if api_type == "Aten": + out = None + + prefix_map = Const.API_DATA_PREFIX.get(Const.PT_FRAMEWORK, {}) + if not prefix_map or api_type not in prefix_map.values() or \ + api_type not in ( + Const.FUNCTIONAL_API_TYPE_PREFIX, + Const.TENSOR_API_TYPE_PREFIX, + Const.TORCH_API_TYPE_PREFIX, + Const.ATEN_API_TYPE_PREFIX, + Const.NPU_API_TYPE_PREFIX + ): + return out + + if api_type == Const.ATEN_API_TYPE_PREFIX: torch_api = AtenOPTemplate(api_name, None, False) - if api_type == "NPU": - torch_api = NpuOPTemplate(api_name, None, False, device) + else: + api_register = get_api_register() + api_register.initialize_hook(None) + api_func_type = list(prefix_map.keys())[list(prefix_map.values()).index(api_type)] + api_func = api_register.ori_api_attr.get(Const.PT_FRAMEWORK + Const.SEP + api_func_type, {}).get(api_name) + if api_func is None: + return out + + torch_api = ApiTemplate(api_name, api_func, api_type, None, need_hook=False, device=device) if is_autocast: with autocast(dtype=autocast_dtype): out = torch_api.forward(*args, **kwargs) @@ -225,7 +235,7 @@ def generate_cpu_params(input_args, input_kwargs, need_backward, api_name): origin_dtype = need_raise_dtypes.pop() raise_dtype = PRECISION_MAPPING.get(origin_dtype, torch.float32) autocast_dtype = origin_dtype - + elif len(need_raise_dtypes) >= 2: raise_dtype = torch.float32 need_raise_dtypes.discard(torch.float32) @@ -252,3 +262,65 @@ def is_unsupported_api(api_name, is_overflow_check=False): if flag: logger.info(f"{split_name} api is not supported for run ut. 
SKIP.") return flag + + +def get_args_index(api_name, args_name): + """ + 根据 API 名字和参数名获取参数索引。获取 group_index 或者 src_index。 + :param api_name: API 名字,如 "broadcast" 或 "all_reduce" + :param args_name: 参数名,如 "group" 或 "src" + :return: 参数索引 或 None(如果 API 名字或参数名不存在) + """ + api_info = DistributedCheckConst.API_ARGS_INDEX.get(api_name) + if api_info: + return api_info.get(args_name) + return None + + +def get_distributed_args(api_name, input_args, input_kwargs, args_name): + res = None + res = input_kwargs.get(args_name) + if res: + return res + res_index = get_args_index(api_name, args_name) + if not res_index or len(input_args) <= res_index: + return None + res = input_args[res_index] + return res + + +def get_group_info(api_full_name, api_name, api_info_dict): + input_args = api_info_dict.get('input_args', []) + input_kwargs = api_info_dict.get('input_kwargs', {}) + group = get_distributed_args(api_name, input_args, input_kwargs, DistributedCheckConst.GROUP) + + if not group: + logger.warning("The api {} doesn't have group info.".format(api_full_name)) + return None, None + group_ranks = group.get('group_ranks') + if not group_ranks: + logger.warning("The group of api {} doesn't have group_ranks info.".format(api_full_name)) + return None, None + group_id = group.get('group_id') + if not group_id: + logger.warning("The group of api {} doesn't have group_id info.".format(api_full_name)) + return None, None + return group_ranks, group_id + + +def is_port_in_use(port, host): + """ + 检测指定端口是否被占用。 + :param port: 要检测的端口号 + :param host: 主机地址 + :return: 如果端口被占用返回 True,否则返回 False + """ + if not isinstance(port, str) or not port.isdigit(): + raise Exception(f"port: {port} is invalid. Port must be a numeric string.") + port = int(port) + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: + try: + s.bind((host, port)) + return False # 端口未被占用 + except socket.error: + return True # 端口已被占用 diff --git a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/attl.py b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/attl.py index f31c29c6bb6fa8a863b83bf09d15aba09645436f..f858067b6616ff34ee95e1c4394e63fe4385b397 100644 --- a/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/attl.py +++ b/debug/accuracy_tools/msprobe/pytorch/api_accuracy_checker/tensor_transport_layer/attl.py @@ -27,6 +27,8 @@ from msprobe.pytorch.api_accuracy_checker.tensor_transport_layer.client import T from msprobe.pytorch.api_accuracy_checker.tensor_transport_layer.server import TCPServer from msprobe.core.common.file_utils import remove_path from msprobe.pytorch.common.utils import logger, save_api_data, load_api_data, save_pkl, load_pkl +from msprobe.core.common.utils import recursion_depth_decorator + BufferType = Union[ApiData, Dict[str, Any], str] # Union[Tensor, Tuple[Optional[Tensor]]] @@ -168,11 +170,12 @@ class ATTL: return buffer +@recursion_depth_decorator("move2device_exec") def move2device_exec(obj, device): if isinstance(obj, (tuple, list)): data_list = [move2device_exec(val, device) for val in obj] return data_list if isinstance(obj, list) else tuple(data_list) - if isinstance(obj, dict): + if isinstance(obj, dict): return {key: move2device_exec(val, device) for key, val in obj.items()} elif isinstance(obj, torch.Tensor): obj = obj.detach() diff --git a/debug/accuracy_tools/msprobe/pytorch/common/utils.py b/debug/accuracy_tools/msprobe/pytorch/common/utils.py index 
16067f6d2bee70645bcc337d1809a14f41ae5b96..2191e545287696e9aff9b46f8f60fd3c02159f5e 100644 --- a/debug/accuracy_tools/msprobe/pytorch/common/utils.py +++ b/debug/accuracy_tools/msprobe/pytorch/common/utils.py @@ -1,4 +1,4 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -57,7 +57,7 @@ def parameter_adapter(func): @wraps(func) def inner(self, *args, **kwargs): - if self.op_name_ == "__getitem__" and len(args) > 1 and isinstance(args[1], torch.Tensor): + if self.api_name == "__getitem__" and len(args) > 1 and isinstance(args[1], torch.Tensor): input_tensor = args[0] indices = args[1] if indices.dtype == torch.uint8: @@ -77,7 +77,7 @@ def parameter_adapter(func): else: res = [input_tensor[tensor_index] for tensor_index in indices] return getattr(torch._C._VariableFunctionsClass, "stack")(res, 0) - if self.op_name_ == "__eq__" and len(args) > 1 and args[1] is None: + if self.api_name == "__eq__" and len(args) > 1 and args[1] is None: return False return func(self, *args, **kwargs) @@ -261,6 +261,10 @@ class Const: NPU = 'NPU' DISTRIBUTED = 'Distributed' + HIFLOAT8_TYPE = "torch_npu.HiFloat8Tensor" + FLOAT8_E5M2_TYPE = "torch.float8_e5m2" + FLOAT8_E4M3FN_TYPE = "torch.float8_e4m3fn" + RAISE_PRECISION = { torch.float16: torch.float32, torch.bfloat16: torch.float32, @@ -449,9 +453,9 @@ def is_recomputation(): def check_save_param(variable, name, save_backward): # try catch this api to skip invalid call - if not isinstance(variable, (list, dict, torch.Tensor, int, float, str)): + if not isinstance(variable, (list, dict, tuple, torch.Tensor, int, float, str)): logger.warning("PrecisionDebugger.save variable type not valid, " - "should be one of list, dict, torch.Tensor, int, float or string. " + "should be one of list, dict, tuple, torch.Tensor, int, float or string. " "Skip current save process.") raise ValueError if not isinstance(name, str): @@ -473,3 +477,15 @@ def replace_last_occurrence(text, old, new): if index != -1: return text[:index] + text[index:].replace(old, new, 1) return text + + +def is_hifloat8_tensor(tensor): + if not is_gpu and hasattr(torch_npu, "HiFloat8Tensor") and isinstance(tensor, torch_npu.HiFloat8Tensor): + return True + return False + + +def is_float8_tensor(tensor): + if str(tensor.dtype) in [Const.FLOAT8_E5M2_TYPE, Const.FLOAT8_E4M3FN_TYPE]: + return True + return is_hifloat8_tensor(tensor) diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/__init__.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7d60f07881d378bb0a7a9c6faf6147af07a915b2 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/__init__.py @@ -0,0 +1,16 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
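`is_float8_tensor` deliberately compares the dtype's string form so the check degrades gracefully on torch builds that lack the float8 dtypes, and `is_hifloat8_tensor` guards the torch_npu-specific type the same way. A standalone sketch of the string-matching idea:

```python
import torch

FLOAT8_TYPES = ("torch.float8_e5m2", "torch.float8_e4m3fn")


def is_float8(tensor: torch.Tensor) -> bool:
    # Comparing str(dtype) avoids referencing torch.float8_* attributes,
    # which do not exist on older torch builds.
    return str(tensor.dtype) in FLOAT8_TYPES


print(is_float8(torch.zeros(2)))  # False: default float32
if hasattr(torch, "float8_e5m2"):
    print(is_float8(torch.zeros(2, dtype=torch.float8_e5m2)))  # True
```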
+ +import msprobe.pytorch.config_checking.checkers diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/__init__.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bc698ff7ceedccec3715b182b3a4980fd02c5ab2 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/__init__.py @@ -0,0 +1,26 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +__all__ = ['BaseChecker', ] + +import msprobe.pytorch.config_checking.checkers.env_args_checker +import msprobe.pytorch.config_checking.checkers.pip_checker +import msprobe.pytorch.config_checking.checkers.checkpoint_checker +import msprobe.pytorch.config_checking.checkers.dataset_checker +import msprobe.pytorch.config_checking.checkers.weights_checker +import msprobe.pytorch.config_checking.checkers.hyperparameter_checker + + +from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/base_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/base_checker.py new file mode 100644 index 0000000000000000000000000000000000000000..fb6c36938cc0984ca761e2ae9b2bf7b0cde847dc --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/base_checker.py @@ -0,0 +1,66 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
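Each module imported in `checkers/__init__.py` registers itself through the `@register_checker_item` decorator, so the checker set is assembled purely by import side effects. A schematic new checker following the same contract (the "demo" key, file names, and stub bodies are illustrative only):

```python
from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker
from msprobe.pytorch.config_checking.config_checker import register_checker_item


@register_checker_item("demo")
class DemoChecker(BaseChecker):
    input_needed = "need_env_args"        # PackInput attribute that enables this checker
    target_name_in_zip = "demo_data.txt"  # file written into the packed zip
    result_filename = "demo_check_result.txt"

    @staticmethod
    def pack(pack_input):
        pass  # collect data and create target_name_in_zip in pack_input.output_zip_path

    @staticmethod
    def compare(bench_dir, cmp_dir, output_path):
        pass  # diff the two unpacked files and write result_filename under output_path
```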
+
+import os
+from abc import ABC, abstractmethod
+
+from msprobe.core.common.const import FileCheckConst
+
+
+class PackInput:
+
+    def __init__(self, config_dict=None, model=None):
+        self.code_path = config_dict.get("code path", None)
+        self.ckpt_path = config_dict.get("ckpt path", None)
+        self.need_env_args = config_dict.get("env args", None)
+        self.need_pip_data = config_dict.get("pip data", None)
+        self.output_zip_path = config_dict.get("output zip path", "./config_check_pack.zip")
+        self.shell_path = config_dict.get("shell path", None)
+        self.model = model
+        self.check_input_params()
+
+    def check_input_params(self):
+        if self.code_path is not None:
+            if not isinstance(self.code_path, str):
+                raise Exception("code_path must be a string")
+        if self.ckpt_path is not None:
+            if not isinstance(self.ckpt_path, str):
+                raise Exception("ckpt_path must be a string")
+        if not isinstance(self.output_zip_path, str) or not self.output_zip_path.endswith(FileCheckConst.ZIP_SUFFIX):
+            raise Exception("output zip path must be a string ending with '.zip'")
+
+
+class BaseChecker(ABC):
+    input_needed = None
+    target_name_in_zip = None
+    multi_rank = False
+
+    @staticmethod
+    @abstractmethod
+    def pack(pack_input):
+        pass
+
+    @staticmethod
+    @abstractmethod
+    def compare(bench_dir, cmp_dir, output_path):
+        pass
+
+    @classmethod
+    def compare_ex(cls, bench_dir, cmp_dir, output_path):
+        bench_filepath = os.path.join(bench_dir, cls.target_name_in_zip)
+        cmp_filepath = os.path.join(cmp_dir, cls.target_name_in_zip)
+        if not os.path.exists(bench_filepath) or not os.path.exists(cmp_filepath):
+            return
+        cls.compare(bench_dir, cmp_dir, output_path)
diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/checkpoint_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/checkpoint_checker.py
new file mode 100644
index 0000000000000000000000000000000000000000..f3208acbb05226e11dee7ac32ab6910ba344fd2e
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/checkpoint_checker.py
@@ -0,0 +1,69 @@
+# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
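`PackInput` pulls loosely named keys out of a plain config dict and validates only the paths it was given; a sketch of the expected shape (all paths below are placeholders):

```python
from msprobe.pytorch.config_checking.checkers.base_checker import PackInput

config = {
    "env args": True,                              # enable the environment checker
    "pip data": True,                              # enable the pip package checker
    "ckpt path": "./model.ckpt",                   # placeholder checkpoint path
    "output zip path": "./config_check_pack.zip",  # must be a .zip path, per the check above
}
pack_input = PackInput(config_dict=config, model=None)
print(pack_input.output_zip_path)
```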
+ +import json +import os + +import torch + +from msprobe.core.common.file_utils import load_json, create_file_with_list, create_file_in_zip +from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker +from msprobe.pytorch.config_checking.config_checker import register_checker_item +from msprobe.pytorch.config_checking.utils.utils import config_checking_print, compare_dict, bytes_hash + + +def tensor_to_hash(tensor): + tensor_bytes = tensor.cpu().numpy().tobytes() + return bytes_hash(tensor_bytes) + + +def tensor_in_state_dict_to_hash(state_dict): + result = {} + for key, value in state_dict.items(): + result[key] = tensor_to_hash(value) + return result + + +def load_ckpt_file_in_zip(dirpath): + ckpt_filepath = os.path.join(dirpath, CheckpointChecker.target_name_in_zip) + ckpt_dict = load_json(ckpt_filepath) + return ckpt_dict + + +@register_checker_item("ckpt") +class CheckpointChecker(BaseChecker): + input_needed = "ckpt_path" + + target_name_in_zip = "ckpt_data.txt" + result_filename = "ckpt_data_check_result.txt" + + @staticmethod + def pack(pack_input): + output_zip_path = pack_input.output_zip_path + ckpt_data = torch.load(pack_input.ckpt_path) + if not isinstance(ckpt_data, dict): + raise Exception(f"{pack_input.ckpt_path} is not state dict") + result = tensor_in_state_dict_to_hash(ckpt_data) + create_file_in_zip(output_zip_path, CheckpointChecker.target_name_in_zip, json.dumps(result, indent=4)) + config_checking_print(f"add ckpt info to zip") + + @staticmethod + def compare(bench_dir, cmp_dir, output_path): + bench_ckpt_data = load_ckpt_file_in_zip(bench_dir) + cmp_ckpt_data = load_ckpt_file_in_zip(cmp_dir) + result = compare_dict(bench_ckpt_data, cmp_ckpt_data) + output_filepath = os.path.join(output_path, CheckpointChecker.result_filename) + create_file_with_list(result, output_filepath) + diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/dataset_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/dataset_checker.py new file mode 100644 index 0000000000000000000000000000000000000000..0426f1db9c03490739b06dd44930f994c1afa8cc --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/dataset_checker.py @@ -0,0 +1,73 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
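The checkpoint checker never ships weights: it packs one digest per state-dict entry and compares digests between runs. The same idea in a self-contained form (using hashlib directly, where the checker itself goes through the `bytes_hash` utility):

```python
import hashlib

import torch


def tensor_digest(tensor: torch.Tensor) -> str:
    # Hash the raw buffer; tensors with identical bytes hash identically.
    return hashlib.sha256(tensor.cpu().numpy().tobytes()).hexdigest()


state_dict = {"layer.weight": torch.ones(2, 2), "layer.bias": torch.zeros(2)}
hashed = {name: tensor_digest(t) for name, t in state_dict.items()}
print(hashed["layer.weight"][:16])
```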
+ +import os +import json +import torch + +from msprobe.core.common.file_utils import create_file_with_list, create_file_with_content, create_file_in_zip +from msprobe.pytorch.common.utils import get_rank_id +from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker +from msprobe.pytorch.config_checking.config_checker import register_checker_item, register_pre_forward_fun_list +from msprobe.pytorch.config_checking.utils.utils import config_checking_print, \ + get_tensor_features, read_rank_result_to_dict, compare_dicts + + +def get_tuple_input_features(tup): + """Compute the features value of a tuple""" + result = {} + for i, x in enumerate(tup): + if isinstance(x, torch.Tensor): + result[f"input{i}"] = get_tensor_features(x) + else: + result[f"input{i}"] = str(x) + return result + + +@register_checker_item("dataset") +class DatasetChecker(BaseChecker): + input_needed = "model" + multi_rank = True + + target_name_in_zip = "dataset" + result_filename = "dataset_check_result.txt" + + @staticmethod + def pack(pack_input): + output_zip_path = pack_input.output_zip_path + + def collect_input(model, data_input): + if isinstance(data_input, torch.Tensor): + features = {"input0": get_tensor_features(data_input)} + elif isinstance(data_input, tuple): + features = get_tuple_input_features(data_input) + else: + raise ValueError("Unsupported input type when pack dataset") + + dataset_filepath = os.path.join(DatasetChecker.target_name_in_zip, f"rank{get_rank_id()}.json") + create_file_in_zip(output_zip_path, dataset_filepath, json.dumps(features, indent=4)) + config_checking_print(f"add first dataset input features to zip") + + register_pre_forward_fun_list(collect_input) + + @staticmethod + def compare(bench_dir, cmp_dir, output_path): + bench_dataset_pack_path = os.path.join(bench_dir, DatasetChecker.target_name_in_zip) + cmp_dataset_pack_path = os.path.join(cmp_dir, DatasetChecker.target_name_in_zip) + bench_dataset = read_rank_result_to_dict(bench_dataset_pack_path) + cmp_dataset = read_rank_result_to_dict(cmp_dataset_pack_path) + deleted, added, changed, result = compare_dicts(bench_dataset, cmp_dataset) + output_filepath = os.path.join(output_path, DatasetChecker.result_filename) + create_file_with_content(json.dumps(result, indent=4), output_filepath) diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/env_args_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/env_args_checker.py new file mode 100644 index 0000000000000000000000000000000000000000..668e4815c3a5849dcf38de4e21ac9a03c211d901 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/env_args_checker.py @@ -0,0 +1,77 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
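The dataset checker captures summary features of each rank's first batch rather than the batch itself. `get_tensor_features` lives in `utils/utils.py` and is not shown in this diff; a plausible stand-in illustrating the intent (the field names here are assumptions):

```python
import torch


def tensor_features(t: torch.Tensor) -> dict:
    # Lightweight statistics are usually enough to tell two runs'
    # first batches apart without storing the data itself.
    return {
        "max": float(t.max()),
        "min": float(t.min()),
        "mean": float(t.float().mean()),
        "norm": float(t.float().norm()),
    }


print(tensor_features(torch.randn(4, 3)))
```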
+ +import os +import json + +from msprobe.core.common.file_utils import load_json, load_yaml, create_file_with_content, create_file_in_zip +from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker +from msprobe.pytorch.config_checking.config_checker import register_checker_item +from msprobe.pytorch.config_checking.utils.utils import config_checking_print + + +dirpath = os.path.dirname(__file__) +env_yaml_path = os.path.join(dirpath, "../resource/env.yaml") + + +def collect_env_data(): + result = {} + for key, value in os.environ.items(): + result[key] = value + return result + + +def compare_env_data(npu_path, bench_path): + error_message = "" + warning_message = "" + necessary_env = load_yaml(env_yaml_path) + npu_data = load_json(npu_path) + bench_data = load_json(bench_path) + for _, value in necessary_env.items(): + npu_env_name = value[0]["name"] + npu_value = npu_data.get(npu_env_name) if npu_data.get(npu_env_name) else value[0]["default_value"] + if len(value) == 1: + warning_message += f"only npu has this env, npu_env_name:{npu_env_name}, npu_value:{npu_value}\n" + continue + bench_env_name = value[1]["name"] + bench_value = bench_data.get(bench_env_name) if bench_data.get(bench_env_name) else value[1]["default_value"] + if npu_value != bench_value: + error_message += (f"npu_env_name:{npu_env_name}, npu_value:{npu_value}, bench_env_name:{bench_env_name}, " + f"bench_value:{bench_value}\n") + return error_message, warning_message + + +@register_checker_item("env") +class EnvArgsChecker(BaseChecker): + input_needed = "need_env_args" + + target_name_in_zip = "env_data.txt" + result_filename = "env_args_check_result.txt" + + @staticmethod + def pack(pack_input): + output_zip_path = pack_input.output_zip_path + env_args_dict = collect_env_data() + create_file_in_zip(output_zip_path, EnvArgsChecker.target_name_in_zip, json.dumps(env_args_dict, indent=4)) + config_checking_print(f"add env args to zip") + + @staticmethod + def compare(bench_dir, cmp_dir, output_path): + bench_env_data = os.path.join(bench_dir, EnvArgsChecker.target_name_in_zip) + cmp_env_data = os.path.join(cmp_dir, EnvArgsChecker.target_name_in_zip) + env_error_message, env_warning_message = compare_env_data(bench_env_data, cmp_env_data) + output_filepath = os.path.join(output_path, EnvArgsChecker.result_filename) + result = f"-env_error_message:\n{env_error_message}\n-env_warning_message:\n{env_warning_message}" + create_file_with_content(result, output_filepath) diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/hyperparameter_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/hyperparameter_checker.py new file mode 100644 index 0000000000000000000000000000000000000000..dd0ae266855d8dfbf62770f4c4f4e381f8df07ff --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/hyperparameter_checker.py @@ -0,0 +1,218 @@ +# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import json
+import re
+from difflib import SequenceMatcher
+
+from typing import Union, Dict, Any
+
+from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker
+from msprobe.pytorch.config_checking.config_checker import register_checker_item
+from msprobe.pytorch.config_checking.utils.utils import compare_dict, config_checking_print
+from msprobe.core.common.file_utils import (os_walk_for_files, create_file_in_zip, load_json, create_file_with_list,
+                                            FileOpen)
+from msprobe.core.common.const import FileCheckConst, Const
+
+
+@register_checker_item("hyperparameter")
+class HyperparameterChecker(BaseChecker):
+    input_needed = "shell_path"
+    target_name_in_zip = "hyperparameters"
+    result_filename = "hyperparameter_diff.txt"
+
+    PARAMETER_NAME_MAPPING = {
+        "learning_rate": ["lr", "learningrate"],
+        "batch_size": ["batch", "bs", "batch_size_per_gpu"],
+        "epochs": ["num_epochs", "max_epochs", "epoch"],
+        "weight_decay": ["wd", "weightdecay"],
+        "dropout_rate": ["dropout", "drop_rate"],
+    }
+
+    @staticmethod
+    def pack(pack_input):
+        shell_path = pack_input.shell_path
+        output_zip_path = pack_input.output_zip_path
+
+        if not isinstance(shell_path, list):
+            raise TypeError("shell_path should be a list of file paths.")
+
+        for index, script_path in enumerate(shell_path):
+            if os.path.isfile(script_path):
+                hyperparameters = HyperparameterChecker._extract_hyperparameters_from_script(script_path)
+                if hyperparameters:
+                    create_file_in_zip(output_zip_path, os.path.join(HyperparameterChecker.target_name_in_zip,
+                                                                     HyperparameterChecker.target_name_in_zip +
+                                                                     Const.REPLACEMENT_CHARACTER + str(index) +
+                                                                     FileCheckConst.JSON_SUFFIX),
+                                       json.dumps(hyperparameters, indent=4))
+                    config_checking_print("add hyperparameters to zip")
+                else:
+                    config_checking_print(f"Warning: Failed to extract hyperparameters from script {script_path}")
+            else:
+                config_checking_print(f"Warning: Script path {script_path} is not a file.")
+
+    @staticmethod
+    def compare(bench_dir, cmp_dir, output_path):
+        bench_model_dir = os.path.join(bench_dir, HyperparameterChecker.target_name_in_zip)
+        cmp_model_dir = os.path.join(cmp_dir, HyperparameterChecker.target_name_in_zip)
+        output_filepath = os.path.join(output_path, HyperparameterChecker.result_filename)
+        bench_hyperparameters = HyperparameterChecker.load_hyperparameters(bench_model_dir)
+        cmp_hyperparameters = HyperparameterChecker.load_hyperparameters(cmp_model_dir)
+
+        if len(bench_hyperparameters) != len(cmp_hyperparameters):
+            config_checking_print("The number of shell script paths does not match!")
+            raise Exception("The number of shell script paths does not match!")
+
+        all_diffs = []
+        all_files = set(bench_hyperparameters.keys()) | set(cmp_hyperparameters.keys())
+
+        for filename in all_files:
+            bench_params = bench_hyperparameters.get(filename, {})
+            cmp_params = cmp_hyperparameters.get(filename, {})
+
+            if bench_params and cmp_params:
+                all_diffs.extend(HyperparameterChecker.compare_param(bench_params, cmp_params, filename))
+
+            elif bench_params:
+                all_diffs.append(f"[Only in benchmark] File: {filename}")
+            else:
+                all_diffs.append(f"[Only in compare] File: {filename}")
+        create_file_with_list(all_diffs, output_filepath)
+
+    @staticmethod
+    def compare_param(bench_params, cmp_params, filename):
+        all_diffs = []
+        file_diffs = []
+        bench_param_names = bench_params.keys()
+        for bench_param_name in bench_param_names:
+            matched_cmp_param_name = HyperparameterChecker._fuzzy_match_parameter(bench_param_name, cmp_params)
+            if matched_cmp_param_name:
+                bench_param_value = bench_params[bench_param_name]
+                cmp_param_value = cmp_params[matched_cmp_param_name]
+                if bench_param_value != cmp_param_value:
+                    diff = compare_dict({bench_param_name: bench_param_value},
+                                        {matched_cmp_param_name: cmp_param_value})
+                    if diff:
+                        file_diffs.extend(
+                            [f"  Parameter '{bench_param_name}' (matched with '{matched_cmp_param_name}'): {d}"
+                             for d in diff])
+                del cmp_params[matched_cmp_param_name]
+            else:
+                file_diffs.append(
+                    f"  [Only in benchmark] Parameter: '{bench_param_name}': {bench_params[bench_param_name]}")
+        for cmp_param_name, cmp_param_value in cmp_params.items():
+            file_diffs.append(f"  [Only in compare] Parameter: '{cmp_param_name}': {cmp_param_value}")
+        if file_diffs:
+            file_diffs.sort()
+            all_diffs.append(f"File: {filename}")
+            all_diffs.extend(file_diffs)
+        return all_diffs
+
+    @staticmethod
+    def load_hyperparameters(model_dir):
+        hyperparameters = {}
+        if not os.path.exists(model_dir):
+            return hyperparameters
+        subfiles = os_walk_for_files(model_dir, Const.MAX_TRAVERSAL_DEPTH)
+        for files in subfiles:
+            if files["file"].endswith(FileCheckConst.JSON_SUFFIX):
+                filepath = os.path.join(files["root"], files["file"])
+                relative_filepath = os.path.relpath(filepath, model_dir)
+                params = load_json(filepath)
+                if params:
+                    hyperparameters[relative_filepath] = params
+        return hyperparameters
+
+    @staticmethod
+    def _extract_hyperparameters_from_script(script_path: str) -> Dict[str, Any]:
+        """
+        Extract arguments from a bash script used to launch model training.
+        """
+        hyperparameters = {}
+        script_content_list = []
+        with FileOpen(script_path, 'r') as file:
+            for line in file:
+                stripped_line = line.lstrip()
+                if not stripped_line.startswith('#'):
+                    line = line.split('#')[0].rstrip() + '\n'
+                    if line.strip():
+                        script_content_list.append(line)
+        script_content = ''.join(script_content_list)
+
+        command_line = re.search(r'torchrun\s[^|]*|python -m torch.distributed.launch\s[^|]*', script_content,
+                                 re.DOTALL)
+        if command_line:
+            command_line = command_line.group()
+
+            blocks = re.findall(r'([a-zA-Z0-9_]{1,20}_ARGS)="(.*?)"', script_content, re.DOTALL)
+            for block_name, block_content in blocks:
+                block_content = block_content.replace('\n', ' ')
+                command_line = command_line.replace(f"${block_name}", block_content)
+
+            matches = re.findall(r'--([\w-]+)(?:\s+([^\s\\]+))?', command_line)
+            for match in matches:
+                key, value = match
+                args_key = re.match(r'\$\{?(\w+)}?', value)
+                if args_key:
+                    env_vars = re.findall(rf'{args_key.group(1)}=\s*(.+)', script_content)
+                    if env_vars:
+                        value = env_vars[-1]
+                hyperparameters[key] = value if value else True
+
+        return hyperparameters
+
+    @staticmethod
+    def _fuzzy_match_parameter(param_name: str, available_params: Dict[str, Any]) -> Union[str, None]:
+        """
+        Fuzzy-match a parameter name against available parameter names using predefined
+        mappings and string similarity.
+ """ + if param_name in available_params: + return param_name + + canonical_name = None + for standard_name, aliases in HyperparameterChecker.PARAMETER_NAME_MAPPING.items(): + if param_name == standard_name or param_name in aliases: + canonical_name = standard_name + break + + if canonical_name: + if canonical_name in available_params: + return canonical_name + for alias in HyperparameterChecker.PARAMETER_NAME_MAPPING[canonical_name]: + if alias in available_params: + config_checking_print( + f"Matched '{param_name}' to alias '{alias}' via canonical name '{canonical_name}'") + return alias + + best_match_name = None + best_match_ratio = 0.8 + for available_param_name in available_params: + ratio = SequenceMatcher(None, param_name.lower(), available_param_name.lower()).ratio() + if ratio > best_match_ratio: + best_match_ratio = ratio + best_match_name = available_param_name + + if best_match_name: + config_checking_print( + f"Fuzzy matched parameter '{param_name}' to '{best_match_name}' (similarity: {best_match_ratio:.2f})") + return best_match_name + + return None diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/pip_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/pip_checker.py new file mode 100644 index 0000000000000000000000000000000000000000..7d122028ae85a982a515c222ed3e6281a934a72f --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/pip_checker.py @@ -0,0 +1,94 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
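+
+# Illustrative sketch (not part of the original module): collect_pip_data()
+# below serialises the installed distributions as one "name=version" line
+# each, for example (versions are made up):
+#
+#     torch=2.1.0
+#     numpy=1.26.4
+#
+# compare_pip_data() then diffs two such files; version mismatches in the
+# packages listed in resource/dependency.yaml are reported as errors, all
+# other mismatches only as warnings.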
+ +import os +import re +try: + import importlib.metadata as metadata +except ImportError: + import importlib_metadata as metadata + +from msprobe.core.common.file_utils import load_yaml, create_file_with_content, create_file_in_zip +from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker +from msprobe.pytorch.config_checking.utils.utils import merge_keys +from msprobe.pytorch.config_checking.config_checker import register_checker_item +from msprobe.pytorch.config_checking.utils.utils import config_checking_print +from msprobe.core.common.file_utils import FileOpen + +dirpath = os.path.dirname(__file__) +depend_path = os.path.join(dirpath, "../resource/dependency.yaml") + + +def load_pip_txt(file_path): + output_dir = {} + with FileOpen(file_path, 'r', encoding='utf-8') as file: + lines = file.readlines() + for line in lines: + info_list = line.strip().split("=") + output_dir[info_list[0]] = "" if len(info_list) != 2 else info_list[1] + return output_dir + + +def collect_pip_data(): + result = "" + packages = metadata.distributions() + for pkg in packages: + result += f"{pkg.metadata['Name']}={pkg.version}\n" + return result + + +def compare_pip_data(npu_path, bench_path): + error_message = "" + warning_message = "" + necessary_dependency = load_yaml(depend_path)["dependency"] + npu_data = load_pip_txt(npu_path) + bench_data = load_pip_txt(bench_path) + key_list = merge_keys(npu_data, bench_data) + for package in key_list: + if package in necessary_dependency: + if npu_data.get(package) and bench_data.get(package) and npu_data.get(package) == bench_data.get(package): + continue + error_message += (f"package_name:{package}, npu_version:{npu_data.get(package)}, " + f"bench_version:{bench_data.get(package)}\n") + else: + if npu_data.get(package) and bench_data.get(package) and npu_data.get(package) == bench_data.get(package): + continue + warning_message += (f"package_name:{package}, npu_version:{npu_data.get(package)}, " + f"bench_version:{bench_data.get(package)}\n") + return error_message, warning_message + + +@register_checker_item("pip") +class PipPackageChecker(BaseChecker): + input_needed = "need_pip_data" + + target_name_in_zip = "pip_data.txt" + result_filename = "pip_data_check_result.txt" + + @staticmethod + def pack(pack_input): + output_zip_path = pack_input.output_zip_path + pip_data = collect_pip_data() + create_file_in_zip(output_zip_path, PipPackageChecker.target_name_in_zip, pip_data) + config_checking_print(f"add pip info to zip") + + @staticmethod + def compare(bench_dir, cmp_dir, output_path): + bench_pip_data = os.path.join(bench_dir, PipPackageChecker.target_name_in_zip) + cmp_pip_data = os.path.join(cmp_dir, PipPackageChecker.target_name_in_zip) + pip_error_message, pip_warning_message = compare_pip_data(bench_pip_data, cmp_pip_data) + output_filepath = os.path.join(output_path, PipPackageChecker.result_filename) + result = f"-pip_error_message:\n {pip_error_message}\n-pip_warning_message:\n {pip_warning_message}" + create_file_with_content(result, output_filepath) diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/weights_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/weights_checker.py new file mode 100644 index 0000000000000000000000000000000000000000..0275eacf3629618072e3875873e18cf536ef5b09 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/checkers/weights_checker.py @@ -0,0 +1,77 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. 
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import json
+import torch
+
+from msprobe.core.common.file_utils import create_file_with_list, create_file_with_content, load_yaml, load_json, \
+    create_file_in_zip
+from msprobe.pytorch.common.utils import get_rank_id
+from msprobe.pytorch.config_checking.checkers.base_checker import BaseChecker
+from msprobe.pytorch.config_checking.config_checker import register_checker_item, register_pre_forward_fun_list
+from msprobe.pytorch.config_checking.utils.utils import (merge_keys, config_checking_print,
+                                                         get_tensor_features,
+                                                         read_rank_result_to_dict, compare_dicts)
+
+
+def collect_weights_data(model):
+    weights_data = {}
+    for name, param in model.named_parameters():
+        if param.dtype == torch.bfloat16:
+            param = param.float()
+        weights_data[name] = get_tensor_features(param)
+    return weights_data
+
+
+def compare_weights_data(npu_path, bench_path):
+    npu_data = load_json(npu_path)
+    bench_data = load_json(bench_path)
+    key_list = merge_keys(npu_data, bench_data)
+    result = "The weights of the following tensors are not the same:\n"
+    for tensor_name in key_list:
+        if npu_data.get(tensor_name) != bench_data.get(tensor_name):
+            result += f"{tensor_name}\n"
+    return result
+
+
+@register_checker_item("weights")
+class WeightsChecker(BaseChecker):
+    input_needed = "model"
+    multi_rank = True
+
+    target_name_in_zip = "weights"
+    result_filename = "weights_data_check_result.txt"
+
+    @staticmethod
+    def pack(pack_input):
+        output_zip_path = pack_input.output_zip_path
+
+        def collect_weights(model, data_input):
+            weights_data_dict = collect_weights_data(model)
+            weights_data_filepath = os.path.join(WeightsChecker.target_name_in_zip, f"rank{get_rank_id()}.json")
+            create_file_in_zip(output_zip_path, weights_data_filepath, json.dumps(weights_data_dict, indent=4))
+            config_checking_print("add weights info to zip")
+        register_pre_forward_fun_list(collect_weights)
+
+    @staticmethod
+    def compare(bench_dir, cmp_dir, output_path):
+        bench_weight_pack_path = os.path.join(bench_dir, WeightsChecker.target_name_in_zip)
+        cmp_weight_pack_path = os.path.join(cmp_dir, WeightsChecker.target_name_in_zip)
+        bench_weight = read_rank_result_to_dict(bench_weight_pack_path)
+        cmp_weight = read_rank_result_to_dict(cmp_weight_pack_path)
+        deleted, added, changed, result = compare_dicts(bench_weight, cmp_weight)
+        output_filepath = os.path.join(output_path, WeightsChecker.result_filename)
+        create_file_with_content(json.dumps(result, indent=4), output_filepath)
diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/config_checker.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/config_checker.py
new file mode 100644
index 0000000000000000000000000000000000000000..619c4687e74771a1fd1cd05aba68a2666231b8d4
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/config_checker.py
@@ -0,0 +1,97 @@
+# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import shutil
+
+import torch
+import torch.distributed as dist
+
+from msprobe.core.common.file_utils import load_json, split_zip_file_path, create_directory, extract_zip, make_dir
+from msprobe.pytorch.config_checking.checkers.base_checker import PackInput
+from msprobe.pytorch.config_checking.utils.utils import config_checking_print
+
+
+class ConfigChecker:
+    checkers = {}
+    pre_forward_fun_list = []
+
+    def __init__(self, config_filepath, model=None):
+        if model and not isinstance(model, torch.nn.Module):
+            raise Exception(f"{model} is not a torch.nn.Module")
+        config_dict = load_json(config_filepath)
+        self.pack_input = PackInput(config_dict, model)
+        file_path, file_name = split_zip_file_path(self.pack_input.output_zip_path)
+        if not os.path.exists(file_path):
+            create_directory(file_path)
+        elif os.path.exists(self.pack_input.output_zip_path):
+            raise Exception("The output file path already exists!")
+        self.pack()
+
+    @staticmethod
+    def compare(bench_zip_path, cmp_zip_path, outpath):
+        if os.path.exists(outpath):
+            shutil.rmtree(outpath)
+        bench_dir = os.path.join(outpath, "bench")
+        cmp_dir = os.path.join(outpath, "cmp")
+        extract_zip(bench_zip_path, bench_dir)
+        config_checking_print(f"extract zip file {bench_zip_path} to {bench_dir}")
+        extract_zip(cmp_zip_path, cmp_dir)
+        config_checking_print(f"extract zip file {cmp_zip_path} to {cmp_dir}")
+
+        output_dir = os.path.join(outpath, "output")
+        make_dir(output_dir)
+        for checker in ConfigChecker.checkers.values():
+            checker.compare_ex(bench_dir, cmp_dir, output_dir)
+        config_checking_print(f"config checking result saved to {os.path.realpath(output_dir)}")
+
+    def pack(self):
+        config_checking_print(f"pack result zip path {os.path.realpath(self.pack_input.output_zip_path)}")
+        if dist.is_initialized() and dist.get_rank() == 0:
+            config_checking_print(f"pack result zip path {self.pack_input.output_zip_path}")
+            if os.path.exists(self.pack_input.output_zip_path):
+                os.remove(self.pack_input.output_zip_path)
+
+        def hook(*args, **kwargs):
+            for collect_func in self.pre_forward_fun_list:
+                collect_func(*args, **kwargs)
+            config_checking_print("!!! Exit at beginning of first step. Don't worry, it's as expected. !!!")
+            raise Exception("!!! Exit at beginning of first step. Don't worry, it's as expected. 
!!!") + + if self.pack_input.model: + self.pack_input.model.register_forward_pre_hook(hook) + for checker in ConfigChecker.checkers.values(): + if checker.input_needed and not getattr(self.pack_input, checker.input_needed): + continue + if dist.is_initialized() and dist.get_rank() != 0 and not checker.multi_rank: + continue + checker.pack(self.pack_input) + + +def register_checker_item(key, cls=None): + if cls is None: + # 无参数时,返回装饰器函数 + return lambda cls: register_checker_item(key, cls) + ConfigChecker.checkers[key] = cls + return cls + + +def register_pre_forward_fun_list(func): + ConfigChecker.pre_forward_fun_list.append(func) diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_vf.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/config_checking.py similarity index 34% rename from debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_vf.py rename to debug/accuracy_tools/msprobe/pytorch/config_checking/config_checking.py index 05ee3bc92257be9882c20cf825ebb7561f41ddb1..414bc5ae6e1a1c0142502e5d7f8f0b88fb66895b 100644 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_vf.py +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/config_checking.py @@ -13,48 +13,30 @@ # See the License for the specific language governing permissions and # limitations under the License. -import os -import torch +from msprobe.pytorch.config_checking.config_checker import ConfigChecker +from msprobe.pytorch.common.log import logger -from msprobe.core.common.const import Const -from msprobe.core.common.file_utils import load_yaml -from msprobe.pytorch.hook_module.hook_module import HOOKModule -from msprobe.pytorch.common.utils import torch_device_guard +def pack(config_filepath): + ConfigChecker(config_filepath) -cur_path = os.path.dirname(os.path.realpath(__file__)) -yaml_path = os.path.join(cur_path, "support_wrap_ops.yaml") +def compare(bench_zip_path, cmp_zip_path, outpath): + ConfigChecker.compare(bench_zip_path, cmp_zip_path, outpath) -def get_vf_ops(): - yaml_data = load_yaml(yaml_path) - wrap_vf_ops = yaml_data.get('_VF') - return wrap_vf_ops +def _config_checking_parser(parser): + parser.add_argument('-pack', '--pack', help='Pack a directory into a zip file') + parser.add_argument('-c', '--compare', nargs=2, help='Compare two zip files') + parser.add_argument('-o', '--output', help='output path, default is current directory') -class HOOKVfOP(object): - pass - -class VfOPTemplate(HOOKModule): - def __init__(self, op_name, hook): - self.op_name_ = op_name - self.prefix_op_name_ = "VF" + Const.SEP + str(op_name) + Const.SEP - super().__init__(hook) - - @torch_device_guard - def forward(self, *args, **kwargs): - return getattr(torch._C._VariableFunctionsClass, str(self.op_name_))(*args, **kwargs) - - -def wrap_vf_op(op_name, hook): - def vf_op_template(*args, **kwargs): - return VfOPTemplate(op_name, hook)(*args, **kwargs) - - return vf_op_template - - -def wrap_vf_ops_and_bind(hook): - _vf_ops = get_vf_ops() - for op_name in _vf_ops: - setattr(HOOKVfOP, "wrap_" + op_name, wrap_vf_op(op_name, hook)) +def _run_config_checking_command(args): + if args.pack: + pack(args.pack) + elif args.compare: + output_dirpath = args.output if args.output else "./config_check_result" + compare(args.compare[0], args.compare[1], output_dirpath) + else: + logger.error("The param is not correct, you need to give '-pack' for pack or '-c' for compare.") + raise Exception("The param is not correct, you need to give '-pack' for pack or '-c' for compare.") diff --git 
a/debug/accuracy_tools/msprobe/pytorch/config_checking/resource/dependency.yaml b/debug/accuracy_tools/msprobe/pytorch/config_checking/resource/dependency.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..98a8dec5919ff7eec2493197fffeb42f2f724cef
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/resource/dependency.yaml
@@ -0,0 +1,24 @@
+# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+dependency:
+  - transformers
+  - deepspeed
+  - megatron
+  - numpy
+  - datasets
+  - torch
+  - torchvision
+  - peft
\ No newline at end of file
diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/resource/env.yaml b/debug/accuracy_tools/msprobe/pytorch/config_checking/resource/env.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..13ea0e39f89b4807b72a6322ddc865145d9fde9d
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/resource/env.yaml
@@ -0,0 +1,38 @@
+# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+HCCL_DETERMINISTIC:
+  - name: HCCL_DETERMINISTIC
+    default_value: False
+
+HCCL_ALGO:
+  - name: HCCL_ALGO
+    default_value: None
+
+HCCL_INTRA_ROCE_ENABLE:
+  - name: HCCL_INTRA_ROCE_ENABLE
+    default_value: 0
+
+HCCL_INTRA_PCIE_ENABLE:
+  - name: HCCL_INTRA_PCIE_ENABLE
+    default_value: 1
+
+ASCEND_LAUNCH_BLOCKING:
+  - name: ASCEND_LAUNCH_BLOCKING
+    default_value: False
+
+ASCEND_RT_VISIBLE_DEVICES:
+  - name: ASCEND_RT_VISIBLE_DEVICES
+    default_value: None
\ No newline at end of file
diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/__init__.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..0d07baefa2f4f8a02e48c22e68087662a551837c
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/__init__.py
@@ -0,0 +1,18 @@
+# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +__all__ = ['apply_patches'] + +from msprobe.pytorch.config_checking.utils.random_patch import apply_patches diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/random_patch.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/random_patch.py new file mode 100644 index 0000000000000000000000000000000000000000..9c2eb41e7b93ae1e70033ad0daa4757233e69245 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/random_patch.py @@ -0,0 +1,89 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import random +import traceback +from functools import wraps + +import numpy as np +import torch +from msprobe.pytorch.config_checking.utils.utils import config_checking_print + + +DEFAULT_RANDOM_LOG_PATH = './random_patch.log' + + +def __log_stack(func): + @wraps(func) + def wrapper(*args, **kwargs): + stack = traceback.format_stack() + msg = f"info: random function {func.__name__} called. 
Call stack:" + for line in stack[:-1]: + msg += '\n' + line.strip() + logging.info(msg) + return func(*args, **kwargs) + + return wrapper + + +def __check_torch_with_device(func): + @wraps(func) + def wrapper(*args, **kwargs): + if 'device' in kwargs: + # 获取调用栈信息以确定文件和行号 + stack = traceback.extract_stack() + caller = stack[-2] + file_name = caller.filename + line_number = caller.lineno + logging.warning(f"Warning: torch function {func.__name__} called with device specified in {file_name} " + f"at line {line_number}.") + return func(*args, **kwargs) + return wrapper + + +def __track_func(func): + return __log_stack(__check_torch_with_device(func)) + + +def apply_patches(): + # init logging + logging.basicConfig(filename=DEFAULT_RANDOM_LOG_PATH, level=logging.INFO) + + # Patch random module + random.random = __track_func(random.random) + random.randint = __track_func(random.randint) + random.uniform = __track_func(random.uniform) + random.choice = __track_func(random.choice) + + # Patch numpy.random module + np.random.rand = __track_func(np.random.rand) + np.random.randint = __track_func(np.random.randint) + np.random.choice = __track_func(np.random.choice) + np.random.normal = __track_func(np.random.normal) + + # Patch torch random functions + torch.rand = __track_func(torch.rand) + torch.randint = __track_func(torch.randint) + torch.randn = __track_func(torch.randn) + torch.rand_like = __track_func(torch.rand_like) + torch.randint_like = __track_func(torch.randint_like) + torch.randn_like = __track_func(torch.randn_like) + torch.manual_seed = __track_func(torch.manual_seed) + + # Patch torch.Tensor random function + torch.Tensor.exponential_ = __track_func(torch.Tensor.exponential_) + + config_checking_print(f"random patches saved to file: {DEFAULT_RANDOM_LOG_PATH}") diff --git a/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/utils.py b/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..5b67d229836babd18c9a00d9cc1c0817302f6f50 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/config_checking/utils/utils.py @@ -0,0 +1,111 @@ +# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
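+
+# Illustrative sketch (added for documentation): compare_dicts() below walks
+# two nested dicts and returns flat change lists plus a nested result, e.g.:
+#
+#     >>> compare_dicts({"a": {"b": 1}, "c": 2}, {"a": {"b": 9}, "d": 3})[3]
+#     {'a': {'b': '[changed]: 1 -> 9'}, 'c': '[deleted]', 'd': '[added]'}
+#
+# bytes_hash() folds a sha256 digest into 16 bits, so it is a cheap equality
+# fingerprint for tensors rather than a collision-resistant hash.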
+
+import os
+import re
+import hashlib
+
+import torch
+
+from msprobe.core.common.file_utils import load_json
+from msprobe.pytorch.common.log import logger
+
+
+def merge_keys(dir_0, dir_1):
+    output_list = list(dir_0.keys())
+    output_list.extend(list(dir_1.keys()))
+    return set(output_list)
+
+
+def compare_dict(bench_dict, cmp_dict):
+    result = []
+    for key in set(bench_dict.keys()) | set(cmp_dict.keys()):
+        if key in bench_dict and key in cmp_dict:
+            if bench_dict[key] != cmp_dict[key]:
+                result.append(f"{key}: {bench_dict[key]} -> {cmp_dict[key]}")
+        elif key in bench_dict:
+            result.append(f"{key}: {bench_dict[key]} -> [deleted]")
+        else:
+            result.append(f"{key}: [added] -> {cmp_dict[key]}")
+    return result
+
+
+def config_checking_print(msg):
+    logger.info(f"[config checking log] {msg}")
+
+
+def tensor_to_hash(tensor):
+    """Compute the hash value of a tensor"""
+    tensor_bytes = tensor.clone().detach().cpu().numpy().tobytes()
+    return bytes_hash(tensor_bytes)
+
+
+def get_tensor_features(tensor):
+    features = {
+        "hash": tensor_to_hash,
+        "max": lambda x: torch.max(x).item(),
+        "min": lambda x: torch.min(x).item(),
+        "mean": lambda x: torch.mean(x).item(),
+        "norm": lambda x: torch.norm(x).item(),
+    }
+
+    if not tensor.is_floating_point() or tensor.dtype == torch.float64:
+        tensor = tensor.float()
+    return {key: features.get(key)(tensor) for key in features}
+
+
+def read_rank_result_to_dict(dirpath):
+    result = {}
+    filenames = [filename for filename in os.listdir(dirpath) if re.match(r"rank\d+", filename)]
+    filenames.sort(key=lambda x: int(x.split('.')[0][4:]))  # files named like rank1.json, sorted by rank id
+    for filename in filenames:
+        filepath = os.path.join(dirpath, filename)
+        result[filename.split('.')[0]] = load_json(filepath)
+    return result
+
+
+def compare_dicts(dict1, dict2, path=''):
+    deleted = []
+    added = []
+    changed = []
+    result = {}
+
+    for key in dict1:
+        if key not in dict2:
+            deleted.append(f"[Deleted]: {path + key}")
+            result[key] = "[deleted]"
+        else:
+            if isinstance(dict1[key], dict) and isinstance(dict2[key], dict):
+                sub_deleted, sub_added, sub_changed, sub_result = compare_dicts(
+                    dict1[key], dict2[key], path + key + '/')
+                deleted.extend(sub_deleted)
+                added.extend(sub_added)
+                changed.extend(sub_changed)
+                if sub_result:
+                    result[key] = sub_result
+            elif dict1[key] != dict2[key]:
+                changed.append(f"[Changed]: {path + key} : {dict1[key]} -> {dict2[key]}")
+                result[key] = f"[changed]: {dict1[key]} -> {dict2[key]}"
+    for key in dict2:
+        if key not in dict1:
+            added.append(f"[Added]: {path + key}")
+            result[key] = "[added]"
+    return deleted, added, changed, result
+
+
+def bytes_hash(obj: bytes):
+    hex_dig = hashlib.sha256(obj).hexdigest()
+    short_hash = int(hex_dig, 16) % (2 ** 16)
+    return short_hash
diff --git a/debug/accuracy_tools/msprobe/pytorch/debugger/precision_debugger.py b/debug/accuracy_tools/msprobe/pytorch/debugger/precision_debugger.py
index 5bb1d3a14e82d7b4bce9d7da8921a1d701e82222..a19702ff864f4a735c68e6daa4f6027db6cf4f8d 100644
--- a/debug/accuracy_tools/msprobe/pytorch/debugger/precision_debugger.py
+++ b/debug/accuracy_tools/msprobe/pytorch/debugger/precision_debugger.py
@@ -19,7 +19,7 @@ import torch
 from msprobe.core.common.const import Const, FileCheckConst, MsgConst
 from msprobe.core.common.exceptions import MsprobeException
 from msprobe.core.common.file_utils import FileChecker
-from msprobe.core.common.utils import get_real_step_or_rank
+from msprobe.core.common.utils import get_real_step_or_rank, check_init_step
 from msprobe.pytorch.common.log import 
logger from msprobe.pytorch.common.utils import check_save_param from msprobe.pytorch.debugger.debugger_config import DebuggerConfig @@ -171,6 +171,14 @@ class PrecisionDebugger: except ValueError: return instance.service.save(variable, name, save_backward) + + @classmethod + def set_init_step(cls, step): + instance = cls._instance + if not instance: + raise Exception(MsgConst.NOT_CREATED_INSTANCE) + check_init_step(step) + instance.service.init_step = step def module_dump(module, dump_name): diff --git a/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_dump.py b/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_dump.py index 4700de6f1f9f3b5ddfb9507decb6f8739b5eda9b..cc78962f401a9e4f46d5794d7ca074f2e37f45e0 100644 --- a/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_dump.py +++ b/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_dump.py @@ -17,7 +17,7 @@ import torch from msprobe.core.common.const import Const from msprobe.core.data_dump.scope import BaseScope from msprobe.pytorch.common.log import logger -from msprobe.pytorch.hook_module.api_registry import api_register +from msprobe.pytorch.hook_module.api_register import get_api_register torch_version_above_or_equal_2 = torch.__version__.split('+')[0] >= '2.0' @@ -26,13 +26,14 @@ class ModuleDumper: def __init__(self, service): self.service = service self.hook_handle_list = [] + self.api_register = get_api_register() def start_module_dump(self, module, dump_name): - api_register.api_originality() + self.api_register.restore_all_api() self.register_hook(module, dump_name) def stop_module_dump(self): - api_register.api_modularity() + self.api_register.register_all_api() for hook_handle in self.hook_handle_list: if isinstance(hook_handle, torch.utils.hooks.RemovableHandle): hook_handle.remove() diff --git a/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_processer.py b/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_processer.py index b5ca1da461fd4235a09172de4b9dcea34a624e58..37611f4db3238b0002f4a2f69ea94f98f38c09e2 100644 --- a/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_processer.py +++ b/debug/accuracy_tools/msprobe/pytorch/dump/module_dump/module_processer.py @@ -17,9 +17,10 @@ from functools import wraps import torch from msprobe.core.common.const import Const +from msprobe.core.common.utils import recursion_depth_decorator from msprobe.core.data_dump.scope import BaseScope, ModuleRangeScope, MixRangeScope from msprobe.pytorch.common.log import logger -from msprobe.pytorch.common.utils import replace_last_occurrence +from msprobe.pytorch.common.utils import replace_last_occurrence, is_float8_tensor from torch.utils.checkpoint import checkpoint as origin_checkpoint from torch.utils.checkpoint import set_checkpoint_early_stop from torch.utils.hooks import BackwardHook @@ -58,8 +59,9 @@ class ModuleProcesser: return clone_return_value_func @staticmethod + @recursion_depth_decorator("ModuleDump: ModuleProcesser.clone_if_tensor") def clone_if_tensor(result): - if isinstance(result, torch.Tensor): + if isinstance(result, torch.Tensor) and not is_float8_tensor(result): return result.clone() elif type(result) is tuple: return tuple(ModuleProcesser.clone_if_tensor(x) for x in result) @@ -109,6 +111,8 @@ class ModuleProcesser: for name, module in modules_and_names: if module == model: continue + if module.__class__.__name__ == "FullyShardedDataParallel": + continue module_index = (index + Const.SEP) if index != "-1" else "" prefix_name = 
(BaseScope.Module_Type_Module + Const.SEP + module_index + name + Const.SEP + module.__class__.__name__ + Const.SEP) diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/api_register.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/api_register.py new file mode 100644 index 0000000000000000000000000000000000000000..f8da9453e8317f942f65a9366fb93da898103625 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/hook_module/api_register.py @@ -0,0 +1,142 @@ +# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import functools +import os + +import torch +import torch.distributed as dist + +from msprobe.core.common.const import Const +from msprobe.core.data_dump.api_registry import ApiRegistry +from msprobe.pytorch.common.utils import ( + torch_without_guard_version, is_gpu, torch_device_guard, parameter_adapter +) +from msprobe.pytorch.function_factory import npu_custom_functions +from msprobe.pytorch.hook_module.hook_module import HOOKModule +from msprobe.pytorch.hook_module.utils import dynamic_import_op + +try: + import mindspeed.ops +except ImportError: + mindspeed_enable = False +else: + mindspeed_enable = True + + +torch_version_above_2 = torch.__version__.split('+')[0] > '2.0' + +_api_types = { + Const.PT_FRAMEWORK: { + Const.PT_API_TYPE_FUNCTIONAL: (torch.nn.functional, (torch.nn.functional,)), + Const.PT_API_TYPE_TENSOR: (torch.Tensor, (torch.Tensor,)), + Const.PT_API_TYPE_TORCH: (torch, (torch,)), + Const.PT_API_TYPE_VF: (torch._C._VariableFunctionsClass, (torch._VF,)), + Const.PT_API_TYPE_DIST: (dist, (dist, dist.distributed_c10d)) + } +} +if not is_gpu: + import torch_npu + if torch_without_guard_version: + _api_types.get(Const.PT_FRAMEWORK).update( + { + Const.PT_API_TYPE_NPU: (torch.ops.npu, (torch_npu, torch.ops.npu)) + } + ) + else: + _api_types.get(Const.PT_FRAMEWORK).update( + {Const.PT_API_TYPE_NPU: (torch_npu._C._VariableFunctionsClass, (torch_npu,))} + ) + _api_types.get(Const.PT_FRAMEWORK).update( + { + Const.PT_API_TYPE_NPU_DIST: (torch_npu.distributed, (torch_npu.distributed, + torch_npu.distributed.distributed_c10d)) + } + ) + if mindspeed_enable: + _api_types.get(Const.PT_FRAMEWORK).update({Const.PT_API_TYPE_MINDSPEED: (mindspeed.ops, (mindspeed.ops,))}) + dynamic_import_op(mindspeed.ops) + +_inner_used_api = {} +_supported_api_list_path = (os.path.join(os.path.dirname(os.path.realpath(__file__)), Const.SUPPORT_API_FILE_NAME),) +_cuda_func_mapping = {"npu_fusion_attention": "gpu_fusion_attention"} + + +@parameter_adapter +def tensor_module_forward(module, *args, **kwargs): + return module.api_func(*args, **kwargs) + + +def dist_module_forward(module, *args, **kwargs): + handle = module.api_func(*args, **kwargs) + if kwargs.get("async_op") or module.api_name in ["isend", "irecv"]: + if handle and hasattr(handle, 'wait'): + handle.wait() + if module.api_name == "batch_isend_irecv": + if isinstance(handle, list): + for req in handle: + req.wait() + return handle + + +def 
npu_module_forward(module, *args, **kwargs):
+    if not module.need_hook:
+        if module.api_name not in npu_custom_functions:
+            raise Exception(f'There is no bench function {module.api_name}')
+        if module.device == Const.CUDA_LOWERCASE:
+            module.api_name = _cuda_func_mapping.get(module.api_name, module.api_name)
+        if module.device in [Const.CUDA_LOWERCASE, Const.CPU_LOWERCASE]:
+            return npu_custom_functions[module.api_name](*args, **kwargs)
+    return module.api_func(*args, **kwargs)
+
+
+forward_methods = {
+    "Tensor": tensor_module_forward,
+    "Distributed": dist_module_forward,
+    "NPU": npu_module_forward
+}
+
+
+class ApiTemplate(HOOKModule):
+    def __init__(self, api_name, api_func, prefix, hook_build_func, need_hook=True, device=Const.CPU_LOWERCASE):
+        self.api_name = api_name
+        self.api_func = api_func
+        self.prefix = prefix
+        self.prefix_api_name = prefix + Const.SEP + str(api_name.split(Const.SEP)[-1]) + Const.SEP
+        self.need_hook = need_hook
+        self.device = device
+        if self.need_hook:
+            super().__init__(hook_build_func)
+        if prefix == Const.DIST_API_TYPE_PREFIX:
+            self.op_is_distributed = True
+
+    @torch_device_guard
+    def forward(self, *args, **kwargs):
+        exec_func = forward_methods.get(self.prefix)
+        exec_func = functools.partial(exec_func, self) if exec_func else self.api_func
+        return exec_func(*args, **kwargs)
+
+
+api_register = None
+
+
+def get_api_register(return_new=False):
+    if return_new:
+        return ApiRegistry(_api_types, _inner_used_api, _supported_api_list_path, ApiTemplate)
+
+    global api_register
+    if api_register is None:
+        api_register = ApiRegistry(_api_types, _inner_used_api, _supported_api_list_path, ApiTemplate)
+    return api_register
diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/api_registry.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/api_registry.py
deleted file mode 100644
index 1aad89bd6e89ae839513001b1d51572b50d8280b..0000000000000000000000000000000000000000
--- a/debug/accuracy_tools/msprobe/pytorch/hook_module/api_registry.py
+++ /dev/null
@@ -1,166 +0,0 @@
-# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import torch
-import torch.distributed as dist
-
-from msprobe.pytorch.hook_module import wrap_torch, wrap_functional, wrap_tensor, wrap_vf, wrap_distributed, wrap_aten
-from msprobe.pytorch.hook_module.wrap_aten import get_aten_ops
-from msprobe.pytorch.hook_module.wrap_distributed import get_distributed_ops
-from msprobe.pytorch.hook_module.wrap_functional import get_functional_ops
-from msprobe.pytorch.hook_module.wrap_tensor import get_tensor_ops
-from msprobe.pytorch.hook_module.wrap_torch import get_torch_ops
-from msprobe.pytorch.hook_module.wrap_vf import get_vf_ops
-from msprobe.pytorch.common.utils import torch_without_guard_version, npu_distributed_api, is_gpu
-from msprobe.core.common.const import Const
-
-torch_version_above_2 = torch.__version__.split('+')[0] > '2.0'
-
-if not is_gpu:
-    import torch_npu
-    from . 
import wrap_npu_custom - from .wrap_npu_custom import get_npu_ops - - -class ApiRegistry: - def __init__(self): - self.tensor_ori_attr = {} - self.torch_ori_attr = {} - self.functional_ori_attr = {} - self.distributed_ori_attr = {} - self.npu_distributed_ori_attr = {} - self.vf_ori_attr = {} - self.aten_ori_attr = {} - self.torch_npu_ori_attr = {} - - self.tensor_hook_attr = {} - self.torch_hook_attr = {} - self.functional_hook_attr = {} - self.distributed_hook_attr = {} - self.npu_distributed_hook_attr = {} - self.vf_hook_attr = {} - self.aten_hook_attr = {} - self.torch_npu_hook_attr = {} - - @staticmethod - def store_ori_attr(ori_api_group, api_list, api_ori_attr): - for api in api_list: - if '.' in api: - sub_module_name, sub_op = api.rsplit('.', 1) - sub_module = getattr(ori_api_group, sub_module_name) - api_ori_attr[api] = getattr(sub_module, sub_op) - else: - api_ori_attr[api] = getattr(ori_api_group, api) - - @staticmethod - def set_api_attr(api_group, attr_dict): - for api, api_attr in attr_dict.items(): - if '.' in api: - sub_module_name, sub_op = api.rsplit('.', 1) - sub_module = getattr(api_group, sub_module_name, None) - if sub_module is not None: - setattr(sub_module, sub_op, api_attr) - else: - setattr(api_group, api, api_attr) - - def api_modularity(self): - self.set_api_attr(torch.Tensor, self.tensor_hook_attr) - self.set_api_attr(torch, self.torch_hook_attr) - self.set_api_attr(torch.nn.functional, self.functional_hook_attr) - self.set_api_attr(dist, self.distributed_hook_attr) - self.set_api_attr(dist.distributed_c10d, self.distributed_hook_attr) - if not is_gpu and not torch_without_guard_version: - self.set_api_attr(torch_npu.distributed, self.npu_distributed_hook_attr) - self.set_api_attr(torch_npu.distributed.distributed_c10d, self.npu_distributed_hook_attr) - if torch_version_above_2: - self.set_api_attr(torch.ops.aten, self.aten_hook_attr) - self.set_api_attr(torch._VF, self.vf_hook_attr) - if not is_gpu: - self.set_api_attr(torch_npu, self.torch_npu_hook_attr) - - def api_originality(self): - self.set_api_attr(torch.Tensor, self.tensor_ori_attr) - self.set_api_attr(torch, self.torch_ori_attr) - self.set_api_attr(torch.nn.functional, self.functional_ori_attr) - self.set_api_attr(dist, self.distributed_ori_attr) - self.set_api_attr(dist.distributed_c10d, self.distributed_ori_attr) - if not is_gpu and not torch_without_guard_version: - self.set_api_attr(torch_npu.distributed, self.npu_distributed_ori_attr) - self.set_api_attr(torch_npu.distributed.distributed_c10d, self.npu_distributed_ori_attr) - if torch_version_above_2: - self.set_api_attr(torch.ops.aten, self.aten_ori_attr) - self.set_api_attr(torch._VF, self.vf_ori_attr) - if not is_gpu: - self.set_api_attr(torch_npu, self.torch_npu_ori_attr) - - def initialize_hook(self, hook, online_run_ut=False): - """ - initialize_hook - Args: - hook (_type_): initialize_hook - online_run_ut (bool): default False, whether online run_ut or not. - If online_run_ut is True, the hook will not wrap the aten ops. 
- """ - self.store_ori_attr(torch.Tensor, get_tensor_ops(), self.tensor_ori_attr) - wrap_tensor.wrap_tensor_ops_and_bind(hook) - for attr_name in dir(wrap_tensor.HOOKTensor): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.tensor_hook_attr[attr_name[5:]] = getattr(wrap_tensor.HOOKTensor, attr_name) - - self.store_ori_attr(torch, get_torch_ops(), self.torch_ori_attr) - wrap_torch.wrap_torch_ops_and_bind(hook) - for attr_name in dir(wrap_torch.HOOKTorchOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.torch_hook_attr[attr_name[5:]] = getattr(wrap_torch.HOOKTorchOP, attr_name) - - self.store_ori_attr(torch.nn.functional, get_functional_ops(), self.functional_ori_attr) - wrap_functional.wrap_functional_ops_and_bind(hook) - for attr_name in dir(wrap_functional.HOOKFunctionalOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.functional_hook_attr[attr_name[5:]] = getattr(wrap_functional.HOOKFunctionalOP, attr_name) - - self.store_ori_attr(dist, get_distributed_ops(), self.distributed_ori_attr) - wrap_distributed.wrap_distributed_ops_and_bind(hook) - if not is_gpu and not torch_without_guard_version: - self.store_ori_attr(torch_npu.distributed, npu_distributed_api, self.npu_distributed_ori_attr) - for attr_name in dir(wrap_distributed.HOOKDistributedOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.distributed_hook_attr[attr_name[5:]] = getattr(wrap_distributed.HOOKDistributedOP, attr_name) - if not is_gpu and not torch_without_guard_version and attr_name[5:] in npu_distributed_api: - self.npu_distributed_hook_attr[attr_name[5:]] = getattr(wrap_distributed.HOOKDistributedOP, - attr_name) - - if torch_version_above_2 and not online_run_ut: - self.store_ori_attr(torch.ops.aten, get_aten_ops(), self.aten_ori_attr) - wrap_aten.wrap_aten_ops_and_bind(hook) - for attr_name in dir(wrap_aten.HOOKAtenOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.aten_hook_attr[attr_name[5:]] = getattr(wrap_aten.HOOKAtenOP, attr_name) - - self.store_ori_attr(torch._VF, get_vf_ops(), self.vf_ori_attr) - wrap_vf.wrap_vf_ops_and_bind(hook) - for attr_name in dir(wrap_vf.HOOKVfOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.vf_hook_attr[attr_name[5:]] = getattr(wrap_vf.HOOKVfOP, attr_name) - - if not is_gpu: - self.store_ori_attr(torch_npu, get_npu_ops(), self.torch_npu_ori_attr) - wrap_npu_custom.wrap_npu_ops_and_bind(hook) - for attr_name in dir(wrap_npu_custom.HOOKNpuOP): - if attr_name.startswith(Const.ATTR_NAME_PREFIX): - self.torch_npu_hook_attr[attr_name[5:]] = getattr(wrap_npu_custom.HOOKNpuOP, attr_name) - - -api_register = ApiRegistry() diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/hook_module.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/hook_module.py index b59d4be82f2b55326c2a1d6a8a9e127a8470bff6..dccf9c7a9221990eb5ec3829544368ede1297b2c 100644 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/hook_module.py +++ b/debug/accuracy_tools/msprobe/pytorch/hook_module/hook_module.py @@ -1,4 +1,4 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. # All rights reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -21,6 +21,8 @@ import torch import torch.nn as nn import torch.utils.hooks as full_hooks +from msprobe.pytorch.common.utils import is_float8_tensor + torch_version_above_or_equal_2 = torch.__version__.split('+')[0] >= '2.0' @@ -28,28 +30,27 @@ class HOOKModule(nn.Module): module_count = defaultdict(int) inner_stop_hook = {} - def __init__(self, build_hook) -> None: + def __init__(self, hook_build_func) -> None: super(HOOKModule, self).__init__() self.has_overflow = False - self.prefix = "" self.current_thread = threading.current_thread().ident if self.current_thread not in HOOKModule.inner_stop_hook: HOOKModule.inner_stop_hook[self.current_thread] = False self.stop_hook = HOOKModule.inner_stop_hook.get(self.current_thread, False) if not self.stop_hook: - if hasattr(self, "prefix_op_name_"): - self.prefix = self.prefix_op_name_ - self.forward_data_collected = False - forward_pre_hook, forward_hook, backward_hook, _ = build_hook(self.prefix) - if torch_version_above_or_equal_2: - self.register_forward_pre_hook(forward_pre_hook, with_kwargs=True) - self.register_forward_hook(forward_hook, with_kwargs=True) - else: - self.register_forward_pre_hook(forward_pre_hook) - self.register_forward_hook(forward_hook) - self.register_backward_hook(backward_hook) + + prefix = self.prefix_api_name if hasattr(self, "prefix_api_name") else "" + if callable(hook_build_func): + forward_pre_hook, forward_hook, backward_hook, _ = hook_build_func(prefix) + if torch_version_above_or_equal_2: + self.register_forward_pre_hook(forward_pre_hook, with_kwargs=True) + self.register_forward_hook(forward_hook, with_kwargs=True) + else: + self.register_forward_pre_hook(forward_pre_hook) + self.register_forward_hook(forward_hook) + self.register_backward_hook(backward_hook) def __call__(self, *args, **kwargs): changed = False @@ -111,6 +112,10 @@ class HOOKModule(nn.Module): return result else: return result + + if is_float8_tensor(var) or not (var.requires_grad and torch.is_grad_enabled()): + return result + grad_fn = var.grad_fn if grad_fn is not None: for hook in non_full_backward_hooks: diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/register_optimizer_hook.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/register_optimizer_hook.py index 75be9fc4532ea5863ed3daad569c062c4ccb91ba..b4f9a5f50639752e8094b38961ef600cc6d7b101 100644 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/register_optimizer_hook.py +++ b/debug/accuracy_tools/msprobe/pytorch/hook_module/register_optimizer_hook.py @@ -32,8 +32,9 @@ def register_optimizer_hook(data_collector): def patch_clip_grad(func): def wrapper(*args, **kwargs): data_collector.optimizer_status = Const.CLIP_GRAD - func(*args, **kwargs) + result = func(*args, **kwargs) data_collector.optimizer_status = Const.END_PREFIX + Const.CLIP_GRAD + return result return wrapper diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml b/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml index 4bc22f51ceb5497f307fb4ac3226c8c590ea459a..6be86e0dfbc26237320c50012e1a73cbcdd5f80a 100644 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml +++ b/debug/accuracy_tools/msprobe/pytorch/hook_module/support_wrap_ops.yaml @@ -151,7 +151,6 @@ tensor: - __eq__ - __ge__ - __gt__ - - __getitem__ - __iadd__ - __iand__ - __idiv__ @@ -1912,4 +1911,29 @@ distributed: - all_to_all - all_gather_into_tensor - reduce_scatter_tensor - - batch_isend_irecv \ No 
newline at end of file
+  - batch_isend_irecv
+
+npu_distributed:
+  - isend
+  - irecv
+
+mindspeed:
+  - dropout_add_layer_norm.npu_dropout_add_layer_norm
+  - npu_rotary_position_embedding.npu_rotary_position_embedding
+  - fusion_attention_v2.npu_fusion_attention
+  - npu_mm_all_reduce_add_rms_norm.npu_mm_all_reduce_add_rms_norm
+  - npu_mm_all_reduce_add_rms_norm_.npu_mm_all_reduce_add_rms_norm_
+  - gmm.npu_gmm
+  - gmm.npu_gmm_v2
+  - npu_grouped_mat_mul_all_reduce.npu_grouped_mat_mul_all_reduce
+  - ffn.npu_ffn
+  - npu_moe_token_permute.npu_moe_token_permute
+  - npu_moe_token_unpermute.npu_moe_token_unpermute
+  - npu_ring_attention_update.npu_ring_attention_update
+  - npu_matmul_add.npu_matmul_add_fp32
+  - npu_groupmatmul_add.npu_groupmatmul_add_fp32
+  - npu_all_to_all_all_gather_bmm.npu_all_to_all_all_gather_bmm
+  - npu_bmm_reduce_scatter_all_to_all.npu_bmm_reduce_scatter_all_to_all
+  - quant_gmm.npu_quant_gmm
+  - quant_gmm.npu_quant_gmm_v2
+  - npu_apply_fused_ema_adamw.npu_apply_fused_ema_adamw
\ No newline at end of file
diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/utils.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/utils.py
index 41869403a547fc526ec422ecbb123af18ff81a39..0992caf0a41e2bc203ca9793295e94261ca48b95 100644
--- a/debug/accuracy_tools/msprobe/pytorch/hook_module/utils.py
+++ b/debug/accuracy_tools/msprobe/pytorch/hook_module/utils.py
@@ -14,7 +14,12 @@
 # limitations under the License.
 
 import os
+import importlib
+import inspect
+
+from msprobe.core.common.const import Const
 from msprobe.core.common.file_utils import load_yaml
+from msprobe.core.common.log import logger
 
 
 def get_ops():
@@ -26,3 +31,24 @@
     wrap_torch = ops.get('torch')
     wrap_npu_ops = ops.get('torch_npu')
     return set(wrap_functional) | set(wrap_tensor) | set(wrap_torch) | set(wrap_npu_ops)
+
+
+def dynamic_import_op(package):
+    package_name = package.__name__
+    ops = {}
+    ops_dir, _ = os.path.split(package.__file__)
+    for file_name in os.listdir(ops_dir):
+        if file_name.endswith(Const.PY_SUFFIX) and file_name != Const.INIT_PY:
+            sub_module_name = file_name[:-3]  # 去除 .py 后缀
+            module_name = f"{package_name}.{sub_module_name}"
+            try:
+                module = importlib.import_module(module_name)
+            except Exception as e:
+                logger.warning(f"import {module_name} failed: {e}")
+                continue
+
+            func_members = inspect.getmembers(module, inspect.isfunction)
+            for func_member in func_members:
+                func_name, func = func_member[0], func_member[1]
+                ops[f"{sub_module_name}.{func_name}"] = func
+    return ops
diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_distributed.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_distributed.py
deleted file mode 100644
index 1cd11842c31bacdad7c1bb90f98ac81c3415a40e..0000000000000000000000000000000000000000
--- a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_distributed.py
+++ /dev/null
@@ -1,79 +0,0 @@
-# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -import os -from functools import wraps -import torch.distributed as dist - -from msprobe.pytorch.hook_module.hook_module import HOOKModule -from msprobe.pytorch.common.utils import torch_device_guard -from msprobe.core.common.const import Const -from msprobe.core.common.file_utils import load_yaml - - -cur_path = os.path.dirname(os.path.realpath(__file__)) -yaml_path = os.path.join(cur_path, "support_wrap_ops.yaml") - - -distributed_func = {} -for f in dir(dist): - distributed_func[f] = getattr(dist, f) - - -def get_distributed_ops(): - _all_distributed_ops = dir(dist) - yaml_data = load_yaml(yaml_path) - wrap_distributed_ops = yaml_data.get('distributed') - return set(wrap_distributed_ops) & set(_all_distributed_ops) - - -class HOOKDistributedOP(object): - pass - - -class DistributedOPTemplate(HOOKModule): - def __init__(self, op_name, build_hook): - self.op_name_ = op_name - self.prefix_op_name_ = "Distributed" + Const.SEP + str(op_name) + Const.SEP - super().__init__(build_hook) - if not self.stop_hook: - self.op_is_distributed = True - - @torch_device_guard - def forward(self, *args, **kwargs): - handle = distributed_func.get(self.op_name_)(*args, **kwargs) - if kwargs.get("async_op") or self.op_name_ in ["isend", "irecv"]: - if handle and hasattr(handle, 'wait'): - handle.wait() - if self.op_name_ == "batch_isend_irecv": - if isinstance(handle, list): - for req in handle: - req.wait() - return handle - - -def wrap_distributed_op(op_name, hook): - @wraps(DistributedOPTemplate) - def distributed_op_template(*args, **kwargs): - return DistributedOPTemplate(op_name, hook)(*args, **kwargs) - - distributed_op_template.__name__ = op_name - return distributed_op_template - - -def wrap_distributed_ops_and_bind(hook): - _distributed_ops = get_distributed_ops() - for op_name in _distributed_ops: - setattr(HOOKDistributedOP, "wrap_" + str(op_name), wrap_distributed_op(op_name, hook)) diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_functional.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_functional.py deleted file mode 100644 index 6164169476dab66ac2bdb8d0cbc41a04ddce6713..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_functional.py +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import os -import torch - -from msprobe.pytorch.hook_module.hook_module import HOOKModule -from msprobe.pytorch.common.utils import torch_device_guard -from msprobe.core.common.const import Const -from msprobe.pytorch.common.log import logger -from msprobe.core.common.file_utils import load_yaml - - -cur_path = os.path.dirname(os.path.realpath(__file__)) -yaml_path = os.path.join(cur_path, "support_wrap_ops.yaml") - - -def get_functional_ops(): - yaml_data = load_yaml(yaml_path) - wrap_functional_ops = yaml_data.get('functional') - _all_functional_ops = dir(torch.nn.functional) - return set(wrap_functional_ops) & set(_all_functional_ops) - - -TorchFunctions = {func: getattr(torch.nn.functional, func) for func in get_functional_ops()} - - -class HOOKFunctionalOP(object): - pass - - -class FunctionalOPTemplate(HOOKModule): - def __init__(self, op_name, hook, need_hook=True): - self.op_name_ = op_name - self.prefix_op_name_ = "Functional" + Const.SEP + str(op_name) + Const.SEP - if need_hook: - super().__init__(hook) - - @torch_device_guard - def forward(self, *args, **kwargs): - return TorchFunctions[str(self.op_name_)](*args, **kwargs) - - -def wrap_functional_op(op_name, hook): - def functional_op_template(*args, **kwargs): - return FunctionalOPTemplate(op_name, hook)(*args, **kwargs) - - return functional_op_template - - -def wrap_functional_ops_and_bind(hook): - _functional_ops = get_functional_ops() - for op_name in _functional_ops: - setattr(HOOKFunctionalOP, "wrap_" + op_name, wrap_functional_op(op_name, hook)) diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_npu_custom.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_npu_custom.py deleted file mode 100644 index 1c0afc59f50c069fbcd7e9a546c5b57c467400a9..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_npu_custom.py +++ /dev/null @@ -1,85 +0,0 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import os -import torch - -from msprobe.pytorch.hook_module.hook_module import HOOKModule -from msprobe.pytorch.common.utils import torch_device_guard, torch_without_guard_version -from msprobe.core.common.const import Const -from msprobe.core.common.log import logger -from msprobe.core.common.file_utils import load_yaml -from msprobe.pytorch.function_factory import npu_custom_functions - -try: - import torch_npu -except ImportError: - logger.info("Failing to import torch_npu.") - - -cur_path = os.path.dirname(os.path.realpath(__file__)) -yaml_path = os.path.join(cur_path, "support_wrap_ops.yaml") -cuda_func_mapping = {"npu_fusion_attention" : "gpu_fusion_attention"} - - -def get_npu_ops(): - if torch_without_guard_version: - _npu_ops = dir(torch.ops.npu) - else: - _npu_ops = dir(torch_npu._C._VariableFunctionsClass) - yaml_data = load_yaml(yaml_path) - wrap_npu_ops = yaml_data.get('torch_npu') - return set(wrap_npu_ops) & set(_npu_ops) - - -class HOOKNpuOP(object): - pass - - -class NpuOPTemplate(HOOKModule): - - def __init__(self, op_name, hook, need_hook=True, device=Const.CPU_LOWERCASE): - self.op_name_ = op_name - self.prefix_op_name_ = "NPU" + Const.SEP + str(op_name) + Const.SEP - self.need_hook = need_hook - self.device = device - if need_hook: - super().__init__(hook) - - @torch_device_guard - def forward(self, *args, **kwargs): - if not self.need_hook: - if self.op_name_ not in npu_custom_functions: - raise Exception(f'There is not bench function {self.op_name_}') - if self.device == Const.CUDA_LOWERCASE: - self.op_name_ = cuda_func_mapping.get(self.op_name_, self.op_name_) - if self.device in [Const.CUDA_LOWERCASE, Const.CPU_LOWERCASE]: - return npu_custom_functions[self.op_name_](*args, **kwargs) - if torch_without_guard_version: - return getattr(torch.ops.npu, str(self.op_name_))(*args, **kwargs) - else: - return getattr(torch_npu._C._VariableFunctionsClass, str(self.op_name_))(*args, **kwargs) - - -def wrap_npu_op(op_name, hook): - def npu_op_template(*args, **kwargs): - return NpuOPTemplate(op_name, hook)(*args, **kwargs) - return npu_op_template - - -def wrap_npu_ops_and_bind(hook): - _npu_ops = get_npu_ops() - for op_name in _npu_ops: - setattr(HOOKNpuOP, "wrap_" + str(op_name), wrap_npu_op(op_name, hook)) diff --git a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_torch.py b/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_torch.py deleted file mode 100644 index fc9d61c206bcfaeda7fefb5cb8b90fda2d67cb16..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/pytorch/hook_module/wrap_torch.py +++ /dev/null @@ -1,84 +0,0 @@ -# Copyright (c) 2024-2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- 
-import os
-import torch
-
-from msprobe.pytorch.hook_module.hook_module import HOOKModule
-from msprobe.pytorch.common.utils import torch_device_guard
-from msprobe.core.common.const import Const
-from msprobe.core.common.file_utils import load_yaml
-
-
-cur_path = os.path.dirname(os.path.realpath(__file__))
-yaml_path = os.path.join(cur_path, "support_wrap_ops.yaml")
-
-
-def get_torch_ops():
-    _torch_ops = []
-    yaml_data = load_yaml(yaml_path)
-    wrap_torch_ops = yaml_data.get('torch')
-    for operation in wrap_torch_ops:
-        if '.' in operation:
-            operation_sub_module_name, operation_sub_op = operation.rsplit('.', 1)
-            operation_sub_module = getattr(torch, operation_sub_module_name)
-            if operation_sub_op in dir(operation_sub_module):
-                _torch_ops.append(operation)
-        else:
-            if hasattr(torch, operation):
-                _torch_ops.append(operation)
-    return set(_torch_ops)
-
-
-TorchOps = {}
-for op in get_torch_ops():
-    if '.' in op:
-        sub_module_name, sub_op = op.rsplit('.', 1)
-        sub_module = getattr(torch, sub_module_name)
-        TorchOps[op] = getattr(sub_module, sub_op)
-    else:
-        TorchOps[op] = getattr(torch, op)
-
-
-
-class HOOKTorchOP(object):
-    pass
-
-
-class TorchOPTemplate(HOOKModule):
-
-    def __init__(self, op_name, hook, need_hook=True):
-        self.op_name_ = op_name
-        self.prefix_op_name_ = "Torch" + Const.SEP + str(op_name) + Const.SEP
-        if need_hook:
-            super().__init__(hook)
-
-    @torch_device_guard
-    def forward(self, *args, **kwargs):
-        return TorchOps[str(self.op_name_)](*args, **kwargs)
-
-
-def wrap_torch_op(op_name, hook):
-
-    def torch_op_template(*args, **kwargs):
-        return TorchOPTemplate(op_name, hook)(*args, **kwargs)
-
-    return torch_op_template
-
-
-def wrap_torch_ops_and_bind(hook):
-    _torch_ops = get_torch_ops()
-    for op_name in _torch_ops:
-        setattr(HOOKTorchOP, "wrap_" + op_name, wrap_torch_op(op_name, hook))
diff --git a/debug/accuracy_tools/msprobe/pytorch/monitor/distributed/wrap_distributed.py b/debug/accuracy_tools/msprobe/pytorch/monitor/distributed/wrap_distributed.py
index b2fa26a58e702120fcabd5d82f8e1e0ed27f3bc4..e94763e4787fe4e5b4826c8f706849ff3e7a4a58 100644
--- a/debug/accuracy_tools/msprobe/pytorch/monitor/distributed/wrap_distributed.py
+++ b/debug/accuracy_tools/msprobe/pytorch/monitor/distributed/wrap_distributed.py
@@ -24,6 +24,7 @@ import torch.nn as nn
 from msprobe.core.common.const import MonitorConst
 from msprobe.core.common.file_utils import load_yaml
 from msprobe.pytorch.monitor.module_metric import get_metrics, get_summary_writer_tag_name
+from msprobe.pytorch.common.log import logger
 
 try:
     import torch_npu
@@ -37,6 +38,7 @@ WrapDistributedOps = load_yaml(OpsPath).get("distributed", [])
 StackBlackListPath = os.path.join(os.path.dirname(__file__), "stack_blacklist.yaml")
 StackBlackList = load_yaml(StackBlackListPath).get("stack", [])
 
+MAX_STRING_LENGTH = 1000
 
 distributed_func = {}
 for f in dir(dist):
@@ -106,7 +108,6 @@ class ApiRegistry:
             if args[0] in PENDING_ASYNC_CC_BY_HANDLE:
                 store_func = PENDING_ASYNC_CC_BY_HANDLE.pop(args[0])
                 store_func()
-
         return wrapped_wait
 
     dist.Work.wait = wrapped_wait(dist.Work)
@@ -139,6 +140,8 @@ def get_process_group(process_group):
 
 
 def stack_filter(stack):
+    if len(stack) > MAX_STRING_LENGTH:
+        logger.warning(f'The character string exceeds {MAX_STRING_LENGTH} characters; re match is skipped.')
     for pattern in StackBlackList:
         if re.search(pattern, stack):
             return False
@@ -188,10 +191,12 @@ def update_data(old, new):
 
 
 def is_target_line(codeline):
-    stack = get_callstack()
-    whole_stack = ';'.join(stack)
     if codeline == []:
         return True
+    stack = get_callstack()
+    whole_stack = ';'.join(stack)
+    if len(whole_stack) > MAX_STRING_LENGTH:
+        logger.warning(f'The character string exceeds {MAX_STRING_LENGTH} characters; re match is skipped.')
     for pattern in codeline:
         if re.search(pattern, whole_stack):
             return True
@@ -267,7 +272,7 @@ def create_hooks(context, monitor):
         RANK = dist.get_rank()
     if dist.is_initialized() and RANK not in monitor.module_rank_list and monitor.module_rank_list != []:
         return [pre_hooks, hooks]
-    
+
     if monitor.cc_log_only:
         pre_hooks.append(cc_log_hook)
         return [pre_hooks, hooks]
diff --git a/debug/accuracy_tools/msprobe/pytorch/monitor/features.py b/debug/accuracy_tools/msprobe/pytorch/monitor/features.py
index 81c029d401f9194688d332ac711d6065f126ce6a..a6cadb25c2c1e374a299b316ecfb72fa1ecd08bf 100644
--- a/debug/accuracy_tools/msprobe/pytorch/monitor/features.py
+++ b/debug/accuracy_tools/msprobe/pytorch/monitor/features.py
@@ -42,7 +42,6 @@ def get_norm(x: torch.tensor):
 def get_max(x: torch.tensor):
     return torch.max(x)
 
-
 @torch.no_grad()
 def get_zeros(x: torch.tensor, eps: float):
     return torch.sum(torch.abs(x) < eps) / x.numel()
diff --git a/debug/accuracy_tools/msprobe/pytorch/monitor/module_hook.py b/debug/accuracy_tools/msprobe/pytorch/monitor/module_hook.py
index d0285564d3cb5c00b69933db3259b7c3339c443d..2db2a9712566e41406ff9e82d1e4e8cff4b32ef9 100644
--- a/debug/accuracy_tools/msprobe/pytorch/monitor/module_hook.py
+++ b/debug/accuracy_tools/msprobe/pytorch/monitor/module_hook.py
@@ -26,6 +26,7 @@ from torch.utils.hooks import BackwardHook
 
 from msprobe.core.common.const import MonitorConst, Const
 from msprobe.core.common.file_utils import load_json, save_json
+from msprobe.core.common.utils import recursion_depth_decorator
 from msprobe.pytorch.common.log import logger
 from msprobe.pytorch.common.utils import is_recomputation
 from msprobe.pytorch.monitor.anomaly_analyse import AnomalyDataWriter
@@ -735,6 +736,7 @@ class TrainerMon:
         logger.info_on_rank_0(f"> {hooked_count} modules are monitored.")
 
+    @recursion_depth_decorator('msprobe.pytorch.monitor.clone_if_tensor')
     def clone_if_tensor(args):
         if isinstance(args, tuple):
             return tuple([clone_if_tensor(arg) for arg in args])
@@ -1066,7 +1068,7 @@ class TrainerMon:
                 self.enable_megatron = True
                 logger.info("megatron version is > core_r0.8.0 <= core_r0.9.0")
             except ImportError:
-                self.enable_megatron = False
+                pass  # 保留此前 try 分支中已确定的 enable_megatron 取值
 
         if not self.enable_megatron:
             self._hook_weights()
diff --git a/debug/accuracy_tools/msprobe/pytorch/monitor/optimizer_collect.py b/debug/accuracy_tools/msprobe/pytorch/monitor/optimizer_collect.py
index 602514836d2531ad4a6be3a23f56bc3b942ba199..df2c9d1c407646cd863d633666389ffd9597587e 100644
--- a/debug/accuracy_tools/msprobe/pytorch/monitor/optimizer_collect.py
+++ b/debug/accuracy_tools/msprobe/pytorch/monitor/optimizer_collect.py
@@ -185,7 +185,7 @@ class MegatronChainedDistributedOptimizerMon(MegatronDistributedOptimizerMon):
         for opt in torch_opt.chained_optimizers:
self.map_fp16_tp_fp32_param(opt) - if not isinstance(torch_opt, torch.optim.Optimizer): + if not isinstance(torch_opt, torch.optim.Optimizer) and not hasattr(torch_opt, 'state'): torch_opt.state = {} for opt in torch_opt.chained_optimizers: torch_opt.state.update(opt.optimizer.state) @@ -198,7 +198,7 @@ class MegatronChainedMixPrecisionOptimizerMon(MixPrecisionOptimizerMon): for opt in torch_opt.chained_optimizers: self.map_fp16_tp_fp32_param(opt) - if not isinstance(torch_opt, torch.optim.Optimizer): + if not isinstance(torch_opt, torch.optim.Optimizer) and not hasattr(torch_opt, 'state'): torch_opt.state = {} for opt in torch_opt.chained_optimizers: torch_opt.state.update(opt.optimizer.state) @@ -206,8 +206,59 @@ class MegatronChainedMixPrecisionOptimizerMon(MixPrecisionOptimizerMon): class DeepSpeedZeroOptimizerStage0Mon(OptimizerMon): - def fetch_mv(self, monitor, torch_opt, params2name): - return self._fetch_mv_in_adam(monitor, torch_opt, params2name) + def get_group_index(self, torch_opt): + bit16_groups = torch_opt.bf16_groups + param2group = defaultdict() + for group_idx, bit16_group in enumerate(bit16_groups): + for param in bit16_group: + param2group[param] = group_idx + return param2group + + def fetch_mv(self, monitor, torch_opt, params2name, name2indices=None): + param2group = self.get_group_index(torch_opt) + exp_avg_dict = defaultdict(float) + exp_avg_sq_dict = defaultdict(float) + update_dict = defaultdict() + ratio_dict = defaultdict() + + param_slice_mappings = torch_opt.state_dict()['param_slice_mappings'] + for param, name in params2name.items(): + group_idx = param2group[param] + state = torch_opt.state[torch_opt.fp32_groups_flat_partition[group_idx]] + if state.get('exp_avg', None) is None: + logger.warning(f"optimizer state is None. 
Something is wrong if this is not the first step.")
+                break
+            param_slice_mapping = param_slice_mappings[group_idx]
+            hp_address = param_slice_mapping.get(torch_opt.param_names[param])
+            if hp_address is None:
+                continue
+            start = hp_address.start
+            numel = hp_address.numel
+
+            if monitor.mv_distribution:
+                exp_avg_dict[name] = state['exp_avg'].narrow(0, start, numel)
+                exp_avg_sq_dict[name] = state['exp_avg_sq'].narrow(0, start, numel)
+            if monitor.mg_direction:
+                exp_avg_dict[name] = state['exp_avg'].narrow(0, start, numel)
+            if monitor.ur_distribution:
+                if len(torch_opt.param_groups) > 1:
+                    logger.info(f"the length of torch_opt.param_groups is {len(torch_opt.param_groups)}.")
+                if 'step' in state:
+                    step = state['step']  # Optimizer from pytorch or FusedAdam from apex (used by megatron)
+                elif 'step' in torch_opt.param_groups[0]:
+                    step = torch_opt.param_groups[0]['step']  # AdamW from mindspeed
+                else:
+                    logger.warning(f"step of {name} is missing; something may have gone wrong.")
+                    continue
+                exp_avg = state['exp_avg'].narrow(0, start, numel)
+                exp_avg_sq = state['exp_avg_sq'].narrow(0, start, numel)
+                exp_avg_hat = exp_avg / (1 - torch_opt.defaults['betas'][0] ** step)
+                exp_avg_sq_hat = exp_avg_sq / (1 - torch_opt.defaults['betas'][1] ** step)
+                update_dict[name] = exp_avg_hat / (torch.sqrt(exp_avg_sq_hat) + torch_opt.defaults['eps'])
+                ratio_dict[name] = exp_avg_hat / torch.sqrt(exp_avg_sq_hat)
+                monitor.update_heatmap_visualizer[name].pre_cal(update_dict[name])
+                monitor.ratio_heatmap_visualizer[name].pre_cal(ratio_dict[name])
+        return MVResult(exp_avg=exp_avg_dict, exp_avg_sq=exp_avg_sq_dict, update=update_dict, ratio=ratio_dict)
 
 
 class DeepSpeedZeroOptimizerStage3Mon(OptimizerMon):
diff --git a/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyze_dump_graph.py b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyze_dump_graph.py
new file mode 100644
index 0000000000000000000000000000000000000000..9a5f80205371599f7e43ac5dae8880eea56b233e
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyze_dump_graph.py
@@ -0,0 +1,337 @@
+# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
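+#
+# 用法示意(仅为说明调用方式的草图,'path/to/dump_dir' 为假设路径,
+# 需指向按 rank0、rank1 等子目录组织的 dump 数据):
+#
+#     data = process_on_all_ranks('path/to/dump_dir')
+#     graph = GraphBuilder.build_graph(data)
+#     levels, _ = traverse_graph(graph)  # 按层打印各 rank 的通信/溢出算子节点
+#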
+from typing import Dict, List, Set, Optional, Tuple, Callable +from enum import Enum +from dataclasses import dataclass +from collections import defaultdict, deque + +from msprobe.core.common.log import logger +from msprobe.pytorch.nan_analyse.api_info import APIInfo +from msprobe.pytorch.nan_analyse.pre_process_dump_data import process_on_all_ranks + + +class NodeType(Enum): + COMPUTE = "compute" + SEND = "send" + RECV = "recv" + COLLECTIVE = "collective" + + +class EdgeType(Enum): + SEQUENTIAL = "sequential" + COMMUNICATION = "communication" + + +@dataclass +class Node: + node_id: str # (rank_id:api_name) + rank: int + api_info: APIInfo + node_type: NodeType + + def __hash__(self): + return hash(self.node_id) + + def __eq__(self, other): + return isinstance(other, Node) and self.node_id == other.node_id + + def __str__(self): + return self.node_id + + +class Edge: + def __init__(self, src: Node, dst: Node, edge_type: EdgeType = EdgeType.SEQUENTIAL): + self.src = src + self.dst = dst + self.edge_type = edge_type + self.edge_id = self.__generate_edge_name() + + def __generate_edge_name(self): + return f'{self.src.node_id}_{self.dst.node_id}' + + +class DistributedComputeGraph: + def __init__(self): + self.nodes: Dict[str, Node] = {} + self.edges: Dict[str, Edge] = {} + self.adj_list: Dict[Node, List[Node]] = defaultdict(list) + self.rank_to_nodes: Dict[int, List[Node]] = {} + # 添加入度统计 + self.in_degrees: Dict[Node, int] = defaultdict(int) + + def add_node(self, node: Node): + self.nodes[node.node_id] = node + if not self.rank_to_nodes.get(node.rank): + self.rank_to_nodes[node.rank] = [] + self.rank_to_nodes[node.rank].append(node) + + def add_edge(self, src: Node, dst: Node, edge_type: EdgeType = EdgeType.SEQUENTIAL): + edge = Edge(src, dst, edge_type) + # 边去重 + if self.edges.get(edge.edge_id): + return + self.edges[edge.edge_id] = edge + self.adj_list[src].append(dst) + # 更新入度 + self.in_degrees[dst] += 1 + + def get_node(self, node_id: str) -> Optional[Node]: + return self.nodes.get(node_id) + + def get_nodes_by_rank(self, rank_id: int) -> List[Node]: + return self.rank_to_nodes.get(rank_id, []) + + def get_start_nodes(self) -> List[Node]: + """获取所有入度为0的节点或者每个rank上首个节点""" + start_nodes = [node for node in self.nodes.values() if self.in_degrees[node] == 0] + if not start_nodes: + return self._get_first_nodes() + return start_nodes + + def _get_first_nodes(self): + first_nodes = [] + for rank in self.rank_to_nodes.keys(): + first_nodes.extend(self.__get_first_node_by_rank(rank)) + return first_nodes + + def __get_first_node_by_rank(self, rank): + nodes = self.rank_to_nodes.get(rank, []) + if not nodes: + return [] + return nodes[:1] + + +class GraphBuilder: + @staticmethod + def create_node(rank: int, api_info: APIInfo) -> Node: + node_id = f"{rank}:{api_info.api_name}" + + if api_info.is_communication_op: + if "send" in api_info.api_name.lower(): + node_type = NodeType.SEND + elif "recv" in api_info.api_name.lower(): + node_type = NodeType.RECV + else: + node_type = NodeType.COLLECTIVE + else: + node_type = NodeType.COMPUTE + + return Node(node_id, rank, api_info, node_type) + + @staticmethod + def build_graph(rank_ops_data: Dict[int, Dict]) -> DistributedComputeGraph: + graph = DistributedComputeGraph() + + # Step 1: Create all nodes + rank_nodes: Dict[int, List[Node]] = {} + for rank, ops in rank_ops_data.items(): + rank_nodes[rank] = [] + for _, api_info in ops.items(): + node = GraphBuilder.create_node(rank, api_info) + graph.add_node(node) + rank_nodes[rank].append(node) + + # Step 2: 
Connect sequential operations within each rank + for _, nodes in rank_nodes.items(): + for i in range(len(nodes) - 1): + graph.add_edge(nodes[i], nodes[i + 1], EdgeType.SEQUENTIAL) + + # Step 3: Connect communication operations between ranks + GraphBuilder._connect_p2p_operations(graph, rank_nodes) + GraphBuilder._connect_collective_operations(graph, rank_nodes) + + return graph + + @staticmethod + def _connect_p2p_operations(graph: DistributedComputeGraph, rank_nodes: Dict[int, List[Node]]): + match_list = [] + + for nodes in rank_nodes.values(): + match_list.extend(node for node in nodes if node.node_type in (NodeType.SEND, NodeType.RECV)) + + for node in match_list: + if not node.api_info.pg: + continue + + for rank in node.api_info.pg: + if rank == node.api_info.cur_rank: + continue + + for candi_node in graph.get_nodes_by_rank(rank): + if GraphBuilder._match_comm_ops(node, candi_node): + graph.add_edge(node, candi_node, EdgeType.COMMUNICATION) + break + + @staticmethod + def _connect_collective_operations(graph: DistributedComputeGraph, rank_nodes: Dict[int, List[Node]]): + collective_groups: Dict[str, List[Node]] = defaultdict(list) + + # Group collective operations by their process group + for nodes in rank_nodes.values(): + for node in nodes: + if node.node_type == NodeType.COLLECTIVE: + group_key = GraphBuilder._get_process_group_key(node.api_info) + collective_groups[group_key].append(node) + + # Connect nodes in the same collective operation + for group in collective_groups.values(): + for i, node_i in enumerate(group): + for j, node_j in enumerate(group): + if i >= j: + continue + graph.add_edge(node_i, node_j, EdgeType.COMMUNICATION) + graph.add_edge(node_j, node_i, EdgeType.COMMUNICATION) # Bidirectional for collectives + + @staticmethod + def _match_comm_ops(no1: Node, no2: Node) -> bool: + return no1.api_info == no2.api_info + + @staticmethod + def _get_process_group_key(api_info: APIInfo) -> str: + return api_info.process_group_id + + +class SortStrategy(Enum): + CALL_INDEX = "call_index" + RANK = "rank" + API_NAME = "api_name" + + +class GraphTraversal: + + @staticmethod + def sort_levels(levels: List[List[Node]], strategy: SortStrategy = SortStrategy.CALL_INDEX) -> List[List[Node]]: + """ + 对每一层的节点进行排序 + Args: + levels: 层次遍历的结果 + strategy: 排序策略 + Returns: + sorted_levels: 排序后的层次结果 + """ + sort_key = GraphTraversal._get_sort_key(strategy) + return [sorted(level, key=sort_key) for level in levels] + + @staticmethod + def bfs_by_level(graph: DistributedComputeGraph) -> List[List[Node]]: + """ + 使用BFS进行层次遍历,返回每一层的节点列表 + Args: + graph: 分布式计算图 + Returns: + levels: 每一层节点的列表的列表 + """ + start_nodes = graph.get_start_nodes() + if not start_nodes: + return [[]] + + # 记录已访问的节点和它们所在的层级 + visited = {} # 节点 -> 层级的映射 + current_level = 0 + levels = [[]] # 初始层包含起始节点 + queue = deque() # (节点, 层级)的队列 + + for n in start_nodes: + visited[n] = 0 + levels[0].append(n) + queue.append((n, 0)) + + while queue: + node, level = queue.popleft() + + # 如果遇到新的层级,创建新的层级列表 + if level > current_level: + current_level = level + levels.append([]) + + # 遍历邻接节点 + for neighbor in graph.adj_list[node]: + # 如果邻接节点未访问过,或者在更深的层级遇到了它 + if neighbor not in visited or visited[neighbor] > level + 1: + visited[neighbor] = level + 1 + queue.append((neighbor, level + 1)) + # 将节点添加到对应层级的列表中 + if len(levels) <= level + 1: + levels.append([]) + if neighbor not in levels[level + 1]: + levels[level + 1].append(neighbor) + + return levels + + @staticmethod + def get_node_info(node: Node) -> str: + """ + 获取节点的详细信息,用于调试和打印 + """ + 
return (f"Node(id={node.node_id}, rank={node.rank}, call_index={node.api_info.call_index}, " + f"type={node.node_type.value})") + + @staticmethod + def print_levels_info(levels: List[List[Node]]): + """ + 打印每一层的节点信息 + """ + logger.info("Level visit results:") + for i, level in enumerate(levels): + logger.info(f"level {i}:") + for node in level: + logger.info(f"node: {GraphTraversal.get_node_info(node)}") + + @staticmethod + def print_cycles_info(cycles: Set[Tuple[Node, Node]]): + """ + 打印检测到的环信息 + """ + logger.info("\n检测到的环:") + for source, target in cycles: + logger.info(f"环: {GraphTraversal.get_node_info(source)} -> {GraphTraversal.get_node_info(target)}") + + @staticmethod + def _get_sort_key(strategy: SortStrategy) -> Callable[[Node], any]: + """Get the sort key function based on the sorting strategy""" + if strategy == SortStrategy.CALL_INDEX: + return lambda node: (node.api_info.call_index, node.rank) + elif strategy == SortStrategy.RANK: + return lambda node: node.rank + elif strategy == SortStrategy.API_NAME: + return lambda node: node.api_info.api_name + else: + return lambda node: node.api_info.call_index # Default to call_index + + +def traverse_graph(graph: DistributedComputeGraph, sort_strategy: SortStrategy = SortStrategy.CALL_INDEX): + levels, cycles = GraphTraversal.bfs_by_level(graph), set() + sorted_levels = GraphTraversal.sort_levels(levels, sort_strategy) + + GraphTraversal.print_levels_info(sorted_levels) + GraphTraversal.print_cycles_info(cycles) + + return levels, cycles + + +def main(): + file_path = 'test_data/all_reduce_data' + # Load your data as before + data = process_on_all_ranks(file_path) + + # Build the graph + graph = GraphBuilder.build_graph(data) + + # Traverse the graph + _, _ = traverse_graph(graph) + + +if __name__ == '__main__': + main() diff --git a/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyze_pp_partition.py b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyze_pp_partition.py new file mode 100644 index 0000000000000000000000000000000000000000..59e6952ce6a16260f035289f2af42d0746912436 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyze_pp_partition.py @@ -0,0 +1,172 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from collections import defaultdict +from typing import Dict, List, Set, Optional + +from msprobe.core.common.log import logger +from msprobe.pytorch.nan_analyse.api_info import APIInfo +from msprobe.pytorch.nan_analyse.pre_process_dump_data import process_on_all_ranks +from msprobe.pytorch.nan_analyse.utils import singleton + + +MAX_RECURSIVE_DEPTH = 100 + + +def __is_send_op(op_name: str) -> bool: + if op_name.startswith('Distributed.') and 'send.' in op_name: + return True + return False + + +def __is_recv_op(op_name: str) -> bool: + if op_name.startswith('Distributed.') and 'recv.' 
in op_name: + return True + return False + + +def _is_first_send_op(op_name: str) -> bool: + if __is_send_op(op_name) and 'send.0' in op_name: + return True + return False + + +def _is_first_recv_op(op_name: str) -> bool: + if __is_recv_op(op_name) and 'recv.0' in op_name: + return True + return False + + +@singleton +class PPAnalyzer: + def __init__(self, rank_data: Dict[int, dict]): + # 初始化rank_to_data字典,rank_id --> dump_data + self.rank_to_data = rank_data + self.rank_to_stage = {} # 存储rank对应的pipeline stage + self.send_recv_pairs = defaultdict(list) # 存储send-recv配对信息 + + @staticmethod + def _find_start_ranks(rank_graph: Dict[int, Set[int]]) -> List[int]: + """找到没有入边的rank(pipeline的起始rank)""" + all_ranks = set(rank_graph.keys()) + target_ranks = set() + for ranks in rank_graph.values(): + target_ranks.update(ranks) + return list(all_ranks - target_ranks) + + @staticmethod + def _get_target_rank(op_info: APIInfo) -> Optional[int]: + """从send操作中提取目标rank""" + kwargs = op_info.input_kwargs + if 'dst' in kwargs: + return int(kwargs['dst'].get('value')) + return None + + @staticmethod + def _get_source_rank(op_info: APIInfo) -> Optional[int]: + """从recv操作中提取源rank""" + kwargs = op_info.input_kwargs + if 'src' in kwargs: + return kwargs['src'].get('value') + return None + + def get_pp_stages(self) -> Dict[int, List[int]]: + """获取每个stage包含的ranks""" + stages = defaultdict(list) + for rank, stage in self.rank_to_stage.items(): + stages[stage].append(rank) + return dict(stages) + + def analyze(self): + self.analyze_send_recv() + self.determine_pp_stages() + + def analyze_send_recv(self): + """分析所有rank的send和recv操作""" + rank_data = self.rank_to_data + for cur_rank, data in rank_data.items(): + self._analyze_cur_rank(cur_rank, data) + + def determine_pp_stages(self): + """确定每个rank属于哪个pipeline stage""" + # 构建rank之间的依赖关系图 + rank_graph = defaultdict(set) + for rank, pairs in self.send_recv_pairs.items(): + for op_type, other_rank in pairs: + if op_type == 'send': + rank_graph[rank].add(other_rank) + + # 没有send、recv操作,所有的rank属于同一个stage + if not rank_graph: + all_ranks = set(self.rank_to_data.keys()) + for rank in all_ranks: + self.rank_to_stage[rank] = 0 + return + + # 使用拓扑排序确定stage + visited = set() + + def dfs(rank_id: int, stage: int): + if stage >= MAX_RECURSIVE_DEPTH: + raise ValueError("Recursive depth exceeds the limit") + + if rank_id in visited: + return + visited.add(rank_id) + self.rank_to_stage[rank_id] = stage + + # 遍历所有下一个rank + for next_rank in rank_graph[rank_id]: + dfs(next_rank, stage + 1) + + # 找到起始rank(入度为0的节点)为首个PP stage + start_ranks = self._find_start_ranks(rank_graph) + for start_rank in start_ranks: + dfs(start_rank, 0) + + def _analyze_cur_rank(self, cur_rank: int, data: Dict[str, APIInfo]): + if not data: + return + + for op_name, op_info in data.items(): + if _is_first_send_op(op_name): + target_rank = self._get_target_rank(op_info) + if target_rank is None or target_rank < cur_rank: # 仅添加大于cur_rank的send操作,保证所有都是前向 + continue + self.send_recv_pairs[cur_rank].append(('send', target_rank)) + + # 不采集rcv的通信算子,仅仅从send数据分析,rcv算子用于做validation + elif _is_first_recv_op(op_name): + source_rank = self._get_source_rank(op_info) + if source_rank is None: + continue + + +def main(): + file_path = 'test_data/send_recv' + data = process_on_all_ranks(file_path) + + # 分析pp stage + analyzer = PPAnalyzer(data) + analyzer.analyze() + + pp_stages = analyzer.get_pp_stages() + + logger.info("Pipeline Parallel Stages:") + for stage, ranks in sorted(pp_stages.items()): + logger.info(f"Stage {stage}: Ranks 
{sorted(ranks)}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyzer.py b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyzer.py
new file mode 100644
index 0000000000000000000000000000000000000000..9afe1dc16a674817c7d5fc066c4d2be81c1cc3d2
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/analyzer.py
@@ -0,0 +1,65 @@
+# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from msprobe.pytorch.nan_analyse.analyze_dump_graph import GraphBuilder, GraphTraversal
+from msprobe.pytorch.nan_analyse.pre_process_dump_data import process_on_all_ranks
+
+
+class HeapDumpAnalyzer:
+    def __init__(self, dump_file_path):
+        """初始化分析器
+        Args:
+            dump_file_path (str): dump数据根目录的路径(其下按rank0、rank1等子目录存放dump.json)
+        """
+        self.dump_file_path = dump_file_path
+        self.processed_data = None
+        self.analysis_results = None
+        self.graph = None
+        self.visited_levels = None
+
+    def pre_process(self):
+        """预处理dump文件
+        Returns:
+            处理后的数据结构
+        """
+        self.processed_data = process_on_all_ranks(self.dump_file_path)
+        self.graph = GraphBuilder.build_graph(self.processed_data)
+
+    def analyze_graph(self):
+        """分析预处理后的数据
+        Returns:
+            分析结果
+        """
+        if self.processed_data is None or self.graph is None:
+            raise ValueError("Data or graph is not processed yet")
+        self.visited_levels = GraphTraversal.bfs_by_level(self.graph)
+
+    def post_process(self):
+        """获取分析结果"""
+        self.analysis_results = GraphTraversal.sort_levels(self.visited_levels)
+
+    def apply(self):
+        """执行完整的分析流程"""
+        self.pre_process()
+
+        self.analyze_graph()
+
+        self.post_process()
+        return self.analysis_results
+
+
+if __name__ == "__main__":
+    analyzer = HeapDumpAnalyzer("test_data/send_recv")
+    results = analyzer.apply()
+    GraphTraversal.print_levels_info(results)
diff --git a/debug/accuracy_tools/msprobe/pytorch/nan_analyse/api_info.py b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/api_info.py
new file mode 100644
index 0000000000000000000000000000000000000000..17fdc88a0abadc3a8e9e3cb12012eb0a6d3d9213
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/api_info.py
@@ -0,0 +1,164 @@
+# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
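+#
+# 用法示意(数据为假设值,真实字段以 dump.json 实际采集结果为准):
+#
+#     info = APIInfo(
+#         api_name='Distributed.send.0.forward',
+#         input_kwargs={'dst': {'value': 1}},
+#         output_data=[{'Max': 'nan', 'Min': '-inf', 'Mean': 'nan'}],
+#         cur_rank=0,
+#     )
+#     info.is_communication_op  # True,按算子名关键字识别为通信算子
+#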
+from dataclasses import dataclass
+
+from typing import Dict, List, Union, Any
+
+from msprobe.core.common.const import Const
+from msprobe.core.overflow_check.filter import IgnoreFilter
+from msprobe.pytorch.nan_analyse.utils import singleton, has_nan_inf, generate_hash
+
+
+def is_comm_api_name_match(bench_api_name, cmp_api_name):
+    if 'send' in bench_api_name and 'recv' in cmp_api_name:
+        return True
+    if 'recv' in bench_api_name and 'send' in cmp_api_name:
+        return True
+    return bench_api_name == cmp_api_name
+
+
+@dataclass
+class APIInfo:
+    input_kwargs: Dict
+    output_data: List[Dict]
+    api_name: str
+    torch_api_name: str
+    input_args: List[Dict]
+    call_index: int
+    is_communication_op: bool
+
+    cur_rank: int
+    process_group_id: str
+
+    def __init__(self, api_name, input_args=None, input_kwargs=None, output_data=None, call_index=0, cur_rank=None):
+        self.input_kwargs = input_kwargs
+        self.output_data = output_data
+        self.api_name = api_name
+        self.input_args = input_args
+        self.call_index = call_index
+        self.cur_rank = cur_rank
+        self.torch_api_name = self.__extract_torch_api(self.api_name)
+        self.is_communication_op = self.__is_communication_operators()
+        self.pg, self.process_group_id = self.__generate_pg_id()
+
+    def __eq__(self, other):
+        if not self.is_communication_op or not other.is_communication_op:
+            return False
+
+        if not is_comm_api_name_match(self.torch_api_name, other.torch_api_name):
+            return False
+
+        if self.process_group_id != other.process_group_id:
+            return False
+        return True
+
+    @staticmethod
+    def __extract_torch_api(api_name) -> str:
+        """
+        Process tensor api name to extract first two fields in lowercase.
+        """
+        # Empty string checking
+        if not api_name.strip():
+            return ""
+
+        parts = api_name.split(Const.SEP)
+
+        # Handle different cases based on number of parts
+        if len(parts) == 0:
+            return ""
+        elif len(parts) == 1:
+            return parts[0].lower()
+        else:
+            return Const.SEP.join(parts[:2]).lower()
+
+    def __is_communication_operators(self) -> bool:
+        # 定义通信算子的关键字,覆盖各种通信操作,如all_reduce, send, broadcast等
+        # 理想情况应从wrap配置文件读取,当前先硬编码在本文件中
+        communication_keywords = [
+            'send',  # send 算子
+            'recv',  # recv 算子
+            'broadcast',  # broadcast 算子
+            'all_reduce',  # all_reduce 算子
+            'reduce',  # reduce 算子
+            'all_gather',  # all_gather 算子
+            'gather',  # gather 算子
+            'isend',  # isend 算子
+            'irecv',  # irecv 算子
+            'scatter',  # scatter 算子
+            'reduce_scatter',  # reduce_scatter 算子
+            '_reduce_scatter_base',  # _reduce_scatter_base 算子
+            '_all_gather_base',  # _all_gather_base 算子
+            'all_to_all_single',  # all_to_all_single 算子
+            'all_to_all',  # all_to_all 算子
+            'all_gather_into_tensor',  # all_gather_into_tensor 算子
+            'reduce_scatter_tensor'  # reduce_scatter_tensor 算子
+        ]
+
+        # 算子名包含上述通信关键字,或以Distributed.开头,即视为通信算子
+        return (any(keyword in self.api_name for keyword in communication_keywords) or
+                self.api_name.startswith('Distributed.'))
+
+    def __generate_pg_id(self):
+        if not self.is_communication_op:
+            return None, None
+
+        process_group: List[int] = []
+        if 'send' in self.api_name:
+            dst = int(self.input_kwargs.get('dst', {}).get('value'))
+            process_group.extend([self.cur_rank, dst])
+        elif 'recv' in self.api_name:
+            src = int(self.input_kwargs.get('src', {}).get('value'))
+            process_group.extend([src, self.cur_rank])
+        else:
+            process_group.extend(self.input_kwargs.get('group_ranks', []))
+
+        # 暂时直接使用调用的次数,而忽略pg的匹配
+        call_cnt = self.api_name.split('.')[-2]
+        fmt = f'{call_cnt}_{str(process_group)}'
+
+        return process_group, generate_hash(fmt)
+
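+# __eq__ 配对示意(rank 与调用次数为假设值):同一轮的 send 与 recv 虽然算子名不同,
+# 但 is_comm_api_name_match 将其视为匹配,且两者推导出的 process_group 均为 [0, 1],
+# process_group_id 一致,因此在构图时可配成一条通信边:
+#
+#     send_op = APIInfo('Distributed.send.0.forward',
+#                       input_kwargs={'dst': {'value': 1}}, cur_rank=0)
+#     recv_op = APIInfo('Distributed.recv.0.forward',
+#                       input_kwargs={'src': {'value': 0}}, cur_rank=1)
+#     send_op == recv_op  # True
+#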
+@singleton +class AnomalyDetector: + def __init__(self): + self._filter = IgnoreFilter() + + @staticmethod + def _has_anomaly(data: Union[Dict, Any]) -> bool: + return has_nan_inf(data) + + def has_input_anomaly(self, api_data) -> bool: + """检查输入是否有异常(包括args和kwargs)""" + # args + args_anomaly = any(self._has_anomaly(x) for x in api_data.input_args if isinstance(x, dict)) + # kwargs + kwargs_anomaly = any(self._has_anomaly(x) for x in api_data.input_kwargs.values() if isinstance(x, dict)) + return args_anomaly or kwargs_anomaly + + def has_output_anomaly(self, api_data) -> bool: + """检查输出是否有异常""" + return any(self._has_anomaly(x) for x in api_data.output_data if isinstance(x, dict)) + + def has_overflow(self, data: APIInfo) -> bool: + # 输入输出不存在nan、inf,不存在溢出 + if not (self.has_input_anomaly(data) or self.has_output_anomaly(data)): + return False + # 是否真的溢出,并且对计算结果造成影响 + if self._filter.apply_filter(data): + return False + return True diff --git a/debug/accuracy_tools/msprobe/pytorch/nan_analyse/pre_process_dump_data.py b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/pre_process_dump_data.py new file mode 100644 index 0000000000000000000000000000000000000000..73815f8b922d3c6e7ff5d7a8505524ad61a4ee15 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/pre_process_dump_data.py @@ -0,0 +1,100 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
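+#
+# 期望的输入目录结构示意(目录名需匹配 rank\d+,具体路径为假设值):
+#
+#     dump_dir/
+#         rank0/dump.json
+#         rank1/dump.json
+#
+# 处理结果为 {rank_id: OrderedDict[api_name, APIInfo]},仅保留通信算子与部分溢出算子。
+#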
+import os
+import re
+from typing import Any, Dict
+from collections import OrderedDict
+
+from msprobe.core.common.const import Const
+from msprobe.core.common.log import logger
+from msprobe.core.common.file_utils import load_json
+from msprobe.pytorch.nan_analyse.api_info import APIInfo, AnomalyDetector
+
+
+def _create_api_info(api_name: str, _data: Dict, call_index: int = 0, cur_rank: int = 0) -> APIInfo:
+    """从原始数据创建APIInfo实例"""
+    return APIInfo(
+        api_name=api_name,
+        input_args=_data.get(Const.INPUT_ARGS, []),
+        input_kwargs=_data.get(Const.INPUT_KWARGS, {}),
+        output_data=_data.get(Const.OUTPUT, []),
+        call_index=call_index,
+        cur_rank=cur_rank
+    )
+
+
+def extract_essential_operators(dump_data: Any, cur_rank: int, common_overflow_num=5):
+    """
+    减少内存占用,仅筛选出溢出、通信算子等,用于下一步构图
+    """
+    # 从数据中提取通信算子和nan等溢出问题算子,使用顺序dict保存结果
+    # TODO: 比较OrderedDict与list+dict的性能,评估是否需要改造
+    extract_opts = OrderedDict()
+    detector = AnomalyDetector()  # 单例,无额外内存占用
+    cnt = 0
+    index = 0
+    for api_name, value in dump_data.get('data', {}).items():
+        api_info = _create_api_info(api_name, value, call_index=index, cur_rank=cur_rank)
+        index += 1
+
+        is_overflow, is_comm_op = detector.has_overflow(api_info), api_info.is_communication_op
+        # 通信算子全部保留用于构图;溢出算子最多保留 common_overflow_num 个
+        if is_comm_op:
+            extract_opts[api_name] = api_info
+            continue
+        if cnt < common_overflow_num and is_overflow:
+            extract_opts[api_name] = api_info
+            cnt += 1
+            continue
+
+    return extract_opts
+
+
+def process_on_all_ranks(base_path: str):
+    all_rank_ops_data = {}
+
+    # 获取所有rank目录
+    for rank_dir in os.listdir(base_path):
+        rank_path = os.path.join(base_path, rank_dir)
+        if not os.path.isdir(rank_path) or not rank_dir.startswith('rank'):
+            logger.warning(f"{rank_dir} is not a valid rank directory.")
+            continue
+
+        dump_file = os.path.join(rank_path, 'dump.json')
+        if not os.path.exists(dump_file):
+            logger.warning(f"{dump_file} does not exist for {rank_dir}")
+            continue
+
+        rank_id = get_rank_id(rank_dir)
+        dump_data = load_json(dump_file)
+        op_list = extract_essential_operators(dump_data, rank_id)
+
+        if op_list:
+            all_rank_ops_data[rank_id] = op_list
+        else:
+            logger.warning(f"No essential operators found for {rank_id}")
+
+    return all_rank_ops_data
+
+
+def get_rank_id(rank_dir: str) -> int:
+    match = re.search(r'rank(\d+)', rank_dir)
+
+    if not match:
+        raise ValueError(f"Invalid rank directory: {rank_dir}")
+    return int(match.group(1))
+
+
+if __name__ == '__main__':
+    file_path = 'test_data/all_reduce_data'
+
+    data = process_on_all_ranks(file_path)
+    logger.info(data)
diff --git a/debug/accuracy_tools/msprobe/pytorch/nan_analyse/utils.py b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..2eb54dc488ccbd98ddba132a285a99655deba815
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/nan_analyse/utils.py
@@ -0,0 +1,50 @@
+# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
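+#
+# has_nan_inf 判定示意(统计值为假设输入):
+#
+#     has_nan_inf({'Max': 'inf', 'Min': 0.0})  # True,Max 命中 OVERFLOW_VALUES
+#     has_nan_inf({'Max': 1.0, 'Min': 0.0})    # False
+#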
+import hashlib +from typing import Any + + +CHECK_FIELDS = ['Max', 'Min', 'Mean'] +OVERFLOW_VALUES = ['inf', '-inf', 'nan'] + + +def singleton(cls): + """ + :param cls: any class + :return: singleton handle + """ + _instance = {} + + def _singleton(*args: any, **kw: any) -> any: + if cls not in _instance: + _instance[cls] = cls(*args, **kw) + return _instance.get(cls) + + return _singleton + + +def has_nan_inf(value: Any) -> bool: + """检查值是否包含NaN或Inf""" + if isinstance(value, dict): + for k, v in value.items(): + if k in CHECK_FIELDS and str(v).lower() in OVERFLOW_VALUES: + return True + return False + + +def generate_hash(input_string): + sha256_hash = hashlib.sha256() + sha256_hash.update(input_string.encode('utf-8')) + return sha256_hash.hexdigest() diff --git a/debug/accuracy_tools/msprobe/pytorch/online_dispatch/dump_compare.py b/debug/accuracy_tools/msprobe/pytorch/online_dispatch/dump_compare.py index b185bc1110d4062d8a31b9cc94dc946d8fb8456c..a154064755ed116eeb2f2ea97b50160bf4b7beb9 100644 --- a/debug/accuracy_tools/msprobe/pytorch/online_dispatch/dump_compare.py +++ b/debug/accuracy_tools/msprobe/pytorch/online_dispatch/dump_compare.py @@ -19,6 +19,8 @@ import os from datetime import datetime, timezone import torch +from msprobe.core.common.const import Const +from msprobe.core.common.utils import recursion_depth_decorator from msprobe.core.common.file_utils import FileOpen, save_npy, save_json from msprobe.pytorch.common.log import logger @@ -91,6 +93,7 @@ def support_basic_type(data): return False +@recursion_depth_decorator("dump_data") def dump_data(data, prefix, dump_path): if isinstance(data, (tuple, list)) and data: for i, item in enumerate(data): diff --git a/debug/accuracy_tools/msprobe/pytorch/online_dispatch/utils.py b/debug/accuracy_tools/msprobe/pytorch/online_dispatch/utils.py index ae8b9435a34ced607d4e70fab615b2b017083fe9..2116186cc046865388c40d33142384301f84acd2 100644 --- a/debug/accuracy_tools/msprobe/pytorch/online_dispatch/utils.py +++ b/debug/accuracy_tools/msprobe/pytorch/online_dispatch/utils.py @@ -27,8 +27,10 @@ else: pta_cpu_device = torch.device("cpu") from msprobe.core.common.const import CompareConst +from msprobe.core.common.utils import recursion_depth_decorator from msprobe.pytorch.common.log import logger + cpu_device = torch._C.device("cpu") COLOR_RED = '\033[31m' COLOR_GREEN = '\033[32m' @@ -85,6 +87,7 @@ def get_callstack(): return callstack +@recursion_depth_decorator("data_to_cpu") def data_to_cpu(data, deep, data_cpu): global cpu_device list_cpu = [] diff --git a/debug/accuracy_tools/msprobe/pytorch/parse_tool/lib/utils.py b/debug/accuracy_tools/msprobe/pytorch/parse_tool/lib/utils.py index 66229d36b8d0b532eea48f1aa5d96e178ed80cdc..db731b338244ca78bfd460633a7442b1e2ef2d5d 100644 --- a/debug/accuracy_tools/msprobe/pytorch/parse_tool/lib/utils.py +++ b/debug/accuracy_tools/msprobe/pytorch/parse_tool/lib/utils.py @@ -264,7 +264,7 @@ class Util: match = re_pattern.match(name) if not match: continue - if extern_pattern != '' and re_pattern.match(extern_pattern) and not re.match(extern_pattern, name): + if extern_pattern != '' and re_pattern.match(extern_pattern) and not name.startswith(extern_pattern): continue file_list[name] = gen_info_func(name, match, file["root"]) return file_list diff --git a/debug/accuracy_tools/msprobe/pytorch/pt_config.py b/debug/accuracy_tools/msprobe/pytorch/pt_config.py index 8293ac969490b103eef630081b6001234ca8bb07..d97aff11ff43b9141288467ec0c0c4d1827f3825 100644 --- a/debug/accuracy_tools/msprobe/pytorch/pt_config.py 
+++ b/debug/accuracy_tools/msprobe/pytorch/pt_config.py @@ -16,9 +16,10 @@ import os import re -from msprobe.core.common.const import Const +from msprobe.core.common.const import Const, FileCheckConst from msprobe.core.common.exceptions import MsprobeException -from msprobe.core.common.file_utils import FileOpen, load_json, check_file_or_directory_path, check_crt_valid +from msprobe.core.common.file_utils import FileOpen, load_json, check_file_or_directory_path, check_crt_valid, \ + FileChecker from msprobe.core.common.log import logger from msprobe.core.common.utils import is_int from msprobe.core.common_config import BaseConfig, CommonConfig @@ -252,6 +253,8 @@ class RunUTConfig(BaseConfig): self.port = json_config.get("port", -1) self.rank_list = json_config.get("rank_list", Const.DEFAULT_LIST) self.tls_path = json_config.get("tls_path", "./") + self.master_ip = json_config.get("master_ip", "127.0.0.1") + self.master_port = json_config.get("master_port", "8888") self.check_run_ut_config() @classmethod @@ -271,13 +274,26 @@ class RunUTConfig(BaseConfig): @classmethod def check_nfs_path_config(cls, nfs_path): - if nfs_path and not os.path.exists(nfs_path): - raise Exception("nfs_path: %s does not exist" % nfs_path) + if nfs_path: + FileChecker(nfs_path, FileCheckConst.DIR, FileCheckConst.READ_ABLE).common_check() @classmethod def check_tls_path_config(cls, tls_path): - if tls_path and not os.path.exists(tls_path): - raise Exception("tls_path: %s does not exist" % tls_path) + if tls_path: + FileChecker(tls_path, FileCheckConst.DIR, FileCheckConst.READ_ABLE).common_check() + + @classmethod + def check_master_ip_config(cls, master_ip): + if not re.match(Const.ipv4_pattern, master_ip): + raise Exception("master_ip: %s is invalid" % master_ip) + + @classmethod + def check_master_port_config(cls, master_port): + if not isinstance(master_port, str) or not master_port.isdigit(): + raise Exception(f"port: {master_port} is invalid. Port must be a numeric string.") + port_number = int(master_port) + if not (0 < port_number <= 65535): + raise Exception(f"port: {master_port} is invalid. 
Port range must be between 1 and 65535.") def check_run_ut_config(self): RunUTConfig.check_filter_list_config(Const.WHITE_LIST, self.white_list) @@ -285,6 +301,8 @@ class RunUTConfig(BaseConfig): RunUTConfig.check_error_data_path_config(self.error_data_path) RunUTConfig.check_nfs_path_config(self.nfs_path) RunUTConfig.check_tls_path_config(self.tls_path) + RunUTConfig.check_master_ip_config(self.master_ip) + RunUTConfig.check_master_port_config(self.master_port) class GradToolConfig(BaseConfig): diff --git a/debug/accuracy_tools/msprobe/pytorch/service.py b/debug/accuracy_tools/msprobe/pytorch/service.py index fd81a7f1cf064506a4fb91481429828c97113509..7fdc4380f51afa032e141b3bb108431e4b806b6d 100644 --- a/debug/accuracy_tools/msprobe/pytorch/service.py +++ b/debug/accuracy_tools/msprobe/pytorch/service.py @@ -30,7 +30,7 @@ from msprobe.pytorch.common.log import logger from msprobe.pytorch.common.utils import get_rank_if_initialized, is_recomputation from msprobe.pytorch.dump.kernel_dump.kernel_config import create_kernel_config_json from msprobe.pytorch.dump.module_dump.module_processer import ModuleProcesser -from msprobe.pytorch.hook_module.api_registry import api_register +from msprobe.pytorch.hook_module.api_register import get_api_register from msprobe.pytorch.hook_module.hook_module import HOOKModule from msprobe.pytorch.hook_module.register_optimizer_hook import register_optimizer_hook @@ -50,6 +50,8 @@ class Service: self.switch = False self.inner_switch = False self.current_iter = 0 + self.loop = 0 + self.init_step = 0 self.first_start = True self.current_rank = None self.dump_iter_dir = None @@ -58,6 +60,7 @@ class Service: self.params_grad_info = {} self.hook_handle_dict = {} # 提前注册,确保注册尽可能多的API hook + self.api_register = get_api_register() self.register_api_hook() self.init_for_debug_level() @@ -246,6 +249,8 @@ class Service: return HookFn(pre_forward_hook_fn, forward_hook_fn, backward_hook_fn, forward_hook_torch_version_below_2_fn) def start(self, model): + self.current_iter = self.loop + self.init_step + self.data_collector.update_iter(self.current_iter) if self.config.level == Const.LEVEL_DEBUG: return if self.need_stop_service(): @@ -288,10 +293,8 @@ class Service: if self.config.online_run_ut and torch_version_above_or_equal_2: run_ut_dispatch(self.attl, False, self.config.online_run_ut_recompute) return - if self.config.async_dump: - self.data_collector.fill_stack_tensor_data() - if self.config.task == Const.TENSOR: - self.data_collector.data_processor.dump_async_data() + if self.config.async_dump and self.config.task == Const.TENSOR: + self.data_collector.data_processor.dump_async_data() self.data_collector.write_json() def step(self): @@ -299,13 +302,10 @@ class Service: return if self.should_stop_service: return - if self.config.async_dump: - self.data_collector.fill_stack_tensor_data() - if self.config.task == Const.TENSOR: - self.data_collector.data_processor.dump_async_data() + if self.config.async_dump and self.config.task == Const.TENSOR: + self.data_collector.data_processor.dump_async_data() self.data_collector.write_json() - self.current_iter += 1 - self.data_collector.update_iter(self.current_iter) + self.loop += 1 self.reset_status() def need_stop_service(self): @@ -370,11 +370,10 @@ class Service: def register_api_hook(self): if self.config.level in [Const.LEVEL_MIX, Const.LEVEL_L1, Const.LEVEL_L2]: logger.info_on_rank_0(f"The api {self.config.task} hook function is successfully mounted to the model.") - api_register.initialize_hook( - 
functools.partial(self.build_hook, BaseScope.Module_Type_API), - self.config.online_run_ut + self.api_register.initialize_hook( + functools.partial(self.build_hook, BaseScope.Module_Type_API) ) - api_register.api_modularity() + self.api_register.register_all_api() def register_module_hook(self): if self.config.level in [Const.LEVEL_L0, Const.LEVEL_MIX]: diff --git a/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/__init__.py b/debug/accuracy_tools/msprobe/pytorch/visualization/__init__.py similarity index 100% rename from profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/__init__.py rename to debug/accuracy_tools/msprobe/pytorch/visualization/__init__.py diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/builder/__init__.py b/debug/accuracy_tools/msprobe/pytorch/visualization/builder/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/builder/graph_builder.py b/debug/accuracy_tools/msprobe/pytorch/visualization/builder/graph_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..f623a48ae3b9607103b4af63bd8838d3d13c8a0b --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/visualization/builder/graph_builder.py @@ -0,0 +1,84 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
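The new graph_builder.py below is the construction entry point: GraphBuilder.build turns a construct.json/dump.json pair into a Graph, and GraphBuilder.to_json serializes one or two graphs into a .vis file. A minimal usage sketch, mirroring build_graph in the visualization test file later in this diff; the paths and model name here are hypothetical:

    from msprobe.pytorch.visualization.builder.graph_builder import GraphBuilder

    # './step0' is a hypothetical dump directory containing construct.json and dump.json
    graph = GraphBuilder.build('./step0/construct.json', './step0/dump.json', model_name='DemoNet')
    # Single-graph export; pass graph_b and tool_tip as well to export a comparison result
    GraphBuilder.to_json('./out/build.vis', graph)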
+
+from ..graph.graph import Graph
+from ..graph.node_op import NodeOp
+from ..utils import load_json_file, load_data_json_file, save_json_file, GraphConst
+from .msprobe_adapter import get_input_output
+
+
+class GraphBuilder:
+    @staticmethod
+    def build(construct_path, data_path, model_name='DefaultModel'):
+        """
+        GraphBuilder对外提供的构图方法
+        Args:
+            construct_path: construct.json路径
+            data_path: dump.json路径
+            model_name: 模型名字,依赖外部输入
+        Returns: Graph,代表图的数据结构
+        """
+        construct_dict = load_json_file(construct_path)
+        data_dict = load_data_json_file(data_path)
+        graph = Graph(model_name)
+        GraphBuilder._init_nodes(graph, construct_dict, data_dict)
+        return graph
+
+    @staticmethod
+    def to_json(filename, graph_n, graph_b=None, tool_tip=None):
+        """
+        将graph导出成.vis文件的接口
+        Args:
+            filename: 输出文件路径
+            graph_n: Graph
+            graph_b: bench Graph,为空时只输出graph_n,不为空会同时输出两个graph,作为对比的结果
+            tool_tip: 在对比模式下输出的意见
+        """
+        result = {}
+        if graph_b:
+            result[GraphConst.JSON_NPU_KEY] = graph_n.to_dict()
+            result[GraphConst.JSON_BENCH_KEY] = graph_b.to_dict()
+        else:
+            result = graph_n.to_dict()
+        if tool_tip:
+            result[GraphConst.JSON_TIP_KEY] = tool_tip
+        save_json_file(filename, result)
+
+    @staticmethod
+    def _init_nodes(graph, construct_dict, data_dict):
+        for subnode_id, upnode_id in construct_dict.items():
+            if upnode_id:
+                upnode_op = NodeOp.get_node_op(upnode_id)
+                upnode = GraphBuilder._create_or_get_node(graph, data_dict, upnode_op, upnode_id)
+            else:
+                upnode = graph.root
+            node_op = NodeOp.get_node_op(subnode_id)
+            GraphBuilder._create_or_get_node(graph, data_dict, node_op, subnode_id, upnode)
+
+    @staticmethod
+    def _create_or_get_node(graph, data_dict, op, name, upnode=None):
+        if name in graph.node_map:
+            node = graph.get_node(name)
+        else:
+            graph.add_node(op, name, upnode)
+            node = graph.get_node(name)
+        node_data = data_dict.get(name, {})
+        # 添加输入输出数据
+        input_data, output_data = get_input_output(node_data, node.id)
+        # 更新数据
+        node.set_input_output(input_data, output_data)
+        # 绑定父节点
+        node.add_upnode(upnode)
+        return node
\ No newline at end of file
diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/builder/msprobe_adapter.py b/debug/accuracy_tools/msprobe/pytorch/visualization/builder/msprobe_adapter.py
new file mode 100644
index 0000000000000000000000000000000000000000..7ea0dfabedf7c482975094abdd981baa1afeb44e
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/visualization/builder/msprobe_adapter.py
@@ -0,0 +1,185 @@
+# Copyright (c) 2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
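msprobe_adapter.py below bridges the dump naming convention and the graph: op_patterns decides whether a name is a module or an API node, and get_input_output sorts parsed tensors by the second-to-last field of full_op_name (GraphConst.OUTPUT_INDEX is -2). A self-contained illustration of both conventions; the example names are assumed to follow msprobe's dump format:

    import re

    # Same patterns that op_patterns defines below
    patterns = [r'^(Module)', r'^(Tensor|Torch|Functional|NPU|VF|Distributed|Aten)']
    print(bool(re.match(patterns[0], 'Module.conv1.Conv2d.forward.0')))  # True -> NodeOp.module
    print(bool(re.match(patterns[1], 'Functional.linear.0.forward')))    # True -> NodeOp.function_api

    # 'output' in the second-to-last field marks an output tensor
    full_op_name = 'Functional.linear.0.forward.output.0'
    print('output' in full_op_name.split('.')[-2])  # True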
+ +import re +from ...compare.acc_compare import read_op, merge_tensor, get_accuracy, _do_multi_process +from ....core.common.utils import task_dumppath_get +from ..utils import GraphConst + + +# 用于将节点名字解析成对应的NodeOp的规则 +op_patterns = [ + r'^(Module)', #NodeOp.module + r'^(Tensor|Torch|Functional|NPU|VF|Distributed|Aten)' #NodeOp.function_api +] + + +def get_compare_mode(dump_path_param): + """ + 获得比较模式,包括summary、MD5和真实数据三种模式 + Args: + dump_path_param: 调用acc_compare接口所依赖的参数 + Returns: 0 summary mode, 1 md5 mode, 2 true data mode + """ + summary_compare, md5_compare = task_dumppath_get(dump_path_param) + if summary_compare: + compare_mode = GraphConst.SUMMARY_COMPARE + elif md5_compare: + compare_mode = GraphConst.MD5_COMPARE + else: + compare_mode = GraphConst.REAL_DATA_COMPARE + return compare_mode + + +def run_real_data(dump_path_param, csv_path): + """ + 多进程运行生成真实数据 + Args: + dump_path_param: 调用acc_compare接口所依赖的参数 + csv_path: 生成文件路径 + """ + return _do_multi_process(dump_path_param, csv_path) + + +def get_input_output(node_data, node_id): + """ + 将dump的原始数据进行拆解,分解为output和input两个数据 + Args: + node_data: 属于单个节点的dump数据 + node_id: 节点名字 + """ + input_data = {} + output_data = {} + op_parsed_list = read_op(node_data, node_id) + for item in op_parsed_list: + full_op_name = item.get('full_op_name', '') + if not full_op_name: + continue + splits = full_op_name.split('.') + if len(splits) <= GraphConst.OUTPUT_INDEX: + continue + if 'output' in splits[GraphConst.OUTPUT_INDEX]: + output_data[full_op_name] = item + else: + input_data[full_op_name] = item + return input_data, output_data + + +def compare_data(data_dict_list1, data_dict_list2): + """ + 比较get_input_output中输出的结果是否结构一致,比较一致返回True + """ + if len(data_dict_list1) != len(data_dict_list2): + return False + # 用于比较两个节点是否相等的关键字段 + tag_keys = ['type', 'dtype', 'shape'] + for key1, key2 in zip(data_dict_list1, data_dict_list2): + dict1 = data_dict_list1[key1] + dict2 = data_dict_list2[key2] + for tag_key in tag_keys: + tag_value1 = dict1.get(tag_key, None) + tag_value2 = dict2.get(tag_key, None) + if tag_value1 != tag_value2: + return False + return True + + +def format_node_data(data_dict): + """ + 批量进行节点数据的输出 + """ + del_list = ['requires_grad', 'data_name', 'full_op_name'] + for _, value in data_dict.items(): + if not isinstance(value, dict): + continue + for item in del_list: + if item in value: + del value[item] + _format_data(value) + return data_dict + + +def compare_node(node_ids, data_dicts, stack_json_data, is_summary_compare, is_md5_compare): + """ + 调用acc_compare.py中的get_accuracy获得精度对比指标 + 真实数据对比模式无法获得精度对比指标,需要调用多进程比对接口 + Returns: 包含参数信息和对比指标(真实数据对比模式除外)的list + """ + merge_n = _parse_node(node_ids[0], data_dicts[0], stack_json_data, is_summary_compare, is_md5_compare) + merge_b = _parse_node(node_ids[1], data_dicts[1], stack_json_data, is_summary_compare, is_md5_compare) + result = [] + get_accuracy(result, merge_n, merge_b, is_summary_compare, is_md5_compare) + return result + + +def _parse_node(node_id, data_dict, stack_json_data, is_summary_compare, is_md5_compare): + """ + 转换节点,使其能够作为acc_compare.py中的get_accuracy的入参 + """ + op_parsed_list = read_op(data_dict.get(node_id, {}), node_id) + if node_id in stack_json_data: + op_parsed_list.append( + {'full_op_name': node_id, 'full_info': stack_json_data[node_id]}) + else: + op_parsed_list.append({'full_op_name': node_id, 'full_info': None}) + result = merge_tensor(op_parsed_list, is_summary_compare, is_md5_compare) + if not result: + result['op_name'] = [] + return result + + +def 
_format_decimal_string(s): + """ + 使用正则表达式匹配包含数字、小数点和可选的百分号的字符串 + """ + pattern = re.compile(r'\d{1,20}\.\d{1,20}%?') + matches = pattern.findall(s) + for match in matches: + is_percent = match.endswith('%') + number_str = match.rstrip('%') + decimal_part = number_str.split('.')[1] + # 如果小数位数大于6,进行处理 + if len(decimal_part) > GraphConst.ROUND_TH: + number_float = float(number_str) + formatted_number = f"{number_float:.{GraphConst.ROUND_TH}f}" + # 如果原来是百分数,加回百分号 + if is_percent: + formatted_number += '%' + # 替换原字符串中的数值部分 + s = s.replace(match, formatted_number) + return s + + +def _format_data(data_dict): + """ + 格式化数据,小数保留6位,处理一些异常值 + """ + pattern = r'^[+-]?(\d+(.\d*)?|.\d+)([eE][+-]?\d+)$' + for key, value in data_dict.items(): + if isinstance(value, str): + # 将单引号删掉,None换成null避免前端解析错误 + value = value.replace("'", "").replace('None', 'null') + value = _format_decimal_string(value) + elif value is None or value == ' ': + value = 'null' + # 科学计数法1.123123123123e-11,格式化为1.123123e-11 + elif isinstance(value, float) and len(str(value)) < GraphConst.STR_MAX_LEN and re.match(pattern, str(value)): + value = "{:.6e}".format(value) + elif isinstance(value, float): + value = round(value, GraphConst.ROUND_TH) + # Inf会走入这里,确保转成Inf。另外给其他不符合预期的类型做兜底方案 + if not isinstance(value, (list, tuple, dict, str)): + value = str(value) + data_dict[key] = value diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/compare/__init__.py b/debug/accuracy_tools/msprobe/pytorch/visualization/compare/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/compare/graph_comparator.py b/debug/accuracy_tools/msprobe/pytorch/visualization/compare/graph_comparator.py new file mode 100644 index 0000000000000000000000000000000000000000..3d5f2972468adab8a436167d2f50eab9ace05873 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/visualization/compare/graph_comparator.py @@ -0,0 +1,104 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
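graph_comparator.py below drives the comparison: it matches NPU nodes against bench nodes and writes match links and precision results into graph_n. A usage sketch mirroring compare_graph in the visualization test file further down in this diff; the directory paths are hypothetical:

    from msprobe.pytorch.visualization.builder.graph_builder import GraphBuilder
    from msprobe.pytorch.visualization.compare.graph_comparator import GraphComparator

    graph_n = GraphBuilder.build('./npu/construct.json', './npu/dump.json', 'DemoNet')
    graph_b = GraphBuilder.build('./bench/construct.json', './bench/dump.json', 'DemoNet')
    comparator = GraphComparator([graph_n, graph_b], ['./npu/dump.json', './bench/dump.json'],
                                 './npu/stack.json', './out')
    comparator.compare()  # match results and precision metrics are written into graph_n
    GraphBuilder.to_json('./out/compare.vis', graph_n, graph_b, comparator.ma.get_tool_tip())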
+
+from ..builder.msprobe_adapter import compare_node, get_compare_mode, run_real_data
+from ..utils import GraphConst, load_json_file, load_data_json_file, get_csv_df
+from ..graph.graph import Graph
+from .mode_adapter import ModeAdapter
+
+
+class GraphComparator:
+    def __init__(self, graphs, data_paths, stack_path, output_path):
+        self.graph_n = graphs[0]
+        self.graph_b = graphs[1]
+        self._parse_param(data_paths, stack_path, output_path)
+
+    def compare(self):
+        """
+        比较函数,初始化结束后单独调用。比较结果写入graph_n
+        """
+        self._compare_nodes(self.graph_n.root)
+        self._postcompare()
+
+    def add_compare_result_to_node(self, node, compare_result_list):
+        """
+        将比对结果添加到节点的输入输出数据中
+        Args:
+            node: 节点
+            compare_result_list: 包含参数信息和对比指标(真实数据对比模式除外)的list
+        """
+        # 真实数据比对,先暂存节点,在多进程对比得到精度指标后,再将指标添加到节点中
+        if self.ma.prepare_real_data(node):
+            return
+        compare_in_dict = {}
+        compare_out_dict = {}
+        # input和output对比数据分开
+        for item in compare_result_list:
+            if 'output' in item[0]:
+                compare_out_dict[item[0]] = item
+            else:
+                compare_in_dict[item[0]] = item
+        precision_status, precision_index, other_dict = self.ma.parse_result(node, [compare_in_dict, compare_out_dict])
+        node.data[GraphConst.JSON_STATUS_KEY] = precision_status
+        node.data[GraphConst.JSON_INDEX_KEY] = precision_index
+        node.data.update(other_dict)
+        if not precision_status:
+            self.ma.add_error_key(node.output_data)
+            node.get_suggestions()
+
+    def _parse_param(self, data_paths, stack_path, output_path):
+        self.dump_path_param = {
+            'npu_json_path': data_paths[0],
+            'bench_json_path': data_paths[1],
+            'stack_json_path': stack_path,
+            'is_print_compare_log': True
+        }
+        self.output_path = output_path
+        compare_mode = get_compare_mode(self.dump_path_param)
+        self.ma = ModeAdapter(compare_mode)
+        self.data_n_dict = load_data_json_file(data_paths[0])
+        self.data_b_dict = load_data_json_file(data_paths[1])
+        self.stack_json_data = load_json_file(stack_path)
+
+    def _postcompare(self):
+        if not self.ma.is_real_data_compare():
+            return
+        df = get_csv_df(self.ma.is_md5_compare(), self.ma.is_summary_compare(), True, self.ma.csv_data)
+        df = run_real_data(self.dump_path_param, df)
+        compare_data_dict = {row[0]: row.tolist() for _, row in df.iterrows()}
+        for node in self.ma.compare_nodes:
+            precision_status, precision_index, _ = self.ma.parse_result(node, [compare_data_dict])
+            node.data[GraphConst.JSON_STATUS_KEY] = precision_status
+            node.data[GraphConst.JSON_INDEX_KEY] = precision_index
+            if not precision_status:
+                self.ma.add_error_key(node.output_data)
+                node.get_suggestions()
+
+    def _compare_nodes(self, node_n):
+        # 递归遍历NPU树中的节点,如果在Bench中找到具有相同名称的节点,检查它们的祖先和参数信息,检查一致则进行精度数据对比
+        # 这里采用先序遍历,好处在于当这个节点被比较时,它的先序节点已经被匹配,这可以为后续的模糊匹配提供重要信息
+        node_b, ancestors = Graph.match(self.graph_n, node_n, self.graph_b)
+        if node_b:
+            ancestors.append(node_b.id)
+            node_n.add_link(node_b, ancestors)
+            # 真实数据比对只会得到基本信息,并没有精度指标,需要调用多进程对比接口
+            compare_result_list = compare_node([node_n.id, node_b.id], [self.data_n_dict, self.data_b_dict],
+                                               self.stack_json_data, self.ma.is_summary_compare(),
+                                               self.ma.is_md5_compare())
+            if compare_result_list:
+                self.ma.add_csv_data(compare_result_list)
+                self.add_compare_result_to_node(node_n, compare_result_list)
+        for subnode in node_n.subnodes:
+            self._compare_nodes(subnode)
diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/compare/mode_adapter.py b/debug/accuracy_tools/msprobe/pytorch/visualization/compare/mode_adapter.py
new file mode 100644
index 0000000000000000000000000000000000000000..d58f2078b6f8996a31c2f830ef5adf79bc7948c3
--- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/visualization/compare/mode_adapter.py @@ -0,0 +1,211 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import json +from ....core.common.const import CompareConst, Const +from ..utils import ToolTip, GraphConst, str2float + + +class ModeAdapter: + def __init__(self, compare_mode): + self.compare_mode = compare_mode + self.csv_data = [] + self.compare_nodes = [] + + @staticmethod + def _add_md5_compare_data(node_data, compare_data_dict): + precision_status = True + for key, value in node_data.items(): + if not isinstance(value, dict): + continue + compare_data = compare_data_dict.get(key) + if compare_data: + key_list = [GraphConst.JSON_MD5_KEY] + headers = CompareConst.MD5_COMPARE_RESULT_HEADER + id_list = [headers.index(x) for x in key_list] + ModeAdapter._match_data(value, compare_data, key_list, id_list) + # md5比对是否通过 + if value.get(GraphConst.JSON_MD5_KEY) != CompareConst.PASS: + precision_status = False + node_data[key] = value + return precision_status + + @staticmethod + def _add_real_compare_data(node_data, compare_data_dict): + min_thousandth = float(1) + numbers = [] + for key, value in node_data.items(): + if not isinstance(value, dict): + continue + compare_data = compare_data_dict.get(key) + if compare_data: + key_list = [CompareConst.COSINE, CompareConst.MAX_ABS_ERR, CompareConst.MAX_RELATIVE_ERR, + CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] + headers = CompareConst.COMPARE_RESULT_HEADER + id_list = [headers.index(x) for x in key_list] + ModeAdapter._match_data(value, compare_data, key_list, id_list) + # 获取一个节点所有的输入或输出最小的双千指标 + thousandth = value.get(CompareConst.ONE_THOUSANDTH_ERR_RATIO) + # 可能是None,可能是非数字内容str + try: + thousandth = float(thousandth) + except (ValueError, TypeError): + thousandth = None + if thousandth is not None: + numbers.append(thousandth) + node_data[key] = value + # 双千指标都是None的异常情况 + if not numbers: + min_thousandth = None + else: + min_thousandth = min(numbers + [min_thousandth]) + return min_thousandth + + @staticmethod + def _add_summary_compare_data( node_data, compare_data_dict): + precision_status = True + max_relative_err = 0 + for key, value in node_data.items(): + if not isinstance(value, dict): + continue + compare_data = compare_data_dict.get(key) + if compare_data: + # 对应比对结果csv的列 + key_list = [CompareConst.MAX_DIFF, CompareConst.MIN_DIFF, CompareConst.MEAN_DIFF, + CompareConst.NORM_DIFF, CompareConst.MAX_RELATIVE_ERR, CompareConst.MIN_RELATIVE_ERR, + CompareConst.MEAN_RELATIVE_ERR, CompareConst.NORM_RELATIVE_ERR] + headers = CompareConst.SUMMARY_COMPARE_RESULT_HEADER + id_list = [headers.index(x) for x in key_list] + ModeAdapter._match_data(value, compare_data, key_list, id_list) + # 相对误差大于0.5疑似有精度问题,小值域1e-3不比较相对误差 + for index, item in enumerate(key_list[4:]): + value_diff = value.get(key_list[index]) + if isinstance(value_diff, float) and value_diff != 0 and 
abs(value_diff) < GraphConst.SMALL_VALUE: + value[item] = ToolTip.SMALL_VALUE_TIP.format(key_list[index]) + continue + relative_err = str2float(value.get(item)) + max_relative_err = max(max_relative_err, relative_err) + node_data[key] = value + if max_relative_err > GraphConst.MAX_RELATIVE_ERR_TH: + precision_status = False + max_relative_err = 1 if max_relative_err > 1 else max_relative_err + precision_index = 1 - max_relative_err + return precision_status, precision_index + + @staticmethod + def _match_data(data_dict, compare_data, key_list, id_list): + """ + 绑定精度指标到node的input_data和output_data + """ + if len(key_list) != len(id_list): + return + for id, key in zip(id_list, key_list): + data = compare_data[id] + if data is not None and 'nan' not in str(data) and str(data) != ' ': + data_dict[key] = data + else: + data_dict[key] = 'null' + + def parse_result(self, node, compare_data_dict): + """ + 根据结果返回数据,分别是precision_status,precision_index,和附加数据 + """ + other_dict = {} + if self.is_md5_compare(): + precision_status_in = ModeAdapter._add_md5_compare_data(node.input_data, compare_data_dict[0]) + precision_status_out = ModeAdapter._add_md5_compare_data(node.output_data, compare_data_dict[1]) + # 所有输入输出md5对比通过,这个节点才算通过 + precision_status = precision_status_in and precision_status_out + precision_index = 1 if precision_status else 0 + other_result = CompareConst.PASS if precision_status else CompareConst.DIFF + other_dict[GraphConst.JSON_MD5_KEY] = other_result + elif self.is_summary_compare(): + precision_status_in, precision_index_in = ModeAdapter._add_summary_compare_data(node.input_data, compare_data_dict[0]) + precision_status_out, precision_index_out = ModeAdapter._add_summary_compare_data(node.output_data, compare_data_dict[1]) + precision_status = precision_status_in and precision_status_out + precision_index = min(precision_index_in, precision_index_out) + else: + min_thousandth_in = ModeAdapter._add_real_compare_data(node.input_data, compare_data_dict[0]) + min_thousandth_out = ModeAdapter._add_real_compare_data(node.output_data, compare_data_dict[0]) + if min_thousandth_in and min_thousandth_out: + change_percentage = abs(min_thousandth_in - min_thousandth_out) + else: + change_percentage = 0 + precision_status = True + if change_percentage > GraphConst.REAL_DATA_TH: + precision_status = False + precision_index = 0 if change_percentage > 1 else 1 - change_percentage + return precision_status, precision_index, other_dict + + def prepare_real_data(self, node): + """ + 为真实数据比较模式准备节点信息 + """ + if self.is_real_data_compare(): + self.compare_nodes.append(node) + return True + return False + + def is_summary_compare(self): + return self.compare_mode == GraphConst.SUMMARY_COMPARE + + def is_md5_compare(self): + return self.compare_mode == GraphConst.MD5_COMPARE + + def is_real_data_compare(self): + return self.compare_mode == GraphConst.REAL_DATA_COMPARE + + def add_csv_data(self, compare_result_list): + if not self.is_real_data_compare(): + return + self.csv_data.extend(compare_result_list) + + def add_error_key(self, node_data): + """ + 根据不同的模式进行提供不同错误信息 + """ + for key, value in node_data.items(): + if not isinstance(value, dict): + continue + if self.is_summary_compare(): + message = [CompareConst.MAX_RELATIVE_ERR, CompareConst.MIN_RELATIVE_ERR, + CompareConst.MEAN_RELATIVE_ERR, CompareConst.NORM_RELATIVE_ERR] + elif self.is_real_data_compare(): + message = [CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] + else: + # 输出件优化 + message = [] + 
value[GraphConst.ERROR_KEY] = message + node_data[key] = value + + def get_tool_tip(self): + """ + 用于前端展示字段的具体含义 + """ + if self.is_summary_compare(): + tips = { + CompareConst.MAX_DIFF: ToolTip.MAX_DIFF, + CompareConst.MIN_DIFF: ToolTip.MIN_DIFF, + CompareConst.MEAN_DIFF: ToolTip.MEAN_DIFF, + CompareConst.NORM_DIFF: ToolTip.NORM_DIFF} + elif self.is_md5_compare(): + tips = {Const.MD5: ToolTip.MD5} + else: + tips = { + CompareConst.ONE_THOUSANDTH_ERR_RATIO: ToolTip.ONE_THOUSANDTH_ERR_RATIO, + CompareConst.COSINE: ToolTip.COSINE, + CompareConst.MAX_ABS_ERR: ToolTip.MAX_ABS_ERR, + CompareConst.MAX_RELATIVE_ERR: ToolTip.MAX_RELATIVE_ERR} + return tips diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/graph/__init__.py b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/graph/base_node.py b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/base_node.py new file mode 100644 index 0000000000000000000000000000000000000000..f04f367f591244a6d1ed48529d1fb4aae7cb2453 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/base_node.py @@ -0,0 +1,107 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
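base_node.py below is the node data structure: add_upnode wires up the parent/child relation, and get_ancestors walks back up the chain. A small sketch with hypothetical node ids:

    from msprobe.pytorch.visualization.graph.base_node import BaseNode
    from msprobe.pytorch.visualization.graph.node_op import NodeOp

    root = BaseNode(NodeOp.module, 'DemoNet')
    mid = BaseNode(NodeOp.module, 'Module.fc.Linear.forward.0', up_node=root)
    leaf = BaseNode(NodeOp.function_api, 'Torch.matmul.0.forward', up_node=mid)
    print(leaf.get_ancestors())           # ['DemoNet', 'Module.fc.Linear.forward.0']
    print([n.id for n in root.subnodes])  # ['Module.fc.Linear.forward.0']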
+
+from .node_op import NodeOp
+from ..utils import Suggestions, GraphConst
+from ..builder.msprobe_adapter import format_node_data, compare_data
+
+
+class BaseNode:
+    def __init__(self, node_op, node_id, up_node=None):
+        self.op = node_op
+        self.id = node_id
+        self.data = {}
+        self.output_data = {}
+        self.input_data = {}
+        self.upnode = None
+        self.add_upnode(up_node)
+        self.subnodes = []
+        self.matched_node_link = []
+        self.suggestions = {}
+
+    def __str__(self):
+        info = f'id:\t{self.id}'
+        return info
+
+    def __eq__(self, other):
+        """
+        用来判断两个节点是否可以被匹配上,即结构上是否一致
+        """
+        if not compare_data(self.input_data, other.input_data):
+            return False
+        if not compare_data(self.output_data, other.output_data):
+            return False
+        return True
+
+    def get_suggestions(self):
+        """
+        精度疑似有问题时,提供一些建议
+        """
+        if self.op == NodeOp.module:
+            self.suggestions[GraphConst.SUGGEST_KEY] = Suggestions.Module
+            self.suggestions[Suggestions.PTDBG] = Suggestions.PTDBG_URL
+        elif self.op == NodeOp.function_api:
+            self.suggestions[GraphConst.SUGGEST_KEY] = Suggestions.API
+            self.suggestions[Suggestions.API_ACCURACY_CHECKER] = Suggestions.API_ACCURACY_CHECKER_URL
+
+    def set_input_output(self, input_data, output_data):
+        self.input_data = input_data
+        self.output_data = output_data
+
+    def add_upnode(self, node):
+        """
+        绑定upnode,用于对两个节点进行上下级关联
+        """
+        if not node or node.id == self.id or self.upnode:
+            return
+        self.upnode = node
+        node.subnodes.append(self)
+
+    def add_link(self, node, ancestors):
+        """
+        在节点匹配成功后进行匹配数据的录入
+        Args:
+            node: 和self相互匹配的节点
+            ancestors: 对端节点的祖先信息
+        """
+        self.matched_node_link = ancestors
+        node.matched_node_link = ancestors
+
+    def to_dict(self):
+        """
+        输出数据
+        """
+        result = {}
+        result['id'] = self.id
+        result['node_type'] = self.op.value
+        result['data'] = self.data
+        result['output_data'] = format_node_data(self.output_data)
+        result['input_data'] = format_node_data(self.input_data)
+        result['upnode'] = self.upnode.id if self.upnode else 'None'
+        result['subnodes'] = [node.id for node in self.subnodes]
+        result['matched_node_link'] = self.matched_node_link
+        result['suggestions'] = self.suggestions
+        return result
+
+    def get_ancestors(self):
+        """
+        获取节点所有祖先的列表
+        """
+        ancestors = []
+        current_node = self.upnode
+        while current_node:
+            ancestors.append(current_node.id)
+            current_node = current_node.upnode
+        return list(reversed(ancestors))
diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/graph/graph.py b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/graph.py
new file mode 100644
index 0000000000000000000000000000000000000000..6bae10ad3fc8a041d3ef2e8fb707d40a22b42f19
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/graph.py
@@ -0,0 +1,86 @@
+# Copyright (c) 2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
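graph.py below wraps the node map and implements exact matching: Graph.match succeeds only when the bench graph has a node with the same id, equal input/output structure, and the same ancestor chain. A minimal sketch with two identically built graphs, using hypothetical node ids:

    from msprobe.pytorch.visualization.graph.graph import Graph
    from msprobe.pytorch.visualization.graph.node_op import NodeOp

    def small_graph(name):
        g = Graph(name)
        g.add_node(NodeOp.function_api, 'Functional.linear.0.forward', g.root)
        return g

    g_n, g_b = small_graph('DemoNet'), small_graph('DemoNet')
    node_n = g_n.get_node('Functional.linear.0.forward')
    node_b, ancestors = Graph.match(g_n, node_n, g_b)
    print(node_b.id, ancestors)  # Functional.linear.0.forward ['DemoNet']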
+ +from .base_node import BaseNode +from .node_op import NodeOp +from ..utils import GraphConst + + +class Graph: + def __init__(self, model_name): + self.node_map = {} + self.add_node(NodeOp.module, model_name) + self.root = self.get_node(model_name) + + def __str__(self): + infos = [f'{str(self.node_map.get(node_id))}' for node_id in self.node_map] + info = "\n".join(infos) + return info + + @staticmethod + def match(graph_n, node_n, graph_b): + """ + 给定节点n,在另一个graph中匹配它对应的节点。前置条件是它的父节点匹配已经完成 + 目前采用完全匹配的方式,后续可能在这里加入一定的模糊匹配逻辑 + 返回匹配结果,匹配到的节点,以及祖先树。没匹配到则返回None, [] + """ + if not node_n or node_n.id not in graph_b.node_map: + return None, [] + node_b = graph_b.node_map.get(node_n.id) + if node_n != node_b: + return None, [] + ancestors_n = node_n.get_ancestors() + ancestors_b = node_b.get_ancestors() + if ancestors_n != ancestors_b: + return None, [] + return node_b, ancestors_n + + @staticmethod + def dfs(node, result): + info = node.to_dict() + result[node.id] = info + for subnode in node.subnodes: + Graph.dfs(subnode, result) + + def add_node(self, node_op, node_id, up_node=None): + """ + 在graph中进行节点的添加 + Args: + node_op: 需要添加的节点类型 + node_id: 需要添加的节点id + up_node:对应节点的父节点 + """ + if node_id in self.node_map: + return + node = BaseNode(node_op, node_id, up_node) + self.node_map[node_id] = node + + def get_node(self, node_id): + """ + 返回节点,不存在返回None + """ + return self.node_map.get(node_id, None) + + def to_dict(self): + """ + 用于数据输出 + """ + result = {} + result[GraphConst.JSON_ROOT_KEY] = self.root.id if self.root else 'None' + result[GraphConst.JSON_NODE_KEY] = {} + for node_id in self.node_map: + info = self.node_map.get(node_id).to_dict() + result[GraphConst.JSON_NODE_KEY][node_id] = info + return result diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/graph/node_op.py b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/node_op.py new file mode 100644 index 0000000000000000000000000000000000000000..1629caabd1989beac72646ea36efb4a82b328f3a --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/visualization/graph/node_op.py @@ -0,0 +1,37 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
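node_op.py below classifies node names with the op_patterns rules from msprobe_adapter.py; a name matching neither pattern raises instead of guessing. A brief illustration with assumed dump-style names:

    from msprobe.pytorch.visualization.graph.node_op import NodeOp

    print(NodeOp.get_node_op('Module.conv1.Conv2d.forward.0'))  # NodeOp.module
    print(NodeOp.get_node_op('Aten.add.Tensor.0.forward'))      # NodeOp.function_api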
+ +from enum import Enum +import re +from ..builder.msprobe_adapter import op_patterns + + +class NodeOp(Enum): + module = 0 + function_api = 1 + + @staticmethod + def get_node_op(node_name: str): + """ + 基于代表节点的字符串,解析节点种类 + """ + for op in NodeOp: + index = op.value + if index < 0 or index >= len(op_patterns): + raise Exception("NodeOp and op_patterns in MsprobeAdapter do not match") + pattern = op_patterns[index] + if re.match(pattern, node_name): + return op + raise Exception(f"Cannot parse node_name {node_name} into NodeOp") diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/test.py b/debug/accuracy_tools/msprobe/pytorch/visualization/test.py new file mode 100644 index 0000000000000000000000000000000000000000..165d54ce17ed295308c7fa52b4dc5251271453a8 --- /dev/null +++ b/debug/accuracy_tools/msprobe/pytorch/visualization/test.py @@ -0,0 +1,85 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import time +import shutil +import filecmp +from .compare.graph_comparator import GraphComparator +from .utils import GraphConst +from .builder.graph_builder import GraphBuilder +from ...pytorch.common.log import logger +from ...core.common.file_check import create_directory + + +def compare_graph(dump_path_n, dump_path_b, out_path): + # 对两个数据进行构图 + construct_path_n = os.path.join(dump_path_n, GraphConst.CONSTRUCT_FILE) + construct_path_b = os.path.join(dump_path_b, GraphConst.CONSTRUCT_FILE) + data_path_n = os.path.join(dump_path_n, GraphConst.DUMP_FILE) + data_path_b = os.path.join(dump_path_b, GraphConst.DUMP_FILE) + graph_n = GraphBuilder.build(construct_path_n, data_path_n, 'TestNet') + graph_b = GraphBuilder.build(construct_path_b, data_path_b, 'TestNet') + # 基于graph、stack和data进行比较 + stack_path = os.path.join(dump_path_n, GraphConst.STACK_FILE) + graph_comparator = GraphComparator([graph_n, graph_b], [data_path_n, data_path_b], stack_path, out_path) + graph_comparator.compare() + output_path = os.path.join(out_path, 'compare.vis') + GraphBuilder.to_json(output_path, graph_n, graph_b, graph_comparator.ma.get_tool_tip()) + + +def build_graph(dump_path, out_path): + construct_path = os.path.join(dump_path, GraphConst.CONSTRUCT_FILE) + data_path = os.path.join(dump_path, GraphConst.DUMP_FILE) + output_path = os.path.join(out_path, 'build.vis') + graph = GraphBuilder.build(construct_path, data_path, 'TestNet') + GraphBuilder.to_json(output_path, graph) + + +def run_st(data_path): + start_time = time.time() + run_bench(data_path, 'output2') + end_time = time.time() + logger.info(f'run_st time cost: {end_time - start_time}') + # 比较output2的结果和output1 的bench结果差距 + for data_dir in os.listdir(data_path): + data_dir = os.path.join(data_path, data_dir) + if not os.path.isdir(data_dir): + continue + output1 = os.path.join(data_dir, 'output1') + output2 = os.path.join(data_dir, 'output2') + files = ['build.vis', 'compare.vis'] + for vis_file in files: + file1 = os.path.join(output1, vis_file) + file2 = 
os.path.join(output2, vis_file)
+            result = filecmp.cmp(file1, file2)
+            if result:
+                logger.info('pass ' + file1)
+            else:
+                logger.info('not pass ' + file1)
+
+
+def run_bench(data_path, output_dir):
+    for data_dir in os.listdir(data_path):
+        data_dir = os.path.join(data_path, data_dir)
+        if not os.path.isdir(data_dir):
+            continue
+        run_data_path = os.path.join(data_dir, 'data')
+        output_path = os.path.join(data_dir, output_dir)
+        if os.path.exists(output_path):
+            shutil.rmtree(output_path)
+        create_directory(output_path)
+        build_graph(run_data_path, output_path)
+        compare_graph(run_data_path, run_data_path, output_path)
diff --git a/debug/accuracy_tools/msprobe/pytorch/visualization/utils.py b/debug/accuracy_tools/msprobe/pytorch/visualization/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..fb046f9758686fe810a05b1a23d76880b86bb994
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/pytorch/visualization/utils.py
@@ -0,0 +1,118 @@
+# Copyright (c) 2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+from ...core.common.file_check import FileOpen
+from ..compare.acc_compare import result_to_csv
+
+
+def load_json_file(file_path):
+    """
+    加载json文件
+    """
+    try:
+        with FileOpen(file_path, 'r') as f:
+            file_dict = json.load(f)
+            if not isinstance(file_dict, dict):
+                return {}
+            return file_dict
+    except json.JSONDecodeError:
+        return {}
+
+
+def load_data_json_file(file_path):
+    """
+    加载dump.json中的data字段
+    """
+    return load_json_file(file_path).get(GraphConst.DATA_KEY, {})
+
+
+def save_json_file(file_path, data):
+    """
+    保存json文件
+    """
+    with FileOpen(file_path, 'w') as f:
+        f.write(json.dumps(data, indent=4))
+
+
+def get_csv_df(md5_compare, summary_compare, stack, csv_data):
+    """
+    调用acc接口写入csv
+    """
+    return result_to_csv(md5_compare, summary_compare, stack, csv_data, None)
+
+
+def str2float(percentage_str):
+    """
+    百分比字符串转换为浮点型
+    Args:
+        percentage_str: '0.00%', '23.4%'
+    Returns: float 0.00, 0.234
+    """
+    try:
+        percentage_str = percentage_str.strip('%')
+        return float(percentage_str) / 100
+    except ValueError:
+        return 0
+
+
+class ToolTip:
+    MAX_DIFF = 'NPU与标杆API统计信息比对,最大值的差值'
+    MIN_DIFF = 'NPU与标杆API统计信息比对,最小值的差值'
+    MEAN_DIFF = 'NPU与标杆API统计信息比对,平均值的差值'
+    NORM_DIFF = 'NPU与标杆API统计信息比对,2范数(平方根)的差值'
+    MD5 = '数据MD5信息,用于比较两个数据信息是否完全一致'
+    ONE_THOUSANDTH_ERR_RATIO = 'Tensor中的元素逐个与对应的标杆数据对比,相对误差大于千分之一的元素个数占总元素个数的比例小于千分之一'
+    COSINE = '通过计算两个向量的余弦值来判断其相似度,数值越接近于1说明计算出的两个张量越相似,实际可接受阈值为大于0.99。在计算中可能会存在nan,主要由于可能会出现其中一个向量为0'
+    MAX_ABS_ERR = '当最大绝对误差越接近0表示其计算的误差越小,实际可接受阈值为小于0.001'
+    MAX_RELATIVE_ERR = '当最大相对误差越接近0表示其计算的误差越小。当dump数据中存在0或Nan时,比对结果中最大相对误差则出现inf或Nan的情况,属于正常现象'
+    SMALL_VALUE_TIP = '{} 小于1e-3,不计算相对误差'
+
+
+class Suggestions:
+    Module = '此模块精度比对结果疑似异常,请使用ptdbg工具对模块中的api进行dump比对'
+    API = '此api精度比对结果疑似异常,请使用api accuracy checker工具对api进行精度检测'
+    PTDBG = 'ptdbg工具'
+    PTDBG_URL = 'https://gitee.com/ascend/att/tree/master/debug/accuracy_tools/ptdbg_ascend'
+    API_ACCURACY_CHECKER = 'api accuracy checker工具'
+
API_ACCURACY_CHECKER_URL = 'https://gitee.com/ascend/att/tree/master/debug/accuracy_tools/api_accuracy_checker' + + +class GraphConst: + CONSTRUCT_FILE = 'construct.json' + DUMP_FILE = 'dump.json' + STACK_FILE = 'stack.json' + GRAPH_FILE = 'graph.vis' + ERROR_KEY = 'error_key' + SUMMARY_COMPARE = 0 + MD5_COMPARE = 1 + REAL_DATA_COMPARE = 2 + JSON_NPU_KEY = 'NPU' + JSON_BENCH_KEY = 'Bench' + JSON_TIP_KEY = 'Tooltip' + JSON_MD5_KEY = 'md5 Compare Result' + JSON_ROOT_KEY = 'root' + JSON_NODE_KEY = 'node' + DATA_KEY = 'data' + REAL_DATA_TH = 0.1 + MAX_RELATIVE_ERR_TH = 0.5 + ROUND_TH = 6 + JSON_STATUS_KEY = 'precision_status' + JSON_INDEX_KEY = 'precision_index' + SUGGEST_KEY = 'text' + TAG_NA = 'na' + OUTPUT_INDEX = -2 + STR_MAX_LEN = 50 + SMALL_VALUE = 1e-3 diff --git a/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/dump_no_pt_no_ms.json b/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/dump_no_pt_no_ms.json new file mode 100644 index 0000000000000000000000000000000000000000..63a062d8ffa264a0254fc2bab0208dcf951ae094 --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/dump_no_pt_no_ms.json @@ -0,0 +1,3 @@ +{ + "task": "tensor" +} \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/ms_dump_no_framework.json b/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/ms_dump_no_framework.json new file mode 100644 index 0000000000000000000000000000000000000000..b223c74b2315af1b9454e5f1e70c29502d449c56 --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/ms_dump_no_framework.json @@ -0,0 +1,4 @@ +{ + "task": "tensor", + "type": "mindspore.float16" +} \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/pt_dump_no_framework.json b/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/pt_dump_no_framework.json new file mode 100644 index 0000000000000000000000000000000000000000..2444ae1fd4096b083a9e8a0e51c9166bb990f51f --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/core_ut/common/test_dump_file/pt_dump_no_framework.json @@ -0,0 +1,4 @@ +{ + "task": "tensor", + "type": "torch.float16" +} \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/core_ut/common/test_utils.py b/debug/accuracy_tools/msprobe/test/core_ut/common/test_utils.py index 3472ca9018e189ffb48e4d26cfeb79e1ba1ff16d..69abbf12b4c7823e87ead0d055f20f7a12ed5ff4 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/common/test_utils.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/common/test_utils.py @@ -1,7 +1,7 @@ #!/usr/bin/env python3 # -*- coding: utf-8 -*- """ -# Copyright (C) 2024-2024. Huawei Technologies Co., Ltd. All rights reserved. +# Copyright (C) 2024-2025. Huawei Technologies Co., Ltd. All rights reserved. # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at @@ -18,11 +18,13 @@ import json import os import tempfile from datetime import datetime, timezone +import unittest from unittest import TestCase from unittest.mock import MagicMock, mock_open, patch import OpenSSL import numpy as np +from pathlib import Path from msprobe.core.common.const import Const from msprobe.core.common.file_utils import ( @@ -53,7 +55,8 @@ from msprobe.core.common.utils import (CompareException, recursion_depth_decorator, MsprobeBaseException, check_str_param, - is_json_file) + is_json_file, + detect_framework_by_dump_json) class TestUtils(TestCase): @@ -203,7 +206,7 @@ class TestUtils(TestCase): with self.assertRaises(CompareException) as context: set_dump_path(input_param) self.assertEqual(context.exception.code, CompareException.INVALID_PATH_ERROR) - mock_error.assert_called_with("Please check the json path is valid. npu_path: None, bench_path: bench_path") + mock_error.assert_called_with("Please check the json path is valid and ensure that neither npu_path nor bench_path is None.") @patch.object(logger, "error") def test_get_dump_mode(self, mock_error): @@ -488,3 +491,42 @@ class TestCheckCrtValid(TestCase): with self.assertRaises(RuntimeError) as context: check_crt_valid(self.cert_file_path) self.assertIn('The SSL certificate is invalid', str(context.exception)) + + +class TestDetectFrameworkByDumpJson(unittest.TestCase): + + @patch('msprobe.core.common.utils.load_json') + def test_valid_pytorch_framework(self, mock_load_json): + mock_load_json.return_value = {"framework": Const.PT_FRAMEWORK} + + result = detect_framework_by_dump_json("dummy_path") + + self.assertEqual(result, Const.PT_FRAMEWORK) + + @patch('msprobe.core.common.utils.load_json') + def test_valid_mindspore_framework(self, mock_load_json): + mock_load_json.return_value = {"framework": Const.MS_FRAMEWORK} + + result = detect_framework_by_dump_json("dummy_path") + + self.assertEqual(result, Const.MS_FRAMEWORK) + + def test_detect_framework_in_file(self): + self.current_dir = Path(__file__).parent + file_path = self.current_dir / "test_dump_file/pt_dump_no_framework.json" + result = detect_framework_by_dump_json(file_path) + self.assertEqual(result, Const.PT_FRAMEWORK) + + self.current_dir = Path(__file__).parent + file_path = self.current_dir / "test_dump_file/ms_dump_no_framework.json" + result = detect_framework_by_dump_json(file_path) + self.assertEqual(result, Const.MS_FRAMEWORK) + + @patch("msprobe.core.common.utils.logger") + def test_detect_framework_exception(self, mock_logger): + self.current_dir = Path(__file__).parent + file_path = self.current_dir / "test_dump_file/dump_no_pt_no_ms.json" + with self.assertRaises(CompareException) as context: + result = detect_framework_by_dump_json(file_path) + self.assertEqual(context.exception.code, CompareException.INVALID_PARAM_ERROR) + mock_logger.error.assert_called_once_with(f"{file_path} must be based on the MindSpore or PyTorch framework.") diff --git a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare.py b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare.py index b4566fcfe6f48d9040feb4dc22f3a96cd08719a7..94244be326e9954c700339abec2db16a2ab31b07 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare.py @@ -11,7 +11,7 @@ import torch from msprobe.core.common.const import CompareConst, Const from msprobe.core.common.utils import CompareException -from 
msprobe.core.compare.acc_compare import Comparator, ModeConfig, get_bench_data_name +from msprobe.core.compare.acc_compare import Comparator, ModeConfig from msprobe.core.compare.highlight import find_error_rows, find_compare_result_error_rows, ApiBatch from msprobe.core.compare.utils import get_accuracy from msprobe.pytorch.compare.pt_compare import PTComparator @@ -159,16 +159,16 @@ aten_result = [ -10.640625, -0.008758544921875, 5.397906303405762, -5.796811580657959, 2.5283952709287405e-10, 'Warning', 'Need double check api accuracy.', 'None'], ['Aten__native_batch_norm_legit_functional.default_0_forward.output.1', 'Nan', 'torch.float32', 'Nan', [256], 'Nan', - ' ', ' ', ' ', ' ', ' ', 0.30550330877304077, -0.24485322833061218, -0.010361209511756897, 'Nan', 'Nan', 'Nan', + ' ', ' ', ' ', ' ', ' ', ' ', 0.30550330877304077, -0.24485322833061218, -0.010361209511756897, 'Nan', 'Nan', 'Nan', 'Yes', '', 'None'], ['Aten__native_batch_norm_legit_functional.default_0_forward.output.2', 'Nan', 'torch.float32', 'Nan', [256], 'Nan', - ' ', ' ', ' ', ' ', ' ', 623.9192504882812, 432.96826171875, 520.2276611328125, 'Nan', 'Nan', 'Nan', + ' ', ' ', ' ', ' ', ' ', ' ', 623.9192504882812, 432.96826171875, 520.2276611328125, 'Nan', 'Nan', 'Nan', 'Yes', '', 'None'], ['Aten__native_batch_norm_legit_functional.default_0_forward.output.3', 'Nan', 'torch.float32', 'Nan', [256], 'Nan', - ' ', ' ', ' ', ' ', ' ', 2.4797861576080322, -3.055997371673584, -0.04795549064874649, 'Nan', 'Nan', 'Nan', + ' ', ' ', ' ', ' ', ' ', ' ', 2.4797861576080322, -3.055997371673584, -0.04795549064874649, 'Nan', 'Nan', 'Nan', 'Yes', '', 'None'], ['Aten__native_batch_norm_legit_functional.default_0_forward.output.4', 'Nan', 'torch.float32', 'Nan', [256], 'Nan', - ' ', ' ', ' ', ' ', ' ', 61.7945556640625, 42.59713363647461, 52.03831481933594, 'Nan', 'Nan', 'Nan', + ' ', ' ', ' ', ' ', ' ', ' ', 61.7945556640625, 42.59713363647461, 52.03831481933594, 'Nan', 'Nan', 'Nan', 'Yes', '', 'None']] highlight_dict = {'red_rows': [], 'yellow_rows': []} @@ -191,17 +191,21 @@ summary_line_3 = ['Functional_batch_norm_0_forward.output.2', 'Functional_batch_ 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0, 0, 0, 0, 2, 0, 1, 1, 1, 1, 1, 1, 'Warning', ''] line_input = ['Functional.batch.norm.0.forward.input.0', 'Functional.batch.norm.0.forward.input.0', 'torch.float16', - 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 1, 1, 1, 0.95, 1, 1, 1, 1, 1, 1.01, 1, 1, 1, + 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 1, 0.5, 1, 1, 0.95, 1, + 1, 1, 1, 1, 1.01, 1, 1, 1, 'Yes', ''] line_1 = ['Functional.batch.norm.0.forward.output.0', 'Functional.batch.norm.0.forward.output.0', 'torch.float16', - 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0.8, 1, 1, 0.59, 1, 'nan', 0, 1, 1, 19, 1, 1, 1, - 'Warning', ''] + 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0.8, 0.5, 1, 1, 0.59, 1, + 'nan', 0, 1, 1, 19, 1, 1, 1, + 'Yes', ''] line_2 = ['Functional.batch.norm.0.forward.output.1', 'Functional.batch.norm.0.forward.output.1', 'torch.float16', - 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0.9, 1, 1, 0.8, 1, 0, 0.12, 0, 1, 1, 0.1, 1, 1, 1, - 'Warning', ''] + 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0.9, 0.5, 1, 1, 0.8, 1, + 0, 0.12, 0, 1, 1, 0.1, 1, 1, + 'Yes', ''] line_3 = ['Functional.batch.norm.0.forward.output.2', 'Functional.batch.norm.0.forward.output.2', 'torch.float16', - 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0.8, 1.1e+10, 1, 0.85, 1, 9, 0.12, 0, 1, 1, 0.1, 1, 
- 1, 1, 'Warning', ''] + 'torch.float32', [256, 256, 14, 14], [256, 256, 14, 14], 0.8, 0.5, 1.1e+10, 1, 0.85, 1, + 9, 0.12, 0, 1, 1, 0.1, 1, 1, + 'Yes', ''] op_data = { 'input_args': [{'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [16, 1, 3, 3], @@ -363,7 +367,7 @@ class TestUtilsMethods(unittest.TestCase): 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', 'File']] result_all = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', - 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', + 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', 'File', '-1']] columns_md5_stack_mode_true = CompareConst.MD5_COMPARE_RESULT_HEADER + ['NPU_Stack_Info'] result_table_md5_true = pd.DataFrame(result_md5, columns=columns_md5_stack_mode_true, dtype=object) @@ -403,10 +407,10 @@ class TestUtilsMethods(unittest.TestCase): 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '']] result_all_test = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', - 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', + 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '', '-1']] result_all = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', - 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', + 'torch.float32', 'torch.float32', [2, 2], [2, 2], '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1']] columns_md5_stack_mode_true = CompareConst.MD5_COMPARE_RESULT_HEADER result_table_md5_true = pd.DataFrame(result_md5, columns=columns_md5_stack_mode_true, dtype='object') @@ -632,11 +636,11 @@ class TestUtilsMethods(unittest.TestCase): def test_do_multi_process(self): data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1']] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', ['-1', '-1']]] o_data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', - 'torch.float32', 'torch.float32', [2, 2], [2, 2], 'unsupported', 'unsupported', 'unsupported', - 'unsupported', 'unsupported', - 1, 1, 1, 1, 1, 1, 1, 1, 'None', 'No bench data matched.', '-1']] + 'torch.float32', 'torch.float32', [2, 2], [2, 2], + 'unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', + 1, 1, 1, 1, 1, 1, 1, 1, 'None', 'No bench data matched.', ['-1', '-1']]] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) o_result = pd.DataFrame(o_data, columns=columns) @@ -666,10 +670,10 @@ class TestUtilsMethods(unittest.TestCase): mode_config = ModeConfig(stack_mode, auto_analyze, fuzzy_match, dump_mode) pt_comparator = PTComparator(mode_config) - result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param, {}) + result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param) self.assertEqual(result, ['unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', - 'No bench data matched.']) + 'unsupported', 'No bench data matched.']) def test_compare_by_op_2(self): npu_op_name = 'Functional.linear.0.forward.input.0' @@ -684,42 
+688,22 @@ class TestUtilsMethods(unittest.TestCase): pt_comparator = PTComparator(mode_config) pt_name = '-1' - pt_path = os.path.join(base_dir, pt_name) - op_name_mapping_dict = {'Functional.linear.0.forward.input.0': [pt_path, pt_path]} + op_name_mapping_dict = {'Functional.linear.0.forward.input.0': [pt_name, pt_name]} input_param = {'npu_dump_data_dir': base_dir, 'bench_dump_data_dir': base_dir} - result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param, - {'Functional.linear.0.forward': {'input_args': [ - {'data_name': 'Functional.linear.0.forward.input.0.pt'}]}}) + result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param) self.assertEqual(result, ['unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', - f'Dump file: {pt_path} not found.']) + 'unsupported', 'No bench data matched.']) pt_name = 'Functional.linear.0.forward.input.0.pt' - pt_path = os.path.join(base_dir, pt_name) - op_name_mapping_dict = {'Functional.linear.0.forward.input.0': [pt_path, pt_path]} + op_name_mapping_dict = {'Functional.linear.0.forward.input.0': [pt_name, pt_name]} input_param = {'npu_dump_data_dir': base_dir, 'bench_dump_data_dir': base_dir} - result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param, {}) + result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param) self.assertEqual(result, ['unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', - 'Bench does not have data file.']) + 'unsupported', 'Dump file: Functional.linear.0.forward.input.0.pt not found.']) generate_pt(base_dir) - result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param, - {'Functional.linear.0.forward': {'input_args': [ - {'data_name': 'Functional.linear.0.forward.input.0.pt'}]}}) - self.assertEqual(result, [1.0, 0.0, 0.0, 1.0, 1.0, '']) - - def test_get_bench_data_name_input(self): - bench_op_name = "Functional.linear.0.forward.input.0" - bench_data = {"Functional.linear.0.forward": {"input_args": [{"data_name": "Functional.linear.0.forward.input.0.pt"}], "input_kwargs": {}, "output": []}} - result = get_bench_data_name(bench_op_name, bench_data) - - self.assertEqual(result, "Functional.linear.0.forward.input.0.pt") - - def test_get_bench_data_name_output(self): - bench_op_name = "Functional.linear.0.forward.output.0" - bench_data = {"Functional.linear.0.forward": {"input_args": [], "input_kwargs": {}, "output": [{"data_name": "Functional.linear.0.forward.output.0.pt"}]}} - result = get_bench_data_name(bench_op_name, bench_data) - - self.assertEqual(result, "Functional.linear.0.forward.output.0.pt") + result = pt_comparator.compare_by_op(npu_op_name, bench_op_name, op_name_mapping_dict, input_param) + self.assertEqual(result, [1.0, 0.0, 0.0, 0.0, 1.0, 1.0, '']) class TestComparator(unittest.TestCase): diff --git a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_npy_compare.py b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_npy_compare.py index aec6cdc51173ae817f32dd76455bec645659b45c..da315b657c8c1fc691136a1dbc56574d69c92076 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_npy_compare.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_npy_compare.py @@ -20,7 +20,7 @@ from unittest.mock import patch from msprobe.core.common.const import CompareConst from msprobe.core.compare.npy_compare import 
handle_inf_nan, reshape_value, get_error_flag_and_msg, \ npy_data_check, statistics_data_check, get_relative_err, GetCosineSimilarity, GetMaxAbsErr, GetMaxRelativeErr, \ - GetErrRatio, error_value_process, compare_ops_apply + GetErrRatio, error_value_process, compare_ops_apply, GetEuclideanDistance op_name = 'Functional.conv2d.0.backward.input.0' @@ -113,7 +113,7 @@ class TestUtilsMethods(unittest.TestCase): n_value, b_value, error_flag, err_msg = get_error_flag_and_msg(n_value, b_value, error_flag=error_flag) self.assertFalse(error_flag) - self.assertEqual(err_msg, "This is type of 0-d tensor, can not calculate 'Cosine', " + self.assertEqual(err_msg, "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', " "'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. ") def test_get_error_flag_and_msg_shape_unmatch(self): @@ -239,15 +239,17 @@ class TestUtilsMethods(unittest.TestCase): b_value_1 = np.array(1) relative_err = get_relative_err(n_value_1, b_value_1) n_value_1, b_value_1 = reshape_value(n_value_1, b_value_1) - result, err_msg = op.apply(n_value_1, b_value_1, relative_err) + err_msg = "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. " + result, err_msg = op.apply(n_value_1, b_value_1, relative_err, err_msg) self.assertEqual(result, CompareConst.UNSUPPORTED) - self.assertEqual(err_msg, "") + self.assertEqual(err_msg, "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. ") n_value_2 = np.array([1, 2]) b_value_2 = np.array([1, 2]) relative_err = get_relative_err(n_value_2, b_value_2) n_value_2, b_value_2 = reshape_value(n_value_2, b_value_2) - result, err_msg = op.apply(n_value_2, b_value_2, relative_err) + err_msg = "" + result, err_msg = op.apply(n_value_2, b_value_2, relative_err, err_msg) self.assertEqual(result, 1.0) self.assertEqual(err_msg, "") @@ -255,7 +257,8 @@ class TestUtilsMethods(unittest.TestCase): b_value_3 = np.array([0, 0]) relative_err = get_relative_err(n_value_3, b_value_3) n_value_3, b_value_3 = reshape_value(n_value_3, b_value_3) - result, err_msg = op.apply(n_value_3, b_value_3, relative_err) + err_msg = "" + result, err_msg = op.apply(n_value_3, b_value_3, relative_err, err_msg) self.assertEqual(result, 1.0) self.assertEqual(err_msg, "") @@ -263,7 +266,8 @@ class TestUtilsMethods(unittest.TestCase): b_value_4 = np.array([1, 2]) relative_err = get_relative_err(n_value_4, b_value_4) n_value_4, b_value_4 = reshape_value(n_value_4, b_value_4) - result, err_msg = op.apply(n_value_4, b_value_4, relative_err) + err_msg = "" + result, err_msg = op.apply(n_value_4, b_value_4, relative_err, err_msg) self.assertEqual(result, CompareConst.NAN) self.assertEqual(err_msg, 'Cannot compare by Cosine Similarity, All the data is Zero in npu dump data.') @@ -271,7 +275,8 @@ class TestUtilsMethods(unittest.TestCase): b_value_5 = np.array([0, 0]) relative_err = get_relative_err(n_value_5, b_value_5) n_value_5, b_value_5 = reshape_value(n_value_5, b_value_5) - result, err_msg = op.apply(n_value_5, b_value_5, relative_err) + err_msg = "" + result, err_msg = op.apply(n_value_5, b_value_5, relative_err, err_msg) self.assertEqual(result, CompareConst.NAN) self.assertEqual(err_msg, 'Cannot compare by Cosine Similarity, All the data is Zero in Bench dump data.') @@ -282,7 +287,9 @@ class TestUtilsMethods(unittest.TestCase): b_value_1 = np.array([1]) relative_err = get_relative_err(n_value_1, b_value_1) n_value_1, 
b_value_1 = reshape_value(n_value_1, b_value_1)
-        result, err_msg = op.apply(n_value_1, b_value_1, relative_err)
+        err_msg = ""
+
+        result, err_msg = op.apply(n_value_1, b_value_1, relative_err, err_msg)
         self.assertEqual(result, CompareConst.UNSUPPORTED)
         self.assertEqual(err_msg, "This is a 1-d tensor of length 1.")
 
@@ -294,8 +301,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([1, 1])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, CompareConst.NAN)
         self.assertEqual(err_msg, "Cannot compare by Cosine Similarity, the dump data has NaN.")
 
@@ -319,8 +327,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([0, 0])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, 2.0)
         self.assertEqual(err_msg, "")
 
@@ -333,8 +342,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([1, 1])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, CompareConst.NAN)
         self.assertEqual(err_msg, "Cannot compare by MaxAbsError, the data contains nan/inf/-inf in dump data.")
 
@@ -347,8 +357,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([1, 1])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, 1.0)
         self.assertEqual(err_msg, "")
 
@@ -361,8 +372,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([1, 1])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, CompareConst.NAN)
         self.assertEqual(err_msg, "Cannot compare by MaxRelativeError, the data contains nan/inf/-inf in dump data.")
 
@@ -375,8 +387,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([1, 1])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, 0.5)
         self.assertEqual(err_msg, "")
 
@@ -387,11 +400,12 @@ class TestUtilsMethods(unittest.TestCase):
         n_value = np.array(1)  # scalar (0-d) tensor
         b_value = np.array(1)
         relative_err = np.array(0)
+        err_msg = "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. "
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, CompareConst.UNSUPPORTED)
-        self.assertEqual(err_msg, "")
+        self.assertEqual(err_msg, "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. ")
 
     def test_GetThousandErrRatio_not_size(self):
         op = GetErrRatio(CompareConst.THOUSAND_RATIO_THRESHOLD)
@@ -399,8 +413,9 @@ class TestUtilsMethods(unittest.TestCase):
         n_value = np.array([1, 2])
         b_value = np.array([1, 2])
         relative_err = np.array([])  # empty array
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, CompareConst.NAN)
         self.assertEqual(err_msg, "")
 
@@ -412,8 +427,9 @@ class TestUtilsMethods(unittest.TestCase):
         b_value = np.array([1, 1])
         relative_err = get_relative_err(n_value, b_value)
         n_value, b_value = reshape_value(n_value, b_value)
+        err_msg = ""
 
-        result, err_msg = op.apply(n_value, b_value, relative_err)
+        result, err_msg = op.apply(n_value, b_value, relative_err, err_msg)
         self.assertEqual(result, 0.5)
         self.assertEqual(err_msg, "")
 
@@ -471,5 +487,34 @@ class TestUtilsMethods(unittest.TestCase):
         error_flag = False
         err_msg = ''
         a, b = compare_ops_apply(n_value, b_value, error_flag, err_msg)
-        self.assertEqual(a, [1.0, 0.0, 0.0, 1.0, 1.0])
+        self.assertEqual(a, [1.0, 0.0, 0.0, 0.0, 1.0, 1.0])
         self.assertEqual(b, '')
+
+
+class TestGetEuclideanDistance(unittest.TestCase):
+
+    def setUp(self):
+        self.euc_distance = GetEuclideanDistance()
+
+    def test_euclidean_distance_normal(self):
+        # Test computing the Euclidean distance between two tensors
+        n_value = np.array([1, 2, 3])
+        b_value = np.array([4, 5, 6])
+        relative_err = None
+        err_msg = ""
+
+        result, msg = self.euc_distance.apply(n_value, b_value, relative_err, err_msg)
+        expected_distance = np.linalg.norm(n_value - b_value)
+        self.assertEqual(result, expected_distance)
+        self.assertEqual(msg, '')
+
+    def test_euclidean_distance_0d_tensor(self):
+        # A 0-d tensor is unsupported; the incoming error message is passed through unchanged
+        n_value = np.array(1)
+        b_value = np.array(1)
+        relative_err = None
+        err_msg = "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. "
+
+        result, msg = self.euc_distance.apply(n_value, b_value, relative_err, err_msg)
+        self.assertEqual(result, CompareConst.UNSUPPORTED)
+        self.assertEqual(msg, "This is type of 0-d tensor, can not calculate 'Cosine', 'EucDist', 'One Thousandth Err Ratio' and 'Five Thousandths Err Ratio'. 
") diff --git a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_utils.py b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_utils.py index ab8703dcd353ff32dc0722fc314ade6042d6f567..bf23f4de1dac73a44a2497e1a927ba30e5440715 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_utils.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_acc_compare_utils.py @@ -221,28 +221,34 @@ o_result_unmatch_2 = [ 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None'] ] o_result_unmatch_3 = [ - ['Functional.conv2d.0.forward.input.0', 'N/A', 'torch.float32', 'N/A', [1, 1, 28, 28], 'N/A', 'N/A', 'N/A', 'N/A', - 'N/A', 'N/A', 3.029174327850342, -2.926689624786377, -0.06619918346405029, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', - 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.forward.input.1', 'N/A', 'torch.float32', 'N/A', [16, 1, 5, 5], 'N/A', 'N/A', 'N/A', 'N/A', - 'N/A', 'N/A', 0.19919930398464203, -0.19974489510059357, 0.006269412115216255, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', - 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.forward.input.2', 'N/A', 'torch.float32', 'N/A', [16], 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', - 'N/A', 0.19734230637550354, -0.18177609145641327, 0.007903944700956345, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', - 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.forward.parameters.weight', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', 'N/A', - 'N/A', 'N/A', - 'N/A', 'N/A', 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.forward.parameters.bias', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', 'N/A', 'N/A', - 'N/A', - 'N/A', 'N/A', 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.forward.output.0', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', 'N/A', 'N/A', 'N/A', - 'N/A', 'N/A', 2.1166646480560303, -2.190781354904175, -0.003579073818400502, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', - 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.parameters_grad.weight', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', 'N/A', 'N/A', 'N/A', - 'N/A', 'N/A', 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', '-1'], - ['Functional.conv2d.0.parameters_grad.bias', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', 'N/A', 'N/A', 'N/A', - 'N/A', 'N/A', 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', '-1'] + ['Functional.conv2d.0.forward.input.0', 'N/A', 'torch.float32', 'N/A', [1, 1, 28, 28], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 3.029174327850342, -2.926689624786377, -0.06619918346405029, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.forward.input.1', 'N/A', 'torch.float32', 'N/A', [16, 1, 5, 5], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 0.19919930398464203, -0.19974489510059357, 0.006269412115216255, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.forward.input.2', 'N/A', 'torch.float32', 'N/A', [16], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 0.19734230637550354, -0.18177609145641327, 0.007903944700956345, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.forward.parameters.weight', 
'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.forward.parameters.bias', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.forward.output.0', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 2.1166646480560303, -2.190781354904175, -0.003579073818400502, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.parameters_grad.weight', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', ['-1', '-1']], + ['Functional.conv2d.0.parameters_grad.bias', 'N/A', 'torch.float32', 'N/A', [1, 16, 28, 28], 'N/A', + 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', + 1.0, 1.0, 1.0, 1.0, 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'No bench data matched.', 'None', ['-1', '-1']] ] # test_merge_tensor @@ -558,7 +564,7 @@ class TestUtilsMethods(unittest.TestCase): dump_mode = Const.ALL result_item = result_item_init(n_info, b_info, dump_mode) self.assertEqual(result_item, ['Tensor.add.0.forward.input.0', 'Tensor.add.0.forward.input.0', - 'torch.float32', 'torch.float32', [96], [96], ' ', ' ', ' ', ' ', ' ']) + 'torch.float32', 'torch.float32', [96], [96], ' ', ' ', ' ', ' ', ' ', ' ']) dump_mode = Const.SUMMARY result_item = result_item_init(n_info, b_info, dump_mode) diff --git a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_highlight.py b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_highlight.py index f561a3e05ec84c3ee75dac50ed5aec2a2af7f7b5..3261bce5d6d0a15d8e46c7d9fc22df0cf64c9e4d 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_highlight.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_highlight.py @@ -26,7 +26,7 @@ def generate_result_xlsx(base_dir): data_path = os.path.join(base_dir, 'target_result.xlsx') data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] ] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) @@ -101,8 +101,8 @@ class TestUtilsMethods(unittest.TestCase): self.assertEqual(result, None) def test_CheckOneThousandErrorRatio_str(self): - api_in = [1, 1, 1, 1, 1, 1, 1, 1, 1, "unsupported"] - api_out = [1, 1, 1, 1, 1, 1, 1, 1, 1, "unsupported"] + api_in = [1, 1, 1, 1, 1, 1, 0.9, 0.5, 1, 1, "unsupported"] + api_out = [1, 1, 1, 1, 1, 1, 0.9, 0.5, 1, 1, "unsupported"] info = (api_in, api_out, 1) color_columns = () dump_mode = Const.ALL @@ -113,8 +113,8 @@ class TestUtilsMethods(unittest.TestCase): @patch("msprobe.core.compare.highlight.add_highlight_row_info") def test_CheckOneThousandErrorRatio_red(self, mock_add_highlight_row_info): - api_in = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] - api_out = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5] + api_in = [1, 1, 1, 1, 1, 1, 0.9, 0.5, 1, 1, 1] + api_out = [1, 1, 1, 1, 1, 1, 0.9, 0.5, 1, 1, 0.5] info = (api_in, api_out, 1) ColorColumns = 
namedtuple('ColorColumns', ['red', 'yellow']) color_columns = ColorColumns(red=[], yellow=[]) @@ -315,7 +315,7 @@ class TestUtilsMethods(unittest.TestCase): columns = CompareConst.COMPARE_RESULT_HEADER data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', ''] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', ''] ] result_df = pd.DataFrame(data, columns=columns) @@ -329,7 +329,7 @@ class TestUtilsMethods(unittest.TestCase): def test_highlight_rows_xlsx_red(self): data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] ] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) @@ -342,7 +342,7 @@ class TestUtilsMethods(unittest.TestCase): def test_highlight_rows_xlsx_yellow(self): data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] ] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) @@ -356,7 +356,7 @@ class TestUtilsMethods(unittest.TestCase): def test_highlight_rows_xlsx_malicious_columns(self, mock_save_book): data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] ] columns = CompareConst.COMPARE_RESULT_HEADER + ['=Data_name'] result_df = pd.DataFrame(data, columns=columns) @@ -378,10 +378,10 @@ class TestUtilsMethods(unittest.TestCase): def test_highlight_rows_xlsx_malicious_type(self, mock_save_book): data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', '=torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'], + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'], ['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', '=torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] ] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) @@ -416,10 +416,10 @@ class TestUtilsMethods(unittest.TestCase): def test_update_highlight_err_msg(self): data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'], + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'], ['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', '', '-1'] ] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) @@ -433,10 +433,10 @@ class 
TestUtilsMethods(unittest.TestCase): t_data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', 'a\nb', '-1'], + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', 'a\nb', '-1'], ['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', 'd', '-1'] + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, 'Yes', 'd', '-1'] ] target_result_df = pd.DataFrame(t_data, columns=columns) self.assertTrue(result_df.equals(target_result_df)) diff --git a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_multiprocessing_compute.py b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_multiprocessing_compute.py index 9c2dea835fea13af7902bf796d9ab06c9eb6a61b..49f084ce07c8e90afb2aa1c3340bb4c3965c8fa7 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_multiprocessing_compute.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/compare/test_cmp_multiprocessing_compute.py @@ -16,14 +16,14 @@ from test_acc_compare import generate_dump_json data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - '', '', '', '', '', + '', '', '', '', '', '', 1, 1, 1, 1, 1, 1, 1, 1, - 'Yes', '', '-1']] + 'Yes', '', ['-1', '-1']]] o_data = [['Functional.linear.0.forward.input.0', 'Functional.linear.0.forward.input.0', 'torch.float32', 'torch.float32', [2, 2], [2, 2], - 'unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', + 'unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', 'unsupported', 1, 1, 1, 1, 1, 1, 1, 1, - 'None', 'No bench data matched.', '-1']] + 'None', 'No bench data matched.', ['-1', '-1']]] columns = CompareConst.COMPARE_RESULT_HEADER + ['Data_name'] result_df = pd.DataFrame(data, columns=columns) o_result = pd.DataFrame(o_data, columns=columns) @@ -34,9 +34,9 @@ class TestUtilsMethods(unittest.TestCase): def setUp(self): self.result_df = pd.DataFrame(columns=[ - CompareConst.COSINE, CompareConst.MAX_ABS_ERR, CompareConst.MAX_RELATIVE_ERR, - CompareConst.ERROR_MESSAGE, CompareConst.ACCURACY, - CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO + CompareConst.COSINE, CompareConst.EUC_DIST, CompareConst.MAX_ABS_ERR, CompareConst.MAX_RELATIVE_ERR, + CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO, + CompareConst.ACCURACY, CompareConst.ERROR_MESSAGE ]) os.makedirs(base_dir, mode=0o750, exist_ok=True) self.lock = threading.Lock() @@ -54,9 +54,9 @@ class TestUtilsMethods(unittest.TestCase): func = Comparator(mode_config).compare_ops generate_dump_json(base_dir) - input_parma = {'bench_json_path': os.path.join(base_dir, 'dump.json')} + input_param = {'bench_json_path': os.path.join(base_dir, 'dump.json')} lock = multiprocessing.Manager().RLock() - result = _handle_multi_process(func, input_parma, result_df, lock) + result = _handle_multi_process(func, input_param, result_df, lock) self.assertTrue(result.equals(o_result)) def test_read_dump_data(self): @@ -72,9 +72,10 @@ class TestUtilsMethods(unittest.TestCase): cos_result=[0.99, 0.98], max_err_result=[0.01, 0.02], max_relative_err_result=[0.001, 0.002], - err_msgs=['', 'Error in comparison'], + euc_dist_result=[0.5, 0.49], one_thousand_err_ratio_result=[0.1, 0.2], - five_thousand_err_ratio_result=[0.05, 
0.1] + five_thousand_err_ratio_result=[0.05, 0.1], + err_msgs=['', 'Error in comparison'] ) offset = 0 updated_df = _save_cmp_result(offset, comparison_result, self.result_df, self.lock) @@ -88,9 +89,10 @@ class TestUtilsMethods(unittest.TestCase): cos_result=[0.99], max_err_result=[], max_relative_err_result=[0.001], - err_msgs=[''], + euc_dist_result=[0.5], one_thousand_err_ratio_result=[0.1], - five_thousand_err_ratio_result=[0.05] + five_thousand_err_ratio_result=[0.05], + err_msgs=[''] ) with self.assertRaises(CompareException) as context: _save_cmp_result(0, comparison_result, self.result_df, self.lock) diff --git a/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_mindspore_processor.py b/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_mindspore_processor.py index b593d34c5d86c7fb3b4a0e8a3ff548c55555e09d..7406e0d1cc7f896a8e9beb4839d1cf4603cff2f3 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_mindspore_processor.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_mindspore_processor.py @@ -66,15 +66,6 @@ class TestMindsporeDataProcessor(unittest.TestCase): self.assertEqual(result.mean, 2.0) self.assertEqual(result.norm, ms.ops.norm(tensor).item()) - def test_get_stat_info_float_async(self): - self.config.async_dump = True - tensor = ms.tensor([1.0, 2.0, 3.0]) - result = self.processor.get_stat_info(tensor).stack_tensor_stat[1] - self.assertEqual(result[0].item(), 3.0) - self.assertEqual(result[1].item(), 1.0) - self.assertEqual(result[2].item(), 2.0) - self.assertEqual(result[3].item(), ms.ops.norm(tensor).item()) - def test_get_stat_info_int(self): self.config.async_dump = False tensor = ms.Tensor([1, 2, 3], dtype=ms.int32) @@ -84,13 +75,6 @@ class TestMindsporeDataProcessor(unittest.TestCase): self.assertEqual(result.mean, 2) self.assertEqual(result.norm, ms.ops.norm(tensor).item()) - def test_get_stat_info_int_async(self): - self.config.async_dump = True - tensor = ms.tensor([1, 2, 3]) - result = self.processor.get_stat_info(tensor).stack_tensor_stat[1] - self.assertEqual(result[0].item(), 3.0) - self.assertEqual(result[1].item(), 1.0) - def test_get_stat_info_bool(self): self.config.async_dump = False tensor = ms.Tensor([True, False, True]) @@ -100,64 +84,6 @@ class TestMindsporeDataProcessor(unittest.TestCase): self.assertIsNone(result.mean) self.assertIsNone(result.norm) - def test_get_stat_info_bool_async(self): - self.config.async_dump = True - tensor = ms.Tensor([True, False, True]) - result = self.processor.get_stat_info(tensor).stack_tensor_stat[1] - self.assertEqual(result[0].item(), True) - self.assertEqual(result[1].item(), False) - - @patch.object(MindsporeDataProcessor, 'get_md5_for_tensor') - def test__analyze_tensor(self, get_md5_for_tensor): - get_md5_for_tensor.return_value = "test_md5" - tensor = ms.Tensor(np.array([1, 2, 3], dtype=np.int32)) - self.config.summary_mode = 'md5' - self.config.async_dump = False - suffix = "test_tensor" - expected_result = { - 'type': 'mindspore.Tensor', - 'dtype': 'Int32', - 'shape': (3,), - 'Max': 3, - 'Min': 1, - 'Mean': 2, - 'Norm': ms.ops.norm(tensor).item(), - 'md5': 'test_md5', - } - result = self.processor._analyze_tensor(tensor, suffix) - self.assertEqual(result, expected_result) - - -class TestTensorDataProcessor(unittest.TestCase): - - def setUp(self): - self.config = MagicMock() - self.data_writer = MagicMock() - self.processor = TensorDataProcessor(self.config, self.data_writer) - 
self.data_writer.dump_tensor_data_dir = "./dump_data" - self.processor.current_api_or_module_name = "test_api" - self.processor.api_data_category = "input" - - @patch('msprobe.core.data_dump.data_processor.mindspore_processor.save_tensor_as_npy') - def test_analyze_tensor(self, mock_save): - self.config.framework = "mindspore" - self.config.async_dump = False - tensor = ms.Tensor([1.0, 2.0, 3.0]) - suffix = 'suffix' - result = self.processor._analyze_tensor(tensor, suffix) - mock_save.assert_called_once() - expected = { - 'type': 'mindspore.Tensor', - 'dtype': str(tensor.dtype), - 'shape': tensor.shape, - 'Max': 3.0, - 'Min': 1.0, - 'Mean': 2.0, - 'Norm': ms.ops.norm(tensor).item(), - 'data_name': 'test_api.input.suffix.npy' - } - self.assertEqual(expected, result) - class TestOverflowCheckDataProcessor(unittest.TestCase): def setUp(self): @@ -218,57 +144,6 @@ class TestOverflowCheckDataProcessor(unittest.TestCase): self.data_processor.overflow_nums = 3 self.assertFalse(self.data_processor.is_terminated) - def test__analyze_maybe_overflow_tensor(self): - self.data_processor.has_overflow = False - tensor_json = {"Max": None, "Min": 0} - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertFalse(self.data_processor.has_overflow) - tensor_json.update({"Max": -np.inf}) - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.data_processor.has_overflow) - self.data_processor.has_overflow = False - tensor_json.update({"Max": np.inf}) - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.data_processor.has_overflow) - self.data_processor.has_overflow = False - tensor_json.update({"Max": np.nan}) - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.data_processor.has_overflow) - tensor_json.update({"Max": 0}) - self.data_processor.has_overflow = False - tensor_json.update({"Min": -np.inf}) - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.data_processor.has_overflow) - self.data_processor.has_overflow = False - tensor_json.update({"Min": np.inf}) - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.data_processor.has_overflow) - self.data_processor.has_overflow = False - tensor_json.update({"Min": np.nan}) - self.data_processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.data_processor.has_overflow) - - @patch("msprobe.core.data_dump.data_processor.mindspore_processor.logger.warning") - @patch.object(OverflowCheckDataProcessor, "get_save_file_path") - @patch.object(MindsporeDataProcessor, "_analyze_tensor") - def test__analyze_tensor(self, mock_super, mock_get_file_path, mock_warning): - mock_get_file_path.return_value = ("dump_data_name", "file_path") - single_arg = {"Max": None} - mock_super.return_value = single_arg - - with patch("msprobe.core.data_dump.data_processor.mindspore_processor.path_len_exceeds_limit", - return_value=False): - ret = self.data_processor._analyze_tensor("tensor", "suffix") - self.assertEqual(self.data_processor.cached_tensors_and_file_paths, {"file_path": "tensor"}) - mock_warning.assert_not_called() - mock_super.assert_called_with("tensor", "suffix") - self.assertEqual(ret.get("Max"), None) - self.assertEqual(ret.get("data_name"), "dump_data_name") - - with patch("msprobe.core.data_dump.data_processor.mindspore_processor.path_len_exceeds_limit", - return_value=True): - self.data_processor._analyze_tensor("tensor", "suffix") - 
mock_warning.assert_called_with("The file path file_path length exceeds limit.") class TestKernelDumpDataProcessor(unittest.TestCase): def setUp(self): diff --git a/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_pytorch_processor.py b/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_pytorch_processor.py index 34064e7cc2b9d0aa5c0c2e98806b8993137a589c..5cf644526be92f8788c2ef00761d0875fccc80d7 100644 --- a/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_pytorch_processor.py +++ b/debug/accuracy_tools/msprobe/test/core_ut/data_dump/data_processor/test_pytorch_processor.py @@ -1,3 +1,19 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +# Copyright (C) 2024-2025. Huawei Technologies Co., Ltd. All rights reserved. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +""" import hashlib import os import sys @@ -19,6 +35,7 @@ from msprobe.core.data_dump.data_processor.pytorch_processor import ( KernelDumpDataProcessor ) from torch import distributed as dist +from torch._subclasses import FakeTensorMode class TestPytorchDataProcessor(unittest.TestCase): @@ -62,6 +79,15 @@ class TestPytorchDataProcessor(unittest.TestCase): result = PytorchDataProcessor.get_stat_info(mock_data) self.assertIsInstance(result, TensorStatInfo) + def test_get_stat_info_with_fake_tensor(self): + with FakeTensorMode() as fake_tensor_mode: + fake_tensor = fake_tensor_mode.from_tensor(torch.randn(1, 2, 3)) + result = PytorchDataProcessor.get_stat_info(fake_tensor) + self.assertIsNone(result.max) + self.assertIsNone(result.min) + self.assertIsNone(result.mean) + self.assertIsNone(result.norm) + def test_get_stat_info_float(self): tensor = torch.tensor([1.0, 2.0, 3.0]) result = self.processor.get_stat_info(tensor) @@ -70,14 +96,6 @@ class TestPytorchDataProcessor(unittest.TestCase): self.assertEqual(result.mean, 2.0) self.assertEqual(result.norm, torch.norm(tensor).item()) - def test_get_stat_info_float_async(self): - tensor = torch.tensor([1.0, 2.0, 3.0]) - result = self.processor.get_stat_info_async(tensor).stack_tensor_stat[1] - self.assertEqual(result[0].item(), 3.0) - self.assertEqual(result[1].item(), 1.0) - self.assertEqual(result[2].item(), 2.0) - self.assertEqual(result[3].item(), torch.norm(tensor).item()) - def test_get_stat_info_int(self): tensor = torch.tensor([1, 2, 3], dtype=torch.int32) result = self.processor.get_stat_info(tensor) @@ -86,13 +104,6 @@ class TestPytorchDataProcessor(unittest.TestCase): self.assertEqual(result.mean, 2) self.assertEqual(result.norm, torch.norm(tensor.float()).item()) - def test_get_stat_info_int_async(self): - tensor = torch.tensor([1, 2, 3]) - result = self.processor.get_stat_info_async(tensor).stack_tensor_stat[1] - self.assertEqual(result[0].item(), 3.0) - self.assertEqual(result[1].item(), 1.0) - self.assertEqual(result[2].item(), 2.0) - self.assertEqual(result[3].item(), torch.norm(tensor.float()).item()) def test_get_stat_info_empty(self): tensor = torch.tensor([]) @@ -110,12 +121,6 @@ class 
TestPytorchDataProcessor(unittest.TestCase): self.assertIsNone(result.mean) self.assertIsNone(result.norm) - def test_get_stat_info_bool_async(self): - tensor = torch.tensor([True, False, True]) - result = self.processor.get_stat_info_async(tensor).stack_tensor_stat[1] - self.assertEqual(result[0].item(), True) - self.assertEqual(result[1].item(), False) - def test_get_stat_info_with_scalar_tensor(self): scalar_tensor = torch.tensor(42.0) result = PytorchDataProcessor.get_stat_info(scalar_tensor) @@ -291,39 +296,15 @@ class TestPytorchDataProcessor(unittest.TestCase): expected_result = self.processor._analyze_builtin(Ellipsis) self.assertEqual(result, expected_result) - @patch.object(PytorchDataProcessor, 'get_md5_for_tensor') - def test_analyze_tensor(self, get_md5_for_tensor): - get_md5_for_tensor.return_value = 'mocked_md5' - tensor = torch.tensor([1.0, 2.0, 3.0]) - self.config.summary_mode = 'md5' - self.config.async_dump = False - result = self.processor._analyze_tensor(tensor, 'suffix') - expected = { - 'type': 'torch.Tensor', - 'dtype': str(tensor.dtype), - 'shape': tensor.shape, - 'Max': 3.0, - 'Min': 1.0, - 'Mean': 2.0, - 'Norm': torch.norm(tensor).item(), - 'requires_grad': tensor.requires_grad, - 'md5': 'mocked_md5' - } - self.assertDictEqual(expected, result) - - def test_analyze_tensor_with_empty_tensor(self): - tensor = torch.tensor([]) - result = self.processor._analyze_tensor(tensor, 'suffix') - self.assertEqual(result['Max'], None) - self.assertEqual(result['Min'], None) - self.assertEqual(result['Mean'], None) - self.assertEqual(result['Norm'], None) + def test_cast_to_float_if_fp8(self): + tensor = MagicMock() + tensor.dtype = "torch.float8_e5m2" + _, dtype = self.processor._cast_to_float_if_fp8(tensor) + self.assertEqual(dtype, "torch.float8_e5m2") - def test_analyze_tensor_with_inf_and_nan(self): - tensor = torch.tensor([1.0, float('inf'), float('nan'), -float('inf')]) - result = self.processor._analyze_tensor(tensor, 'suffix') - self.assertEqual(result['Max_except_inf_nan'], 1.0) - self.assertEqual(result['Min_except_inf_nan'], 1.0) + tensor.dtype = "torch.float8_e4m3fn" + _, dtype = self.processor._cast_to_float_if_fp8(tensor) + self.assertEqual(dtype, "torch.float8_e4m3fn") class TestTensorDataProcessor(unittest.TestCase): @@ -336,27 +317,6 @@ class TestTensorDataProcessor(unittest.TestCase): self.processor.current_api_or_module_name = "test_api" self.processor.api_data_category = "input" - @patch('torch.save') - def test_analyze_tensor(self, mock_save): - self.config.framework = "pytorch" - self.config.async_dump = False - tensor = torch.tensor([1.0, 2.0, 3.0]) - suffix = 'suffix' - result = self.processor._analyze_tensor(tensor, suffix) - mock_save.assert_called_once() - expected = { - 'type': 'torch.Tensor', - 'dtype': 'torch.float32', - 'shape': tensor.shape, - 'Max': 3.0, - 'Min': 1.0, - 'Mean': 2.0, - 'Norm': torch.norm(tensor).item(), - 'requires_grad': False, - 'data_name': 'test_api.input.suffix.pt' - } - self.assertEqual(expected, result) - class TestOverflowCheckDataProcessor(unittest.TestCase): @@ -448,33 +408,6 @@ class TestOverflowCheckDataProcessor(unittest.TestCase): self.processor._is_support_inf_nan() self.assertTrue(self.processor.support_inf_nan) - def test_analyze_maybe_overflow_tensor(self): - tensor_json = {'Max': None, 'Min': None} - self.processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertFalse(self.processor.has_overflow) - - tensor_json = {'Max': float('inf'), 'Min': 1.0} - 
self.processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.processor.has_overflow) - - tensor_json = {'Max': 1.0, 'Min': float('inf')} - self.processor._analyze_maybe_overflow_tensor(tensor_json) - self.assertTrue(self.processor.has_overflow) - - @patch('msprobe.core.common.file_utils.path_len_exceeds_limit', return_value=False) - @patch.object(BaseDataProcessor, 'get_save_file_path', - return_value=['test_api_name', 'test_api_name.0.forward.input.pt']) - def test_analyze_tensor(self, mock_path_len_exceeds_limit, _): - tensor = torch.tensor([1.0, 2.0, 3.0]) - suffix = 'suffix' - expected = {'Max': 3.0, 'Min': 1.0, 'data_name': 'test_api_name'} - with patch.object(PytorchDataProcessor, '_analyze_tensor', - return_value={'Max': 3.0, 'Min': 1.0}) as mock_super_analyze_tensor: - result = self.processor._analyze_tensor(tensor, suffix) - mock_super_analyze_tensor.assert_called_once_with(tensor, suffix) - mock_path_len_exceeds_limit.assert_called_once() - self.assertEqual(expected, result) - class TestFreeBenchmarkDataProcessor(unittest.TestCase): diff --git a/debug/accuracy_tools/msprobe/test/core_ut/data_dump/test_api_registry.py b/debug/accuracy_tools/msprobe/test/core_ut/data_dump/test_api_registry.py new file mode 100644 index 0000000000000000000000000000000000000000..c67c5d8ee9efd201cdcf09bc82471cac1f6607c3 --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/core_ut/data_dump/test_api_registry.py @@ -0,0 +1,73 @@ +# Copyright (c) 2025-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +from unittest import TestCase +from unittest.mock import patch + +import torch + +from msprobe.core.common.const import Const +from msprobe.core.data_dump.api_registry import _get_attr, ApiWrapper + + +class TestFunctions(TestCase): + def test__get_attr(self): + module = torch + + attr_name = 'linalg.norm' + target_value = torch.linalg.norm + actual_value = _get_attr(module, attr_name) + self.assertEqual(target_value, actual_value) + + attr_name = 'norm' + target_value = torch.norm + actual_value = _get_attr(module, attr_name) + self.assertEqual(target_value, actual_value) + + +class TestApiWrapper(TestCase): + api_types = { + Const.PT_FRAMEWORK: { + Const.PT_API_TYPE_TORCH: (torch, torch), + } + } + supported_api_list_path = (Const.SUPPORT_API_FILE_NAME,) + yaml_value = {'torch': ['linalg.norm', 'norm']} + api_names = {Const.PT_FRAMEWORK: {'torch': {'linalg.norm', 'norm'}}} + + def test___init__(self): + with patch('msprobe.core.data_dump.api_registry.load_yaml', return_value=self.yaml_value): + api_wrapper = ApiWrapper(self.api_types, self.supported_api_list_path) + self.assertEqual(api_wrapper.api_types, self.api_types) + self.assertEqual(api_wrapper.api_list_paths, self.supported_api_list_path) + self.assertEqual(api_wrapper.api_names, self.api_names) + self.assertEqual(api_wrapper.wrapped_api_functions, {}) + + api_wrapper = ApiWrapper(self.api_types, Const.SUPPORT_API_FILE_NAME) + self.assertEqual(api_wrapper.api_list_paths, list(self.supported_api_list_path)) + + with self.assertRaises(Exception) as context: + api_wrapper = ApiWrapper(self.api_types, (Const.SUPPORT_API_FILE_NAME, Const.SUPPORT_API_FILE_NAME)) + self.assertEqual(str(context.exception), + "The number of api_list_paths must be equal to the number of frameworks in 'api_types', " + "when api_list_paths is a list or tuple.") + + def test__get_api_names(self): + target_value = self.api_names + with patch('msprobe.core.data_dump.api_registry.load_yaml', return_value=self.yaml_value): + api_wrapper = ApiWrapper(self.api_types, self.supported_api_list_path) + actual_value = api_wrapper._get_api_names() + self.assertEqual(target_value, actual_value) diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/api_accuracy_checker/test_data_manager.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/api_accuracy_checker/test_data_manager.py index bb4c8b197ef8362921858839ca3790224715a39a..9cfad00d8ff13e91eb84fff5f46ab434f9ed1d4d 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/api_accuracy_checker/test_data_manager.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/api_accuracy_checker/test_data_manager.py @@ -2,7 +2,8 @@ import unittest from unittest.mock import patch, mock_open, MagicMock import os from msprobe.mindspore.api_accuracy_checker.api_accuracy_checker import DataManager -from msprobe.core.common.const import MsCompareConst, CompareConst +from msprobe.core.common.const import CompareConst +from msprobe.mindspore.common.const import MsCompareConst class TestDataManager(unittest.TestCase): diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/mindspore_data/dump.json b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/mindspore_data/dump.json index 5b954f6d6443c92e6321e5f55e373e99f428653d..48800c0455c6651b146600e61e636d4dc25fac31 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/mindspore_data/dump.json +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/mindspore_data/dump.json @@ -1,6 +1,7 @@ { "task": 
"statistics", "level": "mix", + "framework": "mindspore", "dump_data_dir": null, "data": { "Tensor.__add__.0.forward": { diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/pytorch_data/dump.json b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/pytorch_data/dump.json index 150cbd43b169573e48542aa0c46c26e7df69843e..b2704185ff19b961b43453f81247236d77677d83 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/pytorch_data/dump.json +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/dump_file/pytorch_data/dump.json @@ -1,6 +1,7 @@ { "task": "statistics", "level": "mix", + "framework": "pytorch", "dump_data_dir": null, "data": { "Tensor.__add__.0.forward": { diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_compare.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_compare.py index b5cbff9784a837ea4d64ac9eccdf30175564f712..6f7377894002e60add41dc7b2d3c1d3d68391e0b 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_compare.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_compare.py @@ -5,8 +5,10 @@ import random import shutil import tempfile import unittest +from unittest.mock import patch import numpy as np +import pandas as pd import torch import yaml @@ -350,21 +352,21 @@ class TestUtilsMethods(unittest.TestCase): finally: shutil.rmtree(data_path) - def test_check_cross_framework(self): - ms_data = { - "data_name": "Cell.model.language_model.encoder.layers.5.input_norm.FusedRMSNorm.forward.0.input.0.npy", - } - pt_data = { - "data_name": "Module.module.module.language_model.encoder.layers.0.input_norm.RMSNorm.forward.0.input.0.pt", - } + @patch('msprobe.mindspore.compare.ms_compare.detect_framework_by_dump_json') + def test_check_cross_framework_valid_pytorch(self, mock_detect_framework): + mock_detect_framework.return_value = Const.PT_FRAMEWORK + + result = check_cross_framework("dummy_path") + + self.assertTrue(result) - def check_data(data): - with tempfile.NamedTemporaryFile(mode='w+', suffix='.json', encoding='utf-8', delete=True) as temp_file: - json.dump(data, temp_file, ensure_ascii=False, indent=4) - temp_file.flush() - return check_cross_framework(temp_file.name) - self.assertFalse(check_data(ms_data)) - self.assertTrue(check_data(pt_data)) + @patch('msprobe.mindspore.compare.ms_compare.detect_framework_by_dump_json') + def test_check_cross_framework_invalid_framework(self, mock_detect_framework): + mock_detect_framework.return_value = Const.MS_FRAMEWORK + + result = check_cross_framework("dummy_path") + + self.assertFalse(result) def test_comapre_process(self): data_path = tempfile.mkdtemp(prefix='dump_data', dir='/tmp') @@ -533,4 +535,28 @@ class TestUtilsMethods(unittest.TestCase): api_list = ["Mint"] with self.assertRaises(CompareException): - ms_comparator.get_api_name(api_list) \ No newline at end of file + ms_comparator.get_api_name(api_list) + + def test_process_data_name(self): + stack_mode = True + auto_analyze = True + fuzzy_match = False + dump_mode = Const.ALL + + mode_config = ModeConfig(stack_mode, auto_analyze, fuzzy_match, dump_mode) + mapping_config = MappingConfig() + ms_comparator = MSComparator(mode_config, mapping_config) + + data = pd.DataFrame({ + 'data_name_x': ['A', 'B', 'C'], + 'data_name_y': ['X', 'Y', 'Z'] + }) + + result = ms_comparator.process_data_name(data.copy()) + + expected = pd.DataFrame({ + 'data_name_x': [['A', 'X'], ['B', 'Y'], ['C', 'Z']], + 'data_name_y': ['X', 'Y', 'Z'] + }) 
+ + pd.testing.assert_frame_equal(result, expected) diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_graph_compare.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_graph_compare.py index e3fd9348efe7dd4df0a6db2cd52a45f4757dae01..c2e7c9368c3f049511657469ebb16388015b621a 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_graph_compare.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/compare/test_ms_graph_compare.py @@ -78,7 +78,7 @@ class TestMsGraphCompare(unittest.TestCase): result_correct = ( f"[['{npu_file_path}', '{bench_file_path}', dtype('float16'), dtype('float16'), (10, 10), (10, 10), " - f"44.0, 44.0, 44.0, inf, 44.0, 44.0, 44.0, inf, 'Yes', '', 1.0, 0.0, 0.0, 1.0, 1.0]]") + f"44.0, 44.0, 44.0, inf, 44.0, 44.0, 44.0, inf, 'Yes', '', 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]]") self.assertNotEqual(len(files), 0) self.assertEqual(result, result_correct) diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/debugger/test_ms_precision_debugger.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/debugger/test_ms_precision_debugger.py index 066ff537ce6fba12f712ae3d4681115499be35a6..790a02b4048cdade6679c0bdc94a03ee21863340 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/debugger/test_ms_precision_debugger.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/debugger/test_ms_precision_debugger.py @@ -94,10 +94,18 @@ class TestPrecisionDebugger(unittest.TestCase): self.assertTrue(Handler.called) def test_stop_step(self): + class MockConfig: + def __init__(self): + self.execution_mode = None + self.level = None + self.level_ori = Const.LEVEL_L1 + class MockPrecisionDebugger: def __init__(self): self.task = Const.TENSOR self.service = None + self.config = MockConfig() + PrecisionDebugger._instance = None with self.assertRaises(Exception) as context: PrecisionDebugger.stop() diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/free_benchmark/test_ms_api_pynative_self_check.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/free_benchmark/test_ms_api_pynative_self_check.py index e589dd4d58715d74644047f8c7e7a6ce79ccf225..4872527e4c29e4200a9f60459137425cfbf5d73d 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/free_benchmark/test_ms_api_pynative_self_check.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/free_benchmark/test_ms_api_pynative_self_check.py @@ -23,12 +23,18 @@ from mindspore import Tensor, mint, ops from msprobe.core.common.const import Const from msprobe.mindspore.common.const import FreeBenchmarkConst from msprobe.mindspore.common.log import logger -from msprobe.mindspore.dump.hook_cell.api_registry import api_register -from msprobe.mindspore.free_benchmark.api_pynative_self_check import (ApiPyNativeSelfCheck, check_all_tensor, - check_self, data_pre_deal, - deal_fuzzed_and_original_result, - get_module, get_supported_ops, - get_target_arg_index, need_wrapper_func) +from msprobe.mindspore.free_benchmark.api_pynative_self_check import ( + ApiPyNativeSelfCheck, + check_all_tensor, + check_self, + data_pre_deal, + deal_fuzzed_and_original_result, + get_module, + get_supported_ops, + get_target_arg_index, + need_wrapper_func, + _api_register +) from msprobe.mindspore.free_benchmark.common.config import Config from msprobe.mindspore.free_benchmark.common.handler_params import HandlerParams from msprobe.mindspore.free_benchmark.common.utils import Tools @@ -83,8 +89,8 @@ class TestApiPyNativeSelfCheck(TestCase): self.assertEqual(self_checker.ori_func, target_ori_func) def 
test_handle(self): - with patch.object(api_register, "initialize_hook") as mock_init_hook, \ - patch.object(api_register, "api_set_hook_func") as mock_set_hook: + with patch.object(_api_register, "initialize_hook") as mock_init_hook, \ + patch.object(_api_register, "register_all_api") as mock_set_hook: self.checker.handle() mock_init_hook.assert_called_with(self.checker.build_hook) mock_set_hook.assert_called_once() @@ -156,8 +162,8 @@ class TestApiPyNativeSelfCheck(TestCase): mock_warning.reset_mock() Config.stage = Const.FORWARD with patch.object(logger, "info") as mock_info, \ - patch.object(api_register, "api_set_ori_func") as mock_set_ori, \ - patch.object(api_register, "api_set_hook_func") as mock_set_hook, \ + patch.object(_api_register, "restore_all_api") as mock_set_ori, \ + patch.object(_api_register, "register_all_api") as mock_set_hook, \ patch("msprobe.mindspore.free_benchmark.api_pynative_self_check.deal_fuzzed_and_original_result", return_value="ret"): args = (1.0, 1.0) diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_dump_tool_factory.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_dump_tool_factory.py index 8f5d207c41923175b6efe4f9dc313896f879fd89..9abd7a56853b53fafbfb6b5abf4e4b3a4b4619cb 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_dump_tool_factory.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_dump_tool_factory.py @@ -63,7 +63,7 @@ class TestDumpToolFactory(TestCase): config.level = Const.CELL with self.assertRaises(Exception) as context: DumpToolFactory.create(config) - self.assertEqual(str(context.exception), "Data dump is not supported in graph_ge mode when dump level is cell.") + self.assertEqual(str(context.exception), "The model is empty and cell dump is not enabled.") config.execution_mode = Const.GRAPH_KBYK_MODE config.level = Const.KERNEL diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_debug_save.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_debug_save.py index 495eedbf41384f820c2ca054fd73192d1966a8bd..7af7dd89727d98a595103c6de9ee0106b0865665 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_debug_save.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_debug_save.py @@ -38,40 +38,4 @@ class TestMindsporeDebuggerSave(TestCase): task_config = BaseConfig(statistics_task_json) with patch("msprobe.mindspore.debugger.precision_debugger.parse_json_config", return_value=(common_config, task_config)), \ patch("msprobe.mindspore.debugger.precision_debugger.set_register_backward_hook_functions"): - self.debugger = PrecisionDebugger() - - def test_forward_and_backward(self): - def forward_func(x, y): - PrecisionDebugger.save(x, "x_tensor") - return x * y - x = mindspore.Tensor([1.]) - y = mindspore.Tensor([2.]) - result_json = { - "task": "statistics", - "level": "debug", - "framework": "mindspore", - "dump_data_dir": None, - "data": { - "x_tensor.0": { - "type": "mindspore.Tensor", - "dtype": "Float32", - "shape": (1,), - "Max": 1.0, - "Min": 1.0, - "Mean": 1.0, - "Norm": 1.0 - }, - "x_tensor_grad.0": { - "type": "mindspore.Tensor", - "dtype": "Float32", - "shape": (1,), - "Max": 2.0, - "Min": 2.0, - "Mean": 2.0, - "Norm": 2.0 - } - } - } - grad_fn = mindspore.value_and_grad(forward_func, (0, 1)) - grad_fn(x, y) - self.assertEqual(self.debugger.service.data_collector.data_writer.cache_debug, result_json) \ No newline at end of file + self.debugger = PrecisionDebugger() \ No newline at end of file diff --git 
a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_service.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_service.py index 912830ea1ab705aae63c69f5c240887d4b4ce5b7..288145f7a1915e4d5d4e5b7f773a3ce40141f3b8 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_service.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_ms_service.py @@ -21,12 +21,13 @@ from unittest.mock import MagicMock, patch from mindspore import nn, ops from msprobe.core.common.exceptions import MsprobeException -from msprobe.core.common.utils import Const, DumpPathAggregation +from msprobe.core.common.utils import Const +from msprobe.core.data_dump.api_registry import ApiRegistry from msprobe.core.data_dump.scope import BaseScope from msprobe.mindspore.cell_processor import CellProcessor from msprobe.mindspore.common.log import logger from msprobe.mindspore.common.utils import register_backward_hook_functions -from msprobe.mindspore.dump.hook_cell.api_registry import ApiRegistry, api_register +from msprobe.mindspore.dump.hook_cell.api_register import get_api_register from msprobe.mindspore.dump.hook_cell.hook_cell import HOOKCell from msprobe.mindspore.dump.jit_dump import JitDump from msprobe.mindspore.service import Service @@ -49,7 +50,7 @@ class TestService(unittest.TestCase): self.service.primitive_hook_service = MagicMock() def tearDown(self) -> None: - api_register.api_set_ori_func() + get_api_register().restore_all_api() def test_init(self): self.assertEqual(self.service.config.level, "L0") @@ -197,7 +198,7 @@ class TestService(unittest.TestCase): @patch.object(Service, 'need_end_service', return_value=False) @patch.object(JitDump, 'set_config') @patch.object(JitDump, 'set_data_collector') - @patch.object(ApiRegistry, 'api_set_hook_func') + @patch.object(ApiRegistry, 'register_all_api') def test_start_with_jit_dump_enabled(self, mock_api_set_hook_func, mock_set_data_collector, mock_set_config, mock_need_end_service, mock_register_cell_hook, mock_register_primitive_hook): @@ -218,10 +219,9 @@ class TestService(unittest.TestCase): HOOKCell.cell_count = {"test_api": 1} JitDump.jit_count = {"test_api": 1} self.service.primitive_hook_service.primitive_counters = {"test_api": 1} - self.service.current_iter = 0 + self.service.loop = 0 self.service.step() - self.assertEqual(self.service.current_iter, 1) - self.service.data_collector.update_iter.assert_called_once_with(1) + self.assertEqual(self.service.loop, 1) self.service.data_collector.reset_status.assert_called_once() self.assertEqual(JitDump.jit_count, defaultdict(int)) self.assertEqual((self.service.primitive_hook_service.primitive_counters), {}) @@ -269,7 +269,7 @@ class TestService(unittest.TestCase): primitive_combined_name) @patch.object(ApiRegistry, 'initialize_hook') - @patch.object(ApiRegistry, 'api_set_hook_func') + @patch.object(ApiRegistry, 'register_all_api') @patch("msprobe.mindspore.service.logger.info") def test_register_hook_new_with_level_mix(self, mock_logger, mock_api_set_hook_func, mock_initialize_hook): self.service.config.level = Const.LEVEL_MIX diff --git a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_primitive_dump.py b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_primitive_dump.py index 3cafd49f2c101c45dbb65a08803dd77c6bca485d..79deeee08e13273f08f32be26a375d1d26f5d2f1 100644 --- a/debug/accuracy_tools/msprobe/test/mindspore_ut/test_primitive_dump.py +++ b/debug/accuracy_tools/msprobe/test/mindspore_ut/test_primitive_dump.py @@ -84,9 +84,9 @@ class TestService(unittest.TestCase): 
self.assertEqual(self.service.primitive_hook_service.primitive_counters[primitive_name], 1)
 
     def test_step_updates_iteration(self):
-        initial_iter = self.service.current_iter
+        initial_iter = self.service.loop
         self.service.step()
-        self.assertEqual(self.service.current_iter, initial_iter + 1)
+        self.assertEqual(self.service.loop, initial_iter + 1)
 
     @patch.object(HOOKCell, 'cell_count', new_callable=lambda: defaultdict(int))
     def test_step_resets_counters(self, _):
@@ -96,12 +96,13 @@ class TestService(unittest.TestCase):
         self.assertEqual(self.service.primitive_hook_service.primitive_counters, {})
         self.assertEqual(HOOKCell.cell_count, defaultdict(int))
 
-    def test_step_calls_update_iter(self):
-        # Check that update_iter is called when step is invoked
+    def test_start_calls_update_iter(self):
+        # Check that update_iter is called when start is invoked
         with patch.object(self.service.data_collector, 'update_iter') as mock_update_iter:
-            initial_iter = self.service.current_iter
-            self.service.step()
-            mock_update_iter.assert_called_once_with(initial_iter + 1)
+            initial_iter = self.service.loop
+            init_step = self.service.init_step
+            self.service.start()
+            mock_update_iter.assert_called_once_with(initial_iter + init_step)
 
 
 class TestPrimitiveHookService(unittest.TestCase):
diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/common/test_config.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/common/test_config.py
index df03485dc6c77371750fd0b67ca2c37ff7e2ed7b..30fa11d94de0dd4fec483502a51d0474e8b7646a 100644
--- a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/common/test_config.py
+++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/common/test_config.py
@@ -16,6 +16,8 @@ class TestUtConfig():
         self.port = 8080
         self.rank_list = [0, 1, 2]
         self.tls_path = '/path/to/tls'
+        self.master_ip = '127.0.0.1'
+        self.master_port = 8888
 
 
 class TestConfig(unittest.TestCase):
diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_algorithm.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_algorithm.py
index 377a29f2237e2b3172e6fc35a712ff36cc69972d..f1cc0d31363c326c3412824f4a5a176b70da1a90 100644
--- a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_algorithm.py
+++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/compare/test_algorithm.py
@@ -208,3 +208,45 @@ class TestAlgorithmMethods(unittest.TestCase):
         ulp_err = alg.calc_ulp_err(self.bench_data, self.device_data, eb, exponent_num, data_type)
         expected_ulp_err = (self.device_data.astype(data_type) - self.bench_data).astype(data_type) * np.exp2(-eb + exponent_num)
         self.assertTrue(np.allclose(ulp_err, expected_ulp_err))
+
+
+class TestKahanLossRange(unittest.TestCase):
+
+    def setUp(self):
+        self.cumsum = torch.tensor(
+            [[1000, 30], [1, 20], [10, 10]], dtype=torch.bfloat16)
+        self.addend = torch.tensor([[3, 0.2]], dtype=torch.bfloat16)
+        self.tensors = [
+            torch.tensor([1000], dtype=torch.bfloat16),
+            torch.tensor([1004], dtype=torch.bfloat16),
+            torch.tensor([103], dtype=torch.bfloat16),
+            torch.tensor([4], dtype=torch.bfloat16)]
+
+    def test_kahan_loss_positive(self):
+        # Maximize the positive loss that needs compensation; loss_res is the maximum of the
+        # historical losses, and the mask hides entries below 0
+        loss_res, mask = alg.maximize_kahan_loss(self.cumsum, self.addend, negative=False)
+        expected_loss = torch.tensor([1, 0.0498], dtype=torch.bfloat16)
+        expected_mask = expected_loss >= 0
+        self.assertTrue(torch.allclose(loss_res, expected_loss))
+        self.assertTrue(torch.allclose(mask, expected_mask))
+
+    def test_kahan_loss_negative(self):
+        # Maximize the negative loss that needs compensation; loss_res is the minimum of the
+        # historical losses, and the mask hides entries above 0
+        loss_res, mask = alg.maximize_kahan_loss(self.cumsum, self.addend, negative=True)
+        expected_loss = torch.tensor([0, -0.0127], dtype=torch.bfloat16)
+        expected_mask = expected_loss <= 0
+        self.assertTrue(torch.allclose(loss_res, expected_loss))
+        self.assertTrue(torch.allclose(mask, expected_mask))
+
+    def test_kahan_range_empty_list(self):
+        # An empty input list should raise ValueError
+        with self.assertRaises(ValueError):
+            alg.kahan_range([])
+
+    def test_kahan_range_min_max(self):
+        max_ = alg.kahan_range(self.tensors, negative=True)
+        min_ = alg.kahan_range(self.tensors, negative=False)
+        expected_min = torch.tensor(2096, dtype=torch.bfloat16)
+        expected_max = torch.tensor(2112, dtype=torch.bfloat16)
+        self.assertTrue(torch.allclose(min_, expected_min))
+        self.assertTrue(torch.allclose(max_, expected_max))
diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_distributed_bench_function.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_distributed_bench_function.py
new file mode 100644
index 0000000000000000000000000000000000000000..0b21a9559e90acec80c9cb4726d8ce039ddb6a71
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_distributed_bench_function.py
@@ -0,0 +1,29 @@
+import torch
+import unittest
+
+from msprobe.pytorch.api_accuracy_checker.run_ut.distributed_bench_function import sort_all_input
+
+class TestSortAllInput(unittest.TestCase):
+    def setUp(self):
+        self.inputs = [
+            torch.tensor([3.0, 2.0, 1.0]),
+            torch.tensor([6.0, 5.0, 4.0]),
+            torch.tensor([9.0, 8.0, 7.0])
+        ]
+
+    def test_normal_case(self):
+        # Normal case: the tensors should come back in sorted order
+        sorted_inputs = sort_all_input(self.inputs)
+        expected_sorted_inputs = [
+            torch.tensor([9.0, 8.0, 7.0]),
+            torch.tensor([6.0, 5.0, 4.0]),
+            torch.tensor([3.0, 2.0, 1.0])
+        ]
+        for result, expected in zip(sorted_inputs, expected_sorted_inputs):
+            self.assertTrue(torch.equal(result, expected))
+
+    def test_single_tensor(self):
+        # A single-tensor input should be returned unchanged
+        single_input = [torch.tensor([2.0])]
+        sorted_inputs = sort_all_input(single_input)
+        self.assertTrue(torch.equal(sorted_inputs[0], single_input[0]))
diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_multi_run_ut.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_multi_run_ut.py
index 1ad191a0d4e85715e6199367d1d305c10a728630..8eb8fde4fdca88c97a4165f541f6dd6e7133303f 100644
--- a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_multi_run_ut.py
+++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_multi_run_ut.py
@@ -136,7 +136,7 @@ class TestMultiRunUT(unittest.TestCase):
 
     def setUp(self):
         self.test_json_file = os.path.join(os.path.dirname(os.path.realpath(__file__)), "dump.json")
-        self.test_data = {'data': {'key1': 'TRUE', 'key2': 'TRUE', 'key3': 'TRUE'}}
+        self.test_data = {'dump_data_dir': '/test', 'data': {'key1': 'TRUE', 'key2': 'TRUE', 'key3': 'TRUE'}}
         self.test_json_content = json.dumps(self.test_data)
         self.forward_split_files_content = [
            {'key1': 'TRUE', 'key2': 'TRUE'},
diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_run_ut_utils.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_run_ut_utils.py
index 0cf30461aec70b85577c38ebed011bf9f818874d..751d3f6affd10c82f9aeee941bed8cf5453daad8 100644
--- 
a/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_run_ut_utils.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/api_accuracy_checker/run_ut/test_run_ut_utils.py @@ -1,13 +1,28 @@ -# coding=utf-8 +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import unittest -from unittest.mock import patch, MagicMock + import torch + from msprobe.pytorch.api_accuracy_checker.run_ut.run_ut_utils import * from msprobe.core.common.file_utils import create_directory, write_csv class TestRunUtUtils(unittest.TestCase): - + def setUp(self): save_path = "temp_save_path" create_directory(save_path) diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/common/test_pt_utils.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/common/test_pt_utils.py index cdc922cc98d59b59ec0be85833d2000cd38913c8..0a25e6edf5983df968cd788e55348643e8098438 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/common/test_pt_utils.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/common/test_pt_utils.py @@ -1,17 +1,44 @@ -import os +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ import io +import os +import tempfile import unittest from unittest.mock import MagicMock, patch -import tempfile import torch import torch.distributed as dist - -from msprobe.core.common.file_utils import FileCheckConst from msprobe.core.common.exceptions import DistributedNotInitializedError +from msprobe.core.common.file_utils import FileCheckConst from msprobe.pytorch.api_accuracy_checker.common.utils import ApiData -from msprobe.pytorch.common.utils import parameter_adapter, get_rank_if_initialized, \ - get_tensor_rank, get_rank_id, print_rank_0, load_pt, save_pt, save_api_data, load_api_data, save_pkl, load_pkl +from msprobe.pytorch.common.utils import ( + parameter_adapter, + get_rank_if_initialized, + get_tensor_rank, + get_rank_id, + print_rank_0, + load_pt, + save_pt, + save_api_data, + load_api_data, + save_pkl, + load_pkl, + is_float8_tensor, + is_hifloat8_tensor +) class TestParameterAdapter(unittest.TestCase): @@ -19,7 +46,7 @@ class TestParameterAdapter(unittest.TestCase): def setUp(self): self.func_mock = MagicMock() self.decorated_func = parameter_adapter(self.func_mock) - self.op_name_ = "__getitem__" + self.api_name = "__getitem__" def test_handle_masked_select_bfloat16(self): input_tensor = torch.tensor([1.0, 2.0], dtype=torch.bfloat16) @@ -45,7 +72,7 @@ class TestParameterAdapter(unittest.TestCase): self.assertTrue(torch.equal(result, torch.tensor([20.0, 30.0]))) def test_op_name_eq_with_none(self): - self.op_name_ = "__eq__" + self.api_name = "__eq__" args = (torch.tensor([1]), None) result = self.decorated_func(self, *args) self.assertFalse(result) @@ -186,6 +213,12 @@ class TestSavePT(unittest.TestCase): self.tensor = torch.tensor([1, 2, 3]) self.filepath = 'temp_tensor.pt' + def tearDown(self): + try: + os.remove(self.filepath) + except FileNotFoundError: + pass + @patch('msprobe.pytorch.common.utils.save_pt') @patch('os.path.realpath', return_value='temp_tensor.pt') @patch('msprobe.core.common.file_utils.check_path_before_create') @@ -193,21 +226,6 @@ class TestSavePT(unittest.TestCase): def test_save_pt_success(self, mock_change_mode, mock_check_path, mock_realpath, mock_torch_save): mock_torch_save(self.tensor, self.filepath) mock_torch_save.assert_called_once_with(self.tensor, self.filepath) - mock_change_mode.assert_called_once_with(self.filepath, FileCheckConst.DATA_FILE_AUTHORITY) - -class TestSavePT(unittest.TestCase): - - def setUp(self): - self.tensor = torch.tensor([1, 2, 3]) - self.filepath = 'temp_tensor.pt' - - @patch('torch.save') - @patch('os.path.realpath', return_value='temp_tensor.pt') - @patch('msprobe.core.common.file_utils.check_path_before_create') - @patch('msprobe.core.common.file_utils.change_mode') - def test_save_pt_success(self, mock_change_mode, mock_check_path, mock_realpath, mock_torch_save): - save_pt(self.tensor, self.filepath) - mock_torch_save.assert_called_once_with(self.tensor, self.filepath) @patch('torch.save', side_effect=Exception("Save failed")) @patch('os.path.realpath', return_value='temp_tensor.pt') @@ -218,12 +236,6 @@ class TestSavePT(unittest.TestCase): save_pt(self.tensor, self.filepath) self.assertIn("save pt file temp_tensor.pt failed", str(context.exception)) - def tearDown(self): - try: - os.remove(self.filepath) - except FileNotFoundError: - pass - class TestSaveApiData(unittest.TestCase): @@ -299,3 +311,24 @@ class TestSavePkl(unittest.TestCase): load_pkl(self.filepath) self.assertIn("Unsupported object type: os.system", str(context.exception)) os.remove(self.filepath) + +class 
TestFloat8Tensor(unittest.TestCase): + def setUp(self): + self.tensor = MagicMock() + + def test_is_float8_tensor(self): + self.tensor.dtype = "torch.float8_e5m2" + res = is_float8_tensor(self.tensor) + self.assertTrue(res) + + self.tensor.dtype = "torch.float8_e4m3fn" + res = is_float8_tensor(self.tensor) + self.assertTrue(res) + + def test_is_not_float8_tensor(self): + self.tensor.dtype = 123 + res = is_float8_tensor(self.tensor) + self.assertFalse(res) + + res = is_hifloat8_tensor(self.tensor) + self.assertFalse(res) diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/bench.sh b/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/bench.sh new file mode 100644 index 0000000000000000000000000000000000000000..217676ef0f451b6b8f2d2cecb14545d9a7f8dd8b --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/bench.sh @@ -0,0 +1,25 @@ +MASTER_PORT=6000 +NNODES=1 +NODE_RANK=0 +CKPT_SAVE_DIR="your model save ckpt path" +DATA_PATH="your data path" +TOKENIZER_MODEL="your tokenizer path" +CKPT_LOAD_DIR="your model ckpt path" +TP=1 + +DISTRIBUTED_ARGS=" + --master_port $MASTER_PORT +" + +GPT_ARGS=" + --tensor-model-parallel-size ${TP} \ + --sequence-parallel \ + --tokenizer-model ${TOKENIZER_MODEL} \ +" + +torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \ + $GPT_ARGS \ + --distributed-backend nccl \ + --load $CKPT_LOAD_DIR \ + --save $CKPT_SAVE_DIR \ + | tee logs/train_llama2_7b.log \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/cmp.sh b/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/cmp.sh new file mode 100644 index 0000000000000000000000000000000000000000..8df9e6507975c7edbcfee105d838563171c720e4 --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/cmp.sh @@ -0,0 +1,25 @@ +MASTER_PORT=6001 +NNODES=1 +NODE_RANK=0 +CKPT_SAVE_DIR="./aaa" +DATA_PATH="./aaa" +TOKENIZER_MODEL="./aaa" +CKPT_LOAD_DIR="./aaa" +TP=2 + +DISTRIBUTED_ARGS=" + --master_port $MASTER_PORT +" + +GPT_ARGS=" + --tensor-model-parallel-size ${TP} \ + --sequence-parallel \ + --tokenizer-model ${TOKENIZER_MODEL} \ +" + +torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \ + $GPT_ARGS \ + --distributed-backend nccl \ + --load $CKPT_LOAD_DIR \ + --save $CKPT_SAVE_DIR \ + | tee logs/train_llama2_7b.log \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/test_config_checking.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/test_config_checking.py new file mode 100644 index 0000000000000000000000000000000000000000..95d6b7dbd7fbb81c053987bca986fe707e2d0818 --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/config_checking/test_config_checking.py @@ -0,0 +1,112 @@ +import os +import random +import shutil +import unittest +import torch +import json +import numpy as np +import torch.nn as nn +from msprobe.pytorch.config_checking.config_checker import ConfigChecker + +testdir = os.path.dirname(__file__) +config_checking_dir = os.path.dirname(testdir) +temp_dir = os.path.join(config_checking_dir, "temp") +os.makedirs(temp_dir, exist_ok=True) + + +def seed_all(seed=1234, mode=False): + random.seed(seed) + os.environ['PYTHONHASHSEED'] = str(seed) + np.random.seed(seed) + torch.manual_seed(seed) + torch.use_deterministic_algorithms(mode) + + +class MockModule(nn.Module): + def __init__(self): + super().__init__() + self.linear = nn.Linear(10, 5) + self.relu = nn.ReLU() + + def forward(self, x): + x1 = self.linear(x) + x2 = 
self.relu(x1) + return x2 + + +def get_test_dataset(): + inputs = [torch.rand(10, 10) for _ in range(10)] + labels = [torch.randint(0, 5, (10,)) for _ in range(10)] + return zip(inputs, labels) + + +def get_test_model(): + test_module = MockModule() + nn.init.constant_(test_module.linear.weight, 1.0) + nn.init.constant_(test_module.linear.bias, 1.0) + return test_module + + +@unittest.mock.patch("msprobe.pytorch.config_checking.checkers.pip_checker.collect_pip_data") +@unittest.mock.patch("msprobe.pytorch.config_checking.checkers.env_args_checker.collect_env_data") +def train_test(config_dict, seed, mock_env, mock_pip): + mock_env.return_value = {"HCCL_DETERMINISTIC": False} + if seed == 1234: + mock_pip.return_value = "transformers=0.0.1" + else: + mock_pip.return_value = "transformers=0.0.2" + seed_all(seed) + + loss_fun = nn.CrossEntropyLoss() + test_module = get_test_model() + optimizer = torch.optim.SGD(test_module.parameters(), lr=1e-2) + + config_path = os.path.join(temp_dir, "config.json") + json.dump(config_dict, open(config_path, 'w', encoding='utf-8')) + ConfigChecker(config_path, test_module) + + try: + for input_data, label in get_test_dataset(): + output = test_module(input_data) + loss = loss_fun(output, label) + optimizer.zero_grad() + loss.backward() + optimizer.step() + except Exception: + pass + + +class TestConfigChecker(unittest.TestCase): + def tearDown(self): + shutil.rmtree(temp_dir) + + def test_all(self): + config_dict1 = { + "env args": True, + "pip data": True, + "shell path": [os.path.join(testdir, "cmp.sh")], + "output zip path": os.path.join(temp_dir, "config_check_pack1.zip") + } + train_test(config_dict1, 1234) + + config_dict2 = { + "env args": True, + "pip data": True, + "shell path": [os.path.join(testdir, "bench.sh")], + "output zip path": os.path.join(temp_dir, "config_check_pack2.zip") + } + train_test(config_dict2, 1233) + + ConfigChecker.compare(config_dict1["output zip path"], + config_dict2["output zip path"], + os.path.join(temp_dir, "compare_output")) + + compare_output_dir = os.path.join(temp_dir, "compare_output", "output") + with open(os.path.join(compare_output_dir, "pip_data_check_result.txt"), 'r', encoding='utf-8') as file: + lines = file.readlines() + with open(os.path.join(compare_output_dir, "hyperparameter_diff.txt"), 'r', encoding='utf-8') as file: + hyperparameter_diff = file.readlines() + self.assertEqual(lines[1], " package_name:transformers, npu_version:0.0.1, bench_version:0.0.2\n") + self.assertEqual(len(hyperparameter_diff), 6) + self.assertEqual(hyperparameter_diff[1], + ' Parameter \'load\' (matched with \'load\'): load: "./aaa" -> "your model ckpt path"\n') diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/dump/test_module_dump.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/dump/test_module_dump.py index 63d6abc3a2430bb6f092820c4b97a02cdf675612..5aaf0820a78339ff4f1cc5d28aff8762bae31a39 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/dump/test_module_dump.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/dump/test_module_dump.py @@ -18,8 +18,10 @@ from unittest.mock import patch, MagicMock import torch import torch.nn as nn + +from msprobe.core.data_dump.api_registry import ApiRegistry from msprobe.pytorch import PrecisionDebugger -from msprobe.pytorch.hook_module.api_registry import api_register +from msprobe.pytorch.hook_module.api_register import get_api_register from msprobe.pytorch.service import torch_version_above_or_equal_2 @@ -27,12 +29,12 @@ class TestModuleDumper(unittest.TestCase): 
@classmethod def setUpClass(cls): PrecisionDebugger._instance = None - api_register.api_originality() + get_api_register().restore_all_api() @classmethod def tearDownClass(cls): PrecisionDebugger._instance = None - api_register.api_originality() + get_api_register().restore_all_api() def setUp(self): self.module = nn.Linear(8, 4) @@ -41,7 +43,7 @@ class TestModuleDumper(unittest.TestCase): def test_stop_module_dump(self): self.module_dumper.hook_handle_list.extend([1, 2, 3]) - with patch('msprobe.pytorch.dump.module_dump.module_dump.api_register') as mock_api_register: + with patch.object(ApiRegistry, 'register_all_api') as mock_api_register: mock_handle1 = MagicMock(spec=torch.utils.hooks.RemovableHandle) mock_handle2 = MagicMock(spec=torch.utils.hooks.RemovableHandle) self.module_dumper.hook_handle_list.extend([mock_handle1, mock_handle2]) @@ -50,7 +52,7 @@ class TestModuleDumper(unittest.TestCase): mock_handle1.remove.assert_called_once() mock_handle2.remove.assert_called_once() self.assertEqual(self.module_dumper.hook_handle_list, []) - mock_api_register.api_modularity.assert_called_once() + mock_api_register.assert_called_once() def test_register_hook(self): self.module_dumper.register_hook(self.module, "TestModule") diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_api_registry.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_api_registry.py deleted file mode 100644 index 837ad23df76be2a012a7408dab4879847937f229..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_api_registry.py +++ /dev/null @@ -1,130 +0,0 @@ -import unittest -from msprobe.pytorch.hook_module.api_registry import ApiRegistry, torch_version_above_2, is_gpu - - -class TestApiRegistry(unittest.TestCase): - - def test_store_ori_attr(self): - class A(): - a1 = 1 - class B(): - a = A() - b1 = 1 - b2 = 2 - - api_list = ["a.a1", "b1", "b2"] - expect_output = {"a.a1":1, "b1":1, "b2":2} - actual_output = dict() - ApiRegistry.store_ori_attr(B, api_list, actual_output) - self.assertEqual(actual_output, expect_output) - - - def test_set_api_attr(self): - class A(): - a1 = 1 - class B(): - a = A().__class__ - b1 = 1 - - attr_dict = {"a.a2":2, "b2":2, "b3":3} - ApiRegistry.set_api_attr(B, attr_dict) - - for k, v in attr_dict.items(): - if '.' 
in k:
-                sub_module_name, sub_op = k.rsplit('.', 1)
-                sub_module = getattr(B, sub_module_name, None)
-
-                self.assertEqual(getattr(sub_module, sub_op), v)
-            else:
-                self.assertEqual(getattr(B, k), v)
-
-    def test_api_modularity(self):
-
-        import torch
-        import torch.distributed as dist
-        #import torch_npu  # torch_npu is not installed in the CI environment
-        from msprobe.pytorch.hook_module.api_registry import torch_without_guard_version, npu_distributed_api, is_gpu, torch_version_above_2
-
-
-
-        reg = ApiRegistry()
-        attr_dict = {"b2":2, "b3":3}
-        reg.tensor_hook_attr = attr_dict
-        reg.torch_hook_attr = attr_dict
-        reg.functional_hook_attr = attr_dict
-        reg.distributed_hook_attr = attr_dict
-        reg.npu_distributed_hook_attr = attr_dict
-        reg.aten_hook_attr = attr_dict
-        reg.vf_hook_attr = attr_dict
-        reg.torch_npu_hook_attr = attr_dict
-
-        reg.api_modularity()
-        self.assertEqual(torch.Tensor.b2, 2)
-
-        self.assertEqual(torch.b2, 2)
-        self.assertEqual(torch.nn.functional.b2, 2)
-        self.assertEqual(dist.b2, 2)
-        self.assertEqual(dist.distributed_c10d.b2, 2)
-        #if not is_gpu and not torch_without_guard_version:
-            #self.assertEqual(torch_npu.distributed.b2, 2)
-            #self.assertEqual(torch_npu.distributed.distributed_c10d.b2, 2)
-        if torch_version_above_2:
-            self.assertEqual(torch.ops.aten.b2, 2)
-            self.assertEqual(torch._VF.b2, 2)
-        #if not is_gpu:
-            #self.assertEqual(torch_npu.b2, 2)
-
-
-    def test_api_originality(self):
-        import torch
-        import torch.distributed as dist
-        #import torch_npu  # torch_npu is not installed in the CI environment
-        from msprobe.pytorch.hook_module.api_registry import torch_without_guard_version, npu_distributed_api, is_gpu, torch_version_above_2
-
-
-
-        reg = ApiRegistry()
-        attr_dict = {"b2":2, "b3":3}
-        reg.tensor_hook_attr = attr_dict
-        reg.torch_hook_attr = attr_dict
-        reg.functional_hook_attr = attr_dict
-        reg.distributed_hook_attr = attr_dict
-        reg.npu_distributed_hook_attr = attr_dict
-        reg.aten_hook_attr = attr_dict
-        reg.vf_hook_attr = attr_dict
-        reg.torch_npu_hook_attr = attr_dict
-
-        reg.api_originality()
-        self.assertEqual(torch.Tensor.b2, 2)
-
-        self.assertEqual(torch.b2, 2)
-        self.assertEqual(torch.nn.functional.b2, 2)
-        self.assertEqual(dist.b2, 2)
-        self.assertEqual(dist.distributed_c10d.b2, 2)
-        #if not is_gpu and not torch_without_guard_version:
-            #self.assertEqual(torch_npu.distributed.b2, 2)
-            #self.assertEqual(torch_npu.distributed.distributed_c10d.b2, 2)
-        if torch_version_above_2:
-            self.assertEqual(torch.ops.aten.b2, 2)
-            self.assertEqual(torch._VF.b2, 2)
-        #if not is_gpu:
-            #self.assertEqual(torch_npu.b2, 2)
-
-    def test_initialize_hook(self):
-        def hook_test():
-            pass
-
-        reg = ApiRegistry()
-        reg.initialize_hook(hook_test)
-        empty_list = []
-        self.assertFalse(empty_list==reg.tensor_hook_attr)
-        self.assertFalse(empty_list==reg.torch_hook_attr)
-        self.assertFalse(empty_list==reg.functional_hook_attr)
-        self.assertFalse(empty_list==reg.distributed_hook_attr)
-        self.assertFalse(empty_list==reg.npu_distributed_hook_attr)
-        if torch_version_above_2:
-            #print(True)
-            self.assertFalse(empty_list==reg.aten_hook_attr)
-        if not is_gpu:
-            #print(True)
-            self.assertFalse(empty_list==reg.torch_npu_hook_attr)
\ No newline at end of file
diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_distributed.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_distributed.py
deleted file mode 100644
index 246feb56becf9942de9214f5b24b8471e9b4024a..0000000000000000000000000000000000000000
--- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_distributed.py
+++ /dev/null
@@ 
-1,41 +0,0 @@ -import unittest -import torch.distributed as dist -from msprobe.pytorch.hook_module.wrap_distributed import * - -class TestWrapDistributed(unittest.TestCase): - def hook(name, prefix): - def forward_pre_hook(nope, input, kwargs): - return input, kwargs - - def forward_hook(nope, input, kwargs, result): - return 2 - - def backward_hook(): - pass - - def forward_hook_torch_version_below_2(): - pass - - return forward_pre_hook, forward_hook, backward_hook, forward_hook_torch_version_below_2 - - def test_get_distributed_ops(self): - ops = get_distributed_ops() - self.assertIsInstance(ops, set) - - def test_DistributedOPTemplate(self): - self.setUp() - op_name = 'all_reduce' - if op_name in get_distributed_ops(): - op = DistributedOPTemplate(op_name, self.hook) - self.assertEqual(op.op_name_, op_name) - - def test_wrap_distributed_op(self): - op_name = 'all_reduce' - if op_name in get_distributed_ops(): - wrapped_op = wrap_distributed_op(op_name, self.hook) - self.assertTrue(callable(wrapped_op)) - - def test_wrap_distributed_ops_and_bind(self): - wrap_distributed_ops_and_bind(self.hook) - for op_name in get_distributed_ops(): - self.assertTrue(hasattr(HOOKDistributedOP, "wrap_" + str(op_name))) \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_functional.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_functional.py deleted file mode 100644 index 282551e3cefdb2ae63efda284f5e7ae7482ae81c..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_functional.py +++ /dev/null @@ -1,73 +0,0 @@ -import unittest -import torch -import torch.nn.functional as F -from msprobe.pytorch.hook_module.wrap_functional import get_functional_ops, \ - wrap_functional_ops_and_bind, HOOKFunctionalOP -from msprobe.pytorch.common.utils import remove_dropout - - -class TestDropoutFunctions(unittest.TestCase): - - def setUp(self): - self.input_tensor = torch.ones(10, 10) - remove_dropout() - - def test_function_dropout_no_dropout(self): - output = F.dropout(self.input_tensor, p = 0., training = True) - self.assertTrue(torch.equal(self.input_tensor, output)) - - def test_function_dropout_train_vs_eval(self): - output_train = F.dropout(self.input_tensor, p = 0., training = True) - output_eval = F.dropout(self.input_tensor, p = 0., training = False) - self.assertTrue(torch.equal(output_train, output_eval)) - - def test_function_dropout_invalid_probability(self): - with self.assertRaises(ValueError): - F.dropout(self.input_tensor, p = -0.1) - with self.assertRaises(ValueError): - F.dropout(self.input_tensor, p = 1.1) - - def test_function_dropout2d_no_dropout(self): - output = F.dropout2d(self.input_tensor, p = 0., training = True) - self.assertTrue(torch.equal(self.input_tensor, output)) - - def test_function_dropout2d_train_vs_eval(self): - output_train = F.dropout2d(self.input_tensor, p = 0., training = True) - output_eval = F.dropout2d(self.input_tensor, p = 0., training = False) - self.assertTrue(torch.equal(output_train, output_eval)) - - def test_function_dropout2d_invalid_probability(self): - with self.assertRaises(ValueError): - F.dropout2d(self.input_tensor, p = -0.1) - with self.assertRaises(ValueError): - F.dropout2d(self.input_tensor, p = 1.1) - - def test_function_dropout3d_no_dropout(self): - input_tensor_3d = self.input_tensor.unsqueeze(0) - output = F.dropout3d(input_tensor_3d, p = 0., training = True) - self.assertTrue(torch.equal(input_tensor_3d, output)) 
- - def test_function_dropout3d_train_vs_eval(self): - input_tensor_3d = self.input_tensor.unsqueeze(0) - output_train = F.dropout3d(input_tensor_3d, p = 0., training = True) - output_eval = F.dropout3d(input_tensor_3d, p = 0., training = False) - self.assertTrue(torch.equal(output_train, output_eval)) - - def test_function_dropout3d_invalid_probability(self): - input_tensor_3d = self.input_tensor.unsqueeze(0) - with self.assertRaises(ValueError): - F.dropout3d(input_tensor_3d, p = -0.1) - with self.assertRaises(ValueError): - F.dropout3d(input_tensor_3d, p = 1.1) - - -class TestWrapFunctional(unittest.TestCase): - - def test_get_functional_ops(self): - expected_ops = {'relu', 'sigmoid', 'softmax'} - actual_ops = get_functional_ops() - self.assertTrue(expected_ops.issubset(actual_ops)) - - def test_wrap_functional_ops_and_bind(self): - wrap_functional_ops_and_bind(None) - self.assertTrue(hasattr(HOOKFunctionalOP, 'wrap_relu')) diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_npu_custom.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_npu_custom.py deleted file mode 100644 index 573d6d000f37f429619b89507cecd1258fbe4c8b..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_npu_custom.py +++ /dev/null @@ -1,43 +0,0 @@ -import unittest -from unittest.mock import MagicMock, patch - -from msprobe.core.common.const import Const -from msprobe.core.common.log import logger -from msprobe.pytorch.function_factory import npu_custom_functions -from msprobe.pytorch.hook_module.wrap_npu_custom import NpuOPTemplate - -try: - import torch_npu -except ImportError: - logger.info("Failing to import torch_npu.") - - -class TestNpuOPTemplate(unittest.TestCase): - - def setUp(self): - self.mock_hook = MagicMock(return_value=(MagicMock(), MagicMock(), MagicMock(), None)) - self.template = NpuOPTemplate("sum", self.mock_hook) - - def test_init(self): - self.assertEqual(self.template.op_name_, "sum") - self.assertEqual(self.template.prefix_op_name_, f"NPU{Const.SEP}sum{Const.SEP}") - self.assertTrue(self.template.need_hook) - self.assertEqual(self.template.device, Const.CPU_LOWERCASE) - - @patch('torch.ops.npu.sum') - def test_forward_without_hook(self, mock_npu_sum): - self.template.need_hook = False - npu_custom_functions["sum"] = MagicMock(return_value="output_from_custom") - - result = self.template.forward(1, 2, key='value') - self.assertEqual(result, "output_from_custom") - mock_npu_sum.assert_not_called() - - @patch('torch.ops.npu.sum') - def test_forward_with_hook(self, mock_npu_sum): - self.template.need_hook = True - mock_npu_sum.return_value = "output_from_npu" - - result = self.template.forward(1, 2, key='value') - self.assertEqual(result, "output_from_npu") - mock_npu_sum.assert_called_once_with(1, 2, key='value') diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_tensor.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_tensor.py deleted file mode 100644 index 6868c5bda7a88c84702d15e995c7f60af2b4e4c5..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_tensor.py +++ /dev/null @@ -1,40 +0,0 @@ -import unittest -import torch -from msprobe.pytorch.hook_module.wrap_tensor import get_tensor_ops, HOOKTensor, TensorOPTemplate, wrap_tensor_op, wrap_tensor_ops_and_bind - -class TestWrapTensor(unittest.TestCase): - - def hook(name, prefix): - def forward_pre_hook(nope, input, kwargs): - 
return input, kwargs - - def forward_hook(nope, input, kwargs, result): - return 2 - - def backward_hook(): - pass - - def forward_hook_torch_version_below_2(): - pass - - return forward_pre_hook, forward_hook, backward_hook, forward_hook_torch_version_below_2 - - def test_get_tensor_ops(self): - result = get_tensor_ops() - self.assertIsInstance(result, set) - - def test_HOOKTensor(self): - hook_tensor = HOOKTensor() - self.assertIsInstance(hook_tensor, HOOKTensor) - - def test_TensorOPTemplate(self): - tensor_op_template = TensorOPTemplate('add', self.hook) - self.assertTrue(tensor_op_template.op_name_, 'add') - - def test_wrap_tensor_op(self): - wrapped_op = wrap_tensor_op('add', self.hook) - self.assertTrue(callable(wrapped_op)) - - def test_wrap_tensor_ops_and_bind(self): - wrap_tensor_ops_and_bind(self.hook) - self.assertTrue(hasattr(HOOKTensor, 'wrap_add')) \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_torch.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_torch.py deleted file mode 100644 index e0e4d000c0bd83be4facbbb406357427faf875ec..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_torch.py +++ /dev/null @@ -1,48 +0,0 @@ -import unittest -import torch -from msprobe.pytorch.hook_module.wrap_torch import * - -class TestWrapTorch(unittest.TestCase): - - def hook(name, prefix): - def forward_pre_hook(nope, input, kwargs): - return input, kwargs - - def forward_hook(nope, input, kwargs, result): - return 2 - - def backward_hook(): - pass - - def forward_hook_torch_version_below_2(): - pass - - return forward_pre_hook, forward_hook, backward_hook, forward_hook_torch_version_below_2 - - def setUp(self): - - self.op_name = 'add' - self.torch_op = wrap_torch_op(self.op_name, self.hook) - - def test_get_torch_ops(self): - self.setUp() - ops = get_torch_ops() - self.assertIsInstance(ops, set) - self.assertIn(self.op_name, ops) - - def test_TorchOPTemplate(self): - self.setUp() - template = TorchOPTemplate(self.op_name, self.hook) - self.assertEqual(template.op_name_, self.op_name) - self.assertEqual(template.prefix_op_name_, "Torch." 
+ str(self.op_name) + ".") - - def test_forward(self): - self.setUp() - template = TorchOPTemplate(self.op_name, self.hook) - result = template.forward(torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6])) - torch.testing.assert_close(result, torch.tensor([5, 7, 9])) - - def test_wrap_torch_ops_and_bind(self): - self.setUp() - wrap_torch_ops_and_bind(self.hook) - self.assertTrue(hasattr(HOOKTorchOP, "wrap_" + self.op_name)) \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_vf.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_vf.py deleted file mode 100644 index 98efb4bc5b8a30284fe820124e48af7f487d1c54..0000000000000000000000000000000000000000 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/hook_module/test_wrap_vf.py +++ /dev/null @@ -1,11 +0,0 @@ -import unittest -import torch -from msprobe.pytorch.hook_module import wrap_vf - -class TestWrapVF(unittest.TestCase): - def setUp(self): - self.hook = lambda x: x - - def test_get_vf_ops(self): - ops = wrap_vf.get_vf_ops() - self.assertIsInstance(ops, list) \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/demo_model.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/demo_model.py index f5de419440224cca261b62df2495e8ce28b8e2d4..820b1f7476d3d92288069bc00ac798c44bf14da6 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/demo_model.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/demo_model.py @@ -1,7 +1,25 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import torch import torch.nn.functional as F from msprobe.pytorch import TrainerMon from msprobe.pytorch.common import seed_all +from msprobe.pytorch.hook_module.api_register import get_api_register + +get_api_register().restore_all_api() device = torch.device('cpu') dtype_float32 = torch.float32 diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_csv2tb.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_csv2tb.py index f2bc82ffafc2a1f10719d4a46669bc0050c12782..4178e2ef8fbfb2c2bafa90b32fa92d622b95e3cd 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_csv2tb.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_csv2tb.py @@ -1,3 +1,18 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ import os import shutil import random @@ -11,6 +26,9 @@ from tensorboard.backend.event_processing.event_accumulator import EventAccumula from msprobe.pytorch import TrainerMon from msprobe.core.common.const import MonitorConst from msprobe.pytorch.monitor.csv2tb import parse_step_fn, csv2tensorboard_by_step +from msprobe.pytorch.hook_module.api_register import get_api_register + +get_api_register().restore_all_api() base_dir = os.path.dirname(os.path.realpath(__file__)) diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_module_hook.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_module_hook.py index eefacb73c8e76636086554775b0e6f2e916ddf6e..66d016f9487a4e7f7fc747dfb021b1f887c51f4a 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_module_hook.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/monitor/test_module_hook.py @@ -1,3 +1,18 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import os.path import shutil import unittest @@ -8,10 +23,13 @@ import torch from msprobe.core.common.const import MonitorConst, Const from torch import distributed as dist +from msprobe.pytorch import TrainerMon +from msprobe.pytorch.hook_module.api_register import get_api_register from msprobe.pytorch.monitor.module_hook import CommunicationContext, GradContext, ModuleHookContext, \ param_is_not_tensor_parallel_duplicate, param_is_data_parallel_duplicate from msprobe.test.pytorch_ut.monitor.demo_model import monitor_demo -from msprobe.pytorch import TrainerMon + +get_api_register().restore_all_api() base_dir = os.path.dirname(os.path.realpath(__file__)) diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_config.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_config.py index c1b8bac47fda100636b55fbc5ad452c2843e8aaa..0724581bc79f48ed158b691d315a5a75cc1bc65d 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_config.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_config.py @@ -397,13 +397,13 @@ class TestRunUTConfig(unittest.TestCase): def test_check_nfs_path_config_not_exist(self, mock_exists): with self.assertRaises(Exception) as context: RunUTConfig.check_nfs_path_config("./invalid_nfs") - self.assertIn("does not exist", str(context.exception)) + self.assertIn("[msprobe] 非法文件路径:", str(context.exception)) @patch('os.path.exists', return_value=False) def test_check_tls_path_config_not_exist(self, mock_exists): with self.assertRaises(Exception) as context: RunUTConfig.check_tls_path_config("./invalid_tls") - self.assertIn("does not exist", str(context.exception)) + self.assertIn("[msprobe] 非法文件路径:", str(context.exception)) def test_check_run_ut_config(self): with patch.object(RunUTConfig, 'check_filter_list_config') as mock_filter, \ diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_debug_save.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_debug_save.py index 
534437260e66d9e586d69d557d30e308a9f4f3ee..d68f28066fab1dbd453a91f34bfbc762949f3da0 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_debug_save.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/test_pt_debug_save.py @@ -38,43 +38,3 @@ class TestPytorchDebuggerSave(TestCase): task_config = BaseConfig(statistics_task_json) with patch("msprobe.pytorch.debugger.precision_debugger.parse_json_config", return_value=(common_config, task_config)): self.debugger = PrecisionDebugger() - - def test_forward_and_backward(self): - def forward_func(x, y): - PrecisionDebugger.save(x, "x_tensor") - return x * y - x = torch.tensor([1.]) - y = torch.tensor([2.]) - x.requires_grad = True - y.requires_grad = True - result_json = { - "task": "statistics", - "level": "debug", - "framework": "pytorch", - "dump_data_dir": None, - "data": { - "x_tensor.0": { - "type": "torch.Tensor", - "dtype": "torch.float32", - "shape": torch.Size([1]), - "Max": 1.0, - "Min": 1.0, - "Mean": 1.0, - "Norm": 1.0, - "requires_grad": True - }, - "x_tensor_grad.0": { - "type": "torch.Tensor", - "dtype": "torch.float32", - "shape": torch.Size([1]), - "Max": 2.0, - "Min": 2.0, - "Mean": 2.0, - "Norm": 2.0, - "requires_grad": False - } - } - } - loss = forward_func(x, y) - loss.backward() - self.assertEqual(self.debugger.service.data_collector.data_writer.cache_debug, result_json) \ No newline at end of file diff --git a/debug/accuracy_tools/msprobe/test/pytorch_ut/test_service.py b/debug/accuracy_tools/msprobe/test/pytorch_ut/test_service.py index 6687f3111050ea53e14e62f3afd55ae1eff2b8c0..f0da7c467f63cd95e3e8585816aba374912d4347 100644 --- a/debug/accuracy_tools/msprobe/test/pytorch_ut/test_service.py +++ b/debug/accuracy_tools/msprobe/test/pytorch_ut/test_service.py @@ -1,7 +1,23 @@ +# Copyright (c) 2024-2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ import unittest from unittest.mock import patch, mock_open, MagicMock from msprobe.core.common.utils import Const +from msprobe.core.data_dump.api_registry import ApiRegistry from msprobe.pytorch.debugger.debugger_config import DebuggerConfig from msprobe.pytorch.pt_config import parse_json_config from msprobe.pytorch.service import Service @@ -67,7 +83,7 @@ class TestService(unittest.TestCase): def test_step_success(self): self.service.step() - self.assertEqual(self.service.current_iter, 1) + self.assertEqual(self.service.loop, 1) def test_step_fail(self): self.service.should_stop_service = True @@ -87,8 +103,8 @@ class TestService(unittest.TestCase): self.service.build_hook = MagicMock() self.config.level = "L1" with patch("msprobe.pytorch.service.logger.info_on_rank_0") as mock_logger, \ - patch("msprobe.pytorch.service.api_register.initialize_hook") as mock_init_hook, \ - patch("msprobe.pytorch.service.api_register.api_modularity") as mock_api_modularity: + patch.object(ApiRegistry, "initialize_hook") as mock_init_hook, \ + patch.object(ApiRegistry, 'register_all_api') as mock_api_modularity: self.service.register_api_hook() self.assertEqual(mock_logger.call_count, 1) mock_init_hook.assert_called_once() diff --git a/debug/accuracy_tools/msprobe/test/resources/layer_mapping/mindspore/dump.json b/debug/accuracy_tools/msprobe/test/resources/layer_mapping/mindspore/dump.json index b55f9e0699fe6329ceeb09a51fe20118c65545e7..153d84e7d117b5be89dfdb522edc39dc066929cb 100644 --- a/debug/accuracy_tools/msprobe/test/resources/layer_mapping/mindspore/dump.json +++ b/debug/accuracy_tools/msprobe/test/resources/layer_mapping/mindspore/dump.json @@ -1,6 +1,7 @@ { "task": "statistics", "level": "mix", + "framework": "mindspore", "dump_data_dir": null, "data": { "Cell.network_with_loss.module.language_model.embedding.word_embeddings.VocabParallelEmbedding.forward.0": { diff --git a/debug/accuracy_tools/msprobe/test/resources/layer_mapping/pytorch/dump.json b/debug/accuracy_tools/msprobe/test/resources/layer_mapping/pytorch/dump.json index d7dd1c0c38e2d24c8b0d19c346a50eb33437d232..02239176a9d690c4ce70c06cc6ab117a3c122811 100644 --- a/debug/accuracy_tools/msprobe/test/resources/layer_mapping/pytorch/dump.json +++ b/debug/accuracy_tools/msprobe/test/resources/layer_mapping/pytorch/dump.json @@ -1,6 +1,7 @@ { "task": "statistics", "level": "mix", + "framework": "pytorch", "dump_data_dir": null, "data": { "Module.module.module.language_model.embedding.word_embeddings.VocabParallelEmbedding.forward.0": { diff --git a/debug/accuracy_tools/msprobe/test/visualization_ut/builder/test_graph_builder.py b/debug/accuracy_tools/msprobe/test/visualization_ut/builder/test_graph_builder.py index 706dc8bf82e59f413c3fd559a39af89c6a70be47..2e41f2a325cf77937884b624c14b9ee7bef6c243 100644 --- a/debug/accuracy_tools/msprobe/test/visualization_ut/builder/test_graph_builder.py +++ b/debug/accuracy_tools/msprobe/test/visualization_ut/builder/test_graph_builder.py @@ -32,7 +32,7 @@ class TestGraphBuilder(unittest.TestCase): self.assertIsInstance(graph, Graph) self.assertEqual(len(graph.node_map), 3) - @patch('msprobe.visualization.builder.graph_builder.save_json_file') + @patch('msprobe.visualization.builder.graph_builder.save_json') def test_to_json(self, mock_save_json_file): GraphBuilder.to_json("step/rank/output.vis", self.config) mock_save_json_file.assert_called_once() @@ -111,3 +111,23 @@ class TestGraphBuilder(unittest.TestCase): self.assertEqual(graph.root.subnodes[2].op, NodeOp.module) 
self.assertEqual(len(graph.root.subnodes[0].subnodes), 0) self.assertEqual(graph.root.subnodes[0].id, 'Module.a.0') + + def test_add_parameters_grad(self): + graph = Graph('TestNet') + graph.add_node(NodeOp.module, 'Module.a.backward.0', graph.root) + graph.add_node(NodeOp.module, 'Module.b.backward.0', graph.root) + graph.add_node(NodeOp.module, 'Module.a.backward.1', graph.root) + graph.add_node(NodeOp.module, 'Module.aa.backward.0', graph.get_node('Module.a.backward.0')) + graph.add_node(NodeOp.module, 'Module.aaa.backward.0', graph.get_node('Module.a.backward.0')) + graph.add_node(NodeOp.module, 'Module.aa.backward.1', graph.get_node('Module.a.backward.1')) + graph.add_node(NodeOp.module, 'Module.aaa.backward.1', graph.get_node('Module.a.backward.1')) + + data_dict = {'Module.a.parameters_grad': {}, 'Module.aaa.parameters_grad': {}} + GraphBuilder._add_parameters_grad(graph, data_dict) + root_nodes_id = [node.id for node in graph.get_node('TestNet').subnodes] + sub_nodes_id0 = [node.id for node in graph.get_node('Module.a.backward.0').subnodes] + sub_nodes_id1 = [node.id for node in graph.get_node('Module.a.backward.1').subnodes] + + self.assertEqual(root_nodes_id[-1], 'Module.a.backward.1') + self.assertEqual(sub_nodes_id0[-1], 'Module.aaa.backward.0') + self.assertEqual(sub_nodes_id1[-1], 'Module.a.parameters_grad') diff --git a/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_mode_adapter.py b/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_mode_adapter.py index 87d1f9ee5f01c7c9b2f264f3e6ec16b5155c1f8e..5f9a64f04dd7d4bffe0881519c2aa1264c105898 100644 --- a/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_mode_adapter.py +++ b/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_mode_adapter.py @@ -2,7 +2,8 @@ import json import unittest from unittest.mock import patch, MagicMock from msprobe.visualization.compare.mode_adapter import ModeAdapter -from msprobe.visualization.graph.base_node import BaseNode, NodeOp +from msprobe.visualization.graph.base_node import BaseNode +from msprobe.visualization.graph.node_op import NodeOp from msprobe.visualization.utils import GraphConst, ToolTip from msprobe.core.common.const import CompareConst @@ -225,27 +226,6 @@ class TestModeAdapter(unittest.TestCase): self.adapter.add_csv_data(compare_result_list) self.assertEqual(self.adapter.csv_data, compare_result_list) - def test_add_error_key(self): - node_data = {'key': {}} - self.adapter.compare_mode = GraphConst.REAL_DATA_COMPARE - self.adapter.add_error_key(node_data) - self.assertEqual(node_data['key'][GraphConst.ERROR_KEY], - [CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO]) - node_data = {'key': {}} - self.adapter.compare_mode = GraphConst.SUMMARY_COMPARE - self.adapter.add_error_key(node_data) - self.assertEqual(node_data['key'][GraphConst.ERROR_KEY], - [CompareConst.MAX_RELATIVE_ERR, CompareConst.MIN_RELATIVE_ERR, - CompareConst.MEAN_RELATIVE_ERR, CompareConst.NORM_RELATIVE_ERR]) - node_data = {'key': []} - self.adapter.add_error_key(node_data) - self.assertEqual(node_data['key'], []) - - node_data = {'key': {}} - self.adapter.compare_mode = '111' - self.adapter.add_error_key(node_data) - self.assertEqual(node_data['key'], {'error_key': []}) - def test_get_tool_tip(self): self.adapter.compare_mode = GraphConst.MD5_COMPARE tips = self.adapter.get_tool_tip() diff --git a/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_multi_mapping.py 
b/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_multi_mapping.py new file mode 100644 index 0000000000000000000000000000000000000000..7fe14317b2af7334693270d060c58af2dada4cbc --- /dev/null +++ b/debug/accuracy_tools/msprobe/test/visualization_ut/compare/test_multi_mapping.py @@ -0,0 +1,114 @@ +import unittest +from msprobe.visualization.compare.multi_mapping import MultiMapping +from msprobe.visualization.graph.graph import Graph +from msprobe.visualization.graph.base_node import BaseNode +from msprobe.visualization.graph.node_op import NodeOp +from msprobe.visualization.utils import GraphConst + + +class TestMultiMapping(unittest.TestCase): + + def setUp(self): + pass + + def test_validate_yaml(self): + multi_mapping = MultiMapping.validate_yaml({}) + self.assertEqual(multi_mapping, {}) + + multi_mapping = MultiMapping.validate_yaml([]) + self.assertEqual(multi_mapping, {}) + + multi_mapping = MultiMapping.validate_yaml({'a': 'b'}) + self.assertEqual(multi_mapping, {('a',): ('b',)}) + + multi_mapping = MultiMapping.validate_yaml({'a': 'b c d'}) + self.assertEqual(multi_mapping, {('a',): ('b c d',)}) + + multi_mapping = MultiMapping.validate_yaml({'a': 'b, c, d'}) + self.assertEqual(multi_mapping, {('a',): ('b', 'd')}) + + def test_validate_ids_in_graph(self): + graph = Graph("model_name") + graph.node_map = {'node1': BaseNode(NodeOp.module, 'node1'), + 'node2': BaseNode(NodeOp.module, 'node2'), + 'node3': BaseNode(NodeOp.module, 'node3')} + result = MultiMapping.validate_ids_in_graph(['node1', 'node3'], graph) + self.assertTrue(result) + + result = MultiMapping.validate_ids_in_graph(['node1', 'node5'], graph) + self.assertFalse(result) + + def test_get_merged_nodes_data(self): + node_ids = ['Module.layer1.Linear.forward.0', 'Module.layer3.Linear.forward.0'] + dump_data = {'Module.layer1.Linear.forward.0': {'input_args': [ + {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [100, 10], 'Max': 3.029174327850342, + 'Min': -3.405808448791504, 'Mean': -0.08760099112987518, 'Norm': 31.511741638183594, + 'requires_grad': False}], 'input_kwargs': {}, 'output': [ + {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [100, 20], 'Max': 2.280996561050415, + 'Min': -2.6040544509887695, 'Mean': -0.05008987337350845, 'Norm': 26.9143123626709, + 'requires_grad': True}], 'parameters': { + 'weight': {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [20, 10], 'Max': 0.31333038210868835, + 'Min': -0.3147874176502228, 'Mean': -0.007642852142453194, 'Norm': 2.594407558441162, + 'requires_grad': True}, + 'bias': {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [20], 'Max': 0.3160688579082489, + 'Min': -0.31076428294181824, 'Mean': -0.05035770684480667, 'Norm': 0.8817608952522278, + 'requires_grad': True}}, 'is_recompute': False}, + 'Module.layer3.Linear.forward.0': {'input_args': [ + {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [100, 30], 'Max': 1.8936877250671387, + 'Min': -1.60052490234375, 'Mean': -0.05550510436296463, 'Norm': 21.1639404296875, + 'requires_grad': True}], 'input_kwargs': {}, 'output': [ + {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [100, 1], 'Max': 0.8175169229507446, + 'Min': -0.3781408369541168, 'Mean': 0.16728776693344116, 'Norm': 2.627354145050049, + 'requires_grad': True}], 'parameters': { + 'weight': {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [1, 30], + 'Max': 0.17745383083820343, 'Min': -0.11874081194400787, 'Mean': 0.013812449760735035, + 'Norm': 0.48705562949180603, 
'requires_grad': True}, + 'bias': {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [1], 'Max': 0.1430283486843109, + 'Min': 0.1430283486843109, 'Mean': 0.1430283486843109, 'Norm': 0.1430283486843109, + 'requires_grad': True}}, 'is_recompute': False}} + multi_node_data = {'input_args': [ + {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [100, 10], 'Max': 3.029174327850342, + 'Min': -3.405808448791504, 'Mean': -0.08760099112987518, 'Norm': 31.511741638183594, + 'requires_grad': False}], 'input_kwargs': {}, 'output': [ + {'type': 'torch.Tensor', 'dtype': 'torch.float32', 'shape': [100, 1], 'Max': 0.8175169229507446, + 'Min': -0.3781408369541168, 'Mean': 0.16728776693344116, 'Norm': 2.627354145050049, + 'requires_grad': True}]} + result = MultiMapping.get_merged_nodes_data(node_ids, dump_data, 'multi_node0') + self.assertEqual(result, {'multi_node0': multi_node_data}) + result = MultiMapping.get_merged_nodes_data([], dump_data, 'multi_node0') + self.assertEqual(result, {}) + + def test_merge_nodes(self): + graph = Graph('graph') + graph.add_node(NodeOp.module, 'Module.layer1.Linear.forward.0', graph.root) + graph.add_node(NodeOp.module, 'Module.layer2.Linear.forward.0', graph.root) + graph.add_node(NodeOp.module, 'Module.layer3.Linear.forward.0', graph.root) + result = MultiMapping.merge_nodes(['Module.layer1.Linear.forward.0', 'Module.layer3.Linear.forward.0'], + graph) + self.assertTrue(isinstance(result.multi_node, BaseNode)) + self.assertEqual(result.multi_node.subnodes, [graph.get_node('Module.layer1.Linear.forward.0'), + graph.get_node('Module.layer2.Linear.forward.0'), + graph.get_node('Module.layer3.Linear.forward.0')]) + self.assertEqual(result.multi_node.upnode, graph.get_node('graph')) + self.assertEqual(result.multi_node.id, GraphConst.MERGE_NODES + '.forward.0') + + result = MultiMapping.merge_nodes(['Module.layer1.Linear.forward.0'], graph) + self.assertEqual(result.multi_node, graph.get_node('Module.layer1.Linear.forward.0')) + + result = MultiMapping.merge_nodes(['Module.layer5.Linear.forward.0', 'Module.layer6.Linear.forward.0'], + graph) + self.assertIsNone(result.multi_node) + + result = MultiMapping.merge_nodes(['Module.layer3.Linear.forward.0', 'Module.layer1.Linear.forward.0'], + graph) + self.assertIsNone(result.multi_node) + + def test_split_mapping_str(self): + result = MultiMapping._split_mapping_str('a, b,c, d') + self.assertEqual(result, ('a', 'd')) + + result = MultiMapping._split_mapping_str('a') + self.assertEqual(result, ('a',)) + + result = MultiMapping._split_mapping_str('a b* c ') + self.assertEqual(result, ('a b* c',)) diff --git a/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_base_node.py b/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_base_node.py index 480b95620e6a81577d825b7af55b45fc0a04c34c..64b7101c6b036113e018faec649974753acdaec3 100644 --- a/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_base_node.py +++ b/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_base_node.py @@ -1,6 +1,6 @@ import unittest -from msprobe.visualization.graph.base_node import BaseNode, NodeOp -from msprobe.visualization.utils import GraphConst +from msprobe.visualization.graph.base_node import BaseNode +from msprobe.visualization.graph.node_op import NodeOp class TestBaseNode(unittest.TestCase): diff --git a/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_graph.py b/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_graph.py index 
81f9fdca5277de6e1670da409bcf93e56ece3206..24f39cbb808234cfce6af02046755d3df3a1a5e4 100644 --- a/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_graph.py +++ b/debug/accuracy_tools/msprobe/test/visualization_ut/graph/test_graph.py @@ -55,17 +55,6 @@ class TestGraph(unittest.TestCase): self.assertIsNotNone(matched_node) self.assertEqual(ancestors, ['node_id_a']) - def test_dfs(self): - graph = Graph("model_name") - graph.add_node(NodeOp.module, "node_a") - graph.add_node(NodeOp.module, "node_b") - node_a = BaseNode(self.node_op, self.node_id) - result = {} - graph.dfs(node_a, result) - self.assertEqual(result, {'node_id': {'id': 'node_id', 'node_type': 0, 'data': {}, - 'output_data': {}, 'input_data': {}, 'upnode': 'None', 'subnodes': [], - 'matched_node_link': [], 'suggestions': {}, 'stack_info': []}}) - def test_split_nodes_by_micro_step(self): nodes = [BaseNode(NodeOp.module, 'a.forward.0'), BaseNode(NodeOp.module, 'a.backward.0'), BaseNode(NodeOp.api_collection, 'apis.0'), BaseNode(NodeOp.module, 'a.forward.1'), diff --git a/debug/accuracy_tools/msprobe/test/visualization_ut/test_graph_service.py b/debug/accuracy_tools/msprobe/test/visualization_ut/test_graph_service.py index 7dfd9564ebc21327f3e7e29be90da7f78c3b0393..0fe7047fb8aa3c1e7b0c9291b4892c9e75224a0d 100644 --- a/debug/accuracy_tools/msprobe/test/visualization_ut/test_graph_service.py +++ b/debug/accuracy_tools/msprobe/test/visualization_ut/test_graph_service.py @@ -21,6 +21,7 @@ class Args: overflow_check: bool = False fuzzy_match: bool = False complete_stack: bool = False + multi_mapping: str = None class TestGraphService(unittest.TestCase): diff --git a/debug/accuracy_tools/msprobe/visualization/builder/graph_builder.py b/debug/accuracy_tools/msprobe/visualization/builder/graph_builder.py index 814882e6b819e9e6b6b421aec5f8f0b89f03f7c6..bec99d675f4b1238fde3905037ec5f7fb5a0c8fe 100644 --- a/debug/accuracy_tools/msprobe/visualization/builder/graph_builder.py +++ b/debug/accuracy_tools/msprobe/visualization/builder/graph_builder.py @@ -16,19 +16,19 @@ import re from msprobe.core.common.const import Const -from msprobe.core.common.file_utils import load_json +from msprobe.core.common.file_utils import load_json, save_json from msprobe.visualization.builder.msprobe_adapter import get_input_output from msprobe.visualization.builder.msprobe_adapter import op_patterns from msprobe.visualization.graph.graph import Graph from msprobe.visualization.graph.node_op import NodeOp -from msprobe.visualization.utils import save_json_file, GraphConst +from msprobe.visualization.utils import GraphConst class GraphBuilder: backward_pattern = re.compile(r"(\.backward\.)(\d+)$") forward_pattern = re.compile(r"(\.forward\.)(\d+)$") - # 匹配以大写字母开头,后接任意字母,并以Template(结尾 - template_pattern = re.compile(r'\b[A-Z][a-zA-Z]*Template\(') + # 匹配以大写字母开头,后接任意字母,并以Template(结尾,或包含api_template(的字符串 + template_pattern = re.compile(r'\b([A-Z][a-zA-Z]*Template|api_template)\(') @staticmethod def build(construct_path, data_path, stack_path, model_name='DefaultModel', complete_stack=False): @@ -51,6 +51,7 @@ class GraphBuilder: graph = Graph(model_name, data_path=dump_dict.get('dump_data_dir', ''), dump_data=data_dict) GraphBuilder._init_nodes(graph, construct_dict, data_dict, stack_dict) GraphBuilder._collect_apis_between_modules(graph) + GraphBuilder._add_parameters_grad(graph, data_dict) return graph @staticmethod @@ -73,7 +74,7 @@ class GraphBuilder: if config.task: result[GraphConst.JSON_TASK_KEY] = config.task result[GraphConst.OVERFLOW_CHECK] = 
config.overflow_check - save_json_file(filename, result) + save_json(filename, result, indent=4) @staticmethod def _simplify_stack(stack_dict): @@ -235,6 +236,44 @@ class GraphBuilder: graph.root.subnodes = output + @staticmethod + def _add_parameters_grad(graph, data_dict): + """ + 将parameters_grad信息添加到graph中, + 对应模块的parameters_grad节点添加到对应模块的最后一次backward节点(backward计数最大)内作为子节点 + + 例如,graph有节点Module.a.backward.0, Module.a.backward.1, Module.a.backward.2 + 则Module.a.parameters_grad添加在Module.a.backward.2内作为子节点 + """ + prefixes = [] + suffix = Const.SEP + Const.PARAMS_GRAD + for node_id in data_dict.keys(): + if node_id not in graph.node_map and node_id.endswith(suffix): + prefixes.append(node_id.replace(suffix, '')) + + max_info = {prefix: 0 for prefix in prefixes} + + for key in graph.node_map.keys(): + for prefix in prefixes: + # 构建正则表达式,匹配以 "backward.数字" 结尾的键 + pattern = re.compile(r'^' + re.escape(prefix) + r'\.backward\.(\d+)$') + match = pattern.match(key) + if match: + num = int(match.group(1)) + if num > max_info[prefix]: + max_info[prefix] = num + + for prefix, num in max_info.items(): + node_id = prefix + Const.SEP + Const.BACKWARD + Const.SEP + str(num) + node = graph.get_node(node_id) + if node: + parameters_grad_node_id = graph.add_node(NodeOp.module, prefix + suffix, up_node=node) + # 添加输入输出数据 + node_data = data_dict.get(parameters_grad_node_id, {}) + input_data, output_data = get_input_output(node_data, parameters_grad_node_id) + # 更新数据 + graph.get_node(parameters_grad_node_id).set_input_output(input_data, output_data) + class GraphExportConfig: def __init__(self, graph_n, graph_b=None, tool_tip=None, node_colors=None, micro_steps=None, task='', diff --git a/debug/accuracy_tools/msprobe/visualization/builder/msprobe_adapter.py b/debug/accuracy_tools/msprobe/visualization/builder/msprobe_adapter.py index ee5e3f519ed126b2aaa493e0d3a3b7fce33313e4..2f219ce099c83254051ecb3d566b1bc1529e3f99 100644 --- a/debug/accuracy_tools/msprobe/visualization/builder/msprobe_adapter.py +++ b/debug/accuracy_tools/msprobe/visualization/builder/msprobe_adapter.py @@ -13,12 +13,12 @@ # See the License for the specific language governing permissions and # limitations under the License. 
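For reference on the `_add_parameters_grad` hunk above: it attaches each module's `parameters_grad` node under that module's highest-numbered backward call. A minimal plain-Python sketch of that selection step, using made-up node ids rather than the repository's API:

```python
import re

# Made-up example mirroring the docstring: three backward calls for Module.a.
node_ids = ["Module.a.backward.0", "Module.a.backward.1", "Module.a.backward.2"]
prefix = "Module.a"

# Same pattern as the hunk: keys ending in ".backward.<number>" for this prefix.
pattern = re.compile(r"^" + re.escape(prefix) + r"\.backward\.(\d+)$")
max_num = max(int(m.group(1)) for n in node_ids if (m := pattern.match(n)))

# parameters_grad is parented under the last backward call.
assert f"{prefix}.backward.{max_num}" == "Module.a.backward.2"
```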
import re -import math from msprobe.core.compare.acc_compare import read_op, merge_tensor, get_accuracy from msprobe.core.common.utils import set_dump_path, get_dump_mode from msprobe.visualization.utils import GraphConst from msprobe.core.common.const import Const from msprobe.core.compare.acc_compare import ModeConfig +from msprobe.core.common.file_utils import load_json # 用于将节点名字解析成对应的NodeOp的规则 op_patterns = [ @@ -63,6 +63,31 @@ def run_real_data(dump_path_param, csv_path, framework, is_cross_frame=False): return ms_comparator.do_multi_process(dump_path_param, csv_path) +def run_real_data_single(op_names, op_name_mapping_dict, input_param, framework, is_cross_frame=False): + """ + 单进程运行生成真实数据 + Args: + op_names: [npu_op_name, bench_op_name], excel中的NPU_Name和Bench_Name,例如:Functional.conv2d.0.forward.input.3.0 + op_name_mapping_dict: op_name和npy或pt文件的映射关系 + input_param: npu_json_path/bench_json_path/stack_json_path等参数 + framework: 框架类型, pytorch或mindspore + is_cross_frame: 是否进行跨框架比对,仅支持mindspore比pytorch, 其中pytorch为标杆 + """ + if not isinstance(op_names, list) or len(op_names) != 2: + return [] + mode_config = ModeConfig(stack_mode=False, auto_analyze=True, fuzzy_match=False, dump_mode=Const.ALL) + set_dump_path(input_param) + + if framework == Const.PT_FRAMEWORK: + from msprobe.pytorch.compare.pt_compare import PTComparator + return PTComparator(mode_config).compare_by_op(op_names[0], op_names[1], op_name_mapping_dict, input_param) + else: + from msprobe.mindspore.compare.ms_compare import MSComparator, MappingConfig + ms_comparator = MSComparator(mode_config, MappingConfig()) + ms_comparator.cross_frame = is_cross_frame + return ms_comparator.compare_by_op(op_names[0], op_names[1], op_name_mapping_dict, input_param) + + def get_input_output(node_data, node_id): """ 将dump的原始数据进行拆解,分解为output和input两个数据 diff --git a/debug/accuracy_tools/msprobe/visualization/compare/graph_comparator.py b/debug/accuracy_tools/msprobe/visualization/compare/graph_comparator.py index 902d721a8d1047b687b878eb45a802a1df4154bd..41a7276d16833f01207f2f7733a5426b4e31f852 100644 --- a/debug/accuracy_tools/msprobe/visualization/compare/graph_comparator.py +++ b/debug/accuracy_tools/msprobe/visualization/compare/graph_comparator.py @@ -1,4 +1,4 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# Copyright (c) 2025, Huawei Technologies Co., Ltd. # All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -14,15 +14,21 @@ # limitations under the License. 
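A hedged usage sketch for the `run_real_data_single` helper added above; the dump paths and the duplicated op name below are illustrative placeholders, not files from this repository:

```python
from msprobe.core.common.const import Const
from msprobe.visualization.builder.msprobe_adapter import run_real_data_single

# Placeholder inputs: one NPU/bench op-name pair plus the dump files backing it.
op_names = ["Functional.conv2d.0.forward.input.0", "Functional.conv2d.0.forward.input.0"]
op_name_mapping_dict = {op_names[0]: ["Functional.conv2d.0.forward.input.0.pt",
                                      "Functional.conv2d.0.forward.input.0.pt"]}
input_param = {
    "npu_json_path": "./npu/dump.json",      # assumed layout
    "bench_json_path": "./bench/dump.json",  # assumed layout
    "stack_json_path": "./npu/stack.json",   # assumed layout
}

# Returns the real-data metric row for this pair, or [] if op_names is malformed.
result = run_real_data_single(op_names, op_name_mapping_dict, input_param, Const.PT_FRAMEWORK)
```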
import re -from msprobe.visualization.builder.msprobe_adapter import compare_node, get_compare_mode, run_real_data +from msprobe.visualization.builder.msprobe_adapter import compare_node, get_compare_mode, run_real_data, \ + run_real_data_single from msprobe.visualization.utils import GraphConst, load_json_file, load_data_json_file, get_csv_df from msprobe.visualization.graph.graph import Graph, NodeOp -from msprobe.visualization.graph.node_colors import NodeColors from msprobe.visualization.compare.mode_adapter import ModeAdapter -from msprobe.core.common.const import Const +from msprobe.core.common.const import Const, CompareConst +from msprobe.core.common.log import logger +from msprobe.core.common.file_utils import load_yaml +from msprobe.visualization.compare.multi_mapping import MultiMapping +from msprobe.core.common.utils import recursion_depth_decorator class GraphComparator: + MAX_DEPTH = 1000 + def __init__(self, graphs, dump_path_param, args, mapping_dict=None): self.graph_n = graphs[0] self.graph_b = graphs[1] @@ -31,6 +37,7 @@ class GraphComparator: self.mapping_dict = mapping_dict self.fuzzy_match = args.fuzzy_match self.pattern = re.compile(r'\.\d+\.') + self.is_cross_frame = True if self.mapping_dict else False def compare(self): """ @@ -41,7 +48,75 @@ class GraphComparator: else: self._compare_nodes(self.graph_n.root) self._postcompare() - + + def multi_compare(self, multi_yaml_path): + """ + 多对多节点比对,需建立数量n与数量m节点之间的映射关系 + Args: + multi_yaml_path: 映射文件路径 + """ + multi_mapping = MultiMapping.validate_yaml(load_yaml(multi_yaml_path)) + if not multi_mapping: + logger.warning( + f'The multi mapping file {multi_yaml_path} content is incorrect, and the mapping is not effective.') + return + if self.ma.compare_mode == GraphConst.REAL_DATA_COMPARE: + # 获取真实数据指标在真实数据表头的索引 + id_list = [CompareConst.COMPARE_RESULT_HEADER.index(x) for x in CompareConst.ALL_COMPARE_INDEX] + for node_n_ids, node_b_ids in multi_mapping.items(): + if not MultiMapping.validate_ids_in_graph(node_n_ids, self.graph_n): + continue + if not MultiMapping.validate_ids_in_graph(node_b_ids, self.graph_b, GraphConst.JSON_BENCH_KEY): + continue + merged_items_n = MultiMapping.merge_nodes(node_n_ids, self.graph_n) + merged_items_b = MultiMapping.merge_nodes(node_b_ids, self.graph_b) + node_n = merged_items_n.multi_node + node_n_data = self.data_n_dict + node_b = merged_items_b.multi_node + node_b_data = self.data_b_dict + + if node_n.op == NodeOp.multi_collection: + node_n_data = MultiMapping.get_merged_nodes_data(node_n_ids, self.data_n_dict, node_n.id) + if node_b.op == NodeOp.multi_collection: + node_b_data = MultiMapping.get_merged_nodes_data(node_b_ids, self.data_b_dict, node_b.id) + + node = self._compare_node_with_mapping(node_n, {node_n.id: node_b.id}) + if not node: + continue + compare_result_list = compare_node([node_n.id, node_b.id], + [node_n_data, node_b_data], + self.stack_json_data, self.ma.compare_mode) + if not compare_result_list: + continue + # 真实数据模式,compare_result_list里没有精度指标,需要调用真实数据的比对接口得到指标 + if self.ma.compare_mode == GraphConst.REAL_DATA_COMPARE: + for compare_result in compare_result_list: + # 准备真实数据比对接口需要的参数 + full_param_name_n = compare_result[0] + full_param_name_b = compare_result[1] + + data_name_n = MultiMapping.get_dump_data_name(merged_items_n, full_param_name_n) + data_name_b = MultiMapping.get_dump_data_name(merged_items_b, full_param_name_b) + op_name_mapping_dict = {full_param_name_n: [data_name_n, data_name_b]} + + real_compare_result = run_real_data_single([full_param_name_n, 
full_param_name_b], + op_name_mapping_dict, self.dump_path_param, + self.framework, self.is_cross_frame) + if len(real_compare_result) < len(id_list): + continue + for i, index in enumerate(id_list): + # 根据索引,将真实数据指标插入表头相应位置 + compare_result[index] = real_compare_result[i] + compare_dict = {} + for item in compare_result_list: + if not isinstance(item, (list, tuple)) or not item: + continue + compare_dict[MultiMapping.replace_param_name(item[0], node_n.id)] = item + precision_index, _ = self.ma.parse_result(node_n, [compare_dict]) + node_n.data[GraphConst.JSON_INDEX_KEY] = precision_index + else: + self.add_compare_result_to_node(node_n, compare_result_list) + def add_compare_result_to_node(self, node, compare_result_list): """ 将比对结果添加到节点的输入输出数据中 @@ -66,55 +141,15 @@ class GraphComparator: self.ma.parse_result(node, [compare_in_dict, compare_out_dict])) node.data[GraphConst.JSON_INDEX_KEY] = precision_index node.data.update(other_dict) - - def _parse_param(self, dump_path_param, output_path): - self.dump_path_param = dump_path_param - self.output_path = output_path - compare_mode = get_compare_mode(self.dump_path_param) - self.ma = ModeAdapter(compare_mode) - self.data_n_dict = load_data_json_file(dump_path_param.get('npu_json_path')) - self.data_b_dict = load_data_json_file(dump_path_param.get('bench_json_path')) - self.stack_json_data = load_json_file(dump_path_param.get('stack_json_path')) - - def _postcompare(self): - self._handle_api_collection_index() - if not self.ma.compare_mode == GraphConst.REAL_DATA_COMPARE: - return - df = get_csv_df(True, self.ma.csv_data, self.ma.compare_mode) - df = run_real_data(self.dump_path_param, df, self.framework, True if self.mapping_dict else False) - compare_data_dict = {row[0]: row.tolist() for _, row in df.iterrows()} - for node in self.ma.compare_nodes: - precision_index, _ = self.ma.parse_result(node, [compare_data_dict]) - node.data[GraphConst.JSON_INDEX_KEY] = precision_index - - def _handle_api_collection_index(self): - """ - api集合的指标, md5模式使用集合中所有api最小的指标,statistics和tensor模式使用集合中所有api最大的指标 - md5模式下指标为0代表最差,statistics和tensor模式下指标为1代表最差 - """ - for node in self.graph_n.root.subnodes: - if node.op == NodeOp.api_collection: - precision_index = GraphConst.MAX_INDEX_KEY if self.ma.compare_mode == GraphConst.MD5_COMPARE \ - else GraphConst.MIN_INDEX_KEY - for api in node.subnodes: - precision_index = min(precision_index, - api.data.get(GraphConst.JSON_INDEX_KEY, GraphConst.MAX_INDEX_KEY)) \ - if self.ma.compare_mode == GraphConst.MD5_COMPARE \ - else max(precision_index, api.data.get(GraphConst.JSON_INDEX_KEY, GraphConst.MIN_INDEX_KEY)) - node.data[GraphConst.JSON_INDEX_KEY] = precision_index + @recursion_depth_decorator('GraphComparator._compare_nodes', max_depth=MAX_DEPTH) def _compare_nodes(self, node_n): """ 递归遍历NPU树中的节点,如果在Bench中找到具有相同名称的节点,检查它们的祖先和参数信息,检查一致则进行精度数据对比 这里采用先序遍历,好处在于当这个节点被比较时,它的先序已经被匹配,这可以为后续的模糊匹配提供重要信息 """ if self.mapping_dict: - node_b, ancestors_n, ancestors_b = Graph.mapping_match(node_n, self.graph_b, self.mapping_dict) - if node_b: - ancestors_n.append(node_n.id) - ancestors_b.append(node_b.id) - node_n.matched_node_link = ancestors_b - node_b.matched_node_link = ancestors_n + node_b = self._compare_node_with_mapping(node_n, self.mapping_dict) else: node_b, ancestors = Graph.match(self.graph_n, node_n, self.graph_b) if node_b: @@ -126,6 +161,7 @@ class GraphComparator: for subnode in node_n.subnodes: self._compare_nodes(subnode) + @recursion_depth_decorator('GraphComparator._compare_nodes_fuzzy', max_depth=MAX_DEPTH) + def
_compare_nodes_fuzzy(self, node_n): if node_n.op != NodeOp.function_api: # 模块经过模糊匹配 @@ -146,6 +182,51 @@ class GraphComparator: for sub_node in node_n.subnodes: self._compare_nodes_fuzzy(sub_node) + def _compare_node_with_mapping(self, node_n, mapping_dict): + node_b, ancestors_n, ancestors_b = Graph.mapping_match(node_n, self.graph_b, mapping_dict) + if node_b: + ancestors_n.append(node_n.id) + ancestors_b.append(node_b.id) + node_n.matched_node_link = ancestors_b + node_b.matched_node_link = ancestors_n + return node_b + + def _parse_param(self, dump_path_param, output_path): + self.dump_path_param = dump_path_param + self.output_path = output_path + compare_mode = get_compare_mode(self.dump_path_param) + self.ma = ModeAdapter(compare_mode) + self.data_n_dict = load_data_json_file(dump_path_param.get('npu_json_path')) + self.data_b_dict = load_data_json_file(dump_path_param.get('bench_json_path')) + self.stack_json_data = load_json_file(dump_path_param.get('stack_json_path')) + + def _postcompare(self): + self._handle_api_collection_index() + if not self.ma.compare_mode == GraphConst.REAL_DATA_COMPARE: + return + df = get_csv_df(True, self.ma.csv_data, self.ma.compare_mode) + df = run_real_data(self.dump_path_param, df, self.framework, self.is_cross_frame) + compare_data_dict = {row[0]: row.tolist() for _, row in df.iterrows()} + for node in self.ma.compare_nodes: + precision_index, _ = self.ma.parse_result(node, [compare_data_dict]) + node.data[GraphConst.JSON_INDEX_KEY] = precision_index + + def _handle_api_collection_index(self): + """ + api集合的指标, md5模式使用集合中所有api最小的指标,statistics和tensor模式使用集合中所有api最大的指标 + md5模式下指标为0代表最差,statistics和tensor模式下指标为1代表最差 + """ + for node in self.graph_n.root.subnodes: + if node.op == NodeOp.api_collection: + precision_index = GraphConst.MAX_INDEX_KEY if self.ma.compare_mode == GraphConst.MD5_COMPARE \ + else GraphConst.MIN_INDEX_KEY + for api in node.subnodes: + precision_index = min(precision_index, + api.data.get(GraphConst.JSON_INDEX_KEY, GraphConst.MAX_INDEX_KEY)) \ + if self.ma.compare_mode == GraphConst.MD5_COMPARE \ + else max(precision_index, api.data.get(GraphConst.JSON_INDEX_KEY, GraphConst.MIN_INDEX_KEY)) + node.data[GraphConst.JSON_INDEX_KEY] = precision_index + def _get_and_add_result(self, node_n, node_b): compare_result_list = compare_node([node_n.id, node_b.id], [self.data_n_dict, self.data_b_dict], diff --git a/debug/accuracy_tools/msprobe/visualization/compare/mode_adapter.py b/debug/accuracy_tools/msprobe/visualization/compare/mode_adapter.py index 535192d80c566c48cedde4ea5b4474b6dc82dec0..7b961c4e8cdcb0b2d636d2782d3a9cce851a982f 100644 --- a/debug/accuracy_tools/msprobe/visualization/compare/mode_adapter.py +++ b/debug/accuracy_tools/msprobe/visualization/compare/mode_adapter.py @@ -14,7 +14,6 @@ # limitations under the License. 
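A small worked example of the aggregation rule that `_handle_api_collection_index` (re-added above) describes, with made-up per-api indices:

```python
# md5 mode: 0 is the worst index, so the collection keeps the minimum.
# statistics/tensor mode: 1 is the worst index, so the collection keeps the maximum.
api_indices = [0.2, 0.9, 0.5]

md5_index = min(api_indices)     # 0.2: the worst api dominates the collection
tensor_index = max(api_indices)  # 0.9: the worst api dominates the collection

assert (md5_index, tensor_index) == (0.2, 0.9)
```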
import json -import math from msprobe.core.common.const import CompareConst, Const from msprobe.visualization.utils import ToolTip, GraphConst, str2float @@ -157,24 +156,6 @@ class ModeAdapter: return self.csv_data.extend(compare_result_list) - def add_error_key(self, node_data): - """ - 根据不同的模式进行提供不同错误信息 - """ - for key, value in node_data.items(): - if not isinstance(value, dict): - continue - if self.compare_mode == GraphConst.SUMMARY_COMPARE: - message = [CompareConst.MAX_RELATIVE_ERR, CompareConst.MIN_RELATIVE_ERR, - CompareConst.MEAN_RELATIVE_ERR, CompareConst.NORM_RELATIVE_ERR] - elif self.compare_mode == GraphConst.REAL_DATA_COMPARE: - message = [CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] - else: - # 输出件优化 - message = [] - value[GraphConst.ERROR_KEY] = message - node_data[key] = value - def get_tool_tip(self): """ 用于前端展示字段的具体含义 diff --git a/debug/accuracy_tools/msprobe/visualization/compare/multi_mapping.py b/debug/accuracy_tools/msprobe/visualization/compare/multi_mapping.py new file mode 100644 index 0000000000000000000000000000000000000000..bcc7c0f31351a52e40acfd6824c6b2f8f49ffd52 --- /dev/null +++ b/debug/accuracy_tools/msprobe/visualization/compare/multi_mapping.py @@ -0,0 +1,173 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from dataclasses import dataclass + +from msprobe.core.common.const import Const +from msprobe.core.common.log import logger +from msprobe.visualization.utils import GraphConst +from msprobe.visualization.graph.graph import NodeOp, BaseNode +from msprobe.core.compare.utils import get_name_and_state + + +@dataclass +class MergedItems: + multi_node: BaseNode = None + start_node: BaseNode = None + end_node: BaseNode = None + + +class MultiMapping: + + @staticmethod + def validate_yaml(yaml_file): + multi_mapping = {} + if not yaml_file: + logger.warning(f'The multi mapping file cannot be empty.') + return multi_mapping + if not isinstance(yaml_file, dict): + logger.warning(f'The multi mapping file format must be a dict.') + return multi_mapping + for key, value in yaml_file.items(): + multi_mapping[MultiMapping._split_mapping_str(key)] = MultiMapping._split_mapping_str(value) + return multi_mapping + + @staticmethod + def validate_ids_in_graph(node_ids, graph, graph_type=GraphConst.JSON_NPU_KEY): + in_graph = True + for node_id in node_ids: + if node_id not in graph.node_map: + logger.warning(f'{node_id} does not exist in the {graph_type} graph, and the mapping is not effective.') + in_graph = False + return in_graph + + @staticmethod + def get_merged_nodes_data(node_ids: (list, tuple), dump_data: dict, multi_node_id: str): + if len(node_ids) < 2: + return {} + multi_node_data = {} + for k, v in dump_data.get(node_ids[0], {}).items(): + if k in [Const.INPUT, Const.INPUT_ARGS, Const.INPUT_KWARGS]: + multi_node_data[k] = v + for k, v in dump_data.get(node_ids[-1], {}).items(): + if k == Const.OUTPUT: + multi_node_data[k] = v + return {multi_node_id: multi_node_data} + + @staticmethod + def replace_param_name(param_name: str, multi_node_id): + try: + api, _ = get_name_and_state(param_name) + except Exception: + return param_name + return param_name.replace(api, multi_node_id + Const.SEP) + + @staticmethod + def merge_nodes(node_ids, graph): + """ + 根据传入的节点名称列表,将列表中的节点合并为一个节点,并取列表中的首节点输入数据作为融合节点的输入,尾节点的输出数据作为融合节点的输出 + Args: + node_ids: 节点名称列表 + graph: 图 + + Returns: 融合节点,首节点,尾节点 + + """ + if not node_ids or not isinstance(node_ids, (list, tuple)): + return MergedItems() + if len(node_ids) == 1: + return MergedItems(graph.get_node(node_ids[0])) + # 根据映射文件中配置的首尾节点id,得到首尾节点id之间的所有节点id列表 + node0 = graph.get_node(node_ids[0]) + node1 = graph.get_node(node_ids[-1]) + if not node0 or not node1: + return MergedItems() + current_node_list = node0.upnode.subnodes + + start_index = end_index = 0 + for i, node in enumerate(current_node_list): + if node.id == node_ids[0]: + start_index = i + elif node.id == node_ids[-1]: + end_index = i + + if start_index > end_index: + logger.warning(f'{node_ids[0]} and {node_ids[-1]} are in the wrong order, {node_ids[0]} should come first, ' + f'and the mapping is not effective.') + return MergedItems() + + current_node_list = current_node_list[start_index:end_index + 1] + + # 创建一个新的节点,作为被映射多个节点的集合,输入使用第一个节点的输入,输出使用最后一个节点的输出 + multi_node_name = GraphConst.MERGE_NODES + Const.SEP + Const.FORWARD \ + if Const.SEP + Const.FORWARD + Const.SEP in node0.id \ + else GraphConst.MERGE_NODES + Const.SEP + Const.BACKWARD + multi_node_id = graph.add_node(NodeOp.multi_collection, multi_node_name, id_accumulation=True) + multi_node = graph.get_node(multi_node_id) + multi_node.subnodes = current_node_list + multi_node.upnode = node0.upnode + # 重新确立父子关系 + for node in current_node_list: + node.upnode = multi_node + + multi_node.upnode.subnodes[start_index:end_index + 1] = [multi_node] + 
+ # 给节点添加输入输出数据, parameters信息不添加, 因为多对多节点之间的parameters的shape会不一致导致无法比对 + input_data = {} + output_data = {} + for key, value in node0.input_data.items(): + if any(s in key for s in [Const.INPUT, Const.INPUT_ARGS, Const.INPUT_KWARGS]): + input_data[MultiMapping.replace_param_name(key, multi_node_id)] = value + for key, value in node1.output_data.items(): + output_data[MultiMapping.replace_param_name(key, multi_node_id)] = value + multi_node.input_data = input_data + multi_node.output_data = output_data + + return MergedItems(multi_node, node0, node1) + + @staticmethod + def get_dump_data_name(merged_items, full_param_name): + """ + 根据节点参数名称,从融合节点信息中获取此参数的真实数据名称 + Args: + merged_items: 融合节点信息 + full_param_name: 参数名称,例如Module.layer.Linear.forward.0.input.0 + + Returns: 真实数据名称,例如Module.layer.Linear.forward.0.input.0.pt + + """ + try: + _, state = get_name_and_state(full_param_name) + except Exception: + return "-1" + node = merged_items.multi_node + # 如果是融合节点,那么其真实数据的存盘data_name需要从融合节点的首节点和尾节点中获取 + if node.op == NodeOp.multi_collection: + data = merged_items.end_node.output_data \ + if Const.OUTPUT == state \ + else merged_items.start_node.input_data + else: + data = node.output_data \ + if Const.OUTPUT == state \ + else node.input_data + + return data.get(full_param_name, {}).get("data_name", "-1") + + @staticmethod + def _split_mapping_str(x: str): + if Const.COMMA in x: + split_list = x.split(Const.COMMA) + return split_list[0].strip(), split_list[-1].strip() + return (x.strip(),) diff --git a/debug/accuracy_tools/msprobe/visualization/graph/base_node.py b/debug/accuracy_tools/msprobe/visualization/graph/base_node.py index 2642ff1e97ebcc055212d4d776eb7c8a08866dc8..fd1541b87bf5e7ba54a95089646683c41f546ca6 100644 --- a/debug/accuracy_tools/msprobe/visualization/graph/base_node.py +++ b/debug/accuracy_tools/msprobe/visualization/graph/base_node.py @@ -12,10 +12,11 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
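To make the new `MultiMapping` module above concrete: `validate_yaml` expects a dict of node-id strings, and `_split_mapping_str` keeps only the first and last entries of a comma-separated value, treated as the start and end of the node span to merge. A sketch of a loaded mapping, with illustrative node ids:

```python
# Hypothetical result of load_yaml on a multi-mapping file: three NPU-side
# layers (span given as first,last) map to one fused bench-side module.
loaded = {
    "Module.layer1.Linear.forward.0, Module.layer3.Linear.forward.0":
        "Module.fused.Linear.forward.0",
}

# validate_yaml would normalize this to tuples of (first, last) / (single,):
# {("Module.layer1.Linear.forward.0", "Module.layer3.Linear.forward.0"):
#  ("Module.fused.Linear.forward.0",)}
```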
+ from msprobe.core.overflow_check.level import OverflowLevel -from msprobe.visualization.graph.node_op import NodeOp from msprobe.visualization.utils import GraphConst from msprobe.visualization.builder.msprobe_adapter import format_node_data, compare_data, compare_data_fuzzy +from msprobe.core.common.log import logger class BaseNode: @@ -114,7 +115,13 @@ class BaseNode: """ ancestors = [] current_node = self.upnode + seen_nodes = set() while current_node: + if current_node.id in seen_nodes: + logger.warning(f'Detected a cycle in the node structure and cannot get node ancestors, ' + f'current node is {current_node.id}.') + return [] + seen_nodes.add(current_node.id) ancestors.append(current_node.id) current_node = current_node.upnode return list(reversed(ancestors)) diff --git a/debug/accuracy_tools/msprobe/visualization/graph/distributed_analyzer.py b/debug/accuracy_tools/msprobe/visualization/graph/distributed_analyzer.py index 5e68d6b2528aea4d6645da2885fa76a7b9bb97b2..a4b709a1ed1e57fd34330e403992d5fdb781c4f5 100644 --- a/debug/accuracy_tools/msprobe/visualization/graph/distributed_analyzer.py +++ b/debug/accuracy_tools/msprobe/visualization/graph/distributed_analyzer.py @@ -107,15 +107,6 @@ class DistributedAnalyzer: return None, None return group_ranks, group_id - @staticmethod - def _get_batch_group_info(node, rank): - for data in node.input_data.values(): - group_id = data.get('group_id') - if group_id is not None: - return group_id - logger.warning(f'The group_id of node {node.id} does not exist, {CANNOT_MATCH}{rank}') - return None - def distributed_match(self): for rank, graph in self.graphs.items(): nodes = graph.node_map @@ -377,7 +368,7 @@ class DistributedAnalyzer: target_api_name = self.config.get(api_name)[0] target_rank = int(id_info[1].replace(Const.RANK, '')) except Exception as e: - logger.warning(f'Failed to parsing batch p2p parameter with error info: {e}.') + logger.warning(f'Failed to parse batch p2p parameter with error info: {e}.') continue target_node = self._get_target_node(rank, unique_group_id, api_name, target_rank, target_api_name) if not target_node: diff --git a/debug/accuracy_tools/msprobe/visualization/graph/graph.py b/debug/accuracy_tools/msprobe/visualization/graph/graph.py index 5ce12d1cadb9aec2cc7c65954bb861b85032212d..90574174144ecc6b53033871dceda2bc53c87ba5 100644 --- a/debug/accuracy_tools/msprobe/visualization/graph/graph.py +++ b/debug/accuracy_tools/msprobe/visualization/graph/graph.py @@ -20,9 +20,6 @@ from msprobe.core.common.log import logger from msprobe.core.common.const import Const -MAX_RECUR_LEVEL = 100 - - class Graph: def __init__(self, model_name, data_path='', dump_data=None): self.node_map = {} @@ -67,7 +64,6 @@ class Graph: ancestors_b = node_b.get_ancestors() return node_b, ancestors_n, ancestors_b - @staticmethod def fuzzy_match(node_n, node_b): if not node_n or not node_b or not node_n.fuzzy_eq(node_b): @@ -76,13 +72,6 @@ class Graph: ancestors_b = node_b.get_ancestors() return node_b, ancestors_n, ancestors_b - @staticmethod - def dfs(node, result): - info = node.to_dict() - result[node.id] = info - for subnode in node.subnodes: - Graph.dfs(subnode, result) - @staticmethod def split_nodes_by_micro_step(nodes): """ diff --git a/debug/accuracy_tools/msprobe/visualization/graph/node_op.py b/debug/accuracy_tools/msprobe/visualization/graph/node_op.py index 33bfa9cc2e34a0960c3ff236a1bd183a5753a0ab..12072fff032ee1e26c5e8274cd1676679d531331 100644 --- a/debug/accuracy_tools/msprobe/visualization/graph/node_op.py +++ 
b/debug/accuracy_tools/msprobe/visualization/graph/node_op.py @@ -22,9 +22,9 @@ from msprobe.core.common.log import logger class NodeOp(Enum): module = 0 function_api = 1 + multi_collection = 8 api_collection = 9 - @staticmethod def get_node_op(node_name: str): """ @@ -37,5 +37,5 @@ class NodeOp(Enum): pattern = op_patterns[index] if re.match(pattern, node_name): return op - logger.warning(f"Cannot parsing node_name {node_name} into NodeOp, default parsing as module.") + logger.warning(f"Cannot parse node_name {node_name} into NodeOp, default parsing as module.") return NodeOp.module diff --git a/debug/accuracy_tools/msprobe/visualization/graph_service.py b/debug/accuracy_tools/msprobe/visualization/graph_service.py index 75b0014c1c09abb8dfecf285fed5eed3063827a0..d9ed834d517588b14e3cb0ccfe96701012580137 100644 --- a/debug/accuracy_tools/msprobe/visualization/graph_service.py +++ b/debug/accuracy_tools/msprobe/visualization/graph_service.py @@ -75,6 +75,9 @@ def _compare_graph(input_param, args): graph_n.overflow_check() graph_b.overflow_check() + if args.multi_mapping: + graph_comparator.multi_compare(args.multi_mapping) + return CompareGraphResult(graph_n, graph_b, graph_comparator, micro_steps) @@ -159,7 +162,7 @@ def _compare_graph_steps(input_param, args): bench_steps = sorted(check_and_return_dir_contents(dump_step_b, Const.STEP)) if npu_steps != bench_steps: - logger.error('The number of steps in the two runs are different. Unable to match the steps.') + logger.error('The number of steps in the two runs is different. Unable to match the steps.') raise CompareException(CompareException.INVALID_PATH_ERROR) for folder_step in npu_steps: @@ -217,6 +220,8 @@ def _graph_service_parser(parser): help=" Whether to perform a fuzzy match on the api name.", required=False) parser.add_argument("-cs", "--complete_stack", dest="complete_stack", action="store_true", help=" Whether to use complete stack information.", required=False) + parser.add_argument("-mm", "--multi_mapping", dest="multi_mapping", type=str, + help=" The multi mapping file path.", required=False) def _graph_service_command(args): diff --git a/debug/accuracy_tools/msprobe/visualization/utils.py b/debug/accuracy_tools/msprobe/visualization/utils.py index 623bcd11c45f1ff8e9c283d30a982af239706ce4..0aeb140e811696b3ade46ada559f08fa0b7249ea 100644 --- a/debug/accuracy_tools/msprobe/visualization/utils.py +++ b/debug/accuracy_tools/msprobe/visualization/utils.py @@ -42,14 +42,6 @@ def load_data_json_file(file_path): return load_json_file(file_path).get(GraphConst.DATA_KEY, {}) -def save_json_file(file_path, data): - """ - 保存json文件 - """ - with FileOpen(file_path, 'w') as f: - f.write(json.dumps(data, indent=4)) - - def get_csv_df(stack_mode, csv_data, compare_mode): """ 调用acc接口写入csv @@ -73,14 +65,6 @@ def str2float(percentage_str): return 0 -def is_integer(s): - try: - int(s) - return True - except Exception: - return False - - def check_directory_content(input_path): """ 检查input_path内容, 是否全是step{数字}命名的文件夹(例如step0), 或者全是rank{数字}命名的文件夹(例如rank0), 或者全是文件 @@ -182,13 +166,11 @@ class GraphConst: STR_MAX_LEN = 50 SMALL_VALUE = 1e-3 MD5_INDEX_LIST = [CompareConst.RESULT] - REAL_DATA_INDEX_LIST = [CompareConst.COSINE, CompareConst.MAX_ABS_ERR, CompareConst.MAX_RELATIVE_ERR, - CompareConst.ONE_THOUSANDTH_ERR_RATIO, CompareConst.FIVE_THOUSANDTHS_ERR_RATIO] - SUMMARY_INDEX_LIST = [CompareConst.MAX_DIFF, CompareConst.MIN_DIFF, CompareConst.MEAN_DIFF, - CompareConst.NORM_DIFF, CompareConst.MAX_RELATIVE_ERR, CompareConst.MIN_RELATIVE_ERR, - 
CompareConst.MEAN_RELATIVE_ERR, CompareConst.NORM_RELATIVE_ERR] + REAL_DATA_INDEX_LIST = CompareConst.ALL_COMPARE_INDEX + SUMMARY_INDEX_LIST = CompareConst.SUMMARY_COMPARE_INDEX VALUE_INDEX_LIST = [Const.MAX, Const.MIN, Const.MEAN, Const.NORM] APIS_BETWEEN_MODULES = 'Apis_Between_Modules' + MERGE_NODES = 'Merged_Nodes' NULL = 'null' NONE = 'None' VALUE = 'value' diff --git a/debug/accuracy_tools/setup.py b/debug/accuracy_tools/setup.py index 2da7fcf667765a841b9db1bbf5628fad5b1cf8a9..14fd15e3c06deef1d0e3b9ff26b199b02f6ce391 100644 --- a/debug/accuracy_tools/setup.py +++ b/debug/accuracy_tools/setup.py @@ -24,12 +24,12 @@ import setuptools INSTALL_REQUIRED = [ "wheel", "einops", - "numpy < 2.0", + "numpy >=1.23.0, < 2.0", "pandas >= 1.3.5, < 2.1", "pyyaml", "rich", "tqdm", - "openpyxl", + "openpyxl >= 3.0.6", "pyopenssl", "twisted", "matplotlib", diff --git a/dynolog_npu/README.md b/dynolog_npu/README.md deleted file mode 100644 index 9cc015e66c656c65fa48ad73a8246487a2016bef..0000000000000000000000000000000000000000 --- a/dynolog_npu/README.md +++ /dev/null @@ -1,148 +0,0 @@ -# Ascend Extension for dynolog - -## 安装方式 - -### 1. clone 代码 - -```bash -git clone https://gitee.com/ascend/mstt.git -``` - -### 2. 安装依赖 -dynolog的编译依赖,确保安装了以下依赖:
-| Language | Toolchain |
-| --- | --- |
-| C++ | gcc 8.5.0+ |
-| Rust | Rust 1.58.1 (1.56+ required for clap dependency) |
- -- 安装rust - -```bash -curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh - -source $HOME/.cargo/env -``` - -- 安装ninja - -```bash -# debian -sudo apt-get install -y cmake ninja-build - -# centos -sudo yum install -y cmake ninja -``` - -### 3. 编译 - -默认编译生成dyno和dynolog二进制文件, -t参数可以支持将二进制文件打包成deb包或rpm包. - -```bash -# 编译dyno和dynolog二进制文件 -bash scripts/build.sh - -# 编译deb包, 当前支持amd64和aarch64平台, 默认为amd64, 编译aarch64平台需要修改third_party/dynolog/scripts/debian/control文件中的Architecture改为aarch64 -bash scripts/build.sh -t deb - -# 编译rpm包, 当前只支持amd64平台 -bash scripts/build.sh -t rpm -``` - -## 使用方式 - -### Profiler trace dump功能 -Profiler trace dump功能基于dynolog开发,实现类似于动态profiling的动态触发Ascend Torch Profiler采集profiling的功能。用户基于dyno CLI命令行可以动态触发指定节点的训练进程trace dump。 - -- 查看nputrace支持的命令和帮助 - -```bash -dyno nputrace --help -``` - -- nputrace使用方式 - -```bash -dyno nputrace [SUBCOMMANDS] --log-file -``` - -nputrace子命令支持的参数选项 - -| 子命令 | 参数类型 | 说明 | -|-------|-------|-------| -| record_shapes | action | 是否采集算子的InputShapes和InputTypes,设置参数采集,默认不采集 | -| profile_memory | action | 是否采集算子内存信息,设置参数采集,默认不采集 | -| with_stack | action | 是否采集Python调用栈,设置参数采集,默认不采集 | -| with_flops | action | 是否采集算子flops,设置参数采集,默认不采集 | -| with_modules | action | 是否采集modules层级的Python调用栈,设置参数采集,默认不采集 | -| analyse | action | 采集后是否自动解析,设置参数解析,默认不解析 | -| l2_cache | action | 是否采集L2 Cache数据,设置参数采集,默认不采集 | -| op_attr | action | 是否采集算子属性信息,设置参数采集,默认不采集 | -| data_simplification | String | 解析完成后是否数据精简,可选值范围[`true`, `false`],默认值`true` | -| activities | String | 控制CPU、NPU事件采集范围,可选值范围[`CPU,NPU`, `NPU,CPU`, `CPU`, `NPU`],默认值`CPU,NPU` | -| profiler_level | String | 控制profiler的采集等级,可选值范围[`Level_none`, `Level0`, `Level1`, `Level2`],默认值`Level0`| -| aic_metrics | String | AI Core的性能指标采集项,可选值范围[`AiCoreNone`, `PipeUtilization`, `ArithmeticUtilization`, `Memory`, `MemoryL0`, `ResourceConflictRatio`, `MemoryUB`, `L2Cache`, `MemoryAccess`],默认值`AiCoreNone`| -| export_type | String | profiler解析导出数据的类型,可选值范围[`Text`, `Db`],默认值`Text`| -| gc_detect_threshold | Option | GC检测阈值,单位ms,只采集超过阈值的GC事件。该参数为可选参数,默认不设置时不开启GC检测 | - -- nputrace示例命令 - -```bash -# 示例1:采集框架、CANN和device数据,同时采集完后自动解析以及解析完成不做数据精简,落盘路径为/tmp/profile_data -dyno nputrace --activities CPU,NPU --analyse --data_simplification false --log-file /tmp/profile_data - -# 示例2:只采集CANN和device数据,同时采集完后自动解析以及解析完成后开启数据精简,落盘路径为/tmp/profile_data -dyno nputrace --activities NPU --analyse --data_simplification true --log-file /tmp/profile_data - -# 示例3:只采集CANN和device数据,只采集不解析,落盘路径为/tmp/profile_data -dyno nputrace --activities NPU --log-file /tmp/profile_data -``` - -### NPU Monitor功能 -NPU Monitor基于MSPTI/MSTX能力开发,实现了轻量级在线监控能力,能够用于性能问题的初步定位。 - -```bash -dyno npu-monitor --help -``` - -- npu-monitor使用方式 - -```bash -dyno npu-monitor [SUBCOMMANDS] -``` - -npu-monitor子命令支持的参数选项 -| 子命令 | 参数类型 | 说明 | -|-------|-------|-------| -| npu_monitor_start | action | 开启性能监控,设置参数开启,默认不采集 | -| npu_monitor_stop | action | 停止性能监控,设置参数开启,默认不采集 | -| report_interval_s | int | 性能监控数据上报周期,单位s,需要在启动时设置。默认值60 | -| mspti_activity_kind | String | 性能监控数据上报数据类型,可以设置单个或多个,多个类型以逗号分隔,需要在启动时设置。可选值范围[`Marker`, `Kernel`, `API`, `Hccl`, `Memory`, `MemSet`, `MemCpy`] , 默认值`Marker`| - -- npu-monitor示例命令 - -```bash -# 示例1:开启性能监控,使用默认配置 -dyno npu-monitor --npu_monitor_start - -# 示例2:暂停性能监控 -dyno npu-monitor --npu_monitor_stop - -# 示例3:开启性能监控,上报周期30s, 上报数据类型Marker和Kernel -dyno npu-monitor --npu_monitor_start 30 --mspti_activity_kind Marker,Kernel -``` \ No newline at end of file diff --git a/dynolog_npu/dynolog_npu/cli/src/commands/mod.rs 
b/dynolog_npu/dynolog_npu/cli/src/commands/mod.rs deleted file mode 100644 index 18950d3c1a01d972db58a614a46f08176b02c725..0000000000000000000000000000000000000000 --- a/dynolog_npu/dynolog_npu/cli/src/commands/mod.rs +++ /dev/null @@ -1,18 +0,0 @@ -// Copyright (c) Meta Platforms, Inc. and affiliates. -// -// This source code is licensed under the MIT license found in the -// LICENSE file in the root directory of this source tree. - -// Export all command submodules to be used in main.rs -// Note: This "intermediate" commands module is purely for organizational purposes. -// This allows for a clear distinction between the command dispatching code and the command -// handling code. Additionally, explicitly "exporting" all the command modules here allows -// us to avoid having to explicitly list all the command modules in main.rs. - -pub mod dcgm; -pub mod gputrace; -pub mod nputrace; -pub mod npumonitor; -pub mod status; -pub mod version; -// ... add new command modules here \ No newline at end of file diff --git a/dynolog_npu/dynolog_npu/cli/src/commands/npumonitor.rs b/dynolog_npu/dynolog_npu/cli/src/commands/npumonitor.rs deleted file mode 100644 index 1edfaea5939f5cee5df8618720d1bfa16d0071b5..0000000000000000000000000000000000000000 --- a/dynolog_npu/dynolog_npu/cli/src/commands/npumonitor.rs +++ /dev/null @@ -1,59 +0,0 @@ -use std::net::TcpStream; - -use anyhow::Result; - -#[path = "utils.rs"] -mod utils; - -#[derive(Debug)] -pub struct NpuMonitorConfig { - pub npu_monitor_start: bool, - pub npu_monitor_stop: bool, - pub report_interval_s: u32, - pub mspti_activity_kind: String, -} - -impl NpuMonitorConfig { - fn config(&self) -> String { - format!( - r#" -NPU_MONITOR_START={} -NPU_MONITOR_STOP={} -REPORT_INTERVAL_S={} -MSPTI_ACTIVITY_KIND={}"#, - self.npu_monitor_start, - self.npu_monitor_stop, - self.report_interval_s, - self.mspti_activity_kind - ) - } -} - -pub fn run_npumonitor( - client: TcpStream, - config: NpuMonitorConfig, -) -> Result<()> { - let config_str = config.config(); - println!("Npu monitor config = \n{}", config_str); - let config_str = config_str.replace('\n', "\\n"); - - let request_json = format!( - r#" -{{ - "fn": "setKinetOnDemandRequest", - "config": "{}", - "job_id": 0, - "pids": [0], - "process_limit": 3 -}}"#, - config_str - ); - - utils::send_msg(&client, &request_json).expect("Error sending message to service"); - - let resp_str = utils::get_resp(&client).expect("Unable to decode output bytes"); - - println!("response = {}", resp_str); - - Ok(()) -} diff --git a/dynolog_npu/dynolog_npu/cli/src/commands/nputrace.rs b/dynolog_npu/dynolog_npu/cli/src/commands/nputrace.rs deleted file mode 100644 index 4bf7132de338d8eee0de556449269712617772e2..0000000000000000000000000000000000000000 --- a/dynolog_npu/dynolog_npu/cli/src/commands/nputrace.rs +++ /dev/null @@ -1,242 +0,0 @@ -use std::net::TcpStream; - -use anyhow::Result; -use serde_json::Value; - -#[path = "utils.rs"] -mod utils; - -#[derive(Debug)] -pub enum NpuTraceTriggerConfig { - DurationBased { - profile_start_time: u64, - duration_ms: u64, - }, - IterationBased { - start_step: u64, - iterations: i64, - }, -} - -impl NpuTraceTriggerConfig { - fn config(&self) -> String { - match *self { - NpuTraceTriggerConfig::DurationBased { - profile_start_time, - duration_ms, - } => format!( - "PROFILE_START_TIME={}\nACTIVITIES_DURATION_MSECS={}", - profile_start_time, duration_ms - ), - NpuTraceTriggerConfig::IterationBased { - start_step, - iterations, - } => format!( - r#"PROFILE_START_ITERATION=0 
-PROFILE_START_STEP={} -ACTIVITIES_ITERATIONS={}"#, - start_step, iterations - ), - } - } -} - -// torch npu profiler config -#[derive(Debug)] -pub struct NpuTraceOptions { - pub record_shapes: bool, - pub profile_memory: bool, - pub with_stack: bool, - pub with_flops: bool, - pub with_modules: bool, - pub activities: String, - pub analyse: bool, - pub profiler_level: String, - pub aic_metrics: String, - pub l2_cache: bool, - pub op_attr: bool, - pub gc_detect_threshold: Option, - pub data_simplification: String, - pub export_type: String, -} - -impl NpuTraceOptions { - fn config(&self) -> String { - format!( - r#" -PROFILE_RECORD_SHAPES={} -PROFILE_PROFILE_MEMORY={} -PROFILE_WITH_STACK={} -PROFILE_WITH_FLOPS={} -PROFILE_WITH_MODULES={} -PROFILE_ACTIVITIES={} -PROFILE_ANALYSE={} -PROFILE_PROFILER_LEVEL={} -PROFILE_AIC_METRICS={} -PROFILE_L2_CACHE={} -PROFILE_OP_ATTR={} -PROFILE_GC_DETECT_THRESHOLD={} -PROFILE_DATA_SIMPLIFICATION={} -PROFILE_EXPORT_TYPE={}"#, - self.record_shapes, - self.profile_memory, - self.with_stack, - self.with_flops, - self.with_modules, - self.activities, - self.analyse, - self.profiler_level, - self.aic_metrics, - self.l2_cache, - self.op_attr, - self.gc_detect_threshold.map_or("None".to_string(), |v| v.to_string()), - self.data_simplification, - self.export_type - ) - } -} - -#[derive(Debug)] -pub struct NpuTraceConfig { - pub log_file: String, - pub trigger_config: NpuTraceTriggerConfig, - pub trace_options: NpuTraceOptions, -} - -impl NpuTraceConfig { - fn config(&self) -> String { - format!( - "ACTIVITIES_LOG_FILE={}\n{}{}", - self.log_file, - self.trigger_config.config(), - self.trace_options.config() - ) - } -} - -pub fn run_nputrace( - client: TcpStream, - job_id: u64, - pids: &str, - process_limit: u32, - config: NpuTraceConfig, -) -> Result<()> { - let config_str = config.config(); - println!("NpuTrace config = \n{}", config_str); - let config_str = config_str.replace('\n', "\\n"); - - let request_json = format!( - r#" -{{ - "fn": "setKinetOnDemandRequest", - "config": "{}", - "job_id": {}, - "pids": [{}], - "process_limit": {} -}}"#, - config_str, job_id, pids, process_limit - ); - - utils::send_msg(&client, &request_json).expect("Error sending message to service"); - - let resp_str = utils::get_resp(&client).expect("Unable to decode output bytes"); - - println!("response = {}", resp_str); - - let resp_v: Value = serde_json::from_str(&resp_str)?; - let processes = resp_v["processesMatched"].as_array().unwrap(); - - if processes.is_empty() { - println!("No processes were matched, please check --job-id or --pids flags"); - } else { - println!("Matched {} processes", processes.len()); - println!("Trace output files will be written to:"); - - for pid in processes { - let pid = pid.as_i64().unwrap(); - println!( - " {}", - config.log_file.replace(".json", &format!("_{}.json", pid)) - ); - } - } - - Ok(()) -} - - -#[cfg(test)] -mod test { - use crate::*; - - #[test] - fn test_nputrace_trigger_config() { - let trigger_config = NpuTraceTriggerConfig::DurationBased { - profile_start_time: 1000, - duration_ms: 1000, - }; - assert_eq!( - trigger_config.config(), - r#"PROFILE_START_TIME=1000 -ACTIVITIES_DURATION_MSECS=1000"# - ); - - let trigger_config = NpuTraceTriggerConfig::IterationBased { - profile_start_step: 1000, - iterations: 1000, - }; - assert_eq!( - trigger_config.config(), - r#"PROFILE_START_ITERATION=0 -PROFILE_START_STEP=1000 -ACTIVITIES_ITERATIONS=1000"# - ); - } - - #[test] - fn test_nputrace_config() { - let config = NpuTraceConfig { - log_file: 
"test.json".to_string(), - trigger_config: NpuTraceTriggerConfig::DurationBased { - profile_start_time: 1000, - duration_ms: 1000, - }, - trace_options: NpuTraceOptions { - record_shapes: true, - profile_memory: false, - with_stack: true, - with_flops: true, - with_modules: true, - activities: "CPU,NPU".to_string(), - analyse: false, - profiler_level: "Level0".to_string(), - aic_metrics: "AiCoreNone".to_string(), - l2_cache: true, - op_attr: true, - gc_detect_threshold: 0.1, - data_simplification: "true", - export_type: "Text".to_string(), - }, - }; - assert_eq!( - config.config(), - r#"ACTIVITIES_LOG_FILE=test.json -PROFILE_START_TIME=1000 -ACTIVITIES_DURATION_MSECS=1000 -PROFILE_RECORD_SHAPES=true -PROFILE_PROFILE_MEMORY=false -PROFILE_WITH_STACK=true -PROFILE_WITH_FLOPS=true -PROFILE_WITH_MODULES=true -PROFILE_ACTIVITIES=CPU,NPU -PROFILE_ANALYSE=false -PROFILE_PROFILER_LEVEL=Level0 -PROFILE_AIC_METRICS=AiCoreNone -PROFILE_L2_CACHE=true -PROFILE_OP_ATTR=true -PROFILE_GC_DETECT_THRESHOLD=0.1 -PROFILE_DATA_SIMPLIFICATION=true -PROFILE_EXPORT_TYPE=Text"# - ); - } -} diff --git a/dynolog_npu/dynolog_npu/cli/src/main.rs b/dynolog_npu/dynolog_npu/cli/src/main.rs deleted file mode 100644 index 8bc4a2af0e2c19d6e783663924578e3c2ad7408a..0000000000000000000000000000000000000000 --- a/dynolog_npu/dynolog_npu/cli/src/main.rs +++ /dev/null @@ -1,350 +0,0 @@ -// Copyright (c) Meta Platforms, Inc. and affiliates. -// -// This source code is licensed under the MIT license found in the -// LICENSE file in the root directory of this source tree. - -use std::net::TcpStream; -use std::net::ToSocketAddrs; - -use anyhow::Result; -use clap::Parser; -use std::collections::HashSet; - -// Make all the command modules accessible to this file. -mod commands; -use commands::gputrace::GpuTraceConfig; -use commands::gputrace::GpuTraceOptions; -use commands::gputrace::GpuTraceTriggerConfig; -use commands::nputrace::NpuTraceConfig; -use commands::nputrace::NpuTraceOptions; -use commands::nputrace::NpuTraceTriggerConfig; -use commands::npumonitor::NpuMonitorConfig; -use commands::*; - -/// Instructions on adding a new Dyno CLI command: -/// -/// 1. Add a new variant to the `Command` enum. -/// Please include a description of the command and, if applicable, its flags/subcommands. -/// -/// 2. Create a new file for the command's implementation in the commands/ directory (ie -/// commands/status.rs). This new file is where the command should be implemented. -/// Make the new command's module accessible from this file by adding -/// a new line with `pub mod ;` to commands/mod.rs. -/// -/// -/// 3. Add a branch to the match statement in main() to handle the new enum variant (from step 1). -/// From here, invoke the handling logic defined in the new file (from step 2). In an effort to keep -/// the command dispatching logic clear and concise, please keep the code in the match branch to a minimum. 
- -const DYNO_PORT: u16 = 1778; - -#[derive(Debug, Parser)] -struct Opts { - #[clap(long, default_value = "localhost")] - hostname: String, - #[clap(long, default_value_t = DYNO_PORT)] - port: u16, - #[clap(subcommand)] - cmd: Command, -} - -const ALLOWED_VALUES: &[&str] = &["Marker", "Kernel", "API", "Hccl", "Memory", "MemSet", "MemCpy"]; - -fn parse_mspti_activity_kinds(src: &str) -> Result{ - let allowed_values: HashSet<&str> = ALLOWED_VALUES.iter().cloned().collect(); - - let kinds: Vec<&str> = src.split(',').map(|s| s.trim()).collect(); - - for kind in &kinds { - if !allowed_values.contains(kind) { - return Err(format!("Invalid MSPTI activity kind: {}, Possible values: {:?}.]", kind, allowed_values)); - } - } - - Ok(src.to_string()) -} - -#[derive(Debug, Parser)] -enum Command { - /// Check the status of a dynolog process - Status, - /// Check the version of a dynolog process - Version, - /// Capture gputrace - Gputrace { - /// Job id of the application to trace. - #[clap(long, default_value_t = 0)] - job_id: u64, - /// List of pids to capture trace for (comma separated). - #[clap(long, default_value = "0")] - pids: String, - /// Duration of trace to collect in ms. - #[clap(long, default_value_t = 500)] - duration_ms: u64, - /// Training iterations to collect, this takes precedence over duration. - #[clap(long, default_value_t = -1)] - iterations: i64, - /// Log file for trace. - #[clap(long)] - log_file: String, - /// Unix timestamp used for synchronized collection (milliseconds since epoch). - #[clap(long, default_value_t = 0)] - profile_start_time: u64, - /// Start iteration roundup, starts an iteration based trace at a multiple - /// of this value. - #[clap(long, default_value_t = 1)] - profile_start_iteration_roundup: u64, - /// Max number of processes to profile. - #[clap(long, default_value_t = 3)] - process_limit: u32, - /// Record PyTorch operator input shapes and types. - #[clap(long, action)] - record_shapes: bool, - /// Profile PyTorch memory. - #[clap(long, action)] - profile_memory: bool, - /// Capture Python stacks in traces. - #[clap(long, action)] - with_stacks: bool, - /// Annotate operators with analytical flops. - #[clap(long, action)] - with_flops: bool, - /// Capture PyTorch operator modules in traces. - #[clap(long, action)] - with_modules: bool, - }, - /// Capture nputrace. Subcommand functions aligned with Ascend Torch Profiler. - Nputrace { - /// Job id of the application to trace. - #[clap(long, default_value_t = 0)] - job_id: u64, - /// List of pids to capture trace for (comma separated). - #[clap(long, default_value = "0")] - pids: String, - /// Duration of trace to collect in ms. - #[clap(long, default_value_t = 500)] - duration_ms: u64, - /// Training iterations to collect, this takes precedence over duration. - #[clap(long, default_value_t = -1)] - iterations: i64, - /// Log file for trace. - #[clap(long)] - log_file: String, - /// Unix timestamp used for synchronized collection (milliseconds since epoch). - #[clap(long, default_value_t = 0)] - profile_start_time: u64, - /// Number of steps to start profile. - #[clap(long, default_value_t = 0)] - start_step: u64, - /// Max number of processes to profile. - #[clap(long, default_value_t = 3)] - process_limit: u32, - /// Whether to record PyTorch operator input shapes and types. - #[clap(long, action)] - record_shapes: bool, - /// Whether to profile PyTorch memory. - #[clap(long, action)] - profile_memory: bool, - /// Whether to profile the Python call stack in trace. 
- #[clap(long, action)] - with_stack: bool, - /// Annotate operators with analytical flops. - #[clap(long, action)] - with_flops: bool, - /// Whether to profile PyTorch operator modules in traces. - #[clap(long, action)] - with_modules: bool, - /// The scope of the profile's events. - #[clap(long, value_parser = ["CPU,NPU", "NPU,CPU", "CPU", "NPU"], default_value = "CPU,NPU")] - activities: String, - /// Profiler level. - #[clap(long, value_parser = ["Level0", "Level1", "Level2", "Level_none"], default_value = "Level0")] - profiler_level: String, - /// AIC metrics. - #[clap(long, value_parser = ["AiCoreNone", "PipeUtilization", "ArithmeticUtilization", "Memory", "MemoryL0", "ResourceConflictRatio", "MemoryUB", "L2Cache", "MemoryAccess"], default_value = "AiCoreNone")] - aic_metrics: String, - /// Whether to analyse the data after collection. - #[clap(long, action)] - analyse: bool, - /// Whether to collect L2 cache. - #[clap(long, action)] - l2_cache: bool, - /// Whether to collect op attributes. - #[clap(long, action)] - op_attr: bool, - /// GC detect threshold. - #[clap(long)] - gc_detect_threshold: Option, - /// Whether to streamline data after analyse is complete. - #[clap(long, value_parser = ["true", "false"], default_value = "true")] - data_simplification: String, - /// Types of data exported by the profiler. - #[clap(long, value_parser = ["Text", "Db"], default_value = "Text")] - export_type: String, - }, - /// Ascend MSPTI Monitor - NpuMonitor { - /// Start NPU monitor. - #[clap(long, action)] - npu_monitor_start: bool, - /// Stop NPU monitor. - #[clap(long, action)] - npu_monitor_stop: bool, - /// NPU monitor report interval in seconds. - #[clap(long, default_value_t = 60)] - report_interval_s: u32, - /// MSPTI collect activity kind - #[clap(long, value_parser = parse_mspti_activity_kinds, default_value = "Marker")] - mspti_activity_kind: String, - }, - /// Pause dcgm profiling. This enables running tools like Nsight compute and avoids conflicts. - DcgmPause { - /// Duration to pause dcgm profiling in seconds - #[clap(long, default_value_t = 300)] - duration_s: i32, - }, - /// Resume dcgm profiling - DcgmResume, -} - -/// Create a socket connection to dynolog -fn create_dyno_client(host: &str, port: u16) -> Result { - let addr = (host, port) - .to_socket_addrs()? 
- .next() - .expect("Failed to connect to the server"); - - TcpStream::connect(addr).map_err(|err| err.into()) -} - -fn main() -> Result<()> { - let Opts { - hostname, - port, - cmd, - } = Opts::parse(); - - let dyno_client = - create_dyno_client(&hostname, port).expect("Couldn't connect to the server..."); - - match cmd { - Command::Status => status::run_status(dyno_client), - Command::Version => version::run_version(dyno_client), - Command::Gputrace { - job_id, - pids, - log_file, - duration_ms, - iterations, - profile_start_time, - profile_start_iteration_roundup, - process_limit, - record_shapes, - profile_memory, - with_stacks, - with_flops, - with_modules, - } => { - let trigger_config = if iterations > 0 { - GpuTraceTriggerConfig::IterationBased { - profile_start_iteration_roundup, - iterations, - } - } else { - GpuTraceTriggerConfig::DurationBased { - profile_start_time, - duration_ms, - } - }; - let trace_options = GpuTraceOptions { - record_shapes, - profile_memory, - with_stacks, - with_flops, - with_modules, - }; - let trace_config = GpuTraceConfig { - log_file, - trigger_config, - trace_options, - }; - gputrace::run_gputrace(dyno_client, job_id, &pids, process_limit, trace_config) - } - Command::Nputrace { - job_id, - pids, - log_file, - duration_ms, - iterations, - profile_start_time, - start_step, - process_limit, - record_shapes, - profile_memory, - with_stack, - with_flops, - with_modules, - activities, - analyse, - profiler_level, - aic_metrics, - l2_cache, - op_attr, - gc_detect_threshold, - data_simplification, - export_type, - } => { - let trigger_config = if iterations > 0 { - NpuTraceTriggerConfig::IterationBased { - start_step, - iterations, - } - } else { - NpuTraceTriggerConfig::DurationBased { - profile_start_time, - duration_ms, - } - }; - - let trace_options = NpuTraceOptions { - record_shapes, - profile_memory, - with_stack, - with_flops, - with_modules, - activities, - analyse, - profiler_level, - aic_metrics, - l2_cache, - op_attr, - gc_detect_threshold, - data_simplification, - export_type, - }; - let trace_config = NpuTraceConfig { - log_file, - trigger_config, - trace_options, - }; - nputrace::run_nputrace(dyno_client, job_id, &pids, process_limit, trace_config) - } - Command::NpuMonitor { - npu_monitor_start, - npu_monitor_stop, - report_interval_s, - mspti_activity_kind, - } => { - let npu_mon_config = NpuMonitorConfig { - npu_monitor_start, - npu_monitor_stop, - report_interval_s, - mspti_activity_kind - }; - npumonitor::run_npumonitor(dyno_client, npu_mon_config) - } - Command::DcgmPause { duration_s } => dcgm::run_dcgm_pause(dyno_client, duration_s), - Command::DcgmResume => dcgm::run_dcgm_resume(dyno_client), - // ... add new commands here - } -} \ No newline at end of file diff --git a/dynolog_npu/dynolog_npu/dynolog/src/Main.cpp b/dynolog_npu/dynolog_npu/dynolog/src/Main.cpp deleted file mode 100644 index 8e5177768327e37173d4e7661e334a9400bd6172..0000000000000000000000000000000000000000 --- a/dynolog_npu/dynolog_npu/dynolog/src/Main.cpp +++ /dev/null @@ -1,206 +0,0 @@ -// Copyright (c) Meta Platforms, Inc. and affiliates. -// -// This source code is licensed under the MIT license found in the -// LICENSE file in the root directory of this source tree. - -// Dynolog : A portable telemetry monitoring daemon. 
- -#include -#include -#include -#include -#include -#include "dynolog/src/CompositeLogger.h" -#include "dynolog/src/FBRelayLogger.h" -#include "dynolog/src/KernelCollector.h" -#include "dynolog/src/Logger.h" -#include "dynolog/src/ODSJsonLogger.h" -#include "dynolog/src/PerfMonitor.h" -#include "dynolog/src/ScubaLogger.h" -#include "dynolog/src/ServiceHandler.h" -#include "dynolog/src/gpumon/DcgmGroupInfo.h" -#include "dynolog/src/rpc/SimpleJsonServer.h" -#include "dynolog/src/rpc/SimpleJsonServerInl.h" -#include "dynolog/src/tracing/IPCMonitor.h" -#include "hbt/src/perf_event/BuiltinMetrics.h" - -#ifdef USE_PROMETHEUS -#include "dynolog/src/PrometheusLogger.h" -#endif - -using namespace dynolog; -using json = nlohmann::json; -namespace hbt = facebook::hbt; - -DEFINE_int32(port, 1778, "Port for listening RPC requests."); -DEFINE_bool(use_JSON, false, "Emit metrics to JSON file through JSON logger"); -#ifdef USE_PROMETHEUS -DEFINE_bool(use_prometheus, false, "Emit metrics to Prometheus"); -#endif -DEFINE_bool(use_fbrelay, false, "Emit metrics to FB Relay on Lab machines"); -DEFINE_bool(use_ODS, false, "Emit metrics to ODS through ODS logger"); -DEFINE_bool(use_scuba, false, "Emit metrics to Scuba through Scuba logger"); -DEFINE_int32( - kernel_monitor_reporting_interval_s, - 60, - "Duration in seconds to read and report metrics for kernel monitor"); -DEFINE_int32( - perf_monitor_reporting_interval_s, - 60, - "Duration in seconds to read and report metrics for performance monitor"); -DEFINE_int32( - dcgm_reporting_interval_s, - 10, - "Duration in seconds to read and report metrics for DCGM"); -DEFINE_bool( - enable_ipc_monitor, - false, - "Enabled IPC monitor for on system tracing requests."); -DEFINE_bool( - enable_gpu_monitor, - false, - "Enabled GPU monitorng, currently supports NVIDIA GPUs."); -DEFINE_bool(enable_perf_monitor, false, "Enable heartbeat perf monitoring."); - -std::unique_ptr getLogger(const std::string& scribe_category = "") { - std::vector> loggers; -#ifdef USE_PROMETHEUS - if (FLAGS_use_prometheus) { - loggers.push_back(std::make_unique()); - } -#endif - if (FLAGS_use_fbrelay) { - loggers.push_back(std::make_unique()); - } - if (FLAGS_use_ODS) { - loggers.push_back(std::make_unique()); - } - if (FLAGS_use_JSON) { - loggers.push_back(std::make_unique()); - } - if (FLAGS_use_scuba && !scribe_category.empty()) { - loggers.push_back(std::make_unique(scribe_category)); - } - return std::make_unique(std::move(loggers)); -} - -auto next_wakeup(int sec) { - return std::chrono::steady_clock::now() + std::chrono::seconds(sec); -} - -void kernel_monitor_loop() { - KernelCollector kc; - - LOG(INFO) << "Running kernel monitor loop : interval = " - << FLAGS_kernel_monitor_reporting_interval_s << " s."; - - while (1) { - auto logger = getLogger(); - auto wakeup_timepoint = - next_wakeup(FLAGS_kernel_monitor_reporting_interval_s); - - kc.step(); - kc.log(*logger); - logger->finalize(); - - /* sleep override */ - std::this_thread::sleep_until(wakeup_timepoint); - } -} - -void perf_monitor_loop() { - PerfMonitor pm( - hbt::CpuSet::makeAllOnline(), - std::vector{"instructions", "cycles"}, - getDefaultPmuDeviceManager(), - getDefaultMetrics()); - - LOG(INFO) << "Running perf monitor loop : interval = " - << FLAGS_perf_monitor_reporting_interval_s << " s."; - - while (1) { - auto logger = getLogger(); - auto wakeup_timepoint = - next_wakeup(FLAGS_perf_monitor_reporting_interval_s); - - pm.step(); - pm.log(*logger); - - logger->finalize(); - /* sleep override */ - 
std::this_thread::sleep_until(wakeup_timepoint); - } -} - -auto setup_server(std::shared_ptr handler) { - return std::make_unique>( - handler, FLAGS_port); -} - -void gpu_monitor_loop(std::shared_ptr dcgm) { - auto logger = getLogger(FLAGS_scribe_category); - - LOG(INFO) << "Running DCGM loop : interval = " - << FLAGS_dcgm_reporting_interval_s << " s."; - LOG(INFO) << "DCGM fields: " << gpumon::FLAGS_dcgm_fields; - - while (1) { - auto wakeup_timepoint = next_wakeup(FLAGS_dcgm_reporting_interval_s); - - dcgm->update(); - dcgm->log(*logger); - - /* sleep override */ - std::this_thread::sleep_until(wakeup_timepoint); - } -} - -int main(int argc, char** argv) { - gflags::ParseCommandLineFlags(&argc, &argv, true); - FLAGS_logtostderr = 1; - google::InitGoogleLogging(argv[0]); - - LOG(INFO) << "Starting Ascend Extension for dynolog, version = " DYNOLOG_VERSION - << ", build git-hash = " DYNOLOG_GIT_REV; - - std::shared_ptr dcgm; - - std::unique_ptr ipcmon; - std::unique_ptr ipcmon_thread, gpumon_thread, pm_thread; - - if (FLAGS_enable_ipc_monitor) { - LOG(INFO) << "Starting IPC Monitor"; - ipcmon = std::make_unique(); - ipcmon_thread = - std::make_unique([&ipcmon]() { ipcmon->loop(); }); - } - - if (FLAGS_enable_gpu_monitor) { - dcgm = gpumon::DcgmGroupInfo::factory( - gpumon::FLAGS_dcgm_fields, FLAGS_dcgm_reporting_interval_s * 1000); - gpumon_thread = std::make_unique(gpu_monitor_loop, dcgm); - } - std::thread km_thread{kernel_monitor_loop}; - if (FLAGS_enable_perf_monitor) { - pm_thread = std::make_unique(perf_monitor_loop); - } - - // setup service - auto handler = std::make_shared(dcgm); - - // use simple json RPC server for now - auto server = setup_server(handler); - server->run(); - - km_thread.join(); - if (pm_thread) { - pm_thread->join(); - } - if (gpumon_thread) { - gpumon_thread->join(); - } - - server->stop(); - - return 0; -} \ No newline at end of file diff --git a/dynolog_npu/plugin/Readme.md b/dynolog_npu/plugin/Readme.md deleted file mode 100644 index c59bfffad5aaac5383b407e3ff3d23ed126131f5..0000000000000000000000000000000000000000 --- a/dynolog_npu/plugin/Readme.md +++ /dev/null @@ -1,17 +0,0 @@ - - -# Build and Install npu-dynolog-plugin -``` -# install pybind11 -pip install pybind11 - -# build dynolog_npu_plugin wheel -python3 setup.py bdist_wheel -# install -pip install dist/{dynolog-npu-plugin-xxx.wheel} - -# example -import IPCMonitor -dyno_worker = IPCMonitor.PyDynamicMonitorProxy() -dyno_worker.init_dyno(0) -``` diff --git a/dynolog_npu/plugin/bindings.cpp b/dynolog_npu/plugin/bindings.cpp deleted file mode 100644 index c0cdaa4d577b3a76ec2d6f3eae4b426556a56532..0000000000000000000000000000000000000000 --- a/dynolog_npu/plugin/bindings.cpp +++ /dev/null @@ -1,11 +0,0 @@ -#include -#include "ipc_monitor/PyDynamicMonitorProxy.h" - -namespace py = pybind11; - -PYBIND11_MODULE(IPCMonitor, m) { - py::class_(m, "PyDynamicMonitorProxy") - .def(py::init<>()) - .def("init_dyno", &dynolog_npu::ipc_monitor::PyDynamicMonitorProxy::InitDyno, py::arg("npuId")) - .def("poll_dyno", &dynolog_npu::ipc_monitor::PyDynamicMonitorProxy::PollDyno); -} \ No newline at end of file diff --git a/dynolog_npu/plugin/build.sh b/dynolog_npu/plugin/build.sh deleted file mode 100755 index ce20d9d2be546afbc63e3aace524f74858eff6ff..0000000000000000000000000000000000000000 --- a/dynolog_npu/plugin/build.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -# install pybind11 -pip install pybind11 - -# build dynolog_npu_plugin wheel -python3 setup.py bdist_wheel - -# find .whl files in dist -files=$(find 
diff --git a/dynolog_npu/plugin/bindings.cpp b/dynolog_npu/plugin/bindings.cpp
deleted file mode 100644
index c0cdaa4d577b3a76ec2d6f3eae4b426556a56532..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/bindings.cpp
+++ /dev/null
@@ -1,11 +0,0 @@
-#include <pybind11/pybind11.h>
-#include "ipc_monitor/PyDynamicMonitorProxy.h"
-
-namespace py = pybind11;
-
-PYBIND11_MODULE(IPCMonitor, m) {
-    py::class_<dynolog_npu::ipc_monitor::PyDynamicMonitorProxy>(m, "PyDynamicMonitorProxy")
-        .def(py::init<>())
-        .def("init_dyno", &dynolog_npu::ipc_monitor::PyDynamicMonitorProxy::InitDyno, py::arg("npuId"))
-        .def("poll_dyno", &dynolog_npu::ipc_monitor::PyDynamicMonitorProxy::PollDyno);
-}
\ No newline at end of file
diff --git a/dynolog_npu/plugin/build.sh b/dynolog_npu/plugin/build.sh
deleted file mode 100755
index ce20d9d2be546afbc63e3aace524f74858eff6ff..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/build.sh
+++ /dev/null
@@ -1,21 +0,0 @@
-#!/bin/bash
-
-# install pybind11
-pip install pybind11
-
-# build dynolog_npu_plugin wheel
-python3 setup.py bdist_wheel
-
-# find .whl files in dist
-files=$(find dist -type f -name "*.whl" 2>/dev/null)
-count=$(echo "$files" | wc -l)
-if [ "$count" -eq 1 ]; then
-    echo "found .whl in dist: $files"
-else
-    echo "found no or multiple .whl files in dist"
-    exit 1
-fi
-
-# pip install whl
-echo "pip install ${files}"
-pip install ${files}
\ No newline at end of file
diff --git a/dynolog_npu/plugin/ipc_monitor/DynoLogNpuMonitor.cpp b/dynolog_npu/plugin/ipc_monitor/DynoLogNpuMonitor.cpp
deleted file mode 100644
index 940f5aae167f088361057fe2a7a389a76f5bb2b4..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/DynoLogNpuMonitor.cpp
+++ /dev/null
@@ -1,36 +0,0 @@
-#include "DynoLogNpuMonitor.h"
-
-#include <iostream>
-
-#include "utils.h"
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-bool DynoLogNpuMonitor::Init()
-{
-    if (isInitialized_) {
-        std::cout << "[WARNING] DynoLog npu monitor already initialized" << std::endl;
-        return true;
-    }
-    bool res = ipcClient_.RegisterInstance(npuId_);
-    if (res) {
-        isInitialized_ = true;
-        std::cout << "[INFO] DynoLog npu monitor initialized successfully!" << std::endl;
-    }
-    return res;
-}
-
-std::string DynoLogNpuMonitor::Poll()
-{
-    std::string res = ipcClient_.IpcClientNpuConfig();
-    if (res.empty()) {
-        std::cout << "[INFO] Response from dynolog server is empty!" << std::endl;
-        return "";
-    }
-    std::cout << "[INFO] Received NPU configuration successfully" << std::endl;
-    return res;
-}
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
\ No newline at end of file
diff --git a/dynolog_npu/plugin/ipc_monitor/DynoLogNpuMonitor.h b/dynolog_npu/plugin/ipc_monitor/DynoLogNpuMonitor.h
deleted file mode 100644
index 40ee21072710312a86cd75befdcefa67e24efb8f..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/DynoLogNpuMonitor.h
+++ /dev/null
@@ -1,33 +0,0 @@
-#ifndef DYNOLOG_NPU_MONITOR_H
-#define DYNOLOG_NPU_MONITOR_H
-
-#include "MonitorBase.h"
-#include "NpuIpcClient.h"
-#include "singleton.h"
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-class DynoLogNpuMonitor : public MonitorBase, public Singleton<DynoLogNpuMonitor> {
-    friend class Singleton<DynoLogNpuMonitor>;
-
-public:
-    DynoLogNpuMonitor() = default;
-    bool Init() override;
-    std::string Poll() override;
-    void SetNpuId(int id) override
-    {
-        npuId_ = id;
-    }
-
-private:
-    bool isInitialized_ = false;
-    int32_t npuId_ = 0;
-    IpcClient ipcClient_;
-};
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
-
-#endif
-
diff --git a/dynolog_npu/plugin/ipc_monitor/MonitorBase.h b/dynolog_npu/plugin/ipc_monitor/MonitorBase.h
deleted file mode 100644
index 108023c7624b747e5987be9184d6c594decd360a..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/MonitorBase.h
+++ /dev/null
@@ -1,18 +0,0 @@
-#ifndef MONITOR_BASE_H
-#define MONITOR_BASE_H
-#include <string>
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-class MonitorBase {
-public:
-    virtual bool Init() = 0;
-    virtual std::string Poll() = 0;
-    virtual void SetNpuId(int id) = 0;
-};
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
-
-#endif
\ No newline at end of file
diff --git a/dynolog_npu/plugin/ipc_monitor/NpuIpcClient.cpp b/dynolog_npu/plugin/ipc_monitor/NpuIpcClient.cpp
deleted file mode 100644
index 97966e8eeacc7276426feb237aa122eb8dee046f..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/NpuIpcClient.cpp
+++ /dev/null
@@ -1,138 +0,0 @@
-#include "NpuIpcClient.h"
-
-#include <unistd.h>
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-bool IpcClient::RegisterInstance(int32_t id)
-{
-    NpuContext context{
-        .npu = id,
-        .pid = getpid(),
-        .jobId = JOB_ID,
-    };
-    std::unique_ptr<Message> message = Message::ConstructMessage(context, "ctxt");
-    try {
-        if (!SyncSendMessage(*message, std::string(DYNO_IPC_NAME))) {
-            std::cout << "[WARNING] Failed to send register ctxt for pid " << context.pid << " with dyno" << std::endl;
-            return false;
-        }
-    } catch (const std::exception &e) {
-        std::cout << "[WARNING] Error when SyncSendMessage: " << e.what() << std::endl;
-        return false;
-    }
-    std::cout << "[INFO] Register pid " << context.pid << " for dynolog succeeded!" << std::endl;
-    return true;
-}
-std::string IpcClient::IpcClientNpuConfig()
-{
-    auto size = pids_.size();
-    auto *req = (NpuRequest *)malloc(sizeof(NpuRequest) + sizeof(int32_t) * size);
-    if (req == nullptr) { // guard against allocation failure before writing the fields
-        std::cout << "[WARNING] Failed to allocate NpuRequest!" << std::endl;
-        return "";
-    }
-    req->type = DYNO_IPC_TYPE;
-    req->pidSize = size;
-    req->jobId = JOB_ID;
-    for (size_t i = 0; i < size; i++) {
-        req->pids[i] = pids_[i];
-    }
-    std::unique_ptr<Message> message = Message::ConstructMessage<NpuRequest, int32_t>(*req, "req", size);
-    if (!SyncSendMessage(*message, std::string(DYNO_IPC_NAME))) {
-        std::cout << "[WARNING] Failed to send config to dyno server!" << std::endl;
-        free(req);
-        req = nullptr;
-        return "";
-    }
-    free(req);
-    message = PollRecvMessage(MAX_IPC_RETRIES, MAX_SLEEP_US);
-    if (!message) {
-        std::cout << "[WARNING] Failed to receive on-demand config!" << std::endl;
-        return "";
-    }
-    std::string res = std::string(ReinterpretConvert<char *>(message->buf.get()), message->metadata.size);
-
-    return res;
-}
-std::unique_ptr<Message> IpcClient::ReceiveMessage()
-{
-    std::lock_guard<std::mutex> wguard(dequeLock_);
-    if (msgDynoDeque_.empty()) {
-        return nullptr;
-    }
-    std::unique_ptr<Message> message = std::move(msgDynoDeque_.front());
-    msgDynoDeque_.pop_front();
-    return message;
-}
-bool IpcClient::SyncSendMessage(const Message &message, const std::string &destName, int numRetry, int sleepTimeUs)
-{
-    if (destName.empty()) {
-        std::cout << "[WARNING] Cannot send to an empty socket name!" << std::endl;
-        return false;
-    }
-    int i = 0;
-    std::vector<NpuPayLoad> npuPayLoad{ NpuPayLoad(sizeof(struct Metadata), (void *)&message.metadata),
-        NpuPayLoad(message.metadata.size, message.buf.get()) };
-    try {
-        auto ctxt = ep_.BuildSendNpuCtxt(destName, npuPayLoad, std::vector<fileDesT>());
-        while (!ep_.TrySendMessage(*ctxt) && i < numRetry) {
-            i++;
-            usleep(sleepTimeUs);
-            sleepTimeUs *= 2; // 2: double the sleep time on each retry
-        }
-    } catch (const std::exception &e) {
-        std::cout << "[ERROR] Error when SyncSendMessage: " << e.what() << std::endl;
-        return false;
-    }
-    return i < numRetry;
-}
-bool IpcClient::Recv()
-{
-    try {
-        Metadata recvMetadata;
-        std::vector<NpuPayLoad> peekNpuPayLoad{ NpuPayLoad(sizeof(struct Metadata), &recvMetadata) };
-        auto peekCtxt = ep_.BuildNpuRcvCtxt(peekNpuPayLoad);
-        bool successFlag = false;
-        try {
-            successFlag = ep_.TryPeekMessage(*peekCtxt);
-        } catch (std::exception &e) {
-            std::cout << "[ERROR] Error when TryPeekMessage: " << e.what() << std::endl;
-            return false;
-        }
-        if (successFlag) {
-            std::unique_ptr<Message> npuMessage = std::make_unique<Message>(Message());
-            npuMessage->metadata = recvMetadata;
-            npuMessage->buf = std::make_unique<uint8_t[]>(recvMetadata.size);
-            npuMessage->src = std::string(ep_.GetName(*peekCtxt));
-            std::vector<NpuPayLoad> npuPayLoad{ NpuPayLoad(sizeof(struct Metadata), (void *)&npuMessage->metadata),
-                NpuPayLoad(recvMetadata.size, npuMessage->buf.get()) };
-            auto recvCtxt = ep_.BuildNpuRcvCtxt(npuPayLoad);
-            try {
-                successFlag = ep_.TryRcvMessage(*recvCtxt);
-            } catch (std::exception &e) {
-                std::cout << "[ERROR] Error when TryRecvMsg: " << e.what() << std::endl;
-                return false;
-            }
-            if (successFlag) {
-                std::lock_guard<std::mutex> wguard(dequeLock_);
-                msgDynoDeque_.push_back(std::move(npuMessage));
-                return true;
-            }
-        }
-    } catch (std::exception &e) {
-        std::cout << "[ERROR] Error in Recv(): " << e.what() << std::endl;
-        return false;
-    }
-    return false;
-}
-std::unique_ptr<Message> IpcClient::PollRecvMessage(int maxRetry, int sleepTimeUs)
-{
-    for (int i = 0; i < maxRetry; i++) {
-        if (Recv()) {
-            return ReceiveMessage();
-        }
-        usleep(sleepTimeUs);
-    }
-    return nullptr;
-}
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
\ No newline at end of file
diff --git a/dynolog_npu/plugin/ipc_monitor/NpuIpcClient.h b/dynolog_npu/plugin/ipc_monitor/NpuIpcClient.h
deleted file mode 100644
index ae7b00eb51b935db4e799fab470c3343e78bcb6f..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/NpuIpcClient.h
+++ /dev/null
@@ -1,103 +0,0 @@
-#ifndef NPU_IPC_CLIENT_H
-#define NPU_IPC_CLIENT_H
-#include <cstdint>
-#include <cstring>
-#include <deque>
-#include <memory>
-#include <mutex>
-#include <stdexcept>
-#include <string>
-#include <vector>
-#include "NpuIpcEndPoint.h"
-#include "utils.h"
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-constexpr int TYPE_SIZE = 32;
-constexpr int JOB_ID = 0;
-constexpr const char *DYNO_IPC_NAME = "dynolog";
-constexpr const int DYNO_IPC_TYPE = 3;
-constexpr const int MAX_IPC_RETRIES = 5;
-constexpr const int MAX_SLEEP_US = 10000;
-struct NpuRequest {
-    int type;
-    int pidSize;
-    int64_t jobId;
-    int32_t pids[0];
-};
-struct NpuContext {
-    int32_t npu;
-    pid_t pid;
-    int64_t jobId;
-};
-struct Metadata {
-    size_t size = 0;
-    char type[TYPE_SIZE] = "";
-};
-struct Message {
-    Metadata metadata;
-    std::unique_ptr<uint8_t[]> buf;
-    std::string src;
-    template <typename T> static std::unique_ptr<Message> ConstructMessage(const T &data, const std::string &type)
-    {
-        std::unique_ptr<Message> ipcNpuMessage = std::make_unique<Message>(Message());
-        if (type.size() + 1 > sizeof(ipcNpuMessage->metadata.type)) {
-            throw std::runtime_error("Type string is too long to fit in metadata.type" + IPC_ERROR(ErrCode::PARAM));
-        }
-        memcpy(ipcNpuMessage->metadata.type, type.c_str(), type.size() + 1);
-#if __cplusplus >= 201703L
-        if constexpr (std::is_same<T, std::string>::value == true) {
-            ipcNpuMessage->metadata.size = data.size();
-            ipcNpuMessage->buf = std::make_unique<uint8_t[]>(ipcNpuMessage->metadata.size);
-            memcpy(ipcNpuMessage->buf.get(), data.c_str(), ipcNpuMessage->metadata.size); // copy the string body, not sizeof(std::string)
-            return ipcNpuMessage;
-        }
-#endif
-        static_assert(std::is_trivially_copyable<T>::value);
-        ipcNpuMessage->metadata.size = sizeof(data);
-        ipcNpuMessage->buf = std::make_unique<uint8_t[]>(ipcNpuMessage->metadata.size);
-        memcpy(ipcNpuMessage->buf.get(), &data, sizeof(data));
-        return ipcNpuMessage;
-    }
-
-    template <typename T, typename U>
-    static std::unique_ptr<Message> ConstructMessage(const T &data, const std::string &type, int n)
-    {
-        std::unique_ptr<Message> ipcNpuMessage = std::make_unique<Message>(Message());
-        if (type.size() + 1 > sizeof(ipcNpuMessage->metadata.type)) {
-            throw std::runtime_error("Type string is too long to fit in metadata.type" + IPC_ERROR(ErrCode::PARAM));
-        }
-        memcpy(ipcNpuMessage->metadata.type, type.c_str(), type.size() + 1);
-        static_assert(std::is_trivially_copyable<T>::value);
-        static_assert(std::is_trivially_copyable<U>::value);
-        ipcNpuMessage->metadata.size = sizeof(data) + sizeof(U) * n;
-        ipcNpuMessage->buf = std::make_unique<uint8_t[]>(ipcNpuMessage->metadata.size);
-        memcpy(ipcNpuMessage->buf.get(), &data, ipcNpuMessage->metadata.size);
-        return ipcNpuMessage;
-    }
-};
-class IpcClient {
-public:
-    IpcClient(const IpcClient &) = delete;
-    IpcClient &operator = (const IpcClient &) = delete;
-    IpcClient() = default;
-    bool RegisterInstance(int32_t npu);
-    std::string IpcClientNpuConfig();
-
-private:
-    std::vector<int32_t> pids_ = GetPids();
-    NpuIpcEndPoint<0> ep_{ "dynoconfigclient" + GenerateUuidV4() };
-    std::mutex dequeLock_;
-    std::deque<std::unique_ptr<Message>> msgDynoDeque_;
-    std::unique_ptr<Message> ReceiveMessage();
-    bool SyncSendMessage(const Message &message, const std::string &destName, int numRetry = 10,
-        int sleepTimeUs = 10000);
-    bool Recv();
-    std::unique_ptr<Message> PollRecvMessage(int maxRetry, int sleepTimeUs);
-};
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
-
-#endif
\ No newline at end of file
diff --git a/dynolog_npu/plugin/ipc_monitor/NpuIpcEndPoint.h b/dynolog_npu/plugin/ipc_monitor/NpuIpcEndPoint.h
deleted file mode 100644
index 6560fa515646226ddbffbca49c4f818eb0d0ebcf..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/NpuIpcEndPoint.h
+++ /dev/null
@@ -1,204 +0,0 @@
-#ifndef NPU_IPC_ENDPOINT_H
-#define NPU_IPC_ENDPOINT_H
-#include <cerrno>
-#include <cstring>
-#include <memory>
-#include <stdexcept>
-#include <string>
-#include <vector>
-#include <sys/socket.h>
-#include <sys/stat.h>
-#include <sys/un.h>
-#include <unistd.h>
-#include "utils.h"
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-using fileDesT = int;
-constexpr const char STR_END_CHAR = '\0';
-constexpr int SOCKET_FD_CHMOD = 0666;
-
-struct NpuPayLoad {
-    size_t size;
-    void *data;
-    NpuPayLoad(size_t size, void *data) : size(size), data(data) {}
-};
-
-template <size_t MaxNumFileDes> struct NpuIpcEndPointCtxt {
-    struct sockaddr_un messageName;
-    size_t messageLen;
-    fileDesT *fileDesPtr;
-    struct msghdr msghdr;
-    std::vector<struct iovec> iov;
-    char ancillaryBuf[CMSG_SPACE(MaxNumFileDes * sizeof(fileDesT))];
-    explicit NpuIpcEndPointCtxt(size_t num) : iov(std::vector<struct iovec>(num)){};
-};
-
-template <size_t MaxNumFileDes> class NpuIpcEndPoint final {
-    using Ctxt = NpuIpcEndPointCtxt<MaxNumFileDes>;
-
-public:
-    constexpr static size_t addressMaxLen = 108 - 2; // Max unix socket path length
-    explicit NpuIpcEndPoint(const std::string &addressName)
-    {
-        socketFd = socket(AF_UNIX, SOCK_DGRAM, 0);
-        if (socketFd == -1) {
-            throw std::runtime_error(std::strerror(errno) + IPC_ERROR(ErrCode::PARAM));
-        }
-        struct sockaddr_un address;
-        size_t addressLen = SetSocketAddress(addressName, address);
-        if (address.sun_path[0] != STR_END_CHAR) {
-            unlink(address.sun_path);
-        }
-        int res = bind(socketFd, ReinterpretConvert<struct sockaddr *>(&address), addressLen);
-        if (res == -1) {
-            throw std::runtime_error("Bind socket failed." + IPC_ERROR(ErrCode::PARAM));
-        }
-        if (address.sun_path[0] != STR_END_CHAR) {
-            chmod(address.sun_path, SOCKET_FD_CHMOD);
-        }
-    }
-    ~NpuIpcEndPoint()
-    {
-        close(socketFd);
-    }
-    [[nodiscard]] auto BuildSendNpuCtxt(const std::string &desAddrName, const std::vector<NpuPayLoad> &npuPayLoad,
-        const std::vector<fileDesT> &fileDes)
-    {
-        if (fileDes.size() > MaxNumFileDes) {
-            throw std::runtime_error("Request to fill more than max connections " + IPC_ERROR(ErrCode::PARAM));
-        }
-        if (desAddrName.empty()) {
-            throw std::runtime_error("Can not send to dest point, because dest socket name is empty " +
-                IPC_ERROR(ErrCode::PARAM));
-        }
-        auto ctxt = BuildNpuCtxt_(npuPayLoad, fileDes.size());
-        ctxt->msghdr.msg_namelen = SetSocketAddress(desAddrName, ctxt->messageName);
-        if (!fileDes.empty()) {
-            if (fileDes.size() * sizeof(fileDesT) > sizeof(ctxt->ancillaryBuf)) { // compare against the ancillary buffer, not the pointer size
-                throw std::runtime_error("Memcpy failed when fileDes size larger than ctxt ancillaryBuf " +
-                    IPC_ERROR(ErrCode::PARAM));
-            }
-            memcpy(ctxt->fileDesPtr, fileDes.data(), fileDes.size() * sizeof(fileDesT));
-        }
-        return ctxt;
-    }
-
-    [[nodiscard]] bool TrySendMessage(Ctxt const & ctxt, bool retryOnConnRefused = true)
-    {
-        ssize_t retCode = sendmsg(socketFd, &ctxt.msghdr, MSG_DONTWAIT);
-        if (retCode > 0) {
-            return true;
-        }
-        if ((errno == EAGAIN || errno == EWOULDBLOCK) && retCode == -1) {
-            return false;
-        }
-        if (retryOnConnRefused && errno == ECONNREFUSED && retCode == -1) {
-            return false;
-        }
-        throw std::runtime_error("TrySendMessage occur " + std::string(std::strerror(errno)) + " " +
-            IPC_ERROR(ErrCode::PARAM));
-    }
-
-    [[nodiscard]] auto BuildNpuRcvCtxt(const std::vector<NpuPayLoad> &npuPayLoad)
-    {
-        return BuildNpuCtxt_(npuPayLoad, MaxNumFileDes);
-    }
-
-    [[nodiscard]] bool TryRcvMessage(Ctxt &ctxt) // not noexcept: throws on unexpected errno
-    {
-        auto retCode = recvmsg(socketFd, &ctxt.msghdr, MSG_DONTWAIT);
-        if (retCode > 0) {
-            return true;
-        }
-        if (retCode == 0) {
-            return false;
-        }
-        if (errno == EWOULDBLOCK || errno == EAGAIN) {
-            return false;
-        }
-        throw std::runtime_error("TryRcvMessage occur " + std::string(std::strerror(errno)) + " " +
-            IPC_ERROR(ErrCode::PARAM));
-    }
-
-    [[nodiscard]] bool TryPeekMessage(Ctxt &ctxt)
-    {
-        ssize_t ret = recvmsg(socketFd, &ctxt.msghdr, MSG_DONTWAIT | MSG_PEEK);
-        if (ret > 0) {
-            return true;
-        }
-        if (ret == 0) {
-            return false;
-        }
-        if (errno == EAGAIN || errno == EWOULDBLOCK) {
-            return false;
-        }
-        throw std::runtime_error("TryPeekMessage occur " + std::string(std::strerror(errno)));
-    }
-
-    const char *GetName(Ctxt const & ctxt) const // not noexcept: throws on non-abstract sockets
-    {
-        if (ctxt.messageName.sun_path[0] != STR_END_CHAR) {
-            throw std::runtime_error("GetName() wanted an abstract socket, but got " +
-                std::string(ctxt.messageName.sun_path));
-        }
-        return ctxt.messageName.sun_path + 1;
-    }
-
-    std::vector<fileDesT> GetFileDes(const Ctxt &ctxt) const
-    {
-        struct cmsghdr *cmg = CMSG_FIRSTHDR(&ctxt.msghdr); // fixed typo: msghdl -> msghdr
-        unsigned numFileDes = (cmg->cmsg_len - sizeof(struct cmsghdr)) / sizeof(fileDesT);
-        return { ctxt.fileDesPtr, ctxt.fileDesPtr + numFileDes };
-    }
-
-protected:
-    fileDesT socketFd;
-    size_t SetSocketAddress(const std::string &srcSocket, struct sockaddr_un &destSocket)
-    {
-        if (srcSocket.size() > addressMaxLen) {
-            throw std::runtime_error("Abstract UNIX Socket path cannot be larger than addressMaxLen");
std::runtime_error("Abstract UNIX Socket path cannot be larger than addressMaxLen"); - } - destSocket.sun_family = AF_UNIX; - destSocket.sun_path[0] = STR_END_CHAR; - if (srcSocket.empty()) { - return sizeof(sa_family_t); - } - srcSocket.copy(destSocket.sun_path + 1, srcSocket.size()); - destSocket.sun_path[srcSocket.size() + 1] = STR_END_CHAR; - return sizeof(sa_family_t) + srcSocket.size() + 2; // 2 - } - - auto BuildNpuCtxt_(const std::vector &npuPayLoad, unsigned numFileDes) - { - auto ctxt = std::make_unique(npuPayLoad.size()); - std::memset(&ctxt->msghdr, 0, sizeof(ctxt->msghdr)); - for (auto i = 0; i < npuPayLoad.size(); i++) { - ctxt->iov[i] = {npuPayLoad[i].data, npuPayLoad[i].size}; - } - ctxt->msghdr.msg_name = &ctxt->messageName; - ctxt->msghdr.msg_namelen = sizeof(decltype(ctxt->messageName)); - ctxt->msghdr.msg_iov = ctxt->iov.data(); - ctxt->msghdr.msg_iovlen = npuPayLoad.size(); - ctxt->fileDesPtr = nullptr; - if (numFileDes == 0) { - return ctxt; - } - const size_t fileDesSize = sizeof(fileDesT) * numFileDes; - ctxt->msghdr.msg_control = ctxt->ancillaryBuf; - ctxt->msghdr.msg_controllen = CMSG_SPACE(fileDesSize); - - struct cmsghdr *cmsg = CMSG_FIRSTHDR(&ctxt->msghdr); - cmsg->cmsg_level = SOL_SOCKET; - cmsg->cmsg_type = SCM_RIGHTS; - cmsg->cmsg_len = CMSG_LEN(fileDesSize); - ctxt->fileDesPtr = ReinterpretConvert(CMSG_DATA(cmsg)); - return ctxt; - } -}; - -} // namespace ipc_monitor -} // namespace dynolog_npu - -#endif diff --git a/dynolog_npu/plugin/ipc_monitor/PyDynamicMonitorProxy.h b/dynolog_npu/plugin/ipc_monitor/PyDynamicMonitorProxy.h deleted file mode 100644 index 8b5f88abf9d2cf589bec685cd3a520729afe8dd5..0000000000000000000000000000000000000000 --- a/dynolog_npu/plugin/ipc_monitor/PyDynamicMonitorProxy.h +++ /dev/null @@ -1,40 +0,0 @@ -#ifndef PYDYNAMIC_MONITOR_PROXY_H -#define PYDYNAMIC_MONITOR_PROXY_H - -#include -#include -#include "MonitorBase.h" -#include "DynoLogNpuMonitor.h" - -namespace dynolog_npu { -namespace ipc_monitor { - -class PyDynamicMonitorProxy { -public: - PyDynamicMonitorProxy() = default; - bool InitDyno(int npuId) - { - try { - monitor_ = DynoLogNpuMonitor::GetInstance(); - monitor_->SetNpuId(npuId); - bool res = monitor_->Init(); - return res; - } catch (const std::exception &e) { - std::cout << "[ERROR] Error when init dyno " << e.what() << std::endl; - return false; - } - } - - std::string PollDyno() - { - return monitor_->Poll(); - }; - -private: - MonitorBase *monitor_ = nullptr; -}; - -} // namespace ipc_monitor -} // namespace dynolog_npu - -#endif diff --git a/dynolog_npu/plugin/ipc_monitor/singleton.h b/dynolog_npu/plugin/ipc_monitor/singleton.h deleted file mode 100644 index 8bb106f3adc8b365ef81feb603c6aaac917a00e2..0000000000000000000000000000000000000000 --- a/dynolog_npu/plugin/ipc_monitor/singleton.h +++ /dev/null @@ -1,31 +0,0 @@ -#ifndef SINGLETON_H -#define SINGLETON_H -#include - -namespace dynolog_npu { -namespace ipc_monitor { - -template -class Singleton { -public: - static T *GetInstance() noexcept(std::is_nothrow_constructible::value) { - static T instance; - return &instance; - } - - virtual ~Singleton() = default; - -protected: - explicit Singleton() = default; - -private: - explicit Singleton(const Singleton &obj) = delete; - Singleton& operator=(const Singleton &obj) = delete; - explicit Singleton(Singleton &&obj) = delete; - Singleton& operator=(Singleton &&obj) = delete; -}; - -} // ipc_monitor -} // dynolog_npu - -#endif \ No newline at end of file diff --git a/dynolog_npu/plugin/ipc_monitor/utils.cpp 
deleted file mode 100644
index 936821fd34bc34bc9db9e09515132e8af39ba57a..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/utils.cpp
+++ /dev/null
@@ -1,135 +0,0 @@
-#include "utils.h"
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-std::unordered_map<SubModule, std::string> submoduleMap = {
-    {SubModule::IPC, "IPC"},
-};
-
-std::unordered_map<ErrCode, std::string> errCodeMap = {
-    {ErrCode::SUC, "success"},
-    {ErrCode::PARAM, "invalid parameter"},
-    {ErrCode::TYPE, "invalid type"},
-    {ErrCode::VALUE, "invalid value"},
-    {ErrCode::PTR, "invalid pointer"},
-    {ErrCode::INTERNAL, "internal error"},
-    {ErrCode::MEMORY, "memory error"},
-    {ErrCode::NOT_SUPPORT, "feature not supported"},
-    {ErrCode::NOT_FOUND, "resource not found"},
-    {ErrCode::UNAVAIL, "resource unavailable"},
-    {ErrCode::SYSCALL, "system call failed"},
-    {ErrCode::TIMEOUT, "timeout error"},
-    {ErrCode::PERMISSION, "permission error"},
-};
-
-std::string getCurrentTimestamp()
-{
-    auto now = std::chrono::system_clock::now();
-    auto micros = std::chrono::duration_cast<std::chrono::microseconds>(now.time_since_epoch());
-
-    std::time_t currentTime = std::chrono::system_clock::to_time_t(now);
-    std::tm* timeInfo = std::localtime(&currentTime);
-
-    auto milli_time = std::chrono::duration_cast<std::chrono::milliseconds>(micros).count() % 1000;
-    auto micro_time = micros.count() % 1000;
-
-    std::ostringstream oss;
-    oss << std::put_time(timeInfo, "%Y-%m-%d-%H:%M:%S");
-    return oss.str();
-}
-
-std::string formatErrorCode(SubModule submodule, ErrCode errorCode)
-{
-    std::ostringstream oss;
-    oss << "\n[ERROR] " << getCurrentTimestamp() << " (PID:" << getpid() << ")";
-    oss << "ERR" << std::setw(2) << std::setfill('0') << static_cast<int>(submodule); // 2: field width
-    oss << std::setw(3) << std::setfill('0') << static_cast<int>(errorCode); // 3: field width
-    oss << " " << submoduleMap[submodule] << " " << errCodeMap[errorCode];
-
-    return oss.str();
-}
-
-
-int32_t GetProcessId()
-{
-    return static_cast<int32_t>(getpid());
-}
-
-std::pair<int32_t, std::string> GetParentPidAndCommand(int32_t pid)
-{
-    std::string fileName = "/proc/" + std::to_string(pid) + "/stat";
-    std::ifstream statFile(fileName);
-    if (!statFile) {
-        return std::make_pair(0, "");
-    }
-    int32_t parentPid = 0;
-    std::string command;
-    std::string line;
-    if (std::getline(statFile, line)) {
-        char commandBuf[256] = {0}; // parse into a real buffer; writing through command.data() on an empty string is undefined behavior
-        int ret = sscanf(line.c_str(), "%*d (%255[^)]) %*c %d", commandBuf, &parentPid);
-        if (ret == 2) { // 2: two fields parsed
-            command = commandBuf;
-            std::cout << "[INFO] Succeeded in getting parent pid: " << parentPid << std::endl;
-            return std::make_pair(parentPid, command);
-        }
-    }
-    std::cout << "[WARNING] Failed to parse /proc/" << pid << "/stat" << std::endl;
-    return std::make_pair(0, "");
-}
-
-std::vector<std::pair<int32_t, std::string>> GetPidCommandPairsofAncestors()
-{
-    std::vector<std::pair<int32_t, std::string>> process_pids_and_cmds;
-    process_pids_and_cmds.reserve(MaxParentPids + 1);
-    int32_t current_pid = GetProcessId();
-    for (int i = 0; i <= MaxParentPids && (i == 0 || current_pid > 1); i++) {
-        std::pair<int32_t, std::string> parent_pid_and_cmd = GetParentPidAndCommand(current_pid);
-        process_pids_and_cmds.push_back(std::make_pair(current_pid, parent_pid_and_cmd.second));
-        current_pid = parent_pid_and_cmd.first;
-    }
-    return process_pids_and_cmds;
-}
-
-std::vector<int32_t> GetPids()
-{
-    const auto &pids = GetPidCommandPairsofAncestors();
-    std::vector<int32_t> res;
-    res.reserve(pids.size());
-    for (const auto &pidPair : pids) {
-        res.push_back(pidPair.first);
-    }
-    return res;
-}
-std::string GenerateUuidV4()
-{
-    static std::random_device randomDevice;
-    static std::mt19937 gen(randomDevice());
-    static std::uniform_int_distribution<> dis(0, 15); // range (0, 15)
-    static std::uniform_int_distribution<> dis2(8, 11); // range (8, 11)
-
-    std::stringstream stringStream;
-    stringStream << std::hex;
-    for (int i = 0; i < 8; i++) { // 8 hex digits
-        stringStream << dis(gen);
-    }
-    stringStream << "-";
-    for (int j = 0; j < 4; j++) { // 4 hex digits
-        stringStream << dis(gen);
-    }
-    stringStream << "-4"; // version 4 marker
-    for (int k = 0; k < 3; k++) { // 3 hex digits
-        stringStream << dis(gen);
-    }
-    stringStream << "-";
-    stringStream << dis2(gen);
-    for (int m = 0; m < 3; m++) { // 3 hex digits
-        stringStream << dis(gen);
-    }
-    stringStream << "-";
-    for (int n = 0; n < 12; n++) { // 12 hex digits
-        stringStream << dis(gen);
-    }
-    return stringStream.str();
-}
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
diff --git a/dynolog_npu/plugin/ipc_monitor/utils.h b/dynolog_npu/plugin/ipc_monitor/utils.h
deleted file mode 100644
index 0d8ceb8cfd0bf81b6d8b807c6ac1b505276ddf83..0000000000000000000000000000000000000000
--- a/dynolog_npu/plugin/ipc_monitor/utils.h
+++ /dev/null
@@ -1,63 +0,0 @@
-#ifndef IPC_MONITOR_UTILS_H
-#define IPC_MONITOR_UTILS_H
-#include <chrono>
-#include <ctime>
-#include <fstream>
-#include <iomanip>
-#include <iostream>
-#include <random>
-#include <sstream>
-#include <string>
-#include <unistd.h>
-#include <unordered_map>
-#include <utility>
-#include <vector>
-
-
-namespace dynolog_npu {
-namespace ipc_monitor {
-
-constexpr int MaxParentPids = 5;
-int32_t GetProcessId();
-std::string GenerateUuidV4();
-std::vector<int32_t> GetPids();
-std::pair<int32_t, std::string> GetParentPidAndCommand(int32_t pid);
-std::vector<std::pair<int32_t, std::string>> GetPidCommandPairsofAncestors();
-std::string getCurrentTimestamp();
-
-enum class SubModule {
-    IPC = 0
-};
-
-enum class ErrCode {
-    SUC = 0,
-    PARAM = 1,
-    TYPE = 2,
-    VALUE = 3,
-    PTR = 4,
-    INTERNAL = 5,
-    MEMORY = 6,
-    NOT_SUPPORT = 7,
-    NOT_FOUND = 8,
-    UNAVAIL = 9,
-    SYSCALL = 10,
-    TIMEOUT = 11,
-    PERMISSION = 12,
-};
-
-
-std::string formatErrorCode(SubModule submodule, ErrCode errorCode);
-
-#define IPC_ERROR(error) formatErrorCode(SubModule::IPC, error)
-
-template <typename T, typename V>
-inline T ReinterpretConvert(V ptr) {
-    return reinterpret_cast<T>(ptr);
-}
-
-
-} // namespace ipc_monitor
-} // namespace dynolog_npu
-
-#endif
-
diff --git a/dynolog_npu/scripts/apply_dyno_patches.sh b/dynolog_npu/scripts/apply_dyno_patches.sh
deleted file mode 100644
index c492db74a2a56948433a47e9cffcccd4ac71e098..0000000000000000000000000000000000000000
--- a/dynolog_npu/scripts/apply_dyno_patches.sh
+++ /dev/null
@@ -1,36 +0,0 @@
-#! /bin/bash
-set -e
-
-apply_ascend_patches() {
-    cd ./third_party/dynolog || return 1
-
-    if [ ! -d "../../patches" ]; then
-        echo "ERROR: patches directory not found"
-        cd ../..
-        return 1
-    fi
-
-    for patch_file in ../../patches/*.patch; do
-        if [ -f "$patch_file" ]; then
-            echo "Applying patch: $patch_file"
-            git apply --check -p1 "$patch_file"
-            if [ $? -ne 0 ]; then
-                echo "ERROR: Failed to apply patch: $(basename $patch_file)"
-                cd ../..
-                return 1
-            fi
-            git apply -p1 "$patch_file"
-            if [ $? -ne 0 ]; then
-                echo "ERROR: Failed to apply patch: $(basename $patch_file)"
-                cd ../..
-                return 1
-            fi
-        fi
-    done
-
-    cd ../..
-    echo "Successfully applied all Ascend patches"
-    return 0
-}
-
-apply_ascend_patches
\ No newline at end of file
diff --git a/dynolog_npu/scripts/build.sh b/dynolog_npu/scripts/build.sh
deleted file mode 100644
index aa3508e14faa6bfea06afe0cd3083ad1a5317037..0000000000000000000000000000000000000000
--- a/dynolog_npu/scripts/build.sh
+++ /dev/null
@@ -1,108 +0,0 @@
-#!/bin/bash
-set -e
-
-check_gcc_version() {
-    if !
command -v gcc >/dev/null 2>&1; then - echo "ERROR: gcc command not found" - return 1 - fi - - local GCC_VERSION=$(gcc -dumpversion) - local GCC_MAJOR=$(echo $GCC_VERSION | cut -d. -f1) - local GCC_MINOR=$(echo $GCC_VERSION | cut -d. -f2) - - if [ "$GCC_MAJOR" -lt 8 ] || ([ "$GCC_MAJOR" -eq 8 ] && [ "$GCC_MINOR" -lt 5 ]); then - echo "ERROR: gcc version must be greater than or equal to 8.5.0" - echo "Current gcc version: $GCC_VERSION" - return 1 - fi - echo "Check pass: current gcc version is $GCC_VERSION" - return 0 -} - -check_rust_version() { - if ! command -v rustc >/dev/null 2>&1; then - echo "ERROR: rustc command not found" - return 1 - fi - - local RUST_VERSION=$(rustc --version | cut -d' ' -f2) - local RUST_MAJOR=$(echo $RUST_VERSION | cut -d. -f1) - local RUST_MINOR=$(echo $RUST_VERSION | cut -d. -f2) - - if [ "$RUST_MAJOR" -lt 1 ] || ([ "$RUST_MAJOR" -eq 1 ] && [ "$RUST_MINOR" -lt 56 ]); then - echo "ERROR: Rust version must be greater than or equal to 1.56.0" - echo "Current Rust version: $RUST_VERSION" - return 1 - fi - echo "Check pass: current Rust version is $RUST_VERSION" - return 0 -} - -update_and_checkout_submodule() { - DYNLOG_COMMIT_ID="a9b6aeddcd6363252f5388cb0dd942981a09a24b" - - git submodule update --init --recursive - if [ $? -ne 0 ]; then - echo "ERROR: update git submodule failed" - return 1 - fi - - cd ./third_party/dynolog - git checkout ${DYNLOG_COMMIT_ID} - if [ $? -ne 0 ]; then - echo "ERROR: switch to dynolog specified commit failed" - cd .. - return 1 - fi - echo "Check pass: switch to dynolog specified commit ${DYNLOG_COMMIT_ID}" - cd ../../ - return 0 -} - -PACKAGE_TYPE="" -while getopts "t:" opt; do - case $opt in - t) - PACKAGE_TYPE="$OPTARG" - if [[ "$PACKAGE_TYPE" != "deb" && "$PACKAGE_TYPE" != "rpm" ]]; then - echo "ERROR: Invalid package type. Supported types: deb, rpm" - exit 1 - fi - ;; - \?) - echo "Usage: $0 [-t package_type]" - echo "package_type: deb or rpm (optional, if not specified will only build)" - exit 1 - ;; - esac -done - -echo "------------------ Check GCC and Rust version ----------------------" -check_gcc_version -check_rust_version - -echo "------------------ Update and checkout submodule -------------------" -update_and_checkout_submodule - -echo "------------------ Generate patch for Ascend -----------------------" -bash scripts/gen_dyno_patches.sh - -echo "------------------ Apply patch for Ascend --------------------------" -bash scripts/apply_dyno_patches.sh - -echo "------------------ Build dynolog patch for Ascend-------------------" -cd third_party/dynolog -rm -rf build -if [ -z "$PACKAGE_TYPE" ]; then - bash scripts/build.sh - echo "Build dynolog success without packaging" -elif [ "$PACKAGE_TYPE" = "deb" ]; then - bash scripts/debian/make_deb.sh - mv dynolog_*.deb ../../ - echo "Build dynolog deb package success" -elif [ "$PACKAGE_TYPE" = "rpm" ]; then - bash scripts/rpm/make_rpm.sh - mv dynolog_*.rpm ../../ - echo "Build dynolog rpm package success" -fi diff --git a/dynolog_npu/scripts/gen_dyno_patches.sh b/dynolog_npu/scripts/gen_dyno_patches.sh deleted file mode 100644 index 5ade74dbcfcf88dfbc072c9de790ec4f3ec451d9..0000000000000000000000000000000000000000 --- a/dynolog_npu/scripts/gen_dyno_patches.sh +++ /dev/null @@ -1,63 +0,0 @@ -#!/bin/bash -set -e - -WORK_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" -PATCHES_DIR="${WORK_DIR}/patches" -DYNOLOG_DIR="${WORK_DIR}/third_party/dynolog" -MODIFIED_FILES_DIR="${WORK_DIR}/dynolog_npu" - -mkdir -p "${PATCHES_DIR}" - -generate_patches() { - echo "Generating patches from modified files..." - - # 检查修改后的文件目录是否存在 - if [ ! -d "${MODIFIED_FILES_DIR}" ]; then - echo "ERROR: dynolog_npu directory not found" - return 1 - fi - - # 清理旧的patch文件 - rm -f "${PATCHES_DIR}"/*.patch - - # 遍历修改后的文件目录 - find "${MODIFIED_FILES_DIR}" -type f | while read modified_file; do - # 获取相对路径 - rel_path=$(realpath --relative-to="${MODIFIED_FILES_DIR}" "${modified_file}") - original_file="${DYNOLOG_DIR}/${rel_path}" - - echo "original_file: ${original_file}" - # 检查原始文件是否存在 - if [ ! -f "${original_file}" ]; then - echo "WARN: Original file not found: ${original_file}" - - cp "${modified_file}" "${original_file}" - echo "Copied ${modified_file} to ${original_file}" - continue - fi - - # 生成patch文件名(将路径中的斜杠替换为下划线) - patch_name=$(echo "${rel_path}" | sed 's/\//_/g') - patch_file="${PATCHES_DIR}/${patch_name}.patch" - - echo "Generating patch for: ${rel_path}" - - ( - cd "${WORK_DIR}" - diff -u "third_party/dynolog/${rel_path}" "dynolog_npu/${rel_path}" > "${patch_file}" || true - ) - - # 检查patch文件大小 - if [ ! -s "${patch_file}" ]; then - rm "${patch_file}" - echo "No differences found for: ${rel_path}" - else - echo "Successfully generated patch: ${patch_file}" - fi - done - - echo "Patch generation completed" - return 0 -} - -generate_patches \ No newline at end of file diff --git a/dynolog_npu/third_party/dynolog b/dynolog_npu/third_party/dynolog deleted file mode 160000 index d5d37bc182bc2aa8fa60ba7d5ee897bacb5cbd4b..0000000000000000000000000000000000000000 --- a/dynolog_npu/third_party/dynolog +++ /dev/null @@ -1 +0,0 @@ -Subproject commit d5d37bc182bc2aa8fa60ba7d5ee897bacb5cbd4b diff --git a/flight_recoder/analysis_flight.py b/flight_recoder/analysis_flight.py deleted file mode 100644 index f81f771ab1c81ad79cb93401e200b600a4b17af3..0000000000000000000000000000000000000000 --- a/flight_recoder/analysis_flight.py +++ /dev/null @@ -1,164 +0,0 @@ -# Copyright (c) 2025, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Copyright Huawei Technologies Co., Ltd. 2024-2025. All rights reserved. 
-
-import os
-import pickle
-import sys
-import logging
-from collections import defaultdict
-
-from check_path import get_valid_read_path
-
-
-logging.basicConfig(
-    level=logging.INFO,  # log level: INFO
-    format="%(asctime)s - %(levelname)s - %(message)s",  # log format
-    handlers=[logging.StreamHandler()],  # log to the console
-)
-
-
-SAFE_CLASSES = {
-    # built-in types that are safe to unpickle
-    "builtins": {"str", "int", "float", "list", "dict", "tuple"},
-}
-
-
-class SafeUnpickler(pickle.Unpickler):
-    def find_class(self, module, name):
-        # only allow modules and classes on the whitelist
-        if module in SAFE_CLASSES and name in SAFE_CLASSES[module]:
-            return super().find_class(module, name)
-        raise pickle.UnpicklingError(f"Forbidden class: {module}.{name}")
-
-
-def load_recorder_data(path, world_size):
-    """Load the recorder data of every rank."""
-    recorder_dict = {}
-    for rank in range(world_size):
-        file_path = os.path.join(path, str(rank)) if not path.endswith("/") else path + str(rank)
-        file_path = get_valid_read_path(file_path)
-        try:
-            with open(file_path, "rb") as f:
-                res = SafeUnpickler(f).load()
-                recorder_dict[str(rank)] = res
-        except Exception as e:
-            logging.error(f"Failed to load data from {file_path}: {e}")
-    return recorder_dict
-
-
-def extract_hccl_info(recorder_dict):
-    """Extract the HCCL-related information from the recorder data."""
-    hccl_dict = {}
-    for rank, recorder in recorder_dict.items():
-        entries = recorder.get("entries", [])
-        if not entries:
-            continue
-        last_entry = entries[-1]
-        hccl_dict[rank] = {
-            "state": last_entry.get("state", None),
-            "record_id": last_entry.get("record_id", None),
-            "pg_id": last_entry.get("pg_id", None),
-            "time_discovered_completed_ns": last_entry.get("time_discovered_completed_ns", None),
-            "name": last_entry.get("frames", [{}])[0].get("name", None),
-        }
-    return hccl_dict
-
-
-def analyze_pg_groups(hccl_dict):
-    """Analyze the HCCL data: group the ops by pg_id and check for problems."""
-    pg_groups = defaultdict(list)
-    for rank, op in hccl_dict.items():
-        op["rank"] = rank  # keep the rank so that case 2 can report it
-        pg_groups[op["pg_id"]].append(op)
-
-    for pg_id, group in pg_groups.items():
-        scheduled_ops = [op for op in group if op["state"] == "scheduled"]
-        completed_ops = [op for op in group if op["state"] == "completed"]
-
-        # case 1: every rank is scheduled, with the same record_id and name
-        if len(scheduled_ops) == len(group):
-            record_id = scheduled_ops[0]["record_id"]
-            name = scheduled_ops[0]["name"]
-            all_same = all(op["record_id"] == record_id and op["name"] == name for op in scheduled_ops)
-            if all_same:
-                logging.info(
-                    f"The pg_id {pg_id}'s Communication Operator {name}"
-                    " executed too slowly, causing the HCCL to time out."
-                )
-
-        # case 2: a completed op exists whose record_id is one behind the scheduled ops
-        elif completed_ops and scheduled_ops:
-            completed_op = completed_ops[0]
-            scheduled_record_id = scheduled_ops[0]["record_id"]
-            if completed_op["record_id"] == scheduled_record_id - 1:
-                logging.info(
-                    f"The pg_id {pg_id}'s rank {completed_op['rank']}'s "
-                    "Computational task took too long, causing the other ranks' "
-                    "HCCL task to time out."
-                )
-
-        # case 3: every op has completed
-        elif not scheduled_ops and completed_ops:
-            latest_op = max(completed_ops, key=lambda x: x["time_discovered_completed_ns"] or 0)
-            logging.info(
-                f"The computational task of the pg_id {pg_id} "
-                f"after the communication operator {latest_op['name']} "
-                "took too long."
-            )
-
-        else:
-            logging.info("The situation cannot be recognized!")
-
-
-def get_int_arg(args, idx, default):
-    if len(args) > idx:
-        try:
-            return int(args[idx])
-        except ValueError:
-            logging.warning(f"Invalid input {args[idx]}, using default: {default}")
-            return default
-    return default  # fall back to the default when the argument is absent
-
-
-def main():
-    # defaults
-    default_path = os.getenv("TORCH_HCCL_DEBUG_INFO_TEMP_FILE")
-    default_world_size = 8
-
-    # read path and world_size from the command line, falling back to the defaults
-    path = sys.argv[1] if len(sys.argv) > 1 else default_path
-    world_size = get_int_arg(sys.argv, 2, default_world_size)
-
-    if not path:
-        raise ValueError("Path is required and cannot be empty.")
-
-    logging.info(f"Path: {path}")
-    logging.info(f"World Size: {world_size}")
-
-    # load the data
-    recorder_dict = load_recorder_data(path, world_size)
-    if not recorder_dict:
-        logging.error("No valid recorder data found.")
-        return
-
-    # extract the HCCL information
-    hccl_dict = extract_hccl_info(recorder_dict)
-
-    # analyze the HCCL data
-    analyze_pg_groups(hccl_dict)
-
-
-if __name__ == "__main__":
-    main()
\ No newline at end of file
diff --git a/flight_recoder/check_path.py b/flight_recoder/check_path.py
deleted file mode 100644
index b34e4dcdb68b28b44f387cb14919ad127658ca8f..0000000000000000000000000000000000000000
--- a/flight_recoder/check_path.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Copyright (c) 2025, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import re
-import os
-import sys
-import stat
-
-
-PATH_WHITE_LIST_REGEX = re.compile(r"[^_A-Za-z0-9/.-]")
-MAX_READ_FILE_SIZE_4G = 4294967296  # 4G, 4 * 1024 * 1024 * 1024
-MAX_READ_FILE_SIZE_32G = 34359738368  # 32G, 32 * 1024 * 1024 * 1024
-MAX_READ_FILE_SIZE_512G = 549755813888  # 512G, 512 * 1024 * 1024 * 1024
-
-# group not writable, others no permission, max stat is 750
-WRITE_FILE_NOT_PERMITTED_STAT = stat.S_IWGRP | stat.S_IWOTH | stat.S_IROTH | stat.S_IXOTH
-# group not writable, others not writable, max stat is 755
-READ_FILE_NOT_PERMITTED_STAT = stat.S_IWGRP | stat.S_IWOTH
-
-
-def type_to_str(value_type):
-    return ' or '.join([ii.__name__ for ii in value_type]) if isinstance(value_type, tuple) else value_type.__name__
-
-
-def check_type(value, value_type, param_name="value"):
-    if not isinstance(value, value_type):
-        raise TypeError('{} must be {}, not {}.'.format(param_name, type_to_str(value_type), type(value).__name__))
-
-
-def get_valid_path(path):
-    check_type(path, str, "path")
-    if not path:
-        raise ValueError("The value of the path cannot be empty.")
-    if PATH_WHITE_LIST_REGEX.search(path):  # check for special characters
-        raise ValueError("Input path contains invalid characters.")  # do not print the path value for invalid chars
-    path = os.path.expanduser(path)  # handle paths starting with "~"
-    if os.path.islink(os.path.abspath(path)):  # when checking links, drop any trailing "/" from the path
-        raise ValueError("The value of the path cannot be a soft link: {}.".format(path))
-
-    real_path = os.path.realpath(path)
-
-    if len(real_path) > 4096:
-        raise ValueError("The length of the file path should be less than 4096.")
-
-    if real_path != path and PATH_WHITE_LIST_REGEX.search(real_path):  # check for special characters again
-        raise ValueError("Input path contains invalid characters.")  # do not print the path value for invalid chars
-
-    return real_path
-
-
-def is_belong_to_user_or_group(file_stat):
-    return file_stat.st_uid == os.getuid() or file_stat.st_gid in os.getgroups()
-
-
-def get_valid_read_path(path, size_max=MAX_READ_FILE_SIZE_4G, check_user_stat=True, is_dir=False):
-    real_path = get_valid_path(path)
-    if not os.path.isfile(real_path):
-        raise ValueError("The path {} doesn't exist or is not a file.".format(path))
-
-    file_stat = os.stat(real_path)
-    if check_user_stat and not sys.platform.startswith("win") and not is_belong_to_user_or_group(file_stat):
-        raise ValueError("The file {} doesn't belong to the current user or group.".format(path))
-    if check_user_stat and os.stat(path).st_mode & READ_FILE_NOT_PERMITTED_STAT > 0:
-        raise ValueError("The file {} is group writable, or is others writable.".format(path))
-    if not os.access(real_path, os.R_OK) or file_stat.st_mode & stat.S_IRUSR == 0:  # at least 400
-        raise ValueError("Current user doesn't have read permission to the file {}.".format(path))
-    if not is_dir and size_max > 0 and file_stat.st_size > size_max:
-        raise ValueError("The file {} exceeds size limitation of {}.".format(path, size_max))
-    return real_path
\ No newline at end of file
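`get_valid_read_path` above is the gate every flight-recorder dump passes through before the analyzer unpickles it. A minimal sketch of calling it directly, illustrative only — `/tmp/0` is a hypothetical rank-0 dump path:

```python
# Illustrative only: validate a flight-recorder dump before reading it.
from check_path import get_valid_read_path

safe_path = get_valid_read_path("/tmp/0")  # raises ValueError for symlinks,
                                           # loose permissions or oversized files
with open(safe_path, "rb") as f:
    raw = f.read()
```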
diff --git a/flight_recoder/flight_recoder.md b/flight_recoder/flight_recoder.md
deleted file mode 100644
index 8b398a6730bae0823b04c20a22258a81392922c9..0000000000000000000000000000000000000000
--- a/flight_recoder/flight_recoder.md
+++ /dev/null
@@ -1,49 +0,0 @@
-# Flight Recorder Timeout Analysis
-
-A hung training job is the main and most critical blocker for large-scale distributed training on AI clusters. Today a hang is only noticed once a collective communication times out, which hurts cluster availability. The framework therefore needs to detect hung training jobs, identify them early, and save the necessary diagnostic information, improving both troubleshooting efficiency and device availability. When the HeartbeatMonitor has not detected a heartbeat for a long time, the training job can be considered hung and the diagnostics dump is triggered.
-
-This tool reads and parses the logs written by the flight recorder of torch_npu and, based on the parsed logs, provides a first-pass analysis of timeout problems. It recognizes and analyzes the following three timeout scenarios:
-
-| Issue | Description |
-| --- | --- |
-| Type 1 | A rank in the communication domain times out in computation, so the other ranks wait, triggering the flight recorder and an HCCL timeout |
-| Type 2 | A non-communication task that follows a communication operator in the communication domain takes too long |
-| Type 3 | A communication operator in the communication domain times out while communicating |
-
-## Usage
-
-### 1 Enabling the flight recorder
-
-Enable the flight recorder by setting the following environment variables:
-
-```
-export TORCH_HCCL_ENABLE_MONITORING=1 # enable hang detection
-export TORCH_HCCL_DUMP_ON_TIMEOUT=1 # control whether diagnostic info is saved
-export TORCH_HCCL_TRACE_BUFFER_SIZE=1 # number of collective communication states to keep
-export TORCH_HCCL_HEARTBEAT_TIMEOUT_SEC=20 # heartbeat timeout: how long the job may issue no collective op before it is judged hung; default 10 minutes, in seconds. (Must be smaller than HCCL_EXEC_TIMEOUT so the collective does not report a timeout first.)
-export TORCH_HCCL_DEBUG_INFO_TEMP_FILE=/tmp/ # path where the diagnostic info is saved
-```
-
-### 2 Running the tool
-
-```
-python analysis_flight.py path world_size
-```
-
-The script reads `path` and `world_size` from the command line and logs them. If an argument is not provided, the default value is used.
-
-* `path`: first command-line argument; defaults to `default_path`, which is read from TORCH_HCCL_DEBUG_INFO_TEMP_FILE.
-* `world_size`: second command-line argument; defaults to `default_world_size`, which is 8.
-
-| Parameter | Meaning | Constraints |
-| --- | --- | --- |
-| path | Flight recorder log path | Optional. Type: string. Defaults to TORCH_HCCL_DEBUG_INFO_TEMP_FILE from the environment; if the configured log format adds a prefix, include that prefix in the path |
-| world_size | Number of ranks in one communication domain | Optional. Type: int. Defaults to 8 |
-
-### 3 Sample output
-
-```
-2025-02-19 08:10:07,160 - INFO - Path: /tmp/
-2025-02-19 08:10:07,160 - INFO - World Size: 8
-2025-02-19 08:10:07,162 - INFO - The pg_id 0's rank 0's Computational task took too long, causing the other ranks' HCCL task to time out.
-```
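Before running the full analyzer it can help to eyeball a single rank's dump. A minimal sketch of what `load_recorder_data` does per rank, illustrative only — it uses the plain unpickler for brevity, whereas `analysis_flight.py` goes through `SafeUnpickler` and `get_valid_read_path`:

```python
# Illustrative only: inspect the last flight-recorder entry of rank 0.
import os
import pickle

path = os.getenv("TORCH_HCCL_DEBUG_INFO_TEMP_FILE", "/tmp/")
with open(os.path.join(path, "0"), "rb") as f:  # rank 0's dump file
    recorder = pickle.load(f)

last = recorder.get("entries", [])[-1]
print(last.get("pg_id"), last.get("state"), last.get("record_id"))
```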
diff --git a/plugins/mindstudio-vscode-plugins/OWNERS b/plugins/mindstudio-vscode-plugins/OWNERS
deleted file mode 100644
index 2c4ada94aa198321313f24bc0b0f289eba360c33..0000000000000000000000000000000000000000
--- a/plugins/mindstudio-vscode-plugins/OWNERS
+++ /dev/null
@@ -1,9 +0,0 @@
-options:
-  no_parent_owners: true
-approvers:
-- lee314
-- linxi9527
-reviewers:
-- jzc_23
-- duanhaomiao
-- yangqingliang4
\ No newline at end of file
diff --git a/plugins/tensorboard-plugins/OWNERS b/plugins/tensorboard-plugins/ OWNERS
similarity index 67%
rename from plugins/tensorboard-plugins/OWNERS
rename to plugins/tensorboard-plugins/ OWNERS
index 8dd996262b04faf778976324fa4221e51c4bfa30..34c383beaf138da92df0991b472135496450a827 100644
--- a/plugins/tensorboard-plugins/OWNERS
+++ b/plugins/tensorboard-plugins/ OWNERS
@@ -3,8 +3,7 @@ options:
 approvers:
 - wo-wenjie
 - ly-qianxiao
-- leo920320
-- ninghuang
 reviewers:
+- wo-wenjie
+- ly-qianxiao
 - leo920320
-- ninghuang
diff --git a/plugins/tensorboard-plugins/.github/workflows/libkineto_ci.yml b/plugins/tensorboard-plugins/.github/workflows/libkineto_ci.yml
new file mode 100644
index 0000000000000000000000000000000000000000..3133d6400fb0b3ca0ee9b38c311c2db6d1167c7e
--- /dev/null
+++ b/plugins/tensorboard-plugins/.github/workflows/libkineto_ci.yml
@@ -0,0 +1,56 @@
+name: LIBKINETOCI
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Checkout submodules
+      shell: bash
+      run: |
+        auth_header="$(git config --local --get http.https://github.com/.extraheader)"
+        git submodule sync --recursive
+        git -c "http.extraheader=$auth_header" -c protocol.version=2 submodule update --init --force --recursive --depth=1
+
+    - name: Get env vars
+      run: |
+        echo GITHUB_WORKFLOW = $GITHUB_WORKFLOW
+        echo HOME = $HOME
+        echo GITHUB_ACTION = $GITHUB_ACTION
+        echo GITHUB_ACTIONS = $GITHUB_ACTIONS
+        echo GITHUB_REPOSITORY = $GITHUB_REPOSITORY
+        echo GITHUB_EVENT_NAME = $GITHUB_EVENT_NAME
+        echo GITHUB_EVENT_PATH = $GITHUB_EVENT_PATH
+        echo
GITHUB_WORKSPACE = $GITHUB_WORKSPACE + echo GITHUB_SHA = $GITHUB_SHA + echo GITHUB_REF = $GITHUB_REF + c++ --verbose + + # TODO: Figure out how to install cupti headers T84637671 + - name: Build static lib + run: | + set -e + mkdir build_static + cd build_static + cmake -DKINETO_LIBRARY_TYPE=static ../libkineto/ + make -j + + - name: Build shared lib + run: | + set -e + mkdir build_shared + cd build_shared + cmake -DKINETO_LIBRARY_TYPE=shared ../libkineto/ + make -j diff --git a/plugins/tensorboard-plugins/.github/workflows/tb_plugin_build_pip_package.yml b/plugins/tensorboard-plugins/.github/workflows/tb_plugin_build_pip_package.yml new file mode 100644 index 0000000000000000000000000000000000000000..9bdafcc442635eaff19fc7a7505f5231cf6e5cf7 --- /dev/null +++ b/plugins/tensorboard-plugins/.github/workflows/tb_plugin_build_pip_package.yml @@ -0,0 +1,19 @@ +name: Build torch-tb-profiler Pip Package + +on: + # TODO: Add an on_release trigger to build on tags + workflow_dispatch: + +jobs: + build-package: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - name: build pip package + run: | + set -e + cd tb_plugin + python setup.py sdist bdist_wheel + cd dist/ + pip install *.whl + python -c "import torch_tb_profiler;print(torch_tb_profiler.__version__)" diff --git a/plugins/tensorboard-plugins/.github/workflows/tb_plugin_ci.yml b/plugins/tensorboard-plugins/.github/workflows/tb_plugin_ci.yml new file mode 100644 index 0000000000000000000000000000000000000000..1b59a7bf90a6009caa41d4ac0e3d5545dc8b6c7c --- /dev/null +++ b/plugins/tensorboard-plugins/.github/workflows/tb_plugin_ci.yml @@ -0,0 +1,57 @@ +name: TB_Plugin_CI + +on: + push: + branches: + - main + - release/** + - plugin/** + + pull_request: + branches: + - main + - release/** + - plugin/** + +jobs: + generate-matrix: + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - id: set-matrix + run: | + echo $GITHUB_BASE_REF + if [ $GITHUB_BASE_REF == "plugin/vnext" ] + then + echo "::set-output name=matrix::{\"python-version\":[3.7, 3.8, 3.9], \"cuda-version\":[\"cpu\"], \"pytorch-version\":[\"nightly\"]}" + else + echo "::set-output name=matrix::{\"python-version\":[3.7, 3.8, 3.9], \"cuda-version\":[\"cpu\"], \"pytorch-version\":[\"nightly\", \"1.11rc\", \"stable\"]}" + fi + + build: + needs: generate-matrix + runs-on: ubuntu-latest + strategy: + matrix: ${{fromJSON(needs.generate-matrix.outputs.matrix)}} + steps: + - uses: actions/checkout@v2 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + architecture: 'x64' + - name: Test + env: + CUDA_VERSION: ${{ matrix.cuda-version }} + PYTORCH_VERSION: ${{ matrix.pytorch-version }} + TORCH_PROFILER_LOG_LEVEL: DEBUG + GRPC_VERBOSITY: DEBUG + GRPC_ENABLE_FORK_SUPPORT: 'False' + run: | + set -e + cd tb_plugin + sh ./ci_scripts/install_env.sh + pip install .[gs] + cd test + pytest diff --git a/plugins/tensorboard-plugins/.gitignore b/plugins/tensorboard-plugins/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..ce186381c0b566e0ca225be70cbf8ac233d7aa6b --- /dev/null +++ b/plugins/tensorboard-plugins/.gitignore @@ -0,0 +1,3 @@ +# ignore common items +.idea +.vscode diff --git a/plugins/tensorboard-plugins/.gitmodules b/plugins/tensorboard-plugins/.gitmodules new file mode 100644 index 0000000000000000000000000000000000000000..4660ee8bc9e6a4be4f4fbb007b8e66058122d716 --- /dev/null +++ b/plugins/tensorboard-plugins/.gitmodules @@ 
-0,0 +1,6 @@ +[submodule "libkineto/third_party/googletest"] + path = libkineto/third_party/googletest + url = https://github.com/google/googletest.git +[submodule "libkineto/third_party/fmt"] + path = libkineto/third_party/fmt + url = https://github.com/fmtlib/fmt.git diff --git a/plugins/tensorboard-plugins/CODE_OF_CONDUCT.md b/plugins/tensorboard-plugins/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000000000000000000000000000000000..a0cbeaab7650bf08267fbdbc9bb54e845c88f392 --- /dev/null +++ b/plugins/tensorboard-plugins/CODE_OF_CONDUCT.md @@ -0,0 +1,77 @@ +# Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to make participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies within all project spaces, and it also applies when +an individual is representing the project or its community in public spaces. +Examples of representing a project or community include using an official +project e-mail address, posting via an official social media account, or acting +as an appointed representative at an online or offline event. Representation of +a project may be further defined and clarified by project maintainers. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at . All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. 
+ +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq + diff --git a/plugins/tensorboard-plugins/CONTRIBUTING.md b/plugins/tensorboard-plugins/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..a2e931bb6f0cc82ff030cee10ee1c99fbbbda07b --- /dev/null +++ b/plugins/tensorboard-plugins/CONTRIBUTING.md @@ -0,0 +1,34 @@ +# Contributing to Kineto +We want to make contributing to this project as easy and transparent as +possible. + +## Code of Conduct +The code of conduct is described in [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md). + +## Pull Requests +We actively welcome your pull requests. + +1. Fork the repo and create your branch from `main`. +2. If you've added code that should be tested, add tests. +3. If you've changed APIs, update the documentation. +4. Ensure the test suite passes. +5. Make sure your code lints. +6. If you haven't already, complete the Contributor License Agreement ("CLA"). + +## Contributor License Agreement ("CLA") +In order to accept your pull request, we need you to submit a CLA. You only need +to do this once to work on any of Facebook's open source projects. + +Complete your CLA here: + +## Issues +We use GitHub issues to track public bugs. Please ensure your description is +clear and has sufficient instructions to be able to reproduce the issue. + +Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe +disclosure of security bugs. In those cases, please go through the process +outlined on that page and do not file a public issue. + +## License +By contributing to Kineto, you agree that your contributions will be licensed +under the LICENSE file in the root directory of this source tree. diff --git a/plugins/tensorboard-plugins/LICENSE b/plugins/tensorboard-plugins/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..edb179715b5213644cfe903d43294f54892e707e --- /dev/null +++ b/plugins/tensorboard-plugins/LICENSE @@ -0,0 +1,33 @@ +BSD License + +For Kineto software + +Copyright (c) Facebook, Inc. and its affiliates. All rights reserved. + +All contributions by Microsoft: +Copyright (c) Microsoft Corporation. (The Azure AI Platform team) + +Redistribution and use in source and binary forms, with or without modification, +are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + + * Neither the name Facebook nor the names of its contributors may be used to + endorse or promote products derived from this software without specific + prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR +ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/plugins/tensorboard-plugins/README.md b/plugins/tensorboard-plugins/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3a18f4c6239f353c10362c9e0ba5aae052cb2c07 --- /dev/null +++ b/plugins/tensorboard-plugins/README.md @@ -0,0 +1,38 @@ +# Kineto + +Kineto is part of the PyTorch Profiler. + +The Kineto project was started to help enable +- **performance observability and diagnostics** across common ML bottleneck components +- **actionable recommendations** for common issues +- integration of external system-level profiling tools +- integration with popular visualization platforms and analysis pipelines + +A central component is libkineto, a profiling library with special focus on low-overhead GPU timeline tracing. + +The PyTorch Profiler TensorBoard plugin provides powerful and intuitive visualizations of profiling results, as well as actionable recommendations, and is the best way to experience the new PyTorch Profiler. + +## Libkineto +Libkineto is an in-process profiling library integrated with the PyTorch Profiler. Please refer to the [README](libkineto/README.md) file in the `libkineto` folder as well as documentation on the [new PyTorch Profiler API](https://pytorch.org/docs/master/profiler.html). + +## PyTorch TensorBoard Profiler NPU Plugin +The goal of the PyTorch TensorBoard Profiler is to provide a seamless and intuitive end-to-end profiling experience, including straightforward collection from PyTorch and insightful visualizations and recommendations in the TensorBoard UI. +Please refer to the [README](tb_plugin/README.md) file in the `tb_plugin` folder. + +## Future Development Direction: +Some areas we're currently working on: +- Support for tracing distributed workloads +- Trace processing, analysis and recommendation engine +- System-level activities, multiple tracing sources +- Profiling and monitoring daemon for larger scale deployments + +## Releases and Contributing +We will follow the PyTorch release schedule which roughly happens on a 3 month basis. + +We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion. + +If you plan to contribute new features, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the infrastructure in a different direction than you might be aware of. We expect the architecture to keep evolving. + +## License +Kineto has a BSD-style license, as found in the [LICENSE](LICENSE) file. 
+ diff --git a/plugins/tensorboard-plugins/libkineto/CMakeLists.txt b/plugins/tensorboard-plugins/libkineto/CMakeLists.txt new file mode 100644 index 0000000000000000000000000000000000000000..63966de803a786913b104419776aa94bb00b74b0 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/CMakeLists.txt @@ -0,0 +1,198 @@ +cmake_minimum_required(VERSION 3.5 FATAL_ERROR) + +list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules") + +#install libraries into correct locations on all platforms +include(GNUInstallDirs) + +# function to extract filelists from libkineto_defs.bzl file +find_package(PythonInterp) +function(get_filelist name outputvar) + execute_process( + COMMAND "${PYTHON_EXECUTABLE}" -c + "exec(open('libkineto_defs.bzl').read());print(';'.join(${name}))" + WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" + OUTPUT_VARIABLE _tempvar) + string(REPLACE "\n" "" _tempvar "${_tempvar}") + set(${outputvar} ${_tempvar} PARENT_SCOPE) +endfunction() + +project(kineto VERSION 0.1 LANGUAGES CXX C) + +set(KINETO_LIBRARY_TYPE "default" CACHE STRING + "Type of library (default, static or shared) to build") +set_property(CACHE KINETO_LIBRARY_TYPE PROPERTY STRINGS default shared) +option(KINETO_BUILD_TESTS "Build kineto unit tests" ON) + +set(LIBKINETO_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src") +set(LIBKINETO_INCLUDE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/include") +set(LIBKINETO_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR}) +set(LIBKINETO_THIRDPARTY_DIR "${CMAKE_CURRENT_SOURCE_DIR}/third_party") +set(CMAKE_EXPORT_COMPILE_COMMANDS ON) + +#We should default to a Release build +if (NOT CMAKE_BUILD_TYPE OR CMAKE_BUILD_TYPE STREQUAL "") + set(CMAKE_BUILD_TYPE "Release" CACHE STRING "" FORCE) +endif() + +if (NOT CUDA_SOURCE_DIR) + set(CUDA_SOURCE_DIR "$ENV{CUDA_SOURCE_DIR}") + message(INFO " CUDA_SOURCE_DIR = ${CUDA_SOURCE_DIR}") +endif() + +if (NOT ROCM_SOURCE_DIR) + set(ROCM_SOURCE_DIR "$ENV{ROCM_SOURCE_DIR}") + message(INFO " ROCM_SOURCE_DIR = ${ROCM_SOURCE_DIR}") +endif() + +# Set LIBKINETO_NOCUPTI to explicitly disable CUPTI +# Otherwise, CUPTI is disabled if not found +IF (NOT CUDA_SOURCE_DIR OR NOT CUPTI_INCLUDE_DIR OR NOT CUDA_cupti_LIBRARY) + set(LIBKINETO_NOCUPTI ON CACHE BOOL "" FORCE) +endif() + +IF (NOT ROCM_SOURCE_DIR AND NOT ROCTRACER_INCLUDE_DIR) + set(LIBKINETO_NOROCTRACER ON CACHE BOOL "" FORCE) +endif() + +# Define file lists +if (LIBKINETO_NOCUPTI AND LIBKINETO_NOROCTRACER) + get_filelist("get_libkineto_cpu_only_srcs(with_api=False)" LIBKINETO_SRCS) + message(INFO " CUPTI unavailable or disabled - not building GPU profilers") +elseif(NOT LIBKINETO_NOROCTRACER) + get_filelist("get_libkineto_roctracer_srcs()" LIBKINETO_SRCS) + message(INFO " Building with roctracer") +else() + get_filelist("get_libkineto_cupti_srcs(with_api=False)" LIBKINETO_SRCS) +endif() +get_filelist("get_libkineto_public_headers()" LIBKINETO_PUBLIC_HEADERS) +get_filelist("get_libkineto_api_srcs()" LIBKINETO_API_SRCS) + +add_library(kineto_base OBJECT ${LIBKINETO_SRCS}) +add_library(kineto_api OBJECT ${LIBKINETO_API_SRCS}) + +# Make libraries depend on libkineto_defs.bzl +add_custom_target(libkineto_defs.bzl DEPENDS libkineto_defs.bzl) +add_dependencies(kineto_base libkineto_defs.bzl) + +set_target_properties(kineto_base kineto_api PROPERTIES + CXX_STANDARD 14 + CXX_STANDARD_REQUIRED YES + CXX_EXTENSIONS NO + CXX_VISIBILITY_PRESET hidden) + +set(KINETO_COMPILE_OPTIONS "-DKINETO_NAMESPACE=libkineto") +list(APPEND KINETO_COMPILE_OPTIONS "-DFMT_HEADER_ONLY") +if(NOT MSVC) + list(APPEND KINETO_COMPILE_OPTIONS 
"-std=c++14") +else() + list(APPEND KINETO_COMPILE_OPTIONS "/std:c++14") + list(APPEND KINETO_COMPILE_OPTIONS "-DWIN32_LEAN_AND_MEAN") + list(APPEND KINETO_COMPILE_OPTIONS "-DNOGDI") +endif() +if (NOT LIBKINETO_NOCUPTI) + list(APPEND KINETO_COMPILE_OPTIONS "-DHAS_CUPTI") +endif() +if (NOT LIBKINETO_NOROCTRACER) + target_compile_options(kineto_base PRIVATE "-DHAS_ROCTRACER") + target_compile_options(kineto_base PRIVATE "-D__HIP_PLATFORM_HCC__") + target_compile_options(kineto_base PRIVATE "-D__HIP_PLATFORM_AMD__") +endif() + +target_compile_options(kineto_base PRIVATE "${KINETO_COMPILE_OPTIONS}") +target_compile_options(kineto_api PRIVATE "${KINETO_COMPILE_OPTIONS}") + +if(NOT TARGET fmt) + if(NOT FMT_SOURCE_DIR) + set(FMT_SOURCE_DIR "${LIBKINETO_THIRDPARTY_DIR}/fmt" + CACHE STRING "fmt source directory from submodules") + endif() + + # Build FMT. + # FMT and some other libraries use BUILD_SHARED_LIBS to control + # the library type. + # Save and restore the value after configuring FMT + set(TEMP_BUILD_SHARED_LIBS ${BUILD_SHARED_LIBS}) + set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build shared libs" FORCE) + set(FMT_LIBRARY_TYPE static CACHE STRING "Set lib type to static") + add_subdirectory("${FMT_SOURCE_DIR}" "${LIBKINETO_BINARY_DIR}/fmt") + set_property(TARGET fmt PROPERTY POSITION_INDEPENDENT_CODE ON) + set(BUILD_SHARED_LIBS ${TEMP_BUILD_SHARED_LIBS} CACHE BOOL "Build shared libs" FORCE) +endif() + +set(FMT_INCLUDE_DIR "${FMT_SOURCE_DIR}/include") +message(STATUS "Kineto: FMT_SOURCE_DIR = ${FMT_SOURCE_DIR}") +message(STATUS "Kineto: FMT_INCLUDE_DIR = ${FMT_INCLUDE_DIR}") +if (NOT CUPTI_INCLUDE_DIR) + set(CUPTI_INCLUDE_DIR "${CUDA_SOURCE_DIR}/extras/CUPTI/include") +endif() +if (NOT CUDA_INCLUDE_DIRS) + set(CUDA_INCLUDE_DIRS "${CUDA_SOURCE_DIR}/include") +endif() +if (NOT ROCTRACER_INCLUDE_DIR) + set(ROCTRACER_INCLUDE_DIR "${ROCM_SOURCE_DIR}/roctracer/include") +endif() +if (NOT ROCM_INCLUDE_DIRS) + set(ROCM_INCLUDE_DIRS "${ROCM_SOURCE_DIR}/include") +endif() + +message(INFO " CUPTI_INCLUDE_DIR = ${CUPTI_INCLUDE_DIR}") +message(INFO " ROCTRACER_INCLUDE_DIR = ${ROCTRACER_INCLUDE_DIR}") + +target_include_directories(kineto_base PUBLIC + $ + $ + $ + $ + $ + $ + $) + +target_include_directories(kineto_api PUBLIC + $ + $) + +if(KINETO_LIBRARY_TYPE STREQUAL "default") + add_library(kineto + $ + $) +elseif(KINETO_LIBRARY_TYPE STREQUAL "static") + add_library(kineto STATIC + $ + $) +elseif(KINETO_LIBRARY_TYPE STREQUAL "shared") + add_library(kineto SHARED + $) + set_property(TARGET kineto_base PROPERTY POSITION_INDEPENDENT_CODE ON) + set_target_properties(kineto PROPERTIES + CXX_VISIBILITY_PRESET hidden) +else() + message(FATAL_ERROR "Unsupported library type ${KINETO_LIBRARY_TYPE}") +endif() + +if(NOT LIBKINETO_NOROCTRACER) + find_library(ROCTRACER_LIBRARY NAMES libroctracer64.so HINTS /opt/rocm/roctracer/lib) + target_link_libraries(kineto "${ROCTRACER_LIBRARY}") + find_library(KINETO_HIP_LIBRARY NAMES libamdhip64.so HINTS /opt/rocm/lib) + target_link_libraries(kineto "${KINETO_HIP_LIBRARY}") +endif() + +if(NOT LIBKINETO_NOCUPTI) + target_link_libraries(kineto "${CUDA_cupti_LIBRARY}") +endif() +target_link_libraries(kineto $) +add_dependencies(kineto fmt::fmt-header-only) + +install(TARGETS kineto EXPORT kinetoLibraryConfig + ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} + LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}) + +install(FILES ${LIBKINETO_PUBLIC_HEADERS} + DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/kineto") + +install(EXPORT kinetoLibraryConfig DESTINATION share/cmake/kineto + FILE 
kinetoLibraryConfig.cmake) + +if(KINETO_BUILD_TESTS) + add_subdirectory(test) +endif() diff --git a/plugins/tensorboard-plugins/libkineto/README.md b/plugins/tensorboard-plugins/libkineto/README.md new file mode 100644 index 0000000000000000000000000000000000000000..37127ca5aa821217da48aad38cb82eb36f8735c2 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/README.md @@ -0,0 +1,65 @@ +# Libkineto + +Libkineto is an in-process profiling library, part of the Kineto performance +tools project. + +The library provides a way to collect GPU traces and metrics from the host +process, either via the library public API or by sending a signal, if enabled. + +Currently only NVIDIA GPUs are supported. + +## Build Notes +Libkineto uses the standard CMAKE-based build flow. + +### Dependencies +Libkineto requires gcc 5+ and: + +- NVIDIA CUPTI: used to collect traces and metrics from NVIDIA GPUs. +- fmt: used for its convenient and lightweight string formatting functionality. +- googletest: required to build and run Kineto's tests. + - **googletest is not required** if you don't want to run Kineto tests. +By default, building of tests is **on**. Turn it off by setting `KINETO_BUILD_TESTS` to **off**. + +You can download [NVIDIA CUPTI][1], [fmt][2], [googletest][3] and set +`CUDA_SOURCE_DIR`, `FMT_SOURCE_DIR`, `GOOGLETEST_SOURCE_DIR` respectively for +cmake to find these libraries. If the fmt and googletest variables are not set, cmake will +build the git submodules found in the `third_party` directory. +If `CUDA_SOURCE_DIR` is not set, libkineto will fail to build. + +### Building Libkineto + +``` +# Check out repo and sub modules +git clone --recursive https://github.com/pytorch/kineto.git +# Build libkineto with cmake +cd kineto/libkineto +mkdir build && cd build +cmake .. +make +``` + +To run the tests after building libkineto (if tests are built), use the following +command: +``` +make test +``` + +### Installing Libkineto +``` +make install +``` + +## How Libkineto works +We will provide a high-level overview, design philosophy and brief descriptions of various +parts of Libkineto in upcoming blogs. + +## Full documentation +We strive to keep our source files readable. The best and up-to-date +documentation is available in the source files. + +## License +Libkineto is BSD licensed, as detailed in the [LICENSE](../LICENSE) file. + +[1]:https://developer.nvidia.com/CUPTI-CTK10_2 +[2]:https://github.com/fmt +[3]:https://github.com/google/googletest diff --git a/plugins/tensorboard-plugins/libkineto/include/AbstractConfig.h b/plugins/tensorboard-plugins/libkineto/include/AbstractConfig.h new file mode 100644 index 0000000000000000000000000000000000000000..1cadf4906c11c3b5f59e290295048cee7fd63acf --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/AbstractConfig.h @@ -0,0 +1,113 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
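+//
+// A config is parsed from a string of newline-separated "KEY = VALUE"
+// options; '#' starts a comment and surrounding whitespace is trimmed
+// (see AbstractConfig::parse in src/AbstractConfig.cpp). Illustrative
+// sketch only - the actual keys are defined by the derived Config class:
+//
+//   # collect for 10s after a 5s warmup (hypothetical keys)
+//   ACTIVITIES_WARMUP_PERIOD_SECS = 5
+//   ACTIVITIES_DURATION_SECS = 10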
+
+#pragma once
+
+#include <chrono>
+#include <map>
+#include <ostream>
+#include <string>
+#include <vector>
+
+namespace KINETO_NAMESPACE {
+
+class AbstractConfig {
+ public:
+  AbstractConfig& operator=(const AbstractConfig&) = delete;
+  AbstractConfig(AbstractConfig&&) = delete;
+  AbstractConfig& operator=(AbstractConfig&&) = delete;
+
+  virtual ~AbstractConfig() {
+    for (const auto& p : featureConfigs_) {
+      delete p.second;
+    }
+  }
+
+  // Return a copy of the full derived class
+  virtual AbstractConfig* cloneDerived(AbstractConfig& parent) const = 0;
+
+  // Returns true if the config string was successfully parsed
+  bool parse(const std::string& conf);
+
+  // Default setup for signal-triggered profiling
+  virtual void setSignalDefaults() {
+    for (auto& p : featureConfigs_) {
+      p.second->setSignalDefaults();
+    }
+  }
+
+  // Default setup for client-triggered profiling
+  virtual void setClientDefaults() {
+    for (auto& p : featureConfigs_) {
+      p.second->setClientDefaults();
+    }
+  }
+
+  // Time config was created / updated
+  std::chrono::time_point<std::chrono::system_clock> timestamp() const {
+    return timestamp_;
+  }
+
+  // Source config string that this was parsed from
+  const std::string& source() const {
+    return source_;
+  }
+
+  AbstractConfig& feature(std::string name) const {
+    const auto& pos = featureConfigs_.find(name);
+    return *pos->second;
+  }
+
+  // Transfers ownership of cfg arg
+  void addFeature(const std::string& name, AbstractConfig* cfg) {
+    featureConfigs_[name] = cfg;
+  }
+
+ protected:
+  AbstractConfig() {}
+  AbstractConfig(const AbstractConfig& other) = default;
+
+  // Return true if the option was recognized and successfully parsed.
+  // Throw std::invalid_argument if val is invalid.
+  virtual bool handleOption(const std::string& name, std::string& val);
+
+  // Perform post-validation checks, typically conditions involving
+  // multiple options.
+  // Throw std::invalid_argument if automatic correction cannot be made.
+  //
+  // @param fallbackProfileStartTime Specify a fallback profile start timestamp in case it was never specified by the client
+  virtual void validate(const std::chrono::time_point<std::chrono::system_clock>& fallbackProfileStartTime) = 0;
+
+  // TODO: Separate out each profiler type into features?
+  virtual void printActivityProfilerConfig(std::ostream& s) const;
+
+  // Helpers for use in handleOption
+  // Split a string by delimiter and remove external white space
+  std::vector<std::string> splitAndTrim(const std::string& s, char delim) const;
+  // Lowercase for case-insensitive comparisons
+  std::string toLower(std::string& s) const;
+  // Does string end with suffix
+  bool endsWith(const std::string& s, const std::string& suffix) const;
+  // Conversions
+  int64_t toIntRange(const std::string& val, int64_t min, int64_t max) const;
+  int32_t toInt32(const std::string& val) const;
+  int64_t toInt64(const std::string& val) const;
+  bool toBool(std::string& val) const;
+
+  void cloneFeaturesInto(AbstractConfig& cfg) const {
+    for (const auto& feature : featureConfigs_) {
+      cfg.featureConfigs_[feature.first] = feature.second->cloneDerived(cfg);
+    }
+  }
+
+ private:
+  // Time config was created / updated
+  std::chrono::time_point<std::chrono::system_clock> timestamp_{};
+
+  // Original configuration string, used for comparison
+  std::string source_{""};
+
+  // Configuration objects for optional features
+  std::map<std::string, AbstractConfig*> featureConfigs_{};
+};
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/include/ActivityProfilerInterface.h b/plugins/tensorboard-plugins/libkineto/include/ActivityProfilerInterface.h
new file mode 100644
index 0000000000000000000000000000000000000000..29871e47ab8af87888ccb8e20403bc26c433b5cc
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/include/ActivityProfilerInterface.h
@@ -0,0 +1,91 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <memory>
+#include <set>
+#include <string>
+#include <vector>
+
+#include "ActivityType.h"
+#include "ActivityTraceInterface.h"
+#include "IActivityProfiler.h"
+
+namespace libkineto {
+
+class ActivityProfilerController;
+struct CpuTraceBuffer;
+class Config;
+
+class ActivityProfilerInterface {
+
+ public:
+  virtual ~ActivityProfilerInterface() {};
+
+  virtual void init() {}
+  virtual bool isInitialized() {
+    return false;
+  }
+  virtual bool isActive() {
+    return false;
+  }
+
+  // *** Asynchronous API ***
+  // Instead of starting and stopping the trace manually, provide a start time
+  // and duration and/or an iteration stop criterion.
+  // Tracing terminates when either condition is met.
+  virtual void scheduleTrace(const std::string& configStr) {}
+
+  // *** Synchronous API ***
+  // These must be called in order:
+  // prepareTrace -> startTrace -> stopTrace.
+
+  // Many tracing structures are lazily initialized during trace collection,
+  // with potentially high overhead.
+  // Call prepareTrace to enable tracing, then run the region to trace
+  // at least once (and ideally run the same code that is to be traced) to
+  // allow tracing structures to be initialized.
+  virtual void prepareTrace(
+      const std::set<ActivityType>& activityTypes,
+      const std::string& configStr = "") {}
+
+  // Start recording, potentially reusing any buffers allocated since
+  // prepareTrace was called.
+  virtual void startTrace() {}
+
+  // Stop and process trace, producing an in-memory list of trace records.
+  // The processing will be done synchronously (using the calling thread).
+  virtual std::unique_ptr<ActivityTraceInterface> stopTrace() {
+    return nullptr;
+  }
+
+  // Re-evaluate internal state to allow for triggering operations based
+  // on the number of iterations. Each call implicitly increments the count.
+  virtual void step() {}
+
+  // *** TraceActivity API ***
+  // FIXME: Pass activityProfiler interface into clientInterface?
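+  // Correlation ids tie CPU-side operations to the GPU activities they
+  // launch: push an id before launching work, pop it afterwards.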
+ virtual void pushCorrelationId(uint64_t id){} + virtual void popCorrelationId(){} + virtual void transferCpuTrace( + std::unique_ptr traceBuffer){} + + // Correlation ids for user defined spans + virtual void pushUserCorrelationId(uint64_t){} + virtual void popUserCorrelationId(){} + + // Saves information for the current thread to be used in profiler output + // Client must record any new kernel thread where the activity has occured. + virtual void recordThreadInfo() {} + + // Record trace metadata, currently supporting only string key and values, + // values with the same key are overwritten + virtual void addMetadata(const std::string& key, const std::string& value) = 0; + + // Add a child activity profiler, this enables frameworks in the application + // to enable custom framework events. + virtual void addChildActivityProfiler( + std::unique_ptr profiler) {} +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/include/ActivityTraceInterface.h b/plugins/tensorboard-plugins/libkineto/include/ActivityTraceInterface.h new file mode 100644 index 0000000000000000000000000000000000000000..23d4edab00ce2fa90427e13818ac09c8541835ac --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/ActivityTraceInterface.h @@ -0,0 +1,21 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include + +namespace libkineto { + +struct ITraceActivity; + +class ActivityTraceInterface { + public: + virtual ~ActivityTraceInterface() {} + virtual const std::vector* activities() { + return nullptr; + } + virtual void save(const std::string& path) {} +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/include/ActivityType.h b/plugins/tensorboard-plugins/libkineto/include/ActivityType.h new file mode 100644 index 0000000000000000000000000000000000000000..74c6a2531d6a9cee3196f9f889517926afea823f --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/ActivityType.h @@ -0,0 +1,34 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include + +namespace libkineto { + +enum class ActivityType { + CPU_OP = 0, // cpu side ops + USER_ANNOTATION, + GPU_USER_ANNOTATION, + GPU_MEMCPY, + GPU_MEMSET, + CONCURRENT_KERNEL, // on-device kernels + EXTERNAL_CORRELATION, + CUDA_RUNTIME, // host side cuda runtime events + CUDA_PROFILER_RANGE, // CUPTI Profiler range for performance metrics + GLOW_RUNTIME, // host side glow runtime events + CPU_INSTANT_EVENT, // host side point-like events + PYTHON_FUNCTION, + OVERHEAD, // CUPTI induced overhead events sampled from its overhead API. + ENUM_COUNT // This is to add buffer and not used for any profiling logic. Add your new type before it. +}; + +const char* toString(ActivityType t); +ActivityType toActivityType(const std::string& str); + +// Return an array of all activity types except COUNT +constexpr int activityTypeCount = (int)ActivityType::ENUM_COUNT; +const std::array activityTypes(); + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/include/ClientInterface.h b/plugins/tensorboard-plugins/libkineto/include/ClientInterface.h new file mode 100644 index 0000000000000000000000000000000000000000..06dc075838164f80e9481b34a5d5d3c136b92efd --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/ClientInterface.h @@ -0,0 +1,16 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
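+//
+// Sketch of a client (illustrative, not part of the upstream header): a
+// framework implements these callbacks and registers itself so libkineto
+// can drive its tracing state:
+//
+//   class MyClient : public libkineto::ClientInterface {
+//     void init() override {}
+//     void warmup(bool setupOpInputsCollection) override {}
+//     void start() override {}
+//     void stop() override {}
+//   };
+//
+//   libkineto::api().registerClient(&myClient);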
+ +#pragma once + +namespace libkineto { + +class ClientInterface { + public: + virtual ~ClientInterface() {} + virtual void init() = 0; + virtual void warmup(bool setupOpInputsCollection) = 0; + virtual void start() = 0; + virtual void stop() = 0; +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/include/Config.h b/plugins/tensorboard-plugins/libkineto/include/Config.h new file mode 100644 index 0000000000000000000000000000000000000000..040e96c9f75ab3ab768aaebac28f959f12a3ea06 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/Config.h @@ -0,0 +1,433 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include "AbstractConfig.h" +#include "ActivityType.h" + +#include +#include +#include +#include +#include +#include + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +class Config : public AbstractConfig { + public: + Config(); + Config& operator=(const Config&) = delete; + Config(Config&&) = delete; + Config& operator=(Config&&) = delete; + + // Return a full copy including feature config object + std::unique_ptr clone() const { + auto cfg = std::unique_ptr(new Config(*this)); + cloneFeaturesInto(*cfg); + return cfg; + } + + bool handleOption(const std::string& name, std::string& val) override; + + void setClientDefaults() override; + + // Log events to this file + const std::string& eventLogFile() const { + return eventLogFile_; + } + + bool activityProfilerEnabled() const { + return activityProfilerEnabled_ || + activitiesOnDemandTimestamp_.time_since_epoch().count() > 0; + } + + // Log activitiy trace to this file + const std::string& activitiesLogFile() const { + return activitiesLogFile_; + } + + // Log activitiy trace to this url + const std::string& activitiesLogUrl() const { + return activitiesLogUrl_; + } + + void setActivitiesLogUrl(const std::string& url) { + activitiesLogUrl_ = url; + } + + bool activitiesLogToMemory() const { + return activitiesLogToMemory_; + } + + // Is profiling enabled for the given device? + bool eventProfilerEnabledForDevice(uint32_t dev) const { + return 0 != (eventProfilerDeviceMask_ & (1 << dev)); + } + + // Take a sample (read hardware counters) at this frequency. + // This controls how often counters are read - if all counters cannot + // be collected simultaneously then multiple samples are needed to + // collect all requested counters - see multiplex period. + std::chrono::milliseconds samplePeriod() const { + return samplePeriod_; + } + + void setSamplePeriod(std::chrono::milliseconds period) { + samplePeriod_ = period; + } + + // When all requested counters cannot be collected simultaneously, + // counters will be multiplexed at this frequency. + // Multiplexing can have a large performance impact if done frequently. + // To avoid a perf impact, keep this at 1s or above. + std::chrono::milliseconds multiplexPeriod() const { + return multiplexPeriod_; + } + + void setMultiplexPeriod(std::chrono::milliseconds period) { + multiplexPeriod_ = period; + } + + // Report counters at this frequency. Note that several samples can + // be reported each time, see samplesPerReport. + std::chrono::milliseconds reportPeriod() const { + return reportPeriod_; + } + + void setReportPeriod(std::chrono::milliseconds msecs); + + // Number of samples dispatched each report period. + // Must be in the range [1, report period / sample period]. + // In other words, aggregation is supported but not interpolation. 
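+  // Worked example (illustrative values): with samplePeriod = 100ms and
+  // reportPeriod = 1000ms there are 10 samples per report, so
+  // samplesPerReport may range from 1 (fully aggregated) to 10 (all samples).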
+ int samplesPerReport() const { + return samplesPerReport_; + } + + void setSamplesPerReport(int count) { + samplesPerReport_ = count; + } + + // The names of events to collect + const std::set& eventNames() const { + return eventNames_; + } + + // Add additional events to be profiled + void addEvents(const std::set& names) { + eventNames_.insert(names.begin(), names.end()); + } + + // The names of metrics to collect + const std::set& metricNames() const { + return metricNames_; + } + + // Add additional metrics to be profiled + void addMetrics(const std::set& names) { + metricNames_.insert(names.begin(), names.end()); + } + + const std::vector& percentiles() const { + return eventReportPercentiles_; + } + + // Profile for this long, then revert to base config + std::chrono::seconds eventProfilerOnDemandDuration() const { + return eventProfilerOnDemandDuration_; + } + + void setEventProfilerOnDemandDuration(std::chrono::seconds duration) { + eventProfilerOnDemandDuration_ = duration; + } + + // Too many event profilers on a single system can overload the driver. + // At some point, latencies shoot through the roof and collection of samples + // becomes impossible. To avoid this situation we have a limit of profilers + // per GPU. + // NOTE: Communication with a daemon is needed for this feature. + // Library must be built with an active DaemonConfigLoader. + int maxEventProfilersPerGpu() const { + return eventProfilerMaxInstancesPerGpu_; + } + + // On Cuda11 we've seen occasional hangs when reprogramming counters + // Monitor profiling threads and report when a thread is not responding + // for a given number of seconds. + // A period of 0 means disable. + std::chrono::seconds eventProfilerHeartbeatMonitorPeriod() const { + return eventProfilerHeartbeatMonitorPeriod_; + } + + // The types of activities selected in the configuration file + const std::set& selectedActivityTypes() const { + return selectedActivityTypes_; + } + + void setSelectedActivityTypes(const std::set& types) { + selectedActivityTypes_ = types; + } + + bool isOpInputsCollectionEnabled() const { + return enableOpInputsCollection_; + } + + // Trace for this long + std::chrono::milliseconds activitiesDuration() const { + return activitiesDuration_; + } + + // Trace for this many iterations, determined by external API + int activitiesRunIterations() const { + return activitiesRunIterations_; + } + + std::chrono::milliseconds activitiesDurationDefault() const; + + void setActivitiesDuration(std::chrono::milliseconds duration) { + activitiesDuration_ = duration; + } + + int activitiesMaxGpuBufferSize() const { + return activitiesMaxGpuBufferSize_; + } + + std::chrono::seconds activitiesWarmupDuration() const { + return activitiesWarmupDuration_; + } + + int activitiesWarmupIterations() const { + return activitiesWarmupIterations_; + } + + // Timestamp at which the profiling to start, requested by the user. 
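+  // If no explicit start time was set, falls back to the legacy request
+  // timestamp plus the maximum request age and the warmup duration.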
+ const std::chrono::time_point requestTimestamp() + const { + if (profileStartTime_.time_since_epoch().count()) { + return profileStartTime_; + } + + // TODO(T94634890): Deperecate requestTimestamp + return requestTimestamp_ + maxRequestAge() + activitiesWarmupDuration(); + } + + bool hasProfileStartTime() const { + return requestTimestamp_.time_since_epoch().count() > 0 || + profileStartTime_.time_since_epoch().count() > 0; + } + + int profileStartIteration() const { + return profileStartIteration_; + } + + bool hasProfileStartIteration() const { + return profileStartIteration_ >= 0 && activitiesRunIterations_ > 0; + } + + void setProfileStartIteration(int iter) { + profileStartIteration_ = iter; + } + + int profileStartIterationRoundUp() const { + return profileStartIterationRoundUp_; + } + + // calculate the start iteration accounting for warmup + int startIterationIncludingWarmup() const { + if (!hasProfileStartIteration()) { + return -1; + } + return profileStartIteration_ - activitiesWarmupIterations_; + } + + const std::chrono::seconds maxRequestAge() const; + + // All VLOG* macros will log if the verbose log level is >= + // the verbosity specified for the verbose log message. + // Default value is -1, so messages with log level 0 will log by default. + int verboseLogLevel() const { + return verboseLogLevel_; + } + + // Modules for which verbose logging is enabled. + // If empty, logging is enabled for all modules. + const std::vector& verboseLogModules() const { + return verboseLogModules_; + } + + bool sigUsr2Enabled() const { + return enableSigUsr2_; + } + + bool ipcFabricEnabled() const { + return enableIpcFabric_; + } + + static std::chrono::milliseconds alignUp( + std::chrono::milliseconds duration, + std::chrono::milliseconds alignment) { + duration += alignment; + return duration - (duration % alignment); + } + + std::chrono::time_point + eventProfilerOnDemandStartTime() const { + return eventProfilerOnDemandTimestamp_; + } + + std::chrono::time_point + eventProfilerOnDemandEndTime() const { + return eventProfilerOnDemandTimestamp_ + eventProfilerOnDemandDuration_; + } + + std::chrono::time_point + activityProfilerRequestReceivedTime() const { + return activitiesOnDemandTimestamp_; + } + + // Users may request and set trace id and group trace id. 
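+  // They are carried as logger metadata so a trace can be matched with the
+  // external request that triggered it.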
+ const std::string& requestTraceID() const { + return requestTraceID_; + } + + void setRequestTraceID(const std::string& tid) { + requestTraceID_ = tid; + } + + const std::string& requestGroupTraceID() const { + return requestGroupTraceID_; + } + + void setRequestGroupTraceID(const std::string& gtid) { + requestGroupTraceID_ = gtid; + } + + void updateActivityProfilerRequestReceivedTime(); + + void printActivityProfilerConfig(std::ostream& s) const override; + + void validate( + const std::chrono::time_point& fallbackProfileStartTime) override; + + static void addConfigFactory( + std::string name, + std::function factory); + + void print(std::ostream& s) const; + + private: + explicit Config(const Config& other) = default; + + AbstractConfig* cloneDerived(AbstractConfig& parent) const override { + // Clone from AbstractConfig not supported + assert(false); + return nullptr; + } + + uint8_t createDeviceMask(const std::string& val); + + // Adds valid activity types from the user defined string list in the + // configuration file + void setActivityTypes(const std::vector& selected_activities); + + // Sets the default activity types to be traced + void selectDefaultActivityTypes() { + // If the user has not specified an activity list, add all types + for (ActivityType t : activityTypes()) { + // Do no enable this by default + // TODO: introduce optional types + if (t != ActivityType::OVERHEAD) { + selectedActivityTypes_.insert(t); + } + } + } + + int verboseLogLevel_; + std::vector verboseLogModules_; + + // Event profiler + // These settings are also supported in on-demand mode + std::chrono::milliseconds samplePeriod_; + std::chrono::milliseconds reportPeriod_; + int samplesPerReport_; + std::set eventNames_; + std::set metricNames_; + + // On-demand duration + std::chrono::seconds eventProfilerOnDemandDuration_; + // Last on-demand request + std::chrono::time_point + eventProfilerOnDemandTimestamp_; + + int eventProfilerMaxInstancesPerGpu_; + + // Monitor whether event profiler threads are stuck + // at this frequency + std::chrono::seconds eventProfilerHeartbeatMonitorPeriod_; + + // These settings can not be changed on-demand + std::string eventLogFile_; + std::vector eventReportPercentiles_ = {5, 25, 50, 75, 95}; + uint8_t eventProfilerDeviceMask_ = ~0; + std::chrono::milliseconds multiplexPeriod_; + + // Activity profiler + bool activityProfilerEnabled_; + std::set selectedActivityTypes_; + + // The activity profiler settings are all on-demand + std::string activitiesLogFile_; + + std::string activitiesLogUrl_; + + // Log activities to memory buffer + bool activitiesLogToMemory_{false}; + + int activitiesMaxGpuBufferSize_; + std::chrono::seconds activitiesWarmupDuration_; + int activitiesWarmupIterations_; + + // Client Interface + // Enable inputs collection when tracing ops + bool enableOpInputsCollection_{true}; + + // Profile for specified iterations and duration + std::chrono::milliseconds activitiesDuration_; + int activitiesRunIterations_; + + // Below are not used + // Use this net name for iteration count + std::string activitiesExternalAPIIterationsTarget_; + // Only profile nets that includes this in the name + std::vector activitiesExternalAPIFilter_; + // Only profile nets with at least this many operators + int activitiesExternalAPINetSizeThreshold_; + // Only profile nets with at least this many GPU operators + int activitiesExternalAPIGpuOpCountThreshold_; + // Last activity profiler request + std::chrono::time_point + activitiesOnDemandTimestamp_; + + // Synchronized 
start timestamp + std::chrono::time_point profileStartTime_; + // or start iteration + int profileStartIteration_; + int profileStartIterationRoundUp_; + + // DEPRECATED + std::chrono::time_point requestTimestamp_; + + // Enable profiling via SIGUSR2 + bool enableSigUsr2_; + + // Enable IPC Fabric instead of thrift communication + bool enableIpcFabric_; + + // Logger Metadata + std::string requestTraceID_; + std::string requestGroupTraceID_; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/include/GenericTraceActivity.h b/plugins/tensorboard-plugins/libkineto/include/GenericTraceActivity.h new file mode 100644 index 0000000000000000000000000000000000000000..4272cf1efa4e7613a46c3684270b4e803853345b --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/GenericTraceActivity.h @@ -0,0 +1,125 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include +#include + +#include "ThreadUtil.h" +#include "ITraceActivity.h" +#include "TraceSpan.h" + +namespace libkineto { + +// Link type, used in GenericTraceActivity.flow.type +constexpr unsigned int kLinkFwdBwd = 1; +constexpr unsigned int kLinkAsyncCpuGpu = 2; + +// @lint-ignore-every CLANGTIDY cppcoreguidelines-non-private-member-variables-in-classes +// @lint-ignore-every CLANGTIDY cppcoreguidelines-pro-type-member-init +class GenericTraceActivity : public ITraceActivity { + + public: + GenericTraceActivity() : activityType(ActivityType::ENUM_COUNT), traceSpan_(NULL) {} + + GenericTraceActivity( + const TraceSpan& trace, ActivityType type, const std::string& name) + : activityType(type), activityName(name), traceSpan_(&trace) { + } + + int64_t deviceId() const override { + return device; + } + + int64_t resourceId() const override { + return resource; + } + + int32_t getThreadId() const override { + return threadId; + } + + int64_t timestamp() const override { + return startTime; + } + + int64_t duration() const override { + return endTime - startTime; + } + + int64_t correlationId() const override { + return id; + } + + ActivityType type() const override { + return activityType; + } + + const ITraceActivity* linkedActivity() const override { + return nullptr; + } + + int flowType() const override { + return flow.type; + } + + int flowId() const override { + return flow.id; + } + + bool flowStart() const override { + return flow.start; + } + + const std::string name() const override { + return activityName; + } + + const TraceSpan* traceSpan() const override { + return traceSpan_; + } + + void log(ActivityLogger& logger) const override; + + //Encode client side metadata as a key/value + template + void addMetadata(const std::string& key, const ValType& value) { + metadata_.push_back(fmt::format("\"{}\": {}", key, value)); + } + + void addMetadataQuoted(const std::string& key, const std::string& value) { + metadata_.push_back(fmt::format("\"{}\": \"{}\"", key, value)); + } + + const std::string metadataJson() const override { + return fmt::format("{}", fmt::join(metadata_, ", ")); + } + + virtual ~GenericTraceActivity() {}; + + int64_t startTime{0}; + int64_t endTime{0}; + int32_t id{0}; + int32_t device{0}; + int32_t resource{0}; + int32_t threadId{0}; + ActivityType activityType; + std::string activityName; + struct Flow { + Flow(): id(0), type(0), start(0) {} + // Ids must be unique within each type + uint32_t id : 27; + // Type will be used to connect flows between profilers, as + // well as look up flow information (name etc) + 
uint32_t type : 4; + uint32_t start : 1; + } flow; + + private: + const TraceSpan* traceSpan_; + std::vector metadata_; +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/include/IActivityProfiler.h b/plugins/tensorboard-plugins/libkineto/include/IActivityProfiler.h new file mode 100644 index 0000000000000000000000000000000000000000..f5d4b3fb828a3348d948c6487acc6a9e5a18f836 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/IActivityProfiler.h @@ -0,0 +1,104 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include + +#include "Config.h" +#include "GenericTraceActivity.h" + +/* This file includes an abstract base class for an activity profiler + * that can be implemented by multiple tracing agents in the application. + * The high level Kineto profiler can co-ordinate start and end of tracing + * and combine together events from multiple such activity profilers. + */ + +namespace libkineto { + +using namespace KINETO_NAMESPACE; + +#ifdef _MSC_VER +// workaround for the predefined ERROR macro on Windows +#undef ERROR +#endif // _MSC_VER + +enum class TraceStatus { + READY, // Accepting trace requests + WARMUP, // Performing trace warmup + RECORDING, // Actively collecting activities + PROCESSING, // Recording is complete, preparing results + ERROR, // One or more errors (and possibly also warnings) occurred. + WARNING, // One or more warnings occurred. +}; + +/* IActivityProfilerSession: + * an opaque object that can be used by a high level profiler to + * start/stop and return trace events. + */ +class IActivityProfilerSession { + + public: + virtual ~IActivityProfilerSession() {} + + // start the trace collection synchronously + virtual void start() = 0; + + // stop the trace collection synchronously + virtual void stop() = 0; + + TraceStatus status() { + return status_; + } + + // returns list of Trace Activities + virtual std::vector& activities() = 0; + + // returns errors with this trace + virtual std::vector errors() = 0; + + // processes trace activities using logger + virtual void processTrace(ActivityLogger& logger) = 0; + + // XXX define trace formats + // virtual save(string name, TraceFormat format) + + protected: + TraceStatus status_ = TraceStatus::READY; +}; + + +/* Activity Profiler Plugins: + * These allow other frameworks to integrate into Kineto's primariy + * activity profiler. While the primary activity profiler handles + * timing the trace collections and correlating events the plugins + * can become source of new trace activity types. + */ +class IActivityProfiler { + + public: + + virtual ~IActivityProfiler() {} + + // name of profiler + virtual const std::string& name() const = 0; + + // returns activity types this profiler supports + virtual const std::set& availableActivities() const = 0; + + // Calls prepare() on registered tracer providers passing in the relevant + // activity types. Returns a profiler session handle + virtual std::unique_ptr configure( + const std::set& activity_types, + const Config& config) = 0; + + // asynchronous version of the above with future timestamp and duration. 
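+  // ts_ms is the requested start time (milliseconds since epoch) and
+  // duration_ms the length of the collection window.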
+ virtual std::unique_ptr configure( + int64_t ts_ms, + int64_t duration_ms, + const std::set& activity_types, + const Config& config) = 0; +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/include/ILoggerObserver.h b/plugins/tensorboard-plugins/libkineto/include/ILoggerObserver.h new file mode 100644 index 0000000000000000000000000000000000000000..4fce7851b9669ff93a3f3a772140b0466674853c --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/ILoggerObserver.h @@ -0,0 +1,50 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include + +// Stages in libkineto used when pushing logs to UST Logger. +constexpr char kWarmUpStage[] = "Warm Up"; +constexpr char kCollectionStage[] = "Collection"; +constexpr char kPostProcessingStage[] = "Post Processing"; + +#if !USE_GOOGLE_LOG + +#include +#include + +namespace libkineto { + +enum LoggerOutputType { + VERBOSE = 0, + INFO = 1, + WARNING = 2, + ERROR = 3, + STAGE = 4, + ENUM_COUNT = 5 +}; + +const char* toString(LoggerOutputType t); +LoggerOutputType toLoggerOutputType(const std::string& str); + +constexpr int LoggerTypeCount = (int) LoggerOutputType::ENUM_COUNT; + +class ILoggerObserver { + public: + virtual ~ILoggerObserver() = default; + virtual void write(const std::string& message, LoggerOutputType ot) = 0; + virtual const std::map> extractCollectorMetadata() = 0; + virtual void reset() = 0; + virtual void addDevice(const int64_t device) = 0; + virtual void setTraceDurationMS(const int64_t duration) = 0; + virtual void addEventCount(const int64_t count) = 0; + virtual void setTraceID(const std::string&) {} + virtual void setGroupTraceID(const std::string&) {} + virtual void addDestination(const std::string& dest) = 0; + +}; + +} // namespace libkineto + +#endif // !USE_GOOGLE_LOG diff --git a/plugins/tensorboard-plugins/libkineto/include/ITraceActivity.h b/plugins/tensorboard-plugins/libkineto/include/ITraceActivity.h new file mode 100644 index 0000000000000000000000000000000000000000..a477ed814662cb4c57738b7e40ec6052e9f65288 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/include/ITraceActivity.h @@ -0,0 +1,53 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include + +#include "ActivityType.h" + +namespace libkineto { + +class ActivityLogger; +struct TraceSpan; + +// Generic activity interface is borrowed from tensorboard protobuf format. +struct ITraceActivity { + virtual ~ITraceActivity() {} + // Device is a physical or logical entity, e.g. CPU, GPU or process + virtual int64_t deviceId() const = 0; + // A resource is something on the device, h/w thread, + // functional units etc. 
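+  // (e.g. a GPU stream or a CPU thread id)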
+  virtual int64_t resourceId() const = 0;
+  // s/w thread
+  virtual int32_t getThreadId() const = 0;
+  // Start timestamp in microseconds
+  virtual int64_t timestamp() const = 0;
+  // Duration in microseconds
+  virtual int64_t duration() const = 0;
+  // Used to link up async activities
+  virtual int64_t correlationId() const = 0;
+  // Part of a flow, identified by flow id and type
+  virtual int flowType() const = 0;
+  virtual int flowId() const = 0;
+  virtual bool flowStart() const = 0;
+  virtual ActivityType type() const = 0;
+  virtual const std::string name() const = 0;
+  // Optional linked activity
+  virtual const ITraceActivity* linkedActivity() const = 0;
+  // Optional containing trace object
+  virtual const TraceSpan* traceSpan() const = 0;
+  // Log activity
+  virtual void log(ActivityLogger& logger) const = 0;
+  // Return json formatted metadata
+  // FIXME: Return iterator to dynamic type map here instead
+  virtual const std::string metadataJson() const = 0;
+
+  static int64_t nsToUs(int64_t ns) {
+    // It's important that this conversion is the same everywhere.
+    // No rounding!
+    return ns / 1000;
+  }
+};
+
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/include/ThreadUtil.h b/plugins/tensorboard-plugins/libkineto/include/ThreadUtil.h
new file mode 100644
index 0000000000000000000000000000000000000000..d1dc80ad2ab0dfd3bea313363fb0e6565349889c
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/include/ThreadUtil.h
@@ -0,0 +1,22 @@
+#pragma once
+
+#include <stdint.h>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace libkineto {
+
+int32_t systemThreadId();
+int32_t threadId();
+bool setThreadName(const std::string& name);
+std::string getThreadName();
+
+int32_t processId();
+std::string processName(int32_t pid);
+
+// Return a list of pids and process names for the current process
+// and its parents.
+std::vector<std::pair<int32_t, std::string>> pidCommandPairsOfAncestors();
+
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/include/TraceSpan.h b/plugins/tensorboard-plugins/libkineto/include/TraceSpan.h
new file mode 100644
index 0000000000000000000000000000000000000000..af9a9d5ee556830ac34568e6c81ec4f8f00da2e3
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/include/TraceSpan.h
@@ -0,0 +1,36 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <cstdint>
+#include <string>
+#include <utility>
+
+namespace libkineto {
+
+struct TraceSpan {
+  TraceSpan() = delete;
+  TraceSpan(
+      int64_t startTime, int64_t endTime, std::string name)
+      : startTime(startTime), endTime(endTime), name(std::move(name)) {
+  }
+  TraceSpan(
+      int opCount, int it, std::string name, std::string prefix)
+      : opCount(opCount),
+        iteration(it),
+        name(std::move(name)),
+        prefix(std::move(prefix)) {
+  }
+
+  // FIXME: change to duration?
+  int64_t startTime{0};
+  int64_t endTime{0};
+  int opCount{0};
+  int iteration{-1};
+  // Name is used to identify timeline
+  std::string name;
+  // Prefix used to distinguish trace spans on the same timeline
+  std::string prefix;
+};
+
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/include/libkineto.h b/plugins/tensorboard-plugins/libkineto/include/libkineto.h
new file mode 100644
index 0000000000000000000000000000000000000000..87c3d64f638dad9d1c2d24c013135db60d477642
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/include/libkineto.h
@@ -0,0 +1,138 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+// Mediator for initialization and profiler control
+
+#pragma once
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "ActivityProfilerInterface.h"
+#include "ActivityType.h"
+#include "ClientInterface.h"
+#include "GenericTraceActivity.h"
+#include "TraceSpan.h"
+#include "IActivityProfiler.h"
+#include "ActivityTraceInterface.h"
+
+#include "ThreadUtil.h"
+
+extern "C" {
+  void suppressLibkinetoLogMessages();
+  int InitializeInjection(void);
+  bool libkineto_init(bool cpuOnly, bool logOnError);
+}
+
+namespace libkineto {
+
+class Config;
+class ConfigLoader;
+
+struct CpuTraceBuffer {
+  TraceSpan span{0, 0, "none"};
+  int gpuOpCount;
+  std::deque<GenericTraceActivity> activities;
+};
+
+using ChildActivityProfilerFactory =
+    std::function<std::unique_ptr<IActivityProfiler>()>;
+
+class LibkinetoApi {
+ public:
+
+  explicit LibkinetoApi(ConfigLoader& configLoader)
+      : configLoader_(configLoader) {
+  }
+
+  // Called by client that supports tracing API.
+  // libkineto can still function without this.
+  void registerClient(ClientInterface* client);
+
+  // Called by libkineto on init
+  void registerProfiler(std::unique_ptr<ActivityProfilerInterface> profiler) {
+    activityProfiler_ = std::move(profiler);
+    initClientIfRegistered();
+  }
+
+  ActivityProfilerInterface& activityProfiler() {
+    return *activityProfiler_;
+  }
+
+  ClientInterface* client() {
+    return client_;
+  }
+
+  void initProfilerIfRegistered() {
+    static std::once_flag once;
+    if (activityProfiler_) {
+      std::call_once(once, [this] {
+        if (!activityProfiler_->isInitialized()) {
+          activityProfiler_->init();
+          initChildActivityProfilers();
+        }
+      });
+    }
+  }
+
+  bool isProfilerInitialized() const {
+    return activityProfiler_ && activityProfiler_->isInitialized();
+  }
+
+  bool isProfilerRegistered() const {
+    return activityProfiler_ != nullptr;
+  }
+
+  void suppressLogMessages() {
+    suppressLibkinetoLogMessages();
+  }
+
+  // Provides access to profiler configuration management
+  ConfigLoader& configLoader() {
+    return configLoader_;
+  }
+
+  void registerProfilerFactory(
+      ChildActivityProfilerFactory factory) {
+    if (isProfilerInitialized()) {
+      activityProfiler_->addChildActivityProfiler(factory());
+    } else {
+      childProfilerFactories_.push_back(factory);
+    }
+  }
+
+ private:
+
+  void initChildActivityProfilers() {
+    if (!isProfilerInitialized()) {
+      return;
+    }
+    for (const auto& factory : childProfilerFactories_) {
+      activityProfiler_->addChildActivityProfiler(factory());
+    }
+    childProfilerFactories_.clear();
+  }
+
+  // Client is initialized once both it and libkineto have registered
+  void initClientIfRegistered();
+
+  ConfigLoader& configLoader_;
+  std::unique_ptr<ActivityProfilerInterface> activityProfiler_{};
+  ClientInterface* client_{};
+  int32_t clientRegisterThread_{0};
+
+  bool isLoaded_{false};
+  std::vector<ChildActivityProfilerFactory> childProfilerFactories_;
+};
+
+// Singleton
+LibkinetoApi& api();
+
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/include/time_since_epoch.h b/plugins/tensorboard-plugins/libkineto/include/time_since_epoch.h
new file mode 100644
index 0000000000000000000000000000000000000000..caa6b4d92760d384eca2b1383a679fe7435c53b3
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/include/time_since_epoch.h
@@ -0,0 +1,16 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+ +#pragma once + +#include + +namespace libkineto { + +inline int64_t timeSinceEpoch( + const std::chrono::time_point& t) { + return std::chrono::duration_cast( + t.time_since_epoch()) + .count(); +} + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/libkineto_defs.bzl b/plugins/tensorboard-plugins/libkineto/libkineto_defs.bzl new file mode 100644 index 0000000000000000000000000000000000000000..330c54a22dfcedf895f0eba4077713a7c4cd8072 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/libkineto_defs.bzl @@ -0,0 +1,77 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# This source code is licensed under the BSD-style license found in the +# LICENSE file in the root directory of this source tree. + +def get_libkineto_api_srcs(): + return [ + "src/ThreadUtil.cpp", + "src/libkineto_api.cpp", + ] + +def get_libkineto_cupti_srcs(with_api = True): + return [ + "src/CudaDeviceProperties.cpp", + "src/CuptiActivityApi.cpp", + "src/CuptiActivityPlatform.cpp", + "src/CuptiCallbackApi.cpp", + "src/CuptiEventApi.cpp", + "src/CuptiMetricApi.cpp", + "src/CuptiRangeProfilerApi.cpp", + "src/Demangle.cpp", + "src/EventProfiler.cpp", + "src/EventProfilerController.cpp", + "src/WeakSymbols.cpp", + "src/cupti_strings.cpp", + ] + (get_libkineto_cpu_only_srcs(with_api)) + +def get_libkineto_roctracer_srcs(with_api = True): + return [ + "src/RoctracerActivityApi.cpp", + ] + (get_libkineto_cpu_only_srcs(with_api)) + +def get_libkineto_cpu_only_srcs(with_api = True): + return [ + "src/AbstractConfig.cpp", + "src/CuptiActivityProfiler.cpp", + "src/ActivityProfilerController.cpp", + "src/ActivityProfilerProxy.cpp", + "src/ActivityType.cpp", + "src/Config.cpp", + "src/ConfigLoader.cpp", + "src/CuptiActivityApi.cpp", + "src/Demangle.cpp", + "src/GenericTraceActivity.cpp", + "src/ILoggerObserver.cpp", + "src/Logger.cpp", + "src/init.cpp", + "src/output_csv.cpp", + "src/output_json.cpp", + ] + (get_libkineto_api_srcs() if with_api else []) + +def get_libkineto_public_headers(): + return [ + "include/AbstractConfig.h", + "include/ActivityProfilerInterface.h", + "include/ActivityType.h", + "include/Config.h", + "include/ClientInterface.h", + "include/GenericTraceActivity.h", + "include/GenericTraceActivity.h", + "include/IActivityProfiler.h", + "include/ILoggerObserver.h", + "include/ITraceActivity.h", + "include/TraceSpan.h", + "include/ThreadUtil.h", + "include/libkineto.h", + "include/time_since_epoch.h", + ] + +# kineto code should be updated to not have to +# suppress these warnings. +KINETO_COMPILER_FLAGS = [ + "-fexceptions", + "-Wno-deprecated-declarations", + "-Wno-unused-function", + "-Wno-unused-private-field", +] diff --git a/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cpp b/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cpp new file mode 100644 index 0000000000000000000000000000000000000000..780047912ed09996d3952901267d46aab99cf78c --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cpp @@ -0,0 +1,38 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
+
+#include
+#include
+#include
+
+#include
+#include
+
+#include "kineto/libkineto/sample_programs/kineto_playground.cuh"
+
+using namespace kineto;
+
+static const std::string kFileName = "/tmp/kineto_playground_trace.json";
+
+int main() {
+  warmup();
+
+  // Kineto config
+
+  // An empty types set defaults to all activity types
+  std::set<libkineto::ActivityType> types;
+
+  auto& profiler = libkineto::api().activityProfiler();
+  libkineto::api().initProfilerIfRegistered();
+  profiler.prepareTrace(types);
+
+  // Good to warm up again after prepareTrace so that CUPTI initialization settles
+  warmup();
+  profiler.startTrace();
+  playground();
+
+  auto trace = profiler.stopTrace();
+  LOG(INFO) << "Stopped and processed trace. Got " << trace->activities()->size() << " activities.";
+  trace->save(kFileName);
+  return 0;
+}
+
diff --git a/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cu b/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cu
new file mode 100644
index 0000000000000000000000000000000000000000..54c6f82ff4be2e468c0e868b49b3a9130de97490
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cu
@@ -0,0 +1,60 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include <stdio.h>
+
+#include "kineto_playground.cuh"
+
+
+namespace kineto {
+
+void warmup(void) {
+  // Initializing CUDA can take a while, which we normally do not want to see in Kineto traces.
+  // This is done in various ways by code that takes Kineto as a dependency. This is our way of
+  // doing warmup for kineto_playground.
+  size_t bytes = 1000;
+  float* mem = NULL;
+  auto error = cudaMalloc(&mem, bytes);
+  if (error != cudaSuccess) {
+    printf("cudaMalloc failed during kineto_playground warmup. error code: %d", error);
+    return;
+  }
+
+  cudaFree(mem);
+}
+
+void basicMemcpyMemset(void) {
+  size_t size = (1 << 8) * sizeof(float);
+  float *hostMemSrc, *deviceMem, *hostMemDst;
+  cudaError_t err;
+
+  hostMemSrc = (float*)malloc(size);
+  hostMemDst = (float*)malloc(size);
+  err = cudaMalloc(&deviceMem, size);
+  if (err != cudaSuccess) {
+    printf("cudaMalloc failed during %s", __func__);
+    return;
+  }
+
+  memset(hostMemSrc, 1, size);
+  // Capture the return code so the checks below actually test the copies
+  err = cudaMemcpy(deviceMem, hostMemSrc, size, cudaMemcpyHostToDevice);
+  if (err != cudaSuccess) {
+    printf("cudaMemcpy failed during %s", __func__);
+    return;
+  }
+
+  err = cudaMemcpy(hostMemDst, deviceMem, size, cudaMemcpyDeviceToHost);
+  if (err != cudaSuccess) {
+    printf("cudaMemcpy failed during %s", __func__);
+    return;
+  }
+
+  free(hostMemSrc);
+  free(hostMemDst);
+  cudaFree(deviceMem);
+}
+
+void playground(void) {
+  // Add your experimental CUDA implementation here.
+}
+
+}
diff --git a/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cuh b/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cuh
new file mode 100644
index 0000000000000000000000000000000000000000..54e1ee59ada9ae88370b38146567ed87be2b914b
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/sample_programs/kineto_playground.cuh
@@ -0,0 +1,18 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include
+
+namespace kineto {
+
+// Warms up CUDA before the tracing starts
+void warmup(void);
+
+// Basic usage of cudaMemcpy and cudaMemset
+void basicMemcpyMemset(void);
+
+// Your experimental code goes in here!
+void playground(void); + +} diff --git a/plugins/tensorboard-plugins/libkineto/src/AbstractConfig.cpp b/plugins/tensorboard-plugins/libkineto/src/AbstractConfig.cpp new file mode 100644 index 0000000000000000000000000000000000000000..d60ab43c9a3e198167beb7987d619b0bb8e9ed13 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/AbstractConfig.cpp @@ -0,0 +1,188 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "AbstractConfig.h" + +#include +#include +#include + +#include "Logger.h" + +using namespace std::chrono; + +using std::string; +using std::vector; + +namespace KINETO_NAMESPACE { + +constexpr char kWhitespace[] = "\t\n "; + +static bool isWhitespace(string& s) { + return s.find_first_not_of(kWhitespace) == string::npos; +} + +// Remove whitespace from both end of string +static inline string trim(string& s) { + if (s.empty()) { + return s; + } else if (isWhitespace(s)) { + return ""; + } + auto start = s.find_first_not_of(kWhitespace); + auto end = s.find_last_not_of(kWhitespace); + return s.substr(start, end - start + 1); +} + +// Helper function for split. +// Return the index of char d in string s. +// If not found, returns the length of the string. +static int find(const char* s, char delim) { + int i; + for (i = 0; s[i]; i++) { + if (s[i] == delim) { + break; + } + } + return i; +} + +// Split a string by delimiter +static vector split(const string& s, char delim) { + vector res; + const char* cs = s.c_str(); + for (int i = find(cs, delim); cs[i]; cs += i + 1, i = find(cs, delim)) { + res.emplace_back(cs, i); + } + res.emplace_back(cs); + return res; +} + +// Remove a trailing comment. +static inline string stripComment(const string& s) { + std::size_t pos = s.find("#"); + return s.substr(0, pos); +} + +string AbstractConfig::toLower(string& s) const { + string res = s; + for (int i = 0; i < res.size(); i++) { + if (res[i] >= 'A' && res[i] <= 'Z') { + res[i] += ('a' - 'A'); + } + } + return res; +} + +bool AbstractConfig::endsWith(const string& s, const string& suffix) const { + if (suffix.size() > s.size()) { + return false; + } + return s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0; +} + +vector AbstractConfig::splitAndTrim(const string& s, char delim) const { + auto res = split(s, delim); + for (string& x : res) { + x = trim(x); + } + return res; +} + +int64_t AbstractConfig::toIntRange(const string& val, int64_t min, int64_t max) + const { + char* invalid; + int64_t res = strtoll(val.c_str(), &invalid, 10); + if (val.empty() || *invalid) { + throw std::invalid_argument(fmt::format("Invalid integer: {}", val)); + } else if (res < min || res > max) { + throw std::invalid_argument(fmt::format( + "Invalid argument: {} - expected range [{}, {}]", res, min, max)); + } + return res; +} + +int32_t AbstractConfig::toInt32(const string& val) const { + return toIntRange(val, 0, ~0u / 2); +} + +int64_t AbstractConfig::toInt64(const string& val) const { + return toIntRange(val, 0, ~0ul / 2); +} + +bool AbstractConfig::toBool(string& val) const { + const std::array bool_vals{ + "n", "y", "no", "yes", "f", "t", "false", "true"}; + const string lower_val = toLower(val); + for (int i = 0; i < bool_vals.size(); i++) { + if (lower_val == bool_vals[i]) { + return i % 2; + } + } + throw std::invalid_argument(fmt::format("Invalid bool argument: {}", val)); + return false; +} + +bool AbstractConfig::parse(const string& conf) { + std::istringstream iss(conf); + string line; + + timestamp_ = system_clock::now(); + + // Read the 
string stream 1 line at a time to parse. + while (std::getline(iss, line)) { + line = stripComment(line); + if (isWhitespace(line)) { + continue; + } + vector key_val = splitAndTrim(line, '='); + if (key_val.size() != 2) { + LOG(ERROR) << "Invalid config line: " << line; + return false; + } else { + bool handled = false; + try { + handled = handleOption(key_val[0], key_val[1]); + if (!handled) { + for (auto& feature_cfg : featureConfigs_) { + if (feature_cfg.second->handleOption(key_val[0], key_val[1])) { + handled = true; + break; + } + } + } + } catch (const std::exception& e) { + LOG(ERROR) << "Failed to parse config line: " << line; + LOG(ERROR) << e.what(); + return false; + } + if (!handled) { + // This might be due to using a newer config option on an + // older binary where it is not supported. In this case, + // print a warning message - but it is expected to work! + LOG(WARNING) << "Unrecognized config line: " << line; + } + } + } + + validate(timestamp_); + + // Store original text, used to detect updates + source_ = conf; + timestamp_ = system_clock::now(); + return true; +} + +bool AbstractConfig::handleOption( + const std::string& /* unused */, + std::string& /* unused */) { + LOG(ERROR) << "handleOption unimplemented"; + return false; +} + +void AbstractConfig::printActivityProfilerConfig(std::ostream& s) const { + for (const auto& feature_cfg : featureConfigs_) { + feature_cfg.second->printActivityProfilerConfig(s); + } +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityBuffers.h b/plugins/tensorboard-plugins/libkineto/src/ActivityBuffers.h new file mode 100644 index 0000000000000000000000000000000000000000..157af879379a5f5fc5e274f22604987a97f17af4 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityBuffers.h @@ -0,0 +1,29 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + + +#include +#include + +#include "libkineto.h" +#include "CuptiActivityBuffer.h" + +namespace KINETO_NAMESPACE { + +struct ActivityBuffers { + std::list> cpu; + std::unique_ptr gpu; + + // Add a wrapper object to the underlying struct stored in the buffer + template + const ITraceActivity& addActivityWrapper(const T& act) { + wrappers_.push_back(std::make_unique(act)); + return *wrappers_.back().get(); + } + + private: + std::vector> wrappers_; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityLoggerFactory.h b/plugins/tensorboard-plugins/libkineto/src/ActivityLoggerFactory.h new file mode 100644 index 0000000000000000000000000000000000000000..0d1bf642cd68051e487004d33e19c5eb181e1c41 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityLoggerFactory.h @@ -0,0 +1,60 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
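+//
+// Usage sketch (ChromeTraceLogger is the file-based logger from
+// output_json.h; the URL below is illustrative):
+//
+//   ActivityLoggerFactory factory;
+//   factory.addProtocol("file", [](const std::string& url) {
+//     return std::unique_ptr<ActivityLogger>(new ChromeTraceLogger(url));
+//   });
+//   auto logger = factory.makeLogger("file:///tmp/trace.json");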
+
+#pragma once
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+namespace KINETO_NAMESPACE {
+
+class ActivityLogger;
+
+class ActivityLoggerFactory {
+
+ public:
+  using FactoryFunc =
+    std::function<std::unique_ptr<ActivityLogger>(const std::string& url)>;
+
+  // Add logger factory for a protocol prefix
+  void addProtocol(const std::string& protocol, FactoryFunc f) {
+    factories_[tolower(protocol)] = f;
+  }
+
+  // Create a logger, invoking the factory for the protocol specified in url
+  std::unique_ptr<ActivityLogger> makeLogger(const std::string& url) const {
+    std::string protocol = extractProtocol(url);
+    auto it = factories_.find(tolower(protocol));
+    if (it != factories_.end()) {
+      return it->second(stripProtocol(url));
+    }
+    throw std::invalid_argument(fmt::format(
+        "No logger registered for the {} protocol prefix",
+        protocol));
+    return nullptr;
+  }
+
+ private:
+  static std::string tolower(std::string s) {
+    std::transform(s.begin(), s.end(), s.begin(),
+        [](unsigned char c) { return std::tolower(c); }
+    );
+    return s;
+  }
+
+  static std::string extractProtocol(std::string url) {
+    return url.substr(0, url.find("://"));
+  }
+
+  static std::string stripProtocol(std::string url) {
+    size_t pos = url.find("://");
+    return pos == url.npos ? url : url.substr(pos + 3);
+  }
+
+  std::map<std::string, FactoryFunc> factories_;
+};
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerController.cpp b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerController.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..c85d41ed73ff059bcd7ee69c36a0bcc6c3d5c4ca
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerController.cpp
@@ -0,0 +1,246 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
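+//
+// Synchronous tracing flow implemented by this controller (sketch; the
+// proxy in ActivityProfilerProxy.cpp drives these calls):
+//
+//   controller.prepareTrace(config);      // configure, start warmup
+//   controller.startTrace();              // begin collection
+//   // ...run workload...
+//   auto trace = controller.stopTrace();  // process into an ActivityTrace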
+ +#include "ActivityProfilerController.h" + +#include +#include + +#include "ActivityLoggerFactory.h" +#include "ActivityTrace.h" +#include "CuptiActivityApi.h" +#ifdef HAS_ROCTRACER +#include "RoctracerActivityApi.h" +#endif +#include "ThreadUtil.h" +#include "output_json.h" +#include "output_membuf.h" + +#include "Logger.h" + +using namespace std::chrono; + +namespace KINETO_NAMESPACE { + +constexpr milliseconds kProfilerIntervalMsecs(1000); + +ActivityProfilerController::ActivityProfilerController( + ConfigLoader& configLoader, bool cpuOnly) + : configLoader_(configLoader) { +#ifdef HAS_ROCTRACER + profiler_ = std::make_unique( + RoctracerActivityApi::singleton(), cpuOnly); +#else + profiler_ = std::make_unique( + CuptiActivityApi::singleton(), cpuOnly); +#endif + configLoader_.addHandler(ConfigLoader::ConfigKind::ActivityProfiler, this); +} + +ActivityProfilerController::~ActivityProfilerController() { + configLoader_.removeHandler( + ConfigLoader::ConfigKind::ActivityProfiler, this); + if (profilerThread_) { + // signaling termination of the profiler loop + stopRunloop_ = true; + profilerThread_->join(); + delete profilerThread_; + profilerThread_ = nullptr; + } +} + +static ActivityLoggerFactory initLoggerFactory() { + ActivityLoggerFactory factory; + factory.addProtocol("file", [](const std::string& url) { + return std::unique_ptr(new ChromeTraceLogger(url)); + }); + return factory; +} + +static ActivityLoggerFactory& loggerFactory() { + static ActivityLoggerFactory factory = initLoggerFactory(); + return factory; +} + +void ActivityProfilerController::addLoggerFactory( + const std::string& protocol, ActivityLoggerFactory::FactoryFunc factory) { + loggerFactory().addProtocol(protocol, factory); +} + +static std::unique_ptr makeLogger(const Config& config) { + if (config.activitiesLogToMemory()) { + return std::make_unique(config); + } + return loggerFactory().makeLogger(config.activitiesLogUrl()); +} + +bool ActivityProfilerController::canAcceptConfig() { + return !profiler_->isActive(); +} + +void ActivityProfilerController::acceptConfig(const Config& config) { + VLOG(1) << "acceptConfig"; + if (config.activityProfilerEnabled()) { + scheduleTrace(config); + } +} + +void ActivityProfilerController::profilerLoop() { + setThreadName("Kineto Activity Profiler"); + VLOG(0) << "Entering activity profiler loop"; + + auto now = system_clock::now(); + auto next_wakeup_time = now + kProfilerIntervalMsecs; + + while (!stopRunloop_) { + now = system_clock::now(); + + while (now < next_wakeup_time) { + /* sleep override */ + std::this_thread::sleep_for(next_wakeup_time - now); + now = system_clock::now(); + } + + if (!profiler_->isActive()) { + std::lock_guard lock(asyncConfigLock_); + if (asyncRequestConfig_ + && !asyncRequestConfig_->hasProfileStartIteration()) { + // Note on now + kProfilerIntervalMsecs + // Profiler interval does not align perfectly upto startTime - warmup. Waiting until the next tick + // won't allow sufficient time for the profiler to warm up. 
So check if we are very close to the warmup time and trigger warmup + if (now + kProfilerIntervalMsecs + >= (asyncRequestConfig_->requestTimestamp() - asyncRequestConfig_->activitiesWarmupDuration())) { + LOG(INFO) << "Received on-demand activity trace request by " + << " profile timestamp = " + << asyncRequestConfig_-> + requestTimestamp().time_since_epoch().count(); + activateConfig(now); + } + } + } + + while (next_wakeup_time < now) { + next_wakeup_time += kProfilerIntervalMsecs; + } + + if (profiler_->isActive()) { + next_wakeup_time = profiler_->performRunLoopStep(now, next_wakeup_time); + VLOG(1) << "Profiler loop: " + << duration_cast(system_clock::now() - now).count() + << "ms"; + } + } + + VLOG(0) << "Exited activity profiling loop"; +} + +void ActivityProfilerController::step() { + int64_t currentIter = ++iterationCount_; + VLOG(0) << "Step called , iteration = " << currentIter; + + // optimization to not take the lock unless necessary + if (asyncRequestConfig_ && !profiler_->isActive()) { + std::lock_guard lock(asyncConfigLock_); + auto startIter = asyncRequestConfig_->startIterationIncludingWarmup(); + + if (asyncRequestConfig_->hasProfileStartIteration() + && currentIter >= startIter) { + LOG(INFO) << "Received on-demand activity trace request by profile" + << " start iteration = " + << asyncRequestConfig_->profileStartIteration() + << " current iteration = " << currentIter; + + if (currentIter > startIter) { + // adjust the start iteration if it is in the past + auto newProfileStart = currentIter + + asyncRequestConfig_->activitiesWarmupIterations(); + LOG(INFO) << "Start iteration updated to " << newProfileStart; + asyncRequestConfig_->setProfileStartIteration(newProfileStart); + } + activateConfig(system_clock::now()); + } + } + + if (profiler_->isActive()) { + auto now = system_clock::now(); + auto next_wakeup_time = now + kProfilerIntervalMsecs; + profiler_->performRunLoopStep(now, next_wakeup_time, currentIter); + } +} + +void ActivityProfilerController::activateConfig( + std::chrono::time_point now) { + logger_ = makeLogger(*asyncRequestConfig_); + profiler_->setLogger(logger_.get()); + profiler_->configure(*asyncRequestConfig_, now); + asyncRequestConfig_ = nullptr; +} + +void ActivityProfilerController::scheduleTrace(const Config& config) { + VLOG(1) << "scheduleTrace"; + if (profiler_->isActive()) { + LOG(ERROR) << "Ignored request - profiler busy"; + return; + } + int64_t currentIter = iterationCount_; + if (config.hasProfileStartIteration() && currentIter < 0) { + LOG(ERROR) << "Ignored profile iteration count based request as " + << "application is not updating iteration count"; + return; + } + std::lock_guard lock(asyncConfigLock_); + asyncRequestConfig_ = config.clone(); + + auto startIter = asyncRequestConfig_->startIterationIncludingWarmup(); + + if (asyncRequestConfig_->hasProfileStartIteration() + && (currentIter > startIter) + && asyncRequestConfig_->profileStartIterationRoundUp() > 0) { + auto newProfileStart + = currentIter + asyncRequestConfig_->activitiesWarmupIterations(); + // round up to nearest multiple + auto divisor = asyncRequestConfig_->profileStartIterationRoundUp(); + auto rem = newProfileStart % divisor; + newProfileStart += ((rem == 0) ? 
0 : divisor - rem); + LOG(INFO) << "Rounding up profiler start iteration to : " << newProfileStart; + asyncRequestConfig_->setProfileStartIteration(newProfileStart); + } + + // start a profilerLoop() thread to handle request + if (!profilerThread_) { + profilerThread_ = + new std::thread(&ActivityProfilerController::profilerLoop, this); + } +} + +void ActivityProfilerController::prepareTrace(const Config& config) { + // Requests from ActivityProfilerApi have higher priority than + // requests from other sources (signal, daemon). + // Cancel any ongoing request and refuse new ones. + auto now = system_clock::now(); + if (profiler_->isActive()) { + LOG(WARNING) << "Cancelling current trace request in order to start " + << "higher priority synchronous request"; + if (libkineto::api().client()) { + libkineto::api().client()->stop(); + } + profiler_->stopTrace(now); + profiler_->reset(); + } + + profiler_->configure(config, now); +} + +std::unique_ptr ActivityProfilerController::stopTrace() { + profiler_->stopTrace(std::chrono::system_clock::now()); + auto logger = std::make_unique(profiler_->config()); + profiler_->processTrace(*logger); + profiler_->reset(); + return std::make_unique(std::move(logger), loggerFactory()); +} + +void ActivityProfilerController::addMetadata( + const std::string& key, const std::string& value) { + profiler_->addMetadata(key, value); +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerController.h b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerController.h new file mode 100644 index 0000000000000000000000000000000000000000..415f107cbed6aab4777c65e9e51d65686002e762 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerController.h @@ -0,0 +1,84 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
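+//
+// Worked example of the start-iteration round-up in scheduleTrace(): with
+// currentIter = 2010, ACTIVITIES_WARMUP_ITERATIONS = 5 and
+// PROFILE_START_ITERATION_ROUNDUP = 1000, newProfileStart = 2015,
+// rem = 2015 % 1000 = 15, and the start iteration becomes
+// 2015 + (1000 - 15) = 3000, the next multiple of 1000.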
+ +#pragma once + +#include +#include +#include +#include +#include + +#include "ActivityLoggerFactory.h" +#include "CuptiActivityProfiler.h" +#include "ActivityProfilerInterface.h" +#include "ActivityTraceInterface.h" +#include "ConfigLoader.h" +#include "CuptiActivityApi.h" + +namespace KINETO_NAMESPACE { + +class Config; + +class ActivityProfilerController : public ConfigLoader::ConfigHandler { + public: + explicit ActivityProfilerController(ConfigLoader& configLoader, bool cpuOnly); + ActivityProfilerController(const ActivityProfilerController&) = delete; + ActivityProfilerController& operator=(const ActivityProfilerController&) = + delete; + + ~ActivityProfilerController(); + + static void addLoggerFactory( + const std::string& protocol, + ActivityLoggerFactory::FactoryFunc factory); + + bool canAcceptConfig() override; + void acceptConfig(const Config& config) override; + + void scheduleTrace(const Config& config); + + void prepareTrace(const Config& config); + + void startTrace() { + profiler_->startTrace(std::chrono::system_clock::now()); + } + + void step(); + + std::unique_ptr stopTrace(); + + bool isActive() { + return profiler_->isActive(); + } + + void transferCpuTrace( + std::unique_ptr cpuTrace) { + return profiler_->transferCpuTrace(std::move(cpuTrace)); + } + + void recordThreadInfo() { + profiler_->recordThreadInfo(); + } + + void addChildActivityProfiler( + std::unique_ptr profiler) { + profiler_->addChildActivityProfiler(std::move(profiler)); + } + + void addMetadata(const std::string& key, const std::string& value); + + private: + void profilerLoop(); + void activateConfig(std::chrono::time_point now); + + std::unique_ptr asyncRequestConfig_; + std::mutex asyncConfigLock_; + std::unique_ptr profiler_; + std::unique_ptr logger_; + std::thread* profilerThread_{nullptr}; + std::atomic_bool stopRunloop_{false}; + std::atomic iterationCount_{-1}; + ConfigLoader& configLoader_; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerProxy.cpp b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerProxy.cpp new file mode 100644 index 0000000000000000000000000000000000000000..b2d36b7b3abf9c3e0aed838a10e4054a5d292139 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerProxy.cpp @@ -0,0 +1,119 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
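+//
+// scheduleTrace(configStr) takes the same KEY=VALUE text as the config
+// file. A minimal on-demand request might look like this (sketch; the
+// keys are defined in Config.cpp):
+//
+//   ACTIVITIES_ENABLED=true
+//   ACTIVITIES_DURATION_MSECS=500
+//   ACTIVITY_TYPES=cuda_runtime,kernel,gpu_memcpy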
+ +#include "ActivityProfilerProxy.h" + +#include "ActivityProfilerController.h" +#include "Config.h" +#include "CuptiActivityApi.h" +#include "Logger.h" +#include + +namespace KINETO_NAMESPACE { + +ActivityProfilerProxy::ActivityProfilerProxy( + bool cpuOnly, ConfigLoader& configLoader) + : cpuOnly_(cpuOnly), configLoader_(configLoader) { +} + +ActivityProfilerProxy::~ActivityProfilerProxy() { + delete controller_; +}; + +void ActivityProfilerProxy::init() { + if (!controller_) { + controller_ = new ActivityProfilerController(configLoader_, cpuOnly_); + } +} + +void ActivityProfilerProxy::scheduleTrace(const std::string& configStr) { + Config config; + config.parse(configStr); + controller_->scheduleTrace(config); +} + +void ActivityProfilerProxy::scheduleTrace(const Config& config) { + controller_->scheduleTrace(config); +} + +void ActivityProfilerProxy::prepareTrace( + const std::set& activityTypes, + const std::string& configStr) { + Config config; + bool validate_required = true; + + // allow user provided config to override default options + if (!configStr.empty()) { + if (!config.parse(configStr)) { + LOG(WARNING) << "Failed to parse config : " << configStr; + } + // parse also runs validate + validate_required = false; + } + + config.setClientDefaults(); + config.setSelectedActivityTypes(activityTypes); + + if (validate_required) { + config.validate(std::chrono::system_clock::now()); + } + + controller_->prepareTrace(config); +} + +void ActivityProfilerProxy::startTrace() { + controller_->startTrace(); +} + +std::unique_ptr +ActivityProfilerProxy::stopTrace() { + return controller_->stopTrace(); +} + +void ActivityProfilerProxy::step() { + controller_->step(); +} + +bool ActivityProfilerProxy::isActive() { + return controller_->isActive(); +} + +void ActivityProfilerProxy::pushCorrelationId(uint64_t id) { + CuptiActivityApi::pushCorrelationID(id, + CuptiActivityApi::CorrelationFlowType::Default); +} + +void ActivityProfilerProxy::popCorrelationId() { + CuptiActivityApi::popCorrelationID( + CuptiActivityApi::CorrelationFlowType::Default); +} + +void ActivityProfilerProxy::pushUserCorrelationId(uint64_t id) { + CuptiActivityApi::pushCorrelationID(id, + CuptiActivityApi::CorrelationFlowType::User); +} + +void ActivityProfilerProxy::popUserCorrelationId() { + CuptiActivityApi::popCorrelationID( + CuptiActivityApi::CorrelationFlowType::User); +} + +void ActivityProfilerProxy::transferCpuTrace( + std::unique_ptr traceBuffer) { + controller_->transferCpuTrace(std::move(traceBuffer)); +} + +void ActivityProfilerProxy::addMetadata( + const std::string& key, const std::string& value) { + controller_->addMetadata(key, value); +} + +void ActivityProfilerProxy::recordThreadInfo() { + controller_->recordThreadInfo(); +} + +void ActivityProfilerProxy::addChildActivityProfiler( + std::unique_ptr profiler) { + controller_->addChildActivityProfiler(std::move(profiler)); +} + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerProxy.h b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerProxy.h new file mode 100644 index 0000000000000000000000000000000000000000..b5cf84b2f1ddb005060fea0927c99fc63d144d99 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityProfilerProxy.h @@ -0,0 +1,73 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
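+//
+// Correlation IDs let a client bracket CPU-side work so that GPU activity
+// launched inside the bracket can be linked back to it (sketch; opId is
+// an illustrative uint64_t):
+//
+//   auto& profiler = libkineto::api().activityProfiler();
+//   profiler.pushCorrelationId(opId);
+//   // ...launch kernels...
+//   profiler.popCorrelationId();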
+ +#pragma once + +#include "ActivityProfilerInterface.h" + +#include +#include +#include + +#include "ActivityType.h" +#include "ITraceActivity.h" + +namespace libkineto { + // previous declaration is struct so this one must be too. + struct CpuTraceBuffer; +} + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +class ActivityProfilerController; +class Config; +class ConfigLoader; + +class ActivityProfilerProxy : public ActivityProfilerInterface { + + public: + ActivityProfilerProxy(bool cpuOnly, ConfigLoader& configLoader); + ~ActivityProfilerProxy() override; + + void init() override; + bool isInitialized() override { + return controller_ != nullptr; + } + + bool isActive() override; + + void recordThreadInfo() override; + + void scheduleTrace(const std::string& configStr) override; + void scheduleTrace(const Config& config); + + void prepareTrace( + const std::set& activityTypes, + const std::string& configStr = "") override; + + void startTrace() override; + void step() override; + std::unique_ptr stopTrace() override; + + void pushCorrelationId(uint64_t id) override; + void popCorrelationId() override; + + void pushUserCorrelationId(uint64_t id) override; + void popUserCorrelationId() override; + + void transferCpuTrace( + std::unique_ptr traceBuffer) override; + + void addMetadata(const std::string& key, const std::string& value) override; + + virtual void addChildActivityProfiler( + std::unique_ptr profiler) override; + + private: + bool cpuOnly_{true}; + ConfigLoader& configLoader_; + ActivityProfilerController* controller_{nullptr}; +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityTrace.h b/plugins/tensorboard-plugins/libkineto/src/ActivityTrace.h new file mode 100644 index 0000000000000000000000000000000000000000..0be76af08e47c16ebee2ac1d1ad01c4425ff17a5 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityTrace.h @@ -0,0 +1,45 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include + +#include "ActivityLoggerFactory.h" +#include "ActivityTraceInterface.h" +#include "output_json.h" +#include "output_membuf.h" + +namespace libkineto { + +class ActivityTrace : public ActivityTraceInterface { + public: + ActivityTrace( + std::unique_ptr tmpLogger, + const ActivityLoggerFactory& factory) + : memLogger_(std::move(tmpLogger)), + loggerFactory_(factory) { + } + + const std::vector* activities() override { + return memLogger_->traceActivities(); + }; + + void save(const std::string& url) override { + std::string prefix; + // if no protocol is specified, default to file + if (url.find("://") == url.npos) { + prefix = "file://"; + } + memLogger_->log(*loggerFactory_.makeLogger(prefix + url)); + }; + + private: + // Activities are logged into a buffer + std::unique_ptr memLogger_; + + // Alternative logger used by save() if protocol prefix is specified + const ActivityLoggerFactory& loggerFactory_; +}; + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/src/ActivityType.cpp b/plugins/tensorboard-plugins/libkineto/src/ActivityType.cpp new file mode 100644 index 0000000000000000000000000000000000000000..18856b72370abdb6d9cf4309b32be4cae10805de --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ActivityType.cpp @@ -0,0 +1,58 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
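+//
+// Round-trip sketch using the name table below:
+//
+//   ActivityType t = toActivityType("gpu_memcpy");  // ActivityType::GPU_MEMCPY
+//   const char* name = toString(t);                 // "gpu_memcpy"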
+ +#include "ActivityType.h" + +#include + +namespace libkineto { + +struct ActivityTypeName { + const char* name; + ActivityType type; +}; + +static constexpr std::array map{{ + {"cpu_op", ActivityType::CPU_OP}, + {"user_annotation", ActivityType::USER_ANNOTATION}, + {"gpu_user_Annotation", ActivityType::GPU_USER_ANNOTATION}, + {"gpu_memcpy", ActivityType::GPU_MEMCPY}, + {"gpu_memset", ActivityType::GPU_MEMSET}, + {"kernel", ActivityType::CONCURRENT_KERNEL}, + {"external_correlation", ActivityType::EXTERNAL_CORRELATION}, + {"cuda_runtime", ActivityType::CUDA_RUNTIME}, + {"cuda_profiler_range", ActivityType::CUDA_PROFILER_RANGE}, + {"glow_runtime", ActivityType::GLOW_RUNTIME}, + {"cpu_instant_event", ActivityType::CPU_INSTANT_EVENT}, + {"python_function", ActivityType::PYTHON_FUNCTION}, + {"overhead", ActivityType::OVERHEAD}, + {"ENUM_COUNT", ActivityType::ENUM_COUNT} +}}; + +static constexpr bool matchingOrder(int idx = 0) { + return map[idx].type == ActivityType::ENUM_COUNT || + ((idx == (int) map[idx].type) && matchingOrder(idx + 1)); +} +static_assert(matchingOrder(), "ActivityTypeName map is out of order"); + +const char* toString(ActivityType t) { + return map[(int)t].name; +} + +ActivityType toActivityType(const std::string& str) { + for (int i = 0; i < activityTypeCount; i++) { + if (str == map[i].name) { + return map[i].type; + } + } + throw std::invalid_argument(fmt::format("Invalid activity type: {}", str)); +} + +const std::array activityTypes() { + std::array res; + for (int i = 0; i < activityTypeCount; i++) { + res[i] = map[i].type; + } + return res; +} + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/src/Config.cpp b/plugins/tensorboard-plugins/libkineto/src/Config.cpp new file mode 100644 index 0000000000000000000000000000000000000000..95538840f378e83b2b44161823042c620b34fe93 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/Config.cpp @@ -0,0 +1,473 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
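+//
+// Sample activity-profiler configuration accepted by Config::parse()
+// (sketch; every key is defined below, and '#' starts a comment):
+//
+//   ACTIVITIES_ENABLED=true
+//   ACTIVITIES_WARMUP_PERIOD_SECS=5
+//   ACTIVITIES_DURATION_SECS=10
+//   ACTIVITIES_LOG_FILE=/tmp/trace.json
+//   ACTIVITY_TYPES=kernel,gpu_memcpy,gpu_memset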
+ +#include "Config.h" + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "Logger.h" +#include "ThreadUtil.h" + +using namespace std::chrono; + +using std::string; +using std::vector; + +namespace KINETO_NAMESPACE { + +constexpr milliseconds kDefaultSamplePeriodMsecs(1000); +constexpr milliseconds kDefaultMultiplexPeriodMsecs(1000); +constexpr milliseconds kDefaultActivitiesProfileDurationMSecs(500); +constexpr int kDefaultActivitiesMaxGpuBufferSize(128 * 1024 * 1024); +constexpr seconds kDefaultActivitiesWarmupDurationSecs(5); +constexpr seconds kDefaultBufferUntilWarmup(10); +constexpr seconds kDefaultReportPeriodSecs(1); +constexpr int kDefaultSamplesPerReport(1); +constexpr int kDefaultMaxEventProfilersPerGpu(1); +constexpr int kDefaultEventProfilerHearbeatMonitorPeriod(0); +constexpr seconds kMaxRequestAge(10); + +// Event Profiler +constexpr char kEventsKey[] = "EVENTS"; +constexpr char kMetricsKey[] = "METRICS"; +constexpr char kSamplePeriodKey[] = "SAMPLE_PERIOD_MSECS"; +constexpr char kMultiplexPeriodKey[] = "MULTIPLEX_PERIOD_MSECS"; +constexpr char kReportPeriodKey[] = "REPORT_PERIOD_SECS"; +constexpr char kSamplesPerReportKey[] = "SAMPLES_PER_REPORT"; +constexpr char kEventsLogFileKey[] = "EVENTS_LOG_FILE"; +constexpr char kEventsEnabledDevicesKey[] = "EVENTS_ENABLED_DEVICES"; +constexpr char kOnDemandDurationKey[] = "EVENTS_DURATION_SECS"; +constexpr char kMaxEventProfilersPerGpuKey[] = "MAX_EVENT_PROFILERS_PER_GPU"; +constexpr char kHeartbeatMonitorPeriodKey[] = + "EVENTS_HEARTBEAT_MONITOR_PERIOD_SECS"; + +// Activity Profiler +constexpr char kActivitiesEnabledKey[] = "ACTIVITIES_ENABLED"; +constexpr char kActivityTypesKey[] = "ACTIVITY_TYPES"; +constexpr char kActivitiesLogFileKey[] = "ACTIVITIES_LOG_FILE"; +constexpr char kActivitiesDurationKey[] = "ACTIVITIES_DURATION_SECS"; +constexpr char kActivitiesDurationMsecsKey[] = "ACTIVITIES_DURATION_MSECS"; +constexpr char kActivitiesWarmupDurationSecsKey[] = "ACTIVITIES_WARMUP_PERIOD_SECS"; +constexpr char kActivitiesMaxGpuBufferSizeKey[] = + "ACTIVITIES_MAX_GPU_BUFFER_SIZE_MB"; + +// Client Interface +constexpr char kClientInterfaceEnableOpInputsCollection[] = "CLIENT_INTERFACE_ENABLE_OP_INPUTS_COLLECTION"; + +constexpr char kActivitiesWarmupIterationsKey[] = "ACTIVITIES_WARMUP_ITERATIONS"; +constexpr char kActivitiesIterationsKey[] = "ACTIVITIES_ITERATIONS"; +// Common + +// Client-side timestamp used for synchronized start across hosts for +// distributed workloads. +// Specified in milliseconds Unix time (milliseconds since epoch). +// To use, compute a future timestamp as follows: +// * C++: + duration_cast( +// system_clock::now().time_since_epoch()).count() +// * Python: + int(time.time() * 1000) +// * Bash: $(( + $(date +%s%3N))) +// If used for a tracing request, timestamp must be far enough in the future +// to accommodate ACTIVITIES_WARMUP_PERIOD_SECS as well as any delays in +// propagating the request to the profiler. +// If the request can not be honored, it is up to the profilers to report +// an error somehow - no checks are done at config parse time. +// Note PROFILE_START_ITERATION has higher precedence +constexpr char kProfileStartTimeKey[] = "PROFILE_START_TIME"; +// DEPRECATED - USE PROFILE_START_TIME instead +constexpr char kRequestTimestampKey[] = "REQUEST_TIMESTAMP"; + +// Alternatively if the application supports reporting iterations +// start the profile at specific iteration. 
If the iteration count +// is >= this value the profile is started immediately. +// A value >= 0 is valid for this config option to take effect. +// Note PROFILE_START_ITERATION will take precedence over PROFILE_START_TIME. +constexpr char kProfileStartIterationKey[] = "PROFILE_START_ITERATION"; + +// Users can also start the profile on an integer multiple of the config +// value PROFILE_START_ITERATION_ROUNDUP. This knob behaves similar to +// PROFILE_START_ITERATION but instead of saying : "start collection trace on +// iteration 500", one can configure it to "start collecting trace on the next +// 100th iteration". +// +// For example, +// PROFILE_START_ITERATION_ROUNDUP = 1000, and the current iteration is 2010 +// The profile will then be collected on the next multiple of 1000 ie. 3000 +// Note PROFILE_START_ITERATION_ROUNDUP will also take precedence over +// PROFILE_START_TIME. +constexpr char kProfileStartIterationRoundUpKey[] + = "PROFILE_START_ITERATION_ROUNDUP"; + +// Enable on-demand trigger via kill -USR2 +// When triggered in this way, /tmp/libkineto.conf will be used as config. +constexpr char kEnableSigUsr2Key[] = "ENABLE_SIGUSR2"; + +// Enable communication through IPC Fabric +// and disable thrift communication with dynolog daemon +constexpr char kEnableIpcFabricKey[] = "ENABLE_IPC_FABRIC"; + +// Verbose log level +// The actual glog is not used and --v and --vmodule has no effect. +// Instead set the verbose level and modules in the config file. +constexpr char kLogVerboseLevelKey[] = "VERBOSE_LOG_LEVEL"; +// By default, all modules will log verbose messages >= verboseLogLevel. +// But to reduce noise we can specify one or more modules of interest. +// A module is a C/C++ object file (source file name), +// Example argument: ActivityProfiler.cpp,output_json.cpp +constexpr char kLogVerboseModulesKey[] = "VERBOSE_LOG_MODULES"; + +// Max devices supported on any system +constexpr uint8_t kMaxDevices = 8; + +namespace { + +struct FactoryMap { + + void addFactory( + std::string name, + std::function factory) { + std::lock_guard lock(lock_); + factories_[name] = factory; + } + + void addFeatureConfigs(Config& cfg) { + std::lock_guard lock(lock_); + for (const auto& p : factories_) { + cfg.addFeature(p.first, p.second(cfg)); + } + } + +// Config factories are shared between objects and since +// config objects can be created by multiple threads, we need a lock. + std::mutex lock_; + std::map> factories_; +}; + +std::shared_ptr configFactories() { + // Ensure this is safe to call during shutdown, even as static + // destructors are invoked. Once factories destructor has been + // invoked, weak_ptr.lock() will return nullptr. + // But calls before that point will have a valid shared_ptr, + // delaying destruction of the underlying FactoryMap. 
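+  // Registration sketch (MyFeatureConfig is hypothetical): extensions
+  // attach per-feature configs to every new Config object via
+  // Config::addConfigFactory, defined below:
+  //
+  //   Config::addConfigFactory(
+  //       "my_feature", [](Config& cfg) { return new MyFeatureConfig(cfg); });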
+ static auto factories = std::make_shared(); + static std::weak_ptr weak_ptr = factories; + return weak_ptr.lock(); +} + +} // namespace + +void Config::addConfigFactory( + std::string name, + std::function factory) { + auto factories = configFactories(); + if (factories) { + factories->addFactory(name, factory); + } +} + +static string defaultTraceFileName() { + return fmt::format("/tmp/libkineto_activities_{}.json", processId()); +} + +Config::Config() + : verboseLogLevel_(-1), + samplePeriod_(kDefaultSamplePeriodMsecs), + reportPeriod_(duration_cast(kDefaultReportPeriodSecs)), + samplesPerReport_(kDefaultSamplesPerReport), + eventProfilerOnDemandDuration_(seconds(0)), + eventProfilerMaxInstancesPerGpu_(kDefaultMaxEventProfilersPerGpu), + eventProfilerHeartbeatMonitorPeriod_( + kDefaultEventProfilerHearbeatMonitorPeriod), + multiplexPeriod_(kDefaultMultiplexPeriodMsecs), + activityProfilerEnabled_(true), + activitiesLogFile_(defaultTraceFileName()), + activitiesLogUrl_(fmt::format("file://{}", activitiesLogFile_)), + activitiesMaxGpuBufferSize_(kDefaultActivitiesMaxGpuBufferSize), + activitiesWarmupDuration_(kDefaultActivitiesWarmupDurationSecs), + activitiesWarmupIterations_(0), + activitiesDuration_(kDefaultActivitiesProfileDurationMSecs), + activitiesRunIterations_(0), + activitiesOnDemandTimestamp_(milliseconds(0)), + profileStartTime_(milliseconds(0)), + profileStartIteration_(-1), + profileStartIterationRoundUp_(-1), + requestTimestamp_(milliseconds(0)), + enableSigUsr2_(false), + enableIpcFabric_(false) { + auto factories = configFactories(); + if (factories) { + factories->addFeatureConfigs(*this); + } +} + +uint8_t Config::createDeviceMask(const string& val) { + uint8_t res = 0; + for (const auto& d : splitAndTrim(val, ',')) { + res |= 1 << toIntRange(d, 0, kMaxDevices - 1); + } + return res; +} + +const seconds Config::maxRequestAge() const { + return kMaxRequestAge; +} + +static std::string getTimeStr(time_point t) { + std::time_t t_c = system_clock::to_time_t(t); + return fmt::format("{:%H:%M:%S}", fmt::localtime(t_c)); +} + +static time_point handleRequestTimestamp(int64_t ms) { + auto t = time_point(milliseconds(ms)); + auto now = system_clock::now(); + if (t > now) { + throw std::invalid_argument(fmt::format( + "Invalid {}: {} - time is in future", + kRequestTimestampKey, + getTimeStr(t))); + } else if ((now - t) > kMaxRequestAge) { + throw std::invalid_argument(fmt::format( + "Invalid {}: {} - time is more than {}s in the past", + kRequestTimestampKey, + getTimeStr(t), + kMaxRequestAge.count())); + } + return t; +} + +void Config::setActivityTypes( + const std::vector& selected_activities) { + selectedActivityTypes_.clear(); + if (selected_activities.size() > 0) { + for (const auto& activity : selected_activities) { + if (activity == "") { + continue; + } + selectedActivityTypes_.insert(toActivityType(activity)); + } + } +} + +bool Config::handleOption(const std::string& name, std::string& val) { + // Event Profiler + if (!name.compare(kEventsKey)) { + vector event_names = splitAndTrim(val, ','); + eventNames_.insert(event_names.begin(), event_names.end()); + } else if (!name.compare(kMetricsKey)) { + vector metric_names = splitAndTrim(val, ','); + metricNames_.insert(metric_names.begin(), metric_names.end()); + } else if (!name.compare(kSamplePeriodKey)) { + samplePeriod_ = milliseconds(toInt32(val)); + } else if (!name.compare(kMultiplexPeriodKey)) { + multiplexPeriod_ = milliseconds(toInt32(val)); + } else if (!name.compare(kReportPeriodKey)) { + 
setReportPeriod(seconds(toInt32(val))); + } else if (!name.compare(kSamplesPerReportKey)) { + samplesPerReport_ = toInt32(val); + } else if (!name.compare(kEventsLogFileKey)) { + eventLogFile_ = val; + } else if (!name.compare(kEventsEnabledDevicesKey)) { + eventProfilerDeviceMask_ = createDeviceMask(val); + } else if (!name.compare(kOnDemandDurationKey)) { + eventProfilerOnDemandDuration_ = seconds(toInt32(val)); + eventProfilerOnDemandTimestamp_ = timestamp(); + } else if (!name.compare(kMaxEventProfilersPerGpuKey)) { + eventProfilerMaxInstancesPerGpu_ = toInt32(val); + } else if (!name.compare(kHeartbeatMonitorPeriodKey)) { + eventProfilerHeartbeatMonitorPeriod_ = seconds(toInt32(val)); + } + + // Activity Profiler + else if (!name.compare(kActivitiesDurationKey)) { + activitiesDuration_ = + duration_cast(seconds(toInt32(val))); + activitiesOnDemandTimestamp_ = timestamp(); + } else if (!name.compare(kActivityTypesKey)) { + vector activity_types = splitAndTrim(toLower(val), ','); + setActivityTypes(activity_types); + } else if (!name.compare(kActivitiesDurationMsecsKey)) { + activitiesDuration_ = milliseconds(toInt32(val)); + activitiesOnDemandTimestamp_ = timestamp(); + } else if (!name.compare(kActivitiesIterationsKey)) { + activitiesRunIterations_ = toInt32(val); + activitiesOnDemandTimestamp_ = timestamp(); + } else if (!name.compare(kLogVerboseLevelKey)) { + verboseLogLevel_ = toInt32(val); + } else if (!name.compare(kLogVerboseModulesKey)) { + verboseLogModules_ = splitAndTrim(val, ','); + } else if (!name.compare(kActivitiesEnabledKey)) { + activityProfilerEnabled_ = toBool(val); + } else if (!name.compare(kActivitiesLogFileKey)) { + activitiesLogFile_ = val; + activitiesLogUrl_ = fmt::format("file://{}", val); + activitiesOnDemandTimestamp_ = timestamp(); + } else if (!name.compare(kActivitiesMaxGpuBufferSizeKey)) { + activitiesMaxGpuBufferSize_ = toInt32(val) * 1024 * 1024; + } else if (!name.compare(kActivitiesWarmupDurationSecsKey)) { + activitiesWarmupDuration_ = seconds(toInt32(val)); + } else if (!name.compare(kActivitiesWarmupIterationsKey)) { + activitiesWarmupIterations_ = toInt32(val); + } + + // Client Interface + else if (!name.compare(kClientInterfaceEnableOpInputsCollection)) { + enableOpInputsCollection_ = toBool(val); + } + + // Common + else if (!name.compare(kRequestTimestampKey)) { + VLOG(0) << kRequestTimestampKey + << " has been deprecated - please use " + << kProfileStartTimeKey; + requestTimestamp_ = handleRequestTimestamp(toInt64(val)); + } else if (!name.compare(kProfileStartTimeKey)) { + profileStartTime_ = + time_point(milliseconds(toInt64(val))); + } else if (!name.compare(kProfileStartIterationKey)) { + profileStartIteration_ = toInt32(val); + } else if (!name.compare(kProfileStartIterationRoundUpKey)) { + profileStartIterationRoundUp_ = toInt32(val); + } else if (!name.compare(kEnableSigUsr2Key)) { + enableSigUsr2_ = toBool(val); + } else if (!name.compare(kEnableIpcFabricKey)) { + enableIpcFabric_ = toBool(val); + } else { + return false; + } + return true; +} + +std::chrono::milliseconds Config::activitiesDurationDefault() const { + return kDefaultActivitiesProfileDurationMSecs; +}; + +void Config::updateActivityProfilerRequestReceivedTime() { + activitiesOnDemandTimestamp_ = system_clock::now(); +} + +void Config::setClientDefaults() { + AbstractConfig::setClientDefaults(); + activitiesLogToMemory_ = true; +} + +void Config::validate( + const time_point& fallbackProfileStartTime) { + if (samplePeriod_.count() == 0) { + LOG(WARNING) << "Sample 
period must be greater than 0, setting to 1ms"; + samplePeriod_ = milliseconds(1); + } + + if (multiplexPeriod_ < samplePeriod_) { + LOG(WARNING) << "Multiplex period can not be smaller " + << "than sample period"; + LOG(WARNING) << "Setting multiplex period to " << samplePeriod_.count() + << "ms"; + multiplexPeriod_ = samplePeriod_; + } + + if ((multiplexPeriod_ % samplePeriod_).count() != 0) { + LOG(WARNING) << "Multiplex period must be a " + << "multiple of sample period"; + multiplexPeriod_ = alignUp(multiplexPeriod_, samplePeriod_); + LOG(WARNING) << "Setting multiplex period to " << multiplexPeriod_.count() + << "ms"; + } + + if ((reportPeriod_ % multiplexPeriod_).count() != 0 || + reportPeriod_.count() == 0) { + LOG(WARNING) << "Report period must be a " + << "multiple of multiplex period"; + reportPeriod_ = alignUp(reportPeriod_, multiplexPeriod_); + LOG(WARNING) << "Setting report period to " << reportPeriod_.count() + << "ms"; + } + + if (samplesPerReport_ < 1) { + LOG(WARNING) << "Samples per report must be in the range " + << "[1, report period / sample period]"; + LOG(WARNING) << "Setting samples per report to 1"; + samplesPerReport_ = 1; + } + + int max_samples_per_report = reportPeriod_ / samplePeriod_; + if (samplesPerReport_ > max_samples_per_report) { + LOG(WARNING) << "Samples per report must be in the range " + << "[1, report period / sample period] ([1, " + << reportPeriod_.count() << "ms / " << samplePeriod_.count() + << "ms = " << max_samples_per_report << "])"; + LOG(WARNING) << "Setting samples per report to " << max_samples_per_report; + samplesPerReport_ = max_samples_per_report; + } + + if (!hasProfileStartTime()) { + VLOG(0) + << "No explicit timestamp has been set. " + << "Defaulting it to now + activitiesWarmupDuration with buffer."; + profileStartTime_ = fallbackProfileStartTime + + activitiesWarmupDuration() + kDefaultBufferUntilWarmup; + } + + if (profileStartIterationRoundUp_ == 0) { + // setting to 0 will mess up modulo arithmetic, set it to -1 so it has no effect + LOG(WARNING) << "Profiler start iteration round up should be >= 1."; + profileStartIterationRoundUp_ = -1; + } + + if (profileStartIterationRoundUp_ > 0 && !hasProfileStartIteration()) { + VLOG(0) << "Setting profiler start iteration to 0 so this config is " + << "triggered via iteration count."; + profileStartIteration_ = 0; + } + + if (selectedActivityTypes_.size() == 0) { + selectDefaultActivityTypes(); + } +} + +void Config::setReportPeriod(milliseconds msecs) { + reportPeriod_ = msecs; +} + +void Config::printActivityProfilerConfig(std::ostream& s) const { + s << "Log file: " << activitiesLogFile() << std::endl; + if (hasProfileStartIteration()) { + s << "Trace start Iteration: " << profileStartIteration() << std::endl; + s << "Trace warmup Iterations: " << activitiesWarmupIterations() << std::endl; + s << "Trace profile Iterations: " << activitiesRunIterations() << std::endl; + if (profileStartIterationRoundUp() > 0) { + s << "Trace start iteration roundup : " << profileStartIterationRoundUp() + << std::endl; + } + } else if (hasProfileStartTime()) { + std::time_t t_c = system_clock::to_time_t(requestTimestamp()); + LOG(INFO) << "Trace start time: " + << fmt::format("{:%Y-%m-%d %H:%M:%S}", fmt::localtime(t_c)); + s << "Trace duration: " << activitiesDuration().count() << "ms" + << std::endl; + s << "Warmup duration: " << activitiesWarmupDuration().count() << "s" + << std::endl; + } + + s << "Max GPU buffer size: " << activitiesMaxGpuBufferSize() / 1024 / 1024 + << "MB" << std::endl; + + 
std::vector activities; + for (const auto& activity : selectedActivityTypes_) { + activities.push_back(toString(activity)); + } + s << "Enabled activities: " + << fmt::format("{}", fmt::join(activities, ",")) << std::endl; + + AbstractConfig::printActivityProfilerConfig(s); +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ConfigLoader.cpp b/plugins/tensorboard-plugins/libkineto/src/ConfigLoader.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4080b678d371e98757897d4d7726c159887377e1 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ConfigLoader.cpp @@ -0,0 +1,300 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "ConfigLoader.h" + +#ifdef __linux__ +#include +#endif + +#include +#include +#include +#include +#include + +#include "DaemonConfigLoader.h" + +#include "Logger.h" + +using namespace std::chrono; +using std::string; + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +constexpr char kConfigFileEnvVar[] = "KINETO_CONFIG"; +#ifdef __linux__ +constexpr char kConfigFile[] = "/etc/libkineto.conf"; +constexpr char kOnDemandConfigFile[] = "/tmp/libkineto.conf"; +#else +constexpr char kConfigFile[] = "libkineto.conf"; +constexpr char kOnDemandConfigFile[] = "libkineto.conf"; +#endif + +constexpr std::chrono::seconds kConfigUpdateIntervalSecs(300); +constexpr std::chrono::seconds kOnDemandConfigUpdateIntervalSecs(5); + +#ifdef __linux__ +static struct sigaction originalUsr2Handler = {}; +#endif + +// Use SIGUSR2 to initiate profiling. +// Look for an on-demand config file. +// If none is found, default to base config. +// Try to not affect existing handlers +static bool hasOriginalSignalHandler() { +#ifdef __linux__ + return originalUsr2Handler.sa_handler != nullptr || + originalUsr2Handler.sa_sigaction != nullptr; +#else + return false; +#endif +} + +static void handle_signal(int signal) { +#ifdef __linux__ + if (signal == SIGUSR2) { + ConfigLoader::instance().handleOnDemandSignal(); + if (hasOriginalSignalHandler()) { + // Invoke original handler and reinstate ours + struct sigaction act; + sigaction(SIGUSR2, &originalUsr2Handler, &act); + raise(SIGUSR2); + sigaction(SIGUSR2, &act, &originalUsr2Handler); + } + } +#endif +} + +static void setupSignalHandler(bool enableSigUsr2) { +#ifdef __linux__ + if (enableSigUsr2) { + struct sigaction act = {}; + act.sa_handler = &handle_signal; + act.sa_flags = SA_NODEFER; + if (sigaction(SIGUSR2, &act, &originalUsr2Handler) < 0) { + PLOG(ERROR) << "Failed to register SIGUSR2 handler"; + } + if (originalUsr2Handler.sa_handler == &handle_signal) { + originalUsr2Handler = {}; + } + } else if (hasOriginalSignalHandler()) { + sigaction(SIGUSR2, &originalUsr2Handler, nullptr); + originalUsr2Handler = {}; + } +#endif +} + +// return an empty string if reading gets any errors. Otherwise a config string. +static std::string readConfigFromConfigFile(const char* filename) { + // Read whole file into a string. 
+ std::ifstream file(filename); + std::string conf; + try { + conf.assign( + std::istreambuf_iterator(file), std::istreambuf_iterator()); + } catch (std::exception& e) { + VLOG(0) << "Error reading " << filename << ": " + << e.what(); + conf = ""; + } + return conf; +} + +static std::function()>& +daemonConfigLoaderFactory() { + static std::function()> factory = nullptr; + return factory; +} + +void ConfigLoader::setDaemonConfigLoaderFactory( + std::function()> factory) { + daemonConfigLoaderFactory() = factory; +} + +ConfigLoader& ConfigLoader::instance() { + static ConfigLoader config_loader; + return config_loader; +} + +// return an empty string if polling gets any errors. Otherwise a config string. +std::string ConfigLoader::readOnDemandConfigFromDaemon( + time_point now) { + if (!daemonConfigLoader_) { + return ""; + } + bool events = canHandlerAcceptConfig(ConfigKind::EventProfiler); + bool activities = canHandlerAcceptConfig(ConfigKind::ActivityProfiler); + return daemonConfigLoader_->readOnDemandConfig(events, activities); +} + +int ConfigLoader::contextCountForGpu(uint32_t device) { + if (!daemonConfigLoader_) { + // FIXME: Throw error? + return 0; + } + return daemonConfigLoader_->gpuContextCount(device); +} + +ConfigLoader::ConfigLoader() + : configUpdateIntervalSecs_(kConfigUpdateIntervalSecs), + onDemandConfigUpdateIntervalSecs_(kOnDemandConfigUpdateIntervalSecs), + stopFlag_(false), + onDemandSignal_(false) { +} + +void ConfigLoader::startThread() { + if (!updateThread_) { + // Create default base config here - at this point static initializers + // of extensions should have run and registered all config feature factories + std::lock_guard lock(configLock_); + if (!config_) { + config_ = std::make_unique(); + } + updateThread_ = + std::make_unique(&ConfigLoader::updateConfigThread, this); + } +} + +ConfigLoader::~ConfigLoader() { + if (updateThread_) { + stopFlag_ = true; + { + std::lock_guard lock(updateThreadMutex_); + updateThreadCondVar_.notify_one(); + } + updateThread_->join(); + } +#if !USE_GOOGLE_LOG + Logger::clearLoggerObservers(); +#endif // !USE_GOOGLE_LOG +} + +void ConfigLoader::handleOnDemandSignal() { + onDemandSignal_ = true; + { + std::lock_guard lock(updateThreadMutex_); + updateThreadCondVar_.notify_one(); + } +} + +const char* ConfigLoader::configFileName() { + if (!configFileName_) { + configFileName_ = getenv(kConfigFileEnvVar); + if (configFileName_ == nullptr) { + configFileName_ = kConfigFile; + } + } + return configFileName_; +} + +DaemonConfigLoader* ConfigLoader::daemonConfigLoader() { + if (!daemonConfigLoader_ && daemonConfigLoaderFactory()) { + daemonConfigLoader_ = daemonConfigLoaderFactory()(); + daemonConfigLoader_->setCommunicationFabric(config_->ipcFabricEnabled()); + } + return daemonConfigLoader_.get(); +} + +void ConfigLoader::updateBaseConfig() { + // First try reading local config file + // If that fails, read from daemon + // TODO: Invert these once daemon path fully rolled out + std::string config_str = readConfigFromConfigFile(configFileName()); + if (config_str.empty() && daemonConfigLoader()) { + // If local config file was not successfully loaded (e.g. 
not found) + // then try the daemon + config_str = daemonConfigLoader()->readBaseConfig(); + } + if (config_str != config_->source()) { + std::lock_guard lock(configLock_); + config_ = std::make_unique(); + config_->parse(config_str); + if (daemonConfigLoader()) { + daemonConfigLoader()->setCommunicationFabric(config_->ipcFabricEnabled()); + } + setupSignalHandler(config_->sigUsr2Enabled()); + SET_LOG_VERBOSITY_LEVEL( + config_->verboseLogLevel(), + config_->verboseLogModules()); + VLOG(0) << "Detected base config change"; + } +} + +void ConfigLoader::configureFromSignal( + time_point now, + Config& config) { + LOG(INFO) << "Received on-demand profiling signal, " + << "reading config from " << kOnDemandConfigFile; + // Reset start time to 0 in order to compute new default start time + const std::string config_str = "PROFILE_START_TIME=0\n" + + readConfigFromConfigFile(kOnDemandConfigFile); + config.parse(config_str); + config.setSignalDefaults(); + notifyHandlers(config); +} + +void ConfigLoader::configureFromDaemon( + time_point now, + Config& config) { + const std::string config_str = readOnDemandConfigFromDaemon(now); + if (config_str.empty()) { + return; + } + + LOG(INFO) << "Received config from dyno:\n" << config_str; + config.parse(config_str); + notifyHandlers(config); +} + +void ConfigLoader::updateConfigThread() { + auto now = system_clock::now(); + auto next_config_load_time = now; + auto next_on_demand_load_time = now + onDemandConfigUpdateIntervalSecs_; + seconds interval = configUpdateIntervalSecs_; + if (interval > onDemandConfigUpdateIntervalSecs_) { + interval = onDemandConfigUpdateIntervalSecs_; + } + auto onDemandConfig = std::make_unique(); + + // This can potentially sleep for long periods of time, so allow + // the desctructor to wake it to avoid a 5-minute long destruct period. + for (;;) { + { + std::unique_lock lock(updateThreadMutex_); + updateThreadCondVar_.wait_for(lock, interval); + } + if (stopFlag_) { + break; + } + now = system_clock::now(); + if (now > next_config_load_time) { + updateBaseConfig(); + next_config_load_time = now + configUpdateIntervalSecs_; + } + if (onDemandSignal_.exchange(false)) { + onDemandConfig = config_->clone(); + configureFromSignal(now, *onDemandConfig); + } else if (now > next_on_demand_load_time) { + onDemandConfig = std::make_unique(); + configureFromDaemon(now, *onDemandConfig); + next_on_demand_load_time = now + onDemandConfigUpdateIntervalSecs_; + } + if (onDemandConfig->verboseLogLevel() >= 0) { + LOG(INFO) << "Setting verbose level to " + << onDemandConfig->verboseLogLevel() + << " from on-demand config"; + SET_LOG_VERBOSITY_LEVEL( + onDemandConfig->verboseLogLevel(), + onDemandConfig->verboseLogModules()); + } + } +} + +bool ConfigLoader::hasNewConfig(const Config& oldConfig) { + std::lock_guard lock(configLock_); + return config_->timestamp() > oldConfig.timestamp(); +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ConfigLoader.h b/plugins/tensorboard-plugins/libkineto/src/ConfigLoader.h new file mode 100644 index 0000000000000000000000000000000000000000..4ce3468e48db116b2a40d992f000a3af1338e70a --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ConfigLoader.h @@ -0,0 +1,147 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
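+//
+// On-demand profiling via signal (see handleOnDemandSignal() and the
+// handler in ConfigLoader.cpp): with ENABLE_SIGUSR2=true in the base
+// config, write a request config to /tmp/libkineto.conf and then signal
+// the process, e.g. from a shell:
+//
+//   kill -USR2 <pid>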
+ +#pragma once + +#include +#include +#include +#include +#include +#include +#include + +#include "Config.h" + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "ILoggerObserver.h" + +namespace libkineto { + class LibkinetoApi; +} + +namespace KINETO_NAMESPACE { + +using namespace libkineto; +class DaemonConfigLoader; + +class ConfigLoader { + public: + + static ConfigLoader& instance(); + + enum ConfigKind { + ActivityProfiler = 0, + EventProfiler, + NumConfigKinds + }; + + struct ConfigHandler { + virtual ~ConfigHandler() {} + virtual bool canAcceptConfig() = 0; + virtual void acceptConfig(const Config& cfg) = 0; + }; + + void addHandler(ConfigKind kind, ConfigHandler* handler) { + std::lock_guard lock(updateThreadMutex_); + handlers_[kind].push_back(handler); + startThread(); + } + + void removeHandler(ConfigKind kind, ConfigHandler* handler) { + std::lock_guard lock(updateThreadMutex_); + auto it = std::find( + handlers_[kind].begin(), handlers_[kind].end(), handler); + if (it != handlers_[kind].end()) { + handlers_[kind].erase(it); + } + } + + void notifyHandlers(const Config& cfg) { + std::lock_guard lock(updateThreadMutex_); + for (auto& key_val : handlers_) { + for (ConfigHandler* handler : key_val.second) { + handler->acceptConfig(cfg); + } + } + } + + bool canHandlerAcceptConfig(ConfigKind kind) { + std::lock_guard lock(updateThreadMutex_); + for (ConfigHandler* handler : handlers_[kind]) { + if (!handler->canAcceptConfig()) { + return false; + } + } + return true; + } + + void initBaseConfig() { + bool init = false; + { + std::lock_guard lock(configLock_); + init = !config_ || config_->source().empty(); + } + if (init) { + updateBaseConfig(); + } + } + + inline std::unique_ptr getConfigCopy() { + std::lock_guard lock(configLock_); + return config_->clone(); + } + + bool hasNewConfig(const Config& oldConfig); + int contextCountForGpu(uint32_t gpu); + + void handleOnDemandSignal(); + + static void setDaemonConfigLoaderFactory( + std::function()> factory); + + private: + ConfigLoader(); + ~ConfigLoader(); + + const char* configFileName(); + DaemonConfigLoader* daemonConfigLoader(); + + void startThread(); + void updateConfigThread(); + void updateBaseConfig(); + + // Create configuration when receiving SIGUSR2 + void configureFromSignal( + std::chrono::time_point now, + Config& config); + + // Create configuration when receiving request from a daemon + void configureFromDaemon( + std::chrono::time_point now, + Config& config); + + std::string readOnDemandConfigFromDaemon( + std::chrono::time_point now); + + std::mutex configLock_; + std::atomic configFileName_{nullptr}; + std::unique_ptr config_; + std::unique_ptr daemonConfigLoader_; + std::map> handlers_; + + std::chrono::seconds configUpdateIntervalSecs_; + std::chrono::seconds onDemandConfigUpdateIntervalSecs_; + std::unique_ptr updateThread_; + std::condition_variable updateThreadCondVar_; + std::mutex updateThreadMutex_; + std::atomic_bool stopFlag_{false}; + std::atomic_bool onDemandSignal_{false}; + +#if !USE_GOOGLE_LOG + std::unique_ptr> loggerObservers_; + std::mutex loggerObserversMutex_; +#endif // !USE_GOOGLE_LOG +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CudaDeviceProperties.cpp b/plugins/tensorboard-plugins/libkineto/src/CudaDeviceProperties.cpp new file mode 100644 index 0000000000000000000000000000000000000000..1e909d5f9cfda13b95cc4abab547d964fe47b48a --- /dev/null +++ 
b/plugins/tensorboard-plugins/libkineto/src/CudaDeviceProperties.cpp @@ -0,0 +1,130 @@ +/* + * Copyright (c) Kineto Contributors + * All rights reserved. + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include "CudaDeviceProperties.h" + +#include +#include + +#include +#include + +#include "Logger.h" + +namespace KINETO_NAMESPACE { + +static const std::vector createDeviceProps() { + std::vector props; + int device_count; + cudaError_t error_id = cudaGetDeviceCount(&device_count); + // Return empty vector if error. + if (error_id != cudaSuccess) { + LOG(ERROR) << "cudaGetDeviceCount failed with code " << error_id; + return {}; + } + VLOG(0) << "Device count is " << device_count; + for (size_t i = 0; i < device_count; ++i) { + cudaDeviceProp prop; + error_id = cudaGetDeviceProperties(&prop, i); + // Return empty vector if any device property fail to get. + if (error_id != cudaSuccess) { + LOG(ERROR) << "cudaGetDeviceProperties failed with " << error_id; + return {}; + } + props.push_back(prop); + LOGGER_OBSERVER_ADD_DEVICE(i); + } + return props; +} + +static const std::vector& deviceProps() { + static const std::vector props = createDeviceProps(); + return props; +} + +static const std::string createDevicePropertiesJson( + size_t id, const cudaDeviceProp& props) { + return fmt::format(R"JSON( + {{ + "id": {}, "name": "{}", "totalGlobalMem": {}, + "computeMajor": {}, "computeMinor": {}, + "maxThreadsPerBlock": {}, "maxThreadsPerMultiprocessor": {}, + "regsPerBlock": {}, "regsPerMultiprocessor": {}, "warpSize": {}, + "sharedMemPerBlock": {}, "sharedMemPerMultiprocessor": {}, + "numSms": {}, "sharedMemPerBlockOptin": {} + }})JSON", + id, props.name, props.totalGlobalMem, + props.major, props.minor, + props.maxThreadsPerBlock, props.maxThreadsPerMultiProcessor, + props.regsPerBlock, props.regsPerMultiprocessor, props.warpSize, + props.sharedMemPerBlock, props.sharedMemPerMultiprocessor, + props.multiProcessorCount, props.sharedMemPerBlockOptin); +} + +static const std::string createDevicePropertiesJson() { + std::vector jsonProps; + const auto& props = deviceProps(); + for (size_t i = 0; i < props.size(); i++) { + jsonProps.push_back(createDevicePropertiesJson(i, props[i])); + } + return fmt::format("{}", fmt::join(jsonProps, ",")); +} + +const std::string& devicePropertiesJson() { + static std::string devicePropsJson = createDevicePropertiesJson(); + return devicePropsJson; +} + +int smCount(uint32_t deviceId) { + const std::vector &props = deviceProps(); + return deviceId >= props.size() ? 
0 : + props[deviceId].multiProcessorCount; +} + +float kernelOccupancy( + uint32_t deviceId, + uint16_t registersPerThread, + int32_t staticSharedMemory, + int32_t dynamicSharedMemory, + int32_t blockX, + int32_t blockY, + int32_t blockZ, + float blocksPerSm) { + // Calculate occupancy + float occupancy = -1.0; + const std::vector &props = deviceProps(); + if (deviceId < props.size()) { + cudaOccFuncAttributes occFuncAttr; + occFuncAttr.maxThreadsPerBlock = INT_MAX; + occFuncAttr.numRegs = registersPerThread; + occFuncAttr.sharedSizeBytes = staticSharedMemory; + occFuncAttr.partitionedGCConfig = PARTITIONED_GC_OFF; + occFuncAttr.shmemLimitConfig = FUNC_SHMEM_LIMIT_DEFAULT; + occFuncAttr.maxDynamicSharedSizeBytes = 0; + const cudaOccDeviceState occDeviceState = {}; + int blockSize = blockX * blockY * blockZ; + size_t dynamicSmemSize = dynamicSharedMemory; + cudaOccResult occ_result; + cudaOccDeviceProp prop(props[deviceId]); + cudaOccError status = cudaOccMaxActiveBlocksPerMultiprocessor( + &occ_result, &prop, &occFuncAttr, &occDeviceState, + blockSize, dynamicSmemSize); + if (status == CUDA_OCC_SUCCESS) { + if (occ_result.activeBlocksPerMultiprocessor < blocksPerSm) { + blocksPerSm = occ_result.activeBlocksPerMultiprocessor; + } + occupancy = blocksPerSm * blockSize / + (float) props[deviceId].maxThreadsPerMultiProcessor; + } else { + LOG_EVERY_N(ERROR, 1000) << "Failed to calculate occupancy, status = " + << status; + } + } + return occupancy; +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CudaDeviceProperties.h b/plugins/tensorboard-plugins/libkineto/src/CudaDeviceProperties.h new file mode 100644 index 0000000000000000000000000000000000000000..b731fde0c2aab4c9bd3e97f475d204dad02986e7 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CudaDeviceProperties.h @@ -0,0 +1,31 @@ +/* + * Copyright (c) Kineto Contributors + * All rights reserved. + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. + */ + +#pragma once + +#include +#include + +namespace KINETO_NAMESPACE { + +int smCount(uint32_t deviceId); + +// Return estimated achieved occupancy for a kernel +float kernelOccupancy( + uint32_t deviceId, + uint16_t registersPerThread, + int32_t staticSharedMemory, + int32_t dynamicSharedMemory, + int32_t blockX, + int32_t blockY, + int32_t blockZ, + float blocks_per_sm); + +// Return compute properties for each device as a json string +const std::string& devicePropertiesJson(); + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivity.h b/plugins/tensorboard-plugins/libkineto/src/CuptiActivity.h new file mode 100644 index 0000000000000000000000000000000000000000..09c29504060ecbbac609aa2d021ff643f45c143e --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivity.h @@ -0,0 +1,114 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include + +#include "ITraceActivity.h" +#include "CuptiActivityPlatform.h" +#include "ThreadUtil.h" +#include "cupti_strings.h" + +namespace libkineto { + class ActivityLogger; +} + +namespace KINETO_NAMESPACE { + +using namespace libkineto; +struct TraceSpan; + +// These classes wrap the various CUPTI activity types +// into subclasses of ITraceActivity so that they can all be accessed +// using the ITraceActivity interface and logged via ActivityLogger. 
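+// For example (sketch), a CUPTI runtime record is wrapped and logged as:
+//
+//   const CUpti_ActivityAPI* record = ...;  // from a CUPTI buffer
+//   RuntimeActivity activity(record, /*linked=*/nullptr, threadId);
+//   activity.log(logger);  // dispatches to ActivityLogger::handleActivity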
+ +// Abstract base class, templated on Cupti activity type +template +struct CuptiActivity : public ITraceActivity { + explicit CuptiActivity(const T* activity, const ITraceActivity* linked) + : activity_(*activity), linked_(linked) {} + int64_t timestamp() const override { + return nsToUs(unixEpochTimestamp(activity_.start)); + } + int64_t duration() const override { + return nsToUs(activity_.end - activity_.start); + } + // TODO(T107507796): Deprecate ITraceActivity + int64_t correlationId() const override {return 0;} + int32_t getThreadId() const override {return 0;} + const ITraceActivity* linkedActivity() const override {return linked_;} + int flowType() const override {return kLinkAsyncCpuGpu;} + int flowId() const override {return correlationId();} + const T& raw() const {return activity_;} + const TraceSpan* traceSpan() const override {return nullptr;} + + protected: + const T& activity_; + const ITraceActivity* linked_{nullptr}; +}; + +// CUpti_ActivityAPI - CUDA runtime activities +struct RuntimeActivity : public CuptiActivity { + explicit RuntimeActivity( + const CUpti_ActivityAPI* activity, + const ITraceActivity* linked, + int32_t threadId) + : CuptiActivity(activity, linked), threadId_(threadId) {} + int64_t correlationId() const override {return activity_.correlationId;} + int64_t deviceId() const override {return processId();} + int64_t resourceId() const override {return threadId_;} + ActivityType type() const override {return ActivityType::CUDA_RUNTIME;} + bool flowStart() const override; + const std::string name() const override {return runtimeCbidName(activity_.cbid);} + void log(ActivityLogger& logger) const override; + const std::string metadataJson() const override; + + private: + const int32_t threadId_; +}; + +// CUpti_ActivityAPI - CUDA runtime activities +struct OverheadActivity : public CuptiActivity { + explicit OverheadActivity( + const CUpti_ActivityOverhead* activity, + const ITraceActivity* linked, + int32_t threadId=0) + : CuptiActivity(activity, linked), threadId_(threadId) {} + + int64_t timestamp() const override { + return nsToUs(unixEpochTimestamp(activity_.start)); + } + int64_t duration() const override { + return nsToUs(activity_.end - activity_.start); + } + // TODO: Update this with PID ordering + int64_t deviceId() const override {return -1;} + int64_t resourceId() const override {return threadId_;} + ActivityType type() const override {return ActivityType::OVERHEAD;} + bool flowStart() const override; + const std::string name() const override {return overheadKindString(activity_.overheadKind);} + void log(ActivityLogger& logger) const override; + const std::string metadataJson() const override; + + private: + const int32_t threadId_; +}; + +// Base class for GPU activities. +// Can also be instantiated directly. 
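+// (Illustrative note: the base class above rebases CUPTI's nanosecond
+// clock onto the Unix epoch and reports microseconds, e.g. a raw start of
+// 1'000'000'000 ns is exposed as nsToUs(unixEpochTimestamp(1'000'000'000)),
+// so CPU- and GPU-side activities share one comparable time axis.)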
+template +struct GpuActivity : public CuptiActivity { + explicit GpuActivity(const T* activity, const ITraceActivity* linked) + : CuptiActivity(activity, linked) {} + int64_t correlationId() const override {return raw().correlationId;} + int64_t deviceId() const override {return raw().deviceId;} + int64_t resourceId() const override {return raw().streamId;} + ActivityType type() const override; + bool flowStart() const override {return false;} + const std::string name() const override; + void log(ActivityLogger& logger) const override; + const std::string metadataJson() const override; + const T& raw() const {return CuptiActivity::raw();} +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivity.tpp b/plugins/tensorboard-plugins/libkineto/src/CuptiActivity.tpp new file mode 100644 index 0000000000000000000000000000000000000000..1ff2dafe06b0016ce7b904ef4b55e047c69bcc1c --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivity.tpp @@ -0,0 +1,111 @@ + /* + * Copyright (c) Facebook, Inc. and its affiliates. + * All rights reserved. + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. + */ + +#include "CuptiActivity.h" + +#include + +#include "Demangle.h" +#include "output_base.h" + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +template<> +inline const std::string GpuActivity::name() const { + return demangle(raw().name); +} + +template<> +inline ActivityType GpuActivity::type() const { + return ActivityType::CONCURRENT_KERNEL; +} + +static inline std::string memcpyName(uint8_t kind, uint8_t src, uint8_t dst) { + return fmt::format( + "Memcpy {} ({} -> {})", + memcpyKindString((CUpti_ActivityMemcpyKind)kind), + memoryKindString((CUpti_ActivityMemoryKind)src), + memoryKindString((CUpti_ActivityMemoryKind)dst)); +} + +template<> +inline ActivityType GpuActivity::type() const { + return ActivityType::GPU_MEMCPY; +} + +template<> +inline const std::string GpuActivity::name() const { + return memcpyName(raw().copyKind, raw().srcKind, raw().dstKind); +} + +template<> +inline ActivityType GpuActivity::type() const { + return ActivityType::GPU_MEMCPY; +} + +template<> +inline const std::string GpuActivity::name() const { + return memcpyName(raw().copyKind, raw().srcKind, raw().dstKind); +} + +template<> +inline const std::string GpuActivity::name() const { + const char* memory_kind = + memoryKindString((CUpti_ActivityMemoryKind)raw().memoryKind); + return fmt::format("Memset ({})", memory_kind); +} + +template<> +inline ActivityType GpuActivity::type() const { + return ActivityType::GPU_MEMSET; +} + +inline void RuntimeActivity::log(ActivityLogger& logger) const { + logger.handleActivity(*this); +} + +inline void OverheadActivity::log(ActivityLogger& logger) const { + logger.handleActivity(*this); +} + +inline bool OverheadActivity::flowStart() const { + return false; +} + +inline const std::string OverheadActivity::metadataJson() const { + return ""; +} + +template +inline void GpuActivity::log(ActivityLogger& logger) const { + logger.handleGpuActivity(*this); +} + +inline bool RuntimeActivity::flowStart() const { + return activity_.cbid == CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000 || + (activity_.cbid >= CUPTI_RUNTIME_TRACE_CBID_cudaMemcpy_v3020 && + activity_.cbid <= CUPTI_RUNTIME_TRACE_CBID_cudaMemset2DAsync_v3020) || + activity_.cbid == + CUPTI_RUNTIME_TRACE_CBID_cudaLaunchCooperativeKernel_v9000 || + activity_.cbid == + 
      CUPTI_RUNTIME_TRACE_CBID_cudaLaunchCooperativeKernelMultiDevice_v9000;
+}
+
+inline const std::string RuntimeActivity::metadataJson() const {
+  return fmt::format(R"JSON(
+      "cbid": {}, "correlation": {})JSON",
+      activity_.cbid, activity_.correlationId);
+}
+
+template<class T>
+inline const std::string GpuActivity<T>::metadataJson() const {
+  return "";
+}
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityApi.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityApi.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..5718bed2f89b06cc702d1b82976cd42e5fceebd0
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityApi.cpp
@@ -0,0 +1,343 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include "CuptiActivityApi.h"
+
+#include <chrono>
+#include <cstring>
+
+#include "cupti_call.h"
+#include "Logger.h"
+
+using namespace std::chrono;
+
+namespace KINETO_NAMESPACE {
+
+// TODO: do we want this to be configurable?
+// Set to 2MB to avoid constantly creating buffers (especially for networks
+// that have many small memcpys, such as sparseNN)
+// Consider putting this on huge pages?
+constexpr size_t kBufSize(2 * 1024 * 1024);
+
+CuptiActivityApi& CuptiActivityApi::singleton() {
+  static CuptiActivityApi instance;
+  return instance;
+}
+
+void CuptiActivityApi::pushCorrelationID(int id, CorrelationFlowType type) {
+#ifdef HAS_CUPTI
+  if (!singleton().externalCorrelationEnabled_) {
+    return;
+  }
+  VLOG(2) << "pushCorrelationID(" << id << ")";
+  switch(type) {
+    case Default:
+      CUPTI_CALL(cuptiActivityPushExternalCorrelationId(
+        CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0, id));
+        break;
+    case User:
+      CUPTI_CALL(cuptiActivityPushExternalCorrelationId(
+        CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1, id));
+  }
+#endif
+}
+
+void CuptiActivityApi::popCorrelationID(CorrelationFlowType type) {
+#ifdef HAS_CUPTI
+  if (!singleton().externalCorrelationEnabled_) {
+    return;
+  }
+  switch(type) {
+    case Default:
+      CUPTI_CALL(cuptiActivityPopExternalCorrelationId(
+        CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0, nullptr));
+        break;
+    case User:
+      CUPTI_CALL(cuptiActivityPopExternalCorrelationId(
+        CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1, nullptr));
+  }
+#endif
+}
+
+static int getSMCount() {
+#ifdef HAS_CUPTI
+  // There may be a simpler way to get the number of SMs....
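+  // (When the CUDA runtime is available, cudaDeviceProp::multiProcessorCount
+  // gives this directly - see smCount() in CudaDeviceProperties.cpp earlier
+  // in this patch; the event-domain probe below only needs CUPTI.)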
+ // Look for domain_d - this has 80 instances on Volta and + // 56 instances on Pascal, corresponding to the number of SMs + // FIXME: This does not work on Turing and later + uint32_t domainCount{0}; + CUPTI_CALL(cuptiDeviceGetNumEventDomains(0, &domainCount)); + std::vector ids(domainCount); + size_t sz = sizeof(CUpti_EventDomainID) * domainCount; + CUPTI_CALL(cuptiDeviceEnumEventDomains(0, &sz, ids.data())); + for (CUpti_EventDomainID id : ids) { + char name[16]; + name[0] = '\0'; + sz = sizeof(name); + CUPTI_CALL(cuptiEventDomainGetAttribute( + id, CUPTI_EVENT_DOMAIN_ATTR_NAME, &sz, name)); + if (strncmp(name, "domain_d", sz) == 0) { + uint32_t count{0}; + sz = sizeof(count); + CUPTI_CALL(cuptiDeviceGetEventDomainAttribute( + 0, id, CUPTI_EVENT_DOMAIN_ATTR_TOTAL_INSTANCE_COUNT, &sz, &count)); + return count; + } + } +#endif + + return -1; +} + +int CuptiActivityApi::smCount() { + static int sm_count = getSMCount(); + return sm_count; +} + +static bool nextActivityRecord( + uint8_t* buffer, + size_t valid_size, + CUpti_Activity*& record) { +#ifdef HAS_CUPTI + CUptiResult status = CUPTI_CALL_NOWARN( + cuptiActivityGetNextRecord(buffer, valid_size, &record)); + if (status != CUPTI_SUCCESS) { + if (status != CUPTI_ERROR_MAX_LIMIT_REACHED) { + CUPTI_CALL(status); + } + record = nullptr; + } +#endif + return record != nullptr; +} + +void CuptiActivityApi::setMaxBufferSize(int size) { + maxGpuBufferCount_ = 1 + size / kBufSize; +} + +void CuptiActivityApi::forceLoadCupti() { +#ifdef HAS_CUPTI + CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL)); +#endif +} + +#ifdef HAS_CUPTI +void CUPTIAPI CuptiActivityApi::bufferRequestedTrampoline( + uint8_t** buffer, + size_t* size, + size_t* maxNumRecords) { + singleton().bufferRequested(buffer, size, maxNumRecords); +} + +void CuptiActivityApi::bufferRequested( + uint8_t** buffer, size_t* size, size_t* maxNumRecords) { + std::lock_guard guard(mutex_); + if (allocatedGpuTraceBuffers_.size() >= maxGpuBufferCount_) { + stopCollection = true; + LOG(WARNING) << "Exceeded max GPU buffer count (" + << allocatedGpuTraceBuffers_.size() + << " > " << maxGpuBufferCount_ + << ") - terminating tracing"; + } + + auto buf = std::make_unique(kBufSize); + *buffer = buf->data(); + *size = kBufSize; + + allocatedGpuTraceBuffers_[*buffer] = std::move(buf); + + *maxNumRecords = 0; +} +#endif + +std::unique_ptr +CuptiActivityApi::activityBuffers() { + { + std::lock_guard guard(mutex_); + if (allocatedGpuTraceBuffers_.empty()) { + return nullptr; + } + } + +#ifdef HAS_CUPTI + VLOG(1) << "Flushing GPU activity buffers"; + time_point t1; + if (VLOG_IS_ON(1)) { + t1 = system_clock::now(); + } + // Can't hold mutex_ during this call, since bufferCompleted + // will be called by libcupti and mutex_ is acquired there. + CUPTI_CALL(cuptiActivityFlushAll(CUPTI_ACTIVITY_FLAG_FLUSH_FORCED)); + if (VLOG_IS_ON(1)) { + flushOverhead = + duration_cast(system_clock::now() - t1).count(); + } +#endif + std::lock_guard guard(mutex_); + // Transfer ownership of buffers to caller. A new map is created on-demand. 
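+  // Illustrative call pattern (as used by CuptiActivityProfiler later in
+  // this patch):
+  //   auto buffers = CuptiActivityApi::singleton().activityBuffers();
+  //   if (buffers) { processActivities(*buffers, handler); }
+  // After the move below, readyGpuTraceBuffers_ is null until
+  // bufferCompleted() repopulates it.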
+ return std::move(readyGpuTraceBuffers_); +} + +#ifdef HAS_CUPTI +int CuptiActivityApi::processActivitiesForBuffer( + uint8_t* buf, + size_t validSize, + std::function handler) { + int count = 0; + if (buf && validSize) { + CUpti_Activity* record{nullptr}; + while ((nextActivityRecord(buf, validSize, record))) { + handler(record); + ++count; + } + } + return count; +} +#endif + +const std::pair CuptiActivityApi::processActivities( + CuptiActivityBufferMap& buffers, + std::function handler) { + std::pair res{0, 0}; +#ifdef HAS_CUPTI + for (auto& pair : buffers) { + // No lock needed - only accessed from this thread + auto& buf = pair.second; + res.first += processActivitiesForBuffer(buf->data(), buf->size(), handler); + res.second += buf->size(); + } +#endif + return res; +} + +void CuptiActivityApi::clearActivities() { + { + std::lock_guard guard(mutex_); + if (allocatedGpuTraceBuffers_.empty()) { + return; + } + } + // Can't hold mutex_ during this call, since bufferCompleted + // will be called by libcupti and mutex_ is acquired there. +#ifdef HAS_CUPTI + CUPTI_CALL(cuptiActivityFlushAll(0)); +#endif + // FIXME: We might want to make sure we reuse + // the same memory during warmup and tracing. + // Also, try to use the amount of memory required + // for active tracing during warmup. + std::lock_guard guard(mutex_); + // Throw away ready buffers as a result of above flush + readyGpuTraceBuffers_ = nullptr; +} + +#ifdef HAS_CUPTI +void CUPTIAPI CuptiActivityApi::bufferCompletedTrampoline( + CUcontext ctx, + uint32_t streamId, + uint8_t* buffer, + size_t /* unused */, + size_t validSize) { + singleton().bufferCompleted(ctx, streamId, buffer, 0, validSize); +} + +void CuptiActivityApi::bufferCompleted( + CUcontext ctx, + uint32_t streamId, + uint8_t* buffer, + size_t /* unused */, + size_t validSize) { + + std::lock_guard guard(mutex_); + auto it = allocatedGpuTraceBuffers_.find(buffer); + if (it == allocatedGpuTraceBuffers_.end()) { + LOG(ERROR) << "bufferCompleted called with unknown buffer: " + << (void*) buffer; + return; + } + + if (!readyGpuTraceBuffers_) { + readyGpuTraceBuffers_ = std::make_unique(); + } + // Set valid size of buffer before moving to ready map + it->second->setSize(validSize); + (*readyGpuTraceBuffers_)[it->first] = std::move(it->second); + allocatedGpuTraceBuffers_.erase(it); + + // report any records dropped from the queue; to avoid unnecessary cupti + // API calls, we make it report only in verbose mode (it doesn't happen + // often in our testing anyways) + if (VLOG_IS_ON(1)) { + size_t dropped = 0; + CUPTI_CALL(cuptiActivityGetNumDroppedRecords(ctx, streamId, &dropped)); + if (dropped != 0) { + LOG(WARNING) << "Dropped " << dropped << " activity records"; + } + } +} +#endif + +void CuptiActivityApi::enableCuptiActivities( + const std::set& selected_activities) { +#ifdef HAS_CUPTI + static bool registered = false; + if (!registered) { + CUPTI_CALL( + cuptiActivityRegisterCallbacks(bufferRequestedTrampoline, bufferCompletedTrampoline)); + } + + externalCorrelationEnabled_ = false; + for (const auto& activity : selected_activities) { + if (activity == ActivityType::GPU_MEMCPY) { + CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMCPY)); + } + if (activity == ActivityType::GPU_MEMSET) { + CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMSET)); + } + if (activity == ActivityType::CONCURRENT_KERNEL) { + CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL)); + } + if (activity == ActivityType::EXTERNAL_CORRELATION) { + 
CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION)); + externalCorrelationEnabled_ = true; + } + if (activity == ActivityType::CUDA_RUNTIME) { + CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_RUNTIME)); + } + if (activity == ActivityType::OVERHEAD) { + CUPTI_CALL(cuptiActivityEnable(CUPTI_ACTIVITY_KIND_OVERHEAD)); + } + } +#endif + + // Explicitly enabled, so reset this flag if set + stopCollection = false; +} + +void CuptiActivityApi::disableCuptiActivities( + const std::set& selected_activities) { +#ifdef HAS_CUPTI + for (const auto& activity : selected_activities) { + if (activity == ActivityType::GPU_MEMCPY) { + CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_MEMCPY)); + } + if (activity == ActivityType::GPU_MEMSET) { + CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_MEMSET)); + } + if (activity == ActivityType::CONCURRENT_KERNEL) { + CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL)); + } + if (activity == ActivityType::EXTERNAL_CORRELATION) { + CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION)); + } + if (activity == ActivityType::CUDA_RUNTIME) { + CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_RUNTIME)); + } + if (activity == ActivityType::OVERHEAD) { + CUPTI_CALL(cuptiActivityDisable(CUPTI_ACTIVITY_KIND_OVERHEAD)); + } + } + externalCorrelationEnabled_ = false; +#endif +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityApi.h b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityApi.h new file mode 100644 index 0000000000000000000000000000000000000000..92af51ecac9ec99181c4726c3849894de9e32b33 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityApi.h @@ -0,0 +1,100 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
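+//
+// Typical lifecycle of this API, as driven by CuptiActivityProfiler later
+// in this patch (illustrative sketch; error handling omitted):
+//
+//   auto& api = CuptiActivityApi::singleton();
+//   api.setMaxBufferSize(maxBufSizeMB);        // cap CUPTI buffer memory
+//   api.enableCuptiActivities(selectedTypes);  // warmup / start tracing
+//   // ... run the workload ...
+//   api.disableCuptiActivities(selectedTypes); // stop tracing
+//   auto buffers = api.activityBuffers();      // flush + take ownership
+//   if (buffers) {
+//     api.processActivities(*buffers, handler);
+//   }
+//
+// (maxBufSizeMB, selectedTypes and handler are placeholder names.)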
+ +#pragma once + +#include +#include +#include +#include +#include +#include + +#ifdef HAS_CUPTI +#include +#endif + +#include "ActivityType.h" +#include "CuptiActivityBuffer.h" + + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +#ifndef HAS_CUPTI +using CUpti_Activity = void; +#endif + +class CuptiActivityApi { + public: + enum CorrelationFlowType { + Default, + User + }; + + CuptiActivityApi() = default; + CuptiActivityApi(const CuptiActivityApi&) = delete; + CuptiActivityApi& operator=(const CuptiActivityApi&) = delete; + + virtual ~CuptiActivityApi() {} + + static CuptiActivityApi& singleton(); + + virtual int smCount(); + static void pushCorrelationID(int id, CorrelationFlowType type); + static void popCorrelationID(CorrelationFlowType type); + + void enableCuptiActivities( + const std::set& selected_activities); + void disableCuptiActivities( + const std::set& selected_activities); + void clearActivities(); + + virtual std::unique_ptr activityBuffers(); + + virtual const std::pair processActivities( + CuptiActivityBufferMap&, + std::function handler); + + void setMaxBufferSize(int size); + + std::atomic_bool stopCollection{false}; + int64_t flushOverhead{0}; + + static void forceLoadCupti(); + + private: +#ifdef HAS_CUPTI + int processActivitiesForBuffer( + uint8_t* buf, + size_t validSize, + std::function handler); + static void CUPTIAPI + bufferRequestedTrampoline(uint8_t** buffer, size_t* size, size_t* maxNumRecords); + static void CUPTIAPI bufferCompletedTrampoline( + CUcontext ctx, + uint32_t streamId, + uint8_t* buffer, + size_t /* unused */, + size_t validSize); +#endif // HAS_CUPTI + + int maxGpuBufferCount_{0}; + CuptiActivityBufferMap allocatedGpuTraceBuffers_; + std::unique_ptr readyGpuTraceBuffers_; + std::mutex mutex_; + bool externalCorrelationEnabled_{false}; + + protected: +#ifdef HAS_CUPTI + void bufferRequested(uint8_t** buffer, size_t* size, size_t* maxNumRecords); + void bufferCompleted( + CUcontext ctx, + uint32_t streamId, + uint8_t* buffer, + size_t /* unused */, + size_t validSize); +#endif +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityBuffer.h b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityBuffer.h new file mode 100644 index 0000000000000000000000000000000000000000..1c3fbef62c8d8f42ff5da1718e20315cc1ba95d5 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityBuffer.h @@ -0,0 +1,51 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
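+//
+// Note (illustrative): the buffer below hands CUPTI a raw pointer into a
+// vector that is only reserve()d, never resize()d; CUPTI fills it and
+// bufferCompleted() later records how many bytes are valid:
+//
+//   CuptiActivityBuffer buf(2 * 1024 * 1024); // capacity reserved
+//   // ... CUPTI writes records into buf.data() ...
+//   buf.setSize(validSize);                   // bytes actually produced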
+
+#pragma once
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <map>
+#include <memory>
+#include <vector>
+
+#include "ITraceActivity.h"
+
+namespace KINETO_NAMESPACE {
+
+class CuptiActivityBuffer {
+ public:
+  explicit CuptiActivityBuffer(size_t size) : size_(size) {
+    buf_.reserve(size);
+  }
+  CuptiActivityBuffer() = delete;
+  CuptiActivityBuffer& operator=(const CuptiActivityBuffer&) = delete;
+  CuptiActivityBuffer(CuptiActivityBuffer&&) = default;
+  CuptiActivityBuffer& operator=(CuptiActivityBuffer&&) = default;
+
+  size_t size() const {
+    return size_;
+  }
+
+  void setSize(size_t size) {
+    assert(size <= buf_.capacity());
+    size_ = size;
+  }
+
+  uint8_t* data() {
+    return buf_.data();
+  }
+
+ private:
+
+  std::vector<uint8_t> buf_;
+  size_t size_;
+
+  std::vector<std::unique_ptr<const ITraceActivity>> wrappers_;
+};
+
+using CuptiActivityBufferMap =
+    std::map<uint8_t*, std::unique_ptr<CuptiActivityBuffer>>;
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityPlatform.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityPlatform.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..fa2ef2f3a8c9cbb7f10567c158d6ee3e8e26eed0
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityPlatform.cpp
@@ -0,0 +1,31 @@
+#include <chrono>
+
+namespace chrono = std::chrono;
+
+namespace KINETO_NAMESPACE {
+
+#ifdef _WIN32
+uint64_t epochs_diff() {
+  // On Windows, steady_clock wraps the QueryPerformanceCounter function.
+  // https://docs.microsoft.com/en-us/cpp/standard-library/steady-clock-struct?view=msvc-160
+  auto steady =
+      chrono::time_point_cast<chrono::nanoseconds>(chrono::steady_clock::now());
+  auto system =
+      chrono::time_point_cast<chrono::nanoseconds>(chrono::system_clock::now());
+
+  auto time_since_unix = system.time_since_epoch().count();
+  auto time_since_boot = steady.time_since_epoch().count();
+  return time_since_unix - time_since_boot;
+}
+
+uint64_t unixEpochTimestamp(uint64_t ts) {
+  static uint64_t diff = epochs_diff();
+  return ts + diff;
+}
+#else
+uint64_t unixEpochTimestamp(uint64_t ts) {
+  return ts;
+}
+#endif // _WIN32
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityPlatform.h b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityPlatform.h
new file mode 100644
index 0000000000000000000000000000000000000000..78de8373d5fe391d48edffc897aff6893aa6f54f
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityPlatform.h
@@ -0,0 +1,12 @@
+#pragma once
+
+#include <cstdint>
+
+namespace KINETO_NAMESPACE {
+
+// CUPTI's timestamps are platform specific. This function converts a raw
+// CUPTI timestamp to time since the Unix epoch, so that timestamp
+// correction works correctly across platforms.
+uint64_t unixEpochTimestamp(uint64_t ts);
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityProfiler.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityProfiler.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..97c23ef047d75aff75b56773a20801ce83fb1653
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityProfiler.cpp
@@ -0,0 +1,841 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
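+//
+// Overview of this file (summary, not in the original): the client pushes
+// CPU-side spans in via transferCpuTrace(); performRunLoopStep() drives the
+// Warmup -> CollectTrace -> ProcessTrace state machine; and
+// processTraceInternal() merges the CPU buffers with GPU records drained
+// from CUPTI, emitting everything through an ActivityLogger.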
+ +#include "CuptiActivityProfiler.h" + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef HAS_CUPTI +#include +#endif + +#include "Config.h" +#include "time_since_epoch.h" +#ifdef HAS_CUPTI +#include "CuptiActivity.h" +#include "CuptiActivity.tpp" +#include "CuptiActivityApi.h" +#endif // HAS_CUPTI +#ifdef HAS_ROCTRACER +#include "RoctracerActivityApi.h" +#endif +#include "output_base.h" + +#include "Logger.h" +#include "ThreadUtil.h" + +using namespace std::chrono; +using namespace libkineto; +using std::string; + +namespace KINETO_NAMESPACE { + +void CuptiActivityProfiler::transferCpuTrace( + std::unique_ptr cpuTrace) { + std::lock_guard guard(mutex_); + const string& trace_name = cpuTrace->span.name; + if (currentRunloopState_ != RunloopState::CollectTrace && + currentRunloopState_ != RunloopState::ProcessTrace) { + VLOG(0) << "Trace collection not in progress - discarding span " + << trace_name; + return; + } + + cpuTrace->span.iteration = iterationCountMap_[trace_name]++; + + VLOG(0) << "Received iteration " << cpuTrace->span.iteration << " of span " + << trace_name << " (" << cpuTrace->activities.size() << " activities / " + << cpuTrace->gpuOpCount << " gpu activities)"; + traceBuffers_->cpu.push_back(std::move(cpuTrace)); +} + +#ifdef HAS_ROCTRACER +CuptiActivityProfiler::CuptiActivityProfiler(RoctracerActivityApi& cupti, bool cpuOnly) +#else +CuptiActivityProfiler::CuptiActivityProfiler(CuptiActivityApi& cupti, bool cpuOnly) +#endif + : cupti_(cupti), + flushOverhead_{0, 0}, + setupOverhead_{0, 0}, + cpuOnly_{cpuOnly}, + currentRunloopState_{RunloopState::WaitForRequest}, + stopCollection_{false} {} + +void CuptiActivityProfiler::processTraceInternal(ActivityLogger& logger) { + LOG(INFO) << "Processing " << traceBuffers_->cpu.size() + << " CPU buffers"; + VLOG(0) << "Profile time range: " << captureWindowStartTime_ << " - " + << captureWindowEndTime_; + logger.handleTraceStart(metadata_); + for (auto& cpu_trace : traceBuffers_->cpu) { + string trace_name = cpu_trace->span.name; + VLOG(0) << "Processing CPU buffer for " << trace_name << " (" + << cpu_trace->span.iteration << ") - " + << cpu_trace->activities.size() << " records"; + VLOG(0) << "Span time range: " << cpu_trace->span.startTime << " - " + << cpu_trace->span.endTime; + processCpuTrace(*cpu_trace, logger); + LOGGER_OBSERVER_ADD_EVENT_COUNT(cpu_trace->activities.size()); + } + +#ifdef HAS_CUPTI + if (!cpuOnly_) { + VLOG(0) << "Retrieving GPU activity buffers"; + traceBuffers_->gpu = cupti_.activityBuffers(); + if (VLOG_IS_ON(1)) { + addOverheadSample(flushOverhead_, cupti_.flushOverhead); + } + if (traceBuffers_->gpu) { + const auto count_and_size = cupti_.processActivities( + *traceBuffers_->gpu, + std::bind(&CuptiActivityProfiler::handleCuptiActivity, this, std::placeholders::_1, &logger)); + LOG(INFO) << "Processed " << count_and_size.first + << " GPU records (" << count_and_size.second << " bytes)"; + LOGGER_OBSERVER_ADD_EVENT_COUNT(count_and_size.first); + } + } +#endif // HAS_CUPTI +#ifdef HAS_ROCTRACER + if (!cpuOnly_) { + VLOG(0) << "Retrieving GPU activity buffers"; + const int count = cupti_.processActivities(logger); + LOG(INFO) << "Processed " << count + << " GPU records"; + LOGGER_OBSERVER_ADD_EVENT_COUNT(count); + } +#endif // HAS_ROCTRACER + + for (const auto& session : sessions_){ + LOG(INFO) << "Processing child profiler trace"; + session->processTrace(logger); + } + + finalizeTrace(*config_, logger); +} + +CuptiActivityProfiler::CpuGpuSpanPair& 
CuptiActivityProfiler::recordTraceSpan( + TraceSpan& span, int gpuOpCount) { + TraceSpan gpu_span(gpuOpCount, span.iteration, span.name, "GPU: "); + auto& iterations = traceSpans_[span.name]; + iterations.push_back({span, gpu_span}); + return iterations.back(); +} + +void CuptiActivityProfiler::processCpuTrace( + libkineto::CpuTraceBuffer& cpuTrace, + ActivityLogger& logger) { + if (cpuTrace.activities.size() == 0) { + LOG(WARNING) << "CPU trace is empty!"; + return; + } + + CpuGpuSpanPair& span_pair = recordTraceSpan(cpuTrace.span, cpuTrace.gpuOpCount); + TraceSpan& cpu_span = span_pair.first; + for (auto const& act : cpuTrace.activities) { + VLOG(2) << act.correlationId() << ": OP " << act.activityName; + if (config_->selectedActivityTypes().count(act.type())) { + act.log(logger); + } + clientActivityTraceMap_[act.correlationId()] = &span_pair; + activityMap_[act.correlationId()] = &act; + + recordThreadInfo(act.resourceId(), act.getThreadId(), act.deviceId()); + } + logger.handleTraceSpan(cpu_span); +} + +#ifdef HAS_CUPTI +inline void CuptiActivityProfiler::handleCorrelationActivity( + const CUpti_ActivityExternalCorrelation* correlation) { + if (correlation->externalKind == CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM0) { + cpuCorrelationMap_[correlation->correlationId] = correlation->externalId; + } else if (correlation->externalKind == CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1){ + userCorrelationMap_[correlation->correlationId] = correlation->externalId; + } else { + LOG(ERROR) << "Invalid CUpti_ActivityExternalCorrelation sent to handleCuptiActivity"; + } +} +#endif // HAS_CUPTI + +static GenericTraceActivity createUserGpuSpan( + const libkineto::ITraceActivity& cpuTraceActivity, + const libkineto::ITraceActivity& gpuTraceActivity) { + GenericTraceActivity res( + *cpuTraceActivity.traceSpan(), + ActivityType::GPU_USER_ANNOTATION, + cpuTraceActivity.name()); + res.startTime = gpuTraceActivity.timestamp(); + res.device = gpuTraceActivity.deviceId(); + res.resource = gpuTraceActivity.resourceId(); + res.endTime = + gpuTraceActivity.timestamp() + gpuTraceActivity.duration(); + res.id = cpuTraceActivity.correlationId(); + return res; +} + +void CuptiActivityProfiler::GpuUserEventMap::insertOrExtendEvent( + const ITraceActivity& userActivity, + const ITraceActivity& gpuActivity) { + StreamKey key(gpuActivity.deviceId(), gpuActivity.resourceId()); + CorrelationSpanMap& correlationSpanMap = streamSpanMap_[key]; + auto it = correlationSpanMap.find(userActivity.correlationId()); + if (it == correlationSpanMap.end()) { + auto it_success = correlationSpanMap.insert({ + userActivity.correlationId(), createUserGpuSpan(userActivity, gpuActivity) + }); + it = it_success.first; + } + GenericTraceActivity& span = it->second; + if (gpuActivity.timestamp() < span.startTime || span.startTime == 0) { + span.startTime = gpuActivity.timestamp(); + } + int64_t gpu_activity_end = gpuActivity.timestamp() + gpuActivity.duration(); + if (gpu_activity_end > span.endTime) { + span.endTime = gpu_activity_end; + } +} + +const CuptiActivityProfiler::CpuGpuSpanPair& CuptiActivityProfiler::defaultTraceSpan() { + static TraceSpan span(0, 0, "Unknown", ""); + static CpuGpuSpanPair span_pair(span, span); + return span_pair; +} + +void CuptiActivityProfiler::GpuUserEventMap::logEvents(ActivityLogger *logger) { + for (auto const& streamMapPair : streamSpanMap_) { + for (auto const& correlationSpanPair : streamMapPair.second) { + correlationSpanPair.second.log(*logger); + } + } +} + +#ifdef HAS_CUPTI +inline bool 
CuptiActivityProfiler::outOfRange(const ITraceActivity& act) { + bool out_of_range = act.timestamp() < captureWindowStartTime_ || + (act.timestamp() + act.duration()) > captureWindowEndTime_; + if (out_of_range) { + VLOG(2) << "TraceActivity outside of profiling window: " << act.name() + << " (" << act.timestamp() << " < " << captureWindowStartTime_ << " or " + << (act.timestamp() + act.duration()) << " > " << captureWindowEndTime_; + } + return out_of_range; +} + +inline static bool isBlockListedRuntimeCbid(CUpti_CallbackId cbid) { + // Some CUDA calls that are very frequent and also not very interesting. + // Filter these out to reduce trace size. + if (cbid == CUPTI_RUNTIME_TRACE_CBID_cudaGetDevice_v3020 || + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaSetDevice_v3020 || + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaGetLastError_v3020 || + // Don't care about cudaEvents + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaEventCreate_v3020 || + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaEventCreateWithFlags_v3020 || + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaEventRecord_v3020 || + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaEventDestroy_v3020 || + cbid == CUPTI_RUNTIME_TRACE_CBID_cudaEventSynchronize_v3020) { + return true; + } + + return false; +} + +void CuptiActivityProfiler::handleRuntimeActivity( + const CUpti_ActivityAPI* activity, + ActivityLogger* logger) { + if (isBlockListedRuntimeCbid(activity->cbid)) { + return; + } + VLOG(2) << activity->correlationId + << ": CUPTI_ACTIVITY_KIND_RUNTIME, cbid=" << activity->cbid + << " tid=" << activity->threadId; + int32_t tid = activity->threadId; + const auto& it = resourceInfo_.find({processId(), tid}); + if (it != resourceInfo_.end()) { + tid = it->second.id; + } + const ITraceActivity* linked = linkedActivity( + activity->correlationId, cpuCorrelationMap_); + const auto& runtime_activity = + traceBuffers_->addActivityWrapper(RuntimeActivity(activity, linked, tid)); + checkTimestampOrder(&runtime_activity); + if (outOfRange(runtime_activity)) { + return; + } + runtime_activity.log(*logger); +} + +void CuptiActivityProfiler::handleOverheadActivity( + const CUpti_ActivityOverhead* activity, + ActivityLogger* logger) { + VLOG(2) << ": CUPTI_ACTIVITY_KIND_OVERHEAD" << " overheadKind=" << activity->overheadKind; + + const auto& overhead_activity = + traceBuffers_->addActivityWrapper(OverheadActivity(activity, nullptr)); + overhead_activity.log(*logger); +} + + +inline void CuptiActivityProfiler::updateGpuNetSpan( + const ITraceActivity& gpuOp) { + if (!gpuOp.linkedActivity()) { + VLOG(0) << "Missing linked activity"; + return; + } + const auto& it = clientActivityTraceMap_.find( + gpuOp.linkedActivity()->correlationId()); + if (it == clientActivityTraceMap_.end()) { + // No correlation id mapping? + return; + } + TraceSpan& gpu_span = it->second->second; + if (gpuOp.timestamp() < gpu_span.startTime || gpu_span.startTime == 0) { + gpu_span.startTime = gpuOp.timestamp(); + } + if ((gpuOp.timestamp() + gpuOp.duration()) > gpu_span.endTime) { + gpu_span.endTime = gpuOp.timestamp() + gpuOp.duration(); + } +} + +// I've observed occasional broken timestamps attached to GPU events... 
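+// Worked example (hypothetical numbers): a cudaLaunchKernel runtime record
+// with correlation id 42 and timestamp 1005us, paired with a kernel record
+// carrying the same id but timestamp 1001us, triggers the warning below;
+// the launch must precede the kernel, and the pair is reported rather than
+// silently reordered.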
+void CuptiActivityProfiler::checkTimestampOrder(const ITraceActivity* act1) {
+  // A correlated runtime activity cannot have a timestamp greater
+  // than its GPU activity's
+  const auto& it = correlatedCudaActivities_.find(act1->correlationId());
+  if (it == correlatedCudaActivities_.end()) {
+    correlatedCudaActivities_.insert({act1->correlationId(), act1});
+    return;
+  }
+
+  // Activities may appear in the buffers out of order.
+  // If we have a runtime activity in the map, it should mean that we
+  // have a GPU activity passed in, and vice versa.
+  const ITraceActivity* act2 = it->second;
+  if (act2->type() == ActivityType::CUDA_RUNTIME) {
+    // Buffer is out-of-order.
+    // Swap so that runtime activity is first for the comparison below.
+    std::swap(act1, act2);
+  }
+  if (act1->timestamp() > act2->timestamp()) {
+    LOG(WARNING) << "GPU op timestamp (" << act2->timestamp()
+                 << ") < runtime timestamp (" << act1->timestamp() << ") by "
+                 << act1->timestamp() - act2->timestamp() << "us";
+    LOG(WARNING) << "Name: " << act2->name()
+                 << " Device: " << act2->deviceId()
+                 << " Stream: " << act2->resourceId();
+  }
+}
+
+inline void CuptiActivityProfiler::handleGpuActivity(
+    const ITraceActivity& act,
+    ActivityLogger* logger) {
+  if (outOfRange(act)) {
+    return;
+  }
+  checkTimestampOrder(&act);
+  VLOG(2) << act.correlationId() << ": "
+          << act.name();
+  recordStream(act.deviceId(), act.resourceId(), "");
+  act.log(*logger);
+  updateGpuNetSpan(act);
+  if (config_->selectedActivityTypes().count(ActivityType::GPU_USER_ANNOTATION)) {
+    const auto& it = userCorrelationMap_.find(act.correlationId());
+    if (it != userCorrelationMap_.end()) {
+      const auto& it2 = activityMap_.find(it->second);
+      if (it2 != activityMap_.end()) {
+        recordStream(act.deviceId(), act.resourceId(), "context");
+        gpuUserEventMap_.insertOrExtendEvent(*it2->second, act);
+      }
+    }
+  }
+}
+
+const ITraceActivity* CuptiActivityProfiler::linkedActivity(
+    int32_t correlationId,
+    const std::unordered_map<int64_t, int64_t>& correlationMap) {
+  const auto& it = correlationMap.find(correlationId);
+  if (it != correlationMap.end()) {
+    const auto& it2 = activityMap_.find(it->second);
+    if (it2 != activityMap_.end()) {
+      return it2->second;
+    }
+  }
+  return nullptr;
+}
+
+template <class T>
+inline void CuptiActivityProfiler::handleGpuActivity(
+    const T* act, ActivityLogger* logger) {
+  const ITraceActivity* linked = linkedActivity(
+      act->correlationId, cpuCorrelationMap_);
+  const auto& gpu_activity =
+      traceBuffers_->addActivityWrapper(GpuActivity<T>(act, linked));
+  handleGpuActivity(gpu_activity, logger);
+}
+
+void CuptiActivityProfiler::handleCuptiActivity(const CUpti_Activity* record, ActivityLogger* logger) {
+  switch (record->kind) {
+    case CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION:
+      handleCorrelationActivity(
+          reinterpret_cast<const CUpti_ActivityExternalCorrelation*>(
+              record));
+      break;
+    case CUPTI_ACTIVITY_KIND_RUNTIME:
+      handleRuntimeActivity(
+          reinterpret_cast<const CUpti_ActivityAPI*>(record), logger);
+      break;
+    case CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL:
+      handleGpuActivity(
+          reinterpret_cast<const CUpti_ActivityKernel4*>(record), logger);
+      break;
+    case CUPTI_ACTIVITY_KIND_MEMCPY:
+      handleGpuActivity(
+          reinterpret_cast<const CUpti_ActivityMemcpy*>(record), logger);
+      break;
+    case CUPTI_ACTIVITY_KIND_MEMCPY2:
+      handleGpuActivity(
+          reinterpret_cast<const CUpti_ActivityMemcpy2*>(record), logger);
+      break;
+    case CUPTI_ACTIVITY_KIND_MEMSET:
+      handleGpuActivity(
+          reinterpret_cast<const CUpti_ActivityMemset*>(record), logger);
+      break;
+    case CUPTI_ACTIVITY_KIND_OVERHEAD:
+      handleOverheadActivity(reinterpret_cast<const CUpti_ActivityOverhead*>(record), logger);
+      break;
+    default:
+      LOG(WARNING) << "Unexpected activity type: " << record->kind;
+      break;
+ } +} +#endif // HAS_CUPTI + +void CuptiActivityProfiler::configureChildProfilers() { + // If child profilers are enabled create profiler sessions + for (auto& profiler: profilers_) { + int64_t start_time_ms = duration_cast( + profileStartTime_.time_since_epoch()).count(); + LOG(INFO) << "Running child profiler " << profiler->name() << " for " + << config_->activitiesDuration().count() << " ms"; + auto session = profiler->configure( + start_time_ms, + config_->activitiesDuration().count(), + config_->selectedActivityTypes(), + *config_ + ); + if (session) { + sessions_.push_back(std::move(session)); + } + } +} + +void CuptiActivityProfiler::configure( + const Config& config, + const time_point& now) { + std::lock_guard guard(mutex_); + if (isActive()) { + LOG(ERROR) << "CuptiActivityProfiler already busy, terminating"; + return; + } + + config_ = config.clone(); + + if (config_->activitiesDuration().count() == 0) { + // Use default if not specified + config_->setActivitiesDuration( + config_->activitiesDurationDefault()); + } + + // Ensure we're starting in a clean state + resetTraceData(); + +#if !USE_GOOGLE_LOG + // Add a LoggerObserverCollector to collect all logs during the trace. + loggerCollectorMetadata_ = std::make_unique(); + Logger::addLoggerObserver(loggerCollectorMetadata_.get()); +#endif // !USE_GOOGLE_LOG + + profileStartTime_ = config_->requestTimestamp(); + + if (config_->hasProfileStartIteration()) { + profileStartIter_ = config_->profileStartIteration(); + profileEndIter_ = profileStartIter_ + config_->activitiesRunIterations(); + } else { + + profileStartIter_ = -1; + profileEndIter_ = (std::numeric_limits::max)(); + + if (profileStartTime_ < now) { + LOG(ERROR) << "Not starting tracing - start timestamp is in the past. Time difference (ms): " << duration_cast(now - profileStartTime_).count(); + return; + } else if ((profileStartTime_ - now) < config_->activitiesWarmupDuration()) { + LOG(ERROR) << "Not starting tracing - insufficient time for warmup. Time to warmup (ms): " << duration_cast(profileStartTime_ - now).count() ; + return; + } + } + + if (LOG_IS_ON(INFO)) { + config_->printActivityProfilerConfig(LIBKINETO_DBG_STREAM); + } + if (!cpuOnly_ && !libkineto::api().client()) { + if (profileStartIter_ < 0) { + LOG(INFO) << "GPU-only tracing for " + << config_->activitiesDuration().count() << "ms"; + } else { + LOG(INFO) << "GPU-only tracing for " + << config_->activitiesRunIterations() << " iterations"; + } + } + + // Set useful metadata into the logger. + LOGGER_OBSERVER_SET_TRACE_DURATION_MS(config_->activitiesDuration().count()); + if (!config_->requestTraceID().empty()) { + LOGGER_OBSERVER_SET_TRACE_ID(config_->requestTraceID()); + } + if (!config_->requestGroupTraceID().empty()) { + LOGGER_OBSERVER_SET_GROUP_TRACE_ID(config_->requestGroupTraceID()); + } + LOGGER_OBSERVER_ADD_DESTINATION(config_->activitiesLogUrl()); + +#if defined(HAS_CUPTI) || defined(HAS_ROCTRACER) + if (!cpuOnly_) { + // Enabling CUPTI activity tracing incurs a larger perf hit at first, + // presumably because structures are allocated and initialized, callbacks + // are activated etc. After a while the overhead decreases and stabilizes. + // It's therefore useful to perform some warmup before starting recording. 
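+    // (Illustrative timing, hypothetical numbers: with a requested start of
+    // now + 3s but an activitiesWarmupDuration() of 5s, the checks earlier
+    // in configure() already rejected the request, so time-based sessions
+    // reaching this point have the full warmup window available.)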
+ LOG(INFO) << "Enabling GPU tracing"; + cupti_.setMaxBufferSize(config_->activitiesMaxGpuBufferSize()); + + time_point timestamp; + if (VLOG_IS_ON(1)) { + timestamp = system_clock::now(); + } +#ifdef HAS_CUPTI + cupti_.enableCuptiActivities(config_->selectedActivityTypes()); +#else + cupti_.enableActivities(config_->selectedActivityTypes()); +#endif + if (VLOG_IS_ON(1)) { + auto t2 = system_clock::now(); + addOverheadSample( + setupOverhead_, duration_cast(t2 - timestamp).count()); + } + } +#endif // HAS_CUPTI || HAS_ROCTRACER + + if (profilers_.size() > 0) { + configureChildProfilers(); + } + + if (libkineto::api().client()) { + libkineto::api().client()->warmup(config_->isOpInputsCollectionEnabled()); + } + if (profileStartIter_ >= 0) { + LOG(INFO) << "Tracing starting on iteration = " << profileStartIter_; + } else { + LOG(INFO) << "Tracing starting in " + << duration_cast(profileStartTime_ - now).count() << "s"; + } + + traceBuffers_ = std::make_unique(); + captureWindowStartTime_ = captureWindowEndTime_ = 0; + currentRunloopState_ = RunloopState::Warmup; +} + +void CuptiActivityProfiler::startTraceInternal(const time_point& now) { + captureWindowStartTime_ = libkineto::timeSinceEpoch(now); + VLOG(0) << "Warmup -> CollectTrace"; + for (auto& session: sessions_){ + LOG(INFO) << "Starting child profiler session"; + session->start(); + } + currentRunloopState_ = RunloopState::CollectTrace; +} + +void CuptiActivityProfiler::stopTraceInternal(const time_point& now) { + if (captureWindowEndTime_ == 0) { + captureWindowEndTime_ = libkineto::timeSinceEpoch(now); + } +#if defined(HAS_CUPTI) || defined(HAS_ROCTRACER) + if (!cpuOnly_) { + time_point timestamp; + if (VLOG_IS_ON(1)) { + timestamp = system_clock::now(); + } +#ifdef HAS_CUPTI + cupti_.disableCuptiActivities(config_->selectedActivityTypes()); +#else + cupti_.disableActivities(config_->selectedActivityTypes()); +#endif + if (VLOG_IS_ON(1)) { + auto t2 = system_clock::now(); + addOverheadSample( + setupOverhead_, duration_cast(t2 - timestamp).count()); + } + } +#endif // HAS_CUPTI || HAS_ROCTRACER + + if (currentRunloopState_ == RunloopState::CollectTrace) { + VLOG(0) << "CollectTrace -> ProcessTrace"; + } else { + LOG(WARNING) << "Called stopTrace with state == " << + static_cast::type>( + currentRunloopState_.load()); + } + for (auto& session: sessions_){ + LOG(INFO) << "Stopping child profiler session"; + session->stop(); + } + currentRunloopState_ = RunloopState::ProcessTrace; +} + +void CuptiActivityProfiler::resetInternal() { + resetTraceData(); + currentRunloopState_ = RunloopState::WaitForRequest; +} + +bool CuptiActivityProfiler::isWarmupDone( + const time_point& now, + int64_t currentIter) const { + // is it a time based config + if (profileStartIter_ < 0) { + // qualify that this check is not being called from application step() API + // this avoids races between the step() API and periodically invoked + // profiler run loop step() method + return (currentIter < 0) && (now >= profileStartTime_); + } + // this is an iteration based config + if (currentIter < 0) { + return false; + } + return currentIter >= profileStartIter_; +} + +bool CuptiActivityProfiler::isCollectionDone( + const time_point& now, + int64_t currentIter) const { + // is it a time based config + if (profileStartIter_ < 0) { + // qualify that this check is not being called from application step() API + return (currentIter < 0) && (now >= profileEndTime_); + } + // this is an iteration based config + if (currentIter < 0) { + return false; + } + return 
currentIter >= profileEndIter_; +} + +const time_point CuptiActivityProfiler::performRunLoopStep( + const time_point& now, + const time_point& nextWakeupTime, + int64_t currentIter) { + auto new_wakeup_time = nextWakeupTime; + bool warmup_done = false, collection_done = false; + + VLOG_IF(1, currentIter >= 0) << "Run loop on application step(), iteration = " + << currentIter; + + switch (currentRunloopState_) { + case RunloopState::WaitForRequest: + VLOG(1) << "State: WaitForRequest"; + // Nothing to do + break; + + case RunloopState::Warmup: + VLOG(1) << "State: Warmup"; + warmup_done = isWarmupDone(now, currentIter); +#if defined(HAS_CUPTI) || defined(HAS_ROCTRACER) + // Flushing can take a while so avoid doing it close to the start time + if (!cpuOnly_ && currentIter < 0 && + (profileStartIter_ >= 0 || nextWakeupTime < profileStartTime_)) { + cupti_.clearActivities(); + } + + if (cupti_.stopCollection) { + // Go to process trace to clear any outstanding buffers etc + LOG(WARNING) << "Trace terminated during warmup"; + std::lock_guard guard(mutex_); + stopTraceInternal(now); + resetInternal(); + VLOG(0) << "Warmup -> WaitForRequest"; + break; + } +#endif // HAS_CUPTI || HAS_ROCTRACER + + if (warmup_done) { + UST_LOGGER_MARK_COMPLETED(kWarmUpStage); + if (profileStartIter_ < 0 && + (now > profileStartTime_ + milliseconds(10))) { + LOG(WARNING) + << "Tracing started " + << duration_cast(now - profileStartTime_).count() + << "ms late!"; + } else { + LOG(INFO) << "Tracing started"; + } + startTrace(now); + if (libkineto::api().client()) { + libkineto::api().client()->start(); + } + if (nextWakeupTime > profileEndTime_) { + new_wakeup_time = profileEndTime_; + } + } else if (nextWakeupTime > profileStartTime_) { + new_wakeup_time = profileStartTime_; + } + + break; + + case RunloopState::CollectTrace: + VLOG(1) << "State: CollectTrace"; + // captureWindowStartTime_ can be set by external threads, + // so recompute end time. + // FIXME: Is this a good idea for synced start? + if (profileStartIter_ < 0) { + std::lock_guard guard(mutex_); + profileEndTime_ = time_point( + microseconds(captureWindowStartTime_)) + + config_->activitiesDuration(); + } + + collection_done = isCollectionDone(now, currentIter); + + // TODO revisit stopCollection_ is not used right now + if (collection_done || stopCollection_.exchange(false) +#if defined(HAS_CUPTI) || defined(HAS_ROCTRACER) + || cupti_.stopCollection +#endif // HAS_CUPTI || HAS_ROCTRACER + ){ + // Update runloop state first to prevent further updates to shared state + LOG(INFO) << "Tracing complete."; + if (currentIter > 0) { + LOG(INFO) << "This state change was invoked by application's step() call"; + } + // FIXME: Need to communicate reason for stopping on errors + if (libkineto::api().client()) { + libkineto::api().client()->stop(); + } + std::lock_guard guard(mutex_); + stopTraceInternal(now); + VLOG_IF(0, collection_done) << "Reached profile end time"; + + UST_LOGGER_MARK_COMPLETED(kCollectionStage); + } else if (profileStartIter_ >= 0) { + // nothing to do here + } else if (now < profileEndTime_ && profileEndTime_ < nextWakeupTime) { + new_wakeup_time = profileEndTime_; + } + + break; + + case RunloopState::ProcessTrace: + VLOG(1) << "State: ProcessTrace"; + // skip this state transition if it called from the step() api + // of the profiler. 
+ // else it could lead to a race between the profiler thread and an + // application thread calling step() + if (currentIter >= 0) { + return new_wakeup_time; + } + // FIXME: Probably want to allow interruption here + // for quickly handling trace request via synchronous API + std::lock_guard guard(mutex_); + processTraceInternal(*logger_); + UST_LOGGER_MARK_COMPLETED(kPostProcessingStage); + resetInternal(); + VLOG(0) << "ProcessTrace -> WaitForRequest"; + break; + } + + return new_wakeup_time; +} + +void CuptiActivityProfiler::finalizeTrace(const Config& config, ActivityLogger& logger) { + LOG(INFO) << "Recorded nets:"; + { + for (const auto& it : iterationCountMap_) { + LOG(INFO) << it.first << ": " << it.second << " iterations"; + } + iterationCountMap_.clear(); + } + + // Process names + int32_t pid = processId(); + string process_name = processName(pid); + if (!process_name.empty()) { + logger.handleDeviceInfo( + {pid, process_name, "CPU"}, captureWindowStartTime_); + if (!cpuOnly_) { + // GPU events use device id as pid (0-7). + constexpr int kMaxGpuCount = 8; + for (int gpu = 0; gpu < kMaxGpuCount; gpu++) { + logger.handleDeviceInfo( + {gpu, process_name, fmt::format("GPU {}", gpu)}, + captureWindowStartTime_); + } + } + } + + // Thread & stream info + for (auto pair : resourceInfo_) { + const auto& resource = pair.second; + logger.handleResourceInfo(resource, captureWindowStartTime_); + } + + for (const auto& iterations : traceSpans_) { + for (const auto& span_pair : iterations.second) { + const TraceSpan& gpu_span = span_pair.second; + if (gpu_span.opCount > 0) { + logger.handleTraceSpan(gpu_span); + } + } + } + + // Overhead info + overheadInfo_.push_back(ActivityLogger::OverheadInfo("CUPTI Overhead")); + for(const auto& info : overheadInfo_) { + logger.handleOverheadInfo(info, captureWindowStartTime_); + } + + gpuUserEventMap_.logEvents(&logger); + +#if !USE_GOOGLE_LOG + // Save logs from LoggerCollector objects into Trace metadata. + auto LoggerMD = loggerCollectorMetadata_->extractCollectorMetadata(); + std::unordered_map> LoggerMDString; + for (auto& md : LoggerMD) { + LoggerMDString[toString(md.first)] = md.second; + } +#endif // !USE_GOOGLE_LOG + + logger.finalizeTrace(config, std::move(traceBuffers_), captureWindowEndTime_, LoggerMDString); +} + +void CuptiActivityProfiler::resetTraceData() { +#if defined(HAS_CUPTI) || defined(HAS_ROCTRACER) + if (!cpuOnly_) { + cupti_.clearActivities(); + } +#endif // HAS_CUPTI || HAS_ROCTRACER + activityMap_.clear(); + cpuCorrelationMap_.clear(); + correlatedCudaActivities_.clear(); + gpuUserEventMap_.clear(); + traceSpans_.clear(); + clientActivityTraceMap_.clear(); + traceBuffers_ = nullptr; + metadata_.clear(); + sessions_.clear(); +#if !USE_GOOGLE_LOG + Logger::removeLoggerObserver(loggerCollectorMetadata_.get()); +#endif // !USE_GOOGLE_LOG +} + + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiActivityProfiler.h b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityProfiler.h new file mode 100644 index 0000000000000000000000000000000000000000..208833a4db720429982a63ed72ffa4762ef00bd0 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiActivityProfiler.h @@ -0,0 +1,364 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
+ +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "ThreadUtil.h" +#include "TraceSpan.h" +#include "libkineto.h" +#include "output_base.h" +#include "GenericTraceActivity.h" +#include "IActivityProfiler.h" +#include "LoggerCollector.h" + +namespace KINETO_NAMESPACE { + +class Config; +class CuptiActivityApi; +class RoctracerActivityApi; + +class CuptiActivityProfiler { + public: + CuptiActivityProfiler(CuptiActivityApi& cupti, bool cpuOnly); + CuptiActivityProfiler(RoctracerActivityApi& rai, bool cpuOnly); + CuptiActivityProfiler(const CuptiActivityProfiler&) = delete; + CuptiActivityProfiler& operator=(const CuptiActivityProfiler&) = delete; + + bool isActive() const { + return currentRunloopState_ != RunloopState::WaitForRequest; + } + + // Invoke at a regular interval to perform profiling activities. + // When not active, an interval of 1-5 seconds is probably fine, + // depending on required warm-up time and delayed start time. + // When active, it's a good idea to invoke more frequently to stay below + // memory usage limit (ACTIVITIES_MAX_GPU_BUFFER_SIZE_MB) during warmup. + const std::chrono::time_point performRunLoopStep( + const std::chrono::time_point& now, + const std::chrono::time_point& nextWakeupTime, + int64_t currentIter = -1); + + // Used for async requests + void setLogger(ActivityLogger* logger) { + logger_ = logger; + } + + // Synchronous control API + void startTrace( + const std::chrono::time_point& now) { + std::lock_guard guard(mutex_); + startTraceInternal(now); + } + + void stopTrace(const std::chrono::time_point& now) { + std::lock_guard guard(mutex_); + stopTraceInternal(now); + } + + // Process CPU and GPU traces + void processTrace(ActivityLogger& logger) { + std::lock_guard guard(mutex_); + processTraceInternal(logger); + } + + void reset() { + std::lock_guard guard(mutex_); + resetInternal(); + } + + // Set up profiler as specified in config. + void configure( + const Config& config, + const std::chrono::time_point& now); + + // Registered with client API to pass CPU trace events over + void transferCpuTrace( + std::unique_ptr cpuTrace); + + Config& config() { + return *config_; + } + + inline void recordThreadInfo() { + int32_t sysTid = systemThreadId(); + // Note we're using the lower 32 bits of the (opaque) pthread id + // as key, because that's what CUPTI records. 
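+    // (Illustrative: for a 64-bit pthread id 0x00007f1234abcd00, both CUPTI
+    // and this map key on the truncated 32-bit value 0x34abcd00, so the two
+    // sides stay consistent even though the full ids differ.)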
+ int32_t tid = threadId(); + int32_t pid = processId(); + std::lock_guard guard(mutex_); + recordThreadInfo(sysTid, tid, pid); + } + + // T107508020: We can deprecate the recordThreadInfo(void) once we optimized profiler_kineto + void recordThreadInfo(int32_t sysTid, int32_t tid, int32_t pid) { + if (resourceInfo_.find({pid, tid}) == resourceInfo_.end()) { + resourceInfo_.emplace( + std::make_pair(pid, tid), + ActivityLogger::ResourceInfo( + pid, + sysTid, + sysTid, // sortindex + fmt::format("thread {} ({})", sysTid, getThreadName()))); + } + } + + void addMetadata(const std::string& key, const std::string& value) { + std::lock_guard guard(mutex_); + metadata_[key] = value; + } + + void addChildActivityProfiler( + std::unique_ptr profiler) { + std::lock_guard guard(mutex_); + profilers_.push_back(std::move(profiler)); + } + + protected: + + using CpuGpuSpanPair = std::pair; + static const CpuGpuSpanPair& defaultTraceSpan(); + + private: + + // Map of gpu activities to user defined events + class GpuUserEventMap { + public: + // Insert a user defined event which maps to the gpu trace activity. + // If the user defined event mapping already exists this will update the + // gpu side span to include the span of gpuTraceActivity. + void insertOrExtendEvent(const ITraceActivity& cpuTraceActivity, + const ITraceActivity& gpuTraceActivity); + // Log out the events to the logger + void logEvents(ActivityLogger *logger); + + void clear() { + streamSpanMap_.clear(); + } + + private: + // device id and stream name + using StreamKey = std::pair; + + // map of correlation id to TraceSpan + using CorrelationSpanMap = + std::unordered_map; + std::map streamSpanMap_; + }; + + GpuUserEventMap gpuUserEventMap_; + // id -> activity* + std::unordered_map activityMap_; + // cuda runtime id -> pytorch op id + // CUPTI provides a mechanism for correlating Cuda events to arbitrary + // external events, e.g.operator activities from PyTorch. + std::unordered_map cpuCorrelationMap_; + // CUDA runtime <-> GPU Activity + std::unordered_map + correlatedCudaActivities_; + std::unordered_map userCorrelationMap_; + + // data structure to collect cuptiActivityFlushAll() latency overhead + struct profilerOverhead { + int64_t overhead; + int cntr; + }; + + bool isWarmupDone( + const std::chrono::time_point& now, + int64_t currentIter) const; + + bool isCollectionDone( + const std::chrono::time_point& now, + int64_t currentIter) const; + + void startTraceInternal( + const std::chrono::time_point& now); + + void stopTraceInternal( + const std::chrono::time_point& now); + + void processTraceInternal(ActivityLogger& logger); + + void resetInternal(); + + void finalizeTrace(const Config& config, ActivityLogger& logger); + + void configureChildProfilers(); + + // Process a single CPU trace + void processCpuTrace( + libkineto::CpuTraceBuffer& cpuTrace, + ActivityLogger& logger); + + // Create resource names for streams + inline void recordStream(int device, int id, const char* postfix) { + if (resourceInfo_.find({device, id}) == resourceInfo_.end()) { + resourceInfo_.emplace( + std::make_pair(device, id), + ActivityLogger::ResourceInfo( + device, id, id, fmt::format( + "stream {} {}", id, postfix))); + } + } + + // Record client trace span for subsequent lookups from activities + // Also creates a corresponding GPU-side span. + CpuGpuSpanPair& recordTraceSpan(TraceSpan& span, int gpuOpCount); + + // Returns true if net name is to be tracked for a specified number of + // iterations. 
+ bool iterationTargetMatch(libkineto::CpuTraceBuffer& trace); + + // net name to id + int netId(const std::string& netName); + + const ITraceActivity* linkedActivity( + int32_t correlationId, + const std::unordered_map& correlationMap); + +#ifdef HAS_CUPTI + // Process generic CUPTI activity + void handleCuptiActivity(const CUpti_Activity* record, ActivityLogger* logger); + + // Process specific GPU activity types + void updateGpuNetSpan(const ITraceActivity& gpuOp); + bool outOfRange(const ITraceActivity& act); + void handleCorrelationActivity( + const CUpti_ActivityExternalCorrelation* correlation); + void handleRuntimeActivity( + const CUpti_ActivityAPI* activity, ActivityLogger* logger); + void handleOverheadActivity( + const CUpti_ActivityOverhead* activity, ActivityLogger* logger); + void handleGpuActivity(const ITraceActivity& act, + ActivityLogger* logger); + template + void handleGpuActivity(const T* act, ActivityLogger* logger); +#endif // HAS_CUPTI + + void resetTraceData(); + + void addOverheadSample(profilerOverhead& counter, int64_t overhead) { + counter.overhead += overhead; + counter.cntr++; + } + int64_t getOverhead(const profilerOverhead& counter) { + if (counter.cntr == 0) { + return 0; + } + return counter.overhead / counter.cntr; + } + + void checkTimestampOrder(const ITraceActivity* act1); + + // On-demand request configuration + std::unique_ptr config_; + + // Logger used during trace processing + ActivityLogger* logger_; + + // Calls to CUPTI is encapsulated behind this interface +#ifdef HAS_ROCTRACER + RoctracerActivityApi& cupti_; // Design failure here +#else + CuptiActivityApi& cupti_; +#endif + + enum class RunloopState { + WaitForRequest, + Warmup, + CollectTrace, + ProcessTrace + }; + + // Start and end time used for triggering and stopping profiling + std::chrono::time_point profileStartTime_; + std::chrono::time_point profileEndTime_; + int64_t profileStartIter_ = -1, profileEndIter_ = -1; + + + // All recorded trace spans, both CPU and GPU + // Trace Id -> list of iterations. + // Using map of lists for the iterator semantics, since we are recording + // pointers to the elements in this structure. + std::map> traceSpans_; + + // Maintain a map of client trace activity to trace span. + // Maps correlation id -> TraceSpan* held by traceSpans_. + using ActivityTraceMap = std::unordered_map; + ActivityTraceMap clientActivityTraceMap_; + + // Cache thread names and system thread ids for pthread ids, + // and stream ids for GPU streams + std::map< + std::pair, + ActivityLogger::ResourceInfo> resourceInfo_; + + std::vector overheadInfo_; + + // the overhead to flush the activity buffer + profilerOverhead flushOverhead_; + // the overhead to enable/disable activity tracking + profilerOverhead setupOverhead_; + + bool cpuOnly_{false}; + + // *************************************************************************** + // Below state is shared with external threads. + // These need to either be atomic, accessed under lock or only used + // by external threads in separate runloop phases from the profiler thread. + // *************************************************************************** + + // Mutex to protect non-atomic access to below state + std::mutex mutex_; + + // Runloop phase + std::atomic currentRunloopState_{RunloopState::WaitForRequest}; + + // Keep track of the start time of the first net in the current trace. + // This is only relevant to Caffe2 as PyTorch does not have nets. 
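+  // (Illustrative: with captureWindowStartTime_ = 1000us and
+  // captureWindowEndTime_ = 2000us, outOfRange() in the .cpp drops any
+  // record that does not fall entirely inside [1000, 2000].)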
+  // All CUDA events before this time will be removed
+  // Can be written by external threads during collection.
+  int64_t captureWindowStartTime_{0};
+  // Similarly, all CUDA API events after the last net event will be removed
+  int64_t captureWindowEndTime_{0};
+
+  // span name -> iteration count
+  std::map<std::string, int> iterationCountMap_;
+  // Flag used to stop tracing from external api callback.
+  // Needs to be atomic since it's set from a different thread.
+  std::atomic_bool stopCollection_{false};
+
+  // Buffers where trace data is stored
+  std::unique_ptr<ActivityBuffers> traceBuffers_;
+
+  // Trace metadata
+  std::unordered_map<std::string, std::string> metadata_;
+
+  // child activity profilers
+  std::vector<std::unique_ptr<IActivityProfiler>> profilers_;
+
+  // a vector of active profiler plugin sessions
+  std::vector<std::unique_ptr<IActivityProfilerSession>> sessions_;
+
+  // LoggerCollector to collect all LOGs during the trace
+#if !USE_GOOGLE_LOG
+  std::unique_ptr<LoggerCollector> loggerCollectorMetadata_;
+#endif // !USE_GOOGLE_LOG
+};
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApi.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApi.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..1876003998dc0c66f882d939ca8100750cfd046a
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApi.cpp
@@ -0,0 +1,260 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include "CuptiCallbackApi.h"
+
+#include <assert.h>
+#include <chrono>
+#include <algorithm>
+#include <mutex>
+#include <shared_mutex>
+
+#ifdef HAS_CUPTI
+#include "cupti_call.h"
+#endif
+#include "Logger.h"
+
+
+namespace KINETO_NAMESPACE {
+
+// limit on number of handles per callback type
+constexpr size_t MAX_CB_FNS_PER_CB = 8;
+
+// Reader Writer lock types
+using ReaderWriterLock = std::shared_timed_mutex;
+using ReaderLockGuard = std::shared_lock<ReaderWriterLock>;
+using WriteLockGuard = std::unique_lock<ReaderWriterLock>;
+
+static ReaderWriterLock callbackLock_;
+
+/* Callback Table :
+ *  Overall goal of the design is to optimize the lookup of function
+ *  pointers. The table is structured at two levels and the leaf
+ *  elements in the table are std::list to enable fast access/inserts/deletes
+ *
+ *  <callback domain 0> |
+ *                      -> cb id 0 -> std::list of callbacks
+ *                      ...
+ *                      -> cb id n -> std::list of callbacks
+ *  <callback domain 1> |
+ *                      ...
+ *  CallbackTable is the final table type above.
+ *  See type declarations in the header file.
+ */
+
+
+/* callback_switchboard : is the global callback handler we register
+ *  with CUPTI. The goal is to make it as efficient as possible
+ *  to re-direct to the registered callback(s).
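+ *  (callback_switchboard is registered once with CUPTI via cuptiSubscribe()
+ *  in the CuptiCallbackApi constructor below; CUPTI then invokes it for
+ *  every enabled callback, so the body must stay cheap.)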
+ *
+ * Few things to care about :
+ * a) use if/then switches rather than map/hash structures
+ * b) avoid dynamic memory allocations
+ * c) be aware of locking overheads
+ */
+#ifdef HAS_CUPTI
+static void CUPTIAPI callback_switchboard(
+#else
+static void callback_switchboard(
+#endif
+   void* /* unused */,
+   CUpti_CallbackDomain domain,
+   CUpti_CallbackId cbid,
+   const CUpti_CallbackData* cbInfo) {
+
+  // below statement is likely going to call a mutex
+  // on the singleton access
+  CuptiCallbackApi::singleton().__callback_switchboard(
+      domain, cbid, cbInfo);
+}
+
+
+void CuptiCallbackApi::__callback_switchboard(
+    CUpti_CallbackDomain domain,
+    CUpti_CallbackId cbid,
+    const CUpti_CallbackData* cbInfo) {
+  VLOG(0) << "Callback: domain = " << domain << ", cbid = " << cbid;
+  CallbackList *cblist = nullptr;
+
+  switch (domain) {
+
+    // add the fastest path for kernel launch callbacks
+    // as these are the most frequent ones
+    case CUPTI_CB_DOMAIN_RUNTIME_API:
+      switch (cbid) {
+        case CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000:
+          cblist = &callbacks_.runtime[
+              CUDA_LAUNCH_KERNEL - __RUNTIME_CB_DOMAIN_START];
+          break;
+        default:
+          break;
+      }
+      break;
+
+    case CUPTI_CB_DOMAIN_RESOURCE:
+      switch (cbid) {
+        case CUPTI_CBID_RESOURCE_CONTEXT_CREATED:
+          cblist = &callbacks_.resource[
+              RESOURCE_CONTEXT_CREATED - __RESOURCE_CB_DOMAIN_START];
+          break;
+        case CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING:
+          cblist = &callbacks_.resource[
+              RESOURCE_CONTEXT_DESTROYED - __RESOURCE_CB_DOMAIN_START];
+          break;
+        default:
+          break;
+      }
+      break;
+
+    default:
+      return;
+  }
+
+  // ignore callbacks that are not handled
+  if (cblist == nullptr) {
+    return;
+  }
+
+  // make a copy of the callback list so we avoid holding lock
+  // in common case this should be just one func pointer copy
+  std::array<CuptiCallbackFn, MAX_CB_FNS_PER_CB> callbacks;
+  int num_cbs = 0;
+  {
+    ReaderLockGuard rl(callbackLock_);
+    int i = 0;
+    for (auto it = cblist->begin();
+        it != cblist->end() && i < MAX_CB_FNS_PER_CB;
+        it++, i++) {
+      callbacks[i] = *it;
+    }
+    num_cbs = i;
+  }
+
+  for (int i = 0; i < num_cbs; i++) {
+    auto fn = callbacks[i];
+    fn(domain, cbid, cbInfo);
+  }
+}
+
+CuptiCallbackApi& CuptiCallbackApi::singleton() {
+  static CuptiCallbackApi instance;
+  return instance;
+}
+
+CuptiCallbackApi::CuptiCallbackApi() {
+#ifdef HAS_CUPTI
+  lastCuptiStatus_ = CUPTI_ERROR_UNKNOWN;
+  lastCuptiStatus_ = CUPTI_CALL_NOWARN(
+    cuptiSubscribe(&subscriber_,
+      (CUpti_CallbackFunc)callback_switchboard,
+      nullptr));
+
+  initSuccess_ = (lastCuptiStatus_ == CUPTI_SUCCESS);
+#endif
+}
+
+CuptiCallbackApi::CallbackList* CuptiCallbackApi::CallbackTable::lookup(
+    CUpti_CallbackDomain domain, CuptiCallBackID cbid) {
+  size_t idx;
+
+  switch (domain) {
+
+    case CUPTI_CB_DOMAIN_RESOURCE:
+      assert(cbid >= __RESOURCE_CB_DOMAIN_START);
+      assert(cbid < __RESOURCE_CB_DOMAIN_END);
+      idx = cbid - __RESOURCE_CB_DOMAIN_START;
+      return &resource.at(idx);
+
+    case CUPTI_CB_DOMAIN_RUNTIME_API:
+      assert(cbid >= __RUNTIME_CB_DOMAIN_START);
+      assert(cbid < __RUNTIME_CB_DOMAIN_END);
+      idx = cbid - __RUNTIME_CB_DOMAIN_START;
+      return &runtime.at(idx);
+
+    default:
+      LOG(WARNING) << " Unsupported callback domain : " << domain;
+      return nullptr;
+  }
+}
+
+bool CuptiCallbackApi::registerCallback(
+    CUpti_CallbackDomain domain,
+    CuptiCallBackID cbid,
+    CuptiCallbackFn cbfn) {
+  CallbackList* cblist = callbacks_.lookup(domain, cbid);
+
+  if (!cblist) {
+    LOG(WARNING) << "Could not register callback -- domain = " << domain
+                 << " callback id = " << cbid;
+    return false;
+  }
+
+  // avoid duplicates
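+  // (re-registering an identical function is logged below and treated as
+  //  success, so callers may register idempotently)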
+ auto it = std::find(cblist->begin(), cblist->end(), cbfn); + if (it != cblist->end()) { + LOG(WARNING) << "Adding duplicate callback -- domain = " << domain + << " callback id = " << cbid; + return true; + } + + if (cblist->size() == MAX_CB_FNS_PER_CB) { + LOG(WARNING) << "Already registered max callback -- domain = " << domain + << " callback id = " << cbid; + } + + WriteLockGuard wl(callbackLock_); + cblist->push_back(cbfn); + return true; +} + +bool CuptiCallbackApi::deleteCallback( + CUpti_CallbackDomain domain, + CuptiCallBackID cbid, + CuptiCallbackFn cbfn) { + CallbackList* cblist = callbacks_.lookup(domain, cbid); + if (!cblist) { + LOG(WARNING) << "Attempting to remove unsupported callback -- domain = " << domain + << " callback id = " << cbid; + return false; + } + + // Locks are not required here as + // https://en.cppreference.com/w/cpp/container/list/erase + // "References and iterators to the erased elements are invalidated. + // Other references and iterators are not affected." + auto it = std::find(cblist->begin(), cblist->end(), cbfn); + if (it == cblist->end()) { + LOG(WARNING) << "Could not find callback to remove -- domain = " << domain + << " callback id = " << cbid; + return false; + } + + WriteLockGuard wl(callbackLock_); + cblist->erase(it); + return true; +} + +bool CuptiCallbackApi::enableCallback( + CUpti_CallbackDomain domain, CUpti_CallbackId cbid) { +#ifdef HAS_CUPTI + if (initSuccess_) { + lastCuptiStatus_ = CUPTI_CALL_NOWARN( + cuptiEnableCallback(1, subscriber_, domain, cbid)); + return (lastCuptiStatus_ == CUPTI_SUCCESS); + } +#endif + return false; +} + +bool CuptiCallbackApi::disableCallback( + CUpti_CallbackDomain domain, CUpti_CallbackId cbid) { +#ifdef HAS_CUPTI + if (initSuccess_) { + lastCuptiStatus_ = CUPTI_CALL_NOWARN( + cuptiEnableCallback(0, subscriber_, domain, cbid)); + return (lastCuptiStatus_ == CUPTI_SUCCESS); + } +#endif + return false; +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApi.h b/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApi.h new file mode 100644 index 0000000000000000000000000000000000000000..4526f3750b4a134bc888843b8ff347a1f2bf8d5f --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApi.h @@ -0,0 +1,130 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#ifdef HAS_CUPTI +#include +#endif +#include +#include +#include +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "CuptiCallbackApiMock.h" + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + + +/* CuptiCallbackApi : Provides an abstraction over CUPTI callback + * interface. This enables various callback functions to be registered + * with this class. The class registers a global callback handler that + * redirects to the respective callbacks. + * + * Note: one design choice we made is to only support simple function pointers + * in order to speed up the implementation for fast path. 
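+ *
+ * Illustrative usage sketch (not part of this file; `myLaunchCb` is a
+ * hypothetical free function matching CuptiCallbackFn):
+ *
+ *   auto& api = CuptiCallbackApi::singleton();
+ *   api.registerCallback(CUPTI_CB_DOMAIN_RUNTIME_API,
+ *                        CuptiCallbackApi::CUDA_LAUNCH_KERNEL, myLaunchCb);
+ *   api.enableCallback(CUPTI_CB_DOMAIN_RUNTIME_API,
+ *                      CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000);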
+ */ + +using CuptiCallbackFn = void(*)( + CUpti_CallbackDomain domain, + CUpti_CallbackId cbid, + const CUpti_CallbackData* cbInfo); + + +class CuptiCallbackApi { + + public: + + /* Global list of supported callback ids + * use the class namespace to avoid confusing with CUPTI enums*/ + enum CuptiCallBackID { + CUDA_LAUNCH_KERNEL = 0, + // can possibly support more callback ids per domain + // + __RUNTIME_CB_DOMAIN_START = CUDA_LAUNCH_KERNEL, + + // Callbacks under Resource CB domain + RESOURCE_CONTEXT_CREATED, + RESOURCE_CONTEXT_DESTROYED, + + __RUNTIME_CB_DOMAIN_END = RESOURCE_CONTEXT_CREATED, + __RESOURCE_CB_DOMAIN_START = RESOURCE_CONTEXT_CREATED, + + __RESOURCE_CB_DOMAIN_END = RESOURCE_CONTEXT_DESTROYED + 1, + }; + + + CuptiCallbackApi(const CuptiCallbackApi&) = delete; + CuptiCallbackApi& operator=(const CuptiCallbackApi&) = delete; + + static CuptiCallbackApi& singleton(); + + bool initSuccess() const { + return initSuccess_; + } + +#ifdef HAS_CUPTI + CUptiResult getCuptiStatus() const { + return lastCuptiStatus_; + } +#endif + + bool registerCallback( + CUpti_CallbackDomain domain, + CuptiCallBackID cbid, + CuptiCallbackFn cbfn); + + // returns false if callback was not found + bool deleteCallback( + CUpti_CallbackDomain domain, + CuptiCallBackID cbid, + CuptiCallbackFn cbfn); + + bool enableCallback(CUpti_CallbackDomain domain, CUpti_CallbackId cbid); + bool disableCallback(CUpti_CallbackDomain domain, CUpti_CallbackId cbid); + + + // Please do not use this method. This has to be exposed as public + // so it is accessible from the callback handler + void __callback_switchboard( + CUpti_CallbackDomain domain, + CUpti_CallbackId cbid, + const CUpti_CallbackData* cbInfo); + + private: + + explicit CuptiCallbackApi(); + + // For callback table design overview see the .cpp file + using CallbackList = std::list; + + // level 2 tables sizes are known at compile time + constexpr static size_t RUNTIME_CB_DOMAIN_SIZE + = (__RUNTIME_CB_DOMAIN_END - __RUNTIME_CB_DOMAIN_START); + + constexpr static size_t RESOURCE_CB_DOMAIN_SIZE + = (__RESOURCE_CB_DOMAIN_END - __RESOURCE_CB_DOMAIN_START); + + // level 1 table is a struct + struct CallbackTable { + std::array runtime; + std::array resource; + + CallbackList* lookup(CUpti_CallbackDomain domain, CuptiCallBackID cbid); + }; + + CallbackTable callbacks_; + bool initSuccess_ = false; + +#ifdef HAS_CUPTI + CUptiResult lastCuptiStatus_; + CUpti_SubscriberHandle subscriber_; +#endif +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApiMock.h b/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApiMock.h new file mode 100644 index 0000000000000000000000000000000000000000..fd51267274f99a0c9949eaac6fdae2dff917c7a0 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiCallbackApiMock.h @@ -0,0 +1,32 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
+ +#pragma once + +// Provides data structures to mock CUPTI Callback API +#ifndef HAS_CUPTI + +enum CUpti_CallbackDomain { + CUPTI_CB_DOMAIN_RESOURCE, + CUPTI_CB_DOMAIN_RUNTIME_API, +}; +enum CUpti_CallbackId { + CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000, + CUPTI_CBID_RESOURCE_CONTEXT_CREATED, + CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING, +}; + +using CUcontext = void*; + +struct CUpti_ResourceData { + CUcontext context; +}; + +constexpr int CUPTI_API_ENTER = 0; +constexpr int CUPTI_API_EXIT = 0; + +struct CUpti_CallbackData { + CUcontext context; + const char* symbolName; + int callbackSite; +}; +#endif // HAS_CUPTI diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiEventApi.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiEventApi.cpp new file mode 100644 index 0000000000000000000000000000000000000000..7f1d48c1d00bb7defb6b622c13da55da99312a3b --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiEventApi.cpp @@ -0,0 +1,112 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "CuptiEventApi.h" + +#include + +#include "Logger.h" +#include "cupti_call.h" + +using namespace std::chrono; +using std::vector; + +namespace KINETO_NAMESPACE { + +CuptiEventApi::CuptiEventApi(CUcontext context) + : context_(context) { + CUPTI_CALL(cuptiGetDeviceId(context_, (uint32_t*)&device_)); +} + +CUpti_EventGroupSets* CuptiEventApi::createGroupSets( + vector& ids) { + CUpti_EventGroupSets* group_sets = nullptr; + CUptiResult res = CUPTI_CALL(cuptiEventGroupSetsCreate( + context_, sizeof(CUpti_EventID) * ids.size(), ids.data(), &group_sets)); + + if (res != CUPTI_SUCCESS || group_sets == nullptr) { + const char* errstr = nullptr; + CUPTI_CALL(cuptiGetResultString(res, &errstr)); + throw std::system_error(EINVAL, std::generic_category(), errstr); + } + + return group_sets; +} + +void CuptiEventApi::destroyGroupSets(CUpti_EventGroupSets* sets) { + CUPTI_CALL(cuptiEventGroupSetsDestroy(sets)); +} + +bool CuptiEventApi::setContinuousMode() { + // Avoid logging noise for CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED + CUptiResult res = CUPTI_CALL_NOWARN(cuptiSetEventCollectionMode( + context_, CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS)); + if (res == CUPTI_ERROR_LEGACY_PROFILER_NOT_SUPPORTED) { + return false; + } + // Log warning on other errors + CUPTI_CALL(res); + return (res == CUPTI_SUCCESS); +} + +void CuptiEventApi::enablePerInstance(CUpti_EventGroup eventGroup) { + uint32_t profile_all = 1; + CUPTI_CALL(cuptiEventGroupSetAttribute( + eventGroup, + CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES, + sizeof(profile_all), + &profile_all)); +} + +uint32_t CuptiEventApi::instanceCount(CUpti_EventGroup eventGroup) { + uint32_t instance_count = 0; + size_t s = sizeof(instance_count); + CUPTI_CALL(cuptiEventGroupGetAttribute( + eventGroup, CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT, &s, &instance_count)); + return instance_count; +} + +void CuptiEventApi::enableGroupSet(CUpti_EventGroupSet& set) { + CUptiResult res = CUPTI_CALL_NOWARN(cuptiEventGroupSetEnable(&set)); + if (res != CUPTI_SUCCESS) { + const char* errstr = nullptr; + CUPTI_CALL(cuptiGetResultString(res, &errstr)); + throw std::system_error(EIO, std::generic_category(), errstr); + } +} + +void CuptiEventApi::disableGroupSet(CUpti_EventGroupSet& set) { + CUPTI_CALL(cuptiEventGroupSetDisable(&set)); +} + +void CuptiEventApi::readEvent( + CUpti_EventGroup grp, + CUpti_EventID id, + vector& vals) { + size_t s = sizeof(int64_t) * vals.size(); + CUPTI_CALL(cuptiEventGroupReadEvent( + grp, + 
CUPTI_EVENT_READ_FLAG_NONE, + id, + &s, + reinterpret_cast(vals.data()))); +} + +vector CuptiEventApi::eventsInGroup(CUpti_EventGroup grp) { + uint32_t group_size = 0; + size_t s = sizeof(group_size); + CUPTI_CALL(cuptiEventGroupGetAttribute( + grp, CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS, &s, &group_size)); + size_t events_size = group_size * sizeof(CUpti_EventID); + vector res(group_size); + CUPTI_CALL(cuptiEventGroupGetAttribute( + grp, CUPTI_EVENT_GROUP_ATTR_EVENTS, &events_size, res.data())); + return res; +} + +CUpti_EventID CuptiEventApi::eventId(const std::string& name) { + CUpti_EventID id{0}; + CUPTI_CALL(cuptiEventGetIdFromName(device_, name.c_str(), &id)); + return id; +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiEventApi.h b/plugins/tensorboard-plugins/libkineto/src/CuptiEventApi.h new file mode 100644 index 0000000000000000000000000000000000000000..79610f93f0ecfa62a9508d4caddfa876518169d3 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiEventApi.h @@ -0,0 +1,49 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include + +namespace KINETO_NAMESPACE { + +// C++ interface to CUPTI Events C API. +// Virtual methods are here mainly to allow easier testing. +class CuptiEventApi { + public: + explicit CuptiEventApi(CUcontext context_); + virtual ~CuptiEventApi() {} + + CUdevice device() { + return device_; + } + + virtual CUpti_EventGroupSets* createGroupSets( + std::vector& ids); + virtual void destroyGroupSets(CUpti_EventGroupSets* sets); + + virtual bool setContinuousMode(); + + virtual void enablePerInstance(CUpti_EventGroup eventGroup); + virtual uint32_t instanceCount(CUpti_EventGroup eventGroup); + + virtual void enableGroupSet(CUpti_EventGroupSet& set); + virtual void disableGroupSet(CUpti_EventGroupSet& set); + + virtual void + readEvent(CUpti_EventGroup g, CUpti_EventID id, std::vector& vals); + virtual std::vector eventsInGroup(CUpti_EventGroup g); + + virtual CUpti_EventID eventId(const std::string& name); + + protected: + // Unit testing + CuptiEventApi() : context_(nullptr), device_(0) {} + + private: + CUcontext context_; + CUdevice device_; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiMetricApi.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiMetricApi.cpp new file mode 100644 index 0000000000000000000000000000000000000000..36401e7434108d1da079aa4ba0264192c5d62838 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiMetricApi.cpp @@ -0,0 +1,107 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "CuptiMetricApi.h" + +#include + +#include "Logger.h" +#include "cupti_call.h" + +using namespace std::chrono; +using std::vector; + +namespace KINETO_NAMESPACE { + +CUpti_MetricID CuptiMetricApi::idFromName(const std::string& name) { + CUpti_MetricID metric_id{~0u}; + CUptiResult res = + CUPTI_CALL(cuptiMetricGetIdFromName(device_, name.c_str(), &metric_id)); + if (res == CUPTI_ERROR_INVALID_METRIC_NAME) { + LOG(WARNING) << "Invalid metric name: " << name; + } + return metric_id; +} + +// Return a map of event IDs and names for a given metric id. +// Note that many events don't have a name. In that case the name will +// be set to the empty string. 
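+// (CUPTI reports unnamed events with the literal placeholder "event_name";
+// the loop below stores an empty string for those instead.)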
+std::map CuptiMetricApi::events( + CUpti_MetricID metric_id) { + uint32_t num_events = 0; + CUPTI_CALL(cuptiMetricGetNumEvents(metric_id, &num_events)); + vector ids(num_events); + size_t array_size = num_events * sizeof(CUpti_EventID); + CUPTI_CALL(cuptiMetricEnumEvents(metric_id, &array_size, ids.data())); + std::map res; + for (CUpti_EventID id : ids) { + // Attempt to lookup name from CUPTI + constexpr size_t kMaxEventNameLength = 64; + char cupti_name[kMaxEventNameLength]; + size_t size = kMaxEventNameLength; + CUPTI_CALL( + cuptiEventGetAttribute(id, CUPTI_EVENT_ATTR_NAME, &size, cupti_name)); + cupti_name[kMaxEventNameLength - 1] = 0; + + // CUPTI "helpfully" returns "event_name" when the event is unnamed. + if (size > 0 && strcmp(cupti_name, "event_name") != 0) { + res.emplace(id, cupti_name); + } else { + res.emplace(id, ""); + } + } + return res; +} + +CUpti_MetricValueKind CuptiMetricApi::valueKind(CUpti_MetricID metric) { + CUpti_MetricValueKind res{CUPTI_METRIC_VALUE_KIND_FORCE_INT}; + size_t value_kind_size = sizeof(res); + CUPTI_CALL(cuptiMetricGetAttribute( + metric, CUPTI_METRIC_ATTR_VALUE_KIND, &value_kind_size, &res)); + return res; +} + +CUpti_MetricEvaluationMode CuptiMetricApi::evaluationMode( + CUpti_MetricID metric) { + CUpti_MetricEvaluationMode eval_mode{ + CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE}; + size_t eval_mode_size = sizeof(eval_mode); + CUPTI_CALL(cuptiMetricGetAttribute( + metric, CUPTI_METRIC_ATTR_EVALUATION_MODE, &eval_mode_size, &eval_mode)); + return eval_mode; +} + +// FIXME: Consider caching value kind here +SampleValue CuptiMetricApi::calculate( + CUpti_MetricID metric, + CUpti_MetricValueKind kind, + vector& events, + vector& values, + int64_t duration) { + CUpti_MetricValue metric_value; + CUPTI_CALL(cuptiMetricGetValue( + device_, + metric, + events.size() * sizeof(CUpti_EventID), + events.data(), + values.size() * sizeof(int64_t), + reinterpret_cast(values.data()), + duration, + &metric_value)); + + switch (kind) { + case CUPTI_METRIC_VALUE_KIND_DOUBLE: + case CUPTI_METRIC_VALUE_KIND_PERCENT: + return SampleValue(metric_value.metricValueDouble); + case CUPTI_METRIC_VALUE_KIND_UINT64: + case CUPTI_METRIC_VALUE_KIND_INT64: + case CUPTI_METRIC_VALUE_KIND_THROUGHPUT: + return SampleValue(metric_value.metricValueUint64); + case CUPTI_METRIC_VALUE_KIND_UTILIZATION_LEVEL: + return SampleValue((int)metric_value.metricValueUtilizationLevel); + default: + assert(false); + } + return SampleValue(-1); +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiMetricApi.h b/plugins/tensorboard-plugins/libkineto/src/CuptiMetricApi.h new file mode 100644 index 0000000000000000000000000000000000000000..f45d38cd6169dc7fd30208dbb7dac09fd8a9dee5 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiMetricApi.h @@ -0,0 +1,38 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include + +#include +#include + +#include "SampleListener.h" + +namespace KINETO_NAMESPACE { + +// C++ interface to CUPTI Metrics C API. +// Virtual methods are here mainly to allow easier testing. 
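+//
+// Illustrative call sequence (sketch only; "achieved_occupancy" is just an
+// example CUPTI metric name, not something this header defines):
+//
+//   CuptiMetricApi metrics(device);
+//   CUpti_MetricID id = metrics.idFromName("achieved_occupancy");
+//   auto eventNames = metrics.events(id);        // events feeding the metric
+//   CUpti_MetricValueKind kind = metrics.valueKind(id);
+//   // ... collect event values, then metrics.calculate(id, kind, ...)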
+class CuptiMetricApi {
+ public:
+  explicit CuptiMetricApi(CUdevice device) : device_(device) {}
+  virtual ~CuptiMetricApi() {}
+
+  virtual CUpti_MetricID idFromName(const std::string& name);
+  virtual std::map<CUpti_EventID, std::string> events(CUpti_MetricID metric_id);
+
+  virtual CUpti_MetricValueKind valueKind(CUpti_MetricID metric);
+  virtual CUpti_MetricEvaluationMode evaluationMode(CUpti_MetricID metric);
+
+  virtual SampleValue calculate(
+      CUpti_MetricID metric,
+      CUpti_MetricValueKind kind,
+      std::vector<CUpti_EventID>& events,
+      std::vector<int64_t>& values,
+      int64_t duration);
+
+ private:
+  CUdevice device_;
+};
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiNvPerfMetric.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiNvPerfMetric.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..d1b08ab2c13d0615221e71f43f07c3d3fe102a2f
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiNvPerfMetric.cpp
@@ -0,0 +1,504 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#ifdef HAS_CUPTI
+#include <cuda_runtime_api.h>
+#if defined(CUDART_VERSION) && CUDART_VERSION > 10000 && CUDART_VERSION < 11040
+#include <cupti_profiler_target.h>
+#include <nvperf_host.h>
+#include <nvperf_target.h>
+#endif // cuda version > 10.00 and < 11.04
+#endif // HAS_CUPTI
+
+// TODO(T90238193)
+// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude
+#include "ScopeExit.h"
+#include "CuptiNvPerfMetric.h"
+#include "Logger.h"
+
+namespace KINETO_NAMESPACE {
+
+// Add a namespace to isolate these utility functions that are only
+// going to be used by the CuptiRangeProfiler. These include calls
+// to NVIDIA PerfWorks APIs.
+namespace nvperf {
+
+
+// Largely based on NVIDIA sample code provided with CUDA release
+// files Metric.cpp and Eval.cpp
+
+// -------------------------------------------------
+// Metric and Counter Data Configuration
+// -------------------------------------------------
+
+
+// Note: Be careful before modifying the code below. There is a specific
+// sequence one needs to follow to program the metrics, else things may
+// stop working. We tried to keep the flow consistent with the example
+// code from NVIDIA. Since most of the programmability comes from
+// the CUPTI profiler metric names this should be okay.
+
+// Only supported on CUDA RT Version between 10.0 and 11.04.
+// After CUDA RT 11.04, the structure has changed.
+// TODO update the structure NVPA_RawMetricsConfig to support 11.04 +#if defined(CUDART_VERSION) && CUDART_VERSION > 10000 && CUDART_VERSION < 11040 + +bool getRawMetricRequests( + NVPA_MetricsContext* metricsContext, + std::vector metricNames, + std::vector& rawMetricsDeps, + std::vector& rawMetricRequests) { + bool isolated = true; + /* Bug in collection with collection of metrics without instances, keep it + * to true*/ + bool keepInstances = true; + + for (const auto& metricName : metricNames) { + + NVPW_MetricsContext_GetMetricProperties_Begin_Params + getMetricPropertiesBeginParams = { + NVPW_MetricsContext_GetMetricProperties_Begin_Params_STRUCT_SIZE, nullptr}; + getMetricPropertiesBeginParams.pMetricsContext = metricsContext; + getMetricPropertiesBeginParams.pMetricName = metricName.c_str(); + + if (!NVPW_CALL( + NVPW_MetricsContext_GetMetricProperties_Begin( + &getMetricPropertiesBeginParams))) { + return false; + } + + for (const char** metricDepsIt = + getMetricPropertiesBeginParams.ppRawMetricDependencies; + *metricDepsIt; + ++metricDepsIt) { + rawMetricsDeps.push_back(*metricDepsIt); + } + + NVPW_MetricsContext_GetMetricProperties_End_Params + getMetricPropertiesEndParams = { + NVPW_MetricsContext_GetMetricProperties_End_Params_STRUCT_SIZE, nullptr}; + getMetricPropertiesEndParams.pMetricsContext = metricsContext; + + if (!NVPW_CALL(NVPW_MetricsContext_GetMetricProperties_End( + &getMetricPropertiesEndParams))) { + return false; + } + } + + for (const auto& rawMetricName : rawMetricsDeps) { + NVPA_RawMetricRequest metricRequest = {NVPA_RAW_METRIC_REQUEST_STRUCT_SIZE, nullptr}; + metricRequest.pMetricName = rawMetricName.c_str(); + metricRequest.isolated = isolated; + metricRequest.keepInstances = keepInstances; + rawMetricRequests.push_back(metricRequest); + VLOG(1) << "Adding raw metric struct : raw metric = " << rawMetricName + << " isolated = " << isolated << " keepinst = " << keepInstances; + } + + if (rawMetricRequests.size() == 0) { + LOG(WARNING) << "CUPTI Profiler was unable to configure any metrics"; + return false; + } + return true; +} + +// Setup CUPTI Profiler Config Image +bool getProfilerConfigImage( + const std::string& chipName, + const std::vector& metricNames, + std::vector& configImage, + const uint8_t* counterAvailabilityImage) { + + NVPW_CUDA_MetricsContext_Create_Params metricsContextCreateParams = { + NVPW_CUDA_MetricsContext_Create_Params_STRUCT_SIZE, nullptr}; + metricsContextCreateParams.pChipName = chipName.c_str(); + + if (!NVPW_CALL( + NVPW_CUDA_MetricsContext_Create(&metricsContextCreateParams))) { + return false; + } + + NVPW_MetricsContext_Destroy_Params metricsContextDestroyParams = { + NVPW_MetricsContext_Destroy_Params_STRUCT_SIZE, nullptr}; + metricsContextDestroyParams.pMetricsContext = + metricsContextCreateParams.pMetricsContext; + + SCOPE_EXIT([&]() { + NVPW_MetricsContext_Destroy( + (NVPW_MetricsContext_Destroy_Params*)&metricsContextDestroyParams); + }); + + // Get all raw metrics required for given metricNames list + std::vector rawMetricRequests; + + // note: we need a variable at this functions scope to hold the string + // pointers for underlying C char arrays. 
+ std::vector rawMetricDeps; + + if (!getRawMetricRequests( + metricsContextCreateParams.pMetricsContext, + metricNames, + rawMetricDeps, + rawMetricRequests)) { + return false; + } + + NVPA_RawMetricsConfigOptions metricsConfigOptions = { + NVPA_RAW_METRICS_CONFIG_OPTIONS_STRUCT_SIZE, nullptr}; + metricsConfigOptions.activityKind = NVPA_ACTIVITY_KIND_PROFILER; + metricsConfigOptions.pChipName = chipName.c_str(); + NVPA_RawMetricsConfig* rawMetricsConfig; + if (!NVPW_CALL( + NVPA_RawMetricsConfig_Create( + &metricsConfigOptions, &rawMetricsConfig))) { + return false; + } + + // TODO check if this is required + if (counterAvailabilityImage) { + NVPW_RawMetricsConfig_SetCounterAvailability_Params + setCounterAvailabilityParams = { + NVPW_RawMetricsConfig_SetCounterAvailability_Params_STRUCT_SIZE, nullptr}; + setCounterAvailabilityParams.pRawMetricsConfig = rawMetricsConfig; + setCounterAvailabilityParams.pCounterAvailabilityImage = + counterAvailabilityImage; + if (!NVPW_CALL( + NVPW_RawMetricsConfig_SetCounterAvailability( + &setCounterAvailabilityParams))) { + return false; + } + } + + NVPW_RawMetricsConfig_Destroy_Params rawMetricsConfigDestroyParams = { + NVPW_RawMetricsConfig_Destroy_Params_STRUCT_SIZE, nullptr}; + rawMetricsConfigDestroyParams.pRawMetricsConfig = rawMetricsConfig; + SCOPE_EXIT([&]() { + NVPW_RawMetricsConfig_Destroy( + (NVPW_RawMetricsConfig_Destroy_Params*)&rawMetricsConfigDestroyParams); + }); + + // Start a Raw Metric Pass group + NVPW_RawMetricsConfig_BeginPassGroup_Params beginPassGroupParams = { + NVPW_RawMetricsConfig_BeginPassGroup_Params_STRUCT_SIZE, nullptr}; + beginPassGroupParams.pRawMetricsConfig = rawMetricsConfig; + if (!NVPW_CALL( + NVPW_RawMetricsConfig_BeginPassGroup(&beginPassGroupParams))) { + return false; + } + + // Add all raw metrics + NVPW_RawMetricsConfig_AddMetrics_Params addMetricsParams = { + NVPW_RawMetricsConfig_AddMetrics_Params_STRUCT_SIZE, nullptr}; + addMetricsParams.pRawMetricsConfig = rawMetricsConfig; + addMetricsParams.pRawMetricRequests = rawMetricRequests.data(); + addMetricsParams.numMetricRequests = rawMetricRequests.size(); + if (!NVPW_CALL( + NVPW_RawMetricsConfig_AddMetrics(&addMetricsParams))) { + return false; + } + + // End pass group + NVPW_RawMetricsConfig_EndPassGroup_Params endPassGroupParams = { + NVPW_RawMetricsConfig_EndPassGroup_Params_STRUCT_SIZE, nullptr}; + endPassGroupParams.pRawMetricsConfig = rawMetricsConfig; + if (!NVPW_CALL( + NVPW_RawMetricsConfig_EndPassGroup(&endPassGroupParams))) { + return false; + } + + // Setup Config Image generation + NVPW_RawMetricsConfig_GenerateConfigImage_Params generateConfigImageParams = { + NVPW_RawMetricsConfig_GenerateConfigImage_Params_STRUCT_SIZE, nullptr}; + generateConfigImageParams.pRawMetricsConfig = rawMetricsConfig; + if (!NVPW_CALL( + NVPW_RawMetricsConfig_GenerateConfigImage(&generateConfigImageParams))) { + return false; + } + + // Get the Config Image size... 
nearly there + NVPW_RawMetricsConfig_GetConfigImage_Params getConfigImageParams = { + NVPW_RawMetricsConfig_GetConfigImage_Params_STRUCT_SIZE, nullptr}; + getConfigImageParams.pRawMetricsConfig = rawMetricsConfig; + getConfigImageParams.bytesAllocated = 0; + getConfigImageParams.pBuffer = nullptr; + if (!NVPW_CALL( + NVPW_RawMetricsConfig_GetConfigImage(&getConfigImageParams))) { + return false; + } + + configImage.resize(getConfigImageParams.bytesCopied); + + // Write the Config image binary + getConfigImageParams.bytesAllocated = configImage.size(); + getConfigImageParams.pBuffer = configImage.data(); + if (!NVPW_CALL( + NVPW_RawMetricsConfig_GetConfigImage(&getConfigImageParams))) { + return false; + } + + return true; +} + +bool getCounterDataPrefixImage( + const std::string& chipName, + const std::vector& metricNames, + std::vector& counterDataImagePrefix) { + + NVPW_CUDA_MetricsContext_Create_Params metricsContextCreateParams = { + NVPW_CUDA_MetricsContext_Create_Params_STRUCT_SIZE, nullptr}; + metricsContextCreateParams.pChipName = chipName.c_str(); + + if (!NVPW_CALL( + NVPW_CUDA_MetricsContext_Create(&metricsContextCreateParams))) { + return false; + } + + NVPW_MetricsContext_Destroy_Params metricsContextDestroyParams = { + NVPW_MetricsContext_Destroy_Params_STRUCT_SIZE, nullptr}; + metricsContextDestroyParams.pMetricsContext = + metricsContextCreateParams.pMetricsContext; + + + SCOPE_EXIT([&]() { + NVPW_MetricsContext_Destroy( + (NVPW_MetricsContext_Destroy_Params*)&metricsContextDestroyParams); + }); + + // Get all raw metrics required for given metricNames list + std::vector rawMetricRequests; + + // note: we need a variable at this functions scope to hold the string + // pointers for underlying C char arrays. + std::vector rawMetricDeps; + + if (!getRawMetricRequests( + metricsContextCreateParams.pMetricsContext, + metricNames, + rawMetricDeps, + rawMetricRequests)) { + return false; + } + + // Setup Counter Data builder + NVPW_CounterDataBuilder_Create_Params counterDataBuilderCreateParams = { + NVPW_CounterDataBuilder_Create_Params_STRUCT_SIZE, nullptr}; + counterDataBuilderCreateParams.pChipName = chipName.c_str(); + if (!NVPW_CALL( + NVPW_CounterDataBuilder_Create(&counterDataBuilderCreateParams))) { + return false; + } + + NVPW_CounterDataBuilder_Destroy_Params counterDataBuilderDestroyParams = { + NVPW_CounterDataBuilder_Destroy_Params_STRUCT_SIZE, nullptr}; + counterDataBuilderDestroyParams.pCounterDataBuilder = + counterDataBuilderCreateParams.pCounterDataBuilder; + SCOPE_EXIT([&]() { + NVPW_CounterDataBuilder_Destroy(( + NVPW_CounterDataBuilder_Destroy_Params*)&counterDataBuilderDestroyParams); + }); + + // Add metrics to counter data image prefix + NVPW_CounterDataBuilder_AddMetrics_Params addMetricsParams = { + NVPW_CounterDataBuilder_AddMetrics_Params_STRUCT_SIZE, nullptr}; + addMetricsParams.pCounterDataBuilder = + counterDataBuilderCreateParams.pCounterDataBuilder; + addMetricsParams.pRawMetricRequests = rawMetricRequests.data(); + addMetricsParams.numMetricRequests = rawMetricRequests.size(); + if (!NVPW_CALL( + NVPW_CounterDataBuilder_AddMetrics(&addMetricsParams))) { + return false; + } + + // Get image prefix size + NVPW_CounterDataBuilder_GetCounterDataPrefix_Params + getCounterDataPrefixParams = { + NVPW_CounterDataBuilder_GetCounterDataPrefix_Params_STRUCT_SIZE, nullptr}; + getCounterDataPrefixParams.pCounterDataBuilder = + counterDataBuilderCreateParams.pCounterDataBuilder; + getCounterDataPrefixParams.bytesAllocated = 0; + 
getCounterDataPrefixParams.pBuffer = nullptr; + if (!NVPW_CALL( + NVPW_CounterDataBuilder_GetCounterDataPrefix( + &getCounterDataPrefixParams))) { + return false; + } + + counterDataImagePrefix.resize(getCounterDataPrefixParams.bytesCopied); + + // Now write counter data image prefix + getCounterDataPrefixParams.bytesAllocated = counterDataImagePrefix.size(); + getCounterDataPrefixParams.pBuffer = counterDataImagePrefix.data(); + if (!NVPW_CALL( + NVPW_CounterDataBuilder_GetCounterDataPrefix( + &getCounterDataPrefixParams))) { + return false; + } + + return true; +} + +// ------------------------------------------------- +// Metric and Counter Evaluation Utilities +// ------------------------------------------------- + +std::string getRangeDescription( + const std::vector& counterDataImage, + int rangeIndex) { + std::vector descriptionPtrs; + + NVPW_Profiler_CounterData_GetRangeDescriptions_Params getRangeDescParams = { + NVPW_Profiler_CounterData_GetRangeDescriptions_Params_STRUCT_SIZE, nullptr}; + getRangeDescParams.pCounterDataImage = counterDataImage.data(); + getRangeDescParams.rangeIndex = rangeIndex; + + if (!NVPW_CALL( + NVPW_Profiler_CounterData_GetRangeDescriptions(&getRangeDescParams))) { + return ""; + } + + descriptionPtrs.resize(getRangeDescParams.numDescriptions); + getRangeDescParams.ppDescriptions = descriptionPtrs.data(); + + if (!NVPW_CALL( + NVPW_Profiler_CounterData_GetRangeDescriptions(&getRangeDescParams))) { + return ""; + } + + std::string rangeName; + + for (size_t i = 0; i < getRangeDescParams.numDescriptions; i++) { + if (i > 0) { + rangeName.append("/"); + } + rangeName.append(descriptionPtrs[i]); + } + return rangeName; +} + +CuptiProfilerResult evalMetricValues( + const std::string& chipName, + const std::vector& counterDataImage, + const std::vector& metricNames, + bool verbose) { + + if (!counterDataImage.size()) { + LOG(ERROR) << "Counter Data Image is empty!"; + return {}; + } + + NVPW_CUDA_MetricsContext_Create_Params metricsContextCreateParams = { + NVPW_CUDA_MetricsContext_Create_Params_STRUCT_SIZE, nullptr}; + metricsContextCreateParams.pChipName = chipName.c_str(); + if (!NVPW_CALL( + NVPW_CUDA_MetricsContext_Create(&metricsContextCreateParams))) { + return {}; + } + + NVPW_MetricsContext_Destroy_Params metricsContextDestroyParams = { + NVPW_MetricsContext_Destroy_Params_STRUCT_SIZE, nullptr}; + metricsContextDestroyParams.pMetricsContext = + metricsContextCreateParams.pMetricsContext; + SCOPE_EXIT([&]() { + NVPW_MetricsContext_Destroy( + (NVPW_MetricsContext_Destroy_Params*)&metricsContextDestroyParams); + }); + + NVPW_CounterData_GetNumRanges_Params getNumRangesParams = { + NVPW_CounterData_GetNumRanges_Params_STRUCT_SIZE, nullptr}; + getNumRangesParams.pCounterDataImage = counterDataImage.data(); + if (!NVPW_CALL( + NVPW_CounterData_GetNumRanges(&getNumRangesParams))) { + return {}; + } + + // TBD in the future support special chars in metric name + // for now these are default + const bool isolated = true; + + // API takes a 2D array of chars + std::vector metricNamePtrs; + + for (const auto& metric : metricNames) { + metricNamePtrs.push_back(metric.c_str()); + } + + CuptiProfilerResult result{ + .metricNames = metricNames}; + + for (size_t rangeIndex = 0; rangeIndex < getNumRangesParams.numRanges; + ++rangeIndex) { + + CuptiRangeMeasurement rangeData { + .rangeName = getRangeDescription(counterDataImage, rangeIndex)}; + rangeData.values.resize(metricNames.size()); + + // First set Counter data image with current range + 
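+      // (binding the image at rangeIndex scopes the EvaluateToGpuValues
+      //  call below to this range's counter values)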
      NVPW_MetricsContext_SetCounterData_Params setCounterDataParams = {
+          NVPW_MetricsContext_SetCounterData_Params_STRUCT_SIZE, nullptr};
+
+      setCounterDataParams.pMetricsContext =
+          metricsContextCreateParams.pMetricsContext;
+      setCounterDataParams.pCounterDataImage = counterDataImage.data();
+      setCounterDataParams.isolated = isolated;
+      setCounterDataParams.rangeIndex = rangeIndex;
+
+      NVPW_CALL(NVPW_MetricsContext_SetCounterData(&setCounterDataParams));
+
+
+      // Now we can evaluate GPU metrics
+      NVPW_MetricsContext_EvaluateToGpuValues_Params evalToGpuParams = {
+          NVPW_MetricsContext_EvaluateToGpuValues_Params_STRUCT_SIZE, nullptr};
+      evalToGpuParams.pMetricsContext =
+          metricsContextCreateParams.pMetricsContext;
+      evalToGpuParams.numMetrics = metricNamePtrs.size();
+      evalToGpuParams.ppMetricNames = metricNamePtrs.data();
+      evalToGpuParams.pMetricValues = rangeData.values.data();
+
+      if (!NVPW_CALL(NVPW_MetricsContext_EvaluateToGpuValues(&evalToGpuParams))) {
+        LOG(WARNING) << "Failed to evaluate metrics for range : "
+                     << rangeData.rangeName;
+        continue;
+      }
+
+      if (verbose) {
+        for (size_t i = 0; i < metricNames.size(); i++) {
+          LOG(INFO) << "rangeName: " << rangeData.rangeName
+                    << "\tmetricName: " << metricNames[i]
+                    << "\tgpuValue: " << rangeData.values[i];
+        }
+      }
+
+      result.rangeVals.emplace_back(std::move(rangeData));
+  }
+
+  return result;
+}
+
+#else
+
+bool getProfilerConfigImage(
+    const std::string& /*chipName*/,
+    const std::vector<std::string>& /*metricNames*/,
+    std::vector<uint8_t>& /*configImage*/,
+    const uint8_t* /*counterAvailabilityImage*/) {
+  return false;
+}
+
+bool getCounterDataPrefixImage(
+    const std::string& /*chipName*/,
+    const std::vector<std::string>& /*metricNames*/,
+    std::vector<uint8_t>& /*counterDataImagePrefix*/) {
+  return false;
+}
+
+CuptiProfilerResult evalMetricValues(
+    const std::string& /*chipName*/,
+    const std::vector<uint8_t>& /*counterDataImage*/,
+    const std::vector<std::string>& /*metricNames*/,
+    bool /*verbose*/) {
+  return {};
+}
+
+#endif // cuda version > 10.00 and < 11.04
+
+} // namespace nvperf
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiNvPerfMetric.h b/plugins/tensorboard-plugins/libkineto/src/CuptiNvPerfMetric.h
new file mode 100644
index 0000000000000000000000000000000000000000..d5dd1b1c1d20b066891f8be679e6d6371d4f4a9b
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiNvPerfMetric.h
@@ -0,0 +1,71 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+ +#pragma once + +#include +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "Logger.h" + +namespace KINETO_NAMESPACE { + +struct CuptiRangeMeasurement { + std::string rangeName; + std::vector values; +}; + +struct CuptiProfilerResult { + std::vector metricNames; + // rangeName, list values + std::vector rangeVals; +}; + +/* Utilities for CUPTI and NVIDIA PerfWorks Metric API + */ + +#define NVPW_CALL(call) \ + [&]() -> bool { \ + NVPA_Status _status_ = call; \ + if (_status_ != NVPA_STATUS_SUCCESS) { \ + LOG(WARNING) << fmt::format( \ + "function {} failed with error ({})", \ + #call, \ + (int)_status_); \ + return false; \ + } \ + return true; \ + }() + +// fixme - add a results string +// nvpperfGetResultString(_status_, &_errstr_); + +namespace nvperf { + +// Setup CUPTI profiler configuration blob and counter data image prefix +bool getProfilerConfigImage( + const std::string& chipName, + const std::vector& metricNames, + std::vector& configImage, + const uint8_t* counterAvailabilityImage = nullptr); + +// Setup CUPTI profiler configuration blob and counter data image prefix +bool getCounterDataPrefixImage( + const std::string& chipName, + const std::vector& metricNames, + std::vector& counterDataImagePrefix); + +/* NV Perf Metric Evaluation helpers + * - utilities to read binary data and obtain metrics for ranges + */ +CuptiProfilerResult evalMetricValues( + const std::string& chipName, + const std::vector& counterDataImage, + const std::vector& metricNames, + bool verbose = false); + + +} // namespace nvperf +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerApi.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerApi.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e5f18ed7b0b70963eb2deab126ff4f7119ed582b --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerApi.cpp @@ -0,0 +1,751 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include +#include +#ifdef HAS_CUPTI +#include +#include +#endif // HAS_CUPTI +#include +#include + +#ifdef HAS_CUPTI +#include "cupti_call.h" +#endif + +#include "time_since_epoch.h" +#include "Logger.h" +#include "Demangle.h" + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "CuptiCallbackApiMock.h" +#include "CuptiRangeProfilerApi.h" + +#if HAS_CUPTI_RANGE_PROFILER +#include +#include +#include "cupti_call.h" +#endif // HAS_CUPTI_RANGE_PROFILER + +namespace KINETO_NAMESPACE { + +#if HAS_CUPTI_RANGE_PROFILER +constexpr char kRootUserRangeName[] = "__profile__"; +constexpr int kCallbacksCountToFlush = 500; + +// Should we set Counter availability image ourselves? 
+// Disabled this right now as this call conflicts with DCGM +// It is not clear why it should conflict except it being a profiler API call +// TODO Revisit +constexpr bool kSetCounterAvail = false; + +// Shared state to track one Cupti Profiler API per Device +namespace { +// per device profiler maps +std::unordered_map profiler_map; +std::unordered_map enable_flag; +std::unordered_map disable_flag; + +std::mutex contextMutex_; +std::unordered_map ctx_to_dev; +std::set active_devices; +} + +// forward declarations +void __trackCudaCtx(CUcontext ctx, uint32_t device_id, CUpti_CallbackId cbid); +void __trackCudaKernelLaunch(CUcontext ctx, const char* kernelName); + +/// Helper functions + +// Available raw counters +std::vector getCounterAvailiability(CUcontext cuContext) { + std::vector counterAvailabilityImage; + CUpti_Profiler_GetCounterAvailability_Params getCounterAvailabilityParams = { + CUpti_Profiler_GetCounterAvailability_Params_STRUCT_SIZE, nullptr}; + getCounterAvailabilityParams.ctx = cuContext; + CUPTI_CALL( + cuptiProfilerGetCounterAvailability(&getCounterAvailabilityParams)); + + counterAvailabilityImage.clear(); + counterAvailabilityImage.resize( + getCounterAvailabilityParams.counterAvailabilityImageSize); + + getCounterAvailabilityParams.pCounterAvailabilityImage = + counterAvailabilityImage.data(); + CUPTI_CALL( + cuptiProfilerGetCounterAvailability(&getCounterAvailabilityParams)); + + return counterAvailabilityImage; +} + +std::string getChipName(int deviceId) { + // Get chip name for the cuda device + CUpti_Device_GetChipName_Params getChipNameParams = { + CUpti_Device_GetChipName_Params_STRUCT_SIZE, nullptr}; + + getChipNameParams.deviceIndex = deviceId; + CUPTI_CALL(cuptiDeviceGetChipName(&getChipNameParams)); + + return getChipNameParams.pChipName; +} + +inline uint32_t getDevID(CUcontext ctx) { + uint32_t device_id = UINT32_MAX; + CUPTI_CALL(cuptiGetDeviceId(ctx, &device_id)); + if (device_id == UINT32_MAX) { + LOG(ERROR) << "Could not determine dev id for = " << ctx; + } + return device_id; +} + +// We use CUPTI Callback functions in three ways : +// 1. Track cuda contexts and maintain a list of active GPUs to profile +// 2. Callbacks on kernel launches to track the name of automatic +// ranges that correspond to names of kernels +// 3. Lastly CUPTI profiler has to be enabled on the same thread executing +// the CUDA kernels. We use Callbacks to enable the profiler +// asynchronously from another thread. 
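+//
+// Illustrative async flow (sketch using names from this file):
+//   auto* session = profiler_map[device_id];   // populated by the ctor below
+//   session->asyncStartAndEnable(CUPTI_AutoRange, CUPTI_KernelReplay);
+//   // ... CUDA kernels run; the launch callback sees enable_flag set and
+//   //     calls startAndEnable() on the launching thread ...
+//   session->asyncDisableAndStop();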
+ +void disableKernelCallbacks(); + +void trackCudaCtx( + CUpti_CallbackDomain /*domain*/, + CUpti_CallbackId cbid, + const CUpti_CallbackData* cbInfo) { + auto *d = reinterpret_cast(cbInfo); + auto ctx = d->context; + uint32_t device_id = getDevID(ctx); + + if (device_id == UINT32_MAX) { + return; + } + + __trackCudaCtx(ctx, device_id, cbid); +} + +void __trackCudaCtx(CUcontext ctx, uint32_t device_id, CUpti_CallbackId cbid) { + std::lock_guard g(contextMutex_); + if (cbid == CUPTI_CBID_RESOURCE_CONTEXT_CREATED) { + VLOG(0) << "CUPTI Profiler observed CUDA Context created = " + << ctx << " device id = " << device_id; + active_devices.insert(device_id); + if constexpr (kSetCounterAvail) { + if (active_devices.size() == 1) { + CuptiRBProfilerSession::setCounterAvailabilityImage( + getCounterAvailiability(ctx)); + } + } + ctx_to_dev[ctx] = device_id; + + } else if (cbid == CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING) { + VLOG(0) << "CUPTI Profiler observed CUDA Context destroyed = " + << ctx << " device id = " << device_id; + auto it = active_devices.find(device_id); + if (it != active_devices.end()) { + active_devices.erase(it); + ctx_to_dev.erase(ctx); + } + } +} + +void trackCudaKernelLaunch( + CUpti_CallbackDomain /*domain*/, + CUpti_CallbackId /*cbid*/, + const CUpti_CallbackData* cbInfo) { + VLOG(1) << " Trace : Callback name = " + << (cbInfo->symbolName ? cbInfo->symbolName: "") + << " context ptr = " << cbInfo->context; + auto ctx = cbInfo->context; + // should be in CUPTI_API_ENTER call site + if (cbInfo->callbackSite != CUPTI_API_ENTER) { + return; + } + __trackCudaKernelLaunch(ctx, cbInfo->symbolName); +} + +void __trackCudaKernelLaunch( + CUcontext ctx, + const char* kernelName) { + VLOG(0) << " Tracking kernel name = " << (kernelName ? kernelName : "") + << " context ptr = " << ctx; + + uint32_t device_id = 0; + auto it = ctx_to_dev.find(ctx); + if (it == ctx_to_dev.end()) { + // Warning here could be too noisy + VLOG(0) << " Could not find corresponding device to ctx = " << ctx; + return; + } else { + device_id = it->second; + } + + auto pit = profiler_map.find(device_id); + if (pit == profiler_map.end() || pit->second == nullptr) { + return; + } + auto profiler = pit->second; + + if (enable_flag[device_id]) { + LOG(INFO) << "Callback handler is enabling cupti profiler"; + profiler->startAndEnable(); + enable_flag[device_id] = false; + + } else if (disable_flag[device_id]) { + LOG(INFO) << "Callback handler is disabling cupti profiler"; + profiler->disableAndStop(); + return; + } + + if (profiler->curRange_ == CUPTI_AutoRange) { + profiler->logKernelName(kernelName ? 
kernelName : "__missing__"); + } + + /* TODO add per kernel time logging + if (measure_per_kernel) { + profiler->kernelStartTs_.push_back( + std::chrono::high_resolution_clock::now()); + } + */ + + // periodically flush profiler data from GPU + if (profiler->numCallbacks_ % kCallbacksCountToFlush == 0) { + profiler->flushCounterData(); + } + profiler->numCallbacks_++; +} + +void enableKernelCallbacks() { + auto& cbapi = CuptiCallbackApi::singleton(); + bool status = cbapi.enableCallback( + CUPTI_CB_DOMAIN_RUNTIME_API, + CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000); + if (!status) { + LOG(WARNING) << "CUPTI Range Profiler unable to " + << "enable cuda kernel launch callback"; + return; + } + LOG(INFO) << "CUPTI Profiler kernel callbacks enabled"; +} + +void disableKernelCallbacks() { + auto& cbapi = CuptiCallbackApi::singleton(); + bool status = cbapi.disableCallback( + CUPTI_CB_DOMAIN_RUNTIME_API, + CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000); + if (!status) { + LOG(WARNING) << "CUPTI Range Profiler unable to " + << "disable cuda kernel launch callback"; + return; + } + LOG(INFO) << "CUPTI Profiler kernel callbacks disabled"; +} + +// static +std::set CuptiRBProfilerSession::getActiveDevices() { + std::lock_guard g(contextMutex_); + return active_devices; +} + +// static +void CuptiRBProfilerSession::initCupti() { + CUpti_Profiler_Initialize_Params profilerInitializeParams = { + CUpti_Profiler_Initialize_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerInitialize(&profilerInitializeParams)); +} + +// static +void CuptiRBProfilerSession::deInitCupti() { + CUpti_Profiler_DeInitialize_Params profilerDeInitializeParams = { + CUpti_Profiler_DeInitialize_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerDeInitialize(&profilerDeInitializeParams)); +} + +// static +void CuptiRBProfilerSession::staticInit() { + CuptiRBProfilerSession::initCupti(); + + // Register CUPTI callbacks + auto& cbapi = CuptiCallbackApi::singleton(); + CUpti_CallbackDomain domain = CUPTI_CB_DOMAIN_RESOURCE; + bool status = cbapi.registerCallback( + domain, CuptiCallbackApi::RESOURCE_CONTEXT_CREATED, trackCudaCtx); + status = status && cbapi.registerCallback( + domain, CuptiCallbackApi::RESOURCE_CONTEXT_DESTROYED, trackCudaCtx); + status = status && cbapi.enableCallback( + domain, CUPTI_CBID_RESOURCE_CONTEXT_CREATED); + status = status && cbapi.enableCallback( + domain, CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING); + + if (!status) { + LOG(WARNING) << "CUPTI Range Profiler unable to attach cuda context " + << "create and destroy callbacks"; + CUPTI_CALL(cbapi.getCuptiStatus()); + return; + } + + domain = CUPTI_CB_DOMAIN_RUNTIME_API; + status = cbapi.registerCallback( + domain, CuptiCallbackApi::CUDA_LAUNCH_KERNEL, trackCudaKernelLaunch); + + if (!status) { + LOG(WARNING) << "CUPTI Range Profiler unable to attach cuda kernel " + << "launch callback"; + return; + } +} + +// static +std::vector& CuptiRBProfilerSession::counterAvailabilityImage() { + static std::vector counterAvailabilityImage_; + return counterAvailabilityImage_; +} + + +// Setup the profiler sessions +CuptiRBProfilerSession::CuptiRBProfilerSession( + const std::vector& metricNames, + int deviceId, + int maxRanges, + int numNestingLevels, + CUcontext cuContext) + : metricNames_(metricNames), + chipName_(getChipName(deviceId)), + deviceId_(deviceId), + maxRanges_(maxRanges), + numNestingLevels_(numNestingLevels), + cuContext_(cuContext) { + CuptiRBProfilerSession::initCupti(); + + LOG(INFO) << "Initializing CUPTI profiler session : device 
= " << deviceId + << " chip = " << chipName_; + /* Generate configuration for metrics, this can also be done offline*/ + NVPW_InitializeHost_Params initializeHostParams = { + NVPW_InitializeHost_Params_STRUCT_SIZE, nullptr}; + NVPW_CALL(NVPW_InitializeHost(&initializeHostParams)); + + if (metricNames.size()) { + if (!nvperf::getProfilerConfigImage( + chipName_, + metricNames, + configImage, + CuptiRBProfilerSession::counterAvailabilityImage().data())) { + LOG(ERROR) << "Failed to create configImage or counterDataImagePrefix"; + return; + } + if (!nvperf::getCounterDataPrefixImage( + chipName_, + metricNames, + counterDataImagePrefix)) { + LOG(ERROR) << "Failed to create counterDataImagePrefix"; + return; + } + } else { + LOG(ERROR) << "No metrics provided to profile"; + return; + } + + if (!createCounterDataImage()) { + LOG(ERROR) << "Failed to create counterDataImage"; + return; + } + + LOG(INFO) << "Size of structs\n" + << " config image size = " << configImage.size() << " B" + << " counter data image prefix = " + << counterDataImagePrefix.size() << " B" + << " counter data image size = " << counterDataImage.size() / 1024 + << " KB" + << " counter sb image size = " + << counterDataScratchBuffer.size() << " B"; + + beginPassParams_ = {CUpti_Profiler_BeginPass_Params_STRUCT_SIZE, nullptr}; + endPassParams_ = {CUpti_Profiler_EndPass_Params_STRUCT_SIZE, nullptr}; + + initSuccess_ = true; + profiler_map[deviceId] = this; +} + +// used in unittests only +CuptiRBProfilerSession::CuptiRBProfilerSession(int deviceId, CUcontext ctx) + : deviceId_(deviceId), cuContext_(ctx) { + initSuccess_ = true; + profiler_map[deviceId] = this; +} + +void CuptiRBProfilerSession::startInternal( + CUpti_ProfilerRange profilerRange, + CUpti_ProfilerReplayMode profilerReplayMode) { + LOG(INFO) << "Starting profiler session: profiler range = " + << ((profilerRange == CUPTI_AutoRange) ? "autorange" : "userrange") + << " replay mode = " + << ((profilerReplayMode == CUPTI_KernelReplay) ? 
"kernel" : "user"); + if (!initSuccess_) { + LOG(WARNING) << __func__ << "() bailing out since initialization failed"; + return; + } + + if (cuContext_ == nullptr) { + for (const auto& it : ctx_to_dev) { + if (it.second == deviceId_) { + cuContext_ = it.first; + break; + } + } + LOG(INFO) << " Cupti Profiler using CUDA context = " << cuContext_; + } + + profilerStartTs_ = std::chrono::high_resolution_clock::now(); + curRange_ = profilerRange; + curReplay_ = profilerReplayMode; + + CUpti_Profiler_BeginSession_Params beginSessionParams = { + CUpti_Profiler_BeginSession_Params_STRUCT_SIZE, nullptr}; + + beginSessionParams.ctx = cuContext_; + beginSessionParams.counterDataImageSize = counterDataImage.size(); + beginSessionParams.pCounterDataImage = counterDataImage.data(); + beginSessionParams.counterDataScratchBufferSize = + counterDataScratchBuffer.size(); + beginSessionParams.pCounterDataScratchBuffer = counterDataScratchBuffer.data(); + beginSessionParams.range = profilerRange; + beginSessionParams.replayMode = profilerReplayMode; + beginSessionParams.maxRangesPerPass = maxRanges_; + beginSessionParams.maxLaunchesPerPass = maxRanges_; + + auto status = CUPTI_CALL(cuptiProfilerBeginSession(&beginSessionParams)); + if (status != CUPTI_SUCCESS) { + LOG(WARNING) << "Failed to start CUPTI profiler"; + initSuccess_ = false; + return; + } + + // Set counter configuration + CUpti_Profiler_SetConfig_Params setConfigParams = { + CUpti_Profiler_SetConfig_Params_STRUCT_SIZE, nullptr}; + + setConfigParams.ctx = cuContext_; + setConfigParams.pConfig = configImage.data(); + setConfigParams.configSize = configImage.size(); + setConfigParams.passIndex = 0; + setConfigParams.minNestingLevel = 1; + setConfigParams.numNestingLevels = numNestingLevels_; + status = CUPTI_CALL(cuptiProfilerSetConfig(&setConfigParams)); + + if (status != CUPTI_SUCCESS) { + LOG(WARNING) << "Failed to configure CUPTI profiler"; + initSuccess_ = false; + return; + } + profilerInitDoneTs_ = std::chrono::high_resolution_clock::now(); + + if (curRange_ == CUPTI_AutoRange) { + enableKernelCallbacks(); + } + profilingActive_ = true; +} + +void CuptiRBProfilerSession::stop() { + if (!initSuccess_) { + LOG(WARNING) << __func__ << "() bailing out since initialization failed"; + return; + } + LOG(INFO) << "Stop profiler session on device = " << deviceId_; + + CUpti_Profiler_UnsetConfig_Params unsetConfigParams = { + CUpti_Profiler_UnsetConfig_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerUnsetConfig(&unsetConfigParams)); + + CUpti_Profiler_EndSession_Params endSessionParams = { + CUpti_Profiler_EndSession_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerEndSession(&endSessionParams)); + + disableKernelCallbacks(); + + profilerStopTs_ = std::chrono::high_resolution_clock::now(); + profilingActive_ = false; +} + +void CuptiRBProfilerSession::beginPass() { + if (!initSuccess_) { + LOG(WARNING) << __func__ << "() bailing out since initialization failed"; + return; + } + CUPTI_CALL(cuptiProfilerBeginPass(&beginPassParams_)); +} + +bool CuptiRBProfilerSession::endPass() { + if (!initSuccess_) { + LOG(WARNING) << __func__ << "() bailing out since initialization failed"; + return true; + } + CUPTI_CALL(cuptiProfilerEndPass(&endPassParams_)); + return endPassParams_.allPassesSubmitted; +} + +void CuptiRBProfilerSession::flushCounterData() { + LOG(INFO) << "Flushing counter data on device = " << deviceId_; + CUpti_Profiler_FlushCounterData_Params flushCounterDataParams = { + CUpti_Profiler_FlushCounterData_Params_STRUCT_SIZE, 
nullptr}; + CUPTI_CALL(cuptiProfilerFlushCounterData(&flushCounterDataParams)); +} + +/// Enable and disable the profiler +void CuptiRBProfilerSession::enable() { + if (!initSuccess_) { + LOG(WARNING) << __func__ << "() bailing out since initialization failed"; + return; + } + CUpti_Profiler_EnableProfiling_Params enableProfilingParams = { + CUpti_Profiler_EnableProfiling_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerEnableProfiling(&enableProfilingParams)); +} + +void CuptiRBProfilerSession::disable() { + if (!initSuccess_) { + LOG(WARNING) << __func__ << "() bailing out since initialization failed"; + return; + } + CUpti_Profiler_DisableProfiling_Params disableProfilingParams = { + CUpti_Profiler_DisableProfiling_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerDisableProfiling(&disableProfilingParams)); +} + +/// User range based profiling +void CuptiRBProfilerSession::pushRange(const std::string& rangeName) { + LOG(INFO) << " CUPTI pushrange ( " << rangeName << " )"; + CUpti_Profiler_PushRange_Params pushRangeParams = { + CUpti_Profiler_PushRange_Params_STRUCT_SIZE, nullptr}; + pushRangeParams.pRangeName = rangeName.c_str(); + CUPTI_CALL(cuptiProfilerPushRange(&pushRangeParams)); +} + +void CuptiRBProfilerSession::popRange() { + LOG(INFO) << " CUPTI pop range"; + CUpti_Profiler_PopRange_Params popRangeParams = { + CUpti_Profiler_PopRange_Params_STRUCT_SIZE, nullptr}; + CUPTI_CALL(cuptiProfilerPopRange(&popRangeParams)); +} + +void CuptiRBProfilerSession::startAndEnable() { + startInternal(curRange_, curReplay_); + if (curReplay_ == CUPTI_UserReplay) { + beginPass(); + } + enable(); + if (curRange_ == CUPTI_UserRange) { + pushRange(kRootUserRangeName); + } + enable_flag[deviceId_] = false; +} + +void CuptiRBProfilerSession::disableAndStop() { + if (curRange_ == CUPTI_UserRange) { + popRange(); + } + disable(); + if (curReplay_ == CUPTI_UserReplay) { + endPass(); + flushCounterData(); + } + stop(); + disable_flag[deviceId_] = false; +} + +void CuptiRBProfilerSession::asyncStartAndEnable( + CUpti_ProfilerRange profilerRange, + CUpti_ProfilerReplayMode profilerReplayMode) { + LOG(INFO) << "Starting CUPTI profiler asynchronously on device = " + << deviceId_ << " profiler range = " + << ((profilerRange == CUPTI_AutoRange) ? "autorange" : "userrange") + << " replay mode = " + << ((profilerReplayMode == CUPTI_KernelReplay) ? 
"kernel" : "user"); + curReplay_ = profilerReplayMode; + curRange_ = profilerRange; + enable_flag[deviceId_] = true; + enableKernelCallbacks(); +} + +void CuptiRBProfilerSession::asyncDisableAndStop() { + LOG(INFO) << "Stopping CUPTI profiler asynchronously on device = " + << deviceId_ << " cu context = " << cuContext_; + disable_flag[deviceId_] = true; +} + + +CuptiProfilerResult CuptiRBProfilerSession::evaluateMetrics( + bool verbose) { + if (!initSuccess_) { + LOG(WARNING) << "Profiling failed, no results to return"; + return {}; + } + if (profilingActive_) { + disableAndStop(); + } + + LOG(INFO) << "Total kernels logged = " << kernelNames_.size(); + if (verbose) { + for (const auto& kernel : kernelNames_) { + std::cout << demangle(kernel) << std::endl; + } + LOG(INFO) << "Profiler Range data : "; + } + + auto results = nvperf::evalMetricValues( + chipName_, counterDataImage, metricNames_, verbose /*verbose*/); + + // profiler end-end duration + auto duration_ms = std::chrono::duration_cast( + profilerStopTs_ - profilerStartTs_); + + auto init_dur_ms = std::chrono::duration_cast( + profilerInitDoneTs_ - profilerStartTs_); + LOG(INFO) << "Total profiler time = " << duration_ms.count() << " ms"; + LOG(INFO) << "Total profiler init time = " << init_dur_ms.count() << " ms"; + + return results; +} + +std::unique_ptr CuptiRBProfilerSession::getProfilerTraceSpan() { + return std::make_unique( + timeSinceEpoch(profilerStartTs_), + timeSinceEpoch(profilerStopTs_), + "__cupti_profiler__" + ); +} + +void CuptiRBProfilerSession::saveCounterData( + const std::string& /*CounterDataFileName*/, + const std::string& /*CounterDataSBFileName*/) { + /* TBD write binary files for counter data and counter scratch buffer */ +} + +/// Setup counter data +bool CuptiRBProfilerSession::createCounterDataImage() { + CUpti_Profiler_CounterDataImageOptions counterDataImageOptions; + counterDataImageOptions.pCounterDataPrefix = counterDataImagePrefix.data(); + counterDataImageOptions.counterDataPrefixSize = counterDataImagePrefix.size(); + counterDataImageOptions.maxNumRanges = maxRanges_; + counterDataImageOptions.maxNumRangeTreeNodes = maxRanges_; + counterDataImageOptions.maxRangeNameLength = 64; + + // Calculate size of counter data image + CUpti_Profiler_CounterDataImage_CalculateSize_Params calculateSizeParams = { + CUpti_Profiler_CounterDataImage_CalculateSize_Params_STRUCT_SIZE, nullptr}; + calculateSizeParams.pOptions = &counterDataImageOptions; + calculateSizeParams.sizeofCounterDataImageOptions = + CUpti_Profiler_CounterDataImageOptions_STRUCT_SIZE; + + CUPTI_CALL( + cuptiProfilerCounterDataImageCalculateSize(&calculateSizeParams)); + counterDataImage.resize(calculateSizeParams.counterDataImageSize); + + // Initialize counter data image + CUpti_Profiler_CounterDataImage_Initialize_Params initializeParams = { + CUpti_Profiler_CounterDataImage_Initialize_Params_STRUCT_SIZE, nullptr}; + initializeParams.sizeofCounterDataImageOptions = + CUpti_Profiler_CounterDataImageOptions_STRUCT_SIZE; + initializeParams.pOptions = &counterDataImageOptions; + initializeParams.counterDataImageSize = + calculateSizeParams.counterDataImageSize; + initializeParams.pCounterDataImage = counterDataImage.data(); + CUPTI_CALL(cuptiProfilerCounterDataImageInitialize(&initializeParams)); + + // Calculate counter Scratch Buffer size + CUpti_Profiler_CounterDataImage_CalculateScratchBufferSize_Params + scratchBufferSizeParams = { + CUpti_Profiler_CounterDataImage_CalculateScratchBufferSize_Params_STRUCT_SIZE, nullptr}; + + 
scratchBufferSizeParams.counterDataImageSize = + calculateSizeParams.counterDataImageSize; + scratchBufferSizeParams.pCounterDataImage = + initializeParams.pCounterDataImage; + CUPTI_CALL(cuptiProfilerCounterDataImageCalculateScratchBufferSize( + &scratchBufferSizeParams)); + + counterDataScratchBuffer.resize( + scratchBufferSizeParams.counterDataScratchBufferSize); + + // Initialize scratch buffer + CUpti_Profiler_CounterDataImage_InitializeScratchBuffer_Params + initScratchBufferParams = { + CUpti_Profiler_CounterDataImage_InitializeScratchBuffer_Params_STRUCT_SIZE, nullptr}; + + initScratchBufferParams.counterDataImageSize = + calculateSizeParams.counterDataImageSize; + + initScratchBufferParams.pCounterDataImage = + initializeParams.pCounterDataImage; + initScratchBufferParams.counterDataScratchBufferSize = + scratchBufferSizeParams.counterDataScratchBufferSize; + initScratchBufferParams.pCounterDataScratchBuffer = + counterDataScratchBuffer.data(); + + CUPTI_CALL(cuptiProfilerCounterDataImageInitializeScratchBuffer( + &initScratchBufferParams)); + + return true; +} + +#elif defined(HAS_CUPTI) + +// Create empty stubs for the API when CUPTI is not present. +CuptiRBProfilerSession::CuptiRBProfilerSession( + const std::vector& metricNames, + int deviceId, + int maxRanges, + int numNestingLevels, + CUcontext cuContext) + : metricNames_(metricNames), + deviceId_(deviceId), + maxRanges_(maxRanges), + numNestingLevels_(numNestingLevels), + cuContext_(cuContext) {} +void CuptiRBProfilerSession::stop() {} +void CuptiRBProfilerSession::enable() {} +void CuptiRBProfilerSession::disable() {} +void CuptiRBProfilerSession::beginPass() {} +bool CuptiRBProfilerSession::endPass() { return true; } +void CuptiRBProfilerSession::flushCounterData() {} +void CuptiRBProfilerSession::pushRange(const std::string& /*rangeName*/) {} +void CuptiRBProfilerSession::popRange() {} +void CuptiRBProfilerSession::asyncStartAndEnable( + CUpti_ProfilerRange /*profilerRange*/, + CUpti_ProfilerReplayMode /*profilerReplayMode*/) {} +void CuptiRBProfilerSession::asyncDisableAndStop() {} +CuptiProfilerResult CuptiRBProfilerSession::evaluateMetrics(bool verbose) { + static CuptiProfilerResult res; + return res; +}; +void CuptiRBProfilerSession::saveCounterData( + const std::string& /*CounterDataFileName*/, + const std::string& /*CounterDataSBFileName*/) {} +void CuptiRBProfilerSession::initCupti() {} +void CuptiRBProfilerSession::deInitCupti() {} +void CuptiRBProfilerSession::staticInit() {} +bool CuptiRBProfilerSession::createCounterDataImage() { return true; } +void CuptiRBProfilerSession::startInternal( + CUpti_ProfilerRange /*profilerRange*/, + CUpti_ProfilerReplayMode /*profilerReplayMode*/) {} +std::vector& CuptiRBProfilerSession::counterAvailabilityImage() { + static std::vector _vec; + return _vec; +} +#endif // HAS_CUPTI_RANGE_PROFILER + +namespace testing { + +void trackCudaCtx(CUcontext ctx, uint32_t device_id, CUpti_CallbackId cbid) { +#if HAS_CUPTI_RANGE_PROFILER + __trackCudaCtx(ctx, device_id, cbid); +#endif // HAS_CUPTI_RANGE_PROFILER +} + +void trackCudaKernelLaunch(CUcontext ctx, const char* kernelName) { +#if HAS_CUPTI_RANGE_PROFILER + __trackCudaKernelLaunch(ctx, kernelName); +#endif // HAS_CUPTI_RANGE_PROFILER +} + +} // namespace testing +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerApi.h b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerApi.h new file mode 100644 index 
0000000000000000000000000000000000000000..98a0b3ea5f4850dfa060e4e86d5ebf210692db1a --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerApi.h @@ -0,0 +1,220 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#ifdef HAS_CUPTI +#include +#include +// Using CUDA 11 and above due to usage of API: cuptiProfilerGetCounterAvailability. +#if defined(CUDART_VERSION) && CUDART_VERSION >= 10000 && CUDART_VERSION < 11040 && CUDA_VERSION >= 11000 +#define HAS_CUPTI_RANGE_PROFILER 1 +#endif // CUDART_VERSION > 10.00 and < 11.04 && CUDA_VERSION >= 11.00 +#endif // HAS_CUPTI + +#if HAS_CUPTI_RANGE_PROFILER +#include +#include +#include +#else +using CUpti_ProfilerRange = enum +{ + CUPTI_AutoRange, + CUPTI_UserRange, +}; + +using CUpti_ProfilerReplayMode = enum +{ + CUPTI_KernelReplay, + CUPTI_UserReplay, +}; +#endif // HAS_CUPTI_RANGE_PROFILER + +#include +#include +#include +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "TraceSpan.h" +#include "CuptiCallbackApi.h" +#include "CuptiNvPerfMetric.h" + +/* Cupti Range based profiler session + * See : https://docs.nvidia.com/cupti/Cupti/r_main.html#r_profiler + */ + +namespace KINETO_NAMESPACE { + +class CuptiRBProfilerSession { + public: + // Initialize and configure CUPTI Profiler counters. + // - Metric names must be provided as string vector. + // - Supported values by CUPTI can be found at - + // https://docs.nvidia.com/cupti/Cupti/r_main.html#r_host_metrics_api + explicit CuptiRBProfilerSession( + const std::vector& metricNames, + int deviceId, + int maxRanges, + int numNestingLevels = 1, + CUcontext cuContext = 0); + + virtual ~CuptiRBProfilerSession() = default; + + // Start profiling session + // This function has to be called from the CPU thread running + // the CUDA context. If this is not the case asyncStartAndEnable() + // can be used + void start( + CUpti_ProfilerRange profilerRange = CUPTI_AutoRange, + CUpti_ProfilerReplayMode profilerReplayMode = CUPTI_KernelReplay) { + startInternal(profilerRange, profilerReplayMode); + } + + // Stop profiling session + virtual void stop(); + + virtual void enable(); + virtual void disable(); + + // Profiler passes + // GPU hardware has limited performance monitoring resources + // the CUPTI profiler may need to run multiple passes to collect + // data for a given range + // If we use kernel replay model the kernels are automatically replayed + // else, you can use the beginPass() and endPass() functions below + // for user to manage the replays + + // starts a profiler pass with given kernels in between + virtual void beginPass(); + + // end a profiler pass with given kernels in between + // returns true if no more passes are required + virtual bool endPass(); + + // flushes the counter data - required if you use user replay + virtual void flushCounterData(); + + // Each pass can contain multiple of ranges + // metrics configured in a pass are collected per each range-stack. 
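+  // Minimal usage sketch (illustrative only; assumes a valid CUDA context
+  // on the calling thread and omits error handling):
+  //   CuptiRBProfilerSession s(metrics, /*deviceId=*/0, /*maxRanges=*/10);
+  //   s.startAndEnable();   // start() + beginPass() + enable() (+ root range)
+  //   /* ... launch the kernels to be profiled ... */
+  //   s.disableAndStop();   // popRange() + disable() + endPass() + stop()
+  //   auto result = s.evaluateMetrics(/*verbose=*/true);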
+  virtual void pushRange(const std::string& rangeName);
+  virtual void popRange();
+
+  // utilities for common operations
+  void startAndEnable();
+  void disableAndStop();
+
+  // Async APIs: these can be called from another thread
+  // outside the CUDA context being profiled
+  void asyncStartAndEnable(
+      CUpti_ProfilerRange profilerRange = CUPTI_AutoRange,
+      CUpti_ProfilerReplayMode profilerReplayMode = CUPTI_KernelReplay);
+  void asyncDisableAndStop();
+
+  void printMetrics() {
+    evaluateMetrics(true);
+  }
+
+  std::unique_ptr<TraceSpan> getProfilerTraceSpan();
+
+  virtual CuptiProfilerResult evaluateMetrics(bool verbose = false);
+
+  void saveCounterData(
+      const std::string& CounterDataFileName,
+      const std::string& CounterDataSBFileName);
+
+  // This is not thread safe, so please only call after
+  // profiling has stopped
+  const std::vector<std::string>& getKernelNames() const {
+    return kernelNames_;
+  }
+
+  int deviceId() const {
+    return deviceId_;
+  }
+
+  bool profilingActive() const {
+    return profilingActive_;
+  }
+
+  static std::set<uint32_t> getActiveDevices();
+
+  static void initCupti();
+
+  static void deInitCupti();
+
+  static void staticInit();
+
+  static void setCounterAvailabilityImage(std::vector<uint8_t> img) {
+    counterAvailabilityImage() = img;
+  }
+
+ protected:
+  CuptiRBProfilerSession(int deviceId, CUcontext ctx);
+
+  virtual void startInternal(
+      CUpti_ProfilerRange profilerRange,
+      CUpti_ProfilerReplayMode profilerReplayMode);
+
+  CUpti_ProfilerRange curRange_ = CUPTI_AutoRange;
+  CUpti_ProfilerReplayMode curReplay_ = CUPTI_KernelReplay;
+
+ private:
+  bool createCounterDataImage();
+
+  // log kernel name that is used with callbacks
+  void logKernelName(const char* kernel) {
+    std::lock_guard<std::mutex> lg(kernelNamesMutex_);
+    kernelNames_.emplace_back(kernel);
+  }
+
+  std::vector<std::string> metricNames_;
+  std::string chipName_;
+
+  uint32_t deviceId_ = 0;
+  int maxRanges_;
+  int numNestingLevels_;
+  CUcontext cuContext_;
+
+  // data buffers for configuration and counter data collection
+  std::vector<uint8_t> counterDataImagePrefix;
+  std::vector<uint8_t> configImage;
+  std::vector<uint8_t> counterDataImage;
+  std::vector<uint8_t> counterDataScratchBuffer;
+
+  std::chrono::time_point<std::chrono::high_resolution_clock> profilerStartTs_;
+  std::chrono::time_point<std::chrono::high_resolution_clock>
+      profilerInitDoneTs_;
+  std::chrono::time_point<std::chrono::high_resolution_clock> profilerStopTs_;
+
+  std::mutex kernelNamesMutex_;
+  // raw kernel names (not demangled)
+  std::vector<std::string> kernelNames_;
+
+  uint32_t numCallbacks_ = 0;
+
+  static std::vector<uint8_t>& counterAvailabilityImage();
+
+#if HAS_CUPTI_RANGE_PROFILER
+  CUpti_Profiler_BeginPass_Params beginPassParams_;
+  CUpti_Profiler_EndPass_Params endPassParams_;
+#endif
+
+  bool initSuccess_ = false;
+  bool profilingActive_ = false;
+
+  friend void __trackCudaKernelLaunch(CUcontext ctx, const char* kernelName);
+};
+
+// called directly only in unit tests
+namespace testing {
+
+void trackCudaCtx(CUcontext ctx, uint32_t device_id, CUpti_CallbackId cbid);
+void trackCudaKernelLaunch(CUcontext ctx, const char* kernelName);
+
+} // namespace testing
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerConfig.cpp b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerConfig.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..04b1ad0cb3f807cf87d32bc03de0ca9b552b0063
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerConfig.cpp
@@ -0,0 +1,68 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
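+// The option keys parsed below come from the kineto config string. A
+// hypothetical example (metric names vary by GPU architecture):
+//   CUPTI_PROFILER_METRICS = smsp__warps_launched.avg,dram__bytes_read.sum
+//   CUPTI_PROFILER_ENABLE_PER_KERNEL = true
+//   CUPTI_PROFILER_MAX_RANGES = 1500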
+ +#include +#include + +#include +#include + +#include +#include + +using namespace std::chrono; + +namespace KINETO_NAMESPACE { + +// number of ranges affect the size of counter data binary used by +// the CUPTI Profiler. these defaults can be tuned +constexpr int KMaxAutoRanges = 1500; // supports 1500 kernels +constexpr int KMaxUserRanges = 10; // enable upto 10 sub regions marked by user + +constexpr char kCuptiProfilerMetricsKey[] = "CUPTI_PROFILER_METRICS"; +constexpr char kCuptiProfilerPerKernelKey[] = "CUPTI_PROFILER_ENABLE_PER_KERNEL"; +constexpr char kCuptiProfilerMaxRangesKey[] = "CUPTI_PROFILER_MAX_RANGES"; + +CuptiRangeProfilerConfig::CuptiRangeProfilerConfig(Config& cfg) + : parent_(&cfg), + cuptiProfilerPerKernel_(false), + cuptiProfilerMaxRanges_(0) {} + +bool CuptiRangeProfilerConfig::handleOption(const std::string& name, std::string& val) { + VLOG(0) << " handling : " << name << " = " << val; + // Cupti Range based Profiler configuration + if (!name.compare(kCuptiProfilerMetricsKey)) { + activitiesCuptiMetrics_ = splitAndTrim(val, ','); + } else if (!name.compare(kCuptiProfilerPerKernelKey)) { + cuptiProfilerPerKernel_ = toBool(val); + } else if (!name.compare(kCuptiProfilerMaxRangesKey)) { + cuptiProfilerMaxRanges_ = toInt64(val); + } else { + return false; + } + return true; +} + +void CuptiRangeProfilerConfig::setDefaults() { + if (activitiesCuptiMetrics_.size() > 0 && cuptiProfilerMaxRanges_ == 0) { + cuptiProfilerMaxRanges_ = + cuptiProfilerPerKernel_ ? KMaxAutoRanges : KMaxUserRanges; + } +} + +void CuptiRangeProfilerConfig::printActivityProfilerConfig(std::ostream& s) const { + if (activitiesCuptiMetrics_.size() > 0) { + s << "Cupti Profiler metrics : " + << fmt::format("{}", fmt::join(activitiesCuptiMetrics_, ", ")) << std::endl; + s << "Cupti Profiler measure per kernel : " + << cuptiProfilerPerKernel_ << std::endl; + s << "Cupti Profiler max ranges : " << cuptiProfilerMaxRanges_ << std::endl; + } +} + +void CuptiRangeProfilerConfig::registerFactory() { + Config::addConfigFactory( + kCuptiProfilerConfigName, + [](Config& cfg) { return new CuptiRangeProfilerConfig(cfg); }); +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerConfig.h b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerConfig.h new file mode 100644 index 0000000000000000000000000000000000000000..549b8a4e8b40c66b59bae974eb87c7f64967344e --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/CuptiRangeProfilerConfig.h @@ -0,0 +1,86 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
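+// Read-path sketch (assumes registerFactory() has run and the Config has
+// been parsed):
+//   const auto& rb = CuptiRangeProfilerConfig::get(cfg);
+//   auto metrics = rb.activitiesCuptiMetrics();
+//   bool perKernel = rb.cuptiProfilerPerKernel();
+//   int64_t maxRanges = rb.cuptiProfilerMaxRanges();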
+ +#pragma once + +#include "Config.h" + +#include +#include +#include +#include + +namespace KINETO_NAMESPACE { + +constexpr char kCuptiProfilerConfigName[] = "cupti_rb_profiler"; + +class CuptiRangeProfilerConfig : public AbstractConfig { + public: + bool handleOption(const std::string& name, std::string& val) override; + + void validate( + const std::chrono::time_point& + fallbackProfileStartTime) override {} + + static CuptiRangeProfilerConfig& get(const Config& cfg) { + return dynamic_cast(cfg.feature( + kCuptiProfilerConfigName)); + } + + Config& parent() const { + return *parent_; + } + + std::vector activitiesCuptiMetrics() const { + return activitiesCuptiMetrics_; + } + + bool cuptiProfilerPerKernel() const { + return cuptiProfilerPerKernel_; + } + + int64_t cuptiProfilerMaxRanges() const { + return cuptiProfilerMaxRanges_; + } + + void setSignalDefaults() override { + setDefaults(); + } + + void setClientDefaults() override { + setDefaults(); + } + + void printActivityProfilerConfig(std::ostream& s) const override; + + static void registerFactory(); + protected: + AbstractConfig* cloneDerived(AbstractConfig& parent) const override { + CuptiRangeProfilerConfig* clone = new CuptiRangeProfilerConfig(*this); + clone->parent_ = dynamic_cast(&parent); + return clone; + } + + private: + CuptiRangeProfilerConfig() = delete; + explicit CuptiRangeProfilerConfig(Config& parent); + explicit CuptiRangeProfilerConfig( + const CuptiRangeProfilerConfig& other) = default; + + // some defaults will depend on other configuration + void setDefaults(); + + // Associated Config object + Config* parent_; + + // Counter metrics exposed via CUPTI Profiler API + std::vector activitiesCuptiMetrics_; + + // Collect profiler metrics per kernel - autorange made + bool cuptiProfilerPerKernel_{false}; + + // max number of ranges to configure the profiler for. + // this has to be set before hand to reserve space for the output + int64_t cuptiProfilerMaxRanges_ = 0; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/DaemonConfigLoader.h b/plugins/tensorboard-plugins/libkineto/src/DaemonConfigLoader.h new file mode 100644 index 0000000000000000000000000000000000000000..9b0ed92863648824a57ce8193ddc16d7cf23622e --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/DaemonConfigLoader.h @@ -0,0 +1,27 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include + +namespace KINETO_NAMESPACE { + +class DaemonConfigLoader { + public: + virtual ~DaemonConfigLoader() {} + + // Return the base config from the daemon + virtual std::string readBaseConfig() = 0; + + // Return a configuration string from the daemon, if one has been posted. + virtual std::string readOnDemandConfig(bool events, bool activities) = 0; + + // Returns the number of tracked contexts for this device. The daemon has a + // global view. If an unexpedted error occurs, return -1. + virtual int gpuContextCount(uint32_t device) = 0; + + virtual void setCommunicationFabric(bool enabled) = 0; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/Demangle.cpp b/plugins/tensorboard-plugins/libkineto/src/Demangle.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f84f0b8ec36f621061cb1e8bb8dd948cb8aed7b3 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/Demangle.cpp @@ -0,0 +1,49 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
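+// Example: demangle("_Z3addii") returns "add(int, int)". Inputs that are not
+// valid mangled names, or that exceed kMaxSymbolSize, are returned unchanged.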
+ +#include "Demangle.h" + +#ifndef _MSC_VER +#include +#endif +#include +#include + +namespace KINETO_NAMESPACE { + +static constexpr int kMaxSymbolSize = 1024; + +std::string demangle(const char* name) { +#ifndef _MSC_VER + if (!name) { + return ""; + } + + if (strlen(name) > kMaxSymbolSize) { + return name; + } + + int status; + size_t len = 0; + char* demangled = abi::__cxa_demangle(name, nullptr, &len, &status); + if (status != 0) { + return name; + } + std::string res(demangled); + // The returned buffer must be freed! + free(demangled); + return res; +#else + // TODO: demangling on Windows + if (!name) { + return ""; + } else { + return name; + } +#endif +} + +std::string demangle(const std::string& name) { + return demangle(name.c_str()); +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/Demangle.h b/plugins/tensorboard-plugins/libkineto/src/Demangle.h new file mode 100644 index 0000000000000000000000000000000000000000..6dcf0776f1abf30e7e3614272fa02f6bae1bdf35 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/Demangle.h @@ -0,0 +1,12 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include + +namespace KINETO_NAMESPACE { + +std::string demangle(const char* name); +std::string demangle(const std::string& name); + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/EventProfiler.cpp b/plugins/tensorboard-plugins/libkineto/src/EventProfiler.cpp new file mode 100644 index 0000000000000000000000000000000000000000..dbf2755238974392ff6205f05a5c80a1733bf2ee --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/EventProfiler.cpp @@ -0,0 +1,635 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
+ +#include "EventProfiler.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "CuptiEventApi.h" +#include "Logger.h" + +using namespace std::chrono; +using std::accumulate; +using std::endl; +using std::map; +using std::ostream; +using std::string; +using std::unique_ptr; +using std::vector; + +namespace KINETO_NAMESPACE { + +static std::mutex& logMutex() { + static std::mutex instance; + return instance; +} + +// --------------------------------------------------------------------- +// class Event +// --------------------------------------------------------------------- + +// Compute domain instance percentiles +PercentileList& Event::percentiles( + PercentileList& pcs, + const SampleSlice& slice) const { + vector instance_values; + instance_values.reserve(instanceCount); + for (int i = 0; i < instanceCount; i++) { + instance_values.push_back(sumInstance(i, slice)); + } + return KINETO_NAMESPACE::percentiles(instance_values, pcs); +} + +// Add up all samples for a given domain instance +int64_t Event::sumInstance(int i, const SampleSlice& slice) const { + auto r = toIdxRange(slice); + auto start = samples_.cbegin(); + std::advance(start, r.first); + auto end = start; + std::advance(end, r.second); + return accumulate(start, end, 0ul, [i](int64_t a, const Sample& b) { + return a + b.second[i]; + }); +} + +// Add up all samples across all domain instances +int64_t Event::sumAll(const SampleSlice& slice) const { + int64_t res = 0; + for (int i = 0; i < instanceCount; i++) { + res += sumInstance(i, slice); + } + return res; +} + +// Print raw sample values for all domains +void Event::printSamples(ostream& s, CUdevice device) const { + // Don't mess up output with interleaved lines + // Probably OK to reuse logMutex() here since this is + // used for debugging, but need to keep an eye on it. 
+ std::lock_guard lock(logMutex()); + s << "Device " << device << " " << name << ":" << endl; + for (const auto& sample : samples_) { + const auto& vals = sample.second; + for (int64_t val : vals) { + s << val << " "; + } + s << endl; + } +} + +// --------------------------------------------------------------------- +// class Metric +// --------------------------------------------------------------------- +Metric::Metric( + string name, + CUpti_MetricID id, + vector events, + CUpti_MetricEvaluationMode eval_mode, + CuptiMetricApi& cupti_metrics) + : name(std::move(name)), + id_(id), + events_(std::move(events)), + evalMode_(eval_mode), + cuptiMetrics_(cupti_metrics), + valueKind_(cuptiMetrics_.valueKind(id)) {} + +// Return per-SM vector as well as total +struct Metric::CalculatedValues Metric::calculate( + map& event_map, + nanoseconds sample_duration, + const SampleSlice& slice) { + vector metric_values; + vector ev_values; + ev_values.reserve(events_.size()); + if (evalMode_ & CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE) { + int instance_count = instanceCount(event_map); + metric_values.reserve(instance_count); + for (int i = 0; i < instance_count; i++) { + ev_values.clear(); + for (CUpti_EventID event_id : events_) { + ev_values.push_back(event_map[event_id].sumInstance(i, slice)); + } + metric_values.push_back(cuptiMetrics_.calculate( + id_, valueKind_, events_, ev_values, sample_duration.count())); + } + } + + // FIXME: Check assumption that all instances are profiled + ev_values.clear(); + for (CUpti_EventID event_id : events_) { + ev_values.push_back(event_map[event_id].sumAll(slice)); + } + SampleValue total = cuptiMetrics_.calculate( + id_, valueKind_, events_, ev_values, sample_duration.count()); + if (evalMode_ & CUPTI_METRIC_EVALUATION_MODE_AGGREGATE) { + metric_values.push_back(total); + } + return {metric_values, std::move(total)}; +} + +void Metric::printDescription(ostream& s) const { + s << fmt::format("{} ({})", name, fmt::join(events_, ",")) << endl; +} + +// --------------------------------------------------------------------- +// class EventGroupSet +// --------------------------------------------------------------------- + +// Each domain has a set of counters. +// Some counters in a domain can be collected simultaneously in a "group" +// Counters from different domains can also be collected at the same time +// Therefore we have a "set of groups", or group set, with counters that +// can all be collected at once. +EventGroupSet::EventGroupSet( + CUpti_EventGroupSet& set, + map& events, + CuptiEventApi& cupti) + : set_(set), events_(events), cuptiEvents_(cupti), enabled_(false) { + for (int g = 0; g < set.numEventGroups; g++) { + CUpti_EventGroup grp = set.eventGroups[g]; + // Profile all domain instances + cuptiEvents_.enablePerInstance(grp); + uint32_t instance_count = cuptiEvents_.instanceCount(grp); + for (const auto& id : cuptiEvents_.eventsInGroup(grp)) { + VLOG(0) << "Instance count for " << id << ":" << instance_count; + events_[id].instanceCount = instance_count; + } + } +} + +EventGroupSet::~EventGroupSet() { + // Disable EventGroupSet in Cupti. 
+ if (enabled_) { + setEnabled(false); + } +} + +// Enable or disable this group set +void EventGroupSet::setEnabled(bool enabled) { + if (enabled && !enabled_) { + cuptiEvents_.enableGroupSet(set_); + } else if (!enabled && enabled_) { + cuptiEvents_.disableGroupSet(set_); + } + enabled_ = enabled; +} + +// Collect counter values for each counter in group set +void EventGroupSet::collectSample() { + auto timestamp = system_clock::now(); + for (int g = 0; g < set_.numEventGroups; g++) { + CUpti_EventGroup grp = set_.eventGroups[g]; + for (const auto& id : cuptiEvents_.eventsInGroup(grp)) { + Event& ev = events_[id]; + vector vals(ev.instanceCount); + // FIXME: Use cuptiEventGroupReadAllEvents + cuptiEvents_.readEvent(grp, id, vals); + + if (VLOG_IS_ON(0)) { + for (int64_t v : vals) { + if (v == CUPTI_EVENT_OVERFLOW) { + LOG(WARNING) << "Counter overflow detected " + << "- decrease sample period!" << endl; + } + } + } + + ev.addSample(timestamp, vals); + } + } + + if (VLOG_IS_ON(1)) { + auto t2 = system_clock::now(); + VLOG(1) << "Device " << cuptiEvents_.device() << " Sample (us): " + << duration_cast(t2 - timestamp).count(); + } +} + +// Print names of events in this group set, ordered by group +void EventGroupSet::printDescription(ostream& s) const { + for (int g = 0; g < set_.numEventGroups; g++) { + s << " Events in group " << g << ": "; + for (const auto& id : cuptiEvents_.eventsInGroup(set_.eventGroups[g])) { + s << id << " (" << events_[id].name << ") "; + } + s << endl; + } +} + +// --------------------------------------------------------------------- +// class EventProfiler +// --------------------------------------------------------------------- + +// Find nearest factor of a number by linear search, +// starting at hi and lo - hi searches up and lo searches down +static int nearestFactor(int hi, int lo, int number) { + return number % hi == 0 + ? hi + : number % lo == 0 ? 
lo : nearestFactor(hi + 1, lo - 1, number); +} + +static int nearestFactor(int count, int max) { + return nearestFactor(count, count, max); +} + +void EventProfiler::initEvents(const std::set& eventNames) { + events_.clear(); + // Build event map + for (const auto& name : eventNames) { + events_.emplace(cuptiEvents_->eventId(name), name); + } +} + +void EventProfiler::initMetrics(const std::set& metricNames) { + metrics_.clear(); + // Add events from metrics + metrics_.reserve(metricNames.size()); + for (const auto& metric_name : metricNames) { + CUpti_MetricID metric_id = cuptiMetrics_->idFromName(metric_name); + if (metric_id == ~0) { + continue; + } + + const auto& events = cuptiMetrics_->events(metric_id); + vector event_ids; + event_ids.reserve(events.size()); + for (const auto& pair : events) { + CUpti_EventID id = pair.first; + const string& event_name = pair.second; + if (event_name.empty()) { + // For unnamed events, use metric name and event id + // FIXME: For subsequent metrics using the same event, + // this will be confusing + events_.emplace(id, metric_name + "_" + event_name); + } else { + events_.emplace(id, event_name); + } + event_ids.push_back(id); + } + metrics_.emplace_back( + metric_name, + metric_id, + event_ids, + cuptiMetrics_->evaluationMode(metric_id), + *cuptiMetrics_); + } +} + +bool EventProfiler::initEventGroups() { + sets_.clear(); + if (eventGroupSets_) { + cuptiEvents_->destroyGroupSets(eventGroupSets_); + eventGroupSets_ = nullptr; + } + if (events_.empty()) { + return true; + } + + // Determine sets of groups to be collected + vector ids; + ids.reserve(events_.size()); + for (const auto& ev : events_) { + ids.push_back(ev.first); + } + eventGroupSets_ = cuptiEvents_->createGroupSets(ids); + VLOG(0) << "Number of group sets: " << eventGroupSets_->numSets; + for (int i = 0; i < eventGroupSets_->numSets; i++) { + sets_.push_back( + EventGroupSet(eventGroupSets_->sets[i], events_, *cuptiEvents_)); + } + return !sets_.empty(); +} + +static unique_ptr alignAndValidateConfigs( + Config& base, + Config* onDemand) { + auto now = system_clock::now(); + if (!onDemand || + now > + (onDemand->eventProfilerOnDemandStartTime() + + onDemand->eventProfilerOnDemandDuration())) { + base.validate(now); + return base.clone(); + } + + auto res = base.clone(); + res->addEvents(onDemand->eventNames()); + res->addMetrics(onDemand->metricNames()); + + int sample_period = + std::min(base.samplePeriod().count(), onDemand->samplePeriod().count()); + if (sample_period < base.samplePeriod().count() && + (base.samplePeriod().count() % sample_period) != 0) { + sample_period = nearestFactor(sample_period, base.samplePeriod().count()); + LOG(WARNING) + << "On-demand sample period must be a factor of base sample period. " + << "Adjusting from " << onDemand->samplePeriod().count() << "ms to " + << sample_period << "ms."; + } + base.setSamplePeriod(milliseconds(sample_period)); + base.validate(now); + res->setSamplePeriod(base.samplePeriod()); + res->setMultiplexPeriod(base.multiplexPeriod()); + res->validate(now); + onDemand->setSamplePeriod(base.samplePeriod()); + onDemand->setMultiplexPeriod(base.multiplexPeriod()); + onDemand->validate(now); + + return res; +} + +static milliseconds minReportPeriod(const Config& config, int num_sets) { + return config.multiplexPeriod() * num_sets; +} + +static bool canSupportReportPeriod(const Config& config, int num_sets) { + // Can we get through the groups an even number per report period? 
+  milliseconds min_report_period = minReportPeriod(config, num_sets);
+  return (config.reportPeriod().count() % min_report_period.count()) == 0;
+}
+
+static int completeSamplesPerReport(const Config& config, int num_sets) {
+  if (num_sets <= 1) {
+    return config.reportPeriod() / config.samplePeriod();
+  }
+  // Number of complete sample collections in the report period
+  // E.g. if report period is 10000ms, sample period 500ms,
+  // multiplex period 2000ms and num_sets is 5 then # of complete samples is
+  // (2000ms / 500ms) * (10000ms / 2000ms / 5) = 4 * 1 = 4
+  int samples_per_multiplex_period =
+      config.multiplexPeriod() / config.samplePeriod();
+  int multiplex_periods_per_report =
+      config.reportPeriod() / config.multiplexPeriod();
+  return (multiplex_periods_per_report / num_sets) *
+      samples_per_multiplex_period;
+}
+
+static bool canSupportSamplesPerReport(const Config& config, int num_sets) {
+  // Can samples per report be honored with an exact *full* set of samples?
+  // We don't support partial samples at this point.
+  int full_samples_per_report = completeSamplesPerReport(config, num_sets);
+  return (full_samples_per_report % config.samplesPerReport()) == 0;
+}
+
+static void adjustConfig(Config& config, int num_sets) {
+  // Don't change sample period and multiplex period here, since that can
+  // cause overflows and perf degradation. Report period and samples per
+  // report are OK to change (with warning).
+  if (!canSupportReportPeriod(config, num_sets)) {
+    milliseconds min_report_period = minReportPeriod(config, num_sets);
+    LOG(WARNING) << "Report period must be a multiple of "
+                 << min_report_period.count() << "ms (" << num_sets
+                 << " event sets * " << config.multiplexPeriod().count()
+                 << "ms multiplex period), in order to get complete samples.";
+    auto new_report_period =
+        Config::alignUp(config.reportPeriod(), min_report_period);
+    double sf =
+        ((double)new_report_period.count()) / config.reportPeriod().count();
+    int new_samples_per_report = std::round(config.samplesPerReport() * sf);
+    LOG(WARNING) << "Adjusting report period from "
+                 << config.reportPeriod().count() << "ms to "
+                 << new_report_period.count() << "ms";
+    if (new_samples_per_report != config.samplesPerReport()) {
+      LOG(WARNING) << "Adjusting samples per report from "
+                   << config.samplesPerReport() << " to "
+                   << new_samples_per_report;
+    }
+    config.setReportPeriod(new_report_period);
+    config.setSamplesPerReport(new_samples_per_report);
+  }
+  // Ensure that samples per report can be honored with
+  // an exact *full* set of samples. Don't support partial
+  // samples at this point.
+  if (!canSupportSamplesPerReport(config, num_sets)) {
+    int full_samples_per_report = completeSamplesPerReport(config, num_sets);
+    int adjusted_count =
+        nearestFactor(config.samplesPerReport(), full_samples_per_report);
+    LOG(WARNING)
+        << "Samples per report must be such that an even number of "
+        << "complete samples can be aggregated in each report period. 
Adjusting" + << " from " << config.samplesPerReport() << " to " << adjusted_count + << " (complete sample count is " << full_samples_per_report << ")"; + config.setSamplesPerReport(adjusted_count); + } +} + +// Prepare profiler +EventProfiler::EventProfiler( + std::unique_ptr cupti_events, + std::unique_ptr cupti_metrics, + vector>& loggers, + vector>& onDemandLoggers) + : cuptiEvents_(std::move(cupti_events)), + cuptiMetrics_(std::move(cupti_metrics)), + loggers_(loggers), + onDemandLoggers_(onDemandLoggers) {} + +void EventProfiler::reportSamples() { + dispatchSamples(*config_, loggers_, baseSamples_); + baseSamples_ += completeSamplesPerReport(*config_, sets_.size()); +} + +void EventProfiler::reportOnDemandSamples() { + dispatchSamples(*onDemandConfig_, onDemandLoggers_, onDemandSamples_); + onDemandSamples_ += completeSamplesPerReport(*onDemandConfig_, sets_.size()); +} + +EventProfiler::~EventProfiler() { + if (eventGroupSets_) { + for (auto& set : sets_) { + set.setEnabled(false); + } + cuptiEvents_->destroyGroupSets(eventGroupSets_); + } + VLOG(0) << "Stopped event profiler for device " << device(); +} + +void EventProfiler::updateLoggers(Config& config, Config* on_demand_config) { + // Update loggers. + for (auto& logger : loggers_) { + std::lock_guard lock(logMutex()); + logger->update(config); + } + + if (on_demand_config) { + // Update onDemand loggers. + for (auto& logger : onDemandLoggers_) { + std::lock_guard lock(logMutex()); + logger->update(*on_demand_config); + } + } +} + +bool EventProfiler::applyConfig(const Config& config) { + // Initialize events, metrics, and event group sets. + // TODO: Send warnings / errors back to dyno for onDemand config + try { + if (!initEventsAndMetrics(config)) { + return false; + } + } catch (const std::exception& ex) { + LOG(WARNING) << "Failed to apply config (" << ex.what() << ")"; + return false; + } + + return true; +} + +bool EventProfiler::initEventsAndMetrics(const Config& config) { + initEvents(config.eventNames()); + initMetrics(config.metricNames()); + // We now have the total list of events to collect + // They need to be organized into groups for multiplexing + if (!initEventGroups()) { + LOG(WARNING) << "No events/metrics initialized successfully"; + return false; + } + + if (VLOG_IS_ON(1)) { + printMetrics(LIBKINETO_DBG_STREAM); + printSets(LIBKINETO_DBG_STREAM); + } + return true; +} + +void EventProfiler::printSets(ostream& s) const { + for (int i = 0; i < sets_.size(); i++) { + s << "Set " << i << endl; + sets_[i].printDescription(s); + } +} + +void EventProfiler::printMetrics(ostream& s) const { + s << "Metrics:" << endl; + for (const Metric& m : metrics_) { + m.printDescription(s); + } +} + +void EventProfiler::printAllSamples(ostream& s, CUdevice device) const { + for (const auto& pair : events_) { + const Event& ev = pair.second; + ev.printSamples(s, device); + } +} + +void EventProfiler::enableNextCounterSet() { + if (sets_.size() > 1) { + auto t1 = system_clock::now(); + + VLOG(1) << "Disabling set " << curEnabledSet_; + sets_[curEnabledSet_].setEnabled(false); + curEnabledSet_ = (curEnabledSet_ + 1) % sets_.size(); + VLOG(1) << "Enabling set " << curEnabledSet_; + sets_[curEnabledSet_].setEnabled(true); + + if (VLOG_IS_ON(1)) { + auto t2 = system_clock::now(); + VLOG(1) << "Switch (us): " + << duration_cast(t2 - t1).count(); + } + } +} + +// Notify listeners of collected samples +void EventProfiler::dispatchSamples( + const Config& config, + const vector>& loggers, + int sample_offset) { + Sample 
sample(events_.size() + metrics_.size()); + // Normalize values to per second + auto delta = config.reportPeriod() / config.samplesPerReport(); + double sf = 1000.0 * sets_.size() / delta.count(); + for (int i = 0; i < config.samplesPerReport(); i++) { + sample.stats.clear(); + sample.deltaMsec = (delta * i).count(); + SampleSlice slice = {sample_offset, i, config.samplesPerReport()}; + VLOG(1) << "Slice: " << sample_offset << ", " << i << ", " + << config.samplesPerReport(); + for (const auto& pair : events_) { + const Event& ev = pair.second; + int64_t total = std::round(sf * ev.sumAll(slice)); + PercentileList pcs = initPercentiles(config.percentiles()); + normalize(ev.percentiles(pcs, slice), sf); + sample.stats.push_back({ev.name, std::move(pcs), SampleValue(total)}); + } + + for (auto& m : metrics_) { + // calculate returns a pair of per-SM vector and a total + auto vals = m.calculate(events_, delta, slice); + PercentileList pcs = initPercentiles(config.percentiles()); + sample.stats.push_back( + {m.name, std::move(percentiles(vals.perInstance, pcs)), vals.total}); + } + + for (auto& logger : loggers) { + std::lock_guard lock(logMutex()); + logger->handleSample(device(), sample, config.ipcFabricEnabled()); + } + } + + if (VLOG_IS_ON(2)) { + printAllSamples(LIBKINETO_DBG_STREAM, device()); + } +} + +void EventProfiler::configure(Config& config, Config* onDemandConfig) { + if (!sets_.empty()) { + sets_[curEnabledSet_].setEnabled(false); + clearSamples(); + } + + config_ = config.clone(); + onDemandConfig_ = onDemandConfig ? onDemandConfig->clone() : nullptr; + mergedConfig_ = alignAndValidateConfigs(*config_, onDemandConfig_.get()); + if (!applyConfig(*mergedConfig_)) { + LOG(WARNING) << "Failed to apply config!"; + mergedConfig_ = config_->clone(); + applyConfig(*config_); + } + if (!sets_.empty()) { + // Make timing adjustments based on multiplexing requirements. + adjustConfig(*config_, sets_.size()); + if (onDemandConfig_) { + int duration = onDemandConfig_->eventProfilerOnDemandDuration().count(); + LOG(INFO) << "On demand profiler activated for " << duration << " secs"; + adjustConfig(*onDemandConfig_, sets_.size()); + } + // If events or metrics were added or removed, need to tell loggers + updateLoggers(*config_, onDemandConfig_.get()); + } + + curEnabledSet_ = 0; + if (!sets_.empty()) { + sets_[0].setEnabled(true); + } else { + VLOG(0) << "No counters profiled!"; + } + + baseSamples_ = 0; + onDemandSamples_ = 0; +} + +void EventProfiler::collectSample() { + if (sets_.empty()) { + return; + } + sets_[curEnabledSet_].collectSample(); + if (VLOG_IS_ON(1)) { + printAllSamples(LIBKINETO_DBG_STREAM, device()); + } +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/EventProfiler.h b/plugins/tensorboard-plugins/libkineto/src/EventProfiler.h new file mode 100644 index 0000000000000000000000000000000000000000..fafd5b9bb8336b28b210ba58d588d3a798a73969 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/EventProfiler.h @@ -0,0 +1,341 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "Config.h" +#include "CuptiEventApi.h" +#include "CuptiMetricApi.h" +#include "SampleListener.h" + +namespace KINETO_NAMESPACE { + +// Helper function for computing percentiles (nearest-rank). +// Modifies the input. 
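+// Worked example: for values {5, 1, 3, 4} and percentile 50,
+// idx = min(3, (50 * 4) / 100) = 2; std::nth_element leaves the
+// third-smallest value (4) at index 2, so the reported p50 is 4.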
+template +inline PercentileList& percentiles(std::vector values, PercentileList& pcs) { + auto size = values.size(); + for (auto& x : pcs) { + int idx = std::min(size - 1, (x.first * size) / 100); + std::nth_element(values.begin(), values.begin() + idx, values.end()); + x.second = SampleValue(values[idx]); + } + return pcs; +} + +// Helper function for normalizing a percentile list +// Modifies the input +inline PercentileList& normalize(PercentileList& pcs, double sf) { + for (auto& pc : pcs) { + pc.second *= sf; + } + return pcs; +} + +// A slice of the sample buffer +struct SampleSlice { + // Start offset (samples) + int offset; + // Slice number + int index; + // Out of this many + int count; +}; + +// A sampled event +class Event { + public: + /* implicit */ Event(std::string name) : name(std::move(name)) {} + /* implicit */ Event(const char* name) : name(name) {} + Event() : name("INVALID") {} + + Event(const Event&) = delete; + Event& operator=(const Event&) = delete; + Event(Event&&) = default; + Event& operator=(Event&&) = default; + + void addSample( + std::chrono::time_point timestamp, + const std::vector& values) { + assert(values.size() == instanceCount); + samples_.emplace_back(timestamp, values); + } + + // Sum samples for a single domain instance + int64_t sumInstance(int i, const SampleSlice& slice) const; + + // Sum all samples across all domain instances + int64_t sumAll(const SampleSlice& slice) const; + + // Create list of percentiles + PercentileList& percentiles(PercentileList& pcs, const SampleSlice& slice) + const; + + void eraseSamples(int count) { + auto end = samples_.begin(); + std::advance(end, count); + samples_.erase(samples_.begin(), end); + } + + void clearSamples() { + samples_.clear(); + } + + int sampleCount() { + return samples_.size(); + } + + void printSamples(std::ostream& s, CUdevice device) const; + + // Event name (see nvprof --query-events) + std::string name; + + // Number of domain instances for this event, e.g. number of SMs + int instanceCount = 0; + + private: + std::pair toIdxRange(const SampleSlice& slice) const { + int size = (samples_.size() - slice.offset) / slice.count; + return std::make_pair(slice.offset + (slice.index * size), size); + } + + // List of collected samples, where each sample has values for + // one or more domain instances + using Sample = std::pair< + std::chrono::time_point, + std::vector>; + std::list samples_; +}; + +class Metric { + public: + Metric( + std::string name, + CUpti_MetricID id, + std::vector events, + CUpti_MetricEvaluationMode eval_mode, + CuptiMetricApi& cupti_metrics); + + struct CalculatedValues { + std::vector perInstance; + SampleValue total; + }; + + struct CalculatedValues calculate( + std::map& events, + std::chrono::nanoseconds sample_duration, + const SampleSlice& slice); + + int instanceCount(std::map& events) { + return events[events_[0]].instanceCount; + } + + void printDescription(std::ostream& s) const; + + std::string name; + + private: + CUpti_MetricID id_; + std::vector events_; + CUpti_MetricEvaluationMode evalMode_; + // Calls to CUPTI is encapsulated behind this interface + CuptiMetricApi& cuptiMetrics_; + CUpti_MetricValueKind valueKind_; +}; + +/** + * A set of event groups. + * Holds all the events that may be collected in a single pass. + * A group contains one or more counters for a single domain. + * A group set contains zero or one groups per domain. 
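+ * For example, with domains A (groups A0, A1) and B (group B0), CUPTI may
+ * produce the group sets {A0, B0} and {A1}, which are then multiplexed.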
+ */ +class EventGroupSet { + public: + EventGroupSet( + CUpti_EventGroupSet& set, + std::map& events, + CuptiEventApi& cupti); + ~EventGroupSet(); + + EventGroupSet(const EventGroupSet&) = delete; + EventGroupSet& operator=(const EventGroupSet&) = delete; + EventGroupSet(EventGroupSet&&) = default; + EventGroupSet& operator=(EventGroupSet&&) = delete; + + // Number of groups = number of domains profiled + int groupCount() const { + return set_.numEventGroups; + } + + void setEnabled(bool enabled); + // Take a sample of counters in this group set + void collectSample(); + void printDescription(std::ostream& s) const; + + private: + CUpti_EventGroupSet& set_; + std::map& events_; + // Calls to CUPTI is encapsulated behind this interface + CuptiEventApi& cuptiEvents_; + bool enabled_; +}; + +// The sampler +class EventProfiler { + public: + explicit EventProfiler( + std::unique_ptr cupti_events, + std::unique_ptr cupti_metrics, + std::vector>& loggers, + std::vector>& onDemandLoggers); + EventProfiler(const EventProfiler&) = delete; + EventProfiler& operator=(const EventProfiler&) = delete; + ~EventProfiler(); + + void configure(Config& config, Config* onDemandConfig); + + bool isOnDemandActive() { + return !!onDemandConfig_; + } + + // Print the counter sets. Multiple sets will be multiplexed. + void printSets(std::ostream& s) const; + + // Print metrics descriptions + void printMetrics(std::ostream& s) const; + + bool enableForDevice(Config& cfg); + + CUdevice device() { + return cuptiEvents_->device(); + } + + bool setContinuousMode() { + return cuptiEvents_->setContinuousMode(); + } + + std::chrono::milliseconds samplePeriod() { + return mergedConfig_->samplePeriod(); + } + + std::chrono::milliseconds multiplexPeriod() { + return mergedConfig_->multiplexPeriod(); + } + + std::chrono::milliseconds reportPeriod() { + return config_->reportPeriod(); + } + + std::chrono::milliseconds onDemandReportPeriod() { + return onDemandConfig_->reportPeriod(); + } + + // Read values of currently running counters. + void collectSample(); + + void reportSamples(); + void reportOnDemandSamples(); + + bool enabled() { + return sets_.size() > 0; + } + + bool multiplexEnabled() { + return sets_.size() > 1; + } + + // Multiplex counters. + void enableNextCounterSet(); + + void eraseReportedSamples() { + int erase_count = baseSamples_; + if (onDemandConfig_ && + onDemandConfig_->eventProfilerOnDemandDuration().count() > 0) { + erase_count = std::min(baseSamples_, onDemandSamples_); + } + eraseSamples(erase_count); + baseSamples_ -= erase_count; + onDemandSamples_ -= erase_count; + } + + void clearSamples() { + for (auto& pair : events_) { + pair.second.clearSamples(); + } + baseSamples_ = 0; + onDemandSamples_ = 0; + } + + private: + // Functions to initialize profiler based on Config settings. 
+ bool applyConfig(const Config& config); + bool initEventsAndMetrics(const Config& config); + void initEvents(const std::set& eventNames); + void initMetrics(const std::set& metricNames); + bool initEventGroups(); + + PercentileList initPercentiles(const std::vector& percentiles) { + PercentileList res; + res.reserve(percentiles.size()); + for (int p : percentiles) { + res.emplace_back(p, SampleValue(0)); + } + return res; + } + + // Notify listeners of collected samples + void dispatchSamples( + const Config& config, + const std::vector>& loggers, + int report_nr); + + void eraseSamples(int count) { + for (auto& pair : events_) { + pair.second.eraseSamples(count); + } + } + + void updateLoggers(Config& config, Config* on_demand_config); + + // Print all collected samples since last clear. + void printAllSamples(std::ostream& s, CUdevice device) const; + + // Calls to CUPTI is encapsulated behind these interfaces + std::unique_ptr cuptiEvents_; + std::unique_ptr cuptiMetrics_; + // The CUpti API reports event IDs, we must map them to our event objects + std::map events_; + // List of metrics + std::vector metrics_; + // The countert sets needed to collect all counters + std::vector sets_; + // The event group set object returned by Cupti. + // Saved s.t. we can call cuptiEventGroupSetsDestroy to free memory when + // the object is no longer needed. + CUpti_EventGroupSets* eventGroupSets_ = nullptr; + // Current multiplexed counter set + int curEnabledSet_{0}; + + std::unique_ptr config_; + std::unique_ptr onDemandConfig_; + std::unique_ptr mergedConfig_; + int baseSamples_{0}; + int onDemandSamples_{0}; + + // Shared between profiler threads + // Vectors are read-only but calling loggers require lock + const std::vector>& loggers_; + const std::vector>& onDemandLoggers_; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/EventProfilerController.cpp b/plugins/tensorboard-plugins/libkineto/src/EventProfilerController.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0427cc7a90cbc49d31262bcce63f1f81c5b6293f --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/EventProfilerController.cpp @@ -0,0 +1,423 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
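+// Lifecycle sketch (illustrative): one controller is kept per CUDA context.
+// start() constructs the controller and spawns profilerLoop() on a worker
+// thread; stop() destroys it, joining the thread:
+//   EventProfilerController::start(ctx, configLoader);
+//   /* ... event profiling runs in the background ... */
+//   EventProfilerController::stop(ctx);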
+ +#include "EventProfilerController.h" + +#include +#include +#include + +#include "ConfigLoader.h" +#include "CuptiEventApi.h" +#include "CuptiMetricApi.h" +#include "EventProfiler.h" +#include "output_csv.h" + +#include "Logger.h" +#include "ThreadUtil.h" + +using namespace std::chrono; +using std::unique_ptr; +using std::vector; + +namespace KINETO_NAMESPACE { + +namespace { + +vector(const Config&)>>& +loggerFactories() { + static vector(const Config&)>> + factories; + return factories; +} + +vector(const Config&)>>& +onDemandLoggerFactories() { + static vector(const Config&)>> + factories; + return factories; +} + +vector> makeLoggers(const Config& config) { + vector> loggers; + for (const auto& factory : loggerFactories()) { + loggers.push_back(factory(config)); + } + loggers.push_back(std::make_unique()); + loggers.push_back(std::make_unique()); + return loggers; +} + +vector> makeOnDemandLoggers( + const Config& config) { + vector> loggers; + for (const auto& factory : onDemandLoggerFactories()) { + loggers.push_back(factory(config)); + } + loggers.push_back(std::make_unique()); + return loggers; +} + +vector>& loggers(const Config& config) { + static auto res = makeLoggers(config); + return res; +} + +vector>& onDemandLoggers( + const Config& config) { + static auto res = makeOnDemandLoggers(config); + return res; +} + +} // anon namespace + +// Keep an eye on profiling threads. +// We've observed deadlocks in Cuda11 in libcuda / libcupti.. +namespace detail { + +class HeartbeatMonitor { + + public: + ~HeartbeatMonitor() { + stopMonitoring(); + } + + static HeartbeatMonitor& instance() { + static HeartbeatMonitor monitor; + return monitor; + } + + void profilerHeartbeat() { + int32_t tid = systemThreadId(); + std::lock_guard lock(mutex_); + profilerAliveMap_[tid]++; + } + + void setPeriod(seconds period) { + { + std::lock_guard lock(mutex_); + if (period_ == period) { + return; + } + period_ = period; + } + if (period == seconds(0)) { + stopMonitoring(); + } else { + startMonitoring(); + } + } + + private: + HeartbeatMonitor() = default; + + void monitorLoop() { + std::unique_lock lock(mutex_); + while(!stopMonitor_) { + auto cv_status = condVar_.wait_for(lock, seconds(period_)); + // Don't perform check on spurious wakeup or on notify + if (cv_status == std::cv_status::timeout) { + for (auto& pair : profilerAliveMap_) { + int32_t tid = pair.first; + int& i = pair.second; + if (i == 0) { + LOG(ERROR) << "Thread " << tid << " appears stuck!"; + } + i = 0; + } + } + } + } + + void startMonitoring() { + if (!monitorThread_) { + VLOG(0) << "Starting monitoring thread"; + stopMonitor_ = false; + monitorThread_ = std::make_unique( + &HeartbeatMonitor::monitorLoop, this); + } + } + + void stopMonitoring() { + if (monitorThread_) { + VLOG(0) << "Stopping monitoring thread"; + stopMonitor_ = true; + condVar_.notify_one(); + monitorThread_->join(); + monitorThread_ = nullptr; + VLOG(0) << "Monitoring thread terminated"; + } + } + + std::map profilerAliveMap_; + std::unique_ptr monitorThread_; + std::mutex mutex_; + std::condition_variable condVar_; + std::atomic_bool stopMonitor_{false}; + seconds period_{0}; +}; + +} // namespace detail + +namespace { +// Profiler map singleton +std::map>& profilerMap() { + static std::map> instance; + return instance; +} + +void reportLateSample( + int sleepMs, + int sampleMs, + int reportMs, + int reprogramMs) { + LOG_EVERY_N(WARNING, 10) << "Lost sample due to delays (ms): " << sleepMs + << ", " << sampleMs << ", " << reportMs << ", " + << reprogramMs; 
+} + +void configureHeartbeatMonitor( + detail::HeartbeatMonitor& monitor, const Config& base, const Config* onDemand) { + seconds base_period = + base.eventProfilerHeartbeatMonitorPeriod(); + seconds on_demand_period = !onDemand ? seconds(0) : + onDemand->eventProfilerHeartbeatMonitorPeriod(); + monitor.setPeriod( + on_demand_period > seconds(0) ? on_demand_period : base_period); +} + +} // anon namespace + +void EventProfilerController::addLoggerFactory( + std::function(const Config&)> factory) { + loggerFactories().push_back(factory); +} + +void EventProfilerController::addOnDemandLoggerFactory( + std::function(const Config&)> factory) { + onDemandLoggerFactories().push_back(factory); +} + +EventProfilerController::EventProfilerController( + CUcontext context, + ConfigLoader& configLoader, + detail::HeartbeatMonitor& heartbeatMonitor) + : configLoader_(configLoader), heartbeatMonitor_(heartbeatMonitor) { + auto cupti_events = std::make_unique(context); + auto cupti_metrics = + std::make_unique(cupti_events->device()); + configLoader_.addHandler( + ConfigLoader::ConfigKind::EventProfiler, this); + auto config = configLoader.getConfigCopy(); + profiler_ = std::make_unique( + std::move(cupti_events), + std::move(cupti_metrics), + loggers(*config), + onDemandLoggers(*config)); + profilerThread_ = std::make_unique( + &EventProfilerController::profilerLoop, this); +} + +EventProfilerController::~EventProfilerController() { + if (profilerThread_) { + // signaling termination of the profiler loop + stopRunloop_ = true; + profilerThread_->join(); + } + configLoader_.removeHandler( + ConfigLoader::ConfigKind::EventProfiler, this); + VLOG(0) << "Stopped event profiler"; +} + +// Must be called under lock +void EventProfilerController::start(CUcontext ctx, ConfigLoader& configLoader) { + profilerMap()[ctx] = unique_ptr( + new EventProfilerController( + ctx, configLoader, detail::HeartbeatMonitor::instance())); +} + +// Must be called under lock +void EventProfilerController::stop(CUcontext ctx) { + profilerMap()[ctx] = nullptr; +} + +bool EventProfilerController::canAcceptConfig() { + std::lock_guard guard(mutex_); + return !newOnDemandConfig_; +} + +void EventProfilerController::acceptConfig(const Config& config) { + if (config.eventProfilerOnDemandDuration().count() == 0) { + // Ignore - not for this profiler + return; + } + std::lock_guard guard(mutex_); + if (newOnDemandConfig_) { + LOG(ERROR) << "On demand request already queued - ignoring new request"; + return; + } + newOnDemandConfig_ = config.clone(); + LOG(INFO) << "Received new on-demand config"; +} + +bool EventProfilerController::enableForDevice(Config& cfg) { + // FIXME: Use device unique id! + if (!cfg.eventProfilerEnabledForDevice(profiler_->device())) { + return false; + } + // context count includes the new context + int instances = configLoader_.contextCountForGpu(profiler_->device()); + VLOG(0) << "Device context count: " << instances; + return instances >= 0 && instances <= cfg.maxEventProfilersPerGpu(); +} + +void EventProfilerController::profilerLoop() { + // We limit the number of profilers that can exist per GPU + auto config = configLoader_.getConfigCopy(); + if (!enableForDevice(*config)) { + VLOG(0) << "Not starting EventProfiler - profilers for GPU " + << profiler_->device() << " exceeds profilers per GPU limit (" + << config->maxEventProfilersPerGpu() << ")"; + return; + } + + if (!profiler_->setContinuousMode()) { + VLOG(0) << "Continuous mode not supported for GPU " + << profiler_->device() << ". 
+    return;
+  }
+
+  VLOG(0) << "Starting Event Profiler for GPU " << profiler_->device();
+  setThreadName("CUPTI Event Profiler");
+
+  time_point<system_clock> next_sample_time;
+  time_point<system_clock> next_report_time;
+  time_point<system_clock> next_on_demand_report_time;
+  time_point<system_clock> next_multiplex_time;
+  std::unique_ptr<Config> on_demand_config = nullptr;
+  bool reconfigure = true;
+  bool restart = true;
+  int report_count = 0;
+  int on_demand_report_count = 0;
+  while (!stopRunloop_) {
+    heartbeatMonitor_.profilerHeartbeat();
+    if (configLoader_.hasNewConfig(*config)) {
+      config = configLoader_.getConfigCopy();
+      VLOG(0) << "Base config changed";
+      report_count = 0;
+      reconfigure = true;
+    }
+
+    auto now = system_clock::now();
+    if (on_demand_config &&
+        now > (on_demand_config->eventProfilerOnDemandStartTime() +
+               on_demand_config->eventProfilerOnDemandDuration())) {
+      on_demand_config = nullptr;
+      LOG(INFO) << "On-demand profiling complete";
+      reconfigure = true;
+    }
+
+    if (!profiler_->isOnDemandActive()) {
+      std::lock_guard<std::mutex> lock(mutex_);
+      if (newOnDemandConfig_) {
+        VLOG(0) << "Received on-demand config, reconfiguring";
+        on_demand_config = std::move(newOnDemandConfig_);
+        reconfigure = true;
+        on_demand_report_count = 0;
+      }
+    }
+
+    if (reconfigure) {
+      try {
+        profiler_->configure(*config, on_demand_config.get());
+      } catch (const std::exception& ex) {
+        LOG(ERROR) << "Encountered error while configuring event profiler: "
+                   << ex.what();
+        // Exit profiling entirely when encountering an error here
+        // as it indicates a serious problem or bug.
+        break;
+      }
+      configureHeartbeatMonitor(
+          heartbeatMonitor_, *config, on_demand_config.get());
+      reconfigure = false;
+      restart = true;
+    }
+
+    if (restart) {
+      now = system_clock::now();
+      next_sample_time = now + profiler_->samplePeriod();
+      next_report_time = now + profiler_->reportPeriod();
+      if (profiler_->isOnDemandActive()) {
+        next_on_demand_report_time = now + profiler_->onDemandReportPeriod();
+      }
+      next_multiplex_time = now + profiler_->multiplexPeriod();
+      // Collect an initial sample and throw it away
+      // The next sample is the first valid one
+      profiler_->collectSample();
+      profiler_->clearSamples();
+      restart = false;
+    }
+
+    auto start_sleep = now;
+    while (now < next_sample_time) {
+      /* sleep override */
+      std::this_thread::sleep_for(next_sample_time - now);
+      now = system_clock::now();
+    }
+    int sleep_time = duration_cast<milliseconds>(now - start_sleep).count();
+
+    auto start_sample = now;
+    profiler_->collectSample();
+    now = system_clock::now();
+    int sample_time = duration_cast<milliseconds>(now - start_sample).count();
+
+    next_sample_time += profiler_->samplePeriod();
+    if (now > next_sample_time) {
+      reportLateSample(sleep_time, sample_time, 0, 0);
+      restart = true;
+      continue;
+    }
+
+    auto start_report = now;
+    if (now > next_report_time) {
+      VLOG(1) << "Report #" << report_count++;
+      profiler_->reportSamples();
+      next_report_time += profiler_->reportPeriod();
+    }
+    if (profiler_->isOnDemandActive() && now > next_on_demand_report_time) {
+      VLOG(1) << "OnDemand Report #" << on_demand_report_count++;
+      profiler_->reportOnDemandSamples();
+      next_on_demand_report_time += profiler_->onDemandReportPeriod();
+    }
+    profiler_->eraseReportedSamples();
+    now = system_clock::now();
+    int report_time = duration_cast<milliseconds>(now - start_report).count();
+
+    if (now > next_sample_time) {
+      reportLateSample(sleep_time, sample_time, report_time, 0);
+      restart = true;
+      continue;
+    }
+
+    auto start_multiplex = now;
+    if (profiler_->multiplexEnabled() && now > next_multiplex_time) {
+      profiler_->enableNextCounterSet();
+      next_multiplex_time += profiler_->multiplexPeriod();
+    }
+    now = system_clock::now();
+    int multiplex_time =
+        duration_cast<milliseconds>(now - start_multiplex).count();
+
+    if (now > next_sample_time) {
+      reportLateSample(sleep_time, sample_time, report_time, multiplex_time);
+      restart = true;
+    }
+
+    VLOG(0) << "Runloop execution time: "
+            << duration_cast<milliseconds>(now - start_sample).count() << "ms";
+  }
+
+  VLOG(0) << "Device " << profiler_->device()
+          << ": Exited event profiling loop";
+}
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/EventProfilerController.h b/plugins/tensorboard-plugins/libkineto/src/EventProfilerController.h
new file mode 100644
index 0000000000000000000000000000000000000000..007a82faa9289ada9256d09907167471eb6520b9
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/EventProfilerController.h
@@ -0,0 +1,63 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <atomic>
+#include <functional>
+#include <memory>
+#include <mutex>
+#include <thread>
+
+#include <cuda.h>
+
+#include "ConfigLoader.h"
+
+namespace KINETO_NAMESPACE {
+
+class Config;
+class ConfigLoader;
+class EventProfiler;
+class SampleListener;
+
+namespace detail {
+class HeartbeatMonitor;
+}
+
+class EventProfilerController : public ConfigLoader::ConfigHandler {
+ public:
+  EventProfilerController(const EventProfilerController&) = delete;
+  EventProfilerController& operator=(const EventProfilerController&) = delete;
+
+  ~EventProfilerController();
+
+  static void start(CUcontext ctx, ConfigLoader& configLoader);
+  static void stop(CUcontext ctx);
+
+  static void addLoggerFactory(
+      std::function<std::unique_ptr<SampleListener>(const Config&)> factory);
+
+  static void addOnDemandLoggerFactory(
+      std::function<std::unique_ptr<SampleListener>(const Config&)> factory);
+
+  bool canAcceptConfig() override;
+
+  void acceptConfig(const Config& config) override;
+
+ private:
+  explicit EventProfilerController(
+      CUcontext context,
+      ConfigLoader& configLoader,
+      detail::HeartbeatMonitor& heartbeatMonitor);
+  bool enableForDevice(Config& cfg);
+  void profilerLoop();
+
+  ConfigLoader& configLoader_;
+  std::unique_ptr<Config> newOnDemandConfig_;
+  detail::HeartbeatMonitor& heartbeatMonitor_;
+  std::unique_ptr<EventProfiler> profiler_;
+  std::unique_ptr<std::thread> profilerThread_;
+  std::atomic_bool stopRunloop_{false};
+  std::mutex mutex_;
+};
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/GenericTraceActivity.cpp b/plugins/tensorboard-plugins/libkineto/src/GenericTraceActivity.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..4e00b1256c4fa301e288e619ee9ef8c56c8b8569
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/GenericTraceActivity.cpp
@@ -0,0 +1,10 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include "GenericTraceActivity.h"
+#include "output_base.h"
+
+namespace libkineto {
+  void GenericTraceActivity::log(ActivityLogger& logger) const {
+    logger.handleGenericActivity(*this);
+  }
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/src/ILoggerObserver.cpp b/plugins/tensorboard-plugins/libkineto/src/ILoggerObserver.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..f0106578811837c9cc677def30d5697d43a94221
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/ILoggerObserver.cpp
@@ -0,0 +1,54 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
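+//
+// Maps LoggerOutputType values to their string names and back.
+// A minimal usage sketch (hypothetical call site, not part of this diff):
+//
+//   LoggerOutputType t = toLoggerOutputType("WARNING");
+//   const char* name = toString(t);  // "WARNING"
+//
+// toLoggerOutputType() throws std::invalid_argument for unknown names.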
+ +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "ILoggerObserver.h" + +#if !USE_GOOGLE_LOG + +#include +#include + +namespace libkineto { + +struct LoggerTypeName { + constexpr LoggerTypeName(const char* n, LoggerOutputType t) : name(n), type(t) {}; + const char* name; + LoggerOutputType type; +}; + +static constexpr std::array LoggerMap{{ + {"VERBOSE", LoggerOutputType::VERBOSE}, + {"INFO", LoggerOutputType::INFO}, + {"WARNING", LoggerOutputType::WARNING}, + {"ERROR", LoggerOutputType::ERROR}, + {"STAGE", LoggerOutputType::STAGE}, + {"???", LoggerOutputType::ENUM_COUNT} +}}; + +static constexpr bool matchingOrder(int idx = 0) { + return LoggerMap[idx].type == LoggerOutputType::ENUM_COUNT || + ((idx == (int) LoggerMap[idx].type) && matchingOrder(idx + 1)); +} +static_assert(matchingOrder(), "LoggerTypeName map is out of order"); + +const char* toString(LoggerOutputType t) { + if(t < VERBOSE || t >= ENUM_COUNT) { + return LoggerMap[ENUM_COUNT].name; + } + return LoggerMap[(int)t].name; +} + +LoggerOutputType toLoggerOutputType(const std::string& str) { + for (int i = 0; i < LoggerTypeCount; i++) { + if (str == LoggerMap[i].name) { + return LoggerMap[i].type; + } + } + throw std::invalid_argument(fmt::format("Invalid activity type: {}", str)); +} + +} // namespace libkineto + + +#endif // !USE_GOOGLE_LOG diff --git a/plugins/tensorboard-plugins/libkineto/src/Logger.cpp b/plugins/tensorboard-plugins/libkineto/src/Logger.cpp new file mode 100644 index 0000000000000000000000000000000000000000..dbde765f51f7a5f03c31a9c79e6d00ce9a2070b6 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/Logger.cpp @@ -0,0 +1,136 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "Logger.h" +#include "ILoggerObserver.h" + +#ifndef USE_GOOGLE_LOG + +#include +#include +#include +#include +#include + +#include +#include + +#include "ThreadUtil.h" + +namespace KINETO_NAMESPACE { + +std::atomic_int Logger::severityLevel_{VERBOSE}; +std::atomic_int Logger::verboseLogLevel_{-1}; +std::atomic Logger::verboseLogModules_{~0ull}; + +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wglobal-constructors" +std::mutex Logger::loggerObserversMutex_; +#pragma GCC diagnostic pop + + +Logger::Logger(int severity, int line, const char* filePath, int errnum) + : buf_(), out_(LIBKINETO_DBG_STREAM), errnum_(errnum), messageSeverity_(severity) { + buf_ << toString((LoggerOutputType) severity) << ":"; + + const auto tt = + std::chrono::system_clock::to_time_t(std::chrono::system_clock::now()); + const char* file = strrchr(filePath, '/'); + buf_ << fmt::format("{:%Y-%m-%d %H:%M:%S}", fmt::localtime(tt)) << " " + << processId() << ":" << systemThreadId() << " " + << (file ? file + 1 : filePath) << ":" << line << "] "; +} + +Logger::~Logger() { +#ifdef __linux__ + if (errnum_ != 0) { + thread_local char buf[1024]; + buf_ << " : " << strerror_r(errnum_, buf, sizeof(buf)); + } +#endif + + { + std::lock_guard guard(loggerObserversMutex_); + for (auto* observer : loggerObservers()) { + // Output to observers. Current Severity helps keep track of which bucket the output goes. + if (observer) { + observer->write(buf_.str(), (LoggerOutputType) messageSeverity_); + } + } + } + + // Finally, print to terminal or console. 
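+  // (Note the ordering: registered observers above see the message first,
+  //  then it is written to LIBKINETO_DBG_STREAM, which Logger.h defines
+  //  as std::cerr.)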
+ out_ << buf_.str() << std::endl; +} + +void Logger::setVerboseLogModules(const std::vector& modules) { + uint64_t mask = 0; + if (modules.empty()) { + mask = ~0ull; + } else { + for (const std::string& name : modules) { + mask |= hash(name.c_str()); + } + } + verboseLogModules_ = mask; +} + +void Logger::addLoggerObserver(ILoggerObserver* observer) { + if (observer == nullptr) { + return; + } + std::lock_guard guard(loggerObserversMutex_); + loggerObservers().insert(observer); +} + +void Logger::removeLoggerObserver(ILoggerObserver* observer) { + std::lock_guard guard(loggerObserversMutex_); + loggerObservers().erase(observer); +} + +void Logger::addLoggerObserverDevice(int64_t device) { + std::lock_guard guard(loggerObserversMutex_); + for (auto observer : loggerObservers()) { + observer->addDevice(device); + } +} + +void Logger::addLoggerObserverEventCount(int64_t count) { + std::lock_guard guard(loggerObserversMutex_); + for (auto observer : loggerObservers()) { + observer->addEventCount(count); + } +} + +void Logger::setLoggerObserverTraceDurationMS(int64_t duration) { + std::lock_guard guard(loggerObserversMutex_); + for (auto observer : loggerObservers()) { + observer->setTraceDurationMS(duration); + } +} + +void Logger::setLoggerObserverTraceID(const std::string& tid) { + std::lock_guard guard(loggerObserversMutex_); + for (auto observer : loggerObservers()) { + observer->setTraceID(tid); + } +} + +void Logger::setLoggerObserverGroupTraceID(const std::string& gtid) { + std::lock_guard guard(loggerObserversMutex_); + for (auto observer : loggerObservers()) { + observer->setGroupTraceID(gtid); + } +} + +void Logger::addLoggerObserverDestination(const std::string& dest) { + std::lock_guard guard(loggerObserversMutex_); + for (auto observer : loggerObservers()) { + observer->addDestination(dest); + } +} + +} // namespace KINETO_NAMESPACE + +#endif // USE_GOOGLE_LOG diff --git a/plugins/tensorboard-plugins/libkineto/src/Logger.h b/plugins/tensorboard-plugins/libkineto/src/Logger.h new file mode 100644 index 0000000000000000000000000000000000000000..868fc84b9f4ee86d88805bed81468a5df6988257 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/Logger.h @@ -0,0 +1,244 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
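+//
+// Minimal usage sketch for the macros defined below when USE_GOOGLE_LOG
+// is not set (hypothetical call sites, not part of this diff):
+//
+//   LOG(INFO) << "starting";       // filtered by Logger::severityLevel()
+//   VLOG(1) << "detail";           // gated on verbose level + module hash
+//   PLOG(ERROR) << "open failed";  // appends strerror(errno) on Linux
+//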
+ +#pragma once + +#include + +#define LIBKINETO_DBG_STREAM std::cerr + +#if USE_GOOGLE_LOG + +#include + +#define SET_LOG_SEVERITY_LEVEL(level) +#define SET_LOG_VERBOSITY_LEVEL(level, modules) +#define LOGGER_OBSERVER_ADD_DEVICE(device) +#define LOGGER_OBSERVER_ADD_EVENT_COUNT(count) +#define LOGGER_OBSERVER_SET_TRACE_DURATION_MS(duration) +#define LOGGER_OBSERVER_SET_TRACE_ID(tid) +#define LOGGER_OBSERVER_SET_GROUP_TRACE_ID(gtid) +#define LOGGER_OBSERVER_ADD_DESTINATION(dest) +#define UST_LOGGER_MARK_COMPLETED(stage) + +#else // !USE_GOOGLE_LOG +#include +#include +#include +#include +#include +#include +#include +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "ILoggerObserver.h" + +#ifdef _MSC_VER +// unset a predefined ERROR (windows) +#undef ERROR +#endif // _MSC_VER + +namespace KINETO_NAMESPACE { + +class Logger { + public: + Logger(int severity, int line, const char* filePath, int errnum = 0); + ~Logger(); + + inline std::ostream& stream() { + return buf_; + } + + static inline void setSeverityLevel(int level) { + severityLevel_ = level; + } + + static inline int severityLevel() { + return severityLevel_; + } + + static inline void setVerboseLogLevel(int level) { + verboseLogLevel_ = level; + } + + static inline int verboseLogLevel() { + return verboseLogLevel_; + } + + // This is constexpr so that the hash for a file name is computed at compile + // time when used in the VLOG macros. + // This way, there is no string comparison for matching VLOG modules, + // only a comparison of pre-computed hashes. + // No fancy hashing needed here. It's pretty inefficient (one character + // at a time) but the strings are not large and it's not in the critical path. + static constexpr uint64_t rol(uint64_t val, int amount) { + return val << amount | val >> (63 - amount); + } + static constexpr uint64_t hash(const char* s) { + uint64_t hash = hash_rec(s, 0); + return hash & rol(0x41a0240682483014ull, hash & 63); + } + static constexpr uint64_t hash_rec(const char* s, int off) { + // Random constants! + return (!s[off] ? 57ull : (hash_rec(s, off + 1) * 293) ^ s[off]); + } + static constexpr const char* basename(const char* s, int off = 0) { + return !s[off] + ? s + : s[off] == '/' ? 
basename(&s[off + 1]) : basename(s, off + 1); + } + + static void setVerboseLogModules(const std::vector& modules); + + static inline uint64_t verboseLogModules() { + return verboseLogModules_; + } + + static void clearLoggerObservers() { + std::lock_guard g(loggerObserversMutex_); + loggerObservers().clear(); + } + + static void addLoggerObserver(ILoggerObserver* observer); + + static void removeLoggerObserver(ILoggerObserver* observer); + + static void addLoggerObserverDevice(int64_t device); + + static void addLoggerObserverEventCount(int64_t count); + + static void setLoggerObserverTraceDurationMS(int64_t duration); + + static void setLoggerObserverTraceID(const std::string& tid); + + static void setLoggerObserverGroupTraceID(const std::string& gtid); + + static void addLoggerObserverDestination(const std::string& dest); + + private: + std::stringstream buf_; + std::ostream& out_; + int errnum_; + int messageSeverity_; + static std::atomic_int severityLevel_; + static std::atomic_int verboseLogLevel_; + static std::atomic verboseLogModules_; + static std::set& loggerObservers() { + static auto* inst = new std::set(); + return *inst; + } + static std::mutex loggerObserversMutex_; +}; + +class VoidLogger { + public: + VoidLogger() {} + void operator&(std::ostream&) {} +}; + +} // namespace KINETO_NAMESPACE + +#ifdef LOG // Undefine in case these are already defined (quite likely) +#undef LOG +#undef LOG_IS_ON +#undef LOG_IF +#undef LOG_EVERY_N +#undef LOG_IF_EVERY_N +#undef DLOG +#undef DLOG_IF +#undef VLOG +#undef VLOG_IF +#undef VLOG_EVERY_N +#undef VLOG_IS_ON +#undef DVLOG +#undef LOG_FIRST_N +#undef CHECK +#undef DCHECK +#undef DCHECK_EQ +#undef PLOG +#undef PCHECK +#undef LOG_OCCURRENCES +#endif + +#define LOG_IS_ON(severity) \ + (severity >= libkineto::Logger::severityLevel()) + +#define LOG_IF(severity, condition) \ + !(LOG_IS_ON(severity) && (condition)) ? (void)0 : libkineto::VoidLogger() & \ + libkineto::Logger(severity, __LINE__, __FILE__).stream() + +#define LOG(severity) LOG_IF(severity, true) + +#define LOCAL_VARNAME_CONCAT(name, suffix) _##name##suffix##_ + +#define LOCAL_VARNAME(name) LOCAL_VARNAME_CONCAT(name, __LINE__) + +#define LOG_OCCURRENCES LOCAL_VARNAME(log_count) + +#define LOG_EVERY_N(severity, rate) \ + static int LOG_OCCURRENCES = 0; \ + LOG_IF(severity, LOG_OCCURRENCES++ % rate == 0) \ + << "(x" << LOG_OCCURRENCES << ") " + +template +struct __to_constant__ { + static const uint64_t val = n; +}; +#define FILENAME_HASH \ + __to_constant__::val +#define VLOG_IS_ON(verbosity) \ + (libkineto::Logger::verboseLogLevel() >= verbosity && \ + (libkineto::Logger::verboseLogModules() & FILENAME_HASH) == FILENAME_HASH) + +#define VLOG_IF(verbosity, condition) \ + LOG_IF(VERBOSE, VLOG_IS_ON(verbosity) && (condition)) + +#define VLOG(verbosity) VLOG_IF(verbosity, true) + +#define VLOG_EVERY_N(verbosity, rate) \ + static int LOG_OCCURRENCES = 0; \ + VLOG_IF(verbosity, LOG_OCCURRENCES++ % rate == 0) \ + << "(x" << LOG_OCCURRENCES << ") " + +#define PLOG(severity) \ + libkineto::Logger(severity, __LINE__, __FILE__, errno).stream() + +#define SET_LOG_SEVERITY_LEVEL(level) \ + libkineto::Logger::setSeverityLevel(level) + +#define SET_LOG_VERBOSITY_LEVEL(level, modules) \ + libkineto::Logger::setVerboseLogLevel(level); \ + libkineto::Logger::setVerboseLogModules(modules) + +// Logging the set of devices the trace is collect on. 
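+// (The LOGGER_OBSERVER_* macros below forward metadata to all registered
+// ILoggerObserver instances; under USE_GOOGLE_LOG they are defined as
+// no-ops near the top of this header.)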
+#define LOGGER_OBSERVER_ADD_DEVICE(device_count) \ + libkineto::Logger::addLoggerObserverDevice(device_count) + +// Incrementing the number of events collected by this trace. +#define LOGGER_OBSERVER_ADD_EVENT_COUNT(count) \ + libkineto::Logger::addLoggerObserverEventCount(count) + +// Record duration of trace in milliseconds. +#define LOGGER_OBSERVER_SET_TRACE_DURATION_MS(duration) \ + libkineto::Logger::setLoggerObserverTraceDurationMS(duration) + +// Record the trace id when given. +#define LOGGER_OBSERVER_SET_TRACE_ID(tid) \ + libkineto::Logger::setLoggerObserverTraceID(tid) + +// Record the group trace id when given. +#define LOGGER_OBSERVER_SET_GROUP_TRACE_ID(gtid) \ + libkineto::Logger::setLoggerObserverGroupTraceID(gtid) + +// Log the set of destinations the trace is sent to. +#define LOGGER_OBSERVER_ADD_DESTINATION(dest) \ + libkineto::Logger::addLoggerObserverDestination(dest) + +// UST Logger Semantics to describe when a stage is complete. +#define UST_LOGGER_MARK_COMPLETED(stage) \ + LOG(libkineto::LoggerOutputType::STAGE) << "Completed Stage: " << stage + +#endif // USE_GOOGLE_LOG diff --git a/plugins/tensorboard-plugins/libkineto/src/LoggerCollector.h b/plugins/tensorboard-plugins/libkineto/src/LoggerCollector.h new file mode 100644 index 0000000000000000000000000000000000000000..bb05aab218dc137cfe2f0107694a049ee2ea6508 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/LoggerCollector.h @@ -0,0 +1,70 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#if !USE_GOOGLE_LOG + +#include +#include +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "ILoggerObserver.h" + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +class LoggerCollector : public ILoggerObserver { + public: + LoggerCollector() : buckets_() {} + + void write(const std::string& message, LoggerOutputType ot = ERROR) override { + // Skip STAGE output type which is only used by USTLoggerCollector. + if (ot != STAGE) { + buckets_[ot].push_back(message); + } + } + + const std::map> extractCollectorMetadata() override { + return buckets_; + } + + void reset() override { + trace_duration_ms = 0; + event_count = 0; + destinations.clear(); + } + + void addDevice(const int64_t device) override { + devices.insert(device); + } + + void setTraceDurationMS(const int64_t duration) override { + trace_duration_ms = duration; + } + + void addEventCount(const int64_t count) override { + event_count += count; + } + + void addDestination(const std::string& dest) override { + destinations.insert(dest); + } + + protected: + std::map> buckets_; + + // These are useful metadata to collect from CUPTIActivityProfiler for internal tracking. + std::set devices; + int64_t trace_duration_ms{0}; + std::atomic event_count{0}; + std::set destinations; + +}; + +} // namespace KINETO_NAMESPACE + +#endif // !USE_GOOGLE_LOG diff --git a/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityApi.cpp b/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityApi.cpp new file mode 100644 index 0000000000000000000000000000000000000000..73eff13e2a08bcfecefb03f5b229bde89b7e96cb --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityApi.cpp @@ -0,0 +1,569 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
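+//
+// roctracer timestamps are taken from CLOCK_MONOTONIC, so
+// processActivities() below estimates a fixed offset to CLOCK_REALTIME by
+// reading REALTIME, MONOTONIC, REALTIME back-to-back and averaging the two
+// REALTIME samples.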
+ +#include "RoctracerActivityApi.h" + +#include +#include +#include + +#include "Demangle.h" +#include "output_base.h" +#include "ThreadUtil.h" + +typedef uint64_t timestamp_t; + +static timestamp_t timespec_to_ns(const timespec& time) { + return ((timestamp_t)time.tv_sec * 1000000000) + time.tv_nsec; + } + +using namespace std::chrono; + +namespace KINETO_NAMESPACE { + +constexpr size_t kBufSize(2 * 1024 * 1024); + +RoctracerActivityApi& RoctracerActivityApi::singleton() { + static RoctracerActivityApi instance; + return instance; +} + +RoctracerActivityApi::RoctracerActivityApi() { + gpuTraceBuffers_ = std::make_unique>(); +} + +RoctracerActivityApi::~RoctracerActivityApi() { + disableActivities(std::set()); + endTracing(); +} + +void RoctracerActivityApi::pushCorrelationID(int id, CorrelationFlowType type) { +#ifdef HAS_ROCTRACER + if (!singleton().externalCorrelationEnabled_) { + return; + } + // placeholder +#endif +} + +void RoctracerActivityApi::popCorrelationID(CorrelationFlowType type) { +#ifdef HAS_ROCTRACER + if (!singleton().externalCorrelationEnabled_) { + return; + } + // placeholder +#endif +} + +void RoctracerActivityApi::setMaxBufferSize(int size) { + maxGpuBufferCount_ = 1 + size / kBufSize; +} + +int RoctracerActivityApi::processActivities( + ActivityLogger& logger) { + // Find offset to map from monotonic clock to system clock. + // This will break time-ordering of events but is status quo. + + timespec t0, t1, t00; + clock_gettime(CLOCK_REALTIME, &t0); + clock_gettime(CLOCK_MONOTONIC, &t1); + clock_gettime(CLOCK_REALTIME, &t00); + + const timestamp_t toffset = (timespec_to_ns(t0) >> 1) + (timespec_to_ns(t00) >> 1) - timespec_to_ns(t1); + + int count = 0; + + // Basic Api calls + + for (auto &item : rows_) { + GenericTraceActivity a; + a.startTime = (item.begin + toffset) / 1000; + a.endTime = (item.end + toffset) / 1000; + a.id = item.id; + a.device = item.pid; + a.resource = item.tid; + a.activityType = ActivityType::CUDA_RUNTIME; + a.activityName = std::string(roctracer_op_string(ACTIVITY_DOMAIN_HIP_API, item.cid, 0)); + a.flow.id = item.id; + a.flow.type = kLinkAsyncCpuGpu; + a.flow.start = true; + + logger.handleGenericActivity(a); + ++count; + } + + // Malloc/Free calls + for (auto &item : mallocRows_) { + GenericTraceActivity a; + a.startTime = (item.begin + toffset) / 1000; + a.endTime = (item.end + toffset) / 1000; + a.id = item.id; + a.device = item.pid; + a.resource = item.tid; + a.activityType = ActivityType::CUDA_RUNTIME; + a.activityName = std::string(roctracer_op_string(ACTIVITY_DOMAIN_HIP_API, item.cid, 0)); + a.flow.id = item.id; + a.flow.type = kLinkAsyncCpuGpu; + a.flow.start = true; + + a.addMetadata("ptr", item.ptr); + if (item.cid == HIP_API_ID_hipMalloc) { + a.addMetadata("size", item.size); + } + + logger.handleGenericActivity(a); + ++count; + } + + // HipMemcpy calls + for (auto &item : copyRows_) { + GenericTraceActivity a; + a.startTime = (item.begin + toffset) / 1000; + a.endTime = (item.end + toffset) / 1000; + a.id = item.id; + a.device = item.pid; + a.resource = item.tid; + a.activityType = ActivityType::CUDA_RUNTIME; + a.activityName = std::string(roctracer_op_string(ACTIVITY_DOMAIN_HIP_API, item.cid, 0)); + a.flow.id = item.id; + a.flow.type = kLinkAsyncCpuGpu; + a.flow.start = true; + + a.addMetadata("src", item.src); + a.addMetadata("dst", item.dst); + a.addMetadata("size", item.size); + a.addMetadata("kind", item.kind); + if ((item.cid == HIP_API_ID_hipMemcpyAsync) || (item.cid == HIP_API_ID_hipMemcpyWithStream)) { + 
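+      // Only the async variants carry a stream argument; plain hipMemcpy
+      // rows are recorded with a null-stream placeholder (see api_callback
+      // below).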
a.addMetadata("stream", fmt::format("{}", reinterpret_cast(item.stream))); + } + + logger.handleGenericActivity(a); + ++count; + } + + // Kernel Launch Api calls + + for (auto &item : kernelRows_) { + GenericTraceActivity a; + a.startTime = (item.begin + toffset) / 1000; + a.endTime = (item.end + toffset) / 1000; + a.id = item.id; + a.device = item.pid; + a.resource = item.tid; + a.activityType = ActivityType::CUDA_RUNTIME; + a.activityName = std::string(roctracer_op_string(ACTIVITY_DOMAIN_HIP_API, item.cid, 0)); + a.flow.id = item.id; + a.flow.type = kLinkAsyncCpuGpu; + a.flow.start = true; + + if (item.functionAddr != nullptr) { + a.addMetadataQuoted( + "kernel", demangle(hipKernelNameRefByPtr(item.functionAddr, item.stream))); + } + else if (item.function != nullptr) { + a.addMetadataQuoted( + "kernel", demangle(hipKernelNameRef(item.function))); + } + a.addMetadata("grid dim", fmt::format("[{}, {}, {}]", item.gridX, item.gridY, item.gridZ)); + a.addMetadata("block dim", fmt::format("[{}, {}, {}]", item.workgroupX, item.workgroupY, item.workgroupZ)); + a.addMetadata("shared size", item.groupSegmentSize); + a.addMetadata("stream", fmt::format("{}", reinterpret_cast(item.stream))); + + // Stash launches to tie to the async ops + kernelLaunches_[a.id] = a; + + // Stash kernel names to tie to the async ops + std::string name; + if (item.functionAddr != nullptr) { + name = demangle(hipKernelNameRefByPtr(item.functionAddr, item.stream)); + } + else if (item.function != nullptr) { + name = demangle(hipKernelNameRef(item.function)); + } + if (!name.empty()) { + uint32_t string_id = reverseStrings_[name]; + if (string_id == 0) { + string_id = nextStringId_++; + reverseStrings_[name] = string_id; + strings_[string_id] = name; + } + kernelNames_[item.id] = string_id; + } + + logger.handleGenericActivity(a); + ++count; + } + + // Async Ops + + for (auto& buffer : *gpuTraceBuffers_) { + const roctracer_record_t* record = (const roctracer_record_t*)(buffer.data); + const roctracer_record_t* end_record = (const roctracer_record_t*)(buffer.data + buffer.validSize); + GenericTraceActivity a; + + while (record < end_record) { + if ((record->domain == ACTIVITY_DOMAIN_HIP_API) && (loggedIds_.contains(record->op))) { + const char *name = roctracer_op_string(record->domain, record->op, record->kind); + a.device = record->process_id; + a.resource = record->thread_id; + + a.startTime = (record->begin_ns + toffset) / 1000; + a.endTime = (record->end_ns + toffset) / 1000; + a.id = record->correlation_id; + + a.activityType = ActivityType::CUDA_RUNTIME; + a.activityName = std::string(name); + a.flow.id = record->correlation_id; + a.flow.type = kLinkAsyncCpuGpu; + a.flow.start = true; + + logger.handleGenericActivity(a); + ++count; + } + else if (record->domain == ACTIVITY_DOMAIN_HCC_OPS) { + // Overlay launch metadata for kernels + auto kit = kernelLaunches_.find(record->correlation_id); + if (kit != kernelLaunches_.end()) { + a = (*kit).second; + } + + const char *name = roctracer_op_string(record->domain, record->op, record->kind); + a.device = record->device_id; + a.resource = record->queue_id; + + a.startTime = (record->begin_ns + toffset) / 1000; + a.endTime = (record->end_ns + toffset) / 1000; + a.id = record->correlation_id; + + a.activityType = ActivityType::CONCURRENT_KERNEL; + a.activityName = std::string(name); + a.flow.id = record->correlation_id; + a.flow.type = kLinkAsyncCpuGpu; + + auto it = kernelNames_.find(record->correlation_id); + if (it != kernelNames_.end()) { + a.activityName = 
strings_[it->second]; + } + + logger.handleGenericActivity(a); + ++count; + } + + roctracer_next_record(record, &record); + } + } + return count; +} + +void RoctracerActivityApi::clearActivities() { + gpuTraceBuffers_->clear(); + rows_.clear(); + kernelRows_.clear(); + copyRows_.clear(); + mallocRows_.clear(); + kernelLaunches_.clear(); +} + +void RoctracerActivityApi::api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void* arg) +{ + RoctracerActivityApi *dis = &singleton(); + + if (domain == ACTIVITY_DOMAIN_HIP_API && dis->loggedIds_.contains(cid)) { + const hip_api_data_t* data = (const hip_api_data_t*)(callback_data); + + // Pack callbacks into row structures + + static timespec timestamp; // FIXME verify thread safety + + if (data->phase == ACTIVITY_API_PHASE_ENTER) { + clock_gettime(CLOCK_MONOTONIC, ×tamp); // record proper clock + } + else { // (data->phase == ACTIVITY_API_PHASE_EXIT) + timespec endTime; + timespec startTime { timestamp }; + clock_gettime(CLOCK_MONOTONIC, &endTime); // record proper clock + + switch (cid) { + case HIP_API_ID_hipLaunchKernel: + case HIP_API_ID_hipExtLaunchKernel: + case HIP_API_ID_hipLaunchCooperativeKernel: // Should work here + { + auto &args = data->args.hipLaunchKernel; + dis->kernelRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + args.function_address, + nullptr, + args.numBlocks.x, + args.numBlocks.y, + args.numBlocks.z, + args.dimBlocks.x, + args.dimBlocks.y, + args.dimBlocks.z, + args.sharedMemBytes, + args.stream + ); + } + break; + case HIP_API_ID_hipHccModuleLaunchKernel: + case HIP_API_ID_hipModuleLaunchKernel: + case HIP_API_ID_hipExtModuleLaunchKernel: + { + auto &args = data->args.hipModuleLaunchKernel; + dis->kernelRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + nullptr, + args.f, + args.gridDimX, + args.gridDimY, + args.gridDimZ, + args.blockDimX, + args.blockDimY, + args.blockDimZ, + args.sharedMemBytes, + args.stream + ); + } + break; + case HIP_API_ID_hipLaunchCooperativeKernelMultiDevice: + case HIP_API_ID_hipExtLaunchMultiKernelMultiDevice: +#if 0 + { + auto &args = data->args.hipLaunchCooperativeKernelMultiDevice.launchParamsList__val; + dis->kernelRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + args.function_address, + nullptr, + args.numBlocks.x, + args.numBlocks.y, + args.numBlocks.z, + args.dimBlocks.x, + args.dimBlocks.y, + args.dimBlocks.z, + args.sharedMemBytes, + args.stream + ); + } +#endif + break; + case HIP_API_ID_hipMalloc: + dis->mallocRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + data->args.hipMalloc.ptr__val, + data->args.hipMalloc.size + ); + break; + case HIP_API_ID_hipFree: + dis->mallocRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + data->args.hipFree.ptr, + 0 + ); + break; + case HIP_API_ID_hipMemcpy: + { + auto &args = data->args.hipMemcpy; + dis->copyRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + args.src, + args.dst, + args.sizeBytes, + args.kind, + static_cast(0) // use 
placeholder? + ); + } + break; + case HIP_API_ID_hipMemcpyAsync: + case HIP_API_ID_hipMemcpyWithStream: + { + auto &args = data->args.hipMemcpyAsync; + dis->copyRows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime), + args.src, + args.dst, + args.sizeBytes, + args.kind, + args.stream + ); + } + break; + default: + dis->rows_.emplace_back(data->correlation_id, + domain, + cid, + processId(), + systemThreadId(), + timespec_to_ns(startTime), + timespec_to_ns(endTime) + ); + break; + } + } + } +} + +void RoctracerActivityApi::activity_callback(const char* begin, const char* end, void* arg) +{ + size_t size = end - begin; + uint8_t *buffer = (uint8_t*) malloc(size); + auto &gpuTraceBuffers = singleton().gpuTraceBuffers_; + memcpy(buffer, begin, size); + gpuTraceBuffers->emplace_back(buffer, size); +} + +void RoctracerActivityApi::enableActivities( + const std::set& selected_activities) { +#ifdef HAS_ROCTRACER + if (!registered_) { + roctracer_set_properties(ACTIVITY_DOMAIN_HIP_API, nullptr); // Magic encantation + + // Set some api calls to ignore + loggedIds_.setInvertMode(true); // Omit the specified api + loggedIds_.add("hipGetDevice"); + loggedIds_.add("hipSetDevice"); + loggedIds_.add("hipGetLastError"); + loggedIds_.add("__hipPushCallConfiguration"); + loggedIds_.add("__hipPopCallConfiguration"); + loggedIds_.add("hipCtxSetCurrent"); + loggedIds_.add("hipEventRecord"); + loggedIds_.add("hipEventQuery"); + loggedIds_.add("hipGetDeviceProperties"); + loggedIds_.add("hipPeekAtLastError"); + loggedIds_.add("hipModuleGetFunction"); + loggedIds_.add("hipEventCreateWithFlags"); + + // Enable API callbacks + if (loggedIds_.invertMode() == true) { + // exclusion list - enable entire domain and turn off things in list + roctracer_enable_domain_callback(ACTIVITY_DOMAIN_HIP_API, api_callback, nullptr); + const std::unordered_map &filter = loggedIds_.filterList(); + for (auto it = filter.begin(); it != filter.end(); ++it) { + roctracer_disable_op_callback(ACTIVITY_DOMAIN_HIP_API, it->first); + } + } + else { + // inclusion list - only enable things in the list + const std::unordered_map &filter = loggedIds_.filterList(); + roctracer_disable_domain_callback(ACTIVITY_DOMAIN_HIP_API); + for (auto it = filter.begin(); it != filter.end(); ++it) { + roctracer_enable_op_callback(ACTIVITY_DOMAIN_HIP_API, it->first, api_callback, nullptr); + } + } + //roctracer_enable_domain_callback(ACTIVITY_DOMAIN_ROCTX, api_callback, nullptr); + + // Allocate default tracing pool + roctracer_properties_t properties; + memset(&properties, 0, sizeof(roctracer_properties_t)); + properties.buffer_size = 0x1000; + roctracer_open_pool(&properties); + + // Enable async op collection + roctracer_properties_t hcc_cb_properties; + memset(&hcc_cb_properties, 0, sizeof(roctracer_properties_t)); + hcc_cb_properties.buffer_size = 0x4000; + hcc_cb_properties.buffer_callback_fun = activity_callback; + roctracer_open_pool_expl(&hcc_cb_properties, &hccPool_); + roctracer_enable_domain_activity_expl(ACTIVITY_DOMAIN_HCC_OPS, hccPool_); + + registered_ = true; + } + + for (const auto& activity : selected_activities) { + if (activity == ActivityType::EXTERNAL_CORRELATION) { + externalCorrelationEnabled_ = true; + } + } + + roctracer_start(); +#endif +} + +void RoctracerActivityApi::disableActivities( + const std::set& selected_activities) { +#ifdef HAS_ROCTRACER + roctracer_stop(); + roctracer_flush_activity_expl(hccPool_); + + for (const auto& activity 
: selected_activities) { + if (activity == ActivityType::EXTERNAL_CORRELATION) { + externalCorrelationEnabled_ = false; + } + } +#endif +} + +void RoctracerActivityApi::endTracing() { + if (registered_ == true) { + roctracer_disable_domain_callback(ACTIVITY_DOMAIN_HIP_API); + //roctracer_disable_domain_callback(ACTIVITY_DOMAIN_ROCTX); + + roctracer_disable_domain_activity(ACTIVITY_DOMAIN_HCC_OPS); + roctracer_close_pool_expl(hccPool_); + } +} + + +ApiIdList::ApiIdList() +: invert_(true) +{ +} + +void ApiIdList::add(std::string apiName) +{ + uint32_t cid = 0; + if (roctracer_op_code(ACTIVITY_DOMAIN_HIP_API, apiName.c_str(), &cid, nullptr) == ROCTRACER_STATUS_SUCCESS) { + filter_[cid] = 1; + } +} +void ApiIdList::remove(std::string apiName) +{ + uint32_t cid = 0; + if (roctracer_op_code(ACTIVITY_DOMAIN_HIP_API, apiName.c_str(), &cid, nullptr) == ROCTRACER_STATUS_SUCCESS) { + filter_.erase(cid); + } +} + +bool ApiIdList::loadUserPrefs() +{ + // placeholder + return false; +} +bool ApiIdList::contains(uint32_t apiId) +{ + return (filter_.find(apiId) != filter_.end()) ? !invert_ : invert_; // XOR +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityApi.h b/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityApi.h new file mode 100644 index 0000000000000000000000000000000000000000..28280253e7c8426e85c11d679785bcd74fa2a0c7 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityApi.h @@ -0,0 +1,171 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef HAS_ROCTRACER +#include +#include +#include +#include +#include +#endif + +#include "ActivityType.h" +#include "GenericTraceActivity.h" +#include "RoctracerActivityBuffer.h" + + +namespace KINETO_NAMESPACE { + +using namespace libkineto; + +class ApiIdList +{ +public: + ApiIdList(); + bool invertMode() { return invert_; } + void setInvertMode(bool invert) { invert_ = invert; } + void add(std::string apiName); + void remove(std::string apiName); + bool loadUserPrefs(); + bool contains(uint32_t apiId); + const std::unordered_map &filterList() { return filter_; } + +private: + std::unordered_map filter_; + bool invert_; +}; + +struct roctracerRow { + roctracerRow(uint64_t id, uint32_t domain, uint32_t cid, uint32_t pid + , uint32_t tid, uint64_t begin, uint64_t end) + : id(id), domain(domain), cid(cid), pid(pid), tid(tid), begin(begin), end(end) {} + uint64_t id; // correlation_id + uint32_t domain; + uint32_t cid; + uint32_t pid; + uint32_t tid; + uint64_t begin; + uint64_t end; +}; + +struct kernelRow : public roctracerRow { + kernelRow(uint64_t id, uint32_t domain, uint32_t cid, uint32_t pid + , uint32_t tid, uint64_t begin, uint64_t end + , const void *faddr, hipFunction_t function + , unsigned int gx, unsigned int gy, unsigned int gz + , unsigned int wx, unsigned int wy, unsigned int wz + , size_t gss, hipStream_t stream) + : roctracerRow(id, domain, cid, pid, tid, begin, end), functionAddr(faddr) + , function(function), gridX(gx), gridY(gy), gridZ(gz) + , workgroupX(wx), workgroupY(wy), workgroupZ(wz), groupSegmentSize(gss) + , stream(stream) {} + const void* functionAddr; + hipFunction_t function; + unsigned int gridX; + unsigned int gridY; + unsigned int gridZ; + unsigned int workgroupX; + unsigned int workgroupY; + unsigned int workgroupZ; + size_t groupSegmentSize; + hipStream_t stream; +}; + +struct copyRow : public 
roctracerRow { + copyRow(uint64_t id, uint32_t domain, uint32_t cid, uint32_t pid + , uint32_t tid, uint64_t begin, uint64_t end + , const void* src, const void *dst, size_t size, hipMemcpyKind kind + , hipStream_t stream) + : roctracerRow(id, domain, cid, pid, tid, begin, end) + , src(src), dst(dst), size(size), kind(kind), stream(stream) {} + const void *src; + const void *dst; + size_t size; + hipMemcpyKind kind; + hipStream_t stream; +}; + +struct mallocRow : public roctracerRow { + mallocRow(uint64_t id, uint32_t domain, uint32_t cid, uint32_t pid + , uint32_t tid, uint64_t begin, uint64_t end + , const void* ptr, size_t size) + : roctracerRow(id, domain, cid, pid, tid, begin, end) + , ptr(ptr), size(size) {} + const void *ptr; + size_t size; +}; + + +class RoctracerActivityApi { + public: + enum CorrelationFlowType { + Default, + User + }; + + RoctracerActivityApi(); + RoctracerActivityApi(const RoctracerActivityApi&) = delete; + RoctracerActivityApi& operator=(const RoctracerActivityApi&) = delete; + + virtual ~RoctracerActivityApi(); + + static RoctracerActivityApi& singleton(); + + static void pushCorrelationID(int id, CorrelationFlowType type); + static void popCorrelationID(CorrelationFlowType type); + + void enableActivities( + const std::set& selected_activities); + void disableActivities( + const std::set& selected_activities); + void clearActivities(); + + int processActivities(ActivityLogger& logger); + + void setMaxBufferSize(int size); + + std::atomic_bool stopCollection{false}; + + private: + bool registered_{false}; + void endTracing(); + +#ifdef HAS_ROCTRACER + roctracer_pool_t *hccPool_{NULL}; + static void api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void* arg); + static void activity_callback(const char* begin, const char* end, void* arg); + + //Name cache + uint32_t nextStringId_{2}; + std::map strings_; + std::map reverseStrings_; + std::map kernelNames_; + + ApiIdList loggedIds_; + + // Api callback data + std::deque rows_; + std::deque kernelRows_; + std::deque copyRows_; + std::deque mallocRows_; + std::map kernelLaunches_; +#endif + + int maxGpuBufferCount_{0}; + std::unique_ptr> gpuTraceBuffers_; + bool externalCorrelationEnabled_{true}; +}; + +} // namespace KINETO_NAMESPACE + diff --git a/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityBuffer.h b/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityBuffer.h new file mode 100644 index 0000000000000000000000000000000000000000..cd8a5709a841b7c988ab3f2d1f3108d693343584 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/RoctracerActivityBuffer.h @@ -0,0 +1,30 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include +#include + +namespace KINETO_NAMESPACE { + +class RoctracerActivityBuffer { + public: + // data must be allocated using malloc. + // Ownership is transferred to this object. 
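+  // (A raw pointer is used instead of a smart pointer because the buffer
+  //  is allocated with malloc() in RoctracerActivityApi::activity_callback
+  //  and released with free() in the destructor below.)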
+ RoctracerActivityBuffer(uint8_t* data, size_t validSize) + : data(data), validSize(validSize) {} + + ~RoctracerActivityBuffer() { + free(data); + } + + // Allocated by malloc + uint8_t* data{nullptr}; + + // Number of bytes used + size_t validSize; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/SampleListener.h b/plugins/tensorboard-plugins/libkineto/src/SampleListener.h new file mode 100644 index 0000000000000000000000000000000000000000..bff86ad122a051d4f3dfdbdd329a3b63d93a7c77 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/SampleListener.h @@ -0,0 +1,146 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include +#include +#include + +namespace KINETO_NAMESPACE { + +class Config; + +class SampleValue { + public: + template + explicit SampleValue(T v) { + init(v); + } + + SampleValue(const SampleValue&) = default; + SampleValue& operator=(const SampleValue&) = delete; + SampleValue(SampleValue&&) = default; + SampleValue& operator=(SampleValue&&) = default; + + bool isInt() const { + return type_ == INT64; + } + + int64_t getInt() const { + assert(isInt()); + return int_; + } + + bool isDouble() const { + return type_ == DOUBLE; + } + + double getDouble() const { + assert(isDouble()); + return dbl_; + } + + inline void operator*=(double x) { + assert(isDouble() || isInt()); + if (isDouble()) { + dbl_ *= x; + } else { + int_ = std::round(int_ * x); + } + } + + inline bool operator<(const SampleValue& o) const { + if (type_ != o.type_) { + return type_ < o.type_; + } else if (type_ == INT64) { + return int_ < o.int_; + } else if (type_ == DOUBLE) { + return dbl_ < o.dbl_; + } + assert(false); + return true; + } + + void print(std::ostream& s) const { + if (type_ == INT64) { + s << int_; + } else if (type_ == DOUBLE) { + s << dbl_; + } else { + assert(false); + } + } + + private: + enum Type { INT64, DOUBLE }; + + template + void init(T v); + + Type type_{INT64}; + union { + int64_t int_{0}; + double dbl_; + }; +}; + +template <> +inline void SampleValue::init(uint64_t v) { + int_ = v, type_ = INT64; +} +template <> +inline void SampleValue::init(int64_t v) { + int_ = v, type_ = INT64; +} +template <> +inline void SampleValue::init(int v) { + int_ = v, type_ = INT64; +} +template <> +inline void SampleValue::init(double v) { + dbl_ = v, type_ = DOUBLE; +} + +inline std::ostream& operator<<(std::ostream& out, const SampleValue& s) { + s.print(out); + return out; +} + +using PercentileList = std::vector>; + +struct Stat { + const std::string& name; + const PercentileList percentileValues; + SampleValue total; +}; + +struct Sample { + Sample(int stats_count) { + stats.reserve(stats_count); + } + + // Offset in milliseconds from first sample in report + int deltaMsec; + std::vector stats; +}; + +// Inherit from this to be notified of samples +class SampleListener { + public: + SampleListener(const SampleListener&) = delete; + SampleListener& operator=(const SampleListener&) = delete; + + virtual ~SampleListener(){}; + + // Report bucketed & aggregated values for event + virtual void handleSample(int device, const Sample& sample, bool from_new_version) = 0; + + virtual void update(const Config& config) = 0; + + protected: + SampleListener() = default; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/ScopeExit.h b/plugins/tensorboard-plugins/libkineto/src/ScopeExit.h new file mode 100644 index 
0000000000000000000000000000000000000000..b9a6bc83ef942c7fb0e4b198b0396e5d75aa5a3a --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ScopeExit.h @@ -0,0 +1,29 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +// Implement a simple scope handler allowing a function to release +// resources when an error or exception occurs + +template +class ScopeExit { + public: + explicit ScopeExit(T t) : t(t) {} + ~ScopeExit() { + t(); + } + T t; +}; + +template +ScopeExit makeScopeExit(T t) { + return ScopeExit(t); +}; + +// Add a level of indirection so __LINE__ is expanded +#define __kINETO_CONCAT(name, line) name##line +#define ANON_VAR(name, line) __kINETO_CONCAT(name, line) + +#define SCOPE_EXIT(func) \ + const auto ANON_VAR(SCOPE_BLOCK, __LINE__) = \ + makeScopeExit([=]() { func; }) diff --git a/plugins/tensorboard-plugins/libkineto/src/ThreadUtil.cpp b/plugins/tensorboard-plugins/libkineto/src/ThreadUtil.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0f67d54d58512aa47b05aed69748a6894aa06b1c --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/ThreadUtil.cpp @@ -0,0 +1,203 @@ +#include "ThreadUtil.h" + +#ifndef _MSC_VER +#include +#include +#include +#include +#else // _MSC_VER +#include +#include +#define WIN32_LEAN_AND_MEAN +#define NOGDI +#include +#include +#undef ERROR +#endif // _MSC_VER + +#ifdef __ANDROID__ +#include +#endif + +#include +#include +#include + +namespace libkineto { + +namespace { +thread_local int32_t _pid = 0; +thread_local int32_t _tid = 0; +thread_local int32_t _sysTid = 0; +} + +int32_t processId() { + if (!_pid) { +#ifndef _MSC_VER + _pid = (int32_t)getpid(); +#else + _pid = (int32_t)GetCurrentProcessId(); +#endif + } + return _pid; +} + +int32_t systemThreadId() { + if (!_sysTid) { +#ifdef __APPLE__ + _sysTid = (int32_t)syscall(SYS_thread_selfid); +#elif defined _MSC_VER + _sysTid = (int32_t)GetCurrentThreadId(); +#else + _sysTid = (int32_t)syscall(SYS_gettid); +#endif + } + return _sysTid; +} + +int32_t threadId() { + if (!_tid) { +#ifdef __APPLE__ + uint64_t tid; + pthread_threadid_np(nullptr, &tid); + _tid = tid; +#elif defined _MSC_VER + _tid = (int32_t)GetCurrentThreadId(); +#else + pthread_t pth = pthread_self(); + int32_t* ptr = reinterpret_cast(&pth); + _tid = *ptr; +#endif + } + return _tid; +} + +namespace { +static constexpr size_t kMaxThreadNameLength = 16; + +static constexpr const char* basename(const char* s, int off = 0) { + return !s[off] + ? s + : s[off] == '/' ? 
basename(&s[off + 1]) : basename(s, off + 1); +} +#if defined(_MSC_VER) +void *getKernel32Func(const char* procName) { + return GetProcAddress(GetModuleHandleA("KERNEL32.DLL"), procName); +} +#endif +} + +bool setThreadName(const std::string& name) { +#ifdef __APPLE__ + return 0 == pthread_setname_np(name.c_str()); +#elif defined _MSC_VER + // Per https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreaddescription + // Use runtime linking to set thread description + static auto _SetThreadDescription = reinterpret_cast(getKernel32Func("SetThreadDescription")); + if (!_SetThreadDescription) { + return false; + } + std::wstring_convert> conv; + std::wstring wname = conv.from_bytes(name); + HRESULT hr = _SetThreadDescription(GetCurrentThread(), wname.c_str()); + return SUCCEEDED(hr); +#else + return 0 == pthread_setname_np(pthread_self(), name.c_str()); +#endif +} + +std::string getThreadName() { +#ifndef _MSC_VER + char buf[kMaxThreadNameLength] = ""; + if ( +#ifndef __ANDROID__ + pthread_getname_np(pthread_self(), buf, kMaxThreadNameLength) != 0 +#else + prctl(PR_GET_NAME, buf, kMaxThreadNameLength) != 0 +#endif + ) { + return "Unknown"; + } + return buf; +#else // _MSC_VER + static auto _GetThreadDescription = reinterpret_cast(getKernel32Func("GetThreadDescription")); + if (!_GetThreadDescription) { + return "Unknown"; + } + PWSTR data; + HRESULT hr = _GetThreadDescription(GetCurrentThread(), &data); + if (!SUCCEEDED(hr)) { + return ""; + } + std::wstring_convert> conv; + std::string name = conv.to_bytes(data); + LocalFree(data); + return name; +#endif +} + +// Linux: +// Extract process name from /proc/pid/cmdline. This does not have +// the 16 character limit that /proc/pid/status and /prod/pid/comm has. +std::string processName(int32_t pid) { +#ifdef __linux__ + FILE* cmdfile = fopen(fmt::format("/proc/{}/cmdline", pid).c_str(), "r"); + if (cmdfile != nullptr) { + char* command = nullptr; + int scanned = fscanf(cmdfile, "%ms", &command); + fclose(cmdfile); + if (scanned > 0 && command) { + std::string ret(basename(command)); + free(command); + return ret; + } + } + std::cerr << "Failed to read process name for pid " << pid << std::endl; +#endif + return ""; +} + +// Max number of parent pids to collect, just for extra safeguarding. +constexpr int kMaxParentPids = 10; + +// Return a pair of +static std::pair parentPidAndCommand(int32_t pid) { +#ifdef __linux__ + FILE* statfile = fopen(fmt::format("/proc/{}/stat", pid).c_str(), "r"); + if (statfile == nullptr) { + return std::make_pair(0, ""); + } + int32_t parent_pid; + char* command = nullptr; + int scanned = fscanf(statfile, "%*d (%m[^)]) %*c %d", &command, &parent_pid); + fclose(statfile); + std::pair ret; + if (scanned == 2) { + ret = std::make_pair(parent_pid, std::string(command)); + } else { + std::cerr << "Failed to parse /proc/" << pid << "/stat" << std::endl; + ret = std::make_pair(0, ""); + } + + // The 'm' character in the format tells fscanf to allocate memory + // for the parsed string, which we need to free here. 
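+  // free() also covers the failure path: if fscanf matched nothing,
+  // command is still nullptr and free(nullptr) is a no-op.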
+  free(command);
+  return ret;
+#else
+  return std::make_pair(0, "");
+#endif
+}
+
+std::vector<std::pair<int32_t, std::string>> pidCommandPairsOfAncestors() {
+  std::vector<std::pair<int32_t, std::string>> pairs;
+  pairs.reserve(kMaxParentPids + 1);
+  int32_t curr_pid = processId();
+  for (int i = 0; i <= kMaxParentPids && curr_pid > 1; i++) {
+    std::pair<int32_t, std::string> ppid_and_comm = parentPidAndCommand(curr_pid);
+    pairs.push_back(std::make_pair(curr_pid, ppid_and_comm.second));
+    curr_pid = ppid_and_comm.first;
+  }
+  return pairs;
+}
+
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/src/WeakSymbols.cpp b/plugins/tensorboard-plugins/libkineto/src/WeakSymbols.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..540a5ac8f97c8f38c7ee3d31ea285a3ab7c9f375
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/WeakSymbols.cpp
@@ -0,0 +1,12 @@
+#include <stdexcept>
+
+#ifndef _MSC_VER
+extern "C" {
+// This function is needed to avoid superfluous dependency on GNU OpenMP library when cuPTI is linked statically
+// For more details see https://github.com/pytorch/pytorch/issues/51026
+__attribute__((weak)) int acc_get_device_type() {
+  throw std::runtime_error("Dummy implementation of acc_get_device_type is not supposed to be called!");
+}
+
+} // extern "C"
+#endif
diff --git a/plugins/tensorboard-plugins/libkineto/src/cupti_call.h b/plugins/tensorboard-plugins/libkineto/src/cupti_call.h
new file mode 100644
index 0000000000000000000000000000000000000000..fd6ebae7691ed607867db5717248ba22f4efa5c0
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/cupti_call.h
@@ -0,0 +1,33 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <fmt/format.h>
+
+#ifdef HAS_CUPTI
+
+#include <cupti.h>
+
+#define CUPTI_CALL(call)                           \
+  [&]() -> CUptiResult {                           \
+    CUptiResult _status_ = call;                   \
+    if (_status_ != CUPTI_SUCCESS) {               \
+      const char* _errstr_ = nullptr;              \
+      cuptiGetResultString(_status_, &_errstr_);   \
+      LOG(WARNING) << fmt::format(                 \
+          "function {} failed with error {} ({})", \
+          #call,                                   \
+          _errstr_,                                \
+          (int)_status_);                          \
+    }                                              \
+    return _status_;                               \
+  }()
+
+#define CUPTI_CALL_NOWARN(call) call
+
+#else
+
+#define CUPTI_CALL(call) call
+#define CUPTI_CALL_NOWARN(call) call
+
+#endif // HAS_CUPTI
diff --git a/plugins/tensorboard-plugins/libkineto/src/cupti_strings.cpp b/plugins/tensorboard-plugins/libkineto/src/cupti_strings.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..4535273a277e04b0b6f98b539df82955ef62468f
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/cupti_strings.cpp
@@ -0,0 +1,502 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
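+//
+// runtimeCbidNames below is indexed by the CUPTI runtime-API callback id
+// (cbid), so entry order must mirror the CUpti_runtime_api_trace_cbid
+// enum; index 0 is the "INVALID" sentinel.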
+ +#include "cupti_strings.h" + +namespace libkineto { + +const char* memcpyKindString( + CUpti_ActivityMemcpyKind kind) { + switch (kind) { + case CUPTI_ACTIVITY_MEMCPY_KIND_HTOD: + return "HtoD"; + case CUPTI_ACTIVITY_MEMCPY_KIND_DTOH: + return "DtoH"; + case CUPTI_ACTIVITY_MEMCPY_KIND_HTOA: + return "HtoA"; + case CUPTI_ACTIVITY_MEMCPY_KIND_ATOH: + return "AtoH"; + case CUPTI_ACTIVITY_MEMCPY_KIND_ATOA: + return "AtoA"; + case CUPTI_ACTIVITY_MEMCPY_KIND_ATOD: + return "AtoD"; + case CUPTI_ACTIVITY_MEMCPY_KIND_DTOA: + return "DtoA"; + case CUPTI_ACTIVITY_MEMCPY_KIND_DTOD: + return "DtoD"; + case CUPTI_ACTIVITY_MEMCPY_KIND_HTOH: + return "HtoH"; + case CUPTI_ACTIVITY_MEMCPY_KIND_PTOP: + return "PtoP"; + default: + break; + } + return ""; +} + +const char* memoryKindString( + CUpti_ActivityMemoryKind kind) { + switch (kind) { + case CUPTI_ACTIVITY_MEMORY_KIND_UNKNOWN: + return "Unknown"; + case CUPTI_ACTIVITY_MEMORY_KIND_PAGEABLE: + return "Pageable"; + case CUPTI_ACTIVITY_MEMORY_KIND_PINNED: + return "Pinned"; + case CUPTI_ACTIVITY_MEMORY_KIND_DEVICE: + return "Device"; + case CUPTI_ACTIVITY_MEMORY_KIND_ARRAY: + return "Array"; + case CUPTI_ACTIVITY_MEMORY_KIND_MANAGED: + return "Managed"; + case CUPTI_ACTIVITY_MEMORY_KIND_DEVICE_STATIC: + return "Device Static"; + case CUPTI_ACTIVITY_MEMORY_KIND_MANAGED_STATIC: + return "Managed Static"; + case CUPTI_ACTIVITY_MEMORY_KIND_FORCE_INT: + return "Force Int"; + default: + return "Unrecognized"; + } +} + +const char* overheadKindString( + CUpti_ActivityOverheadKind kind) { + switch (kind) { + case CUPTI_ACTIVITY_OVERHEAD_UNKNOWN: + return "Unknown"; + case CUPTI_ACTIVITY_OVERHEAD_DRIVER_COMPILER: + return "Driver Compiler"; + case CUPTI_ACTIVITY_OVERHEAD_CUPTI_BUFFER_FLUSH: + return "Buffer Flush"; + case CUPTI_ACTIVITY_OVERHEAD_CUPTI_INSTRUMENTATION: + return "Instrumentation"; + case CUPTI_ACTIVITY_OVERHEAD_CUPTI_RESOURCE: + return "Resource"; + case CUPTI_ACTIVITY_OVERHEAD_FORCE_INT: + return "Force Int"; + default: + return "Unrecognized"; + } +} + + + +static const char* runtimeCbidNames[] = { + "INVALID", + "cudaDriverGetVersion", + "cudaRuntimeGetVersion", + "cudaGetDeviceCount", + "cudaGetDeviceProperties", + "cudaChooseDevice", + "cudaGetChannelDesc", + "cudaCreateChannelDesc", + "cudaConfigureCall", + "cudaSetupArgument", + "cudaGetLastError", + "cudaPeekAtLastError", + "cudaGetErrorString", + "cudaLaunch", + "cudaFuncSetCacheConfig", + "cudaFuncGetAttributes", + "cudaSetDevice", + "cudaGetDevice", + "cudaSetValidDevices", + "cudaSetDeviceFlags", + "cudaMalloc", + "cudaMallocPitch", + "cudaFree", + "cudaMallocArray", + "cudaFreeArray", + "cudaMallocHost", + "cudaFreeHost", + "cudaHostAlloc", + "cudaHostGetDevicePointer", + "cudaHostGetFlags", + "cudaMemGetInfo", + "cudaMemcpy", + "cudaMemcpy2D", + "cudaMemcpyToArray", + "cudaMemcpy2DToArray", + "cudaMemcpyFromArray", + "cudaMemcpy2DFromArray", + "cudaMemcpyArrayToArray", + "cudaMemcpy2DArrayToArray", + "cudaMemcpyToSymbol", + "cudaMemcpyFromSymbol", + "cudaMemcpyAsync", + "cudaMemcpyToArrayAsync", + "cudaMemcpyFromArrayAsync", + "cudaMemcpy2DAsync", + "cudaMemcpy2DToArrayAsync", + "cudaMemcpy2DFromArrayAsync", + "cudaMemcpyToSymbolAsync", + "cudaMemcpyFromSymbolAsync", + "cudaMemset", + "cudaMemset2D", + "cudaMemsetAsync", + "cudaMemset2DAsync", + "cudaGetSymbolAddress", + "cudaGetSymbolSize", + "cudaBindTexture", + "cudaBindTexture2D", + "cudaBindTextureToArray", + "cudaUnbindTexture", + "cudaGetTextureAlignmentOffset", + "cudaGetTextureReference", + "cudaBindSurfaceToArray", + 
"cudaGetSurfaceReference", + "cudaGLSetGLDevice", + "cudaGLRegisterBufferObject", + "cudaGLMapBufferObject", + "cudaGLUnmapBufferObject", + "cudaGLUnregisterBufferObject", + "cudaGLSetBufferObjectMapFlags", + "cudaGLMapBufferObjectAsync", + "cudaGLUnmapBufferObjectAsync", + "cudaWGLGetDevice", + "cudaGraphicsGLRegisterImage", + "cudaGraphicsGLRegisterBuffer", + "cudaGraphicsUnregisterResource", + "cudaGraphicsResourceSetMapFlags", + "cudaGraphicsMapResources", + "cudaGraphicsUnmapResources", + "cudaGraphicsResourceGetMappedPointer", + "cudaGraphicsSubResourceGetMappedArray", + "cudaVDPAUGetDevice", + "cudaVDPAUSetVDPAUDevice", + "cudaGraphicsVDPAURegisterVideoSurface", + "cudaGraphicsVDPAURegisterOutputSurface", + "cudaD3D11GetDevice", + "cudaD3D11GetDevices", + "cudaD3D11SetDirect3DDevice", + "cudaGraphicsD3D11RegisterResource", + "cudaD3D10GetDevice", + "cudaD3D10GetDevices", + "cudaD3D10SetDirect3DDevice", + "cudaGraphicsD3D10RegisterResource", + "cudaD3D10RegisterResource", + "cudaD3D10UnregisterResource", + "cudaD3D10MapResources", + "cudaD3D10UnmapResources", + "cudaD3D10ResourceSetMapFlags", + "cudaD3D10ResourceGetSurfaceDimensions", + "cudaD3D10ResourceGetMappedArray", + "cudaD3D10ResourceGetMappedPointer", + "cudaD3D10ResourceGetMappedSize", + "cudaD3D10ResourceGetMappedPitch", + "cudaD3D9GetDevice", + "cudaD3D9GetDevices", + "cudaD3D9SetDirect3DDevice", + "cudaD3D9GetDirect3DDevice", + "cudaGraphicsD3D9RegisterResource", + "cudaD3D9RegisterResource", + "cudaD3D9UnregisterResource", + "cudaD3D9MapResources", + "cudaD3D9UnmapResources", + "cudaD3D9ResourceSetMapFlags", + "cudaD3D9ResourceGetSurfaceDimensions", + "cudaD3D9ResourceGetMappedArray", + "cudaD3D9ResourceGetMappedPointer", + "cudaD3D9ResourceGetMappedSize", + "cudaD3D9ResourceGetMappedPitch", + "cudaD3D9Begin", + "cudaD3D9End", + "cudaD3D9RegisterVertexBuffer", + "cudaD3D9UnregisterVertexBuffer", + "cudaD3D9MapVertexBuffer", + "cudaD3D9UnmapVertexBuffer", + "cudaThreadExit", + "cudaSetDoubleForDevice", + "cudaSetDoubleForHost", + "cudaThreadSynchronize", + "cudaThreadGetLimit", + "cudaThreadSetLimit", + "cudaStreamCreate", + "cudaStreamDestroy", + "cudaStreamSynchronize", + "cudaStreamQuery", + "cudaEventCreate", + "cudaEventCreateWithFlags", + "cudaEventRecord", + "cudaEventDestroy", + "cudaEventSynchronize", + "cudaEventQuery", + "cudaEventElapsedTime", + "cudaMalloc3D", + "cudaMalloc3DArray", + "cudaMemset3D", + "cudaMemset3DAsync", + "cudaMemcpy3D", + "cudaMemcpy3DAsync", + "cudaThreadSetCacheConfig", + "cudaStreamWaitEvent", + "cudaD3D11GetDirect3DDevice", + "cudaD3D10GetDirect3DDevice", + "cudaThreadGetCacheConfig", + "cudaPointerGetAttributes", + "cudaHostRegister", + "cudaHostUnregister", + "cudaDeviceCanAccessPeer", + "cudaDeviceEnablePeerAccess", + "cudaDeviceDisablePeerAccess", + "cudaPeerRegister", + "cudaPeerUnregister", + "cudaPeerGetDevicePointer", + "cudaMemcpyPeer", + "cudaMemcpyPeerAsync", + "cudaMemcpy3DPeer", + "cudaMemcpy3DPeerAsync", + "cudaDeviceReset", + "cudaDeviceSynchronize", + "cudaDeviceGetLimit", + "cudaDeviceSetLimit", + "cudaDeviceGetCacheConfig", + "cudaDeviceSetCacheConfig", + "cudaProfilerInitialize", + "cudaProfilerStart", + "cudaProfilerStop", + "cudaDeviceGetByPCIBusId", + "cudaDeviceGetPCIBusId", + "cudaGLGetDevices", + "cudaIpcGetEventHandle", + "cudaIpcOpenEventHandle", + "cudaIpcGetMemHandle", + "cudaIpcOpenMemHandle", + "cudaIpcCloseMemHandle", + "cudaArrayGetInfo", + "cudaFuncSetSharedMemConfig", + "cudaDeviceGetSharedMemConfig", + "cudaDeviceSetSharedMemConfig", + 
"cudaCreateTextureObject", + "cudaDestroyTextureObject", + "cudaGetTextureObjectResourceDesc", + "cudaGetTextureObjectTextureDesc", + "cudaCreateSurfaceObject", + "cudaDestroySurfaceObject", + "cudaGetSurfaceObjectResourceDesc", + "cudaMallocMipmappedArray", + "cudaGetMipmappedArrayLevel", + "cudaFreeMipmappedArray", + "cudaBindTextureToMipmappedArray", + "cudaGraphicsResourceGetMappedMipmappedArray", + "cudaStreamAddCallback", + "cudaStreamCreateWithFlags", + "cudaGetTextureObjectResourceViewDesc", + "cudaDeviceGetAttribute", + "cudaStreamDestroy", + "cudaStreamCreateWithPriority", + "cudaStreamGetPriority", + "cudaStreamGetFlags", + "cudaDeviceGetStreamPriorityRange", + "cudaMallocManaged", + "cudaOccupancyMaxActiveBlocksPerMultiprocessor", + "cudaStreamAttachMemAsync", + "cudaGetErrorName", + "cudaOccupancyMaxActiveBlocksPerMultiprocessor", + "cudaLaunchKernel", + "cudaGetDeviceFlags", + "cudaLaunch_ptsz", + "cudaLaunchKernel_ptsz", + "cudaMemcpy_ptds", + "cudaMemcpy2D_ptds", + "cudaMemcpyToArray_ptds", + "cudaMemcpy2DToArray_ptds", + "cudaMemcpyFromArray_ptds", + "cudaMemcpy2DFromArray_ptds", + "cudaMemcpyArrayToArray_ptds", + "cudaMemcpy2DArrayToArray_ptds", + "cudaMemcpyToSymbol_ptds", + "cudaMemcpyFromSymbol_ptds", + "cudaMemcpyAsync_ptsz", + "cudaMemcpyToArrayAsync_ptsz", + "cudaMemcpyFromArrayAsync_ptsz", + "cudaMemcpy2DAsync_ptsz", + "cudaMemcpy2DToArrayAsync_ptsz", + "cudaMemcpy2DFromArrayAsync_ptsz", + "cudaMemcpyToSymbolAsync_ptsz", + "cudaMemcpyFromSymbolAsync_ptsz", + "cudaMemset_ptds", + "cudaMemset2D_ptds", + "cudaMemsetAsync_ptsz", + "cudaMemset2DAsync_ptsz", + "cudaStreamGetPriority_ptsz", + "cudaStreamGetFlags_ptsz", + "cudaStreamSynchronize_ptsz", + "cudaStreamQuery_ptsz", + "cudaStreamAttachMemAsync_ptsz", + "cudaEventRecord_ptsz", + "cudaMemset3D_ptds", + "cudaMemset3DAsync_ptsz", + "cudaMemcpy3D_ptds", + "cudaMemcpy3DAsync_ptsz", + "cudaStreamWaitEvent_ptsz", + "cudaStreamAddCallback_ptsz", + "cudaMemcpy3DPeer_ptds", + "cudaMemcpy3DPeerAsync_ptsz", + "cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags", + "cudaMemPrefetchAsync", + "cudaMemPrefetchAsync_ptsz", + "cudaMemAdvise", + "cudaDeviceGetP2PAttribute", + "cudaGraphicsEGLRegisterImage", + "cudaEGLStreamConsumerConnect", + "cudaEGLStreamConsumerDisconnect", + "cudaEGLStreamConsumerAcquireFrame", + "cudaEGLStreamConsumerReleaseFrame", + "cudaEGLStreamProducerConnect", + "cudaEGLStreamProducerDisconnect", + "cudaEGLStreamProducerPresentFrame", + "cudaEGLStreamProducerReturnFrame", + "cudaGraphicsResourceGetMappedEglFrame", + "cudaMemRangeGetAttribute", + "cudaMemRangeGetAttributes", + "cudaEGLStreamConsumerConnectWithFlags", + "cudaLaunchCooperativeKernel", + "cudaLaunchCooperativeKernel_ptsz", + "cudaEventCreateFromEGLSync", + "cudaLaunchCooperativeKernelMultiDevice", + "cudaFuncSetAttribute", + "cudaImportExternalMemory", + "cudaExternalMemoryGetMappedBuffer", + "cudaExternalMemoryGetMappedMipmappedArray", + "cudaDestroyExternalMemory", + "cudaImportExternalSemaphore", + "cudaSignalExternalSemaphoresAsync", + "cudaSignalExternalSemaphoresAsync_ptsz", + "cudaWaitExternalSemaphoresAsync", + "cudaWaitExternalSemaphoresAsync_ptsz", + "cudaDestroyExternalSemaphore", + "cudaLaunchHostFunc", + "cudaLaunchHostFunc_ptsz", + "cudaGraphCreate", + "cudaGraphKernelNodeGetParams", + "cudaGraphKernelNodeSetParams", + "cudaGraphAddKernelNode", + "cudaGraphAddMemcpyNode", + "cudaGraphMemcpyNodeGetParams", + "cudaGraphMemcpyNodeSetParams", + "cudaGraphAddMemsetNode", + "cudaGraphMemsetNodeGetParams", + 
"cudaGraphMemsetNodeSetParams", + "cudaGraphAddHostNode", + "cudaGraphHostNodeGetParams", + "cudaGraphAddChildGraphNode", + "cudaGraphChildGraphNodeGetGraph", + "cudaGraphAddEmptyNode", + "cudaGraphClone", + "cudaGraphNodeFindInClone", + "cudaGraphNodeGetType", + "cudaGraphGetRootNodes", + "cudaGraphNodeGetDependencies", + "cudaGraphNodeGetDependentNodes", + "cudaGraphAddDependencies", + "cudaGraphRemoveDependencies", + "cudaGraphDestroyNode", + "cudaGraphInstantiate", + "cudaGraphLaunch", + "cudaGraphLaunch_ptsz", + "cudaGraphExecDestroy", + "cudaGraphDestroy", + "cudaStreamBeginCapture", + "cudaStreamBeginCapture_ptsz", + "cudaStreamIsCapturing", + "cudaStreamIsCapturing_ptsz", + "cudaStreamEndCapture", + "cudaStreamEndCapture_ptsz", + "cudaGraphHostNodeSetParams", + "cudaGraphGetNodes", + "cudaGraphGetEdges", + "cudaStreamGetCaptureInfo", + "cudaStreamGetCaptureInfo_ptsz", + "cudaGraphExecKernelNodeSetParams", + "cudaThreadExchangeStreamCaptureMode", + "cudaDeviceGetNvSciSyncAttributes", + "cudaOccupancyAvailableDynamicSMemPerBlock", + "cudaStreamSetFlags", + "cudaStreamSetFlags_ptsz", + "cudaGraphExecMemcpyNodeSetParams", + "cudaGraphExecMemsetNodeSetParams", + "cudaGraphExecHostNodeSetParams", + "cudaGraphExecUpdate", + "cudaGetFuncBySymbol", + "cudaCtxResetPersistingL2Cache", + "cudaGraphKernelNodeCopyAttributes", + "cudaGraphKernelNodeGetAttribute", + "cudaGraphKernelNodeSetAttribute", + "cudaStreamCopyAttributes", + "cudaStreamCopyAttributes_ptsz", + "cudaStreamGetAttribute", + "cudaStreamGetAttribute_ptsz", + "cudaStreamSetAttribute", + "cudaStreamSetAttribute_ptsz", + "cudaDeviceGetTexture1DLinearMaxWidth", + "cudaGraphUpload", + "cudaGraphUpload_ptsz", + "cudaGraphAddMemcpyNodeToSymbol", + "cudaGraphAddMemcpyNodeFromSymbol", + "cudaGraphAddMemcpyNode1D", + "cudaGraphMemcpyNodeSetParamsToSymbol", + "cudaGraphMemcpyNodeSetParamsFromSymbol", + "cudaGraphMemcpyNodeSetParams1D", + "cudaGraphExecMemcpyNodeSetParamsToSymbol", + "cudaGraphExecMemcpyNodeSetParamsFromSymbol", + "cudaGraphExecMemcpyNodeSetParams1D", + "cudaArrayGetSparseProperties", + "cudaMipmappedArrayGetSparseProperties", + "cudaGraphExecChildGraphNodeSetParams", + "cudaGraphAddEventRecordNode", + "cudaGraphEventRecordNodeGetEvent", + "cudaGraphEventRecordNodeSetEvent", + "cudaGraphAddEventWaitNode", + "cudaGraphEventWaitNodeGetEvent", + "cudaGraphEventWaitNodeSetEvent", + "cudaGraphExecEventRecordNodeSetEvent", + "cudaGraphExecEventWaitNodeSetEvent", + "cudaEventRecordWithFlags", + "cudaEventRecordWithFlags_ptsz", + "cudaDeviceGetDefaultMemPool", + "cudaMallocAsync", + "cudaMallocAsync_ptsz", + "cudaFreeAsync", + "cudaFreeAsync_ptsz", + "cudaMemPoolTrimTo", + "cudaMemPoolSetAttribute", + "cudaMemPoolGetAttribute", + "cudaMemPoolSetAccess", + "cudaArrayGetPlane", + "cudaMemPoolGetAccess", + "cudaMemPoolCreate", + "cudaMemPoolDestroy", + "cudaDeviceSetMemPool", + "cudaDeviceGetMemPool", + "cudaMemPoolExportToShareableHandle", + "cudaMemPoolImportFromShareableHandle", + "cudaMemPoolExportPointer", + "cudaMemPoolImportPointer", + "cudaMallocFromPoolAsync", + "cudaMallocFromPoolAsync_ptsz", + "cudaSignalExternalSemaphoresAsync", + "cudaSignalExternalSemaphoresAsync", + "cudaWaitExternalSemaphoresAsync", + "cudaWaitExternalSemaphoresAsync", + "cudaGraphAddExternalSemaphoresSignalNode", + "cudaGraphExternalSemaphoresSignalNodeGetParams", + "cudaGraphExternalSemaphoresSignalNodeSetParams", + "cudaGraphAddExternalSemaphoresWaitNode", + "cudaGraphExternalSemaphoresWaitNodeGetParams", + 
"cudaGraphExternalSemaphoresWaitNodeSetParams", + "cudaGraphExecExternalSemaphoresSignalNodeSetParams", + "cudaGraphExecExternalSemaphoresWaitNodeSetParams", + "SIZE" +}; + +const char* runtimeCbidName(CUpti_CallbackId cbid) { + constexpr int names_size = + sizeof(runtimeCbidNames) / sizeof(runtimeCbidNames[0]); + if (cbid < 0 || cbid >= names_size) { + return runtimeCbidNames[CUPTI_RUNTIME_TRACE_CBID_INVALID]; + } + return runtimeCbidNames[cbid]; +} + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/src/cupti_strings.h b/plugins/tensorboard-plugins/libkineto/src/cupti_strings.h new file mode 100644 index 0000000000000000000000000000000000000000..bbfebb983648005d8268d9a29d613d369d6a5384 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/cupti_strings.h @@ -0,0 +1,14 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#pragma once + +#include + +namespace libkineto { + +const char* memoryKindString(CUpti_ActivityMemoryKind kind); +const char* memcpyKindString(CUpti_ActivityMemcpyKind kind); +const char* runtimeCbidName(CUpti_CallbackId cbid); +const char* overheadKindString(CUpti_ActivityOverheadKind kind); + +} // namespace libkineto diff --git a/plugins/tensorboard-plugins/libkineto/src/init.cpp b/plugins/tensorboard-plugins/libkineto/src/init.cpp new file mode 100644 index 0000000000000000000000000000000000000000..4e1022485ac5d17b5af1e0676b6a4595a138e1b5 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/init.cpp @@ -0,0 +1,139 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include +#include + +#include "ActivityProfilerProxy.h" +#include "Config.h" +#ifdef HAS_CUPTI +#include "CuptiCallbackApi.h" +#include "CuptiActivityApi.h" +#include "EventProfilerController.h" +#endif +#include "cupti_call.h" +#include "libkineto.h" + +#include "Logger.h" + +namespace KINETO_NAMESPACE { + +#ifdef HAS_CUPTI +static bool initialized = false; +static std::mutex initMutex; + +static void initProfilers( + CUpti_CallbackDomain /*domain*/, + CUpti_CallbackId /*cbid*/, + const CUpti_CallbackData* cbInfo) { + CUpti_ResourceData* d = (CUpti_ResourceData*)cbInfo; + CUcontext ctx = d->context; + + VLOG(0) << "CUDA Context created"; + std::lock_guard lock(initMutex); + + if (!initialized) { + libkineto::api().initProfilerIfRegistered(); + initialized = true; + VLOG(0) << "libkineto profilers activated"; + } + if (getenv("KINETO_DISABLE_EVENT_PROFILER") != nullptr) { + VLOG(0) << "Event profiler disabled via env var"; + } else { + ConfigLoader& config_loader = libkineto::api().configLoader(); + config_loader.initBaseConfig(); + EventProfilerController::start(ctx, config_loader); + } +} + +// Some models suffer from excessive instrumentation code gen +// on dynamic attach which can hang for more than 5+ seconds. +// If the workload was meant to be traced, preload the CUPTI +// to take the performance hit early on. 
+// https://docs.nvidia.com/cupti/r_main.html#r_overhead +static bool shouldPreloadCuptiInstrumentation() { + return getenv("PRELOAD_CUPTI_INSTRUMENTATION"); +} + +static void stopProfiler( + CUpti_CallbackDomain /*domain*/, + CUpti_CallbackId /*cbid*/, + const CUpti_CallbackData* cbInfo) { + CUpti_ResourceData* d = (CUpti_ResourceData*)cbInfo; + CUcontext ctx = d->context; + + LOG(INFO) << "CUDA Context destroyed"; + std::lock_guard lock(initMutex); + EventProfilerController::stop(ctx); +} +#endif // HAS_CUPTI + +} // namespace KINETO_NAMESPACE + +// Callback interface with CUPTI and library constructors +using namespace KINETO_NAMESPACE; +extern "C" { + +// Return true if no CUPTI errors occurred during init +bool libkineto_init(bool cpuOnly, bool logOnError) { + bool success = true; +#ifdef HAS_CUPTI + if (!cpuOnly) { + // libcupti will be lazily loaded on this call. + // If it is not available (e.g. CUDA is not installed), + // then this call will return an error and we just abort init. + auto& cbapi = CuptiCallbackApi::singleton(); + bool status = false; + + if (cbapi.initSuccess()){ + const CUpti_CallbackDomain domain = CUPTI_CB_DOMAIN_RESOURCE; + status = cbapi.registerCallback( + domain, CuptiCallbackApi::RESOURCE_CONTEXT_CREATED, initProfilers); + status = status && cbapi.registerCallback( + domain, CuptiCallbackApi::RESOURCE_CONTEXT_DESTROYED, stopProfiler); + + if (status) { + status = cbapi.enableCallback( + domain, CuptiCallbackApi::RESOURCE_CONTEXT_CREATED); + status = status && cbapi.enableCallback( + domain, CuptiCallbackApi::RESOURCE_CONTEXT_DESTROYED); + } + } + + if (!cbapi.initSuccess() || !status) { + success = false; + cpuOnly = true; + if (logOnError) { + CUPTI_CALL(cbapi.getCuptiStatus()); + LOG(WARNING) << "CUPTI initialization failed - " + << "CUDA profiler activities will be missing"; + LOG(INFO) << "If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to " + << "https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti"; + } + } + } + + if (shouldPreloadCuptiInstrumentation()) { + CuptiActivityApi::forceLoadCupti(); + } +#endif // HAS_CUPTI + + ConfigLoader& config_loader = libkineto::api().configLoader(); + libkineto::api().registerProfiler( + std::make_unique(cpuOnly, config_loader)); + + return success; +} + +// The cuda driver calls this function if the CUDA_INJECTION64_PATH environment +// variable is set +int InitializeInjection(void) { + LOG(INFO) << "Injection mode: Initializing libkineto"; + libkineto_init(false /*cpuOnly*/, true /*logOnError*/); + return 1; +} + +void suppressLibkinetoLogMessages() { + SET_LOG_SEVERITY_LEVEL(ERROR); +} + +} // extern C diff --git a/plugins/tensorboard-plugins/libkineto/src/libkineto_api.cpp b/plugins/tensorboard-plugins/libkineto/src/libkineto_api.cpp new file mode 100644 index 0000000000000000000000000000000000000000..9a622e4f5e5cfd54848cb8c6dc05b98da2fb6011 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/libkineto_api.cpp @@ -0,0 +1,41 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
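+//
+// Threading contract, as a minimal sketch (MyClient is an assumed
+// ClientInterface implementation; see registerClient and
+// initClientIfRegistered below): registration and the later init callback
+// must run on the same thread, since the thread id is captured at
+// registration time.
+//
+//   MyClient client;
+//   libkineto::api().registerClient(&client);   // records this thread id
+//   libkineto::api().initClientIfRegistered();  // must run on that thread
+//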
+
+#include "libkineto.h"
+
+#include "ConfigLoader.h"
+#include "ThreadUtil.h"
+
+namespace libkineto {
+
+LibkinetoApi& api() {
+  static LibkinetoApi instance(ConfigLoader::instance());
+  return instance;
+}
+
+void LibkinetoApi::initClientIfRegistered() {
+  if (client_) {
+    if (clientRegisterThread_ != threadId()) {
+      fprintf(
+          stderr,
+          "ERROR: External init callback must run in same thread as registerClient "
+          "(%d != %d)\n",
+          threadId(),
+          (int)clientRegisterThread_);
+    } else {
+      client_->init();
+    }
+  }
+}
+
+void LibkinetoApi::registerClient(ClientInterface* client) {
+  client_ = client;
+  if (client && activityProfiler_) {
+    // Can initialize straight away
+    client->init();
+  }
+  // Assume here that the external init callback is *not* threadsafe
+  // and only call it if it's the same thread that called registerClient
+  clientRegisterThread_ = threadId();
+}
+
+} // namespace libkineto
diff --git a/plugins/tensorboard-plugins/libkineto/src/output_base.h b/plugins/tensorboard-plugins/libkineto/src/output_base.h
new file mode 100644
index 0000000000000000000000000000000000000000..29d0d57768c91b8593f202cea51071a1affcd88d
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/output_base.h
@@ -0,0 +1,104 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#ifdef HAS_CUPTI
+#include <cupti.h>
+#include "CuptiActivity.h"
+#endif // HAS_CUPTI
+#include "ActivityBuffers.h"
+#include "GenericTraceActivity.h"
+#include "ThreadUtil.h"
+#include "TraceSpan.h"
+
+namespace KINETO_NAMESPACE {
+  class Config;
+  class GpuKernelActivity;
+  struct RuntimeActivity;
+}
+
+namespace libkineto {
+
+using namespace KINETO_NAMESPACE;
+
+class ActivityLogger {
+ public:
+
+  virtual ~ActivityLogger() = default;
+
+  struct DeviceInfo {
+    DeviceInfo(int64_t id, const std::string& name, const std::string& label) :
+        id(id), name(name), label(label) {}
+    int64_t id;
+    const std::string name;
+    const std::string label;
+  };
+
+  struct ResourceInfo {
+    ResourceInfo(
+        int64_t deviceId,
+        int64_t id,
+        int64_t sortIndex,
+        const std::string& name) :
+        id(id), sortIndex(sortIndex), deviceId(deviceId), name(name) {}
+    int64_t id;
+    int64_t sortIndex;
+    int64_t deviceId;
+    const std::string name;
+  };
+
+  struct OverheadInfo {
+    explicit OverheadInfo(const std::string& name) : name(name) {}
+    const std::string name;
+  };
+
+  virtual void handleDeviceInfo(
+      const DeviceInfo& info,
+      uint64_t time) = 0;
+
+  virtual void handleResourceInfo(const ResourceInfo& info, int64_t time) = 0;
+
+  virtual void handleOverheadInfo(const OverheadInfo& info, int64_t time) = 0;
+
+  virtual void handleTraceSpan(const TraceSpan& span) = 0;
+
+  virtual void handleActivity(
+      const libkineto::ITraceActivity& activity) = 0;
+  virtual void handleGenericActivity(
+      const libkineto::GenericTraceActivity& activity) = 0;
+
+#ifdef HAS_CUPTI
+  virtual void handleGpuActivity(
+      const GpuActivity<CUpti_ActivityKernel4>& activity) = 0;
+  virtual void handleGpuActivity(
+      const GpuActivity<CUpti_ActivityMemcpy>& activity) = 0;
+  virtual void handleGpuActivity(
+      const GpuActivity<CUpti_ActivityMemcpy2>& activity) = 0;
+  virtual void handleGpuActivity(
+      const GpuActivity<CUpti_ActivityMemset>& activity) = 0;
+#endif // HAS_CUPTI
+
+  virtual void handleTraceStart(
+      const std::unordered_map<std::string, std::string>& metadata) = 0;
+
+  void handleTraceStart() {
+    handleTraceStart(std::unordered_map<std::string, std::string>());
+  }
+
+  virtual void finalizeTrace(
+      const KINETO_NAMESPACE::Config& config,
+      std::unique_ptr<ActivityBuffers> buffers,
+      int64_t endTime,
+      std::unordered_map<std::string, std::vector<std::string>>& metadata) = 0;
+
+ 
protected: + ActivityLogger() = default; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/output_csv.cpp b/plugins/tensorboard-plugins/libkineto/src/output_csv.cpp new file mode 100644 index 0000000000000000000000000000000000000000..e56c02293982745ed0c013b83bd04d9f42ea7305 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/output_csv.cpp @@ -0,0 +1,88 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "output_csv.h" + +#include +#include +#include + +#include +#include + +#include "Config.h" +#include "Logger.h" + +namespace KINETO_NAMESPACE { + +static void write_header( + std::ostream& out, + const std::vector& percentiles) { + out << "timestamp,delta_ms,device,event_name"; + for (int p : percentiles) { + out << ",p" << p; + } + out << ",total" << std::endl; +} + +void EventCSVLogger::update(const Config& config) { + eventNames_.clear(); + eventNames_.insert(config.eventNames().begin(), config.eventNames().end()); + eventNames_.insert(config.metricNames().begin(), config.metricNames().end()); + if (config.percentiles() != percentiles_) { + percentiles_ = config.percentiles(); + if (out_) { + write_header(*out_, percentiles_); + } + } +} + +void EventCSVLogger::handleSample(int device, const Sample& sample, bool from_new_version) { + using namespace std::chrono; + if (out_) { + auto now = system_clock::now(); + auto time = system_clock::to_time_t(now); + for (const Stat& s : sample.stats) { + if (eventNames_.find(s.name) == eventNames_.end()) { + continue; + } + *out_ << fmt::format("{:%Y-%m-%d %H:%M:%S}", fmt::localtime(time)) << ","; + *out_ << sample.deltaMsec << ","; + *out_ << device << ","; + *out_ << s.name; + for (const auto& p : s.percentileValues) { + *out_ << "," << p.second; + } + *out_ << "," << s.total << std::endl; + } + } +} + +void EventCSVFileLogger::update(const Config& config) { + if (config.eventLogFile() != filename_) { + if (of_.is_open()) { + of_.close(); + out_ = nullptr; + percentiles_.clear(); + } + filename_ = config.eventLogFile(); + if (!filename_.empty()) { + of_.open(filename_, std::ios::out | std::ios::trunc); + out_ = &of_; + } + } + EventCSVLogger::update(config); +} + +void EventCSVDbgLogger::update(const Config& config) { + if (out_ && config.verboseLogLevel() < 0) { + out_ = nullptr; + } else if (!out_ && config.verboseLogLevel() >= 0) { + out_ = &LIBKINETO_DBG_STREAM; + } + if (config.verboseLogLevel() >= 0) { + percentiles_.clear(); + EventCSVLogger::update(config); + } +} + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/output_csv.h b/plugins/tensorboard-plugins/libkineto/src/output_csv.h new file mode 100644 index 0000000000000000000000000000000000000000..bca29f4db99af8aedf031aed869ff2efd3df6155 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/output_csv.h @@ -0,0 +1,39 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
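+//
+// Shape of the emitted CSV, shown with assumed percentiles {50, 95} and an
+// assumed event name (values are illustrative only; see write_header and
+// handleSample in output_csv.cpp):
+//
+//   timestamp,delta_ms,device,event_name,p50,p95,total
+//   2022-01-01 00:00:01,1000,0,elapsed_cycles_sm,123,456,579
+//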
+ +#pragma once +#include "SampleListener.h" + +#include +#include +#include + +namespace KINETO_NAMESPACE { + +class EventCSVLogger : public SampleListener { + public: + void update(const Config& config) override; + void handleSample(int device, const Sample& sample, bool from_new_version) override; + + protected: + EventCSVLogger() : out_(nullptr) {} + + std::ostream* out_; + std::set eventNames_; + std::vector percentiles_; +}; + +class EventCSVFileLogger : public EventCSVLogger { + public: + void update(const Config& config) override; + + private: + std::ofstream of_; + std::string filename_; +}; + +class EventCSVDbgLogger : public EventCSVLogger { + public: + void update(const Config& config) override; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/src/output_json.cpp b/plugins/tensorboard-plugins/libkineto/src/output_json.cpp new file mode 100644 index 0000000000000000000000000000000000000000..0ef22339fad15d6a78e43d7fcb7761fbbc97333b --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/src/output_json.cpp @@ -0,0 +1,583 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "output_json.h" + +#include +#include +#include +#include + +#include "Config.h" +#ifdef HAS_CUPTI +#include "CuptiActivity.h" +#include "CuptiActivity.tpp" +#include "CuptiActivityApi.h" +#include "CudaDeviceProperties.h" +#endif // HAS_CUPTI +#include "Demangle.h" +#include "TraceSpan.h" + +#include "Logger.h" + +using std::endl; +using namespace libkineto; + +namespace KINETO_NAMESPACE { + +static constexpr int kSchemaVersion = 1; +static constexpr char kFlowStart = 's'; +static constexpr char kFlowEnd = 'f'; + +#ifdef __linux__ +static constexpr char kDefaultLogFileFmt[] = + "/tmp/libkineto_activities_{}.json"; +#else +static constexpr char kDefaultLogFileFmt[] = "libkineto_activities_{}.json"; +#endif + +std::string& ChromeTraceLogger::sanitizeStrForJSON(std::string& value) { +// Replace all backslashes with forward slash because Windows paths causing JSONDecodeError. +#ifdef _WIN32 + std::replace(value.begin(), value.end(), '\\', '/'); +#endif + return value; +} + +void ChromeTraceLogger::metadataToJSON( + const std::unordered_map& metadata) { + for (const auto& kv : metadata) { + traceOf_ << fmt::format(R"JSON( + "{}": {},)JSON", kv.first, kv.second); + } +} + +void ChromeTraceLogger::handleTraceStart( + const std::unordered_map& metadata) { + traceOf_ << fmt::format(R"JSON( +{{ + "schemaVersion": {},)JSON", kSchemaVersion); + +#ifdef HAS_CUPTI + traceOf_ << fmt::format(R"JSON( + "deviceProperties": [{} + ],)JSON", devicePropertiesJson()); +#endif + + metadataToJSON(metadata); + traceOf_ << R"JSON( + "traceEvents": [)JSON"; +} + +static std::string defaultFileName() { + return fmt::format(kDefaultLogFileFmt, processId()); +} + +void ChromeTraceLogger::openTraceFile() { + traceOf_.open(fileName_, std::ofstream::out | std::ofstream::trunc); + if (!traceOf_) { + PLOG(ERROR) << "Failed to open '" << fileName_ << "'"; + } else { + LOG(INFO) << "Tracing to " << fileName_; + } +} + +ChromeTraceLogger::ChromeTraceLogger(const std::string& traceFileName) { + fileName_ = traceFileName.empty() ? defaultFileName() : traceFileName; + traceOf_.clear(std::ios_base::badbit); + openTraceFile(); +} + +static int64_t us(int64_t timestamp) { + // It's important that this conversion is the same here and in the CPU trace. + // No rounding! 
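+  // e.g. 1000999ns and 1000001ns both truncate to 1000us; consistent
+  // truncation keeps CPU-side and GPU-side events on the same microsecond grid.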
+ return timestamp / 1000; +} + +void ChromeTraceLogger::handleDeviceInfo( + const DeviceInfo& info, + uint64_t time) { + if (!traceOf_) { + return; + } + + // M is for metadata + // process_name needs a pid and a name arg + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "name": "process_name", "ph": "M", "ts": {}, "pid": {}, "tid": 0, + "args": {{ + "name": "{}" + }} + }}, + {{ + "name": "process_labels", "ph": "M", "ts": {}, "pid": {}, "tid": 0, + "args": {{ + "labels": "{}" + }} + }}, + {{ + "name": "process_sort_index", "ph": "M", "ts": {}, "pid": {}, "tid": 0, + "args": {{ + "sort_index": {} + }} + }},)JSON", + time, info.id, + info.name, + time, info.id, + info.label, + time, info.id, + info.id < 8 ? info.id + 0x1000000ll : info.id); + // clang-format on +} + +void ChromeTraceLogger::handleResourceInfo( + const ResourceInfo& info, + int64_t time) { + if (!traceOf_) { + return; + } + + // M is for metadata + // thread_name needs a pid and a name arg + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "name": "thread_name", "ph": "M", "ts": {}, "pid": {}, "tid": {}, + "args": {{ + "name": "{}" + }} + }}, + {{ + "name": "thread_sort_index", "ph": "M", "ts": {}, "pid": {}, "tid": {}, + "args": {{ + "sort_index": {} + }} + }},)JSON", + time, info.deviceId, info.id, + info.name, + time, info.deviceId, info.id, + info.sortIndex); + // clang-format on +} + +void ChromeTraceLogger::handleOverheadInfo( + const OverheadInfo& info, + int64_t time) { + if (!traceOf_) { + return; + } + + // TOOD: reserve pid = -1 for overhead but we need to rethink how to scale this for + // other metadata + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "name": "process_name", "ph": "M", "ts": {}, "pid": -1, "tid": 0, + "args": {{ + "name": "{}" + }} + }}, + {{ + "name": "process_sort_index", "ph": "M", "ts": {}, "pid": -1, "tid": 0, + "args": {{ + "sort_index": {} + }} + }},)JSON", + time, + info.name, + time, + 0x100000All); + // clang-format on +} + +void ChromeTraceLogger::handleTraceSpan(const TraceSpan& span) { + if (!traceOf_) { + return; + } + + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "X", "cat": "Trace", "ts": {}, "dur": {}, + "pid": "Spans", "tid": "{}", + "name": "{}{} ({})", + "args": {{ + "Op count": {} + }} + }}, + {{ + "name": "process_sort_index", "ph": "M", "ts": {}, + "pid": "Spans", "tid": 0, + "args": {{ + "sort_index": {} + }} + }},)JSON", + span.startTime, span.endTime - span.startTime, + span.name, + span.prefix, span.name, span.iteration, + span.opCount, + span.startTime, + // Large sort index to appear at the bottom + 0x20000000ll); + // clang-format on + + addIterationMarker(span); +} + +void ChromeTraceLogger::addIterationMarker(const TraceSpan& span) { + if (!traceOf_) { + return; + } + + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "name": "Iteration Start: {}", "ph": "i", "s": "g", + "pid": "Traces", "tid": "Trace {}", "ts": {} + }},)JSON", + span.name, + span.name, span.startTime); + // clang-format on +} + +static std::string traceActivityJson(const ITraceActivity& activity) { + // clang-format off + int64_t ts = activity.timestamp(); + int64_t duration = activity.duration(); + if (activity.type() == ActivityType::GPU_USER_ANNOTATION) { + // The GPU user annotations start at the same time as the + // first associated GPU activity. Since they appear later + // in the trace file, this causes a visualization issue in Chrome. + // Make it start one us earlier. 
+ ts--; + duration++; // Still need it to end at the orginal point + } + return fmt::format(R"JSON( + "name": "{}", "pid": {}, "tid": {}, + "ts": {}, "dur": {})JSON", + activity.name(), activity.deviceId(), activity.resourceId(), + ts, duration); + // clang-format on +} + +void ChromeTraceLogger::handleGenericInstantEvent( + const libkineto::ITraceActivity& op) { + if (!traceOf_) { + return; + } + + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "i", "s": "t", "name": "{}", + "pid": {}, "tid": {}, + "ts": {}, + "args": {{ + {} + }} + }},)JSON", + op.name(), op.deviceId(), op.resourceId(), + op.timestamp(), op.metadataJson()); +} + +void ChromeTraceLogger::handleActivity( + const libkineto::ITraceActivity& op) { + if (!traceOf_) { + return; + } + + if (op.type() == ActivityType::CPU_INSTANT_EVENT) { + handleGenericInstantEvent(op); + return; + } + + const std::string op_metadata = op.metadataJson(); + std::string separator = ""; + if (op_metadata.find_first_not_of(" \t\n") != std::string::npos) { + separator = ",\n "; + } + std::string span = ""; + if (op.traceSpan()) { + span = fmt::format(R"JSON( + "Trace name": "{}", "Trace iteration": {},)JSON", + op.traceSpan()->name, + op.traceSpan()->iteration); + } + + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "X", "cat": "{}", {}, + "args": {{{} + "External id": {}{}{} + }} + }},)JSON", + toString(op.type()), traceActivityJson(op), + // args + span, + op.correlationId(), separator, op_metadata); + // clang-format on + if (op.flowId() > 0) { + handleGenericLink(op); + } +} + +void ChromeTraceLogger::handleGenericActivity( + const libkineto::GenericTraceActivity& op) { + handleActivity(op); +} + +void ChromeTraceLogger::handleGenericLink(const ITraceActivity& act) { + static struct { + int type; + char longName[24]; + char shortName[16]; + } flow_names[] = { + {kLinkFwdBwd, "forward_backward", "fwd_bwd"}, + {kLinkAsyncCpuGpu, "async_cpu_to_gpu", "async_gpu"} + }; + for (auto& flow : flow_names) { + if (act.flowType() == flow.type) { + // Link the activities via flow ID in source and destination. + // The source node must return true from flowStart() + // and the destination node false. 
+ if (act.flowStart()) { + handleLink(kFlowStart, act, act.flowId(), flow.longName, flow.shortName); + } else { + handleLink(kFlowEnd, act, act.flowId(), flow.longName, flow.shortName); + } + return; + } + } + LOG(ERROR) << "Unknown flow type: " << act.flowType(); +} + +void ChromeTraceLogger::handleLink( + char type, + const ITraceActivity& e, + int64_t id, + const std::string& cat, + const std::string& name) { + if (!traceOf_) { + return; + } + + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "{}", "id": {}, "pid": {}, "tid": {}, "ts": {}, + "cat": "{}", "name": "{}", "bp": "e" + }},)JSON", + type, id, e.deviceId(), e.resourceId(), e.timestamp(), cat, name); + // clang-format on +} + +#ifdef HAS_CUPTI +// GPU side kernel activity +void ChromeTraceLogger::handleGpuActivity( + const GpuActivity& activity) { + if (!traceOf_) { + return; + } + const CUpti_ActivityKernel4* kernel = &activity.raw(); + constexpr int threads_per_warp = 32; + float blocks_per_sm = -1.0; + float warps_per_sm = -1.0; + int sm_count = smCount(kernel->deviceId); + if (sm_count) { + blocks_per_sm = + (kernel->gridX * kernel->gridY * kernel->gridZ) / (float) sm_count; + warps_per_sm = + blocks_per_sm * (kernel->blockX * kernel->blockY * kernel->blockZ) + / threads_per_warp; + } + + // Calculate occupancy + float occupancy = KINETO_NAMESPACE::kernelOccupancy( + kernel->deviceId, + kernel->registersPerThread, + kernel->staticSharedMemory, + kernel->dynamicSharedMemory, + kernel->blockX, + kernel->blockY, + kernel->blockZ, + blocks_per_sm); + + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "X", "cat": "Kernel", {}, + "args": {{ + "queued": {}, "device": {}, "context": {}, + "stream": {}, "correlation": {}, + "registers per thread": {}, + "shared memory": {}, + "blocks per SM": {}, + "warps per SM": {}, + "grid": [{}, {}, {}], + "block": [{}, {}, {}], + "est. achieved occupancy %": {} + }} + }},)JSON", + traceActivityJson(activity), + // args + us(kernel->queued), kernel->deviceId, kernel->contextId, + kernel->streamId, kernel->correlationId, + kernel->registersPerThread, + kernel->staticSharedMemory + kernel->dynamicSharedMemory, + blocks_per_sm, + warps_per_sm, + kernel->gridX, kernel->gridY, kernel->gridZ, + kernel->blockX, kernel->blockY, kernel->blockZ, + (int) (0.5 + occupancy * 100.0)); + // clang-format on + + auto to_id = activity.correlationId(); + handleLink(kFlowEnd, activity, to_id, "async_cpu_to_gpu", "async_gpu"); +} + +static std::string bandwidth(uint64_t bytes, uint64_t duration) { + return duration == 0 ? 
"\"N/A\"" : fmt::format("{}", bytes * 1.0 / duration); +} + +// GPU side memcpy activity +void ChromeTraceLogger::handleGpuActivity( + const GpuActivity& activity) { + if (!traceOf_) { + return; + } + const CUpti_ActivityMemcpy& memcpy = activity.raw(); + VLOG(2) << memcpy.correlationId << ": MEMCPY"; + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "X", "cat": "Memcpy", {}, + "args": {{ + "device": {}, "context": {}, + "stream": {}, "correlation": {}, + "bytes": {}, "memory bandwidth (GB/s)": {} + }} + }},)JSON", + traceActivityJson(activity), + // args + memcpy.deviceId, memcpy.contextId, + memcpy.streamId, memcpy.correlationId, + memcpy.bytes, bandwidth(memcpy.bytes, memcpy.end - memcpy.start)); + // clang-format on + + int64_t to_id = activity.correlationId(); + handleLink(kFlowEnd, activity, to_id, "async_cpu_to_gpu", "async_gpu"); +} + +// GPU side memcpy activity +void ChromeTraceLogger::handleGpuActivity( + const GpuActivity& activity) { + if (!traceOf_) { + return; + } + const CUpti_ActivityMemcpy2& memcpy = activity.raw(); + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "X", "cat": "Memcpy", {}, + "args": {{ + "fromDevice": {}, "inDevice": {}, "toDevice": {}, + "fromContext": {}, "inContext": {}, "toContext": {}, + "stream": {}, "correlation": {}, + "bytes": {}, "memory bandwidth (GB/s)": {} + }} + }},)JSON", + traceActivityJson(activity), + // args + memcpy.srcDeviceId, memcpy.deviceId, memcpy.dstDeviceId, + memcpy.srcContextId, memcpy.contextId, memcpy.dstContextId, + memcpy.streamId, memcpy.correlationId, + memcpy.bytes, bandwidth(memcpy.bytes, memcpy.end - memcpy.start)); + // clang-format on + + int64_t to_id = activity.correlationId(); + handleLink(kFlowEnd, activity, to_id, "async_cpu_to_gpu", "async_gpu"); +} + +void ChromeTraceLogger::handleGpuActivity( + const GpuActivity& activity) { + if (!traceOf_) { + return; + } + const CUpti_ActivityMemset& memset = activity.raw(); + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "ph": "X", "cat": "Memset", {}, + "args": {{ + "device": {}, "context": {}, + "stream": {}, "correlation": {}, + "bytes": {}, "memory bandwidth (GB/s)": {} + }} + }},)JSON", + traceActivityJson(activity), + // args + memset.deviceId, memset.contextId, + memset.streamId, memset.correlationId, + memset.bytes, bandwidth(memset.bytes, memset.end - memset.start)); + // clang-format on + + int64_t to_id = activity.correlationId(); + handleLink(kFlowEnd, activity, to_id, "async_cpu_to_gpu", "async_gpu"); +} +#endif // HAS_CUPTI + +void ChromeTraceLogger::finalizeTrace( + const Config& /*unused*/, + std::unique_ptr /*unused*/, + int64_t endTime, + std::unordered_map>& metadata) { + if (!traceOf_) { + LOG(ERROR) << "Failed to write to log file!"; + return; + } + LOG(INFO) << "Chrome Trace written to " << fileName_; + // clang-format off + traceOf_ << fmt::format(R"JSON( + {{ + "name": "Record Window End", "ph": "i", "s": "g", + "pid": "", "tid": "", "ts": {} + }} + ],)JSON", + endTime); + +#if !USE_GOOGLE_LOG + std::unordered_map PreparedMetadata; + for (const auto& kv : metadata) { + // Skip empty log buckets, ex. skip ERROR if its empty. + if (!kv.second.empty()) { + std::string value = "["; + // Ex. Each metadata from logger is a list of strings, expressed in JSON as + // "ERROR": ["Error 1", "Error 2"], + // "WARNING": ["Warning 1", "Warning 2", "Warning 3"], + // ... 
+      int mdv_count = kv.second.size();
+      for (const auto& v : kv.second) {
+        value.append("\"" + v + "\"");
+        if(mdv_count > 1) {
+          value.append(",");
+          mdv_count--;
+        }
+      }
+      value.append("]");
+      PreparedMetadata[kv.first] = sanitizeStrForJSON(value);
+    }
+  }
+  metadataToJSON(PreparedMetadata);
+#endif // !USE_GOOGLE_LOG
+
+  // Putting this here because the last entry MUST not end with a comma.
+  traceOf_ << fmt::format(R"JSON(
+  "traceName": "{}"
+}})JSON", sanitizeStrForJSON(fileName_));
+  // clang-format on
+
+  traceOf_.close();
+}
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/output_json.h b/plugins/tensorboard-plugins/libkineto/src/output_json.h
new file mode 100644
index 0000000000000000000000000000000000000000..5a8a81e4a9fdeef09b0e9ace59b964d5ab99b7ad
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/output_json.h
@@ -0,0 +1,91 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <fstream>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#ifdef HAS_CUPTI
+#include <cupti.h>
+#endif
+#include "GenericTraceActivity.h"
+#include "output_base.h"
+
+namespace KINETO_NAMESPACE {
+  // Previous declaration of TraceSpan is struct. Must match the same here.
+  struct TraceSpan;
+}
+
+namespace KINETO_NAMESPACE {
+
+class Config;
+
+class ChromeTraceLogger : public libkineto::ActivityLogger {
+ public:
+  explicit ChromeTraceLogger(const std::string& traceFileName);
+
+  // Note: the caller of these functions should handle concurrency
+  // i.e., these functions are not thread-safe
+  void handleDeviceInfo(
+      const DeviceInfo& info,
+      uint64_t time) override;
+
+  void handleOverheadInfo(const OverheadInfo& info, int64_t time) override;
+
+  void handleResourceInfo(const ResourceInfo& info, int64_t time) override;
+
+  void handleTraceSpan(const TraceSpan& span) override;
+
+  void handleActivity(const ITraceActivity& activity) override;
+  void handleGenericActivity(const GenericTraceActivity& activity) override;
+
+#ifdef HAS_CUPTI
+  void handleGpuActivity(const GpuActivity<CUpti_ActivityKernel4>& activity) override;
+  void handleGpuActivity(const GpuActivity<CUpti_ActivityMemcpy>& activity) override;
+  void handleGpuActivity(const GpuActivity<CUpti_ActivityMemcpy2>& activity) override;
+  void handleGpuActivity(const GpuActivity<CUpti_ActivityMemset>& activity) override;
+#endif // HAS_CUPTI
+
+  void handleTraceStart(
+      const std::unordered_map<std::string, std::string>& metadata) override;
+
+  void finalizeTrace(
+      const Config& config,
+      std::unique_ptr<ActivityBuffers> buffers,
+      int64_t endTime,
+      std::unordered_map<std::string, std::vector<std::string>>& metadata) override;
+
+  std::string traceFileName() const {
+    return fileName_;
+  }
+
+ private:
+
+  // Create a flow event (arrow)
+  void handleLink(
+      char type,
+      const ITraceActivity& e,
+      int64_t id,
+      const std::string& cat,
+      const std::string& name);
+
+  void addIterationMarker(const TraceSpan& span);
+
+  void openTraceFile();
+
+  void handleGenericInstantEvent(const ITraceActivity& op);
+
+  void handleGenericLink(const ITraceActivity& activity);
+
+  void metadataToJSON(const std::unordered_map<std::string, std::string>& metadata);
+
+  std::string& sanitizeStrForJSON(std::string& value);
+
+  std::string fileName_;
+  std::ofstream traceOf_;
+};
+
+} // namespace KINETO_NAMESPACE
diff --git a/plugins/tensorboard-plugins/libkineto/src/output_membuf.h b/plugins/tensorboard-plugins/libkineto/src/output_membuf.h
new file mode 100644
index 0000000000000000000000000000000000000000..ef6aadeb65728e0e05e454f98b32ccecca229cf4
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/src/output_membuf.h
@@ -0,0 +1,130 @@
+// (c) Meta Platforms, Inc.
and affiliates. Confidential and proprietary. + +#pragma once + +#include +#include +#include +#include + +#ifdef HAS_CUPTI +#include +#endif + +#include "Config.h" +#include "GenericTraceActivity.h" +#ifdef HAS_CUPTI +#include "CuptiActivity.h" +#include "CuptiActivity.tpp" +#endif // HAS_CUPTI +#include "output_base.h" + +namespace KINETO_NAMESPACE { + +class Config; + +class MemoryTraceLogger : public ActivityLogger { + public: + MemoryTraceLogger(const Config& config) : config_(config.clone()) { + activities_.reserve(100000); + } + + // Note: the caller of these functions should handle concurrency + // i.e., these functions are not thread-safe + void handleDeviceInfo( + const DeviceInfo& info, + uint64_t time) override { + deviceInfoList_.emplace_back(info, time); + } + + void handleResourceInfo(const ResourceInfo& info, int64_t time) override { + resourceInfoList_.emplace_back(info, time); + } + + void handleOverheadInfo(const OverheadInfo& info, int64_t time) override {} + + void handleTraceSpan(const TraceSpan& span) override { + // Handled separately + } + + template + void addActivityWrapper(const T& act) { + wrappers_.push_back(std::make_unique(act)); + activities_.push_back(wrappers_.back().get()); + } + + // Just add the pointer to the list - ownership of the underlying + // objects must be transferred in ActivityBuffers via finalizeTrace + void handleActivity(const ITraceActivity& activity) override { + activities_.push_back(&activity); + } + void handleGenericActivity(const GenericTraceActivity& activity) override { + addActivityWrapper(activity); + } + +#ifdef HAS_CUPTI + void handleGpuActivity(const GpuActivity& activity) override { + addActivityWrapper(activity); + } + void handleGpuActivity(const GpuActivity& activity) override { + addActivityWrapper(activity); + } + void handleGpuActivity(const GpuActivity& activity) override { + addActivityWrapper(activity); + } + void handleGpuActivity(const GpuActivity& activity) override { + addActivityWrapper(activity); + } +#endif // HAS_CUPTI + + void handleTraceStart( + const std::unordered_map& metadata) override { + metadata_ = metadata; + } + + void finalizeTrace( + const Config& config, + std::unique_ptr buffers, + int64_t endTime, + std::unordered_map>& metadata) override { + buffers_ = std::move(buffers); + endTime_ = endTime; + } + + const std::vector* traceActivities() { + return &activities_; + } + + void log(ActivityLogger& logger) { + logger.handleTraceStart(metadata_); + for (auto& activity : activities_) { + activity->log(logger); + } + for (auto& p : deviceInfoList_) { + logger.handleDeviceInfo(p.first, p.second); + } + for (auto& p : resourceInfoList_) { + logger.handleResourceInfo(p.first, p.second); + } + for (auto& cpu_trace_buffer : buffers_->cpu) { + logger.handleTraceSpan(cpu_trace_buffer->span); + } + // Hold on to the buffers + logger.finalizeTrace(*config_, nullptr, endTime_, loggerMetadata_); + } + + private: + + std::unique_ptr config_; + // Optimization: Remove unique_ptr by keeping separate vector per type + std::vector activities_; + std::vector> wrappers_; + std::vector> deviceInfoList_; + std::vector> resourceInfoList_; + std::unique_ptr buffers_; + std::unordered_map metadata_; + std::unordered_map> loggerMetadata_; + int64_t endTime_{0}; +}; + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/test/CMakeLists.txt b/plugins/tensorboard-plugins/libkineto/test/CMakeLists.txt new file mode 100644 index 
0000000000000000000000000000000000000000..ca54460b36cd4ade93918c8512f1309b48552e65 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CMakeLists.txt @@ -0,0 +1,3 @@ +cmake_minimum_required(VERSION 3.5 FATAL_ERROR) + +# TODO diff --git a/plugins/tensorboard-plugins/libkineto/test/ConfigTest.cpp b/plugins/tensorboard-plugins/libkineto/test/ConfigTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..16bc86e751cefdbee1d48aeb79fc849b7d151a18 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/ConfigTest.cpp @@ -0,0 +1,315 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "include/Config.h" + +#include +#include +#include +#include + +using namespace std::chrono; +using namespace KINETO_NAMESPACE; + +TEST(ParseTest, Whitespace) { + Config cfg; + // Check that various types of whitespace is ignored + EXPECT_TRUE(cfg.parse("")); + EXPECT_TRUE(cfg.parse(" ")); + EXPECT_TRUE(cfg.parse("\t")); + EXPECT_TRUE(cfg.parse("\n")); + EXPECT_TRUE(cfg.parse(" ")); + EXPECT_TRUE(cfg.parse("\t \n \t\t\n\n")); + // Only the above characters are supported + EXPECT_FALSE(cfg.parse("\r\n")); +} + +TEST(ParseTest, Comment) { + Config cfg; + // Anything following a '#' should be ignored, up to a newline + EXPECT_TRUE(cfg.parse("# comment")); + EXPECT_TRUE(cfg.parse(" # ~!@#$")); + EXPECT_TRUE(cfg.parse("\t#abc")); + EXPECT_TRUE(cfg.parse("###\n##")); + EXPECT_TRUE(cfg.parse("EVENTS=util ##ok")); + EXPECT_TRUE(cfg.parse("EVENTS=util ## EVENTS=instruction")); + // Whatever appears before the comment must be valid format + EXPECT_FALSE(cfg.parse("util ## not ok")); + EXPECT_FALSE(cfg.parse("## ok \n blah # not OK")); + // Check that a comment does not affect config parsing + EXPECT_TRUE(cfg.parse("SAMPLE_PERIOD_MSECS = 1 # Sample every millisecond")); + EXPECT_EQ(cfg.samplePeriod(), milliseconds(1)); +} + +TEST(ParseTest, Format) { + Config cfg; + // The basic format is just "name = value". + // Where both value and name can be almost anything. + // Leading and trailing whitespace should be removed + // for both 'name' and 'value', but internal whitespace is not. + EXPECT_FALSE(cfg.parse("events")); + EXPECT_TRUE(cfg.parse("events=")); + EXPECT_FALSE(cfg.parse("=events=")); + EXPECT_TRUE(cfg.parse("events=1,2,3")); + // Only one setting per line + EXPECT_FALSE(cfg.parse("events = 1,2,3 ; metrics = 4,5,6")); + // Names are case sensitive + EXPECT_TRUE(cfg.parse("EVENTS = 1,2,3 \n metrics = 4,5,6")); + EXPECT_EQ(cfg.eventNames(), std::set({"1", "2", "3"})); + EXPECT_EQ(cfg.metricNames().size(), 0); + // Leading and trailing whitespace removed for event and metric names, + // but not internal. 
+ EXPECT_TRUE( + cfg.parse("EVENTS = 1, 2, 3 \n \tMETRICS\t = \t4,\t5\t,\ts i x ")); + EXPECT_EQ(cfg.eventNames(), std::set({"1", "2", "3"})); + EXPECT_EQ(cfg.metricNames(), std::set({"4", "5", "s i x"})); +} + +TEST(ParseTest, DefaultActivityTypes) { + Config cfg; + cfg.validate(std::chrono::system_clock::now()); + auto all_activities = activityTypes(); + // TODO: introduce optional activities + EXPECT_EQ(cfg.selectedActivityTypes(), + std::set(all_activities.begin(), all_activities.end() - 1)); +} + +TEST(ParseTest, ActivityTypes) { + Config cfg; + EXPECT_FALSE(cfg.parse("ACTIVITY_TYPES")); + EXPECT_TRUE(cfg.parse("ACTIVITY_TYPES=")); + EXPECT_FALSE(cfg.parse("=ACTIVITY_TYPES=")); + + EXPECT_EQ(cfg.selectedActivityTypes(), + std::set({ActivityType::CPU_OP, + ActivityType::CPU_INSTANT_EVENT, + ActivityType::PYTHON_FUNCTION, + ActivityType::USER_ANNOTATION, + ActivityType::GPU_USER_ANNOTATION, + ActivityType::GPU_MEMCPY, + ActivityType::GPU_MEMSET, + ActivityType::CONCURRENT_KERNEL, + ActivityType::EXTERNAL_CORRELATION, + ActivityType::GLOW_RUNTIME, + ActivityType::CUDA_RUNTIME, + ActivityType::CUDA_PROFILER_RANGE})); + + Config cfg2; + EXPECT_TRUE(cfg2.parse("ACTIVITY_TYPES=gpu_memcpy,gpu_MeMsEt,kernel")); + EXPECT_EQ(cfg2.selectedActivityTypes(), + std::set({ActivityType::GPU_MEMCPY, + ActivityType::GPU_MEMSET, + ActivityType::CONCURRENT_KERNEL})); + + EXPECT_TRUE(cfg2.parse("ACTIVITY_TYPES = cuda_Runtime,")); + EXPECT_EQ(cfg2.selectedActivityTypes(), + std::set({ActivityType::CUDA_RUNTIME})); + + // Should throw an exception because incorrect activity name + EXPECT_FALSE(cfg2.parse("ACTIVITY_TYPES = memcopy,cuda_runtime")); + + EXPECT_TRUE(cfg2.parse("ACTIVITY_TYPES = cpu_op")); + EXPECT_EQ(cfg2.selectedActivityTypes(), + std::set({ActivityType::CPU_OP})); +} + +TEST(ParseTest, SamplePeriod) { + Config cfg; + EXPECT_TRUE(cfg.parse("SAMPLE_PERIOD_MSECS=10")); + EXPECT_EQ(cfg.samplePeriod(), milliseconds(10)); + EXPECT_TRUE(cfg.parse("SAMPLE_PERIOD_MSECS=0")); + cfg.validate(std::chrono::system_clock::now()); + // 0 should be adjustd up to 1 + EXPECT_EQ(cfg.samplePeriod(), milliseconds(1)); + // Negative and non-int values should fail + EXPECT_FALSE(cfg.parse("SAMPLE_PERIOD_MSECS=-10")); + EXPECT_FALSE(cfg.parse("SAMPLE_PERIOD_MSECS=1.5")); + EXPECT_FALSE(cfg.parse("SAMPLE_PERIOD_MSECS=")); + EXPECT_FALSE(cfg.parse("SAMPLE_PERIOD_MSECS=string")); + EXPECT_EQ(cfg.samplePeriod(), milliseconds(1)); +} + +TEST(ParseTest, MultiplexPeriod) { + Config cfg; + auto now = std::chrono::system_clock::now(); + + EXPECT_TRUE(cfg.parse("SAMPLE_PERIOD_MSECS=100\nMULTIPLEX_PERIOD_MSECS=100")); + EXPECT_EQ(cfg.multiplexPeriod(), milliseconds(100)); + EXPECT_TRUE(cfg.parse("MULTIPLEX_PERIOD_MSECS = 0")); + cfg.validate(now); + // Adjusted to match sample period + EXPECT_EQ(cfg.multiplexPeriod(), milliseconds(100)); + EXPECT_TRUE(cfg.parse("MULTIPLEX_PERIOD_MSECS \t= \t 750 \n")); + cfg.validate(now); + // Adjusted to match multiple of sample period + EXPECT_EQ(cfg.multiplexPeriod(), milliseconds(800)); + EXPECT_FALSE(cfg.parse("MULTIPLEX_PERIOD_MSECS=-10")); + EXPECT_FALSE(cfg.parse("MULTIPLEX_PERIOD_MSECS=1.5")); + EXPECT_FALSE(cfg.parse("MULTIPLEX_PERIOD_MSECS=")); + EXPECT_FALSE(cfg.parse("MULTIPLEX_PERIOD_MSECS=string")); + // Previous value not affected + EXPECT_EQ(cfg.multiplexPeriod(), milliseconds(800)); +} + +TEST(ParseTest, ReportPeriod) { + Config cfg; + EXPECT_TRUE(cfg.parse("REPORT_PERIOD_SECS=1")); + EXPECT_EQ(cfg.reportPeriod(), seconds(1)); + // Whitespace + 
EXPECT_TRUE(cfg.parse("REPORT_PERIOD_SECS = \t100")); + EXPECT_EQ(cfg.reportPeriod(), seconds(100)); + // Invalid types + EXPECT_FALSE(cfg.parse("REPORT_PERIOD_SECS=-1")); + EXPECT_EQ(cfg.reportPeriod(), seconds(100)); +} + +TEST(ParseTest, SamplesPerReport) { + Config cfg; + auto now = std::chrono::system_clock::now(); + + EXPECT_TRUE(cfg.parse(R"( + SAMPLE_PERIOD_MSECS = 1000 + REPORT_PERIOD_SECS = 1 + SAMPLES_PER_REPORT = 10)")); + cfg.validate(now); + // Adjusted down to one sample per report + EXPECT_EQ(cfg.samplesPerReport(), 1); + EXPECT_TRUE(cfg.parse(R"( + SAMPLE_PERIOD_MSECS = 1000 + REPORT_PERIOD_SECS = 10 + SAMPLES_PER_REPORT = 10)")); + cfg.validate(now); + // No adjustment needed + EXPECT_EQ(cfg.samplesPerReport(), 10); + EXPECT_TRUE(cfg.parse(R"( + SAMPLE_PERIOD_MSECS = 1000 + REPORT_PERIOD_SECS = 2 + SAMPLES_PER_REPORT = 10)")); + cfg.validate(now); + // Adjusted to 2 samples per report + EXPECT_EQ(cfg.samplesPerReport(), 2); + EXPECT_TRUE(cfg.parse(R"( + SAMPLE_PERIOD_MSECS = 200 + REPORT_PERIOD_SECS = 2 + SAMPLES_PER_REPORT = 10)")); + cfg.validate(now); + // No adjustment needed + EXPECT_EQ(cfg.samplesPerReport(), 10); + EXPECT_TRUE(cfg.parse("SAMPLES_PER_REPORT=0")); + cfg.validate(now); + // Adjusted up to 1 + EXPECT_EQ(cfg.samplesPerReport(), 1); + // Invalid value types + EXPECT_FALSE(cfg.parse("SAMPLES_PER_REPORT=-10")); + EXPECT_FALSE(cfg.parse("SAMPLES_PER_REPORT=1.5")); + EXPECT_EQ(cfg.samplesPerReport(), 1); + + EXPECT_TRUE(cfg.parse(R"( + SAMPLE_PERIOD_MSECS=1000 + MULTIPLEX_PERIOD_MSECS=500 # Must be a multiple of sample period + REPORT_PERIOD_SECS=0 # Must be non-zero multiple of multiplex period + SAMPLES_PER_REPORT=5 # Max report period / multiplex period)")); + cfg.validate(now); + // Multiple adjustments + EXPECT_EQ(cfg.samplePeriod(), milliseconds(1000)); + EXPECT_EQ(cfg.multiplexPeriod(), milliseconds(1000)); + EXPECT_EQ(cfg.reportPeriod(), seconds(1)); + EXPECT_EQ(cfg.samplesPerReport(), 1); +} + +TEST(ParseTest, EnableSigUsr2) { + Config cfg; + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=yes")); + EXPECT_TRUE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=no")); + EXPECT_FALSE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=YES")); + EXPECT_TRUE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=NO")); + EXPECT_FALSE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=Y")); + EXPECT_TRUE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=N")); + EXPECT_FALSE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=T")); + EXPECT_TRUE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=F")); + EXPECT_FALSE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=true")); + EXPECT_TRUE(cfg.sigUsr2Enabled()); + EXPECT_TRUE(cfg.parse("ENABLE_SIGUSR2=false")); + EXPECT_FALSE(cfg.sigUsr2Enabled()); + EXPECT_FALSE(cfg.parse("ENABLE_SIGUSR2= ")); + EXPECT_FALSE(cfg.parse("ENABLE_SIGUSR2=2")); + EXPECT_FALSE(cfg.parse("ENABLE_SIGUSR2=-1")); + EXPECT_FALSE(cfg.parse("ENABLE_SIGUSR2=yep")); +} + +TEST(ParseTest, DeviceMask) { + Config cfg; + // Single device + EXPECT_TRUE(cfg.parse("EVENTS_ENABLED_DEVICES = 0")); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(0)); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(1)); + + // Two devices, internal whitespace + EXPECT_TRUE(cfg.parse("EVENTS_ENABLED_DEVICES = 1, 2")); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(0)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(1)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(2)); + 
EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(3)); + + // Three devices, check that previous devices are ignored + EXPECT_TRUE(cfg.parse("EVENTS_ENABLED_DEVICES = 0, 2,4")); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(0)); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(1)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(2)); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(3)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(4)); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(5)); + + // Repeated numbers have no effect + EXPECT_TRUE(cfg.parse("EVENTS_ENABLED_DEVICES = 0,1,1,1,2,3,2,1,3,7,7,3")); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(0)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(1)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(2)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(3)); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(4)); + EXPECT_FALSE(cfg.eventProfilerEnabledForDevice(6)); + EXPECT_TRUE(cfg.eventProfilerEnabledForDevice(7)); + + // 8 is larger than the max allowed + EXPECT_FALSE(cfg.parse("EVENTS_ENABLED_DEVICES = 3,8")); + + // 300 cannot be held in an uint8_t + EXPECT_FALSE(cfg.parse("EVENTS_ENABLED_DEVICES = 300")); + + // Various illegal cases + EXPECT_FALSE(cfg.parse("EVENTS_ENABLED_DEVICES = 0,1,two,three")); + EXPECT_FALSE(cfg.parse("EVENTS_ENABLED_DEVICES = 0,1,,2")); + EXPECT_FALSE(cfg.parse("EVENTS_ENABLED_DEVICES = -1")); + EXPECT_FALSE(cfg.parse("EVENTS_ENABLED_DEVICES = 1.0")); +} + +TEST(ParseTest, RequestTime) { + Config cfg; + system_clock::time_point now = system_clock::now(); + int64_t tgood_ms = + duration_cast(now.time_since_epoch()).count(); + EXPECT_TRUE(cfg.parse(fmt::format("REQUEST_TIMESTAMP = {}", tgood_ms))); + + tgood_ms = duration_cast((now - seconds(5)).time_since_epoch()) + .count(); + EXPECT_TRUE(cfg.parse(fmt::format("REQUEST_TIMESTAMP = {}", tgood_ms))); + + int64_t tbad_ms = + duration_cast((now - seconds(20)).time_since_epoch()) + .count(); + EXPECT_FALSE(cfg.parse(fmt::format("REQUEST_TIMESTAMP = {}", tbad_ms))); + + EXPECT_FALSE(cfg.parse("REQUEST_TIMESTAMP = 0")); + EXPECT_FALSE(cfg.parse("REQUEST_TIMESTAMP = -1")); + + tbad_ms = duration_cast((now + seconds(10)).time_since_epoch()) + .count(); + EXPECT_FALSE(cfg.parse(fmt::format("REQUEST_TIMESTAMP = {}", tbad_ms))); +} diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiActivityProfilerTest.cpp b/plugins/tensorboard-plugins/libkineto/test/CuptiActivityProfilerTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..6e67980ee31a3386580974033201b7acae75d22b --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CuptiActivityProfilerTest.cpp @@ -0,0 +1,629 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
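+//
+// The AsyncTrace test below drives the profiler through its run-loop states
+// (warmup, then collect) with a config of roughly this shape; values match
+// what the test constructs, and <now + 10s> is a placeholder:
+//
+//   ACTIVITIES_WARMUP_PERIOD_SECS = 5
+//   ACTIVITIES_DURATION_SECS = 1
+//   ACTIVITIES_LOG_FILE = /tmp/libkineto_testXXXXXX.json
+//   PROFILE_START_TIME = <now + 10s, in epoch ms>
+//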
+
+#include <fmt/format.h>
+#include <gmock/gmock.h>
+#include <gtest/gtest.h>
+#include <strings.h>
+#include <time.h>
+#include <chrono>
+
+#ifdef __linux__
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <fcntl.h>
+#endif
+
+#include "include/libkineto.h"
+#include "include/Config.h"
+#include "src/CuptiActivityProfiler.h"
+#include "src/ActivityTrace.h"
+#include "src/CuptiActivityApi.h"
+#include "src/output_base.h"
+#include "src/output_json.h"
+#include "src/output_membuf.h"
+
+#include "src/Logger.h"
+#include "test/MockActivitySubProfiler.h"
+
+using namespace std::chrono;
+using namespace KINETO_NAMESPACE;
+
+#define CUDA_LAUNCH_KERNEL CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000
+#define CUDA_MEMCPY CUPTI_RUNTIME_TRACE_CBID_cudaMemcpy_v3020
+
+namespace {
+const TraceSpan& defaultTraceSpan() {
+  static TraceSpan span(0, 0, "Unknown", "");
+  return span;
+}
+}
+
+// Provides ability to easily create a few test CPU-side ops
+struct MockCpuActivityBuffer : public CpuTraceBuffer {
+  MockCpuActivityBuffer(int64_t startTime, int64_t endTime) {
+    span = TraceSpan(startTime, endTime, "Test trace");
+    gpuOpCount = 0;
+  }
+
+  void addOp(std::string name, int64_t startTime, int64_t endTime, int64_t correlation) {
+    GenericTraceActivity op(span, ActivityType::CPU_OP, name);
+    op.startTime = startTime;
+    op.endTime = endTime;
+    op.resource = systemThreadId();
+    op.id = correlation;
+    activities.push_back(std::move(op));
+    span.opCount++;
+  }
+};
+
+// Provides ability to easily create a few test CUPTI ops
+struct MockCuptiActivityBuffer {
+  void addCorrelationActivity(int64_t correlation, CUpti_ExternalCorrelationKind externalKind, int64_t externalId) {
+    auto& act = *(CUpti_ActivityExternalCorrelation*) malloc(sizeof(CUpti_ActivityExternalCorrelation));
+    act.kind = CUPTI_ACTIVITY_KIND_EXTERNAL_CORRELATION;
+    act.externalId = externalId;
+    act.externalKind = externalKind;
+    act.correlationId = correlation;
+    activities.push_back(reinterpret_cast<CUpti_Activity*>(&act));
+  }
+
+  void addRuntimeActivity(
+      CUpti_runtime_api_trace_cbid_enum cbid,
+      int64_t start_us, int64_t end_us, int64_t correlation) {
+    auto& act = createActivity<CUpti_ActivityAPI>(
+        start_us, end_us, correlation);
+    act.kind = CUPTI_ACTIVITY_KIND_RUNTIME;
+    act.cbid = cbid;
+    act.threadId = threadId();
+    activities.push_back(reinterpret_cast<CUpti_Activity*>(&act));
+  }
+
+  void addKernelActivity(
+      int64_t start_us, int64_t end_us, int64_t correlation) {
+    auto& act = createActivity<CUpti_ActivityKernel4>(
+        start_us, end_us, correlation);
+    act.kind = CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL;
+    act.deviceId = 0;
+    act.streamId = 1;
+    act.name = "kernel";
+    act.gridX = act.gridY = act.gridZ = 1;
+    act.blockX = act.blockY = act.blockZ = 1;
+    activities.push_back(reinterpret_cast<CUpti_Activity*>(&act));
+  }
+
+  void addMemcpyActivity(
+      int64_t start_us, int64_t end_us, int64_t correlation) {
+    auto& act = createActivity<CUpti_ActivityMemcpy>(
+        start_us, end_us, correlation);
+    act.kind = CUPTI_ACTIVITY_KIND_MEMCPY;
+    act.deviceId = 0;
+    act.streamId = 2;
+    act.copyKind = CUPTI_ACTIVITY_MEMCPY_KIND_HTOD;
+    act.srcKind = CUPTI_ACTIVITY_MEMORY_KIND_PINNED;
+    act.dstKind = CUPTI_ACTIVITY_MEMORY_KIND_DEVICE;
+    activities.push_back(reinterpret_cast<CUpti_Activity*>(&act));
+  }
+
+  template <class T>
+  T& createActivity(
+      int64_t start_us, int64_t end_us, int64_t correlation) {
+    T& act = *static_cast<T*>(malloc(sizeof(T)));
+    bzero(&act, sizeof(act));
+    act.start = start_us * 1000;
+    act.end = end_us * 1000;
+    act.correlationId = correlation;
+    return act;
+  }
+
+  ~MockCuptiActivityBuffer() {
+    for (CUpti_Activity* act : activities) {
+      free(act);
+    }
+  }
+
+  std::vector<CUpti_Activity*> activities;
+};
+
+// Mock parts of the CuptiActivityApi
+class MockCuptiActivities : public
CuptiActivityApi { + public: + virtual int smCount() override { + return 10; + } + + virtual const std::pair processActivities( + CuptiActivityBufferMap&, /*unused*/ + std::function handler) override { + for (CUpti_Activity* act : activityBuffer->activities) { + handler(act); + } + return {activityBuffer->activities.size(), 100}; + } + + virtual std::unique_ptr + activityBuffers() override { + auto map = std::make_unique(); + auto buf = std::make_unique(100); + uint8_t* addr = buf->data(); + (*map)[addr] = std::move(buf); + return map; + } + + void bufferRequestedOverride(uint8_t** buffer, size_t* size, size_t* maxNumRecords) { + this->bufferRequested(buffer, size, maxNumRecords); + } + + std::unique_ptr activityBuffer; +}; + + +// Common setup / teardown and helper functions +class CuptiActivityProfilerTest : public ::testing::Test { + protected: + void SetUp() override { + profiler_ = std::make_unique( + cuptiActivities_, /*cpu only*/ false); + cfg_ = std::make_unique(); + cfg_->validate(std::chrono::system_clock::now()); + loggerFactory.addProtocol("file", [](const std::string& url) { + return std::unique_ptr(new ChromeTraceLogger(url)); + }); + } + + std::unique_ptr cfg_; + MockCuptiActivities cuptiActivities_; + std::unique_ptr profiler_; + ActivityLoggerFactory loggerFactory; +}; + +void checkTracefile(const char* filename) { +#ifdef __linux__ + // Check that the expected file was written and that it has some content + int fd = open(filename, O_RDONLY); + if (!fd) { + perror(filename); + } + EXPECT_TRUE(fd); + // Should expect at least 100 bytes + struct stat buf{}; + fstat(fd, &buf); + EXPECT_GT(buf.st_size, 100); + close(fd); +#endif +} + +TEST(CuptiActivityProfiler, AsyncTrace) { + std::vector log_modules( + {"CuptiActivityProfiler.cpp", "output_json.cpp"}); + SET_LOG_VERBOSITY_LEVEL(1, log_modules); + + MockCuptiActivities activities; + CuptiActivityProfiler profiler(activities, /*cpu only*/ true); + + char filename[] = "/tmp/libkineto_testXXXXXX.json"; + mkstemps(filename, 5); + + Config cfg; + + int iter = 0; + int warmup = 5; + auto now = system_clock::now(); + auto startTime = now + seconds(10); + + bool success = cfg.parse(fmt::format(R"CFG( + ACTIVITIES_WARMUP_PERIOD_SECS = {} + ACTIVITIES_DURATION_SECS = 1 + ACTIVITIES_LOG_FILE = {} + PROFILE_START_TIME = {} + )CFG", warmup, filename, duration_cast(startTime.time_since_epoch()).count())); + + EXPECT_TRUE(success); + EXPECT_FALSE(profiler.isActive()); + + auto logger = std::make_unique(cfg.activitiesLogFile()); + + // Usually configuration is done when now is startTime - warmup to kick off warmup + // but start right away in the test + profiler.configure(cfg, now); + profiler.setLogger(logger.get()); + + EXPECT_TRUE(profiler.isActive()); + + // fast forward in time and we have reached the startTime + now = startTime; + + // Run the profiler + // Warmup + // performRunLoopStep is usually called by the controller loop and takes + // the current time and the controller's next wakeup time. + profiler.performRunLoopStep( + /* Current time */ now, /* Next wakeup time */ now); + + auto next = now + milliseconds(1000); + + // performRunLoopStep can also be called by an application thread to update iteration count + // since this config does not use iteration this should have no effect on the state + while (++iter < 20) { + profiler.performRunLoopStep(now, now, iter); + } + + // Runloop should now be in collect state, so start workload + // Perform another runloop step, passing in the end profile time as current. 
+ // This should terminate collection + profiler.performRunLoopStep( + /* Current time */ next, /* Next wakeup time */ next); + // One step needed for each of the Process and Finalize phases + // Doesn't really matter what times we pass in here. + + EXPECT_TRUE(profiler.isActive()); + + auto nextnext = next + milliseconds(1000); + + while (++iter < 40) { + profiler.performRunLoopStep(next, next, iter); + } + + EXPECT_TRUE(profiler.isActive()); + + profiler.performRunLoopStep(nextnext,nextnext); + profiler.performRunLoopStep(nextnext,nextnext); + + // Assert that tracing has completed + EXPECT_FALSE(profiler.isActive()); + + checkTracefile(filename); +} + +TEST(CuptiActivityProfiler, AsyncTraceUsingIter) { + std::vector log_modules( + {"CuptiActivityProfiler.cpp", "output_json.cpp"}); + SET_LOG_VERBOSITY_LEVEL(1, log_modules); + + auto runIterTest = [&]( + int start_iter, int warmup_iters, int trace_iters) { + + LOG(INFO ) << "Async Trace Test: start_iteration = " << start_iter + << " warmup iterations = " << warmup_iters + << " trace iterations = " << trace_iters; + + MockCuptiActivities activities; + CuptiActivityProfiler profiler(activities, /*cpu only*/ true); + + char filename[] = "/tmp/libkineto_testXXXXXX.json"; + mkstemps(filename, 5); + + Config cfg; + + int iter = 0; + auto now = system_clock::now(); + + bool success = cfg.parse(fmt::format(R"CFG( + PROFILE_START_ITERATION = {} + ACTIVITIES_WARMUP_ITERATIONS={} + ACTIVITIES_ITERATIONS={} + ACTIVITIES_DURATION_SECS = 1 + ACTIVITIES_LOG_FILE = {} + )CFG", start_iter, warmup_iters, trace_iters, filename)); + + EXPECT_TRUE(success); + EXPECT_FALSE(profiler.isActive()); + + auto logger = std::make_unique(cfg.activitiesLogFile()); + + // Usually configuration is done when now is startIter - warmup iter to kick off warmup + // but start right away in the test + while (iter < (start_iter - warmup_iters)) { + profiler.performRunLoopStep(now, now, iter++); + } + + profiler.configure(cfg, now); + profiler.setLogger(logger.get()); + + EXPECT_TRUE(profiler.isActive()); + + // fast forward in time, mimicking what will happen in reality + now += seconds(10); + auto next = now + milliseconds(1000); + + // this call to runloop step should not be effecting the state + profiler.performRunLoopStep(now, next); + EXPECT_TRUE(profiler.isActive()); + + // start trace collection + while (iter < start_iter) { + profiler.performRunLoopStep(now, next, iter++); + } + + // Runloop should now be in collect state, so start workload + + while (iter < (start_iter + trace_iters)) { + profiler.performRunLoopStep(now, next, iter++); + } + + // One step is required for each of the Process and Finalize phases + // Doesn't really matter what times we pass in here. 
+ if (iter >= (start_iter + trace_iters)) { + profiler.performRunLoopStep(now, next, iter++); + } + EXPECT_TRUE(profiler.isActive()); + + auto nextnext = next + milliseconds(1000); + + profiler.performRunLoopStep(nextnext, nextnext); + profiler.performRunLoopStep(nextnext, nextnext); + + // Assert that tracing has completed + EXPECT_FALSE(profiler.isActive()); + + checkTracefile(filename); + }; + + // start iter = 50, warmup iters = 5, trace iters = 10 + runIterTest(50, 5, 10); + // should be able to start at 0 iteration + runIterTest(0, 0, 2); + runIterTest(0, 5, 5); +} + +TEST_F(CuptiActivityProfilerTest, SyncTrace) { + using ::testing::Return; + using ::testing::ByMove; + + // Verbose logging is useful for debugging + std::vector log_modules( + {"CuptiActivityProfiler.cpp"}); + SET_LOG_VERBOSITY_LEVEL(2, log_modules); + + // Start and stop profiling + CuptiActivityProfiler profiler(cuptiActivities_, /*cpu only*/ false); + int64_t start_time_us = 100; + int64_t duration_us = 300; + auto start_time = time_point(microseconds(start_time_us)); + profiler.configure(*cfg_, start_time); + profiler.startTrace(start_time); + profiler.stopTrace(start_time + microseconds(duration_us)); + + profiler.recordThreadInfo(); + + // Log some cpu ops + auto cpuOps = std::make_unique( + start_time_us, start_time_us + duration_us); + cpuOps->addOp("op1", 120, 150, 1); + cpuOps->addOp("op2", 130, 140, 2); + cpuOps->addOp("op3", 200, 250, 3); + profiler.transferCpuTrace(std::move(cpuOps)); + + // And some GPU ops + auto gpuOps = std::make_unique(); + gpuOps->addRuntimeActivity(CUDA_LAUNCH_KERNEL, 133, 138, 1); + gpuOps->addRuntimeActivity(CUDA_MEMCPY, 210, 220, 2); + gpuOps->addRuntimeActivity(CUDA_LAUNCH_KERNEL, 230, 245, 3); + gpuOps->addKernelActivity(150, 170, 1); + gpuOps->addMemcpyActivity(240, 250, 2); + gpuOps->addKernelActivity(260, 320, 3); + cuptiActivities_.activityBuffer = std::move(gpuOps); + + // Have the profiler process them + auto logger = std::make_unique(*cfg_); + profiler.processTrace(*logger); + + // Profiler can be reset at this point - logger owns the activities + profiler_->reset(); + + // Wrapper that allows iterating over the activities + ActivityTrace trace(std::move(logger), loggerFactory); + EXPECT_EQ(trace.activities()->size(), 9); + std::map activityCounts; + std::map resourceIds; + for (auto& activity : *trace.activities()) { + activityCounts[activity->name()]++; + resourceIds[activity->resourceId()]++; + } + for (const auto& p : activityCounts) { + LOG(INFO) << p.first << ": " << p.second; + } + EXPECT_EQ(activityCounts["op1"], 1); + EXPECT_EQ(activityCounts["op2"], 1); + EXPECT_EQ(activityCounts["op3"], 1); + EXPECT_EQ(activityCounts["cudaLaunchKernel"], 2); + EXPECT_EQ(activityCounts["cudaMemcpy"], 1); + EXPECT_EQ(activityCounts["kernel"], 2); + EXPECT_EQ(activityCounts["Memcpy HtoD (Pinned -> Device)"], 1); + + auto sysTid = systemThreadId(); + // Ops and runtime events are on thread sysTid + EXPECT_EQ(resourceIds[sysTid], 6); + // Kernels are on stream 1, memcpy on stream 2 + EXPECT_EQ(resourceIds[1], 2); + EXPECT_EQ(resourceIds[2], 1); + +#ifdef __linux__ + char filename[] = "/tmp/libkineto_testXXXXXX.json"; + mkstemps(filename, 5); + trace.save(filename); + // Check that the expected file was written and that it has some content + int fd = open(filename, O_RDONLY); + if (!fd) { + perror(filename); + } + EXPECT_TRUE(fd); + // Should expect at least 100 bytes + struct stat buf{}; + fstat(fd, &buf); + EXPECT_GT(buf.st_size, 100); +#endif +} + 
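The SyncTrace expectations above rest on correlation IDs: each CPU-side runtime call and the GPU activity it launched carry the same ID, which is how the profiler stitches the two timelines together. A self-contained sketch of that linking step; the struct and function names are illustrative, not libkineto types:

// Sketch only: pair each GPU activity with the CPU op that launched it,
// keyed by the shared correlation ID, as the mock buffers above set up.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct CpuOp { std::string name; int64_t correlation; };
struct GpuAct { std::string name; int64_t correlation; };

std::map<std::string, std::string> linkByCorrelation(
    const std::vector<CpuOp>& cpu, const std::vector<GpuAct>& gpu) {
  std::map<int64_t, std::string> launcher;
  for (const auto& op : cpu) launcher[op.correlation] = op.name;
  std::map<std::string, std::string> links;
  for (const auto& act : gpu) links[act.name] = launcher[act.correlation];
  return links;
}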
+TEST_F(CuptiActivityProfilerTest, GpuUserAnnotationTest) {
+  // Verbose logging is useful for debugging
+  std::vector<std::string> log_modules(
+      {"CuptiActivityProfiler.cpp"});
+  SET_LOG_VERBOSITY_LEVEL(2, log_modules);
+
+  // Start and stop profiling
+  CuptiActivityProfiler profiler(cuptiActivities_, /*cpu only*/ false);
+  int64_t start_time_us = 100;
+  int64_t duration_us = 300;
+  auto start_time = time_point<system_clock>(microseconds(start_time_us));
+  profiler.configure(*cfg_, start_time);
+  profiler.startTrace(start_time);
+  profiler.stopTrace(start_time + microseconds(duration_us));
+
+  int64_t kernelLaunchTime = 120;
+  profiler.recordThreadInfo();
+
+  // set up CPU event
+  auto cpuOps = std::make_unique<MockCpuActivityBuffer>(
+      start_time_us, start_time_us + duration_us);
+  cpuOps->addOp("annotation", kernelLaunchTime, kernelLaunchTime + 10, 1);
+  profiler.transferCpuTrace(std::move(cpuOps));
+
+  // set up a couple of GPU events and correlate with above CPU event.
+  // CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1 is used for user annotations.
+  auto gpuOps = std::make_unique<MockCuptiActivityBuffer>();
+  gpuOps->addCorrelationActivity(1, CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1, 1);
+  gpuOps->addKernelActivity(kernelLaunchTime + 5, kernelLaunchTime + 10, 1);
+  gpuOps->addCorrelationActivity(1, CUPTI_EXTERNAL_CORRELATION_KIND_CUSTOM1, 1);
+  gpuOps->addKernelActivity(kernelLaunchTime + 15, kernelLaunchTime + 25, 1);
+  cuptiActivities_.activityBuffer = std::move(gpuOps);
+
+  // process trace
+  auto logger = std::make_unique<MemoryTraceLogger>(*cfg_);
+  profiler.processTrace(*logger);
+
+  ActivityTrace trace(std::move(logger), loggerFactory);
+  std::map<std::string, int> counts;
+  for (auto& activity : *trace.activities()) {
+    counts[activity->name()]++;
+  }
+
+  // We should now have an additional annotation activity created
+  // on the GPU timeline.
+  EXPECT_EQ(counts["annotation"], 2);
+  EXPECT_EQ(counts["kernel"], 2);
+
+  auto& annotation = trace.activities()->at(0);
+  auto& kernel1 = trace.activities()->at(1);
+  auto& kernel2 = trace.activities()->at(2);
+  auto& gpu_annotation = trace.activities()->at(3);
+  EXPECT_EQ(gpu_annotation->type(), ActivityType::GPU_USER_ANNOTATION);
+  EXPECT_EQ(gpu_annotation->timestamp(), kernel1->timestamp());
+  EXPECT_EQ(
+      gpu_annotation->duration(),
+      kernel2->timestamp() + kernel2->duration() - kernel1->timestamp());
+  EXPECT_EQ(gpu_annotation->deviceId(), kernel1->deviceId());
+  EXPECT_EQ(gpu_annotation->resourceId(), kernel1->resourceId());
+  EXPECT_EQ(gpu_annotation->correlationId(), annotation->correlationId());
+  EXPECT_EQ(gpu_annotation->name(), annotation->name());
+}
+
+TEST_F(CuptiActivityProfilerTest, SubActivityProfilers) {
+  using ::testing::Return;
+  using ::testing::ByMove;
+
+  // Verbose logging is useful for debugging
+  std::vector<std::string> log_modules(
+      {"CuptiActivityProfiler.cpp"});
+  SET_LOG_VERBOSITY_LEVEL(2, log_modules);
+
+  // Setup example events to test
+  GenericTraceActivity ev{defaultTraceSpan(), ActivityType::GLOW_RUNTIME, ""};
+  ev.device = 1;
+  ev.resource = 0;
+
+  int64_t start_time_us = 100;
+  int64_t duration_us = 1000;
+  auto start_time = time_point<system_clock>(microseconds(start_time_us));
+
+  std::vector<GenericTraceActivity> test_activities{3, ev};
+  test_activities[0].startTime = start_time_us;
+  test_activities[0].endTime = start_time_us + 5000;
+  test_activities[0].activityName = "SubGraph A execution";
+  test_activities[1].startTime = start_time_us;
+  test_activities[1].endTime = start_time_us + 2000;
+  test_activities[1].activityName = "Operator foo";
+  test_activities[2].startTime = start_time_us + 2500;
+  test_activities[2].endTime = start_time_us + 2900;
+
test_activities[2].activityName = "Operator bar"; + + auto mock_activity_profiler = + std::make_unique(test_activities); + + MockCuptiActivities activities; + CuptiActivityProfiler profiler(activities, /*cpu only*/ true); + profiler.addChildActivityProfiler( + std::move(mock_activity_profiler)); + + profiler.configure(*cfg_, start_time); + profiler.startTrace(start_time); + EXPECT_TRUE(profiler.isActive()); + + profiler.stopTrace(start_time + microseconds(duration_us)); + EXPECT_TRUE(profiler.isActive()); + + char filename[] = "/tmp/libkineto_testXXXXXX.json"; + mkstemps(filename, 5); + LOG(INFO) << "Logging to tmp file " << filename; + + // process trace + auto logger = std::make_unique(*cfg_); + profiler.processTrace(*logger); + profiler.setLogger(logger.get()); + + ActivityTrace trace(std::move(logger), loggerFactory); + trace.save(filename); + const auto& traced_activites = trace.activities(); + + // Test we have all the events + EXPECT_EQ(traced_activites->size(), test_activities.size()); + + // Check that the expected file was written and that it has some content + int fd = open(filename, O_RDONLY); + if (!fd) { + perror(filename); + } + EXPECT_TRUE(fd); + + // Should expect at least 100 bytes + struct stat buf{}; + fstat(fd, &buf); + EXPECT_GT(buf.st_size, 100); +} + +TEST_F(CuptiActivityProfilerTest, BufferSizeLimitTestWarmup) { + CuptiActivityProfiler profiler(cuptiActivities_, /*cpu only*/ false); + + auto now = system_clock::now(); + auto startTime = now + seconds(10); + + int maxBufferSizeMB = 3; + + auto startTimeEpoch = std::to_string(duration_cast(startTime.time_since_epoch()).count()); + std::string maxBufferSizeMBStr = std::to_string(maxBufferSizeMB); + cfg_->handleOption("ACTIVITIES_MAX_GPU_BUFFER_SIZE_MB", maxBufferSizeMBStr); + cfg_->handleOption("PROFILE_START_TIME", startTimeEpoch); + + + EXPECT_FALSE(profiler.isActive()); + profiler.configure(*cfg_, now); + EXPECT_TRUE(profiler.isActive()); + + for (size_t i = 0; i < maxBufferSizeMB; i++) { + uint8_t* buf; + size_t gpuBufferSize; + size_t maxNumRecords; + cuptiActivities_.bufferRequestedOverride(&buf, &gpuBufferSize, &maxNumRecords); + } + + // fast forward to startTime and profiler is now running + now = startTime; + + profiler.performRunLoopStep(now, now); + + auto next = now + milliseconds(1000); + profiler.performRunLoopStep(next, next); + profiler.performRunLoopStep(next, next); + profiler.performRunLoopStep(next, next); + + EXPECT_FALSE(profiler.isActive()); +} diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiCallbackApiTest.cpp b/plugins/tensorboard-plugins/libkineto/test/CuptiCallbackApiTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..253b696da54d1919e9c0076c5691a11e35345686 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CuptiCallbackApiTest.cpp @@ -0,0 +1,239 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
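The next file tests callback registration and dispatch. A minimal sketch of the register/dispatch/delete contract the tests exercise (duplicate registration is accepted, deleting a never-registered callback fails); TinyCallbackRegistry is illustrative, not the real CuptiCallbackApi:

// Sketch only: a set of function pointers makes duplicate adds idempotent
// and makes delete-of-unknown-callback report failure, matching the
// expectations in SimpleTest below.
#include <set>

using Callback = void (*)(int domain, int cbid, const void* info);

class TinyCallbackRegistry {
 public:
  bool registerCallback(Callback cb) { callbacks_.insert(cb); return true; }
  bool deleteCallback(Callback cb) { return callbacks_.erase(cb) > 0; }
  void dispatch(int domain, int cbid, const void* info) const {
    for (auto cb : callbacks_) cb(domain, cbid, info);
  }
 private:
  std::set<Callback> callbacks_;
};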
+
+#include "src/Logger.h"
+#include "src/CuptiCallbackApi.h"
+
+#include <atomic>
+#include <chrono>
+#include <gtest/gtest.h>
+#include <thread>
+
+using namespace std::chrono;
+using namespace KINETO_NAMESPACE;
+using namespace libkineto;
+
+const size_t some_data = 42;
+
+std::atomic<int> simple_cb_calls = 0;
+
+void simple_cb(
+    CUpti_CallbackDomain domain,
+    CUpti_CallbackId cbid,
+    const CUpti_CallbackData* cbInfo) {
+
+  // simple arg check
+  EXPECT_EQ(domain, CUPTI_CB_DOMAIN_RUNTIME_API);
+  EXPECT_EQ(cbid, CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000);
+  EXPECT_EQ(*reinterpret_cast<const size_t*>(cbInfo), some_data);
+
+  simple_cb_calls++;
+}
+
+void atomic_cb(
+    CUpti_CallbackDomain /*domain*/,
+    CUpti_CallbackId /*cbid*/,
+    const CUpti_CallbackData* /*cbInfo*/) {
+  // do some atomics in a loop
+  for (int i = 0; i < 1000; i++) {
+    // would have used release consistency but this is fine
+    simple_cb_calls++;
+  }
+}
+
+void empty_cb(
+    CUpti_CallbackDomain /*domain*/,
+    CUpti_CallbackId /*cbid*/,
+    const CUpti_CallbackData* /*cbInfo*/) {
+}
+
+TEST(CuptiCallbackApiTest, SimpleTest) {
+  auto& api = CuptiCallbackApi::singleton();
+
+  auto addSimpleCallback = [&]() -> bool {
+    bool ret = api.registerCallback(
+        CUPTI_CB_DOMAIN_RUNTIME_API,
+        CuptiCallbackApi::CUDA_LAUNCH_KERNEL,
+        &simple_cb
+    );
+    return ret;
+  };
+  EXPECT_TRUE(addSimpleCallback()) << "Failed to add callback";
+
+  // duplicate add should be okay
+  EXPECT_TRUE(addSimpleCallback()) << "Failed to re-add callback";
+
+  simple_cb_calls = 0;
+
+  // simulate callback
+  api.__callback_switchboard(
+      CUPTI_CB_DOMAIN_RUNTIME_API,
+      CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000,
+      reinterpret_cast<const CUpti_CallbackData*>(&some_data));
+
+  EXPECT_EQ(simple_cb_calls, 1);
+
+  bool ret = api.deleteCallback(
+      CUPTI_CB_DOMAIN_RUNTIME_API,
+      CuptiCallbackApi::CUDA_LAUNCH_KERNEL,
+      &simple_cb
+  );
+
+  EXPECT_TRUE(ret) << "Failed to remove callback";
+
+  ret = api.deleteCallback(
+      CUPTI_CB_DOMAIN_RUNTIME_API,
+      CuptiCallbackApi::CUDA_LAUNCH_KERNEL,
+      &atomic_cb
+  );
+
+  EXPECT_FALSE(ret) << "oops!
deleted a callback that was never added"; +} + +TEST(CuptiCallbackApiTest, AllCallbacks) { + auto& api = CuptiCallbackApi::singleton(); + + auto testCallback = [&]( + CUpti_CallbackDomain domain, + CUpti_CallbackId cbid, + CuptiCallbackApi::CuptiCallBackID kineto_cbid) -> bool { + + bool ret = api.registerCallback(domain, kineto_cbid, atomic_cb); + EXPECT_TRUE(ret) << "Failed to add callback"; + + if (!ret) { + return false; + } + + simple_cb_calls = 0; + api.__callback_switchboard(domain, cbid, nullptr); + EXPECT_EQ(simple_cb_calls, 1000); + ret = simple_cb_calls == 1000; + + EXPECT_TRUE(api.deleteCallback(domain, kineto_cbid, atomic_cb)); + + return ret; + }; + + EXPECT_TRUE( + testCallback( + CUPTI_CB_DOMAIN_RESOURCE, + CUPTI_CBID_RESOURCE_CONTEXT_CREATED, + CuptiCallbackApi::RESOURCE_CONTEXT_CREATED)) + << "Failed to run callback for RESOURCE_CONTEXT_CREATED"; + + EXPECT_TRUE( + testCallback( + CUPTI_CB_DOMAIN_RESOURCE, + CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING, + CuptiCallbackApi::RESOURCE_CONTEXT_DESTROYED)) + << "Failed to run callback for RESOURCE_CONTEXT_DESTROYED"; + + EXPECT_TRUE( + testCallback( + CUPTI_CB_DOMAIN_RUNTIME_API, + CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000, + CuptiCallbackApi::CUDA_LAUNCH_KERNEL)) + << "Failed to run callback for CUDA_LAUNCH_KERNEL"; + +} + +TEST(CuptiCallbackApiTest, ContentionTest) { + auto& api = CuptiCallbackApi::singleton(); + const CUpti_CallbackDomain domain = CUPTI_CB_DOMAIN_RUNTIME_API; + const CUpti_CallbackId cbid = CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000; + const CuptiCallbackApi::CuptiCallBackID kineto_cbid = + CuptiCallbackApi::CUDA_LAUNCH_KERNEL; + + bool ret = api.registerCallback(domain, kineto_cbid, empty_cb); + EXPECT_TRUE(ret) << "Failed to add callback"; + + const int iters = 10000; + const int num_readers = 8; + + simple_cb_calls = 0; + + // simulate callbacks being executed on multiple threads in parallel + // during this interval add a new atomic_callback. + // this test ensured mutual exclusion is working fine + auto read_fn = [&](int tid){ + auto start_ts = high_resolution_clock::now(); + for (int i = 0; i < iters; i++) { + api.__callback_switchboard(domain, cbid, nullptr); + } + auto runtime_ms = duration_cast( + high_resolution_clock::now() - start_ts); + LOG(INFO) << "th " << tid << " done in " << runtime_ms.count() << " ms"; + }; + + + std::vector read_ths; + for (int i = 0; i< num_readers; i++) { + read_ths.emplace_back(read_fn, i); + } + + ret = api.registerCallback(domain, kineto_cbid, atomic_cb); + EXPECT_TRUE(ret) << "Failed to add callback"; + + for (auto& t : read_ths) { + t.join(); + } + + //EXPECT_GT(simple_cb_calls, 0) + // << "Atomic callback should have been called at least once."; + + api.deleteCallback(domain, kineto_cbid, empty_cb); + api.deleteCallback(domain, kineto_cbid, atomic_cb); +} + +TEST(CuptiCallbackApiTest, Bechmark) { + + constexpr int iters = 1000; + // atomic bench a number of times to get a baseline + + const CUpti_CallbackDomain domain = CUPTI_CB_DOMAIN_RUNTIME_API; + const CUpti_CallbackId cbid = CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000; + const CuptiCallbackApi::CuptiCallBackID kineto_cbid = + CuptiCallbackApi::CUDA_LAUNCH_KERNEL; + + LOG(INFO) << "Iteration count = " << iters; + + const bool use_empty = true; + auto cbfn = use_empty ? 
&empty_cb : &atomic_cb;
+
+  // warmup
+  for (int i = 0; i < 50; i++) {
+    (*cbfn)(domain, cbid, nullptr);
+  }
+
+  auto start_ts = high_resolution_clock::now();
+  for (int i = 0; i < iters; i++) {
+    (*cbfn)(domain, cbid, nullptr);
+  }
+  auto delta_baseline_ns = duration_cast<nanoseconds>(
+      high_resolution_clock::now() - start_ts);
+  LOG(INFO) << "Baseline runtime = " << delta_baseline_ns.count() << " ns";
+
+
+  auto& api = CuptiCallbackApi::singleton();
+  bool ret = api.registerCallback(domain, kineto_cbid, cbfn);
+  EXPECT_TRUE(ret) << "Failed to add callback";
+
+  // warmup
+  for (int i = 0; i < 50; i++) {
+    api.__callback_switchboard(domain, cbid, nullptr);
+  }
+
+  start_ts = high_resolution_clock::now();
+  for (int i = 0; i < iters; i++) {
+    api.__callback_switchboard(domain, cbid, nullptr);
+  }
+
+  auto delta_callback_ns = duration_cast<nanoseconds>(
+      high_resolution_clock::now() - start_ts);
+  LOG(INFO) << "Callback runtime = " << delta_callback_ns.count() << " ns";
+
+  LOG(INFO) << "Callback runtime per iteration = " <<
+      (delta_callback_ns.count() - delta_baseline_ns.count()) / (double) iters
+      << " ns";
+
+}
diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiProfilerApiTest.cu b/plugins/tensorboard-plugins/libkineto/test/CuptiProfilerApiTest.cu
new file mode 100644
index 0000000000000000000000000000000000000000..54ad51b0a1fc9a6a54585d1cad4674943c874b98
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/test/CuptiProfilerApiTest.cu
@@ -0,0 +1,353 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <cuda.h>
+
+// TODO(T90238193)
+// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude
+#include "src/Logger.h"
+#include "src/CuptiRangeProfilerApi.h"
+
+#define DRIVER_API_CALL(apiFuncCall)                         \
+  do {                                                       \
+    CUresult _status = apiFuncCall;                          \
+    if (_status != CUDA_SUCCESS) {                           \
+      LOG(ERROR) << "Failed invoking CUDA driver function "  \
+                 << #apiFuncCall << " status = "             \
+                 << _status;                                 \
+      exit(-1);                                              \
+    }                                                        \
+  } while (0)
+
+#define EXPECT(expr)\
+  if (!(expr)) {\
+  };
+
+using namespace KINETO_NAMESPACE;
+
+static int numRanges = 1;
+
+using Type = double;
+
+// Device code
+__global__ void VecAdd(const Type* A, const Type* B, Type* C, int N) {
+  int i = blockDim.x * blockIdx.x + threadIdx.x;
+  if (i < N) {
+    C[i] = A[i] + B[i];
+  }
+}
+
+// Device code
+__global__ void VecSub(const Type* A, const Type* B, Type* C, int N) {
+  int i = blockDim.x * blockIdx.x + threadIdx.x;
+  if (i < N) {
+    C[i] = A[i] - B[i];
+  }
+}
+
+static void initVec(Type* vec, int n) {
+  for (int i = 0; i < n; i++) {
+    vec[i] = i;
+  }
+}
+
+static void cleanUp(
+    Type* h_A,
+    Type* h_B,
+    Type* h_C,
+    Type* h_D,
+    Type* d_A,
+    Type* d_B,
+    Type* d_C,
+    Type* d_D) {
+  if (d_A)
+    cudaFree(d_A);
+  if (d_B)
+    cudaFree(d_B);
+  if (d_C)
+    cudaFree(d_C);
+  if (d_D)
+    cudaFree(d_D);
+
+  // Free host memory
+  if (h_A)
+    free(h_A);
+  if (h_B)
+    free(h_B);
+  if (h_C)
+    free(h_C);
+  if (h_D)
+    free(h_D);
+}
+
+/* Benchmark application used to test profiler measurements.
+ * This simply runs two kernels: vector Add and vector Subtract.
+ */
+
+void VectorAddSubtract() {
+  int N = 50000;
+  size_t size = N * sizeof(Type);
+  int threadsPerBlock = 0;
+  int blocksPerGrid = 0;
+  Type *h_A, *h_B, *h_C, *h_D;
+  Type *d_A, *d_B, *d_C, *d_D;
+  int i;
+  Type sum, diff;
+
+  // Allocate input vectors h_A and h_B in host memory
+  h_A = (Type*)malloc(size);
+  h_B = (Type*)malloc(size);
+  h_C = (Type*)malloc(size);
+  h_D = (Type*)malloc(size);
+
+  // Initialize input vectors
+  initVec(h_A, N);
+
initVec(h_B, N); + memset(h_C, 0, size); + memset(h_D, 0, size); + + // Allocate vectors in device memory + cudaMalloc((void**)&d_A, size); + cudaMalloc((void**)&d_B, size); + cudaMalloc((void**)&d_C, size); + cudaMalloc((void**)&d_D, size); + + // Copy vectors from host memory to device memory + cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice); + cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice); + + // Invoke kernel + threadsPerBlock = 256; + blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; + LOG(INFO) << fmt::format( + "Launching kernel: blocks {}, thread/block {}", + blocksPerGrid, + threadsPerBlock); + + VecAdd<<>>(d_A, d_B, d_C, N); + + VecSub<<>>(d_A, d_B, d_D, N); + + // Copy result from device memory to host memory + // h_C contains the result in host memory + cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost); + cudaMemcpy(h_D, d_D, size, cudaMemcpyDeviceToHost); + + // Verify result + for (i = 0; i < N; ++i) { + sum = h_A[i] + h_B[i]; + diff = h_A[i] - h_B[i]; + if (h_C[i] != sum || h_D[i] != diff) { + LOG(ERROR) << "Result verification failed"; + break; + } + } + + cleanUp(h_A, h_B, h_C, h_D, d_A, d_B, d_C, d_D); +} + +#if HAS_CUPTI_RANGE_PROFILER +bool runTestWithAutoRange( + int deviceNum, + const std::vector& metricNames, + CUcontext cuContext, + bool async) { + + // create a CUPTI range based profiling profiler + // this configures the counter data as well + CuptiRBProfilerSession profiler( + metricNames, deviceNum, 2, 1, async ? nullptr : cuContext); + + CUpti_ProfilerRange profilerRange = CUPTI_AutoRange; + CUpti_ProfilerReplayMode profilerReplayMode = CUPTI_KernelReplay; + + if (async) { + profiler.asyncStartAndEnable(profilerRange, profilerReplayMode); + } else { + profiler.start(profilerRange, profilerReplayMode); + profiler.enable(); + } + + VectorAddSubtract(); + + if (!async) { + profiler.disable(); + // stop profiler + profiler.stop(); + } else { + profiler.asyncDisableAndStop(); + } + + auto result = profiler.evaluateMetrics(true); + + // check results + EXPECT_EQ(result.metricNames.size(), 3); + EXPECT_EQ(result.rangeVals.size(), 2); + + for (const auto& measurement : result.rangeVals) { + EXPECT_EQ(measurement.values.size(), 3); + + if (measurement.values.size() == 3) { + // smsp__warps_launched.avg + EXPECT_NE(measurement.values[0], 0); + // smsp__sass_thread_inst_executed_op_dadd_pred_on.sum + // each kernel has 50000 dadd ops + EXPECT_EQ(measurement.values[1], 50000); + // sm__inst_executed_pipe_tensor.sum + //EXPECT_EQ(measurement.values[2], 0); + } + } + return true; +} + +bool runTestWithUserRange( + int deviceNum, + const std::vector& metricNames, + CUcontext cuContext, + bool async = false) { + + // create a CUPTI range based profiling profiler + // this configures the counter data as well + CuptiRBProfilerSession profiler( + metricNames, deviceNum, numRanges, 1, async ? 
nullptr : cuContext);
+
+  CUpti_ProfilerRange profilerRange = CUPTI_UserRange;
+  CUpti_ProfilerReplayMode profilerReplayMode = CUPTI_UserReplay;
+
+  if (async) {
+    profiler.asyncStartAndEnable(profilerRange, profilerReplayMode);
+    { VectorAddSubtract(); }
+    profiler.disableAndStop();
+  } else {
+    profiler.start(profilerRange, profilerReplayMode);
+
+    /* User takes the responsibility of replaying the kernel launches */
+    bool replay = true;
+    do {
+      profiler.beginPass();
+      {
+        profiler.enable();
+
+        std::string rangeName = "vecAddSub";
+        profiler.pushRange(rangeName);
+
+        { VectorAddSubtract(); }
+
+        profiler.popRange();
+        profiler.disable();
+      }
+      LOG(INFO) << "Replay starting.";
+      replay = profiler.endPass();
+
+    } while (!replay);
+
+    // stop profiler
+    profiler.stop();
+  }
+  VectorAddSubtract();
+  auto result = profiler.evaluateMetrics(true);
+
+  // check results
+  EXPECT_EQ(result.metricNames.size(), 3);
+  EXPECT_EQ(result.rangeVals.size(), 1);
+
+  if (result.rangeVals.size() > 0) {
+    const auto& measurement = result.rangeVals[0];
+    EXPECT_EQ(measurement.values.size(), 3);
+
+    if (measurement.values.size() == 3) {
+      // smsp__warps_launched.avg
+      EXPECT_NE(measurement.values[0], 0);
+      // smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
+      // in async mode multiple passes are not supported yet
+      if (!async) {
+        EXPECT_EQ(measurement.values[1], 100000);
+      }
+      // sm__inst_executed_pipe_tensor.sum
+      //EXPECT_EQ(measurement.values[2], 0);
+    }
+  }
+  return true;
+}
+#endif // HAS_CUPTI_RANGE_PROFILER
+
+int main(int argc, char* argv[]) {
+
+  CUdevice cuDevice;
+
+  int deviceCount, deviceNum;
+  int computeCapabilityMajor = 0, computeCapabilityMinor = 0;
+
+  printf("Usage: %s [device_num]\n", argv[0]);
+
+  DRIVER_API_CALL(cuInit(0));
+  DRIVER_API_CALL(cuDeviceGetCount(&deviceCount));
+
+  if (deviceCount == 0) {
+    LOG(ERROR) << "There is no device supporting CUDA.";
+    return -2;
+  }
+
+  if (argc > 1)
+    deviceNum = atoi(argv[1]);
+  else
+    deviceNum = 0;
+  LOG(INFO) << "CUDA Device Number: " << deviceNum;
+
+  DRIVER_API_CALL(cuDeviceGet(&cuDevice, deviceNum));
+  DRIVER_API_CALL(cuDeviceGetAttribute(
+      &computeCapabilityMajor,
+      CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
+      cuDevice));
+  DRIVER_API_CALL(cuDeviceGetAttribute(
+      &computeCapabilityMinor,
+      CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR,
+      cuDevice));
+
+  LOG(INFO) << "Compute Capability = "
+            << fmt::format("{},{}", computeCapabilityMajor, computeCapabilityMinor);
+
+  if (computeCapabilityMajor < 7) {
+    LOG(ERROR) << "CUPTI Profiler is not supported with compute capability < 7.0";
+    return -2;
+  }
+
+  CuptiRBProfilerSession::staticInit();
+
+  // metrics to profile
+  std::vector<std::string> metricNames = {
+      "smsp__warps_launched.avg",
+      "smsp__sass_thread_inst_executed_op_dadd_pred_on.sum",
+      "sm__inst_executed_pipe_tensor.sum",
+  };
+
+  CUcontext cuContext;
+  DRIVER_API_CALL(cuCtxCreate(&cuContext, 0, cuDevice));
+
+  VectorAddSubtract();
+
+#if HAS_CUPTI_RANGE_PROFILER
+  CuptiRBProfilerSession::staticInit();
+
+  if (!runTestWithUserRange(deviceNum, metricNames, cuContext, false)) {
+    LOG(ERROR) << "Failed to run profiler test benchmark in user range";
+  } else if (!runTestWithAutoRange(deviceNum, metricNames, cuContext, false)) {
+    LOG(ERROR) << "Failed to run profiler test benchmark in auto range";
+  } else if (!runTestWithUserRange(deviceNum, metricNames, cuContext, true)) {
+    LOG(ERROR) << "Failed to run profiler test benchmark in user range async";
+  } else if (!runTestWithAutoRange(deviceNum, metricNames, cuContext, true)) {
+    LOG(ERROR) << "Failed to run
profiler test benchmark in auto range async"; + } + + CuptiRBProfilerSession::deInitCupti(); +#else + LOG(WARNING) << "CuptiRBProfilerSession is not supported."; +#endif // HAS_CUPTI_RANGE_PROFILER + DRIVER_API_CALL(cuCtxDestroy(cuContext)); + + + return 0; +} diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerApiTest.cpp b/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerApiTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..28cad722c53ee5defaa7c24cbe0d6b2cbc840a30 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerApiTest.cpp @@ -0,0 +1,113 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include +#include +#include + +#include "include/libkineto.h" +#include "include/Config.h" +#include "src/CuptiRangeProfilerApi.h" + +#include "src/Logger.h" +#include "test/CuptiRangeProfilerTestUtil.h" + +using namespace KINETO_NAMESPACE; + +#if HAS_CUPTI_PROFILER + +TEST(CuptiRangeProfilerApiTest, contextTracking) { + std::vector log_modules( + {"CuptiRangeProfilerApi.cpp"}); + SET_LOG_VERBOSITY_LEVEL(1, log_modules); + + std::array data; + std::array contexts; + for (int i = 0; i < data.size(); i++) { + contexts[i] = reinterpret_cast(&data[i]); + } + + // simulate creating contexts, this calls the trackCudaContexts + // function that would otherwise be called via a callback + uint32_t dev = 0; + for (auto ctx : contexts) { + simulateCudaContextCreate(ctx, dev++); + } + + EXPECT_EQ( + CuptiRBProfilerSession::getActiveDevices(), + std::set({0, 1, 2})); + + simulateCudaContextDestroy(contexts[1], 1); + + EXPECT_EQ( + CuptiRBProfilerSession::getActiveDevices(), + std::set({0, 2})); + + simulateCudaContextDestroy(contexts[0], 0); + simulateCudaContextDestroy(contexts[2], 2); + + EXPECT_TRUE( + CuptiRBProfilerSession::getActiveDevices().empty()); +} + +TEST(CuptiRangeProfilerApiTest, asyncLaunchUserRange) { + std::vector log_modules( + {"CuptiRangeProfilerApi.cpp"}); + SET_LOG_VERBOSITY_LEVEL(1, log_modules); + + // this is bad but the pointer is never accessed + CUcontext ctx0 = reinterpret_cast(10); + simulateCudaContextCreate(ctx0, 0 /*device_id*/); + + auto session = std::make_unique(0, ctx0); + session->asyncStartAndEnable(CUPTI_UserRange, CUPTI_UserReplay); + + simulateKernelLaunch(ctx0, "hello"); + simulateKernelLaunch(ctx0, "foo"); + simulateKernelLaunch(ctx0, "bar"); + + session->asyncDisableAndStop(); + // stop happens after next kernel is run + simulateKernelLaunch(ctx0, "bar"); + simulateCudaContextDestroy(ctx0, 0 /*device_id*/); + + EXPECT_EQ(session->passes_ended, 1); + EXPECT_EQ(session->ranges_ended, 1); + EXPECT_TRUE(session->enabled); +} + +TEST(CuptiRangeProfilerApiTest, asyncLaunchAutoRange) { + std::vector log_modules( + {"CuptiRangeProfilerApi.cpp"}); + SET_LOG_VERBOSITY_LEVEL(1, log_modules); + + // this is bad but the pointer is never accessed + CUcontext ctx0 = reinterpret_cast(10); + CUcontext ctx1 = reinterpret_cast(11); + + simulateCudaContextCreate(ctx0, 0 /*device_id*/); + + auto session = std::make_unique(0, ctx0); + session->asyncStartAndEnable(CUPTI_AutoRange, CUPTI_KernelReplay); + + simulateKernelLaunch(ctx0, "hello"); + simulateKernelLaunch(ctx0, "foo"); + simulateKernelLaunch(ctx1, "kernel_on_different_device"); + simulateKernelLaunch(ctx0, "bar"); + + session->asyncDisableAndStop(); + // stop happens after next kernel is run + simulateKernelLaunch(ctx0, "bar"); + simulateCudaContextDestroy(ctx0, 0 /*device_id*/); + + 
EXPECT_EQ(session->passes_ended, 0); + EXPECT_EQ(session->ranges_ended, 0); + EXPECT_TRUE(session->enabled); + + EXPECT_EQ( + session->getKernelNames(), + std::vector({"hello", "foo", "bar"})) + << "Kernel names were not tracked"; +} + +#endif // HAS_CUPTI_PROFILER diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerConfigTest.cpp b/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerConfigTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..3f568968238a0e376ab3bae621af00a162af0d25 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerConfigTest.cpp @@ -0,0 +1,67 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "include/Config.h" +#include "src/CuptiRangeProfilerConfig.h" + +#include +#include +#include +#include + +using namespace std::chrono; +using namespace KINETO_NAMESPACE; + +class CuptiRangeProfilerConfigTest : public ::testing::Test { + protected: + void SetUp() override { + CuptiRangeProfilerConfig::registerFactory(); + } +}; + +TEST_F(CuptiRangeProfilerConfigTest, ConfigureProfiler) { + Config cfg; + std::vector metrics = { + "kineto__cuda_core_flops", + "sm__inst_executed.sum", + "l1tex__data_bank_conflicts_pipe_lsu.sum", + }; + auto metricsConfigStr = + fmt::format("CUPTI_PROFILER_METRICS = {}", fmt::join(metrics, ",")); + + EXPECT_TRUE(cfg.parse(metricsConfigStr)); + EXPECT_TRUE(cfg.parse("CUPTI_PROFILER_ENABLE_PER_KERNEL = true")); + EXPECT_TRUE(cfg.parse("CUPTI_PROFILER_MAX_RANGES = 42")); + + const CuptiRangeProfilerConfig& cupti_cfg = + CuptiRangeProfilerConfig::get(cfg); + + EXPECT_EQ(cupti_cfg.activitiesCuptiMetrics(), metrics); + EXPECT_EQ(cupti_cfg.cuptiProfilerPerKernel(), true); + EXPECT_EQ(cupti_cfg.cuptiProfilerMaxRanges(), 42); + +} + +TEST_F(CuptiRangeProfilerConfigTest, RangesDefaults) { + Config cfg, cfg_auto; + + // do not set max ranges in config, check defaults are sane + EXPECT_TRUE(cfg.parse("CUPTI_PROFILER_METRICS = kineto__cuda_core_flops")); + EXPECT_TRUE(cfg.parse("CUPTI_PROFILER_ENABLE_PER_KERNEL = false")); + + cfg.setSignalDefaults(); + + EXPECT_TRUE(cfg_auto.parse("CUPTI_PROFILER_METRICS = kineto__cuda_core_flops")); + EXPECT_TRUE(cfg_auto.parse("CUPTI_PROFILER_ENABLE_PER_KERNEL = true")); + + cfg_auto.setClientDefaults(); + + int user_ranges, auto_ranges; + + user_ranges = CuptiRangeProfilerConfig::get(cfg).cuptiProfilerMaxRanges(); + auto_ranges = CuptiRangeProfilerConfig::get(cfg_auto).cuptiProfilerMaxRanges(); + + EXPECT_GE(user_ranges, 1) << " in user range mode default to at least 1 ranges"; + EXPECT_GE(auto_ranges, 1000) << " in auto range mode default to at least 1000 ranges"; + + EXPECT_GT(auto_ranges, user_ranges); +} diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerTestUtil.h b/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerTestUtil.h new file mode 100644 index 0000000000000000000000000000000000000000..861b65fd701bf69373df657ab2a22d9dba0b27df --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CuptiRangeProfilerTestUtil.h @@ -0,0 +1,96 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. 
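A usage sketch for the knobs covered by the config tests above. It assumes CuptiRangeProfilerConfig::registerFactory() has already run, as in the test fixture's SetUp, and that the same headers are included; printRangeProfilerKnobs is an illustrative name:

// Sketch only: parse a few options, then read them back through the
// sub-config accessor that the tests above exercise.
#include <iostream>

void printRangeProfilerKnobs() {
  Config cfg;
  cfg.parse("CUPTI_PROFILER_METRICS = kineto__cuda_core_flops");
  cfg.parse("CUPTI_PROFILER_ENABLE_PER_KERNEL = true");
  cfg.parse("CUPTI_PROFILER_MAX_RANGES = 42");
  const auto& rp = CuptiRangeProfilerConfig::get(cfg);
  std::cout << "per-kernel: " << rp.cuptiProfilerPerKernel()
            << ", max ranges: " << rp.cuptiProfilerMaxRanges() << "\n";
}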
+ +#include +#include + +// TODO(T90238193) +// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude +#include "CuptiRangeProfilerApi.h" + +namespace KINETO_NAMESPACE { + +#if HAS_CUPTI_PROFILER + +class MockCuptiRBProfilerSession : public CuptiRBProfilerSession { + public: + MockCuptiRBProfilerSession(int deviceId, CUcontext ctx) + : CuptiRBProfilerSession(deviceId, ctx) {} + + void beginPass() override { + LOG(INFO) << " Mock CUPTI begin pass"; + passes_started++; + } + + bool endPass() override { + passes_ended++; + return true; + } + + void flushCounterData() override {} + + void pushRange(const std::string& rangeName) override { + LOG(INFO) << " Mock CUPTI pushrange ( " << rangeName << " )"; + ranges_started++; + } + + void popRange() override { + LOG(INFO) << " Mock CUPTI poprange"; + ranges_ended++; + } + + void stop() override { + runChecks(); + } + + void enable() override { + enabled = true; + } + void disable() override {} + + CuptiProfilerResult evaluateMetrics(bool /*verbose*/) override { + return result; + } + +protected: + void startInternal( + CUpti_ProfilerRange profilerRange, + CUpti_ProfilerReplayMode profilerReplayMode) override { + curRange_ = profilerRange; + curReplay_ = profilerReplayMode; + } + +private: + void runChecks() { + EXPECT_EQ(passes_started, passes_ended); + EXPECT_EQ(ranges_started, ranges_ended); + } + + public: + int passes_started = 0; + int passes_ended = 0; + int ranges_started = 0; + int ranges_ended = 0; + bool enabled = false; + + CuptiProfilerResult result; + +}; + +inline void simulateCudaContextCreate(CUcontext context, uint32_t dev) { + testing::trackCudaCtx( + context, dev, CUPTI_CBID_RESOURCE_CONTEXT_CREATED); +} + +inline void simulateCudaContextDestroy(CUcontext context, uint32_t dev) { + testing::trackCudaCtx( + context, dev, CUPTI_CBID_RESOURCE_CONTEXT_DESTROY_STARTING); +} + +inline void simulateKernelLaunch( + CUcontext context, const std::string& kernelName) { + testing::trackCudaKernelLaunch(context, kernelName.c_str()); +} + +#endif // HAS_CUPTI_PROFILER + +} // namespace KINETO_NAMESPACE diff --git a/plugins/tensorboard-plugins/libkineto/test/CuptiStringsTest.cpp b/plugins/tensorboard-plugins/libkineto/test/CuptiStringsTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..405f9404a49a5bf8b7433930b0ad2fe898ea2d89 --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/CuptiStringsTest.cpp @@ -0,0 +1,29 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include + +#include "src/cupti_strings.h" + +using namespace KINETO_NAMESPACE; + +TEST(CuptiStringsTest, Valid) { + ASSERT_STREQ( + runtimeCbidName(CUPTI_RUNTIME_TRACE_CBID_INVALID), "INVALID"); + ASSERT_STREQ( + runtimeCbidName(CUPTI_RUNTIME_TRACE_CBID_cudaDriverGetVersion_v3020), + "cudaDriverGetVersion"); + ASSERT_STREQ(runtimeCbidName + (CUPTI_RUNTIME_TRACE_CBID_cudaDeviceSynchronize_v3020), + "cudaDeviceSynchronize"); + ASSERT_STREQ( + runtimeCbidName(CUPTI_RUNTIME_TRACE_CBID_cudaStreamSetAttribute_ptsz_v11000), + "cudaStreamSetAttribute_ptsz"); +} + +TEST(CuptiStringsTest, Invalid) { + ASSERT_STREQ(runtimeCbidName(-1), "INVALID"); + // We can't actually use CUPTI_RUNTIME_TRACE_CBID_SIZE here until we + // auto-generate the string table, since it may have more entries than + // the enum in the version used to compile. 
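A short sketch of how the helpers defined in CuptiRangeProfilerTestUtil.h above combine in practice, mirroring the asyncLaunchUserRange test; the kernel name and the dummy context value are illustrative:

// Sketch only: the dummy context pointer is never dereferenced, matching
// the pattern used in CuptiRangeProfilerApiTest.cpp.
#include <memory>

void exampleMockSession() {
  CUcontext ctx = reinterpret_cast<CUcontext>(10);
  simulateCudaContextCreate(ctx, 0 /*device_id*/);
  auto session = std::make_unique<MockCuptiRBProfilerSession>(0, ctx);
  session->asyncStartAndEnable(CUPTI_UserRange, CUPTI_UserReplay);
  simulateKernelLaunch(ctx, "my_kernel");
  session->asyncDisableAndStop();
  simulateKernelLaunch(ctx, "my_kernel");  // stop lands after the next launch
  simulateCudaContextDestroy(ctx, 0 /*device_id*/);
}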
+ ASSERT_STREQ(runtimeCbidName(1000), "INVALID"); +} diff --git a/plugins/tensorboard-plugins/libkineto/test/EventProfilerTest.cpp b/plugins/tensorboard-plugins/libkineto/test/EventProfilerTest.cpp new file mode 100644 index 0000000000000000000000000000000000000000..cb36c826a7f32b2fe6732e73eae3b6a006b0cd3d --- /dev/null +++ b/plugins/tensorboard-plugins/libkineto/test/EventProfilerTest.cpp @@ -0,0 +1,578 @@ +// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary. + +#include "src/EventProfiler.h" + +#include +#include +#include + +using namespace std::chrono; +using namespace KINETO_NAMESPACE; + +TEST(PercentileTest, Create) { + PercentileList pct = {{10, SampleValue(0)}, + {49, SampleValue(0)}, + {50, SampleValue(0)}, + {90, SampleValue(0)}}; + + percentiles({0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100}, pct); + EXPECT_EQ(pct[0].second.getInt(), 10); + EXPECT_EQ(pct[1].second.getInt(), 50); + EXPECT_EQ(pct[2].second.getInt(), 50); + EXPECT_EQ(pct[3].second.getInt(), 90); + + percentiles({80, 10, 20, 70, 60, 40, 90, 30, 50, 0, 100}, pct); + EXPECT_EQ(pct[0].second.getInt(), 10); + EXPECT_EQ(pct[1].second.getInt(), 50); + EXPECT_EQ(pct[2].second.getInt(), 50); + EXPECT_EQ(pct[3].second.getInt(), 90); + + percentiles({80}, pct); + EXPECT_EQ(pct[0].second.getInt(), 80); + EXPECT_EQ(pct[1].second.getInt(), 80); + EXPECT_EQ(pct[2].second.getInt(), 80); + EXPECT_EQ(pct[3].second.getInt(), 80); + + percentiles({80, 50}, pct); + EXPECT_EQ(pct[0].second.getInt(), 50); + EXPECT_EQ(pct[1].second.getInt(), 50); + EXPECT_EQ(pct[2].second.getInt(), 80); + EXPECT_EQ(pct[3].second.getInt(), 80); +} + +TEST(PercentileTest, Normalize) { + PercentileList pct = { + {10, SampleValue(10)}, {50, SampleValue(100.0)}, {90, SampleValue(2000)}}; + + normalize(pct, 2.5); + + EXPECT_EQ(pct[0].second.getInt(), 25); + EXPECT_EQ((int)pct[1].second.getDouble(), 250); + EXPECT_EQ(pct[2].second.getInt(), 5000); +} + +TEST(EventTest, SumSamples) { + Event ev; + ev.instanceCount = 4; + auto t = system_clock::now(); + ev.addSample(t, {1, 2, 3, 4}); + ev.addSample(t, {10, 20, 30, 40}); + ev.addSample(t, {100, 200, 300, 400}); + + EXPECT_EQ(ev.sumInstance(0, {0, 0, 3}), 1); + EXPECT_EQ(ev.sumInstance(0, {0, 1, 3}), 10); + EXPECT_EQ(ev.sumInstance(0, {0, 2, 3}), 100); + + EXPECT_EQ(ev.sumInstance(0, {0, 0, 1}), 111); + + EXPECT_EQ(ev.sumInstance(3, {0, 0, 1}), 444); + + // Non-zero offset + EXPECT_EQ(ev.sumInstance(0, {1, 0, 2}), 10); + EXPECT_EQ(ev.sumInstance(0, {1, 1, 2}), 100); + EXPECT_EQ(ev.sumInstance(0, {1, 0, 1}), 110); + + ev.addSample(t, {1000, 2000, 3000, 4000}); + + EXPECT_EQ(ev.sumInstance(0, {1, 0, 3}), 10); + EXPECT_EQ(ev.sumInstance(0, {1, 1, 3}), 100); + EXPECT_EQ(ev.sumInstance(0, {2, 1, 2}), 1000); + EXPECT_EQ(ev.sumInstance(0, {2, 0, 1}), 1100); + + EXPECT_EQ(ev.sumAll({0, 0, 4}), 10); + EXPECT_EQ(ev.sumAll({1, 0, 3}), 100); + EXPECT_EQ(ev.sumAll({2, 1, 2}), 10000); + EXPECT_EQ(ev.sumAll({0, 1, 2}), 11000); + EXPECT_EQ(ev.sumAll({0, 0, 1}), 11110); +} + +TEST(EventTest, Percentiles) { + Event ev; + ev.instanceCount = 4; + auto t = system_clock::now(); + ev.addSample(t, {3, 2, 1, 4}); + ev.addSample(t, {30, 20, 10, 40}); + ev.addSample(t, {300, 200, 100, 400}); + + PercentileList pct = { + {10, SampleValue(0)}, {50, SampleValue(0)}, {90, SampleValue(0)}}; + + ev.percentiles(pct, {0, 0, 3}); + EXPECT_EQ(pct[0].second.getInt(), 1); + EXPECT_EQ(pct[1].second.getInt(), 3); + EXPECT_EQ(pct[2].second.getInt(), 4); + + ev.percentiles(pct, {0, 0, 1}); + EXPECT_EQ(pct[0].second.getInt(), 111); + 
EXPECT_EQ(pct[1].second.getInt(), 333); + EXPECT_EQ(pct[2].second.getInt(), 444); +} + +class MockCuptiMetrics : public CuptiMetricApi { + public: + MockCuptiMetrics() : CuptiMetricApi(0) {} + MOCK_METHOD1(idFromName, CUpti_MetricID(const std::string& name)); + MOCK_METHOD1( + events, + std::map(CUpti_MetricID metric_id)); + MOCK_METHOD1(valueKind, CUpti_MetricValueKind(CUpti_MetricID metric)); + MOCK_METHOD1( + evaluationMode, + CUpti_MetricEvaluationMode(CUpti_MetricID metric)); + MOCK_METHOD5( + calculate, + SampleValue( + CUpti_MetricID metric, + CUpti_MetricValueKind kind, + std::vector& events, + std::vector& values, + int64_t duration)); +}; + +TEST(MetricTest, Calculate) { + using ::testing::Return; + MockCuptiMetrics metrics; + + // The events used for the ipc metrics: instructions and cycles + // Pretend we have 2 SMs and 2 samples of each event + Event instr("instructions"); + instr.instanceCount = 2; + auto t = system_clock::now(); + instr.addSample(t, {100, 200}); + instr.addSample(t, {300, 400}); + + Event cycles("cycles"); + cycles.instanceCount = 2; + cycles.addSample(t, {1000, 1200}); + cycles.addSample(t, {1300, 1300}); + + // 2 & 3 are the event ids we specified in the metric + std::map events; + events[2] = std::move(instr); + events[3] = std::move(cycles); + + // Define an ipc metric + EXPECT_CALL(metrics, valueKind(1)) + .Times(1) + .WillOnce(Return(CUPTI_METRIC_VALUE_KIND_DOUBLE)); + Metric m( + "ipc", 1, {2, 3}, CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, metrics); + + // Calculate metric for first sample + // Since evaluation mode is CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE, + // Cupti API will be called three times: once for each SM (2) and once + // to get the total across SMs. + std::vector ids = {2, 3}; + std::vector vals = {100, 1000}; + EXPECT_CALL( + metrics, calculate(1, CUPTI_METRIC_VALUE_KIND_DOUBLE, ids, vals, 1000)) + .Times(1) + .WillOnce(Return(SampleValue(0.1))); + vals = {200, 1200}; + EXPECT_CALL( + metrics, calculate(1, CUPTI_METRIC_VALUE_KIND_DOUBLE, ids, vals, 1000)) + .Times(1) + .WillOnce(Return(SampleValue(0.17))); + vals = {300, 2200}; + EXPECT_CALL( + metrics, calculate(1, CUPTI_METRIC_VALUE_KIND_DOUBLE, ids, vals, 1000)) + .Times(1) + .WillOnce(Return(SampleValue(0.14))); + auto v = m.calculate(events, nanoseconds(1000), {0, 0, 2}); + + EXPECT_EQ(v.perInstance.size(), 2); + EXPECT_EQ(v.perInstance[0].getDouble(), 0.1); + EXPECT_EQ(v.perInstance[1].getDouble(), 0.17); + EXPECT_EQ(v.total.getDouble(), 0.14); + + // Calculate second sample. + // Change evaluation mode to CUPTI_METRIC_EVALUATION_MODE_AGGREGATE. + // Now we should get only one call to the Cupti API for the total. 
+ EXPECT_CALL(metrics, valueKind(1)) + .Times(1) + .WillOnce(Return(CUPTI_METRIC_VALUE_KIND_DOUBLE)); + Metric m2("ipc", 1, {2, 3}, CUPTI_METRIC_EVALUATION_MODE_AGGREGATE, metrics); + vals = {700, 2600}; + EXPECT_CALL( + metrics, calculate(1, CUPTI_METRIC_VALUE_KIND_DOUBLE, ids, vals, 1000)) + .Times(1) + .WillOnce(Return(SampleValue(0.27))); + v = m2.calculate(events, nanoseconds(1000), {0, 1, 2}); + + EXPECT_EQ(v.perInstance.size(), 1); + EXPECT_EQ(v.perInstance[0].getDouble(), 0.27); + EXPECT_EQ(v.total.getDouble(), 0.27); +} + +class MockCuptiEvents : public CuptiEventApi { + public: + MOCK_METHOD1( + createGroupSets, + CUpti_EventGroupSets*(std::vector& ids)); + MOCK_METHOD1(destroyGroupSets, void(CUpti_EventGroupSets* sets)); + MOCK_METHOD0(setContinuousMode, bool()); + MOCK_METHOD1(enablePerInstance, void(CUpti_EventGroup eventGroup)); + MOCK_METHOD1(instanceCount, uint32_t(CUpti_EventGroup eventGroup)); + MOCK_METHOD1(enableGroupSet, void(CUpti_EventGroupSet& set)); + MOCK_METHOD1(disableGroupSet, void(CUpti_EventGroupSet& set)); + MOCK_METHOD3( + readEvent, + void(CUpti_EventGroup g, CUpti_EventID id, std::vector& vals)); + MOCK_METHOD1(eventsInGroup, std::vector(CUpti_EventGroup g)); + MOCK_METHOD1(eventId, CUpti_EventID(const std::string& name)); +}; + +TEST(EventGroupSetTest, CollectSample) { + using ::testing::_; + using ::testing::Return; + using ::testing::SetArgPointee; + const CUpti_EventGroup g1{nullptr}; + const CUpti_EventGroup g2{reinterpret_cast(0x1000)}; + CUpti_EventGroup groups[] = {g1, g2}; + CUpti_EventGroupSet set; + set.eventGroups = groups; + set.numEventGroups = 2; + + std::map events; + Event instr("instructions"); + events[4] = std::move(instr); + Event cycles("cycles"); + events[5] = std::move(cycles); + Event branches("branches"); + events[10] = std::move(branches); + + MockCuptiEvents cupti_events; + EXPECT_CALL(cupti_events, enablePerInstance(g1)).Times(1); + EXPECT_CALL(cupti_events, enablePerInstance(g2)).Times(1); + EXPECT_CALL(cupti_events, instanceCount(g1)).Times(1).WillOnce(Return(80)); + EXPECT_CALL(cupti_events, instanceCount(g2)).Times(1).WillOnce(Return(40)); + std::vector events_in_group1 = {4, 5}; + EXPECT_CALL(cupti_events, eventsInGroup(g1)) + .Times(1) + .WillOnce(Return(events_in_group1)); + std::vector events_in_group2 = {10}; + EXPECT_CALL(cupti_events, eventsInGroup(g2)) + .Times(1) + .WillOnce(Return(events_in_group2)); + EventGroupSet group_set(set, events, cupti_events); + + EXPECT_EQ(group_set.groupCount(), 2); + EXPECT_EQ(events[4].instanceCount, 80); + EXPECT_EQ(events[5].instanceCount, 80); + EXPECT_EQ(events[10].instanceCount, 40); + + // This should not cause any Cupti API action as the group + // set is already disabled + group_set.setEnabled(false); + + // Activate group set - if activated twice, only the first + // should cause cupti API to be called + EXPECT_CALL(cupti_events, enableGroupSet(_)).Times(1); + group_set.setEnabled(false); + group_set.setEnabled(true); + + EXPECT_CALL(cupti_events, eventsInGroup(g1)) + .Times(1) + .WillOnce(Return(events_in_group1)); + EXPECT_CALL(cupti_events, eventsInGroup(g2)) + .Times(1) + .WillOnce(Return(events_in_group2)); + EXPECT_CALL(cupti_events, readEvent(g1, 4, _)).Times(1); + EXPECT_CALL(cupti_events, readEvent(g1, 5, _)).Times(1); + EXPECT_CALL(cupti_events, readEvent(g2, 10, _)).Times(1); + group_set.collectSample(); + + EXPECT_EQ(events[4].sampleCount(), 1); + EXPECT_EQ(events[5].sampleCount(), 1); + EXPECT_EQ(events[10].sampleCount(), 1); +} + +class MockLogger : public 
SampleListener { + public: + MOCK_METHOD3(handleSample, void(int device, const Sample& sample, bool from_new_version)); + MOCK_METHOD1(update, void(const Config& config)); +}; + +class EventProfilerTest : public ::testing::Test { + protected: + void SetUp() override { + auto cupti_events_ptr = std::make_unique(); + auto cupti_metrics_ptr = std::make_unique(); + cuptiEvents_ = cupti_events_ptr.get(); + cuptiMetrics_ = cupti_metrics_ptr.get(); + loggers_.push_back(std::make_unique()); + onDemandLoggers_.push_back(std::make_unique()); + profiler_ = std::make_unique( + std::move(cupti_events_ptr), + std::move(cupti_metrics_ptr), + loggers_, + onDemandLoggers_); + + for (int i = 0; i < kEventGroupCount; i++) { + eventGroups_[i] = &eventGroups_[i]; + } + for (int i = 0; i < kGroupSetCount; i++) { + // Default size to 1 but can be changed by test + groupSet_[i].numEventGroups = 1; + // Two groups per set + groupSet_[i].eventGroups = &eventGroups_[i * 2]; + } + groupSets_.numSets = 1; + groupSets_.sets = groupSet_; + } + + MockCuptiEvents* cuptiEvents_; + MockCuptiMetrics* cuptiMetrics_; + std::vector> loggers_; + std::vector> onDemandLoggers_; + constexpr static int kEventGroupCount = 4; + constexpr static int kGroupSetCount = 2; + CUpti_EventGroup eventGroups_[kEventGroupCount]; + CUpti_EventGroupSet groupSet_[kGroupSetCount]; + CUpti_EventGroupSets groupSets_; + std::unique_ptr profiler_; +}; + +TEST_F(EventProfilerTest, ConfigureFailure) { + using namespace testing; + + // Default config has no counters enabled. + // Check that profiler remains disabled. + Config cfg; + profiler_->configure(cfg, nullptr); + + EXPECT_FALSE(profiler_->enabled()); + + // There is no event named "cycles" + // In this case the profiler should print a warning and remain disabled + bool parsed = cfg.parse("EVENTS = cycles"); + EXPECT_TRUE(parsed); + + // EventProfiler should handle exception thrown from createGroupSets + // Configuration will be applied twice - once for combined base + on-demand + // and then again falling back to base + EXPECT_CALL(*cuptiEvents_, eventId("cycles")) + .Times(2) + .WillRepeatedly(Return(0)); + std::vector ids = {0}; + EXPECT_CALL(*cuptiEvents_, createGroupSets(ids)) + .Times(2) + .WillRepeatedly(Throw( + std::system_error(EINVAL, std::generic_category(), "Event ID"))); + profiler_->configure(cfg, nullptr); + + EXPECT_FALSE(profiler_->enabled()); +} + +TEST_F(EventProfilerTest, ConfigureBase) { + using namespace testing; + + // Test normal path, simple base config + Config cfg; + bool parsed = cfg.parse("EVENTS = elapsed_cycles_sm"); + EXPECT_TRUE(parsed); + + // One valid event - expect one call to eventId and createGroupSets + EXPECT_CALL(*cuptiEvents_, eventId("elapsed_cycles_sm")) + .Times(1) + .WillOnce(Return(5)); + std::vector ids = {5}; + EXPECT_CALL(*cuptiEvents_, createGroupSets(ids)) + .Times(1) + .WillOnce(Return(&groupSets_)); + EXPECT_CALL(*cuptiEvents_, enablePerInstance(eventGroups_[0])).Times(1); + EXPECT_CALL(*cuptiEvents_, instanceCount(eventGroups_[0])) + .Times(1) + .WillOnce(Return(80)); + EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[0])) + .Times(1) + .WillOnce(Return(ids)); + EXPECT_CALL(*cuptiEvents_, enableGroupSet(_)).Times(1); + + profiler_->configure(cfg, nullptr); + + EXPECT_TRUE(profiler_->enabled()); +} + +TEST_F(EventProfilerTest, ConfigureOnDemand) { + using namespace testing; + + // Test base + on-demand config, one event and one metric + Config cfg, on_demand_cfg; + bool parsed = cfg.parse(R"( + EVENTS = active_cycles + 
+    SAMPLE_PERIOD_MSECS=500
+    REPORT_PERIOD_SECS=10
+    SAMPLES_PER_REPORT=5
+  )");
+  EXPECT_TRUE(parsed);
+
+  parsed = on_demand_cfg.parse(R"(
+    METRICS = ipc
+    EVENTS_DURATION_SECS=60
+    SAMPLE_PERIOD_MSECS=200
+    MULTIPLEX_PERIOD_MSECS=2000
+    REPORT_PERIOD_SECS=3
+    SAMPLES_PER_REPORT=10
+  )");
+  EXPECT_TRUE(parsed);
+
+  // One event
+  EXPECT_CALL(*cuptiEvents_, eventId("active_cycles"))
+      .Times(1)
+      .WillOnce(Return(3));
+  // One metric
+  EXPECT_CALL(*cuptiMetrics_, idFromName("ipc")).Times(1).WillOnce(Return(10));
+  std::map<CUpti_EventID, std::string> ipc_events;
+  ipc_events[4] = "instructions";
+  ipc_events[5] = "elapsed_cycles_sm";
+  EXPECT_CALL(*cuptiMetrics_, events(10)).Times(1).WillOnce(Return(ipc_events));
+  EXPECT_CALL(*cuptiMetrics_, evaluationMode(10))
+      .Times(1)
+      .WillOnce(Return(CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE));
+  EXPECT_CALL(*cuptiMetrics_, valueKind(10))
+      .Times(1)
+      .WillOnce(Return(CUPTI_METRIC_VALUE_KIND_DOUBLE));
+  std::vector<CUpti_EventID> ids = {3, 4, 5};
+  groupSet_[0].numEventGroups = 2;
+  groupSets_.numSets = 2;
+  EXPECT_CALL(*cuptiEvents_, createGroupSets(ids))
+      .Times(1)
+      .WillOnce(Return(&groupSets_));
+  // Specified CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE per instance above
+  // So check that it's enabled
+  EXPECT_CALL(*cuptiEvents_, enablePerInstance(eventGroups_[0])).Times(1);
+  EXPECT_CALL(*cuptiEvents_, enablePerInstance(eventGroups_[1])).Times(1);
+  EXPECT_CALL(*cuptiEvents_, enablePerInstance(eventGroups_[2])).Times(1);
+  std::vector<CUpti_EventID> ids_g1{3}, ids_g2{4}, ids_g3{5};
+  EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[0]))
+      .Times(1)
+      .WillOnce(Return(ids_g1));
+  EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[1]))
+      .Times(1)
+      .WillOnce(Return(ids_g2));
+  EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[2]))
+      .Times(1)
+      .WillOnce(Return(ids_g3));
+  EXPECT_CALL(*cuptiEvents_, enableGroupSet(_)).Times(1);
+
+  profiler_->configure(cfg, &on_demand_cfg);
+
+  EXPECT_TRUE(profiler_->enabled());
+  EXPECT_EQ(profiler_->samplePeriod().count(), 250);
+  EXPECT_EQ(profiler_->multiplexPeriod().count(), 1000);
+  EXPECT_EQ(profiler_->reportPeriod().count(), 10000);
+  EXPECT_EQ(profiler_->onDemandReportPeriod().count(), 4000);
+}
+
+TEST_F(EventProfilerTest, ReportSample) {
+  using namespace testing;
+
+  // Test base + on-demand config, one event and one metric
+  Config cfg, on_demand_cfg;
+  bool parsed = cfg.parse("EVENTS = active_cycles");
+  EXPECT_TRUE(parsed);
+
+  parsed = on_demand_cfg.parse(R"(
+    METRICS = ipc
+    EVENTS_DURATION_SECS=60
+  )");
+  EXPECT_TRUE(parsed);
+
+  // One event
+  EXPECT_CALL(*cuptiEvents_, eventId("active_cycles"))
+      .Times(1)
+      .WillOnce(Return(3));
+  // One metric
+  EXPECT_CALL(*cuptiMetrics_, idFromName("ipc")).Times(1).WillOnce(Return(10));
+  std::map<CUpti_EventID, std::string> ipc_events;
+  ipc_events[4] = "instructions";
+  ipc_events[5] = "elapsed_cycles_sm";
+  EXPECT_CALL(*cuptiMetrics_, events(10)).Times(1).WillOnce(Return(ipc_events));
+  EXPECT_CALL(*cuptiMetrics_, evaluationMode(10))
+      .Times(1)
+      .WillOnce(Return(CUPTI_METRIC_EVALUATION_MODE_PER_INSTANCE));
+  EXPECT_CALL(*cuptiMetrics_, valueKind(10))
+      .Times(1)
+      .WillOnce(Return(CUPTI_METRIC_VALUE_KIND_DOUBLE));
+  std::vector<CUpti_EventID> ids = {3, 4, 5};
+  groupSet_[0].numEventGroups = 2;
+  groupSets_.numSets = 2;
+  EXPECT_CALL(*cuptiEvents_, createGroupSets(ids))
+      .Times(1)
+      .WillOnce(Return(&groupSets_));
+  EXPECT_CALL(*cuptiEvents_, instanceCount(_))
+      .Times(3)
+      .WillRepeatedly(Return(4));
+  std::vector<CUpti_EventID> ids_g1{3}, ids_g2{4}, ids_g3{5};
+  // These will be called by collectSample() as well, which is called twice
+  // per group set
+  EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[0]))
+      .Times(3)
+      .WillRepeatedly(Return(ids_g1));
+  EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[1]))
+      .Times(3)
+      .WillRepeatedly(Return(ids_g2));
+  EXPECT_CALL(*cuptiEvents_, eventsInGroup(eventGroups_[2]))
+      .Times(3)
+      .WillRepeatedly(Return(ids_g3));
+  EXPECT_CALL(*cuptiEvents_, enableGroupSet(_)).Times(1);
+
+  profiler_->configure(cfg, &on_demand_cfg);
+
+  EXPECT_TRUE(profiler_->enabled());
+
+  EXPECT_CALL(*cuptiEvents_, readEvent(_, _, _))
+      .Times(6)
+      .WillRepeatedly(Invoke(
+          [](CUpti_EventGroup g, CUpti_EventID id, std::vector<int64_t>& vals) {
+            vals = {1, 2, 3, 4};
+          }));
+
+  // Need to collect four times - twice for each group set
+  profiler_->collectSample();
+  profiler_->collectSample();
+  EXPECT_CALL(*cuptiEvents_, disableGroupSet(_)).Times(1);
+  EXPECT_CALL(*cuptiEvents_, enableGroupSet(_)).Times(1);
+  profiler_->enableNextCounterSet();
+  profiler_->collectSample();
+  profiler_->collectSample();
+
+  std::vector<CUpti_EventID> ipc_ids = {4, 5};
+  // Called once for each instance (4) and once for the total.
+  // x2 since we recompute per logger.
+  EXPECT_CALL(
+      *cuptiMetrics_,
+      calculate(10, CUPTI_METRIC_VALUE_KIND_DOUBLE, ipc_ids, _, 2000000000))
+      .Times(10)
+      .WillRepeatedly(Return(SampleValue(0.3)));
+  auto& logger = dynamic_cast<MockLogger&>(*loggers_[0]);
+  EXPECT_CALL(logger, handleSample(0, _, _))
+      .Times(1)
+      .WillOnce(Invoke([](int device, const Sample& sample, bool from_new_version) {
+        // Sample will include all stats - logger must pick the
+        // ones it wants.
+        EXPECT_EQ(sample.stats.size(), 4);
+        EXPECT_EQ(sample.stats[0].name, "active_cycles");
+        EXPECT_EQ(sample.stats[1].name, "instructions");
+        EXPECT_EQ(sample.stats[2].name, "elapsed_cycles_sm");
+        EXPECT_EQ(sample.stats[3].name, "ipc");
+        // 2 samples, each with 4 instances of values {1, 2, 3, 4},
+        // i.e. {2, 4, 6, 8} per instance and 20 in total
+        EXPECT_EQ(sample.stats[0].total.getInt(), 20);
+        EXPECT_EQ(sample.stats[0].percentileValues[0].second.getInt(), 2);
+        EXPECT_EQ(sample.stats[0].percentileValues.back().second.getInt(), 8);
+        // ipc is always 0.3 from mocked calculate function above
+        EXPECT_EQ(sample.stats[3].total.getDouble(), 0.3);
+        EXPECT_EQ(sample.stats[3].percentileValues[0].second.getDouble(), 0.3);
+        EXPECT_EQ(
+            sample.stats[3].percentileValues.back().second.getDouble(), 0.3);
+      }));
+  profiler_->reportSamples();
+
+  auto& on_demand_logger = dynamic_cast<MockLogger&>(*onDemandLoggers_[0]);
+  EXPECT_CALL(on_demand_logger, handleSample(0, _, _)).Times(1);
+  profiler_->reportOnDemandSamples();
+
+  EXPECT_CALL(*cuptiEvents_, disableGroupSet(_)).Times(1);
+}
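Note on the mocking recipe used throughout these tests: an expectation is scripted on a mock of the CUPTI-facing interface with `Times`/`WillOnce`, then the code under test is driven and gmock verifies the interaction. A minimal, self-contained sketch of that recipe, with hypothetical `Counter`/`MockCounter` types that are not part of this patch:

```cpp
#include <gmock/gmock.h>
#include <gtest/gtest.h>

using ::testing::Return;

// Interface under test - stands in for CuptiEventApi above.
class Counter {
 public:
  virtual ~Counter() = default;
  virtual int read() = 0;
};

class MockCounter : public Counter {
 public:
  MOCK_METHOD0(read, int());  // same old-style macro as the mocks above
};

// Consumer code whose interaction with the interface is being verified.
int sumTwoReads(Counter& c) {
  return c.read() + c.read();
}

TEST(MockingPatternTest, TimesAndWillOnce) {
  MockCounter counter;
  // Expect exactly two calls; script a different return value for each.
  EXPECT_CALL(counter, read())
      .Times(2)
      .WillOnce(Return(3))
      .WillOnce(Return(4));
  EXPECT_EQ(sumTwoReads(counter), 7);
}
```

An unmet expectation (e.g. only one `read()` call) fails the test at destruction of the mock, which is what makes the `Times(1)`/`Times(2)` annotations above meaningful assertions rather than comments.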
diff --git a/plugins/tensorboard-plugins/libkineto/test/LoggerObserverTest.cpp b/plugins/tensorboard-plugins/libkineto/test/LoggerObserverTest.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..30ba4a824af10401a45100b0b39cec54fcf98680
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/test/LoggerObserverTest.cpp
@@ -0,0 +1,96 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include <fmt/format.h>
+#include <gtest/gtest.h>
+
+// TODO(T90238193)
+// @lint-ignore-every CLANGTIDY facebook-hte-RelativeInclude
+#include "include/libkineto.h"
+#include "src/Logger.h"
+#include "LoggerCollector.h"
+
+using namespace KINETO_NAMESPACE;
+
+#if !USE_GOOGLE_LOG
+
+constexpr char InfoTestStr[] = "Checking LOG(INFO)";
+constexpr char WarningTestStr[] = "Checking LOG(WARNING)";
+constexpr char ErrorTestStr[] = "Checking LOG(ERROR)";
+
+TEST(LoggerObserverTest, SingleCollectorObserver) {
+  // Add a LoggerObserverCollector to collect all logs during the trace.
+  std::unique_ptr<LoggerCollector> lCollector = std::make_unique<LoggerCollector>();
+  Logger::addLoggerObserver(lCollector.get());
+
+  LOG(INFO) << InfoTestStr;
+  LOG(WARNING) << WarningTestStr;
+  LOG(ERROR) << ErrorTestStr;
+
+  auto LoggerMD = lCollector->extractCollectorMetadata();
+  EXPECT_TRUE(LoggerMD[LoggerOutputType::INFO][0].find(InfoTestStr) != std::string::npos);
+  EXPECT_TRUE(LoggerMD[LoggerOutputType::WARNING][0].find(WarningTestStr) != std::string::npos);
+  EXPECT_TRUE(LoggerMD[LoggerOutputType::ERROR][0].find(ErrorTestStr) != std::string::npos);
+
+  Logger::removeLoggerObserver(lCollector.get());
+}
+
+#define NUM_OF_MESSAGES_FOR_EACH_TYPE 10
+#define NUM_OF_WRITE_THREADS 200
+
+// Writes NUM_OF_MESSAGES_FOR_EACH_TYPE messages for each INFO, WARNING, and ERROR.
+// NOLINTNEXTLINE(clang-diagnostic-unused-parameter)
+void* writeSeveralMessages(void* ptr) {
+  for (int i = 0; i < NUM_OF_MESSAGES_FOR_EACH_TYPE; i++) {
+    LOG(INFO) << InfoTestStr;
+    LOG(WARNING) << WarningTestStr;
+    LOG(ERROR) << ErrorTestStr;
+  }
+  return nullptr;
+}
+
+TEST(LoggerObserverTest, FourCollectorObserver) {
+  // Add four collectors; each should observe every message written
+  // by every thread.
+  std::unique_ptr<LoggerCollector> lc1 = std::make_unique<LoggerCollector>();
+  std::unique_ptr<LoggerCollector> lc2 = std::make_unique<LoggerCollector>();
+  std::unique_ptr<LoggerCollector> lc3 = std::make_unique<LoggerCollector>();
+  std::unique_ptr<LoggerCollector> lc4 = std::make_unique<LoggerCollector>();
+  Logger::addLoggerObserver(lc1.get());
+  Logger::addLoggerObserver(lc2.get());
+  Logger::addLoggerObserver(lc3.get());
+  Logger::addLoggerObserver(lc4.get());
+
+  // Launch NUM_OF_WRITE_THREADS threads writing several messages.
+  pthread_t ListOfThreads[NUM_OF_WRITE_THREADS];
+  for (int i = 0; i < NUM_OF_WRITE_THREADS; i++) {
+    pthread_create(&ListOfThreads[i], nullptr, writeSeveralMessages, nullptr);
+  }
+
+  for (int i = 0; i < NUM_OF_WRITE_THREADS; i++) {
+    pthread_join(ListOfThreads[i], nullptr);
+  }
+
+  auto lc1MD = lc1->extractCollectorMetadata();
+  int InfoCount = 0, WarnCount = 0, ErrorCount = 0;
+  for (auto& md : lc1MD) {
+    InfoCount += md.first == LoggerOutputType::INFO ? md.second.size() : 0;
+    WarnCount += md.first == LoggerOutputType::WARNING ? md.second.size() : 0;
+    ErrorCount += md.first == LoggerOutputType::ERROR ? md.second.size() : 0;
+  }
+
+  EXPECT_EQ(InfoCount, NUM_OF_WRITE_THREADS * NUM_OF_MESSAGES_FOR_EACH_TYPE);
+  EXPECT_EQ(WarnCount, NUM_OF_WRITE_THREADS * NUM_OF_MESSAGES_FOR_EACH_TYPE);
+  EXPECT_EQ(ErrorCount, NUM_OF_WRITE_THREADS * NUM_OF_MESSAGES_FOR_EACH_TYPE);
+
+  Logger::removeLoggerObserver(lc1.get());
+  Logger::removeLoggerObserver(lc2.get());
+  Logger::removeLoggerObserver(lc3.get());
+  Logger::removeLoggerObserver(lc4.get());
+}
+
+#endif // !USE_GOOGLE_LOG
+
+int main(int argc, char **argv) {
+  ::testing::InitGoogleTest(&argc, argv);
+  return RUN_ALL_TESTS();
+}
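The test above stresses the Logger's observer fan-out from many threads at once. Reduced to its core, the pattern is a registry of observers that every log call walks under a lock. A minimal sketch with hypothetical `MiniLogger`/`Observer` types (the real implementation in src/Logger.h additionally tracks log levels and output types):

```cpp
#include <iostream>
#include <mutex>
#include <set>
#include <string>
#include <vector>

class Observer {
 public:
  virtual ~Observer() = default;
  virtual void write(const std::string& message) = 0;
};

class MiniLogger {
 public:
  static void addObserver(Observer* o) {
    std::lock_guard<std::mutex> guard(mutex_);
    observers_.insert(o);
  }
  static void removeObserver(Observer* o) {
    std::lock_guard<std::mutex> guard(mutex_);
    observers_.erase(o);
  }
  // Fan a message out to every registered observer, as LOG(...) does above.
  static void log(const std::string& message) {
    std::lock_guard<std::mutex> guard(mutex_);
    for (Observer* o : observers_) {
      o->write(message);
    }
  }

 private:
  static inline std::mutex mutex_;                // guards the registry
  static inline std::set<Observer*> observers_;   // registered collectors
};

class CollectingObserver : public Observer {
 public:
  void write(const std::string& message) override {
    messages_.push_back(message);
  }
  std::vector<std::string> messages_;
};

int main() {
  CollectingObserver collector;
  MiniLogger::addObserver(&collector);
  MiniLogger::log("Checking LOG(INFO)");
  MiniLogger::removeObserver(&collector);
  std::cout << collector.messages_.size() << " message(s) collected\n";
  return 0;
}
```

Holding one mutex across registration and dispatch is what makes the 200-writer test deterministic: each collector sees exactly `threads * messages` entries per severity.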
diff --git a/plugins/tensorboard-plugins/libkineto/test/MockActivitySubProfiler.cpp b/plugins/tensorboard-plugins/libkineto/test/MockActivitySubProfiler.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..89f1d536ca8d6d794b7ffc7402001d0e3d4d9c06
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/test/MockActivitySubProfiler.cpp
@@ -0,0 +1,49 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#include <memory>
+#include <set>
+#include <vector>
+
+#include "test/MockActivitySubProfiler.h"
+
+namespace libkineto {
+
+const std::set<ActivityType> supported_activities {ActivityType::CPU_OP};
+const std::string profile_name{"MockProfiler"};
+
+void MockProfilerSession::processTrace(ActivityLogger& logger) {
+  for (const auto& activity: activities()) {
+    activity.log(logger);
+  }
+}
+
+const std::string& MockActivityProfiler::name() const {
+  return profile_name;
+}
+
+const std::set<ActivityType>& MockActivityProfiler::availableActivities() const {
+  return supported_activities;
+}
+
+MockActivityProfiler::MockActivityProfiler(
+    std::vector<GenericTraceActivity>& activities) :
+  test_activities_(activities) {};
+
+std::unique_ptr<IActivityProfilerSession> MockActivityProfiler::configure(
+    const std::set<ActivityType>& /*activity_types*/,
+    const Config& /*config*/) {
+  auto session = std::make_unique<MockProfilerSession>();
+  session->set_test_activities(std::move(test_activities_));
+  return session;
+};
+
+std::unique_ptr<IActivityProfilerSession> MockActivityProfiler::configure(
+    int64_t /*ts_ms*/,
+    int64_t /*duration_ms*/,
+    const std::set<ActivityType>& activity_types,
+    const Config& config) {
+  return configure(activity_types, config);
+};
+
+} // namespace libkineto
+
diff --git a/plugins/tensorboard-plugins/libkineto/test/MockActivitySubProfiler.h b/plugins/tensorboard-plugins/libkineto/test/MockActivitySubProfiler.h
new file mode 100644
index 0000000000000000000000000000000000000000..36eaa13d1a544c624a2f4bb053891d055686ebf4
--- /dev/null
+++ b/plugins/tensorboard-plugins/libkineto/test/MockActivitySubProfiler.h
@@ -0,0 +1,72 @@
+// (c) Meta Platforms, Inc. and affiliates. Confidential and proprietary.
+
+#pragma once
+
+#include <memory>
+#include <set>
+#include <vector>
+
+#include "include/IActivityProfiler.h"
+
+namespace libkineto {
+
+class MockProfilerSession: public IActivityProfilerSession {
+
+  public:
+    explicit MockProfilerSession() {}
+
+    void start() override {
+      start_count++;
+      status_ = TraceStatus::RECORDING;
+    }
+
+    void stop() override {
+      stop_count++;
+      status_ = TraceStatus::PROCESSING;
+    }
+
+    std::vector<GenericTraceActivity>& activities() override {
+      return test_activities_;
+    }
+
+    std::vector<std::string> errors() override {
+      return {};
+    }
+
+    void processTrace(ActivityLogger& logger) override;
+
+    void set_test_activities(std::vector<GenericTraceActivity>&& acs) {
+      test_activities_ = std::move(acs);
+    }
+
+    int start_count = 0;
+    int stop_count = 0;
+  private:
+    std::vector<GenericTraceActivity> test_activities_;
+};
+
+
+class MockActivityProfiler: public IActivityProfiler {
+
+  public:
+    explicit MockActivityProfiler(std::vector<GenericTraceActivity>& activities);
+
+    const std::string& name() const override;
+
+    const std::set<ActivityType>& availableActivities() const override;
+
+    std::unique_ptr<IActivityProfilerSession> configure(
+      const std::set<ActivityType>& activity_types,
+      const Config& config) override;
+
+    std::unique_ptr<IActivityProfilerSession> configure(
+      int64_t ts_ms,
+      int64_t duration_ms,
+      const std::set<ActivityType>& activity_types,
+      const Config& config) override;
+
+  private:
+    std::vector<GenericTraceActivity> test_activities_;
+};
+
+} // namespace libkineto
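For orientation: this mock exists so that activity-profiler tests can drive a child session through its full lifecycle (configure, start, stop, process) without touching real hardware. A toy sketch of that lifecycle contract, using a hypothetical `MiniSession` type (the real interface is `IActivityProfilerSession` in include/IActivityProfiler.h):

```cpp
#include <cassert>

enum class TraceStatus { READY, RECORDING, PROCESSING };

// Stand-in for a profiler session: records phase transitions and counts calls,
// exactly what the mock's start_count/stop_count members are for.
class MiniSession {
 public:
  void start() { status_ = TraceStatus::RECORDING; ++start_count; }
  void stop() { status_ = TraceStatus::PROCESSING; ++stop_count; }
  TraceStatus status() const { return status_; }

  int start_count = 0;
  int stop_count = 0;

 private:
  TraceStatus status_ = TraceStatus::READY;
};

int main() {
  // A parent profiler drives each child session through the same phases:
  // configure -> start -> stop -> process.
  MiniSession session;
  session.start();
  assert(session.status() == TraceStatus::RECORDING);
  session.stop();
  assert(session.status() == TraceStatus::PROCESSING);
  assert(session.start_count == 1 && session.stop_count == 1);
  return 0;
}
```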
+ +#include "include/ThreadUtil.h" + +#include +#include + +#include +#include + +using namespace KINETO_NAMESPACE; + +TEST(ThreadNameTest, setAndGet) { + setThreadName("ThreadNameTest"); + EXPECT_EQ(getThreadName(), "ThreadNameTest"); + + setThreadName(""); + EXPECT_EQ(getThreadName(), ""); + + // Spaces etc are ok + setThreadName("Name w/ spaces"); + EXPECT_EQ(getThreadName(), "Name w/ spaces"); + + // More than 16 chars is not OK + setThreadName("More than 16 characters"); + EXPECT_EQ(getThreadName(), "Name w/ spaces"); +} diff --git a/plugins/tensorboard-plugins/tb_plugin/README.md b/plugins/tensorboard-plugins/tb_plugin/README.md index b4b417c4d21899c195674be08d196a5d9f11021a..152a91b9588bd002813e5fb04da2e8a4c909e9ae 100644 --- a/plugins/tensorboard-plugins/tb_plugin/README.md +++ b/plugins/tensorboard-plugins/tb_plugin/README.md @@ -17,7 +17,7 @@ 2. 从源代码安装 * 从仓库下载源码: - `git clone https://gitee.com/ascend/mstt.git` + `git clone https://gitee.com/ascend/att.git` * 进入目录 `/plugins/tensorboard_plugins/tb_plugin` 下. * 编译前端代码 @@ -128,37 +128,25 @@ ##### Kernel View - Kernel View 展示算子在加速核上运行的详细信息。此视图包含两张饼图和两张表,可通过 Group By 切换表格数据:算子的详情表以及统计表。 - - * 上方为饼图,展示耗时最多的数个算子耗时比例信息(左侧饼图)和算子执行在各类加速核上耗时百分比(右侧饼图) + Kernel View展示算子在加速核上运行的详细信息。 ![Alt text](./docs/images/kernel_view.PNG) - * 选择 Group By 为 All 时,展示算子详情表,部分字段说明如下: + * Calls: 算子调度的次数。 + + * Accelerator Core: 计算核。 - | 字段名 | 说明 | - | ---------------- | -------------------------------------- | - | Step Id | 标识在哪个 Step 采集的数据 | - | Name | 运行在 npu 上的算子名称 | - | Type | 算子类型 | - | Accelerator Core | AI 加速核类型,包括 AI Core、AI CPU 等 | - | Start Time(us) | 算子执行开始时间 | - | Duration(us) | 当前算子执行耗时 | - | Wait Time(us) | 算子执行等待时间 | - | Block Dim | 运行切分数量,对应任务执行时的核数 | + * Block Dim: Task运行切分数量,对应Task运行时核数。 ![Alt text](./docs/images/kernel_view_group_by_statistic.PNG) - * 选择 Group By 为 Statistic 时,展示算子信息统计表,此表格展示各算子的执行统计信息,字段说明如下: + * Accelerator Core Utilization: 算子执行在各类core上耗时百分比。 - | 字段名 | 说明 | - | ---------------- | -------| - | Name | 运行在 npu 上的算子名称 | - | Calls | 算子执行次数 | - | Total Duration(us) | 算子执行总时间 | - | Min Duration(us) | 算子执行的最小时间 | - | Max Duration(us) | 算子执行的最大时间 | - | Avg Duration(us) | 算子执行平均时间 | + * Name: 运行在npu上的算子名称。 + + * Total Duration、 Max Duration、Avg Duration、Min Duration: 算子调用总耗时、最大耗时、平均耗时以及最小耗时。 + + 此视图包含两张饼图和两张表,可通过Group By切换表格数据:算子的详细表以及统计表。 ##### Trace View @@ -174,7 +162,7 @@ ![Alt text](./docs/images/trace_view_launch.PNG) - 选择只展示async_npu,可以查看框架侧算子与昇腾硬件上执行的算子的下发执行关系。 + 选择只展示async_nup,可以查看框架侧算子与昇腾硬件上执行的算子的关联关系。 ![Alt text](./docs/images/trace_view_npu_utilization.PNG) @@ -280,7 +268,7 @@ ###### 文件导入 界面分为左侧边栏和右侧展示界面。点击左侧的Import Files或在左侧未勾选文件时点击右侧界面中心的Import Files字体,将会弹出系统文件资源管理窗,可以上传需要比对的模型网络训练日志文件。 - **注:当前最多支持上传6个文件,单个文件大小不能超过50MB。** + 注:当前最多支持上传6个文件,单个文件大小不能超过50MB。 ![Alt text](./docs/images/accuracy.PNG) ###### 已上传文件操作 @@ -331,8 +319,8 @@ * 比对方式有三种,通过Comparison Setting进行设定。 * Comparison Normal:相同iteration,后选择文件的loss值减去先选择文件的loss值。 - * Comparison Absolute:相同iteration,两个文件的loss的差值的绝对值。 - * Comparison Relative:相同iteration,两个文件的loss的差值的绝对值 / 先选择文件的loss值。 + * Comparison Normal:相同iteration,两个文件的loss的差值的绝对值。 + * Comparison Normal:相同iteration,两个文件的loss的差值的绝对值 / 先选择文件的loss值。 ### 公网URL说明 diff --git "a/plugins/tensorboard-plugins/tb_plugin/docs/\345\205\254\347\275\221URL\350\257\264\346\230\216.xlsx" "b/plugins/tensorboard-plugins/tb_plugin/docs/\345\205\254\347\275\221URL\350\257\264\346\230\216.xlsx" index de0bb25fe155aa188e5670a377311e96168586e8..b7a8bf1fd0e7eec640e46af76e16c6a228f335ba 100644 Binary files 
"a/plugins/tensorboard-plugins/tb_plugin/docs/\345\205\254\347\275\221URL\350\257\264\346\230\216.xlsx" and "b/plugins/tensorboard-plugins/tb_plugin/docs/\345\205\254\347\275\221URL\350\257\264\346\230\216.xlsx" differ diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/prettier.json b/plugins/tensorboard-plugins/tb_plugin/fe/prettier.json index ef5789da9458a66e7dacc1dfdeeb764642331734..6049640793f6907bbd38c7065360df0ac24d64d4 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/prettier.json +++ b/plugins/tensorboard-plugins/tb_plugin/fe/prettier.json @@ -1,12 +1,12 @@ { - "parser": "typescript", - "semi": true, - "singleQuote": true, - "jsxSingleQuote": false, - "bracketSpacing": true, - "tabWidth": 2, - "useTabs": false, - "trailingComma": "all", - "proseWrap": "always", - "endOfLine": "lf" + "parser": "typescript", + "semi": false, + "singleQuote": true, + "jsxSingleQuote": false, + "bracketSpacing": true, + "tabWidth": 2, + "useTabs": false, + "trailingComma": "none", + "proseWrap": "always", + "endOfLine": "lf" } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/scripts/add_header.py b/plugins/tensorboard-plugins/tb_plugin/fe/scripts/add_header.py index 69bc6c05541cbaff0fc88eb7456f501fb5bd4f71..03fb7c15aea6bf361b241910fa4529bc0996286c 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/scripts/add_header.py +++ b/plugins/tensorboard-plugins/tb_plugin/fe/scripts/add_header.py @@ -1,23 +1,4 @@ -# ------------------------------------------------------------------------- -# Copyright (c) Microsoft Corporation. -# Copyright(c) 2023 Huawei Technologies. -# All rights reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Modifications: Add visualization of PyTorch Ascend profiling. -# -------------------------------------------------------------------------- -# !/usr/bin/env python +#!/usr/bin/env python import glob import os import sys diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/api.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/api.ts index 29cde96ebbde928cde967b3b1b365d12e74ee734..b00601fba8852eeed9be052c6ed8adc106d49215 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/api.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/api.ts @@ -15,7 +15,7 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. *--------------------------------------------------------------------------------------------*/ @@ -33,11 +33,11 @@ * Do not edit the file manually. 
*/ -import * as url from 'url'; -import * as portableFetch from 'portable-fetch'; -import { Configuration } from './configuration'; +import * as url from 'url' +import * as portableFetch from 'portable-fetch' +import { Configuration } from './configuration' -const BASE_PATH = '.'.replace(/\/+$/, ''); +const BASE_PATH = '.'.replace(/\/+$/, '') /** * @@ -47,8 +47,8 @@ export const COLLECTION_FORMATS = { csv: ',', ssv: ' ', tsv: '\t', - pipes: '|', -}; + pipes: '|' +} /** * @@ -56,7 +56,7 @@ export const COLLECTION_FORMATS = { * @interface FetchAPI */ export interface FetchAPI { - (url: string, init?: any): Promise; + (url: string, init?: any): Promise } /** @@ -65,8 +65,8 @@ export interface FetchAPI { * @interface FetchArgs */ export interface FetchArgs { - url: string; - options: any; + url: string + options: any } /** @@ -75,7 +75,7 @@ export interface FetchArgs { * @class BaseAPI */ export class BaseAPI { - protected configuration: Configuration; + protected configuration: Configuration constructor( configuration?: Configuration, @@ -83,8 +83,8 @@ export class BaseAPI { protected fetch: FetchAPI = portableFetch ) { if (configuration) { - this.configuration = configuration; - this.basePath = configuration.basePath || this.basePath; + this.configuration = configuration + this.basePath = configuration.basePath || this.basePath } } } @@ -96,9 +96,9 @@ export class BaseAPI { * @extends {Error} */ export class RequiredError extends Error { - name: 'RequiredError'; + name: 'RequiredError' constructor(public field: string, msg?: string) { - super(msg); + super(msg) } } @@ -107,7 +107,7 @@ export class RequiredError extends Error { * @export * @interface CallStackTableData */ -export interface CallStackTableData extends Array {} +export interface CallStackTableData extends Array { } /** * * @export @@ -119,67 +119,67 @@ export interface CallStackTableDataInner { * @type {string} * @memberof CallStackTableDataInner */ - name: string; + name: string /** * * @type {string} * @memberof CallStackTableDataInner */ - input_shape?: string; + input_shape?: string /** * * @type {number} * @memberof CallStackTableDataInner */ - calls: number; + calls: number /** * * @type {number} * @memberof CallStackTableDataInner */ - device_self_duration?: number; + device_self_duration?: number /** * * @type {number} * @memberof CallStackTableDataInner */ - device_total_duration?: number; + device_total_duration?: number /** * * @type {number} * @memberof CallStackTableDataInner */ - host_self_duration: number; + host_self_duration: number /** * * @type {number} * @memberof CallStackTableDataInner */ - host_total_duration: number; + host_total_duration: number /** * * @type {string} * @memberof CallStackTableDataInner */ - call_stack?: string; + call_stack?: string /** * * @type {string} * @memberof CallStackTableDataInner */ - tc_eligible?: string; + tc_eligible?: string /** * * @type {number} * @memberof CallStackTableDataInner */ - tc_self_ratio?: number; + tc_self_ratio?: number /** * * @type {number} * @memberof CallStackTableDataInner */ - tc_total_ratio?: number; + tc_total_ratio?: number } /** * @@ -192,25 +192,25 @@ export interface DiffNode { * @type {OpStats} * @memberof DiffNode */ - left: OpStats; + left: OpStats /** * * @type {OpStats} * @memberof DiffNode */ - right: OpStats; + right: OpStats /** * * @type {string} * @memberof DiffNode */ - path: string; + path: string /** * * @type {Array} * @memberof DiffNode */ - children: Array; + children: Array } /** * @@ -223,13 +223,13 @@ export interface 
DistributedGraph { * @type {DistributedGraphMetadata} * @memberof DistributedGraph */ - metadata: DistributedGraphMetadata; + metadata: DistributedGraphMetadata /** * * @type {any} * @memberof DistributedGraph */ - data: any; + data: any } /** * @@ -242,19 +242,19 @@ export interface DistributedGraphMetadata { * @type {string} * @memberof DistributedGraphMetadata */ - title: string; + title: string /** * * @type {Array} * @memberof DistributedGraphMetadata */ - legends: Array; + legends: Array /** * * @type {string} * @memberof DistributedGraphMetadata */ - units: string; + units: string } /** * @@ -267,13 +267,13 @@ export interface Environment { * @type {string} * @memberof Environment */ - title: string; + title: string /** * * @type {string} * @memberof Environment */ - value: string; + value: string } /** * @@ -286,13 +286,13 @@ export interface GpuInfo { * @type {GpuInfoMetadata} * @memberof GpuInfo */ - metadata: GpuInfoMetadata; + metadata: GpuInfoMetadata /** * * @type {any} * @memberof GpuInfo */ - data: any; + data: any } /** * @@ -305,7 +305,7 @@ export interface GpuInfoMetadata { * @type {string} * @memberof GpuInfoMetadata */ - title: string; + title: string } /** * @@ -318,13 +318,13 @@ export interface GpuMetric { * @type {string} * @memberof GpuMetric */ - title: string; + title: string /** * * @type {string} * @memberof GpuMetric */ - value: string; + value: string } /** * @@ -337,13 +337,13 @@ export interface GpuMetrics { * @type {Array} * @memberof GpuMetrics */ - data: Array; + data: Array /** * * @type {string} * @memberof GpuMetrics */ - tooltip: string; + tooltip: string } /** * @@ -356,19 +356,19 @@ export interface Graph { * @type {string} * @memberof Graph */ - title?: string; + title?: string /** * * @type {Array} * @memberof Graph */ - columns: Array; + columns: Array /** * * @type {Array>} * @memberof Graph */ - rows: Array>; + rows: Array> } /** * @@ -381,13 +381,13 @@ export interface ValueAndTooltip { * @type {string | number} * @memberof ValueAndTooltip */ - value: string | number; + value: string | number /** * * @type {string} * @memberof ValueAndTooltip */ - tooltip?: string; + tooltip?: string } /** * @@ -400,19 +400,19 @@ export interface StepedGraph { * @type {string} * @memberof StepedGraph */ - title?: string; + title?: string /** * * @type {Array} * @memberof StepedGraph */ - columns: Array; + columns: Array /** * * @type {Array>} * @memberof StepedGraph */ - rows: Array>; + rows: Array> } /** * @@ -425,19 +425,19 @@ export interface GraphAscend { * @type {string} * @memberof GraphAscend */ - title?: string; + title?: string /** * * @type {Array} * @memberof GraphAscend */ - columns: Array; + columns: Array /** * * @type {any} * @memberof GraphAscend */ - rows: any; + rows: any } /** * @@ -450,25 +450,25 @@ export interface GraphColumn { * @type {string} * @memberof GraphColumn */ - type: string; + type: string /** * * @type {string} * @memberof GraphColumn */ - name: string; + name: string /** * * @type {string} * @memberof GraphColumn */ - role?: string; + role?: string /** * * @type {GraphColumnP} * @memberof GraphColumn */ - p?: GraphColumnP; + p?: GraphColumnP } /** * @@ -481,7 +481,7 @@ export interface GraphColumnP { * @type {boolean} * @memberof GraphColumnP */ - html?: boolean; + html?: boolean } /** * @@ -494,13 +494,13 @@ export interface InlineResponse200 { * @type {TableMetadata} * @memberof InlineResponse200 */ - metadata: TableMetadata; + metadata: TableMetadata /** * * @type {OperationTableData} * @memberof InlineResponse200 */ - 
data: OperationTableData; + data: OperationTableData } /** * @@ -513,13 +513,13 @@ export interface InlineResponse2001 { * @type {TableMetadata} * @memberof InlineResponse2001 */ - metadata: TableMetadata; + metadata: TableMetadata /** * * @type {CallStackTableData} * @memberof InlineResponse2001 */ - data: CallStackTableData; + data: CallStackTableData } /** * @@ -532,13 +532,13 @@ export interface InlineResponse2002 { * @type {GpuInfoMetadata} * @memberof InlineResponse2002 */ - metadata: GpuInfoMetadata; + metadata: GpuInfoMetadata /** * * @type {any} * @memberof InlineResponse2002 */ - data: any; + data: any } /** * @@ -551,8 +551,8 @@ export interface KernelGraph { * @type {Graph} * @memberof KernelGraph */ - total: Graph; - device_target: string; + total: Graph, + device_target: string } /** * @@ -565,50 +565,50 @@ export interface KeyedColumn { * @type {string} * @memberof KeyedColumn */ - type: string; + type: string /** * * @type {string} * @memberof KeyedColumn */ - name: string; + name: string /** * * @type {string} * @memberof KeyedColumn */ - key: string; + key: string } /** - * + * * @export * @interface MemoryCurveDataAll */ export interface MemoryCurveDataAll { /** - * + * * @type {string} * @memberof MemoryCurveDataAll */ - default_device: string; + default_device: string /** - * + * * @type {Array} * @memberof MemoryCurveDataAll */ - devices: Array; + devices: Array /** * * @type {MemoryCurveDataAscend} * @memberof MemoryCurveDataAll */ - total: MemoryCurveDataAscend; + total: MemoryCurveDataAscend /** * * @type {MemoryCurveDataAscend} * @memberof MemoryCurveDataAll */ - ptaGe: MemoryCurveDataAscend; + ptaGe: MemoryCurveDataAscend } /** * @@ -621,19 +621,19 @@ export interface MemoryCurveData { * @type {MemoryCurveDataMetadata} * @memberof MemoryCurveData */ - metadata: MemoryCurveDataMetadata; + metadata: MemoryCurveDataMetadata /** * * @type {Array} * @memberof MemoryCurveData */ - columns: Array; + columns: Array /** * * @type {any} * @memberof MemoryCurveData */ - rows: any; + rows: any } /** * @@ -646,19 +646,19 @@ export interface MemoryCurveDataAscend { * @type {MemoryCurveDataMetadata} * @memberof MemoryCurveDataAscend */ - metadata: MemoryCurveDataMetadata; + metadata: MemoryCurveDataMetadata /** * * @type {any} * @memberof MemoryCurveDataAscend */ - columns: any; + columns: any /** * * @type {any} * @memberof MemoryCurveDataAscend */ - rows: any; + rows: any } /** * @@ -671,55 +671,55 @@ export interface MemoryCurveDataMetadata { * @type {string} * @memberof MemoryCurveDataMetadata */ - default_device: string; + default_device: string /** * * @type {Array} * @memberof MemoryCurveDataMetadata */ - devices: Array; + devices: Array /** * * @type {any} * @memberof MemoryCurveDataMetadata */ - peaks: any; + peaks: any /** * * @type {any} * @memberof MemoryCurveDataMetadata */ - totals: any; + totals: any /** * * @type {number} * @memberof MemoryCurveDataMetadata */ - first_ts: number; + first_ts: number /** * * @type {string} * @memberof MemoryCurveDataMetadata */ - time_metric: string; + time_metric: string /** * * @type {string} * @memberof MemoryCurveDataMetadata */ - memory_metric: string; + memory_metric: string /** * * @type {number} * @memberof MemoryCurveDataMetadata */ - time_factor: number; + time_factor: number /** * * @type {number} * @memberof MemoryCurveDataMetadata */ - memory_factor: number; + memory_factor: number } /** * @@ -732,38 +732,38 @@ export interface MemoryEventsData { * @type {MemoryEventsTableMetadata} * @memberof MemoryEventsData */ - 
metadata: MemoryEventsTableMetadata; + metadata: MemoryEventsTableMetadata /** * * @type {Array} * @memberof MemoryEventsData */ - columns: Array; + columns: Array /** * * @type {any} * @memberof MemoryEventsData */ - rows: any; + rows: any } /** - * + * * @exports * @interface MemoryEventsDataAll */ export interface MemoryEventsDataAll { /** - * + * * @type {MemoryEventsData} * @memberof MemoryEventsDataAll */ - operator: MemoryEventsData; + operator: MemoryEventsData /** - * + * * @type {MemoryEventsData} * @memberof MemoryEventsDataAll */ - component: MemoryEventsData; + component: MemoryEventsData } /** * @@ -776,25 +776,25 @@ export interface MemoryEventsTableMetadata { * @type {string} * @memberof MemoryEventsTableMetadata */ - title: string; + title: string /** * * @type {string} * @memberof MemoryEventsTableMetadata */ - default_device: string; + default_device: string /** * * @type {string} * @memberof MemoryEventsTableMetadata */ - search?: string; + search?: string /** * * @type {string} * @memberof MemoryEventsTableMetadata */ - sort?: string; + sort?: string } /** * @@ -807,19 +807,19 @@ export interface MemoryStatsData { * @type {MemoryStatsTableMetadata} * @memberof MemoryStatsData */ - metadata: MemoryStatsTableMetadata; + metadata: MemoryStatsTableMetadata /** * * @type {Array} * @memberof MemoryStatsData */ - columns: Array; + columns: Array /** * * @type {any} * @memberof MemoryStatsData */ - rows: any; + rows: any } /** * @@ -832,25 +832,25 @@ export interface MemoryStatsTableMetadata { * @type {string} * @memberof MemoryStatsTableMetadata */ - title: string; + title: string /** * * @type {string} * @memberof MemoryStatsTableMetadata */ - default_device: string; + default_device: string /** * * @type {string} * @memberof MemoryStatsTableMetadata */ - search: string; + search: string /** * * @type {string} * @memberof MemoryStatsTableMetadata */ - sort: string; + sort: string } /** * @@ -863,61 +863,61 @@ export interface ModuleStats { * @type {string} * @memberof ModuleStats */ - name: string; + name: string /** * * @type {string} * @memberof ModuleStats */ - id: string; + id: string /** * * @type {number} * @memberof ModuleStats */ - occurences: number; + occurences: number /** * * @type {number} * @memberof ModuleStats */ - operators: number; + operators: number /** * * @type {number} * @memberof ModuleStats */ - host_duration: number; + host_duration: number /** * * @type {number} * @memberof ModuleStats */ - self_host_duration: number; + self_host_duration: number /** * * @type {number} * @memberof ModuleStats */ - device_duration: number; + device_duration: number /** * * @type {number} * @memberof ModuleStats */ - self_device_duration: number; + self_device_duration: number /** * * @type {number} * @memberof ModuleStats */ - avg_duration: number; + avg_duration: number /** * * @type {Array} * @memberof ModuleStats */ - children: Array; + children: Array } /** * @@ -930,13 +930,13 @@ export interface ModuleViewData { * @type {Array} * @memberof ModuleViewData */ - columns: Array; + columns: Array /** * * @type {Array} * @memberof ModuleViewData */ - data: Array; + data: Array } /** * @@ -949,37 +949,37 @@ export interface OpAgg { * @type {string} * @memberof OpAgg */ - name: string; + name: string /** * * @type {number} * @memberof OpAgg */ - calls: number; + calls: number /** * * @type {number} * @memberof OpAgg */ - host_duration: number; + host_duration: number /** * * @type {number} * @memberof OpAgg */ - device_duration: number; + device_duration: number /** * 
* @type {number} * @memberof OpAgg */ - self_host_duration: number; + self_host_duration: number /** * * @type {number} * @memberof OpAgg */ - self_device_duration: number; + self_device_duration: number } /** * @@ -992,38 +992,38 @@ export interface OpStats { * @type {string} * @memberof OpStats */ - name: string; + name: string /** * * @type {number} * @memberof OpStats */ - duration: number; + duration: number /** * * @type {number} * @memberof OpStats */ - device_duration: number; + device_duration: number /** * * @type {number} * @memberof OpStats */ - total_duration: number; + total_duration: number /** * * @type {Array} * @memberof OpStats */ - aggs: Array; + aggs: Array } /** * * @export * @interface OperationTableData */ -export interface OperationTableData extends Array {} +export interface OperationTableData extends Array { } /** * * @export @@ -1035,67 +1035,67 @@ export interface OperationTableDataInner { * @type {string} * @memberof OperationTableDataInner */ - name: string; + name: string /** * * @type {string} * @memberof OperationTableDataInner */ - input_shape?: string; + input_shape?: string /** * * @type {number} * @memberof OperationTableDataInner */ - calls: number; + calls: number /** * * @type {number} * @memberof OperationTableDataInner */ - device_self_duration?: number; + device_self_duration?: number /** * * @type {number} * @memberof OperationTableDataInner */ - device_total_duration?: number; + device_total_duration?: number /** * * @type {number} * @memberof OperationTableDataInner */ - host_self_duration: number; + host_self_duration: number /** * * @type {number} * @memberof OperationTableDataInner */ - host_total_duration: number; + host_total_duration: number /** * * @type {boolean} * @memberof OperationTableDataInner */ - has_call_stack: boolean; + has_call_stack: boolean /** * * @type {string} * @memberof OperationTableDataInner */ - tc_eligible?: string; + tc_eligible?: string /** * * @type {number} * @memberof OperationTableDataInner */ - tc_self_ratio?: number; + tc_self_ratio?: number /** * * @type {number} * @memberof OperationTableDataInner */ - tc_total_ratio?: number; + tc_total_ratio?: number } /** * @@ -1108,25 +1108,25 @@ export interface OperatorGraph { * @type {Graph} * @memberof OperatorGraph */ - device_total_time: Graph; + device_total_time: Graph /** * * @type {Graph} * @memberof OperatorGraph */ - device_self_time: Graph; + device_self_time: Graph /** * * @type {Graph} * @memberof OperatorGraph */ - host_total_time: Graph; + host_total_time: Graph /** * * @type {Graph} * @memberof OperatorGraph */ - host_self_time: Graph; + host_self_time: Graph } /** * @@ -1139,37 +1139,37 @@ export interface OperatorNode { * @type {string} * @memberof OperatorNode */ - name: string; + name: string /** * * @type {number} * @memberof OperatorNode */ - start_time: number; + start_time: number /** * * @type {number} * @memberof OperatorNode */ - end_time: number; + end_time: number /** * * @type {string} * @memberof OperatorNode */ - type: string; + type: string /** * * @type {number} * @memberof OperatorNode */ - tid: number; + tid: number /** * * @type {Array} * @memberof OperatorNode */ - children: Array; + children: Array } /** * @@ -1182,31 +1182,31 @@ export interface Overview { * @type {Array} * @memberof Overview */ - performance: Array; + performance: Array /** * * @type {Array} * @memberof Overview */ - environments: Array; + environments: Array /** * * @type {StepedGraph} * @memberof Overview */ - steps: StepedGraph; + steps: StepedGraph /** * 
* @type {string} * @memberof Overview */ - recommendations: string; + recommendations: string /** * * @type {GpuMetrics} * @memberof Overview */ - gpu_metrics?: GpuMetrics; + gpu_metrics?: GpuMetrics } /** * @@ -1219,31 +1219,31 @@ export interface Performance { * @type {string} * @memberof Performance */ - name: string; + name: string /** * * @type {string} * @memberof Performance */ - description?: string; + description?: string /** * * @type {string} * @memberof Performance */ - value?: string; + value?: string /** * * @type {string} * @memberof Performance */ - extra?: string; + extra?: string /** * * @type {Array} * @memberof Performance */ - children?: Array; + children?: Array } /** * @@ -1256,13 +1256,13 @@ export interface Runs { * @type {Array} * @memberof Runs */ - runs: Array; + runs: Array /** * * @type {boolean} * @memberof Runs */ - loading: boolean; + loading: boolean } /** * @@ -1275,13 +1275,13 @@ export interface TableData { * @type {Graph} * @memberof TableData */ - data: Graph; + data: Graph /** * * @type {TableMetadata} * @memberof TableData */ - metadata: TableMetadata; + metadata: TableMetadata } /** * @@ -1294,13 +1294,13 @@ export interface TableMetadata { * @type {string} * @memberof TableMetadata */ - sort: string; + sort: string /** * * @type {any} * @memberof TableMetadata */ - tooltips?: any; + tooltips?: any } /** * @@ -1313,7 +1313,7 @@ export interface TensorCoresGraph { * @type {Graph} * @memberof TensorCoresGraph */ - total: Graph; + total: Graph } /** * @@ -1326,32 +1326,32 @@ export interface ValueAndFormat { * @type {string | number | boolean} * @memberof ValueAndFormat */ - v: string | number | boolean; + v: string | number | boolean /** * * @type {string} * @memberof ValueAndFormat */ - f: string; + f: string } /** - * + * * @exports * @interface Views */ export interface Views { /** - * + * * @type {string} * @memberof Views */ - device_target: string; + device_target: string /** - * + * * @type {Array} * @memberof Views */ - views: Array; + views: Array } /** * DefaultApi - fetch parameter creator @@ -1388,75 +1388,75 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling diffnodeGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling diffnodeGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling diffnodeGet.' - ); + ) } // verify required parameter 'exp_run' is not null or undefined if (exp_run === null || exp_run === undefined) { throw new RequiredError( 'exp_run', 'Required parameter exp_run was null or undefined when calling diffnodeGet.' - ); + ) } // verify required parameter 'exp_worker' is not null or undefined if (exp_worker === null || exp_worker === undefined) { throw new RequiredError( 'exp_worker', 'Required parameter exp_worker was null or undefined when calling diffnodeGet.' - ); + ) } // verify required parameter 'exp_span' is not null or undefined if (exp_span === null || exp_span === undefined) { throw new RequiredError( 'exp_span', 'Required parameter exp_span was null or undefined when calling diffnodeGet.' 
- ); + ) } - const localVarPath = `/diffnode`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/diffnode` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (exp_run !== undefined) { - localVarQueryParameter.exp_run = exp_run; + localVarQueryParameter['exp_run'] = exp_run } if (exp_worker !== undefined) { - localVarQueryParameter.exp_worker = exp_worker; + localVarQueryParameter['exp_worker'] = exp_worker } if (exp_span !== undefined) { - localVarQueryParameter.exp_span = exp_span; + localVarQueryParameter['exp_span'] = exp_span } if (path !== undefined) { - localVarQueryParameter.path = path; + localVarQueryParameter['path'] = path } localVarUrlObj.query = Object.assign( @@ -1464,19 +1464,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1497,38 +1497,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling distributedCommopsGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling distributedCommopsGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling distributedCommopsGet.' 
- ); + ) } - const localVarPath = `/distributed/commops`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/distributed/commops` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -1536,19 +1536,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1569,38 +1569,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling distributedGpuinfoGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling distributedGpuinfoGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling distributedGpuinfoGet.' 
- ); + ) } - const localVarPath = `/distributed/gpuinfo`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/distributed/gpuinfo` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -1608,19 +1608,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1641,38 +1641,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling distributedOverlapGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling distributedOverlapGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling distributedOverlapGet.' 
- ); + ) } - const localVarPath = `/distributed/overlap`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/distributed/overlap` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -1680,19 +1680,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1713,38 +1713,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling distributedWaittimeGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling distributedWaittimeGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling distributedWaittimeGet.' 
- ); + ) } - const localVarPath = `/distributed/waittime`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/distributed/waittime` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -1752,19 +1752,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1787,49 +1787,49 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling kernelGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling kernelGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling kernelGet.' - ); + ) } // verify required parameter 'group_by' is not null or undefined if (group_by === null || group_by === undefined) { throw new RequiredError( 'group_by', 'Required parameter group_by was null or undefined when calling kernelGet.' 
- ); + ) } - const localVarPath = `/kernel`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/kernel` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (group_by !== undefined) { - localVarQueryParameter.group_by = group_by; + localVarQueryParameter['group_by'] = group_by } localVarUrlObj.query = Object.assign( @@ -1837,19 +1837,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1872,42 +1872,42 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling kernelTableGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling kernelTableGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling kernelTableGet.' 
- ); + ) } - const localVarPath = `/kernel/table`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/kernel/table` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (group_by !== undefined) { - localVarQueryParameter.group_by = group_by; + localVarQueryParameter['group_by'] = group_by } localVarUrlObj.query = Object.assign( @@ -1915,19 +1915,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -1948,38 +1948,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling kernelTcPieGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling kernelTcPieGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling kernelTcPieGet.' 
- ); + ) } - const localVarPath = `/kernel/tc_pie`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/kernel/tc_pie` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -1987,19 +1987,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2020,38 +2020,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling memoryCurveGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling memoryCurveGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling memoryCurveGet.' 
- ); + ) } - const localVarPath = `/memory_curve`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/memory_curve` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -2059,19 +2059,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2096,46 +2096,46 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling memoryEventsGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling memoryEventsGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling memoryEventsGet.' 
- ); + ) } - const localVarPath = `/memory_events`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/memory_events` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (start_ts !== undefined) { - localVarQueryParameter.start_ts = start_ts; + localVarQueryParameter['start_ts'] = start_ts } if (end_ts !== undefined) { - localVarQueryParameter.end_ts = end_ts; + localVarQueryParameter['end_ts'] = end_ts } localVarUrlObj.query = Object.assign( @@ -2143,19 +2143,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2180,46 +2180,46 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling memoryGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling memoryGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling memoryGet.' 
- ); + ) } - const localVarPath = `/memory`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/memory` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (start_ts !== undefined) { - localVarQueryParameter.start_ts = start_ts; + localVarQueryParameter['start_ts'] = start_ts } if (end_ts !== undefined) { - localVarQueryParameter.end_ts = end_ts; + localVarQueryParameter['end_ts'] = end_ts } localVarUrlObj.query = Object.assign( @@ -2227,19 +2227,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2260,38 +2260,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling moduleGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling moduleGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling moduleGet.' 
- ); + ) } - const localVarPath = `/module`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/module` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -2299,19 +2299,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2334,49 +2334,49 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling operationGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling operationGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling operationGet.' - ); + ) } // verify required parameter 'group_by' is not null or undefined if (group_by === null || group_by === undefined) { throw new RequiredError( 'group_by', 'Required parameter group_by was null or undefined when calling operationGet.' 
- ); + ) } - const localVarPath = `/operation`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/operation` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (group_by !== undefined) { - localVarQueryParameter.group_by = group_by; + localVarQueryParameter['group_by'] = group_by } localVarUrlObj.query = Object.assign( @@ -2384,19 +2384,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2423,64 +2423,64 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling operationStackGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling operationStackGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling operationStackGet.' - ); + ) } // verify required parameter 'group_by' is not null or undefined if (group_by === null || group_by === undefined) { throw new RequiredError( 'group_by', 'Required parameter group_by was null or undefined when calling operationStackGet.' - ); + ) } // verify required parameter 'op_name' is not null or undefined if (op_name === null || op_name === undefined) { throw new RequiredError( 'op_name', 'Required parameter op_name was null or undefined when calling operationStackGet.' 
- ); + ) } - const localVarPath = `/operation/stack`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/operation/stack` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (group_by !== undefined) { - localVarQueryParameter.group_by = group_by; + localVarQueryParameter['group_by'] = group_by } if (op_name !== undefined) { - localVarQueryParameter.op_name = op_name; + localVarQueryParameter['op_name'] = op_name } if (input_shape !== undefined) { - localVarQueryParameter.input_shape = input_shape; + localVarQueryParameter['input_shape'] = input_shape } localVarUrlObj.query = Object.assign( @@ -2488,19 +2488,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2523,49 +2523,49 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling operationTableGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling operationTableGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling operationTableGet.' - ); + ) } // verify required parameter 'group_by' is not null or undefined if (group_by === null || group_by === undefined) { throw new RequiredError( 'group_by', 'Required parameter group_by was null or undefined when calling operationTableGet.' 
- ); + ) } - const localVarPath = `/operation/table`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/operation/table` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } if (group_by !== undefined) { - localVarQueryParameter.group_by = group_by; + localVarQueryParameter['group_by'] = group_by } localVarUrlObj.query = Object.assign( @@ -2573,19 +2573,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2606,38 +2606,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling overviewGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling overviewGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling overviewGet.' 
- ); + ) } - const localVarPath = `/overview`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/overview` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -2645,19 +2645,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2665,30 +2665,30 @@ export const DefaultApiFetchParamCreator = function ( * @throws {RequiredError} */ runsGet(options: any = {}): FetchArgs { - const localVarPath = `/runs`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/runs` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any localVarUrlObj.query = Object.assign( {}, localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2703,27 +2703,27 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling spansGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling spansGet.' 
- ); + ) } - const localVarPath = `/spans`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/spans` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } localVarUrlObj.query = Object.assign( @@ -2731,19 +2731,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2764,38 +2764,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling traceGet.' - ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling traceGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling traceGet.' - ); + ) } - const localVarPath = `/trace`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/trace` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -2803,19 +2803,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2836,38 +2836,38 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling treeGet.' 
- ); + ) } // verify required parameter 'worker' is not null or undefined if (worker === null || worker === undefined) { throw new RequiredError( 'worker', 'Required parameter worker was null or undefined when calling treeGet.' - ); + ) } // verify required parameter 'span' is not null or undefined if (span === null || span === undefined) { throw new RequiredError( 'span', 'Required parameter span was null or undefined when calling treeGet.' - ); + ) } - const localVarPath = `/tree`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/tree` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (worker !== undefined) { - localVarQueryParameter.worker = worker; + localVarQueryParameter['worker'] = worker } if (span !== undefined) { - localVarQueryParameter.span = span; + localVarQueryParameter['span'] = span } localVarUrlObj.query = Object.assign( @@ -2875,19 +2875,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2901,16 +2901,16 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling viewsGet.' - ); + ) } - const localVarPath = `/views`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/views` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } localVarUrlObj.query = Object.assign( @@ -2918,19 +2918,19 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; + options: localVarRequestOptions + } }, /** * @@ -2945,27 +2945,27 @@ export const DefaultApiFetchParamCreator = function ( throw new RequiredError( 'run', 'Required parameter run was null or undefined when calling workersGet.' 
- ); + ) } // verify required parameter 'view' is not null or undefined if (view === null || view === undefined) { throw new RequiredError( 'view', 'Required parameter view was null or undefined when calling workersGet.' - ); + ) } - const localVarPath = `/workers`; - const localVarUrlObj = url.parse(localVarPath, true); - const localVarRequestOptions = Object.assign({ method: 'GET' }, options); - const localVarHeaderParameter = {} as any; - const localVarQueryParameter = {} as any; + const localVarPath = `/workers` + const localVarUrlObj = url.parse(localVarPath, true) + const localVarRequestOptions = Object.assign({ method: 'GET' }, options) + const localVarHeaderParameter = {} as any + const localVarQueryParameter = {} as any if (run !== undefined) { - localVarQueryParameter.run = run; + localVarQueryParameter['run'] = run } if (view !== undefined) { - localVarQueryParameter.view = view; + localVarQueryParameter['view'] = view } localVarUrlObj.query = Object.assign( @@ -2973,22 +2973,22 @@ export const DefaultApiFetchParamCreator = function ( localVarUrlObj.query, localVarQueryParameter, options.query - ); + ) // fix override query string Detail: https://stackoverflow.com/a/7517673/1077943 - delete localVarUrlObj.search; + delete localVarUrlObj.search localVarRequestOptions.headers = Object.assign( {}, localVarHeaderParameter, options.headers - ); + ) return { url: url.format(localVarUrlObj), - options: localVarRequestOptions, - }; - }, - }; -}; + options: localVarRequestOptions + } + } + } +} /** * DefaultApi - functional programming interface @@ -3029,7 +3029,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { exp_span, path, options - ); + ) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3039,12 +3039,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3062,7 +3062,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).distributedCommopsGet(run, worker, span, options); + ).distributedCommopsGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3072,12 +3072,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3095,7 +3095,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).distributedGpuinfoGet(run, worker, span, options); + ).distributedGpuinfoGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3105,12 +3105,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3128,7 
+3128,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).distributedOverlapGet(run, worker, span, options); + ).distributedOverlapGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3138,12 +3138,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3161,7 +3161,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).distributedWaittimeGet(run, worker, span, options); + ).distributedWaittimeGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3171,12 +3171,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3196,7 +3196,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).kernelGet(run, worker, span, group_by, options); + ).kernelGet(run, worker, span, group_by, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3206,12 +3206,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3231,7 +3231,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).kernelTableGet(run, worker, span, group_by, options); + ).kernelTableGet(run, worker, span, group_by, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3241,12 +3241,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3264,7 +3264,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).kernelTcPieGet(run, worker, span, options); + ).kernelTcPieGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3274,12 +3274,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } 
- }); - }; + }) + } }, /** * @@ -3294,13 +3294,10 @@ export const DefaultApiFp = function (configuration?: Configuration) { worker: string, span: string, options?: any - ): ( - fetch?: FetchAPI, - basePath?: string - ) => Promise { + ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).memoryCurveGet(run, worker, span, options); + ).memoryCurveGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3310,12 +3307,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3334,13 +3331,10 @@ export const DefaultApiFp = function (configuration?: Configuration) { start_ts?: number, end_ts?: number, options?: any - ): ( - fetch?: FetchAPI, - basePath?: string - ) => Promise { + ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).memoryEventsGet(run, worker, span, start_ts, end_ts, options); + ).memoryEventsGet(run, worker, span, start_ts, end_ts, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3350,12 +3344,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3377,7 +3371,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).memoryGet(run, worker, span, start_ts, end_ts, options); + ).memoryGet(run, worker, span, start_ts, end_ts, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3387,12 +3381,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3410,7 +3404,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).moduleGet(run, worker, span, options); + ).moduleGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3420,12 +3414,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3445,7 +3439,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).operationGet(run, worker, span, group_by, options); + ).operationGet(run, worker, span, group_by, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3455,12 
+3449,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3492,7 +3486,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { op_name, input_shape, options - ); + ) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3502,12 +3496,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3527,7 +3521,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).operationTableGet(run, worker, span, group_by, options); + ).operationTableGet(run, worker, span, group_by, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3537,12 +3531,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3560,7 +3554,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).overviewGet(run, worker, span, options); + ).overviewGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3570,12 +3564,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3585,8 +3579,9 @@ export const DefaultApiFp = function (configuration?: Configuration) { runsGet( options?: any ): (fetch?: FetchAPI, basePath?: string) => Promise { - const localVarFetchArgs = - DefaultApiFetchParamCreator(configuration).runsGet(options); + const localVarFetchArgs = DefaultApiFetchParamCreator( + configuration + ).runsGet(options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3596,12 +3591,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3617,7 +3612,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise> { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).spansGet(run, worker, options); + ).spansGet(run, worker, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3627,12 +3622,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 
200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3650,7 +3645,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).traceGet(run, worker, span, options); + ).traceGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3660,12 +3655,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3683,7 +3678,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).treeGet(run, worker, span, options); + ).treeGet(run, worker, span, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3693,12 +3688,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3712,7 +3707,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).viewsGet(run, options); + ).viewsGet(run, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3722,12 +3717,12 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; + }) + } }, /** * @@ -3743,7 +3738,7 @@ export const DefaultApiFp = function (configuration?: Configuration) { ): (fetch?: FetchAPI, basePath?: string) => Promise> { const localVarFetchArgs = DefaultApiFetchParamCreator( configuration - ).workersGet(run, view, options); + ).workersGet(run, view, options) return ( fetch: FetchAPI = portableFetch, basePath: string = BASE_PATH @@ -3753,15 +3748,15 @@ export const DefaultApiFp = function (configuration?: Configuration) { localVarFetchArgs.options ).then((response) => { if (response.status >= 200 && response.status < 300) { - return response.json(); + return response.json() } else { - throw response; + throw response } - }); - }; - }, - }; -}; + }) + } + } + } +} /** * DefaultApi - factory interface @@ -3804,7 +3799,7 @@ export const DefaultApiFactory = function ( exp_span, path, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3825,7 +3820,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3846,7 +3841,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3867,7 +3862,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3888,7 +3883,7 @@ export const 
DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3912,7 +3907,7 @@ export const DefaultApiFactory = function ( span, group_by, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3936,7 +3931,7 @@ export const DefaultApiFactory = function ( span, group_by, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3952,7 +3947,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3968,7 +3963,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -3995,7 +3990,7 @@ export const DefaultApiFactory = function ( start_ts, end_ts, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4022,7 +4017,7 @@ export const DefaultApiFactory = function ( start_ts, end_ts, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4038,7 +4033,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4062,7 +4057,7 @@ export const DefaultApiFactory = function ( span, group_by, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4092,7 +4087,7 @@ export const DefaultApiFactory = function ( op_name, input_shape, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4116,7 +4111,7 @@ export const DefaultApiFactory = function ( span, group_by, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4132,7 +4127,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4140,7 +4135,7 @@ export const DefaultApiFactory = function ( * @throws {RequiredError} */ runsGet(options?: any) { - return DefaultApiFp(configuration).runsGet(options)(fetch, basePath); + return DefaultApiFp(configuration).runsGet(options)(fetch, basePath) }, /** * @@ -4154,7 +4149,7 @@ export const DefaultApiFactory = function ( run, worker, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4170,7 +4165,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4186,7 +4181,7 @@ export const DefaultApiFactory = function ( worker, span, options - )(fetch, basePath); + )(fetch, basePath) }, /** * @@ -4195,10 +4190,7 @@ export const DefaultApiFactory = function ( * @throws {RequiredError} */ viewsGet(run: string, options?: any) { - return DefaultApiFp(configuration).viewsGet(run, options)( - fetch, - basePath - ); + return DefaultApiFp(configuration).viewsGet(run, options)(fetch, basePath) }, /** * @@ -4212,10 +4204,10 @@ export const DefaultApiFactory = function ( run, view, options - )(fetch, basePath); - }, - }; -}; + )(fetch, basePath) + } + } +} /** * DefaultApi - object-oriented interface @@ -4256,7 +4248,7 @@ export class DefaultApi extends BaseAPI { exp_span, path, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4279,7 +4271,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4302,7 +4294,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4325,7 +4317,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4348,7 +4340,7 @@ export class DefaultApi extends BaseAPI { 
worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4374,7 +4366,7 @@ export class DefaultApi extends BaseAPI { span, group_by, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4400,7 +4392,7 @@ export class DefaultApi extends BaseAPI { span, group_by, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4423,7 +4415,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4446,7 +4438,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4475,7 +4467,7 @@ export class DefaultApi extends BaseAPI { start_ts, end_ts, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4504,7 +4496,7 @@ export class DefaultApi extends BaseAPI { start_ts, end_ts, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4522,7 +4514,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4548,7 +4540,7 @@ export class DefaultApi extends BaseAPI { span, group_by, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4580,7 +4572,7 @@ export class DefaultApi extends BaseAPI { op_name, input_shape, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4606,7 +4598,7 @@ export class DefaultApi extends BaseAPI { span, group_by, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4624,7 +4616,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4637,7 +4629,7 @@ export class DefaultApi extends BaseAPI { return DefaultApiFp(this.configuration).runsGet(options)( this.fetch, this.basePath - ); + ) } /** @@ -4653,7 +4645,7 @@ export class DefaultApi extends BaseAPI { run, worker, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4671,7 +4663,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4689,7 +4681,7 @@ export class DefaultApi extends BaseAPI { worker, span, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } /** @@ -4703,7 +4695,7 @@ export class DefaultApi extends BaseAPI { return DefaultApiFp(this.configuration).viewsGet(run, options)( this.fetch, this.basePath - ); + ) } /** @@ -4719,6 +4711,6 @@ export class DefaultApi extends BaseAPI { run, view, options - )(this.fetch, this.basePath); + )(this.fetch, this.basePath) } } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/configuration.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/configuration.ts index 85b77bf651c049ec5a2ec85379414f619904c6dd..edec57eed84498fa3dcaa804ada9787b0202066c 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/configuration.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/configuration.ts @@ -14,12 +14,13 @@ * https://github.com/swagger-api/swagger-codegen.git * Do not edit the file manually. 
*/ + export interface ConfigurationParameters { - apiKey?: string | ((name: string) => string); - username?: string; - password?: string; - accessToken?: string | ((name: string, scopes?: string[]) => string); - basePath?: string; + apiKey?: string | ((name: string) => string) + username?: string + password?: string + accessToken?: string | ((name: string, scopes?: string[]) => string) + basePath?: string } export class Configuration { @@ -28,41 +29,41 @@ export class Configuration { * @param name security name * @memberof Configuration */ - apiKey?: string | ((name: string) => string); + apiKey?: string | ((name: string) => string) /** * parameter for basic security * * @type {string} * @memberof Configuration */ - username?: string; + username?: string /** * parameter for basic security * * @type {string} * @memberof Configuration */ - password?: string; + password?: string /** * parameter for oauth2 security * @param name security name * @param scopes oauth2 scope * @memberof Configuration */ - accessToken?: string | ((name: string, scopes?: string[]) => string); + accessToken?: string | ((name: string, scopes?: string[]) => string) /** * override base path * * @type {string} * @memberof Configuration */ - basePath?: string; + basePath?: string constructor(param: ConfigurationParameters = {}) { - this.apiKey = param.apiKey; - this.username = param.username; - this.password = param.password; - this.accessToken = param.accessToken; - this.basePath = param.basePath; + this.apiKey = param.apiKey + this.username = param.username + this.password = param.password + this.accessToken = param.accessToken + this.basePath = param.basePath } } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/custom.d.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/custom.d.ts index 992af468898f15bee4f609a8cb752e21f0a9ad48..bfe6a59d9df208845d2fb5a43edb7a2f3d8721ae 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/custom.d.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/custom.d.ts @@ -2,5 +2,5 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -declare module 'portable-fetch'; -declare module 'url'; +declare module 'portable-fetch' +declare module 'url' diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/index.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/index.ts index 7ad784e60de2777174cea9d902ad9cf2550fad68..1ab79fb65f34d7c33099bac7e54378c3f54fdb35 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/index.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/generated/index.ts @@ -14,5 +14,6 @@ * https://github.com/swagger-api/swagger-codegen.git * Do not edit the file manually. */ -export * from './api'; -export * from './configuration'; + +export * from './api' +export * from './configuration' diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/index.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/index.ts index 98b35abfbc09785ffa09b1bbaa48c73685ec84f5..f43336a583b81998422facba8787270d6cee7673 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/index.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/index.ts @@ -2,7 +2,7 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
*--------------------------------------------------------------------------------------------*/ -import * as api from './generated'; +import * as api from './generated' -export const defaultApi = new api.DefaultApi(undefined, undefined, fetch); -export * from './generated/api'; +export const defaultApi = new api.DefaultApi(undefined, undefined, fetch) +export * from './generated/api' diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/mock.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/mock.ts index 4b4b447d97192b7c7c00784dd9176faeed25d64b..744c222a0266eed6359bb60fc0f6ba9601ba8edc 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/api/mock.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/api/mock.ts @@ -6,8 +6,8 @@ export class MockAPI { runsGet() { return { runs: ['resnet50_num_workers_0', 'resnet50_num_workers_4'], - loading: false, - }; + loading: false + } } viewsGet(run: string) { @@ -16,16 +16,16 @@ export class MockAPI { 'Operator', 'Kernel', 'Trace', - 'Memory', - ]); + 'Memory' + ]) } - spansGet(run: string, view: string): Promise { - return Promise.resolve(['1', '2']); + spansGet(run: string, view: String) { + return Promise.resolve(['1', '2']) } - workersGet(run: string, view: string): Promise { - return Promise.resolve(['worker0']); + workersGet(run: string, view: String) { + return Promise.resolve(['worker0']) } overviewGet(run: string, worker: string, span: string) { @@ -46,7 +46,7 @@ export class MockAPI { { type: 'number', name: 'CPU Exec' }, { type: 'string', role: 'tooltip', p: { html: 'true' } }, { type: 'number', name: 'Other' }, - { type: 'string', role: 'tooltip', p: { html: 'true' } }, + { type: 'string', role: 'tooltip', p: { html: 'true' } } ], rows: [ [ @@ -64,7 +64,7 @@ export class MockAPI { 14091, '
Step 5
Total: 187948us
CPU Exec: 14091us
Percentage: 7.5%
', 1115, - '
Step 5
Total: 187948us
Other: 1115us
Percentage: 0.59%
', + '
Step 5
Total: 187948us
Other: 1115us
Percentage: 0.59%
' ], [ '6', @@ -81,7 +81,7 @@ export class MockAPI { 12968, '
Step 6
Total: 175153us
CPU Exec: 12968us
Percentage: 7.4%
', 1148, - '
Step 6
Total: 175153us
Other: 1148us
Percentage: 0.66%
', + '
Step 6
Total: 175153us
Other: 1148us
Percentage: 0.66%
' ], [ '7', @@ -98,7 +98,7 @@ export class MockAPI { 13768, '
Step 7
Total: 179733us
CPU Exec: 13768us
Percentage: 7.66%
', 1213, - '
Step 7
Total: 179733us
Other: 1213us
Percentage: 0.67%
', + '
Step 7
Total: 179733us
Other: 1213us
Percentage: 0.67%
' ], [ '8', @@ -115,7 +115,7 @@ export class MockAPI { 13420, '
Step 8<br>Total: 174564us<br>CPU Exec: 13420us<br>Percentage: 7.69%
', 1200, - '
Step 8<br>Total: 174564us<br>Other: 1200us<br>Percentage: 0.69%
', + '
Step 8<br>Total: 174564us<br>Other: 1200us<br>Percentage: 0.69%
' ], [ '9', @@ -132,7 +132,7 @@ export class MockAPI { 15025, '
Step 9<br>Total: 182172us<br>CPU Exec: 15025us<br>Percentage: 8.25%
', 1141, - '
Step 9<br>Total: 182172us<br>Other: 1141us<br>Percentage: 0.63%
', + '
Step 9<br>Total: 182172us<br>Other: 1141us<br>Percentage: 0.63%
' ], [ '10', @@ -149,9 +149,9 @@ export class MockAPI { 12773, '
Step 10<br>Total: 165983us<br>CPU Exec: 12773us<br>Percentage: 7.7%
', 1117, - '
Step 10<br>Total: 165983us<br>Other: 1117us<br>Percentage: 0.67%
', - ], - ], + '
Step 10<br>Total: 165983us<br>Other: 1117us<br>Percentage: 0.67%
' + ] + ] }, performance: [ { @@ -166,15 +166,15 @@ export class MockAPI { { name: 'Runtime', description: '', value: 2908, extra: 1.64 }, { name: 'DataLoader', description: '', value: 59262, extra: 33.37 }, { name: 'CPU Exec', description: '', value: 13674, extra: 7.7 }, - { name: 'Other', description: '', value: 1156, extra: 0.65 }, - ], - }, + { name: 'Other', description: '', value: 1156, extra: 0.65 } + ] + } ], recommendations: '
<ul><li>This run has high time cost on input data loading. 33.4% of the step time is in DataLoader. You could try to set num_workers on DataLoader\'s construction and enable multi-processes on data loading.</li><li>Kernels with 68% time are launched by Tensor Cores eligible operators. You could enable Automatic Mixed Precision to speedup by using FP16.</li></ul>
', environments: [ { title: 'Number of Worker(s)', value: '1' }, - { title: 'Device Type', value: 'GPU' }, + { title: 'Device Type', value: 'GPU' } ], gpu_metrics: { title: 'GPU Summary', @@ -186,12 +186,12 @@ export class MockAPI { { title: 'GPU Utilization', value: '55.51 %' }, { title: 'Est. SM Efficiency', value: '54.68 %' }, { title: 'Est. Achieved Occupancy', value: '49.13 %' }, - { title: 'Kernel Time using Tensor Cores', value: '0.0 %' }, + { title: 'Kernel Time using Tensor Cores', value: '0.0 %' } ], tooltip: - "The GPU usage metrics:\n\nGPU Utilization:\nGPU busy time / All steps time. The higher, the better. GPU busy time is the time during which there is at least one GPU kernel running on it. All steps time is the total time of all profiler steps(or called as iterations).\n\nEst. SM Efficiency:\nEstimated Stream Multiprocessor Efficiency. The higher, the better. This metric of a kernel, SM_Eff_K = min(blocks of this kernel / SM number of this GPU, 100%). This overall number is the sum of all kernels' SM_Eff_K weighted by kernel's execution duration, divided by all steps time.\n\nEst. Achieved Occupancy:\nFor most cases such as memory bandwidth bounded kernels, the higher the better. Occupancy is the ratio of active warps on an SM to the maximum number of active warps supported by the SM. The theoretical occupancy of a kernel is upper limit occupancy of this kernel, limited by multiple factors such as kernel shape, kernel used resource, and the GPU compute capability.\nEst. Achieved Occupancy of a kernel, OCC_K = min(threads of the kernel / SM number / max threads per SM, theoretical occupancy of the kernel). This overall number is the weighted average of all kernels' OCC_K using kernel's execution duration as weight. It shows fine-grained low-level GPU utilization.\n\nKernel using Tensor Cores:\nTotal GPU Time for Tensor Core kernels / Total GPU Time for all kernels.\n", - }, - }); + "The GPU usage metrics:\n\nGPU Utilization:\nGPU busy time / All steps time. The higher, the better. GPU busy time is the time during which there is at least one GPU kernel running on it. All steps time is the total time of all profiler steps(or called as iterations).\n\nEst. SM Efficiency:\nEstimated Stream Multiprocessor Efficiency. The higher, the better. This metric of a kernel, SM_Eff_K = min(blocks of this kernel / SM number of this GPU, 100%). This overall number is the sum of all kernels' SM_Eff_K weighted by kernel's execution duration, divided by all steps time.\n\nEst. Achieved Occupancy:\nFor most cases such as memory bandwidth bounded kernels, the higher the better. Occupancy is the ratio of active warps on an SM to the maximum number of active warps supported by the SM. The theoretical occupancy of a kernel is upper limit occupancy of this kernel, limited by multiple factors such as kernel shape, kernel used resource, and the GPU compute capability.\nEst. Achieved Occupancy of a kernel, OCC_K = min(threads of the kernel / SM number / max threads per SM, theoretical occupancy of the kernel). This overall number is the weighted average of all kernels' OCC_K using kernel's execution duration as weight. 
It shows fine-grained low-level GPU utilization.\n\nKernel using Tensor Cores:\nTotal GPU Time for Tensor Core kernels / Total GPU Time for all kernels.\n" + } + }) } diffnodeGet( @@ -216,7 +216,7 @@ export class MockAPI { host_duration: 186312, device_duration: 0, self_host_duration: 186312, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -224,7 +224,7 @@ export class MockAPI { host_duration: 31902, device_duration: 736, self_host_duration: 17460, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -232,7 +232,7 @@ export class MockAPI { host_duration: 62713, device_duration: 0, self_host_duration: 32640, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -240,7 +240,7 @@ export class MockAPI { host_duration: 1711486, device_duration: 8796, self_host_duration: 37162, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach', @@ -248,7 +248,7 @@ export class MockAPI { host_duration: 4379, device_duration: 0, self_host_duration: 4379, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach', @@ -256,7 +256,7 @@ export class MockAPI { host_duration: 10596, device_duration: 0, self_host_duration: 6217, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -264,7 +264,7 @@ export class MockAPI { host_duration: 8470, device_duration: 0, self_host_duration: 8470, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::unsqueeze', @@ -272,7 +272,7 @@ export class MockAPI { host_duration: 19150, device_duration: 0, self_host_duration: 16142, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_strided', @@ -280,7 +280,7 @@ export class MockAPI { host_duration: 50043, device_duration: 0, self_host_duration: 50043, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -288,7 +288,7 @@ export class MockAPI { host_duration: 1518205, device_duration: 8796, self_host_duration: 1509009, - self_device_duration: 8796, + self_device_duration: 8796 }, { name: 'aten::_to_copy', @@ -296,7 +296,7 @@ export class MockAPI { host_duration: 1674324, device_duration: 8796, self_host_duration: 104788, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::upsample_bilinear2d', @@ -304,7 +304,7 @@ export class MockAPI { host_duration: 460479, device_duration: 0, self_host_duration: 421547, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::squeeze', @@ -312,7 +312,7 @@ export class MockAPI { host_duration: 9401, device_duration: 0, self_host_duration: 8211, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::round', @@ -320,7 +320,7 @@ export class MockAPI { host_duration: 31311, device_duration: 0, self_host_duration: 31311, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::slice', @@ -328,7 +328,7 @@ export class MockAPI { host_duration: 17762, device_duration: 0, self_host_duration: 15082, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach_', @@ -336,7 +336,7 @@ export class MockAPI { host_duration: 4194, device_duration: 0, self_host_duration: 4194, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach_', @@ -344,7 +344,7 @@ export class MockAPI { host_duration: 14514, device_duration: 0, self_host_duration: 10320, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::result_type', @@ -352,7 +352,7 @@ export class MockAPI { host_duration: 1734,
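// Editor's note (assumption, not part of the patch): every operator record in this
// diffnodeGet mock shares one shape, so a reference interface for readers; the name
// OperatorStats is chosen here purely for illustration. Durations appear to be in
// microseconds, matching the "us" figures in the tooltips above.
// interface OperatorStats {
//   name: string                 // e.g. 'aten::zeros'
//   calls: number                // invocation count
//   host_duration: number        // total host time, children included
//   device_duration: number      // total device (GPU) time
//   self_host_duration: number   // host time excluding children
//   self_device_duration: number // device time excluding children
// }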
device_duration: 0, self_host_duration: 1734, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::pow', @@ -360,7 +360,7 @@ export class MockAPI { host_duration: 86249, device_duration: 0, self_host_duration: 78373, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sub', @@ -368,7 +368,7 @@ export class MockAPI { host_duration: 183533, device_duration: 0, self_host_duration: 75637, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::gt', @@ -376,7 +376,7 @@ export class MockAPI { host_duration: 71284, device_duration: 0, self_host_duration: 49575, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_local_scalar_dense', @@ -384,7 +384,7 @@ export class MockAPI { host_duration: 4948, device_duration: 0, self_host_duration: 4948, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::item', @@ -392,7 +392,7 @@ export class MockAPI { host_duration: 20922, device_duration: 0, self_host_duration: 15974, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::is_nonzero', @@ -400,7 +400,7 @@ export class MockAPI { host_duration: 27934, device_duration: 0, self_host_duration: 10747, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -408,7 +408,7 @@ export class MockAPI { host_duration: 168214, device_duration: 75, self_host_duration: 146203, - self_device_duration: 75, + self_device_duration: 75 }, { name: 'aten::resize_', @@ -416,7 +416,7 @@ export class MockAPI { host_duration: 248, device_duration: 0, self_host_duration: 248, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::narrow', @@ -424,7 +424,7 @@ export class MockAPI { host_duration: 280, device_duration: 0, self_host_duration: 99, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_cat', @@ -432,7 +432,7 @@ export class MockAPI { host_duration: 92993, device_duration: 0, self_host_duration: 92405, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cat', @@ -440,7 +440,7 @@ export class MockAPI { host_duration: 93282, device_duration: 0, self_host_duration: 289, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::stack', @@ -448,7 +448,7 @@ export class MockAPI { host_duration: 124757, device_duration: 0, self_host_duration: 22050, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution', @@ -456,7 +456,7 @@ export class MockAPI { host_duration: 44043, device_duration: 71832, self_host_duration: 35027, - self_device_duration: 71832, + self_device_duration: 71832 }, { name: 'aten::_convolution', @@ -464,7 +464,7 @@ export class MockAPI { host_duration: 51312, device_duration: 71832, self_host_duration: 7269, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::convolution', @@ -472,7 +472,7 @@ export class MockAPI { host_duration: 55287, device_duration: 71832, self_host_duration: 3975, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::conv2d', @@ -480,7 +480,7 @@ export class MockAPI { host_duration: 59323, device_duration: 71832, self_host_duration: 4036, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -488,7 +488,7 @@ export class MockAPI { host_duration: 17461, device_duration: 10540, self_host_duration: 15188, - self_device_duration: 10540, + self_device_duration: 10540 }, { name: 'aten::empty_like', @@ -496,7 +496,7 @@ export class MockAPI { host_duration: 11504, device_duration: 0, self_host_duration: 4865, - 
self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::view', @@ -504,7 +504,7 @@ export class MockAPI { host_duration: 3589, device_duration: 0, self_host_duration: 3589, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm', @@ -512,7 +512,7 @@ export class MockAPI { host_duration: 71328, device_duration: 25802, self_host_duration: 40944, - self_device_duration: 25802, + self_device_duration: 25802 }, { name: 'aten::_batch_norm_impl_index', @@ -520,7 +520,7 @@ export class MockAPI { host_duration: 76354, device_duration: 25802, self_host_duration: 5026, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::batch_norm', @@ -528,7 +528,7 @@ export class MockAPI { host_duration: 79832, device_duration: 25802, self_host_duration: 3478, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::clamp_min', @@ -536,7 +536,7 @@ export class MockAPI { host_duration: 5417, device_duration: 12000, self_host_duration: 3885, - self_device_duration: 12000, + self_device_duration: 12000 }, { name: 'aten::clamp_min_', @@ -544,7 +544,7 @@ export class MockAPI { host_duration: 8537, device_duration: 12000, self_host_duration: 3120, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::relu_', @@ -552,7 +552,7 @@ export class MockAPI { host_duration: 16708, device_duration: 12000, self_host_duration: 8171, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices', @@ -560,7 +560,7 @@ export class MockAPI { host_duration: 442, device_duration: 940, self_host_duration: 405, - self_device_duration: 940, + self_device_duration: 940 }, { name: 'aten::max_pool2d', @@ -568,7 +568,7 @@ export class MockAPI { host_duration: 542, device_duration: 940, self_host_duration: 100, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -576,7 +576,7 @@ export class MockAPI { host_duration: 72931, device_duration: 13090, self_host_duration: 57558, - self_device_duration: 13090, + self_device_duration: 13090 }, { name: 'aten::mean', @@ -584,7 +584,7 @@ export class MockAPI { host_duration: 376, device_duration: 133, self_host_duration: 339, - self_device_duration: 133, + self_device_duration: 133 }, { name: 'aten::adaptive_avg_pool2d', @@ -592,7 +592,7 @@ export class MockAPI { host_duration: 465, device_duration: 133, self_host_duration: 89, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -600,7 +600,7 @@ export class MockAPI { host_duration: 170, device_duration: 0, self_host_duration: 170, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::flatten', @@ -608,7 +608,7 @@ export class MockAPI { host_duration: 207, device_duration: 0, self_host_duration: 103, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -616,7 +616,7 @@ export class MockAPI { host_duration: 587, device_duration: 0, self_host_duration: 465, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -624,7 +624,7 @@ export class MockAPI { host_duration: 1068, device_duration: 0, self_host_duration: 481, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -632,7 +632,7 @@ export class MockAPI { host_duration: 277, device_duration: 0, self_host_duration: 227, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::addmm', @@ -640,7 +640,7 @@ export class MockAPI { host_duration: 809, device_duration: 84, self_host_duration: 604, - 
self_device_duration: 84, + self_device_duration: 84 }, { name: 'aten::linear', @@ -648,7 +648,7 @@ export class MockAPI { host_duration: 1185, device_duration: 84, self_host_duration: 137, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax', @@ -656,7 +656,7 @@ export class MockAPI { host_duration: 308, device_duration: 14, self_host_duration: 271, - self_device_duration: 14, + self_device_duration: 14 }, { name: 'aten::log_softmax', @@ -664,7 +664,7 @@ export class MockAPI { host_duration: 472, device_duration: 14, self_host_duration: 153, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_forward', @@ -672,7 +672,7 @@ export class MockAPI { host_duration: 522, device_duration: 8, self_host_duration: 476, - self_device_duration: 8, + self_device_duration: 8 }, { name: 'aten::nll_loss', @@ -680,7 +680,7 @@ export class MockAPI { host_duration: 590, device_duration: 8, self_host_duration: 68, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_nd', @@ -688,7 +688,7 @@ export class MockAPI { host_duration: 641, device_duration: 8, self_host_duration: 51, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cross_entropy_loss', @@ -696,7 +696,7 @@ export class MockAPI { host_duration: 1234, device_duration: 22, self_host_duration: 121, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -704,7 +704,7 @@ export class MockAPI { host_duration: 14541, device_duration: 738, self_host_duration: 10083, - self_device_duration: 738, + self_device_duration: 738 }, { name: 'aten::ones_like', @@ -712,7 +712,7 @@ export class MockAPI { host_duration: 516, device_duration: 2, self_host_duration: 142, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_backward', @@ -720,7 +720,7 @@ export class MockAPI { host_duration: 573, device_duration: 8, self_host_duration: 310, - self_device_duration: 6, + self_device_duration: 6 }, { name: 'NllLossBackward0', @@ -728,7 +728,7 @@ export class MockAPI { host_duration: 774, device_duration: 8, self_host_duration: 201, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: NllLossBackward0', @@ -736,7 +736,7 @@ export class MockAPI { host_duration: 1025, device_duration: 8, self_host_duration: 251, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax_backward_data', @@ -744,7 +744,7 @@ export class MockAPI { host_duration: 236, device_duration: 18, self_host_duration: 196, - self_device_duration: 18, + self_device_duration: 18 }, { name: 'LogSoftmaxBackward0', @@ -752,7 +752,7 @@ export class MockAPI { host_duration: 385, device_duration: 18, self_host_duration: 149, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: LogSoftmaxBackward0', @@ -760,7 +760,7 @@ export class MockAPI { host_duration: 632, device_duration: 18, self_host_duration: 247, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mm', @@ -768,7 +768,7 @@ export class MockAPI { host_duration: 668, device_duration: 140, self_host_duration: 547, - self_device_duration: 140, + self_device_duration: 140 }, { name: 'AddmmBackward0', @@ -776,7 +776,7 @@ export class MockAPI { host_duration: 1698, device_duration: 140, self_host_duration: 417, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sum', @@ -784,7 +784,7 @@ export class MockAPI { host_duration: 370, device_duration: 15, 
self_host_duration: 328, - self_device_duration: 15, + self_device_duration: 15 }, { name: 'autograd::engine::evaluate_function: AddmmBackward0', @@ -792,7 +792,7 @@ export class MockAPI { host_duration: 2710, device_duration: 155, self_host_duration: 567, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'torch::autograd::AccumulateGrad', @@ -800,15 +800,16 @@ export class MockAPI { host_duration: 41184, device_duration: 997, self_host_duration: 16159, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', + name: + 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', calls: 322, host_duration: 70946, device_duration: 997, self_host_duration: 29762, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'TBackward0', @@ -816,7 +817,7 @@ export class MockAPI { host_duration: 280, device_duration: 0, self_host_duration: 64, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: TBackward0', @@ -824,7 +825,7 @@ export class MockAPI { host_duration: 428, device_duration: 0, self_host_duration: 148, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::reshape', @@ -832,7 +833,7 @@ export class MockAPI { host_duration: 170, device_duration: 0, self_host_duration: 104, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'ReshapeAliasBackward0', @@ -840,7 +841,7 @@ export class MockAPI { host_duration: 264, device_duration: 0, self_host_duration: 94, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReshapeAliasBackward0', @@ -848,7 +849,7 @@ export class MockAPI { host_duration: 402, device_duration: 0, self_host_duration: 138, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'MeanBackward1', @@ -856,7 +857,7 @@ export class MockAPI { host_duration: 1036, device_duration: 75, self_host_duration: 231, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: MeanBackward1', @@ -864,7 +865,7 @@ export class MockAPI { host_duration: 1254, device_duration: 75, self_host_duration: 218, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::threshold_backward', @@ -872,7 +873,7 @@ export class MockAPI { host_duration: 13838, device_duration: 17984, self_host_duration: 12131, - self_device_duration: 17984, + self_device_duration: 17984 }, { name: 'ReluBackward0', @@ -880,7 +881,7 @@ export class MockAPI { host_duration: 21183, device_duration: 17984, self_host_duration: 7345, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReluBackward0', @@ -888,7 +889,7 @@ export class MockAPI { host_duration: 33492, device_duration: 17984, self_host_duration: 12309, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'AddBackward0', @@ -896,7 +897,7 @@ export class MockAPI { host_duration: 251, device_duration: 0, self_host_duration: 251, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddBackward0', @@ -904,7 +905,7 @@ export class MockAPI { host_duration: 2579, device_duration: 0, self_host_duration: 2328, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm_backward', @@ -912,7 +913,7 @@ export class MockAPI { host_duration: 62175, device_duration: 44433, self_host_duration: 36053, - self_device_duration: 44433, + self_device_duration: 
44433 }, { name: 'CudnnBatchNormBackward0', @@ -920,15 +921,16 @@ export class MockAPI { host_duration: 69160, device_duration: 44433, self_host_duration: 6985, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', + name: + 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', calls: 106, host_duration: 88613, device_duration: 44433, self_host_duration: 19453, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution_backward_input', @@ -936,7 +938,7 @@ export class MockAPI { host_duration: 40820, device_duration: 76620, self_host_duration: 30768, - self_device_duration: 76620, + self_device_duration: 76620 }, { name: 'aten::cudnn_convolution_backward_weight', @@ -944,7 +946,7 @@ export class MockAPI { host_duration: 44875, device_duration: 90108, self_host_duration: 27458, - self_device_duration: 90108, + self_device_duration: 90108 }, { name: 'aten::cudnn_convolution_backward', @@ -952,7 +954,7 @@ export class MockAPI { host_duration: 101020, device_duration: 166728, self_host_duration: 15325, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'CudnnConvolutionBackward0', @@ -960,15 +962,16 @@ export class MockAPI { host_duration: 107964, device_duration: 166728, self_host_duration: 6944, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', + name: + 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', calls: 106, host_duration: 129129, device_duration: 177161, self_host_duration: 16746, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices_backward', @@ -976,7 +979,7 @@ export class MockAPI { host_duration: 483, device_duration: 3048, self_host_duration: 257, - self_device_duration: 2588, + self_device_duration: 2588 }, { name: 'MaxPool2DWithIndicesBackward0', @@ -984,15 +987,16 @@ export class MockAPI { host_duration: 599, device_duration: 3048, self_host_duration: 116, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', + name: + 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', calls: 2, host_duration: 836, device_duration: 3048, self_host_duration: 237, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mul_', @@ -1000,9 +1004,9 @@ export class MockAPI { host_duration: 23818, device_duration: 797, self_host_duration: 19073, - self_device_duration: 797, - }, - ], + self_device_duration: 797 + } + ] }, right: { name: 'multiple nodes', @@ -1016,7 +1020,7 @@ export class MockAPI { host_duration: 31594, device_duration: 0, self_host_duration: 31594, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -1024,7 +1028,7 @@ export class MockAPI { host_duration: 6010, device_duration: 864, self_host_duration: 1910, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -1032,7 +1036,7 @@ export class MockAPI { host_duration: 10338, device_duration: 0, self_host_duration: 2951, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -1040,7 +1044,7 @@ export class MockAPI { host_duration: 47031, device_duration: 8684, self_host_duration: 4258, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach', @@ -1048,7 +1052,7 @@ export class MockAPI { host_duration: 701, device_duration: 0, self_host_duration: 698, - 
self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach', @@ -1056,7 +1060,7 @@ export class MockAPI { host_duration: 1374, device_duration: 0, self_host_duration: 676, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -1064,7 +1068,7 @@ export class MockAPI { host_duration: 1013, device_duration: 0, self_host_duration: 1013, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::unsqueeze', @@ -1072,7 +1076,7 @@ export class MockAPI { host_duration: 2074, device_duration: 0, self_host_duration: 1723, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_strided', @@ -1080,7 +1084,7 @@ export class MockAPI { host_duration: 6859, device_duration: 0, self_host_duration: 6859, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -1088,7 +1092,7 @@ export class MockAPI { host_duration: 25248, device_duration: 8684, self_host_duration: 16166, - self_device_duration: 8684, + self_device_duration: 8684 }, { name: 'aten::_to_copy', @@ -1096,7 +1100,7 @@ export class MockAPI { host_duration: 42773, device_duration: 8684, self_host_duration: 10227, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::upsample_bilinear2d', @@ -1104,7 +1108,7 @@ export class MockAPI { host_duration: 51788, device_duration: 0, self_host_duration: 46788, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::squeeze', @@ -1112,7 +1116,7 @@ export class MockAPI { host_duration: 1035, device_duration: 0, self_host_duration: 895, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::round', @@ -1120,7 +1124,7 @@ export class MockAPI { host_duration: 11074, device_duration: 0, self_host_duration: 11074, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::slice', @@ -1128,7 +1132,7 @@ export class MockAPI { host_duration: 1892, device_duration: 0, self_host_duration: 1600, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach_', @@ -1136,7 +1140,7 @@ export class MockAPI { host_duration: 278, device_duration: 0, self_host_duration: 244, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach_', @@ -1144,7 +1148,7 @@ export class MockAPI { host_duration: 1341, device_duration: 0, self_host_duration: 1097, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::result_type', @@ -1152,7 +1156,7 @@ export class MockAPI { host_duration: 317, device_duration: 0, self_host_duration: 317, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::pow', @@ -1160,7 +1164,7 @@ export class MockAPI { host_duration: 8857, device_duration: 0, self_host_duration: 7959, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sub', @@ -1168,7 +1172,7 @@ export class MockAPI { host_duration: 17840, device_duration: 0, self_host_duration: 7688, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::gt', @@ -1176,7 +1180,7 @@ export class MockAPI { host_duration: 6903, device_duration: 0, self_host_duration: 4901, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_local_scalar_dense', @@ -1184,7 +1188,7 @@ export class MockAPI { host_duration: 395, device_duration: 0, self_host_duration: 395, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::item', @@ -1192,7 +1196,7 @@ export class MockAPI { host_duration: 2532, device_duration: 0, self_host_duration: 2130, - self_device_duration: 0, + self_device_duration: 0 }, { 
name: 'aten::is_nonzero', @@ -1200,7 +1204,7 @@ export class MockAPI { host_duration: 3601, device_duration: 0, self_host_duration: 1427, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -1208,7 +1212,7 @@ export class MockAPI { host_duration: 11707, device_duration: 75, self_host_duration: 9531, - self_device_duration: 75, + self_device_duration: 75 }, { name: 'aten::resize_', @@ -1216,7 +1220,7 @@ export class MockAPI { host_duration: 79, device_duration: 0, self_host_duration: 79, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::narrow', @@ -1224,7 +1228,7 @@ export class MockAPI { host_duration: 37, device_duration: 0, self_host_duration: 16, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_cat', @@ -1232,7 +1236,7 @@ export class MockAPI { host_duration: 9241, device_duration: 0, self_host_duration: 9113, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cat', @@ -1240,7 +1244,7 @@ export class MockAPI { host_duration: 9286, device_duration: 0, self_host_duration: 45, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::stack', @@ -1248,7 +1252,7 @@ export class MockAPI { host_duration: 16195, device_duration: 0, self_host_duration: 6105, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution', @@ -1256,7 +1260,7 @@ export class MockAPI { host_duration: 17357, device_duration: 71414, self_host_duration: 13601, - self_device_duration: 71414, + self_device_duration: 71414 }, { name: 'aten::_convolution', @@ -1264,7 +1268,7 @@ export class MockAPI { host_duration: 18514, device_duration: 71414, self_host_duration: 1157, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::convolution', @@ -1272,7 +1276,7 @@ export class MockAPI { host_duration: 19185, device_duration: 71414, self_host_duration: 671, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::conv2d', @@ -1280,7 +1284,7 @@ export class MockAPI { host_duration: 19750, device_duration: 71414, self_host_duration: 565, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -1288,7 +1292,7 @@ export class MockAPI { host_duration: 4973, device_duration: 10567, self_host_duration: 3157, - self_device_duration: 10567, + self_device_duration: 10567 }, { name: 'aten::empty_like', @@ -1296,7 +1300,7 @@ export class MockAPI { host_duration: 1924, device_duration: 0, self_host_duration: 598, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::view', @@ -1304,7 +1308,7 @@ export class MockAPI { host_duration: 596, device_duration: 0, self_host_duration: 596, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm', @@ -1312,7 +1316,7 @@ export class MockAPI { host_duration: 11083, device_duration: 25737, self_host_duration: 5031, - self_device_duration: 25737, + self_device_duration: 25737 }, { name: 'aten::_batch_norm_impl_index', @@ -1320,7 +1324,7 @@ export class MockAPI { host_duration: 11856, device_duration: 25737, self_host_duration: 773, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::batch_norm', @@ -1328,7 +1332,7 @@ export class MockAPI { host_duration: 12386, device_duration: 25737, self_host_duration: 530, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::clamp_min', @@ -1336,7 +1340,7 @@ export class MockAPI { host_duration: 2189, device_duration: 12010, self_host_duration: 1030, - self_device_duration: 12010, + 
self_device_duration: 12010 }, { name: 'aten::clamp_min_', @@ -1344,7 +1348,7 @@ export class MockAPI { host_duration: 2614, device_duration: 12010, self_host_duration: 425, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::relu_', @@ -1352,7 +1356,7 @@ export class MockAPI { host_duration: 3880, device_duration: 12010, self_host_duration: 1266, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices', @@ -1360,7 +1364,7 @@ export class MockAPI { host_duration: 112, device_duration: 938, self_host_duration: 82, - self_device_duration: 938, + self_device_duration: 938 }, { name: 'aten::max_pool2d', @@ -1368,7 +1372,7 @@ export class MockAPI { host_duration: 127, device_duration: 938, self_host_duration: 15, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -1376,7 +1380,7 @@ export class MockAPI { host_duration: 21459, device_duration: 13178, self_host_duration: 11041, - self_device_duration: 13178, + self_device_duration: 13178 }, { name: 'aten::mean', @@ -1384,7 +1388,7 @@ export class MockAPI { host_duration: 104, device_duration: 126, self_host_duration: 76, - self_device_duration: 126, + self_device_duration: 126 }, { name: 'aten::adaptive_avg_pool2d', @@ -1392,7 +1396,7 @@ export class MockAPI { host_duration: 117, device_duration: 126, self_host_duration: 13, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -1400,7 +1404,7 @@ export class MockAPI { host_duration: 26, device_duration: 0, self_host_duration: 26, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::flatten', @@ -1408,7 +1412,7 @@ export class MockAPI { host_duration: 31, device_duration: 0, self_host_duration: 15, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -1416,7 +1420,7 @@ export class MockAPI { host_duration: 85, device_duration: 0, self_host_duration: 68, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -1424,7 +1428,7 @@ export class MockAPI { host_duration: 145, device_duration: 0, self_host_duration: 60, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -1432,7 +1436,7 @@ export class MockAPI { host_duration: 30, device_duration: 0, self_host_duration: 25, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::addmm', @@ -1440,7 +1444,7 @@ export class MockAPI { host_duration: 334, device_duration: 84, self_host_duration: 234, - self_device_duration: 84, + self_device_duration: 84 }, { name: 'aten::linear', @@ -1448,7 +1452,7 @@ export class MockAPI { host_duration: 386, device_duration: 84, self_host_duration: 19, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax', @@ -1456,7 +1460,7 @@ export class MockAPI { host_duration: 83, device_duration: 14, self_host_duration: 55, - self_device_duration: 14, + self_device_duration: 14 }, { name: 'aten::log_softmax', @@ -1464,7 +1468,7 @@ export class MockAPI { host_duration: 106, device_duration: 14, self_host_duration: 20, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_forward', @@ -1472,7 +1476,7 @@ export class MockAPI { host_duration: 96, device_duration: 8, self_host_duration: 68, - self_device_duration: 8, + self_device_duration: 8 }, { name: 'aten::nll_loss', @@ -1480,7 +1484,7 @@ export class MockAPI { host_duration: 105, device_duration: 8, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 
'aten::nll_loss_nd', @@ -1488,7 +1492,7 @@ export class MockAPI { host_duration: 113, device_duration: 8, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cross_entropy_loss', @@ -1496,7 +1500,7 @@ export class MockAPI { host_duration: 243, device_duration: 22, self_host_duration: 24, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -1504,7 +1508,7 @@ export class MockAPI { host_duration: 4140, device_duration: 866, self_host_duration: 1851, - self_device_duration: 866, + self_device_duration: 866 }, { name: 'aten::ones_like', @@ -1512,7 +1516,7 @@ export class MockAPI { host_duration: 104, device_duration: 2, self_host_duration: 14, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_backward', @@ -1520,7 +1524,7 @@ export class MockAPI { host_duration: 192, device_duration: 9, self_host_duration: 84, - self_device_duration: 6, + self_device_duration: 6 }, { name: 'NllLossBackward0', @@ -1528,7 +1532,7 @@ export class MockAPI { host_duration: 297, device_duration: 9, self_host_duration: 105, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: NllLossBackward0', @@ -1536,7 +1540,7 @@ export class MockAPI { host_duration: 352, device_duration: 9, self_host_duration: 55, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax_backward_data', @@ -1544,7 +1548,7 @@ export class MockAPI { host_duration: 71, device_duration: 18, self_host_duration: 43, - self_device_duration: 18, + self_device_duration: 18 }, { name: 'LogSoftmaxBackward0', @@ -1552,7 +1556,7 @@ export class MockAPI { host_duration: 91, device_duration: 18, self_host_duration: 20, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: LogSoftmaxBackward0', @@ -1560,7 +1564,7 @@ export class MockAPI { host_duration: 126, device_duration: 18, self_host_duration: 35, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mm', @@ -1568,7 +1572,7 @@ export class MockAPI { host_duration: 283, device_duration: 134, self_host_duration: 186, - self_device_duration: 134, + self_device_duration: 134 }, { name: 'AddmmBackward0', @@ -1576,7 +1580,7 @@ export class MockAPI { host_duration: 418, device_duration: 134, self_host_duration: 47, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sum', @@ -1584,7 +1588,7 @@ export class MockAPI { host_duration: 92, device_duration: 14, self_host_duration: 62, - self_device_duration: 14, + self_device_duration: 14 }, { name: 'autograd::engine::evaluate_function: AddmmBackward0', @@ -1592,7 +1596,7 @@ export class MockAPI { host_duration: 594, device_duration: 148, self_host_duration: 75, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'torch::autograd::AccumulateGrad', @@ -1600,15 +1604,16 @@ export class MockAPI { host_duration: 10317, device_duration: 1069, self_host_duration: 2127, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', + name: + 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', calls: 322, host_duration: 15128, device_duration: 1069, self_host_duration: 4811, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'TBackward0', @@ -1616,7 +1621,7 @@ export class MockAPI { host_duration: 30, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { 
name: 'autograd::engine::evaluate_function: TBackward0', @@ -1624,7 +1629,7 @@ export class MockAPI { host_duration: 45, device_duration: 0, self_host_duration: 15, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::reshape', @@ -1632,7 +1637,7 @@ export class MockAPI { host_duration: 20, device_duration: 0, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'ReshapeAliasBackward0', @@ -1640,7 +1645,7 @@ export class MockAPI { host_duration: 31, device_duration: 0, self_host_duration: 11, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReshapeAliasBackward0', @@ -1648,7 +1653,7 @@ export class MockAPI { host_duration: 48, device_duration: 0, self_host_duration: 17, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'MeanBackward1', @@ -1656,7 +1661,7 @@ export class MockAPI { host_duration: 172, device_duration: 75, self_host_duration: 18, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: MeanBackward1', @@ -1664,7 +1669,7 @@ export class MockAPI { host_duration: 201, device_duration: 75, self_host_duration: 29, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::threshold_backward', @@ -1672,7 +1677,7 @@ export class MockAPI { host_duration: 3652, device_duration: 18018, self_host_duration: 2361, - self_device_duration: 18018, + self_device_duration: 18018 }, { name: 'ReluBackward0', @@ -1680,7 +1685,7 @@ export class MockAPI { host_duration: 4567, device_duration: 18018, self_host_duration: 915, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReluBackward0', @@ -1688,7 +1693,7 @@ export class MockAPI { host_duration: 6457, device_duration: 18018, self_host_duration: 1890, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'AddBackward0', @@ -1696,7 +1701,7 @@ export class MockAPI { host_duration: 26, device_duration: 0, self_host_duration: 26, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddBackward0', @@ -1704,7 +1709,7 @@ export class MockAPI { host_duration: 261, device_duration: 0, self_host_duration: 235, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm_backward', @@ -1712,7 +1717,7 @@ export class MockAPI { host_duration: 9943, device_duration: 44401, self_host_duration: 4355, - self_device_duration: 44401, + self_device_duration: 44401 }, { name: 'CudnnBatchNormBackward0', @@ -1720,15 +1725,16 @@ export class MockAPI { host_duration: 11132, device_duration: 44401, self_host_duration: 1189, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', + name: + 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', calls: 106, host_duration: 14696, device_duration: 44401, self_host_duration: 3564, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution_backward_input', @@ -1736,7 +1742,7 @@ export class MockAPI { host_duration: 18813, device_duration: 75568, self_host_duration: 13997, - self_device_duration: 75568, + self_device_duration: 75568 }, { name: 'aten::cudnn_convolution_backward_weight', @@ -1744,7 +1750,7 @@ export class MockAPI { host_duration: 18792, device_duration: 88992, self_host_duration: 11101, - self_device_duration: 88992, + self_device_duration: 88992 }, { name: 'aten::cudnn_convolution_backward', @@ 
-1752,7 +1758,7 @@ export class MockAPI { host_duration: 40064, device_duration: 164560, self_host_duration: 2459, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'CudnnConvolutionBackward0', @@ -1760,15 +1766,16 @@ export class MockAPI { host_duration: 41205, device_duration: 164560, self_host_duration: 1141, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', + name: + 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', calls: 106, host_duration: 45209, device_duration: 175014, self_host_duration: 2826, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices_backward', @@ -1776,7 +1783,7 @@ export class MockAPI { host_duration: 145, device_duration: 3016, self_host_duration: 61, - self_device_duration: 2556, + self_device_duration: 2556 }, { name: 'MaxPool2DWithIndicesBackward0', @@ -1784,15 +1791,16 @@ export class MockAPI { host_duration: 165, device_duration: 3016, self_host_duration: 20, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', + name: + 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', calls: 2, host_duration: 209, device_duration: 3016, self_host_duration: 44, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mul_', @@ -1800,9 +1808,9 @@ export class MockAPI { host_duration: 6835, device_duration: 803, self_host_duration: 3630, - self_device_duration: 803, - }, - ], + self_device_duration: 803 + } + ] }, path: '0', children: [ @@ -1819,7 +1827,7 @@ export class MockAPI { host_duration: 100, device_duration: 0, self_host_duration: 100, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -1827,7 +1835,7 @@ export class MockAPI { host_duration: 4, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -1835,9 +1843,9 @@ export class MockAPI { host_duration: 119, device_duration: 0, self_host_duration: 64, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'multiple nodes', @@ -1851,7 +1859,7 @@ export class MockAPI { host_duration: 17, device_duration: 0, self_host_duration: 17, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -1859,7 +1867,7 @@ export class MockAPI { host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -1867,11 +1875,11 @@ export class MockAPI { host_duration: 15, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-0', + path: '0-0' }, { left: { @@ -1886,7 +1894,7 @@ export class MockAPI { host_duration: 62288, device_duration: 0, self_host_duration: 62288, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -1894,7 +1902,7 @@ export class MockAPI { host_duration: 959, device_duration: 0, self_host_duration: 959, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -1902,7 +1910,7 @@ export class MockAPI { host_duration: 35273, device_duration: 0, self_host_duration: 16154, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -1910,7 +1918,7 @@ export class MockAPI { host_duration: 877101, device_duration: 0, self_host_duration: 18482, - self_device_duration: 0, + 
self_device_duration: 0 }, { name: 'detach', @@ -1918,7 +1926,7 @@ export class MockAPI { host_duration: 2191, device_duration: 0, self_host_duration: 2191, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach', @@ -1926,7 +1934,7 @@ export class MockAPI { host_duration: 5301, device_duration: 0, self_host_duration: 3110, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -1934,7 +1942,7 @@ export class MockAPI { host_duration: 4175, device_duration: 0, self_host_duration: 4175, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::unsqueeze', @@ -1942,7 +1950,7 @@ export class MockAPI { host_duration: 9560, device_duration: 0, self_host_duration: 8045, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_strided', @@ -1950,7 +1958,7 @@ export class MockAPI { host_duration: 24689, device_duration: 0, self_host_duration: 24689, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -1958,7 +1966,7 @@ export class MockAPI { host_duration: 780214, device_duration: 0, self_host_duration: 780214, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_to_copy', @@ -1966,7 +1974,7 @@ export class MockAPI { host_duration: 858619, device_duration: 0, self_host_duration: 53009, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::upsample_bilinear2d', @@ -1974,7 +1982,7 @@ export class MockAPI { host_duration: 224031, device_duration: 0, self_host_duration: 204660, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::squeeze', @@ -1982,7 +1990,7 @@ export class MockAPI { host_duration: 4719, device_duration: 0, self_host_duration: 4119, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::round', @@ -1990,7 +1998,7 @@ export class MockAPI { host_duration: 16028, device_duration: 0, self_host_duration: 16028, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::slice', @@ -1998,7 +2006,7 @@ export class MockAPI { host_duration: 8918, device_duration: 0, self_host_duration: 7569, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach_', @@ -2006,7 +2014,7 @@ export class MockAPI { host_duration: 2092, device_duration: 0, self_host_duration: 2092, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach_', @@ -2014,7 +2022,7 @@ export class MockAPI { host_duration: 7228, device_duration: 0, self_host_duration: 5136, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::result_type', @@ -2022,7 +2030,7 @@ export class MockAPI { host_duration: 884, device_duration: 0, self_host_duration: 884, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::pow', @@ -2030,7 +2038,7 @@ export class MockAPI { host_duration: 43030, device_duration: 0, self_host_duration: 39068, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sub', @@ -2038,7 +2046,7 @@ export class MockAPI { host_duration: 91440, device_duration: 0, self_host_duration: 37676, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::gt', @@ -2046,7 +2054,7 @@ export class MockAPI { host_duration: 35514, device_duration: 0, self_host_duration: 24706, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_local_scalar_dense', @@ -2054,7 +2062,7 @@ export class MockAPI { host_duration: 2467, device_duration: 0, self_host_duration: 2467, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::item', @@ 
-2062,7 +2070,7 @@ export class MockAPI { host_duration: 10375, device_duration: 0, self_host_duration: 7908, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::is_nonzero', @@ -2070,7 +2078,7 @@ export class MockAPI { host_duration: 13905, device_duration: 0, self_host_duration: 5383, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -2078,7 +2086,7 @@ export class MockAPI { host_duration: 87841, device_duration: 0, self_host_duration: 76794, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -2086,7 +2094,7 @@ export class MockAPI { host_duration: 117, device_duration: 0, self_host_duration: 117, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::narrow', @@ -2094,7 +2102,7 @@ export class MockAPI { host_duration: 142, device_duration: 0, self_host_duration: 51, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_cat', @@ -2102,7 +2110,7 @@ export class MockAPI { host_duration: 51526, device_duration: 0, self_host_duration: 51229, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cat', @@ -2110,7 +2118,7 @@ export class MockAPI { host_duration: 51674, device_duration: 0, self_host_duration: 148, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::stack', @@ -2118,9 +2126,9 @@ export class MockAPI { host_duration: 75677, device_duration: 0, self_host_duration: 19330, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'enumerate(DataLoader)#_SingleProcessDataLoaderIter.__next__', @@ -2134,7 +2142,7 @@ export class MockAPI { host_duration: 12399, device_duration: 0, self_host_duration: 12399, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -2142,7 +2150,7 @@ export class MockAPI { host_duration: 98, device_duration: 0, self_host_duration: 98, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -2150,7 +2158,7 @@ export class MockAPI { host_duration: 7665, device_duration: 0, self_host_duration: 1689, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -2158,7 +2166,7 @@ export class MockAPI { host_duration: 21137, device_duration: 0, self_host_duration: 2377, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach', @@ -2166,7 +2174,7 @@ export class MockAPI { host_duration: 364, device_duration: 0, self_host_duration: 361, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach', @@ -2174,7 +2182,7 @@ export class MockAPI { host_duration: 745, device_duration: 0, self_host_duration: 384, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -2182,7 +2190,7 @@ export class MockAPI { host_duration: 527, device_duration: 0, self_host_duration: 527, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::unsqueeze', @@ -2190,7 +2198,7 @@ export class MockAPI { host_duration: 1050, device_duration: 0, self_host_duration: 869, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_strided', @@ -2198,7 +2206,7 @@ export class MockAPI { host_duration: 3689, device_duration: 0, self_host_duration: 3689, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -2206,7 +2214,7 @@ export class MockAPI { host_duration: 8695, device_duration: 0, self_host_duration: 8695, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_to_copy', @@ -2214,7 +2222,7 @@ export class 
MockAPI { host_duration: 18760, device_duration: 0, self_host_duration: 6122, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::upsample_bilinear2d', @@ -2222,7 +2230,7 @@ export class MockAPI { host_duration: 20349, device_duration: 0, self_host_duration: 17634, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::squeeze', @@ -2230,7 +2238,7 @@ export class MockAPI { host_duration: 562, device_duration: 0, self_host_duration: 487, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::round', @@ -2238,7 +2246,7 @@ export class MockAPI { host_duration: 6658, device_duration: 0, self_host_duration: 6658, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::slice', @@ -2246,7 +2254,7 @@ export class MockAPI { host_duration: 1028, device_duration: 0, self_host_duration: 870, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach_', @@ -2254,7 +2262,7 @@ export class MockAPI { host_duration: 142, device_duration: 0, self_host_duration: 129, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach_', @@ -2262,7 +2270,7 @@ export class MockAPI { host_duration: 755, device_duration: 0, self_host_duration: 626, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::result_type', @@ -2270,7 +2278,7 @@ export class MockAPI { host_duration: 168, device_duration: 0, self_host_duration: 168, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::pow', @@ -2278,7 +2286,7 @@ export class MockAPI { host_duration: 4922, device_duration: 0, self_host_duration: 4440, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sub', @@ -2286,7 +2294,7 @@ export class MockAPI { host_duration: 9959, device_duration: 0, self_host_duration: 4339, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::gt', @@ -2294,7 +2302,7 @@ export class MockAPI { host_duration: 3848, device_duration: 0, self_host_duration: 2737, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_local_scalar_dense', @@ -2302,7 +2310,7 @@ export class MockAPI { host_duration: 209, device_duration: 0, self_host_duration: 209, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::item', @@ -2310,7 +2318,7 @@ export class MockAPI { host_duration: 1398, device_duration: 0, self_host_duration: 1187, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::is_nonzero', @@ -2318,7 +2326,7 @@ export class MockAPI { host_duration: 2013, device_duration: 0, self_host_duration: 812, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -2326,7 +2334,7 @@ export class MockAPI { host_duration: 7421, device_duration: 0, self_host_duration: 6234, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -2334,7 +2342,7 @@ export class MockAPI { host_duration: 36, device_duration: 0, self_host_duration: 36, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::narrow', @@ -2342,7 +2350,7 @@ export class MockAPI { host_duration: 19, device_duration: 0, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_cat', @@ -2350,7 +2358,7 @@ export class MockAPI { host_duration: 4628, device_duration: 0, self_host_duration: 4566, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cat', @@ -2358,7 +2366,7 @@ export class MockAPI { host_duration: 4649, device_duration: 0, self_host_duration: 21, - self_device_duration: 0, + 
self_device_duration: 0 }, { name: 'aten::stack', @@ -2366,11 +2374,11 @@ export class MockAPI { host_duration: 10884, device_duration: 0, self_host_duration: 5859, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-1', + path: '0-1' }, { left: { @@ -2385,7 +2393,7 @@ export class MockAPI { host_duration: 209, device_duration: 0, self_host_duration: 209, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -2393,7 +2401,7 @@ export class MockAPI { host_duration: 4696, device_duration: 4402, self_host_duration: 93, - self_device_duration: 4402, + self_device_duration: 4402 }, { name: 'aten::_to_copy', @@ -2401,7 +2409,7 @@ export class MockAPI { host_duration: 5111, device_duration: 4402, self_host_duration: 206, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -2409,9 +2417,9 @@ export class MockAPI { host_duration: 5170, device_duration: 4402, self_host_duration: 59, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'multiple nodes', @@ -2425,7 +2433,7 @@ export class MockAPI { host_duration: 65, device_duration: 0, self_host_duration: 65, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -2433,7 +2441,7 @@ export class MockAPI { host_duration: 4575, device_duration: 4350, self_host_duration: 26, - self_device_duration: 4350, + self_device_duration: 4350 }, { name: 'aten::_to_copy', @@ -2441,7 +2449,7 @@ export class MockAPI { host_duration: 4670, device_duration: 4350, self_host_duration: 30, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -2449,11 +2457,11 @@ export class MockAPI { host_duration: 4681, device_duration: 4350, self_host_duration: 11, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-2', + path: '0-2' }, { left: { @@ -2468,7 +2476,7 @@ export class MockAPI { host_duration: 14161, device_duration: 0, self_host_duration: 14161, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution', @@ -2476,7 +2484,7 @@ export class MockAPI { host_duration: 22091, device_duration: 36599, self_host_duration: 17567, - self_device_duration: 36599, + self_device_duration: 36599 }, { name: 'aten::_convolution', @@ -2484,7 +2492,7 @@ export class MockAPI { host_duration: 25744, device_duration: 36599, self_host_duration: 3653, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::convolution', @@ -2492,7 +2500,7 @@ export class MockAPI { host_duration: 27753, device_duration: 36599, self_host_duration: 2009, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::conv2d', @@ -2500,7 +2508,7 @@ export class MockAPI { host_duration: 29777, device_duration: 36599, self_host_duration: 2024, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -2508,7 +2516,7 @@ export class MockAPI { host_duration: 6519, device_duration: 54, self_host_duration: 5666, - self_device_duration: 54, + self_device_duration: 54 }, { name: 'aten::empty_like', @@ -2516,7 +2524,7 @@ export class MockAPI { host_duration: 5624, device_duration: 0, self_host_duration: 2390, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::view', @@ -2524,7 +2532,7 @@ export class MockAPI { host_duration: 826, device_duration: 0, self_host_duration: 826, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm', @@ -2532,7 +2540,7 @@ export class MockAPI { host_duration: 35818, 
device_duration: 12974, self_host_duration: 20557, - self_device_duration: 12974, + self_device_duration: 12974 }, { name: 'aten::_batch_norm_impl_index', @@ -2540,7 +2548,7 @@ export class MockAPI { host_duration: 38324, device_duration: 12974, self_host_duration: 2506, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::batch_norm', @@ -2548,7 +2556,7 @@ export class MockAPI { host_duration: 40105, device_duration: 12974, self_host_duration: 1781, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::clamp_min', @@ -2556,7 +2564,7 @@ export class MockAPI { host_duration: 2702, device_duration: 6002, self_host_duration: 1935, - self_device_duration: 6002, + self_device_duration: 6002 }, { name: 'aten::clamp_min_', @@ -2564,7 +2572,7 @@ export class MockAPI { host_duration: 4273, device_duration: 6002, self_host_duration: 1571, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::relu_', @@ -2572,7 +2580,7 @@ export class MockAPI { host_duration: 8371, device_duration: 6002, self_host_duration: 4098, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices', @@ -2580,7 +2588,7 @@ export class MockAPI { host_duration: 230, device_duration: 474, self_host_duration: 212, - self_device_duration: 474, + self_device_duration: 474 }, { name: 'aten::max_pool2d', @@ -2588,7 +2596,7 @@ export class MockAPI { host_duration: 280, device_duration: 474, self_host_duration: 50, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -2596,7 +2604,7 @@ export class MockAPI { host_duration: 1546, device_duration: 5141, self_host_duration: 1290, - self_device_duration: 5141, + self_device_duration: 5141 }, { name: 'aten::mean', @@ -2604,7 +2612,7 @@ export class MockAPI { host_duration: 189, device_duration: 69, self_host_duration: 170, - self_device_duration: 69, + self_device_duration: 69 }, { name: 'aten::adaptive_avg_pool2d', @@ -2612,7 +2620,7 @@ export class MockAPI { host_duration: 234, device_duration: 69, self_host_duration: 45, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -2620,7 +2628,7 @@ export class MockAPI { host_duration: 52, device_duration: 0, self_host_duration: 52, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::flatten', @@ -2628,7 +2636,7 @@ export class MockAPI { host_duration: 106, device_duration: 0, self_host_duration: 54, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -2636,7 +2644,7 @@ export class MockAPI { host_duration: 23, device_duration: 0, self_host_duration: 23, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -2644,7 +2652,7 @@ export class MockAPI { host_duration: 55, device_duration: 0, self_host_duration: 41, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -2652,7 +2660,7 @@ export class MockAPI { host_duration: 119, device_duration: 0, self_host_duration: 64, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -2660,7 +2668,7 @@ export class MockAPI { host_duration: 49, device_duration: 0, self_host_duration: 40, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::addmm', @@ -2668,7 +2676,7 @@ export class MockAPI { host_duration: 404, device_duration: 43, self_host_duration: 302, - self_device_duration: 43, + self_device_duration: 43 }, { name: 'aten::linear', @@ -2676,9 +2684,9 @@ export class MockAPI { host_duration: 591, 
device_duration: 43, self_host_duration: 68, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: ResNet', @@ -2692,7 +2700,7 @@ export class MockAPI { host_duration: 2292, device_duration: 0, self_host_duration: 2292, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution', @@ -2700,7 +2708,7 @@ export class MockAPI { host_duration: 8713, device_duration: 36205, self_host_duration: 6819, - self_device_duration: 36205, + self_device_duration: 36205 }, { name: 'aten::_convolution', @@ -2708,7 +2716,7 @@ export class MockAPI { host_duration: 9298, device_duration: 36205, self_host_duration: 585, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::convolution', @@ -2716,7 +2724,7 @@ export class MockAPI { host_duration: 9653, device_duration: 36205, self_host_duration: 355, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::conv2d', @@ -2724,7 +2732,7 @@ export class MockAPI { host_duration: 9932, device_duration: 36205, self_host_duration: 279, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -2732,7 +2740,7 @@ export class MockAPI { host_duration: 1897, device_duration: 58, self_host_duration: 1201, - self_device_duration: 58, + self_device_duration: 58 }, { name: 'aten::empty_like', @@ -2740,7 +2748,7 @@ export class MockAPI { host_duration: 933, device_duration: 0, self_host_duration: 284, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::view', @@ -2748,7 +2756,7 @@ export class MockAPI { host_duration: 130, device_duration: 0, self_host_duration: 130, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm', @@ -2756,7 +2764,7 @@ export class MockAPI { host_duration: 5540, device_duration: 12913, self_host_duration: 2504, - self_device_duration: 12913, + self_device_duration: 12913 }, { name: 'aten::_batch_norm_impl_index', @@ -2764,7 +2772,7 @@ export class MockAPI { host_duration: 5942, device_duration: 12913, self_host_duration: 402, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::batch_norm', @@ -2772,7 +2780,7 @@ export class MockAPI { host_duration: 6219, device_duration: 12913, self_host_duration: 277, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::clamp_min', @@ -2780,7 +2788,7 @@ export class MockAPI { host_duration: 1108, device_duration: 6006, self_host_duration: 523, - self_device_duration: 6006, + self_device_duration: 6006 }, { name: 'aten::clamp_min_', @@ -2788,7 +2796,7 @@ export class MockAPI { host_duration: 1315, device_duration: 6006, self_host_duration: 207, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::relu_', @@ -2796,7 +2804,7 @@ export class MockAPI { host_duration: 1939, device_duration: 6006, self_host_duration: 624, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices', @@ -2804,7 +2812,7 @@ export class MockAPI { host_duration: 53, device_duration: 472, self_host_duration: 38, - self_device_duration: 472, + self_device_duration: 472 }, { name: 'aten::max_pool2d', @@ -2812,7 +2820,7 @@ export class MockAPI { host_duration: 61, device_duration: 472, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -2820,7 +2828,7 @@ export class MockAPI { host_duration: 448, device_duration: 5140, self_host_duration: 268, - self_device_duration: 5140, + self_device_duration: 5140 }, { name: 'aten::mean', @@ 
-2828,7 +2836,7 @@ export class MockAPI { host_duration: 53, device_duration: 63, self_host_duration: 39, - self_device_duration: 63, + self_device_duration: 63 }, { name: 'aten::adaptive_avg_pool2d', @@ -2836,7 +2844,7 @@ export class MockAPI { host_duration: 59, device_duration: 63, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -2844,7 +2852,7 @@ export class MockAPI { host_duration: 8, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::flatten', @@ -2852,7 +2860,7 @@ export class MockAPI { host_duration: 15, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -2860,7 +2868,7 @@ export class MockAPI { host_duration: 3, device_duration: 0, self_host_duration: 3, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -2868,7 +2876,7 @@ export class MockAPI { host_duration: 8, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -2876,7 +2884,7 @@ export class MockAPI { host_duration: 15, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -2884,7 +2892,7 @@ export class MockAPI { host_duration: 6, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::addmm', @@ -2892,7 +2900,7 @@ export class MockAPI { host_duration: 173, device_duration: 42, self_host_duration: 123, - self_device_duration: 42, + self_device_duration: 42 }, { name: 'aten::linear', @@ -2900,11 +2908,11 @@ export class MockAPI { host_duration: 198, device_duration: 42, self_host_duration: 10, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-3', + path: '0-3' }, { left: { @@ -2919,7 +2927,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax', @@ -2927,7 +2935,7 @@ export class MockAPI { host_duration: 158, device_duration: 7, self_host_duration: 139, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::log_softmax', @@ -2935,7 +2943,7 @@ export class MockAPI { host_duration: 241, device_duration: 7, self_host_duration: 78, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -2943,7 +2951,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_forward', @@ -2951,7 +2959,7 @@ export class MockAPI { host_duration: 256, device_duration: 4, self_host_duration: 233, - self_device_duration: 4, + self_device_duration: 4 }, { name: 'aten::nll_loss', @@ -2959,7 +2967,7 @@ export class MockAPI { host_duration: 290, device_duration: 4, self_host_duration: 34, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_nd', @@ -2967,7 +2975,7 @@ export class MockAPI { host_duration: 313, device_duration: 4, self_host_duration: 23, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cross_entropy_loss', @@ -2975,9 +2983,9 @@ export class MockAPI { host_duration: 614, device_duration: 11, self_host_duration: 60, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: CrossEntropyLoss', @@ -2991,7 +2999,7 @@ 
export class MockAPI { host_duration: 2, device_duration: 0, self_host_duration: 2, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax', @@ -2999,7 +3007,7 @@ export class MockAPI { host_duration: 42, device_duration: 7, self_host_duration: 28, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::log_softmax', @@ -3007,7 +3015,7 @@ export class MockAPI { host_duration: 54, device_duration: 7, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -3015,7 +3023,7 @@ export class MockAPI { host_duration: 0, device_duration: 0, self_host_duration: 0, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_forward', @@ -3023,7 +3031,7 @@ export class MockAPI { host_duration: 47, device_duration: 4, self_host_duration: 34, - self_device_duration: 4, + self_device_duration: 4 }, { name: 'aten::nll_loss', @@ -3031,7 +3039,7 @@ export class MockAPI { host_duration: 52, device_duration: 4, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_nd', @@ -3039,7 +3047,7 @@ export class MockAPI { host_duration: 56, device_duration: 4, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cross_entropy_loss', @@ -3047,11 +3055,11 @@ export class MockAPI { host_duration: 119, device_duration: 11, self_host_duration: 9, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-4', + path: '0-4' }, { left: { @@ -3066,7 +3074,7 @@ export class MockAPI { host_duration: 47, device_duration: 0, self_host_duration: 47, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -3074,7 +3082,7 @@ export class MockAPI { host_duration: 4, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -3082,9 +3090,9 @@ export class MockAPI { host_duration: 119, device_duration: 0, self_host_duration: 68, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'aten::zeros', @@ -3098,7 +3106,7 @@ export class MockAPI { host_duration: 8, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -3106,7 +3114,7 @@ export class MockAPI { host_duration: 2, device_duration: 0, self_host_duration: 2, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -3114,11 +3122,11 @@ export class MockAPI { host_duration: 17, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-5', + path: '0-5' }, { left: { @@ -3133,7 +3141,7 @@ export class MockAPI { host_duration: 38, device_duration: 0, self_host_duration: 38, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -3141,7 +3149,7 @@ export class MockAPI { host_duration: 7097, device_duration: 142, self_host_duration: 4914, - self_device_duration: 142, + self_device_duration: 142 }, { name: 'aten::zero_', @@ -3149,9 +3157,9 @@ export class MockAPI { host_duration: 14725, device_duration: 142, self_host_duration: 7628, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'Optimizer.zero_grad#SGD.zero_grad', @@ -3165,7 +3173,7 @@ export class MockAPI { host_duration: 6, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ 
-3173,7 +3181,7 @@ export class MockAPI { host_duration: 2036, device_duration: 264, self_host_duration: 909, - self_device_duration: 264, + self_device_duration: 264 }, { name: 'aten::zero_', @@ -3181,11 +3189,11 @@ export class MockAPI { host_duration: 2855, device_duration: 264, self_host_duration: 819, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-6', + path: '0-6' }, { left: { @@ -3200,7 +3208,7 @@ export class MockAPI { host_duration: 79, device_duration: 0, self_host_duration: 79, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_like', @@ -3208,7 +3216,7 @@ export class MockAPI { host_duration: 126, device_duration: 0, self_host_duration: 47, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -3216,7 +3224,7 @@ export class MockAPI { host_duration: 50, device_duration: 1, self_host_duration: 35, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::ones_like', @@ -3224,9 +3232,9 @@ export class MockAPI { host_duration: 253, device_duration: 1, self_host_duration: 77, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'aten::ones_like', @@ -3240,7 +3248,7 @@ export class MockAPI { host_duration: 18, device_duration: 0, self_host_duration: 18, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_like', @@ -3248,7 +3256,7 @@ export class MockAPI { host_duration: 26, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -3256,7 +3264,7 @@ export class MockAPI { host_duration: 20, device_duration: 1, self_host_duration: 8, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::ones_like', @@ -3264,11 +3272,11 @@ export class MockAPI { host_duration: 53, device_duration: 1, self_host_duration: 7, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-7', + path: '0-7' }, { left: { @@ -3283,7 +3291,7 @@ export class MockAPI { host_duration: 69, device_duration: 1, self_host_duration: 43, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::zero_', @@ -3291,7 +3299,7 @@ export class MockAPI { host_duration: 120, device_duration: 1, self_host_duration: 51, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_backward', @@ -3299,7 +3307,7 @@ export class MockAPI { host_duration: 304, device_duration: 4, self_host_duration: 168, - self_device_duration: 3, + self_device_duration: 3 }, { name: 'NllLossBackward0', @@ -3307,7 +3315,7 @@ export class MockAPI { host_duration: 368, device_duration: 4, self_host_duration: 64, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: NllLossBackward0', @@ -3315,7 +3323,7 @@ export class MockAPI { host_duration: 503, device_duration: 4, self_host_duration: 135, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax_backward_data', @@ -3323,7 +3331,7 @@ export class MockAPI { host_duration: 127, device_duration: 9, self_host_duration: 105, - self_device_duration: 9, + self_device_duration: 9 }, { name: 'LogSoftmaxBackward0', @@ -3331,17 +3339,18 @@ export class MockAPI { host_duration: 207, device_duration: 9, self_host_duration: 80, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: LogSoftmaxBackward0', + name: + 'autograd::engine::evaluate_function: LogSoftmaxBackward0', calls: 1, host_duration: 
349, device_duration: 9, self_host_duration: 142, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: CrossEntropyLoss.backward', @@ -3355,7 +3364,7 @@ export class MockAPI { host_duration: 36, device_duration: 2, self_host_duration: 13, - self_device_duration: 2, + self_device_duration: 2 }, { name: 'aten::zero_', @@ -3363,7 +3372,7 @@ export class MockAPI { host_duration: 45, device_duration: 2, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_backward', @@ -3371,7 +3380,7 @@ export class MockAPI { host_duration: 99, device_duration: 5, self_host_duration: 43, - self_device_duration: 3, + self_device_duration: 3 }, { name: 'NllLossBackward0', @@ -3379,7 +3388,7 @@ export class MockAPI { host_duration: 112, device_duration: 5, self_host_duration: 13, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: NllLossBackward0', @@ -3387,7 +3396,7 @@ export class MockAPI { host_duration: 141, device_duration: 5, self_host_duration: 29, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax_backward_data', @@ -3395,7 +3404,7 @@ export class MockAPI { host_duration: 35, device_duration: 9, self_host_duration: 21, - self_device_duration: 9, + self_device_duration: 9 }, { name: 'LogSoftmaxBackward0', @@ -3403,19 +3412,20 @@ export class MockAPI { host_duration: 46, device_duration: 9, self_host_duration: 11, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: LogSoftmaxBackward0', + name: + 'autograd::engine::evaluate_function: LogSoftmaxBackward0', calls: 1, host_duration: 64, device_duration: 9, self_host_duration: 18, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-8', + path: '0-8' }, { left: { @@ -3430,7 +3440,7 @@ export class MockAPI { host_duration: 61, device_duration: 0, self_host_duration: 61, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -3438,7 +3448,7 @@ export class MockAPI { host_duration: 226, device_duration: 0, self_host_duration: 180, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -3446,7 +3456,7 @@ export class MockAPI { host_duration: 399, device_duration: 0, self_host_duration: 173, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mm', @@ -3454,7 +3464,7 @@ export class MockAPI { host_duration: 345, device_duration: 72, self_host_duration: 282, - self_device_duration: 72, + self_device_duration: 72 }, { name: 'AddmmBackward0', @@ -3462,7 +3472,7 @@ export class MockAPI { host_duration: 854, device_duration: 72, self_host_duration: 208, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sum', @@ -3470,7 +3480,7 @@ export class MockAPI { host_duration: 173, device_duration: 8, self_host_duration: 153, - self_device_duration: 8, + self_device_duration: 8 }, { name: 'aten::view', @@ -3478,7 +3488,7 @@ export class MockAPI { host_duration: 971, device_duration: 0, self_host_duration: 971, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddmmBackward0', @@ -3486,7 +3496,7 @@ export class MockAPI { host_duration: 1333, device_duration: 80, self_host_duration: 271, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -3494,7 +3504,7 @@ export class MockAPI { host_duration: 12621, device_duration: 501, self_host_duration: 9839, - 
self_device_duration: 501, + self_device_duration: 501 }, { name: 'torch::autograd::AccumulateGrad', @@ -3502,15 +3512,16 @@ export class MockAPI { host_duration: 20767, device_duration: 501, self_host_duration: 8146, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', + name: + 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', calls: 161, host_duration: 35735, device_duration: 501, self_host_duration: 14968, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'TBackward0', @@ -3518,7 +3529,7 @@ export class MockAPI { host_duration: 128, device_duration: 0, self_host_duration: 30, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: TBackward0', @@ -3526,7 +3537,7 @@ export class MockAPI { host_duration: 197, device_duration: 0, self_host_duration: 69, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -3534,7 +3545,7 @@ export class MockAPI { host_duration: 31, device_duration: 0, self_host_duration: 31, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::reshape', @@ -3542,7 +3553,7 @@ export class MockAPI { host_duration: 79, device_duration: 0, self_host_duration: 48, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'ReshapeAliasBackward0', @@ -3550,15 +3561,16 @@ export class MockAPI { host_duration: 131, device_duration: 0, self_host_duration: 52, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: ReshapeAliasBackward0', + name: + 'autograd::engine::evaluate_function: ReshapeAliasBackward0', calls: 1, host_duration: 197, device_duration: 0, self_host_duration: 66, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -3566,7 +3578,7 @@ export class MockAPI { host_duration: 84, device_duration: 0, self_host_duration: 69, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -3574,7 +3586,7 @@ export class MockAPI { host_duration: 6, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -3582,7 +3594,7 @@ export class MockAPI { host_duration: 289, device_duration: 38, self_host_duration: 267, - self_device_duration: 38, + self_device_duration: 38 }, { name: 'MeanBackward1', @@ -3590,7 +3602,7 @@ export class MockAPI { host_duration: 489, device_duration: 38, self_host_duration: 110, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: MeanBackward1', @@ -3598,7 +3610,7 @@ export class MockAPI { host_duration: 592, device_duration: 38, self_host_duration: 103, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::threshold_backward', @@ -3606,7 +3618,7 @@ export class MockAPI { host_duration: 6958, device_duration: 8972, self_host_duration: 6094, - self_device_duration: 8972, + self_device_duration: 8972 }, { name: 'ReluBackward0', @@ -3614,7 +3626,7 @@ export class MockAPI { host_duration: 10647, device_duration: 8972, self_host_duration: 3689, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReluBackward0', @@ -3622,7 +3634,7 @@ export class MockAPI { host_duration: 16826, device_duration: 8972, self_host_duration: 6179, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'AddBackward0', @@ -3630,7 +3642,7 @@ export class MockAPI { 
host_duration: 129, device_duration: 0, self_host_duration: 129, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddBackward0', @@ -3638,7 +3650,7 @@ export class MockAPI { host_duration: 1301, device_duration: 0, self_host_duration: 1172, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty', @@ -3646,7 +3658,7 @@ export class MockAPI { host_duration: 20319, device_duration: 0, self_host_duration: 20319, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm_backward', @@ -3654,7 +3666,7 @@ export class MockAPI { host_duration: 31300, device_duration: 22267, self_host_duration: 18144, - self_device_duration: 22267, + self_device_duration: 22267 }, { name: 'CudnnBatchNormBackward0', @@ -3662,15 +3674,16 @@ export class MockAPI { host_duration: 34805, device_duration: 22267, self_host_duration: 3505, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', + name: + 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', calls: 53, host_duration: 44607, device_duration: 22267, self_host_duration: 9802, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution_backward_input', @@ -3678,7 +3691,7 @@ export class MockAPI { host_duration: 20324, device_duration: 38733, self_host_duration: 15252, - self_device_duration: 38733, + self_device_duration: 38733 }, { name: 'aten::cudnn_convolution_backward_weight', @@ -3686,7 +3699,7 @@ export class MockAPI { host_duration: 21997, device_duration: 45837, self_host_duration: 13786, - self_device_duration: 45837, + self_device_duration: 45837 }, { name: 'aten::cudnn_convolution_backward', @@ -3694,7 +3707,7 @@ export class MockAPI { host_duration: 50059, device_duration: 84570, self_host_duration: 7738, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'CudnnConvolutionBackward0', @@ -3702,15 +3715,16 @@ export class MockAPI { host_duration: 53558, device_duration: 84570, self_host_duration: 3499, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', + name: + 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', calls: 53, host_duration: 64252, device_duration: 89775, self_host_duration: 8462, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -3718,7 +3732,7 @@ export class MockAPI { host_duration: 2232, device_duration: 5205, self_host_duration: 1944, - self_device_duration: 5205, + self_device_duration: 5205 }, { name: 'aten::fill_', @@ -3726,7 +3740,7 @@ export class MockAPI { host_duration: 61, device_duration: 230, self_host_duration: 44, - self_device_duration: 230, + self_device_duration: 230 }, { name: 'aten::zero_', @@ -3734,7 +3748,7 @@ export class MockAPI { host_duration: 104, device_duration: 230, self_host_duration: 43, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices_backward', @@ -3742,7 +3756,7 @@ export class MockAPI { host_duration: 246, device_duration: 1544, self_host_duration: 128, - self_device_duration: 1314, + self_device_duration: 1314 }, { name: 'MaxPool2DWithIndicesBackward0', @@ -3750,17 +3764,18 @@ export class MockAPI { host_duration: 304, device_duration: 1544, self_host_duration: 58, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', + name: 
+ 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', calls: 1, host_duration: 425, device_duration: 1544, self_host_duration: 121, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: ResNet.backward', @@ -3774,7 +3789,7 @@ export class MockAPI { host_duration: 9, device_duration: 0, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -3782,7 +3797,7 @@ export class MockAPI { host_duration: 38, device_duration: 0, self_host_duration: 31, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -3790,7 +3805,7 @@ export class MockAPI { host_duration: 59, device_duration: 0, self_host_duration: 21, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mm', @@ -3798,7 +3813,7 @@ export class MockAPI { host_duration: 139, device_duration: 67, self_host_duration: 90, - self_device_duration: 67, + self_device_duration: 67 }, { name: 'AddmmBackward0', @@ -3806,7 +3821,7 @@ export class MockAPI { host_duration: 210, device_duration: 67, self_host_duration: 23, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sum', @@ -3814,7 +3829,7 @@ export class MockAPI { host_duration: 47, device_duration: 7, self_host_duration: 32, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::view', @@ -3822,7 +3837,7 @@ export class MockAPI { host_duration: 166, device_duration: 0, self_host_duration: 166, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddmmBackward0', @@ -3830,7 +3845,7 @@ export class MockAPI { host_duration: 299, device_duration: 74, self_host_duration: 37, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -3838,7 +3853,7 @@ export class MockAPI { host_duration: 4087, device_duration: 534, self_host_duration: 2037, - self_device_duration: 534, + self_device_duration: 534 }, { name: 'torch::autograd::AccumulateGrad', @@ -3846,15 +3861,16 @@ export class MockAPI { host_duration: 5134, device_duration: 534, self_host_duration: 1047, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', + name: + 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', calls: 161, host_duration: 7473, device_duration: 534, self_host_duration: 2339, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'TBackward0', @@ -3862,7 +3878,7 @@ export class MockAPI { host_duration: 14, device_duration: 0, self_host_duration: 3, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: TBackward0', @@ -3870,7 +3886,7 @@ export class MockAPI { host_duration: 21, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -3878,7 +3894,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::reshape', @@ -3886,7 +3902,7 @@ export class MockAPI { host_duration: 10, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'ReshapeAliasBackward0', @@ -3894,15 +3910,16 @@ export class MockAPI { host_duration: 14, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: 
ReshapeAliasBackward0', + name: + 'autograd::engine::evaluate_function: ReshapeAliasBackward0', calls: 1, host_duration: 21, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -3910,7 +3927,7 @@ export class MockAPI { host_duration: 9, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -3918,7 +3935,7 @@ export class MockAPI { host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -3926,7 +3943,7 @@ export class MockAPI { host_duration: 70, device_duration: 38, self_host_duration: 49, - self_device_duration: 38, + self_device_duration: 38 }, { name: 'MeanBackward1', @@ -3934,7 +3951,7 @@ export class MockAPI { host_duration: 89, device_duration: 38, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: MeanBackward1', @@ -3942,7 +3959,7 @@ export class MockAPI { host_duration: 102, device_duration: 38, self_host_duration: 13, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::threshold_backward', @@ -3950,7 +3967,7 @@ export class MockAPI { host_duration: 1789, device_duration: 9015, self_host_duration: 1158, - self_device_duration: 9015, + self_device_duration: 9015 }, { name: 'ReluBackward0', @@ -3958,7 +3975,7 @@ export class MockAPI { host_duration: 2237, device_duration: 9015, self_host_duration: 448, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReluBackward0', @@ -3966,7 +3983,7 @@ export class MockAPI { host_duration: 3144, device_duration: 9015, self_host_duration: 907, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'AddBackward0', @@ -3974,7 +3991,7 @@ export class MockAPI { host_duration: 12, device_duration: 0, self_host_duration: 12, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddBackward0', @@ -3982,7 +3999,7 @@ export class MockAPI { host_duration: 126, device_duration: 0, self_host_duration: 114, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty', @@ -3990,7 +4007,7 @@ export class MockAPI { host_duration: 3292, device_duration: 0, self_host_duration: 3292, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm_backward', @@ -3998,7 +4015,7 @@ export class MockAPI { host_duration: 4896, device_duration: 22157, self_host_duration: 2136, - self_device_duration: 22157, + self_device_duration: 22157 }, { name: 'CudnnBatchNormBackward0', @@ -4006,15 +4023,16 @@ export class MockAPI { host_duration: 5495, device_duration: 22157, self_host_duration: 599, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', + name: + 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', calls: 53, host_duration: 7289, device_duration: 22157, self_host_duration: 1794, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution_backward_input', @@ -4022,7 +4040,7 @@ export class MockAPI { host_duration: 9468, device_duration: 37714, self_host_duration: 7052, - self_device_duration: 37714, + self_device_duration: 37714 }, { name: 'aten::cudnn_convolution_backward_weight', @@ -4030,7 +4048,7 @@ export class MockAPI { host_duration: 8906, device_duration: 44342, self_host_duration: 
5723, - self_device_duration: 44342, + self_device_duration: 44342 }, { name: 'aten::cudnn_convolution_backward', @@ -4038,7 +4056,7 @@ export class MockAPI { host_duration: 19611, device_duration: 82056, self_host_duration: 1237, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'CudnnConvolutionBackward0', @@ -4046,15 +4064,16 @@ export class MockAPI { host_duration: 20205, device_duration: 82056, self_host_duration: 594, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', + name: + 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', calls: 53, host_duration: 22185, device_duration: 87283, self_host_duration: 1386, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -4062,7 +4081,7 @@ export class MockAPI { host_duration: 594, device_duration: 5227, self_host_duration: 380, - self_device_duration: 5227, + self_device_duration: 5227 }, { name: 'aten::fill_', @@ -4070,7 +4089,7 @@ export class MockAPI { host_duration: 24, device_duration: 230, self_host_duration: 11, - self_device_duration: 230, + self_device_duration: 230 }, { name: 'aten::zero_', @@ -4078,7 +4097,7 @@ export class MockAPI { host_duration: 32, device_duration: 230, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices_backward', @@ -4086,7 +4105,7 @@ export class MockAPI { host_duration: 72, device_duration: 1503, self_host_duration: 31, - self_device_duration: 1273, + self_device_duration: 1273 }, { name: 'MaxPool2DWithIndicesBackward0', @@ -4094,19 +4113,20 @@ export class MockAPI { host_duration: 82, device_duration: 1503, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', + name: + 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', calls: 1, host_duration: 103, device_duration: 1503, self_host_duration: 21, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-9', + path: '0-9' }, { left: { @@ -4121,7 +4141,7 @@ export class MockAPI { host_duration: 75, device_duration: 0, self_host_duration: 75, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -4129,7 +4149,7 @@ export class MockAPI { host_duration: 4, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -4137,9 +4157,9 @@ export class MockAPI { host_duration: 154, device_duration: 0, self_host_duration: 75, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'aten::zeros', @@ -4153,7 +4173,7 @@ export class MockAPI { host_duration: 32, device_duration: 0, self_host_duration: 32, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -4161,7 +4181,7 @@ export class MockAPI { host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -4169,11 +4189,11 @@ export class MockAPI { host_duration: 42, device_duration: 0, self_host_duration: 9, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-10', + path: '0-10' }, { left: { @@ -4188,7 +4208,7 @@ export class MockAPI { host_duration: 40, device_duration: 0, self_host_duration: 40, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mul_', @@ -4196,7 +4216,7 @@ export 
class MockAPI { host_duration: 11873, device_duration: 396, self_host_duration: 9505, - self_device_duration: 396, + self_device_duration: 396 }, { name: 'aten::add_', @@ -4204,9 +4224,9 @@ export class MockAPI { host_duration: 22327, device_duration: 893, self_host_duration: 17668, - self_device_duration: 893, - }, - ], + self_device_duration: 893 + } + ] }, right: { name: 'Optimizer.step#SGD.step', @@ -4220,7 +4240,7 @@ export class MockAPI { host_duration: 6, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mul_', @@ -4228,7 +4248,7 @@ export class MockAPI { host_duration: 3395, device_duration: 399, self_host_duration: 1806, - self_device_duration: 399, + self_device_duration: 399 }, { name: 'aten::add_', @@ -4236,11 +4256,11 @@ export class MockAPI { host_duration: 6217, device_duration: 906, self_host_duration: 3246, - self_device_duration: 906, - }, - ], + self_device_duration: 906 + } + ] }, - path: '0-11', + path: '0-11' }, { left: { @@ -4255,7 +4275,7 @@ export class MockAPI { host_duration: 79, device_duration: 0, self_host_duration: 79, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -4263,7 +4283,7 @@ export class MockAPI { host_duration: 4, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -4271,9 +4291,9 @@ export class MockAPI { host_duration: 106, device_duration: 0, self_host_duration: 62, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'multiple nodes', @@ -4287,7 +4307,7 @@ export class MockAPI { host_duration: 10, device_duration: 0, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -4295,7 +4315,7 @@ export class MockAPI { host_duration: 0, device_duration: 0, self_host_duration: 0, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -4303,11 +4323,11 @@ export class MockAPI { host_duration: 9, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-12', + path: '0-12' }, { left: { @@ -4322,7 +4342,7 @@ export class MockAPI { host_duration: 53837, device_duration: 0, self_host_duration: 53837, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -4330,7 +4350,7 @@ export class MockAPI { host_duration: 955, device_duration: 0, self_host_duration: 955, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -4338,7 +4358,7 @@ export class MockAPI { host_duration: 26673, device_duration: 0, self_host_duration: 16083, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -4346,7 +4366,7 @@ export class MockAPI { host_duration: 824006, device_duration: 0, self_host_duration: 18525, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach', @@ -4354,7 +4374,7 @@ export class MockAPI { host_duration: 2188, device_duration: 0, self_host_duration: 2188, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach', @@ -4362,7 +4382,7 @@ export class MockAPI { host_duration: 5295, device_duration: 0, self_host_duration: 3107, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -4370,7 +4390,7 @@ export class MockAPI { host_duration: 4123, device_duration: 0, self_host_duration: 4123, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::unsqueeze', @@ 
-4378,7 +4398,7 @@ export class MockAPI { host_duration: 9590, device_duration: 0, self_host_duration: 8097, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_strided', @@ -4386,7 +4406,7 @@ export class MockAPI { host_duration: 24764, device_duration: 0, self_host_duration: 24764, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -4394,7 +4414,7 @@ export class MockAPI { host_duration: 728608, device_duration: 0, self_host_duration: 728608, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_to_copy', @@ -4402,7 +4422,7 @@ export class MockAPI { host_duration: 805481, device_duration: 0, self_host_duration: 51350, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::upsample_bilinear2d', @@ -4410,7 +4430,7 @@ export class MockAPI { host_duration: 236448, device_duration: 0, self_host_duration: 216887, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::squeeze', @@ -4418,7 +4438,7 @@ export class MockAPI { host_duration: 4682, device_duration: 0, self_host_duration: 4092, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::round', @@ -4426,7 +4446,7 @@ export class MockAPI { host_duration: 15283, device_duration: 0, self_host_duration: 15283, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::slice', @@ -4434,7 +4454,7 @@ export class MockAPI { host_duration: 8844, device_duration: 0, self_host_duration: 7513, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach_', @@ -4442,7 +4462,7 @@ export class MockAPI { host_duration: 2102, device_duration: 0, self_host_duration: 2102, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach_', @@ -4450,7 +4470,7 @@ export class MockAPI { host_duration: 7286, device_duration: 0, self_host_duration: 5184, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::result_type', @@ -4458,7 +4478,7 @@ export class MockAPI { host_duration: 850, device_duration: 0, self_host_duration: 850, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::pow', @@ -4466,7 +4486,7 @@ export class MockAPI { host_duration: 43219, device_duration: 0, self_host_duration: 39305, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sub', @@ -4474,7 +4494,7 @@ export class MockAPI { host_duration: 92093, device_duration: 0, self_host_duration: 37961, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::gt', @@ -4482,7 +4502,7 @@ export class MockAPI { host_duration: 35770, device_duration: 0, self_host_duration: 24869, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_local_scalar_dense', @@ -4490,7 +4510,7 @@ export class MockAPI { host_duration: 2481, device_duration: 0, self_host_duration: 2481, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::item', @@ -4498,7 +4518,7 @@ export class MockAPI { host_duration: 10547, device_duration: 0, self_host_duration: 8066, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::is_nonzero', @@ -4506,7 +4526,7 @@ export class MockAPI { host_duration: 14029, device_duration: 0, self_host_duration: 5364, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -4514,7 +4534,7 @@ export class MockAPI { host_duration: 79760, device_duration: 0, self_host_duration: 68841, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -4522,7 +4542,7 @@ export class MockAPI { 
host_duration: 121, device_duration: 0, self_host_duration: 121, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::narrow', @@ -4530,7 +4550,7 @@ export class MockAPI { host_duration: 138, device_duration: 0, self_host_duration: 48, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_cat', @@ -4538,7 +4558,7 @@ export class MockAPI { host_duration: 41467, device_duration: 0, self_host_duration: 41176, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cat', @@ -4546,7 +4566,7 @@ export class MockAPI { host_duration: 41608, device_duration: 0, self_host_duration: 141, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::stack', @@ -4554,9 +4574,9 @@ export class MockAPI { host_duration: 49080, device_duration: 0, self_host_duration: 2720, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'enumerate(DataLoader)#_SingleProcessDataLoaderIter.__next__', @@ -4570,7 +4590,7 @@ export class MockAPI { host_duration: 6528, device_duration: 0, self_host_duration: 6528, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -4578,7 +4598,7 @@ export class MockAPI { host_duration: 94, device_duration: 0, self_host_duration: 94, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -4586,7 +4606,7 @@ export class MockAPI { host_duration: 2448, device_duration: 0, self_host_duration: 1214, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -4594,7 +4614,7 @@ export class MockAPI { host_duration: 16544, device_duration: 0, self_host_duration: 1856, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach', @@ -4602,7 +4622,7 @@ export class MockAPI { host_duration: 337, device_duration: 0, self_host_duration: 337, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach', @@ -4610,7 +4630,7 @@ export class MockAPI { host_duration: 629, device_duration: 0, self_host_duration: 292, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -4618,7 +4638,7 @@ export class MockAPI { host_duration: 464, device_duration: 0, self_host_duration: 464, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::unsqueeze', @@ -4626,7 +4646,7 @@ export class MockAPI { host_duration: 1024, device_duration: 0, self_host_duration: 854, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_strided', @@ -4634,7 +4654,7 @@ export class MockAPI { host_duration: 3009, device_duration: 0, self_host_duration: 3009, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -4642,7 +4662,7 @@ export class MockAPI { host_duration: 7419, device_duration: 0, self_host_duration: 7419, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_to_copy', @@ -4650,7 +4670,7 @@ export class MockAPI { host_duration: 14688, device_duration: 0, self_host_duration: 4039, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::upsample_bilinear2d', @@ -4658,7 +4678,7 @@ export class MockAPI { host_duration: 31439, device_duration: 0, self_host_duration: 29154, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::squeeze', @@ -4666,7 +4686,7 @@ export class MockAPI { host_duration: 473, device_duration: 0, self_host_duration: 408, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::round', @@ -4674,7 +4694,7 @@ export class MockAPI { host_duration: 4416, 
device_duration: 0, self_host_duration: 4416, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::slice', @@ -4682,7 +4702,7 @@ export class MockAPI { host_duration: 864, device_duration: 0, self_host_duration: 730, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'detach_', @@ -4690,7 +4710,7 @@ export class MockAPI { host_duration: 136, device_duration: 0, self_host_duration: 115, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::detach_', @@ -4698,7 +4718,7 @@ export class MockAPI { host_duration: 586, device_duration: 0, self_host_duration: 471, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::result_type', @@ -4706,7 +4726,7 @@ export class MockAPI { host_duration: 149, device_duration: 0, self_host_duration: 149, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::pow', @@ -4714,7 +4734,7 @@ export class MockAPI { host_duration: 3935, device_duration: 0, self_host_duration: 3519, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sub', @@ -4722,7 +4742,7 @@ export class MockAPI { host_duration: 7881, device_duration: 0, self_host_duration: 3349, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::gt', @@ -4730,7 +4750,7 @@ export class MockAPI { host_duration: 3055, device_duration: 0, self_host_duration: 2164, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_local_scalar_dense', @@ -4738,7 +4758,7 @@ export class MockAPI { host_duration: 186, device_duration: 0, self_host_duration: 186, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::item', @@ -4746,7 +4766,7 @@ export class MockAPI { host_duration: 1134, device_duration: 0, self_host_duration: 943, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::is_nonzero', @@ -4754,7 +4774,7 @@ export class MockAPI { host_duration: 1588, device_duration: 0, self_host_duration: 615, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -4762,7 +4782,7 @@ export class MockAPI { host_duration: 4153, device_duration: 0, self_host_duration: 3203, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -4770,7 +4790,7 @@ export class MockAPI { host_duration: 42, device_duration: 0, self_host_duration: 42, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::narrow', @@ -4778,7 +4798,7 @@ export class MockAPI { host_duration: 18, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_cat', @@ -4786,7 +4806,7 @@ export class MockAPI { host_duration: 4613, device_duration: 0, self_host_duration: 4547, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cat', @@ -4794,7 +4814,7 @@ export class MockAPI { host_duration: 4637, device_duration: 0, self_host_duration: 24, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::stack', @@ -4802,11 +4822,11 @@ export class MockAPI { host_duration: 5311, device_duration: 0, self_host_duration: 246, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-13', + path: '0-13' }, { left: { @@ -4821,7 +4841,7 @@ export class MockAPI { host_duration: 203, device_duration: 0, self_host_duration: 203, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -4829,7 +4849,7 @@ export class MockAPI { host_duration: 4687, device_duration: 4394, self_host_duration: 94, - self_device_duration: 4394, + 
self_device_duration: 4394 }, { name: 'aten::_to_copy', @@ -4837,7 +4857,7 @@ export class MockAPI { host_duration: 5113, device_duration: 4394, self_host_duration: 223, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -4845,9 +4865,9 @@ export class MockAPI { host_duration: 5185, device_duration: 4394, self_host_duration: 72, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'multiple nodes', @@ -4861,7 +4881,7 @@ export class MockAPI { host_duration: 60, device_duration: 0, self_host_duration: 60, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::copy_', @@ -4869,7 +4889,7 @@ export class MockAPI { host_duration: 4559, device_duration: 4334, self_host_duration: 26, - self_device_duration: 4334, + self_device_duration: 4334 }, { name: 'aten::_to_copy', @@ -4877,7 +4897,7 @@ export class MockAPI { host_duration: 4655, device_duration: 4334, self_host_duration: 36, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -4885,11 +4905,11 @@ export class MockAPI { host_duration: 4664, device_duration: 4334, self_host_duration: 9, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-14', + path: '0-14' }, { left: { @@ -4904,7 +4924,7 @@ export class MockAPI { host_duration: 13992, device_duration: 0, self_host_duration: 13992, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution', @@ -4912,7 +4932,7 @@ export class MockAPI { host_duration: 21952, device_duration: 35233, self_host_duration: 17460, - self_device_duration: 35233, + self_device_duration: 35233 }, { name: 'aten::_convolution', @@ -4920,7 +4940,7 @@ export class MockAPI { host_duration: 25568, device_duration: 35233, self_host_duration: 3616, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::convolution', @@ -4928,7 +4948,7 @@ export class MockAPI { host_duration: 27534, device_duration: 35233, self_host_duration: 1966, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::conv2d', @@ -4936,7 +4956,7 @@ export class MockAPI { host_duration: 29546, device_duration: 35233, self_host_duration: 2012, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -4944,7 +4964,7 @@ export class MockAPI { host_duration: 6523, device_duration: 53, self_host_duration: 5669, - self_device_duration: 53, + self_device_duration: 53 }, { name: 'aten::empty_like', @@ -4952,7 +4972,7 @@ export class MockAPI { host_duration: 5605, device_duration: 0, self_host_duration: 2378, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::view', @@ -4960,7 +4980,7 @@ export class MockAPI { host_duration: 829, device_duration: 0, self_host_duration: 829, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm', @@ -4968,7 +4988,7 @@ export class MockAPI { host_duration: 35510, device_duration: 12828, self_host_duration: 20387, - self_device_duration: 12828, + self_device_duration: 12828 }, { name: 'aten::_batch_norm_impl_index', @@ -4976,7 +4996,7 @@ export class MockAPI { host_duration: 38030, device_duration: 12828, self_host_duration: 2520, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::batch_norm', @@ -4984,7 +5004,7 @@ export class MockAPI { host_duration: 39727, device_duration: 12828, self_host_duration: 1697, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::clamp_min', @@ -4992,7 +5012,7 @@ export class MockAPI { 
host_duration: 2715, device_duration: 5998, self_host_duration: 1950, - self_device_duration: 5998, + self_device_duration: 5998 }, { name: 'aten::clamp_min_', @@ -5000,7 +5020,7 @@ export class MockAPI { host_duration: 4264, device_duration: 5998, self_host_duration: 1549, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::relu_', @@ -5008,7 +5028,7 @@ export class MockAPI { host_duration: 8337, device_duration: 5998, self_host_duration: 4073, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices', @@ -5016,7 +5036,7 @@ export class MockAPI { host_duration: 212, device_duration: 466, self_host_duration: 193, - self_device_duration: 466, + self_device_duration: 466 }, { name: 'aten::max_pool2d', @@ -5024,7 +5044,7 @@ export class MockAPI { host_duration: 262, device_duration: 466, self_host_duration: 50, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -5032,7 +5052,7 @@ export class MockAPI { host_duration: 1553, device_duration: 5165, self_host_duration: 1297, - self_device_duration: 5165, + self_device_duration: 5165 }, { name: 'aten::mean', @@ -5040,7 +5060,7 @@ export class MockAPI { host_duration: 187, device_duration: 64, self_host_duration: 169, - self_device_duration: 64, + self_device_duration: 64 }, { name: 'aten::adaptive_avg_pool2d', @@ -5048,7 +5068,7 @@ export class MockAPI { host_duration: 231, device_duration: 64, self_host_duration: 44, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -5056,7 +5076,7 @@ export class MockAPI { host_duration: 52, device_duration: 0, self_host_duration: 52, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::flatten', @@ -5064,7 +5084,7 @@ export class MockAPI { host_duration: 101, device_duration: 0, self_host_duration: 49, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -5072,7 +5092,7 @@ export class MockAPI { host_duration: 21, device_duration: 0, self_host_duration: 21, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -5080,7 +5100,7 @@ export class MockAPI { host_duration: 51, device_duration: 0, self_host_duration: 40, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -5088,7 +5108,7 @@ export class MockAPI { host_duration: 120, device_duration: 0, self_host_duration: 69, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -5096,7 +5116,7 @@ export class MockAPI { host_duration: 49, device_duration: 0, self_host_duration: 39, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::addmm', @@ -5104,7 +5124,7 @@ export class MockAPI { host_duration: 405, device_duration: 41, self_host_duration: 302, - self_device_duration: 41, + self_device_duration: 41 }, { name: 'aten::linear', @@ -5112,9 +5132,9 @@ export class MockAPI { host_duration: 594, device_duration: 41, self_host_duration: 69, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: ResNet', @@ -5128,7 +5148,7 @@ export class MockAPI { host_duration: 2234, device_duration: 0, self_host_duration: 2234, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution', @@ -5136,7 +5156,7 @@ export class MockAPI { host_duration: 8644, device_duration: 35209, self_host_duration: 6782, - self_device_duration: 35209, + self_device_duration: 35209 }, { name: 'aten::_convolution', @@ -5144,7 +5164,7 @@ export class 
MockAPI { host_duration: 9216, device_duration: 35209, self_host_duration: 572, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::convolution', @@ -5152,7 +5172,7 @@ export class MockAPI { host_duration: 9532, device_duration: 35209, self_host_duration: 316, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::conv2d', @@ -5160,7 +5180,7 @@ export class MockAPI { host_duration: 9818, device_duration: 35209, self_host_duration: 286, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -5168,7 +5188,7 @@ export class MockAPI { host_duration: 1898, device_duration: 55, self_host_duration: 1202, - self_device_duration: 55, + self_device_duration: 55 }, { name: 'aten::empty_like', @@ -5176,7 +5196,7 @@ export class MockAPI { host_duration: 941, device_duration: 0, self_host_duration: 300, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::view', @@ -5184,7 +5204,7 @@ export class MockAPI { host_duration: 137, device_duration: 0, self_host_duration: 137, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm', @@ -5192,7 +5212,7 @@ export class MockAPI { host_duration: 5543, device_duration: 12824, self_host_duration: 2527, - self_device_duration: 12824, + self_device_duration: 12824 }, { name: 'aten::_batch_norm_impl_index', @@ -5200,7 +5220,7 @@ export class MockAPI { host_duration: 5914, device_duration: 12824, self_host_duration: 371, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::batch_norm', @@ -5208,7 +5228,7 @@ export class MockAPI { host_duration: 6167, device_duration: 12824, self_host_duration: 253, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::clamp_min', @@ -5216,7 +5236,7 @@ export class MockAPI { host_duration: 1081, device_duration: 6004, self_host_duration: 507, - self_device_duration: 6004, + self_device_duration: 6004 }, { name: 'aten::clamp_min_', @@ -5224,7 +5244,7 @@ export class MockAPI { host_duration: 1299, device_duration: 6004, self_host_duration: 218, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::relu_', @@ -5232,7 +5252,7 @@ export class MockAPI { host_duration: 1941, device_duration: 6004, self_host_duration: 642, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices', @@ -5240,7 +5260,7 @@ export class MockAPI { host_duration: 59, device_duration: 466, self_host_duration: 44, - self_device_duration: 466, + self_device_duration: 466 }, { name: 'aten::max_pool2d', @@ -5248,7 +5268,7 @@ export class MockAPI { host_duration: 66, device_duration: 466, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -5256,7 +5276,7 @@ export class MockAPI { host_duration: 443, device_duration: 5169, self_host_duration: 267, - self_device_duration: 5169, + self_device_duration: 5169 }, { name: 'aten::mean', @@ -5264,7 +5284,7 @@ export class MockAPI { host_duration: 51, device_duration: 63, self_host_duration: 37, - self_device_duration: 63, + self_device_duration: 63 }, { name: 'aten::adaptive_avg_pool2d', @@ -5272,7 +5292,7 @@ export class MockAPI { host_duration: 58, device_duration: 63, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -5280,7 +5300,7 @@ export class MockAPI { host_duration: 8, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::flatten', @@ -5288,7 +5308,7 @@ 
export class MockAPI { host_duration: 16, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::as_strided', @@ -5296,7 +5316,7 @@ export class MockAPI { host_duration: 3, device_duration: 0, self_host_duration: 3, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -5304,7 +5324,7 @@ export class MockAPI { host_duration: 10, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -5312,7 +5332,7 @@ export class MockAPI { host_duration: 18, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -5320,7 +5340,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::addmm', @@ -5328,7 +5348,7 @@ export class MockAPI { host_duration: 161, device_duration: 42, self_host_duration: 111, - self_device_duration: 42, + self_device_duration: 42 }, { name: 'aten::linear', @@ -5336,11 +5356,11 @@ export class MockAPI { host_duration: 188, device_duration: 42, self_host_duration: 9, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-15', + path: '0-15' }, { left: { @@ -5355,7 +5375,7 @@ export class MockAPI { host_duration: 6, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax', @@ -5363,7 +5383,7 @@ export class MockAPI { host_duration: 150, device_duration: 7, self_host_duration: 132, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::log_softmax', @@ -5371,7 +5391,7 @@ export class MockAPI { host_duration: 231, device_duration: 7, self_host_duration: 75, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -5379,7 +5399,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_forward', @@ -5387,7 +5407,7 @@ export class MockAPI { host_duration: 266, device_duration: 4, self_host_duration: 243, - self_device_duration: 4, + self_device_duration: 4 }, { name: 'aten::nll_loss', @@ -5395,7 +5415,7 @@ export class MockAPI { host_duration: 300, device_duration: 4, self_host_duration: 34, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_nd', @@ -5403,7 +5423,7 @@ export class MockAPI { host_duration: 328, device_duration: 4, self_host_duration: 28, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cross_entropy_loss', @@ -5411,9 +5431,9 @@ export class MockAPI { host_duration: 620, device_duration: 11, self_host_duration: 61, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: CrossEntropyLoss', @@ -5427,7 +5447,7 @@ export class MockAPI { host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax', @@ -5435,7 +5455,7 @@ export class MockAPI { host_duration: 41, device_duration: 7, self_host_duration: 27, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::log_softmax', @@ -5443,7 +5463,7 @@ export class MockAPI { host_duration: 52, device_duration: 7, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::resize_', @@ -5451,7 +5471,7 @@ export class MockAPI { 
host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_forward', @@ -5459,7 +5479,7 @@ export class MockAPI { host_duration: 49, device_duration: 4, self_host_duration: 34, - self_device_duration: 4, + self_device_duration: 4 }, { name: 'aten::nll_loss', @@ -5467,7 +5487,7 @@ export class MockAPI { host_duration: 53, device_duration: 4, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_nd', @@ -5475,7 +5495,7 @@ export class MockAPI { host_duration: 57, device_duration: 4, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cross_entropy_loss', @@ -5483,11 +5503,11 @@ export class MockAPI { host_duration: 124, device_duration: 11, self_host_duration: 15, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-16', + path: '0-16' }, { left: { @@ -5502,7 +5522,7 @@ export class MockAPI { host_duration: 39, device_duration: 0, self_host_duration: 39, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -5510,7 +5530,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -5518,9 +5538,9 @@ export class MockAPI { host_duration: 109, device_duration: 0, self_host_duration: 65, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'aten::zeros', @@ -5534,7 +5554,7 @@ export class MockAPI { host_duration: 13, device_duration: 0, self_host_duration: 13, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -5542,7 +5562,7 @@ export class MockAPI { host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -5550,11 +5570,11 @@ export class MockAPI { host_duration: 23, device_duration: 0, self_host_duration: 9, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-17', + path: '0-17' }, { left: { @@ -5569,7 +5589,7 @@ export class MockAPI { host_duration: 44, device_duration: 0, self_host_duration: 44, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -5577,7 +5597,7 @@ export class MockAPI { host_duration: 7104, device_duration: 132, self_host_duration: 4941, - self_device_duration: 132, + self_device_duration: 132 }, { name: 'aten::zero_', @@ -5585,9 +5605,9 @@ export class MockAPI { host_duration: 14806, device_duration: 132, self_host_duration: 7702, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'Optimizer.zero_grad#SGD.zero_grad', @@ -5601,7 +5621,7 @@ export class MockAPI { host_duration: 6, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -5609,7 +5629,7 @@ export class MockAPI { host_duration: 1945, device_duration: 137, self_host_duration: 878, - self_device_duration: 137, + self_device_duration: 137 }, { name: 'aten::zero_', @@ -5617,11 +5637,11 @@ export class MockAPI { host_duration: 2805, device_duration: 137, self_host_duration: 860, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-18', + path: '0-18' }, { left: { @@ -5636,7 +5656,7 @@ export class MockAPI { host_duration: 99, device_duration: 0, self_host_duration: 99, - self_device_duration: 0, + self_device_duration: 0 }, { 
name: 'aten::empty_like', @@ -5644,7 +5664,7 @@ export class MockAPI { host_duration: 149, device_duration: 0, self_host_duration: 50, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -5652,7 +5672,7 @@ export class MockAPI { host_duration: 49, device_duration: 1, self_host_duration: 34, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::ones_like', @@ -5660,9 +5680,9 @@ export class MockAPI { host_duration: 263, device_duration: 1, self_host_duration: 65, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'aten::ones_like', @@ -5676,7 +5696,7 @@ export class MockAPI { host_duration: 18, device_duration: 0, self_host_duration: 18, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty_like', @@ -5684,7 +5704,7 @@ export class MockAPI { host_duration: 24, device_duration: 0, self_host_duration: 6, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::fill_', @@ -5692,7 +5712,7 @@ export class MockAPI { host_duration: 20, device_duration: 1, self_host_duration: 8, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::ones_like', @@ -5700,11 +5720,11 @@ export class MockAPI { host_duration: 51, device_duration: 1, self_host_duration: 7, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-19', + path: '0-19' }, { left: { @@ -5719,7 +5739,7 @@ export class MockAPI { host_duration: 58, device_duration: 1, self_host_duration: 36, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::zero_', @@ -5727,7 +5747,7 @@ export class MockAPI { host_duration: 112, device_duration: 1, self_host_duration: 54, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_backward', @@ -5735,7 +5755,7 @@ export class MockAPI { host_duration: 269, device_duration: 4, self_host_duration: 142, - self_device_duration: 3, + self_device_duration: 3 }, { name: 'NllLossBackward0', @@ -5743,7 +5763,7 @@ export class MockAPI { host_duration: 406, device_duration: 4, self_host_duration: 137, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: NllLossBackward0', @@ -5751,7 +5771,7 @@ export class MockAPI { host_duration: 522, device_duration: 4, self_host_duration: 116, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax_backward_data', @@ -5759,7 +5779,7 @@ export class MockAPI { host_duration: 109, device_duration: 9, self_host_duration: 91, - self_device_duration: 9, + self_device_duration: 9 }, { name: 'LogSoftmaxBackward0', @@ -5767,17 +5787,18 @@ export class MockAPI { host_duration: 178, device_duration: 9, self_host_duration: 69, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: LogSoftmaxBackward0', + name: + 'autograd::engine::evaluate_function: LogSoftmaxBackward0', calls: 1, host_duration: 283, device_duration: 9, self_host_duration: 105, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: CrossEntropyLoss.backward', @@ -5791,7 +5812,7 @@ export class MockAPI { host_duration: 33, device_duration: 1, self_host_duration: 12, - self_device_duration: 1, + self_device_duration: 1 }, { name: 'aten::zero_', @@ -5799,7 +5820,7 @@ export class MockAPI { host_duration: 41, device_duration: 1, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::nll_loss_backward', @@ -5807,7 +5828,7 @@ 
export class MockAPI { host_duration: 93, device_duration: 4, self_host_duration: 41, - self_device_duration: 3, + self_device_duration: 3 }, { name: 'NllLossBackward0', @@ -5815,7 +5836,7 @@ export class MockAPI { host_duration: 185, device_duration: 4, self_host_duration: 92, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: NllLossBackward0', @@ -5823,7 +5844,7 @@ export class MockAPI { host_duration: 211, device_duration: 4, self_host_duration: 26, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_log_softmax_backward_data', @@ -5831,7 +5852,7 @@ export class MockAPI { host_duration: 36, device_duration: 9, self_host_duration: 22, - self_device_duration: 9, + self_device_duration: 9 }, { name: 'LogSoftmaxBackward0', @@ -5839,19 +5860,20 @@ export class MockAPI { host_duration: 45, device_duration: 9, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: LogSoftmaxBackward0', + name: + 'autograd::engine::evaluate_function: LogSoftmaxBackward0', calls: 1, host_duration: 62, device_duration: 9, self_host_duration: 17, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-20', + path: '0-20' }, { left: { @@ -5866,7 +5888,7 @@ export class MockAPI { host_duration: 67, device_duration: 0, self_host_duration: 67, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -5874,7 +5896,7 @@ export class MockAPI { host_duration: 255, device_duration: 0, self_host_duration: 204, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::t', @@ -5882,7 +5904,7 @@ export class MockAPI { host_duration: 430, device_duration: 0, self_host_duration: 175, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mm', @@ -5890,7 +5912,7 @@ export class MockAPI { host_duration: 323, device_duration: 68, self_host_duration: 265, - self_device_duration: 68, + self_device_duration: 68 }, { name: 'AddmmBackward0', @@ -5898,7 +5920,7 @@ export class MockAPI { host_duration: 844, device_duration: 68, self_host_duration: 209, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sum', @@ -5906,7 +5928,7 @@ export class MockAPI { host_duration: 197, device_duration: 7, self_host_duration: 175, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::view', @@ -5914,7 +5936,7 @@ export class MockAPI { host_duration: 963, device_duration: 0, self_host_duration: 963, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddmmBackward0', @@ -5922,7 +5944,7 @@ export class MockAPI { host_duration: 1377, device_duration: 75, self_host_duration: 296, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -5930,7 +5952,7 @@ export class MockAPI { host_duration: 12404, device_duration: 496, self_host_duration: 9659, - self_device_duration: 496, + self_device_duration: 496 }, { name: 'torch::autograd::AccumulateGrad', @@ -5938,15 +5960,16 @@ export class MockAPI { host_duration: 20417, device_duration: 496, self_host_duration: 8013, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', + name: + 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', calls: 161, host_duration: 35211, device_duration: 496, self_host_duration: 14794, - self_device_duration: 0, + self_device_duration: 0 }, { name: 
'TBackward0', @@ -5954,7 +5977,7 @@ export class MockAPI { host_duration: 152, device_duration: 0, self_host_duration: 34, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: TBackward0', @@ -5962,7 +5985,7 @@ export class MockAPI { host_duration: 231, device_duration: 0, self_host_duration: 79, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -5970,7 +5993,7 @@ export class MockAPI { host_duration: 35, device_duration: 0, self_host_duration: 35, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::reshape', @@ -5978,7 +6001,7 @@ export class MockAPI { host_duration: 91, device_duration: 0, self_host_duration: 56, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'ReshapeAliasBackward0', @@ -5986,15 +6009,16 @@ export class MockAPI { host_duration: 133, device_duration: 0, self_host_duration: 42, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: ReshapeAliasBackward0', + name: + 'autograd::engine::evaluate_function: ReshapeAliasBackward0', calls: 1, host_duration: 205, device_duration: 0, self_host_duration: 72, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -6002,7 +6026,7 @@ export class MockAPI { host_duration: 95, device_duration: 0, self_host_duration: 79, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -6010,7 +6034,7 @@ export class MockAPI { host_duration: 7, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', @@ -6018,7 +6042,7 @@ export class MockAPI { host_duration: 324, device_duration: 37, self_host_duration: 301, - self_device_duration: 37, + self_device_duration: 37 }, { name: 'MeanBackward1', @@ -6026,7 +6050,7 @@ export class MockAPI { host_duration: 547, device_duration: 37, self_host_duration: 121, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: MeanBackward1', @@ -6034,7 +6058,7 @@ export class MockAPI { host_duration: 662, device_duration: 37, self_host_duration: 115, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::threshold_backward', @@ -6042,7 +6066,7 @@ export class MockAPI { host_duration: 6880, device_duration: 9012, self_host_duration: 6037, - self_device_duration: 9012, + self_device_duration: 9012 }, { name: 'ReluBackward0', @@ -6050,7 +6074,7 @@ export class MockAPI { host_duration: 10536, device_duration: 9012, self_host_duration: 3656, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReluBackward0', @@ -6058,7 +6082,7 @@ export class MockAPI { host_duration: 16666, device_duration: 9012, self_host_duration: 6130, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'AddBackward0', @@ -6066,7 +6090,7 @@ export class MockAPI { host_duration: 122, device_duration: 0, self_host_duration: 122, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddBackward0', @@ -6074,7 +6098,7 @@ export class MockAPI { host_duration: 1278, device_duration: 0, self_host_duration: 1156, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty', @@ -6082,7 +6106,7 @@ export class MockAPI { host_duration: 21126, device_duration: 0, self_host_duration: 21126, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm_backward', @@ -6090,7 
+6114,7 @@ export class MockAPI { host_duration: 30875, device_duration: 22166, self_host_duration: 17909, - self_device_duration: 22166, + self_device_duration: 22166 }, { name: 'CudnnBatchNormBackward0', @@ -6098,15 +6122,16 @@ export class MockAPI { host_duration: 34355, device_duration: 22166, self_host_duration: 3480, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', + name: + 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', calls: 53, host_duration: 44006, device_duration: 22166, self_host_duration: 9651, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution_backward_input', @@ -6114,7 +6139,7 @@ export class MockAPI { host_duration: 20496, device_duration: 37887, self_host_duration: 15516, - self_device_duration: 37887, + self_device_duration: 37887 }, { name: 'aten::cudnn_convolution_backward_weight', @@ -6122,7 +6147,7 @@ export class MockAPI { host_duration: 22878, device_duration: 44271, self_host_duration: 13672, - self_device_duration: 44271, + self_device_duration: 44271 }, { name: 'aten::cudnn_convolution_backward', @@ -6130,7 +6155,7 @@ export class MockAPI { host_duration: 50961, device_duration: 82158, self_host_duration: 7587, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'CudnnConvolutionBackward0', @@ -6138,15 +6163,16 @@ export class MockAPI { host_duration: 54406, device_duration: 82158, self_host_duration: 3445, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', + name: + 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', calls: 53, host_duration: 64877, device_duration: 87386, self_host_duration: 8284, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -6154,7 +6180,7 @@ export class MockAPI { host_duration: 2187, device_duration: 5228, self_host_duration: 1909, - self_device_duration: 5228, + self_device_duration: 5228 }, { name: 'aten::fill_', @@ -6162,7 +6188,7 @@ export class MockAPI { host_duration: 53, device_duration: 230, self_host_duration: 36, - self_device_duration: 230, + self_device_duration: 230 }, { name: 'aten::zero_', @@ -6170,7 +6196,7 @@ export class MockAPI { host_duration: 96, device_duration: 230, self_host_duration: 43, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices_backward', @@ -6178,7 +6204,7 @@ export class MockAPI { host_duration: 237, device_duration: 1504, self_host_duration: 129, - self_device_duration: 1274, + self_device_duration: 1274 }, { name: 'MaxPool2DWithIndicesBackward0', @@ -6186,17 +6212,18 @@ export class MockAPI { host_duration: 295, device_duration: 1504, self_host_duration: 58, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', + name: + 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', calls: 1, host_duration: 411, device_duration: 1504, self_host_duration: 116, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'nn.Module: ResNet.backward', @@ -6210,7 +6237,7 @@ export class MockAPI { host_duration: 7, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::transpose', @@ -6218,7 +6245,7 @@ export class MockAPI { host_duration: 29, device_duration: 0, self_host_duration: 23, - self_device_duration: 0, + 
self_device_duration: 0 }, { name: 'aten::t', @@ -6226,7 +6253,7 @@ export class MockAPI { host_duration: 53, device_duration: 0, self_host_duration: 24, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mm', @@ -6234,7 +6261,7 @@ export class MockAPI { host_duration: 144, device_duration: 67, self_host_duration: 96, - self_device_duration: 67, + self_device_duration: 67 }, { name: 'AddmmBackward0', @@ -6242,7 +6269,7 @@ export class MockAPI { host_duration: 208, device_duration: 67, self_host_duration: 24, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::sum', @@ -6250,7 +6277,7 @@ export class MockAPI { host_duration: 45, device_duration: 7, self_host_duration: 30, - self_device_duration: 7, + self_device_duration: 7 }, { name: 'aten::view', @@ -6258,7 +6285,7 @@ export class MockAPI { host_duration: 163, device_duration: 0, self_host_duration: 163, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddmmBackward0', @@ -6266,7 +6293,7 @@ export class MockAPI { host_duration: 295, device_duration: 74, self_host_duration: 38, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add_', @@ -6274,7 +6301,7 @@ export class MockAPI { host_duration: 4103, device_duration: 535, self_host_duration: 2037, - self_device_duration: 535, + self_device_duration: 535 }, { name: 'torch::autograd::AccumulateGrad', @@ -6282,15 +6309,16 @@ export class MockAPI { host_duration: 5183, device_duration: 535, self_host_duration: 1080, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', + name: + 'autograd::engine::evaluate_function: torch::autograd::AccumulateGrad', calls: 161, host_duration: 7655, device_duration: 535, self_host_duration: 2472, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'TBackward0', @@ -6298,7 +6326,7 @@ export class MockAPI { host_duration: 16, device_duration: 0, self_host_duration: 3, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: TBackward0', @@ -6306,7 +6334,7 @@ export class MockAPI { host_duration: 24, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::_reshape_alias', @@ -6314,7 +6342,7 @@ export class MockAPI { host_duration: 5, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::reshape', @@ -6322,7 +6350,7 @@ export class MockAPI { host_duration: 10, device_duration: 0, self_host_duration: 5, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'ReshapeAliasBackward0', @@ -6330,15 +6358,16 @@ export class MockAPI { host_duration: 17, device_duration: 0, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: ReshapeAliasBackward0', + name: + 'autograd::engine::evaluate_function: ReshapeAliasBackward0', calls: 1, host_duration: 27, device_duration: 0, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::expand', @@ -6346,7 +6375,7 @@ export class MockAPI { host_duration: 10, device_duration: 0, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::to', @@ -6354,7 +6383,7 @@ export class MockAPI { host_duration: 1, device_duration: 0, self_host_duration: 1, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::div', 
@@ -6362,7 +6391,7 @@ export class MockAPI { host_duration: 63, device_duration: 37, self_host_duration: 45, - self_device_duration: 37, + self_device_duration: 37 }, { name: 'MeanBackward1', @@ -6370,7 +6399,7 @@ export class MockAPI { host_duration: 83, device_duration: 37, self_host_duration: 9, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: MeanBackward1', @@ -6378,7 +6407,7 @@ export class MockAPI { host_duration: 99, device_duration: 37, self_host_duration: 16, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::threshold_backward', @@ -6386,7 +6415,7 @@ export class MockAPI { host_duration: 1863, device_duration: 9003, self_host_duration: 1203, - self_device_duration: 9003, + self_device_duration: 9003 }, { name: 'ReluBackward0', @@ -6394,7 +6423,7 @@ export class MockAPI { host_duration: 2330, device_duration: 9003, self_host_duration: 467, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: ReluBackward0', @@ -6402,7 +6431,7 @@ export class MockAPI { host_duration: 3313, device_duration: 9003, self_host_duration: 983, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'AddBackward0', @@ -6410,7 +6439,7 @@ export class MockAPI { host_duration: 14, device_duration: 0, self_host_duration: 14, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'autograd::engine::evaluate_function: AddBackward0', @@ -6418,7 +6447,7 @@ export class MockAPI { host_duration: 135, device_duration: 0, self_host_duration: 121, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::empty', @@ -6426,7 +6455,7 @@ export class MockAPI { host_duration: 4638, device_duration: 0, self_host_duration: 4638, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_batch_norm_backward', @@ -6434,7 +6463,7 @@ export class MockAPI { host_duration: 5047, device_duration: 22244, self_host_duration: 2219, - self_device_duration: 22244, + self_device_duration: 22244 }, { name: 'CudnnBatchNormBackward0', @@ -6442,15 +6471,16 @@ export class MockAPI { host_duration: 5637, device_duration: 22244, self_host_duration: 590, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', + name: + 'autograd::engine::evaluate_function: CudnnBatchNormBackward0', calls: 53, host_duration: 7407, device_duration: 22244, self_host_duration: 1770, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::cudnn_convolution_backward_input', @@ -6458,7 +6488,7 @@ export class MockAPI { host_duration: 9345, device_duration: 37854, self_host_duration: 6945, - self_device_duration: 37854, + self_device_duration: 37854 }, { name: 'aten::cudnn_convolution_backward_weight', @@ -6466,7 +6496,7 @@ export class MockAPI { host_duration: 9886, device_duration: 44650, self_host_duration: 5378, - self_device_duration: 44650, + self_device_duration: 44650 }, { name: 'aten::cudnn_convolution_backward', @@ -6474,7 +6504,7 @@ export class MockAPI { host_duration: 20453, device_duration: 82504, self_host_duration: 1222, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'CudnnConvolutionBackward0', @@ -6482,15 +6512,16 @@ export class MockAPI { host_duration: 21000, device_duration: 82504, self_host_duration: 547, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: CudnnConvolutionBackward0', + name: + 
'autograd::engine::evaluate_function: CudnnConvolutionBackward0', calls: 53, host_duration: 23024, device_duration: 87731, self_host_duration: 1440, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::add', @@ -6498,7 +6529,7 @@ export class MockAPI { host_duration: 584, device_duration: 5227, self_host_duration: 374, - self_device_duration: 5227, + self_device_duration: 5227 }, { name: 'aten::fill_', @@ -6506,7 +6537,7 @@ export class MockAPI { host_duration: 26, device_duration: 230, self_host_duration: 12, - self_device_duration: 230, + self_device_duration: 230 }, { name: 'aten::zero_', @@ -6514,7 +6545,7 @@ export class MockAPI { host_duration: 33, device_duration: 230, self_host_duration: 7, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::max_pool2d_with_indices_backward', @@ -6522,7 +6553,7 @@ export class MockAPI { host_duration: 73, device_duration: 1513, self_host_duration: 30, - self_device_duration: 1283, + self_device_duration: 1283 }, { name: 'MaxPool2DWithIndicesBackward0', @@ -6530,19 +6561,20 @@ export class MockAPI { host_duration: 83, device_duration: 1513, self_host_duration: 10, - self_device_duration: 0, + self_device_duration: 0 }, { - name: 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', + name: + 'autograd::engine::evaluate_function: MaxPool2DWithIndicesBackward0', calls: 1, host_duration: 106, device_duration: 1513, self_host_duration: 23, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-21', + path: '0-21' }, { left: { @@ -6557,7 +6589,7 @@ export class MockAPI { host_duration: 87, device_duration: 0, self_host_duration: 87, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -6565,7 +6597,7 @@ export class MockAPI { host_duration: 4, device_duration: 0, self_host_duration: 4, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -6573,9 +6605,9 @@ export class MockAPI { host_duration: 160, device_duration: 0, self_host_duration: 69, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, right: { name: 'aten::zeros', @@ -6589,7 +6621,7 @@ export class MockAPI { host_duration: 105, device_duration: 0, self_host_duration: 105, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zero_', @@ -6597,7 +6629,7 @@ export class MockAPI { host_duration: 2, device_duration: 0, self_host_duration: 2, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::zeros', @@ -6605,11 +6637,11 @@ export class MockAPI { host_duration: 119, device_duration: 0, self_host_duration: 12, - self_device_duration: 0, - }, - ], + self_device_duration: 0 + } + ] }, - path: '0-22', + path: '0-22' }, { left: { @@ -6624,7 +6656,7 @@ export class MockAPI { host_duration: 40, device_duration: 0, self_host_duration: 40, - self_device_duration: 0, + self_device_duration: 0 }, { name: 'aten::mul_', @@ -6632,7 +6664,7 @@ export class MockAPI { host_duration: 11945, device_duration: 401, self_host_duration: 9568, - self_device_duration: 401, + self_device_duration: 401 }, { name: 'aten::add_', @@ -6640,9 +6672,9 @@ export class MockAPI { host_duration: 22480, device_duration: 894, self_host_duration: 17805, - self_device_duration: 894, - }, - ], + self_device_duration: 894 + } + ] }, right: { name: 'Optimizer.step#SGD.step', @@ -6656,7 +6688,7 @@ export class MockAPI { host_duration: 8, device_duration: 0, self_host_duration: 8, - self_device_duration: 0, + self_device_duration: 0 }, { name: 
'aten::mul_', @@ -6664,7 +6696,7 @@ export class MockAPI { host_duration: 3440, device_duration: 404, self_host_duration: 1824, - self_device_duration: 404, + self_device_duration: 404 }, { name: 'aten::add_', @@ -6672,13 +6704,13 @@ export class MockAPI { host_duration: 6161, device_duration: 894, self_host_duration: 3186, - self_device_duration: 894, - }, - ], - }, - path: '0-23', - }, - ], - }); + self_device_duration: 894 + } + ] + }, + path: '0-23' + } + ] + }) } } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/app.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/app.tsx index 19eb4b112529073c6b8db9a86b8d68a7633598db..c8cd2ddec26fee10f0a6d448a2051e749ae20696 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/app.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/app.tsx @@ -15,52 +15,51 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. *--------------------------------------------------------------------------------------------*/ -import Box from '@material-ui/core/Box'; -import Card from '@material-ui/core/Card'; -import CardContent from '@material-ui/core/CardContent'; -import CardHeader from '@material-ui/core/CardHeader'; -import ClickAwayListener from '@material-ui/core/ClickAwayListener'; -import CssBaseline from '@material-ui/core/CssBaseline'; -import Divider from '@material-ui/core/Divider'; -import Drawer from '@material-ui/core/Drawer'; -import Fab from '@material-ui/core/Fab'; -import FormControl from '@material-ui/core/FormControl'; -import IconButton from '@material-ui/core/IconButton'; -import ListSubheader from '@material-ui/core/ListSubheader'; -import MenuItem from '@material-ui/core/MenuItem'; -import Select, { SelectProps } from '@material-ui/core/Select'; -import { makeStyles } from '@material-ui/core/styles'; -import Tab from '@material-ui/core/Tab'; -import Tabs from '@material-ui/core/Tabs'; -import Typography from '@material-ui/core/Typography'; -import ChevronLeftIcon from '@material-ui/icons/ChevronLeft'; -import ChevronRightIcon from '@material-ui/icons/ChevronRight'; -import { message } from 'antd'; -import 'antd/es/button/style/css'; -import 'antd/es/list/style/css'; -import 'antd/es/table/style/css'; -import clsx from 'clsx'; -import * as React from 'react'; -import * as api from './api'; -import { AccuracyLeftPanel } from './components/Accuracy/AccuracyLeftPanel'; -import { FileInfo } from './components/Accuracy/entity'; -import { LossComparison } from './components/Accuracy/LossComparison'; -import { DiffOverview } from './components/DiffOverview'; -import { DistributedView } from './components/DistributedView'; -import { FullCircularProgress } from './components/FullCircularProgress'; -import { Kernel as KernelView } from './components/Kernel'; -import { MemoryView } from './components/MemoryView'; -import { ModuleView } from './components/ModuleView'; -import { Operator as OperatorView } from './components/Operator'; -import { Overview as OverviewPage } from './components/Overview'; -import { TraceView } from './components/TraceView'; -import { setup } from './setup'; -import './styles.css'; -import { firstOrUndefined, sleep } from './utils'; +import Box from '@material-ui/core/Box' +import Card from '@material-ui/core/Card' +import CardContent from '@material-ui/core/CardContent' +import CardHeader from 
'@material-ui/core/CardHeader' +import ClickAwayListener from '@material-ui/core/ClickAwayListener' +import CssBaseline from '@material-ui/core/CssBaseline' +import Divider from '@material-ui/core/Divider' +import Drawer from '@material-ui/core/Drawer' +import Fab from '@material-ui/core/Fab' +import FormControl from '@material-ui/core/FormControl' +import IconButton from '@material-ui/core/IconButton' +import ListSubheader from '@material-ui/core/ListSubheader' +import MenuItem from '@material-ui/core/MenuItem' +import Select, { SelectProps } from '@material-ui/core/Select' +import { makeStyles } from '@material-ui/core/styles' +import Tab from '@material-ui/core/Tab' +import Tabs from '@material-ui/core/Tabs' +import Typography from '@material-ui/core/Typography' +import ChevronLeftIcon from '@material-ui/icons/ChevronLeft' +import ChevronRightIcon from '@material-ui/icons/ChevronRight' +import 'antd/es/button/style/css' +import 'antd/es/list/style/css' +import 'antd/es/table/style/css' +import clsx from 'clsx' +import * as React from 'react' +import * as api from './api' +import { AccuracyLeftPanel } from './components/Accuracy/AccuracyLeftPanel' +import { FileInfo } from './components/Accuracy/entity' +import { LossComparison } from './components/Accuracy/LossComparison' +import { DiffOverview } from './components/DiffOverview' +import { DistributedView } from './components/DistributedView' +import { FullCircularProgress } from './components/FullCircularProgress' +import { Kernel } from './components/Kernel' +import { MemoryView } from './components/MemoryView' +import { ModuleView } from './components/ModuleView' +import { Operator } from './components/Operator' +import { Overview } from './components/Overview' +import { TraceView } from './components/TraceView' +import { setup } from './setup' +import './styles.css' +import { firstOrUndefined, sleep } from './utils' export enum Views { Overview = 'Overview', @@ -70,10 +69,10 @@ export enum Views { Distributed = 'Distributed', Memory = 'Memory', Module = 'Module', - Lightning = 'Lightning', + Lightning = 'Lightning' } -const viewNames = { +const ViewNames = { [Views.Overview]: Views.Overview, [Views.Operator]: Views.Operator, [Views.Kernel]: 'Kernel', @@ -81,59 +80,61 @@ const viewNames = { [Views.Distributed]: Views.Distributed, [Views.Memory]: Views.Memory, [Views.Module]: Views.Module, - [Views.Lightning]: Views.Lightning, -}; + [Views.Lightning]: Views.Lightning +} + +const accViews = ['Loss Comparison'] -const drawerWidth = 340; +const drawerWidth = 340 const useStyles = makeStyles((theme) => ({ root: { display: 'flex', - height: '100%', + height: '100%' }, appBar: { zIndex: theme.zIndex.drawer + 1, transition: theme.transitions.create(['width', 'margin'], { easing: theme.transitions.easing.sharp, - duration: theme.transitions.duration.leavingScreen, - }), + duration: theme.transitions.duration.leavingScreen + }) }, appBarShift: { marginLeft: drawerWidth, width: `calc(100% - ${drawerWidth}px)`, transition: theme.transitions.create(['width', 'margin'], { easing: theme.transitions.easing.sharp, - duration: theme.transitions.duration.enteringScreen, - }), + duration: theme.transitions.duration.enteringScreen + }) }, menuButton: { - marginRight: 36, + marginRight: 36 }, hide: { - display: 'none', + display: 'none' }, drawer: { width: drawerWidth, flexShrink: 0, - whiteSpace: 'nowrap', + whiteSpace: 'nowrap' }, drawerOpen: { width: drawerWidth, zIndex: 999, transition: theme.transitions.create('width', { easing: 
theme.transitions.easing.sharp, - duration: theme.transitions.duration.enteringScreen, - }), + duration: theme.transitions.duration.enteringScreen + }) }, drawerClose: { transition: theme.transitions.create('width', { easing: theme.transitions.easing.sharp, - duration: theme.transitions.duration.leavingScreen, + duration: theme.transitions.duration.leavingScreen }), overflowX: 'hidden', width: 0, [theme.breakpoints.up('sm')]: { - width: 0, - }, + width: 0 + } }, toolbar: { display: 'flex', @@ -141,304 +142,322 @@ const useStyles = makeStyles((theme) => ({ justifyContent: 'flex-end', padding: theme.spacing(0, 1), // necessary for content to be below app bar - ...theme.mixins.toolbar, + ...theme.mixins.toolbar }, content: { flexGrow: 1, padding: theme.spacing(3), - overflowX: 'hidden', + overflowX: 'hidden' }, formControl: { margin: theme.spacing(1), - minWidth: 120, + minWidth: 120 }, fab: { marginLeft: theme.spacing(1), marginTop: theme.spacing(1), - position: 'absolute', + position: 'absolute' }, iconButton: { - padding: '8px', - }, -})); + padding: '8px' + } +})) -export const App = (): JSX.Element => { - const classes = useStyles(); +export const App = () => { + const classes = useStyles() // #region - State - const [selectedTab, setSelectedTab] = React.useState(0); - - const [run, setRun] = React.useState(''); - const [runs, setRuns] = React.useState([]); - const [runsLoading, setRunsLoading] = React.useState(true); - - const [workers, setWorkers] = React.useState([]); - const [worker, setWorker] = React.useState(''); - - const [spans, setSpans] = React.useState([]); - const [span, setSpan] = React.useState(''); - const [views, setViews] = React.useState([]); - const [view, setView] = React.useState(''); - const [loaded, setLoaded] = React.useState(false); - const iframeRef = React.useRef(null); - const [deviceTarget, setDeviceTarget] = React.useState('GPU'); + const [selectedTab, setSelectedTab] = React.useState(0) + + const [run, setRun] = React.useState('') + const [runs, setRuns] = React.useState([]) + const [runsLoading, setRunsLoading] = React.useState(true) + + const [workers, setWorkers] = React.useState([]) + const [worker, setWorker] = React.useState('') + + const [spans, setSpans] = React.useState([]) + const [span, setSpan] = React.useState('') + + const [views, setViews] = React.useState([]) + const [view, setView] = React.useState('') + const [loaded, setLoaded] = React.useState(false) + const iframeRef = React.useRef(null) + const [deviceTarget, setDeviceTarget] = React.useState('GPU') + + const [diffLeftWorkerOptions, setDiffLeftWorkerOptions] = React.useState< + string[] + >([]) + const [diffLeftSpansOptions, setDiffLeftSpansOptions] = React.useState< + string[] + >([]) + const [diffLeftRun, setDiffLeftRun] = React.useState('') + const [diffLeftWorker, setDiffLeftWorker] = React.useState('') + const [diffLeftSpan, setDiffLeftSpan] = React.useState('') + + const [diffRightWorkerOptions, setDiffRightWorkerOptions] = React.useState< + string[] + >([]) + const [diffRightSpansOptions, setDiffRightSpansOptions] = React.useState< + string[] + >([]) + const [diffRightRun, setDiffRightRun] = React.useState('') + const [diffRightWorker, setDiffRightWorker] = React.useState('') + const [diffRightSpan, setDiffRightSpan] = React.useState('') + + const [open, setOpen] = React.useState(true) + + const [topTab, setTopTab] = React.useState(0) + const [fileList, setFileList] = React.useState([]) + const [uploadedCount, setUploadedCount] = React.useState(0) - const 
[diffLeftWorkerOptions, setDiffLeftWorkerOptions] = React.useState<string[]>([]); - const [diffLeftSpansOptions, setDiffLeftSpansOptions] = React.useState<string[]>([]); - const [diffLeftRun, setDiffLeftRun] = React.useState(''); - const [diffLeftWorker, setDiffLeftWorker] = React.useState(''); - const [diffLeftSpan, setDiffLeftSpan] = React.useState(''); - - const [diffRightWorkerOptions, setDiffRightWorkerOptions] = React.useState<string[]>([]); - const [diffRightSpansOptions, setDiffRightSpansOptions] = React.useState<string[]>([]); - const [diffRightRun, setDiffRightRun] = React.useState(''); - const [diffRightWorker, setDiffRightWorker] = React.useState(''); - const [diffRightSpan, setDiffRightSpan] = React.useState(''); - - const [open, setOpen] = React.useState(true); - - const [topTab, setTopTab] = React.useState(0); - const [fileList, setFileList] = React.useState([]); - const [uploadedCount, setUploadedCount] = React.useState(0); // #endregion + // #endregion React.useEffect(() => { - setup() - .catch(() => { - message.warning('google chart is not supported offline'); - }) - .finally(() => { - setLoaded(true); - }); - }, []); - - const continuouslyFetchRuns = async (): Promise<void> => { + setup().catch(() => { + console.log('google chart is not supported offline') + }).finally(() => { + setLoaded(true) + }) + }, []) + + const continuouslyFetchRuns = async () => { while (true) { try { - const result = await api.defaultApi.runsGet(); - setRuns(result.runs); - setRunsLoading(result.loading); + const runs = await api.defaultApi.runsGet() + setRuns(runs.runs) + setRunsLoading(runs.loading) } catch (e) { - message.warning(`Cannot fetch runs: ${e}`); + console.info('Cannot fetch runs: ', e) } - await sleep(5000); + await sleep(5000) } - }; + } React.useEffect(() => { - continuouslyFetchRuns(); - }, []); + continuouslyFetchRuns() + }, []) React.useEffect(() => { if (!run || !runs.includes(run)) { - setRun(firstOrUndefined(runs) ??
'') } - }, [runs]); // #region - Diff Left + }, [runs]) + + // #region - Diff Left React.useEffect(() => { if (diffLeftRun) { - api.defaultApi.workersGet(diffLeftRun, Views.Overview).then((data) => { - setDiffLeftWorkerOptions(data); - }); + api.defaultApi.workersGet(diffLeftRun, Views.Overview).then((workers) => { + setDiffLeftWorkerOptions(workers) + }) } - }, [diffLeftRun]); + }, [diffLeftRun]) React.useEffect(() => { if (diffLeftRun && diffLeftWorker) { - api.defaultApi.spansGet(diffLeftRun, diffLeftWorker).then((data) => { - setDiffLeftSpansOptions(data); - }); + api.defaultApi.spansGet(diffLeftRun, diffLeftWorker).then((spans) => { + setDiffLeftSpansOptions(spans) + }) } - }, [diffLeftRun, diffLeftWorker]); + }, [diffLeftRun, diffLeftWorker]) // #endregion + // #region - Diff Right + React.useEffect(() => { if (diffRightRun) { - api.defaultApi.workersGet(diffRightRun, Views.Overview).then((data) => { - setDiffRightWorkerOptions(data); - }); + api.defaultApi + .workersGet(diffRightRun, Views.Overview) + .then((workers) => { + setDiffRightWorkerOptions(workers) + }) } - }, [diffRightRun]); + }, [diffRightRun]) React.useEffect(() => { if (diffRightRun && diffRightWorker) { - api.defaultApi.spansGet(diffRightRun, diffRightWorker).then((data) => { - setDiffRightSpansOptions(data); - }); + api.defaultApi.spansGet(diffRightRun, diffRightWorker).then((spans) => { + setDiffRightSpansOptions(spans) + }) } - }, [diffRightRun, diffRightWorker]); + }, [diffRightRun, diffRightWorker]) // #endregion + // #region - normal + React.useEffect(() => { if (run) { api.defaultApi.viewsGet(run).then((rawViews) => { - const result = rawViews.views.map((v) => Views[Views[v as Views]]).filter(Boolean); - setDeviceTarget(rawViews.device_target); - setViews(result); - }); + const views = rawViews.views + .map((v) => Views[Views[v as Views]]) + .filter(Boolean) + setDeviceTarget(rawViews.device_target) + setViews(views) + }) } - }, [run]); + }, [run]) React.useEffect(() => { - setView(firstOrUndefined(views) ?? ''); - }, [views]); + setView(firstOrUndefined(views) ?? '') + }, [views]) React.useEffect(() => { if (run && view) { - api.defaultApi.workersGet(run, view).then((data) => { - setWorkers(data); - }); + api.defaultApi.workersGet(run, view).then((workers) => { + setWorkers(workers) + }) } - }, [run, view]); + }, [run, view]) React.useEffect(() => { - setWorker(firstOrUndefined(workers) ?? ''); - }, [workers]); + setWorker(firstOrUndefined(workers) ?? '') + }, [workers]) React.useEffect(() => { if (run && worker) { - api.defaultApi.spansGet(run, worker).then((data) => { - setSpans(data); - }); + api.defaultApi.spansGet(run, worker).then((spans) => { + setSpans(spans) + }) } - }, [run, worker]); + }, [run, worker]) React.useEffect(() => { - setSpan(firstOrUndefined(spans) ?? ''); - }, [spans]); + setSpan(firstOrUndefined(spans) ?? 
'') }, [spans]) // #endregion // #region - Event Handler - const handleTabChange = (event: React.ChangeEvent<{}>, value: any): void => { - setSelectedTab(value as number); - }; + const handleTabChange = (event: React.ChangeEvent<{}>, value: any) => { + setSelectedTab(value as number) + } - const handleTopTabChange = (event: React.ChangeEvent<{}>, value: any): void => { - setTopTab(value as number); - }; + const handleTopTabChange = (event: React.ChangeEvent<{}>, value: any) => { + setTopTab(value as number) + } const handleRunChange: SelectProps['onChange'] = (event) => { - setRun(event.target.value as string); - setView(''); - setWorker(''); - setSpan(''); - }; + setRun(event.target.value as string) + setView('') + setWorker('') + setSpan('') + } const handleViewChange: SelectProps['onChange'] = (event) => { - setView(event.target.value as Views); - setWorker(''); - setSpan(''); - }; + setView(event.target.value as Views) + setWorker('') + setSpan('') + } const handleWorkerChange: SelectProps['onChange'] = (event) => { - setWorker(event.target.value as string); - setSpan(''); - }; + setWorker(event.target.value as string) + setSpan('') + } const handleSpanChange: SelectProps['onChange'] = (event) => { - setSpan(event.target.value as string); - }; + setSpan(event.target.value as string) + } const handleDiffLeftRunChange: SelectProps['onChange'] = (event) => { - setDiffLeftRun(event.target.value as string); - setDiffLeftWorker(''); - setDiffLeftSpan(''); - }; + setDiffLeftRun(event.target.value as string) + setDiffLeftWorker('') + setDiffLeftSpan('') + } const handleDiffLeftWorkerChange: SelectProps['onChange'] = (event) => { - setDiffLeftWorker(event.target.value as string); - setDiffLeftSpan(''); - }; + setDiffLeftWorker(event.target.value as string) + setDiffLeftSpan('') + } const handleDiffLeftSpanChange: SelectProps['onChange'] = (event) => { - setDiffLeftSpan(event.target.value as string); - }; + setDiffLeftSpan(event.target.value as string) + } const handleDiffRightRunChange: SelectProps['onChange'] = (event) => { - setDiffRightRun(event.target.value as string); - setDiffRightWorker(''); - setDiffRightSpan(''); - }; + setDiffRightRun(event.target.value as string) + setDiffRightWorker('') + setDiffRightSpan('') + } const handleDiffRightWorkerChange: SelectProps['onChange'] = (event) => { - setDiffRightWorker(event.target.value as string); - setDiffRightSpan(''); - }; + setDiffRightWorker(event.target.value as string) + setDiffRightSpan('') + } const handleDiffRightSpanChange: SelectProps['onChange'] = (event) => { - setDiffRightSpan(event.target.value as string); - }; + setDiffRightSpan(event.target.value as string) + } - const handleDrawerOpen = (): void => { - setOpen(true); - setIframeActive(); - }; + const handleDrawerOpen = () => { + setOpen(true) + SetIframeActive() + } - const handleDrawerClose = (): void => { - setOpen(false); - setIframeActive(); - }; + const handleDrawerClose = () => { + setOpen(false) + SetIframeActive() + } - const setIframeActive = (): void => { - iframeRef.current?.focus(); - }; + const SetIframeActive = () => { + iframeRef.current?.focus() + } - const _changeFileList = (files: FileInfo[]): void => { + const _changeFileList = (files: FileInfo[]) => { if (JSON.stringify(files) !== JSON.stringify(fileList)) { - setFileList(files); + setFileList(files) } - }; + } - const _getViews = (viewName: Views): string => { - if (viewName === Views.Kernel) { - return deviceTarget === 'Ascend' ?
-  const _getViews = (viewName: Views): string => {
-    if (viewName === Views.Kernel) {
-      return deviceTarget === 'Ascend' ? `NPU ${viewNames[viewName]}` : `GPU ${viewNames[viewName]}`;
-    } else {
-      return viewNames[viewName];
-    }
-  };
+  const _changeUploadCount = (count: number) => {
+    setUploadedCount(count)
+  }

-  const _changeUploadCount = (count: number): void => {
-    setUploadedCount(count);
-  }; // #endregion
+  // #endregion

-  const renderContent = (): JSX.Element => {
-    if (!runsLoading && runs.length === 0) {
+  const renderContent = () => {
+    if (!runsLoading && runs.length == 0) {
       return (
-        
-          
+        
+          
             There are not any runs in the log folder.
-      );
+      )
     }

-    const notReady = !loaded || !run || !worker || !view || !span;
-    if (notReady) {
-      return ;
+
+    if (!loaded || !run || !worker || !view || !span) {
+      return 
     }

     if (selectedTab === 0) {
       switch (view) {
         case Views.Overview:
-          return ;
+          return 
         case Views.Operator:
-          return ;
+          return 
         case Views.Kernel:
-          return ;
+          return 
         case Views.Trace:
-          return ;
+          return (
+            
+          )
         case Views.Distributed:
-          return ;
+          return 
         case Views.Memory:
-          return ;
+          return 
         case Views.Module:
         case Views.Lightning:
-          return ;
-        default:
-          return <>;
+          return 
       }
     } else {
       return (
@@ -450,99 +469,112 @@ export const App = (): JSX.Element => {
           expWorker={diffRightWorker}
           expSpan={diffRightSpan}
         />
-      );
+      )
     }
-  };
+  }

-  const spanComponent = (): JSX.Element => {
+  const spanComponent = () => {
     const spanFragment = (
       
         Spans
-        
-        
+        
+        
-    );
+    )

     if (!spans || spans.length <= 1) {
-      return
        {spanFragment}
-      ;
+      return
        {spanFragment}
     } else {
-      return spanFragment;
-    }
-  };
+      return spanFragment
+    }
+  }

   return (
- +
- - - + + + {topTab === 0 ? ( <> - - - + + + - {selectedTab === 0 ? ( + {selectedTab == 0 ? ( <> Runs - - + + Views - - + + Workers - - + + @@ -551,75 +583,93 @@ export const App = (): JSX.Element => { ) : ( <> -   Baseline +   Baseline Runs - + Workers - - - - Spans - - - + + + + Spans + + + - + -   Experimental +   Experimental Runs - + Workers - - + {diffRightWorkerOptions.map((worker) => ( + {worker} ))} Spans - - + {diffRightSpansOptions.map((span) => ( + {span} ))} )} - ) : ( - - )} + ) : + + }
          {!open && (
-            
+            
          )}
          {topTab === 0 ? renderContent() : }
-        
-  );
-};
+
+  )
+}
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/AccuracyLeftPanel.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/AccuracyLeftPanel.tsx
index c7b7d7cf0841e7dc3686138b584e101e5052f4a6..ef9b170ec7a3de46039e5345ddf574f6fd620077 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/AccuracyLeftPanel.tsx
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/AccuracyLeftPanel.tsx
@@ -17,32 +17,38 @@
  * limitations under the License.
  *--------------------------------------------------------------------------------------------*/

-import * as React from 'react';
-import { useState, useEffect, useCallback, useRef } from 'react';
-import { makeStyles } from '@material-ui/core/styles';
-import { Button, Checkbox, Spin, Modal, message } from 'antd';
-import { CheckboxChangeEvent } from 'antd/es/checkbox';
-import { DeleteOutlined, DownloadOutlined, ImportOutlined, SettingOutlined, WarningTwoTone } from '@ant-design/icons';
-import { RegexConfigModal } from './RegexConfigModal';
-import { FileInfo } from './entity';
+import * as React from 'react'
+import { useState, useEffect, useCallback, useRef } from 'react'
+import { makeStyles } from '@material-ui/core/styles'
+import { Button, Checkbox, Spin, Modal, message } from 'antd'
+import { CheckboxChangeEvent } from 'antd/es/checkbox'
+import {
+  DeleteOutlined,
+  DownloadOutlined,
+  ImportOutlined,
+  SettingOutlined,
+  WarningTwoTone,
+} from '@ant-design/icons'
+import { RegexConfigModal } from './RegexConfigModal'
+import { FileInfo } from './entity'

 interface IProps {
-  onChangeCheckedFileList: (files: FileInfo[]) => void;
-  onChangeUploadedCount: (count: number) => void;
+  onChangeCheckedFileList: (files: FileInfo[]) => void
+  onChangeUploadedCount: (count: number) => void
 }

 // Matches numbers, including scientific notation
-const LOSS_REG_EXP = /[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/;
+const LOSS_REG_EXP = /[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/
 // Matches natural numbers
-const ITER_REG_EXP = /\d+/;
+const ITER_REG_EXP = /\d+/
 // Maximum size of a single file
-const FILE_MAX_SIZE = 50 * 1024 * 1024;
+const FILE_MAX_SIZE = 50 * 1024 * 1024
 // Maximum number of uploaded files
-export const MAX_FILE_COUNT = 6;
+export const MAX_FILE_COUNT = 6

 const useStyles = makeStyles(() => ({
   root: {
-    height: '100%',
+    height: '100%'
   },
   btnPanel: {
     height: 50,
@@ -50,8 +56,8 @@ const useStyles = makeStyles(() => ({
     borderBottom: '1px solid #DFE5EF',
     display: 'flex',
     '& .ant-btn': {
-      margin: 'auto',
-    },
+      margin: 'auto'
+    }
   },
   fileContainer: {
     height: 54,
@@ -65,7 +71,7 @@ const useStyles = makeStyles(() => ({
       fontSize: 14,
       overflow: 'hidden',
       textOverflow: 'ellipsis',
-      whiteSpace: 'nowrap',
+      whiteSpace: 'nowrap'
     },
     '& .btns': {
       display: 'inline-block',
@@ -73,17 +79,17 @@ const useStyles = makeStyles(() => ({
       '& .icon': {
         cursor: 'pointer',
         '&:hover': {
-          color: '#1890ff',
-        },
+          color: '#1890ff'
+        }
       },
       '& .iconLeft': {
-        marginRight: 8,
-      },
+        marginRight: 8
+      }
     },
   },
   deleteModal: {
     '& .ant-modal-title': {
-      fontWeight: 'bold',
+      fontWeight: 'bold'
     },
     '& .deleteModalBody': {
       display: 'flex',
@@ -91,210 +97,203 @@
       height: 80,
       '& .warningIcon': {
         display: 'inline-block',
-        fontSize: 50,
+        fontSize: 50
       },
       '& .warningText': {
         display: 'inline-block',
         marginLeft: 16,
         overflow: 'hidden',
         wordBreak: 'break-all',
-        flex: 1,
-      },
-    },
-  },
-}));
+        flex: 1
+      }
+    }
+  }
+}))
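// ---------------------------------------------------------------------------
// Editor's aside (illustrative, not part of this change): LOSS_REG_EXP above
// accepts plain and scientific-notation numbers, while ITER_REG_EXP accepts
// natural numbers. A quick check of the loss pattern in isolation (the
// *_SKETCH constant is hypothetical, copied from the regex in this hunk):
const LOSS_REG_EXP_SKETCH = /[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/
// LOSS_REG_EXP_SKETCH.exec('2.5e-4')?.[0] === '2.5e-4'
// LOSS_REG_EXP_SKETCH.exec('-0.125')?.[0] === '-0.125'
// ---------------------------------------------------------------------------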
 export const AccuracyLeftPanel: React.FC = (props) => {
-  const { onChangeCheckedFileList, onChangeUploadedCount } = props;
-  const classes = useStyles();
-  const [configModalVis, setConfigModalVis] = useState(false);
-  const [deleteModalVis, setDeleteModalVis] = useState(false);
-  const [fileList, setFileList] = useState([]);
-  const [importSpin, setImportSpin] = useState(false);
-  const [selectedFile, setSelectedFile] = useState(undefined);
-  const downLoadRef = useRef(null);
+  const { onChangeCheckedFileList, onChangeUploadedCount } = props
+  const classes = useStyles()
+  const [configModalVis, setConfigModalVis] = useState(false)
+  const [deleteModalVis, setDeleteModalVis] = useState(false)
+  const [fileList, setFileList] = useState([])
+  const [importSpin, setImportSpin] = useState(false)
+  const [selectedFile, setSelectedFile] = useState(undefined)
+  const downLoadRef = useRef(null)

   const parseFile = (file: FileInfo): FileInfo => {
-    file.losses = [];
-    file.iterLosses = {};
-    file.iters = [];
-    const lines = file.fileContent.split(/\r\n|\n|\r/);
+    file.losses = []
+    file.iterLosses = {}
+    file.iters = []
+    const lines = file.fileContent.split(/\r\n|\n|\r/)
     for (let i = 0; i < lines.length; i++) {
-      const iter = parseByTag(lines[i], file.iterTag, false);
-      const loss = parseByTag(lines[i], file.lossTag, true);
+      const iter = parseByTag(lines[i], file.iterTag, false)
+      const loss = parseByTag(lines[i], file.lossTag, true)
       if (iter !== null && loss !== null) {
-        file.iters.push(iter);
-        file.losses.push([iter, loss]);
-        file.iterLosses[iter] = loss;
+        file.iters.push(iter)
+        file.losses.push([iter, loss])
+        file.iterLosses[iter] = loss
       }
     }
-    return file;
-  };
+    return file
+  }

   const parseByTag = (line: string, tag: string, isLoss: boolean): number | null => {
-    let pos = line.indexOf(tag);
-    let result: number | null = null;
+    let pos = line.indexOf(tag)
+    let result: number | null = null
     if (pos !== -1) {
-      const res = (isLoss ? LOSS_REG_EXP : ITER_REG_EXP).exec(
-        line
-          .substring(pos + tag.length)
-          .trim()
-          .split(/\s+/)[0]
-      );
+      const res = (isLoss ? LOSS_REG_EXP : ITER_REG_EXP)
+        .exec(line.substring(pos + tag.length).trim().split(/\s+/)[0])
       if (res !== null) {
         if (isLoss) {
-          result = parseFloat(res[0]);
+          result = parseFloat(res[0])
        } else {
-          result = parseInt(res[0]);
+          result = parseInt(res[0])
        }
       } else {
-        console.warn(`Found ${isLoss ? 'loss' : 'iteration'} text, but parse value with error: [${line}]`);
+        console.log(`Found ${isLoss ? 'loss' : 'iteration'} text, but parse value with error: [${line}]`)
       }
     }
-    return result;
-  };
+    return result
+  }

-  const importFile = (): void => {
-    document.getElementById('accComparisonSelectFile')?.click();
-  };
+  const importFile = () => {
+    document.getElementById('accComparisonSelectFile')?.click()
+  }

-  const uploadFile = (e: React.ChangeEvent): void => {
-    setImportSpin(true);
-    const file = e.target.files?.[0];
+  const uploadFile = (e: React.ChangeEvent) => {
+    setImportSpin(true)
+    const file = e.target.files?.[0]
     if (file) {
       if (file.size > FILE_MAX_SIZE) {
-        message.warn('Sorry, the file size cannot be greater than 50MB.');
-        setImportSpin(false);
+        message.warn('Sorry, the file size cannot be greater than 50MB.')
+        setImportSpin(false)
         // Reset the value so re-selecting a file with the same name still fires the change event
-        e.target.value = '';
-        return;
+        e.target.value = ''
+        return
       }
-      const reader = new FileReader();
-      reader.onload = ((loadedFile) => {
-        return (event) => {
-          addFile(loadedFile.name.trim(), event.target?.result as string);
-          setImportSpin(false);
-        };
+      const reader = new FileReader()
+      reader.onload = ((selectedFile) => {
+        return (e) => {
+          addFile(selectedFile.name.trim(), e.target?.result as string)
+          setImportSpin(false)
+        }
       })(file);
-      reader.readAsText(file);
+      reader.readAsText(file)
     }
     // Reset the value so re-selecting a file with the same name still fires the change event
-    e.target.value = '';
-  };
+    e.target.value = ''
+  }

-  const addFile = (fileName: string, fileContent: string): void => {
-    const fileLength = fileName.length;
-    const tempList: FileInfo[] = JSON.parse(JSON.stringify(fileList));
-    let updatedFileName = fileName; // New variable to hold the updated file name
+  const addFile = (fileName: string, fileContent: string) => {
+    const fileLength = fileName.length
+    const tempList: FileInfo[] = JSON.parse(JSON.stringify(fileList))
     // Duplicate names get an index suffix (1 to the maximum file count minus 1)
-    if (!!tempList.find((item) => item.fileName === fileName)) {
+    if (!!tempList.find(item => item.fileName === fileName)) {
       for (let i = 1; i < MAX_FILE_COUNT; i++) {
-        let temp = `${fileName.slice(0, fileLength - 4)}(${i})${fileName.slice(fileLength - 4)}`;
-        if (tempList.find((item) => item.fileName === temp) === undefined) {
-          updatedFileName = temp;
-          break;
+        let temp = `${fileName.slice(0, fileLength - 4)}(${i})${fileName.slice(fileLength - 4)}`
+        if (tempList.find(item => item.fileName === temp) === undefined) {
+          fileName = temp
+          break
         }
       }
     }
     const file: FileInfo = {
       id: fileList.length,
-      fileName: updatedFileName,
+      fileName: fileName,
       fileContent,
       checked: true,
       lossTag: 'loss:',
       iterTag: 'iteration',
       iters: [],
       losses: [],
-      iterLosses: {},
-    };
-    tempList.push(parseFile(file));
-    setFileList(tempList);
-  };
+      iterLosses: {}
+    }
+    tempList.push(parseFile(file))
+    setFileList(tempList)
+  }

-  const exportCsv = (data: FileInfo): void => {
-    let csvContent = `data:text/csv;charset=utf-8,${data.iterTag},${data.lossTag}\n`;
-    data.losses.forEach((item) => {
-      csvContent += `${item[0]},${item[1]}\n`;
-    });
-    downLoadRef.current?.setAttribute('href', encodeURI(csvContent));
-    downLoadRef.current?.setAttribute('download', `${data.fileName}.csv`);
-    downLoadRef.current?.click();
-  };
+  const exportCsv = (data: FileInfo) => {
+    let csvContent = `data:text/csv;charset=utf-8,${data.iterTag},${data.lossTag}\n`
+    data.losses.forEach(item => {
+      csvContent += `${item[0]},${item[1]}\n`
+    })
+    downLoadRef.current?.setAttribute('href', encodeURI(csvContent))
+    downLoadRef.current?.setAttribute('download', `${data.fileName}.csv`)
+    downLoadRef.current?.click()
+  }
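// ---------------------------------------------------------------------------
// Editor's aside (illustrative, not part of this change): exportCsv above
// serializes the parsed (iteration, loss) pairs into a data: URI and clicks a
// hidden anchor to trigger the download. The encoding step in isolation (the
// helper name is hypothetical):
function toCsvDataUri(iterTag: string, lossTag: string, losses: number[][]): string {
  // Header row first, then one "iter,loss" row per data point
  let csv = `data:text/csv;charset=utf-8,${iterTag},${lossTag}\n`
  losses.forEach(([iter, loss]) => {
    csv += `${iter},${loss}\n`
  })
  // encodeURI escapes each newline as %0A, keeping the URI a single line
  return encodeURI(csv)
}
// toCsvDataUri('iteration', 'loss:', [[1, 2.3]]) yields a URI ending in "1,2.3%0A"
// ---------------------------------------------------------------------------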
-  const onCheckChange = (e: CheckboxChangeEvent, index: number): void => {
-    const tempList: FileInfo[] = JSON.parse(JSON.stringify(fileList));
-    tempList[index].checked = e.target.checked;
-    setFileList(tempList);
-  };
+  const onCheckChange = (e: CheckboxChangeEvent, index: number) => {
+    const tempList: FileInfo[] = JSON.parse(JSON.stringify(fileList))
+    tempList[index].checked = e.target.checked
+    setFileList(tempList)
+  }

-  const onConfigIconClick = (data: FileInfo): void => {
-    setSelectedFile(data);
-    setConfigModalVis(true);
-  };
+  const onConfigIconClick = (data: FileInfo) => {
+    setSelectedFile(data)
+    setConfigModalVis(true)
+  }

-  const onDeleteIconClick = (data: FileInfo): void => {
-    setSelectedFile(data);
-    setDeleteModalVis(true);
-  };
+  const onDeleteIconClick = (data: FileInfo) => {
+    setSelectedFile(data)
+    setDeleteModalVis(true)
+  }

-  const configModalOk = (data: FileInfo): void => {
-    const tempList = fileList.map((item) => {
-      return item.id === data.id ? parseFile(data) : item;
-    });
-    setFileList(tempList);
-    setConfigModalVis(false);
-  };
+  const configModalOk = (data: FileInfo) => {
+    const tempList = fileList.map(item => {
+      return item.id === data.id ? parseFile(data) : item
+    })
+    setFileList(tempList)
+    setConfigModalVis(false)
+  }

-  const configModalCancel = (): void => {
-    setConfigModalVis(false);
-  };
+  const configModalCancel = () => {
+    setConfigModalVis(false)
+  }

-  const deleteModalOk = (): void => {
-    const tempList = JSON.parse(JSON.stringify(fileList));
-    let founded = false;
-    let index = 0;
+  const deleteModalOk = () => {
+    const tempList = JSON.parse(JSON.stringify(fileList))
+    let founded = false
+    let index = 0
     for (let i = 0; i < tempList.length; i++) {
       if (founded) {
-        tempList[i].id -= 1;
-        continue;
+        tempList[i].id -= 1
+        continue
       }
       if (tempList[i].id === selectedFile?.id) {
-        founded = true;
-        index = i;
+        founded = true
+        index = i
       }
     }
-    tempList.splice(index, 1);
-    setFileList(tempList);
-    setSelectedFile(undefined);
-    setDeleteModalVis(false);
-  };
+    tempList.splice(index, 1)
+    setFileList(tempList)
+    setSelectedFile(undefined)
+    setDeleteModalVis(false)
+  }

   const renderFileItems = useCallback(() => {
     return fileList.map((item) => {
       return (
-          onCheckChange(e, item.id)} />
-          
-            {item.fileName}
-          
-          
-            onConfigIconClick(item)} />
-            exportCsv(item)} />
-            onDeleteIconClick(item)} />
+          onCheckChange(e, item.id)} />
+          {item.fileName}
+          
+            onConfigIconClick(item)} />
+            exportCsv(item)} />
+            onDeleteIconClick(item)} />
          
-      );
-    });
-  }, [JSON.stringify(fileList)]);
+      )
+    })
+  }, [JSON.stringify(fileList)])

   useEffect(() => {
-    onChangeCheckedFileList(fileList.filter((item) => item.checked));
-    onChangeUploadedCount(fileList.length);
-  }, [JSON.stringify(fileList)]);
+    onChangeCheckedFileList(fileList.filter(item => item.checked))
+    onChangeUploadedCount(fileList.length)
+  }, [JSON.stringify(fileList)])

   return (
- +
- +
{renderFileItems()}
- {configModalVis && ( - - )} + {configModalVis && + + } setDeleteModalVis(false)} + onCancel={() => setDeleteModalVis(false)} onOk={deleteModalOk} width={500} className={classes.deleteModal} > -
- - +
+ + Are you sure to delete "{selectedFile?.fileName}"?
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/ComparisonPanel.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/ComparisonPanel.tsx index 500d29764c5209958ba19630ac1d4e08c10f24a5..a9c9d34feb585cac7c6aa26f9e962c0ed9d11d88 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/ComparisonPanel.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/ComparisonPanel.tsx @@ -17,23 +17,23 @@ * limitations under the License. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { useState, useLayoutEffect, useRef, useEffect } from 'react'; -import { makeStyles } from '@material-ui/core/styles'; -import { FileInfo } from './entity'; -import { Empty, Popover, Radio, RadioChangeEvent, Select, Table } from 'antd'; -import { ColumnsType } from 'antd/es/table'; -import * as echarts from 'echarts'; -import { InfoCircleOutlined } from '@ant-design/icons'; +import * as React from 'react' +import { useState, useLayoutEffect, useRef, useEffect } from 'react' +import { makeStyles } from '@material-ui/core/styles' +import { FileInfo } from './entity' +import { Empty, Popover, Radio, RadioChangeEvent, Select, Table } from 'antd' +import { ColumnsType } from 'antd/es/table' +import * as echarts from 'echarts' +import { InfoCircleOutlined } from '@ant-design/icons' interface IProps { - fileList: FileInfo[]; + fileList: FileInfo[] } interface ILineDataList { - normal: number[][]; - absolute: number[][]; - relative: number[][]; + normal: number[][] + absolute: number[][] + relative: number[][] } const useStyles = makeStyles(() => ({ @@ -49,26 +49,26 @@ const useStyles = makeStyles(() => ({ lineHeight: '24px', fontFamily: 'sans-serif', fontSize: 16, - fontWeight: 700, + fontWeight: 700 }, filter: { height: 40, lineHeight: '40px', '& .comparisonSelect': { - margin: '0 8px', + margin: '0 8px' }, '& .comparisonLabel': { - marginRight: 8, + marginRight: 8 }, '& .comparisonBtn': { - marginLeft: 20, + marginLeft: 20 }, '& .infoLabel': { - fontSize: 20, - }, + fontSize: 20 + } }, empty: { - marginTop: 60, + marginTop: 60 }, content: { flex: 1, @@ -76,11 +76,11 @@ const useStyles = makeStyles(() => ({ }, lossChart: { height: '100%', - flex: 1, + flex: 1 }, lossTable: { height: '100%', - width: '32%', + width: '32%' }, tableHeader: { display: 'inline-block', @@ -90,163 +90,149 @@ const useStyles = makeStyles(() => ({ transform: 'translateY(-50%)', overflow: 'hidden', textOverflow: 'ellipsis', - whiteSpace: 'nowrap', - }, -})); + whiteSpace: 'nowrap' + } +})) export const ComparisonPanel: React.FC = (props) => { - const { fileList } = props; - const classes = useStyles(); - const [selectedFiles, setSelectedFiles] = useState([]); - const [compareWay, setCompareWay] = useState(0); - const [pageSize, setPageSize] = useState(20); - const [lineData, setLineData] = useState(undefined); - const [tableData, setTableData] = useState([]); - const chartRef = useRef(null); + const { fileList } = props + const classes = useStyles() + const [selectedFiles, setSelectedFiles] = useState([]) + const [compareWay, setCompareWay] = useState(0) + const [pageSize, setPageSize] = useState(20) + const [lineData, setLineData] = useState(undefined) + const [tableData, setTableData] = useState([]) + const chartRef = useRef(null) const getColumns = (): ColumnsType => { - const columns: ColumnsType = [ - { - title: 'Iteration', - key: 'iter', - dataIndex: 'iter', 
- }, - ]; + const columns: ColumnsType = [{ + title: 'Iteration', + key: 'iter', + dataIndex: 'iter', + }] selectedFiles.forEach((item, index) => { columns.push({ title: () => ( -
- {item} -
+
{item}
), key: index, dataIndex: item, - width: '40%', - }); - }); - return columns; - }; + width: '40%' + }) + }) + return columns + } - const compareFile = (fileNames: string[]): void => { + const compareFile = (fileNames: string[]) => { if (fileNames.length < 2) { - return; + return } - const baseFile = fileList.find((item) => item.fileName === fileNames[0]); - const expFile = fileList.find((item) => item.fileName === fileNames[1]); + const baseFile = fileList.find(item => item.fileName === fileNames[0]) + const expFile = fileList.find(item => item.fileName === fileNames[1]) if (!!baseFile && !!expFile) { - const commonIters: number[] = []; - const lessIters = baseFile.iters.length <= expFile.iters.length ? baseFile.iters : expFile.iters; - const moreIters = baseFile.iters.length > expFile.iters.length ? baseFile.iters : expFile.iters; - lessIters.forEach((iter) => { + const commonIters: number[] = [] + const lessIters = baseFile.iters.length <= expFile.iters.length ? baseFile.iters : expFile.iters + const moreIters = baseFile.iters.length > expFile.iters.length ? baseFile.iters : expFile.iters + lessIters.forEach(iter => { if (moreIters.includes(iter)) { - commonIters.push(iter); + commonIters.push(iter) } - }); - commonIters.sort((a, b) => a - b); - const tempTableData: any[] = []; + }) + commonIters.sort((a, b) => a - b) + const tempTableData: any[] = [] const tempChartData: ILineDataList = { normal: [], absolute: [], - relative: [], - }; + relative: [] + } commonIters.forEach((iter, index) => { - const baseLoss = baseFile.iterLosses[iter]; - const expLoss = expFile.iterLosses[iter]; + const baseLoss = baseFile.iterLosses[iter] + const expLoss = expFile.iterLosses[iter] tempTableData.push({ key: `${iter}_${index}`, iter, [baseFile.fileName]: baseLoss, - [expFile.fileName]: expLoss, - }); - tempChartData.normal.push([iter, expLoss - baseLoss]); - tempChartData.absolute.push([iter, Math.abs(expLoss - baseLoss)]); - tempChartData.relative.push([iter, baseLoss === 0 ? 0 : Math.abs(expLoss - baseLoss) / baseLoss]); - }); - setTableData(tempTableData); - setLineData(tempChartData); + [expFile.fileName]: expLoss + }) + tempChartData.normal.push([iter, expLoss - baseLoss]) + tempChartData.absolute.push([iter, Math.abs(expLoss - baseLoss)]) + tempChartData.relative.push([iter, baseLoss === 0 ? 
0 : Math.abs(expLoss - baseLoss) / baseLoss]) + }) + setTableData(tempTableData) + setLineData(tempChartData) } - }; + } - const onSelectChange = (value: string[]): void => { - setSelectedFiles(value); - compareFile(value); - }; + const onSelectChange = (value: string[]) => { + setSelectedFiles(value) + compareFile(value) + } - const onRadioChange = (e: RadioChangeEvent): void => { - setCompareWay(e.target.value); - }; + const onRadioChange = (e: RadioChangeEvent) => { + setCompareWay(e.target.value) + } - const onShowSizeChange = (current: number, size: number): void => { - setPageSize(size); - }; + const onShowSizeChange = (current: number, size: number) => { + setPageSize(size) + } useLayoutEffect(() => { - const element = chartRef.current; + const element = chartRef.current if (!element || !lineData) { - return undefined; - } - const echart = echarts.init(element); - let dataSource: number[][] = []; - if (compareWay === 0) { - dataSource = lineData.normal; - } else if (compareWay === 1) { - dataSource = lineData.absolute; - } else { - dataSource = lineData.relative; + return } + const echart = echarts.init(element) const option: echarts.EChartsOption = { title: { text: 'Comparison Chart', textStyle: { fontSize: 12, - color: '#000', - }, + color: '#000' + } }, legend: { bottom: 0 }, xAxis: { type: 'category', boundaryGap: false, - name: 'Iteration', + name: 'Iteration' }, yAxis: { type: 'value', name: 'Difference', - scale: true, + scale: true }, tooltip: { trigger: 'axis', - valueFormatter: (value) => (value as number).toFixed(6), + valueFormatter: (value) => (value as number).toFixed(6) }, dataZoom: { - type: 'inside', + type: 'inside' }, dataset: { - source: dataSource, + source: compareWay === 0 ? lineData.normal : (compareWay === 1 ? lineData.absolute : lineData.relative) }, series: { type: 'line', name: 'Difference', - symbol: 'none', - }, - }; - - if (option) { - echart.setOption(option, true); + symbol: 'none' + } } + + option && echart.setOption(option, true) return () => { - echart.dispose(); - }; - }, [compareWay, lineData]); + echart.dispose() + } + }, [compareWay, lineData]) useEffect(() => { - const tempValue = selectedFiles.filter((item) => { - return !!fileList.find((file) => file.fileName === item); - }); + const tempValue = selectedFiles.filter(item => { + return !!fileList.find(file => file.fileName === item) + }) if (JSON.stringify(tempValue) === JSON.stringify(selectedFiles)) { - compareFile(tempValue); + compareFile(tempValue) } - setSelectedFiles(tempValue); - }, [fileList]); + setSelectedFiles(tempValue) + }, [fileList]) return (
@@ -254,23 +240,25 @@ export const ComparisonPanel: React.FC = (props) => {
Comparison objects:
- Iteration Tag + Iteration Tag
- ); -}; + ) +} \ No newline at end of file diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/entity.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/entity.ts index 270c4cb6535633f9a03e5b9fe02dca6121cd3ba7..0a0a1ee4b28661799aea5a9233c4f3a90f4a251e 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/entity.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Accuracy/entity.ts @@ -18,13 +18,13 @@ *--------------------------------------------------------------------------------------------*/ export interface FileInfo { - id: number; - fileName: string; - fileContent: string; - checked: boolean; - lossTag: string; - iterTag: string; - iters: number[]; - losses: number[][]; - iterLosses: { [iter: number]: number }; + id: number + fileName: string + fileContent: string + checked: boolean + lossTag: string + iterTag: string + iters: number[] + losses: number[][] + iterLosses: { [iter: number]: number } } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DataLoading.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DataLoading.tsx index 3c5d353ce641c409b51a7aaef8c00ff2f57df6e8..e2967bdf74196ad74a13f2d2f8b1799911d3b553 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DataLoading.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DataLoading.tsx @@ -2,18 +2,18 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { FullCircularProgress } from './FullCircularProgress'; +import * as React from 'react' +import { FullCircularProgress } from './FullCircularProgress' interface IProps { - value?: T | null; - children: (t: T) => JSX.Element; + value: T | undefined | null + children: (t: T) => JSX.Element } -export function DataLoading(props: IProps): JSX.Element { +export function DataLoading(props: IProps) { if (props.value === undefined || props.value === null) { - return ; + return } - return props.children(props.value); + return props.children(props.value) } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DiffOverview.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DiffOverview.tsx index ed029d5020ed1eaf8caea159b25d33c7a5ad03e3..e8071b2c5966d944804b4d8abd780d8389042d38 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DiffOverview.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DiffOverview.tsx @@ -2,101 +2,130 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
 *--------------------------------------------------------------------------------------------*/

-import Button from '@material-ui/core/Button';
-import Card from '@material-ui/core/Card';
-import CardContent from '@material-ui/core/CardContent';
-import CardHeader from '@material-ui/core/CardHeader';
-import Grid from '@material-ui/core/Grid';
-import { makeStyles } from '@material-ui/core/styles';
-import Typography from '@material-ui/core/Typography';
-import ChevronLeftIcon from '@material-ui/icons/ChevronLeft';
-import { Select, Table } from 'antd';
-import * as React from 'react';
-import * as api from '../api';
-import { useResizeEventDependency } from '../utils/resize';
-import { FullCircularProgress } from './FullCircularProgress';
-import * as echarts from 'echarts';
-
-const { Option } = Select;
-
-const topGraphHeight = 230;
+import Button from '@material-ui/core/Button'
+import Card from '@material-ui/core/Card'
+import CardContent from '@material-ui/core/CardContent'
+import CardHeader from '@material-ui/core/CardHeader'
+import Grid from '@material-ui/core/Grid'
+import { makeStyles } from '@material-ui/core/styles'
+import Typography from '@material-ui/core/Typography'
+import ChevronLeftIcon from '@material-ui/icons/ChevronLeft'
+import { Select, Table } from 'antd'
+import * as React from 'react'
+import * as api from '../api'
+import { useResizeEventDependency } from '../utils/resize'
+import { FullCircularProgress } from './FullCircularProgress'
+import * as echarts from 'echarts'
+
+const { Option } = Select
+
+const topGraphHeight = 230

 const useStyles = makeStyles((theme) => ({
   root: {
-    flexGrow: 1,
+    flexGrow: 1
   },
   pre: {
     '& ul': {
       margin: 0,
       paddingLeft: theme.spacing(3),
-      ...theme.typography.body1,
+      ...theme.typography.body1
     },
     '& li': {},
     '& a': {
-      color: '#ffa726',
+      color: '#ffa726'
     },
     '& a:active': {
-      color: '#ffa726',
+      color: '#ffa726'
     },
     '& p': {
       margin: 0,
       ...theme.typography.subtitle1,
-      fontWeight: theme.typography.fontWeightBold,
-    },
+      fontWeight: theme.typography.fontWeightBold
+    }
   },
   topGraph: {
-    height: topGraphHeight + 40,
+    height: topGraphHeight + 40
   },
   iconButton: {
-    padding: '8px',
-  },
-}));
+    padding: '8px'
+  }
+}))

-const getAngleByDataLength = (data: number): number => {
+const getAngleByDataLength = (data: number) => {
   if (data < 10) {
-    return 0;
+    return 0
   } else {
     // The larger the count, the closer the rotation gets to 90 degrees
-    return 90 * (1 - (10 / data));
+    return 90 * (1 - 10 / data)
   }
-};
+}

 export interface DiffColumnChartIProps {
-  rawData: any[];
-  selectCallback: (row: number, column: number) => void;
+  rawData: any[]
+  selectCallback: (row: number, column: number) => void
 }

 export interface DiffStepChartIProps {
-  rawData: any[];
+  rawData: any[]
 }

-const DiffColumnChart: React.FC = (props: DiffColumnChartIProps) => {
-  const { rawData, selectCallback } = props;
-  const graphRef = React.useRef(null);
-  const [resizeEventDependency] = useResizeEventDependency();
+const DiffColumnChart: React.FC = (
+  props: DiffColumnChartIProps
+) => {
+  const { rawData, selectCallback } = props
+  const graphRef = React.useRef(null)
+  const [resizeEventDependency] = useResizeEventDependency()

   React.useLayoutEffect(() => {
-    const element = graphRef.current;
-    if (!element) {
-      return undefined;
+    const element = graphRef.current
+    if (!element) return
+
+    let left_duration_data: number[] = []
+    let left_accumulated_duration_data: number[] = []
+
+    let right_duration_data: number[] = []
+    let right_accumulated_duration_data: number[] = []
+
+    for (let i = 0; i < rawData.length; i++) 
{ + let curr = rawData[i] + left_duration_data.push(curr[1]) + right_duration_data.push(curr[2]) + left_accumulated_duration_data.push(curr[3]) + right_accumulated_duration_data.push(curr[4]) } - const chart = echarts.init(element); + let left_duration_max = Math.max(...left_duration_data) + let right_duration_max = Math.max(...right_duration_data) + let duration_max = Math.max(left_duration_max, right_duration_max) + + let left_accumulated_duration_max = Math.max( + ...left_accumulated_duration_data + ) + let right_accumulated_duration_max = Math.max( + ...right_accumulated_duration_data + ) + let accumulated_max = Math.max( + left_accumulated_duration_max, + right_accumulated_duration_max + ) + + const chart = echarts.init(element) const options: echarts.EChartsOption = { title: { - text: 'Execution Comparsion', + text: 'Execution Comparsion' }, legend: { top: 10, - right: 10, + right: 10 }, tooltip: { trigger: 'axis', formatter: function (params: any) { - const index = params[0].name.indexOf('@'); - const safeName = params[0].name.replace(//g, '>'); - let res = `${index > -1 ? safeName.slice(index + 1) : safeName}
`; + const index = params[0].name.indexOf('@') + const safeName = params[0].name.replace(//g, '>') + var res = `${index > -1 ? safeName.slice(index + 1) : safeName}
` for (const item of params) { if (typeof item.value[item.encode.y[0]] === 'number') { res += ` - ${item.seriesName}: ${item.value[item.encode.y[0]]}
`; + ${item.seriesName}: ${item.value[item.encode.y[0]]}
` } } - return res; - }, + return res + } }, series: [ { type: 'bar', itemStyle: { - color: '#3366cc', + color: '#3366cc' }, yAxisIndex: 0, + }, { type: 'bar', itemStyle: { - color: '#dc3912', + color: '#dc3912' }, - yAxisIndex: 0, + yAxisIndex: 0 }, { type: 'line', itemStyle: { - color: '#ff9900', + color: '#ff9900' }, - yAxisIndex: 1, + yAxisIndex: 1 }, { type: 'line', itemStyle: { - color: '#109618', + color: '#109618' }, - yAxisIndex: 1, - }, + yAxisIndex: 1 + } ], xAxis: { type: 'category', @@ -148,81 +178,78 @@ const DiffColumnChart: React.FC = (props: DiffColumnChart interval: 0, rotate: getAngleByDataLength(rawData.length), formatter: (name: string) => { - const index = name.indexOf('@'); - const displayName = index > -1 ? name.slice(index + 1) : name; // 创建新变量 - return displayName.length > 16 ? `${displayName.slice(0, 14)}...` : displayName; - }, - }, + const index = name.indexOf('@') + if (index > -1) { + name = name.slice(index + 1) + } + return name.length > 16 ? name.slice(0, 14) + "..." : name; + } + } }, - yAxis: [ - { - type: 'value', - name: 'Time Difference(us)', - scale: true, - }, - { - type: 'value', - name: 'Accumulated Difference(us)', - scale: true, - }, - ], + yAxis: [{ + type: 'value', + name: 'Time Difference(us)', + scale: true + }, { + type: 'value', + name: 'Accumulated Difference(us)', + scale: true + }], dataset: { source: rawData.map((item, idx) => { // 添加索引保证x轴刻度不重复 - let param: any[] = [...item]; - param[0] = `${idx}@${param[0]}`; - return param; - }), - }, - }; - - if (options) { - chart.setOption(options, true); + let param: any[] = [...item] + param[0] = `${idx}@${param[0]}` + return param + }) + } } + + options && chart.setOption(options, true) chart.on('click', (param) => { if (param.seriesIndex !== undefined) { - selectCallback(param.dataIndex, param.seriesIndex + 1); + selectCallback(param.dataIndex, param.seriesIndex + 1) } - }); + }) return () => { - chart.dispose(); - }; - }, [rawData, resizeEventDependency]); + chart.dispose() + } + }, [rawData, resizeEventDependency]) return (
-  );
-};
+  )
+}

-const DiffStepChart: React.FC = (props: DiffStepChartIProps) => {
-  const { rawData } = props;
-  const graphRef = React.useRef(null);
-  const [resizeEventDependency] = useResizeEventDependency();
+const DiffStepChart: React.FC = (
+  props: DiffStepChartIProps
+) => {
+  const { rawData } = props
+  const graphRef = React.useRef(null)
+  const [resizeEventDependency] = useResizeEventDependency()

   React.useLayoutEffect(() => {
-    const element = graphRef.current;
-    if (!element) {
-      return undefined;
-    }
-    const chart = echarts.init(element);
+    const element = graphRef.current
+    if (!element) return
+    const chart = echarts.init(element)
     const options: echarts.EChartsOption = {
       title: {
-        text: 'Execution Diff',
+        text: 'Execution Diff'
       },
       legend: {
         top: 10,
-        right: 10,
+        right: 10
       },
       dataset: {
         source: rawData.map((item, idx) => {
           // Prepend the index so x-axis ticks stay unique
-          let param: any[] = [...item];
-          param[0] = `${idx}@${param[0]}`;
-          return param;
-        }),
+          let param: any[] = [...item]
+          param[0] = `${idx}@${param[0]}`
+          return param
+        })
       },
       xAxis: {
         type: 'category',
@@ -230,22 +257,24 @@ const DiffStepChart: React.FC = (props: DiffStepChartIProps
           interval: 0,
           rotate: getAngleByDataLength(rawData.length),
           formatter: (name: string) => {
-            const index = name.indexOf('@');
-            const displayName = index > -1 ? name.slice(index + 1) : name; // New variable for the trimmed name
-            return displayName.length > 16 ? `${displayName.slice(0, 14)}...` : displayName;
-          },
-        },
+            const index = name.indexOf('@')
+            if (index > -1) {
+              name = name.slice(index + 1)
+            }
+            return name.length > 16 ? name.slice(0, 14) + "..." : name;
+          }
+        }
       },
       yAxis: {
         type: 'value',
-        scale: true,
+        scale: true
       },
       tooltip: {
         trigger: 'axis',
         formatter: function (params: any) {
-          const index = params[0].name.indexOf('@');
-          const safeName = params[0].name.replace(//g, '>');
-          let res = `${index > -1 ? safeName.slice(index + 1) : safeName}<br>
`; + const index = params[0].name.indexOf('@') + const safeName = params[0].name.replace(//g, '>') + var res = `${index > -1 ? safeName.slice(index + 1) : safeName}
` for (const item of params) { if (typeof item.value[item.encode.y[0]] === 'number') { res += ` - ${item.seriesName}: ${item.value[item.encode.y[0]]}
`; + ${item.seriesName}: ${item.value[item.encode.y[0]]}
` } } - return res; - }, + return res + } }, series: [ { @@ -269,411 +298,413 @@ const DiffStepChart: React.FC = (props: DiffStepChartIProps step: 'middle', areaStyle: { color: '#c1d1ef', - opacity: 1, - }, - }, - { + opacity: 1 + } + }, { type: 'line', color: '#dc3912', symbolSize: 0, step: 'middle', areaStyle: { color: '#f4c3b7', - opacity: 1, - }, - }, - ], - }; - - if (options) { - chart.setOption(options, true); + opacity: 1 + } + } + ] } + + options && chart.setOption(options, true) return () => { - chart.dispose(); - }; - }, [rawData, resizeEventDependency]); + chart.dispose() + } + }, [rawData, resizeEventDependency]) return (
- ); -}; + ) +} export interface IProps { - run: string; - worker: string; - span: string; - expRun: string; - expWorker: string; - expSpan: string; + run: string + worker: string + span: string + expRun: string + expWorker: string + expSpan: string } export interface ColumnUnderlyingData { - name: string; - path: string; - leftAggs: any[]; - rightAggs: any[]; + name: string + path: string + leftAggs: any[] + rightAggs: any[] } export interface TableRow { - key: number; - - operator: string; - baselineCalls?: number; - expCalls?: number; - deltaCalls?: number; - deltaCallsPercentNumber?: number; - deltaCallsPercent?: string; - - baselineHostDuration: number; - expHostDuration: number; - deltaHostDuration: number; - deltaHostDurationPercentNumber: number; - deltaHostDurationPercent: string; - - baselineSelfHostDuration: number; - expSelfHostDuration: number; - deltaSelfHostDuration: number; - deltaSelfHostDurationPercentNumber: number; - deltaSelfHostDurationPercent: string; - - baselineDeviceDuration: number; - expDeviceDuration: number; - deltaDeviceDuration: number; - deltaDeviceDurationPercentNumber: number; - deltaDeviceDurationPercent: string; - - baselineSelfDeviceDuration: number; - expSelfDeviceDuration: number; - deltaSelfDeviceDuration: number; - deltaSelfDeviceDurationPercentNumber: number; - deltaSelfDeviceDurationPercent: string; + key: number + + operator: string + baselineCalls?: number + expCalls?: number + deltaCalls?: number + deltaCallsPercentNumber?: number + deltaCallsPercent?: string + + baselineHostDuration: number + expHostDuration: number + deltaHostDuration: number + deltaHostDurationPercentNumber: number + deltaHostDurationPercent: string + + baselineSelfHostDuration: number + expSelfHostDuration: number + deltaSelfHostDuration: number + deltaSelfHostDurationPercentNumber: number + deltaSelfHostDurationPercent: string + + baselineDeviceDuration: number + expDeviceDuration: number + deltaDeviceDuration: number + deltaDeviceDurationPercentNumber: number + deltaDeviceDurationPercent: string + + baselineSelfDeviceDuration: number + expSelfDeviceDuration: number + deltaSelfDeviceDuration: number + deltaSelfDeviceDurationPercentNumber: number + deltaSelfDeviceDurationPercent: string } -let columnChartDataStack: any[][] = []; -let stepChartDataStack: any[][] = []; -let columnUnderlyingDataStack: ColumnUnderlyingData[][] = []; -let columnTableDataSourceStack: TableRow[][] = []; +let columnChartDataStack: any[][] = [] +let stepChartDataStack: any[][] = [] +let columnUnderlyingDataStack: ColumnUnderlyingData[][] = [] +let columnTableDataSourceStack: TableRow[][] = [] export const DiffOverview: React.FC = (props: IProps) => { // #region - Constant - const COMPOSITE_NODES_NAME = 'CompositeNodes'; + + const COMPOSITE_NODES_NAME = 'CompositeNodes' const hostDurationColumns = [ { title: 'Baseline Host Duration (us)', dataIndex: 'baselineHostDuration', key: 'baselineHostDuration', - sorter: (a: TableRow, b: TableRow): number => { - const aBaselineHost = a.baselineHostDuration ?? 0; - const bBaselineHost = b.baselineHostDuration ?? 0; - return aBaselineHost - bBaselineHost; - }, + sorter: (a: TableRow, b: TableRow) => + a.baselineHostDuration - b.baselineHostDuration }, { title: 'Exp Host Duration (us)', dataIndex: 'expHostDuration', key: 'expHostDuration', - sorter: (a: TableRow, b: TableRow): number => { - const aExpHost = a.expHostDuration ?? 0; - const bExpHost = b.expHostDuration ?? 
0; - return aExpHost - bExpHost; - }, + sorter: (a: TableRow, b: TableRow) => + a.expHostDuration - b.expHostDuration }, { title: 'Delta Host Duration (us)', dataIndex: 'deltaHostDuration', key: 'deltaHostDuration', - sorter: (a: TableRow, b: TableRow): number => { - const aDeltaHost = a.deltaHostDuration ?? 0; - const bDeltaHost = b.deltaHostDuration ?? 0; - return aDeltaHost - bDeltaHost; - }, + sorter: (a: TableRow, b: TableRow) => + a.deltaHostDuration! - b.deltaHostDuration! }, { title: 'Delta Host Duration%', dataIndex: 'deltaHostDurationPercent', key: 'deltaHostDurationPercent', - sorter: (a: TableRow, b: TableRow): number => { - const aPercent = a.deltaHostDurationPercentNumber ?? 0; - const bPercent = b.deltaHostDurationPercentNumber ?? 0; - return aPercent - bPercent; - }, - }, - ]; + sorter: (a: TableRow, b: TableRow) => + a.deltaHostDurationPercentNumber! - b.deltaHostDurationPercentNumber! + } + ] const selfHostDurationColumns = [ { title: 'Baseline Self Host Duration (us)', dataIndex: 'baselineSelfHostDuration', key: 'baselineSelfHostDuration', - sorter: (a: TableRow, b: TableRow): number => a.baselineSelfHostDuration - b.baselineSelfHostDuration, + sorter: (a: TableRow, b: TableRow) => + a.baselineSelfHostDuration - b.baselineSelfHostDuration }, { title: 'Exp Self Host Duration (us)', dataIndex: 'expSelfHostDuration', key: 'expSelfHostDuration', - sorter: (a: TableRow, b: TableRow): number => a.expSelfHostDuration - b.expSelfHostDuration, + sorter: (a: TableRow, b: TableRow) => + a.expSelfHostDuration - b.expSelfHostDuration }, { title: 'Delta Self Host Duration (us)', dataIndex: 'deltaSelfHostDuration', key: 'deltaSelfHostDuration', - sorter: (a: TableRow, b: TableRow): number => { - const aDeltaSelfHost = a.deltaSelfHostDuration ?? 0; - const bDeltaSelfHost = b.deltaSelfHostDuration ?? 0; - return aDeltaSelfHost - bDeltaSelfHost; - }, + sorter: (a: TableRow, b: TableRow) => + a.deltaSelfHostDuration! - b.deltaSelfHostDuration! }, { title: 'Delta Self Host Duration%', dataIndex: 'deltaSelfHostDurationPercent', key: 'deltaSelfHostDurationPercent', - sorter: (a: TableRow, b: TableRow): number => { - const aSelfPercent = a.deltaSelfHostDurationPercentNumber ?? 0; - const bSelfPercent = b.deltaSelfHostDurationPercentNumber ?? 0; - return aSelfPercent - bSelfPercent; - }, - }, - ]; + sorter: (a: TableRow, b: TableRow) => + a.deltaSelfHostDurationPercentNumber! - + b.deltaSelfHostDurationPercentNumber! + } + ] const deviceDurationColumns = [ { title: 'Baseline Device Duration (us)', dataIndex: 'baselineDeviceDuration', key: 'baselineDeviceDuration', - sorter: (a: TableRow, b: TableRow): number => a.baselineDeviceDuration - b.baselineDeviceDuration, + sorter: (a: TableRow, b: TableRow) => + a.baselineDeviceDuration - b.baselineDeviceDuration }, { title: 'Exp Device Duration (us)', dataIndex: 'expDeviceDuration', key: 'expDeviceDuration', - sorter: (a: TableRow, b: TableRow): number => a.expDeviceDuration - b.expDeviceDuration, + sorter: (a: TableRow, b: TableRow) => + a.expDeviceDuration - b.expDeviceDuration }, { title: 'Delta Device Duration (us)', dataIndex: 'deltaDeviceDuration', key: 'deltaDeviceDuration', - sorter: (a: TableRow, b: TableRow): number => { - const aDeltaDeviceDuration = a.deltaDeviceDuration ?? 0; - const bdeltaDeviceDuration = b.deltaDeviceDuration ?? 0; - return aDeltaDeviceDuration - bdeltaDeviceDuration; - }, + sorter: (a: TableRow, b: TableRow) => + a.deltaDeviceDuration! - b.deltaDeviceDuration! 
}, { title: 'Delta Device Duration%', dataIndex: 'deltaDeviceDurationPercent', key: 'deltaDeviceDurationPercent', - sorter: (a: TableRow, b: TableRow): number => { - const aDeltaDeviceDurationPercentNumber = a.deltaDeviceDurationPercentNumber ?? 0; - const bDeltaDeviceDurationPercentNumber = b.deltaDeviceDurationPercentNumber ?? 0; - return aDeltaDeviceDurationPercentNumber - bDeltaDeviceDurationPercentNumber; - }, - }, - ]; + sorter: (a: TableRow, b: TableRow) => + a.deltaDeviceDurationPercentNumber! - + b.deltaDeviceDurationPercentNumber! + } + ] const selfDeviceDurationColumns = [ { title: 'Baseline Self Device Duration (us)', dataIndex: 'baselineSelfDeviceDuration', key: 'baselineSelfDeviceDuration', - sorter: (a: TableRow, b: TableRow): number => a.baselineSelfDeviceDuration - b.baselineSelfDeviceDuration, + sorter: (a: TableRow, b: TableRow) => + a.baselineSelfDeviceDuration - b.baselineSelfDeviceDuration }, { title: 'Exp Self Device Duration (us)', dataIndex: 'expSelfDeviceDuration', key: 'expSelfDeviceDuration', - sorter: (a: TableRow, b: TableRow): number => a.expSelfDeviceDuration - b.expSelfDeviceDuration, + sorter: (a: TableRow, b: TableRow) => + a.expSelfDeviceDuration - b.expSelfDeviceDuration }, { title: 'Delta Self Device Duration (us)', dataIndex: 'deltaSelfDeviceDuration', key: 'deltaSelfDeviceDuration', - sorter: (a: TableRow, b: TableRow): number => { - const aDeltaSelfDeviceDuration = a.deltaSelfDeviceDuration ?? 0; - const bDeltaSelfDeviceDuration = b.deltaSelfDeviceDuration ?? 0; - return aDeltaSelfDeviceDuration - bDeltaSelfDeviceDuration; - }, + sorter: (a: TableRow, b: TableRow) => + a.deltaSelfDeviceDuration! - b.deltaSelfDeviceDuration! }, { title: 'Delta Self Device Duration%', dataIndex: 'deltaSelfDeviceDurationPercent', key: 'deltaSelfDeviceDurationPercent', - sorter: (a: TableRow, b: TableRow): number => { - const aDeltaSelfDeviceDurationPercentNumber = a.deltaSelfDeviceDurationPercentNumber ?? 0; - const bDeltaSelfDeviceDurationPercentNumber = b.deltaSelfDeviceDurationPercentNumber ?? 0; - return aDeltaSelfDeviceDurationPercentNumber - bDeltaSelfDeviceDurationPercentNumber; - }, - }, - ]; + sorter: (a: TableRow, b: TableRow) => + a.deltaSelfDeviceDurationPercentNumber! - + b.deltaSelfDeviceDurationPercentNumber! + } + ] - interface IColumnMap { - [key: string]: any; - } - type IColumnMapType = IColumnMap; + type IColumnMapType = { [key: string]: any } const tableSourceColumnMap: IColumnMapType = { selfHostDuration: selfHostDurationColumns, hostDuration: hostDurationColumns, deviceDuration: deviceDurationColumns, - selfDeviceDuration: selfDeviceDurationColumns, - }; + selfDeviceDuration: selfDeviceDurationColumns + } const baseTableColumns = [ { title: 'Operator', dataIndex: 'operator', key: 'operator', - sorter: (a: TableRow, b: TableRow) => a.operator.localeCompare(b.operator), + sorter: (a: TableRow, b: TableRow) => a.operator.localeCompare(b.operator) }, { title: 'Baseline Calls', dataIndex: 'baselineCalls', key: 'baselineCalls', - sorter: (a: TableRow, b: TableRow) => a.baselineCalls ?? 0 - (b.baselineCalls ?? 0), + sorter: (a: TableRow, b: TableRow) => a.baselineCalls! - b.baselineCalls! }, { title: 'Exp Calls', dataIndex: 'expCalls', key: 'expCalls', - sorter: (a: TableRow, b: TableRow) => a.expCalls ?? 0 - (b.expCalls ?? 0), + sorter: (a: TableRow, b: TableRow) => a.expCalls! - b.expCalls! }, { title: 'Delta Calls', dataIndex: 'deltaCalls', key: 'deltaCalls', - sorter: (a: TableRow, b: TableRow) => a.deltaCalls ?? 0 - (b.deltaCalls ?? 
0), + sorter: (a: TableRow, b: TableRow) => a.deltaCalls! - b.deltaCalls! }, { title: 'Delta Calls%', dataIndex: 'deltaCallsPercent', key: 'deltaCallsPercent', - sorter: (a: TableRow, b: TableRow) => a.deltaCallsPercentNumber ?? 0 - (b.deltaCallsPercentNumber ?? 0), - }, - ]; + sorter: (a: TableRow, b: TableRow) => + a.deltaCallsPercentNumber! - b.deltaCallsPercentNumber! + } + ] // #endregion // #region - State - const [tableDataSource, setTableDataSource] = React.useState([]); - const { run, worker, span, expRun, expWorker, expSpan } = props; + const [tableDataSource, setTableDataSource] = React.useState([]) + const { run, worker, span, expRun, expWorker, expSpan } = props - const [columnUnderlyingData, setColumnUnderlyingData] = React.useState([]); + const [columnUnderlyingData, setColumnUnderlyingData] = React.useState< + ColumnUnderlyingData[] + >([]) - const [rootUnderlyingData, setRootUnderlyingData] = React.useState(); + const [ + rootUnderlyingData, + setRootUnderlyingData + ] = React.useState() - const [columnChartData, setColumnChartData] = React.useState([]); - const [stepChartData, setStepChartData] = React.useState([]); + const [columnChartData, setColumnChartData] = React.useState([]) + const [stepChartData, setStepChartData] = React.useState([]) - const [selectedTableColumnsOptions, setSelectedTableColumnsOptions] = React.useState<[key: string]>(['hostDuration']); - const [selectedTableColumns, setSelectedTableColumns] = React.useState([ - ...baseTableColumns, - ...hostDurationColumns, - ]); + const [ + selectedTableColumnsOptions, + setSelectedTableColumnsOptions + ] = React.useState<[key: string]>(['hostDuration']) + const [selectedTableColumns, setSelectedTableColumns] = React.useState( + [...baseTableColumns, ...hostDurationColumns] + ) - const [dataStackLevel, setDataStackLevel] = React.useState(0); - const [loading, setLoading] = React.useState(false); + const [dataStackLevel, setDataStackLevel] = React.useState(0) + const [loading, setLoading] = React.useState(false) // #endregion - const classes = useStyles(); + const classes = useStyles() // #region - Event Handler - const handleChartColumnSelect = (row: number, column: number): void => { + const handleChartColumnSelect = (row: number, column: number) => { if (columnUnderlyingData.length === 0) { - return; + return } - let selectedUnderlyingData = columnUnderlyingData[row]; + let selectedUnderlyingData = columnUnderlyingData[row] if (!selectedUnderlyingData) { - return; + return } - let tableDataSource1 = generateDataSourceFromUnderlyingData(selectedUnderlyingData); - setTableDataSource(tableDataSource1); - columnTableDataSourceStack.push(tableDataSource1); + let tableDataSource = generateDataSourceFromUnderlyingData( + selectedUnderlyingData + ) + setTableDataSource(tableDataSource) + columnTableDataSourceStack.push(tableDataSource) - setLoading(true); + setLoading(true) api.defaultApi - .diffnodeGet(run, worker, span, expRun, expWorker, expSpan, selectedUnderlyingData.path) + .diffnodeGet( + run, + worker, + span, + expRun, + expWorker, + expSpan, + selectedUnderlyingData.path + ) .then((resp) => handleDiffNodeResp(resp)) - .finally(() => setLoading(false)); - }; + .finally(() => setLoading(false)) + } - const handleGoBack = (): void => { + const handleGoBack = () => { if (columnChartDataStack.length > 1) { - columnChartDataStack.pop(); - let top = columnChartDataStack[columnChartDataStack.length - 1]; - setColumnChartData(top); + columnChartDataStack.pop() + let top = 
columnChartDataStack[columnChartDataStack.length - 1] + setColumnChartData(top) } if (stepChartDataStack.length > 1) { - stepChartDataStack.pop(); - let top = stepChartDataStack[stepChartDataStack.length - 1]; - setStepChartData(top); + stepChartDataStack.pop() + let top = stepChartDataStack[stepChartDataStack.length - 1] + setStepChartData(top) } if (columnUnderlyingDataStack.length > 0) { - columnUnderlyingDataStack.pop(); - let top = columnUnderlyingDataStack[columnUnderlyingDataStack.length - 1]; - setColumnUnderlyingData(top); + columnUnderlyingDataStack.pop() + let top = columnUnderlyingDataStack[columnUnderlyingDataStack.length - 1] + setColumnUnderlyingData(top) } if (columnTableDataSourceStack.length > 0) { - columnTableDataSourceStack.pop(); - let top = columnTableDataSourceStack[columnTableDataSourceStack.length - 1]; + columnTableDataSourceStack.pop() + let top = + columnTableDataSourceStack[columnTableDataSourceStack.length - 1] if (top) { - setTableDataSource(top); + setTableDataSource(top) } else { - let tableDataSource2 = generateDataSourceFromUnderlyingData(rootUnderlyingData); - setTableDataSource(tableDataSource2); + let tableDataSource = generateDataSourceFromUnderlyingData( + rootUnderlyingData! + ) + setTableDataSource(tableDataSource) } } - setDataStackLevel(dataStackLevel - 1); - }; + setDataStackLevel(dataStackLevel - 1) + } - const toPercentString = (percentNumber: number): string => { + const toPercentString = (percentNumber: number) => { if (isNaN(percentNumber)) { - return 'N/A'; + return 'N/A' } - return `${percentNumber.toFixed(2)}%`; - }; + return `${percentNumber.toFixed(2)}%` + } - const handleColumnSelectionChange = (value: [key: string]): void => { - let columns = value.map((x) => tableSourceColumnMap[x]).flat(); - let r = [...baseTableColumns, ...columns]; - setSelectedTableColumnsOptions(value); - setSelectedTableColumns(r); - }; + const handleColumnSelectionChange = (value: [key: string]) => { + let columns = value.map((x) => tableSourceColumnMap[x]).flat() + let r = [...baseTableColumns, ...columns] + setSelectedTableColumnsOptions(value) + setSelectedTableColumns(r) + } - const generateDataSourceFromUnderlyingData = (selectedUnderlyingData?: ColumnUnderlyingData): TableRow[] => { - if (!selectedUnderlyingData) { - return []; - } - let newTableDataSource: TableRow[] = []; + const generateDataSourceFromUnderlyingData = ( + selectedUnderlyingData: ColumnUnderlyingData + ) => { + let tableDataSource: TableRow[] = [] for (let i = 0; i < selectedUnderlyingData.leftAggs.length; i++) { - let left = selectedUnderlyingData.leftAggs[i]; - let right = selectedUnderlyingData.rightAggs[i]; + let left = selectedUnderlyingData.leftAggs[i] + let right = selectedUnderlyingData.rightAggs[i] - let deltaCallsPercentNumber = ((right.calls - left.calls) / left.calls) * 100; + let deltaCallsPercentNumber = + ((right.calls - left.calls) / left.calls) * 100 - let deltaHostDurationPercentNumber = ((right.host_duration - left.host_duration) / left.host_duration) * 100; + let deltaHostDurationPercentNumber = + ((right.host_duration - left.host_duration) / left.host_duration) * 100 let deltaSelfHostDurationPercentNumber = - ((right.self_host_duration - left.self_host_duration) / left.self_host_duration) * 100; + ((right.self_host_duration - left.self_host_duration) / + left.self_host_duration) * + 100 let deltaDeviceDurationPercentNumber = - ((right.device_duration - left.device_duration) / left.device_duration) * 100; + ((right.device_duration - left.device_duration) / + 
left.device_duration) * + 100 let deltaSelfDeviceDurationPercentNumber = - ((right.self_device_duration - left.self_device_duration) / left.self_device_duration) * 100; + ((right.self_device_duration - left.self_device_duration) / + left.self_device_duration) * + 100 - newTableDataSource.push({ + tableDataSource.push({ key: i, operator: left.name, baselineCalls: left.calls, @@ -686,194 +717,214 @@ export const DiffOverview: React.FC = (props: IProps) => { expHostDuration: right.host_duration, deltaHostDuration: parseFloat((right.host_duration - left.host_duration).toFixed(3)), deltaHostDurationPercentNumber: deltaHostDurationPercentNumber, - deltaHostDurationPercent: toPercentString(deltaHostDurationPercentNumber), + deltaHostDurationPercent: toPercentString( + deltaHostDurationPercentNumber + ), baselineSelfHostDuration: left.self_host_duration, expSelfHostDuration: right.self_host_duration, - deltaSelfHostDuration: parseFloat((right.self_host_duration - left.self_host_duration).toFixed(3)), + deltaSelfHostDuration: + parseFloat((right.self_host_duration - left.self_host_duration).toFixed(3)), deltaSelfHostDurationPercentNumber: deltaSelfHostDurationPercentNumber, - deltaSelfHostDurationPercent: toPercentString(deltaSelfHostDurationPercentNumber), + deltaSelfHostDurationPercent: toPercentString( + deltaSelfHostDurationPercentNumber + ), baselineDeviceDuration: left.device_duration, expDeviceDuration: right.device_duration, deltaDeviceDuration: parseFloat((right.device_duration - left.device_duration).toFixed(3)), deltaDeviceDurationPercentNumber: deltaDeviceDurationPercentNumber, - deltaDeviceDurationPercent: toPercentString(deltaDeviceDurationPercentNumber), + deltaDeviceDurationPercent: toPercentString( + deltaDeviceDurationPercentNumber + ), baselineSelfDeviceDuration: left.self_device_duration, expSelfDeviceDuration: right.self_device_duration, - deltaSelfDeviceDuration: parseFloat((right.self_device_duration - left.self_device_duration).toFixed(3)), + deltaSelfDeviceDuration: + parseFloat((right.self_device_duration - left.self_device_duration).toFixed(3)), deltaSelfDeviceDurationPercentNumber: deltaSelfDeviceDurationPercentNumber, - deltaSelfDeviceDurationPercent: toPercentString(deltaSelfDeviceDurationPercentNumber), - }); + deltaSelfDeviceDurationPercent: toPercentString( + deltaSelfDeviceDurationPercentNumber + ) + }) } - return newTableDataSource; - }; + return tableDataSource + } React.useEffect(() => { - const hasData = + if ( run.length > 0 && worker.length > 0 && span.length > 0 && expRun.length > 0 && expWorker.length > 0 && - expSpan.length > 0; - if (hasData) { - setLoading(true); + expSpan.length > 0 + ) { + setLoading(true) - columnChartDataStack = []; - stepChartDataStack = []; - columnUnderlyingDataStack = []; - columnTableDataSourceStack = []; + columnChartDataStack = [] + stepChartDataStack = [] + columnUnderlyingDataStack = [] + columnTableDataSourceStack = [] api.defaultApi .diffnodeGet(run, worker, span, expRun, expWorker, expSpan) .then((resp) => { - handleDiffNodeResp(resp); - let newRootUnderlyingData = { + handleDiffNodeResp(resp) + let rootUnderlyingData = { name: 'rootNode', path: resp.path, leftAggs: resp.left.aggs, - rightAggs: resp.right.aggs, - }; + rightAggs: resp.right.aggs + } - setRootUnderlyingData(newRootUnderlyingData); - let tableDataSource3 = generateDataSourceFromUnderlyingData(newRootUnderlyingData); - setTableDataSource(tableDataSource3); + setRootUnderlyingData(rootUnderlyingData) + let tableDataSource = 
generateDataSourceFromUnderlyingData( + rootUnderlyingData! + ) + setTableDataSource(tableDataSource) }) - .finally(() => setLoading(false)); + .finally(() => setLoading(false)) - setSelectedTableColumns([...baseTableColumns, ...hostDurationColumns]); + setSelectedTableColumns([...baseTableColumns, ...hostDurationColumns]) } - }, [run, worker, span, expRun, expWorker, expSpan]); - - const handleDiffNodeResp = (resp: any): void => { - let newColumnChartData: any[] = []; - let newStepChartData: any[] = []; - let underlyingData: ColumnUnderlyingData[] = []; - - newColumnChartData.push(['Call', 'Baseline', 'Experiment', 'Baseline Trend', 'Exp Trend']); - newStepChartData.push(['Call', 'Diff', 'Accumulated Diff']); + }, [run, worker, span, expRun, expWorker, expSpan]) + + const handleDiffNodeResp = (resp: any) => { + let columnChartData: any[] = [] + let stepChartData: any[] = [] + let underlyingData: ColumnUnderlyingData[] = [] + + columnChartData.push([ + 'Call', + 'Baseline', + 'Experiment', + 'Baseline Trend', + 'Exp Trend' + ]) + stepChartData.push(['Call', 'Diff', 'Accumulated Diff']) if (resp.children.length > 0) { - let accumulatedLeftDuration = 0; - let accumulatedRightDuration = 0; - let accumulatedStepDiff = 0; + let accumulated_left_duration = 0 + let accumulated_right_duration = 0 + let accumulated_step_diff = 0 for (let i = 0; i < resp.children.length; i++) { - let left = resp.children[i].left; - let right = resp.children[i].right; - let currColumn: any[] = []; - let currStep: any[] = []; + let left = resp.children[i].left + let right = resp.children[i].right + let currColumn: any[] = [] + let currStep: any[] = [] - let name = left.name; + let name = left.name if (name === COMPOSITE_NODES_NAME) { - continue; + continue } if (name.startsWith('aten::')) { // Ignore aten operators - continue; + continue } if (name.startsWith('enumerate(DataLoader)')) { - name = name.substring(21); + name = name.substring(21) } if (name.startsWith('enumerate(DataPipe)')) { - name = name.substring(19); + name = name.substring(19) } if (name.startsWith('nn.Module: ')) { - name = name.substring(11); + name = name.substring(11) } if (name.startsWith('Optimizer.zero_grad')) { - name = 'Optimizer.zero_grad'; + name = 'Optimizer.zero_grad' } if (name.startsWith('Optimizer.step')) { - name = 'Optimizer.step'; + name = 'Optimizer.step' } - currColumn.push(name); - currColumn.push(left.total_duration); - currColumn.push(right.total_duration); + currColumn.push(name) + currColumn.push(left.total_duration) + currColumn.push(right.total_duration) - accumulatedLeftDuration += left.total_duration; - currColumn.push(accumulatedLeftDuration); + accumulated_left_duration += left.total_duration + currColumn.push(accumulated_left_duration) - accumulatedRightDuration += right.total_duration; - currColumn.push(accumulatedRightDuration); - newColumnChartData.push(currColumn); + accumulated_right_duration += right.total_duration + currColumn.push(accumulated_right_duration) + columnChartData.push(currColumn) underlyingData.push({ name: name, path: resp.children[i].path, leftAggs: left.aggs, - rightAggs: right.aggs, - }); + rightAggs: right.aggs + }) - currStep.push(name); - let stepDiff = right.total_duration - left.total_duration; - currStep.push(stepDiff); + currStep.push(name) + let stepDiff = right.total_duration - left.total_duration + currStep.push(stepDiff) - accumulatedStepDiff += stepDiff; - currStep.push(accumulatedStepDiff); + accumulated_step_diff += stepDiff + currStep.push(accumulated_step_diff) - 
newStepChartData.push(currStep); + stepChartData.push(currStep) } } else { - let left = resp.left; - let right = resp.right; - let currColumn: any[] = []; - let currStep: any[] = []; - let name = left.name; + let left = resp.left + let right = resp.right + let currColumn: any[] = [] + let currStep: any[] = [] + let name = left.name if (name.startsWith('nn.Module: ')) { - name = name.substring(11); + name = name.substring(11) } - currColumn.push(name); - currColumn.push(left.total_duration); - currColumn.push(right.total_duration); - currColumn.push(left.total_duration); - currColumn.push(right.total_duration); + currColumn.push(name) + currColumn.push(left.total_duration) + currColumn.push(right.total_duration) + currColumn.push(left.total_duration) + currColumn.push(right.total_duration) - newColumnChartData.push(currColumn); + columnChartData.push(currColumn) - currStep.push(name); - let stepDiff = right.total_duration - left.total_duration; - currStep.push(stepDiff); - currStep.push(stepDiff); - newStepChartData.push(currStep); + currStep.push(name) + let stepDiff = right.total_duration - left.total_duration + currStep.push(stepDiff) + currStep.push(stepDiff) + stepChartData.push(currStep) } - setColumnChartData(newColumnChartData); - columnChartDataStack.push(newColumnChartData); + setColumnChartData(columnChartData) + columnChartDataStack.push(columnChartData) + + setStepChartData(stepChartData) + stepChartDataStack.push(stepChartData) - setStepChartData(newStepChartData); - stepChartDataStack.push(newStepChartData); + setColumnUnderlyingData(underlyingData) + columnUnderlyingDataStack.push(underlyingData) - setColumnUnderlyingData(underlyingData); - columnUnderlyingDataStack.push(underlyingData); + setDataStackLevel(columnChartDataStack.length) + } - setDataStackLevel(columnChartDataStack.length); - }; // #endregion + // #endregion if (!loading && columnUnderlyingDataStack.length === 0) { return ( - - + + There is no run selected for diff. - ); + ) } if (loading) { - return ; + return } return ( @@ -881,62 +932,73 @@ export const DiffOverview: React.FC = (props: IProps) => { - - + + {columnChartData.length > 1 && ( <> - + )} - {columnChartData.length === 1 && No more level to show.} + {columnChartData.length === 1 && ( + No more level to show. + )} - - + +   - +
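// ---------------------------------------------------------------------------------------------
// Editor's sketch (illustrative, not part of the patch): the handleDiffNodeResp hunk above
// strips framework prefixes from operator names before charting; the magic numbers 21, 19 and
// 11 are exactly the lengths of the prefixes being removed. The helper below restates that
// logic with the lengths derived from the prefixes themselves. Its name is hypothetical.
const STRIP_PREFIXES = ['enumerate(DataLoader)', 'enumerate(DataPipe)', 'nn.Module: '];
const COLLAPSE_PREFIXES = ['Optimizer.zero_grad', 'Optimizer.step'];

function normalizeOperatorName(name: string): string {
  for (const prefix of STRIP_PREFIXES) {
    if (name.startsWith(prefix)) {
      return name.substring(prefix.length); // e.g. 'nn.Module: Linear' -> 'Linear'
    }
  }
  for (const prefix of COLLAPSE_PREFIXES) {
    if (name.startsWith(prefix)) {
      return prefix; // collapse variants such as 'Optimizer.step#SGD.step' into one bucket
    }
  }
  return name;
}
// ---------------------------------------------------------------------------------------------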
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DistributedView.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DistributedView.tsx index 096501b61bc9ce41978c65dc24f6b3640ab960f3..aad14aa29828fa1a8886ab3f68c54dd62cd396f9 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DistributedView.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/DistributedView.tsx @@ -2,54 +2,54 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import Card from '@material-ui/core/Card'; -import CardContent from '@material-ui/core/CardContent'; -import CardHeader from '@material-ui/core/CardHeader'; -import Grid from '@material-ui/core/Grid'; -import InputLabel from '@material-ui/core/InputLabel'; -import MenuItem from '@material-ui/core/MenuItem'; -import Select, { SelectProps } from '@material-ui/core/Select'; -import { makeStyles } from '@material-ui/core/styles'; -import { Table } from 'antd'; -import { ColumnsType } from 'antd/es/table'; -import * as React from 'react'; -import * as api from '../api'; -import { DistributedGraph, GpuInfo, Graph } from '../api'; -import { firstOrUndefined } from '../utils'; -import { ColumnChart } from './charts/ColumnChart'; -import { DataLoading } from './DataLoading'; -import { GpuInfoTable } from './GpuInfoTable'; -import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers'; +import Card from '@material-ui/core/Card' +import CardContent from '@material-ui/core/CardContent' +import CardHeader from '@material-ui/core/CardHeader' +import Grid from '@material-ui/core/Grid' +import InputLabel from '@material-ui/core/InputLabel' +import MenuItem from '@material-ui/core/MenuItem' +import Select, { SelectProps } from '@material-ui/core/Select' +import { makeStyles } from '@material-ui/core/styles' +import { Table } from 'antd' +import { ColumnsType } from 'antd/es/table' +import * as React from 'react' +import * as api from '../api' +import { DistributedGraph, GpuInfo, Graph } from '../api' +import { firstOrUndefined } from '../utils' +import { ColumnChart } from './charts/ColumnChart' +import { DataLoading } from './DataLoading' +import { GpuInfoTable } from './GpuInfoTable' +import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers' import { - distributedCommopsTableTooltip, - distributedGpuInfoTableTooltip, - distributedOverlapGraphTooltip, - distributedWaittimeGraphTooltip, -} from './TooltipDescriptions'; + DistributedCommopsTableTooltip, + DistributedGpuInfoTableTooltip, + DistributedOverlapGraphTooltip, + DistributedWaittimeGraphTooltip +} from './TooltipDescriptions' export interface IProps { - run: string; - worker: string; - span: string; + run: string + worker: string + span: string } const useStyles = makeStyles((theme) => ({ root: { - flexGrow: 1, + flexGrow: 1 }, verticalInput: { display: 'flex', - alignItems: 'center', + alignItems: 'center' }, inputWidth: { - width: '4em', + width: '4em' }, inputWidthOverflow: { minWidth: '15em', - whiteSpace: 'nowrap', + whiteSpace: 'nowrap' }, description: { - marginLeft: theme.spacing(1), + marginLeft: theme.spacing(1) }, table: { height: '100%', @@ -58,152 +58,165 @@ const useStyles = makeStyles((theme) => ({ height: 20, fontSize: '10pt', '& > td': { - padding: '0 8px!important', - }, - }, - }, -})); + padding: '0 8px!important' + } + } + } +})) export const DistributedView: React.FC = (props) => { - const 
tooltipCommonClasses = useTooltipCommonStyles(); + const tooltipCommonClasses = useTooltipCommonStyles() const chartHeaderRenderer = React.useMemo( () => makeChartHeaderRenderer(tooltipCommonClasses), [tooltipCommonClasses] - ); + ) - let { run, worker, span } = props; - const classes = useStyles(); + let { run, worker, span } = props + const classes = useStyles() - const [overlapGraph, setOverlapGraph] = React.useState(undefined); - const [waittimeGraph, setWaittimeGraph] = React.useState(undefined); - const [commopsTableData, setCommopsTableData] = React.useState(undefined); - const [gpuInfo, setGpuInfo] = React.useState(undefined); - const [commopsTableTitle, setCommopsTableTitle] = React.useState(''); - const [commopsWorkers, setCommopsWorkers] = React.useState([]); - const [overlapSteps, setOverlapSteps] = React.useState([]); - const [waittimeSteps, setWaittimeSteps] = React.useState([]); - const [overlapStep, setOverlapStep] = React.useState(''); - const [waittimeStep, setWaittimeStep] = React.useState(''); - const [commopsWorker, setCommopsWorker] = React.useState(''); - const [columns, setColumns] = React.useState>([]); - const [pageSize, setPageSize] = React.useState(30); + const [overlapGraph, setOverlapGraph] = React.useState< + DistributedGraph | undefined + >(undefined) + const [waittimeGraph, setWaittimeGraph] = React.useState< + DistributedGraph | undefined + >(undefined) + const [commopsTableData, setCommopsTableData] = React.useState< + any | undefined + >(undefined) + const [gpuInfo, setGpuInfo] = React.useState(undefined) + const [commopsTableTitle, setCommopsTableTitle] = React.useState('') + const [commopsWorkers, setCommopsWorkers] = React.useState([]) + const [overlapSteps, setOverlapSteps] = React.useState([]) + const [waittimeSteps, setWaittimeSteps] = React.useState([]) + const [overlapStep, setOverlapStep] = React.useState('') + const [waittimeStep, setWaittimeStep] = React.useState('') + const [commopsWorker, setCommopsWorker] = React.useState('') + const [columns, setColumns] = React.useState>([]) + const [pageSize, setPageSize] = React.useState(30) React.useEffect(() => { if (waittimeSteps.includes('all')) { - setWaittimeStep('all'); + setWaittimeStep('all') } else { - setWaittimeStep(firstOrUndefined(waittimeSteps) ?? ''); + setWaittimeStep(firstOrUndefined(waittimeSteps) ?? '') } - }, [waittimeSteps]); + }, [waittimeSteps]) React.useEffect(() => { if (overlapSteps.includes('all')) { - setOverlapStep('all'); + setOverlapStep('all') } else { - setOverlapStep(firstOrUndefined(overlapSteps) ?? ''); + setOverlapStep(firstOrUndefined(overlapSteps) ?? '') } - }, [overlapSteps]); + }, [overlapSteps]) React.useEffect(() => { - setCommopsWorker(firstOrUndefined(commopsWorkers) ?? ''); - }, [commopsWorkers]); + setCommopsWorker(firstOrUndefined(commopsWorkers) ?? 
'') + }, [commopsWorkers]) React.useEffect(() => { api.defaultApi.distributedOverlapGet(run, 'All', span).then((resp) => { - setOverlapGraph(resp); - setOverlapSteps(Object.keys(resp.data)); - }); + setOverlapGraph(resp) + setOverlapSteps(Object.keys(resp.data)) + }) api.defaultApi.distributedWaittimeGet(run, 'All', span).then((resp) => { - setWaittimeGraph(resp); - setWaittimeSteps(Object.keys(resp.data)); - }); + setWaittimeGraph(resp) + setWaittimeSteps(Object.keys(resp.data)) + }) api.defaultApi.distributedCommopsGet(run, 'All', span).then((resp) => { - setCommopsTableData(resp.data); - setCommopsWorkers(Object.keys(resp.data)); - setCommopsTableTitle(resp.metadata.title); - }); + setCommopsTableData(resp.data) + setCommopsWorkers(Object.keys(resp.data)) + setCommopsTableTitle(resp.metadata.title) + }) api.defaultApi.distributedGpuinfoGet(run, 'All', span).then((resp) => { - setGpuInfo(resp); - }); - }, [run, worker, span]); + setGpuInfo(resp) + }) + }, [run, worker, span]) const onCommopsWorkerChanged: SelectProps['onChange'] = (event) => { - setCommopsWorker(event.target.value as string); - }; + setCommopsWorker(event.target.value as string) + } const onOverlapStepChanged: SelectProps['onChange'] = (event) => { - setOverlapStep(event.target.value as string); - }; + setOverlapStep(event.target.value as string) + } const onWaittimeStepChanged: SelectProps['onChange'] = (event) => { - setWaittimeStep(event.target.value as string); - }; + setWaittimeStep(event.target.value as string) + } - const getColumnChartData = (distributedGraph?: DistributedGraph, step?: string): any => { - if (!distributedGraph || !step) { - return undefined; - } - const barLabels = Object.keys(distributedGraph.data[step]); + const getColumnChartData = ( + distributedGraph?: DistributedGraph, + step?: string + ) => { + if (!distributedGraph || !step) return undefined + const barLabels = Object.keys(distributedGraph.data[step]) return { legends: distributedGraph.metadata.legends, barLabels, - barHeights: barLabels.map((label) => distributedGraph.data[step][label]), - }; - }; - const overlapData = React.useMemo(() => getColumnChartData(overlapGraph, overlapStep), [overlapGraph, overlapStep]); + barHeights: barLabels.map((label) => distributedGraph.data[step][label]) + } + } + const overlapData = React.useMemo( + () => getColumnChartData(overlapGraph, overlapStep), + [overlapGraph, overlapStep] + ) const waittimeData = React.useMemo( () => getColumnChartData(waittimeGraph, waittimeStep), [waittimeGraph, waittimeStep] - ); + ) - const getTableData = (tableData?: any, opsWorker?: string): any[] => { - if (!tableData || !opsWorker) { - return []; + const getTableData = (tableData?: any, worker?: string) => { + if (!tableData || !worker) { + return [] } - let dataInfo: api.Graph = tableData[opsWorker]; - const stringCompare = (a: string, b: string): number => a.localeCompare(b); - const numberCompare = (a: number, b: number): number => a - b; - let column: any[] = dataInfo.columns.map((item) => { + let dataInfo: api.Graph = tableData[worker] + const stringCompare = (a: string, b: string) => a.localeCompare(b) + const numberCompare = (a: number, b: number) => a - b + let column: any[] = dataInfo.columns.map(item => { return { title: item.name, key: item.name, dataIndex: item.name, - sorter: - item.type === 'string' - ? 
(a: any, b: any): number => stringCompare(a[item.name], b[item.name]) - : (a: any, b: any): number => numberCompare(a[item.name], b[item.name]), - }; - }); - setColumns(column); + sorter: item.type == 'string' ? (a: any, b: any) => stringCompare(a[item.name], b[item.name]) + : (a: any, b: any) => numberCompare(a[item.name], b[item.name]) + } + }) + setColumns(column) return dataInfo.rows.map((row, index) => { if (row.length !== dataInfo.columns.length) { - return null; + return null } - const dataRow: { [column: string]: number | string } = { key: index }; - dataInfo.columns.forEach((item, idx) => { - dataRow[item.name] = row[idx] as string | number; - }); - return dataRow; - }); - }; + const dataRow: { [column: string]: number | string } = { key: index } + dataInfo.columns.forEach((column, index) => { + dataRow[column.name] = row[index] as string | number + }) + return dataRow + }) + } const commopsTable: any[] = React.useMemo(() => { - return getTableData(commopsTableData, commopsWorker); - }, [commopsTableData, commopsWorker]); + return getTableData(commopsTableData, commopsWorker) + }, [commopsTableData, commopsWorker]) - const onShowSizeChange = (current: number, size: number): void => { - setPageSize(size); - }; + const onShowSizeChange = (current: number, size: number) => { + setPageSize(size) + } return (
- - + + {gpuInfo && ( - + @@ -212,15 +225,19 @@ export const DistributedView: React.FC = (props) => { )} - {(chartData): JSX.Element => ( + {(chartData) => ( - + - Step + Step - {overlapSteps.map((step) => ( {step} ))} @@ -230,25 +247,35 @@ export const DistributedView: React.FC = (props) => { {overlapGraph?.metadata?.title && ( )} - + )} - {(chartData): JSX.Element => ( + {(chartData) => ( - + - Step + Step - {waittimeSteps.map((step) => ( {step} ))} @@ -258,7 +285,10 @@ export const DistributedView: React.FC = (props) => { {waittimeGraph?.metadata?.title && ( )} = (props) => { - - + + - + - Worker + Worker - + {commopsWorkers.map((worker) => ( + {worker} ))} @@ -299,7 +338,7 @@ export const DistributedView: React.FC = (props) => { pageSize, pageSizeOptions: ['20', '30', '50', '100'], hideOnSinglePage: true, - onShowSizeChange, + onShowSizeChange }} /> @@ -309,5 +348,5 @@ export const DistributedView: React.FC = (props) => {
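// ---------------------------------------------------------------------------------------------
// Editor's sketch (illustrative, not part of the patch): getTableData above builds antd columns
// from the response's column metadata and chooses a comparator by the declared column type.
// The same idea in isolation; ApiColumn and Row are assumed shapes matching api.Graph.
import type { ColumnsType } from 'antd/es/table';

interface ApiColumn {
  name: string;
  type: 'string' | 'number';
}
type Row = Record<string, string | number>;

function buildSortableColumns(columns: ApiColumn[]): ColumnsType<Row> {
  return columns.map((col) => ({
    title: col.name,
    key: col.name,
    dataIndex: col.name,
    // string columns sort lexicographically, numeric columns arithmetically
    sorter:
      col.type === 'string'
        ? (a: Row, b: Row): number => String(a[col.name]).localeCompare(String(b[col.name]))
        : (a: Row, b: Row): number => Number(a[col.name]) - Number(b[col.name]),
  }));
}
// ---------------------------------------------------------------------------------------------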
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/FullCircularProgress.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/FullCircularProgress.tsx index 3f4c0fbaf15a15d402aa205574a28df045d24aec..5212bd74bf9739cc171d369e6591a0c26f058f6a 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/FullCircularProgress.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/FullCircularProgress.tsx @@ -1,23 +1,23 @@ /*--------------------------------------------------------------------------------------------- * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import CircularProgress from '@material-ui/core/CircularProgress'; -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; +import CircularProgress from '@material-ui/core/CircularProgress' +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' const useStyles = makeStyles(() => ({ root: { width: '100%', display: 'flex', - justifyContent: 'center', - }, -})); + justifyContent: 'center' + } +})) export const FullCircularProgress: React.FC = () => { - const classes = useStyles(); + const classes = useStyles() return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/GpuInfoTable.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/GpuInfoTable.tsx index 07f6f1d78c88abab5f62f844356b47ca517a2561..4c624db0580caa466271e56505f2838637705884 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/GpuInfoTable.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/GpuInfoTable.tsx @@ -2,123 +2,127 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' export interface IProps { - gpuInfo: any; + gpuInfo: any } const useStyles = makeStyles((theme) => ({ root: { border: '1px solid #E0E0E0', borderCollapse: 'collapse', - width: '100%', + width: '100%' }, td: { borderTop: '1px solid #E0E0E0', borderBottom: '1px solid #E0E0E0', borderCollapse: 'collapse', paddingLeft: 10, - paddingRight: 10, + paddingRight: 10 }, nodeTd: { - fontWeight: 'bold', + fontWeight: 'bold' }, pidTd: { - fontWeight: 'normal', + fontWeight: 'normal' }, gpuTd: { - fontWeight: 'normal', + fontWeight: 'normal' }, keyTd: { fontWeight: 'normal', - textAlign: 'right', + textAlign: 'right' }, valueTd: { - fontWeight: 'bold', - }, -})); + fontWeight: 'bold' + } +})) interface TableCellInfo { - content: string; - rowspan: number; - cellType: 'node' | 'pid' | 'gpu' | 'key' | 'value'; - last?: boolean; + content: string + rowspan: number + cellType: 'node' | 'pid' | 'gpu' | 'key' | 'value' + last?: boolean } function makeTableCellInfo(gpuInfo: any): TableCellInfo[][] { - const rows: TableCellInfo[][] = []; - let currRow: TableCellInfo[] = []; - rows.push(currRow); - Object.keys(gpuInfo.data).forEach((nodeName) => { - const nodeCell = { - content: nodeName, + const rows: TableCellInfo[][] = [] + let curr_row: TableCellInfo[] = [] + rows.push(curr_row) + Object.keys(gpuInfo.data).forEach(function (node_name) { + const node_cell = { + content: node_name, rowspan: 0, - cellType: 'node' as const, - }; - const i = rows.length; - currRow.push(nodeCell); - Object.keys(gpuInfo.data[nodeName]).forEach((pid) => { - const pidCell = { content: pid, rowspan: 0, cellType: 'pid' as const }; - const j = rows.length; - currRow.push(pidCell); - Object.keys(gpuInfo.data[nodeName][pid]).forEach((gpu) => { - const gpuCell = { content: gpu, rowspan: 0, cellType: 'gpu' as const }; - const k = rows.length; - currRow.push(gpuCell); - Object.keys(gpuInfo.data[nodeName][pid][gpu]).forEach((keyName) => { - currRow.push({ - content: keyName, + cellType: 'node' as const + } + const i = rows.length + curr_row.push(node_cell) + Object.keys(gpuInfo.data[node_name]).forEach(function (pid) { + const pid_cell = { content: pid, rowspan: 0, cellType: 'pid' as const } + const i = rows.length + curr_row.push(pid_cell) + Object.keys(gpuInfo.data[node_name][pid]).forEach(function (gpu) { + const gpu_cell = { content: gpu, rowspan: 0, cellType: 'gpu' as const } + const i = rows.length + curr_row.push(gpu_cell) + Object.keys(gpuInfo.data[node_name][pid][gpu]).forEach(function ( + key_name + ) { + curr_row.push({ + content: key_name, rowspan: 1, - cellType: 'key' as const, - }); - const value: string = gpuInfo.data[nodeName][pid][gpu][keyName]; - currRow.push({ + cellType: 'key' as const + }) + const value: string = 
gpuInfo.data[node_name][pid][gpu][key_name] + curr_row.push({ content: value, rowspan: 1, - cellType: 'value' as const, - }); - currRow = []; - rows.push(currRow); - }); - gpuCell.rowspan = rows.length - k; - }); - pidCell.rowspan = rows.length - j; - }); - nodeCell.rowspan = rows.length - i; - }); - rows.pop(); - return rows; + cellType: 'value' as const + }) + curr_row = [] + rows.push(curr_row) + }) + gpu_cell.rowspan = rows.length - i + }) + pid_cell.rowspan = rows.length - i + }) + node_cell.rowspan = rows.length - i + }) + rows.pop() + return rows } export const GpuInfoTable: React.FC = (props) => { - const classes = useStyles(); - interface TableCellInfoNoLast { - content: string; - rowspan: number; - cellType: 'node' | 'pid' | 'gpu' | 'key' | 'value'; + const classes = useStyles() + interface TableCellInfo { + content: string + rowspan: number + cellType: 'node' | 'pid' | 'gpu' | 'key' | 'value' } - const rows = React.useMemo(() => makeTableCellInfo(props.gpuInfo), [props.gpuInfo]); + const rows = React.useMemo(() => makeTableCellInfo(props.gpuInfo), [ + props.gpuInfo + ]) const cellToClass = { node: classes.nodeTd, pid: classes.pidTd, gpu: classes.gpuTd, key: classes.keyTd, - value: classes.valueTd, - }; + value: classes.valueTd + } - const renderCell = function (info: TableCellInfoNoLast): JSX.Element { - let cellClass = cellToClass[info.cellType]; - let content = info.cellType === 'key' ? `${info.content}:` : info.content; + const renderCell = function (info: TableCellInfo) { + let cellClass = cellToClass[info.cellType] + let content = info.cellType == 'key' ? info.content + ':' : info.content return ( -
- ); - }; + ) + } return (
+ {content}
@@ -126,5 +130,5 @@ export const GpuInfoTable: React.FC = (props) => { {row.map(renderCell)} ))}
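// ---------------------------------------------------------------------------------------------
// Editor's sketch (illustrative, not part of the patch): makeTableCellInfo above flattens the
// nested node -> pid -> gpu -> {key: value} object into physical table rows, where each group
// cell's rowspan is "rows created while its subtree was being visited". Reduced to one level:
interface Cell {
  content: string;
  rowspan: number;
}

function flattenWithRowspan(data: Record<string, Record<string, string>>): Cell[][] {
  const rows: Cell[][] = [];
  let curr: Cell[] = [];
  rows.push(curr);
  for (const [group, entries] of Object.entries(data)) {
    const groupCell: Cell = { content: group, rowspan: 0 };
    const mark = rows.length; // row count before descending into this group
    curr.push(groupCell);
    for (const [key, value] of Object.entries(entries)) {
      curr.push({ content: `${key}:`, rowspan: 1 });
      curr.push({ content: value, rowspan: 1 });
      curr = []; // each key/value pair closes a physical row
      rows.push(curr);
    }
    groupCell.rowspan = rows.length - mark; // rows spanned by the group cell
  }
  rows.pop(); // drop the trailing empty row
  return rows;
}
// ---------------------------------------------------------------------------------------------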
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Kernel.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Kernel.tsx index 66e05695153a853f68d382a2f3b6a68931861abf..62ec350b8b400a03bd64c032ee2a61a4ca9a1852 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Kernel.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Kernel.tsx @@ -15,183 +15,208 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. *--------------------------------------------------------------------------------------------*/ -import Card from '@material-ui/core/Card'; -import CardContent from '@material-ui/core/CardContent'; -import CardHeader from '@material-ui/core/CardHeader'; -import FormControlLabel from '@material-ui/core/FormControlLabel'; -import Grid from '@material-ui/core/Grid'; -import InputLabel from '@material-ui/core/InputLabel'; -import MenuItem from '@material-ui/core/MenuItem'; -import Radio from '@material-ui/core/Radio'; -import RadioGroup, { RadioGroupProps } from '@material-ui/core/RadioGroup'; -import Select, { SelectProps } from '@material-ui/core/Select'; -import { makeStyles } from '@material-ui/core/styles'; -import TextField, { StandardTextFieldProps, TextFieldProps } from '@material-ui/core/TextField'; -import * as React from 'react'; -import * as api from '../api'; -import { Graph } from '../api'; -import { KernelGroupBy } from '../constants/groupBy'; -import { useSearch } from '../utils/search'; -import { topIsValid, UseTop, useTopN } from '../utils/top'; -import { AntTableChart } from './charts/AntTableChart'; -import { PieChart } from './charts/PieChart'; -import { DataLoading } from './DataLoading'; -import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers'; +import Card from '@material-ui/core/Card' +import CardContent from '@material-ui/core/CardContent' +import CardHeader from '@material-ui/core/CardHeader' +import FormControlLabel from '@material-ui/core/FormControlLabel' +import Grid from '@material-ui/core/Grid' +import InputLabel from '@material-ui/core/InputLabel' +import MenuItem from '@material-ui/core/MenuItem' +import Radio from '@material-ui/core/Radio' +import RadioGroup, { RadioGroupProps } from '@material-ui/core/RadioGroup' +import Select, { SelectProps } from '@material-ui/core/Select' +import { makeStyles } from '@material-ui/core/styles' +import TextField, { + StandardTextFieldProps, + TextFieldProps +} from '@material-ui/core/TextField' +import * as React from 'react' +import * as api from '../api' +import { Graph } from '../api' +import { KernelGroupBy } from '../constants/groupBy' +import { useSearch } from '../utils/search' +import { topIsValid, UseTop, useTopN } from '../utils/top' +import { AntTableChart } from './charts/AntTableChart' +import { PieChart } from './charts/PieChart' +import { DataLoading } from './DataLoading' +import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers' import { - gpuKernelTotalTimeTooltip, - tensorCoresPieChartTooltip, - tensorCoresPieChartTooltipAscend, -} from './TooltipDescriptions'; + GPUKernelTotalTimeTooltip, + TensorCoresPieChartTooltip, + TensorCoresPieChartTooltipAscend +} from './TooltipDescriptions' export interface IProps { - run: string; - worker: string; - span: string; - deviceTarget: string; + run: string + 
worker: string + span: string + deviceTarget: string } const useStyles = makeStyles((theme) => ({ root: { - flexGrow: 1, + flexGrow: 1 }, verticalInput: { display: 'flex', - alignItems: 'center', + alignItems: 'center' }, inputWidth: { - width: '4em', + width: '4em' }, inputWidthOverflow: { minWidth: '15em', - whiteSpace: 'nowrap', + whiteSpace: 'nowrap' }, description: { - marginLeft: theme.spacing(1), - }, -})); + marginLeft: theme.spacing(1) + } +})) export const Kernel: React.FC = (props) => { - const { run, worker, span, deviceTarget } = props; - const classes = useStyles(); - const tooltipCommonClasses = useTooltipCommonStyles(); + const { run, worker, span, deviceTarget } = props + const classes = useStyles() + const tooltipCommonClasses = useTooltipCommonStyles() const chartHeaderRenderer = React.useMemo( () => makeChartHeaderRenderer(tooltipCommonClasses), [tooltipCommonClasses] - ); + ) - const [kernelGraph, setKernelGraph] = React.useState(undefined); - const [tcGraph, setTcGraph] = React.useState(undefined); - const [kernelTable, setKernelTable] = React.useState(undefined); - const [groupBy, setGroupBy] = React.useState(KernelGroupBy.KERNEL); - const [searchKernelName, setSearchKernelName] = React.useState(''); - const [searchOpName, setSearchOpName] = React.useState(''); - const [sortColumn, setSortColumn] = React.useState(''); - const [hasStep, setHasStep] = React.useState(false); + const [kernelGraph, setKernelGraph] = React.useState( + undefined + ) + const [tcGraph, setTcGraph] = React.useState(undefined) + const [kernelTable, setKernelTable] = React.useState( + undefined + ) + const [groupBy, setGroupBy] = React.useState(KernelGroupBy.Kernel) + const [searchKernelName, setSearchKernelName] = React.useState('') + const [searchOpName, setSearchOpName] = React.useState('') + const [sortColumn, setSortColumn] = React.useState('') + const [hasStep, setHasStep] = React.useState(false) const [topText, actualTop, useTop, setTopText, setUseTop] = useTopN({ - defaultUseTop: UseTop.USE, - defaultTop: 10, - }); + defaultUseTop: UseTop.Use, + defaultTop: 10 + }) React.useEffect(() => { - setSearchOpName(''); - }, [groupBy]); + setSearchOpName('') + }, [groupBy]) React.useEffect(() => { if (kernelGraph) { - setTopText(String(Math.min(kernelGraph.rows?.length, 10))); + setTopText(String(Math.min(kernelGraph.rows?.length, 10))) } - }, [kernelGraph]); + }, [kernelGraph]) React.useEffect(() => { api.defaultApi.kernelTableGet(run, worker, span, groupBy).then((resp) => { - setSortColumn(resp.metadata.sort); - setKernelTable(resp.data); - const nameColumnIdx = resp.data.columns.findIndex((c) => c.name.toLowerCase() === 'step id'); - setHasStep(nameColumnIdx > -1); - }); - }, [run, worker, span, groupBy]); + setSortColumn(resp.metadata.sort) + setKernelTable(resp.data) + const nameColumnIdx = resp.data.columns.findIndex( + (c) => c.name.toLowerCase() === 'step id' + ) + setHasStep(nameColumnIdx > -1) + }) + }, [run, worker, span, groupBy]) React.useEffect(() => { - api.defaultApi.kernelGet(run, worker, span, KernelGroupBy.KERNEL).then((resp) => { - setKernelGraph(resp.total); - setGroupBy(resp.device_target === 'Ascend' ? KernelGroupBy.KERNEL_NAME_AND_OP_NAME : KernelGroupBy.KERNEL); - }); - }, [run, worker, span]); + api.defaultApi + .kernelGet(run, worker, span, KernelGroupBy.Kernel) + .then((resp) => { + setKernelGraph(resp.total) + setGroupBy(resp.device_target === 'Ascend' ? 
KernelGroupBy.KernelNameAndOpName : KernelGroupBy.Kernel) + }) + }, [run, worker, span]) React.useEffect(() => { api.defaultApi.kernelTcPieGet(run, worker, span).then((resp) => { - setTcGraph(resp.total); - }); - }, [run, worker, span]); + setTcGraph(resp.total) + }) + }, [run, worker, span]) - const [searchedKernelTable] = useSearch(searchKernelName, 'name', kernelTable); + const [searchedKernelTable] = useSearch(searchKernelName, 'name', kernelTable) const [searchedOpTable] = useSearch( searchOpName, deviceTarget === 'Ascend' ? 'step id' : 'operator', searchedKernelTable - ); + ) const onGroupByChanged: SelectProps['onChange'] = (event) => { - setGroupBy(event.target.value as KernelGroupBy); - }; + setGroupBy(event.target.value as KernelGroupBy) + } const onSearchKernelChanged: TextFieldProps['onChange'] = (event) => { - setSearchKernelName(event.target.value as string); - }; + setSearchKernelName(event.target.value as string) + } const onSearchOpChanged: TextFieldProps['onChange'] = (event) => { - setSearchOpName(event.target.value as string); - }; + setSearchOpName(event.target.value as string) + } const onUseTopChanged: RadioGroupProps['onChange'] = (event) => { - setUseTop(event.target.value as UseTop); - }; + setUseTop(event.target.value as UseTop) + } - const onTopChanged = (event: React.ChangeEvent): void => { - setTopText(event.target.value); - }; + const onTopChanged = (event: React.ChangeEvent) => { + setTopText(event.target.value) + } const inputProps: StandardTextFieldProps['inputProps'] = { - min: 1, - }; + min: 1 + } const GPUKernelTotalTimeTitle = React.useMemo( - () => chartHeaderRenderer('Total Time (us)', gpuKernelTotalTimeTooltip), + () => chartHeaderRenderer('Total Time (us)', GPUKernelTotalTimeTooltip), [chartHeaderRenderer] - ); + ) const TensorCoresTitle = React.useMemo( - () => - deviceTarget === 'Ascend' - ? chartHeaderRenderer('Accelerator Core Utilization', tensorCoresPieChartTooltipAscend) - : chartHeaderRenderer('Tensor Cores Utilization', tensorCoresPieChartTooltip), + () => deviceTarget === 'Ascend' ? + chartHeaderRenderer( + 'Accelerator Core Utilization', + TensorCoresPieChartTooltipAscend + ) + : + chartHeaderRenderer( + 'Tensor Cores Utilization', + TensorCoresPieChartTooltip + ), [chartHeaderRenderer, deviceTarget] - ); + ) return (
- - + + - } label='All kernels' /> - } label='Top kernels to show' /> + } + label="All kernels" + /> + } + label="Top kernels to show" + /> - {useTop === UseTop.USE && ( + {useTop === UseTop.Use && ( = (props) => { - {(graph): JSX.Element => ( + {(graph) => ( - + )} - {(graph): JSX.Element => ( + {(graph) => ( = (props) => { graph={graph} colors={['#0099C6', '#DD4477', '#66AA00', '#B82E2E']} top={actualTop} - tooltipMode='percentage' + tooltip_mode="percentage" /> )} - + - + - Group By - + {deviceTarget === 'Ascend' ? 'Statistic' : 'Kernel Properties + Op Name'} - + {deviceTarget === 'Ascend' ? 'All' : 'Kernel Name'} @@ -246,49 +279,50 @@ export const Kernel: React.FC = (props) => { classes={{ root: classes.inputWidthOverflow }} value={searchKernelName} onChange={onSearchKernelChanged} - type='search' - label='Search by Name' + type="search" + label="Search by Name" inputProps={{ - maxLength: 200, + maxLength: 200 }} /> - {deviceTarget === 'Ascend' - ? groupBy === KernelGroupBy.KERNEL && - hasStep && ( - - - - ) - : groupBy === KernelGroupBy.KERNEL_NAME_AND_OP_NAME && ( - - - - )} + {deviceTarget === 'Ascend' ? + (groupBy === KernelGroupBy.Kernel && hasStep && + + + ) + : + (groupBy === KernelGroupBy.KernelNameAndOpName && + + + ) + } - {(graph): JSX.Element => } + {(graph) => ( + + )} @@ -297,5 +331,5 @@ export const Kernel: React.FC = (props) => {
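// ---------------------------------------------------------------------------------------------
// Editor's sketch (illustrative, not part of the patch): the two useSearch calls above are
// chained, so the op-name search filters the output of the kernel-name search and both text
// boxes narrow the same table. A framework-free version of that composition, assuming
// useSearch performs a case-insensitive substring match on the named column:
interface TableData {
  columns: Array<{ name: string }>;
  rows: Array<Array<string | number>>;
}

function searchBy(table: TableData, columnName: string, needle: string): TableData {
  if (!needle) {
    return table;
  }
  const idx = table.columns.findIndex((c) => c.name.toLowerCase() === columnName);
  if (idx < 0) {
    return table;
  }
  const lowered = needle.toLowerCase();
  return {
    ...table,
    rows: table.rows.filter((r) => String(r[idx]).toLowerCase().includes(lowered)),
  };
}

// Chained like the hooks above:
//   const byKernel = searchBy(kernelTable, 'name', searchKernelName);
//   const byOp = searchBy(byKernel, 'step id', searchOpName);
// ---------------------------------------------------------------------------------------------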
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/MemoryView.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/MemoryView.tsx index 225f28a931e969d7cfd40d3f490e7cb45c64a305..a8f6c458eae79adf09371fcb73ecb29d1a62d067 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/MemoryView.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/MemoryView.tsx @@ -15,22 +15,22 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. *--------------------------------------------------------------------------------------------*/ -import Card from '@material-ui/core/Card'; -import CardContent from '@material-ui/core/CardContent'; -import CardHeader from '@material-ui/core/CardHeader'; -import Grid from '@material-ui/core/Grid'; -import InputLabel from '@material-ui/core/InputLabel'; -import MenuItem from '@material-ui/core/MenuItem'; -import Select, { SelectProps } from '@material-ui/core/Select'; -import Slider from '@material-ui/core/Slider'; -import { makeStyles } from '@material-ui/core/styles'; -import TextField, { TextFieldProps } from '@material-ui/core/TextField'; -import * as React from 'react'; -import * as api from '../api'; +import Card from '@material-ui/core/Card' +import CardContent from '@material-ui/core/CardContent' +import CardHeader from '@material-ui/core/CardHeader' +import Grid from '@material-ui/core/Grid' +import InputLabel from '@material-ui/core/InputLabel' +import MenuItem from '@material-ui/core/MenuItem' +import Select, { SelectProps } from '@material-ui/core/Select' +import Slider from '@material-ui/core/Slider' +import { makeStyles } from '@material-ui/core/styles' +import TextField, { TextFieldProps } from '@material-ui/core/TextField' +import * as React from 'react' +import * as api from '../api' import { Graph, GraphAscend, @@ -39,237 +39,288 @@ import { MemoryCurveDataAscend, MemoryEventsData, MemoryEventsDataAll, - MemoryStatsData, -} from '../api'; -import { useSearchDirectly } from '../utils/search'; -import { AntTableChart } from './charts/AntTableChart'; -import { LineChart } from './charts/NewLineChart'; -import { DataLoading } from './DataLoading'; -import { MemoryStatsTable } from './tables/MemoryStatsTable'; + MemoryStatsData +} from '../api' +import { useSearchDirectly } from '../utils/search' +import { AntTableChart } from './charts/AntTableChart' +import { LineChart } from './charts/NewLineChart' +import { DataLoading } from './DataLoading' +import { MemoryStatsTable } from './tables/MemoryStatsTable' const useStyles = makeStyles((theme) => ({ root: { - flexGrow: 1, + flexGrow: 1 }, curve: { - marginBottom: 20, + marginBottom: 20 }, verticalInput: { display: 'flex', - alignItems: 'center', + alignItems: 'center' }, inputWidth: { - width: '4em', + width: '4em' }, inputWidthOverflow: { minWidth: '15em', - whiteSpace: 'nowrap', + whiteSpace: 'nowrap' }, full: { - width: '100%', + width: '100%' }, description: { - marginLeft: theme.spacing(1), + marginLeft: theme.spacing(1) }, filterSlider: { marginTop: 15, marginRight: 6, - width: 250, + width: 250 }, filterInput: { - width: 100, - }, -})); + width: 100 + } +})) export interface IProps { - run: string; - worker: string; - span: string; - deviceTarget: string; + run: string + worker: string + span: string + deviceTarget: string } -const tags = 
['Operator', 'Component']; +const tags = ['Operator', 'Component'] export const MemoryView: React.FC = React.memo((props) => { interface EventSizeFilter { - [deviceName: string]: Array; + [deviceName: string]: Array } interface MaxEventSize { - [deviceName: string]: number; + [deviceName: string]: number } - const { run, worker, span, deviceTarget } = props; - const classes = useStyles(); + const { run, worker, span, deviceTarget } = props + const classes = useStyles() - const [memoryStatsData, setMemoryStatsData] = React.useState(undefined); + const [memoryStatsData, setMemoryStatsData] = React.useState< + MemoryStatsData | undefined + >(undefined) // for backward compatibility, old profiles do not have events to show - const showEvents = (): boolean | undefined => { - return memoryEventsData && Object.keys(memoryEventsData.rows).length !== 0; - }; - const [memoryEventsData, setMemoryEventsData] = React.useState(undefined); + const showEvents = () => { + return memoryEventsData && Object.keys(memoryEventsData.rows).length != 0 + } + const [memoryEventsData, setMemoryEventsData] = React.useState< + MemoryEventsData | undefined + >(undefined) // for backward compatibility, old profiles do not have curve to show - const showCurve = (): boolean | undefined => { - return memoryCurveData && Object.keys(memoryCurveData.rows).length !== 0; - }; - const [memoryCurveData, setMemoryCurveData] = React.useState( - undefined - ); + const showCurve = () => { + return memoryCurveData && Object.keys(memoryCurveData.rows).length != 0 + } + const [memoryCurveData, setMemoryCurveData] = React.useState< + MemoryCurveData | MemoryCurveDataAscend | undefined + >(undefined) - const [lineChartData, setLineChartData] = React.useState(undefined); + const [lineChartData, setLineChartData] = React.useState( + undefined + ) - const [devices, setDevices] = React.useState([]); - const [device, setDevice] = React.useState(''); - const [tag, setTag] = React.useState('Operator'); - const memoryCurveDataAllRef = React.useRef(undefined); - const memoryEventDataAllRef = React.useRef(undefined); + const [devices, setDevices] = React.useState([]) + const [device, setDevice] = React.useState('') + const [tag, setTag] = React.useState('Operator') + const memoryCurveDataAllRef = React.useRef(undefined) + const memoryEventDataAllRef = React.useRef(undefined) interface SelectedRange { - start: number; - end: number; - startTs: number; - endTs: number; + start: number + end: number + startTs: number + endTs: number } - const [selectedRange, setSelectedRange] = React.useState(); - const [searchOperatorName, setSearchOperatorName] = React.useState(''); - const [searchEventOperatorName, setSearchEventOperatorName] = React.useState(''); - const [filterEventSize, setFilterEventSize] = React.useState({}); - const [maxSize, setMaxSize] = React.useState({}); - - const getSearchIndex = function (): number { + const [selectedRange, setSelectedRange] = React.useState< + SelectedRange | undefined + >() + const [searchOperatorName, setSearchOperatorName] = React.useState('') + const [searchEventOperatorName, setSearchEventOperatorName] = React.useState( + '' + ) + const [filterEventSize, setFilterEventSize] = React.useState( + {} + ) + const [maxSize, setMaxSize] = React.useState({}) + + const getSearchIndex = function () { if (!memoryStatsData) { - return -1; + return -1 } for (let i = 0; i < memoryStatsData.columns.length; i++) { - if (memoryStatsData.columns[i].name === memoryStatsData.metadata.search) { - return i; + if
(memoryStatsData.columns[i].name == memoryStatsData.metadata.search) { + return i } } - return -1; - }; + return -1 + } - const getStep = (size: number, indexBias: number): number => { - return 10 ** (Math.floor(Math.log10(size !== 0 ? size : 1)) - indexBias); - }; + const getStep = (size: number, indexBias: number) => { + return 10 ** (Math.floor(Math.log10(size != 0 ? size : 1)) - indexBias) + } - const filterByEventSize = (rows: T[] | undefined, size: Array): T[] | undefined => { + const filterByEventSize = ( + rows: T[] | undefined, + size: Array + ) => { const result = React.useMemo(() => { if (!rows) { - return undefined; + return undefined } // workaround type system const field = (row: any): number => { - const sizeColIndex = 1; - return row[sizeColIndex]; - }; + const sizeColIndex = 1 + return row[sizeColIndex] + } return rows.filter((row) => { - return field(row) >= size[0] && field(row) <= size[1]; - }); - }, [rows, size]); + return field(row) >= size[0] && field(row) <= size[1] + }) + }, [rows, size]) - return result; - }; + return result + } - const searchIndex = getSearchIndex(); - const getName = React.useCallback((row: any) => row[searchIndex], [searchIndex]); - const getNameAscend = (row: any): any => row[0]; - const [searchedTableDataRows] = useSearchDirectly(searchOperatorName, getName, memoryStatsData?.rows[device] ?? []); + const searchIndex = getSearchIndex() + const getName = React.useCallback((row: any) => row[searchIndex], [ + searchIndex + ]) + const getNameAscend = (row: any) => row[0] + const [searchedTableDataRows] = useSearchDirectly( + searchOperatorName, + getName, + memoryStatsData?.rows[device] ?? [] + ) const [searchedEventsTableDataRows] = useSearchDirectly( searchEventOperatorName, deviceTarget === 'Ascend' ? getNameAscend : getName, - filterByEventSize(memoryEventsData?.rows[device], filterEventSize[device] ?? [0, Infinity]) ?? [] - ); + filterByEventSize( + memoryEventsData?.rows[device], + filterEventSize[device] ?? [0, Infinity] + ) ?? 
[] + ) const onSearchOperatorChanged: TextFieldProps['onChange'] = (event) => { - setSearchOperatorName(event.target.value as string); - }; + setSearchOperatorName(event.target.value as string) + } const onSearchEventOperatorChanged: TextFieldProps['onChange'] = (event) => { - setSearchEventOperatorName(event.target.value as string); - }; + setSearchEventOperatorName(event.target.value as string) + } - const [selectedRecord, setSelectedRecord] = React.useState(); - const onRowSelected = (record?: object, rowIndex?: number): void => { - setSelectedRecord(record); - }; + const [selectedRecord, setSelectedRecord] = React.useState() + const onRowSelected = (record?: object, rowIndex?: number) => { + setSelectedRecord(record) + } - const onFilterEventSizeChanged = (event: any, newValue: number | number[]): void => { + const onFilterEventSizeChanged = ( + event: any, + newValue: number | number[] + ) => { setFilterEventSize({ ...filterEventSize, - [device]: newValue as number[], - }); - }; + [device]: newValue as number[] + }) + } - const onFilterEventMinSizeInputChanged = (event: React.ChangeEvent): void => { + const onFilterEventMinSizeInputChanged = ( + event: React.ChangeEvent + ) => { setFilterEventSize({ ...filterEventSize, - [device]: [Number(event.target.value), filterEventSize[device][1]], - }); - }; + [device]: [Number(event.target.value), filterEventSize[device][1]] + }) + } - const onFilterEventMaxSizeInputChanged = (event: React.ChangeEvent): void => { + const onFilterEventMaxSizeInputChanged = ( + event: React.ChangeEvent + ) => { setFilterEventSize({ ...filterEventSize, - [device]: [filterEventSize[device][0], Number(event.target.value)], - }); - }; + [device]: [filterEventSize[device][0], Number(event.target.value)] + }) + } React.useEffect(() => { - if (deviceTarget !== 'Ascend') { - api.defaultApi.memoryGet(run, worker, span, selectedRange?.startTs, selectedRange?.endTs).then((resp) => { - setMemoryStatsData(resp); - if (!devices || devices.length === 0) { + deviceTarget !== 'Ascend' && api.defaultApi + .memoryGet( + run, + worker, + span, + selectedRange?.startTs, + selectedRange?.endTs + ) + .then((resp) => { + setMemoryStatsData(resp) + if (!devices || devices.length == 0) { // setDevices only execute on view load. Since selection on curve // might filter all events later, some devices might is missing. - setDevices(Object.keys(resp.rows)); - setDevice(resp.metadata.default_device); + setDevices(Object.keys(resp.rows)) + setDevice(resp.metadata.default_device) } - }); - } - }, [run, worker, span, selectedRange]); + }) + }, [run, worker, span, selectedRange]) React.useEffect(() => { - api.defaultApi.memoryEventsGet(run, worker, span, selectedRange?.startTs, selectedRange?.endTs).then((resp) => { - const tempRes = deviceTarget === 'Ascend' ? (resp as MemoryEventsDataAll).operator : (resp as MemoryEventsData); - if (deviceTarget === 'Ascend') { - memoryEventDataAllRef.current = resp as MemoryEventsDataAll; - } - let curMaxSize: MaxEventSize = {}; - let curFilterEventSize: EventSizeFilter = {}; - Object.keys(tempRes.rows).forEach((deviceName) => { - curMaxSize[deviceName] = 0; - for (let i = 0; i < tempRes.rows[deviceName].length; i++) { - curMaxSize[deviceName] = Math.max(curMaxSize[deviceName], tempRes.rows[deviceName][i][1]); + api.defaultApi + .memoryEventsGet( + run, + worker, + span, + selectedRange?.startTs, + selectedRange?.endTs + ) + .then((resp) => { + const tempRes = deviceTarget === 'Ascend' ? 
(resp as MemoryEventsDataAll).operator : resp as MemoryEventsData + if (deviceTarget === 'Ascend') { + memoryEventDataAllRef.current = resp as MemoryEventsDataAll + } + let curMaxSize: MaxEventSize = {} + let curFilterEventSize: EventSizeFilter = {} + for (let deviceName in tempRes.rows) { + curMaxSize[deviceName] = 0 + for (let i = 0; i < tempRes.rows[deviceName].length; i++) { + curMaxSize[deviceName] = Math.max( + curMaxSize[deviceName], + tempRes.rows[deviceName][i][1] + ) + } + curFilterEventSize[deviceName] = [ + curMaxSize[deviceName] / 4, + curMaxSize[deviceName] + ] + curMaxSize[deviceName] = curMaxSize[deviceName] } - curFilterEventSize[deviceName] = [curMaxSize[deviceName] / 4, curMaxSize[deviceName]]; - curMaxSize[deviceName] = curMaxSize[deviceName]; - }); - setMaxSize(curMaxSize); - setFilterEventSize(curFilterEventSize); - setMemoryEventsData(tempRes); - }); - }, [run, worker, span, selectedRange]); + setMaxSize(curMaxSize) + setFilterEventSize(curFilterEventSize) + setMemoryEventsData(tempRes) + }) + }, [run, worker, span, selectedRange]) React.useEffect(() => { api.defaultApi.memoryCurveGet(run, worker, span).then((resp) => { // Reset the select range to null whenever run/worker/span changes - setSelectedRange(undefined); + setSelectedRange(undefined) if (deviceTarget === 'Ascend') { - const allCurveData = resp as MemoryCurveDataAll; - memoryCurveDataAllRef.current = allCurveData; - setDevice(allCurveData.default_device); - setDevices(allCurveData.devices); - setMemoryCurveData(allCurveData.total); - setTag('Operator'); + const allCurveData = resp as MemoryCurveDataAll + memoryCurveDataAllRef.current = allCurveData + setDevice(allCurveData.default_device) + setDevices(allCurveData.devices) + setMemoryCurveData(allCurveData.total) + setTag('Operator') } else { - setMemoryCurveData(resp as MemoryCurveData); + setMemoryCurveData(resp as MemoryCurveData) } - }); - }, [run, worker, span]); + }) + }, [run, worker, span]) React.useEffect(() => { if (memoryCurveData !== undefined) { @@ -277,118 +328,127 @@ export const MemoryView: React.FC = React.memo((props) => { setLineChartData({ title: memoryCurveData.metadata.peaks[device] ?? '', columns: memoryCurveData.columns[device] ?? [], - rows: memoryCurveData.rows[device] ?? {}, - }); + rows: memoryCurveData.rows[device] ?? {} + }) } else { setLineChartData({ title: memoryCurveData.metadata.peaks[device], columns: memoryCurveData.columns, - rows: memoryCurveData.rows[device] ?? [], - }); + rows: memoryCurveData.rows[device] ?? 
[] + }) } } - }, [memoryCurveData, device]); + }, [memoryCurveData, device]) const onDeviceChanged: SelectProps['onChange'] = (event) => { - setDevice(event.target.value as string); - setSelectedRange(undefined); - }; + setDevice(event.target.value as string) + setSelectedRange(undefined) + } const onTagChanged: SelectProps['onChange'] = (event) => { - setTag(event.target.value as string); + setTag(event.target.value as string) if (event.target.value === 'Operator') { - setMemoryCurveData(memoryCurveDataAllRef.current?.total); - setMemoryEventsData(memoryEventDataAllRef.current?.operator); - setSelectedRange(undefined); + setMemoryCurveData(memoryCurveDataAllRef.current?.total) + setMemoryEventsData(memoryEventDataAllRef.current?.operator) + setSelectedRange(undefined) } else { - setMemoryCurveData(memoryCurveDataAllRef.current?.ptaGe); - setMemoryEventsData(memoryEventDataAllRef.current?.component); + setMemoryCurveData(memoryCurveDataAllRef.current?.ptaGe) + setMemoryEventsData(memoryEventDataAllRef.current?.component) } - }; + } - const onSelectedRangeChanged = (start: number, end: number): void => { + const onSelectedRangeChanged = (start: number, end: number) => { if (start > end) { - setSelectedRange(undefined); - return; + setSelectedRange(undefined) + return } - let allDatas = deviceTarget === 'Ascend' ? memoryCurveData?.rows[device]?.Allocated : memoryCurveData?.rows[device]; + let allDatas = deviceTarget === 'Ascend' ? + memoryCurveData?.rows[device]?.Allocated : memoryCurveData?.rows[device] if (allDatas.length <= 1) { - setSelectedRange(undefined); - return; + setSelectedRange(undefined) + return } - let startTs = 0; - let endTs = 0; - let realStart = 0; - let realEnd = 0; - let startId = 1; - let endId = 0; - let needLoopStart = true; + let startTs = 0 + let endTs = 0 + let realStart = 0 + let realEnd = 0 + let startId = 1 + let endId = 0 + let needLoopStart = true for (let i = 1; i < allDatas.length; i++) { if (startId > start && needLoopStart) { - needLoopStart = false; - realStart = i - 1; + needLoopStart = false + realStart = i - 1 } if (allDatas[i][0] !== allDatas[i - 1][0]) { if (startId <= start) { - startId += 1; + startId += 1 } - endId += 1; + endId += 1 } if (endId > end) { - realEnd = i - 1; - break; + realEnd = i - 1 + break } else { - realEnd = i; + realEnd = i if (needLoopStart) { - realStart = i; + realStart = i } } } if (deviceTarget === 'Ascend') { - startTs = allDatas[realStart][0]; - endTs = allDatas[realEnd][0]; + startTs = allDatas[realStart][0] + endTs = allDatas[realEnd][0] } else { - let bias = memoryCurveData?.metadata.first_ts ?? 0; - let scale = 1 / (memoryCurveData?.metadata.time_factor ?? 1); - startTs = Math.round((allDatas[realStart][0] * scale) + bias); - endTs = Math.round((allDatas[realEnd][0] * scale) + bias); + let bias = memoryCurveData?.metadata.first_ts ?? 0 + let scale = 1 / (memoryCurveData?.metadata.time_factor ?? 1) + startTs = Math.round(allDatas[realStart][0] * scale + bias) + endTs = Math.round(allDatas[realEnd][0] * scale + bias) } - setSelectedRange({ start, end, startTs, endTs }); - }; + setSelectedRange({ start, end, startTs, endTs }) + } return (
- - + + - + - {(graph): JSX.Element => ( - + {(graph) => ( + - Device - + {devices.map((device) => ( + {device} ))} - {deviceTarget === 'Ascend' && ( + {deviceTarget === 'Ascend' && - Group By - + {tags.map((device) => ( + {device} ))} - )} + } {showCurve() && lineChartData && lineChartData.columns.length > 0 && ( @@ -411,28 +471,28 @@ export const MemoryView: React.FC = React.memo((props) => { {showEvents() && ( <> - {(deviceTarget !== 'Ascend' || tag === 'Operator') && ( + {(deviceTarget !== 'Ascend' || tag === 'Operator') && - + - + = React.memo((props) => { min: 0, max: filterEventSize[device]?.[1] ?? 0, type: 'number', - 'aria-labelledby': 'input-slider', + 'aria-labelledby': 'input-slider' }} /> @@ -449,7 +509,7 @@ export const MemoryView: React.FC = React.memo((props) => { className={classes.filterSlider} value={filterEventSize[device] ?? [0, 0]} onChange={onFilterEventSizeChanged} - aria-labelledby='input-slider' + aria-labelledby="input-slider" min={0} max={maxSize[device] ?? 0} step={getStep(maxSize[device] ?? 0, 5)} @@ -458,7 +518,7 @@ export const MemoryView: React.FC = React.memo((props) => { = React.memo((props) => { min: filterEventSize[device]?.[0] ?? 0, max: maxSize[device] ?? 0, type: 'number', - 'aria-labelledby': 'input-slider', + 'aria-labelledby': 'input-slider' }} /> - )} - + } + - {(data): JSX.Element => { + {(data) => { return ( - ); + ) }} @@ -494,29 +555,29 @@ export const MemoryView: React.FC = React.memo((props) => { )} {deviceTarget !== 'Ascend' && ( <> - - + + - + - {(data): JSX.Element => ( + {(data) => ( )} @@ -527,5 +588,5 @@ export const MemoryView: React.FC = React.memo((props) => {
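// ---------------------------------------------------------------------------------------------
// Editor's note (illustrative, not part of the patch): the size-filter slider above gets its
// granularity from getStep(maxSize, 5), i.e. 10^(floor(log10(size)) - 5), about five orders of
// magnitude finer than the largest event, and its default range is the top three quarters,
// [max / 4, max]. Worked examples:
const getStepSketch = (size: number, indexBias: number): number =>
  10 ** (Math.floor(Math.log10(size !== 0 ? size : 1)) - indexBias);

// getStepSketch(3_000_000, 5) === 10     -> the slider moves in steps of 10 bytes
// getStepSketch(800, 5)       === 0.001  -> tiny maxima produce sub-unit steps
// ---------------------------------------------------------------------------------------------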
- ); -}); + ) +}) diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/ModuleView.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/ModuleView.tsx index a66a825365fd3c813e58865c609643ab547b4c49..396188aba4e69cced5208ff4af86631bf02e172c 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/ModuleView.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/ModuleView.tsx @@ -1,227 +1,241 @@ /*--------------------------------------------------------------------------------------------- * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import Card from '@material-ui/core/Card'; -import CardHeader from '@material-ui/core/CardHeader'; -import InputLabel from '@material-ui/core/InputLabel'; -import MenuItem from '@material-ui/core/MenuItem'; -import Select, { SelectProps } from '@material-ui/core/Select'; -import { makeStyles } from '@material-ui/core/styles'; -import { message, Table } from 'antd'; -import * as React from 'react'; -import { FlameGraph } from 'react-flame-graph'; -import { defaultApi, KeyedColumn, ModuleStats, ModuleViewData, OperatorNode } from '../api'; +import Card from '@material-ui/core/Card' +import CardHeader from '@material-ui/core/CardHeader' +import InputLabel from '@material-ui/core/InputLabel' +import MenuItem from '@material-ui/core/MenuItem' +import Select, { SelectProps } from '@material-ui/core/Select' +import { makeStyles } from '@material-ui/core/styles' +import { Table } from 'antd' +import * as React from 'react' +import { FlameGraph } from 'react-flame-graph' +import { + defaultApi, + KeyedColumn, + ModuleStats, + ModuleViewData, + OperatorNode +} from '../api' const useStyles = makeStyles((theme) => ({ root: { - flexGrow: 1, + flexGrow: 1 }, hide: { - display: 'none', - }, -})); + display: 'none' + } +})) export interface IProps { - run: string; - worker: string; - span: string; + run: string + worker: string + span: string } -const getKeyedTableColumns = (columns: KeyedColumn[]): any[] => { +const getKeyedTableColumns = (columns: KeyedColumn[]) => { return columns.map((col) => { return { dataIndex: col.key, key: col.key, - title: col.name, - }; - }); -}; + title: col.name + } + }) +} -const getTableRows = (key: number, rows: ModuleStats[]): any[] => { - let initialKey = key; +const getTableRows = (key: number, rows: ModuleStats[]) => { return rows.map((row) => { - const currentKey = initialKey++; const data: any = { - key: currentKey, + key: key++, name: row.name, occurences: row.occurences, operators: row.operators, host_duration: row.host_duration, self_host_duration: row.self_host_duration, device_duration: row.device_duration, - self_device_duration: row.self_device_duration, - }; + self_device_duration: row.self_device_duration + } if (row.children.length) { - data.children = getTableRows(key, row.children); + data.children = getTableRows(key, row.children) } - return data; - }); -}; + return data + }) +} -const getFlameGraphData = (rows: ModuleStats[]): any[] => { +const getFlameGraphData = (rows: ModuleStats[]) => { return rows.map((row) => { const data: any = { name: row.name, value: row.avg_duration, - tooltip: `${row.name} (module id: ${row.id}): ${row.avg_duration} us`, - }; + tooltip: `${row.name} (module id: ${row.id}): ${row.avg_duration} us` + } if (row.children.length) { - data.children = getFlameGraphData(row.children); + data.children = getFlameGraphData(row.children) } - return 
data; - }); -}; + return data + }) +} const getTreeHeight = (row: ModuleStats): number => { - if (row.children?.length) { - return 1 + Math.max(...row.children.map((child) => getTreeHeight(child))); + if (row.children && row.children.length) { + return 1 + Math.max(...row.children.map((child) => getTreeHeight(child))) } else { - return 1; + return 1 } -}; +} -const getOperatorTree = (level: number, row: OperatorNode, result: object[]): void => { +const getOperatorTree = ( + level: number, + row: OperatorNode, + result: object[] +) => { result.push({ level: level, name: row.name, start: row.start_time, - end: row.end_time, - }); + end: row.end_time + }) if (row.children.length) { - row.children.forEach((child) => getOperatorTree(level + 1, child, result)); + row.children.forEach((child) => getOperatorTree(level + 1, child, result)) } -}; +} export const ModuleView: React.FC = (props) => { - const { run, worker, span } = props; - const classes = useStyles(); + const { run, worker, span } = props + const classes = useStyles() - const [moduleView, setModuleView] = React.useState(undefined); - const [flameData, setFlameData] = React.useState([]); - const [flameHeight, setFlameHeight] = React.useState(0); - const [modules, setModules] = React.useState([]); - const [module, setModule] = React.useState(0); + const [moduleView, setModuleView] = React.useState< + ModuleViewData | undefined + >(undefined) + const [flameData, setFlameData] = React.useState([]) + const [flameHeight, setFlameHeight] = React.useState(0) + const [modules, setModules] = React.useState([]) + const [module, setModule] = React.useState(0) - const [columns, setColumns] = React.useState([]); - const [rows, setRows] = React.useState([]); + const [columns, setColumns] = React.useState([]) + const [rows, setRows] = React.useState([]) - const cardRef = React.useRef(null); - const [cardWidth, setCardWidth] = React.useState(undefined); - const timelineRef = React.useRef(null); + const cardRef = React.useRef(null) + const [cardWidth, setCardWidth] = React.useState( + undefined + ) + const timelineRef = React.useRef(null) React.useEffect(() => { defaultApi .moduleGet(run, worker, span) .then((resp) => { - setModuleView(resp); + setModuleView(resp) if (resp) { // set the flamegraph data - const flameGraphData: any[] = getFlameGraphData(resp.data); - setFlameData(flameGraphData); - const flameGraphHeight = Math.max(...flameGraphData.map((x) => getTreeHeight(x))); - setFlameHeight(flameGraphHeight * 25); - setModules(Array.from(Array(flameGraphData.length).keys())); - setModule(0); + const flameData: any[] = getFlameGraphData(resp.data) + setFlameData(flameData) + const flameHeight = Math.max( + ...flameData.map((x) => getTreeHeight(x)) + ) + setFlameHeight(flameHeight * 25) + setModules(Array.from(Array(flameData.length).keys())) + setModule(0) // set the tree table data - setColumns(getKeyedTableColumns(resp.columns)); - setRows(getTableRows(1, resp.data)); + setColumns(getKeyedTableColumns(resp.columns)) + setRows(getTableRows(1, resp.data)) } }) .catch((e) => { - if (e.status === 404) { - setModules([]); - setFlameData([]); - setRows([]); + if (e.status == 404) { + setModules([]) + setFlameData([]) + setRows([]) } - }); + }) if (cardRef.current) { - setCardWidth(cardRef.current.offsetWidth - 10); + setCardWidth(cardRef.current.offsetWidth - 10) } try { if (timelineRef.current) { defaultApi.treeGet(run, worker, span).then((resp) => { if (resp) { - const data = new google.visualization.DataTable(); - data.addColumn({ type: 'string', id: 
'Layer' }); - data.addColumn({ type: 'string', id: 'Name' }); - data.addColumn({ type: 'string', role: 'tooltip' }); - data.addColumn({ type: 'number', id: 'Start' }); - data.addColumn({ type: 'number', id: 'End' }); - - let timelineData: any[] = []; - getOperatorTree(0, resp, timelineData); - timelineData.sort((a, b) => a.level - b.level); - const maxLevel = timelineData[timelineData.length - 1].level; - timelineData.forEach((d) => { + const data = new google.visualization.DataTable() + data.addColumn({ type: 'string', id: 'Layer' }) + data.addColumn({ type: 'string', id: 'Name' }) + data.addColumn({ type: 'string', role: 'tooltip' }) + data.addColumn({ type: 'number', id: 'Start' }) + data.addColumn({ type: 'number', id: 'End' }) + + let timeline_data: any[] = [] + getOperatorTree(0, resp, timeline_data) + timeline_data.sort((a, b) => a.level - b.level) + const max_level = timeline_data[timeline_data.length - 1].level + timeline_data.forEach((d) => { data.addRow([ d.level.toString(), d.name, `${d.name} Duration: ${d.end - d.start} us`, d.start / 1000.0, // the time unit is us returned from server, but the google charts only accept milliseconds here - d.end / 1000.0, - ]); - }); + d.end / 1000.0 + ]) + }) - const chart = new google.visualization.Timeline(timelineRef.current); + const chart = new google.visualization.Timeline(timelineRef.current) const options = { - height: (maxLevel + 1) * 50, + height: (max_level + 1) * 50, tooltip: { - isHtml: true, + isHtml: true }, timeline: { - showRowLabels: false, - }, - }; - chart.draw(data, options); + showRowLabels: false + } + } + chart.draw(data, options) } - }); + }) } } catch (e) { - message.warning('Timeline in module view is not supported offline.'); + console.warn('Timeline in module view is not supported offline.') } - }, [run, worker, span]); + }, [run, worker, span]) const handleModuleChange: SelectProps['onChange'] = (event) => { - setModule(event.target.value as number); - }; + setModule(event.target.value as number) + } - const moduleComponent = (): JSX.Element => { + const moduleComponent = () => { const moduleFragment = ( - Module + Module - ); + ) if (!modules || modules.length <= 1) { - return
 <div className={classes.hide}>{moduleFragment}</div>; + return
 <div className={classes.hide}>{moduleFragment}</div>
} else { - return moduleFragment; + return moduleFragment } - }; + } return (
- - + + {rows && rows.length > 0 && ( )} @@ -233,12 +247,13 @@ export const ModuleView: React.FC = (props) => { data={flameData[module]} height={flameHeight} width={cardWidth} - onChange={(node: any): void => {}} + onChange={(node: any) => { + }} /> )}
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Operator.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Operator.tsx index b19bef1967a31915c3c1d660b699b11c83ebb226..7278ca59c938874b85b2a52abbb36c59f924373b 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Operator.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Operator.tsx @@ -15,99 +15,119 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. *--------------------------------------------------------------------------------------------*/ -import Card from '@material-ui/core/Card'; -import CardContent from '@material-ui/core/CardContent'; -import CardHeader from '@material-ui/core/CardHeader'; -import FormControlLabel from '@material-ui/core/FormControlLabel'; -import Grid from '@material-ui/core/Grid'; -import GridList from '@material-ui/core/GridList'; -import GridListTile from '@material-ui/core/GridListTile'; -import InputLabel from '@material-ui/core/InputLabel'; -import MenuItem from '@material-ui/core/MenuItem'; -import Radio from '@material-ui/core/Radio'; -import RadioGroup, { RadioGroupProps } from '@material-ui/core/RadioGroup'; -import Select, { SelectProps } from '@material-ui/core/Select'; -import { makeStyles } from '@material-ui/core/styles'; -import TextField, { StandardTextFieldProps, TextFieldProps } from '@material-ui/core/TextField'; -import * as React from 'react'; -import * as api from '../api'; -import { OperationTableData, OperationTableDataInner, OperatorGraph } from '../api'; -import { OperationGroupBy } from '../constants/groupBy'; -import { useSearchDirectly } from '../utils/search'; -import { topIsValid, UseTop, useTopN } from '../utils/top'; -import { PieChart } from './charts/PieChart'; -import { DataLoading } from './DataLoading'; -import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers'; -import { OperationTable } from './tables/OperationTable'; +import Card from '@material-ui/core/Card' +import CardContent from '@material-ui/core/CardContent' +import CardHeader from '@material-ui/core/CardHeader' +import FormControlLabel from '@material-ui/core/FormControlLabel' +import Grid from '@material-ui/core/Grid' +import GridList from '@material-ui/core/GridList' +import GridListTile from '@material-ui/core/GridListTile' +import InputLabel from '@material-ui/core/InputLabel' +import MenuItem from '@material-ui/core/MenuItem' +import Radio from '@material-ui/core/Radio' +import RadioGroup, { RadioGroupProps } from '@material-ui/core/RadioGroup' +import Select, { SelectProps } from '@material-ui/core/Select' +import { makeStyles } from '@material-ui/core/styles' +import TextField, { + StandardTextFieldProps, + TextFieldProps +} from '@material-ui/core/TextField' +import * as React from 'react' +import * as api from '../api' +import { + OperationTableData, + OperationTableDataInner, + OperatorGraph +} from '../api' +import { OperationGroupBy } from '../constants/groupBy' +import { useSearchDirectly } from '../utils/search' +import { topIsValid, UseTop, useTopN } from '../utils/top' +import { PieChart } from './charts/PieChart' +import { DataLoading } from './DataLoading' +import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers' +import { OperationTable } from './tables/OperationTable' import { - 
deviceSelfTimeTooltip, - deviceSelfTimeTooltipAscend, - deviceTotalTimeTooltip, - deviceTotalTimeTooltipAscend, - hostSelfTimeTooltip, - hostTotalTimeTooltip, -} from './TooltipDescriptions'; + DeviceSelfTimeTooltip, + DeviceSelfTimeTooltipAscend, + DeviceTotalTimeTooltip, + DeviceTotalTimeTooltipAscend, + HostSelfTimeTooltip, + HostTotalTimeTooltip +} from './TooltipDescriptions' const useStyles = makeStyles((theme) => ({ root: { - flexGrow: 1, + flexGrow: 1 }, verticalInput: { display: 'flex', - alignItems: 'center', + alignItems: 'center' }, inputWidth: { - width: '4em', + width: '4em' }, inputWidthOverflow: { minWidth: '15em', - whiteSpace: 'nowrap', + whiteSpace: 'nowrap' }, full: { - width: '100%', + width: '100%' }, description: { - marginLeft: theme.spacing(1), - }, -})); + marginLeft: theme.spacing(1) + } +})) export interface IProps { - run: string; - worker: string; - span: string; - deviceTarget: string; + run: string + worker: string + span: string + deviceTarget: string } export const Operator: React.FC = (props) => { - const { run, worker, span, deviceTarget } = props; - const classes = useStyles(); - const tooltipCommonClasses = useTooltipCommonStyles(); + const { run, worker, span, deviceTarget } = props + const classes = useStyles() + const tooltipCommonClasses = useTooltipCommonStyles() const chartHeaderRenderer = React.useMemo( () => makeChartHeaderRenderer(tooltipCommonClasses), [tooltipCommonClasses] - ); + ) - const [operatorGraph, setOperatorGraph] = React.useState(undefined); - const [operatorTable, setOperatorTable] = React.useState(undefined); - const [sortColumn, setSortColumn] = React.useState(''); - const [tableTooltips, setTableTooltips] = React.useState(undefined); - const [groupBy, setGroupBy] = React.useState(OperationGroupBy.OPERATION); - const [searchOperatorName, setSearchOperatorName] = React.useState(''); + const [operatorGraph, setOperatorGraph] = React.useState< + OperatorGraph | undefined + >(undefined) + const [operatorTable, setOperatorTable] = React.useState< + OperationTableData | undefined + >(undefined) + const [sortColumn, setSortColumn] = React.useState('') + const [tableTooltips, setTableTooltips] = React.useState( + undefined + ) + const [groupBy, setGroupBy] = React.useState(OperationGroupBy.Operation) + const [searchOperatorName, setSearchOperatorName] = React.useState('') const [topText, actualTop, useTop, setTopText, setUseTop] = useTopN({ - defaultUseTop: UseTop.USE, - defaultTop: 10, - }); + defaultUseTop: UseTop.Use, + defaultTop: 10 + }) - const getName = React.useCallback((row: OperationTableDataInner) => row.name, []); - const [searchedOperatorTable] = useSearchDirectly(searchOperatorName, getName, operatorTable); + const getName = React.useCallback( + (row: OperationTableDataInner) => row.name, + [] + ) + const [searchedOperatorTable] = useSearchDirectly( + searchOperatorName, + getName, + operatorTable + ) const onSearchOperatorChanged: TextFieldProps['onChange'] = (event) => { - setSearchOperatorName(event.target.value as string); - }; + setSearchOperatorName(event.target.value as string) + } React.useEffect(() => { if (operatorGraph) { @@ -115,45 +135,49 @@ export const Operator: React.FC = (props) => { operatorGraph.device_self_time?.rows.length ?? 0, operatorGraph.device_total_time?.rows.length ?? 0, operatorGraph.host_self_time.rows?.length ?? 0, - operatorGraph.host_total_time.rows?.length ?? 0, - ]; - setTopText(String(Math.min(Math.max(...counts), 10))); + operatorGraph.host_total_time.rows?.length ?? 
0 + ] + setTopText(String(Math.min(Math.max(...counts), 10))) } - }, [operatorGraph]); + }, [operatorGraph]) React.useEffect(() => { - api.defaultApi.operationTableGet(run, worker, span, groupBy).then((resp) => { - setSortColumn(resp.metadata.sort); - setTableTooltips(resp.metadata.tooltips); - setOperatorTable(resp.data); - }); - }, [run, worker, span, groupBy]); + api.defaultApi + .operationTableGet(run, worker, span, groupBy) + .then((resp) => { + setSortColumn(resp.metadata.sort) + setTableTooltips(resp.metadata.tooltips) + setOperatorTable(resp.data) + }) + }, [run, worker, span, groupBy]) React.useEffect(() => { - api.defaultApi.operationGet(run, worker, span, groupBy).then((resp) => { - setOperatorGraph(resp); - }); - }, [run, worker, span, groupBy]); + api.defaultApi + .operationGet(run, worker, span, groupBy) + .then((resp) => { + setOperatorGraph(resp) + }) + }, [run, worker, span, groupBy]) const onGroupByChanged: SelectProps['onChange'] = (event) => { - setGroupBy(event.target.value as OperationGroupBy); - }; + setGroupBy(event.target.value as OperationGroupBy) + } const onUseTopChanged: RadioGroupProps['onChange'] = (event) => { - setUseTop(event.target.value as UseTop); - }; + setUseTop(event.target.value as UseTop) + } - const onTopChanged = (event: React.ChangeEvent): void => { - setTopText(event.target.value); - }; + const onTopChanged = (event: React.ChangeEvent) => { + setTopText(event.target.value) + } const inputProps: StandardTextFieldProps['inputProps'] = { - min: 1, - }; + min: 1 + } - const renderCharts = (graph: api.OperatorGraph): JSX.Element => { + const renderCharts = (graph: api.OperatorGraph) => { return ( - + {graph.device_self_time && ( @@ -161,7 +185,7 @@ export const Operator: React.FC = (props) => { )} @@ -176,7 +200,7 @@ export const Operator: React.FC = (props) => { )} @@ -187,7 +211,12 @@ export const Operator: React.FC = (props) => { {graph.host_self_time.title && ( - + )} @@ -195,34 +224,47 @@ export const Operator: React.FC = (props) => { {graph.host_total_time.title && ( - + )} - ); - }; + ) + } return (
- - + + - + - } label='All operators' /> - } label='Top operators to show' /> + } + label="All operators" + /> + } + label="Top operators to show" + /> - {useTop === UseTop.USE && ( + {useTop === UseTop.Use && ( = (props) => { {renderCharts} - + - + - Group By - + + Operator + Input Shape + + + Operator + @@ -248,10 +298,10 @@ export const Operator: React.FC = (props) => { classes={{ root: classes.inputWidthOverflow }} value={searchOperatorName} onChange={onSearchOperatorChanged} - type='search' - label='Search by Name' + type="search" + label="Search by Name" inputProps={{ - maxLength: 200, + maxLength: 200 }} /> @@ -259,7 +309,7 @@ export const Operator: React.FC = (props) => { - {(table): JSX.Element => ( + {(table) => ( = (props) => {
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Overview.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Overview.tsx index 6a81c567bc5e44b1dd6eb4746135d61268cadb81..e5f6f17bdaae3d276f24ed24f3566fc994fec0ad 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Overview.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/Overview.tsx @@ -2,50 +2,53 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import Card from '@material-ui/core/Card'; -import CardContent from '@material-ui/core/CardContent'; -import CardHeader from '@material-ui/core/CardHeader'; -import Grid from '@material-ui/core/Grid'; -import { makeStyles } from '@material-ui/core/styles'; -import { Table } from 'antd'; -import { ColumnsType } from 'antd/es/table'; -import * as React from 'react'; -import * as api from '../api'; -import { PieChart } from './charts/PieChart'; -import { SteppedAreaChart } from './charts/SteppedAreaChart'; -import { DataLoading } from './DataLoading'; -import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers'; -import { TextListItem } from './TextListItem'; -import { stepTimeBreakDownTooltip } from './TooltipDescriptions'; -import { transformPerformanceIntoPie, transformPerformanceIntoTable } from './transform'; +import Card from '@material-ui/core/Card' +import CardContent from '@material-ui/core/CardContent' +import CardHeader from '@material-ui/core/CardHeader' +import Grid from '@material-ui/core/Grid' +import { makeStyles } from '@material-ui/core/styles' +import { Table } from 'antd' +import { ColumnsType } from 'antd/es/table' +import * as React from 'react' +import * as api from '../api' +import { PieChart } from './charts/PieChart' +import { SteppedAreaChart } from './charts/SteppedAreaChart' +import { DataLoading } from './DataLoading' +import { makeChartHeaderRenderer, useTooltipCommonStyles } from './helpers' +import { TextListItem } from './TextListItem' +import { StepTimeBreakDownTooltip } from './TooltipDescriptions' +import { + transformPerformanceIntoPie, + transformPerformanceIntoTable +} from './transform' -const topGraphHeight = 230; +const topGraphHeight = 230 const useStyles = makeStyles((theme) => ({ root: { - flexGrow: 1, + flexGrow: 1 }, pre: { '& ul': { margin: 0, paddingLeft: theme.spacing(3), - ...theme.typography.body1, + ...theme.typography.body1 }, '& li': {}, '& a': { - color: '#ffa726', + color: '#ffa726' }, '& a:active': { - color: '#ffa726', + color: '#ffa726' }, '& p': { margin: 0, ...theme.typography.subtitle1, - fontWeight: theme.typography.fontWeightBold, - }, + fontWeight: theme.typography.fontWeightBold + } }, topGraph: { - height: topGraphHeight + 40, + height: topGraphHeight + 40 }, table: { height: '100%', @@ -54,87 +57,89 @@ const useStyles = makeStyles((theme) => ({ height: 20, fontSize: '10pt', '& > td': { - padding: '0 8px!important', - }, - }, - }, -})); + padding: '0 8px!important' + } + } + } +})) export interface IProps { - run: string; - worker: string; - span: string; + run: string + worker: string + span: string } export const Overview: React.FC = (props) => { - const { run, worker, span } = props; + const { run, worker, span } = props - const [steps, setSteps] = React.useState(undefined); - const [performances, setPerformances] = React.useState([]); - const [environments, setEnvironments] = React.useState([]); - const [gpuMetrics, 
setGpuMetrics] = React.useState(undefined); - const [recommendations, setRecommendations] = React.useState(''); - const [columns, setColumns] = React.useState>([]); + const [steps, setSteps] = React.useState(undefined) + const [performances, setPerformances] = React.useState([]) + const [environments, setEnvironments] = React.useState([]) + const [gpuMetrics, setGpuMetrics] = React.useState< + api.GpuMetrics | undefined + >(undefined) + const [recommendations, setRecommendations] = React.useState('') + const [columns, setColumns] = React.useState>([]) const tableRows = React.useMemo(() => { - let dataInfo: api.Graph = transformPerformanceIntoTable(performances); + let dataInfo: api.Graph = transformPerformanceIntoTable(performances) if (dataInfo.columns.length < 3) { - return []; + return [] } - const stringCompare = (a: string, b: string): number => a.localeCompare(b); - const numberCompare = (a: number, b: number): number => a - b; - let column: any[] = dataInfo.columns.map((item) => { + const stringCompare = (a: string, b: string) => a.localeCompare(b) + const numberCompare = (a: number, b: number) => a - b + let column: any[] = dataInfo.columns.map(item => { return { title: item.name, key: item.name, dataIndex: item.name, - sorter: - item.type === 'string' - ? (a: any, b: any): number => stringCompare(a[item.name], b[item.name]) - : (a: any, b: any): number => numberCompare(a[item.name], b[item.name]), - }; - }); - setColumns(column); + sorter: item.type == 'string' ? (a: any, b: any) => stringCompare(a[item.name], b[item.name]) + : (a: any, b: any) => numberCompare(a[item.name], b[item.name]) + } + }) + setColumns(column) return dataInfo.rows.map((row, index) => { if (row.length < 3) { - return null; + return null } return { key: index, [dataInfo.columns[0].name]: row[0], [dataInfo.columns[1].name]: row[1], - [dataInfo.columns[2].name]: row[2], - }; - }); - }, [performances]); + [dataInfo.columns[2].name]: row[2] + } + }) + }, [performances]) const synthesizedPieGraph = React.useMemo(() => { - return transformPerformanceIntoPie(performances); - }, [performances]); + return transformPerformanceIntoPie(performances) + }, [performances]) React.useEffect(() => { api.defaultApi.overviewGet(run, worker, span).then((resp) => { - setPerformances(resp.performance); - setEnvironments(resp.environments); - setSteps(resp.steps); - setRecommendations(resp.recommendations); - setGpuMetrics(resp.gpu_metrics); - }); - }, [run, worker, span]); + setPerformances(resp.performance) + setEnvironments(resp.environments) + setSteps(resp.steps) + setRecommendations(resp.recommendations) + setGpuMetrics(resp.gpu_metrics) + }) + }, [run, worker, span]) - const classes = useStyles(); - const tooltipCommonClasses = useTooltipCommonStyles(); + const classes = useStyles() + const tooltipCommonClasses = useTooltipCommonStyles() const chartHeaderRenderer = React.useMemo( () => makeChartHeaderRenderer(tooltipCommonClasses, false), [tooltipCommonClasses] - ); + ) const stepTimeBreakDownTitle = React.useMemo( - () => chartHeaderRenderer('Step Time Breakdown', stepTimeBreakDownTooltip), + () => chartHeaderRenderer('Step Time Breakdown', StepTimeBreakDownTooltip), [tooltipCommonClasses, chartHeaderRenderer] - ); + ) - const cardSizes = gpuMetrics ? ([2, 3, 7] as const) : ([4, undefined, 8] as const); + const cardSizes = gpuMetrics + ? ([2, 3, 7] as const) + : ([4, undefined, 8] as const) return (
@@ -143,11 +148,14 @@ export const Overview: React.FC = (props) => { {React.useMemo( () => ( - - + + {environments.map((environment) => ( - + ))} @@ -157,19 +165,28 @@ export const Overview: React.FC = (props) => { {gpuMetrics && ( - - - + + + {gpuMetrics.data.map((metric) => ( - + ))} )} - - + + @@ -182,7 +199,10 @@ export const Overview: React.FC = (props) => { /> - + @@ -191,12 +211,16 @@ export const Overview: React.FC = (props) => { - + - {(graph): JSX.Element => ( - + {(graph) => ( + )} @@ -205,13 +229,13 @@ export const Overview: React.FC = (props) => { - - + +
@@ -221,5 +245,5 @@ export const Overview: React.FC = (props) => {
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TextListItem.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TextListItem.tsx index 59eb79c2a8f05cc750d264880bb66ab646c4bbb4..c5e4eee5251f7ab8afedf58f305a5cb30ad92a19 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TextListItem.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TextListItem.tsx @@ -2,69 +2,76 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import Grid from '@material-ui/core/Grid'; -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; +import Grid from '@material-ui/core/Grid' +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' export interface IStylesProps { - root?: string; - name?: string; + root?: string + name?: string } export interface IProps { - name: string; - value?: string; - description?: string; - extra?: string; - classes?: IStylesProps; - dangerouslyAllowHtml?: boolean; + name: string + value?: string + description?: string + extra?: string + classes?: IStylesProps + dangerouslyAllowHtml?: boolean } const useStyles = makeStyles((theme) => ({ label: { ...theme.typography.subtitle2, - fontWeight: 'bolder', + fontWeight: 'bolder' }, value: { textAlign: 'right', ...theme.typography.subtitle2, - fontWeight: 'bolder', - }, -})); + fontWeight: 'bolder' + } +})) export const TextListItem: React.FC = (props) => { - const classes = useStyles(); + const classes = useStyles() - const getSizes = function (): readonly any[] { + const getSizes = function () { if (props.value && props.extra) { - return [4, 4, 4] as const; + return [4, 4, 4] as const } if (props.value) { if (props.value.length > props.name.length) { - return [4, 8, undefined] as const; + return [4, 8, undefined] as const } - return [8, 4, undefined] as const; + return [8, 4, undefined] as const } - return [12, undefined, undefined] as const; - }; + return [12, undefined, undefined] as const + } - const sizes = getSizes(); + const sizes = getSizes() - const renderSpan = function (content: string, className?: string): React.JSX.Element { + const renderSpan = function (content: string, className?: string) { if (props.dangerouslyAllowHtml) { - return ; + return ( + + ) } - return {content}; - }; + return {content} + } return ( - + {renderSpan(props.name, props.classes?.name)} - {props.description && {renderSpan(props.description)}} + {props.description && ( + {renderSpan(props.description)} + )} {props.value && ( @@ -78,5 +85,5 @@ export const TextListItem: React.FC = (props) => { )} - ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TooltipDescriptions.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TooltipDescriptions.ts index 6d3631fee97a4dd8da5ebde1550573d8c6e501fa..8f434221ddbdbd48a7a41ab6c73b2901519007c5 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TooltipDescriptions.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TooltipDescriptions.ts @@ -2,37 +2,37 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
 *--------------------------------------------------------------------------------------------*/
-export const stepTimeBreakDownTooltip = `The time spent on each step is broken down into multiple categories as follows:
+export const StepTimeBreakDownTooltip = `The time spent on each step is broken down into multiple categories as follows:
 Kernel: Kernel execution time on GPU device;
 Memcpy: GPU involved memory copy time (either D2D, D2H or H2D);
 Memset: GPU involved memory set time;
 Runtime: CUDA runtime execution time on host side, such as cudaLaunchKernel, cudaMemcpyAsync, cudaStreamSynchronize, ...
 DataLoader: The data loading time spent in PyTorch DataLoader object;
 CPU Exec: Host compute time, including every PyTorch operator running time;
-Other: The time not included in any of the above.`;
+Other: The time not included in any of the above.`
-export const deviceSelfTimeTooltip = `The accumulated time spent on GPU, not including this operator’s child operators.`;
+export const DeviceSelfTimeTooltip = `The accumulated time spent on GPU, not including this operator’s child operators.`
-export const deviceSelfTimeTooltipAscend = `The accumulated time spent on NPU, not including this operator’s child operators.`;
+export const DeviceSelfTimeTooltipAscend = `The accumulated time spent on NPU, not including this operator’s child operators.`
-export const deviceTotalTimeTooltip = `The accumulated time spent on GPU, including this operator’s child operators.`;
+export const DeviceTotalTimeTooltip = `The accumulated time spent on GPU, including this operator’s child operators.`
-export const deviceTotalTimeTooltipAscend = `The accumulated time spent on NPU, including this operator’s child operators.`;
+export const DeviceTotalTimeTooltipAscend = `The accumulated time spent on NPU, including this operator’s child operators.`
-export const hostSelfTimeTooltip = `The accumulated time spent on Host, not including this operator’s child operators.`;
+export const HostSelfTimeTooltip = `The accumulated time spent on Host, not including this operator’s child operators.`
-export const hostTotalTimeTooltip = `The accumulated time spent on Host, including this operator’s child operators.`;
+export const HostTotalTimeTooltip = `The accumulated time spent on Host, including this operator’s child operators.`
-export const gpuKernelTotalTimeTooltip = `The accumulated time of all calls of this kernel.`;
+export const GPUKernelTotalTimeTooltip = `The accumulated time of all calls of this kernel.`
-export const tensorCoresPieChartTooltip = `The accumulated time of all kernels using or not using Tensor Cores.`;
+export const TensorCoresPieChartTooltip = `The accumulated time of all kernels using or not using Tensor Cores.`
-export const tensorCoresPieChartTooltipAscend = `The accumulated time of all kernels grouped by Accelerator Core.`;
+export const TensorCoresPieChartTooltipAscend = `The accumulated time of all kernels grouped by Accelerator Core.`
-export const distributedGpuInfoTableTooltip = `Information about GPU hardware used during the run.`;
+export const DistributedGpuInfoTableTooltip = `Information about GPU hardware used during the run.`
-export const distributedOverlapGraphTooltip = `The time spent on computation vs communication.`;
+export const DistributedOverlapGraphTooltip = `The time spent on computation vs communication.`
-export const distributedWaittimeGraphTooltip = `The time spent waiting vs communicating between devices.`;
+export const DistributedWaittimeGraphTooltip = `The time spent 
waiting vs communicating between devices.` -export const distributedCommopsTableTooltip = `Statistics for operations managing communications between nodes.`; +export const DistributedCommopsTableTooltip = `Statistics for operations managing communications between nodes.` diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TraceView.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TraceView.tsx index be499794936a085ed72740eea8bac5f33df37171..8f1f3684305cabfe6f35d341557386c1d8f71cf1 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TraceView.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/TraceView.tsx @@ -2,78 +2,85 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import ClickAwayListener from '@material-ui/core/ClickAwayListener'; -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; -import * as api from '../api'; +import ClickAwayListener from '@material-ui/core/ClickAwayListener' +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import * as api from '../api' export interface IProps { - run: string; - worker: string; - span: string; - iframeRef: React.RefObject; + run: string + worker: string + span: string + iframeRef: React.RefObject } const useStyles = makeStyles(() => ({ root: { - flexGrow: 1, + flexGrow: 1 }, frame: { width: '100%', height: 'calc(100vh - 48px)', - border: 'none', - }, -})); + border: 'none' + } +})) export const TraceView: React.FC = (props) => { - const { run, worker, span, iframeRef } = props; - const classes = useStyles(); + const { run, worker, span, iframeRef } = props + const classes = useStyles() - const [traceData, setTraceData] = React.useState | null>(null); - const [traceViewReady, setTraceViewReady] = React.useState(false); + const [traceData, setTraceData] = React.useState | null>(null) + const [traceViewReady, setTraceViewReady] = React.useState(false) React.useEffect(() => { setTraceData( api.defaultApi.traceGet(run, worker, span).then((resp) => { - return JSON.stringify(resp); + return JSON.stringify(resp) }) - ); - }, [run, worker, span]); + ) + }, [run, worker, span]) React.useEffect(() => { - function callback(event: MessageEvent): void { - const data = event.data || {}; + function callback(event: MessageEvent) { + const data = event.data || {} if (data.msg === 'ready') { - setTraceViewReady(true); + setTraceViewReady(true) } } - window.addEventListener('message', callback); + window.addEventListener('message', callback) return () => { - window.removeEventListener('message', callback); - }; - }, []); + window.removeEventListener('message', callback) + } + }, []) React.useEffect(() => { if (traceData && traceViewReady) { traceData.then((data) => { - iframeRef.current?.contentWindow?.postMessage({ msg: 'data', data }, window.origin); - }); + iframeRef.current?.contentWindow?.postMessage( + { msg: 'data', data }, + '*' + ) + }) } - }, [traceData, traceViewReady]); - const setIframeActive = (): void => { - iframeRef.current?.focus(); - }; + }, [traceData, traceViewReady]) + const SetIframeActive = () => { + iframeRef.current?.focus() + } return (
{React.useMemo( () => ( - - + + ), [] )}
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AntTableChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AntTableChart.tsx index 83618064b55223ab06d4d1fec8b8b5eeab8d3268..064167fc64b4e00ec79b648a85d12dff23ecfcd0 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AntTableChart.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AntTableChart.tsx @@ -2,110 +2,110 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import { makeStyles } from '@material-ui/core/styles'; -import { Table } from 'antd'; -import * as React from 'react'; -import { Graph } from '../../api'; +import { makeStyles } from '@material-ui/core/styles' +import { Table } from 'antd' +import * as React from 'react' +import { Graph } from '../../api' interface IProps { - graph: Graph; - sortColumn?: string; - initialPageSize?: number; - onRowSelected?: (record?: object, rowIndex?: number) => void; + graph: Graph + sortColumn?: string + initialPageSize?: number + onRowSelected?: (record?: object, rowIndex?: number) => void } const useStyles = makeStyles((theme) => ({ tooltip: { - whiteSpace: 'pre-wrap', + whiteSpace: 'pre-wrap' }, row: { - wordBreak: 'break-word', - }, -})); + wordBreak: 'break-word' + } +})) -const getTableColumns = function (columns: any, sort: string | undefined, tooltipClass: string): any { - let i = 0; - return columns.map((col: any) => { - const key = `col${i++}`; - const stringCompare = (a: any, b: any): number => a[key].localeCompare(b[key]); - const numberCompare = (a: any, b: any): number => (a[key] || 0) - (b[key] || 0); +const getTableColumns = function ( + columns: any, + sort: string | undefined, + tooltipClass: string +) { + let i = 0 + return columns.map(function (col: any) { + const key = 'col' + i++ + const stringCompare = (a: any, b: any) => a[key].localeCompare(b[key]) + const numberCompare = (a: any, b: any) => (a[key] || 0) - (b[key] || 0) return { dataIndex: key, key: key, title: col.name, - sorter: col.type === 'string' ? stringCompare : numberCompare, - defaultSortOrder: sort === col.name ? ('descend' as const) : undefined, - showSorterTooltip: col.tooltip ? { title: col.tooltip, overlayClassName: tooltipClass } : true, - }; - }); -}; + sorter: col.type == 'string' ? stringCompare : numberCompare, + defaultSortOrder: sort == col.name ? ('descend' as const) : undefined, + showSorterTooltip: col.tooltip + ? 
{ title: col.tooltip, overlayClassName: tooltipClass } + : true + } + }) +} -const getTableRows = function (rows: any): any { - return rows.map((row: any) => { - let i = 0; - const res: any = {}; - row.forEach((entry: any) => { - res[`col${i++}`] = entry; - }); - return res; - }); -}; +const getTableRows = function (rows: any) { + return rows.map(function (row: any) { + let i = 0 + const res: any = {} + row.forEach(function (entry: any) { + res['col' + i++] = entry + }) + return res + }) +} export const AntTableChart: React.FC = (props) => { - const { graph, sortColumn, initialPageSize, onRowSelected } = props; - const classes = useStyles(props); + const { graph, sortColumn, initialPageSize, onRowSelected } = props + const classes = useStyles(props) - const rows = React.useMemo(() => getTableRows(graph.rows), [graph.rows]); + const rows = React.useMemo(() => getTableRows(graph.rows), [graph.rows]) const columns = React.useMemo( () => getTableColumns(graph.columns, sortColumn, classes.tooltip), [graph.columns, sortColumn, classes.tooltip] - ); + ) // key is used to reset the Table state (page and sort) if the columns change - const key: string = React.useMemo(() => `${Math.random()}`, [graph.columns]); + const key = React.useMemo(() => Math.random() + '', [graph.columns]) - const [pageSize, setPageSize] = React.useState(initialPageSize ?? 30); - const onShowSizeChange = (current: number, size: number): void => { - setPageSize(size); - }; + const [pageSize, setPageSize] = React.useState(initialPageSize ?? 30) + const onShowSizeChange = (current: number, size: number) => { + setPageSize(size) + } - const onRow = ( - record: object, - rowIndex?: number - ): { - onMouseEnter: (event: any) => void; - onMouseLeave: (event: any) => void; - } => { + const onRow = (record: object, rowIndex?: number) => { return { - onMouseEnter: (event: any): void => { + onMouseEnter: (event: any) => { if (onRowSelected) { - onRowSelected(record, rowIndex); + onRowSelected(record, rowIndex) } }, - onMouseLeave: (event: any): void => { + onMouseLeave: (event: any) => { if (onRowSelected) { - onRowSelected(undefined, undefined); + onRowSelected(undefined, undefined) } - }, - }; - }; + } + } + } return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AreaChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AreaChart.tsx index cda12860c2fba41f5a15c5d9e73fb92093c0371b..6a0f5b484d9c156927edfeae64a729bec821c164 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AreaChart.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/AreaChart.tsx @@ -2,46 +2,44 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; -import { Graph } from '../../api'; -import { useResizeEventDependency } from '../../utils/resize'; +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import { Graph } from '../../api' +import { useResizeEventDependency } from '../../utils/resize' interface IProps { - graph: Graph; - height?: number; - hAxisTitle?: string; + graph: Graph + height?: number + hAxisTitle?: string } const useStyles = makeStyles(() => ({ root: { - height: (props: Pick): number | undefined => props.height, - }, -})); + height: (props: Pick) => props.height + } +})) export const AreaChart: React.FC = (props) => { - const { graph, height = 400, hAxisTitle } = props; - const classes = useStyles({ height }); - const graphRef = React.useRef(null); - const [resizeEventDependency] = useResizeEventDependency(); + const { graph, height = 400, hAxisTitle } = props + const classes = useStyles({ height }) + const graphRef = React.useRef(null) + const [resizeEventDependency] = useResizeEventDependency() React.useLayoutEffect(() => { - const element = graphRef.current; - if (!element) { - return undefined; - } + const element = graphRef.current + if (!element) return - const data = new google.visualization.DataTable(); - data.addColumn('string', 'step'); + const data = new google.visualization.DataTable() + data.addColumn('string', 'step') graph.columns.forEach((column) => { data.addColumn({ type: column.type, label: column.name, role: column.role, - p: column.p, - }); - }); - data.addRows(graph.rows.map((x, i) => [(i + 1).toString(), ...x])); + p: column.p + }) + }) + data.addRows(graph.rows.map((x, i) => [(i + 1).toString(), ...x])) const options = { title: graph.title, @@ -51,22 +49,22 @@ export const AreaChart: React.FC = (props) => { tooltip: { isHtml: true }, chartArea: { left: '15%', width: '80%', top: '10%' }, hAxis: { - title: hAxisTitle, - }, - }; + title: hAxisTitle + } + } - const chart = new google.visualization.AreaChart(element); + const chart = new google.visualization.AreaChart(element) - chart.draw(data, options); + chart.draw(data, options) return () => { - chart.clearChart(); - }; - }, [graph, height, resizeEventDependency]); + chart.clearChart() + } + }, [graph, height, resizeEventDependency]) return (
 - ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/ColumnChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/ColumnChart.tsx index ae51dc1a34e94b1c91eab2fe502ffe2cbc20f618..1c83eea95998222903a161d6ddbb678189a03775 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/ColumnChart.tsx
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/ColumnChart.tsx
@@ -15,62 +15,58 @@
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
- *
+ * 
 * Modifications: Offer offline support.
 *--------------------------------------------------------------------------------------------*/
-import * as React from 'react';
-import { useResizeEventDependency } from '../../utils/resize';
-import * as echarts from 'echarts';
+import * as React from 'react'
+import { useResizeEventDependency } from '../../utils/resize'
+import * as echarts from 'echarts'
 interface IProps {
- title?: string;
- units?: string;
- colors?: Array;
- chartData: ColumnChartData;
+ title?: string
+ units?: string
+ colors?: Array
+ chartData: ColumnChartData
 }
 export interface ColumnChartData {
- legends: Array;
- barLabels: Array;
- barHeights: Array>;
+ legends: Array
+ barLabels: Array
+ barHeights: Array>
 }
 export const ColumnChart: React.FC = (props) => {
- const { title, units, colors, chartData } = props;
- const { legends, barLabels, barHeights } = chartData;
- const graphRef = React.useRef(null);
- const [resizeEventDependency] = useResizeEventDependency();
+ const { title, units, colors, chartData } = props
+ const { legends, barLabels, barHeights } = chartData
+ const graphRef = React.useRef(null)
+ const [resizeEventDependency] = useResizeEventDependency()
- const getAngleByDataLength = (data: number): number => {
+ const getAngleByDataLength = (data: number) => {
 if (data < 10) {
- return 0;
+ return 0
 } else {
 // the larger the count, the closer the rotation gets to 90 degrees
- return 90 * (1 - (10 / data));
+ return 90 * (1 - 10 / data)
 }
- };
+ }
 React.useLayoutEffect(() => {
- const element = graphRef.current;
- if (!element) {
- return undefined;
- }
+ const element = graphRef.current
+ if (!element) return
- const chart = echarts.init(element);
- const dataSource: Array> = [];
- dataSource.push(['worker', ...legends]);
+ const chart = echarts.init(element)
+ const dataSource: Array> = []
+ dataSource.push(['worker', ...legends])
 barHeights.forEach((item, index) => {
- if (barLabels[index] !== undefined) {
- dataSource.push([barLabels[index], ...item]);
- }
- });
+ barLabels[index] !== undefined && dataSource.push([barLabels[index], ...item])
+ })
 const options: echarts.EChartsOption = {
 title: {
- text: title,
+ text: title
 },
 legend: {
- bottom: 0,
+ bottom: 0
 },
 xAxis: {
 type: 'category',
@@ -78,41 +74,43 @@ export const ColumnChart: React.FC = (props) => {
 interval: 0,
 rotate: getAngleByDataLength(barLabels.length),
 formatter: (name: string) => {
- const index = name.indexOf('@');
- const processedName = index > -1 ? name.slice(index + 1) : name; // use a new variable for processing
- return processedName.length > 16 ? `${processedName.slice(0, 14)}...` : processedName;
- },
- },
+ const index = name.indexOf('@')
+ if (index > -1) {
+ name = name.slice(index + 1)
+ }
+ return name.length > 16 ? name.slice(0, 14) + "..." 
: name; + } + } }, yAxis: { type: 'value', name: units, nameTextStyle: { - fontSize: 16, - }, + fontSize: 16 + } }, tooltip: { - trigger: 'item', + trigger: 'item' }, dataset: { - source: dataSource, + source: dataSource }, series: Array(legends.length).fill({ type: 'bar', - stack: 'samesign', + stack: 'samesign' }), - }; + } if (colors) { - options.color = colors.slice(0, barLabels.length); + options.color = colors.slice(0, barLabels.length) } - if (options) { - chart.setOption(options, true); - } + options && chart.setOption(options, true) return () => { - chart.dispose(); - }; - }, [title, chartData, resizeEventDependency]); + chart.dispose() + } + }, [title, chartData, resizeEventDependency]) - return
; -}; + return ( +
+ ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/LineChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/LineChart.tsx new file mode 100644 index 0000000000000000000000000000000000000000..b9a031d3a44336e568f30524abc8837590b3f603 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/LineChart.tsx @@ -0,0 +1,224 @@ +/*--------------------------------------------------------------------------------------------- + * Copyright (c) Microsoft Corporation. All rights reserved. + *--------------------------------------------------------------------------------------------*/ + +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import { Graph, GraphAscend } from '../../api' +import { useResizeEventDependency } from '../../utils/resize' +import { binarySearch } from '../../utils/binarysearch' + +interface IProps { + graph: Graph | GraphAscend + height?: number + deviceTarget: string + tag: string + hAxisTitle?: string + vAxisTitle?: string + explorerOptions?: object + onSelectionChanged?: (start: number, end: number) => void + record?: any +} + +const useStyles = makeStyles(() => ({ + root: { + height: (props: Pick) => props.height + } +})) + +export const LineChart: React.FC = (props) => { + const { + graph, + height = 400, + deviceTarget, + tag, + hAxisTitle, + vAxisTitle, + onSelectionChanged, + explorerOptions, + record + } = props + const classes = useStyles({ height }) + const graphRef = React.useRef(null) + const [resizeEventDependency] = useResizeEventDependency() + const [chartObj, setChartObj] = React.useState() + + React.useLayoutEffect(() => { + const element = graphRef.current + if (!element) return + + const options = { + title: graph.title, + isStacked: true, + height, + legend: { position: 'bottom' }, + tooltip: { isHtml: true }, + hAxis: { + title: hAxisTitle + }, + vAxis: { + title: vAxisTitle + }, + explorer: explorerOptions + } + + const chart = new google.visualization.LineChart(element) + + // Disable selection of single point + google.visualization.events.addListener(chart, 'select', function () { + chart.setSelection() + }) + + google.visualization.events.addListener(chart, 'ready', function () { + var zoomLast = getCoords() + var observer = new MutationObserver(function () { + var zoomCurrent = getCoords() + if (JSON.stringify(zoomLast) !== JSON.stringify(zoomCurrent)) { + zoomLast = getCoords() + if (onSelectionChanged) { + onSelectionChanged(zoomLast.x_min, zoomLast.x_max) + } + } + }) + if (graphRef.current) { + observer.observe(graphRef.current, { + childList: true, + subtree: true + }) + } + }) + + function getCoords() { + var chartLayout = chart.getChartLayoutInterface() + var chartBounds = chartLayout.getChartAreaBoundingBox() + + return { + x_min: chartLayout.getHAxisValue(chartBounds.left), + x_max: chartLayout.getHAxisValue(chartBounds.width + chartBounds.left) + } + } + + if (deviceTarget === 'Ascend') { + let data = new google.visualization.DataTable() + if (tag === 'Component') { + if (graph.columns.length === 3) { + graph.columns.forEach((column) => { + data.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + }) + data.addRows(graph.rows['PTA'] ?? 
graph.rows['GE']) + } else if (graph.columns.length === 5) { + const data2 = new google.visualization.DataTable() + graph.columns.forEach((column, index) => { + if (index === 0 || index < 3) { + data.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + } + if (index === 0 || index >= 3) { + data2.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + } + }) + data.addRows(graph.rows['PTA']) + data2.addRows(graph.rows['GE']) + data = google.visualization.data.join(data, data2, 'full', [[0, 0]], [1, 2], [1, 2]) + } + } else { + if (graph.columns.length === 2) { + graph.columns.forEach((column) => { + data.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + }) + data.addRows(graph.rows['Allocated'] ?? graph.rows['Reserved']) + } else if (graph.columns.length === 3) { + const data2 = new google.visualization.DataTable() + graph.columns.forEach((column, index) => { + if (index === 0 || index < 2) { + data.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + } + if (index === 0 || index >= 2) { + data2.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + } + }) + data.addRows(graph.rows['Allocated']) + data2.addRows(graph.rows['Reserved']) + data = google.visualization.data.join(data, data2, 'full', [[0, 0]], [1], [1]) + } + } + + chart.draw(data, options) + } else { + const data = new google.visualization.DataTable() + graph.columns.forEach((column) => { + data.addColumn({ + type: column.type, + label: column.name, + role: column.role, + p: column.p + }) + }) + data.addRows(graph.rows) + chart.draw(data, options) + } + + setChartObj(chart) + }, [graph, height, resizeEventDependency]) + + React.useEffect(() => { + const compare_fn = (key: number, mid: Array) => + key - parseFloat(mid[0].toFixed(2)) + if (chartObj && tag === 'Operator') { + if (record) { + if (deviceTarget === 'Ascend') { + let startId = binarySearch(graph.rows['Allocated'], record.col2, compare_fn) + let endId = binarySearch(graph.rows['Allocated'], record.col3, compare_fn) + let selection = [] + if (startId >= 0) selection.push({ row: startId, column: 1 }) + if (endId >= 0) selection.push({ row: endId, column: 1 }) + chartObj.setSelection(selection) + } else { + let startId = binarySearch(graph.rows, record.col2, compare_fn) + let endId = binarySearch(graph.rows, record.col3, compare_fn) + let selection = [] + if (startId >= 0) selection.push({ row: startId, column: 1 }) + if (endId >= 0) selection.push({ row: endId, column: 1 }) + chartObj.setSelection(selection) + } + } else { + chartObj.setSelection() + } + } + }, [graph, record, chartObj]) + + return ( +
+
+
+ ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/NewLineChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/NewLineChart.tsx index a6e222a6cc9d04b3b0c9031be60b91b75fe9ab37..af350e93d96c364d9baf4952bd59458a7bbd0801 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/NewLineChart.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/NewLineChart.tsx @@ -15,79 +15,85 @@ * limitations under the License. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { Graph, GraphAscend } from '../../api'; -import { useResizeEventDependency } from '../../utils/resize'; -import { binarySearch } from '../../utils/binarysearch'; -import * as echarts from 'echarts'; +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import { Graph, GraphAscend } from '../../api' +import { useResizeEventDependency } from '../../utils/resize' +import { binarySearch } from '../../utils/binarysearch' +import * as echarts from 'echarts' interface IProps { - graph: Graph | GraphAscend; - height?: number; - deviceTarget: string; - tag: string; - hAxisTitle?: string; - vAxisTitle?: string; - onSelectionChanged?: (start: number, end: number) => void; - record?: any; + graph: Graph | GraphAscend + height?: number + deviceTarget: string + tag: string + hAxisTitle?: string + vAxisTitle?: string + onSelectionChanged?: (start: number, end: number) => void + record?: any } export const LineChart: React.FC = (props) => { - const { graph, height = 400, deviceTarget, tag, hAxisTitle, vAxisTitle, onSelectionChanged, record } = props; - const graphRef = React.useRef(null); - const [resizeEventDependency] = useResizeEventDependency(); - const [chartObj, setChartObj] = React.useState(); - const selectedPoints = React.useRef>([]); + const { + graph, + height = 400, + deviceTarget, + tag, + hAxisTitle, + vAxisTitle, + onSelectionChanged, + record + } = props + const graphRef = React.useRef(null) + const [resizeEventDependency] = useResizeEventDependency() + const [chartObj, setChartObj] = React.useState() + const selectedPoints = React.useRef>([]) React.useLayoutEffect(() => { - const element = graphRef.current; - if (!element) { - return undefined; - } - element.oncontextmenu = (): boolean => { - return false; - }; + const element = graphRef.current + if (!element) return + element.oncontextmenu = () => { return false } - let myChart = echarts.init(element); + let myChart = echarts.init(element) let option: echarts.EChartsOption = { title: { text: graph.title, textStyle: { - fontSize: 16, - }, + fontSize: 16 + } }, tooltip: { trigger: 'axis' }, legend: { type: 'scroll', - bottom: 0, + bottom: 0 }, xAxis: { type: 'category', boundaryGap: false, - name: hAxisTitle, + name: hAxisTitle }, yAxis: { type: 'value', name: vAxisTitle, - scale: true, + scale: true }, toolbox: { feature: { dataZoom: { - yAxisIndex: 'none', + yAxisIndex: 'none' }, - restore: {}, - }, - }, - }; + restore: {} + } + } + } if (deviceTarget === 'Ascend') { if (tag === 'Component') { const mixedTooltip: echarts.TooltipComponentOption = { trigger: 'axis', formatter: function (params: any) { - let res = `${params[0].name}
`; + var res = `${params[0].name}
` for (const item of params) { if (typeof item.value[item.encode.y[0]] === 'number') { res += ` - ${item.seriesName}: ${item.value[item.encode.y[0]]}
`; + ${item.seriesName}: ${item.value[item.encode.y[0]]}
` } } - return res; - }, - }; + return res + } + } if (graph.columns.length <= 4) { - let finalRows = graph.rows.PTA ?? graph.rows.GE; + let finalRows = graph.rows['PTA'] ?? graph.rows['GE'] if (graph.columns.length === 4) { - const mergedAPPRows = graph.rows.APP.map((item: Array) => { - return [item[0], null, null, item[1]]; - }); + const mergedAPPRows = graph.rows['APP'].map((item: Array) => { + return [item[0], null, null, item[1]] + }) finalRows = finalRows.concat(mergedAPPRows).sort((a: any, b: any) => { - return a[0] - b[0]; - }); + return a[0] - b[0] + }) } option = { ...option, tooltip: mixedTooltip, dataset: { - source: [graph.columns.map((column) => column.name), ...finalRows], + source: [ + graph.columns.map(column => column.name), + ...finalRows + ] }, - series: Array(graph.columns.length - 1).fill({ - type: 'line', - select: { - itemStyle: { - borderWidth: 5, - shadowBlur: 5, + series: Array(graph.columns.length - 1).fill( + { + type: 'line', + select: { + itemStyle: { + borderWidth: 5, + shadowBlur: 5 + } }, - }, - emphasis: { - itemStyle: { - borderWidth: 5, - shadowBlur: 5, + emphasis: { + itemStyle: { + borderWidth: 5, + shadowBlur: 5 + } }, - }, - selectedMode: 'single', - }), - }; + selectedMode: 'single', + } + ) + } } else if (graph.columns.length <= 6) { - const datasetTitle = graph.columns.map((item) => item.name); - let mergedGERows = graph.rows.GE.map((item: Array) => { - return [item[0], null, null, item[1], item[2]]; - }); + const datasetTitle = graph.columns.map(item => item.name) + let mergedGERows = graph.rows['GE'].map((item: Array) => { + return [item[0], null, null, item[1], item[2]] + }) if (graph.columns.length === 6) { - const mergedAPPRows = graph.rows.APP.map((item: Array) => { - return [item[0], null, null, null, null, item[2]]; - }); - mergedGERows = mergedGERows.concat(mergedAPPRows); + const mergedAPPRows = graph.rows['APP'].map((item: Array) => { + return [item[0], null, null, null, null, item[2]] + }) + mergedGERows = mergedGERows.concat(mergedAPPRows) } - const finalRows = graph.rows.PTA.concat(mergedGERows).sort((a: any, b: any) => { - return a[0] - b[0]; - }); + const finalRows = graph.rows['PTA'].concat(mergedGERows).sort((a: any, b: any) => { + return a[0] - b[0] + }) option = { ...option, tooltip: mixedTooltip, - dataset: { - source: [datasetTitle, ...finalRows], + dataset: + { + source: [ + datasetTitle, + ...finalRows + ] }, - series: Array(graph.columns.length - 1).fill({ - type: 'line', - connectNulls: true, - select: { - itemStyle: { - borderWidth: 5, - shadowBlur: 5, + series: Array(graph.columns.length - 1).fill( + { + type: 'line', + connectNulls: true, + select: { + itemStyle: { + borderWidth: 5, + shadowBlur: 5 + } }, - }, - emphasis: { - itemStyle: { - borderWidth: 5, - shadowBlur: 5, + emphasis: { + itemStyle: { + borderWidth: 5, + shadowBlur: 5 + } }, - }, - selectedMode: 'single', - datasetIndex: 0, - }), - }; + selectedMode: 'single', + datasetIndex: 0 + }) + } } } else { if (graph.columns.length === 3) { - const datasetTitle1: Array = []; - const datasetTitle2: Array = []; + const datasetTitle1: Array = [] + const datasetTitle2: Array = [] graph.columns.forEach((column, index) => { if (index === 0 || index < 2) { - datasetTitle1.push(column.name); + datasetTitle1.push(column.name) } if (index === 0 || index >= 2) { - datasetTitle2.push(column.name); + datasetTitle2.push(column.name) } - }); + }) option = { ...option, dataset: [ { - source: [datasetTitle1, ...graph.rows.Allocated], + source: [ + datasetTitle1, + 
...graph.rows['Allocated'] + ] }, { - source: [datasetTitle2, ...graph.rows.Reserved], - }, + source: [ + datasetTitle2, + ...graph.rows['Reserved'] + ] + } ], series: [ { @@ -204,20 +226,20 @@ export const LineChart: React.FC = (props) => { name: 'Allocated', emphasis: { label: { - show: true, + show: true }, itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, select: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, - datasetIndex: 0, + datasetIndex: 0 }, { type: 'line', @@ -225,27 +247,30 @@ export const LineChart: React.FC = (props) => { select: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, emphasis: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, selectedMode: 'single', - datasetIndex: 1, - }, - ], - }; + datasetIndex: 1 + } + ] + } } } } else { option = { ...option, dataset: { - source: [graph.columns.map((column) => column.name), ...graph.rows], + source: [ + graph.columns.map(column => column.name), + ...graph.rows + ] }, series: [ { @@ -254,16 +279,16 @@ export const LineChart: React.FC = (props) => { select: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, emphasis: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, - selectedMode: 'single', + selectedMode: 'single' }, { type: 'line', @@ -271,116 +296,112 @@ export const LineChart: React.FC = (props) => { select: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, emphasis: { itemStyle: { borderWidth: 5, - shadowBlur: 5, - }, + shadowBlur: 5 + } }, - selectedMode: 'single', - }, - ], - }; + selectedMode: 'single' + } + ] + } } - if (option) { - myChart.setOption(option, true); - } + option && myChart.setOption(option, true) myChart.dispatchAction({ type: 'takeGlobalCursor', key: 'dataZoomSelect', - dataZoomSelectActive: true, - }); + dataZoomSelectActive: true + }) myChart.on('dataZoom', (param: any) => { if (onSelectionChanged) { - onSelectionChanged(param.batch[0].startValue, param.batch[0].endValue); + onSelectionChanged(param.batch[0].startValue, param.batch[0].endValue) } - }); + }) myChart.on('restore', () => { if (onSelectionChanged) { // Set startId greater than endId to query all memory events. 
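          // Editor's note (annotation, not part of the diff): (0, -1) is a
          // sentinel, an inverted range that callers treat as "no zoom window,
          // fetch every memory event". A minimal consumer sketch; the
          // fetchMemoryEvents helper and its range shape are hypothetical:
          //
          //   const handleSelection = (startId: number, endId: number): void => {
          //     // startId > endId signals a reset, so drop the id filter.
          //     const range = startId > endId ? undefined : { startId, endId };
          //     fetchMemoryEvents(range); // hypothetical data-layer call
          //   };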
- onSelectionChanged(0, -1); + onSelectionChanged(0, -1) } - }); + }) myChart.on('click', (param) => { myChart.dispatchAction({ type: 'unselect', seriesId: param.seriesId, - dataIndex: selectedPoints.current, - }); + dataIndex: selectedPoints.current + }) myChart.dispatchAction({ type: 'select', seriesId: param.seriesId, - dataIndex: param.dataIndex, - }); + dataIndex: param.dataIndex + }) - selectedPoints.current = [param.dataIndex]; - }); + selectedPoints.current = [param.dataIndex] + }) myChart.getZr().on('contextmenu', () => { myChart.dispatchAction({ - type: 'restore', - }); + type: 'restore' + }) myChart.dispatchAction({ type: 'takeGlobalCursor', key: 'dataZoomSelect', - dataZoomSelectActive: true, - }); - }); + dataZoomSelectActive: true + }) + }) - setChartObj(myChart); + setChartObj(myChart) return () => { - myChart.dispose(); - }; - }, [graph, height, resizeEventDependency]); + myChart.dispose() + } + }, [graph, height, resizeEventDependency]) React.useEffect(() => { - const compareFn = (key: number, mid: Array): number => key - mid[0]; + const compare_fn = (key: number, mid: Array) => key - mid[0] if (chartObj && tag === 'Operator') { if (record) { - let startId = -1; - let endId = -1; + let startId = -1 + let endId = -1 if (deviceTarget === 'Ascend') { - startId = binarySearch(graph.rows.Allocated, record.col2, compareFn); - endId = binarySearch(graph.rows.Allocated, record.col3, compareFn); + startId = binarySearch(graph.rows['Allocated'], record.col2, compare_fn) + endId = binarySearch(graph.rows['Allocated'], record.col3, compare_fn) } else { - startId = binarySearch(graph.rows, record.col2, compareFn); - endId = binarySearch(graph.rows, record.col3, compareFn); - } - let selection = []; - if (startId >= 0) { - selection.push(startId); - } - if (endId >= 0) { - selection.push(endId); + startId = binarySearch(graph.rows, record.col2, compare_fn) + endId = binarySearch(graph.rows, record.col3, compare_fn) } + let selection = [] + startId >= 0 && selection.push(startId) + endId >= 0 && selection.push(endId) chartObj.dispatchAction({ type: 'downplay', seriesName: 'Allocated', - dataIndex: selectedPoints.current, - }); + dataIndex: selectedPoints.current + }) chartObj.dispatchAction({ type: 'highlight', seriesName: 'Allocated', - dataIndex: selection, - }); - selectedPoints.current = selection; + dataIndex: selection + }) + selectedPoints.current = selection } else { chartObj.dispatchAction({ type: 'downplay', seriesName: 'Allocated', - dataIndex: selectedPoints.current, - }); - selectedPoints.current = []; + dataIndex: selectedPoints.current + }) + selectedPoints.current = [] } } - }, [graph, record, chartObj]); + }, [graph, record, chartObj]) - return
; -}; + return ( +
+ ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/PieChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/PieChart.tsx index 49c59ff02e91f7b7fe0d90ddff4239478ca19a0a..2c7ea1c1413ab932c226d1a919362a611a88d4ae 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/PieChart.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/PieChart.tsx @@ -19,104 +19,83 @@ * Modifications: Offer offline supporting. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { Graph } from '../../api'; -import { value } from '../../utils'; -import { useResizeEventDependency } from '../../utils/resize'; -import * as echarts from 'echarts'; +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import { Graph } from '../../api' +import { value } from '../../utils' +import { useResizeEventDependency } from '../../utils/resize' +import * as echarts from 'echarts' interface IProps { - graph: Graph; - height?: number; - top?: number; - noLegend?: boolean; - title?: string; - colors?: Array; - tooltipMode?: string; + graph: Graph + height?: number + top?: number + noLegend?: boolean + title?: string + colors?: Array + tooltip_mode?: string } -interface IAreaPosition { - left: string; - width: string; - top?: string; - height?: string; -} - -const noLegendArea: IAreaPosition = { - left: '5%', - width: '90%', - top: '5%', - height: '90%', -}; -const normalArea: IAreaPosition = { left: '5%', width: '95%' }; -const noTitleArea: IAreaPosition = { - left: '5%', - width: '95%', - top: '10%', - height: '80%', -}; +const noLegendArea = { left: '5%', width: '90%', top: '5%', height: '90%' } +const normalArea = { left: '5%', width: '95%' } +const noTitleArea = { left: '5%', width: '95%', top: '10%', height: '80%' } export const PieChart: React.FC = (props) => { - const { graph, height = 300, top, noLegend, title, colors, tooltipMode = 'both' } = props; - const graphRef = React.useRef(null); + const { + graph, + height = 300, + top, + noLegend, + title, + colors, + tooltip_mode = 'both' + } = props + const graphRef = React.useRef(null) - const [resizeEventDependency] = useResizeEventDependency(); + const [resizeEventDependency] = useResizeEventDependency() React.useLayoutEffect(() => { - const element = graphRef.current; - if (!element) { - return undefined; - } + const element = graphRef.current + if (!element) return - const chart = echarts.init(element); + const chart = echarts.init(element) - let totalValue = 0; - const rowsWithUniqueName: Array<{ name: string; value: number }> = + let totalValue = 0 + const rowsWithUniqueName: Array<{ name: string, value: number }> = top === undefined ? 
graph.rows.map((item, index) => { - totalValue += item[1] as number; - return { name: `${index}_${item[0]}`, value: item[1] as number }; - }) + totalValue += item[1] as number + return { name: `${index}_${item[0]}`, value: item[1] as number } + }) : graph.rows - .sort((a, b) => (value(b[1]) as number) - (value(a[1]) as number)) - .slice(0, top) - .map((item, index) => { - totalValue += item[1] as number; - return { name: `${index}_${item[0]}`, value: item[1] as number }; - }); + .sort((a, b) => (value(b[1]) as number) - (value(a[1]) as number)) + .slice(0, top).map((item, index) => { + totalValue += item[1] as number + return { name: `${index}_${item[0]}`, value: item[1] as number } + }) const option: echarts.EChartsOption = { height, width: '100%', title: { - text: title, + text: title }, tooltip: { trigger: 'item', formatter: (data) => { - const typedData = data as echarts.DefaultLabelFormatterCallbackParams; - const index = typedData.name.indexOf('_'); - const safeName = typedData.name.replace(//g, '>'); - return `${index > -1 ? safeName.slice(index + 1) : safeName}
${ - tooltipMode === 'both' ? typedData.value : '' - }(${typedData.percent}%)`; + const typedData = data as echarts.DefaultLabelFormatterCallbackParams + const index = typedData.name.indexOf('_') + const safeName = typedData.name.replace(//g, '>') + return `${index > -1 ? safeName.slice(index + 1) : safeName}
${tooltip_mode === 'both' ? + typedData.value : ''}(${typedData.percent}%)` }, confine: true, extraCssText: `max-width: 300px; word-wrap:break-word; white-space:pre-wrap; - padding-right: 10px`, + padding-right: 10px` }, - chartArea: ((): IAreaPosition => { - if (noLegend) { - return noLegendArea; - } - if (!title) { - return noTitleArea; - } else { - return normalArea; - } - })(), + chartArea: noLegend ? noLegendArea : !title ? noTitleArea : normalArea, legend: { type: noLegend ? 'plain' : 'scroll', orient: 'vertical', @@ -125,23 +104,24 @@ export const PieChart: React.FC = (props) => { // Display at most 36 characters. formatter: (name) => { // Show legends for datas with the same name. - const index = name.indexOf('_'); - const processedName = index > -1 ? name.slice(index + 1) : name; // 使用新变量处理 - return processedName.length > 36 ? `${processedName.slice(0, 34)}...` : processedName; + const index = name.indexOf('_') + if (index > -1) { + name = name.slice(index + 1) + } + return name.length > 36 ? name.slice(0, 34) + "..." : name; }, tooltip: { show: true, triggerOn: 'mousemove', formatter: (data) => { - const currentItem = rowsWithUniqueName.find((item) => item.name === data.name); - const index = data.name.indexOf('_'); - const percent = (((currentItem?.value || 0) * 100) / totalValue).toFixed(2); - const safeName = data.name.replace(//g, '>'); - return `${index > -1 ? safeName.slice(index + 1) : safeName}
${ - tooltipMode === 'both' ? currentItem?.value || 0 : '' - }(${percent}%)`; - }, - }, + const currentItem = rowsWithUniqueName.find(item => item.name === data.name) + const index = data.name.indexOf('_') + const percent = ((currentItem?.value || 0) * 100 / totalValue).toFixed(2) + const safeName = data.name.replace(//g, '>') + return `${index > -1 ? safeName.slice(index + 1) : + safeName}
${tooltip_mode === 'both' ? (currentItem?.value || 0) : ''}(${percent}%)` + } + } }, sliceVisibilityThreshold: 0, colors, @@ -153,21 +133,21 @@ export const PieChart: React.FC = (props) => { label: { position: 'inside', formatter: `{d}%`, - color: '#ffffff', + color: '#ffffff' }, - data: rowsWithUniqueName, - }, - ], - }; - - if (option) { - chart.setOption(option, true); + data: rowsWithUniqueName + } + ] } + option && chart.setOption(option, true) + return () => { - chart.dispose(); - }; - }, [graph, height, top, resizeEventDependency]); + chart.dispose() + } + }, [graph, height, top, resizeEventDependency]) - return
; -}; + return ( +
+ ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/SteppedAreaChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/SteppedAreaChart.tsx index 3e3b01ccb112aeb80795246bd6f3e2ad83aa2a66..bc38cc31747cd69e8fee7af4d55476f49bef9914 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/SteppedAreaChart.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/SteppedAreaChart.tsx @@ -19,88 +19,84 @@ * Modifications: Offer offline supporting. *--------------------------------------------------------------------------------------------*/ -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; -import { StepedGraph } from '../../api'; -import { useResizeEventDependency } from '../../utils/resize'; -import * as echarts from 'echarts'; +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import { StepedGraph } from '../../api' +import { useResizeEventDependency } from '../../utils/resize' +import * as echarts from 'echarts' interface IProps { - graph: StepedGraph; - height?: number; - hAxisTitle?: string; - vAxisTitle?: string; + graph: StepedGraph + height?: number + hAxisTitle?: string + vAxisTitle?: string } const useStyles = makeStyles(() => ({ root: { - height: (props: Pick): number | undefined => props.height, - }, -})); + height: (props: Pick) => props.height + } +})) export const SteppedAreaChart: React.FC = (props) => { - const { graph, height = 400, hAxisTitle, vAxisTitle } = props; - const classes = useStyles({ height }); - const graphRef = React.useRef(null); - const [resizeEventDependency] = useResizeEventDependency(); + const { graph, height = 400, hAxisTitle, vAxisTitle } = props + const classes = useStyles({ height }) + const graphRef = React.useRef(null) + const [resizeEventDependency] = useResizeEventDependency() React.useLayoutEffect(() => { - const element = graphRef.current; - if (!element) { - return undefined; - } + const element = graphRef.current + if (!element) return - const chart = echarts.init(element); - const dataSource: Array> = []; - dataSource.push(graph.columns); + const chart = echarts.init(element) + const dataSource: Array> = [] + dataSource.push(graph.columns) graph.rows.forEach((row) => { - dataSource.push(row.map((item) => item.value)); - }); + dataSource.push(row.map(item => item.value)) + }) const options: echarts.EChartsOption = { title: { - text: graph.title, + text: graph.title }, legend: { - bottom: 0, + bottom: 0 }, xAxis: { type: 'category', name: hAxisTitle, axisLabel: { interval: 0, - }, + } }, yAxis: { type: 'value', - name: vAxisTitle, + name: vAxisTitle }, tooltip: { trigger: 'item', formatter: (params: any) => { - return graph.rows[params.dataIndex][params.seriesIndex + 1]?.tooltip || ''; - }, + return graph.rows[params.dataIndex][params.seriesIndex + 1]?.tooltip || '' + } }, dataset: { - source: dataSource, + source: dataSource }, series: Array(graph.columns.length - 1).fill({ type: 'bar', - stack: 'samesign', - }), - }; - - if (options) { - chart.setOption(options, true); + stack: 'samesign' + }) } + options && chart.setOption(options, true) + return () => { - chart.dispose(); - }; - }, [graph, height, resizeEventDependency]); + chart.dispose() + } + }, [graph, height, resizeEventDependency]) return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/TableChart.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/TableChart.tsx index 444b41b196c162340b846ac488d70eb908c7b717..267624c85e02e30e047ff50e7d126259b765c83e 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/TableChart.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/charts/TableChart.tsx @@ -2,54 +2,56 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import { makeStyles } from '@material-ui/core/styles'; -import * as React from 'react'; -import { Graph } from '../../api'; -import { useResizeEventDependency } from '../../utils/resize'; +import { makeStyles } from '@material-ui/core/styles' +import * as React from 'react' +import { Graph } from '../../api' +import { useResizeEventDependency } from '../../utils/resize' interface IProps { - graph: Graph; - sortColumn?: number; - height?: number; - allowHtml?: boolean; - setCellProperty?: (row: number, column: number, cb: (key: string, value: any) => void) => void; + graph: Graph + sortColumn?: number + height?: number + allowHtml?: boolean + setCellProperty?: ( + row: number, + column: number, + cb: (key: string, value: any) => void + ) => void } const useStyles = makeStyles(() => ({ root: { - height: (props: IProps): number | undefined => props.height, - }, -})); + height: (props: IProps) => props.height + } +})) export const TableChart: React.FC = (props) => { - const { graph, sortColumn, setCellProperty, allowHtml } = props; - const classes = useStyles(props); - const graphRef = React.useRef(null); - const [resizeEventDependency] = useResizeEventDependency(); + const { graph, sortColumn, setCellProperty, allowHtml } = props + const classes = useStyles(props) + const graphRef = React.useRef(null) + const [resizeEventDependency] = useResizeEventDependency() React.useLayoutEffect(() => { - const element = graphRef.current; - if (!element || !element.parentElement) { - return; - } + const element = graphRef.current + if (!element) return - const data = new google.visualization.DataTable(); + const data = new google.visualization.DataTable() graph.columns.forEach((column) => { data.addColumn({ type: column.type, label: column.name, role: column.role, - p: column.p, - }); - }); - data.addRows(graph.rows); + p: column.p + }) + }) + data.addRows(graph.rows) if (setCellProperty) { for (let row = 0; row < graph.rows.length; ++row) { for (let column = 0; column < graph.columns.length; ++column) { setCellProperty(row, column, (key: string, value: any) => { - data.setProperty(row, column, key, value); - }); + data.setProperty(row, column, key, value) + }) } } } @@ -62,24 +64,24 @@ export const TableChart: React.FC = (props) => { pageSize: 30, tooltip: { isHtml: true }, sortColumn: sortColumn, - sortAscending: false, - }; + sortAscending: false + } - const chart = new google.visualization.Table(element); + const chart = new google.visualization.Table(element) /* `chart.draw()` removes the contents of `element` and rebuilds it. This can cause a jump in the scroll position * if the height/width change to 0. Since we can't change the code of Google Charts, we temporarily lock the dims * of the parent container. 
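   * Editor's note (annotation, not part of the diff): this is the usual
   * "freeze, rebuild, release" guard. A standalone sketch of the same idea,
   * where `rebuild` is a placeholder for any DOM-rebuilding call such as
   * chart.draw():
   *
   *   function drawWithStableScroll(el: HTMLElement, rebuild: () => void): void {
   *     const parent = el.parentElement;
   *     if (parent && el.offsetHeight > 0) {
   *       parent.style.height = `${el.offsetHeight}px`; // freeze current height
   *       rebuild();                                    // contents are swapped here
   *       parent.style.height = '';                     // release back to CSS layout
   *     } else {
   *       rebuild();
   *     }
   *   }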
*/ if (element.offsetHeight > 0) { - element.parentElement.style.height = `${element.offsetHeight}px`; + element.parentElement!.style.height = element.offsetHeight + 'px' } - chart.draw(data, options); - element.parentElement.style.height = ''; - }, [graph, resizeEventDependency]); + chart.draw(data, options) + element.parentElement!.style.height = '' + }, [graph, resizeEventDependency]) return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/helpers.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/helpers.tsx index bfbb346e4b3daf65247e6e954346ed7245993f31..b787a5e91976a7f8f5839978276b35cf2a900cab 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/helpers.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/helpers.tsx @@ -2,40 +2,48 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import { makeStyles } from '@material-ui/core/styles'; -import Tooltip from '@material-ui/core/Tooltip'; -import HelpOutline from '@material-ui/icons/HelpOutline'; -import clsx from 'clsx'; -import * as React from 'react'; +import { makeStyles } from '@material-ui/core/styles' +import Tooltip from '@material-ui/core/Tooltip' +import HelpOutline from '@material-ui/icons/HelpOutline' +import clsx from 'clsx' +import * as React from 'react' export const useTooltipCommonStyles = makeStyles((theme) => ({ tooltip: { maxWidth: '600px', whiteSpace: 'pre-wrap', - fontSize: '14px', + fontSize: '14px' }, cardTitle: { display: 'flex', - alignItems: 'center', + alignItems: 'center' }, titleText: { - marginRight: theme.spacing(0.5), + marginRight: theme.spacing(0.5) }, smallTitleText: { fontSize: '.8rem', - fontWeight: 'bold', - }, -})); + fontWeight: 'bold' + } +})) -export const makeChartHeaderRenderer = - (classes: ReturnType, smallTitleText = true) => - (title: string, tooltip: string): JSX.Element => { - return ( - - {title} - - - +export const makeChartHeaderRenderer = ( + classes: ReturnType, + smallTitleText = true +) => (title: string, tooltip: string) => { + return ( + + + {title} - ); - }; + + + + + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallFrameList.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallFrameList.tsx index 0334d29e511399664d5204224e47cf1b88d50655..1e2a385bb634b3988142ada0d947adbb46c99715 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallFrameList.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallFrameList.tsx @@ -2,25 +2,25 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
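 *
 * Editor's note (annotation, not part of the diff): CallFrameList renders the
 * CallStackFrame[] produced by parseCallStack in tables/transform.ts; frames
 * that carry a file and line get a clickable NavToCodeButton. The sample data
 * below is invented for illustration:
 *
 *   const callFrames: CallStackFrame[] = [
 *     { raw: 'train.py(42): forward', file: 'train.py', line: 42 }, // navigable
 *     { raw: 'built-in method matmul' },                            // raw text only
 *   ];
 *   // rendered as <CallFrameList callFrames={callFrames} />
 *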
*--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { CallStackFrame } from './transform'; -import { List } from 'antd'; -import { NavToCodeButton } from './NavToCodeButton'; -import { makeStyles } from '@material-ui/core/styles'; +import * as React from 'react' +import { CallStackFrame } from './transform' +import { List } from 'antd' +import { NavToCodeButton } from './NavToCodeButton' +import { makeStyles } from '@material-ui/core/styles' interface IProps { - callFrames: CallStackFrame[]; + callFrames: CallStackFrame[] } const useStyles = makeStyles(() => ({ item: { paddingTop: '1px !important', - paddingBottom: '1px !important', - }, -})); + paddingBottom: '1px !important' + } +})) -export const CallFrameList = (props: IProps): React.JSX.Element => { - const classes = useStyles(); +export const CallFrameList = (props: IProps) => { + const classes = useStyles() const renderItem = React.useCallback( (item: CallStackFrame) => ( @@ -29,7 +29,14 @@ export const CallFrameList = (props: IProps): React.JSX.Element => { ), [classes.item] - ); + ) - return ; -}; + return ( + + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallStackTable.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallStackTable.tsx index c3176428d11b8b40c691947b2f0da8fc15674c16..359d7c9028aaeb7497e0a8aa1baba8fa6d8768c1 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallStackTable.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/CallStackTable.tsx @@ -15,89 +15,99 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. 
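 *
 * Editor's note (annotation, not part of the diff): this table fetches one
 * operator's call stacks on demand, then relies on antd's `expandable` API so
 * each row unfolds into a CallFrameList. The wiring, sketched:
 *
 *   const expandable: TableProps<TransformedCallStackDataInner>['expandable'] = {
 *     expandIcon,                              // custom "View call frames" button
 *     expandedRowRender: (record) =>
 *       <CallFrameList callFrames={record.callStackFrames} />,
 *     rowExpandable: (record) => record.callStackFrames.length > 0,
 *   };
 *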
*--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { makeStyles } from '@material-ui/core/styles'; -import { CallStackTableData, OperationTableDataInner } from '../../api'; -import { Table, TableProps } from 'antd'; +import * as React from 'react' +import { makeStyles } from '@material-ui/core/styles' +import { CallStackTableData, OperationTableDataInner } from '../../api' +import { Table, TableProps } from 'antd' -import * as api from '../../api'; -import { transformTableData, TransformedCallStackDataInner } from './transform'; -import { attachId, getCommonOperationColumns } from './common'; -import { OperationGroupBy } from '../../constants/groupBy'; -import { makeExpandIcon } from './ExpandIcon'; -import { CallFrameList } from './CallFrameList'; +import * as api from '../../api' +import { transformTableData, TransformedCallStackDataInner } from './transform' +import { attachId, getCommonOperationColumns } from './common' +import { OperationGroupBy } from '../../constants/groupBy' +import { makeExpandIcon } from './ExpandIcon' +import { CallFrameList } from './CallFrameList' export interface IProps { - data: OperationTableDataInner; - run: string; - worker: string; - span: string; - groupBy: OperationGroupBy; - deviceTarget: string; + data: OperationTableDataInner + run: string + worker: string + span: string + groupBy: OperationGroupBy + deviceTarget: string } const useStyles = makeStyles((theme) => ({ tooltip: { - whiteSpace: 'pre-wrap', - }, -})); + whiteSpace: 'pre-wrap' + } +})) const expandIcon = makeExpandIcon( 'View call frames', (record) => !record.callStackFrames.length -); +) -const rowExpandable = (record: TransformedCallStackDataInner): boolean => !!record.callStackFrames.length; -const expandedRowRender = (record: TransformedCallStackDataInner): React.JSX.Element => ( +const rowExpandable = (record: TransformedCallStackDataInner) => + !!record.callStackFrames.length +const expandedRowRender = (record: TransformedCallStackDataInner) => ( -); +) -export const CallStackTable = (props: IProps): React.JSX.Element => { - const { data, run, worker, span, groupBy, deviceTarget } = props; - const { name, input_shape } = data; - const classes = useStyles(props); +export const CallStackTable = (props: IProps) => { + const { data, run, worker, span, groupBy, deviceTarget } = props + const { name, input_shape } = data + const classes = useStyles(props) - const [stackData, setStackData] = React.useState(undefined); - const [tooltips, setTooltips] = React.useState(); + const [stackData, setStackData] = React.useState< + CallStackTableData | undefined + >(undefined) + const [tooltips, setTooltips] = React.useState() React.useEffect(() => { - api.defaultApi.operationStackGet(run, worker, span, groupBy, name, input_shape).then((resp) => { - setTooltips(resp.metadata.tooltips); - setStackData(resp.data); - }); - }, [name, input_shape, run, worker, span, groupBy]); + api.defaultApi + .operationStackGet(run, worker, span, groupBy, name, input_shape) + .then((resp) => { + setTooltips(resp.metadata.tooltips) + setStackData(resp.data) + }) + }, [name, input_shape, run, worker, span, groupBy]) - const transformedData = React.useMemo(() => stackData && transformTableData(attachId(stackData)), [stackData]); + const transformedData = React.useMemo( + () => stackData && transformTableData(attachId(stackData)), + [stackData] + ) const columns = React.useMemo( - () => transformedData && 
getCommonOperationColumns(transformedData, deviceTarget, undefined, tooltips, classes), + () => + transformedData && + getCommonOperationColumns(transformedData, deviceTarget, undefined, tooltips, classes), [transformedData] - ); + ) - const expandIconColumnIndex = columns?.length; + const expandIconColumnIndex = columns?.length const expandable: TableProps['expandable'] = React.useMemo( () => ({ expandIconColumnIndex, expandIcon, expandedRowRender, - rowExpandable, + rowExpandable }), [expandIconColumnIndex] - ); + ) return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/ExpandIcon.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/ExpandIcon.tsx index 422bb781630c24c6dc4915c3aed8c1f341dba363..68ff482827679d9c51c1ca0178b256dc5ae39581 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/ExpandIcon.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/ExpandIcon.tsx @@ -2,34 +2,33 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { Button, TableProps } from 'antd'; -import { OperationTableDataInner, CallStackTableDataInner } from '../../api'; -import { Arguments } from '../../utils/type'; +import * as React from 'react' +import { Button, TableProps } from 'antd' +import { OperationTableDataInner, CallStackTableDataInner } from '../../api' +import { Arguments } from '../../utils/type' -type Types = NonNullable['expandable']>['expandIcon']; -type BasePropType = Arguments>>[0]; -type PropType = BasePropType & { text: string; disabled?: boolean }; +type Types = NonNullable['expandable']>['expandIcon'] +type BasePropType = Arguments>>[0] +type PropType = BasePropType & { text: string; disabled?: boolean } -export function ExpandIcon( - props: PropType -): React.JSX.Element { - const onClick = (e: React.MouseEvent): void => { - props.onExpand(props.record, e); - }; +export function ExpandIcon< + T extends OperationTableDataInner | CallStackTableDataInner +>(props: PropType) { + const onClick = (e: React.MouseEvent) => { + props.onExpand(props.record, e) + } return ( - - ); + ) } -export function makeExpandIcon( - text: string, - disabled?: (v: T) => boolean -) { - return (props: BasePropType): React.JSX.Element => ( +export function makeExpandIcon< + T extends OperationTableDataInner | CallStackTableDataInner +>(text: string, disabled?: (v: T) => boolean) { + return (props: BasePropType) => ( - ); + ) } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/MemoryStatsTable.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/MemoryStatsTable.tsx index c7e1809a3c0b58297ca99066243cf7d65fbe4c8c..0b33ab4167ba11e9bb610d7ebc0717def2addda2 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/MemoryStatsTable.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/MemoryStatsTable.tsx @@ -2,76 +2,84 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
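 *
 * Editor's note (annotation, not part of the diff): columns arrive as backend
 * metadata ({ name, type, tooltip? }) and are mapped onto antd columns whose
 * sorter matches the declared type. A trimmed sketch; ColumnMeta is a
 * hypothetical name for that metadata shape:
 *
 *   interface ColumnMeta { name: string; type: 'string' | 'number'; tooltip?: string }
 *   const toAntdColumn = (col: ColumnMeta, key: string) => ({
 *     dataIndex: key,
 *     title: col.name,
 *     sorter: col.type === 'string'
 *       ? (a: any, b: any) => a[key].localeCompare(b[key])
 *       : (a: any, b: any) => (a[key] || 0) - (b[key] || 0),
 *   });
 *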
*--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { Table } from 'antd'; -import { makeStyles } from '@material-ui/core'; +import * as React from 'react' +import { Table } from 'antd' +import { makeStyles } from '@material-ui/core' export interface IProps { - data: any; - sort: string; + data: any + sort: string } const useStyles = makeStyles((theme) => ({ tooltip: { - whiteSpace: 'pre-wrap', - }, -})); + whiteSpace: 'pre-wrap' + } +})) -const getMemoryStatsTableColumns = function (columns: any, sort: string, tooltipClass: string): any { - let i = 0; - return columns.map((col: any) => { - const key = `col${i++}`; - const stringCompare = (a: any, b: any): number => a[key].localeCompare(b[key]); - const numberCompare = (a: any, b: any): number => (a[key] || 0) - (b[key] || 0); +const getMemoryStatsTableColumns = function ( + columns: any, + sort: string, + tooltipClass: string +) { + let i = 0 + return columns.map(function (col: any) { + const key = 'col' + i++ + const stringCompare = (a: any, b: any) => a[key].localeCompare(b[key]) + const numberCompare = (a: any, b: any) => (a[key] || 0) - (b[key] || 0) return { dataIndex: key, key: key, title: col.name, - sorter: col.type === 'string' ? stringCompare : numberCompare, - defaultSortOrder: sort === col.name ? ('descend' as const) : undefined, - showSorterTooltip: col.tooltip ? { title: col.tooltip, overlayClassName: tooltipClass } : true, - }; - }); -}; + sorter: col.type == 'string' ? stringCompare : numberCompare, + defaultSortOrder: sort == col.name ? ('descend' as const) : undefined, + showSorterTooltip: col.tooltip + ? { title: col.tooltip, overlayClassName: tooltipClass } + : true + } + }) +} -const getMemoryStatsTableRows = function (rows: any): any { - return rows.map((row: any) => { - let i = 0; - const res: any = {}; - row.forEach((entry: any) => { - res[`col${i++}`] = entry; - }); - return res; - }); -}; +const getMemoryStatsTableRows = function (rows: any) { + return rows.map(function (row: any) { + let i = 0 + const res: any = {} + row.forEach(function (entry: any) { + res['col' + i++] = entry + }) + return res + }) +} -export const MemoryStatsTable = (props: IProps): React.JSX.Element => { - const { data, sort } = props; - const classes = useStyles(); +export const MemoryStatsTable = (props: IProps) => { + const { data, sort } = props + const classes = useStyles() - const rows = React.useMemo(() => getMemoryStatsTableRows(data.rows), [data.rows]); + const rows = React.useMemo(() => getMemoryStatsTableRows(data.rows), [ + data.rows + ]) const columns = React.useMemo( () => getMemoryStatsTableColumns(data.columns, sort, classes.tooltip), [data.columns, sort, classes.tooltip] - ); + ) - const [pageSize, setPageSize] = React.useState(30); - const onShowSizeChange = (current: number, size: number): void => { - setPageSize(size); - }; + const [pageSize, setPageSize] = React.useState(30) + const onShowSizeChange = (current: number, size: number) => { + setPageSize(size) + } return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/NavToCodeButton.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/NavToCodeButton.tsx index 2c999aa12a49726aad12321f260b31b6f331eda2..fb40e7f38bf5ccbe89851b5fe2d0b684af71239a 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/NavToCodeButton.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/NavToCodeButton.tsx @@ -2,28 +2,28 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { CallStackFrame } from './transform'; -import { Button } from 'antd'; -import { navToCode } from '../../utils/vscode'; +import * as React from 'react' +import { CallStackFrame } from './transform' +import { Button } from 'antd' +import { navToCode } from '../../utils/vscode' interface IProps { - frame: CallStackFrame; + frame: CallStackFrame } -export const NavToCodeButton = (props: IProps): React.JSX.Element => { - const { raw, line, file } = props.frame; - const couldNavToFile = line && file; +export const NavToCodeButton = (props: IProps) => { + const { raw, line, file } = props.frame + const couldNavToFile = line && file - const onClick = (): void => { + const onClick = () => { if (line && file) { - navToCode(file, line - 1); + navToCode(file, line - 1) } - }; + } return ( - - ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/OperationTable.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/OperationTable.tsx index 1ce77ee817967ee69961ccd8c91dbc3b0357bed7..799b8497a04cce30dfc248b380bf477eab85909a 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/OperationTable.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/OperationTable.tsx @@ -15,55 +15,62 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. 
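 *
 * Editor's note (annotation, not part of the diff): each operator row expands
 * into a full CallStackTable, so the expandedRowRender closure must carry the
 * query context (run/worker/span/groupBy) along with the row. Sketched:
 *
 *   const expandedRowRender = (record: OperationTableDataInner) => (
 *     <CallStackTable
 *       data={record}
 *       run={run} worker={worker} span={span}
 *       groupBy={groupBy} deviceTarget={deviceTarget}
 *     />
 *   );
 *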
*--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { makeStyles } from '@material-ui/core/styles'; -import { OperationTableData, OperationTableDataInner, TableMetadata } from '../../api'; -import { OperationGroupBy } from '../../constants/groupBy'; -import { attachId, getCommonOperationColumns } from './common'; -import { Table, TableProps } from 'antd'; -import { makeExpandIcon } from './ExpandIcon'; -import { CallStackTable } from './CallStackTable'; +import * as React from 'react' +import { makeStyles } from '@material-ui/core/styles' +import { + OperationTableData, + OperationTableDataInner, + TableMetadata +} from '../../api' +import { OperationGroupBy } from '../../constants/groupBy' +import { attachId, getCommonOperationColumns } from './common' +import { Table, TablePaginationConfig, TableProps } from 'antd' +import { makeExpandIcon } from './ExpandIcon' +import { CallStackTable } from './CallStackTable' export interface IProps { - data: OperationTableData; - run: string; - worker: string; - span: string; - groupBy: OperationGroupBy; - sortColumn: string; - tooltips?: any; - deviceTarget: string; + data: OperationTableData + run: string + worker: string + span: string + groupBy: OperationGroupBy + sortColumn: string + tooltips?: any + deviceTarget: string } const useStyles = makeStyles((theme) => ({ tooltip: { - whiteSpace: 'pre-wrap', - }, -})); + whiteSpace: 'pre-wrap' + } +})) -const rowExpandable = (record: OperationTableDataInner): boolean => record.has_call_stack; -const expandIcon = makeExpandIcon('View CallStack', (record) => !record.has_call_stack); -export const OperationTable = (props: IProps): React.JSX.Element => { - const { data, run, worker, span, groupBy, sortColumn, tooltips, deviceTarget } = props; - const classes = useStyles(props); +const rowExpandable = (record: OperationTableDataInner) => record.has_call_stack +const expandIcon = makeExpandIcon( + 'View CallStack', + (record) => !record.has_call_stack +) +export const OperationTable = (props: IProps) => { + const { data, run, worker, span, groupBy, sortColumn, tooltips, deviceTarget } = props + const classes = useStyles(props) - const rows = React.useMemo(() => attachId(data), [data]); + const rows = React.useMemo(() => attachId(data), [data]) const columns = React.useMemo( () => getCommonOperationColumns(rows, deviceTarget, sortColumn, tooltips, classes), [rows] - ); + ) - const [pageSize, setPageSize] = React.useState(30); - const onShowSizeChange = (current: number, size: number): void => { - setPageSize(size); - }; + const [pageSize, setPageSize] = React.useState(30) + const onShowSizeChange = (current: number, size: number) => { + setPageSize(size) + } - const expandIconColumnIndex = columns.length; + const expandIconColumnIndex = columns.length const expandedRowRender = React.useCallback( (record: OperationTableDataInner) => ( { /> ), [run, worker, span, groupBy] - ); + ) const expandable: TableProps['expandable'] = React.useMemo( () => ({ expandIconColumnIndex, expandIcon, expandedRowRender, - rowExpandable, + rowExpandable }), [expandIconColumnIndex, expandedRowRender] - ); + ) return (
- ); -}; + ) +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/common.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/common.tsx index a84a1a3bb3ff96fd5df257af51bdcd302dc318e2..a6f1770e7424539d916c01abef122808291d86a6 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/common.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/common.tsx @@ -15,136 +15,147 @@ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. - * + * * Modifications: Add visualization of PyTorch Ascend profiling. *--------------------------------------------------------------------------------------------*/ -import { firstOrUndefined, isDef } from '../../utils/def'; -import { CallStackTableDataInner, OperationTableDataInner } from '../../api'; -import type { ColumnsType } from 'antd/es/table'; -import { ClassNameMap } from '@material-ui/styles'; +import { firstOrUndefined, isDef } from '../../utils/def' +import { CallStackTableDataInner, OperationTableDataInner } from '../../api' +import type { ColumnsType } from 'antd/es/table' +import { ClassNameMap } from '@material-ui/styles' -export function getCommonOperationColumns( - data?: T[], +export function getCommonOperationColumns< + T extends OperationTableDataInner | CallStackTableDataInner +>( + data: T[] | undefined, deviceTarget?: string, defaultSort?: string, tooltips?: any, classes?: ClassNameMap<'tooltip'> ): ColumnsType { - const firstData = firstOrUndefined(data); + const firstData = firstOrUndefined(data) - const hasInputShape = !firstData || isDef(firstData.input_shape); - const hasDeviceSelfDuration = !firstData || isDef(firstData.device_self_duration); - const hasDeviceTotalDuration = !firstData || isDef(firstData.device_total_duration); - const hasTcEligible = !firstData || isDef(firstData.tc_eligible); - const hasTcSelfRatio = !firstData || isDef(firstData.tc_self_ratio); - const hasTcTotalRatio = !firstData || isDef(firstData.tc_total_ratio); + const hasInputShape = !firstData || isDef(firstData.input_shape) + const hasDeviceSelfDuration = + !firstData || isDef(firstData.device_self_duration) + const hasDeviceTotalDuration = + !firstData || isDef(firstData.device_total_duration) + const hasTcEligible = !firstData || isDef(firstData.tc_eligible) + const hasTcSelfRatio = !firstData || isDef(firstData.tc_self_ratio) + const hasTcTotalRatio = !firstData || isDef(firstData.tc_total_ratio) - const nameCompare = (a: T, b: T): number => a.name.localeCompare(b.name); - const callsCompare = (a: T, b: T): number => a.calls - b.calls; - const deviceSelfDurationCompare = (a: T, b: T): number => - (a.device_self_duration || 0) - (b.device_self_duration || 0); - const deviceTotalDurationCompare = (a: T, b: T): number => - (a.device_total_duration || 0) - (b.device_total_duration || 0); - const hostSelfDurationCompare = (a: T, b: T): number => (a.host_self_duration || 0) - (b.host_self_duration || 0); - const hostTotalDurationCompare = (a: T, b: T): number => (a.host_total_duration || 0) - (b.host_total_duration || 0); - const tcEligibleCompare = (a: T, b: T): number => (a.tc_eligible ?? '').localeCompare(b.tc_eligible ?? 
''); - const tcSelfRatioCompare = (a: T, b: T): number => (a.tc_self_ratio || 0) - (b.tc_self_ratio || 0); - const tcTotalRatioCompare = (a: T, b: T): number => (a.tc_total_ratio || 0) - (b.tc_total_ratio || 0); + const nameCompare = (a: T, b: T) => a.name.localeCompare(b.name) + const callsCompare = (a: T, b: T) => a.calls - b.calls + const deviceSelfDurationCompare = (a: T, b: T) => + (a.device_self_duration || 0) - (b.device_self_duration || 0) + const deviceTotalDurationCompare = (a: T, b: T) => + (a.device_total_duration || 0) - (b.device_total_duration || 0) + const hostSelfDurationCompare = (a: T, b: T) => + (a.host_self_duration || 0) - (b.host_self_duration || 0) + const hostTotalDurationCompare = (a: T, b: T) => + (a.host_total_duration || 0) - (b.host_total_duration || 0) + const tcEligibleCompare = (a: T, b: T) => + a.tc_eligible!.localeCompare(b.tc_eligible!) + const tcSelfRatioCompare = (a: T, b: T) => + (a.tc_self_ratio || 0) - (b.tc_self_ratio || 0) + const tcTotalRatioCompare = (a: T, b: T) => + (a.tc_total_ratio || 0) - (b.tc_total_ratio || 0) const columns: ColumnsType = [ { dataIndex: 'name', key: 'name', title: 'Name', - sorter: nameCompare, + sorter: nameCompare }, hasInputShape ? { - dataIndex: 'input_shape', - key: 'input_shape', - title: 'Input Shape', - } + dataIndex: 'input_shape', + key: 'input_shape', + title: 'Input Shape' + } : undefined, { dataIndex: 'calls', sorter: callsCompare, key: 'calls', - title: 'Calls', + title: 'Calls' }, hasDeviceSelfDuration ? { - dataIndex: 'device_self_duration', - key: 'device_self_duration', - title: 'Device Self Duration (us)', - sorter: deviceSelfDurationCompare, - // Use device_self_duration as default sort if defaultSort is unspecified - defaultSortOrder: defaultSort ? undefined : ('descend' as const), - } + dataIndex: 'device_self_duration', + key: 'device_self_duration', + title: 'Device Self Duration (us)', + sorter: deviceSelfDurationCompare, + // Use device_self_duration as default sort if defaultSort is unspecified + defaultSortOrder: defaultSort ? undefined : ('descend' as const) + } : undefined, hasDeviceTotalDuration ? { - dataIndex: 'device_total_duration', - key: 'device_total_duration', - title: 'Device Total Duration (us)', - sorter: deviceTotalDurationCompare, - } + dataIndex: 'device_total_duration', + key: 'device_total_duration', + title: 'Device Total Duration (us)', + sorter: deviceTotalDurationCompare + } : undefined, { dataIndex: 'host_self_duration', key: 'host_self_duration', title: 'Host Self Duration (us)', - sorter: hostSelfDurationCompare, + sorter: hostSelfDurationCompare }, { dataIndex: 'host_total_duration', key: 'host_total_duration', title: 'Host Total Duration (us)', - sorter: hostTotalDurationCompare, + sorter: hostTotalDurationCompare }, hasTcEligible ? { - dataIndex: 'tc_eligible', - key: 'tc_eligible', - title: deviceTarget === 'Ascend' ? 'AI Cores Eligible' : 'Tensor Cores Eligible', - sorter: tcEligibleCompare, - } + dataIndex: 'tc_eligible', + key: 'tc_eligible', + title: deviceTarget === 'Ascend' ? 'AI Cores Eligible' : 'Tensor Cores Eligible', + sorter: tcEligibleCompare + } : undefined, hasTcSelfRatio ? { - dataIndex: 'tc_self_ratio', - key: 'tc_self_ratio', - title: deviceTarget === 'Ascend' ? 'AI Cores Self(%)' : 'Tensor Cores Self(%)', - sorter: tcSelfRatioCompare, - } + dataIndex: 'tc_self_ratio', + key: 'tc_self_ratio', + title: deviceTarget === 'Ascend' ? 'AI Cores Self(%)' : 'Tensor Cores Self(%)', + sorter: tcSelfRatioCompare + } : undefined, hasTcTotalRatio ? 
{ - dataIndex: 'tc_total_ratio', - key: 'tc_total_ratio', - title: deviceTarget === 'Ascend' ? 'AI Cores Total(%)' : 'Tensor Cores Total(%)', - sorter: tcTotalRatioCompare, - } - : undefined, - ].filter(isDef); + dataIndex: 'tc_total_ratio', + key: 'tc_total_ratio', + title: deviceTarget === 'Ascend' ? 'AI Cores Total(%)' : 'Tensor Cores Total(%)', + sorter: tcTotalRatioCompare + } + : undefined + ].filter(isDef) columns.forEach((column) => { - if (column.key === defaultSort) { - column.defaultSortOrder = 'descend' as const; + if (column.key == defaultSort) { + column.defaultSortOrder = 'descend' as const } if (tooltips[column.key as string]) { column.showSorterTooltip = { title: tooltips[column.key as string], - overlayClassName: classes?.tooltip, - }; + overlayClassName: classes?.tooltip + } } - }); - return columns; + }) + return columns } -let uid = 1; -export function attachId(data: T[]): T[] { +let uid = 1 +export function attachId< + T extends CallStackTableDataInner | OperationTableDataInner +>(data: T[]): T[] { return data.map((d) => ({ ...d, - key: uid++, - })); + key: uid++ + })) } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/transform.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/transform.ts index 5f59728feb30ef6d3230c3eec9803b08cdd72779..bd051fd429d5cb26a44a59b60f776b207a861d64 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/transform.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/tables/transform.ts @@ -2,49 +2,49 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import { CallStackTableData, CallStackTableDataInner } from '../../api'; +import { CallStackTableData, CallStackTableDataInner } from '../../api' export interface CallStackFrame { - file?: string; - line?: number; - raw: string; + file?: string + line?: number + raw: string } export interface TransformedCallStackDataInner extends CallStackTableDataInner { - callStackFrames: CallStackFrame[]; + callStackFrames: CallStackFrame[] } -const lineRegex = /\([0-9]+\)$/; +const lineRegex = /\([0-9]+\)$/ function parseCallStackLine(raw: string): CallStackFrame { - let rawResult = raw.trim(); - const results = rawResult.split(':'); - const location = results.slice(0, results.length - 1).join(':'); + raw = raw.trim() + const results = raw.split(':') + const location = results.slice(0, results.length - 1).join(':') - const result = lineRegex.exec(location); + const result = lineRegex.exec(location) if (!result) { - return { raw: rawResult }; + return { raw } } - const lineWithParens = result[0].trim(); - const file = rawResult.slice(0, result.index).trim(); + const lineWithParens = result[0].trim() + const file = raw.slice(0, result.index).trim() const line = Number( lineWithParens.substr(1, lineWithParens.length - 2).trim() - ); + ) return { - raw: rawResult, + raw, file, - line, - }; + line + } } -function parseCallStack(callStack?: string): CallStackFrame[] { +function parseCallStack(callStack: string | undefined): CallStackFrame[] { const lines = (callStack ?? 
'') .trim() .split(';') - .map((x) => x.trim()); - return lines.map(parseCallStackLine); + .map((x) => x.trim()) + return lines.map(parseCallStackLine) } function transformCallStackData( @@ -52,12 +52,12 @@ function transformCallStackData( ): TransformedCallStackDataInner { return { ...data, - callStackFrames: parseCallStack(data.call_stack), - }; + callStackFrames: parseCallStack(data.call_stack) + } } export function transformTableData( data: CallStackTableData ): TransformedCallStackDataInner[] { - return data.map(transformCallStackData); + return data.map(transformCallStackData) } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/transform.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/transform.ts index 94ee9f384ebde3a3ddb057c88fc42beb69b0c908..08dcb25a20daf1868cc4ff2ea6245f444330b93f 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/components/transform.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/components/transform.ts @@ -2,82 +2,81 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import * as api from '../api'; -import { assertDef, isDef } from '../utils/def'; +import * as api from '../api' +import { assertDef, isDef } from '../utils/def' -export function transformPerformanceIntoTable(performances: api.Performance[]): api.Graph { +export function transformPerformanceIntoTable( + performances: api.Performance[] +): api.Graph { const columns: api.GraphColumn[] = [ { type: 'string', name: 'Category' }, { type: 'number', name: 'Time Duration (us)' }, - { type: 'number', name: 'Percentage (%)' }, - ]; + { type: 'number', name: 'Percentage (%)' } + ] - const rows: api.Graph['rows'] = []; - const queue = [...performances]; + const rows: api.Graph['rows'] = [] + const queue = [...performances] while (queue.length) { - const first = queue.shift(); - assertDef(first); + const first = queue.shift() + assertDef(first) - const row: api.Graph['rows'][number] = []; - const { name, value, extra, children } = first; - assertDef(value); - assertDef(extra); + const row: api.Graph['rows'][number] = [] + const { name, value, extra, children } = first + assertDef(value) + assertDef(extra) - row.push(name); - row.push(value); - row.push(extra); + row.push(name) + row.push(value) + row.push(extra) if (isDef(children) && children.length) { - queue.push(...children); + queue.push(...children) } - rows.push(row); + rows.push(row) } return { columns, - rows, - }; + rows + } } -export function transformPerformanceIntoPie(performances: api.Performance[]): { - columns: api.GraphColumn[]; - rows: Array>; -} { +export function transformPerformanceIntoPie(performances: api.Performance[]) { const columns: api.GraphColumn[] = [ { type: 'string', name: 'Name' }, - { type: 'number', name: 'Value' }, - ]; + { type: 'number', name: 'Value' } + ] - const rows: api.Graph['rows'] = []; - const queue: api.Performance[] = []; + const rows: api.Graph['rows'] = [] + const queue: api.Performance[] = [] performances.forEach((topLevel) => { if (topLevel.children) { - queue.push(...topLevel.children); + queue.push(...topLevel.children) } - }); + }) while (queue.length) { - const first = queue.shift(); - assertDef(first); + const first = queue.shift() + assertDef(first) - const row: api.Graph['rows'][number] = []; - const { name, value, children } = first; - assertDef(value); + const row: api.Graph['rows'][number] = [] + const { name, value, children } = first + 
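      // Editor's note (annotation, not part of the diff): the queue gives a
      // breadth-first flatten: shift a node, emit a row, enqueue its children,
      // so parent levels always precede child levels. The same traversal as a
      // standalone sketch (flatten is a hypothetical helper):
      //
      //   const flatten = (nodes: api.Performance[]): string[] => {
      //     const out: string[] = [];
      //     const queue = [...nodes];
      //     while (queue.length) {
      //       const node = queue.shift()!;
      //       out.push(node.name);
      //       if (node.children?.length) { queue.push(...node.children); }
      //     }
      //     return out;
      //   };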
assertDef(value) - row.push(name); - row.push(Number.parseInt(value, 10)); + row.push(name) + row.push(Number.parseInt(value, 10)) if (isDef(children) && children.length) { - queue.push(...children); + queue.push(...children) } - rows.push(row); + rows.push(row) } return { columns, - rows, - }; + rows + } } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/constants/groupBy.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/constants/groupBy.ts index 88ea9e3f42adfecd2a829384cc78b7ddc88d11aa..2b96c6b8dd3a0f1127f2617b72934d65c89f01f0 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/constants/groupBy.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/constants/groupBy.ts @@ -3,11 +3,11 @@ *--------------------------------------------------------------------------------------------*/ export enum OperationGroupBy { - OPERATION = 'Operation', - OPERATION_AND_INPUT_SHAPE = 'OperationAndInputShape', + Operation = 'Operation', + OperationAndInputShape = 'OperationAndInputShape' } export enum KernelGroupBy { - KERNEL = 'Kernel', - KERNEL_NAME_AND_OP_NAME = 'KernelNameAndOpName', + Kernel = 'Kernel', + KernelNameAndOpName = 'KernelNameAndOpName' } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/gstatic.d.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/gstatic.d.ts index 521c5fbb8d985136529d8233f8a65dffb8acca95..646255c2cdc20595fc0166b8cd5ce4743549bd2c 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/gstatic.d.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/gstatic.d.ts @@ -2,5 +2,5 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -declare const google: any; -declare module 'react-flame-graph'; +declare const google: any +declare module 'react-flame-graph' diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/index.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/index.tsx index 851474766de5d9adee682e66ed752c85ffd6d4bf..224f37a5fd066414815caf9e83b15298364fd2bd 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/index.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/index.tsx @@ -2,9 +2,9 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; -import { render } from 'react-dom'; -import { App } from './app'; -import 'antd/dist/antd.css'; +import * as React from 'react' +import { render } from 'react-dom' +import { App } from './app' +import 'antd/dist/antd.css' -render(, document.getElementById('app')); +render(, document.getElementById('app')) diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/setup.tsx b/plugins/tensorboard-plugins/tb_plugin/fe/src/setup.tsx index c811ae1524ec7cc6f82410e8aeb999f2ea22476b..5db44e8243119c7988ef33007e2eb3134fe6e857 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/setup.tsx +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/setup.tsx @@ -2,8 +2,8 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
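 *
 * Editor's note (annotation, not part of the diff): setup() must resolve before
 * any component constructs google.visualization objects (the TableChart above,
 * for example). A typical call site, sketched against this bundle's render
 * entry point:
 *
 *   setup().then(() => {
 *     render(<App />, document.getElementById('app'));
 *   });
 *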
*--------------------------------------------------------------------------------------------*/ -export async function setup(): Promise { +export async function setup() { await google.charts.load('current', { - packages: ['corechart', 'table', 'timeline'], - }); + packages: ['corechart', 'table', 'timeline'] + }) } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/binarysearch.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/binarysearch.ts index 41382dcdb7acc8cb9e2b1b4f856e1855fb7ed88f..0477cac74d0b0d6836b53f18689891feb2f10cea 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/binarysearch.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/binarysearch.ts @@ -1,20 +1,20 @@ export function binarySearch( arr: Array, key: any, - compareFn: (key: number, mid: Array) => number + compare_fn: Function ): number { - let low = 0; - let high = arr.length - 1; + let low = 0, + high = arr.length - 1 while (low <= high) { - let mid = Math.round((high + low) / 2); - let cmp = compareFn(key, arr[mid]); + let mid = Math.round((high + low) / 2) + let cmp = compare_fn(key, arr[mid]) if (cmp > 0) { - low = mid + 1; + low = mid + 1 } else if (cmp < 0) { - high = mid - 1; + high = mid - 1 } else { - return mid; + return mid } } - return -1; + return -1 } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/debounce.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/debounce.ts index 82c7f04a98b788ab2c7c7647c292f163b8a92783..fcd6368e6ac9e971c85267fe5e6ccc9781235c9e 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/debounce.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/debounce.ts @@ -2,20 +2,20 @@ * Copyright (c) Microsoft Corporation. All rights reserved. *--------------------------------------------------------------------------------------------*/ -import * as React from 'react'; +import * as React from 'react' export function useDebounce(value: T, delay: number): T { - const [debouncedValue, setDebouncedValue] = React.useState(value); + const [debouncedValue, setDebouncedValue] = React.useState(value) React.useEffect(() => { const handler = setTimeout(() => { - setDebouncedValue(value); - }, delay); + setDebouncedValue(value) + }, delay) return () => { - clearTimeout(handler); - }; - }, [value, delay]); + clearTimeout(handler) + } + }, [value, delay]) - return debouncedValue; + return debouncedValue } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/def.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/def.ts index df6bef8eab076d13c0785902127f46a472ff9fa6..c024293a54e18e543c331226c317713f829c5c10 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/def.ts +++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/def.ts @@ -2,19 +2,17 @@ * Copyright (c) Microsoft Corporation. All rights reserved. 
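 *
 * Editor's note (annotation, not part of the diff): isDef is a type guard and
 * assertDef an assertion signature; both narrow T | null | undefined to T.
 * Usage, sketched:
 *
 *   const maybe: number | undefined = [1, 2, 3].find((n) => n > 1);
 *   if (isDef(maybe)) {
 *     console.log(maybe.toFixed(2)); // narrowed to number inside the guard
 *   }
 *   assertDef(maybe); // throws 'Must be defined' on null/undefined
 *   maybe.toFixed(2); // narrowed to number after the assertion
 *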
  *--------------------------------------------------------------------------------------------*/

-export function isDef<T>(v?: T | null): v is T {
-  return v !== null && v !== undefined;
+export function isDef<T>(v: T | undefined | null): v is T {
+  return v !== null && v !== undefined
 }

-export function assertDef<T>(v?: T | null): asserts v is T {
+export function assertDef<T>(v: T | undefined | null): asserts v is T {
   if (!isDef(v)) {
-    throw new Error('Must be defined');
+    throw new Error('Must be defined')
   }
 }

-export function firstOrUndefined<T>(v?: T[]): T | undefined {
-  if (!v || !v.length) {
-    return undefined;
-  }
-  return v[0];
+export function firstOrUndefined<T>(v: T[] | undefined): T | undefined {
+  if (!v || !v.length) return undefined
+  return v[0]
 }
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/hooks.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/hooks.ts
index 473b393d9fa270438be85a7b528d78107c5f87f5..d8dd3eff536eb5e22683debe4338e785fe630616 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/hooks.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/hooks.ts
@@ -2,26 +2,26 @@
  * Copyright (c) Microsoft Corporation. All rights reserved.
  *--------------------------------------------------------------------------------------------*/

-import * as React from 'react';
+import * as React from 'react'

-const cbs: Array<() => void> = [];
-export const useOnResize = (cb: () => void): void => {
+const cbs: (() => void)[] = []
+export const useOnResize = (cb: () => void) => {
   React.useEffect(() => {
     if (cbs.length === 0) {
       window.addEventListener('resize', () => {
-        cbs.forEach((callback) => callback());
-      });
+        cbs.forEach((cb) => cb())
+      })
     }
-    cbs.push(cb);
+    cbs.push(cb)

-    return (): void => {
-      const idx = cbs.findIndex(cb);
+    return () => {
+      const idx = cbs.findIndex(cb)
       if (idx > -1) {
-        cbs.splice(idx, 1);
+        cbs.splice(idx, 1)
       }
       if (cbs.length === 0) {
-        window.removeEventListener('reset', cb);
+        window.removeEventListener('reset', cb)
       }
-    };
-  }, [cb]);
-};
+    }
+  }, [cb])
+}
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/index.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/index.ts
index 5da446721e9d1cac3729d8aea03bca2615031f41..1c7074b4c2002c40dc0b3f2f3da88d9a2b783a5f 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/index.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/index.ts
@@ -2,23 +2,23 @@
  * Copyright (c) Microsoft Corporation. All rights reserved.
  *--------------------------------------------------------------------------------------------*/

-import { ValueAndFormat } from '../api';
+import { ValueAndFormat } from '../api'

-export function firstOrUndefined<T>(v?: T[] | null): T | undefined {
-  if (!v || !v.length) {
-    return undefined;
-  }
-  return v[0];
+export function firstOrUndefined<T>(v: T[] | undefined | null): T | undefined {
+  if (!v || !v.length) return undefined
+  return v[0]
 }

-export function sleep(delay: number): Promise<void> {
-  return new Promise((resolve) => setTimeout(resolve, delay));
+export function sleep(delay: number) {
+  return new Promise((resolve) => setTimeout(resolve, delay))
 }

 export function isValueAndFormat(v: any): v is ValueAndFormat {
-  return 'f' in v && 'v' in v;
+  return 'f' in v && 'v' in v
 }

-export function value(v: boolean | number | string | ValueAndFormat): boolean | number | string {
-  return typeof v === 'object' && isValueAndFormat(v) ? v.v : v;
+export function value(
+  v: boolean | number | string | ValueAndFormat
+): boolean | number | string {
+  return typeof v === 'object' && isValueAndFormat(v) ? v.v : v
 }
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/resize.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/resize.ts
index 766a10d54143fecd637b1d0dff33db17f22bee0d..57ab394042651fcddb7a48cfa158647d2e6b9faa 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/resize.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/resize.ts
@@ -2,26 +2,26 @@
  * Copyright (c) Microsoft Corporation. All rights reserved.
  *--------------------------------------------------------------------------------------------*/

-import * as React from 'react';
-import debounce from '@material-ui/core/utils/debounce';
+import * as React from 'react'
+import debounce from '@material-ui/core/utils/debounce'

-export function useResizeEventDependency(): readonly [number] {
-  const [version, setVersion] = React.useState(0);
+export function useResizeEventDependency() {
+  const [version, setVersion] = React.useState(0)

   const increaseVersion = React.useCallback(
     debounce(() => {
-      setVersion((prev) => prev + 1);
+      setVersion((prev) => prev + 1)
     }, 100),
     []
-  );
+  )

   React.useEffect(() => {
-    window.addEventListener('resize', increaseVersion);
+    window.addEventListener('resize', increaseVersion)

-    return (): void => {
-      window.removeEventListener('resize', increaseVersion);
-    };
-  }, []);
+    return () => {
+      window.removeEventListener('resize', increaseVersion)
+    }
+  }, [])

-  return [version] as const;
+  return [version] as const
 }
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/search.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/search.ts
index 8a2efc36ddf505aee50171affd722bd5ef0a5b86..36689758752625b6c249c5fd532d93c9e5fbafb4 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/search.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/search.ts
@@ -2,67 +2,65 @@
  * Copyright (c) Microsoft Corporation. All rights reserved.
  *--------------------------------------------------------------------------------------------*/

-import * as React from 'react';
-import { value } from '.';
-import * as api from '../api';
-import { useDebounce } from './debounce';
+import * as React from 'react'
+import { value } from '.'
+import * as api from '../api'
+import { useDebounce } from './debounce'

 export function useSearch(
   searchName: string,
   columnName: string,
-  table?: api.Graph
+  table: api.Graph | undefined
 ): [api.Graph | undefined] {
-  const searchNameDebounce = useDebounce(searchName.trim(), 500);
+  const searchNameDebounce = useDebounce(searchName.trim(), 500)

   const searchedTable: api.Graph | undefined = React.useMemo(() => {
     if (!searchNameDebounce) {
-      return table;
+      return table
     }

     if (!table) {
-      return undefined;
+      return undefined
     }

-    const columnNameToFind = columnName.toLowerCase();
+    const columnNameToFind = columnName.toLowerCase()
     const nameColumnIdx = table.columns.findIndex(
       (c) => c.name.toLowerCase() === columnNameToFind
-    );
+    )
     if (nameColumnIdx < 0) {
-      return table;
+      return table
     }

     return {
       ...table,
       rows: table.rows.filter((x) => {
-        const cell = value(x[nameColumnIdx]);
-        return typeof cell === 'string' && cell.includes(searchNameDebounce);
-      }),
-    };
-  }, [table, searchNameDebounce]);
-  return [searchedTable];
+        const cell = value(x[nameColumnIdx])
+        return typeof cell === 'string' && cell.includes(searchNameDebounce)
+      })
+    }
+  }, [table, searchNameDebounce])
+  return [searchedTable]
 }

 export function useSearchDirectly<T>(
   searchName: string,
   field: (v: T) => string,
-  table?: T[]
+  table: T[] | undefined
 ): [T[] | undefined] {
-  const searchNameDebounce = useDebounce(searchName.trim(), 500);
+  const searchNameDebounce = useDebounce(searchName.trim(), 500)

   const result = React.useMemo(() => {
     if (!searchNameDebounce) {
-      return table;
+      return table
     }
     if (!table) {
-      return undefined;
+      return undefined
     }
     return table.filter((row) => {
-      return field(row)
-        .toLowerCase()
-        .includes(searchNameDebounce.toLowerCase());
-    });
-  }, [table, field, searchNameDebounce]);
-  return [result];
+      return field(row).toLowerCase().includes(searchNameDebounce.toLowerCase())
+    })
+  }, [table, field, searchNameDebounce])
+  return [result]
 }
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/top.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/top.ts
index 4af19968d637d6c13bf64caa94f09fff104f6091..87bd3c1b86f763a63dbf195ee5feaf649d56e006 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/top.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/top.ts
@@ -2,53 +2,49 @@
  * Copyright (c) Microsoft Corporation. All rights reserved.
  *--------------------------------------------------------------------------------------------*/

-import debounce from '@material-ui/core/utils/debounce';
-import * as React from 'react';
+import debounce from '@material-ui/core/utils/debounce'
+import * as React from 'react'

 export enum UseTop {
-  NOT_USE = 'NotUse',
-  USE = 'Use',
+  NotUse = 'NotUse',
+  Use = 'Use'
 }

 interface IOptions {
-  defaultTop?: number;
-  defaultUseTop?: UseTop;
-  noDebounce?: boolean;
-  wait?: number;
+  defaultTop?: number
+  defaultUseTop?: UseTop
+  noDebounce?: boolean
+  wait?: number
 }

-export function useTopN(
-  options?: IOptions
-): readonly [
-  string,
-  number | undefined,
-  UseTop,
-  React.Dispatch<React.SetStateAction<string>>,
-  React.Dispatch<React.SetStateAction<UseTop>>
-] {
-  let realOptions = options ?? {};
-
-  const [topText, setTopText] = React.useState(String(realOptions.defaultTop ?? 15));
-  const [actualTop, setActualTop] = React.useState<number | undefined>(Number(topText));
-  const [useTop, setUseTop] = React.useState<UseTop>(realOptions.defaultUseTop ?? UseTop.NOT_USE);
-
-  const setActualDebounce = !realOptions.noDebounce
-    ? React.useCallback(debounce(setActualTop, realOptions.wait ?? 500), [])
-    : setActualTop;
+export function useTopN(options?: IOptions) {
+  options ??= {}
+
+  const [topText, setTopText] = React.useState(String(options.defaultTop ?? 15))
+  const [actualTop, setActualTop] = React.useState<number | undefined>(
+    Number(topText)
+  )
+  const [useTop, setUseTop] = React.useState<UseTop>(
+    options.defaultUseTop ?? UseTop.NotUse
+  )
+
+  const setActualDebounce = !options.noDebounce
+    ? React.useCallback(debounce(setActualTop, options.wait ?? 500), [])
+    : setActualTop

   React.useEffect(() => {
-    if (useTop !== UseTop.USE) {
-      setActualDebounce(undefined);
+    if (useTop !== UseTop.Use) {
+      setActualDebounce(undefined)
     } else if (topIsValid(topText)) {
-      setActualDebounce(Number(topText));
+      setActualDebounce(Number(topText))
     } else {
-      setActualDebounce(actualTop);
+      setActualDebounce(actualTop)
     }
-  }, [topText, useTop]);
+  }, [topText, useTop])

-  return [topText, actualTop, useTop, setTopText, setUseTop] as const;
+  return [topText, actualTop, useTop, setTopText, setUseTop] as const
 }

-export function topIsValid(topText: string): boolean {
-  const top = Number(topText);
-  return !Number.isNaN(top) && top > 0 && Number.isInteger(top);
+export function topIsValid(topText: string) {
+  const top = Number(topText)
+  return !Number.isNaN(top) && top > 0 && Number.isInteger(top)
 }
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/type.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/type.ts
index ccd45fd16e11043abe40a4235a7b39a5d18afcdd..fde74bc598b930f26dd8a83157c91953da2c045c 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/type.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/type.ts
@@ -6,4 +6,4 @@ export type Arguments<T extends (...args: any[]) => void> = T extends (
   ...args: infer A
 ) => void
   ? A
-  : never;
+  : never
diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/vscode.ts b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/vscode.ts
index 2a763adca54ef3eba96837aa111df627e3f8b116..62f1a90809548691f3b7b7a89d71ac65e4bf622b 100644
--- a/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/vscode.ts
+++ b/plugins/tensorboard-plugins/tb_plugin/fe/src/utils/vscode.ts
@@ -2,12 +2,12 @@
  * Copyright (c) Microsoft Corporation. All rights reserved.
*--------------------------------------------------------------------------------------------*/ -export function navToCode(filename: string, line: number): void { +export function navToCode(filename: string, line: number) { window.parent.parent.postMessage( { filename, - line, + line }, - window.origin - ); + '*' + ) } diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/update-static.js b/plugins/tensorboard-plugins/tb_plugin/fe/update-static.js index 67c9be6ccc266ca2470705ad7bb990e550769e96..9923c216781c4cfd3505bdc4cb99a736b1bc61a1 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/update-static.js +++ b/plugins/tensorboard-plugins/tb_plugin/fe/update-static.js @@ -1,7 +1,7 @@ -const fs = require('fs'); -const path = require('path'); +const fs = require('fs') +const path = require('path') fs.copyFileSync( path.resolve(__dirname, 'dist/index.html'), path.resolve(__dirname, '../torch_tb_profiler/static/index.html') -); +) diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/webpack.config.js b/plugins/tensorboard-plugins/tb_plugin/fe/webpack.config.js index a47f8b319e83a9c96c80c11afe5adf09e308fbfa..70541ae9cff81eccfd33a8edd2b2a8424edf5a4b 100644 --- a/plugins/tensorboard-plugins/tb_plugin/fe/webpack.config.js +++ b/plugins/tensorboard-plugins/tb_plugin/fe/webpack.config.js @@ -1,8 +1,8 @@ -const path = require('path'); -const HtmlWebpackPlugin = require('html-webpack-plugin'); -const InlineChunkHtmlPlugin = require('inline-chunk-html-plugin'); +const path = require('path') +const HtmlWebpackPlugin = require('html-webpack-plugin') +const InlineChunkHtmlPlugin = require('inline-chunk-html-plugin') -const isDev = process.env.NODE_ENV !== 'production'; +const isDev = process.env.NODE_ENV !== 'production' /** * @type {import('webpack').Configuration & import('webpack-dev-server').Configuration} @@ -12,25 +12,25 @@ module.exports = { entry: './src/index.tsx', output: { path: path.resolve(__dirname, 'dist'), - filename: 'index.js', + filename: 'index.js' }, resolve: { // Add `.ts` and `.tsx` as a resolvable extension. - extensions: ['.ts', '.tsx', '.js'], + extensions: ['.ts', '.tsx', '.js'] }, module: { rules: [ { test: /\.tsx?$/i, use: 'ts-loader' }, - { test: /\.css$/i, use: ['style-loader', 'css-loader'] }, - ], + { test: /\.css$/i, use: ['style-loader', 'css-loader'] } + ] }, plugins: [ new HtmlWebpackPlugin({ inject: true, scriptLoading: 'blocking', - template: 'index.html', + template: 'index.html' }), - !isDev ? new InlineChunkHtmlPlugin(HtmlWebpackPlugin, [/.*/]) : undefined, + !isDev ? new InlineChunkHtmlPlugin(HtmlWebpackPlugin, [/.*/]) : undefined ].filter(Boolean), - devServer: {}, -}; + devServer: {} +} diff --git a/plugins/tensorboard-plugins/tb_plugin/fe/yarn.lock b/plugins/tensorboard-plugins/tb_plugin/fe/yarn.lock new file mode 100644 index 0000000000000000000000000000000000000000..3e914db864c7654443e9041cfc1899ea2ac30bb1 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_plugin/fe/yarn.lock @@ -0,0 +1,3672 @@ +# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY. 
+# yarn lockfile v1 + + +"@ant-design/colors@^6.0.0": + version "6.0.0" + resolved "https://registry.yarnpkg.com/@ant-design/colors/-/colors-6.0.0.tgz#9b9366257cffcc47db42b9d0203bb592c13c0298" + integrity sha512-qAZRvPzfdWHtfameEGP2Qvuf838NhergR35o+EuVyB5XvSA98xod5r4utvi4TJ3ywmevm290g9nsCG5MryrdWQ== + dependencies: + "@ctrl/tinycolor" "^3.4.0" + +"@ant-design/icons-svg@^4.2.1": + version "4.2.1" + resolved "https://registry.yarnpkg.com/@ant-design/icons-svg/-/icons-svg-4.2.1.tgz#8630da8eb4471a4aabdaed7d1ff6a97dcb2cf05a" + integrity sha512-EB0iwlKDGpG93hW8f85CTJTs4SvMX7tt5ceupvhALp1IF44SeUFOMhKUOYqpsoYWQKAOuTRDMqn75rEaKDp0Xw== + +"@ant-design/icons@^4.7.0": + version "4.7.0" + resolved "https://registry.yarnpkg.com/@ant-design/icons/-/icons-4.7.0.tgz#8c3cbe0a556ba92af5dc7d1e70c0b25b5179af0f" + integrity sha512-aoB4Z7JA431rt6d4u+8xcNPPCrdufSRMUOpxa1ab6mz1JCQZOEVolj2WVs/tDFmN62zzK30mNelEsprLYsSF3g== + dependencies: + "@ant-design/colors" "^6.0.0" + "@ant-design/icons-svg" "^4.2.1" + "@babel/runtime" "^7.11.2" + classnames "^2.2.6" + rc-util "^5.9.4" + +"@ant-design/react-slick@~0.28.1": + version "0.28.4" + resolved "https://registry.yarnpkg.com/@ant-design/react-slick/-/react-slick-0.28.4.tgz#8b296b87ad7c7ae877f2a527b81b7eebd9dd29a9" + integrity sha512-j9eAHTn7GxbXUFNknJoHS2ceAsqrQi2j8XykjZE1IXCD8kJF+t28EvhBLniDpbOsBk/3kjalnhriTfZcjBHNqg== + dependencies: + "@babel/runtime" "^7.10.4" + classnames "^2.2.5" + json2mq "^0.2.0" + lodash "^4.17.21" + resize-observer-polyfill "^1.5.0" + +"@babel/runtime@^7.0.0", "@babel/runtime@^7.10.1", "@babel/runtime@^7.10.2", "@babel/runtime@^7.10.4", "@babel/runtime@^7.11.1", "@babel/runtime@^7.11.2", "@babel/runtime@^7.12.5", "@babel/runtime@^7.13.10", "@babel/runtime@^7.3.1", "@babel/runtime@^7.4.4", "@babel/runtime@^7.5.5", "@babel/runtime@^7.8.3", "@babel/runtime@^7.8.4", "@babel/runtime@^7.8.7": + version "7.17.2" + resolved "https://registry.yarnpkg.com/@babel/runtime/-/runtime-7.17.2.tgz#66f68591605e59da47523c631416b18508779941" + integrity sha512-hzeyJyMA1YGdJTuWU0e/j4wKXrU4OMFvY2MSlaI9B7VQb0r5cxTE3EAIS2Q7Tn2RIcDkRvTA/v2JsAEhxe99uw== + dependencies: + regenerator-runtime "^0.13.4" + +"@ctrl/tinycolor@^3.4.0": + version "3.4.0" + resolved "https://registry.yarnpkg.com/@ctrl/tinycolor/-/tinycolor-3.4.0.tgz#c3c5ae543c897caa9c2a68630bed355be5f9990f" + integrity sha512-JZButFdZ1+/xAfpguQHoabIXkcqRRKpMrWKBkpEZZyxfY9C1DpADFB8PEqGSTeFr135SaTRfKqGKx5xSCLI7ZQ== + +"@discoveryjs/json-ext@^0.5.0": + version "0.5.6" + resolved "https://registry.yarnpkg.com/@discoveryjs/json-ext/-/json-ext-0.5.6.tgz#d5e0706cf8c6acd8c6032f8d54070af261bbbb2f" + integrity sha512-ws57AidsDvREKrZKYffXddNkyaF14iHNHm8VQnZH6t99E8gczjNN0GpvcGny0imC80yQ0tHz1xVUKk/KFQSUyA== + +"@emotion/hash@^0.8.0": + version "0.8.0" + resolved "https://registry.yarnpkg.com/@emotion/hash/-/hash-0.8.0.tgz#bbbff68978fefdbe68ccb533bc8cbe1d1afb5413" + integrity sha512-kBJtf7PH6aWwZ6fka3zQ0p6SBYzx4fl1LoZXE2RrnYST9Xljm7WfKJrU4g/Xr3Beg72MLrp1AWNUmuYJTL7Cow== + +"@material-ui/core@^4.11.3": + version "4.12.3" + resolved "https://registry.yarnpkg.com/@material-ui/core/-/core-4.12.3.tgz#80d665caf0f1f034e52355c5450c0e38b099d3ca" + integrity sha512-sdpgI/PL56QVsEJldwEe4FFaFTLUqN+rd7sSZiRCdx2E/C7z5yK0y/khAWVBH24tXwto7I1hCzNWfJGZIYJKnw== + dependencies: + "@babel/runtime" "^7.4.4" + "@material-ui/styles" "^4.11.4" + "@material-ui/system" "^4.12.1" + "@material-ui/types" "5.1.0" + "@material-ui/utils" "^4.11.2" + "@types/react-transition-group" "^4.2.0" + clsx "^1.0.4" + hoist-non-react-statics "^3.3.2" + popper.js 
"1.16.1-lts" + prop-types "^15.7.2" + react-is "^16.8.0 || ^17.0.0" + react-transition-group "^4.4.0" + +"@material-ui/icons@^4.11.2": + version "4.11.2" + resolved "https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.11.2.tgz#b3a7353266519cd743b6461ae9fdfcb1b25eb4c5" + integrity sha512-fQNsKX2TxBmqIGJCSi3tGTO/gZ+eJgWmMJkgDiOfyNaunNaxcklJQFaFogYcFl0qFuaEz1qaXYXboa/bUXVSOQ== + dependencies: + "@babel/runtime" "^7.4.4" + +"@material-ui/styles@^4.11.4": + version "4.11.4" + resolved "https://registry.yarnpkg.com/@material-ui/styles/-/styles-4.11.4.tgz#eb9dfccfcc2d208243d986457dff025497afa00d" + integrity sha512-KNTIZcnj/zprG5LW0Sao7zw+yG3O35pviHzejMdcSGCdWbiO8qzRgOYL8JAxAsWBKOKYwVZxXtHWaB5T2Kvxew== + dependencies: + "@babel/runtime" "^7.4.4" + "@emotion/hash" "^0.8.0" + "@material-ui/types" "5.1.0" + "@material-ui/utils" "^4.11.2" + clsx "^1.0.4" + csstype "^2.5.2" + hoist-non-react-statics "^3.3.2" + jss "^10.5.1" + jss-plugin-camel-case "^10.5.1" + jss-plugin-default-unit "^10.5.1" + jss-plugin-global "^10.5.1" + jss-plugin-nested "^10.5.1" + jss-plugin-props-sort "^10.5.1" + jss-plugin-rule-value-function "^10.5.1" + jss-plugin-vendor-prefixer "^10.5.1" + prop-types "^15.7.2" + +"@material-ui/system@^4.12.1": + version "4.12.1" + resolved "https://registry.yarnpkg.com/@material-ui/system/-/system-4.12.1.tgz#2dd96c243f8c0a331b2bb6d46efd7771a399707c" + integrity sha512-lUdzs4q9kEXZGhbN7BptyiS1rLNHe6kG9o8Y307HCvF4sQxbCgpL2qi+gUk+yI8a2DNk48gISEQxoxpgph0xIw== + dependencies: + "@babel/runtime" "^7.4.4" + "@material-ui/utils" "^4.11.2" + csstype "^2.5.2" + prop-types "^15.7.2" + +"@material-ui/types@5.1.0": + version "5.1.0" + resolved "https://registry.yarnpkg.com/@material-ui/types/-/types-5.1.0.tgz#efa1c7a0b0eaa4c7c87ac0390445f0f88b0d88f2" + integrity sha512-7cqRjrY50b8QzRSYyhSpx4WRw2YuO0KKIGQEVk5J8uoz2BanawykgZGoWEqKm7pVIbzFDN0SpPcVV4IhOFkl8A== + +"@material-ui/utils@^4.11.2": + version "4.11.2" + resolved "https://registry.yarnpkg.com/@material-ui/utils/-/utils-4.11.2.tgz#f1aefa7e7dff2ebcb97d31de51aecab1bb57540a" + integrity sha512-Uul8w38u+PICe2Fg2pDKCaIG7kOyhowZ9vjiC1FsVwPABTW8vPPKfF6OvxRq3IiBaI1faOJmgdvMG7rMJARBhA== + dependencies: + "@babel/runtime" "^7.4.4" + prop-types "^15.7.2" + react-is "^16.8.0 || ^17.0.0" + +"@nodelib/fs.scandir@2.1.5": + version "2.1.5" + resolved "https://registry.yarnpkg.com/@nodelib/fs.scandir/-/fs.scandir-2.1.5.tgz#7619c2eb21b25483f6d167548b4cfd5a7488c3d5" + integrity sha512-vq24Bq3ym5HEQm2NKCr3yXDwjc7vTsEThRDnkp2DK9p1uqLR+DHurm/NOTo0KG7HYHU7eppKZj3MyqYuMBf62g== + dependencies: + "@nodelib/fs.stat" "2.0.5" + run-parallel "^1.1.9" + +"@nodelib/fs.stat@2.0.5", "@nodelib/fs.stat@^2.0.2": + version "2.0.5" + resolved "https://registry.yarnpkg.com/@nodelib/fs.stat/-/fs.stat-2.0.5.tgz#5bd262af94e9d25bd1e71b05deed44876a222e8b" + integrity sha512-RkhPPp2zrqDAQA/2jNhnztcPAlv64XdhIp7a7454A5ovI7Bukxgt7MX7udwAu3zg1DcpPU0rz3VV1SeaqvY4+A== + +"@nodelib/fs.walk@^1.2.3": + version "1.2.8" + resolved "https://registry.yarnpkg.com/@nodelib/fs.walk/-/fs.walk-1.2.8.tgz#e95737e8bb6746ddedf69c556953494f196fe69a" + integrity sha512-oGB+UxlgWcgQkgwo8GcEGwemoTFt3FIO9ababBmaGwXIoBKZ+GTy0pP185beGg7Llih/NSHSV2XAs1lnznocSg== + dependencies: + "@nodelib/fs.scandir" "2.1.5" + fastq "^1.6.0" + +"@types/body-parser@*": + version "1.19.2" + resolved "https://registry.yarnpkg.com/@types/body-parser/-/body-parser-1.19.2.tgz#aea2059e28b7658639081347ac4fab3de166e6f0" + integrity sha512-ALYone6pm6QmwZoAgeyNksccT9Q4AWZQ6PvfwR37GT6r6FWUPguq6sUmNGSMV2Wr761oQoBxwGGa6DR5o1DC9g== + dependencies: 
+ "@types/connect" "*" + "@types/node" "*" + +"@types/bonjour@^3.5.9": + version "3.5.10" + resolved "https://registry.yarnpkg.com/@types/bonjour/-/bonjour-3.5.10.tgz#0f6aadfe00ea414edc86f5d106357cda9701e275" + integrity sha512-p7ienRMiS41Nu2/igbJxxLDWrSZ0WxM8UQgCeO9KhoVF7cOVFkrKsiDr1EsJIla8vV3oEEjGcz11jc5yimhzZw== + dependencies: + "@types/node" "*" + +"@types/connect-history-api-fallback@^1.3.5": + version "1.3.5" + resolved "https://registry.yarnpkg.com/@types/connect-history-api-fallback/-/connect-history-api-fallback-1.3.5.tgz#d1f7a8a09d0ed5a57aee5ae9c18ab9b803205dae" + integrity sha512-h8QJa8xSb1WD4fpKBDcATDNGXghFj6/3GRWG6dhmRcu0RX1Ubasur2Uvx5aeEwlf0MwblEC2bMzzMQntxnw/Cw== + dependencies: + "@types/express-serve-static-core" "*" + "@types/node" "*" + +"@types/connect@*": + version "3.4.35" + resolved "https://registry.yarnpkg.com/@types/connect/-/connect-3.4.35.tgz#5fcf6ae445e4021d1fc2219a4873cc73a3bb2ad1" + integrity sha512-cdeYyv4KWoEgpBISTxWvqYsVy444DOqehiF3fM3ne10AmJ62RSyNkUnxMJXHQWRQQX2eR94m5y1IZyDwBjV9FQ== + dependencies: + "@types/node" "*" + +"@types/eslint-scope@^3.7.3": + version "3.7.3" + resolved "https://registry.yarnpkg.com/@types/eslint-scope/-/eslint-scope-3.7.3.tgz#125b88504b61e3c8bc6f870882003253005c3224" + integrity sha512-PB3ldyrcnAicT35TWPs5IcwKD8S333HMaa2VVv4+wdvebJkjWuW/xESoB8IwRcog8HYVYamb1g/R31Qv5Bx03g== + dependencies: + "@types/eslint" "*" + "@types/estree" "*" + +"@types/eslint@*": + version "8.4.1" + resolved "https://registry.yarnpkg.com/@types/eslint/-/eslint-8.4.1.tgz#c48251553e8759db9e656de3efc846954ac32304" + integrity sha512-GE44+DNEyxxh2Kc6ro/VkIj+9ma0pO0bwv9+uHSyBrikYOHr8zYcdPvnBOp1aw8s+CjRvuSx7CyWqRrNFQ59mA== + dependencies: + "@types/estree" "*" + "@types/json-schema" "*" + +"@types/estree@*", "@types/estree@^0.0.51": + version "0.0.51" + resolved "https://registry.yarnpkg.com/@types/estree/-/estree-0.0.51.tgz#cfd70924a25a3fd32b218e5e420e6897e1ac4f40" + integrity sha512-CuPgU6f3eT/XgKKPqKd/gLZV1Xmvf1a2R5POBOGQa6uv82xpls89HU5zKeVoyR8XzHd1RGNOlQlvUe3CFkjWNQ== + +"@types/express-serve-static-core@*", "@types/express-serve-static-core@^4.17.18": + version "4.17.28" + resolved "https://registry.yarnpkg.com/@types/express-serve-static-core/-/express-serve-static-core-4.17.28.tgz#c47def9f34ec81dc6328d0b1b5303d1ec98d86b8" + integrity sha512-P1BJAEAW3E2DJUlkgq4tOL3RyMunoWXqbSCygWo5ZIWTjUgN1YnaXWW4VWl/oc8vs/XoYibEGBKP0uZyF4AHig== + dependencies: + "@types/node" "*" + "@types/qs" "*" + "@types/range-parser" "*" + +"@types/express@*", "@types/express@^4.17.13": + version "4.17.13" + resolved "https://registry.yarnpkg.com/@types/express/-/express-4.17.13.tgz#a76e2995728999bab51a33fabce1d705a3709034" + integrity sha512-6bSZTPaTIACxn48l50SR+axgrqm6qXFIxrdAKaG6PaJk3+zuUr35hBlgT7vOmJcum+OEaIBLtHV/qloEAFITeA== + dependencies: + "@types/body-parser" "*" + "@types/express-serve-static-core" "^4.17.18" + "@types/qs" "*" + "@types/serve-static" "*" + +"@types/html-minifier-terser@^6.0.0": + version "6.1.0" + resolved "https://registry.yarnpkg.com/@types/html-minifier-terser/-/html-minifier-terser-6.1.0.tgz#4fc33a00c1d0c16987b1a20cf92d20614c55ac35" + integrity sha512-oh/6byDPnL1zeNXFrDXFLyZjkr1MsBG667IM792caf1L2UPOOMf65NFzjUH/ltyfwjAGfs1rsX1eftK0jC/KIg== + +"@types/http-proxy@^1.17.8": + version "1.17.8" + resolved "https://registry.yarnpkg.com/@types/http-proxy/-/http-proxy-1.17.8.tgz#968c66903e7e42b483608030ee85800f22d03f55" + integrity sha512-5kPLG5BKpWYkw/LVOGWpiq3nEVqxiN32rTgI53Sk12/xHFQ2rG3ehI9IO+O3W2QoKeyB92dJkoka8SUm6BX1pA== + dependencies: + "@types/node" "*" 
+ +"@types/json-schema@*", "@types/json-schema@^7.0.8", "@types/json-schema@^7.0.9": + version "7.0.9" + resolved "https://registry.yarnpkg.com/@types/json-schema/-/json-schema-7.0.9.tgz#97edc9037ea0c38585320b28964dde3b39e4660d" + integrity sha512-qcUXuemtEu+E5wZSJHNxUXeCZhAfXKQ41D+duX+VYPde7xyEVZci+/oXKJL13tnRs9lR2pr4fod59GT6/X1/yQ== + +"@types/mime@^1": + version "1.3.2" + resolved "https://registry.yarnpkg.com/@types/mime/-/mime-1.3.2.tgz#93e25bf9ee75fe0fd80b594bc4feb0e862111b5a" + integrity sha512-YATxVxgRqNH6nHEIsvg6k2Boc1JHI9ZbH5iWFFv/MTkchz3b1ieGDa5T0a9RznNdI0KhVbdbWSN+KWWrQZRxTw== + +"@types/node@*": + version "17.0.21" + resolved "https://registry.yarnpkg.com/@types/node/-/node-17.0.21.tgz#864b987c0c68d07b4345845c3e63b75edd143644" + integrity sha512-DBZCJbhII3r90XbQxI8Y9IjjiiOGlZ0Hr32omXIZvwwZ7p4DMMXGrKXVyPfuoBOri9XNtL0UK69jYIBIsRX3QQ== + +"@types/prop-types@*": + version "15.7.4" + resolved "https://registry.yarnpkg.com/@types/prop-types/-/prop-types-15.7.4.tgz#fcf7205c25dff795ee79af1e30da2c9790808f11" + integrity sha512-rZ5drC/jWjrArrS8BR6SIr4cWpW09RNTYt9AMZo3Jwwif+iacXAqgVjm0B0Bv/S1jhDXKHqRVNCbACkJ89RAnQ== + +"@types/qs@*": + version "6.9.7" + resolved "https://registry.yarnpkg.com/@types/qs/-/qs-6.9.7.tgz#63bb7d067db107cc1e457c303bc25d511febf6cb" + integrity sha512-FGa1F62FT09qcrueBA6qYTrJPVDzah9a+493+o2PCXsesWHIn27G98TsSMs3WPNbZIEj4+VJf6saSFpvD+3Zsw== + +"@types/range-parser@*": + version "1.2.4" + resolved "https://registry.yarnpkg.com/@types/range-parser/-/range-parser-1.2.4.tgz#cd667bcfdd025213aafb7ca5915a932590acdcdc" + integrity sha512-EEhsLsD6UsDM1yFhAvy0Cjr6VwmpMWqFBCb9w07wVugF7w9nfajxLuVmngTIpgS6svCnm6Vaw+MZhoDCKnOfsw== + +"@types/react-dom@^16.9.8": + version "16.9.14" + resolved "https://registry.yarnpkg.com/@types/react-dom/-/react-dom-16.9.14.tgz#674b8f116645fe5266b40b525777fc6bb8eb3bcd" + integrity sha512-FIX2AVmPTGP30OUJ+0vadeIFJJ07Mh1m+U0rxfgyW34p3rTlXI+nlenvAxNn4BP36YyI9IJ/+UJ7Wu22N1pI7A== + dependencies: + "@types/react" "^16" + +"@types/react-transition-group@^4.2.0": + version "4.4.4" + resolved "https://registry.yarnpkg.com/@types/react-transition-group/-/react-transition-group-4.4.4.tgz#acd4cceaa2be6b757db61ed7b432e103242d163e" + integrity sha512-7gAPz7anVK5xzbeQW9wFBDg7G++aPLAFY0QaSMOou9rJZpbuI58WAuJrgu+qR92l61grlnCUe7AFX8KGahAgug== + dependencies: + "@types/react" "*" + +"@types/react@*": + version "17.0.39" + resolved "https://registry.yarnpkg.com/@types/react/-/react-17.0.39.tgz#d0f4cde092502a6db00a1cded6e6bf2abb7633ce" + integrity sha512-UVavlfAxDd/AgAacMa60Azl7ygyQNRwC/DsHZmKgNvPmRR5p70AJ5Q9EAmL2NWOJmeV+vVUI4IAP7GZrN8h8Ug== + dependencies: + "@types/prop-types" "*" + "@types/scheduler" "*" + csstype "^3.0.2" + +"@types/react@^16", "@types/react@^16.9.51": + version "16.14.23" + resolved "https://registry.yarnpkg.com/@types/react/-/react-16.14.23.tgz#37201b9f2324c5ff8fa4600dbf19079dfdffc880" + integrity sha512-WngBZLuSkP4IAgPi0HOsGCHo6dn3CcuLQnCfC17VbA7YBgipZiZoTOhObwl/93DsFW0Y2a/ZXeonpW4DxirEJg== + dependencies: + "@types/prop-types" "*" + "@types/scheduler" "*" + csstype "^3.0.2" + +"@types/retry@^0.12.0": + version "0.12.1" + resolved "https://registry.yarnpkg.com/@types/retry/-/retry-0.12.1.tgz#d8f1c0d0dc23afad6dc16a9e993a0865774b4065" + integrity sha512-xoDlM2S4ortawSWORYqsdU+2rxdh4LRW9ytc3zmT37RIKQh6IHyKwwtKhKis9ah8ol07DCkZxPt8BBvPjC6v4g== + +"@types/scheduler@*": + version "0.16.2" + resolved "https://registry.yarnpkg.com/@types/scheduler/-/scheduler-0.16.2.tgz#1a62f89525723dde24ba1b01b092bf5df8ad4d39" + integrity 
sha512-hppQEBDmlwhFAXKJX2KnWLYu5yMfi91yazPb2l+lbJiwW+wdo1gNeRA+3RgNSO39WYX2euey41KEwnqesU2Jew== + +"@types/serve-index@^1.9.1": + version "1.9.1" + resolved "https://registry.yarnpkg.com/@types/serve-index/-/serve-index-1.9.1.tgz#1b5e85370a192c01ec6cec4735cf2917337a6278" + integrity sha512-d/Hs3nWDxNL2xAczmOVZNj92YZCS6RGxfBPjKzuu/XirCgXdpKEb88dYNbrYGint6IVWLNP+yonwVAuRC0T2Dg== + dependencies: + "@types/express" "*" + +"@types/serve-static@*": + version "1.13.10" + resolved "https://registry.yarnpkg.com/@types/serve-static/-/serve-static-1.13.10.tgz#f5e0ce8797d2d7cc5ebeda48a52c96c4fa47a8d9" + integrity sha512-nCkHGI4w7ZgAdNkrEu0bv+4xNV/XDqW+DydknebMOQwkpDGx8G+HTlj7R7ABI8i8nKxVw0wtKPi1D+lPOkh4YQ== + dependencies: + "@types/mime" "^1" + "@types/node" "*" + +"@types/sockjs@^0.3.33": + version "0.3.33" + resolved "https://registry.yarnpkg.com/@types/sockjs/-/sockjs-0.3.33.tgz#570d3a0b99ac995360e3136fd6045113b1bd236f" + integrity sha512-f0KEEe05NvUnat+boPTZ0dgaLZ4SfSouXUgv5noUiefG2ajgKjmETo9ZJyuqsl7dfl2aHlLJUiki6B4ZYldiiw== + dependencies: + "@types/node" "*" + +"@types/ws@^8.2.2": + version "8.5.2" + resolved "https://registry.yarnpkg.com/@types/ws/-/ws-8.5.2.tgz#77e0c2e360e9579da930ffcfa53c5975ea3bdd26" + integrity sha512-VXI82ykONr5tacHEojnErTQk+KQSoYbW1NB6iz6wUwrNd+BqfkfggQNoNdCqhJSzbNumShPERbM+Pc5zpfhlbw== + dependencies: + "@types/node" "*" + +"@webassemblyjs/ast@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/ast/-/ast-1.11.1.tgz#2bfd767eae1a6996f432ff7e8d7fc75679c0b6a7" + integrity sha512-ukBh14qFLjxTQNTXocdyksN5QdM28S1CxHt2rdskFyL+xFV7VremuBLVbmCePj+URalXBENx/9Lm7lnhihtCSw== + dependencies: + "@webassemblyjs/helper-numbers" "1.11.1" + "@webassemblyjs/helper-wasm-bytecode" "1.11.1" + +"@webassemblyjs/floating-point-hex-parser@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/floating-point-hex-parser/-/floating-point-hex-parser-1.11.1.tgz#f6c61a705f0fd7a6aecaa4e8198f23d9dc179e4f" + integrity sha512-iGRfyc5Bq+NnNuX8b5hwBrRjzf0ocrJPI6GWFodBFzmFnyvrQ83SHKhmilCU/8Jv67i4GJZBMhEzltxzcNagtQ== + +"@webassemblyjs/helper-api-error@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-api-error/-/helper-api-error-1.11.1.tgz#1a63192d8788e5c012800ba6a7a46c705288fd16" + integrity sha512-RlhS8CBCXfRUR/cwo2ho9bkheSXG0+NwooXcc3PAILALf2QLdFyj7KGsKRbVc95hZnhnERon4kW/D3SZpp6Tcg== + +"@webassemblyjs/helper-buffer@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-buffer/-/helper-buffer-1.11.1.tgz#832a900eb444884cde9a7cad467f81500f5e5ab5" + integrity sha512-gwikF65aDNeeXa8JxXa2BAk+REjSyhrNC9ZwdT0f8jc4dQQeDQ7G4m0f2QCLPJiMTTO6wfDmRmj/pW0PsUvIcA== + +"@webassemblyjs/helper-numbers@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-numbers/-/helper-numbers-1.11.1.tgz#64d81da219fbbba1e3bd1bfc74f6e8c4e10a62ae" + integrity sha512-vDkbxiB8zfnPdNK9Rajcey5C0w+QJugEglN0of+kmO8l7lDb77AnlKYQF7aarZuCrv+l0UvqL+68gSDr3k9LPQ== + dependencies: + "@webassemblyjs/floating-point-hex-parser" "1.11.1" + "@webassemblyjs/helper-api-error" "1.11.1" + "@xtuc/long" "4.2.2" + +"@webassemblyjs/helper-wasm-bytecode@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-wasm-bytecode/-/helper-wasm-bytecode-1.11.1.tgz#f328241e41e7b199d0b20c18e88429c4433295e1" + integrity sha512-PvpoOGiJwXeTrSf/qfudJhwlvDQxFgelbMqtq52WWiXC6Xgg1IREdngmPN3bs4RoO83PnL/nFrxucXj1+BX62Q== + +"@webassemblyjs/helper-wasm-section@1.11.1": + version 
"1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-wasm-section/-/helper-wasm-section-1.11.1.tgz#21ee065a7b635f319e738f0dd73bfbda281c097a" + integrity sha512-10P9No29rYX1j7F3EVPX3JvGPQPae+AomuSTPiF9eBQeChHI6iqjMIwR9JmOJXwpnn/oVGDk7I5IlskuMwU/pg== + dependencies: + "@webassemblyjs/ast" "1.11.1" + "@webassemblyjs/helper-buffer" "1.11.1" + "@webassemblyjs/helper-wasm-bytecode" "1.11.1" + "@webassemblyjs/wasm-gen" "1.11.1" + +"@webassemblyjs/ieee754@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/ieee754/-/ieee754-1.11.1.tgz#963929e9bbd05709e7e12243a099180812992614" + integrity sha512-hJ87QIPtAMKbFq6CGTkZYJivEwZDbQUgYd3qKSadTNOhVY7p+gfP6Sr0lLRVTaG1JjFj+r3YchoqRYxNH3M0GQ== + dependencies: + "@xtuc/ieee754" "^1.2.0" + +"@webassemblyjs/leb128@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/leb128/-/leb128-1.11.1.tgz#ce814b45574e93d76bae1fb2644ab9cdd9527aa5" + integrity sha512-BJ2P0hNZ0u+Th1YZXJpzW6miwqQUGcIHT1G/sf72gLVD9DZ5AdYTqPNbHZh6K1M5VmKvFXwGSWZADz+qBWxeRw== + dependencies: + "@xtuc/long" "4.2.2" + +"@webassemblyjs/utf8@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/utf8/-/utf8-1.11.1.tgz#d1f8b764369e7c6e6bae350e854dec9a59f0a3ff" + integrity sha512-9kqcxAEdMhiwQkHpkNiorZzqpGrodQQ2IGrHHxCy+Ozng0ofyMA0lTqiLkVs1uzTRejX+/O0EOT7KxqVPuXosQ== + +"@webassemblyjs/wasm-edit@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-edit/-/wasm-edit-1.11.1.tgz#ad206ebf4bf95a058ce9880a8c092c5dec8193d6" + integrity sha512-g+RsupUC1aTHfR8CDgnsVRVZFJqdkFHpsHMfJuWQzWU3tvnLC07UqHICfP+4XyL2tnr1amvl1Sdp06TnYCmVkA== + dependencies: + "@webassemblyjs/ast" "1.11.1" + "@webassemblyjs/helper-buffer" "1.11.1" + "@webassemblyjs/helper-wasm-bytecode" "1.11.1" + "@webassemblyjs/helper-wasm-section" "1.11.1" + "@webassemblyjs/wasm-gen" "1.11.1" + "@webassemblyjs/wasm-opt" "1.11.1" + "@webassemblyjs/wasm-parser" "1.11.1" + "@webassemblyjs/wast-printer" "1.11.1" + +"@webassemblyjs/wasm-gen@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-gen/-/wasm-gen-1.11.1.tgz#86c5ea304849759b7d88c47a32f4f039ae3c8f76" + integrity sha512-F7QqKXwwNlMmsulj6+O7r4mmtAlCWfO/0HdgOxSklZfQcDu0TpLiD1mRt/zF25Bk59FIjEuGAIyn5ei4yMfLhA== + dependencies: + "@webassemblyjs/ast" "1.11.1" + "@webassemblyjs/helper-wasm-bytecode" "1.11.1" + "@webassemblyjs/ieee754" "1.11.1" + "@webassemblyjs/leb128" "1.11.1" + "@webassemblyjs/utf8" "1.11.1" + +"@webassemblyjs/wasm-opt@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-opt/-/wasm-opt-1.11.1.tgz#657b4c2202f4cf3b345f8a4c6461c8c2418985f2" + integrity sha512-VqnkNqnZlU5EB64pp1l7hdm3hmQw7Vgqa0KF/KCNO9sIpI6Fk6brDEiX+iCOYrvMuBWDws0NkTOxYEb85XQHHw== + dependencies: + "@webassemblyjs/ast" "1.11.1" + "@webassemblyjs/helper-buffer" "1.11.1" + "@webassemblyjs/wasm-gen" "1.11.1" + "@webassemblyjs/wasm-parser" "1.11.1" + +"@webassemblyjs/wasm-parser@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-parser/-/wasm-parser-1.11.1.tgz#86ca734534f417e9bd3c67c7a1c75d8be41fb199" + integrity sha512-rrBujw+dJu32gYB7/Lup6UhdkPx9S9SnobZzRVL7VcBH9Bt9bCBLEuX/YXOOtBsOZ4NQrRykKhffRWHvigQvOA== + dependencies: + "@webassemblyjs/ast" "1.11.1" + "@webassemblyjs/helper-api-error" "1.11.1" + "@webassemblyjs/helper-wasm-bytecode" "1.11.1" + "@webassemblyjs/ieee754" "1.11.1" + "@webassemblyjs/leb128" "1.11.1" + "@webassemblyjs/utf8" "1.11.1" + 
+"@webassemblyjs/wast-printer@1.11.1": + version "1.11.1" + resolved "https://registry.yarnpkg.com/@webassemblyjs/wast-printer/-/wast-printer-1.11.1.tgz#d0c73beda8eec5426f10ae8ef55cee5e7084c2f0" + integrity sha512-IQboUWM4eKzWW+N/jij2sRatKMh99QEelo3Eb2q0qXkvPRISAj8Qxtmw5itwqK+TTkBuUIE45AxYPToqPtL5gg== + dependencies: + "@webassemblyjs/ast" "1.11.1" + "@xtuc/long" "4.2.2" + +"@webpack-cli/configtest@^1.1.1": + version "1.1.1" + resolved "https://registry.yarnpkg.com/@webpack-cli/configtest/-/configtest-1.1.1.tgz#9f53b1b7946a6efc2a749095a4f450e2932e8356" + integrity sha512-1FBc1f9G4P/AxMqIgfZgeOTuRnwZMten8E7zap5zgpPInnCrP8D4Q81+4CWIch8i/Nf7nXjP0v6CjjbHOrXhKg== + +"@webpack-cli/info@^1.4.1": + version "1.4.1" + resolved "https://registry.yarnpkg.com/@webpack-cli/info/-/info-1.4.1.tgz#2360ea1710cbbb97ff156a3f0f24556e0fc1ebea" + integrity sha512-PKVGmazEq3oAo46Q63tpMr4HipI3OPfP7LiNOEJg963RMgT0rqheag28NCML0o3GIzA3DmxP1ZIAv9oTX1CUIA== + dependencies: + envinfo "^7.7.3" + +"@webpack-cli/serve@^1.6.1": + version "1.6.1" + resolved "https://registry.yarnpkg.com/@webpack-cli/serve/-/serve-1.6.1.tgz#0de2875ac31b46b6c5bb1ae0a7d7f0ba5678dffe" + integrity sha512-gNGTiTrjEVQ0OcVnzsRSqTxaBSr+dmTfm+qJsCDluky8uhdLWep7Gcr62QsAKHTMxjCS/8nEITsmFAhfIx+QSw== + +"@xtuc/ieee754@^1.2.0": + version "1.2.0" + resolved "https://registry.yarnpkg.com/@xtuc/ieee754/-/ieee754-1.2.0.tgz#eef014a3145ae477a1cbc00cd1e552336dceb790" + integrity sha512-DX8nKgqcGwsc0eJSqYt5lwP4DH5FlHnmuWWBRy7X0NcaGR0ZtuyeESgMwTYVEtxmsNGY+qit4QYT/MIYTOTPeA== + +"@xtuc/long@4.2.2": + version "4.2.2" + resolved "https://registry.yarnpkg.com/@xtuc/long/-/long-4.2.2.tgz#d291c6a4e97989b5c61d9acf396ae4fe133a718d" + integrity sha512-NuHqBY1PB/D8xU6s/thBgOAiAP7HOYDQ32+BFZILJ8ivkUkAHQnWfn6WhL79Owj1qmUnoN/YPhktdIoucipkAQ== + +accepts@~1.3.4, accepts@~1.3.5, accepts@~1.3.8: + version "1.3.8" + resolved "https://registry.yarnpkg.com/accepts/-/accepts-1.3.8.tgz#0bf0be125b67014adcb0b0921e62db7bffe16b2e" + integrity sha512-PYAthTa2m2VKxuvSD3DPC/Gy+U+sOA1LAuT8mkmRuvw+NACSaeXEQ+NHcVF7rONl6qcaxV3Uuemwawk+7+SJLw== + dependencies: + mime-types "~2.1.34" + negotiator "0.6.3" + +acorn-import-assertions@^1.7.6: + version "1.8.0" + resolved "https://registry.yarnpkg.com/acorn-import-assertions/-/acorn-import-assertions-1.8.0.tgz#ba2b5939ce62c238db6d93d81c9b111b29b855e9" + integrity sha512-m7VZ3jwz4eK6A4Vtt8Ew1/mNbP24u0FhdyfA7fSvnJR6LMdfOYnmuIrrJAgrYfYJ10F/otaHTtrtrtmHdMNzEw== + +acorn@^8.4.1, acorn@^8.5.0: + version "8.7.0" + resolved "https://registry.yarnpkg.com/acorn/-/acorn-8.7.0.tgz#90951fde0f8f09df93549481e5fc141445b791cf" + integrity sha512-V/LGr1APy+PXIwKebEWrkZPwoeoF+w1jiOBUmuxuiUIaOHtob8Qc9BTrYo7VuI5fR8tqsy+buA2WFooR5olqvQ== + +aggregate-error@^3.0.0: + version "3.1.0" + resolved "https://registry.yarnpkg.com/aggregate-error/-/aggregate-error-3.1.0.tgz#92670ff50f5359bdb7a3e0d40d0ec30c5737687a" + integrity sha512-4I7Td01quW/RpocfNayFdFVk1qSuoh0E7JrbRJ16nH01HhKFQ88INq9Sd+nd72zqRySlr9BmDA8xlEJ6vJMrYA== + dependencies: + clean-stack "^2.0.0" + indent-string "^4.0.0" + +ajv-formats@^2.1.1: + version "2.1.1" + resolved "https://registry.yarnpkg.com/ajv-formats/-/ajv-formats-2.1.1.tgz#6e669400659eb74973bbf2e33327180a0996b520" + integrity sha512-Wx0Kx52hxE7C18hkMEggYlEifqWZtYaRgouJor+WMdPnQyEK13vgEWyVNup7SoeeoLMsr4kf5h6dOW11I15MUA== + dependencies: + ajv "^8.0.0" + +ajv-keywords@^3.5.2: + version "3.5.2" + resolved "https://registry.yarnpkg.com/ajv-keywords/-/ajv-keywords-3.5.2.tgz#31f29da5ab6e00d1c2d329acf7b5929614d5014d" + integrity 
sha512-5p6WTN0DdTGVQk6VjcEju19IgaHudalcfabD7yhDGeA6bcQnmL+CpveLJq/3hvfwd1aof6L386Ougkx6RfyMIQ== + +ajv-keywords@^5.0.0: + version "5.1.0" + resolved "https://registry.yarnpkg.com/ajv-keywords/-/ajv-keywords-5.1.0.tgz#69d4d385a4733cdbeab44964a1170a88f87f0e16" + integrity sha512-YCS/JNFAUyr5vAuhk1DWm1CBxRHW9LbJ2ozWeemrIqpbsqKjHVxYPyi5GC0rjZIT5JxJ3virVTS8wk4i/Z+krw== + dependencies: + fast-deep-equal "^3.1.3" + +ajv@^6.12.5: + version "6.12.6" + resolved "https://registry.yarnpkg.com/ajv/-/ajv-6.12.6.tgz#baf5a62e802b07d977034586f8c3baf5adf26df4" + integrity sha512-j3fVLgvTo527anyYyJOGTYJbG+vnnQYvE0m5mmkc1TK+nxAppkCLMIL0aZ4dblVCNoGShhm+kzE4ZUykBoMg4g== + dependencies: + fast-deep-equal "^3.1.1" + fast-json-stable-stringify "^2.0.0" + json-schema-traverse "^0.4.1" + uri-js "^4.2.2" + +ajv@^8.0.0, ajv@^8.8.0: + version "8.10.0" + resolved "https://registry.yarnpkg.com/ajv/-/ajv-8.10.0.tgz#e573f719bd3af069017e3b66538ab968d040e54d" + integrity sha512-bzqAEZOjkrUMl2afH8dknrq5KEk2SrwdBROR+vH1EKVQTqaUbJVPdc/gEdggTMM0Se+s+Ja4ju4TlNcStKl2Hw== + dependencies: + fast-deep-equal "^3.1.1" + json-schema-traverse "^1.0.0" + require-from-string "^2.0.2" + uri-js "^4.2.2" + +ansi-html-community@^0.0.8: + version "0.0.8" + resolved "https://registry.yarnpkg.com/ansi-html-community/-/ansi-html-community-0.0.8.tgz#69fbc4d6ccbe383f9736934ae34c3f8290f1bf41" + integrity sha512-1APHAyr3+PCamwNw3bXCPp4HFLONZt/yIH0sZp0/469KWNTEy+qN5jQ3GVX6DMZ1UXAi34yVwtTeaG/HpBuuzw== + +ansi-regex@^5.0.1: + version "5.0.1" + resolved "https://registry.yarnpkg.com/ansi-regex/-/ansi-regex-5.0.1.tgz#082cb2c89c9fe8659a311a53bd6a4dc5301db304" + integrity sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ== + +ansi-regex@^6.0.1: + version "6.0.1" + resolved "https://registry.yarnpkg.com/ansi-regex/-/ansi-regex-6.0.1.tgz#3183e38fae9a65d7cb5e53945cd5897d0260a06a" + integrity sha512-n5M855fKb2SsfMIiFFoVrABHJC8QtHwVx+mHWP3QcEqBHYienj5dHSgjbxtC0WEZXYt4wcD6zrQElDPhFuZgfA== + +ansi-styles@^4.1.0: + version "4.3.0" + resolved "https://registry.yarnpkg.com/ansi-styles/-/ansi-styles-4.3.0.tgz#edd803628ae71c04c85ae7a0906edad34b648937" + integrity sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg== + dependencies: + color-convert "^2.0.1" + +antd@^4.17.0: + version "4.19.0" + resolved "https://registry.yarnpkg.com/antd/-/antd-4.19.0.tgz#1c637a4d7dde091a2299260ca89f05c29fb21f80" + integrity sha512-4Kp47+zg3j1g1lWmzFstGrmlGdHzUIvxAVXxYJKJqX+iQs++QYgcK2HF+9PBpwEwP6H6VPZCsL0LqKEflke5qg== + dependencies: + "@ant-design/colors" "^6.0.0" + "@ant-design/icons" "^4.7.0" + "@ant-design/react-slick" "~0.28.1" + "@babel/runtime" "^7.12.5" + "@ctrl/tinycolor" "^3.4.0" + classnames "^2.2.6" + copy-to-clipboard "^3.2.0" + lodash "^4.17.21" + memoize-one "^6.0.0" + moment "^2.25.3" + rc-cascader "~3.2.1" + rc-checkbox "~2.3.0" + rc-collapse "~3.1.0" + rc-dialog "~8.6.0" + rc-drawer "~4.4.2" + rc-dropdown "~3.3.2" + rc-field-form "~1.23.0" + rc-image "~5.2.5" + rc-input "^0.0.1-alpha.5" + rc-input-number "~7.3.0" + rc-mentions "~1.6.1" + rc-menu "~9.2.1" + rc-motion "^2.4.4" + rc-notification "~4.5.7" + rc-pagination "~3.1.9" + rc-picker "~2.6.4" + rc-progress "~3.2.1" + rc-rate "~2.9.0" + rc-resize-observer "^1.2.0" + rc-select "~14.0.0-alpha.15" + rc-slider "~10.0.0-alpha.4" + rc-steps "~4.1.0" + rc-switch "~3.2.0" + rc-table "~7.23.0" + rc-tabs "~11.10.0" + rc-textarea "~0.3.0" + rc-tooltip "~5.1.1" + rc-tree "~5.4.3" + rc-tree-select "~5.1.1" + rc-trigger "^5.2.10" + rc-upload 
"~4.3.0" + rc-util "^5.14.0" + scroll-into-view-if-needed "^2.2.25" + +anymatch@~3.1.2: + version "3.1.2" + resolved "https://registry.yarnpkg.com/anymatch/-/anymatch-3.1.2.tgz#c0557c096af32f106198f4f4e2a383537e378716" + integrity sha512-P43ePfOAIupkguHUycrc4qJ9kz8ZiuOUijaETwX7THt0Y/GNK7v0aa8rY816xWjZ7rJdA5XdMcpVFTKMq+RvWg== + dependencies: + normalize-path "^3.0.0" + picomatch "^2.0.4" + +array-flatten@1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/array-flatten/-/array-flatten-1.1.1.tgz#9a5f699051b1e7073328f2a008968b64ea2955d2" + integrity sha1-ml9pkFGx5wczKPKgCJaLZOopVdI= + +array-flatten@^2.1.0: + version "2.1.2" + resolved "https://registry.yarnpkg.com/array-flatten/-/array-flatten-2.1.2.tgz#24ef80a28c1a893617e2149b0c6d0d788293b099" + integrity sha512-hNfzcOV8W4NdualtqBFPyVO+54DSJuZGY9qT4pRroB6S9e3iiido2ISIC5h9R2sPJ8H3FHCIiEnsv1lPXO3KtQ== + +array-tree-filter@^2.1.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/array-tree-filter/-/array-tree-filter-2.1.0.tgz#873ac00fec83749f255ac8dd083814b4f6329190" + integrity sha512-4ROwICNlNw/Hqa9v+rk5h22KjmzB1JGTMVKP2AKJBOCgb0yL0ASf0+YvCcLNNwquOHNX48jkeZIJ3a+oOQqKcw== + +array-union@^2.1.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/array-union/-/array-union-2.1.0.tgz#b798420adbeb1de828d84acd8a2e23d3efe85e8d" + integrity sha512-HGyxoOTYUyCM6stUe6EJgnd4EoewAI7zMdfqO+kGjnlZmBDz/cR5pf8r/cR4Wq60sL/p0IkcjUEEPwS3GFrIyw== + +async-validator@^4.0.2: + version "4.0.7" + resolved "https://registry.yarnpkg.com/async-validator/-/async-validator-4.0.7.tgz#034a0fd2103a6b2ebf010da75183bec299247afe" + integrity sha512-Pj2IR7u8hmUEDOwB++su6baaRi+QvsgajuFB9j95foM1N2gy5HM4z60hfusIO0fBPG5uLAEl6yCJr1jNSVugEQ== + +async@^2.6.2: + version "2.6.3" + resolved "https://registry.yarnpkg.com/async/-/async-2.6.3.tgz#d72625e2344a3656e3a3ad4fa749fa83299d82ff" + integrity sha512-zflvls11DCy+dQWzTW2dzuilv8Z5X/pjfmZOWba6TNIVDm+2UDaJmXSOXlasHKfNBs8oo3M0aT50fDEWfKZjXg== + dependencies: + lodash "^4.17.14" + +balanced-match@^1.0.0: + version "1.0.2" + resolved "https://registry.yarnpkg.com/balanced-match/-/balanced-match-1.0.2.tgz#e83e3a7e3f300b34cb9d87f615fa0cbf357690ee" + integrity sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw== + +batch@0.6.1: + version "0.6.1" + resolved "https://registry.yarnpkg.com/batch/-/batch-0.6.1.tgz#dc34314f4e679318093fc760272525f94bf25c16" + integrity sha1-3DQxT05nkxgJP8dgJyUl+UvyXBY= + +big.js@^5.2.2: + version "5.2.2" + resolved "https://registry.yarnpkg.com/big.js/-/big.js-5.2.2.tgz#65f0af382f578bcdc742bd9c281e9cb2d7768328" + integrity sha512-vyL2OymJxmarO8gxMr0mhChsO9QGwhynfuu4+MHTAW6czfq9humCB7rKpUjDd9YUiDPU4mzpyupFSvOClAwbmQ== + +binary-extensions@^2.0.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/binary-extensions/-/binary-extensions-2.2.0.tgz#75f502eeaf9ffde42fc98829645be4ea76bd9e2d" + integrity sha512-jDctJ/IVQbZoJykoeHbhXpOlNBqGNcwXJKJog42E5HDPUwQTSdjCHdihjj0DlnheQ7blbT6dHOafNAiS8ooQKA== + +body-parser@1.19.2: + version "1.19.2" + resolved "https://registry.yarnpkg.com/body-parser/-/body-parser-1.19.2.tgz#4714ccd9c157d44797b8b5607d72c0b89952f26e" + integrity sha512-SAAwOxgoCKMGs9uUAUFHygfLAyaniaoun6I8mFY9pRAJL9+Kec34aU+oIjDhTycub1jozEfEwx1W1IuOYxVSFw== + dependencies: + bytes "3.1.2" + content-type "~1.0.4" + debug "2.6.9" + depd "~1.1.2" + http-errors "1.8.1" + iconv-lite "0.4.24" + on-finished "~2.3.0" + qs "6.9.7" + raw-body "2.4.3" + type-is "~1.6.18" + +bonjour@^3.5.0: + version "3.5.0" + resolved 
"https://registry.yarnpkg.com/bonjour/-/bonjour-3.5.0.tgz#8e890a183d8ee9a2393b3844c691a42bcf7bc9f5" + integrity sha1-jokKGD2O6aI5OzhExpGkK897yfU= + dependencies: + array-flatten "^2.1.0" + deep-equal "^1.0.1" + dns-equal "^1.0.0" + dns-txt "^2.0.2" + multicast-dns "^6.0.1" + multicast-dns-service-types "^1.1.0" + +boolbase@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/boolbase/-/boolbase-1.0.0.tgz#68dff5fbe60c51eb37725ea9e3ed310dcc1e776e" + integrity sha1-aN/1++YMUes3cl6p4+0xDcwed24= + +brace-expansion@^1.1.7: + version "1.1.11" + resolved "https://registry.yarnpkg.com/brace-expansion/-/brace-expansion-1.1.11.tgz#3c7fcbf529d87226f3d2f52b966ff5271eb441dd" + integrity sha512-iCuPHDFgrHX7H2vEI/5xpz07zSHB00TpugqhmYtVmMO6518mCuRMoOYFldEBl0g187ufozdaHgWKcYFb61qGiA== + dependencies: + balanced-match "^1.0.0" + concat-map "0.0.1" + +braces@^3.0.1, braces@~3.0.2: + version "3.0.2" + resolved "https://registry.yarnpkg.com/braces/-/braces-3.0.2.tgz#3454e1a462ee8d599e236df336cd9ea4f8afe107" + integrity sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A== + dependencies: + fill-range "^7.0.1" + +browserslist@^4.14.5, browserslist@^4.16.5: + version "4.20.0" + resolved "https://registry.yarnpkg.com/browserslist/-/browserslist-4.20.0.tgz#35951e3541078c125d36df76056e94738a52ebe9" + integrity sha512-bnpOoa+DownbciXj0jVGENf8VYQnE2LNWomhYuCsMmmx9Jd9lwq0WXODuwpSsp8AVdKM2/HorrzxAfbKvWTByQ== + dependencies: + caniuse-lite "^1.0.30001313" + electron-to-chromium "^1.4.76" + escalade "^3.1.1" + node-releases "^2.0.2" + picocolors "^1.0.0" + +buffer-from@^1.0.0: + version "1.1.2" + resolved "https://registry.yarnpkg.com/buffer-from/-/buffer-from-1.1.2.tgz#2b146a6fd72e80b4f55d255f35ed59a3a9a41bd5" + integrity sha512-E+XQCRwSbaaiChtv6k6Dwgc+bx+Bs6vuKJHHl5kox/BaKbhiXzqQOwK4cO22yElGp2OCmjwVhT3HmxgyPGnJfQ== + +buffer-indexof@^1.0.0: + version "1.1.1" + resolved "https://registry.yarnpkg.com/buffer-indexof/-/buffer-indexof-1.1.1.tgz#52fabcc6a606d1a00302802648ef68f639da268c" + integrity sha512-4/rOEg86jivtPTeOUUT61jJO1Ya1TrR/OkqCSZDyq84WJh3LuuiphBYJN+fm5xufIk4XAFcEwte/8WzC8If/1g== + +bytes@3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/bytes/-/bytes-3.0.0.tgz#d32815404d689699f85a4ea4fa8755dd13a96048" + integrity sha1-0ygVQE1olpn4Wk6k+odV3ROpYEg= + +bytes@3.1.2: + version "3.1.2" + resolved "https://registry.yarnpkg.com/bytes/-/bytes-3.1.2.tgz#8b0beeb98605adf1b128fa4386403c009e0221a5" + integrity sha512-/Nf7TyzTx6S3yRJObOAV7956r8cr2+Oj8AC5dt8wSP3BQAoeX58NoHyCU8P8zGkNXStjTSi6fzO6F0pBdcYbEg== + +call-bind@^1.0.2: + version "1.0.2" + resolved "https://registry.yarnpkg.com/call-bind/-/call-bind-1.0.2.tgz#b1d4e89e688119c3c9a903ad30abb2f6a919be3c" + integrity sha512-7O+FbCihrB5WGbFYesctwmTKae6rOiIzmz1icreWJ+0aA7LJfuqhEso2T9ncpcFtzMQtzXf2QGGueWJGTYsqrA== + dependencies: + function-bind "^1.1.1" + get-intrinsic "^1.0.2" + +camel-case@^4.1.2: + version "4.1.2" + resolved "https://registry.yarnpkg.com/camel-case/-/camel-case-4.1.2.tgz#9728072a954f805228225a6deea6b38461e1bd5a" + integrity sha512-gxGWBrTT1JuMx6R+o5PTXMmUnhnVzLQ9SNutD4YqKtI6ap897t3tKECYla6gCWEkplXnlNybEkZg9GEGxKFCgw== + dependencies: + pascal-case "^3.1.2" + tslib "^2.0.3" + +caniuse-lite@^1.0.30001313: + version "1.0.30001313" + resolved "https://registry.yarnpkg.com/caniuse-lite/-/caniuse-lite-1.0.30001313.tgz#a380b079db91621e1b7120895874e2fd62ed2e2f" + integrity sha512-rI1UN0koZUiKINjysQDuRi2VeSCce3bYJNmDcj3PIKREiAmjakugBul1QSkg/fPrlULYl6oWfGg3PbgOSY9X4Q== + +chalk@^4.1.0: + version 
"4.1.2" + resolved "https://registry.yarnpkg.com/chalk/-/chalk-4.1.2.tgz#aac4e2b7734a740867aeb16bf02aad556a1e7a01" + integrity sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA== + dependencies: + ansi-styles "^4.1.0" + supports-color "^7.1.0" + +chokidar@^3.5.3: + version "3.5.3" + resolved "https://registry.yarnpkg.com/chokidar/-/chokidar-3.5.3.tgz#1cf37c8707b932bd1af1ae22c0432e2acd1903bd" + integrity sha512-Dr3sfKRP6oTcjf2JmUmFJfeVMvXBdegxB0iVQ5eb2V10uFJUCAS8OByZdVAyVb8xXNz3GjjTgj9kLWsZTqE6kw== + dependencies: + anymatch "~3.1.2" + braces "~3.0.2" + glob-parent "~5.1.2" + is-binary-path "~2.1.0" + is-glob "~4.0.1" + normalize-path "~3.0.0" + readdirp "~3.6.0" + optionalDependencies: + fsevents "~2.3.2" + +chrome-trace-event@^1.0.2: + version "1.0.3" + resolved "https://registry.yarnpkg.com/chrome-trace-event/-/chrome-trace-event-1.0.3.tgz#1015eced4741e15d06664a957dbbf50d041e26ac" + integrity sha512-p3KULyQg4S7NIHixdwbGX+nFHkoBiA4YQmyWtjb8XngSKV124nJmRysgAeujbUVb15vh+RvFUfCPqU7rXk+hZg== + +classnames@2.x, classnames@^2.2.1, classnames@^2.2.3, classnames@^2.2.5, classnames@^2.2.6, classnames@^2.3.1: + version "2.3.1" + resolved "https://registry.yarnpkg.com/classnames/-/classnames-2.3.1.tgz#dfcfa3891e306ec1dad105d0e88f4417b8535e8e" + integrity sha512-OlQdbZ7gLfGarSqxesMesDa5uz7KFbID8Kpq/SxIoNGDqY8lSYs0D+hhtBXhcdB3rcbXArFr7vlHheLk1voeNA== + +clean-css@^5.2.2: + version "5.2.4" + resolved "https://registry.yarnpkg.com/clean-css/-/clean-css-5.2.4.tgz#982b058f8581adb2ae062520808fb2429bd487a4" + integrity sha512-nKseG8wCzEuji/4yrgM/5cthL9oTDc5UOQyFMvW/Q53oP6gLH690o1NbuTh6Y18nujr7BxlsFuS7gXLnLzKJGg== + dependencies: + source-map "~0.6.0" + +clean-stack@^2.0.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/clean-stack/-/clean-stack-2.2.0.tgz#ee8472dbb129e727b31e8a10a427dee9dfe4008b" + integrity sha512-4diC9HaTE+KRAMWhDhrGOECgWZxoevMc5TlkObMqNSsVU62PYzXZ/SMTjzyGAFF1YusgxGcSWTEXBhp0CPwQ1A== + +clone-deep@^4.0.1: + version "4.0.1" + resolved "https://registry.yarnpkg.com/clone-deep/-/clone-deep-4.0.1.tgz#c19fd9bdbbf85942b4fd979c84dcf7d5f07c2387" + integrity sha512-neHB9xuzh/wk0dIHweyAXv2aPGZIVk3pLMe+/RNzINf17fe0OG96QroktYAUm7SM1PBnzTabaLboqqxDyMU+SQ== + dependencies: + is-plain-object "^2.0.4" + kind-of "^6.0.2" + shallow-clone "^3.0.0" + +clsx@^1.0.4, clsx@^1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/clsx/-/clsx-1.1.1.tgz#98b3134f9abbdf23b2663491ace13c5c03a73188" + integrity sha512-6/bPho624p3S2pMyvP5kKBPXnI3ufHLObBFCfgx+LkeR5lg2XYy2hqZqUf45ypD8COn2bhgGJSUE+l5dhNBieA== + +color-convert@^2.0.1: + version "2.0.1" + resolved "https://registry.yarnpkg.com/color-convert/-/color-convert-2.0.1.tgz#72d3a68d598c9bdb3af2ad1e84f21d896abd4de3" + integrity sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ== + dependencies: + color-name "~1.1.4" + +color-name@~1.1.4: + version "1.1.4" + resolved "https://registry.yarnpkg.com/color-name/-/color-name-1.1.4.tgz#c2a09a87acbde69543de6f63fa3995c826c536a2" + integrity sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA== + +colorette@^2.0.10, colorette@^2.0.14: + version "2.0.16" + resolved "https://registry.yarnpkg.com/colorette/-/colorette-2.0.16.tgz#713b9af84fdb000139f04546bd4a93f62a5085da" + integrity sha512-hUewv7oMjCp+wkBv5Rm0v87eJhq4woh5rSR+42YSQJKecCqgIqNkZ6lAlQms/BwHPJA5NKMRlpxPRv0n8HQW6g== + +commander@^2.20.0: + version "2.20.3" + resolved 
"https://registry.yarnpkg.com/commander/-/commander-2.20.3.tgz#fd485e84c03eb4881c20722ba48035e8531aeb33" + integrity sha512-GpVkmM8vF2vQUkj2LvZmD35JxeJOLCwJ9cUkugyk2nuhbv3+mJvpLYYt+0+USMxE+oj+ey/lJEnhZw75x/OMcQ== + +commander@^7.0.0: + version "7.2.0" + resolved "https://registry.yarnpkg.com/commander/-/commander-7.2.0.tgz#a36cb57d0b501ce108e4d20559a150a391d97ab7" + integrity sha512-QrWXB+ZQSVPmIWIhtEO9H+gwHaMGYiF5ChvoJ+K9ZGHG/sVsa6yiesAD1GC/x46sET00Xlwo1u49RVVVzvcSkw== + +commander@^8.3.0: + version "8.3.0" + resolved "https://registry.yarnpkg.com/commander/-/commander-8.3.0.tgz#4837ea1b2da67b9c616a67afbb0fafee567bca66" + integrity sha512-OkTL9umf+He2DZkUq8f8J9of7yL6RJKI24dVITBmNfZBmri9zYZQrKkuXiKhyfPSu8tUhnVBB1iKXevvnlR4Ww== + +compressible@~2.0.16: + version "2.0.18" + resolved "https://registry.yarnpkg.com/compressible/-/compressible-2.0.18.tgz#af53cca6b070d4c3c0750fbd77286a6d7cc46fba" + integrity sha512-AF3r7P5dWxL8MxyITRMlORQNaOA2IkAFaTr4k7BUumjPtRpGDTZpl0Pb1XCO6JeDCBdp126Cgs9sMxqSjgYyRg== + dependencies: + mime-db ">= 1.43.0 < 2" + +compression@^1.7.4: + version "1.7.4" + resolved "https://registry.yarnpkg.com/compression/-/compression-1.7.4.tgz#95523eff170ca57c29a0ca41e6fe131f41e5bb8f" + integrity sha512-jaSIDzP9pZVS4ZfQ+TzvtiWhdpFhE2RDHz8QJkpX9SIpLq88VueF5jJw6t+6CUQcAoA6t+x89MLrWAqpfDE8iQ== + dependencies: + accepts "~1.3.5" + bytes "3.0.0" + compressible "~2.0.16" + debug "2.6.9" + on-headers "~1.0.2" + safe-buffer "5.1.2" + vary "~1.1.2" + +compute-scroll-into-view@^1.0.17: + version "1.0.17" + resolved "https://registry.yarnpkg.com/compute-scroll-into-view/-/compute-scroll-into-view-1.0.17.tgz#6a88f18acd9d42e9cf4baa6bec7e0522607ab7ab" + integrity sha512-j4dx+Fb0URmzbwwMUrhqWM2BEWHdFGx+qZ9qqASHRPqvTYdqvWnHg0H1hIbcyLnvgnoNAVMlwkepyqM3DaIFUg== + +concat-map@0.0.1: + version "0.0.1" + resolved "https://registry.yarnpkg.com/concat-map/-/concat-map-0.0.1.tgz#d8a96bd77fd68df7793a73036a3ba0d5405d477b" + integrity sha1-2Klr13/Wjfd5OnMDajug1UBdR3s= + +connect-history-api-fallback@^1.6.0: + version "1.6.0" + resolved "https://registry.yarnpkg.com/connect-history-api-fallback/-/connect-history-api-fallback-1.6.0.tgz#8b32089359308d111115d81cad3fceab888f97bc" + integrity sha512-e54B99q/OUoH64zYYRf3HBP5z24G38h5D3qXu23JGRoigpX5Ss4r9ZnDk3g0Z8uQC2x2lPaJ+UlWBc1ZWBWdLg== + +content-disposition@0.5.4: + version "0.5.4" + resolved "https://registry.yarnpkg.com/content-disposition/-/content-disposition-0.5.4.tgz#8b82b4efac82512a02bb0b1dcec9d2c5e8eb5bfe" + integrity sha512-FveZTNuGw04cxlAiWbzi6zTAL/lhehaWbTtgluJh4/E95DqMwTmha3KZN1aAWA8cFIhHzMZUvLevkw5Rqk+tSQ== + dependencies: + safe-buffer "5.2.1" + +content-type@~1.0.4: + version "1.0.4" + resolved "https://registry.yarnpkg.com/content-type/-/content-type-1.0.4.tgz#e138cc75e040c727b1966fe5e5f8c9aee256fe3b" + integrity sha512-hIP3EEPs8tB9AT1L+NUqtwOAps4mk2Zob89MWXMHjHWg9milF/j4osnnQLXBCBFBk/tvIG/tUc9mOUJiPBhPXA== + +cookie-signature@1.0.6: + version "1.0.6" + resolved "https://registry.yarnpkg.com/cookie-signature/-/cookie-signature-1.0.6.tgz#e303a882b342cc3ee8ca513a79999734dab3ae2c" + integrity sha1-4wOogrNCzD7oylE6eZmXNNqzriw= + +cookie@0.4.2: + version "0.4.2" + resolved "https://registry.yarnpkg.com/cookie/-/cookie-0.4.2.tgz#0e41f24de5ecf317947c82fc789e06a884824432" + integrity sha512-aSWTXFzaKWkvHO1Ny/s+ePFpvKsPnjc551iI41v3ny/ow6tBG5Vd+FuqGNhh1LxOmVzOlGUriIlOaokOvhaStA== + +copy-to-clipboard@^3.2.0: + version "3.3.1" + resolved "https://registry.yarnpkg.com/copy-to-clipboard/-/copy-to-clipboard-3.3.1.tgz#115aa1a9998ffab6196f93076ad6da3b913662ae" 
+ integrity sha512-i13qo6kIHTTpCm8/Wup+0b1mVWETvu2kIMzKoK8FpkLkFxlt0znUAHcMzox+T8sPlqtZXq3CulEjQHsYiGFJUw== + dependencies: + toggle-selection "^1.0.6" + +core-util-is@~1.0.0: + version "1.0.3" + resolved "https://registry.yarnpkg.com/core-util-is/-/core-util-is-1.0.3.tgz#a6042d3634c2b27e9328f837b965fac83808db85" + integrity sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ== + +cross-env@^7.0.2: + version "7.0.3" + resolved "https://registry.yarnpkg.com/cross-env/-/cross-env-7.0.3.tgz#865264b29677dc015ba8418918965dd232fc54cf" + integrity sha512-+/HKd6EgcQCJGh2PSjZuUitQBQynKor4wrFbRg4DtAgS1aWO+gU52xpH7M9ScGgXSYmAVS9bIJ8EzuaGw0oNAw== + dependencies: + cross-spawn "^7.0.1" + +cross-spawn@^7.0.1, cross-spawn@^7.0.3: + version "7.0.3" + resolved "https://registry.yarnpkg.com/cross-spawn/-/cross-spawn-7.0.3.tgz#f73a85b9d5d41d045551c177e2882d4ac85728a6" + integrity sha512-iRDPJKUPVEND7dHPO8rkbOnPpyDygcDFtWjpeWNCgy8WP2rXcxXL8TskReQl6OrB2G7+UJrags1q15Fudc7G6w== + dependencies: + path-key "^3.1.0" + shebang-command "^2.0.0" + which "^2.0.1" + +css-loader@^5.2.4: + version "5.2.7" + resolved "https://registry.yarnpkg.com/css-loader/-/css-loader-5.2.7.tgz#9b9f111edf6fb2be5dc62525644cbc9c232064ae" + integrity sha512-Q7mOvpBNBG7YrVGMxRxcBJZFL75o+cH2abNASdibkj/fffYD8qWbInZrD0S9ccI6vZclF3DsHE7njGlLtaHbhg== + dependencies: + icss-utils "^5.1.0" + loader-utils "^2.0.0" + postcss "^8.2.15" + postcss-modules-extract-imports "^3.0.0" + postcss-modules-local-by-default "^4.0.0" + postcss-modules-scope "^3.0.0" + postcss-modules-values "^4.0.0" + postcss-value-parser "^4.1.0" + schema-utils "^3.0.0" + semver "^7.3.5" + +css-select@^4.1.3: + version "4.2.1" + resolved "https://registry.yarnpkg.com/css-select/-/css-select-4.2.1.tgz#9e665d6ae4c7f9d65dbe69d0316e3221fb274cdd" + integrity sha512-/aUslKhzkTNCQUB2qTX84lVmfia9NyjP3WpDGtj/WxhwBzWBYUV3DgUpurHTme8UTPcPlAD1DJ+b0nN/t50zDQ== + dependencies: + boolbase "^1.0.0" + css-what "^5.1.0" + domhandler "^4.3.0" + domutils "^2.8.0" + nth-check "^2.0.1" + +css-vendor@^2.0.8: + version "2.0.8" + resolved "https://registry.yarnpkg.com/css-vendor/-/css-vendor-2.0.8.tgz#e47f91d3bd3117d49180a3c935e62e3d9f7f449d" + integrity sha512-x9Aq0XTInxrkuFeHKbYC7zWY8ai7qJ04Kxd9MnvbC1uO5DagxoHQjm4JvG+vCdXOoFtCjbL2XSZfxmoYa9uQVQ== + dependencies: + "@babel/runtime" "^7.8.3" + is-in-browser "^1.0.2" + +css-what@^5.1.0: + version "5.1.0" + resolved "https://registry.yarnpkg.com/css-what/-/css-what-5.1.0.tgz#3f7b707aadf633baf62c2ceb8579b545bb40f7fe" + integrity sha512-arSMRWIIFY0hV8pIxZMEfmMI47Wj3R/aWpZDDxWYCPEiOMv6tfOrnpDtgxBYPEQD4V0Y/958+1TdC3iWTFcUPw== + +cssesc@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/cssesc/-/cssesc-3.0.0.tgz#37741919903b868565e1c09ea747445cd18983ee" + integrity sha512-/Tb/JcjK111nNScGob5MNtsntNM1aCNUDipB/TkwZFhyDrrE47SOx/18wF2bbjgc3ZzCSKW1T5nt5EbFoAz/Vg== + +csstype@^2.5.2: + version "2.6.20" + resolved "https://registry.yarnpkg.com/csstype/-/csstype-2.6.20.tgz#9229c65ea0b260cf4d3d997cb06288e36a8d6dda" + integrity sha512-/WwNkdXfckNgw6S5R125rrW8ez139lBHWouiBvX8dfMFtcn6V81REDqnH7+CRpRipfYlyU1CmOnOxrmGcFOjeA== + +csstype@^3.0.2: + version "3.0.11" + resolved "https://registry.yarnpkg.com/csstype/-/csstype-3.0.11.tgz#d66700c5eacfac1940deb4e3ee5642792d85cd33" + integrity sha512-sa6P2wJ+CAbgyy4KFssIb/JNMLxFvKF1pCYCSXS8ZMuqZnMsrxqI2E5sPyoTpxoPU/gVZMzr2zjOfg8GIZOMsw== + +date-fns@2.x: + version "2.28.0" + resolved 
"https://registry.yarnpkg.com/date-fns/-/date-fns-2.28.0.tgz#9570d656f5fc13143e50c975a3b6bbeb46cd08b2" + integrity sha512-8d35hViGYx/QH0icHYCeLmsLmMUheMmTyV9Fcm6gvNwdw31yXXH+O85sOBJ+OLnLQMKZowvpKb6FgMIQjcpvQw== + +dayjs@1.x: + version "1.10.8" + resolved "https://registry.yarnpkg.com/dayjs/-/dayjs-1.10.8.tgz#267df4bc6276fcb33c04a6735287e3f429abec41" + integrity sha512-wbNwDfBHHur9UOzNUjeKUOJ0fCb0a52Wx0xInmQ7Y8FstyajiV1NmK1e00cxsr9YrE9r7yAChE0VvpuY5Rnlow== + +debug@2.6.9: + version "2.6.9" + resolved "https://registry.yarnpkg.com/debug/-/debug-2.6.9.tgz#5d128515df134ff327e90a4c93f4e077a536341f" + integrity sha512-bC7ElrdJaJnPbAP+1EotYvqZsb3ecl5wi6Bfi6BJTUcNowp6cvspg0jXznRTKDjm/E7AdgFBVeAPVMNcKGsHMA== + dependencies: + ms "2.0.0" + +debug@^3.1.1: + version "3.2.7" + resolved "https://registry.yarnpkg.com/debug/-/debug-3.2.7.tgz#72580b7e9145fb39b6676f9c5e5fb100b934179a" + integrity sha512-CFjzYYAi4ThfiQvizrFQevTTXHtnCqWfe7x1AhgEscTz6ZbLbfoLRLPugTQyBth6f8ZERVUSyWHFD/7Wu4t1XQ== + dependencies: + ms "^2.1.1" + +debug@^4.1.0: + version "4.3.3" + resolved "https://registry.yarnpkg.com/debug/-/debug-4.3.3.tgz#04266e0b70a98d4462e6e288e38259213332b664" + integrity sha512-/zxw5+vh1Tfv+4Qn7a5nsbcJKPaSvCDhojn6FEl9vupwK2VCSDtEiEtqr8DFtzYFOdz63LBkxec7DYuc2jon6Q== + dependencies: + ms "2.1.2" + +deep-equal@^1.0.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/deep-equal/-/deep-equal-1.1.1.tgz#b5c98c942ceffaf7cb051e24e1434a25a2e6076a" + integrity sha512-yd9c5AdiqVcR+JjcwUQb9DkhJc8ngNr0MahEBGvDiJw8puWab2yZlh+nkasOnZP+EGTAP6rRp2JzJhJZzvNF8g== + dependencies: + is-arguments "^1.0.4" + is-date-object "^1.0.1" + is-regex "^1.0.4" + object-is "^1.0.1" + object-keys "^1.1.1" + regexp.prototype.flags "^1.2.0" + +default-gateway@^6.0.3: + version "6.0.3" + resolved "https://registry.yarnpkg.com/default-gateway/-/default-gateway-6.0.3.tgz#819494c888053bdb743edbf343d6cdf7f2943a71" + integrity sha512-fwSOJsbbNzZ/CUFpqFBqYfYNLj1NbMPm8MMCIzHjC83iSJRBEGmDUxU+WP661BaBQImeC2yHwXtz+P/O9o+XEg== + dependencies: + execa "^5.0.0" + +define-lazy-prop@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/define-lazy-prop/-/define-lazy-prop-2.0.0.tgz#3f7ae421129bcaaac9bc74905c98a0009ec9ee7f" + integrity sha512-Ds09qNh8yw3khSjiJjiUInaGX9xlqZDY7JVryGxdxV7NPeuqQfplOpQ66yJFZut3jLa5zOwkXw1g9EI2uKh4Og== + +define-properties@^1.1.3: + version "1.1.3" + resolved "https://registry.yarnpkg.com/define-properties/-/define-properties-1.1.3.tgz#cf88da6cbee26fe6db7094f61d870cbd84cee9f1" + integrity sha512-3MqfYKj2lLzdMSf8ZIZE/V+Zuy+BgD6f164e8K2w7dgnpKArBDerGYpM46IYYcjnkdPNMjPk9A6VFB8+3SKlXQ== + dependencies: + object-keys "^1.0.12" + +del@^6.0.0: + version "6.0.0" + resolved "https://registry.yarnpkg.com/del/-/del-6.0.0.tgz#0b40d0332cea743f1614f818be4feb717714c952" + integrity sha512-1shh9DQ23L16oXSZKB2JxpL7iMy2E0S9d517ptA1P8iw0alkPtQcrKH7ru31rYtKwF499HkTu+DRzq3TCKDFRQ== + dependencies: + globby "^11.0.1" + graceful-fs "^4.2.4" + is-glob "^4.0.1" + is-path-cwd "^2.2.0" + is-path-inside "^3.0.2" + p-map "^4.0.0" + rimraf "^3.0.2" + slash "^3.0.0" + +depd@~1.1.2: + version "1.1.2" + resolved "https://registry.yarnpkg.com/depd/-/depd-1.1.2.tgz#9bcd52e14c097763e749b274c4346ed2e560b5a9" + integrity sha1-m81S4UwJd2PnSbJ0xDRu0uVgtak= + +destroy@~1.0.4: + version "1.0.4" + resolved "https://registry.yarnpkg.com/destroy/-/destroy-1.0.4.tgz#978857442c44749e4206613e37946205826abd80" + integrity sha1-l4hXRCxEdJ5CBmE+N5RiBYJqvYA= + +detect-node@^2.0.4: + version "2.1.0" + resolved 
"https://registry.yarnpkg.com/detect-node/-/detect-node-2.1.0.tgz#c9c70775a49c3d03bc2c06d9a73be550f978f8b1" + integrity sha512-T0NIuQpnTvFDATNuHN5roPwSBG83rFsuO+MXXH9/3N1eFbn4wcPjttvjMLEPWJ0RGUYgQE7cGgS3tNxbqCGM7g== + +dir-glob@^3.0.1: + version "3.0.1" + resolved "https://registry.yarnpkg.com/dir-glob/-/dir-glob-3.0.1.tgz#56dbf73d992a4a93ba1584f4534063fd2e41717f" + integrity sha512-WkrWp9GR4KXfKGYzOLmTuGVi1UWFfws377n9cc55/tb6DuqyF6pcQ5AbiHEshaDpY9v6oaSr2XCDidGmMwdzIA== + dependencies: + path-type "^4.0.0" + +dns-equal@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/dns-equal/-/dns-equal-1.0.0.tgz#b39e7f1da6eb0a75ba9c17324b34753c47e0654d" + integrity sha1-s55/HabrCnW6nBcySzR1PEfgZU0= + +dns-packet@^1.3.1: + version "1.3.4" + resolved "https://registry.yarnpkg.com/dns-packet/-/dns-packet-1.3.4.tgz#e3455065824a2507ba886c55a89963bb107dec6f" + integrity sha512-BQ6F4vycLXBvdrJZ6S3gZewt6rcrks9KBgM9vrhW+knGRqc8uEdT7fuCwloc7nny5xNoMJ17HGH0R/6fpo8ECA== + dependencies: + ip "^1.1.0" + safe-buffer "^5.0.1" + +dns-txt@^2.0.2: + version "2.0.2" + resolved "https://registry.yarnpkg.com/dns-txt/-/dns-txt-2.0.2.tgz#b91d806f5d27188e4ab3e7d107d881a1cc4642b6" + integrity sha1-uR2Ab10nGI5Ks+fRB9iBocxGQrY= + dependencies: + buffer-indexof "^1.0.0" + +dom-align@^1.7.0: + version "1.12.2" + resolved "https://registry.yarnpkg.com/dom-align/-/dom-align-1.12.2.tgz#0f8164ebd0c9c21b0c790310493cd855892acd4b" + integrity sha512-pHuazgqrsTFrGU2WLDdXxCFabkdQDx72ddkraZNih1KsMcN5qsRSTR9O4VJRlwTPCPb5COYg3LOfiMHHcPInHg== + +dom-converter@^0.2.0: + version "0.2.0" + resolved "https://registry.yarnpkg.com/dom-converter/-/dom-converter-0.2.0.tgz#6721a9daee2e293682955b6afe416771627bb768" + integrity sha512-gd3ypIPfOMr9h5jIKq8E3sHOTCjeirnl0WK5ZdS1AW0Odt0b1PaWaHdJ4Qk4klv+YB9aJBS7mESXjFoDQPu6DA== + dependencies: + utila "~0.4" + +dom-helpers@^5.0.1: + version "5.2.1" + resolved "https://registry.yarnpkg.com/dom-helpers/-/dom-helpers-5.2.1.tgz#d9400536b2bf8225ad98fe052e029451ac40e902" + integrity sha512-nRCa7CK3VTrM2NmGkIy4cbK7IZlgBE/PYMn55rrXefr5xXDP0LdtfPnblFDoVdcAfslJ7or6iqAUnx0CCGIWQA== + dependencies: + "@babel/runtime" "^7.8.7" + csstype "^3.0.2" + +dom-serializer@^1.0.1: + version "1.3.2" + resolved "https://registry.yarnpkg.com/dom-serializer/-/dom-serializer-1.3.2.tgz#6206437d32ceefaec7161803230c7a20bc1b4d91" + integrity sha512-5c54Bk5Dw4qAxNOI1pFEizPSjVsx5+bpJKmL2kPn8JhBUq2q09tTCa3mjijun2NfK78NMouDYNMBkOrPZiS+ig== + dependencies: + domelementtype "^2.0.1" + domhandler "^4.2.0" + entities "^2.0.0" + +domelementtype@^2.0.1, domelementtype@^2.2.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/domelementtype/-/domelementtype-2.2.0.tgz#9a0b6c2782ed6a1c7323d42267183df9bd8b1d57" + integrity sha512-DtBMo82pv1dFtUmHyr48beiuq792Sxohr+8Hm9zoxklYPfa6n0Z3Byjj2IV7bmr2IyqClnqEQhfgHJJ5QF0R5A== + +domhandler@^4.0.0, domhandler@^4.2.0, domhandler@^4.3.0: + version "4.3.0" + resolved "https://registry.yarnpkg.com/domhandler/-/domhandler-4.3.0.tgz#16c658c626cf966967e306f966b431f77d4a5626" + integrity sha512-fC0aXNQXqKSFTr2wDNZDhsEYjCiYsDWl3D01kwt25hm1YIPyDGHvvi3rw+PLqHAl/m71MaiF7d5zvBr0p5UB2g== + dependencies: + domelementtype "^2.2.0" + +domutils@^2.5.2, domutils@^2.8.0: + version "2.8.0" + resolved "https://registry.yarnpkg.com/domutils/-/domutils-2.8.0.tgz#4437def5db6e2d1f5d6ee859bd95ca7d02048135" + integrity sha512-w96Cjofp72M5IIhpjgobBimYEfoPjx1Vx0BSX9P30WBdZW2WIKU0T1Bd0kz2eNZ9ikjKgHbEyKx8BB6H1L3h3A== + dependencies: + dom-serializer "^1.0.1" + domelementtype "^2.2.0" + domhandler "^4.2.0" + 
+dot-case@^3.0.4: + version "3.0.4" + resolved "https://registry.yarnpkg.com/dot-case/-/dot-case-3.0.4.tgz#9b2b670d00a431667a8a75ba29cd1b98809ce751" + integrity sha512-Kv5nKlh6yRrdrGvxeJ2e5y2eRUpkUosIW4A2AS38zwSz27zu7ufDwQPi5Jhs3XAlGNetl3bmnGhQsMtkKJnj3w== + dependencies: + no-case "^3.0.4" + tslib "^2.0.3" + +ee-first@1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/ee-first/-/ee-first-1.1.1.tgz#590c61156b0ae2f4f0255732a158b266bc56b21d" + integrity sha1-WQxhFWsK4vTwJVcyoViyZrxWsh0= + +electron-to-chromium@^1.4.76: + version "1.4.76" + resolved "https://registry.yarnpkg.com/electron-to-chromium/-/electron-to-chromium-1.4.76.tgz#a0494baedaf51094b1c172999919becd9975a934" + integrity sha512-3Vftv7cenJtQb+k00McEBZ2vVmZ/x+HEF7pcZONZIkOsESqAqVuACmBxMv0JhzX7u0YltU0vSqRqgBSTAhFUjA== + +emojis-list@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/emojis-list/-/emojis-list-3.0.0.tgz#5570662046ad29e2e916e71aae260abdff4f6a78" + integrity sha512-/kyM18EfinwXZbno9FyUGeFh87KC8HRQBQGildHZbEuRyWFOmv1U10o9BBp8XVZDVNNuQKyIGIu5ZYAAXJ0V2Q== + +encodeurl@~1.0.2: + version "1.0.2" + resolved "https://registry.yarnpkg.com/encodeurl/-/encodeurl-1.0.2.tgz#ad3ff4c86ec2d029322f5a02c3a9a606c95b3f59" + integrity sha1-rT/0yG7C0CkyL1oCw6mmBslbP1k= + +enhanced-resolve@^4.0.0: + version "4.5.0" + resolved "https://registry.yarnpkg.com/enhanced-resolve/-/enhanced-resolve-4.5.0.tgz#2f3cfd84dbe3b487f18f2db2ef1e064a571ca5ec" + integrity sha512-Nv9m36S/vxpsI+Hc4/ZGRs0n9mXqSWGGq49zxb/cJfPAQMbUtttJAlNPS4AQzaBdw/pKskw5bMbekT/Y7W/Wlg== + dependencies: + graceful-fs "^4.1.2" + memory-fs "^0.5.0" + tapable "^1.0.0" + +enhanced-resolve@^5.9.2: + version "5.9.2" + resolved "https://registry.yarnpkg.com/enhanced-resolve/-/enhanced-resolve-5.9.2.tgz#0224dcd6a43389ebfb2d55efee517e5466772dd9" + integrity sha512-GIm3fQfwLJ8YZx2smuHpBKkXC1yOk+OBEmKckVyL0i/ea8mqDEykK3ld5dgH1QYPNyT/lIllxV2LULnxCHaHkA== + dependencies: + graceful-fs "^4.2.4" + tapable "^2.2.0" + +entities@^2.0.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/entities/-/entities-2.2.0.tgz#098dc90ebb83d8dffa089d55256b351d34c4da55" + integrity sha512-p92if5Nz619I0w+akJrLZH0MX0Pb5DX39XOwQTtXSdQQOaYH03S1uIQp4mhOZtAXrxq4ViO67YTiLBo2638o9A== + +envinfo@^7.7.3: + version "7.8.1" + resolved "https://registry.yarnpkg.com/envinfo/-/envinfo-7.8.1.tgz#06377e3e5f4d379fea7ac592d5ad8927e0c4d475" + integrity sha512-/o+BXHmB7ocbHEAs6F2EnG0ogybVVUdkRunTT2glZU9XAaGmhqskrvKwqXuDfNjEO0LZKWdejEEpnq8aM0tOaw== + +errno@^0.1.3: + version "0.1.8" + resolved "https://registry.yarnpkg.com/errno/-/errno-0.1.8.tgz#8bb3e9c7d463be4976ff888f76b4809ebc2e811f" + integrity sha512-dJ6oBr5SQ1VSd9qkk7ByRgb/1SH4JZjCHSW/mr63/QcXO9zLVxvJ6Oy13nio03rxpSnVDDjFor75SjVeZWPW/A== + dependencies: + prr "~1.0.1" + +es-module-lexer@^0.9.0: + version "0.9.3" + resolved "https://registry.yarnpkg.com/es-module-lexer/-/es-module-lexer-0.9.3.tgz#6f13db00cc38417137daf74366f535c8eb438f19" + integrity sha512-1HQ2M2sPtxwnvOvT1ZClHyQDiggdNjURWpY2we6aMKCQiUVxTmVs2UYPLIrD84sS+kMdUwfBSylbJPwNnBrnHQ== + +escalade@^3.1.1: + version "3.1.1" + resolved "https://registry.yarnpkg.com/escalade/-/escalade-3.1.1.tgz#d8cfdc7000965c5a0174b4a82eaa5c0552742e40" + integrity sha512-k0er2gUkLf8O0zKJiAhmkTnJlTvINGv7ygDNPbeIsX/TJjGJZHuh9B2UxbsaEkmlEo9MfhrSzmhIlhRlI2GXnw== + +escape-html@~1.0.3: + version "1.0.3" + resolved "https://registry.yarnpkg.com/escape-html/-/escape-html-1.0.3.tgz#0258eae4d3d0c0974de1c169188ef0051d1d1988" + integrity sha1-Aljq5NPQwJdN4cFpGI7wBR0dGYg= + +eslint-scope@5.1.1: + version 
"5.1.1" + resolved "https://registry.yarnpkg.com/eslint-scope/-/eslint-scope-5.1.1.tgz#e786e59a66cb92b3f6c1fb0d508aab174848f48c" + integrity sha512-2NxwbF/hZ0KpepYN0cNbo+FN6XoK7GaHlQhgx/hIZl6Va0bF45RQOOwhLIy8lQDbuCiadSLCBnH2CFYquit5bw== + dependencies: + esrecurse "^4.3.0" + estraverse "^4.1.1" + +esrecurse@^4.3.0: + version "4.3.0" + resolved "https://registry.yarnpkg.com/esrecurse/-/esrecurse-4.3.0.tgz#7ad7964d679abb28bee72cec63758b1c5d2c9921" + integrity sha512-KmfKL3b6G+RXvP8N1vr3Tq1kL/oCFgn2NYXEtqP8/L3pKapUA4G8cFVaoF3SU323CD4XypR/ffioHmkti6/Tag== + dependencies: + estraverse "^5.2.0" + +estraverse@^4.1.1: + version "4.3.0" + resolved "https://registry.yarnpkg.com/estraverse/-/estraverse-4.3.0.tgz#398ad3f3c5a24948be7725e83d11a7de28cdbd1d" + integrity sha512-39nnKffWz8xN1BU/2c79n9nB9HDzo0niYUqx6xyqUnyoAnQyyWpOTdZEeiCch8BBu515t4wp9ZmgVfVhn9EBpw== + +estraverse@^5.2.0: + version "5.3.0" + resolved "https://registry.yarnpkg.com/estraverse/-/estraverse-5.3.0.tgz#2eea5290702f26ab8fe5370370ff86c965d21123" + integrity sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA== + +etag@~1.8.1: + version "1.8.1" + resolved "https://registry.yarnpkg.com/etag/-/etag-1.8.1.tgz#41ae2eeb65efa62268aebfea83ac7d79299b0887" + integrity sha1-Qa4u62XvpiJorr/qg6x9eSmbCIc= + +eventemitter3@^4.0.0: + version "4.0.7" + resolved "https://registry.yarnpkg.com/eventemitter3/-/eventemitter3-4.0.7.tgz#2de9b68f6528d5644ef5c59526a1b4a07306169f" + integrity sha512-8guHBZCwKnFhYdHr2ysuRWErTwhoN2X8XELRlrRwpmfeY2jjuUN4taQMsULKUVo1K4DvZl+0pgfyoysHxvmvEw== + +events@^3.2.0: + version "3.3.0" + resolved "https://registry.yarnpkg.com/events/-/events-3.3.0.tgz#31a95ad0a924e2d2c419a813aeb2c4e878ea7400" + integrity sha512-mQw+2fkQbALzQ7V0MY0IqdnXNOeTtP4r0lN9z7AAawCXgqea7bDii20AYrIBrFd/Hx0M2Ocz6S111CaFkUcb0Q== + +execa@^5.0.0: + version "5.1.1" + resolved "https://registry.yarnpkg.com/execa/-/execa-5.1.1.tgz#f80ad9cbf4298f7bd1d4c9555c21e93741c411dd" + integrity sha512-8uSpZZocAZRBAPIEINJj3Lo9HyGitllczc27Eh5YYojjMFMn8yHMDMaUHE2Jqfq05D/wucwI4JGURyXt1vchyg== + dependencies: + cross-spawn "^7.0.3" + get-stream "^6.0.0" + human-signals "^2.1.0" + is-stream "^2.0.0" + merge-stream "^2.0.0" + npm-run-path "^4.0.1" + onetime "^5.1.2" + signal-exit "^3.0.3" + strip-final-newline "^2.0.0" + +express@^4.17.1: + version "4.17.3" + resolved "https://registry.yarnpkg.com/express/-/express-4.17.3.tgz#f6c7302194a4fb54271b73a1fe7a06478c8f85a1" + integrity sha512-yuSQpz5I+Ch7gFrPCk4/c+dIBKlQUxtgwqzph132bsT6qhuzss6I8cLJQz7B3rFblzd6wtcI0ZbGltH/C4LjUg== + dependencies: + accepts "~1.3.8" + array-flatten "1.1.1" + body-parser "1.19.2" + content-disposition "0.5.4" + content-type "~1.0.4" + cookie "0.4.2" + cookie-signature "1.0.6" + debug "2.6.9" + depd "~1.1.2" + encodeurl "~1.0.2" + escape-html "~1.0.3" + etag "~1.8.1" + finalhandler "~1.1.2" + fresh "0.5.2" + merge-descriptors "1.0.1" + methods "~1.1.2" + on-finished "~2.3.0" + parseurl "~1.3.3" + path-to-regexp "0.1.7" + proxy-addr "~2.0.7" + qs "6.9.7" + range-parser "~1.2.1" + safe-buffer "5.2.1" + send "0.17.2" + serve-static "1.14.2" + setprototypeof "1.2.0" + statuses "~1.5.0" + type-is "~1.6.18" + utils-merge "1.0.1" + vary "~1.1.2" + +fast-deep-equal@^3.1.1, fast-deep-equal@^3.1.3: + version "3.1.3" + resolved "https://registry.yarnpkg.com/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz#3a7d56b559d6cbc3eb512325244e619a65c6c525" + integrity sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q== + +fast-glob@^3.2.9: + 
version "3.2.11" + resolved "https://registry.yarnpkg.com/fast-glob/-/fast-glob-3.2.11.tgz#a1172ad95ceb8a16e20caa5c5e56480e5129c1d9" + integrity sha512-xrO3+1bxSo3ZVHAnqzyuewYT6aMFHRAd4Kcs92MAonjwQZLsK9d0SF1IyQ3k5PoirxTW0Oe/RqFgMQ6TcNE5Ew== + dependencies: + "@nodelib/fs.stat" "^2.0.2" + "@nodelib/fs.walk" "^1.2.3" + glob-parent "^5.1.2" + merge2 "^1.3.0" + micromatch "^4.0.4" + +fast-json-stable-stringify@^2.0.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/fast-json-stable-stringify/-/fast-json-stable-stringify-2.1.0.tgz#874bf69c6f404c2b5d99c481341399fd55892633" + integrity sha512-lhd/wF+Lk98HZoTCtlVraHtfh5XYijIjalXck7saUtuanSDyLMxnHhSXEDJqHxD7msR8D0uCmqlkwjCV8xvwHw== + +fastest-levenshtein@^1.0.12: + version "1.0.12" + resolved "https://registry.yarnpkg.com/fastest-levenshtein/-/fastest-levenshtein-1.0.12.tgz#9990f7d3a88cc5a9ffd1f1745745251700d497e2" + integrity sha512-On2N+BpYJ15xIC974QNVuYGMOlEVt4s0EOI3wwMqOmK1fdDY+FN/zltPV8vosq4ad4c/gJ1KHScUn/6AWIgiow== + +fastq@^1.6.0: + version "1.13.0" + resolved "https://registry.yarnpkg.com/fastq/-/fastq-1.13.0.tgz#616760f88a7526bdfc596b7cab8c18938c36b98c" + integrity sha512-YpkpUnK8od0o1hmeSc7UUs/eB/vIPWJYjKck2QKIzAf71Vm1AAQ3EbuZB3g2JIy+pg+ERD0vqI79KyZiB2e2Nw== + dependencies: + reusify "^1.0.4" + +faye-websocket@^0.11.3: + version "0.11.4" + resolved "https://registry.yarnpkg.com/faye-websocket/-/faye-websocket-0.11.4.tgz#7f0d9275cfdd86a1c963dc8b65fcc451edcbb1da" + integrity sha512-CzbClwlXAuiRQAlUyfqPgvPoNKTckTPGfwZV4ZdAhVcP2lh9KUxJg2b5GkE7XbjKQ3YJnQ9z6D9ntLAlB+tP8g== + dependencies: + websocket-driver ">=0.5.1" + +fill-range@^7.0.1: + version "7.0.1" + resolved "https://registry.yarnpkg.com/fill-range/-/fill-range-7.0.1.tgz#1919a6a7c75fe38b2c7c77e5198535da9acdda40" + integrity sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ== + dependencies: + to-regex-range "^5.0.1" + +finalhandler@~1.1.2: + version "1.1.2" + resolved "https://registry.yarnpkg.com/finalhandler/-/finalhandler-1.1.2.tgz#b7e7d000ffd11938d0fdb053506f6ebabe9f587d" + integrity sha512-aAWcW57uxVNrQZqFXjITpW3sIUQmHGG3qSb9mUah9MgMC4NeWhNOlNjXEYq3HjRAvL6arUviZGGJsBg6z0zsWA== + dependencies: + debug "2.6.9" + encodeurl "~1.0.2" + escape-html "~1.0.3" + on-finished "~2.3.0" + parseurl "~1.3.3" + statuses "~1.5.0" + unpipe "~1.0.0" + +find-up@^4.0.0: + version "4.1.0" + resolved "https://registry.yarnpkg.com/find-up/-/find-up-4.1.0.tgz#97afe7d6cdc0bc5928584b7c8d7b16e8a9aa5d19" + integrity sha512-PpOwAdQ/YlXQ2vj8a3h8IipDuYRi3wceVQQGYWxNINccq40Anw7BlsEXCMbt1Zt+OLA6Fq9suIpIWD0OsnISlw== + dependencies: + locate-path "^5.0.0" + path-exists "^4.0.0" + +flow-bin@^0.118.0: + version "0.118.0" + resolved "https://registry.yarnpkg.com/flow-bin/-/flow-bin-0.118.0.tgz#fb706364a58c682d67a2ca7df39396467dc397d1" + integrity sha512-jlbUu0XkbpXeXhan5xyTqVK1jmEKNxE8hpzznI3TThHTr76GiFwK0iRzhDo4KNy+S9h/KxHaqVhTP86vA6wHCg== + +follow-redirects@^1.0.0: + version "1.14.9" + resolved "https://registry.yarnpkg.com/follow-redirects/-/follow-redirects-1.14.9.tgz#dd4ea157de7bfaf9ea9b3fbd85aa16951f78d8d7" + integrity sha512-MQDfihBQYMcyy5dhRDJUHcw7lb2Pv/TuE6xP1vyraLukNDHKbDxDNaOE3NbCAdKQApno+GPRyo1YAp89yCjK4w== + +forwarded@0.2.0: + version "0.2.0" + resolved "https://registry.yarnpkg.com/forwarded/-/forwarded-0.2.0.tgz#2269936428aad4c15c7ebe9779a84bf0b2a81811" + integrity sha512-buRG0fpBtRHSTCOASe6hD258tEubFoRLb4ZNA6NxMVHNw2gOcwHo9wyablzMzOA5z9xA9L1KNjk/Nt6MT9aYow== + +fresh@0.5.2: + version "0.5.2" + resolved 
"https://registry.yarnpkg.com/fresh/-/fresh-0.5.2.tgz#3d8cadd90d976569fa835ab1f8e4b23a105605a7" + integrity sha1-PYyt2Q2XZWn6g1qx+OSyOhBWBac= + +fs-monkey@1.0.3: + version "1.0.3" + resolved "https://registry.yarnpkg.com/fs-monkey/-/fs-monkey-1.0.3.tgz#ae3ac92d53bb328efe0e9a1d9541f6ad8d48e2d3" + integrity sha512-cybjIfiiE+pTWicSCLFHSrXZ6EilF30oh91FDP9S2B051prEa7QWfrVTQm10/dDpswBDXZugPa1Ogu8Yh+HV0Q== + +fs.realpath@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/fs.realpath/-/fs.realpath-1.0.0.tgz#1504ad2523158caa40db4a2787cb01411994ea4f" + integrity sha1-FQStJSMVjKpA20onh8sBQRmU6k8= + +fsevents@~2.3.2: + version "2.3.2" + resolved "https://registry.yarnpkg.com/fsevents/-/fsevents-2.3.2.tgz#8a526f78b8fdf4623b709e0b975c52c24c02fd1a" + integrity sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA== + +function-bind@^1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/function-bind/-/function-bind-1.1.1.tgz#a56899d3ea3c9bab874bb9773b7c5ede92f4895d" + integrity sha512-yIovAzMX49sF8Yl58fSCWJ5svSLuaibPxXQJFLmBObTuCr0Mf1KiPopGM9NiFjiYBCbfaa2Fh6breQ6ANVTI0A== + +get-intrinsic@^1.0.2: + version "1.1.1" + resolved "https://registry.yarnpkg.com/get-intrinsic/-/get-intrinsic-1.1.1.tgz#15f59f376f855c446963948f0d24cd3637b4abc6" + integrity sha512-kWZrnVM42QCiEA2Ig1bG8zjoIMOgxWwYCEeNdwY6Tv/cOSeGpcoX4pXHfKUxNKVoArnrEr2e9srnAxxGIraS9Q== + dependencies: + function-bind "^1.1.1" + has "^1.0.3" + has-symbols "^1.0.1" + +get-stream@^6.0.0: + version "6.0.1" + resolved "https://registry.yarnpkg.com/get-stream/-/get-stream-6.0.1.tgz#a262d8eef67aced57c2852ad6167526a43cbf7b7" + integrity sha512-ts6Wi+2j3jQjqi70w5AlN8DFnkSwC+MqmxEzdEALB2qXZYV3X/b1CTfgPLGJNMeAWxdPfU8FO1ms3NUfaHCPYg== + +glob-parent@^5.1.2, glob-parent@~5.1.2: + version "5.1.2" + resolved "https://registry.yarnpkg.com/glob-parent/-/glob-parent-5.1.2.tgz#869832c58034fe68a4093c17dc15e8340d8401c4" + integrity sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow== + dependencies: + is-glob "^4.0.1" + +glob-to-regexp@^0.4.1: + version "0.4.1" + resolved "https://registry.yarnpkg.com/glob-to-regexp/-/glob-to-regexp-0.4.1.tgz#c75297087c851b9a578bd217dd59a92f59fe546e" + integrity sha512-lkX1HJXwyMcprw/5YUZc2s7DrpAiHB21/V+E1rHUrVNokkvB6bqMzT0VfV6/86ZNabt1k14YOIaT7nDvOX3Iiw== + +glob@^7.1.3: + version "7.2.0" + resolved "https://registry.yarnpkg.com/glob/-/glob-7.2.0.tgz#d15535af7732e02e948f4c41628bd910293f6023" + integrity sha512-lmLf6gtyrPq8tTjSmrO94wBeQbFR3HbLHbuyD69wuyQkImp2hWqMGB47OX65FBkPffO641IP9jWa1z4ivqG26Q== + dependencies: + fs.realpath "^1.0.0" + inflight "^1.0.4" + inherits "2" + minimatch "^3.0.4" + once "^1.3.0" + path-is-absolute "^1.0.0" + +globby@^11.0.1: + version "11.1.0" + resolved "https://registry.yarnpkg.com/globby/-/globby-11.1.0.tgz#bd4be98bb042f83d796f7e3811991fbe82a0d34b" + integrity sha512-jhIXaOzy1sb8IyocaruWSn1TjmnBVs8Ayhcy83rmxNJ8q2uWKCAj3CnJY+KpGSXCueAPc0i05kVvVKtP1t9S3g== + dependencies: + array-union "^2.1.0" + dir-glob "^3.0.1" + fast-glob "^3.2.9" + ignore "^5.2.0" + merge2 "^1.4.1" + slash "^3.0.0" + +graceful-fs@^4.1.2, graceful-fs@^4.2.4, graceful-fs@^4.2.6, graceful-fs@^4.2.9: + version "4.2.9" + resolved "https://registry.yarnpkg.com/graceful-fs/-/graceful-fs-4.2.9.tgz#041b05df45755e587a24942279b9d113146e1c96" + integrity sha512-NtNxqUcXgpW2iMrfqSfR73Glt39K+BLwWsPs94yR63v45T0Wbej7eRmL5cWfwEgqXnmjQp3zaJTshdRW/qC2ZQ== + +handle-thing@^2.0.0: + version "2.0.1" + resolved 
"https://registry.yarnpkg.com/handle-thing/-/handle-thing-2.0.1.tgz#857f79ce359580c340d43081cc648970d0bb234e" + integrity sha512-9Qn4yBxelxoh2Ow62nP+Ka/kMnOXRi8BXnRaUwezLNhqelnN49xKz4F/dPP8OYLxLxq6JDtZb2i9XznUQbNPTg== + +has-flag@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/has-flag/-/has-flag-4.0.0.tgz#944771fd9c81c81265c4d6941860da06bb59479b" + integrity sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ== + +has-symbols@^1.0.1, has-symbols@^1.0.2: + version "1.0.3" + resolved "https://registry.yarnpkg.com/has-symbols/-/has-symbols-1.0.3.tgz#bb7b2c4349251dce87b125f7bdf874aa7c8b39f8" + integrity sha512-l3LCuF6MgDNwTDKkdYGEihYjt5pRPbEg46rtlmnSPlUbgmB8LOIrKJbYYFBSbnPaJexMKtiPO8hmeRjRz2Td+A== + +has-tostringtag@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/has-tostringtag/-/has-tostringtag-1.0.0.tgz#7e133818a7d394734f941e73c3d3f9291e658b25" + integrity sha512-kFjcSNhnlGV1kyoGk7OXKSawH5JOb/LzUc5w9B02hOTO0dfFRjbHQKvg1d6cf3HbeUmtU9VbbV3qzZ2Teh97WQ== + dependencies: + has-symbols "^1.0.2" + +has@^1.0.3: + version "1.0.3" + resolved "https://registry.yarnpkg.com/has/-/has-1.0.3.tgz#722d7cbfc1f6aa8241f16dd814e011e1f41e8796" + integrity sha512-f2dvO0VU6Oej7RkWJGrehjbzMAjFp5/VKPp5tTpWIV4JHHZK1/BxbFRtf/siA2SWTe09caDmVtYYzWEIbBS4zw== + dependencies: + function-bind "^1.1.1" + +he@^1.2.0: + version "1.2.0" + resolved "https://registry.yarnpkg.com/he/-/he-1.2.0.tgz#84ae65fa7eafb165fddb61566ae14baf05664f0f" + integrity sha512-F/1DnUGPopORZi0ni+CvrCgHQ5FyEAHRLSApuYWMmrbSwoN2Mn/7k+Gl38gJnR7yyDZk6WLXwiGod1JOWNDKGw== + +hoist-non-react-statics@^3.3.2: + version "3.3.2" + resolved "https://registry.yarnpkg.com/hoist-non-react-statics/-/hoist-non-react-statics-3.3.2.tgz#ece0acaf71d62c2969c2ec59feff42a4b1a85b45" + integrity sha512-/gGivxi8JPKWNm/W0jSmzcMPpfpPLc3dY/6GxhX2hQ9iGj3aDfklV4ET7NjKpSinLpJ5vafa9iiGIEZg10SfBw== + dependencies: + react-is "^16.7.0" + +hpack.js@^2.1.6: + version "2.1.6" + resolved "https://registry.yarnpkg.com/hpack.js/-/hpack.js-2.1.6.tgz#87774c0949e513f42e84575b3c45681fade2a0b2" + integrity sha1-h3dMCUnlE/QuhFdbPEVoH63ioLI= + dependencies: + inherits "^2.0.1" + obuf "^1.0.0" + readable-stream "^2.0.1" + wbuf "^1.1.0" + +html-entities@^2.3.2: + version "2.3.2" + resolved "https://registry.yarnpkg.com/html-entities/-/html-entities-2.3.2.tgz#760b404685cb1d794e4f4b744332e3b00dcfe488" + integrity sha512-c3Ab/url5ksaT0WyleslpBEthOzWhrjQbg75y7XUsfSzi3Dgzt0l8w5e7DylRn15MTlMMD58dTfzddNS2kcAjQ== + +html-minifier-terser@^6.0.2: + version "6.1.0" + resolved "https://registry.yarnpkg.com/html-minifier-terser/-/html-minifier-terser-6.1.0.tgz#bfc818934cc07918f6b3669f5774ecdfd48f32ab" + integrity sha512-YXxSlJBZTP7RS3tWnQw74ooKa6L9b9i9QYXY21eUEvhZ3u9XLfv6OnFsQq6RxkhHygsaUMvYsZRV5rU/OVNZxw== + dependencies: + camel-case "^4.1.2" + clean-css "^5.2.2" + commander "^8.3.0" + he "^1.2.0" + param-case "^3.0.4" + relateurl "^0.2.7" + terser "^5.10.0" + +html-webpack-plugin@^5.3.1: + version "5.5.0" + resolved "https://registry.yarnpkg.com/html-webpack-plugin/-/html-webpack-plugin-5.5.0.tgz#c3911936f57681c1f9f4d8b68c158cd9dfe52f50" + integrity sha512-sy88PC2cRTVxvETRgUHFrL4No3UxvcH8G1NepGhqaTT+GXN2kTamqasot0inS5hXeg1cMbFDt27zzo9p35lZVw== + dependencies: + "@types/html-minifier-terser" "^6.0.0" + html-minifier-terser "^6.0.2" + lodash "^4.17.21" + pretty-error "^4.0.0" + tapable "^2.0.0" + +htmlparser2@^6.1.0: + version "6.1.0" + resolved 
"https://registry.yarnpkg.com/htmlparser2/-/htmlparser2-6.1.0.tgz#c4d762b6c3371a05dbe65e94ae43a9f845fb8fb7" + integrity sha512-gyyPk6rgonLFEDGoeRgQNaEUvdJ4ktTmmUh/h2t7s+M8oPpIPxgNACWa+6ESR57kXstwqPiCut0V8NRpcwgU7A== + dependencies: + domelementtype "^2.0.1" + domhandler "^4.0.0" + domutils "^2.5.2" + entities "^2.0.0" + +http-deceiver@^1.2.7: + version "1.2.7" + resolved "https://registry.yarnpkg.com/http-deceiver/-/http-deceiver-1.2.7.tgz#fa7168944ab9a519d337cb0bec7284dc3e723d87" + integrity sha1-+nFolEq5pRnTN8sL7HKE3D5yPYc= + +http-errors@1.8.1: + version "1.8.1" + resolved "https://registry.yarnpkg.com/http-errors/-/http-errors-1.8.1.tgz#7c3f28577cbc8a207388455dbd62295ed07bd68c" + integrity sha512-Kpk9Sm7NmI+RHhnj6OIWDI1d6fIoFAtFt9RLaTMRlg/8w49juAStsrBgp0Dp4OdxdVbRIeKhtCUvoi/RuAhO4g== + dependencies: + depd "~1.1.2" + inherits "2.0.4" + setprototypeof "1.2.0" + statuses ">= 1.5.0 < 2" + toidentifier "1.0.1" + +http-errors@~1.6.2: + version "1.6.3" + resolved "https://registry.yarnpkg.com/http-errors/-/http-errors-1.6.3.tgz#8b55680bb4be283a0b5bf4ea2e38580be1d9320d" + integrity sha1-i1VoC7S+KDoLW/TqLjhYC+HZMg0= + dependencies: + depd "~1.1.2" + inherits "2.0.3" + setprototypeof "1.1.0" + statuses ">= 1.4.0 < 2" + +http-parser-js@>=0.5.1: + version "0.5.6" + resolved "https://registry.yarnpkg.com/http-parser-js/-/http-parser-js-0.5.6.tgz#2e02406ab2df8af8a7abfba62e0da01c62b95afd" + integrity sha512-vDlkRPDJn93swjcjqMSaGSPABbIarsr1TLAui/gLDXzV5VsJNdXNzMYDyNBLQkjWQCJ1uizu8T2oDMhmGt0PRA== + +http-proxy-middleware@^2.0.0: + version "2.0.3" + resolved "https://registry.yarnpkg.com/http-proxy-middleware/-/http-proxy-middleware-2.0.3.tgz#5df04f69a89f530c2284cd71eeaa51ba52243289" + integrity sha512-1bloEwnrHMnCoO/Gcwbz7eSVvW50KPES01PecpagI+YLNLci4AcuKJrujW4Mc3sBLpFxMSlsLNHS5Nl/lvrTPA== + dependencies: + "@types/http-proxy" "^1.17.8" + http-proxy "^1.18.1" + is-glob "^4.0.1" + is-plain-obj "^3.0.0" + micromatch "^4.0.2" + +http-proxy@^1.18.1: + version "1.18.1" + resolved "https://registry.yarnpkg.com/http-proxy/-/http-proxy-1.18.1.tgz#401541f0534884bbf95260334e72f88ee3976549" + integrity sha512-7mz/721AbnJwIVbnaSv1Cz3Am0ZLT/UBwkC92VlxhXv/k/BBQfM2fXElQNC27BVGr0uwUpplYPQM9LnaBMR5NQ== + dependencies: + eventemitter3 "^4.0.0" + follow-redirects "^1.0.0" + requires-port "^1.0.0" + +human-signals@^2.1.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/human-signals/-/human-signals-2.1.0.tgz#dc91fcba42e4d06e4abaed33b3e7a3c02f514ea0" + integrity sha512-B4FFZ6q/T2jhhksgkbEW3HBvWIfDW85snkQgawt07S7J5QXTk6BkNV+0yAeZrM5QpMAdYlocGoljn0sJ/WQkFw== + +hyphenate-style-name@^1.0.3: + version "1.0.4" + resolved "https://registry.yarnpkg.com/hyphenate-style-name/-/hyphenate-style-name-1.0.4.tgz#691879af8e220aea5750e8827db4ef62a54e361d" + integrity sha512-ygGZLjmXfPHj+ZWh6LwbC37l43MhfztxetbFCoYTM2VjkIUpeHgSNn7QIyVFj7YQ1Wl9Cbw5sholVJPzWvC2MQ== + +iconv-lite@0.4.24: + version "0.4.24" + resolved "https://registry.yarnpkg.com/iconv-lite/-/iconv-lite-0.4.24.tgz#2022b4b25fbddc21d2f524974a474aafe733908b" + integrity sha512-v3MXnZAcvnywkTUEZomIActle7RXXeedOR31wwl7VlyoXO4Qi9arvSenNQWne1TcRwhCL1HwLI21bEqdpj8/rA== + dependencies: + safer-buffer ">= 2.1.2 < 3" + +icss-utils@^5.0.0, icss-utils@^5.1.0: + version "5.1.0" + resolved "https://registry.yarnpkg.com/icss-utils/-/icss-utils-5.1.0.tgz#c6be6858abd013d768e98366ae47e25d5887b1ae" + integrity sha512-soFhflCVWLfRNOPU3iv5Z9VUdT44xFRbzjLsEzSr5AQmgqPMTHdU3PMT1Cf1ssx8fLNJDA1juftYl+PUcv3MqA== + +ignore@^5.2.0: + version "5.2.0" + resolved 
"https://registry.yarnpkg.com/ignore/-/ignore-5.2.0.tgz#6d3bac8fa7fe0d45d9f9be7bac2fc279577e345a" + integrity sha512-CmxgYGiEPCLhfLnpPp1MoRmifwEIOgjcHXxOBjv7mY96c+eWScsOP9c112ZyLdWHi0FxHjI+4uVhKYp/gcdRmQ== + +import-local@^3.0.2: + version "3.1.0" + resolved "https://registry.yarnpkg.com/import-local/-/import-local-3.1.0.tgz#b4479df8a5fd44f6cdce24070675676063c95cb4" + integrity sha512-ASB07uLtnDs1o6EHjKpX34BKYDSqnFerfTOJL2HvMqF70LnxpjkzDB8J44oT9pu4AMPkQwf8jl6szgvNd2tRIg== + dependencies: + pkg-dir "^4.2.0" + resolve-cwd "^3.0.0" + +indent-string@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/indent-string/-/indent-string-4.0.0.tgz#624f8f4497d619b2d9768531d58f4122854d7251" + integrity sha512-EdDDZu4A2OyIK7Lr/2zG+w5jmbuk1DVBnEwREQvBzspBJkCEbRa8GxU1lghYcaGJCnRWibjDXlq779X1/y5xwg== + +inflight@^1.0.4: + version "1.0.6" + resolved "https://registry.yarnpkg.com/inflight/-/inflight-1.0.6.tgz#49bd6331d7d02d0c09bc910a1075ba8165b56df9" + integrity sha1-Sb1jMdfQLQwJvJEKEHW6gWW1bfk= + dependencies: + once "^1.3.0" + wrappy "1" + +inherits@2, inherits@2.0.4, inherits@^2.0.1, inherits@^2.0.3, inherits@~2.0.3: + version "2.0.4" + resolved "https://registry.yarnpkg.com/inherits/-/inherits-2.0.4.tgz#0fa2c64f932917c3433a0ded55363aae37416b7c" + integrity sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ== + +inherits@2.0.3: + version "2.0.3" + resolved "https://registry.yarnpkg.com/inherits/-/inherits-2.0.3.tgz#633c2c83e3da42a502f52466022480f4208261de" + integrity sha1-Yzwsg+PaQqUC9SRmAiSA9CCCYd4= + +inline-chunk-html-plugin@^1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/inline-chunk-html-plugin/-/inline-chunk-html-plugin-1.1.1.tgz#f64111aed16fac274d2b929f6a6a08671d82354e" + integrity sha512-6W1eGIj8z/Yla6xJx5il6jJfCxMZS3kVkbiLQThbbjdsDLRIWkUVmpnhfW2l6WAwCW+qfy0zoXVGBZM1E5XF3g== + +interpret@^2.2.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/interpret/-/interpret-2.2.0.tgz#1a78a0b5965c40a5416d007ad6f50ad27c417df9" + integrity sha512-Ju0Bz/cEia55xDwUWEa8+olFpCiQoypjnQySseKtmjNrnps3P+xfpUmGr90T7yjlVJmOtybRvPXhKMbHr+fWnw== + +ip@^1.1.0: + version "1.1.5" + resolved "https://registry.yarnpkg.com/ip/-/ip-1.1.5.tgz#bdded70114290828c0a039e72ef25f5aaec4354a" + integrity sha1-vd7XARQpCCjAoDnnLvJfWq7ENUo= + +ipaddr.js@1.9.1: + version "1.9.1" + resolved "https://registry.yarnpkg.com/ipaddr.js/-/ipaddr.js-1.9.1.tgz#bff38543eeb8984825079ff3a2a8e6cbd46781b3" + integrity sha512-0KI/607xoxSToH7GjN1FfSbLoU0+btTicjsQSWQlh/hZykN8KpmMf7uYwPW3R+akZ6R/w18ZlXSHBYXiYUPO3g== + +ipaddr.js@^2.0.1: + version "2.0.1" + resolved "https://registry.yarnpkg.com/ipaddr.js/-/ipaddr.js-2.0.1.tgz#eca256a7a877e917aeb368b0a7497ddf42ef81c0" + integrity sha512-1qTgH9NG+IIJ4yfKs2e6Pp1bZg8wbDbKHT21HrLIeYBTRLgMYKnMTPAuI3Lcs61nfx5h1xlXnbJtH1kX5/d/ng== + +is-arguments@^1.0.4: + version "1.1.1" + resolved "https://registry.yarnpkg.com/is-arguments/-/is-arguments-1.1.1.tgz#15b3f88fda01f2a97fec84ca761a560f123efa9b" + integrity sha512-8Q7EARjzEnKpt/PCD7e1cgUS0a6X8u5tdSiMqXhojOdoV9TsMsiO+9VLC5vAmO8N7/GmXn7yjR8qnA6bVAEzfA== + dependencies: + call-bind "^1.0.2" + has-tostringtag "^1.0.0" + +is-binary-path@~2.1.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/is-binary-path/-/is-binary-path-2.1.0.tgz#ea1f7f3b80f064236e83470f86c09c254fb45b09" + integrity sha512-ZMERYes6pDydyuGidse7OsHxtbI7WVeUEozgR/g7rd0xUimYNlvZRE/K2MgZTjWy725IfelLeVcEM97mmtRGXw== + dependencies: + binary-extensions "^2.0.0" + +is-core-module@^2.8.1: + version "2.8.1" + resolved 
"https://registry.yarnpkg.com/is-core-module/-/is-core-module-2.8.1.tgz#f59fdfca701d5879d0a6b100a40aa1560ce27211" + integrity sha512-SdNCUs284hr40hFTFP6l0IfZ/RSrMXF3qgoRHd3/79unUTvrFO/JoXwkGm+5J/Oe3E/b5GsnG330uUNgRpu1PA== + dependencies: + has "^1.0.3" + +is-date-object@^1.0.1: + version "1.0.5" + resolved "https://registry.yarnpkg.com/is-date-object/-/is-date-object-1.0.5.tgz#0841d5536e724c25597bf6ea62e1bd38298df31f" + integrity sha512-9YQaSxsAiSwcvS33MBk3wTCVnWK+HhF8VZR2jRxehM16QcVOdHqPn4VPHmRK4lSr38n9JriurInLcP90xsYNfQ== + dependencies: + has-tostringtag "^1.0.0" + +is-docker@^2.0.0, is-docker@^2.1.1: + version "2.2.1" + resolved "https://registry.yarnpkg.com/is-docker/-/is-docker-2.2.1.tgz#33eeabe23cfe86f14bde4408a02c0cfb853acdaa" + integrity sha512-F+i2BKsFrH66iaUFc0woD8sLy8getkwTwtOBjvs56Cx4CgJDeKQeqfz8wAYiSb8JOprWhHH5p77PbmYCvvUuXQ== + +is-extglob@^2.1.1: + version "2.1.1" + resolved "https://registry.yarnpkg.com/is-extglob/-/is-extglob-2.1.1.tgz#a88c02535791f02ed37c76a1b9ea9773c833f8c2" + integrity sha1-qIwCU1eR8C7TfHahueqXc8gz+MI= + +is-glob@^4.0.1, is-glob@~4.0.1: + version "4.0.3" + resolved "https://registry.yarnpkg.com/is-glob/-/is-glob-4.0.3.tgz#64f61e42cbbb2eec2071a9dac0b28ba1e65d5084" + integrity sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg== + dependencies: + is-extglob "^2.1.1" + +is-in-browser@^1.0.2, is-in-browser@^1.1.3: + version "1.1.3" + resolved "https://registry.yarnpkg.com/is-in-browser/-/is-in-browser-1.1.3.tgz#56ff4db683a078c6082eb95dad7dc62e1d04f835" + integrity sha1-Vv9NtoOgeMYILrldrX3GLh0E+DU= + +is-number@^7.0.0: + version "7.0.0" + resolved "https://registry.yarnpkg.com/is-number/-/is-number-7.0.0.tgz#7535345b896734d5f80c4d06c50955527a14f12b" + integrity sha512-41Cifkg6e8TylSpdtTpeLVMqvSBEVzTttHvERD741+pnZ8ANv0004MRL43QKPDlK9cGvNp6NZWZUBlbGXYxxng== + +is-path-cwd@^2.2.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/is-path-cwd/-/is-path-cwd-2.2.0.tgz#67d43b82664a7b5191fd9119127eb300048a9fdb" + integrity sha512-w942bTcih8fdJPJmQHFzkS76NEP8Kzzvmw92cXsazb8intwLqPibPPdXf4ANdKV3rYMuuQYGIWtvz9JilB3NFQ== + +is-path-inside@^3.0.2: + version "3.0.3" + resolved "https://registry.yarnpkg.com/is-path-inside/-/is-path-inside-3.0.3.tgz#d231362e53a07ff2b0e0ea7fed049161ffd16283" + integrity sha512-Fd4gABb+ycGAmKou8eMftCupSir5lRxqf4aD/vd0cD2qc4HL07OjCeuHMr8Ro4CoMaeCKDB0/ECBOVWjTwUvPQ== + +is-plain-obj@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/is-plain-obj/-/is-plain-obj-3.0.0.tgz#af6f2ea14ac5a646183a5bbdb5baabbc156ad9d7" + integrity sha512-gwsOE28k+23GP1B6vFl1oVh/WOzmawBrKwo5Ev6wMKzPkaXaCDIQKzLnvsA42DRlbVTWorkgTKIviAKCWkfUwA== + +is-plain-object@^2.0.4: + version "2.0.4" + resolved "https://registry.yarnpkg.com/is-plain-object/-/is-plain-object-2.0.4.tgz#2c163b3fafb1b606d9d17928f05c2a1c38e07677" + integrity sha512-h5PpgXkWitc38BBMYawTYMWJHFZJVnBquFE57xFpjB8pJFiF6gZ+bU+WyI/yqXiFR5mdLsgYNaPe8uao6Uv9Og== + dependencies: + isobject "^3.0.1" + +is-regex@^1.0.4: + version "1.1.4" + resolved "https://registry.yarnpkg.com/is-regex/-/is-regex-1.1.4.tgz#eef5663cd59fa4c0ae339505323df6854bb15958" + integrity sha512-kvRdxDsxZjhzUX07ZnLydzS1TU/TJlTUHHY4YLL87e37oUA49DfkLqgy+VjFocowy29cKvcSiu+kIv728jTTVg== + dependencies: + call-bind "^1.0.2" + has-tostringtag "^1.0.0" + +is-stream@^2.0.0: + version "2.0.1" + resolved "https://registry.yarnpkg.com/is-stream/-/is-stream-2.0.1.tgz#fac1e3d53b97ad5a9d0ae9cef2389f5810a5c077" + integrity 
sha512-hFoiJiTl63nn+kstHGBtewWSKnQLpyb155KHheA1l39uvtO9nWIop1p3udqPcUd/xbF1VLMO4n7OI6p7RbngDg== + +is-wsl@^2.2.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/is-wsl/-/is-wsl-2.2.0.tgz#74a4c76e77ca9fd3f932f290c17ea326cd157271" + integrity sha512-fKzAra0rGJUUBwGBgNkHZuToZcn+TtXHpeCgmkMJMMYx1sQDYaCSyjJBSCa2nH1DGm7s3n1oBnohoVTBaN7Lww== + dependencies: + is-docker "^2.0.0" + +isarray@~1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/isarray/-/isarray-1.0.0.tgz#bb935d48582cba168c06834957a54a3e07124f11" + integrity sha1-u5NdSFgsuhaMBoNJV6VKPgcSTxE= + +isexe@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/isexe/-/isexe-2.0.0.tgz#e8fbf374dc556ff8947a10dcb0572d633f2cfa10" + integrity sha1-6PvzdNxVb/iUehDcsFctYz8s+hA= + +isobject@^3.0.1: + version "3.0.1" + resolved "https://registry.yarnpkg.com/isobject/-/isobject-3.0.1.tgz#4e431e92b11a9731636aa1f9c8d1ccbcfdab78df" + integrity sha1-TkMekrEalzFjaqH5yNHMvP2reN8= + +jest-worker@^27.4.5: + version "27.5.1" + resolved "https://registry.yarnpkg.com/jest-worker/-/jest-worker-27.5.1.tgz#8d146f0900e8973b106b6f73cc1e9a8cb86f8db0" + integrity sha512-7vuh85V5cdDofPyxn58nrPjBktZo0u9x1g8WtjQol+jZDaE+fhN+cIvTj11GndBnMnyfrUOG1sZQxCdjKh+DKg== + dependencies: + "@types/node" "*" + merge-stream "^2.0.0" + supports-color "^8.0.0" + +"js-tokens@^3.0.0 || ^4.0.0": + version "4.0.0" + resolved "https://registry.yarnpkg.com/js-tokens/-/js-tokens-4.0.0.tgz#19203fb59991df98e3a287050d4647cdeaf32499" + integrity sha512-RdJUflcE3cUzKiMqQgsCu06FPu9UdIJO0beYbPhHN4k6apgJtifcoCtT9bcxOpYBtpD2kCM6Sbzg4CausW/PKQ== + +json-parse-better-errors@^1.0.2: + version "1.0.2" + resolved "https://registry.yarnpkg.com/json-parse-better-errors/-/json-parse-better-errors-1.0.2.tgz#bb867cfb3450e69107c131d1c514bab3dc8bcaa9" + integrity sha512-mrqyZKfX5EhL7hvqcV6WG1yYjnjeuYDzDhhcAAUrq8Po85NBQBJP+ZDUT75qZQ98IkUoBqdkExkukOU7Ts2wrw== + +json-schema-traverse@^0.4.1: + version "0.4.1" + resolved "https://registry.yarnpkg.com/json-schema-traverse/-/json-schema-traverse-0.4.1.tgz#69f6a87d9513ab8bb8fe63bdb0979c448e684660" + integrity sha512-xbbCH5dCYU5T8LcEhhuh7HJ88HXuW3qsI3Y0zOZFKfZEHcpWiHU/Jxzk629Brsab/mMiHQti9wMP+845RPe3Vg== + +json-schema-traverse@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/json-schema-traverse/-/json-schema-traverse-1.0.0.tgz#ae7bcb3656ab77a73ba5c49bf654f38e6b6860e2" + integrity sha512-NM8/P9n3XjXhIZn1lLhkFaACTOURQXjWhV4BA/RnOv8xvgqtqpAX9IO4mRQxSx1Rlo4tqzeqb0sOlruaOy3dug== + +json2mq@^0.2.0: + version "0.2.0" + resolved "https://registry.yarnpkg.com/json2mq/-/json2mq-0.2.0.tgz#b637bd3ba9eabe122c83e9720483aeb10d2c904a" + integrity sha1-tje9O6nqvhIsg+lyBIOusQ0skEo= + dependencies: + string-convert "^0.2.0" + +json5@^2.1.2: + version "2.2.0" + resolved "https://registry.yarnpkg.com/json5/-/json5-2.2.0.tgz#2dfefe720c6ba525d9ebd909950f0515316c89a3" + integrity sha512-f+8cldu7X/y7RAJurMEJmdoKXGB/X550w2Nr3tTbezL6RwEE/iMcm+tZnXeoZtKuOq6ft8+CqzEkrIgx1fPoQA== + dependencies: + minimist "^1.2.5" + +jss-plugin-camel-case@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss-plugin-camel-case/-/jss-plugin-camel-case-10.9.0.tgz#4921b568b38d893f39736ee8c4c5f1c64670aaf7" + integrity sha512-UH6uPpnDk413/r/2Olmw4+y54yEF2lRIV8XIZyuYpgPYTITLlPOsq6XB9qeqv+75SQSg3KLocq5jUBXW8qWWww== + dependencies: + "@babel/runtime" "^7.3.1" + hyphenate-style-name "^1.0.3" + jss "10.9.0" + +jss-plugin-default-unit@^10.5.1: + version "10.9.0" + resolved 
"https://registry.yarnpkg.com/jss-plugin-default-unit/-/jss-plugin-default-unit-10.9.0.tgz#bb23a48f075bc0ce852b4b4d3f7582bc002df991" + integrity sha512-7Ju4Q9wJ/MZPsxfu4T84mzdn7pLHWeqoGd/D8O3eDNNJ93Xc8PxnLmV8s8ZPNRYkLdxZqKtm1nPQ0BM4JRlq2w== + dependencies: + "@babel/runtime" "^7.3.1" + jss "10.9.0" + +jss-plugin-global@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss-plugin-global/-/jss-plugin-global-10.9.0.tgz#fc07a0086ac97aca174e37edb480b69277f3931f" + integrity sha512-4G8PHNJ0x6nwAFsEzcuVDiBlyMsj2y3VjmFAx/uHk/R/gzJV+yRHICjT4MKGGu1cJq2hfowFWCyrr/Gg37FbgQ== + dependencies: + "@babel/runtime" "^7.3.1" + jss "10.9.0" + +jss-plugin-nested@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss-plugin-nested/-/jss-plugin-nested-10.9.0.tgz#cc1c7d63ad542c3ccc6e2c66c8328c6b6b00f4b3" + integrity sha512-2UJnDrfCZpMYcpPYR16oZB7VAC6b/1QLsRiAutOt7wJaaqwCBvNsosLEu/fUyKNQNGdvg2PPJFDO5AX7dwxtoA== + dependencies: + "@babel/runtime" "^7.3.1" + jss "10.9.0" + tiny-warning "^1.0.2" + +jss-plugin-props-sort@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss-plugin-props-sort/-/jss-plugin-props-sort-10.9.0.tgz#30e9567ef9479043feb6e5e59db09b4de687c47d" + integrity sha512-7A76HI8bzwqrsMOJTWKx/uD5v+U8piLnp5bvru7g/3ZEQOu1+PjHvv7bFdNO3DwNPC9oM0a//KwIJsIcDCjDzw== + dependencies: + "@babel/runtime" "^7.3.1" + jss "10.9.0" + +jss-plugin-rule-value-function@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss-plugin-rule-value-function/-/jss-plugin-rule-value-function-10.9.0.tgz#379fd2732c0746fe45168011fe25544c1a295d67" + integrity sha512-IHJv6YrEf8pRzkY207cPmdbBstBaE+z8pazhPShfz0tZSDtRdQua5jjg6NMz3IbTasVx9FdnmptxPqSWL5tyJg== + dependencies: + "@babel/runtime" "^7.3.1" + jss "10.9.0" + tiny-warning "^1.0.2" + +jss-plugin-vendor-prefixer@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss-plugin-vendor-prefixer/-/jss-plugin-vendor-prefixer-10.9.0.tgz#aa9df98abfb3f75f7ed59a3ec50a5452461a206a" + integrity sha512-MbvsaXP7iiVdYVSEoi+blrW+AYnTDvHTW6I6zqi7JcwXdc6I9Kbm234nEblayhF38EftoenbM+5218pidmC5gA== + dependencies: + "@babel/runtime" "^7.3.1" + css-vendor "^2.0.8" + jss "10.9.0" + +jss@10.9.0, jss@^10.5.1: + version "10.9.0" + resolved "https://registry.yarnpkg.com/jss/-/jss-10.9.0.tgz#7583ee2cdc904a83c872ba695d1baab4b59c141b" + integrity sha512-YpzpreB6kUunQBbrlArlsMpXYyndt9JATbt95tajx0t4MTJJcCJdd4hdNpHmOIDiUJrF/oX5wtVFrS3uofWfGw== + dependencies: + "@babel/runtime" "^7.3.1" + csstype "^3.0.2" + is-in-browser "^1.1.3" + tiny-warning "^1.0.2" + +kind-of@^6.0.2: + version "6.0.3" + resolved "https://registry.yarnpkg.com/kind-of/-/kind-of-6.0.3.tgz#07c05034a6c349fa06e24fa35aa76db4580ce4dd" + integrity sha512-dcS1ul+9tmeD95T+x28/ehLgd9mENa3LsvDTtzm3vyBEO7RPptvAD+t44WVXaUjTBRcrpFeFlC8WCruUR456hw== + +loader-runner@^4.2.0: + version "4.2.0" + resolved "https://registry.yarnpkg.com/loader-runner/-/loader-runner-4.2.0.tgz#d7022380d66d14c5fb1d496b89864ebcfd478384" + integrity sha512-92+huvxMvYlMzMt0iIOukcwYBFpkYJdpl2xsZ7LrlayO7E8SOv+JJUEK17B/dJIHAOLMfh2dZZ/Y18WgmGtYNw== + +loader-utils@^2.0.0: + version "2.0.2" + resolved "https://registry.yarnpkg.com/loader-utils/-/loader-utils-2.0.2.tgz#d6e3b4fb81870721ae4e0868ab11dd638368c129" + integrity sha512-TM57VeHptv569d/GKh6TAYdzKblwDNiumOdkFnejjD0XwTH87K90w3O7AiJRqdQoXygvi1VQTJTLGhJl7WqA7A== + dependencies: + big.js "^5.2.2" + emojis-list "^3.0.0" + json5 "^2.1.2" + +locate-path@^5.0.0: + version "5.0.0" + resolved 
"https://registry.yarnpkg.com/locate-path/-/locate-path-5.0.0.tgz#1afba396afd676a6d42504d0a67a3a7eb9f62aa0" + integrity sha512-t7hw9pI+WvuwNJXwk5zVHpyhIqzg2qTlklJOf0mVxGSbe3Fp2VieZcduNYjaLDoy6p9uGpQEGWG87WpMKlNq8g== + dependencies: + p-locate "^4.1.0" + +lodash@^4.17.14, lodash@^4.17.20, lodash@^4.17.21: + version "4.17.21" + resolved "https://registry.yarnpkg.com/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c" + integrity sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg== + +loose-envify@^1.1.0, loose-envify@^1.4.0: + version "1.4.0" + resolved "https://registry.yarnpkg.com/loose-envify/-/loose-envify-1.4.0.tgz#71ee51fa7be4caec1a63839f7e682d8132d30caf" + integrity sha512-lyuxPGr/Wfhrlem2CL/UcnUc1zcqKAImBDzukY7Y5F/yQiNdko6+fRLevlw1HgMySw7f611UIY408EtxRSoK3Q== + dependencies: + js-tokens "^3.0.0 || ^4.0.0" + +lower-case@^2.0.2: + version "2.0.2" + resolved "https://registry.yarnpkg.com/lower-case/-/lower-case-2.0.2.tgz#6fa237c63dbdc4a82ca0fd882e4722dc5e634e28" + integrity sha512-7fm3l3NAF9WfN6W3JOmf5drwpVqX78JtoGJ3A6W0a6ZnldM41w2fV5D490psKFTpMds8TJse/eHLFFsNHHjHgg== + dependencies: + tslib "^2.0.3" + +lru-cache@^6.0.0: + version "6.0.0" + resolved "https://registry.yarnpkg.com/lru-cache/-/lru-cache-6.0.0.tgz#6d6fe6570ebd96aaf90fcad1dafa3b2566db3a94" + integrity sha512-Jo6dJ04CmSjuznwJSS3pUeWmd/H0ffTlkXXgwZi+eq1UCmqQwCh+eLsYOYCwY991i2Fah4h1BEMCx4qThGbsiA== + dependencies: + yallist "^4.0.0" + +media-typer@0.3.0: + version "0.3.0" + resolved "https://registry.yarnpkg.com/media-typer/-/media-typer-0.3.0.tgz#8710d7af0aa626f8fffa1ce00168545263255748" + integrity sha1-hxDXrwqmJvj/+hzgAWhUUmMlV0g= + +memfs@^3.4.1: + version "3.4.1" + resolved "https://registry.yarnpkg.com/memfs/-/memfs-3.4.1.tgz#b78092f466a0dce054d63d39275b24c71d3f1305" + integrity sha512-1c9VPVvW5P7I85c35zAdEr1TD5+F11IToIHIlrVIcflfnzPkJa0ZoYEoEdYDP8KgPFoSZ/opDrUsAoZWym3mtw== + dependencies: + fs-monkey "1.0.3" + +"memoize-one@>=3.1.1 <6": + version "5.2.1" + resolved "https://registry.yarnpkg.com/memoize-one/-/memoize-one-5.2.1.tgz#8337aa3c4335581839ec01c3d594090cebe8f00e" + integrity sha512-zYiwtZUcYyXKo/np96AGZAckk+FWWsUdJ3cHGGmld7+AhvcWmQyGCYUh1hc4Q/pkOhb65dQR/pqCyK0cOaHz4Q== + +memoize-one@^3.1.1: + version "3.1.1" + resolved "https://registry.yarnpkg.com/memoize-one/-/memoize-one-3.1.1.tgz#ef609811e3bc28970eac2884eece64d167830d17" + integrity sha512-YqVh744GsMlZu6xkhGslPSqSurOv6P+kLN2J3ysBZfagLcL5FdRK/0UpgLoL8hwjjEvvAVkjJZyFP+1T6p1vgA== + +memoize-one@^6.0.0: + version "6.0.0" + resolved "https://registry.yarnpkg.com/memoize-one/-/memoize-one-6.0.0.tgz#b2591b871ed82948aee4727dc6abceeeac8c1045" + integrity sha512-rkpe71W0N0c0Xz6QD0eJETuWAJGnJ9afsl1srmwPrI+yBCkge5EycXXbYRyvL29zZVUWQCY7InPRCv3GDXuZNw== + +memory-fs@^0.5.0: + version "0.5.0" + resolved "https://registry.yarnpkg.com/memory-fs/-/memory-fs-0.5.0.tgz#324c01288b88652966d161db77838720845a8e3c" + integrity sha512-jA0rdU5KoQMC0e6ppoNRtpp6vjFq6+NY7r8hywnC7V+1Xj/MtHwGIbB1QaK/dunyjWteJzmkpd7ooeWg10T7GA== + dependencies: + errno "^0.1.3" + readable-stream "^2.0.1" + +merge-descriptors@1.0.1: + version "1.0.1" + resolved "https://registry.yarnpkg.com/merge-descriptors/-/merge-descriptors-1.0.1.tgz#b00aaa556dd8b44568150ec9d1b953f3f90cbb61" + integrity sha1-sAqqVW3YtEVoFQ7J0blT8/kMu2E= + +merge-stream@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/merge-stream/-/merge-stream-2.0.0.tgz#52823629a14dd00c9770fb6ad47dc6310f2c1f60" + integrity 
sha512-abv/qOcuPfk3URPfDzmZU1LKmuw8kT+0nIHvKrKgFrwifol/doWcdA4ZqsWQ8ENrFKkd67Mfpo/LovbIUsbt3w== + +merge2@^1.3.0, merge2@^1.4.1: + version "1.4.1" + resolved "https://registry.yarnpkg.com/merge2/-/merge2-1.4.1.tgz#4368892f885e907455a6fd7dc55c0c9d404990ae" + integrity sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg== + +methods@~1.1.2: + version "1.1.2" + resolved "https://registry.yarnpkg.com/methods/-/methods-1.1.2.tgz#5529a4d67654134edcc5266656835b0f851afcee" + integrity sha1-VSmk1nZUE07cxSZmVoNbD4Ua/O4= + +micromatch@^4.0.0, micromatch@^4.0.2, micromatch@^4.0.4: + version "4.0.4" + resolved "https://registry.yarnpkg.com/micromatch/-/micromatch-4.0.4.tgz#896d519dfe9db25fce94ceb7a500919bf881ebf9" + integrity sha512-pRmzw/XUcwXGpD9aI9q/0XOwLNygjETJ8y0ao0wdqprrzDa4YnxLcz7fQRZr8voh8V10kGhABbNcHVk5wHgWwg== + dependencies: + braces "^3.0.1" + picomatch "^2.2.3" + +mime-db@1.51.0: + version "1.51.0" + resolved "https://registry.yarnpkg.com/mime-db/-/mime-db-1.51.0.tgz#d9ff62451859b18342d960850dc3cfb77e63fb0c" + integrity sha512-5y8A56jg7XVQx2mbv1lu49NR4dokRnhZYTtL+KGfaa27uq4pSTXkwQkFJl4pkRMyNFz/EtYDSkiiEHx3F7UN6g== + +"mime-db@>= 1.43.0 < 2": + version "1.52.0" + resolved "https://registry.yarnpkg.com/mime-db/-/mime-db-1.52.0.tgz#bbabcdc02859f4987301c856e3387ce5ec43bf70" + integrity sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg== + +mime-types@^2.1.27, mime-types@^2.1.31, mime-types@~2.1.17, mime-types@~2.1.24, mime-types@~2.1.34: + version "2.1.34" + resolved "https://registry.yarnpkg.com/mime-types/-/mime-types-2.1.34.tgz#5a712f9ec1503511a945803640fafe09d3793c24" + integrity sha512-6cP692WwGIs9XXdOO4++N+7qjqv0rqxxVvJ3VHPh/Sc9mVZcQP+ZGhkKiTvWMQRr2tbHkJP/Yn7Y0npb3ZBs4A== + dependencies: + mime-db "1.51.0" + +mime@1.6.0: + version "1.6.0" + resolved "https://registry.yarnpkg.com/mime/-/mime-1.6.0.tgz#32cd9e5c64553bd58d19a568af452acff04981b1" + integrity sha512-x0Vn8spI+wuJ1O6S7gnbaQg8Pxh4NNHb7KSINmEWKiPE4RKOplvijn+NkmYmmRgP68mc70j2EbeTFRsrswaQeg== + +mimic-fn@^2.1.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/mimic-fn/-/mimic-fn-2.1.0.tgz#7ed2c2ccccaf84d3ffcb7a69b57711fc2083401b" + integrity sha512-OqbOk5oEQeAZ8WXWydlu9HJjz9WVdEIvamMCcXmuqUYjTknH/sqsWvhQ3vgwKFRR1HpjvNBKQ37nbJgYzGqGcg== + +minimalistic-assert@^1.0.0: + version "1.0.1" + resolved "https://registry.yarnpkg.com/minimalistic-assert/-/minimalistic-assert-1.0.1.tgz#2e194de044626d4a10e7f7fbc00ce73e83e4d5c7" + integrity sha512-UtJcAD4yEaGtjPezWuO9wC4nwUnVH/8/Im3yEHQP4b67cXlD/Qr9hdITCU1xDbSEXg2XKNaP8jsReV7vQd00/A== + +minimatch@^3.0.4: + version "3.1.2" + resolved "https://registry.yarnpkg.com/minimatch/-/minimatch-3.1.2.tgz#19cd194bfd3e428f049a70817c038d89ab4be35b" + integrity sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw== + dependencies: + brace-expansion "^1.1.7" + +minimist@^1.2.5: + version "1.2.5" + resolved "https://registry.yarnpkg.com/minimist/-/minimist-1.2.5.tgz#67d66014b66a6a8aaa0c083c5fd58df4e4e97602" + integrity sha512-FM9nNUYrRBAELZQT3xeZQ7fmMOBg6nWNmJKTcgsJeaLstP/UODVpGsr5OhXhhXg6f+qtJ8uiZ+PUxkDWcgIXLw== + +mkdirp@^0.5.5: + version "0.5.5" + resolved "https://registry.yarnpkg.com/mkdirp/-/mkdirp-0.5.5.tgz#d91cefd62d1436ca0f41620e251288d420099def" + integrity sha512-NKmAlESf6jMGym1++R0Ra7wvhV+wFW63FaSOFPwRahvea0gMUcGUhVeAg/0BC0wiv9ih5NYPB1Wn1UEI1/L+xQ== + dependencies: + minimist "^1.2.5" + +moment@^2.24.0, moment@^2.25.3: + version "2.29.1" + resolved 
"https://registry.yarnpkg.com/moment/-/moment-2.29.1.tgz#b2be769fa31940be9eeea6469c075e35006fa3d3" + integrity sha512-kHmoybcPV8Sqy59DwNDY3Jefr64lK/by/da0ViFcuA4DH0vQg5Q6Ze5VimxkfQNSC+Mls/Kx53s7TjP1RhFEDQ== + +ms@2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/ms/-/ms-2.0.0.tgz#5608aeadfc00be6c2901df5f9861788de0d597c8" + integrity sha1-VgiurfwAvmwpAd9fmGF4jeDVl8g= + +ms@2.1.2: + version "2.1.2" + resolved "https://registry.yarnpkg.com/ms/-/ms-2.1.2.tgz#d09d1f357b443f493382a8eb3ccd183872ae6009" + integrity sha512-sGkPx+VjMtmA6MX27oA4FBFELFCZZ4S4XqeGOXCv68tT+jb3vk/RyaKWP0PTKyWtmLSM0b+adUTEvbs1PEaH2w== + +ms@2.1.3, ms@^2.1.1: + version "2.1.3" + resolved "https://registry.yarnpkg.com/ms/-/ms-2.1.3.tgz#574c8138ce1d2b5861f0b44579dbadd60c6615b2" + integrity sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA== + +multicast-dns-service-types@^1.1.0: + version "1.1.0" + resolved "https://registry.yarnpkg.com/multicast-dns-service-types/-/multicast-dns-service-types-1.1.0.tgz#899f11d9686e5e05cb91b35d5f0e63b773cfc901" + integrity sha1-iZ8R2WhuXgXLkbNdXw5jt3PPyQE= + +multicast-dns@^6.0.1: + version "6.2.3" + resolved "https://registry.yarnpkg.com/multicast-dns/-/multicast-dns-6.2.3.tgz#a0ec7bd9055c4282f790c3c82f4e28db3b31b229" + integrity sha512-ji6J5enbMyGRHIAkAOu3WdV8nggqviKCEKtXcOqfphZZtQrmHKycfynJ2V7eVPUA4NhJ6V7Wf4TmGbTwKE9B6g== + dependencies: + dns-packet "^1.3.1" + thunky "^1.0.2" + +nanoid@^3.1.31, nanoid@^3.3.1: + version "3.3.1" + resolved "https://registry.yarnpkg.com/nanoid/-/nanoid-3.3.1.tgz#6347a18cac88af88f58af0b3594b723d5e99bb35" + integrity sha512-n6Vs/3KGyxPQd6uO0eH4Bv0ojGSUvuLlIHtC3Y0kEO23YRge8H9x1GCzLn28YX0H66pMkxuaeESFq4tKISKwdw== + +negotiator@0.6.3: + version "0.6.3" + resolved "https://registry.yarnpkg.com/negotiator/-/negotiator-0.6.3.tgz#58e323a72fedc0d6f9cd4d31fe49f51479590ccd" + integrity sha512-+EUsqGPLsM+j/zdChZjsnX51g4XrHFOIXwfnCVPGlQk/k5giakcKsuxCObBRu6DSm9opw/O6slWbJdghQM4bBg== + +neo-async@^2.6.2: + version "2.6.2" + resolved "https://registry.yarnpkg.com/neo-async/-/neo-async-2.6.2.tgz#b4aafb93e3aeb2d8174ca53cf163ab7d7308305f" + integrity sha512-Yd3UES5mWCSqR+qNT93S3UoYUkqAZ9lLg8a7g9rimsWmYGK8cVToA4/sF3RrshdyV3sAGMXVUmpMYOw+dLpOuw== + +no-case@^3.0.4: + version "3.0.4" + resolved "https://registry.yarnpkg.com/no-case/-/no-case-3.0.4.tgz#d361fd5c9800f558551a8369fc0dcd4662b6124d" + integrity sha512-fgAN3jGAh+RoxUGZHTSOLJIqUc2wmoBwGR4tbpNAKmmovFoWq0OdRkb0VkldReO2a2iBT/OEulG9XSUc10r3zg== + dependencies: + lower-case "^2.0.2" + tslib "^2.0.3" + +node-fetch@^1.0.1, node-fetch@^2.6.1: + version "2.6.7" + resolved "https://registry.yarnpkg.com/node-fetch/-/node-fetch-2.6.7.tgz#24de9fba827e3b4ae44dc8b20256a379160052ad" + integrity sha512-ZjMPFEfVx5j+y2yF35Kzx5sF7kDzxuDj6ziH4FFbOp87zKDZNx8yExJIb05OGF4Nlt9IHFIMBkRl41VdvcNdbQ== + dependencies: + whatwg-url "^5.0.0" + +node-forge@^1.2.0: + version "1.2.1" + resolved "https://registry.yarnpkg.com/node-forge/-/node-forge-1.2.1.tgz#82794919071ef2eb5c509293325cec8afd0fd53c" + integrity sha512-Fcvtbb+zBcZXbTTVwqGA5W+MKBj56UjVRevvchv5XrcyXbmNdesfZL37nlcWOfpgHhgmxApw3tQbTr4CqNmX4w== + +node-releases@^2.0.2: + version "2.0.2" + resolved "https://registry.yarnpkg.com/node-releases/-/node-releases-2.0.2.tgz#7139fe71e2f4f11b47d4d2986aaf8c48699e0c01" + integrity sha512-XxYDdcQ6eKqp/YjI+tb2C5WM2LgjnZrfYg4vgQt49EK268b6gYCHsBLrK2qvJo4FmCtqmKezb0WZFK4fkrZNsg== + +normalize-path@^3.0.0, normalize-path@~3.0.0: + version "3.0.0" + resolved 
"https://registry.yarnpkg.com/normalize-path/-/normalize-path-3.0.0.tgz#0dcd69ff23a1c9b11fd0978316644a0388216a65" + integrity sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA== + +npm-run-path@^4.0.1: + version "4.0.1" + resolved "https://registry.yarnpkg.com/npm-run-path/-/npm-run-path-4.0.1.tgz#b7ecd1e5ed53da8e37a55e1c2269e0b97ed748ea" + integrity sha512-S48WzZW777zhNIrn7gxOlISNAqi9ZC/uQFnRdbeIHhZhCA6UqpkOT8T1G7BvfdgP4Er8gF4sUbaS0i7QvIfCWw== + dependencies: + path-key "^3.0.0" + +nth-check@^2.0.1: + version "2.0.1" + resolved "https://registry.yarnpkg.com/nth-check/-/nth-check-2.0.1.tgz#2efe162f5c3da06a28959fbd3db75dbeea9f0fc2" + integrity sha512-it1vE95zF6dTT9lBsYbxvqh0Soy4SPowchj0UBGj/V6cTPnXXtQOPUbhZ6CmGzAD/rW22LQK6E96pcdJXk4A4w== + dependencies: + boolbase "^1.0.0" + +object-assign@^4.1.1: + version "4.1.1" + resolved "https://registry.yarnpkg.com/object-assign/-/object-assign-4.1.1.tgz#2109adc7965887cfc05cbbd442cac8bfbb360863" + integrity sha1-IQmtx5ZYh8/AXLvUQsrIv7s2CGM= + +object-is@^1.0.1: + version "1.1.5" + resolved "https://registry.yarnpkg.com/object-is/-/object-is-1.1.5.tgz#b9deeaa5fc7f1846a0faecdceec138e5778f53ac" + integrity sha512-3cyDsyHgtmi7I7DfSSI2LDp6SK2lwvtbg0p0R1e0RvTqF5ceGx+K2dfSjm1bKDMVCFEDAQvy+o8c6a7VujOddw== + dependencies: + call-bind "^1.0.2" + define-properties "^1.1.3" + +object-keys@^1.0.12, object-keys@^1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/object-keys/-/object-keys-1.1.1.tgz#1c47f272df277f3b1daf061677d9c82e2322c60e" + integrity sha512-NuAESUOUMrlIXOfHKzD6bpPu3tYt3xvjNdRIQ+FeT0lNb4K8WR70CaDxhuNguS2XG+GjkyMwOzsN5ZktImfhLA== + +obuf@^1.0.0, obuf@^1.1.2: + version "1.1.2" + resolved "https://registry.yarnpkg.com/obuf/-/obuf-1.1.2.tgz#09bea3343d41859ebd446292d11c9d4db619084e" + integrity sha512-PX1wu0AmAdPqOL1mWhqmlOd8kOIZQwGZw6rh7uby9fTc5lhaOWFLX3I6R1hrF9k3zUY40e6igsLGkDXK92LJNg== + +on-finished@~2.3.0: + version "2.3.0" + resolved "https://registry.yarnpkg.com/on-finished/-/on-finished-2.3.0.tgz#20f1336481b083cd75337992a16971aa2d906947" + integrity sha1-IPEzZIGwg811M3mSoWlxqi2QaUc= + dependencies: + ee-first "1.1.1" + +on-headers@~1.0.2: + version "1.0.2" + resolved "https://registry.yarnpkg.com/on-headers/-/on-headers-1.0.2.tgz#772b0ae6aaa525c399e489adfad90c403eb3c28f" + integrity sha512-pZAE+FJLoyITytdqK0U5s+FIpjN0JP3OzFi/u8Rx+EV5/W+JTWGXG8xFzevE7AjBfDqHv/8vL8qQsIhHnqRkrA== + +once@^1.3.0: + version "1.4.0" + resolved "https://registry.yarnpkg.com/once/-/once-1.4.0.tgz#583b1aa775961d4b113ac17d9c50baef9dd76bd1" + integrity sha1-WDsap3WWHUsROsF9nFC6753Xa9E= + dependencies: + wrappy "1" + +onetime@^5.1.2: + version "5.1.2" + resolved "https://registry.yarnpkg.com/onetime/-/onetime-5.1.2.tgz#d0e96ebb56b07476df1dd9c4806e5237985ca45e" + integrity sha512-kbpaSSGJTWdAY5KPVeMOKXSrPtr8C8C7wodJbcsd51jRnmD+GZu8Y0VoU6Dm5Z4vWr0Ig/1NKuWRKf7j5aaYSg== + dependencies: + mimic-fn "^2.1.0" + +open@^8.0.9: + version "8.4.0" + resolved "https://registry.yarnpkg.com/open/-/open-8.4.0.tgz#345321ae18f8138f82565a910fdc6b39e8c244f8" + integrity sha512-XgFPPM+B28FtCCgSb9I+s9szOC1vZRSwgWsRUA5ylIxRTgKozqjOCrVOqGsYABPYK5qnfqClxZTFBa8PKt2v6Q== + dependencies: + define-lazy-prop "^2.0.0" + is-docker "^2.1.1" + is-wsl "^2.2.0" + +p-limit@^2.2.0: + version "2.3.0" + resolved "https://registry.yarnpkg.com/p-limit/-/p-limit-2.3.0.tgz#3dd33c647a214fdfffd835933eb086da0dc21db1" + integrity sha512-//88mFWSJx8lxCzwdAABTJL2MyWB12+eIY7MDL2SqLmAkeKU9qxRvWuSyTjm3FUmpBEMuFfckAIqEaVGUDxb6w== + dependencies: + p-try "^2.0.0" + 
+p-locate@^4.1.0: + version "4.1.0" + resolved "https://registry.yarnpkg.com/p-locate/-/p-locate-4.1.0.tgz#a3428bb7088b3a60292f66919278b7c297ad4f07" + integrity sha512-R79ZZ/0wAxKGu3oYMlz8jy/kbhsNrS7SKZ7PxEHBgJ5+F2mtFW2fK2cOtBh1cHYkQsbzFV7I+EoRKe6Yt0oK7A== + dependencies: + p-limit "^2.2.0" + +p-map@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/p-map/-/p-map-4.0.0.tgz#bb2f95a5eda2ec168ec9274e06a747c3e2904d2b" + integrity sha512-/bjOqmgETBYB5BoEeGVea8dmvHb2m9GLy1E9W43yeyfP6QQCZGFNa+XRceJEuDB6zqr+gKpIAmlLebMpykw/MQ== + dependencies: + aggregate-error "^3.0.0" + +p-retry@^4.5.0: + version "4.6.1" + resolved "https://registry.yarnpkg.com/p-retry/-/p-retry-4.6.1.tgz#8fcddd5cdf7a67a0911a9cf2ef0e5df7f602316c" + integrity sha512-e2xXGNhZOZ0lfgR9kL34iGlU8N/KO0xZnQxVEwdeOvpqNDQfdnxIYizvWtK8RglUa3bGqI8g0R/BdfzLMxRkiA== + dependencies: + "@types/retry" "^0.12.0" + retry "^0.13.1" + +p-try@^2.0.0: + version "2.2.0" + resolved "https://registry.yarnpkg.com/p-try/-/p-try-2.2.0.tgz#cb2868540e313d61de58fafbe35ce9004d5540e6" + integrity sha512-R4nPAVTAU0B9D35/Gk3uJf/7XYbQcyohSKdvAxIRSNghFl4e71hVoGnBNQz9cWaXxO2I10KTC+3jMdvvoKw6dQ== + +param-case@^3.0.4: + version "3.0.4" + resolved "https://registry.yarnpkg.com/param-case/-/param-case-3.0.4.tgz#7d17fe4aa12bde34d4a77d91acfb6219caad01c5" + integrity sha512-RXlj7zCYokReqWpOPH9oYivUzLYZ5vAPIfEmCTNViosC78F8F0H9y7T7gG2M39ymgutxF5gcFEsyZQSph9Bp3A== + dependencies: + dot-case "^3.0.4" + tslib "^2.0.3" + +parseurl@~1.3.2, parseurl@~1.3.3: + version "1.3.3" + resolved "https://registry.yarnpkg.com/parseurl/-/parseurl-1.3.3.tgz#9da19e7bee8d12dff0513ed5b76957793bc2e8d4" + integrity sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ== + +pascal-case@^3.1.2: + version "3.1.2" + resolved "https://registry.yarnpkg.com/pascal-case/-/pascal-case-3.1.2.tgz#b48e0ef2b98e205e7c1dae747d0b1508237660eb" + integrity sha512-uWlGT3YSnK9x3BQJaOdcZwrnV6hPpd8jFH1/ucpiLRPh/2zCVJKS19E4GvYHvaCcACn3foXZ0cLB9Wrx1KGe5g== + dependencies: + no-case "^3.0.4" + tslib "^2.0.3" + +path-exists@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/path-exists/-/path-exists-4.0.0.tgz#513bdbe2d3b95d7762e8c1137efa195c6c61b5b3" + integrity sha512-ak9Qy5Q7jYb2Wwcey5Fpvg2KoAc/ZIhLSLOSBmRmygPsGwkVVt0fZa0qrtMz+m6tJTAHfZQ8FnmB4MG4LWy7/w== + +path-is-absolute@^1.0.0: + version "1.0.1" + resolved "https://registry.yarnpkg.com/path-is-absolute/-/path-is-absolute-1.0.1.tgz#174b9268735534ffbc7ace6bf53a5a9e1b5c5f5f" + integrity sha1-F0uSaHNVNP+8es5r9TpanhtcX18= + +path-key@^3.0.0, path-key@^3.1.0: + version "3.1.1" + resolved "https://registry.yarnpkg.com/path-key/-/path-key-3.1.1.tgz#581f6ade658cbba65a0d3380de7753295054f375" + integrity sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q== + +path-parse@^1.0.7: + version "1.0.7" + resolved "https://registry.yarnpkg.com/path-parse/-/path-parse-1.0.7.tgz#fbc114b60ca42b30d9daf5858e4bd68bbedb6735" + integrity sha512-LDJzPVEEEPR+y48z93A0Ed0yXb8pAByGWo/k5YYdYgpY2/2EsOsksJrq7lOHxryrVOn1ejG6oAp8ahvOIQD8sw== + +path-to-regexp@0.1.7: + version "0.1.7" + resolved "https://registry.yarnpkg.com/path-to-regexp/-/path-to-regexp-0.1.7.tgz#df604178005f522f15eb4490e7247a1bfaa67f8c" + integrity sha1-32BBeABfUi8V60SQ5yR6G/qmf4w= + +path-type@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/path-type/-/path-type-4.0.0.tgz#84ed01c0a7ba380afe09d90a8c180dcd9d03043b" + integrity 
sha512-gDKb8aZMDeD/tZWs9P6+q0J9Mwkdl6xMV8TjnGP3qJVJ06bdMgkbBlLU8IdfOsIsFz2BW1rNVT3XuNEl8zPAvw== + +picocolors@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/picocolors/-/picocolors-1.0.0.tgz#cb5bdc74ff3f51892236eaf79d68bc44564ab81c" + integrity sha512-1fygroTLlHu66zi26VoTDv8yRgm0Fccecssto+MhsZ0D/DGW2sm8E8AjW7NU5VVTRt5GxbeZ5qBuJr+HyLYkjQ== + +picomatch@^2.0.4, picomatch@^2.2.1, picomatch@^2.2.3: + version "2.3.1" + resolved "https://registry.yarnpkg.com/picomatch/-/picomatch-2.3.1.tgz#3ba3833733646d9d3e4995946c1365a67fb07a42" + integrity sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA== + +pkg-dir@^4.2.0: + version "4.2.0" + resolved "https://registry.yarnpkg.com/pkg-dir/-/pkg-dir-4.2.0.tgz#f099133df7ede422e81d1d8448270eeb3e4261f3" + integrity sha512-HRDzbaKjC+AOWVXxAU/x54COGeIv9eb+6CkDSQoNTt4XyWoIJvuPsXizxu/Fr23EiekbtZwmh1IcIG/l/a10GQ== + dependencies: + find-up "^4.0.0" + +popper.js@1.16.1-lts: + version "1.16.1-lts" + resolved "https://registry.yarnpkg.com/popper.js/-/popper.js-1.16.1-lts.tgz#cf6847b807da3799d80ee3d6d2f90df8a3f50b05" + integrity sha512-Kjw8nKRl1m+VrSFCoVGPph93W/qrSO7ZkqPpTf7F4bk/sqcfWK019dWBUpE/fBOsOQY1dks/Bmcbfn1heM/IsA== + +portable-fetch@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/portable-fetch/-/portable-fetch-3.0.0.tgz#3cbf4aa6dbc5a5734b41c0419c9273313bfd9ad8" + integrity sha1-PL9KptvFpXNLQcBBnJJzMTv9mtg= + dependencies: + node-fetch "^1.0.1" + whatwg-fetch ">=0.10.0" + +portfinder@^1.0.28: + version "1.0.28" + resolved "https://registry.yarnpkg.com/portfinder/-/portfinder-1.0.28.tgz#67c4622852bd5374dd1dd900f779f53462fac778" + integrity sha512-Se+2isanIcEqf2XMHjyUKskczxbPH7dQnlMjXX6+dybayyHvAf/TCgyMRlzf/B6QDhAEFOGes0pzRo3by4AbMA== + dependencies: + async "^2.6.2" + debug "^3.1.1" + mkdirp "^0.5.5" + +postcss-modules-extract-imports@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/postcss-modules-extract-imports/-/postcss-modules-extract-imports-3.0.0.tgz#cda1f047c0ae80c97dbe28c3e76a43b88025741d" + integrity sha512-bdHleFnP3kZ4NYDhuGlVK+CMrQ/pqUm8bx/oGL93K6gVwiclvX5x0n76fYMKuIGKzlABOy13zsvqjb0f92TEXw== + +postcss-modules-local-by-default@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/postcss-modules-local-by-default/-/postcss-modules-local-by-default-4.0.0.tgz#ebbb54fae1598eecfdf691a02b3ff3b390a5a51c" + integrity sha512-sT7ihtmGSF9yhm6ggikHdV0hlziDTX7oFoXtuVWeDd3hHObNkcHRo9V3yg7vCAY7cONyxJC/XXCmmiHHcvX7bQ== + dependencies: + icss-utils "^5.0.0" + postcss-selector-parser "^6.0.2" + postcss-value-parser "^4.1.0" + +postcss-modules-scope@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/postcss-modules-scope/-/postcss-modules-scope-3.0.0.tgz#9ef3151456d3bbfa120ca44898dfca6f2fa01f06" + integrity sha512-hncihwFA2yPath8oZ15PZqvWGkWf+XUfQgUGamS4LqoP1anQLOsOJw0vr7J7IwLpoY9fatA2qiGUGmuZL0Iqlg== + dependencies: + postcss-selector-parser "^6.0.4" + +postcss-modules-values@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/postcss-modules-values/-/postcss-modules-values-4.0.0.tgz#d7c5e7e68c3bb3c9b27cbf48ca0bb3ffb4602c9c" + integrity sha512-RDxHkAiEGI78gS2ofyvCsu7iycRv7oqw5xMWn9iMoR0N/7mf9D50ecQqUo5BZ9Zh2vH4bCUR/ktCqbB9m8vJjQ== + dependencies: + icss-utils "^5.0.0" + +postcss-selector-parser@^6.0.2, postcss-selector-parser@^6.0.4: + version "6.0.9" + resolved "https://registry.yarnpkg.com/postcss-selector-parser/-/postcss-selector-parser-6.0.9.tgz#ee71c3b9ff63d9cd130838876c13a2ec1a992b2f" + integrity 
sha512-UO3SgnZOVTwu4kyLR22UQ1xZh086RyNZppb7lLAKBFK8a32ttG5i87Y/P3+2bRSjZNyJ1B7hfFNo273tKe9YxQ== + dependencies: + cssesc "^3.0.0" + util-deprecate "^1.0.2" + +postcss-value-parser@^4.1.0: + version "4.2.0" + resolved "https://registry.yarnpkg.com/postcss-value-parser/-/postcss-value-parser-4.2.0.tgz#723c09920836ba6d3e5af019f92bc0971c02e514" + integrity sha512-1NNCs6uurfkVbeXG4S8JFT9t19m45ICnif8zWLd5oPSZ50QnwMfK+H3jv408d4jw/7Bttv5axS5IiHoLaVNHeQ== + +postcss@^8.2.15: + version "8.4.8" + resolved "https://registry.yarnpkg.com/postcss/-/postcss-8.4.8.tgz#dad963a76e82c081a0657d3a2f3602ce10c2e032" + integrity sha512-2tXEqGxrjvAO6U+CJzDL2Fk2kPHTv1jQsYkSoMeOis2SsYaXRO2COxTdQp99cYvif9JTXaAk9lYGc3VhJt7JPQ== + dependencies: + nanoid "^3.3.1" + picocolors "^1.0.0" + source-map-js "^1.0.2" + +prettier@^2.1.2: + version "2.5.1" + resolved "https://registry.yarnpkg.com/prettier/-/prettier-2.5.1.tgz#fff75fa9d519c54cf0fce328c1017d94546bc56a" + integrity sha512-vBZcPRUR5MZJwoyi3ZoyQlc1rXeEck8KgeC9AwwOn+exuxLxq5toTRDTSaVrXHxelDMHy9zlicw8u66yxoSUFg== + +pretty-error@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/pretty-error/-/pretty-error-4.0.0.tgz#90a703f46dd7234adb46d0f84823e9d1cb8f10d6" + integrity sha512-AoJ5YMAcXKYxKhuJGdcvse+Voc6v1RgnsR3nWcYU7q4t6z0Q6T86sv5Zq8VIRbOWWFpvdGE83LtdSMNd+6Y0xw== + dependencies: + lodash "^4.17.20" + renderkid "^3.0.0" + +process-nextick-args@~2.0.0: + version "2.0.1" + resolved "https://registry.yarnpkg.com/process-nextick-args/-/process-nextick-args-2.0.1.tgz#7820d9b16120cc55ca9ae7792680ae7dba6d7fe2" + integrity sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag== + +prop-types@^15.6.2, prop-types@^15.7.2: + version "15.8.1" + resolved "https://registry.yarnpkg.com/prop-types/-/prop-types-15.8.1.tgz#67d87bf1a694f48435cf332c24af10214a3140b5" + integrity sha512-oj87CgZICdulUohogVAR7AjlC0327U4el4L6eAvOqCeudMDVU0NThNaV+b9Df4dXgSP1gXMTnPdhfe/2qDH5cg== + dependencies: + loose-envify "^1.4.0" + object-assign "^4.1.1" + react-is "^16.13.1" + +proxy-addr@~2.0.7: + version "2.0.7" + resolved "https://registry.yarnpkg.com/proxy-addr/-/proxy-addr-2.0.7.tgz#f19fe69ceab311eeb94b42e70e8c2070f9ba1025" + integrity sha512-llQsMLSUDUPT44jdrU/O37qlnifitDP+ZwrmmZcoSKyLKvtZxpyV0n2/bD/N4tBAAZ/gJEdZU7KMraoK1+XYAg== + dependencies: + forwarded "0.2.0" + ipaddr.js "1.9.1" + +prr@~1.0.1: + version "1.0.1" + resolved "https://registry.yarnpkg.com/prr/-/prr-1.0.1.tgz#d3fc114ba06995a45ec6893f484ceb1d78f5f476" + integrity sha1-0/wRS6BplaRexok/SEzrHXj19HY= + +punycode@^2.1.0: + version "2.1.1" + resolved "https://registry.yarnpkg.com/punycode/-/punycode-2.1.1.tgz#b58b010ac40c22c5657616c8d2c2c02c7bf479ec" + integrity sha512-XRsRjdf+j5ml+y/6GKHPZbrF/8p2Yga0JPtdqTIY2Xe5ohJPD9saDJJLPvp9+NSBprVvevdXZybnj2cv8OEd0A== + +qs@6.9.7: + version "6.9.7" + resolved "https://registry.yarnpkg.com/qs/-/qs-6.9.7.tgz#4610846871485e1e048f44ae3b94033f0e675afe" + integrity sha512-IhMFgUmuNpyRfxA90umL7ByLlgRXu6tIfKPpF5TmcfRLlLCckfP/g3IQmju6jjpu+Hh8rA+2p6A27ZSPOOHdKw== + +queue-microtask@^1.2.2: + version "1.2.3" + resolved "https://registry.yarnpkg.com/queue-microtask/-/queue-microtask-1.2.3.tgz#4929228bbc724dfac43e0efb058caf7b6cfb6243" + integrity sha512-NuaNSa6flKT5JaSYQzJok04JzTL1CA6aGhv5rfLW3PgqA+M2ChpZQnAC8h8i4ZFkBS8X5RqkDBHA7r4hej3K9A== + +randombytes@^2.1.0: + version "2.1.0" + resolved "https://registry.yarnpkg.com/randombytes/-/randombytes-2.1.0.tgz#df6f84372f0270dc65cdf6291349ab7a473d4f2a" + integrity 
sha512-vYl3iOX+4CKUWuxGi9Ukhie6fsqXqS9FE2Zaic4tNFD2N2QQaXOMFbuKK4QmDHC0JO6B1Zp41J0LpT0oR68amQ== + dependencies: + safe-buffer "^5.1.0" + +range-parser@^1.2.1, range-parser@~1.2.1: + version "1.2.1" + resolved "https://registry.yarnpkg.com/range-parser/-/range-parser-1.2.1.tgz#3cf37023d199e1c24d1a55b84800c2f3e6468031" + integrity sha512-Hrgsx+orqoygnmhFbKaHE6c296J+HTAQXoxEF6gNupROmmGJRoyzfG3ccAveqCBrwr/2yxQ5BVd/GTl5agOwSg== + +raw-body@2.4.3: + version "2.4.3" + resolved "https://registry.yarnpkg.com/raw-body/-/raw-body-2.4.3.tgz#8f80305d11c2a0a545c2d9d89d7a0286fcead43c" + integrity sha512-UlTNLIcu0uzb4D2f4WltY6cVjLi+/jEN4lgEUj3E04tpMDpUlkBo/eSn6zou9hum2VMNpCCUone0O0WeJim07g== + dependencies: + bytes "3.1.2" + http-errors "1.8.1" + iconv-lite "0.4.24" + unpipe "1.0.0" + +rc-align@^4.0.0: + version "4.0.11" + resolved "https://registry.yarnpkg.com/rc-align/-/rc-align-4.0.11.tgz#8198c62db266bc1b8ef05e56c13275bf72628a5e" + integrity sha512-n9mQfIYQbbNTbefyQnRHZPWuTEwG1rY4a9yKlIWHSTbgwI+XUMGRYd0uJ5pE2UbrNX0WvnMBA1zJ3Lrecpra/A== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + dom-align "^1.7.0" + lodash "^4.17.21" + rc-util "^5.3.0" + resize-observer-polyfill "^1.5.1" + +rc-cascader@~3.2.1: + version "3.2.7" + resolved "https://registry.yarnpkg.com/rc-cascader/-/rc-cascader-3.2.7.tgz#74ac3ab9258f930e0c84dfacffd838b122b2cedf" + integrity sha512-M8VtKtifTXXo/qqXj63p12tsMNXm1z45Lytj7tu86L6gxIF8keDPcJ16/ZqrhS5JwlBPfoJNA1VooNl/KId15A== + dependencies: + "@babel/runtime" "^7.12.5" + array-tree-filter "^2.1.0" + classnames "^2.3.1" + rc-select "~14.0.0-alpha.23" + rc-tree "~5.4.3" + rc-util "^5.6.1" + +rc-checkbox@~2.3.0: + version "2.3.2" + resolved "https://registry.yarnpkg.com/rc-checkbox/-/rc-checkbox-2.3.2.tgz#f91b3678c7edb2baa8121c9483c664fa6f0aefc1" + integrity sha512-afVi1FYiGv1U0JlpNH/UaEXdh6WUJjcWokj/nUN2TgG80bfG+MDdbfHKlLcNNba94mbjy2/SXJ1HDgrOkXGAjg== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.1" + +rc-collapse@~3.1.0: + version "3.1.2" + resolved "https://registry.yarnpkg.com/rc-collapse/-/rc-collapse-3.1.2.tgz#76028a811b845d03d9460ccc409c7ea8ad09db14" + integrity sha512-HujcKq7mghk/gVKeI6EjzTbb8e19XUZpakrYazu1MblEZ3Hu3WBMSN4A3QmvbF6n1g7x6lUlZvsHZ5shABWYOQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-motion "^2.3.4" + rc-util "^5.2.1" + shallowequal "^1.1.0" + +rc-dialog@~8.6.0: + version "8.6.0" + resolved "https://registry.yarnpkg.com/rc-dialog/-/rc-dialog-8.6.0.tgz#3b228dac085de5eed8c6237f31162104687442e7" + integrity sha512-GSbkfqjqxpZC5/zc+8H332+q5l/DKUhpQr0vdX2uDsxo5K0PhvaMEVjyoJUTkZ3+JstEADQji1PVLVb/2bJeOQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.6" + rc-motion "^2.3.0" + rc-util "^5.6.1" + +rc-drawer@~4.4.2: + version "4.4.3" + resolved "https://registry.yarnpkg.com/rc-drawer/-/rc-drawer-4.4.3.tgz#2094937a844e55dc9644236a2d9fba79c344e321" + integrity sha512-FYztwRs3uXnFOIf1hLvFxIQP9MiZJA+0w+Os8dfDh/90X7z/HqP/Yg+noLCIeHEbKln1Tqelv8ymCAN24zPcfQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.6" + rc-util "^5.7.0" + +rc-dropdown@^3.2.0, rc-dropdown@~3.3.2: + version "3.3.2" + resolved "https://registry.yarnpkg.com/rc-dropdown/-/rc-dropdown-3.3.2.tgz#097c2ec1b6d55c10eeb94dcf6120ba034c7a58e0" + integrity sha512-49GOz42oNvLtYGoJ2X5UWXJFp7aUiSZkj9OcgTV1UpxFZqHQMw+xijkaL5k3XDkMbb92XsuFnFt7IGG3/C0DKw== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.6" + rc-trigger "^5.0.4" + +rc-field-form@~1.23.0: + version "1.23.1" + resolved 
"https://registry.yarnpkg.com/rc-field-form/-/rc-field-form-1.23.1.tgz#638c11d05d7ed2efdcb862ff3da5fe2a7d199aaa" + integrity sha512-Mun+eaFmX1Pjud9bz0fD0IvxwDfFKWk2Q8tkt4sg4aKR9/FML/rzYC5MjY77p86X45XBurBDUR3gAda+Cg/ULw== + dependencies: + "@babel/runtime" "^7.8.4" + async-validator "^4.0.2" + rc-util "^5.8.0" + +rc-image@~5.2.5: + version "5.2.5" + resolved "https://registry.yarnpkg.com/rc-image/-/rc-image-5.2.5.tgz#44e6ffc842626827960e7ab72e1c0d6f3a8ce440" + integrity sha512-qUfZjYIODxO0c8a8P5GeuclYXZjzW4hV/5hyo27XqSFo1DmTCs2HkVeQObkcIk5kNsJtgsj1KoPThVsSc/PXOw== + dependencies: + "@babel/runtime" "^7.11.2" + classnames "^2.2.6" + rc-dialog "~8.6.0" + rc-util "^5.0.6" + +rc-input-number@~7.3.0: + version "7.3.4" + resolved "https://registry.yarnpkg.com/rc-input-number/-/rc-input-number-7.3.4.tgz#674aea98260250287d36e330a7e065b174486e9d" + integrity sha512-W9uqSzuvJUnz8H8vsVY4kx+yK51SsAxNTwr8SNH4G3XqQNocLVmKIibKFRjocnYX1RDHMND9FFbgj2h7E7nvGA== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.5" + rc-util "^5.9.8" + +rc-input@^0.0.1-alpha.5: + version "0.0.1-alpha.5" + resolved "https://registry.yarnpkg.com/rc-input/-/rc-input-0.0.1-alpha.5.tgz#cc043c44570c651f4d10d9809b3d634ed12537e6" + integrity sha512-RHvNweOVWFbbx2l/y6hgnSAdOg5fXc1D1VGhX2RNkGGyGr6cemnvyiYMxwZJjcXs0al3YK9jMObm20+DgH/mpw== + dependencies: + "@babel/runtime" "^7.11.1" + classnames "^2.2.1" + rc-util "^5.18.1" + +rc-mentions@~1.6.1: + version "1.6.2" + resolved "https://registry.yarnpkg.com/rc-mentions/-/rc-mentions-1.6.2.tgz#62ed7cdd8fa86d857c3ce3f9e73438022130815e" + integrity sha512-cntfJkNMq8B910rXuvnsnOV88DfmoUidnQnSIeXzWiYiUX4RL5oWUfSZzs+HAXYRU4SL1l8Mwjx95wHETiZ/fQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.6" + rc-menu "^9.0.0" + rc-textarea "^0.3.0" + rc-trigger "^5.0.4" + rc-util "^5.0.1" + +rc-menu@^9.0.0: + version "9.3.2" + resolved "https://registry.yarnpkg.com/rc-menu/-/rc-menu-9.3.2.tgz#bb842d37ebf71da912bea201cf7ef0a27267ad49" + integrity sha512-h3m45oY1INZyqphGELkdT0uiPnFzxkML8m0VMhJnk2fowtqfiT7F5tJLT3znEVaPIY80vMy1bClCkgq8U91CzQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-motion "^2.4.3" + rc-overflow "^1.2.0" + rc-trigger "^5.1.2" + rc-util "^5.12.0" + shallowequal "^1.1.0" + +rc-menu@~9.2.1: + version "9.2.1" + resolved "https://registry.yarnpkg.com/rc-menu/-/rc-menu-9.2.1.tgz#6fbe47f4846363bb81a5a21f0960026c3ada497a" + integrity sha512-UbEtn3rflJ8zS+etYGTVQuzy7Fm+yWXR5c0Rl6ecNTS/dPknRyWAyhJcbeR0Hu1+RdQT+0VCqrUPrgKnm4iY+w== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-motion "^2.4.3" + rc-overflow "^1.2.0" + rc-trigger "^5.1.2" + rc-util "^5.12.0" + shallowequal "^1.1.0" + +rc-motion@^2.0.0, rc-motion@^2.0.1, rc-motion@^2.2.0, rc-motion@^2.3.0, rc-motion@^2.3.4, rc-motion@^2.4.3, rc-motion@^2.4.4: + version "2.4.5" + resolved "https://registry.yarnpkg.com/rc-motion/-/rc-motion-2.4.5.tgz#b061c50bb29ecd3d735d5f4c40924a3c78226cbd" + integrity sha512-f3uJHR4gcpeZS/s8/nYFSOrXt2Wu/h9GrEcbJmC0qmKrVNgwL1pTgrT5kW7lgG6PFeoL4yHDmpQoEKkrPtKIzQ== + dependencies: + "@babel/runtime" "^7.11.1" + classnames "^2.2.1" + rc-util "^5.18.1" + +rc-notification@~4.5.7: + version "4.5.7" + resolved "https://registry.yarnpkg.com/rc-notification/-/rc-notification-4.5.7.tgz#265e6e6a0c1a0fac63d6abd4d832eb8ff31522f1" + integrity sha512-zhTGUjBIItbx96SiRu3KVURcLOydLUHZCPpYEn1zvh+re//Tnq/wSxN4FKgp38n4HOgHSVxcLEeSxBMTeBBDdw== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-motion "^2.2.0" + rc-util "^5.0.1" + +rc-overflow@^1.0.0, 
rc-overflow@^1.2.0: + version "1.2.3" + resolved "https://registry.yarnpkg.com/rc-overflow/-/rc-overflow-1.2.3.tgz#1754216d807f5473304272b0321c3aba7615f47a" + integrity sha512-Bz6dXTn/ww8nmu70tUQfRV0wT3BkfXY6j1lB1O38OVkDPz4xwfAcGK+LJ2zewUR5cTXkJ8hAN7YULohG8z4M7Q== + dependencies: + "@babel/runtime" "^7.11.1" + classnames "^2.2.1" + rc-resize-observer "^1.0.0" + rc-util "^5.15.0" + +rc-pagination@~3.1.9: + version "3.1.15" + resolved "https://registry.yarnpkg.com/rc-pagination/-/rc-pagination-3.1.15.tgz#e05eddf4c15717a5858290bed0857e27e2f957ff" + integrity sha512-4L3fot8g4E+PjWEgoVGX0noFCg+8ZFZmeLH4vsnZpB3O2T2zThtakjNxG+YvSaYtyMVT4B+GLayjKrKbXQpdAg== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.1" + +rc-picker@~2.6.4: + version "2.6.4" + resolved "https://registry.yarnpkg.com/rc-picker/-/rc-picker-2.6.4.tgz#916aa5fcd8abd11106f1c2fb64bfd549439abfa0" + integrity sha512-Mnc1udPyGNSG7/ya5SmYltUjCUcsMH7jfJnuuXVAvEaEdx9qZxDGMWtIii//+ARC06CSHQ83s5iwiGFwM+FcDw== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.1" + date-fns "2.x" + dayjs "1.x" + moment "^2.24.0" + rc-trigger "^5.0.4" + rc-util "^5.4.0" + shallowequal "^1.1.0" + +rc-progress@~3.2.1: + version "3.2.4" + resolved "https://registry.yarnpkg.com/rc-progress/-/rc-progress-3.2.4.tgz#4036acdae2566438545bc4df2203248babaf7549" + integrity sha512-M9WWutRaoVkPUPIrTpRIDpX0SPSrVHzxHdCRCbeoBFrd9UFWTYNWRlHsruJM5FH1AZI+BwB4wOJUNNylg/uFSw== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.6" + rc-util "^5.16.1" + +rc-rate@~2.9.0: + version "2.9.1" + resolved "https://registry.yarnpkg.com/rc-rate/-/rc-rate-2.9.1.tgz#e43cb95c4eb90a2c1e0b16ec6614d8c43530a731" + integrity sha512-MmIU7FT8W4LYRRHJD1sgG366qKtSaKb67D0/vVvJYR0lrCuRrCiVQ5qhfT5ghVO4wuVIORGpZs7ZKaYu+KMUzA== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.5" + rc-util "^5.0.1" + +rc-resize-observer@^1.0.0, rc-resize-observer@^1.1.0, rc-resize-observer@^1.2.0: + version "1.2.0" + resolved "https://registry.yarnpkg.com/rc-resize-observer/-/rc-resize-observer-1.2.0.tgz#9f46052f81cdf03498be35144cb7c53fd282c4c7" + integrity sha512-6W+UzT3PyDM0wVCEHfoW3qTHPTvbdSgiA43buiy8PzmeMnfgnDeb9NjdimMXMl3/TcrvvWl5RRVdp+NqcR47pQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.1" + rc-util "^5.15.0" + resize-observer-polyfill "^1.5.1" + +rc-select@~14.0.0-alpha.15, rc-select@~14.0.0-alpha.23, rc-select@~14.0.0-alpha.8: + version "14.0.0" + resolved "https://registry.yarnpkg.com/rc-select/-/rc-select-14.0.0.tgz#87735dbc548f1cc8e94d579b21682ed2d34f7653" + integrity sha512-DkoWMhyxmrfpc1KJSqPORZdkKevzgOINvjR4WI+dibRe6i6DyqGB4Jk21sencnK9di6dumzOCHf93x9t9+gp3Q== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-motion "^2.0.1" + rc-overflow "^1.0.0" + rc-trigger "^5.0.4" + rc-util "^5.16.1" + rc-virtual-list "^3.2.0" + +rc-slider@~10.0.0-alpha.4: + version "10.0.0-alpha.4" + resolved "https://registry.yarnpkg.com/rc-slider/-/rc-slider-10.0.0-alpha.4.tgz#f14ec0905d53f1f9d7f495c301527d6eca5781cf" + integrity sha512-ih2xwkBgXAWAf7MjZIZyCiiWo6tnoIMuHifn0UeKXVAup7sH53QdSVvT9x/cysuSZIPNMYWEf6mec184n3gbiQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.5" + rc-tooltip "^5.0.1" + rc-util "^5.18.1" + shallowequal "^1.1.0" + +rc-steps@~4.1.0: + version "4.1.4" + resolved "https://registry.yarnpkg.com/rc-steps/-/rc-steps-4.1.4.tgz#0ba82db202d59ca52d0693dc9880dd145b19dc23" + integrity sha512-qoCqKZWSpkh/b03ASGx1WhpKnuZcRWmvuW+ZUu4mvMdfvFzVxblTwUM+9aBd0mlEUFmt6GW8FXhMpHkK3Uzp3w== + dependencies: + "@babel/runtime" 
"^7.10.2" + classnames "^2.2.3" + rc-util "^5.0.1" + +rc-switch@~3.2.0: + version "3.2.2" + resolved "https://registry.yarnpkg.com/rc-switch/-/rc-switch-3.2.2.tgz#d001f77f12664d52595b4f6fb425dd9e66fba8e8" + integrity sha512-+gUJClsZZzvAHGy1vZfnwySxj+MjLlGRyXKXScrtCTcmiYNPzxDFOxdQ/3pK1Kt/0POvwJ/6ALOR8gwdXGhs+A== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.1" + rc-util "^5.0.1" + +rc-table@~7.23.0: + version "7.23.0" + resolved "https://registry.yarnpkg.com/rc-table/-/rc-table-7.23.0.tgz#e5f76998ecf3246147d45ed311417c08886e6507" + integrity sha512-Q1gneB2+lUa8EzCCfbrq+jO1qNSwQv1RUUXKB84W/Stdp4EvGOt2+QqGyfotMNM4JUw0fgGLwY+WjnhUhnLuQQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.5" + rc-resize-observer "^1.1.0" + rc-util "^5.14.0" + shallowequal "^1.1.0" + +rc-tabs@~11.10.0: + version "11.10.7" + resolved "https://registry.yarnpkg.com/rc-tabs/-/rc-tabs-11.10.7.tgz#7d8b5dcc17f1608cf3b9425d80069f1415479335" + integrity sha512-7IKmcU7QU3CdYnJTabeXs2DDeLiXLyALC8fvOtgyWWFXUD47G5vG+4bFO3f9+AI+rcFAPpfwapZbXxgmiRuWYQ== + dependencies: + "@babel/runtime" "^7.11.2" + classnames "2.x" + rc-dropdown "^3.2.0" + rc-menu "^9.0.0" + rc-resize-observer "^1.0.0" + rc-util "^5.5.0" + +rc-textarea@^0.3.0, rc-textarea@~0.3.0: + version "0.3.7" + resolved "https://registry.yarnpkg.com/rc-textarea/-/rc-textarea-0.3.7.tgz#987142891efdedb774883c07e2f51b318fde5a11" + integrity sha512-yCdZ6binKmAQB13hc/oehh0E/QRwoPP1pjF21aHBxlgXO3RzPF6dUu4LG2R4FZ1zx/fQd2L1faktulrXOM/2rw== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "^2.2.1" + rc-resize-observer "^1.0.0" + rc-util "^5.7.0" + shallowequal "^1.1.0" + +rc-tooltip@^5.0.1, rc-tooltip@~5.1.1: + version "5.1.1" + resolved "https://registry.yarnpkg.com/rc-tooltip/-/rc-tooltip-5.1.1.tgz#94178ed162d0252bc4993b725f5dc2ac0fccf154" + integrity sha512-alt8eGMJulio6+4/uDm7nvV+rJq9bsfxFDCI0ljPdbuoygUscbsMYb6EQgwib/uqsXQUvzk+S7A59uYHmEgmDA== + dependencies: + "@babel/runtime" "^7.11.2" + rc-trigger "^5.0.0" + +rc-tree-select@~5.1.1: + version "5.1.4" + resolved "https://registry.yarnpkg.com/rc-tree-select/-/rc-tree-select-5.1.4.tgz#3577135399d1f4931b0f4d8245e0845861802e2b" + integrity sha512-sA6vTUQghzbjh3u6YAwJIebKkJEHUWDPFHQpfiPObqsEYqi9TKE1LvWqbJ77NbOlOARZq0KIb7LDGF8X0dikDQ== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-select "~14.0.0-alpha.8" + rc-tree "~5.4.3" + rc-util "^5.16.1" + +rc-tree@~5.4.3: + version "5.4.4" + resolved "https://registry.yarnpkg.com/rc-tree/-/rc-tree-5.4.4.tgz#2ea3663ad3c566aef79a46ba6a1e050d24323e01" + integrity sha512-2qoObRgp31DBXmVzMJmo4qmwP20XEa4hR3imWQtRPcgN3pmljW3WKFmZRrYdOFHz7CyTnRsFZR065bBkIoUpiA== + dependencies: + "@babel/runtime" "^7.10.1" + classnames "2.x" + rc-motion "^2.0.1" + rc-util "^5.16.1" + rc-virtual-list "^3.4.2" + +rc-trigger@^5.0.0, rc-trigger@^5.0.4, rc-trigger@^5.1.2, rc-trigger@^5.2.10: + version "5.2.10" + resolved "https://registry.yarnpkg.com/rc-trigger/-/rc-trigger-5.2.10.tgz#8a0057a940b1b9027eaa33beec8a6ecd85cce2b1" + integrity sha512-FkUf4H9BOFDaIwu42fvRycXMAvkttph9AlbCZXssZDVzz2L+QZ0ERvfB/4nX3ZFPh1Zd+uVGr1DEDeXxq4J1TA== + dependencies: + "@babel/runtime" "^7.11.2" + classnames "^2.2.6" + rc-align "^4.0.0" + rc-motion "^2.0.0" + rc-util "^5.5.0" + +rc-upload@~4.3.0: + version "4.3.3" + resolved "https://registry.yarnpkg.com/rc-upload/-/rc-upload-4.3.3.tgz#e237aa525e5313fa16f4d04d27f53c2f0e157bb8" + integrity sha512-YoJ0phCRenMj1nzwalXzciKZ9/FAaCrFu84dS5pphwucTC8GUWClcDID/WWNGsLFcM97NqIboDqrV82rVRhW/w== + dependencies: + "@babel/runtime" "^7.10.1" 
+ classnames "^2.2.5" + rc-util "^5.2.0" + +rc-util@^5.0.1, rc-util@^5.0.6, rc-util@^5.0.7, rc-util@^5.12.0, rc-util@^5.14.0, rc-util@^5.15.0, rc-util@^5.16.1, rc-util@^5.18.1, rc-util@^5.2.0, rc-util@^5.2.1, rc-util@^5.3.0, rc-util@^5.4.0, rc-util@^5.5.0, rc-util@^5.6.1, rc-util@^5.7.0, rc-util@^5.8.0, rc-util@^5.9.4, rc-util@^5.9.8: + version "5.18.1" + resolved "https://registry.yarnpkg.com/rc-util/-/rc-util-5.18.1.tgz#80bd1450b5254655d2fbea63e3d34f6871e9be79" + integrity sha512-24xaSrMZUEKh1+suDOtJWfPe9E6YrwryViZcoPO0miJTKzP4qhUlV5AAlKQ82AJilz/AOHfi3l6HoX8qa1ye8w== + dependencies: + "@babel/runtime" "^7.12.5" + react-is "^16.12.0" + shallowequal "^1.1.0" + +rc-virtual-list@^3.2.0, rc-virtual-list@^3.4.2: + version "3.4.2" + resolved "https://registry.yarnpkg.com/rc-virtual-list/-/rc-virtual-list-3.4.2.tgz#1078327aa7230b5e456d679ed2ce99f3c036ebd1" + integrity sha512-OyVrrPvvFcHvV0ssz5EDZ+7Rf5qLat/+mmujjchNw5FfbJWNDwkpQ99EcVE6+FtNRmX9wFa1LGNpZLUTvp/4GQ== + dependencies: + classnames "^2.2.6" + rc-resize-observer "^1.0.0" + rc-util "^5.0.7" + +react-dom@^16.13.1: + version "16.14.0" + resolved "https://registry.yarnpkg.com/react-dom/-/react-dom-16.14.0.tgz#7ad838ec29a777fb3c75c3a190f661cf92ab8b89" + integrity sha512-1gCeQXDLoIqMgqD3IO2Ah9bnf0w9kzhwN5q4FGnHZ67hBm9yePzB5JJAIQCc8x3pFnNlwFq4RidZggNAAkzWWw== + dependencies: + loose-envify "^1.1.0" + object-assign "^4.1.1" + prop-types "^15.6.2" + scheduler "^0.19.1" + +react-flame-graph@^1.4.0: + version "1.4.0" + resolved "https://registry.yarnpkg.com/react-flame-graph/-/react-flame-graph-1.4.0.tgz#52d118cc94348f630a812fc0ec530a5b73c30cdb" + integrity sha512-DaCK9ZX+xK0mNca72kUE5cu6T8hGe/KLsefQWf+eT9sVt+0WP1dVxZCGD8Svfn2KrZB9Mv011Intg/yG2YWSxA== + dependencies: + flow-bin "^0.118.0" + memoize-one "^3.1.1" + react-window "^1" + +react-is@^16.12.0, react-is@^16.13.1, react-is@^16.7.0: + version "16.13.1" + resolved "https://registry.yarnpkg.com/react-is/-/react-is-16.13.1.tgz#789729a4dc36de2999dc156dd6c1d9c18cea56a4" + integrity sha512-24e6ynE2H+OKt4kqsOvNd8kBpV65zoxbA4BVsEOB3ARVWQki/DHzaUoC5KuON/BiccDaCCTZBuOcfZs70kR8bQ== + +"react-is@^16.8.0 || ^17.0.0": + version "17.0.2" + resolved "https://registry.yarnpkg.com/react-is/-/react-is-17.0.2.tgz#e691d4a8e9c789365655539ab372762b0efb54f0" + integrity sha512-w2GsyukL62IJnlaff/nRegPQR94C/XXamvMWmSHRJ4y7Ts/4ocGRmTHvOs8PSE6pB3dWOrD/nueuU5sduBsQ4w== + +react-transition-group@^4.4.0: + version "4.4.2" + resolved "https://registry.yarnpkg.com/react-transition-group/-/react-transition-group-4.4.2.tgz#8b59a56f09ced7b55cbd53c36768b922890d5470" + integrity sha512-/RNYfRAMlZwDSr6z4zNKV6xu53/e2BuaBbGhbyYIXTrmgu/bGHzmqOs7mJSJBHy9Ud+ApHx3QjrkKSp1pxvlFg== + dependencies: + "@babel/runtime" "^7.5.5" + dom-helpers "^5.0.1" + loose-envify "^1.4.0" + prop-types "^15.6.2" + +react-window@^1: + version "1.8.6" + resolved "https://registry.yarnpkg.com/react-window/-/react-window-1.8.6.tgz#d011950ac643a994118632665aad0c6382e2a112" + integrity sha512-8VwEEYyjz6DCnGBsd+MgkD0KJ2/OXFULyDtorIiTz+QzwoP94tBoA7CnbtyXMm+cCeAUER5KJcPtWl9cpKbOBg== + dependencies: + "@babel/runtime" "^7.0.0" + memoize-one ">=3.1.1 <6" + +react@^16.13.1: + version "16.14.0" + resolved "https://registry.yarnpkg.com/react/-/react-16.14.0.tgz#94d776ddd0aaa37da3eda8fc5b6b18a4c9a3114d" + integrity sha512-0X2CImDkJGApiAlcf0ODKIneSwBPhqJawOa5wCtKbu7ZECrmS26NvtSILynQ66cgkT/RJ4LidJOc3bUESwmU8g== + dependencies: + loose-envify "^1.1.0" + object-assign "^4.1.1" + prop-types "^15.6.2" + +readable-stream@^2.0.1: + version "2.3.7" + resolved 
"https://registry.yarnpkg.com/readable-stream/-/readable-stream-2.3.7.tgz#1eca1cf711aef814c04f62252a36a62f6cb23b57" + integrity sha512-Ebho8K4jIbHAxnuxi7o42OrZgF/ZTNcsZj6nRKyUmkhLFq8CHItp/fy6hQZuZmP/n3yZ9VBUbp4zz/mX8hmYPw== + dependencies: + core-util-is "~1.0.0" + inherits "~2.0.3" + isarray "~1.0.0" + process-nextick-args "~2.0.0" + safe-buffer "~5.1.1" + string_decoder "~1.1.1" + util-deprecate "~1.0.1" + +readable-stream@^3.0.6: + version "3.6.0" + resolved "https://registry.yarnpkg.com/readable-stream/-/readable-stream-3.6.0.tgz#337bbda3adc0706bd3e024426a286d4b4b2c9198" + integrity sha512-BViHy7LKeTz4oNnkcLJ+lVSL6vpiFeX6/d3oSH8zCW7UxP2onchk+vTGB143xuFjHS3deTgkKoXXymXqymiIdA== + dependencies: + inherits "^2.0.3" + string_decoder "^1.1.1" + util-deprecate "^1.0.1" + +readdirp@~3.6.0: + version "3.6.0" + resolved "https://registry.yarnpkg.com/readdirp/-/readdirp-3.6.0.tgz#74a370bd857116e245b29cc97340cd431a02a6c7" + integrity sha512-hOS089on8RduqdbhvQ5Z37A0ESjsqz6qnRcffsMU3495FuTdqSm+7bhJ29JvIOsBDEEnan5DPu9t3To9VRlMzA== + dependencies: + picomatch "^2.2.1" + +rechoir@^0.7.0: + version "0.7.1" + resolved "https://registry.yarnpkg.com/rechoir/-/rechoir-0.7.1.tgz#9478a96a1ca135b5e88fc027f03ee92d6c645686" + integrity sha512-/njmZ8s1wVeR6pjTZ+0nCnv8SpZNRMT2D1RLOJQESlYFDBvwpTA4KWJpZ+sBJ4+vhjILRcK7JIFdGCdxEAAitg== + dependencies: + resolve "^1.9.0" + +regenerator-runtime@^0.13.4: + version "0.13.9" + resolved "https://registry.yarnpkg.com/regenerator-runtime/-/regenerator-runtime-0.13.9.tgz#8925742a98ffd90814988d7566ad30ca3b263b52" + integrity sha512-p3VT+cOEgxFsRRA9X4lkI1E+k2/CtnKtU4gcxyaCUreilL/vqI6CdZ3wxVUx3UOUg+gnUOQQcRI7BmSI656MYA== + +regexp.prototype.flags@^1.2.0: + version "1.4.1" + resolved "https://registry.yarnpkg.com/regexp.prototype.flags/-/regexp.prototype.flags-1.4.1.tgz#b3f4c0059af9e47eca9f3f660e51d81307e72307" + integrity sha512-pMR7hBVUUGI7PMA37m2ofIdQCsomVnas+Jn5UPGAHQ+/LlwKm/aTLJHdasmHRzlfeZwHiAOaRSo2rbBDm3nNUQ== + dependencies: + call-bind "^1.0.2" + define-properties "^1.1.3" + +relateurl@^0.2.7: + version "0.2.7" + resolved "https://registry.yarnpkg.com/relateurl/-/relateurl-0.2.7.tgz#54dbf377e51440aca90a4cd274600d3ff2d888a9" + integrity sha1-VNvzd+UUQKypCkzSdGANP/LYiKk= + +renderkid@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/renderkid/-/renderkid-3.0.0.tgz#5fd823e4d6951d37358ecc9a58b1f06836b6268a" + integrity sha512-q/7VIQA8lmM1hF+jn+sFSPWGlMkSAeNYcPLmDQx2zzuiDfaLrOmumR8iaUKlenFgh0XRPIUeSPlH3A+AW3Z5pg== + dependencies: + css-select "^4.1.3" + dom-converter "^0.2.0" + htmlparser2 "^6.1.0" + lodash "^4.17.21" + strip-ansi "^6.0.1" + +require-from-string@^2.0.2: + version "2.0.2" + resolved "https://registry.yarnpkg.com/require-from-string/-/require-from-string-2.0.2.tgz#89a7fdd938261267318eafe14f9c32e598c36909" + integrity sha512-Xf0nWe6RseziFMu+Ap9biiUbmplq6S9/p+7w7YXP/JBHhrUDDUhwa+vANyubuqfZWTveU//DYVGsDG7RKL/vEw== + +requires-port@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/requires-port/-/requires-port-1.0.0.tgz#925d2601d39ac485e091cf0da5c6e694dc3dcaff" + integrity sha1-kl0mAdOaxIXgkc8NpcbmlNw9yv8= + +resize-observer-polyfill@^1.5.0, resize-observer-polyfill@^1.5.1: + version "1.5.1" + resolved "https://registry.yarnpkg.com/resize-observer-polyfill/-/resize-observer-polyfill-1.5.1.tgz#0e9020dd3d21024458d4ebd27e23e40269810464" + integrity sha512-LwZrotdHOo12nQuZlHEmtuXdqGoOD0OhaxopaNFxWzInpEgaLWoVuAMbTzixuosCx2nEG58ngzW3vxdWoxIgdg== + +resolve-cwd@^3.0.0: + version "3.0.0" + resolved 
"https://registry.yarnpkg.com/resolve-cwd/-/resolve-cwd-3.0.0.tgz#0f0075f1bb2544766cf73ba6a6e2adfebcb13f2d" + integrity sha512-OrZaX2Mb+rJCpH/6CpSqt9xFVpN++x01XnN2ie9g6P5/3xelLAkXWVADpdz1IHD/KFfEXyE6V0U01OQ3UO2rEg== + dependencies: + resolve-from "^5.0.0" + +resolve-from@^5.0.0: + version "5.0.0" + resolved "https://registry.yarnpkg.com/resolve-from/-/resolve-from-5.0.0.tgz#c35225843df8f776df21c57557bc087e9dfdfc69" + integrity sha512-qYg9KP24dD5qka9J47d0aVky0N+b4fTU89LN9iDnjB5waksiC49rvMB0PrUJQGoTmH50XPiqOvAjDfaijGxYZw== + +resolve@^1.9.0: + version "1.22.0" + resolved "https://registry.yarnpkg.com/resolve/-/resolve-1.22.0.tgz#5e0b8c67c15df57a89bdbabe603a002f21731198" + integrity sha512-Hhtrw0nLeSrFQ7phPp4OOcVjLPIeMnRlr5mcnVuMe7M/7eBn98A3hmFRLoFo3DLZkivSYwhRUJTyPyWAk56WLw== + dependencies: + is-core-module "^2.8.1" + path-parse "^1.0.7" + supports-preserve-symlinks-flag "^1.0.0" + +retry@^0.13.1: + version "0.13.1" + resolved "https://registry.yarnpkg.com/retry/-/retry-0.13.1.tgz#185b1587acf67919d63b357349e03537b2484658" + integrity sha512-XQBQ3I8W1Cge0Seh+6gjj03LbmRFWuoszgK9ooCpwYIrhhoO80pfq4cUkU5DkknwfOfFteRwlZ56PYOGYyFWdg== + +reusify@^1.0.4: + version "1.0.4" + resolved "https://registry.yarnpkg.com/reusify/-/reusify-1.0.4.tgz#90da382b1e126efc02146e90845a88db12925d76" + integrity sha512-U9nH88a3fc/ekCF1l0/UP1IosiuIjyTh7hBvXVMHYgVcfGvt897Xguj2UOLDeI5BG2m7/uwyaLVT6fbtCwTyzw== + +rimraf@^3.0.2: + version "3.0.2" + resolved "https://registry.yarnpkg.com/rimraf/-/rimraf-3.0.2.tgz#f1a5402ba6220ad52cc1282bac1ae3aa49fd061a" + integrity sha512-JZkJMZkAGFFPP2YqXZXPbMlMBgsxzE8ILs4lMIX/2o0L9UBw9O/Y3o6wFw/i9YLapcUJWwqbi3kdxIPdC62TIA== + dependencies: + glob "^7.1.3" + +run-parallel@^1.1.9: + version "1.2.0" + resolved "https://registry.yarnpkg.com/run-parallel/-/run-parallel-1.2.0.tgz#66d1368da7bdf921eb9d95bd1a9229e7f21a43ee" + integrity sha512-5l4VyZR86LZ/lDxZTR6jqL8AFE2S0IFLMP26AbjsLVADxHdhB/c0GUsH+y39UfCi3dzz8OlQuPmnaJOMoDHQBA== + dependencies: + queue-microtask "^1.2.2" + +safe-buffer@5.1.2, safe-buffer@~5.1.0, safe-buffer@~5.1.1: + version "5.1.2" + resolved "https://registry.yarnpkg.com/safe-buffer/-/safe-buffer-5.1.2.tgz#991ec69d296e0313747d59bdfd2b745c35f8828d" + integrity sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g== + +safe-buffer@5.2.1, safe-buffer@>=5.1.0, safe-buffer@^5.0.1, safe-buffer@^5.1.0, safe-buffer@~5.2.0: + version "5.2.1" + resolved "https://registry.yarnpkg.com/safe-buffer/-/safe-buffer-5.2.1.tgz#1eaf9fa9bdb1fdd4ec75f58f9cdb4e6b7827eec6" + integrity sha512-rp3So07KcdmmKbGvgaNxQSJr7bGVSVk5S9Eq1F+ppbRo70+YeaDxkw5Dd8NPN+GD6bjnYm2VuPuCXmpuYvmCXQ== + +"safer-buffer@>= 2.1.2 < 3": + version "2.1.2" + resolved "https://registry.yarnpkg.com/safer-buffer/-/safer-buffer-2.1.2.tgz#44fa161b0187b9549dd84bb91802f9bd8385cd6a" + integrity sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg== + +scheduler@^0.19.1: + version "0.19.1" + resolved "https://registry.yarnpkg.com/scheduler/-/scheduler-0.19.1.tgz#4f3e2ed2c1a7d65681f4c854fa8c5a1ccb40f196" + integrity sha512-n/zwRWRYSUj0/3g/otKDRPMh6qv2SYMWNq85IEa8iZyAv8od9zDYpGSnpBEjNgcMNq6Scbu5KfIPxNF72R/2EA== + dependencies: + loose-envify "^1.1.0" + object-assign "^4.1.1" + +schema-utils@^3.0.0, schema-utils@^3.1.0, schema-utils@^3.1.1: + version "3.1.1" + resolved "https://registry.yarnpkg.com/schema-utils/-/schema-utils-3.1.1.tgz#bc74c4b6b6995c1d88f76a8b77bea7219e0c8281" + integrity 
sha512-Y5PQxS4ITlC+EahLuXaY86TXfR7Dc5lw294alXOq86JAHCihAIZfqv8nNCWvaEJvaC51uN9hbLGeV0cFBdH+Fw== + dependencies: + "@types/json-schema" "^7.0.8" + ajv "^6.12.5" + ajv-keywords "^3.5.2" + +schema-utils@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/schema-utils/-/schema-utils-4.0.0.tgz#60331e9e3ae78ec5d16353c467c34b3a0a1d3df7" + integrity sha512-1edyXKgh6XnJsJSQ8mKWXnN/BVaIbFMLpouRUrXgVq7WYne5kw3MW7UPhO44uRXQSIpTSXoJbmrR2X0w9kUTyg== + dependencies: + "@types/json-schema" "^7.0.9" + ajv "^8.8.0" + ajv-formats "^2.1.1" + ajv-keywords "^5.0.0" + +scroll-into-view-if-needed@^2.2.25: + version "2.2.29" + resolved "https://registry.yarnpkg.com/scroll-into-view-if-needed/-/scroll-into-view-if-needed-2.2.29.tgz#551791a84b7e2287706511f8c68161e4990ab885" + integrity sha512-hxpAR6AN+Gh53AdAimHM6C8oTN1ppwVZITihix+WqalywBeFcQ6LdQP5ABNl26nX8GTEL7VT+b8lKpdqq65wXg== + dependencies: + compute-scroll-into-view "^1.0.17" + +select-hose@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/select-hose/-/select-hose-2.0.0.tgz#625d8658f865af43ec962bfc376a37359a4994ca" + integrity sha1-Yl2GWPhlr0Psliv8N2o3NZpJlMo= + +selfsigned@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/selfsigned/-/selfsigned-2.0.0.tgz#e927cd5377cbb0a1075302cff8df1042cc2bce5b" + integrity sha512-cUdFiCbKoa1mZ6osuJs2uDHrs0k0oprsKveFiiaBKCNq3SYyb5gs2HxhQyDNLCmL51ZZThqi4YNDpCK6GOP1iQ== + dependencies: + node-forge "^1.2.0" + +semver@^7.3.4, semver@^7.3.5: + version "7.3.5" + resolved "https://registry.yarnpkg.com/semver/-/semver-7.3.5.tgz#0b621c879348d8998e4b0e4be94b3f12e6018ef7" + integrity sha512-PoeGJYh8HK4BTO/a9Tf6ZG3veo/A7ZVsYrSA6J8ny9nb3B1VrpkuN+z9OE5wfE5p6H4LchYZsegiQgbJD94ZFQ== + dependencies: + lru-cache "^6.0.0" + +send@0.17.2: + version "0.17.2" + resolved "https://registry.yarnpkg.com/send/-/send-0.17.2.tgz#926622f76601c41808012c8bf1688fe3906f7820" + integrity sha512-UJYB6wFSJE3G00nEivR5rgWp8c2xXvJ3OPWPhmuteU0IKj8nKbG3DrjiOmLwpnHGYWAVwA69zmTm++YG0Hmwww== + dependencies: + debug "2.6.9" + depd "~1.1.2" + destroy "~1.0.4" + encodeurl "~1.0.2" + escape-html "~1.0.3" + etag "~1.8.1" + fresh "0.5.2" + http-errors "1.8.1" + mime "1.6.0" + ms "2.1.3" + on-finished "~2.3.0" + range-parser "~1.2.1" + statuses "~1.5.0" + +serialize-javascript@^6.0.0: + version "6.0.0" + resolved "https://registry.yarnpkg.com/serialize-javascript/-/serialize-javascript-6.0.0.tgz#efae5d88f45d7924141da8b5c3a7a7e663fefeb8" + integrity sha512-Qr3TosvguFt8ePWqsvRfrKyQXIiW+nGbYpy8XK24NQHE83caxWt+mIymTT19DGFbNWNLfEwsrkSmN64lVWB9ag== + dependencies: + randombytes "^2.1.0" + +serve-index@^1.9.1: + version "1.9.1" + resolved "https://registry.yarnpkg.com/serve-index/-/serve-index-1.9.1.tgz#d3768d69b1e7d82e5ce050fff5b453bea12a9239" + integrity sha1-03aNabHn2C5c4FD/9bRTvqEqkjk= + dependencies: + accepts "~1.3.4" + batch "0.6.1" + debug "2.6.9" + escape-html "~1.0.3" + http-errors "~1.6.2" + mime-types "~2.1.17" + parseurl "~1.3.2" + +serve-static@1.14.2: + version "1.14.2" + resolved "https://registry.yarnpkg.com/serve-static/-/serve-static-1.14.2.tgz#722d6294b1d62626d41b43a013ece4598d292bfa" + integrity sha512-+TMNA9AFxUEGuC0z2mevogSnn9MXKb4fa7ngeRMJaaGv8vTwnIEkKi+QGvPt33HSnf8pRS+WGM0EbMtCJLKMBQ== + dependencies: + encodeurl "~1.0.2" + escape-html "~1.0.3" + parseurl "~1.3.3" + send "0.17.2" + +setprototypeof@1.1.0: + version "1.1.0" + resolved "https://registry.yarnpkg.com/setprototypeof/-/setprototypeof-1.1.0.tgz#d0bd85536887b6fe7c0d818cb962d9d91c54e656" + integrity 
sha512-BvE/TwpZX4FXExxOxZyRGQQv651MSwmWKZGqvmPcRIjDqWub67kTKuIMx43cZZrS/cBBzwBcNDWoFxt2XEFIpQ== + +setprototypeof@1.2.0: + version "1.2.0" + resolved "https://registry.yarnpkg.com/setprototypeof/-/setprototypeof-1.2.0.tgz#66c9a24a73f9fc28cbe66b09fed3d33dcaf1b424" + integrity sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw== + +shallow-clone@^3.0.0: + version "3.0.1" + resolved "https://registry.yarnpkg.com/shallow-clone/-/shallow-clone-3.0.1.tgz#8f2981ad92531f55035b01fb230769a40e02efa3" + integrity sha512-/6KqX+GVUdqPuPPd2LxDDxzX6CAbjJehAAOKlNpqqUpAqPM6HeL8f+o3a+JsyGjn2lv0WY8UsTgUJjU9Ok55NA== + dependencies: + kind-of "^6.0.2" + +shallowequal@^1.1.0: + version "1.1.0" + resolved "https://registry.yarnpkg.com/shallowequal/-/shallowequal-1.1.0.tgz#188d521de95b9087404fd4dcb68b13df0ae4e7f8" + integrity sha512-y0m1JoUZSlPAjXVtPPW70aZWfIL/dSP7AFkRnniLCrK/8MDKog3TySTBmckD+RObVxH0v4Tox67+F14PdED2oQ== + +shebang-command@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/shebang-command/-/shebang-command-2.0.0.tgz#ccd0af4f8835fbdc265b82461aaf0c36663f34ea" + integrity sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA== + dependencies: + shebang-regex "^3.0.0" + +shebang-regex@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/shebang-regex/-/shebang-regex-3.0.0.tgz#ae16f1644d873ecad843b0307b143362d4c42172" + integrity sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A== + +signal-exit@^3.0.3: + version "3.0.7" + resolved "https://registry.yarnpkg.com/signal-exit/-/signal-exit-3.0.7.tgz#a9a1767f8af84155114eaabd73f99273c8f59ad9" + integrity sha512-wnD2ZE+l+SPC/uoS0vXeE9L1+0wuaMqKlfz9AMUo38JsyLSBWSFcHR1Rri62LZc12vLr1gb3jl7iwQhgwpAbGQ== + +slash@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/slash/-/slash-3.0.0.tgz#6539be870c165adbd5240220dbe361f1bc4d4634" + integrity sha512-g9Q1haeby36OSStwb4ntCGGGaKsaVSjQ68fBxoQcutl5fS1vuY18H3wSt3jFyFtrkx+Kz0V1G85A4MyAdDMi2Q== + +sockjs@^0.3.21: + version "0.3.24" + resolved "https://registry.yarnpkg.com/sockjs/-/sockjs-0.3.24.tgz#c9bc8995f33a111bea0395ec30aa3206bdb5ccce" + integrity sha512-GJgLTZ7vYb/JtPSSZ10hsOYIvEYsjbNU+zPdIHcUaWVNUEPivzxku31865sSSud0Da0W4lEeOPlmw93zLQchuQ== + dependencies: + faye-websocket "^0.11.3" + uuid "^8.3.2" + websocket-driver "^0.7.4" + +source-map-js@^1.0.2: + version "1.0.2" + resolved "https://registry.yarnpkg.com/source-map-js/-/source-map-js-1.0.2.tgz#adbc361d9c62df380125e7f161f71c826f1e490c" + integrity sha512-R0XvVJ9WusLiqTCEiGCmICCMplcCkIwwR11mOSD9CR5u+IXYdiseeEuXCVAjS54zqwkLcPNnmU4OeJ6tUrWhDw== + +source-map-support@~0.5.20: + version "0.5.21" + resolved "https://registry.yarnpkg.com/source-map-support/-/source-map-support-0.5.21.tgz#04fe7c7f9e1ed2d662233c28cb2b35b9f63f6e4f" + integrity sha512-uBHU3L3czsIyYXKX88fdrGovxdSCoTGDRZ6SYXtSRxLZUzHg5P/66Ht6uoUlHu9EZod+inXhKo3qQgwXUT/y1w== + dependencies: + buffer-from "^1.0.0" + source-map "^0.6.0" + +source-map@^0.6.0, source-map@^0.6.1, source-map@~0.6.0: + version "0.6.1" + resolved "https://registry.yarnpkg.com/source-map/-/source-map-0.6.1.tgz#74722af32e9614e9c287a8d0bbde48b5e2f1a263" + integrity sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g== + +source-map@~0.7.2: + version "0.7.3" + resolved "https://registry.yarnpkg.com/source-map/-/source-map-0.7.3.tgz#5302f8169031735226544092e64981f751750383" + integrity 
sha512-CkCj6giN3S+n9qrYiBTX5gystlENnRW5jZeNLHpe6aue+SrHcG5VYwujhW9s4dY31mEGsxBDrHR6oI69fTXsaQ== + +spdy-transport@^3.0.0: + version "3.0.0" + resolved "https://registry.yarnpkg.com/spdy-transport/-/spdy-transport-3.0.0.tgz#00d4863a6400ad75df93361a1608605e5dcdcf31" + integrity sha512-hsLVFE5SjA6TCisWeJXFKniGGOpBgMLmerfO2aCyCU5s7nJ/rpAepqmFifv/GCbSbueEeAJJnmSQ2rKC/g8Fcw== + dependencies: + debug "^4.1.0" + detect-node "^2.0.4" + hpack.js "^2.1.6" + obuf "^1.1.2" + readable-stream "^3.0.6" + wbuf "^1.7.3" + +spdy@^4.0.2: + version "4.0.2" + resolved "https://registry.yarnpkg.com/spdy/-/spdy-4.0.2.tgz#b74f466203a3eda452c02492b91fb9e84a27677b" + integrity sha512-r46gZQZQV+Kl9oItvl1JZZqJKGr+oEkB08A6BzkiR7593/7IbtuncXHd2YoYeTsG4157ZssMu9KYvUHLcjcDoA== + dependencies: + debug "^4.1.0" + handle-thing "^2.0.0" + http-deceiver "^1.2.7" + select-hose "^2.0.0" + spdy-transport "^3.0.0" + +"statuses@>= 1.4.0 < 2", "statuses@>= 1.5.0 < 2", statuses@~1.5.0: + version "1.5.0" + resolved "https://registry.yarnpkg.com/statuses/-/statuses-1.5.0.tgz#161c7dac177659fd9811f43771fa99381478628c" + integrity sha1-Fhx9rBd2Wf2YEfQ3cfqZOBR4Yow= + +string-convert@^0.2.0: + version "0.2.1" + resolved "https://registry.yarnpkg.com/string-convert/-/string-convert-0.2.1.tgz#6982cc3049fbb4cd85f8b24568b9d9bf39eeff97" + integrity sha1-aYLMMEn7tM2F+LJFaLnZvznu/5c= + +string_decoder@^1.1.1: + version "1.3.0" + resolved "https://registry.yarnpkg.com/string_decoder/-/string_decoder-1.3.0.tgz#42f114594a46cf1a8e30b0a84f56c78c3edac21e" + integrity sha512-hkRX8U1WjJFd8LsDJ2yQ/wWWxaopEsABU1XfkM8A+j0+85JAGppt16cr1Whg6KIbb4okU6Mql6BOj+uup/wKeA== + dependencies: + safe-buffer "~5.2.0" + +string_decoder@~1.1.1: + version "1.1.1" + resolved "https://registry.yarnpkg.com/string_decoder/-/string_decoder-1.1.1.tgz#9cf1611ba62685d7030ae9e4ba34149c3af03fc8" + integrity sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg== + dependencies: + safe-buffer "~5.1.0" + +strip-ansi@^6.0.1: + version "6.0.1" + resolved "https://registry.yarnpkg.com/strip-ansi/-/strip-ansi-6.0.1.tgz#9e26c63d30f53443e9489495b2105d37b67a85d9" + integrity sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A== + dependencies: + ansi-regex "^5.0.1" + +strip-ansi@^7.0.0: + version "7.0.1" + resolved "https://registry.yarnpkg.com/strip-ansi/-/strip-ansi-7.0.1.tgz#61740a08ce36b61e50e65653f07060d000975fb2" + integrity sha512-cXNxvT8dFNRVfhVME3JAe98mkXDYN2O1l7jmcwMnOslDeESg1rF/OZMtK0nRAhiari1unG5cD4jG3rapUAkLbw== + dependencies: + ansi-regex "^6.0.1" + +strip-final-newline@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/strip-final-newline/-/strip-final-newline-2.0.0.tgz#89b852fb2fcbe936f6f4b3187afb0a12c1ab58ad" + integrity sha512-BrpvfNAE3dcvq7ll3xVumzjKjZQ5tI1sEUIKr3Uoks0XUl45St3FlatVqef9prk4jRDzhW6WZg+3bk93y6pLjA== + +style-loader@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/style-loader/-/style-loader-2.0.0.tgz#9669602fd4690740eaaec137799a03addbbc393c" + integrity sha512-Z0gYUJmzZ6ZdRUqpg1r8GsaFKypE+3xAzuFeMuoHgjc9KZv3wMyCRjQIWEbhoFSq7+7yoHXySDJyyWQaPajeiQ== + dependencies: + loader-utils "^2.0.0" + schema-utils "^3.0.0" + +supports-color@^7.1.0: + version "7.2.0" + resolved "https://registry.yarnpkg.com/supports-color/-/supports-color-7.2.0.tgz#1b7dcdcb32b8138801b3e478ba6a51caa89648da" + integrity sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw== + dependencies: + has-flag "^4.0.0" + +supports-color@^8.0.0: + 
version "8.1.1" + resolved "https://registry.yarnpkg.com/supports-color/-/supports-color-8.1.1.tgz#cd6fc17e28500cff56c1b86c0a7fd4a54a73005c" + integrity sha512-MpUEN2OodtUzxvKQl72cUF7RQ5EiHsGvSsVG0ia9c5RbWGL2CI4C7EpPS8UTBIplnlzZiNuV56w+FuNxy3ty2Q== + dependencies: + has-flag "^4.0.0" + +supports-preserve-symlinks-flag@^1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/supports-preserve-symlinks-flag/-/supports-preserve-symlinks-flag-1.0.0.tgz#6eda4bd344a3c94aea376d4cc31bc77311039e09" + integrity sha512-ot0WnXS9fgdkgIcePe6RHNk1WA8+muPa6cSjeR3V8K27q9BB1rTE3R1p7Hv0z1ZyAc8s6Vvv8DIyWf681MAt0w== + +tapable@^1.0.0: + version "1.1.3" + resolved "https://registry.yarnpkg.com/tapable/-/tapable-1.1.3.tgz#a1fccc06b58db61fd7a45da2da44f5f3a3e67ba2" + integrity sha512-4WK/bYZmj8xLr+HUCODHGF1ZFzsYffasLUgEiMBY4fgtltdO6B4WJtlSbPaDTLpYTcGVwM2qLnFTICEcNxs3kA== + +tapable@^2.0.0, tapable@^2.1.1, tapable@^2.2.0: + version "2.2.1" + resolved "https://registry.yarnpkg.com/tapable/-/tapable-2.2.1.tgz#1967a73ef4060a82f12ab96af86d52fdb76eeca0" + integrity sha512-GNzQvQTOIP6RyTfE2Qxb8ZVlNmw0n88vp1szwWRimP02mnTsx3Wtn5qRdqY9w2XduFNUgvOwhNnQsjwCp+kqaQ== + +terser-webpack-plugin@^5.1.3: + version "5.3.1" + resolved "https://registry.yarnpkg.com/terser-webpack-plugin/-/terser-webpack-plugin-5.3.1.tgz#0320dcc270ad5372c1e8993fabbd927929773e54" + integrity sha512-GvlZdT6wPQKbDNW/GDQzZFg/j4vKU96yl2q6mcUkzKOgW4gwf1Z8cZToUCrz31XHlPWH8MVb1r2tFtdDtTGJ7g== + dependencies: + jest-worker "^27.4.5" + schema-utils "^3.1.1" + serialize-javascript "^6.0.0" + source-map "^0.6.1" + terser "^5.7.2" + +terser@^5.10.0, terser@^5.7.2: + version "5.12.0" + resolved "https://registry.yarnpkg.com/terser/-/terser-5.12.0.tgz#728c6bff05f7d1dcb687d8eace0644802a9dae8a" + integrity sha512-R3AUhNBGWiFc77HXag+1fXpAxTAFRQTJemlJKjAgD9r8xXTpjNKqIXwHM/o7Rh+O0kUJtS3WQVdBeMKFk5sw9A== + dependencies: + acorn "^8.5.0" + commander "^2.20.0" + source-map "~0.7.2" + source-map-support "~0.5.20" + +thunky@^1.0.2: + version "1.1.0" + resolved "https://registry.yarnpkg.com/thunky/-/thunky-1.1.0.tgz#5abaf714a9405db0504732bbccd2cedd9ef9537d" + integrity sha512-eHY7nBftgThBqOyHGVN+l8gF0BucP09fMo0oO/Lb0w1OF80dJv+lDVpXG60WMQvkcxAkNybKsrEIE3ZtKGmPrA== + +tiny-warning@^1.0.2: + version "1.0.3" + resolved "https://registry.yarnpkg.com/tiny-warning/-/tiny-warning-1.0.3.tgz#94a30db453df4c643d0fd566060d60a875d84754" + integrity sha512-lBN9zLN/oAf68o3zNXYrdCt1kP8WsiGW8Oo2ka41b2IM5JL/S1CTyX1rW0mb/zSuJun0ZUrDxx4sqvYS2FWzPA== + +to-regex-range@^5.0.1: + version "5.0.1" + resolved "https://registry.yarnpkg.com/to-regex-range/-/to-regex-range-5.0.1.tgz#1648c44aae7c8d988a326018ed72f5b4dd0392e4" + integrity sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ== + dependencies: + is-number "^7.0.0" + +toggle-selection@^1.0.6: + version "1.0.6" + resolved "https://registry.yarnpkg.com/toggle-selection/-/toggle-selection-1.0.6.tgz#6e45b1263f2017fa0acc7d89d78b15b8bf77da32" + integrity sha1-bkWxJj8gF/oKzH2J14sVuL932jI= + +toidentifier@1.0.1: + version "1.0.1" + resolved "https://registry.yarnpkg.com/toidentifier/-/toidentifier-1.0.1.tgz#3be34321a88a820ed1bd80dfaa33e479fbb8dd35" + integrity sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA== + +tr46@~0.0.3: + version "0.0.3" + resolved "https://registry.yarnpkg.com/tr46/-/tr46-0.0.3.tgz#8184fd347dac9cdc185992f3a6622e14b9d9ab6a" + integrity sha1-gYT9NH2snNwYWZLzpmIuFLnZq2o= + +ts-loader@^8.0.18: + version "8.3.0" + resolved 
"https://registry.yarnpkg.com/ts-loader/-/ts-loader-8.3.0.tgz#83360496d6f8004fab35825279132c93412edf33" + integrity sha512-MgGly4I6cStsJy27ViE32UoqxPTN9Xly4anxxVyaIWR+9BGxboV4EyJBGfR3RePV7Ksjj3rHmPZJeIt+7o4Vag== + dependencies: + chalk "^4.1.0" + enhanced-resolve "^4.0.0" + loader-utils "^2.0.0" + micromatch "^4.0.0" + semver "^7.3.4" + +tslib@^2.0.3: + version "2.3.1" + resolved "https://registry.yarnpkg.com/tslib/-/tslib-2.3.1.tgz#e8a335add5ceae51aa261d32a490158ef042ef01" + integrity sha512-77EbyPPpMz+FRFRuAFlWMtmgUWGe9UOG2Z25NqCwiIjRhOf5iKGuzSe5P2w1laq+FkRy4p+PCuVkJSGkzTEKVw== + +type-is@~1.6.18: + version "1.6.18" + resolved "https://registry.yarnpkg.com/type-is/-/type-is-1.6.18.tgz#4e552cd05df09467dcbc4ef739de89f2cf37c131" + integrity sha512-TkRKr9sUTxEH8MdfuCSP7VizJyzRNMjj2J2do2Jr3Kym598JVdEksuzPQCnlFPW4ky9Q+iA+ma9BGm06XQBy8g== + dependencies: + media-typer "0.3.0" + mime-types "~2.1.24" + +typescript@^4.0.3: + version "4.6.2" + resolved "https://registry.yarnpkg.com/typescript/-/typescript-4.6.2.tgz#fe12d2727b708f4eef40f51598b3398baa9611d4" + integrity sha512-HM/hFigTBHZhLXshn9sN37H085+hQGeJHJ/X7LpBWLID/fbc2acUMfU+lGD98X81sKP+pFa9f0DZmCwB9GnbAg== + +unpipe@1.0.0, unpipe@~1.0.0: + version "1.0.0" + resolved "https://registry.yarnpkg.com/unpipe/-/unpipe-1.0.0.tgz#b2bf4ee8514aae6165b4817829d21b2ef49904ec" + integrity sha1-sr9O6FFKrmFltIF4KdIbLvSZBOw= + +uri-js@^4.2.2: + version "4.4.1" + resolved "https://registry.yarnpkg.com/uri-js/-/uri-js-4.4.1.tgz#9b1a52595225859e55f669d928f88c6c57f2a77e" + integrity sha512-7rKUyy33Q1yc98pQ1DAmLtwX109F7TIfWlW1Ydo8Wl1ii1SeHieeh0HHfPeL2fMXK6z0s8ecKs9frCuLJvndBg== + dependencies: + punycode "^2.1.0" + +util-deprecate@^1.0.1, util-deprecate@^1.0.2, util-deprecate@~1.0.1: + version "1.0.2" + resolved "https://registry.yarnpkg.com/util-deprecate/-/util-deprecate-1.0.2.tgz#450d4dc9fa70de732762fbd2d4a28981419a0ccf" + integrity sha1-RQ1Nyfpw3nMnYvvS1KKJgUGaDM8= + +utila@~0.4: + version "0.4.0" + resolved "https://registry.yarnpkg.com/utila/-/utila-0.4.0.tgz#8a16a05d445657a3aea5eecc5b12a4fa5379772c" + integrity sha1-ihagXURWV6Oupe7MWxKk+lN5dyw= + +utils-merge@1.0.1: + version "1.0.1" + resolved "https://registry.yarnpkg.com/utils-merge/-/utils-merge-1.0.1.tgz#9f95710f50a267947b2ccc124741c1028427e713" + integrity sha1-n5VxD1CiZ5R7LMwSR0HBAoQn5xM= + +uuid@^8.3.2: + version "8.3.2" + resolved "https://registry.yarnpkg.com/uuid/-/uuid-8.3.2.tgz#80d5b5ced271bb9af6c445f21a1a04c606cefbe2" + integrity sha512-+NYs2QeMWy+GWFOEm9xnn6HCDp0l7QBD7ml8zLUmJ+93Q5NF0NocErnwkTkXVFNiX3/fpC6afS8Dhb/gz7R7eg== + +vary@~1.1.2: + version "1.1.2" + resolved "https://registry.yarnpkg.com/vary/-/vary-1.1.2.tgz#2299f02c6ded30d4a5961b0b9f74524a18f634fc" + integrity sha1-IpnwLG3tMNSllhsLn3RSShj2NPw= + +watchpack@^2.3.1: + version "2.3.1" + resolved "https://registry.yarnpkg.com/watchpack/-/watchpack-2.3.1.tgz#4200d9447b401156eeca7767ee610f8809bc9d25" + integrity sha512-x0t0JuydIo8qCNctdDrn1OzH/qDzk2+rdCOC3YzumZ42fiMqmQ7T3xQurykYMhYfHaPHTp4ZxAx2NfUo1K6QaA== + dependencies: + glob-to-regexp "^0.4.1" + graceful-fs "^4.1.2" + +wbuf@^1.1.0, wbuf@^1.7.3: + version "1.7.3" + resolved "https://registry.yarnpkg.com/wbuf/-/wbuf-1.7.3.tgz#c1d8d149316d3ea852848895cb6a0bfe887b87df" + integrity sha512-O84QOnr0icsbFGLS0O3bI5FswxzRr8/gHwWkDlQFskhSPryQXvrTMxjxGP4+iWYoauLoBvfDpkrOauZ+0iZpDA== + dependencies: + minimalistic-assert "^1.0.0" + +webidl-conversions@^3.0.0: + version "3.0.1" + resolved 
"https://registry.yarnpkg.com/webidl-conversions/-/webidl-conversions-3.0.1.tgz#24534275e2a7bc6be7bc86611cc16ae0a5654871" + integrity sha1-JFNCdeKnvGvnvIZhHMFq4KVlSHE= + +webpack-cli@^4.5.0: + version "4.9.2" + resolved "https://registry.yarnpkg.com/webpack-cli/-/webpack-cli-4.9.2.tgz#77c1adaea020c3f9e2db8aad8ea78d235c83659d" + integrity sha512-m3/AACnBBzK/kMTcxWHcZFPrw/eQuY4Df1TxvIWfWM2x7mRqBQCqKEd96oCUa9jkapLBaFfRce33eGDb4Pr7YQ== + dependencies: + "@discoveryjs/json-ext" "^0.5.0" + "@webpack-cli/configtest" "^1.1.1" + "@webpack-cli/info" "^1.4.1" + "@webpack-cli/serve" "^1.6.1" + colorette "^2.0.14" + commander "^7.0.0" + execa "^5.0.0" + fastest-levenshtein "^1.0.12" + import-local "^3.0.2" + interpret "^2.2.0" + rechoir "^0.7.0" + webpack-merge "^5.7.3" + +webpack-dev-middleware@^5.3.1: + version "5.3.1" + resolved "https://registry.yarnpkg.com/webpack-dev-middleware/-/webpack-dev-middleware-5.3.1.tgz#aa079a8dedd7e58bfeab358a9af7dab304cee57f" + integrity sha512-81EujCKkyles2wphtdrnPg/QqegC/AtqNH//mQkBYSMqwFVCQrxM6ktB2O/SPlZy7LqeEfTbV3cZARGQz6umhg== + dependencies: + colorette "^2.0.10" + memfs "^3.4.1" + mime-types "^2.1.31" + range-parser "^1.2.1" + schema-utils "^4.0.0" + +webpack-dev-server@^4.7.4: + version "4.7.4" + resolved "https://registry.yarnpkg.com/webpack-dev-server/-/webpack-dev-server-4.7.4.tgz#d0ef7da78224578384e795ac228d8efb63d5f945" + integrity sha512-nfdsb02Zi2qzkNmgtZjkrMOcXnYZ6FLKcQwpxT7MvmHKc+oTtDsBju8j+NMyAygZ9GW1jMEUpy3itHtqgEhe1A== + dependencies: + "@types/bonjour" "^3.5.9" + "@types/connect-history-api-fallback" "^1.3.5" + "@types/express" "^4.17.13" + "@types/serve-index" "^1.9.1" + "@types/sockjs" "^0.3.33" + "@types/ws" "^8.2.2" + ansi-html-community "^0.0.8" + bonjour "^3.5.0" + chokidar "^3.5.3" + colorette "^2.0.10" + compression "^1.7.4" + connect-history-api-fallback "^1.6.0" + default-gateway "^6.0.3" + del "^6.0.0" + express "^4.17.1" + graceful-fs "^4.2.6" + html-entities "^2.3.2" + http-proxy-middleware "^2.0.0" + ipaddr.js "^2.0.1" + open "^8.0.9" + p-retry "^4.5.0" + portfinder "^1.0.28" + schema-utils "^4.0.0" + selfsigned "^2.0.0" + serve-index "^1.9.1" + sockjs "^0.3.21" + spdy "^4.0.2" + strip-ansi "^7.0.0" + webpack-dev-middleware "^5.3.1" + ws "^8.4.2" + +webpack-merge@^5.7.3: + version "5.8.0" + resolved "https://registry.yarnpkg.com/webpack-merge/-/webpack-merge-5.8.0.tgz#2b39dbf22af87776ad744c390223731d30a68f61" + integrity sha512-/SaI7xY0831XwP6kzuwhKWVKDP9t1QY1h65lAFLbZqMPIuYcD9QAW4u9STIbU9kaJbPBB/geU/gLr1wDjOhQ+Q== + dependencies: + clone-deep "^4.0.1" + wildcard "^2.0.0" + +webpack-sources@^3.2.3: + version "3.2.3" + resolved "https://registry.yarnpkg.com/webpack-sources/-/webpack-sources-3.2.3.tgz#2d4daab8451fd4b240cc27055ff6a0c2ccea0cde" + integrity sha512-/DyMEOrDgLKKIG0fmvtz+4dUX/3Ghozwgm6iPp8KRhvn+eQf9+Q7GWxVNMk3+uCPWfdXYC4ExGBckIXdFEfH1w== + +webpack@^5.28.0: + version "5.70.0" + resolved "https://registry.yarnpkg.com/webpack/-/webpack-5.70.0.tgz#3461e6287a72b5e6e2f4872700bc8de0d7500e6d" + integrity sha512-ZMWWy8CeuTTjCxbeaQI21xSswseF2oNOwc70QSKNePvmxE7XW36i7vpBMYZFAUHPwQiEbNGCEYIOOlyRbdGmxw== + dependencies: + "@types/eslint-scope" "^3.7.3" + "@types/estree" "^0.0.51" + "@webassemblyjs/ast" "1.11.1" + "@webassemblyjs/wasm-edit" "1.11.1" + "@webassemblyjs/wasm-parser" "1.11.1" + acorn "^8.4.1" + acorn-import-assertions "^1.7.6" + browserslist "^4.14.5" + chrome-trace-event "^1.0.2" + enhanced-resolve "^5.9.2" + es-module-lexer "^0.9.0" + eslint-scope "5.1.1" + events "^3.2.0" + glob-to-regexp "^0.4.1" + graceful-fs "^4.2.9" + 
json-parse-better-errors "^1.0.2" + loader-runner "^4.2.0" + mime-types "^2.1.27" + neo-async "^2.6.2" + schema-utils "^3.1.0" + tapable "^2.1.1" + terser-webpack-plugin "^5.1.3" + watchpack "^2.3.1" + webpack-sources "^3.2.3" + +websocket-driver@>=0.5.1, websocket-driver@^0.7.4: + version "0.7.4" + resolved "https://registry.yarnpkg.com/websocket-driver/-/websocket-driver-0.7.4.tgz#89ad5295bbf64b480abcba31e4953aca706f5760" + integrity sha512-b17KeDIQVjvb0ssuSDF2cYXSg2iztliJ4B9WdsuB6J952qCPKmnVq4DyW5motImXHDC1cBT/1UezrJVsKw5zjg== + dependencies: + http-parser-js ">=0.5.1" + safe-buffer ">=5.1.0" + websocket-extensions ">=0.1.1" + +websocket-extensions@>=0.1.1: + version "0.1.4" + resolved "https://registry.yarnpkg.com/websocket-extensions/-/websocket-extensions-0.1.4.tgz#7f8473bc839dfd87608adb95d7eb075211578a42" + integrity sha512-OqedPIGOfsDlo31UNwYbCFMSaO9m9G/0faIHj5/dZFDMFqPTcx6UwqyOy3COEaEOg/9VsGIpdqn62W5KhoKSpg== + +whatwg-fetch@>=0.10.0: + version "3.6.2" + resolved "https://registry.yarnpkg.com/whatwg-fetch/-/whatwg-fetch-3.6.2.tgz#dced24f37f2624ed0281725d51d0e2e3fe677f8c" + integrity sha512-bJlen0FcuU/0EMLrdbJ7zOnW6ITZLrZMIarMUVmdKtsGvZna8vxKYaexICWPfZ8qwf9fzNq+UEIZrnSaApt6RA== + +whatwg-url@^5.0.0: + version "5.0.0" + resolved "https://registry.yarnpkg.com/whatwg-url/-/whatwg-url-5.0.0.tgz#966454e8765462e37644d3626f6742ce8b70965d" + integrity sha1-lmRU6HZUYuN2RNNib2dCzotwll0= + dependencies: + tr46 "~0.0.3" + webidl-conversions "^3.0.0" + +which@^2.0.1: + version "2.0.2" + resolved "https://registry.yarnpkg.com/which/-/which-2.0.2.tgz#7c6a8dd0a636a0327e10b59c9286eee93f3f51b1" + integrity sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA== + dependencies: + isexe "^2.0.0" + +wildcard@^2.0.0: + version "2.0.0" + resolved "https://registry.yarnpkg.com/wildcard/-/wildcard-2.0.0.tgz#a77d20e5200c6faaac979e4b3aadc7b3dd7f8fec" + integrity sha512-JcKqAHLPxcdb9KM49dufGXn2x3ssnfjbcaQdLlfZsL9rH9wgDQjUtDxbo8NE0F6SFvydeu1VhZe7hZuHsB2/pw== + +wrappy@1: + version "1.0.2" + resolved "https://registry.yarnpkg.com/wrappy/-/wrappy-1.0.2.tgz#b5243d8f3ec1aa35f1364605bc0d1036e30ab69f" + integrity sha1-tSQ9jz7BqjXxNkYFvA0QNuMKtp8= + +ws@^8.4.2: + version "8.5.0" + resolved "https://registry.yarnpkg.com/ws/-/ws-8.5.0.tgz#bfb4be96600757fe5382de12c670dab984a1ed4f" + integrity sha512-BWX0SWVgLPzYwF8lTzEy1egjhS4S4OEAHfsO8o65WOVsrnSRGaSiUaa9e0ggGlkMTtBlmOpEXiie9RUcBO86qg== + +yallist@^4.0.0: + version "4.0.0" + resolved "https://registry.yarnpkg.com/yallist/-/yallist-4.0.0.tgz#9bb92790d9c0effec63be73519e11a35019a3a72" + integrity sha512-3wdGidZyq5PB084XLES5TpOSRA3wjXAlIWMhum2kRcv/41Sn2emQ0dycQW4uZXLejwKvg6EsvbdlVL+FYEct7A== diff --git a/plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_0/worker0.1623143089861.pt.trace.json.gz b/plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_0/worker0.1623143089861.pt.trace.json.gz similarity index 100% rename from plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_0/worker0.1623143089861.pt.trace.json.gz rename to plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_0/worker0.1623143089861.pt.trace.json.gz diff --git a/plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_0/worker0.1623143566756.pt.trace.json.gz b/plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_0/worker0.1623143566756.pt.trace.json.gz similarity index 100% rename from 
plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_0/worker0.1623143566756.pt.trace.json.gz rename to plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_0/worker0.1623143566756.pt.trace.json.gz diff --git a/plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_4/worker0.1623212756351.pt.trace.json.gz b/plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_4/worker0.1623212756351.pt.trace.json.gz similarity index 100% rename from plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_4/worker0.1623212756351.pt.trace.json.gz rename to plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_4/worker0.1623212756351.pt.trace.json.gz diff --git a/plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_4/worker0.1623213129365.pt.trace.json.gz b/plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_4/worker0.1623213129365.pt.trace.json.gz similarity index 100% rename from plugins/tensorboard-plugins/tb_plugin/test/resources/resnet50_num_workers_4/worker0.1623213129365.pt.trace.json.gz rename to plugins/tensorboard-plugins/tb_plugin/samples/resnet50_num_workers_4/worker0.1623213129365.pt.trace.json.gz diff --git a/plugins/tensorboard-plugins/tb_plugin/setup.py b/plugins/tensorboard-plugins/tb_plugin/setup.py index 2d4260b2133ae00a91831a7e2867b467e029d108..3c09006122c776df8fbe8af5836711613e3f6a9c 100644 --- a/plugins/tensorboard-plugins/tb_plugin/setup.py +++ b/plugins/tensorboard-plugins/tb_plugin/setup.py @@ -1,5 +1,6 @@ # ------------------------------------------------------------------------- -# Copyright (c) Microsoft Corporation. +# Copyright (c) Microsoft Corporation. All rights reserved. +# # Copyright(c) 2023 Huawei Technologies. 
# All rights reserved # @@ -20,13 +21,8 @@ import os import pathlib import subprocess -from configparser import ConfigParser - import setuptools -config = ConfigParser() -config.read('./torch_tb_profiler/config/config.ini') - def read(rel_path): here = os.path.abspath(os.path.dirname(__file__)) @@ -87,16 +83,17 @@ setuptools.setup( name="torch-tb-profiler-ascend", version=get_version(os.path.join('torch_tb_profiler', '__init__.py')), description="PyTorch Ascend Profiler TensorBoard Plugin", - long_description=f"PyTorch Ascend Profiler TensorBoard Plugin: {config.get('URL', 'repository_url')}", - url=config.get('URL', 'repository_url'), + long_description="PyTorch Ascend Profiler TensorBoard Plugin : \ + https://gitee.com/ascend/att/tree/master/plugins/tensorboard-plugins/tb_plugin", + url="https://gitee.com/ascend/att/tree/master/plugins/tensorboard-plugins/tb_plugin", author="Ascend Team", - author_email=config.get('EMAIL', 'author_email'), + author_email="pmail_mindstudio@huawei.com", cmdclass={ "build_fe": build_fe }, packages=setuptools.find_packages(), package_data={ - "torch_tb_profiler": ["static/**", "config/**"], + "torch_tb_profiler": ["static/**"], }, entry_points={ "tensorboard_plugins": [ diff --git a/plugins/tensorboard-plugins/tb_plugin/test/test_tensorboard_end2end.py b/plugins/tensorboard-plugins/tb_plugin/test/test_tensorboard_end2end.py index 46636d11801a739935b4f385c6ce548009d09916..fae95b49050537b921e291a4771c63a6bff35690 100644 --- a/plugins/tensorboard-plugins/tb_plugin/test/test_tensorboard_end2end.py +++ b/plugins/tensorboard-plugins/tb_plugin/test/test_tensorboard_end2end.py @@ -13,7 +13,7 @@ from urllib.error import HTTPError def get_samples_dir(): - return os.path.join(os.path.dirname(os.path.abspath(__file__)), 'resources') + return os.path.join(os.path.dirname(os.path.abspath(__file__)), '../samples') class TestEnd2End(unittest.TestCase): diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/__init__.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/__init__.py index f7b951e609e5c65895a6db82d391e8d584eb37c8..fd7b265cfa7d67023075ec8d9bc59ed85f4e0f15 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/__init__.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/__init__.py @@ -4,4 +4,4 @@ # Entry point for Pytorch TensorBoard plugin package. 
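# --- Editorial note (not part of the patch): the setup.py hunk above replaces a
# ConfigParser lookup with hardcoded strings. A minimal sketch of the removed
# mechanism, assuming config.ini still exists at the path below with the
# [URL]/[EMAIL] sections shown in the deleted config file later in this diff:
from configparser import ConfigParser

config = ConfigParser()
config.read('./torch_tb_profiler/config/config.ini')
repository_url = config.get('URL', 'repository_url')  # fed url / long_description
author_email = config.get('EMAIL', 'author_email')    # fed author_email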
-__version__ = '0.4.0.11' +__version__ = '0.4.0.8' diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/config/config.ini b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/config/config.ini deleted file mode 100644 index 500d472d27b2ca574e07829a64c50d6eb2ab7e71..0000000000000000000000000000000000000000 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/config/config.ini +++ /dev/null @@ -1,11 +0,0 @@ -[URL] -pytorch_data_loading_url = https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading -pytorch_amp_url = https://pytorch.org/docs/stable/amp.html -pytorch_ckp_url = https://pytorch.org/docs/stable/checkpoint.html -cuda_nn_ddp_instead_url = https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead -compress_url = https://pytorch.org/docs/stable/ddp_comm_hooks.html -grad_acc_url = https://towardsdatascience.com/what-is-gradient-accumulation-in-deep-learning-ec034122cfa -lamb_url = https://nvidia.github.io/apex/optimizers.html#apex.optimizers.FusedLAMB -repository_url = https://gitee.com/ascend/att/tree/master/plugins/tensorboard-plugins/tb_plugin -[EMAIL] -author_email = pmail_mindstudio@huawei.com \ No newline at end of file diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/consts.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/consts.py index b3e202af61eb9df1d210cd366e7d172075e1e570..533effb8bb91f1f775fb1b98725b63854182ef53 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/consts.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/consts.py @@ -35,8 +35,6 @@ NODE_PROCESS_PATTERN = re.compile(r"""^(.*)_(\d+)""") MONITOR_RUN_REFRESH_INTERNAL_IN_SECONDS = 10 MAX_GPU_PER_NODE = 64 MAX_FILE_SIZE = 500 * 1024 * 1024 -MAX_LINUX_PATH_LENGTH = 4096 -MAX_WINDOWS_PATH_LENGTH = 260 View = namedtuple('View', 'id, name, display_name') OVERALL_VIEW = View(1, 'overall', 'Overview') diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/__init__.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/__init__.py index 296f53b7c813b2c97b498469f49b973438d9f3ae..6bd764e88d4fecd142e7a953b1adb5c4a72262b9 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/__init__.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/__init__.py @@ -1,23 +1,4 @@ -# ------------------------------------------------------------------------- -# Copyright (c) Microsoft Corporation. -# Copyright(c) 2023 Huawei Technologies. -# All rights reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Modifications: Add visualization of PyTorch Ascend profiling. 
-# -------------------------------------------------------------------------- from .cache import Cache from .file import (BaseFileSystem, StatData, abspath, basename, download_file, exists, get_filesystem, glob, isdir, join, listdir, - makedirs, read, register_filesystem, relpath, walk, stat, check_file_valid) + makedirs, read, register_filesystem, relpath, walk, stat) diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/azureblob.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/azureblob.py index 2fcd69fee8c24393458875635c17bd74a71b0fc4..b0ac49a655fd3d999ea80dfc3e6fa62e33fc5269 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/azureblob.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/azureblob.py @@ -20,9 +20,9 @@ class AzureBlobSystem(RemotePath, BaseFileSystem): raise ImportError('azure-storage-blob must be installed for Azure Blob support.') self.connection_string = os.environ.get('AZURE_STORAGE_CONNECTION_STRING', None) - def exists(self, filename): + def exists(self, dirname): """Returns whether the path is a directory or not.""" - basename, parts = self.split_blob_path(filename) + basename, parts = self.split_blob_path(dirname) if basename is None or parts is None: return False if basename == '': @@ -31,10 +31,10 @@ class AzureBlobSystem(RemotePath, BaseFileSystem): else: return basename == parts[0] - def read(self, file, binary_mode=False, size=None, continue_from=None): + def read(self, filename, binary_mode=False, size=None, continue_from=None): """Reads contents of a file to a string.""" - logger.info('azure blob: starting reading file %s' % file) - account, container, path = self.container_and_path(file) + logger.info('azure blob: starting reading file %s' % filename) + account, container, path = self.container_and_path(filename) client = self.create_container_client(account, container) blob_client = client.get_blob_client(path) if not blob_client.exists(): @@ -47,7 +47,7 @@ class AzureBlobSystem(RemotePath, BaseFileSystem): continuation_token = downloader.size data = downloader.readall() - logger.info('azure blob: file %s download is done, size is %d' % (file, len(data))) + logger.info('azure blob: file %s download is done, size is %d' % (filename, len(data))) if binary_mode: return as_bytes(data), continuation_token else: @@ -122,7 +122,7 @@ class AzureBlobSystem(RemotePath, BaseFileSystem): items.append(item) return items - def makedirs(self, path): + def makedirs(self, dirname): """No need create directory since the upload blob will automatically create""" pass diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/file.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/file.py index 9ef5d8485264f18426c18147663f2e1b9fb6900e..dc9abb056860d7a7708533bba55995a1ac6a5e79 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/file.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/file.py @@ -15,34 +15,32 @@ The following functionalities are added after forking: """ import glob as py_glob import os -import platform -import sys import tempfile from .. 
import utils from .base import BaseFileSystem, LocalPath, RemotePath, StatData from .utils import as_bytes, as_text, parse_blob_url -from ..consts import MAX_FILE_SIZE, MAX_WINDOWS_PATH_LENGTH, MAX_LINUX_PATH_LENGTH logger = utils.get_logger() -S3_ENABLED = True try: import boto3 import botocore.exceptions + + S3_ENABLED = True except ImportError: S3_ENABLED = False -BLOB_ENABLED = True try: from azure.storage.blob import ContainerClient + BLOB_ENABLED = True except ImportError: BLOB_ENABLED = False -GS_ENABLED = True try: # Imports the Google Cloud client library from google.cloud import storage + GS_ENABLED = True except ImportError: GS_ENABLED = False @@ -88,23 +86,19 @@ class LocalFileSystem(LocalPath, BaseFileSystem): def __init__(self): pass - @staticmethod - def islink(path): - return os.path.islink(path) - def exists(self, filename): return os.path.exists(filename) - def read(self, file, binary_mode=False, size=None, continue_from=None): + def read(self, filename, binary_mode=False, size=None, continue_from=None): mode = "rb" if binary_mode else "r" encoding = None if binary_mode else "utf8" - if not self.exists(file): - raise FileNotFoundError(file) + if not self.exists(filename): + raise FileNotFoundError(filename) offset = None if continue_from is not None: offset = continue_from.get("opaque_offset", None) - with open(file, mode, encoding=encoding) as f: + with open(filename, mode, encoding=encoding) as f: if offset is not None: f.seek(offset) data = f.read(size) @@ -166,6 +160,10 @@ class LocalFileSystem(LocalPath, BaseFileSystem): return StatData(file_length) def walk(self, top, topdown=True, onerror=None): + # Note on followlinks=True: per the tensorboard documentation [1], users are encouraged to + # use symlink trees to have fine-grained control over the filesystem layout of runs. To + # support such trees, we must follow links. + # [1] https://github.com/tensorflow/tensorboard/blob/master/README.md#logdir--logdir_spec-legacy-mode yield from os.walk(top, topdown, onerror, followlinks=True) @@ -200,10 +198,10 @@ class S3FileSystem(RemotePath, BaseFileSystem): return True return False - def read(self, file, binary_mode=False, size=None, continue_from=None): + def read(self, filename, binary_mode=False, size=None, continue_from=None): """Reads contents of a file to a string.""" s3 = boto3.resource("s3", endpoint_url=self._s3_endpoint) - bucket, path = self.bucket_and_path(file) + bucket, path = self.bucket_and_path(filename) args = {} # S3 use continuation tokens of the form: {byte_offset: number} @@ -218,7 +216,7 @@ class S3FileSystem(RemotePath, BaseFileSystem): if offset != 0 or endpoint != "": args["Range"] = "bytes={}-{}".format(offset, endpoint) - logger.info("s3: starting reading file %s" % file) + logger.info("s3: starting reading file %s" % filename) try: stream = s3.Object(bucket, path).get(**args)["Body"].read() except botocore.exceptions.ClientError as exc: @@ -240,7 +238,7 @@ class S3FileSystem(RemotePath, BaseFileSystem): raise logger.info("s3: file %s download is done, size is %d" % - (file, len(stream))) + (filename, len(stream))) # `stream` should contain raw bytes here (i.e., there has been neither decoding nor newline translation), # so the byte offset increases by the expected amount. 
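# --- Editorial note (not part of the patch): before and after this rename,
# read() hands back the data together with a continuation token; for S3 the
# token is {"byte_offset": <next offset>}, computed just below. A hedged sketch
# of how a caller might resume a partial binary read with such tokens (`fs` is
# a hypothetical object exposing this read() signature, and a short chunk is
# assumed to signal EOF):
def read_in_chunks(fs, filename, chunk_size=4 * 1024 * 1024):
    data, token = fs.read(filename, binary_mode=True, size=chunk_size)
    chunks = [data]
    while len(data) == chunk_size:  # assumption: short read means end of file
        data, token = fs.read(filename, binary_mode=True,
                              size=chunk_size, continue_from=token)
        chunks.append(data)
    return b''.join(chunks)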
continuation_token = {"byte_offset": (offset + len(stream))} @@ -263,6 +261,9 @@ class S3FileSystem(RemotePath, BaseFileSystem): def download_file(self, file_to_download, file_to_save): logger.info("s3: starting downloading file %s as %s" % (file_to_download, file_to_save)) + # Use boto3.resource instead of boto3.client('s3') to support minio. + # https://docs.min.io/docs/how-to-use-aws-sdk-for-python-with-minio-server.html + # To support minio, the S3_ENDPOINT need to be set like: S3_ENDPOINT=http://localhost:9000 s3 = boto3.resource("s3", endpoint_url=self._s3_endpoint) bucket, path = self.bucket_and_path(file_to_download) s3.Bucket(bucket).download_file(path, file_to_save) @@ -320,14 +321,14 @@ class S3FileSystem(RemotePath, BaseFileSystem): keys.append(key) return keys - def makedirs(self, path): + def makedirs(self, dirname): """Creates a directory and all parent/intermediate directories.""" - if not self.exists(path): + if not self.exists(dirname): client = boto3.client("s3", endpoint_url=self._s3_endpoint) - bucket, dir_path = self.bucket_and_path(path) - if not dir_path.endswith("/"): - dir_path += "/" - client.put_object(Body="", Bucket=bucket, Key=dir_path) + bucket, path = self.bucket_and_path(dirname) + if not path.endswith("/"): + path += "/" + client.put_object(Body="", Bucket=bucket, Key=path) def stat(self, filename): """Returns file statistics for a given path.""" @@ -465,7 +466,7 @@ class File(object): if line and (line[-1] == "\n" or not self.buff): return line if not self.buff: - return None + raise StopIteration() else: index = self.buff.find("\n", self.buff_offset) if index != -1: @@ -480,7 +481,7 @@ class File(object): if line and (line[-1] == "\n" or not self.buff): return line if not self.buff: - return None + raise StopIteration() def next(self): return self.__next__() @@ -619,40 +620,3 @@ def stat(filename): def read(file): with File(file, 'rb') as f: return f.read() - - -def is_link(path): - return LocalFileSystem.islink(path) - - -def is_too_big_file(filepath): - return stat(filepath).length > MAX_FILE_SIZE - - -def has_too_long_path(filepath): - if platform.system() == 'Windows' and len(filepath) > MAX_WINDOWS_PATH_LENGTH: - logger.warning( - f'The path length of the file "{filepath}" exceeds the maximum limit of {MAX_WINDOWS_PATH_LENGTH} ' - f'and will be skipped.') - return True - elif len(filepath) > MAX_WINDOWS_PATH_LENGTH: - logger.warning( - f'The path length of the file "{filepath}" exceeds the maximum limit of {MAX_LINUX_PATH_LENGTH} ' - f'and will be skipped.') - return True - else: - return False - - -def check_file_valid(filepath): - if is_link(filepath): - logger.warning(f'File "{filepath}" is a soft link and will be skipped.') - return False - if is_too_big_file(filepath): - logger.warning( - f'File "{filepath}" exceeds the maximum limit size of 500MB and will be skipped.') - return False - if has_too_long_path(filepath): - return False - return True - diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/gs.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/gs.py index 8596bce2b892b7188155d05330a6356a83323eff..d3a46877326b12a5e8be49a65cf4c90be8157311 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/gs.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/io/gs.py @@ -16,14 +16,14 @@ class GoogleBlobSystem(RemotePath, BaseFileSystem): if not storage: raise ImportError('google-cloud-storage must be installed for Google Cloud Blob support.') - def exists(self, filename): + def 
exists(self, dirname): """Returns whether the path is a directory or not.""" - bucket_name, path = self.bucket_and_path(filename) + bucket_name, path = self.bucket_and_path(dirname) client = self.create_google_cloud_client() bucket = client.bucket(bucket_name) return bucket.blob(path).exists() - def read(self, file, binary_mode=False, size=None, continue_from=None): + def read(self, filename, binary_mode=False, size=None, continue_from=None): raise NotImplementedError def write(self, filename, file_content, binary_mode=False): @@ -62,7 +62,7 @@ class GoogleBlobSystem(RemotePath, BaseFileSystem): items.append(item) return items - def makedirs(self, path): + def makedirs(self, dirname): """No need create directory since the upload blob will automatically create""" pass diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/plugin.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/plugin.py index 2651f87c087a419c950f93b201606e7601a33a08..6091fdbcd906bf49e4e631afe7d2ba57e65ce711 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/plugin.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/plugin.py @@ -1,5 +1,6 @@ # ------------------------------------------------------------------------- -# Copyright (c) Microsoft Corporation. +# Copyright (c) Microsoft Corporation. All rights reserved. +# # Copyright(c) 2023 Huawei Technologies. # All rights reserved # @@ -46,7 +47,6 @@ def decorate_headers(func): headers = func(*args, **kwargs) headers.extend(TorchProfilerPlugin.headers) return headers - return wrapper @@ -344,23 +344,14 @@ class TorchProfilerPlugin(base_plugin.TBPlugin): end_ts = float(end_ts) for key in operator_memory_events: if start_ts is not None and end_ts is not None: - operator_memory_events[key] = [ - i - for i in operator_memory_events[key] - if i[2] and start_ts <= i[2] <= end_ts - ] + operator_memory_events[key] = [i for i in operator_memory_events[key] if + i[2] and start_ts <= i[2] <= end_ts] elif start_ts is not None: - operator_memory_events[key] = [ - i - for i in operator_memory_events[key] - if i[2] and start_ts <= i[2] - ] + operator_memory_events[key] = [i for i in operator_memory_events[key] if + i[2] and start_ts <= i[2]] elif end_ts is not None: - operator_memory_events[key] = [ - i - for i in operator_memory_events[key] - if i[2] and end_ts >= i[2] - ] + operator_memory_events[key] = [i for i in operator_memory_events[key] if + i[2] and end_ts >= i[2]] return self.respond_as_json(temp_memory_events, True) else: if start_ts is not None: @@ -482,8 +473,9 @@ class TorchProfilerPlugin(base_plugin.TBPlugin): def _monitor_runs(self): logger.info('Monitor runs begin') - touched = set() + try: + touched = set() while True: try: logger.debug('Scan run dir') diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/__init__.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/__init__.py index 59a0e64155546ce75e1c4607cf35c3144a28271f..9ca062abf58245753361a96890a2ee1ccdec42fb 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/__init__.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/__init__.py @@ -1,6 +1,7 @@ # ------------------------------------------------------------------------- # Copyright (c) Microsoft Corporation. All rights reserved. 
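# --- Editorial note (not part of the patch): the plugin.py hunk above only
# reflows the memory-event list comprehensions; the filtering is unchanged.
# Written out as a plain predicate (index 2 of each record holds the event
# timestamp), it is equivalent to all three branches:
def in_window(record, start_ts=None, end_ts=None):
    ts = record[2]
    if not ts:
        return False
    if start_ts is not None and ts < start_ts:
        return False
    if end_ts is not None and ts > end_ts:
        return False
    return True

# operator_memory_events[key] = [i for i in operator_memory_events[key]
#                                if in_window(i, start_ts, end_ts)]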
# -------------------------------------------------------------------------- -__all__ = ['RunLoader'] from .loader import RunLoader + +__all__ = ['RunLoader'] diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/communication.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/communication.py index 0afcdb11a66f89b8a448713bf140e3293db7e503..00f8dc98139d5bbb96daffb5989b9c3c660f2cbc 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/communication.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/communication.py @@ -59,7 +59,7 @@ def analyze_communication_nodes(comm_node_list: List[CommunicationNode])\ total_comm_stats[comm_node.name][0] += 1 bytes_one_value = 0 if comm_node.input_shape: - for i, shape in enumerate(comm_node.input_shape): + for i in range(len(comm_node.input_shape)): if comm_node.input_type[i] == 'long int': bytes_one_value = 8 elif comm_node.input_type[i] == 'float': @@ -76,7 +76,7 @@ def analyze_communication_nodes(comm_node_list: List[CommunicationNode])\ logger.warning('Found an unknown tensor type: {}'.format(comm_node.input_type[i])) bytes_one_value = 0 total_size = 1 - for size in shape: + for size in comm_node.input_shape[i]: total_size *= size total_comm_stats[comm_node.name][1] += total_size * bytes_one_value total_comm_stats[comm_node.name][2].extend(comm_node.kernel_ranges) diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/data.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/data.py index 00544e635340c556d5346fc307bb29913c08929c..d6f9bb245eb2d170cb4a63e7f912a9c69932e28b 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/data.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/data.py @@ -22,16 +22,14 @@ import gzip import io as sysio import json import math -import os.path import re import tempfile from json.decoder import JSONDecodeError from typing import Dict, List, Optional -from configparser import ConfigParser from .op_tree import OpTreeBuilder from .. import io, utils -from ..consts import InputFilesType, INPUT_FILE_LIST +from ..consts import InputFilesType, MAX_FILE_SIZE, INPUT_FILE_LIST from ..utils import href from . 
import trace from .communication import analyze_communication_nodes @@ -46,9 +44,6 @@ from .tensor_cores_parser import TensorCoresParser from .trace import BaseEvent, EventTypes, MemoryEvent logger = utils.get_logger() -config = ConfigParser() -config_path = os.path.join(os.getcwd(), 'torch_tb_profiler', 'config', '../config/config.ini') -config.read(config_path) class RunProfileData(object): @@ -169,8 +164,15 @@ class RunProfileData(object): has_communication_overlap = False has_communication_wait_ops = False + def _check_file_size_valid(filepath): + if io.stat(filepath).length > MAX_FILE_SIZE: + logger.warning( + f'File "{filepath}" exceeds the maximum limit size of 500MB and will be skipped.') + return False + return True + for file in io.listdir(path): - if utils.is_npu_trace_path(file) and io.check_file_valid(io.join(path, file)): + if utils.is_npu_trace_path(file) and _check_file_size_valid(io.join(path, file)): has_trace = True trace_file = io.join(path, file) trace_path, trace_json = RunProfileData._preprocess_file(trace_file, cache_dir, 'Ascend') @@ -192,7 +194,7 @@ class RunProfileData(object): profile.profiler_start_ts = 0 for file in io.listdir(path): - if str(file) in INPUT_FILE_LIST and io.check_file_valid(io.join(path, file)): + if str(file) in INPUT_FILE_LIST and _check_file_size_valid(io.join(path, file)): if InputFilesType(file) == InputFilesType.KERNEL_DETAILS_CSV: has_kernel = True profile.kernel_file_path = io.join(path, file) @@ -260,10 +262,10 @@ class RunProfileData(object): try: trace_json = json.loads(fout.getvalue()) logger.warning('Get JSONDecodeError: %s, Re-encode it to temp file' % e.msg) + json_reencode = True except JSONDecodeError: logger.error(f'File "{trace_path}" is not in a legal JSON format and will be skipped.') return trace_path, {} - json_reencode = True # work-around to remove the 'Record Window End' events to avoid the huge end timestamp if device_target == 'Ascend': @@ -361,7 +363,7 @@ class RunProfileData(object): dataloader_ratio = self.avg_costs.costs[ProfileRole.DataLoader] / self.avg_costs.costs[ProfileRole.Total] if dataloader_ratio > 0.05: percentage = dataloader_ratio * 100 - url = config.get('URL', 'pytorch_data_loading_url') + url = 'https://pytorch.org/docs/stable/data.html#single-and-multi-process-data-loading' self.recommendations.append( f'This run has high time cost on input data loading. {percentage:.1f}% of the step ' + "time is in DataLoader. You could try to set num_workers on DataLoader's construction " + @@ -373,11 +375,12 @@ class RunProfileData(object): if self.device_props: # Tensor Cores feature is available on GPU cards with compute capability >= 7.0 + # https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications major = self.device_props[0].get('computeMajor') # If it's a pure CPU run, then self.tc_used_ratio is None, this rule will not be triggered. if major is not None and major >= 7: if math.isclose(self.tc_used_ratio, 0.0) and self.tc_eligible_ops_kernel_ratio > 0.0: - url = config.get('URL', 'pytorch_amp_url') + url = 'https://pytorch.org/docs/stable/amp.html' self.recommendations.append( f'Kernels with {round(self.tc_eligible_ops_kernel_ratio * 100)}%' ' time are launched by Tensor Cores eligible operators. 
' @@ -392,8 +395,8 @@ class RunProfileData(object): if total_mem is not None and peak_mem > total_mem * 0.9: percentage = peak_mem / total_mem * 100 if total_mem > 0 else 0 total_mem_gb = total_mem / 1024 / 1024 / 1024 - ckp_url = config.get('URL', 'pytorch_ckp_url') - amp_url = config.get('URL', 'pytorch_amp_url') + ckp_url = 'https://pytorch.org/docs/stable/checkpoint.html' + amp_url = 'https://pytorch.org/docs/stable/amp.html' self.recommendations.append( f'Device memory usage is at the limit of device memory capacity ' f'({percentage:.1f}% of {total_mem_gb:.1f}GB on GPU{dev_id}). ' @@ -403,7 +406,7 @@ class RunProfileData(object): def _analyze_distributed_metrics(self): if self.use_dp and len(self.used_devices) > 1: - url = config.get('URL', 'cuda_nn_ddp_instead_url') + url = 'https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead' self.recommendations.append( f"It is recommended to {href('use DistributedDataParallel instead of DataParallel', url)}" ' to do multi-GPU training.') @@ -425,9 +428,9 @@ class RunProfileData(object): communication_ratio = self.avg_costs.costs[ProfileRole.Communication] / self.avg_costs.costs[ProfileRole.Total] if communication_ratio > 0.1: percentage = communication_ratio * 100 - compress_url = config.get('URL', 'compress_url') - grad_acc_url = config.get('URL', 'grad_acc_url') - lamb_url = config.get('URL', 'lamb_url') + compress_url = 'https://pytorch.org/docs/stable/ddp_comm_hooks.html', + grad_acc_url = 'https://towardsdatascience.com/what-is-gradient-accumulation-in-deep-learning-ec034122cfa' + lamb_url = 'https://nvidia.github.io/apex/optimizers.html#apex.optimizers.FusedLAMB' self.recommendations.append( f'This run has high time cost on communication. {percentage:.1f}% of the step time is in ' f"communication. 
You could try {href('Gradient Compression', compress_url)} or " diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/diffrun/tree.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/diffrun/tree.py index c5cf5fad448122c74db46467cb0c70b8ce4f727e..a164bd3d37390ba367f0d504910e45050227ffbf 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/diffrun/tree.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/diffrun/tree.py @@ -56,9 +56,8 @@ class DiffNode: def compare_operator_nodes( left_nodes: List[OperatorNode], right_nodes: List[OperatorNode]) -> Generator['DiffNode', None, None]: - """Given two OperatorNode lists, find the DataLoader/Module/Backward/Optimizer node and - create the child list DiffNode - """ + '''Given two OperatorNode lists, find the DataLoader/Module/Backward/Optimizer node and create the child list DiffNode + ''' right_keys = [(type(r), r.name) for r in right_nodes] # find matching points in the two list diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/event_parser.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/event_parser.py index 9b364e0dbba55e07b939690d45123bbf6dc6fe23..3cd7ce9ff662a152cc9e1e4150bfe4d762e7a691 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/event_parser.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/event_parser.py @@ -3,7 +3,6 @@ # ------------------------------------------------------------------------- import sys from collections import defaultdict -from dataclasses import dataclass from enum import IntEnum from typing import Dict, Iterable, List, Optional, Tuple @@ -32,19 +31,11 @@ class ProfileRole(IntEnum): Total = 8 -@dataclass -class NodeInfoParams: - event: DurationEvent - corrid_to_device: Dict[int, List[DeviceNode]] - corrid_to_runtime: Dict[int, RuntimeNode] - externalid_to_runtime: Dict[int, List[RuntimeNode]] - tid2list: Dict[int, List[OperatorNode]] - pl_tid2list: Dict[int, List[PLProfileNode]] - tid2zero_rt_list: Dict[int, List[RuntimeNode]] - - class NodeParserMixin: def __init__(self, *args, **kwargs): + """Please refer to https://stackoverflow.com/questions/9575409/calling-parent-class-init-with-multiple-inheritance-whats-the-right-way # noqa: E501 + to see the reason why we need call super().__init__ like this way + """ super().__init__(*args, **kwargs) self.communication_data: Dict[int, CommunicationNode] = {} @@ -77,9 +68,14 @@ class NodeParserMixin: for event in events: if event.type == EventTypes.MEMORY: continue - params = NodeInfoParams(event, corrid_to_device, corrid_to_runtime, externalid_to_runtime, tid2list, - pl_tid2list, tid2zero_rt_list) - self._parse_node(params) + self._parse_node( + event, + corrid_to_device, + corrid_to_runtime, + externalid_to_runtime, + tid2list, + pl_tid2list, + tid2zero_rt_list) if CommLibTypes.Nccl in self.comm_lib: for event in events: @@ -120,14 +116,14 @@ class NodeParserMixin: return comm_node is not None - def _parse_node(self, params: NodeInfoParams): - event = params.event - corrid_to_device = params.corrid_to_device - corrid_to_runtime = params.corrid_to_runtime - externalid_to_runtime = params.externalid_to_runtime - tid2list = params.tid2list - pl_tid2list = params.pl_tid2list - tid2zero_rt_list = params.tid2zero_rt_list + def _parse_node(self, + event: DurationEvent, + corrid_to_device: Dict[int, List[DeviceNode]], + corrid_to_runtime: Dict[int, RuntimeNode], + externalid_to_runtime: 
Dict[int, List[RuntimeNode]], + tid2list: Dict[int, List[OperatorNode]], + pl_tid2list: Dict[int, List[PLProfileNode]], + tid2zero_rt_list: Dict[int, List[RuntimeNode]]): corrid = event.correlation_id tid = event.tid if event.type in [EventTypes.KERNEL, EventTypes.MEMCPY, EventTypes.MEMSET]: @@ -230,8 +226,8 @@ class StepParser: self.steps.append((self.cpu_min_ts, self.cpu_max_ts)) self.steps_names.append('0') - for i, role_range in enumerate(self.role_ranges): - self.role_ranges[i] = merge_ranges(role_range) + for i in range(len(self.role_ranges)): + self.role_ranges[i] = merge_ranges(self.role_ranges[i]) def update_device_steps(self, runtime_node_list: List[RuntimeNode]): self._update_steps_duration(*self._find_device_steps(runtime_node_list)) @@ -366,9 +362,9 @@ class StepParser: # Change step time to device side on the condition that any step have device time. is_use_gpu = prev_step_end_time is not None if is_use_gpu: - for i_step, step in enumerate(self.steps): - step_start_time = max(prev_step_end_time, step[0]) - step_end_time = step[1] + for i_step in range(len(self.steps)): + step_start_time = max(prev_step_end_time, self.steps[i_step][0]) + step_end_time = self.steps[i_step][1] if steps_device[i_step][0] == sys.maxsize: # When step i_step has no device event. # Assign to step_start_time when kernel is behind host step end. step_end_time = max(step_end_time, step_start_time) @@ -406,7 +402,7 @@ class StepParser: class EventParser(NodeParserMixin, StepParser): def __init__(self): super().__init__() - self.comm_node_list: List[CommunicationNode] = None + self.comm_node_list: Dict[CommunicationNode] = None def parse(self, events: Iterable[BaseEvent], fwd_bwd_map: Dict[int, int]) -> Dict[int, List[OperatorNode]]: with utils.timing('EventParser: parse nodes'): @@ -443,10 +439,10 @@ class EventParser(NodeParserMixin, StepParser): header = f'[{ctx.tid}]' + '.'.join(ctx.name_stack[1:]) # omit the CallTreeRoot prefix_len = len(ctx.name_stack) * 4 - 4 - 1 if len(ctx.name_stack) > 1: - logger.info(header) + print(header) prefix = ' ' * prefix_len - logger.info(prefix, node.name) - logger.info(prefix, 'time:', node.start_time, '-->', node.end_time) + print(prefix, node.name) + print(prefix, 'time:', node.start_time, '-->', node.end_time) def push(node: OperatorNode): ctx.name_stack.append(node.name) diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/kernel_parser.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/kernel_parser.py index 229251e60a90d5bf4fed514d5f175199b92d3870..838fc38ce60619977c3e096791241d7fc697562d 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/kernel_parser.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/kernel_parser.py @@ -6,7 +6,7 @@ from typing import Optional import numpy as np import pandas as pd -from .tensor_core import TcAllowlist +from .tensor_core import TC_Allowlist from .trace import EventTypes @@ -19,7 +19,7 @@ class KernelParser: events = [vars(event) for event in events if event.type == EventTypes.KERNEL] events = pd.DataFrame(events) events = events.astype({'type': 'category', 'name': 'string'}, copy=False) - events['tc_used'] = events['name'].map(lambda name: name in TcAllowlist) + events['tc_used'] = events['name'].map(lambda name: name in TC_Allowlist) def weighted_avg(x: pd.Series): try: diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/memory_parser.py 
b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/memory_parser.py index 64b78127a4c7a5675e5b2f71877754c541dde94f..766782be271240dabffc76bbc389d8659e601299 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/memory_parser.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/memory_parser.py @@ -25,7 +25,7 @@ class MemoryMetrics(IntEnum): class MemoryRecord: def __init__(self, scope: str, pid: int, tid: int, ts: int, device_type: DeviceType, device_id: int, - address: int, record_bytes: int, total_allocated: float, total_reserved: float): + address: int, bytes: int, total_allocated: float, total_reserved: float): self.scope = scope self.tid = tid self.pid = pid @@ -33,7 +33,7 @@ class MemoryRecord: self.device_type = device_type self.device_id = device_id self.addr = address - self.bytes = record_bytes + self.bytes = bytes self.total_allocated = total_allocated self.total_reserved = total_reserved self.op_name: Optional[str] = None @@ -132,7 +132,7 @@ class MemorySnapshot: for i in range(self_metric_length, metric_length): memory_metrics_keyed_by_node[node][device][i] += metrics[i] - for _, root in tid2tree.items(): + for tid, root in tid2tree.items(): for child in root.children: traverse_node_memory(child) @@ -217,8 +217,7 @@ class MemoryParser: """In the loop, one pass will process one record. The basic logic is: It will search from the node that last visited since both the records and tree is ordered already 1. it current node contains the records, then find the exactly child which just embrace it. - 2. otherwise, find the parent node and set the child_index, so that the parent node could continue from - previous visited node. # noqa: E501 + 2. otherwise, find the parent node and set the child_index, so that the parent node could continue from previous visited node. # noqa: E501 3. if there is not any node contains the records, then all remaining records will be ignored. """ record = records[record_index] diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/module_op.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/module_op.py index 15f1e4ef93a5234cdf6273f9830ac1a6f3aeaa41..061a503b411bb900c6a405c0b97c8a07dd986a00 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/module_op.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/module_op.py @@ -260,3 +260,10 @@ def get_module_tree(tid2tree: Dict[int, OperatorNode]): traverse_node(child, None) return modules + + +def dump_modules(level: int, modules: Iterable[Union[Module, ModuleNode]]): + """testing purpose""" + for module in modules: + print(f"{' ' * level}{module.name.replace('nn.Module: ', '')}_{module.module_id}") + dump_modules(level + 1, module.children) diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/node.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/node.py index 0528491c28752b0358d79e27168d055546bd0310..80860e53661e9a554de6fa9b09e6f13057fca8bb 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/node.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/node.py @@ -6,7 +6,7 @@ from abc import ABC from typing import List, Optional, Tuple from .. 
import utils -from .tensor_core import TcAllowlist, TcOpAllowlist +from .tensor_core import TC_Allowlist, TC_OP_Allowlist from .trace import (DurationEvent, EventTypes, KernelEvent, ModuleEvent, OperatorEvent, PLProfileEvent, NcclOpNameSet, GlooOpNameSet) @@ -16,12 +16,12 @@ ExcludeOpName = ['DataParallel.forward', 'DistributedDataParallel.forward'] class BaseNode(ABC): - def __init__(self, name: str, start_time: int, end_time: int, node_type: str, tid: int, + def __init__(self, name: str, start_time: int, end_time: int, type: str, tid: int, external_id: Optional[int] = None): self.name = name self.start_time = start_time self.end_time = end_time - self.type = node_type + self.type = type self.tid = tid self.external_id = external_id # For consistency check. @@ -31,7 +31,7 @@ class BaseNode(ABC): kwargs['name'] = event.name kwargs['start_time'] = event.ts kwargs['end_time'] = event.ts + event.duration - kwargs['node_type'] = event.type + kwargs['type'] = event.type kwargs['tid'] = event.tid external_id = getattr(event, 'external_id', None) @@ -84,18 +84,15 @@ class OperatorNode(HostNode): self.callstack = callstack self.self_host_duration = self_host_duration self.self_device_duration = self_device_duration - self.tc_eligible = self.name in TcOpAllowlist + # self.parent_node = None + self.tc_eligible = self.name in TC_OP_Allowlist self.tc_self_duration = 0 # Time of TC kernels launched by this op excluding its children operators. self.tc_total_duration = 0 # Time of TC kernels launched by this op including its children operators. def fill_stats(self): - def sort_key(x): - if x.start_time and x.end_time: - return x.start_time, -x.end_time - else: - return sys.maxsize, -sys.maxsize - 1 self.children.sort(key=lambda x: (x.start_time, -x.end_time)) - self.runtimes.sort(key=sort_key) + self.runtimes.sort(key=lambda x: (x.start_time, -x.end_time) + if x.start_time and x.end_time else (sys.maxsize, -sys.maxsize - 1)) for child in self.children: child.fill_stats() @@ -276,7 +273,7 @@ class DeviceNode(BaseNode): self.block = block self.regs_per_thread = regs_per_thread self.shared_memory = shared_memory - self.tc_used = self.name in TcAllowlist + self.tc_used = self.name in TC_Allowlist self.device_id = device_id @classmethod @@ -309,7 +306,7 @@ def create_operator_node(event: OperatorEvent): def is_operator_node(node: BaseNode): - return bool(isinstance(node, OperatorNode) and node.type == EventTypes.OPERATOR and node.name not in ExcludeOpName + return bool(type(node) is OperatorNode and node.type == EventTypes.OPERATOR and node.name not in ExcludeOpName and not node.name.startswith("Optimizer.")) # exclude Optimizer.zero_grad diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_agg.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_agg.py index d6fdb5903d368e02c4ddb9fc3f29f536696e2a2e..08a3f0d7061dc332a78ec97a6ff085bf1840a47d 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_agg.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_agg.py @@ -49,6 +49,7 @@ def aggregate_ops(op_list: List[OperatorNode], agg.self_device_duration += op.self_device_duration agg.tc_self_duration += op.tc_self_duration agg.tc_total_duration += op.tc_total_duration + return agg agg_dicts: List[Dict[str, OperatorAgg]] = [{} for _ in range(len(keys_func))] for op in op_list: diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_tree.py 
b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_tree.py index fe919b29ced02efcea862f5e83ab52704f3f0d09..55e264617d835fb5bf94819b329fdbd2ee1c53f6 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_tree.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/op_tree.py @@ -68,10 +68,9 @@ class OpTreeBuilder: if main_tid: # only append the staled device nodes into main thread self.main_tid = op_list[0].tid - root_node = OpTreeBuilder._build_tree_internal(op_list, zero_rt_list, tid, staled_device_nodes, - is_ascend) + root_node = self._build_tree_internal(op_list, zero_rt_list, tid, staled_device_nodes, is_ascend) else: - root_node = OpTreeBuilder._build_tree_internal(op_list, zero_rt_list, tid, [], is_ascend) + root_node = self._build_tree_internal(op_list, zero_rt_list, tid, [], is_ascend) tid2tree[int(tid)] = root_node return tid2tree @@ -84,8 +83,7 @@ class OpTreeBuilder: # there are multiple tids backward_tid = self._find_backward_tid() tid2len = { - tid: root.end_time - root.start_time - for tid, root in self.tid2tree.items() + tid: root.end_time - root.start_time for tid, root in self.tid2tree.items() if tid != backward_tid or backward_tid is None } # get the maximum length as the main thread @@ -99,8 +97,7 @@ class OpTreeBuilder: return None - @staticmethod - def _build_tree_internal(host_node_list, zero_rt_list, tid, staled_device_nodes, is_ascend): + def _build_tree_internal(self, host_node_list, zero_rt_list, tid, staled_device_nodes, is_ascend): """host_node_list: list of OperatorNode and ProfilerStepNode. zero_rt_list: list of RuntimeNode with external_id=0.""" @@ -113,7 +110,7 @@ class OpTreeBuilder: name='dummy', start_time=None, end_time=None, - node_type=EventTypes.RUNTIME, + type=EventTypes.RUNTIME, tid=0, device_nodes=staled_device_nodes)) dummpy_rt[0].fill_stats() @@ -122,7 +119,7 @@ class OpTreeBuilder: name='CallTreeRoot', start_time=-sys.maxsize - 1, end_time=sys.maxsize, - node_type=EventTypes.PYTHON, + type=EventTypes.PYTHON, tid=tid, runtimes=zero_rt_list + dummpy_rt) # Give the list of RuntimeNode with external_id=0 to root node. node_stack.append(root_node) @@ -133,6 +130,7 @@ class OpTreeBuilder: if node.end_time <= tail_node.end_time or ( is_ascend and math.isclose(node.end_time, tail_node.end_time, rel_tol=1)): tail_node.children.append(node) + # node.parent_node = weakref.ref(tail_node) node_stack.append(node) else: logger.error('Error in input data: ranges on the same thread should not intersect!' @@ -276,7 +274,7 @@ class OpTreeBuilder: if isinstance(node, ModuleNode): backward_node = BackwardNode(name=node.name + '.backward', start_time=None, end_time=None, - node_type='backward', tid=0) + type='backward', tid=0) if parent is None: result.append(backward_node) else: diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/overall_parser.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/overall_parser.py index c646a33b89a673e1738fd38704516df8bfdfaade..e12fbfd1cc502accee83fb44c52b94f8253c64ce 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/overall_parser.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/overall_parser.py @@ -23,8 +23,8 @@ class OverallParser(object): @classmethod def create_from_statistics(cls, statistics: 'OverallParser.Statistics', total_duration: int): costs = [0.] 
* len(ProfileRole) - for i, cost_range in enumerate(statistics.cost_ranges): - costs[i] = get_ranges_sum(cost_range) + for i in range(len(statistics.cost_ranges)): + costs[i] = get_ranges_sum(statistics.cost_ranges[i]) costs[ProfileRole.Total] = total_duration return cls(costs) @@ -58,8 +58,8 @@ class OverallParser(object): def intersection_with_step(self, step: Tuple[int, int]): cost_ranges: List[List[Tuple[int, int]]] = [] step = [step] - for cost_range in self.cost_ranges: - cost_ranges.append(intersection_ranges_lists(step, cost_range)) + for range in self.cost_ranges: + cost_ranges.append(intersection_ranges_lists(step, range)) return OverallParser.Statistics(cost_ranges) @@ -77,9 +77,6 @@ class OverallParser(object): def aggregate(self, steps: List[Tuple[int, int]], role_ranges: List[List[Tuple[int, int]]]): logger.debug('Overall, statistics') - if len(steps) <= 0: - logger.error('Invalid steps number of 0') - return global_stats = OverallParser.Statistics.create_from_range(steps, role_ranges) if role_ranges[ProfileRole.Kernel]: comm_comp_overlap = intersection_ranges_lists( @@ -92,7 +89,7 @@ class OverallParser(object): for i, step in enumerate(steps): steps_stat = global_stats.intersection_with_step(step) self.steps_costs.append(OverallParser.Costs.create_from_statistics(steps_stat, step[1] - step[0])) - for cost_index, _ in enumerate(self.avg_costs.costs): + for cost_index in range(len(self.avg_costs.costs)): self.avg_costs.costs[cost_index] += self.steps_costs[i].costs[cost_index] comm_costs = OverallParser.StepCommunicationCosts() @@ -110,5 +107,5 @@ class OverallParser(object): self.communication_overlap.append(comm_costs) valid_steps = len(steps) - for i, _ in enumerate(self.avg_costs.costs): + for i in range(len(self.avg_costs.costs)): self.avg_costs.costs[i] /= valid_steps diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/run_generator.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/run_generator.py index 111dc34e81031a33ff9e0a2c03b0375522de24cf..f2ab0452ec733783d880abfebae948f8ec4b3e6e 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/run_generator.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/run_generator.py @@ -2,6 +2,7 @@ # Copyright (c) Microsoft Corporation. All rights reserved. # # Copyright(c) 2023 Huawei Technologies. +# All rights reserved # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
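# --- Editorial note (not part of the patch): the overall_parser.py hunks above
# swap enumerate() for range(len(...)) without changing behaviour. The per-step
# averaging they implement, reduced to a self-contained toy with two roles:
steps_costs = [
    [10.0, 2.0],  # step 1: e.g. [kernel, dataloader] durations
    [14.0, 4.0],  # step 2
]
avg_costs = [0.0, 0.0]
for costs in steps_costs:
    for i in range(len(costs)):
        avg_costs[i] += costs[i]
valid_steps = len(steps_costs)
for i in range(len(avg_costs)):
    avg_costs[i] /= valid_steps
assert avg_costs == [12.0, 3.0]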
@@ -48,140 +49,6 @@ class RunGenerator(object): self.component_curve_data = {} self.process_data = {} - @staticmethod - def check_overlap_data(title): - # csv: step / compute time / communication_not_overlap / overlap / communication / free time - length = len(title) - if length < 5: - return [] - key = ["computing", "overlapped", "communication(not overlapped)", "free"] - get_key = list() - for j in key: - for i in range(length): - if j == title[i]: - get_key.append(i) - if len(get_key) < 4: - return [] - return get_key - - @staticmethod - def get_table_head(name: str, input_shape: str, call_stack: str, value: list): - if name is None: - return {} - temp = { - 'name': name, 'calls': 0, 'host_self_duration': 0, - 'host_total_duration': 0, 'device_self_duration': 0, 'device_total_duration': 0, - 'tc_self_ratio': 0, 'tc_total_ratio': 0, 'tc_eligible': 'Yes' - } - if input_shape is not None: - temp['input_shape'] = input_shape - if call_stack is not None: - temp['call_stack'] = call_stack - else: - temp['has_call_stack'] = False - else: - if call_stack is not None: - temp['call_stack'] = call_stack - else: - temp['has_call_stack'] = False - for vl in iter(value): - if 'has_call_stack' in temp and vl[2]: - temp['has_call_stack'] = True - temp['calls'] += 1 - temp['host_self_duration'] = round(temp['host_self_duration'] + vl[3], 2) - temp['host_total_duration'] = round(temp['host_total_duration'] + vl[4], 2) - temp['device_self_duration'] = round(temp['device_self_duration'] + vl[5], 2) - temp['device_total_duration'] = round(temp['device_total_duration'] + vl[6], 2) - temp['tc_self_ratio'] = round(temp['tc_self_ratio'] + vl[7], 2) - temp['tc_total_ratio'] = round(temp['tc_total_ratio'] + vl[8], 2) - temp['tc_eligible'] = 'Yes' if temp['tc_self_ratio'] > 0 or temp['tc_total_ratio'] > 0 else 'No' - temp['tc_self_ratio'] = 0 if temp['device_self_duration'] == 0 \ - else round(temp['tc_self_ratio'] / temp['device_self_duration'] * 100, 2) - temp['tc_total_ratio'] = 0 if temp['device_total_duration'] == 0 \ - else round(temp['tc_total_ratio'] / temp['device_total_duration'] * 100, 2) - return temp - - @staticmethod - def get_wait_table_by_ops(op, ops): - total_trans = 0 - total_synchronize = 0 - for key, data in op.items(): - if str(key) == "Total Op Info" and data.get("Communication Time Info"): - total_trans += float(data.get("Communication Time Info").get("Transit Time(ms)")) - total_synchronize += float(data.get("Communication Time Info").get("Synchronization Time(ms)")) - continue - k = re.sub(r'[0-9]+', ' ', key).split(" ")[0] - if k not in ops: - ops[k] = [0, 0, 0, 0] - ops[k][0] += 1 - for _, band in data.get("Communication Bandwidth Info").items(): - ops[k][1] += float(band.get("Transit Size(MB)")) - if data.get("Communication Time Info") is not None: - ops[k][2] += data.get("Communication Time Info").get("Elapse Time(ms)") - ops[k][3] += data.get("Communication Time Info").get("Transit Time(ms)") - return total_trans, total_synchronize - - @staticmethod - def trans_shape(shape: str): - result = list() - if ';' not in shape: - result.append('[' + shape.strip() + ']') - return '[' + ', '.join(result) + ']' - if len(shape.strip()) <= 1: - result.append('[]') - return '[' + ', '.join(result) + ']' - shape_spl = shape.split("\n") - for shape_div in iter(shape_spl): - result.append('[' + str(shape_div.replace(';', '')) + ']') - return '[' + ', '.join(result) + ']' - - @staticmethod - def get_process_peaks_and_devices_type(process_data: dict, memory_metric: str): - devices_type = [] - peaks = {} 
- for device in process_data: - devices_type.append(device) - reserved_list = process_data.get(device).get('Allocated') - if reserved_list is not None: - max_reserved = 0 - for array_value in reserved_list: - max_reserved = max(array_value[1], max_reserved) - peaks[device] = f'Peak Memory Usage: {max_reserved:.1f}{memory_metric}' - return devices_type, peaks - - @staticmethod - def get_pta_ge_peaks_and_devices_type(process_data: dict, memory_metric: str): - devices_type = [] - peaks = {} - for device in process_data: - devices_type.append(device) - peaks[device] = 'Reserved Peak Memory Usage:' - for component in process_data.get(device): - max_reserved = 0 - for array_value in process_data.get(device).get(component): - max_reserved = max(array_value[2], max_reserved) - peaks[device] += f' {component}-{max_reserved:.1f}{memory_metric} |' - return devices_type, peaks - - @staticmethod - def check_csv_columns(columns: list, column_idxs: dict): - column_exist_count = 0 - for idx, column in enumerate(columns): - if column in column_idxs: - column_idxs[column] = idx - column_exist_count += 1 - return column_idxs.values(), column_exist_count - - @staticmethod - def get_csv_data(path: str): - if path is None: - return [] - datas = [] - with open(path, encoding='utf-8-sig') as f: - for row in csv.reader(f, skipinitialspace=True): - datas.append(row) - return datas - def generate_run_profile(self): profile_run = RunProfile(self.worker, self.span) profile_run.is_pytorch_lightning = self.profile_data.is_pytorch_lightning @@ -218,7 +85,7 @@ class RunGenerator(object): profile_run.gpu_metrics = self.profile_data.gpu_metrics_parser.get_gpu_metrics() - gpu_infos = {gpu_id: RunGenerator.get_gpu_info(self.profile_data.device_props, gpu_id) + gpu_infos = {gpu_id: RunGenerator._get_gpu_info(self.profile_data.device_props, gpu_id) for gpu_id in self.profile_data.gpu_metrics_parser.gpu_ids} gpu_infos = {gpu_id: gpu_info for gpu_id, gpu_info in gpu_infos.items() if gpu_info is not None} @@ -273,11 +140,11 @@ class RunGenerator(object): def _npu_get_overlap(self): path = self.profile_data.distributed_csv_path overlap_by_steps: Dict[str, List[float]] = OrderedDict() - data = RunGenerator.get_csv_data(path) + data = RunGenerator._get_csv_data(path) if len(data) <= 1: return overlap_by_steps title = [x.lower() for x in data[0]] - title_name = RunGenerator.check_overlap_data(title) + title_name = RunGenerator._check_overlap_data(title) if not title_name: logger.error(f"Incomplete content of CSV file {path}.") return overlap_by_steps @@ -287,10 +154,8 @@ class RunGenerator(object): key = step[0] if key == '': key = 'all' - overlap = [ - float(step[int(title_name[0])]), float(step[int(title_name[1])]), - float(step[int(title_name[2])]), float(step[int(title_name[3])]) - ] + overlap = [float(step[int(title_name[0])]), float(step[int(title_name[1])]), + float(step[int(title_name[2])]), float(step[int(title_name[3])])] if key in overlap_by_steps: overlap_by_steps[key] = list(np.add(overlap, overlap_by_steps[key])) else: @@ -299,6 +164,22 @@ class RunGenerator(object): logger.error(f'File "{path}" has wrong data format in row {idx + 2} and will skip it.') return overlap_by_steps + @staticmethod + def _check_overlap_data(title): + # csv: step / compute time / communication_not_overlap / overlap / communication / free time + length = len(title) + if length < 5: + return [] + key = ["computing", "overlapped", "communication(not overlapped)", "free"] + get_key = list() + for j in key: + for i in range(length): + if j == 
title[i]: + get_key.append(i) + if len(get_key) < 4: + return [] + return get_key + def _npu_get_wait_table(self): path = self.profile_data.communication_json_path if not io.exists(path): @@ -333,9 +214,9 @@ class RunGenerator(object): collection_ops = data.get("collective") p2p_ops = data.get("p2p") try: - coll_total_trans, coll_total_synchronize = RunGenerator.get_wait_table_by_ops(collection_ops, - table_ops) - p2p_total_trans, p2p_total_synchronize = RunGenerator.get_wait_table_by_ops(p2p_ops, table_ops) + coll_total_trans, coll_total_synchronize = RunGenerator._get_wait_table_by_ops(collection_ops, + table_ops) + p2p_total_trans, p2p_total_synchronize = RunGenerator._get_wait_table_by_ops(p2p_ops, table_ops) except ValueError: logger.error(f'Time and size info must be number, please check file "{path}"') return wait_by_step, table_ops @@ -346,21 +227,39 @@ class RunGenerator(object): } return wait_by_step, table_ops + @staticmethod + def _get_wait_table_by_ops(op, ops): + total_trans = 0 + total_synchronize = 0 + for key, data in op.items(): + if str(key) == "Total Op Info" and data.get("Communication Time Info"): + total_trans += float(data.get("Communication Time Info").get("Transit Time(ms)")) + total_synchronize += float(data.get("Communication Time Info").get("Synchronization Time(ms)")) + continue + k = re.sub(r'[0-9]+', ' ', key).split(" ")[0] + if k not in ops: + ops[k] = [0, 0, 0, 0] + ops[k][0] += 1 + for _, band in data.get("Communication Bandwidth Info").items(): + ops[k][1] += float(band.get("Transit Size(MB)")) + if data.get("Communication Time Info") is not None: + ops[k][2] += data.get("Communication Time Info").get("Elapse Time(ms)") + ops[k][3] += data.get("Communication Time Info").get("Transit Time(ms)") + return total_trans, total_synchronize + def _get_operator_details_by_name(self): operator_by_name = defaultdict(list) operator_by_name_and_input_shapes = defaultdict(list) path = self.profile_data.operator_path - datas = RunGenerator.get_csv_data(path) + datas = RunGenerator._get_csv_data(path) if len(datas) <= 1: return operator_by_name, operator_by_name_and_input_shapes for idx, ls in enumerate(datas[1:]): try: - temp: list = [ - ls[0], RunGenerator.trans_shape(str(ls[1])), ls[2], float(ls[3]), float(ls[4]), - float(ls[5]), float(ls[6]), float(ls[7]), float(ls[8]) - ] + temp: list = [ls[0], RunGenerator._trans_shape(str(ls[1])), ls[2], float(ls[3]), float(ls[4]), + float(ls[5]), float(ls[6]), float(ls[7]), float(ls[8])] operator_by_name[ls[0]].append(temp) - key = "{}###{}".format(str(ls[0]), RunGenerator.trans_shape(str(ls[1]))) + key = "{}###{}".format(str(ls[0]), RunGenerator._trans_shape(str(ls[1]))) operator_by_name_and_input_shapes[key].append(temp) except (ValueError, IndexError): logger.error(f'File "{path}" has wrong data format in row {idx + 2} and will skip it.') @@ -382,10 +281,8 @@ class RunGenerator(object): def _get_operator_pie(self, group_by_input_shape=False): data = {} - tag = { - 'device_self_time': 'Device Self Time (us)', 'device_total_time': 'Device Total Time (us)', - 'host_self_time': 'Host Self Time (us)', 'host_total_time': 'Host Total Time (us)' - } + tag = {'device_self_time': 'Device Self Time (us)', 'device_total_time': 'Device Total Time (us)', + 'host_self_time': 'Host Self Time (us)', 'host_total_time': 'Host Total Time (us)'} for key, value in tag.items(): data[key] = { 'title': value, @@ -410,9 +307,9 @@ class RunGenerator(object): if group_by_input_shape: name = name_key.split("###")[0] shape = name_key.split("###")[1] - 
result.append(RunGenerator.get_table_head(name, shape, None, values)) + result.append(RunGenerator._get_table_head(name, shape, None, values)) else: - result.append(RunGenerator.get_table_head(name_key, None, None, values)) + result.append(RunGenerator._get_table_head(name_key, None, None, values)) return result def _set_name_callstack_data(self, group_by_input_shape=False): @@ -447,10 +344,24 @@ class RunGenerator(object): 'data': [] } for callstack_key, value in values.items(): - table['data'].append(RunGenerator.get_table_head(name, shape, callstack_key, value)) + table['data'].append(RunGenerator._get_table_head(name, shape, callstack_key, value)) result[name_key] = table return result + @staticmethod + def _trans_shape(shape: str): + result = list() + if ';' not in shape: + result.append('[' + shape.strip() + ']') + return '[' + ', '.join(result) + ']' + if len(shape.strip()) <= 1: + result.append('[]') + return '[' + ', '.join(result) + ']' + shape_spl = shape.split("\n") + for shape_div in iter(shape_spl): + result.append('[' + str(shape_div.replace(';', '')) + ']') + return '[' + ', '.join(result) + ']' + def _get_call_stack_by_name(self): result = dict() name_callstack_data = self._set_name_callstack_data() @@ -467,10 +378,45 @@ class RunGenerator(object): 'data': [] } for callstack_key, value in values.items(): - table['data'].append(RunGenerator.get_table_head(name_key, None, callstack_key, value)) + table['data'].append(RunGenerator._get_table_head(name_key, None, callstack_key, value)) result[name_key] = table return result + @staticmethod + def _get_table_head(name: str, input_shape: str, call_stack: str, value: list): + if name is None: + return {} + temp = {'name': name, 'calls': 0, 'host_self_duration': 0, + 'host_total_duration': 0, 'device_self_duration': 0, 'device_total_duration': 0, + 'tc_self_ratio': 0, 'tc_total_ratio': 0, 'tc_eligible': 'Yes'} + if input_shape is not None: + temp['input_shape'] = input_shape + if call_stack is not None: + temp['call_stack'] = call_stack + else: + temp['has_call_stack'] = False + else: + if call_stack is not None: + temp['call_stack'] = call_stack + else: + temp['has_call_stack'] = False + for vl in iter(value): + if 'has_call_stack' in temp and vl[2]: + temp['has_call_stack'] = True + temp['calls'] += 1 + temp['host_self_duration'] = round(temp['host_self_duration'] + vl[3], 2) + temp['host_total_duration'] = round(temp['host_total_duration'] + vl[4], 2) + temp['device_self_duration'] = round(temp['device_self_duration'] + vl[5], 2) + temp['device_total_duration'] = round(temp['device_total_duration'] + vl[6], 2) + temp['tc_self_ratio'] = round(temp['tc_self_ratio'] + vl[7], 2) + temp['tc_total_ratio'] = round(temp['tc_total_ratio'] + vl[8], 2) + temp['tc_eligible'] = 'Yes' if temp['tc_self_ratio'] > 0 or temp['tc_total_ratio'] > 0 else 'No' + temp['tc_self_ratio'] = 0 if temp['device_self_duration'] == 0 \ + else round(temp['tc_self_ratio'] / temp['device_self_duration'] * 100, 2) + temp['tc_total_ratio'] = 0 if temp['device_total_duration'] == 0 \ + else round(temp['tc_total_ratio'] / temp['device_total_duration'] * 100, 2) + return temp + def _get_memory_event(self, peak_memory_events: dict): display_columns = ('Name', 'Size(KB)', 'Allocation Time(us)', 'Release Time(us)', 'Duration(us)') path = self.profile_data.memory_operator_path @@ -484,16 +430,10 @@ class RunGenerator(object): 'columns': [], 'rows': {} } - datas = RunGenerator.get_csv_data(path) - if len(datas) < 1: - return { - 'operator': table, - 'component': 
peak_memory_events - } - device_type_form_idx = -1 + datas = RunGenerator._get_csv_data(path) for idx, column in enumerate(datas[0]): if column == 'Device Type': - device_type_form_idx = idx + self.device_type_form_idx = idx if column in display_columns: if column == 'Name': table['columns'].append({'name': column, 'type': 'string'}) @@ -504,22 +444,20 @@ class RunGenerator(object): table['columns'].append({'name': column.replace('(us)', '(ms)'), 'type': 'number'}) required_column_idxs = {key: -1 for key in display_columns} (name_idx, size_idx, allocation_idx, release_idx, duration_idx), column_exist_count = \ - RunGenerator.check_csv_columns(datas[0], required_column_idxs) - if device_type_form_idx < 0 or column_exist_count < len(required_column_idxs): - raise ValueError('Required column is missing in file "operator_memory.csv"') + RunGenerator._check_csv_columns(datas[0], required_column_idxs) + if column_exist_count < len(required_column_idxs): + logger.error('Required column is missing in file "operator_memory.csv"') for idx, ls in enumerate(datas[1:]): - device_type = ls[device_type_form_idx] + device_type = ls[self.device_type_form_idx] # convert time metric 'us' to 'ms' # some operators may not have the following columns try: - nums = [ - ls[name_idx] if ls[name_idx] else '', abs(float(ls[size_idx])), + nums = [ls[name_idx] if ls[name_idx] else '', abs(float(ls[size_idx])), round((float(ls[allocation_idx]) - self.profile_data.profiler_start_ts) / 1000, 3) if ls[ allocation_idx] else None, round((float(ls[release_idx]) - self.profile_data.profiler_start_ts) / 1000, 3) if ls[ release_idx] else None, - round(float(ls[duration_idx]) / 1000, 3) if ls[duration_idx] else None - ] + round(float(ls[duration_idx]) / 1000, 3) if ls[duration_idx] else None] display_datas[device_type].append(nums) except ValueError: logger.error(f'File "{path}" has wrong data format in row {idx + 2} and will skip it.') @@ -536,8 +474,8 @@ class RunGenerator(object): time_metric: str = 'ms' memory_metric: str = 'MB' cano = Canonicalizer(time_metric, memory_metric) - process_devices_type, process_peaks = RunGenerator.get_process_peaks_and_devices_type(self.process_data, - memory_metric) + process_devices_type, process_peaks = RunGenerator._get_process_peaks_and_devices_type(self.process_data, + memory_metric) total_result = { 'metadata': { 'devices': process_devices_type, @@ -564,8 +502,8 @@ class RunGenerator(object): if len(total_result['columns'][device]) > 0: total_result['columns'][device].insert(0, {'name': f'Time ({cano.time_metric})', 'type': 'number', 'tooltip': 'Time since profiler starts.'}) - pta_ge_devices_type, pta_ge_peaks = RunGenerator.get_pta_ge_peaks_and_devices_type(self.component_curve_data, - memory_metric) + pta_ge_devices_type, pta_ge_peaks = RunGenerator._get_pta_ge_peaks_and_devices_type(self.component_curve_data, + memory_metric) component_curve_result = { 'metadata': { 'devices': pta_ge_devices_type, @@ -609,11 +547,48 @@ class RunGenerator(object): 'ptaGe': component_curve_result } + @staticmethod + def _get_process_peaks_and_devices_type(process_data: dict, memory_metric: str): + devices_type = [] + peaks = {} + for device in process_data: + devices_type.append(device) + reserved_list = process_data.get(device).get('Allocated') + if reserved_list is not None: + max_reserved = 0 + for array_value in reserved_list: + max_reserved = max(array_value[1], max_reserved) + peaks[device] = f'Peak Memory Usage: {max_reserved:.1f}{memory_metric}' + return devices_type, peaks + + @staticmethod + 
def _get_pta_ge_peaks_and_devices_type(process_data: dict, memory_metric: str): + devices_type = [] + peaks = {} + for device in process_data: + devices_type.append(device) + peaks[device] = 'Reserved Peak Memory Usage:' + for component in process_data.get(device): + max_reserved = 0 + for array_value in process_data.get(device).get(component): + max_reserved = max(array_value[2], max_reserved) + peaks[device] += f' {component}-{max_reserved:.1f}{memory_metric} |' + return devices_type, peaks + + @staticmethod + def _check_csv_columns(columns: list, column_idxs: dict): + column_exist_count = 0 + for idx, column in enumerate(columns): + if column in column_idxs: + column_idxs[column] = idx + column_exist_count += 1 + return column_idxs.values(), column_exist_count + def _handle_memory_data(self): process_data = defaultdict() pta_or_ge_data = defaultdict() path = self.profile_data.memory_curve_path - datas = RunGenerator.get_csv_data(path) + datas = RunGenerator._get_csv_data(path) required_column_idxs = { 'Component': -1, 'Device Type': -1, @@ -622,7 +597,7 @@ class RunGenerator(object): 'Total Allocated(MB)': -1 } (tag_type_idx, device_type_idx, time_idx, reserved_idx, allocated_idx), column_exist_count = \ - RunGenerator.check_csv_columns(datas[0], required_column_idxs) + RunGenerator._check_csv_columns(datas[0], required_column_idxs) if column_exist_count < len(required_column_idxs): logger.error('Required column is missing in file "memory_record.csv"') else: @@ -640,10 +615,8 @@ class RunGenerator(object): pta_or_ge_data.setdefault(device_type, {}).setdefault(ls[tag_type_idx], []).append( line_chart_data) elif ls[tag_type_idx] in ('PTA', 'GE'): - line_chart_data = [ - time_column, round(float(ls[allocated_idx]), 3), - round(float(ls[reserved_idx]), 3) - ] + line_chart_data = [time_column, round(float(ls[allocated_idx]), 3), + round(float(ls[reserved_idx]), 3)] pta_or_ge_data.setdefault(device_type, {}).setdefault(ls[tag_type_idx], []).append( line_chart_data) except ValueError: @@ -663,7 +636,7 @@ class RunGenerator(object): } peak_memory_rows = defaultdict(list) path = self.profile_data.memory_component_path - component_datas = RunGenerator.get_csv_data(path) + component_datas = RunGenerator._get_csv_data(path) if component_datas: required_column_idxs = { 'Component': -1, @@ -672,7 +645,7 @@ class RunGenerator(object): 'Device': -1 } (tag_type_idx, time_idx, reserved_idx, device_type_idx), column_exist_count = \ - RunGenerator.check_csv_columns(component_datas[0], required_column_idxs) + RunGenerator._check_csv_columns(component_datas[0], required_column_idxs) if column_exist_count < len(required_column_idxs): logger.error(f'Required column is missing in file "{path}"') else: @@ -718,16 +691,14 @@ class RunGenerator(object): '{}: {}us
' 'Percentage: {}%' '') - percentage = 0.0 if costs.costs[ProfileRole.Total] == 0 else round( - 100 * part_cost / costs.costs[ProfileRole.Total], 2) + percentage = round(100 * part_cost / costs.costs[ProfileRole.Total], 2) return format_str.format(step_name, costs.costs[ProfileRole.Total], part_name, part_cost, percentage) def build_avg_cost_dict(part_name: str, part_cost: float): - profiler_total_cost = self.profile_data.avg_costs.costs[ProfileRole.Total] cost_dict = {'name': part_name, 'description': '', 'value': round(part_cost), - 'extra': 0.0 if profiler_total_cost == 0 else round(100 * part_cost / profiler_total_cost, 2)} + 'extra': round(100 * part_cost / self.profile_data.avg_costs.costs[ProfileRole.Total], 2)} return cost_dict show_gpu = (self.profile_data.has_runtime @@ -746,7 +717,8 @@ class RunGenerator(object): data['steps']['columns'].extend(['DataLoader', 'CPU Exec', 'Other']) data['steps']['rows'] = [] - for i, costs in enumerate(self.profile_data.steps_costs): + for i in range(len(self.profile_data.steps_costs)): + costs = self.profile_data.steps_costs[i] step_name = self.profile_data.steps_names[i] row = [{'value': step_name}] if show_gpu: @@ -791,11 +763,9 @@ class RunGenerator(object): build_avg_cost_dict('Other', self.profile_data.avg_costs.costs[ProfileRole.Other]) ]) - data['performance'] = [ - {'name': 'Average Step Time', 'description': '', + data['performance'] = [{'name': 'Average Step Time', 'description': '', 'value': round(self.profile_data.avg_costs.costs[ProfileRole.Total]), - 'extra': 100, 'children': avg_costs} - ] + 'extra': 100, 'children': avg_costs}] if len(self.profile_data.recommendations) == 0: html = '
<li>N/A</li>
  • ' @@ -945,8 +915,7 @@ class RunGenerator(object): }, 'data': table } - table['columns'] = [ - {'type': 'string', 'name': 'Name'}, + table['columns'] = [{'type': 'string', 'name': 'Name'}, {'type': 'string', 'name': 'Operator'}, {'type': 'string', 'name': 'Grid'}, {'type': 'string', 'name': 'Block'}, @@ -955,8 +924,7 @@ class RunGenerator(object): {'type': 'string', 'name': 'Kernel Uses Tensor Cores', 'tooltip': consts.TOOLTIP_KERNEL_USES_TC}, {'type': 'string', 'name': 'Op is Tensor Cores eligible', - 'tooltip': consts.TOOLTIP_KERNEL_OP_TC_ELIGIBLE} - ] + 'tooltip': consts.TOOLTIP_KERNEL_OP_TC_ELIGIBLE}] col_names = ['Calls', 'Total Duration (us)', 'Mean Duration (us)', 'Max Duration (us)', 'Min Duration (us)'] for column in col_names: table['columns'].append({'type': 'number', 'name': column}) @@ -967,16 +935,14 @@ class RunGenerator(object): kernel_list: List[KernelAggByNameOp] = sorted( self.profile_data.kernel_list_groupby_name_op, key=lambda x: x.total_duration, reverse=True) for agg_by_name_op in kernel_list: - kernel_op_row = [ - agg_by_name_op.name, agg_by_name_op.op_name, + kernel_op_row = [agg_by_name_op.name, agg_by_name_op.op_name, str(agg_by_name_op.grid), str(agg_by_name_op.block), str(agg_by_name_op.regs_per_thread or '0'), str(agg_by_name_op.shared_memory or '0'), 'Yes' if agg_by_name_op.tc_used else 'No', 'Yes' if agg_by_name_op.op_tc_eligible else 'No', agg_by_name_op.calls, agg_by_name_op.total_duration, round(agg_by_name_op.avg_duration), - agg_by_name_op.max_duration, agg_by_name_op.min_duration - ] + agg_by_name_op.max_duration, agg_by_name_op.min_duration] if self.profile_data.gpu_metrics_parser.has_blocks_per_sm: kernel_op_row.append(round(agg_by_name_op.avg_blocks_per_sm, 2)) if self.profile_data.gpu_metrics_parser.has_occupancy: @@ -999,11 +965,9 @@ class RunGenerator(object): }, 'data': table } - table['columns'] = [ - {'type': 'string', 'name': 'Name'}, + table['columns'] = [{'type': 'string', 'name': 'Name'}, {'type': 'string', 'name': 'Tensor Cores Used', - 'tooltip': consts.TOOLTIP_KERNEL_USES_TC} - ] + 'tooltip': consts.TOOLTIP_KERNEL_USES_TC}] columns = ['count', 'sum', 'mean', 'max', 'min'] round_digits = [0, 0, 0, 0, 0] if self.profile_data.gpu_metrics_parser.has_blocks_per_sm: @@ -1047,8 +1011,7 @@ class RunGenerator(object): {'type': 'number', 'name': 'Total Durations(us)'}, {'type': 'number', 'name': 'Min Durations(us)'}, {'type': 'number', 'name': 'Avg Durations(us)'}, - {'type': 'number', 'name': 'Max Durations(us)'} - ] + {'type': 'number', 'name': 'Max Durations(us)'}] table['rows'] = [] for key, value in self.statistic_data.items(): temp = [key] @@ -1074,14 +1037,14 @@ class RunGenerator(object): 'data': table } path = self.profile_data.kernel_file_path - datas = RunGenerator.get_csv_data(path) + datas = RunGenerator._get_csv_data(path) required_column_idxs = { 'Name': -1, 'Duration(us)': -1, 'Accelerator Core': -1 } (name_idx, duration_idx, core_type_idx), column_exist_count = \ - RunGenerator.check_csv_columns(datas[0], required_column_idxs) + RunGenerator._check_csv_columns(datas[0], required_column_idxs) if column_exist_count < 3: logger.error('Required column is missing in file "kernel_details.csv"') else: @@ -1095,6 +1058,16 @@ class RunGenerator(object): table['rows'] = datas[1:] return result + @staticmethod + def _get_csv_data(path: str): + if path is None: + return [] + datas = [] + with open(path, encoding='utf-8-sig') as f: + for row in csv.reader(f, skipinitialspace=True): + datas.append(row) + return datas + def 
_generate_tc_pie_npu(self): pie = {'columns': [{'type': 'string', 'name': 'name'}, {'type': 'number', 'name': 'value'}], 'rows': []} for key, val in self.accelerator_data.items(): @@ -1103,7 +1076,7 @@ class RunGenerator(object): return data @staticmethod - def get_gpu_info(device_props, gpu_id): + def _get_gpu_info(device_props, gpu_id): if (device_props is None) or (gpu_id >= len(device_props)) or (gpu_id < 0): return None @@ -1144,17 +1117,12 @@ class RunGenerator(object): self.accelerator_data[call_type] = call_duration if self.statistic_data.get(call_name) is not None: - temp = self.statistic_data.get(call_name, {}) - temp['Max'] = max(temp.get('Max', 0), call_duration) - temp['Min'] = min(temp.get('Min', 0), call_duration) - temp['Total'] = round(temp.get('Total', 0) + call_duration, 2) - temp['Calls'] = temp.get('Calls', 0) + 1 - if temp['Calls'] == 0: - logger.error( - f'temp["Calls"] is zero which can not be divisor.') - temp['Average'] = 0 - else: - temp['Average'] = round(temp['Total'] / temp['Calls'], 2) + temp = self.statistic_data[call_name] + temp['Max'] = max(temp['Max'], call_duration) + temp['Min'] = min(temp['Min'], call_duration) + temp['Total'] = round(temp['Total'] + call_duration, 2) + temp['Calls'] += 1 + temp['Average'] = round(temp['Total'] / temp['Calls'], 2) else: self.statistic_data[call_name] = { 'Calls': 1, @@ -1204,7 +1172,7 @@ class DistributedRunGenerator(object): process_id = 'Process ' + str(process_id) result[node][process_id] = OrderedDict() for used_device in data.used_devices: - gpu_info = RunGenerator.get_gpu_info(data.device_props, used_device) + gpu_info = RunGenerator._get_gpu_info(data.device_props, used_device) if gpu_info is not None: result[node][process_id]['GPU' + str(used_device)] = gpu_info @@ -1255,9 +1223,7 @@ class DistributedRunGenerator(object): round(costs.other, 3) ] steps_to_overlap['all'][data.worker] = [ - sum(x) - for x in zip(steps_to_overlap['all'][data.worker], steps_to_overlap[step_name][data.worker]) - ] + sum(x) for x in zip(steps_to_overlap['all'][data.worker], steps_to_overlap[step_name][data.worker])] @staticmethod def _get_npu_overlap_data(data, steps_to_overlap): @@ -1269,9 +1235,7 @@ class DistributedRunGenerator(object): steps_to_overlap[k][data.worker] = list( [round(v[0] - v[1], 3), round(v[1], 3), round(v[2], 3), round(v[3], 3)]) steps_to_overlap['all'][data.worker] = [ - sum(x) - for x in zip(steps_to_overlap['all'][data.worker], steps_to_overlap[k][data.worker]) - ] + sum(x) for x in zip(steps_to_overlap['all'][data.worker], steps_to_overlap[k][data.worker])] @staticmethod def _get_npu_wait_data(data, steps_to_wait): @@ -1286,9 +1250,7 @@ class DistributedRunGenerator(object): wait = round(v.get('Synchronize') * 1000, 3) # 1ms = 1000us steps_to_wait[k][data.worker] = list([trans, wait]) steps_to_wait['all'][data.worker] = [ - sum(x) - for x in zip(steps_to_wait['all'][data.worker], steps_to_wait[k][data.worker]) - ] + sum(x) for x in zip(steps_to_wait['all'][data.worker], steps_to_wait[k][data.worker])] steps_to_wait['all'][data.worker] = [x / step_number for x in steps_to_wait['all'][data.worker]] @staticmethod @@ -1302,9 +1264,7 @@ class DistributedRunGenerator(object): round(comm_stats[0] - comm_stats[1], 3) ] steps_to_wait['all'][data.worker] = [ - sum(x) - for x in zip(steps_to_wait['all'][data.worker], steps_to_wait[step][data.worker]) - ] + sum(x) for x in zip(steps_to_wait['all'][data.worker], steps_to_wait[step][data.worker])] steps_to_wait['all'][data.worker] = [int(x / step_number) for x in 
steps_to_wait['all'][data.worker]]
 
     def _generate_wait_graph(self):
@@ -1392,11 +1352,10 @@ class DistributedRunGenerator(object):
                 op,
                 stats[0],
                 round(stats[1], 3),
-
-                round(stats[1] / stats[0] if stats[0] != 0 else 0),
+                round(stats[1] / stats[0] if stats[0] != 0 else 0),
                 round(stats[2], 3),
-                round(stats[2] / stats[0] if stats[0] != 0 else 0),
+                round(stats[2] / stats[0] if stats[0] != 0 else 0),
                 round(stats[3], 3),
-                round(stats[3] / stats[0] if stats[0] != 0 else 0)
+                round(stats[3] / stats[0] if stats[0] != 0 else 0)
             ]
             table['rows'].append(row)
diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/tensor_core.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/tensor_core.py
index cc53ab217f0ee6f88817c51da6ba46da68df4e28..3a69cf70b881acc4588682fc4440cb5534541eb1 100644
--- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/tensor_core.py
+++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/tensor_core.py
@@ -1,13 +1,14 @@
 # -------------------------------------------------------------------------
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # -------------------------------------------------------------------------
-class TcAllowlistMeta(type):
-    # Enable grammar sugar as 'v in TcAllowlist'.
+class TC_Allowlist_Meta(type):
+    # Enable grammar sugar as 'v in TC_Allowlist'.
     def __contains__(cls, item):
         return cls.__contains__(item)
 
 
-class TcAllowlist(metaclass=TcAllowlistMeta):
+class TC_Allowlist(metaclass=TC_Allowlist_Meta):
+    # Refer to https://github.com/NVIDIA/PyProf/blob/fd1b2902e3306119eee40ba6b6e8b2f816920c29/pyprof/prof/tc.py#L19
     allowlist = ['h884', 's884', 'h1688', 's1688', 'hmma', 'i8816', '16816',
                  'dgrad_1x1_stride_2x2', 'first_layer_wgrad_kernel', 'conv1x1',
                  'conv2d_c1_k1', 'direct_group', 'xmma_implicit_gemm',
@@ -23,7 +24,8 @@ class TcAllowlist(metaclass=TcAllowlistMeta):
         return False
 
 
-class TcOpAllowlist(metaclass=TcAllowlistMeta):
+class TC_OP_Allowlist(metaclass=TC_Allowlist_Meta):
+    # Refer to https://github.com/pytorch/pytorch/blob/69b2bf70f9c0e591ce5e566afa59e19618031ead/aten/src/ATen/autocast_mode.cpp#L290-L351  # noqa: E501
     allowlist = ['aten::_convolution', 'aten::conv1d', 'aten::conv2d', 'aten::conv3d', 'aten::conv_tbc',
                  'aten::conv_transpose1d', 'aten::conv_transpose2d', 'aten::conv_transpose3d',
                  'aten::convolution', 'aten::cudnn_convolution', 'aten::cudnn_convolution_transpose',
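
For context on the `TC_Allowlist` / `TC_OP_Allowlist` classes above: defining `__contains__` on the metaclass is what lets callers write `kernel_name in TC_Allowlist` directly on the class, and a hit effectively means the kernel name contains one of the allowlisted substrings. A minimal self-contained sketch of that pattern (the names below are illustrative, not the plugin's API):

```python
class AllowlistMeta(type):
    # Defining __contains__ on the metaclass makes `x in SomeClass`
    # work on the class itself, with no instance needed.
    def __contains__(cls, item):
        return cls.matches(item)


class DemoAllowlist(metaclass=AllowlistMeta):
    allowlist = ['h884', 's884', 'hmma']  # illustrative subset of the real list

    @classmethod
    def matches(cls, item):
        # A kernel name counts as a hit if it contains any allowlisted pattern.
        return any(pattern in item for pattern in cls.allowlist)


print('volta_h884gemm_128x128' in DemoAllowlist)         # True
print('vectorized_elementwise_kernel' in DemoAllowlist)  # False
```
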
diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/trace.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/trace.py
index ea09f79666bd184956469f48fc7922854394940d..e76f8b18dd80a9f12a867c9395de6a96a39bc2c1 100644
--- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/trace.py
+++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/profiler/trace.py
@@ -1,13 +1,13 @@
 # -------------------------------------------------------------------------
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # --------------------------------------------------------------------------
-__all__ = ['EventTypes', 'create_event']
-
 from enum import IntEnum
 from typing import Dict, Optional
 
 from .. import utils
 
+__all__ = ['EventTypes', 'create_event']
+
 logger = utils.get_logger()
 
 NcclOpNameSet = ['nccl:broadcast', 'nccl:reduce', 'nccl:all_reduce', 'nccl:all_gather', 'nccl:reduce_scatter']
@@ -56,8 +56,8 @@ EventTypeMap = {
 
 
 class BaseEvent(object):
-    def __init__(self, event_type, data):
-        self.type: str = event_type
+    def __init__(self, type, data):
+        self.type: str = type
         self.name: str = data.get('name')
         self.ts: int = data.get('ts')
         self.pid: int = data.get('pid')
@@ -66,8 +66,8 @@ class BaseEvent(object):
 
 
 class DurationEvent(BaseEvent):
-    def __init__(self, event_type, data):
-        super().__init__(event_type, data)
+    def __init__(self, type, data):
+        super().__init__(type, data)
         self.category: str = data.get('cat', '')
         self.duration: int = data.get('dur')
 
@@ -79,8 +79,8 @@ class DurationEvent(BaseEvent):
 
 
 class KernelEvent(DurationEvent):
-    def __init__(self, event_type, data):
-        super().__init__(event_type, data)
+    def __init__(self, type, data):
+        super().__init__(type, data)
         self.occupancy = self.args.get('est. achieved occupancy %')
         self.blocks_per_sm = self.args.get('blocks per SM')
         self.grid = self.args.get('grid')
@@ -91,8 +91,8 @@ class KernelEvent(DurationEvent):
 
 
 class OperatorEvent(DurationEvent):
-    def __init__(self, event_type, data):
-        super().__init__(event_type, data)
+    def __init__(self, type, data):
+        super().__init__(type, data)
         self.callstack = self.args.get('Call stack')
         self.input_type = self.args.get('Input type')
 
@@ -111,8 +111,8 @@ class ProfilerStepEvent(OperatorEvent):
 
 
 class MemoryEvent(BaseEvent):
-    def __init__(self, event_type, data):
-        super().__init__(event_type, data)
+    def __init__(self, type, data):
+        super().__init__(type, data)
         self.scope: str = data.get('s', '')
         self.device_id: int = self.args.get('Device Id')
         dtype = self.args.get('Device Type')
@@ -142,8 +142,8 @@ class MemoryEvent(BaseEvent):
 
 
 class PythonFunctionEvent(DurationEvent):
-    def __init__(self, event_type, data):
-        super().__init__(event_type, data)
+    def __init__(self, type, data):
+        super().__init__(type, data)
         self.python_id: int = self.args.get('Python id')
         self.python_parent_id: int = self.args.get('Python parent id')
 
diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/run.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/run.py
index 9e30f225244280df7acfd7d2deb95a40208cfa54..2f719fb0c6139e498f51afdcad2497293e90ad1e 100644
--- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/run.py
+++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/run.py
@@ -77,7 +77,7 @@ class Run(object):
         if worker is not None:
             if self.span_view.get(worker) is None:
                 return None
-            spans = self.span_view.get(worker, [])
+            spans = self.span_view[worker]
         else:
             spans = [s for _, s in self.profiles.keys()]
 
diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/static/trace_embedding.html b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/static/trace_embedding.html
index 462d2c395f81d932fbf0196ccc53f4b0ece6e93a..bb84da0d0c0cb92d51a2d6ab1cb92ce308b23241 100644
--- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/static/trace_embedding.html
+++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/static/trace_embedding.html
@@ -11,7 +11,7 @@ found in the LICENSE file.
   'use strict';
 
   function onTraceViewerImportFail() {
-    document.addEventListener('DOMContentLoaded', () => {
+    document.addEventListener('DOMContentLoaded', function () {
       document.body.textContent =
          'tracing/bin/trace_viewer_full.html is missing. 
' + 'Run vulcanize_trace_viewer from $TRACE_VIEWER and reload.'; @@ -52,11 +52,12 @@ found in the LICENSE file. // warning. window.__hideTraceViewerPolyfillWarning = true; - window.addEventListener('message', event => { - const data = event.data || {}; - name = data.name || 'unknown'; - onResult(data.data); - }); + window.addEventListener("message", event => { + const data = event.data || {} + console.log(data) + name = data.name || 'unknown' + onResult(data.data) + }) function onResult(result) { model = new tr.Model(); @@ -77,7 +78,7 @@ found in the LICENSE file. overlay.visible = true; } - document.addEventListener('WebComponentsReady', () => { + document.addEventListener('WebComponentsReady', function () { const container = document.createElement('track-view-container'); container.id = 'track_view_container'; @@ -90,7 +91,7 @@ found in the LICENSE file. Polymer.dom(document.body).appendChild(viewer); if (window.parent) { - window.parent.postMessage({ msg: 'ready' }, window.origin); + window.parent.postMessage({ msg: 'ready' }, '*') } }); }()); diff --git a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/utils.py b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/utils.py index 5991cf2b33d1e818e6876c8d7550fbb6c87cdaa3..8f4189d765e6e9233478d800ab2d1424597af254 100644 --- a/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/utils.py +++ b/plugins/tensorboard-plugins/tb_plugin/torch_tb_profiler/utils.py @@ -23,15 +23,14 @@ import math import os import time from contextlib import contextmanager +from math import pow from . import consts -predefined_logging_level = ('CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET') - def get_logging_level(): log_level = os.environ.get('TORCH_PROFILER_LOG_LEVEL', 'INFO').upper() - if log_level not in predefined_logging_level: + if log_level not in logging._levelToName.values(): log_level = logging.getLevelName(logging.INFO) return log_level @@ -77,6 +76,7 @@ class Canonicalizer: input_time_metric='us', input_memory_metric='B'): # raw timestamp is in microsecond + # https://github.com/pytorch/pytorch/blob/v1.9.0/torch/csrc/autograd/profiler_kineto.cpp#L33 time_metric_to_factor = { 'us': 1, 'ms': 1e3, @@ -84,10 +84,10 @@ class Canonicalizer: } # raw memory is in bytes memory_metric_to_factor = { - 'B': math.pow(1024, 0), - 'KB': math.pow(1024, 1), - 'MB': math.pow(1024, 2), - 'GB': math.pow(1024, 3), + 'B': pow(1024, 0), + 'KB': pow(1024, 1), + 'MB': pow(1024, 2), + 'GB': pow(1024, 3), } # canonicalize the memory metric to a string @@ -125,7 +125,7 @@ class DisplayRounder: def __init__(self, ndigits): self.ndigits = ndigits - self.precision = math.pow(10, -ndigits) + self.precision = pow(10, -ndigits) def __call__(self, v: float): _v = abs(v) diff --git a/profiler/__init__.py b/profiler/__init__.py index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..de0604079e1323b2749bc801a6e8326893c73498 100644 --- a/profiler/__init__.py +++ b/profiler/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. \ No newline at end of file diff --git a/profiler/example/mstx_torch_plugin/mstx_torch_plugin.py b/profiler/example/mstx_torch_plugin/mstx_torch_plugin.py index ed22a3d0b7eed2ab0b457bb0a185061dacabc186..f6b25db7cf48cb5bcd2be687251c2499ecb30965 100644 --- a/profiler/example/mstx_torch_plugin/mstx_torch_plugin.py +++ b/profiler/example/mstx_torch_plugin/mstx_torch_plugin.py @@ -14,9 +14,8 @@ # limitations under the License. import os import functools -import re -import site import torch +import torch_npu from torch.nn import Module from torch.utils.data import DataLoader from torch.optim.optimizer import register_optimizer_step_post_hook @@ -29,18 +28,6 @@ original_multinext = torch.utils.data.dataloader._MultiProcessingDataLoaderIter. origin_patch_step_function = torch.optim.Optimizer._patch_step_function -def _check_directory_path_readable(path): - if not os.path.exists(path): - msg = f"The path dose not exist: {path}" - raise RuntimeError(msg) - if os.path.islink(path): - msg = f"Invalid path is a soft chain: {path}" - raise RuntimeError(msg) - if not os.access(path, os.R_OK): - msg = f"The path permission check failed: {path}" - raise RuntimeError(msg) - - class MstxState: def __init__(self): self.module_dict = {} @@ -157,57 +144,9 @@ def _custom_step(optimizer: torch.optim.Optimizer): mstx_state.last_optimizer_id = id(optimizer) -def _get_torch_npu_version_str(): - torch_npu_version_str = "" - site_packages = site.getsitepackages() - if site_packages and site_packages[0]: - path = site_packages[0] - version_path = os.path.join(path, "torch_npu", "version.py") - _check_directory_path_readable(version_path) - # example version info: "__version__ = '2.1.0.post11.xxxxxx'" - try: - with open(version_path, "r") as f: - for line in f: - if line.find("__version__") != -1: - torch_npu_version_str = line.strip().split("=")[-1][2:-1] - break - except Exception as e: - raise RuntimeError(f"Failed to open {version_path} to get torch npu version.") from e - return torch_npu_version_str - - -def _get_torch_npu_info(version_str: str): - # version info example: "2.1.0.post11.xxxxxx" - match = re.search(r"^(\d+\.\d+\.\d+)\.post(\d+)", version_str) - if match and len(match.groups()) == 2: - return match.group(1), match.group(2) - else: - return '', '' - - -def _check_pta_support_patch(): - pta_support_patch_version = { - "2.1.0": 10, - "2.3.1": 4, - "2.4.0": 2, - } - torch_npu_version_str = _get_torch_npu_version_str() - if not torch_npu_version_str: - raise RuntimeError("Failed to get torch_npu version info.") - torch_branch, torch_npu_version = _get_torch_npu_info(torch_npu_version_str) - if not torch_branch or not torch_npu_version or not torch_npu_version.isdigit(): - raise RuntimeError("Failed to get valid torch branch or torch_npu version.") - for branch, post_version in pta_support_patch_version.items(): - if torch_branch == branch and int(torch_npu_version) <= post_version: - return False - return True - - def apply_mstx_patch(): - pta_support_patch = _check_pta_support_patch() Module.__call__ = _custom_forward_call - if not pta_support_patch: - DataLoader.__iter__ = _custom_dataloader_iter - torch.serialization.save = _custom_save(original_save) + DataLoader.__iter__ = _custom_dataloader_iter + torch.serialization.save = _custom_save(original_save) torch.optim.Optimizer._patch_step_function = _custom_step register_optimizer_step_post_hook(_step_hook) diff --git 
a/profiler/merge_profiling_timeline/README.md b/profiler/merge_profiling_timeline/README.md new file mode 100644 index 0000000000000000000000000000000000000000..24db91adee88d74bff99117189e70a6ad632ddd3 --- /dev/null +++ b/profiler/merge_profiling_timeline/README.md @@ -0,0 +1,115 @@ +# 合并大json工具 + +merge_profiling_timeline(合并大json工具)支持合并Profiling的timeline数据,支持合并指定rank的timline、合并指定timeline中的item。 + + +## 多timeline融合 + +### 性能数据采集 + +使用Ascend PyTorch Profiler或者E2E性能采集工具采集性能数据,E2E profiling将被废弃,不建议使用。Ascend PyTorch Profiler采集方式参考:[Profiling数据采集](https://gitee.com/ascend/mstt/tree/master/profiler/msprof_analyze)。将采集到的所有节点的性能数据拷贝到当前环境同一目录下,以下假设数据在/home/test/cann_profiling下。 + +E2E Profiling数据目录结构示例如下: + +```bash +|- cann_profiling + |- PROF_*** + |- timeline + |- msprof.json + |- device_* + |- info.json.* + ... + |- PROF_*** + ... +``` + +Ascend PyTorch Profiler数据目录结构示例如下: + +```bash +|- ascend_pytorch_profiling + |- **_ascend_pt + |- ASCEND_PROFILER_OUTPUT + |- trace_view.json + |- FRAMEWORK + |- PROF_*** + |- **_ascend_pt +``` + +### 参数说明 + +| 参数名称 | 说明 | 是否必选 | +| -------- | ------------------------------------------------------------ | -------- | +| -i | 指定Profiling数据目录路径。 | 是 | +| --type | 指定需要合并timeline场景,可选取值:`pytorch`(通过Ascend PyTorch Profiler方式采集profiling数据,合并所有卡的trace_view.json)、`e2e`(通过E2E Profiling方式采集Profiling数据,优先合并总timeline,没有生成则选择合并device目录下的msprof_*.json)、`custom` (自定义需要合并的timeline数据,具体参考**使用示例**)。 | 是 | +| -o | 指定合并后的timeline文件输出的路径(路径末尾可以设置文件名,具体用法参考**使用示例**),不设置该参数的情况下默认文件输出的路径为当前目录(默认文件名为merged.json)。 | 否 | +| --rank | 指定需要合并timeline的Rank ID,默认全部合并。 | 否 | +| --items | 指定需要合并的Profiling数据项,包括:python、Ascend Hardware、CANN、HCCL、PTA、Overlap Analysis,默认全部合并。 | 否 | + +### 使用示例 + +1. 合并单机多卡timeline,默认合并所有卡、所有数据项,生成first.json在path/to/cann_profiling/output/目录下 + + ```bash + python3 main.py -i path/to/cann_profiling/ -o path/to/cann_profiling/output/first --type pytorch + ``` + +2. 合并单机多卡timeline,默认合并所有卡、所有数据项,不设置-o参数时默认生成merge.json在当前目录下 + + ```bash + python3 main.py -i path/to/cann_profiling/ --type pytorch + ``` + +3. 合并单机多卡timeline,只合并0卡和1卡 + + ```bash + python3 main.py -i path/to/cann_profiling/ -o path/to/cann_profiling/output/2p --type pytorch --rank 0,1 + ``` + +4. 合并单机多卡timeline,合并所有卡的CANN层和Ascend_Hardware层数据 + + ```bash + python3 main.py -i path/to/cann_profiling/ --type pytorch --items "CANN,Ascend Hardware" + ``` + +5. 合并多timeline(自定义) + + 以上场景不支持的情况下,可以使用自定义的合并方式,将需要合并的timeline文件放在同一目录下(附:该场景比较特殊,与正常合并不同,无法直接读取info.json中的rank_id,因此该场景下的rank_id为默认分配的序号,用于区分不同文件的相同层,不代表实际rank_id) + 数据目录结构示意如下: + + ```bash + |- timeline + |- msprof_0.json + |- msprof_1.json + |- msprof_2.json + |- hccl_3.json + |- hccl_4.json + ... 
+ ``` + + 通过下面的命令合并所有timeline,同样支持-o、--rank、--items等参数。 + + ```bash + python3 main.py -i path/to/timeline/ -o path/to/timeline/xxx --type custom + ``` + + 合并timeline查看:在 -o 指定的目录(不设置-o时默认在当前目录下的merged.json)的xxx.json为合并后的文件。 + + +## 超大timeline文件查看 + +[下载whl](https://gitee.com/aerfaliang/trace_processor/releases/download/trace_processor_37.0/trace_processor-37.0-py3-none-any.whl)包并执行如下命令安装(windows): + +```bash +pip3 install trace_processor-37.0-py3-none-any.whl +``` + +安装完成后直接执行如下命令: + +```bash +python -m trace_processor --httpd path/to/xxx_merged.json +``` + +等待加载完毕,刷新[perfetto](https://ui.perfetto.dev/)界面,单击Use old version regardless,再单击`YES, use loaded trace`即可展示timeline(通过W放大、S缩小、A左移、D右移来查看timeline文件)。 + +![输入图片说明](perfetto使用指导截图1.png) +![输入图片说明](perfetto使用指导截图2.png) \ No newline at end of file diff --git a/profiler/merge_profiling_timeline/__init__.py b/profiler/merge_profiling_timeline/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/merge_profiling_timeline/main.py b/profiler/merge_profiling_timeline/main.py new file mode 100644 index 0000000000000000000000000000000000000000..722457812b8c039317cbf541d26767ee2bb91361 --- /dev/null +++ b/profiler/merge_profiling_timeline/main.py @@ -0,0 +1,237 @@ +#! /usr/bin/python3 +# Copyright 2023 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
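+
+# This script merges per-rank profiling timeline JSON files into one trace.
+# To keep ranks distinguishable after the merge, every "process_name" metadata
+# event gets a "_{rank_id}" suffix, and flow-event ids (ph == 's'/'f') are
+# remapped with RANK_ID_POS so they stay unique across ranks
+# (see merge_timeline_events below).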
+ +import json +import os +import re + +from functools import partial +from argparse import ArgumentParser +from decimal import Decimal + + +FILTER_DIRS = [".profiler", "HCCL_PROF", "timeline", "query", 'sqlite', 'log'] +RANK_ID_POS = 1000 + +def get_path_dir(path: str) -> list: + """ + check result path exist JOB dir + path : result path + """ + path_dir_filter = filter(partial(_path_dir_filter_func, root_dir=path), os.listdir(path)) + sub_dirs = list(path_dir_filter) + return sub_dirs + + +def _path_dir_filter_func(sub_path, root_dir): + return sub_path not in FILTER_DIRS and os.path.isdir(os.path.realpath(os.path.join(root_dir, sub_path))) + + +def natural_sort(files): + def convert(text): + return int(text) if text.isdigit() else text.lower() + + def alphanum_key(key): + return [convert(c) for c in re.split('([0-9]+)', key)] + + return sorted(files, key=alphanum_key) + + +def get_timeline_info(args, prof_dirs): + timeline_info = {} + + for prof in prof_dirs: + pro_path = os.path.join(args.input, prof) + + # 从info.json读取rank_id + rank_id = get_rank_id_from_info_json(pro_path) + if rank_id is None: + print(f"WARN, There is not rank id info in {pro_path}") + continue + + timeline_path = get_timeline_path(pro_path, args.type) + + if os.path.exists(timeline_path): + timeline_info[rank_id] = timeline_path + else: + print(f"WARN, The file \"{timeline_path}\" does not exist.") + return timeline_info + + +def get_timeline_path(pro_path, type): + for root, dirs, files in os.walk(pro_path): + for dir_ in dirs: + if 'ASCEND_PROFILER_OUTPUT' == dir_ and type == 'pytorch': + timeline_path = os.path.realpath(os.path.join(root, dir_, 'trace_view.json')) + return timeline_path + + for file_ in sorted(files, reverse=True): + if 'msprof' in file_: + timeline_path = os.path.join(root, file_) + return timeline_path + return None + +def get_rank_id_from_info_json(pro_path): + info_json = "" + rank_id = None + for root, _, files in os.walk(pro_path): + for file in files: + if "info.json." in file and ".done" not in file: + info_json = os.path.join(root, file) + break + + if info_json: + if os.path.islink(info_json): + print(f"The file: \"{info_json}\" is link. Please check the path.") + return None + try: + with open(info_json, "r+") as f: + info = json.load(f) + rank_id = info.get("rank_id") + except Exception as err: + print("[ERROR] %s" % err) + return None + return rank_id + + +def merge_timeline_general(args): + """合并e2e profiling生成的msprof*.json""" + if not os.path.isdir(args.input): + print(f"No such file or directory: \"{args.input}\". Please check the path.") + return + prof_dir = get_path_dir(args.input) + if not prof_dir: + message = f"The path \"{args.input}\" does not have PROF dir. Please check the path." 
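+        # nothing that looks like profiling output was found under --input: report and abort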
+        print(message)
+        return
+    timeline_info = get_timeline_info(args, prof_dirs)
+    timeline_files_dict = {}
+
+    # 合并部分profiling items
+    process_list = args.items.split(",") if args.items else None
+
+    # 合并部分rank
+    if args.rank:
+        rank_ids = [int(rank_id) for rank_id in args.rank.split(",")]
+    else:
+        rank_ids = list(timeline_info.keys())
+
+    for rank_id in rank_ids:
+        if not timeline_info.get(rank_id):
+            print(f"main.py: error: no timeline file found for rank_id '{rank_id}'")
+            return
+        timeline_files_dict[rank_id] = timeline_info.get(rank_id)
+    merge_timeline_events(timeline_files_dict, process_list)
+
+
+def merge_timeline_custom(args):
+    """合并指定目录里所有timeline文件"""
+    timeline_files = natural_sort(os.listdir(args.input))
+    timeline_files_dict = {}
+    for idx, timeline_file in enumerate(timeline_files):
+        timeline_files_dict[idx] = os.path.join(args.input, timeline_file)
+    # 合并部分profiling items
+    process_list = args.items.split(",") if args.items else None
+    merge_timeline_events(timeline_files_dict, process_list)
+
+
+def merge_timeline_events(timeline_file_dict, process_list):
+    """
+    输入需要合并的timeline文件路径及对应的rank_id/id、需要合并的process_list
+    输出合并timeline
+    """
+    new_events = []
+    for rank_id, timeline_path in timeline_file_dict.items():
+        node = rank_id // 8
+        print("rank id: ", rank_id, "timeline file: ", timeline_path)
+        if os.path.islink(timeline_path):
+            print(f"The file: \"{timeline_path}\" is link. Please check the path.")
+            return
+        try:
+            with open(timeline_path, 'r+') as f:
+                cur_events = json.load(f)
+        except Exception as err:
+            print("[ERROR] %s" % err)
+            return
+
+        proc_pid_dict = {}
+        for event in cur_events:
+            if event.get("name") == "process_name" and event.get("ph") == "M":
+                if event.get("args"):
+                    proc_pid_dict[event["args"].get("name")] = event.get("pid")
+        process_list_tmp = process_list if process_list else list(proc_pid_dict.keys())
+        # 提取待合并的items的pid
+        merged_pids = set()
+        for pro in process_list_tmp:
+            if pro not in proc_pid_dict.keys():
+                print(f"main.py: error argument --items: invalid choice: '{pro}' (choose from {list(proc_pid_dict.keys())})")
+                return
+            merged_pids.add(proc_pid_dict.get(pro))
+
+        for event in cur_events:
+
+            # 只合并特定数据项
+            if merged_pids and event.get('pid') not in merged_pids:
+                continue
+
+            # convert tid to int
+            if not isinstance(event['tid'], int):
+                print(f"[WARNING] {event['tid']} is not int type")
+
+            # 进程名加上rank_id区分不同rank
+            if event.get("name") == "process_name" and event.get("ph") == "M":
+                if event.get("args") is not None and event["args"].get("name") is not None:
+                    event["args"]["name"] = event["args"]["name"] + f"_{rank_id}"
+
+            # modify connect id: keep flow ids unique across ranks,
+            # e.g. with RANK_ID_POS = 1000, id 7 on rank 3 becomes 7 * 1000 + 3 = 7003
+            if event.get('id') and (event.get('ph') == 's' or event.get('ph') == 'f'):
+                event['id'] = float(event.get('id')) * RANK_ID_POS + rank_id
+
+            new_events.append(event)
+    out_path = f"{args.output}.json"
+    if os.path.islink(out_path):
+        print(f"The file: \"{out_path}\" is link. Please check the path.")
+        return
+    if os.path.exists(out_path):
+        print(f"File {out_path} already exists and will be overwritten.")
+        os.remove(out_path)
+    try:
+        # 设置文件权限为640,安全考虑
+        with os.fdopen(os.open(out_path, os.O_WRONLY | os.O_CREAT, 0o640), 'w') as f:
+            json.dump(new_events, f)
+    except FileNotFoundError:
+        print(f"Param -o (output path) does not exist, please check it.")
+        return
+    print(f"timeline merged output path: {out_path}")
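+
+# Example usage (paths are placeholders; see README.md in this directory):
+#   python3 main.py -i path/to/cann_profiling/ --type pytorch
+#   python3 main.py -i path/to/cann_profiling/ -o output/2p --type pytorch --rank 0,1
+# The merged result is written to "<output>.json" and can be viewed with
+# perfetto as described in README.md.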
+
+
+def parse_args():
+    parser = ArgumentParser(description="Merge timeline for multi card")
+    parser.add_argument("-i", "--input", default=None, help="root dir of PROF_* data")
+    parser.add_argument("-o", "--output", default="./merged", help="save path of merged.json")
+    parser.add_argument("--rank", default=None, help="List of ranks to be merged. By default, all ranks are merged")
+    parser.add_argument("--items", default=None, help="Specify the data items (python, CANN, Ascend Hardware, HCCL, ...) in the timeline to be merged.")
+    parser.add_argument("--type", choices=('pytorch', 'e2e', 'custom'), help="Scenario of the timeline data to be merged: pytorch, e2e or custom.")
+    arg = parser.parse_args()
+    return arg
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    print("========================== start merge timeline ====================")
+    if args.type == "custom":
+        merge_timeline_custom(args)
+    else:
+        merge_timeline_general(args)
\ No newline at end of file
diff --git "a/profiler/merge_profiling_timeline/perfetto\344\275\277\347\224\250\346\214\207\345\257\274\346\210\252\345\233\2761.png" "b/profiler/merge_profiling_timeline/perfetto\344\275\277\347\224\250\346\214\207\345\257\274\346\210\252\345\233\2761.png"
new file mode 100644
index 0000000000000000000000000000000000000000..beef396ce2996c25ecd74298285ccab5011ddea1
Binary files /dev/null and "b/profiler/merge_profiling_timeline/perfetto\344\275\277\347\224\250\346\214\207\345\257\274\346\210\252\345\233\2761.png" differ
diff --git "a/profiler/merge_profiling_timeline/perfetto\344\275\277\347\224\250\346\214\207\345\257\274\346\210\252\345\233\2762.png" "b/profiler/merge_profiling_timeline/perfetto\344\275\277\347\224\250\346\214\207\345\257\274\346\210\252\345\233\2762.png"
new file mode 100644
index 0000000000000000000000000000000000000000..48793f136e48f21f618ff3cb13bdcc3388f76930
Binary files /dev/null and "b/profiler/merge_profiling_timeline/perfetto\344\275\277\347\224\250\346\214\207\345\257\274\346\210\252\345\233\2762.png" differ
diff --git a/profiler/msprof_analyze/MANIFEST.in b/profiler/msprof_analyze/MANIFEST.in
index b4d096405c98ea1a906b8882418362d428cbf1b6..df1488cce957db8d6135caf1e65e834103fe92ed 100644
--- a/profiler/msprof_analyze/MANIFEST.in
+++ b/profiler/msprof_analyze/MANIFEST.in
@@ -3,5 +3,6 @@ recursive-include msprof_analyze/cli/ *
 recursive-include msprof_analyze/prof_common/ *
 recursive-include msprof_analyze/compare_tools/ *
 recursive-include msprof_analyze/cluster_analyse/ *
+recursive-include msprof_analyze/precheck/ *
 global-exclude */__pycache__/*
 global-exclude *.pyc
diff --git a/profiler/msprof_analyze/OWNERS b/profiler/msprof_analyze/OWNERS
index 864e7ecc649aab5a9eb5d6db1b33e9dd8a8882dc..7524470824c5552b570c09cc231e74811a15adf7 100644
--- a/profiler/msprof_analyze/OWNERS
+++ b/profiler/msprof_analyze/OWNERS
@@ -1,10 +1,12 @@
-options:
-  no_parent_owners: true
-approvers:
-- xhahn
-- aerfaliang
-- chenhao_1209
-- feng123www
-reviewers:
-- Seanesmhxocism
-- wjchuee
+options:
+  no_parent_owners: true
+approvers:
+- xhahn
+- aerfaliang
+- chenhao_1209
+- feng123www
+- sunboquan
+reviewers:
+- sunboquan +- Seanesmhxocism +- wjchuee diff --git a/profiler/msprof_analyze/README.md b/profiler/msprof_analyze/README.md index 7e2267a55596bac342b0e2ada564ea31c5625a84..c3be2acd6ef1a33c629a07cba10be953036cfefd 100644 --- a/profiler/msprof_analyze/README.md +++ b/profiler/msprof_analyze/README.md @@ -1,250 +1,250 @@ -# 性能工具 - -MindStudio Training Tools工具针对训练&大模型场景,提供端到端性能调优工具msprof-analyze:用户采集到性能数据后,由MindStudio Training Tools的性能工具msprof-analyze提供统计、分析以及相关的调优建议。 - -## NPU性能数据采集 - -目前MindStudio Training Tools工具主要支持对Ascend PyTorch Profiler接口采集的性能数据进行分析,请参考官方文档:[Ascend PyTorch Profiler数据采集与分析](https://www.hiascend.com/document/detail/zh/canncommercial/80RC1/devaids/auxiliarydevtool/atlasprofiling_16_0006.html)。 - -### 环境和依赖 - -- 硬件环境请参见《[昇腾产品形态说明](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2Fcanncommercial%2F80RC22%2Fquickstart%2Fquickstart%2Fquickstart_18_0002.html)》。 -- 软件环境请参见《[CANN 软件安装指南](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2Fcanncommercial%2F80RC22%2Fsoftwareinst%2Finstg%2Finstg_0000.html%3FMode%3DPmIns%26OS%3DUbuntu%26Software%3DcannToolKit)》安装昇腾设备开发或运行环境,即toolkit软件包。 - -以上环境依赖请根据实际环境选择适配的版本。 - -### 版本配套说明 - -- Ascend PyTorch Profiler接口支持AscendPyTorch 1.11.0或更高版本,支持的PyTorch和CANN以及PyTorch和Python软件版本配套关系请参见《[Ascend Extension for PyTorch插件](https://gitee.com/ascend/pytorch)》。 -- Ascend PyTorch Profiler接口支持的固件驱动版本与配套CANN软件支持的固件驱动版本相同,开发者可通过“[昇腾社区-固件与驱动](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fhardware%2Ffirmware-drivers%2Fcommunity%3Fproduct%3D2%26model%3D28%26cann%3D8.0.RC3.alpha003%26driver%3D1.0.25.alpha)”页面根据产品型号与CANN软件版本获取配套的固件与驱动。 - -### 采集方式一:通过with语句进行采集 - -```python -import torch_npu -experimental_config = torch_npu.profiler._ExperimentalConfig( - aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization, - profiler_level=torch_npu.profiler.ProfilerLevel.Level1, - l2_cache=False -) -with torch_npu.profiler.profile( - activities=[ - torch_npu.profiler.ProfilerActivity.CPU, - torch_npu.profiler.ProfilerActivity.NPU - ], - record_shapes=True, - profile_memory=True, - with_stack=True, - experimental_config=experimental_config, - schedule=torch_npu.profiler.schedule(wait=10, warmup=0, active=1, repeat=1), - on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data") -) as prof: - # 模型训练代码 - for epoch, data in enumerate(dataloader): - train_model_one_step(model, data) - prof.step() -``` - -### 采集方式二:start,stop方式进行采集 - -```python -import torch_npu -experimental_config = torch_npu.profiler._ExperimentalConfig( - aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization, - profiler_level=torch_npu.profiler.ProfilerLevel.Level1, - l2_cache=False -) -prof = torch_npu.profiler.profile( - activities=[ - torch_npu.profiler.ProfilerActivity.CPU, - torch_npu.profiler.ProfilerActivity.NPU - ], - record_shapes=True, - profile_memory=True, - with_stack=True, - experimental_config=experimental_config, - on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data")) -# 模型训练代码 -for epoch, data in enumerate(dataloader): - if epoch == 11: - prof.start() - train_model_one_step(model, data) - prof.step() - if epoch == 11: - prof.stop() -``` - -### NPU性能数据目录结构 - -ascend pytorch profiler数据目录结构如下: - -``` -|- ascend_pytorch_profiling - |- * _ascend_pt - |- ASCEND_PROFILER_OUTPUT - |- trace_view.json - |- FRAMEWORK - |- PROF_XXX - |- profiler_info.json - |- * _ascend_pt -``` - -## 安装 - -性能工具的安装方式包括:**pip安装**、**下载whl包安装**和**源代码编译安装**。 - -### pip安装 - 
-```shell -pip install msprof-analyze -``` - -使用`pip install msprof-analyze==版本号`可安装指定版本的包,支持1.2.1及之后版本,版本号参见“**下载whl包安装**”。 - -pip命令会自动安装最新的包及其配套依赖。 - -提示如下信息则表示安装成功。 - -```bash -Successfully installed msprof-analyze-{version} -``` - -### 下载whl包安装 - -1. whl包获取。 - - 请通过下表链接下载profiler工具whl包。 - -| profiler版本 | 发布日期 | 下载链接 | 校验码 | -|------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ | -| 2.0.0 | 2025-02-08 | [msprof_analyze-2.0.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/2.0.0/msprof_analyze-2.0.0-py3-none-any.whl) | 8e44e5f3e7681c377bb2657a600ad9841d3bed11061ddd7844c30e8a97242101 | -| 1.3.4 | 2025-01-20 | [msprof_analyze-1.3.4-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.4/msprof_analyze-1.3.4-py3-none-any.whl) | 8de92188d1a97105fb14cadcb0875ccd5f66629ee3bb25f37178da1906f4cce2 | -| 1.3.3 | 2024-12-26 | [msprof_analyze-1.3.3-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.3/msprof_analyze-1.3.3-py3-none-any.whl) | 27676f2eee636bd0c65243f81e292c7f9d30d7f985c772ac9cbaf10b54d3584e | -| 1.3.2 | 2024-12-20 | [msprof_analyze-1.3.2-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.2/msprof_analyze-1.3.2-py3-none-any.whl) | ceb227e751ec3a204135be13801f1deee6a66c347f1bb3cdaef596872874df06 | -| 1.3.1 | 2024-12-04 | [msprof_analyze-1.3.1-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.1/msprof_analyze-1.3.1-py3-none-any.whl) | eae5548804314110a649caae537f2c63320fc70ec41ce1167f67c1d674d8798e | -| 1.3.0 | 2024-10-12 | [msprof_analyze-1.3.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.0/msprof_analyze-1.3.0-py3-none-any.whl) | 8b09758c6b5181bb656a95857c32852f898c370e7f1041e5a08e4f10d5004d48 | -| 1.2.5 | 2024-09-25 | [msprof_analyze-1.2.5-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.5/msprof_analyze-1.2.5-py3-none-any.whl) | aea8ae8deac07b5b4980bd2240da27d0eec93b9ace9ea9eb2e3a05ae9072018b | -| 1.2.4 | 2024-09-19 | [msprof_analyze-1.2.4-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.4/msprof_analyze-1.2.4-py3-none-any.whl) | 7c392e72c3347c4034fd3fdfcccb1f7936c24d9c3eb217e2cc05bae1347e5ab7 | -| 1.2.3 | 2024-08-29 | [msprof_analyze-1.2.3-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.3/msprof_analyze-1.2.3-py3-none-any.whl) | 354a55747f64ba1ec6ee6fe0f05a53e84e1b403ee0341ec40cc216dd25fda14c | -| 1.2.2 | 2024-08-23 | [msprof_analyze-1.2.2-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.2/msprof_analyze-1.2.2-py3-none-any.whl) | ed92a8e4eaf5ada8a2b4079072ec0cc42501b1b1f2eb00c8fdcb077fecb4ae02 | -| 1.2.1 | 2024-08-14 | [msprof_analyze-1.2.1-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.1/msprof_analyze-1.2.1-py3-none-any.whl) | 7acd477417bfb3ea29029dadf175d019ad3212403b7e11dc1f87e84c2412c078 | -| 1.2.0 | 2024-07-25 | [msprof_analyze-1.2.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.0/msprof_analyze-1.2.0-py3-none-any.whl) | 6a4366e3beca40b4a8305080e6e441d6ecafb5c05489e5905ac0265787555f37 | -| 1.1.2 | 2024-07-12 | [msprof_analyze-1.1.2-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.1.2/msprof_analyze-1.1.2-py3-none-any.whl) | 
af62125b1f9348bf491364e03af712fc6d0282ccee3fb07458bc9bbef82dacc6 | -| 1.1.1 | 2024-06-20 | [msprof_analyze-1.1.1-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.1.1/msprof_analyze-1.1.1-py3-none-any.whl) | 76aad967a3823151421153d368d4d2f8e5cfbcb356033575e0b8ec5acea8e5e4 | -| 1.1.0 | 2024-05-28 | [msprof_analyze-1.1.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.1.0/msprof_analyze-1.1.0-py3-none-any.whl) | b339f70e7d1e45e81f289332ca64990a744d0e7ce6fdd84a8d82e814fa400698 | -| 1.0 | 2024-05-10 | [msprof_analyze-1.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.0/msprof_analyze-1.0-py3-none-any.whl) | 95b2f41c8c8e8afe4887b738c8cababcb4f412e1874483b6adae4a025fcbb7d4 | - -2. whl包校验。 - - 1. 根据以上下载链接下载whl包到Linux安装环境。 - - 2. 进入whl包所在目录,执行如下命令。 - - ``` - sha256sum {name}.whl - ``` - - {name}为whl包名称。 - - 若回显呈现对应版本whl包一致的**校验码**,则表示下载了正确的性能工具whl安装包。示例如下: - - ```bash - sha256sum msprof_analyze-1.0-py3-none-any.whl - xx *msprof_analyze-1.0-py3-none-any.whl - ``` - -3. whl包安装。 - - 执行如下命令进行安装。 - - ```bash - pip3 install ./msprof_analyze-{version}-py3-none-any.whl - ``` - - 提示如下信息则表示安装成功。 - - ```bash - Successfully installed msprof_analyze-{version} - ``` - -### 源代码编译安装 - -1. 安装依赖。 - - 编译前需要安装wheel。 - - ```bash - pip3 install wheel - ``` - -2. 下载源码。 - - ```bash - git clone https://gitee.com/ascend/mstt.git - ``` - -3. 编译whl包。 - - ```bash - cd mstt/profiler/msprof_analyze - pip3 install -r requirements.txt && python3 setup.py bdist_wheel - ``` - - 以上命令执行完成后在mstt/profiler/msprof_analyze/dist目录下生成性能工具whl安装包`msprof_analyze-{version}-py3-none-any.whl`。 - -4. 安装。 - - 执行如下命令进行性能工具安装。 - - ```bash - cd dist - pip3 install ./msprof_analyze-{version}-py3-none-any.whl - ``` - -## 卸载和更新 - -若需要更新工具,请先卸载旧版本后再重新安装新版本,如下操作: - -1. 卸载 - - ```bash - pip3 uninstall msprof-analyze - ``` - -2. 更新 - - ```bash - pip3 install ./msprof_analyze-{version}-py3-none-any.whl - ``` - -## 工具使用 - -```bash -msprof-analyze advisor [-h] -``` - -```bash -msprof-analyze compare [-h] -``` - -```bash -msprof-analyze cluster [-h] -``` - -```bash -msprof-analyze auto-completion [-h] -``` - -``` -msprof-analyze [-h] [-v] -``` - -| 参数 | 说明 | -| -------------------- | ------------------------------------------------------------ | -| advisor | [advisor](./advisor/README.md)。将Ascend PyTorch Profiler或者msprof采集的PyThon场景性能数据进行分析,并输出性能调优建议。 | -| compare | [compare_tools(性能比对工具)](./compare_tools/README.md)。提供NPU与GPU性能拆解功能以及算子、通信、内存性能的比对功能。 | -| cluster | [cluster_analyse(集群分析工具)](./cluster_analyse/README.md)。提供多机多卡的集群分析能力(基于通信域的通信分析和迭代耗时分析), 当前需要配合Ascend Insight的集群分析功能使用。 | -| auto-completion | 自动补全。配置后在当前视图下配置msprof-analyze工具所有的子参数时,可以使用Tab将所有子参数自动补全。 | -| -v,-V
    --version | 查看版本号。 | -| -h,-H
    --help | 命令行参数帮助信息。 | - +# 性能工具 + +MindStudio Training Tools工具针对训练&大模型场景,提供端到端性能调优工具msprof-analyze:用户采集到性能数据后,由MindStudio Training Tools的性能工具msprof-analyze提供统计、分析以及相关的调优建议。 + +## NPU性能数据采集 + +目前MindStudio Training Tools工具主要支持对Ascend PyTorch Profiler接口采集的性能数据进行分析,请参考官方文档:[Ascend PyTorch Profiler数据采集与分析](https://www.hiascend.com/document/detail/zh/canncommercial/80RC1/devaids/auxiliarydevtool/atlasprofiling_16_0006.html)。 + +### 环境和依赖 + +- 硬件环境请参见《[昇腾产品形态说明](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2Fcanncommercial%2F80RC22%2Fquickstart%2Fquickstart%2Fquickstart_18_0002.html)》。 +- 软件环境请参见《[CANN 软件安装指南](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2Fcanncommercial%2F80RC22%2Fsoftwareinst%2Finstg%2Finstg_0000.html%3FMode%3DPmIns%26OS%3DUbuntu%26Software%3DcannToolKit)》安装昇腾设备开发或运行环境,即toolkit软件包。 + +以上环境依赖请根据实际环境选择适配的版本。 + +### 版本配套说明 + +- Ascend PyTorch Profiler接口支持AscendPyTorch 1.11.0或更高版本,支持的PyTorch和CANN以及PyTorch和Python软件版本配套关系请参见《[Ascend Extension for PyTorch插件](https://gitee.com/ascend/pytorch)》。 +- Ascend PyTorch Profiler接口支持的固件驱动版本与配套CANN软件支持的固件驱动版本相同,开发者可通过“[昇腾社区-固件与驱动](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fhardware%2Ffirmware-drivers%2Fcommunity%3Fproduct%3D2%26model%3D28%26cann%3D8.0.RC3.alpha003%26driver%3D1.0.25.alpha)”页面根据产品型号与CANN软件版本获取配套的固件与驱动。 + +### 采集方式一:通过with语句进行采集 + +```python +import torch_npu +experimental_config = torch_npu.profiler._ExperimentalConfig( + aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization, + profiler_level=torch_npu.profiler.ProfilerLevel.Level1, + l2_cache=False +) +with torch_npu.profiler.profile( + activities=[ + torch_npu.profiler.ProfilerActivity.CPU, + torch_npu.profiler.ProfilerActivity.NPU + ], + record_shapes=True, + profile_memory=True, + with_stack=True, + experimental_config=experimental_config, + schedule=torch_npu.profiler.schedule(wait=10, warmup=0, active=1, repeat=1), + on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data") +) as prof: + # 模型训练代码 + for epoch, data in enumerate(dataloader): + train_model_one_step(model, data) + prof.step() +``` + +### 采集方式二:start,stop方式进行采集 + +```python +import torch_npu +experimental_config = torch_npu.profiler._ExperimentalConfig( + aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization, + profiler_level=torch_npu.profiler.ProfilerLevel.Level1, + l2_cache=False +) +prof = torch_npu.profiler.profile( + activities=[ + torch_npu.profiler.ProfilerActivity.CPU, + torch_npu.profiler.ProfilerActivity.NPU + ], + record_shapes=True, + profile_memory=True, + with_stack=True, + experimental_config=experimental_config, + on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data")) +# 模型训练代码 +for epoch, data in enumerate(dataloader): + if epoch == 11: + prof.start() + train_model_one_step(model, data) + prof.step() + if epoch == 11: + prof.stop() +``` + +### NPU性能数据目录结构 + +ascend pytorch profiler数据目录结构如下: + +``` +|- ascend_pytorch_profiling + |- * _ascend_pt + |- ASCEND_PROFILER_OUTPUT + |- trace_view.json + |- FRAMEWORK + |- PROF_XXX + |- profiler_info.json + |- * _ascend_pt +``` + +## 安装 + +性能工具的安装方式包括:**pip安装**、**下载whl包安装**和**源代码编译安装**。 + +### pip安装 + +```shell +pip install msprof-analyze +``` + +使用`pip install msprof-analyze==版本号`可安装指定版本的包,支持1.2.1及之后版本,版本号参见“**下载whl包安装**”。 + +pip命令会自动安装最新的包及其配套依赖。 + +提示如下信息则表示安装成功。 + +```bash +Successfully installed msprof-analyze-{version} +``` + +### 下载whl包安装 + +1. 
whl包获取。 + + 请通过下表链接下载profiler工具whl包。 + +| profiler版本 | 发布日期 | 下载链接 | 校验码 | +|------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ | +| 2.0.0 | 2025-02-08 | [msprof_analyze-2.0.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/2.0.0/msprof_analyze-2.0.0-py3-none-any.whl) | 8e44e5f3e7681c377bb2657a600ad9841d3bed11061ddd7844c30e8a97242101 | +| 1.3.4 | 2025-01-20 | [msprof_analyze-1.3.4-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.4/msprof_analyze-1.3.4-py3-none-any.whl) | 8de92188d1a97105fb14cadcb0875ccd5f66629ee3bb25f37178da1906f4cce2 | +| 1.3.3 | 2024-12-26 | [msprof_analyze-1.3.3-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.3/msprof_analyze-1.3.3-py3-none-any.whl) | 27676f2eee636bd0c65243f81e292c7f9d30d7f985c772ac9cbaf10b54d3584e | +| 1.3.2 | 2024-12-20 | [msprof_analyze-1.3.2-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.2/msprof_analyze-1.3.2-py3-none-any.whl) | ceb227e751ec3a204135be13801f1deee6a66c347f1bb3cdaef596872874df06 | +| 1.3.1 | 2024-12-04 | [msprof_analyze-1.3.1-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.1/msprof_analyze-1.3.1-py3-none-any.whl) | eae5548804314110a649caae537f2c63320fc70ec41ce1167f67c1d674d8798e | +| 1.3.0 | 2024-10-12 | [msprof_analyze-1.3.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.3.0/msprof_analyze-1.3.0-py3-none-any.whl) | 8b09758c6b5181bb656a95857c32852f898c370e7f1041e5a08e4f10d5004d48 | +| 1.2.5 | 2024-09-25 | [msprof_analyze-1.2.5-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.5/msprof_analyze-1.2.5-py3-none-any.whl) | aea8ae8deac07b5b4980bd2240da27d0eec93b9ace9ea9eb2e3a05ae9072018b | +| 1.2.4 | 2024-09-19 | [msprof_analyze-1.2.4-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.4/msprof_analyze-1.2.4-py3-none-any.whl) | 7c392e72c3347c4034fd3fdfcccb1f7936c24d9c3eb217e2cc05bae1347e5ab7 | +| 1.2.3 | 2024-08-29 | [msprof_analyze-1.2.3-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.3/msprof_analyze-1.2.3-py3-none-any.whl) | 354a55747f64ba1ec6ee6fe0f05a53e84e1b403ee0341ec40cc216dd25fda14c | +| 1.2.2 | 2024-08-23 | [msprof_analyze-1.2.2-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.2/msprof_analyze-1.2.2-py3-none-any.whl) | ed92a8e4eaf5ada8a2b4079072ec0cc42501b1b1f2eb00c8fdcb077fecb4ae02 | +| 1.2.1 | 2024-08-14 | [msprof_analyze-1.2.1-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.1/msprof_analyze-1.2.1-py3-none-any.whl) | 7acd477417bfb3ea29029dadf175d019ad3212403b7e11dc1f87e84c2412c078 | +| 1.2.0 | 2024-07-25 | [msprof_analyze-1.2.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.2.0/msprof_analyze-1.2.0-py3-none-any.whl) | 6a4366e3beca40b4a8305080e6e441d6ecafb5c05489e5905ac0265787555f37 | +| 1.1.2 | 2024-07-12 | [msprof_analyze-1.1.2-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.1.2/msprof_analyze-1.1.2-py3-none-any.whl) | af62125b1f9348bf491364e03af712fc6d0282ccee3fb07458bc9bbef82dacc6 | +| 1.1.1 | 2024-06-20 | [msprof_analyze-1.1.1-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.1.1/msprof_analyze-1.1.1-py3-none-any.whl) | 76aad967a3823151421153d368d4d2f8e5cfbcb356033575e0b8ec5acea8e5e4 | +| 1.1.0 | 
2024-05-28 | [msprof_analyze-1.1.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.1.0/msprof_analyze-1.1.0-py3-none-any.whl) | b339f70e7d1e45e81f289332ca64990a744d0e7ce6fdd84a8d82e814fa400698 |
+| 1.0 | 2024-05-10 | [msprof_analyze-1.0-py3-none-any.whl](https://ptdbg.obs.myhuaweicloud.com/profiler/package/1.0/msprof_analyze-1.0-py3-none-any.whl) | 95b2f41c8c8e8afe4887b738c8cababcb4f412e1874483b6adae4a025fcbb7d4 |
+
+2. whl包校验。
+
+   1. 根据以上下载链接下载whl包到Linux安装环境。
+
+   2. 进入whl包所在目录,执行如下命令。
+
+      ```
+      sha256sum {name}.whl
+      ```
+
+      {name}为whl包名称。
+
+      若回显的**校验码**与上表对应版本一致,则表示下载了正确的性能工具whl安装包。示例如下:
+
+      ```bash
+      sha256sum msprof_analyze-1.0-py3-none-any.whl
+      xx *msprof_analyze-1.0-py3-none-any.whl
+      ```
+
+3. whl包安装。
+
+   执行如下命令进行安装。
+
+   ```bash
+   pip3 install ./msprof_analyze-{version}-py3-none-any.whl
+   ```
+
+   提示如下信息则表示安装成功。
+
+   ```bash
+   Successfully installed msprof_analyze-{version}
+   ```
+
+### 源代码编译安装
+
+1. 安装依赖。
+
+   编译前需要安装wheel。
+
+   ```bash
+   pip3 install wheel
+   ```
+
+2. 下载源码。
+
+   ```bash
+   git clone https://gitee.com/ascend/mstt.git
+   ```
+
+3. 编译whl包。
+
+   ```bash
+   cd mstt/profiler/msprof_analyze
+   pip3 install -r requirements.txt && python3 setup.py bdist_wheel
+   ```
+
+   以上命令执行完成后,在mstt/profiler/msprof_analyze/dist目录下生成性能工具whl安装包`msprof_analyze-{version}-py3-none-any.whl`。
+
+4. 安装。
+
+   执行如下命令进行性能工具安装。
+
+   ```bash
+   cd dist
+   pip3 install ./msprof_analyze-{version}-py3-none-any.whl
+   ```
+
+## 卸载和更新
+
+若需要更新工具,请先卸载旧版本,再安装新版本,操作如下:
+
+1. 卸载
+
+   ```bash
+   pip3 uninstall msprof-analyze
+   ```
+
+2. 更新
+
+   ```bash
+   pip3 install ./msprof_analyze-{version}-py3-none-any.whl
+   ```
+
+## 工具使用
+
+```bash
+msprof-analyze advisor [-h]
+```
+
+```bash
+msprof-analyze compare [-h]
+```
+
+```bash
+msprof-analyze cluster [-h]
+```
+
+```bash
+msprof-analyze auto-completion [-h]
+```
+
+```
+msprof-analyze [-h] [-v]
+```
+
+| 参数 | 说明 |
+| -------------------- | ------------------------------------------------------------ |
+| advisor | [advisor](./advisor/README.md)。对Ascend PyTorch Profiler或者msprof采集的PyTorch场景性能数据进行分析,并输出性能调优建议。 |
+| compare | [compare_tools(性能比对工具)](./compare_tools/README.md)。提供NPU与GPU性能拆解功能以及算子、通信、内存性能的比对功能。 |
+| cluster | [cluster_analyse(集群分析工具)](./cluster_analyse/README.md)。提供多机多卡的集群分析能力(基于通信域的通信分析和迭代耗时分析),当前需要配合MindStudio Insight的集群分析功能使用。 |
+| auto-completion | 自动补全。配置后,在当前视图下输入msprof-analyze工具的子参数时,可以使用Tab键自动补全。 |
+| -v,-V
    --version | 查看版本号。 | +| -h,-H
    --help | 命令行参数帮助信息。 |
+
diff --git a/profiler/msprof_analyze/advisor/README.md b/profiler/msprof_analyze/advisor/README.md
index befdf89fbe9542c69b5ac0e94d163e11f34c4fad..2c9e055a119847134f08337c559d4012b4ea31fc 100644
--- a/profiler/msprof_analyze/advisor/README.md
+++ b/profiler/msprof_analyze/advisor/README.md
@@ -90,7 +90,6 @@ msprof-analyze advisor命令行包含如下三个参数:
 | | slow link | 慢链路识别 | PyTorch、MindSpore |
 | computation | AICPU Issues | AI CPU调优 | PyTorch、MindSpore |
 | | Operator Dynamic Shape Issues | 识别动态Shape算子 | PyTorch |
-| | AI Core Performance analysis | MatMul、FlashAttentionScore、AI_VECTOR_CORE和MIX_AIV类算子的性能分析 | PyTorch |
 | | Block Dim | Block Dim算子调优 | PyTorch、MindSpore |
 | | Operator No Bound Issues | 算子瓶颈分析 | PyTorch、MindSpore |
 | | Fusion Issues | 融合算子图调优 | PyTorch、MindSpore |
@@ -104,7 +103,6 @@ msprof-analyze advisor命令行包含如下三个参数:
 | | SyncBatchNorm Issues | BatchNorm同步检测 | PyTorch、MindSpore |
 | | Synchronize Stream Issues | 流同步检测 | PyTorch、MindSpore |
 | | GC Analysis | 识别异常垃圾回收事件。需要Ascend PyTorch Profiler采集时开启experimental_config下的gc_detect_threshold功能 | PyTorch |
-| | Fusible Operator Analysis | 检测具有Host瓶颈或者MTE瓶颈的算子序列,可用于代码优化或开发可融合算子 | PyTorch、MindSpore |
 | dataloader | Slow Dataloader Issues | 异常dataloader检测 | PyTorch、MindSpore |
 | memory | Memory Operator Issues | 识别异常的内存申请释放操作 | PyTorch、MindSpore |
 | comparison | Kernel compare of Rank\* Step\* and Rank\* Step\* | 识别标杆和待比对性能数据的Kernel数据(无标杆场景是集群内部快慢卡的性能数据对比,有标杆场景是两个集群之间存在明显耗时差异的相同卡之间的性能数据对比) | PyTorch、MindSpore |
@@ -235,7 +233,7 @@ communication模块从通信维度进行分析,目前支持通信小包检测
 
 ![byte_alignment](/img/byte_alignment.png)
 
-computation模块从device计算性能维度进行分析,能够识别AI CPU、动态Shape、AI Core Performance analysis、Dlock Dim、算子瓶颈、融合算子图、AI Core算子降频分析等问题并给出相应建议。此处不再详细展开,按照报告进行调优即可。示例如下:
+computation模块从device计算性能维度进行分析,能够识别AI CPU、动态Shape、Block Dim、算子瓶颈、融合算子图、AI Core算子降频分析等问题并给出相应建议。此处不再详细展开,按照报告进行调优即可。示例如下:
 
 ![computation_1](./img/computation_1.png)
 
@@ -243,8 +241,6 @@
 
 ![op_no_bound](./img/op_no_bound.png)
 
-![AI Core Performance analysis](./img/AI Core Performance analysis.png)
-
 上图中torch_npu.npu.set_compile_mode接口介绍请参见[torch_npu.npu.set_compile_mode](https://www.hiascend.com/document/detail/zh/Pytorch/60RC2/apiref/apilist/ptaoplist_000880.html);AICPU算子替换样例可参考《[Samples of AI CPU Operator Replacement](https://gitee.com/ascend/mstt/blob/master/profiler/msprof_analyze/advisor/doc/Samples%20of%20AI%20CPU%20Operator%20Replacement.md)》。
 
 当存在pp stage(流水线并行)时,computation会按stage分析,每个stage就是一个流水线切分,比如0\~7卡为stage-0、8\~15卡为stage-1。
@@ -257,22 +253,7 @@ dataloader模块包含Slow Dataloader Issues,主要检测异常高耗时的dat
 
 上图中的`pin_memory`(内存锁定)和`num_workers`(数据加载的子进程数量)参数为[数据加载优化](https://www.hiascend.com/document/detail/zh/Pytorch/60RC2/ptmoddevg/trainingmigrguide/performance_tuning_0019.html)使用,参考写法见下方示例。
 
-schedule模块包GC Analysis、含亲和API、aclOpCompile、SyncBatchNorm、SynchronizeStream和Fusible Operator Analysis等多项检测。
-
-其中Fusible Operator Analysis解析结果仅打屏展示和保存在`mstt_advisor_{timestamp}.xlsx`文件中,包含“基于host瓶颈的算子序列分析”和“基于mte瓶颈的算子序列分析”页签,如下图:
-
-![Fusible Operator Analysis](/img/Fusible Operator Analysis.png)
-
-| 字段 | 说明 |
-| ------------------ | ------------------------------------------------------------ |
-| start index | 序列起始算子在kernel details.csv或op_summary.csv中索引位置(不包含表头,起始索引为0)。 |
-| end index | 序列末尾算子在kernel details.csv或op_summary.csv中索引位置。 |
-| total time(us) | 算子序列总耗时(包含算子间隙),单位us。 |
-| execution time(us) | 序列中算子执行总耗时,单位us。 |
-| mte time(us) | 序列中算子搬运总耗时,单位us。 |
-| occurrences | 序列出现次数。 |
-| mte bound | 是否为MTE瓶颈。 |
-| host bound | 是否为Host瓶颈。 | 
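+下面给出一段开启`pin_memory`与`num_workers`的参考写法(仅为示意:其中train_dataset假设为已构造好的Dataset对象,batch_size、num_workers等取值请结合实际训练任务调整):
+
+```python
+from torch.utils.data import DataLoader
+
+# 通过num_workers开启多子进程数据加载,并通过pin_memory申请锁页内存,
+# 加速Host到Device的数据拷贝,缓解dataloader耗时过高的问题
+train_loader = DataLoader(
+    train_dataset,
+    batch_size=32,
+    shuffle=True,
+    num_workers=4,
+    pin_memory=True
+)
+```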
+schedule模块包含GC Analysis、亲和API、aclOpCompile、SyncBatchNorm、SynchronizeStream等多项检测。
 
 如下图示例,GC Analysis提示存在异常垃圾回收事件,用户可以通过有效的Python内存管理、使用`gc.set_threshold()`调整垃圾回收阈值、使用`gc.disable()`禁用GC等方法处理GC问题。
 
diff --git a/profiler/msprof_analyze/advisor/advisor_backend/cluster_advice/slow_link_advice.py b/profiler/msprof_analyze/advisor/advisor_backend/cluster_advice/slow_link_advice.py
index 6d2a0638913d759817b091a013d7fbce9df09f63..2024adf8f6a020a5e09ce41949f9815831d7b563 100644
--- a/profiler/msprof_analyze/advisor/advisor_backend/cluster_advice/slow_link_advice.py
+++ b/profiler/msprof_analyze/advisor/advisor_backend/cluster_advice/slow_link_advice.py
@@ -13,7 +13,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-import copy
 import os
 from collections import defaultdict
 from msprof_analyze.advisor.advisor_backend.common_func_advisor.constant import Constant
@@ -42,7 +41,7 @@ class SlowLinkAdvice(ClusterAdviceBase):
             self.SDMA_TIME_MS: 0,
             self.SDMA_SIZE_MB: 0,
         }
-        self.rank_bw_dict = defaultdict(lambda: copy.deepcopy(default_value))
+        self.rank_bw_dict = defaultdict(lambda: default_value.copy())
 
     @staticmethod
     def compute_ratio(dividend: float, divisor: float):
diff --git a/profiler/msprof_analyze/advisor/advisor_backend/common_func_advisor/constant.py b/profiler/msprof_analyze/advisor/advisor_backend/common_func_advisor/constant.py
index 162a9fd2fdde15e02d2897106b43f52bca99bde1..077bf0074ccc5edc1bbf0814d2d3d72b1c5475e7 100644
--- a/profiler/msprof_analyze/advisor/advisor_backend/common_func_advisor/constant.py
+++ b/profiler/msprof_analyze/advisor/advisor_backend/common_func_advisor/constant.py
@@ -214,7 +214,7 @@ class CoreType:
     AICPU = "AI_CPU"
     MIX_AIV = "MIX_AIV"
     MIX_AIC = "MIX_AIC"
-    HCCL = "COMMUNICATION"
+    HCCL = "HCCL"
 
 
 class PerfColor(Enum):
diff --git a/profiler/msprof_analyze/advisor/analyzer/analyzer_controller.py b/profiler/msprof_analyze/advisor/analyzer/analyzer_controller.py
index bde9e5cd3454a85853a6fbcfbd0ade060ebc229b..d923ba978f8d8797ba0b41902fa24920afd61dad 100644
--- a/profiler/msprof_analyze/advisor/analyzer/analyzer_controller.py
+++ b/profiler/msprof_analyze/advisor/analyzer/analyzer_controller.py
@@ -1,947 +1,947 @@
-# Copyright (c) 2024, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License. 
-import copy -import logging -import json -import sys -import os -import platform -import multiprocessing as mp -from multiprocessing import Manager -from pathlib import Path - -import psutil - -from msprof_analyze.prof_common.additional_args_manager import AdditionalArgsManager -from msprof_analyze.advisor.analyzer.cluster.slow_rank_analyzer import SlowRankAnalyzer -from msprof_analyze.advisor.analyzer.cluster.slow_link_analyzer import SlowLinkAnalyzer -from msprof_analyze.advisor.analyzer.computation.pp_stage_computation_analyzer import PPStageComputationAnalyzer -from msprof_analyze.advisor.analyzer.overall.overall_summary_analyzer import OverallSummaryAnalyzer -from msprof_analyze.advisor.config.config import Config -from msprof_analyze.advisor.common.analyzer_scopes import SupportedScopes -from msprof_analyze.advisor.common.async_analysis_status import AsyncAnalysisStatus -from msprof_analyze.advisor.common.enum_params_parser import EnumParamsParser -from msprof_analyze.advisor.utils.utils import Timer, safe_index_value, safe_division, safe_index, convert_to_int -from msprof_analyze.advisor.interface.interface import Interface -from msprof_analyze.cluster_analyse.cluster_data_preprocess.pytorch_data_preprocessor import PytorchDataPreprocessor -from msprof_analyze.cluster_analyse.cluster_data_preprocess.mindspore_data_preprocessor import MindsporeDataPreprocessor -from msprof_analyze.prof_common.path_manager import PathManager -from msprof_analyze.prof_common.constant import Constant - -# 以spawn模式启动多进程,避免fork主进程资源。如果主进程逻辑较为复杂,fork可能会导致异常。 -mp.set_start_method("spawn", force=True) -logger = logging.getLogger() - - -class AsyncParams: - """处理用户异步请求的输入参数,包括cli arguments和环境变量两类参数.""" - user_valid_arguments = {} - user_valid_envs = {} - user_non_enum_params = {} - user_invalid_values = [] - user_total_params = {} - - @staticmethod - def parse_async_list_params(key, value, option_values, key_type, value_type): - if isinstance(value, list): - value_list = value - else: - value_list = [_.strip(" ") for _ in str(value).split(",")] - - if sorted(value_list) not in [sorted(option) for option in option_values]: - AsyncParams.user_invalid_values.append( - {"key": key, "invalid value": value, "optional values": option_values, - "required value type": value_type}) - return - if key_type == EnumParamsParser.ENVS: - AsyncParams.user_valid_envs[key.upper()] = ",".join(value_list) - elif key_type == EnumParamsParser.ARGUMENTS: - AsyncParams.user_valid_arguments[key] = value_list - - @staticmethod - def parse_async_int_params(key, value, option_values, key_type, value_type): - if convert_to_int(value) not in option_values: - AsyncParams.user_invalid_values.append( - {"key": key, "invalid value": value, "optional values": option_values, - "required value type": value_type}) - return - - if key_type == EnumParamsParser.ENVS: - AsyncParams.user_valid_envs[key.upper()] = str(convert_to_int(value)) - elif key_type == EnumParamsParser.ARGUMENTS: - AsyncParams.user_valid_arguments[key] = convert_to_int(value) - - @staticmethod - def parse_async_str_params(key, value, option_values, key_type, value_type): - if str(value) not in option_values: - AsyncParams.user_invalid_values.append( - {"key": key, "invalid value": value, "optional values": option_values, - "required value type": value_type}) - return - if key_type == EnumParamsParser.ENVS: - AsyncParams.user_valid_envs[key.upper()] = str(value) - elif key_type == EnumParamsParser.ARGUMENTS: - AsyncParams.user_valid_arguments[key] = str(value) - - @staticmethod - 
def parse_async_boolean_params(key, value, option_values, key_type, value_type): - - if str(value).lower() not in ["true", "false"]: - AsyncParams.user_invalid_values.append( - {"key": key, "invalid value": value, "optional values": option_values, - "required value type": value_type}) - return - - if key_type == EnumParamsParser.ENVS: - AsyncParams.user_valid_envs[key.upper()] = str(value) - elif key_type == EnumParamsParser.ARGUMENTS: - AsyncParams.user_valid_arguments[key] = str(value).lower() == "true" - - @staticmethod - def parse_params(user_async_params): - params_parser = EnumParamsParser() - valid_env_keys = [key.lower() for key in params_parser.get_envs_keys()] - valid_arg_keys = [key.lower() for key in params_parser.get_arguments_keys()] - - for key, value in user_async_params.items(): - key = key.lower() - if key not in valid_env_keys + valid_arg_keys: - AsyncParams.user_non_enum_params[key] = value - continue - - if key in valid_env_keys: - # 环境变量均大写,异步调用入参到analyzer controller时支持用户使用小写配置环境变量 - option_values = params_parser.get_options(key.upper()) - value_type = params_parser.get_value_type(key.upper()) - key_type = params_parser.ENVS - else: - option_values = params_parser.get_options(key) - value_type = params_parser.get_value_type(key) - key_type = params_parser.ARGUMENTS - - if hasattr(AsyncParams, f"parse_async_{value_type}_params"): - getattr(AsyncParams, f"parse_async_{value_type}_params")(key, value, option_values, key_type, - value_type) - - AsyncParams.user_total_params["async_analysis_env"] = AsyncParams.user_valid_envs - AsyncParams.user_total_params.update(AsyncParams.user_valid_arguments) - AsyncParams.user_total_params.update(AsyncParams.user_non_enum_params) - - -class AnalyzerController: - CLUSTER_RANK_THRESHOLD = 2 - SDMA_SUPPORT_SCOPES = [SupportedScopes.BANDWIDTH_CONTENTION_DETECTION, SupportedScopes.BYTE_ALIGNMENT_DETECTION] - RDMA_SUPPORT_SCOPES = [SupportedScopes.PACKET] - COMMUNICATION_MAPPING = { - SlowLinkAnalyzer.SDMA: SDMA_SUPPORT_SCOPES, - SlowLinkAnalyzer.RDMA: RDMA_SUPPORT_SCOPES - } - - def __init__(self): - self.dimensions = Interface.all_dimension - self.kwargs = {} - self.args_manager = None - self.slow_rank_analyzer = None - self.slow_link_analyzer = None - self.cluster_local_data_map = {} - self.default_rank_id = None - self.rank_id_map = {} - self._is_cluster = False - self.analysis_process_resp = Manager().dict() - - @staticmethod - def _set_analysis_process_priority(pid): - # 将分析进程优先级设置为最低,避免因为分析进程阻塞其他任务进程,unix上19表示最低优先级 - unix_process_lowest_priority = 19 - windows_platform = "windows" - linux_platform = "linux" - p = psutil.Process(pid) - if platform.system().lower() == windows_platform: - p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS) - elif platform.system().lower() == linux_platform: - p.nice(unix_process_lowest_priority) - - @staticmethod - def _check_profiling_path_valid(profiling_path): - PathManager.input_path_common_check(profiling_path) - - if not Path(profiling_path).exists(): - logger.error("Profiling path is not existed. 
Invalid profiling path: %s", profiling_path) - return False - - return True - - - @staticmethod - def _get_step_rank_for_cluster_statistic_diff(target_cluster_statistic_data, benchmark_cluster_statistic_data, - headers, dimension, get_max=False): - if dimension not in headers: - logger.error("Error dimension %s for cluster statistics data, optionals are %s.", dimension, headers) - return None, None, None - - dimension_index = safe_index_value(headers, dimension) - diff_record = [] - # 对比目标profiling和benchmark profiling 每张卡的计算和下发和带宽,取计算、下发、带宽差异最大的卡进行下一步分析 - for target_row_data, benchmark_row_data in zip(target_cluster_statistic_data, benchmark_cluster_statistic_data): - target_data = safe_index(target_row_data, dimension_index) - benchmark_data = safe_index(benchmark_row_data, dimension_index) - - if not isinstance(target_data, (int, float)) or not isinstance(benchmark_data, (int, float)): - continue - diff_record.append(target_data - benchmark_data) - - if SlowRankAnalyzer.compute_max_gap_ratio(diff_record, safe_division(sum(diff_record), len( - diff_record))) < SlowRankAnalyzer.RATIO_THRESHOLD: - return None, None, None - - value = max(diff_record) if get_max else min(diff_record) - value_index = safe_index_value(diff_record, value) - - step_value_index = safe_index_value(headers, "step") - rank_id_value_index = safe_index_value(headers, "rank_id") - - step = safe_index(safe_index(target_cluster_statistic_data, value_index, []), step_value_index) - benchmark_step = safe_index(safe_index(benchmark_cluster_statistic_data, value_index, []), step_value_index) - target_rank_id = safe_index(safe_index(target_cluster_statistic_data, value_index, []), rank_id_value_index) - benchmark_rank_id = safe_index(safe_index(benchmark_cluster_statistic_data, value_index, []), - rank_id_value_index) - - if target_rank_id != benchmark_rank_id: - logger.error( - "Rank ids of target profiling must keep the same as benchmark profiling, skip cluster comparison") - return None, None, None - - return step, benchmark_step, target_rank_id - - @staticmethod - def _init_async_analysis_env(kwargs): - envs = kwargs.get("async_analysis_env", {}) - for key, value in envs.items(): - os.environ[key] = value - - def format_async_analysis_params(self, pid, async_resp, dimensions, kwargs): - - AsyncParams.parse_params(kwargs) - dimensions = AsyncParams.user_total_params.get("analysis_dimensions") or dimensions - - if AsyncParams.user_invalid_values: - error_msg = "Got invalid arguments as follows: \n " - for index, invalid_value in enumerate(AsyncParams.user_invalid_values): - error_msg += f"{index + 1}. 
Key '{invalid_value.get('key')}', " \ - f"invalid value '{invalid_value.get('invalid value')}', " \ - f"optional valid values '{invalid_value.get('optional values')}', " \ - f"required value type '{invalid_value.get('required value type')}'.\n " - self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg, - status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED) - raise ValueError(error_msg) - - logger.warning("User parameters for async analysis is as follows:\n %s", - json.dumps(AsyncParams.user_total_params, indent=4)) - return dimensions, AsyncParams.user_total_params - - def do_analysis(self, dimensions, **kwargs): - pid = os.getpid() - resp = {"id": pid} - self.args_manager = AdditionalArgsManager() - self.args_manager.init(kwargs) - output_path = kwargs.get("output_path") - - AnalyzerController._set_analysis_process_priority(pid) - if kwargs.get("is_async_analysis"): - del kwargs["is_async_analysis"] - dimensions, kwargs = self.format_async_analysis_params(pid, resp, dimensions, kwargs) - AnalyzerController._init_async_analysis_env(kwargs) - - try: - if output_path: - - PathManager.check_input_directory_path(output_path) - if os.path.exists(output_path): - PathManager.check_path_owner_consistent([output_path]) - else: - PathManager.make_dir_safety(output_path) - - Config().set_config("_work_path", output_path) - Config().set_log_path(f"mstt_advisor_{Timer().strftime}.xlsx") - - self._do_analysis(dimensions, pid=pid, async_resp=resp, **kwargs) - except Exception as e: - self._update_analysis_process_resp(pid, resp, status_code=AsyncAnalysisStatus.INNER_ERROR_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED, error_msg=str(e)) - logger.error(e) - raise RuntimeError("Do analysis error.") from e - - def async_do_analysis(self, dimensions, **kwargs): - """ Deploy a online service to start async analysis job, wrap this api by flask or tornado and so on, - then could query the analysis status by restful api. - You can view file 'profiler/msprof_analyze/advisor/config/enum_parameters.yaml' to obtain detailed - information for all the args listed below. - - Args: - dimensions: analysis dimension, normally set as Interface.all_dimension, support specific dimension analysis - such as ['computation'] or ['computation', 'schedule'] - cann_version: cann version of your runtime, inpact on the analysis of affinity api and AICPU operators - profiling_type: profiling type of your runtime - profiling_version: profiling version of your runtime, inpact on the analysis of affinity api - analysis_dimensions: can overwite dimensions. - advisor_analyze_processes: number of processes to use while the training params pipeline parallel(pp) >1, - can reduce the time of analysis. - disable_profiling_comparison: disable comparison of operators(including npu computation operator and - cpu torch aten operator), can reduce the time of analysis. - disable_affinity_api: disable analysis of affinity api, normally set as 'True' while you training job - has been trained on NPU for a long time and suddenly shows performance degradation. - output_path: analysis output path(including html and xlsx). 
- - Example: - >>> # initialize a global analyzer controller - >>> analyzer_controller = AnalyzerController() - >>> analysis_kwargs = dict(advisor_analyze_processes=2, disable_profiling_comparison=True) - >>> - >>> async_analysis_process = analyzer_controller.async_do_analysis( - >>> Interface.all_dimension, **analysis_kwargs) - >>> - >>> - >>> # query the job status every second - >>> while True: - >>> response = analyzer_controller.get_response_by_pid(async_analysis_process.pid) - >>> print(f'analysis response is {response}') - >>> if response.get("status") in ["success", "failed"]: - >>> async_analysis_process.join() - >>> break - >>> time.sleep(1) - """ - kwargs["is_async_analysis"] = True - - async_analysis_process = mp.Process(target=self.do_analysis, args=(dimensions,), kwargs=kwargs, - name="Async advisor performance analysis") - async_analysis_process.start() - self._update_analysis_process_resp(async_analysis_process.pid, {"id": async_analysis_process.pid}, - status_code=AsyncAnalysisStatus.NON_FAILED_STATUS_CODE, - status=AsyncAnalysisStatus.ANALYZING) - return async_analysis_process - - def get_response_by_pid(self, pid): - def _is_pid_exists(pid): - try: - psutil.Process(pid) - return True - except psutil.NoSuchProcess: - return False - - pid_not_exist_response = dict(id=pid, status_code=AsyncAnalysisStatus.NOT_FOUND_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED, - error_msg="The advisor task id does not exist") - if pid not in self.analysis_process_resp: - return pid_not_exist_response - - response = self.analysis_process_resp.get(pid) - if response.get("status") not in [AsyncAnalysisStatus.FAILED, - AsyncAnalysisStatus.SUCCESS] and not _is_pid_exists(pid): - return pid_not_exist_response - return response - - def single_rank_analysis(self, profiling_path, benchmark_profiling_path=None): - job_list = [] - - profiling_path = self._get_profiling_path_by_rank(profiling_path) - benchmark_profiling_path = self._get_profiling_path_by_rank(benchmark_profiling_path) - - # 单卡场景无集群分析 - for dim in [Interface.CLUSTER]: - if dim in self.dimensions: - self.dimensions.remove(dim) - - for dimension in self.dimensions: - dimension_analysis_func_name = f"{dimension}_analysis" - if not hasattr(self, dimension_analysis_func_name): - continue - logger.info("Start %s analysis", dimension) - job_list += getattr(self, dimension_analysis_func_name)(profiling_path) - - if benchmark_profiling_path: - # kernel/api 比对 - compare_profiling_list = [ - dict(profiling_path=profiling_path, benchmark_profiling_path=benchmark_profiling_path, - compare_mode=Constant.KERNEL_COMPARE), - dict(profiling_path=profiling_path, benchmark_profiling_path=benchmark_profiling_path, - compare_mode=Constant.API_COMPARE) - ] - - job_list += self._profiling_comparison(compare_profiling_list) - else: - self.overall(profiling_path) - - return job_list - - def do_cluster_analysis(self, profiling_path, benchmark_profiling_path=None): - job_list = [] - - # 单集群profiling分析:下发、通信、计算、显存/内存 - for dimension in self.dimensions: - dimension_analysis_func_name = f"cluster_{dimension}_analysis" - if not hasattr(self, dimension_analysis_func_name): - continue - logger.info("Start cluster %s analysis", dimension) - job_list += getattr(self, dimension_analysis_func_name)(profiling_path) - - self.overall(profiling_path) - - if benchmark_profiling_path: - # 两个集群profiling比对分析 - job_list += self._cluster_profiling_comparison(profiling_path, benchmark_profiling_path) - return job_list - - def overall(self, profiling_path): - from 
msprof_analyze.advisor.analyzer.overall.environment_variable_analyzer import EnvironmentVariableAnalyzer - env_analyzer = EnvironmentVariableAnalyzer(profiling_path) - env_analyzer.optimize() - - if self._is_cluster: - self.slow_rank_analyzer.optimize(template_key=Interface.OVERALL) - self.slow_link_analyzer.optimize(template_key=Interface.OVERALL) - else: - overall_analyzer = OverallSummaryAnalyzer(profiling_path) - overall_analyzer.optimize() - - def schedule_analysis(self, profiling_path, benchmark_profiling_path=None, step=None, benchmark_step=None, - **kwargs): - # 任意单卡的下发分析 - - input_kwargs = copy.deepcopy(self.kwargs) - job_list = [] - - input_kwargs["profiling_path"] = profiling_path - input_kwargs["benchmark_profiling_path"] = benchmark_profiling_path - input_kwargs["step"] = step - input_kwargs["benchmark_step"] = benchmark_step - input_kwargs["rank"] = kwargs.get("rank") - input_kwargs["step_duration"] = kwargs.get("step_duration") - - for dimension in [Interface.SCHEDULE]: - for scope in Interface.get_scope(dimension): - interface = Interface(**input_kwargs) - job_list.append((dimension, scope, interface, input_kwargs)) - return job_list - - def computation_analysis(self, profiling_path, benchmark_profiling_path=None, step=None, - benchmark_step=None, stage=None, **kwargs): - # 任意单卡的计算分析 - - input_kwargs = copy.deepcopy(self.kwargs) - input_kwargs["profiling_path"] = profiling_path - input_kwargs["benchmark_profiling_path"] = benchmark_profiling_path - input_kwargs["step"] = step - input_kwargs["benchmark_step"] = benchmark_step - input_kwargs["stage"] = stage - input_kwargs["rank"] = kwargs.get("rank") - input_kwargs["step_duration"] = kwargs.get("step_duration") - job_list = [] - - for dimension in [Interface.COMPUTATION]: - for scope in Interface.get_scope(dimension): - if scope == SupportedScopes.STAGE_COMPUTE: - continue - interface = Interface(**input_kwargs) - job_list.append((dimension, scope, interface, input_kwargs)) - return job_list - - def memory_analysis(self, profiling_path, benchmark_profiling_path=None, step=None, benchmark_step=None, **kwargs): - # 任意单卡的内存分析 - - input_kwargs = copy.deepcopy(self.kwargs) - job_list = [] - - input_kwargs["profiling_path"] = profiling_path - input_kwargs["benchmark_profiling_path"] = benchmark_profiling_path - input_kwargs["step"] = step - input_kwargs["benchmark_step"] = benchmark_step - input_kwargs["rank"] = kwargs.get("rank") - input_kwargs["step_duration"] = kwargs.get("step_duration") - - for dimension in [Interface.MEMORY]: - for scope in Interface.get_scope(dimension): - interface = Interface(**input_kwargs) - job_list.append((dimension, scope, interface, input_kwargs)) - return job_list - - def communication_analysis(self, profiling_path, benchmark_profiling_path=None, **kwargs): - - job_list = [] - supported_trans_type = [SlowLinkAnalyzer.SDMA, SlowLinkAnalyzer.RDMA] - step = kwargs.get("step", None) - benchmark_step = kwargs.get("benchmark_step", None) - bandwidth_type = kwargs.get("bandwidth_type", None) - scope = kwargs.get("scope", None) - if bandwidth_type is not None and bandwidth_type not in supported_trans_type: - logger.error("Error transit type %s, optionals are %s", bandwidth_type, supported_trans_type) - return job_list - - job_list += self._communication_analysis(profiling_path=profiling_path, - benchmark_profiling_path=benchmark_profiling_path, - step=step, benchmark_step=benchmark_step, - scope=scope, bandwidth_type=bandwidth_type) - - return job_list - - def cluster_schedule_analysis(self, 
profiling_path): - # 目标集群profiling数据下发分析,不包含两个集群profiling数据的比对分析 - - job_list = [] - global_step_rank = self.slow_rank_analyzer.get_global_step_rank(SlowRankAnalyzer.FREE) - - info_msg = "For cluster schedule analysis, " - slow_rank_id = global_step_rank.get("maximum", {}).get("rank_id") - if slow_rank_id is not None: - info_msg += f"maximum free for rank {slow_rank_id}" - else: - slow_rank_id = self.default_rank_id - info_msg += f"no slow rank with free time, analysis for default rank {slow_rank_id}" - - fast_rank_id = global_step_rank.get("minimum", {}).get("rank_id") - - slow_step = global_step_rank.get("maximum", {}).get("step") - fast_step = global_step_rank.get("minimum", {}).get("step") - - if slow_step is not None: - info_msg += f" and step {slow_step}" - logger.info(info_msg) - - kwargs = dict(profiling_path=self._get_profiling_path_by_rank(profiling_path, slow_rank_id), - benchmark_profiling_path=self._get_profiling_path_by_rank(profiling_path, fast_rank_id), - step=slow_step, benchmark_step=fast_step, - rank=slow_rank_id, benchmark_rank=fast_rank_id, - compare_mode=Constant.API_COMPARE, - step_duration=self.slow_rank_analyzer.get_step_duration(slow_rank_id, slow_step)) - - job_list += self.schedule_analysis(**kwargs) - - rank_id_valid = slow_rank_id is not None and fast_rank_id is not None and fast_rank_id != slow_rank_id - if not self.kwargs.get("benchmark_profiling_path") and rank_id_valid: - # 当用户指定benchmark profiling path时,不进行目标集群profiling的内部快慢卡对比 - logger.info("Enable schedule comparison of fast and slow rank/step") - job_list += self._profiling_comparison([kwargs]) - return job_list - - def cluster_communication_analysis(self, profiling_path): - job_list = [] - - for dimension in [Interface.COMMUNICATION]: - for scope in Interface.get_scope(dimension): - analyzer_class = Interface.get_analyzer(dimension, scope) - if hasattr(analyzer_class, "requires_cluster_dataset") and getattr(analyzer_class, - "requires_cluster_dataset"): - - # 如果不依赖数据集,或者依赖的是ClusterDataset,则不用根据带宽确定需要分析的特定rank - kwargs = copy.deepcopy(self.kwargs) - kwargs["profiling_path"] = profiling_path - interface = Interface(**kwargs) - job_list.append((dimension, scope, interface, kwargs)) - else: - # 非ClusterDataset场景,需要根据带宽大小分析特定的rank - for bandwidth_type in [SlowLinkAnalyzer.SDMA, SlowLinkAnalyzer.RDMA]: - global_step_rank = self.slow_link_analyzer.get_global_step_rank(bandwidth_type) - # 获取带宽最小的卡进行分析 - target_rank_id = global_step_rank.get("minimum", {}).get("rank_id") - if target_rank_id is None: - target_rank_id = self.default_rank_id - step = global_step_rank.get("minimum", {}).get("step") - analysis_profiling_path = self._get_profiling_path_by_rank(profiling_path, target_rank_id) - - info_msg = f"Minimum {bandwidth_type} bandwidth for rank {target_rank_id} " - if step: - info_msg += f"and step {step}" - logger.info(info_msg) - - job_list += self.communication_analysis(analysis_profiling_path, step=step, - bandwidth_type=bandwidth_type, scope=scope) - - return job_list - - def cluster_computation_analysis(self, profiling_path): - # 目标集群profiling数据计算分析,不包含两个集群profiling数据的比对分析;如果有pp stage,则对不同stage进行计算分析 - - job_list = [] - global_step_rank = self.slow_rank_analyzer.get_global_step_rank(SlowRankAnalyzer.COMPUTE) - stage_step_rank = self.slow_rank_analyzer.get_stage_step_rank(SlowRankAnalyzer.COMPUTE) - - if stage_step_rank: - job_list = self._stage_computation_analysis(profiling_path, stage_step_rank, job_list) - else: - job_list = self._global_computation_analysis(profiling_path, global_step_rank, 
job_list) - return job_list - - def cluster_memory_analysis(self, profiling_path): - # 目标集群profiling数据内存分析,当前memory识别的两个算子,导致的问题都是大的free,因此选择FREE最慢的卡进行分析 - - job_list = [] - global_step_rank = self.slow_rank_analyzer.get_global_step_rank(SlowRankAnalyzer.FREE) - - info_msg = "For cluster memory analysis, " - slow_rank_id = global_step_rank.get("maximum", {}).get("rank_id") - if slow_rank_id is not None: - info_msg += f"maximum free for rank {slow_rank_id}" - else: - slow_rank_id = self.default_rank_id - info_msg += f"no slow rank with free time, analysis for default rank {slow_rank_id}" - - slow_step = global_step_rank.get("maximum", {}).get("step") - if slow_step is not None: - info_msg += f" and step {slow_step}" - logger.info(info_msg) - - analysis_profiling_path = self._get_profiling_path_by_rank(profiling_path, slow_rank_id) - step_duration = self.slow_rank_analyzer.get_step_duration(slow_rank_id, slow_step) - job_list += self.memory_analysis(analysis_profiling_path, step=slow_step, rank=slow_rank_id, - step_duration=step_duration) - return job_list - - def _do_analysis(self, dimensions, pid=0, async_resp=None, **kwargs): - self.dimensions = dimensions - self.kwargs = kwargs - result_list = [] - profiling_path = PathManager.get_realpath(self.kwargs.get("profiling_path")) - benchmark_profiling_path = self.kwargs.get("benchmark_profiling_path") - PathManager.check_path_owner_consistent([profiling_path]) - if benchmark_profiling_path: - benchmark_profiling_path = PathManager.get_realpath(benchmark_profiling_path) - PathManager.check_path_owner_consistent([benchmark_profiling_path]) - - if not self._check_profiling_path_valid(profiling_path): - error_msg = f"Got invalid argument '-d/--profiling_path' {profiling_path}, skip analysis" - self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg, - status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED) - logger.error(error_msg) - return - - - if benchmark_profiling_path and not self._check_profiling_path_valid(benchmark_profiling_path): - error_msg = (f"Got invalid argument '-bp/--benchmark_profiling_path' {benchmark_profiling_path}, " - f"skip analysis") - self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg, - status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED) - logger.error(error_msg) - return - - self._is_cluster = self._is_cluster_profiling(profiling_path) - if benchmark_profiling_path: - # 构建benchmark profiling的map,用于根据rank获取profiling路径,否则无法进行比对 - is_benchmark_cluster = self._is_cluster_profiling(benchmark_profiling_path) - is_comparison_path_valid = (self._is_cluster and is_benchmark_cluster) or ( - not self._is_cluster and not is_benchmark_cluster) - if not is_comparison_path_valid: - error_msg = f"Only support profiling comparison for '1 npu vs 1 gpu/npu' and 'multi npus vs multi npus'" - self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg, - status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED) - logger.error(error_msg) - return - - if not self._is_cluster: - job_list = self.single_rank_analysis(profiling_path, benchmark_profiling_path) - else: - self.slow_rank_analyzer = SlowRankAnalyzer(profiling_path, output_path=self.kwargs.get("output_path")) - self.slow_link_analyzer = SlowLinkAnalyzer(profiling_path, output_path=self.kwargs.get("output_path")) - job_list = self.do_cluster_analysis(profiling_path, benchmark_profiling_path) - - for i, (dimension, scope, 
interface, kwargs) in enumerate(job_list[::-1]): - result_list.append( - interface.get_result(dimension, scope, render_html=i == len(job_list) - 1, output_dict=False, - **kwargs) - ) - - for result in result_list[::-1]: - if result and hasattr(result, "show"): - result.show() - break - self._get_analysis_finished_resp(pid, async_resp) - - def _get_scopes(self, scope=None, bandwidth_type=SlowLinkAnalyzer.SDMA): - """ - Args: - scope: analyzer type - bandwidth_type: analysis standard - Returns: - scope lists - """ - scopes = [] - if scope: - if scope in self.COMMUNICATION_MAPPING.get(bandwidth_type, self.SDMA_SUPPORT_SCOPES): - scopes.append(scope) - return scopes - for dimension in [Interface.COMMUNICATION]: - for scope_ in Interface.get_scope(dimension): - if scope_ in self.SDMA_SUPPORT_SCOPES or scope_ in self.RDMA_SUPPORT_SCOPES: - scopes.append(scope_) - return scopes - - def _communication_analysis(self, **child_kwargs): - kwargs = copy.deepcopy(self.kwargs) - job_list = [] - - kwargs["profiling_path"] = child_kwargs.get("profiling_path", "") - kwargs["benchmark_profiling_path"] = child_kwargs.get("benchmark_profiling_path", "") - kwargs["step"] = child_kwargs.get("step", -1) - kwargs["benchmark_step"] = child_kwargs.get("benchmark_step", -1) - bandwidth_type = child_kwargs.get("bandwidth_type", SlowLinkAnalyzer.SDMA) - scope = child_kwargs.get("scope", None) - - for scope_ in self._get_scopes(scope, bandwidth_type): - interface = Interface(**kwargs) - job_list.append((Interface.COMMUNICATION, scope_, interface, kwargs)) - - return job_list - - def _profiling_comparison(self, compare_profiling_list): - job_list = [] - disable_profiling_comparison = os.getenv(Constant.DISABLE_PROFILING_COMPARISON) - if disable_profiling_comparison is not None and disable_profiling_comparison.lower() == "true": - logger.info( - "Skip profiling comparison due to longer processing time due to env 'DISABLE_PROFILING_COMPARISON'") - return job_list - - for index, _kwargs in enumerate(compare_profiling_list): - kwargs = copy.deepcopy(self.kwargs) - kwargs.update(_kwargs) - compare_profiling_list[index] = kwargs - - compare_kwargs = { - "profiling_path": kwargs.get("profiling_path"), - "compare_profiling_list": compare_profiling_list, - } - - interface = Interface(**compare_kwargs) - job_list.append((Interface.COMPARISON, SupportedScopes.COMPARISON, interface, compare_kwargs)) - - return job_list - - def _cluster_profiling_comparison(self, profiling_path, benchmark_profiling_path): - # 从计算、下发和通信三个维度对集群profiling数据进行对比 - - job_list = [] - benchmark_profiling_path = self._get_profiling_path_by_rank(benchmark_profiling_path) - benchmark_slow_rank_analyzer = SlowRankAnalyzer(benchmark_profiling_path) - benchmark_slow_link_analyzer = SlowLinkAnalyzer(benchmark_profiling_path) - - # 计算和下发分析 - job_list += self._cluster_data_comparison(profiling_path, - benchmark_profiling_path, - self.slow_rank_analyzer, - benchmark_slow_rank_analyzer, - get_max=True) - - # 通信分析 - job_list += self._cluster_data_comparison(profiling_path, - benchmark_profiling_path, - self.slow_link_analyzer, - benchmark_slow_link_analyzer, - get_max=False) - return job_list - - def _cluster_data_comparison(self, profiling_path, benchmark_profiling_path, target_cluster_analyzer, - benchmark_cluster_analyzer, get_max=False): - # #low rank/slow link结果逐行对比获取差值最大的rank和step进行单卡分析 - job_list = [] - - if isinstance(target_cluster_analyzer, SlowRankAnalyzer): - comparison_dims = [SlowRankAnalyzer.COMPUTE, SlowRankAnalyzer.FREE] - comparison_modes = 
[Constant.KERNEL_COMPARE, Constant.API_COMPARE] - elif isinstance(target_cluster_analyzer, SlowLinkAnalyzer): - comparison_dims = [SlowLinkAnalyzer.SDMA_BANDWIDTH, SlowLinkAnalyzer.RDMA_BANDWIDTH] - comparison_modes = [None, None] - else: - return job_list - - target_data = target_cluster_analyzer.format_datas.get("data", []) - benchmark_data = benchmark_cluster_analyzer.format_datas.get("data", []) - headers = benchmark_cluster_analyzer.format_datas.get("headers", []) - - if len(target_data) != len(benchmark_data): - logger.warning( - "The product of ranks and steps of Benchmark profiling is not equals to target profiling, " - "skip cluster comparison.") - return job_list - - compare_profiling_list = [] - for dimension, compare_mode in zip(comparison_dims, comparison_modes): - step, benchmark_step, rank_id_for_comparison = AnalyzerController._get_step_rank_for_cluster_statistic_diff( - target_data, - benchmark_data, - headers, - dimension, - get_max=get_max - ) - - rank_profiling_path = self._get_profiling_path_by_rank(profiling_path, rank_id_for_comparison) - rank_benchmark_profiling_path = self._get_profiling_path_by_rank( - benchmark_profiling_path, - rank_id_for_comparison - ) - - if rank_id_for_comparison is None: - # rank id为空则无法获取对应rank的profiling路径,无法进行比较 - continue - - compare_profiling_list.append( - dict(profiling_path=rank_profiling_path, benchmark_profiling_path=rank_benchmark_profiling_path, - step=step, benchmark_step=benchmark_step, - rank=rank_id_for_comparison, benchmark_rank=rank_id_for_comparison, compare_mode=compare_mode) - ) - - if not compare_profiling_list: - return job_list - - job_list += self._profiling_comparison(compare_profiling_list) - return job_list - - def _is_cluster_profiling(self, profiling_path): - if os.path.isfile(profiling_path): - return False - path_list = [os.path.join(profiling_path, dir_name) for dir_name in os.listdir(profiling_path)] - ascend_pt_dirs = [path for path in path_list if os.path.isdir(path) and path.endswith("ascend_pt")] - ascend_ms_dirs = [path for path in path_list if os.path.isdir(path) and path.endswith("ascend_ms")] - if ascend_ms_dirs and ascend_pt_dirs: - logger.error("Cannot analyze pytorch and mindspore meantime.") - return False - if not ascend_pt_dirs and not ascend_ms_dirs: - return False - if ascend_ms_dirs and not ascend_pt_dirs: - data_processor = MindsporeDataPreprocessor(ascend_ms_dirs) - elif ascend_pt_dirs and not ascend_ms_dirs: - data_processor = PytorchDataPreprocessor(ascend_pt_dirs) - - self.cluster_local_data_map[profiling_path] = data_processor.get_data_map() - - if not self.cluster_local_data_map or not self.cluster_local_data_map.get(profiling_path): - return False - - self.default_rank_id = list(self.cluster_local_data_map[profiling_path].keys())[0] - - return len(self.cluster_local_data_map[profiling_path]) >= self.CLUSTER_RANK_THRESHOLD - - def _get_profiling_path_by_rank(self, profiling_path, rank_id=None): - - if not profiling_path: - return profiling_path - - return self._get_target_profiling_path_for_local(profiling_path, rank_id) - - def _get_target_profiling_path_for_local(self, profiling_path, rank_id): - rank_id_map = self.cluster_local_data_map.get(profiling_path, {}) - if rank_id is None or not rank_id_map: - return profiling_path - - if rank_id in rank_id_map: - return rank_id_map.get(rank_id) - - local_first_rank_id = sorted(list(map(int, rank_id_map.keys())))[0] - logger.warning("Target rank id %s does not exist in local profiling data %s, use rank %s for analysis", - rank_id, 
profiling_path, local_first_rank_id) - return rank_id_map.get(local_first_rank_id) - - def _update_analysis_process_resp(self, pid, resp, **kwargs): - if kwargs: - resp.update(kwargs) - self.analysis_process_resp[pid] = resp - - def _get_analysis_finished_resp(self, pid, resp): - advisor_output_file_prefix = f"mstt_advisor_{Timer().strftime}" - html_path = os.path.join(Config().work_path, f"{advisor_output_file_prefix}.html") - xlsx_path = os.path.join(Config().work_path, "log", f"{advisor_output_file_prefix}.xlsx") - if os.path.exists(html_path) and os.path.exists(xlsx_path): - result_files = {"html": html_path, "xlsx": xlsx_path} - self._update_analysis_process_resp(pid, resp, status_code=AsyncAnalysisStatus.NON_FAILED_STATUS_CODE, - status=AsyncAnalysisStatus.SUCCESS, result_files=result_files) - else: - self._update_analysis_process_resp(pid, resp, status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE, - status=AsyncAnalysisStatus.FAILED, - error_msg="No optimization suggestions, please check your input path.") - - def _stage_computation_analysis(self, profiling_path, stage_step_rank, job_list): - # 对不同pp stage取min max进行分析 - logger.info("Steps and ranks to be analyzed of different pipeline parallel stages are %s", - json.dumps(stage_step_rank)) - - stages_profiling_path = [] - for stage, step_rank_info in stage_step_rank.items(): - rank_id = step_rank_info.get("maximum", {}).get("rank_id") - step = step_rank_info.get("maximum", {}).get("step") - benchmark_rank_id = step_rank_info.get("minimum", {}).get("rank_id") - benchmark_step = step_rank_info.get("minimum", {}).get("step") - - info_msg = f"For {stage}, slow rank is {rank_id}" - if step: - info_msg += f", step is {step}" - logger.info(info_msg) - - stages_profiling_path.append( - dict( - stage=stage, rank=rank_id, step=step, benchmark_rank=benchmark_rank_id, - benchmark_step=benchmark_step, - profiling_path=self._get_profiling_path_by_rank(profiling_path, rank_id), - benchmark_profiling_path=self._get_profiling_path_by_rank(profiling_path, benchmark_rank_id), - compare_mode=Constant.KERNEL_COMPARE, - step_duration=self.slow_rank_analyzer.get_step_duration(rank_id, step) - ) - ) - Interface.add_analyzer(Interface.COMPUTATION, SupportedScopes.STAGE_COMPUTE, PPStageComputationAnalyzer) - compute_analysis_kwargs = {"stages_profiling_path": stages_profiling_path, "profiling_path": profiling_path} - - job_list.append((Interface.COMPUTATION, SupportedScopes.STAGE_COMPUTE, Interface(**compute_analysis_kwargs), - compute_analysis_kwargs)) - if not self.kwargs.get("benchmark_profiling_path"): - logger.info("Enable computation comparison of fast and slow rank/step in different pp stages") - job_list += self._profiling_comparison(stages_profiling_path) - return job_list - - def _global_computation_analysis(self, profiling_path, global_step_rank, job_list): - # 不区分stage,对所有卡取Min max进行分析 - logger.info("Without pipeline parallel stage, steps and ranks to be analyzed are %s", - json.dumps(global_step_rank)) - slow_rank_id = global_step_rank.get("maximum", {}).get("rank_id") - if slow_rank_id is not None: - info_msg = f"Maximum computation time for rank {slow_rank_id}" - else: - slow_rank_id = self.default_rank_id - info_msg = f"No slow rank with computation time, analysis for default rank {slow_rank_id}" - slow_step = global_step_rank.get("maximum", {}).get("step") - # 如果没有标杆profiling数据的rank id,说明没有快慢卡问题,直接对默认rank id进行分析,因此这里取值为None - fast_rank_id = global_step_rank.get("minimum", {}).get("rank_id") - fast_step = global_step_rank.get("minimum", 
{}).get("step") - - if slow_step is not None: - info_msg += f" and step {slow_step}, " - if fast_rank_id is not None: - info_msg += f"minimum computation time for rank {fast_rank_id}" - if fast_step is not None: - info_msg += f" and step {fast_step}" - logger.info(info_msg) - - kwargs = dict(profiling_path=self._get_profiling_path_by_rank(profiling_path, slow_rank_id), - benchmark_profiling_path=self._get_profiling_path_by_rank(profiling_path, fast_rank_id), - step=slow_step, benchmark_step=fast_step, rank=slow_rank_id, benchmark_rank=fast_rank_id, - compare_mode=Constant.KERNEL_COMPARE, - step_duration=self.slow_rank_analyzer.get_step_duration(slow_rank_id, slow_step)) - - job_list += self.computation_analysis(**kwargs) - - rank_id_valid = slow_rank_id is not None and fast_rank_id is not None and fast_rank_id != slow_rank_id - if not self.kwargs.get("benchmark_profiling_path") and rank_id_valid: - # 当用户指定benchmark profiling path时,不进行目标集群profiling的内部快慢卡对比 - logger.info("Enable computation comparison of fast and slow rank/step") - job_list += self._profiling_comparison([kwargs]) - return job_list +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import copy +import logging +import json +import sys +import os +import platform +import multiprocessing as mp +from multiprocessing import Manager +from pathlib import Path + +import psutil + +from msprof_analyze.prof_common.additional_args_manager import AdditionalArgsManager +from msprof_analyze.advisor.analyzer.cluster.slow_rank_analyzer import SlowRankAnalyzer +from msprof_analyze.advisor.analyzer.cluster.slow_link_analyzer import SlowLinkAnalyzer +from msprof_analyze.advisor.analyzer.computation.pp_stage_computation_analyzer import PPStageComputationAnalyzer +from msprof_analyze.advisor.analyzer.overall.overall_summary_analyzer import OverallSummaryAnalyzer +from msprof_analyze.advisor.config.config import Config +from msprof_analyze.advisor.common.analyzer_scopes import SupportedScopes +from msprof_analyze.advisor.common.async_analysis_status import AsyncAnalysisStatus +from msprof_analyze.advisor.common.enum_params_parser import EnumParamsParser +from msprof_analyze.advisor.utils.utils import Timer, safe_index_value, safe_division, safe_index, convert_to_int +from msprof_analyze.advisor.interface.interface import Interface +from msprof_analyze.cluster_analyse.cluster_data_preprocess.pytorch_data_preprocessor import PytorchDataPreprocessor +from msprof_analyze.cluster_analyse.cluster_data_preprocess.mindspore_data_preprocessor import MindsporeDataPreprocessor +from msprof_analyze.prof_common.path_manager import PathManager +from msprof_analyze.prof_common.constant import Constant + +# 以spawn模式启动多进程,避免fork主进程资源。如果主进程逻辑较为复杂,fork可能会导致异常。 +mp.set_start_method("spawn", force=True) +logger = logging.getLogger() + + +class AsyncParams: + """处理用户异步请求的输入参数,包括cli arguments和环境变量两类参数.""" + user_valid_arguments = {} + user_valid_envs = {} + user_non_enum_params = {} + 
user_invalid_values = [] + user_total_params = {} + + @staticmethod + def parse_async_list_params(key, value, option_values, key_type, value_type): + if isinstance(value, list): + value_list = value + else: + value_list = [_.strip(" ") for _ in str(value).split(",")] + + if sorted(value_list) not in [sorted(option) for option in option_values]: + AsyncParams.user_invalid_values.append( + {"key": key, "invalid value": value, "optional values": option_values, + "required value type": value_type}) + return + if key_type == EnumParamsParser.ENVS: + AsyncParams.user_valid_envs[key.upper()] = ",".join(value_list) + elif key_type == EnumParamsParser.ARGUMENTS: + AsyncParams.user_valid_arguments[key] = value_list + + @staticmethod + def parse_async_int_params(key, value, option_values, key_type, value_type): + if convert_to_int(value) not in option_values: + AsyncParams.user_invalid_values.append( + {"key": key, "invalid value": value, "optional values": option_values, + "required value type": value_type}) + return + + if key_type == EnumParamsParser.ENVS: + AsyncParams.user_valid_envs[key.upper()] = str(convert_to_int(value)) + elif key_type == EnumParamsParser.ARGUMENTS: + AsyncParams.user_valid_arguments[key] = convert_to_int(value) + + @staticmethod + def parse_async_str_params(key, value, option_values, key_type, value_type): + if str(value) not in option_values: + AsyncParams.user_invalid_values.append( + {"key": key, "invalid value": value, "optional values": option_values, + "required value type": value_type}) + return + if key_type == EnumParamsParser.ENVS: + AsyncParams.user_valid_envs[key.upper()] = str(value) + elif key_type == EnumParamsParser.ARGUMENTS: + AsyncParams.user_valid_arguments[key] = str(value) + + @staticmethod + def parse_async_boolean_params(key, value, option_values, key_type, value_type): + + if str(value).lower() not in ["true", "false"]: + AsyncParams.user_invalid_values.append( + {"key": key, "invalid value": value, "optional values": option_values, + "required value type": value_type}) + return + + if key_type == EnumParamsParser.ENVS: + AsyncParams.user_valid_envs[key.upper()] = str(value) + elif key_type == EnumParamsParser.ARGUMENTS: + AsyncParams.user_valid_arguments[key] = str(value).lower() == "true" + + @staticmethod + def parse_params(user_async_params): + params_parser = EnumParamsParser() + valid_env_keys = [key.lower() for key in params_parser.get_envs_keys()] + valid_arg_keys = [key.lower() for key in params_parser.get_arguments_keys()] + + for key, value in user_async_params.items(): + key = key.lower() + if key not in valid_env_keys + valid_arg_keys: + AsyncParams.user_non_enum_params[key] = value + continue + + if key in valid_env_keys: + # 环境变量均大写,异步调用入参到analyzer controller时支持用户使用小写配置环境变量 + option_values = params_parser.get_options(key.upper()) + value_type = params_parser.get_value_type(key.upper()) + key_type = params_parser.ENVS + else: + option_values = params_parser.get_options(key) + value_type = params_parser.get_value_type(key) + key_type = params_parser.ARGUMENTS + + if hasattr(AsyncParams, f"parse_async_{value_type}_params"): + getattr(AsyncParams, f"parse_async_{value_type}_params")(key, value, option_values, key_type, + value_type) + + AsyncParams.user_total_params["async_analysis_env"] = AsyncParams.user_valid_envs + AsyncParams.user_total_params.update(AsyncParams.user_valid_arguments) + AsyncParams.user_total_params.update(AsyncParams.user_non_enum_params) + + +class AnalyzerController: + CLUSTER_RANK_THRESHOLD = 2 + 
SDMA_SUPPORT_SCOPES = [SupportedScopes.BANDWIDTH_CONTENTION_DETECTION, SupportedScopes.BYTE_ALIGNMENT_DETECTION] + RDMA_SUPPORT_SCOPES = [SupportedScopes.PACKET] + COMMUNICATION_MAPPING = { + SlowLinkAnalyzer.SDMA: SDMA_SUPPORT_SCOPES, + SlowLinkAnalyzer.RDMA: RDMA_SUPPORT_SCOPES + } + + def __init__(self): + self.dimensions = Interface.all_dimension + self.kwargs = {} + self.args_manager = None + self.slow_rank_analyzer = None + self.slow_link_analyzer = None + self.cluster_local_data_map = {} + self.default_rank_id = None + self.rank_id_map = {} + self._is_cluster = False + self.analysis_process_resp = Manager().dict() + + @staticmethod + def _set_analysis_process_priority(pid): + # 将分析进程优先级设置为最低,避免因为分析进程阻塞其他任务进程,unix上19表示最低优先级 + unix_process_lowest_priority = 19 + windows_platform = "windows" + linux_platform = "linux" + p = psutil.Process(pid) + if platform.system().lower() == windows_platform: + p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS) + elif platform.system().lower() == linux_platform: + p.nice(unix_process_lowest_priority) + + @staticmethod + def _check_profiling_path_valid(profiling_path): + PathManager.input_path_common_check(profiling_path) + + if not Path(profiling_path).exists(): + logger.error("Profiling path is not existed. Invalid profiling path: %s", profiling_path) + return False + + return True + + + @staticmethod + def _get_step_rank_for_cluster_statistic_diff(target_cluster_statistic_data, benchmark_cluster_statistic_data, + headers, dimension, get_max=False): + if dimension not in headers: + logger.error("Error dimension %s for cluster statistics data, optionals are %s.", dimension, headers) + return None, None, None + + dimension_index = safe_index_value(headers, dimension) + diff_record = [] + # 对比目标profiling和benchmark profiling 每张卡的计算和下发和带宽,取计算、下发、带宽差异最大的卡进行下一步分析 + for target_row_data, benchmark_row_data in zip(target_cluster_statistic_data, benchmark_cluster_statistic_data): + target_data = safe_index(target_row_data, dimension_index) + benchmark_data = safe_index(benchmark_row_data, dimension_index) + + if not isinstance(target_data, (int, float)) or not isinstance(benchmark_data, (int, float)): + continue + diff_record.append(target_data - benchmark_data) + + if SlowRankAnalyzer.compute_max_gap_ratio(diff_record, safe_division(sum(diff_record), len( + diff_record))) < SlowRankAnalyzer.RATIO_THRESHOLD: + return None, None, None + + value = max(diff_record) if get_max else min(diff_record) + value_index = safe_index_value(diff_record, value) + + step_value_index = safe_index_value(headers, "step") + rank_id_value_index = safe_index_value(headers, "rank_id") + + step = safe_index(safe_index(target_cluster_statistic_data, value_index, []), step_value_index) + benchmark_step = safe_index(safe_index(benchmark_cluster_statistic_data, value_index, []), step_value_index) + target_rank_id = safe_index(safe_index(target_cluster_statistic_data, value_index, []), rank_id_value_index) + benchmark_rank_id = safe_index(safe_index(benchmark_cluster_statistic_data, value_index, []), + rank_id_value_index) + + if target_rank_id != benchmark_rank_id: + logger.error( + "Rank ids of target profiling must keep the same as benchmark profiling, skip cluster comparison") + return None, None, None + + return step, benchmark_step, target_rank_id + + @staticmethod + def _init_async_analysis_env(kwargs): + envs = kwargs.get("async_analysis_env", {}) + for key, value in envs.items(): + os.environ[key] = value + + def format_async_analysis_params(self, pid, async_resp, dimensions, kwargs): + 
+    def format_async_analysis_params(self, pid, async_resp, dimensions, kwargs):
+        AsyncParams.parse_params(kwargs)
+        dimensions = AsyncParams.user_total_params.get("analysis_dimensions") or dimensions
+
+        if AsyncParams.user_invalid_values:
+            error_msg = "Got invalid arguments as follows: \n "
+            for index, invalid_value in enumerate(AsyncParams.user_invalid_values):
+                error_msg += f"{index + 1}. Key '{invalid_value.get('key')}', " \
+                             f"invalid value '{invalid_value.get('invalid value')}', " \
+                             f"optional valid values '{invalid_value.get('optional values')}', " \
+                             f"required value type '{invalid_value.get('required value type')}'.\n "
+            self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg,
+                                               status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE,
+                                               status=AsyncAnalysisStatus.FAILED)
+            raise ValueError(error_msg)
+
+        logger.warning("User parameters for async analysis are as follows:\n %s",
+                       json.dumps(AsyncParams.user_total_params, indent=4))
+        return dimensions, AsyncParams.user_total_params
+
+    def do_analysis(self, dimensions, **kwargs):
+        pid = os.getpid()
+        resp = {"id": pid}
+        self.args_manager = AdditionalArgsManager()
+        self.args_manager.init(kwargs)
+        output_path = kwargs.get("output_path")
+
+        AnalyzerController._set_analysis_process_priority(pid)
+        if kwargs.get("is_async_analysis"):
+            del kwargs["is_async_analysis"]
+            dimensions, kwargs = self.format_async_analysis_params(pid, resp, dimensions, kwargs)
+            AnalyzerController._init_async_analysis_env(kwargs)
+
+        try:
+            if output_path:
+                PathManager.check_input_directory_path(output_path)
+                if os.path.exists(output_path):
+                    PathManager.check_path_owner_consistent([output_path])
+                else:
+                    PathManager.make_dir_safety(output_path)
+
+                Config().set_config("_work_path", output_path)
+                Config().set_log_path(f"mstt_advisor_{Timer().strftime}.xlsx")
+
+            self._do_analysis(dimensions, pid=pid, async_resp=resp, **kwargs)
+        except Exception as e:
+            self._update_analysis_process_resp(pid, resp, status_code=AsyncAnalysisStatus.INNER_ERROR_STATUS_CODE,
+                                               status=AsyncAnalysisStatus.FAILED, error_msg=str(e))
+            logger.error(e)
+            raise RuntimeError("Do analysis error.") from e
+
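+    # A minimal synchronous usage sketch (the paths are placeholders, not part of the code above):
+    #   controller = AnalyzerController()
+    #   controller.do_analysis(Interface.all_dimension,
+    #                          profiling_path="/path/to/profiling_data",
+    #                          output_path="/path/to/advisor_output")
+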
+    def async_do_analysis(self, dimensions, **kwargs):
+        """Deploy an online service to start an async analysis job; wrap this api with flask, tornado or a
+        similar framework, then the analysis status can be queried through a restful api.
+        You can view the file 'profiler/msprof_analyze/advisor/config/enum_parameters.yaml' to obtain detailed
+        information for all the args listed below.
+
+        Args:
+            dimensions: analysis dimension, normally set as Interface.all_dimension, supports analysis of
+                specific dimensions such as ['computation'] or ['computation', 'schedule']
+            cann_version: cann version of your runtime, impacts the analysis of affinity api and AICPU operators
+            profiling_type: profiling type of your runtime
+            profiling_version: profiling version of your runtime, impacts the analysis of affinity api
+            analysis_dimensions: can overwrite dimensions.
+            advisor_analyze_processes: number of processes to use when the training's pipeline parallel (pp) > 1,
+                can reduce the time of analysis.
+            disable_profiling_comparison: disable comparison of operators (including npu computation operators and
+                cpu torch aten operators), can reduce the time of analysis.
+            disable_affinity_api: disable analysis of affinity api, normally set to 'True' when your training job
+                has been running on NPU for a long time and suddenly shows performance degradation.
+            output_path: analysis output path (including html and xlsx).
+
+        Example:
+            >>> # initialize a global analyzer controller
+            >>> analyzer_controller = AnalyzerController()
+            >>> analysis_kwargs = dict(advisor_analyze_processes=2, disable_profiling_comparison=True)
+            >>>
+            >>> async_analysis_process = analyzer_controller.async_do_analysis(
+            >>>     Interface.all_dimension, **analysis_kwargs)
+            >>>
+            >>> # query the job status every second
+            >>> while True:
+            >>>     response = analyzer_controller.get_response_by_pid(async_analysis_process.pid)
+            >>>     print(f'analysis response is {response}')
+            >>>     if response.get("status") in ["success", "failed"]:
+            >>>         async_analysis_process.join()
+            >>>         break
+            >>>     time.sleep(1)
+        """
+        kwargs["is_async_analysis"] = True
+
+        async_analysis_process = mp.Process(target=self.do_analysis, args=(dimensions,), kwargs=kwargs,
+                                            name="Async advisor performance analysis")
+        async_analysis_process.start()
+        self._update_analysis_process_resp(async_analysis_process.pid, {"id": async_analysis_process.pid},
+                                           status_code=AsyncAnalysisStatus.NON_FAILED_STATUS_CODE,
+                                           status=AsyncAnalysisStatus.ANALYZING)
+        return async_analysis_process
+
+    def get_response_by_pid(self, pid):
+        def _is_pid_exists(pid):
+            try:
+                psutil.Process(pid)
+                return True
+            except psutil.NoSuchProcess:
+                return False
+
+        pid_not_exist_response = dict(id=pid, status_code=AsyncAnalysisStatus.NOT_FOUND_STATUS_CODE,
+                                      status=AsyncAnalysisStatus.FAILED,
+                                      error_msg="The advisor task id does not exist")
+        if pid not in self.analysis_process_resp:
+            return pid_not_exist_response
+
+        response = self.analysis_process_resp.get(pid)
+        if response.get("status") not in [AsyncAnalysisStatus.FAILED,
+                                          AsyncAnalysisStatus.SUCCESS] and not _is_pid_exists(pid):
+            return pid_not_exist_response
+        return response
+
+    def single_rank_analysis(self, profiling_path, benchmark_profiling_path=None):
+        job_list = []
+
+        profiling_path = self._get_profiling_path_by_rank(profiling_path)
+        benchmark_profiling_path = self._get_profiling_path_by_rank(benchmark_profiling_path)
+
+        # No cluster analysis in the single-rank scenario
+        for dim in [Interface.CLUSTER]:
+            if dim in self.dimensions:
+                self.dimensions.remove(dim)
+
+        for dimension in self.dimensions:
+            dimension_analysis_func_name = f"{dimension}_analysis"
+            if not hasattr(self, dimension_analysis_func_name):
+                continue
+            logger.info("Start %s analysis", dimension)
+            job_list += getattr(self, dimension_analysis_func_name)(profiling_path)
+
+        if benchmark_profiling_path:
+            # kernel/api comparison
+            compare_profiling_list = [
+                dict(profiling_path=profiling_path, benchmark_profiling_path=benchmark_profiling_path,
+                     compare_mode=Constant.KERNEL_COMPARE),
+                dict(profiling_path=profiling_path, benchmark_profiling_path=benchmark_profiling_path,
+                     compare_mode=Constant.API_COMPARE)
+            ]
+
+            job_list += self._profiling_comparison(compare_profiling_list)
+        else:
+            self.overall(profiling_path)
+
+        return job_list
+
+    def do_cluster_analysis(self, profiling_path, benchmark_profiling_path=None):
+        job_list = []
+
+        # Single-cluster profiling analysis: dispatch, communication, computation and memory
+        for dimension in self.dimensions:
+            dimension_analysis_func_name = f"cluster_{dimension}_analysis"
+            if not hasattr(self, dimension_analysis_func_name):
+                continue
+            logger.info("Start cluster %s analysis", dimension)
+            job_list += getattr(self, dimension_analysis_func_name)(profiling_path)
+
+        self.overall(profiling_path)
+
+        if benchmark_profiling_path:
+            # Comparison analysis between two clusters' profiling data
+            job_list += self._cluster_profiling_comparison(profiling_path, benchmark_profiling_path)
+        return job_list
+
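+    # Note: per-dimension entry points are resolved by name, so dimension "schedule" dispatches to
+    # schedule_analysis() in the single-rank path and to cluster_schedule_analysis() in the cluster
+    # path; dimensions without a matching method are silently skipped.
+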
+    def overall(self, profiling_path):
+        from msprof_analyze.advisor.analyzer.overall.environment_variable_analyzer import EnvironmentVariableAnalyzer
+        env_analyzer = EnvironmentVariableAnalyzer(profiling_path)
+        env_analyzer.optimize()
+
+        if self._is_cluster:
+            self.slow_rank_analyzer.optimize(template_key=Interface.OVERALL)
+            self.slow_link_analyzer.optimize(template_key=Interface.OVERALL)
+        else:
+            overall_analyzer = OverallSummaryAnalyzer(profiling_path)
+            overall_analyzer.optimize()
+
+    def schedule_analysis(self, profiling_path, benchmark_profiling_path=None, step=None, benchmark_step=None,
+                          **kwargs):
+        # Dispatch (schedule) analysis for any single rank
+
+        input_kwargs = copy.deepcopy(self.kwargs)
+        job_list = []
+
+        input_kwargs["profiling_path"] = profiling_path
+        input_kwargs["benchmark_profiling_path"] = benchmark_profiling_path
+        input_kwargs["step"] = step
+        input_kwargs["benchmark_step"] = benchmark_step
+        input_kwargs["rank"] = kwargs.get("rank")
+        input_kwargs["step_duration"] = kwargs.get("step_duration")
+
+        for dimension in [Interface.SCHEDULE]:
+            for scope in Interface.get_scope(dimension):
+                interface = Interface(**input_kwargs)
+                job_list.append((dimension, scope, interface, input_kwargs))
+        return job_list
+
+    def computation_analysis(self, profiling_path, benchmark_profiling_path=None, step=None,
+                             benchmark_step=None, stage=None, **kwargs):
+        # Computation analysis for any single rank
+
+        input_kwargs = copy.deepcopy(self.kwargs)
+        input_kwargs["profiling_path"] = profiling_path
+        input_kwargs["benchmark_profiling_path"] = benchmark_profiling_path
+        input_kwargs["step"] = step
+        input_kwargs["benchmark_step"] = benchmark_step
+        input_kwargs["stage"] = stage
+        input_kwargs["rank"] = kwargs.get("rank")
+        input_kwargs["step_duration"] = kwargs.get("step_duration")
+        job_list = []
+
+        for dimension in [Interface.COMPUTATION]:
+            for scope in Interface.get_scope(dimension):
+                if scope == SupportedScopes.STAGE_COMPUTE:
+                    continue
+                interface = Interface(**input_kwargs)
+                job_list.append((dimension, scope, interface, input_kwargs))
+        return job_list
+
+    def memory_analysis(self, profiling_path, benchmark_profiling_path=None, step=None, benchmark_step=None, **kwargs):
+        # Memory analysis for any single rank
+
+        input_kwargs = copy.deepcopy(self.kwargs)
+        job_list = []
+
+        input_kwargs["profiling_path"] = profiling_path
+        input_kwargs["benchmark_profiling_path"] = benchmark_profiling_path
+        input_kwargs["step"] = step
+        input_kwargs["benchmark_step"] = benchmark_step
+        input_kwargs["rank"] = kwargs.get("rank")
+        input_kwargs["step_duration"] = kwargs.get("step_duration")
+
+        for dimension in [Interface.MEMORY]:
+            for scope in Interface.get_scope(dimension):
+                interface = Interface(**input_kwargs)
+                job_list.append((dimension, scope, interface, input_kwargs))
+        return job_list
+
+    def communication_analysis(self, profiling_path, benchmark_profiling_path=None, **kwargs):
+        job_list = []
+        supported_trans_type = [SlowLinkAnalyzer.SDMA, SlowLinkAnalyzer.RDMA]
+        step = kwargs.get("step", None)
+        benchmark_step = kwargs.get("benchmark_step", None)
+        bandwidth_type = kwargs.get("bandwidth_type", None)
+        scope = kwargs.get("scope", None)
+        if bandwidth_type is not None and bandwidth_type not in supported_trans_type:
+            logger.error("Invalid bandwidth type %s, optional values are %s", bandwidth_type, supported_trans_type)
+            return job_list
+
+        job_list += self._communication_analysis(profiling_path=profiling_path,
+                                                 benchmark_profiling_path=benchmark_profiling_path,
+                                                 step=step, benchmark_step=benchmark_step,
+                                                 scope=scope, bandwidth_type=bandwidth_type)
+
+        return job_list
+
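+    # A usage sketch (arguments are illustrative): restricting communication analysis to SDMA-related
+    # scopes for one rank's profiling data:
+    #   job_list = controller.communication_analysis("/path/to/rank_profiling",
+    #                                                bandwidth_type=SlowLinkAnalyzer.SDMA)
+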
+    def cluster_schedule_analysis(self, profiling_path):
+        # Dispatch (schedule) analysis of the target cluster's profiling data; does not include the
+        # comparison analysis between two clusters' profiling data
+
+        job_list = []
+        global_step_rank = self.slow_rank_analyzer.get_global_step_rank(SlowRankAnalyzer.FREE)
+
+        info_msg = "For cluster schedule analysis, "
+        slow_rank_id = global_step_rank.get("maximum", {}).get("rank_id")
+        if slow_rank_id is not None:
+            info_msg += f"maximum free for rank {slow_rank_id}"
+        else:
+            slow_rank_id = self.default_rank_id
+            info_msg += f"no slow rank with free time, analysis for default rank {slow_rank_id}"
+
+        fast_rank_id = global_step_rank.get("minimum", {}).get("rank_id")
+
+        slow_step = global_step_rank.get("maximum", {}).get("step")
+        fast_step = global_step_rank.get("minimum", {}).get("step")
+
+        if slow_step is not None:
+            info_msg += f" and step {slow_step}"
+        logger.info(info_msg)
+
+        kwargs = dict(profiling_path=self._get_profiling_path_by_rank(profiling_path, slow_rank_id),
+                      benchmark_profiling_path=self._get_profiling_path_by_rank(profiling_path, fast_rank_id),
+                      step=slow_step, benchmark_step=fast_step,
+                      rank=slow_rank_id, benchmark_rank=fast_rank_id,
+                      compare_mode=Constant.API_COMPARE,
+                      step_duration=self.slow_rank_analyzer.get_step_duration(slow_rank_id, slow_step))
+
+        job_list += self.schedule_analysis(**kwargs)
+
+        rank_id_valid = slow_rank_id is not None and fast_rank_id is not None and fast_rank_id != slow_rank_id
+        if not self.kwargs.get("benchmark_profiling_path") and rank_id_valid:
+            # When the user specifies a benchmark profiling path, skip the internal fast/slow rank
+            # comparison within the target cluster's profiling data
+            logger.info("Enable schedule comparison of fast and slow rank/step")
+            job_list += self._profiling_comparison([kwargs])
+        return job_list
+
+    def cluster_communication_analysis(self, profiling_path):
+        job_list = []
+
+        for dimension in [Interface.COMMUNICATION]:
+            for scope in Interface.get_scope(dimension):
+                analyzer_class = Interface.get_analyzer(dimension, scope)
+                if hasattr(analyzer_class, "requires_cluster_dataset") and getattr(analyzer_class,
+                                                                                   "requires_cluster_dataset"):
+                    # If the analyzer does not depend on a dataset, or depends on ClusterDataset, there is
+                    # no need to pick a specific rank to analyze based on bandwidth
+                    kwargs = copy.deepcopy(self.kwargs)
+                    kwargs["profiling_path"] = profiling_path
+                    interface = Interface(**kwargs)
+                    job_list.append((dimension, scope, interface, kwargs))
+                else:
+                    # Non-ClusterDataset scenario: a specific rank must be analyzed based on bandwidth
+                    for bandwidth_type in [SlowLinkAnalyzer.SDMA, SlowLinkAnalyzer.RDMA]:
+                        global_step_rank = self.slow_link_analyzer.get_global_step_rank(bandwidth_type)
+                        # Analyze the rank with the minimum bandwidth
+                        target_rank_id = global_step_rank.get("minimum", {}).get("rank_id")
+                        if target_rank_id is None:
+                            target_rank_id = self.default_rank_id
+                        step = global_step_rank.get("minimum", {}).get("step")
+                        analysis_profiling_path = self._get_profiling_path_by_rank(profiling_path, target_rank_id)
+
+                        info_msg = f"Minimum {bandwidth_type} bandwidth for rank {target_rank_id} "
+                        if step:
+                            info_msg += f"and step {step}"
+                        logger.info(info_msg)
+
+                        job_list += self.communication_analysis(analysis_profiling_path, step=step,
+                                                                bandwidth_type=bandwidth_type, scope=scope)
+
+        return job_list
+
+    def cluster_computation_analysis(self, profiling_path):
+        # Computation analysis of the target cluster's profiling data; does not include the comparison
+        # analysis between two clusters' profiling data. If there are pp stages, run computation
+        # analysis per stage
+
+        job_list = []
+        global_step_rank = self.slow_rank_analyzer.get_global_step_rank(SlowRankAnalyzer.COMPUTE)
+        stage_step_rank = self.slow_rank_analyzer.get_stage_step_rank(SlowRankAnalyzer.COMPUTE)
+
+        if stage_step_rank:
+            job_list = self._stage_computation_analysis(profiling_path, stage_step_rank, job_list)
+        else:
+            job_list = self._global_computation_analysis(profiling_path, global_step_rank, job_list)
+        return job_list
+
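+    # Note: when pipeline parallel stages are detected, _stage_computation_analysis() picks the
+    # slowest/fastest rank per stage; otherwise _global_computation_analysis() picks the global
+    # min/max ranks. Both helpers are defined further below.
+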
+    def cluster_memory_analysis(self, profiling_path):
+        # Memory analysis of the target cluster's profiling data. The two operators currently recognized
+        # by the memory check both lead to large free time, so the rank with the maximum FREE time is
+        # selected for analysis
+
+        job_list = []
+        global_step_rank = self.slow_rank_analyzer.get_global_step_rank(SlowRankAnalyzer.FREE)
+
+        info_msg = "For cluster memory analysis, "
+        slow_rank_id = global_step_rank.get("maximum", {}).get("rank_id")
+        if slow_rank_id is not None:
+            info_msg += f"maximum free for rank {slow_rank_id}"
+        else:
+            slow_rank_id = self.default_rank_id
+            info_msg += f"no slow rank with free time, analysis for default rank {slow_rank_id}"
+
+        slow_step = global_step_rank.get("maximum", {}).get("step")
+        if slow_step is not None:
+            info_msg += f" and step {slow_step}"
+        logger.info(info_msg)
+
+        analysis_profiling_path = self._get_profiling_path_by_rank(profiling_path, slow_rank_id)
+        step_duration = self.slow_rank_analyzer.get_step_duration(slow_rank_id, slow_step)
+        job_list += self.memory_analysis(analysis_profiling_path, step=slow_step, rank=slow_rank_id,
+                                         step_duration=step_duration)
+        return job_list
+
+    def _do_analysis(self, dimensions, pid=0, async_resp=None, **kwargs):
+        self.dimensions = dimensions
+        self.kwargs = kwargs
+        result_list = []
+        profiling_path = PathManager.get_realpath(self.kwargs.get("profiling_path"))
+        benchmark_profiling_path = self.kwargs.get("benchmark_profiling_path")
+        PathManager.check_path_owner_consistent([profiling_path])
+        if benchmark_profiling_path:
+            benchmark_profiling_path = PathManager.get_realpath(benchmark_profiling_path)
+            PathManager.check_path_owner_consistent([benchmark_profiling_path])
+
+        if not self._check_profiling_path_valid(profiling_path):
+            error_msg = f"Got invalid argument '-d/--profiling_path' {profiling_path}, skip analysis"
+            self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg,
+                                               status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE,
+                                               status=AsyncAnalysisStatus.FAILED)
+            logger.error(error_msg)
+            return
+
+        if benchmark_profiling_path and not self._check_profiling_path_valid(benchmark_profiling_path):
+            error_msg = (f"Got invalid argument '-bp/--benchmark_profiling_path' {benchmark_profiling_path}, "
+                         f"skip analysis")
+            self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg,
+                                               status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE,
+                                               status=AsyncAnalysisStatus.FAILED)
+            logger.error(error_msg)
+            return
+
+        self._is_cluster = self._is_cluster_profiling(profiling_path)
+        if benchmark_profiling_path:
+            # Build the rank-to-path map of the benchmark profiling so profiling paths can be looked up
+            # by rank; the comparison cannot be done otherwise
+            is_benchmark_cluster = self._is_cluster_profiling(benchmark_profiling_path)
+            is_comparison_path_valid = (self._is_cluster and is_benchmark_cluster) or (
+                    not self._is_cluster and not is_benchmark_cluster)
+            if not is_comparison_path_valid:
+                error_msg = "Only support profiling comparison for '1 npu vs 1 gpu/npu' and 'multi npus vs multi npus'"
+                self._update_analysis_process_resp(pid, async_resp, error_msg=error_msg,
+                                                   status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE,
+                                                   status=AsyncAnalysisStatus.FAILED)
+                logger.error(error_msg)
+                return
+
+        if not self._is_cluster:
+            job_list = self.single_rank_analysis(profiling_path, benchmark_profiling_path)
+        else:
+            self.slow_rank_analyzer = SlowRankAnalyzer(profiling_path, output_path=self.kwargs.get("output_path"))
+            self.slow_link_analyzer = SlowLinkAnalyzer(profiling_path, output_path=self.kwargs.get("output_path"))
+            job_list = self.do_cluster_analysis(profiling_path, benchmark_profiling_path)
+
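+        # Jobs are executed in reverse submission order below; html is rendered only on the last
+        # iteration (which maps to the first submitted job), and afterwards the first result in
+        # original order that supports show() is displayed.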
+        for i, (dimension, scope, interface, kwargs) in enumerate(job_list[::-1]):
+            result_list.append(
+                interface.get_result(dimension, scope, render_html=i == len(job_list) - 1, output_dict=False,
+                                     **kwargs)
+            )
+
+        for result in result_list[::-1]:
+            if result and hasattr(result, "show"):
+                result.show()
+                break
+        self._get_analysis_finished_resp(pid, async_resp)
+
+    def _get_scopes(self, scope=None, bandwidth_type=SlowLinkAnalyzer.SDMA):
+        """
+        Args:
+            scope: a specific communication scope to analyze; if set, it is kept only when the given
+                bandwidth_type supports it
+            bandwidth_type: bandwidth type (SlowLinkAnalyzer.SDMA or SlowLinkAnalyzer.RDMA) that
+                determines the supported scopes
+        Returns:
+            list of scopes to analyze
+        """
+        scopes = []
+        if scope:
+            if scope in self.COMMUNICATION_MAPPING.get(bandwidth_type, self.SDMA_SUPPORT_SCOPES):
+                scopes.append(scope)
+            return scopes
+        for dimension in [Interface.COMMUNICATION]:
+            for scope_ in Interface.get_scope(dimension):
+                if scope_ in self.SDMA_SUPPORT_SCOPES or scope_ in self.RDMA_SUPPORT_SCOPES:
+                    scopes.append(scope_)
+        return scopes
+
+    def _communication_analysis(self, **child_kwargs):
+        kwargs = copy.deepcopy(self.kwargs)
+        job_list = []
+
+        kwargs["profiling_path"] = child_kwargs.get("profiling_path", "")
+        kwargs["benchmark_profiling_path"] = child_kwargs.get("benchmark_profiling_path", "")
+        kwargs["step"] = child_kwargs.get("step", -1)
+        kwargs["benchmark_step"] = child_kwargs.get("benchmark_step", -1)
+        bandwidth_type = child_kwargs.get("bandwidth_type", SlowLinkAnalyzer.SDMA)
+        scope = child_kwargs.get("scope", None)
+
+        for scope_ in self._get_scopes(scope, bandwidth_type):
+            interface = Interface(**kwargs)
+            job_list.append((Interface.COMMUNICATION, scope_, interface, kwargs))
+
+        return job_list
+
+    def _profiling_comparison(self, compare_profiling_list):
+        job_list = []
+        disable_profiling_comparison = os.getenv(Constant.DISABLE_PROFILING_COMPARISON)
+        if disable_profiling_comparison is not None and disable_profiling_comparison.lower() == "true":
+            logger.info(
+                "Skip time-consuming profiling comparison because env 'DISABLE_PROFILING_COMPARISON' is set")
+            return job_list
+
+        for index, _kwargs in enumerate(compare_profiling_list):
+            kwargs = copy.deepcopy(self.kwargs)
+            kwargs.update(_kwargs)
+            compare_profiling_list[index] = kwargs
+
+        compare_kwargs = {
+            "profiling_path": kwargs.get("profiling_path"),
+            "compare_profiling_list": compare_profiling_list,
+        }
+
+        interface = Interface(**compare_kwargs)
+        job_list.append((Interface.COMPARISON, SupportedScopes.COMPARISON, interface, compare_kwargs))
+
+        return job_list
+
+    def _cluster_profiling_comparison(self, profiling_path, benchmark_profiling_path):
+        # Compare the cluster profiling data along three dimensions: computation, dispatch and
+        # communication
+
+        job_list = []
+        benchmark_profiling_path = self._get_profiling_path_by_rank(benchmark_profiling_path)
+        benchmark_slow_rank_analyzer = SlowRankAnalyzer(benchmark_profiling_path)
+        benchmark_slow_link_analyzer = SlowLinkAnalyzer(benchmark_profiling_path)
+
+        # Computation and dispatch analysis
+        job_list += self._cluster_data_comparison(profiling_path,
+                                                  benchmark_profiling_path,
+                                                  self.slow_rank_analyzer,
+                                                  benchmark_slow_rank_analyzer,
+                                                  get_max=True)
+
+        # Communication analysis
+        job_list += self._cluster_data_comparison(profiling_path,
+                                                  benchmark_profiling_path,
+                                                  self.slow_link_analyzer,
+                                                  benchmark_slow_link_analyzer,
+                                                  get_max=False)
+        return job_list
+
+    def _cluster_data_comparison(self, profiling_path, benchmark_profiling_path, target_cluster_analyzer,
+                                 benchmark_cluster_analyzer, get_max=False):
+        # Compare the slow rank/slow link results row by row, then pick the rank and step with the
+        # largest difference for single-rank analysis
+        job_list = []
+
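+        # SlowRankAnalyzer rows drive kernel/api comparison on the compute and free dimensions, while
+        # SlowLinkAnalyzer rows drive SDMA/RDMA bandwidth comparison with no extra compare mode.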
+        if isinstance(target_cluster_analyzer, SlowRankAnalyzer):
+            comparison_dims = [SlowRankAnalyzer.COMPUTE, SlowRankAnalyzer.FREE]
+            comparison_modes = [Constant.KERNEL_COMPARE, Constant.API_COMPARE]
+        elif isinstance(target_cluster_analyzer, SlowLinkAnalyzer):
+            comparison_dims = [SlowLinkAnalyzer.SDMA_BANDWIDTH, SlowLinkAnalyzer.RDMA_BANDWIDTH]
+            comparison_modes = [None, None]
+        else:
+            return job_list
+
+        target_data = target_cluster_analyzer.format_datas.get("data", [])
+        benchmark_data = benchmark_cluster_analyzer.format_datas.get("data", [])
+        headers = benchmark_cluster_analyzer.format_datas.get("headers", [])
+
+        if len(target_data) != len(benchmark_data):
+            logger.warning(
+                "The number of rows (ranks x steps) of benchmark profiling does not equal that of target "
+                "profiling, skip cluster comparison.")
+            return job_list
+
+        compare_profiling_list = []
+        for dimension, compare_mode in zip(comparison_dims, comparison_modes):
+            step, benchmark_step, rank_id_for_comparison = AnalyzerController._get_step_rank_for_cluster_statistic_diff(
+                target_data,
+                benchmark_data,
+                headers,
+                dimension,
+                get_max=get_max
+            )
+
+            if rank_id_for_comparison is None:
+                # If rank id is None, the profiling path of the corresponding rank cannot be resolved,
+                # so the comparison cannot be done
+                continue
+
+            rank_profiling_path = self._get_profiling_path_by_rank(profiling_path, rank_id_for_comparison)
+            rank_benchmark_profiling_path = self._get_profiling_path_by_rank(
+                benchmark_profiling_path,
+                rank_id_for_comparison
+            )
+
+            compare_profiling_list.append(
+                dict(profiling_path=rank_profiling_path, benchmark_profiling_path=rank_benchmark_profiling_path,
+                     step=step, benchmark_step=benchmark_step,
+                     rank=rank_id_for_comparison, benchmark_rank=rank_id_for_comparison, compare_mode=compare_mode)
+            )
+
+        if not compare_profiling_list:
+            return job_list
+
+        job_list += self._profiling_comparison(compare_profiling_list)
+        return job_list
+
+    def _is_cluster_profiling(self, profiling_path):
+        if os.path.isfile(profiling_path):
+            return False
+        path_list = [os.path.join(profiling_path, dir_name) for dir_name in os.listdir(profiling_path)]
+        ascend_pt_dirs = [path for path in path_list if os.path.isdir(path) and path.endswith("ascend_pt")]
+        ascend_ms_dirs = [path for path in path_list if os.path.isdir(path) and path.endswith("ascend_ms")]
+        if ascend_ms_dirs and ascend_pt_dirs:
+            logger.error("Cannot analyze PyTorch and MindSpore profiling data at the same time.")
+            return False
+        if not ascend_pt_dirs and not ascend_ms_dirs:
+            return False
+        if ascend_ms_dirs and not ascend_pt_dirs:
+            data_processor = MindsporeDataPreprocessor(ascend_ms_dirs)
+        elif ascend_pt_dirs and not ascend_ms_dirs:
+            data_processor = PytorchDataPreprocessor(ascend_pt_dirs)
+
+        self.cluster_local_data_map[profiling_path] = data_processor.get_data_map()
+
+        if not self.cluster_local_data_map or not self.cluster_local_data_map.get(profiling_path):
+            return False
+
+        self.default_rank_id = list(self.cluster_local_data_map[profiling_path].keys())[0]
+
+        return len(self.cluster_local_data_map[profiling_path]) >= self.CLUSTER_RANK_THRESHOLD
+
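+    # Note: a path is treated as cluster profiling only when it holds at least CLUSTER_RANK_THRESHOLD
+    # rank directories (suffixed "ascend_pt" or "ascend_ms") from a single framework; mixed PyTorch
+    # and MindSpore data is rejected above.
+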
+    def _get_profiling_path_by_rank(self, profiling_path, rank_id=None):
+        if not profiling_path:
+            return profiling_path
+
+        return self._get_target_profiling_path_for_local(profiling_path, rank_id)
+
+    def _get_target_profiling_path_for_local(self, profiling_path, rank_id):
+        rank_id_map = self.cluster_local_data_map.get(profiling_path, {})
+        if rank_id is None or not rank_id_map:
+            return profiling_path
+
+        if rank_id in rank_id_map:
+            return rank_id_map.get(rank_id)
+
+        local_first_rank_id = sorted(list(map(int, rank_id_map.keys())))[0]
+        logger.warning("Target rank id %s does not exist in local profiling data %s, use rank %s for analysis",
+                       rank_id, profiling_path, local_first_rank_id)
+        return rank_id_map.get(local_first_rank_id)
+
+    def _update_analysis_process_resp(self, pid, resp, **kwargs):
+        if kwargs:
+            resp.update(kwargs)
+        self.analysis_process_resp[pid] = resp
+
+    def _get_analysis_finished_resp(self, pid, resp):
+        advisor_output_file_prefix = f"mstt_advisor_{Timer().strftime}"
+        html_path = os.path.join(Config().work_path, f"{advisor_output_file_prefix}.html")
+        xlsx_path = os.path.join(Config().work_path, "log", f"{advisor_output_file_prefix}.xlsx")
+        if os.path.exists(html_path) and os.path.exists(xlsx_path):
+            result_files = {"html": html_path, "xlsx": xlsx_path}
+            self._update_analysis_process_resp(pid, resp, status_code=AsyncAnalysisStatus.NON_FAILED_STATUS_CODE,
+                                               status=AsyncAnalysisStatus.SUCCESS, result_files=result_files)
+        else:
+            self._update_analysis_process_resp(pid, resp, status_code=AsyncAnalysisStatus.BAD_REQUEST_STATUS_CODE,
+                                               status=AsyncAnalysisStatus.FAILED,
+                                               error_msg="No optimization suggestions, please check your input path.")
+
+    def _stage_computation_analysis(self, profiling_path, stage_step_rank, job_list):
+        # Analyze the min/max ranks of each pipeline parallel stage
+        logger.info("Steps and ranks to be analyzed of different pipeline parallel stages are %s",
+                    json.dumps(stage_step_rank))
+
+        stages_profiling_path = []
+        for stage, step_rank_info in stage_step_rank.items():
+            rank_id = step_rank_info.get("maximum", {}).get("rank_id")
+            step = step_rank_info.get("maximum", {}).get("step")
+            benchmark_rank_id = step_rank_info.get("minimum", {}).get("rank_id")
+            benchmark_step = step_rank_info.get("minimum", {}).get("step")
+
+            info_msg = f"For {stage}, slow rank is {rank_id}"
+            if step:
+                info_msg += f", step is {step}"
+            logger.info(info_msg)
+
+            stages_profiling_path.append(
+                dict(
+                    stage=stage, rank=rank_id, step=step, benchmark_rank=benchmark_rank_id,
+                    benchmark_step=benchmark_step,
+                    profiling_path=self._get_profiling_path_by_rank(profiling_path, rank_id),
+                    benchmark_profiling_path=self._get_profiling_path_by_rank(profiling_path, benchmark_rank_id),
+                    compare_mode=Constant.KERNEL_COMPARE,
+                    step_duration=self.slow_rank_analyzer.get_step_duration(rank_id, step)
+                )
+            )
+        Interface.add_analyzer(Interface.COMPUTATION, SupportedScopes.STAGE_COMPUTE, PPStageComputationAnalyzer)
+        compute_analysis_kwargs = {"stages_profiling_path": stages_profiling_path, "profiling_path": profiling_path}
+
+        job_list.append((Interface.COMPUTATION, SupportedScopes.STAGE_COMPUTE, Interface(**compute_analysis_kwargs),
+                         compute_analysis_kwargs))
+        if not self.kwargs.get("benchmark_profiling_path"):
+            logger.info("Enable computation comparison of fast and slow rank/step in different pp stages")
+            job_list += self._profiling_comparison(stages_profiling_path)
+        return job_list
+
+    def _global_computation_analysis(self, profiling_path, global_step_rank, job_list):
+        # Without distinguishing pp stages, analyze the global min/max ranks across all ranks
+        logger.info("Without pipeline parallel stage, steps and ranks to be analyzed are %s",
+                    json.dumps(global_step_rank))
+        slow_rank_id = global_step_rank.get("maximum", {}).get("rank_id")
+        if slow_rank_id is not None:
+            info_msg = f"Maximum computation time for rank {slow_rank_id}"
+        else:
+            slow_rank_id = self.default_rank_id
+            info_msg = f"No slow rank with computation time, analysis for default rank {slow_rank_id}"
+        slow_step = global_step_rank.get("maximum", {}).get("step")
+        # If there is no benchmark rank id in the profiling data, there is no fast/slow rank issue;
+        # the default rank id is analyzed directly, so the value here is None
+        fast_rank_id = global_step_rank.get("minimum", {}).get("rank_id")
+        fast_step = global_step_rank.get("minimum", {}).get("step")
+
+        if slow_step is not None:
+            info_msg += f" and step {slow_step}, "
+        if fast_rank_id is not None:
+            info_msg += f"minimum computation time for rank {fast_rank_id}"
+        if fast_step is not None:
+            info_msg += f" and step {fast_step}"
+        logger.info(info_msg)
+
+        kwargs = dict(profiling_path=self._get_profiling_path_by_rank(profiling_path, slow_rank_id),
+                      benchmark_profiling_path=self._get_profiling_path_by_rank(profiling_path, fast_rank_id),
+                      step=slow_step, benchmark_step=fast_step, rank=slow_rank_id, benchmark_rank=fast_rank_id,
+                      compare_mode=Constant.KERNEL_COMPARE,
+                      step_duration=self.slow_rank_analyzer.get_step_duration(slow_rank_id, slow_step))
+
+        job_list += self.computation_analysis(**kwargs)
+
+        rank_id_valid = slow_rank_id is not None and fast_rank_id is not None and fast_rank_id != slow_rank_id
+        if not self.kwargs.get("benchmark_profiling_path") and rank_id_valid:
+            # When the user specifies a benchmark profiling path, skip the internal fast/slow rank
+            # comparison within the target cluster's profiling data
+            logger.info("Enable computation comparison of fast and slow rank/step")
+            job_list += self._profiling_comparison([kwargs])
+        return job_list
diff --git a/profiler/msprof_analyze/advisor/analyzer/communication/base_communication_analyzer.py b/profiler/msprof_analyze/advisor/analyzer/communication/base_communication_analyzer.py
index 5fbbf0c56dc204711eb37f47d11e67c65f9d3897..73724ee29988064f8e2b862ac49e5bd6b27f8762 100644
--- a/profiler/msprof_analyze/advisor/analyzer/communication/base_communication_analyzer.py
+++ b/profiler/msprof_analyze/advisor/analyzer/communication/base_communication_analyzer.py
@@ -1,22 +1,22 @@
-# Copyright (c) 2024, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer
-
-
-class BaseCommunicationAnalyzer(BaseAnalyzer):
-    requires_cluster_dataset = True
-
-    def __init__(self, collection_path, n_processes: int = 1, **kwargs):
-        super().__init__(collection_path, n_processes, **kwargs)
+# Copyright (c) 2024, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer + + +class BaseCommunicationAnalyzer(BaseAnalyzer): + requires_cluster_dataset = True + + def __init__(self, collection_path, n_processes: int = 1, **kwargs): + super().__init__(collection_path, n_processes, **kwargs) diff --git a/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/ai_core_performance_analyzer.py b/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/ai_core_performance_analyzer.py deleted file mode 100644 index 23ec775e275134e8a99336b005d9f8f198660245..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/ai_core_performance_analyzer.py +++ /dev/null @@ -1,53 +0,0 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import logging - -from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer -from msprof_analyze.advisor.analyzer.computation.ai_core_performance.ai_core_performance_checker import \ - AICorePerformanceChecker -from msprof_analyze.advisor.dataset.profiling.profiling_dataset import ProfilingDataset -from msprof_analyze.advisor.result.result import OptimizeResult -from msprof_analyze.advisor.display.html.priority_background_color import PriorityBackgroundColor -from msprof_analyze.advisor.display.html.render import HTMLRender - -logger = logging.getLogger() - - -class AICorePerformanceAnalyzer(BaseAnalyzer): - dataset_cls_list = [ProfilingDataset] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs) -> None: - super().__init__(collection_path, n_processes, **kwargs) - profiling_key = ProfilingDataset.get_key() - self.profiling_dataset = self.get_first_data_by_key(self.dataset_list, profiling_key) - self.result = OptimizeResult() - self.html_render = HTMLRender() - self.html = None - - def optimize(self, **kwargs): - add_render_list = kwargs.get("add_render_list", True) - ai_core_perf_checker = AICorePerformanceChecker() - ai_core_perf_checker.data_filter(self.profiling_dataset) - if not ai_core_perf_checker.ai_core_performance_issues: - return self.result - ai_core_perf_checker.check_ai_core_performance(self.profiling_dataset) - ai_core_perf_checker.make_record(self.result) - self.html = ai_core_perf_checker.make_render(self.html_render, - add_render_list, - priority=self.get_priority(), - rank=kwargs.get("rank")) - return self.result - - def get_priority(self, max_mem_op_dur=None): - return PriorityBackgroundColor.low \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/ai_core_performance_checker.py b/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/ai_core_performance_checker.py deleted file mode 100644 index fa62cd6f8958e28320d19e09d8ef1dae5609d03f..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/advisor/analyzer/computation/ai_core_performance/ai_core_performance_checker.py 
+++ /dev/null @@ -1,562 +0,0 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import logging -import os -from functools import reduce -from msprof_analyze.advisor.dataset.profiling.profiling_dataset import ProfilingDataset -from msprof_analyze.advisor.result.item import OptimizeItem, OptimizeRecord -from msprof_analyze.advisor.result.result import OptimizeResult -from msprof_analyze.prof_common.additional_args_manager import AdditionalArgsManager -from msprof_analyze.prof_common.file_manager import FileManager - -logger = logging.getLogger() - - -class AICorePerformanceChecker: - """ - operator performance checker - """ - _CHECKER = "AICorePerformanceChecker" - CUBE_OPERATOR_MEMORY_SIZE_MB = 100 - INNER_AXIS_256 = 256 - INNER_AXIS_128 = 128 - - def __init__(self): - self.result = dict() - self.ai_core_performance_issues = False - self._desc = "" - self.cube_dict = {} - self.fa_dict = {} - self.fa_list = [] - self.vector_dict = {} - self.load_aicore_perf_rules() - - @staticmethod - def get_operator_list(cube_dict, profiling_dataset): - operator_list = [] - for op in profiling_dataset.op_summary.op_list: - if op.op_name in cube_dict: - key = op.input_shapes[1:-1] + "-" + op.output_shapes[1:-1] - if key in cube_dict[op.op_name]: - operator_list.append(op) - return operator_list - - @staticmethod - def get_vector_list(profiling_dataset, vector_dict): - vector_list = [] - for op_name in vector_dict: - for shape in vector_dict[op_name]: - for operator in profiling_dataset.op_summary.op_list: - if operator.op_name == op_name and operator.input_shapes[1:-1] + "-" + operator.output_shapes[ - 1:-1] == shape: - vector_list.append(operator) - return vector_list - - @staticmethod - def safe_divide(numerator, denominator): - if denominator == 0: - logger.warning("Warning: Division by zero is not allowed.") - return None - return numerator / denominator - - @staticmethod - def memory_size(operator): - memory = 0 - input_shapes = operator.input_shapes[1:-1].split(";") - output_shapes = operator.output_shapes[1:-1] - for shapes in input_shapes: - if "," not in shapes and shapes != "": - # 多的一维是 bias ,预先乘2 - memory += int(shapes) * 2 - continue - memory += reduce(lambda x, y: x * y, map(int, shapes.split(","))) - memory += reduce(lambda x, y: x * y, map(int, output_shapes.split(","))) - return memory * 2 / 1024 / 1024 - - def load_aicore_perf_rules(self): - language = AdditionalArgsManager().language - rule_path = os.path.join( - os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))), - "rules", language, "aicore_performance.yaml" - ) - - if not os.path.exists(rule_path): - logger.warning("Skip analyze aicpu issues, because %s does not exist.", rule_path) - - self.language = language - self.aicore_rules = FileManager.read_yaml_file(rule_path) - self._cube_problem = self.aicore_rules.get("cube_problem") - self._fa_problem = self.aicore_rules.get("fa_problem") - self._vector_problem = 
self.aicore_rules.get("vector_problem") - self._desc = self.aicore_rules.get("description") - self._bound_desc = self.aicore_rules.get("bound_description") - self._opti_desc = self.aicore_rules.get("optimization_description") - self._affinity_desc = self.aicore_rules.get("affinity_description") - self._cube_affinity_desc = self.aicore_rules.get("cube_affinity_desc") - self._fa_affinity_desc_head_dim_128 = self.aicore_rules.get("fa_affinity_desc_head_dim_128") - self._fa_affinity_desc_seq_len_128 = self.aicore_rules.get("fa_affinity_desc_seq_len_128") - self._fa_affinity_desc_head_dim_seq_len_128 = self.aicore_rules.get("fa_affinity_desc_head_dim_seq_len_128") - self._suggestion = self.aicore_rules.get("suggestion") - self._affinity_suggestion = self.aicore_rules.get("affinity_suggestion") - self._bound_suggestion = self.aicore_rules.get("bound_suggestion") - self._opti_suggestion = self.aicore_rules.get("optimization_suggestion") - self._operator_rules = {"cube_operators": self.aicore_rules.get("cube_operators"), - "fa_operators": self.aicore_rules.get("fa_operators"), - "vector_operators": self.aicore_rules.get("vector_operators")} - - def data_filter(self, profiling_dataset: ProfilingDataset): - if not self.check_task_list(profiling_dataset): - return - - operator_list = profiling_dataset.op_summary.op_list - total_duration = sum(float(operator.task_duration) for operator in operator_list) - if (total_duration == 0): - return - cube_memory_dict, vector_type_dict = {}, {} - - for op in operator_list: - shapes = op.input_shapes[1:-1] + "-" + op.output_shapes[1:-1] - # preliminary filter cube operator - if op.task_type == "AI_CORE" and "matmul" in op.op_type.lower(): - cube_memory_dict.setdefault(op.op_name, {}).setdefault(shapes, 0) - cube_memory_dict[op.op_name][shapes] += self.memory_size(op) - continue - - # filter fa operator - if op.op_type == "FlashAttentionScore": - self.fa_dict.setdefault(op.op_name, set()).add(shapes) - self.fa_list.append(op) - elif op.op_type == "FlashAttentionScoreGrad": - self.fa_dict.setdefault(op.op_name, set()).add(shapes + "-grad") - self.fa_list.append(op) - - # preliminary filter vector operator - if op.task_type in ["AI_VECTOR_CORE", "MIX_AIV"]: - vector_type_dict.setdefault(op.op_type, set()).add(op) - - # filter cube operator - for op_name in cube_memory_dict: - for shapes in cube_memory_dict[op_name]: - if cube_memory_dict[op_name][shapes] >= self.CUBE_OPERATOR_MEMORY_SIZE_MB: - self.cube_dict.setdefault(op_name, set()).add(shapes) - - # filter vector operator - for op_type in vector_type_dict: - duration_group_by_time = sum(float(op.task_duration) for op in vector_type_dict[op_type]) - if (duration_group_by_time / total_duration) >= 0.01 or duration_group_by_time >= 1000000: - for op in vector_type_dict[op_type]: - shapes = op.input_shapes[1:-1] + "-" + op.output_shapes[1:-1] - self.vector_dict.setdefault(op.op_name, set()).add(shapes) - - if any([self.cube_dict, self.fa_dict, self.vector_dict]): - self.ai_core_performance_issues = True - - def check_ai_core_performance(self, promoting_dataset: ProfilingDataset): - for operator_type in ["cube", "fa", "vector"]: - try: - self.result[operator_type] = getattr(self, f"check_{operator_type}_operator")(promoting_dataset) - except (IndexError, ValueError, AttributeError) as e: - logger.warning(f"Failed to check ai core performance {operator_type} operator, {e}.") - self.result[operator_type] = [] - - if not any([self.result["cube"], self.result["fa"], self.result["vector"]]): - 
self.ai_core_performance_issues = False - - def check_cube_operator(self, profiling_dataset: ProfilingDataset): - cube_dict = self.cube_dict - suggestion = self._cube_affinity_desc - optimization_queue, bound_queue, affinity_queue = [], [], [] - operator_list = self.get_operator_list(cube_dict, profiling_dataset) - for op in cube_dict: - for shape in cube_dict[op]: - affinity_flag = self._check_cube_inner_axis(shape) - if not affinity_flag: - dtype, shape_duration = None, 0. - for operator in operator_list: - if (operator.op_name == op and - operator.input_shapes[1:-1] + "-" + operator.output_shapes[1:-1] == shape): - dtype = operator.input_data_types - shape_duration += float(operator.task_duration) - affinity_queue.append({"op_name": op, - "shape": shape.split("-")[0], - "dtype": dtype, - "duration": shape_duration, - "suggestion": suggestion}) - else: - shape_list = [] - for operator in operator_list: - if (operator.op_name == op and operator.input_shapes[1:-1] + "-" + - operator.output_shapes[1:-1] == shape): - shape_list.append(operator) - shape_duration = sum(float(operator.task_duration) for operator in shape_list) - dtype = shape_list[0].input_data_types if shape_list else None - bound, optimization = self.del_cube_operator_bound(shape_list) - if bound is None and optimization is None: - continue - if bound: - bound_queue.append({"op_name": op, - "shape": shape.split("-")[0], - "dtype": dtype, - "bound": bound, - "duration": shape_duration}) - else: - optimization_queue.append({"op_name": op, - "shape": shape.split("-")[0], - "dtype": dtype, - "optimization": round(optimization * 100, 2)}) - return [sorted(optimization_queue, key=lambda x: x["optimization"], reverse=True)[:5], - sorted(bound_queue, key=lambda x: x["duration"], reverse=True)[:5], - sorted(affinity_queue, key=lambda x: x["duration"], reverse=True)[:5]] - - def del_cube_operator_bound(self, shape_list): - bound, optimization, aic_mac_ratio, aic_mte2_ratio, length = "", 0., 0., 0., 0 - for operator in shape_list: - try: - aic_mac_ratio += float(operator.aic_mac_ratio) - aic_mte2_ratio += float(operator.aic_mte2_ratio) - length += 1 - except ValueError: - continue - aic_mac_ratio = self.safe_divide(aic_mac_ratio, length) - aic_mte2_ratio = self.safe_divide(aic_mte2_ratio, length) - if aic_mac_ratio is None or aic_mte2_ratio is None: - return None, None - aic_mac_ratio_rule, aic_mte2_ratio_rule = None, None - for operator_rule in self._operator_rules["cube_operators"]: - if operator_rule["target"] == "aic_mac_ratio": - aic_mac_ratio_rule = operator_rule - elif operator_rule["target"] == "aic_mte2_ratio": - aic_mte2_ratio_rule = operator_rule - if (aic_mac_ratio >= aic_mac_ratio_rule["threshold"] - and aic_mte2_ratio >= aic_mte2_ratio_rule["threshold"]): - bound = aic_mac_ratio_rule["bound"] + "_and_" + aic_mte2_ratio_rule["bound"] + "_bound" - elif aic_mac_ratio >= aic_mte2_ratio_rule["threshold"]: - bound = aic_mac_ratio_rule["bound"] - elif aic_mte2_ratio >= aic_mte2_ratio_rule["threshold"]: - bound = aic_mte2_ratio_rule["bound"] - else: - optimization = max(aic_mac_ratio_rule["threshold"] - aic_mac_ratio, - aic_mte2_ratio_rule["threshold"] - aic_mte2_ratio) - return bound, optimization - - def check_fa_operator(self, profiling_dataset: ProfilingDataset): - fa_list, fa_dict = self.fa_list, self.fa_dict - optimization_queue, bound_queue, affinity_queue = [], [], [] - # 不亲和算子筛选 - for op in fa_dict: - for shape in fa_dict[op]: - affinity_flag, dtype, shape_duration, suggestion = self._check_fa_inner_axis(fa_list, op, 
shape) - if affinity_flag: - # 不亲和算子 计算耗时,加入affinity_queue - affinity_queue.append({"op_name": op, - "shape": shape.split("-")[0], - "dtype": dtype, - "suggestion": suggestion, - "duration": shape_duration}) - else: - # 处理bound算子和优化算子 - if len(shape.split("-")) > 2: - bound, optimization, dtype, shape_duration = self.del_fa_operator_bound_grad(op, shape, fa_list) - else: - bound, optimization, dtype, shape_duration = self.del_fa_operator_bound(op, shape, fa_list) - if bound is None and optimization is None: - continue - if bound: - bound_queue.append({"op_name": op, - "shape": shape.split("-")[0], - "dtype": dtype, - "bound": bound, - "duration": shape_duration}) - else: - optimization_queue.append({"op_name": op, - "shape": shape.split("-")[0], - "dtype": dtype, - "optimization": round(optimization * 100, 2)}) - - return [sorted(optimization_queue, key=lambda x: x["optimization"], reverse=True)[:5], - sorted(bound_queue, key=lambda x: x["duration"], reverse=True)[:5], - sorted(affinity_queue, key=lambda x: x["duration"], reverse=True)[:5]] - - def del_fa_operator_bound_grad(self, op, shape, fa_list): - aic_fixpipe_ratio, aic_mte2_ratio, shape_duration, optimization, length = 0., 0., 0., 0., 0 - bound, dtype = "", None - for operator in fa_list: - if (operator.op_name == op and - operator.input_shapes[1:-1] + "-" + - operator.output_shapes[1:-1] + "-grad" == shape): - try: - aic_fixpipe_ratio += float(operator.aic_fixpipe_ratio) - aic_mte2_ratio += float(operator.aic_mte2_ratio) - shape_duration += float(operator.task_duration) - dtype = operator.input_data_types - length += 1 - except ValueError: - continue - aic_fixpipe_ratio = self.safe_divide(aic_fixpipe_ratio, length) - aic_mte2_ratio = self.safe_divide(aic_mte2_ratio, length) - if aic_mte2_ratio is None or aic_fixpipe_ratio is None: - return None, None, None - aic_fixpipe_ratio_rule, aic_mte2_ratio_rule = None, None - for rule in self._operator_rules["fa_operators"]: - if rule["target"] == "aic_fixpipe_ratio": - aic_fixpipe_ratio_rule = rule - elif rule["target"] == "aic_mte2_ratio": - aic_mte2_ratio_rule = rule - if (aic_mte2_ratio >= aic_mte2_ratio_rule["threshold"] and - aic_fixpipe_ratio >= aic_fixpipe_ratio_rule["threshold"]): - bound = aic_fixpipe_ratio_rule["bound"] + "_and_" + aic_mte2_ratio_rule["bound"] + "_bound" - elif aic_mte2_ratio >= aic_mte2_ratio_rule["threshold"]: - bound = aic_mte2_ratio_rule["bound"] - elif aic_fixpipe_ratio >= aic_fixpipe_ratio_rule["threshold"]: - bound = aic_fixpipe_ratio_rule["bound"] - else: - optimization = max(aic_fixpipe_ratio_rule["threshold"] - aic_fixpipe_ratio, - aic_mte2_ratio_rule["threshold"] - aic_mte2_ratio) - return bound, optimization, dtype, shape_duration - - def del_fa_operator_bound(self, op, shape, fa_list): - aiv_vec_ratio, aic_mte2_ratio, shape_duration, optimization, length = 0., 0., 0., 0., 0 - bound, dtype = "", None - for operator in fa_list: - if (operator.op_name == op and - operator.input_shapes[1:-1] + "-" + operator.output_shapes[1:-1] == shape): - try: - aiv_vec_ratio += float(operator.aiv_vec_ratio) - aic_mte2_ratio += float(operator.aic_mte2_ratio) - shape_duration += float(operator.task_duration) - length += 1 - except ValueError: - continue - aiv_vec_ratio = self.safe_divide(aiv_vec_ratio, length) - aic_mte2_ratio = self.safe_divide(aic_mte2_ratio, length) - if aiv_vec_ratio is None or aic_mte2_ratio is None: - return None, None, None - aiv_vec_ratio_rule, aic_mte2_ratio_rule = None, None - for rule in self._operator_rules["fa_operators"]: - if 
rule["target"] == "aiv_vec_ratio": - aiv_vec_ratio_rule = rule - elif rule["target"] == "aic_mte2_ratio": - aic_mte2_ratio_rule = rule - if (aic_mte2_ratio >= aic_mte2_ratio_rule["threshold"] - and aiv_vec_ratio >= aiv_vec_ratio_rule["threshold"]): - bound = aic_mte2_ratio_rule["bound"] + "_and_" + aiv_vec_ratio_rule["bound"] + "_bound" - elif aic_mte2_ratio >= aic_mte2_ratio_rule["threshold"]: - bound = aic_mte2_ratio_rule["bound"] - elif aiv_vec_ratio >= aiv_vec_ratio_rule["threshold"]: - bound = aiv_vec_ratio_rule["bound"] - else: - optimization = max(aiv_vec_ratio_rule["threshold"] - aiv_vec_ratio, - aic_mte2_ratio_rule["threshold"] - aic_mte2_ratio) - return bound, optimization, dtype, shape_duration - - def check_vector_operator(self, profiling_dataset: ProfilingDataset): - vector_dict = self.vector_dict - optimization_queue, bound_queue = [], [] - vector_list = self.get_vector_list(profiling_dataset, vector_dict) - for op_name in vector_dict: - for shape in vector_dict[op_name]: - aiv_vec_ratio, aiv_mte2_ratio, aiv_mte3_ratio, shape_duration = 0., 0., 0., 0. - length, dtype = 0, "" - for operator in vector_list: - if (operator.op_name == op_name and - operator.input_shapes[1:-1] + "-" + operator.output_shapes[1:-1] == shape): - try: - aiv_vec_ratio += float(operator.aiv_vec_ratio) - aiv_mte2_ratio += float(operator.aiv_mte2_ratio) - aiv_mte3_ratio += float(operator.aiv_mte3_ratio) - shape_duration += float(operator.task_duration) - dtype = operator.input_data_types - length += 1 - except ValueError: - continue - aiv_vec_ratio = self.safe_divide(aiv_vec_ratio, length) - aiv_mte2_ratio = self.safe_divide(aiv_mte2_ratio, length) - aiv_mte3_ratio = self.safe_divide(aiv_mte3_ratio, length) - if aiv_vec_ratio is None or aiv_mte2_ratio is None or aiv_mte3_ratio is None: - continue - bound, optimization = self.del_vector_operator_bound(aiv_mte2_ratio, aiv_mte3_ratio, aiv_vec_ratio) - if bound: - bound_queue.append({"op_name": op_name, - "shape": shape.split("-")[0], - "bound": bound, - "dtype": dtype, - "duration": shape_duration}) - else: - optimization_queue.append({"op_name": op_name, - "shape": shape.split("-")[0], - "dtype": dtype, - "optimization": round(optimization * 100, 2)}) - return [sorted(optimization_queue, key=lambda x: x["optimization"], reverse=True)[:5], - sorted(bound_queue, key=lambda x: x["duration"], reverse=True)[:5]] - - def del_vector_operator_bound(self, aiv_mte2_ratio, aiv_mte3_ratio, aiv_vec_ratio): - bound, optimization = "", 0 - aiv_vec_ratio_rule, aiv_mte2_ratio_rule, aiv_mte3_ratio_rule, total_rule = None, None, None, None - for operator_rule in self._operator_rules["vector_operators"]: - if operator_rule["target"] == "aiv_vec_ratio": - aiv_vec_ratio_rule = operator_rule - elif operator_rule["target"] == "aiv_mte2_ratio": - aiv_mte2_ratio_rule = operator_rule - elif operator_rule["target"] == "aiv_mte3_ratio": - aiv_mte3_ratio_rule = operator_rule - elif operator_rule["target"] == "total": - total_rule = operator_rule - if aiv_vec_ratio + aiv_mte2_ratio + aiv_mte3_ratio >= total_rule["threshold"]: - bound = total_rule["bound"] - elif aiv_mte2_ratio >= aiv_mte2_ratio_rule["threshold"]: - bound = aiv_mte2_ratio_rule["bound"] - elif aiv_mte3_ratio >= aiv_mte3_ratio_rule["threshold"]: - bound = aiv_mte3_ratio_rule["bound"] - elif aiv_vec_ratio >= aiv_vec_ratio_rule["threshold"]: - bound = aiv_vec_ratio_rule["bound"] - else: - optimization = max(aiv_vec_ratio_rule["threshold"] - aiv_vec_ratio, - aiv_mte2_ratio_rule["threshold"] - aiv_mte2_ratio, - 
aiv_mte3_ratio_rule["threshold"] - aiv_mte3_ratio) - return bound, optimization - - def draw_record(self, op_type: str, result: OptimizeResult): - suggestion_keys = ['opti', 'bound', 'affinity'] - desc = dict.fromkeys(suggestion_keys, "") - problem_map = { - 'cube': self._cube_problem, - 'fa': self._fa_problem, - 'vector': self._vector_problem - } - if op_type not in problem_map: - return - optimization_item = OptimizeItem(problem_map[op_type], self._desc, [self._suggestion]) - result.add(OptimizeRecord(optimization_item)) - headers = [ - "Type", - "Description and Suggestion", - ] - result.add_detail(problem_map[op_type], headers=headers) - for opti_issue in self.result[op_type][0]: - opti_sugg = self._opti_suggestion.format(**opti_issue) - desc["opti"] += opti_sugg - if desc["opti"]: - result.add_detail(problem_map[op_type], detail=[self._opti_desc, desc["opti"]]) - for bound_issue in self.result[op_type][1]: - bound_sugg = self._bound_suggestion.format(**bound_issue) - desc["bound"] += bound_sugg - if desc["bound"]: - result.add_detail(problem_map[op_type], detail=[self._bound_desc, desc["bound"]]) - if op_type == "vector": # vector 类型没有亲和性建议 - return - for affinity_issue in self.result[op_type][2]: - affinity_sugg = self._affinity_suggestion.format(**affinity_issue) - desc["affinity"] += affinity_sugg - if desc["affinity"]: - result.add_detail(problem_map[op_type], detail=[self._affinity_desc, desc["affinity"]]) - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self.ai_core_performance_issues: - return self.ai_core_performance_issues - if any(self.result["cube"]): - self.draw_record("cube", result) - if any(self.result["fa"]): - self.draw_record("fa", result) - if any(self.result["vector"]): - self.draw_record("vector", result) - - return True - - def make_render(self, html_render, add_render_list=True, **kwargs): - if not self.ai_core_performance_issues: - return self.ai_core_performance_issues - - priority = kwargs.get("priority") - return html_render.render_template(key="computation", - template_dir="templates", - template_name="ai_core_performance.html", - format_result=self.result, - language=self.language, - add_render_list=add_render_list, - priority_background_color=priority, - rank=kwargs.get("rank")) - - def check_task_list(self, profiling_dataset: ProfilingDataset) -> bool: - if not hasattr(profiling_dataset, "op_summary"): - logger.warning("Skip %s checker because of not containing %s", self._CHECKER, "op summary") - return False - if not hasattr(profiling_dataset.op_summary, "op_list"): - logger.warning("Skip %s checker because of not containing %s", self._CHECKER, "op_list") - return False - if (not hasattr(profiling_dataset.op_summary.op_list[0], "input_shapes") or - not hasattr(profiling_dataset.op_summary.op_list[0], "input_data_types")): - logger.warning("Skip %s checker because of not containing input datas", self._CHECKER) - return False - return True - - def _check_cube_inner_axis(self, shape): - # 判断输入shape内轴是否为256的倍数 - shapes = shape.split("-")[0].split(";") - if (len(shape.split("-")[0].split(";")[0].split(","))) == 4: - # NZ格式 - b_axis, c_axis = int(shapes[0].split(",")[1]), int(shapes[0].split(",")[2]) - f_axis, g_axis = int(shapes[1].split(",")[1]), int(shapes[1].split(",")[2]) - return (b_axis * c_axis % self.INNER_AXIS_256 == 0) and (f_axis * g_axis % self.INNER_AXIS_256 == 0) - elif (len(shape.split("-")[0].split(";")[0].split(","))) == 2: - # ND格式 - l_axis, k_axis = 
int(shapes[0].split(",")[1]), int(shapes[1].split(",")[1]) - return (l_axis % self.INNER_AXIS_256 == 0) and (k_axis % self.INNER_AXIS_256 == 0) - else: - return False - - def _check_fa_inner_axis(self, fa_list, op, shape): - shape_duration = 0. - affinity_flag = False - dtype = None - suggestion = "" - if "varlen" in op.lower(): - # 处理变长算子 如果不亲和则affinity_flag为False - inner_axis = int(shape.split("-")[0].split(";")[0].split(",")[2]) - if inner_axis % self.INNER_AXIS_128 != 0: - affinity_flag = True - suggestion = self._fa_affinity_desc_head_dim_128 - for operator in fa_list: - if (operator.op_name == op and - operator.input_shapes[1:-1] + "-" + operator.output_shapes[1:-1] == shape): - shape_duration += float(operator.task_duration) - dtype = operator.input_data_types - else: - # 处理定长算子 如果不亲和则affinity_flag为False - head_dim = 0 - seq_len = int(shape.split("-")[1].split(";")[0].split(",")[2]) - input_first_tensor = shape.split("-")[0].split(";")[0].split(",") - if len(input_first_tensor) == 3: - head_dim = int(input_first_tensor[2]) / int(shape.split("-")[1].split(";")[0].split(",")[1]) - else: - head_dim = int(input_first_tensor[3]) - if head_dim % self.INNER_AXIS_128 != 0 and seq_len % self.INNER_AXIS_128 != 0: - affinity_flag = True - suggestion = self._fa_affinity_desc_head_dim_seq_len_128 - elif head_dim % self.INNER_AXIS_128 != 0: - affinity_flag = True - suggestion = self._fa_affinity_desc_head_dim_128 - elif seq_len % self.INNER_AXIS_128 != 0: - affinity_flag = True - suggestion = self._fa_affinity_desc_seq_len_128 - if affinity_flag: - for operator in fa_list: - if (operator.op_name == op and - operator.input_shapes[1:-1] + "-" + - operator.output_shapes[1:-1] == shape): - shape_duration += float(operator.task_duration) - dtype = operator.input_data_types - return affinity_flag, dtype, shape_duration, suggestion diff --git a/profiler/msprof_analyze/advisor/analyzer/computation/operator_checker.py b/profiler/msprof_analyze/advisor/analyzer/computation/operator_checker.py index 4be0fc66ae8b8f75ca0518228cbdccde1a0d7c1e..ab9d4228b470ee515ed912ab018badbba3ec2e67 100644 --- a/profiler/msprof_analyze/advisor/analyzer/computation/operator_checker.py +++ b/profiler/msprof_analyze/advisor/analyzer/computation/operator_checker.py @@ -52,7 +52,6 @@ class OperatorChecker(VersionControl): self._tune_op_list: List[str] = [] self.prompt_class = BasePrompt.get_prompt_class("OperatorChecker") - self.rank_id = self.prompt_class.RANK_ID self.pytorch_op_tune_suggestion = self.prompt_class.PYTORCH_OPERATOR_TUNE_SUGGESTION self.mslite_op_tune_suggestion = self.prompt_class.MSLITE_OPERATOR_TUNE_SUGGESTION self.pytorch_release_suggestion = self.prompt_class.PYTORCH_RELEASE_SUGGESTION @@ -119,7 +118,7 @@ class OperatorChecker(VersionControl): """ if rank is not None: - self._problem = self.rank_id.format(rank) + self._problem.lower() + self._problem = self.prompt_class.RANK_ID.format(rank) + self._problem.lower() task_duration_list = [float(op_info.get_attr("task_duration")) for op_info in self._op_list @@ -302,7 +301,7 @@ class OperatorChecker(VersionControl): def format_suggestion_content(self, profiling_data: ProfilingDataset) -> None: if profiling_data.prof_type == EnumParamsParser().profiling_type.ascend_pytorch_profiler: self._suggestion.append(self.pytorch_op_tune_suggestion) - elif profiling_data.prof_type == EnumParamsParser().profiling_type.mslite: + elif profiling_data.prof_type == EnumParamsParser.profiling_type.mslite: self._suggestion.append(self.mslite_op_tune_suggestion) def _check_data(self, 
profiling_data): diff --git a/profiler/msprof_analyze/advisor/analyzer/computation/pp_stage_computation_analyzer.py b/profiler/msprof_analyze/advisor/analyzer/computation/pp_stage_computation_analyzer.py index 2780204b2064ed628ee686d91e82169818955eb7..2a08e668e140e090c6a3fb7f65fdbaf01310e741 100644 --- a/profiler/msprof_analyze/advisor/analyzer/computation/pp_stage_computation_analyzer.py +++ b/profiler/msprof_analyze/advisor/analyzer/computation/pp_stage_computation_analyzer.py @@ -1,118 +1,118 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import logging -from multiprocessing import Manager - -from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer -from msprof_analyze.advisor.common.analyzer_scopes import SupportedScopes -from msprof_analyze.advisor.display.html.render import HTMLRender -from msprof_analyze.advisor.display.html.priority_background_color import PriorityBackgroundColor -from msprof_analyze.advisor.interface.interface import Interface -from msprof_analyze.advisor.utils.utils import ParallelJob, get_analyze_processes -from msprof_analyze.advisor.result.result import OptimizeResult -from msprof_analyze.advisor.result.item import OptimizeItem, OptimizeRecord - -logger = logging.getLogger() - - -class PPStageComputationAnalyzer(BaseAnalyzer): - - def __init__(self, collection_path, **kwargs): - super().__init__(collection_path, **kwargs) - self.collection_path = collection_path - self._stages_rendered_html = Manager().list() - self._multiprocess_result = Manager().dict() - # html render不能序列化,无法用多进程,放到optimize里面初始化 - self.html_render = None - self.result = None - - @staticmethod - def _get_valid_sheet_name(sheet_name, prefix): - if not sheet_name.lower().startswith(prefix.lower()): - sheet_name = f"{prefix} {sheet_name}" - return sheet_name - - def optimize(self, stages_profiling_path, **kwargs): - pp_stage_processes = min(get_analyze_processes(), len(stages_profiling_path)) - if pp_stage_processes <= 1: - for stage_profiling_path in stages_profiling_path: - self._optimize(**stage_profiling_path) - else: - logger.info("Start to parallel analysis of pp stages, number of processes is %s", pp_stage_processes) - parallel_stage_analysis_job = ParallelJob(self._optimize, stages_profiling_path, - "Computation analysis of Pipeline parallel stages") - parallel_stage_analysis_job.start(pp_stage_processes) - self._merge_multiprocess_result() - - self.make_render() - self.html_render = HTMLRender() - return self.result - - def make_render(self): - HTMLRender().render_template(key="computation", - template_dir="templates", - template_name="pp_stage_computation_analysis.html", - stages_rendered_html=list(self._stages_rendered_html), - priority_background_color=PriorityBackgroundColor.high) - - def get_priority(self, max_mem_op_dur=None): - pass - - def _optimize(self, profiling_path, **kwargs): - stage_html_record = dict(stage=kwargs.get("stage"), rank=kwargs.get("rank"), step=kwargs.get("step")) - 
kwargs["add_render_list"] = False - - # stage 并行分析时,避免调用本身,即SupportedScopes.STAGE_COMPUTE - scopes = Interface.get_scope(Interface.COMPUTATION) - stage_analyzer_list = [Interface.get_analyzer(Interface.COMPUTATION, scope) - for scope in scopes - if scope != SupportedScopes.STAGE_COMPUTE] - - for analyzer_cls in stage_analyzer_list: - analyzer = analyzer_cls(collection_path=profiling_path, **kwargs) - result = analyzer.optimize(**kwargs) - if hasattr(result, "data") and result.data: - self.result = result - if hasattr(analyzer, "html") and analyzer.html: - if "html_list" not in stage_html_record: - stage_html_record["html_list"] = [] - stage_html_record["html_list"].append(analyzer.html) - self._stages_rendered_html.append(stage_html_record) - self._multiprocess_result[f"rank {kwargs.get('rank')}".capitalize()] = result.data - - def _merge_multiprocess_result(self): - self.result = OptimizeResult() - for key, result_data in self._multiprocess_result.items(): - problem_data = result_data.get("problems", {}).get("data", []) - if not problem_data: - continue - - for row in problem_data: - if len(row) < 3: - continue - issue_name, desc, suggestion = row[:3] - sheet_name = PPStageComputationAnalyzer._get_valid_sheet_name(issue_name, key) - optimization_item = OptimizeItem(sheet_name, desc, [suggestion]) - self.result.add(OptimizeRecord(optimization_item)) - del result_data["problems"] - - for issue_name, issue_details in result_data.items(): - headers = issue_details.get("headers", []) - data = issue_details.get("data", []) - sheet_name = PPStageComputationAnalyzer._get_valid_sheet_name(issue_name, key) - self.result.add_detail(sheet_name, headers=headers) - - for row in data: - self.result.add_detail(sheet_name, detail=row) +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import logging +from multiprocessing import Manager + +from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer +from msprof_analyze.advisor.common.analyzer_scopes import SupportedScopes +from msprof_analyze.advisor.display.html.render import HTMLRender +from msprof_analyze.advisor.display.html.priority_background_color import PriorityBackgroundColor +from msprof_analyze.advisor.interface.interface import Interface +from msprof_analyze.advisor.utils.utils import ParallelJob, get_analyze_processes +from msprof_analyze.advisor.result.result import OptimizeResult +from msprof_analyze.advisor.result.item import OptimizeItem, OptimizeRecord + +logger = logging.getLogger() + + +class PPStageComputationAnalyzer(BaseAnalyzer): + + def __init__(self, collection_path, **kwargs): + super().__init__(collection_path, **kwargs) + self.collection_path = collection_path + self._stages_rendered_html = Manager().list() + self._multiprocess_result = Manager().dict() + # html render不能序列化,无法用多进程,放到optimize里面初始化 + self.html_render = None + self.result = None + + @staticmethod + def _get_valid_sheet_name(sheet_name, prefix): + if not sheet_name.lower().startswith(prefix.lower()): + sheet_name = f"{prefix} {sheet_name}" + return sheet_name + + def optimize(self, stages_profiling_path, **kwargs): + pp_stage_processes = min(get_analyze_processes(), len(stages_profiling_path)) + if pp_stage_processes <= 1: + for stage_profiling_path in stages_profiling_path: + self._optimize(**stage_profiling_path) + else: + logger.info("Start to parallel analysis of pp stages, number of processes is %s", pp_stage_processes) + parallel_stage_analysis_job = ParallelJob(self._optimize, stages_profiling_path, + "Computation analysis of Pipeline parallel stages") + parallel_stage_analysis_job.start(pp_stage_processes) + self._merge_multiprocess_result() + + self.make_render() + self.html_render = HTMLRender() + return self.result + + def make_render(self): + HTMLRender().render_template(key="computation", + template_dir="templates", + template_name="pp_stage_computation_analysis.html", + stages_rendered_html=list(self._stages_rendered_html), + priority_background_color=PriorityBackgroundColor.high) + + def get_priority(self, max_mem_op_dur=None): + pass + + def _optimize(self, profiling_path, **kwargs): + stage_html_record = dict(stage=kwargs.get("stage"), rank=kwargs.get("rank"), step=kwargs.get("step")) + kwargs["add_render_list"] = False + + # stage 并行分析时,避免调用本身,即SupportedScopes.STAGE_COMPUTE + scopes = Interface.get_scope(Interface.COMPUTATION) + stage_analyzer_list = [Interface.get_analyzer(Interface.COMPUTATION, scope) + for scope in scopes + if scope != SupportedScopes.STAGE_COMPUTE] + + for analyzer_cls in stage_analyzer_list: + analyzer = analyzer_cls(collection_path=profiling_path, **kwargs) + result = analyzer.optimize(**kwargs) + if hasattr(result, "data") and result.data: + self.result = result + if hasattr(analyzer, "html") and analyzer.html: + if "html_list" not in stage_html_record: + stage_html_record["html_list"] = [] + stage_html_record["html_list"].append(analyzer.html) + self._stages_rendered_html.append(stage_html_record) + self._multiprocess_result[f"rank {kwargs.get('rank')}".capitalize()] = result.data + + def _merge_multiprocess_result(self): + self.result = OptimizeResult() + for key, result_data in self._multiprocess_result.items(): + problem_data = result_data.get("problems", {}).get("data", []) + if not problem_data: + continue + + for row in problem_data: + if len(row) < 3: + continue + 
issue_name, desc, suggestion = row[:3] + sheet_name = PPStageComputationAnalyzer._get_valid_sheet_name(issue_name, key) + optimization_item = OptimizeItem(sheet_name, desc, [suggestion]) + self.result.add(OptimizeRecord(optimization_item)) + del result_data["problems"] + + for issue_name, issue_details in result_data.items(): + headers = issue_details.get("headers", []) + data = issue_details.get("data", []) + sheet_name = PPStageComputationAnalyzer._get_valid_sheet_name(issue_name, key) + self.result.add_detail(sheet_name, headers=headers) + + for row in data: + self.result.add_detail(sheet_name, detail=row) diff --git a/profiler/msprof_analyze/advisor/analyzer/schedule/fusible_ops/fusible_operator_checker.py b/profiler/msprof_analyze/advisor/analyzer/schedule/fusible_ops/fusible_operator_checker.py index 9070a8036047f7976ca7e9a7ab81bd5bf9632af6..3ab54b0dbb8729c8297606a471ce67e55715b2b8 100644 --- a/profiler/msprof_analyze/advisor/analyzer/schedule/fusible_ops/fusible_operator_checker.py +++ b/profiler/msprof_analyze/advisor/analyzer/schedule/fusible_ops/fusible_operator_checker.py @@ -88,7 +88,7 @@ class FusibleOperatorChecker: @staticmethod def check_hccl(task: OpInfo): - return (task.task_type in ["COMMUNICATION", "HCCL"] or + return (task.task_type == "HCCL" or any(task.op_name.lower().startswith(item) for item in ["hcom", "lccl", "lcoc"])) @staticmethod diff --git a/profiler/msprof_analyze/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py b/profiler/msprof_analyze/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py index 1e75d4e8969d57d54f55eb477165e6379664b817..48506da62646cf337380be7d6c7eb6779161889e 100644 --- a/profiler/msprof_analyze/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py +++ b/profiler/msprof_analyze/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py @@ -1,46 +1,46 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
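The `check_hccl` predicate changed above classifies a task as communication either by its task type or by well-known kernel-name prefixes. A self-contained sketch of the same idea; the real `OpInfo` class carries more fields than this stand-in:

```python
# Hedged stand-in for OpInfo; only the two fields the predicate needs.
from dataclasses import dataclass

HCCL_PREFIXES = ("hcom", "lccl", "lcoc")


@dataclass
class Task:
    op_name: str
    task_type: str


def is_hccl_task(task: Task) -> bool:
    # Either the profiler tagged the task as HCCL, or its kernel name starts
    # with a known communication prefix such as hcom_allReduce.
    return task.task_type == "HCCL" or task.op_name.lower().startswith(HCCL_PREFIXES)


assert is_hccl_task(Task("hcom_allReduce__868", "KERNEL"))
assert not is_hccl_task(Task("MatMul", "AI_CORE"))
```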
-import logging - -from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer -from msprof_analyze.advisor.result.result import OptimizeResult -from msprof_analyze.advisor.analyzer.schedule.syncbn.syncbn_checker import SyncBNChecker -from msprof_analyze.advisor.display.html.priority_background_color import PriorityBackgroundColor -from msprof_analyze.advisor.display.html.render import HTMLRender -from msprof_analyze.advisor.dataset.timeline_event_dataset import ScheduleAnalysisDataset - -logger = logging.getLogger() - - -class SyncBNAnalyzer(BaseAnalyzer): - dataset_cls_list = [ScheduleAnalysisDataset] - - def __init__(self, collection_path, **kwargs): - super().__init__(collection_path, **kwargs) - self.result = OptimizeResult() - self.html_render = HTMLRender() - key = ScheduleAnalysisDataset.get_key() - self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) - - @BaseAnalyzer.check_data((ScheduleAnalysisDataset.get_key(),)) - def optimize(self, **kwargs): - syncbn_checker = SyncBNChecker() - syncbn_checker.check_syncbn(self.timeline_event_dataset) - syncbn_checker.make_record(self.result) - syncbn_checker.make_render(self.html_render, priority=self.get_priority(), rank=kwargs.get("rank")) - return self.result - - def get_priority(self, max_mem_op_dur=None): +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
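The checker wired up below flags `torch.nn.SyncBatchNorm`, since every such layer issues a collective operation per forward pass. Where per-device statistics are acceptable, converting back to a plain `BatchNorm` removes those synchronizations. A hedged sketch of the reverse of `torch.nn.SyncBatchNorm.convert_sync_batchnorm`, assuming 4-D inputs (hence `BatchNorm2d`):

```python
# Sketch only: revert SyncBatchNorm to BatchNorm2d when per-device batch
# statistics are acceptable for the model's accuracy.
import torch.nn as nn


def revert_sync_batchnorm(module: nn.Module) -> nn.Module:
    if isinstance(module, nn.SyncBatchNorm):
        bn = nn.BatchNorm2d(module.num_features, module.eps, module.momentum,
                            module.affine, module.track_running_stats)
        if module.affine:
            bn.weight = module.weight       # reuse learned scale and shift
            bn.bias = module.bias
        bn.running_mean = module.running_mean
        bn.running_var = module.running_var
        return bn
    for name, child in module.named_children():
        setattr(module, name, revert_sync_batchnorm(child))
    return module
```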
+import logging + +from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer +from msprof_analyze.advisor.result.result import OptimizeResult +from msprof_analyze.advisor.analyzer.schedule.syncbn.syncbn_checker import SyncBNChecker +from msprof_analyze.advisor.display.html.priority_background_color import PriorityBackgroundColor +from msprof_analyze.advisor.display.html.render import HTMLRender +from msprof_analyze.advisor.dataset.timeline_event_dataset import ScheduleAnalysisDataset + +logger = logging.getLogger() + + +class SyncBNAnalyzer(BaseAnalyzer): + dataset_cls_list = [ScheduleAnalysisDataset] + + def __init__(self, collection_path, **kwargs): + super().__init__(collection_path, **kwargs) + self.result = OptimizeResult() + self.html_render = HTMLRender() + key = ScheduleAnalysisDataset.get_key() + self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) + + @BaseAnalyzer.check_data((ScheduleAnalysisDataset.get_key(),)) + def optimize(self, **kwargs): + syncbn_checker = SyncBNChecker() + syncbn_checker.check_syncbn(self.timeline_event_dataset) + syncbn_checker.make_record(self.result) + syncbn_checker.make_render(self.html_render, priority=self.get_priority(), rank=kwargs.get("rank")) + return self.result + + def get_priority(self, max_mem_op_dur=None): return PriorityBackgroundColor.high \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py b/profiler/msprof_analyze/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py index ea095e1968f67d4762280bb4dfe180bddde4368e..4ac82fd827186e8afbf93391ce4b109e7cefcf38 100644 --- a/profiler/msprof_analyze/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py +++ b/profiler/msprof_analyze/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py @@ -1,48 +1,48 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import logging - -from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer -from msprof_analyze.advisor.analyzer.schedule.synchronize_stream.synchronize_stream_checker import \ - SynchronizeStreamChecker -from msprof_analyze.advisor.dataset.timeline_event_dataset import ScheduleAnalysisDataset -from msprof_analyze.advisor.display.html.render import HTMLRender -from msprof_analyze.advisor.result.result import OptimizeResult - -logger = logging.getLogger() - - -class SynchronizeStreamAnalyzer(BaseAnalyzer): - dataset_cls_list = [ScheduleAnalysisDataset] - - def __init__(self, collection_path, **kwargs): - super().__init__(collection_path, **kwargs) - self.result = OptimizeResult() - self.html_render = HTMLRender() - - key = ScheduleAnalysisDataset.get_key() - self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) - - @BaseAnalyzer.check_data((ScheduleAnalysisDataset.get_key(),)) - def optimize(self, **kwargs): - synchronize_stream_checker = SynchronizeStreamChecker() - synchronize_stream_checker.check_synchronize(self.timeline_event_dataset) - synchronize_stream_checker.make_record(self.result) - synchronize_stream_checker.make_render(self.html_render, priority=self.get_priority(synchronize_stream_checker), - rank=kwargs.get("rank")) - return self.result - - def get_priority(self, max_mem_op_dur): - return max_mem_op_dur.priority +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
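The analyzer below delegates to `SynchronizeStreamChecker`, which looks for excessive stream-synchronization calls in the timeline. As an illustration only (the real `ScheduleAnalysisDataset` has its own event model), counting such calls in a flat list of trace events might look like this:

```python
# Illustrative sketch with a hypothetical event layout: each event is a dict
# with "name" and "dur" (microseconds), loosely like a chrome-trace entry.
SYNC_NAMES = ("StreamSynchronize", "aclrtSynchronizeStream", "synchronize")


def count_stream_syncs(events):
    total_count, total_dur_us = 0, 0.0
    for event in events:
        name = event.get("name", "")
        if any(key.lower() in name.lower() for key in SYNC_NAMES):
            total_count += 1
            total_dur_us += float(event.get("dur", 0.0))
    return total_count, total_dur_us


events = [{"name": "aclrtSynchronizeStream", "dur": 1500.0},
          {"name": "MatMul", "dur": 80.0}]
print(count_stream_syncs(events))  # (1, 1500.0)
```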
+import logging + +from msprof_analyze.advisor.analyzer.base_analyzer import BaseAnalyzer +from msprof_analyze.advisor.analyzer.schedule.synchronize_stream.synchronize_stream_checker import \ + SynchronizeStreamChecker +from msprof_analyze.advisor.dataset.timeline_event_dataset import ScheduleAnalysisDataset +from msprof_analyze.advisor.display.html.render import HTMLRender +from msprof_analyze.advisor.result.result import OptimizeResult + +logger = logging.getLogger() + + +class SynchronizeStreamAnalyzer(BaseAnalyzer): + dataset_cls_list = [ScheduleAnalysisDataset] + + def __init__(self, collection_path, **kwargs): + super().__init__(collection_path, **kwargs) + self.result = OptimizeResult() + self.html_render = HTMLRender() + + key = ScheduleAnalysisDataset.get_key() + self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) + + @BaseAnalyzer.check_data((ScheduleAnalysisDataset.get_key(),)) + def optimize(self, **kwargs): + synchronize_stream_checker = SynchronizeStreamChecker() + synchronize_stream_checker.check_synchronize(self.timeline_event_dataset) + synchronize_stream_checker.make_record(self.result) + synchronize_stream_checker.make_render(self.html_render, priority=self.get_priority(synchronize_stream_checker), + rank=kwargs.get("rank")) + return self.result + + def get_priority(self, max_mem_op_dur): + return max_mem_op_dur.priority diff --git a/profiler/msprof_analyze/advisor/common/analyzer_scopes.py b/profiler/msprof_analyze/advisor/common/analyzer_scopes.py index 6a6261c7b75e721c0a9df75f35ecb3cd2aa1e487..07ceef769440b39c93aeaaf15ded5ad99fc3f4b3 100644 --- a/profiler/msprof_analyze/advisor/common/analyzer_scopes.py +++ b/profiler/msprof_analyze/advisor/common/analyzer_scopes.py @@ -1,4 +1,5 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2024-2025. All rights reserved. +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -40,4 +41,3 @@ class SupportedScopes: FUSIBLE_OPERATOR_ANALYSIS = "fusible_operator_analysis" CONJECTURED_GC_ANALYSIS = "conjectured_analysis" COMPARISON = "comparison" - AICORE_PERFORMANCE_ANALYSIS = "ai_core_performance_analysis" diff --git a/profiler/msprof_analyze/advisor/common/async_analysis_status.py b/profiler/msprof_analyze/advisor/common/async_analysis_status.py index 98bb458105421b38395f745f2913311a24a5ce40..2d314b5cb0d1994f28d74f876395a04f0d8eedee 100644 --- a/profiler/msprof_analyze/advisor/common/async_analysis_status.py +++ b/profiler/msprof_analyze/advisor/common/async_analysis_status.py @@ -1,27 +1,27 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- - -# Copyright (C) 2024-2024. Huawei Technologies Co., Ltd. All rights reserved. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
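The `SupportedScopes` constants edited earlier in this hunk are the keys of the dimension-to-analyzer registry kept in `interface.py` (which appears later in this diff). The dispatch itself is a nested mapping lookup; a toy sketch with placeholder analyzer classes:

```python
# Toy registry: dimension -> OrderedDict of scope name -> analyzer class.
# The class names below are placeholders, not the repo's real analyzers.
from collections import OrderedDict


class BlockDimAnalyzer:
    pass


class AICoreFreqAnalyzer:
    pass


REGISTRY = {
    "computation": OrderedDict([
        ("block_dim_analysis", BlockDimAnalyzer),
        ("freq_analysis", AICoreFreqAnalyzer),
    ]),
}


def get_scope(dimension):
    return list(REGISTRY.get(dimension, {}).keys())


def get_analyzer(dimension, scope):
    return REGISTRY[dimension][scope]


print(get_scope("computation"))  # ['block_dim_analysis', 'freq_analysis']
```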
- - - -class AsyncAnalysisStatus: - FAILED = "failed" - SUCCESS = "success" - ANALYZING = "analyzing" - - BAD_REQUEST_STATUS_CODE = 400 - NOT_FOUND_STATUS_CODE = 404 - INNER_ERROR_STATUS_CODE = 500 - NON_FAILED_STATUS_CODE = 200 +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- + +# Copyright (C) 2024-2024. Huawei Technologies Co., Ltd. All rights reserved. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + + +class AsyncAnalysisStatus: + FAILED = "failed" + SUCCESS = "success" + ANALYZING = "analyzing" + + BAD_REQUEST_STATUS_CODE = 400 + NOT_FOUND_STATUS_CODE = 404 + INNER_ERROR_STATUS_CODE = 500 + NON_FAILED_STATUS_CODE = 200 diff --git a/profiler/msprof_analyze/advisor/common/enum_params_parser.py b/profiler/msprof_analyze/advisor/common/enum_params_parser.py index ebf81ae38c249f4701e46f9a05b5cb9f86db635c..7158af929f5711a32de835b426bbe91c2000a401 100644 --- a/profiler/msprof_analyze/advisor/common/enum_params_parser.py +++ b/profiler/msprof_analyze/advisor/common/enum_params_parser.py @@ -1,104 +1,104 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- - -# Copyright (C) 2024-2024. Huawei Technologies Co., Ltd. All rights reserved. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import os -import logging -import typing - -from msprof_analyze.advisor.common.timeline.event import AdvisorDict -from msprof_analyze.advisor.utils.utils import singleton -from msprof_analyze.prof_common.file_manager import FileManager - -logger = logging.getLogger() - - -@singleton -class EnumParamsParser(): - # 枚举变量抽象成yaml文件,统一管理,便于第三方服务对接advisor时调用当前类查询所有枚举变量参数的默认值和可选值 - - ARGUMENTS = "arguments" - ENVS = "envs" - OPTIONS = "options" - DEFAULT = "default" - TYPE = "type" - STR_TYPE = "str" - LIST_TYPE = "list" - INT_TYPE = "int" - BOOLEAN_TYPE = "boolean" - - def __init__(self): - enum_params_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "config", - "enum_parameters.yaml") - self.enum_params = FileManager.read_yaml_file(enum_params_path) - self._set_value() - - def get_keys(self): - return list(self.get_arguments_keys()) + list(self.get_envs_keys()) - - def get_arguments_keys(self): - return list(self.enum_params.get(self.ARGUMENTS, {}).keys()) - - def get_envs_keys(self): - return list(self.enum_params.get(self.ENVS, {}).keys()) - - def get_options(self, key, filter_func=None): - options = [] - for param_type in [self.ARGUMENTS, self.ENVS]: - if key not in self.enum_params.get(param_type, {}): - continue - options = self.enum_params.get(param_type, {}).get(key, {}).get(self.OPTIONS, []) - - if not options: - logger.error("Key %s not exists, optionals are %s", key, self.get_keys()) - - if filter_func is not None and callable(filter_func): - options = [value for value in options if filter_func(value)] - - return options - - def get_value_type(self, key): - for param_type in [self.ARGUMENTS, self.ENVS]: - if key not in self.enum_params.get(param_type, {}): - continue - value_type = self.enum_params.get(param_type, {}).get(key, {}).get(self.TYPE, self.STR_TYPE) - return value_type - return self.STR_TYPE - - def get_default(self, key): - default_value = None - for param_type in [self.ARGUMENTS, self.ENVS]: - if key not in self.enum_params.get(param_type, {}): - continue - default_value = self.enum_params.get(param_type, {}).get(key, {}).get(self.DEFAULT, []) - - if not default_value: - logger.error("Key %s not exists, optionals are %s", key, self.get_keys()) - - return default_value - - def _set_value(self): - - for key in self.get_keys(): - - if not hasattr(self, key): - setattr(self, str(key), AdvisorDict()) - - options = self.get_options(key) - - for value in options: - if not isinstance(value, typing.Hashable): - continue - getattr(self, key)[str(value)] = value +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- + +# Copyright (C) 2024-2024. Huawei Technologies Co., Ltd. All rights reserved. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
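`EnumParamsParser` below is wrapped by a `@singleton` decorator, and its options become instance attributes set in `_set_value`; callers therefore go through the instance, as in `EnumParamsParser().profiling_type`. A sketch of why the call matters, assuming a function-based singleton similar in spirit to the repo's:

```python
# With a function-based singleton, the class name is rebound to a factory
# function, so attributes set in __init__ exist only on the shared instance.
def singleton(cls):
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


@singleton
class Params:
    def __init__(self):
        self.profiling_type = {"pytorch": "pytorch", "mslite": "mslite"}


assert Params() is Params()               # one shared instance
print(Params().profiling_type["mslite"])  # OK: read from the instance
# Params.profiling_type would fail: Params is now a function, not the class.
```

This is also why `format_suggestion_content` in `operator_checker.py` instantiates the parser before reading `profiling_type`.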
+ +import os +import logging +import typing + +from msprof_analyze.advisor.common.timeline.event import AdvisorDict +from msprof_analyze.advisor.utils.utils import singleton +from msprof_analyze.prof_common.file_manager import FileManager + +logger = logging.getLogger() + + +@singleton +class EnumParamsParser(): + # 枚举变量抽象成yaml文件,统一管理,便于第三方服务对接advisor时调用当前类查询所有枚举变量参数的默认值和可选值 + + ARGUMENTS = "arguments" + ENVS = "envs" + OPTIONS = "options" + DEFAULT = "default" + TYPE = "type" + STR_TYPE = "str" + LIST_TYPE = "list" + INT_TYPE = "int" + BOOLEAN_TYPE = "boolean" + + def __init__(self): + enum_params_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "config", + "enum_parameters.yaml") + self.enum_params = FileManager.read_yaml_file(enum_params_path) + self._set_value() + + def get_keys(self): + return list(self.get_arguments_keys()) + list(self.get_envs_keys()) + + def get_arguments_keys(self): + return list(self.enum_params.get(self.ARGUMENTS, {}).keys()) + + def get_envs_keys(self): + return list(self.enum_params.get(self.ENVS, {}).keys()) + + def get_options(self, key, filter_func=None): + options = [] + for param_type in [self.ARGUMENTS, self.ENVS]: + if key not in self.enum_params.get(param_type, {}): + continue + options = self.enum_params.get(param_type, {}).get(key, {}).get(self.OPTIONS, []) + + if not options: + logger.error("Key %s not exists, optionals are %s", key, self.get_keys()) + + if filter_func is not None and callable(filter_func): + options = [value for value in options if filter_func(value)] + + return options + + def get_value_type(self, key): + for param_type in [self.ARGUMENTS, self.ENVS]: + if key not in self.enum_params.get(param_type, {}): + continue + value_type = self.enum_params.get(param_type, {}).get(key, {}).get(self.TYPE, self.STR_TYPE) + return value_type + return self.STR_TYPE + + def get_default(self, key): + default_value = None + for param_type in [self.ARGUMENTS, self.ENVS]: + if key not in self.enum_params.get(param_type, {}): + continue + default_value = self.enum_params.get(param_type, {}).get(key, {}).get(self.DEFAULT, []) + + if not default_value: + logger.error("Key %s not exists, optionals are %s", key, self.get_keys()) + + return default_value + + def _set_value(self): + + for key in self.get_keys(): + + if not hasattr(self, key): + setattr(self, str(key), AdvisorDict()) + + options = self.get_options(key) + + for value in options: + if not isinstance(value, typing.Hashable): + continue + getattr(self, key)[str(value)] = value diff --git a/profiler/msprof_analyze/advisor/config/enum_parameters.yaml b/profiler/msprof_analyze/advisor/config/enum_parameters.yaml index 678fe72b43c7f5b2fd66b3f38c3114cc9793cd50..534859eb9d08887ca35a65b12db70f5cca4a1716 100644 --- a/profiler/msprof_analyze/advisor/config/enum_parameters.yaml +++ b/profiler/msprof_analyze/advisor/config/enum_parameters.yaml @@ -1,58 +1,58 @@ -arguments: - cann_version: - type: str - options: - - 6.3.RC2 - - 7.0.RC1 - - 7.0.0 - - 8.0.RC1 - - 8.0.RC2 - - 8.0.0 - default: 8.0.0 - - torch_version: - type: str - options: - - 1.11.0 - - 2.1.0 - default: 2.1.0 - mindspore_version: - type: str - options: - - 2.3.0 - - 2.4.0 - default: 2.4.0 - analysis_dimensions: - type: list - options: - - [ computation, communication, schedule, memory ] - - [ computation ] - - [ communication ] - - [ schedule ] - - [ memory ] - default: [ computation, communication, schedule, memory ] - - profiling_type: - type: str - options: - - pytorch - - mslite - - msprof - - mindspore - 
default: pytorch - -envs: - ADVISOR_ANALYZE_PROCESSES: - type: int - options: [ 1, 2, 3, 4, 5, 6, 7, 8 ] - default: 1 - - DISABLE_PROFILING_COMPARISON: - type: boolean - options: [ true, false ] - default: false - - DISABLE_AFFINITY_API: - type: boolean - options: [ true, false ] - default: false +arguments: + cann_version: + type: str + options: + - 6.3.RC2 + - 7.0.RC1 + - 7.0.0 + - 8.0.RC1 + - 8.0.RC2 + - 8.0.0 + default: 8.0.0 + + torch_version: + type: str + options: + - 1.11.0 + - 2.1.0 + default: 2.1.0 + mindspore_version: + type: str + options: + - 2.3.0 + - 2.4.0 + default: 2.4.0 + analysis_dimensions: + type: list + options: + - [ computation, communication, schedule, memory ] + - [ computation ] + - [ communication ] + - [ schedule ] + - [ memory ] + default: [ computation, communication, schedule, memory ] + + profiling_type: + type: str + options: + - pytorch + - mslite + - msprof + - mindspore + default: pytorch + +envs: + ADVISOR_ANALYZE_PROCESSES: + type: int + options: [ 1, 2, 3, 4, 5, 6, 7, 8 ] + default: 1 + + DISABLE_PROFILING_COMPARISON: + type: boolean + options: [ true, false ] + default: false + + DISABLE_AFFINITY_API: + type: boolean + options: [ true, false ] + default: false diff --git a/profiler/msprof_analyze/advisor/dataset/cluster/cluster_dataset.py b/profiler/msprof_analyze/advisor/dataset/cluster/cluster_dataset.py index 4489dde44621e5650f664cd8e28262f2df613c84..b47f6d4518b45d84497fe4eac87cfe11d0fccb04 100644 --- a/profiler/msprof_analyze/advisor/dataset/cluster/cluster_dataset.py +++ b/profiler/msprof_analyze/advisor/dataset/cluster/cluster_dataset.py @@ -50,8 +50,8 @@ class ClusterDataset(Dataset): if self.is_cluster_analysis_output_exist(): return parameter = { - Constant.PROFILING_PATH: self.collection_path, - Constant.MODE: "all", + Constant.COLLECTION_PATH: self.collection_path, + Constant.ANALYSIS_MODE: "all", Constant.CLUSTER_ANALYSIS_OUTPUT_PATH: self.output_path } logger.info("cluster analysis is in the process, please wait...") diff --git a/profiler/msprof_analyze/advisor/dataset/communication/hccl_detail_dataset.py b/profiler/msprof_analyze/advisor/dataset/communication/hccl_detail_dataset.py index a1d5425b5431b9dc8149b957ae8deb95a2f9295d..fac5603b99bfd4956503fc76d6355edb8da54941 100644 --- a/profiler/msprof_analyze/advisor/dataset/communication/hccl_detail_dataset.py +++ b/profiler/msprof_analyze/advisor/dataset/communication/hccl_detail_dataset.py @@ -39,8 +39,7 @@ class HcclDetailDataset: @staticmethod def _get_hccl_pid(tasks: List[TaskInfo]): for task in tasks: - if task.name == "process_name" and hasattr(task, "args") \ - and task.args.get("name", None) in ["Communication", "HCCL"]: + if task.name == "process_name" and hasattr(task, "args") and task.args.get("name", None) == "HCCL": return task.pid return -1 diff --git a/profiler/msprof_analyze/advisor/display/html/priority_background_color.py b/profiler/msprof_analyze/advisor/display/html/priority_background_color.py index f5b89b232f4f0b2b04ec559149fc96768997ea85..6b03747a81b532364816e171b846adac1f1883fa 100644 --- a/profiler/msprof_analyze/advisor/display/html/priority_background_color.py +++ b/profiler/msprof_analyze/advisor/display/html/priority_background_color.py @@ -1,19 +1,19 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -class PriorityBackgroundColor: - high = "#B5495B" - medium = "#fcaf17" - low = "#65c294" +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +class PriorityBackgroundColor: + high = "#B5495B" + medium = "#fcaf17" + low = "#65c294" diff --git a/profiler/msprof_analyze/advisor/display/html/templates/ai_core_performance.html b/profiler/msprof_analyze/advisor/display/html/templates/ai_core_performance.html deleted file mode 100644 index 77e5e0cb55200efdf5b854e03ac2844ddc631a8f..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/advisor/display/html/templates/ai_core_performance.html +++ /dev/null @@ -1,159 +0,0 @@ -{% if format_result|length > 0 %} -
    -

    AI CORE Performance Analysis

    -
    - {% if language == "cn" %} - {% set title_ns = namespace(type='类别', desc='描述及建议', opti_set='性能优化算子集合', bound_set='bound算子集合', affinity_set='不亲和算子集合', - opti_refer=' 参考性能优化空间: ', bound_refer=' bound类型为: ', affinity_refer=' 不亲和类型为: ', title_desc='算子相关分析,参考如下: ') %} - {% else %} - {% set title_ns = namespace(type='Type', desc='Description and Suggestion', opti_set='set of performance optimization operators', - bound_set='set of bound operators', affinity_set='set of unaffine operators', opti_refer=' refer to Performance Optimization Space: ', - bound_refer=' bound type: ', affinity_refer=' type of disaffinity: ', title_desc=' Operator related analysis, referenced below: ') %} - {% endif %} - {% if format_result.cube[0]|length + format_result.cube[1]|length + format_result.cube[2]|length > 0 %} - MatMul{{ title_ns.title_desc }} -
    -
    - - - - - {% set opti_ns = namespace(total_opti='') %} - {% for opti in format_result.cube[0] %} - {% if not loop.first %} - {% set opti_ns.total_opti = opti_ns.total_opti ~ "
    " ~ opti.op_name ~ " operator shape: " ~ opti.shape ~ " dtype: " ~ opti.dtype ~ title_ns.opti_refer ~ opti.optimization ~ "%" %} - {% else %} - {% set opti_ns.total_opti = opti.op_name ~ " operator shape: " ~ opti.shape ~ " dtype: " ~ opti.dtype ~ title_ns.opti_refer ~ opti.optimization ~ "%" %} - {% endif %} - {% endfor %} - {% if opti_ns.total_opti|length > 0 %} -
    - - - - {% endif %} - {% set bound_ns = namespace(total_bound='') %} - {% for bound in format_result.cube[1] %} - {% if not loop.first %} - {% set bound_ns.total_bound = bound_ns.total_bound ~ "
    " ~ bound.op_name ~ " operator shape: " ~ bound.shape ~ " dtype: " ~ bound.dtype ~ title_ns.bound_refer ~ bound.bound %} - {% else %} - {% set bound_ns.total_bound = bound.op_name ~ " operator shape: " ~ bound.shape ~ " dtype: " ~ bound.dtype ~ title_ns.bound_refer ~ bound.bound %} - {% endif %} - {% endfor %} - {% if bound_ns.total_bound|length > 0 %} -
    - - - - {% endif %} - {% set affinity_ns = namespace(total_affinity='') %} - {% for affinity in format_result.cube[2] %} - {% if not loop.first %} - {% set affinity_ns.total_affinity = affinity_ns.total_affinity ~ "
    " ~ affinity.op_name ~ " operator shape: " ~ affinity.shape ~ " dtype: " ~ affinity.dtype ~ title_ns.affinity_refer ~ affinity.suggestion %} - {% else %} - {% set affinity_ns.total_affinity = affinity.op_name ~ " operator shape: " ~ affinity.shape ~ " dtype: " ~ affinity.dtype ~ title_ns.affinity_refer ~ affinity.suggestion %} - {% endif %} - {% endfor %} - {% if affinity_ns.total_affinity|length > 0 %} -
    - - - - {% endif %} -
    {{ title_ns.type }}{{ title_ns.desc }}
    {{ title_ns.opti_set }}{{ opti_ns.total_opti | safe }}
    {{ title_ns.bound_set }}{{ bound_ns.total_bound | safe }}
    {{ title_ns.affinity_set }}{{ affinity_ns.total_affinity | safe }}
    - {% endif %} - - {% if format_result.fa[0]|length + format_result.fa[1]|length + format_result.fa[2]|length > 0 %} - FA{{ title_ns.title_desc }} -
    - - - - - - {% set opti_ns = namespace(total_opti='') %} - {% for opti in format_result.fa[0] %} - {% if not loop.first %} - {% set opti_ns.total_opti = opti_ns.total_opti ~ "
    " ~ opti.op_name ~ " operator shape: " ~ opti.shape ~ " dtype: " ~ opti.dtype ~ title_ns.opti_refer ~ opti.optimization ~ "%" %} - {% else %} - {% set opti_ns.total_opti = opti.op_name ~ " operator shape: " ~ opti.shape ~ " dtype: " ~ opti.dtype ~ title_ns.opti_refer ~ opti.optimization ~ "%" %} - {% endif %} - {% endfor %} - {% if opti_ns.total_opti|length > 0 %} - - - - - {% endif %} - {% set bound_ns = namespace(total_bound='') %} - {% for bound in format_result.fa[1] %} - {% if not loop.first %} - {% set bound_ns.total_bound = bound_ns.total_bound ~ "
    " ~ bound.op_name ~ " operator shape: " ~ bound.shape ~ " dtype: " ~ bound.dtype ~ title_ns.bound_refer ~ bound.bound %} - {% else %} - {% set bound_ns.total_bound = bound.op_name ~ " operator shape: " ~ bound.shape ~ " dtype: " ~ bound.dtype ~ title_ns.bound_refer ~ bound.bound %} - {% endif %} - {% endfor %} - {% if bound_ns.total_bound|length > 0 %} - - - - - {% endif %} - {% set affinity_ns = namespace(total_affinity='') %} - {% for affinity in format_result.fa[2] %} - {% if not loop.first %} - {% set affinity_ns.total_affinity = affinity_ns.total_affinity ~ "
    " ~ affinity.op_name ~ " operator shape: " ~ affinity.shape ~ " dtype: " ~ affinity.dtype ~ title_ns.affinity_refer ~ affinity.suggestion %} - {% else %} - {% set affinity_ns.total_affinity = affinity.op_name ~ " operator shape: " ~ affinity.shape ~ " dtype: " ~ affinity.dtype ~ title_ns.affinity_refer ~ affinity.suggestion %} - {% endif %} - {% endfor %} - {% if affinity_ns.total_affinity|length > 0 %} - - - - - {% endif %} -
    {{ title_ns.type }}{{ title_ns.desc }}
    {{ title_ns.opti_set }}{{ opti_ns.total_opti | safe }}
    {{ title_ns.bound_set }}{{ bound_ns.total_bound | safe }}
    {{ title_ns.affinity_set }}{{ affinity_ns.total_affinity | safe }}
    - {% endif %} - - {% if format_result.vector[0]|length + format_result.vector[1]|length > 0 %} - Vector{{ title_ns.title_desc }} -
    - - - - - - {% set opti_ns = namespace(total_opti='') %} - {% for opti in format_result.vector[0] %} - {% if not loop.first %} - {% set opti_ns.total_opti = opti_ns.total_opti ~ "
    " ~ opti.op_name ~ " operator shape: " ~ opti.shape ~ " dtype: " ~ opti.dtype ~ title_ns.opti_refer ~ opti.optimization ~ "%" %} - {% else %} - {% set opti_ns.total_opti = opti.op_name ~ " operator shape: " ~ opti.shape ~ " dtype: " ~ opti.dtype ~ title_ns.opti_refer ~ opti.optimization ~ "%" %} - {% endif %} - {% endfor %} - {% if opti_ns.total_opti|length > 0 %} - - - - - {% endif %} - {% set bound_ns = namespace(total_bound='') %} - {% for bound in format_result.vector[1] %} - {% if not loop.first %} - {% set bound_ns.total_bound = bound_ns.total_bound ~ "
    " ~ bound.op_name ~ " operator shape: " ~ bound.shape ~ " dtype: " ~ bound.dtype ~ title_ns.bound_refer ~ bound.bound %} - {% else %} - {% set bound_ns.total_bound = bound.op_name ~ " operator shape: " ~ bound.shape ~ " dtype: " ~ bound.dtype ~ title_ns.bound_refer ~ bound.bound %} - {% endif %} - {% endfor %} - {% if bound_ns.total_bound|length > 0 %} - - - - - {% endif %} -
    {{ title_ns.type }}{{ title_ns.desc }}
    {{ title_ns.opti_set }}{{ opti_ns.total_opti | safe }}
    {{ title_ns.bound_set }}{{ bound_ns.total_bound | safe }}
    - {% endif %} -
    -
    -{% endif %} \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/display/html/templates/comparison.html b/profiler/msprof_analyze/advisor/display/html/templates/comparison.html index 5963e75308c447a386f50517587e857c237fc061..b81802d6b0505ca4a21e5174a0158b800d4a43ec 100644 --- a/profiler/msprof_analyze/advisor/display/html/templates/comparison.html +++ b/profiler/msprof_analyze/advisor/display/html/templates/comparison.html @@ -1,25 +1,25 @@ -{% if rows|length > 0 %} -
    -

    {{ sheet_name }}

    -
    - Issue: {{ desc }} -

    - - - {% for header in headers %} - - {% endfor %} - - - {% for row in rows %} - - {% for element in row %} - - {% endfor %} - - {% endfor %} -
    {{ header }}
    {{ element|safe }}
    - -
    -
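This template consumes a small context: `sheet_name`, `desc`, `headers`, and `rows`. A minimal sketch of rendering an equivalent table with the `jinja2` library directly; the repo's `HTMLRender` wraps this with its own template lookup, so the values below are illustrative only:

```python
from jinja2 import Template

# Simplified stand-in for the comparison template; context keys match above.
TEMPLATE = Template(
    "<h3>{{ sheet_name }}</h3><p>Issue: {{ desc }}</p>"
    "<table><tr>{% for header in headers %}<th>{{ header }}</th>{% endfor %}</tr>"
    "{% for row in rows %}<tr>{% for element in row %}"
    "<td>{{ element }}</td>{% endfor %}</tr>{% endfor %}</table>"
)

html = TEMPLATE.render(sheet_name="Kernel compare",
                       desc="kernel duration differences between two runs",
                       headers=["Op", "Diff"],
                       rows=[["MatMul", "+12%"], ["FlashAttentionScore", "-3%"]])
print(html)
```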
    +{% if rows|length > 0 %} +
    +

    {{ sheet_name }}

    +
    + Issue: {{ desc }} +

    + + + {% for header in headers %} + + {% endfor %} + + + {% for row in rows %} + + {% for element in row %} + + {% endfor %} + + {% endfor %} +
    {{ header }}
    {{ element|safe }}
    + +
    +
    {% endif %} \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/display/html/templates/memory.html b/profiler/msprof_analyze/advisor/display/html/templates/memory.html index a3d75877b60ef3481a13572fbd6b0e2bb5eaf2a0..2bf57f46a1ee1b38302b9096f07b6754ca41ce82 100644 --- a/profiler/msprof_analyze/advisor/display/html/templates/memory.html +++ b/profiler/msprof_analyze/advisor/display/html/templates/memory.html @@ -1,21 +1,21 @@ -
    -

    Memory Operator Issues

    -
    - {% if rank is not none %} - Analysis of rank {{ rank|safe }}. - {% endif %} - {{ desc }} - - - - - - {% for suggestion in suggestions %} - - - - {% endfor %} -
    Suggestions
    {{ loop.index }}. {{ suggestion|safe }}
    - -
    -
    +
    +

    Memory Operator Issues

    +
    + {% if rank is not none %} + Analysis of rank {{ rank|safe }}. + {% endif %} + {{ desc }} + + + + + + {% for suggestion in suggestions %} + + + + {% endfor %} +
    Suggestions
    {{ loop.index }}. {{ suggestion|safe }}
    + +
    +
    diff --git a/profiler/msprof_analyze/advisor/display/html/templates/pp_stage_computation_analysis.html b/profiler/msprof_analyze/advisor/display/html/templates/pp_stage_computation_analysis.html index 189e6fadf863e5d1ec930690e9ef8b3012d15c51..6d2792f31ae635fe2d5863a902de50f7fa76b46f 100644 --- a/profiler/msprof_analyze/advisor/display/html/templates/pp_stage_computation_analysis.html +++ b/profiler/msprof_analyze/advisor/display/html/templates/pp_stage_computation_analysis.html @@ -1,19 +1,19 @@ -{% if stages_rendered_html|length > 0 %} -
    -

    Pipeline Parallel Stages Issues

    -
    - {% for stage_html in stages_rendered_html %} -
    -

    {{ stage_html['stage']|safe }}

    -
    - Description: analysis for slow rank {{ stage_html['rank']|safe }} in current stage -

    - {% for html in stage_html['html_list'] %} - {{ html|safe }} - {% endfor %} -
    -
    - {% endfor %} -
    -
    +{% if stages_rendered_html|length > 0 %} +
    +

    Pipeline Parallel Stages Issues

    +
    + {% for stage_html in stages_rendered_html %} +
    +

    {{ stage_html['stage']|safe }}

    +
    + Description: analysis for slow rank {{ stage_html['rank']|safe }} in current stage +

    + {% for html in stage_html['html_list'] %} + {{ html|safe }} + {% endfor %} +
    +
    + {% endfor %} +
    +
    {% endif %} \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/display/html/templates/slow_dataloader.html b/profiler/msprof_analyze/advisor/display/html/templates/slow_dataloader.html index b9ce7a574ab2a838633cb7c5181cfecb737097c9..2a3b2c4462aa6666b3ed42cc77995698ed8ce1c3 100644 --- a/profiler/msprof_analyze/advisor/display/html/templates/slow_dataloader.html +++ b/profiler/msprof_analyze/advisor/display/html/templates/slow_dataloader.html @@ -1,21 +1,21 @@ -
    -

    Slow Dataloader Issues

    -
    - {% if rank is not none %} - Analysis of rank {{ rank|safe }}. - {% endif %} - {{ desc }} - - - - - - {% for suggestion in suggestions %} - - - - {% endfor %} -
    Suggestions
    {{ loop.index }}. {{ suggestion|safe }}
    - -
    -
    +
    +

    Slow Dataloader Issues

    +
    + {% if rank is not none %} + Analysis of rank {{ rank|safe }}. + {% endif %} + {{ desc }} + + + + + + {% for suggestion in suggestions %} + + + + {% endfor %} +
    Suggestions
    {{ loop.index }}. {{ suggestion|safe }}
    + +
    +
    diff --git a/profiler/msprof_analyze/advisor/display/html/templates/sync_batchnorm.html b/profiler/msprof_analyze/advisor/display/html/templates/sync_batchnorm.html index 402404c8a43706ec4a598300eec42c7d2b7767cc..ea322276645ae9ca374f699ed7dcbaec1caad1d8 100644 --- a/profiler/msprof_analyze/advisor/display/html/templates/sync_batchnorm.html +++ b/profiler/msprof_analyze/advisor/display/html/templates/sync_batchnorm.html @@ -1,33 +1,33 @@ - -
    -

    SyncBatchNorm Issues

    -
    - {% if rank is not none %} - Analysis of rank {{ rank|safe }}. - {% endif %} - {{ desc }} - - - - - {% for item in solutions %} - {% set rowloop = loop %} - {% for key, value in item.items() %} - - - - {% endfor %} - {% endfor %} -
    Suggestions
    {{ rowloop.index }}. {{ value.desc }}
    - - More efficient code of syncbn forward as follows: - {% for item in solutions %} - {% for key, value in item.items() %} - {% if 'efficient_code' in value %} -
    {{ value.efficient_code|safe }}
    - {% endif %} - {% endfor %} - {% endfor %} - -
    -
    + +
    +

    SyncBatchNorm Issues

    +
    + {% if rank is not none %} + Analysis of rank {{ rank|safe }}. + {% endif %} + {{ desc }} + + + + + {% for item in solutions %} + {% set rowloop = loop %} + {% for key, value in item.items() %} + + + + {% endfor %} + {% endfor %} +
    Suggestions
    {{ rowloop.index }}. {{ value.desc }}
    + + More efficient code of syncbn forward as follows: + {% for item in solutions %} + {% for key, value in item.items() %} + {% if 'efficient_code' in value %} +
    {{ value.efficient_code|safe }}
    + {% endif %} + {% endfor %} + {% endfor %} + +
    +
    diff --git a/profiler/msprof_analyze/advisor/display/html/templates/synchronize_stream.html b/profiler/msprof_analyze/advisor/display/html/templates/synchronize_stream.html index eb132a6315d223ed36b096c7f6087cb53ca071d4..8636740275a66a5a7ba46703b978385cbc2df3a3 100644 --- a/profiler/msprof_analyze/advisor/display/html/templates/synchronize_stream.html +++ b/profiler/msprof_analyze/advisor/display/html/templates/synchronize_stream.html @@ -1,26 +1,26 @@ -
    -

    Synchronize Stream Issues

    -
    - {% if rank is not none %} - Analysis of rank {{ rank|safe }}. - {% endif %} - {{ desc }} - - - - - - - {% for item in solutions %} - {% set rowloop = loop %} - {% for key, value in item.items() %} - - - - - {% endfor %} - {% endfor %} -
    Suggestions
    {{ rowloop.index }}. {{ value.desc }}
    - -
    -
    +
    +

    Synchronize Stream Issues

    +
    + {% if rank is not none %} + Analysis of rank {{ rank|safe }}. + {% endif %} + {{ desc }} + + + + + + + {% for item in solutions %} + {% set rowloop = loop %} + {% for key, value in item.items() %} + + + + + {% endfor %} + {% endfor %} +
    Suggestions
    {{ rowloop.index }}. {{ value.desc }}
    + +
    +
    diff --git a/profiler/msprof_analyze/advisor/img/AI Core Performance analysis.png b/profiler/msprof_analyze/advisor/img/AI Core Performance analysis.png deleted file mode 100644 index 37708366c990fb899a9b4a846dc81fa43d5e1d43..0000000000000000000000000000000000000000 Binary files a/profiler/msprof_analyze/advisor/img/AI Core Performance analysis.png and /dev/null differ diff --git a/profiler/msprof_analyze/advisor/img/Fusible Operator Analysis.png b/profiler/msprof_analyze/advisor/img/Fusible Operator Analysis.png deleted file mode 100644 index 332b9ff838130e0daa625691aef88059dd31918d..0000000000000000000000000000000000000000 Binary files a/profiler/msprof_analyze/advisor/img/Fusible Operator Analysis.png and /dev/null differ diff --git a/profiler/msprof_analyze/advisor/interface/interface.py b/profiler/msprof_analyze/advisor/interface/interface.py index 99359174de6ecac9257189f6d3c820f39aca9f72..b3afefee57c8c62030af17130f79413238588f8f 100644 --- a/profiler/msprof_analyze/advisor/interface/interface.py +++ b/profiler/msprof_analyze/advisor/interface/interface.py @@ -1,4 +1,5 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2024-2025. All rights reserved. +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -43,8 +44,6 @@ from msprof_analyze.advisor.analyzer.schedule.gc.gc_analyzer import GcAnalyzer from msprof_analyze.advisor.analyzer.schedule.conjectured_gc.conjectured_gc_analyzer import ConjecturedGcAnalyzer from msprof_analyze.advisor.analyzer.comparison.comparison_analyzer import ComparisonAnalyzer from msprof_analyze.advisor.analyzer.schedule.fusible_ops.fusible_operator_analyzer import FusibleOperatorAnalyzer -from msprof_analyze.advisor.analyzer.computation.ai_core_performance.ai_core_performance_analyzer import \ - AICorePerformanceAnalyzer logger = logging.getLogger() @@ -75,8 +74,7 @@ class Interface: SupportedScopes.OPERATOR_NO_BOUND_ANALYSIS: OperatorBoundAnalyzer, SupportedScopes.BLOCK_DIM_ANALYSIS: BlockDimAnalyzer, SupportedScopes.GRAPH: FusionOPAnalyzer, - SupportedScopes.FREQ_ANALYSIS: AICoreFreqAnalyzer, - SupportedScopes.AICORE_PERFORMANCE_ANALYSIS: AICorePerformanceAnalyzer + SupportedScopes.FREQ_ANALYSIS: AICoreFreqAnalyzer }), COMMUNICATION: OrderedDict({SupportedScopes.PACKET: PacketAnalyzer, SupportedScopes.COMMUNICATION_RETRANSMISSION_DETECTION: RDMARetransmissionAnalyzer, diff --git a/profiler/msprof_analyze/advisor/rules/cn/aicore_performance.yaml b/profiler/msprof_analyze/advisor/rules/cn/aicore_performance.yaml deleted file mode 100644 index dcdc3e188f4684c4a80e5a3e064878fb823e3b70..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/advisor/rules/cn/aicore_performance.yaml +++ /dev/null @@ -1,48 +0,0 @@ -cube_problem: "Cube算子性能分析" -fa_problem: "FA算子性能分析" -vector_problem: "Vector算子性能分析" -description: "提供一些AICORE算子的参考瓶颈" -bound_description: "bound算子集合" -optimization_description: "性能优化算子集合" -affinity_description: "不亲和算子集合" -cube_affinity_desc: "内轴无法被256整除" -fa_affinity_desc_head_dim_128: "D不能被128整除" -fa_affinity_desc_seq_len_128: "S不能被128整除" -fa_affinity_desc_head_dim_seq_len_128: "D和S均不能被128整除" -suggestion: "请根据亲和性、bound类型或优化空间尝试分析筛选出来的算子" -affinity_suggestion: "{op_name}算子 shape: {shape} dtype: {dtype} 有不亲和特征: {suggestion}\n" -bound_suggestion: "{op_name}算子 shape: {shape} dtype: {dtype} bound类型为: {bound} bound\n" -optimization_suggestion: "{op_name}算子 shape: {shape} dtype: 
{dtype} 疑似有性能优化空间,参考性能优化空间: {optimization}%\n" - -cube_operators: - - target: aic_mac_ratio - bound: mac - threshold: 0.8 - - target: aic_mte2_ratio - bound: mte2 - threshold: 0.95 - -fa_operators: - - target: aic_mte2_ratio - bound: mac - threshold: 0.8 - - target: aic_fixpipe_ratio - bound: fixpipe - threshold: 0.75 - - target: aiv_vec_ratio - bound: vec - threshold: 0.75 - -vector_operators: - - target: total - bound: vec_mte2_mte3 - threshold: 0.9 - - target: aiv_vec_ratio - bound: vec - threshold: 0.7 - - target: aiv_mte2_ratio - bound: mte2 - threshold: 0.7 - - target: aiv_mte3_ratio - bound: mte3 - threshold: 0.7 \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/rules/en/aicore_performance.yaml b/profiler/msprof_analyze/advisor/rules/en/aicore_performance.yaml deleted file mode 100644 index 68ab59f16937880c7428330811005297d1551b0d..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/advisor/rules/en/aicore_performance.yaml +++ /dev/null @@ -1,48 +0,0 @@ -cube_problem: "Cube operator performance analysis" -fa_problem: "FA operator performance analysis" -vector_problem: "Vector operator performance analysis" -description: "Provide some reference bottlenecks for the AICORE operator" -bound_description: "set of bound operators" -optimization_description: "set of performance optimization operators" -affinity_description: "set of unaffine operators" -cube_affinity_desc: "Then inner axis is not divisible by 256" -fa_affinity_desc_head_dim_128: "D is not divisible by 128" -fa_affinity_desc_seq_len_128: "S is not divisible by 128" -fa_affinity_desc_head_dim_seq_len_128: "Neither D nor S is not divisible by 128" -suggestion: "Please try to analyze the filtered operators based on affinity, bound type or optimization space" -affinity_suggestion: "{op_name} Op shape: {shape} dtype: {dtype} with disaffection characteristics: {suggestion}\n" -bound_suggestion: "{op_name} Op shape: {shape} dtype: {dtype} bound type: {bound} bound\n" -optimization_suggestion: "{op_name} Op shape: {shape} dtype: {dtype} suspect there is room for performance optimization, refer to Performance Optimization Space: {optimization}%\n" - -cube_operators: - - target: aic_mac_ratio - bound: mac - threshold: 0.8 - - target: aic_mte2_ratio - bound: mte2 - threshold: 0.95 - -fa_operators: - - target: aic_mte2_ratio - bound: mac - threshold: 0.8 - - target: aic_fixpipe_ratio - bound: fixpipe - threshold: 0.75 - - target: aiv_vec_ratio - bound: vec - threshold: 0.75 - -vector_operators: - - target: total - bound: vec_mte2_mte3 - threshold: 0.9 - - target: aiv_vec_ratio - bound: vec - threshold: 0.7 - - target: aiv_mte2_ratio - bound: mte2 - threshold: 0.7 - - target: aiv_mte3_ratio - bound: mte3 - threshold: 0.7 \ No newline at end of file diff --git a/profiler/msprof_analyze/advisor/rules/timeline_fusion_ops.yaml b/profiler/msprof_analyze/advisor/rules/timeline_fusion_ops.yaml index 34de80add2ec849a648e64cf6c8b1e3edb1f0cc5..3337c938625ccd4b4ea77a0dafa9879222cf1bfe 100644 --- a/profiler/msprof_analyze/advisor/rules/timeline_fusion_ops.yaml +++ b/profiler/msprof_analyze/advisor/rules/timeline_fusion_ops.yaml @@ -66,23 +66,4 @@ torch_npu.npu_geglu: [ "(slice|chunk)-gelu-mul", "(slice|chunk)-mul-gelu" ] torch_npu.npu_group_norm_silu: [ "group_norm-silu" ] torch.addmm: [ "mul-mul-add" ] - torch_npu.npu_add_layer_norm: [ "add-layer_norm" ] - -- cann_version: 8.0.RC3 - torch_version: [1.11.0, 2.1.0] - unique_id: 4 - inherit_unique_id: 3 - operator_rules: - aten: - add: - 
mindspeed.ops.npu_matmul_add: [ "matmul-add" ] - -- cann_version: 8.0.RC3 - torch_version: [1.11.0, 2.1.0] - unique_id: 5 - inherit_unique_id: 4 - operator_rules: - aten: - add: - mindspeed.ops.npu_moe_token_permute: ["argsort-argsort-index_select"] - mindspeed.ops.npu_moe_token_unpermute: ["index_select-mul-reduce_sum"] \ No newline at end of file + torch_npu.npu_add_layer_norm: [ "add-layer_norm" ] \ No newline at end of file diff --git a/profiler/msprof_analyze/cli/cluster_cli.py b/profiler/msprof_analyze/cli/cluster_cli.py index adaf0f8d7cab8eff139125fbb4699ea962e3a427..0cdb2bd2b10b2ede411d10221e36a51e3f015e12 100644 --- a/profiler/msprof_analyze/cli/cluster_cli.py +++ b/profiler/msprof_analyze/cli/cluster_cli.py @@ -34,7 +34,6 @@ context_settings['ignore_unknown_options'] = True @click.option("--parallel_mode", type=str, help="context mode", default="concurrent") @click.option("--export_type", help="recipe export type", type=click.Choice(["db", "notebook"]), default="db") @click.option("--rank_list", type=str, help="Rank id list", default='all') -@click.option("--step_id", type=int, help="Step id", default=Constant.VOID_STEP) @click.argument('args', nargs=-1) def cluster_cli(**kwargs) -> None: Interface(kwargs).run() diff --git a/profiler/msprof_analyze/cli/entrance.py b/profiler/msprof_analyze/cli/entrance.py index 534a9b133c7e60d1442cb290490a79e9256ce43d..0aa61f1b6aee2a5b6b321e8e3fb7a04ed63ff98a 100644 --- a/profiler/msprof_analyze/cli/entrance.py +++ b/profiler/msprof_analyze/cli/entrance.py @@ -22,6 +22,7 @@ from msprof_analyze.cli.complete_cli import auto_complete_cli from msprof_analyze.cli.compare_cli import compare_cli from msprof_analyze.cli.cluster_cli import cluster_cli from msprof_analyze.advisor.version import print_version_callback, cli_version +from msprof_analyze.cli.precheck_cli import precheck_cli logger = logging.getLogger() CONTEXT_SETTINGS = dict(help_option_names=['-H', '-h', '--help'], @@ -31,7 +32,8 @@ COMMAND_PRIORITY = { "advisor": 1, "compare": 2, "cluster": 3, - "auto-completion": 4 + "precheck": 4, + "auto-completion": 5 } @@ -66,5 +68,6 @@ def msprof_analyze_cli(**kwargs): msprof_analyze_cli.add_command(analyze_cli, name="advisor") msprof_analyze_cli.add_command(compare_cli, name="compare") msprof_analyze_cli.add_command(cluster_cli, name="cluster") +msprof_analyze_cli.add_command(precheck_cli, name="precheck") msprof_analyze_cli.add_command(auto_complete_cli, name="auto-completion") diff --git a/profiler/msprof_analyze/cli/precheck_cli.py b/profiler/msprof_analyze/cli/precheck_cli.py new file mode 100644 index 0000000000000000000000000000000000000000..c70b540ce9e3a8c9f718fe6be34917f8fed7b85d --- /dev/null +++ b/profiler/msprof_analyze/cli/precheck_cli.py @@ -0,0 +1,159 @@ +#!/usr/bin/python +# -*- coding: utf-8 -*- +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
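The new CLI module below shares one option set between `start_all` and `start_node` by stacking `click.option` calls onto a wrapper inside a decorator. A reduced sketch of that pattern; the option names here are illustrative, not the full precheck surface:

```python
# Option-stacking sketch: applying click.option to a wraps()-preserved wrapper
# lets several commands share the same options without repeating them.
from functools import wraps

import click


def shared_options(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)

    wrapper = click.option('--nnodes', type=int, required=True)(wrapper)
    wrapper = click.option('--master_addr', default='127.0.0.1')(wrapper)
    return wrapper


@click.command()
@shared_options
def demo(nnodes, master_addr):
    click.echo(f"{nnodes} nodes, master at {master_addr}")


if __name__ == '__main__':
    demo()
```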
+import sys +import ipaddress +import logging +from functools import wraps + +import click + +from msprof_analyze.prof_common.path_manager import PathManager + +logger = logging.getLogger(__name__) +CONTEXT_SETTINGS = dict(help_option_names=['-H', '-h', '--help']) + + +@click.group(context_settings=CONTEXT_SETTINGS) +def precheck_cli(): + """Profiler precheck tool""" + pass + + +def common_options(f): + """Common options for both precheck and runner commands""" + + @wraps(f) + def wrapper(*args, **kwargs): + return f(*args, **kwargs) + + wrapper = click.option('--master_addr', required=True, + help='IP address of the master node (first node in the cluster)')(wrapper) + wrapper = click.option('--master_port', type=int, default=29500, + help='Port on the master node for communication. Default is 29500')(wrapper) + wrapper = click.option('--nnodes', type=int, required=True, + help='Total number of nodes in the distributed setup')(wrapper) + wrapper = click.option('--nproc_per_node', type=int, required=True, + help='Number of processes to run per node')(wrapper) + wrapper = click.option('--node_prof_save_dir', default='', callback=PathManager.expanduser_for_cli, + help='Directory for saving node profiling data')(wrapper) + wrapper = click.option('--master_prof_gather_dir', default='', callback=PathManager.expanduser_for_cli, + help='Directory for saving gathered profiling data in master node')(wrapper) + wrapper = click.option('--output_dir', default='./output', callback=PathManager.expanduser_for_cli, + help='Directory to save profiling dump data, logs, and advisor reports')(wrapper) + wrapper = click.option('--task_name', default='', + help='Name of the task or experiment')(wrapper) + wrapper = click.option('--static', is_flag=True, + help='If set, run profiling in static mode')(wrapper) + wrapper = click.option('--profiling_cmd', default="", + help='Command to run the profiler script')(wrapper) + wrapper = click.option('--prof_in_shared_storage', is_flag=True, + help='If set, skip data collection as profiling data is in shared storage')(wrapper) + return wrapper + + +def validate_ip_list(ctx, param, value): + if not value: + return [] + try: + ips = [ip.strip() for ip in value.split(',')] + # Validate each IP + for ip in ips: + ipaddress.ip_address(ip) + return ips + except ValueError as e: + raise click.BadParameter(f'Invalid IP address in list: {e}') + + +@precheck_cli.command(context_settings=CONTEXT_SETTINGS, + name="start_all", + short_help='Start precheck on all nodes via ssh') +@common_options +@click.option('--host_ips', + callback=validate_ip_list, + help='Comma-separated list of IP addresses for nodes in distributed training (e.g., "192.168.1.1,192.168.1.2")') +@click.option('--python_path', default=sys.executable, callback=PathManager.expanduser_for_cli, + help='Path to the Python interpreter') +@click.option('--host_config_file', default='', callback=PathManager.expanduser_for_cli, + help='Path to the host configuration file (CSV format with node connection details)') +def precheck_start_all(**kwargs): + """Run precheck command""" + # Add validation + if not kwargs.get('host_ips') and not kwargs.get('host_config_file'): + raise click.UsageError('Either --host_ips or --host_config_file must be specified') + + if kwargs.get('host_ips') and kwargs.get('host_config_file'): + raise click.UsageError('Cannot specify both --host_ips and --host_config_file') + + from msprof_analyze.precheck.manager.args_manager import PrecheckArgsManager + from msprof_analyze.precheck.__main__ import 
main as precheck_main + + args = PrecheckArgsManager(type('Args', (), kwargs)) + click.echo(args) + precheck_main(args) + + +@precheck_cli.command(context_settings=CONTEXT_SETTINGS, + name="start_node", + short_help='Start one node precheck, if your nnodes > 1, you need to run this command on each node') +@common_options +@click.option('--node_rank', type=int, required=True, + help='Rank of the current node') +def precheck_start_node(**kwargs): + """Run precheck runner command""" + from msprof_analyze.precheck.manager.args_manager import PrecheckRunnerArgsManager + from msprof_analyze.precheck.runner.__main__ import main as runner_main + + args = PrecheckRunnerArgsManager(type('Args', (), kwargs)) + click.echo(args) + + runner_main(args) + + +@precheck_cli.command(context_settings=CONTEXT_SETTINGS, + name="env", + short_help='execute environment precheck') +@click.option('--nproc_per_node', type=int, required=True, + help='Number of processes to run per node') +@click.option('--nnodes', type=int, required=True, + help='Total number of nodes in the distributed setup') +@click.option('--node_rank', type=int, required=True, + help='Rank of the current node') +@click.option('--master_addr', type=str, required=False, + help='IP address of the master node', default="localhost") +@click.option('--master_port', type=int, required=False, + help='Port on the master node for communication', default=6000) +@click.option('--tensor-model-parallel-size', type=int, required=False, + help='Degree of tensor parallelism', default=1) +@click.option('--pipeline-model-parallel-size', type=int, required=False, + help='Degree of pipeline parallelism', default=1) +@click.option('--context-parallel-size', type=int, required=False, + help='Degree of context parallelism', default=1) +@click.option('--expert-model-parallel-size', type=int, required=False, + help='Degree of expert parallelism', default=1) +@click.option('--output', type=str, required=False, + help='Output path', default="./output") +@click.option('--check-type', type=str, required=False, + help='Environment precheck type', default="all") +def environment_precheck(**kwargs): + from msprof_analyze.precheck.precheck import Precheck + + click.echo(kwargs) + Precheck.env_precheck(**kwargs) + + +if __name__ == '__main__': + precheck_cli() \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/README.md b/profiler/msprof_analyze/cluster_analyse/README.md index 325a0984793297dfac28673f04a582ea7b4316b9..5147fa651481f7ea894f9158cec1b36a56641634 100644 --- a/profiler/msprof_analyze/cluster_analyse/README.md +++ b/profiler/msprof_analyze/cluster_analyse/README.md @@ -1,329 +1,207 @@ -# 集群分析工具 -cluster_analyse(集群分析工具)是在集群场景下,通过此工具来进行集群数据的分析,当前主要对基于通信域的迭代内耗时分析、通信时间分析以及通信矩阵分析为主, 从而定位慢卡、慢节点以及慢链路问题。 - -## 性能数据采集 -当前集群调优工具主要支持PyTorch场景的Ascend PyTorch Profiler采集方式和MindSpore场景的MindSpore Profiler采集方式下的集群数据。 - -此工具只需要NPU的性能数据作为输入。 - -Ascend PyTorch Profiler采集方法请参见《[NPU性能数据采集](https://gitee.com/ascend/mstt/tree/master/profiler/msprof_analyze)》,MindSpore Profiler采集方法请参见《[性能调试](https://www.mindspore.cn/mindinsight/docs/zh-CN/r2.3/performance_profiling_ascend.html)》。 - -我们要求至少是L1级别的数据。 -```python -experimental_config = torch_npu.profiler._ExperimentalConfig( - profiler_level=torch_npu.profiler.ProfilerLevel.Level1 -) -``` -### 确认数据是否可用 - -打开采集到的某张卡数据(\*ascend_pt、\*ascend_ms结尾的文件夹),可用的数据应该具备: - -- ./profiler_info_x.json, -- ./ASCEND_PROFILER_OUTPUT/step_trace_time.csv, -- ./ASCEND_PROFILER_OUTPUT/trace_view.json, -- 
./ASCEND_PROFILER_OUTPUT/kernel_details.csv, -- ./ASCEND_PROFILER_OUTPUT/communication.json, -- ./ASCEND_PROFILER_OUTPUT/communication_matrix.json - -或者具备: - -- analysis.db -- ascend_pytorch_profiler_{rank_id}.db - -以上csv、json文件与db文件只能存在一类,否则集群分析工具解析异常。MindSpore场景暂不支持以上db文件。 - -确认这几个文件生成后,继续下面的集群分析。 - -## 数据汇聚与解析 - -### 操作步骤 - -1. 参见《[性能工具](../README.md)》完成工具安装。建议安装最新版本。 - -2. 将所有卡的数据拷贝并汇集到一个目录下,运行以下命令,在该目录下即可生成cluster_analysis_output文件夹。 - - ```bash - msprof-analyze cluster -d {cluster profiling data path} [-m mode] [-o output_path] [--data_simplification] [--force] - ``` - - 或 - - ```bash - python3 cluster_analysis.py -d {cluster profiling data path} [-m mode] [-o output_path] [--data_simplification] [--force] - ``` - - 参数说明: - - | 参数名 | 说明 | 是否必选 | - | --------------------- | ------------------------------------------------------------ | -------- | - | --profiling_path或-d | 性能数据汇集目录。未配置-o参数时,运行分析脚本之后会在该目录下自动创建cluster_analysis_output文件夹,保存分析数据。 | 是 | - | --output_path或-o | 自定义输出路径,运行分析脚本之后会在该目录下自动创建cluster_analysis_output文件夹,保存分析数据。 | 否 | - | --mode或-m | 数据解析模式,取值详见“**--mode参数说明**”表。 | 否 | - | --data_simplification | 数据精简模式。对于数据量过大的性能数据db文件,可以通过配置该参数将数据精简,并提高工具分析效率。配置该参数表示开启数据精简,默认未配置表示关闭。 | 否 | - | --force | 强制执行cluster。配置后可强制跳过如下情况:
    指定的目录、文件的用户属主不属于当前用户,忽略属主判断直接执行。
    csv文件大于5G、json文件大于10G、db文件大于8G,忽略文件过大判断直接执行。
    配置该参数表示开启强制执行,默认未配置表示关闭。 | 否 | - | --parallel_mode | 设置收集多卡、多节点db数据时的并发方式。取值为concurrent(使用concurrent.feature进程池实现并发)。
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum和自定义分析参数时可配置此参数。** | 否 | - | --export_type | 设置导出的数据形式。取值为db(.db格式文件)和notebook(Jupyter Notebook文件),默认值为db。
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum和自定义分析参数时可配置此参数。** | 否 | - | --rank_list | 对特定Rank上的数据进行统计,默认值为all(表示对所有Rank进行统计),须根据实际卡的Rank ID配置。应配置为大于等于0的整数,若所配置的值大于实际训练所运行的卡的Rank ID,则仅解析合法的RankID的数据,比如当前环境Rank ID为0到7,实际训练运行0到3卡,此时若配置Rank ID为0, 3, 4或不存在的10等其他值,则仅解析0和3。配置示例:--rank_list 0, 1, 2。
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum和自定义分析参数时可配置此参数。** | 否 | - | --step_id | 性能数据Step ID,配置后对该Step的性能数据进行分析。需配置性能数据中实际存在的Step ID,默认未配置,表示全量分析。配置示例:--step_id=1。
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum和自定义分析参数时可配置此参数。** | 否 | - | --top_num | 设置TopN耗时的通信算子的数量,默认值为15,配置示例:--top_num 20。
    **只有-m配置hccl_sum时可配置此参数。** | 否 | - | --exclude_op_name | 控制compute_op_name结果是否包含op_name,示例:--exclude_op_name,后面不需要跟参数。
    **只有-m配置compute_op_sum时可配置此参数。** | 否 | - - --mode参数说明: - - | 参数名 | 说明 | 是否必选 | - | -------------------- | ------------------------------------------------------------ | -------- | - | communication_matrix | 解析通信矩阵数据。 | 否 | - | communication_time | 解析通信耗时数据。 | 否 | - | all | 解析内容包括:
    通信矩阵communication_matrix
    通信耗时数据communication_time
    汇总集群内的节点信息(基于ascend_pytorch_profiler_{rank_id}.db生成)
    --mode参数默认值为all。 | 否 | - | cann_api_sum | 集群API性能数据汇总分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/CannApiSum目录下输出交付件stats.ipynb。 | 否 | - | compute_op_sum | 集群场景性能数据的device运行算子信息汇总分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/ComputeOpSum目录下输出交付件stats.ipynb;可根据实际情况决定是否是否打开--exclude_op_name。 | 否 | - | hccl_sum | 集合通信算子耗时分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/HcclSum目录下输出交付件stats.ipynb。 | 否 | - | mstx_sum | 集群场景mstx打点信息汇总分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/MstxSum目录下输出交付件stats.ipynb。 | 否 | - | 自定义分析参数 | 与cann_api_sum、compute_op_sum、hccl_sum等参数功能类似,用户可自定义一套性能数据的分析规则,需要详细了解性能分析的开发人员,具体开发指导请参见“[自定义分析规则开发指导](#自定义分析规则开发指导)”。 | 否 | - - --parallel_mode参数示例如下: - - ```bash - msprof-analyze cluster -d {cluster profiling data path} -m cann_api_sum --parallel_mode concurrent - ``` - - 或 - - ```bash - python3 cluster_analysis.py -d {cluster profiling data path} -m cann_api_sum --parallel_mode concurrent - ``` - - -### 交付件 - -集群分析工具的交付件通过MindStudio Insight工具展示,详见《[MindStudio Insight用户指南](https://www.hiascend.com/document/detail/zh/mindstudio/70RC2/GUI-baseddevelopmenttool/msascendinsightug/AscendInsight_0002.html)》。 - -#### cluster_step_trace_time.csv - -数据解析模式为communication_matrix、communication_time或all时均生成。 - -A列: Step数,是采集性能数据时设置的,一般来说集群性能数据采集一个step足够,如果采集多个step,需要先筛选一下。 - -B列: Type,主要分两种,rank和stage, 和后面的index强相关,可以理解为一个是单卡rank,一个是rank group(pp 并行的stage),如果type为stage,则后面D-K列信息为rank group下的最大值。 - -C列:Index,与type相关,表示卡号。 - -D列:Computing, 此列统计计算时间。 - -E列:Communication(Not Overlapped),此列统计未被掩盖的通信耗时。 - -F列:Overlapped,统计计算与通信重叠的耗时。 - -G列:Communication,通信时间的全部耗时。 - -H列:Free,空闲时间,指device侧既不在通信也不在计算的耗时,可能在做sdma拷贝或者空等。 - -I列:Stage时间,I、J、K列属于pp并行时有效的数值,stage时间代表除receive算子时间外的时间。 - -J列:Bubble时间,指receive时间的总和。 - -K列:Communication(Not Overlapped and Exclude Receive)指剔除receive算子外的并且不被掩盖的通信时间。 - -L列:Preparing,指迭代开始到首个计算或通信算子运行的时间。 - -M列:DP Index,指集群数据按照并行策略切分后所属DP组的索引, 如果没有采集则不显示。 - -N列:PP Index,指集群数据按照并行策略切分后所属PP组的索引,如果没有采集则不显示。 - -O列:TP Index,指集群数据按照并行策略切分后所属TP组的索引,如果没有采集则不显示。 - -**Tips**:先筛选B列type为stage, 看stage间是否有问题,再筛选B列type为rank,看rank是否有问题,根据以下几点排查。 - -* 根据Computing的时间差异判断是否有慢卡,或者有负载不均衡的现象。 - -* 根据Free统计是否有host bound或者分布不均现象。 - -* 根据Communication(Not Overlapped and Exclude Receive)时间判断是否通信耗时占比过大。 - -* 根据Bubble时间的占比和理论计算公式判断bubble设置是否合理,是否stage间有不均衡现象。 - -以上时间理论上都应该处于持平状态,即最大值小于最小值5%,否则就可能出现慢卡。 - -#### cluster_communication_matrix.json - -数据解析模式为communication_matrix或all时生成。 - -直接打开json(vscode或json查看器), 搜索"Total", 会有多个搜索结果,一般来说链路带宽信息的结构: - -```bash -{src_rank}-{dst_rank}: { - "Transport Type": "LOCAL", - "Transit Time(ms)": 0.02462, - "Transit Size(MB)": 16.777216, - "Bandwidth(GB/s)": 681.4466 -} -``` -**Tips**:可以根据rank互联的带宽以及链路类型,判断是否有慢链路的问题。 - -- "LOCAL"是片内拷贝,速度最高。 -- “HCCS”或“PCIE”是节点内片间拷贝,速度居中。 -- “RDMA”是节点间拷贝,速度最低。 - -#### cluster_communication.json - -数据解析模式为communication_time或all时生成。 - -主要为通信耗时数据。 - -#### cluster_analysis.db - -解析analysis.db或ascend_pytorch_profiler_{rank_id}.db生成的交付件,根据数据解析模式不同而解析不同的数据,可以使用MindStudio Insight工具展示。 - -#### communication_group.json - -记录通信域信息,解析analysis.db生成的交付件,collective表示集合通信域,P2P表示点对点通信,用户无须关注该文件。 - -#### stats.ipynb - -- 
数据解析模式为cann_api_sum时生成,保存在cluster_analysis_output/CannApiSum目录下。 - - 可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群API耗时信息。 - -- 数据解析模式为compute_op_sum时生成,保存在cluster_analysis_output/ComputeOpSum目录下。 - - 可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群计算算子耗时分析(将集群所有计算算子进行汇总并以图表展示),集群Rank计算算子耗时分析(将每个Rank的计算算子进行各自汇总)。 - -- 数据解析模式为hccl_sum时生成,保存在cluster_analysis_output/HcclSum目录下。 - - 可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群通信算子耗时分析(将集群所有通信算子进行汇总并以图表展示),集群Rank通信算子耗时分析(将每个Rank的通信算子进行各自汇总)、Top通信算子信息展示。 - -- 数据解析模式为mstx_sum时生成,保存在cluster_analysis_output/MstxSum目录下。 - - 可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群场景mstx打点信息,分为框架侧、CANN侧和Device侧三部分的打点信息。 - -## 附录 - -### 自定义分析规则开发指导 - -自定义分析规则是基于对Profiling的analysis.db和ascend_pytorch_profiler_{rank_id}.db文件进行性能数据分析而开发。与cann_api_sum、compute_op_sum、hccl_sum等参数功能实现类似,可自定义一套性能数据的分析规则,方法如下: - -1. 在mstt工具代码仓profiler/msprof_analyze/cluster_analyse/recipes目录下创建xxx目录和xxx.py文件。 - - 例如:profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/cann_api_sum.py,其中目录名和文件名要保持一致,该目录名也会作为使用msprof-analyze cluster工具启动该自定义分析的开关参数。 - -2. 在xxx.py文件进行性能数据分析规则的开发,开发要求继承BaseRecipeAnalysis,实现run函数。 - - 典型的run函数实现: - - ```python - def run(self, context): - mapper_res = self.mapper_func(context) - self.reducer_func(mapper_res) - if self._export_type == "db": - self.save_db() - elif self._export_type == "notebook": - self.save_notebook() - else: - logger.error("Unknown export type.") - ``` - - 1. `mapper_func`函数:多卡数据查询并合并返回结果。由于集群数据每张卡的数据处理是同样的,因此采用context并行处理集群数据并将结果按序拼装返回。开发只需要实现单卡数据处理的函数`self._mapper_fun`。 - - ```python - def mapper_func(self, context): - return context.wait( - context.map( - self._mapper_func, - self._get_rank_db(), - analysis_class=self._recipe_name - ) - ) - ``` - - ```python - def _mapper_func(self, data_map, analysis_class): - """ - Extract the profiling data required for cluster analysis from each device, and then aggregate the - results from each device to be processed by a reduce function. - Params: - data_map: eg. {"RANK_ID": 1, "profiler_db_path": "xxxx/ascend_pytorch_profiler_1.db"} - analysis_class: hccl_sum, compute_op_sum, cann_api_sum, mstx_sum...... - """ - pass - ``` - - 2. `reducer_func`函数:对多卡结果分析处理。接收`mapper_func`函数的返回值,进行进一步的集群数据的汇总分析,数据结构采用dataframe。 - - 3. `save_db`函数:分析结果保存在cluster_analysis.db中。 - - 4. `save_notebook`函数:分析结果以csv和stats.ipynb的形式保存。 - -3. `self._mapper_fun`函数依赖单db数据查询,可通过可通过如下两种方式。 - - 1. 使用DatabaseService可配置单表的查询。 - - 可参考:https://gitee.com/ascend/mstt/blob/pre-research/profiler/msprof_analyze/cluster_analyse/recipes/mstx2commop/mstx2commop.py - - 使用样例: - - ```Python - service = DatabaseService(profiler_db_path) - service.add_table_for_query("ENUM_HCCL_DATA_TYPE", ["id", "name"]) # 第一个参数:表名;第二个参数:字段列表,默认为None,当不填写时表明select * - service.add_table_for_query("STRING_IDS", ["id", "value"]) #可 以添加多个表 - df_dict = service.query_data() # 将配置的所有表按序查询,以dict形式返回,key为表名,value为数据库查询结果dataframe数据类型 - ``` - - 2. 
维护在msprof_analyze/prof_exports目录下,新建一个py文件,需继承自BaseStatsExport(注:新增之前可以看现有的是否可用,避免重复)如下示例: - - ```Python - from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport - - QUERY = """ - SELECT - NAME_IDS.value AS "OpName", - TYPE_IDS.value AS "OpType", - round(endNs - startNs) AS "Duration", - GROUP_NAME_IDS.value AS "GroupName" - FROM - COMMUNICATION_OP - LEFT JOIN - STRING_IDS AS TYPE_IDS - ON TYPE_IDS.id == COMMUNICATION_OP.opType - LEFT JOIN - STRING_IDS AS NAME_IDS - ON NAME_IDS.id == COMMUNICATION_OP.opName - LEFT JOIN - STRING_IDS AS GROUP_NAME_IDS - ON GROUP_NAME_IDS.id == COMMUNICATION_OP.groupName - """ - - - class HcclSumExport(BaseStatsExport): - def __init__(self, db_path, recipe_name): - super().__init__(db_path, recipe_name) - self._query = QUERY - ``` - - 使用样例:df = HcclSumExport(profiler_db_path, analysis_class).read_export_db(),返回的数据类型是dataframe。 - -4. 分析规则增加拓展参数。 - - 实现函数add_parser_argument,样例如下: - - ```Python - @classmethod - def add_parser_argument(cls, parser): - parser.add_argument("--top_num", type=str, help="Duration cost top count", default=cls.DEFAULT_TOP_NUM) - ``` - - 从self._extra_args里获取对应的扩展参数: - - ```Python - def __init__(self, params): - super().__init__(params) - top_num = self._extra_args.get(self.TOP_NUM, self.DEFAULT_TOP_NUM) - self.top_num = int(top_num) if isinstance(top_num, str) and top_num.isdigit() else self.DEFAULT_TOP_NUM - ``` - -5. 执行自定义分析规则命令。 - - ```bash - msprof-analyze cluster -d {cluster profiling data path} --mode xxx --top_num 10 - ``` - - +# 集群分析工具 +cluster_analyse(集群分析工具)是在集群场景下,通过此工具来进行集群数据的分析,当前主要对基于通信域的迭代内耗时分析、通信时间分析以及通信矩阵分析为主, 从而定位慢卡、慢节点以及慢链路问题。 + +## 性能数据采集 +当前集群调优工具主要支持PyTorch场景的Ascend PyTorch Profiler采集方式和MindSpore场景的MindSpore Profiler采集方式下的集群数据。 + +此工具只需要NPU的性能数据作为输入。 + +Ascend PyTorch Profiler采集方法请参见《[NPU性能数据采集](https://gitee.com/ascend/mstt/tree/master/profiler/msprof_analyze)》,MindSpore Profiler采集方法请参见《[性能调试](https://www.mindspore.cn/mindinsight/docs/zh-CN/r2.3/performance_profiling_ascend.html)》。 + +我们要求至少是L1级别的数据。 +```python +experimental_config = torch_npu.profiler._ExperimentalConfig( + profiler_level=torch_npu.profiler.ProfilerLevel.Level1 +) +``` +### 确认数据是否可用 + +打开采集到的某张卡数据(\*ascend_pt、\*ascend_ms结尾的文件夹),可用的数据应该具备: + +- ./profiler_info_x.json, +- ./ASCEND_PROFILER_OUTPUT/step_trace_time.csv, +- ./ASCEND_PROFILER_OUTPUT/trace_view.json, +- ./ASCEND_PROFILER_OUTPUT/kernel_details.csv, +- ./ASCEND_PROFILER_OUTPUT/communication.json, +- ./ASCEND_PROFILER_OUTPUT/communication_matrix.json + +或者具备: + +- analysis.db +- ascend_pytorch_profiler_{rank_id}.db + +以上csv、json文件与db文件只能存在一类,否则集群分析工具解析异常。MindSpore场景暂不支持以上db文件。 + +确认这几个文件生成后,继续下面的集群分析。 + +## 数据汇聚与解析 + +### 操作步骤 + +1. 参见《[性能工具](../README.md)》完成工具安装。建议安装最新版本。 + +2. 
将所有卡的数据拷贝并汇集到一个目录下,运行以下命令,在该目录下即可生成cluster_analysis_output文件夹。 + + ```bash + msprof-analyze cluster -d {cluster profiling data path} [-m mode] [-o output_path] [--data_simplification] [--force] + ``` + + 或 + + ```bash + python3 cluster_analysis.py -d {cluster profiling data path} [-m mode] [-o output_path] [--data_simplification] [--force] + ``` + + 参数说明: + + | 参数名 | 说明 | 是否必选 | + | --------------------- | ------------------------------------------------------------ | -------- | + | --profiling_path或-d | 性能数据汇集目录。未配置-o参数时,运行分析脚本之后会在该目录下自动创建cluster_analysis_output文件夹,保存分析数据。 | 是 | + | --output_path或-o | 自定义输出路径,运行分析脚本之后会在该目录下自动创建cluster_analysis_output文件夹,保存分析数据。 | 否 | + | --mode或-m | 数据解析模式,取值详见“**--mode参数说明**”表。 | 否 | + | --data_simplification | 数据精简模式。对于数据量过大的性能数据db文件,可以通过配置该参数将数据精简,并提高工具分析效率。配置该参数表示开启数据精简,默认未配置表示关闭。 | 否 | + | --force | 强制执行cluster。配置后可强制跳过如下情况:
    指定的目录、文件的用户属主不属于当前用户,忽略属主判断直接执行。
    csv文件大于5G、json文件大于10G、db文件大于8G,忽略文件过大判断直接执行。
    配置该参数表示开启强制执行,默认未配置表示关闭。 | 否 |
+ | --parallel_mode | 设置收集多卡、多节点db数据时的并发方式。取值为concurrent(使用concurrent.futures进程池实现并发)。<br>
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum时可配置此参数。** | 否 | + | --export_type | 设置导出的数据形式。取值为db(.db格式文件)和notebook(Jupyter Notebook文件),默认值为db。
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum时可配置此参数。** | 否 | + | --rank_list | 对特定Rank上的数据进行统计,默认值为all(表示对所有Rank进行统计),须根据实际卡的Rank ID配置。应配置为大于等于0的整数,若所配置的值大于实际训练所运行的卡的Rank ID,则仅解析合法的RankID的数据,比如当前环境Rank ID为0到7,实际训练运行0到3卡,此时若配置Rank ID为0, 3, 4或不存在的10等其他值,则仅解析0和3。配置示例:--rank_list 0, 1, 2。
    **只有-m配置cann_api_sum、compute_op_sum、hccl_sum、mstx_sum时可配置此参数。** | 否 | + | --top_num | 设置TopN耗时的通信算子的数量,默认值为15,配置示例:--top_num 20。
    **只有-m配置hccl_sum时可配置此参数。** | 否 | + | --exclude_op_name | 控制compute_op_name结果是否包含op_name,示例:--exclude_op_name,后面不需要跟参数。
    **只有-m配置compute_op_sum时可配置此参数。** | 否 |
+
+   --mode参数说明:
+
+   | 参数名               | 说明                                                                                                                                                                                                                                       | 是否必选 |
+   |----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
+   | communication_matrix | 解析通信矩阵数据。 | 否 |
+   | communication_time   | 解析通信耗时数据。 | 否 |
+   | all                  | 同时解析通信矩阵communication_matrix和通信耗时数据communication_time,--mode参数默认值为all。 | 否 |
+   | cann_api_sum         | 集群API性能数据汇总分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/CannApiSum目录下输出交付件stats.ipynb。 | 否 |
+   | compute_op_sum       | 集群场景性能数据的device运行算子信息汇总分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/ComputeOpSum目录下输出交付件stats.ipynb;可根据实际情况决定是否打开--exclude_op_name。 | 否 |
+   | hccl_sum             | 集合通信算子耗时分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/HcclSum目录下输出交付件stats.ipynb。 | 否 |
+   | mstx_sum             | 集群场景mstx打点信息汇总分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/MstxSum目录下输出交付件stats.ipynb。 | 否 |
+   | slow_link            | 集群慢链路异常分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。--export_type为db时,输出交付件cluster_analysis.db;--export_type为notebook时,在cluster_analysis_output/SlowLink目录下输出交付件stats.ipynb。 | 否 |
+   | cluster_time_summary | 集群场景性能数据分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db和analysis.db文件。--export_type为db时,输出交付件cluster_analysis.db,db里面有ClusterTimeSummary,不支持导出notebook。 | 否 |
+   | cluster_time_compare_summary | 集群场景性能数据对比分析,使用前集群数据必须先分析cluster_time_summary,需要配合--bp参数使用。输入性能数据需要基于cluster_analysis_output下的cluster_analysis.db文件。--export_type为db时,输出交付件cluster_analysis.db,db文件中有对比结果的表ClusterTimeCompareSummary,不支持导出notebook。 | 否 |
+   | slow_rank_pp_stage   | 集群场景性能数据pp stage通信对比分析,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。输入性能数据中MetaData表如果没有包含训练任务的并行策略,则需要通过--tp --pp --dp手动传入,数据类型为正整数。--export_type为db时,输出交付件cluster_analysis.db,db文件中有分析结果PPAnalysisResult和P2PAnalysisResult,不支持导出notebook。 | 否 |
+   | p2p_pairing          | 集群场景P2P算子生成全局关联索引,输入性能数据需要基于ascend_pytorch_profiler_{rank_id}.db文件。输出的关联索引会作为一个新的字段`opConnectionId`附在原性能数据ascend_pytorch_profiler_{rank_id}.db文件的`COMMUNICATION_OP`的表中。 | 否 |
+
+   --parallel_mode参数示例如下:
+
+   ```bash
+   msprof-analyze cluster -d {cluster profiling data path} -m cann_api_sum --parallel_mode concurrent
+   ```
+
+   或
+
+   ```bash
+   python3 cluster_analysis.py -d {cluster profiling data path} -m cann_api_sum --parallel_mode concurrent
+   ```
+
+
+### 交付件
+
+集群分析工具的交付件通过MindStudio Insight工具展示,详见《[MindStudio Insight用户指南](https://www.hiascend.com/document/detail/zh/mindstudio/70RC2/GUI-baseddevelopmenttool/msascendinsightug/AscendInsight_0002.html)》。
+
+#### cluster_step_trace_time.csv
+
+数据解析模式为communication_matrix、communication_time或all时均生成。
+
+A列:Step数,是采集性能数据时设置的,一般来说集群性能数据采集一个step足够,如果采集多个step,需要先筛选一下。
+
+B列:Type,主要分两种,rank和stage, 和后面的index强相关,可以理解为一个是单卡rank,一个是rank group(pp 并行的stage),如果type为stage,则后面D-K列信息为rank group下的最大值。
+
+C列:Index,与type相关,表示卡号。
+
+D列:Computing, 此列统计计算时间。
+
+E列:Communication(Not Overlapped),此列统计未被掩盖的通信耗时。
+
+F列:Overlapped,统计计算与通信重叠的耗时。
+
+G列:Communication,通信时间的全部耗时。
+
+H列:Free,空闲时间,指device侧既不在通信也不在计算的耗时,可能在做sdma拷贝或者空等。
+
+I列:Stage时间,I、J、K列属于pp并行时有效的数值,stage时间代表除receive算子时间外的时间。
+
+J列:Bubble时间,指receive时间的总和。
+
+K列:Communication(Not Overlapped and Exclude Receive)指剔除receive算子外的并且不被掩盖的通信时间。
+
+L列:Preparing,指迭代开始到首个计算或通信算子运行的时间。
+
+M列:DP Index,指集群数据按照并行策略切分后所属DP组的索引, 如果没有采集则不显示。
+
+N列:PP Index,指集群数据按照并行策略切分后所属PP组的索引,如果没有采集则不显示。
+
+O列:TP Index,指集群数据按照并行策略切分后所属TP组的索引,如果没有采集则不显示。
+
+**Tips**:先筛选B列type为stage, 看stage间是否有问题,再筛选B列type为rank,看rank是否有问题,根据以下几点排查。
+
+* 根据Computing的时间差异判断是否有慢卡,或者有负载不均衡的现象。
+
+* 根据Free统计是否有host bound或者分布不均现象。
+
+* 根据Communication(Not Overlapped and Exclude Receive)时间判断是否通信耗时占比过大。
+
+* 根据Bubble时间的占比和理论计算公式判断bubble设置是否合理,是否stage间有不均衡现象。
+
+以上时间理论上都应该处于持平状态,即最大值与最小值的差距小于最小值的5%,否则就可能出现慢卡。
+
+#### cluster_communication_matrix.json
+
+数据解析模式为communication_matrix或all时生成。
+
+直接打开json(vscode或json查看器), 搜索"Total", 会有多个搜索结果,一般来说链路带宽信息的结构:
+
+```json
+{src_rank}-{dst_rank}: {
+    "Transport Type": "LOCAL",
+    "Transit Time(ms)": 0.02462,
+    "Transit Size(MB)": 16.777216,
+    "Bandwidth(GB/s)": 681.4466
+}
+```
+**Tips**:可以根据rank互联的带宽以及链路类型,判断是否有慢链路的问题。
+
+- "LOCAL"是片内拷贝,速度最高。
+- "HCCS"或"PCIE"是节点内片间拷贝,速度居中。
+- "RDMA"是节点间拷贝,速度最低。
+
+#### cluster_communication.json
+
+数据解析模式为communication_time或all时生成。
+
+主要为通信耗时数据。
+
+#### cluster_analysis.db
+
+解析analysis.db或ascend_pytorch_profiler_{rank_id}.db生成的交付件,根据数据解析模式不同而解析不同的数据,可以使用MindStudio Insight工具展示。
+
+#### communication_group.json
+
+记录通信域信息,解析analysis.db生成的交付件,collective表示集合通信域,P2P表示点对点通信,用户无须关注该文件。
+
+#### stats.ipynb
+
+- 数据解析模式为cann_api_sum时生成,保存在cluster_analysis_output/CannApiSum目录下。
+
+  可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群API耗时信息。
+
+- 数据解析模式为compute_op_sum时生成,保存在cluster_analysis_output/ComputeOpSum目录下。
+
+  可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群计算算子耗时分析(将集群所有计算算子进行汇总并以图表展示),集群Rank计算算子耗时分析(将每个Rank的计算算子进行各自汇总)。
+
+- 数据解析模式为hccl_sum时生成,保存在cluster_analysis_output/HcclSum目录下。
+
+  可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群通信算子耗时分析(将集群所有通信算子进行汇总并以图表展示),集群Rank通信算子耗时分析(将每个Rank的通信算子进行各自汇总)、Top通信算子信息展示。
+
+- 数据解析模式为mstx_sum时生成,保存在cluster_analysis_output/MstxSum目录下。
+
+  可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群场景mstx打点信息,分为框架侧、CANN侧和Device侧三部分的打点信息。
+
+- 数据解析模式为slow_link时生成,保存在cluster_analysis_output/SlowLink目录下。
+
+  可使用jupyter notebook工具或MindStudio Insight工具打开,主要展示集群场景异常慢链路数据分析(将集群所有链路进行汇总并以图表展示),集群慢链路汇总耗时分析(展示检测到可能存在慢链路的数据)。
+
+
+
diff --git a/profiler/msprof_analyze/cluster_analyse/analysis/cluster_base_info_analysis.py b/profiler/msprof_analyze/cluster_analyse/analysis/cluster_base_info_analysis.py
index cb280978c41a639a2f5d17e2a0ff08ed3a9962d6..c5cb2652a1159f9bb645b96c4f60535c74a67859 100644
--- a/profiler/msprof_analyze/cluster_analyse/analysis/cluster_base_info_analysis.py
+++ b/profiler/msprof_analyze/cluster_analyse/analysis/cluster_base_info_analysis.py
@@ -1,92 +1,92 @@
-# Copyright (c) 2025, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import json -import os - -from msprof_analyze.cluster_analyse.analysis.base_analysis import BaseAnalysis -from msprof_analyze.prof_common.db_manager import DBManager -from msprof_analyze.cluster_analyse.common_func.utils import increase_shared_value -from msprof_analyze.prof_common.path_manager import PathManager -from msprof_analyze.prof_common.constant import Constant -from msprof_analyze.prof_common.logger import get_logger -from msprof_analyze.prof_common.file_manager import FileManager - - -logger = get_logger() - - -class ClusterBaseInfoAnalysis(BaseAnalysis): - KEY_DISTRIBUTED_ARGS = "distributed_args" - - def __init__(self, param: dict): - super().__init__(param) - self.distributed_args = {} - - def run(self, completed_processes=None, lock=None): - if self.data_type != Constant.DB: - if completed_processes and lock: - increase_shared_value(completed_processes, lock) - logger.info("ClusterBaseInfoAnalysis skipped, since data type is not db") - return - if not self.extract_base_info(): - logger.warning("ClusterBaseInfoAnalysis skipped, since no metadata or distributed args found") - return - self.dump_db() - if completed_processes and lock: - increase_shared_value(completed_processes, lock) - logger.info("ClusterBaseInfoAnalysis completed") - - def dump_db(self): - if not self.distributed_args: - return - output_path = os.path.join(self.cluster_analysis_output_path, Constant.CLUSTER_ANALYSIS_OUTPUT) - PathManager.make_dir_safety(output_path) - result_db = os.path.join(output_path, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER) - conn, curs = DBManager.create_connect_db(result_db) - DBManager.create_tables(result_db, Constant.TABLE_CLUSTER_BASE_INFO) - save_distributed_args = [[json.dumps(self.distributed_args)]] - sql = "insert into {} values ({value})".format(Constant.TABLE_CLUSTER_BASE_INFO, - value="?," * (len(save_distributed_args[0]) - 1) + "?") - DBManager.executemany_sql(conn, sql, save_distributed_args) - DBManager.destroy_db_connect(conn, curs) - - def extract_base_info(self): - file_list = self.get_profiler_metadata_file() - if not file_list: - return False - for file_path in file_list: - try: - meta_data = FileManager.read_json_file(file_path) - except RuntimeError as e: - logger.error("Read json failed. %s", str(e)) - continue - if not meta_data.get(self.KEY_DISTRIBUTED_ARGS): - continue - for key, value in meta_data[self.KEY_DISTRIBUTED_ARGS].items(): - if key == "rank": - continue - self.distributed_args.setdefault(key, value) - return True - return False - - def get_profiler_metadata_file(self): - meta_file_list = [] - for root, _, files in os.walk(self.collection_path): - for file_name in files: - if file_name == Constant.PROFILER_METADATA: - meta_file_list.append(os.path.join(root, file_name)) - return meta_file_list - - +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
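+#
+# Summary: ClusterBaseInfoAnalysis scans the collection path for profiler_metadata.json
+# files, merges their "distributed_args" entries (skipping the per-rank "rank" field),
+# and dumps the merged result as a JSON string into the ClusterBaseInfo table of the
+# cluster analysis database. The analysis only runs when the input data type is db.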
+import json +import os + +from msprof_analyze.cluster_analyse.analysis.base_analysis import BaseAnalysis +from msprof_analyze.prof_common.db_manager import DBManager +from msprof_analyze.cluster_analyse.common_func.utils import increase_shared_value +from msprof_analyze.prof_common.path_manager import PathManager +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_common.file_manager import FileManager + + +logger = get_logger() + + +class ClusterBaseInfoAnalysis(BaseAnalysis): + KEY_DISTRIBUTED_ARGS = "distributed_args" + + def __init__(self, param: dict): + super().__init__(param) + self.distributed_args = {} + + def run(self, completed_processes=None, lock=None): + if self.data_type != Constant.DB: + if completed_processes and lock: + increase_shared_value(completed_processes, lock) + logger.info("ClusterBaseInfoAnalysis skipped, since data type is not db") + return + if not self.extract_base_info(): + logger.warning("ClusterBaseInfoAnalysis skipped, since no metadata or distributed args found") + return + self.dump_db() + if completed_processes and lock: + increase_shared_value(completed_processes, lock) + logger.info("ClusterBaseInfoAnalysis completed") + + def dump_db(self): + if not self.distributed_args: + return + output_path = os.path.join(self.cluster_analysis_output_path, Constant.CLUSTER_ANALYSIS_OUTPUT) + PathManager.make_dir_safety(output_path) + result_db = os.path.join(output_path, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER) + conn, curs = DBManager.create_connect_db(result_db) + DBManager.create_tables(result_db, Constant.TABLE_CLUSTER_BASE_INFO) + save_distributed_args = [[json.dumps(self.distributed_args)]] + sql = "insert into {} values ({value})".format(Constant.TABLE_CLUSTER_BASE_INFO, + value="?," * (len(save_distributed_args[0]) - 1) + "?") + DBManager.executemany_sql(conn, sql, save_distributed_args) + DBManager.destroy_db_connect(conn, curs) + + def extract_base_info(self): + file_list = self.get_profiler_metadata_file() + if not file_list: + return False + for file_path in file_list: + try: + meta_data = FileManager.read_json_file(file_path) + except RuntimeError as e: + logger.error("Read json failed. %s", str(e)) + continue + if not meta_data.get(self.KEY_DISTRIBUTED_ARGS): + continue + for key, value in meta_data[self.KEY_DISTRIBUTED_ARGS].items(): + if key == "rank": + continue + self.distributed_args.setdefault(key, value) + return True + return False + + def get_profiler_metadata_file(self): + meta_file_list = [] + for root, _, files in os.walk(self.collection_path): + for file_name in files: + if file_name == Constant.PROFILER_METADATA: + meta_file_list.append(os.path.join(root, file_name)) + return meta_file_list + + diff --git a/profiler/msprof_analyze/cluster_analyse/analysis/comm_matrix_analysis.py b/profiler/msprof_analyze/cluster_analyse/analysis/comm_matrix_analysis.py index a87803438aef3a733c41588413ff2281b85ae418..3a538509b88e7ce3996aa539e49bf714bb163766 100644 --- a/profiler/msprof_analyze/cluster_analyse/analysis/comm_matrix_analysis.py +++ b/profiler/msprof_analyze/cluster_analyse/analysis/comm_matrix_analysis.py @@ -13,7 +13,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-import copy import os from collections import defaultdict @@ -108,7 +107,7 @@ class CommMatrixAnalysis(BaseAnalysis): Constant.OP_NAME: '' } for op_name, op_dict in step_dict.items(): - link_info = defaultdict(lambda: copy.deepcopy(default_value)) + link_info = defaultdict(lambda: default_value.copy()) for rank_id, rank_dict in op_dict.items(): process_link_key(rank_id, rank_dict) step_dict[op_name] = convert_local_to_global_rank() @@ -120,7 +119,7 @@ class CommMatrixAnalysis(BaseAnalysis): Constant.TRANSIT_SIZE_MB: 0, Constant.OP_NAME: '' } - total_op_info = defaultdict(lambda: copy.deepcopy(default_value)) + total_op_info = defaultdict(lambda: default_value.copy()) for op_name, op_dict in step_dict.items(): if self.check_add_op(op_name): for link_key, link_dict in op_dict.items(): diff --git a/profiler/msprof_analyze/cluster_analyse/analysis/communication_analysis.py b/profiler/msprof_analyze/cluster_analyse/analysis/communication_analysis.py index 61daa5b943d4a718d90f80203bac3fc948202199..47846522a9543511b3e55579d05b814d1ca9717d 100644 --- a/profiler/msprof_analyze/cluster_analyse/analysis/communication_analysis.py +++ b/profiler/msprof_analyze/cluster_analyse/analysis/communication_analysis.py @@ -13,7 +13,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -import copy import os from collections import defaultdict @@ -79,7 +78,7 @@ class CommunicationAnalysis(BaseAnalysis): Constant.COMMUNICATION_TIME_INFO: defaultdict(float), Constant.COMMUNICATION_BANDWIDTH_INFO: {} } - total_rank_dict = defaultdict(lambda: copy.deepcopy(default_value)) + total_rank_dict = defaultdict(lambda: default_value.copy()) for _, rank_dict in comm_ops.items(): for rank_id, communication_op_info in rank_dict.items(): for com_info, com_info_dict in communication_op_info.items(): diff --git a/profiler/msprof_analyze/cluster_analyse/cluster_analysis.py b/profiler/msprof_analyze/cluster_analyse/cluster_analysis.py index d7d71908506256eca3b8bd884a593188546189ce..6464bb732ddf57b2790d99ac7148ce3ecaf327ce 100644 --- a/profiler/msprof_analyze/cluster_analyse/cluster_analysis.py +++ b/profiler/msprof_analyze/cluster_analyse/cluster_analysis.py @@ -142,7 +142,6 @@ def cluster_analysis_main(): parser.add_argument("--parallel_mode", type=str, help="context mode", default="concurrent") parser.add_argument("--export_type", type=str, help="recipe export type", choices=["db", "notebook"], default="db") parser.add_argument("--rank_list", type=str, help="Rank id list", default='all') - parser.add_argument("--step_id", type=int, help="Step id", default=Constant.VOID_STEP) args, extra_args = parser.parse_known_args() parameter = vars(args) diff --git a/profiler/msprof_analyze/cluster_analyse/common_func/context.py b/profiler/msprof_analyze/cluster_analyse/common_func/context.py index b41972c0d21ac73fb5b9f0291cddec8d9a06b94a..e4f716e90d991645de514e2bc6ecd12920c0c9e1 100644 --- a/profiler/msprof_analyze/cluster_analyse/common_func/context.py +++ b/profiler/msprof_analyze/cluster_analyse/common_func/context.py @@ -16,6 +16,7 @@ import os from functools import partial from concurrent import futures +from collections import defaultdict from msprof_analyze.prof_common.constant import Constant from msprof_analyze.prof_common.logger import get_logger @@ -68,6 +69,7 @@ class ConcurrentContext(Context): super().__init__() self._custom = executor is None self._executor = executor or futures.ProcessPoolExecutor(max_workers=os.cpu_count()) + self.future_dict = defaultdict(list) def 
__enter__(self): if self._executor is None: @@ -88,3 +90,11 @@ class ConcurrentContext(Context): def wait(self, waitable): return waitable + + def submit(self, name, func, *args, **kwargs): + self.future_dict[name].append(self._executor.submit(func, *args, **kwargs)) + + def wait_all_futures(self): + for _, future_list in self.future_dict.items(): + for future in future_list: + future.result() \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/common_func/table_constant.py b/profiler/msprof_analyze/cluster_analyse/common_func/table_constant.py index 27daae78cb9004a2f713b9ebf9bc6ab916dd9325..3acb8713e21f0337dae4973044667fb64707eba1 100644 --- a/profiler/msprof_analyze/cluster_analyse/common_func/table_constant.py +++ b/profiler/msprof_analyze/cluster_analyse/common_func/table_constant.py @@ -39,3 +39,21 @@ class TableConstant: DST_RANK = "dst_rank" TRANSPORT_TYPE = "transport_type" OPNAME = "op_name" + + +class ProfilerTableConstant: + + # COMMUNICATION OP + OP_ID = "opId" + OP_NAME = "opName" + START_NS = "startNS" + END_NS = "endNS" + CONNECTION_ID = "connectionId" + GROUP_NAME = "groupName" + RELAY = "relay" + RETRY = "retry" + DATA_TYPE = "dataType" + ALG_TYPE = "algType" + COUNT = "count" + OP_TYPE = "opType" + WAIT_NS = "waitNS" diff --git a/profiler/msprof_analyze/cluster_analyse/common_func/tables_config.py b/profiler/msprof_analyze/cluster_analyse/common_func/tables_config.py index 42c509694cfd1a896f60ec6b282de040f22204b6..7c948ead594dcf5c67d1e70ff417b7bedf2b9265 100644 --- a/profiler/msprof_analyze/cluster_analyse/common_func/tables_config.py +++ b/profiler/msprof_analyze/cluster_analyse/common_func/tables_config.py @@ -31,10 +31,7 @@ class TablesConfig: ], "CommunicationGroupMap": [ ("type", "TEXT, null"), - ("rank_set", "TEXT, null"), - ("group_name", "TEXT, null"), - ("group_id", "TEXT, null"), - ("pg_name", "TEXT, null") + ("rank_set", "TEXT, null") ], "ClusterCommAnalyzerBandwidthMap": [ ("rank_set", "TEXT, null"), @@ -133,10 +130,8 @@ class TablesConfig: ], "CommunicationGroupMappingMap": [ ("type", "TEXT, null"), - ("rank_set", "TEXT, null"), ("group_name", "TEXT, null"), - ("group_id", "TEXT, null"), - ("pg_name", "TEXT, null") + ("rank_set", "TEXT, null") ], "ClusterBaseInfoMap": [ ("distributed_args", "TEXT, null") diff --git a/profiler/msprof_analyze/cluster_analyse/common_func/utils.py b/profiler/msprof_analyze/cluster_analyse/common_func/utils.py index f2ba499d6f42986c1b2ecca49998f33d766c2d21..7c867cba32a5ca72423988ac1805d88d0de75a0e 100644 --- a/profiler/msprof_analyze/cluster_analyse/common_func/utils.py +++ b/profiler/msprof_analyze/cluster_analyse/common_func/utils.py @@ -81,14 +81,32 @@ def increase_shared_value(shared_value: Value, lock: Lock): shared_value.value += 1 -def double_hash(data): - uint32_bits = 32 - uint32_max = 0xFFFFFFFF # 32 位无符号整数的最大值 - prime = [29, 131] - hash_values = [0, 0] - - for d in data: - hash_values[0] = (hash_values[0] * prime[0] + ord(d)) & uint32_max - hash_values[1] = (hash_values[1] * prime[1] + ord(d)) & uint32_max - - return ((hash_values[0] << uint32_bits) | hash_values[1]) +def detect_outliers_z_score(data, threshold=3): + """ + 使用 Z-Score 方法判断是否存在异常值。 + Z-Score 是一种统计方法,用于衡量数据点与均值的标准差距离。 + 如果某个数据点的 Z-Score 超过阈值(默认为3),则认为它是异常值。 + + 返回值: + - True:存在异常值 + - False:不存在异常值 + """ + # 计算数据的均值 + mean = np.mean(data) # 均值表示数据的中心位置 + + # 计算数据的标准差 + std = np.std(data) # 标准差表示数据的离散程度 + + # 如果标准差为0,直接返回 False(不存在异常值) + if std == 0: + return False + + # 计算 Z-Score 的上阈值和下阈值 + z_scores_upper_threshold = threshold 
* std + mean + z_scores_lower_threshold = -threshold * std + mean + + # 判断是否存在 Z-Score 超过阈值的数据点 + has_outliers = any(x > z_scores_upper_threshold or x < z_scores_lower_threshold for x in data) + + # 返回是否存在异常值的布尔值 + return has_outliers \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/communication_group/__init__.py b/profiler/msprof_analyze/cluster_analyse/communication_group/__init__.py index de0604079e1323b2749bc801a6e8326893c73498..7101187a2c2619f3b1c20dded14b433950b4c662 100644 --- a/profiler/msprof_analyze/cluster_analyse/communication_group/__init__.py +++ b/profiler/msprof_analyze/cluster_analyse/communication_group/__init__.py @@ -11,4 +11,4 @@ # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and -# limitations under the License. \ No newline at end of file +# limitations under the License. diff --git a/profiler/msprof_analyze/cluster_analyse/communication_group/base_communication_group.py b/profiler/msprof_analyze/cluster_analyse/communication_group/base_communication_group.py index 2c02bfdbf1bdd22dec838b56b7eb0c9c9872cac2..8f6625f8f6bbf646cfd77099b70c36398680f67a 100644 --- a/profiler/msprof_analyze/cluster_analyse/communication_group/base_communication_group.py +++ b/profiler/msprof_analyze/cluster_analyse/communication_group/base_communication_group.py @@ -18,21 +18,15 @@ from abc import abstractmethod from collections import defaultdict from copy import deepcopy from multiprocessing import Pool -import pandas as pd from msprof_analyze.cluster_analyse.cluster_utils.data_transfer_adapter import DataTransferAdapter -from msprof_analyze.cluster_analyse.common_func.utils import double_hash from msprof_analyze.prof_common.constant import Constant from msprof_analyze.prof_common.logger import get_logger -from msprof_analyze.prof_common.file_manager import FileManager logger = get_logger() class BaseCommunicationGroup: - KEY_PARALLEL_GROUP_INFO = "parallel_group_info" - KEY_COMM_GROUP_PARALLEL_INFO = "comm_group_parallel_info" - def __init__(self, params: dict): self.collection_path = params.get(Constant.COLLECTION_PATH) self.cluster_analysis_output_path = params.get(Constant.CLUSTER_ANALYSIS_OUTPUT_PATH) @@ -44,11 +38,9 @@ class BaseCommunicationGroup: self.collective_group_dict = defaultdict(set) self.p2p_comm_group = [] self.communication_group = {} - self.parallel_group_info = {} self.communication_ops = [] self.matrix_ops = [] self.adapter = DataTransferAdapter() - self.comm_group_parallel_info_df = None def load_communication_data(self): comm_op_dirs = [] @@ -120,18 +112,6 @@ class BaseCommunicationGroup: def read_communication_func(self, params: tuple): pass - def read_parallel_group_info(self): - for _, profiling_dir_path in self.data_map.items(): - meta_file = os.path.join(profiling_dir_path, Constant.PROFILER_METADATA) - if not os.path.exists(meta_file): - continue - meta_data = FileManager.read_json_file(meta_file) - if self.KEY_PARALLEL_GROUP_INFO not in meta_data: - continue - for group_id, group_info in meta_data[self.KEY_PARALLEL_GROUP_INFO].items(): - if group_id not in self.parallel_group_info: - self.parallel_group_info[group_id] = group_info - def analyze_communication_data(self): for rank_id, rank_id_comm_dict, rank_id_matrix_dict in self.rank_comm_dir_dict: for step_id, step_id_dict in rank_id_comm_dict.items(): @@ -165,11 +145,9 @@ class BaseCommunicationGroup: def 
generate(self): self.load_communication_data() self.analyze_communication_data() - self.read_parallel_group_info() self.set_p2p_groups() self.generate_collective_communication_group() self.generate_p2p_communication_group() - self.analyze_parallel_group_info() self.dump_data() return self.collect_comm_data() @@ -237,32 +215,6 @@ class BaseCommunicationGroup: Constant.COMM_OP_INFO: op_link_info }) - def analyze_parallel_group_info(self): - # create comm group dataframe - comm_group_cols = ["type", "rank_set", "group_name"] - comm_group_df = pd.DataFrame(columns=comm_group_cols) - for group_name, rank_set in self.collective_group_dict.items(): - comm_group_df.loc[comm_group_df.shape[0]] = [Constant.COLLECTIVE, list(rank_set), group_name] - - # create parallel group dataframe - parallel_group_cols = ["group_name", "group_id", "pg_name"] - parallel_group_df = pd.DataFrame(columns=parallel_group_cols) - for group_id, parallel_info in self.parallel_group_info.items(): - group_name = str(double_hash(group_id)) # group_name is hashed group_id - pg_name = parallel_info.get("group_name", "") - if not pg_name: - continue - parallel_group_df.loc[parallel_group_df.shape[0]] = [group_name, group_id, pg_name] - - # merge by group_name - df = pd.merge(comm_group_df, parallel_group_df, on='group_name', how='left') - # add p2p group - for rank_set in self.communication_group[Constant.P2P]: - df.loc[df.shape[0]] = [Constant.P2P, list(rank_set), None, None, None] - df.fillna("", inplace=True) - - self.comm_group_parallel_info_df = df - class UnionFind(object): """Disjoint Set Union""" diff --git a/profiler/msprof_analyze/cluster_analyse/communication_group/communication_db_group.py b/profiler/msprof_analyze/cluster_analyse/communication_group/communication_db_group.py index 99b55fb9956fff23ba36d9f4b80ba05caa33562c..7d1b4ec250ba1d25079a86f1b0bf95fd2c8906aa 100644 --- a/profiler/msprof_analyze/cluster_analyse/communication_group/communication_db_group.py +++ b/profiler/msprof_analyze/cluster_analyse/communication_group/communication_db_group.py @@ -76,9 +76,12 @@ class CommunicationDBGroup(BaseCommunicationGroup): return rank_id, comm_data, comm_matrix_data def dump_data(self): - self.comm_group_parallel_info_df["rank_set"] = (self.comm_group_parallel_info_df["rank_set"]. - apply(lambda x: "(" + ",".join(str(i) for i in x) + ")")) - res = self.comm_group_parallel_info_df.values.tolist() + res = [] + for data_type, data_list in self.communication_group.items(): + for data in data_list: + rank_set = "(" + ",".join(str(i) for i in data) + ")" + data = [data_type, rank_set] + res.append(data) dump_group_db(res, self.COMMUNICATION_GROUP_TABLE, self.cluster_analysis_output_path) @@ -145,9 +148,16 @@ class CommunicationDBGroupOptimized(BaseCommunicationGroup): return comm_data_dict def dump_data(self): - self.comm_group_parallel_info_df["rank_set"] = (self.comm_group_parallel_info_df["rank_set"]. 
- apply(lambda x: "(" + ",".join(str(i) for i in x) + ")")) - res = self.comm_group_parallel_info_df.values.tolist() + res = [] + for data_type, data_list in self.communication_group.items(): + if data_type == Constant.P2P: + for data in data_list: + rank_set = "(" + ",".join(str(i) for i in data) + ")" + res.append([data_type, "", rank_set]) + continue + for group_name, data in data_list: + rank_set = "(" + ",".join(str(i) for i in data) + ")" + res.append([data_type, group_name, rank_set]) dump_group_db(res, self.COMMUNICATION_GROUP_MAPPING_TABLE, self.cluster_analysis_output_path) def _merge_data_with_rank(self, rank_id: int, data_list: list): diff --git a/profiler/msprof_analyze/cluster_analyse/communication_group/communication_json_group.py b/profiler/msprof_analyze/cluster_analyse/communication_group/communication_json_group.py index 2975050da0706870136ad4d8e84f28c56ded4718..97948228264f7b6fb2aed8d8b8766b3515626d40 100644 --- a/profiler/msprof_analyze/cluster_analyse/communication_group/communication_json_group.py +++ b/profiler/msprof_analyze/cluster_analyse/communication_group/communication_json_group.py @@ -14,7 +14,6 @@ # limitations under the License. import os -from copy import deepcopy from msprof_analyze.cluster_analyse.communication_group.base_communication_group import BaseCommunicationGroup from msprof_analyze.prof_common.file_manager import FileManager @@ -27,10 +26,8 @@ class CommunicationJsonGroup(BaseCommunicationGroup): super().__init__(params) def dump_data(self): - res = deepcopy(self.communication_group) - res[self.KEY_COMM_GROUP_PARALLEL_INFO] = self.comm_group_parallel_info_df.to_dict(orient="records") FileManager.create_json_file( - self.cluster_analysis_output_path, res, self.COMMUNICATION_GROUP_JSON + self.cluster_analysis_output_path, self.communication_group, self.COMMUNICATION_GROUP_JSON ) def read_communication_func(self: any, params: tuple): diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/base_recipe_analysis.py b/profiler/msprof_analyze/cluster_analyse/recipes/base_recipe_analysis.py index a8b503592536e529b4a9043058284f0094b08038..7975c3675373ea80d5eba4d827fdc757a409c7af 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/base_recipe_analysis.py +++ b/profiler/msprof_analyze/cluster_analyse/recipes/base_recipe_analysis.py @@ -49,7 +49,6 @@ class BaseRecipeAnalysis(ABC): rank_list = params.get(Constant.RANK_LIST, 'all') self._rank_list = rank_list if rank_list == "all" else [int(rank) for rank in rank_list.split(",") if rank.isdigit()] - self._step_id = params.get(Constant.STEP_ID, Constant.VOID_STEP) self._extra_args = self.get_extra_argument(params.get(Constant.EXTRA_ARGS)) PathManager.make_dir_safety(self._output_path) @@ -107,7 +106,7 @@ class BaseRecipeAnalysis(ABC): result_db = custom_db_path if custom_db_path else os.path.join(self.output_path, file_name) conn, cursor = DBManager.create_connect_db(result_db) if isinstance(data, pd.DataFrame): - data.to_sql(table_name, conn, if_exists='replace', index=True) + data.to_sql(table_name, conn, if_exists='replace', index=index) else: logger.error(f"Unknown dump data type: {type(data)}") DBManager.destroy_db_connect(conn, cursor) @@ -158,55 +157,27 @@ class BaseRecipeAnalysis(ABC): db_paths = [] for rank_id in rank_ids: rank_path = self._data_map[rank_id] - db_path = os.path.join(rank_path, Constant.SINGLE_OUTPUT, f"ascend_pytorch_profiler_{rank_id}.db") - if os.path.exists(db_path): - db_paths.append({Constant.RANK_ID: rank_id, Constant.PROFILER_DB_PATH: db_path, - 
Constant.STEP_RANGE: self._get_step_range(db_path)}) + profiler_db_path = os.path.join(rank_path, Constant.SINGLE_OUTPUT, f"ascend_pytorch_profiler_{rank_id}.db") + analysis_db_path = os.path.join(rank_path, Constant.SINGLE_OUTPUT, f"analysis.db") + if not os.path.exists(profiler_db_path): + logger.warning(f"Profiler DB file not found, rank id: {rank_id}, db path: {profiler_db_path}") + continue + db_path_dict = {Constant.RANK_ID: rank_id, Constant.PROFILER_DB_PATH: profiler_db_path} + if os.path.exists(analysis_db_path): + db_path_dict[Constant.ANALYSIS_DB_PATH] = analysis_db_path else: - logger.warning(f"DB file not found, rank id: {rank_id}, db path: {db_path}.") + logger.warning(f"Analysis DB file not found, rank id: {rank_id}, db path: {analysis_db_path}") + db_paths.append(db_path_dict) if invalid_rank_id: - logger.warning(f"Invalid Rank id: [{','.join(invalid_rank_id)}].") + logger.warning(f"Invalid Rank id : [{','.join(invalid_rank_id)}].") return db_paths - def _get_step_range(self, db_path): - step_range = {} - if self._step_id == Constant.VOID_STEP: - return step_range - conn, cursor = DBManager.create_connect_db(db_path) - if not DBManager.judge_table_exists(cursor, "STEP_TIME"): - logger.error(f"The STEP_TIME table does not exist in the database: {db_path}, " - f"the parameter step_id will not take effect.") - DBManager.destroy_db_connect(conn, cursor) - return step_range - - step_time = [] - sql = f"select id, startNs, endNs from STEP_TIME" - try: - step_time = DBManager.fetch_all_data(cursor, sql) - except Exception as err: - logger.error(err) - finally: - DBManager.destroy_db_connect(conn, cursor) - - for step_data in step_time: - if step_data.get("id") == self._step_id: - step_range = step_data - break - if not step_range: - step_list = ", ".join([str(step.get("id", "")) for step in step_time]) - logger.error(f"Invalid step_id {self._step_id} in the database: {db_path}, " - f"step_id must be an element of the set ({step_list}), " - f"the parameter step_id will not take effect.") - return step_range - def _mapper_func(self, data_map, analysis_class): """ Extract the profiling data required for cluster analysis from each device, and then aggregate the results from each device to be processed by a reduce function. Params: - data_map: eg. {"RANK_ID": 1, - "profiler_db_path": "xxxx/ascend_pytorch_profiler_1.db", - "step_range": {"id": 2, "startNs": 12345, "endNs": 12443]} + data_map: eg. 
{"RANK_ID": 1, "profiler_db_path": "xxxx/ascend_pytorch_profiler_1.db"} analysis_class: hccl_sum, compute_op_sum, cann_api_sum, mstx_sum…… """ - pass + pass \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/cann_api_sum.py b/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/cann_api_sum.py index 22cd2c64aeb09a417c1915bfbaaed0cc49bd8b00..17f8f698960ac4ebd10bfbdfd0c5712fa016b7f1 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/cann_api_sum.py +++ b/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/cann_api_sum.py @@ -96,9 +96,8 @@ class CannApiSum(BaseRecipeAnalysis): def _mapper_func(self, data_map, analysis_class): profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) rank_id = data_map.get(Constant.RANK_ID) - step_range = data_map.get(Constant.STEP_RANGE) - df = CannApiSumExport(profiler_db_path, analysis_class, step_range).read_export_db() + df = CannApiSumExport(profiler_db_path, analysis_class).read_export_db() if df is None or df.empty: logger.warning(f"There is no stats data in {profiler_db_path}.") return None, None - return rank_id, df + return rank_id, df \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/stats.ipynb b/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/stats.ipynb index 2bc1b77e9b14777b57771313233beb7fa255d2e9..c97f039c5a01a6e7cce2968d569d79e137e76f8c 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/stats.ipynb +++ b/profiler/msprof_analyze/cluster_analyse/recipes/cann_api_sum/stats.ipynb @@ -72,7 +72,7 @@ "outputs": [], "source": [ "per_rank_df = pd.read_csv(\"rank_stats.csv\")\n", - "cluster_display.display_stats_per_operation(per_rank_df, box=False, scatter=False)" + "cluster_display.display_stats_per_operation(per_rank_df, xaxis_title='rank', yaxis_title='duration (ns)')" ] } ], diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_display.py b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_display.py index 5a23a280fff9b3c0492f1c8cd2fac20824afb708..fbf89bc4909c28fc1ec3a4f2c38a7414fe8b986d 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_display.py +++ b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_display.py @@ -14,6 +14,8 @@ # limitations under the License. 
import logging +import math +import matplotlib.pyplot as plt import numpy as np import pandas as pd import plotly.graph_objects as go @@ -238,3 +240,74 @@ def display_stats_optional_combobox(options, display_func, args, description="Op dropdown.value = options[0] elif len(options) == 1: display_func(options[0], args) + + +def compute_quantile_intervals(lst, num_intervals): + lst.sort(reverse=False) + if len(lst) > num_intervals: + min_value = min(lst) + max_value = max(lst) + interval_size = len(lst) / num_intervals + result = [min_value] + for i in range(1, num_intervals): + index = int(math.ceil(i * interval_size)) - 1 + result.append(lst[index]) + result.append(max_value) + else: + result = lst + return result[::-1] + + +def calculate_zscore(x, mean, std): + if std != 0: + zscore = (x - mean) / std + elif x > mean: + zscore = 100 + else: + zscore = -100 + return zscore + + +def process_data(df, group_cols, value_col, num_intervals): + grouped = df.groupby(group_cols)[value_col].apply(list).to_dict() + data = {k: compute_quantile_intervals(v, num_intervals) for k, v in grouped.items()} + max_len = max(len(v) for v in data.values()) + data_dict = { + k: v + [np.nan] * (max_len - len(v)) + for k, v in data.items() + } + # 使用sorted()函数和lambda表达式对字典的键进行排序,reverse=True表示降序排列 + sorted_items = sorted(data_dict.items(), key=lambda item: item[0], reverse=True) + # 将排序后的列表转换为字典 + data_dict = dict(sorted_items) + data_dealed = pd.DataFrame(data_dict) + return data_dealed + + +def plot_data(df, title, ylabel): + ax = df.plot(kind='bar', figsize=(12, 6)) + ax.set_title(title, fontsize=14) + ax.set_xlabel('opTypeRelatedRanksDataSize', fontsize=12) + ax.set_ylabel(ylabel, fontsize=12) + ax.legend(title='Percentiles', bbox_to_anchor=(1.05, 1)) + plt.tight_layout() + plt.show() + + +def display_transmittime_bar(slowlinkops_df, ratio_set=0.05, optype='hcom_allGather_', + relatedranks=5, datasize=1024): + slowlinkops_df_f = slowlinkops_df[(slowlinkops_df['opType'] == optype) & + (slowlinkops_df['relatedRanks'] == relatedranks) & (slowlinkops_df['dataSize'] == datasize)] + slowlinkops_df_f['relatedRanks'] = slowlinkops_df_f['relatedRanks'].apply(str) + slowlinkops_df_f['dataSize'] = slowlinkops_df_f['dataSize'].apply(str) + slowlinkops_df_f['opTypeRelatedRanksDataSize'] = slowlinkops_df_f['opType'] + \ + slowlinkops_df_f['relatedRanks'] + '_' + slowlinkops_df_f['dataSize'] + slowlinkops_df_f['transmitTime_Zscore'] = slowlinkops_df_f['transmitTime'].apply( + lambda x: calculate_zscore(x, slowlinkops_df_f['transmitTime'].mean(), slowlinkops_df_f['transmitTime'].std())) + num_intervals = int(1 / ratio_set) + + data_tt = process_data(slowlinkops_df_f, 'opTypeRelatedRanksDataSize', 'transmitTime', num_intervals) + data_ttzscore = process_data(slowlinkops_df_f, 'opTypeRelatedRanksDataSize', 'transmitTime_Zscore', num_intervals) + + plot_data(data_tt, 'Transmit Time Distribution', 'Time (ns)') + plot_data(data_ttzscore, 'Z-Score of Transmit Time Distribution', 'Z-Score') \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/cluster_time_compare_summary.py 
diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/cluster_time_compare_summary.py b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/cluster_time_compare_summary.py new file mode 100644 index 0000000000000000000000000000000000000000..71a5fbee9d40c34e0f74930f7615ec23bec44d44 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_compare_summary/cluster_time_compare_summary.py @@ -0,0 +1,115 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.database_service import DatabaseService +from msprof_analyze.prof_common.db_manager import DBManager +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_common.path_manager import PathManager + +logger = get_logger() + + +class ClusterTimeCompareSummary(BaseRecipeAnalysis): + BP = "bp" # path parameter of the baseline data to compare against + TABLE_CLUSTER_TIME_COMPARE_SUMMARY = "ClusterTimeCompareSummary" + CLUSTER_TIME_SUMMARY_CSV = "cluster_time_summary.csv" + CLUSTER_TIME_SUMMARY_COLUMNS = [ + "rank", + "step", + "computation", + "communicationNotOverlapComputation", + "communicationOverlapComputation", + "communication", + "free", + "communicationWaitStageTime", + "communicationTransmitStageTime", + "memory", + "memoryNotOverlapComputationCommunication", + "taskLaunchDelayAvgTime" + ] + + def __init__(self, params): + super().__init__(params) + self.db_path = os.path.join(self._collection_dir, Constant.CLUSTER_ANALYSIS_OUTPUT, + Constant.DB_CLUSTER_COMMUNICATION_ANALYZER) + self.base_db_path = os.path.join(self._extra_args.get(self.BP, ""), Constant.CLUSTER_ANALYSIS_OUTPUT, + Constant.DB_CLUSTER_COMMUNICATION_ANALYZER) + self.compare_result = None + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + @classmethod + def add_parser_argument(cls, parser): + BaseRecipeAnalysis.add_parser_argument(parser) + parser.add_argument('--bp', type=PathManager.expanduser_for_argumentparser, default="", + help="base profiling data path") + + def run(self, context=None): + logger.info("ClusterTimeCompareSummary starts running.") + if not self.check_params_is_valid(): + return + self.get_compare_data() + self.save_db() + + def check_params_is_valid(self) -> bool: + base_path = self._extra_args.get(self.BP, "") + if not base_path: + logger.error("Must specify the --bp parameter.") + return False + if self._export_type == Constant.NOTEBOOK: + logger.error("For cluster_time_compare_summary, the export_type parameter only supports db.") + return False + try: + PathManager.check_input_directory_path(base_path) # validate the directory + except RuntimeError: + logger.error(f"{base_path} is not valid.") + return False + if not DBManager.check_tables_in_db(self.db_path, Constant.TABLE_CLUSTER_TIME_SUMMARY): + logger.error(f"{Constant.TABLE_CLUSTER_TIME_SUMMARY} in {self.db_path} does not exist.") + return False + if not
DBManager.check_tables_in_db(self.base_db_path, Constant.TABLE_CLUSTER_TIME_SUMMARY): + logger.error(f"{Constant.TABLE_CLUSTER_TIME_SUMMARY} in {self.base_db_path} does not exist.") + return False + return True + + + def get_compare_data(self): + database_service_for_db = DatabaseService(self.db_path) + database_service_for_db.add_table_for_query(Constant.TABLE_CLUSTER_TIME_SUMMARY, + self.CLUSTER_TIME_SUMMARY_COLUMNS) + cluster_time_summary_df_dict = database_service_for_db.query_data() + cluster_time_summary_df = cluster_time_summary_df_dict.get(Constant.TABLE_CLUSTER_TIME_SUMMARY) + database_service_for_base_db = DatabaseService(self.base_db_path) + database_service_for_base_db.add_table_for_query(Constant.TABLE_CLUSTER_TIME_SUMMARY, + self.CLUSTER_TIME_SUMMARY_COLUMNS) + base_cluster_time_summary_df_dict = database_service_for_base_db.query_data() + base_cluster_time_summary_df = base_cluster_time_summary_df_dict.get(Constant.TABLE_CLUSTER_TIME_SUMMARY) + self.compare_result = ( + cluster_time_summary_df.set_index(["rank", "step"]) + .subtract(base_cluster_time_summary_df.set_index(["rank", "step"])) + .dropna() + .reset_index() + .rename(columns=lambda x: f"{x}Diff" if x not in ["rank", "step"] else x) + ) + + def save_db(self): + self.dump_data(self.compare_result, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER, + self.TABLE_CLUSTER_TIME_COMPARE_SUMMARY, index=False) \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_summary/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_summary/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_summary/cluster_time_summary.py b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_summary/cluster_time_summary.py new file mode 100644 index 0000000000000000000000000000000000000000..a574850ec6d0aa2ace4d5d5b1ffaa3a3c71b6759 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/cluster_time_summary/cluster_time_summary.py @@ -0,0 +1,186 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import pandas as pd + +from msprof_analyze.cluster_analyse.common_func.context import ConcurrentContext +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_exports.cluster_time_summary_export import CommunicationTimeExport +from msprof_analyze.prof_exports.cluster_time_summary_export import MemoryAndDispatchTimeExport +from msprof_analyze.prof_common.database_service import DatabaseService + +logger = get_logger() + + +class OverlapInfo: + def __init__(self, start, end, overlap_type): + self.start = start + self.end = end + self.type = overlap_type + + +class ClusterTimeSummary(BaseRecipeAnalysis): + COMPUTING_TYPE = 0 + COMMUNICATION_TYPE = 1 + MEMORY_TYPE = 4 + STEP_TRACE = "step_trace" + COMMUNICATION = "communication" + MEMORY_AND_DISPATCH = "memory_and_dispatch" + + def __init__(self, params): + super().__init__(params) + self.db_paths = self._get_rank_db() + self.stats_data = None + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + @staticmethod + def aggregate_stats(context: ConcurrentContext): + step_trace_df_list = [future.result() for future in context.future_dict[ClusterTimeSummary.STEP_TRACE]] + communication_df_list = [ + future.result() + for future in context.future_dict[ClusterTimeSummary.COMMUNICATION] + ] + memory_and_dispatch_df_list = [ + future.result() + for future in context.future_dict[ClusterTimeSummary.MEMORY_AND_DISPATCH] + ] + step_trace_df = pd.concat(step_trace_df_list, ignore_index=True) + communication_df = pd.concat(communication_df_list, ignore_index=True) + memory_and_dispatch_df = pd.concat(memory_and_dispatch_df_list, ignore_index=True) + communication_df["communicationTransmitStageTime"] = \ + communication_df.groupby(["groupName", "opName", "step"])["communication_time"].transform("min") + communication_df["communicationWaitStageTime"] = \ + communication_df["communication_time"] - communication_df["communicationTransmitStageTime"] + transmit_and_wait_df = communication_df.groupby(["rank", "step"])[ + ["communicationWaitStageTime", "communicationTransmitStageTime"]].sum().reset_index() + all_dfs = [step_trace_df, transmit_and_wait_df, memory_and_dispatch_df] + merged_df = all_dfs[0] + for df in all_dfs[1:]: + merged_df = pd.merge(merged_df, df, on=['rank', 'step'], how='outer') + # sort the merged DataFrame by rank and step + merged_df = merged_df.sort_values(by=['rank', 'step']) + merged_df["free"] = merged_df["free"] - merged_df["memoryNotOverlapComputationCommunication"] + merged_df = merged_df.rename(columns={ + 'computing': 'computation', + 'overlapped': 'communicationOverlapComputation', + 'communication_not_overlapped': 'communicationNotOverlapComputation'}) + return merged_df.sort_values(by=['rank', 'step'])
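aggregate_stats derives the two communication stages from raw per-rank times: within each (groupName, opName, step) group, the fastest rank's time is taken as the transmit stage, and each rank's excess over that minimum as its wait stage. A toy illustration of this transform-based split, with invented numbers:

    import pandas as pd

    # Three ranks running the same collective op in one step (made-up times).
    comm = pd.DataFrame({
        "groupName": ["g0"] * 3,
        "opName": ["hcom_allReduce__233_0_1"] * 3,
        "step": [1, 1, 1],
        "rank": [0, 1, 2],
        "communication_time": [120.0, 100.0, 150.0],
    })
    comm["communicationTransmitStageTime"] = (
        comm.groupby(["groupName", "opName", "step"])["communication_time"].transform("min")
    )
    comm["communicationWaitStageTime"] = (
        comm["communication_time"] - comm["communicationTransmitStageTime"]
    )
    # transmit is 100 for all ranks; ranks 0 and 2 waited 20 and 50 respectively
    print(comm[["rank", "communicationTransmitStageTime", "communicationWaitStageTime"]])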
+ + @classmethod + def get_memory_not_overlap(cls, df: pd.DataFrame): + memory_not_overlap_time = 0 # total memory (async copy) time that falls inside free periods + cur_block = OverlapInfo(df.iloc[0]["start"], df.iloc[0]["start"], -1) + for time_info in df.itertuples(): + if cur_block.type == cls.MEMORY_TYPE: + tmp_start = cur_block.start + tmp_end = cur_block.end if time_info.start > cur_block.end else time_info.start + if tmp_start < tmp_end: + memory_not_overlap_time += tmp_end - tmp_start + if time_info.start > cur_block.end: + cur_block.end = time_info.end + cur_block.type = time_info.type + cur_block.start = time_info.start + else: + cur_block.type = time_info.type if time_info.end > cur_block.end else cur_block.type + cur_block.start = cur_block.end if time_info.end > cur_block.end else time_info.end + cur_block.end = time_info.end if time_info.end > cur_block.end else cur_block.end + # account for the trailing block + if cur_block.type == cls.MEMORY_TYPE: + memory_not_overlap_time += cur_block.end - cur_block.start + return memory_not_overlap_time / Constant.TIME_UNIT_SCALE + + @classmethod + def calculate_dispatch_time(cls, df: pd.DataFrame) -> pd.DataFrame: + filtered_df = df[df['type'].isin([cls.COMPUTING_TYPE, cls.COMMUNICATION_TYPE])] + result = filtered_df.groupby(['step'])['dispatch'].mean().reset_index() + result = result.rename(columns={'dispatch': 'taskLaunchDelayAvgTime'}) + return result + + @classmethod + def calculate_memory_time(cls, df: pd.DataFrame) -> pd.DataFrame: + filtered_df = df[df['type'].isin([cls.MEMORY_TYPE])].copy() + filtered_df['memory'] = filtered_df['end'] - filtered_df['start'] + result = filtered_df.groupby(['step'])['memory'].sum().reset_index() + result['memory'] = result['memory'] / Constant.TIME_UNIT_SCALE + return result + + def calculate_step_trace_time(self, data_map, analysis_class): + analysis_db_path = data_map.get(Constant.ANALYSIS_DB_PATH) + rank_id = data_map.get(Constant.RANK_ID) + data_service = DatabaseService(analysis_db_path) + data_service.add_table_for_query(Constant.TABLE_STEP_TRACE, ["step", "computing", + "communication_not_overlapped", "overlapped", + "communication", "free", ]) + df = data_service.query_data().get(Constant.TABLE_STEP_TRACE) + if df is None or df.empty: + logger.warning(f"There is no stats data in {analysis_db_path}.") + return None + df.insert(0, "rank", rank_id) + df["step"] = df["step"].astype(int) + return df + + def calculate_communication_time(self, data_map, analysis_class): + analysis_db_path = data_map.get(Constant.PROFILER_DB_PATH) + df = CommunicationTimeExport(analysis_db_path, analysis_class).read_export_db() + return df + + def calculate_memory_and_dispatch_time(self, data_map, analysis_class): + """ + rank step memory computing_dispatch communication_dispatch + 0 1 120 150 200 + 0 2 130 150 200 + """ + profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) + rank_id = data_map.get(Constant.RANK_ID) + df = MemoryAndDispatchTimeExport(profiler_db_path, analysis_class).read_export_db() + if df is None or df.empty: + logger.warning(f"There is no stats data in {profiler_db_path}.") + return None + memory_df = ClusterTimeSummary.calculate_memory_time(df) + memory_not_overlap_df = (df.groupby(["step"]).apply(ClusterTimeSummary.get_memory_not_overlap).
+ reset_index(name="memoryNotOverlapComputationCommunication")) + dispatch_df = ClusterTimeSummary.calculate_dispatch_time(df) + result_df = pd.merge(memory_df, memory_not_overlap_df, on='step', how='inner') + result_df = pd.merge(result_df, dispatch_df, on='step', how='inner') + result_df.insert(0, "rank", rank_id) + return result_df + + def mapper_func(self, context: ConcurrentContext): + for db_map in self.db_paths: + context.submit(self.STEP_TRACE, self.calculate_step_trace_time, db_map, self._recipe_name) + context.submit(self.COMMUNICATION, self.calculate_communication_time, + db_map, self._recipe_name) + context.submit(self.MEMORY_AND_DISPATCH, self.calculate_memory_and_dispatch_time, + db_map, self._recipe_name) + + def run(self, context: ConcurrentContext): + logger.info("ClusterTimeSummary init.") + self.mapper_func(context) + context.wait_all_futures() + self.stats_data = self.aggregate_stats(context) + if self._export_type == Constant.DB: + self.save_db() + else: + logger.warning("cluster_time_summary only supports export db.") + + def save_db(self): + self.dump_data(self.stats_data, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER, + Constant.TABLE_CLUSTER_TIME_SUMMARY, index=False) diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/compute_op_sum/compute_op_sum.py b/profiler/msprof_analyze/cluster_analyse/recipes/compute_op_sum/compute_op_sum.py index 528534be399e3ceacadbe7d1acf7294d7b3ff37d..a5d44c3f17f6d2a31f097506c38d829c18d5d74f 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/compute_op_sum/compute_op_sum.py +++ b/profiler/msprof_analyze/cluster_analyse/recipes/compute_op_sum/compute_op_sum.py @@ -108,11 +108,10 @@ class ComputeOpSum(BaseRecipeAnalysis): def _mapper_func(self, data_map, analysis_class): profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) rank_id = data_map.get(Constant.RANK_ID) - step_range = data_map.get(Constant.STEP_RANGE) if self.exclude_op_name: - df = ComputeOpSumExportExcludeOpName(profiler_db_path, analysis_class, step_range).read_export_db() + df = ComputeOpSumExportExcludeOpName(profiler_db_path, analysis_class).read_export_db() else: - df = ComputeOpSumExport(profiler_db_path, analysis_class, step_range).read_export_db() + df = ComputeOpSumExport(profiler_db_path, analysis_class).read_export_db() if df is None or df.empty: logger.warning(f"There is no stats data in {profiler_db_path}.") return None diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/filter_db/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/filter_db/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b14094e3f9a77a0970342980ed8de1017f58ce19 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/filter_db/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
\ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/filter_db/filter_db.py b/profiler/msprof_analyze/cluster_analyse/recipes/filter_db/filter_db.py new file mode 100644 index 0000000000000000000000000000000000000000..29db8f637376fa04629e5b728a2aa53c9251944c --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/filter_db/filter_db.py @@ -0,0 +1,80 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import shutil + +from msprof_analyze.prof_common.db_manager import DBManager +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_common.path_manager import PathManager +from msprof_analyze.prof_exports.filter_db_export import OPFilter +from msprof_analyze.prof_exports.filter_db_export import TaskFilter +from msprof_analyze.prof_exports.filter_db_export import CANNFilter +from msprof_analyze.prof_exports.filter_db_export import PYTORCHFilter + +logger = get_logger() + +FILTER_COMPUTE = "COMPUTE_TASK_INFO" +FILTER_TASK = "TASK" +FILTER_CANN = "CANN_API" +FILTER_PYTORCH = "PYTORCH_API" + + +class DatabaseFilter(BaseRecipeAnalysis): + def __init__(self, params): + super().__init__(params) + logger.info("filter_db init.") + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + def run(self, context): + mapper_res = self.mapper_func(context) + logger.info("Filtering database completed.") + + def _mapper_func(self, data_map, analysis_class): + profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) + rank_id = data_map.get(Constant.RANK_ID) + + paths = profiler_db_path.split(os.path.sep) + sub_path = os.path.join(*paths[-3:-1]) + + output_path = os.path.join(self._output_path, "filter_db", sub_path) + PathManager.make_dir_safety(output_path) + + filtered_db = os.path.join(output_path, f"ascend_pytorch_profiler_{rank_id}.db") + shutil.copyfile(profiler_db_path, filtered_db) + + conn, cursor = DBManager.create_connect_db(filtered_db) + + op = OPFilter(filtered_db, analysis_class).read_export_db() + op.to_sql(FILTER_COMPUTE, conn, if_exists="replace", index=False) + task = TaskFilter(filtered_db, analysis_class).read_export_db() + task.to_sql(FILTER_TASK, conn, if_exists="replace", index=False) + cann = CANNFilter(filtered_db, analysis_class).read_export_db() + cann.to_sql(FILTER_CANN, conn, if_exists="replace", index=False) + pytorch = PYTORCHFilter(filtered_db, analysis_class).read_export_db() + pytorch.to_sql(FILTER_PYTORCH, conn, if_exists="replace", index=False) + + DBManager.execute_sql(conn, "DROP TABLE IF EXISTS COMMUNICATION_TASK_INFO;") + DBManager.execute_sql(conn, "DROP TABLE IF EXISTS TASK_PMU_INFO;") + + cursor.execute("VACUUM;") + conn.commit() + + DBManager.destroy_db_connect(conn, cursor) diff --git 
a/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/hccl_sum.py b/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/hccl_sum.py index 84ff40ac7e5d78d6ea30127739e18dfd1654e2c0..a78603ee0ac2894fb8b60a21f411e7fef9d144db 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/hccl_sum.py +++ b/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/hccl_sum.py @@ -128,10 +128,9 @@ class HcclSum(BaseRecipeAnalysis): def _mapper_func(self, data_map, analysis_class): profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) rank_id = data_map.get(Constant.RANK_ID) - step_range = data_map.get(Constant.STEP_RANGE) - df = HcclSumExport(profiler_db_path, analysis_class, step_range).read_export_db() + df = HcclSumExport(profiler_db_path, analysis_class).read_export_db() if df is None or df.empty: logger.warning(f"There is no stats data in {profiler_db_path}.") return None df["Rank"] = rank_id - return df + return df \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/stats.ipynb b/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/stats.ipynb index 51a08a854b97161ba8e88ec94809b728582d6631..87f8c6d736240531e2c28c0cf33df087ecfe38e8 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/stats.ipynb +++ b/profiler/msprof_analyze/cluster_analyse/recipes/hccl_sum/stats.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# COMMUNICATION Summary\n", + "# HCCL Summary\n", "\n", - "Cluster-level communication operator data analysis\n", + "Cluster-level HCCL operator data analysis\n", "\n", "Covers the following 3 statistics:\n", "1. Cluster-wide communication operator duration statistics, grouped by operator type\n", diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/mstx2commop/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/mstx2commop/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7101187a2c2619f3b1c20dded14b433950b4c662 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/mstx2commop/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2024, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import pandas as pd + +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.prof_common.db_manager import DBManager +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_exports.mstx2commop_export import Mstx2CommopExport +from msprof_analyze.prof_common.database_service import DatabaseService + +logger = get_logger() + +TABLE_COMMUNICATION_OP = "COMMUNICATION_OP" +TABLE_STRING_IDS = "STRING_IDS" + + +def double_hash(data): + uint32_bits = 32 + uint32_max = 0xFFFFFFFF # max value of a 32-bit unsigned integer + prime = [29, 131] + hash_values = [0, 0] + + for d in data: + hash_values[0] = (hash_values[0] * prime[0] + ord(d)) & uint32_max + hash_values[1] = (hash_values[1] * prime[1] + ord(d)) & uint32_max + + return ((hash_values[0] << uint32_bits) | hash_values[1])
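double_hash packs two independent 32-bit multiplicative rolling hashes (primes 29 and 131) into a single 64-bit value; the recipe later uses the last three decimal digits of this value as the groupId suffix. A quick sanity-check sketch, assuming the function above is in scope; the group name is invented for illustration:

    # Invented communication group name, purely for illustration.
    group_name = "tp_group_0"
    h = double_hash(group_name)
    print(h)            # a single 64-bit value packing both 32-bit hashes
    print(str(h)[-3:])  # last three decimal digits, used for the "_xxx" groupId suffix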
+ + +class Mstx2Commop(BaseRecipeAnalysis): + + def __init__(self, params): + super().__init__(params) + logger.info("Mstx2Commop init.") + self.communication_op = None + self.string_ids_insert = None + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + def run(self, context): + self.mapper_func(context) + + def _mapper_func(self, data_map, analysis_class): + profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) + data_service = DatabaseService(profiler_db_path) + data_service.add_table_for_query("ENUM_HCCL_DATA_TYPE", ["id", "name"]) + data_service.add_table_for_query("STRING_IDS", ["id", "value"]) + df_dict = data_service.query_data() + + df = Mstx2CommopExport(profiler_db_path, analysis_class).read_export_db() + + if df is None or df.empty: + logger.warning(f"There is no stats data in {profiler_db_path}.") + return None + + df_hccl_dt = df_dict.get("ENUM_HCCL_DATA_TYPE") + + if df_hccl_dt is None or df_hccl_dt.empty: + logger.warning(f"There is no stats data in {profiler_db_path}.") + return None + + df_string_ids = df_dict.get("STRING_IDS") + + if df_string_ids is None or df_string_ids.empty: + logger.warning(f"There is no stats data in {profiler_db_path}.") + return None + + df['value_list'] = df['value'].apply(lambda x: x.split(',')) + df['value_list_len'] = df['value_list'].apply(len) + df = df[df['value_list_len'] == 4] + df['opType_primal'] = df['value_list'].apply(lambda x: 'hcom_' + x[0][9:] + '_') + df['groupName_primal'] = df['value_list'].apply(lambda x: x[1]) + df['dataType'] = df['value_list'].apply(lambda x: x[2]) + df['count'] = df['value_list'].apply(lambda x: x[3]) + + df['groupName_hash'] = df['groupName_primal'].apply(double_hash).apply(str) + + df['gN_oT'] = df['groupName_primal'] + df['opType_primal'] + + gnot_set = set(list(df['gN_oT'])) + + df_concat = pd.DataFrame() + for g_o in gnot_set: + df_split = df[df['gN_oT'] == g_o].copy() + df_split['queue'] = list(range(len(df_split))) + df_concat = pd.concat([df_concat, df_split], axis=0) + + df_concat['queue'] = df_concat['queue'].apply(str) + + df_concat['groupId'] = df_concat['groupName_hash'].apply(lambda x: "_" + x[-3:]) + + df_concat['opName_primal'] = df_concat['opType_primal'] + df_concat['groupId'] + '_' + df_concat['queue'] + '_1' + + df_concat['opId'] = list(range(len(df_concat))) + df_concat['relay'] = None + df_concat['retry'] = None + df_concat['algType'] = None + + df_hccl_dt['name'] = df_hccl_dt['name'].apply(lambda x: x.lower()) + hccl_data_type_dict = dict(zip(df_hccl_dt['name'], df_hccl_dt['id'])) + + string_ids_dict = dict(zip(df_string_ids['value'], df_string_ids['id'])) + + string_ids_max = df_string_ids['id'].max() + + df_concat['dataType'] = df_concat['dataType'].apply(lambda x: hccl_data_type_dict[x]) + + df_concat['string_id_opType_primal'] = df_concat['opType_primal'].apply( + lambda x: 1 if x in string_ids_dict else 0) + df_concat['string_id_opName_primal'] = df_concat['opName_primal'].apply( + lambda x: 1 if x in string_ids_dict else 0) + df_concat['string_id_groupName_primal'] = df_concat['groupName_primal'].apply( + lambda x: 1 if x in string_ids_dict else 0) + optype_primal_list = list(set(df_concat[df_concat['string_id_opType_primal'] == 0]['opType_primal'])) + opname_primal_list = list(set(df_concat[df_concat['string_id_opName_primal'] == 0]['opName_primal'])) + groupname_primal_list = list(set(df_concat[df_concat['string_id_groupName_primal'] == 0]['groupName_primal'])) + + special_primal_list = optype_primal_list + opname_primal_list + groupname_primal_list + special_id_list = list(range(string_ids_max + 1, string_ids_max + len(special_primal_list) + 1)) + + special_id_dict = dict(zip(special_primal_list, special_id_list)) + + df_concat['opType'] = df_concat['opType_primal'].apply( + lambda x: string_ids_dict[x] if x in string_ids_dict else special_id_dict[x] + ) + df_concat['opName'] = df_concat['opName_primal'].apply( + lambda x: string_ids_dict[x] if x in string_ids_dict else special_id_dict[x] + ) + df_concat['groupName'] = df_concat['groupName_primal'].apply( + lambda x: string_ids_dict[x] if x in string_ids_dict else special_id_dict[x] + ) + + communication_op = df_concat[ + ['opName', 'startNs', 'endNs', 'connectionId', 'groupName', 'opId', 'relay', 'retry', 'dataType', 'algType', + 'count', 'opType']] + communication_op.sort_values('startNs', ascending=True, inplace=True) + communication_op.set_index('opId', inplace=True) + string_ids_insert = list(map(list, zip(special_id_list, special_primal_list))) + + DBManager.insert_data_into_db(data_map.get(Constant.PROFILER_DB_PATH), TABLE_STRING_IDS, string_ids_insert) + + self.dump_data(data=communication_op, file_name=data_map.get(Constant.PROFILER_DB_PATH), + table_name=TABLE_COMMUNICATION_OP, custom_db_path=data_map.get(Constant.PROFILER_DB_PATH)) + + return data_map.get(Constant.RANK_ID) diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/mstx_sum/mstx_sum.py b/profiler/msprof_analyze/cluster_analyse/recipes/mstx_sum/mstx_sum.py index bfbcc6ffb49c6457cd54a9413e8bf7a145ec365b..69b4b056850b85a856634c7feb0121cfcb34494b 100644 --- a/profiler/msprof_analyze/cluster_analyse/recipes/mstx_sum/mstx_sum.py +++ b/profiler/msprof_analyze/cluster_analyse/recipes/mstx_sum/mstx_sum.py @@ -154,11 +154,10 @@ class MstxSum(BaseRecipeAnalysis): def _mapper_func(self, data_map, analysis_class): profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) rank_id = data_map.get(Constant.RANK_ID) - step_range = data_map.get(Constant.STEP_RANGE) - step_df = MstxStepExport(profiler_db_path, analysis_class, step_range).read_export_db() + step_df = MstxStepExport(profiler_db_path, analysis_class).read_export_db() if step_df is None or step_df.empty: step_df = pd.DataFrame({"start_ns": [0], "end_ns": [float("inf")], "step_id": [0]}) - mark_df = MstxMarkExport(profiler_db_path, analysis_class, step_range).read_export_db() + mark_df = MstxMarkExport(profiler_db_path, analysis_class).read_export_db() if mark_df is None or mark_df.empty: logger.warning(f"There is no mark data in
{profiler_db_path}.") return None @@ -195,4 +194,4 @@ class MstxSum(BaseRecipeAnalysis): mark_stats_df["step_id"] = mark_stats_df.apply(compute_step_id, axis=1, step_stats_df=step_df) rename_mark_msg_name(mark_stats_df) mark_stats_df = format_columns(mark_stats_df).set_index("Name", drop=True) - return mark_stats_df + return mark_stats_df \ No newline at end of file diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/p2p_pairing/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/p2p_pairing/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a355e5a7f08206fc39dda4646817224c067f29f7 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/p2p_pairing/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/p2p_pairing/p2p_pairing.py b/profiler/msprof_analyze/cluster_analyse/recipes/p2p_pairing/p2p_pairing.py new file mode 100644 index 0000000000000000000000000000000000000000..b3cce9d214ebbd62b13494c9d68c9bdfe9629d3b --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/p2p_pairing/p2p_pairing.py @@ -0,0 +1,243 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +from json import JSONDecodeError + +import numpy as np +import pandas as pd + +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.cluster_analyse.common_func.table_constant import ProfilerTableConstant +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.db_manager import DBManager +from msprof_analyze.prof_common.file_manager import FileManager +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_exports.p2p_pairing_export import P2PPairingExport + + +logger = get_logger() + + +class P2PPairing(BaseRecipeAnalysis): + + P2P_OP_NAME_PATTERN = r"^hcom_([Ss]end|[Rr](ecv|eceive))__\d+_\d+_\d+$" + DOMAIN_ID_EXTRACT_PATTERN = r"__(\d+)_\d+_\d+" + RECEIVE_OP_MATCH_PATTERN = r"[Rr]ecv|[Rr]eceive" + VALID_DST_RANK_TASK_TYPE = [Constant.NOTIFY_RECORD, Constant.NOTIFY_WAIT] + # intermediate dataframe column names + COL_NAME_IS_UNIQUE_VALUE = "isUniqueValue" + COL_NAME_OP_DST_RANK = "opDstRank" + COL_NAME_DOMAIN_ID = "domainId" + COL_NAME_IS_RECEIVE = "isReceive" + COL_NAME_OP_NAMING_INDEX = "opNamingIndex" + # output column name + COL_NAME_P2P_CONNECTION_ID = "opConnectionId" + # export params + TARGET_TABLE_NAME = Constant.TABLE_COMMUNICATION_OP + + def __init__(self, params): + super().__init__(params) + logger.info("P2PPairing init.") + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + def run(self, context): + self.mapper_func(context) + logger.info("P2PPairing completed.") + + def update_connection_info_to_table(self, df_result, profiler_db_path): + """ + Write the generated connection IDs into the COMMUNICATION_OP table as a new column `opConnectionId`. + Only Send and Recv operators are handled for now: their opIds get a concrete connection ID, all other + rows are set to NULL. + """ + conn, cursor = DBManager.create_connect_db(profiler_db_path) + ret = DBManager.check_columns_exist(cursor, self.TARGET_TABLE_NAME, {self.COL_NAME_P2P_CONNECTION_ID}) + if ret is None: + logger.error("Failed to connect to the database. Please check the database configurations") + return + if self.COL_NAME_P2P_CONNECTION_ID in ret: + logger.error(f"`{self.COL_NAME_P2P_CONNECTION_ID}` already exists in the {self.TARGET_TABLE_NAME}. " + f"Exiting to prevent result overwrite.") + return + DBManager.execute_sql( + conn, + f"ALTER TABLE {self.TARGET_TABLE_NAME} ADD COLUMN {self.COL_NAME_P2P_CONNECTION_ID} TEXT" + ) + DBManager.execute_sql( + conn, + f"UPDATE {self.TARGET_TABLE_NAME} SET {self.COL_NAME_P2P_CONNECTION_ID} = NULL" + ) + DBManager.executemany_sql( + conn, + f""" + UPDATE {self.TARGET_TABLE_NAME} + SET {self.COL_NAME_P2P_CONNECTION_ID} = ? + WHERE {ProfilerTableConstant.OP_ID} = ?;""", + [(row[self.COL_NAME_P2P_CONNECTION_ID], row[P2PPairingExport.CO_OP_NAME]) + for _, row in df_result.iterrows()] + ) + DBManager.destroy_db_connect(conn, cursor) + + def generate_p2p_connection_index(self, df): + """ + Generate a connection ID for every P2P operator, following the pattern + `domain_sendRank_recvRank_opIndex`: the domain is the last three digits of the communication-domain + hash; the send and recv ranks are the local rank numbers within that domain; the op index counts, in + timeline order, how many Send/Recv operators have already occurred between this pair of ranks. For + example, for an operator named `hcom_send_233_58_1` whose local rank is 0 and whose peer rank is 1, + with no earlier Send from rank 0 to rank 1, the generated ID is `233_0_1_0`. + """ + df[self.COL_NAME_DOMAIN_ID] = df[P2PPairingExport.OP_NAME]. \ + str.extract(self.DOMAIN_ID_EXTRACT_PATTERN)[0] + df[self.COL_NAME_IS_RECEIVE] = df[P2PPairingExport.OP_NAME]. \ + str.contains(self.RECEIVE_OP_MATCH_PATTERN) + df.loc[ + df[self.COL_NAME_IS_RECEIVE], [P2PPairingExport.SRC_RANK, self.COL_NAME_OP_DST_RANK] + ] = df.loc[ + df[self.COL_NAME_IS_RECEIVE], [self.COL_NAME_OP_DST_RANK, P2PPairingExport.SRC_RANK] + ].values + df[self.COL_NAME_OP_NAMING_INDEX] = df.sort_values(by=[P2PPairingExport.START_TIME]). \ + groupby([P2PPairingExport.SRC_RANK, self.COL_NAME_OP_DST_RANK]).cumcount() + df[self.COL_NAME_P2P_CONNECTION_ID] = (df[self.COL_NAME_DOMAIN_ID].astype(str) + "_" + + df[P2PPairingExport.SRC_RANK].astype(str) + "_" + + df[self.COL_NAME_OP_DST_RANK].astype(str) + "_" + + df[self.COL_NAME_OP_NAMING_INDEX].astype(str)) + return df.reset_index() + + def fine_filtering_src_dst_ranks(self, df: pd.DataFrame): + """ + Fine-grained filtering of candidate rows: + 1. keep sub-task rows of type Notify_Record / Notify_Wait; + 2. check that the destination rank is consistent within each op, emitting a warning otherwise; + 3. check that the source rank from step 1 is consistent, logging an error and returning None otherwise. + """ + df = df[df[P2PPairingExport.TASK_TYPE].isin(self.VALID_DST_RANK_TASK_TYPE)] + + def check_dst_rank_unique(group): + return group[P2PPairingExport.DST_RANK].nunique() == 1 + + unique_dst_rank: pd.DataFrame = (df.groupby(P2PPairingExport.OP_NAME) + .apply(check_dst_rank_unique, include_groups=False)) + + def get_dst_rank_value(group): + if group[P2PPairingExport.DST_RANK].nunique() == 1: + return group[P2PPairingExport.DST_RANK].iloc[0] + return np.nan + + dst_rank_value: pd.DataFrame = (df.groupby(P2PPairingExport.OP_NAME, group_keys=False). + apply(get_dst_rank_value, include_groups=False)) + + df = df.copy() + df[self.COL_NAME_IS_UNIQUE_VALUE] = df[P2PPairingExport.OP_NAME].map(unique_dst_rank) + df[self.COL_NAME_OP_DST_RANK] = df[P2PPairingExport.OP_NAME].map(dst_rank_value) + df[self.COL_NAME_OP_DST_RANK] = df[self.COL_NAME_OP_DST_RANK].fillna(Constant.INVALID_RANK_NUM) + df[self.COL_NAME_OP_DST_RANK] = df[self.COL_NAME_OP_DST_RANK].astype(df[P2PPairingExport.DST_RANK].dtype) + + check_dst_rank_unique_false: pd.DataFrame = df[~df[self.COL_NAME_IS_UNIQUE_VALUE]] + if not check_dst_rank_unique_false.empty: + logger.warning(f"There are communication op entries with multiple destination ranks! " + f"Please check the corresponding profiler database file.") + + df = df[df[self.COL_NAME_IS_UNIQUE_VALUE]] + + src_rank_unique_values: int = df[P2PPairingExport.SRC_RANK].nunique() + if src_rank_unique_values != 1: + logger.error(f"There are communication op entries with multiple source ranks! " + f"Please check the corresponding profiler database file.") + return None + return df.reset_index() + + def filter_data_by_group_name(self, df: pd.DataFrame): + """ + Coarse filtering of the target data: + 1. keep Send and Recv operators; + 2. keep rows whose groupName in COMMUNICATION_OP matches the groupName in COMMUNICATION_TASK_INFO + for the same opId. + """ + df = df[df[P2PPairingExport.OP_NAME].str.match(self.P2P_OP_NAME_PATTERN)] + filtered_df = df[df[P2PPairingExport.CO_GROUP_NAME] == df[P2PPairingExport.CTI_GROUP_NAME]] + anomaly_group_match = df[df[P2PPairingExport.CO_GROUP_NAME] != df[P2PPairingExport.CTI_GROUP_NAME]] + if not anomaly_group_match.empty: + logger.warning(f"Group name mismatch in {len(anomaly_group_match)} entries. Please check the" + f" profiler database in communication task info.") + return filtered_df.reset_index() + + def extract_pp_group_from_metadata(self, profiler_parent_path) -> any: + """ + Read pipeline-parallel (pp) communication-domain info from profiler_metadata.json. + """ + metadata_path = os.path.join(profiler_parent_path, Constant.PROFILER_METADATA) + try: + if os.path.exists(metadata_path): + metadata = FileManager.read_json_file(metadata_path) + parallel_group_info: dict = metadata.get(Constant.PARALLEL_GROUP_INFO, None) if metadata else None + else: + raise FileNotFoundError(f"No `{Constant.PROFILER_METADATA}` found in {profiler_parent_path}.") + except (FileNotFoundError, JSONDecodeError) as e: + logger.error(f"Failed to load profiler metadata: {e}") + return None + + if parallel_group_info is None: + logger.error(f"No key name `{Constant.PARALLEL_GROUP_INFO}` found in {metadata_path}") + return None + + pp_group_info = [] + for name in parallel_group_info: + each_group_info: dict = parallel_group_info[name] + if each_group_info[Constant.GROUP_NAME] == Constant.PP: + pp_group_info.append(parallel_group_info[name]) + if not pp_group_info: + logger.error(f"No pipeline parallel info found in {metadata_path}") + return None + + return pp_group_info + + def _mapper_func(self, data_map, analysis_class): + profiler_db_path: str = data_map.get(Constant.PROFILER_DB_PATH) + profiler_parent_path: str = os.path.dirname(os.path.dirname(profiler_db_path)) + + df: pd.DataFrame = P2PPairingExport(profiler_db_path, analysis_class).read_export_db() + if df is None or df.empty: + logger.warning(f"There is no stats data in {profiler_db_path}.") + return None + + pp_group_info = self.extract_pp_group_from_metadata(profiler_parent_path) # not used yet; reserved for later global-rank validation + if pp_group_info is None: + logger.error(f"Cannot obtain pipeline parallel info from the metadata. " + f"Please check the corresponding {Constant.PROFILER_METADATA}") + + df = self.filter_data_by_group_name(df) + if df.empty: + return None + + df_filtered = self.fine_filtering_src_dst_ranks(df.copy()) + if df_filtered is None: + logger.error("Got error when trying to match rank numbers!") + return None + + df_result = df_filtered.groupby([P2PPairingExport.OP_NAME, P2PPairingExport.CO_OP_NAME]).agg( + { + P2PPairingExport.START_TIME: "first", + P2PPairingExport.SRC_RANK: "first", + self.COL_NAME_OP_DST_RANK: "first" + } + ).reset_index() + + df_result = self.generate_p2p_connection_index(df_result) + + df_result = df_result[[P2PPairingExport.CO_OP_NAME, self.COL_NAME_P2P_CONNECTION_ID]] + + self.update_connection_info_to_table(df_result, profiler_db_path) + return data_map.get(Constant.RANK_ID) diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/slow_link.py b/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/slow_link.py new file mode 100644 index 0000000000000000000000000000000000000000..f2c5e5fe7d8004687a8cd0ab6eae929659b662b9 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/slow_link.py @@ -0,0 +1,216 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# http://www.apache.org/licenses/LICENSE-2.0 +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +from collections import defaultdict + +import pandas as pd +import numpy as np +from tqdm import tqdm + +from msprof_analyze.cluster_analyse.common_func.utils import describe_duration +from msprof_analyze.cluster_analyse.common_func.utils import detect_outliers_z_score +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_exports.slow_link_export import SlowLinkExport + +logger = get_logger() + + +class SlowLink(BaseRecipeAnalysis): + TABLE_SLOW_LINK_SUM = "SlowLinkSum" + TABLE_SLOW_LINK_OPS = "SlowLinkOps" + + TOP_NUM = "top_num" + DEFAULT_TOP_NUM = 15 + + def __init__(self, params): + super().__init__(params) + logger.info("SlowLink init.") + self.slow_link_sum = [] + self.slow_link_ops = [] + top_num = self._extra_args.get(self.TOP_NUM, self.DEFAULT_TOP_NUM) + self.top_num = int(top_num) if isinstance(top_num, str) and top_num.isdigit() else self.DEFAULT_TOP_NUM + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + @classmethod + def add_parser_argument(cls, parser): + parser.add_argument("--top_num", type=str, help="Duration cost top count", default=cls.DEFAULT_TOP_NUM) + + def merge_func(self, mapper_res): + # drop None entries from mapper_res + mapper_res = list(filter(lambda df: df is not None, mapper_res)) + + # if nothing is left after filtering, log an error and return + if not mapper_res: + logger.error("Mapper data is empty. Please check the input or data source.") + return + dataframes = [pd.DataFrame(item) for item in mapper_res] + mapper_res = pd.concat(dataframes, ignore_index=True) + # pull the individual columns out of mapper_res + rank_id_arr = mapper_res["rankId"].values # rankId array + num_ranks = len(rank_id_arr) # number of records + group_name_arr = mapper_res["groupName"].values # groupName array + communication_time_arr = mapper_res["communicationTime"].values # communication time array + op_name_arr = mapper_res["opName"].values # op name array + + # dicts/arrays that hold the grouping results + process_group = defaultdict(lambda: defaultdict(list)) # indices grouped by (groupName, opName) + transmit_time_arr = np.zeros(num_ranks, dtype=np.int64) # transmit time array + related_ranks_arr = np.zeros(num_ranks, dtype=np.int32) # related-rank count array + + # group all records by groupName and opName + for idx in range(num_ranks): + # skip point-to-point send/receive operators + if "send" in op_name_arr[idx] or "receive" in op_name_arr[idx]: + continue + # add the current index to its group + process_group[group_name_arr[idx]][op_name_arr[idx]].append(idx) + + # for each group, compute the transmit time and the related-rank count + for _, ops_same_group in tqdm(process_group.items(), desc="Processing database data..."): + for _, ops in ops_same_group.items(): + # communication times of all ops in this group + communication_time_list = [communication_time_arr[op_idx] for op_idx in ops] + # the minimum communication time is taken as the transmit time + transmit_time = min(communication_time_list) + # the group size is the number of related ranks + related_ranks_num = len(ops) + + # write the transmit time and related-rank count back + for op_idx in ops: + transmit_time_arr[op_idx] = transmit_time + related_ranks_arr[op_idx] = related_ranks_num + + # append the computed transmit time and related-rank count to mapper_res + mapper_res.insert(mapper_res.shape[1], 'transmitTime', transmit_time_arr) + mapper_res.insert(mapper_res.shape[1], 'relatedRanks', related_ranks_arr) + + # run outlier filtering on mapper_res + self.filter_func(mapper_res) + + def filter_func(self, mapper_res): + """ + Group the data and detect outliers. + """ + # group by opType, dataSize and relatedRanks + grouped = mapper_res.groupby(['opType', 'dataSize', 'relatedRanks']) + + for _, group in grouped: + # transmit times of this group + transmit_time_data = group['transmitTime'].values + + # detect outliers + outliers = detect_outliers_z_score(transmit_time_data) + + if outliers: + # if outliers exist, keep the whole group in slow_link_ops + self.slow_link_ops.append(group) + + if self.slow_link_ops: + self.slow_link_ops = pd.concat(self.slow_link_ops, ignore_index=True) + # reset the index and drop the redundant index column + data = pd.DataFrame(self.slow_link_ops) + + # group by 'opType', 'dataSize', 'relatedRanks' + grouped = data.groupby(['opType', 'dataSize', 'relatedRanks']) + + # summary statistics + group_data = describe_duration(grouped['transmitTime']) + + # find the rankIds holding the min/max transmit time in each group + min_rank = grouped['transmitTime'].idxmin().map(data['rankId']) + max_rank = grouped['transmitTime'].idxmax().map(data['rankId']) + + # attach those rankIds to group_data + group_data['maxRank'] = max_rank.values + group_data['minRank'] = min_rank.values + + # build the filtering name + group_data['opTypeRelatedRanksDataSize'] = group_data.index.map(lambda x: f"{x[0]}{x[2]}_{x[1]}") + # move it to the first column + cols = ['opTypeRelatedRanksDataSize'] + [col for col in group_data.columns if + col != 'opTypeRelatedRanksDataSize'] + group_data = group_data[cols] + + # reset the index + group_data = group_data.reset_index(drop=True) + # absolute deviation of max and min from the mean + group_data['abs_max_mean'] = abs(group_data['MaxNs'] - group_data['MeanNs']) + group_data['abs_min_mean'] = abs(group_data['MinNs'] - group_data['MeanNs']) + + # the larger of the two deviations + group_data['max_abs_mean'] = group_data[['abs_max_mean', 'abs_min_mean']].max(axis=1) + + # offset ratio + group_data['offsetRatio'] = group_data['max_abs_mean'] / group_data['StdNs'] + + # sort by offset ratio, descending + group_data = group_data.sort_values(by='offsetRatio', ascending=False) + + # keep the top N records by offset ratio (self.top_num) + group_data = group_data.head(self.top_num) + + # drop the helper columns 'abs_max_mean', 'abs_min_mean', 'max_abs_mean' + group_data = group_data.drop(columns=['abs_max_mean', 'abs_min_mean', 'max_abs_mean']) + + # reorder the columns: offsetRatio goes before maxRank and minRank + columns = [col for col in group_data.columns if col not in ['maxRank', 'minRank', 'offsetRatio']] + columns.insert(len(columns), 'offsetRatio') # offsetRatio ends up third from last + columns.extend(['maxRank', 'minRank']) # maxRank and minRank go last + + # apply the new column order + group_data = group_data[columns] + + # store the final group_data as the summary result + self.slow_link_sum = group_data + + def run(self, context): + if self.top_num <= 0: + logger.warning(f"SlowLink: top_num is set to an invalid value, " + f"it will be reset to default value({self.DEFAULT_TOP_NUM}).") + self.top_num = self.DEFAULT_TOP_NUM + mapper_res = self.mapper_func(context) + self.merge_func(mapper_res) + + if self._export_type == "db": + self.save_db() + elif self._export_type == "notebook": + self.save_notebook() + else: + logger.error("Unknown export type.") + + def save_notebook(self): + self.dump_data(self.slow_link_sum, "slow_link_sum.csv", index=False) + self.dump_data(self.slow_link_ops, "slow_link_ops.csv", index=False) + self.create_notebook("stats.ipynb") + self.add_helper_file("cluster_display.py") + + def save_db(self): + self.dump_data(self.slow_link_sum, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER, self.TABLE_SLOW_LINK_SUM, + index=False) + self.dump_data(self.slow_link_ops, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER, self.TABLE_SLOW_LINK_OPS, + index=False) + + def _mapper_func(self, data_map, analysis_class): + profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH) + rank_id = data_map.get(Constant.RANK_ID) + df = SlowLinkExport(profiler_db_path, analysis_class).read_export_db() + if df is None or df.empty: + logger.warning(f"There is no stats data in {profiler_db_path}.") + return None + df.insert(0, "rankId", rank_id) + return df \ No newline at end of file
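filter_func keeps a (opType, dataSize, relatedRanks) group only when detect_outliers_z_score flags at least one transmit time. That helper lives in common_func/utils.py and is not part of this diff; a self-contained sketch of the usual z-score rule, with an assumed threshold (small samples rarely exceed |z| = 2, so a low threshold is used here purely for illustration):

    import numpy as np

    def zscore_outliers(values, threshold=1.5):
        # Flag values more than `threshold` standard deviations from the mean.
        arr = np.asarray(values, dtype=float)
        std = arr.std()
        if std == 0:
            return []
        z = (arr - arr.mean()) / std
        return arr[np.abs(z) > threshold].tolist()

    print(zscore_outliers([100, 105, 110, 102, 480]))  # -> [480.0]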
diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/stats.ipynb b/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/stats.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..30edbc245379aa6b02e8895427bc7ad5db6656b3 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/slow_link/stats.ipynb @@ -0,0 +1,111 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SLOWLINK Summary\n", + "\n", + "Cluster-level slow-rank / slow-link data analysis\n", + "\n", + "Covers the following 2 statistics:\n", + "1. Cluster-wide communication operator duration statistics, grouped by operator type\n", + "2. Cluster-wide anomalous opType_relatedRanks_dataSize combinations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data preparation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display, HTML\n", + "display(HTML(\"\"))\n", + "\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import pandas as pd\n", + "pd.set_option(\"display.max_rows\", 100)\n", + "pd.set_option(\"display.width\", 1000)\n", + "\n", + "import cluster_display\n", + "\n", + "slow_link_ops_df = pd.read_csv(\"slow_link_ops.csv\")\n", + "slow_link_sum_df = pd.read_csv(\"slow_link_sum.csv\", index_col=\"opTypeRelatedRanksDataSize\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cluster_display.display_transmittime_bar(slow_link_ops_df, 0.05, 'hcom_allGather_', 5, 1024)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "### Analysis of cluster-wide anomalous opType_relatedRanks_dataSize\n", + "\n", + "Statistics for anomalous opType_relatedRanks_dataSize combinations across the cluster; time unit: microseconds (us)\n", + "\n", + "Includes the following statistics:\n", + "- Count: number of operators\n", + "- Mean: average duration\n", + "- Std: standard deviation\n", + "- Min: minimum\n", + "- Q1: first quartile\n", + "- Median: median\n", + "- Q3: third quartile\n", + "- Max: maximum\n", + "- Sum: total duration\n", + "- MinRank: rank hosting the fastest operator\n", + "- MaxRank: rank hosting the slowest operator" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "display(slow_link_sum_df)\n", + "fig_slow_link_ops = cluster_display.display_duration_boxplots(None, slow_link_sum_df, x_title=\"opTypeRelatedRanksDataSize\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.8" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/slow_rank_pp_stage/__init__.py b/profiler/msprof_analyze/cluster_analyse/recipes/slow_rank_pp_stage/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a355e5a7f08206fc39dda4646817224c067f29f7 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/slow_rank_pp_stage/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
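The recipe that follows votes on slow ranks per pipeline stage. Its map_rank_pp_stage helper assumes the Megatron-style rank layout in which tp_size * dp_size consecutive ranks form one pipeline stage; a compact, equivalent sketch of that mapping (the sizes below are invented):

    # With tp=2, pp=2, dp=2: ranks 0-3 -> stage 0, ranks 4-7 -> stage 1.
    def map_rank_pp_stage(tp_size, pp_size, dp_size):
        # rank // (tp_size * dp_size) gives the pipeline stage index
        return {rank: rank // (tp_size * dp_size)
                for rank in range(tp_size * pp_size * dp_size)}

    print(map_rank_pp_stage(2, 2, 2))
    # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}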
diff --git a/profiler/msprof_analyze/cluster_analyse/recipes/slow_rank_pp_stage/slow_rank_pp_stage.py b/profiler/msprof_analyze/cluster_analyse/recipes/slow_rank_pp_stage/slow_rank_pp_stage.py new file mode 100644 index 0000000000000000000000000000000000000000..fd5bdc05dc04156fc22ff51f0c19e0d2dba64190 --- /dev/null +++ b/profiler/msprof_analyze/cluster_analyse/recipes/slow_rank_pp_stage/slow_rank_pp_stage.py @@ -0,0 +1,295 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import json +from collections import defaultdict + +import pandas as pd + +from msprof_analyze.cluster_analyse.recipes.base_recipe_analysis import BaseRecipeAnalysis +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.logger import get_logger +from msprof_analyze.prof_exports.cluster_time_summary_export import CommunicationTimeExport +from msprof_analyze.prof_common.database_service import DatabaseService + +logger = get_logger() + + +class SlowRankPPStageAnalysis(BaseRecipeAnalysis): + TP_SIZE = "tensor_model_parallel_size" + PP_SIZE = "pipeline_model_parallel_size" + DP_SIZE = "data_parallel_size" + + def __init__(self, params): + super().__init__(params) + logger.info("SlowRank PPstage analysis init.") + + self.p2p_analysis_result = None + self.pp_analysis_result = None + self.p2p_vote_result = None + self.pp_vote_result = None + + self.distributed_args = self.load_distributed_args() + + @property + def base_dir(self): + return os.path.basename(os.path.dirname(__file__)) + + @classmethod + def add_parser_argument(cls, parser): + parser.add_argument("--tp", type=int, help=cls.TP_SIZE, default=None) + parser.add_argument("--pp", type=int, help=cls.PP_SIZE, default=None) + parser.add_argument("--dp", type=int, help=cls.DP_SIZE, default=None) + + def reducer_func(self, mapper_res): + mapper_res = list(filter(lambda df: df is not None, mapper_res)) + if not mapper_res: + logger.error("Mapper data is None.") + return None + concated_df = pd.concat(mapper_res) + return concated_df + + def run(self, context): + if self.distributed_args is None: + return + mapper_res = self.mapper_func(context) + comm_ops_df = self.reducer_func(mapper_res) + if comm_ops_df is None: + return + + p2p_analysis_result_list = [] + p2p_vote_result_list = [] + pp_analysis_result_list = [] + pp_vote_result_list = [] + + pp_stage_rank_map = self.map_rank_pp_stage() + + for _, df_one_step in comm_ops_df.groupby("step"): + p2p_analysis_result, p2p_vote_result, pp_analysis_result, pp_vote_result = \ + SlowRankPPStageStepAnalysis(df_one_step).analysis(pp_stage_rank_map) + p2p_analysis_result_list.append(p2p_analysis_result) + p2p_vote_result_list.append(p2p_vote_result) + pp_analysis_result_list.append(pp_analysis_result) + pp_vote_result_list.append(pp_vote_result) + + for step_id, (p2p_analysis_result, p2p_vote_result, pp_analysis_result, pp_vote_result) in \ + enumerate( + zip( + p2p_analysis_result_list, + p2p_vote_result_list, 
+                pp_analysis_result_list,
+                pp_vote_result_list
+            )):
+            p2p_analysis_result["step"] = step_id
+            p2p_vote_result["step"] = step_id
+            pp_analysis_result["step"] = step_id
+            pp_vote_result["step"] = step_id
+
+        self.p2p_analysis_result = pd.concat(p2p_analysis_result_list)
+        self.p2p_vote_result = pd.concat(p2p_vote_result_list)
+        self.pp_analysis_result = pd.concat(pp_analysis_result_list)
+        self.pp_vote_result = pd.concat(pp_vote_result_list)
+
+        if self._export_type == Constant.DB:
+            self.save_db()
+        else:
+            logger.error("SlowRankPPStage analysis does not support the notebook export type.")
+
+    def save_db(self):
+        self.dump_data(self.p2p_vote_result, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER, "P2PAnalysisResult")
+        self.dump_data(self.pp_vote_result, Constant.DB_CLUSTER_COMMUNICATION_ANALYZER, "PPAnalysisResult")
+
+    def map_rank_pp_stage(self):
+        tp_size = self.distributed_args.get(self.TP_SIZE, 1)
+        pp_size = self.distributed_args.get(self.PP_SIZE, 1)
+        dp_size = self.distributed_args.get(self.DP_SIZE, 1)
+
+        # Ranks are assigned to pp stages in blocks of tp_size * dp_size consecutive ranks.
+        rank_pp_stage_map = {}
+        rank = 0
+        for i in range(pp_size):
+            for _ in range(tp_size * dp_size):
+                rank_pp_stage_map[rank] = i
+                rank += 1
+        return rank_pp_stage_map
+
+    def load_distributed_args(self):
+        tp_size = self._extra_args.get("tp", None)
+        pp_size = self._extra_args.get("pp", None)
+        dp_size = self._extra_args.get("dp", None)
+
+        if tp_size and pp_size and dp_size:
+            if tp_size <= 0 or pp_size <= 0 or dp_size <= 0:
+                logger.error("Invalid distributed_args: tp, pp and dp must be positive integers.")
+                return None
+            return {
+                self.TP_SIZE: tp_size,
+                self.DP_SIZE: dp_size,
+                self.PP_SIZE: pp_size,
+            }
+        else:
+            rank_id = list(self._data_map.keys())[0]
+            profiler_db_path = self._data_map[rank_id]
+            db_path = os.path.join(profiler_db_path, Constant.SINGLE_OUTPUT, f"ascend_pytorch_profiler_{rank_id}.db")
+            if os.path.exists(db_path):
+                try:
+                    service = DatabaseService(db_path)
+                    service.add_table_for_query("META_DATA", ["name", "value"])
+                    df = service.query_data().get("META_DATA", None)
+                    distributed_args = df.loc[df["name"] == "distributed_args", "value"]
+                    if distributed_args.empty:
+                        logger.error("Distributed args not in profiling files, please input manually.")
+                        return None
+                    distributed_args = json.loads(distributed_args.values[0])
+                except Exception as err:
+                    logger.error(err)
+                    logger.error("Distributed args not in profiling files, please input manually.")
+                    return None
+
+                tp_size = distributed_args.get(self.TP_SIZE, 1)
+                pp_size = distributed_args.get(self.PP_SIZE, 1)
+                dp_size = distributed_args.get(self.DP_SIZE, 1)
+                if not isinstance(tp_size, int) or not isinstance(pp_size, int) or not isinstance(dp_size, int):
+                    logger.error("Invalid distributed_args in profiling files, please input manually.")
+                    return None
+                if tp_size <= 0 or pp_size <= 0 or dp_size <= 0:
+                    logger.error("Invalid distributed_args in profiling files, please input manually.")
+                    return None
+                return {
+                    self.TP_SIZE: tp_size,
+                    self.PP_SIZE: pp_size,
+                    self.DP_SIZE: dp_size,
+                }
+
+            logger.error(f"Db_file: {db_path} does not exist.")
+            return None
+
+    def _mapper_func(self, data_map, analysis_class):
+        profiler_db_path = data_map.get(Constant.PROFILER_DB_PATH)
+        df = CommunicationTimeExport(profiler_db_path, analysis_class).read_export_db()
+        return df
+
+
+class SlowRankPPStageStepAnalysis:
+    def __init__(self, comm_ops):
+        self.comm_ops = comm_ops
+        self.exclude_ranks = []
+
+    def grouping_pp_stage_ops(self, pp_stage_rank_map):
+        p2p_op_group = defaultdict(lambda: defaultdict(list))
+        pp_op_group = defaultdict(lambda: defaultdict(list))
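+        # Normalize each op name to "OPTYPE_IDX" and bucket it by pp stage:
+        # send/receive ops form the p2p group, all other ops form the pp group.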
+
+        def divid_opname(op_name):
+            # op_name format: input looks like OPTYPE__GROUPHASH_IDX_1, output is OPTYPE_IDX
+            splited_name = op_name.split("__")
+            if len(splited_name) != 2:
+                return None
+            splited_num = splited_name[1].split("_")
+            if len(splited_num) != 3:
+                return None
+            return "_".join([splited_name[0], splited_num[1]])
+
+        ops_num = len(self.comm_ops)
+        op_name_arr = self.comm_ops["opName"].values
+        rank_id_arr = self.comm_ops["rank"].values
+        for idx in range(ops_num):
+            rank = rank_id_arr[idx]
+            op_name = op_name_arr[idx]
+            op_name_short = divid_opname(op_name)
+            if op_name_short is None:
+                continue
+            if rank in self.exclude_ranks:
+                continue
+            pp_stage_idx = pp_stage_rank_map[rank]
+            if "send" in op_name_short or "receive" in op_name_short:
+                p2p_op_group[pp_stage_idx][op_name_short].append(idx)
+            else:
+                pp_op_group[pp_stage_idx][op_name_short].append(idx)
+
+        return p2p_op_group, pp_op_group
+
+    def analysis_pp_stage(self, vote_group):
+        min_time_dict = defaultdict(lambda: defaultdict(lambda: 0))
+        max_time_dict = defaultdict(lambda: defaultdict(lambda: 0))
+        mean_time_dict = defaultdict(lambda: defaultdict(lambda: 0))
+        count_dict = defaultdict(lambda: defaultdict(lambda: 0))
+        rank_vote = defaultdict(lambda: 0)
+        perpetrator_dict = defaultdict(lambda: defaultdict(lambda: 0))
+        minimum_rank_op_name = defaultdict(list)
+
+        communication_time_arr = self.comm_ops["communication_time"].values
+        rank_id_arr = self.comm_ops["rank"].values
+        for pp_idx, ops_same_group in vote_group.items():
+            for op_name, ops in ops_same_group.items():
+                communication_time_list = [communication_time_arr[op_idx] for op_idx in ops]
+                # The rank with the shortest communication time arrived last while the
+                # others waited, so it collects a "slow rank" vote for this op.
+                min_time = min(communication_time_list)
+                min_op_idx = ops[communication_time_list.index(min_time)]
+                min_op_rank = rank_id_arr[min_op_idx]
+                rank_vote[min_op_rank] += 1
+                perpetrator_dict[pp_idx][op_name] = min_op_rank
+                minimum_rank_op_name[min_op_rank].append(op_name)
+
+                max_time = max(communication_time_list)
+                mean_time = sum(communication_time_list) // len(communication_time_list)
+                min_time_dict[pp_idx][op_name] = min_time
+                max_time_dict[pp_idx][op_name] = max_time
+                mean_time_dict[pp_idx][op_name] = mean_time
+                count_dict[pp_idx][op_name] = len(ops)
+
+        analysis_result = pd.DataFrame(
+            columns=[
+                "ppIdx",
+                "opName",
+                "minTime",
+                "maxTime",
+                "meanTime",
+                "count",
+                "perpetratorRank"
+            ]
+        )
+
+        for pp_idx in min_time_dict.keys():
+            for op_name in min_time_dict[pp_idx].keys():
+                analysis_result.loc[len(analysis_result)] = [
+                    pp_idx, op_name,
+                    min_time_dict[pp_idx][op_name],
+                    max_time_dict[pp_idx][op_name],
+                    mean_time_dict[pp_idx][op_name],
+                    count_dict[pp_idx][op_name],
+                    perpetrator_dict[pp_idx][op_name]
+                ]
+
+        vote_result = pd.DataFrame(columns=["rankId", "minimumTimes"])
+        for rank, minimum_times in rank_vote.items():
+            vote_result.loc[len(vote_result)] = [rank, minimum_times]
+        vote_result.set_index(["rankId"], inplace=True)
+
+        return analysis_result, vote_result
+
+    def analysis(self, pp_stage_rank_map):
+        self.select_exclude_ranks()
+        p2p_op_group, pp_op_group = self.grouping_pp_stage_ops(pp_stage_rank_map)
+        p2p_analysis_result, p2p_vote_result = self.analysis_pp_stage(p2p_op_group)
+        pp_analysis_result, pp_vote_result = self.analysis_pp_stage(pp_op_group)
+        return p2p_analysis_result, p2p_vote_result, pp_analysis_result, pp_vote_result
+
+    def select_exclude_ranks(self):
+        # Exclude ranks whose per-op-name counts differ, since their ops cannot be aligned across ranks.
+        grouped_df = self.comm_ops.groupby("rank")
+        for rank in grouped_df.groups.keys():
+            ops_groupby_rank = grouped_df.get_group(rank)
+            ops_num = 
ops_groupby_rank.groupby("opName").size().values + if len(set(ops_num)) > 1: + self.exclude_ranks.append(rank) + \ No newline at end of file diff --git a/profiler/msprof_analyze/compare_tools/compare_backend/compare_bean/origin_data_bean/trace_event_bean.py b/profiler/msprof_analyze/compare_tools/compare_backend/compare_bean/origin_data_bean/trace_event_bean.py index 9d813c23b63350ecc724dfea5cbdd36ac0579afd..ab12d640a1aad9478ca067c56db3bcc10a156a0c 100644 --- a/profiler/msprof_analyze/compare_tools/compare_backend/compare_bean/origin_data_bean/trace_event_bean.py +++ b/profiler/msprof_analyze/compare_tools/compare_backend/compare_bean/origin_data_bean/trace_event_bean.py @@ -193,7 +193,7 @@ class TraceEventBean: return self._args.get("name", "").find("Communication") != -1 def is_hccl_process_name(self) -> bool: - return self.process_name in ["Communication", "HCCL"] + return self.process_name == "HCCL" def is_overlap_process_name(self) -> bool: return self.process_name == "Overlap Analysis" diff --git a/profiler/msprof_analyze/compare_tools/compare_backend/data_prepare/sequence_pre_matching.py b/profiler/msprof_analyze/compare_tools/compare_backend/data_prepare/sequence_pre_matching.py index cdca93a92767f169f4f4c014ed18f5aa7d407a7a..5c2590c723e646660b456acbdf3f114fb2726190 100644 --- a/profiler/msprof_analyze/compare_tools/compare_backend/data_prepare/sequence_pre_matching.py +++ b/profiler/msprof_analyze/compare_tools/compare_backend/data_prepare/sequence_pre_matching.py @@ -91,7 +91,7 @@ class SequencePreMatching: base_index += 1 comparison_index += 1 while comparison_index < comparison_data_len: - result_data.extend(self._match_torch_op([], comparison_data[comparison_index].get(Constant.OPS, []))) + result_data.extend(self._match_torch_op([], comparison_data[0].get(Constant.OPS, []))) comparison_index += 1 return result_data diff --git a/profiler/msprof_analyze/module_visualization/__init__.py b/profiler/msprof_analyze/module_visualization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/module_visualization/graph/__init__.py b/profiler/msprof_analyze/module_visualization/graph/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/module_visualization/graph/prof_node.py b/profiler/msprof_analyze/module_visualization/graph/prof_node.py new file mode 100644 index 0000000000000000000000000000000000000000..1f39ee9bfa2dd86cc02ea74400de8bcebdd75e21 --- /dev/null +++ b/profiler/msprof_analyze/module_visualization/graph/prof_node.py @@ -0,0 +1,217 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
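+
+# ProfNode models one node of the module visualization tree: it wraps a trace
+# event, aggregates the kernels and communications dispatched under it, and
+# derives host/device durations and the precision_index shown in the graph.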
+from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.base_node import BaseNode +from msprof_analyze.prof_common.trace_event_bean import TraceEventBean + + +class ProfNode(BaseNode): + + def __init__(self, event: TraceEventBean, parent_node=None): + super().__init__(event, parent_node) + self._kernel_total_list = [] + self._communication_total_list = [] + self._precision_index = 1 + self._computing_time = 0 + self._uncovered_comm_time = 0 + self._free_time = 0 + self._step_id = None + self._micro_step_id = None + self._bwd_overall_data = {} + + @property + def node_id(self): + return self._event.unique_id + + @property + def node_type(self): + if self._event.event_type is None: + return Constant.VIRTUAL_TYPE + return self._event.event_type + + @property + def step_id(self): + return self._step_id + + @property + def micro_step_id(self): + return self._micro_step_id + + @property + def is_backward(self): + return self.node_id.startswith(Constant.BACKWARD_MODULE) + + @property + def fwd_bwd_id(self): + return self._event.fwd_bwd_id + + @property + def is_bwd(self): + return "BACKWARD" in self.node_id + + @property + def total_kernels(self): + if self.node_type == Constant.VIRTUAL_TYPE: + return [kernel for node in self.child_nodes for kernel in node.total_kernels] + return self._kernel_total_list + + @property + def total_communications(self): + if self.node_type == Constant.VIRTUAL_TYPE: + return [comm for node in self.child_nodes for comm in node.total_communications] + return self._communication_total_list + + @property + def host_total_dur(self): + if self.node_type == Constant.VIRTUAL_TYPE: + return sum((node.host_total_dur for node in self.child_nodes)) + return self._event.dur + + @property + def host_self_dur(self): + if self.node_type == Constant.VIRTUAL_TYPE: + return 0 + return self.host_total_dur - sum((node.host_total_dur for node in self.child_nodes)) + + @property + def device_total_dur(self): + return sum((kernel.dur for kernel in self.total_kernels)) + + @property + def device_self_dur(self): + if self.node_type == Constant.VIRTUAL_TYPE: + return 0 + return self.device_total_dur - sum((node.device_total_dur for node in self.child_nodes)) + + @property + def input_data(self) -> dict: + data = {} + input_dim = self._event.args.get("Input Dims") + if input_dim: + data["Input Dims"] = input_dim + input_type = self._event.args.get("Input type") + if input_type: + data["Input type"] = input_type + return data + + @property + def kernel_data(self) -> list: + return [kernel.kernel_info for kernel in self.total_kernels] + + @property + def communication_data(self) -> list: + return [[comm.name, comm.dur] for comm in self.total_communications] + + @property + def overall_data(self): + return {"Computing Time(us)": round(self._computing_time, 3), + "Uncovered Communication Time(us)": round(self._uncovered_comm_time, 3), + "Free Time(us)": round(self._free_time, 3)} + + @property + def data(self): + data = { + "Overall Metrics": self.overall_data} if self.node_type != Constant.OPERATOR_TYPE else {} + if self._bwd_overall_data: + data.update({"Backward Overall Metrics": self._bwd_overall_data}) + data.update({"Input Data": self.input_data, + "precision_index": self.precision_index, + "Host Self Duration(us)": round(self.host_self_dur, 3), + "Host Total Duration(us)": round(self.host_total_dur, 3), + "Device Self Duration(us)": round(self.device_self_dur, 3), + "Device Total Duration(us)": round(self.device_total_dur, 3), + "kernels": self.kernel_data, 
+ "Communications": self.communication_data}) + return data + + @property + def info(self): + info = {"id": self.node_id, + "node_type": self.node_type, + "data": self.data, + "upnode": self.parent_node.node_id if self.parent_node else "None", + "subnodes": [node.node_id for node in iter(self.child_nodes)]} + if self.step_id is not None: + info.update({"step_id": self.step_id}) + if self.micro_step_id is not None: + info.update({"micro_step_id": self.micro_step_id}) + return info + + @property + def is_root_node(self): + return self.node_id == Constant.NPU_ROOT_ID + + @property + def precision_index(self): + return self._precision_index + + @precision_index.setter + def precision_index(self, precision_index): + self._precision_index = precision_index + + @step_id.setter + def step_id(self, step_id): + self._step_id = step_id + + @micro_step_id.setter + def micro_step_id(self, micro_step_id): + self._micro_step_id = micro_step_id + + def update_child_nodes(self, node): + self._child_nodes.append(node) + + def reset_child_nodes(self, nodes): + self._child_nodes = nodes + + def update_kernel_total_list(self, kernel_list: list): + self._kernel_total_list.extend(kernel_list) + + def update_communication_total_list(self, communication_list: list): + self._communication_total_list.extend(communication_list) + + def update_child_precision_index(self): + if not self.child_nodes: + return + max_dur = max((node.device_total_dur for node in self.child_nodes)) + min_dur = min((node.device_total_dur for node in self.child_nodes)) + diff_dur = max_dur - min_dur + for node in self.child_nodes: + node.precision_index = 1 - (node.device_total_dur - min_dur) / diff_dur if diff_dur else 1 + + def update_overall_metrics(self, overlap_analysis_event): + if not self.total_kernels and not self.total_communications: + return + device_events = [] + device_events.extend(self.total_kernels) + device_events.extend(self.total_communications) + device_events.sort(key=lambda x: x.start_time) + device_start = device_events[0].start_time + device_end = device_events[-1].end_time + for event in overlap_analysis_event: + if event.start_time >= device_end: + break + if event.end_time <= device_start: + continue + duration_us = float( + min(device_end, event.end_time) - max(device_start, event.start_time)) + if event.name == Constant.COMPUTING_EVENT: + self._computing_time += duration_us + elif event.name == Constant.FREE_EVENT: + self._free_time += duration_us + elif event.name == Constant.UNCOVERED_COMMUNICATION_EVENT: + self._uncovered_comm_time += duration_us + + def update_bwd_overall_metrics(self, overall_metrics): + self._bwd_overall_data = overall_metrics diff --git a/profiler/msprof_analyze/module_visualization/graph_build/__init__.py b/profiler/msprof_analyze/module_visualization/graph_build/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/module_visualization/graph_build/fwd_module_node.py b/profiler/msprof_analyze/module_visualization/graph_build/fwd_module_node.py new file mode 100644 index 0000000000000000000000000000000000000000..27bb52da7960ccb7f7ac51d92552cf461196903d --- /dev/null +++ b/profiler/msprof_analyze/module_visualization/graph_build/fwd_module_node.py @@ -0,0 +1,33 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from msprof_analyze.prof_common.base_node import BaseNode +from msprof_analyze.prof_common.trace_event_bean import TraceEventBean + + +class FwdModuleNode(BaseNode): + def __init__(self, event: TraceEventBean, parent_node=None): + super().__init__(event, parent_node) + self._bwd_op_list = [] + + @property + def bwd_op_list(self): + return self._bwd_op_list + + @property + def event(self): + return self._event + + def update_bwd_op(self, bwd_op_list: list): + self._bwd_op_list.extend(bwd_op_list) diff --git a/profiler/msprof_analyze/module_visualization/graph_build/prof_graph_builder.py b/profiler/msprof_analyze/module_visualization/graph_build/prof_graph_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..a0c00ef92b3e15e2c2c7c81f76d451db5eacb183 --- /dev/null +++ b/profiler/msprof_analyze/module_visualization/graph_build/prof_graph_builder.py @@ -0,0 +1,237 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
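+
+# ProfGraphBuilder assembles preprocessed trace events into a module-level
+# graph: it reconstructs backward modules from fwd-bwd flow events, wraps the
+# loose operators between modules into virtual nodes, and fills in step info
+# and overall metrics for every node.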
+from decimal import Decimal
+
+from msprof_analyze.module_visualization.graph.prof_node import ProfNode
+from msprof_analyze.module_visualization.graph_build.fwd_module_node import FwdModuleNode
+from msprof_analyze.prof_common.tree_builder import TreeBuilder
+from msprof_analyze.prof_common.trace_event_bean import TraceEventBean
+from msprof_analyze.prof_common.constant import Constant
+from msprof_analyze.module_visualization.prof_parse.prof_data_pre_process import ProfDataPreProcess
+
+
+class ProfGraphBuilder:
+
+    def __init__(self, prof_data_path: str):
+        self._prof_data_path = prof_data_path
+        self._prof_data = {}
+        self._fwd_bwd_id = 1
+
+    @classmethod
+    def _create_event_bean_from_ops(cls, op_list: list, name: str) -> TraceEventBean:
+        min_start = min((op.start_time for op in iter(op_list)))
+        max_end = max((op.end_time for op in iter(op_list)))
+        # Use the span of the backward operators as the backward module's range; widen it by
+        # -0.0001/+0.0001 so that the module strictly contains its operators.
+        event = TraceEventBean(
+            {"ts": min_start - Decimal("0.0001"), "dur": float(max_end - min_start + Decimal("0.0001")), "name": name})
+        event.event_type = Constant.MODULE_TYPE
+        return event
+
+    @classmethod
+    def _trans_flow_to_dict(cls, flow_events: dict, end_events: list) -> dict:
+        end_event_dict = {}
+        for event in end_events:
+            end_event_dict[event.start_time] = event
+        result_data = {}
+        for flow in flow_events.values():
+            start_point = flow.get("start")
+            end_point = flow.get("end")
+            if not start_point or not end_point:
+                continue
+            end_event = end_event_dict.get(end_point.start_time)
+            if end_event:
+                result_data.setdefault(start_point.start_time, []).append(end_event)
+        return result_data
+
+    @classmethod
+    def _create_virtual_node(cls, all_nodes: list):
+        root_node = all_nodes[0]
+        virtual_nodes = []
+        first_level_nodes = root_node.child_nodes
+        root_node.reset_child_nodes([])
+        merged_nodes = []
+        order_id = 1
+        for node in first_level_nodes:
+            if node.node_type == Constant.OPERATOR_TYPE:
+                merged_nodes.append(node)
+                continue
+            if len(merged_nodes) >= 2:
+                virtual_node = ProfNode(TraceEventBean({"ts": min((node.start_time for node in merged_nodes))},
+                                                       f"Operators_Between_Modules_{order_id}"), root_node)
+                root_node.update_child_nodes(virtual_node)
+                order_id += 1
+                for op_node in merged_nodes:
+                    op_node.parent_node = virtual_node
+                    virtual_node.update_child_nodes(op_node)
+                virtual_nodes.append(virtual_node)
+            elif len(merged_nodes) == 1:
+                root_node.update_child_nodes(merged_nodes[0])
+            root_node.update_child_nodes(node)
+            merged_nodes = []
+        if len(merged_nodes) >= 2:
+            virtual_node = ProfNode(TraceEventBean({"ts": min((node.start_time for node in merged_nodes))},
+                                                   f"Operators_Between_Modules_{order_id}"), root_node)
+            root_node.update_child_nodes(virtual_node)
+            for op_node in merged_nodes:
+                op_node.parent_node = virtual_node
+                virtual_node.update_child_nodes(op_node)
+            virtual_nodes.append(virtual_node)
+        elif len(merged_nodes) == 1:
+            root_node.update_child_nodes(merged_nodes[0])
+        all_nodes.extend(virtual_nodes)
+
+    @classmethod
+    def _set_event_order_id(cls, all_events: list):
+        name_dict = {}
+        for event in all_events:
+            order_id = name_dict.get(event.name, 0)
+            event.set_id(f"{event.name}_{order_id}")
+            name_dict[event.name] = order_id + 1
+
+    def build_graph(self):
+        self._prof_data = ProfDataPreProcess(self._prof_data_path).run()
+        all_data = [*self._prof_data.get(Constant.MODULE_EVENT, []),
+                    *self.find_bwd_module(),
+                    *self._prof_data.get(Constant.CPU_OP_EVENT, [])]
+        all_data.sort(key=lambda x: x.start_time)
+        self._set_event_order_id(all_data)
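+        # Build the tree under a virtual NPU root, then attach device-side
+        # kernels and communications, group loose operators into virtual
+        # nodes, and compute precision indexes, overall metrics and step info.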
+        all_nodes = TreeBuilder.build_tree(all_data, ProfNode, TraceEventBean({}, Constant.NPU_ROOT_ID))
+        if len(all_nodes) < 2:
+            msg = "Failed to build graph."
+            raise RuntimeError(msg)
+        self._update_kernel_details(all_nodes[0])
+        self._update_communication_details(all_nodes[0])
+        self._create_virtual_node(all_nodes)
+        self._update_precision_index_and_overall_metrics(all_nodes)
+        self._update_step_info(all_nodes[0])
+        return all_nodes
+
+    def find_bwd_module(self) -> list:
+        bwd_module_list = []
+        fwdbwd_flow = self._prof_data.get(Constant.FWD_BWD_FLOW, {})
+        fwdbwd_flow = {key: value for key, value in fwdbwd_flow.items() if
+                       value.get("start") and value.get("end") and value.get("start").tid != value.get("end").tid}
+        module_list = self._prof_data.get(Constant.MODULE_EVENT, [])
+        cpu_op_list = self._prof_data.get(Constant.CPU_OP_EVENT, [])
+        if not fwdbwd_flow or not module_list or not cpu_op_list:
+            return bwd_module_list
+        fwd_tid = module_list[0].tid
+        bwd_tid = fwd_tid
+        for end_point in (flow.get("end") for flow in fwdbwd_flow.values()):
+            if end_point:
+                bwd_tid = end_point.tid
+                break
+        if fwd_tid == bwd_tid:
+            return bwd_module_list
+        # Wrap each contiguous run of backward ops into one module named "nn.Module: BACKWARD_0".
+        cpu_op_list.sort(key=lambda x: x.start_time)
+        pre_status = Constant.FWD_OR_OPT
+        bwd_op_list = []
+        for op in cpu_op_list:
+            if op.tid == bwd_tid:
+                bwd_op_list.append(op)
+                pre_status = Constant.BACKWARD
+                continue
+            elif pre_status == Constant.BACKWARD:
+                bwd_module_list.append(self._create_event_bean_from_ops(bwd_op_list, Constant.BACKWARD_MODULE))
+                bwd_module_list.extend(self._match_fwd_module(module_list, fwdbwd_flow, bwd_op_list))
+                bwd_op_list.clear()
+                pre_status = Constant.FWD_OR_OPT
+        if bwd_op_list:
+            bwd_module_list.append(self._create_event_bean_from_ops(bwd_op_list, Constant.BACKWARD_MODULE))
+            bwd_module_list.extend(self._match_fwd_module(module_list, fwdbwd_flow, bwd_op_list))
+            bwd_op_list.clear()
+        return bwd_module_list
+
+    def _match_fwd_module(self, module_list, fwdbwd_flow, bwd_op_list):
+        # Match the forward modules through the fwd-bwd flow links to rebuild the
+        # overall module hierarchy on the backward side.
+        bwd_module_list = []
+        all_nodes = TreeBuilder.build_tree(module_list, FwdModuleNode, TraceEventBean({}))
+        root_node = all_nodes[0]
+        fwdbwd_flow_dict = self._trans_flow_to_dict(fwdbwd_flow, bwd_op_list)
+        for start_time, end_events in fwdbwd_flow_dict.items():
+            matched_node = root_node.binary_search(start_time)
+            while matched_node != Constant.INVALID_RETURN:
+                matched_node.update_bwd_op(end_events)
+                matched_node = matched_node.binary_search(start_time)
+        for module_node in all_nodes:
+            if module_node.bwd_op_list:
+                module_node.event.fwd_bwd_id = self._fwd_bwd_id
+                bwd_module_list.append(
+                    self._create_event_bean_from_ops(module_node.bwd_op_list, f"{module_node.name} [BACKWARD]"))
+                bwd_module_list[-1].fwd_bwd_id = self._fwd_bwd_id
+                self._fwd_bwd_id += 1
+        return bwd_module_list
+
+    def _update_kernel_details(self, root_node):
+        kernel_flow_dict = self._trans_flow_to_dict(self._prof_data.get(Constant.TORCH_TO_NPU_FLOW, {}),
+                                                    self._prof_data.get(Constant.KERNEL_EVENT, []))
+        for start_time, kernels in kernel_flow_dict.items():
+            matched_node = root_node.binary_search(start_time)
+            while matched_node != Constant.INVALID_RETURN:
+                matched_node.update_kernel_total_list(kernels)
+                matched_node = matched_node.binary_search(start_time)
+
+    def _update_communication_details(self, root_node):
+        communication_flow_dict = self._trans_flow_to_dict(self._prof_data.get(Constant.TORCH_TO_NPU_FLOW, {}),
+                                                           self._prof_data.get(Constant.HCCL_EVENT, []))
+        for start_time, 
communications in communication_flow_dict.items(): + matched_node = root_node.binary_search(start_time) + while matched_node != Constant.INVALID_RETURN: + matched_node.update_communication_total_list(communications) + matched_node = matched_node.binary_search(start_time) + + def _update_step_info(self, root_node): + first_level_nodes = root_node.child_nodes + step_events = self._prof_data.get(Constant.STEP_EVENT, []) + node_dict = {} + if not step_events: + node_dict[None] = first_level_nodes + else: + for node in first_level_nodes: + for step_event in step_events: + if step_event.start_time <= node.start_time <= step_event.end_time: + node.step_id = step_event.step_id + node_dict.setdefault(step_event.step_id, []).append(node) + break + for nodes in node_dict.values(): + micro_step_list = [] + micro_events = [] + for node in nodes: + micro_events.append(node) + if node.is_backward: + micro_step_list.append(micro_events) + micro_events = [] + if micro_step_list: + micro_step_list[-1].extend(micro_events) + else: + micro_step_list.append(micro_events) + for index, micro_events in enumerate(micro_step_list): + for node in micro_events: + node.micro_step_id = index + + def _update_precision_index_and_overall_metrics(self, all_nodes: list): + overlap_analysis_event = self._prof_data.get(Constant.OVERLAP_ANALYSIS_EVENT, []) + overlap_analysis_event.sort(key=lambda x: x.start_time) + bwd_infos = {} + for node in all_nodes: + node.update_child_precision_index() + if node.node_type != Constant.OPERATOR_TYPE: + node.update_overall_metrics(overlap_analysis_event) + if node.is_bwd and node.fwd_bwd_id: + bwd_infos[node.fwd_bwd_id] = node.overall_data + for node in all_nodes: + if node.node_type != Constant.OPERATOR_TYPE and not node.is_bwd: + node.update_bwd_overall_metrics(bwd_infos.get(node.fwd_bwd_id, {})) diff --git a/profiler/msprof_analyze/module_visualization/prof_graph_export.py b/profiler/msprof_analyze/module_visualization/prof_graph_export.py new file mode 100644 index 0000000000000000000000000000000000000000..acb178f7e7e60ea733f93ccbcb7bccdbae458442 --- /dev/null +++ b/profiler/msprof_analyze/module_visualization/prof_graph_export.py @@ -0,0 +1,58 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
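+
+# ProfGraphExport is the command entry of module visualization: it validates
+# the input and output paths, builds the graph and dumps it to a .vis JSON file.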
+import logging
+import os.path
+from datetime import datetime
+
+from msprof_analyze.prof_common.constant import Constant
+from msprof_analyze.prof_common.file_reader import FileReader
+from msprof_analyze.prof_common.path_manager import PathManager
+from msprof_analyze.module_visualization.graph_build.prof_graph_builder import ProfGraphBuilder
+
+
+class ProfGraphExport:
+    @classmethod
+    def export_to_json(cls, prof_data_path: str, output_path: str):
+        logging.basicConfig(format="%(asctime)s - %(levelname)s - %(message)s")
+        output_path = os.path.abspath(output_path)
+        prof_data_path = os.path.abspath(prof_data_path)
+        try:
+            PathManager.input_path_common_check(prof_data_path)
+            PathManager.check_input_directory_path(output_path)
+            PathManager.make_dir_safety(output_path)
+            PathManager.check_path_writeable(output_path)
+        except RuntimeError as err:
+            logging.error(err)
+            return
+        try:
+            cls.generate_graph_data(prof_data_path, output_path)
+        except RuntimeError as err:
+            logging.error(err)
+
+    @classmethod
+    def generate_graph_data(cls, prof_data_path: str, output_path: str):
+        all_nodes = ProfGraphBuilder(prof_data_path).build_graph()
+        result_data = {"root": Constant.NPU_ROOT_ID, "node": {}}
+        for node in all_nodes:
+            result_data["node"][node.node_id] = node.info
+        step_list = list(set([node.step_id for node in all_nodes[0].child_nodes if node.step_id is not None]))
+        if step_list:
+            result_data["StepList"] = step_list
+        micro_steps = len(
+            set([node.micro_step_id for node in all_nodes[0].child_nodes if node.micro_step_id is not None]))
+        result_data["MicroSteps"] = micro_steps
+        file_name = "prof_graph_json_{}.vis".format(datetime.utcnow().strftime("%Y%m%d%H%M%S%f")[:-3])
+        FileReader.write_json_file(output_path, result_data, file_name)
+        logging.info("Performance data has been converted into a graph-structured file: %s",
+                     os.path.join(output_path, file_name))
diff --git a/profiler/msprof_analyze/module_visualization/prof_parse/__init__.py b/profiler/msprof_analyze/module_visualization/prof_parse/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/profiler/msprof_analyze/module_visualization/prof_parse/prof_data_pre_process.py b/profiler/msprof_analyze/module_visualization/prof_parse/prof_data_pre_process.py
new file mode 100644
index 0000000000000000000000000000000000000000..2d39649d58d543fc295bdc0296bd244c25f506f3
--- /dev/null
+++ b/profiler/msprof_analyze/module_visualization/prof_parse/prof_data_pre_process.py
@@ -0,0 +1,137 @@
+# Copyright (c) 2024 Huawei Technologies Co., Ltd
+# All rights reserved.
+#
+# Licensed under the BSD 3-Clause License  (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://opensource.org/licenses/BSD-3-Clause
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
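+
+# ProfDataPreProcess locates trace_view.json (and kernel_details.csv when
+# present) and sorts the trace events into module, CPU op, kernel, HCCL,
+# overlap-analysis and step buckets consumed by the graph builder.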
+import logging
+import os
+
+from msprof_analyze.prof_common.file_reader import FileReader
+from msprof_analyze.prof_common.constant import Constant
+from msprof_analyze.prof_common.kernel_bean import KernelBean
+from msprof_analyze.prof_common.trace_event_bean import TraceEventBean
+
+
+class ProfDataPreProcess:
+    def __init__(self, prof_data_path: str):
+        self._prof_data_path = prof_data_path
+        self._trace_path = ""
+        self._kernel_details_path = ""
+        self._kernel_pid = None
+        self._hccl_pid = None
+        self._overlap_analysis_pid = None
+        self._result_data = {Constant.CPU_OP_EVENT: [], Constant.MODULE_EVENT: [], Constant.KERNEL_EVENT: [],
+                             Constant.TORCH_TO_NPU_FLOW: {}, Constant.FWD_BWD_FLOW: {}, Constant.HCCL_EVENT: [],
+                             Constant.OVERLAP_ANALYSIS_EVENT: [], Constant.STEP_EVENT: []}
+
+    @staticmethod
+    def _check_trace_data(trace_data):
+        if not isinstance(trace_data, list):
+            msg = f"Invalid profiling data path, this feature only supports performance data " \
+                  f"collected by Ascend PyTorch Profiler."
+            raise RuntimeError(msg)
+
+    def run(self) -> dict:
+        self._check_trace_path()
+        self._parse_trace_events()
+        self._parse_kernel_details()
+        self._check_result_data()
+        return self._result_data
+
+    def _check_trace_path(self):
+        if os.path.isfile(self._prof_data_path):
+            (split_file_path, split_file_name) = os.path.split(self._prof_data_path)
+            (shot_name, extension) = os.path.splitext(split_file_name)
+            if extension != ".json":
+                msg = f"Invalid profiling path suffix: {self._prof_data_path}. " \
+                      f"Please provide a json file path, such as trace_view.json."
+                raise RuntimeError(msg)
+            self._trace_path = self._prof_data_path
+            return
+        ascend_output = os.path.join(self._prof_data_path, "ASCEND_PROFILER_OUTPUT")
+        profiler_output = ascend_output if os.path.isdir(ascend_output) else self._prof_data_path
+        json_path = os.path.join(profiler_output, "trace_view.json")
+        if not os.path.isfile(json_path):
+            msg = f"Invalid profiling path: {self._prof_data_path}. The data path should be a " \
+                  f"folder ending with ascend_pt, collected by the Ascend PyTorch Profiler."
+            raise RuntimeError(msg)
+        kernel_path = os.path.join(profiler_output, "kernel_details.csv")
+        if os.path.isfile(kernel_path):
+            self._kernel_details_path = kernel_path
+        self._trace_path = json_path
+
+    def _parse_trace_events(self):
+        trace_data = FileReader.read_json_file(self._trace_path)
+        self._check_trace_data(trace_data)
+        iter_trace_data = [TraceEventBean(data) for data in trace_data]
+        for event in iter_trace_data:
+            if self._kernel_pid is not None and self._hccl_pid is not None and self._overlap_analysis_pid is not None:
+                break
+            if not event.is_meta():
+                continue
+            if event.is_npu_process():
+                self._kernel_pid = event.pid
+            elif event.is_hccl_process():
+                self._hccl_pid = event.pid
+            elif event.is_overlap_analysis_process():
+                self._overlap_analysis_pid = event.pid
+        if self._kernel_pid is None:
+            msg = "There is no operator on the NPU side for this data, please check whether the NPU switch is enabled."
+            raise RuntimeError(msg)
+        for event in iter_trace_data:
+            if event.is_optimizer():
+                event.event_type = Constant.MODULE_TYPE
+                self._result_data[Constant.MODULE_EVENT].append(event)
+            elif event.is_cpu_op():
+                if event.is_step():
+                    self._result_data[Constant.STEP_EVENT].append(event)
+                else:
+                    event.event_type = Constant.OPERATOR_TYPE
+                    self._result_data[Constant.CPU_OP_EVENT].append(event)
+            elif event.is_nn_module():
+                event.event_type = Constant.MODULE_TYPE
+                self._result_data[Constant.MODULE_EVENT].append(event)
+            elif event.is_torch_to_npu():
+                if event.is_flow_start():
+                    self._result_data[Constant.TORCH_TO_NPU_FLOW].setdefault(event.id, {})["start"] = event
+                else:
+                    self._result_data[Constant.TORCH_TO_NPU_FLOW].setdefault(event.id, {})["end"] = event
+            elif event.is_fwd_bwd_flow():
+                if event.is_flow_start():
+                    self._result_data[Constant.FWD_BWD_FLOW].setdefault(event.id, {})["start"] = event
+                else:
+                    self._result_data[Constant.FWD_BWD_FLOW].setdefault(event.id, {})["end"] = event
+            elif event.is_kernel_event(self._kernel_pid):
+                self._result_data[Constant.KERNEL_EVENT].append(event)
+            elif event.is_hccl_event(self._hccl_pid):
+                self._result_data[Constant.HCCL_EVENT].append(event)
+            elif event.is_overlap_analysis_event(self._overlap_analysis_pid):
+                self._result_data[Constant.OVERLAP_ANALYSIS_EVENT].append(event)
+
+    def _parse_kernel_details(self):
+        if not self._kernel_details_path:
+            return
+        try:
+            all_kernels = FileReader.read_csv_file(self._kernel_details_path, KernelBean)
+        except Exception as e:
+            logging.error(e)
+            return
+        kernels = list(filter(lambda x: x.is_computing_op, all_kernels))
+        if kernels:
+            self._result_data[Constant.KERNEL_EVENT] = kernels
+
+    def _check_result_data(self):
+        if not self._result_data.get(Constant.CPU_OP_EVENT):
+            msg = "This data does not contain any aten operators, please make sure the CPU switch is enabled."
+            raise RuntimeError(msg)
+        if not [event for event in self._result_data.get(Constant.MODULE_EVENT) if event.is_nn_module()]:
+            msg = "This data does not contain any module events, please make sure with_stack or with_modules is enabled."
+            raise RuntimeError(msg)
diff --git a/profiler/msprof_analyze/osrt_trace/README.md b/profiler/msprof_analyze/osrt_trace/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0ffb70415c60922fdff1c9de07313f8ee81f6aed
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/README.md
@@ -0,0 +1,157 @@
+# MSOSRT Trace: Detecting Long-Running System Library Calls
+
+OSRT (OS runtime libraries trace) collects call information of user-space library APIs from the Linux operating system's runtime libraries. MSOSRT (MindStudio OSRT) intercepts the typically long-running interfaces of the Linux C library and the POSIX thread (pthread) library, i.e. functions that may block the user process (such as read, ioctl and pthread_mutex_lock), and records their durations to help users analyze why a process is blocked.
+
+## Usage
+
+1. Constraints: Linux only; requires a g++ build environment and the standard libraries such as glibc and pthread.
+2. Clone the mstt repository, enter the profiler/msprof_analyze/osrt_trace directory and run `bash build.sh` to produce `libmsosrt_trace.so`.
+3. Run `export LD_PRELOAD=./libmsosrt_trace.so:$LD_PRELOAD` to add `libmsosrt_trace.so` to the LD_PRELOAD environment variable.
+4. Set the environment variables for the detection threshold and the export directory:
+
+   ```bash
+   # Detection threshold in ns (a positive integer): only library calls longer than this are recorded. Default: 10000000
+   export MSOSRT_TRACE_THRESHOLD=10000000
+   # Export directory (a string) for the detection results. Default: the current directory
+   export MSOSRT_EXPORT_PATH="./osrt_trace_result"
+   ```
+
+5. Run the user process, e.g. `python main.py`.
+
+6. After the user process exits, a result file named msosrt_trace\_{pid}\_{process name}.csv, e.g. `msosrt_trace_2328177_python3.csv`, is generated under MSOSRT_EXPORT_PATH. Each row records the pid, tid, function name, start time and duration of one call, as shown below:
+
+   | Pid | Tid | Function | StartTime(ns) | Duration(ns) |
+   | ------: | ------: | ----------------: | ------------------: | -----------: |
+   | 2328177 | 2328280 | pthread_cond_wait | 1725398310787080000 | 3088062410 |
+   | 2328177 | 2328282 | pthread_cond_wait | 1725398310787170000 | 3087994240 |
+   | 2328177 | 2328480 | read | 1725398318916180000 | 100509970 |
+   | 2328177 | 2328440 | ioctl | 1725398319218640000 | 512040720 |
+   | 2328177 | 2328177 | free | 1725398330504550000 | 56386880 |
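+
+   The result file is plain CSV, so it can be post-processed with standard tooling. As a minimal sketch (assuming a comma-separated file with the header row shown above; the file name comes from the example), the functions can be ranked by total blocked time with pandas:
+
+   ```python
+   import pandas as pd
+
+   # Load one MSOSRT result file and aggregate the durations per function.
+   df = pd.read_csv("msosrt_trace_2328177_python3.csv")
+   top = (df.groupby("Function")["Duration(ns)"]
+            .agg(calls="count", total_ns="sum", longest_ns="max")
+            .sort_values("total_ns", ascending=False))
+   print(top.head(10))
+   ```
+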
+## Monitored Interfaces
+
+MSOSRT can monitor the following OS library functions:
+
+- Memory operations
+
+  ```c
+  malloc
+  realloc
+  free
+  mmap
+  munmap
+  mremap
+  msync
+  mprotect
+  brk
+  ```
+
+- File operations
+
+  ```c
+  dup
+  dup2
+  dup3
+  tee
+  splice
+  fallocate
+  fdatasync
+  fsync
+  fcntl
+  flock
+  lockf
+  truncate
+  ftruncate
+  ioctl
+  open
+  openat
+  pipe
+  pipe2
+  mkfifo
+  mkfifoat
+  read
+  pread
+  readv
+  preadv
+  preadv2
+  write
+  pwrite
+  writev
+  pwritev
+  pwritev2
+  copy_file_range
+  sync
+  syncfs
+  sync_file_range
+  vmsplice
+  process_vm_readv
+  process_vm_writev
+  fclose
+  fcloseall
+  fflush
+  fgetc
+  fgets
+  fputc
+  fputs
+  flockfile
+  ftrylockfile
+  funlockfile
+  fopen
+  freopen
+  fread
+  fwrite
+  getdelim
+  getline
+  getc
+  putc
+  getc_unlocked
+  putc_unlocked
+  fflush_unlocked
+  fgetc_unlocked
+  fputc_unlocked
+  fread_unlocked
+  fwrite_unlocked
+  fgets_unlocked
+  fputs_unlocked
+  ```
+
+- Network operations
+
+  ```c
+  socket
+  socketpair
+  epoll_ctl
+  epoll_wait
+  epoll_pwait
+  select
+  listen
+  accept
+  accept4
+  bind
+  poll
+  ppoll
+  send
+  sendto
+  sendmsg
+  sendmmsg
+  sendfile
+  recv
+  recvfrom
+  recvmsg
+  recvmmsg
+  ```
+
+- Thread operations
+
+  ```c
+  pthread_mutex_lock
+  pthread_mutex_timedlock
+  pthread_cond_signal
+  pthread_cond_broadcast
+  pthread_cond_wait
+  pthread_cond_timedwait
+  pthread_rwlock_rdlock
+  pthread_rwlock_timedrdlock
+  pthread_rwlock_wrlock
+  pthread_rwlock_timedwrlock
+  ```
\ No newline at end of file
diff --git a/profiler/msprof_analyze/osrt_trace/build.sh b/profiler/msprof_analyze/osrt_trace/build.sh
new file mode 100644
index 0000000000000000000000000000000000000000..bb153e6247122c922dc5cea247be43bfec3d5430
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/build.sh
@@ -0,0 +1 @@
+g++ ./src/*.cpp -std=c++11 -fPIC -fstack-protector-all -fno-strict-aliasing -fno-common -fvisibility=hidden -fvisibility-inlines-hidden -Wfloat-equal -Wextra -O2 -shared -lpthread -ldl -o libmsosrt_trace.so
\ No newline at end of file
diff --git a/profiler/msprof_analyze/osrt_trace/src/file_func.cpp b/profiler/msprof_analyze/osrt_trace/src/file_func.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..319dcb227b139adf158d55fe762f97afdfa5fdd8
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/src/file_func.cpp
@@ -0,0 +1,664 @@
+#include "file_func.h"
+
+#include <cstdarg>
+
+#include "msosrt_trace.h"
+
+void FileFuncProxy::loadFunc()
+{
+    LOAD_FUNC(dup, DupFunc);
+    LOAD_FUNC(dup2, Dup2Func);
+    LOAD_FUNC(dup3, Dup3Func);
+    LOAD_FUNC(tee, TeeFunc);
+    LOAD_FUNC(splice, SpliceFunc);
+    LOAD_FUNC(fallocate, FallocateFunc);
+    LOAD_FUNC(fdatasync, FdatasyncFunc);
+    LOAD_FUNC(fsync, FsyncFunc);
+    LOAD_FUNC(fcntl, FcntlFunc);
+    LOAD_FUNC(flock, FlockFunc);
+    LOAD_FUNC(lockf, LockfFunc);
+    LOAD_FUNC(truncate, TruncateFunc);
+    LOAD_FUNC(ftruncate, FtruncateFunc);
+    LOAD_FUNC(ioctl, IoctlFunc);
+    LOAD_FUNC(open, OpenFunc);
+    LOAD_FUNC(openat, OpenatFunc);
+    LOAD_FUNC(pipe, PipeFunc);
+    LOAD_FUNC(pipe2, Pipe2Func);
+    
LOAD_FUNC(mkfifo, MkfifoFunc); + LOAD_FUNC(mkfifoat, MkfifoatFunc); + LOAD_FUNC(read, ReadFunc); + LOAD_FUNC(pread, PreadFunc); + LOAD_FUNC(readv, ReadvFunc); + LOAD_FUNC(preadv, PreadvFunc); + LOAD_FUNC(preadv2, Preadv2Func); + LOAD_FUNC(write, WriteFunc); + LOAD_FUNC(pwrite, PwriteFunc); + LOAD_FUNC(writev, WritevFunc); + LOAD_FUNC(pwritev, PwritevFunc); + LOAD_FUNC(pwritev2, Pwritev2Func); + LOAD_FUNC(copy_file_range, CopyFileRangeFunc); + LOAD_FUNC(sync, SyncFunc); + LOAD_FUNC(syncfs, SyncfsFunc); + LOAD_FUNC(sync_file_range, SyncFileRangeFunc); + LOAD_FUNC(vmsplice, VmspliceFunc); + LOAD_FUNC(process_vm_readv, ProcessVmReadvFunc); + LOAD_FUNC(process_vm_writev, ProcessVmWritevFunc); + LOAD_FUNC(fclose, FcloseFunc); + LOAD_FUNC(fcloseall, FcloseallFunc); + LOAD_FUNC(fflush, FflushFunc); + LOAD_FUNC(fgetc, FgetcFunc); + LOAD_FUNC(fgets, FgetsFunc); + LOAD_FUNC(fputc, FputcFunc); + LOAD_FUNC(fputs, FputsFunc); + LOAD_FUNC(flockfile, FlockfileFunc); + LOAD_FUNC(ftrylockfile, FtrylockfileFunc); + LOAD_FUNC(funlockfile, FunlockfileFunc); + LOAD_FUNC(fopen, FopenFunc); + LOAD_FUNC(freopen, FreopenFunc); + LOAD_FUNC(fread, FreadFunc); + LOAD_FUNC(fwrite, FwriteFunc); + LOAD_FUNC(getdelim, GetdelimFunc); + LOAD_FUNC(getline, GetlineFunc); + LOAD_FUNC(getc, GetcFunc); + LOAD_FUNC(putc, PutcFunc); + LOAD_FUNC(getc_unlocked, GetcUnlockedFunc); + LOAD_FUNC(putc_unlocked, PutcUnlockedFunc); + LOAD_FUNC(fflush_unlocked, FflushUnlockedFunc); + LOAD_FUNC(fgetc_unlocked, FgetcUnlockedFunc); + LOAD_FUNC(fputc_unlocked, FputcUnlockedFunc); + LOAD_FUNC(fread_unlocked, FreadUnlockedFunc); + LOAD_FUNC(fwrite_unlocked, FwriteUnlockedFunc); + LOAD_FUNC(fgets_unlocked, FgetsUnlockedFunc); + LOAD_FUNC(fputs_unlocked, FputsUnlockedFunc); +} + +int dup(int oldfd) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_dup(oldfd); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int dup2(int oldfd, int newfd) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_dup2(oldfd, newfd); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int dup3(int oldfd, int newfd, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_dup3(oldfd, newfd, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_tee(fd_in, fd_out, len, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t splice(int fd_in, off_t* off_in, int fd_out, off_t* off_out, size_t len, unsigned int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_splice(fd_in, off_in, fd_out, off_out, len, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fallocate(int fd, int mode, off_t offset, off_t len) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fallocate(fd, mode, offset, len); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int 
fdatasync(int fildes) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fdatasync(fildes); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fsync(int fd) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fsync(fd); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fcntl(int fd, int op, ...) +{ + global_osrt_func.loadFunc(); + va_list args; + va_start(args, op); + void* arg = va_arg(args, void*); + va_end(args); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fcntl(fd, op, arg); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int flock(int fd, int op) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_flock(fd, op); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int lockf(int fd, int op, off_t len) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_lockf(fd, op, len); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int truncate(const char* path, off_t length) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_truncate(path, length); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int ftruncate(int fildes, off_t length) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_ftruncate(fildes, length); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int ioctl(int fd, int op, ...) +{ + global_osrt_func.loadFunc(); + va_list args; + va_start(args, op); + void* arg = va_arg(args, void*); + va_end(args); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_ioctl(fd, op, arg); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int open(const char* pathname, int flags, ...) +{ + global_osrt_func.loadFunc(); + va_list args; + va_start(args, flags); + mode_t arg = va_arg(args, mode_t); + va_end(args); + uint64_t start_time = nsec_now(); + auto ret = arg ? global_osrt_func.file_func.real_open(pathname, flags, arg) : global_osrt_func.file_func.real_open(pathname, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int openat(int dirfd, const char *pathname, int flags, ...) +{ + global_osrt_func.loadFunc(); + va_list args; + va_start(args, flags); + mode_t arg = va_arg(args, mode_t); + va_end(args); + uint64_t start_time = nsec_now(); + auto ret = arg ? 
global_osrt_func.file_func.real_openat(dirfd, pathname, flags, arg) : global_osrt_func.file_func.real_openat(dirfd, pathname, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int pipe(int pipefd[2]) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_pipe(pipefd); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int pipe2(int pipefd[2], int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_pipe2(pipefd, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int mkfifo(const char* pathname, mode_t mode) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_mkfifo(pathname, mode); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int mkfifoat(int dirfd, const char* pathname, mode_t mode) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_mkfifoat(dirfd, pathname, mode); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t read(int fd, void* buf, size_t count) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_read(fd, buf, count); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t pread(int fd, void* buf, size_t count, off_t offset) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_pread(fd, buf, count, offset); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t readv(int fd, const struct iovec* iov, int iovcnt) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_readv(fd, iov, iovcnt); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t preadv(int fd, const struct iovec* iov, int iovcnt, off_t offset) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_preadv(fd, iov, iovcnt, offset); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t preadv2(int fd, const struct iovec* iov, int iovcnt, off_t offset, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_preadv2(fd, iov, iovcnt, offset, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t write(int fd, const void* buf, size_t count) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_write(fd, buf, count); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t pwrite(int fd, const void* buf, size_t count, off_t offset) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_pwrite(fd, buf, count, offset); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t writev(int 
fd, const struct iovec* iov, int iovcnt) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_writev(fd, iov, iovcnt); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t pwritev(int fd, const struct iovec* iov, int iovcnt, off_t offset) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_pwritev(fd, iov, iovcnt, offset); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t pwritev2(int fd, const struct iovec* iov, int iovcnt, off_t offset, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_pwritev2(fd, iov, iovcnt, offset, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t copy_file_range(int fd_in, off_t* off_in, int fd_out, off_t* off_out, size_t len, unsigned int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_copy_file_range(fd_in, off_in, fd_out, off_out, len, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +void sync(void) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + global_osrt_func.file_func.real_sync(); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); +} + +int syncfs(int fd) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_syncfs(fd); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int sync_file_range(int fd, off_t offset, off_t nbytes, unsigned int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_sync_file_range(fd, offset, nbytes, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t vmsplice(int fd, const struct iovec* iov, size_t nr_segs, unsigned int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_vmsplice(fd, iov, nr_segs, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t process_vm_readv(pid_t pid, const struct iovec* local_iov, unsigned long liovcnt, + const struct iovec* remote_iov, unsigned long riovcnt, unsigned long flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_process_vm_readv(pid, local_iov, liovcnt, remote_iov, riovcnt, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t process_vm_writev(pid_t pid, const struct iovec* local_iov, unsigned long liovcnt, + const struct iovec* remote_iov, unsigned long riovcnt, unsigned long flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_process_vm_writev(pid, local_iov, liovcnt, remote_iov, riovcnt, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fclose(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fclose(stream); + 
global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fcloseall(void) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fcloseall(); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fflush(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fflush(stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fgetc(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fgetc(stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +char* fgets(char* s, int size, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + char* ret = global_osrt_func.file_func.real_fgets(s, size, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fputc(int c, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fputc(c, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fputs(const char* s, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fputs(s, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +void flockfile(FILE* filehandle) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + global_osrt_func.file_func.real_flockfile(filehandle); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); +} + +int ftrylockfile(FILE* filehandle) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_ftrylockfile(filehandle); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +void funlockfile(FILE* filehandle) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + global_osrt_func.file_func.real_funlockfile(filehandle); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); +} + +FILE* fopen(const char* pathname, const char* mode) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fopen(pathname, mode); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +FILE* freopen(const char* pathname, const char* mode, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_freopen(pathname, mode, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +size_t fread(void* ptr, size_t size, size_t nmemb, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fread(ptr, size, nmemb, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +size_t fwrite(const void* ptr, size_t size, size_t nitems, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = 
global_osrt_func.file_func.real_fwrite(ptr, size, nitems, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t getdelim(char** lineptr, size_t* n, int delimiter, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_getdelim(lineptr, n, delimiter, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t getline(char** lineptr, size_t* n, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_getline(lineptr, n, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int getc(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_getc(stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int putc(int c, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_putc(c, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int getc_unlocked(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_getc_unlocked(stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int putc_unlocked(int c, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_putc_unlocked(c, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fflush_unlocked(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fflush_unlocked(stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fgetc_unlocked(FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fgetc_unlocked(stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int fputc_unlocked(int c, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fputc_unlocked(c, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +size_t fread_unlocked(void* ptr, size_t size, size_t n, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fread_unlocked(ptr, size, n, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +size_t fwrite_unlocked(const void* ptr, size_t size, size_t n, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.file_func.real_fwrite_unlocked(ptr, size, n, stream); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +char* fgets_unlocked(char* s, int n, FILE* stream) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + char* ret = global_osrt_func.file_func.real_fgets_unlocked(s, n, 
stream);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int fputs_unlocked(const char* s, FILE* stream)
+{
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.file_func.real_fputs_unlocked(s, stream);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
diff --git a/profiler/msprof_analyze/osrt_trace/src/file_func.h b/profiler/msprof_analyze/osrt_trace/src/file_func.h
new file mode 100644
index 0000000000000000000000000000000000000000..23c6a25eeeddd734a1ab10ecfcb7d3035d2f9a6a
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/src/file_func.h
@@ -0,0 +1,144 @@
+#pragma once
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include <cstdio>
+#include <sys/types.h>
+#include <sys/uio.h>
+
+using DupFunc = int(*)(int);
+using Dup2Func = int(*)(int, int);
+using Dup3Func = int(*)(int, int, int);
+using TeeFunc = ssize_t(*)(int, int, size_t, unsigned int);
+using SpliceFunc = ssize_t(*)(int, off_t*, int, off_t*, size_t, unsigned int);
+using FallocateFunc = int(*)(int, int, off_t, off_t);
+using FdatasyncFunc = int(*)(int);
+using FsyncFunc = int(*)(int);
+using FcntlFunc = int(*)(int, int, ...);
+using FlockFunc = int(*)(int, int);
+using LockfFunc = int(*)(int, int, off_t);
+using TruncateFunc = int(*)(const char*, off_t);
+using FtruncateFunc = int(*)(int, off_t);
+using IoctlFunc = int(*)(int, int, ...);
+using OpenFunc = int(*)(const char*, int, ...);
+using OpenatFunc = int(*)(int, const char*, int, ...);
+using PipeFunc = int(*)(int*);
+using Pipe2Func = int(*)(int*, int);
+using MkfifoFunc = int(*)(const char*, mode_t);
+using MkfifoatFunc = int(*)(int, const char*, mode_t);
+using ReadFunc = ssize_t(*)(int, void*, size_t);
+using PreadFunc = ssize_t(*)(int, void*, size_t, off_t);
+using ReadvFunc = ssize_t(*)(int, const struct iovec*, int);
+using PreadvFunc = ssize_t(*)(int, const struct iovec*, int, off_t);
+using Preadv2Func = ssize_t(*)(int, const struct iovec*, int, off_t, int);
+using WriteFunc = ssize_t(*)(int, const void*, size_t);
+using PwriteFunc = ssize_t(*)(int, const void*, size_t, off_t);
+using WritevFunc = ssize_t(*)(int, const struct iovec*, int);
+using PwritevFunc = ssize_t(*)(int, const struct iovec*, int, off_t);
+using Pwritev2Func = ssize_t(*)(int, const struct iovec*, int, off_t, int);
+using CopyFileRangeFunc = ssize_t(*)(int, off_t*, int, off_t*, size_t, unsigned int);
+using SyncFunc = void(*)(void);
+using SyncfsFunc = int(*)(int);
+using SyncFileRangeFunc = int(*)(int, off_t, off_t, unsigned int);
+using VmspliceFunc = ssize_t(*)(int, const struct iovec*, size_t, unsigned int);
+using ProcessVmReadvFunc = ssize_t(*)(pid_t, const struct iovec*, unsigned long, const struct iovec*, unsigned long, unsigned long);
+using ProcessVmWritevFunc = ssize_t(*)(pid_t, const struct iovec*, unsigned long, const struct iovec*, unsigned long, unsigned long);
+using FcloseFunc = int(*)(FILE*);
+using FcloseallFunc = int(*)(void);
+using FflushFunc = int(*)(FILE*);
+using FgetcFunc = int(*)(FILE*);
+using FgetsFunc = char*(*)(char*, int, FILE*);
+using FputcFunc = int(*)(int, FILE*);
+using FputsFunc = int(*)(const char*, FILE*);
+using FlockfileFunc = void(*)(FILE*);
+using FtrylockfileFunc = int(*)(FILE*);
+using FunlockfileFunc = void(*)(FILE*);
+using FopenFunc = FILE*(*)(const char*, const char*);
+using FreopenFunc = FILE*(*)(const char*, const char*, FILE*);
+using FreadFunc = size_t(*)(void*, size_t, size_t, FILE*);
+using FwriteFunc = 
size_t(*)(const void*, size_t, size_t, FILE*); +using GetdelimFunc = ssize_t(*)(char**, size_t*, int, FILE*); +using GetlineFunc = ssize_t(*)(char**, size_t*, FILE*); +using GetcFunc = int(*)(FILE*); +using PutcFunc = int(*)(int, FILE*); +using GetcUnlockedFunc = int(*)(FILE*); +using PutcUnlockedFunc = int(*)(int, FILE*); +using FflushUnlockedFunc = int(*)(FILE*); +using FgetcUnlockedFunc = int(*)(FILE*); +using FputcUnlockedFunc = int(*)(int, FILE*); +using FreadUnlockedFunc = size_t(*)(void*, size_t, size_t, FILE*); +using FwriteUnlockedFunc = size_t(*)(const void*, size_t, size_t, FILE*); +using FgetsUnlockedFunc = char*(*)(char*, int, FILE*); +using FputsUnlockedFunc = int(*)(const char*, FILE*); + +struct FileFuncProxy +{ + DupFunc real_dup = nullptr; + Dup2Func real_dup2 = nullptr; + Dup3Func real_dup3 = nullptr; + TeeFunc real_tee = nullptr; + SpliceFunc real_splice = nullptr; + FallocateFunc real_fallocate = nullptr; + FdatasyncFunc real_fdatasync = nullptr; + FsyncFunc real_fsync = nullptr; + FcntlFunc real_fcntl = nullptr; + FlockFunc real_flock = nullptr; + LockfFunc real_lockf = nullptr; + TruncateFunc real_truncate = nullptr; + FtruncateFunc real_ftruncate = nullptr; + IoctlFunc real_ioctl = nullptr; + OpenFunc real_open = nullptr; + OpenatFunc real_openat = nullptr; + PipeFunc real_pipe = nullptr; + Pipe2Func real_pipe2 = nullptr; + MkfifoFunc real_mkfifo = nullptr; + MkfifoatFunc real_mkfifoat = nullptr; + ReadFunc real_read = nullptr; + PreadFunc real_pread = nullptr; + ReadvFunc real_readv = nullptr; + PreadvFunc real_preadv = nullptr; + Preadv2Func real_preadv2 = nullptr; + WriteFunc real_write = nullptr; + PwriteFunc real_pwrite = nullptr; + WritevFunc real_writev = nullptr; + PwritevFunc real_pwritev = nullptr; + Pwritev2Func real_pwritev2 = nullptr; + CopyFileRangeFunc real_copy_file_range = nullptr; + SyncFunc real_sync = nullptr; + SyncfsFunc real_syncfs = nullptr; + SyncFileRangeFunc real_sync_file_range = nullptr; + VmspliceFunc real_vmsplice = nullptr; + ProcessVmReadvFunc real_process_vm_readv = nullptr; + ProcessVmWritevFunc real_process_vm_writev = nullptr; + FcloseFunc real_fclose = nullptr; + FcloseallFunc real_fcloseall = nullptr; + FflushFunc real_fflush = nullptr; + FgetcFunc real_fgetc = nullptr; + FgetsFunc real_fgets = nullptr; + FputcFunc real_fputc = nullptr; + FputsFunc real_fputs = nullptr; + FlockfileFunc real_flockfile = nullptr; + FtrylockfileFunc real_ftrylockfile = nullptr; + FunlockfileFunc real_funlockfile = nullptr; + FopenFunc real_fopen = nullptr; + FreopenFunc real_freopen = nullptr; + FreadFunc real_fread = nullptr; + FwriteFunc real_fwrite = nullptr; + GetdelimFunc real_getdelim = nullptr; + GetlineFunc real_getline = nullptr; + GetcFunc real_getc = nullptr; + PutcFunc real_putc = nullptr; + GetcUnlockedFunc real_getc_unlocked = nullptr; + PutcUnlockedFunc real_putc_unlocked = nullptr; + FflushUnlockedFunc real_fflush_unlocked = nullptr; + FgetcUnlockedFunc real_fgetc_unlocked = nullptr; + FputcUnlockedFunc real_fputc_unlocked = nullptr; + FreadUnlockedFunc real_fread_unlocked = nullptr; + FwriteUnlockedFunc real_fwrite_unlocked = nullptr; + FgetsUnlockedFunc real_fgets_unlocked = nullptr; + FputsUnlockedFunc real_fputs_unlocked = nullptr; + + void loadFunc(); +}; diff --git a/profiler/msprof_analyze/osrt_trace/src/msosrt_trace.cpp b/profiler/msprof_analyze/osrt_trace/src/msosrt_trace.cpp new file mode 100644 index 0000000000000000000000000000000000000000..a3a88b05480193ce9bee6c26480e214f69e4ddf0 --- /dev/null +++ 
b/profiler/msprof_analyze/osrt_trace/src/msosrt_trace.cpp
@@ -0,0 +1,476 @@
+#include "msosrt_trace.h"
+
+#include <dlfcn.h>
+#include <unistd.h>
+#include <climits>
+#include <cinttypes>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <atomic>
+#include <mutex>
+#include <string>
+
+#if !defined (__linux__) || !defined(__GLIBC__)
+#error "This tool only works on Linux!"
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+static void setup_trace() __attribute__((constructor));
+static void end_trace() __attribute__((destructor));
+#ifdef __cplusplus
+}
+#endif
+
+// Special handling for the exit functions
+static void (*real_exit)(int status) __attribute__((noreturn)) = nullptr;
+static void (*real__exit)(int status) __attribute__((noreturn)) = nullptr;
+static void (*real__Exit)(int status) __attribute__((noreturn)) = nullptr;
+
+static __thread bool RECURSIVE = false;
+static volatile bool INITIALIZED = false;
+
+namespace {
+pid_t GetPid()
+{
+    static thread_local pid_t pid = getpid();
+    return pid;
+}
+
+pid_t GetTid()
+{
+    static thread_local pid_t tid = gettid();
+    return tid;
+}
+
+const char* DUMP_FILE = "msosrt_trace_";
+char EXPORT_PATH[PATH_MAX];
+
+const size_t RECORD_LENGTH = 512 * 1024; // Default number of trace data records
+struct {
+    OSRTRecord data_[RECORD_LENGTH];
+    std::atomic<size_t> index_{0};
+    bool is_full_ = false;
+
+    // Lock-free, best-effort ring buffer: concurrent writers may occasionally
+    // collide on a slot, which is acceptable for trace sampling.
+    void recordData(const char* function, uint64_t start_time, uint64_t duration)
+    {
+        size_t index = index_.load(std::memory_order_relaxed);
+        if (index + 1 >= RECORD_LENGTH) {
+            index_.store(0, std::memory_order_relaxed);
+            is_full_ = true;
+        } else {
+            index_.fetch_add(1, std::memory_order_relaxed);
+        }
+        auto& record = data_[index];
+        record.pid = GetPid();
+        record.tid = GetTid();
+        record.function = function;
+        record.start_time = start_time;
+        record.duration = duration;
+    }
+
+    size_t size()
+    {
+        return is_full_ ? RECORD_LENGTH : index_.load(std::memory_order_relaxed);
+    }
+
+    bool hasValidData()
+    {
+        pid_t pid = getpid();
+        for (size_t i = 0, len = size(); i < len; ++i) {
+            if (data_[i].pid == pid && data_[i].function != nullptr) {
+                return true;
+            }
+        }
+        return false;
+    }
+} OSRT_RECORD_QUEUE;
+}
+
+OSRTFunc global_osrt_func;
+
+void OSRTFunc::loadFunc()
+{
+    static volatile bool loaded = false;
+    if (LIKELY(loaded)) {
+        return;
+    }
+    RECURSIVE = true;
+    LOAD_FUNC(malloc, MallocFunc);
+    LOAD_FUNC(realloc, ReallocFunc);
+    LOAD_FUNC(free, FreeFunc);
+    LOAD_FUNC(mmap, MmapFunc);
+    LOAD_FUNC(munmap, MunmapFunc);
+    LOAD_FUNC(mremap, MremapFunc);
+    LOAD_FUNC(msync, MsyncFunc);
+    LOAD_FUNC(mprotect, MprotectFunc);
+    LOAD_FUNC(brk, BrkFunc);
+
+    LOAD_FUNC(pthread_mutex_lock, PthreadMutexLockFunc);
+    LOAD_FUNC(pthread_mutex_timedlock, PthreadMutexTimedlockFunc);
+    LOAD_FUNC(pthread_cond_signal, PthreadCondSignalFunc);
+    LOAD_FUNC(pthread_cond_broadcast, PthreadCondBroadcastFunc);
+    LOAD_FUNC(pthread_cond_wait, PthreadCondWaitFunc);
+    LOAD_FUNC(pthread_cond_timedwait, PthreadCondTimedwaitFunc);
+    LOAD_FUNC(pthread_rwlock_rdlock, PthreadRwlockRdlockFunc);
+    LOAD_FUNC(pthread_rwlock_timedrdlock, PthreadRwlockTimedrdlockFunc);
+    LOAD_FUNC(pthread_rwlock_wrlock, PthreadRwlockWrlockFunc);
+    LOAD_FUNC(pthread_rwlock_timedwrlock, PthreadRwlockTimedwrlockFunc);
+
+    real_exit = reinterpret_cast<decltype(real_exit)>(dlsym(RTLD_NEXT, "exit"));
+    real__exit = reinterpret_cast<decltype(real__exit)>(dlsym(RTLD_NEXT, "_exit"));
+    real__Exit = reinterpret_cast<decltype(real__Exit)>(dlsym(RTLD_NEXT, "_Exit"));
+
+    file_func.loadFunc();
+    socket_func.loadFunc();
+
+    loaded = true;
+    RECURSIVE = false;
+}
+
+void OSRTFunc::recordFunc(uint64_t start_time, uint64_t duration, const char* name)
+{
+    if (UNLIKELY(!INITIALIZED || RECURSIVE)) {
+        return;
+    }
+    if (UNLIKELY(duration >= threshold_)) {
+        RECURSIVE = true;
+        OSRT_RECORD_QUEUE.recordData(name, start_time, duration);
+        RECURSIVE = false;
+    }
+}
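+
+// Entries accumulate in the fixed-size in-memory ring buffer above; only calls whose
+// duration reaches threshold_ are kept. dumpFunc() below flushes them once, as CSV
+// (Pid,Tid,Function,StartTime(ns),Duration(ns)), from the destructor or an exit interceptor.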
+void OSRTFunc::dumpFunc()
+{
+    if (!INITIALIZED) {
+        return;
+    }
+    static std::mutex dump_mutex;
+    static bool dumped = false;
+
+    std::lock_guard<std::mutex> lock(dump_mutex);
+    if (!dumped) {
+        RECURSIVE = true;
+        if (OSRT_RECORD_QUEUE.hasValidData()) {
+            std::string dump_file;
+            pid_t pid = getpid();
+            // The glibc program_invocation_short_name contains the basename that was used to invoke the calling program
+            if (program_invocation_short_name != nullptr) {
+                dump_file = std::string(EXPORT_PATH) + "/" + DUMP_FILE + std::to_string(pid) + "_" + program_invocation_short_name + ".csv";
+            } else {
+                dump_file = std::string(EXPORT_PATH) + "/" + DUMP_FILE + std::to_string(pid) + ".csv";
+            }
+            if (!PathUtils::IsFileExist(dump_file) && !PathUtils::CreateFile(dump_file)) {
+                fprintf(stderr, "[ERROR] Create msosrt trace file failed.\n");
+                RECURSIVE = false;
+                return;
+            }
+            auto fd = fopen(dump_file.c_str(), "ab");
+            if (fd == nullptr) {
+                RECURSIVE = false;
+                return;
+            }
+            fprintf(fd, "%s\n", "Pid,Tid,Function,StartTime(ns),Duration(ns)");
+            for (size_t i = 0, len = OSRT_RECORD_QUEUE.size(); i < len; ++i) {
+                if (OSRT_RECORD_QUEUE.data_[i].pid == pid && OSRT_RECORD_QUEUE.data_[i].function != nullptr) {
+                    fprintf(fd, "%" PRIdMAX ",%" PRIdMAX ",%s,%" PRIu64 ",%" PRIu64 "\n",
+                        static_cast<intmax_t>(pid),
+                        static_cast<intmax_t>(OSRT_RECORD_QUEUE.data_[i].tid),
+                        OSRT_RECORD_QUEUE.data_[i].function,
+                        OSRT_RECORD_QUEUE.data_[i].start_time,
+                        OSRT_RECORD_QUEUE.data_[i].duration);
+                }
+            }
+            fclose(fd);
+        }
+        RECURSIVE = false;
+    }
+    dumped = true;
+}
+
+static void setup_trace()
+{
+    if (LIKELY(INITIALIZED)) {
+        return;
+    }
+    global_osrt_func.loadFunc();
+    INITIALIZED = true;
+
+    RECURSIVE = true;
+    const char* threshold_env_val = getenv("MSOSRT_TRACE_THRESHOLD");
+    int64_t threshold = 0;
+    if (threshold_env_val == nullptr || str_to_i64(threshold_env_val, threshold) != 0) {
+        fprintf(stderr, "[WARNING] Parse MSOSRT_TRACE_THRESHOLD failed, use default value\n");
+    } else {
+        if (threshold > 0) {
+            global_osrt_func.threshold_ = threshold;
+        } else {
+            fprintf(stderr, "[WARNING] MSOSRT_TRACE_THRESHOLD must be a positive integer, use default value\n");
+        }
+    }
+
+    const char* export_path_env_val = getenv("MSOSRT_EXPORT_PATH");
+    std::string dump_path;
+    if (export_path_env_val != nullptr) {
+        dump_path = export_path_env_val;
+    }
+    if (dump_path.empty()) {
+        fprintf(stderr, "[WARNING] MSOSRT_EXPORT_PATH is not set, data will export to current working directory\n");
+        char cwd_path[PATH_MAX] = {0};
+        if (getcwd(cwd_path, PATH_MAX) != nullptr) {
+            dump_path = cwd_path;
+        }
+    }
+    std::string abs_path = PathUtils::RelativeToAbsPath(dump_path);
+    if (PathUtils::DirPathCheck(abs_path)) {
+        std::string real_path = PathUtils::RealPath(abs_path);
+        strncpy(EXPORT_PATH, real_path.c_str(), real_path.size() < PATH_MAX ? 
real_path.size() : PATH_MAX); + fprintf(stderr, "[INFO] MSOSRT result export path is: %s\n", real_path.c_str()); + } else { + fprintf(stderr, "[ERROR] Invalid export path, data will not be exported.\n"); + } + RECURSIVE = false; +} + +static void end_trace() +{ + global_osrt_func.dumpFunc(); +} + +void* malloc(size_t size) +{ + global_osrt_func.loadFunc(); + if (UNLIKELY(RECURSIVE)) { + return (void*)global_osrt_func.real_malloc(size); + } + uint64_t start_time = nsec_now(); + void* ret = global_osrt_func.real_malloc(size); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +void* realloc(void* ptr, size_t size) +{ + global_osrt_func.loadFunc(); + if (UNLIKELY(RECURSIVE)) { + return (void*)global_osrt_func.real_realloc(ptr, size); + } + uint64_t start_time = nsec_now(); + void* ret = global_osrt_func.real_realloc(ptr, size); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +void free(void* ptr) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + global_osrt_func.real_free(ptr); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); +} + +void* mmap(void* addr, size_t length, int prot, int flags, int fd, off_t offset) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + void* ret = global_osrt_func.real_mmap(addr, length, prot, flags, fd, offset); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +void* mremap(void* old_address, size_t old_size, size_t new_size, int flags, ...) +{ + global_osrt_func.loadFunc(); + va_list args; + va_start(args, flags); + void* arg = va_arg(args, void*); + va_end(args); + uint64_t start_time = nsec_now(); + void* ret = global_osrt_func.real_mremap(old_address, old_size, new_size, flags, arg); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int munmap(void* addr, size_t length) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.real_munmap(addr, length); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int msync(void* addr, size_t length, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.real_msync(addr, length, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int mprotect(void* addr, size_t len, int prot) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.real_mprotect(addr, len, prot); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int brk(void* addr) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.real_brk(addr); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int pthread_mutex_lock(pthread_mutex_t* mutex) +{ + if (UNLIKELY(!INITIALIZED && RECURSIVE)) { + // During the initialization phase we might be called inside of dlsym(). + // Since we'd enter an endless loop if we tried to resolved the real + // pthread_mutex_lock() here then we simply fake the lock which should + // be safe since no thread can be running yet. 
+        return 0;
+    }
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_mutex_lock(mutex);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_mutex_timedlock(pthread_mutex_t* mutex, const struct timespec* abstime)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_mutex_timedlock(mutex, abstime);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_mutex_timedlock(mutex, abstime);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_cond_signal(pthread_cond_t* cond)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_cond_signal(cond);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_cond_signal(cond);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_cond_broadcast(pthread_cond_t* cond)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_cond_broadcast(cond);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_cond_broadcast(cond);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* mutex)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_cond_wait(cond, mutex);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_cond_wait(cond, mutex);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_cond_timedwait(pthread_cond_t* cond, pthread_mutex_t* mutex, const struct timespec* abstime)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_cond_timedwait(cond, mutex, abstime);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_cond_timedwait(cond, mutex, abstime);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_rwlock_rdlock(pthread_rwlock_t* rwlock)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_rwlock_rdlock(rwlock);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_rwlock_rdlock(rwlock);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_rwlock_timedrdlock(pthread_rwlock_t* rwlock, const struct timespec* abstime)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_rwlock_timedrdlock(rwlock, abstime);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_rwlock_timedrdlock(rwlock, abstime);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_rwlock_wrlock(pthread_rwlock_t* rwlock)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        // Must return here, otherwise the lock would be taken twice.
+        return global_osrt_func.real_pthread_rwlock_wrlock(rwlock);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_rwlock_wrlock(rwlock);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int pthread_rwlock_timedwrlock(pthread_rwlock_t* rwlock, const struct timespec* abstime)
+{
+    global_osrt_func.loadFunc();
+    if (UNLIKELY(RECURSIVE)) {
+        return global_osrt_func.real_pthread_rwlock_timedwrlock(rwlock, abstime);
+    }
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.real_pthread_rwlock_timedwrlock(rwlock, abstime);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+void exit(int status)
+{
+    if (LIKELY(INITIALIZED)) {
+        global_osrt_func.dumpFunc();
+    }
+    real_exit(status);
+}
+
+void _exit(int status)
+{
+    if (LIKELY(INITIALIZED)) {
+        global_osrt_func.dumpFunc();
+    }
+    real__exit(status);
+}
+
+void _Exit(int status)
+{
+    if (LIKELY(INITIALIZED)) {
+        global_osrt_func.dumpFunc();
+    }
+    real__Exit(status);
+}
diff --git a/profiler/msprof_analyze/osrt_trace/src/msosrt_trace.h b/profiler/msprof_analyze/osrt_trace/src/msosrt_trace.h
new file mode 100644
index 0000000000000000000000000000000000000000..e153ef5138883cd597c0a5a524adc5ec5b555ea4
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/src/msosrt_trace.h
@@ -0,0 +1,207 @@
+#pragma once
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include <dlfcn.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <cstddef>
+#include <cstdint>
+#include <cstdio>
+#include <sys/types.h>
+
+#include "utils.h"
+#include "file_func.h"
+#include "socket_func.h"
+
+#define TRACE_API __attribute__((visibility("default")))
+#define LOAD_FUNC(name, func_type) \
+    do { \
+        (real_##name) = reinterpret_cast<func_type>(dlsym(RTLD_NEXT, #name)); \
+    } while (false)
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+// memory func
+TRACE_API void* malloc(size_t size);
+TRACE_API void* realloc(void* ptr, size_t size);
+TRACE_API void free(void* ptr);
+TRACE_API void* mmap(void* addr, size_t length, int prot, int flags, int fd, off_t offset);
+TRACE_API int munmap(void* addr, size_t length);
+TRACE_API void* mremap(void* old_address, size_t old_size, size_t new_size, int flags, ... 
/* void *new_address */); +TRACE_API int msync(void* addr, size_t length, int flags); +TRACE_API int mprotect(void* addr, size_t len, int prot); +TRACE_API int brk(void* addr); +// pthread func +TRACE_API int pthread_mutex_lock(pthread_mutex_t* mutex); +TRACE_API int pthread_mutex_timedlock(pthread_mutex_t* mutex, const struct timespec* abstime); +TRACE_API int pthread_cond_signal(pthread_cond_t* cond); +TRACE_API int pthread_cond_broadcast(pthread_cond_t* cond); +TRACE_API int pthread_cond_wait(pthread_cond_t* cond, pthread_mutex_t* mutex); +TRACE_API int pthread_cond_timedwait(pthread_cond_t* cond, pthread_mutex_t* mutex, const struct timespec* abstime); +TRACE_API int pthread_rwlock_rdlock(pthread_rwlock_t* rwlock); +TRACE_API int pthread_rwlock_timedrdlock(pthread_rwlock_t* rwlock, const struct timespec* abstime); +TRACE_API int pthread_rwlock_wrlock(pthread_rwlock_t* rwlock); +TRACE_API int pthread_rwlock_timedwrlock(pthread_rwlock_t* rwlock, const struct timespec* abstime); +// exit func +TRACE_API void exit(int status) __attribute__((noreturn)); +TRACE_API void _exit(int status) __attribute__((noreturn)); +TRACE_API void _Exit(int status) __attribute__((noreturn)); +// file func +TRACE_API int dup(int oldfd); +TRACE_API int dup2(int oldfd, int newfd); +TRACE_API int dup3(int oldfd, int newfd, int flags); +TRACE_API ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags); +TRACE_API ssize_t splice(int fd_in, off_t* off_in, int fd_out, off_t* off_out, size_t len, unsigned int flags); +TRACE_API int fallocate(int fd, int mode, off_t offset, off_t len); +TRACE_API int fdatasync(int fildes); +TRACE_API int fsync(int fd); +TRACE_API int fcntl(int fd, int op, ...); +TRACE_API int flock(int fd, int op); +TRACE_API int lockf(int fd, int op, off_t len); +TRACE_API int truncate(const char* path, off_t length); +TRACE_API int ftruncate(int fildes, off_t length); +TRACE_API int ioctl(int fd, int op, ...); +TRACE_API int open(const char* pathname, int flags, ... /* mode_t mode */ ); +TRACE_API int openat(int dirfd, const char* pathname, int flags, ... 
/* mode_t mode */ ); +TRACE_API int pipe(int pipefd[2]); +TRACE_API int pipe2(int pipefd[2], int flags); +TRACE_API int mkfifo(const char* pathname, mode_t mode); +TRACE_API int mkfifoat(int dirfd, const char* pathname, mode_t mode); +TRACE_API ssize_t read(int fd, void* buf, size_t count); +TRACE_API ssize_t pread(int fd, void* buf, size_t count, off_t offset); +TRACE_API ssize_t readv(int fd, const struct iovec* iov, int iovcnt); +TRACE_API ssize_t preadv(int fd, const struct iovec* iov, int iovcnt, off_t offset); +TRACE_API ssize_t preadv2(int fd, const struct iovec* iov, int iovcnt, off_t offset, int flags); +TRACE_API ssize_t write(int fd, const void* buf, size_t count); +TRACE_API ssize_t pwrite(int fd, const void* buf, size_t count, off_t offset); +TRACE_API ssize_t writev(int fd, const struct iovec* iov, int iovcnt); +TRACE_API ssize_t pwritev(int fd, const struct iovec* iov, int iovcnt, off_t offset); +TRACE_API ssize_t pwritev2(int fd, const struct iovec* iov, int iovcnt, off_t offset, int flags); +TRACE_API ssize_t copy_file_range(int fd_in, off_t* off_in, int fd_out, off_t* off_out, size_t len, unsigned int flags); +TRACE_API void sync(void); +TRACE_API int syncfs(int fd); +TRACE_API int sync_file_range(int fd, off_t offset, off_t nbytes, unsigned int flags); +TRACE_API ssize_t vmsplice(int fd, const struct iovec* iov, size_t nr_segs, unsigned int flags); +TRACE_API ssize_t process_vm_readv(pid_t pid, const struct iovec* local_iov, unsigned long liovcnt, + const struct iovec* remote_iov, unsigned long riovcnt, unsigned long flags); +TRACE_API ssize_t process_vm_writev(pid_t pid, const struct iovec* local_iov, unsigned long liovcnt, + const struct iovec* remote_iov, unsigned long riovcnt, unsigned long flags); +TRACE_API int fclose(FILE* stream); +TRACE_API int fcloseall(void); +TRACE_API int fflush(FILE* stream); +TRACE_API int fgetc(FILE* stream); +TRACE_API char* fgets(char* s, int size, FILE* stream); +TRACE_API int fputc(int c, FILE* stream); +TRACE_API int fputs(const char* s, FILE* stream); +TRACE_API void flockfile(FILE* filehandle); +TRACE_API int ftrylockfile(FILE* filehandle); +TRACE_API void funlockfile(FILE* filehandle); +TRACE_API FILE* fopen(const char* pathname, const char* mode); +TRACE_API FILE* freopen(const char* pathname, const char* mode, FILE* stream); +TRACE_API size_t fread(void* ptr, size_t size, size_t nmemb, FILE* stream); +TRACE_API size_t fwrite(const void* ptr, size_t size, size_t nitems, FILE* stream); +TRACE_API ssize_t getdelim(char** lineptr, size_t* n, int delimiter, FILE* stream); +TRACE_API ssize_t getline(char** lineptr, size_t* n, FILE* stream); +TRACE_API int getc(FILE* stream); +TRACE_API int putc(int c, FILE* stream); +TRACE_API int getc_unlocked(FILE* stream); +TRACE_API int putc_unlocked(int c, FILE* stream); +TRACE_API int fflush_unlocked(FILE* stream); +TRACE_API int fgetc_unlocked(FILE* stream); +TRACE_API int fputc_unlocked(int c, FILE* stream); +TRACE_API size_t fread_unlocked(void* ptr, size_t size, size_t n, FILE* stream); +TRACE_API size_t fwrite_unlocked(const void* ptr, size_t size, size_t n, FILE* stream); +TRACE_API char* fgets_unlocked(char* s, int n, FILE* stream); +TRACE_API int fputs_unlocked(const char* s, FILE* stream); +// socket func +TRACE_API int socket(int domain, int type, int protocol); +TRACE_API int socketpair(int domain, int type, int protocol, int sv[2]); +TRACE_API int epoll_ctl(int epfd, int op, int fd, struct epoll_event* event); +TRACE_API int epoll_wait(int epfd, struct epoll_event* events, int 
maxevents, int timeout); +TRACE_API int epoll_pwait(int epfd, struct epoll_event* events, int maxevents, int timeout, const sigset_t* sigmask); +TRACE_API int select(int nfds, fd_set* readfds, fd_set* writefds, fd_set* exceptfds, struct timeval* timeout); +TRACE_API int listen(int sockfd, int backlog); +TRACE_API int accept(int sockfd, struct sockaddr* addr, socklen_t* addrlen); +TRACE_API int accept4(int sockfd, struct sockaddr* addr, socklen_t* addrlen, int flags); +TRACE_API int bind(int sockfd, const struct sockaddr* addr, socklen_t addrlen); +TRACE_API int poll(struct pollfd* fds, nfds_t nfds, int timeout); +TRACE_API int ppoll(struct pollfd* fds, nfds_t nfds, const struct timespec* tmo_p, const sigset_t* sigmask); +TRACE_API ssize_t send(int sockfd, const void* buf, size_t len, int flags); +TRACE_API ssize_t sendto(int sockfd, const void* buf, size_t len, int flags, const struct sockaddr* dest_addr, socklen_t addrlen); +TRACE_API ssize_t sendmsg(int sockfd, const struct msghdr* msg, int flags); +TRACE_API int sendmmsg(int sockfd, struct mmsghdr* msgvec, unsigned int vlen, int flags); +TRACE_API ssize_t sendfile(int out_fd, int in_fd, off_t* offset, size_t count); +TRACE_API ssize_t recv(int sockfd, void* buf, size_t len, int flags); +TRACE_API ssize_t recvfrom(int sockfd, void* buf, size_t len, int flags, struct sockaddr* src_addr, socklen_t* addrlen); +TRACE_API ssize_t recvmsg(int sockfd, struct msghdr* msg, int flags); +TRACE_API int recvmmsg(int sockfd, struct mmsghdr* msgvec, unsigned int vlen, int flags, struct timespec* timeout); +#ifdef __cplusplus +} +#endif + +using MallocFunc = void*(*)(size_t); +using ReallocFunc = void*(*)(void*, size_t); +using FreeFunc = void(*)(void*); +using MmapFunc = void*(*)(void*, size_t, int, int, int, off_t); +using MunmapFunc = int(*)(void*, size_t); +using MremapFunc = void*(*)(void*, size_t, size_t, int, ...); +using MsyncFunc = int(*)(void*, size_t, int); +using MprotectFunc = int(*)(void*, size_t, int); +using BrkFunc = int(*)(void*); +using PthreadMutexLockFunc = int(*)(pthread_mutex_t*); +using PthreadMutexTimedlockFunc = int(*)(pthread_mutex_t*, const struct timespec*); +using PthreadCondSignalFunc = int(*)(pthread_cond_t*); +using PthreadCondBroadcastFunc = int(*)(pthread_cond_t*); +using PthreadCondWaitFunc = int(*)(pthread_cond_t*, pthread_mutex_t*); +using PthreadCondTimedwaitFunc = int(*)(pthread_cond_t*, pthread_mutex_t*, const struct timespec*); +using PthreadRwlockRdlockFunc = int(*)(pthread_rwlock_t*); +using PthreadRwlockTimedrdlockFunc = int(*)(pthread_rwlock_t*, const struct timespec*); +using PthreadRwlockWrlockFunc = int(*)(pthread_rwlock_t*); +using PthreadRwlockTimedwrlockFunc = int(*)(pthread_rwlock_t*, const struct timespec*); + +struct OSRTRecord { + pid_t pid = 0; + pid_t tid = 0; + const char* function = nullptr; + uint64_t start_time = 0; + uint64_t duration = 0; +}; + +const uint64_t DEFAULT_THRESHOLD = 10 * 1000 * 1000; // 10ms + +struct OSRTFunc { + uint64_t threshold_ = DEFAULT_THRESHOLD; + + MallocFunc real_malloc = nullptr; + ReallocFunc real_realloc = nullptr; + FreeFunc real_free = nullptr; + MmapFunc real_mmap = nullptr; + MunmapFunc real_munmap = nullptr; + MremapFunc real_mremap = nullptr; + MsyncFunc real_msync = nullptr; + MprotectFunc real_mprotect = nullptr; + BrkFunc real_brk = nullptr; + PthreadMutexLockFunc real_pthread_mutex_lock = nullptr; + PthreadMutexTimedlockFunc real_pthread_mutex_timedlock = nullptr; + PthreadCondSignalFunc real_pthread_cond_signal = nullptr; + PthreadCondBroadcastFunc 
real_pthread_cond_broadcast = nullptr; + PthreadCondWaitFunc real_pthread_cond_wait = nullptr; + PthreadCondTimedwaitFunc real_pthread_cond_timedwait = nullptr; + PthreadRwlockRdlockFunc real_pthread_rwlock_rdlock = nullptr; + PthreadRwlockTimedrdlockFunc real_pthread_rwlock_timedrdlock = nullptr; + PthreadRwlockWrlockFunc real_pthread_rwlock_wrlock = nullptr; + PthreadRwlockTimedwrlockFunc real_pthread_rwlock_timedwrlock = nullptr; + + FileFuncProxy file_func; + SocketFuncProxy socket_func; + + void loadFunc(); + void recordFunc(uint64_t start_time, uint64_t duration, const char* name); + void dumpFunc(); +}; + +extern OSRTFunc global_osrt_func; diff --git a/profiler/msprof_analyze/osrt_trace/src/socket_func.cpp b/profiler/msprof_analyze/osrt_trace/src/socket_func.cpp new file mode 100644 index 0000000000000000000000000000000000000000..f2863c6a515f3d5159eb5e7e1212499d78301df9 --- /dev/null +++ b/profiler/msprof_analyze/osrt_trace/src/socket_func.cpp @@ -0,0 +1,217 @@ +#include "socket_func.h" + +#include "msosrt_trace.h" + +void SocketFuncProxy::loadFunc() +{ + LOAD_FUNC(socket, SocketFunc); + LOAD_FUNC(socketpair, SocketpairFunc); + LOAD_FUNC(epoll_ctl, EpollCtlFunc); + LOAD_FUNC(epoll_wait, EpollWaitFunc); + LOAD_FUNC(epoll_pwait, EpollPwaitFunc); + LOAD_FUNC(select, SelectFunc); + LOAD_FUNC(listen, ListenFunc); + LOAD_FUNC(accept, AcceptFunc); + LOAD_FUNC(accept4, Accept4Func); + LOAD_FUNC(bind, BindFunc); + LOAD_FUNC(poll, PollFunc); + LOAD_FUNC(ppoll, PpollFunc); + LOAD_FUNC(send, SendFunc); + LOAD_FUNC(sendto, SendtoFunc); + LOAD_FUNC(sendmsg, SendmsgFunc); + LOAD_FUNC(sendmmsg, SendmmsgFunc); + LOAD_FUNC(sendfile, SendfileFunc); + LOAD_FUNC(recv, RecvFunc); + LOAD_FUNC(recvfrom, RecvfromFunc); + LOAD_FUNC(recvmsg, RecvmsgFunc); + LOAD_FUNC(recvmmsg, RecvmmsgFunc); +} + +int socket(int domain, int type, int protocol) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_socket(domain, type, protocol); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int socketpair(int domain, int type, int protocol, int sv[2]) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_socketpair(domain, type, protocol, sv); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int epoll_ctl(int epfd, int op, int fd, struct epoll_event* event) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_epoll_ctl(epfd, op, fd, event); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int epoll_wait(int epfd, struct epoll_event* events, int maxevents, int timeout) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_epoll_wait(epfd, events, maxevents, timeout); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int epoll_pwait(int epfd, struct epoll_event* events, int maxevents, int timeout, const sigset_t* sigmask) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_epoll_pwait(epfd, events, maxevents, timeout, sigmask); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int select(int nfds, fd_set* readfds, fd_set* writefds, fd_set* exceptfds, 
struct timeval* timeout) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_select(nfds, readfds, writefds, exceptfds, timeout); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int listen(int sockfd, int backlog) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_listen(sockfd, backlog); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int accept(int sockfd, struct sockaddr* addr, socklen_t* addrlen) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_accept(sockfd, addr, addrlen); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int accept4(int sockfd, struct sockaddr* addr, socklen_t* addrlen, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_accept4(sockfd, addr, addrlen, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int bind(int sockfd, const struct sockaddr* addr, socklen_t addrlen) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_bind(sockfd, addr, addrlen); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int poll(struct pollfd* fds, nfds_t nfds, int timeout) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_poll(fds, nfds, timeout); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int ppoll(struct pollfd* fds, nfds_t nfds, const struct timespec* tmo_p, const sigset_t* sigmask) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_ppoll(fds, nfds, tmo_p, sigmask); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t send(int sockfd, const void* buf, size_t len, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_send(sockfd, buf, len, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t sendto(int sockfd, const void* buf, size_t len, int flags, const struct sockaddr* dest_addr, socklen_t addrlen) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_sendto(sockfd, buf, len, flags, dest_addr, addrlen); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +ssize_t sendmsg(int sockfd, const struct msghdr* msg, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_sendmsg(sockfd, msg, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + +int sendmmsg(int sockfd, struct mmsghdr* msgvec, unsigned int vlen, int flags) +{ + global_osrt_func.loadFunc(); + uint64_t start_time = nsec_now(); + auto ret = global_osrt_func.socket_func.real_sendmmsg(sockfd, msgvec, vlen, flags); + global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__); + return ret; +} + 
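+// The wrappers below follow the same interposition pattern as the rest of this file:
+// resolve the real libc symbol on first use, time the call, and hand the duration to
+// the global recorder, which keeps it only if it exceeds the configured threshold.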
+ssize_t sendfile(int out_fd, int in_fd, off_t* offset, size_t count)
+{
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.socket_func.real_sendfile(out_fd, in_fd, offset, count);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+ssize_t recv(int sockfd, void* buf, size_t len, int flags)
+{
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.socket_func.real_recv(sockfd, buf, len, flags);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+ssize_t recvfrom(int sockfd, void* buf, size_t len, int flags, struct sockaddr* src_addr, socklen_t* addrlen)
+{
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.socket_func.real_recvfrom(sockfd, buf, len, flags, src_addr, addrlen);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+ssize_t recvmsg(int sockfd, struct msghdr* msg, int flags)
+{
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.socket_func.real_recvmsg(sockfd, msg, flags);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
+
+int recvmmsg(int sockfd, struct mmsghdr* msgvec, unsigned int vlen, int flags, struct timespec* timeout)
+{
+    global_osrt_func.loadFunc();
+    uint64_t start_time = nsec_now();
+    auto ret = global_osrt_func.socket_func.real_recvmmsg(sockfd, msgvec, vlen, flags, timeout);
+    global_osrt_func.recordFunc(start_time, nsec_now() - start_time, __FUNCTION__);
+    return ret;
+}
diff --git a/profiler/msprof_analyze/osrt_trace/src/socket_func.h b/profiler/msprof_analyze/osrt_trace/src/socket_func.h
new file mode 100644
index 0000000000000000000000000000000000000000..361ce1d6382eada6cd942d74c2f3e0e7cd8621a0
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/src/socket_func.h
@@ -0,0 +1,60 @@
+#pragma once
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include <poll.h>
+#include <sys/epoll.h>
+#include <sys/select.h>
+#include <sys/sendfile.h>
+#include <sys/socket.h>
+
+using SocketFunc = int(*)(int, int, int);
+using SocketpairFunc = int(*)(int, int, int, int* sv);
+using EpollCtlFunc = int(*)(int, int, int, struct epoll_event*);
+using EpollWaitFunc = int(*)(int, struct epoll_event*, int, int);
+using EpollPwaitFunc = int(*)(int, struct epoll_event*, int, int, const sigset_t*);
+using SelectFunc = int(*)(int, fd_set*, fd_set*, fd_set*, struct timeval*);
+using ListenFunc = int(*)(int, int);
+using AcceptFunc = int(*)(int, struct sockaddr*, socklen_t*);
+using Accept4Func = int(*)(int, struct sockaddr*, socklen_t*, int);
+using BindFunc = int(*)(int, const struct sockaddr*, socklen_t);
+using PollFunc = int(*)(struct pollfd*, nfds_t, int);
+using PpollFunc = int(*)(struct pollfd*, nfds_t, const struct timespec*, const sigset_t*);
+using SendFunc = ssize_t(*)(int, const void*, size_t, int);
+using SendtoFunc = ssize_t(*)(int, const void*, size_t, int, const struct sockaddr*, socklen_t);
+using SendmsgFunc = ssize_t(*)(int, const struct msghdr*, int);
+using SendmmsgFunc = int(*)(int, struct mmsghdr*, unsigned int, int);
+using SendfileFunc = ssize_t(*)(int, int, off_t*, size_t);
+using RecvFunc = ssize_t(*)(int, void*, size_t, int);
+using RecvfromFunc = ssize_t(*)(int, void*, size_t, int, struct sockaddr*, socklen_t*);
+using RecvmsgFunc = ssize_t(*)(int, struct msghdr*, int);
+using RecvmmsgFunc = int(*)(int, struct mmsghdr*, unsigned int, int, struct timespec*);
+
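+// Holds the resolved libc entry points for the socket wrappers; loadFunc() fills each
+// pointer once via dlsym(RTLD_NEXT, ...) through the LOAD_FUNC macro in msosrt_trace.h.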
+struct SocketFuncProxy
+{
+    SocketFunc real_socket = nullptr;
+    SocketpairFunc real_socketpair = nullptr;
+    EpollCtlFunc real_epoll_ctl = nullptr;
+    EpollWaitFunc real_epoll_wait = nullptr;
+    EpollPwaitFunc real_epoll_pwait = nullptr;
+    SelectFunc real_select = nullptr;
+    ListenFunc real_listen = nullptr;
+    AcceptFunc real_accept = nullptr;
+    Accept4Func real_accept4 = nullptr;
+    BindFunc real_bind = nullptr;
+    PollFunc real_poll = nullptr;
+    PpollFunc real_ppoll = nullptr;
+    SendFunc real_send = nullptr;
+    SendtoFunc real_sendto = nullptr;
+    SendmsgFunc real_sendmsg = nullptr;
+    SendmmsgFunc real_sendmmsg = nullptr;
+    SendfileFunc real_sendfile = nullptr;
+    RecvFunc real_recv = nullptr;
+    RecvfromFunc real_recvfrom = nullptr;
+    RecvmsgFunc real_recvmsg = nullptr;
+    RecvmmsgFunc real_recvmmsg = nullptr;
+
+    void loadFunc();
+};
diff --git a/profiler/msprof_analyze/osrt_trace/src/utils.cpp b/profiler/msprof_analyze/osrt_trace/src/utils.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..82382d23039e63c7ab2d4475d0dcf7fe2aec9fad
--- /dev/null
+++ b/profiler/msprof_analyze/osrt_trace/src/utils.cpp
@@ -0,0 +1,159 @@
+#include "utils.h"
+
+#include <fcntl.h>
+#include <libgen.h>
+#include <unistd.h>
+#include <climits>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <sys/stat.h>
+
+int str_to_i64(const std::string& str, int64_t& num)
+{
+    if (str.empty()) {
+        return -1;
+    }
+    size_t pos = 0;
+    try {
+        num = std::stoll(str, &pos);
+    } catch (...) {
+        return -1;
+    }
+    if (pos != str.size()) {
+        return -1;
+    }
+    return 0;
+}
+
+bool PathUtils::IsFileExist(const std::string &path)
+{
+    if (path.empty() || path.size() > PATH_MAX) {
+        return false;
+    }
+    return (access(path.c_str(), F_OK) == 0) ? true : false;
+}
+
+bool PathUtils::IsFileWritable(const std::string &path)
+{
+    if (path.empty() || path.size() > PATH_MAX) {
+        return false;
+    }
+    return (access(path.c_str(), W_OK) == 0) ? true : false;
+}
+
+bool PathUtils::IsDir(const std::string &path)
+{
+    if (path.empty() || path.size() > PATH_MAX) {
+        return false;
+    }
+    struct stat st{};
+    int ret = lstat(path.c_str(), &st);
+    if (ret != 0) {
+        return false;
+    }
+    return S_ISDIR(st.st_mode) ? true : false;
+}
+
+bool PathUtils::CreateDir(const std::string &path)
+{
+    if (path.empty() || path.size() > PATH_MAX) {
+        return false;
+    }
+    if (IsFileExist(path)) {
+        return IsDir(path) ? true : false;
+    }
+    size_t pos = 0;
+    while ((pos = path.find_first_of('/', pos)) != std::string::npos) {
+        std::string base_dir = path.substr(0, ++pos);
+        if (IsFileExist(base_dir)) {
+            if (IsDir(base_dir)) {
+                continue;
+            } else {
+                return false;
+            }
+        }
+        if (mkdir(base_dir.c_str(), DATA_DIR_AUTHORITY) != 0) {
+            return false;
+        }
+    }
+    return (mkdir(path.c_str(), DATA_DIR_AUTHORITY) == 0) ? true : false;
+}
+
+std::string PathUtils::RealPath(const std::string &path)
+{
+    if (path.empty() || path.size() > PATH_MAX) {
+        return "";
+    }
+    char realPath[PATH_MAX] = {0};
+    if (realpath(path.c_str(), realPath) == nullptr) {
+        return "";
+    }
+    return std::string(realPath);
+}
+
+std::string PathUtils::RelativeToAbsPath(const std::string &path)
+{
+    if (path.empty() || path.size() > PATH_MAX) {
+        return "";
+    }
+    if (path[0] != '/') {
+        char pwd_path[PATH_MAX] = {0};
+        if (getcwd(pwd_path, PATH_MAX) != nullptr) {
+            return std::string(pwd_path) + "/" + path;
+        }
+        return "";
+    }
+    return std::string(path);
+}
+
+std::string PathUtils::DirName(const std::string &path)
+{
+    if (path.empty()) {
+        return "";
+    }
+    char temp_path[PATH_MAX] = {0};
+    strncpy(temp_path, path.c_str(), path.size() < PATH_MAX ? 
path.size() : PATH_MAX); + char* path_c = dirname(temp_path); + return path_c ? std::string(path_c) : ""; +} + +bool PathUtils::CreateFile(const std::string &path) +{ + if (path.empty() || path.size() > PATH_MAX || !CreateDir(DirName(path))) { + return false; + } + int fd = creat(path.c_str(), DATA_FILE_AUTHORITY); + return (fd < 0 || close(fd) != 0) ? false : true; +} + +bool PathUtils::IsSoftLink(const std::string &path) +{ + if (path.empty() || path.size() > PATH_MAX || !IsFileExist(path)) { + return false; + } + struct stat st{}; + if (lstat(path.c_str(), &st) != 0) { + return false; + } + return S_ISLNK(st.st_mode); +} + +bool PathUtils::DirPathCheck(const std::string& abs_path) +{ + if (abs_path.empty() || abs_path.size() > PATH_MAX) { + fprintf(stderr, "[ERROR] The length of Path %s is invalid.\n", abs_path.c_str()); + return false; + } + if (IsSoftLink(abs_path)) { + fprintf(stderr, "[ERROR] Path %s is soft link.\n", abs_path.c_str()); + return false; + } + if (!IsFileExist(abs_path) && !CreateDir(abs_path)) { + fprintf(stderr, "[ERROR] Path %s not exist and create failed.\n", abs_path.c_str()); + return false; + } + if (!IsDir(abs_path) || !IsFileWritable(abs_path)) { + fprintf(stderr, "[ERROR] %s is not a directory or is not writable.\n", abs_path.c_str()); + return false; + } + return true; +} diff --git a/profiler/msprof_analyze/osrt_trace/src/utils.h b/profiler/msprof_analyze/osrt_trace/src/utils.h new file mode 100644 index 0000000000000000000000000000000000000000..129c062d5f2898d0b33db33f4716ae497c6ad8d1 --- /dev/null +++ b/profiler/msprof_analyze/osrt_trace/src/utils.h @@ -0,0 +1,50 @@ +/** + * Copyright 2024 Huawei Technologies Co., Ltd + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+#pragma once
+
+#include <cstdint>
+#include <ctime>
+#include <string>
+#include <sys/types.h>
+
+#define LIKELY(x) (__builtin_expect(!!(x), 1))
+#define UNLIKELY(x) (__builtin_expect(!!(x), 0))
+
+const mode_t DATA_FILE_AUTHORITY = 0640;
+const mode_t DATA_DIR_AUTHORITY = 0750;
+
+inline uint64_t nsec_now()
+{
+    static const uint64_t S_TO_NS = 1000 * 1000 * 1000;
+    struct timespec ts;
+    clock_gettime(CLOCK_REALTIME, &ts);
+    return static_cast<uint64_t>(ts.tv_sec * S_TO_NS + ts.tv_nsec);
+}
+
+int str_to_i64(const std::string& str, int64_t& num);
+
+
+struct PathUtils {
+    static bool IsFileExist(const std::string &path);
+    static bool IsFileWritable(const std::string &path);
+    static bool IsDir(const std::string &path);
+    static bool CreateDir(const std::string &path);
+    static std::string RealPath(const std::string &path);
+    static std::string RelativeToAbsPath(const std::string &path);
+    static std::string DirName(const std::string &path);
+    static bool CreateFile(const std::string &path);
+    static bool IsSoftLink(const std::string &path);
+    static bool DirPathCheck(const std::string &path);
+};
diff --git a/profiler/msprof_analyze/precheck/README.md b/profiler/msprof_analyze/precheck/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..882cc4e12fff0b64c905d6146d9b7a0d98e7bad9
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/README.md
@@ -0,0 +1,393 @@
+# Profiler Precheck User Guide
+
+Welcome to the Profiler Precheck tool! This guide covers the tool's features, how to use it, and how it works internally, so that you can get started quickly and make full use of its performance-analysis capabilities.
+
+## Contents
+- [1. Overview](#1-overview)
+- [2. Architecture](#2-architecture)
+- [3. Usage](#3-usage)
+  - [3.1 Cloud container scenario](#31-cloud-container-scenario)
+  - [3.2 Bare-metal scenario](#32-bare-metal-scenario)
+- [4. Command-line parameters](#4-command-line-parameters)
+- [5. FAQ](#5-faq)
+
+## 1. Overview
+
+Profiler Precheck is a performance pre-check tool for distributed training jobs. It automatically collects hardware and software environment information from every node in the cluster and, drawing on historical data and expert knowledge, analyzes the configuration and resource usage of the current training job. It reports optimization suggestions and warnings that help you find and fix potential performance bottlenecks and improve training efficiency.
+
+## 2. Architecture
+
+Profiler Precheck uses a master/slave architecture consisting of one master node and multiple slave nodes:
+
+- **Master node**:
+  - Receives the user's job request
+  - Distributes the precheck code to the slave nodes
+  - Coordinates the analysis across the nodes
+  - Aggregates the results into the final report
+  - Is usually the host of the rank=0 device in the training cluster
+
+- **Slave nodes**:
+  - Run the user's training script on the local node
+  - Run the Profiler to collect performance metrics
+  - Send the results back to the master node
+
+### Precheck workflow
+1. **Preparation**: the user submits a precheck request on the master node, and the master distributes the code to the slave nodes
+2. **Collection**: each node starts the training script and runs the Profiler to collect performance data
+3. **Aggregation**: the master node gathers the performance data reported by the slave nodes
+4. **Analysis**: the master node analyzes the aggregated data and generates the analysis report
+
+### Typical scenarios
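+
+As a preview, a typical bare-metal run boils down to a single command on the master node (the parameters are explained in Sections 3.2 and 4; the IPs here are illustrative):
+
+```bash
+# Precheck a 2-node x 8-NPU cluster with the built-in ResNet benchmark
+msprof-analyze precheck start_all \
+    --host_ips "192.168.0.1,192.168.0.2" \
+    --master_addr 192.168.0.1 \
+    --nnodes 2 \
+    --nproc_per_node 8 \
+    --profiling_cmd "[resnet]"
+```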
+
+## 3. Usage
+
+### 3.1 Cloud container scenario
+For the detailed flow, see: [cloud container precheck flow](assert/code_structure_startnode_docker.svg)
+
+#### 3.1.1 Deployment
+
+1. **Prepare the base environment**
+```bash
+# Download and load the base image
+docker load -i user_image.tar
+
+# Create the training container
+docker run -it --name user_container \
+    --device=/dev/davinci0 \
+    --device=/dev/davinci_manager \
+    --device=/dev/devmm_svm \
+    --device=/dev/hisi_hdc \
+    -v /usr/local/Ascend:/usr/local/Ascend \
+    -v /path/to/data:/data \
+    -v /path/to/model:/model \
+    user_image:latest
+```
+
+2. **Build the precheck environment**
+```bash
+# Install the precheck tool
+pip install msprof-analyze-xx.whl
+
+# Create the precheck launch script
+cat > /usr/local/bin/run_node_precheck.sh << 'EOF'
+#!/bin/bash
+msprof-analyze precheck start_node \
+    --node_rank ${NODE_RANK:-0} \
+    --master_addr ${MASTER_ADDR:-"127.0.0.1"} \
+    --master_port ${MASTER_PORT:-29500} \
+    --nnodes ${NNODES:-1} \
+    --nproc_per_node ${NPUS_PER_NODE:-8} \
+    --task_name ${TASK_NAME:-"container_test"} \
+    --profiling_cmd ${PROFILING_CMD:-"run.sh"}
+EOF
+chmod +x /usr/local/bin/run_node_precheck.sh
+
+# Save the precheck image
+docker commit user_container precheck_image:latest
+docker save -o precheck_image.tar precheck_image:latest
+```
+
+3. **Distribute and launch**
+```bash
+# Load the image on every node
+docker load -i precheck_image.tar
+
+# Start the master container
+docker run -d --name precheck_master \
+    --network host \
+    --device=/dev/davinci* \
+    -v /usr/local/Ascend:/usr/local/Ascend \
+    -v /path/to/data:/data \
+    -e MASTER_ADDR=192.168.0.1 \
+    -e MASTER_PORT=29500 \
+    -e NNODES=2 \
+    -e NODE_RANK=0 \
+    -e NPUS_PER_NODE=8 \
+    -e TASK_NAME=container_test \
+    -e PROFILING_CMD="run.sh" \
+    precheck_image:latest \
+    /usr/local/bin/run_node_precheck.sh
+
+# Start the worker container
+docker run -d --name precheck_worker \
+    --network host \
+    --device=/dev/davinci* \
+    -v /usr/local/Ascend:/usr/local/Ascend \
+    -v /path/to/data:/data \
+    -e MASTER_ADDR=192.168.0.1 \
+    -e MASTER_PORT=29500 \
+    -e NNODES=2 \
+    -e NODE_RANK=1 \
+    -e NPUS_PER_NODE=8 \
+    -e TASK_NAME=container_test \
+    -e PROFILING_CMD="run.sh" \
+    precheck_image:latest \
+    /usr/local/bin/run_node_precheck.sh
+```
+
+#### 3.1.2 Configuration
+
+##### Container environment variables
+| Variable | Description | Default |
+|--------|------|--------|
+| MASTER_ADDR | Master node IP address | 127.0.0.1 |
+| MASTER_PORT | Master node port | 29500 |
+| NNODES | Total number of nodes | 1 |
+| NODE_RANK | Node rank | 0 |
+| NPUS_PER_NODE | NPUs per node | 8 |
+| TASK_NAME | Precheck task name | container_test |
+| PROFILING_CMD | Training command | run.sh |
+
+##### Container mounts
+| Mount point | Description | Required |
+|--------|------|------|
+| /usr/local/Ascend | CANN toolkit | Yes |
+| /data | Training data directory | No |
+| /model | Model file directory | No |
+| /output | Output directory | No |
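+
+Before leaving the container scenario, it is worth confirming that both containers came up and are making progress; a quick check, using the container names from the steps above:
+
+```bash
+# Both containers should be listed as running
+docker ps --filter "name=precheck_"
+
+# Follow the precheck progress on the master
+docker logs -f precheck_master
+```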
**创建启动脚本** + +run.sh示例如下: +```bash +#!/bin/bash + +# 设置性能数据保存目录 +NODE_PROF_SAVE_DIR=${NODE_PROF_SAVE_DIR:-"./output/prof_data"} +mkdir -p "$NODE_PROF_SAVE_DIR" + +# 启动训练 +python3 train.py \ + --prof_dir "$NODE_PROF_SAVE_DIR" \ + --batch-size 32 \ + --epochs 10 \ + "$@" # 支持传入额外参数 +``` + +3. **启动预检分析** +```bash +# 设置执行权限 +chmod +x run.sh + +# 使用相对路径 +msprof-analyze precheck start_all \ + --host_ips "192.168.0.1,192.168.0.2" \ + --master_addr 192.168.0.1 \ + --nnodes 2 \ + --nproc_per_node 8 \ + --task_name custom_test \ + --profiling_cmd "./run.sh --extra-args value" + +# 使用绝对路径(推荐) +msprof-analyze precheck start_all \ + --host_ips "192.168.0.1,192.168.0.2" \ + --master_addr 192.168.0.1 \ + --nnodes 2 \ + --nproc_per_node 8 \ + --task_name custom_test \ + --profiling_cmd "/path/to/run.sh --extra-args value" +``` + +#### 3.2.3 使用注意事项 + +1. **路径设置** + - 建议使用绝对路径指定脚本位置 + - 确保所有节点的脚本路径一致 + - 检查目录和文件的读写权限 + +2. **环境变量** + - `NODE_PROF_SAVE_DIR`: 性能数据保存目录 + - 可通过 `"$@"` 传递额外的训练参数 + +3. **常见问题** + - 确保 run.sh 有执行权限 + - 验证工作目录的正确性 + - 检查性能数据目录是否可写 + +## 4. 命令参数说明 + +### 基本用法 +```bash +msprof-analyze precheck [options] + +Commands: + start_all 启动所有节点的预检 + start_node 启动单个节点的预检 + stop 停止预检(todo) + status 查看预检状态 (todo) +``` + +### 通用参数 +| 参数名 | 类型 | 必需 | 默认值 | 说明 | +|--------|------|------|-----------------------------------------------------------------------------------------------------------------|------| +| master_addr | str | 是 | - | 主节点IP地址 | +| master_port | int | 否 | 29500 | 主节点通信端口 | +| nnodes | int | 是 | - | 总节点数 | +| nproc_per_node | int | 是 | - | 每节点进程数 | +| task_name | str | 否 | auto_timestamp | 任务名称 | +| output_dir | str | 否 | ./output | 输出目录 | +| node_prof_save_dir | str | 否 | {output_dir}/{task_name}/node_prof_save_dir | 节点性能数据保存目录 | +| master_prof_gather_dir | str | 否 | {output_dir}/{task_name}/master_prof_gather_dir | 主节点数据汇总目录 | +| static | bool | 否 | False | 是否使用静态profiler采集模式 | +| prof_in_shared_storage | bool | 否 | False | 是否使用共享存储(跳过数据收集) | +| profiling_cmd | str | 是 | 训练命令说明:
    - `[resnet]`: 运行内置 ResNet 基准测试
    - `python train.py [args]`: 运行自定义 Python 训练脚本
    - `bash run.sh [args]`: 自定义训练脚本 | 要求用户自定义脚需要将profiler数据保存到{node_prof_save_dir} + +### start_all 专用参数 +| 参数名 | 类型 | 必需 | 说明 | +|--------|------|------|------| +| host_ips | str | 是* | 节点IP列表,逗号分隔 | +| host_config_file | str | 是* | SSH配置文件路径 | + +*注:host_ips 和 host_config_file 必须提供其中之一 + +### start_node 专用参数 +| 参数名 | 类型 | 必需 | 说明 | +|--------|------|------|------| +| node_rank | int | 是 | 当前节点序号(0 到 nnodes-1) | + +## 5. 常见问题 + +### 5.1 容器场景常见问题 + +1. **容器启动失败** +```bash +# 检查设备挂载 +ls -l /dev/davinci* + +# 检查日志 +docker logs precheck_container +``` + +2. **网络连接问题** +```bash +# 检查网络配置 +docker network inspect precheck_net + +# 测试容器间连接 +docker exec precheck_master ping precheck_worker +``` + +### 5.2 裸机场景常见问题 + +1. **SSH连接超时** +```bash +# 增加连接超时时间 +HOST_IPS="192.168.0.1,192.168.0.2" TIMEOUT=10 bash test_hosts_ssh.sh +``` + +2. **环境不一致** +```bash +# 详细检查环境 +HOST_IPS="192.168.0.1,192.168.0.2" CHECK_CANN=1 bash test_hosts_env.sh +``` + +3. **CANN环境问题** +```bash +# 检查CANN工具 +npu-smi info + +# 检查环境变量 +echo $LD_LIBRARY_PATH | grep Ascend +``` \ No newline at end of file diff --git a/profiler/msprof_analyze/precheck/__init__.py b/profiler/msprof_analyze/precheck/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/__main__.py b/profiler/msprof_analyze/precheck/__main__.py new file mode 100644 index 0000000000000000000000000000000000000000..deb0a713199c5629195ed16d54d1ae67c8df3d78 --- /dev/null +++ b/profiler/msprof_analyze/precheck/__main__.py @@ -0,0 +1,98 @@ +import os +from copy import deepcopy +import logging + +from msprof_analyze.precheck.common.constant import Constant +from msprof_analyze.precheck.common.logger import add_file_handler, create_logger +from msprof_analyze.precheck.common.utils import cn_now +from msprof_analyze.precheck.manager.args_manager import PrecheckArgsManager +from msprof_analyze.precheck.tools.ssh_utils import run_remote_command +from msprof_analyze.prof_common.path_manager import PathManager + + +def get_command_tpl(): + cwd = os.getcwd() + from msprof_analyze.precheck.runner.__main__ import get_conda_envs_info + _, conda_activate_cmd = get_conda_envs_info() + + EXECUTOR = f'source ~/.bashrc && {conda_activate_cmd} && cd {cwd} && {Constant.MS_PROF_PRECHECK_CMD} start_node' + ARGS = ('--nnodes={nnodes}', '--nproc_per_node={nproc_per_node}', + '--node_rank={node_rank}', '--master_addr={master_addr}', + '--master_port={master_port}', + '--nproc_per_node={nproc_per_node}', + '--node_prof_save_dir={node_prof_save_dir}', + '--master_prof_gather_dir={master_prof_gather_dir}', + '--task_name={task_name}', + '--profiling_cmd="{profiling_cmd}"', + '--output_dir={output_dir}', + ) + TPL = EXECUTOR + " " + " ".join(ARGS) + return TPL + + +def start_precheck(args: PrecheckArgsManager, logger): + config = dict( + nnodes=args.nnodes, + node_rank=-1, + nproc_per_node=args.nproc_per_node, + master_addr=args.master_addr, + master_port=args.master_port, + node_prof_save_dir=args.node_prof_save_dir, + master_prof_gather_dir=args.master_prof_gather_dir, + static=args.static, + task_name=args.task_name, + python_path=args.python_path, + output_dir=args.output_dir, + profiling_cmd=args.profiling_cmd, + prof_in_shared_storage=args.prof_in_shared_storage, + ) + + hosts_info = [] + for node_id, host in enumerate(args.host_ips): + node_config = deepcopy(config) + node_config['node_rank'] = node_id + + TPL = get_command_tpl() + cmd = TPL.format(**node_config) + if 
node_config.get('static', False) is True: + cmd += ' --static' + if node_config.get('prof_in_shared_storage', False) is True: + cmd += ' --prof_in_shared_storage' + + host_info = { + "host": host, + "username": os.getenv('USER'), + "key_filename": "~/.ssh/id_rsa", + "command": cmd, + "port": 22 + } + + if args.host_config_file: + host_info.update(args.ssh_remote_hosts[host]) + + hosts_info.append(host_info) + + logger.info("Starting remote command execution on %d hosts", len(hosts_info)) + run_remote_command(hosts_info) + logger.info("Precheck main processes have been started on all hosts") + + +def main(args=None): + logger = create_logger("profiler.precheck", Constant.LOGGING_LEVEL, use_memory_handler=True) + + PathManager.make_dir_safety(args.task_output_dir) + + timestamp = cn_now().strftime('%Y%m%d_%H%M%S') + log_filename = f'precheck_{timestamp}.log' + log_file_path = os.path.join(args.task_output_dir, log_filename) + PathManager.create_file_safety(log_file_path) + PathManager.check_path_writeable(log_file_path) + + logger = add_file_handler(logger, log_file_path) + logger.info("Starting precheck, Precheck log file will be saved at %s", log_file_path) + logger.info("Precheck arguments: %s", args) + + try: + start_precheck(args, logger) + except Exception as e: + logger.error("Precheck runner failed with error: %s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) diff --git a/profiler/msprof_analyze/precheck/analyze/__init__.py b/profiler/msprof_analyze/precheck/analyze/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/analyze/advisor_adaptor.py b/profiler/msprof_analyze/precheck/analyze/advisor_adaptor.py new file mode 100644 index 0000000000000000000000000000000000000000..491969e804f2622c4077d9a2abb52b9915b38ca7 --- /dev/null +++ b/profiler/msprof_analyze/precheck/analyze/advisor_adaptor.py @@ -0,0 +1,56 @@ +import sys +import os +import logging +from pathlib import Path + +sys.path.append(os.path.join(os.path.dirname(os.path.dirname(__file__)), "compare_tools")) +sys.path.append(os.path.join(os.path.dirname(os.path.dirname(__file__)), "cluster_analyse")) + +from msprof_analyze.advisor.analyzer.analyzer_controller import AnalyzerController +from msprof_analyze.advisor.interface.interface import Interface +from msprof_analyze.prof_common.path_manager import PathManager + +logger = logging.getLogger(__name__) + + +class advisor_adaptor: + def __init__(self): + pass + + @staticmethod + def _check_profiling_path_valid(profiling_path): + PathManager.input_path_common_check(profiling_path) + PathManager.check_path_owner_consistent(profiling_path) + if not Path(profiling_path).exists(): + logger.error(" Invalid profiling path: %s", profiling_path) + return False + return True + + @staticmethod + def _check_output_path_valid(output_path): + if not output_path: + return False + + if not os.path.exists(output_path): + return PathManager.make_dir_safety(output_path) + + PathManager.check_input_directory_path(output_path) + PathManager.input_path_common_check(output_path) + PathManager.check_path_owner_consistent(output_path) + return True + + def analyze(self, input_profiling_path, output_path): + if self._check_profiling_path_valid(input_profiling_path) and self._check_output_path_valid(output_path): + try: + reduced_dimensions = Interface.all_dimension[:-1] #advisor 默认调用全部功能,此方法不需要compare功能,故对列表进行处理 + AnalyzerController().do_analysis(dimensions=reduced_dimensions, + 
profiling_path=input_profiling_path, + benchmark_profiling_path=None, + output_path=output_path, + ) + except RuntimeError as e: + logger.error("RuntimeError during analysis: %s", e) + except Exception as e: + logger.error("Unexpected error during analysis: %s", e) + else: + logger.error("Invalid paths provided; analysis aborted.") diff --git a/profiler/msprof_analyze/precheck/assert/code_structure_startall.svg b/profiler/msprof_analyze/precheck/assert/code_structure_startall.svg new file mode 100644 index 0000000000000000000000000000000000000000..9502f093c35d4a05eef603a0c9e3089075d6ea5b --- /dev/null +++ b/profiler/msprof_analyze/precheck/assert/code_structure_startall.svg @@ -0,0 +1 @@ +Launch LayerPrecheck Control LayerPrecheck Execution LayerData Collection & Analysis LayerUserUserrun_llama2_precheck.sh/run_precheck.shrun_llama2_precheck.sh/run_precheck.shprecheck_cli.pyprecheck_cli.pyprecheck/_ _main_ _.pyprecheck/_ _main_ _.pySSH RunnerSSH Runnerprecheck_cli.py(start_node)precheck_cli.py(start_node)runner/_ _main_ _.pyrunner/_ _main_ _.pyUser Training ScriptUser Training Scripttrain_with_profiler.pytrain_with_profiler.pyCollectorRunnerCollectorRunnerAdvisorRunnerAdvisorRunnerExecute scriptmsprof-analyze precheck start_allConfiguration:1. Node IPs2. Master node settings3. Distributed parameters4. Output directoriesstart_precheck()run_remote_command()loop[for each host]Execute on remote nodestart_precheck_runner()get_conda_envs_info()Auto-detect conda/python envalt[profiling_cmd == "[resnet]"]Execute example modelInitialize profilerTraining loop1. Load model & dataset2. Configure optimizer3. Execute training steps4. Collect metricsComplete training[profiling_cmd == custom command]Prepare environmentSet distributed env vars:- MASTER_ADDR- MASTER_PORT- NNODES- NODE_RANK- NPROC_PER_NODEExecute via bashExample:torchrun $DISTRIBUTED_ARGS \pretrain_gpt.py \$MODEL_ARGS \$PROFILE_ARGS \...Training completealt[not prof_in_shared_storage]Package profiling datazip_directory()1. Compress profiling data2. Filter by whitelist patterns3. Check archive size limitstransport()1. Transfer to master node2. Handle node rank specific logicCollection completealt[rank == 0]Analyze collected datarun_analyzer()1. Extract archives2. Process ascend_pt files3. Generate reportsAnalysis completeExecution completeNode completeAll nodes completePrecheck completeCommand completeDisplay completion \ No newline at end of file diff --git a/profiler/msprof_analyze/precheck/assert/code_structure_startnode_docker.svg b/profiler/msprof_analyze/precheck/assert/code_structure_startnode_docker.svg new file mode 100644 index 0000000000000000000000000000000000000000..a3bcca97fefddc8b3fa3123452c879a58d074e6c --- /dev/null +++ b/profiler/msprof_analyze/precheck/assert/code_structure_startnode_docker.svg @@ -0,0 +1 @@ +Cloud Platform LayerLaunch LayerPrecheck Execution LayerData Collection & Analysis LayerUserUserCloud PlatformCloud PlatformDocker ContainersDocker Containersrun_node_precheck.shrun_node_precheck.shprecheck_cli.pyprecheck_cli.pyrunner/_ _main_ _.pyrunner/_ _main_ _.pyUser Training ScriptUser Training Scripttrain_with_profiler.pytrain_with_profiler.pyCollectorRunnerCollectorRunnerAdvisorRunnerAdvisorRunnerPlatform Configuration1. Upload Docker image2. Configure cluster settings(nodes, NPUs per node)3. 
Set training parameters(model, dataset, etc.)Container DeploymentDeploy containers across cluster nodesPrecheck Executionloop[For each container in parallel]Execute with env vars(MASTER_ADDR, NODES,NODE_RANK, etc.)msprof-analyze precheck start_nodeInitialize precheck sessionget_conda_envs_info()1. Detect conda environment2. Get activation command3. Setup environment varsalt[profiling_cmd == "[resnet]"]Execute example modelInitialize profilerTraining loop1. Load model & dataset2. Configure optimizer3. Execute training steps4. Collect metricsComplete training[profiling_cmd == custom command]Prepare environmentSet distributed env vars:- MASTER_ADDR- MASTER_PORT- NNODES- NODE_RANK- NPROC_PER_NODEExecute via bashExample:torchrun $DISTRIBUTED_ARGS \custom_training.py \$MODEL_ARGS \$PROFILE_ARGS \...Initialize profilerTraining loop1. Load custom configuration2. Setup distributed env3. Execute training steps4. Collect profiling dataTraining completealt[not prof_in_shared_storage]Package profiling datazip_directory()1. Compress profiling data2. Filter by whitelist patterns3. Check archive size limitstransport()1. Transfer to master node2. Handle node rank specific logicCollection completealt[rank == 0]Analyze collected datarun_analyzer()1. Extract archives2. Process ascend_pt files3. Generate reportsAnalysis completePrecheck completeCommand finishedContainer task completeAll containers finishedResultsReturn profiling resultsand analysis report \ No newline at end of file diff --git a/profiler/msprof_analyze/precheck/collect/__init__.py b/profiler/msprof_analyze/precheck/collect/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/collect/collector.py b/profiler/msprof_analyze/precheck/collect/collector.py new file mode 100644 index 0000000000000000000000000000000000000000..ca74b3e6769106ec23c8e789ca8eb170fbef600a --- /dev/null +++ b/profiler/msprof_analyze/precheck/collect/collector.py @@ -0,0 +1,458 @@ +import sys +import os +from typing import Any, Dict + +sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))) + +import logging +from pathlib import Path +import argparse +import time +import math + +import torch +import torch_npu +import torch.distributed as dist +import numpy as np +import torch.multiprocessing as mp + +from msprof_analyze.prof_common.path_manager import PathManager +from msprof_analyze.precheck.manager.group_manager import GroupManager, EnvGroup, SubGroup +from msprof_analyze.precheck.common.constant import Constant +from msprof_analyze.precheck.common.time_stat import TimeStat +from msprof_analyze.precheck.common.utils import create_npu_event, event_elaspe_second, parition_sub_group_ranks, \ + get_master_rank_collect_dir, get_slave_rank_collect_dir, cat_files, is_equal_file_hash, get_quick_hash, \ + compress_directory +from msprof_analyze.precheck.manager.disk_manager import DiskManager + + +class Collector: + + def __init__(self): + self.stream = None + self.time_stat = None + self.world_size = None + self.device = None + self.local_rank = None + self.rank = None + self.logger = logging.getLogger(__name__) + + def init(self, slave_env: EnvGroup): + self.rank = slave_env.rank + self.local_rank = slave_env.local_rank + torch.npu.set_device(self.local_rank) + self.device = torch.device('npu:%d' % self.local_rank) + self.world_size = slave_env.world_size + self.time_stat = TimeStat() + self.stream = torch_npu.npu.current_stream() 
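+
+    # The methods below implement the file-collection protocol driven by
+    # master_node_run (master ranks) and slave_node_run (slave ranks):
+    #   1. gather_rank_data   - all_gather each rank's (file size, hash) tensor
+    #   2. create_sub_group   - partition ranks into HCCL sub-groups by file size
+    #   3. bd_split_file_size - broadcast the split size within a sub-group
+    #   4. gather_file_split  - gather the file split by split and flush to disk
+    #   5. concat_file_split  - master concatenates the splits and verifies hashes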
+ + def gather_rank_data(self, group, gather_tensor, all_gather=False, dst_rank=None) -> tuple: + cur_group_size = dist.get_world_size(group) + self.logger.debug( + "[Rank %d] Local rank %d, gather data from %d ranks" % (self.rank, self.local_rank, cur_group_size)) + wait_event = create_npu_event(self.stream) + dist.barrier(group=group) + start_event = create_npu_event(self.stream) + wait_time = event_elaspe_second(self.stream, wait_event, start_event) + if all_gather: + gather_list = [] + for _ in range(cur_group_size): + gather_list.append(torch.zeros_like(gather_tensor, dtype=gather_tensor.dtype, device=self.device)) + dist.all_gather(gather_list, gather_tensor, group=group) + else: + if self.rank == dst_rank: + gather_list = [] + for _ in range(cur_group_size): + gather_list.append(torch.zeros_like(gather_tensor, dtype=gather_tensor.dtype, device=self.device)) + else: + gather_list = None + dist.gather(gather_tensor, gather_list=gather_list, dst=dst_rank, group=group) + end_event = create_npu_event(self.stream) + transfer_time = event_elaspe_second(self.stream, start_event, end_event) + + return gather_list, wait_time, transfer_time + + def create_sub_group(self, file_sizes_hash, master_rank_num): + # 需要根据file_sizes来划分sub_group ranks + file_sizes = [item[0] for item in file_sizes_hash[master_rank_num:]] + partitions = parition_sub_group_ranks(master_rank_num, file_sizes) + self.logger.debug("[Rank %d] subgroup partiitons %s" % (self.rank, partitions)) + + wait_time = 0 + transfer_time = 0 + for ranks in partitions: + if len(ranks) > 1: + wait_event = create_npu_event(self.stream) + dist.barrier() + start_event = create_npu_event(self.stream) + wait_time = event_elaspe_second(self.stream, wait_event, start_event) + sub_group = dist.new_group(ranks=ranks, backend='hccl') + end_event = create_npu_event(self.stream) + transfer_time = event_elaspe_second(self.stream, start_event, end_event) + + self.logger.info( + '[Rank %d] after new group, ranks: %s, file_sizes_hash %s' % (self.rank, ranks, file_sizes_hash)) + cur_file_sizes = [file_sizes_hash[r].cpu().tolist()[0] for r in ranks[1:]] + cur_file_hashes = [file_sizes_hash[r].cpu().tolist()[1:] for r in ranks[1:]] + + GroupManager().add_rank_sub_group(sub_group=sub_group, ranks=ranks, file_sizes=cur_file_sizes, + file_hashes=cur_file_hashes) + else: + self.logger.debug('[Rank %d] ranks %s not enough for creating subgroup' % (self.rank, ranks)) + self.time_stat.init_pg_stat.sub_group_init = [wait_time, transfer_time] + + def bd_split_file_size(self, sub_group, split_size=None): + split_size_bd = torch.tensor([split_size], dtype=torch.int64, device=self.device) \ + if self.rank == sub_group.master_rank else torch.zeros(1, dtype=torch.int64, device=self.device) + wait_event = create_npu_event(self.stream) + dist.barrier(group=sub_group.group) + start_event = create_npu_event(self.stream) + wait_time = event_elaspe_second(self.stream, wait_event, start_event) + self.logger.info("[Rank %d] after split size barrier" % self.rank) + dist.broadcast(split_size_bd, group=sub_group.group, src=sub_group.master_rank) + end_event = create_npu_event(self.stream) + transfer_time = event_elaspe_second(self.stream, start_event, end_event) + self.logger.info("[Rank %d] after split size bd, %s" % (self.rank, split_size_bd)) + + self.time_stat.com_stat.broad_splits = [wait_time, transfer_time] + return split_size_bd.cpu().item() + + def gather_file_split(self, sub_group, tensor, master_rank_num, output_file_dir=None): + for i in range(sub_group.max_splits): + # 
is master node + if self.rank < master_rank_num: + cur_tensor = torch.zeros(sub_group.split_file_size, dtype=torch.uint8, device=self.device) + else: + start_time = time.perf_counter() + cur_tensor = tensor[i * sub_group.split_file_size: (i + 1) * sub_group.split_file_size] + if len(cur_tensor) < sub_group.split_file_size: + cur_tensor = np.pad(cur_tensor, (0, sub_group.split_file_size - len(cur_tensor)), 'constant', + constant_values=0) + cur_tensor = torch.tensor(cur_tensor, dtype=torch.uint8, device=self.device) + end_time = time.perf_counter() + self.time_stat.disk_stat.read_input_file_splits.append(end_time - start_time) + + # gather rank data内部有barrier与计时 + file_tensor_list, wait_time, transfer_time = self.gather_rank_data(dst_rank=sub_group.master_rank, + group=sub_group.group, + gather_tensor=cur_tensor) + self.logger.debug("[Rank %d] gather file split %d, wait time: %f, gather time: %f seconds" % ( + self.rank, i, wait_time, transfer_time)) + self.time_stat.com_stat.gather_file_splits.append([wait_time, transfer_time]) + + # 记录从memory_on_chip刷到硬盘中的耗时 + if file_tensor_list: + master_rank_collect_dir = get_master_rank_collect_dir(output_file_dir, self.rank) + memory_on_chip_ram_times = [] + ram_disk_times = [] + for rank_i, rank in enumerate(sub_group.ranks): + if rank != sub_group.master_rank: + group_rank = rank - master_rank_num + rank_dir = get_slave_rank_collect_dir(master_rank_collect_dir, group_rank) + if not os.path.exists(rank_dir): + os.makedirs(rank_dir, exist_ok=True) + rank_file = os.path.join(rank_dir, 'split_%d' % i) + cur_split_size = sub_group.splits[rank_i - 1][i] + if cur_split_size > 0: + start_time = time.perf_counter() + data = file_tensor_list[rank_i][:cur_split_size].cpu().numpy().tobytes() + ram_time = time.perf_counter() + with open(rank_file, 'wb') as f: + f.write(data) + end_time = time.perf_counter() + memory_on_chip_ram_times.append(ram_time - start_time) + ram_disk_times.append(end_time - ram_time) + + self.time_stat.disk_stat.memory_on_chip.append(memory_on_chip_ram_times) + self.time_stat.disk_stat.ram_disk.append(ram_disk_times) + + for tensor in file_tensor_list: + del tensor + del file_tensor_list + torch.npu.empty_cache() + + def concat_file_split(self, output_file_dir: str, sub_group: SubGroup, master_rank_num): + cur_rank_collect_dir = get_master_rank_collect_dir(output_file_dir, self.rank) + concat_times = [] + verify_hash_times = [] + for rank_i, rank in enumerate(sub_group.ranks): + # 只提取slave rank的case + if rank == self.rank: + continue + group_rank = rank - master_rank_num + rank_dir = get_slave_rank_collect_dir(cur_rank_collect_dir, group_rank) + output_file_name = os.path.join(rank_dir, 'merge.zip') + file_split_names = [] + start_time = time.perf_counter() + with open(output_file_name, 'wb') as output_file: + for split_i in range(sub_group.max_splits): + file_split = os.path.join(rank_dir, 'split_%d' % split_i) + if not os.path.exists(file_split): + self.logger.error('[Rank %d] not exist file split %s' % (self.rank, file_split)) + else: + file_split_names.append(file_split) + cat_files(output_file_name, input_files=file_split_names) + for file_split in file_split_names: + os.remove(file_split) + + end_time = time.perf_counter() + concat_times.append(end_time - start_time) + self.logger.debug( + '[Rank %d] concatenate slave rank %s, time: %f seconds' % (self.rank, rank, end_time - start_time)) + + start_time = time.perf_counter() + output_file_hash = get_quick_hash(output_file_name) + self.logger.debug('[Rank %d] rank_i %d, 
file_hashs:%s' % (self.rank, rank_i, sub_group.file_hashes)) + if not is_equal_file_hash(output_file_hash, sub_group.file_hashes[rank_i - 1]): + self.logger.error('[Rank %d] Not equal merge file hash. %s. %s' % ( + self.rank, output_file_hash, sub_group.file_hashes[rank_i - 1])) + end_time = time.perf_counter() + verify_hash_times.append(end_time - start_time) + + self.time_stat.disk_stat.hash_output_file = verify_hash_times + self.time_stat.disk_stat.concat_file = concat_times + + def master_node_run(self, master_env: EnvGroup, output_file_dir, split_file_size=None): + try: + # 设置环境变量,这些会在torch.dist中用到 + # 因为master node rank为0, 所以global rank直接等于local rank + master_env.set_env() + self.init(master_env) + + start_event = create_npu_event(self.stream) + self.logger.info('[Rank %d] Start master node process' % self.rank) + torch.npu.set_device(self.device) + init_process_group_event = create_npu_event(self.stream) + elp_time = event_elaspe_second(self.stream, start_event, init_process_group_event) + self.logger.debug('[Rank %d] init process group time %f seconds' % (self.rank, elp_time)) + self.time_stat.init_pg_stat.global_group_init = elp_time + + self.logger.info("[Rank %d] master node run" % (self.rank)) + # Step 2. Gather tensor size from slave node. + gather_tensor = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=torch.int64, device=self.device) + # 分为 (file_size, file_hash) + dist.init_process_group(backend='hccl', rank=self.rank, world_size=self.world_size) + if not (dist.is_available() and dist.is_initialized()): + raise RuntimeError("Distributed environment is not available") + + file_sizes_hash, wait_time, transfer_time = self.gather_rank_data(group=dist.group.WORLD, + gather_tensor=gather_tensor, + all_gather=True) + self.time_stat.com_stat.gather_file_size = [wait_time, transfer_time] + + self.logger.debug('[Rank %d] gather file size time %f seconds' % (self.rank, transfer_time)) + + # 判断硬盘空间是否足够,解压的过程中需要额外的空间存储临时文件与原压缩包 + file_sizes = [item[0] for item in file_sizes_hash[master_env.local_world_size:]] + total_file_size = sum(file_sizes) + + total_size_gb = Constant.UNZIP_DISK_SIZE_RAIO * total_file_size / (1024 * 1024 * 1024) + + self.logger.debug( + '[Rank %d] collect file sizes %s, total size %fgb' % (self.rank, file_sizes, total_file_size)) + DiskManager.check_disk_space(output_file_dir, total_size_gb) + + # Step 3. 
broadcast子通信域配置,建立子通信域 + self.logger.info("[Rank %d] creating sub group %s" % (self.rank, file_sizes_hash)) + self.create_sub_group(file_sizes_hash, master_env.local_world_size) + sub_group = GroupManager().get_rank_sub_group(self.rank) + + # 以下进入每个子通信域特定的逻辑 + if sub_group: + self.logger.info("[Rank %d] Subgroup ranks %s, file_sizes %s" % ( + self.rank, sub_group.ranks, sub_group.file_sizes)) + + # 未指定split file size的话,根据memory_on_chip/rank_num计算 + if not split_file_size: + if len(sub_group.ranks) > 0: + split_file_size = math.floor(Constant.MASTER_RANK_MEMORY_ON_CHIP / (len(sub_group.ranks))) + else: + logger.error("Value of sub_group.ranks is invalid, %d.", len(sub_group.ranks)) + self.bd_split_file_size(sub_group, split_file_size) + sub_group.split_size(split_file_size) + self.logger.info("[Rank %d] Subgroup split file size %s, splits %s" % ( + self.rank, sub_group.split_file_size, sub_group.splits)) + self.gather_file_split(sub_group=sub_group, tensor=None, master_rank_num=master_env.local_world_size, + output_file_dir=output_file_dir) + self.logger.debug("[Rank %d] start concat file split" % self.rank) + self.concat_file_split(output_file_dir, sub_group, master_env.local_world_size) + if len(sub_group.ranks) > 1: + self.logger.info(self.time_stat.to_string()) + else: + self.logger.info("[Rank %d] master rank not in sub group" % self.rank) + dist.barrier() + except Exception as e: + self.logger.error("%s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + raise e + finally: + dist.destroy_process_group() + + def slave_node_run(self, slave_env: EnvGroup, input_file_dir, master_rank_num): + try: + self.logger.debug('Enter slave node run wrapper') + # 设置环境变量,这些会在torch.dist中用到 + slave_env.set_env() + self.init(slave_env) + torch.npu.set_device(self.device) + start_event = create_npu_event(self.stream) + init_process_group_event = create_npu_event(self.stream) + elp_time = event_elaspe_second(self.stream, start_event, init_process_group_event) + self.time_stat.init_pg_stat.global_group_init = elp_time + + self.logger.debug('[Rank %d] init process group time %f seconds' % (self.rank, elp_time)) + self.logger.info('[Rank %d] Start slave node process' % self.rank) + + # Step2. 
先压缩文件,统计文件大小,再进入到gather逻辑里 + if os.path.isfile(input_file_dir): + file_path = input_file_dir + else: + PathManager.check_path_writeable(input_file_dir) + file_path = os.path.join(str(Path(input_file_dir).parent), 'compress.tar') + start_time = time.perf_counter() + compress_directory(input_file_dir, file_path) + end_time = time.perf_counter() + self.time_stat.disk_stat.compress_input_file = end_time - start_time + self.logger.info("[Rank %d] Compress directory time: %f seconds" % (self.rank, end_time - start_time)) + file_size = os.path.getsize(file_path) + start_time = time.perf_counter() + file_hash_chunks = get_quick_hash(file_path) + end_time = time.perf_counter() + self.time_stat.disk_stat.hash_input_file = end_time - start_time + self.logger.info("[Rank %d] Hash input file time: %f seconds" % (self.rank, end_time - start_time)) + file_hash_chunks.insert(0, file_size) + self.logger.info( + "[Rank %d] File hash chunks (first element is file size): %s" % (self.rank, file_hash_chunks)) + gather_tensor = torch.tensor(file_hash_chunks, dtype=torch.int64, device=self.device) + + dist.init_process_group(backend='hccl', rank=self.rank, world_size=self.world_size) + if not (dist.is_available() and dist.is_initialized()): + raise RuntimeError("Distributed environment is not available") + + file_sizes_hash, wait_time, transfer_time = self.gather_rank_data(group=dist.group.WORLD, + gather_tensor=gather_tensor, + all_gather=True) + self.time_stat.com_stat.gather_file_size = [wait_time, transfer_time] + self.logger.info("[Rank %d] Gather file size - wait time: %f seconds, transfer time: %f seconds" % ( + self.rank, wait_time, transfer_time)) + # Step3. 建立子通信域 + self.logger.debug("[Rank %d] creating sub group %s" % (self.rank, file_sizes_hash)) + self.create_sub_group(file_sizes_hash, master_rank_num) + sub_group = GroupManager().get_rank_sub_group(self.rank) + + # 进入每个子通信域特定的逻辑 + if sub_group: + # Step4. 
broacast split size大小 + self.logger.info("[Rank %d] Subgroup ranks %s, file_sizes %s" % ( + self.rank, sub_group.ranks, sub_group.file_sizes)) + split_file_size = self.bd_split_file_size(sub_group) + sub_group.split_size(split_file_size) + file_tensor = np.memmap(file_path, dtype=np.uint8, mode='r') + self.gather_file_split(sub_group=sub_group, tensor=file_tensor, master_rank_num=master_rank_num) + self.logger.info(self.time_stat.to_string()) + else: + self.logger.warning("[Rank %d] slave rank not in sub group" % (self.rank)) + dist.barrier() + except Exception as e: + self.logger.error("%s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + raise e + finally: + dist.destroy_process_group() + + def run(self, args_dict: Dict[str, Any]): + input_file_dir = args_dict.get("input_file_dir") + output_file_dir = args_dict.get("output_file_dir") + nnodes = args_dict.get("nnodes") + node_rank = args_dict.get("node_rank") + master_addr = args_dict.get("master_addr") + master_port = args_dict.get("master_port") + master_rank_num = args_dict.get("master_rank_num") + split_file_size = args_dict.get("split_file_size") + time_out = args_dict.get("time_out") + log_file = args_dict.get("log_file") + + logging.basicConfig( + filename=log_file, # File to write logs to + level=logging.DEBUG, # Minimum logging level to write to the file + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' # Log message format + ) + self.logger.info({"message": "Run method arguments", + "class": self.__class__.__name__, + "method": sys._getframe().f_code.co_name, + "args": args_dict}) + + # 计算calculate world size + world_size = nnodes + master_rank_num - 1 + # master node的逻辑 + if node_rank == 0: + processes = [] + for i in range(master_rank_num): + master_env = EnvGroup(rank=i, local_rank=i, world_size=world_size, master_addr=master_addr, + master_port=master_port, group_rank=0, local_world_size=master_rank_num) + process = mp.Process(target=self.master_node_run, args=(master_env, output_file_dir, split_file_size)) + self.logger.info("Start master node subprocess %d." % i) + process.start() + processes.append(process) + start_time = time.perf_counter() + try: + while True: + all_done = all(not process.is_alive() for process in processes) + if all_done: + self.logger.info("All subprocesses finished successfully.") + break + elapsed_time = time.perf_counter() - start_time + time.sleep(5) + if elapsed_time > time_out: + raise TimeoutError("Timeout reached. 
Terminating all subprocesses.") + + except TimeoutError as e: + self.logger.error("%s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + for process in processes: + if process.is_alive(): + process.terminate() + process.join() + finally: + # 确保Ensure all processes are cleaned up + for process in processes: + process.join() + # slave node的逻辑 + else: + rank = node_rank + master_rank_num - 1 + slave_env = EnvGroup(rank=rank, local_rank=0, world_size=world_size, master_addr=master_addr, + master_port=master_port, group_rank=node_rank, local_world_size=1) + self.slave_node_run(slave_env, input_file_dir, master_rank_num) + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--input_file_dir", type=str, help='input profiling data dir') + parser.add_argument("--output_file_dir", type=str, help='input profiling data dir') + parser.add_argument("--nnodes", type=int, help='the total node number') + parser.add_argument("--node_rank", type=int, help='node rank in the cluster') + parser.add_argument("--master_addr", type=str, help='master address') + parser.add_argument("--master_port", type=int, default=29501, help='master port') + parser.add_argument("--master_rank_num", type=int, default=8, help='master rank nums') + + parser.add_argument("--split_file_size", type=int, default=None, help='split file size') + + # master node整体time out的时间 + parser.add_argument("--time_out", type=int, default=Constant.DEFAULT_TIME_OUT, + help='totoal process time out in seconds') + parser.add_argument("--log_file", type=str, default=None, help='logging file') + args = parser.parse_args() + + logging.basicConfig( + filename=args.log_file, # File to write logs to + level=logging.DEBUG, # Minimum logging level to write to the file + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' # Log message format + ) + logger = logging.getLogger(__name__) + + collector = Collector() + logger.debug(vars(args)) + args_dict = vars(args) + + try: + collector.run(args_dict) + except Exception as e: + logger.error("%s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + raise e diff --git a/profiler/msprof_analyze/precheck/common/__init__.py b/profiler/msprof_analyze/precheck/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/common/constant.py b/profiler/msprof_analyze/precheck/common/constant.py new file mode 100644 index 0000000000000000000000000000000000000000..1fc724e7524917c4c5abb21b83e26e573d0f956b --- /dev/null +++ b/profiler/msprof_analyze/precheck/common/constant.py @@ -0,0 +1,45 @@ +import logging +import os +import stat +from datetime import timezone, timedelta + + +class Constant: + DEFAULT_SPLIT_FILE_SIZE = 15 * 1024 # 便于测试多文件split,默认split size设为15k + MASTER_RANK_MEMORY_ON_CHIP = 10 * 1024 * 1024 * 1024 # 10GB 片上内存可用显存来传输数据 + UNZIP_DISK_SIZE_RAIO = 1.0 # 需要x倍压缩文件的空间进行解压操作 + DEFAULT_TIME_OUT = 1200 + + ARG_MAX_LEN = 255 # 参数最大长度 + ARG_MIN_INT_VALUE = - (1 << 31) # 32位整数最小值 + ARG_MAX_INT_VALUE = (1 << 31) - 1 # 32位整数最大值 + ARG_MIN_PORT_VALUE = 0 + ARG_MAX_PORT_VALUE = 65535 + + PROFILER_FILE_PATTERNS = [r'profiler_metadata\.json', r'profiler_info_\d{1,10}\.json', r'ASCEND_PROFILER_OUTPUT/.*'] + + COLLECTOR_MASTER_RANK_NUM = 4 + COLLECTOR_DEFAULT_TIMEOUT = 1200 # seconds + COLLECTOR_SPLIT_FILE_SIZE = None # 文件传输的split块大小,默认split size设为根据显存自动计算 + LOCALHOST_ADDRESSES = {'localhost', '127.0.0.1'} + + MAX_ARCHIVE_SIZE = 20 * 1024 * 1024 * 1024 # 20 GB + 
MAX_ARCHIVE_FILE_COUNT = 10000 + MAX_ARCHIVE_RATIO = 10 + + DEFAULT_PROFILING_COMMANDS = { + "[resnet]": "resnet", + } + + MS_PROF_PRECHECK_CMD = "msprof-analyze precheck" + + ENABLE_STACKTRACE_LOGGING = False + LOGGING_LEVEL = logging.INFO + + +class TimeConstant: + """Time related constants""" + UTC = timezone.utc + CHINA_OFFSET = timedelta(hours=8) + CHINA_TIMEZONE = timezone(CHINA_OFFSET, name='Asia/Shanghai') + MS_TO_S = 1 / 1000 # Milliseconds to seconds conversion factor diff --git a/profiler/msprof_analyze/precheck/common/logger.py b/profiler/msprof_analyze/precheck/common/logger.py new file mode 100644 index 0000000000000000000000000000000000000000..04346a80343098491e7610610730f9ea52fded8a --- /dev/null +++ b/profiler/msprof_analyze/precheck/common/logger.py @@ -0,0 +1,103 @@ +import logging +import logging.handlers + + +def create_logger(name: str, level: int = logging.DEBUG, use_memory_handler: bool = True) -> logging.Logger: + """ + Create a logger with optional memory handler for buffering logs. + + Args: + name: The name of the logger. recommend to use the module name: __name__. + level: The logging level, default is DEBUG. + use_memory_handler: Whether to add a memory handler for buffering logs, default is True. + + Returns: + A configured logger instance. + + Examples: + # Create a logger with memory handler + logger = create_logger("my_logger", logging.INFO, use_memory_handler=True) + + # Create a logger without memory handler + logger = create_logger("my_logger", logging.INFO, use_memory_handler=False) + + Notes: + When use_memory_handler is True, a memory handler is added to buffer logs until a specific log level + (default is ERROR) is reached, then logs are flushed to the target handler. This can avoid frequent + file writes and improve performance. Buffered logs can be manually flushed by calling logger.handlers[1].flush() + if no file handler is created yet. + + When use_memory_handler is False, no memory handler is added, and logs are written to the target handler + (e.g., console or file) in real-time. + """ + logger = logging.getLogger(name) + logger.handlers.clear() + + logger.setLevel(level) + logger.propagate = False + + console_handler = logging.StreamHandler() + console_handler.setLevel(level) + formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') + console_handler.setFormatter(formatter) + logger.addHandler(console_handler) + + if use_memory_handler: + memory_handler = logging.handlers.MemoryHandler(capacity=1000, flushLevel=logging.ERROR) + memory_handler.setLevel(level) + memory_handler.setFormatter(formatter) + logger.addHandler(memory_handler) + + return logger + + +def add_file_handler(logger: logging.Logger, log_file: str) -> logging.Logger: + """ + Add a file handler to an existing logger and handle the memory handler if present. + + Args: + logger: An existing logger instance. + log_file: The path to the log file. + + Returns: + The updated logger instance. + + Example: + # Initialize a logger + logger = create_logger("my_logger", logging.DEBUG, use_memory_handler=True) + + # Add a file handler to the logger + logger = add_file_handler(logger, "output.log") + + Notes: + This function adds a file handler to the given logger, inheriting the log level from the logger. + If a memory handler was previously added to the logger, its target handler is set to the new file handler, + buffered logs are flushed to the file, and then the memory handler is removed. 
+ This ensures that both buffered logs and subsequent logs are written to the file after using the file handler. + """ + file_handler = logging.FileHandler(log_file) + file_handler.setLevel(logger.level) + formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') + file_handler.setFormatter(formatter) + logger.addHandler(file_handler) + + for handler in logger.handlers: + if isinstance(handler, logging.handlers.MemoryHandler): + handler.setTarget(file_handler) + handler.flush() + logger.removeHandler(handler) + + return logger + + +if __name__ == "__main__": + logger = create_logger("test_logger", logging.DEBUG, use_memory_handler=True) + logger.info("This is an info message from initial logger with memory handler") + + import tempfile + + with tempfile.NamedTemporaryFile(mode='w', delete=False) as temp_file: + temp_file_path = temp_file.name + add_file_handler(logger, temp_file_path) + logger.info("This is an info message from logger with file handler") + logger.info("The log file is {}".format(temp_file_path)) diff --git a/profiler/msprof_analyze/precheck/common/singleton.py b/profiler/msprof_analyze/precheck/common/singleton.py new file mode 100644 index 0000000000000000000000000000000000000000..b645f284d642d3ba84c6e0cde374d865f16b7105 --- /dev/null +++ b/profiler/msprof_analyze/precheck/common/singleton.py @@ -0,0 +1,9 @@ +def singleton(cls: any) -> any: + _instance = {} + + def _singleton(*args: any, **kw: any) -> any: + if cls not in _instance: + _instance[cls] = cls(*args, **kw) + return _instance.get(cls) + + return _singleton diff --git a/profiler/msprof_analyze/precheck/common/time_stat.py b/profiler/msprof_analyze/precheck/common/time_stat.py new file mode 100644 index 0000000000000000000000000000000000000000..69df61cb797310adb4262c1261eec651770c08d8 --- /dev/null +++ b/profiler/msprof_analyze/precheck/common/time_stat.py @@ -0,0 +1,74 @@ +from dataclasses import dataclass, field +from typing import List + +@dataclass +class InitProcessGroupStat: + global_group_init: float = None + sub_group_init: List[float] = field(default_factory=list) #wait time, transfer time + def sum_transfer_time(self): + return self.global_group_init + self.sub_group_init[1] + + def to_str_list(self): + str_list = ['[InitPGStat]:'] + str_list.append(' global group init: %f seconds:' % self.global_group_init) + str_list.append(' sub group init: %f seconds:' % self.sub_group_init[1]) + return str_list + +@dataclass +class ComStat: + gather_file_size: List[float] = field(default_factory=list) + broad_splits: List[float] = field(default_factory=list) + gather_file_splits: List[List[float]] = field(default_factory=list) + def sum_transfer_time(self): + return self.gather_file_size[1] + self.broad_splits[1] + + def to_str_list(self): + str_list = ['[ComStat]:'] + str_list.append(' gather file size: %f seconds:' % self.gather_file_size[1]) + str_list.append(' broad splits: %f seconds:' % self.broad_splits[1]) + file_split_times = [t[1] for t in self.gather_file_splits] + str_list.append(' gather file splits: %s seconds:' % file_split_times) + return str_list + +@dataclass +class DiskStat: + memory_on_chip: List[List[float]] = field(default_factory=list) + ram_disk: List[List[float]] = field(default_factory=list) + + concat_file: List[float] = field(default_factory=list) + hash_output_file: List[float] = field(default_factory=list) + + read_input_file_splits: List[float] = field(default_factory=list) + hash_input_file: float = None + + def to_str_list(self): + str_list = 
['[DiskStat]:'] + if len(self.memory_on_chip) > 0: + for memory_on_chip, ram_disk in zip(self.memory_on_chip, self.ram_disk): + str_list.append(' File Split: ') + str_list.append(' hdm_ram time: %s' % memory_on_chip) + str_list.append(' ram_disk time: %s' % ram_disk) + str_list.append(' concat file time for slave ranks: %s' % self.concat_file) + str_list.append(' verify file hash time for slave ranks: %s' % self.hash_output_file) + + #slave node + else: + str_list.append(' hash file time: %s' % self.hash_input_file) + str_list.append(' read file split times: %s' % self.read_input_file_splits) + + return str_list + + +@dataclass +class TimeStat: + init_pg_stat: InitProcessGroupStat = field(default_factory=InitProcessGroupStat) + com_stat: ComStat = field(default_factory=ComStat) + disk_stat: DiskStat = field(default_factory=DiskStat) + + #print it for logging, 应当区分master node rank与slave node。 + def to_string(self): + str_list = ['[TimeStat]:'] + str_list.extend(' %s' %s for s in self.init_pg_stat.to_str_list()) + str_list.extend(' %s' %s for s in self.com_stat.to_str_list()) + str_list.extend(' %s' %s for s in self.disk_stat.to_str_list()) + return '\n'.join(str_list) diff --git a/profiler/msprof_analyze/precheck/common/utils.py b/profiler/msprof_analyze/precheck/common/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..03f4e7d7ee31032067298351c25f426505cfc2ff --- /dev/null +++ b/profiler/msprof_analyze/precheck/common/utils.py @@ -0,0 +1,193 @@ +import os +import sys +import hashlib +import subprocess +import logging +from datetime import datetime + +import torch_npu +from msprof_analyze.precheck.common.constant import TimeConstant +from msprof_analyze.prof_common.path_manager import PathManager + +logger = logging.getLogger(__name__) + + +def get_file_md5(filepath, chunk_size=4096, split_hash_size=4): + PathManager.check_input_file_path(filepath) + PathManager.check_path_readable(filepath) + md5_hash = hashlib.md5() + with open(filepath, "rb") as file: + for chunk in iter(lambda: file.read(chunk_size), b""): + md5_hash.update(chunk) + hash_bytes = int(md5_hash.hexdigest(), 16).to_bytes(16, 'big') + + chunks = [] + for i in range(0, 16, split_hash_size): + chunks.append(int.from_bytes(hash_bytes[i:i + split_hash_size], 'big')) + return chunks + + +def get_quick_hash(file_path, sample_size=65536, hash_spilt_size=4): + PathManager.check_input_file_path(file_path) + PathManager.check_path_readable(file_path) + file_size = os.path.getsize(file_path) + if file_size < sample_size * 5: + return get_file_md5(file_path) + hash_md5 = hashlib.md5() + with open(file_path, "rb") as f: + hash_md5.update(f.read(sample_size)) + f.seek(max(0, (os.path.getsize(file_path) // 2) - (sample_size // 2))) + hash_md5.update(f.read(sample_size)) + f.seek(-sample_size, 2) + hash_md5.update(f.read(sample_size)) + hash_bytes = int(hash_md5.hexdigest(), 16).to_bytes(16, 'big') + + chunks = [] + for i in range(0, 16, hash_spilt_size): + chunks.append(int.from_bytes(hash_bytes[i:i + hash_spilt_size], 'big')) + return chunks + + +def is_equal_file_hash(chunks1, chunks2): + for chunk1, chunk2 in zip(chunks1, chunks2): + if chunk1 != chunk2: + return False + return True + + +def cat_files(output_file, input_files): + """ + Concatenate multiple binary input files into a single output file using cat command. 
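+
+    The output file is truncated before writing, and the inputs are
+    concatenated in the order given in input_files.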
+ + Args: + output_file (str): Path to the output file + input_files (list): List of input file paths to concatenate + + Returns: + bool: True if concatenation was successful + + Raises: + subprocess.CalledProcessError: If the cat command fails + """ + PathManager.check_input_file_path(output_file) + cmd = ["cat"] + list(input_files) + + try: + with open(output_file, 'wb') as outfile: + result = subprocess.run(cmd, stdout=outfile, stderr=subprocess.PIPE) + + if result.returncode == 0: + return True + else: + logger.error("Error occurred during concatenation: %s", + result.stderr.decode('utf-8', errors='replace')) + raise subprocess.CalledProcessError(result.returncode, cmd, + output=None, + stderr=result.stderr) + + except OSError as e: + logger.error("OS error occurred during file operation: %s", str(e)) + raise + + +def compress_directory(src_dir, output_file): + PathManager.check_input_directory_path(src_dir) + PathManager.check_path_readable(src_dir) + if not os.path.isdir(src_dir): + raise FileNotFoundError(f"The directory '{src_dir}' does not exist.") + try: + result = subprocess.run( + ["/bin/tar", "-czf", output_file, "-C", src_dir, "."], + check=True, # Raise an error if the command fails + stdout=subprocess.PIPE, + stderr=subprocess.PIPE + ) + except subprocess.CalledProcessError as e: + raise RuntimeError( + f"Failed to compress directory '{src_dir}' into '{output_file}'. " + f"Error: {e.stderr.decode('utf-8')}" + ) from e + + +def get_master_rank_collect_dir(output_file_dir, master_rank_i): + return os.path.join(output_file_dir, 'rank_%d_collect' % master_rank_i) + + +def get_slave_rank_collect_dir(master_rank_collect_dir, group_rank): + return os.path.join(master_rank_collect_dir, 'node_%d' % group_rank) + + +def parition_sub_group_ranks(master_rank_num, file_sizes): + master_rank_num = int(master_rank_num) + indexed_lst = sorted(enumerate(file_sizes), key=lambda x: x[1]) + sorted_indices = [index + master_rank_num for index, value in indexed_lst] + if master_rank_num != 0: + base_size = len(file_sizes) // master_rank_num + else: + logging.error("%s value can not be 0", master_rank_num) + extra_items = len(file_sizes) % master_rank_num + partitions = [] + start = 0 + for i in range(master_rank_num): + end = start + base_size + (1 if i < extra_items else 0) + partition_indices = [i] + partition_indices.extend(sorted_indices[start:end]) + partitions.append(partition_indices) + start = end + return partitions + + +def get_split_file_size(memory_on_chip_size, sub_group_rank_num): + if sub_group_rank_num != 0: + return memory_on_chip_size // sub_group_rank_num + else: + logging.error("%s value can not be 0", sub_group_rank_num) + return None + + +def create_npu_event(stream): + event = torch_npu.npu.Event(enable_timing=True) + stream.record_event(event) + return event + + +def event_elaspe_second(stream, event1, event2): + stream.synchronize() + return event1.elapsed_time(event2) * TimeConstant.MS_TO_S + + +def cn_now() -> datetime: + """ + Get current time in China timezone as a formatted string. + + Returns: + datetime: Current time in China timezone + """ + return datetime.now(tz=TimeConstant.UTC).astimezone(TimeConstant.CHINA_TIMEZONE) + + +def check_file_owner_and_permission(file_path): + """ + Check if the file belongs to current user and only owner has write permission. 
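+
+    Note that a wrong permission mode is repaired rather than rejected: the
+    file is chmod-ed to 0o700 and a warning is logged.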
+ + Args: + file_path: Path to the file to check + + Raises: + RuntimeError: If file not found, not owned by current user, or has wrong permissions + """ + PathManager.check_path_readable(file_path) + + if not os.path.isfile(file_path): + raise RuntimeError(f"File not found at {file_path}") + + # Check file owner + if os.stat(file_path).st_uid != os.getuid(): + raise RuntimeError(f"File {file_path} is not owned by current user") + + # Check file permissions (only owner should have write permission) + current_mode = os.stat(file_path).st_mode + desired_mode = 0o700 # rwx------ (only owner has read/write/execute) + if (current_mode & 0o777) != desired_mode: + os.chmod(file_path, desired_mode) + logger.warning("File %s has wrong permissions, has been changed to %o", file_path, desired_mode) diff --git a/profiler/msprof_analyze/precheck/distributed_cluster/__init__.py b/profiler/msprof_analyze/precheck/distributed_cluster/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b14094e3f9a77a0970342980ed8de1017f58ce19 --- /dev/null +++ b/profiler/msprof_analyze/precheck/distributed_cluster/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. \ No newline at end of file diff --git a/profiler/msprof_analyze/precheck/distributed_cluster/distributed_cluster_base.py b/profiler/msprof_analyze/precheck/distributed_cluster/distributed_cluster_base.py new file mode 100644 index 0000000000000000000000000000000000000000..7ccd1e542eee2050542a08df62e1720a9cdf4dcb --- /dev/null +++ b/profiler/msprof_analyze/precheck/distributed_cluster/distributed_cluster_base.py @@ -0,0 +1,19 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +class DistributedClusterBase: + def __init__(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/__init__.py b/profiler/msprof_analyze/precheck/env_check/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b14094e3f9a77a0970342980ed8de1017f58ce19 --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. \ No newline at end of file diff --git a/profiler/msprof_analyze/precheck/env_check/check_item_factory.py b/profiler/msprof_analyze/precheck/env_check/check_item_factory.py new file mode 100644 index 0000000000000000000000000000000000000000..0ea14bfe0d37768828291a1e9c71a1b890c7bd0d --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/check_item_factory.py @@ -0,0 +1,57 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from msprof_analyze.precheck.env_check.environment_variable_check import EnvironmentVariableCheck +from msprof_analyze.precheck.env_check.python_library_check import PythonLibraryCheck +from msprof_analyze.precheck.env_check.cpu_check import CPUCheck +from msprof_analyze.precheck.env_check.npu_check import NPUCheck +from msprof_analyze.precheck.env_check.communication_check import CommunicationCheck +from msprof_analyze.precheck.env_check.io_check import IOCheck + + +HARDWARE_CHECK_LIST = [ + CPUCheck, + NPUCheck, + CommunicationCheck, + IOCheck, +] + +SOFTWARE_CHECK_LIST = [ + EnvironmentVariableCheck, + PythonLibraryCheck, +] + + +class CheckItemFactory: + CHECK_ITEMS = { + check_item.CHECK_TYPE: check_item + for check_item in SOFTWARE_CHECK_LIST + HARDWARE_CHECK_LIST + } + + @staticmethod + def get_check_item(check_type: str) -> list: + if check_type == "all": + return SOFTWARE_CHECK_LIST + HARDWARE_CHECK_LIST + if check_type == "software": + return SOFTWARE_CHECK_LIST + if check_type == "hardware": + return HARDWARE_CHECK_LIST + check_type_list = check_type.split("|") + check_items = [] + for check_type in check_type_list: + check_item = CheckItemFactory.CHECK_ITEMS.get(check_type) + if not check_item: + continue + check_items.append(check_item) + return check_items diff --git a/dynolog_npu/plugin/setup.py b/profiler/msprof_analyze/precheck/env_check/communication_check.py similarity index 42% rename from dynolog_npu/plugin/setup.py rename to profiler/msprof_analyze/precheck/env_check/communication_check.py index 151b9b3fb3fa1a42e147685f632163c8b3f5a564..807d4008115422ff312d2877495273bc25312eea 100644 --- a/dynolog_npu/plugin/setup.py +++ b/profiler/msprof_analyze/precheck/env_check/communication_check.py @@ -1,42 +1,25 @@ -# Copyright (c) 2025, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import os -from setuptools import setup -from pybind11.setup_helpers import Pybind11Extension - -BASE_DIR = os.path.dirname(os.path.realpath(__file__)) - -# Define the extension module -ext_modules = [ - Pybind11Extension( - "IPCMonitor", # Name of the Python module - sources=["bindings.cpp", - "ipc_monitor/utils.cpp", - "ipc_monitor/DynoLogNpuMonitor.cpp", - "ipc_monitor/NpuIpcClient.cpp", - ], # Source files - include_dirs=[os.path.join(BASE_DIR, "ipc_monitor")], # Include Pybind11 headers - language="c++", # Specify the language - ), -] - -# Set up the package -setup( - name="dynolog_npu_plugin", - version="0.1", - description="dynolog npu plugins", - ext_modules=ext_modules, - install_requires=["pybind11"], -) \ No newline at end of file +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from msprof_analyze.precheck.env_check.environment_check import HardwareCheck + + +class CommunicationCheck(HardwareCheck): + CHECK_TYPE = "communication" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/cpu_check.py b/profiler/msprof_analyze/precheck/env_check/cpu_check.py new file mode 100644 index 0000000000000000000000000000000000000000..e3765c71ebc3a9ee4700fe59cfc84727c68c3417 --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/cpu_check.py @@ -0,0 +1,25 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
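+# These check classes are dispatched via CheckItemFactory (check_item_factory.py).
+# A hypothetical driver loop, assuming the factory API shown there, could be:
+#
+#     for check_cls in CheckItemFactory.get_check_item("cpu|npu"):
+#         checker = check_cls(output="./output")  # kwargs read by EnvironmentCheck
+#         checker.init()
+#         checker.check()
+#         checker.uninit()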
+from msprof_analyze.precheck.env_check.environment_check import HardwareCheck + + +class CPUCheck(HardwareCheck): + CHECK_TYPE = "cpu" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/environment_check.py b/profiler/msprof_analyze/precheck/env_check/environment_check.py new file mode 100644 index 0000000000000000000000000000000000000000..98d54ac506400ec53348cdaeea613d555dc81290 --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/environment_check.py @@ -0,0 +1,50 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from abc import ABC, abstractmethod + + +class EnvironmentCheck(ABC): + CHECK_TYPE = "" + + def __init__(self, **kwargs): + self.output = kwargs.get("output", "./output") + + def init(self): + pass + + def uninit(self): + pass + + @abstractmethod + def check(self): + pass + + +class HardwareCheck(EnvironmentCheck): + def __init__(self, **kwargs): + super().__init__(**kwargs) + + @abstractmethod + def check(self): + pass + + +class SoftwareCheck(EnvironmentCheck): + def __init__(self, **kwargs): + super().__init__(**kwargs) + + @abstractmethod + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/environment_variable_check.py b/profiler/msprof_analyze/precheck/env_check/environment_variable_check.py new file mode 100644 index 0000000000000000000000000000000000000000..58d2becb23266ff085b80d2acd9c17a229e8420d --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/environment_variable_check.py @@ -0,0 +1,25 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from msprof_analyze.precheck.env_check.environment_check import SoftwareCheck + + +class EnvironmentVariableCheck(SoftwareCheck): + CHECK_TYPE = "env_variable" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/io_check.py b/profiler/msprof_analyze/precheck/env_check/io_check.py new file mode 100644 index 0000000000000000000000000000000000000000..5cfd5c425f0d18d7021c8ef8dca7447c9df6dfc6 --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/io_check.py @@ -0,0 +1,25 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from msprof_analyze.precheck.env_check.environment_check import HardwareCheck + + +class IOCheck(HardwareCheck): + CHECK_TYPE = "io" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/npu_check.py b/profiler/msprof_analyze/precheck/env_check/npu_check.py new file mode 100644 index 0000000000000000000000000000000000000000..c7ffa4997da7a75f566461c70af22393e9b97fb1 --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/npu_check.py @@ -0,0 +1,25 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +from msprof_analyze.precheck.env_check.environment_check import HardwareCheck + + +class NPUCheck(HardwareCheck): + CHECK_TYPE = "npu" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/env_check/python_library_check.py b/profiler/msprof_analyze/precheck/env_check/python_library_check.py new file mode 100644 index 0000000000000000000000000000000000000000..81de7000ce7cdf37c1a6c52ff6d650df95d86d9b --- /dev/null +++ b/profiler/msprof_analyze/precheck/env_check/python_library_check.py @@ -0,0 +1,25 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
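+# The shipped checks are still stubs (check() is a no-op). A concrete
+# SoftwareCheck needs only a unique CHECK_TYPE plus a check() body; a
+# hypothetical sketch (TorchImportCheck is illustrative, not part of this
+# patch):
+#
+#     class TorchImportCheck(SoftwareCheck):
+#         CHECK_TYPE = "torch_import"
+#
+#         def check(self):
+#             try:
+#                 import torch  # noqa: F401
+#             except ImportError as err:
+#                 raise RuntimeError("torch is not importable") from err
+#
+# Registering the class in SOFTWARE_CHECK_LIST (check_item_factory.py) makes
+# its CHECK_TYPE selectable via Precheck.env_precheck.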
+from msprof_analyze.precheck.env_check.environment_check import SoftwareCheck + + +class PythonLibraryCheck(SoftwareCheck): + CHECK_TYPE = "python_lib" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def check(self): + pass diff --git a/profiler/msprof_analyze/precheck/examples/__init__.py b/profiler/msprof_analyze/precheck/examples/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/examples/profiler/__init__.py b/profiler/msprof_analyze/precheck/examples/profiler/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/examples/profiler/dynamic_prof.py b/profiler/msprof_analyze/precheck/examples/profiler/dynamic_prof.py new file mode 100644 index 0000000000000000000000000000000000000000..f4b1e9b849b32380978b45661d18c03447ee6482 --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/profiler/dynamic_prof.py @@ -0,0 +1,71 @@ +import json +import os +import logging +from copy import deepcopy + +logger = logging.getLogger(__name__) + +DEFAULT_DP_CONFIG = { + "activities": ["CPU", "NPU"], + "prof_dir": "./prof_result", + "analyse": False, + "record_shapes": False, + "profile_memory": False, + "with_stack": False, + "with_flops": False, + "with_modules": False, + "active": 1, + "is_rank": False, + "rank_list": [], + "experimental_config": { + "profiler_level": "Level0", + "aic_metrics": "AiCoreNone", + "l2_cache": False, + "op_attr": False, + "gc_detect_threshold": None, + "data_simplification": True, + "record_op_args": False, + "export_type": "text", + "msprof_tx": False + } +} + + +def _get_prof_config_json(prof_dp_path): + prof_config_json = os.path.join(prof_dp_path, "profiler_config.json") + return prof_config_json + + +def _set_default_prof_config(prof_config_json): + with open(prof_config_json, "w") as f: + json.dump(DEFAULT_DP_CONFIG, f, indent=4) + + +def get_dynamic_prof_config_path(): + cwd = os.path.dirname(os.path.realpath(__file__)) + prof_dp_path = os.path.join(cwd, './local_config/config_dynamic') + + prof_config_json = _get_prof_config_json(prof_dp_path) + os.makedirs(os.path.dirname(prof_config_json), exist_ok=True) + + if not os.path.exists(prof_config_json): + _set_default_prof_config(prof_config_json) + logger.info("Created default dynamic profiler config file at {}".format(prof_config_json)) + + return prof_dp_path + + +def start_dynamic_profiler(prof_dp_path, prof_save_dir): + prof_config_json = _get_prof_config_json(prof_dp_path) + if prof_save_dir is not None: + if not os.path.exists(prof_config_json): + data = deepcopy(DEFAULT_DP_CONFIG) + else: + with open(prof_config_json, 'r') as f: + data = json.load(f) + data['prof_dir'] = prof_save_dir + + with open(prof_config_json, 'w') as f: + json.dump(data, f, indent=4) + + logger.info('has started dynamic profiling') diff --git a/profiler/msprof_analyze/precheck/examples/profiler/models.py b/profiler/msprof_analyze/precheck/examples/profiler/models.py new file mode 100644 index 0000000000000000000000000000000000000000..4a0f8cc0de62efcd92081a632fb9786188f05de3 --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/profiler/models.py @@ -0,0 +1,67 @@ +import logging +from typing import Dict, Any, Tuple + +import torch +import torch.nn as nn +from torch.utils.data import Dataset + +logger = logging.getLogger(__name__) + + +# ============= Models ============= +class 
SimpleResNet(nn.Module):
+    def __init__(self, num_classes: int = 10):
+        super().__init__()
+        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
+        self.bn1 = nn.BatchNorm2d(64)
+        self.relu = nn.ReLU(inplace=True)
+        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
+        self.fc = nn.Linear(64 * 56 * 56, num_classes)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = self.conv1(x)
+        x = self.bn1(x)
+        x = self.relu(x)
+        x = self.maxpool(x)
+        x = torch.flatten(x, 1)
+        x = self.fc(x)
+        return x
+
+
+# ============= Datasets =============
+class DummyImageDataset(Dataset):
+    def __init__(self, input_shape: Tuple[int, ...], num_samples: int = 100):
+        self.input_shape = input_shape
+        self.num_samples = num_samples
+
+    def __len__(self) -> int:
+        return self.num_samples
+
+    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
+        x = torch.randn(self.input_shape)
+        y = torch.randint(0, 10, ())
+        return x, y
+
+
+# ============= Example Registry =============
+class ExampleRegistry:
+    @staticmethod
+    def get_example_config(example_name: str) -> Dict[str, Any]:
+        configs = {
+            "resnet": {
+                "model_class": SimpleResNet,
+                "model_args": {"num_classes": 10},
+                "dataset_class": DummyImageDataset,
+                "dataset_args": {"input_shape": (3, 224, 224), "num_samples": 800},
+                "batch_size": 8,
+            },
+        }
+
+        if example_name not in configs:
+            available_models = ", ".join(configs.keys())
+            raise ValueError(
+                f"Unknown example name: {example_name}. "
+                f"Available models are: {available_models}"
+            )
+
+        return configs[example_name]
diff --git a/profiler/msprof_analyze/precheck/examples/profiler/train_with_profiler.py b/profiler/msprof_analyze/precheck/examples/profiler/train_with_profiler.py
new file mode 100644
index 0000000000000000000000000000000000000000..9e6eb482c4cc10e4f31026b36738654d305409f2
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/examples/profiler/train_with_profiler.py
@@ -0,0 +1,288 @@
+"""
+Example Usage:
+1. Single node training examples:
+torchrun --nproc_per_node=8 \
+    --nnodes=1 \
+    --node_rank=0 \
+    --master_addr="127.0.0.1" \
+    --master_port=29500 \
+    train_with_profiler.py \
+    --example_name resnet \
+    --prof_output_dir ./profiler_output
+
+2. Distributed training examples:
+
+    # Multiple nodes (2 nodes, 8 NPUs each)
+    # On node 0 (master node):
+    torchrun --nproc_per_node=8 \
+        --nnodes=2 \
+        --node_rank=0 \
+        --master_addr="192.168.1.1" \
+        --master_port=29500 \
+        train_with_profiler.py \
+        --example_name resnet \
+        --prof_output_dir ./profiler_output
+
+    # On node 1:
+    torchrun --nproc_per_node=8 \
+        --nnodes=2 \
+        --node_rank=1 \
+        --master_addr="192.168.1.1" \
+        --master_port=29500 \
+        train_with_profiler.py \
+        --example_name resnet \
+        --prof_output_dir ./profiler_output
+
+Distributed Training Parameters:
+--nproc_per_node: Number of processes per node (typically the number of NPUs)
+--nnodes: Total number of nodes
+--node_rank: Rank of current node (0 to nnodes-1)
+--master_addr: IP address of master node
+--master_port: Port for master node communication
+
+Available Models:
+- resnet: ResNet model implementation
+
+Environment Variables (automatically set by torchrun):
+- RANK: Global rank of the process
+- WORLD_SIZE: Total number of processes
+- LOCAL_RANK: Local rank within the current node
+- MASTER_ADDR: Master node address
+- MASTER_PORT: Master node port
+"""
+
+import os
+import argparse
+import datetime
+import logging
+from typing import Optional, List
+
+import torch
+import torch_npu
+import torch.nn as nn
+import torch.distributed as dist
+from torch.utils.data import Dataset, DataLoader
+from tqdm import tqdm
+
+try:
+    from torch_npu.profiler import dynamic_profile as dp
+except ImportError:
+    dp = None
+
+from msprof_analyze.precheck.examples.profiler.models import ExampleRegistry
+from msprof_analyze.precheck.examples.profiler.dynamic_prof import get_dynamic_prof_config_path
+from msprof_analyze.precheck.common.constant import Constant
+
+logger = logging.getLogger(__name__)
+
+
+class ProfilerCallback:
+    """Callback for handling profiling operations"""
+
+    def __init__(self, prof_save_dir,
+                 is_dynamic=False, dynamic_prof_path=None):
+        self.profiler = None
+        self.is_dynamic = is_dynamic
+        if is_dynamic:
+            self.dynamic_prof_path = dynamic_prof_path if dynamic_prof_path else get_dynamic_prof_config_path()
+        self.prof_save_dir = prof_save_dir
+
+    def on_train_begin(self):
+        if self.is_dynamic:
+            if dp is None:
+                raise RuntimeError("torch_npu.profiler.dynamic_profile is unavailable; "
+                                   "run with --static to use the static profiler")
+            dp.init(self.dynamic_prof_path)
+            dist.barrier()
+            if dist.get_rank() == 0:
+                from msprof_analyze.precheck.examples.profiler.dynamic_prof import start_dynamic_profiler
+                start_dynamic_profiler(self.dynamic_prof_path,
+                                       self.prof_save_dir)
+            self.profiler = dp
+        else:
+            experimental_config = torch_npu.profiler._ExperimentalConfig(
+                aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
+                profiler_level=torch_npu.profiler.ProfilerLevel.Level2,
+                l2_cache=False,
+                data_simplification=False
+            )
+            self.profiler = torch_npu.profiler.profile(
+                activities=[
+                    torch_npu.profiler.ProfilerActivity.CPU,
+                    torch_npu.profiler.ProfilerActivity.NPU
+                ],
+                with_stack=True,
+                record_shapes=True,
+                profile_memory=True,
+                schedule=torch_npu.profiler.schedule(
+                    wait=5, warmup=5, active=20, repeat=1, skip_first=10),
+                experimental_config=experimental_config,
+                with_flops=True,
+                with_modules=True,
+                on_trace_ready=torch_npu.profiler.tensorboard_trace_handler(
+                    self.prof_save_dir)
+            )
+            self.profiler.__enter__()
+
+    def on_step_end(self):
+        if self.profiler:
+            self.profiler.step()
+
+    def on_train_end(self):
+        if not self.is_dynamic and self.profiler:
+            self.profiler.__exit__(None, None, None)
+
+
+class Trainer:
+    def __init__(
+            self,
+            model: nn.Module,
+            dataloader: Optional[Dataset] = None,
+            callbacks: 
Optional[List[ProfilerCallback]] = None, + criterion: Optional[nn.Module] = None, + optimizer: Optional[torch.optim.Optimizer] = None, + ): + self.model = model + self.dataloader = dataloader + self.callbacks = callbacks or [] + + # Setup loss and optimizer with defaults + self.criterion = criterion or nn.CrossEntropyLoss() + self.optimizer = optimizer or torch.optim.Adam(self.model.parameters()) + + # get dist config from env + self.rank = int(os.environ.get("RANK", 0)) + self.world_size = int(os.environ.get("WORLD_SIZE", 1)) + self.local_rank = int(os.environ.get("LOCAL_RANK", 0)) + self.device = f"npu:{self.local_rank}" + + # Setup device and distributed training + self.setup_distributed(self.rank, self.world_size, self.local_rank) + + # Move model and criterion to device + self.model = self.model.to(self.device) + self.criterion = self.criterion.to(self.device) + + @staticmethod + def setup_distributed(rank, world_size, local_rank): + if dist.is_initialized(): + return + + torch.npu.set_device(local_rank) + dist.init_process_group( + backend='hccl', + rank=rank, + world_size=world_size, + timeout=datetime.timedelta(seconds=1800) + ) + logger.info(f"[Rank {rank}] Initialized distributed training") + + def cleanup(self): + """Explicitly cleanup distributed training resources""" + if dist.is_initialized(): + dist.destroy_process_group() + logger.info(f"[Rank {self.rank}] Destroyed distributed training") + + def train(self, epoch: int = 1): + # Call training start callbacks + for callback in self.callbacks: + callback.on_train_begin() + + # Training loop + for epoch_idx in range(epoch): + if self.rank == 0: + pbar = tqdm( + total=len(self.dataloader), + desc=f'Epoch {epoch_idx + 1}/{epoch}', + unit='batch' + ) + + for step, (inputs, labels) in enumerate(self.dataloader): + # Move data to device + inputs = inputs.to(self.device) + labels = labels.to(self.device) + + # Forward pass + self.optimizer.zero_grad() + outputs = self.model(inputs) + loss = self.criterion(outputs, labels) + + # Backward pass + loss.backward() + self.optimizer.step() + + if self.rank == 0: + pbar.update(1) + pbar.set_postfix({ + 'step': f'{step + 1}/{len(self.dataloader)}', + 'loss': f'{loss.item():.4f}' + }) + + dist.barrier() + + # Call step end callbacks + for callback in self.callbacks: + callback.on_step_end() + + if self.rank == 0: + pbar.close() + + # Call training end callbacks + for callback in self.callbacks: + callback.on_train_end() + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument('--example_name', default='resnet', + choices=['resnet'], + help='Name of the example to run') + parser.add_argument('--prof_output_dir', required=True) + parser.add_argument('--static', action='store_true', required=False, default=False) + args = parser.parse_args() + + # Get example configuration + example_config = ExampleRegistry.get_example_config(args.example_name) + + # Create model and dataset + model = example_config["model_class"](**example_config["model_args"]) + dataset = example_config["dataset_class"](**example_config["dataset_args"]) + + # Create loss and optimizer (可选,使用默认值也可以) + criterion = nn.CrossEntropyLoss() + optimizer = torch.optim.Adam(model.parameters(), lr=0.001) + + # Create profiler callback + profiler_callback = ProfilerCallback( + args.prof_output_dir, + is_dynamic=(not args.static) + ) + + dataloader = DataLoader(dataset, batch_size=example_config["batch_size"]) + + # Initialize trainer + trainer = Trainer( + model=model, + dataloader=dataloader, + 
callbacks=[profiler_callback], + criterion=criterion, # 可选 + optimizer=optimizer, # 可选 + ) + + try: + trainer.train() + finally: + trainer.cleanup() + + +if __name__ == '__main__': + logging.basicConfig( + level=logging.INFO, + format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' + ) + + try: + main() + except Exception as e: + logger.error(f"Unexpected error: {e}", exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + raise diff --git a/profiler/msprof_analyze/precheck/examples/scripts/precheck_run_llama2.sh b/profiler/msprof_analyze/precheck/examples/scripts/precheck_run_llama2.sh new file mode 100644 index 0000000000000000000000000000000000000000..e3bf0859e7565ecbea7857bb1601fc9e58812b57 --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/scripts/precheck_run_llama2.sh @@ -0,0 +1,128 @@ +#!/bin/bash + +export CUDA_DEVICE_MAX_CONNECTIONS=1 +export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True + +GPUS_PER_NODE=${GPUS_PER_NODE:-8} +MASTER_ADDR=${MASTER_ADDR:-"192.168.0.1"} +MASTER_PORT=${MASTER_PORT:-6000} +NNODES=${NNODES:-2} +NODE_RANK=${NODE_RANK:-0} +WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES)) + +CKPT_SAVE_DIR=${CKPT_SAVE_DIR:-"./ckpt/llama-2-7b"} +CKPT_LOAD_DIR=${CKPT_LOAD_DIR:-"./model_weights/llama-2-7b-legacy"} +TOKENIZER_MODEL=${TOKENIZER_MODEL:-"./model_from_hf/llama-2-7b-hf/tokenizer.model"} +DATA_PATH=${DATA_PATH:-"./dataset/enwiki_text_document"} + +TP=${TP:-2} +PP=${PP:-4} + +# Result directory +OUTPUT_DIR=${OUTPUT_DIR:-"./result/precheck/llama2-1129-2130"} + +PROF_NODE_RES_DIR="$OUTPUT_DIR/node_prof_save_dir" +LOG_FILE="$OUTPUT_DIR/precheck.log" + +# Check if profiling output directory exists before running training +# This prevents starting a long training job if the directory is missing +if [ ! -d "$OUTPUT_DIR" ]; then + echo "Error: Result directory $OUTPUT_DIR does not exist." 
\ + "Please create the directory before running training" \ + "(in ${BASH_SOURCE[0]})" >&2 + exit 1 +fi + +# Get the directory of the current script and cd into it +# SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" +# echo "Script directory: $SCRIPT_DIR" +# cd "$SCRIPT_DIR" +# echo "Changed working directory to: $(pwd)" + + +DISTRIBUTED_ARGS=" + --nproc_per_node $GPUS_PER_NODE \ + --nnodes $NNODES \ + --node_rank $NODE_RANK \ + --master_addr $MASTER_ADDR \ + --master_port $MASTER_PORT +" + +GPT_ARGS=" + --tensor-model-parallel-size ${TP} \ + --pipeline-model-parallel-size ${PP} \ + --sequence-parallel \ + --num-layers 32 \ + --hidden-size 4096 \ + --ffn-hidden-size 11008 \ + --num-attention-heads 32 \ + --tokenizer-type Llama2Tokenizer \ + --tokenizer-model ${TOKENIZER_MODEL} \ + --seq-length 4096 \ + --max-position-embeddings 4096 \ + --micro-batch-size 1 \ + --global-batch-size 256 \ + --make-vocab-size-divisible-by 1 \ + --lr 1.25e-6 \ + --train-iters 5 \ + --lr-decay-style cosine \ + --untie-embeddings-and-output-weights \ + --disable-bias-linear \ + --attention-dropout 0.0 \ + --init-method-std 0.01 \ + --hidden-dropout 0.0 \ + --position-embedding-type rope \ + --normalization RMSNorm \ + --use-fused-rmsnorm \ + --swiglu \ + --use-flash-attn \ + --no-masked-softmax-fusion \ + --attention-softmax-in-fp32 \ + --min-lr 1.25e-7 \ + --weight-decay 1e-1 \ + --lr-warmup-fraction 0.01 \ + --clip-grad 1.0 \ + --adam-beta1 0.9 \ + --initial-loss-scale 65536 \ + --adam-beta2 0.95 \ + --no-gradient-accumulation-fusion \ + --no-load-optim \ + --no-load-rng \ + --use-distributed-optimizer \ + --use-fused-swiglu \ + --use-fused-rotary-pos-emb \ + --overlap-grad-reduce \ + --bf16" + +DATA_ARGS=" \ + --data-path $DATA_PATH \ + --split 949,50,1" + +PROFILE_ARGS=" \ + --profile \ + --profile-step-start 2 \ + --profile-step-end 4 \ + --profile-ranks -1 \ + --profile-level level0 \ + --profile-with-cpu \ + --profile-save-path $PROF_NODE_RES_DIR" + +OUTPUT_ARGS=" \ + --log-interval 1 \ + --save-interval 10000 \ + --eval-interval 1000 \ + --eval-iters 0" + +# Add precheck arguments +# PRECHECK_ARGS=" \ +# --do_precheck" + +torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \ + $GPT_ARGS \ + $DATA_ARGS \ + $OUTPUT_ARGS \ + $PROFILE_ARGS \ + --distributed-backend nccl \ + --load $CKPT_LOAD_DIR \ + --save $CKPT_SAVE_DIR \ + | tee $LOG_FILE diff --git a/profiler/msprof_analyze/precheck/examples/scripts/run_llama2_precheck.sh b/profiler/msprof_analyze/precheck/examples/scripts/run_llama2_precheck.sh new file mode 100644 index 0000000000000000000000000000000000000000..495dab8ca6fdeaab6ca87df61a6be0d4d7830f6c --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/scripts/run_llama2_precheck.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +# You should set the IP addresses of the nodes in the NODES_IP variable +# Change the IP addresses to the actual IP addresses of your nodes +NODES_IP="${NODES_IP:-192.168.0.1,192.168.0.2}" + +# Convert comma-separated NODES_IP to an array nodes_ip +IFS=',' read -r -a nodes_ip <<< "$NODES_IP" + + +echo "Starting distributed precheck with ${#nodes_ip[@]} nodes" +echo "Master node: ${nodes_ip[0]}" +echo "All nodes: ${nodes_ip[*]}" + +output_dir_base="./result/demo_precheck" + +# Add timestamp to task name +timestamp=$(date +"%Y%m%d_%H%M%S") +task_name="llama2-demo_${timestamp}" + +output_dir="${output_dir_base}/${task_name}" +node_prof_save_dir="${output_dir}/node_prof_save_dir" + +# Join array elements with commas +host_ips=$(IFS=,; echo "${nodes_ip[*]}") + +# Run precheck with 
distributed configuration +msprof-analyze precheck start_all \ + --host_ips "${host_ips}" \ + --master_addr "${nodes_ip[0]}" \ + --master_port 29500 \ + --nnodes ${#nodes_ip[@]} \ + --nproc_per_node 8 \ + --output_dir ${output_dir_base} \ + --task_name ${task_name} \ + --node_prof_save_dir ${node_prof_save_dir} \ + --profiling_cmd "OUTPUT_DIR=${output_dir} bash ./examples/scripts/precheck_run_llama2.sh" \ + --static + +echo "Precheck completed" diff --git a/profiler/msprof_analyze/precheck/examples/scripts/run_precheck.sh b/profiler/msprof_analyze/precheck/examples/scripts/run_precheck.sh new file mode 100644 index 0000000000000000000000000000000000000000..bf5b3b89cff5e945af557ed07c997174ac19a78b --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/scripts/run_precheck.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +# You should set the IP addresses of the nodes in the NODES_IP variable +# Change the IP addresses to the actual IP addresses of your nodes +NODES_IP="${NODES_IP:-192.168.0.1,192.168.0.2}" + + +# Convert comma-separated NODES_IP to an array nodes_ip +IFS=',' read -r -a nodes_ip <<< "$NODES_IP" + +timestamp=$(date +"%Y%m%d_%H%M%S") +task_name="task_demo_${timestamp}" + +echo "Starting distributed precheck with ${#nodes_ip[@]} nodes" +echo "Master node: ${nodes_ip[0]}" +echo "All nodes: ${nodes_ip[@]}" + +output_dir=./output_test + +PROFILING_CMD="[resnet]" + +# Join array elements with commas +host_ips=$(IFS=,; echo "${nodes_ip[*]}") + +# Run precheck with distributed configuration +msprof-analyze precheck start_all \ + --host_ips "${host_ips}" \ + --master_addr ${nodes_ip[0]} \ + --master_port 29500 \ + --nnodes ${#nodes_ip[@]} \ + --nproc_per_node 8 \ + --output_dir "${output_dir}" \ + --task_name ${task_name} \ + --profiling_cmd "${PROFILING_CMD}" \ + --static + +echo "Precheck completed" diff --git a/profiler/msprof_analyze/precheck/examples/scripts/test_hosts_env.sh b/profiler/msprof_analyze/precheck/examples/scripts/test_hosts_env.sh new file mode 100644 index 0000000000000000000000000000000000000000..68aa4b33ddce4cfaeee0b2b5e1008901b02809e8 --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/scripts/test_hosts_env.sh @@ -0,0 +1,166 @@ +#!/bin/bash + +# 默认值设置 +HOST_IPS=${HOST_IPS:-""} +TIMEOUT=${TIMEOUT:-5} + +# ANSI 颜色代码 +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +RED='\033[0;31m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color +BOLD='\033[1m' + +# 检查必需参数 +if [ -z "$HOST_IPS" ]; then + echo -e "${RED}Error: HOST_IPS environment variable is not set${NC}" + echo -e "Usage: ${BOLD}HOST_IPS='192.168.0.1,192.168.0.2' [CHECK_CANN=1] [TIMEOUT=5] bash $0${NC}" + exit 1 +fi + +# 获取CANN信息的函数 +get_cann_info() { + # 尝试多种方式获取CANN信息 + if command -v npu-smi &>/dev/null; then + npu_info=$(npu-smi info 2>/dev/null) + driver_version=$(echo "$npu_info" | grep "Driver Version" | awk -F':' '{print $2}' | tr -d ' ') + firmware_version=$(echo "$npu_info" | grep "Firmware Version" | awk -F':' '{print $2}' | tr -d ' ') + echo "Driver:$driver_version;Firmware:$firmware_version" + else + echo "NPU-SMI Not Found" + fi +} + +# 打印标题 +echo -e "\n${BOLD}🔍 Cluster Environment Checker${NC}" +echo -e "Usage: ${BOLD}HOST_IPS='192.168.0.1,192.168.0.2' [CHECK_CANN=1] [TIMEOUT=5] bash $0${NC}" +echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" + +# 获取本机环境信息 +echo -e "\n${BOLD}📊 Step 1: Collecting local environment info...${NC}" +echo -e "${BLUE}Detecting Python environment...${NC}" +LOCAL_PYTHON_PATH=$(which python3) +LOCAL_PYTHON_VERSION=$($LOCAL_PYTHON_PATH -V 2>&1) +echo -e "${BLUE}Checking 
installed packages...${NC}" +LOCAL_MSPROF_VERSION=$($LOCAL_PYTHON_PATH -m pip show msprof-analyze | grep Version | awk '{print $2}') +LOCAL_TORCH_VERSION=$($LOCAL_PYTHON_PATH -m pip show torch | grep Version | awk '{print $2}') +LOCAL_TORCH_NPU_VERSION=$($LOCAL_PYTHON_PATH -m pip show torch_npu | grep Version | awk '{print $2}') + +echo -e "\n${BOLD}📌 Local Environment Summary:${NC}" +echo -e " • Python Path: ${GREEN}$LOCAL_PYTHON_PATH${NC}" +echo -e " • Python Version: ${GREEN}$LOCAL_PYTHON_VERSION${NC}" +echo -e " • Msprof-analyze: ${GREEN}v$LOCAL_MSPROF_VERSION${NC}" +echo -e " • Torch: ${GREEN}v$LOCAL_TORCH_VERSION${NC}" +echo -e " • Torch_NPU: ${GREEN}v$LOCAL_TORCH_NPU_VERSION${NC}" + +# 构建远程检查命令 +CHECK_CMD=$(cat << EOF +echo "=== Python Path Check ===" && \ +test -f $LOCAL_PYTHON_PATH && \ +echo "=== Python Version ===" && \ +$LOCAL_PYTHON_PATH -V && \ +echo "=== Msprof-analyze Version ===" && \ +$LOCAL_PYTHON_PATH -m pip show msprof-analyze | grep Version | awk '{print \$2}' && \ +echo "=== Torch Version ===" && \ +$LOCAL_PYTHON_PATH -m pip show torch | grep Version | awk '{print \$2}' && \ +echo "=== Torch_NPU Version ===" && \ +$LOCAL_PYTHON_PATH -m pip show torch_npu | grep Version | awk '{print \$2}' && \ +echo "=== TMUX Check ===" && \ +which tmux +EOF +) + +# 检查每个远程主机 +echo -e "\n${BOLD}🔄 Step 2: Checking cluster nodes...${NC}" +IFS=',' read -ra HOSTS <<< "$HOST_IPS" +total_hosts=${#HOSTS[@]} +current_host=0 +failed_hosts=() + +for host in "${HOSTS[@]}"; do + ((current_host++)) + echo -e "\n${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" + echo -e "${BOLD}📡 Checking host [$current_host/$total_hosts]: ${YELLOW}$host${NC}" + + # 检查ssh连接 + echo -e " ⏳ Testing SSH connection..." + if ! ssh -o BatchMode=yes -o ConnectTimeout=$TIMEOUT $host "exit 0" &>/dev/null; then + echo -e " ${RED}❌ SSH connection failed${NC}" + failed_hosts+=("$host [SSH Failed]") + continue + fi + echo -e " ${GREEN}✓ SSH connection successful${NC}" + + # 检查Python解释器 + echo -e " ⏳ Verifying Python interpreter..." + if ! ssh -o BatchMode=yes -o ConnectTimeout=$TIMEOUT $host "test -f $LOCAL_PYTHON_PATH" &>/dev/null; then + echo -e " ${RED}❌ Python interpreter not found at: $LOCAL_PYTHON_PATH${NC}" + failed_hosts+=("$host [Python Not Found]") + continue + fi + echo -e " ${GREEN}✓ Python interpreter verified${NC}" + + # 检查环境 + echo -e " ⏳ Checking environment..." + remote_output=$(ssh -o BatchMode=yes -o ConnectTimeout=$TIMEOUT $host "$CHECK_CMD" 2>&1) + if [ $? 
-ne 0 ]; then + echo -e " ${RED}❌ Environment check failed${NC}" + echo -e " Error details: $remote_output" + failed_hosts+=("$host [Check Failed]") + continue + fi + + # 解析远程输出 + remote_python_version=$(echo "$remote_output" | awk '/=== Python Version ===/{getline; print}') + remote_msprof_version=$(echo "$remote_output" | awk '/=== Msprof-analyze Version ===/{getline; print}') + remote_torch_version=$(echo "$remote_output" | awk '/=== Torch Version ===/{getline; print}') + remote_torch_npu_version=$(echo "$remote_output" | awk '/=== Torch_NPU Version ===/{getline; print}') + remote_tmux_path=$(echo "$remote_output" | awk '/=== TMUX Check ===/{getline; print}') + + # 检查结果 + errors=() + + [ "$remote_python_version" != "$LOCAL_PYTHON_VERSION" ] && \ + errors+=("Python version mismatch: Local=$LOCAL_PYTHON_VERSION Remote=$remote_python_version") + + [ "$remote_msprof_version" != "$LOCAL_MSPROF_VERSION" ] && \ + errors+=("Msprof version mismatch: Local=$LOCAL_MSPROF_VERSION Remote=$remote_msprof_version") + + [ "$remote_torch_version" != "$LOCAL_TORCH_VERSION" ] && \ + errors+=("Torch version mismatch: Local=$LOCAL_TORCH_VERSION Remote=$remote_torch_version") + + [ "$remote_torch_npu_version" != "$LOCAL_TORCH_NPU_VERSION" ] && \ + errors+=("Torch_NPU version mismatch: Local=$LOCAL_TORCH_NPU_VERSION Remote=$remote_torch_npu_version") + + [ -z "$remote_tmux_path" ] && \ + errors+=("TMUX not found") + + if [ ${#errors[@]} -eq 0 ]; then + echo -e " ${GREEN}✓ All environment checks passed${NC}" + else + echo -e " ${RED}❌ Environment check failed:${NC}" + for error in "${errors[@]}"; do + echo -e " • ${RED}$error${NC}" + done + failed_hosts+=("$host [Version Mismatch]") + fi +done + +# 总结报告 +echo -e "\n${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}" +echo -e "${BOLD}📋 Final Report${NC}" +if [ ${#failed_hosts[@]} -eq 0 ]; then + echo -e "${GREEN}✅ All $total_hosts hosts passed environment checks!${NC}" + exit 0 +else + echo -e "${RED}❌ Environment check failed for ${#failed_hosts[@]} out of $total_hosts hosts:${NC}" + for failed_host in "${failed_hosts[@]}"; do + echo -e " • ${RED}$failed_host${NC}" + done + echo -e "\n${YELLOW}💡 Tips:${NC}" + echo -e " • Ensure all hosts have the same Python environment" + echo -e " • Check if tmux is installed: ${BOLD}sudo apt-get install tmux${NC}" + echo -e " • Verify SSH connectivity: ${BOLD}ssh-copy-id user@host${NC}" + exit 1 +fi diff --git a/profiler/msprof_analyze/precheck/examples/scripts/test_hosts_ssh.sh b/profiler/msprof_analyze/precheck/examples/scripts/test_hosts_ssh.sh new file mode 100644 index 0000000000000000000000000000000000000000..7489bb601ceaff09acf43aadd2b36e40a6682fb5 --- /dev/null +++ b/profiler/msprof_analyze/precheck/examples/scripts/test_hosts_ssh.sh @@ -0,0 +1,61 @@ +### SSH 连通性测试 +# 保存为 test_hosts_ssh.sh +#!/bin/bash + +# 默认值设置 +HOST_IPS=${HOST_IPS:-""} +TIMEOUT=${TIMEOUT:-5} + +# 检查必需参数 +if [ -z "$HOST_IPS" ]; then + echo "Error: HOST_IPS environment variable is not set" + echo "Usage: HOST_IPS='192.168.0.1,192.168.0.2' TIMEOUT=5 bash $0" + exit 1 +fi + +echo "Testing SSH connections with timeout ${TIMEOUT}s..." +echo "Host list: $HOST_IPS" +echo "-----------------------------------" + +# 测试每个主机的SSH连接 +failed_hosts=() +IFS=',' read -ra HOSTS <<< "$HOST_IPS" +for host in "${HOSTS[@]}"; do + echo -n "Testing SSH connection to $host... 
" + if ssh -o BatchMode=yes -o ConnectTimeout=$TIMEOUT $host "exit 0" &> /dev/null; then + echo "Success ✓" + else + echo "Failed ✗" + failed_hosts+=($host) + fi +done + +# 如果有失败的主机,输出设置建议 +if [ ${#failed_hosts[@]} -ne 0 ]; then + echo -e "\n❌ Some hosts are not accessible via SSH" + echo "Please run these commands to set up passwordless SSH:" + echo "-----------------------------------" + for host in "${failed_hosts[@]}"; do + echo "# 1. If ~/.ssh/id_rsa doesn't exist, generate it" + echo "[ ! -f ~/.ssh/id_rsa ] && ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa" + echo "" + echo "# 2. Copy your key to remote host" + echo "ssh-copy-id $USER@$host" + echo "" + echo "# 3. Set correct permissions" + echo "chmod 600 ~/.ssh/id_rsa" + echo "-----------------------------------" + done + exit 1 +else + echo -e "\n✅ All SSH connections successful!" +fi + +# 使用方法: +# ```bash +# # 方式1:直接运行(使用默认超时时间5秒) +# HOST_IPS="192.168.0.1,192.168.0.2" bash test_hosts_ssh.sh + +# # 方式2:指定超时时间 +# HOST_IPS="192.168.0.1,192.168.0.2" TIMEOUT=3 bash test_hosts_ssh.sh +# ``` diff --git a/profiler/msprof_analyze/precheck/manager/__init__.py b/profiler/msprof_analyze/precheck/manager/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/profiler/msprof_analyze/precheck/manager/args_manager.py b/profiler/msprof_analyze/precheck/manager/args_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..252a51bae5257e750541fde452db69e9e88eb8bb --- /dev/null +++ b/profiler/msprof_analyze/precheck/manager/args_manager.py @@ -0,0 +1,446 @@ +import argparse +import ipaddress +import os +import re +import shlex +import shutil +import sys +import logging +from typing import List, Union +from collections import OrderedDict + +from msprof_analyze.precheck.common.constant import Constant +from msprof_analyze.precheck.common.utils import cn_now +from msprof_analyze.prof_common.path_manager import PathManager + +logger = logging.getLogger(__name__) + + +class BaseArgsManager: + def __init__(self, args): + self._args = args + + def __repr__(self): + return str(self.to_dict()) + + @property + def master_addr(self): + return self._args.master_addr + + @property + def master_port(self): + return self._args.master_port + + @property + def nnodes(self): + return self._args.nnodes + + @property + def nproc_per_node(self): + return self._args.nproc_per_node + + @property + def node_prof_save_dir(self): + return self._args.node_prof_save_dir or os.path.join(self.task_output_dir, 'node_prof_save_dir') + + @property + def master_prof_gather_dir(self): + return self._args.master_prof_gather_dir or os.path.join(self.task_output_dir, 'master_prof_gather_dir') + + @property + def output_dir(self): + return self._args.output_dir + + @property + def task_name(self): + if self._args.task_name: + return self._args.task_name + return "task_" + cn_now().strftime("%Y%m%d-%H%M%S") + + @property + def static(self): + return self._args.static + + @property + def task_output_dir(self): + return os.path.join(self.output_dir, self.task_name) + + @property + def profiling_cmd(self): + return self._args.profiling_cmd + + @property + def prof_in_shared_storage(self): + return getattr(self._args, 'prof_in_shared_storage', False) + + @staticmethod + def escape_special_chars(text): + ESCAPE_CHARS_MAP = { + '\n': '\\n', + '\t': '\\t', + '\r': '\\r', + '\\': '\\\\', + '\"': '\\\"', + '\'': '\\\'' + } + return re.sub(r'([\n\t\r\\\'"])', lambda match: 
ESCAPE_CHARS_MAP[match.group()], text) + + @staticmethod + def _check_output_path_valid(output_path: str) -> Union[Exception, None]: + try: + if not os.path.exists(output_path): + PathManager.check_input_directory_path(output_path) + else: + PathManager.check_input_directory_path(output_path) + PathManager.check_path_owner_consistent(output_path) + except Exception as e: + return e + return None + + @staticmethod + def _check_ip_valid(ip: str) -> Union[Exception, None]: + try: + ipaddress.ip_address(ip) + except ValueError as e: + return e + return None + + @staticmethod + def _check_int_range( + value: int, min_value: int = Constant.ARG_MIN_INT_VALUE, max_value: int = Constant.ARG_MAX_INT_VALUE + ) -> Union[Exception, None]: + if not (min_value <= value <= max_value): + return ValueError(f"The value must be between {min_value} and {max_value}.") + return None + + @staticmethod + def _check_executable_path_valid(executable_path: str) -> Union[Exception, None]: + try: + PathManager.check_path_owner_consistent(executable_path) + if not os.path.isfile(executable_path): + raise ValueError("The path is not a valid executable file.") + if not os.access(executable_path, os.X_OK): + raise ValueError("The file at the path is not executable.") + except Exception as e: + return e + return None + + @staticmethod + def _check_identifier_valid(identifier: str) -> Union[Exception, None]: + pattern = r'^[a-zA-Z_][a-zA-Z0-9_-]*$' + if not re.match(pattern, identifier): + return ValueError(f"It must start with a letter or underscore, " + f"followed by any number of letters, digits, underscores, or dashes.") + return None + + @staticmethod + def _check_command_injection(cmd: str) -> Union[Exception, None]: + dangerous_chars = [';', '&&', '||', '|', '>', '<', '`', '$', '\\'] + for char in dangerous_chars: + if char in cmd: + return ValueError( + f"Command contains dangerous character '{char}'. " + "Command injection is not allowed." + ) + return None + + @staticmethod + def _check_dangerous_commands(cmd: str) -> Union[Exception, None]: + dangerous_commands = [ + 'rm', 'mv', 'cp', 'chmod', 'chown', 'dd', + 'mkfs', 'mount', 'umount', 'sudo', 'su', + 'reboot', 'shutdown', 'poweroff', 'init', + 'passwd', 'adduser', 'deluser', 'useradd', + 'userdel', 'groupadd', 'groupdel' + ] + + cmd_parts = shlex.split(cmd) + if not cmd_parts: + return ValueError("Empty command is not allowed") + + base_cmd = os.path.basename(cmd_parts[0]) + if base_cmd in dangerous_commands: + return ValueError( + f"Command '{base_cmd}' is not allowed for security reasons" + ) + return None + + @classmethod + def safe_format(cls, format_str: str, *args, max_len=Constant.ARG_MAX_LEN): + """ + Safely formats a string by truncating arguments longer than a specified maximum length and escaping special characters. + + This function is designed to create user-friendly error messages by ensuring that all arguments are displayed in a safe and concise manner. + It truncates any argument that exceeds the maximum length and appends an ellipsis to indicate the truncation. + Additionally, it escapes special characters in the arguments to prevent formatting errors or injection issues. + + Args: + format_str (str): The format string into which the arguments are inserted. + *args: Variable length argument list to be formatted into the format_str. + max_len (int): The maximum allowed length of any argument string after which it will be truncated. + Defaults to Constant.MAX_ARG_LEN. + + Returns: + str: A formatted string with all arguments safely inserted. 
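+
+        Example (illustrative):
+            >>> BaseArgsManager.safe_format("bad arg: {}", "a" * 100, max_len=8)
+            'bad arg: aaaaaaaa...'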
+ """ + + def _str(x): + x_str = str(x) + if len(x_str) > max_len: + x_str = x_str[:max_len] + "..." + return cls.escape_special_chars(x_str) + + args = [_str(arg) for arg in args] + return format_str.format(*args) + + @classmethod + def raise_error(cls, error_format_msg, *args): + """ + Raises a RuntimeError with a formatted message that includes special character escaping and length limitation. + + This method is designed to handle untrusted external parameters `*args` by ensuring that the error message is user-friendly. + It applies special character escaping and truncates arguments to a predefined maximum length to prevent formatting errors or injection issues. + + Args: + error_format_msg (str): The format string into which the arguments are inserted. + *args: Variable length argument list to be formatted into the error_format_msg. + """ + err_msg = cls.safe_format(error_format_msg, *args) + raise RuntimeError(err_msg) + + def to_dict(self): + """Automatically convert all properties to a dictionary.""" + properties_dict = {} + for prop in dir(self): + if isinstance(getattr(type(self), prop, None), property): + properties_dict[prop] = getattr(self, prop) + return properties_dict + + def check_args(self): + + error = self._check_ip_valid(self.master_addr) + if error: + self.raise_error('Master address {} is not valid: {}', self.master_addr, error) + + error = self._check_int_range(self.master_port, + min_value=Constant.ARG_MIN_PORT_VALUE, max_value=Constant.ARG_MAX_PORT_VALUE) + if error: + self.raise_error('Master port {} is not valid: {}', self.master_port, error) + + error = self._check_int_range(self.nnodes, min_value=1) + if error: + self.raise_error('Total number of nodes {} is not valid: {}', self.nnodes, error) + + error = self._check_int_range(self.nproc_per_node, min_value=1) + if error: + self.raise_error('Number of processes per node {} is not valid: {}', self.nproc_per_node, error) + + error = self._check_output_path_valid(self.output_dir) + if error: + self.raise_error('Output directory {} is not valid: {}', self.output_dir, error) + + error = self._check_identifier_valid(self.task_name) + if error: + self.raise_error('Task name {} is not valid: {}', self.task_name, error) + + error = self._check_output_path_valid(self.node_prof_save_dir) + if error: + self.raise_error('Node prof save directory {} is not valid: {}', self.node_prof_save_dir, error) + + error = self._check_output_path_valid(self.master_prof_gather_dir) + if error: + self.raise_error('Master prof gather directory {} is not valid: {}', self.master_prof_gather_dir, error) + + self._check_profiling_cmd_valid(self.profiling_cmd) + + def _check_profiling_cmd_valid(self, profiling_cmd: str) -> None: + if not profiling_cmd.strip(): + logger.error('Profiling command should not be empty.') + + if profiling_cmd in Constant.DEFAULT_PROFILING_COMMANDS: + logger.info(self.safe_format('Using default profiling command for {}', profiling_cmd)) + return + + if len(self.profiling_cmd) > Constant.ARG_MAX_LEN: + self.raise_error( + 'The profiling command is too long, it must be less than {} characters', Constant.ARG_MAX_LEN) + + error = self._check_command_injection(self.profiling_cmd) + if error: + self.raise_error('Profiling command {} is not valid: {}', self.profiling_cmd, error) + + error = self._check_dangerous_commands(self.profiling_cmd) + if error: + self.raise_error('Profiling command {} is not valid: {}', self.profiling_cmd, error) + + +class PrecheckArgsManager(BaseArgsManager): + def __init__(self, args): + 
super().__init__(args) + + self._args = args + self._ssh_remote_hosts = {} + self._host_ips = [] + + self.check_args() + + @property + def host_ips(self): + return self._host_ips + + @property + def host_config_file(self): + return self._args.host_config_file + + @property + def ssh_remote_hosts(self): + return self._ssh_remote_hosts + + @property + def python_path(self): + if not self._args.python_path: + return sys.executable + + if os.path.exists(self._args.python_path): + return self._args.python_path + + python_path = shutil.which(self._args.python_path) + return python_path + + @classmethod + def _check_host_ips_valid(cls, host_ips: List[str]) -> Union[Exception, None]: + if not host_ips: + return None + + for i, ip in enumerate(host_ips): + if not ipaddress.ip_address(ip): + return ValueError(f"The {i}-th host ip is not valid.") + + if len(host_ips) != len(set(host_ips)): + return ValueError("Host IPs must be unique.") + + return None + + def try_to_parse_host_config_file(self, host_config_file: str) -> Union[Exception, OrderedDict]: + if not host_config_file: + logger.info("SSH config file is not provided.") + logger.info("Use default ssh settings for all nodes: ssh_key_file, user, port = ~/.ssh/id_rsa, $USER, 22") + return {} + + if not os.path.isfile(host_config_file): + return FileNotFoundError(f"SSH config file {host_config_file} does not exist.") + + PathManager.check_path_readable(host_config_file) + PathManager.check_file_size(host_config_file) + + ssh_remote_hosts = [] + required_fields = ['host_ip', 'ssh_key_file', 'user', 'port'] + with open(host_config_file, 'r') as f: + header = f.readline().strip().split(',') + if any(field not in header for field in required_fields): + return ValueError(f"Host config file {host_config_file} is missing required fields: {required_fields}") + + for line in f: + values = line.strip().split(',') + if len(values) != len(required_fields): + return ValueError( + f"Host config file {host_config_file} has invalid number of fields in line: {line}") + + host_ip, ssh_key_file, user, port = values + ssh_key_file = PathManager.expanduser_for_argumentparser(ssh_key_file) + port = int(port) + + exception = None + try: + PathManager.check_path_readable(ssh_key_file) + if os.stat(ssh_key_file).st_mode & 0o777 != 0o600: + raise ValueError(f"SSH key file {ssh_key_file} must have permissions set to 600") + + exception = self._check_int_range(port, min_value=Constant.ARG_MIN_PORT_VALUE, + max_value=Constant.ARG_MAX_PORT_VALUE) \ + or self._check_identifier_valid(user) \ + or self._check_ip_valid(host_ip) + + except Exception as e: + exception = e + + if exception: + return RuntimeError( + f"Host config file {host_config_file} is not valid, invalid line: {line}, error: {exception}") + + ssh_remote_hosts.append({ + 'host': host_ip, + 'username': user, + 'key_filename': ssh_key_file, + 'port': int(port) + }) + + ssh_remote_hosts = OrderedDict({item['host']: item for item in ssh_remote_hosts}) + return ssh_remote_hosts + + def check_args(self): + super().check_args() + + error = self._check_executable_path_valid(self.python_path) + if error: + self.raise_error('Python path {} is not valid: {}', self.python_path, error) + + # Ensure either host_ips or host_config_file is provided + if not self.host_config_file and not self._args.host_ips: + self.raise_error('Either host config file or host ips must be provided') + + # If host_ips is provided, validate it first + if self._args.host_ips: + error = self._check_host_ips_valid(self._args.host_ips) + if error: + 
self.raise_error('Host ips {} is not valid: {}', self._args.host_ips, error) + + # Set the validated host_ips + self._host_ips = self._args.host_ips + + # If config file is provided, parse and validate it + if self.host_config_file: + res = self.try_to_parse_host_config_file(self.host_config_file) + if isinstance(res, Exception): + self.raise_error('Host config file {} is not valid: {}', self.host_config_file, res) + self._ssh_remote_hosts = res + config_file_ips = list(self._ssh_remote_hosts.keys()) + + # If host_ips is also provided, verify they match + if self._args.host_ips: + if not set(self._args.host_ips) == set(config_file_ips): + self.raise_error('Host ips does not match the IPs in host config file. Given: {}, In file: {}', + self._args.host_ips, config_file_ips) + else: + # If only config file is provided, use IPs from the config file + self._host_ips = config_file_ips + + # Validate number of nodes and master node configuration + if self.nnodes != len(self.host_ips): + self.raise_error( + 'The number of nodes {} is not equal to the number of host ips {}', + self.nnodes, len(self.host_ips)) + + if self.master_addr != self.host_ips[0]: + self.raise_error( + 'The master address {} is not the first host ip {}', + self.master_addr, self.host_ips[0]) + + +class PrecheckRunnerArgsManager(BaseArgsManager): + def __init__(self, args): + super().__init__(args) + + self._args = args + self.check_args() + + @property + def node_rank(self): + return self._args.node_rank + + def check_args(self): + super().check_args() + + error = self._check_int_range(self.node_rank, min_value=0, max_value=self.nnodes - 1) + if error: + self.raise_error('Node rank {} is not valid: {}', self.node_rank, error) diff --git a/profiler/msprof_analyze/precheck/manager/disk_manager.py b/profiler/msprof_analyze/precheck/manager/disk_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..a497c992cbe895e2dcaf115a8ad3469a687a0759 --- /dev/null +++ b/profiler/msprof_analyze/precheck/manager/disk_manager.py @@ -0,0 +1,24 @@ +import os +import logging + +logger = logging.getLogger(__name__) + + +class DiskManager: + @staticmethod + def check_disk_space(input_prof_path, prof_data_size_gb): + if not os.path.exists(input_prof_path): + logger.error(f"路径不存在: {input_prof_path}") + raise FileNotFoundError(f"路径不存在: {input_prof_path}") + + if not os.access(input_prof_path, os.R_OK): + logger.error(f"无读取权限: {input_prof_path}") + raise PermissionError(f"无读取权限: {input_prof_path}") + + statvfs = os.statvfs(input_prof_path) + disk_free_gb = statvfs.f_bavail * statvfs.f_frsize / (1024 ** 3) + + if disk_free_gb - prof_data_size_gb <= 50: + logger.error(f"磁盘空间不足: {disk_free_gb:.2f}GB, 输入数据大小: {prof_data_size_gb:.2f}GB") + raise BufferError(f"磁盘空间不足: {disk_free_gb:.2f}GB, 输入数据大小: {prof_data_size_gb:.2f}GB") + diff --git a/profiler/msprof_analyze/precheck/manager/distribute_manager.py b/profiler/msprof_analyze/precheck/manager/distribute_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..f35fdf45c6ad6245ebf3e1225faec41c6a0382c8 --- /dev/null +++ b/profiler/msprof_analyze/precheck/manager/distribute_manager.py @@ -0,0 +1,52 @@ +from copy import deepcopy + + +class DistributeManager: + def __init__(self, args): + self.master_addr = args.master_addr + self.master_port = args.master_port + self.nnodes = args.nnodes + self.nproc_per_node = args.nproc_per_node + self.node_rank = args.node_rank + + self.local_rank = 0 + self.rank = self.local_rank + self.node_rank * self.nproc_per_node + + 
self.world_size = self.nnodes * self.nproc_per_node
+        self.local_world_size = self.nproc_per_node
+
+        self.group_rank = -1
+
+    def __repr__(self):
+        """
+        Custom __repr__ method to print out the object in a human-readable format
+        """
+        return (f"DistributeManager(master_addr='{self.master_addr}', "
+                f"master_port='{self.master_port}', nnodes={self.nnodes}, "
+                f"nproc_per_node={self.nproc_per_node}, node_rank={self.node_rank}, "
+                f"local_rank={self.local_rank}, rank={self.rank}, "
+                f"world_size={self.world_size}, local_world_size={self.local_world_size}, "
+                f"group_rank={self.group_rank})")
+
+    def update_local_rank(self, local_rank: int):
+        self.local_rank = local_rank
+        self.rank = self.local_rank + self.node_rank * self.nproc_per_node
+        return deepcopy(self)
+
+    def get_dist_env_data(self):
+        self.rank = self.local_rank + self.node_rank * self.nproc_per_node
+
+        data = {
+            "MASTER_ADDR": self.master_addr,
+            "MASTER_PORT": self.master_port,
+            "LOCAL_RANK": self.local_rank,
+            "GROUP_RANK": self.group_rank,
+            "NODE_RANK": self.node_rank,
+            "RANK": self.rank,
+            "WORLD_SIZE": self.world_size,
+            "LOCAL_WORLD_SIZE": self.local_world_size,
+        }
+
+        return {key: str(value) for key, value in data.items()}
diff --git a/profiler/msprof_analyze/precheck/manager/group_manager.py b/profiler/msprof_analyze/precheck/manager/group_manager.py
new file mode 100644
index 0000000000000000000000000000000000000000..fd492bd3d8b54612ccf8401d2e1997f5a0908081
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/manager/group_manager.py
@@ -0,0 +1,123 @@
+import math
+import os
+import torch.distributed as dist
+
+from msprof_analyze.advisor.utils.utils import singleton
+
+
+class EnvGroup:
+    def __init__(self, rank, local_rank, world_size, master_addr, master_port, group_rank, local_world_size):
+        self.rank = rank
+        self.local_rank = local_rank
+        self.world_size = world_size
+        self.master_addr = master_addr
+        self.master_port = master_port
+        self.group_rank = group_rank
+        self.local_world_size = local_world_size
+        self.check_all_attribute()
+
+    def check_all_attribute(self):
+        if not isinstance(self.rank, int):
+            raise ValueError('rank must be an integer')
+
+        if not isinstance(self.local_rank, int):
+            raise ValueError('local_rank must be an integer')
+
+        if not isinstance(self.world_size, int):
+            raise ValueError('world_size must be an integer')
+
+        if not isinstance(self.master_addr, str):
+            raise ValueError('master_addr must be a string')
+
+        if not isinstance(self.master_port, int):
+            raise ValueError('master_port must be an integer')
+
+        if not isinstance(self.group_rank, int):
+            raise ValueError('group_rank must be an integer')
+
+        if not isinstance(self.local_world_size, int):
+            raise ValueError('local_world_size must be an integer')
+
+    def set_env(self):
+        os.environ["RANK"] = str(self.rank)
+        os.environ["LOCAL_RANK"] = str(self.local_rank)
+        os.environ["WORLD_SIZE"] = str(self.world_size)
+        os.environ["MASTER_ADDR"] = self.master_addr
+        os.environ["MASTER_PORT"] = str(self.master_port)
+        os.environ["GROUP_RANK"] = str(self.group_rank)
+        os.environ["LOCAL_WORLD_SIZE"] = str(self.local_world_size)
+
+
+class SubGroup:
+    def __init__(self, group, master_rank, ranks, file_sizes, file_hashes):
+        self.group = group
+        self.master_rank = master_rank
+        self.ranks = ranks
+        self.file_sizes = file_sizes
+        self.file_hashes = file_hashes
+        self.max_file_sizes = max(file_sizes)
+        self.split_file_size = None
+        self.splits = None
+        self.max_splits = None
+
+    def split_size(self, split_file_size):
+        self.split_file_size = split_file_size
+        self.splits = []
+        self.max_splits = math.ceil(self.max_file_sizes / split_file_size)
+        for file_size in self.file_sizes:
+            cur_splits = []
+            for _ in range(self.max_splits):
+                if file_size > 0:
+                    cur_splits.append(min(split_file_size, file_size))
+                else:
+                    cur_splits.append(0)
+                file_size -= split_file_size
+            self.splits.append(cur_splits)
+
+
+@singleton
+class GroupManager:
+    _initialized = False
+
+    def __init__(self):
+        if not self._initialized:
+            self._rank = int(os.environ['RANK'])
+            self._local_rank = int(os.environ['LOCAL_RANK'])
+            self._world_size = int(os.environ['WORLD_SIZE'])
+            self._group_rank = int(os.environ['GROUP_RANK'])
+            self._rank_size = int(os.environ['LOCAL_WORLD_SIZE'])
+            self._local_group = None
+            self._node_group = None
+            self._sub_group_dict = {}
+            self._initialized = True
+
+    def get_rank(self):
+        return self._rank
+
+    def get_local_rank(self):
+        return self._local_rank
+
+    def get_world_size(self):
+        return self._world_size
+
+    def get_rank_size(self):
+        return self._rank_size
+
+    def get_group_rank(self):
+        return self._group_rank
+
+    def get_local_group(self):
+        if self._local_group is None:
+            groups = list(range(self._group_rank * self._rank_size, (self._group_rank + 1) * self._rank_size))
+            self._local_group = dist.new_group(ranks=groups)
+        return self._local_group
+
+    def add_rank_sub_group(self, sub_group, ranks, file_sizes, file_hashes):
+        for rank in ranks:
+            self._sub_group_dict[rank] = SubGroup(group=sub_group, master_rank=ranks[0], ranks=ranks,
+                                                  file_sizes=file_sizes, file_hashes=file_hashes)
+
+    def get_rank_sub_group(self, rank):
+        return self._sub_group_dict.get(rank)
diff --git a/profiler/msprof_analyze/precheck/manager/task_manager.py b/profiler/msprof_analyze/precheck/manager/task_manager.py
new file mode 100644
index 0000000000000000000000000000000000000000..d08f7afdad2c624e210e506ae5897480c0e6ced7
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/manager/task_manager.py
@@ -0,0 +1,79 @@
+import os
+import logging
+import argparse
+
+from msprof_analyze.precheck.analyze.advisor_adaptor import advisor_adaptor
+from msprof_analyze.prof_common.path_manager import PathManager
+
+logger = logging.getLogger(__name__)
+
+
+class TaskManager:
+    ADVISOR = 'advisor'
+    supported_analyzer = {
+        ADVISOR: advisor_adaptor,
+    }
+
+    all_analyzer = list(supported_analyzer.keys())
+
+    @staticmethod
+    def add_analyzer(analyzer_name, analyzer_class):
+        if analyzer_name not in TaskManager.supported_analyzer:
+            TaskManager.supported_analyzer[analyzer_name] = analyzer_class
+            TaskManager.all_analyzer.append(analyzer_name)
+
+    @staticmethod
+    def get_analyzer(analyzer_name):
+        return TaskManager.supported_analyzer.get(analyzer_name)
+
+    @staticmethod
+    def get_result(analyzer_name, input_path, output):
+        if analyzer_name not in TaskManager.all_analyzer:
+            logger.error("Unknown analyzer %s, supported analyzers are %s", analyzer_name, TaskManager.all_analyzer)
+            raise ValueError(f"Unknown analyzer {analyzer_name}, supported analyzers are {TaskManager.all_analyzer}")
+
+        input_profiling_path_real = PathManager.get_realpath(input_path)
+        output_path_real = PathManager.get_realpath(output)
+        try:
+            analyze = TaskManager.get_analyzer(analyzer_name)
+            analyzer_instance = analyze()
+            analyzer_instance.analyze(input_profiling_path=input_profiling_path_real,
+                                      output_path=output_path_real)
+        except Exception as e:
+            logger.error("%s is skipped because an exception was encountered: %s", analyzer_name, e)
+
+
+def get_args():
+    parser = argparse.ArgumentParser(description="Profiler task manager")
+
+    # Add command-line arguments
+    parser.add_argument('--input_profiling_path', type=str,
+                        default=os.path.abspath("./result/"),
+                        help="Path to the input profiling data")
+    parser.add_argument('--output_path', type=str, default=os.path.abspath('../result'),
+                        help="Path to store the output results")
+
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    try:
+        # Get arguments from the command line
+        args = get_args()
+
+        # Use the command-line arguments or the default values
+        input_profiling_path = args.input_profiling_path
+        output_path = args.output_path
+        # Access all analyzers from the TaskManager
+        all_analyzer = TaskManager.all_analyzer
+
+        # Loop through all analyzers and fetch the results
+        for analyzer in all_analyzer:
+            TaskManager.get_result(analyzer_name=analyzer, input_path=input_profiling_path,
+                                   output=output_path)
+
+    except Exception as error:
+        logger.error("%s", error)
diff --git a/profiler/msprof_analyze/precheck/precheck.py b/profiler/msprof_analyze/precheck/precheck.py
new file mode 100644
index 0000000000000000000000000000000000000000..64c938983813ab35e07d4ef5208fbe5423808100
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/precheck.py
@@ -0,0 +1,33 @@
+# Copyright (c) 2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
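# A minimal usage sketch for the TaskManager registry above; MyTimelineAnalyzer
# and the paths are hypothetical, and a real analyzer is expected to expose the
# same analyze(input_profiling_path=..., output_path=...) contract as advisor_adaptor.
from msprof_analyze.precheck.manager.task_manager import TaskManager


class MyTimelineAnalyzer:
    def analyze(self, input_profiling_path, output_path):
        # a real analyzer would parse the profiling data and write results here
        print(f"analyzing {input_profiling_path} -> {output_path}")


TaskManager.add_analyzer('timeline', MyTimelineAnalyzer)
TaskManager.get_result('timeline', input_path='./result/', output='./output/')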
+from msprof_analyze.precheck.env_check.check_item_factory import CheckItemFactory
+
+
+class Precheck:
+
+    @staticmethod
+    def env_precheck(**kwargs):
+        check_type = kwargs.get("check_type")
+        if not check_type:
+            return
+        check_items = CheckItemFactory.get_check_item(check_type)
+        for check_item in check_items:
+            check_obj = check_item(**kwargs)
+            check_obj.check()
+
+
+if __name__ == '__main__':
+    Precheck.env_precheck(check_type="env_variable")
diff --git a/profiler/msprof_analyze/precheck/requirements.txt b/profiler/msprof_analyze/precheck/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8203bbe24f892e6d71b116939629c3582b2b1582
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/requirements.txt
@@ -0,0 +1,41 @@
+absl-py==2.1.0
+attrs==24.2.0
+auto-tune
+cloudpickle==3.0.0
+decorator==5.1.1
+filelock==3.15.4
+fsspec==2024.6.1
+MarkupSafe==2.1.5
+ml-dtypes==0.2.0
+mpmath==1.3.0
+networkx==3.1
+numpy==1.24.4
+psutil==6.0.0
+scipy==1.10.1
+sympy==1.13.2
+te
+torch_npu==2.4.0
+tornado==6.4.1
+typing_extensions==4.12.2
+
+## requirements for mstt advisor
+click
+tabulate
+jinja2
+PyYAML
+tqdm
+prettytable
+ijson
+requests
+xlsxwriter
+SQLAlchemy
+urllib3<2.0
+# bottleneck >= 1.3.6  # commented out intentionally
+pandas
+
+# To pre-download all packages with their full dependency closure (including sub-dependencies):
+# pip download -r requirements.txt -d pip_cache --no-deps
+# To install the dependencies from that cache in an offline environment:
+# pip install --no-index --find-links=file:///path/to/pip_cache -r requirements.txt
diff --git a/profiler/msprof_analyze/precheck/runner/__init__.py b/profiler/msprof_analyze/precheck/runner/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/profiler/msprof_analyze/precheck/runner/__main__.py b/profiler/msprof_analyze/precheck/runner/__main__.py
new file mode 100644
index 0000000000000000000000000000000000000000..8f031ae14c2b3610799e2824f4a2d7212ae19eae
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/runner/__main__.py
@@ -0,0 +1,160 @@
+import subprocess
+import sys
+import os
+import logging
+
+from msprof_analyze.precheck.common.constant import Constant
+from msprof_analyze.precheck.common.logger import add_file_handler, create_logger
+from msprof_analyze.precheck.common.utils import check_file_owner_and_permission, cn_now
+from msprof_analyze.precheck.manager.args_manager import PrecheckRunnerArgsManager
+from msprof_analyze.precheck.runner.runners import CollectorRunner, AdvisorRunner
+from msprof_analyze.precheck.manager.distribute_manager import DistributeManager
+from msprof_analyze.prof_common.path_manager import PathManager
+
+logging.basicConfig(level=Constant.LOGGING_LEVEL)
+logger = create_logger("msprof_analyze.precheck", Constant.LOGGING_LEVEL, use_memory_handler=True)
+
+
+def get_conda_envs_info(python_path=sys.executable):
+    """
+    Get the conda environment activation command based on the Python executable path.
+    For non-conda environments, returns a source ~/.bashrc command.
+
+    Args:
+        python_path (str): The path to the Python executable.
+
+    Returns:
+        tuple: A tuple containing (env_name, activation_command).
+ For conda: (env_name, "source /path/to/conda/bin/activate env_name") + For non-conda: (None, "source ~/.bashrc") + """ + try: + # Check if we're in a conda environment using CONDA_PREFIX + conda_prefix = os.environ.get('CONDA_PREFIX') + if conda_prefix: + conda_env = os.path.basename(conda_prefix) + conda_base = os.path.dirname(os.path.dirname(conda_prefix)) if 'envs' in conda_prefix else conda_prefix + activate_script = os.path.join(conda_base, "bin", "activate") + + if os.path.exists(activate_script): + check_file_owner_and_permission(activate_script) + return conda_env, f"source {activate_script} {conda_env}" + + # Fallback to path-based detection + CONDA_ENV_BASE_BIAS = 4 + path_splits = python_path.rsplit(os.path.sep, CONDA_ENV_BASE_BIAS) + + if len(path_splits) == CONDA_ENV_BASE_BIAS + 1: + conda_base_path, envs_str, conda_env, _, _ = path_splits + + if envs_str == 'envs': + activate_script = os.path.join(conda_base_path, "bin", "activate") + if os.path.exists(activate_script): + check_file_owner_and_permission(activate_script) + return conda_env, f"source {activate_script} {conda_env}" + + return None, "source ~/.bashrc" + + except Exception as e: + logger.warning("Failed to get conda environment info: %s. Falling back to source ~/.bashrc", str(e)) + return None, "source ~/.bashrc" + + +def start_precheck_runner(args: PrecheckRunnerArgsManager): + logger.info("Starting precheck runner with arguments: %s", args) + + dist_config = DistributeManager(args) + logger.info("Command line arguments: %s", sys.argv) + logger.info("Distributed configuration: %s", dist_config) + + profiler_res_dir_base = args.node_prof_save_dir + transporter_res_dir_base = args.master_prof_gather_dir + advisor_res_dir_base = args.master_prof_gather_dir + + PathManager.make_dir_safety(profiler_res_dir_base) + PathManager.make_dir_safety(transporter_res_dir_base) + PathManager.make_dir_safety(advisor_res_dir_base) + + prof_node_res_dir = profiler_res_dir_base + logger.info("Profiler results directory: %s", prof_node_res_dir) + + # start profiling + logger.info("Starting profiler runner") + env_name, conda_activate_cmd = get_conda_envs_info() + if env_name is None: + logger.warning("No conda environment found. 
Using system environment.")
+    else:
+        logger.info("Using conda environment: %s", env_name)
+
+    profiler_example_name = Constant.DEFAULT_PROFILING_COMMANDS.get(args.profiling_cmd, None)
+    if profiler_example_name is None:
+        profiling_cmd = [
+            "/bin/bash", "-ic",
+            f"{conda_activate_cmd} && cd {os.getcwd()} && "
+            f"MASTER_ADDR={dist_config.master_addr} MASTER_PORT={dist_config.master_port} "
+            f"NNODES={dist_config.nnodes} NODE_RANK={dist_config.node_rank} "
+            f"NPROC_PER_NODE={dist_config.nproc_per_node} "
+            f"{args.profiling_cmd}"
+        ]
+    else:
+        profiler_example_base = os.path.join(os.path.dirname(os.path.dirname(__file__)), "examples", "profiler")
+
+        profiling_cmd = [
+            "/bin/bash", "-ic",
+            f"{conda_activate_cmd} && cd {os.getcwd()} && "
+            f"torchrun "
+            f"--master_addr={dist_config.master_addr} "
+            f"--master_port={dist_config.master_port} "
+            f"--nproc_per_node={dist_config.nproc_per_node} "
+            f"--nnodes={dist_config.nnodes} "
+            f"--node_rank={dist_config.node_rank} "
+            f"{os.path.join(profiler_example_base, 'train_with_profiler.py')} "
+            f"--example_name {profiler_example_name} "
+            f"--prof_output_dir {prof_node_res_dir}"
+            + (" --static" if args.static else "")
+        ]
+
+    logger.info("Using profiling command: %s", ' '.join(profiling_cmd))
+    try:
+        logger.info("Executing profiling command...")
+        subprocess.run(profiling_cmd, check=True, capture_output=False, text=True)
+        logger.info("Profiling command completed successfully")
+    except subprocess.CalledProcessError as e:
+        logger.error("Profiling command failed with error: %s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING)
+        raise
+
+    # zip and transport to master
+    if args.prof_in_shared_storage:
+        logger.info("Skipping data collection as profiling data is in shared storage")
+        prof_gather_dir = prof_node_res_dir
+    else:
+        logger.info("Starting collector runner")
+        CollectorRunner(src_dir=prof_node_res_dir, des_dir=transporter_res_dir_base, config=dist_config).run()
+        prof_gather_dir = transporter_res_dir_base
+
+    # analyse the gathered files
+    if dist_config.rank == 0:
+        logger.info("Starting advisor runner")
+        AdvisorRunner(
+            src_dir=prof_gather_dir,
+            des_dir=advisor_res_dir_base,
+            config=dist_config,
+            is_shared_storage=args.prof_in_shared_storage
+        ).run()
+
+    logger.info("Completed precheck runner execution")
+
+
+def main(args=None):
+    global logger
+
+    if args is None:
+        raise ValueError("args must be provided by the precheck command line entry")
+
+    output_dir = os.path.join(args.output_dir, args.task_name)
+    PathManager.make_dir_safety(output_dir)
+
+    timestamp = cn_now().strftime('%Y%m%d_%H%M%S')
+    log_file_path = os.path.join(output_dir, f'precheck_runner_{timestamp}.log')
+    logger = add_file_handler(logger, log_file_path)
+
+    try:
+        start_precheck_runner(args)
+    except Exception as e:
+        logger.error("Precheck runner failed with error: %s", e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING)
diff --git a/profiler/msprof_analyze/precheck/runner/runners.py b/profiler/msprof_analyze/precheck/runner/runners.py
new file mode 100644
index 0000000000000000000000000000000000000000..f46dc398a7fe1a5d02733be3428c4fb30f649f43
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/runner/runners.py
@@ -0,0 +1,151 @@
+import os
+import subprocess
+import zipfile
+import glob
+import logging
+
+from msprof_analyze.precheck.common.constant import Constant
+from msprof_analyze.precheck.manager.distribute_manager import DistributeManager
+from msprof_analyze.precheck.tools.archive_utils import create_archive, extract_archive, ArchiveConfig, \
+    compare_directory_with_archive
+from msprof_analyze.prof_common.path_manager import 
PathManager + +logger = logging.getLogger(__name__) + + +class AdvisorRunner: + def __init__(self, src_dir, des_dir, config: DistributeManager, *args, **kwargs): + self.src_dir = src_dir + self.dest_dir = des_dir + self.config = config + self.is_shared_storage = kwargs.get('is_shared_storage', False) + + logger.info('%s init, args: %s, kwargs: %s', self.__class__.__name__, args, kwargs) + self.archive_extract_dir = os.path.join(self.dest_dir, 'prof_unzipped') + + def prepare_analysis_dir(self): + """Prepare directory for analysis, either by extracting archives or using source directly""" + if self.is_shared_storage: + logger.info("Using shared storage directory directly: %s", self.src_dir) + return self.src_dir + + logger.info("Preparing analysis directory by extracting archives") + PathManager.make_dir_safety(self.archive_extract_dir) + + archives_found = False + for root, _, files in os.walk(self.src_dir): + for file in files: + if any(file.endswith(ext) for ext in ['.zip', '.tar', '.tar.gz', '.tgz', ]): + archives_found = True + archive_path = os.path.join(root, file) + logger.info("Extracting archive: %s", archive_path) + extract_archive(archive_path, self.archive_extract_dir) + + if not archives_found: + logger.info("No archives found in %s, using source directory directly", self.src_dir) + return self.src_dir + + return self.archive_extract_dir + + def run(self): + if self.config.node_rank == 0 and self.config.local_rank == 0: + analysis_dir = self.prepare_analysis_dir() + self.run_analyzer(analysis_dir) + + def run_analyzer(self, analysis_dir): + """Find and process ascend_pt files in the analysis directory""" + + def call_analyzer(input_profiling_path, output_path): + from msprof_analyze.precheck.manager.task_manager import TaskManager + all_analyzer = TaskManager.all_analyzer + for analyzer in all_analyzer: + TaskManager.get_result(analyzer_name=analyzer, + input_path=input_profiling_path, + output=output_path) + + ascend_pt_dirs = glob.glob(os.path.join(analysis_dir, "*_ascend_pt"), recursive=False) + + if ascend_pt_dirs: + logger.info("Found %d ascend_pt directories in %s:", len(ascend_pt_dirs), analysis_dir) + for ascend_pt_dir in ascend_pt_dirs: + logger.debug("Found ascend_pt directory: %s", ascend_pt_dir) + + call_analyzer(analysis_dir, self.dest_dir) + else: + logger.warning("No ascend_pt files found in %s", analysis_dir) + + +class CollectorRunner: + def __init__(self, src_dir, des_dir, config: DistributeManager): + self.src_dir = os.path.abspath(src_dir) + self.des_dir = os.path.abspath(des_dir) + self.config = config + + logger.info('%s init', self.__class__.__name__) + + @staticmethod + def zip_directory(src_dir): + """Zip the specified directory.""" + zip_file_path = f"{src_dir}.zip" + + logger.info('Start zipping directory %s to %s', src_dir, zip_file_path) + + # Check if zip file already exists and contents match + if os.path.exists(zip_file_path): + logger.info('Found existing zip file: %s', zip_file_path) + logger.info('Comparing contents with source directory...') + + if compare_directory_with_archive(src_dir, zip_file_path): + logger.info('Existing zip matches source - reusing zip file') + return zip_file_path + + logger.info('Existing zip differs from source - creating new zip') + + # Create new zip file + create_archive(ArchiveConfig( + src_dir=src_dir, + output_path=zip_file_path, + whitelist=Constant.PROFILER_FILE_PATTERNS, + use_regex=True, + regex_fullmatch=False, + )) + + logger.info('Successfully created new zip file %s', zip_file_path) + + return 
zip_file_path
+
+    def run(self):
+        zip_file = self.zip_directory(self.src_dir)
+
+        self.transport(zip_file)
+
+    def transport(self, zip_file):
+        """Transport the zip file to the destination."""
+
+        def run_collector(input_file_dir, output_file_dir: str, config: DistributeManager):
+            args_dict = {
+                "input_file_dir": input_file_dir,
+                "output_file_dir": output_file_dir,
+                "nnodes": config.nnodes,
+                "node_rank": config.node_rank,
+                "master_addr": config.master_addr,
+                "master_port": config.master_port,
+                "master_rank_num": Constant.COLLECTOR_MASTER_RANK_NUM,
+                "split_file_size": Constant.COLLECTOR_SPLIT_FILE_SIZE,
+                "time_out": Constant.COLLECTOR_DEFAULT_TIMEOUT,
+                "log_file": None
+            }
+
+            from msprof_analyze.precheck.collect.collector import Collector
+            Collector().run(args_dict)
+
+        run_collector(zip_file, self.des_dir, self.config)
+
+        if self.config.node_rank == 0 or self.config.master_addr in Constant.LOCALHOST_ADDRESSES:
+            copy_command = ['cp', zip_file, self.des_dir]
+            logger.info("[rank=%s] %s", self.config.rank, copy_command)
+            subprocess.run(copy_command, check=True)
+
+        logger.info("[rank=%s] Successfully transferred %s to %s", self.config.rank, zip_file, self.des_dir)
diff --git a/profiler/msprof_analyze/precheck/tools/__init__.py b/profiler/msprof_analyze/precheck/tools/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/profiler/msprof_analyze/precheck/tools/archive_utils.py b/profiler/msprof_analyze/precheck/tools/archive_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..236413a3052174d07b7463ffb0d42042b50daa59
--- /dev/null
+++ b/profiler/msprof_analyze/precheck/tools/archive_utils.py
@@ -0,0 +1,274 @@
+import glob
+import os
+import zipfile
+import tarfile
+import logging
+import re
+import fnmatch
+from dataclasses import dataclass
+from typing import List, Optional
+
+from msprof_analyze.precheck.common.constant import Constant
+from msprof_analyze.prof_common.path_manager import PathManager
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class ArchiveConfig:
+    src_dir: str
+    output_path: str
+    use_tar: bool = False
+    whitelist: Optional[List[str]] = None
+    blacklist: Optional[List[str]] = None
+    use_regex: bool = False
+    regex_fullmatch: bool = True
+
+
+def create_archive(archive_args: ArchiveConfig):
+    """
+    Create a zip or tar archive from a source directory.
+
+    The archive will contain files from the source directory that match the whitelist
+    patterns (if specified) and don't match the blacklist patterns (if specified).
+    Patterns can be either glob patterns or regular expressions based on the use_regex flag.
+ + For regex patterns: + - If regex_fullmatch is True, the entire path must match the pattern + - If regex_fullmatch is False, the pattern can match anywhere in the path + + For glob patterns: + - Standard glob syntax is used (*, ?, [seq], [!seq]) + - Patterns are matched against the full relative path + + Args: + archive_args: Configuration object containing: + src_dir: Source directory to archive + output_path: Output path for the archive file + use_tar: If True create tar.gz, if False create zip + whitelist: List of patterns to include + blacklist: List of patterns to exclude + use_regex: If True use regex patterns, if False use glob + regex_fullmatch: If True require full regex match + + """ + + if not os.path.exists(archive_args.src_dir): + raise ValueError(f"Source directory '{archive_args.src_dir}' does not exist") + + save_dir = os.path.dirname(archive_args.output_path) + if not os.path.exists(save_dir): + raise ValueError(f"Destination directory '{save_dir}' does not exist") + + logger.info("Creating %s archive: %s", 'tar' if archive_args.use_tar else 'zip', archive_args.output_path) + logger.debug("Source directory: %s", archive_args.src_dir) + + if archive_args.use_regex: + if archive_args.whitelist: + whitelist = [re.compile(pattern) for pattern in archive_args.whitelist] + else: + whitelist = None + if archive_args.blacklist: + blacklist = [re.compile(pattern) for pattern in archive_args.blacklist] + else: + blacklist = None + else: + whitelist = archive_args.whitelist + blacklist = archive_args.blacklist + + def should_include_file(relative_path): + # Define pattern matching functions + def regex_fullmatch(pattern): + return pattern.fullmatch(relative_path) + + def regex_search(pattern): + return pattern.search(relative_path) + + def glob_match(pattern): + return fnmatch.fnmatch(relative_path, pattern) + + # Choose pattern matcher based on args + if archive_args.use_regex: + if archive_args.regex_fullmatch: + pattern_matcher = regex_fullmatch + else: + pattern_matcher = regex_search + else: + pattern_matcher = glob_match + + # Check blacklist first + if blacklist and any(map(pattern_matcher, blacklist)): + return False + + # If no whitelist, include all non-blacklisted files + if not whitelist: + return True + + # Check whitelist + return any(map(pattern_matcher, whitelist)) + + # Get all files in source directory recursively + abs_files = glob.glob(os.path.join(archive_args.src_dir, '**', '*'), recursive=True) + files = [os.path.relpath(file, archive_args.src_dir) for file in abs_files] + + files_to_add = [ + file for file_abs_path, file in zip(abs_files, files) + if should_include_file(file) and os.path.isfile(file_abs_path) + ] + + logger.info("Has found %d files to add at path: %s", len(files_to_add), archive_args.src_dir) + + # Process files based on archive type (tar or zip) + def add_files_to_tar(files_to_add): + with tarfile.open(archive_args.output_path, 'w:gz') as f: + for file in files_to_add: + file_path = os.path.join(archive_args.src_dir, file) + f.add(file_path, arcname=file) + + def add_files_to_zip(files_to_add): + with zipfile.ZipFile(archive_args.output_path, 'w', zipfile.ZIP_DEFLATED) as f: + for file in files_to_add: + file_path = os.path.join(archive_args.src_dir, file) + f.write(file_path, arcname=file) + + if archive_args.use_tar: + add_files_to_tar(files_to_add) + else: + add_files_to_zip(files_to_add) + + logger.info("Archive created successfully: %s", archive_args.output_path) + + +def _check_safe_zip(archive_file, max_archive_ratio=None, + 
max_size=Constant.MAX_ARCHIVE_SIZE, + max_file_count=Constant.MAX_ARCHIVE_FILE_COUNT, + ): + PathManager.check_path_readable(archive_file) + + archive_size = os.path.getsize(archive_file) + if max_archive_ratio is not None: + max_size = max(max_size, max_archive_ratio * archive_size) + + try: + with zipfile.ZipFile(archive_file, 'r') as zip_ref: + total_size = 0 + total_file_count = 0 + for info in zip_ref.infolist(): + total_size += info.file_size + total_file_count += 1 + if total_size > max_size: + raise RuntimeError("Archive size exceeds the limit") + if total_file_count > max_file_count: + raise RuntimeError("Archive file count exceeds the limit") + except (zipfile.BadZipFile, OSError) as e: + logger.error("Error reading zip file %s: %s", archive_file, e) + raise + + +def _check_safe_tar(archive_file, max_archive_ratio=None, + max_size=Constant.MAX_ARCHIVE_SIZE, + max_file_count=Constant.MAX_ARCHIVE_FILE_COUNT, + ): + PathManager.check_path_readable(archive_file) + + archive_size = os.path.getsize(archive_file) + if max_archive_ratio is not None: + max_size = max(max_size, max_archive_ratio * archive_size) + + try: + with tarfile.open(archive_file, 'r:*') as tar_ref: + total_size = 0 + total_file_count = 0 + for member in tar_ref.getmembers(): + total_size += member.size + total_file_count += 1 + if total_size > max_size: + raise RuntimeError("Archive size exceeds the limit") + if total_file_count > max_file_count: + raise RuntimeError("Archive file count exceeds the limit") + except (tarfile.TarError, OSError) as e: + logger.error("Error reading tar file %s: %s", archive_file, e) + raise + + +def _unzip(zip_file, extract_dir): + """Extract contents from a zip archive""" + + _check_safe_zip(zip_file, max_archive_ratio=Constant.MAX_ARCHIVE_RATIO) + with zipfile.ZipFile(zip_file, 'r') as zip_ref: + zip_ref.extractall(extract_dir) + logger.info("Unzipped %s to %s", zip_file, extract_dir) + + +def _untar(tar_file, extract_dir): + """Extract contents from a tar/tar.gz/tgz archive""" + + _check_safe_tar(tar_file, max_archive_ratio=Constant.MAX_ARCHIVE_RATIO) + with tarfile.open(tar_file, 'r:*') as tar_ref: # Auto-detect compression type + tar_ref.extractall(extract_dir) + logger.info("Untarred %s to %s", tar_file, extract_dir) + + +def extract_archive(archive_file, extract_dir): + """Extract contents from zip or tar archive files""" + + if archive_file.endswith('.zip'): + _unzip(archive_file, extract_dir) + elif archive_file.endswith('.tar') or archive_file.endswith('.tar.gz') or archive_file.endswith('.tgz'): + _untar(archive_file, extract_dir) + else: + logger.warning("Unsupported archive type: %s", archive_file) + + +def compare_directory_with_archive(src_dir: str, zip_file_path: str) -> bool: + """ + Compare contents of source directory with existing zip file. 
+ + Args: + src_dir: Source directory path + zip_file_path: Path to zip file + + Returns: + bool: True if contents match, False otherwise + """ + # Get source files info + src_files = {} + for file_path in glob.glob(os.path.join(src_dir, "**"), recursive=True): + if os.path.isfile(file_path): + rel_path = os.path.relpath(file_path, src_dir) + src_files[rel_path] = os.path.getsize(file_path) + + # Compare with zip contents + with zipfile.ZipFile(zip_file_path, 'r') as existing_zip: + zip_files = { + info.filename: info.file_size + for info in existing_zip.filelist + } + + return src_files == zip_files + + +if __name__ == '__main__': + logging.basicConfig(level=logging.INFO) + + # Example usage with fnmatch whitelist, blacklist + config = ArchiveConfig( + src_dir="profiler/msprof_analyze/precheck/runner", + output_path="profiler/msprof_analyze/precheck/runner.zip", + whitelist=[r"tools/*", r"profiler/*", r"tests/*"], # Only include files in these directories + blacklist=[r"*.pyc"], # Exclude .pyc files + use_regex=False, + ) + + create_archive(config) + + # Example usage with regex whitelist, blacklist + config = ArchiveConfig( + src_dir="profiler/msprof_analyze/precheck/runner", + output_path="profiler/msprof_analyze/precheck/runner_regex.zip", + whitelist=[r"tools/.*", r"profiler/.*", r"tests/.*"], + blacklist=[r".*\.pyc$"], + use_regex=True, + ) + + create_archive(config) diff --git a/profiler/msprof_analyze/precheck/tools/ssh_utils.py b/profiler/msprof_analyze/precheck/tools/ssh_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..c99c828d15ecda1ff13ad3ad7a2af885431a64e7 --- /dev/null +++ b/profiler/msprof_analyze/precheck/tools/ssh_utils.py @@ -0,0 +1,264 @@ +import getpass +import ipaddress +import os +import logging +import re +import subprocess +import shlex +from dataclasses import dataclass +from typing import List, Union + +from msprof_analyze.precheck.common.constant import Constant +from msprof_analyze.precheck.common.utils import cn_now +from msprof_analyze.prof_common.path_manager import PathManager + +logger = logging.getLogger(__name__) + + +@dataclass +class SSHConfig: + host: str + username: str + key_file: str + port: int = 22 + timeout: int = 3 + + def __post_init__(self): + """ Validate all fields after initialization """ + error = _check_ip_valid(self.host) + if error: + raise RuntimeError(f"Invalid host {self.host}: {error}") + + error = _check_int_range(self.port, min_value=1, max_value=Constant.ARG_MAX_INT_VALUE) + if error: + raise RuntimeError(f"Invalid port {self.port}: {error}") + + error = _check_ssh_key_file_valid(self.key_file) + if error: + raise RuntimeError(f"Invalid SSH key file {self.key_file}: {error}") + + error = _check_identifier_valid(self.username) + if error: + raise RuntimeError(f"Invalid username {self.username}: {error}") + + error = _check_int_range(self.timeout, min_value=1) + if error: + raise RuntimeError(f"Invalid timeout {self.timeout}: {error}") + + +def _check_ip_valid(ip: str) -> Union[Exception, None]: + try: + ipaddress.ip_address(ip) + except ValueError as e: + return e + return None + + +def _check_int_range( + value: int, min_value: int = Constant.ARG_MIN_INT_VALUE, max_value: int = Constant.ARG_MAX_INT_VALUE +) -> Union[Exception, None]: + if not (min_value <= value <= max_value): + return ValueError(f"The value must be between {min_value} and {max_value}.") + return None + + +def _check_identifier_valid(identifier: str) -> Union[Exception, None]: + pattern = r'^[a-zA-Z_][a-zA-Z0-9_-]*$' + if not 
re.match(pattern, identifier): + return ValueError(f"It must start with a letter or underscore, " + f"followed by any number of letters, digits, underscores, or dashes.") + return None + + +def _check_ssh_key_file_valid(ssh_key_file: str) -> Union[Exception, None]: + try: + expanded_path = os.path.expanduser(ssh_key_file) + stat_info = os.stat(expanded_path) + current_uid = os.getuid() + + # check file owner + if stat_info.st_uid != current_uid: + return ValueError(f"SSH key file {ssh_key_file} must be owned by the current user") + # check permissions to only read and write by owner + if stat_info.st_mode & 0o777 != 0o600: + return ValueError(f"SSH key file {ssh_key_file} must have permissions set to 600") + + return None + + except FileNotFoundError: + return ValueError(f"SSH key file {ssh_key_file} does not exist") + except PermissionError: + return ValueError(f"Permission denied when accessing SSH key file {ssh_key_file}") + + +def execute_ssh_command(config: SSHConfig, command: str) -> dict: + """ + Execute a command directly on a remote host using SSH without using tmux. + + Args: + config (SSHConfig): SSH configuration + command (str): Command to run on the remote host + + Returns: + dict: Dict containing command execution status and output with keys: + - success (bool): Whether the command was executed successfully + - output (str): Output from the command execution + """ + if not isinstance(config, SSHConfig): + raise ValueError("config must be an instance of SSHConfig") + + ssh_prefix = f"ssh -o ConnectTimeout={config.timeout} -p {config.port} {config.username}@{config.host}" + if config.key_file: + ssh_prefix += f" -i {config.key_file}" + + try: + result = subprocess.run([*shlex.split(ssh_prefix), command], capture_output=True, text=True, check=True) + return { + 'success': True, + 'output': result.stdout + } + except subprocess.CalledProcessError as e: + logger.error("SSH command failed on %s: %s", config.host, e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + return { + 'success': False, + 'output': e.stderr + } + + +def execute_ssh_command_in_tmux(config: SSHConfig, session_name: str, command: str) -> dict: + """ + Connect to remote host using system ssh command, start or update tmux session and run command + + Args: + config (SSHConfig): SSH configuration + session_name (str): Base name for tmux session + command (str): Command to run in tmux session + + Returns: + dict: Dict containing session info with keys: + - session_name (str): Name of tmux session + - win_name (str): Name of tmux window + - attach_cmd (str): Command to attach to tmux session + """ + if not isinstance(config, SSHConfig): + raise ValueError("config must be an instance of SSHConfig") + + error = _check_identifier_valid(session_name) + if error: + raise RuntimeError(f"Invalid session name {session_name}: {error}") + + win_name = cn_now().strftime("%H%M") + attach_cmd = "" + + try: + ssh_prefix = f"ssh -o ConnectTimeout={config.timeout} -p {config.port} {config.username}@{config.host}" + if config.key_file: + ssh_prefix += f" -i {config.key_file}" + + check_cmd = f"{ssh_prefix} 'tmux list-sessions | grep -q \"^{session_name}:\" && echo exists || echo new'" + result = subprocess.run(shlex.split(check_cmd), capture_output=True, text=True) + session_status = result.stdout.strip() + + escaped_command = command.replace("'", "\\'").replace('"', '\\"') + + tmux_cmd_suffix = f"script -f /tmp/tmux_output_{win_name} -c \"{escaped_command}\"; bash -i" + if session_status == "exists": + logger.info("Session '%s' exists 
on %s. Creating a new window with name '%s'.", + session_name, config.host, win_name) + tmux_cmd = f"tmux new-window -t {session_name} -n '{win_name}' '{tmux_cmd_suffix}'" + else: + logger.info( + "Session '%s' does not exist on %s. Creating a new session with name '%s'. " + "Creating a new window with name '%s'.", session_name, config.host, session_name, win_name) + tmux_cmd = f"tmux new-session -d -s {session_name} -n '{win_name}' '{tmux_cmd_suffix}'" + + logger.info("Running command to start session: %s", tmux_cmd) + + result = subprocess.run(shlex.split(ssh_prefix) + [tmux_cmd], capture_output=True, text=True, check=True) + + if result.stdout.strip(): + logger.info("Output from %s:\n%s", config.host, result.stdout) + + attach_cmd = f"tmux attach -t {session_name}:{win_name}" + logger.info('Session started. To attach to the session, run: "%s" in terminal on %s@%s', + attach_cmd, config.username, config.host) + + except Exception as e: + logger.error("Failed to connect to %s: %s", config.host, e, exc_info=Constant.ENABLE_STACKTRACE_LOGGING) + raise RuntimeError(f"Fail to start host {config.host}") from e + + return dict( + session_name=session_name, + win_name=win_name, + attach_cmd=attach_cmd, + ) + + +def run_remote_command(hosts_info: List[dict], session_name: str = None, using_tmux: bool = True) -> List[dict]: + """ + Execute specified commands on remote hosts using SSH, optionally within a tmux session. + + This function supports executing commands directly via SSH or within a tmux session for + better management of long-running processes. + + Args: + hosts_info (list of dict): Information about the hosts on which commands will be executed. + Each dictionary should contain: + - host (str): Hostname or IP address of the remote machine. + - username (str): SSH username for the remote host. + - key_filename (str, optional): Path to the SSH private key file. Defaults to '~/.ssh/id_rsa'. + - command (str): Command to be executed on the remote host. + - port (int, optional): SSH port number. Defaults to 22. + session_name (str, optional): Name to be used for the tmux session, if using tmux. Automatically generated + if not provided. + using_tmux (bool): Whether to execute the command within a tmux session. Defaults to True. + + Returns: + list of dict: Results from each host, with each dictionary containing: + - session_name (str): Name of the tmux session (if used). + - win_name (str): Name of the tmux window (if used). + - attach_cmd (str): Command to attach to the tmux session (if used). 
+ """ + user = getpass.getuser() + if session_name is None: + session_name = f"auto_{user}_{cn_now().strftime('%m%d')}" + + results = [] + + for host_info in hosts_info: + config = SSHConfig( + host=host_info["host"], + username=host_info["username"], + key_file=host_info.get("key_filename", "~/.ssh/id_rsa"), + port=host_info.get("port", 22) + ) + config.key_file = PathManager.expanduser_for_argumentparser(config.key_file) + if using_tmux: + results.append(execute_ssh_command_in_tmux(config, session_name, host_info["command"])) + else: + results.append(execute_ssh_command(config, host_info["command"])) + + return results + + +def main(): + hosts = [{ + "host": "127.0.0.1", + "username": os.getenv("USER"), + "key_filename": "~/.ssh/id_ed25519", + "command": f"echo Hello!", + "port": 22 + }] + + run_remote_command(hosts) + + +if __name__ == "__main__": + logging.basicConfig( + level=logging.DEBUG, + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", + handlers=[ + logging.StreamHandler(), + ] + ) + main() diff --git a/profiler/msprof_analyze/prof_common/__init__.py b/profiler/msprof_analyze/prof_common/__init__.py index c2764ec2a520567abc0c7d119b222f5fea7c3b72..8b7e7544bb1bd466a9b223cb1f706422bcab9435 100644 --- a/profiler/msprof_analyze/prof_common/__init__.py +++ b/profiler/msprof_analyze/prof_common/__init__.py @@ -14,4 +14,4 @@ # limitations under the License. import os import sys -sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))) +sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))) \ No newline at end of file diff --git a/profiler/msprof_analyze/prof_common/base_node.py b/profiler/msprof_analyze/prof_common/base_node.py new file mode 100644 index 0000000000000000000000000000000000000000..e96c5521ca11b778e277df1d17fb26a88f9f988f --- /dev/null +++ b/profiler/msprof_analyze/prof_common/base_node.py @@ -0,0 +1,82 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
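# A minimal usage sketch for run_remote_command above; the hosts, user and key
# path are placeholders, and using_tmux=False runs the command over plain SSH
# instead of inside a tmux session.
from msprof_analyze.precheck.tools.ssh_utils import run_remote_command

hosts = [
    {"host": "192.0.2.10", "username": "hpcuser", "key_filename": "~/.ssh/id_rsa",
     "command": "hostname && npu-smi info", "port": 22},
    {"host": "192.0.2.11", "username": "hpcuser", "key_filename": "~/.ssh/id_rsa",
     "command": "hostname && npu-smi info", "port": 22},
]

for result in run_remote_command(hosts, using_tmux=False):
    print(result["success"], result["output"])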
+from math import ceil +from queue import Queue + +from decimal import Decimal + +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.trace_event_bean import TraceEventBean + + +class BaseNode: + def __init__(self, event: TraceEventBean, parent_node=None): + self._event = event + self._parent_node = parent_node + self._child_nodes = [] + + @property + def parent_node(self): + return self._parent_node + + @property + def child_nodes(self): + return self._child_nodes + + @property + def name(self): + return self._event.name + + @property + def start_time(self) -> Decimal: + return self._event.start_time + + @property + def end_time(self) -> Decimal: + return self._event.end_time + + @parent_node.setter + def parent_node(self, parent_node): + self._parent_node = parent_node + + def update_child_nodes(self, node): + self._child_nodes.append(node) + + def binary_search(self, ts_time): + if not self.child_nodes: + return Constant.INVALID_RETURN + right = len(self.child_nodes) - 1 + left = 0 + while right > left: + mid = left + ceil((right - left) / 2) + if ts_time >= self.child_nodes[mid].start_time: + left = mid + else: + right = mid - 1 + if self.child_nodes[left].start_time < ts_time < self.child_nodes[left].end_time: + return self.child_nodes[left] + return Constant.INVALID_RETURN + + def find_all_child_nodes(self) -> list: + result_data = [] + node_queue = Queue() + for child_node in self.child_nodes: + node_queue.put(child_node) + while not node_queue.empty(): + tree_node = node_queue.get() + result_data.append(tree_node) + for child_node in tree_node.child_nodes: + node_queue.put(child_node) + return result_data diff --git a/profiler/msprof_analyze/prof_common/constant.py b/profiler/msprof_analyze/prof_common/constant.py index 5353fc6d40f25cee9f1a10e9b734dc95573b3b2c..d77589dbcc0ebfe865c661c4d54292e05702bc9f 100644 --- a/profiler/msprof_analyze/prof_common/constant.py +++ b/profiler/msprof_analyze/prof_common/constant.py @@ -114,6 +114,37 @@ class Constant(object): DB = "db" INVALID = "invalid" + # profiler db tables + TABLE_AICORE_FREQ = "AICORE_FREQ" + TABLE_CANN_API = "CANN_API" + TABLE_COMMUNICATION_OP = "COMMUNICATION_OP" + TABLE_COMMUNICATION_TASK_INFO = "COMMUNICATION_TASK_INFO" + TABLE_COMPUTE_TASK_INFO = "COMPUTE_TASK_INFO" + TABLE_CONNECTION_IDS = "CONNECTION_IDS" + TABLE_CONNECTION_CATS = "connectionCats" + TABLE_ENUM_API_TYPE = "ENUM_API_TYPE" + TABLE_ENUM_HCCL_DATA_TYPE = "ENUM_HCCL_DATA_TYPE" + TABLE_ENUM_HCCL_LINK_TYPE = "ENUM_HCCL_LINK_TYPE" + TABLE_ENUM_HCCL_RDMA_TYPE = "ENUM_HCCL_RDMA_TYPE" + TABLE_ENUM_TRANSPORT_TYPE = "ENUM_TRANSPORT_TYPE" + TABLE_ENUM_MODULE = "ENUM_MODULE" + TABLE_MSTX_EVENT_TYPE = "MSTX_EVENT_TYPE" + TABLE_HOST_INFO = "HOST_INFO" + TABLE_META_DATA = "META_DATA" + TABLE_NPU_INFO = "NPU_INFO" + TABLE_OVERLAP_ANALYSIS = "OVERLAP_ANALYSIS" + TABLE_PYTORCH_API = "PYTORCH_API" + TABLE_RANK_DEVICE_MAP = "RANK_DEVICE_MAP" + TABLE_SESSION_TIME_INFO = "SESSION_TIME_INFO" + TABLE_STATUS_INFO = "status_info" + TABLE_STEP_TIME = "STEP_TIME" + TABLE_STRING_IDS = "STRING_IDS" + TABLE_TASK = "TASK" + TABLE_TASK_MPU_INFO = "TASK_MPU_INFO" + + # export_type + NOTEBOOK = "notebook" + # db name DB_COMMUNICATION_ANALYZER = "analysis.db" DB_CLUSTER_COMMUNICATION_ANALYZER = "cluster_analysis.db" @@ -126,14 +157,24 @@ class Constant(object): TABLE_HOST_INFO = "HostInfo" TABLE_RANK_DEVICE_MAP = "RankDeviceMap" TABLE_CLUSTER_BASE_INFO = "ClusterBaseInfo" + TABLE_CLUSTER_TIME_SUMMARY = "ClusterTimeSummary" # data config key CONFIG = 
"config" EXPER_CONFIG = "experimental_config" EXPER_EXPORT_TYPE = "_export_type" + EXPORT_TYPE = "_export_type" # metadata key DISTRIBUTED_ARGS = "distributed_args" + PARALLEL_GROUP_INFO = "parallel_group_info" + + # parallel_info_key + GROUP_NAME = "group_name" + GLOBAL_RANKS = "global_ranks" + + # group name value + PP = "pp" # mode ALL = "all" @@ -247,6 +288,10 @@ class Constant(object): VOID_STEP = -1 + # communication task type + NOTIFY_RECORD = "Notify_Record" + NOTIFY_WAIT = "Notify_Wait" + # advisor # timeline @@ -352,6 +397,7 @@ class Constant(object): PT_PROF_SUFFIX = "ascend_pt" ASCEND_PROFILER_OUTPUT = "ASCEND_PROFILER_OUTPUT" + KERNEL_DETAILS_CSV = "kernel_details.csv" CLUSTER_STEP_TIME_CSV = "cluster_step_trace_time.csv" CLUSTER_COMM_JSON = "cluster_communication.json" COMMUNICATION_JSON = "communication.json" @@ -379,6 +425,7 @@ class Constant(object): # Unit Conversion COMMUNICATION_B_TO_GB = 0.001 ** 3 US_TO_S = 0.001 ** 2 + TIME_UNIT_SCALE = 1000 WRITE_MODES = stat.S_IWUSR | stat.S_IRUSR | stat.S_IRGRP WRITE_FLAGS = os.O_WRONLY | os.O_CREAT | os.O_TRUNC @@ -396,9 +443,9 @@ class Constant(object): OPERATOR_TYPE = 1 VIRTUAL_TYPE = 9 - # json trace bar + # trace bar NPU_BAR = "Ascend Hardware" - COMM_BAR = "Communication" + HCCL_BAR = "HCCL" OVERLAP_BAR = "Overlap Analysis" # overlap_analysis event @@ -421,13 +468,13 @@ class Constant(object): CONCURRENT_MODE = "concurrent" PROFILER_DB_PATH = "profiler_db_path" + ANALYSIS_DB_PATH = "analysis_db_path" RANK_LIST = "rank_list" EXPORT_TYPE = "export_type" EXTRA_ARGS = "args" - STEP_RANGE = "step_range" - START_NS = "startNs" - END_NS = "endNs" # hccl_sum UINT32_BITS = 32 - UINT32_MASK = 0xffffffff \ No newline at end of file + UINT32_MASK = 0xffffffff + + INVALID_RANK_NUM = 4294967295 diff --git a/profiler/msprof_analyze/prof_common/database_service.py b/profiler/msprof_analyze/prof_common/database_service.py index 6b776d4d957a9491aeb5690cf456038c114c3590..1e51b787dcb3e2911f3d0795fefd95cd34bb68af 100644 --- a/profiler/msprof_analyze/prof_common/database_service.py +++ b/profiler/msprof_analyze/prof_common/database_service.py @@ -16,37 +16,13 @@ import pandas as pd from msprof_analyze.prof_common.db_manager import DBManager from msprof_analyze.prof_common.logger import get_logger -from msprof_analyze.prof_common.constant import Constant logger = get_logger() class DatabaseService: - TABLE_TS_DICT = { - "TASK": "startNs", - "COMMUNICATION_OP": "startNs", - "CANN_API": "startNs", - "PYTORCH_API": "startNs", - "MSTX_EVENTS": "startNs", - "GC_RECORD": "startNs", - "ACC_PMU": "timestampNs", - "NIC": "timestampNs", - "RoCE": "timestampNs", - "LLC": "timestampNs", - "SAMPLE_PMU_TIMELINE": "timestampNs", - "NPU_MEM": "timestampNs", - "NPU_MODULE_MEM": "timestampNs", - "NPU_OP_MEM": "timestampNs", - "HBM": "timestampNs", - "DDR": "timestampNs", - "HCCS": "timestampNs", - "PCIE": "timestampNs", - "AICORE_FREQ": "timestampNs" - } - - def __init__(self, db_path, step_range): + def __init__(self, db_path): self._db_path = db_path - self._step_range = step_range self._table_info = {} def add_table_for_query(self, table_name: str, columns=None): @@ -72,12 +48,7 @@ class DatabaseService: logger.warning(f"This table {table_name} does not exist in this database {self._db_path}.") continue columns_str = "*" if not columns else ",".join(columns) - if table_name in self.TABLE_TS_DICT and self._step_range: - where_str = f"where {self.TABLE_TS_DICT.get(table_name)} >= {self._step_range.get(Constant.START_NS)}" \ - f" and 
{self.TABLE_TS_DICT.get(table_name)} <= {self._step_range.get(Constant.END_NS)}" - else: - where_str = "" - query_sql = f"select {columns_str} from {table_name} {where_str}" + query_sql = f"select {columns_str} from {table_name}" try: data = pd.read_sql(query_sql, conn) result_data[table_name] = data diff --git a/profiler/msprof_analyze/prof_common/db_manager.py b/profiler/msprof_analyze/prof_common/db_manager.py index ac24ec8144f7a67c1796906d7e75ab25a7a7f71c..8740499c27edc9562ad2861b5da8d1a21f02dd0c 100644 --- a/profiler/msprof_analyze/prof_common/db_manager.py +++ b/profiler/msprof_analyze/prof_common/db_manager.py @@ -143,6 +143,41 @@ class DBManager: logger.error("conn is invalid param") return False + @staticmethod + def execute_sql(conn: any, sql: str, params: any = None) -> bool: + """ + execute sql + """ + try: + if isinstance(conn, sqlite3.Connection): + if params: + conn.cursor().execute(sql, params) + else: + conn.cursor().execute(sql) + conn.commit() + return True + except sqlite3.Error as err: + logger.error(err) + return False + logger.error("conn is invalid param") + return False + + @staticmethod + def executemany_sql(conn: any, sql: str, params: any) -> bool: + """ + execute many sql once + """ + try: + if isinstance(conn, sqlite3.Connection): + conn.cursor().executemany(sql, params) + conn.commit() + return True + except sqlite3.Error as err: + logger.error(err) + return False + logger.error("conn is invalid param") + return False + @classmethod def check_tables_in_db(cls, db_path: any, *tables: any) -> bool: if check_db_path_valid(db_path): @@ -249,6 +284,21 @@ class DBManager: cls.insert_data_into_table(conn, table_name, data) cls.destroy_db_connect(conn, curs) + @classmethod + def check_columns_exist(cls, curs: any, table_name: str, columns: set) -> any: + """ + check columns exist in table, return empty set if none of them exist, else return the set of existing columns + """ + if not isinstance(curs, sqlite3.Cursor): + return None + try: + curs.execute(f"PRAGMA table_info({table_name})") + table_columns = {col[1] for col in curs.fetchall()} + return columns & table_columns + except sqlite3.Error as err: + logger.error(err) + return None + class CustomizedDictFactory: @staticmethod diff --git a/profiler/msprof_analyze/prof_common/file_reader.py b/profiler/msprof_analyze/prof_common/file_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..313933ba7f9334d8ce9273824aeba565c379a1cc --- /dev/null +++ b/profiler/msprof_analyze/prof_common/file_reader.py @@ -0,0 +1,86 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
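# A minimal usage sketch for the DBManager helpers added above; demo.db and the
# demo table are placeholders.
import sqlite3

from msprof_analyze.prof_common.db_manager import DBManager

conn = sqlite3.connect("demo.db")
DBManager.execute_sql(conn, "CREATE TABLE IF NOT EXISTS demo (id INTEGER, name TEXT)")
DBManager.executemany_sql(conn, "INSERT INTO demo VALUES (?, ?)", [(1, "matmul"), (2, "allreduce")])
# returns the subset of the requested columns that actually exist in the table
print(DBManager.check_columns_exist(conn.cursor(), "demo", {"id", "name", "missing"}))
conn.close()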
+import csv
+import json
+import logging
+import os
+
+from msprof_analyze.prof_common.path_manager import PathManager
+from msprof_analyze.prof_common.constant import Constant
+
+
+class FileReader:
+    DATA_FILE_AUTHORITY = 0o640
+    DATA_DIR_AUTHORITY = 0o750
+
+    @classmethod
+    def read_json_file(cls, file_path: str) -> any:
+        PathManager.check_path_readable(file_path)
+        if not os.path.isfile(file_path):
+            raise FileNotFoundError("File does not exist.")
+        file_size = os.path.getsize(file_path)
+        if file_size <= 0:
+            return []
+        if file_size > Constant.MAX_FILE_SIZE_5_GB:
+            msg = f"The file({file_path}) size exceeds the preset max value, failed to read the file."
+            raise RuntimeError(msg)
+        try:
+            with open(file_path, "rt") as file:
+                json_data = json.loads(file.read())
+        except Exception as e:
+            msg = f"Can't read file: {file_path}"
+            raise RuntimeError(msg) from e
+        return json_data
+
+    @classmethod
+    def write_json_file(cls, output_path: str, data: dict, file_name: str, format_json: bool = False) -> None:
+        if not data:
+            return
+        output_file = os.path.join(output_path, file_name)
+        PathManager.check_path_writeable(output_path)
+        try:
+            with os.fdopen(
+                    os.open(output_file, os.O_WRONLY | os.O_CREAT, cls.DATA_FILE_AUTHORITY), 'w'
+            ) as file:
+                indent = 4 if format_json else None
+                file.write(json.dumps(data, indent=indent))
+        except Exception as e:
+            raise RuntimeError(f"Can't create the file: {output_file}") from e
+
+    @classmethod
+    def read_csv_file(cls, file_path: str, bean_class: any = None) -> any:
+        PathManager.check_path_readable(file_path)
+        if not os.path.isfile(file_path):
+            raise FileNotFoundError("File does not exist.")
+        file_size = os.path.getsize(file_path)
+        if file_size <= 0:
+            return []
+        if file_size > Constant.MAX_FILE_SIZE_5_GB:
+            check_msg = input(
+                f"The file({file_path}) size exceeds the preset max value. Continue reading the file? [y/n]")
+            if check_msg.lower() != "y":
+                logging.warning("The user chose not to read the file: %s", file_path)
+                return []
+        result_data = []
+        try:
+            with open(file_path, newline="") as csv_file:
+                reader = csv.DictReader(csv_file)
+                for row in reader:
+                    row_data = bean_class(row) if bean_class else row
+                    result_data.append(row_data)
+        except Exception as e:
+            msg = f"Failed to read the file: {file_path}"
+            raise RuntimeError(msg) from e
+        return result_data
diff --git a/profiler/msprof_analyze/prof_common/kernel_bean.py b/profiler/msprof_analyze/prof_common/kernel_bean.py
new file mode 100644
index 0000000000000000000000000000000000000000..f1c90895fc4bf78dc6b7c98bc6d7d781b7308b38
--- /dev/null
+++ b/profiler/msprof_analyze/prof_common/kernel_bean.py
@@ -0,0 +1,47 @@
+# Copyright (c) 2024 Huawei Technologies Co., Ltd
+# All rights reserved.
+#
+# Licensed under the BSD 3-Clause License (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://opensource.org/licenses/BSD-3-Clause
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
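# A minimal usage sketch for FileReader above; the paths are placeholders.
from msprof_analyze.prof_common.file_reader import FileReader

comm_data = FileReader.read_json_file("./ASCEND_PROFILER_OUTPUT/communication.json")
FileReader.write_json_file("./output", {"rank": 0, "ops": len(comm_data)}, "summary.json", format_json=True)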
+from msprof_analyze.prof_common.utils import convert_to_decimal + + +class KernelBean: + def __init__(self, data: dict): + self._name = data.get("Name", "") + self._op_type = data.get("Type", "") + self._core_type = data.get("Accelerator Core", "") + self._input_shape = data.get("Input Shapes", "").replace("\"", "") + self._input_type = data.get("Input Data Types", "") + self._input_format = data.get("Input Formats", "") + self._duration = data.get("Duration(us)", 0) + self._ts = data.get("Start Time(us)", "") + + @property + def start_time(self): + return convert_to_decimal(self._ts) + + @property + def end_time(self): + return self.start_time + convert_to_decimal(self.dur) + + @property + def is_computing_op(self): + return self._core_type != "HCCL" + + @property + def dur(self): + return float(self._duration) + + @property + def kernel_info(self): + return [self._name, self._op_type, self._core_type, self._input_shape, self._input_type, self.dur] diff --git a/profiler/msprof_analyze/prof_common/trace_event_bean.py b/profiler/msprof_analyze/prof_common/trace_event_bean.py new file mode 100644 index 0000000000000000000000000000000000000000..ea78b54df57f8a1d72517baf2c48748b13ab7847 --- /dev/null +++ b/profiler/msprof_analyze/prof_common/trace_event_bean.py @@ -0,0 +1,113 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
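# A minimal usage sketch that feeds kernel_details.csv rows into the KernelBean
# above via FileReader.read_csv_file; the csv path is a placeholder.
from msprof_analyze.prof_common.file_reader import FileReader
from msprof_analyze.prof_common.kernel_bean import KernelBean

kernels = FileReader.read_csv_file("./ASCEND_PROFILER_OUTPUT/kernel_details.csv", KernelBean)
compute_time_us = sum(kernel.dur for kernel in kernels if kernel.is_computing_op)
print(f"total compute kernel time: {compute_time_us} us")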
+from decimal import Decimal + +from msprof_analyze.prof_common.constant import Constant +from msprof_analyze.prof_common.utils import convert_to_decimal +from msprof_analyze.prof_common.analyze_dict import AnalyzeDict + + +class TraceEventBean(AnalyzeDict): + def __init__(self, data: dict, unique_id: str = None): + super().__init__(data) + self._id = unique_id + self._type = None + self._start_time = convert_to_decimal(self.ts) if self.ts else 0 + self._end_time = self._start_time + convert_to_decimal(self.dur) if self.dur else 0 + self._fwd_bwd_id = None + + @property + def unique_id(self): + return self._id + + @property + def start_time(self) -> Decimal: + return self._start_time + + @property + def step_id(self) -> int: + return self.name.split("#")[-1] + + @property + def end_time(self) -> Decimal: + return self._end_time + + @property + def kernel_info(self): + return [self.name, self.args.get("Task Type", ""), self.dur] + + @property + def event_type(self): + return self._type + + @property + def fwd_bwd_id(self): + return self._fwd_bwd_id + + @event_type.setter + def event_type(self, event_type): + self._type = event_type + + @fwd_bwd_id.setter + def fwd_bwd_id(self, fwd_bwd_id): + self._fwd_bwd_id = fwd_bwd_id + + def set_id(self, name_id): + self._id = name_id + + def is_cpu_op(self): + return self.cat == "cpu_op" + + def is_optimizer(self): + return self.cat == "cpu_op" and self.name.lower().startswith("optimizer") + + def is_nn_module(self): + return self.cat == "python_function" and self.name.lower().startswith("nn.module") + + def is_step(self): + return self.name.lower().startswith("profilerstep#") + + def is_torch_to_npu(self): + return self.cat == "async_npu" + + def is_fwd_bwd_flow(self): + return self.cat == "fwdbwd" + + def is_flow_start(self): + return self.ph == "s" + + def is_flow_end(self): + return self.ph == "f" + + def is_meta(self): + return self.ph == "M" + + def is_kernel_event(self, kernel_pid): + return self.ph == "X" and self.pid == kernel_pid + + def is_hccl_event(self, hccl_pid): + return self.ph == "X" and self.pid == hccl_pid and self.name.startswith("hcom_") + + def is_overlap_analysis_event(self, overlap_analysis_pid): + return self.ph == "X" and self.pid == overlap_analysis_pid + + def is_npu_process(self): + return self.ph == "M" and self.name == "process_name" and self.args.get("name", "") == Constant.NPU_BAR + + def is_hccl_process(self): + return self.ph == "M" and self.name == "process_name" and self.args.get("name", "") == Constant.HCCL_BAR + + def is_overlap_analysis_process(self): + return self.ph == "M" and self.name == "process_name" and self.args.get("name", "") == Constant.OVERLAP_BAR diff --git a/profiler/msprof_analyze/prof_common/tree_builder.py b/profiler/msprof_analyze/prof_common/tree_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..34b056e71bd9880cea0e3402da699a4fbadd150a --- /dev/null +++ b/profiler/msprof_analyze/prof_common/tree_builder.py @@ -0,0 +1,37 @@ +# Copyright (c) 2024 Huawei Technologies Co., Ltd +# All rights reserved. +# +# Licensed under the BSD 3-Clause License (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://opensource.org/licenses/BSD-3-Clause +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from msprof_analyze.prof_common.trace_event_bean import TraceEventBean
+
+
+class TreeBuilder:
+    @staticmethod
+    def build_tree(event_list: list, node_class: any, root_bean: any):
+        root_node = node_class(root_bean)
+        all_nodes = [root_node] + [None] * len(event_list)
+        event_list.sort(key=lambda x: x.start_time)
+        last_node = root_node
+        index = 1
+        for event in event_list:
+            while last_node:
+                if last_node != root_node and event.start_time > last_node.end_time:
+                    last_node = last_node.parent_node
+                    continue
+                tree_node = node_class(event, last_node)
+                last_node.update_child_nodes(tree_node)
+                all_nodes[index] = tree_node
+                last_node = tree_node
+                index += 1
+                break
+        return all_nodes
diff --git a/profiler/msprof_analyze/prof_common/utils.py b/profiler/msprof_analyze/prof_common/utils.py
index 005d8505c9ccd750d4856518961c62b4407eea1e..284c17c86e36b8fb87d2ea73ed7e3089f44fcbb6 100644
--- a/profiler/msprof_analyze/prof_common/utils.py
+++ b/profiler/msprof_analyze/prof_common/utils.py
@@ -12,13 +12,14 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-
 import configparser
 import os
 from email.utils import parseaddr
 from typing import Dict, List
 from urllib.parse import urlparse
 
+from decimal import Decimal
+
 from msprof_analyze.prof_common.logger import get_logger
 from msprof_analyze.prof_common.path_manager import PathManager
 
@@ -85,6 +86,15 @@ def convert_to_float(num):
     return 0
 
 
+def convert_to_decimal(data: any) -> Decimal:
+    try:
+        decimal_value = Decimal(data)
+    except Exception:
+        logger.error('Invalid profiling data: failed to convert value to decimal.')
+        return Decimal(0)
+    return decimal_value
+
+
 def convert_to_int(num):
     try:
         return int(num)
diff --git a/profiler/msprof_analyze/prof_exports/base_stats_export.py b/profiler/msprof_analyze/prof_exports/base_stats_export.py
index 65ccd69ecde0acb296308e0c37bec3323468ae34..59d58bdff5485a6ace0f2c12dadbf543ecd4b978 100644
--- a/profiler/msprof_analyze/prof_exports/base_stats_export.py
+++ b/profiler/msprof_analyze/prof_exports/base_stats_export.py
@@ -24,11 +24,11 @@ logger = get_logger()
 
 class BaseStatsExport:
 
-    def __init__(self, db_path, analysis_class, step_range):
+    def __init__(self, db_path, analysis_class):
         self._db_path = db_path
         self._analysis_class = analysis_class
-        self._step_range = step_range
         self._query = None
+        self.mode = Constant.ANALYSIS
 
     def get_query(self):
         return self._query
@@ -39,10 +39,10 @@
             if query is None:
                 logger.error("query is None.")
                 return None
-            conn, cursor = DBManager.create_connect_db(self._db_path, Constant.ANALYSIS)
+            conn, cursor = DBManager.create_connect_db(self._db_path, self.mode)
             data = pd.read_sql(query, conn)
             DBManager.destroy_db_connect(conn, cursor)
             return data
         except Exception as e:
             logger.error(f"File {self._db_path} read failed error: {e}")
-            return None
+            return None
\ No newline at end of file
diff --git a/profiler/msprof_analyze/prof_exports/cann_api_sum_export.py b/profiler/msprof_analyze/prof_exports/cann_api_sum_export.py
index 0d3da94a001609cdbaed7d3f4646dc908d2b8c23..efdba81e94360e7f8e88801711fb2ff72fa5b47f 100644
--- a/profiler/msprof_analyze/prof_exports/cann_api_sum_export.py
+++ b/profiler/msprof_analyze/prof_exports/cann_api_sum_export.py
@@ -14,7 +14,6 @@
 # limitations under the License.
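# A minimal usage sketch for TreeBuilder.build_tree above, pairing it with the
# BaseNode and TraceEventBean classes added earlier in this change; the events
# are synthetic.
from msprof_analyze.prof_common.base_node import BaseNode
from msprof_analyze.prof_common.trace_event_bean import TraceEventBean
from msprof_analyze.prof_common.tree_builder import TreeBuilder

root = TraceEventBean({"name": "root", "ts": "0", "dur": "100"})
events = [TraceEventBean({"name": "fwd", "ts": "10", "dur": "20"}),
          TraceEventBean({"name": "matmul", "ts": "12", "dur": "5"})]
nodes = TreeBuilder.build_tree(events, BaseNode, root)
print([node.name for node in nodes])  # ['root', 'fwd', 'matmul']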
from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport -from msprof_analyze.prof_common.constant import Constant QUERY = """ WITH @@ -32,7 +31,6 @@ WITH upper_quartile(endNs - startNs) AS upper_quartile_duration FROM CANN_API - {} GROUP BY name ), totals AS ( @@ -62,14 +60,6 @@ ORDER BY 2 DESC; class CannApiSumExport(BaseStatsExport): - def __init__(self, db_path, recipe_name, step_range): - super().__init__(db_path, recipe_name, step_range) - self._query = self.get_query_statement() - - def get_query_statement(self): - if self._step_range: - filter_statement = f"WHERE CANN_API.startNs >= {self._step_range.get(Constant.START_NS)} " \ - f"and CANN_API.startNs <= {self._step_range.get(Constant.END_NS)}" - else: - filter_statement = "" - return QUERY.format(filter_statement) + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = QUERY diff --git a/profiler/msprof_analyze/prof_exports/cluster_time_summary_export.py b/profiler/msprof_analyze/prof_exports/cluster_time_summary_export.py new file mode 100644 index 0000000000000000000000000000000000000000..840359618f383b4606544832788b886fba6cad4b --- /dev/null +++ b/profiler/msprof_analyze/prof_exports/cluster_time_summary_export.py @@ -0,0 +1,101 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
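+# See the License for the specific language governing permissions and
+# limitations under the License.

Editor's note on cann_api_sum_export.py above: median, lower_quartile and upper_quartile are not SQLite built-ins, so they are presumably registered as custom aggregates by the analysis runtime; the ratio columns, however, are plain SQL, aggregating per name and then CROSS JOINing a one-row totals CTE. A minimal cross-check of that pattern against a toy CANN_API table (schema simplified; the real name column is a string id):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CANN_API (name TEXT, startNs INTEGER, endNs INTEGER);
        INSERT INTO CANN_API VALUES ('aclopLaunch', 0, 40), ('aclopLaunch', 50, 70), ('aclrtMemcpy', 80, 100);
    """)
    rows = conn.execute("""
        WITH
            summary AS (
                SELECT name, SUM(endNs - startNs) AS duration FROM CANN_API GROUP BY name
            ),
            totals AS (
                SELECT SUM(duration) AS total FROM summary
            )
        SELECT summary.name, ROUND(summary.duration * 100.0 / totals.total, 2) AS "durationRatio"
        FROM summary CROSS JOIN totals
        ORDER BY 2 DESC
    """).fetchall()
    assert rows == [('aclopLaunch', 75.0), ('aclrtMemcpy', 25.0)]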
+ +from msprof_analyze.prof_common.db_manager import DBManager +from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport + + +class CommunicationTimeExport(BaseStatsExport): + QUERY = """ + SELECT + rdm.rankid AS rank, + si.value AS groupName, + (co.endNs - co.startNs) / 1000.0 AS communication_time, + sii.value AS opName, + step_time.id AS step + FROM + COMMUNICATION_OP co + CROSS JOIN + RANK_DEVICE_MAP rdm + JOIN + STRING_IDS si ON co.groupName = si.id + JOIN + STRING_IDS sii ON co.opName = sii.id + LEFT JOIN STEP_TIME step_time + ON co.startNs >= step_time.startNs + AND co.endNs <= step_time.endNs + """ + + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = self.QUERY + + +class MemoryAndDispatchTimeExport(BaseStatsExport): + QUERY = """ + + WITH + computing AS ( + SELECT TASK.startNs, TASK.endNs, CANN_API.startNs as apiStartNs, 0 AS type + FROM COMPUTE_TASK_INFO + JOIN TASK + ON COMPUTE_TASK_INFO.globalTaskId = TASK.globalTaskId + AND TASK.startNs != TASK.endNs + JOIN CANN_API + ON CANN_API.connectionId = TASK.connectionId + ), + communication AS ( + SELECT COMMUNICATION_OP.startNs, COMMUNICATION_OP.endNs, CANN_API.startNs as apiStartNs, 1 AS type + FROM COMMUNICATION_OP + JOIN CANN_API + ON CANN_API.connectionId = COMMUNICATION_OP.connectionId + ), + memory AS ( + SELECT TASK.startNs, TASK.endNs, TASK.startNs as apiStartNs, 4 AS type + FROM TASK + WHERE + taskType = ( + SELECT id + FROM STRING_IDS + WHERE value='MEMCPY_ASYNC' + ) + ), + overlap AS ( + SELECT startNs, endNs, apiStartNs, type + FROM computing + UNION ALL + SELECT startNs, endNs, apiStartNs, type + FROM communication + UNION ALL + SELECT startNs, endNs, apiStartNs, type + FROM memory + ) + SELECT + overlap.startNs AS start, + overlap.endNs AS end, + (overlap.startNs - overlap.apiStartNs) / 1000.0 AS dispatch, + overlap.type, + step_time.id AS step + FROM overlap + LEFT JOIN STEP_TIME step_time + ON overlap.apiStartNs >= step_time.startNs + AND overlap.apiStartNs <= step_time.endNs + ORDER BY overlap.startNs, overlap.endNs + """ + + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = self.QUERY + self.mode = None diff --git a/profiler/msprof_analyze/prof_exports/compute_op_sum_export.py b/profiler/msprof_analyze/prof_exports/compute_op_sum_export.py index f337925dc36ff8e26c782ab1ea1c00618ebf271c..ed41d128056368b7a0e35f51e78edd95ce746486 100644 --- a/profiler/msprof_analyze/prof_exports/compute_op_sum_export.py +++ b/profiler/msprof_analyze/prof_exports/compute_op_sum_export.py @@ -14,7 +14,6 @@ # limitations under the License. 
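Editor's note on cluster_time_summary_export.py above: in both queries the step attribution is a LEFT JOIN on interval containment, so an op that falls outside every recorded step is kept with step = NULL instead of being dropped. A toy cross-check in in-memory SQLite (tables simplified; the real columns are string ids and the rank comes from RANK_DEVICE_MAP):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE STEP_TIME (id INTEGER, startNs INTEGER, endNs INTEGER);
        CREATE TABLE COMMUNICATION_OP (opName TEXT, startNs INTEGER, endNs INTEGER);
        INSERT INTO STEP_TIME VALUES (1, 0, 100), (2, 100, 200);
        INSERT INTO COMMUNICATION_OP VALUES
            ('hcom_allReduce', 10, 40), ('hcom_send', 150, 160), ('hcom_warmup', 250, 260);
    """)
    rows = conn.execute("""
        SELECT co.opName, (co.endNs - co.startNs) / 1000.0 AS communication_time, st.id AS step
        FROM COMMUNICATION_OP co
        LEFT JOIN STEP_TIME st ON co.startNs >= st.startNs AND co.endNs <= st.endNs
    """).fetchall()
    assert rows == [('hcom_allReduce', 0.03, 1), ('hcom_send', 0.01, 2), ('hcom_warmup', 0.01, None)]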
from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport -from msprof_analyze.prof_common.constant import Constant QUERY = """ SELECT @@ -39,7 +38,6 @@ LEFT JOIN LEFT JOIN STRING_IDS AS INPUTSHAPES_IDS ON INPUTSHAPES_IDS.id == COMPUTE_TASK_INFO.inputShapes -{} """ QUERY_EXCLUDE_OPNAME = """ @@ -61,35 +59,18 @@ LEFT JOIN LEFT JOIN STRING_IDS AS INPUTSHAPES_IDS ON INPUTSHAPES_IDS.id == COMPUTE_TASK_INFO.inputShapes -{} """ class ComputeOpSumExport(BaseStatsExport): - def __init__(self, db_path, recipe_name, step_range): - super().__init__(db_path, recipe_name, step_range) - self._query = self.get_query_statement() - - def get_query_statement(self): - if self._step_range: - filter_statement = f"WHERE TASK.startNs >= {self._step_range.get(Constant.START_NS)} " \ - f"and TASK.startNs <= {self._step_range.get(Constant.END_NS)}" - else: - filter_statement = "" - return QUERY.format(filter_statement) + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = QUERY class ComputeOpSumExportExcludeOpName(BaseStatsExport): - def __init__(self, db_path, recipe_name, step_range): - super().__init__(db_path, recipe_name, step_range) - self._query = self.get_query_statement() - - def get_query_statement(self): - if self._step_range: - filter_statement = f"WHERE TASK.startNs >= {self._step_range.get(Constant.START_NS)} " \ - f"and TASK.startNs <= {self._step_range.get(Constant.END_NS)}" - else: - filter_statement = "" - return QUERY_EXCLUDE_OPNAME.format(filter_statement) + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = QUERY_EXCLUDE_OPNAME \ No newline at end of file diff --git a/profiler/msprof_analyze/prof_exports/filter_db_export.py b/profiler/msprof_analyze/prof_exports/filter_db_export.py new file mode 100644 index 0000000000000000000000000000000000000000..048b20a260d25ec48c17bd2dc85d70f15b177910 --- /dev/null +++ b/profiler/msprof_analyze/prof_exports/filter_db_export.py @@ -0,0 +1,102 @@ +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport
+from msprof_analyze.prof_common.logger import get_logger
+
+logger = get_logger()
+
+FILTER_TABLES = ["MatMulV3", "MatMulV2", "GroupedMatmul", "FlashAttentionScore", "FlashAttentionScoreGrad"]  # op types kept by the filters below, despite the name
+values_str = ', '.join([f"'{op_type}'" for op_type in FILTER_TABLES])
+
+OP_QUERY = f"""
+SELECT COMPUTE_TASK_INFO.*
+FROM COMPUTE_TASK_INFO
+    WHERE
+        opType IN (
+            SELECT
+                id
+            FROM
+                STRING_IDS
+            WHERE
+                value IN ({values_str})
+        )
+"""
+
+TASK_QUERY = """
+SELECT TASK.*
+FROM TASK
+INNER JOIN COMPUTE_TASK_INFO
+ON TASK.globalTaskId = COMPUTE_TASK_INFO.globalTaskId;
+"""
+
+CANN_QUERY = """
+WITH all_connection_ids AS (
+    SELECT connectionId
+    FROM TASK
+    UNION
+    SELECT connectionId
+    FROM COMMUNICATION_OP
+)
+
+SELECT CANN_API.*
+FROM CANN_API
+INNER JOIN all_connection_ids
+ON CANN_API.connectionId = all_connection_ids.connectionId;
+"""
+
+PYTORCH_QUERY = """
+WITH all_connection_ids AS (
+    SELECT connectionId
+    FROM TASK
+    UNION
+    SELECT connectionId
+    FROM COMMUNICATION_OP
+)
+
+SELECT PYTORCH_API.*
+FROM PYTORCH_API
+
+INNER JOIN all_connection_ids
+ON PYTORCH_API.connectionId = all_connection_ids.connectionId;
+"""
+
+
+class OPFilter(BaseStatsExport):
+
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = OP_QUERY
+
+
+class TaskFilter(BaseStatsExport):
+
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = TASK_QUERY
+
+
+class CANNFilter(BaseStatsExport):
+
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = CANN_QUERY
+
+
+class PYTORCHFilter(BaseStatsExport):
+
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = PYTORCH_QUERY
\ No newline at end of file
diff --git a/profiler/msprof_analyze/prof_exports/hccl_sum_export.py b/profiler/msprof_analyze/prof_exports/hccl_sum_export.py
index c577d40c0f5ae1289d196bdd6d7cd306ebcbf01e..2470e059ffcfb116f1dad657de53d5aa7ddd865b 100644
--- a/profiler/msprof_analyze/prof_exports/hccl_sum_export.py
+++ b/profiler/msprof_analyze/prof_exports/hccl_sum_export.py
@@ -14,7 +14,6 @@
 # limitations under the License.
 
 from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport
-from msprof_analyze.prof_common.constant import Constant
 
 QUERY = """
 SELECT
@@ -33,20 +32,11 @@ LEFT JOIN
 LEFT JOIN
     STRING_IDS AS GROUP_NAME_IDS
     ON GROUP_NAME_IDS.id == COMMUNICATION_OP.groupName
-{}
 """
 
 
 class HcclSumExport(BaseStatsExport):
-    def __init__(self, db_path, recipe_name, step_range):
-        super().__init__(db_path, recipe_name, step_range)
-        self._query = self.get_query_statement()
-
-    def get_query_statement(self):
-        if self._step_range:
-            filter_statement = f"WHERE COMMUNICATION_OP.startNs >= {self._step_range.get(Constant.START_NS)} " \
-                               f"and COMMUNICATION_OP.startNs <= {self._step_range.get(Constant.END_NS)}"
-        else:
-            filter_statement = ""
-        return QUERY.format(filter_statement)
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = QUERY
 
diff --git a/profiler/msprof_analyze/prof_exports/mstx2commop_export.py b/profiler/msprof_analyze/prof_exports/mstx2commop_export.py
new file mode 100644
index 0000000000000000000000000000000000000000..8c68bb4527c363d83941289c4c9b2ae5d86aa2fd
--- /dev/null
+++ b/profiler/msprof_analyze/prof_exports/mstx2commop_export.py
@@ -0,0 +1,39 @@
+# Copyright (c) 2024, Huawei Technologies Co., Ltd.
+# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport + +QUERY = """ +SELECT + ta.startNs, + ta.endNs, + ta.connectionId, + si.value +from + MSTX_EVENTS ms +JOIN + TASK ta + ON ms.connectionId == ta.connectionId +JOIN + STRING_IDS si + ON ms.message == si.id + """ + + +class Mstx2CommopExport(BaseStatsExport): + + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = QUERY diff --git a/profiler/msprof_analyze/prof_exports/mstx_mark_export.py b/profiler/msprof_analyze/prof_exports/mstx_mark_export.py index 6a7f8d0c6d2f1b4cbbceb9323157215421b58464..9b561d9f066687efa373fcbba6dcaaae2e492eff 100644 --- a/profiler/msprof_analyze/prof_exports/mstx_mark_export.py +++ b/profiler/msprof_analyze/prof_exports/mstx_mark_export.py @@ -14,7 +14,6 @@ # limitations under the License. from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport -from msprof_analyze.prof_common.constant import Constant QUERY = """ WITH @@ -27,7 +26,6 @@ WITH LEFT JOIN CONNECTION_IDS ON PYTORCH_API.connectionId == CONNECTION_IDS.id - {} ) SELECT MSG_IDS.value AS "msg", @@ -46,7 +44,6 @@ LEFT JOIN LEFT JOIN STRING_IDS AS MSG_IDS ON MSTX_EVENTS.message == MSG_IDS.id -{} ORDER BY MSTX_EVENTS.startNs """ @@ -54,16 +51,6 @@ ORDER BY class MstxMarkExport(BaseStatsExport): - def __init__(self, db_path, recipe_name, step_range): - super().__init__(db_path, recipe_name, step_range) - self._query = self.get_query_statement() - - def get_query_statement(self): - if self._step_range: - filter_statement_1 = f"WHERE PYTORCH_API.startNs >= {self._step_range.get(Constant.START_NS)} " \ - f"and PYTORCH_API.startNs <= {self._step_range.get(Constant.END_NS)}" - filter_statement_2 = f"WHERE MSTX_EVENTS.startNs >= {self._step_range.get(Constant.START_NS)} " \ - f"and MSTX_EVENTS.startNs <= {self._step_range.get(Constant.END_NS)}" - else: - filter_statement_1, filter_statement_2 = "", "" - return QUERY.format(filter_statement_1, filter_statement_2) + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) + self._query = QUERY diff --git a/profiler/msprof_analyze/prof_exports/mstx_step_export.py b/profiler/msprof_analyze/prof_exports/mstx_step_export.py index c8aec91b7e5ce5fb29fffebeb8668fec723e3fa8..3051a280ccb1c9eb2a83933c357948bcf59b4d1f 100644 --- a/profiler/msprof_analyze/prof_exports/mstx_step_export.py +++ b/profiler/msprof_analyze/prof_exports/mstx_step_export.py @@ -29,6 +29,6 @@ ORDER BY class MstxStepExport(BaseStatsExport): - def __init__(self, db_path, recipe_name, step_range): - super().__init__(db_path, recipe_name, step_range) + def __init__(self, db_path, recipe_name): + super().__init__(db_path, recipe_name) self._query = QUERY diff --git a/profiler/msprof_analyze/prof_exports/p2p_pairing_export.py b/profiler/msprof_analyze/prof_exports/p2p_pairing_export.py new file mode 100644 index 
0000000000000000000000000000000000000000..2f6a73942619e1bad19eb5978893363acb6cca73
--- /dev/null
+++ b/profiler/msprof_analyze/prof_exports/p2p_pairing_export.py
@@ -0,0 +1,70 @@
+# Copyright (c) 2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from string import Template
+
+from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport
+
+
+QUERY = Template("""
+SELECT
+    co.opName AS "$opNameId",
+    siii.value AS "$opName",
+    co.startNs AS "$startTime",
+    co.endNs AS "$endTime",
+    rdm.rankId AS "$globalRank",
+    cti.srcRank AS "$srcRank",
+    cti.dstRank AS "$dstRank",
+    siiii.value AS "$taskType",
+    sii.value AS "$coGroupName",
+    si.value AS "$ctiGroupName"
+FROM
+    COMMUNICATION_TASK_INFO cti
+    LEFT JOIN COMMUNICATION_OP co on cti.opId = co.opId
+    CROSS JOIN RANK_DEVICE_MAP rdm
+    JOIN STRING_IDS si on cti.groupName = si.id
+    JOIN STRING_IDS sii on co.groupName = sii.id
+    JOIN STRING_IDS siii on co.opName = siii.id
+    JOIN STRING_IDS siiii on cti.taskType = siiii.id
+""")
+
+
+class P2PPairingExport(BaseStatsExport):
+
+    CO_OP_NAME = "opNameId"
+    OP_NAME = "opName"
+    START_TIME = "startTime"
+    END_TIME = "endTime"
+    GLOBAL_RANK = "globalRank"
+    SRC_RANK = "srcRank"
+    DST_RANK = "dstRank"
+    TASK_TYPE = "taskType"
+    CO_GROUP_NAME = "coGroupName"
+    CTI_GROUP_NAME = "ctiGroupName"
+
+
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = QUERY.safe_substitute(
+            opNameId=self.CO_OP_NAME,
+            opName=self.OP_NAME,
+            startTime=self.START_TIME,
+            endTime=self.END_TIME,
+            globalRank=self.GLOBAL_RANK,
+            srcRank=self.SRC_RANK,
+            dstRank=self.DST_RANK,
+            taskType=self.TASK_TYPE,
+            coGroupName=self.CO_GROUP_NAME,
+            ctiGroupName=self.CTI_GROUP_NAME
+        )
diff --git a/profiler/msprof_analyze/prof_exports/slow_link_export.py b/profiler/msprof_analyze/prof_exports/slow_link_export.py
new file mode 100644
index 0000000000000000000000000000000000000000..c584ceb2b2afbbe89c180b5887a6b99e961d96e6
--- /dev/null
+++ b/profiler/msprof_analyze/prof_exports/slow_link_export.py
@@ -0,0 +1,53 @@
+# Copyright (c) 2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
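Editor's note on p2p_pairing_export.py above: the Template/safe_substitute round trip only pins the SQL column aliases to the class constants, so downstream code can address the resulting DataFrame columns as P2PPairingExport.SRC_RANK and friends. The mechanism in isolation:

    from string import Template

    query = Template('SELECT cti.srcRank AS "$srcRank", cti.dstRank AS "$dstRank" FROM COMMUNICATION_TASK_INFO cti')
    print(query.safe_substitute(srcRank="srcRank", dstRank="dstRank"))
    # SELECT cti.srcRank AS "srcRank", cti.dstRank AS "dstRank" FROM COMMUNICATION_TASK_INFO cti

Unlike substitute, safe_substitute leaves any unmatched $placeholder in place rather than raising KeyError, so stray $ signs in the SQL text would survive untouched.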
+
+from msprof_analyze.prof_exports.base_stats_export import BaseStatsExport
+
+QUERY = """
+    SELECT
+        si.value AS groupName,
+        co.endNs - co.startNs AS communicationTime,
+        sii.value AS opName,
+        op.value AS opType,
+        et.name AS dataType,
+        CASE
+            WHEN et.name = 'INT8' THEN 1 * co.count
+            WHEN et.name = 'INT16' THEN 2 * co.count
+            WHEN et.name = 'INT32' THEN 4 * co.count
+            WHEN et.name = 'INT64' THEN 8 * co.count
+            WHEN et.name = 'UINT64' THEN 8 * co.count
+            WHEN et.name = 'UINT8' THEN 1 * co.count
+            WHEN et.name = 'UINT16' THEN 2 * co.count
+            WHEN et.name = 'UINT32' THEN 4 * co.count
+            WHEN et.name = 'FP16' THEN 2 * co.count
+            WHEN et.name = 'FP32' THEN 4 * co.count
+            WHEN et.name = 'FP64' THEN 8 * co.count
+            WHEN et.name = 'BFP16' THEN 2 * co.count
+            WHEN et.name = 'INT128' THEN 16 * co.count
+        END AS dataSize
+    FROM
+        COMMUNICATION_OP co
+    JOIN STRING_IDS si ON co.groupName = si.id
+    JOIN STRING_IDS sii ON co.opName = sii.id
+    JOIN ENUM_HCCL_DATA_TYPE et ON co.dataType = et.id
+    JOIN STRING_IDS op ON co.opType = op.id
+"""
+
+
+class SlowLinkExport(BaseStatsExport):
+
+    def __init__(self, db_path, recipe_name):
+        super().__init__(db_path, recipe_name)
+        self._query = QUERY
diff --git a/profiler/msprof_analyze/requirements/build.txt b/profiler/msprof_analyze/requirements/build.txt
index 3ef20e787be3bad76de0ccde4dc3e3a1dbe63efb..9bb3af4b2a9cdb8401a8c9c44bc6140fc5dc80ec 100644
--- a/profiler/msprof_analyze/requirements/build.txt
+++ b/profiler/msprof_analyze/requirements/build.txt
@@ -7,7 +7,7 @@ tqdm
 prettytable
 ijson
 requests
-xlsxwriter>=3.0.6
+xlsxwriter
 sqlalchemy
 urllib3<2.0
 numpy<=1.26.4
diff --git a/profiler/msprof_analyze/test/ut/advisor/common/test_enum_params_parser.py b/profiler/msprof_analyze/test/ut/advisor/common/test_enum_params_parser.py
index 5d11af12781eb5845d82e5415253fa93be99cf6b..608a007a60286c6f13312d3d7879b92387034ec9 100644
--- a/profiler/msprof_analyze/test/ut/advisor/common/test_enum_params_parser.py
+++ b/profiler/msprof_analyze/test/ut/advisor/common/test_enum_params_parser.py
@@ -1,63 +1,63 @@
-# Copyright (c) 2025, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
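Editor's note on slow_link_export.py above: the CASE expression is simply a data-type-to-byte-width table; a Python equivalent is kept here as a cross-check (types not listed yield NULL in SQL and None here, since the CASE has no ELSE arm):

    # byte width per HCCL data type, mirroring the CASE arms above
    DTYPE_BYTES = {
        "INT8": 1, "UINT8": 1,
        "INT16": 2, "UINT16": 2, "FP16": 2, "BFP16": 2,
        "INT32": 4, "UINT32": 4, "FP32": 4,
        "INT64": 8, "UINT64": 8, "FP64": 8,
        "INT128": 16,
    }

    def data_size(dtype_name, count):
        width = DTYPE_BYTES.get(dtype_name)
        return None if width is None else width * count

    assert data_size("FP16", 1024) == 2048
    assert data_size("UNKNOWN", 10) is None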
-import unittest - -from msprof_analyze.advisor.common.enum_params_parser import EnumParamsParser -from msprof_analyze.test.ut.advisor.advisor_backend.tools.tool import recover_env - - -class TestEnumParamsParser(unittest.TestCase): - @classmethod - def tearDownClass(cls) -> None: - recover_env() - - def setUp(self) -> None: - self.enum_params_parser = EnumParamsParser() - self.argument_keys = sorted(["cann_version", "torch_version", "analysis_dimensions", "profiling_type", "mindspore_version"]) - self.env_keys = ["ADVISOR_ANALYZE_PROCESSES", "DISABLE_PROFILING_COMPARISON", "DISABLE_AFFINITY_API"] - - def test_get_keys(self): - total_keys = sorted(self.argument_keys + self.env_keys) - keys = sorted(self.enum_params_parser.get_keys()) - self.assertTrue(isinstance(keys, list)) - self.assertEqual(keys, total_keys) - - def test_get_argument_keys(self): - argument_keys = sorted(self.enum_params_parser.get_arguments_keys()) - self.assertTrue(isinstance(argument_keys, list)) - self.assertEqual(argument_keys, self.argument_keys) - - def test_get_env_keys(self): - env_keys = sorted(self.enum_params_parser.get_envs_keys()) - self.assertTrue(isinstance(env_keys, list)) - self.assertEqual(env_keys, sorted(self.env_keys)) - - def test_get_default(self): - self.assertTrue(self.enum_params_parser.get_default("cann_version"), "8.0.RC1") - self.assertTrue(self.enum_params_parser.get_default("torch_version"), "2.1.0") - self.assertTrue(self.enum_params_parser.get_default("analysis_dimensions"), - ["computation", "communication", "schedule", "memory"]) - self.assertTrue(self.enum_params_parser.get_default("profiling_type"), "ascend_pytorch_profiler") - self.assertTrue(self.enum_params_parser.get_default("ADVISOR_ANALYZE_PROCESSES"), 1) - - def test_get_options(self): - self.assertTrue(self.enum_params_parser.get_options("cann_version"), ["6.3.RC2", "7.0.RC1", "7.0.0", "8.0.RC1"]) - self.assertTrue(self.enum_params_parser.get_options("torch_version"), ["1.11.0", "2.1.0"]) - self.assertTrue(self.enum_params_parser.get_options("analysis_dimensions"), - [["computation", "communication", "schedule", "memory"], ["communication"], ["schedule"], - ["computation"], ["memory"]]) - self.assertTrue(self.enum_params_parser.get_options("profiling_type"), - ["ascend_pytorch_profiler", "mslite", "msprof"]) - self.assertTrue(self.enum_params_parser.get_options("ADVISOR_ANALYZE_PROCESSES"), list(range(1, 9))) +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+import unittest + +from msprof_analyze.advisor.common.enum_params_parser import EnumParamsParser +from msprof_analyze.test.ut.advisor.advisor_backend.tools.tool import recover_env + + +class TestEnumParamsParser(unittest.TestCase): + @classmethod + def tearDownClass(cls) -> None: + recover_env() + + def setUp(self) -> None: + self.enum_params_parser = EnumParamsParser() + self.argument_keys = sorted(["cann_version", "torch_version", "analysis_dimensions", "profiling_type", "mindspore_version"]) + self.env_keys = ["ADVISOR_ANALYZE_PROCESSES", "DISABLE_PROFILING_COMPARISON", "DISABLE_AFFINITY_API"] + + def test_get_keys(self): + total_keys = sorted(self.argument_keys + self.env_keys) + keys = sorted(self.enum_params_parser.get_keys()) + self.assertTrue(isinstance(keys, list)) + self.assertEqual(keys, total_keys) + + def test_get_argument_keys(self): + argument_keys = sorted(self.enum_params_parser.get_arguments_keys()) + self.assertTrue(isinstance(argument_keys, list)) + self.assertEqual(argument_keys, self.argument_keys) + + def test_get_env_keys(self): + env_keys = sorted(self.enum_params_parser.get_envs_keys()) + self.assertTrue(isinstance(env_keys, list)) + self.assertEqual(env_keys, sorted(self.env_keys)) + + def test_get_default(self): + self.assertTrue(self.enum_params_parser.get_default("cann_version"), "8.0.RC1") + self.assertTrue(self.enum_params_parser.get_default("torch_version"), "2.1.0") + self.assertTrue(self.enum_params_parser.get_default("analysis_dimensions"), + ["computation", "communication", "schedule", "memory"]) + self.assertTrue(self.enum_params_parser.get_default("profiling_type"), "ascend_pytorch_profiler") + self.assertTrue(self.enum_params_parser.get_default("ADVISOR_ANALYZE_PROCESSES"), 1) + + def test_get_options(self): + self.assertTrue(self.enum_params_parser.get_options("cann_version"), ["6.3.RC2", "7.0.RC1", "7.0.0", "8.0.RC1"]) + self.assertTrue(self.enum_params_parser.get_options("torch_version"), ["1.11.0", "2.1.0"]) + self.assertTrue(self.enum_params_parser.get_options("analysis_dimensions"), + [["computation", "communication", "schedule", "memory"], ["communication"], ["schedule"], + ["computation"], ["memory"]]) + self.assertTrue(self.enum_params_parser.get_options("profiling_type"), + ["ascend_pytorch_profiler", "mslite", "msprof"]) + self.assertTrue(self.enum_params_parser.get_options("ADVISOR_ANALYZE_PROCESSES"), list(range(1, 9))) diff --git a/profiler/msprof_analyze/test/ut/advisor/compute_advice/data/kernel_details.csv b/profiler/msprof_analyze/test/ut/advisor/compute_advice/data/kernel_details.csv deleted file mode 100644 index 8a255e939ae2ff4e781c7a356b342815838e2ff3..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/test/ut/advisor/compute_advice/data/kernel_details.csv +++ /dev/null @@ -1,30 +0,0 @@ -Step Id,Model ID,Task ID,Stream ID,Name,Type,OP State,Accelerator Core,Start Time(us),Duration(us),Wait Time(us),Block Dim,Mix Block Dim,HF32 Eligible,Input Shapes,Input Data Types,Input Formats,Output Shapes,Output Data Types,Output Formats,Context ID,aicore_time(us),aic_total_cycles,aic_mac_time(us),aic_mac_ratio,aic_scalar_time(us),aic_scalar_ratio,aic_mte1_time(us),aic_mte1_ratio,aic_mte2_time(us),aic_mte2_ratio,aic_fixpipe_time(us),aic_fixpipe_ratio,aic_icache_miss_rate,aiv_time(us),aiv_total_cycles,aiv_vec_time(us),aiv_vec_ratio,aiv_scalar_time(us),aiv_scalar_ratio,aiv_mte2_time(us),aiv_mte2_ratio,aiv_mte3_time(us),aiv_mte3_ratio,aiv_icache_miss_rate,cube_utilization(%) 
-19,4294967295,61653,2,aclnnMatmul_MatMulCommon_MatMulV2,MatMulV2,dynamic,AI_CORE,"1736413971558972.912 ",185.504,1.087,16,0,NO,"""81920,4096;8192,512""",DT_BF16;DT_BF16,ND;ND,"""4096,512""",DT_BF16,ND,N/A,183.87,5295467,151.425,0.824,88.03,0.479,119.148,0.648,177.314,0.964,5.736,0.031,0.001,0,0,0,0,0,0,0,0,0,0,0,79.295 -19,4294967295,61669,2,aclnnMatmul_MatMulV3Common_MatMulV3,MatMulV3,dynamic,AI_CORE,"1736413971560588.764 ",501.17,2.2,20,0,NO,"""81920,1536;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,478.701,17233251,356.349,0.744,118.087,0.247,296.009,0.618,452.112,0.944,35.833,0.075,0.001,0,0,0,0,0,0,0,0,0,0,0,95.517 -19,4294967295,61694,2,aclnnMatmul_MatMulCommon_MatMulV2,MatMulV2,dynamic,AI_CORE,"1736413971565213.257 ",186.823,1.178,16,0,NO,"""81920,4096;8192,512""",DT_BF16;DT_BF16,ND;ND,"""4096,512""",DT_BF16,ND,N/A,183.728,5291376,151.502,0.825,87.902,0.478,118.519,0.645,177.654,0.967,5.773,0.031,0.001,0,0,0,0,0,0,0,0,0,0,0,78.675 -19,4294967295,61710,2,aclnnMatmul_MatMulV3Common_MatMulV3,MatMulV3,dynamic,AI_CORE,"1736413971566843.489 ",516.991,2.33,20,0,NO,"""81920,1536;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,491.775,17703905,356.249,0.724,118.59,0.241,295.046,0.6,463.696,0.943,37.671,0.077,0.001,0,0,0,0,0,0,0,0,0,0,0,95.123 -19,4294967295,61735,2,aclnnMatmul_MatMulCommon_MatMulV2,MatMulV2,dynamic,AI_CORE,"1736413971571596.404 ",187.724,0.766,16,0,NO,"""81920,4096;8192,512""",DT_BF16;DT_BF16,ND;ND,"""4096,512""",DT_BF16,ND,N/A,184.904,5325221,151.489,0.819,87.893,0.475,118.63,0.642,178.815,0.967,5.77,0.031,0.001,0,0,0,0,0,0,0,0,0,0,0,78.798 -19,4294967295,61751,2,aclnnMatmul_MatMulV3Common_MatMulV3,MatMulV3,dynamic,AI_CORE,"1736413971573223.437 ",514.87,2.15,20,0,NO,"""81920,1536;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,486.931,17529512,356.117,0.731,118.847,0.244,295.529,0.607,457.002,0.939,37.938,0.078,0.001,0,0,0,0,0,0,0,0,0,0,0,94.574 -19,4294967295,61776,2,aclnnMatmul_MatMulCommon_MatMulV2,MatMulV2,dynamic,AI_CORE,"1736413971577931.851 ",190.544,1.367,16,0,NO,"""81920,4096;8192,512""",DT_BF16;DT_BF16,ND;ND,"""4096,512""",DT_BF16,ND,N/A,187.073,5387702,151.741,0.811,87.935,0.47,117.467,0.628,181.043,0.968,5.803,0.031,0.001,0,0,0,0,0,0,0,0,0,0,0,78.543 -19,4294967295,61792,2,aclnnMatmul_MatMulV3Common_MatMulV3,MatMulV3,dynamic,AI_CORE,"1736413971579566.403 ",504.071,2.28,20,0,NO,"""81920,1536;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,485.542,17479517,356.283,0.734,117.755,0.243,296.421,0.61,455.064,0.937,37.75,0.078,0.001,0,0,0,0,0,0,0,0,0,0,0,96.324 -19,4294967295,13792,2,aclnnMatmul_MatMulV3Common_MatMulV5,MatMulV3,dynamic,AI_CORE,"1736413974248200.543 ",521.31,2.22,20,0,NO,"""8192,15365;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,499.234,17972434,356.364,0.714,117.639,0.236,295.58,0.592,471.784,0.945,35.825,0.072,0.001,0,0,0,0,0,0,0,0,0,0,0,95.765 -19,4294967295,13792,2,aclnnMatmul_MatMulV3Common_MatMulV5,MatMulV3,dynamic,AI_CORE,"1736413974248200.543 ",521.31,2.22,20,0,NO,"""8192,15365;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,499.234,17972434,356.364,0.714,117.639,0.236,295.58,0.592,471.784,0.945,35.825,0.072,0.001,0,0,0,0,0,0,0,0,0,0,0,95.765 -19,4294967295,13792,2,aclnnMatmul_MatMulV3Common_MatMulV5,MatMulV3,dynamic,AI_CORE,"1736413974248200.543 
",521.31,2.22,20,0,NO,"""8192,15365;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,499.234,17972434,356.364,0.714,117.639,0.236,295.58,0.592,471.784,0.945,35.825,0.072,0.001,0,0,0,0,0,0,0,0,0,0,0,95.765 -19,4294967295,13792,2,aclnnMatmul_MatMulV3Common_MatMulV5,MatMulV3,dynamic,AI_CORE,"1736413974248200.543 ",521.31,2.22,20,0,NO,"""8192,15365;8192,4096""",DT_BF16;DT_BF16,ND;ND,"""1536,4096""",DT_BF16,ND,N/A,499.234,17972434,356.364,0.714,117.639,0.236,295.58,0.592,471.784,0.945,35.825,0.072,0.001,0,0,0,0,0,0,0,0,0,0,0,95.765 -19,4294967295,60679,2,aclnnFlashAttentionScore_FlashAttentionScore_FlashAttentionScore,FlashAttentionScore,dynamic,MIX_AIC,"1736413971411629.128 ",410.188,1.53,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;;;;4096,4096;;;;;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;UINT8;DT_BF16;BOOL;INT64;INT64;INT64;INT64;INT64,NCL;NCL;NCL;ND;ND;ND;ND;ND;ND;ND;ND;ND,"""2,4,4096,8;2,4,4096,8;;4096,2,512""",FLOAT;FLOAT;DT_BF16;DT_BF16,ND;ND;ND;ND,0,366.147,13181275,129.055,0.352,352.275,0.962,108.364,0.296,172.86,0.872,216.141,0.59,0.003,365.782,26336326,228.687,0.625,137.979,0.377,118.603,0.324,71.448,0.195,0.013,89.263 -19,4294967295,60707,2,aclnnFlashAttentionScore_FlashAttentionScore_FlashAttentionScore,FlashAttentionScore,dynamic,MIX_AIC,"1736413971415611.468 ",406.128,1.279,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;;;;4096,4096;;;;;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;UINT8;DT_BF16;BOOL;INT64;INT64;INT64;INT64;INT64,NCL;NCL;NCL;ND;ND;ND;ND;ND;ND;ND;ND;ND,"""2,4,4096,8;2,4,4096,8;;4096,2,512""",FLOAT;FLOAT;DT_BF16;DT_BF16,ND;ND;ND;ND,0,358.77,12915719,128.96,0.359,345.096,0.962,108.337,0.302,168.284,0.869,209.057,0.583,0.003,358.308,25798146,228.693,0.638,137.809,0.385,108.679,0.303,70.099,0.196,0.013,88.339 -19,4294967295,60735,2,aclnnFlashAttentionScore_FlashAttentionScore_FlashAttentionScore,FlashAttentionScore,dynamic,MIX_AIC,"1736413971420248.800 ",407.008,0.84,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;;;;4096,4096;;;;;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;UINT8;DT_BF16;BOOL;INT64;INT64;INT64;INT64;INT64,NCL;NCL;NCL;ND;ND;ND;ND;ND;ND;ND;ND;ND,"""2,4,4096,8;2,4,4096,8;;4096,2,512""",FLOAT;FLOAT;DT_BF16;DT_BF16,ND;ND;ND;ND,0,359.702,12949284,128.975,0.359,346.306,0.963,108.43,0.301,166.899,0.864,209.018,0.581,0.003,359.274,25867705,228.693,0.637,138.438,0.385,107.723,0.3,70.146,0.195,0.013,88.377 -19,4294967295,60763,2,aclnnFlashAttentionScore_FlashAttentionScore_FlashAttentionScore,FlashAttentionScore,dynamic,MIX_AIC,"1736413971424592.447 ",405.228,1.35,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;;;;4096,4096;;;;;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;UINT8;DT_BF16;BOOL;INT64;INT64;INT64;INT64;INT64,NCL;NCL;NCL;ND;ND;ND;ND;ND;ND;ND;ND;ND,"""2,4,4096,8;2,4,4096,8;;4096,2,512""",FLOAT;FLOAT;DT_BF16;DT_BF16,ND;ND;ND;ND,0,359.793,12952532,128.923,0.358,345.768,0.961,108.411,0.301,167.379,0.865,208.79,0.58,0.003,359.294,25869164,228.691,0.637,138.411,0.385,107.868,0.3,70.163,0.195,0.013,88.788 -19,4294967295,61655,2,aclnnFlashAttentionScoreGrad_FlashAttentionScoreGrad_FlashAttentionScoreGrad,FlashAttentionScoreGrad,dynamic,MIX_AIC,"1736413971559180.676 
",762.215,1.37,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;4096,2,512;4096,4096;2,4,4096,8;2,4,4096,8;;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;BOOL;FLOAT;FLOAT;DT_BF16;DT_BF16;INT64,NCL;NCL;NCL;NCL;ND;NCHW;NCHW;ND;NCL;ND,"""4096,2,512;4096,2,512;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16,ND;ND;ND;ND,0,755.664,27203907,344.023,0.455,592.472,0.784,266.388,0.353,397.091,0.525,589.726,0.525,0.004,755.04,54362915,318.452,0.422,184.623,0.245,206.78,0.274,152.973,0.203,0.006,99.141 -19,4294967295,61696,2,aclnnFlashAttentionScoreGrad_FlashAttentionScoreGrad_FlashAttentionScoreGrad,FlashAttentionScoreGrad,dynamic,MIX_AIC,"1736413971565420.821 ",763.215,1.189,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;4096,2,512;4096,4096;2,4,4096,8;2,4,4096,8;;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;BOOL;FLOAT;FLOAT;DT_BF16;DT_BF16;INT64,NCL;NCL;NCL;NCL;ND;NCHW;NCHW;ND;NCL;ND,"""4096,2,512;4096,2,512;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16,ND;ND;ND;ND,0,757.83,27281885,344.047,0.454,595.954,0.786,266.123,0.351,389.105,0.513,576.226,0.513,0.004,757.046,54507345,318.443,0.421,188.292,0.249,200.176,0.264,162.113,0.214,0.006,99.294 -19,4294967295,61737,2,aclnnFlashAttentionScoreGrad_FlashAttentionScoreGrad_FlashAttentionScoreGrad,FlashAttentionScoreGrad,dynamic,MIX_AIC,"1736413971571804.228 ",757.095,0.88,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;4096,2,512;4096,4096;2,4,4096,8;2,4,4096,8;;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;BOOL;FLOAT;FLOAT;DT_BF16;DT_BF16;INT64,NCL;NCL;NCL;NCL;ND;NCHW;NCHW;ND;NCL;ND,"""4096,2,512;4096,2,512;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16,ND;ND;ND;ND,0,750.605,27021778,343.983,0.458,586.708,0.782,266.304,0.355,392.522,0.523,584.432,0.523,0.004,749.913,53993736,318.436,0.425,188.508,0.251,207.668,0.277,152.634,0.204,0.006,99.143 -19,4294967295,61778,2,aclnnFlashAttentionScoreGrad_FlashAttentionScoreGrad_FlashAttentionScoreGrad,FlashAttentionScoreGrad,dynamic,MIX_AIC,"1736413971578144.095 ",755.915,1.22,20,40,NO,"""4096,2,512;4096,2,512;4096,2,512;4096,2,512;4096,4096;2,4,4096,8;2,4,4096,8;;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;BOOL;FLOAT;FLOAT;DT_BF16;DT_BF16;INT64,NCL;NCL;NCL;NCL;ND;NCHW;NCHW;ND;NCL;ND,"""4096,2,512;4096,2,512;4096,2,512;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16,ND;ND;ND;ND,0,750.152,27005467,344.115,0.459,579.317,0.772,266.08,0.355,398.019,0.531,587.37,0.531,0.004,749.348,53953058,318.444,0.425,186.908,0.249,207.068,0.276,151.329,0.202,0.006,99.238 -19,4294967295,60763,2,aclnnFlashAttentionScore_FlashAttentionScore_FlashAttentionScore_varlen,FlashAttentionScore,dynamic,MIX_AIC,"1736413971424592.447 ",405.228,1.35,20,40,NO,"""4096,2,511;4096,2,512;4096,2,512;;;;4096,4096;;;;;""",DT_BF16;DT_BF16;DT_BF16;DT_BF16;UINT8;DT_BF16;BOOL;INT64;INT64;INT64;INT64;INT64,NCL;NCL;NCL;ND;ND;ND;ND;ND;ND;ND;ND;ND,"""2,3,4096,8;2,4,4096,8;;4096,2,512""",FLOAT;FLOAT;DT_BF16;DT_BF16,ND;ND;ND;ND,0,359.793,12952532,128.923,0.358,345.768,0.961,108.411,0.301,167.379,0.465,208.79,0.58,0.003,359.294,25869164,228.691,0.637,138.411,0.385,107.868,0.3,70.163,0.195,0.013,88.788 -19,4294967295,60683,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413971412768.871 ",26.78,0.485,40,0,NO,"""512,2,4096;512,2,4096""",DT_BF16;DT_BF16,NCL;NCL,"""512,2,4096""",DT_BF16,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,24.19,1741674,5.986,0.247,1.352,0.056,20.363,0.842,3.195,0.132,0.027,0 -19,4294967295,60690,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413971414677.549 
",31.201,0.664,40,0,NO,"""512,2,4096;512,2,4096""",DT_BF16;DT_BF16,NCL;NCL,"""512,2,4096""",DT_BF16,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,28.617,2060443,5.986,0.209,1.444,0.05,25.005,0.874,3.336,0.117,0.026,0 -19,4294967295,60711,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413971416743.250 ",27.021,1.246,40,0,NO,"""512,2,4096;512,2,4096""",DT_BF16;DT_BF16,NCL;NCL,"""512,2,4096""",DT_BF16,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,24.304,1749862,5.986,0.246,1.258,0.052,20.424,0.84,3.23,0.133,0.027,0 -19,4294967295,60718,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413971419318.962 ",25.08,0.984,40,0,NO,"""512,2,4096;512,2,4096""",DT_BF16;DT_BF16,NCL;NCL,"""512,2,4096""",DT_BF16,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,22.47,1617840,5.989,0.267,2.009,0.089,18.809,0.837,3.191,0.142,0.024,0 -19,4294967295,13907,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413974268377.206 ",1.38,31.48,1,0,NO,""";""",FLOAT;FLOAT,ND;ND,"""""",FLOAT,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,0.883,1589,0.027,0.03,0.265,0.3,0.18,0.204,0.108,0.123,0.182,0 -19,4294967295,13910,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413974268502.128 ",1.46,17.48,1,0,NO,""";""",FLOAT;FLOAT,ND;ND,"""""",FLOAT,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,0.948,1706,0.027,0.028,0.276,0.291,0.217,0.229,0.127,0.134,0.174,0 -19,4294967295,13913,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413974268605.410 ",1.5,0.09,1,0,NO,""";""",FLOAT;FLOAT,ND;ND,"""""",FLOAT,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,0.96,1728,0.027,0.028,0.268,0.28,0.221,0.23,0.132,0.137,0.145,0 -19,4294967295,13916,2,aclnnAdd_AddAiCore_Add,Add,dynamic,AI_VECTOR_CORE,"1736413974268747.953 ",1.58,28.28,1,0,NO,""";""",FLOAT;FLOAT,ND;ND,"""""",FLOAT,ND,N/A,0,0,0,0,0,0,0,0,0,0,0,0,0,1.107,1993,0.027,0.024,0.426,0.384,0.201,0.181,0.118,0.106,0.162,0 \ No newline at end of file diff --git a/profiler/msprof_analyze/test/ut/advisor/compute_advice/test_ai_core_performance_advice.py b/profiler/msprof_analyze/test/ut/advisor/compute_advice/test_ai_core_performance_advice.py deleted file mode 100644 index c8196f5eefdee0c1f3819916b261a002017ba987..0000000000000000000000000000000000000000 --- a/profiler/msprof_analyze/test/ut/advisor/compute_advice/test_ai_core_performance_advice.py +++ /dev/null @@ -1,85 +0,0 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2025. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import os -import shutil - -import unittest -from msprof_analyze.advisor.interface.interface import Interface -from msprof_analyze.advisor.common.analyzer_scopes import SupportedScopes - - -class TestAICorePerformanceAdvice(unittest.TestCase): - TMP_DIR = "./ascend_pt" - OUTPUT_DIR = "./ascend_pt/ASCEND_PROFILER_OUTPUT" - interface = None - err_interface = None - - @classmethod - def clear_htmls(cls): - current_path = os.path.dirname(os.path.abspath(__file__)) - for filename in os.listdir(current_path): - # 检查文件是否以“mstt”开头 - if filename.startswith("mstt"): - # 构建文件的完整路径 - file_path = os.path.join(current_path, filename) - # 删除文件 - os.remove(file_path) - - @classmethod - def copy_kernel_details(cls, path): - # Define source and destination paths - source_csv_path = os.path.join(os.path.dirname(__file__), 'data', path) - destination_csv_path = f"{TestAICorePerformanceAdvice.OUTPUT_DIR}/kernel_details.csv" - - # Check if source CSV file exists - if not os.path.exists(source_csv_path): - raise FileNotFoundError(f"test data file not found:{source_csv_path}") - - # Ensure the output directory exists - if not os.path.exists(TestAICorePerformanceAdvice.OUTPUT_DIR): - os.makedirs(TestAICorePerformanceAdvice.OUTPUT_DIR) - - # Copy the CSV file from source to destination - shutil.copyfile(source_csv_path, destination_csv_path) - - def tearDown(self): - if os.path.exists(TestAICorePerformanceAdvice.TMP_DIR): - shutil.rmtree(TestAICorePerformanceAdvice.TMP_DIR) - self.clear_htmls() - - def setUp(self): - if os.path.exists(TestAICorePerformanceAdvice.TMP_DIR): - shutil.rmtree(TestAICorePerformanceAdvice.TMP_DIR) - if not os.path.exists(TestAICorePerformanceAdvice.TMP_DIR): - os.makedirs(TestAICorePerformanceAdvice.TMP_DIR) - if not os.path.exists(TestAICorePerformanceAdvice.OUTPUT_DIR): - os.makedirs(TestAICorePerformanceAdvice.OUTPUT_DIR) - self.clear_htmls() - - def test_ai_core_performance_total(self): - file_path = "kernel_details.csv" - self.copy_kernel_details(file_path) - interface = Interface(profiling_path=self.TMP_DIR) - dimension = Interface.COMPUTATION - scope = SupportedScopes.AICORE_PERFORMANCE_ANALYSIS - result = interface.get_result(dimension, scope, render_html=1, output_dict=False, profiling_path=self.TMP_DIR) - self.assertLess(1, len(result.data.get("Cube算子性能分析").get("data")[0])) - self.assertLess(1, len(result.data.get("Cube算子性能分析").get("data")[1])) - self.assertLess(1, len(result.data.get("Cube算子性能分析").get("data")[2])) - self.assertLess(1, len(result.data.get("FA算子性能分析").get("data")[0])) - self.assertLess(1, len(result.data.get("FA算子性能分析").get("data")[1])) - self.assertLess(1, len(result.data.get("FA算子性能分析").get("data")[2])) - self.assertLess(1, len(result.data.get("Vector算子性能分析").get("data")[0])) - self.assertLess(1, len(result.data.get("Vector算子性能分析").get("data")[1])) - result.clear() \ No newline at end of file diff --git a/profiler/msprof_analyze/test/ut/advisor/timeline_advice/test_timeline_op_collector.py b/profiler/msprof_analyze/test/ut/advisor/timeline_advice/test_timeline_op_collector.py index edef567259f8778896e6f3d3291fb4649664aecc..65b9a6cea045958b2fabfbca9db60b60134b01ca 100644 --- a/profiler/msprof_analyze/test/ut/advisor/timeline_advice/test_timeline_op_collector.py +++ b/profiler/msprof_analyze/test/ut/advisor/timeline_advice/test_timeline_op_collector.py @@ -1,152 +1,152 @@ -# Copyright (c) 2025, Huawei Technologies Co., Ltd. -# All rights reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import unittest - -from msprof_analyze.advisor.dataset.timeline_op_collector.timeline_op_collector import ( - OpCompileCollector, - SynchronizeStreamCollector, - MemCollector, - DataloaderCollector, - SyncBNCollector, - AtenCollector, - OptimizerCollector, - FrequencyCollector, - SpecificTaskTypeOpCollector, - TorchToNpuCollector, - AclToNpuCollector, - OpStackCollector, - StepCollector -) -from msprof_analyze.advisor.common.timeline.event import TimelineEvent -from msprof_analyze.test.ut.advisor.advisor_backend.tools.tool import recover_env - - -class TestTimelineOpCollector(unittest.TestCase): - @classmethod - def tearDownClass(cls) -> None: - recover_env() - - def setUp(self) -> None: - self.mock_step_event = TimelineEvent(dict(name="ProfilerStep#1", ts=1, dur=1000)) - self.mock_op_compile_event = TimelineEvent(dict(name="AscendCL@aclopCompileAndExecute", ts=2, dur=1)) - self.mock_sync_stream_event = TimelineEvent(dict(name="AscendCL@aclrtSynchronizeStream", dur=1000000000)) - self.mock_mem_op_event = TimelineEvent(dict(name="AscendCL@aclMallocMemInner", dur=10)) - self.mock_dataloader_event = TimelineEvent(dict(name="dataloader")) - self.mock_sync_bn_event = TimelineEvent(dict(name="syncbatchnorm")) - self.mock_aten_event = TimelineEvent(dict(name="aten::conv3d")) - self.mock_optimizer_event = TimelineEvent(dict(name="Optimizer.step#")) - self.mock_AI_CPU_event = TimelineEvent( - {"name": "index", "args": TimelineEvent({"Task Type": "AI_CPU"}), "ts": 1}) - self.mock_torch_to_npu_event = TimelineEvent(dict(name="torch_to_npu", tid=1, ts=1, ph=1, id=1)) - self.mock_acl_to_npu_event = TimelineEvent(dict(name="acl_to_npu", ts=1)) - self.mock_op_stack_event = TimelineEvent( - {"name": "aten::conv3d", "dataset_index": 1, "ts": 1, "args": TimelineEvent({"Call stack": "mock_stack"})}) - - def test_step_collector(self): - step_collector = StepCollector() - step_collector.add_op(self.mock_step_event) - step_collector.post_process() - self.assertEqual(step_collector.attribute_to_dataset.get("profiler_step"), [self.mock_step_event]) - - def test_op_compile_collector(self): - op_compile_collector = OpCompileCollector() - op_compile_collector.add_op(self.mock_op_compile_event) - op_compile_collector.post_process(op_compile_collector.op_list) - self.assertEqual(op_compile_collector.attribute_to_dataset.get("ops_compile"), op_compile_collector) - self.assertEqual(op_compile_collector.total_time, 1) - self.assertEqual(op_compile_collector.total_count, 1) - - def test_sync_stream_collector(self): - sync_stream_collector = SynchronizeStreamCollector() - sync_stream_collector.post_process() - self.assertEqual(sync_stream_collector.attribute_to_dataset.get("synchronize_stream"), []) - - def test_mem_op_collector(self): - mem_op_collector = MemCollector() - mem_op_collector.add_op(self.mock_mem_op_event) - mem_op_collector.post_process(mem_op_collector.op_list) - self.assertEqual(mem_op_collector.attribute_to_dataset.get("memory_ops"), mem_op_collector) - 
self.assertEqual(mem_op_collector.mem_op_info.get("AscendCL@aclMallocMemInner"), {"count": 1, "total_dur": 10}) - - def test_dataloader_collector(self): - dataloader_collector = DataloaderCollector() - dataloader_collector.add_op(self.mock_dataloader_event) - dataloader_collector.post_process() - self.assertEqual(len(dataloader_collector.attribute_to_dataset.get("dataloader")), 1) - - def test_sync_bn_collector(self): - sync_bn_collector = SyncBNCollector() - sync_bn_collector.add_op(self.mock_sync_bn_event) - sync_bn_collector.post_process(sync_bn_collector.op_list) - self.assertEqual(len(sync_bn_collector.attribute_to_dataset.get("sync_batchnorm")), 1) - - def test_aten_collector(self): - aten_collector = AtenCollector() - aten_collector.add_op(self.mock_aten_event) - aten_collector.add_op(self.mock_sync_stream_event) - aten_collector.post_process(aten_collector.op_list) - self.assertEqual(len(aten_collector.attribute_to_dataset.get("aten")), 2) - - def test_optimizer_collector(self): - optimizer_collector = OptimizerCollector() - optimizer_collector.add_op(self.mock_optimizer_event) - optimizer_collector.post_process(optimizer_collector.op_list) - self.assertEqual(len(optimizer_collector.attribute_to_dataset.get("optimizer")), 1) - - def test_specific_task_type_op_collector(self): - specific_task_type_op_collector = SpecificTaskTypeOpCollector() - specific_task_type_op_collector.add_op(self.mock_AI_CPU_event) - specific_task_type_op_collector.post_process(specific_task_type_op_collector.op_list) - key = f"{self.mock_AI_CPU_event.name}-{self.mock_AI_CPU_event.ts}" - self.assertTrue( - specific_task_type_op_collector.attribute_to_dataset.get("ops_with_task_type", {}).get(key)) - self.assertTrue(specific_task_type_op_collector.attribute_to_dataset.get("task_op_names"), [key]) - - def test_torch_to_npu_collector(self): - torch_to_npu_collector = TorchToNpuCollector() - torch_to_npu_collector.add_op(self.mock_torch_to_npu_event) - torch_to_npu_collector.post_process(torch_to_npu_collector.op_list) - key = f"{self.mock_torch_to_npu_event.ph}-{self.mock_torch_to_npu_event.id}" - self.assertTrue("1-1" in torch_to_npu_collector.attribute_to_dataset.get("torch_to_npu")) - - def test_acl_to_npu_collector(self): - acl_to_npu_collector = AclToNpuCollector() - acl_to_npu_collector.add_op(self.mock_acl_to_npu_event) - acl_to_npu_collector.post_process(acl_to_npu_collector.op_list) - self.assertEqual(acl_to_npu_collector.attribute_to_dataset.get("acl_to_npu"), - set([str(self.mock_acl_to_npu_event.ts)])) - - def test_op_stack_collector(self): - op_stack_collector = OpStackCollector() - op_stack_collector.add_op(self.mock_op_stack_event) - op_stack_collector.post_process(op_stack_collector.op_list) - self.assertTrue( - str(self.mock_op_stack_event.ts) in op_stack_collector.attribute_to_dataset.get("ops_with_stack")) - - -if __name__ == '__main__': - tester = TestTimelineOpCollector() - tester.test_step_collector() - tester.test_op_compile_collector() - tester.test_sync_stream_collector() - tester.test_mem_op_collector() - tester.test_dataloader_collector() - tester.test_sync_bn_collector() - tester.test_aten_collector() - tester.test_optimizer_collector() - tester.test_specific_task_type_op_collector() - tester.test_torch_to_npu_collector() - tester.test_acl_to_npu_collector() - tester.test_op_stack_collector() +# Copyright (c) 2025, Huawei Technologies Co., Ltd. +# All rights reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import unittest + +from msprof_analyze.advisor.dataset.timeline_op_collector.timeline_op_collector import ( + OpCompileCollector, + SynchronizeStreamCollector, + MemCollector, + DataloaderCollector, + SyncBNCollector, + AtenCollector, + OptimizerCollector, + FrequencyCollector, + SpecificTaskTypeOpCollector, + TorchToNpuCollector, + AclToNpuCollector, + OpStackCollector, + StepCollector +) +from msprof_analyze.advisor.common.timeline.event import TimelineEvent +from msprof_analyze.test.ut.advisor.advisor_backend.tools.tool import recover_env + + +class TestTimelineOpCollector(unittest.TestCase): + @classmethod + def tearDownClass(cls) -> None: + recover_env() + + def setUp(self) -> None: + self.mock_step_event = TimelineEvent(dict(name="ProfilerStep#1", ts=1, dur=1000)) + self.mock_op_compile_event = TimelineEvent(dict(name="AscendCL@aclopCompileAndExecute", ts=2, dur=1)) + self.mock_sync_stream_event = TimelineEvent(dict(name="AscendCL@aclrtSynchronizeStream", dur=1000000000)) + self.mock_mem_op_event = TimelineEvent(dict(name="AscendCL@aclMallocMemInner", dur=10)) + self.mock_dataloader_event = TimelineEvent(dict(name="dataloader")) + self.mock_sync_bn_event = TimelineEvent(dict(name="syncbatchnorm")) + self.mock_aten_event = TimelineEvent(dict(name="aten::conv3d")) + self.mock_optimizer_event = TimelineEvent(dict(name="Optimizer.step#")) + self.mock_AI_CPU_event = TimelineEvent( + {"name": "index", "args": TimelineEvent({"Task Type": "AI_CPU"}), "ts": 1}) + self.mock_torch_to_npu_event = TimelineEvent(dict(name="torch_to_npu", tid=1, ts=1, ph=1, id=1)) + self.mock_acl_to_npu_event = TimelineEvent(dict(name="acl_to_npu", ts=1)) + self.mock_op_stack_event = TimelineEvent( + {"name": "aten::conv3d", "dataset_index": 1, "ts": 1, "args": TimelineEvent({"Call stack": "mock_stack"})}) + + def test_step_collector(self): + step_collector = StepCollector() + step_collector.add_op(self.mock_step_event) + step_collector.post_process() + self.assertEqual(step_collector.attribute_to_dataset.get("profiler_step"), [self.mock_step_event]) + + def test_op_compile_collector(self): + op_compile_collector = OpCompileCollector() + op_compile_collector.add_op(self.mock_op_compile_event) + op_compile_collector.post_process(op_compile_collector.op_list) + self.assertEqual(op_compile_collector.attribute_to_dataset.get("ops_compile"), op_compile_collector) + self.assertEqual(op_compile_collector.total_time, 1) + self.assertEqual(op_compile_collector.total_count, 1) + + def test_sync_stream_collector(self): + sync_stream_collector = SynchronizeStreamCollector() + sync_stream_collector.post_process() + self.assertEqual(sync_stream_collector.attribute_to_dataset.get("synchronize_stream"), []) + + def test_mem_op_collector(self): + mem_op_collector = MemCollector() + mem_op_collector.add_op(self.mock_mem_op_event) + mem_op_collector.post_process(mem_op_collector.op_list) + self.assertEqual(mem_op_collector.attribute_to_dataset.get("memory_ops"), mem_op_collector) + 
self.assertEqual(mem_op_collector.mem_op_info.get("AscendCL@aclMallocMemInner"), {"count": 1, "total_dur": 10}) + + def test_dataloader_collector(self): + dataloader_collector = DataloaderCollector() + dataloader_collector.add_op(self.mock_dataloader_event) + dataloader_collector.post_process() + self.assertEqual(len(dataloader_collector.attribute_to_dataset.get("dataloader")), 1) + + def test_sync_bn_collector(self): + sync_bn_collector = SyncBNCollector() + sync_bn_collector.add_op(self.mock_sync_bn_event) + sync_bn_collector.post_process(sync_bn_collector.op_list) + self.assertEqual(len(sync_bn_collector.attribute_to_dataset.get("sync_batchnorm")), 1) + + def test_aten_collector(self): + aten_collector = AtenCollector() + aten_collector.add_op(self.mock_aten_event) + aten_collector.add_op(self.mock_sync_stream_event) + aten_collector.post_process(aten_collector.op_list) + self.assertEqual(len(aten_collector.attribute_to_dataset.get("aten")), 2) + + def test_optimizer_collector(self): + optimizer_collector = OptimizerCollector() + optimizer_collector.add_op(self.mock_optimizer_event) + optimizer_collector.post_process(optimizer_collector.op_list) + self.assertEqual(len(optimizer_collector.attribute_to_dataset.get("optimizer")), 1) + + def test_specific_task_type_op_collector(self): + specific_task_type_op_collector = SpecificTaskTypeOpCollector() + specific_task_type_op_collector.add_op(self.mock_AI_CPU_event) + specific_task_type_op_collector.post_process(specific_task_type_op_collector.op_list) + key = f"{self.mock_AI_CPU_event.name}-{self.mock_AI_CPU_event.ts}" + self.assertTrue( + specific_task_type_op_collector.attribute_to_dataset.get("ops_with_task_type", {}).get(key)) + self.assertTrue(specific_task_type_op_collector.attribute_to_dataset.get("task_op_names"), [key]) + + def test_torch_to_npu_collector(self): + torch_to_npu_collector = TorchToNpuCollector() + torch_to_npu_collector.add_op(self.mock_torch_to_npu_event) + torch_to_npu_collector.post_process(torch_to_npu_collector.op_list) + key = f"{self.mock_torch_to_npu_event.ph}-{self.mock_torch_to_npu_event.id}" + self.assertTrue("1-1" in torch_to_npu_collector.attribute_to_dataset.get("torch_to_npu")) + + def test_acl_to_npu_collector(self): + acl_to_npu_collector = AclToNpuCollector() + acl_to_npu_collector.add_op(self.mock_acl_to_npu_event) + acl_to_npu_collector.post_process(acl_to_npu_collector.op_list) + self.assertEqual(acl_to_npu_collector.attribute_to_dataset.get("acl_to_npu"), + set([str(self.mock_acl_to_npu_event.ts)])) + + def test_op_stack_collector(self): + op_stack_collector = OpStackCollector() + op_stack_collector.add_op(self.mock_op_stack_event) + op_stack_collector.post_process(op_stack_collector.op_list) + self.assertTrue( + str(self.mock_op_stack_event.ts) in op_stack_collector.attribute_to_dataset.get("ops_with_stack")) + + +if __name__ == '__main__': + tester = TestTimelineOpCollector() + tester.test_step_collector() + tester.test_op_compile_collector() + tester.test_sync_stream_collector() + tester.test_mem_op_collector() + tester.test_dataloader_collector() + tester.test_sync_bn_collector() + tester.test_aten_collector() + tester.test_optimizer_collector() + tester.test_specific_task_type_op_collector() + tester.test_torch_to_npu_collector() + tester.test_acl_to_npu_collector() + tester.test_op_stack_collector() diff --git a/profiler/msprof_analyze/test/ut/cluster_analyse/recipes/test_cluster_time_compare_summary.py 
new file mode 100644
index 0000000000000000000000000000000000000000..9cc3dd8180851afb00c2c8fb91e2a37ffb7a0973
--- /dev/null
+++ b/profiler/msprof_analyze/test/ut/cluster_analyse/recipes/test_cluster_time_compare_summary.py
@@ -0,0 +1,136 @@
+# Copyright (c) 2025, Huawei Technologies Co., Ltd.
+# All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import unittest
+from unittest import mock
+import pandas as pd
+
+from msprof_analyze.cluster_analyse.recipes.cluster_time_compare_summary.cluster_time_compare_summary import \
+    ClusterTimeCompareSummary
+from msprof_analyze.prof_common.constant import Constant
+
+NAMESPACE = "msprof_analyze.prof_common"
+
+
+class TestClusterTimeCompareSummary(unittest.TestCase):
+    PARAMS = {
+        Constant.COLLECTION_PATH: "/data",
+        Constant.DATA_MAP: {},
+        Constant.DATA_TYPE: Constant.DB,
+        Constant.CLUSTER_ANALYSIS_OUTPUT_PATH: "./test_cluster_time_compare_summary",
+        Constant.RECIPE_NAME: "ClusterTimeCompareSummary",
+        Constant.RECIPE_CLASS: ClusterTimeCompareSummary,
+        Constant.PARALLEL_MODE: Constant.CONCURRENT_MODE,
+        Constant.EXPORT_TYPE: Constant.DB,
+        ClusterTimeCompareSummary.RANK_LIST: Constant.ALL,
+    }
+
+    def test_check_params_is_valid_should_return_false_when_bp_param_does_not_exist(self):
+        params = {}
+        params.update(self.PARAMS)
+        self.assertFalse(ClusterTimeCompareSummary(params).check_params_is_valid())
+
+    def test_check_params_is_valid_should_return_false_when_export_type_is_notebook(self):
+        params = {Constant.EXTRA_ARGS: ["--bp", "/data2"]}
+        params.update(self.PARAMS)
+        params[Constant.EXPORT_TYPE] = Constant.NOTEBOOK
+        self.assertFalse(ClusterTimeCompareSummary(params).check_params_is_valid())
+
+    def test_check_params_is_valid_should_return_false_when_base_path_is_invalid(self):
+        params = {Constant.EXTRA_ARGS: ["--bp", "/data2"]}
+        params.update(self.PARAMS)
+        with mock.patch(NAMESPACE + ".path_manager.PathManager.check_input_file_path", side_effect=RuntimeError):
+            self.assertFalse(ClusterTimeCompareSummary(params).check_params_is_valid())
+
+    def test_check_params_is_valid_should_return_false_when_table_cluster_time_summary_does_not_exist(self):
+        params = {}
+        params.update(self.PARAMS)
+        with mock.patch(NAMESPACE + ".db_manager.DBManager.check_tables_in_db", return_value=False):
+            self.assertFalse(ClusterTimeCompareSummary(params).check_params_is_valid())
+
+    def test_check_params_is_valid_should_return_false_when_base_table_cluster_time_summary_does_not_exist(self):
+        params = {Constant.EXTRA_ARGS: ["--bp", "/data2"]}
+        params.update(self.PARAMS)
+        with mock.patch(NAMESPACE + ".path_manager.PathManager.check_input_file_path"), \
+                mock.patch(NAMESPACE + ".db_manager.DBManager.check_tables_in_db", side_effect=[True, False]):
+            self.assertFalse(ClusterTimeCompareSummary(params).check_params_is_valid())
+
+    def test_run_when_all_parameters_are_normal(self):
+        params = {Constant.EXTRA_ARGS: ["--bp", "/data2"]}
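+        # "--bp" points the comparison at the baseline profiling data (see the
+        # base-path checks above); the rest of the recipe configuration comes from PARAMS.
+        # The baseline below covers ranks 0-6 while the current data adds rank 7, and every
+        # metric is exactly 1 greater, so each *Diff column over the common ranks is 1.0.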
+        params.update(self.PARAMS)
+        params[Constant.EXPORT_TYPE] = ""
+        metric_columns = [
+            "computation", "communicationNotOverlapComputation", "communicationOverlapComputation",
+            "communication", "free", "communicationWaitStageTime", "communicationTransmitStageTime",
+            "memory", "memoryNotOverlapComputationCommunication", "taskLaunchDelayAvgTime"
+        ]
+        # Baseline data: ranks 0-6, two steps each, metric values 0..13
+        base_cluster_time_summary_df_dict = {
+            Constant.TABLE_CLUSTER_TIME_SUMMARY: pd.DataFrame({
+                "rank": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
+                "step": [0, 1] * 7,
+                **{column: list(range(14)) for column in metric_columns}
+            })
+        }
+        # Current data: ranks 0-7, two steps each, every metric shifted up by 1
+        cluster_time_summary_df_dict = {
+            Constant.TABLE_CLUSTER_TIME_SUMMARY: pd.DataFrame({
+                "rank": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],
+                "step": [0, 1] * 8,
+                **{column: list(range(1, 17)) for column in metric_columns}
+            })
+        }
+        # Only the ranks present in both datasets (0-6) are compared; every diff is 1.0
+        expected_result = pd.DataFrame({
+            "rank": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
+            "step": [0, 1] * 7,
+            **{f"{column}Diff": [1.0] * 14 for column in metric_columns}
+        })
+        with mock.patch(NAMESPACE + ".path_manager.PathManager.check_input_file_path"), \
+                mock.patch(NAMESPACE + ".db_manager.DBManager.check_tables_in_db", side_effect=[True, True]), \
+                mock.patch(NAMESPACE + ".database_service.DatabaseService.query_data",
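+                           # query_data is stubbed to return the current cluster table on the
+                           # first call and the baseline table on the second (side_effect order)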
+                           side_effect=[cluster_time_summary_df_dict, base_cluster_time_summary_df_dict]):
+            cluster_time_compare_summary = ClusterTimeCompareSummary(params)
+            cluster_time_compare_summary.run()
+            self.assertTrue(cluster_time_compare_summary.compare_result.equals(expected_result))
+
diff --git a/sample/README.md b/sample/README.md
index 15238cb9f3815d6fecb0c743e6f826d2abc2988b..8e555f4870d2c39fc5cabad3092d1c17f60d3dfa 100644
--- a/sample/README.md
+++ b/sample/README.md
@@ -8,19 +8,10 @@
 Note: each minimal directory under this sample directory is a complete sample project, and the sample projects may differ from one another because of their dependencies.

 ## Dependencies
-- For the hardware environment, see the [Ascend product overview](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2Fcanncommercial%2F80RC22%2Fquickstart%2Fquickstart%2Fquickstart_18_0002.html).
-- For the software environment, see the [CANN Software Installation Guide](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fdocument%2Fdetail%2Fzh%2Fcanncommercial%2F80RC22%2Fsoftwareinst%2Finstg%2Finstg_0000.html%3FMode%3DPmIns%26OS%3DUbuntu%26Software%3DcannToolKit) to install the Ascend development or runtime environment, i.e. the toolkit package.
-
-For the dependencies above, choose the versions that match your actual environment.
-
-### Version compatibility
-| Condition | Requirement |
-|---|---|
-| CANN version | >=8.0.RC1.alpha001 |
-| Hardware | Atlas 800T A2 training server |
-
-- AscendPyTorch 1.11.0 or later is supported; for the supported PyTorch/CANN and PyTorch/Python version combinations, see [Ascend Extension for PyTorch](https://gitee.com/ascend/pytorch).
-- The firmware and driver versions are those supported by the matching CANN software; developers can obtain them from the [Ascend community firmware and drivers](https://gitee.com/link?target=https%3A%2F%2Fwww.hiascend.com%2Fhardware%2Ffirmware-drivers%2Fcommunity%3Fproduct%3D2%26model%3D28%26cann%3D8.0.RC3.alpha003%26driver%3D1.0.25.alpha) page according to the product model and CANN version.
+Install the CANN package and enable its environment variables, making sure ```ASCEND_HOME_PATH``` takes effect; this can be done from the CANN installation directory:
+```
+source set_env.sh
+```

 ## Directory overview
 The overall directory structure is as follows:
@@ -100,7 +91,7 @@
   mssanitizer ./*.fatbin # memcheck is performed by default
   ```
   LINK_LIBS := -L${ASCEND_HOME_PATH}/lib64 -lruntime -lascendcl -lstdc++
  Change it to:
-  LINK_LIBS := -L${ASCEND_HOME_PATH}/lib64 -L${ASCEND_HOME_PATH}/tools/simulator/${SOC_VERSION}/lib/ -lruntime_camodel -lascendcl -lstdc++ # The libruntime_camodel dependency path must be added. Query SOC_VERSION with the npu-smi info command to obtain the Chip Name; the actual value is Ascend plus the Chip Name, e.g. for Chip Name xxxyy it is Ascendxxxyy. When Ascendxxxyy is a code sample path, configure it as ascendxxxyy.
+  LINK_LIBS := -L${ASCEND_HOME_PATH}/lib64 -L${ASCEND_HOME_PATH}/tools/simulator/${SOC_VERSION}/lib/ -lruntime_camodel -lascendcl -lstdc++ # The libruntime_camodel dependency path must be added; for SOC_VERSION, query the NPU Name with npu-smi info
   ```
+ Enhanced debugging information:
  ```
diff --git "a/\345\205\254\347\275\221URL\350\257\264\346\230\216.md" "b/\345\205\254\347\275\221URL\350\257\264\346\230\216.md"
deleted file mode 100644
index c78d206c1a47d0e39555574ac78b111cc0d37c53..0000000000000000000000000000000000000000
--- "a/\345\205\254\347\275\221URL\350\257\264\346\230\216.md"
+++ /dev/null
@@ -1,14 +0,0 @@
-# Public network URL description
-
-| Software type | Software name | Path | Type | Content | Purpose |
-|------|------|------|------|------|------|
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/config/config.ini | Public URL | https://gitee.com/ascend/mstt/blob/master/profiler/msprof_analyze/advisor/doc/Samples%20of%20Fused%20Operator%20API%20Replacement.md | Reference samples of Advisor optimization techniques |
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/config/config.ini | Public URL | https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/modeldevpt/ptmigr/AImpug_0067.html | Reference samples of Advisor optimization techniques |
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/config/config.ini | Public URL | https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/devtools/auxiliarydevtool/aoe_16_045.html | Reference samples of Advisor optimization techniques |
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/config/config.ini | Public URL | https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/converter_tool_ascend.html#aoe-auto-tuning | Reference samples of Advisor optimization techniques |
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/config/config.ini | Public URL | https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/modeldevpt/ptmigr/AImpug_0059.html | Reference samples of Advisor optimization techniques |
-| Open-source software | MindStudio Training Tools - msprof-analyze | /profiler/msprof_analyze/config/config.ini | Public URL | https://gitee.com/ascend/mstt/tree/master/profiler/msprof_analyze | msprof-analyze tool URL |
-| Open-source software | MindStudio Training Tools - msprof-analyze | /profiler/msprof_analyze/LICENSE | Public URL | http://www.apache.org/licenses/LICENSE-2.0 | Open-source license URL |
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/rules/aicpu_rules.ymal | Public URL | https://gitee.com/ascend/mstt/blob/master/profiler/msprof_analyze/advisor/doc/Samples%20of%20AI%20CPU%20Operator%20Replacement.md | AI CPU operator replacement samples |
-| Open-source software | MindStudio Training Tools - msprof-analyze advisor | /profiler/msprof_analyze/advisor/rules/environment_variable_info.yaml | Public URL | https://support.huawei.com/enterprise/zh/doc/EDOC1100371278/5eeeed85?idPath=23710424 | Networking guide |
-| Open-source software | MindStudio Training Tools - msprof-analyze | /profiler/msprof_analyze/config/config.ini | Public URL | pmail_mindstudio@huawei.com | Public email address |