diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/README.md b/plugins/tensorboard-plugins/tb_graph_ascend/README.md index 737292598ecfdf0fb39d676ce3352fb4b246076e..cd578e3f4d7d597229854e1362afe05753600f03 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/README.md +++ b/plugins/tensorboard-plugins/tb_graph_ascend/README.md @@ -8,62 +8,64 @@ ### 1. 相关依赖 - `python >= 3.7 ,tensorboard >= 2.11.2,numpy <= 1.26.3` +`python >= 3.7 ,tensorboard >= 2.11.2,numpy <= 1.26.3` ### 2. 安装方式 #### 2.1 pip 安装(推荐) - - 现本插件已经上传到 pypi 社区,用户可在 python 环境下直接通过以下 pip 指令进行安装: - ``` - pip install tb-graph-ascend - ``` - - 也可在 pypi 社区上下载离线 whl 包,传输到无法访问公网的环境上离线安装使用。访问[下载链接](https://pypi.org/project/tb-graph-ascend/#files)选择 whl 包进行下载,之后便可使用指令安装(此处{version}为 whl 包实际版本) - ``` - pip install tb-graph_ascend_{version}-py3-none-any.whl - ``` +- 现本插件已经上传到 pypi 社区,用户可在 python 环境下直接通过以下 pip 指令进行安装: + ``` + pip install tb-graph-ascend + ``` +- 也可在 pypi 社区上下载离线 whl 包,传输到无法访问公网的环境上离线安装使用。访问[下载链接](https://pypi.org/project/tb-graph-ascend/#files)选择 whl 包进行下载,之后便可使用指令安装(此处{version}为 whl 包实际版本) + ``` + pip install tb-graph_ascend_{version}-py3-none-any.whl + ``` #### 2.2 从源代码安装 1. 从仓库下载源码并切换到 master 分支: - ``` - git clone https://gitee.com/ascend/mstt.git -b master - ``` + ``` + git clone https://gitee.com/ascend/mstt.git -b master + ``` 2. 进入目录 `plugins/tensorboard-plugins/tb_graph_ascend` 下 3. 编译前端代码,根据操作系统选取不同指令 - ``` - cd fe - // 安装前端依赖 - npm install --force - // Windows系统 - npm run buildWin - // 其他可使用cp指令的系统,如Linux或Mac - npm run buildLinux - ``` + ``` + cd fe + // 安装前端依赖 + npm install --force + // Windows系统 + npm run buildWin + // 其他可使用cp指令的系统,如Linux或Mac + npm run buildLinux + ``` - **注意**: 此步骤需要安装 [Node.js](https://nodejs.org/zh-cn/download) 环境 + **注意**: 此步骤需要安装 [Node.js](https://nodejs.org/zh-cn/download) 环境 4. 
回到上级目录直接安装: - ``` - cd ../ - python setup.py develop - ``` - - 或: 构建 whl 包安装 - ``` - python setup.py bdist_wheel - ``` - 在 `plugins/tensorboard-plugins/tb_graph_ascend/dist` 目录下取出 whl 包,使用以下指令安装(此处{version}为 whl 包实际版本) - ``` - pip install tb-graph_ascend_{version}-py3-none-any.whl - ``` + ``` + cd ../ + python setup.py develop + ``` + +- 或: 构建 whl 包安装 + ``` + python setup.py bdist_wheel + ``` + 在 `plugins/tensorboard-plugins/tb_graph_ascend/dist` 目录下取出 whl 包,使用以下指令安装(此处{version}为 whl 包实际版本) + ``` + pip install tb-graph_ascend_{version}-py3-none-any.whl + ``` ### 3. 解析数据说明 - 将通过[msprobe](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe#10-%E5%88%86%E7%BA%A7%E5%8F%AF%E8%A7%86%E5%8C%96%E6%9E%84%E5%9B%BE%E6%AF%94%E5%AF%B9)工具构图功能采集得到的文件后缀为.vis 的模型结构文件(文件本身为 json 格式)放置于某个文件夹中,路径名称下文称之为 `output_path` - - E.g. \ +将通过[msprobe](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe#10-%E5%88%86%E7%BA%A7%E5%8F%AF%E8%A7%86%E5%8C%96%E6%9E%84%E5%9B%BE%E6%AF%94%E5%AF%B9)工具构图功能采集得到的文件后缀为.vis 的模型结构文件(文件本身为 json 格式)放置于某个文件夹中,路径名称下文称之为 `output_path` + +- E.g. \ `---output_path` \ `-----output.vis` \ `-----output2.vis` @@ -90,39 +92,47 @@ 注意:如果`--logdir` 指定目录下的文件太大或太多,请等候,刷新浏览器查看加载结果。 -3. 建议在本地启动tensorboard,如果网络浏览器与启动 TensorBoard 的机器不在同一台机器上,需要远程启动,可参照[远程启动方式](#413-远程查看数据),但需用户自行评估**安全风险**。 +3. 
建议在本地启动 tensorboard,如果网络浏览器与启动 TensorBoard 的机器不在同一台机器上,需要远程启动,可参照[远程启动方式](#413-远程查看数据),但需用户自行评估**安全风险**。 ## 三、浏览器查看 + **注意:本工具不支持同时通过多个浏览器窗口同时访问同一个 TensorBoard 服务,否则会出现页面无法正常显示的情况。** ### 3.1 主界面 - ![输入图片说明](./doc/images/main-page.png) ### 3.2 操作方式: -- **节点双击打开,单击选中。** -- **选中的节点边框呈现蓝色,比对场景下若其存在对应节点,则对应节点边框为浅蓝色。** -- **键盘 WS 根据鼠标位置放大缩小,AD 左右移动。** -- **鼠标滚轮上下移动,鼠标可拖动页面。** -- **比对场景鼠标右键可选中节点,并可展开至对应侧的节点并选中。** +- **节点双击打开,单击选中。** +- **选中的节点边框呈现蓝色,比对场景下若其存在对应节点,则对应节点边框为浅蓝色。** +- **键盘 WS 根据鼠标位置放大缩小,AD 左右移动。** +- **鼠标滚轮上下移动,鼠标可拖动页面。** +- **比对场景鼠标右键可选中节点,并可展开至对应侧的节点并选中。** ![输入图片说明](./doc/images/operator-image.png) + ### 3.3 名称搜索 + ![输入图片说明](./doc/images/vis_search_info.png) + ### 3.4 精度筛选/溢出筛选 + 注意:单图场景不存在精度筛选和溢出筛选,下图为双图比对场景。
![输入图片说明](./doc/images/vis_precision_info.png) + ### 3.5 未匹配节点筛选 + 参考匹配说明 ,不符合匹配规则的节点为无匹配节点,颜色标灰。适用于排查两个模型结构差异的场景。
![输入图片说明](./doc/images/vis_unmatch_info.png) + ### 3.6 手动选择节点匹配 + 可通过浏览器界面,通过鼠标选择两个待匹配的灰色节点进行匹配。当前暂不支持真实数据模式。
如果选中"操作选中节点及其子节点":
-点击匹配后会将两个节点及其子节点按照Module名称依次匹配,取消匹配后会将子节点的匹配关系清除。
+点击匹配后会将两个节点及其子节点按照 Module 名称依次匹配,取消匹配后会将子节点的匹配关系清除。
否则:
点击匹配后只会将两个节点进行匹配,取消匹配后会将节点的匹配关系清除 注意:匹配结束之后,需要点击保存才能持久化到源文件里面 @@ -130,43 +140,52 @@ ![输入图片说明](./doc/images/vis_match_info.png) ### 3.7 生成匹配配置文件 + 可保存已经已匹配节点的匹配关系到配置文件中,并支持读取配置文件中的数据,进行匹配操作。
-默认保存在当前目录下,文件名为`[当前文件名].vis.config`,每次切换文件都会扫描当前录下的后缀名为.vis.config配置文件,并更新配置文件列表。 +默认保存在当前目录下,文件名为`[当前文件名].vis.config`,每次切换文件都会扫描当前目录下后缀名为 .vis.config 的配置文件,并更新配置文件列表。 注意:匹配结束之后,需要点击保存才能持久化到源文件里面 ![输入图片说明](./doc/images/vis_save_match_info.png) +### 3.8 支持用户自定义精度指标配置 +![输入图片说明](./doc/images/vis_update_precision.png) ## 四、附录 ### 4.1 安全加固建议 #### 4.1.1 免责声明 + 本工具为基于 TensorBoard 底座开发的插件,使用本插件需要基于 TensorBoard 运行,请自行关注 TensorBoard 相关安全配置和安全风险。 -打开本工具时,本工具会对logdir目录下的vis文件以及其父目录进行安全检查,如果存在安全风险,本工具会展示如下提示信息,询问用户是否继续执行,用户选择继续执行后,可以操作未通过安全检查的文件和目录,用户需要自行承担操作风险。如果用户选择不继续执行,则用户只能操作通过安全检查的文件。 +打开本工具时,本工具会对 logdir 目录下的 vis 文件以及其父目录进行安全检查,如果存在安全风险,本工具会展示如下提示信息,询问用户是否继续执行,用户选择继续执行后,可以操作未通过安全检查的文件和目录,用户需要自行承担操作风险。如果用户选择不继续执行,则用户只能操作通过安全检查的文件。 ![输入图片说明](./doc/images/saFe_warning.png) -#### 4.1.2 TensorBoard版本说明 + +#### 4.1.2 TensorBoard 版本说明 + 满足[相关依赖](#1-相关依赖)中要求的 TensorBoard 版本皆可正常使用本插件功能,但为 TensorBoard 本身安全风险考虑,建议使用最新版本 TensorBoard 。 + #### 4.1.3 远程查看数据 如果网络浏览器与启动 TensorBoard 的机器不在同一台机器上, TensorBoard 提供了远程查看数据的指令启动方式,但此种方式会将服务器对应端口在局域网内公开(全零监听),请用户自行关注安全风险。 - * 在启动指令尾部加上`--bind_all`或`--host={服务器IP}`参数启用远程查看方式,如: +- 在启动指令尾部加上`--bind_all`或`--host={服务器IP}`参数启用远程查看方式,如: - ``` - tensorboard --logdir output_path --port=6006 --host=xxx.xxx.xxx.xxx - 或 - tensorboard --logdir output_path --port=6006 --bind_all - ``` + ``` + tensorboard --logdir output_path --port=6006 --host=xxx.xxx.xxx.xxx + 或 + tensorboard --logdir output_path --port=6006 --bind_all + ``` - * 在打开浏览器访问界面时,需将 URL 内主机名由`localhost`替换为主机的 ip 地址,如`http://xxx.xxx.xxx.xxx:6006` +- 在打开浏览器访问界面时,需将 URL 内主机名由`localhost`替换为主机的 ip 地址,如`http://xxx.xxx.xxx.xxx:6006` ### 4.2 通信矩阵 -| 序号 | 代码仓 | 功能 | 源设备 | 源IP | 源端口 | 目的设备 | 目的IP | 目的端口
(侦听) | 协议 | 端口说明 | 端口配置| 侦听端口是否可更改 | 所属平面 | 版本 | 特殊场景 | 备注 | -|:----|:---|:--|:--|:---|:---|:---|:----|:--|:--|:---|:---|:---|:---|:-----|:-----|:---| -| 1 | tensorboard-plugins | TensorBoard底座前后端通信 | 访问TensorBoard浏览器所在机器 | 访问TensorBoard浏览器所在机器ip | | TensorBoard服务所在机器 | TensorBoard服务所在服务器的ip | 6006 | HTTP | tensorboard服务通信 | `--port` | 可修改 | 业务面 | 所有版本 | 无 | | + +| 序号 | 代码仓 | 功能 | 源设备 | 源 IP | 源端口 | 目的设备 | 目的 IP | 目的端口
(侦听) | 协议 | 端口说明 | 端口配置 | 侦听端口是否可更改 | 所属平面 | 版本 | 特殊场景 | 备注 | +| :--- | :------------------ | :------------------------- | :------------------------------ | :--------------------------------- | :----- | :----------------------- | :------------------------------ | :-------------------- | :--- | :------------------- | :------- | :----------------- | :------- | :------- | :------- | :--- | +| 1 | tensorboard-plugins | TensorBoard 底座前后端通信 | 访问 TensorBoard 浏览器所在机器 | 访问 TensorBoard 浏览器所在机器 ip | | TensorBoard 服务所在机器 | TensorBoard 服务所在服务器的 ip | 6006 | HTTP | tensorboard 服务通信 | `--port` | 可修改 | 业务面 | 所有版本 | 无 | | + ### 4.3 公网地址说明 -[公网地址说明](./doc/公网地址说明.csv) +[公网地址说明](./doc/公网地址说明.csv) diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_update_precision.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_update_precision.png new file mode 100644 index 0000000000000000000000000000000000000000..b764fc983c0178e6f2f1d77807a6a4635a7dbd9e Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_update_precision.png differ diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/fe/package-lock.json b/plugins/tensorboard-plugins/tb_graph_ascend/fe/package-lock.json index 743efb7a0103e2718ada03cf069d4ad3396bdb8d..00185f250091f7a5d19fc126fb1441716015e6fd 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/fe/package-lock.json +++ b/plugins/tensorboard-plugins/tb_graph_ascend/fe/package-lock.json @@ -19,6 +19,7 @@ "@polymer/polymer": "^3.5.1", "@vaadin/button": "24.6.5", "@vaadin/checkbox": "24.6.5", + "@vaadin/checkbox-group": "^24.6.5", "@vaadin/combo-box": "24.6.5", "@vaadin/confirm-dialog": "24.6.5", "@vaadin/context-menu": "24.6.5", @@ -993,6 +994,24 @@ "lit": "^3.0.0" } }, + "node_modules/@vaadin/checkbox-group": { + "version": "24.6.5", + "resolved": "https://registry.npmmirror.com/@vaadin/checkbox-group/-/checkbox-group-24.6.5.tgz", + "integrity": 
"sha512-1K34LnXxINlMSrwAynLW46nyAGqz6kZW4ogZeKESXa+JogjOiHCaVy127xIKYmfJD2yR4ti31VPQKPNQXlZpxA==", + "license": "Apache-2.0", + "dependencies": { + "@open-wc/dedupe-mixin": "^1.3.0", + "@polymer/polymer": "^3.0.0", + "@vaadin/a11y-base": "~24.6.5", + "@vaadin/checkbox": "~24.6.5", + "@vaadin/component-base": "~24.6.5", + "@vaadin/field-base": "~24.6.5", + "@vaadin/vaadin-lumo-styles": "~24.6.5", + "@vaadin/vaadin-material-styles": "~24.6.5", + "@vaadin/vaadin-themable-mixin": "~24.6.5", + "lit": "^3.0.0" + } + }, "node_modules/@vaadin/combo-box": { "version": "24.6.5", "resolved": "https://registry.npmmirror.com/@vaadin/combo-box/-/combo-box-24.6.5.tgz", diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/fe/package.json b/plugins/tensorboard-plugins/tb_graph_ascend/fe/package.json index f3416a523419b3b6b9367eb5258e56aa7e317f9c..9469af8c44eacb144274ee17ca16c297da3ac43b 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/fe/package.json +++ b/plugins/tensorboard-plugins/tb_graph_ascend/fe/package.json @@ -38,6 +38,7 @@ "@polymer/polymer": "^3.5.1", "@vaadin/button": "24.6.5", "@vaadin/checkbox": "24.6.5", + "@vaadin/checkbox-group": "^24.6.5", "@vaadin/combo-box": "24.6.5", "@vaadin/confirm-dialog": "24.6.5", "@vaadin/context-menu": "24.6.5", diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_color_select/index.ts b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_color_select/index.ts index 44c9f5a7860c67ce497aa646d92755f19be31a42..10bedce9b586c57cc87b73b87690f42abf9d8ed0 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_color_select/index.ts +++ b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_color_select/index.ts @@ -15,6 +15,7 @@ */ import '@vaadin/combo-box'; +import '@vaadin/text-field'; import * as _ from 'lodash'; import { PolymerElement, html } from '@polymer/polymer'; import 
{ Notification } from '@vaadin/notification'; @@ -25,7 +26,7 @@ import request from '../../../utils/request'; import { DarkModeMixin } from '../../../polymer/dark_mode_mixin'; import { LegacyElementMixin } from '../../../polymer/legacy_element_mixin'; import { PRECISION_DESC } from '../../../common/constant'; - +import '../tf_filter_precision_error/index' const UNMATCHED_NODE_NAME = '无匹配节点'; @customElement('tf-color-select') class Legend extends LegacyElementMixin(DarkModeMixin(PolymerElement)) { @@ -193,6 +194,7 @@ class Legend extends LegacyElementMixin(DarkModeMixin(PolymerElement)) { >
+
+ `; @property({ type: Boolean }) _colorSetting: boolean = true; // 颜色设置按钮 + @property({ type: Boolean }) + filterDialogOpened: boolean = false; + @property({ type: Boolean }) isSingleGraph = false; @@ -483,11 +489,41 @@ class Legend extends LegacyElementMixin(DarkModeMixin(PolymerElement)) { } } } + // 请求后端接口,更新筛选数据 + updateFilterData = async () => { + if (_.isEmpty(this.selectColor)) { + return; + } + try { + const params = { + run: this.selection.run, + tag: this.selection.tag, + microStep: this.selection.microStep, + precision_index: this.selectColor.join(','), + }; + + const precisionmenu = await request({ url: 'screen', method: 'GET', params: params }); + this.set('precisionmenu', precisionmenu); + this.set('selectedPrecisionNode', precisionmenu?.[0] || ''); + } + catch (error) { + Notification.show(`获取精度菜单失败,请检查 toggleCheckbox 和 vis 文件中的数据。`, { + position: 'middle', + duration: 4000, + theme: 'error', + }); + } + } toggleVisibility(): void { this.set('_colorSetting', !this._colorSetting); } + _clickFilter(event): void { + event.stopPropagation(); + this.set('filterDialogOpened', true); + } + _clickSetting(event): void { event.stopPropagation(); this.set('_colors', true); diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_filter_precision_error/index.ts b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_filter_precision_error/index.ts new file mode 100644 index 0000000000000000000000000000000000000000..8ddf63321ad355d134eb83797e97cf7f142a17a0 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_filter_precision_error/index.ts @@ -0,0 +1,104 @@ +import '@vaadin/checkbox'; +import '@vaadin/confirm-dialog' +import '@vaadin/checkbox-group'; +import '@vaadin/text-field'; +import { Notification } from '@vaadin/notification'; +import { customElement, property, observe } from '@polymer/decorators'; +import { html, PolymerElement } 
from '@polymer/polymer'; +import request from '../../../utils/request'; +import { isEmpty } from 'lodash'; + +@customElement('tf-filter-precision-error') +class TfFilterPrecisionError extends PolymerElement { + static readonly template = html` + + + + + + + + + ` + + @property({ type: Boolean, notify: true }) + filterDialogOpened: boolean = false; + + @property({ type: Array }) + filterValue: string[] = []; + + @property({ type: Object }) + selection: any; + + @property({ type: Object }) + updateFilterData: Function = () => { }; + + MAX_RELATIVE_ERR = "0"; + MIN_RELATIVE_ERR = "1"; + MEAN_RELATIVE_ERR = "2"; + NORM_RELATIVE_ERR = "3"; + + @observe('selection') + _selectionChanged() { + this.set('filterValue', [this.MAX_RELATIVE_ERR, this.MIN_RELATIVE_ERR, this.MEAN_RELATIVE_ERR, this.NORM_RELATIVE_ERR]); + } + override ready(): void { + super.ready(); + const filterDialog = this.shadowRoot?.querySelector('#filter-dialog') as HTMLElement; + filterDialog?.addEventListener('confirm', this.onFilterDialogConfirm) + this.set('filterValue', [this.MAX_RELATIVE_ERR, this.MIN_RELATIVE_ERR, this.MEAN_RELATIVE_ERR, this.NORM_RELATIVE_ERR]); + } + onFilterDialogConfirm = async (e: any) => { + if (isEmpty(this.filterValue)) { + Notification.show(`错误: 精度误差计算指标为空,请选择指标`, { + position: 'middle', + duration: 1800, + theme: 'error', + }); + setTimeout(() => { + this.set('filterDialogOpened', true); + }, 1800) + return; + } + const data = { + metaData: this.selection, + filterValue: this.filterValue + }; + const { success, error } = await request({ url: 'updatePrecisionError', method: 'POST', data }); + if (success) { + const updateHierarchyData = new CustomEvent('updateHierarchyData', { bubbles: true, composed: true }); + this.dispatchEvent(updateHierarchyData); + this.set('filterDialogOpened', false); + this.updateFilterData(); + Notification.show(`操作成功:精度误差值已更新`, { + position: 'middle', + duration: 2000, + theme: 'success', + }); + } + else { + Notification.show(`精度误差计算错误:${error}`, { 
+ position: 'middle', + duration: 1800, + theme: 'error', + }); + setTimeout(() => { + this.set('filterDialogOpened', true); + }, 1800) + } + } + +} \ No newline at end of file diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_manual_match/index.ts b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_manual_match/index.ts index d9211712a0455ab450194f7c86b51034de7d3c1a..97b4899f7f5febd263ae7d806de6fe32bbda7f6c 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_manual_match/index.ts +++ b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_controls_board/components/tf_manual_match/index.ts @@ -67,12 +67,14 @@ class Legend extends PolymerElement { .match-checkbox { font-size: 14px; } + .vaadin-details-title { font-size: 14px; color: #333333; font-weight: 600; margin-bottom: 0; } + .vaadin-details vaadin-details-summary { font-size: 15px; color: #333333; diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_info_board/components/tf_vaddin_text_table/index.ts b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_info_board/components/tf_vaddin_text_table/index.ts index 7e97498e6c4e143f8cc33b42c3a528445e1c82d9..e490f5e9b598d9f4fb12c6aa919da4cd3aada593 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_info_board/components/tf_vaddin_text_table/index.ts +++ b/plugins/tensorboard-plugins/tb_graph_ascend/fe/src/graph_info_board/components/tf_vaddin_text_table/index.ts @@ -70,7 +70,7 @@ class TfVaadinTable extends PolymerElement { cursor: pointer; position: relative; right: 58px; - bottom: 106px; + bottom: 180px; } .copy-button:hover { background: #0056b3; diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/controllers/match_nodes_controller.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/controllers/match_nodes_controller.py index 
a15005adbe008467cb8b18f5fbadae9fb958df3d..904247786c215a8232c71a4aa71d2cd980dbe3bd 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/controllers/match_nodes_controller.py +++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/controllers/match_nodes_controller.py @@ -30,6 +30,16 @@ class MatchNodesController: return False return True + @staticmethod + def get_opposite_node_name(node_name): + opposite_node_name = '' + # 如果node_name包含forward,则将forward替换为backward;否则将backward替换为forward + if 'forward' in node_name: + opposite_node_name = node_name.replace('forward', 'backward') + else: + opposite_node_name = node_name.replace('backward', 'forward') + return opposite_node_name + @staticmethod def process_task_add(graph_data, npu_node_name, bench_node_name, task): if not MatchNodesController.is_same_node_type(graph_data, npu_node_name, bench_node_name): @@ -39,29 +49,45 @@ class MatchNodesController: } result = {} + opposite_result = {} + opposite_npu_node_name = MatchNodesController.get_opposite_node_name(npu_node_name) + opposite_bench_node_name = MatchNodesController.get_opposite_node_name(bench_node_name) if task == 'md5': result = MatchNodesController.process_md5_task_add(graph_data, npu_node_name, bench_node_name) + opposite_result = MatchNodesController.process_md5_task_add(graph_data, opposite_npu_node_name, opposite_bench_node_name) elif task == 'summary': result = MatchNodesController.process_summary_task_add(graph_data, npu_node_name, bench_node_name) + opposite_result = MatchNodesController.process_summary_task_add(graph_data, opposite_npu_node_name, opposite_bench_node_name) else: result = { 'success': False, 'error': 'task类型错误' } + result['success'] = result.get('success') or opposite_result.get('success') + if not result.get('success'): + result['error'] = f'当前节点:{result.get("error", "")}。对侧节点:{opposite_result.get("error", "")}' return result @staticmethod def process_task_delete(graph_data, npu_node_name, 
bench_node_name, task): result = {} + opposite_result = {} + opposite_npu_node_name = MatchNodesController.get_opposite_node_name(npu_node_name) + opposite_bench_node_name = MatchNodesController.get_opposite_node_name(bench_node_name) if task == 'md5': result = MatchNodesController.process_md5_task_delete(graph_data, npu_node_name, bench_node_name) + opposite_result = MatchNodesController.process_md5_task_delete(graph_data, opposite_npu_node_name, opposite_bench_node_name) elif task == 'summary': result = MatchNodesController.process_summary_task_delete(graph_data, npu_node_name, bench_node_name) + opposite_result = MatchNodesController.process_summary_task_delete(graph_data, opposite_npu_node_name, opposite_bench_node_name) else: result = { 'success': False, 'error': 'task类型错误' } + result['success'] = result.get('success') or opposite_result.get('success') + if not result.get('success'): + result['error'] = f'当前节点:{result.get("error", "")}。对侧节点:{opposite_result.get("error", "")}' return result @staticmethod @@ -215,8 +241,10 @@ class MatchNodesController: @staticmethod def process_md5_task_add(graph_data, npu_node_name, bench_node_name): - npu_node_data = graph_data.get('NPU', {}).get('node', {}).get(npu_node_name, {}) - bench_node_data = graph_data.get('Bench', {}).get('node', {}).get(bench_node_name, {}) + npu_node_data = graph_data.get('NPU', {}).get('node', {}).get(npu_node_name) + bench_node_data = graph_data.get('Bench', {}).get('node', {}).get(bench_node_name) + if not npu_node_data or not bench_node_data: + return {'success': False, 'error': '节点不存在'} # 去除节点名称前缀 npu_input_data = GraphUtils.remove_prefix(npu_node_data.get('input_data', {}), npu_node_name + '.') bench_input_data = GraphUtils.remove_prefix(bench_node_data.get('input_data', {}), bench_node_name + '.') @@ -285,8 +313,13 @@ class MatchNodesController: 'success': False, 'error': "操作失败:节点未匹配,请先匹配节点", } - npu_node_data = graph_data.get('NPU', {}).get('node', {}).get(npu_node_name, {}) - bench_node_data = 
graph_data.get('Bench', {}).get('node', {}).get(bench_node_name, {}) + npu_node_data = graph_data.get('NPU', {}).get('node', {}).get(npu_node_name) + bench_node_data = graph_data.get('Bench', {}).get('node', {}).get(bench_node_name) + if not npu_node_data or not bench_node_data: + return { + 'success': False, + 'error': "操作失败:节点不存在", + } # 在原始数据上,删除匹配节点,和匹配节点信息 npu_node_data['matched_node_link'] = [] bench_node_data['matched_node_link'] = [] @@ -309,8 +342,13 @@ class MatchNodesController: 'success': False, 'error': "操作失败:节点未匹配,请先匹配节点", } - npu_node_data = graph_data.get('NPU', {}).get('node', {}).get(npu_node_name, {}) - bench_node_data = graph_data.get('Bench', {}).get('node', {}).get(bench_node_name, {}) + npu_node_data = graph_data.get('NPU', {}).get('node', {}).get(npu_node_name) + bench_node_data = graph_data.get('Bench', {}).get('node', {}).get(bench_node_name) + if not npu_node_data or not bench_node_data: + return { + 'success': False, + 'error': "操作失败:节点不存在", + } # 在原始数据上,删除匹配节点,和匹配节点信息 npu_node_data['matched_node_link'] = [] bench_node_data['matched_node_link'] = [] diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/json_graph_service.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/json_graph_service.py index a900850323ec091db6346d593f70eae935260e2a..ba95135dbe58e575db5db96e2de6cd37efd3fb71 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/json_graph_service.py +++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/json_graph_service.py @@ -24,12 +24,14 @@ from ..utils.global_state import GraphState from ..controllers.match_nodes_controller import MatchNodesController from ..controllers.layout_hierarchy_controller import LayoutHierarchyController from ..utils.global_state import NPU_PREFIX, BENCH_PREFIX, NPU, BENCH, SINGLE +from ..utils.global_state import MAX_RELATIVE_ERR, MIN_RELATIVE_ERR, MEAN_RELATIVE_ERR, NORM_RELATIVE_ERR from .base_graph_service import 
GraphServiceStrategy logger = tb_logging.get_logger() class JsonGraphService(GraphServiceStrategy): + def __init__(self, run_path, tag): super().__init__(run_path, tag) @@ -197,6 +199,51 @@ class JsonGraphService(GraphServiceStrategy): node_type_name = '调试侧' if graph_type == NPU else '标杆侧' return {'success': False, 'error': f'{node_type_name}节点展开或收起发生错误', 'data': None} + def update_precision_error(self, meta_data, filter_value): + try: + graph_data, error_message = GraphUtils.get_graph_data(meta_data) + if error_message: + return {'success': False, 'error': error_message} + npu_node_list = graph_data.get(NPU, {}).get('node', {}) + for _, node_info in npu_node_list.items(): + output_statistical_diff = node_info.get('output_data', None) + if not node_info.get('matched_node_link') or not output_statistical_diff: + continue + max_rel_error = -1 + # 根据filter_value 的选择指标计算新的误差值 + for _, diff_values in output_statistical_diff.items(): + filter_diff_rel = [] + if MAX_RELATIVE_ERR in filter_value: + filter_diff_rel.append(diff_values.get('MaxRelativeErr')) + if MIN_RELATIVE_ERR in filter_value: + filter_diff_rel.append(diff_values.get('MinRelativeErr')) + if NORM_RELATIVE_ERR in filter_value: + filter_diff_rel.append(diff_values.get('NormRelativeErr')) + if MEAN_RELATIVE_ERR in filter_value: + filter_diff_rel.append(diff_values.get('MeanRelativeErr')) + # 过滤掉N/A + filter_diff_rel = [x for x in filter_diff_rel if x and x != 'N/A'] + # 如果output指标中存在 Nan/inf/-inf, 直接标记为最大值 + if "Nan" in filter_diff_rel or "inf" in filter_diff_rel or "-inf" in filter_diff_rel: + max_rel_error = 1 + break + filter_diff_rel = [GraphUtils.convert_to_float(x) for x in filter_diff_rel] + max_rel_error_for_key = max(filter_diff_rel) if filter_diff_rel else 0 + max_rel_error = max(max_rel_error, max_rel_error_for_key) + if max_rel_error != -1: + node_info.setdefault('data', {})['precision_index'] = min(max_rel_error, 1) + return {'success': True, 'data': {}} + except Exception as e: + 
logger.error('更新精度误差失败:' + str(e)) + return {'success': False, 'error': str(e)} + + def update_hierarchy_data(self, graph_type): + if (graph_type == NPU or graph_type == BENCH): + hierarchy = LayoutHierarchyController.update_hierarchy_data(graph_type) + return {'success': True, 'data': hierarchy} + else: + return {'success': False, 'error': '节点类型错误'} + def get_node_info(self, node_info, meta_data): graph_data, error_message = GraphUtils.get_graph_data(meta_data) if error_message: diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/global_state.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/global_state.py index 9ed8c4920e62f62df8241a9b1b24876ae3e098f6..02db7d15b63fc863a0ad3b251376344b481e034b 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/global_state.py +++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/global_state.py @@ -49,6 +49,12 @@ API = 1 MULTI_COLLECTION = 8 API_LIST = 9 +# 计算指标 +MAX_RELATIVE_ERR = "0" +MIN_RELATIVE_ERR = "1" +MEAN_RELATIVE_ERR = "2" +NORM_RELATIVE_ERR = "3" + class GraphState: # 模块级全局变量 diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/graph_utils.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/graph_utils.py index 569b55f99a4c153d81c651b9c6743a22648fb6ec..5e77cd88c58e206d742703f0d12102a905e23c2c 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/graph_utils.py +++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/utils/graph_utils.py @@ -173,6 +173,12 @@ class GraphUtils: @staticmethod def convert_to_float(value): try: + if isinstance(value, str): + # 处理'0.0%, 由于Mean小于1e-06, 建议不参考此相对误差,请参考绝对误差'和'0.0%'的情况 + value = value.split(',')[0] + if value.endswith('%'): + value = value.replace('%', '').strip() + return float(value) / 100.0 return float(value) except ValueError: return float('nan') diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/views/graph_views.py 
b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/views/graph_views.py index d590d3c533d3729da0dabc166e1d94657d9591e1..d3fe234c564309afa401c3df22f947d85b4cd767 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/views/graph_views.py +++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/views/graph_views.py @@ -98,6 +98,18 @@ class GraphView: result = strategy.load_graph_all_node_list(meta_data) response = http_util.Respond(request, result, "application/json") return response + + # 更新误差节点 + @staticmethod + @wrappers.Request.application + @check_file_type + def update_precision_error(request): + data = GraphUtils.safe_json_loads(request.get_data().decode('utf-8')) + meta_data = data.get('metaData') + filter_value = data.get("filterValue") + strategy = GraphView._get_strategy(meta_data) + result = strategy.update_precision_error(meta_data, filter_value) + return http_util.Respond(request, result, "application/json") # 展开关闭节点 @staticmethod diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/plugin.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/plugin.py index 40d1e66120968c46dbfc7faba58322119f4077bb..21c5df98856789dfd6808e8d737c817a8f849f23 100644 --- a/plugins/tensorboard-plugins/tb_graph_ascend/server/plugin.py +++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/plugin.py @@ -82,6 +82,7 @@ class GraphsPlugin(base_plugin.TBPlugin): '/saveData': GraphView.save_data, '/updateColors': GraphView.update_colors, '/saveMatchedRelations': GraphView.save_matched_relations, + '/updatePrecisionError': GraphView.update_precision_error, } def is_active(self):
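
The precision-index recomputation added in `update_precision_error`, together with the percent-string handling added to `convert_to_float`, can be sketched in isolation roughly as follows. This is a minimal standalone approximation for illustration, not the plugin's code: `METRIC_KEYS` and `precision_index` are hypothetical names that do not appear in the patch, and this version of `convert_to_float` additionally catches `TypeError`, which the patched method does not.

```python
# Metric codes mirror the constants this change adds to global_state.py.
MAX_RELATIVE_ERR = "0"
MIN_RELATIVE_ERR = "1"
MEAN_RELATIVE_ERR = "2"
NORM_RELATIVE_ERR = "3"

# Hypothetical lookup table (not in the patch): metric code -> key in output_data.
METRIC_KEYS = {
    MAX_RELATIVE_ERR: "MaxRelativeErr",
    MIN_RELATIVE_ERR: "MinRelativeErr",
    MEAN_RELATIVE_ERR: "MeanRelativeErr",
    NORM_RELATIVE_ERR: "NormRelativeErr",
}


def convert_to_float(value):
    """Parse '12.5%', '0.0%, <note>' or a plain number; return NaN on failure."""
    try:
        if isinstance(value, str):
            # Keep only the part before the first comma, e.g. '0.0%, note' -> '0.0%'.
            value = value.split(",")[0].strip()
            if value.endswith("%"):
                return float(value[:-1]) / 100.0
        return float(value)
    except (TypeError, ValueError):  # TypeError handling is an addition of this sketch
        return float("nan")


def precision_index(output_data, selected):
    """Return a node's precision index clamped to [0, 1], or None if there is no data."""
    worst = -1.0
    for stats in output_data.values():
        raw = [stats.get(METRIC_KEYS[code]) for code in selected if code in METRIC_KEYS]
        # Drop missing values and 'N/A' (falsy values such as 0 are dropped too).
        raw = [x for x in raw if x and x != "N/A"]
        # Any Nan/inf/-inf marker pins the node to the maximum index.
        if any(x in ("Nan", "inf", "-inf") for x in raw):
            return 1.0
        values = [convert_to_float(x) for x in raw]
        worst = max(worst, max(values) if values else 0.0)
    return None if worst == -1.0 else min(worst, 1.0)
```

With `selected = ["0", "1"]` and an output whose `MaxRelativeErr` is `"50%"` and `MinRelativeErr` is `"10%"`, the sketch picks the larger of the two selected metrics; deselecting a metric in the dialog simply removes it from `selected` before the maximum is taken.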