diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/LICENSE b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/README.md b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/README.md new file mode 100644 index 0000000000000000000000000000000000000000..7baa121855f95bdc3469a6c715c8445d63858ea4 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/README.md @@ -0,0 +1,69 @@ +# TensorBoard Training Monitoring Visualization Plugin + +A TensorBoard plugin for visualizing model monitoring metrics. It provides heatmap and trend-chart analysis of metrics across the step, rank, and moduleName dimensions, and renders a monitoring database in a specific format as an interactive visual interface. + +## Core Features + +- **Interactive heatmaps**: visualize metric statistics across dimensions (training step, rank, module/parameter) +- **Trend analysis**: view how a metric changes along a selected dimension + +## Database Schema + +| Table | Description | +| -------------------- | ---------------------------------------- | +| `monitoring_targets` | Monitored modules/layers | +| `monitoring_metrics` | Available metrics | +| `metric_stats` | Statistic types recorded for each metric | +| `global_stats` | Global step/rank ranges | +| `metric_*_step_*` | Metric data, sharded across tables | + +## Project Files + +```text +hierarchy_plugin/ # Hierarchical model visualization plugin +├── fe/ # Frontend resources +├── server/ # Plugin backend core +monvis_plugin/ # Training monitoring visualization plugin +├── fe/ # Frontend resources +├── server/ # Plugin backend core +setup.py # Installation configuration + +``` + +## Installation + +1. Clone the project repository: + `git clone https://gitee.com/ascend/mstt.git -b poc` + +2. From the repository root (`mstt/`), build the frontend and install the plugin (the package root is `plugins/tensorboard-plugins/tb_graph_ascend`): + +```bash +# Enter the frontend directory +cd plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/fe +# Install frontend dependencies +npm install --force +# On Windows +npm run buildWin +# On Linux +npm run buildLinux +# Return to the package root (tb_graph_ascend) +cd ../.. +# Install the package locally in editable mode +pip install -e .
+``` + +3. Once installed, the plugin is detected by TensorBoard automatically. + +## Usage with TensorBoard + +1. Start the server (`--logdir` must point to the directory that contains `monitor_metrics.db`): + +```bash +tensorboard --logdir=./db --port=6008 +``` + +To view the dashboard: + +- Open http://localhost:6008 in a browser. + +- Select the `Mon_Vis` tab in the top navigation bar. diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/app.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/app.py new file mode 100644 index 0000000000000000000000000000000000000000..356e50aeaca98aed59e0a53eab56ee0c807c7789 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/app.py @@ -0,0 +1,57 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import os +from typing import Dict, Any +from tensorboard.plugins import base_plugin +from .controllers.monvis_controller import MonvisController + + +class MonVis(base_plugin.TBPlugin): + """MonVis TensorBoard Plugin for visualizing monitoring data.""" + + plugin_name = "mon_vis" + + def __init__(self, context): + super().__init__(context) + self._log_dir = context.logdir + self.db_path = os.path.join(self._log_dir, "monitor_metrics.db") + self.monvis_controller = MonvisController(self.db_path) + self.is_db_connected = self.monvis_controller.is_db_connected + + def get_plugin_apps(self) -> Dict[str, Any]: + """Return all HTTP routes for the plugin.""" + + return { + "/metrics": self.monvis_controller.request_metrics, + "/values": self.monvis_controller.request_values, + "/heatmap_data": self.monvis_controller.request_heatmap_data, + "/trend": self.monvis_controller.request_trend_data, + '/index.js': self.monvis_controller.static_file_route, + '/index.html': self.monvis_controller.static_file_route + } + + def is_active(self) -> bool: + """Determine if the plugin is active.""" + return self.is_db_connected + + def frontend_metadata(self): + """Return frontend metadata.""" + return base_plugin.FrontendMetadata( + es_module_path="/index.js", + tab_name="Mon_Vis" + ) + diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/controllers/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/controllers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/controllers/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/controllers/monvis_controller.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/controllers/monvis_controller.py new file mode 100644 index 0000000000000000000000000000000000000000..cff6e6bae2ce895be8ae3c5b9f1b162ab4620426 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/controllers/monvis_controller.py @@ -0,0 +1,103 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import os +import sqlite3 + +from pathlib import Path +from tensorboard.backend import http_util +from werkzeug import wrappers, Response, exceptions +from ..services.monvis_service import MonvisService + + +class MonvisController: + + def __init__(self, db_path): + self.db_path = db_path + self.monvis_service = MonvisService(self.db_path) + self.is_db_connected = self.monvis_service.is_db_connected + + @staticmethod + @wrappers.Request.application + def static_file_route(request): + filename = os.path.basename(request.path) + extension = os.path.splitext(filename)[1] + if extension == '.html': + mimetype = 'text/html' + elif extension == '.js': + mimetype = 'application/javascript' + else: + mimetype = 'application/octet-stream' + server_dir = Path(__file__).resolve().parent.parent + filepath = server_dir / "static" / filename + try: + with open(filepath, 'rb') as infile: + contents = infile.read() + except IOError as e: + raise exceptions.NotFound('404 Not Found') from e + return Response(contents, content_type=mimetype, headers={"X-Content-Type-Options": "nosniff"}) + + @wrappers.Request.application + def request_metrics(self, request): + """Return all available metrics and fixed stats.""" + + try: + result = self.monvis_service.get_metrics_stat() + except sqlite3.Error as e: + result = {'success': False, 'error': f"sqlite error: {str(e)}"} + except Exception as e: + result = {'success': False, 'error': str(e)} + return http_util.Respond(request, result, "application/json") + + @wrappers.Request.application + def request_values(self, request): + """Return list of values for specified metric, stat and dimension.""" + try: + metric = request.args.get('metric') + stat = request.args.get('stat') + dimension = request.args.get('dimension') + result = self.monvis_service.get_values(metric, stat, dimension) + except Exception as e: + result = {'success': False, 'error': str(e)} + 
return http_util.Respond(request, result, "application/json") + + @wrappers.Request.application + def request_heatmap_data(self, request): + """Return heatmap data for specified parameters.""" + try: + metric = request.args.get('metric') + stat = request.args.get('stat') + dimension = request.args.get('dimension') + value = request.args.get('value') + result = self.monvis_service.get_heatmap_data(metric, stat, dimension, value) + except Exception as e: + result = {'success': False, 'error': str(e)} + return http_util.Respond(request, result, "application/json") + + @wrappers.Request.application + def request_trend_data(self, request): + """Return trend data for specified parameters.""" + try: + metric = request.args.get('metric') + stat = request.args.get('stat') + dimension = request.args.get('dimension') + dim_x = request.args.get('dimX') + dim_y = request.args.get('dimYIdx') + result = self.monvis_service.get_trend_data(metric, stat, dimension, dim_x, dim_y) + except Exception as e: + result = {'success': False, 'error': str(e)} + return http_util.Respond(request, result, "application/json") + diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/database/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/database/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/database/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/database/db_connection.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/database/db_connection.py new file mode 100644 index 0000000000000000000000000000000000000000..8f27a73fd0f9fe6fb07c6e3c6ba7ba61ebd80b4a --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/database/db_connection.py @@ -0,0 +1,40 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import os +import sqlite3 +from typing import Optional +from tensorboard.util import tb_logging +logger = tb_logging.get_logger() + + +class DBConnection: + + def __init__(self, db_path): + self.db_path = db_path + self.conn = self._initialize_db_connection() + + def is_connected(self) -> bool: + """Check if database is connected.""" + return self.conn is not None + + def _initialize_db_connection(self) -> Optional[sqlite3.Connection]: + """Open the SQLite database; return None if it is missing or cannot be opened.""" + # sqlite3.connect() silently creates an empty file for a missing path, + # so check for the file first instead of reporting a false connection. + if not os.path.isfile(self.db_path): + logger.warning(f"Database file not found: {self.db_path}") + return None + try: + conn = sqlite3.connect(self.db_path, check_same_thread=False) + conn.row_factory = sqlite3.Row + return conn + except sqlite3.Error as e: + logger.error(f"Error connecting to database: {e}") + return None diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/repositories/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/repositories/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/repositories/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# ============================================================================== diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/repositories/monvis_repo.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/repositories/monvis_repo.py new file mode 100644 index 0000000000000000000000000000000000000000..74fbe02d12b54d358d5d31e7f8fb54507d98288c --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/repositories/monvis_repo.py @@ -0,0 +1,116 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import re +from typing import Dict, List, Tuple, Optional +from ..database.db_connection import DBConnection + + +class MonvisRepo: + + def __init__(self, db_path): + self.db = DBConnection(db_path) + self.conn = self.db.conn + self.is_db_connected = self.db.is_connected() + + @classmethod + def get_module_name(cls, row: Dict) -> str: + """Generate module name from row data.""" + return "_".join(( + str(row["target_id"]), + str(row["vpp_stage"]), + row["target_name"], + str(row["micro_step"]) + )) + + def query_metrics_stat(self) -> List[Dict]: + """Get all metrics with their stat names (comma-separated) from the database.""" + query = """ + SELECT m.metric_name, GROUP_CONCAT(ms.stat_name) as stats + FROM monitoring_metrics m + LEFT JOIN metric_stats ms ON m.metric_id = ms.metric_id + GROUP BY m.metric_id + """ + with self.conn as c: + cursor = c.execute(query) + rows = cursor.fetchall() + return [dict(row) for row in rows] + + def query_global_stats(self) -> Dict[str, int]: + """Get global statistics from database.""" + query = "SELECT stat_name, stat_value FROM global_stats" + with self.conn as c: + cursor = c.execute(query) + rows = cursor.fetchall() + return {row['stat_name']: row['stat_value'] for row in rows} + + def query_module_names(self) -> Dict[int, str]: + """Get all module names with their target IDs.""" + query = "SELECT target_id, vpp_stage, target_name, micro_step FROM monitoring_targets" + with self.conn as c: + cursor = c.execute(query) + rows = cursor.fetchall() + return {row["target_id"]: self.get_module_name(row) for row in rows} + + def query_metric_id(self, metric_name: str) -> Optional[int]: + """Get metric ID for a given metric name.""" + query = "SELECT metric_id FROM monitoring_metrics WHERE metric_name = ?"
+ with self.conn as c: + cursor = c.execute(query, (metric_name,)) + row = cursor.fetchone() + return row['metric_id'] if row else None + + def query_relevant_tables(self, metric_id: int) -> List[str]: + """Get all tables relevant to a specific metric ID.""" + query = """ + SELECT name FROM sqlite_master + WHERE type='table' AND name LIKE 'metric_%_step_%' + """ + with self.conn as c: + cursor = c.execute(query) + tables = [table['name'] for table in cursor] + relevant_tables = [] + for table in tables: + match = re.match(r'metric_(\d+)_step_(\d+)_(\d+)', table) + if match and int(match.group(1)) == metric_id: + relevant_tables.append(table) + return relevant_tables + + def query_heatmap_data(self, table: str, stat: str, condition: str, params: tuple) -> List[Dict]: + """Get data for heatmap visualization.""" + # `table` comes from sqlite_master; `stat` is interpolated as a column name and must be validated by the caller. + query = f""" + SELECT t.rank, t.step, t.target_id, t.{stat}, m.target_name, m.vpp_stage, m.micro_step + FROM {table} t + JOIN monitoring_targets m ON t.target_id = m.target_id + WHERE {condition} + """ + with self.conn as c: + cursor = c.execute(query, params) + rows = cursor.fetchall() + return [dict(row) for row in rows] + + def query_trend_data(self, table: str, stat: str, condition: str, params: tuple) -> List[Dict]: + """Get data for trend visualization.""" + # `table` comes from sqlite_master; `stat` is interpolated as a column name and must be validated by the caller. + query = f""" + SELECT t.step, t.rank, t.target_id, t.{stat}, m.target_name, m.vpp_stage, m.micro_step + FROM {table} t + JOIN monitoring_targets m ON t.target_id = m.target_id + WHERE {condition} + """ + with self.conn as c: + cursor = c.execute(query, params) + return [dict(row) for row in cursor] + diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/services/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/services/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2 --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/services/__init__.py
@@ -0,0 +1,15 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/services/monvis_service.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/services/monvis_service.py new file mode 100644 index 0000000000000000000000000000000000000000..36e6cc14722f12c282c3c75293a491a8f6b4041f --- /dev/null +++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/services/monvis_service.py @@ -0,0 +1,197 @@ +# Copyright (c) 2025, Huawei Technologies. +# All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ==============================================================================
+
+from tensorboard.util import tb_logging
+from ..repositories.monvis_repo import MonvisRepo
+logger = tb_logging.get_logger()
+
+
+class MonvisService:
+
+    def __init__(self, db_path):
+        self.db_path = db_path
+        self.repo = MonvisRepo(db_path)
+        self.is_db_connected = self.repo.is_db_connected
+
+    def get_metrics_stat(self):
+        metrics = []
+        result = self.repo.query_metrics_stat()
+        for row in result:
+            metric_name = row['metric_name']
+            stats = row['stats'].split(',') if row['stats'] else []
+            metrics.append({
+                "name": metric_name,
+                "stats": stats
+            })
+        return {
+            'success': True,
+            'data': metrics
+        }
+
+    def get_values(self, metric, stat, dimension):
+        if not metric or not stat:
+            return {'success': False, 'error': 'metric and stat must not be empty'}
+
+        valid_dimensions = {'step', 'rank', 'module_name'}
+        if dimension not in valid_dimensions:
+            return {'success': False, 'error': f'invalid dimension: {dimension}'}
+
+        try:
+            stats = self.repo.query_global_stats()
+            values = {}
+
+            if dimension == 'step':
+                if 'min_step' not in stats or 'max_step' not in stats:
+                    return {'success': False, 'error': 'Step info not found in global_stats'}
+                values = {v: f'Step {v}' for v in range(stats['min_step'], stats['max_step'] + 1)}
+
+            elif dimension == 'rank':
+                if 'max_rank' not in stats:
+                    return {'success': False, 'error': 'Rank info not found in global_stats'}
+                values = {v: f'Rank {v}' for v in range(stats['max_rank'] + 1)}
+
+            else:  # module_name
+                values = self.repo.query_module_names()
+
+            return {'success': True, 'data': values}
+
+        except Exception as e:
+            return {'success': False, 'error': f'internal error: {str(e)}'}
+
+    def get_heatmap_data(self, metric, stat, dimension, value):
+        if not all([metric, stat, dimension in ['step', 'rank', 'module_name'], value]):
+            return {
+                'success': False,
+                'error': 'Invalid parameters'
+            }
+
+        try:
+            metric_id = self.repo.query_metric_id(metric)
+            if metric_id is None:
+                return {'success': False, 'error': 'metric not found'}
+
+            relevant_tables = self.repo.query_relevant_tables(metric_id)
+            if not relevant_tables:
+                return {'success': False, 'error': 'no relevant tables found'}
+
+            selected_value = int(value)
+            heatmap_data = []
+
+            if dimension == "step":
+                for table in relevant_tables:
+                    rows = self.repo.query_heatmap_data(
+                        table, stat, "t.step = ?", (selected_value,)
+                    )
+                    heatmap_data.extend(
+                        [row['rank'], (row['target_id'],
+                                       self.repo.get_module_name(row)), row[stat]]
+                        for row in rows
+                    )
+
+            elif dimension == "rank":
+                for table in relevant_tables:
+                    rows = self.repo.query_heatmap_data(
+                        table, stat, "t.rank = ?", (selected_value,)
+                    )
+                    heatmap_data.extend(
+                        [row['step'], (row['target_id'],
+                                       self.repo.get_module_name(row)), row[stat]]
+                        for row in rows
+                    )
+
+            elif dimension == "module_name":
+                for table in relevant_tables:
+                    rows = self.repo.query_heatmap_data(
+                        table, stat, "m.target_id = ?", (selected_value,)
+                    )
+                    heatmap_data.extend(
+                        [row['step'], (row['rank'], row['rank']), row[stat]]
+                        for row in rows
+                    )
+
+            return {
+                'success': True,
+                'data': heatmap_data
+            }
+        except Exception as e:
+            return {'success': False, 'error': f'internal error: {str(e)}'}
+
+    def get_trend_data(self, metric, stat, dimension, dim_x, dim_y):
+        if not all([metric, stat, dimension in ['step', 'rank', 'module_name'], dim_x, dim_y]):
+            return {
+                'success': False,
+                'error': 'Invalid parameters'
+            }
+
+        try:
+            metric_id = self.repo.query_metric_id(metric)
+            if metric_id is None:
+                return {
+                    'success': False,
+                    'error': 'metric not found'
+                }
+
+            relevant_tables = self.repo.query_relevant_tables(metric_id)
+            if not relevant_tables:
+                return {
+                    'success': False,
+                    'error': 'no relevant tables found'
+                }
+
+            dim_x = int(dim_x)
+            dim_y = int(dim_y)
+            trend_data = []
+            if dimension == "step":
+                for table in relevant_tables:
+                    rows = self.repo.query_trend_data(
+                        table, stat, "t.rank = ? AND t.target_id = ?", (dim_x, dim_y)
+                    )
+                    trend_data.extend((row['step'], row[stat]) for row in rows)
+
+            elif dimension == "rank":
+                for table in relevant_tables:
+                    rows = self.repo.query_trend_data(
+                        table, stat, "t.step = ? AND t.target_id = ?", (dim_x, dim_y)
+                    )
+                    trend_data.extend((row['rank'], row[stat]) for row in rows)
+
+            elif dimension == "module_name":
+                for table in relevant_tables:
+                    rows = self.repo.query_trend_data(
+                        table, stat, "t.step = ? AND t.rank = ?", (dim_x, dim_y)
+                    )
+                    trend_data.extend(
+                        (row['target_id'], self.repo.get_module_name(row), row[stat])
+                        for row in rows
+                    )
+
+            # Order points by the x-axis key (step, rank or target_id). Building the lists
+            # explicitly also handles an empty result, where tuple-unpacking zip() would raise.
+            trend_data.sort(key=lambda x: x[0])
+            if dimension == "module_name":
+                # Entries are (target_id, module_name, value): label points by module name.
+                dimensions = [item[1] for item in trend_data]
+                values = [item[2] for item in trend_data]
+            else:
+                dimensions = [item[0] for item in trend_data]
+                values = [item[1] for item in trend_data]
+            return {
+                'success': True,
+                'data': {
+                    'dimensions': dimensions,
+                    'values': values
+                }
+            }
+        except Exception as e:
+            return {'success': False, 'error': f'internal error: {str(e)}'}
+
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/static/__init__.py b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/static/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..ee2432f470b406bca849a0d9362b8e396a7e21b2
--- /dev/null
+++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/static/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) 2025, Huawei Technologies.
+# All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/static/index.js b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/static/index.js
new file mode 100644
index 0000000000000000000000000000000000000000..77c9699c4c211b901aaebfc8a73215fc1c3a87e4
--- /dev/null
+++ b/plugins/tensorboard-plugins/tb_graph_ascend/monvis_plugin/server/static/index.js
@@ -0,0 +1,19 @@
+/* -------------------------------------------------------------------------
+ Copyright (c) 2025, Huawei Technologies.
+ All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+--------------------------------------------------------------------------------------------*/
+export async function render() {
+  document.location.href = 'index.html';
+}
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/setup.py b/plugins/tensorboard-plugins/tb_graph_ascend/setup.py
index 0a0506dd6e0240194569d7a7e56d3832469e02a3..0983721814205aed159e4435973083665d2679a5 100644
--- a/plugins/tensorboard-plugins/tb_graph_ascend/setup.py
+++ b/plugins/tensorboard-plugins/tb_graph_ascend/setup.py
@@ -16,7 +16,7 @@
 # --------------------------------------------------------------------------------------------#
 import setuptools
 
-VERSION = '9.1.0'
+VERSION = '8.2.0-alpha'
 INSTALL_REQUIRED = ["tensorboard >= 2.11.2"]
 
 setuptools.setup(