diff --git a/profiler/advisor/README.md b/profiler/advisor/README.md
deleted file mode 100644
index 77027110559de578d9339c3f5a3d6c762e72a6b5..0000000000000000000000000000000000000000
--- a/profiler/advisor/README.md
+++ /dev/null
@@ -1,202 +0,0 @@
-# advisor
-
-The advisor feature of msprof-analyze analyzes PyTorch-scenario performance data collected by Ascend PyTorch Profiler or msprof, and outputs performance tuning suggestions (analysis of db-format files is not supported yet).
-
-For how to collect performance data, see "[Performance Analysis Tools](https://www.hiascend.com/document/detail/zh/mindstudio/70RC1/mscommandtoolug/mscommandug/atlasprofiling_16_0001.html)".
-
-## Tool Usage (Command Line)
-
-### Procedure
-
-1. Install the tool as described in "[Performance Tools](../README.md)". Installing the latest version is recommended.
-
-2. Run the analysis.
-
-   - Overall performance bottlenecks
-
-     ```bash
-     msprof-analyze advisor all -d $HOME/profiling_data/
-     ```
-
-   - Computation bottlenecks
-
-     ```bash
-     msprof-analyze advisor computation -d $HOME/profiling_data/
-     ```
-
-   - Scheduling bottlenecks
-
-     ```bash
-     msprof-analyze advisor schedule -d $HOME/profiling_data/
-     ```
-
-   For more parameters of the commands above, see "**Command Reference**".
-
-   In single-card scenarios, point the path at the `*_ascend_pt` performance data directory; in multi-card or cluster scenarios, point it at the parent directory of the `*_ascend_pt` directories.
-
-3. Check the results.
-
-   Brief suggestions are printed to the terminal, and `mstt_advisor_{timestamp}.html` and `mstt_advisor_{timestamp}.xlsx` files are generated for review.
-
-   The content of `mstt_advisor_{timestamp}.xlsx` is identical to the terminal output.
-
-   For how to read `mstt_advisor_{timestamp}.html`, see "**Report Walkthrough**".
-
-   Sample terminal output:
-
-   Overall performance bottlenecks
-
-   ![all](./img/all.png)
-
-   Computation bottlenecks
-
-   ![computation](./img/computation.png)
-
-   Scheduling bottlenecks
-
-   ![schedule](./img/schedule.png)
-
-
-
-### Command Reference
-
-#### Features
-
-| dimension  | mode                       | Description                              |
-| ---------- | -------------------------- | ---------------------------------------- |
-| overall    | overall_summary            | Break performance data down by computation, communication, idle time, etc. |
-| cluster    | slow_rank                  | Slow-rank identification                 |
-|            | slow_link                  | Slow-link identification                 |
-| computing  | aicpu                      | AI CPU tuning                            |
-|            | dynamic_shape_analysis     | Identify dynamic-shape operators         |
-|            | block_dim_analysis         | Block dim operator tuning                |
-|            | operator_no_bound_analysis | Operator no bound                        |
-|            | graph                      | Fused-operator graph tuning              |
-|            | freq_analysis              | AI Core operator frequency-reduction analysis |
-| scheduling | timeline_fusion_ops        | Affinity API replacement tuning          |
-|            | timeline_op_dispatch       | Identify operator dispatch issues (path 3/path 5) |
-
-- all
-
-  Overall performance bottlenecks: includes all features in the table above.
-
-- computation
-
-  Computation bottlenecks: includes the computing features in the table above.
-
-- schedule
-
-  Scheduling bottlenecks: includes the scheduling features in the table above.
-
-#### Command Format
-
-- Overall performance bottlenecks
-
-  ```bash
-  msprof-analyze advisor all -d {profiling_path} [-bp benchmark_profiling_path] [-cv cann_version] [-tv torch_version] [-pt profiling_type] [--debug] [-h]
-  ```
-
-- Computation bottlenecks
-
-  ```bash
-  msprof-analyze advisor computation -d {profiling_path} [-cv cann_version] [-tv torch_version] [-pt profiling_type] [--debug] [-h]
-  ```
-
-- Scheduling bottlenecks
-
-  ```bash
-  msprof-analyze advisor schedule -d {profiling_path} [-cv cann_version] [-tv torch_version] [--debug] [-h]
-  ```
-
-#### Parameters
-
-| Parameter                          | Description                                                  | Required |
-| ---------------------------------- | ------------------------------------------------------------ | -------- |
-| -d<br>--profiling_path | Path to the performance data file or directory. For data collected with Ascend PyTorch Profiler, specify the `*_ascend_pt` result directory; for other scenarios, specify the `PROF_XXX` result directory. Collecting performance data with Ascend PyTorch Profiler is recommended.<br>advisor relies on the timeline (.json) and summary (.csv) data parsed by the profiling tools, plus the info.json* file; make sure they exist under the given "profiling_path" directory. | Yes |
-| -bp<br>--benchmark_profiling_path | Directory of the baseline performance data used for comparison, collected with the profiling tools.<br>**Not supported by computation and schedule.** | No |
-| -cv<br>--cann_version | CANN version used when the data was collected; it can be read from the version field printed by the command below. The compatible versions are currently "6.3.RC2", "7.0.RC1", "7.0.0" and "8.0.RC1"; if omitted, data is processed as "8.0.RC1". Profiling data collected with other versions may cause unpredictable problems during analysis: `cat /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info` | No |
-| -tv<br>--torch_version | torch version of the runtime environment, 1.11.0 by default. torch 1.11.0 and torch 2.1.0 are supported; for another version such as torch 1.11.3, ignore the patch-level difference and pick the closest supported version, e.g. 1.11.0. | No |
-| -pt<br>--profiling_type | Type of profiling tool used for collection. Values:<br>ascend_pytorch_profiler: for data collected through the Ascend PyTorch Profiler API; default.<br>msprof: for data collected with the msprof command line. Still being refined; not recommended yet.<br>mslite: for data collected with the [Benchmark](https://gitee.com/ascend/tools/tree/master/ais-bench_workload/tool/ais_bench) tool. Not recommended.<br>**Not supported by schedule.** | No |
-| --debug | Enable when the tool reports an error; a detailed stack trace will be shown. | No |
-| -h,-H<br>--help | Show help for the subcommands and parameters of the current command. | No |
-
-### Report Walkthrough
-
-As shown below, the tool diagnoses problems across the cluster, single-card breakdown, scheduling and computation dimensions and gives the corresponding tuning suggestions.
-
-![cluster](./img/cluster.png)
-
-The cluster module analyzes slow ranks and slow links; it only identifies problems and gives no tuning suggestions.
-As in the example below, communication and dispatch problems are identified for the current training job (a large amount of free time indicates task dispatch problems).
-
-![cluster_1](./img/cluster_1.png)
-
-The overall module breaks the slow rank of the current training job down into computation, communication and dispatch time, which tells whether the training performance bottleneck is computation, communication or dispatch; it likewise gives no tuning suggestions.
-
-![overall_0](./img/overall_0.png)
-
-![overall](./img/overall.png)
-
-The schedule module covers several checks, including affinity APIs, aclOpCompile, syncBatchNorm and SynchronizeStream.
-As in the example below, Operator Dispatch Issues suggests adding the following code at the very beginning of the training script to eliminate aclOpCompile:
-
-```python
-torch_npu.npu.set_compile_mode(jit_compile=False)
-torch_npu.npu.config.allow_internal_format = False
-```
-
-![schedule_1](./img/schedule_1.png)
-
-As in the example below, Synchronize Stream Issues reports time-consuming synchronization streams together with the stacks that trigger them; modify the corresponding code according to the stacks to eliminate the synchronization.
-
-![schedule_2](./img/schedule_2.png)
-
-As in the example below, Affinity API Issues reports replaceable affinity APIs with their stacks, so that the code to change can be located, and links to modification examples (the API instruction hyperlink).
-
-![schedule_3](./img/schedule_3.png)
-
-The computation module analyzes device-side compute performance; it can identify AI CPU, compute-bound, dynamic-shape and AI Core frequency-reduction problems and gives the corresponding suggestions. They are not expanded here; tune according to the report.
-
-![computation_1](./img/computation_1.png)
-
-## Tool Usage (Jupyter Notebook)
-
-Use Jupyter Notebook as follows:
-
-The steps below take a Windows environment as an example.
-
-1. Install Jupyter Notebook in the environment.
-
-   ```bash
-   pip install jupyter notebook
-   ```
-
-   For detailed installation and usage instructions, see the official Jupyter Notebook website.
-
-2. Install the mstt tool in the environment.
-
-   ```
-   git clone https://gitee.com/ascend/mstt.git
-   ```
-
-   Save the performance data collected by Ascend PyTorch Profiler in the environment.
-
-3. Enter the mstt\profiler\advisor directory and start Jupyter Notebook with the following command.
-
-   ```bash
-   jupyter notebook
-   ```
-
-   On success, a browser is launched automatically on the mstt\profiler\advisor directory, for example:
-
-   ![jupyter_report](./img/jupyter_report.PNG)
-
-   On Linux, a URL is printed instead; copy it into a browser (for a remote server, replace "**localhost**" with the server's IP) to open the Jupyter Notebook page.
-
-4. Each .ipynb file is a performance analysis task. Open the .ipynb you need and paste the path of the saved Ascend PyTorch Profiler data into the *_path parameter, for example:
-
-   ![advisor_result](./img/advisor_result.PNG)
-
-5. Click Run to start the analysis.
-
-   Detailed results are shown on the .ipynb page.
diff --git a/profiler/advisor/__init__.py b/profiler/advisor/__init__.py
deleted file mode 100644
index e79018ed05c6d1cdeb56feaa6182f048e3c8e06f..0000000000000000000000000000000000000000
--- a/profiler/advisor/__init__.py
+++ /dev/null
@@ -1,17 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-
-from profiler.advisor.interface.interface import Interface
\ No newline at end of file
diff --git a/profiler/advisor/advisor_backend/__init__.py b/profiler/advisor/advisor_backend/__init__.py
deleted file mode 100644
index a0e9f748f4b10347a874f60cec1fa9f6e5285a5e..0000000000000000000000000000000000000000
--- a/profiler/advisor/advisor_backend/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. \ No newline at end of file diff --git a/profiler/advisor/advisor_backend/advice_base.py b/profiler/advisor/advisor_backend/advice_base.py deleted file mode 100644 index 35939bcea9c87fb09f2113bd19f77ea18ba54e34..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/advice_base.py +++ /dev/null @@ -1,50 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -from abc import abstractmethod - - -class AdviceBase: - DATA = "data" - BOTTLENECK = "bottleneck" - ADVICE = "advice" - - def __init__(self, collection_path: str): - self.collection_path = os.path.realpath(collection_path) - self.bottelneck = '' - self.output_format_data = { - self.DATA: [], - self.BOTTLENECK: '', - self.ADVICE: '' - } - - @abstractmethod - def path_check(self): - """ - check whether input path is valid - """ - - @abstractmethod - def run(self): - """ - analyze profiling data and advice - """ - - @abstractmethod - def output(self): - """ - output relevant data - """ \ No newline at end of file diff --git a/profiler/advisor/advisor_backend/advice_factory/__init__.py b/profiler/advisor/advisor_backend/advice_factory/__init__.py deleted file mode 100644 index a0e9f748f4b10347a874f60cec1fa9f6e5285a5e..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/advice_factory/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. \ No newline at end of file diff --git a/profiler/advisor/advisor_backend/advice_factory/advice_factory.py b/profiler/advisor/advisor_backend/advice_factory/advice_factory.py deleted file mode 100644 index 639f4800cfe8c9acdc8fe7ea5f65a43fc8892b2b..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/advice_factory/advice_factory.py +++ /dev/null @@ -1,50 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. 
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-import os
-
-from common_func.path_manager import PathManager
-
-
-class AdviceFactory:
-    def __init__(self, collection_path: str):
-        self.collection_path = os.path.realpath(collection_path)
-
-    # Instance method overridden by each concrete factory subclass.
-    def run_advice(self, advice: str, kwargs: dict):
-        """
-        run advice to produce data
-        """
-
-    def produce_advice(self, advice: str, kwargs: dict):
-        """
-        produce data for input mode and advice
-        """
-        self.path_check()
-        self.advice_check(advice)
-        return self.run_advice(advice, kwargs)
-
-    def path_check(self):
-        """
-        check whether input path is valid
-        """
-        PathManager.input_path_common_check(self.collection_path)
-
-    def advice_check(self, advice: str):
-        """
-        check whether input advice is valid
-        """
-        if advice not in self.ADVICE_LIB.keys():
-            msg = '[ERROR]Input advice is illegal.'
-            raise RuntimeError(msg)
diff --git a/profiler/advisor/advisor_backend/advice_factory/cluster_advice_factory.py b/profiler/advisor/advisor_backend/advice_factory/cluster_advice_factory.py
deleted file mode 100644
index 6bb93f46704eb13fef14d070f891e350446829ea..0000000000000000000000000000000000000000
--- a/profiler/advisor/advisor_backend/advice_factory/cluster_advice_factory.py
+++ /dev/null
@@ -1,38 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from advice_factory.advice_factory import AdviceFactory -from cluster_advice.slow_link_advice import SlowLinkAdvice -from cluster_advice.slow_rank_advice import SlowRankAdvice -from cluster_advice.cluster_pipeline_advice import ClusterPipelineAdvice -from cluster_advice.kernel_cluster_advice import KernelClusterAdvice -from common_func_advisor.constant import Constant - - -class ClusterAdviceFactory(AdviceFactory): - ADVICE_LIB = { - Constant.SLOW_RANK: SlowRankAdvice, - Constant.SLOW_LINK: SlowLinkAdvice, - Constant.PIPELINE: ClusterPipelineAdvice, - Constant.KERNEL: KernelClusterAdvice - } - - def __init__(self, collection_path: str): - super().__init__(collection_path) - - def run_advice(self, advice: str, kwargs: dict): - """ - run advice to produce data - """ - return self.ADVICE_LIB.get(advice)(self.collection_path, kwargs).run() diff --git a/profiler/advisor/advisor_backend/advice_factory/compute_advice_factory.py b/profiler/advisor/advisor_backend/advice_factory/compute_advice_factory.py deleted file mode 100644 index 336bef7dd8553eb82586d52260443a7d01e84ab0..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/advice_factory/compute_advice_factory.py +++ /dev/null @@ -1,34 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from common_func_advisor.constant import Constant -from advice_factory.advice_factory import AdviceFactory -from compute_advice.npu_fused_advice import NpuFusedAdvice -from compute_advice.npu_slow_advice import NpuSlowAdvice - - -class ComputeAdviceFactory(AdviceFactory): - ADVICE_LIB = { - Constant.NPU_FUSED: NpuFusedAdvice, - Constant.NPU_SLOW: NpuSlowAdvice, - } - - def __init__(self, collection_path: str): - super().__init__(collection_path) - - def run_advice(self, advice: str, kwargs: dict): - """ - run advice to produce data - """ - return self.ADVICE_LIB.get(advice)(self.collection_path).run() diff --git a/profiler/advisor/advisor_backend/advice_factory/overall_advice_factory.py b/profiler/advisor/advisor_backend/advice_factory/overall_advice_factory.py deleted file mode 100644 index baf80cc200f4c3cd1057b7fc28e750948a450cf1..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/advice_factory/overall_advice_factory.py +++ /dev/null @@ -1,32 +0,0 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-from advice_factory.advice_factory import AdviceFactory -from common_func_advisor.constant import Constant -from overall_advice.overall_summary_advice import OverallSummaryAdvice - - -class OverallAdviceFactory(AdviceFactory): - ADVICE_LIB = { - Constant.SUMMARY: OverallSummaryAdvice - } - - def __init__(self, collection_path: str): - super().__init__(collection_path) - - def run_advice(self, advice: str, kwargs: dict): - """ - run advice to produce data - """ - return self.ADVICE_LIB.get(advice)(self.collection_path, kwargs).run() diff --git a/profiler/advisor/advisor_backend/advice_factory/timeline_advice_factory.py b/profiler/advisor/advisor_backend/advice_factory/timeline_advice_factory.py deleted file mode 100644 index 44b352e95a7bb1007bc7373193603c2a0b9d8b6c..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/advice_factory/timeline_advice_factory.py +++ /dev/null @@ -1,34 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from advice_factory.advice_factory import AdviceFactory -from common_func_advisor.constant import Constant -from timeline_advice.optimizer_advice import OptimizerAdvice -from timeline_advice.op_schedule_advice import OpScheduleAdvice - - -class TimelineAdviceFactory(AdviceFactory): - ADVICE_LIB = { - Constant.OPTIM: OptimizerAdvice, - Constant.OP_SCHE: OpScheduleAdvice, - } - - def __init__(self, collection_path: str): - super().__init__(collection_path) - - def run_advice(self, advice: str, kwargs: dict): - """ - run advice to produce data - """ - return self.ADVICE_LIB.get(advice)(self.collection_path).run() diff --git a/profiler/advisor/advisor_backend/cluster_advice/__init__.py b/profiler/advisor/advisor_backend/cluster_advice/__init__.py deleted file mode 100644 index 8400fd5ecd1246eaee795cebfccfacc80a94f08c..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/cluster_advice/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
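The four advice factories in this backend (cluster, compute, overall, timeline) all dispatch the same way: a class-level `ADVICE_LIB` dict maps an advice keyword to an advice class, and `run_advice` instantiates that class and calls `run()`. A minimal self-contained sketch of that idiom (`DemoAdvice` and the `"demo"` key are hypothetical placeholders, not classes from this package):

```python
# Minimal sketch of the ADVICE_LIB dispatch idiom used by the factories above.
# DemoAdvice and the "demo" key are hypothetical; the real factories register
# classes such as SlowRankAdvice or NpuFusedAdvice under Constant keys.
class DemoAdvice:
    def __init__(self, collection_path: str, kwargs: dict):
        self.collection_path = collection_path
        self.kwargs = kwargs

    def run(self) -> dict:
        # Real advice classes fill in data, bottleneck and advice fields.
        return {"data": [], "bottleneck": "", "advice": ""}


ADVICE_LIB = {"demo": DemoAdvice}


def produce_advice(collection_path: str, advice: str, kwargs: dict) -> dict:
    # Mirrors AdviceFactory.produce_advice: validate the keyword, then
    # look up, instantiate and run the matching advice class.
    if advice not in ADVICE_LIB:
        raise RuntimeError('[ERROR]Input advice is illegal.')
    return ADVICE_LIB[advice](collection_path, kwargs).run()


if __name__ == "__main__":
    print(produce_advice("/tmp/profiling_data", "demo", {}))
```

Keeping the registry as class data is what lets the shared `advice_check` validate the keyword generically before dispatching.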
diff --git a/profiler/advisor/advisor_backend/cluster_advice/cluster_advice_base.py b/profiler/advisor/advisor_backend/cluster_advice/cluster_advice_base.py deleted file mode 100644 index e9be4675963a9cd48da3b4cd91ee646f8e82468b..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/cluster_advice/cluster_advice_base.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -from abc import abstractmethod -from common_func.constant import Constant -from advice_base import AdviceBase -from cluster_analysis import Interface - - -class ClusterAdviceBase(AdviceBase): - def __init__(self, collection_path: str): - super().__init__(collection_path) - - @staticmethod - def compute_max_gap_ratio(data: list, mean: float): - if mean == 0: - return 0 - else: - return (max(data) - min(data)) / mean - - def path_check(self): - """ - check whether input path is valid - """ - for file in os.listdir(self.collection_path): - if file == 'cluster_analysis_output': - print("[INFO]Cluster has been analyzed " - "because of the existence of cluster analysis output directory.") - print("[INFO]Skip Cluster analyze backend.") - return - print("[INFO] cluster analysis is in the process, please wait...") - self.cluster_analyze() - - def cluster_analyze(self): - parameter = { - Constant.COLLECTION_PATH: self.collection_path, - Constant.ANALYSIS_MODE: "all" - } - try: - Interface(parameter).run() - except Exception as e: - raise ValueError(f"Cluster analyze backend failed:{e}") from e - - @abstractmethod - def run(self): - """ - analyze profiling data and advice - """ - - @abstractmethod - def output(self): - """ - output relevant data - """ \ No newline at end of file diff --git a/profiler/advisor/advisor_backend/cluster_advice/cluster_pipeline_advice.py b/profiler/advisor/advisor_backend/cluster_advice/cluster_pipeline_advice.py deleted file mode 100644 index 7f8846f1d99e9bc81636df32d04148df99d12920..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/cluster_advice/cluster_pipeline_advice.py +++ /dev/null @@ -1,437 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
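Before the implementation that follows: `ClusterPipelineAdvice` builds its pipeline view by marking each hcom send/recv kernel as a `Bubble` slice and the gaps between consecutive hcom kernels (plus the leading and trailing gaps) as `Stage` slices. A minimal sketch of that slicing with made-up timestamps; the real `get_pipeline_timeslice` below works on `trace_view.json` events:

```python
from decimal import Decimal

# Hypothetical hcom send/recv kernels: (start ts, duration) in microseconds.
hcom_ops = [("100.0", "10.0"), ("150.0", "5.0"), ("200.0", "20.0")]
min_ts, max_ts = "80.0", "260.0"

# Each hcom op becomes a Bubble slice; each gap between consecutive hcom ops
# (and the leading/trailing gaps) becomes a Stage slice.
timeslices = []
last_end = None
for ts, dur in hcom_ops:
    if last_end is not None:
        timeslices.append(("Stage", str(last_end), ts))
    last_end = Decimal(ts) + Decimal(dur)
    timeslices.append(("Bubble", ts, str(last_end)))
timeslices.insert(0, ("Stage", min_ts, timeslices[0][1]))
timeslices.append(("Stage", timeslices[-1][2], max_ts))

print(timeslices)
# [('Stage', '80.0', '100.0'), ('Bubble', '100.0', '110.0'),
#  ('Stage', '110.0', '150.0'), ('Bubble', '150.0', '155.0'), ...]
```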
- -import os -import time -import multiprocessing -from typing import Dict -from typing import Optional -from typing import Deque -from typing import List -from typing import Tuple -from collections import defaultdict -from collections import deque -from decimal import Decimal -from dataclasses import dataclass - -from common_func.file_manager import FileManager -from common_func_advisor.constant import Constant -from common_func_advisor.trace_view_preprocessor import FineTraceViewData -from common_func_advisor.trace_view_preprocessor import TraceViewPreProcessor -from cluster_advice.cluster_advice_base import ClusterAdviceBase -from cluster_data_preprocess.pytorch_data_preprocessor import PytorchDataPreprocessor - - -@dataclass -class PipelineTimeSlice: - start: str = "" - end: str = "" - slice_type: str = "" - bp_timeslice: list = None - - def __post_init__(self): - self.bp_timeslice = self.bp_timeslice or [] - - -class PipelineTraceViewer: - STAGE_COLOR = "good" - BUBBLE_COLOR = "generic_work" - FP_COLOR = "good" - BP_COLOR = "bad" - PIPLINE_VIEW = "Pipeline View" - STAGE = "Stage" - BUBBLE = "Bubble" - FP = "FP" - BP = "BP" - - COLORS = { - STAGE: STAGE_COLOR, - BUBBLE: BUBBLE_COLOR, - FP: FP_COLOR, - BP: BP_COLOR - } - - def _gen_trace_pair(self, name: str, start_ts: str, end_ts: str, pid: str, tid: str) -> Dict: - data = { - Constant.OP_NAME: name, - Constant.CNAME: self.COLORS.get(name, self.BUBBLE), - Constant.PH: Constant.PH_X, - Constant.PID: pid, - Constant.OP_TID: tid, - Constant.TS: start_ts, - Constant.DUR: str(Decimal(end_ts) - Decimal(start_ts)) - } - - return data - - def gen_stage_bubble_trace_data(self, rank_id: int, timeslice_list: List[PipelineTimeSlice]) -> List[Dict]: - """ - generate stage bubble trace json data - """ - rank_str = f'Rank {rank_id}' - trace_data = [] - - for timeslice in timeslice_list: - data = self._gen_trace_pair(timeslice.slice_type, timeslice.start, - timeslice.end, self.PIPLINE_VIEW, rank_str) - trace_data.append(data) - - return trace_data - - def gen_fp_bp_trace_data(self, rank_id: int, timeslice_list: List[PipelineTimeSlice]) -> List[Dict]: - """ - generate fp bp trace json data - """ - rank_str = f'Rank {rank_id}' - trace_data = [] - - for timeslice in timeslice_list: - if timeslice.slice_type == self.BUBBLE: - data = self._gen_trace_pair(timeslice.slice_type, timeslice.start, - timeslice.end, self.PIPLINE_VIEW, rank_str) - trace_data.append(data) - else: - last_end = timeslice.start - for bp_bound in timeslice.bp_timeslice: - data = self._gen_trace_pair(self.FP, last_end, - bp_bound[0], self.PIPLINE_VIEW, rank_str) - trace_data.append(data) - last_end = bp_bound[1] - - data = self._gen_trace_pair(self.BP, bp_bound[0], - bp_bound[1], self.PIPLINE_VIEW, rank_str) - trace_data.append(data) - - last_data = self._gen_trace_pair(self.FP, last_end, - timeslice.end, self.PIPLINE_VIEW, rank_str) - trace_data.append(last_data) - - return trace_data - - -class ClusterPipelineAdvice(ClusterAdviceBase): - BUBBLE = "Bubble" - STAGE = "Stage" - PIPELINE_VIEW = "Pipeline View" - SAVE_JSON = "pipeline_view.json" - - def __init__(self, collection_path: str, kwargs: dict): - super().__init__(collection_path) - self.rank_ids = list(set(kwargs.get("rank_ids", []))) - self.worker_num = kwargs.get("worker_num", int(multiprocessing.cpu_count() / 2)) - self.rank_prof_dirs = {} - self.cur_data = [] - self.cur_bottleneck = {} - self.cur_advices = "" - - def run(self) -> dict: - """ - Unified entrance interface - """ - self.rank_prof_dirs = 
self.get_rank_prof_dirs(self.rank_ids) - if not self.rank_prof_dirs: - print("[ERROR] No rank profiling data found, please check the rank ids or dir path.") - return {} - - self.process() - self.output() - self.identify_bottleneck() - return self.output_format_data - - def process(self) -> None: - """ - process all rank profiling data by using multi-process - """ - start_time = time.time() - print(f"[INFO] Start to process {len(self.rank_prof_dirs)} rank profiling data with {self.worker_num} workers.") - with multiprocessing.Pool(self.worker_num) as pool: - results = pool.map(self.work, self.rank_prof_dirs.items()) - - for (rank_id, _), (res, show_fp_bp) in zip(self.rank_prof_dirs.items(), results): - if show_fp_bp: - self.cur_data += PipelineTraceViewer().gen_fp_bp_trace_data(rank_id, res) - else: - self.cur_data += PipelineTraceViewer().gen_stage_bubble_trace_data(rank_id, res) - print(f"[INFO] Pipline view data process finished, cost {time.time() - start_time:.2f}s.") - - @staticmethod - def _align_trace_bound(results: List) -> None: - """ - align all rank trace bound for better visualization - """ - start_list, end_list = [], [] - for res in results: - start_list.append(res[0].start) - end_list.append(res[-1].end) - - # update all rank trace bound - for res in results: - res[0].start = min(start_list) - res[-1].end = max(end_list) - - def work(self, kv: Tuple[int, str]) -> Tuple[List[PipelineTimeSlice], bool]: - """ - single process worker function - """ - show_fp_bp = False - rank_id, rank_prof_dir = kv - print(f"[INFO] [Rank {rank_id}] Start to process rank profiling data.") - json_path = os.path.join(rank_prof_dir, Constant.ASCEND_PROFILER_OUTPUT, Constant.TRACE_VIEW_JSON) - fine_data = self.load_trace_view_data(json_path) - if not fine_data.hcom_ops or not fine_data.hcom_tids: - print(f"[ERROR] [Rank {rank_id}] No hcom send recv ops found, make sure the trace view data is pipeline " - f"parallel sense.") - return [], show_fp_bp - - timeslice_list = self.get_pipeline_timeslice(fine_data.hcom_ops, fine_data.hcom_tids, fine_data.min_ts, - fine_data.max_ts) - if not fine_data.fp_ops or not fine_data.bp_ops: - print(f"[INFO] [Rank {rank_id}] No frameWork data in trace view, only show stage and bubble.") - elif len(fine_data.hcom_tids) > 1: - print(f"[WARN] [Rank {rank_id}] More than one hcom tid found, only show stage and bubble.") - else: - print(f"[INFO] [Rank {rank_id}] Found frameWork data in trace view, show fp bp and bubble.") - bp_ops = self.get_fp_bp_bound_ops(fine_data) - self.update_stage_fp_bp(timeslice_list, bp_ops) - show_fp_bp = True - print(f"[INFO] [Rank {rank_id}] Rank profiling data process finished.") - - return timeslice_list, show_fp_bp - - def identify_bottleneck(self) -> None: - pass - - def output(self) -> None: - """ - output result - """ - self.cur_data.append( - { - Constant.OP_NAME: Constant.PROCESS_NAME, - Constant.PH: Constant.PH_META, - Constant.PID: self.PIPELINE_VIEW, - Constant.OP_TID: self.PIPELINE_VIEW, - Constant.ARGS: { - Constant.OP_NAME: self.PIPELINE_VIEW - } - } - ) - self.output_format_data[self.DATA] = self.cur_data - self.output_format_data[self.BOTTLENECK] = self.cur_bottleneck - self.output_format_data[self.ADVICE] = self.cur_advices - - def get_rank_prof_dirs(self, rank_ids: list) -> Dict[int, str]: - """ - get rank profiling directories by rank ids - """ - rank_prof_dirs = defaultdict(str) - prof_dirs = [] - for prof_dir in os.listdir(self.collection_path): - if prof_dir.endswith(Constant.PT_PROF_SUFFIX): - 
prof_dirs.append(os.path.join(self.collection_path, prof_dir)) - - data_map = PytorchDataPreprocessor(prof_dirs).get_data_map() - for rank_id in rank_ids: - if rank_id in data_map: - rank_prof_dirs[rank_id] = data_map[rank_id] - else: - print(f'[Warning] Rank {rank_id} not found in {self.collection_path}') - - return rank_prof_dirs - - @staticmethod - def load_trace_view_data(json_path) -> Optional[FineTraceViewData]: - """ - load trace view data from json file and preprocess - """ - raw_data = FileManager.read_json_file(json_path) - return TraceViewPreProcessor().process(raw_data) - - @staticmethod - def double_queue_pop(fp_que: Deque[dict], bp_que: Deque[dict]) -> Tuple[list, list]: - """ - double queue (fp and bp que) pop alternating algorithm implementation - """ - res_fp_ops, res_bp_ops = [], [] - pop_fp = fp_que[0][Constant.TS] < bp_que[0][Constant.TS] - fp_start_op, fp_end_op = fp_que[0], fp_que[0] - bp_start_op, bp_end_op = bp_que[0], bp_que[0] - - def update_bound_op(que: Deque[dict], start_op: dict, end_op: dict) -> Tuple[dict, dict]: - """ - update fp and bp bound op - """ - op = que.popleft() - op_s = Decimal(op[Constant.TS]) - op_e = op_s + Decimal(op[Constant.DUR]) - - start_op = op if op_s < Decimal(start_op[Constant.TS]) else start_op - end_op = op if op_e > Decimal(end_op[Constant.TS]) + Decimal(end_op[Constant.DUR]) else end_op - - return start_op, end_op - - while fp_que and bp_que: - if pop_fp: - if len(fp_que) > 1 and bp_que and fp_que[1][Constant.TS] > bp_que[0][Constant.TS]: - pop_fp = False # pop bp que - if len(fp_que) == 1: - pop_fp = False # pop bp que - - fp_start_op, fp_end_op = update_bound_op(fp_que, fp_start_op, fp_end_op) - - # time to pop bp que, need to record fp ops and update bp start op - if not pop_fp: - res_fp_ops.append((fp_start_op, fp_end_op)) - if fp_que: - bp_start_op, bp_end_op = bp_que[0], bp_que[0] - else: - if len(bp_que) > 1 and fp_que and bp_que[1][Constant.TS] > fp_que[0][Constant.TS]: - pop_fp = True # pop fp que - if len(bp_que) == 1: - pop_fp = True # pop fp que - - bp_start_op, bp_end_op = update_bound_op(bp_que, bp_start_op, bp_end_op) - - # time to pop fp que, need to record bp ops and update fp start op - if pop_fp: - res_bp_ops.append((bp_start_op, bp_end_op)) - if bp_que: - fp_start_op, fp_end_op = fp_que[0], fp_que[0] - - if fp_que: - fp_start_op, fp_end_op = fp_que[0], fp_que[0] - while fp_que: - fp_start_op, fp_end_op = update_bound_op(fp_que, fp_start_op, fp_end_op) - res_fp_ops.append((fp_start_op, fp_end_op)) - - if bp_que: - bp_start_op, bp_end_op = bp_que[0], bp_que[0] - while bp_que: - bp_start_op, bp_end_op = update_bound_op(bp_que, bp_start_op, bp_end_op) - res_bp_ops.append((bp_start_op, bp_end_op)) - - return res_fp_ops, res_bp_ops - - @staticmethod - def update_ops_time(ops_list: List[List[dict]], torch_to_npu_links: List[dict], - npu_ops_ts_dur: dict) -> List[List[dict]]: - """ - update fp and bp bound ops time at device by using torch_to_npu_links - """ - ops_que = deque(ops_list) - torch_to_npu_que = deque(torch_to_npu_links) - res = [] - link_stack = [] - while ops_que and torch_to_npu_que: - link = torch_to_npu_que.popleft() - link_s = Decimal(link[Constant.TS]) - - # bound op at framework level - cpu_op_l, cpu_op_r = ops_que[0][0], ops_que[0][1] - cpu_op_s = Decimal(cpu_op_l[Constant.TS]) - cpu_op_e = Decimal(cpu_op_r[Constant.TS]) + Decimal(cpu_op_r[Constant.DUR]) - - if cpu_op_s < link_s < cpu_op_e: - link_stack.append(link) - if link_s > cpu_op_e or \ - (link_stack and not torch_to_npu_que): - min_link = 
link_stack[0] - max_link = link_stack[-1] - - min_link_s = str(min_link[Constant.ID]) - max_link_s = str(max_link[Constant.ID]) - # for compatibility with old data (ts is float type) - if isinstance(min_link[Constant.ID], float): - cpu_op_l["npu_op_ts"] = min_link_s - cpu_op_r["npu_op_ts"] = max_link_s - else: - cpu_op_l["npu_op_ts"] = f"{min_link_s[:-3]}.{min_link_s[-3:]}" - cpu_op_r["npu_op_ts"] = f"{max_link_s[:-3]}.{max_link_s[-3:]}" - cpu_op_l["npu_op_dur"] = npu_ops_ts_dur.get(cpu_op_l["npu_op_ts"], 0) - cpu_op_r["npu_op_dur"] = npu_ops_ts_dur.get(cpu_op_r["npu_op_ts"], 0) - - res.append([cpu_op_l, cpu_op_r]) - ops_que.popleft() - link_stack.clear() - - return res - - def get_fp_bp_bound_ops(self, fine_data: FineTraceViewData) -> List[List[dict]]: - """ - get fp and bp bound ops by using double queue alternating pop algorithm and - update fp and bp bound ops time at device by using torch_to_npu_links - """ - fp_que = deque(fine_data.fp_ops) - bp_que = deque(fine_data.bp_ops) - - # get fp and bp bound ops - _, res_bp_ops = self.double_queue_pop(fp_que, bp_que) - - # according to torch_to_npu_links, split fp and bp timeslice - bp_ops = self.update_ops_time(res_bp_ops, fine_data.torch_to_npu_links, fine_data.npu_ops_ts_dur) - return bp_ops - - def get_pipeline_timeslice(self, hcom_ops: list, hcom_tids: list, - min_ts: str, max_ts: str) -> List[PipelineTimeSlice]: - """ - get pipeline timeslice by using hcom ops - """ - timeslice_list = [] - last_op_end = None - if len(hcom_tids) > 1: - print("[WARN] More than one hcom tid found, default to show minimal tid pipeline view.") - - for op in hcom_ops: - if op[Constant.OP_TID] == min(hcom_tids): - # gap between two hcom ops - if last_op_end: - timeslice_list.append(PipelineTimeSlice(str(last_op_end), op[Constant.TS], self.STAGE)) - # hcom op - last_op_end = Decimal(op[Constant.TS]) + Decimal(op[Constant.DUR]) - timeslice_list.append(PipelineTimeSlice(op[Constant.TS], str(last_op_end), self.BUBBLE)) - - # add start STAGE and end STAGE - timeslice_list.insert(0, PipelineTimeSlice(min_ts, timeslice_list[0].start, self.STAGE)) - timeslice_list.insert(len(timeslice_list), PipelineTimeSlice(timeslice_list[-1].end, max_ts, self.STAGE)) - return timeslice_list - - def update_stage_fp_bp(self, timeslice_list: List[PipelineTimeSlice], - bp_ops: List[List[dict]]) -> None: - """ - update stage fp and bp time - """ - pipeline_que = deque(timeslice_list) - bp_bound_que = deque(bp_ops) - - while pipeline_que and bp_bound_que: - while pipeline_que[0].slice_type != self.STAGE: - pipeline_que.popleft() - if not pipeline_que: - return None - - bp_bound_data = bp_bound_que[0] - bp_bound_s = Decimal(bp_bound_data[0]['npu_op_ts']) - bp_bound_e = Decimal(bp_bound_data[1]['npu_op_ts']) + Decimal(bp_bound_data[1]['npu_op_dur']) - - pipeline_s = Decimal(pipeline_que[0].start) - pipeline_e = Decimal(pipeline_que[0].end) - - if pipeline_s <= bp_bound_s and bp_bound_e <= pipeline_e: - pipeline_que[0].bp_timeslice.append((str(bp_bound_s), str(bp_bound_e))) - bp_bound_que.popleft() - elif bp_bound_s > pipeline_e: - pipeline_que.popleft() - else: - bp_bound_que.popleft() diff --git a/profiler/advisor/advisor_backend/cluster_advice/kernel_cluster_advice.py b/profiler/advisor/advisor_backend/cluster_advice/kernel_cluster_advice.py deleted file mode 100644 index 6fa83c765f5fe1f4ac20dcc62895fe0450e338ce..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/cluster_advice/kernel_cluster_advice.py +++ /dev/null @@ -1,62 +0,0 @@ -import os -import pandas 
as pd
-from common_func.path_manager import PathManager
-from common_func.constant import Constant
-from common_func_advisor.constant import Constant as AdvisorConstant
-from cluster_advice.cluster_advice_base import ClusterAdviceBase
-from cluster_data_preprocess.pytorch_data_preprocessor import PytorchDataPreprocessor
-
-
-class KernelClusterAdvice(ClusterAdviceBase):
-    COLUMNS_TO_GROUP = ["Name", "Input Shapes", "Input Data Types", "Output Shapes"]
-    COLUMNS_TO_CAL = ["Duration(us)"]
-    CAL_FUN = ['mean', 'var', 'max', 'min', 'count', 'sum']
-
-    def __init__(self, collection_path: str, kwargs: dict = None):
-        super().__init__(collection_path)
-        self.all_kernel_data = pd.DataFrame()
-
-    def run(self):
-        self.load_kernel_details_data()
-        return self.calculate_data()
-
-    def load_kernel_details_data(self):
-        prof_dirs = self.get_prof_dirs(self.collection_path)
-        if not prof_dirs:
-            msg = "[ERROR] There is no profile in this collection path, terminate analysis."
-            raise RuntimeError(msg)
-
-        data_map = PytorchDataPreprocessor(prof_dirs).get_data_map()
-        self.all_kernel_data = pd.DataFrame()
-        for rank_id, profiling_dir_path in data_map.items():
-            kernel_file = os.path.join(profiling_dir_path, Constant.SINGLE_OUTPUT, Constant.KERNEL_DETAILS_CSV)
-            if kernel_file:
-                # Check the csv file size
-                PathManager.check_path_readable(kernel_file)
-                # Read the csv file
-                df_temp = pd.read_csv(kernel_file)
-                columns_to_keep = self.COLUMNS_TO_GROUP + self.COLUMNS_TO_CAL
-                if [1 for element in columns_to_keep if element not in list(df_temp)]:
-                    msg = "[ERROR] Kernel details.csv has wrong data columns, terminate analysis."
-                    raise RuntimeError(msg)
-                df = df_temp[columns_to_keep]
-                df.insert(loc=0, column='rank id', value=rank_id)
-                # Append the data to the final dataframe
-                self.all_kernel_data = pd.concat([self.all_kernel_data, df], ignore_index=True)
-
-    def calculate_data(self):
-        # Store all the merged data
-        calculate_dict = {self.COLUMNS_TO_CAL[i]: self.CAL_FUN
-                          for i in range(len(self.COLUMNS_TO_CAL))}
-        group_col = ["rank id"] + self.COLUMNS_TO_GROUP
-        view_data = self.all_kernel_data.groupby(group_col).agg(calculate_dict).reset_index()
-        view_data.columns = [''.join(col) if col[1] == "" else '_'.join(col) for col in view_data.columns]
-        return view_data
-
-    def get_prof_dirs(self, collection_path):
-        prof_dirs = []
-        for prof_dir in os.listdir(collection_path):
-            if prof_dir.endswith(AdvisorConstant.PT_PROF_SUFFIX):
-                prof_dirs.append(os.path.join(collection_path, prof_dir))
-
-        return prof_dirs
\ No newline at end of file
diff --git a/profiler/advisor/advisor_backend/cluster_advice/slow_link_advice.py b/profiler/advisor/advisor_backend/cluster_advice/slow_link_advice.py
deleted file mode 100644
index f8a625242f3939602cbb7b8391cd8062e21fe01b..0000000000000000000000000000000000000000
--- a/profiler/advisor/advisor_backend/cluster_advice/slow_link_advice.py
+++ /dev/null
@@ -1,110 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -import os -from collections import defaultdict -from common_func_advisor.constant import Constant -from common_func.file_manager import FileManager -from cluster_advice.cluster_advice_base import ClusterAdviceBase - - -class SlowLinkAdvice(ClusterAdviceBase): - RDMA_TIME_MS = "RDMA time(ms)" - RDMA_SIZE_MB = "RDMA size(mb)" - SDMA_TIME_MS = "SDMA time(ms)" - SDMA_SIZE_MB = "SDMA size(mb)" - RDMA_BANDWIDTH = "RDMA bandwidth(GB/s)" - SDMA_BANDWIDTH = "SDMA bandwidth(GB/s)" - COMMUNICATION_BANDWIDTH_INFO = "Communication Bandwidth Info" - TRANSIT_TIME = "Transit Time(ms)" - TRANSIT_SIZE = "Transit Size(MB)" - SDMA = "SDMA" - RDMA = "RDMA" - - def __init__(self, collection_path: str, kwargs: dict = None): - super().__init__(collection_path) - self.rank_bw_dict = defaultdict(lambda: { - self.RDMA_TIME_MS: 0, - self.RDMA_SIZE_MB: 0, - self.SDMA_TIME_MS: 0, - self.SDMA_SIZE_MB: 0, - }) - - @staticmethod - def compute_ratio(dividend: float, divisor: float): - if abs(divisor) < 1e-15: - return 0 - else: - return round(dividend / divisor, 4) - - def load_communication_json(self): - json_path = os.path.join(self.collection_path, Constant.CLUSTER_ANALYSIS_OUTPUT, Constant.CLUSTER_COMM_JSON) - if not os.path.exists(json_path): - msg = "[ERROR] cluster_communication.json doesn't exist, terminate analysis." - raise RuntimeError(msg) - communication_json = FileManager.read_json_file(json_path) - return communication_json - - def run(self): - self.path_check() - communication_json = self.load_communication_json() - self.process(communication_json) - self.output() - return self.output_format_data - - def process(self, communication_json: dict): - for comm_group, group_dict in communication_json.items(): - for step, step_dict in group_dict.items(): - for op, op_dict in step_dict.items(): - self.compute_bandwidth(op_dict) - if self.rank_bw_dict: - self.produce_bottleneck(self.RDMA_BANDWIDTH) - self.produce_bottleneck(self.SDMA_BANDWIDTH) - - def compute_bandwidth(self, op_dict: dict): - for rank_id, rank_dict in op_dict.items(): - try: - rank = int(rank_id) - except ValueError as e: - msg = "[ERROR] Cluster_communication.json has invalid structure." - raise ValueError(msg) from e - for comm_type, bw_dict in rank_dict.get(self.COMMUNICATION_BANDWIDTH_INFO, {}).items(): - if comm_type == self.SDMA: - self.rank_bw_dict[rank][self.SDMA_SIZE_MB] += bw_dict.get(self.TRANSIT_SIZE) - self.rank_bw_dict[rank][self.SDMA_TIME_MS] += bw_dict.get(self.TRANSIT_TIME) - if comm_type == self.RDMA: - self.rank_bw_dict[rank][self.RDMA_SIZE_MB] += bw_dict.get(self.TRANSIT_SIZE) - self.rank_bw_dict[rank][self.RDMA_TIME_MS] += bw_dict.get(self.TRANSIT_TIME) - - for rank, rank_dict in self.rank_bw_dict.items(): - self.rank_bw_dict[rank][self.RDMA_BANDWIDTH] = self.compute_ratio( - self.rank_bw_dict[rank][self.RDMA_SIZE_MB], self.rank_bw_dict[rank][self.RDMA_TIME_MS]) - self.rank_bw_dict[rank][self.SDMA_BANDWIDTH] = self.compute_ratio( - self.rank_bw_dict[rank][self.SDMA_SIZE_MB], self.rank_bw_dict[rank][self.SDMA_TIME_MS]) - - def produce_bottleneck(self, link_type: str): - data_list = [rank_dict.get(link_type, 0) for rank_id, rank_dict in self.rank_bw_dict.items()] - avg_bw = round(sum(data_list) / len(data_list), 3) - if avg_bw == 0: - return - self.bottelneck += f'{link_type}: \n' \ - f'The average is {avg_bw}, ' \ - f'while the maximum is {round(max(data_list), 3)}GB/s and ' \ - f'the minimum is {round(min(data_list), 3)}GB/s. ' \ - f'the difference is {round(max(data_list) - min(data_list), 3)}GB/s. 
\n' - - def output(self): - self.output_format_data[self.DATA] = self.rank_bw_dict - self.output_format_data[self.BOTTLENECK] = self.bottelneck diff --git a/profiler/advisor/advisor_backend/cluster_advice/slow_rank_advice.py b/profiler/advisor/advisor_backend/cluster_advice/slow_rank_advice.py deleted file mode 100644 index 4e789fb7fb688626df7e8f5b25b84e4955d6c2a3..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/cluster_advice/slow_rank_advice.py +++ /dev/null @@ -1,71 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -from collections import defaultdict -from common_func_advisor.constant import Constant -from common_func.file_manager import FileManager -from cluster_advice.cluster_advice_base import ClusterAdviceBase -from prof_bean_advisor.cluster_step_trace_time_bean import ClusterStepTraceTimeBean - - -class SlowRankAdvice(ClusterAdviceBase): - RANK = "rank" - RATIO_THRESHOLD = 0.05 - BOTTLENECK_LIST = ['Computing', 'Communication', "Free"] - - def __init__(self, collection_path: str, kwargs: dict = None): - super().__init__(collection_path) - - def load_step_time(self): - csv_path = os.path.join(self.collection_path, Constant.CLUSTER_ANALYSIS_OUTPUT, Constant.CLUSTER_STEP_TIME_CSV) - if not os.path.exists(csv_path): - msg = "[ERROR] cluster_step_trace_time.csv doesn't exist, terminate analysis." - raise RuntimeError(msg) - step_time = FileManager.read_csv_file(csv_path, ClusterStepTraceTimeBean) - return step_time - - def run(self): - self.path_check() - step_data = self.load_step_time() - step_dict = self.process(step_data) - self.output(step_dict) - return self.output_format_data - - def process(self, step_data: list): - step_dict = defaultdict(lambda: [0, 0, 0, 0]) - for step_bean in step_data: - if step_bean.type == self.RANK: - step_dict[step_bean.index][0] += step_bean.compute - step_dict[step_bean.index][1] += step_bean.communication - step_dict[step_bean.index][2] += step_bean.free - total_time_list = [sum(data_tuple) for rank_id, data_tuple in step_dict.items()] - if total_time_list: - mean_total_time = sum(total_time_list) / len(total_time_list) - for i in range(len(self.BOTTLENECK_LIST)): - self.produce_bottleneck(step_dict, i, mean_total_time) - return step_dict - - def produce_bottleneck(self, step_dict: dict, produce_type: int, mean_total_time: float): - data_list = [data_tuple[produce_type] for rank_id, data_tuple in step_dict.items()] - max_ratio = self.compute_max_gap_ratio(data_list, mean_total_time) - if max_ratio > self.RATIO_THRESHOLD: - self.bottelneck += f'{self.BOTTLENECK_LIST[produce_type]} has some issues in the cluster, ' \ - f'because the max difference of {self.BOTTLENECK_LIST[produce_type]} time ' \ - f'has reached {round(max_ratio * mean_total_time / 1000, 3)}ms. 
\n' - - def output(self, step_dict: dict): - self.output_format_data[self.DATA] = step_dict - self.output_format_data[self.BOTTLENECK] = self.bottelneck diff --git a/profiler/advisor/advisor_backend/common_func_advisor/__init__.py b/profiler/advisor/advisor_backend/common_func_advisor/__init__.py deleted file mode 100644 index 8400fd5ecd1246eaee795cebfccfacc80a94f08c..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/common_func_advisor/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/profiler/advisor/advisor_backend/common_func_advisor/constant.py b/profiler/advisor/advisor_backend/common_func_advisor/constant.py deleted file mode 100644 index 46a7fb24c2dade75c157f18118f29233eb924b88..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/common_func_advisor/constant.py +++ /dev/null @@ -1,225 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
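A note on the constants module that follows: its `PATTERN_DICT` maps tuples of consecutive kernel names to the fused operator they suggest. The matching itself lives in the compute-advice code rather than in this file; the sketch below, with a made-up kernel sequence and a two-entry excerpt of the real table, shows how such a tuple-keyed dict can be scanned:

```python
# PATTERN_DICT-style lookup: tuple of consecutive op types -> fused-op name.
# Two entries excerpted from the real table; the kernel sequence is made up.
PATTERN_DICT = {
    ("Add", "DropOutDoMask", "Add"): "bias_dropout_add",
    ("Cast", "LayerNorm", "Cast"): "LayerNorm",
}

kernels = ["MatMul", "Add", "DropOutDoMask", "Add", "Cast"]

# Slide a window of each pattern's length over the kernel sequence.
for pattern, fused_name in PATTERN_DICT.items():
    n = len(pattern)
    for i in range(len(kernels) - n + 1):
        if tuple(kernels[i:i + n]) == pattern:
            print(f"ops {i}..{i + n - 1} could fuse into {fused_name}")
```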
-from enum import Enum - - -class CsvTitle: - MODEL_NAME = "Model Name" - MODEL_ID = "Model ID" - TASK_ID = "Task ID" - STREAM_ID = "Stream ID" - INFER_ID = "Infer ID" - TASK_START_TIME = "Task Start Time(us)" - TASK_WAIT_TIME = "Task Wait Time(us)" - BLOCK_DIM = "Block Dim" - MIX_BLOCK_DIM = "Mix Block Dim" - HF32_ELIGIBLE = "HF32 Eligible" - INPUT_SHAPES = "Input Shapes" - INPUT_DATA_TYPES = "Input Data Types" - INPUT_FORMATS = "Input Formats" - OUTPUT_SHAPES = "Output Shapes" - OUTPUT_DATA_TYPES = "Output Data Types" - OUTPUT_FORMATS = "Output Formats" - CONTEXT_ID = "Context ID" - AICORE_TIME = "aicore_time(us)" - AIC_TOTAL_CYCLES = "aic_total_cycles" - AIC_MAC_TIME = "aic_mac_time(us)" - AIC_MAC_RATIO = "aic_mac_ratio" - AIC_SCALAR_TIME = "aic_scalar_time(us)" - AIC_SCALAR_RATIO = "aic_scalar_ratio" - AIC_MTE1_TIME = "aic_mte1_time(us)" - AIC_MTE1_RATIO = "aic_mte1_ratio" - AIC_MTE2_TIME = "aic_mte2_time(us)" - AIC_MTE2_RATIO = "aic_mte2_ratio" - AIC_FIXPIPE_TIME = "aic_fixpipe_time(us)" - AIC_FIXPIPE_RATIO = "aic_fixpipe_ratio" - AIC_ICACHE_MISS_RATE = "aic_icache_miss_rate" - AIV_TIME = "aiv_time(us)" - AIV_TOTAL_CYCLES = "aiv_total_cycles" - AIV_VEC_TIME = "aiv_vec_time(us)" - AIV_VEC_RATIO = "aiv_vec_ratio" - AIV_SCALAR_TIME = "aiv_scalar_time(us)" - AIV_SCALAR_RATIO = "aiv_scalar_ratio" - AIV_MTE2_TIME = "aiv_mte2_time(us)" - AIV_MTE2_RATIO = "aiv_mte2_ratio" - AIV_MTE3_TIME = "aiv_mte3_time(us)" - AIV_MTE3_RATIO = "aiv_mte3_ratio" - AIV_ICACHE_MISS_RATE = "aiv_icache_miss_rate" - CUBE_UTILIZATION = "cube_utilization( %)" - TASK_DURATION_SUM = "Task Duration Sum(us)" - TASK_DURATION_MEAN = "Task Duration Mean(us)" - TASK_DURATION_STD = "Task Duration Std(us)" - TASK_DURATION_RATIO = "Task Duration Ratio(100%)" - SIZE = "size(MB)" - THROUGHPUT = "throughput(GB/s)" - COLOR = "color" - GAP = "Gap(us)" - DURATION_SUM = "Duration Sum(us)" - COUNT = "Count" - MAX_DURATION = "Max Duration(us)" - MIN_DURATION = "Min Duration(us)" - AVG_DURATION = "Avg Duration(us)" - DURATION_RATIO = "Duration Ratio" - INDEX = "Index" - - -# 定义CSV_TITILE_V1类,继承自CSV_TITILE类, 适配旧版csv -class CsvTitleV1(CsvTitle): - OP_NAME = "Op Name" - OP_TYPE = "OP Type" - TASK_TYPE = "Task Type" - TASK_DURATION = "Task Duration(us)" - - -# 定义CSV_TITILE_V1类,继承自CSV_TITILE类, 适配新版csv -class CsvTitleV2(CsvTitle): - OP_NAME = "Name" - OP_TYPE = "Type" - TASK_TYPE = "Accelerator Core" - TASK_DURATION = "Duration(us)" - - -class Constant: - DTYPE_SIZE_MAP = {"int8": 1, "uint8": 1, - "int16": 2, "uint16": 2, - "int32": 4, "uint32": 4, - "int64": 8, "uint64": 8, - "float16": 2, - "bfloat16": 2, - "bf16": 2, - "dt_bf16": 2, - "float32": 4, - "float": 4, - "float64": 8, - "complex64": 8, - "complex128": 16, - "bool": 1} - TP_THRESHOLD = 1150 - MAX_INPUT_MODE_LEN = 30 - MAX_INPUT_ADVICE_LEN = 30 - SMALL_OP_DUR_RATIO = 0.2 - SMALL_OP_NUM_RATIO = 0.2 - BYTE_UNIT_TRANS = 1024 - UNIT_TRANS = 1000 - - # mode list - COMPUTE = "compute" - TIMELINE = "timeline" - CLUSTER = "cluster" - OVERALL = "overall" - PIPELINE = "pipeline" - - # advice list - SLOW_RANK = "slow rank" - SLOW_LINK = "slow link" - KERNEL = "kernel" - - # compute - NPU_FUSED = "npu_fused" - NPU_SLOW = "npu_slow" - - # timeline - OPTIM = "optimizer" - OP_SCHE = "op_schedule" - - # overall - SUMMARY = "summary" - - PT_PROF_SUFFIX = "ascend_pt" - ASCEND_PROFILER_OUTPUT = "ASCEND_PROFILER_OUTPUT" - COLLECTION_PATH = "collection_path" - CLUSTER_ANALYSIS_OUTPUT = "cluster_analysis_output" - KERNEL_DETAILS_CSV = "kernel_details.csv" - CLUSTER_STEP_TIME_CSV = 
"cluster_step_trace_time.csv" - CLUSTER_COMM_JSON = "cluster_communication.json" - - # pipline - OP_NAME = "name" - OP_TID = "tid" - PID = "pid" - TS = "ts" - DUR = "dur" - CAT = "cat" - ARGS = "args" - PH = "ph" - ID = "id" - PH_START = "s" - PH_BEGIN = "B" - PH_END = "E" - PH_META = "M" - PH_X = "X" - CNAME = "cname" - PROCESS_NAME = "process_name" - FRAMEWORK_NAME = "Python" - ASCEND_HARDWARE_NAME = "Ascend Hardware" - ASYNC_NPU = "async_npu" - STEP_PREFIX = "ProfilerStep#" - FP_ATEN_OP = "aten" - FP_C10D_OP = "c10d" - HCOM_OP_PREFIX = "hcom_" - BP_AUTOGRAD_OP = "autograd" - TRACE_VIEW_JSON = "trace_view.json" - - # pattern_dict key: pattern, value: pattern name - PATTERN_DICT = {("Add", "DropOutDoMask", "Add"): "bias_dropout_add", - ("BatchMatMul", "Mul", "Cast", "Mul", "MaskedFill", "SoftmaxV2", "Cast", "DropOutDoMask", - "AsStrided", "BatchMatMul", "Transpose"): "FA", - ("Transpose", "Transpose", "Transpose", "Mul", "Transpose", "BatchMatMulV2", "MaskedFill", - "Cast", "SoftmaxV2", "Cast", "DropOutDoMask", "BatchMatMulV2", "Transpose"): "FA", - ("Transpose", "BatchMatMulV2", "Transpose", "Transpose", "BatchMatMulV2", "ZerosLike", - "DropOutDoMask", "Cast", "SoftmaxGrad", "Cast", "MaskedFill", "BatchMatMulV2", - "BatchMatMulV2", "Mul"): "FA", - ("Cast", "Square", "ReduceMeanD", "Add", "Rsqrt", "Cast", "Cast", "Mul", "Cast", "Cast", - "Mul", "Cast"): "RMSNORM", - ("Cast", "LayerNorm", "Cast"): "LayerNorm", - ("Add", "LayerNorm"): "AddLayerNorm", - ("Add", "LayerNormV3"): "AddLayerNorm", - ("Gelu", "Add"): "GeluAdd", - ("Cast", "Square", "MemSet", "ReduceMean", "Add", "Rsqrt", "Mul", "Cast", "Mul"): "RMSNorm", - ("BatchMatMul", "RealDiv", "Add", "Maximum", "SoftmaxV2", "Cast", "BatchMatMul"): "FA", - ("BatchMatMulV2", "RealDiv", "Add", "Cast", "Maximum", "Cast", "SoftmaxV2", "AsStrided", - "BatchMatMulV2"): "FA", - ("BatchMatMulV2", "RealDiv", "Add", "Cast", "SoftmaxV2", "Cast", "BroadcastTo", - "BatchMatMulV2"): "FA", - ("Mul", "Slice", "Neg", "Slice", "ConcatD", "Cast", "Mul", "Add"): "RotaryMul", - ("Mul", "AsStrided", "Neg", "AsStrided", "ConcatD", "Mul", "Add"): "RotaryMul", - ("Mul", "Slice", "Neg", "Slice", "ConcatD", "Mul", "Add"): "RotaryMul", - ("MatMulV2", "Swish", "MatMulV2", "Mul", "MatMulV2"): "FFN", - ("Transpose", "Transpose", "GatherElement", "Transpose"): "GatherElement", - ("Slice", "Slice", "Swish", "Mul"): "torch_npu.npu_swiglu", - ("Cast", "Mul", "MaskedFill", "SoftmaxV2", "Cast"): "torch_npu.npu_scaled_masked_softmax", - ("Mul", "Slice", "Neg", "Slice", "ConcatD", "Mul"): "torch_npu.npu_rotary_mul", - ("Cast", "Square", "ReduceMeanD", "Add", "Rsqrt", "Mul", "Cast", "Mul"): "torch_npu.npu_rms_norm"} - TITLE = CsvTitleV2 - - @classmethod - def update_title(cls): - cls.TITLE = CsvTitleV1 - - -class CoreType: - AIV = "AI_VECTOR_CORE" - AIC = "AI_CORE" - AICPU = "AI_CPU" - MIX_AIV = "MIX_AIV" - MIX_AIC = "MIX_AIC" - HCCL = "HCCL" - - -class PerfColor(Enum): - WHITE = 0 - GREEN = 1 - YELLOW = 2 - RED = 3 diff --git a/profiler/advisor/advisor_backend/common_func_advisor/trace_view_json.py b/profiler/advisor/advisor_backend/common_func_advisor/trace_view_json.py deleted file mode 100644 index 8171f06ee235fc02da715044b4d310087c36c102..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/common_func_advisor/trace_view_json.py +++ /dev/null @@ -1,209 +0,0 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import os -from abc import abstractmethod -from dataclasses import dataclass -from dataclasses import field -from typing import Dict -from typing import List - -import pandas as pd - -from common_func.file_manager import FileManager - - -@dataclass -class TraceObj: - ph: str = "" - bp: str = "" - cat: str = "" - name: str = "" - pid: int = 0 - tid: int = 0 - id: int = 0 - ts: str = "" - dur: float = 0.0 - args: dict = field(default='unknown') - - @abstractmethod - def hash(self): - raise Exception("To be implemented") - - def valid(self): - return self.name != "" - - def check_hashable(self): - if not self.valid(): - raise Exception("Illegal {} to hash".format(self.__class__.name)) - - -@dataclass -class Process(TraceObj): - def hash(self): - self.check_hashable() - # msprof 保证name唯一性 - return self.args.get("name") - - -@dataclass -class Thread(TraceObj): - def hash(self): - self.check_hashable() - # msprof 保证name唯一性 - return self.args.get("name") - - -@dataclass -class DurationEvent(TraceObj): - def hash(self): - self.check_hashable() - return self.ts - - -@dataclass -class FlowEvent(TraceObj): - s_point_ts: str = "" - e_point_ts: str = "" - - def hash(self): - self.check_hashable() - return self.e_point_ts - - -class TraceViewJson: - - def __init__(self, path): - self.processes: Dict[str, Process] = dict() - self.threads: Dict[str, Thread] = dict() - self.python_dur_events: Dict[str, DurationEvent] = dict() - self.cann_dur_events: Dict[str, DurationEvent] = dict() - self.ascend_hardware_dur_events: Dict[str, DurationEvent] = dict() - self.torch_2_npu_flow_events: Dict[str, FlowEvent] = dict() - traces = FileManager.read_json_file(path) - self._load_obj(traces) - - def get_call_stack(self, data: pd.DataFrame, index_id: int, ts_col: str) -> str: - if ts_col not in data.columns.tolist(): - print("[ERROR] No {} col found in data columns.".format(ts_col)) - return "" - row = data.loc[index_id] - timestamp = row[ts_col] - flow_event = self.get_torch_2_npu_flow_event(timestamp) - if not flow_event.valid(): - print("[ERROR] Get flow event failed for pattern {}.".format(row['pattern'])) - return "" - flow_event_s_key = flow_event.s_point_ts - python_dur_events = self.get_python_dur_events_contain_ts(flow_event_s_key) - if not python_dur_events: - print("[ERROR] No python dur event found for pattern {}.".format(row['pattern'])) - return "" - # 保持新老版本callstack兼容性 - if python_dur_events[0].args.get("Call stack"): - # 旧版本 - call_stack_list = python_dur_events[0].args.get("Call stack").split(";") - else: - python_dur_events.sort(key=lambda e: e.ts) - # 新版本 - call_stack_list = [event.name for event in python_dur_events if event.cat == "python_function"] - call_stack = "\n".join(call_stack_list) - return call_stack - - def get_torch_2_npu_flow_event(self, end_time) -> FlowEvent: - if not self.torch_2_npu_flow_events or not self.torch_2_npu_flow_events.get(end_time): - print("[ERROR] Find flow event failed for ts: {}".format(end_time)) - return FlowEvent() - 
return self.torch_2_npu_flow_events.get(end_time) - - def get_python_dur_events_contain_ts(self, ts) -> List[DurationEvent]: - res = [] - for event in self.python_dur_events.values(): - if float(event.ts) <= float(ts) <= float(event.ts) + event.dur: - res.append(event) - return res - - def _load_obj(self, traces): - self._load_format(traces) - if not self._check_format(): - print("[ERROR] parse json failed for error format") - return - self._load_duration_events(traces) - self._load_torch_to_npu_flow_events(traces) - - def _check_format(self): - # 当前功能只需要这两个process,可扩展 - check_processes = ['Python', 'Ascend Hardware'] - for check_process in check_processes: - if check_process in self.processes: - continue - print("[ERROR] {} process not found in json.".format(check_process)) - return False - return True - - # 加载pid, tid头 - def _load_format(self, traces: List[Dict]): - for i, trace in enumerate(traces): - if trace.get('name') == 'process_name': - if not trace.get('args') or not trace.get('args').get('name') or not trace.get('pid'): - continue - process = Process(**trace) - self.processes[process.hash()] = process - if trace.get('name') == 'thread_name': - if not trace.get('args') or not trace.get('args').get('name') or not trace.get('tid'): - continue - thread = Thread(**trace) - self.threads[thread.hash()] = thread - - def _load_duration_events(self, traces: List[Dict]): - def check_events(_trace): - return _trace.get('name') and _trace.get("ts") and _trace.get("dur") - - python_pid = self.processes.get("Python").pid - cann_pid = self.processes.get("CANN").pid - ascend_hardware_pid = self.processes.get("Ascend Hardware").pid - for i, trace in enumerate(traces): - if trace.get('ph') != 'X': - continue - if not check_events(trace): - continue - event = DurationEvent(**trace) - if trace.get('pid') == python_pid: - self.python_dur_events[event.hash()] = event - elif trace.get('pid') == cann_pid: - self.cann_dur_events[event.hash()] = event - elif trace.get("pid") == ascend_hardware_pid: - self.ascend_hardware_dur_events[event.hash()] = event - - def _load_torch_to_npu_flow_events(self, traces: List[Dict]): - def check_events(_trace): - return _trace.get('name') and _trace.get("id") and _trace.get("ts") - - flow_events_table_by_id = dict() - - python_pid = self.processes.get("Python") - for i, trace in enumerate(traces): - if trace.get('ph') != 's' and trace.get('ph') != 'f' and trace.get('pid') != python_pid: - continue - if not check_events(trace): - continue - event = flow_events_table_by_id.get(trace.get("id")) - if not event: - event = FlowEvent(**trace) - if trace.get('ph') == 's': - event.s_point_ts = trace.get('ts') - else: - event.e_point_ts = trace.get('ts') - flow_events_table_by_id[event.id] = event - - self.torch_2_npu_flow_events = {eve.hash(): eve for eve in flow_events_table_by_id.values()} diff --git a/profiler/advisor/advisor_backend/common_func_advisor/trace_view_preprocessor.py b/profiler/advisor/advisor_backend/common_func_advisor/trace_view_preprocessor.py deleted file mode 100644 index 7b9baa32d9423a46bf93d563a6fabbbbb652aaf8..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/common_func_advisor/trace_view_preprocessor.py +++ /dev/null @@ -1,208 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
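`_load_torch_to_npu_flow_events()` above pairs chrome-trace flow records that share an `id`: the `ph == 's'` record contributes the start timestamp, the `ph == 'f'` record the end timestamp, and each finished link is indexed by its end timestamp. A minimal standalone sketch of that pairing (the trace records below are invented, not taken from a real trace_view.json):

```python
# Hypothetical flow records; real ones come from trace_view.json.
trace = [
    {"ph": "s", "id": 7, "ts": "100.5", "name": "torch_to_npu"},
    {"ph": "f", "id": 7, "ts": "180.0", "name": "torch_to_npu"},
]

flows = {}
for event in trace:
    if event["ph"] not in ("s", "f"):
        continue
    flow = flows.setdefault(event["id"], {"s_point_ts": None, "e_point_ts": None})
    key = "s_point_ts" if event["ph"] == "s" else "e_point_ts"
    flow[key] = event["ts"]

# Index by end timestamp, mirroring torch_2_npu_flow_events keyed by hash() == e_point_ts.
by_end_ts = {f["e_point_ts"]: f for f in flows.values() if f["e_point_ts"]}
print(by_end_ts["180.0"]["s_point_ts"])  # -> 100.5
```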
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import re -import sys -from typing import Optional -from dataclasses import dataclass - -from common_func_advisor.constant import Constant - - -@dataclass -class FineTraceViewData: - py_pid: int = -1 - fp_tid: int = -1 - bp_tid: int = -1 - ascend_pid: int = -1 - min_ts: str = str(sys.maxsize) - max_ts: str = "0" - hcom_tids: list = None - fp_ops: list = None - bp_ops: list = None - hcom_ops: list = None - npu_ops_ts_dur: dict = None - torch_to_npu_links: list = None - - def __post_init__(self): - self.hcom_tids = self.hcom_tids or [] - self.fp_ops = self.fp_ops or [] - self.bp_ops = self.bp_ops or [] - self.hcom_ops = self.hcom_ops or [] - self.npu_ops_ts_dur = self.npu_ops_ts_dur or {} - self.torch_to_npu_links = self.torch_to_npu_links or [] - - def sort(self): - self.fp_ops.sort(key=lambda x: x[Constant.TS]) - self.bp_ops.sort(key=lambda x: x[Constant.TS]) - self.hcom_ops.sort(key=lambda x: x[Constant.TS]) - self.torch_to_npu_links.sort(key=lambda x: x[Constant.TS]) - - -class TraceViewPreProcessor: - """ - Trace view data preprocess - """ - - @staticmethod - def _is_fp_op(op_name: str) -> bool: - """ - check whether op is fp op - """ - return op_name.startswith(Constant.FP_ATEN_OP) or op_name.startswith(Constant.FP_C10D_OP) - - @staticmethod - def _is_fp_data(data: dict, fp_tid: int, py_pid: int) -> bool: - """ - check whether data is valid fp data - """ - return data[Constant.OP_TID] == fp_tid and \ - Constant.TS in data and Constant.DUR in data and \ - not data[Constant.OP_NAME].startswith(Constant.STEP_PREFIX) and \ - data[Constant.PID] == py_pid - - @staticmethod - def _is_bp_op(op_name: str) -> bool: - """ - check whether op is bp op - """ - return op_name.startswith(Constant.BP_AUTOGRAD_OP) - - @staticmethod - def _is_bp_data(data: dict, bp_tid: int, py_pid: int) -> bool: - """ - check whether data is valid bp data - """ - return data[Constant.OP_TID] == bp_tid and \ - Constant.TS in data and Constant.DUR in data and \ - data[Constant.PID] == py_pid - - @staticmethod - def _is_torch_to_npu_link(data: dict, fp_tid: int) -> bool: - """ - check whether data is torch to npu link - """ - return Constant.CAT in data and data[Constant.CAT] == Constant.ASYNC_NPU and \ - data[Constant.PH] == Constant.PH_START and \ - data[Constant.PID] == fp_tid - - @staticmethod - def _is_send_recv_op(op_name: str) -> bool: - """ - check whether op is hcom send or recv op - """ - # eg: hcom_BatchSendRecv__101_0_1 - p1 = re.compile(r'hcom_\w+SendRecv__\d+') - # eg: hcom_send__101_0_1 - p2 = re.compile(r'hcom_send__\d+') - # eg: hcom_receive__101_0_1 - p3 = re.compile(r'hcom_receive__\d+') - return bool(p1.match(op_name)) or bool(p2.match(op_name)) or bool(p3.match(op_name)) - - @staticmethod - def _is_hcom_op(op_name: str) -> bool: - """ - check whether data is hcom data - """ - return op_name.startswith(Constant.HCOM_OP_PREFIX) - - @staticmethod - def _is_python_process(data: dict) -> bool: - """ - check whether data is python process - """ - return Constant.PH in data and data[Constant.PH] == Constant.PH_META and \ - data[Constant.OP_NAME] == Constant.PROCESS_NAME and \ - 
data[Constant.ARGS][Constant.OP_NAME] == Constant.FRAMEWORK_NAME - - @staticmethod - def _is_step_op(data: dict) -> bool: - """ - check whether data is step data - """ - return data[Constant.OP_NAME].startswith(Constant.STEP_PREFIX) - - @staticmethod - def _is_ascend_process(data: dict) -> bool: - """ - check whether data is ascend process data - """ - return Constant.PH in data and data[Constant.PH] == Constant.PH_META and \ - data[Constant.OP_NAME] == Constant.PROCESS_NAME and \ - data[Constant.ARGS][Constant.OP_NAME] == Constant.ASCEND_HARDWARE_NAME - - @staticmethod - def _is_npu_op(data: dict, ascend_pid: int) -> bool: - """ - check whether data is npu op - """ - return Constant.PH in data and data[Constant.PH] == Constant.PH_X and \ - not data[Constant.OP_NAME].isupper() and \ - data[Constant.PID] == ascend_pid - - def process(self, raw_data: list) -> Optional[FineTraceViewData]: - """ - preprocess raw data - """ - if not raw_data: - print("[ERROR] No raw data found in trace view data.") - return None - - raw_fp_tids, raw_bp_tids, raw_hcom_tids = set(), set(), set() - fine_data = FineTraceViewData() - - # counting fp ops and bp ops tid and ascend pid - for data in raw_data: - if self._is_fp_op(data[Constant.OP_NAME]): - raw_fp_tids.add(data[Constant.OP_TID]) - elif self._is_bp_op(data[Constant.OP_NAME]): - raw_bp_tids.add(data[Constant.OP_TID]) - elif self._is_send_recv_op(data[Constant.OP_NAME]): - fine_data.hcom_ops.append(data) - raw_hcom_tids.add(data[Constant.OP_TID]) - elif self._is_python_process(data): - fine_data.py_pid = data[Constant.PID] - elif self._is_ascend_process(data): - fine_data.ascend_pid = data[Constant.PID] - - # find max and min ts in hcom ops - if self._is_hcom_op(data[Constant.OP_NAME]): - # for compatibility with old data (ts is float type) - ts = data[Constant.TS] if not isinstance(data[Constant.TS], float) else str(data[Constant.TS]) - fine_data.min_ts = min(fine_data.min_ts, ts) - fine_data.max_ts = max(fine_data.max_ts, ts) - - unique_fp_tid = list(raw_fp_tids - raw_bp_tids) - unique_bp_tid = list(raw_bp_tids) - fine_data.hcom_tids = list(raw_hcom_tids) - - if not unique_fp_tid or not unique_bp_tid: - print("[INFO] No fp or bp tid found in trace view data.") - else: - fine_data.fp_tid, fine_data.bp_tid = unique_fp_tid[0], unique_bp_tid[0] - - # filter fp ops and bp ops and torch_to_npu_links - for data in raw_data: - if self._is_fp_data(data, fine_data.fp_tid, fine_data.py_pid): - fine_data.fp_ops.append(data) - elif self._is_bp_data(data, fine_data.bp_tid, fine_data.py_pid): - fine_data.bp_ops.append(data) - elif self._is_torch_to_npu_link(data, fine_data.fp_tid): - fine_data.torch_to_npu_links.append(data) - elif self._is_npu_op(data, fine_data.ascend_pid): - fine_data.npu_ops_ts_dur[data[Constant.TS]] = data[Constant.DUR] - - fine_data.sort() - return fine_data diff --git a/profiler/advisor/advisor_backend/compute_advice/__init__.py b/profiler/advisor/advisor_backend/compute_advice/__init__.py deleted file mode 100644 index 8400fd5ecd1246eaee795cebfccfacc80a94f08c..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
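`process()` above assigns thread roles by set arithmetic: the forward thread is whichever tid issues aten/c10d ops but never autograd ops. A toy illustration of that derivation (tids and op names invented):

```python
# Hypothetical op records: (name, tid)
ops = [("aten::mm", 1), ("aten::add", 1), ("autograd::engine", 2), ("aten::mm", 2)]

raw_fp_tids = {tid for name, tid in ops if name.startswith("aten")}
raw_bp_tids = {tid for name, tid in ops if name.startswith("autograd")}

unique_fp_tid = sorted(raw_fp_tids - raw_bp_tids)  # threads that only run forward ops
print(unique_fp_tid)  # -> [1]
```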
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/profiler/advisor/advisor_backend/compute_advice/compute_advice_base.py b/profiler/advisor/advisor_backend/compute_advice/compute_advice_base.py deleted file mode 100644 index cafbafd8e28c162bc76edb2f77ebd0645fed552f..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/compute_advice_base.py +++ /dev/null @@ -1,105 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from abc import abstractmethod -from collections import defaultdict -import os - -from advice_base import AdviceBase -from common_func.file_manager import FileManager - - -class ComputeAdviceBase(AdviceBase): - ASCEND_PT = 'ascend_pt' - ASCEND_PROFILER_OUTPUT = 'ASCEND_PROFILER_OUTPUT' - KERNEL_DETAIL_FILE = "kernel_details.csv" - TRACE_VIEW_FILE = "trace_view.json" - - def __init__(self, collection_path: str): - super().__init__(collection_path) - self.kernel_details_path = "" - self.has_preparse = False - self.preparse_data = defaultdict(list) - self.call_stack = None - self.trace_view_path = "" - - def path_check(self): - """ - check whether input path is valid - """ - if not os.path.exists(self.collection_path): - print("[ERROR] Path: {} is not exist.".format(self.collection_path)) - return False - if os.path.isdir(self.collection_path) and self.collection_path.endswith("ascend_pt"): - self.kernel_details_path = os.path.join(self.collection_path, "ASCEND_PROFILER_OUTPUT", - "kernel_details.csv") - if not os.path.exists(self.kernel_details_path): - print("[ERROR] kernel_details.csv is not exist in the Path: {}.".format( - os.path.join(self.collection_path, "ASCEND_PROFILER_OUTPUT"))) - return False - elif os.path.isfile(self.collection_path) and os.path.basename(self.collection_path) == "kernel_details.csv": - self.kernel_details_path = self.collection_path - else: - print("[ERROR] Please input ascend_pt or kernel_details.csv") - return False - print("[INFO] Start to analyse the target file: {}".format(self.kernel_details_path)) - self.preparse() - return True - - def has_callstack(self): - if self.call_stack is not None: - return self.call_stack - profiler_info_json_path = "" - for file in os.listdir(self.collection_path): - if file.startswith("profiler_info"): - profiler_info_json_path = os.path.join(self.collection_path, file) - break - if not profiler_info_json_path: - self.call_stack = False - return self.call_stack - self.trace_view_path = os.path.join(self.collection_path, self.ASCEND_PROFILER_OUTPUT, "trace_view.json") - if not os.path.exists(profiler_info_json_path) or not 
os.path.exists(self.trace_view_path): - self.call_stack = False - return self.call_stack - info = FileManager.read_json_file(profiler_info_json_path) - if not info.get("config") or not info.get("config").get("common_config") \ - or not info.get("config").get("common_config").get("with_stack"): - self.call_stack = False - return self.call_stack - activities = info.get("config").get("common_config").get("activities") - if not activities or "ProfilerActivity.CPU" not in activities: - self.call_stack = False - return self.call_stack - self.call_stack = info.get("config").get("common_config").get("with_stack") - return self.call_stack - - @abstractmethod - def run(self): - """ - analyze profiling data and advice - """ - - @abstractmethod - def output(self): - """ - output relevant data - """ - self.output_format_data[self.DATA] = self.cur_data - self.output_format_data[self.BOTTLENECK] = self.cur_bottleneck - self.output_format_data[self.ADVICE] = self.cur_advice - - def preparse(self): - if self.has_preparse: - return diff --git a/profiler/advisor/advisor_backend/compute_advice/npu_fused/__init__.py b/profiler/advisor/advisor_backend/compute_advice/npu_fused/__init__.py deleted file mode 100644 index 8400fd5ecd1246eaee795cebfccfacc80a94f08c..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/npu_fused/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/profiler/advisor/advisor_backend/compute_advice/npu_fused/csv_analyzer.py b/profiler/advisor/advisor_backend/compute_advice/npu_fused/csv_analyzer.py deleted file mode 100644 index c85c14d618ceda199c9c376abc27a3581eed97b8..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/npu_fused/csv_analyzer.py +++ /dev/null @@ -1,81 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
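`has_callstack()` above only trusts stack data when the run was profiled with `with_stack=True` and CPU activities enabled, both read back from the `profiler_info*.json` written into the `*_ascend_pt` directory. A condensed sketch of the same check (file name and layout assumed as described above, not re-verified here):

```python
import json

def profiling_has_callstack(profiler_info_path: str) -> bool:
    # profiler_info*.json records the config the profiling run was started with.
    with open(profiler_info_path, "r", encoding="utf-8") as f:
        info = json.load(f)
    common = info.get("config", {}).get("common_config", {})
    activities = common.get("activities") or []
    # Call stacks only exist if with_stack was on and CPU activities were traced.
    return bool(common.get("with_stack")) and "ProfilerActivity.CPU" in activities
```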
- -import multiprocessing - -import pandas as pd -import numpy as np - -from common_func_advisor.constant import Constant -from .op_perf import OpPerfFactory - - -class CSVAnalyzer: - def __init__(self, path) -> None: - self._path = path - - def process(self): - df = pd.read_csv(self._path, dtype={"Start Time(us)": str}) - # 分析是否存在可融合的算子 - op_type_list = df["Type"].tolist() - duration_list = df["Duration(us)"].tolist() - start_times = df["Start Time(us)"].tolist() - # 去除末尾的\t分隔符 - start_times = [start_time[:-1] for start_time in start_times] - result_list = [] - for pattern in Constant.PATTERN_DICT.keys(): - result_list.extend(self.find_all_sub_lists(op_type_list, duration_list, start_times, pattern)) - data_frame = pd.DataFrame(result_list) - data_frame.columns = ["pattern_name", "pattern", "len", "count", "duration sum(us)", "op durations(us)", - "index", "first_timestamp"] - return data_frame - - @staticmethod - def find_all_sub_lists(op_type_list, duration_list, start_times, expect_sub_list): - # 创建一个空字典,用来存储子列表和它们的出现次数和起始位置 - len_sub_list = len(expect_sub_list) - expect_sub_list = tuple(expect_sub_list) - sublist_dict = {} - # 遍历列表,从每个位置开始,取长度为N的子列表 - for i in range(len(op_type_list) - len_sub_list + 1): - sublist = tuple(op_type_list[i:i + len_sub_list]) - if sublist != expect_sub_list: - continue - # 如果子列表已经在字典中,就增加它的出现次数,否则就初始化为1 - if sublist in sublist_dict: - # count - sublist_dict[sublist][0] += 1 - # index - sublist_dict[sublist][1].append(i) - # total duration - sublist_dict[sublist][2] += sum(duration_list[i:i + len_sub_list]) - # duration - zip_data = zip(sublist_dict[sublist][3], duration_list[i:i + len_sub_list]) - sublist_dict[sublist][3] = [a + b for a, b in zip_data] - else: - sublist_dict[sublist] = [1, [i], sum(duration_list[i:i + len_sub_list]), - duration_list[i:i + len_sub_list], len_sub_list, start_times[i]] - # 创建一个空列表,用来存储所有重复的子列表 - repeated_sublists = [] - for sublist, (count, index, duration_sum, op_durations, sublist_len, first_time) in sublist_dict.items(): - pattern_name = Constant.PATTERN_DICT.get(sublist, "unknown") - op_durations = [round(num, 2) for num in op_durations] - repeated_sublists.append([pattern_name, sublist, sublist_len, count, - duration_sum, op_durations, index, first_time]) - if len(sublist_dict) == 0: - pattern_name = Constant.PATTERN_DICT.get(expect_sub_list, "unknown") - repeated_sublists.append([pattern_name, expect_sub_list, 0, 0, 0, 0, 0, 0]) - # 返回所有重复的子列表 - return repeated_sublists diff --git a/profiler/advisor/advisor_backend/compute_advice/npu_fused/json_analyzer.py b/profiler/advisor/advisor_backend/compute_advice/npu_fused/json_analyzer.py deleted file mode 100644 index fd2a72ffa39bfde1b3e59450c6d76f51d98110d9..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/npu_fused/json_analyzer.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
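`find_all_sub_lists()` above slides a window of `len(pattern)` across the kernel `Type` column and records count, indices, and summed duration for every exact match. Its core scan, reduced to a standalone snippet (data invented; the real patterns come from `Constant.PATTERN_DICT`):

```python
op_types = ["Add", "LayerNorm", "Mul", "Add", "LayerNorm"]
durations = [2.0, 5.0, 1.0, 2.5, 6.0]
pattern = ("Add", "LayerNorm")  # would map to "AddLayerNorm" in PATTERN_DICT

hits, total = [], 0.0
for i in range(len(op_types) - len(pattern) + 1):
    if tuple(op_types[i:i + len(pattern)]) == pattern:
        hits.append(i)
        total += sum(durations[i:i + len(pattern)])

print(hits, total)  # -> [0, 3] 15.5
```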
- -import pandas as pd - -from common_func_advisor.trace_view_json import TraceViewJson - - -class JSONAnalyzer(object): - def __init__(self, path): - self._path = path - - def get_custom_code(self, data: pd.DataFrame, ts_col: str, output_col: str): - trace_json = TraceViewJson(self._path) - callstacks = pd.DataFrame(columns=[output_col]) - - for i, row in data.iterrows(): - if ts_col not in data.columns.tolist(): - print("[ERROR] No {} col found in data columns.".format(ts_col)) - return callstacks - timestamp = row[ts_col] - flow_event = trace_json.get_torch_2_npu_flow_event(timestamp) - if not flow_event.valid(): - print("[ERROR] Get flow event failed for pattern {}.".format(row['pattern'])) - callstacks.loc[i] = "" - continue - flow_event_s_key = flow_event.s_point_ts - python_dur_events = trace_json.get_python_dur_events_contain_ts(flow_event_s_key) - if not python_dur_events: - print("[ERROR] No python dur event found for pattern {}.".format(row['pattern'])) - callstacks.loc[i] = "" - continue - # 保持新老版本callstack兼容性 - if python_dur_events[0].args.get("Call stack"): - # 旧版本 - callstack = python_dur_events[0].args.get("Call stack").split(";") - else: - python_dur_events.sort(key=lambda e: e.ts) - # 新版本 - callstack = [event.name for event in python_dur_events if event.cat == "python_function"] - callstack_str = "\n".join(callstack) - callstacks.loc[i] = callstack_str - return callstacks diff --git a/profiler/advisor/advisor_backend/compute_advice/npu_fused/op_perf.py b/profiler/advisor/advisor_backend/compute_advice/npu_fused/op_perf.py deleted file mode 100644 index 7bcbed5a75807b57a55787c743cfaaff55a68589..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/npu_fused/op_perf.py +++ /dev/null @@ -1,196 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
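`get_custom_code()` above walks timestamp -> torch-to-npu flow event -> enclosing Python duration events to recover the launching call stack, with two branches for trace generations: older traces pack the whole stack into `args['Call stack']`, newer ones emit one `python_function` event per frame. The branch logic in isolation (events invented):

```python
# Two trace generations: old format carries the whole stack in args["Call stack"];
# new format exposes one python_function event per frame.
events = [
    {"name": "train_step", "cat": "python_function", "ts": 90.0, "args": {}},
    {"name": "forward",    "cat": "python_function", "ts": 95.0, "args": {}},
]

if events and events[0]["args"].get("Call stack"):      # old format
    call_stack = events[0]["args"]["Call stack"].split(";")
else:                                                    # new format
    call_stack = [e["name"] for e in sorted(events, key=lambda e: e["ts"])
                  if e["cat"] == "python_function"]

print("\n".join(call_stack))
```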
-import functools -from typing import Dict - -from common_func_advisor.constant import Constant -from common_func_advisor.constant import CoreType -from common_func_advisor.constant import PerfColor - - -class OpPerfFactory: - @classmethod - def build(cls, op_row: Dict): - if op_row.get(Constant.TITLE.TASK_TYPE) == CoreType.AIV: - return VecOpPerf(op_row) - elif op_row.get(Constant.TITLE.TASK_TYPE) == CoreType.AIC: - return CubeOpPerf(op_row) - else: - return OpPerf(op_row) - - -class OpPerf: - def __init__(self, op_row: Dict): - if "OP Type" in op_row.keys(): - Constant.update_title() - self.row = op_row - self.model_name = op_row.get("Model Name") - self.model_id = op_row.get("Model ID") - self.task_id = op_row.get("Task ID") - self.stream_id = op_row.get("Stream ID") - self.infer_id = op_row.get("Infer ID") - self.op_name = op_row.get("Name") - self.op_type = op_row.get("Type") - self.task_type = op_row.get("Accelerator Core") - self.task_start_time = op_row.get("Start Time(us)") - self.task_duration = op_row.get("Duration(us)") - self.task_wait_time = op_row.get("Wait Time(us)") - self.block_dim = op_row.get("Block Dim") - self.mix_block_dim = op_row.get("Mix Block Dim") - - self.hf32_eligible = op_row.get("HF32 Eligible") - self.input_shapes = op_row.get("Input Shapes") - self.input_data_types = op_row.get("Input Data Types") - self.input_formats = op_row.get("Input Formats") - self.output_shapes = op_row.get("Output Shapes") - self.output_data_types = op_row.get("Output Data Types") - self.output_formats = op_row.get("Output Formats") - self.context_id = op_row.get("Context ID") - self.aicore_time = op_row.get("aicore_time(us)") - self.aic_total_cycles = op_row.get("aic_total_cycles") - - self.aic_mac_time = op_row.get("aic_mac_time(us)") - self.aic_mac_ratio = op_row.get("aic_mac_ratio") - self.aic_scalar_time = op_row.get("aic_scalar_time(us)") - self.aic_scalar_ratio = op_row.get("aic_scalar_ratio") - self.aic_mte1_time = op_row.get("aic_mte1_time(us)") - self.aic_mte1_ratio = op_row.get("aic_mte1_ratio") - self.aic_mte2_time = op_row.get("aic_mte2_time(us)") - self.aic_mte2_ratio = op_row.get("aic_mte2_ratio") - self.aic_fixpipe_time = op_row.get("aic_fixpipe_time(us)") - self.aic_fixpipe_ratio = op_row.get("aic_fixpipe_ratio") - self.aic_icache_miss_rate = op_row.get("aic_icache_miss_rate") - self.aiv_time = op_row.get("aiv_time(us)") - self.aiv_total_cycles = op_row.get("aiv_total_cycles") - self.aiv_vec_time = op_row.get("aiv_vec_time(us)") - self.aiv_vec_ratio = op_row.get("aiv_vec_ratio") - self.aiv_scalar_time = op_row.get("aiv_scalar_time(us)") - self.aiv_scalar_ratio = op_row.get("aiv_scalar_ratio") - self.aiv_mte2_time = op_row.get("aiv_mte2_time(us)") - - self.aiv_mte2_ratio = op_row.get("aiv_mte2_ratio") - self.aiv_mte3_time = op_row.get("aiv_mte3_time(us)") - self.aiv_mte3_ratio = op_row.get("aiv_mte3_ratio") - self.aiv_icache_miss_rate = op_row.get("aiv_icache_miss_rate") - self.cube_utilization = op_row.get("cube_utilization( %)") - - @staticmethod - def get_dtype_size(dtype_str: str): - return Constant.DTYPE_SIZE_MAP.get(dtype_str.lower(), 0) - - @staticmethod - def get_element_count(shape: list): - return functools.reduce(lambda x, y: int(x) * int(y), shape) - - @staticmethod - def shape_to_tuple(shape_str: str) -> tuple: - if not isinstance(shape_str, str): - return [] - shape_str = shape_str.strip('"') - split_shape = shape_str.strip(';') - if not split_shape: - return [] - pairs = split_shape.split(';') - shape_result = [] - for pair in pairs: - pair = 
pair.strip(";") - elements = pair.split(',') - elements = tuple(int(element) if "" != element else 0 for element in elements) - shape_result.append(elements) - return tuple(shape_result) - - @staticmethod - def dtype_to_tuple(dtypes_str: str) -> tuple: - if not isinstance(dtypes_str, str): - return [] - dtypes_str = dtypes_str.strip('"') - split_dtypes = dtypes_str.strip(';') - if not split_dtypes: - return [] - pairs = split_dtypes.split(';') - return tuple(pairs) - - def get_mac_ratio(self): - return self.aic_mac_ratio - - def get_size(self, shapes_str, dtypes_str): - shapes = self.shape_to_tuple(shapes_str) - dtypes = self.dtype_to_tuple(dtypes_str) - if len(shapes) > len(dtypes): - print(f"[ERROR] The size of shape is greater than that of dtypes.") - return 0 - if len(shapes) < len(dtypes): - shapes = list(shapes) - shapes.extend([(1,)] * (len(dtypes) - len(shapes))) - all_size = 0 - for index, shape in enumerate(shapes): - element_count = self.get_element_count(shape) - dtype_size = self.get_dtype_size(dtypes[index]) - all_size += element_count * dtype_size - return all_size - - def get_calc_size(self): - # input and output bytes (MB) - if not self.input_shapes or not self.output_shapes: - print("[ERROR] There is no tensor data, do not assess vector op performance.") - return 0 - intput_size = self.get_size(self.input_shapes, self.input_data_types) - output_size = self.get_size(self.output_shapes, self.output_data_types) - return (intput_size + output_size) / (Constant.BYTE_UNIT_TRANS * Constant.BYTE_UNIT_TRANS) - - def get_throughput(self): - # throughput(GB/s) - if not self.task_duration or abs(self.task_duration) < 1e-6: - print("[ERROR] There is no task_duration, do not assess vector op performance.") - return 0 - return self.row[Constant.TITLE.SIZE] / Constant.BYTE_UNIT_TRANS / self.task_duration * Constant.UNIT_TRANS * Constant.UNIT_TRANS - - def get_perf_color(self): - return PerfColor.WHITE - - def update(self): - self.row[Constant.TITLE.SIZE] = self.get_calc_size() - self.row[Constant.TITLE.THROUGHPUT] = self.get_throughput() - self.row[Constant.TITLE.COLOR] = self.get_perf_color().name - return self.row - - -class VecOpPerf(OpPerf): - def get_perf_color(self) -> PerfColor: - throughput = self.row[Constant.TITLE.THROUGHPUT] - op_duration = self.task_duration - tp_threshold = Constant.TP_THRESHOLD - if throughput == 0: - return PerfColor.WHITE - if throughput < tp_threshold / 2 and op_duration > 20: - return PerfColor.RED - elif tp_threshold / 2 <= throughput < tp_threshold: - return PerfColor.YELLOW - else: - return PerfColor.GREEN - - -class CubeOpPerf(OpPerf): - def get_perf_color(self) -> PerfColor: - aic_mac_ratio = self.get_mac_ratio() - if not aic_mac_ratio: - print("[WARNING] There is no aic_mac_ratio, do not assess cube op performance.") - return PerfColor.WHITE - elif aic_mac_ratio < 0.6: - return PerfColor.RED - elif 0.6 <= aic_mac_ratio < 0.8: - return PerfColor.YELLOW - else: - return PerfColor.GREEN diff --git a/profiler/advisor/advisor_backend/compute_advice/npu_fused_advice.py b/profiler/advisor/advisor_backend/compute_advice/npu_fused_advice.py deleted file mode 100644 index fd5610bbbbb98d15fbab22bb646b2dd7de36ac3d..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/npu_fused_advice.py +++ /dev/null @@ -1,71 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -from abc import ABC - -import pandas as pd - -from compute_advice.compute_advice_base import ComputeAdviceBase -from compute_advice.npu_fused.csv_analyzer import CSVAnalyzer -from compute_advice.npu_fused.json_analyzer import JSONAnalyzer - - -class NpuFusedAdvice(ComputeAdviceBase, ABC): - - def __init__(self, collection_path: str): - super().__init__(collection_path) - self.cur_data = dict() - self.cur_bottleneck = str() - self.cur_advice = str() - self.kernel_details_path = "" - self.call_stack = None - - def run(self): - if not self.path_check(): - return self.output_format_data - self.process() - self.output() - return self.output_format_data - - def process(self): - csv_analyzer = CSVAnalyzer(self.kernel_details_path) - all_pattern_data = csv_analyzer.process() - all_pattern_data = all_pattern_data.sort_values(by='duration sum(us)', ascending=False) - filter_data = all_pattern_data.get(all_pattern_data.get("duration sum(us)", 0) > 0) - if not self.has_callstack(): - print("[Warning] No call stack info found, advice will be incomplete") - self.cur_data = filter_data - else: - json_analyzer = JSONAnalyzer(self.trace_view_path) - custom_code = json_analyzer.get_custom_code(filter_data, "first_timestamp", "custom code") - self.cur_data = pd.concat([filter_data, custom_code], axis=1) - op_num = len(self.cur_data.index) - op_dur = filter_data["duration sum(us)"].sum() - if op_num > 0: - index = 0 - self.cur_bottleneck = f"The computing time of fusable op is {round(op_dur, 2)} ms." - self.cur_advice = "" - for _, row in self.cur_data.iterrows(): - advice = f"Advice {index}:\n" - cur_op = "[" + ", ".join(row.loc["pattern"]) + "]" - npu_fused_op = row.loc["pattern_name"] - advice += f"Replace {cur_op} with {npu_fused_op}. " - if self.call_stack: - advice += f"This pattern first happened in: \n{row['custom code']}" - if index != op_num - 1: - advice += "\n" - index += 1 - self.cur_advice += advice diff --git a/profiler/advisor/advisor_backend/compute_advice/npu_slow_advice.py b/profiler/advisor/advisor_backend/compute_advice/npu_slow_advice.py deleted file mode 100644 index caff1c792c2171c33a4dd876b0741d6c215c5766..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/compute_advice/npu_slow_advice.py +++ /dev/null @@ -1,82 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
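`process()` above sorts the matched patterns by `duration sum(us)` and renders each surviving row as an `Advice N:` paragraph, appending the first call stack when stacks were collected. A toy rendering of that assembly, with invented rows:

```python
rows = [
    {"pattern": ("Add", "LayerNorm"), "pattern_name": "AddLayerNorm", "duration sum(us)": 1520.4},
    {"pattern": ("Gelu", "Add"),      "pattern_name": "GeluAdd",      "duration sum(us)": 310.0},
]

advice_lines = []
for index, row in enumerate(rows):
    cur_op = "[" + ", ".join(row["pattern"]) + "]"
    advice_lines.append(f"Advice {index}:\nReplace {cur_op} with {row['pattern_name']}. ")
print("\n".join(advice_lines))
```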
-from abc import ABC -import multiprocessing - -import pandas as pd - -from compute_advice.compute_advice_base import ComputeAdviceBase -from compute_advice.npu_fused.op_perf import OpPerfFactory -from common_func_advisor.constant import Constant -from common_func_advisor.constant import PerfColor -from advisor_backend.common_func_advisor.trace_view_json import TraceViewJson - - -class NpuSlowAdvice(ComputeAdviceBase, ABC): - OP_PERF_SHEET = "op_perf" - - def __init__(self, collection_path: str): - super().__init__(collection_path) - self.kernel_details_path = "" - self.data = pd.DataFrame() - - @staticmethod - def save_to_excel(data: pd.DataFrame, file_path: str) -> None: - writer = pd.ExcelWriter(file_path, engine="xlsxwriter", mode="w") - data.index.name = Constant.TITLE.INDEX - data.to_excel(writer, index=True, sheet_name=NpuSlowAdvice.OP_PERF_SHEET) - NpuSlowAdvice.color_sheet(data, writer.book, writer.sheets[NpuSlowAdvice.OP_PERF_SHEET]) - writer.sheets[NpuSlowAdvice.OP_PERF_SHEET].freeze_panes = "A2" - writer.close() - - @staticmethod - def color_sheet(data: pd.DataFrame, workbook, worksheet): - color_rgb = { - PerfColor.GREEN.name: workbook.add_format({'bg_color': '#C6EFCE'}), - PerfColor.YELLOW.name: workbook.add_format({'bg_color': '#FFEB9C'}), - PerfColor.RED.name: workbook.add_format({'bg_color': '#FFC7CE'}), - } - for row in data.iterrows(): - color = row[1][Constant.TITLE.COLOR] - fill_format = color_rgb.get(color) - if not fill_format: - continue - worksheet.set_row(row[0] + 1, None, fill_format) - - @staticmethod - def update_op_row(row: tuple): - return OpPerfFactory.build(row[1]).update() - - def get_call_stack(self, data: pd.DataFrame, index_id: int, ts_col: str) -> str: - if not self.has_callstack(): - print("There is no call stack info, please set 'with_stack=True'") - return "" - trace_json = TraceViewJson(self.trace_view_path) - return trace_json.get_call_stack(data, index_id, ts_col) - - def run(self): - if not self.path_check(): - return self.data - self.process() - return self.data - - def process(self): - self.data = pd.read_csv(self.kernel_details_path, dtype={"Start Time(us)": str}) - # 去除末尾的\t分隔符 - self.data["Start Time(us)"] = self.data["Start Time(us)"].apply(lambda x: x[:-1]) - pool = multiprocessing.Pool(multiprocessing.cpu_count()) - result = pool.map(self.update_op_row, self.data.iterrows()) - pool.close() - self.data = pd.DataFrame(result) diff --git a/profiler/advisor/advisor_backend/interface.py b/profiler/advisor/advisor_backend/interface.py deleted file mode 100644 index 3e20c26d4d7bb000b20c28439b28ddf4811f057f..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/interface.py +++ /dev/null @@ -1,62 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
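`process()` above scores every `kernel_details.csv` row in a worker pool and rebuilds the DataFrame from the returned rows. The same fan-out pattern in miniature, with a stand-in scorer instead of `OpPerfFactory`:

```python
import multiprocessing

import pandas as pd

def score_row(indexed_row):
    # indexed_row is the (index, Series) pair produced by DataFrame.iterrows().
    row = indexed_row[1]
    row["score"] = row["Duration(us)"] * 2  # stand-in for OpPerfFactory.build(...).update()
    return row

if __name__ == "__main__":
    data = pd.DataFrame({"Duration(us)": [1.0, 2.0, 3.0]})
    with multiprocessing.Pool(2) as pool:
        data = pd.DataFrame(pool.map(score_row, data.iterrows()))
    print(data)
```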
-import os -import sys - -sys.path.append( - os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "advisor_backend")) -sys.path.append( - os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), "compare_tools")) -sys.path.append( - os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), "cluster_analyse")) -from common_func_advisor.constant import Constant -from advisor_backend.advice_factory.cluster_advice_factory import ClusterAdviceFactory -from advisor_backend.advice_factory.compute_advice_factory import ComputeAdviceFactory -from advisor_backend.advice_factory.timeline_advice_factory import TimelineAdviceFactory -from advisor_backend.advice_factory.overall_advice_factory import OverallAdviceFactory - - -class Interface: - def __init__(self, collection_path: str): - self.collection_path = os.path.realpath(collection_path) - self._factory_controller = FactoryController(collection_path) - - def get_data(self: any, mode: str, advice: str, **kwargs): - if len(mode) > Constant.MAX_INPUT_MODE_LEN or len(advice) > Constant.MAX_INPUT_ADVICE_LEN: - msg = '[ERROR]Input Mode is illegal.' - raise RuntimeError(msg) - factory = self._factory_controller.create_advice_factory(mode, kwargs.get("input_path", "")) - return factory.produce_advice(advice, kwargs) - - -class FactoryController: - FACTORY_LIB = { - Constant.CLUSTER: ClusterAdviceFactory, - Constant.COMPUTE: ComputeAdviceFactory, - Constant.TIMELINE: TimelineAdviceFactory, - Constant.OVERALL: OverallAdviceFactory - } - - def __init__(self, collection_path: str): - self.collection_path = os.path.realpath(collection_path) - self.temp_input_path = None - - def create_advice_factory(self, mode: str, input_path: str): - collection_path = input_path if input_path else self.collection_path - return self.FACTORY_LIB.get(mode)(collection_path) - - -if __name__ == "__main__": - Interface() diff --git a/profiler/advisor/advisor_backend/overall_advice/__init__.py b/profiler/advisor/advisor_backend/overall_advice/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/advisor_backend/overall_advice/overall_summary_advice.py b/profiler/advisor/advisor_backend/overall_advice/overall_summary_advice.py deleted file mode 100644 index f5bfc351f2820ac8d797798fd959577da8062ea4..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/overall_advice/overall_summary_advice.py +++ /dev/null @@ -1,176 +0,0 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
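`Interface` above is the single entry point of the backend: `get_data(mode, advice, ...)` length-checks the names, looks the mode up in `FACTORY_LIB`, and lets the matching factory produce the advice. A hedged usage sketch (the path is hypothetical, and the literal mode/advice strings are assumptions; the real values are the `Constant` members registered in `FACTORY_LIB`):

```python
# Hypothetical driver script; assumes advisor_backend is on sys.path.
interface = Interface(collection_path="./worker0_ascend_pt")
result = interface.get_data("compute", "npu_fused", input_path="")
# result is the dict assembled by the chosen advice object's output() step.
print(result)
```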
-import os - -from advisor_backend.advice_base import AdviceBase -from compare_backend.utils.constant import Constant -from compare_interface.comparison_interface import ComparisonInterface - - -class OverallSummaryAdvice(AdviceBase): - advice_map = { - "Computing Time": "if you want more detailed advice please use msprof-analyze advisor computation.", - "Uncovered Communication Time": "if you want more detailed advice, please use msprof-analyze advisor schedule.", - "Free Time": "if you want more detailed advice please use msprof-analyze advisor schedule." - } - time_name_map = { - "Computing Time": "computing", - "Uncovered Communication Time": "communication", - "Free Time": "free", - 'Cube Time(Num)': 'Cube Time', - 'Vector Time(Num)': 'Vector Time', - 'Flash Attention Time(Forward)(Num)': 'Flash Attention Time(Forward)', - 'Flash Attention Time(Backward)(Num)': 'Flash Attention Time(Backward)', - 'Other Time': "Other Computing Time", - 'SDMA Time(Num)': 'SDMA Time' - } - performance_time_dict = { - "Computing Time": ['Cube Time(Num)', 'Vector Time(Num)', 'Flash Attention Time(Forward)(Num)', - 'Flash Attention Time(Backward)(Num)', 'Other Time'], - "Uncovered Communication Time(Wait Time)": [], - "Free Time": ['SDMA Time(Num)'] - } - - def __init__(self, collection_path: str, kwargs: dict): - super().__init__(collection_path) - self.base_collection_path = kwargs.get("base_collection_path", "") - self._has_base_collection = False - self._is_minimal_profiling = False - self.cur_data = {} - self.cur_bottleneck = {} - self.cur_advices = "" - self._headers = [] - self._base_data = [] - self._comparison_data = [] - - @staticmethod - def split_duration_and_num(time_value: str) -> tuple: - split_data = time_value.split("s") # time value example: 0.229s(1756) - duration, num = 0.0, None - if len(split_data) >= 2: - try: - num = int(split_data[1].strip("()")) - except ValueError: - pass - if len(split_data) >= 1: - try: - duration = float(split_data[0]) - except ValueError: - print(f"[WARNING] Invalid time value: {time_value}.") - return duration, num - - @staticmethod - def calculate_ratio(dividend, divisor): - if not divisor: - return float("inf") - return dividend / divisor - - def run(self): - if self.path_check(): - self.process() - self.output() - self.identify_bottleneck() - return self.output_format_data - - def path_check(self): - if self.base_collection_path: - if os.path.exists(self.base_collection_path): - self._has_base_collection = True - else: - print(f"[WARNING] Invalid path which not exists: {self.base_collection_path}.") - return os.path.exists(self.collection_path) - - def process(self): - base_collection_path = self.base_collection_path if self._has_base_collection else self.collection_path - result_data = ComparisonInterface(base_collection_path, self.collection_path).compare(Constant.OVERALL_COMPARE) - for data in result_data.values(): - self._headers = data.get("headers", []) - rows = data.get("rows", []) - if len(rows) == 2: - self._base_data = rows[0] - self._comparison_data = rows[1] - if not self._headers or not self._comparison_data: - return - self._is_minimal_profiling = 'E2E Time(Not minimal profiling)' not in self._headers - if self._has_base_collection: - self.cur_data["comparison_result"] = result_data - time_category_dict = {} - for time_category, time_list in self.performance_time_dict.items(): - time_value = self.get_time_value(time_category, self._comparison_data) - if time_value == Constant.INVALID_VALUE: - continue - duration, _ = 
self.split_duration_and_num(time_value) - time_category = time_category.split("(")[0] - time_category_dict[time_category] = duration - self.get_sub_category_time(time_category, time_list, duration) - self.cur_data["overall_data"] = time_category_dict - - def get_time_value(self, header_name: str, data_list: list): - try: - data_index = self._headers.index(header_name) - except ValueError: - return Constant.INVALID_VALUE - try: - time_value = data_list[data_index] - except IndexError: - return Constant.INVALID_VALUE - return time_value - - def get_sub_category_time(self, category: str, time_list: list, total_duration: float): - sub_time_dict = {} - for time_name in time_list: - time_value = self.get_time_value(time_name, self._comparison_data) - if time_value == Constant.INVALID_VALUE: - continue - sub_time_dict.setdefault(f"{category} Subtype", []).append(self.time_name_map.get(time_name, "")) - duration, num = self.split_duration_and_num(time_value) - sub_time_dict.setdefault(f"Duration(s)", []).append(duration) - sub_time_dict.setdefault(f"Duration Ratio", []).append( - "{:.2%}".format(self.calculate_ratio(duration, total_duration))) - sub_time_dict.setdefault(f"Kernel Number", []).append(num) - self.cur_data[self.time_name_map.get(category)] = sub_time_dict - - def identify_bottleneck(self): - overall_data = self.cur_data.get("overall_data") - if not overall_data: - return - e2e_time = '%.3f' % sum([data for data in overall_data.values()]) - overall_bottleneck = f"The Model E2E Time is {e2e_time}s.\n" - comparison_bottleneck = "" - for time_type, time_value in overall_data.items(): - # add subtype time bottleneck - advice = self.advice_map.get(time_type, "") - self.cur_bottleneck[self.time_name_map.get(time_type)] = f"{time_type} is {time_value}s.\n{advice}" - # add overall bottleneck - overall_bottleneck += f" -- {time_type} is {time_value}s\n" - if time_type == "Free Time" and self._is_minimal_profiling and self.calculate_ratio(time_value, - e2e_time) > 0.1: - overall_bottleneck += "percentage of free time exceed the threshold 10%." - if not self._has_base_collection: - continue - # add comparison bottleneck - time_type_origin = "Uncovered Communication Time(Wait Time)" \ - if time_type == "Uncovered Communication Time" else time_type - base_duration, _ = self.split_duration_and_num(self.get_time_value(time_type_origin, self._base_data)) - if time_value > base_duration: - ratio = "{:.2%}".format(self.calculate_ratio(time_value - base_duration, base_duration)) - comparison_bottleneck += f"{time_type} exceeds the benchmark by {ratio}\n" - self.cur_bottleneck["overall_data"] = overall_bottleneck - self.cur_bottleneck["comparison_result"] = comparison_bottleneck - - def output(self): - self.output_format_data[self.DATA] = self.cur_data - self.output_format_data[self.BOTTLENECK] = self.cur_bottleneck - self.output_format_data[self.ADVICE] = self.cur_advices diff --git a/profiler/advisor/advisor_backend/prof_bean_advisor/__init__.py b/profiler/advisor/advisor_backend/prof_bean_advisor/__init__.py deleted file mode 100644 index 8400fd5ecd1246eaee795cebfccfacc80a94f08c..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/prof_bean_advisor/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
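`split_duration_and_num()` defined earlier in this class parses cells like `0.229s(1756)`: seconds followed by an optional kernel count. The same parse restated compactly, as a standalone check of the format:

```python
def split_duration_and_num(time_value: str):
    duration, num = 0.0, None
    head, _, tail = time_value.partition("s")   # "0.229s(1756)" -> "0.229", "(1756)"
    try:
        duration = float(head)
    except ValueError:
        pass
    if tail.startswith("(") and tail.endswith(")"):
        try:
            num = int(tail[1:-1])
        except ValueError:
            pass
    return duration, num

print(split_duration_and_num("0.229s(1756)"))   # -> (0.229, 1756)
print(split_duration_and_num("0.011s"))         # -> (0.011, None)
```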
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
diff --git a/profiler/advisor/advisor_backend/prof_bean_advisor/cluster_step_trace_time_bean.py b/profiler/advisor/advisor_backend/prof_bean_advisor/cluster_step_trace_time_bean.py
deleted file mode 100644
index b108fc77a3f3408d48c79ce6b542f98427d88b0b..0000000000000000000000000000000000000000
--- a/profiler/advisor/advisor_backend/prof_bean_advisor/cluster_step_trace_time_bean.py
+++ /dev/null
@@ -1,67 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-
-class ClusterStepTraceTimeBean:
-    STEP = "Step"
-    TYPE = "Type"
-    INDEX = "Index"
-    COMPUTING = "Computing"
-    COMMUNICATION = "Communication(Not Overlapped)"
-    FREE = "Free"
-
-    def __init__(self, data: dict):
-        self._data = data
-
-    @property
-    def step(self) -> str:
-        return self._data.get(self.STEP, '')
-
-    @property
-    def type(self) -> str:
-        return self._data.get(self.TYPE, '')
-
-    @property
-    def index(self) -> int:
-        try:
-            return int(self._data.get(self.INDEX))
-        except ValueError as e:
-            msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Index'."
-            raise ValueError(msg) from e
-
-    @property
-    def compute(self) -> float:
-        try:
-            return float(self._data.get(self.COMPUTING, ''))
-        except ValueError as e:
-            msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Computing'."
-            raise ValueError(msg) from e
-
-    @property
-    def communication(self) -> float:
-        try:
-            return float(self._data.get(self.COMMUNICATION, ''))
-        except ValueError as e:
-            msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Communication'."
-            raise ValueError(msg) from e
-
-    @property
-    def free(self) -> float:
-        try:
-            return float(self._data.get(self.FREE, ''))
-        except ValueError as e:
-            msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Free'."
-            raise ValueError(msg) from e
-
diff --git a/profiler/advisor/advisor_backend/timeline_advice/__init__.py b/profiler/advisor/advisor_backend/timeline_advice/__init__.py
deleted file mode 100644
index 8400fd5ecd1246eaee795cebfccfacc80a94f08c..0000000000000000000000000000000000000000
--- a/profiler/advisor/advisor_backend/timeline_advice/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
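`ClusterStepTraceTimeBean` above wraps one row of `cluster_step_trace_time.csv`, converting numeric columns on access and raising a labelled `ValueError` on malformed cells. A usage sketch with a hand-written row (values invented; assumes the class is importable):

```python
row = {
    "Step": "1", "Type": "rank", "Index": "3",
    "Computing": "1250.5", "Communication(Not Overlapped)": "300.2", "Free": "80.1",
}
bean = ClusterStepTraceTimeBean(row)
print(bean.index, bean.compute, bean.communication, bean.free)  # -> 3 1250.5 300.2 80.1
```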
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
diff --git a/profiler/advisor/advisor_backend/timeline_advice/op_schedule_advice.py b/profiler/advisor/advisor_backend/timeline_advice/op_schedule_advice.py
deleted file mode 100644
index 9e492b2156c6faee6c023206f3cfc4f852eeb547..0000000000000000000000000000000000000000
--- a/profiler/advisor/advisor_backend/timeline_advice/op_schedule_advice.py
+++ /dev/null
@@ -1,89 +0,0 @@
-# Copyright (c) 2023, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-from decimal import Decimal
-from common_func_advisor.constant import Constant
-from timeline_advice.timeline_advice_base import TimelineAdviceBase
-
-
-class OpScheduleAdvice(TimelineAdviceBase):
-    def __init__(self, collection_path: str):
-        super().__init__(collection_path)
-        self.cur_data = list()
-        self.cur_bottleneck = str()
-        self.cur_advice = str()
-
-    def run(self):
-        if not self.path_check():
-            return self.output_format_data
-        self.preparse()
-        self.process()
-        self.output()
-        return self.output_format_data
-
-    def process(self):
-        cpt_data = self.preparse_data[self.PREPARSE_TYPE.OVERLAP_CPT]
-        free_data = self.preparse_data[self.PREPARSE_TYPE.OVERLAP_FREE]
-        if not cpt_data or not free_data:
-            print("[ERROR] Fail to find Overlap data.")
-            return
-
-        op_dur = [entry.get("dur", 0) for entry in cpt_data]
-        op_free = [0.0] * len(cpt_data)
-        merge_data = list()
-        merge_data.extend(cpt_data)
-        merge_data.extend(free_data)
-        merge_data.sort(key=lambda x: Decimal(x.get("ts")))
-        idx = free_idx = 0
-        while idx < len(merge_data) and free_idx < len(op_free):
-            entry = merge_data[idx]
-            entry_name = entry.get("name")
-            if entry_name == 'Free':
-                op_free[free_idx] = merge_data[idx].get('dur')
-            elif entry_name == 'Computing':
-                free_idx += 1
-            idx += 1
-        self.cur_data.append(op_dur)
-        self.cur_data.append(op_free)
-        free_ratio, cpt_ratio, _ = self.get_ratio()
-        if free_ratio < 0.2:
-            return
-        self.cur_bottleneck = f"NPU Utilization: {round(cpt_ratio * 100, 2)}%, " \
-                              f"NPU Free Utilization: {round(free_ratio * 100, 2)}%."
-        if len(self.preparse_data[self.PREPARSE_TYPE.SYNCHRONIZE]) > 1:
-            self.cur_advice = f"Device synchronize {len(self.preparse_data[self.PREPARSE_TYPE.SYNCHRONIZE])} times, " \
-                              "try to reduce synchronization statements to alleviate the bottleneck of operator delivery.\n"
-        small_op_num = self.small_op_block(op_free, op_dur)
-        small_op_ratio = small_op_num / len(op_dur) if op_dur else 0.0
-        if small_op_ratio > Constant.SMALL_OP_NUM_RATIO:
-            self.cur_advice += "There are too many small operators, you can increase the batch size appropriately."
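`get_ratio()` just below reduces the Computing/Free/Communication overlap lanes to fractions of total device time, and `process()` above only raises a bottleneck once the free fraction reaches 20%. A quick numeric check of that split (durations invented):

```python
cpt_time, free_time, cmu_time = 700.0, 200.0, 100.0   # us, hypothetical lane sums
total = cpt_time + free_time + cmu_time
free_ratio, cpt_ratio, cmu_ratio = free_time / total, cpt_time / total, cmu_time / total
print(free_ratio, cpt_ratio, cmu_ratio)  # -> 0.2 0.7 0.1  (advice triggers when free >= 0.2)
```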
- - def small_op_block(self, op_frees, op_durs): - small_op_num = 0 - for op_free, op_dur in zip(op_frees, op_durs): - if op_free > op_dur * Constant.SMALL_OP_DUR_RATIO: - small_op_num += 1 - return small_op_num - - def get_ratio(self): - cpt_data = self.preparse_data[self.PREPARSE_TYPE.OVERLAP_CPT] - free_data = self.preparse_data[self.PREPARSE_TYPE.OVERLAP_FREE] - cmu_data = self.preparse_data[self.PREPARSE_TYPE.OVERLAP_CMU] - cpt_time = sum([x.get("dur", 0) for x in cpt_data]) - free_time = sum([x.get("dur", 0) for x in free_data]) - cmu_time = sum([x.get("dur", 0) for x in cmu_data]) - total_time = cpt_time + free_time + cmu_time - if total_time > 0.0: - return (free_time / total_time, cpt_time / total_time, cmu_time / total_time) - return (0.0, 0.0, 0.0) diff --git a/profiler/advisor/advisor_backend/timeline_advice/optimizer_advice.py b/profiler/advisor/advisor_backend/timeline_advice/optimizer_advice.py deleted file mode 100644 index dee2e7ba563d0d00b4459333dffb4099dee9240a..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/timeline_advice/optimizer_advice.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from timeline_advice.timeline_advice_base import TimelineAdviceBase - - -class OptimizerAdvice(TimelineAdviceBase): - OPTIMIZER_MAP = { - "Optimizer.step#SGD.step": "torch_npu.optim.NpuFusedSGD", - "Optimizer.step#Adadelta.step": "torch_npu.optim.NpuFusedAdadelta", - "Optimizer.step#Lamb.step": "torch_npu.optim.NpuFusedLamb", - "Optimizer.step#Adam.step": "torch_npu.optim.NpuFusedAdam", - "Optimizer.step#AdamW.step": "torch_npu.optim.NpuFusedAdamW", - "Optimizer.step#AdamP.step": "torch_npu.optim.NpuFusedAdamP", - "Optimizer.step#BertAdam.step": "torch_npu.optim.NpuFusedBertAdam", - "Optimizer.step#RMSprop.step": "torch_npu.optim.NpuFusedRMSprop", - "Optimizer.step#RMSpropTF.step": "torch_npu.optim.NpuFusedRMSpropTF", - } - - def __init__(self, collection_path: str): - super().__init__(collection_path) - self.cur_data = list() - self.cur_bottleneck = str() - self.cur_advice = str() - - def run(self): - if not self.path_check(): - return self.output_format_data - self.preparse() - self.process() - self.output() - return self.output_format_data - - def process(self): - if not self.preparse_data[self.PREPARSE_TYPE.OPTIMIZER]: - return - - self.cur_data = list(set([entry.get("name", None) for entry in self.preparse_data[self.PREPARSE_TYPE.OPTIMIZER]])) - for index, opt_name in enumerate(self.cur_data): - self.cur_advice += f"You can choose {self.OPTIMIZER_MAP.get(opt_name)} to replace the current Optimizer: {opt_name}." 
- if index != len(self.cur_data) - 1: - self.cur_advice += "\n" - self.cur_bottleneck = self.cur_advice diff --git a/profiler/advisor/advisor_backend/timeline_advice/timeline_advice_base.py b/profiler/advisor/advisor_backend/timeline_advice/timeline_advice_base.py deleted file mode 100644 index 4c7ac96cd22673741accd6bb2abb463566a2e652..0000000000000000000000000000000000000000 --- a/profiler/advisor/advisor_backend/timeline_advice/timeline_advice_base.py +++ /dev/null @@ -1,99 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from abc import abstractmethod -from collections import defaultdict -import json -import os - -from advice_base import AdviceBase -from common_func.file_manager import FileManager - - -class TimelineAdviceBase(AdviceBase): - class PREPARSE_TYPE: - OPTIMIZER = 0 - STEP = 1 - OVERLAP_CPT = 2 - OVERLAP_FREE = 3 - OVERLAP_CMU = 4 - ENQUEUE = 5 - DEQUEUE = 6 - HOST_TO_DEVICE = 7 - SYNCHRONIZE = 8 - - def __init__(self, collection_path: str): - super().__init__(collection_path) - self.trace_view_path = "" - self.has_preparse = False - self.preparse_data = defaultdict(list) - self.entry_map = { - 'Computing': self.PREPARSE_TYPE.OVERLAP_CPT, - 'Free': self.PREPARSE_TYPE.OVERLAP_FREE, - 'AscendCL@aclrtSynchronizeDevice': self.PREPARSE_TYPE.SYNCHRONIZE - } - - def path_check(self): - """ - check whether the input path is valid - """ - if not os.path.exists(self.collection_path): - print("[ERROR] Path: {} does not exist.".format(self.collection_path)) - return False - if os.path.isdir(self.collection_path) and self.collection_path.endswith("ascend_pt"): - self.trace_view_path = os.path.join(self.collection_path, "ASCEND_PROFILER_OUTPUT", "trace_view.json") - if not os.path.exists(self.trace_view_path): - print("[ERROR] trace_view.json does not exist in the path: {}.".format(os.path.join(self.collection_path, "ASCEND_PROFILER_OUTPUT"))) - return False - elif os.path.isfile(self.collection_path) and os.path.basename(self.collection_path) == "trace_view.json": - self.trace_view_path = self.collection_path - else: - print("[ERROR] Please input an ascend_pt directory or a trace_view.json file.") - return False - print("[INFO] Start to analyse the target file: {}".format(self.trace_view_path)) - return True - - @abstractmethod - def run(self): - """ - analyze profiling data and generate advice - """ - - @abstractmethod - def output(self): - """ - output relevant data - """ - self.output_format_data[self.DATA] = self.cur_data - self.output_format_data[self.BOTTLENECK] = self.cur_bottleneck - self.output_format_data[self.ADVICE] = self.cur_advice - - def preparse(self): - if self.has_preparse: - return - json_reader = FileManager.read_json_file(self.trace_view_path) - if not isinstance(json_reader, list): - return - for entry in json_reader: - name = entry.get("name", None) - if not name: - continue - if name.startswith("Optimizer.step#") and name.endswith(".step"): - self.preparse_data[self.PREPARSE_TYPE.OPTIMIZER].append(entry) - elif 
name.startswith("ProfilerStep#"): - self.preparse_data[self.PREPARSE_TYPE.STEP].append(entry) - elif name in self.entry_map: - self.preparse_data[self.entry_map[name]].append(entry) - self.has_preparse = True diff --git a/profiler/advisor/analyzer/__init__.py b/profiler/advisor/analyzer/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/base_analyzer.py b/profiler/advisor/analyzer/base_analyzer.py deleted file mode 100644 index ada1b0bf4f4c8344c8830fe446c8d05dd583eac5..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/base_analyzer.py +++ /dev/null @@ -1,100 +0,0 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import logging -from functools import wraps -from typing import Dict, List, Union -from abc import abstractmethod, ABCMeta - -from profiler.advisor.common import constant -from profiler.advisor.common.version_control import VersionControl -from profiler.advisor.dataset.dataset import Dataset -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.display.html.render import HTMLRender - -logger = logging.getLogger() - - -class BaseAnalyzer(VersionControl, metaclass=ABCMeta): - _SUPPORT_VERSIONS = constant.SUPPORTED_CANN_VERSION - - dataset_cls_list = [] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs): - self.n_processes = n_processes - self.cann_version = kwargs.get("cann_version", constant.DEFAULT_CANN_VERSION) - self.torch_version = kwargs.get("torch_version", constant.DEFAULT_TORCH_VERSION) - self.html_render = HTMLRender() - self.collection_path = collection_path - self.kwargs = kwargs - self.dataset_list: Dict[str, List[Dataset]] = {} - self.init_dataset_list() - self.result = OptimizeResult() - self.record_list: Dict[str, List] = {} - - @classmethod - def check_data(cls, data_list: tuple): - """ - check if all data in data list is contained - :param data_list: data list to check - :return: func ptr if check success - """ - - def decorate(func): - - @wraps(func) - def wrapper(self, **kwargs): - data = self.dataset_list - if data is None: - return None - for data_key in data_list: - if data_key not in data: - return None - - logger.info("Enable analysis %s with %s", self.__class__.__name__, ",".join(data_list)) - return func(self) - - return wrapper - - return decorate - - @abstractmethod - def optimize(self, **kwargs): - pass - - def init_dataset_list(self)->None: - dataset_cls_list = self.dataset_cls_list - if len(dataset_cls_list) == 0: - logger.warning(f"Analyser: %s don't rely on any dataset!", self.__class__.__name__) - return - - for dataset_cls in dataset_cls_list: - if dataset_cls and callable(dataset_cls): - dataset = dataset_cls(collection_path=self.collection_path, data=self.dataset_list, **self.kwargs) - key = dataset_cls.get_key() - if key not in self.dataset_list: - self.dataset_list[key] = [] - 
self.dataset_list[key].append(dataset) - - @staticmethod - def get_first_data_by_key(data, key) -> Union[Dataset, None]: - """ - get the first member from data with key - :param data: input data - :param key: data key - :return: the first dataset in dataset list - """ - if key in data and len(data[key]) > 0: - return data[key][0] - return None diff --git a/profiler/advisor/analyzer/cluster/__init__.py b/profiler/advisor/analyzer/cluster/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/cluster/slow_link_analyser.py b/profiler/advisor/analyzer/cluster/slow_link_analyser.py deleted file mode 100644 index 0b585cbc7c5f136b15cd9eb035ea2dac5caa9e4e..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/cluster/slow_link_analyser.py +++ /dev/null @@ -1,126 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from collections import defaultdict -from typing import Dict, List -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.common import constant -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.dataset.cluster.cluster_dataset import ClusterCommunicationDataset - - -class SlowLinkAnalyzer(BaseAnalyzer): - RDMA_TIME_MS = "RDMA time(ms)" - RDMA_SIZE_MB = "RDMA size(mb)" - SDMA_TIME_MS = "SDMA time(ms)" - SDMA_SIZE_MB = "SDMA size(mb)" - RDMA_BANDWIDTH = "RDMA bandwidth(GB/s)" - SDMA_BANDWIDTH = "SDMA bandwidth(GB/s)" - COMMUNICATION_BANDWIDTH_INFO = "Communication Bandwidth Info" - TRANSIT_TIME = "Transit Time(ms)" - TRANSIT_SIZE = "Transit Size(MB)" - SDMA = "SDMA" - RDMA = "RDMA" - SLOW_LINK_ANALYSIS = "slow_link_analysis" - dataset_cls_list = [ClusterCommunicationDataset] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs): - super().__init__(collection_path, n_processes, **kwargs) - key = ClusterCommunicationDataset.get_key() - self.communication_data_class = self.get_first_data_by_key(self.dataset_list, key) - self.rank_bw_dict = self.communication_data_class.get_data() - self.result = OptimizeResult() - self.bottelneck = '' - self.suggestion = '' - self.format_datas = [] - - def optimize(self, **kwargs): - if self.rank_bw_dict is None: - print("Slow link analysis failed due to data loading failure. \ - Please check your cluster_analysis_output folder. 
\ - If you are not concerned about this type of data, please ignore this message.") - return self.result - self.process() - self.format_datas = self.format_details() - self.make_record() - self.make_render() - return self.result - - def process(self): - if self.rank_bw_dict: - self.produce_bottleneck(self.RDMA_BANDWIDTH) - self.produce_bottleneck(self.SDMA_BANDWIDTH) - - def produce_bottleneck(self, link_type: str): - data_list = [rank_dict.get(link_type, 0) for rank_id, rank_dict in self.rank_bw_dict.items()] - if len(data_list) > 0: - avg_bw = round(sum(data_list) / len(data_list), 3) - else: - print("Slow link analysis cannot identify a bottleneck \ - because bandwidth information is missing from the analysis data.") - return - self.bottelneck += f'{link_type}: \n' \ - f' The average is {avg_bw}GB/s, \n' \ - f' while the maximum is {round(max(data_list), 3)}GB/s \n' \ - f' and the minimum is {round(min(data_list), 3)}GB/s; \n' \ - f' the difference is {round(max(data_list) - min(data_list), 3)}GB/s. \n' - - def format_details(self): - if not self.rank_bw_dict: - return { - "headers": [], - "data": [] - } - - details_dict = {} - headers = list({k for rank_bw_value in self.rank_bw_dict.values() for k in rank_bw_value.keys()}) - headers.sort() - data_list = [[rank_id] + [rank_bw.get(k, 0) for k in headers] for rank_id, rank_bw in self.rank_bw_dict.items()] - data_list.sort(key=lambda x: x[0])  # sort by rank_id - - details_dict["headers"] = ["rank_id"] + headers - details_dict["data"] = data_list - - return details_dict - - def make_record(self): - """ - make record for what and how to optimize - """ - optimization_item = OptimizeItem( - SlowLinkAnalyzer.SLOW_LINK_ANALYSIS, - self.bottelneck, - self.suggestion - ) - self.result.add(OptimizeRecord(optimization_item)) - - for i, data in enumerate(self.format_datas["data"]): - self.result.add_detail(SlowLinkAnalyzer.SLOW_LINK_ANALYSIS, self.format_datas["headers"], data) - - def make_render(self): - result_for_html = { - "Description": self.bottelneck, - "suggestion": self.suggestion, - "details": [self.format_datas] - } - - self.html_render.render_template(key="cluster", - title=SlowLinkAnalyzer.SLOW_LINK_ANALYSIS, - template_dir="templates", - template_name="cluster_analysis.html", - cann_version=self.cann_version, - torch_version=self.torch_version, - result=result_for_html) \ No newline at end of file diff --git a/profiler/advisor/analyzer/cluster/slow_rank_analyser.py b/profiler/advisor/analyzer/cluster/slow_rank_analyser.py deleted file mode 100644 index f439b31f7736ee4777d5ef10bf968738a76ae1b3..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/cluster/slow_rank_analyser.py +++ /dev/null @@ -1,112 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
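Before the slow-rank analyzer below, a standalone sketch of the bandwidth-spread heuristic that `produce_bottleneck` applies above. The dictionary shape is assumed from how `ClusterCommunicationDataset.get_data()` is consumed in this file, and the sample numbers are invented.

```python
# Minimal sketch of the slow-link heuristic (invented sample values).
rank_bw_dict = {
    0: {"SDMA bandwidth(GB/s)": 18.2},
    1: {"SDMA bandwidth(GB/s)": 12.7},
}

def bandwidth_spread(bw_by_rank, link_type="SDMA bandwidth(GB/s)"):
    """Return (avg, max, min, max - min) bandwidth across ranks for one link type."""
    values = [bw.get(link_type, 0) for bw in bw_by_rank.values()]
    avg_bw = round(sum(values) / len(values), 3)
    return (avg_bw, round(max(values), 3), round(min(values), 3),
            round(max(values) - min(values), 3))

# A large max - min gap relative to the average indicates a slow link.
print(bandwidth_spread(rank_bw_dict))
```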
- -from collections import defaultdict -from typing import Dict, List -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.common import constant -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.dataset.cluster.cluster_dataset import ClusterStepTraceTimeDataset - - -class SlowRankAnalyzer(BaseAnalyzer): - SLOW_RANK_ANALYSIS = "slow_rank_analysis" - RANK = "rank" - RATIO_THRESHOLD = 0.05 - BOTTLENECK_LIST = ['Computing', 'Communication', 'Free'] - dataset_cls_list = [ClusterStepTraceTimeDataset] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs): - super().__init__(collection_path, n_processes, **kwargs) - key = ClusterStepTraceTimeDataset.get_key() - self.step_trace_class = self.get_first_data_by_key(self.dataset_list, key) - self.step_trace_dict = self.step_trace_class.get_data() - self.result = OptimizeResult() - self.bottelneck = '' - self.suggestion = '' - self.format_datas = [] - - def optimize(self, **kwargs): - if self.step_trace_dict is None: - print("Slow rank analysis failed due to data loading failure. \ - Please check your cluster_analysis_output folder. \ - If you are not concerned about this type of data, please ignore this message.") - return self.result - self.process() - self.format_datas = self.format_details() - self.make_record() - self.make_render() - return self.result - - def process(self): - total_time_list = [sum(data_tuple) for rank_id, data_tuple in self.step_trace_dict.items()] - if total_time_list: - mean_total_time = sum(total_time_list) / len(total_time_list) - for i in range(len(self.BOTTLENECK_LIST)): - self.produce_bottleneck(self.step_trace_dict, i, mean_total_time) - - def produce_bottleneck(self, step_dict: dict, produce_type: int, mean_total_time: float): - data_list = [data_tuple[produce_type] for rank_id, data_tuple in step_dict.items()] - max_ratio = self.compute_max_gap_ratio(data_list, mean_total_time) - if max_ratio > self.RATIO_THRESHOLD: - self.bottelneck += f'{self.BOTTLENECK_LIST[produce_type]} \n' \ - f' has some issues in the cluster, \n' \ - f' because the max difference of {self.BOTTLENECK_LIST[produce_type]} time \n' \ - f' has reached {round(max_ratio * mean_total_time / 1000, 3)}ms. 
\n' - - def make_record(self): - """ - make record for what and how to optimize - """ - optimization_item = OptimizeItem( - SlowRankAnalyzer.SLOW_RANK_ANALYSIS, - self.bottelneck, - self.suggestion - ) - self.result.add(OptimizeRecord(optimization_item)) - for i, data in enumerate(self.format_datas["data"]): - self.result.add_detail(SlowRankAnalyzer.SLOW_RANK_ANALYSIS, self.format_datas["headers"], data) - - def format_details(self): - details_dict = {} - headers = ["rank_id", "compute(us)", "communication(us)", "free(us)"] - data_list = [] - for key,value in self.step_trace_dict.items(): - data_list.append([key] + value) - details_dict["headers"] = headers - details_dict["data"] = data_list - return details_dict - - def make_render(self): - result_for_html = { - "Description" : self.bottelneck, - "suggestion" : self.suggestion, - "details" : [self.format_datas] - } - - self.html_render.render_template(key="cluster", - title=SlowRankAnalyzer.SLOW_RANK_ANALYSIS, - template_dir="templates", - template_name="cluster_analysis.html", - cann_version=self.cann_version, - torch_version=self.torch_version, - result=result_for_html) - - @staticmethod - def compute_max_gap_ratio(data: list, mean: float): - if mean == 0: - return 0 - else: - return (max(data) - min(data)) / mean diff --git a/profiler/advisor/analyzer/communication/__init__.py b/profiler/advisor/analyzer/communication/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/communication/bandwidth/__init__.py b/profiler/advisor/analyzer/communication/bandwidth/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/communication/environment/__init__.py b/profiler/advisor/analyzer/communication/environment/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/computation/__init__.py b/profiler/advisor/analyzer/computation/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/computation/ai_core_freq/__init__.py b/profiler/advisor/analyzer/computation/ai_core_freq/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/computation/ai_core_freq/ai_core_freq_analyzer.py b/profiler/advisor/analyzer/computation/ai_core_freq/ai_core_freq_analyzer.py deleted file mode 100644 index 4f25deff7c0cdb415ccae6ab748304d4044c5eec..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/ai_core_freq/ai_core_freq_analyzer.py +++ /dev/null @@ -1,36 +0,0 @@ -import logging - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.analyzer.computation.ai_core_freq.ai_core_freq_checker import AICoreFreqChecker -from profiler.advisor.display.html.render import HTMLRender -from profiler.advisor.dataset.ai_core_freq.ai_core_freq_dataset import AICoreFreqDataset -from profiler.advisor.config.config import Config - -logger = logging.getLogger() - - -class AICoreFreqAnalyzer(BaseAnalyzer): - dataset_cls_list = [AICoreFreqDataset] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs) -> 
None: - super().__init__(collection_path, n_processes, **kwargs) - key = AICoreFreqDataset.get_key() - self.dataset = self.get_first_data_by_key(self.dataset_list, key) - self.result = OptimizeResult() - self.html_render = HTMLRender() - self.html = None - - @BaseAnalyzer.check_data((AICoreFreqDataset.get_key(),)) - def optimize(self, **kwargs): - if not Config().get_config("aic_frequency"): - logger.warning("Cannot find AI Core frequency in info.json*, please check data integrity.") - return self.result - add_render_list = kwargs.get("add_render_list", True) - ai_core_freq_checker = AICoreFreqChecker() - ai_core_freq_checker.check_ai_core_freq(self.dataset) - if not ai_core_freq_checker.ai_core_freq_issues: - return self.result - ai_core_freq_checker.make_record(self.result) - self.html = ai_core_freq_checker.make_render(self.html_render, add_render_list) - return self.result diff --git a/profiler/advisor/analyzer/computation/ai_core_freq/ai_core_freq_checker.py b/profiler/advisor/analyzer/computation/ai_core_freq/ai_core_freq_checker.py deleted file mode 100644 index 5ea4dbd7542750469967b05ab9a738f2d70600e4..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/ai_core_freq/ai_core_freq_checker.py +++ /dev/null @@ -1,100 +0,0 @@ -import logging - -from profiler.advisor.dataset.ai_core_freq.ai_core_freq_dataset import AICoreFreqDataset -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.config.config import Config -from profiler.advisor.utils.utils import convert_to_float - -logger = logging.getLogger() - - -class AICoreFreqChecker: - DEFAULT_FREQ = 1800 - DECREASE_FREQ_RATIO = 0.05 - SHOW_TOPK_OPS = 10 - TOTAL_DURATION_INDEX = 2 - DECREASE_FREQ_RATIO_INDEX = 3 - - def __init__(self): - self.ai_core_freq_issues = False - self.desc = "" - self.suggestions = "" - self.decrease_freq_ops = [] - self.headers = [] - self.op_freq = None - self.rank_id = None - self.stage = None - - def check_ai_core_freq(self, event_dataset: AICoreFreqDataset, rank_id=None, stage=None): - """ - :param event_dataset: dataset of timeline event - """ - if not hasattr(event_dataset, "op_freq") or not getattr(event_dataset, "op_freq"): - logger.debug("Skip slow ai core frequency checker, " - "because no AI Core frequency was recorded in trace_view.json") - return - - self.rank_id = rank_id - self.stage = stage - self.op_freq = event_dataset.op_freq - for op_name, op_info in self.op_freq.items(): - freq_list = op_info.get("freq_list", []) - if not freq_list: - continue - - op_count = op_info.get("count", 0) - op_total_duration = round(op_info.get("dur", 0), 2) - max_freq = max(self.DEFAULT_FREQ, convert_to_float(Config().get_config("aic_frequency"))) - - decrease_freq_ratio = sum(max_freq - freq for freq in freq_list) / (max_freq * len(freq_list)) - if decrease_freq_ratio >= self.DECREASE_FREQ_RATIO: - self.ai_core_freq_issues = True - self.decrease_freq_ops.append([op_name, op_count, op_total_duration, - f"{round(decrease_freq_ratio, 4):.2%}", - round(sum(freq_list) / len(freq_list), 2), - max(freq_list), min(freq_list)]) - - if self.decrease_freq_ops: - # sort in descending order by total operator duration and frequency decrease ratio - self.decrease_freq_ops.sort( - key=lambda x: (x[self.TOTAL_DURATION_INDEX], x[self.DECREASE_FREQ_RATIO_INDEX]), - reverse=True) - - self.desc = (f"{len(self.decrease_freq_ops)} operators ran at a reduced AI Core frequency, and the reduction " - f"ratio is larger than {self.DECREASE_FREQ_RATIO}.") - if 
self.rank_id: - self.desc = f"For rank {self.rank_id}, " + self.desc.lower() - self.suggestions = "Please check the temperature or max power of your machine." - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - optimization_item = OptimizeItem("AI Core Frequency", self.desc, [self.suggestions]) - result.add(OptimizeRecord(optimization_item)) - - self.headers = ["Operator name", "Count", "Total duration(us)", "AI CORE frequency decreased ratio", - "Average frequency", "Max frequency", "Min frequency"] - if self.rank_id: - self.headers = ["Rank id"] + self.headers - sub_table_name = "AI Core Frequency" if not self.stage else f"Stage-{self.stage}: AI Core Frequency" - result.add_detail(sub_table_name, headers=self.headers) - - for row in self.decrease_freq_ops: - if self.rank_id: - row = [self.rank_id] + row - result.add_detail(sub_table_name, detail=row) - - def make_render(self, html_render, add_render_list=True): - if self.SHOW_TOPK_OPS: - self.desc += f" Only show {self.SHOW_TOPK_OPS} operators here, see latest mstt_advisor.xlsx for details." - return html_render.render_template(key="computation", - template_dir="templates", - template_name="ai_core_frequency.html", - desc=self.desc, - suggestion=self.suggestions, - headers=self.headers, - data=self.decrease_freq_ops[:self.SHOW_TOPK_OPS], - add_render_list=add_render_list) diff --git a/profiler/advisor/analyzer/computation/aicpu/__init__.py b/profiler/advisor/analyzer/computation/aicpu/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/computation/aicpu/aicpu_checker.py b/profiler/advisor/analyzer/computation/aicpu/aicpu_checker.py deleted file mode 100644 index 0caede4b894e0dda15333e6d3a480fa943c66323..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/aicpu/aicpu_checker.py +++ /dev/null @@ -1,278 +0,0 @@ -import copy -import os -from functools import partial -from typing import List, Dict, Optional - -from profiler.advisor.analyzer.computation.operator_checker import OperatorChecker, logger -from profiler.advisor.analyzer.schedule.fusion_ops.timeline_api_stack_checker import OpStackFinder -from profiler.advisor.common import constant -from profiler.advisor.dataset.dataset import Dataset -from profiler.advisor.dataset.profiling.profiling_dataset import ProfilingDataset -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.cluster_analyse.common_func.file_manager import FileManager - - -class AicpuChecker(OperatorChecker): - _CHECKER = "aicpu operator" - _PROBLEM = "AICPU operator" - _MIN_TASK_DURATION = 20 - _description = f"Some operators and task duration exceed {_MIN_TASK_DURATION} us, such as :\n" - _SUGGESTION: List[str] = ["Modify code to avoid aicpu operator"] - STACK_INFO_ITEMS = "stack_info" - SUGGESTION_INFO_ITEMS = "suggestions" - _ITEMS = [ - "op_name", "op_type", "task_duration", "input_shapes", "input_data_types", "input_formats", "output_shapes", - "output_data_types", "output_formats" - ] - - def __init__(self, cann_version): - super(AicpuChecker, self).__init__(cann_version=cann_version) - self.aicpu_rules: Dict = {} - self.aicpu_checker: Dict = {} - self.load_aicpu_rules() - - def _check_data(self, profiling_data: ProfilingDataset) -> bool: - if not self._check_summary(profiling_data): - return False - return True - - def _check_operator(self, op_info) -> bool: - return 
op_info.task_type == constant.AI_CPU - - def load_aicpu_rules(self, rule_path="rules/aicpu_rules.yaml") -> Dict: - if not os.path.isabs(rule_path): - rule_path = os.path.join(os.path.dirname(__file__), - "../../../", rule_path) - - if not os.path.exists(rule_path): - logger.warning("Skip aicpu issue analysis, because %s does not exist.", rule_path) - return {} - - self.aicpu_rules = FileManager.read_yaml_file(rule_path) - self.filter_aicpu_rules(self.aicpu_rules) - for checker_name, check_rule in self.aicpu_rules.items(): - if not isinstance(check_rule, (list, dict,)): - continue - - if checker_name not in AICPU_CHECKER.keys(): - logger.warning("Skip %s, which is not supported yet.", checker_name) - continue - - self.aicpu_checker[checker_name] = AICPU_CHECKER[checker_name](check_rule) - - def filter_aicpu_rules(self, aicpu_rules): - support_checkers = [] - for checkers in aicpu_rules['CommonChecker']: - for key, value in checkers.items(): - if key == 'DataTypeChecker' and self.cann_version in value['cann_version']: - support_checkers.append(checkers) - aicpu_rules['CommonChecker'] = support_checkers - return - - def check_aicpu_attr(self, op_info) -> List[str]: - suggestions = [] - for _, checker in self.aicpu_checker.items(): - suggestions.extend(checker.check(op_info)) - return suggestions - - def check(self, profiling_data: ProfilingDataset) -> bool: - """ - check whether any operator needs to be optimized - :param profiling_data: profiling dataset - :return: true or false - """ - - if not self._check_data(profiling_data): - return False - op_summary = profiling_data.op_summary - - def get_operator_stack_info(api_stack_finder: OpStackFinder, op_name_list: list) -> list: - data: Dict[str, Dataset] = {} - event_dataset = TimelineEventDataset(collection_path=profiling_data.collection_path, data=data, task_type=constant.AI_CPU) - - # disable multiprocessing to avoid the overhead of spawning a new process for a light task - api_stack_finder.get_api_stack_by_op(event_dataset, op_name_list, constant.AI_CPU, - disable_multiprocess=True) - return api_stack_finder._stack_record - - self._op_list = [] - total_task_duration = 0.0 - max_task_duration = 0.0 - for op_info in op_summary.op_list: - if self._check_operator(op_info): - self._op_list.append(op_info) - - task_duration = float(op_info.task_duration) - total_task_duration += task_duration - max_task_duration = max(max_task_duration, task_duration) - if (not self._op_list) or (max_task_duration < self._MIN_TASK_DURATION): - return False - - # collect stack information for all operators - op_name_list = [] - for op in self._op_list: - if op.op_name not in op_name_list: - op_name_list.append(op.op_name) - api_stack_finder = OpStackFinder() - stack_record = get_operator_stack_info(api_stack_finder, op_name_list) - - # map task_id to stack information - self._op_list.sort(key=lambda x: int(x.task_id)) - stack_record.sort(key=lambda x: x[0]) - task_id_to_stack = dict() - for stack in stack_record: - task_id_to_stack[stack[0]] = stack[-1] - - # attach stack attributes to each operator - for op in self._op_list: - stack = task_id_to_stack.get(int(op.task_id)) - op.add_attr(self.STACK_INFO_ITEMS, stack) - suggestions = self.check_aicpu_attr(op) - op.add_attr(self.SUGGESTION_INFO_ITEMS, suggestions) - - # detect operators with DOUBLE-type inputs - double_type_ai_cpu_operator = [] - for op in self._op_list: - if not op.has_attr("input_data_types"): - logger.warning( - "Skip input data check in AICPU checker because op summary does not contain input_data_types") - break - if op.has_attr( - "input_data_types") and "DOUBLE" in op.input_data_types and op.op_name not in 
double_type_ai_cpu_operator: - double_type_ai_cpu_operator.append(op.op_name) - if bool(double_type_ai_cpu_operator): - self._SUGGESTION.append("Try to convert DOUBLE type operators to FLOAT, such as {}".format( - ",".join(double_type_ai_cpu_operator))) - return True - - def make_render(self, html_render, record): - html_render.render_template(key="computation", - template_dir="templates", - template_name="operator_ai_cpu.html", - format_result=self.format_operator_result(record, constant.OPERATOR_LIST_UNLIMIT)) - - def format_operator_result(self, record, limit): - """ - Format operator result to html - :param record: profiling check record - :param limit: Limit number of operator statistics lists. - :return: - """ - optimization_item = record.optimization_item - release_suggestion_list = [] - for suggestion in optimization_item.suggestion: - release_suggestion_list.append(suggestion.replace('\n', '<br>')) - logger.debug("suggestion list is %s", release_suggestion_list) - format_result = {"record": record.__dict__, "suggestion": '<br>'.join(release_suggestion_list), - "task_duration": round(record.statistics_item.task_duration, 2)} - - statistic = self.group_by(copy.deepcopy(self._op_list), op_key='op_type', - limit=limit) - format_result["statistic"] = statistic - stack_key_list = ["stack_info", "input_data_types", "output_data_types"] - if statistic: - for key, info in statistic: - op_info_list = self.group_by_list(info.get("op_info_list"), stack_key_list, limit) - info["op_info_list"] = op_info_list - return format_result - - def group_by_list(self, op_list, op_key_list: List = ["stack_info", "input_data_types", "output_data_types"], - limit: int = constant.OPERATOR_LIST_UNLIMIT): - if op_list is None: - op_list = [] - - # merge the attributes in op_key_list into a single combined attribute and use it as the group-by key - op_key = '+'.join(op_key_list) # str, json - for op_info in op_list: - attribute = "" - for _op in op_key_list: - if op_info.get_attr(_op): - attribute += op_info.get_attr(_op) - op_info.add_attr(op_key, attribute) - - return self.group_by(op_list, op_key=op_key, limit=limit) - - -class BaseChecker: - def __init__(self, *args, **kwargs): - self.checker_list = [] - - def build(self): - raise NotImplementedError - - def check(self, op_info) -> List[str]: - suggestions = [] - for checker in self.checker_list: - suggestion = checker(op_info) - if suggestion is not None: - suggestions.append(suggestion) - return suggestions - - -class CommonChecker(BaseChecker): - def __init__(self, check_rules: List[Dict] = None): - super(CommonChecker, self).__init__() - self.check_rules = check_rules if check_rules is not None else [] - self.supported_checker = dict(DataTypeChecker=self.datatype_checker) - self.build() - - @staticmethod - def datatype_checker(check_item: Dict, op_info) -> Optional[str]: - supported_op_type = check_item.get('op_type', []) - suggestion = check_item.get('suggestion', "") - valid_inputs = check_item.get('input', []) - valid_outputs = check_item.get('output', []) - ignore_type = check_item.get('ignore_type', []) - op_type = getattr(op_info, 'op_type', "UNKNOWN") - if "__ALL__" in supported_op_type or \ - op_type.lower() in supported_op_type: - if op_type.lower() in ignore_type: - return None - - op_input_dtype = getattr(op_info, 'input_data_types', "").split(";") - op_input_dtype = [item.lower() for item in op_input_dtype] - op_output_dtype = getattr(op_info, 'output_data_types', "").split(";") - op_output_dtype = [item.lower() for item in op_output_dtype] - input_dtype_diff = set(op_input_dtype).difference(set(valid_inputs)) - output_dtype_diff = set(op_output_dtype).difference(set(valid_outputs)) - unsupported_dtype_diff = input_dtype_diff.union(output_dtype_diff) - if not unsupported_dtype_diff: - return None - - return suggestion.format(",".join(unsupported_dtype_diff).upper(), - op_type, - ",".join(valid_inputs).upper()) - - def build(self): - for check in self.check_rules: - (check_func, check_rule), = check.items() - if check_func not in self.supported_checker: - logger.warning("Skip %s, which has not been implemented.", check_func) - continue - self.checker_list.append(partial(self.supported_checker.get(check_func), check_rule)) - - -class ExampleGuideChecker(BaseChecker): - def __init__(self, check_rules: List[Dict] = None): - super(ExampleGuideChecker, self).__init__() - self.check_rules = check_rules if check_rules is not None else [] - self.build() - - def build(self): - def _guide_url(check_item: Dict, op_info) -> Optional[str]: - supported_op_type = check_item.get('op_type', []) - url = check_item.get('url', "") - suggestion 
= check_item.get('suggestion', "") - - if getattr(op_info, 'op_type', "UNKNOWN").lower() in supported_op_type: - return suggestion if "{}" not in suggestion else suggestion.format(url) - - for check in self.check_rules: - (_, check_rule), = check.items() - self.checker_list.append(partial(_guide_url, check_rule)) - - -AICPU_CHECKER = { - "CommonChecker": CommonChecker, - "ExampleGuideChecker": ExampleGuideChecker -} diff --git a/profiler/advisor/analyzer/computation/bound/__init__.py b/profiler/advisor/analyzer/computation/bound/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/computation/bound/block_dim_checker.py b/profiler/advisor/analyzer/computation/bound/block_dim_checker.py deleted file mode 100644 index 7a873c65635fcc8f2ebb35c8d317de09d78da491..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/bound/block_dim_checker.py +++ /dev/null @@ -1,74 +0,0 @@ -import logging -from typing import List - -from profiler.advisor.analyzer.computation.operator_checker import OperatorChecker -from profiler.advisor.common import constant -from profiler.advisor.config.config import Config -from profiler.advisor.dataset.profiling.profiling_dataset import ProfilingDataset - -logger = logging.getLogger() - - -class BlockDimChecker(OperatorChecker): - _SUGGESTION: List[str] = [] - _CHECKER = "block dim" - _PROBLEM = "block dim" - _description = "some operator does not make full use of {} ai core" - _ITEMS = [ - "op_name", "op_type", "task_type", "task_duration", "income", "block_dim", "mix_block_dim", "input_shapes", - "input_data_types", "input_formats", "output_shapes", "output_data_types", "output_formats" - ] - - def pre_check(self, profiling_data) -> bool: - return not self.is_dynamic_shape(profiling_data) - - def _check_data(self, data): - self.format_suggestion_content(data) - if not self._check_summary(data): - return False - if not Config().get_config("ai_core_num"): - logger.warning(self.SKIP_CHECK_MSG, self._CHECKER, "ai core num in info.json file") - return False - summary = data.op_summary - op_info = summary.op_list[0] - if not hasattr(op_info, "block_dim"): - logger.warning(self.SKIP_CHECK_MSG, self._CHECKER, "block dim in op summary") - return False - if Config().get_config("ai_core_num"): - self._aicore_num = int(Config().get_config("ai_core_num")) - if Config().get_config("aiv_num"): - self._aiv_num = int(Config().get_config("aiv_num")) - self._description = self._description.format(self._aicore_num) - if self._aiv_num: - self._description += f" or {self._aiv_num} ai vector core" - self._description += f";\n Top-{OperatorChecker._MAX_TUNE_OP_NUM} operator of " \ - "task duration are as follows:\n" - return True - - def make_render(self, html_render, record): - html_render.render_template(key="computation", - template_dir="templates", - template_name="operator_block_dim.html", - format_result=self.format_operator_result(record, constant.OPERATOR_OUT_TOPK)) - - def _check_operator(self, op_info) -> bool: - if op_info.task_type not in ["AI_CORE", "AI_VECTOR_CORE", "MIX_AIC"]: - return False - block_dim = int(op_info.block_dim) - core_num = self.get_core_num(op_info) - if block_dim % core_num == 0: - return False - if op_info.task_type == "MIX_AIC" and hasattr(op_info, "mix_block_dim") \ - and self._aiv_num and int(op_info.mix_block_dim) % self._aiv_num == 0: - return False - return True - - def get_core_num(self, op_info): - """ - get core 
num of task type - """ - if op_info.task_type == "AI_CORE" or not self._aiv_num: - core_num = self._aicore_num - else: - core_num = self._aiv_num - return core_num diff --git a/profiler/advisor/analyzer/computation/bound/operator_bound_checker.py b/profiler/advisor/analyzer/computation/bound/operator_bound_checker.py deleted file mode 100644 index a22b380f974b14207d6d7be262cd49f0ba0fbe99..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/bound/operator_bound_checker.py +++ /dev/null @@ -1,53 +0,0 @@ -import logging -from typing import List - -from profiler.advisor.analyzer.computation.operator_checker import OperatorChecker -from profiler.advisor.common import constant -from profiler.advisor.config.config import Config -from profiler.advisor.dataset.profiling.profiling_dataset import ProfilingDataset -from profiler.advisor.utils.utils import to_percent - -logger = logging.getLogger() - - -class OperatorBoundChecker(OperatorChecker): - _MIN_TASK_DURATION = 20 # min task duration 20us - _CHECKER = "operator no bound" - _PROBLEM = "operator no bound" - _SUGGESTION: List[str] = [] - _description = ( - f"There is no mte, cube, vector, scalar ratio is more than {to_percent(Config().operator_bound_ratio)};\n" + - f"Top task duration operators need to be tuned are as follows: \n") - _ITEMS = [ - "op_name", "op_type", "task_type", "task_duration", "vec_ratio", "mac_ratio", "scalar_ratio", "mte1_ratio", - "mte2_ratio", "mte3_ratio", "block_dim", "input_shapes", "input_data_types", "input_formats", "output_shapes", - "output_data_types", "output_formats" - ] - - def pre_check(self, profiling_data) -> bool: - return not self.is_dynamic_shape(profiling_data) - - def _check_data(self, data): - self.format_suggestion_content(data) - if not self._check_summary(data): - return False - for op_info in data.op_summary.op_list: - return self._check_operator(op_info) - - logger.warning(self.SKIP_CHECK_MSG, self._CHECKER, "ratio in op summary") - return False - - def _check_operator(self, op_info) -> bool: - bound_list = ["vec_ratio", "mac_ratio", "scalar_ratio", "mte1_ratio", "mte2_ratio", "mte3_ratio"] - ratio_list = [self.get_ratio(op_info, attr) for attr in bound_list] - if not any(ratio_list): - return False # no data, skip check - if any(ratio and ratio > Config().operator_bound_ratio for ratio in ratio_list): - return False - return True - - def make_render(self, html_render, record): - html_render.render_template(key="computation", - template_dir="templates", - template_name="operator_no_bound.html", - format_result=self.format_operator_result(record, constant.OPERATOR_OUT_TOPK)) diff --git a/profiler/advisor/analyzer/computation/op_compile/__init__.py b/profiler/advisor/analyzer/computation/op_compile/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/computation/op_compile/dynamic_shape_checker.py b/profiler/advisor/analyzer/computation/op_compile/dynamic_shape_checker.py deleted file mode 100644 index 86d3bac4ff8cb163d23a6365307b855839b12a6a..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/op_compile/dynamic_shape_checker.py +++ /dev/null @@ -1,65 +0,0 @@ -import copy -import logging -from typing import List - -from profiler.advisor.analyzer.computation.operator_checker import OperatorChecker -from profiler.advisor.common import constant -from profiler.advisor.dataset.profiling.info_collection import OpInfo -from 
profiler.advisor.result.item import OptimizeItem, StatisticsItem, OptimizeRecord - -logger = logging.getLogger() - - -class DynamicShapeChecker(OperatorChecker): - ENABLE_COMPILED_SUGGESTION = "Optimize by enabling compiled operator, such as:\n" \ - "`torch_npu.npu.set_compile_mode(jit_compile=False)`\n" - _SUGGESTION: List[str] = [ENABLE_COMPILED_SUGGESTION] - _CHECKER = "dynamic shape operator" - _PROBLEM = "Dynamic shape operator" - _description = f"Found all operators are dynamic shape" - _op_list: List[OpInfo] = [] - _tune_op_list: List[str] = [] # record op name to be tuned, and save to tune_ops_file.cfg - _op_views: List = [] - - def __init__(self, cann_version) -> None: - super().__init__(cann_version=cann_version) - - def check(self, profiling_database) -> bool: - return self.is_dynamic_shape(profiling_database) - - def make_record(self, profiling_database) -> OptimizeRecord: - """ - make record for what and how to optimize - """ - - optimization_item = OptimizeItem( - self._PROBLEM, - self._description, - self._SUGGESTION - ) - statistics_item = StatisticsItem("", "", 1) - return OptimizeRecord(optimization_item, statistics_item) - - def format_operator_result(self, record, limit=-1): - """ - Format operator result to html - :param record: profiling check record - :param limit: Limit number of operator statistics lists. - :return: - """ - optimization_item = record.optimization_item - release_suggestion_list = [] - for suggestion in optimization_item.suggestion: - release_suggestion = copy.deepcopy(suggestion) - if release_suggestion == DynamicShapeChecker.ENABLE_COMPILED_SUGGESTION: - release_suggestion += \ - f"for details please refer to link : LINK" - release_suggestion_list.append(release_suggestion.replace('\n', '
')) - format_result = {"record": record.__dict__, "suggestion": '
'.join(release_suggestion_list)} - return format_result - - def make_render(self, html_render, record): - html_render.render_template(key="computation", - template_dir="templates", - template_name="operator_dynamic_shape.html", - format_result=self.format_operator_result(record)) diff --git a/profiler/advisor/analyzer/computation/operator_checker.py b/profiler/advisor/analyzer/computation/operator_checker.py deleted file mode 100644 index 64618b56a8df7f380277e99ae7ca47cd69d24648..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/operator_checker.py +++ /dev/null @@ -1,307 +0,0 @@ -import copy -import logging -from textwrap import fill -from typing import List - -from profiler.advisor.common import constant -from profiler.advisor.common.version_control import VersionControl -from profiler.advisor.config.config import Config -from profiler.advisor.dataset.profiling.info_collection import OpInfo -from profiler.advisor.dataset.profiling.profiling_dataset import ProfilingDataset -from profiler.advisor.result.item import OptimizeItem, StatisticsItem, OptimizeRecord -from profiler.advisor.utils.utils import safe_division - -logger = logging.getLogger() - - -class OperatorChecker(VersionControl): - _SUPPORT_VERSIONS = constant.SUPPORTED_CANN_VERSION - _MAX_TUNE_OP_NUM = constant.OPERATOR_OUT_TOPK - _MIN_TASK_DURATION = 0 - _MIN_TASK_DURATION_RATIO = 1.0 - _MIN_TOTAL_DURATION_RATIO = 1.0 - _CHECKER = str() - _PROBLEM = str() - _description = str() - STACK_INFO_ITEMS = "" - _ITEMS: List[str] = [] - _SUGGESTION: List[str] = [] - SKIP_CHECK_MSG = "Skip %s checker because of not containing %s" - _tune_op_info_list: List[OpInfo] = [] - PyTorch_OPERATOR_TUNE_SUGGESTION = f"Optimize operator by AOE, such as:\n" \ - f"'aoe --job_type=2 --model_path=$user_dump_path " \ - f"--tune_ops_file={Config().tune_ops_file}'\n" - MSLite_OPERATOR_TUNE_SUGGESTION = f"Optimize operator by AOE in mindspore lite framework, such as:\n" \ - f"converter_lite --fmk=ONNX --optimize=ascend_oriented --saveType=MINDIR " \ - f"--modelFile=$user_model.onnx --outputFile=user_model --configFile=./config.txt\n" - _tune_op_list: List[str] = [] - - def __init__(self, cann_version: str): - self.cann_version = cann_version - self._op_list: List[OpInfo] = [] - - def check(self, profiling_data: ProfilingDataset) -> bool: - """ - check if any operator need optimize - :param profiling_data: profiling datasest - :return: true or false - """ - if not self._check_data(profiling_data): - return False - - summary = profiling_data.op_summary - total_task_duration = 0.0 - max_task_duration = 0.0 - for op_info in summary.op_list: - if not self._check_operator(op_info): - continue - task_duration = float(op_info.task_duration) - total_task_duration += task_duration - max_task_duration = max(max_task_duration, task_duration) - self._op_list.append(op_info) - if task_duration > self._MIN_TASK_DURATION: - self._tune_op_info_list.append(op_info) - - if any([ - max_task_duration > self._MIN_TASK_DURATION, - round(safe_division(max_task_duration, summary.get_total_task_duration()), - 4) > self._MIN_TASK_DURATION_RATIO, - round(safe_division(total_task_duration, summary.get_total_task_duration()), 4) > - self._MIN_TOTAL_DURATION_RATIO, - ]): - self._op_list.sort(key=lambda x: float(x.get_attr("task_duration")), reverse=True) - self._tune_op_info_list.sort(key=lambda x: float(x.get_attr("task_duration")), reverse=True) - for op in self._op_list: - if op.op_name not in self._tune_op_list and len(self._tune_op_list) < 
constant.OPERATOR_OUT_TOPK: - self._tune_op_list.append(op.op_name) - return True - return False - - def make_record(self, profiling_data: ProfilingDataset): - """ - Make record for what and how to optimize - :param profiling_data: profiling data - :return: optimize record - """ - task_duration_list = [float(op_info.get_attr("task_duration")) for op_info in self._op_list if - hasattr(op_info, "get_attr")] - total_cost_time = sum(task_duration_list) - total_task_duration = profiling_data.op_summary.get_total_task_duration() - count = len(task_duration_list) - statistics_item = StatisticsItem(total_task_duration, total_cost_time, count, self.get_incomes()) - optimization_item = OptimizeItem( - self._PROBLEM, - self._get_description(self._description, self.get_op_type_list(self._op_list)[:self._MAX_TUNE_OP_NUM]), - self._SUGGESTION - ) - return OptimizeRecord(optimization_item, statistics_item) - - def _get_description(self, description, op_type_list=None): - if not op_type_list: - return description - - desc_suffix = [] - for i in range(len(op_type_list)): - if i % 3 == 0 and i != 0: - desc_suffix.append("\n") - - desc_suffix.append(f"{op_type_list[i]}") - - if i < len(op_type_list) - 1: - desc_suffix.append(", ") - - description += "".join(desc_suffix) - return description - - def pre_check(self, profiling_data) -> bool: - return True - - def is_dynamic_shape(self, profiling_database: ProfilingDataset) -> bool: - less_than_cann800_list = [constant.CANN_VERSION_C30, constant.CANN_VERSION_C13, constant.CANN_VERSION_C15] - # before CANN 8.0.RC1, the op_state attribute comes from ge_info and drives the dynamic shape check - if self.cann_version in less_than_cann800_list: - if hasattr(profiling_database, "ge_info"): - ge_info = profiling_database.ge_info - static_shape_operators = ge_info.get_static_shape_operators() - if len(static_shape_operators) == 0: - return True - else: - logger.warning( - "Skip dynamic shape check because the host file folder does not contain a ge_info.db file.\n" - "To enable dynamic shape check, please try to set data_simplification=False in experimental_config.\n" - "For more details please refer to this link: %s", constant.ASCEND_PROFILER_URL) - else: - # since CANN 8.0.RC1, the op_state attribute comes from the op_summary file - if hasattr(profiling_database, "op_summary"): - static_shape_operators = profiling_database.op_summary.get_static_shape_operators() - if len(static_shape_operators) == 0: - return True - else: - logger.warning( - "Skip dynamic shape check because the current file folder does not contain an op_summary.csv file." - ) - return False - - def format_operator_result(self, record, limit): - """ - Format operator result to html - :param record: profiling check record - :param limit: Limit number of operator statistics lists. - :return: - """ - optimization_item = record.optimization_item - release_suggestion_list = [] - for suggestion in optimization_item.suggestion: - release_suggestion = copy.deepcopy(suggestion) - if release_suggestion == OperatorChecker.PyTorch_OPERATOR_TUNE_SUGGESTION: - release_suggestion += \ - (f"for details please refer to link : LINK") - elif release_suggestion == OperatorChecker.MSLite_OPERATOR_TUNE_SUGGESTION: - release_suggestion += \ - (f"\nThe config file for MSLite AOE usage is as follows:\n" \ - f"[ascend_context]\n" \ - f"aoe_mode=\"operator tuning\"\n" \ - f"--tune_ops_file={Config().tune_ops_file}\n" - f"\nFor details please refer to link : LINK") - release_suggestion_list.append(release_suggestion.replace('\n', '
')) - format_result = {"record": record.__dict__, - "suggestion": fill('
'.join(release_suggestion_list), width=200), - "task_duration": round(record.statistics_item.task_duration, 2)} - statistic = self.group_by(copy.deepcopy(self._op_list), limit=limit) - format_result["statistic"] = statistic - return format_result - - def group_by(self, op_list, op_key="op_type", - limit: int = constant.OPERATOR_LIST_UNLIMIT): - """ - group by Profiling.OpInfo's attribute key, then return top limit tuple by duration - :param op_list: input a OpInfo list - :param op_key: group by Profiling.OpInfo's attribute key - :param limit: top limit num, if you do not need to limit the length of tuple, input -1(int) - :return: - """ - if op_list is None: - op_list = [] - statistic = {} # str, json - for op_info in op_list: - if statistic.get(op_info.get_attr(op_key)): - statistic[op_info.get_attr(op_key)]["summary"]["total_duration"] = float( - statistic[op_info.get_attr(op_key)]["summary"]["total_duration"]) + float( - op_info.get_attr("task_duration", constant.DEFAULT_DURATION_ZERO)) - statistic[op_info.get_attr(op_key)]["summary"]["counts"] += 1 - stack_info = op_info.get_attr("stack_info") - if stack_info: - op_info.stack_info = stack_info.replace('\r\n', '
<br>') - statistic[op_info.get_attr(op_key)]["op_info_list"].append(op_info) - else: - statistic[op_info.get_attr(op_key)] = {"summary": {}, "op_info_list": []} - statistic[op_info.get_attr(op_key)]["summary"]["op_type"] = op_info.get_attr( - "op_type", constant.DEFAULT_OPERATOR_TYPE) - statistic[op_info.get_attr(op_key)]["summary"]["total_duration"] = float( - op_info.get_attr("task_duration", constant.DEFAULT_DURATION_ZERO)) - statistic[op_info.get_attr(op_key)]["summary"]["counts"] = 1 - stack_info = op_info.get_attr("stack_info") - if stack_info: - op_info.stack_info = stack_info.replace('\r\n', '<br>
') - statistic[op_info.get_attr(op_key)]["op_info_list"] = [op_info] - - if statistic: - for op_key in statistic.keys(): - statistic[op_key]["summary"]["total_duration"] = round( - statistic[op_key]["summary"]["total_duration"], 2) - # Grouped by op_type, sorted by total_duration, and obtained the top 10 operators that take the most time. - if limit > 0: - statistic = sorted( - statistic.items(), key=lambda kv: kv[1]["summary"]["total_duration"], reverse=True)[:limit] - else: - statistic = sorted(statistic.items(), key=lambda kv: kv[1]["summary"]["total_duration"], reverse=True) - else: - logger.warning("%s checker do not has results to format html", str(self.__class__.__name__)) - return statistic - - def _check_data(self, profiling_data): - return True - - def _check_operator(self, op_info): - return False - - def _get_income(self, _op_info: OpInfo) -> float: - return 0 - - def get_tune_op_list(self): - """ - get tune op list - :return: tune op list - """ - return self._tune_op_list - - def get_views(self, _graph_data): - """Get node views.""" - return [] - - @classmethod - def get_name(cls): - """ - get name of checker - :return: checker name - """ - return cls._PROBLEM - - def get_incomes(self) -> float: - """get incomes""" - incomes = 0.0 - for op_info in self._op_list: - income = self._get_income(op_info) - setattr(op_info, "income", round(income, 2)) - incomes += income - return incomes - - def get_op_type_list(self, op_list: List[OpInfo]): - """get op type list""" - op_type_list = [] - for op_info in op_list: - if op_info.op_type not in op_type_list: - op_type_list.append(op_info.op_type) - return op_type_list - - def _check_summary(self, data: ProfilingDataset): - if not hasattr(data, "op_summary"): - logger.warning(self.SKIP_CHECK_MSG, self._CHECKER, "op summary") - return False - return True - - @staticmethod - def get_ratio(op_info: OpInfo, attr: str) -> float: - if not op_info.has_attr(attr): - return 0 - value = op_info.get_attr(attr) - if not value or value == "N/A": - return 0 - return float(value) - - def get_details(self) -> list: - """ - get details of operator to be optimized - :return: detail list - """ - op_list = self._op_list - if not op_list or not (self._ITEMS + [self.STACK_INFO_ITEMS]): - return [] - details = [] - attrs = [attr for attr in (self._ITEMS + [self.STACK_INFO_ITEMS]) if op_list[0].has_attr(attr)] - details.append(attrs) - op_list = sorted(op_list, key=lambda x: float(x.get_attr("task_duration")), reverse=True) - for op_info in op_list: - content = [ - op_info.get_attr(attr) if attr != "aicore_time" - else op_info.get_float_attr(attr, strict_mode=True) + - op_info.get_float_attr("aiv_time", strict_mode=True) for attr in attrs - ] - details.append(content) - return details - - def format_suggestion_content(self, profiling_data: ProfilingDataset) -> None: - if profiling_data.PROF_TYPE == constant.ASCEND_PYTORCH_PROFILER: - self._SUGGESTION.append(self.PyTorch_OPERATOR_TUNE_SUGGESTION) - elif profiling_data.PROF_TYPE == constant.MSLITE: - self._SUGGESTION.append(self.MSLite_OPERATOR_TUNE_SUGGESTION) diff --git a/profiler/advisor/analyzer/computation/profiling_analyzer.py b/profiler/advisor/analyzer/computation/profiling_analyzer.py deleted file mode 100644 index 2021bcd5765d1df7489f202b3453a83924fb28dc..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/computation/profiling_analyzer.py +++ /dev/null @@ -1,86 +0,0 @@ -import logging -from abc import ABC - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from 
profiler.advisor.result.result import OptimizeResult -from profiler.advisor.analyzer.computation.aicpu.aicpu_checker import AicpuChecker -from profiler.advisor.analyzer.computation.bound.block_dim_checker import BlockDimChecker -from profiler.advisor.analyzer.computation.bound.operator_bound_checker import OperatorBoundChecker -from profiler.advisor.analyzer.computation.op_compile.dynamic_shape_checker import DynamicShapeChecker -from profiler.advisor.analyzer.computation.operator_checker import OperatorChecker -from profiler.advisor.display.html.render import HTMLRender -from profiler.advisor.dataset.profiling.profiling_dataset import ProfilingDataset - -logger = logging.getLogger() - - -class ProfilingAnalyzer(BaseAnalyzer, ABC): - dataset_cls_list = [ProfilingDataset] - - def __init__(self, collection_path, **kwargs) -> None: - super().__init__(collection_path, **kwargs) - self.checker = OperatorChecker(self.cann_version) - self.html_render = HTMLRender() - self.result = OptimizeResult() - - @BaseAnalyzer.check_data((ProfilingDataset.get_key(),)) - def optimize(self, **kwargs) -> OptimizeResult: - """ - optimize operator - :param data: input datasets - :return: result - """ - profiling_data = self.get_first_data_by_key(self.dataset_list, ProfilingDataset.get_key()) - checker = self.checker - if not checker.pre_check(profiling_data): - return self.result - if checker.check(profiling_data): - # add record - record = checker.make_record(profiling_data) - checker.make_render(self.html_render, record) - self.result.add(record) - # add details - details = checker.get_details() - if details: - for i, detail in enumerate(details): - if i == 0: - # the first row is header - self.result.add_detail(checker.get_name(), headers=detail) - else: - self.result.add_detail(checker.get_name(), detail=detail) - # add tune op list - tune_op_list = checker.get_tune_op_list() - if tune_op_list: - self.result.add_tune_op_list(tune_op_list) - - return self.result - - def make_record(self): - pass - - def make_render(self): - pass - - -class DynamicShapeAnalyzer(ProfilingAnalyzer): - def __init__(self, collection_path, **kwargs) -> None: - super().__init__(collection_path, **kwargs) - self.checker = DynamicShapeChecker(self.cann_version) - - -class BlockDimAnalyzer(ProfilingAnalyzer): - def __init__(self, collection_path, **kwargs) -> None: - super().__init__(collection_path, **kwargs) - self.checker = BlockDimChecker(self.cann_version) - - -class OperatorBoundAnalyzer(ProfilingAnalyzer): - def __init__(self, collection_path, **kwargs) -> None: - super().__init__(collection_path, **kwargs) - self.checker = OperatorBoundChecker(self.cann_version) - - -class AicpuAnalyzer(ProfilingAnalyzer): - def __init__(self, collection_path, **kwargs) -> None: - super().__init__(collection_path, **kwargs) - self.checker = AicpuChecker(self.cann_version) diff --git a/profiler/advisor/analyzer/dataloader/__init__.py b/profiler/advisor/analyzer/dataloader/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/dataloader/dataloader_analyzer.py b/profiler/advisor/analyzer/dataloader/dataloader_analyzer.py deleted file mode 100644 index 291c3a1f941cf1934c0c91b7603b6270ee66f3fb..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/dataloader/dataloader_analyzer.py +++ /dev/null @@ -1,30 +0,0 @@ -import logging - -from typing import List, Dict, Any - -from profiler.advisor.analyzer.base_analyzer import 
BaseAnalyzer -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.analyzer.dataloader.dataloader_checker import DataloaderChecker -from profiler.advisor.display.html.render import HTMLRender -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset - -logger = logging.getLogger() - - -class DataloaderAnalyzer(BaseAnalyzer): - dataset_cls_list = [TimelineEventDataset] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs) -> None: - super().__init__(collection_path, n_processes, **kwargs) - key = TimelineEventDataset.get_key() - self.dataset = self.get_first_data_by_key(self.dataset_list, key) - self.result = OptimizeResult() - self.html_render = HTMLRender() - - @BaseAnalyzer.check_data((TimelineEventDataset.get_key(),)) - def optimize(self, **kwargs): - dataloader_checker = DataloaderChecker() - dataloader_checker.check_slow_dataloader(self.dataset) - dataloader_checker.make_record(self.result) - dataloader_checker.make_render(self.html_render) - return self.result diff --git a/profiler/advisor/analyzer/dataloader/dataloader_checker.py b/profiler/advisor/analyzer/dataloader/dataloader_checker.py deleted file mode 100644 index eb1886284ef5d508f911d0c353df4574fd4a8bd3..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/dataloader/dataloader_checker.py +++ /dev/null @@ -1,84 +0,0 @@ -import os -import re -import logging -import yaml - -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.cluster_analyse.common_func.file_manager import FileManager - -logger = logging.getLogger() - - -class DataloaderChecker: - - def __init__(self): - - self.dataloader_issues = False - self.optimization_item = [] - self.desc = "" - self.suggestions = [] - self.dataloader_duration_threshold = None - self._init_rule() - - def check_slow_dataloader(self, event_dataset: TimelineEventDataset): - """ - :Param event_dataset: dataset of timeline event - """ - if not hasattr(event_dataset, "dataloader") or not getattr(event_dataset, "dataloader"): - logger.debug("Skip slow dataloader checker, because no dataloader duration larger than %s", - self.dataloader_duration_threshold) - return - for event in event_dataset.dataloader: - - dataloader_duration = float(event.dur) / 1000 - if dataloader_duration < self.dataloader_duration_threshold: - continue - self.desc = self.desc.format(dataloader_duration=dataloader_duration, - dataloader_duration_threshold=self.dataloader_duration_threshold) - self.dataloader_issues = True - - if re.search("singleprocess", event.name.lower()): - self.suggestions = self._reset_suggestions(["I/O", "num_workers"]) - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self.dataloader_issues: - return - - self.optimization_item.append(OptimizeItem("Slow dataloader", self.desc, self.suggestions)) - for optimization in self.optimization_item: - result.add(OptimizeRecord(optimization)) - - def make_render(self, html_render): - if not self.dataloader_issues: - return - html_render.render_template(key="dataloader", - template_dir="templates", - template_name="slow_dataloader.html", - desc=self.desc, - suggestions=self.suggestions) - - def _init_rule(self): - dataloader_rule_path = os.path.join( - os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))), - 
"rules", - "dataloader.yaml" - ) - dataloader_rule = FileManager.read_yaml_file(dataloader_rule_path) - - self.dataloader_duration_threshold = dataloader_rule.get("dataloader_duration_threshold") - self.desc = dataloader_rule.get("problem") - self.suggestions = dataloader_rule.get("solutions") - - def _reset_suggestions(self, suggestion_pattern_list): - - suggestions = [] - for solution in self.suggestions: - for suggestion_pattern in suggestion_pattern_list: - if re.search(suggestion_pattern, solution): - suggestions.append(solution) - return suggestions diff --git a/profiler/advisor/analyzer/graph_fusion/__init__.py b/profiler/advisor/analyzer/graph_fusion/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/graph_fusion/graph_fusion_analyzer.py b/profiler/advisor/analyzer/graph_fusion/graph_fusion_analyzer.py deleted file mode 100644 index 326be83b8d49088b1563ccd8c08b68a4aa3001ef..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/graph_fusion/graph_fusion_analyzer.py +++ /dev/null @@ -1,49 +0,0 @@ -from typing import List -from functools import partial - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.dataset.graph_dataset import GraphDataset -from profiler.advisor.analyzer.graph_fusion.graph_fusion_checker import GraphFusionRules -from profiler.advisor.dataset.profiling.profiling_dataset import ProfilingDataset -from profiler.advisor.display.html.render import HTMLRender - - -class FusionOPAnalyzer(BaseAnalyzer): - """ - fusion optimizer - """ - RULES = dict(graph_dataset=partial(GraphFusionRules, "rules/op_fusion_pass.yaml")) - dataset_cls_list = [GraphDataset, ProfilingDataset] - - def __init__(self, collection_path, **kwargs) -> None: - super(FusionOPAnalyzer, self).__init__(collection_path, **kwargs) - self.result = OptimizeResult() - self.html_render = HTMLRender() - - @BaseAnalyzer.check_data((GraphDataset.get_key(),)) - def optimize(self, **kwargs): - """ - :return: result - """ - self._check(self.dataset_list.get("GraphDataset"), self.dataset_list.get("ProfilingDataset")) - return self.result - - def _check(self, graph_data: List[GraphDataset], - profiling_data: List[ProfilingDataset] = None) -> None: - if len(graph_data) == 0 or graph_data[0].is_empty(): - return - for _, rule in self.RULES.items(): - checker = rule() - if profiling_data is None: - checker.find_fusion_matched_issues(graph_data) - else: - checker.find_fusion_matched_issues_with_times(graph_data, profiling_data) - checker.make_record(self.result) - checker.make_render(self.html_render) - - def make_record(self): - pass - - def make_render(self): - pass diff --git a/profiler/advisor/analyzer/graph_fusion/graph_fusion_checker.py b/profiler/advisor/analyzer/graph_fusion/graph_fusion_checker.py deleted file mode 100644 index 30bd4323795c28df8f476eafd2d43027b8682a32..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/graph_fusion/graph_fusion_checker.py +++ /dev/null @@ -1,207 +0,0 @@ -import logging -from typing import List - -from tqdm import tqdm - -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord, StatisticsItem -from profiler.advisor.common.graph.graph import Graph -from profiler.advisor.common.graph.graph_parser import QueryGraphParser -from profiler.advisor.dataset.graph_dataset 
import GraphDataset -from profiler.advisor.common.graph.graph_match import find_isomorphisms - -logger = logging.getLogger() - - -class GraphFusionRules: - def __init__(self, fusion_rules: str): - self.fusion_rules = fusion_rules - self.candidates = [] - self.task_duration_list = [] - - @staticmethod - def build_query_graph(query_graphs) -> List[Graph]: - for _, query_graph in query_graphs.fusion_rules.items(): - for sub_graph in query_graph: - graph = Graph(*sub_graph) - graph.build() - yield graph - - def find_fusion_matched_issues(self, graphs: List[GraphDataset]): - query_graphs = QueryGraphParser(self.fusion_rules) - with tqdm(total=query_graphs.num_rules, leave=False, ncols=100, unit=" rules") as pbar: - pbar.set_description(f"Searching Isomorphic Subgraph") - for query_graph in self.build_query_graph(query_graphs): - query_candidates = find_isomorphisms(query_graph.graph, graphs[0].graphs[-1].graph) - pbar.update(1) - if len(query_candidates) > 0: - self.candidates.append(query_candidates) - - def find_fusion_matched_issues_with_times(self, graphs: List[GraphDataset], profiling): - self.find_fusion_matched_issues(graphs) - if len(self.candidates) == 0 or len(profiling) == 0: - return - - if not hasattr(profiling[0], 'op_summary') or profiling[0].op_summary is None: - if hasattr(profiling[0], 'msprof'): - self.match_time_from_msprof(profiling[0].msprof) - return - else: - logger.warning("Skip analyze operator because of not containing op summary.") - return - - self.match_time_from_summary(profiling[0].op_summary) - time_duration_sum = [] - for task_duration in self.task_duration_list: - time_duration_sum.append(sum([sum(duration) for duration in task_duration])) - time_duration_index = sorted(range(len(time_duration_sum)), - key=time_duration_sum.__getitem__, - reverse=True) - self.task_duration_list = [self.task_duration_list[i] for i in time_duration_index] - self.candidates = [self.candidates[i] for i in time_duration_index] - - def match_time_from_summary(self, op_summary): - op_dict = op_summary.task_dict - for candidates in self.candidates: - candidate_duration = [] - for candidate in candidates: - duration_list = [] - for node in candidate.values(): - if node.op_name not in op_dict or op_dict[node.op_name][0].op_type.lower() != node.op_type.lower(): - logger.warning("Operator %s is missing in op summary, which will be set to 0.", node.op_name) - duration_list.append(0.0) - continue - duration_list.append(float(op_dict[node.op_name][0].task_duration)) - candidate_duration.append(duration_list) - self.task_duration_list.append(candidate_duration) - - def match_time_from_msprof(self, msprof): - op_dict = dict() - for task in msprof.tasks: - if "item_id" not in task.args: - continue - op_dict[task.args["item_id"]] = {"task_duration": task.dur} - for candidates in self.candidates: - candidate_duration = [] - for candidate in candidates: - duration_list = [] - for node in candidate.values(): - if node.op_name not in op_dict: - logger.warning("Operator %s is missing in msprof, which will be set to 0.", node.op_name) - duration_list.append(0.0) - continue - duration_list.append(float(op_dict[node.op_name].get("task_duration"))) - candidate_duration.append(duration_list) - self.task_duration_list.append(candidate_duration) - - def make_render(self, html_render): - if not self.candidates: - return - - candidates_list = [] - for case_id, nodes in enumerate(self.candidates): - candidate_dict = dict() - candidate_dict['counts'] = len(nodes) - candidate_dict['matches'] = [] - has_time_info 
= False - if self.task_duration_list: - has_time_info = True - candidate_dict['total_duration'] = round(sum(sum(duration) for duration in - self.task_duration_list[case_id]), 2) - for node_index, refer_node in enumerate(nodes): - match = [] - index = 0 - pass_name = ','.join(item.op_type for item in refer_node.keys()) - for query_node, host_node in refer_node.items(): - fusion_pattern = query_node.op_pass - - if 'op_pass' not in candidate_dict: - candidate_dict['op_pass'] = fusion_pattern - if 'fusion_pattern' not in candidate_dict: - candidate_dict['fusion_pattern'] = pass_name - match_attr = dict() - match_attr['op_name'] = host_node.op_name - match_attr['dtype'] = query_node.op_type - if has_time_info: - match_attr['duration'] = round(self.task_duration_list[case_id][node_index][index], 2) - index += 1 - match.append(match_attr) - match_attr = dict() - match_attr['op_name'] = "-" - match_attr['dtype'] = "-" - if has_time_info: - match_attr['duration'] = round(sum(self.task_duration_list[case_id][node_index]), 2) - match.append(match_attr) - candidate_dict['matches'].append(match) - candidates_list.append(candidate_dict) - html_render.render_template(key="computation", - template_dir="templates", - template_name="fusion.html", - candidates=candidates_list) - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self.candidates: - return - - optimization_item = OptimizeItem( - "fusion issue", - f"Found {len(self.candidates)} fusion issues", - ["Check fusion issues detail in mstt_advisor*.html"] - ) - total_time = 0.0 - for candidate in self.task_duration_list: - for duration in candidate: - total_time += sum(duration) - statistics_item = StatisticsItem(0, - total_time, - sum([len(candidate) for candidate in self.candidates]) - ) - result.add(OptimizeRecord(optimization_item, statistics_item)) - - record_title = [ - "issue_id", "graph_name", "op_name", "fusion_structure", "fusion_pattern", - "op_type", "input_shape", "input_format", - "input_dtype", "output_shape", "output_format", "output_dtype" - ] - result.add_detail('fusion issues', headers=record_title) - - for case_id, nodes in enumerate(self.candidates): - for _, refer_node in enumerate(nodes): - pass_name = ','.join(item.op_type for item in refer_node.keys()) - for query_node, host_node in refer_node.items(): - fusion_pattern = query_node.op_pass - detail = [ - case_id, - host_node.graph_name, - host_node.op_name, - pass_name, - fusion_pattern, - query_node.op_type, - self.get_attr_shape(host_node, "input", "shape"), - self.get_attr_type(host_node, "input", "format"), - self.get_attr_type(host_node, "input", "dtype"), - self.get_attr_shape(host_node, "output", "shape"), - self.get_attr_type(host_node, "output", "format"), - self.get_attr_type(host_node, "output", "dtype"), - ] - result.add_detail('fusion issues', detail=detail) - - @staticmethod - def get_attr_shape(node, type_name: str, attr_name: str) -> str: - attr_shape = [] - node_attrs = getattr(node, type_name, []) - for attrs in node_attrs: - attr = getattr(attrs, attr_name, []) - attr_shape.append(",".join(attr)) - return ";".join(attr_shape) - - @staticmethod - def get_attr_type(node, type_name: str, attr_name: str) -> str: - attr_type = [] - node_attrs = getattr(node, type_name, []) - for attr in node_attrs: - attr_type.append(getattr(attr, attr_name, "")) - return ";".join(attr_type) diff --git a/profiler/advisor/analyzer/overall/__init__.py b/profiler/advisor/analyzer/overall/__init__.py deleted file mode 
100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/overall/overall_summary_analyzer.py b/profiler/advisor/analyzer/overall/overall_summary_analyzer.py deleted file mode 100644 index 8e93dbda77d4915e716af856114184324d1d8807..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/overall/overall_summary_analyzer.py +++ /dev/null @@ -1,242 +0,0 @@ -# Copyright (c) 2024, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import os - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.display.html.render import HTMLRender -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.result.result import OptimizeResult -from profiler.compare_tools.compare_backend.utils.constant import Constant -from profiler.compare_tools.compare_interface.comparison_interface import ComparisonInterface - - -class OverallSummaryAnalyzer(BaseAnalyzer): - OVERALL_SUMMARY_ANALYZER = "overall_summary_analysis" - advice_map = { - "Computing Time": "if you want more detailed advice please go to mstt_advisor_*.html", - "Uncovered Communication Time": "if you want more detailed advice please go to mstt_advisor_*.html", - "Free Time": "if you want more detailed advice please go to mstt_advisor_*.html" - } - time_name_map = { - "Computing Time": "computing", - "Uncovered Communication Time": "communication", - "Free Time": "free", - 'Cube Time(Num)': 'Cube Time', - 'Vector Time(Num)': 'Vector Time', - 'Flash Attention Time(Forward)(Num)': 'Flash Attention Time(Forward)', - 'Flash Attention Time(Backward)(Num)': 'Flash Attention Time(Backward)', - 'Other Time': "Other Computing Time", - 'SDMA Time(Num)': 'SDMA Time' - } - performance_time_dict = { - "Computing Time": "computing_time_ms", - " -- Flash Attention": "fa_time_ms", - " -- Conv": "conv_time_ms", - " -- Matmul": "matmul_time_ms", - " -- Vector": "vector_time_ms", - " -- SDMA(Tensor Move)": "tensor_move_time_ms", - " -- Other Cube": "other_cube_time_ms", - "Uncovered Communication Time": "uncovered_communication_time_ms", - " -- Wait": "wait_time_ms", - " -- Transmit": "transmit_time_ms", - "Free Time": "free_time_ms", - " -- SDMA": "sdma_time_ms", - " -- Free": "free_ms", - "E2E Time": "e2e_time_ms" - } - - def __init__(self, collection_path: str, n_processes: int = 1, **kwargs): - profile_path = get_profile_path(collection_path) - super().__init__(profile_path, n_processes, **kwargs) - self.benchmark_profiling_path = kwargs.get("benchmark_profiling_path", "") - self._has_benchmark_profiling = False - self._is_minimal_profiling = False - self.cur_data = {} - self.cur_bottleneck = {} - self._disaggregate_perf = {} - self._disaggregate_benchmark_perf = {} - self.cur_advices = "" - self.html_render = HTMLRender() - self.result = OptimizeResult() - self.bottleneck_str = "" - self.over_summary_analysis = {} - - @staticmethod - def 
calculate_ratio(dividend, divisor):
-        if not divisor:
-            return float("inf")
-        return dividend / divisor
-
-    @staticmethod
-    def get_time_category_dict(overall_dict: dict):
-        time_category_dict = {
-            "Computing Time": round(overall_dict.get('computing_time_ms', 0.0), 3),
-            "Uncovered Communication Time": round(overall_dict.get('uncovered_communication_time_ms', 0.0), 3),
-            "Free Time": round(overall_dict.get('free_time_ms', 0.0), 3)
-        }
-        return time_category_dict
-
-    def path_check(self):
-        if self.benchmark_profiling_path:
-            if os.path.exists(self.benchmark_profiling_path):
-                self._has_benchmark_profiling = True
-            else:
-                print(f"[WARNING] Invalid path, which does not exist: {self.benchmark_profiling_path}.")
-        return os.path.exists(self.collection_path)
-
-    def process(self):
-        self._disaggregate_perf = ComparisonInterface(self.collection_path).disaggregate_perf(Constant.OVERALL_COMPARE)
-        if not self._disaggregate_perf:
-            return
-        self._is_minimal_profiling = self._disaggregate_perf.get("minimal_profiling", False)
-        self.cur_data["overall_data"] = self.get_time_category_dict(self._disaggregate_perf.get('overall', {}))
-        if self._has_benchmark_profiling:
-            self._disaggregate_benchmark_perf = ComparisonInterface(
-                self.benchmark_profiling_path).disaggregate_perf(Constant.OVERALL_COMPARE)
-
-    def identify_bottleneck(self):
-        overall_data = self.cur_data.get("overall_data")
-        if not overall_data:
-            return
-        e2e_time = sum(overall_data.values())
-        overall_bottleneck = f"The Model E2E Time is {e2e_time:.3f}ms.\n"
-        comparison_bottleneck = ""
-        for time_type, time_value in overall_data.items():
-            # add overall bottleneck
-            overall_bottleneck += f" -- {time_type} is {time_value}ms\n"
-            if time_type == "Free Time" and self._is_minimal_profiling \
-                    and self.calculate_ratio(time_value, e2e_time) > 0.1:
-                overall_bottleneck += "The percentage of free time exceeds the 10% threshold.\n"
-            if not self._has_benchmark_profiling:
-                continue
-            # add comparison bottleneck
-            base_duration = self.get_time_category_dict(
-                self._disaggregate_benchmark_perf.get('overall', {})
-            ).get(time_type)
-            if time_value > base_duration:
-                ratio = "{:.2%}".format(self.calculate_ratio(time_value - base_duration, base_duration))
-                comparison_bottleneck += f"{time_type} exceeds the benchmark by {ratio}\n"
-        self.cur_bottleneck["overall_data"] = overall_bottleneck
-        if comparison_bottleneck:
-            self.cur_bottleneck["comparison_result"] = comparison_bottleneck
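The free-time rule in `identify_bottleneck` above can be tried in isolation. Below is a minimal standalone sketch of the same heuristic, not the analyzer itself; the function name and the sample numbers are invented for illustration:

```python
# Standalone sketch of the 10% free-time heuristic used by identify_bottleneck.
# Input: a time-category dict as produced by get_time_category_dict (values in ms).
def summarize_bottleneck(overall_data: dict, free_ratio_threshold: float = 0.1) -> str:
    e2e_time = sum(overall_data.values())
    lines = [f"The Model E2E Time is {e2e_time:.3f}ms."]
    for time_type, time_value in overall_data.items():
        lines.append(f" -- {time_type} is {time_value}ms")
        # Flag free time once it exceeds the configured share of E2E time.
        if time_type == "Free Time" and e2e_time and time_value / e2e_time > free_ratio_threshold:
            lines.append("The percentage of free time exceeds the 10% threshold.")
    return "\n".join(lines)

# Sample numbers are made up for demonstration.
print(summarize_bottleneck({"Computing Time": 820.0,
                            "Uncovered Communication Time": 95.5,
                            "Free Time": 130.2}))
```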
-    def optimize(self, **kwargs):
-        if self.path_check():
-            self.process()
-            self.identify_bottleneck()
-            self.format_bottleneck()
-            self.format_over_summary_analysis()
-            self.make_record()
-            self.make_render()
-        return self.result
-
-    def format_bottleneck(self):
-        result = ''
-        for _, value in self.cur_bottleneck.items():
-            if not value:
-                continue
-            result += f'{value} \n'
-        self.bottleneck_str = result
-
-    def format_over_summary_analysis(self):
-        headers = ['Performance Index', 'Duration(ms)', 'Duration Ratio']
-        performance_data = self.get_analysis_data(self._disaggregate_perf)
-        benchmark_data = self.get_analysis_data(self._disaggregate_benchmark_perf)
-        if self._has_benchmark_profiling:
-            headers.append('Diff Duration(ms)')
-            self.format_analysis_with_benchmark(performance_data, benchmark_data, headers)
-        else:
-            self.format_analysis_only(performance_data, headers)
-
-    def get_analysis_data(self, data_dict: dict):
-        if not data_dict:
-            return {}
-        return {
-            **data_dict.get("overall"),
-            **data_dict.get("computing_time_disaggregate"),
-            **data_dict.get("communication_time_disaggregate"),
-            **data_dict.get("free_time_disaggregate"),
-        }
-
-    def format_analysis_only(self, performance_data: dict, headers: list):
-        res = []
-        total_duration = performance_data.get('e2e_time_ms', 0.0)
-        for time_name, time_key in self.performance_time_dict.items():
-            row = [time_name]
-            duration = performance_data.get(time_key, 0.0)
-            row.append("{:.3f}".format(duration))
-            row.append("{:.2%}".format(self.calculate_ratio(duration, total_duration)))
-            res.append(row)
-        self.over_summary_analysis["headers"] = headers
-        self.over_summary_analysis["data"] = res
-
-    def format_analysis_with_benchmark(self, performance_data: dict, benchmark_data: dict, headers: list):
-        res = []
-        total_duration = performance_data.get('e2e_time_ms', 0.0)
-        for time_name, time_key in self.performance_time_dict.items():
-            row = [time_name]
-            duration = performance_data.get(time_key, 0.0)
-            row.append("{:.3f}".format(duration))
-            row.append("{:.2%}".format(self.calculate_ratio(duration, total_duration)))
-            row.append("{:.3f}".format(duration - benchmark_data.get(time_key, 0.0)))
-            res.append(row)
-        self.over_summary_analysis["headers"] = headers
-        self.over_summary_analysis["data"] = res
-
-    def make_record(self):
-        """
-        make record for what and how to optimize
-        """
-        if not self.bottleneck_str and not self.cur_advices:
-            return
-        optimization_item = OptimizeItem(
-            OverallSummaryAnalyzer.OVERALL_SUMMARY_ANALYZER,
-            self.bottleneck_str,
-            self.cur_advices
-        )
-        self.result.add(OptimizeRecord(optimization_item))
-
-        self.result.add_detail(
-            OverallSummaryAnalyzer.OVERALL_SUMMARY_ANALYZER,
-            headers=self.over_summary_analysis["headers"]
-        )
-        for data in self.over_summary_analysis["data"]:
-            self.result.add_detail(OverallSummaryAnalyzer.OVERALL_SUMMARY_ANALYZER, detail=data)
-
-    def make_render(self):
-        if not self.bottleneck_str and not self.cur_advices:
-            return
-        # Replace '\n' with HTML line breaks for the report
-        bottleneck_str = self.bottleneck_str.replace('\n', '<br />')
-        result_for_html = {
-            "Description": bottleneck_str,
-            "suggestion": self.cur_advices,
-            "details": [self.over_summary_analysis]
-        }
-        self.html_render.render_template(key="overall",
-                                         title=OverallSummaryAnalyzer.OVERALL_SUMMARY_ANALYZER,
-                                         template_dir="templates",
-                                         template_name="cluster_analysis.html",
-                                         cann_version=self.cann_version,
-                                         torch_version=self.torch_version,
-                                         result=result_for_html)
-
-
-def get_profile_path(collection_path):
-    for root, dirs, files in os.walk(collection_path):
-        for file in files:
-            if file.startswith("profiler_info"):
-                return root
-    return ""
diff --git a/profiler/advisor/analyzer/schedule/__init__.py b/profiler/advisor/analyzer/schedule/__init__.py
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/profiler/advisor/analyzer/schedule/dispatch/__init__.py b/profiler/advisor/analyzer/schedule/dispatch/__init__.py
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/profiler/advisor/analyzer/schedule/dispatch/timeline_op_dispatch_analyzer.py b/profiler/advisor/analyzer/schedule/dispatch/timeline_op_dispatch_analyzer.py
deleted file mode 100644
index 0e62a3ff0c8eebc0cf7b5b89953b8a0842df9c9d..0000000000000000000000000000000000000000
--- a/profiler/advisor/analyzer/schedule/dispatch/timeline_op_dispatch_analyzer.py
+++ /dev/null
@@ -1,107 +0,0 @@
-#!/usr/bin/python
-# -*- coding: utf-8 -*-
-# Copyright (c) 2024, Huawei Technologies Co., Ltd.
-# All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
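As an aside before the dispatch analyzer: `get_profile_path` in the overall summary analyzer above simply walks the collection directory until it finds a `profiler_info*` file. A usage sketch under those assumptions; the helper name and the path are illustrative placeholders, not part of the advisor API:

```python
import os

# Mirror of get_profile_path above: return the first directory containing a
# profiler_info* file, i.e. the rank folder inside an *_ascend_pt directory.
def find_profile_root(collection_path: str) -> str:
    for root, _, files in os.walk(collection_path):
        if any(f.startswith("profiler_info") for f in files):
            return root
    return ""

# Placeholder path; point it at a real collection directory.
print(find_profile_root(os.path.expanduser("~/profiling_data")))
```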
-import logging - - -from profiler.advisor.common import constant as const -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.display.html.render import HTMLRender - -logger = logging.getLogger() - - -class OpDispatchAnalyzer(BaseAnalyzer): - dataset_cls_list = [TimelineEventDataset] - """ - operator dispatch optimizer - """ - - def __init__(self, collection_path, n_processes: int = 1, **kwargs) -> None: - super().__init__(collection_path, n_processes, **kwargs) - key = TimelineEventDataset.get_key() - self.dataset = self.get_first_data_by_key(self.dataset_list, key) - self.result = OptimizeResult() - self.html_render = HTMLRender() - self._op_compile = None - self._issues_record = [] - self.optimization_item = [] - - def optimize(self, **kwargs): - """ - optimize operator - :param data: input datasets - :return: result - """ - self.get_op_compile_info(self.dataset) - self.make_record(self.result) - self.make_render(self.html_render) - return self.result - - def get_op_compile_info(self, event_dataset: TimelineEventDataset): - """ - :Param event_dataset: dataset of timeline event - """ - if hasattr(event_dataset, "ops_compile"): - self._op_compile = getattr(event_dataset, "ops_compile") - if not self._op_compile or self._op_compile.total_count < const.MAX_OP_COMPILE_NUM: - return - - self._issues_record.append(['operator dispatch', - const.OP_COMPILE_ID, - self._op_compile.total_count, - self._op_compile.total_time]) - else: - logger.debug("Skip operator compile checker, because no op_compile attr find.") - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self._op_compile or len(self._issues_record) <= 0: - return - desc = f"Found {self._op_compile.total_count} operator compile issues." 
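-        # A compile count at or above const.MAX_OP_COMPILE_NUM (checked in
-        # get_op_compile_info above) usually means dynamic shapes are forcing
-        # repeated JIT compilation; the suggestion below names the switch that
-        # avoids it.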
- suggestion = (f"Please use `torch_npu.npu.set_compile_mode(jit_compile=False)` to disable jit compile " - f"in dynamic shape usage.") - self.optimization_item.append(OptimizeItem("Operator dispatch", desc, [suggestion])) - for optimization in self.optimization_item: - result.add(OptimizeRecord(optimization)) - record_title = ["Issues", "op name", "counts", "total time"] - result.add_detail('operator dispatch', headers=record_title) - for op_info in self._issues_record: - result.add_detail('operator dispatch', detail=op_info) - - def make_render(self, html_render): - issues = [] - optimizations = [] - for optimization in self.optimization_item: - optimizations.append(dict( - description=optimization.description, - suggestion=optimization.suggestion[0] - )) - for record in self._issues_record: - issues.append(dict(issue=record[0], - op_name=record[1], - counts=record[2], - total_time=record[3])) - html_render.render_template(key="schedule", - template_dir="templates", - template_name="operator_dispatch.html", - issues=issues, - optimizers=optimizations) diff --git a/profiler/advisor/analyzer/schedule/free_event/__init__.py b/profiler/advisor/analyzer/schedule/free_event/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/schedule/fusion_ops/__init__.py b/profiler/advisor/analyzer/schedule/fusion_ops/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/schedule/fusion_ops/fusion_ops_analyzer.py b/profiler/advisor/analyzer/schedule/fusion_ops/fusion_ops_analyzer.py deleted file mode 100644 index c1eb24b8e1e11ac167a7eb9333867167a57dd524..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/fusion_ops/fusion_ops_analyzer.py +++ /dev/null @@ -1,271 +0,0 @@ -import multiprocessing -import logging -import re - -from tqdm import tqdm - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.common import constant as const -from profiler.advisor.common.analyzer_scopes import SupportedScopes -from profiler.advisor.common.timeline.event import TimelineEvent -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.utils.utils import format_timeline_result -from profiler.advisor.common.timeline.fusion_ops_db import init_timeline_ops_db - -logger = logging.getLogger() - - -class TimelineFusionOpsAnalyzer(BaseAnalyzer): - dataset_cls_list = [TimelineEventDataset] - - def __init__(self, collection_path, n_processes: int = 1, **kwargs): - super().__init__(collection_path, n_processes, **kwargs) - self._matched_op_index = {} if self.n_processes <= 1 else multiprocessing.Manager().dict() - self.matched_op_stacks = {} - self.empty_stacks = True - key = TimelineEventDataset.get_key() - self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) - - def optimize(self, **kwargs): - for mode in [const.ATEN.lower(), const.OPTIMIZER.lower()]: - - for op_combined, npu_apis in tqdm(getattr(init_timeline_ops_db(self.cann_version, self.torch_version), - f"_{mode}_op_api_map").items(), leave=False, ncols=100, - desc="Scanning timeline for affinity apis"): - for npu_api in npu_apis.split("/"): - self.find_fusion_ops(self.timeline_event_dataset, op_combined, npu_api, mode) - - 
self.query_stack(self.timeline_event_dataset) - - logger.info("Finish timeline analysis") - self.make_record() - self.make_render() - return self.result - - def find_fusion_ops(self, event_dataset, ops: str, npu_api: str, mode: str): - """ - :Param event_dataset: dataset of timeline event - :Param ops: operator combination with '-' as separator , e.g. permute-reshape - :Param npu_api: api of torch_npu, generally more efficient than torch api - :Param mode: aten or dequeue or optimizer - :Return: json of op_name and called times and detail stacks - """ - op_rule_pattern, enable_regex = self._format_rule_to_pattern(ops) - if not enable_regex: - self._match_ops(event_dataset, op_rule_pattern, npu_api, mode) - else: - try: - self._match_ops_with_regex(event_dataset, op_rule_pattern, npu_api, mode) - except Exception as e: - logger.warning("Failed to find fusion operators with regex %s, reason is %s", ops, e) - - def _match_ops(self, event_dataset, ops: str, npu_api: str, mode: str): - """ match operator based on fusion operators rule(without regex), - only strictly equals of op name list means matched - :Param event_dataset: dataset of timeline event - :Param ops: operator combination with '-' as separator , e.g. permute-reshape - :Param npu_api: api of torch_npu, generally more efficient than torch api - :Param mode: aten or dequeue or optimizer - """ - op_list = ops.split(const.OP_SEP) - - matched_op_index = set() - api_ops_matched = False - - for index, event in enumerate(getattr(event_dataset, mode)): - if self._replace_op_name_prefix(event.name, mode) != op_list[0]: - continue - tmp_dequeue_event_names = [self._replace_op_name_prefix(event.name, mode) for event in - getattr(event_dataset, mode)[index: index + len(op_list)]] - if tmp_dequeue_event_names != op_list: - continue - api_ops_matched = True - matched_op_index.add(event.dataset_index) - - if api_ops_matched: - self._matched_op_index[npu_api + f":{ops}"] = matched_op_index - - def _match_ops_with_regex(self, event_dataset, op_rule_pattern: str, npu_api: str, - mode: str): - """ match operator based on fusion operators rule(with regex), - using regex to support condition like 'a = torch.mul(xxx) if xxx else torch.add(xxx)' - :Param event_dataset: dataset of timeline event - :Param op_rule_pattern: fusion operators rule with regex definition , e.g. add-mul{0,10}, add-mul* - :Param npu_api: api of torch_npu, generally more efficient than torch api - :Param mode: aten or dequeue or optimizer - """ - matched_op_index = set() - total_op_name = "".join([f"{const.OP_SEP}{self._replace_op_name_prefix(event.name, mode)}{const.OP_SEP}" - for event in - getattr(event_dataset, mode)]) - - matched_pattern_index_tuple = [(x.start(0), x.end(0)) for x in re.finditer(op_rule_pattern, total_op_name)] - # convert list of index tuple to a whole list: [(3, 25), ...] -> [3, 25, ...] 
- total_ops_split_points = [num for sublist in matched_pattern_index_tuple for num in sublist] - - api_ops_matched = len(total_ops_split_points) != 0 - - op_index = [] - if 0 not in total_ops_split_points: - total_ops_split_points = [0] + total_ops_split_points - if len(list(total_op_name)) not in total_ops_split_points: - total_ops_split_points.append(len(list(total_op_name))) - - # convert total ops name like "-add-mul-xxx-div-" to small pieces like [["add", "mul"], [...], ["div"]] - # by the regex index and then calculate the real index for matched fusion operators in event dataset - for l, r in zip(total_ops_split_points, total_ops_split_points[1:]): - matched_op_flag = True if (l, r) in matched_pattern_index_tuple else False - matched_ops_list = total_op_name[l: r].strip(const.OP_SEP).split(const.OP_SEP + const.OP_SEP) - op_index.append([matched_op_flag, len(matched_ops_list)]) - for i, _ in enumerate(op_index): - if i > 0: - # calculate cumsum for indexing matched operator - op_index[i][1] = op_index[i][1] + op_index[i - 1][1] - op_index = [[False, 0]] + op_index - - for i, _ in enumerate(op_index): - if not op_index[i][0]: - continue - index = op_index[i - 1][1] - matched_op_index.add(index) - - if index > len(getattr(event_dataset, mode)) - 1: - continue - dataset_index = getattr(event_dataset, mode)[index].get("dataset_index") - matched_op_index.add(dataset_index) - - if api_ops_matched: - self._matched_op_index[npu_api + f":{op_rule_pattern}"] = sorted(list(matched_op_index)) - - def make_record(self): - """ - make record for what and how to optimize - """ - if not self.matched_op_stacks: - return - - desc = f"Found {len(format_timeline_result(self.matched_op_stacks))} apis to be replaced" \ - f" based on the runtime env cann-{self.cann_version} and torch-{self.torch_version}" - suggestion = "Please replace training api according to sub table 'Affinity training api'" - if self.empty_stacks: - desc += ", but with no stack" - suggestion = const.TIMELINE_EMPTY_STACKS_PROMPT.format( - timeline_profiling_doc_url=const.TIMELINE_WITH_STACK_DOC_URL - ) - - optimization_item = OptimizeItem( - SupportedScopes.TIMELINE_FUSION_OPS, - desc, - [suggestion] - ) - - self.result.add(OptimizeRecord(optimization_item)) - - record_title = ["Affinity API", "Code stacks", "Stack called counts"] - self.result.add_detail(SupportedScopes.TIMELINE_FUSION_OPS, headers=record_title) - - for api_name, stacks_info in format_timeline_result(self.matched_op_stacks).items(): - if not stacks_info: - detail = [api_name, "null", "null"] - self.result.add_detail(SupportedScopes.TIMELINE_FUSION_OPS, detail=detail) - else: - for stack in stacks_info: - detail = [api_name, *stack] - self.result.add_detail(SupportedScopes.TIMELINE_FUSION_OPS, detail=detail) - - def make_render(self): - format_result_for_html = format_timeline_result(dict(self.matched_op_stacks), dump_html=True) - - self.html_render.render_template(key="schedule", - template_dir="templates", - template_name="affinity_api.html", - cann_version=self.cann_version, - torch_version=self.torch_version, - empty_stacks=self.empty_stacks, - with_stack_doc_url=const.TIMELINE_WITH_STACK_DOC_URL, - api_doc_url=const.TIMELINE_API_DOC_URL, - result=format_result_for_html) - - def query_stack(self, event_dataset): - if all([len(matched_index) == 0 for matched_index in self._matched_op_index.values()]): - return - - op_stack_list = event_dataset.parse_data_with_generator(self._query_stack_by_matched_index) - for op_stack in op_stack_list: - for op_rule, stack in 
op_stack.items(): - if op_rule not in self.matched_op_stacks: - self.matched_op_stacks[op_rule] = {} - if stack == const.TIMELINE_FUSION_OPS_NO_STACK_FLAG: - continue - if stack not in self.matched_op_stacks[op_rule]: - self.matched_op_stacks[op_rule][stack] = 0 - self.matched_op_stacks[op_rule][stack] += 1 - - def _query_stack_by_matched_index(self, index, event): - stack_record = {} - event = TimelineEvent(event) - - matched_op_rules = [] - for op_rule, matched_index in self._matched_op_index.items(): - if index not in matched_index: - continue - - matched_op_rules.append(op_rule) - stack = event.args.get(const.CALL_STACKS) - - if not stack: - logger.debug("Got empty '%s' for event %s", const.CALL_STACKS, event) - continue - - if self.empty_stacks and stack: - self.empty_stacks = False - - stack_record[op_rule] = stack - - if matched_op_rules and not stack_record: - for op_rule in matched_op_rules: - stack_record[op_rule] = const.TIMELINE_FUSION_OPS_NO_STACK_FLAG - - return stack_record - - def _replace_op_name_prefix(self, event_name, mode): - if mode == const.DEQUEUE.lower(): - op_name_prefix = f"{const.DEQUEUE}{const.DEQUEUE_SEP}" - elif mode == const.ATEN: - op_name_prefix = f"{const.ATEN}{const.ATEN_SEP}" - else: - op_name_prefix = f"{const.OPTIMIZER}.{const.OPTIMIZER_STEP}{const.OPTIMIZER_SEP}" - - return event_name.replace(op_name_prefix, "") - - def _format_rule_to_pattern(self, op_rule): - """ - Args: - op_rule: like (mul){0,1}-(add|neg){0,2}-dropout-(softmax)* - - Returns: op_pattern like (-mul-){0,1}(-add-|-neg-){0,2}(-dropout-)(-softmax-)* - """ - enable_regex = False - if "(" not in op_rule and ")" not in op_rule: - # op_rule which requires fuzzy matching mush consist of "()" - return op_rule, enable_regex - - enable_regex = True - op_pattern_list = op_rule.split(const.OP_SEP) - format_op_pattern = "" - for op_pattern in op_pattern_list: - matched_res = re.search(r'\((.*?)\)', op_pattern) - - ops_index_range = (matched_res.start() + 1, matched_res.end() - 1) if matched_res else ( - 0, len(op_pattern)) - - op_names = op_pattern[ops_index_range[0]: ops_index_range[1]] - tmp_op_names_record = [] - for op_name in op_names.split("|"): - tmp_op_names_record.append(f"{const.OP_SEP}{op_name.strip(' ')}{const.OP_SEP}") - op_suffix = op_pattern[ops_index_range[1] + 1:] - op_names_format = f"({'|'.join(tmp_op_names_record)}){op_suffix}" - - format_op_pattern += op_names_format - return format_op_pattern, enable_regex diff --git a/profiler/advisor/analyzer/schedule/fusion_ops/timeline_api_stack_checker.py b/profiler/advisor/analyzer/schedule/fusion_ops/timeline_api_stack_checker.py deleted file mode 100644 index f684a4892111f113f6c502a010c9e14ccd43768a..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/fusion_ops/timeline_api_stack_checker.py +++ /dev/null @@ -1,163 +0,0 @@ -import logging -from typing import List - -from profiler.advisor.common import constant as const -from profiler.advisor.common.timeline.event import TimelineEvent -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.utils.utils import get_analyze_processes, ParallelJob - -logger = logging.getLogger() - - -class OpStackFinder: - - def __init__(self): - self.n_processes = get_analyze_processes() - self._stack_record = [] - self._task_id_record = {} - self.op_name = None - self.task_type = None - 
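-        # Dataset indices of matched device-side ops; filled by _get_api_stack_by_op
-        # and consumed when query_stack resolves call stacks.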
self.matched_index = set() - - def get_api_stack_by_op(self, event_dataset: TimelineEventDataset, op_name: List[str] = None, task_type: str = None, - disable_multiprocess=False): - """ - :Param event_dataset: dataset of timeline event - :Param op_name: operator name, e.g. IndexPutV2 - :Param task_type: operator task type, optionals are AI_CPU and AI_CORE - :Param disable_multiprocess: disable multiprocessing, avoid cost time of enable new process for light task - """ - if not op_name: - op_name = [] - if not isinstance(op_name, list): - op_name = [op_name] - - self.op_name = ",".join(op_name) - self.task_type = task_type - op_name_list = event_dataset.task_op_names if not op_name else op_name - - if self.n_processes <= 1 or disable_multiprocess: - self._query_stacks_multiprocess(event_dataset, op_name_list, task_type) - else: - event_num_per_process = int(len(op_name_list) / self.n_processes) + 1 - parallel_analyzer = ParallelJob( - self._query_stacks_multiprocess, - [[event_dataset, op_name_list[i:i + event_num_per_process], task_type] - for i in range(0, len(op_name_list), event_num_per_process)], - job_name="Analyzing operator stacks from timeline" - ) - parallel_analyzer.start(self.n_processes) - self.query_stack(event_dataset) - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self._stack_record: - return - - desc = f"Found {len(self._stack_record)} called stacks for" - if self.op_name and self.task_type: - desc += f" operators with name '{self.op_name}' with task type '{self.task_type}'" - elif self.op_name and not self.task_type: - desc += f" operators with name '{self.op_name}'" - elif self.task_type and not self.op_name: - desc += f" operators with task type '{self.task_type}'" - else: - desc += " all operators" - - suggestion = f"Please use command 'ma-advisor analyze profiling' to analyze operators" - optimization_item = OptimizeItem( - "Operator stacks", - desc, - [suggestion] - ) - result.add(OptimizeRecord(optimization_item)) - - record_title = ["Task ID", "op name", "op type", "code stacks"] - result.add_detail('operator stacks', headers=record_title) - - for op_info in self._stack_record: - result.add_detail('operator stacks', detail=op_info) - - def _get_api_stack_by_op(self, event_dataset: TimelineEventDataset, op_name: str, task_type: str): - for _, src_op_event in event_dataset.ops_with_task_type.items(): - - op_task_type = src_op_event.get(const.TASK_TYPE) - if not (src_op_event.name == op_name and op_task_type and op_task_type == task_type): - continue - - torch_to_npu_key = f"s-{src_op_event.tid}-{src_op_event.ts}" - torch_to_npu_event = event_dataset.torch_to_npu.get(torch_to_npu_key) or event_dataset.torch_to_npu.get( - f"s-{src_op_event.ts}") or event_dataset.torch_to_npu.get(f"s-{src_op_event.ts.replace('.', '')}") - - acl_to_npu_event = src_op_event.ts in event_dataset.acl_to_npu - - if not torch_to_npu_event and not acl_to_npu_event: - continue - - # query stack by torch_to_npu first, due to each operator had acl_to_npu incoming flow in cann6.3 - if torch_to_npu_event: - dst_op_index = self._query_index_by_torch_to_npu(event_dataset, torch_to_npu_event) - else: - dst_op_index = self._query_index_by_acl_to_npu(acl_to_npu_event) - - if not dst_op_index: - continue - - task_id = src_op_event.task_id - if not task_id: - continue - self.matched_index.add(dst_op_index) - if dst_op_index not in self._task_id_record: - self._task_id_record[dst_op_index] = [] - 
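-            # Record the matched (task_id, op_name, task_type) under the destination
-            # op's dataset index so the stack can be attached later.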
self._task_id_record[dst_op_index].append([task_id, op_name, task_type]) - - def _query_index_by_torch_to_npu(self, event_dataset, torch_to_npu_event): - dst_op_event_key = torch_to_npu_event.ts - dst_op_event = event_dataset.ops_with_stack.get(dst_op_event_key) - - if not dst_op_event: - return const.TIMELINE_BACKWARD_NO_STACK_CODE - - return dst_op_event.get("dataset_index") - - def _query_index_by_acl_to_npu(self, acl_to_npu_event): - if acl_to_npu_event: - return const.TIMELINE_ACL_TO_NPU_NO_STACK_CODE - - def _query_stacks_multiprocess(self, event_dataset, op_name_list, task_type): - - for op_name in op_name_list: - if task_type is not None: - self._get_api_stack_by_op(event_dataset, op_name, task_type) - else: - self._get_api_stack_by_op(event_dataset, op_name, const.AI_CORE) - self._get_api_stack_by_op(event_dataset, op_name, const.AI_CPU) - - def _format_stack_record(self): - stack_list = [] - for task_id, stack_info in self._task_id_record.items(): - stack_list.append([task_id, *stack_info]) - return stack_list - - def _query_stack_by_matched_index(self, index, event): - if index not in self.matched_index: - return None - event = TimelineEvent(event) - stack = event.args.get(const.CALL_STACKS) - stack = stack if stack else const.NO_STACK_REASON_MAP.get(const.TIMELINE_BACKWARD_NO_STACK_CODE) - for matched_op_info in self._task_id_record.get(index, []): - self._stack_record.append([*matched_op_info, stack]) - - for matched_op_info in self._task_id_record.get(const.TIMELINE_ACL_TO_NPU_NO_STACK_CODE, []): - self._stack_record.append([*matched_op_info, - const.NO_STACK_REASON_MAP.get(const.TIMELINE_ACL_TO_NPU_NO_STACK_CODE)]) - return None - - def query_stack(self, event_dataset: TimelineEventDataset): - if not event_dataset.dataset_len: - return - _ = event_dataset.parse_data_with_generator(self._query_stack_by_matched_index) diff --git a/profiler/advisor/analyzer/schedule/syncbn/__init__.py b/profiler/advisor/analyzer/schedule/syncbn/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py b/profiler/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py deleted file mode 100644 index 2786a784087bb449df3f7126e42f713fcf6a3cc6..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/syncbn/syncbn_analyzer.py +++ /dev/null @@ -1,30 +0,0 @@ -import logging - -from typing import List, Dict, Any - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.analyzer.schedule.syncbn.syncbn_checker import SyncBNChecker -from profiler.advisor.display.html.render import HTMLRender -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset - -logger = logging.getLogger() - - -class SyncBNAnalyzer(BaseAnalyzer): - dataset_cls_list = [TimelineEventDataset] - - def __init__(self, collection_path, **kwargs): - super().__init__(collection_path, **kwargs) - self.result = OptimizeResult() - self.html_render = HTMLRender() - key = TimelineEventDataset.get_key() - self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) - - @BaseAnalyzer.check_data((TimelineEventDataset.get_key(),)) - def optimize(self, **kwargs): - syncbn_checker = SyncBNChecker() - syncbn_checker.check_syncbn(self.timeline_event_dataset) - syncbn_checker.make_record(self.result) - syncbn_checker.make_render(self.html_render) - return 
self.result diff --git a/profiler/advisor/analyzer/schedule/syncbn/syncbn_checker.py b/profiler/advisor/analyzer/schedule/syncbn/syncbn_checker.py deleted file mode 100644 index c0e10448f3f05736c0fb0518fbcf5729244b058b..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/syncbn/syncbn_checker.py +++ /dev/null @@ -1,70 +0,0 @@ -import logging -import os - -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.cluster_analyse.common_func.file_manager import FileManager - -logger = logging.getLogger() - - -class SyncBNChecker: - - def __init__(self): - self.optimization_item = [] - self.syncbn_issues = False - self.desc = "" - self.suggestions = [] - self.solutions = None - self.max_syncbn_num = None - self._init_rule() - - def check_syncbn(self, event_dataset: TimelineEventDataset): - """ - :Param event_dataset: dataset of timeline event - """ - if not hasattr(event_dataset, "sync_batchnorm") or not getattr(event_dataset, "sync_batchnorm"): - logger.debug("Skip syncbn checker, because no syncbn found") - return - - syncbn_num = len(event_dataset.sync_batchnorm) - self.syncbn_issues = syncbn_num >= self.max_syncbn_num - self.desc = self.desc.format(syncbn_num=syncbn_num) - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self.syncbn_issues: - return - - self.optimization_item.append(OptimizeItem("SyncBatchNorm", self.desc, self.suggestions)) - for optimization in self.optimization_item: - result.add(OptimizeRecord(optimization)) - - def make_render(self, html_render): - if not self.syncbn_issues: - return - html_render.render_template(key="schedule", - template_dir="templates", - template_name="sync_batchnorm.html", - desc=self.desc, - solutions=self.solutions) - - def _init_rule(self): - syncbn_rule_path = os.path.join( - os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))), - "rules", - "sync_batchnorm.yaml" - ) - - syncbn_rule = FileManager.read_yaml_file(syncbn_rule_path) - - self.max_syncbn_num = syncbn_rule.get("max_syncbn_num") - self.desc = syncbn_rule.get("problem") - - self.solutions = syncbn_rule.get("solutions") - for solution in self.solutions: - for key, val in solution.items(): - self.suggestions.append(f"{key}, {val.get('desc')}") diff --git a/profiler/advisor/analyzer/schedule/synchronize_stream/__init__.py b/profiler/advisor/analyzer/schedule/synchronize_stream/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py b/profiler/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py deleted file mode 100644 index d8906504c39141807d45ed1303f0672d6983a2ca..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_analyzer.py +++ /dev/null @@ -1,32 +0,0 @@ -import logging - -from typing import List, Dict, Any - -from profiler.advisor.analyzer.base_analyzer import BaseAnalyzer -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.analyzer.schedule.synchronize_stream.synchronize_stream_checker import SynchronizeStreamChecker -from profiler.advisor.display.html.render import HTMLRender -from 
profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset - -logger = logging.getLogger() - - -class SynchronizeStreamAnalyzer(BaseAnalyzer): - dataset_cls_list = [TimelineEventDataset] - - def __init__(self, collection_path, **kwargs): - super().__init__(collection_path, **kwargs) - self.result = OptimizeResult() - self.html_render = HTMLRender() - - key = TimelineEventDataset.get_key() - self.timeline_event_dataset = self.get_first_data_by_key(self.dataset_list, key) - - @BaseAnalyzer.check_data((TimelineEventDataset.get_key(),)) - def optimize(self, **kwargs): - - synchronize_stream_checker = SynchronizeStreamChecker() - synchronize_stream_checker.check_synchronize(self.timeline_event_dataset, kwargs.get("profiling_with_stack")) - synchronize_stream_checker.make_record(self.result) - synchronize_stream_checker.make_render(self.html_render) - return self.result diff --git a/profiler/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_checker.py b/profiler/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_checker.py deleted file mode 100644 index 83ddd80a0f918d1e7e58e3081e9d15ee936128f2..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/synchronize_stream/synchronize_stream_checker.py +++ /dev/null @@ -1,89 +0,0 @@ -import logging - -from profiler.advisor.common import constant as const -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.result import OptimizeResult -from profiler.advisor.result.item import OptimizeItem, OptimizeRecord -from profiler.advisor.analyzer.schedule.timeline_base_checker import TimelineBaseChecker -from profiler.advisor.utils.utils import format_timeline_result - -logger = logging.getLogger() - - -class SynchronizeStreamChecker(TimelineBaseChecker): - - def __init__(self): - super().__init__(n_processes=1) - self.optimization_item = [] - self.synchronize_issues = False - self.desc = "" - self.suggestions = [] - self.solutions = [] - self.max_synchronize_num = None - - def check_synchronize(self, event_dataset: TimelineEventDataset, profiling_with_stack=None): - """ - :Param event_dataset: dataset of timeline event - """ - if not hasattr(event_dataset, "synchronize_stream") or not getattr(event_dataset, "synchronize_stream"): - logger.debug("Skip synchronize stream checker, because no synchronize stream found") - return - - synchronize_num = event_dataset.synchronize_stream.total_count - slow_synchronize_stream = event_dataset.synchronize_stream.slow_synchronize_stream - total_slow_synchronize_time = sum((float(sync_stream.dur) for sync_stream in slow_synchronize_stream)) - - synchronize_stream_rule = event_dataset.synchronize_stream.rule - self.max_synchronize_num = synchronize_stream_rule.get("max_synchronize_num") - self.synchronize_issues = synchronize_num >= self.max_synchronize_num and len(slow_synchronize_stream) > 0 - if not self.synchronize_issues: - return - - for sync_stream in slow_synchronize_stream: - if sync_stream.name not in self._matched_op_index: - self._matched_op_index[sync_stream.name] = [] - self._matched_op_index[sync_stream.name].append(sync_stream.dataset_index) - self.query_stack(event_dataset, profiling_with_stack) - - self.desc = synchronize_stream_rule.get("problem") - self.desc = self.desc.format(synchronize_num=synchronize_num, - slow_synchronize_num=len(slow_synchronize_stream), - total_synchronize_stream_time=total_slow_synchronize_time) - - solutions = synchronize_stream_rule.get("solutions") - 
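-        # Build user-facing suggestions from the rule file. Advice that asks users
-        # to modify code is dropped below when every matched stack came from
-        # framework code (see the blacklist handling in TimelineBaseChecker).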
for solution in solutions: - renderer_solution = {} - for key, val in solution.items(): - if self.empty_stacks and self.framework_black_list: - # 如果堆栈源于torch, torch_npu等框架,则不提示修改的代码 - if "modify code" in key.lower(): - continue - self.suggestions.append(f"{key}, {val.get('desc')}") - renderer_solution.update({key: val}) - self.solutions.append(renderer_solution) - - def make_record(self, result: OptimizeResult): - """ - make record for what and how to optimize - """ - if not self.synchronize_issues: - return - - self.optimization_item.append(OptimizeItem("SynchronizeStream", self.desc, self.suggestions)) - for optimization in self.optimization_item: - result.add(OptimizeRecord(optimization)) - - def make_render(self, html_render): - if not self.synchronize_issues: - return - - format_result_for_html = format_timeline_result(dict(self.matched_op_stacks), dump_html=True) - html_render.render_template(key="schedule", - template_dir="templates", - template_name="synchronize_stream.html", - desc=self.desc, - solutions=self.solutions, - result=format_result_for_html, - with_stack_doc_url=const.TIMELINE_WITH_STACK_DOC_URL, - empty_stacks=self.empty_stacks, - framework_black_list=self.framework_black_list) diff --git a/profiler/advisor/analyzer/schedule/timeline_base_checker.py b/profiler/advisor/analyzer/schedule/timeline_base_checker.py deleted file mode 100644 index 8bc69150263c11006979f64d12df1dde29a45f15..0000000000000000000000000000000000000000 --- a/profiler/advisor/analyzer/schedule/timeline_base_checker.py +++ /dev/null @@ -1,91 +0,0 @@ -from abc import ABC, abstractmethod -import multiprocessing -import logging - -from profiler.advisor.common import constant as const -from profiler.advisor.common.timeline.event import TimelineEvent -from profiler.advisor.dataset.timeline_event_dataset import TimelineEventDataset -from profiler.advisor.result.result import OptimizeResult - -logger = logging.getLogger() - - -class TimelineBaseChecker(ABC): - - def __init__(self, n_processes: int = 1): - self.n_processes = n_processes - self._matched_op_index = {} if self.n_processes <= 1 else multiprocessing.Manager().dict() - self.matched_op_stacks = {} - self.empty_stacks = True - self.framework_black_list = False - - @abstractmethod - def make_record(self, result: OptimizeResult): - pass - - @abstractmethod - def make_render(self, html_render): - pass - - def query_stack(self, event_dataset: TimelineEventDataset = None, profiling_with_stack: str = None): - if all([len(matched_index) == 0 for matched_index in self._matched_op_index.values()]): - return - - event_dataset = event_dataset if not profiling_with_stack else TimelineEventDataset( - collection_path=profiling_with_stack, data={}, _datasets={}, analysis_mode="fusion_ops", - build_dataset=False) - - op_stack_list = event_dataset.parse_data_with_generator(self._query_stack_by_matched_index) - for op_stack in op_stack_list: - for op, stack in op_stack.items(): - if op not in self.matched_op_stacks: - self.matched_op_stacks[op] = {} - if stack == const.TIMELINE_FUSION_OPS_NO_STACK_FLAG: - continue - if stack not in self.matched_op_stacks[op]: - self.matched_op_stacks[op][stack] = 0 - self.matched_op_stacks[op][stack] += 1 - - def _query_stack_by_matched_index(self, index, event): - stack_record = {} - event = TimelineEvent(event) - - matched_ops = [] - for op, matched_index in self._matched_op_index.items(): - if index not in matched_index: - continue - - matched_ops.append(op) - stack = event.args.get(const.CALL_STACKS) - - if not stack: - 
logger.debug("Got empty '%s' for event %s", const.CALL_STACKS, event) - continue - - if not self._is_keep_stack(stack): - self.framework_black_list = True - logger.debug("Drop stack from framework %s", const.FRAMEWORK_STACK_BLACK_LIST) - continue - - if self.empty_stacks and stack: - self.empty_stacks = False - - stack_record[op] = stack - - if matched_ops and not stack_record: - for op in matched_ops: - stack_record[op] = const.TIMELINE_FUSION_OPS_NO_STACK_FLAG - - return stack_record - - def _is_keep_stack(self, stack): - # 过滤掉torch, torch_npu, megatron, deepspeed等框架下的堆栈,这些源码基本是不能被修改的 - stack_list = stack.replace("\\r\\n", ";").split(";") - if not stack_list: - return False - - final_called_stack = stack_list[0] - for framework in const.FRAMEWORK_STACK_BLACK_LIST: - if framework in final_called_stack.split("/"): - return False - return True diff --git a/profiler/advisor/cluster_perf_analysis.ipynb b/profiler/advisor/cluster_perf_analysis.ipynb deleted file mode 100644 index 7ee0b24e85467fe42205c5986095a7e66bf0a636..0000000000000000000000000000000000000000 --- a/profiler/advisor/cluster_perf_analysis.ipynb +++ /dev/null @@ -1,1042 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 2, - "id": "initial_id", - "metadata": { - "ExecuteTime": { - "end_time": "2023-11-21T13:31:25.022339600Z", - "start_time": "2023-11-21T13:31:25.016155200Z" - } - }, - "outputs": [], - "source": [ - "import sys\n", - "sys.path.append(\"../..\")" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "c552da9d-36f9-43d3-ae1f-c54f78d3ff2d", - "metadata": {}, - "outputs": [], - "source": [ - "from profiler.advisor.interface.interface import Interface\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from prettytable import PrettyTable, ALL\n", - "from textwrap import fill" - ] - }, - { - "cell_type": "markdown", - "id": "57d17a21205c3c5e", - "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } - }, - "source": [ - "# 集群调优分析\n", - "## 1. 集群分析的数据准备\n", - "首先我们当前支持PyTorch多卡大模型的集群分析,您需要输入集群分析的profiling_path路径,例如: \n", - "--{profiling_path} \n", - " -- xxxx_ascend_pt \n", - " -- xxxx_ascend_pt \n", - " -- xxxx_ascend_pt \n", - " ...... \n", - " -- xxxx_ascend_pt \n", - "里面每张卡的profiling文件都是ascend_pt结尾的文件。 \n", - "\n", - "## 2. 集群分析解决的问题 \n", - "当前的功能主要有四项: \n", - "1). 识别多卡间的计算慢卡(根据计算时间等推断) \n", - "2). 识别多卡间的通信慢现象(根据通信链路的带宽判断) \n", - "3). 对多卡间的计算算子进行统计展示(识别不同卡的算子差异) \n", - "4). 
{ -   "cell_type": "code", -   "execution_count": 4, -   "id": "36b7a24cc7ca5da2", -   "metadata": { -    "ExecuteTime": { -     "end_time": "2023-11-21T12:53:38.379699800Z", -     "start_time": "2023-11-21T12:53:38.363755900Z" -    }, -    "collapsed": false, -    "jupyter": { -     "outputs_hidden": false -    } -   }, -   "outputs": [], -   "source": [ -    "# EDIT THE PROFILING DATA PATH\n", -    "cluster_path = r\"YOUR PROFILING PATH\"\n", -    "interface = Interface(profiling_path=cluster_path)" -   ] -  }, -  { -   "cell_type": "markdown", -   "id": "cf832ac2e0dfa30f", -   "metadata": { -    "collapsed": false, -    "jupyter": { -     "outputs_hidden": false -    } -   }, -   "source": [ -    "## 1) Identifying slow ranks" -   ] -  }, -  { -   "cell_type": "code", -   "execution_count": 5, -   "id": "40aac93278dd6e34", -   "metadata": { -    "ExecuteTime": { -     "end_time": "2023-11-21T12:53:41.815599700Z", -     "start_time": "2023-11-21T12:53:41.783393700Z" -    }, -    "collapsed": false, -    "jupyter": { -     "outputs_hidden": false -    } -   }, -   "outputs": [ -    { -     "name": "stdout", -     "output_type": "stream", -     "text": [ -      "[INFO]Cluster has been analyzed because of the existence of cluster analysis output directory.\n", -      "[INFO]Skip Cluster analyze backend.\n" -     ] -    } -   ], -   "source": [ -    "slow_rank_result = interface.get_result(\"cluster\", \"slow_rank\")" -   ] -  }, -  { -   "cell_type": "code", -   "execution_count": 6, -   "id": "0e943b2a-37a6-4db6-9e70-235d397f1d39", -   "metadata": {}, -   "outputs": [ -    { -     "data": { -      "text/html": [
" - ], - "text/plain": [ - "+---------+--------------------+--------------------+--------------------+\n", - "| rank_id | compute | communication | free |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 0 | 28976239.07999987 | 7586795.419999811 | 6836641.679994211 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 1 | 29012279.100000102 | 6984613.220000025 | 7388343.859991224 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 2 | 29019115.32300051 | 7489956.633000028 | 6881360.253991371 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 3 | 29027089.560000077 | 7963312.239999794 | 6389981.899993688 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 4 | 29044786.93699965 | 6533618.639000017 | 7780517.1539908135 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 5 | 29178186.259999853 | 7925184.420000028 | 6286867.999995028 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 6 | 29025331.189999904 | 6386639.90799992 | 7941798.704992032 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 7 | 29056803.304999545 | 7234444.826000024 | 7094608.035991492 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 8 | 31383314.980000228 | 3973806.6169999996 | 8017981.379989724 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 9 | 31360536.36200019 | 4757458.825000002 | 7277062.386991671 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 10 | 31381891.800000463 | 5276870.359999998 | 6731073.659992552 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 11 | 31387777.38000033 | 4727362.3000000045 | 7297578.339992355 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 12 | 31374132.74499977 | 5164443.388000004 | 6829798.933991944 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 13 | 31377800.178999804 | 4360616.283000001 | 7624691.509991412 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 14 | 31374658.360000316 | 4457099.620000001 | 7542724.319990785 |\n", - "+---------+--------------------+--------------------+--------------------+\n", - "| 15 | 31387255.527000006 | 5000860.905 | 6975264.115991174 |\n", - "+---------+--------------------+--------------------+--------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "slow_rank_data = slow_rank_result.get(\"slow_rank_analysis\")\n", - "if slow_rank_data:\n", - " slow_rank_table = PrettyTable(slow_rank_data.get(\"headers\"))\n", - " for row in slow_rank_data.get(\"data\"):\n", - " row = [fill(str(element), width=80) for element in row]\n", - " slow_rank_table.add_row(row)\n", - " slow_rank_table.hrules = ALL\n", - " display(slow_rank_table[:16])" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "57a9b1c6-4127-47a2-8699-3c983950bd84", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
" - ], - "text/plain": [ - "+--------------------+--------------------------------------------------------------------------------------------------+\n", - "| problem | description |\n", - "+--------------------+--------------------------------------------------------------------------------------------------+\n", - "| slow_rank_analysis | Computing has some issues in the cluster, because the max difference of Computing time |\n", - "| | has reached 2411.538ms. Communication has some issues in the cluster, because the max |\n", - "| | difference of Communication time has reached 3989.506ms. |\n", - "+--------------------+--------------------------------------------------------------------------------------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "problems = slow_rank_result.get(\"problems\")\n", - "headers = problems.get('headers')[:2]\n", - "if problems: # 如果存在相关问题则获取相关问题检测描述及建议\n", - " problem_table = PrettyTable(headers)\n", - " for row in problems.get(\"data\"):\n", - " row = [fill(str(element), width=100) for element in row]\n", - " problem_table.add_row(row[:2])\n", - " display(problem_table)\n", - "else:\n", - " print(\"There is no suggestion related to slow rank analysis.\")" - ] - }, - { - "cell_type": "markdown", - "id": "3511befaff513e8e", - "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } - }, - "source": [ - "## 2)识别通信链路慢" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "2a1e617d2a117125", - "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[INFO]Cluster has been analyzed because of the existence of cluster analysis output directory.\n", - "[INFO]Skip Cluster analyze backend.\n" - ] - } - ], - "source": [ - "slow_link_result = interface.get_result(\"cluster\", \"slow_link\")" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "c8bca314-a8da-4a5b-985a-c36f00154552", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
" - ], - "text/plain": [ - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| rank_id | RDMA bandwidth(GB/s) | RDMA size(mb) | RDMA time(ms) | SDMA bandwidth(GB/s) | SDMA size(mb) | SDMA time(ms) |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 0 | 0 | 0 | 0 | 9.7668 | 42507.3469439998 | 4352.225880000002 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 1 | 0 | 0 | 0 | 10.1653 | 42507.346775999795 | 4181.611080000001 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 2 | 0 | 0 | 0 | 10.471 | 42507.346775999795 | 4059.527798999999 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 3 | 0 | 0 | 0 | 9.9691 | 42507.346775999795 | 4263.9230400000015 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 4 | 0 | 0 | 0 | 9.1469 | 42507.346775999795 | 4647.202435000001 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 5 | 0 | 0 | 0 | 9.4663 | 42507.346775999795 | 4490.373999999999 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 6 | 0 | 0 | 0 | 9.5692 | 42507.346775999795 | 4442.106745000001 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 7 | 0 | 0 | 0 | 9.8444 | 42507.346775999795 | 4317.931616999999 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 8 | 0 | 0 | 0 | 18.895 | 42507.389952 | 2249.662369 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 9 | 0 | 0 | 0 | 18.9112 | 42507.39080800006 | 2247.7420159999997 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 10 | 0 | 0 | 0 | 18.7713 | 42507.39080800006 | 2264.48576 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 11 | 0 | 0 | 0 | 18.8389 | 42507.39080800006 | 2256.3606000000004 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 12 | 0 | 0 | 0 | 18.7687 | 42507.39080800006 | 2264.8021099999996 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 13 | 0 | 0 | 0 | 18.9717 | 42507.39080800006 | 2240.5713950000004 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 14 | 0 | 0 | 0 | 18.9226 | 42507.39080800006 | 2246.381839999999 |\n", - 
"+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+\n", - "| 15 | 0 | 0 | 0 | 18.8346 | 42507.39080800006 | 2256.8781 |\n", - "+---------+----------------------+---------------+---------------+----------------------+--------------------+--------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "slow_link_data = slow_link_result.get(\"slow_link_analysis\")\n", - "if slow_link_data:\n", - " slow_link_table = PrettyTable(slow_link_data.get(\"headers\"))\n", - " for row in slow_link_data.get(\"data\"):\n", - " for i in range(len(row)):\n", - " row[i] = fill(str(row[i]), width=60)\n", - " slow_link_table.add_row(row)\n", - " slow_link_table.hrules = ALL\n", - " display(slow_link_table[:16])" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "77d6efa1-48e3-409f-82c4-3e2b3d868898", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
" - ], - "text/plain": [ - "+--------------------+------------------------------------------------------------------------------------------------------+\n", - "| problem | description |\n", - "+--------------------+------------------------------------------------------------------------------------------------------+\n", - "| slow_rank_analysis | Computing has some issues in the cluster, because the max difference of Computing time |\n", - "| | has reached 2411.538ms. Communication has some issues in the cluster, because the max |\n", - "| | difference of Communication time has reached 3989.506ms. |\n", - "| slow_link_analysis | SDMA bandwidth(GB/s): The average is 14.332, while the maximum is 18.972GB/s and the |\n", - "| | minimum is 9.147GB/s. the difference is 9.825GB/s. |\n", - "+--------------------+------------------------------------------------------------------------------------------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "problems = slow_link_result.get(\"problems\")\n", - "headers = problems.get('headers')[:2]\n", - "if problems: # 如果存在相关问题则获取相关问题检测描述及建议\n", - " problem_table = PrettyTable(headers)\n", - " for row in problems.get(\"data\"):\n", - " row = [fill(str(element), width=100) for element in row]\n", - " problem_table.add_row(row[:2])\n", - " display(problem_table)\n", - "else:\n", - " print(\"There is no suggestion related to slow link analysis.\")" - ] - }, - { - "cell_type": "markdown", - "id": "ce27a1d3-1354-45f7-88d8-dcb8e438b2b2", - "metadata": {}, - "source": [ - "## 3) 分布式卡上的kernel算子统计展示" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "id": "466a0f30-042c-492a-bbf2-a5a85b649f95", - "metadata": {}, - "outputs": [], - "source": [ - "from advisor_backend.interface import Interface\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "id": "e05774e9-c47e-400f-8421-b4b71bcdcbc4", - "metadata": {}, - "outputs": [], - "source": [ - "interface = Interface(cluster_path)\n", - "dataset = interface.get_data('cluster', 'kernel')" - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "id": "e95b6849-1738-4975-929f-734edff5d1c1", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
rank idNameInput ShapesInput Data TypesOutput ShapesDuration(us)_meanDuration(us)_varDuration(us)_maxDuration(us)_minDuration(us)_countDuration(us)_sum
00Add100\"4096,10880;4096,10880\"FLOAT;FLOAT\"4096,10880\"478.210918237.729252721.420449.801024489687.980
10Add102\"21760;21760\"FLOAT;FLOAT\"21760\"4.3903910.0119154.8203.9810244495.760
20Add106\"21760,4096;21760,4096\"FLOAT;FLOAT\"21760,4096\"933.504395462.9793211257.140927.381024955908.500
30Add111\"4096,4096;4096,4096\"FLOAT;FLOAT\"4096,4096\"91.2673632.15827597.12085.12102493457.780
40Add118\"12288,4096;12288,4096\"FLOAT;FLOAT\"12288,4096\"526.3120121462.617511787.780424.241024538943.500
....................................
251315trans_Cast_12\"4096,1,1,128\"FLOAT\"4096,1,1,128\"8.4864950.0601749.8208.20204817380.342
251415trans_Cast_13\"4096,1,1,128\"FLOAT\"4096,1,1,128\"10.5345640.16638012.9009.48204821574.787
251515trans_Cast_14\"4096,1,1,128\"FLOAT\"4096,1,1,128\"9.7845510.29536813.0218.56204820038.761
251615trans_Cast_15\"4096,1,1,128\"DT_BF16\"4096,1,1,128\"8.3422110.12047110.2207.86204817084.848
251715trans_Cast_16\"4096,1,1,128\"DT_BF16\"4096,1,1,128\"9.5075890.11711111.6819.18204819471.543
\n", - "

2518 rows × 11 columns

\n", - "
" - ], - "text/plain": [ - " rank id Name Input Shapes Input Data Types \\\n", - "0 0 Add100 \"4096,10880;4096,10880\" FLOAT;FLOAT \n", - "1 0 Add102 \"21760;21760\" FLOAT;FLOAT \n", - "2 0 Add106 \"21760,4096;21760,4096\" FLOAT;FLOAT \n", - "3 0 Add111 \"4096,4096;4096,4096\" FLOAT;FLOAT \n", - "4 0 Add118 \"12288,4096;12288,4096\" FLOAT;FLOAT \n", - "... ... ... ... ... \n", - "2513 15 trans_Cast_12 \"4096,1,1,128\" FLOAT \n", - "2514 15 trans_Cast_13 \"4096,1,1,128\" FLOAT \n", - "2515 15 trans_Cast_14 \"4096,1,1,128\" FLOAT \n", - "2516 15 trans_Cast_15 \"4096,1,1,128\" DT_BF16 \n", - "2517 15 trans_Cast_16 \"4096,1,1,128\" DT_BF16 \n", - "\n", - " Output Shapes Duration(us)_mean Duration(us)_var Duration(us)_max \\\n", - "0 \"4096,10880\" 478.210918 237.729252 721.420 \n", - "1 \"21760\" 4.390391 0.011915 4.820 \n", - "2 \"21760,4096\" 933.504395 462.979321 1257.140 \n", - "3 \"4096,4096\" 91.267363 2.158275 97.120 \n", - "4 \"12288,4096\" 526.312012 1462.617511 787.780 \n", - "... ... ... ... ... \n", - "2513 \"4096,1,1,128\" 8.486495 0.060174 9.820 \n", - "2514 \"4096,1,1,128\" 10.534564 0.166380 12.900 \n", - "2515 \"4096,1,1,128\" 9.784551 0.295368 13.021 \n", - "2516 \"4096,1,1,128\" 8.342211 0.120471 10.220 \n", - "2517 \"4096,1,1,128\" 9.507589 0.117111 11.681 \n", - "\n", - " Duration(us)_min Duration(us)_count Duration(us)_sum \n", - "0 449.80 1024 489687.980 \n", - "1 3.98 1024 4495.760 \n", - "2 927.38 1024 955908.500 \n", - "3 85.12 1024 93457.780 \n", - "4 424.24 1024 538943.500 \n", - "... ... ... ... \n", - "2513 8.20 2048 17380.342 \n", - "2514 9.48 2048 21574.787 \n", - "2515 8.56 2048 20038.761 \n", - "2516 7.86 2048 17084.848 \n", - "2517 9.18 2048 19471.543 \n", - "\n", - "[2518 rows x 11 columns]" - ] - }, - "execution_count": 69, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dataset" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "27b75df4-792b-43dc-aa5c-d3c265642c1e", - "metadata": {}, - "outputs": [], - "source": [ - "# 保存到csv查看, 可修改保存路径\n", - "dataset.to_csv('cluster_kernel_details.csv', index=False, sep='\\t')" - ] - }, - { - "cell_type": "markdown", - "id": "ae45826394463cc4", - "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } - }, - "source": [ - "## 4) 展示集群流水并行图\n", - "使用说明: \n", - "1). 需要使用Ascend Torch Profiler采集数据,如果需要展示FP和BP需要将activities设置为采集CPU和NPU \n", - "2). rank_ids为要展示的rank id列表,必选参数, 可视化顺序与rank_ids的顺序一致 \n", - "3). worker_num为多进程数量,可选参数,请根据机器配置调整,默认值为机器可用核心数的一半 \n", - "4). 如果没有采集CPU数据,则展示Stage和Bubble的流水图 \n", - "5). 
{ -   "cell_type": "code", -   "execution_count": 70, -   "id": "baf66781eccfbca1", -   "metadata": { -    "collapsed": false, -    "jupyter": { -     "outputs_hidden": false -    } -   }, -   "outputs": [ -    { -     "name": "stdout", -     "output_type": "stream", -     "text": [ -      "[INFO] Start to process 8 rank profiling data with 8 workers.\n", -      "[INFO] Pipline view data process finished, cost 98.48s.\n" -     ] -    } -   ], -   "source": [ -    "import json\n", -    "\n", -    "# rank_ids is the list of rank ids to display, a required parameter\n", -    "# a list comprehension can be used to build the rank_ids you need; the display order follows rank_ids\n", -    "# worker_num is the number of worker processes, optional; adjust it to your machine, the default is half of the available cores\n", -    "dataset = interface.get_data(\"cluster\", \"pipeline\", rank_ids=[0, 1, 2, 3, 4, 5, 6, 7], worker_num=8)\n", -    "\n", -    "# save the json data and view it in chrome trace\n", -    "with open(\"./pipeline_view.json\", \"w\") as f:\n", -    "    json.dump(dataset.get(\"data\", []), f)" -   ] -  }, -  { -   "cell_type": "code", -   "execution_count": null, -   "id": "5f34ecf5-5c4a-4bc0-a761-e6338e534bac", -   "metadata": {}, -   "outputs": [], -   "source": [] -  } - ], - "metadata": { -  "kernelspec": { -   "display_name": "Python 3 (ipykernel)", -   "language": "python", -   "name": "python3" -  }, -  "language_info": { -   "codemirror_mode": { -    "name": "ipython", -    "version": 3 -   }, -   "file_extension": ".py", -   "mimetype": "text/x-python", -   "name": "python", -   "nbconvert_exporter": "python", -   "pygments_lexer": "ipython3", -   "version": "3.9.10" -  } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/profiler/advisor/common/__init__.py b/profiler/advisor/common/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/common/analyzer_scopes.py b/profiler/advisor/common/analyzer_scopes.py deleted file mode 100644 index 52e3e07554f354deb62222ee0de6e66ef8b07e2e..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/analyzer_scopes.py +++ /dev/null @@ -1,18 +0,0 @@ -class SupportedScopes: - -    # used to specify fourth-level commands and define the keys of the result dict -    # each key defined below must be the same as its value -    TIMELINE_FUSION_OPS = "timeline_fusion_ops" -    GRAPH = "graph" -    SLOW_RANK = "slow_rank" -    SLOW_LINK = "slow_link" -    OVER_ALL = "over_all" -    DYNAMIC_SHAPE_ANALYSIS = "dynamic_shape_analysis" -    AICPU_ANALYSIS = "aicpu_analysis" -    BLOCK_DIM_ANALYSIS = "block_dim_analysis" -    OPERATOR_NO_BOUND_ANALYSIS = "operator_no_bound_analysis" -    TIMELINE_OP_DISPATCH = "timeline_op_dispatch" -    DATALOADER = "dataloader" -    SYNCBN = "syncbn" -    SYNCHRONIZE_STREAM = "synchronize_stream" -    FREQ_ANALYSIS = "freq_analysis"
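These scope strings are the same keys passed as the second argument of `Interface.get_result` in the notebook above, so queries can also be written against the constants rather than bare strings (a small sketch, assuming the package is importable in your environment):

```python
from profiler.advisor.common.analyzer_scopes import SupportedScopes
from profiler.advisor.interface.interface import Interface

interface = Interface(profiling_path="YOUR PROFILING PATH")
# SupportedScopes.SLOW_RANK == "slow_rank", so this is identical to
# interface.get_result("cluster", "slow_rank") used earlier.
slow_rank_result = interface.get_result("cluster", SupportedScopes.SLOW_RANK)
```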
diff --git a/profiler/advisor/common/constant.py b/profiler/advisor/common/constant.py deleted file mode 100644 index 87245a43ea33981e929a16717357d9c7d1713aff..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/constant.py +++ /dev/null @@ -1,146 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# timeline -DEQUEUE = "Dequeue" -DEQUEUE_SEP = "@" -ATEN = "aten" -NPU = "npu" -ATEN_SEP = "::" -OPTIMIZER = "Optimizer" -OPTIMIZER_SEP = "#" -OPTIMIZER_STEP = "step" -ENQUEUE = "enqueue" -TORCH_TO_NPU = "torch_to_npu" -OP_COMPILE_NAME = "AscendCL@aclopCompileAndExecute" -OP_COMPILE_ID = "aclopCompileAndExecute" -SYNC_STREAM = "AscendCL@aclrtSynchronizeStream" -MAX_OP_COMPILE_NUM = 20 -ACL_TO_NPU = "acl_to_npu" -TASK_TYPE = "Task Type" -CPU_OP = "cpu_op" -AI_CORE = "AI_CORE" -AI_CPU = "AI_CPU" -CALL_STACKS = "Call stack" -INPUT_DIMS = "Input Dims" -OP_SEP = "-" -MA_ADVISOR_MAX_PROCESSES = 16 -MA_ADVISOR_ANALYZE_PROCESSES = "MA_ADVISOR_ANALYZE_PROCESSES" -TIMELINE_OP_STACKS_DATASET = "timeline_op_stacks_dataset" -TIMELINE_BACKWARD_NO_STACK = "Backward broadcast, without call stacks in profiling." -TIMELINE_ACL_TO_NPU_NO_STACK = "Incoming flow is 'acl_to_npu', without call stacks in profiling." -TIMELINE_BACKWARD_NO_STACK_CODE = -1 -TIMELINE_ACL_TO_NPU_NO_STACK_CODE = -2 -TIMELINE_FUSION_OPS_NO_STACK_FLAG = "NO STACK" -NO_STACK_REASON_MAP = { -    TIMELINE_BACKWARD_NO_STACK_CODE: "Backward broadcast, without call stacks in profiling.", -    TIMELINE_ACL_TO_NPU_NO_STACK_CODE: "Incoming flow is 'acl_to_npu', without call stacks in profiling." -} -TIMELINE_API_DOC_URL = "https://gitee.com/ascend/mstt/blob/master/profiler/advisor/doc" \ -                       "/Samples%20of%20Fused%20Operator%20API%20Replacement.md" -AFFINITY_TRAINING_API = "Affinity training api" -TIMELINE_WITH_STACK_DOC_URL = "https://www.hiascend.com/document/detail/zh/canncommercial/" \ -                              "70RC1/modeldevpt/ptmigr/AImpug_0067.html" -PyTorch_AOE_OPERATOR_TUNE_URL = "https://www.hiascend.com/document/detail/zh/canncommercial/" \ -                                "70RC1/devtools/auxiliarydevtool/aoe_16_045.html" -MSLite_Infer_AOE_OPEATOR_TUNE_URL = "https://www.mindspore.cn/lite/docs/en/master/use/cloud_infer/converter_tool_ascend.html#aoe-auto-tuning" -ENABLE_COMPILED_TUNE_URL = "https://www.hiascend.com/document/detail/zh/canncommercial/" \ -                           "70RC1/modeldevpt/ptmigr/AImpug_0059.html" - -ASCEND_PROFILER_URL = "https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/modeldevpt/ptmigr/AImpug_0067.html" -TIMELINE_EMPTY_STACKS_PROMPT = "These APIs have no code stack. If parameter 'with_stack=False' while profiling, " \ -                               "please refer to {timeline_profiling_doc_url} to set 'with_stack=True'. " \ -                               "Otherwise, ignore following affinity APIs due to backward broadcast lack of stack."
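The prompt constants above are ordinary `str.format` templates; `TIMELINE_EMPTY_STACKS_PROMPT`, for example, carries a `{timeline_profiling_doc_url}` placeholder. A minimal sketch of rendering it (passing `ASCEND_PROFILER_URL` is one plausible choice of URL):

```python
# Fill the placeholder in the prompt with the profiler documentation URL defined above.
hint = TIMELINE_EMPTY_STACKS_PROMPT.format(timeline_profiling_doc_url=ASCEND_PROFILER_URL)
print(hint)
```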
- -CLUSTER_ANALYSIS = "Cluster analysis" -SLOW_RANK_TIME_RATIO_THRESHOLD = 0.05 - -# version_control -CANN_VERSION_C30 = '6.3.RC2' -CANN_VERSION_C13 = '7.0.RC1' -CANN_VERSION_C15 = '7.0.0' -CANN_VERSION_C17 = '8.0.RC1' -SUPPORTED_CANN_VERSION = [CANN_VERSION_C30, CANN_VERSION_C13, CANN_VERSION_C15, CANN_VERSION_C17] -DEFAULT_CANN_VERSION = CANN_VERSION_C17 -ASCEND_PYTORCH_PROFILER = "ascend_pytorch_profiler" -MSLITE = "mslite" -MSPROF = "msprof" -SUPPORTED_PROFILING_TYPE = [ASCEND_PYTORCH_PROFILER, MSLITE, MSPROF] -DEFAULT_PROFILING_TYPE = ASCEND_PYTORCH_PROFILER -TORCH_VERSION_1_11_0 = '1.11.0' -TORCH_VERSION_2_1_0 = '2.1.0' - -SUPPORTED_TORCH_VERSION = [TORCH_VERSION_1_11_0, TORCH_VERSION_2_1_0] -DEFAULT_TORCH_VERSION = TORCH_VERSION_2_1_0 - -TERMINAL_OUTPUT_HEADERS = ["No.", "Problem", "Description", "Suggestion"] -SKIP_ANALYZE_PROMPT = "Finish analysis, no optimization suggestions" -SKIP_QUERY_PROMPT = "Finish query operator stack, no operators" - -# operator output constant -OPERATOR_OUT_TOPK = 10 -OPERATOR_LIST_UNLIMIT = -1 - -DEFAULT_OPERATOR_TYPE = 'None_type' -DEFAULT_DURATION_ZERO = 0.0 - -ADVISOR_LOG_LEVEL = "ADVISOR_LOG_LEVEL" -DEFAULT_LOG_LEVEL = "INFO" -SUPPORTED_LOG_LEVEL = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"] - -RULE_BUCKET = "RULE-BUCKET" -CLOUD_RULE_REGION_CN_NORTH_9 = "cn-north-9" -CLOUD_RULE_REGION_CN_NORTH_7 = "cn-north-7" -CLOUD_RULE_REGION_CN_SOUTHWEST_2 = "cn-southwest-2" -CLOUD_RULE_REGION_LIST = [CLOUD_RULE_REGION_CN_NORTH_7, CLOUD_RULE_REGION_CN_NORTH_9, CLOUD_RULE_REGION_CN_SOUTHWEST_2] -INNER_REGION_LIST = [CLOUD_RULE_REGION_CN_NORTH_7] -DEFAULT_CLOUD_RULE_REGION = CLOUD_RULE_REGION_CN_SOUTHWEST_2 - -HTTP_PREFIXES = "http://" -HTTPS_PREFIXES = "https://" -COMMON_YAML_DIR = "modelarts/solution/ma_advisor_rules/" -COMMON_ENDPOINT_SUFFIX = "obs.{}.myhuaweicloud.com" -INNER_ENDPOINT_SUFFIX = "obs.{}.ulanqab.huawei.com" - -AICPU_RULES_YAML_NAME = "aicpu_rules.yaml" -FUSION_PASS_YAML_NAME = "op_fusion_pass.yaml" -TIMELINE_FUSION_OPS_YAML_NAME = "timeline_fusion_ops.yaml" -CLOUD_YAML_NAME_LIST = [AICPU_RULES_YAML_NAME, FUSION_PASS_YAML_NAME, TIMELINE_FUSION_OPS_YAML_NAME] - -MAX_RETRIES = 3 -TIMEOUT = 3 - -ADVISOR_RULE_PATH = "ADVISOR_RULE_PATH" -CLOUD_RULE_PATH = "rules/cloud/" -DEFAULT_RULE_PATH = "./rules/" - -TIMELINE_FUSION_OPS_INVALID_UNIQUE_ID = -1 - -DEFAULT_TEMPLATE_HEADER = "Performance Optimization Suggestions" - -PT_PROF_SUFFIX = "ascend_pt" -ASCEND_PROFILER_OUTPUT = "ASCEND_PROFILER_OUTPUT" -COLLECTION_PATH = "collection_path" -CLUSTER_ANALYSIS_OUTPUT = "cluster_analysis_output" -KERNEL_DETAILS_CSV = "kernel_details.csv" -CLUSTER_STEP_TIME_CSV = "cluster_step_trace_time.csv" -CLUSTER_COMM_JSON = "cluster_communication.json" - -BOTTLENECK = "bottleneck" -DATA = "data" - -FRAMEWORK_STACK_BLACK_LIST = ["torch", "torch_npu", "megatron", "deepspeed"] -DISABLE_STREAMING_READER = "DISABLE_STREAMING_READER" -MAX_FILE_SIZE = 10**10 diff --git a/profiler/advisor/common/graph/__init__.py b/profiler/advisor/common/graph/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/common/graph/graph.py b/profiler/advisor/common/graph/graph.py deleted file mode 100644 index 6bab2042de3a09f9317f71fc6a5c9740743cc790..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/graph/graph.py +++ /dev/null @@ -1,135 +0,0 @@ -import logging -from typing import Dict, List, Tuple, Callable, Any, Optional, Union - -import networkx as nx - -from 
profiler.advisor.common.graph.graph_parser import HostGraphNode, QueryGraphNode - -logger = logging.getLogger() - -
-class Graph: -    """ -    Graph Struct -    """ - -    # pylint: disable=too-many-instance-attributes -    def __init__(self, -                 nodes: Dict[str, Optional[Union[HostGraphNode, QueryGraphNode]]] = None, -                 edges: List[Tuple[Optional[Union[HostGraphNode, QueryGraphNode]], -                                   Optional[Union[HostGraphNode, QueryGraphNode]]]] = None, -                 name: str = None): -        self.name = name -        self.graph = nx.DiGraph(name=name) -        self.nodes = nodes if nodes is not None else {} -        self.edges = edges if edges is not None else list() - -    def build(self): -        for op_name, node in self.nodes.items(): -            # add node and mark op_name as tag -            self.add_node(node, -                          op_type=node.op_type -                          ) -        for edge in self.edges: -            self.add_edge(*edge) -        return self.graph - -    def get_size(self) -> Dict[str, int]: -        if not hasattr(self.graph, "nodes"): -            return {"edges": 0, "nodes": 0} - -        return {"edges": len(self.graph.edges), -                "nodes": len(self.graph.nodes)} - -    def add_node(self, node: HostGraphNode, **kwargs): -        if node is None: -            return -        self.graph.add_node(node, **kwargs) - -    def add_edge(self, pre_node: HostGraphNode, next_node: HostGraphNode): -        if pre_node is None or next_node is None: -            return - -        if pre_node not in self.graph or \ -                next_node not in self.graph: -            logger.error("Both nodes of an edge must already exist in the graph.") -            return - -        self.graph.add_edge(pre_node, next_node) - -    def add_node_with_edge(self, node, adj_nodes: List[HostGraphNode]): -        self.add_node(node) -        for adj in adj_nodes: -            self.add_edge(node, adj) - -    def remove_node(self, node: HostGraphNode = None) -> None: -        if node is None: -            return - -        self.graph.remove_node(node) - -    def remove_edge(self, pre_node: HostGraphNode = None, next_node: HostGraphNode = None) -> None: -        if pre_node is None or next_node is None: -            raise ValueError(f"Invalid edge from {pre_node} to {next_node}.") - -        self.graph.remove_edge(pre_node, next_node) - -    def get_subgraph(self, nodes: List[HostGraphNode]) -> nx.DiGraph: -        nodes = list(set(nodes)) -        for node in nodes: -            if not self.is_node_exists(node): -                raise ValueError(f"Failed to extract the subgraph because {node.op_name} is not in the graph.") - -        return self.graph.subgraph(nodes) - -    def highlight_subgraph(self, subgraph: nx.DiGraph = None) -> None: -        pass - -    def get_node(self, node: HostGraphNode): -        if node not in self.graph: -            return - -        return self.graph[node] - -    def get_node_by_name(self, node_name: str): -        return self.nodes.get(node_name, None) - -    def is_node_exists(self, node: HostGraphNode): -        return node in self.graph - -    def draw(self, -             graph: nx.DiGraph = None, -             with_labels: bool = False, -             labels: Dict[HostGraphNode, Any] = None, -             pos_func: Callable = None, -             font_weight: str = "bold", -             savefig: bool = False, -             node_size: int = 50, -             **kwargs -             ): -        try: -            import matplotlib.pylab as plt -        except ImportError: -            logger.error('Please install matplotlib first by using `pip install matplotlib`.') -            return - -        if graph is None: -            graph = self.graph - -        pos = pos_func(graph) if pos_func is not None else None - -        if with_labels: -            if labels is None: -                labels = {k: f"{k}\n({v['op_name']})" for k, v in graph.nodes.items()} - -        nx.draw(graph, -                with_labels=with_labels, -                pos=pos, -                node_size=node_size, -                font_weight=font_weight, -                labels=labels, -                **kwargs -                ) -        if savefig: -            plt.savefig(self.name + ".png") -        plt.show() diff --git a/profiler/advisor/common/graph/graph_match.py b/profiler/advisor/common/graph/graph_match.py deleted
file mode 100644 index d0dfc162952b0c52bf9ed73cef2ff18ff5ffda24..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/graph/graph_match.py +++ /dev/null @@ -1,355 +0,0 @@ -import itertools -import logging -from functools import lru_cache -from collections import deque -from typing import Dict, Generator, List, Callable, Hashable, Tuple - -import networkx as nx - - -@lru_cache() -def match_node_attr_fun(query_node: Hashable, - host_node: Hashable, - query_graph: nx.Graph, - host_graph: nx.Graph - ) -> bool: - """ - Check query node matches the attributes in host graph - - :param query_node: Query graph node - :param host_node: Host graph node - :param query_graph: Query Graph - :param host_graph: Host graph - :return: bool, match or not - """ - # get node attr - if query_node not in query_graph.nodes or host_node not in host_graph.nodes: - return False - - query_node = query_graph.nodes[query_node] - host_node = host_graph.nodes[host_node] - for attr, val in query_node.items(): - if attr not in host_node: - return False - if isinstance(host_node[attr], str) and isinstance(val, str): - if host_node[attr].lower() != val.lower(): - return False - else: - if host_node[attr] != val: - return False - return True - - -@lru_cache() -def match_node_struct_fun(query_node: Hashable, - host_node: Hashable, - query_graph: nx.Graph, - host_graph: nx.Graph - ) -> bool: - """ - Check query node matches the structure in host graph - - :param query_node: Query graph node - :param host_node: Host graph node - :param query_graph: Query Graph - :param host_graph: Host graph - :return: bool, match or not - """ - if query_node not in query_graph.nodes or host_node not in host_graph.nodes: - return False - - return host_graph.degree(host_node) >= query_graph.degree(query_node) - - -@lru_cache() -def match_edge_attr_fun(query_edge: Tuple[Hashable, Hashable], - host_edge: Tuple[Hashable, Hashable], - query_graph: nx.Graph, - host_graph: nx.Graph - ) -> bool: - """ - Check query edge matches the attr in host graph - - :param query_edge: Query graph edge - :param host_edge: Host graph edge - :param query_graph: Query Graph - :param host_graph: Host graph - :return: bool, match or not - """ - # get edge attr - if query_edge not in query_graph.edges or host_edge not in host_graph.edges: - return False - - query_edge = query_graph.edges[query_edge] - host_edge = host_graph.edges[host_edge] - for attr, val in query_edge.items(): - if attr not in host_edge: - return False - if isinstance(host_edge[attr], str) and isinstance(val, str): - if host_edge[attr].lower() != val.lower(): - return False - else: - if host_edge[attr] != val: - return False - return True - - -def find_isomorphisms(query_graph: nx.Graph, - host_graph: nx.Graph, - *args, - _node_attr_fun: Callable = match_node_attr_fun, - _node_struct_fun: Callable = match_node_struct_fun, - _edge_attr_fun: Callable = match_edge_attr_fun, - limit: int = None, - **kwargs) -> List[Dict[Hashable, Hashable]]: - """ - Find all the sub graphs that are isomorphic to query_graph in host_graph . 
- - :param query_graph: The graph object to query - :param host_graph: The graph object to be queried - :param args: Position args - :param _node_attr_fun: The function to match node attr - :param _node_struct_fun: The function to match node structural - :param _edge_attr_fun: The function to match edge attr - :param limit: The limitation for the number of returned mappings - :param kwargs: Keyword args - :return: Matched node mapping list - ``` - [{query_id: host_id, ...}, ...] - ``` - """ - candidates = [] - for query_result in find_isomorphisms_iter( - query_graph, - host_graph, - *args, - _node_attr_fun=_node_attr_fun, - _node_struct_fun=_node_struct_fun, - _edge_attr_fun=_edge_attr_fun, - **kwargs - ): - candidates.append(query_result) - if limit and len(candidates) >= limit: - return candidates - return candidates - - -def find_isomorphisms_iter(query_graph: nx.Graph, - host_graph: nx.Graph, - directed: bool = None, - _node_attr_fun: Callable = None, - _node_struct_fun: Callable = None, - _edge_attr_fun: Callable = None, - ) -> Generator[Dict[Hashable, Hashable], None, None]: - """ - A generation to find one isomorphic subgraph in host_graph for query_graph. - - :param query_graph: The graph object to query - :param host_graph: The graph object to be queried - :param directed: Whether direction should be considered during search - :param _node_attr_fun: The function to match node attr - :param _node_struct_fun: The function to match node structural - :param _edge_attr_fun: The function to match edge attr - :return: Yield mappings from query node IDs to host graph IDs: {query_id: host_id, ...} - - """ - if directed is None: - # query graph and host graph should consider directions. - if isinstance(query_graph, nx.DiGraph) and \ - isinstance(host_graph, nx.DiGraph): - directed = True - else: - directed = False - - # Initialize queue - dq = deque() - dq.appendleft({}) - - while len(dq) > 0: - backbone = dq.pop() - next_candidate_backbones = get_next_candidates(backbone=backbone, - query_graph=query_graph, - host_graph=host_graph, - directed=directed, - _node_attr_fun=_node_attr_fun, - _node_struct_fun=_node_struct_fun, - _edge_attr_fun=_edge_attr_fun, - ) - for candidate in next_candidate_backbones: - # find a legal isomorphism - if len(candidate) == len(query_graph): - yield candidate - else: - # continue to search - dq.appendleft(candidate) - - -def get_next_candidates( - backbone: Dict, - query_graph: nx.Graph, # noqa - host_graph: nx.Graph, # noqa - next_node: Hashable = None, - directed: bool = True, # noqa - _node_attr_fun: Callable = None, # noqa - _node_struct_fun: Callable = None, # noqa - _edge_attr_fun: Callable = None # noqa -) -> List[Dict[Hashable, Hashable]]: - """ - Get a list of candidate node assignments for the next "step" of this map. 
- -    :param backbone: Mapping of query node IDs to one set of host graph IDs -    :param next_node: Optional suggestion for the next node to assign -    :return: List[Dict[Hashable, Hashable]]: A new list of node mappings with one additional element mapped -    """ -    node_priority = {n: 1 for n in query_graph.nodes} -    candidate_nodes = [] - -    if next_node is None and len(backbone) == 0: -        # Start case -        next_node = max(node_priority.keys(), -                        key=lambda x: node_priority.get(x, 0)) - -        for node in host_graph.nodes: -            if _node_attr_fun(next_node, node, query_graph, host_graph) and \ -                    _node_struct_fun(next_node, node, query_graph, host_graph): -                candidate_nodes.append({next_node: node}) -        return candidate_nodes - -    nodes_with_maximum_backbone = [] -    for query_node_id in query_graph.nodes: -        if query_node_id in backbone: -            continue - -        backbone_neighbors = [] -        if not directed: -            backbone_neighbors = query_graph.adj[query_node_id] -        else: -            # nx.DiGraph.pred: A <- B: find previous node from B to A -            # nx.DiGraph.adj: A -> B : find next node from A to B -            backbone_neighbors = list(set(query_graph.adj[query_node_id]).union(set(query_graph.pred[query_node_id]))) - -        query_backbone_node_count = sum([1 for _node in backbone_neighbors if _node in backbone]) -        if query_backbone_node_count > 0: -            # Found a node connected to the current backbone -            nodes_with_maximum_backbone.append(query_node_id) - -    # next_node is connected to the current backbone. -    next_node = max(nodes_with_maximum_backbone, key=lambda x: node_priority.get(x, 0)) - -    # verify that all edges between `next_node` and nodes in the backbone exist in the host graph -    # Step 1: find all edges between `next_node` and nodes in the backbone -    next_edge_edges = [] -    for _node in query_graph.adj[next_node]: -        if _node in backbone: -            # `next_node` -> `_node` -            next_edge_edges.append((None, next_node, _node)) - -    if directed: -        for _node in query_graph.pred[next_node]: -            if _node in backbone: -                # `_node` -> `next_node` -                next_edge_edges.append((_node, next_node, None)) - -    if len(next_edge_edges) == 0: -        logging.warning("Found a node without any edge, which is invalid.") -        return [] -    # Step 2: collect candidate nodes that have such edges in the host graph -    candidate_nodes = [] -    if len(next_edge_edges) == 1: -        source, _, target = next_edge_edges[0] -        if not directed: -            candidate_nodes = list(host_graph.adj[backbone[target]]) -        else: -            if source is not None: -                # the edge runs from `source` (already assigned) into `next_node` -                candidate_nodes = list(host_graph.adj[backbone[source]]) -            elif target is not None: -                # the edge runs from `next_node` into `target` (already assigned) -                candidate_nodes = list(host_graph.pred[backbone[target]]) - -    elif len(next_edge_edges) > 1: -        candidate_nodes_set = set() -        # iterate over the collected edges (not the still-empty candidate list) -        for (source, _, target) in next_edge_edges: -            if not directed: -                candidate_nodes_from_this_edge = host_graph.adj[backbone[target]] -            else: -                if source is not None: -                    candidate_nodes_from_this_edge = host_graph.adj[backbone[source]] -                else:  # target is not None -                    candidate_nodes_from_this_edge = host_graph.pred[backbone[target]] - -            if len(candidate_nodes_set) > 0: -                candidate_nodes_set = candidate_nodes_set.intersection(candidate_nodes_from_this_edge) -            else: -                # Initialize candidate_nodes_set -                candidate_nodes_set.update(candidate_nodes_from_this_edge) -        candidate_nodes = list(candidate_nodes_set) - -    tentative_results = [] -    for _node in candidate_nodes: -        if all([_node not in backbone.values(), -                _node_attr_fun(next_node, _node, query_graph, host_graph), -                _node_struct_fun(next_node, _node, query_graph, host_graph)] -               ): -            tentative_results.append({**backbone, -                                      next_node: _node}) - -    final_candidates = check_edges_mapping(tentative_results, -                                           query_graph=query_graph, -                                           host_graph=host_graph, -                                           _edge_attr_fun=_edge_attr_fun) -    return final_candidates
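To make the flow above concrete, `find_isomorphisms` can be exercised on two toy graphs; assuming the functions from this file are in scope, with made-up node names and op types for illustration:

```python
import networkx as nx

# host graph: Conv2D -> BiasAdd -> Relu
host = nx.DiGraph()
host.add_nodes_from([("a", {"op_type": "Conv2D"}),
                     ("b", {"op_type": "BiasAdd"}),
                     ("c", {"op_type": "Relu"})])
host.add_edges_from([("a", "b"), ("b", "c")])

# query graph: BiasAdd -> Relu
query = nx.DiGraph()
query.add_nodes_from([("x", {"op_type": "BiasAdd"}), ("y", {"op_type": "Relu"})])
query.add_edge("x", "y")

# each mapping is {query_node: host_node, ...}; here: [{'x': 'b', 'y': 'c'}]
print(find_isomorphisms(query, host, limit=1))
```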
def check_edges_mapping(candidates: List[Dict[Hashable, Hashable]], -                        query_graph: nx.Graph, -                        host_graph: nx.Graph, -                        _edge_attr_fun: Callable = None -                        ) -> List[Dict[Hashable, Hashable]]: -    """ -    Check that all edges between the assigned nodes exist in the host graph. - -    :param candidates: mapping nodes candidates -    :param query_graph: The graph object to query -    :param host_graph: The graph object to be queried -    :param _edge_attr_fun: The function to match edge attr -    :return: -    """ -    monomorphism_candidates = [] - -    for candidate in candidates: -        if len(candidate) != len(query_graph): -            monomorphism_candidates.append(candidate) -            continue - -        all_pass_flag = True -        for edge_start, edge_end in query_graph.edges: -            # check edge in host graph -            if not host_graph.has_edge(candidate[edge_start], candidate[edge_end]): -                all_pass_flag = False -                break - -            # check edge attr -            if _edge_attr_fun is None or not _edge_attr_fun( -                    (edge_start, edge_end), -                    (candidate[edge_start], candidate[edge_end]), -                    query_graph, -                    host_graph -            ): -                all_pass_flag = False -                break - -        if all_pass_flag: -            monomorphism_candidates.append(candidate) - -    # Isomorphisms check -    final_candidates = [] -    for candidate in monomorphism_candidates: -        all_product = itertools.product(candidate.keys(), candidate.keys()) -        for edge_start, edge_end in all_product: -            if not query_graph.has_edge(edge_start, edge_end) and \ -                    host_graph.has_edge(candidate[edge_start], candidate[edge_end]): -                break -        else: -            final_candidates.append(candidate) -    return final_candidates diff --git a/profiler/advisor/common/graph/graph_parser.py b/profiler/advisor/common/graph/graph_parser.py deleted file mode 100644 index ef4dc4d681e0664c12120c9c8904ad48970a5840..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/graph/graph_parser.py +++ /dev/null @@ -1,414 +0,0 @@ -import os -import logging -import itertools -from collections import deque -from dataclasses import dataclass -from typing import List, Tuple, Dict - -from profiler.cluster_analyse.common_func.file_manager import FileManager - -logger = logging.getLogger() - -
-@dataclass -class Tensor: -    def __init__(self): -        super().__init__() -        self.shape = [] -        self.origin_shape = [] -        self.shape_range = [] -        self.origin_shape_range = [] -        self.dtype = "" -        self.origin_data_type = "" -        self.format = "" -        self.origin_format = [] - -
-@dataclass -class Attr: - -    def __init__(self): -        super().__init__() -        self.key = str() -        self.value = [] - -
-class HostGraphNode: -    def __init__(self): -        super().__init__() -        self.graph_name = str() -        self.op_name = str() -        self.op_type = str() -        self.inputs = [] -        self.input = [] -        self.outputs = [] -        self.output = [] -        self.strides = [] -        self.pads = [] -        self.groups = "" -        self.dilations = [] -        self.kernelname = "" -        self._attrs = [] - -    def __repr__(self): -        return f"<HostGraphNode {self.op_name} ({self.op_type})>" - -
-@dataclass -class HostGraph: -    def __init__(self): -        super().__init__() -        self.name = "" -        self.nodes = {} -        self.inputs = [] -        self.edges = [] -        self.model_name = None -        self.file_path = None - -    def build(self): -        """build a graph""" -        for name, node in self.nodes.items(): -            for input_node in node.inputs: -                if input_node not in self.nodes: -                    continue -                self.nodes[input_node].outputs.append(name) - -
-class HostGraphParser: -    """ -    Parse graph
metadata from text file - """ - def __init__(self, file_path): - self.buffer = deque(maxlen=100) - self.line_no = 0 - self._file_path = file_path - self.edges: List[Tuple[HostGraphNode, HostGraphNode]] = [] - self.nodes: Dict[str, HostGraphNode] = {} - self.graphs = self._parse(self._file_path) - self._get_node_dict() - self._get_edges_list() - del self.graphs[0] - - @staticmethod - def _get_key_value( line): - res = line.split(':', 1) - return res[0].strip(), res[1].strip().strip('"') - - @staticmethod - def _parse_attr(key, value, obj): - if not isinstance(obj, list) and not obj: - return - if key == "dim" and hasattr(obj, "shape"): - obj.shape.append(value) - elif key == "name" and hasattr(obj, "op_name"): - obj.op_name = value - elif key == "name" and hasattr(obj, "name"): - obj.name = value - elif key == "dtype" and hasattr(obj, "dtype"): - obj.dtype = value - elif key == "layout" and hasattr(obj, "format"): - obj.format = value - elif key == "type" and hasattr(obj, "op_type"): - obj.op_type = value - elif key == "input" and hasattr(obj, "input"): - obj.inputs.append(value.strip('"').split(':')[0]) - elif key == "key" and hasattr(obj, "key"): - obj.key = value - elif hasattr(obj, key): - setattr(obj, key, value) - elif isinstance(obj, list) and key != "val_type": - obj.append(value) - - def _parse_struct(self, in_file, key, in_obj): - - def parse_shape(file, obj): - obj = self._parse_line(file, obj) - - def parse_input_desc(file, obj): - tensor = self._parse_line(file, Tensor()) - if obj and hasattr(obj, "input"): - obj.input.append(tensor) - - def parse_out_desc(file, obj): - tensor = self._parse_line(file, Tensor()) - if obj and hasattr(obj, "output"): - obj.output.append(tensor) - - def parse_op(file, obj: HostGraph): - node = self._parse_line(file, HostGraphNode()) - if hasattr(obj, "name"): - node.graph_name = obj.name - if obj and hasattr(obj, "nodes") and node.op_name: - obj.nodes[node.op_name] = node - - def parse_graph(file, obj): - graph = self._parse_line(file, HostGraph()) - obj.append(graph) - - def parse_attr(file, obj): - attr = self._parse_line(file, Attr()) - if hasattr(obj, attr.key): - if attr.key not in ['format']: - setattr(obj, attr.key, attr.value) - elif attr.key.endswith("_kernelname"): - setattr(obj, "kernelname", attr.value) - if obj and hasattr(obj, "get_attrs"): - obj.get_attrs().append(attr) - - def parse_list(file, obj): - value = [] - self._parse_line(file, value) - if isinstance(obj, list): - obj.append(value) - else: - obj = value - - def parse_value(file, obj): - if hasattr(obj, "value"): - obj.value = self._parse_line(file, obj.value) - - def parse_default(file, _obj=None): - """function with unused argument""" - self._parse_line(file, None) - - parse_methods = { - "shape": parse_shape, - "input_desc": parse_input_desc, - "output_desc": parse_out_desc, - "op": parse_op, - "graph": parse_graph, - "attr": parse_attr, - "list_list_int": parse_list, - "list_list_i": parse_list, - "list": parse_list, - "value": parse_value, - } - parse_methods.get(key, parse_default)(in_file, in_obj) - - def _read_line(self, file): - self.line_no += 1 - line = file.readline() - if line.strip().endswith('}'): - end_line = "" - while self.buffer and not end_line.strip().endswith("{"): - end_line = self.buffer.pop() - else: - self.buffer.append(line) - return line.strip() - - def _parse_line(self, file, obj=None): - line = self._read_line(file) - try: - while line and not line.endswith("}"): - if line.endswith('{'): - key = line.rstrip('{').strip() - 
self._parse_struct(file, key, obj) - else: - key, value = self._get_key_value(line) - self._parse_attr(key, value, obj) - line = self._read_line(file) - except Exception as exception: - if self.buffer: - logger.debug("***********************graph content**************************") - while self.buffer: - line = self.buffer.popleft() - logger.debug(line) - logger.debug("***********************graph content**************************") - raise exception - return obj - - def _parse(self, graph_file): - # pylint:disable=broad-except - graph_list = [] - with open(graph_file, "r", encoding="gbk") as file: - try: - graph_list = self._parse_line(file, graph_list) - except Exception: - logger.error( - "Parse line %s of file %s failed, make sure the format is correct.", self.line_no, graph_file - ) - graphs = [] - for graph in graph_list: - if isinstance(graph, HostGraph): - graphs.append(graph) - for graph in graphs: - graph.model_name = graphs[0].name - graph.file_path = self._file_path - graph.build() - return graphs - - def _get_edges_list(self) -> None: - if len(self.graphs) <= 0: - return - - def is_repeat_edge(edge, edge_collector): - for _edge in edge_collector: - if edge[0].op_name == _edge[0].op_name and edge[1].op_name == _edge[1].op_name: - return True - return False - - for node in self.nodes.values(): - for input_node_name in node.inputs: - if input_node_name not in self.nodes: - continue - input_node = self.nodes[input_node_name] - if not is_repeat_edge((input_node, node), self.edges): - self.edges.append((input_node, node)) - for output_node_name in node.outputs: - if output_node_name not in self.nodes: - continue - output_node = self.nodes[output_node_name] - if not is_repeat_edge((node, output_node), self.edges): - self.edges.append((node, output_node)) - - def _get_node_dict(self) -> None: - if not self.graphs: - self.nodes = {} - return - self.nodes = {node.op_name: node for graph in self.graphs for node in graph.nodes.values()} - - -class QueryGraphNode: - """ - Graph Node - """ - _ID = 0 - - def __init__(self, op_type: str, op_pass: str): - self._op_type = op_type - self._id = QueryGraphNode._ID - self._op_pass = op_pass - QueryGraphNode._ID += 1 - - def get_property(self, name): - """ - get property - """ - return getattr(self, name, lambda: None) - - @property - def op_type(self): - return self._op_type - - @property - def op_name(self): - return self._op_type + "_id_" + str(self._id) - - @property - def op_pass(self): - return self._op_pass - - @op_type.setter - def op_type(self, op_type): - self._op_type = op_type - - def __eq__(self, other): - return self._op_type == other._op_type and \ - self._id == other._id - - def __hash__(self): - return hash(self._op_type + str(self._id)) - - @staticmethod - def trim_string(string: str, length: int = -1): - """ - - Trim string to target length - :param string: Original string - :param length: Target length of string, -1 indicates original string. 
- :return: Trimmed string - """ - if string is None or not isinstance(string, str): - raise TypeError(f"Param string must be a string type but got {type(string)}.") - - if length <= -1 or len(string) <= length: - return string - - return string[:length] - - -class QueryGraphParser: - def __init__(self, rule_database_path: str): - self._fusion_rules: Dict[str, List[Tuple]] = dict() - self.load_database(rule_database_path) - self.num_rules = sum([len(v) for v in self._fusion_rules.values()]) - - @property - def fusion_rules(self): - return self._fusion_rules - - def load_database(self, rule_database): - if not os.path.isabs(rule_database): - rule_database = os.path.join(os.path.dirname(__file__), - "../", "../", - rule_database) - - if not os.path.exists(rule_database): - raise FileNotFoundError(f"Path {rule_database} does not exist.") - - database = FileManager.read_yaml_file(rule_database) - self.parse_yaml(database) - - def parse_yaml(self, yaml_database): - fusion_strategy_list = yaml_database.get("GraphFusion", []) - if yaml_database.get("UBFusion", []): - fusion_strategy_list.extend(yaml_database.get("UBFusion", [])) - for fusion_strategy in fusion_strategy_list: - if not isinstance(fusion_strategy, dict): - continue - (fusion_name, strategy), = fusion_strategy.items() - version = strategy.get("version", 0) - if version == 0 or version == "0": - self._fusion_rules[fusion_name] = self.build_query_graph_v0(fusion_name, - strategy.get('struct', [])) - elif version == 1 or version == "1": - self._fusion_rules[fusion_name] = self.build_query_graph_v1(fusion_name, - strategy.get('nodes', []), - strategy.get('edges', [])) - - @staticmethod - def build_query_graph_v0(graph_name: str, graph_struct: List[str]) -> List[Tuple]: - nodes = dict() - graphs = [] - edges = [] - - pre_node, next_node = None, None - for node in graph_struct: - pre_node = next_node - next_node = QueryGraphNode(node, graph_name) - nodes[next_node.op_name] = next_node - if pre_node is None or next_node is None: - continue - edges.append((pre_node, next_node,)) - graphs.append((nodes, edges, graph_name,)) - return graphs - - @staticmethod - def build_query_graph_v1(graph_name: str, - nodes_list: List[Dict], - edges_list: List[List[str]]) -> List[Tuple]: - graphs = [] - node_index = dict() - multi_node_list = [] - for index, node in enumerate(nodes_list): - (node_name, op_type), = node.items() - if isinstance(op_type, str): - op_type = [op_type] - multi_node_list.append([QueryGraphNode(op, graph_name) for op in op_type]) - node_index[node_name] = index - - multi_node = list(itertools.product(*multi_node_list)) - - for index, sub_nodes in enumerate(multi_node): - sub_graph_name = graph_name if index == 0 else f"{graph_name}#{index}" - sub_edge = [] - sub_node = dict() - for node in sub_nodes: - sub_node[node.op_name] = node - for edge in edges_list: - pre_node, next_node = edge - pre_node_index, next_node_index = node_index.get(pre_node), node_index.get(next_node) - sub_edge.append((sub_nodes[pre_node_index], sub_nodes[next_node_index])) - sub_graph = (sub_node, sub_edge, sub_graph_name,) - graphs.append(sub_graph) - return graphs diff --git a/profiler/advisor/common/profiling/__init__.py b/profiler/advisor/common/profiling/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/common/profiling/ge_info.py b/profiler/advisor/common/profiling/ge_info.py deleted file mode 100644 index 
4fd5846d88ddbab5d898c020b76537c1ec52db3b..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/profiling/ge_info.py +++ /dev/null @@ -1,48 +0,0 @@ -""" -DB -""" -import logging -import os -from typing import Any, List - -from sqlalchemy import text - -from profiler.advisor.dataset.profiling.db_manager import ConnectionManager -from profiler.advisor.dataset.profiling.profiling_parser import ProfilingParser - -logger = logging.getLogger() - - -class GeInfo(ProfilingParser): - """ - ge info file - """ - FILE_PATTERN_MSG = "ge_info.db" - FILE_INFO = "ge info" - STATIC_OP_STATE = "0" - DYNAMIC_OP_STATE = "1" - - file_pattern_list = [r"ge_info.db"] - - def __init__(self, path: str) -> None: - super().__init__(path) - self.op_state_info_list = None - - def parse_from_file(self, profiling_db_file): - """ - ge info - """ - db_path, db_file = os.path.split(profiling_db_file) - if not ConnectionManager.check_db_exists(db_path, [db_file]): - return False - conn = ConnectionManager(db_path, db_file) - if conn.check_table_exists(['TaskInfo']): - with conn().connect() as sql_conn: - self.op_state_info_list = sql_conn.execute(text("select op_name, op_state from TaskInfo")).fetchall() - return True - - def get_static_shape_operators(self) -> List[Any]: - return [op for op, state in self.op_state_info_list if state == self.STATIC_OP_STATE] - - def get_dynamic_shape_operators(self) -> List[Any]: - return [op for op, state in self.op_state_info_list if state == self.DYNAMIC_OP_STATE] diff --git a/profiler/advisor/common/profiling/msprof.py b/profiler/advisor/common/profiling/msprof.py deleted file mode 100644 index 750c5481e67e31e5e85c4a38ae3a299abed70187..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/profiling/msprof.py +++ /dev/null @@ -1,145 +0,0 @@ -""" -msprof -""" -import logging -from typing import Dict, List - -from profiler.advisor.dataset.profiling.info_collection import TaskInfo -from profiler.advisor.dataset.profiling.profiling_parser import ProfilingParser - -logger = logging.getLogger() - - -class TaskChecker: - """ - check task info - """ - - def __init__(self): - self.sqe_keys = set() - - def is_sqe(self, task: TaskInfo) -> bool: - """check sqe""" - key = (task.pid, task.tid) - if task.args.get('name', '').endswith('_SQE'): - self.sqe_keys.add(key) - return False - - return key in self.sqe_keys - - -class Msprof(ProfilingParser): - """ - msprof - - """ - FILE_PATTERN_MSG = "msprof_*.json" - FILE_INFO = "msprof" - - file_pattern_list = [r"^msprof[_\d]+.json$"] - - def __init__(self, path: str) -> None: - super().__init__(path) - self._tasks: List[TaskInfo] = [] - self._iteration_time = 0.0 - self._model_id = None - self._iteration_id = None - self._process_pid: Dict[str, str] = {} - self._min_time = 0.0 - self._max_time = 0.0 - self._data_process_time = 0.0 - self._start_point = 0.0 - - def parse_from_file(self, file: str): - if not self._parse_json(file): - return False - min_time = float('inf') - max_time = 0.0 - task_checker = TaskChecker() - is_iter = False - for item in self._raw_data: - task = TaskInfo(item) - if task.cat == "Iteration Time": - self._min_time = task.start_time - self._max_time = task.end_time - self._iteration_time = task.dur - is_iter = True - if task.cat == "Data_aug Bound" and "Data_aug Bound(us)" in task.args: - self._data_process_time = task.args["Data_aug Bound(us)"] - - if self._start_point == 0 and task.start_time > 0: - self._start_point = task.start_time - - if task_checker.is_sqe(task): - continue - - 
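# Every non-SQE task is collected below; the earliest start and latest end seen here define the iteration window when no explicit "Iteration Time" event is present in the trace.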
self._tasks.append(task) - self._parse_task(task) - - start_time = task.start_time - dur = task.dur - if start_time == -1 or dur == -1 or dur == 0: - continue - if start_time < min_time: - min_time = start_time - end_time = start_time + dur - if end_time > max_time: - max_time = end_time - if not is_iter: - self._iteration_time = dur - self._max_time = max_time - self._min_time = min_time - if self._tasks: - return True - return False - - def _parse_task(self, task): - if "Iteration Refresh" in task.name: - self._iteration_id = task.args.get("Iteration ID") - elif "Model ID" in task.name: - self._model_id = int(task.name.split(":")[1]) - elif "process_name" == task.name: - self._process_pid[task.args.get("name")] = task.pid - - @property - def step_time(self): - return self._iteration_time + self._data_process_time - - @property - def iteration_time(self): - return self._iteration_time - - @property - def iter_max_time(self): - return self._max_time - - @property - def iter_min_time(self): - return self._min_time - - @property - def data_process_time(self): - return self._data_process_time - - @property - def tasks(self): - return self._tasks - - @property - def model_id(self): - return self._model_id - - @property - def iteration_id(self): - return self._iteration_id - - @property - def process_pid(self): - return self._process_pid - - def __len__(self): - return len(self._tasks) - - @property - def start_point(self): - return self._start_point diff --git a/profiler/advisor/common/profiling/op_summary.py b/profiler/advisor/common/profiling/op_summary.py deleted file mode 100644 index 4744b5029ad6f06d5ee5a60426fb9b40b0a8c3c8..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/profiling/op_summary.py +++ /dev/null @@ -1,76 +0,0 @@ -""" -summary -""" -import logging -from decimal import Decimal -from typing import List, Any - -from profiler.advisor.dataset.profiling.info_collection import OpInfo -from profiler.advisor.dataset.profiling.profiling_parser import ProfilingParser -from profiler.advisor.utils.utils import format_excel_title, lazy_property - -logger = logging.getLogger() - - -class OpSummary(ProfilingParser): - """ - op summary - """ - FILE_PATTERN_MSG = "op_summary_*.csv" - FILE_INFO = "op summary" - STATIC_OP_STATE = "static" - DYNAMIC_OP_STATE = "dynamic" - - file_pattern_list = [r"^op_summary_[_\d]+\.csv$"] - - def __init__(self, path: str) -> None: - super().__init__(path) - self.op_list: List[OpInfo] = [] - self._total_task_duration = 0.0 - self._total_task_wait_time = 0.0 - self._raw_data: List[List[str]] = [] - - def parse_from_file(self, file: str): - if not self._parse_csv(file): - return False - title_dict = dict(enumerate(self._raw_data[0])) - for op_data in self._raw_data[1:]: - op_info = OpInfo() - for idx, value in enumerate(op_data): - title = title_dict.get(idx, "") - formatted_title = format_excel_title(title) - if formatted_title == 'task_start_time' and 'us' in title and \ - value.replace('.', '').replace("E+", "").isnumeric(): - value = str(Decimal(value) * Decimal(1000)) - op_info.add_attr(formatted_title, value) - self.op_list.append(op_info) - self._total_task_duration += self.get_float(op_info.get_attr("task_duration")) - self._total_task_wait_time += self.get_float(op_info.get_attr("task_wait_time")) - if not self.op_list: - logger.error("No valid op info in %s", file) - return False - return True - - def get_static_shape_operators(self) -> List[Any]: - return [op_info.get_attr("op_name") for op_info in self.op_list if 
op_info.get_attr("op_state") == self.STATIC_OP_STATE] - - def get_total_task_duration(self): - """ - get total task duration of all operators - :return: - """ - return self._total_task_duration - - @lazy_property - def task_dict(self): - """ - task dict - """ - task_dict = {} - for op_info in self.op_list: - if op_info.op_name not in task_dict: - task_dict[op_info.op_name] = [op_info] - else: - task_dict[op_info.op_name].append(op_info) - - return task_dict diff --git a/profiler/advisor/common/profiling/tasktime.py b/profiler/advisor/common/profiling/tasktime.py deleted file mode 100644 index 732ff0f36796049fee5ff2521360ca3183ceafce..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/profiling/tasktime.py +++ /dev/null @@ -1,75 +0,0 @@ -""" -task time -""" -import logging -from typing import Dict, List - -from profiler.advisor.dataset.profiling.info_collection import TaskInfo -from profiler.advisor.dataset.profiling.profiling_parser import ProfilingParser - -logger = logging.getLogger() - -AICPU_TASK_TYPE = "AI_CPU" -AICORE_TASK_TYPE = "AI_CORE" - - -class TaskTime(ProfilingParser): - """ - task time info - """ - FILE_PATTERN_MSG = "task_time*.json" - FILE_INFO = "task time" - - file_pattern_list = [r"^task_time_[_\d]+\.json$"] - - def __init__(self, path: str) -> None: - super().__init__(path) - self._tasks: List[TaskInfo] = [] - self._aicore_tasks: List[TaskInfo] = [] - self._aicpu_tasks: List[TaskInfo] = [] - self._process_map: Dict[str, str] = {} - self._pid_map: Dict[str, str] = {} - - def get_aicpu_tasks(self): - """ - get aicpu tasks - :return: aicpu tasks - """ - return self._aicpu_tasks - - def get_aicore_tasks(self): - """ - get aicore tasks - :return: aicore tasks - """ - return self._aicore_tasks - - def parse_from_file(self, file: str): - if not self._parse_json(file): - return False - for item in self._raw_data: - if item.get("ph") != "M": # header - continue - if item.get("name") != "process_name": - continue - pid = item.get("pid") - pname = item["args"]["name"] - self._process_map[pid] = pname - self._pid_map[pname] = pid - for item in self._raw_data: - if item.get("ph") == "M": # header - continue - task = TaskInfo(item) - self._tasks.append(task) - if task.pid != self._pid_map.get("Task Scheduler"): - continue - if task.task_type == AICORE_TASK_TYPE: - self._aicore_tasks.append(task) - elif task.task_type == AICPU_TASK_TYPE: - self._aicpu_tasks.append(task) - self._aicore_tasks.sort(key=lambda x: x.start_time) - self._aicpu_tasks.sort(key=lambda x: x.start_time) - if not self._tasks: - logger.error("No valid task info in %s", file) - return False - return True diff --git a/profiler/advisor/common/timeline/__init__.py b/profiler/advisor/common/timeline/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/common/timeline/event.py b/profiler/advisor/common/timeline/event.py deleted file mode 100644 index e24d983a02ff19fef5d6ae476f7c2f55bd9c8f85..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/timeline/event.py +++ /dev/null @@ -1,24 +0,0 @@ -from decimal import Decimal -class AdvisorDict(dict): - def __getstate__(self): - return self.__dict__ - - def __setstate__(self, d): - self.__dict__.update(d) - - def __getattr__(self, key: str): - if key not in self: - return {} - - value = self[key] - if isinstance(value, dict): - value = AdvisorDict(value) - return value - - -class TimelineEvent(AdvisorDict): - - def ts_include(self, 
event): - return Decimal(self.ts) <= Decimal(event.ts) and Decimal(self.ts) + Decimal(self.dur) >= Decimal( - event.ts) + Decimal( - event.dur) \ No newline at end of file diff --git a/profiler/advisor/common/timeline/fusion_ops_db.py b/profiler/advisor/common/timeline/fusion_ops_db.py deleted file mode 100644 index 64cc849295ffb6758d1fc8fd77d71e13d0157204..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/timeline/fusion_ops_db.py +++ /dev/null @@ -1,267 +0,0 @@ -import logging -import os - -from profiler.advisor.common import constant -from profiler.advisor.common.timeline.fusion_ops_rule import OpRule -from profiler.advisor.common.timeline.fusion_ops_rule_handler import TimelineOpRuleHandler -from profiler.advisor.utils.log import get_log_level -from profiler.advisor.utils.utils import get_file_path_by_walk -from profiler.cluster_analyse.common_func.file_manager import FileManager - -logger = logging.getLogger() -logger.setLevel(get_log_level()) - - -def init_timeline_ops_db(cann_version=None, torch_version=None): - logger.debug("init operators database") - - return FusionOperatorDB(cann_version=cann_version, torch_version=torch_version) - - -def get_timeline_fusion_ops_yaml_path(): - # 环境变量 ADVISOR_RULE_PATH 不为空且该路径存在, os.walk遍历其下文件, 若存在相应的规则文件则返回路径 - advisor_rule_path = os.getenv(constant.ADVISOR_RULE_PATH) - if advisor_rule_path and os.path.exists(advisor_rule_path): - specified_file_path = get_file_path_by_walk(advisor_rule_path, constant.TIMELINE_FUSION_OPS_YAML_NAME) - if len(specified_file_path.strip()) and os.path.exists(specified_file_path): - logger.debug("Successfully find The %s file which is specified by the environment variable: %s.", - specified_file_path, constant.ADVISOR_RULE_PATH) - return specified_file_path - logger.warning("The %s does not exist in path: %s. Try to use cloud or default local YAML file.", - constant.TIMELINE_FUSION_OPS_YAML_NAME, os.path.normpath(advisor_rule_path)) - # 检查云文件默认保存路径文件夹下是否存在相应文件, 默认路径 ~/rules/cloud/ - cloud_file_path = os.path.join(os.path.expanduser("~"), constant.CLOUD_RULE_PATH, constant.TIMELINE_FUSION_OPS_YAML_NAME) - if os.path.exists(cloud_file_path): - logger.debug("Successfully find The cloud %s file in %s.", constant.TIMELINE_FUSION_OPS_YAML_NAME, - cloud_file_path) - return cloud_file_path - # 检查本地默认文件 - local_file_path = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))), - constant.DEFAULT_RULE_PATH, constant.TIMELINE_FUSION_OPS_YAML_NAME) - if not os.path.exists(local_file_path): - # 若本地默认文件不存在, 则log异常信息并 - logger.error("The default local YAML file does not exist. 
Please check the YAML file in the default path %s.", - local_file_path) - return local_file_path - - -class FusionOperatorDB: - - def __init__(self, file_path=None, cann_version=None, torch_version=None): - self.timeline_fusion_ops_yaml_path = os.path.normpath(get_timeline_fusion_ops_yaml_path()) - - self.cann_version = cann_version or constant.DEFAULT_CANN_VERSION - self.torch_version = torch_version or constant.DEFAULT_TORCH_VERSION - - self._supported_version_dict = {} - - self.is_empty = False - self.timeline_op_rule_handler = TimelineOpRuleHandler() - self.fusion_operator = self._load_yaml(self.timeline_fusion_ops_yaml_path) - - self._dequeue_op_names = [] - self._aten_op_names = [] - self._optimizer_op_names = [] - self._dequeue_op_api_map = {} - self._aten_op_api_map = {} - self._optimizer_op_api_map = {} - self._parse_db() - - @property - def dequeue_op_names(self): - return self._dequeue_op_names - - @property - def aten_op_names(self): - return self._aten_op_names - - @property - def optimizer_op_names(self): - return self._optimizer_op_names - - @property - def dequeue_op_api_map(self): - return self._dequeue_op_api_map - - @property - def aten_op_api_map(self): - return self._aten_op_api_map - - @property - def optimizer_op_api_map(self): - return self._optimizer_op_api_map - - def get_fusion_operator_with_unique_id(self, unique_id): - if unique_id == constant.TIMELINE_FUSION_OPS_INVALID_UNIQUE_ID: - logger.warning("The specified unique id: %s is invalid.Please check whether the rule of the unique id " - "exists and modify the rule.", constant.TIMELINE_FUSION_OPS_INVALID_UNIQUE_ID) - return {} - result_tmp_rule = self.timeline_op_rule_handler.get_tmp_timeline_op_rule_with_unique_id(unique_id) - result_op_rule = OpRule(result_tmp_rule) - return result_op_rule.get_final_rules() - - def regenerate_timeline_op_rule_with_unique_id(self, unique_id): - self.fusion_operator.clear() - logger.debug("Program try to regenerate the rule to version %s.", unique_id) - self.fusion_operator = self.get_fusion_operator_with_unique_id(unique_id) - self.regenerate_op_api_map_and_op_names() - - def regenerate_timeline_op_rule_with_version(self, cann_version=None, torch_version=None): - cann_version = cann_version or self.cann_version - torch_version = torch_version or self.torch_version - unique_id = self._get_unique_id_in_supported_version_dict(cann_version=cann_version, - torch_version=torch_version) - self.regenerate_timeline_op_rule_with_unique_id(unique_id) - - def regenerate_op_api_map_and_op_names(self): - self._dequeue_op_names.clear() - self._aten_op_names.clear() - self._optimizer_op_names.clear() - self._dequeue_op_api_map.clear() - self._aten_op_api_map.clear() - self._optimizer_op_api_map.clear() - self._parse_db() - - def _is_version_supported(self, db_content): - """校验当前版本是否被规则库中的版本支持, 保存版本支持信息数组, 按数组或字符串的可变方式保存""" - if db_content is None: - logger.warning( - "The rule library is empty. Check the rule library file: %s", - self.timeline_fusion_ops_yaml_path - ) - return False - for rule_dic in db_content: - if not isinstance(rule_dic, dict) or rule_dic.get("unique_id") is None: - continue - cann_version_list = rule_dic.get("cann_version") - torch_version_list = rule_dic.get("torch_version") - if not cann_version_list or not torch_version_list: - continue - supported_version = [cann_version_list, torch_version_list] - - unique_id = rule_dic.get("unique_id") - if unique_id < 0: - logger.warning( - "The unique id: %s of the rule should be a positive integer. 
" - "Please check and modify the rule configuration in the YAML file: %s.", - unique_id, os.path.normpath(self.timeline_fusion_ops_yaml_path) - ) - self._supported_version_dict[unique_id] = supported_version - - # 若解析timeline规则库的版本支持数组为空, 则存在问题 - if not self._supported_version_dict: - logger.warning( - "The rule library does not contain rules that support the current version. " - "Check the rule library file: %s", - self.timeline_fusion_ops_yaml_path - ) - return False - - # 检验当前版本是否被规则库支持 - is_version_supported = self._is_version_supported_in_supported_version_dict() - if not is_version_supported: - # 若规则库不支持当前版本, 则log警告信息 - logger.warning("Unsupported versions: cann-%s and torch-%s, supported version list of ['cann', 'torch'] " - "is %s", self.cann_version, self.torch_version, self._supported_version_dict.values()) - return is_version_supported - - def _is_version_supported_in_supported_version_dict(self, cann_version=None, torch_version=None): - """校验当前版本是否存在在规则库中的版本支持字典中""" - for _, supported_version in self._supported_version_dict.items(): - if self._is_version_supported_in_versions(supported_version, cann_version, torch_version): - return True - return False - - def _get_unique_id_in_supported_version_dict(self, cann_version=None, torch_version=None) -> int: - """校验当前版本是否存在在规则库中的版本支持字典中, 在使用前请检查是否支持该版本""" - for key_unique_id, supported_version in self._supported_version_dict.items(): - if self._is_version_supported_in_versions(supported_version, cann_version, torch_version): - return key_unique_id - return constant.TIMELINE_FUSION_OPS_INVALID_UNIQUE_ID - - def _is_version_supported_in_versions(self, supported_version, cann_version=None, torch_version=None): - """校验当前cann版本和torch版本是否存在在规则库中的版本支持数组的元素中""" - cann_version_list = supported_version[0] - if not isinstance(cann_version_list, list): - cann_version_list = [cann_version_list] - - torch_version_list = supported_version[1] - if not isinstance(torch_version_list, list): - torch_version_list = [torch_version_list] - - cann_version = cann_version or self.cann_version - torch_version = torch_version or self.torch_version - - if (cann_version in cann_version_list) and (torch_version in torch_version_list): - return True - return False - - def _parse_db(self): - """生成输出的规则库""" - self._parse(constant.ATEN) - self._parse(constant.DEQUEUE) - self._parse(constant.OPTIMIZER) - - def _parse(self, mode): - """生成输出的规则库中指定部分, 如aten, Optimizer等""" - op_info = self.fusion_operator.get(mode, []) or [] - for ops in op_info: - for npu_api, op_combined in ops.items(): - if not isinstance(op_combined, list): - self._parse_in_list(mode, op_combined, npu_api) - for _op_combined in op_combined: - self._parse_in_list(mode, _op_combined, npu_api) - - def _parse_in_list(self, mode, op_combined, npu_api): - """生成输出的规则库中具体部分, 如{silu: torch_npu.npu_silu/torch_npu.contrib.module.SiLU}等""" - if not isinstance(op_combined, str): - logger.warning("Error type in yaml: %s", op_combined) - return - mode_str = mode.lower() - getattr(self, f"{mode_str}_op_names", []).extend(op_combined.split("-")) - - new_npu_api = npu_api - pre_npu_api = getattr(self, f"{mode_str}_op_api_map", {}).get(op_combined) - if pre_npu_api: - new_npu_api = f"{pre_npu_api}/{npu_api}" - getattr(self, f"{mode_str}_op_api_map", {})[op_combined] = new_npu_api - logger.debug("Output rule: %s: %s: %s: %s ", mode, op_combined, new_npu_api, op_combined.split("-")) - - def _load_yaml(self, file_path): - """生成timeline规则库""" - logger.debug("Try to use the following yaml file as timeline ops rule: %s.", 
os.path.abspath(file_path)) - # 若文件不存在,则报错, 并返回空字典 - if not os.path.exists(file_path): - logger.warning("Path: '%s' does not exist, please specific existed path of " - "fusion operators yaml file by setting env '%s'", - os.path.abspath(file_path), constant.ADVISOR_RULE_PATH) - self.is_empty = True - return {} - - logger.debug("The rule yaml file is successfully found in path: %s", os.path.abspath(file_path)) - - db_content = FileManager.read_yaml_file(file_path) - - if not self._is_version_supported(db_content): - self.is_empty = True - return {} - - logger.debug("The rule library supports the current environment version.") - - # 获取所有版本timeline规则库 - self.timeline_op_rule_handler.set_db_content(db_content) - - # 获取所需版本规则 - unique_id = self._get_unique_id_in_supported_version_dict() - logger.debug("Program is using version %s of the rule.", unique_id) - result_op_rule = self.get_fusion_operator_with_unique_id(unique_id) - if result_op_rule and len(result_op_rule) > 0: - return result_op_rule - - logger.warning( - "Failed to load fusion operators database, skip analyze timeline for affinity api," - " please refer to database yaml %s to customize your yaml.", - self.timeline_fusion_ops_yaml_path - ) - self.is_empty = True - return {} diff --git a/profiler/advisor/common/timeline/fusion_ops_rule.py b/profiler/advisor/common/timeline/fusion_ops_rule.py deleted file mode 100644 index deee68edb9a92d0588f3f3c155a7b2595317a5c7..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/timeline/fusion_ops_rule.py +++ /dev/null @@ -1,110 +0,0 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2024-2024. All rights reserved. -import copy -import logging - -from profiler.advisor.utils.log import get_log_level - -logger = logging.getLogger() -logger.setLevel(get_log_level()) - - -class OpRule: - - def __init__(self, rule=None, timeline_op_rule_handler=None): - if rule is None: - self._tmp_rule = {} - else: - self._tmp_rule = copy.deepcopy(rule) - if timeline_op_rule_handler is None: - self.timeline_op_rule_handler = {} - else: - self.timeline_op_rule_handler = copy.deepcopy(timeline_op_rule_handler) - self._rule = {} - - @property - def tmp_rule(self): - return self._tmp_rule - - @staticmethod - def _format_rule(rule): - """格式化规则函数, 将额外规则格式化为{key,数组list}形式, 使得yaml文件中operator_rules若写成key:str形式也能正常读取""" - format_rule = {} - for key, val in rule.items(): - if not isinstance(val, list): - val = [val] - format_rule[key] = val - return format_rule - - def merge(self, extra_rule): - """合并函数, 将已有规则库与额外规则合并, 若无继承则已有规则库应为空""" - for key, val in extra_rule.items(): - for func, op_rules in val.items(): - try: - getattr(self, f"{func}")(key, op_rules) - except AttributeError: - logger.error("Undefined field and function name. 
Ensure that %s is correct in the rule " - "library.", func) - - def get_final_rules(self): - """获取最终的规则库""" - self._restore_rule() - return self._rule - - def add(self, key, add_rules: dict): - """新增函数, 新增已有规则库不存在的额外规则""" - if add_rules is None: - return - if self._tmp_rule.get(key) is None: - self._tmp_rule[key] = {} - format_add_rule = self._format_rule(add_rules) - for add_key, add_val in format_add_rule.items(): - logger.debug("add: %s: %s", add_key, add_val) - if add_key not in self._tmp_rule: - self._tmp_rule[key][add_key] = add_val - else: - logger.warning("This key has been written to the rule, " - "%s: %s should be written in the overwrite section", add_key, add_val) - self._tmp_rule[key][add_key].update(add_val) - - def overwrite(self, key, overwrite_rules: dict): - """重写函数, 重写已有规则库中已经存在的规则""" - if overwrite_rules is None: - return - if self._tmp_rule.get(key) is None: - self._tmp_rule[key] = {} - format_overwrite_rules = self._format_rule(overwrite_rules) - for overwrite_key, overwrite_val in format_overwrite_rules.items(): - logger.debug("overwrite: %s: %s", overwrite_key, overwrite_val) - if overwrite_key not in self._tmp_rule: - logger.warning("This key is not written to the rule. " - "%s: %s should be written in the add section", overwrite_key, overwrite_val) - self._tmp_rule[key][overwrite_key] = overwrite_val - else: - self._tmp_rule[key][overwrite_key].update(overwrite_val) - - def exclude(self, key, exclude_rules: list): - """除外函数, 将已有规则库已有的规则除外删除""" - if exclude_rules is None: - return - for exclude_key in exclude_rules: - logger.debug("exclude: %s", exclude_key) - if isinstance(exclude_key, str): - if exclude_key not in self._tmp_rule[key]: - logger.warning("This key is not written to the rule. " - "do not need to exclude: %s.", exclude_key) - continue - self._tmp_rule[key].pop(exclude_key) - else: - logger.warning("Error type rule in exclude: %s", exclude_key) - - def inherit_unique_id(self, key, inherit_unique_id): - """局部继承函数, 将规则库中指定unique_id版本覆盖指定位置""" - result_rule = self.timeline_op_rule_handler.get_tmp_timeline_op_rule_with_unique_id(inherit_unique_id) - if result_rule is not None and result_rule.get(key) is not None: - self._tmp_rule[key] = copy.deepcopy(result_rule.get(key)) - return - logger.error("Rule library version %s does not exist. ", inherit_unique_id) - - def _restore_rule(self): - for key, op_api_map in self._tmp_rule.items(): - self._rule[key] = [{op_combined: api} for op_combined, api in op_api_map.items()] diff --git a/profiler/advisor/common/timeline/fusion_ops_rule_handler.py b/profiler/advisor/common/timeline/fusion_ops_rule_handler.py deleted file mode 100644 index b0558cca6d951ee057e538b5e4da6d9c2e78111b..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/timeline/fusion_ops_rule_handler.py +++ /dev/null @@ -1,193 +0,0 @@ -# Copyright (c) Huawei Technologies Co., Ltd. 2024-2024. All rights reserved. 
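The add/overwrite/exclude handlers above are driven by `OpRule.merge`, which dispatches each section of a rule entry to the method of the same name. A minimal sketch of that flow, using hypothetical rule content rather than entries from the shipped rule library:

```python
# Minimal sketch of OpRule.merge dispatch; "add-mul" and the npu api name
# below are hypothetical placeholders, not real rule-library entries.
rule = OpRule()
rule.merge({
    "aten": {
        "add": {"add-mul": "torch_npu.hypothetical_fused_api"},
    }
})
rule.merge({
    "aten": {
        "exclude": ["add-mul"],  # drop the entry that was just added
    }
})
print(rule.get_final_rules())  # -> {'aten': []} once the entry is excluded
```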
-import copy -import logging - -from profiler.advisor.common import constant -from profiler.advisor.common.timeline.fusion_ops_rule import OpRule -from profiler.advisor.utils.log import get_log_level - -logger = logging.getLogger() -logger.setLevel(get_log_level()) - - -class TimelineOpRuleHandler: - """基于线性规划思想保存OpRule,用于局部继承、全局继承等功能""" - - def __init__(self): - self._db_content = None - # 具体生成的timeline规则,key为unique_id - self._all_tmp_timeline_op_rule = {} - # 所有timeline规则的dict集合,key为unique_id - self._all_origin_timeline_op_rule_dict = {} - # 已生成timeline规则的id数组 - self._exist_timeline_op_rule_unique_id_list = [] - - @staticmethod - def _get_local_inherit_id_list(op_rule: dict): - local_inherit_id_list = [] - for _, val in op_rule.items(): - if val.get("inherit_unique_id") is not None: - local_inherit_id_list.append(val.get("inherit_unique_id")) - return local_inherit_id_list - - @staticmethod - def _is_duplicated_element_in_lists(list_a, list_b): - """检查两个数组中是否存在重复的元素,若有任意元素重复,返回True""" - if not isinstance(list_a, list): - list_a = [list_a] - if not isinstance(list_b, list): - list_b = [list_b] - # 将两个数组合并为一个列表,使用集合(set)判断列表中是否存在重复元素 - combined_list = list_a + list_b - if len(combined_list) != len(set(combined_list)): - return True - return False - - def set_db_content(self, db_content): - # 过滤非 dict 格式, 或 dict 中没有定义 unique_id 的数据, 并保存到 _all_origin_timeline_op_rule_dict 中 - self._db_content = copy.deepcopy(db_content) - for rule_dic in self._db_content: - if not isinstance(rule_dic, dict) or rule_dic.get("unique_id") is None: - continue - self._all_origin_timeline_op_rule_dict[rule_dic.get("unique_id")] = rule_dic - if self._all_origin_timeline_op_rule_dict: - self.generate_all_timeline_op_rule() - - def generate_basic_timeline_op_rules(self): - """用于实现获取无全局继承规则, 无全局继承的规则认为是基础版本规则, 默认不会存在局部继承""" - for _, rule_dic in self._all_origin_timeline_op_rule_dict.items(): - if rule_dic.get("inherit_unique_id") is None: - self.add_basic_timeline_op_rule(rule_dic) - - def add_basic_timeline_op_rule(self, rule_dic): - # 若基础规则中存在局部继承的规则,则跳过 - local_inherit_id_list = self._get_local_inherit_id_list(rule_dic.get("operator_rules")) - if local_inherit_id_list: - return - - temp_rule = OpRule() - temp_rule.merge(rule_dic.get("operator_rules")) - - unique_id = rule_dic.get("unique_id") - logger.debug("The rule of version %s is basic rule.", unique_id) - self.add_new_timeline_op_rule(unique_id, temp_rule.tmp_rule) - - def add_empty_timeline_op_rule(self, unique_id): - if self._all_origin_timeline_op_rule_dict.get(unique_id) is None: - self._all_origin_timeline_op_rule_dict[unique_id] = {} - tmp_rule = {} - logger.debug("The rule of version %s is empty.", unique_id) - self.add_new_timeline_op_rule(unique_id, tmp_rule) - - def add_new_timeline_op_rule(self, unique_id, tmp_rule): - if unique_id not in self._exist_timeline_op_rule_unique_id_list: - self._exist_timeline_op_rule_unique_id_list.append(unique_id) - self._all_tmp_timeline_op_rule[unique_id] = tmp_rule - logger.debug("The rule of version %s is successfully generated.", unique_id) - - def generate_specified_list_timeline_op_rule(self, specified_unique_id_list, kid_id_list=None): - for specified_unique_id in specified_unique_id_list: - if specified_unique_id in self._exist_timeline_op_rule_unique_id_list: - self.generate_specified_timeline_op_rule(specified_unique_id, kid_id_list) - - def generate_specified_timeline_op_rule(self, specified_unique_id, kid_id_list=None): - """用于实现生成特定版本规则 - - 若不存在相应specified_unique_id的规则、或是已生成、循环继承等情况,将该规则置空并返回 - 
规则库文件结构设置为多叉树, 结构决定了不断向下搜索最终应该是从基础版本开始继承, 递归生成, - 直到specified_unique_id规则依赖继承的规则库全部生成完毕, 再生成该指定规则库, 将specified_unique_id的规则库归档 - - 参数: - specified_unique_id: 指定版本规则id - kid_id_list: 子规则id数组, 用于防止循环继承, 如间接继承自身或直接继承自身等情况 - 返回: - None - """ - if kid_id_list is None: - kid_id_list = [] - - # 若该unique_id规则在timeline_fusion_ops.yaml中没有相应的规则, 生成该id规则,置为空 - if self._all_origin_timeline_op_rule_dict.get(specified_unique_id) is None: - logger.warning("The specified version %s does not exist in the rule library. " - "Ensure that the corresponding rule is configured in the YAML file. " - "The version %s is left blank.", - specified_unique_id, - specified_unique_id) - self.add_empty_timeline_op_rule(specified_unique_id) - return - - # 若该unique_id规则已经生成,则无需再次生成 - if specified_unique_id in self._exist_timeline_op_rule_unique_id_list: - logger.warning("The rule has been generated and does not need to be generated again. " - "Check whether unique id %s in the YAML file is duplicate.", - specified_unique_id) - return - - # 若kid_id_list不为空,且间接继承自身,则尝试生成空规则用于继承 - if kid_id_list and self._is_duplicated_element_in_lists(specified_unique_id, kid_id_list): - logger.warning("It cannot be inherited indirectly. Ensure that the corresponding rules are correctly " - "configured in the YAML file and leave Version %s blank.", - specified_unique_id) - self.add_empty_timeline_op_rule(specified_unique_id) - return - - rule_dic = self._all_origin_timeline_op_rule_dict.get(specified_unique_id) - if rule_dic is not None: - kid_id_list.append(specified_unique_id) - - global_inherit_id = rule_dic.get("inherit_unique_id") - if global_inherit_id and global_inherit_id not in self._exist_timeline_op_rule_unique_id_list: - logger.debug("The rule of version %s global inherit the rule of version %s", - specified_unique_id, global_inherit_id) - self.generate_specified_timeline_op_rule(global_inherit_id, kid_id_list) - - # 若局部继承的规则未生成, 生成该规则 - local_inherit_id_list = self._get_local_inherit_id_list(rule_dic.get("operator_rules")) - if local_inherit_id_list: - logger.debug("The rule of version %s local inherit the rule of version %s", - specified_unique_id, local_inherit_id_list) - self.generate_specified_list_timeline_op_rule(specified_unique_id_list=local_inherit_id_list, - kid_id_list=kid_id_list) - logger.debug("Start to generate rule of version %s", specified_unique_id) - # 实现全局继承与局部继承 - temp_rule = OpRule(timeline_op_rule_handler=self, - rule=self._all_tmp_timeline_op_rule.get(global_inherit_id)) - temp_rule.merge(rule_dic.get("operator_rules")) - # 将生成的规则归档保存 - self.add_new_timeline_op_rule(specified_unique_id, temp_rule.tmp_rule) - return - logger.error("Failed to generate the rule whose unique_id is %s. 
Ensure that the rule is configured in " - "the YAML file and the version %s is empty.", specified_unique_id, specified_unique_id) - self.add_empty_timeline_op_rule(specified_unique_id) - - def generate_all_timeline_op_rule(self): - """用于实现获取所有版本规则 - - 查找db_content中的规则库, 规则库文件结构设置为多叉树, 优先生成无继承的基础规则版本 - 循环并生成其他版本, 文件结构决定了不断向下搜索最终应该是从基础版本开始继承, 递归生成,直到全部规则库生成后退出函数 - - 参数: - None - 返回: - None - """ - self.generate_basic_timeline_op_rules() - _unique_id_list = copy.deepcopy(list(self._all_origin_timeline_op_rule_dict.keys())) - for unique_id in _unique_id_list: - if unique_id in self._exist_timeline_op_rule_unique_id_list: - continue - self.generate_specified_timeline_op_rule(unique_id) - - def get_tmp_timeline_op_rule_with_unique_id(self, unique_id): - if unique_id not in self._exist_timeline_op_rule_unique_id_list: - logger.error("The specified unique_id does not exist in the rule library. Ensure that the " - "corresponding rule is configured in the YAML file and the version %s is empty." - "If the value of unique_id is a negative number, the version may not be supported.", - unique_id) - self.add_empty_timeline_op_rule(unique_id) - if unique_id < 0: - logger.error("Advise to use a positive integer as the unique id of rules. " - "Negative numbers: %s are not recommended to use as unique id. " - "If specified invalid unique id: %s is used, an empty rule is returned by default.", - unique_id, constant.TIMELINE_FUSION_OPS_INVALID_UNIQUE_ID) - return self._all_tmp_timeline_op_rule.get(unique_id) diff --git a/profiler/advisor/common/version_control.py b/profiler/advisor/common/version_control.py deleted file mode 100644 index 38b054543fc61e90d91e8442a547376cff4c6406..0000000000000000000000000000000000000000 --- a/profiler/advisor/common/version_control.py +++ /dev/null @@ -1,26 +0,0 @@ -import logging -from typing import List - -logger = logging.getLogger() - - -class VersionControl: - _SUPPORT_VERSIONS = [] - - @classmethod - def is_supported(cls, cann_version: str) -> bool: - """ - Check whether the CANN software version is supported, which can be viewed by executing the following command: - 'cat /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info' - """ - flag = (cls._SUPPORT_VERSIONS.__contains__(cann_version)) - if not flag: - logger.debug("class type is %s, which is not support current CANN version %s", cls.__name__, cann_version) - return flag - - def get_support_version(self) -> List[str]: - """ - Acquire the CANN software version - :return: supported CANN software version - """ - return self._SUPPORT_VERSIONS diff --git a/profiler/advisor/computation_analysis.ipynb b/profiler/advisor/computation_analysis.ipynb deleted file mode 100644 index 0d4aaadfadff05d1e11d4a9873ef7ce4ae2cfaa8..0000000000000000000000000000000000000000 --- a/profiler/advisor/computation_analysis.ipynb +++ /dev/null @@ -1,748 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import sys\n", - "\n", - "sys.path.append(\"../..\")\n", - "\n", - "from prettytable import PrettyTable, ALL\n", - "from textwrap import fill\n", - "from profiler.advisor.interface.interface import Interface" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "# 配置profiling采集出来的数据,需要指定到的profiling目录是同一个工具采集的,并且需要采集l0级别以上\n", - "profiling_path = r\"YOUR PROFILING PATH\"\n", - "interface = Interface(profiling_path=profiling_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "### Block Dim问题识别\n", - "\n", - "Block Dim问题主要为识别相关core算子AI core核未打满或者Vector 核未打满问题,主要调优手段为AOE调优,由于AOE调优依赖静态shape,所以当所有算子都为动态shape时,将不会检测相关Block Dim问题.\n", - "\n", - "下列代码为样例,主要展示如何检测Block Dim类型问题,并获取相关问题检测结果:\n" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# 查询computation相关是否存在block dim问题\n", - "# 如果profiling数据采集自非8.0.RC1的CANN版本,需要在训练/推理环境中执行: 'cat CANN安装目录/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info'命令查看version\n", - "block_dim_result = interface.get_result(\"computation\", \"block_dim_analysis\", cann_version=\"7.0.RC1\")" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
" - ], - "text/plain": [ - "+-----------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| problem | description | suggestion | problem count | total_time(us) | time ratio | income(us) | income ratio |\n", - "+-----------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| block dim | some operator does not make full use of 25 ai core or 50 ai vector core; Top-10 | 1. Optimize operator by AOE, such as: 'aoe --job_type=2 | 101 | 814.0199999999999 | 1.0 | | |\n", - "| | operator of task duration are as follows: Square, MatMulV2, BatchMatMul, | --model_path=$user_dump_path --tune_ops_file=c:\\personalC\\code\\att\\profiler\\advi | | | | | |\n", - "| | SoftmaxV2, Mul, Transpose, Assign, GatherV2, Sigmoid, Cast | sor\\operator_tuning_file_20240613153259.cfg' | | | | | |\n", - "+-----------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "problems = block_dim_result.get(\"problems\")\n", - "if problems: # 如果存在相关问题则获取相关问题检测描述及建议\n", - " problem_table = PrettyTable(problems.get(\"headers\"))\n", - " for row in problems.get(\"data\"):\n", - " row = [fill(str(element), width=80) for element in row]\n", - " problem_table.add_row(row)\n", - " \n", - " problem_table.align = \"l\"\n", - " problem_table.hrules = ALL\n", - " display(problem_table)\n", - "else:\n", - " print(\"There is no suggestion related to block dim.\")" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
" - ], - "text/plain": [ - "+-----------------------------------------------------------------------------+----------+----------------+---------------+--------+-----------+---------------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| op_name | op_type | task_type | task_duration | income | block_dim | mix_block_dim | input_shapes | input_data_types | input_formats | output_shapes | output_data_types | output_formats |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+--------+-----------+---------------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| Default/model-LlamaModel/layers-CellList/0-LLamaDecodeLayer/attention_norm- | Square | AI_VECTOR_CORE | 42.76 | 0 | 16 | 0 | \"128,128\" | FLOAT | NCHW | \"128,1\" | FLOAT | NCHW |\n", - "| LlamaRMSNorm/Square-op34Default/model-LlamaModel/layers- | | | | | | | | | | | | |\n", - "| CellList/0-LLamaDecodeLayer/attention_norm-LlamaRMSNorm/ReduceMean-op35 | | | | | | | | | | | | |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+--------+-----------+---------------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| Default/model-LlamaModel/layers-CellList/0-LLamaDecodeLayer/ffn_norm- | Square | AI_VECTOR_CORE | 42.24 | 0 | 16 | 0 | \"128,128\" | FLOAT | NCHW | \"128,1\" | FLOAT | NCHW |\n", - "| LlamaRMSNorm/Square-op77Default/model-LlamaModel/layers- | | | | | | | | | | | | |\n", - "| CellList/0-LLamaDecodeLayer/ffn_norm-LlamaRMSNorm/ReduceMean-op78 | | | | | | | | | | | | |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+--------+-----------+---------------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| Default/lm_head-Linear/MatMul-op213 | MatMulV2 | AI_CORE | 39.02 | 0 | 20 | 0 | \"128,128;128,32000\" | FLOAT16;FLOAT16 | FORMAT_ND;FORMAT_ND | \"128,32000\" | FLOAT | FORMAT_ND |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+--------+-----------+---------------+---------------------+------------------+---------------------+---------------+-------------------+----------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "if problems: # 如果存在相关问题则获取相关问题检测细节\n", - " block_dim = block_dim_result.get(\"block dim\")\n", - " block_dim_table = PrettyTable(block_dim.get(\"headers\"))\n", - " for row in block_dim.get(\"data\"):\n", - " row = [fill(str(element), width=80) for element in row]\n", - " block_dim_table.add_row(row)\n", - "\n", - " block_dim_table.hrules = ALL\n", - " display(block_dim_table[:3])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Operator No Bound问题识别\n", - "Operator No Bound问题主要为识别相关算子无mte, cube, vector, scalar相关bound问题,主要调优手段为AOE调优,由于AOE调优依赖静态shape,所以当所有算子都为动态shape时,将不会检测相关Operator No Bound问题.\n", - "\n", - "下列代码为样例,主要展示如何检测Operator No Bound类型问题,并获取相关问题检测结果:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "from prettytable import PrettyTable, ALL\n", - "from textwrap 
import fill\n", - "from profiler.advisor.interface.interface import Interface\n", - "\n", - "\n", - "# 配置profiling采集出来的数据,需要指定到的profiling目录是同一个工具采集的,并且需要采集l0级别以上\n", - "profiling_path = r\"YOUR PROFILING PATH\"\n", - "interface = Interface(profiling_path=profiling_path)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# 查询computation相关是否存在operator no bound问题\n", - "# 如果profiling数据采集自非8.0.RC1的CANN版本,需要在训练/推理环境中执行: 'cat CANN安装目录/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info'命令查看version\n", - "operator_no_bound_result = interface.get_result(\"computation\", \"operator_no_bound_analysis\", cann_version=\"7.0.RC1\")" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
" - ], - "text/plain": [ - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| problem | description | suggestion | problem count | total_time(us) | time ratio | income(us) | income ratio |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| block dim | some operator does not make full use of 25 ai core or 50 ai vector core; Top-10 | 1. Optimize operator by AOE, such as: 'aoe --job_type=2 | 101 | 814.0199999999999 | 1.0 | | |\n", - "| | operator of task duration are as follows: Square, MatMulV2, BatchMatMul, | --model_path=$user_dump_path --tune_ops_file=c:\\personalC\\code\\att\\profiler\\advi | | | | | |\n", - "| | SoftmaxV2, Mul, Transpose, Assign, GatherV2, Sigmoid, Cast | sor\\operator_tuning_file_20240613153259.cfg' | | | | | |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| operator no bound | There is no mte, cube, vector, scalar ratio is more than 80.00%; Top task | 1. Optimize operator by AOE, such as: 'aoe --job_type=2 | 95 | 814.0199999999999 | 0.7985 | | |\n", - "| | duration operators need to be tuned are as follows: Square, MatMulV2, | --model_path=$user_dump_path --tune_ops_file=c:\\personalC\\code\\att\\profiler\\advi | | | | | |\n", - "| | BatchMatMul, SoftmaxV2, Mul, Transpose, Assign, GatherV2, Sigmoid, Cast | sor\\operator_tuning_file_20240613153259.cfg' | | | | | |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "problems = operator_no_bound_result.get(\"problems\")\n", - "problem_table = PrettyTable(problems.get(\"headers\"))\n", - "if problems: # 如果存在相关问题则获取相关问题检测描述及建议\n", - " for row in problems.get(\"data\"):\n", - " row = [fill(str(element), width=80) for element in row]\n", - " problem_table.add_row(row)\n", - "\n", - " problem_table.align = \"l\"\n", - " problem_table.hrules = ALL\n", - " display(problem_table)\n", - "else:\n", - " print(\"There is no suggestion related to operator no bound.\")" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
op_nameop_typetask_typetask_durationvec_ratiomac_ratioscalar_ratiomte1_ratiomte2_ratiomte3_ratioblock_diminput_shapesinput_data_typesinput_formatsoutput_shapesoutput_data_typesoutput_formats
Default/model-LlamaModel/layers-CellList/0-LLamaDecodeLayer/attention_norm-
LlamaRMSNorm/Square-op34Default/model-LlamaModel/layers-
CellList/0-LLamaDecodeLayer/attention_norm-LlamaRMSNorm/ReduceMean-op35
SquareAI_VECTOR_CORE42.760.465400000.005616"128,128"FLOATNCHW"128,1"FLOATNCHW
Default/model-LlamaModel/layers-CellList/0-LLamaDecodeLayer/ffn_norm-
LlamaRMSNorm/Square-op77Default/model-LlamaModel/layers-
CellList/0-LLamaDecodeLayer/ffn_norm-LlamaRMSNorm/ReduceMean-op78
SquareAI_VECTOR_CORE42.240.46600000.006216"128,128"FLOATNCHW"128,1"FLOATNCHW
Default/lm_head-Linear/MatMul-op213MatMulV2AI_CORE39.0200.11050.01190.08570.4284020"128,128;128,32000"FLOAT16;FLOAT16FORMAT_ND;FORMAT_ND"128,32000"FLOATFORMAT_ND
" - ], - "text/plain": [ - "+-----------------------------------------------------------------------------+----------+----------------+---------------+-----------+-----------+--------------+------------+------------+------------+-----------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| op_name | op_type | task_type | task_duration | vec_ratio | mac_ratio | scalar_ratio | mte1_ratio | mte2_ratio | mte3_ratio | block_dim | input_shapes | input_data_types | input_formats | output_shapes | output_data_types | output_formats |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+-----------+-----------+--------------+------------+------------+------------+-----------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| Default/model-LlamaModel/layers-CellList/0-LLamaDecodeLayer/attention_norm- | Square | AI_VECTOR_CORE | 42.76 | 0.4654 | 0 | 0 | 0 | 0 | 0.0056 | 16 | \"128,128\" | FLOAT | NCHW | \"128,1\" | FLOAT | NCHW |\n", - "| LlamaRMSNorm/Square-op34Default/model-LlamaModel/layers- | | | | | | | | | | | | | | | | |\n", - "| CellList/0-LLamaDecodeLayer/attention_norm-LlamaRMSNorm/ReduceMean-op35 | | | | | | | | | | | | | | | | |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+-----------+-----------+--------------+------------+------------+------------+-----------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| Default/model-LlamaModel/layers-CellList/0-LLamaDecodeLayer/ffn_norm- | Square | AI_VECTOR_CORE | 42.24 | 0.466 | 0 | 0 | 0 | 0 | 0.0062 | 16 | \"128,128\" | FLOAT | NCHW | \"128,1\" | FLOAT | NCHW |\n", - "| LlamaRMSNorm/Square-op77Default/model-LlamaModel/layers- | | | | | | | | | | | | | | | | |\n", - "| CellList/0-LLamaDecodeLayer/ffn_norm-LlamaRMSNorm/ReduceMean-op78 | | | | | | | | | | | | | | | | |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+-----------+-----------+--------------+------------+------------+------------+-----------+---------------------+------------------+---------------------+---------------+-------------------+----------------+\n", - "| Default/lm_head-Linear/MatMul-op213 | MatMulV2 | AI_CORE | 39.02 | 0 | 0.1105 | 0.0119 | 0.0857 | 0.4284 | 0 | 20 | \"128,128;128,32000\" | FLOAT16;FLOAT16 | FORMAT_ND;FORMAT_ND | \"128,32000\" | FLOAT | FORMAT_ND |\n", - "+-----------------------------------------------------------------------------+----------+----------------+---------------+-----------+-----------+--------------+------------+------------+------------+-----------+---------------------+------------------+---------------------+---------------+-------------------+----------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "if problems: # 如果存在相关问题则获取相关问题检测细节\n", - " operator_no_bound = operator_no_bound_result.get(\"operator no bound\")\n", - " operator_no_bound_table = PrettyTable(operator_no_bound.get(\"headers\"))\n", - " for row in operator_no_bound.get(\"data\"):\n", - " row = [fill(str(element), width=80) for element in row]\n", - " operator_no_bound_table.add_row(row)\n", - " operator_no_bound_table.hrules = ALL\n", - " 
display(operator_no_bound_table[:3])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### AICPU problem identification\n", - "AICPU problems are scenarios in which operators run on the AI CPU during execution and therefore never use the compute power of the AI Core. The main tuning approach is to modify the relevant code so that the AI CPU operators are avoided; the following reference collects typical replacements:\n", - "https://gitee.com/ascend/mstt/blob/master/profiler/advisor/doc/Samples%20of%20AI%20CPU%20Operator%20Replacement.md\n", - "\n", - "The sample code below shows how to detect AICPU problems and fetch the detection results:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "from prettytable import PrettyTable, ALL\n", - "from textwrap import fill\n", - "from profiler.advisor.interface.interface import Interface\n", - "\n", - "\n", - "# Point this at the collected profiling data; the whole profiling directory must come from a single collection tool, collected at level l0 or above\n", - "profiling_path = r\"YOUR PROFILING PATH\"\n", - "interface = Interface(profiling_path=profiling_path)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Please ensure only one trace_view.json in C:\\personalC\\profiling_data, there will analyze first timeline profiling data.\n", - " \r" - ] - } - ], - "source": [ - "# Check the computation dimension for AICPU problems\n", - "# If the profiling data was collected with a CANN version other than 8.0.RC1, run 'cat <CANN install dir>/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info' in the training/inference environment to look up the version\n", - "aicpu_result = interface.get_result(\"computation\", \"aicpu_analysis\")" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - "
" - ], - "text/plain": [ - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| problem | description | suggestion | problem count | total_time(us) | time ratio | income(us) | income ratio |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| block dim | some operator does not make full use of 25 ai core or 50 ai vector core; Top-10 | 1. Optimize operator by AOE, such as: 'aoe --job_type=2 | 101 | 814.0199999999999 | 1.0 | | |\n", - "| | operator of task duration are as follows: Square, MatMulV2, BatchMatMul, | --model_path=$user_dump_path --tune_ops_file=c:\\personalC\\code\\att\\profiler\\advi | | | | | |\n", - "| | SoftmaxV2, Mul, Transpose, Assign, GatherV2, Sigmoid, Cast | sor\\operator_tuning_file_20240613153259.cfg' | | | | | |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| operator no bound | There is no mte, cube, vector, scalar ratio is more than 80.00%; Top task | 1. Optimize operator by AOE, such as: 'aoe --job_type=2 | 95 | 814.0199999999999 | 0.7985 | | |\n", - "| | duration operators need to be tuned are as follows: Square, MatMulV2, | --model_path=$user_dump_path --tune_ops_file=c:\\personalC\\code\\att\\profiler\\advi | | | | | |\n", - "| | BatchMatMul, SoftmaxV2, Mul, Transpose, Assign, GatherV2, Sigmoid, Cast | sor\\operator_tuning_file_20240613153259.cfg' | | | | | |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+\n", - "| AICPU operator | Some operators and task duration exceed 20 us, such as : Cast | 1. 
Modify code to avoid aicpu operator | 39 | 686568.860000001 | 0.0189 | | |\n", - "+-------------------+----------------------------------------------------------------------------------+----------------------------------------------------------------------------------+---------------+-------------------+------------+------------+--------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "problems = aicpu_result.get(\"problems\")\n", - "if problems: # if problems were detected, fetch their descriptions and suggestions\n", - "    problem_table = PrettyTable(problems.get(\"headers\"))\n", - "    for row in problems.get(\"data\"):\n", - "        row = [fill(str(element), width=80) for element in row]\n", - "        problem_table.add_row(row)\n", - "\n", - "    problem_table.align = \"l\"\n", - "    problem_table.hrules = ALL\n", - "    display(problem_table)\n", - "else:\n", - "    print(\"There is no suggestion related to AICPU operator.\")" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - "
" - ], - "text/plain": [ - "+--------------+---------+---------------+--------------+------------------+---------------+---------------+-------------------+----------------+----------------------------------------------------------------------------------+\n", - "| op_name | op_type | task_duration | input_shapes | input_data_types | input_formats | output_shapes | output_data_types | output_formats | stack_info |\n", - "+--------------+---------+---------------+--------------+------------------+---------------+---------------+-------------------+----------------+----------------------------------------------------------------------------------+\n", - "| trans_Cast_5 | Cast | 493.64 | \"\" | INT32 | FORMAT_ND | \"\" | UINT64 | FORMAT_ND | /usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/functional.py(1279): |\n", - "| | | | | | | | | | dropout; /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/dropout.py(58): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/language_model.py(236): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/language_model.py(425): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/gpt_model.py(84): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/module.py(184): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/distributed.py(58): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | ../../pretrain_gpt.py(88): forward_step; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/schedules.py(118): forward_step; |\n", - "| | | | | | | | | | /home/s30040711/Megatron- |\n", - "| | | | | | | | | | LM/megatron_npu_adaptor/megatron_npu/adaptor_schedules.py(96): |\n", - "| | | | | | | | | | forward_backward_no_pipelining; /profiling_auto_GPT3/megatron/training.py(419): |\n", - "| | | | | | | | | | train_step; /profiling_auto_GPT3/megatron/training.py(837): train; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/training.py(152): pretrain; |\n", - "| | | | | | | | | | ../../pretrain_gpt.py(122): |\n", - "+--------------+---------+---------------+--------------+------------------+---------------+---------------+-------------------+----------------+----------------------------------------------------------------------------------+\n", - "| trans_Cast_5 | Cast | 413.4 | \"\" | INT32 | FORMAT_ND | \"\" | UINT64 | FORMAT_ND | /usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/functional.py(1279): |\n", - "| | | | | | | | | | dropout; /usr/local/python3.7.5/lib/python3.7/site- 
|\n", - "| | | | | | | | | | packages/torch/nn/modules/dropout.py(58): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/language_model.py(236): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/language_model.py(425): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/gpt_model.py(84): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/module.py(184): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/model/distributed.py(58): forward; |\n", - "| | | | | | | | | | /usr/local/python3.7.5/lib/python3.7/site- |\n", - "| | | | | | | | | | packages/torch/nn/modules/module.py(1110): _call_impl; |\n", - "| | | | | | | | | | ../../pretrain_gpt.py(88): forward_step; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/schedules.py(118): forward_step; |\n", - "| | | | | | | | | | /home/s30040711/Megatron- |\n", - "| | | | | | | | | | LM/megatron_npu_adaptor/megatron_npu/adaptor_schedules.py(109): |\n", - "| | | | | | | | | | forward_backward_no_pipelining; /profiling_auto_GPT3/megatron/training.py(419): |\n", - "| | | | | | | | | | train_step; /profiling_auto_GPT3/megatron/training.py(837): train; |\n", - "| | | | | | | | | | /profiling_auto_GPT3/megatron/training.py(152): pretrain; |\n", - "| | | | | | | | | | ../../pretrain_gpt.py(122): |\n", - "+--------------+---------+---------------+--------------+------------------+---------------+---------------+-------------------+----------------+----------------------------------------------------------------------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "if problems: # 如果存在相关问题则获取相关问题检测细节\n", - " aicpu = aicpu_result.get(\"AICPU operator\")\n", - " aicpu_table = PrettyTable(aicpu.get(\"headers\"))\n", - " for row in aicpu.get(\"data\"):\n", - " row = [fill(str(element), width=80) for element in row]\n", - " aicpu_table.add_row(row)\n", - " aicpu_table.hrules = ALL\n", - " display(aicpu_table[:2])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "base", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.7" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/profiler/advisor/config/__init__.py b/profiler/advisor/config/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/config/config.ini b/profiler/advisor/config/config.ini deleted file mode 100644 
index 06e993160104a5c8e044b1e385f6713470637831..0000000000000000000000000000000000000000 --- a/profiler/advisor/config/config.ini +++ /dev/null @@ -1,17 +0,0 @@ -[LOG] -# console_logging_level : DEBUG/INFO/WARNING/ERROR -console_logging_level = INFO -[ANALYSE] -# analysis_result_file : filename of analysis result -analysis_result_file = analysis_result_file.xlsx -# tune_ops_file: filename of tune op name list -tune_ops_file = operator_tuning_file.cfg -[THRESHOLD] -# operator_bound_ratio: (mte, cube, vector, scalar) ratio greater than this value will be checked in operator_bound_checker -operator_bound_ratio = 0.8 -frequency_threshold = 0.05 -[RULE-BUCKET] -# region : URL of different regions where can download rule yaml file -cn-north-9 = cnnorth9-modelarts-sdk -cn-southwest-2 = cnsouthwest2-modelarts-sdk -cn-north-7 = cnnorth7-modelarts-sdk \ No newline at end of file diff --git a/profiler/advisor/config/config.py b/profiler/advisor/config/config.py deleted file mode 100644 index 4f36dfedfc8e1624f3740951ea982b8cd3196657..0000000000000000000000000000000000000000 --- a/profiler/advisor/config/config.py +++ /dev/null @@ -1,115 +0,0 @@ -""" -advisor config -""" -from profiler.advisor.utils.utils import Timer - -import logging -import os -from configparser import ConfigParser - -from profiler.advisor.utils.utils import singleton - -logger = logging.getLogger() - - -@singleton -class Config: - """ - config - """ - # pylint: disable=too-many-instance-attributes - - _CONFIG_DIR_NAME = "config" - _CONFIG_FILE_NAME = "config.ini" - - def __init__(self) -> None: - config = ConfigParser(allow_no_value=True) - self._work_path = os.getcwd() # pwd - self._root_path = os.path.abspath(os.path.join(__file__, "../../")) - config.read(os.path.join(self._root_path, self._CONFIG_DIR_NAME, self._CONFIG_FILE_NAME)) - self.config = config - # ANALYSE - self._analysis_result_file = self._normalize_path(config.get("ANALYSE", "analysis_result_file")) - self._tune_ops_file = os.path.abspath( - os.path.join(self._work_path, f"operator_tuning_file_{Timer().strftime}.cfg")) - self.log_path = None - - def _normalize_path(self, file) -> str: - if not file.startswith("/"): - file = os.path.join(self._work_path, file) - return os.path.abspath(file) - - @property - def work_path(self) -> str: - """ - get work path - :return: work path - """ - return self._work_path - - @property - def root_path(self) -> str: - """ - get root path - :return: root path - """ - return self._root_path - - def set_config(self, key, value) -> None: - """ - set config value - :param key: config key - :param value: config value - """ - setattr(self, key, value) - - def get_config(self, key) -> str: - """ - get value of config - :param key: config key - :return: config value - """ - try: - return getattr(self, key) - except AttributeError: - return "" - - @property - def analysis_result_file(self) -> str: - """ - get filename of op result file - :return: filename - """ - return self._analysis_result_file - - @property - def tune_ops_file(self) -> str: - """ - get filename of tune op file - :return: filename - """ - return self._tune_ops_file - - @property - def operator_bound_ratio(self) -> float: - """ - operator_bound_ratio - """ - return float(self.config.get("THRESHOLD", "operator_bound_ratio")) - - @property - def frequency_threshold(self) -> float: - """ - frequency_threshold - """ - return float(self.config.get("THRESHOLD", "frequency_threshold")) - - def set_log_path(self, result_file: str, log_path: str = None): - self.log_path = log_path if 
log_path is not None else os.path.join(self._work_path, "log") - os.makedirs(self.log_path, exist_ok=True) - self.config._analysis_result_file = os.path.join(self.log_path, result_file) - self._analysis_result_file = os.path.join(self.log_path, result_file) - - def remove_log(self): - if self.log_path and os.path.isdir(self.log_path) and not os.listdir(self.log_path): - os.rmdir(self.log_path) diff --git a/profiler/advisor/config/profiling_data_version_config.yaml b/profiler/advisor/config/profiling_data_version_config.yaml deleted file mode 100644 index b8c92fe074d3bf67a23214d18f6a2438be130314..0000000000000000000000000000000000000000 --- a/profiler/advisor/config/profiling_data_version_config.yaml +++ /dev/null @@ -1,81 +0,0 @@ -versions: - - version: 8.0.RC1 - dirs_pattern: - ASCEND_PROFILER_OUTPUT: [ op_summary ] - ^PROF_\d{6}_\d{17}_\w+$: - mindstudio_profiler_output: [ op_summary, msprof ] - class_attr: - op_summary: OpSummary - msprof: Msprof - file_attr: - msprof: ^msprof_\d{14}\.json$ - op_summary: [ kernel_details.csv, '^op_summary_\d{14}\.csv$' ] - - - version: 7.0.0 - dirs_pattern: - ASCEND_PROFILER_OUTPUT: [ op_summary ] - ^PROF_\d{6}_\d{17}_\w+$: - ^device_\d+$: - summary: - [ op_summary ] - timeline: - [ msprof, task_time ] - host: - sqlite: - [ ge_info ] - class_attr: - op_summary: OpSummary - task_time: TaskTime - msprof: Msprof - ge_info: GeInfo - file_attr: - op_summary: [ kernel_details.csv, '^op_summary_\d+_\d+_\d{14}\.csv$'] - task_time: ^task_time_\d+_\d+_\d{14}\.json$ - msprof: ^msprof_\d+_\d+_\d{14}\.json$ - ge_info: ge_info.db - - - version: 7.0.RC1 - dirs_pattern: - ASCEND_PROFILER_OUTPUT: [ op_summary ] - ^PROF_\d{6}_\d{17}_\w+$: - ^device_\d+$: - summary: - [ op_summary ] - timeline: - [ msprof, task_time ] - host: - sqlite: - [ ge_info ] - class_attr: - op_summary: OpSummary - task_time: TaskTime - msprof: Msprof - ge_info: GeInfo - file_attr: - op_summary: [ kernel_details.csv, '^op_summary_\d+_\d+_\d+_\d{14}\.csv$'] - task_time: ^task_time_\d+_\d+_\d+_\d{14}\.json$ - msprof: ^msprof_\d+_\d+_\d+_\d{14}\.json$ - ge_info: ge_info.db - - - version: 6.3.RC2 - dirs_pattern: - ASCEND_PROFILER_OUTPUT: [ op_summary ] - ^PROF_\d{6}_\d{17}_\w+$: - ^device_\d+$: - summary: - [ op_summary ] - timeline: - [ msprof, task_time ] - host: - sqlite: - [ ge_info ] - class_attr: - op_summary: OpSummary - task_time: TaskTime - msprof: Msprof - ge_info: GeInfo - file_attr: - op_summary: [ kernel_details.csv, '^op_summary_\d+_\d+\.csv$'] - task_time: ^task_time_\d+_\d+\.json$ - msprof: ^msprof_\d+_\d+\.json$ - ge_info: ge_info.db diff --git a/profiler/advisor/dataset/__init__.py b/profiler/advisor/dataset/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/dataset/ai_core_freq/__init__.py b/profiler/advisor/dataset/ai_core_freq/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/dataset/ai_core_freq/ai_core_freq_dataset.py b/profiler/advisor/dataset/ai_core_freq/ai_core_freq_dataset.py deleted file mode 100644 index c99baea6564eeae5efd9585342d1f40a40ea745d..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/ai_core_freq/ai_core_freq_dataset.py +++ /dev/null @@ -1,148 +0,0 @@ -import json -import logging -import math -import os -import traceback - -import ijson -from tqdm import tqdm - -from profiler.advisor.common import constant as const -from 
profiler.advisor.common.timeline.event import TimelineEvent -from profiler.advisor.utils.utils import get_file_path_from_directory -from profiler.advisor.utils.utils import convert_to_float, parse_json_with_generator -from profiler.advisor.dataset.profiling.device_info import DeviceInfoParser -from profiler.advisor.config.config import Config - -logger = logging.getLogger() - - -class AICoreFreqDataset: - - def __init__(self, collection_path, data: dict, build_dataset=True, **kwargs) -> None: - - self._profiler_step = [] - self._ai_core_ops = [] - self._ai_core_freq: [TimelineEvent] = [] - self._previous_freq_index = -1 - - self.timeline_dir = collection_path - self.timeline_data_list = get_file_path_from_directory(collection_path, - lambda file: file.endswith("trace_view.json")) - - self.step = kwargs.get("step") - self.op_freq = {} - info = DeviceInfoParser(collection_path) - info.parse_data() - if not Config().get_config("aic_frequency"): - return - if self.parse(): - key = self.get_key() - if key not in data: - data[key] = [] - data[key].append(self) - - @property - def profiler_step(self): - return self._profiler_step - - @property - def ai_core_freq(self): - return self._ai_core_freq - - @property - def ai_core_ops(self): - return self._ai_core_ops - - @classmethod - def get_key(cls): - """ - get key of dataset - :return: key - """ - return cls.__module__.rsplit('.', maxsplit=1)[-1] - - def parse(self): - - if len(self.timeline_data_list) == 0: - logger.warning("Please ensure trace_view.json in %s, skip timeline analysis.", self.timeline_dir) - return False - - if len(self.timeline_data_list) > 1: - logger.warning("Found multiple trace_view.json in %s, load the file of device 0 for analysis .", - self.timeline_dir) - - _ = parse_json_with_generator(sorted(self.timeline_data_list)[0], self._add_event) - - target_ai_core_ops = self._get_target_ai_core_ops() - self._get_op_frequency(target_ai_core_ops) - return True - - def _add_profiler_step(self, event): - if event.name.startswith("ProfilerStep"): - self._profiler_step.append(event) - - def _add_ai_core_ops(self, event): - if event.args.get("Task Type") in ["MIX_AIC", "AI_CORE"]: - self._ai_core_ops.append(event) - - def _add_ai_core_freq(self, event): - if event.name == "AI Core Freq": - if self._previous_freq_index != -1: - self._ai_core_freq[self._previous_freq_index]["end"] = event.get("ts", float(math.inf)) - self._previous_freq_index += 1 - event.setdefault("end", float(math.inf)) - self._ai_core_freq.append(event) - - def _add_event(self, index, event): - event["dataset_index"] = index - if not isinstance(event, TimelineEvent): - event = TimelineEvent(event) - - self._add_profiler_step(event) - self._add_ai_core_ops(event) - self._add_ai_core_freq(event) - - return True - - def _get_target_ai_core_ops(self): - target_ai_core_ops = [] - if not self.step or f"ProfilerStep#{self.step}" not in [event.name for event in self._profiler_step]: - target_ai_core_ops = self._ai_core_ops - else: - for step_event in self._profiler_step: - if step_event.name != f"ProfilerStep#{self.step}": - continue - - for ai_core_op_event in self._ai_core_ops: - if step_event.ts_include(ai_core_op_event): - target_ai_core_ops.append(ai_core_op_event) - target_ai_core_ops = sorted(target_ai_core_ops, key=lambda x: float(x.ts)) - return target_ai_core_ops - - def _get_op_frequency(self, ai_core_ops): - ai_core_freq = sorted(self._ai_core_freq, key=lambda x: float(x.ts)) - - op_index, freq_index = 0, 0 - while op_index < len(ai_core_ops) and freq_index < 
len(ai_core_freq): - op_event = ai_core_ops[op_index] - op_end_time = convert_to_float(op_event.ts) + convert_to_float(op_event.dur) - op_freq_list = [] - while freq_index < len(ai_core_freq): - freq_event = ai_core_freq[freq_index] - if convert_to_float(freq_event.end) < op_end_time: - op_freq_list.append(convert_to_float(freq_event.args.MHz)) - freq_index += 1 - continue - elif convert_to_float(freq_event.ts) < op_end_time: - if op_event.name not in self.op_freq: - self.op_freq[op_event.name] = {"count": 0, "dur": 0, "freq_list": []} - self.op_freq[op_event.name]["count"] += 1 - self.op_freq[op_event.name]["dur"] += convert_to_float(op_event.dur) - op_freq_list.append(convert_to_float(freq_event.args.MHz)) - self.op_freq[op_event.name]["freq_list"].append(min(op_freq_list)) - break - else: - break - - op_index += 1 diff --git a/profiler/advisor/dataset/cluster/__init__.py b/profiler/advisor/dataset/cluster/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/dataset/cluster/cluster_dataset.py b/profiler/advisor/dataset/cluster/cluster_dataset.py deleted file mode 100644 index e1163f1cdd84265eb5cc5e356753cad5fa663339..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/cluster/cluster_dataset.py +++ /dev/null @@ -1,165 +0,0 @@ -import logging - -import os - -from profiler.advisor.dataset.dataset import Dataset -from profiler.advisor.utils.utils import singleton -from profiler.cluster_analyse.common_func.file_manager import FileManager -from profiler.advisor.common import constant as const -from profiler.cluster_analyse.common_func.constant import Constant -from collections import defaultdict -from profiler.cluster_analyse.cluster_analysis import Interface -from profiler.advisor.dataset.cluster.cluster_step_trace_time_bean import ClusterStepTraceTimeBean - -logger = logging.getLogger() - - -class ClusterDataset(Dataset): - - def __init__(self, collection_path, data: dict, **kwargs) -> None: - super().__init__(collection_path, data) - - def is_cluster_analysis_output_exist(self): - """ - check whether input path is valid - """ - for file in os.listdir(self.collection_path): - if file == 'cluster_analysis_output': - logger.info("[INFO]Cluster has been analyzed " - "because of the existence of cluster analysis output directory.") - logger.info("[INFO]Skip Cluster analyze backend.") - return True - return False - - def cluster_analyze(self): - if self.is_cluster_analysis_output_exist(): - return - parameter = { - Constant.COLLECTION_PATH: self.collection_path, - Constant.ANALYSIS_MODE: "all" - } - print("[INFO] cluster analysis is in the process, please wait...") - try: - Interface(parameter).run() - except Exception as e: - raise ValueError(f"Cluster analyze backend failed:{e}") from e - - def load_csv_data(self, file_name, dataBean): - csv_path = os.path.join(self.collection_path, const.CLUSTER_ANALYSIS_OUTPUT, file_name) - if not os.path.exists(csv_path): - msg = "[ERROR] cluster_step_trace_time.csv doesn't exist, terminate analysis." - raise RuntimeError(msg) - data = FileManager.read_csv_file(csv_path, dataBean) - return data - - def load_json_data(self, file_name): - json_path = os.path.join(self.collection_path, const.CLUSTER_ANALYSIS_OUTPUT, file_name) - if not os.path.exists(json_path): - msg = "[ERROR] cluster_communication.json doesn't exist, terminate analysis." 
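For orientation, here is a minimal sketch of how the two concrete cluster datasets defined just below (`ClusterStepTraceTimeDataset` and `ClusterCommunicationDataset`) are driven. The import paths come from this patch; `cluster_path` and the surrounding script are illustrative and assume the mstt sources are on `PYTHONPATH` and that the directory holds a multi-rank collection:

```python
from profiler.advisor.dataset.cluster.cluster_dataset import (
    ClusterStepTraceTimeDataset,
    ClusterCommunicationDataset,
)

cluster_path = "/path/to/cluster_profiling_data"  # hypothetical collection directory
data = {}

# Each dataset runs the cluster_analyse backend on first use (cluster_analyze)
# and, when parsing succeeds, registers itself in `data` under its get_key().
step_dataset = ClusterStepTraceTimeDataset(cluster_path, data)
comm_dataset = ClusterCommunicationDataset(cluster_path, data)

print(step_dataset.get_data())  # rank -> [compute, communication, free]
print(comm_dataset.get_data())  # rank -> RDMA/SDMA sizes, times and bandwidths
```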
- raise RuntimeError(msg) - data = FileManager.read_json_file(json_path) - return data - - -@singleton -class ClusterStepTraceTimeDataset(ClusterDataset): - RANK = "rank" - - def __init__(self, collection_path: str, data: dict, **kwargs): - self._step_dict = defaultdict() - super().__init__(collection_path, data) - - def _parse(self): - self.cluster_analyze() - try: - step_data = self.load_csv_data(const.CLUSTER_STEP_TIME_CSV, ClusterStepTraceTimeBean) - except RuntimeError as e: - print("Caught exception:", e) - self._step_dict = None - return False - self._step_dict = self.format_data(step_data) - return True - - def format_data(self, step_data: list): - step_dict = defaultdict(lambda: [0, 0, 0]) - for step_bean in step_data: - if step_bean.type == self.RANK: - step_dict[step_bean.index][0] += step_bean.compute - step_dict[step_bean.index][1] += step_bean.communication - step_dict[step_bean.index][2] += step_bean.free - return step_dict - - def get_data(self): - return self._step_dict - - -@singleton -class ClusterCommunicationDataset(ClusterDataset): - RDMA_TIME_MS = "RDMA time(ms)" - RDMA_SIZE_MB = "RDMA size(mb)" - SDMA_TIME_MS = "SDMA time(ms)" - SDMA_SIZE_MB = "SDMA size(mb)" - RDMA_BANDWIDTH = "RDMA bandwidth(GB/s)" - SDMA_BANDWIDTH = "SDMA bandwidth(GB/s)" - COMMUNICATION_BANDWIDTH_INFO = "Communication Bandwidth Info" - TRANSIT_TIME = "Transit Time(ms)" - TRANSIT_SIZE = "Transit Size(MB)" - SDMA = "SDMA" - RDMA = "RDMA" - - def __init__(self, collection_path: str, data: dict, **kwargs): - self.rank_bw_dict = defaultdict(lambda: { - self.RDMA_TIME_MS: 0, - self.RDMA_SIZE_MB: 0, - self.SDMA_TIME_MS: 0, - self.SDMA_SIZE_MB: 0, - }) - super().__init__(collection_path, data) - - @staticmethod - def compute_ratio(dividend: float, divisor: float): - if abs(divisor) < 1e-15: - return 0 - else: - return round(dividend / divisor, 4) - - def _parse(self): - self.cluster_analyze() - try: - communication_json = self.load_json_data(const.CLUSTER_COMM_JSON) - except RuntimeError as e: - print("Caught exception:", e) - self.rank_bw_dict = None - return False - self.process(communication_json) - return True - - def process(self, communication_json: dict): - for comm_group, group_dict in communication_json.items(): - for step, step_dict in group_dict.items(): - for op, op_dict in step_dict.items(): - self.compute_bandwidth(op_dict) - - def compute_bandwidth(self, op_dict: dict): - for rank_id, rank_dict in op_dict.items(): - try: - rank = int(rank_id) - except ValueError as e: - msg = "[ERROR] Cluster_communication.json has invalid structure." 
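The bandwidth figures above reduce to a guarded division: transit size (MB) over transit time (ms), which coincides with GB/s in decimal units, rounded to four decimal places. A standalone restatement with invented figures:

```python
def compute_ratio(dividend: float, divisor: float) -> float:
    # Mirrors ClusterCommunicationDataset.compute_ratio: near-zero divisors
    # yield 0 instead of raising ZeroDivisionError.
    if abs(divisor) < 1e-15:
        return 0
    return round(dividend / divisor, 4)

# 5286.4 MB transferred in 48.7 ms -> ~108.55 GB/s (MB/ms equals GB/s for
# decimal units, so no extra conversion factor is needed).
print(compute_ratio(5286.4, 48.7))  # 108.5503
print(compute_ratio(100.0, 0.0))    # 0
```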
- raise ValueError(msg) from e - for comm_type, bw_dict in rank_dict.get(self.COMMUNICATION_BANDWIDTH_INFO, {}).items(): - if comm_type == self.SDMA: - self.rank_bw_dict[rank][self.SDMA_SIZE_MB] += bw_dict.get(self.TRANSIT_SIZE) - self.rank_bw_dict[rank][self.SDMA_TIME_MS] += bw_dict.get(self.TRANSIT_TIME) - if comm_type == self.RDMA: - self.rank_bw_dict[rank][self.RDMA_SIZE_MB] += bw_dict.get(self.TRANSIT_SIZE) - self.rank_bw_dict[rank][self.RDMA_TIME_MS] += bw_dict.get(self.TRANSIT_TIME) - - for rank, rank_dict in self.rank_bw_dict.items(): - self.rank_bw_dict[rank][self.RDMA_BANDWIDTH] = self.compute_ratio( - self.rank_bw_dict[rank][self.RDMA_SIZE_MB], self.rank_bw_dict[rank][self.RDMA_TIME_MS]) - self.rank_bw_dict[rank][self.SDMA_BANDWIDTH] = self.compute_ratio( - self.rank_bw_dict[rank][self.SDMA_SIZE_MB], self.rank_bw_dict[rank][self.SDMA_TIME_MS]) - - def get_data(self): - return self.rank_bw_dict diff --git a/profiler/advisor/dataset/cluster/cluster_step_trace_time_bean.py b/profiler/advisor/dataset/cluster/cluster_step_trace_time_bean.py deleted file mode 100644 index b108fc77a3f3408d48c79ce6b542f98427d88b0b..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/cluster/cluster_step_trace_time_bean.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright (c) 2023, Huawei Technologies Co., Ltd. -# All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -class ClusterStepTraceTimeBean: - STEP = "Step" - TYPE = "Type" - INDEX = "Index" - COMPUTING = "Computing" - COMMUNICATION = "Communication(Not Overlapped)" - FREE = "Free" - - def __init__(self, data: dict): - self._data = data - - @property - def step(self) -> str: - return self._data.get(self.STEP, '') - - @property - def type(self) -> str: - return self._data.get(self.TYPE, '') - - @property - def index(self) -> int: - try: - return int(self._data.get(self.INDEX)) - except ValueError as e: - msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Index'." - raise ValueError(msg) from e - - @property - def compute(self) -> float: - try: - return float(self._data.get(self.COMPUTING, '')) - except ValueError as e: - msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Computing'." - raise ValueError(msg) from e - - @property - def communication(self) -> float: - try: - return float(self._data.get(self.COMMUNICATION, '')) - except ValueError as e: - msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Communication'." - raise ValueError(msg) from e - - @property - def free(self) -> float: - try: - return float(self._data.get(self.FREE, '')) - except ValueError as e: - msg = "[ERROR] Cluster step trace time.csv has invalid value in column 'Free'." 
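For reference, a small sketch of how one `cluster_step_trace_time.csv` row maps onto this bean; the row values are invented, and the import assumes the mstt sources are on `PYTHONPATH`:

```python
from profiler.advisor.dataset.cluster.cluster_step_trace_time_bean import (
    ClusterStepTraceTimeBean,
)

row = {
    "Step": "1",
    "Type": "rank",
    "Index": "0",
    "Computing": "1234.5",
    "Communication(Not Overlapped)": "321.0",
    "Free": "45.6",
}

bean = ClusterStepTraceTimeBean(row)
# Each property converts the raw CSV string on access and raises a labelled
# ValueError when a column cannot be parsed.
print(bean.type, bean.index, bean.compute, bean.communication, bean.free)
# -> rank 0 1234.5 321.0 45.6
```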
- raise ValueError(msg) from e - diff --git a/profiler/advisor/dataset/dataset.py b/profiler/advisor/dataset/dataset.py deleted file mode 100644 index 7f1e40a38b8a4a26585eecfe6271cc75ea054d2d..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/dataset.py +++ /dev/null @@ -1,38 +0,0 @@ -""" -dataset module -""" -import logging -import os - -from profiler.advisor.config.config import Config - -logger = logging.getLogger() - - -class Dataset: - """ - :param collection_path: dataSet absolute path - dataset base class - """ - - def __init__(self, collection_path, data=None) -> None: - if data is None: - data = {} - self.collection_path = os.path.abspath(os.path.join(Config().work_path, collection_path)) - logger.debug("init %s with %s", self.__class__.__name__, self.collection_path) - if self._parse(): - key = self.get_key() - if key not in data: - data[key] = [] - data[key].append(self) - - def _parse(self): - return None - - @classmethod - def get_key(cls): - """ - get key of dataset - :return: key - """ - return cls.__name__.rsplit('.', maxsplit=1)[-1] diff --git a/profiler/advisor/dataset/graph_dataset.py b/profiler/advisor/dataset/graph_dataset.py deleted file mode 100644 index 951de7fd26b1f986d25285547e63b1a420968249..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/graph_dataset.py +++ /dev/null @@ -1,53 +0,0 @@ -import logging -from typing import List - -from profiler.advisor.dataset.dataset import Dataset -from profiler.advisor.common.graph.graph_parser import HostGraphParser -from profiler.advisor.common.graph.graph import Graph -from profiler.advisor.utils.utils import load_parameter, lazy_property, get_file_path_from_directory - -logger = logging.getLogger() - - -class GraphDataset(Dataset): - """ - data directory dataset - """ - FILE_PATTERN = "ATT_ADVISOR_GRAPH_FILE" - - def __init__(self, collection_path, data: dict = None, **kwargs) -> None: - self.graph_files: List[HostGraphParser] = [] - super().__init__(collection_path, data) - - def _parse(self): - graph_list = get_file_path_from_directory(self.collection_path, - lambda file: file.endswith( - load_parameter(self.FILE_PATTERN, "_Build.txt"))) - - for graph_file_path in graph_list[-1:]: - logger.info("Prepare to parse %s as default graph.", graph_file_path) - graph_file = HostGraphParser(graph_file_path) - self.graph_files.append(graph_file) - return self.graph_files - - @lazy_property - def graphs(self) -> List[Graph]: - """ - get a list of graphs - return: List[Graph] - """ - graphs = [] - for parser in self.graph_files: - graph = Graph(nodes=parser.nodes, - edges=parser.edges, - name="Default") - graph.build() - graphs.append(graph) - graphs.sort(key=lambda g: g.name) - if len(self.graph_files) >= 1: - del self.graph_files[0] # remove previous useless data - return graphs - - def is_empty(self) -> bool: - """check empty graph dataset""" - return len(self.graph_files) == 0 diff --git a/profiler/advisor/dataset/profiling/__init__.py b/profiler/advisor/dataset/profiling/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/dataset/profiling/builder_base.py b/profiler/advisor/dataset/profiling/builder_base.py deleted file mode 100644 index 2bfe14f9462b701db2a4ede1d539a07659f48ae8..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/profiling/builder_base.py +++ /dev/null @@ -1,39 +0,0 @@ -""" -profiling base -""" -import logging -from typing import Dict, List - 
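The `Dataset` base class above fixes a small registration contract: a subclass implements `_parse()`, and every instance that parses successfully is appended to `data[<key>]`, where the key comes from `get_key()`. A hypothetical minimal subclass, assuming the advisor package and its config are importable:

```python
from profiler.advisor.dataset.dataset import Dataset


class DummyDataset(Dataset):  # illustrative subclass, not part of this patch
    def _parse(self):
        # Real subclasses inspect files under self.collection_path here and
        # return a truthy value only when usable data was found.
        return True


data = {}
DummyDataset(".", data)
print(list(data.keys()))  # ['DummyDataset']
```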
-from profiler.advisor.dataset.profiling.profiling_parser import ProfilingParser -from profiler.advisor.utils.utils import join_prof_path - -logger = logging.getLogger() - - -class ProfilingBuilderBase: - """ - profiling base - """ - DATA_LIST: List[Dict] = [] - - def __init__(self, path) -> None: - self._path = path - - def parse_data(self) -> bool: - """ - parse data for file in data_dir - """ - if isinstance(self, ProfilingParser): - return True - ret = False - for data in self.DATA_LIST: - class_name = data.get("class_name") - if class_name is not None: - if data.get("subdir_name"): - data_class = data.get("class_name")(join_prof_path(self._path, data.get("subdir_name"))) - else: - data_class = data.get("class_name")(self._path) - if data_class.parse_data(): - setattr(self, str(data.get("attr_name")), data_class) - ret = True - return ret diff --git a/profiler/advisor/dataset/profiling/db_manager.py b/profiler/advisor/dataset/profiling/db_manager.py deleted file mode 100644 index c9fb73c7cf69d94c3ca1aba8c726f574d63cd1a3..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/profiling/db_manager.py +++ /dev/null @@ -1,70 +0,0 @@ -""" -connection manager -""" -import os -import re -from typing import List - -from sqlalchemy import MetaData, create_engine - - -class ConnectionManager: - """ - Connection Manager - """ - - def __init__(self, path, db_name): - self.db_path = os.path.join(path, db_name) - self.connection = create_engine(f'sqlite:///{self.db_path}') - self.metadata = MetaData() - self.metadata.reflect(bind=self.connection) - - def __call__(self, *args, **kwargs): - return self.connection - - @staticmethod - def check_db_exists(db_path:str, dbs:List) -> bool: - """ - check db exists - """ - if not os.path.isdir(db_path): - return False - for prof_db in dbs: - if not os.access(db_path, os.R_OK) or prof_db not in os.listdir(db_path): - return False - return True - - def check_table_exists(self, tables:List) -> bool: - """ - check table exists - """ - for table in tables: - if table not in self.metadata.tables: - return False - return True - - def check_column_exists(self, table_name:str, columns:List) -> bool: - """ - check column exists - """ - if table_name not in self.metadata.tables: - return False - for column in columns: - if column not in self.metadata.tables[table_name].columns: - return False - return True - - @classmethod - def get_connection(cls, path, dbs, tables=None, is_host=False): - """ - get connection - """ - if is_host: - pattern = r"/device_[0-9]" - path = re.sub(pattern, "/host", path) - if not cls.check_db_exists(path, dbs): - return None - conn = cls(path, dbs) - if tables and not conn.check_table_exists(tables): - return None - return conn diff --git a/profiler/advisor/dataset/profiling/device_info.py b/profiler/advisor/dataset/profiling/device_info.py deleted file mode 100644 index 110cd0794c6cb153644b9d2e59c7d0793eb280b4..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/profiling/device_info.py +++ /dev/null @@ -1,63 +0,0 @@ -""" -profiling info -""" -import json -import logging - -from profiler.advisor.config.config import Config -from profiler.advisor.utils.utils import get_file_path_from_directory - -logger = logging.getLogger() - - -class DeviceInfoParser: - """ - profiling info - device_id: device name/ID info - "aiv_num": number of AI Vector cores - "ai_core_num": number of AI Cores - """ - DATA_LIST = [] - - def __init__(self, path) -> None: - self._path = path - - def parse_data(self) -> bool: - """ - parse profiling data - :return: true 
for success or false - """ - file_list = get_file_path_from_directory(self._path, lambda x: x.startswith("info.json.")) - if not file_list: - return False - for info in file_list: - if self._parse(info): - return True - return False - - @staticmethod - def _parse(info_file: str) -> bool: - if info_file.endswith("done"): - return False # skip info.json.0.done - try: - with open(info_file, encoding="utf-8") as file: - info = json.load(file) - except (IOError, ValueError) as error: - logger.error("Parse json info file %s failed : %s", info_file, error) - return False - if "DeviceInfo" not in info: - logger.error("No device info in json info file %s", info_file) - return False - config = Config() - for device_info in info["DeviceInfo"]: - if "id" in device_info: - config.set_config("device_id", device_info["id"]) - if "aiv_num" in device_info: - config.set_config("aiv_num", device_info["aiv_num"]) - if "aic_frequency" in device_info: - config.set_config("aic_frequency", device_info["aic_frequency"]) - if "ai_core_num" in device_info: - config.set_config("ai_core_num", device_info["ai_core_num"]) - return True - logger.error("No ai_core_num in json info file %s", info_file) - return False diff --git a/profiler/advisor/dataset/profiling/info_collection.py b/profiler/advisor/dataset/profiling/info_collection.py deleted file mode 100644 index b1f84313bb7980ea2186d2727db51b5fba49e12e..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/profiling/info_collection.py +++ /dev/null @@ -1,270 +0,0 @@ -""" -profiling info -""" -import decimal -import logging - -from profiler.advisor.utils.utils import lazy_property - -logger = logging.getLogger() - - -class Info: - """ - op info - """ - _attr_pre_fix_list = [""] - - def add_attr(self, key: str, value: str): - """ - add attr to op info - :param key: op info key - :param value: op info value - :return: None - """ - if not key or hasattr(self, key): - return - setattr(self, key, value) - - def has_attr(self, key: str, strict_mode=False): - """ - check if op info has attr key - :param key: attr key - :return: true or false - """ - if strict_mode: - return hasattr(self, key) - for prefix in self._attr_pre_fix_list: - attr = prefix + key - if hasattr(self, attr): - return True - return False - - def get_attr(self, key, strict_mode=False): - """ - get attr value by key - :param key: attr key - :return: attr value - """ - if strict_mode: - if hasattr(self, key): - return getattr(self, key) - else: - for prefix in self._attr_pre_fix_list: - attr = prefix + key - if key.startswith("mac") and prefix == "aiv_": - # e.g mac_ratio must match aic_mac_ratio, not aiv_mac_ratio - continue - if key.startswith("vec") and prefix == "aic_": - # e.g vec_ratio must match aiv_vec_ratio, not aic_vec_ratio - continue - if hasattr(self, attr): - return getattr(self, attr) - return "" - - def get_float_attr(self, attr, strict_mode=False): - """ - get attr value by key - :param key: attr key - :return: attr value - """ - try: - return float((self.get_attr(attr, strict_mode))) - except (ValueError, FloatingPointError): - pass - return 0 - - def get_decimal_attr(self, attr, strict_mode=False): - """ - get attr value by key - :param key: attr key - :return: attr value - """ - try: - return decimal.Decimal((self.get_attr(attr, strict_mode))) - except (ValueError, decimal.InvalidOperation): - pass - return decimal.Decimal(0) - - def get_attrs(self) -> dict: - """ - get attr list - :return: attr list - """ - return self.__dict__ - - -class OpInfo(Info): - """ - summary 
info - """ - - _attr_pre_fix_list = ["", "aic_", "aiv_"] - _mac_ratio_attrs = ["mac_ratio", "mac_fp16_ratio", "mac_int8_ratio", "aic_mac_ratio"] - _aicore_time_key = ["aicore_time", "aiv_time"] - _total_cycles_key = ["total_cycles", "aic_total_cycles", "aiv_total_cycles"] - - def __lt__(self, other): - return self.get_float_attr("task_start_time") < other.get_float_attr("task_start_time") - - @lazy_property - def is_cube_op(self) -> bool: - """ - check type of operator if cube or not - """ - for attr in self._mac_ratio_attrs: - if hasattr(self, attr): - try: - if float(getattr(self, attr)) > 0: - if hasattr(self, "ffts_type") and getattr(self, "ffts_type") == "1": - logger.warning( - "ffts type of op %s is vector buf mac ratio is not 0", getattr(self, "op_name") - ) - return True - except ValueError: - pass - # not cube op - if hasattr(self, "ffts_type") and getattr(self, "ffts_type") == "0": - logger.warning("ffts type of op %s is cube but mac ratio is 0", getattr(self, "op_name")) - return False - - @lazy_property - def has_mac_ratio(self) -> bool: - """ - check if op_info has mac ratio - """ - for attr in self._mac_ratio_attrs: - if attr in self.__dict__: - return True - return False - - def attr_sum(self, attr_list): - """sum of a list attrs""" - total = 0 - for attr in attr_list: - total += self.get_float_attr(attr, strict_mode=True) - return total - - def get_aicore_time(self): - """ - get sum of aicore time and ai vector core time - """ - return self.attr_sum(self._aicore_time_key) - - def get_total_cycles(self): - """ - get sum of total cycle for aicore and ai vector core - """ - return self.attr_sum(self._total_cycles_key) - - -class TaskInfo: - """ - task info - """ - EVENT_TYPE = {"metadata": ['M'], "duration": ['B', 'E'], "complete": ['X'], 'flow': ['s', 't', 'f']} - - def __init__(self, content: dict) -> None: - self._name = content.get("name", "") - self._pid = content.get("pid", 0) - self._tid = content.get("tid", 0) - self._start_time = float(content.get("ts", 0.0)) - self._dur = float(content.get("dur", 0.0)) - self._args = content.get("args", {}) - self._cat = content.get("cat", "") - self._id = content.get("id", "") - - @property - def pk_id(self): - """ - get id - :return: id - """ - return self._id - - @property - def pid(self): - """ - get pid - :return: pid - """ - return self._pid - - @property - def tid(self): - """ - get tid - :return: tid - """ - return self._tid - - @property - def task_type(self): - """ - get pid - :return: pid - """ - return self._args.get("Task Type", "NA") - - @property - def start_time(self): - """ - get starttime - :return: starttime - """ - return self._start_time - - @property - def end_time(self): - """ - get endtime - :return: endtime - """ - return self._start_time + self._dur - - @property - def dur(self): - """ - get duration - :return: duration - """ - return self._dur - - @property - def name(self): - """ - get task name - :return: task name - """ - return self._name - - @property - def stream_id(self): - """ - get stream_id - :return: steram id - """ - return self._args.get("Stream Id", "NA") - - @property - def task_id(self): - """ - get task id - :return: task_id - """ - return self._args.get("Task Id", "NA") - - @property - def args(self): - """ - get args of task - :return: args - """ - return self._args - - @property - def cat(self): - """ - get category of task - """ - return self._cat diff --git a/profiler/advisor/dataset/profiling/profiling_dataset.py b/profiler/advisor/dataset/profiling/profiling_dataset.py deleted file 
mode 100644 index ebd90951abf5290d376efd13c257b90878343381..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/profiling/profiling_dataset.py +++ /dev/null @@ -1,86 +0,0 @@ -import logging -import os - -import yaml -from profiler.advisor.common import constant -from profiler.advisor.common.profiling.ge_info import GeInfo -from profiler.advisor.common.profiling.msprof import Msprof -from profiler.advisor.common.profiling.op_summary import OpSummary -from profiler.advisor.common.profiling.tasktime import TaskTime -from profiler.advisor.dataset.dataset import Dataset -from profiler.advisor.dataset.profiling.device_info import DeviceInfoParser -from profiler.advisor.utils.utils import join_prof_path -from profiler.cluster_analyse.common_func.file_manager import FileManager - - -logger = logging.getLogger() - - -class ProfilingDataset(Dataset): - PROF_TYPE = "" - - def __init__(self, collection_path, data: dict, **kwargs) -> None: - self.cann_version = kwargs.get("cann_version", constant.DEFAULT_CANN_VERSION) - self.PROF_TYPE = kwargs.get("profiling_type", constant.DEFAULT_PROFILING_TYPE) - self.patterns = self.parse_pattern() - self.current_version_pattern = self.get_current_version_pattern() - super().__init__(collection_path, data) - - def _parse(self): - info = DeviceInfoParser(self.collection_path) - if info.parse_data(): - self._info = info - ret = False - if self.current_version_pattern is not None: - self.build_from_pattern(self.current_version_pattern["dirs_pattern"], self.collection_path) - ret = True - - return ret - - def build_from_pattern(self, dirs_pattern, current_path): - if isinstance(dirs_pattern, dict): - for key, value in dirs_pattern.items(): - self.build_from_pattern(value, join_prof_path(current_path, key)) - elif isinstance(dirs_pattern, list): - for item in dirs_pattern: - if hasattr(self, item) and getattr(self, item): - # avoid rebuilding the data objects for kernel_details.csv and op_summary.csv - continue - file_pattern_list = self.current_version_pattern.get('file_attr').get(item) - data_class = globals()[self.current_version_pattern.get('class_attr').get(item)] - if not hasattr(data_class, "file_pattern_list"): - continue - setattr(data_class, "file_pattern_list", self.current_version_pattern.get('file_attr').get(item)) - data_object = data_class(current_path) - is_success = data_object.parse_data() - if is_success: - setattr(self, item, data_object) - else: - logger.info("Skip parse %s with file pattern %s from local path %s", - self.current_version_pattern.get('class_attr').get(item), file_pattern_list, current_path) - else: - logger.warning("Unsupported arguments : %s to build %s", dirs_pattern, self.__class__.__name__) - - def get_current_version_pattern(self): - for version_config_dict in self.patterns['versions']: - if version_config_dict['version'] == self.cann_version: - return version_config_dict - return dict() - - def parse_pattern(self, config_path="config/profiling_data_version_config.yaml"): - - if not os.path.isabs(config_path): - config_path = os.path.join(os.path.dirname(__file__), - "../", "../", config_path) - - if not os.path.exists(config_path): - logger.warning("Skip parse profiling dataset, because %s does not exist.", config_path) - return [] - - patterns = FileManager.read_yaml_file(config_path) - - return patterns - - def collection_path(self): - """collection_path""" - return self.collection_path diff --git a/profiler/advisor/dataset/profiling/profiling_parser.py b/profiler/advisor/dataset/profiling/profiling_parser.py deleted file mode 100644 
index 51996617c2b83a3a1e4d1f873140957c8ff68b51..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/profiling/profiling_parser.py +++ /dev/null @@ -1,137 +0,0 @@ -import csv -import json -import os -import re -from typing import List, Dict - -from profiler.advisor.dataset.profiling.info_collection import logger -from profiler.advisor.utils.utils import get_file_path_from_directory, SafeOpen, format_excel_title - - -class ProfilingParser: - """ - profiling - """ - FILE_PATTERN_MSG = "" - FILE_INFO = "" - - file_pattern_list = [] - - def __init__(self, path: str) -> None: - self._path = path - self._raw_data: List[List[str]] = [] - self._filename = "" - - @staticmethod - def file_match_func(pattern): - """file match function""" - return lambda x: re.search(re.compile(pattern), x) - - def parse_data(self) -> bool: - """ - parse task time file - :return: true or false - """ - if self._parse_from_file(): - return True - return False - - def _parse_from_file(self): - - if not isinstance(self.file_pattern_list, list): - self.file_pattern_list = [self.file_pattern_list] - - for file_pattern in self.file_pattern_list: - file_list = get_file_path_from_directory(self._path, self.file_match_func(file_pattern)) - if not file_list: - continue - # get last file - target_file = file_list[-1] - if len(file_list) > 1: - logger.warning("Multiple copies of %s were found, use %s", self.FILE_INFO, target_file) - return self.parse_from_file(target_file) - return False - - @staticmethod - def get_float(data) -> float: - """ - get float or 0.0 - """ - try: - return float(data) - except (FloatingPointError, ValueError): - return 0.0 - - def parse_from_file(self, file): - """ - parse from file - """ - return False - - @staticmethod - def _check_csv_file_format(csv_file_name: str, csv_content: List[List[str]]): - if not csv_content: - logger.error("%s is empty", csv_file_name) - return False - return True - - def _parse_csv(self, file, check_csv=True) -> bool: - logger.debug("Parse file %s", file) - self._filename = os.path.splitext(os.path.basename(file))[0] - with SafeOpen(file, encoding="utf-8") as csv_file: - try: - csv_content = csv.reader(csv_file) - for row in csv_content: - self._raw_data.append(row) - if check_csv and not self._check_csv_file_format(file, self._raw_data): - logger.error("Invalid csv file : %s", file) - return False - except OSError as error: - logger.error("Read csv file failed : %s", error) - return False - - if not csv_file: - return False - if not self._raw_data: - logger.warning("File %s has no content", file) - return False - return True - - def _parse_json(self, file) -> bool: - logger.debug("Parse file %s", file) - self._filename = os.path.splitext(os.path.basename(file))[0] - try: - with open(file, encoding="utf-8") as json_file: - self._raw_data = json.load(json_file) - except (OSError, ValueError) as error: - logger.error("Parse json file %s failed : %s", file, error) - return False - return True - - def get_raw_data(self): - """ - get raw file name and data - """ - return self._filename, self._raw_data - - @staticmethod - def _get_csv_title(data: List, number=0, title_index=0): - """ - number = 0 replace (us) (ns).. 
- other replace " " to "_" - title_index: position of title default 0 - """ - title_dict: Dict[int, str] = {} - for idx, title in enumerate(data[title_index]): - if number == 0: - title_dict[idx] = format_excel_title(title) - else: - title_dict[idx] = title.replace(" ", "_") - return title_dict - - @property - def path(self): - """ - path - """ - return self._path diff --git a/profiler/advisor/dataset/timeline_event_dataset.py b/profiler/advisor/dataset/timeline_event_dataset.py deleted file mode 100644 index 1504e65f54fd32398e6873de267992e59606fe4d..0000000000000000000000000000000000000000 --- a/profiler/advisor/dataset/timeline_event_dataset.py +++ /dev/null @@ -1,329 +0,0 @@ -import json -import logging -import os -from typing import List, Any -import traceback - -import ijson -from tqdm import tqdm -import yaml - -from profiler.advisor.common import constant as const -from profiler.advisor.common.timeline.event import TimelineEvent -from profiler.advisor.utils.utils import get_file_path_from_directory, check_path_valid, singleton -from profiler.cluster_analyse.common_func.file_manager import FileManager - -logger = logging.getLogger() - - -class OpCompileCollector: - def __init__(self): - self._total_op_compile_counter = 0 - self._total_op_compile_time = 0.0 - - @property - def total_time(self): - return self._total_op_compile_time - - @property - def total_count(self): - return self._total_op_compile_counter - - def is_empty(self): - return self._total_op_compile_counter == 0 - - def update(self, event: TimelineEvent): - self._total_op_compile_time += float(event.dur) - self._total_op_compile_counter += 1 - - def unset(self): - self._total_op_compile_counter = 0 - self._total_op_compile_time = 0.0 - - -class SynchronizeStreamCollector: - - def __init__(self): - self._synchronize_stream_count = 0 - self._slow_synchronize_stream = [] - self.rule = SynchronizeStreamCollector._load_rule() - - @property - def total_count(self): - return self._synchronize_stream_count - - @property - def slow_synchronize_stream(self): - return self._slow_synchronize_stream - - @staticmethod - def _load_rule(): - sync_stream_rule_path = os.path.join(os.path.dirname(os.path.dirname(os.path.realpath(__file__))), "rules", - "synchronize.yaml") - - sync_stream_rule = FileManager.read_yaml_file(sync_stream_rule_path) - return sync_stream_rule - - def update_sync_stream_count(self): - self._synchronize_stream_count += 1 - - def append_slow_sync_stream(self, event): - if float(event.dur) / 1000 >= self.rule.get("slow_synchronize_threshold", 10): - self._slow_synchronize_stream.append(event) - - def unset(self): - self._synchronize_stream_count = 0 - self._slow_synchronize_stream = [] - - -@singleton -class TimelineEventDataset: - - def __init__(self, collection_path, data: dict, build_dataset=True, **kwargs) -> None: - self._ops_with_task_type = {} - self._ops_with_stack = {} - self._ops_compile = OpCompileCollector() - self._torch_to_npu = {} - self._acl_to_npu = set() - self._aten: List[Any] = [] - self._optimizer: List[Any] = [] - self._dataloader: List[Any] = [] - self._sync_batchnorm: List[Any] = [] - self._synchronize_stream = SynchronizeStreamCollector() - self.timeline_dir = collection_path - self.timeline_data_list = get_file_path_from_directory(collection_path, - lambda file: file.endswith("trace_view.json")) - self.dataset_len = None - self.analysis_mode = kwargs.get("analysis_mode") - self.task_type = kwargs.get("task_type") - - if not build_dataset: - return - - if self.parse(): - key = self.get_key() 
- if key not in data: - data[key] = [] - data[key].append(self) - - if self.analysis_mode in ["op_stack", "all"]: - self._task_op_names = list(set([event_key.split("-")[0] for event_key in self._ops_with_task_type.keys()])) - - self._post_process() - - @property - def ops_with_stack(self): - return self._ops_with_stack - - @property - def ops_compile(self): - return self._ops_compile - - @property - def torch_to_npu(self): - return self._torch_to_npu - - @property - def acl_to_npu(self): - return self._acl_to_npu - - @property - def ops_with_task_type(self): - return self._ops_with_task_type - - @property - def task_op_names(self): - return self._task_op_names - - @property - def optimizer(self): - return self._optimizer - - @property - def aten(self): - return self._aten - - @property - def dataloader(self): - return self._dataloader - - @property - def sync_batchnorm(self): - return self._sync_batchnorm - - @property - def synchronize_stream(self): - return self._synchronize_stream - - @classmethod - def get_key(cls): - """ - get key of dataset - :return: key - """ - return cls.__module__.rsplit('.', maxsplit=1)[-1] - - def parse(self): - - if len(self.timeline_data_list) == 0: - logger.warning("Please ensure trace_view.json in %s, skip timeline analysis.", self.timeline_dir) - return False - - if len(self.timeline_data_list) > 1: - logger.warning("Found multiple trace_view.json in %s, load the file of device 0 for analysis .", - self.timeline_dir) - - result = self.parse_data_with_generator(self._add_event) - - if not self.dataset_len: - self.dataset_len = len(result) - return True - - def parse_data_with_generator(self, func): - result = [] - timeline_data_path = sorted(self.timeline_data_list)[0] - if not check_path_valid(timeline_data_path): - return result - - try: - with open(timeline_data_path, "r") as f: - for i, event in tqdm(enumerate(ijson.items(f, "item")), - leave=False, ncols=100, desc="Building dataset for timeline analysis", - total=self.dataset_len): - func_res = func(index=i, event=event) - if func_res is not None: - result.append(func_res) - - except Exception: - logger.warning("Error %s while parsing file %s, continue to timeline analysis", traceback.format_exc(), - timeline_data_path) - return result - - def _add_ops_with_task_type(self, event): - key = f"{event.name}-{event.ts}" - self._ops_with_task_type[key] = TimelineEvent( - { - const.TASK_TYPE: event.args.get(const.TASK_TYPE), - "task_id": event.args.get("Task Id"), - "tid": event.tid, - "name": event.name, - "ts": str(event.ts) - } - ) - - def _add_ops_with_stack(self, event): - self._ops_with_stack[str(event.ts)] = TimelineEvent({"name": event.name, "dataset_index": event.dataset_index}) - - def _add_torch_to_npu(self, event): - key = f"{event.ph}-{event.id}" - self._torch_to_npu[key] = TimelineEvent({"tid": event.tid, "ts": str(event.ts)}) - - def _add_acl_to_npu(self, event): - # op with task type equals to ai_cpu which derived from acl_to_npu do not have stacks - self._acl_to_npu.add(str(event.ts)) - - def _add_op_compile(self, event: TimelineEvent): - if event.name == const.OP_COMPILE_NAME or event.args.get("id") == const.OP_COMPILE_ID: - self._ops_compile.update(event) - - def _add_optimizer(self, event: TimelineEvent): - self._optimizer.append(TimelineEvent({"name": event.name, "dataset_index": event.dataset_index})) - - def _add_aten(self, event: TimelineEvent): - self._aten.append(TimelineEvent({ - "name": event.name, "dataset_index": event.dataset_index, "ts": event.ts, "dur": event.dur - })) - - def 
_add_dataloader(self, event: TimelineEvent):
-        if "dataloader" in event.name.lower():
-            self._dataloader.append(TimelineEvent({
-                "name": event.name, "dataset_index": event.dataset_index, "ts": event.ts, "dur": event.dur,
-                "stack": event.args.get("Call stack")
-            }))
-
-    def _add_sync_batchnorm(self, event: TimelineEvent):
-        if event.name.lower() == "syncbatchnorm":
-            self._sync_batchnorm.append(TimelineEvent({
-                "name": event.name, "dataset_index": event.dataset_index, "ts": event.ts, "dur": event.dur
-            }))
-
-    def _add_synchronize(self, event: TimelineEvent):
-        if event.name.startswith(const.SYNC_STREAM):
-            self._synchronize.append(TimelineEvent({
-                "name": event.name, "ts": event.ts, "dur": event.dur
-            }))
-
-    def _add_specific_operator(self, event):
-        # for analysis of operator aclOpCompile, enable jit_compile=False
-        self._add_op_compile(event)
-        # for analysis of slow dataloader.__next__
-        self._add_dataloader(event)
-        # for analysis of syncBatchNorm operator, prompt users to replace source code of torch_npu's syncbn
-        self._add_sync_batchnorm(event)
-
-    def _add_event(self, index, event):
-        event["dataset_index"] = index
-        if not isinstance(event, TimelineEvent):
-            event = TimelineEvent(event)
-
-        self._add_specific_operator(event)
-
-        if self.analysis_mode == "fusion_ops":
-            self._add_event_for_fusion_ops(event)
-        elif self.analysis_mode == "op_stack":
-            self._add_event_for_op_stack(event)
-        else:
-            self._add_event_for_fusion_ops(event)
-            self._add_event_for_op_stack(event)
-        return True
-
-    def _add_event_for_fusion_ops(self, event):
-        if event.name.lower().startswith(f"{const.ATEN}{const.ATEN_SEP}") or event.name.lower().startswith(
-                f"{const.NPU}{const.ATEN_SEP}"):
-            self._add_aten(event)
-            return
-
-        # check CANN-level synchronize ops; the time window is used to index back to the
-        # host-side aten operator so that a code stack can be reported
-        if event.name.startswith(const.SYNC_STREAM):
-            self._add_aten(event)
-
-        if event.name.startswith(f"{const.OPTIMIZER}.{const.OPTIMIZER_STEP}{const.OPTIMIZER_SEP}"):
-            self._add_optimizer(event)
-            return
-
-    def _add_event_for_op_stack(self, event):
-        if event.name.lower() == const.TORCH_TO_NPU:
-            self._add_torch_to_npu(event)
-            return
-
-        if event.args.get(const.CALL_STACKS):
-            self._add_ops_with_stack(event)
-            return
-
-        if event.args.get(const.TASK_TYPE) and event.args.get(const.TASK_TYPE) in [const.AI_CORE, const.AI_CPU]:
-            self._add_ops_with_task_type(event)
-            return
-
-        if event.name and event.ts and event.name == const.ACL_TO_NPU:
-            self._add_acl_to_npu(event)
-            return
-
-    def _post_process(self):
-        # eliminate sub aten operators of the first-level aten operators by 'ts' and 'dur',
-        # keeping the first-level aten operators contiguous
-        formatted_atens = []
-        for event in sorted(self._aten, key=lambda x: x.get("ts", -1)):
-            if event.name.startswith(const.ATEN):
-                if not formatted_atens or not formatted_atens[-1].ts_include(event):
-                    formatted_atens.append(event)
-
-            elif event.name.startswith(const.SYNC_STREAM):
-                self._synchronize_stream.update_sync_stream_count()
-                if formatted_atens and formatted_atens[-1].ts_include(event):
-                    # reuse the enclosing aten operator's index to look up its code stack
-                    event["dataset_index"] = formatted_atens[-1].get("dataset_index")
-                    self._synchronize_stream.append_slow_sync_stream(event)
-
-            else:
-                continue
-        self._aten = formatted_atens
diff --git a/profiler/advisor/display/__init__.py b/profiler/advisor/display/__init__.py
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/profiler/advisor/display/html/__init__.py b/profiler/advisor/display/html/__init__.py
deleted file mode 100644
index
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/display/html/render.py b/profiler/advisor/display/html/render.py deleted file mode 100644 index 3984fa8f34f0858a7281c9b51caaa43a170baf86..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/render.py +++ /dev/null @@ -1,44 +0,0 @@ -import os -import logging -from typing import List, Dict -from collections import defaultdict - -from jinja2 import Environment, FileSystemLoader -from profiler.advisor.common import constant - -from profiler.advisor.config.config import Config -from profiler.advisor.utils.utils import singleton, safe_write - -logger = logging.getLogger() - - -@singleton -class HTMLRender: - def __init__(self): - self.html = "" - self.render_list = defaultdict(list) - - def render_html(self, template_dir: str = "templates", template_name: str = "main.html", - template_header=constant.DEFAULT_TEMPLATE_HEADER): - self.html = self.render_template("main", template_dir, template_name, render_list=self.render_list, - template_header=template_header) - - def render_template(self, key: str, template_dir: str, template_name: str, **kwargs): - if not os.path.isabs(template_dir): - template_dir = os.path.join(os.path.dirname(__file__), template_dir) - - env = Environment(loader=FileSystemLoader(template_dir), - autoescape=True) - template = env.get_template(template_name) - rendered_html = template.render(**kwargs) - self.render_list[key].append(rendered_html) - return rendered_html - - def save_to_file(self, save_path: str): - if not save_path.endswith(".html"): - logger.error("Skip save html file because file name must endswith `.html`, " - "but got %s.", os.path.basename(save_path)) - return - - safe_write(self.html, save_path) - logger.info("Save suggestion to %s.", os.path.join(Config().work_path, save_path)) diff --git a/profiler/advisor/display/html/templates/__init__.py b/profiler/advisor/display/html/templates/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/display/html/templates/affinity_api.html b/profiler/advisor/display/html/templates/affinity_api.html deleted file mode 100644 index 4d12c3e37536392d122f85fc6ef3a4fcc123ef77..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/affinity_api.html +++ /dev/null @@ -1,50 +0,0 @@ -{% if result|length > 0 %} -
-

Affinity API Issues

-
- The analysis results of the following affinity APIs are based on the runtime env - cann-{{ cann_version }} - and - torch-{{ torch_version }} -
- - {% if empty_stacks %} - Suggestion: - These APIs have no code stack. If parameter 'with_stack=False' was set while profiling, please refer to - Ascend PyTorch Profiler to set - 'with_stack=True'. Otherwise, ignore the following affinity APIs, because their backward broadcast operations lack code stacks. - {% endif %} - - {% for api_name, stacks in result.items() %} - - {% if empty_stacks %} -
{{api_name|safe}}
- - {% else %} - -
{{api_name|safe}}
-
- -
- {% for stack in stacks %} -
No.{{loop.index|safe}} code stack, called {{stack[1]|safe}} times
- - {% endfor %} -
-
- {% endif %} - - {% endfor %} - -
- -
-
-{% endif %} diff --git a/profiler/advisor/display/html/templates/ai_core_frequency.html b/profiler/advisor/display/html/templates/ai_core_frequency.html deleted file mode 100644 index d04514203733b445ecb6ce2b69435ce5a86e353d..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/ai_core_frequency.html +++ /dev/null @@ -1,27 +0,0 @@ -{% if data|length > 0 %} -
-

AI CORE Frequency Issues

-
- Issue: {{ desc }} -
- Suggestion: {{ suggestion }} -

- - - {% for header in headers %} - - {% endfor %} - - - {% for row in data %} - - {% for element in row %} - - {% endfor %} - - {% endfor %} -
{{ header }}
{{ element|safe }}
- -
-
-{% endif %} \ No newline at end of file diff --git a/profiler/advisor/display/html/templates/cluster_analysis.html b/profiler/advisor/display/html/templates/cluster_analysis.html deleted file mode 100644 index 32379d56fcb87a78269612107d1b7634b722d8d8..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/cluster_analysis.html +++ /dev/null @@ -1,49 +0,0 @@ -
-

{{title|safe}}

-
-
- - {% if result.get("Description") %} -
Description
- - {% endif %} - - {% if result.get("Suggestion") %} -
Suggestion
- - {% endif %} - - {% if result.get("details") %} -
Details
-
- {% for item in result.get("details") %} - - - {% for header in item.get("headers") %} - - {% endfor %} - - {% for row in item.get("data") %} - - {% for element in row %} - {% if element is number %} - - {% else %} - - {% endif %} - {% endfor %} - - {% endfor %} -
{{ header }}
{{ element|round(2) }}{{ element }}
- {% endfor %} -
- {% endif %} - -
- -
-
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/compute_analysis.html b/profiler/advisor/display/html/templates/compute_analysis.html deleted file mode 100644 index e1907c091b705969004bf709db24211c66c38107..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/compute_analysis.html +++ /dev/null @@ -1,29 +0,0 @@ -
-

Abnormal Performance Operator

-
- {{table.get("title")}} - - - - {% for header in table.get("headers") %} - - {% endfor %} - - {% for row in table.get("rows") %} - - {% for element in row %} - {% if element is number %} - - {% else %} - - {% endif %} - {% endfor %} - - {% endfor %} -
{{ header }}
{{ element|round(2) }}{{ element }}
- {% if call_stack %} - call stack:
- {{call_stack}} - {% endif %} -
-
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/fusion.html b/profiler/advisor/display/html/templates/fusion.html deleted file mode 100644 index 605a9d748f7d4499a603efb87bc310fab9bc02f3..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/fusion.html +++ /dev/null @@ -1,47 +0,0 @@ -{% if candidates|length > 0 %} -
-

Fusion Issues

-
-
- {% for node in candidates %} -
{{node.op_pass|safe}}
-
- - - - - - - - - - - -
StructureCountsElapsed Time(us)
{{ node.fusion_pattern|safe }}{{ node.counts|safe }}{{ node.total_duration|safe }}
-
- {% for match in node.matches %} -
SubGraph {{ loop.index|safe }}
-
- - - - - - - {% for node in match %} - - - - - - {% endfor %} -
OP NameOP TypeElapsed Time(us)
{{ node.op_name|safe }}{{ node.dtype|safe }}{{ node.duration|safe }}
-
- {% endfor %} -
-
- {% endfor %} -
-
-
-{% endif %} diff --git a/profiler/advisor/display/html/templates/main.html b/profiler/advisor/display/html/templates/main.html deleted file mode 100644 index 3727125b419547fc6a9ac9743eab34e1e1b76256..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/main.html +++ /dev/null @@ -1,203 +0,0 @@ - - - - - - - -
-

Performance Optimization Suggestions

-{% for key, renders in render_list.items() %} - {% if key == 'operator'%} -
-

computation

-
- {% for render in renders %} - {{render|safe}} - {% endfor %} -
-
- {% else %} -
-

{{ key }}

-
- {% for render in renders %} - {{render|safe}} - {% endfor %} -
-
- {% endif %} -{% endfor %} - -
- - - - - \ No newline at end of file diff --git a/profiler/advisor/display/html/templates/operator_ai_cpu.html b/profiler/advisor/display/html/templates/operator_ai_cpu.html deleted file mode 100644 index b3235a88022fc3973ae0098f543d94cc4b7fac25..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/operator_ai_cpu.html +++ /dev/null @@ -1,61 +0,0 @@ -
-

AICPU Issues

-
- - - - - - - - - - - - - -
DescriptionSuggestionElapsed Time(us)Time Ratio
{{ format_result.record.optimization_item.description|safe }}{{ format_result.suggestion|safe }}{{ format_result.task_duration|safe }}{{ format_result.record.statistics_item.task_duration_ratio|safe }}
-
- {% for op_type, op_info in format_result.statistic %} -
{{ op_type|safe }}
-
- - - - - - - - - - - -
Operator TypeCountsElapsed Time(us)
{{ op_info.summary.op_type|safe }}{{ op_info.summary.counts|safe }}{{ op_info.summary.total_duration|safe }}
-
- {% for trace_stack, info in op_info.op_info_list %} -
- {{ info.summary.op_type|safe }} | Input DType:({{info.op_info_list[0].input_data_types|safe}}) | Output DType:({{info.op_info_list[0].output_data_types|safe}}) | Counts:{{ info.summary.counts|safe}} | Elapsed Time(us):{{ - info.summary.total_duration|safe}} -
-
- {% if info.op_info_list[0].suggestions|length > 0 %} -
- {% for suggestion in info.op_info_list[0].suggestions %} -

- Suggestion {{ loop.index|safe }}: {{suggestion|safe}} -

- {% endfor %} -
- {% else %} -

Suggestion 1: Modify code to avoid AICPU operator

- {% endif %} -
- {{ info.op_info_list[0].stack_info|safe }} -
- {% endfor %} -
-
- {% endfor %} -
-
-
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/operator_block_dim.html b/profiler/advisor/display/html/templates/operator_block_dim.html deleted file mode 100644 index 4e2c832f623a4c0a0f315ebdc2b7a97aeb1996a1..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/operator_block_dim.html +++ /dev/null @@ -1,38 +0,0 @@ -
-

Block Dim Issues

-
- - - - - - - - - - - - - -
DescriptionSuggestionElapsed Time(us)Time Ratio
{{ format_result.record.optimization_item.description|safe }}{{ format_result.suggestion|safe }}{{ format_result.task_duration|safe }}{{ format_result.record.statistics_item.task_duration_ratio|safe }}
-
- {% for op_type, op_info in format_result.statistic %} -
{{ op_type|safe }}
-
- - - - - - - - - - - -
Operator TypeCountsElapsed Time(us)
{{ op_info.summary.op_type|safe }}{{ op_info.summary.counts|safe }}{{ op_info.summary.total_duration|safe }}
-
- {% endfor %} -
-
-
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/operator_dispatch.html b/profiler/advisor/display/html/templates/operator_dispatch.html deleted file mode 100644 index c805086354a41f7f98a803b66b3b666c59393899..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/operator_dispatch.html +++ /dev/null @@ -1,37 +0,0 @@ -{% if optimizers|length > 0 %} -
-

Operator Dispatch Issues

-
- - - - - - - {% for optimizer in optimizers %} - - - - - {% endfor %} -
DescriptionSuggestion
{{ optimizer.description |safe }}{{ optimizer.suggestion|safe }}
- - - - - - - - - {% for issue in issues %} - - - - - - {% endfor %} -
IssueCountsElapsed Time(us)
{{ issue.op_name |safe }}{{ issue.counts |safe }}{{ issue.total_time |safe }}
-
- -
-{% endif %} \ No newline at end of file diff --git a/profiler/advisor/display/html/templates/operator_dynamic_shape.html b/profiler/advisor/display/html/templates/operator_dynamic_shape.html deleted file mode 100644 index 59920b6c9ec276c9edddfd1906a31b41fb106e26..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/operator_dynamic_shape.html +++ /dev/null @@ -1,15 +0,0 @@ -
-

Operator Dynamic Shape Issues

-
- - - - - - - - - -
DescriptionSuggestion
{{ format_result.record.optimization_item.description|safe }}{{ format_result.suggestion|safe }}
-
-
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/operator_no_bound.html b/profiler/advisor/display/html/templates/operator_no_bound.html deleted file mode 100644 index cfbd20baad208216d2d9a1ee856702a163a6abfa..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/operator_no_bound.html +++ /dev/null @@ -1,38 +0,0 @@ -
-

Operator No Bound Issues

-
- - - - - - - - - - - - - -
DescriptionSuggestionElapsed Time(us)Time Ratio
{{ format_result.record.optimization_item.description|safe }}{{ format_result.suggestion|safe }}{{ format_result.task_duration|safe }}{{ format_result.record.statistics_item.task_duration_ratio|safe }}
-
- {% for op_type, op_info in format_result.statistic %} -
{{ op_type|safe }}
-
- - - - - - - - - - - -
Operator TypeCountsElapsed Time(us)
{{ op_info.summary.op_type|safe }}{{ op_info.summary.counts|safe }}{{ op_info.summary.total_duration|safe }}
-
- {% endfor %} -
-
-
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/overall_analysis.html b/profiler/advisor/display/html/templates/overall_analysis.html deleted file mode 100644 index ec61ae224ff2da59f2a80a9b4b10117d4c4c7c7a..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/overall_analysis.html +++ /dev/null @@ -1,15 +0,0 @@ -

Model Profiling Time Distribution

- - - {% for header in headers %} - - {% endfor %} - - {% for row in rows %} - - {% for element in row %} - - {% endfor %} - - {% endfor %} -
{{ header }}
{{ element }}
\ No newline at end of file diff --git a/profiler/advisor/display/html/templates/slow_dataloader.html b/profiler/advisor/display/html/templates/slow_dataloader.html deleted file mode 100644 index bf71a7085b70d80d04d76cfb1778029e5fdf9353..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/slow_dataloader.html +++ /dev/null @@ -1,18 +0,0 @@ -
-

Slow Dataloader Issues

-
- {{ desc }} - - - - - - {% for suggestion in suggestions %} - - - - {% endfor %} -
Suggestions
{{ loop.index }}. {{ suggestion|safe }}
- -
-
diff --git a/profiler/advisor/display/html/templates/sync_batchnorm.html b/profiler/advisor/display/html/templates/sync_batchnorm.html deleted file mode 100644 index bb46c1f06d15ed84b4ffa276d614a317c656cf22..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/sync_batchnorm.html +++ /dev/null @@ -1,30 +0,0 @@ - -
-

SyncBatchNorm Issues

-
- {{ desc }} - - - - - {% for item in solutions %} - {% set rowloop = loop %} - {% for key, value in item.items() %} - - - - {% endfor %} - {% endfor %} -
Suggestions
{{ rowloop.index }}. {{ value.desc }}
- - More efficient code of syncbn forward as follows: - {% for item in solutions %} - {% for key, value in item.items() %} - {% if 'efficient_code' in value %} -
{{ value.efficient_code|safe }}
- {% endif %} - {% endfor %} - {% endfor %} - -
-
diff --git a/profiler/advisor/display/html/templates/synchronize_stream.html b/profiler/advisor/display/html/templates/synchronize_stream.html deleted file mode 100644 index 1832f9406d3e234f80278f2065f461c2db4ae82b..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/synchronize_stream.html +++ /dev/null @@ -1,57 +0,0 @@ -
-

Synchronize Stream Issues

-
- {{ desc }} - - - - - - - {% for item in solutions %} - {% set rowloop = loop %} - {% for key, value in item.items() %} - - - - - {% endfor %} - {% endfor %} -
Suggestions
{{ rowloop.index }}. {{ value.desc }}
- -
- {% if not empty_stacks %} - Please click on the collapsible box below to view the detailed code stacks that trigger synchronizeStream. - {% elif not framework_black_list %} - Suggestion: - These operators have no code stack. If parameter 'with_stack=False' was set while profiling, please refer to - Ascend PyTorch Profiler to set - 'with_stack=True'. Otherwise, ignore the following operators, because their backward broadcast operations lack code stacks. - {% endif %} - - {% for api_name, stacks in result.items() %} - - {% if empty_stacks %} -
{{api_name|safe}}
- - {% elif stacks | length > 0 %} - -
{{api_name|safe}}
-
-
- {% for stack in stacks %} -
No.{{loop.index|safe}} code stack, called {{stack[1]|safe}} times
- - {% endfor %} -
-
- {% endif %} - - {% endfor %} - -
- -
-
diff --git a/profiler/advisor/display/html/templates/timeline_analysis.html b/profiler/advisor/display/html/templates/timeline_analysis.html deleted file mode 100644 index b5ea89124277e05e7fdea63a34704df52bb322d4..0000000000000000000000000000000000000000 --- a/profiler/advisor/display/html/templates/timeline_analysis.html +++ /dev/null @@ -1,34 +0,0 @@ -
-

{{title|safe}}

-
-
-
- {% if result.get("img") %} -
- Image -
- {% endif %} - - {% if result.get("current") %} - - {% endif %} - - {% if result.get("bottlenect") %} - - {% endif %} - - {% if result.get("advice") %} - - {% endif %} - -
-
-
-
diff --git a/profiler/advisor/doc/Samples of AI CPU Operator Replacement.md b/profiler/advisor/doc/Samples of AI CPU Operator Replacement.md
deleted file mode 100644
index 6a72ecee2e099b723b05681fb1ed54b2e4198155..0000000000000000000000000000000000000000
--- a/profiler/advisor/doc/Samples of AI CPU Operator Replacement.md
+++ /dev/null
@@ -1,178 +0,0 @@
-# AI CPU Operator Replacement Samples
-
-Some operators execute on the AI CPU of the Ascend chip because of input data type issues or operator implementation issues. They cannot make full use of AI Core resources, so compute performance is poor and training slows down. In some scenarios, these AI CPU operators can be reduced by modifying the Python code, which improves training performance.
-
-Two main tuning approaches are currently identified for AI CPU operators:
-
-- PyTorch data type conversion: convert operators that fall back to the AI CPU because of their dtype into operators that execute on the AI Core unit.
-- Equivalent operator replacement.
-
-## Data Type Conversion
-
-The dtypes currently supported by PyTorch are shown below; for details see [Link](https://pytorch.org/docs/stable/tensor_attributes.html).
-
-Figure 1 dtypes supported by PyTorch
-
-![img](./img/Pytorch_dtype.png)
-
-Based on this, run single-operator tests on common operators such as MUL, Equal, and TensorEqual to find out which dtypes execute on the AI CPU, then try casting to a dtype supported by the AI Core unit to improve efficiency.
-
-### MUL
-
-Figure 2 Mul
-
-![img](./img/Mul.png)
-
-dtypes supported by the AI Core:
-
-```python
-float, float32, float16, dt_bf16, int32, int64, int8, uint8, complex64
-```
-
-dtypes that fall back to the AI CPU:
-
-```python
-int16, complex128
-```
-
-### Equal
-
-Figure 3 Equal
-
-![img](./img/Equal.png)
-
-dtypes supported by the AI Core:
-
-```python
-float, float32, float16, dt_bf16, bool, int32, int64, int8, uint8
-```
-
-dtypes that fall back to the AI CPU:
-
-```python
-int16, complex64, complex128
-```
-
-### TensorEqual
-
-Figure 4 TensorEqual
-
-![img](./img/TensorEqual.png)
-
-dtypes supported by the AI Core:
-
-```python
-float, float32, float16, dt_bf16, float64, bool, int32, int8, uint8
-```
-
-dtypes that fall back to the AI CPU:
-
-```python
-int16, int64
-```
-
-## Equivalent Operator Replacement
-
-### Index operator replacement
-
-- Case 1: index by index
-
-  This operation makes the output shape differ from the input shape. It can be replaced directly with the index_select (gatherV2) operator, which runs on the AI Core and performs much better.
-
-  Figure 5 index by index
-
-  ![img](./img/index by index.png)
-
-- Case 2: index_put by index
-
-  ```python
-  tensor[index] = 3
-  ```
-
-  Try to avoid this kind of operation; there is no particularly good substitute. The index can be converted into a mask, or a mask can be generated as the index from the beginning instead of integer indices.
-
-  If it must be replaced, the scatter operator can be used (a sketch follows this list). In practice the index tends to be small in this scenario, so the index form may well perform better.
-
-- Case 3: index_put by mask
-
-  ```python
-  tensor_a[mask] = 3
-  ```
-
-  index_put by mask can be replaced with the where (selectV2) operator. Note that, unlike the original semantics, this returns a new tensor.
-
-  Figure 6 index_put by mask
-
-  ![img](./img/index_put_by_mask.png)
-
-  index by mask and index_put by mask are comparatively friendly to the NPU and the framework. The key points are to keep the shape unchanged, so that no contiguous copy is needed, and to put the necessary index-extraction operations at the very end. When the index is small, the index operation itself is fast and may beat the replacement.
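As an illustration of the mask/scatter idea in Case 2 above — the shapes, values, and `.npu()` placement here are assumptions for the sketch, not measured guidance:

```python
import torch
import torch_npu

index = torch.tensor([1, 3, 5]).npu()
tensor = torch.zeros(8).npu()

# scatter_ form of `tensor[index] = 3`: a single fused write at the given positions.
tensor.scatter_(0, index, 3.0)

# Mask-from-the-start alternative: build a bool mask instead of integer indices,
# then use where so the whole assignment stays expressible on the AI Core.
mask = torch.zeros(8, dtype=torch.bool).npu()
mask[index] = True  # in real code, generate the mask directly
tensor = torch.where(mask, torch.full_like(tensor, 3.0), tensor)
```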
-### IndexPut operator replacement
-
-Tensor assignment and slice-write operations are executed with the IndexPut operator, which usually runs on the AI CPU. They can be converted into equivalent tensor operations that run on the Cube unit. For example:
-
-```python
-masked_input[input_mask] = 0
-```
-
-is better replaced with:
-
-```python
-masked_input *= ~input_mask
-```
-
-Here `masked_input` is a float tensor and `input_mask` is a bool tensor (or a 0/1 matrix) with the same shape as `masked_input`. Since the operation assigns 0, `input_mask` is inverted first and then multiplied.
-
-Taking the zero-assignment case as an example, tested on float32 data of shape (512, 32, 64): the call took 9.639978408813477 ms before the replacement and 0.1747608184814453 ms afterwards. As shown below, before the replacement the total time is 9.902 ms, with the host dispatching five operators to the device, among which aclnnIndexPutImpl_IndexPut_IndexPut executes on the AI CPU.
-
-Figure 7 Time before replacement
-
-![img](./img/替换前耗时.png)
-
-After the replacement the total time is 226.131 us, with three operators dispatched, all executing on the AI Core.
-
-Figure 8 Time after replacement
-
-![img](./img/替换后耗时.png)
-
-### ArgMin operator optimization
-
-On CANN 6.3.RC2 the ArgMin operator is dispatched to the AI CPU; on CANN 7.0.RC1 it is dispatched to the AI Core. If you run into this case, upgrade the CANN package.
-
-Tested on a tensor of shape (1024, 1024): on CANN 6.3.RC2 the single operator takes 2.603 ms.
-
-Figure 9 Single-operator time (CANN 6.3.RC2)
-
-![img](./img/single_op_time_CANN63RC2.png)
-
-On CANN 7.0.RC1 the single operator takes 223.516 us.
-
-Figure 10 Single-operator time (CANN 7.0.RC1)
-
-![img](./img/single_op_time_CANN70RC1.png)
-
-### nonzero operator optimization
-
-nonzero turns a mask into indices. For tensors whose values are all non-negative, some computations can use a multiplication instead. For example, to sum the masked elements of a tensor, `tensor_a[mask].sum()` is equivalent to `(tensor_a * mask).sum()`.
-
-For example:
-
-```python
-shape = (1024, )
-mask = torch.randint(-1, 2, shape).npu()
-tensor_a = torch.ones(shape).float().npu()
-mask_inds = torch.nonzero(mask > 0, as_tuple=False).squeeze(1)
-
-tensor_sum = tensor_a[mask_inds].sum()
-```
-
-is equivalent to:
-
-```python
-shape = (1024, )
-mask = torch.randint(-1, 2, shape).npu()
-tensor_a = torch.ones(shape).float().npu()
-
-tensor_sum2 = (tensor_a * (mask > 0)).sum()
-```
\ No newline at end of file
diff --git a/profiler/advisor/doc/Samples of Fused Operator API Replacement.md b/profiler/advisor/doc/Samples of Fused Operator API Replacement.md
deleted file mode 100644
index e62da1bbb499b6008b14405169a0d1629c6abd97..0000000000000000000000000000000000000000
--- a/profiler/advisor/doc/Samples of Fused Operator API Replacement.md
+++ /dev/null
@@ -1,406 +0,0 @@
-# Ascend Migration Fused Operator API Replacement Samples
-
-Some native torch APIs dispatch and execute as multiple small operators, so dispatch and execution take a long time. They can be replaced with NPU APIs to enable fused operators and improve training performance.
-
-See the [API list](https://www.hiascend.com/document/detail/zh/canncommercial/700/modeldevpt/ptmigr/ptaoplist_000002.html) for the functionality and parameter descriptions of the torch_npu APIs.
-
-## Optimizer Replacement
-
-Replacing the optimizer usually brings a large performance gain, so consider first replacing the native torch optimizer with an [Ascend affinity optimizer](https://www.hiascend.com/document/detail/zh/canncommercial/63RC2/modeldevpt/ptmigr/ptmigr_0080.html). The AdamW optimizer is used as the example below; other optimizers are replaced the same way (a hypothetical sketch for SGD follows this section).
-
-### torch_npu.optim.NpuFusedAdamW
-
-Native torch example:
-
-```python
-import torch
-optimizer = torch.optim.AdamW(
-    model.parameters(),
-    learning_rate,
-    weight_decay=weight_decay
-)
-```
-
-torch_npu example:
-
-```python
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-optimizer = torch_npu.optim.NpuFusedAdamW(
-    model.parameters(),
-    learning_rate,
-    weight_decay=weight_decay
-)
-```
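Assuming the same `torch_npu.optim` naming convention holds for other optimizers (NpuFusedSGD is used here; verify its availability against the installed torch_npu release), the swap is mechanical. A minimal sketch:

```python
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu

model = torch.nn.Linear(16, 16).cuda()  # illustrative model only

# Hypothetical replacement of torch.optim.SGD with its fused counterpart.
optimizer = torch_npu.optim.NpuFusedSGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9,
    weight_decay=1e-4
)
```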
-## Affinity API Replacement
-
-### optimizer.clip_grad_norm_fused_
-
-Before switching to the NPU affinity gradient clipping API, make sure the code already uses an NPU affinity optimizer.
-
-Native torch example:
-
-```python
-import torch
-optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
-torch.nn.utils.clip_grad_norm_(parameters=model.parameters(), max_norm=10, norm_type=2)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-optimizer = torch_npu.optim.NpuFusedAdamW(model.parameters(), lr=lr)
-optimizer.clip_grad_norm_fused_(max_norm=10, norm_type=2)
-```
-
-### torch_npu.npu_confusion_transpose
-
-**Example 1**
-
-Native torch example:
-
-```python
-import torch
-
-data = torch.rand(64, 3, 64, 128).cuda()
-batch, channel, height, width = data.shape
-result = torch.permute(data, (0, 2, 1, 3)).reshape(height, batch, channel*width)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-data = torch.rand(64, 3, 64, 128).cuda()
-batch, channel, height, width = data.shape
-result = torch_npu.npu_confusion_transpose(data, (0, 2, 1, 3), (height, batch, channel*width), transpose_first=True)
-```
-
-**Example 2**
-
-Native torch example:
-
-```python
-import torch
-
-data = torch.rand(64, 3, 64, 128).cuda()
-batch, channel, height, width = data.shape
-result = data.view(batch, height*channel*width).transpose(1, 0)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-data = torch.rand(64, 3, 64, 128).cuda()
-batch, channel, height, width = data.shape
-result = torch_npu.npu_confusion_transpose(data, (1, 0), (batch, height*channel*width), transpose_first=False)
-```
-
-### torch_npu.npu_scaled_masked_softmax
-
-Note that the last dimension of the atten_mask and atten_scores tensors must be in the range 32-8192 and must be a multiple of 32.
-
-Native torch example:
-
-```python
-import torch
-x = torch.randn([64, 8, 128, 256]).cuda()
-mask = torch.randn([1, 1, 128, 256]).cuda() >= 1
-scale = 0.8
-
-output = torch.softmax((x * scale).masked_fill(mask, -1*torch.inf), dim=-1)
-# shape is (64, 8, 128, 256)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-x = torch.randn([64, 8, 128, 256]).cuda()
-mask = torch.randn([1, 1, 128, 256]).cuda() >= 1
-scale = 0.8
-
-output = torch_npu.npu_scaled_masked_softmax(x, mask, scale)
-# shape is (64, 8, 128, 256)
-```
-
-### torch_npu.fast_gelu
-
-**Example 1**
-
-Replaces the torch.nn.functional.gelu call. The implementations differ slightly, so the activation outputs will not be identical.
-
-Native torch example:
-
-```python
-import torch
-input_data = torch.rand(64, 32).cuda()
-result = torch.nn.functional.gelu(input_data)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-input_data = torch.rand(64, 32).cuda()
-result = torch_npu.fast_gelu(input_data)
-```
-
-**Example 2**
-
-Inherit torch.nn.GELU and override the forward method based on torch_npu.fast_gelu.
-
-Native torch example:
-
-```python
-import torch
-input_data = torch.rand(64, 32).cuda()
-gelu_module = torch.nn.GELU().cuda()
-result = gelu_module(input_data)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-# inherit torch.nn.GELU and override forward with torch_npu.fast_gelu
-class FastGelu(torch.nn.GELU):
-    def forward(self, input_data):
-        return torch_npu.fast_gelu(input_data)
-
-input_data = torch.rand(64, 32).cuda()
-fast_gelu_module = FastGelu().cuda()
-result = fast_gelu_module(input_data)
-```
-
-### torch_npu.npu_rms_norm
-
-The input dtype only supports float16, bfloat16, and float.
-
-Native torch example:
-
-```python
-import torch
-
-class TorchRMSNorm(torch.nn.Module):
-    def __init__(self, dim, eps=1e-6):
-        super().__init__()
-        self.eps = eps
-        self.weight = torch.nn.Parameter(torch.ones(dim)).cuda()
-
-    def _norm(self, x):
-        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
-
-    def forward(self, x):
-        output = self._norm(x.float()).type_as(x)
-        return output * self.weight
-
-input_data = torch.randn(128, 256).cuda()
-torch_rms_norm = TorchRMSNorm((128, 256))
-result = torch_rms_norm(input_data)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-class NpuRMSNorm(torch.nn.Module):
-    def __init__(self, dim, eps=1e-6):
-        super().__init__()
-        self.eps = eps
-        self.weight = torch.nn.Parameter(torch.ones(dim)).cuda()
-
-    def forward(self, x):
-        return torch_npu.npu_rms_norm(x, self.weight, epsilon=self.eps)[0]
-
-input_data = torch.randn(128, 256).cuda()
-npu_rms_norm = NpuRMSNorm((128, 256))
-result = npu_rms_norm(input_data)
-```
-
-### torch_npu.npu_swiglu
-
-The input dtype only supports float16, bfloat16, and float.
-
-Native torch example:
-
-```python
-import torch
-
-class TorchSwiGlu(torch.nn.Module):
-    def __init__(self, dim=-1):
-        super().__init__()
-        self.dim = dim
-
-    def _swiglu(self, x):
-        x = torch.chunk(x, 2, self.dim)
-        return torch.nn.functional.silu(x[0]) * x[1]
-
-    def forward(self, x):
-        output = self._swiglu(x)
-        return output
-
-input_data = torch.randn(128, 256).cuda()
-torch_swiglu = TorchSwiGlu()
-result = torch_swiglu(input_data)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-class NpuSwiGlu(torch.nn.Module):
-    def __init__(self, dim=-1):
-        super().__init__()
-        self.dim = dim
-
-    def forward(self, x):
-        return torch_npu.npu_swiglu(x, dim=self.dim)
-
-input_data = torch.randn(128, 256).cuda()
-npu_swiglu = NpuSwiGlu()
-result = npu_swiglu(input_data)
-```
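A quick numerical sanity check before adopting such a replacement (illustrative only; the tolerances are assumptions, and fused kernels may legitimately differ slightly in low precision):

```python
import torch
import torch_npu
from torch_npu.contrib import transfer_to_npu

x = torch.randn(128, 256).cuda()
a, b = torch.chunk(x, 2, -1)
ref = torch.nn.functional.silu(a) * b     # eager reference path
fused = torch_npu.npu_swiglu(x, dim=-1)   # fused NPU kernel
print(torch.allclose(ref, fused, rtol=1e-4, atol=1e-4))
```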
-### torch_npu.npu_rotary_mul
-
-Native torch example:
-
-```python
-import torch
-
-x = torch.rand([2, 8192, 5, 128]).cuda()
-r1 = torch.rand([1, 8192, 1, 128]).cuda()
-r2 = torch.rand([1, 8192, 1, 128]).cuda()
-
-def torch_func(x, r1, r2):
-    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2:]
-    # x1, x2 = torch.chunk(x, 2, -1)
-    x_new = torch.cat((-x2, x1), dim=-1)
-    output = r1 * x + r2 * x_new
-    return output
-
-result = torch_func(x, r1, r2)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-x = torch.rand([2, 8192, 5, 128]).cuda()
-r1 = torch.rand([1, 8192, 1, 128]).cuda()
-r2 = torch.rand([1, 8192, 1, 128]).cuda()
-
-result = torch_npu.npu_rotary_mul(x, r1, r2)
-```
-
-### torch_npu.npu_fusion_attention
-
-Native torch example:
-
-```python
-import torch
-
-class TorchFlashAttention():
-    def supported_op_exec(self, query, key, value, atten_mask=None):
-        scale = 0.099
-        qk = torch.matmul(query, key.transpose(2, 3)).mul(scale)
-
-        if atten_mask is not None:
-            qk.masked_fill_(atten_mask.npu(), torch.tensor(-float('inf')).npu())
-        softmax_res = torch.nn.functional.softmax(qk, dim=-1, dtype=torch.float32).to(torch.float16)
-        output = torch.matmul(softmax_res, value)
-        output = output.transpose(1, 2)
-        output = output.reshape(output.shape[0], output.shape[1], -1)
-        return output
-
-    def custom_op_exec(self, query, key, value, atten_mask=None):
-        scale = 0.099
-        return torch_npu.npu_fusion_attention(
-            query, key, value, head_num=32, input_layout="BSH", scale=scale, atten_mask=atten_mask)
-
-    def trans_BNSD2BSH(self, tensor: torch.Tensor):
-        tensor = torch.transpose(tensor, 1, 2)
-        tensor = torch.reshape(tensor, (tensor.shape[0], tensor.shape[1], -1))
-        return tensor
-
-    def test_torch_flash_attention(self, device="npu"):
-        query = torch.randn(1, 32, 128, 128, dtype=torch.float16)
-        key = torch.randn(1, 32, 128, 128, dtype=torch.float16)
-        value = torch.randn(1, 32, 128, 128, dtype=torch.float16)
-        atten_mask = torch.randn(1, 1, 128, 128, dtype=torch.float16).npu() >= 0
-
-        q_npu = self.trans_BNSD2BSH(query).npu()
-        k_npu = self.trans_BNSD2BSH(key).npu()
-        v_npu = self.trans_BNSD2BSH(value).npu()
-
-        result = self.supported_op_exec(query.npu(), key.npu(), value.npu(), atten_mask=atten_mask)
-        # result shape (1, 128, 4096)
-```
-
-torch_npu example:
-
-```python
-import torch
-import torch_npu
-from torch_npu.contrib import transfer_to_npu
-
-
-class NPUFlashAttention():
-
-    def npu_exec(self, query, key, value, atten_mask=None):
-        scale = 0.099
-        return torch_npu.npu_fusion_attention(
-            query, key, value, head_num=32, input_layout="BSH", scale=scale, atten_mask=atten_mask)
-
-    def trans_BNSD2BSH(self, tensor: torch.Tensor):
-        tensor = torch.transpose(tensor, 1, 2)
-        tensor = torch.reshape(tensor, (tensor.shape[0], tensor.shape[1], -1))
-        return tensor
-
-    def test_npu_flash_attention(self, device="npu"):
-        query = torch.randn(1, 32, 128, 128, dtype=torch.float16)
-        key = torch.randn(1, 32, 128, 128, dtype=torch.float16)
-        value = torch.randn(1, 32, 128, 128, dtype=torch.float16)
-        atten_mask = torch.randn(1, 1, 128, 128, dtype=torch.float16).npu() >= 0
-
-        q_npu = self.trans_BNSD2BSH(query).npu()
-        k_npu = self.trans_BNSD2BSH(key).npu()
-        v_npu = self.trans_BNSD2BSH(value).npu()
-
-        result, softmax_max, softmax_sum, softmax_out, seed, offset, numels = self.npu_exec(q_npu, k_npu, v_npu, atten_mask)
-        # result shape (1, 128, 4096)
-```
\ No newline at end of file
diff --git a/profiler/advisor/doc/img/Equal.png b/profiler/advisor/doc/img/Equal.png
deleted file mode 100644
index
97422e959552dc34a3b67b511a91c3b3be23ced1..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/Equal.png and /dev/null differ diff --git a/profiler/advisor/doc/img/Mul.png b/profiler/advisor/doc/img/Mul.png deleted file mode 100644 index 8f0614c3fccc52093b681cae75a98a2fbff9676c..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/Mul.png and /dev/null differ diff --git a/profiler/advisor/doc/img/Pytorch_dtype.png b/profiler/advisor/doc/img/Pytorch_dtype.png deleted file mode 100644 index 054779ca3c5d149452655f3cf34037404237373c..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/Pytorch_dtype.png and /dev/null differ diff --git a/profiler/advisor/doc/img/TensorEqual.png b/profiler/advisor/doc/img/TensorEqual.png deleted file mode 100644 index 079960e2410d4c2b4b6c48d782f4b6a332d83e18..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/TensorEqual.png and /dev/null differ diff --git a/profiler/advisor/doc/img/index by index.png b/profiler/advisor/doc/img/index by index.png deleted file mode 100644 index 0629cfb1ef80ef6f0ca2cab5149e912ce7252611..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/index by index.png and /dev/null differ diff --git a/profiler/advisor/doc/img/index_put_by_mask.png b/profiler/advisor/doc/img/index_put_by_mask.png deleted file mode 100644 index c4efd95da6218cfb673b3ac49ed5882d0c7f4d7b..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/index_put_by_mask.png and /dev/null differ diff --git a/profiler/advisor/doc/img/single_op_time_CANN63RC2.png b/profiler/advisor/doc/img/single_op_time_CANN63RC2.png deleted file mode 100644 index 8195e641e88436b90c5095c2d300d8ec7284b7e0..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/single_op_time_CANN63RC2.png and /dev/null differ diff --git a/profiler/advisor/doc/img/single_op_time_CANN70RC1.png b/profiler/advisor/doc/img/single_op_time_CANN70RC1.png deleted file mode 100644 index 94b3f39d0ffa2fac263e261075be723a11398e86..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/doc/img/single_op_time_CANN70RC1.png and /dev/null differ diff --git "a/profiler/advisor/doc/img/\346\233\277\346\215\242\345\211\215\350\200\227\346\227\266.png" "b/profiler/advisor/doc/img/\346\233\277\346\215\242\345\211\215\350\200\227\346\227\266.png" deleted file mode 100644 index 15ef10bd3642e77aa0eb20cf636ecd2ecd8037b8..0000000000000000000000000000000000000000 Binary files "a/profiler/advisor/doc/img/\346\233\277\346\215\242\345\211\215\350\200\227\346\227\266.png" and /dev/null differ diff --git "a/profiler/advisor/doc/img/\346\233\277\346\215\242\345\220\216\350\200\227\346\227\266.png" "b/profiler/advisor/doc/img/\346\233\277\346\215\242\345\220\216\350\200\227\346\227\266.png" deleted file mode 100644 index 7c5042f795ddc899b27f7196cb5b32ec92e1c832..0000000000000000000000000000000000000000 Binary files "a/profiler/advisor/doc/img/\346\233\277\346\215\242\345\220\216\350\200\227\346\227\266.png" and /dev/null differ diff --git a/profiler/advisor/fusion_operators_api_analysis.ipynb b/profiler/advisor/fusion_operators_api_analysis.ipynb deleted file mode 100644 index ac758f562f13c9dd7466279aac73002c0e68da55..0000000000000000000000000000000000000000 --- a/profiler/advisor/fusion_operators_api_analysis.ipynb +++ /dev/null @@ -1,211 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - 
"outputs": [], - "source": [ - "import sys\n", - "sys.path.append(\"../..\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from prettytable import PrettyTable, ALL\n", - "from textwrap import fill\n", - "from profiler.advisor.interface.interface import Interface" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "profiling_path = \"YOUR PROFILING PATH\"\n", - "interface = Interface(profiling_path=profiling_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 融合算子API识别" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "指定profiling路径后,可以自动识别其中包含的融合算子并给出对应的torch_npu api和需要修改的代码堆栈。基于给定堆栈可以快速定位到需要修改的代码段,替换torch_npu api后,能够减少pytorch侧的小算子的下发,进而提升模型训练速度。" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - " \r" - ] - } - ], - "source": [ - "timeline_fusion_ops_result = interface.get_result(\"schedule\", \"timeline_fusion_ops\")" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
problemdescriptionsuggestion
timeline_fusion_opsFound 2 apis to be replaced based on the runtime env cann-8.0.RC1 and torch-2.1.01. Please replace training api according to sub table 'Affinity training api'
" - ], - "text/plain": [ - "+---------------------+---------------------------------------------------------------------------------+-------------------------------------------------------------------------------+\n", - "| problem | description | suggestion |\n", - "+---------------------+---------------------------------------------------------------------------------+-------------------------------------------------------------------------------+\n", - "| timeline_fusion_ops | Found 2 apis to be replaced based on the runtime env cann-8.0.RC1 and torch-2.1.0 | 1. Please replace training api according to sub table 'Affinity training api' |\n", - "+---------------------+---------------------------------------------------------------------------------+-------------------------------------------------------------------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "display_column_num = 3\n", - "problems = timeline_fusion_ops_result.get(\"problems\")\n", - "problem_table = PrettyTable(problems.get(\"headers\")[:display_column_num])\n", - "for row in problems.get(\"data\"):\n", - " for i in range(len(row)):\n", - " row[i] = fill(str(row[i]), width=80)\n", - " problem_table.add_row(row[:display_column_num])\n", - "\n", - "display(problem_table)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "如下所示,存在亲和优化器和梯度裁剪两个可替换的torch_npu api,并给出了具体的堆栈。" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Affinity APICode stacksStack called counts
optimizer.clip_grad_norm_fused_/home/ma-user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site-
packages/torch/nn/utils/clip_grad.py(49): clip_grad_norm_; /home/ma-
user/work/algorithms/doc_cls/Bert.py(205): train_epoch; /home/ma-
user/work/algorithms/doc_cls/Bert.py(252): <module>
2
torch_npu.optim.NpuFusedAdamW/home/ma-user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site-
packages/torch_npu/npu/profiler.py(675): __enter__; /home/ma-
user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site-
packages/torch_npu/npu/profiler.py(719): wrapper; /home/ma-
user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site-
packages/torch/optim/lr_scheduler.py(65): wrapper; /home/ma-
user/work/algorithms/doc_cls/Bert.py(219): train_epoch; /home/ma-
user/work/algorithms/doc_cls/Bert.py(252): <module>
2
" - ], - "text/plain": [ - "+---------------------------------+-----------------------------------------------------------------------+---------------------+\n", - "| Affinity API | Code stacks | Stack called counts |\n", - "+---------------------------------+-----------------------------------------------------------------------+---------------------+\n", - "| optimizer.clip_grad_norm_fused_ | /home/ma-user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site- | 2 |\n", - "| | packages/torch/nn/utils/clip_grad.py(49): clip_grad_norm_; /home/ma- | |\n", - "| | user/work/algorithms/doc_cls/Bert.py(205): train_epoch; /home/ma- | |\n", - "| | user/work/algorithms/doc_cls/Bert.py(252): | |\n", - "+---------------------------------+-----------------------------------------------------------------------+---------------------+\n", - "| torch_npu.optim.NpuFusedAdamW | /home/ma-user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site- | 2 |\n", - "| | packages/torch_npu/npu/profiler.py(675): __enter__; /home/ma- | |\n", - "| | user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site- | |\n", - "| | packages/torch_npu/npu/profiler.py(719): wrapper; /home/ma- | |\n", - "| | user/anaconda3/envs/PyTorch-1.11.0/lib/python3.9/site- | |\n", - "| | packages/torch/optim/lr_scheduler.py(65): wrapper; /home/ma- | |\n", - "| | user/work/algorithms/doc_cls/Bert.py(219): train_epoch; /home/ma- | |\n", - "| | user/work/algorithms/doc_cls/Bert.py(252): | |\n", - "+---------------------------------+-----------------------------------------------------------------------+---------------------+" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "fusion_ops_api = timeline_fusion_ops_result.get(\"timeline_fusion_ops\")\n", - "if fusion_ops_api:\n", - " fusion_ops_api_table = PrettyTable(fusion_ops_api.get(\"headers\"))\n", - "\n", - " for row in fusion_ops_api.get(\"data\"):\n", - " for i in range(len(row)):\n", - " row[i] = fill(str(row[i]), width=80)\n", - " fusion_ops_api_table.add_row(row)\n", - "\n", - " fusion_ops_api_table.hrules = ALL\n", - " display(fusion_ops_api_table)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.10" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/profiler/advisor/img/advisor_result.PNG b/profiler/advisor/img/advisor_result.PNG deleted file mode 100644 index a9652f4ca53ff142a5ebd1033075aad54f8f0297..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/advisor_result.PNG and /dev/null differ diff --git a/profiler/advisor/img/all.png b/profiler/advisor/img/all.png deleted file mode 100644 index ea9372ff7f4348f65c0f4ed67a96f237c9623417..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/all.png and /dev/null differ diff --git a/profiler/advisor/img/cluster.png b/profiler/advisor/img/cluster.png deleted file mode 100644 index 0ac2ee57e1abd73544714fa6025d2eb7343494b1..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/cluster.png and /dev/null differ diff --git a/profiler/advisor/img/cluster_1.png b/profiler/advisor/img/cluster_1.png deleted file mode 100644 index bef236b22cc5fcb1645f0acf32b16586080124b8..0000000000000000000000000000000000000000 
Binary files a/profiler/advisor/img/cluster_1.png and /dev/null differ diff --git a/profiler/advisor/img/computation.png b/profiler/advisor/img/computation.png deleted file mode 100644 index f2c67b6e37b309691cfb4da9d083812123a9a211..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/computation.png and /dev/null differ diff --git a/profiler/advisor/img/computation_1.png b/profiler/advisor/img/computation_1.png deleted file mode 100644 index 7ece29bbe3ce18f711f531613b0ee58dc94e496d..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/computation_1.png and /dev/null differ diff --git a/profiler/advisor/img/jupyter_report.PNG b/profiler/advisor/img/jupyter_report.PNG deleted file mode 100644 index baa860a7893e1801337916aea37475ea69bbaf04..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/jupyter_report.PNG and /dev/null differ diff --git a/profiler/advisor/img/overall.png b/profiler/advisor/img/overall.png deleted file mode 100644 index 1883d4c97388b1cfb774d05fc9e0d368d0c66901..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/overall.png and /dev/null differ diff --git a/profiler/advisor/img/overall_0.png b/profiler/advisor/img/overall_0.png deleted file mode 100644 index f74cf2dcf131f36df9901e20ea327d509c6fee67..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/overall_0.png and /dev/null differ diff --git a/profiler/advisor/img/schedule.png b/profiler/advisor/img/schedule.png deleted file mode 100644 index 52d66ed01c06bd154e2d4287558f8bed7afd6c96..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/schedule.png and /dev/null differ diff --git a/profiler/advisor/img/schedule_1.png b/profiler/advisor/img/schedule_1.png deleted file mode 100644 index f1432a59589c128ab3bf7f2b9a067d47203aa06d..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/schedule_1.png and /dev/null differ diff --git a/profiler/advisor/img/schedule_2.png b/profiler/advisor/img/schedule_2.png deleted file mode 100644 index 6363153f9709b5dd561934d7fddcf330cd7d7445..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/schedule_2.png and /dev/null differ diff --git a/profiler/advisor/img/schedule_3.png b/profiler/advisor/img/schedule_3.png deleted file mode 100644 index 4cca98637be4e22750a70433e365643b447cd975..0000000000000000000000000000000000000000 Binary files a/profiler/advisor/img/schedule_3.png and /dev/null differ diff --git a/profiler/advisor/interface/__init__.py b/profiler/advisor/interface/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/interface/interface.py b/profiler/advisor/interface/interface.py deleted file mode 100644 index 1d3872a1783111af7b1f543241da6b23fb14a632..0000000000000000000000000000000000000000 --- a/profiler/advisor/interface/interface.py +++ /dev/null @@ -1,83 +0,0 @@ -import os -from collections import OrderedDict -import sys -sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))), "cluster_analyse")) -sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))), "compare_tools")) - -from profiler.advisor.utils.utils import Timer -from profiler.advisor.analyzer.computation.profiling_analyzer import AicpuAnalyzer, BlockDimAnalyzer, DynamicShapeAnalyzer, OperatorBoundAnalyzer -from 
profiler.advisor.analyzer.schedule.fusion_ops.fusion_ops_analyzer import TimelineFusionOpsAnalyzer -from profiler.advisor.analyzer.graph_fusion.graph_fusion_analyzer import FusionOPAnalyzer -from profiler.advisor.common.analyzer_scopes import SupportedScopes -from profiler.advisor.analyzer.cluster.slow_rank_analyser import SlowRankAnalyzer -from profiler.advisor.analyzer.cluster.slow_link_analyser import SlowLinkAnalyzer -from profiler.advisor.analyzer.overall.overall_summary_analyzer import OverallSummaryAnalyzer -from profiler.advisor.analyzer.schedule.dispatch.timeline_op_dispatch_analyzer import OpDispatchAnalyzer -from profiler.advisor.analyzer.schedule.syncbn.syncbn_analyzer import SyncBNAnalyzer -from profiler.advisor.analyzer.schedule.synchronize_stream.synchronize_stream_analyzer import SynchronizeStreamAnalyzer -from profiler.advisor.analyzer.dataloader.dataloader_analyzer import DataloaderAnalyzer -from profiler.advisor.analyzer.computation.ai_core_freq.ai_core_freq_analyzer import AICoreFreqAnalyzer - - -class Interface: - supported_analyzer = { - "schedule": OrderedDict({ - SupportedScopes.SYNCBN: SyncBNAnalyzer, - SupportedScopes.TIMELINE_OP_DISPATCH: OpDispatchAnalyzer, - SupportedScopes.SYNCHRONIZE_STREAM: SynchronizeStreamAnalyzer, - SupportedScopes.TIMELINE_FUSION_OPS: TimelineFusionOpsAnalyzer - }), - "computation": OrderedDict({ - SupportedScopes.DYNAMIC_SHAPE_ANALYSIS: DynamicShapeAnalyzer, - SupportedScopes.AICPU_ANALYSIS: AicpuAnalyzer, - SupportedScopes.OPERATOR_NO_BOUND_ANALYSIS: OperatorBoundAnalyzer, - SupportedScopes.BLOCK_DIM_ANALYSIS: BlockDimAnalyzer, - SupportedScopes.GRAPH: FusionOPAnalyzer, - SupportedScopes.FREQ_ANALYSIS: AICoreFreqAnalyzer - }), - "communication": OrderedDict(), - "overall": OrderedDict({SupportedScopes.OVER_ALL: OverallSummaryAnalyzer}), - "dataloader": OrderedDict({SupportedScopes.DATALOADER: DataloaderAnalyzer}), - "cluster": OrderedDict({ - SupportedScopes.SLOW_RANK: SlowRankAnalyzer, - SupportedScopes.SLOW_LINK: SlowLinkAnalyzer - }) - } - - all_dimension = list(supported_analyzer.keys()) - - def __init__(self, **kwargs): - self.collection_path = os.path.realpath(kwargs.get("profiling_path")) - - @staticmethod - def get_scope(dimension): - return list(Interface.supported_analyzer.get(dimension).keys()) - - @staticmethod - def get_analyzer(dimension, scope): - return Interface.supported_analyzer.get(dimension).get(scope) - - def get_result(self: any, dimension: str, scope: str, render_html=False, output_dict=True, **kwargs): - """ - :Param mode: affinity apis, ai cpu and so on. 
- """ - if dimension not in self.all_dimension: - raise ValueError(f"Error dimension {dimension}, supported dimensions are {self.all_dimension}") - - supported_scopes = self.get_scope(dimension) - if scope not in supported_scopes: - raise ValueError(f"Error scope {scope}, supported scopes are {supported_scopes}") - - analyzer = self.get_analyzer(dimension, scope)(collection_path=self.collection_path, **kwargs) - result = analyzer.optimize(**kwargs) - - if render_html and result.data: - if hasattr(analyzer, "html_render"): - analyzer.html_render.render_html() - analyzer.html_render.save_to_file(f'mstt_advisor_{Timer().strftime}.html') - - return result if not output_dict else dict(result.data) - - -if __name__ == "__main__": - Interface() diff --git a/profiler/advisor/result/__init__.py b/profiler/advisor/result/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/result/item.py b/profiler/advisor/result/item.py deleted file mode 100644 index 02db7fdd0044e480ff7af524c4ba8ee34ee45a38..0000000000000000000000000000000000000000 --- a/profiler/advisor/result/item.py +++ /dev/null @@ -1,61 +0,0 @@ -class OptimizeItem: - - def __init__(self, problem, description, suggestion): - self.problem = problem - self.description = description - self.suggestion = suggestion - - @property - def data(self): - format_suggestions = [] - for index, suggesion in enumerate(self.suggestion): - format_suggestions.append(f"{index + 1}. {suggesion}") - suggestion_str = "\n".join(format_suggestions) - return [self.problem, self.description, suggestion_str] - - @property - def headers(self): - return ["category", "description", "suggestion"] - - -class StatisticsItem: - def __init__(self, total_task_duration, task_duration, count, income=None): - self.total_task_duration = total_task_duration - self.task_duration = task_duration - self.count = count - self.income = income - if not isinstance(task_duration, str): - self.task_duration_ratio = round(task_duration / total_task_duration, 4) if total_task_duration != 0 else 0 - else: - self.task_duration_ratio = "" - - @property - def data(self): - - def _cal_ratio(divisor, dividend): - if divisor and dividend != 0: - return divisor, round(divisor / dividend, 4) - else: - return "", "" - - income, income_ratio = _cal_ratio(self.income, self.total_task_duration) - return [self.count, self.total_task_duration, self.task_duration_ratio, income, income_ratio] - - @property - def headers(self): - return ["problem count", "total_time(us)", "time ratio", "income(us)", "income ratio"] - - -class OptimizeRecord: - - def __init__(self, optimization_item, statistics_item=None) -> None: - self.optimization_item = optimization_item - self.statistics_item = statistics_item or StatisticsItem("", "", "") - - @property - def data(self): - return self.optimization_item.data + self.statistics_item.data - - @property - def headers(self): - return self.optimization_item.headers + self.statistics_item.headers diff --git a/profiler/advisor/result/result.py b/profiler/advisor/result/result.py deleted file mode 100644 index 0d0602ee56c2090cef9833cfd1ac594cd8d36169..0000000000000000000000000000000000000000 --- a/profiler/advisor/result/result.py +++ /dev/null @@ -1,216 +0,0 @@ -import json -import os -import stat -from textwrap import fill -from collections import OrderedDict - -import click -import xlsxwriter -from prettytable import ALL, PrettyTable - -from profiler.advisor.common import constant as 
const -from profiler.advisor.utils.utils import singleton, logger -from profiler.advisor.config.config import Config - - -class ResultWriter: - def __init__(self, result_path=None): - self.result_path = result_path - self.workbook = xlsxwriter.Workbook(result_path) - - self.header_format = None - self.data_cell_format = None - self._init_header_format() - self._init_data_cell_format() - - def _init_header_format(self): - self.header_format = self.workbook.add_format({ - "bold": True, - "color": "#FFFFFF", - "bg_color": "#187498", - "align": "center", - "border": 1, - "font_name": "Arial", - }) - - def _init_data_cell_format(self): - self.data_cell_format = self.workbook.add_format({ - "bold": False, - "align": "left", - "valign": "top", - "border": 1, - "font_name": "Arial", - 'text_wrap': True - }) - - def add_data(self, sheet_name, headers, data_list): - sheet = self.workbook.add_worksheet(sheet_name) - - if headers: - for col_index, header in enumerate(headers): - sheet.write(0, col_index, header, self.header_format) - - if data_list: - for i, row_data in enumerate(data_list): - row_index = i + 1 - for col_index, value in enumerate(row_data): - sheet.write(row_index, col_index, value, self.data_cell_format) - - sheet.autofit() - - def save(self): - try: - self.workbook.close() - except Exception as e: - logger.error("Failed to save analysis results, reason is %s", e) - - -@singleton -class SheetRecoder: - - def __init__(self): - self._sheet_data = OrderedDict() - - @property - def sheet_data(self): - return self._sheet_data - - def _init_sheet_name(self, sheet_name): - if sheet_name not in self._sheet_data: - self._sheet_data[sheet_name] = {} - - def add_headers(self, sheet_name, headers): - self._init_sheet_name(sheet_name) - - if self._sheet_data[sheet_name].get("headers") is None: - self._sheet_data[sheet_name]["headers"] = headers - - def add_data(self, sheet_name, data): - self._init_sheet_name(sheet_name) - - if not isinstance(self._sheet_data[sheet_name].get("data"), list): - self._sheet_data[sheet_name]["data"] = [] - if data not in self._sheet_data[sheet_name]["data"]: - self._sheet_data[sheet_name]["data"].append(data) - - def clear(self): - self._sheet_data.clear() - - -@singleton -class OptimizeResult: - - def __init__(self): - self.result_writer = ResultWriter(Config().analysis_result_file) - self.sheet_recorder = SheetRecoder() - self.page_dict = False - self._tune_op_list = [] - - @property - def data(self): - return self.sheet_recorder.sheet_data - - def add_tune_op_list(self, tune_op_list) -> None: - """ - add tune op name to tune op list - :param tune_op_list: list of operators to be optimized - :return: None - """ - for operator_name in tune_op_list: - if operator_name not in self._tune_op_list: - self._tune_op_list.append(operator_name) - - def add(self, overview_item): - sheet_name = "problems" - - headers = overview_item.headers - data = overview_item.data - self.sheet_recorder.add_headers(sheet_name, headers) - self.sheet_recorder.add_data(sheet_name, data) - - TerminalResult().add(overview_item.optimization_item.data) - self.page_dict = True - - def add_detail(self, sheet_name, headers=None, detail=None): - if headers: - self.sheet_recorder.add_headers(sheet_name, headers) - if detail: - self.sheet_recorder.add_data(sheet_name, detail) - self.page_dict = True - - def show(self): - for sheet_name, sheet_data in self.sheet_recorder.sheet_data.items(): - self.result_writer.add_data(sheet_name, sheet_data.get("headers"), sheet_data.get("data")) - - terminal_result = 
TerminalResult() - terminal_result.print() - if not terminal_result.result_list: - Config().remove_log() - return - self.result_writer.save() - logger.info("Save problems details file to %s", Config().analysis_result_file) - self._save_op_file_list() - - def clear(self) -> None: - self.data.clear() - - def _save_op_file_list(self) -> None: - if not self._tune_op_list: - return - tune_op_dict = {"tune_ops_name": self._tune_op_list} - tune_ops_file = Config().tune_ops_file - try: - - with os.fdopen(os.open(tune_ops_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, stat.S_IWUSR | stat.S_IRUSR), - 'w', encoding="utf-8") as op_tune_file: - json.dump(tune_op_dict, op_tune_file) - except OSError as error: - logger.error("Dump op_list to %s failed, %s", tune_ops_file, error) - return - logger.info("Save tune op name list to %s", tune_ops_file) - - -@singleton -class TerminalResult: - """ - Result output to screen - """ - - def __init__(self): - self.width, _ = self.get_terminal_size() - if self.width is None: - self.table = PrettyTable(["No.", "Category", "Description", "Suggestion"]) - else: - self.table = PrettyTable(["No.", "Category", "Description", "Suggestion"], - max_table_width=max(self.width - 20, 180)) - self.table.hrules = ALL - self.result_list = [] - - @staticmethod - def get_terminal_size(): - try: - width, height = os.get_terminal_size() - except OSError: - width, height = None, None - return width, height - - def add(self, result_str): - """ - add a result str - """ - self.result_list.append(result_str) - - def print(self): - """ - print screen result with format table - """ - table_row_cnt = 0 - for result in self.result_list: - table_row_cnt += 1 - self.table.add_row([table_row_cnt] + result) - self.table.align = "l" - - if table_row_cnt > 0: - click.echo(self.table) - else: - click.echo(click.style(const.SKIP_ANALYZE_PROMPT, fg='red')) diff --git a/profiler/advisor/rules/__init__.py b/profiler/advisor/rules/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/rules/aicpu_rules.yaml b/profiler/advisor/rules/aicpu_rules.yaml deleted file mode 100644 index 58e6eef163204ea1b5efbb5148770948bd4afdad..0000000000000000000000000000000000000000 --- a/profiler/advisor/rules/aicpu_rules.yaml +++ /dev/null @@ -1,103 +0,0 @@ -DataTypeSuggeation: &DataTypeSuggeation "Data type {} in {} operator may cause AICPU issues, Try to convert to {} if possible." 
-AICPU_DOC_URL: &AICPU_DOC_URL "https://gitee.com/ascend/mstt/blob/master/profiler/advisor/doc/Samples%20of%20AI%20CPU%20Operator%20Replacement.md" - -CommonChecker: - - DataTypeChecker: - cann_version: [7.0.RC1] - op_type: [ __ALL__ ] - ignore_type: [ cast, tensorequal, equal, nonzero, mul ] - input: [ float, float32, float16, bool, int32, uint32, int64, uint64, int8, uint8, int16, uint16, dt_bf16 ] - output: [ float, float32, float16, bool, int32, uint32, int64, uint64, int8, uint8, int16, uint16, dt_bf16 ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [7.0.RC1] - op_type: [ cast ] - input: [ float, float32, float16, bool, int32, uint32, int64, uint64, uint8, dt_bf16 ] - output: [ float, float32, float16, bool, int32, uint32, int64, uint64, uint8, dt_bf16 ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [7.0.RC1] - op_type: [ tensorequal ] - input: [ float, float32, float16, bool, int32, int8, uint8 ] - output: [ bool ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [7.0.RC1] - op_type: [ equal ] - input: [ float, float32, float16, bool, int32, int64, int8, uint8 ] - output: [ bool ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [7.0.RC1] - op_type: [ nonzero ] - input: [ float16, bool, dt_bf16 ] - output: [ int64 ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [7.0.RC1] - op_type: [ mul ] - input: [ float, float32, float16, bool, int32, uint32, int64, uint64, int8, uint8, dt_bf16 ] - output: [ float, float32, float16, bool, int32, uint32, int64, uint64, int8, uint8, dt_bf16 ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [8.0.RC1, 7.0.0] - op_type: [ __ALL__ ] - ignore_type: [ cast, tensorequal, equal, nonzero, mul ] - input: [ float, float32, float16, dt_bf16, float64, bool, int32, int64, int8, uint8, int16, complex64, complex128 ] - output: [ float, float32, float16, dt_bf16, float64, bool, int32, int64, int8, uint8, int16, complex64, complex128 ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [8.0.RC1, 7.0.0] - op_type: [ cast ] - input: [ float, float32, float16, bool, int32, uint32, int64, uint64, uint8, dt_bf16 ] - output: [ float, float32, float16, bool, int32, uint32, int64, uint64, uint8, dt_bf16 ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [8.0.RC1, 7.0.0] - op_type: [ tensorequal ] - input: [ float, float32, float16, dt_bf16, float64, bool, int32, int8, uint8 ] - output: [ bool ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [8.0.RC1, 7.0.0] - op_type: [ equal ] - input: [ float, float32, float16, dt_bf16, float64, bool, int32, int64, int8, uint8 ] - output: [ bool ] - suggestion: *DataTypeSuggeation - - - DataTypeChecker: - cann_version: [8.0.RC1, 7.0.0] - op_type: [ mul ] - input: [ float, float32, float16, dt_bf16, float64, bool, int32, int64, int8, uint8, complex64 ] - output: [ float, float32, float16, dt_bf16, float64, bool, int32, int64, int8, uint8, complex64 ] - suggestion: *DataTypeSuggeation - -ExampleGuideChecker: - - IndexPutChecker: - op_type: [index] - url: *AICPU_DOC_URL - suggestion: 'Please modify source code followed by this LINK, try to replace index operator with equivalent operator.' - - - NonzeroChecker: - op_type: [ indexput, indexputv2 ] - url: *AICPU_DOC_URL - suggestion: 'Please modify source code followed by this LINK, try to replace indexput operator with equivalent operator.' 
- - - CastChecker: - op_type: [ argmin ] - url: *AICPU_DOC_URL - suggestion: 'Please update your cann-toolkit to version 7.0.RC1 or later by following this LINK.' - - - CastChecker: - op_type: [ nonzero ] - url: *AICPU_DOC_URL - suggestion: 'Please modify the source code by following this LINK, and try to replace the nonzero operator with an equivalent operator.' \ No newline at end of file diff --git a/profiler/advisor/rules/dataloader.yaml b/profiler/advisor/rules/dataloader.yaml deleted file mode 100644 index a84abcfdfe2dc4a51fcbd7504c6492bc7910ea30..0000000000000000000000000000000000000000 --- a/profiler/advisor/rules/dataloader.yaml +++ /dev/null @@ -1,9 +0,0 @@ -# unit is milliseconds -dataloader_duration_threshold: 10 -problem: "Found a slow dataloader, which cost {dataloader_duration} milliseconds for one step while profiling, normally less than {dataloader_duration_threshold} milliseconds." -solutions: - - "Please check the disk I/O of your data directory. If you are training a model in ModelArts, please move data to '/cache' or mount a more efficient cloud disk for better I/O." - - "Please check if there are any other multiprocess operations at runtime that may have affected the dataloader, such as the training process core binding command 'taskset ...' used for launching the training job." - - "Please check the format of your data and avoid file formats like tar, tar.gz and zip." - - "Please set 'pin_memory=True' for your dataloader." - - "Try to adjust the dataloader parameter 'num_workers'." \ No newline at end of file diff --git a/profiler/advisor/rules/op_fusion_pass.yaml b/profiler/advisor/rules/op_fusion_pass.yaml deleted file mode 100644 index 3ff69a578285ba15d075f2acbb852499d56021a2..0000000000000000000000000000000000000000 --- a/profiler/advisor/rules/op_fusion_pass.yaml +++ /dev/null @@ -1,491 +0,0 @@ -Elementwise: &Elementwise [ Relu, Pow, Add, Sub, Mul, Div, Abs, Ceil, Log, Sqrt, Exp, LeakyRelu ] - -GraphFusion: - - FlashAttentionFusionPass: - version: 1 - nodes: - - node_1: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - node_2: [ Mul ] - - node_3: [ Softmax, SoftmaxV2 ] - - node_4: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - [ node_3, node_4 ] - - - FlashAttentionFusionPass_V2: - version: 1 - nodes: - - node_1: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - node_2: [ Mul ] - - node_3: [ TransData ] - - node_4: [ Softmax, SoftmaxV2 ] - - node_5: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - [ node_3, node_4 ] - - [ node_4, node_5 ] - - - BMMStridedSliceDGeluFusionPass: - version: 1 - nodes: - - node_1: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - node_2: [StridedSliceD] - - node_3: [Relu] - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - - BMMConfusionTransposeDFusionPass: - version: 1 - nodes: - - node_1: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - node_2: [ ConfusionTransposeD ] - - node_3: [ Relu ] - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - - BMMConfusionTransposeDFusionPass_V2: - version: 1 - nodes: - - node_1: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - node_2: [ ConfusionTransposeD ] - edges: - - [ node_1, node_2 ] - - - Conv2DAddGroupNormFusionPass: - version: 0 - struct: [ Conv2D, Add, GroupNorm ] - - - RMSnormAddFusionPass: - version: 0 - struct: [ RMSnorm, Add ] - - - ConvToFullyConnectionFusionPass: - version: 0 - struct: [ Conv ] - - - ZConcatv2dFusionPass: - version: 0 - struct: [ ConcatV2d, ConcatV2d ] - - - 
BatchMatMulReduceMeanFusionPass: - version: 1 - nodes: - - node_1: [ BatchMatMulV2, BatchMatMul, MatMul, MatMulV2 ] - - node_2: [ Add ] - - node_3: [ Relu ] - - node_4: [ ReduceMean ] - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - [ node_3, node_4 ] - - - PadDepthwiseConv2dFusionPass: - version: 0 - struct: [ PadD, DepthwiseConv2D ] - - - ConvBatchnormFusionPass: - version: 1 - nodes: - - node_1: [ Conv2d, Conv3d, DepthwiseConv2d ] - - node_2: [ Batchnorm ] - - edges: - - [ node_1, node_2 ] - - - AConv2dMulFusion: - version: 1 - nodes: - - node_1: [ Conv2d, Conv3d ] - - node_2: [ Mul ] - - edges: - - [ node_1, node_2 ] - - - TBEConvAddFusion: - version: 1 - nodes: - - node_1: [ Conv2d, Conv3d ] - - node_2: [ Add ] - - edges: - - [ node_1, node_2 ] - - - ZBNupdateReluV2Conv2DBNreducePass: - version: 0 - struct: [ BNTrainingUpdate, ReluV2, Conv2D, BNTrainingReduce ] - - - ASplitConv2dConcatPass: - version: 1 - nodes: - - node_1: [ MatMul, MatMulV2, BatchMatMul, BatchMatMulV2 ] - - node_2: [ Cast ] - - edges: - - [ node_1, node_2 ] - - - MatMulBiasAddFusionPass: - version: 1 - nodes: - - node_1: [ MatMul, MatMulV2, BatchMatMul, BatchMatMulV2 ] - - node_2: [ BiasAdd, Add ] - - edges: - - [ node_1, node_2 ] - - - Conv2DbpInputBiasAddFusionPass: - version: 0 - struct: [ Conv2DBackpropInput, BiasAdd ] - - - BatchMatmulV2ReduceFusionPass: - version: 0 - struct: [ BatchMatMulV2, ReduceSumD ] - - - BatchMatmulV2ReduceFusionPass_V2: - version: 0 - struct: [ BatchMatMulV2, Cast, ReduceSumD ] - - - Conv3DbpInputBiasAddFusionPass: - version: 0 - struct: [ Conv3DBackpropInputD, BiasAdd ] - - - AFullyConnectionReshapePass: - version: 0 - struct: [ FullyConnection, Reshape ] - - - GemmTransFusionPass: - version: 0 - struct: [ Transpose, Gemm ] - - - Resnet50DbnDwFusionPass: - version: 0 - struct: [ BNTrainingReduceGrad, Conv2DBackpropFilterD ] - - - CastReluCastFusionPass: - version: 0 - struct: [ Cast, Relu, Cast ] - - - PadConv2dFusionPass: - version: 1 - nodes: - - node_1: [ PadD, PadDV3 ] - - node_2: [ Conv2D ] - - edges: - - [ node_1, node_2 ] - - - Conv2DTransposeBatchnormFusionPass: - version: 1 - nodes: - - node_1: [ Conv2dTranspose ] - - node_2: [ BatchNorm, BNInference ] - - edges: - - [ node_1, node_2 ] - - - AvgPoolV2GradFusionPass: - version: 0 - struct: [ AvgPooV2lGrad ] - - - DropOutDoMaskFusionPass: - version: 0 - struct: [ DropOutDoMaskV3D ] - - - ConvCastFusionPass: - version: 0 - struct: [ Conv2D, Cast ] - - - ConvCastFusionPass_V2: - version: 0 - struct: [ Conv2D, TransData, Cast ] - - - StridedSliceConcatFusionPass: - version: 1 - nodes: - - node_1: [ StridedSliceD ] - - node_2: [ StridedSliceD ] - - node_3: [ ConcatD ] - - edges: - - [ node_1, node_3 ] - - [ node_2, node_3 ] - - - ConvCastFusionPass: - version: 0 - struct: [ SplitV ] - - - AInplaceAddFusionPass: - version: 0 - struct: [ InplaceAdd ] - - - AInplaceSubFusionPass: - version: 0 - struct: [ InplaceSub ] - - - AInplaceUpdateFusionPass: - version: 0 - struct: [ InplaceUpdate ] - -UBFusion: - - TbeConv3dElemwisePass: - version: 1 - nodes: - - node_1: [ Conv3D ] - - node_2: *Elementwise - edges: - - [ node_1, node_2 ] - - - TbeConv3dDxElemwisePass: - version: 0 - struct: [ Conv3dBackpropInput, AddN, LeakyReluGrad ] - - - TbeConv3dDxElemwisePass_V2: - version: 0 - struct: [ Conv3dBackpropInput, LeakyReluGrad ] - - - MatMulDropoutDoMaskV3dFusionPass: - version: 0 - struct: [ MatMul, Dropout_do_mask_v3_d, Add ] - - - BatchMatMulDropoutDoMaskV3dFusionPass: - version: 0 - struct: [ BatchMatMul, Dropout_do_mask_v3_d, Add 
] - - - MatmulReduceSumUbFusion: - version: 0 - struct: [ BatchMatMul, ReduceSum ] - - - TbeBatchMatMulElementWiseFusionPass: - version: 1 - nodes: - - node_1: [ BatchMatMul, GEMM ] - - node_2: *Elementwise - - edges: - - [ node_1, node_2 ] - - - ATbeMatMulElemwiseFusionPass: - version: 1 - nodes: - - node_1: [ MatMul, GEMM ] - - node_2: *Elementwise - - edges: - - [ node_1, node_2 ] - - - MatmulConfusiontransposeUbFusion: - version: 0 - struct: [ MatMul, matmul_transpose ] - - - TbeFullyconnectionElemwiseDequantFusionPass: - version: 1 - nodes: - - node_1: [ BatchMatMul, MatMul, FullyConnection ] - - node_2: *Elementwise - - edges: - - [ node_1, node_2 ] - - - BatchMatmulConfusiontransposeUbFusion: - version: 0 - struct: [ BatchMatMul, batchmatmul_transpose ] - - - TbeConvSigmoidMulQuantFusionPass: - version: 1 - nodes: - - node_1: [ Conv ] - - node_2: [ Sigmoid ] - - node_3: [ Mul ] - - node_4: [ Quant ] - - edges: - - [ node_1, node_2 ] - - [ node_1, node_3 ] - - [ node_2, node_3 ] - - [ node_3, node_4 ] - - - TbeConv2DReluv2Pass: - version: 0 - struct: [ Conv2D, ReluV2 ] - - - TbeConvDoubleInFusionPass: - version: 1 - nodes: - - node_1: [ Conv2D ] - - node_2: *Elementwise - - node_3: *Elementwise - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - - TbeConv2dAddClipMulDivFusionPass: - version: 0 - struct: [ Conv2D, Add, Clip, Mul, Div ] - - - TbeConv2dAddClipMulDivFusionPass_V2: - version: 0 - struct: [ Conv2D, Add, Clip, Mul ] - - - TbeConv2dAddRelu6MulMulFusionPass: - version: 1 - nodes: - - node_1: [ Conv2D, DepthwiseConv2D ] - - node_2: [ Add ] - - node_3: [ Relu6 ] - - node_4: [ Mul ] - - node_5: [ Mul ] - - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - [ node_3, node_4 ] - - [ node_4, node_5 ] - - - ConvClipByValueFusionPass: - version: 1 - nodes: - - node_1: [ Conv2D ] - - node_2: *Elementwise - edges: - - [ node_1, node_2 ] - - - TbeAippConvReluMaxpoolingFusion: - version: 1 - nodes: - - node_1: [ Conv2D ] - - node_2: *Elementwise - - node_3: [ MaxPool, MaxPoolv3 ] - - edges: - - [ node_1, node_2 ] - - [ node_2, node_3 ] - - - TbeReduceElemwiseFusionPass: - version: 1 - nodes: - - node_1: *Elementwise - - node_2: [ CommReduce ] - edges: - - [ node_1, node_2 ] - - - TbeReadSelectEltwiseFusionPass: - version: 1 - nodes: - - node_1: [ ReadSelect ] - - node_2: *Elementwise - - edges: - - [ node_1, node_2 ] - - - TbeEltwiseWriteSelectFusionPass: - version: 1 - nodes: - - node_1: *Elementwise - - node_2: [ write_select ] - - edges: - - [ node_1, node_2 ] - - - TbeEltwiseFusionPass: - version: 1 - nodes: - - node_1: *Elementwise - - node_2: *Elementwise - - edges: - - [ node_1, node_2 ] - - - TbeConvBnreduceFusionPass: - version: 0 - struct: [ Convolution, bn_reduce ] - - - TbeBnupdateEltwiseFusionPass: - version: 1 - nodes: - - node_1: [ bn_update ] - - node_2: *Elementwise - edges: - - [ node_1, node_2 ] - - - TbeConv2DBackpropElemwiseFusionPass: - version: 1 - nodes: - - node_1: [ Conv2DBackpropInputD, Conv2DTransposeD, Deconvolution ] - - node_2: [ Add, ReluGradV2 ] - - edges: - - [ node_1, node_2 ] - - - TbeDxElemwisePass: - version: 1 - nodes: - - node_1: [ Conv2DBackpropInputD, Conv2DTransposeD, Deconvolution ] - - node_2: [ LeakyRelu, Prelu ] - - edges: - - [ node_1, node_2 ] - - - TbeConv2dBackpropRequantFusionPass: - version: 1 - nodes: - - node_1: [ Conv2DBackpropInputD, Conv2DTransposeD, Deconvolution ] - - node_2: [ AscendRequant ] - - edges: - - [ node_1, node_2 ] - - - TbeDwTransdataFusionPass: - version: 1 - nodes: - - node_1: [ Transdate ] - - 
node_2: [ Transdate ] - - node_3: [ Conv2DBackpropFilter ] - - edges: - - [ node_1, node_3 ] - - [ node_2, node_3 ] - - - TbeDxTransdataFusionPass: - version: 1 - nodes: - - node_1: [ Transdate ] - - node_2: [ Transdate ] - - node_3: [ Conv2DBackpropInput ] - - edges: - - [ node_1, node_3 ] - - [ node_2, node_3 ] - - - TbeEltwiseCastFusionPass: - version: 1 - nodes: - - node_1: [ Relu, Add, Mul, Sqrt ] - - node_2: [ Cast ] - - edges: - - [ node_1, node_2 ] - - - TbeEltwiseCastFusionPass_V2: - version: 1 - nodes: - - node_1: [ Cast ] - - node_2: [ Relu, Add, Mul, Sqrt ] - - - edges: - - [ node_1, node_2 ] - - - TbeConv2DBackpropDequantFusionPass: - version: 1 - nodes: - - node_1: [ Conv2DBackpropInputD, Conv2DTransposeD, Deconvolution ] - - node_2: [ AscendDequant ] - - - edges: - - [ node_1, node_2 ] diff --git a/profiler/advisor/rules/sync_batchnorm.yaml b/profiler/advisor/rules/sync_batchnorm.yaml deleted file mode 100644 index 0f702af6eae4445244778fc5429912380c2d199a..0000000000000000000000000000000000000000 --- a/profiler/advisor/rules/sync_batchnorm.yaml +++ /dev/null @@ -1,41 +0,0 @@ -problem: "Found {syncbn_num} SyncBatchNorm, which can lead to slow python task dispatch and frequent communication between devices and finally reducing training efficiency." -max_syncbn_num: 20 -solutions: - - enable batchnorm: - desc: "disable SyncBatchNorm by remove the code like 'torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)' if possible." - - enable efficient SyncBatchNorm: - desc: "replace the 'forward' method of python script 'torch_npu/utils/syncbatchnorm.py' in your runtime environment." - efficient_code: | - @staticmethod - def forward(self, input_tensor, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size): - input_tensor = input_tensor.contiguous() - input_shape = input_tensor.shape - input_tensor_ = input_tensor.reshape(input_shape[0], input_shape[1], 1, -1) - sum_val, sum_square_val = torch.batch_norm_reduce(input_tensor_, eps) - - count = torch.full((1,), - input_tensor.numel() // input_tensor.size(1), - dtype=sum_val.dtype, - device=sum_val.device) - - num_channels = input_tensor.shape[1] - combined = torch.cat([sum_val, sum_square_val, count], dim=0) - combined_list = torch.empty((world_size,) + combined.shape, dtype=combined.dtype, device=combined.device) - dist.all_gather_togather(combined_list, combined, process_group, async_op=False) - sum_all, square_sum_all, count_all = torch.split(combined_list, num_channels, dim=1) - size = count_all.view(-1).sum() - if size == 1: - raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size)) - - mean, invstd = torch.batch_norm_gather_stats_update(input_tensor, - sum_all, - square_sum_all, - running_mean, - running_var, - momentum, - eps, - count_all.view(-1)) - self.save_for_backward(input_tensor, weight, mean, invstd, count_all.to(torch.int32)) - self.process_group = process_group - out = torch.batch_norm_elemt(input_tensor, weight, bias, mean, invstd, eps) - return out \ No newline at end of file diff --git a/profiler/advisor/rules/synchronize.yaml b/profiler/advisor/rules/synchronize.yaml deleted file mode 100644 index 3bd518d003c598ddca54e53a359b957bce3c0bab..0000000000000000000000000000000000000000 --- a/profiler/advisor/rules/synchronize.yaml +++ /dev/null @@ -1,8 +0,0 @@ -problem: "SynchronizeStream will reduce training efficiency. 
Found {synchronize_num} SynchronizeStream, {slow_synchronize_num} slow SynchronizeStream cost {total_synchronize_stream_time} us." -max_synchronize_num: 20 -slow_synchronize_threshold: 10 #ms -solutions: - - disable ascend launch blocking: - desc: "please check your env 'ASCEND_LAUNCH_BLOCKING', if ASCEND_LAUNCH_BLOCKING=1, please execute 'unset ASCEND_LAUNCH_BLOCKING' and then start your training job." - - modify code to avoid synchronize stream: - desc: "please try to modify your training code to avoid synchronize stream between cpu and npu." \ No newline at end of file diff --git a/profiler/advisor/rules/timeline_fusion_ops.yaml b/profiler/advisor/rules/timeline_fusion_ops.yaml deleted file mode 100644 index 8207465dc4a5c5ddbb1cc934ef95951493c4bacb..0000000000000000000000000000000000000000 --- a/profiler/advisor/rules/timeline_fusion_ops.yaml +++ /dev/null @@ -1,59 +0,0 @@ -- cann_version: 6.3.RC2 - torch_version: 1.11.0 - unique_id: 0 - operator_rules: - aten: - add: - torch_npu.npu_confusion_transpose: ["(permute|transpose)-(contiguous){0,1}-(reshape|view)", - "(reshape|view)-(contiguous){0,1}-(permute|transpose)"] - torch_npu.fast_gelu: [gelu] - torch_npu.npu_linear: [linear] - torch_npu.npu_mish: [mish] - torch_npu.contrib.module.Mish: [mish] - torch_npu.npu_scaled_masked_softmax: [ "softmax-(mul){0,1}-(masked_fill_|add)" ] - torch_npu.npu_silu: [ silu, mul-sigmoid, sigmoid-mul ] - torch_npu.contrib.module.SiLU: [ silu, mul-sigmoid, sigmoid-mul ] - optimizer.clip_grad_norm_fused_: [add-reciprocal-mul] - Optimizer: - add: - torch_npu.optim.NpuFusedAdamW: [AdamW.step] - torch_npu.optim.NpuFusedSGD: [SGD.step] - torch_npu.optim.NpuFusedAdadelta: [Adadelta.step] - torch_npu.optim.NpuFusedLamb: [Lamb.step] - torch_npu.optim.NpuFusedAdamP: [AdamP.step] - torch_npu.optim.NpuFusedBertAdam: [BertAdam.step] - torch_npu.optim.NpuFusedRMSprop: [RMSprop.step] - torch_npu.optim.NpuFusedRMSpropTF: [RMSpropTF.step] - torch_npu.optim.NpuFusedAdam: [Adam.step] - - -- cann_version: 7.0.RC1 - torch_version: [1.11.0,2.1.0] - unique_id: 1 - inherit_unique_id: 0 - operator_rules: - aten: - add: - torch_npu.npu_fusion_attention: ["matmul-(add){0,1}-(mul){0,1}-(masked_fill_|add){0,1}-softmax-(dropout){0,1}-matmul"] - torch_npu.npu_rotary_mul: ["(chunk|slice)-neg-cat-(mul){0,2}-add"] - -- cann_version: 7.0.0 - torch_version: [1.11.0, 2.1.0] - unique_id: 2 - inherit_unique_id: 1 - operator_rules: - aten: - add: - torch_npu.npu_rms_norm: ["(pow){0,1}-(mean){0,1}-(add){0,1}-rsqrt-mul-(type_as){0,1}"] - torch_npu.npu_swiglu: [ "(slice|chunk)-silu-mul", "(slice|chunk)-mul-silu", - "(slice|chunk)-sigmoid-mul-mul", "(slice|chunk)-mul-sigmoid-mul", - "(slice|chunk)-mul-mul-sigmoid" ] - -- cann_version: 8.0.RC1 - torch_version: [1.11.0, 2.1.0] - unique_id: 3 - inherit_unique_id: 2 - operator_rules: - aten: - add: - torch_npu.npu_geglu: ["(slice|chunk)-gelu-mul", "(slice|chunk)-mul-gelu"] \ No newline at end of file diff --git a/profiler/advisor/utils/__init__.py b/profiler/advisor/utils/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/profiler/advisor/utils/log.py b/profiler/advisor/utils/log.py deleted file mode 100644 index b18272a82b6c5f529e5d36ceca921734eba9f592..0000000000000000000000000000000000000000 --- a/profiler/advisor/utils/log.py +++ /dev/null @@ -1,63 +0,0 @@ -""" -log module -""" -import logging -import os - -from profiler.advisor.common import constant as const - - -def get_log_level(): - log_level = 
os.getenv(const.ADVISOR_LOG_LEVEL, const.DEFAULT_LOG_LEVEL).upper() - if not hasattr(logging, log_level): - raise AttributeError(f"module 'logging' has no attribute '{log_level}', " - f"supported log level: {', '.join(const.SUPPORTED_LOG_LEVEL)}") - return log_level - - -def init_logger(ctx, param, debug_mode) -> logging.Logger: - logging.logThreads = False - logging.logMultiprocessing = False - logging.logProcesses = False - - class LevelFilter(logging.Filter): - """ - level filter that filters out records logged with the custom OUT level - """ - - # pylint:disable=too-few-public-methods - def filter(self, record): - if record.levelno == 60: - return False - return True - - console_log_level = getattr(logging, get_log_level()) - console_handle = logging.StreamHandler() - console_handle.setLevel(console_log_level) - console_handle.addFilter(LevelFilter()) - if debug_mode and not ctx.resilient_parsing: - formatter = logging.Formatter(fmt="[%(asctime)s][%(levelname)s][%(filename)s L%(lineno)s] %(message)s", - datefmt='%Y-%m-%d,%H:%M:%S') - else: - formatter = logging.Formatter(fmt="[%(asctime)s][%(levelname)s] %(message)s", - datefmt='%Y-%m-%d,%H:%M:%S') - console_handle.setFormatter(formatter) - - # add log level out - logging.addLevelName(60, 'OUT') - logger = logging.getLogger() - setattr(logger, 'out', lambda *args: logger.log(60, *args)) - output_handle = logging.StreamHandler() - output_handle.setLevel("OUT") - formatter = logging.Formatter("%(message)s") - output_handle.setFormatter(formatter) - - logger.setLevel("DEBUG") - logger.handlers = [] - if not logger.handlers: - logger.addHandler(console_handle) - logger.addHandler(output_handle) - else: - logger.info(logger.handlers) - logger.debug("The analysis logger has been initialized successfully.") - return logger diff --git a/profiler/advisor/utils/tools.py b/profiler/advisor/utils/tools.py deleted file mode 100644 index 2cbcb5e0521d4a947fb8ff6af40e98c32dedab23..0000000000000000000000000000000000000000 --- a/profiler/advisor/utils/tools.py +++ /dev/null @@ -1,76 +0,0 @@ -from functools import partial - -import click - -CONTEXT_SETTINGS = dict(help_option_names=['-H', '-h', '--help']) - - -class ClickAliasedGroup(click.Group): - """ - Alias click command - """ - FORMAT_LIMIT_LEN = 6 - - def __init__(self, *args, **kwargs): - super(ClickAliasedGroup, self).__init__(*args, **kwargs) - self._alias_dict = {} - self._commands = {} - - def command(self, *args, **kwargs): - alias = kwargs.pop('alias', None) - decorator = super(ClickAliasedGroup, self).command(*args, **kwargs) - if not alias: - return decorator - - return partial(self._decorator_wrapper, decorator, alias) - - def group(self, *args, **kwargs): - alias = kwargs.pop('alias', None) - decorator = super(ClickAliasedGroup, self).group(*args, **kwargs) - if not alias: - return decorator - - return partial(self._decorator_wrapper, decorator, alias) - - def _decorator_wrapper(self, decorator, alias, func=None): - cmd = decorator(func) - self._commands[cmd.name] = alias - self._alias_dict[alias] = cmd.name - return cmd - - def resolve_alias(self, cmd_name): - if cmd_name in self._alias_dict.keys(): - return self._alias_dict[cmd_name] - return cmd_name - - def get_command(self, ctx, cmd_name): - cmd_name = self.resolve_alias(cmd_name) - command = super(ClickAliasedGroup, self).get_command(ctx, cmd_name) - return command if command else None - - def format_commands(self, ctx, formatter): - rows = [] - sub_commands = self.list_commands(ctx) - max_len = 0 - if len(sub_commands) > 0: - max_len = max(len(cmd) for cmd in sub_commands) - - limit = formatter.width - self.FORMAT_LIMIT_LEN - max_len - for sub_command in sub_commands: - cmd = self.get_command(ctx, sub_command) - if cmd is None: - continue - if hasattr(cmd, 'hidden') and cmd.hidden: - continue - if sub_command in self._commands: - alias = self._commands[sub_command] - sub_command = f'{sub_command}, {alias}' - if click.__version__[0] < '7': - cmd_help = cmd.short_help or '' - else: - cmd_help = cmd.get_short_help_str(limit) - rows.append((sub_command, cmd_help)) - - if rows: - with formatter.section('Commands'): - formatter.write_dl(rows)
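The alias plumbing in `ClickAliasedGroup` is easiest to see in a small driver. Below is a minimal sketch, not part of the deleted sources: the `cli` group and the `analyze`/`an` command names are illustrative assumptions, and only `ClickAliasedGroup` and `CONTEXT_SETTINGS` come from the file above.

```python
import click

from profiler.advisor.utils.tools import CONTEXT_SETTINGS, ClickAliasedGroup


@click.group(cls=ClickAliasedGroup, context_settings=CONTEXT_SETTINGS)
def cli():
    """Demo entry point; each subcommand may declare an optional alias."""


@cli.command(alias="an")  # 'alias' is popped by ClickAliasedGroup.command()
def analyze():
    click.echo("running analysis")


if __name__ == "__main__":
    cli()  # 'analyze' and its alias 'an' resolve to the same command
```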
diff --git a/profiler/advisor/utils/utils.py b/profiler/advisor/utils/utils.py deleted file mode 100644 index 83f304c2d3c7d2e583b9c3979a4cf2c232020f55..0000000000000000000000000000000000000000 --- a/profiler/advisor/utils/utils.py +++ /dev/null @@ -1,610 +0,0 @@ -import inspect -import json - -import logging -import multiprocessing as mp -import os -import queue -import re -import stat -import time -import traceback -import types -from functools import wraps -from typing import Any, Set -import ijson -import click -import requests -from requests.adapters import HTTPAdapter -from tqdm import tqdm - -from profiler.advisor.common import constant as const -from profiler.advisor.common.version_control import VersionControl -from profiler.advisor.utils.log import init_logger, get_log_level - -logger = logging.getLogger() -logger.setLevel(get_log_level()) -permission_warned: Set = set() - - -def ignore_warning(exception: Exception = None): - return exception - - -class ContextObject(object): - def __init__(self): - self._debug = False - - def set_debug(self, debug=False): - self._debug = debug - - @property - def debug_mode(self): - return self._debug - - -def debug_option(f): - return click.option('--debug', - is_flag=True, - expose_value=False, - is_eager=True, - callback=init_logger, - help="Debug Mode. Shows full stack trace when error occurs.")(f) - - -def get_class_absolute_path(cls): - module = inspect.getmodule(cls) - if module is not None: - module_path = module.__name__ - class_name = cls.__name__ - return f"{module_path}.{class_name}" - - else: - return None - - -def is_static_func(function_obj): - return isinstance(function_obj, staticmethod) - - -def singleton(cls): - """ - :param cls: any class - :return: singleton handle - - When using the singleton decorator, you need to manually specify collection_path='dataSet_path'; otherwise the singleton - is keyed by class name alone. - If cls has a 'collection_path' property, the _instance map is built from the class name and 'collection_path'; the default - collection path is the absolute path of the class. 
- - _instance = {cls.name: {collection_path: instance}} - """ - _instance = {} - - def _singleton(*args: any, **kw: any) -> any: - collection_path = kw.get("collection_path") - if not collection_path: - collection_path = get_class_absolute_path(cls) - if cls in _instance and collection_path in _instance[cls]: - return _instance[cls].get(collection_path) - if cls not in _instance: - _instance[cls] = {collection_path: cls(*args, **kw)} - else: - _instance[cls][collection_path] = cls(*args, **kw) - return _instance[cls].get(collection_path) - - # Preserve the attributes and methods of the original class - _singleton.__name__ = cls.__name__ - _singleton.__module__ = cls.__module__ - _singleton.__doc__ = cls.__doc__ - - # Copy the class methods and static methods of the original class - _singleton.__dict__.update(cls.__dict__) - for base_class in inspect.getmro(cls)[::-1]: - # Get all members of the class - members = inspect.getmembers(base_class) - - # Keep only the function objects - function_objs = [member[1] for member in members if inspect.isfunction(member[1]) or inspect.ismethod(member[1])] - for function_obj in function_objs: - if inspect.isfunction(function_obj) and not is_static_func(function_obj): - continue - setattr(_singleton, function_obj.__name__, function_obj) - - return _singleton - - -def lazy_property(func): - """ - Lazy loading of class attributes, - which are calculated only once, on first access, - and reused for every later access. - """ - attr_name = "_lazy_" + func.__name__ - - @property - def _lazy_property(instance): - if not hasattr(instance, attr_name): - setattr(instance, attr_name, func(instance)) - return getattr(instance, attr_name) - - return _lazy_property - - -class CheckPathAccess: - """ - check path access permissions - """ - - # pylint: disable=no-member - def __init__(self, func): - wraps(func)(self) - self.warned = permission_warned - - def __call__(self, *args, **kwargs): - path = args[0] - if not os.access(path, os.R_OK) and path not in self.warned: - logger.warning("%s cannot be read, check the permissions", path) - self.warned.add(path) - return self.__wrapped__(*args, **kwargs) - - def __get__(self, instance, cls): - if instance is None: - return self - return types.MethodType(self, instance) - - -def walk_error_handler(error): - """ - handle dir walk error - """ - if error.filename not in permission_warned: - logger.warning(error) - permission_warned.add(error.filename) - - -@CheckPathAccess -def get_file_path_from_directory(path: str, check_func: Any) -> list: - """ - get file from directory - """ - file_list = [] - for root, _, files in os.walk(path, onerror=walk_error_handler): - for filename in files: - filepath = os.path.join(root, filename) - if check_func(filename): - file_list.append(filepath) - return file_list - - -@singleton -class Timer: - def __init__(self): - self.strftime = time.strftime("%Y%m%d%H%M%S", time.localtime(time.time())) - - -def get_analyze_processes(): - # n_processes not exposed to user through att-advisor command arguments now - return min(int(os.getenv(const.MA_ADVISOR_ANALYZE_PROCESSES, 1)), const.MA_ADVISOR_MAX_PROCESSES)
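A minimal sketch of the contract of the `singleton` decorator above: instances are cached per (class, collection_path) pair, so repeated construction with the same path returns the same object. `DemoDataset` and the paths are illustrative only.

```python
@singleton
class DemoDataset:
    def __init__(self, collection_path=None, **kwargs):
        self.collection_path = collection_path


a = DemoDataset(collection_path="/tmp/prof_a")  # creates and caches the instance
b = DemoDataset(collection_path="/tmp/prof_a")  # returns the cached instance
c = DemoDataset(collection_path="/tmp/prof_b")  # different key, new instance
assert a is b and a is not c
```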
").replace("", "<module>")) - - for key, stacks in result.items(): - api_name = key.split(":")[0] - format_result[api_name] = sorted(list(stacks.items()), key=lambda stack: stack[1], reverse=True) - return format_result - - -class ParallelJob: - - def __init__(self, src_func, ops_api_list, job_name=None): - if not callable(src_func): - raise TypeError(f"src_func should be callable") - - if not isinstance(ops_api_list, (list, tuple)): - raise TypeError(f"ops_api_list should be list or tuple") - - self.src_func = src_func - self.ops_api_list = ops_api_list - self.job_name = job_name - - def start(self, n_proccesses): - - job_queue = mp.Queue(len(self.ops_api_list)) - completed_queue = mp.Queue() - for i in range(len(self.ops_api_list)): - job_queue.put(i) - - processes = [] - listen = mp.Process(target=self.listener, args=(completed_queue, len(self.ops_api_list),)) - listen.start() - - for i in range(n_proccesses): - p = mp.Process(target=self.parallel_queue, args=(job_queue, completed_queue,)) - processes.append(p) - p.start() - - for p in processes: - p.join() - - completed_queue.put(None) - listen.join() - - def listener(self, completed_queue, num): - pbar = tqdm(total=num, position=0, leave=False, ncols=100, desc=self.job_name) - for _ in iter(completed_queue.get, None): - pbar.update() - pbar.refresh() - pbar.n = num - - def parallel_queue(self, job_queue, completed_queue): - while True: - try: - if job_queue.empty(): - break - token = job_queue.get(timeout=1) - except queue.Empty: - continue - self.src_func(*self.ops_api_list[token]) - completed_queue.put(token) - - -def mp_queue_to_list(job_queue): - queue_list = [] - while True: - try: - if job_queue.empty(): - break - token = job_queue.get(timeout=1) - queue_list.append(token) - except queue.Empty: - continue - return queue_list - - -def load_parameter(parameter, default): - if not os.environ.get(parameter, None): - return default - else: - return os.environ.get(parameter) - - -def get_supported_subclass(clazz: VersionControl.__class__, cann_version: str): - """ - Returns a list of subclasses that support the specified version, because of the __subclasses__(), - you need to import the all subclass first - :param clazz: Class name which is extends to VersionControl.__class__ - :param cann_version: The CANN software version - :return: The list of subclasses that support the specified CANN version - """ - # 获取所有支持这个cann版本的子类 - dataset_classes = clazz.__subclasses__() - sub_class_list = [cls for cls in dataset_classes if cls.is_supported(cann_version)] - logger.debug("The support subclass list is %s, cann version is %s", str(sub_class_list), cann_version) - return sub_class_list - - -def to_percent(num: float) -> str: - """ - change float to percent format - """ - num = num * 100 - return f"{num:.2f}%" - - -def safe_division(numerator, denominator): - """Return 0 if denominator is 0.""" - return denominator and numerator / denominator - - -def safe_write(content, save_path): - if os.path.dirname(save_path) != "": - os.makedirs(os.path.dirname(save_path), exist_ok=True) - - with os.fdopen(os.open(save_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, - stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP), "w") as f: - f.write(content) - - -def create_directory_for_file(file: str) -> None: - """ - create directory for file - """ - dirname = os.path.dirname(file) - if not os.path.exists(dirname): - os.makedirs(dirname) - - -class CheckPathAccess: - """ - check path access permissions - """ - - # pylint: disable=no-member - def __init__(self, func): - 
-class CheckPathAccess: - """ - check path access permissions - """ - - # pylint: disable=no-member - def __init__(self, func): - wraps(func)(self) - self.warned = permission_warned - - def __call__(self, *args, **kwargs): - path = args[0] - if path and not os.access(path, os.R_OK) and path not in self.warned: - logger.warning("%s cannot be read, check the permissions", path) - self.warned.add(path) - return self.__wrapped__(*args, **kwargs) - - def __get__(self, instance, cls): - if instance is None: - return self - return types.MethodType(self, instance) - - -@CheckPathAccess -def get_file_path_from_directory(path, check_func): - """ - get file from directory - """ - file_list = [] - - if not path: - return file_list - - if not os.path.isdir(path): - logger.warning("Expected existed directory, but got %s", path) - - for root, _, files in os.walk(path): - for filename in files: - filepath = os.path.join(root, filename) - if check_func(filename): - file_list.append(filepath) - return file_list - - -@CheckPathAccess -def get_dir_path_from_directory(path: str, check_func: Any) -> list: - """ - get file from directory - """ - file_list = [] - for root, _, files in os.walk(path, onerror=walk_error_handler): - for filename in files: - filepath = os.path.join(root, filename) - if check_func(filename): - file_list.append(filepath) - return file_list - - -def is_regex_pattern(string: str): - """ - Check if str is a regular expression. - """ - escaped_string = re.escape(string) - return not (escaped_string == string) - - -def join_prof_path(root_dir: str, sub_dir: str) -> str: - """ - regular expression matching method for path concatenation - """ - if is_regex_pattern(sub_dir): - for root, _, _ in os.walk(root_dir, onerror=walk_error_handler): - if re.match(sub_dir, os.path.basename(root)): - return root - logger.debug("Fail to get profiling path %s from local path %s by regular expression matching", sub_dir, root_dir) - else: - sub_dir = os.path.join(root_dir, sub_dir) - if os.path.exists(sub_dir): - return sub_dir - logger.debug("Fail to get profiling path %s from local path %s", sub_dir, root_dir) - return "" - - -def format_excel_title(title: str) -> str: - """ - format excel title - """ - title = title.lower() - title = title.replace("(us)", '') - title = title.replace("(ns)", '') - title = title.replace("(%)", '') - title = title.replace(" ", "_") - - # Map the column names in kernel_details to the ones used in op_summary_x.csv - kernel_details_col_name_map = { - "name": "op_name", - "type": "op_type", - "accelerator_core": "task_type", - "start_time": "task_start_time", - "duration": "task_duration", - "wait_time": "wait_time" - } - return kernel_details_col_name_map.get(title, title) - - -def format_float(num: float) -> float: - """ - format float num, round to 2 decimal places - """ - return round(num, 2) - - -class SafeOpen: - """ - safe open to check file - """ - - # pylint: disable=consider-using-with - def __init__(self, name, mode='r', encoding=None): - self.file = None - if not os.path.exists(name): - logger.warning("%s does not exist, please check", name) - return - - if os.access(name, os.R_OK): - self.file = open(name, mode, encoding=encoding, errors="ignore") - else: - logger.warning("%s cannot be read, check the permissions", name) - - def __enter__(self): - return self.file - - def __exit__(self, exc_type, exc_val, exc_tb): - if self.file: - self.file.close() - return True
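`SafeOpen` above trades exceptions for warnings: a missing or unreadable file yields `None` instead of raising, and `__exit__` returning True also swallows exceptions raised inside the body, so callers must check the handle. A minimal sketch with an illustrative file name:

```python
with SafeOpen("op_summary_0.csv", encoding="utf-8") as f:
    rows = f.readlines() if f else []  # f is None when the file is missing or unreadable
```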
-def save_downloaded_file(response, url_path, file_save_path): - """Save the file carried in the response body. - - Args: - response: response object returned by the request - url_path: URL path - file_save_path: directory for saving the file - Returns: - final_file_path: absolute path the file is saved to - """ - # Get the file name from the URL path and join it with the save directory - file_save_path = os.path.normpath(file_save_path) - file_name = os.path.basename(url_path) - final_file_path = os.path.join(file_save_path, file_name) - # Create the save directory automatically if it does not exist - if not os.path.exists(file_save_path): - os.makedirs(file_save_path) - if response.status_code <= 300: - logger.debug("Response status code is %s", response.status_code) - flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL - modes = stat.S_IWUSR | stat.S_IRUSR - # If the file already exists, remove it and save the latest one - if os.path.exists(final_file_path): - os.remove(final_file_path) - # Save the file - with os.fdopen(os.open(final_file_path, flags, modes), mode="wb") as f: - f.write(response.content) - logger.info("Successfully saved content to: %s", os.path.abspath(final_file_path)) - else: - # If the status code is not an expected value, log a warning - logger.warning("Failed to save the response body. The response status code is %s. " - "Please check the network or try another region", response.status_code) - - -def request_with_retry(url_path, region_name=None): - """Download a file with requests, retrying on failure; at most max_retries+1 requests are made. - - Args: - url_path: URL path - region_name: name of the region to download from - """ - logger.debug("Requesting or retrying to get file from region: %s", region_name) - - # If a save path is set via the environment variable, it takes precedence; otherwise use the default cloud rule path constant.CLOUD_RULE_PATH - file_save_path = os.path.join(os.path.expanduser("~"), const.CLOUD_RULE_PATH) - if os.getenv(const.ADVISOR_RULE_PATH): - file_save_path = os.getenv(const.ADVISOR_RULE_PATH) - - session = requests.Session() - # Every request made through this session is retried at most max_retries times; counting the initial request, at most max_retries+1 requests are made - adapter = HTTPAdapter(max_retries=const.MAX_RETRIES) - session.mount(const.HTTP_PREFIXES, adapter) - session.mount(const.HTTPS_PREFIXES, adapter) - - logger.debug('Session tries to get response') - response = None - try: - response = session.get(url_path, timeout=const.TIMEOUT) - except Exception as e: - logger.debug("Error: %s: %s", e, traceback.format_exc()) - - if response is None: - logger.warning("Failed to download file from region: %s, response is None, " - "please use the environment variable %s for more detailed information", - region_name, const.ADVISOR_LOG_LEVEL) - else: - try: - # For status codes between 400 and 600, response.raise_for_status raises HTTPError and save_downloaded_file is skipped - response.raise_for_status() - save_downloaded_file(response, url_path=url_path, file_save_path=file_save_path) - except Exception as e: - logger.warning("Error: %s: %s", e, traceback.format_exc()) - # Close the session and clear all adapters - session.close() - - -def read_csv(file): - import csv - - raw_data = [] - logger.debug("Parse file %s", file) - with SafeOpen(file, encoding="utf-8") as csv_file: - try: - csv_content = csv.reader(csv_file) - for row in csv_content: - raw_data.append(row) - except OSError as error: - logger.error("Read csv file failed : %s", error) - return [] - - return raw_data - - -def get_file_path_by_walk(root, filename): - file_path = "" - for root, _, files in os.walk(root, topdown=True): - for name in files: - if name == filename: - file_path = os.path.join(root, name) - return file_path - return file_path - - -def check_path_valid(path): - if os.path.islink(os.path.abspath(path)): - logger.error("The path is detected as a soft link. path: %s", path) - return False - elif not os.access(path, os.R_OK): - logger.error("The file is not readable. path: %s", path) - return False - elif os.path.getsize(path) > const.MAX_FILE_SIZE: - logger.error("The file size exceeds the limit. 
path: %s, MAX_FILE_SIZE: %s B", path, const.MAX_FILE_SIZE) - return False - return True - - -def parse_json_with_generator(timeline_data_path, func): - result = [] - if not check_path_valid(timeline_data_path): - return result - try: - with open(timeline_data_path, "r") as f: - if os.getenv(const.DISABLE_STREAMING_READER) == "1": - logger.debug("Disable streaming reader.") - file_parser = json.loads(f.read()) - else: - logger.debug("Enable streaming reader.") - file_parser = ijson.items(f, "item") - - for i, event in tqdm(enumerate(file_parser), - leave=False, ncols=100, desc="Building dataset for timeline analysis"): - func_res = func(index=i, event=event) - if func_res is not None: - result.append(func_res) - - except Exception: - logger.warning("Error %s while parsing file %s, continue to timeline analysis", traceback.format_exc(), - timeline_data_path) - return result - - -def convert_to_float(num): - try: - return float(num) - except (ValueError, FloatingPointError): - logger.error("Cannot convert %s to float", num) - return 0 diff --git a/profiler/advisor/version.py b/profiler/advisor/version.py deleted file mode 100644 index 67d04140866a3df8ecb8484451c476006da2671d..0000000000000000000000000000000000000000 --- a/profiler/advisor/version.py +++ /dev/null @@ -1,38 +0,0 @@ -import sys - - -def get_package_version(package_name) -> str: - """ - Get package version info by importlib - Args: - package_name: package name - - Returns: - version: version info string - """ - if sys.version_info >= (3, 8): - # Because importlib_metadata has been changed to importlib.metadata in py3.8 - from importlib import metadata - from importlib.metadata import PackageNotFoundError - else: - import importlib_metadata as metadata - from importlib_metadata import PackageNotFoundError - - try: - version = metadata.version(package_name) - except PackageNotFoundError: - version = "UNKNOWN" - return version - - -def print_version_callback(ctx, param, value): # NOQA - import click - - if not value or ctx.resilient_parsing: - return - click.echo('Version {}'.format(get_package_version("msprof-analyze"))) - ctx.exit() - - -def cli_version(): - return get_package_version("msprof-analyze")
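Taken together, the deleted modules form one pipeline: `Interface` maps a dimension and scope to an analyzer, the analyzer writes its findings into the singleton `OptimizeResult`, and the result is rendered to the terminal, xlsx and html. A minimal driving sketch, assuming the pre-deletion package layout is still importable; the profiling path is illustrative:

```python
from profiler.advisor.interface.interface import Interface

interface = Interface(profiling_path="/home/user/profiling_data")
for scope in Interface.get_scope("schedule"):
    # get_result returns a dict of sheet data because output_dict=True by default
    data = interface.get_result("schedule", scope, render_html=False)
    print(scope, "->", "findings" if data else "no issue")
```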