From 3c3e9a4473a7e18242272ca95345d336ba6d3cbe Mon Sep 17 00:00:00 2001
From: MooYeh <yangtao279@huawei.com>
Date: Tue, 12 Dec 2023 10:57:02 +0800
Subject: [PATCH] Update profiler README.md files

---
 profiler/README.md                          | 76 ++++++++++++++++++++-
 profiler/cluster_analyse/README.md          |  2 +-
 profiler/compare_tools/README.md            |  4 +-
 profiler/merge_profiling_timeline/README.md |  2 +-
 4 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/profiler/README.md b/profiler/README.md
index 44666e7434..0fd85248b6 100644
--- a/profiler/README.md
+++ b/profiler/README.md
@@ -2,11 +2,85 @@
 
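 The ATT tool provides end-to-end tuning for training and large-model scenarios: after the user collects performance data, ATT provides statistics, analysis, and related tuning suggestions.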
 
-### Profiling Collection
+### NPU Profiling Data Collection
+
 ATT currently mainly supports performance data collected through the Ascend PyTorch Profiler interface; see [Introduction to the Ascend PyTorch Profiler Performance Tuning Tool](https://gitee.com/ascend/att/wikis/%E6%A1%88%E4%BE%8B%E5%88%86%E4%BA%AB/%E6%80%A7%E8%83%BD%E6%A1%88%E4%BE%8B/Ascend%20PyTorch%20Profiler%E6%80%A7%E8%83%BD%E8%B0%83%E4%BC%98%E5%B7%A5%E5%85%B7%E4%BB%8B%E7%BB%8D).
 
 The Ascend PyTorch Profiler interface requires AscendPyTorch 5.0.RC2 or later. For the supported Python and CANN version compatibility, see "[Installing PyTorch](https://www.hiascend.com/document/detail/zh/canncommercial/63RC2/envdeployment/instg/instg_000041.html)" in the CANN Software Installation Guide.
 
+#### Collection Method 1: Using a with Statement
+
+```python
+import torch_npu
+experimental_config = torch_npu.profiler._ExperimentalConfig(
+    aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
+    profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
+    l2_cache=False
+)
+with torch_npu.profiler.profile(
+    activities=[
+        torch_npu.profiler.ProfilerActivity.CPU, 
+        torch_npu.profiler.ProfilerActivity.NPU
+    ],
+    record_shapes=True,
+    profile_memory=True,
+    with_stack=True,
+    experimental_config=experimental_config,
+    schedule=torch_npu.profiler.schedule(wait=10, warmup=0, active=1, repeat=1),
+    on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data")
+) as prof:
+    # Model training code
+    for epoch, data in enumerate(dataloader):
+        train_model_one_step(model, data)
+        prof.step()
+```
+
+#### Collection Method 2: Using start and stop
+
+```python
+import torch_npu
+experimental_config = torch_npu.profiler._ExperimentalConfig(
+    aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
+    profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
+    l2_cache=False
+)
+prof = torch_npu.profiler.profile(
+    activities=[
+        torch_npu.profiler.ProfilerActivity.CPU, 
+        torch_npu.profiler.ProfilerActivity.NPU
+    ],
+    record_shapes=True,
+    profile_memory=True,
+    with_stack=True,
+    experimental_config=experimental_config,
+    on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./profiling_data"))
+# Model training code
+for epoch, data in enumerate(dataloader):
+    if epoch == 11:
+        prof.start()
+    train_model_one_step(model, data)
+    prof.step()
+    if epoch == 11:
+        prof.stop()
+```
+
+#### NPU Performance Data Directory Structure
+
+The Ascend PyTorch Profiler data directory structure is as follows:
+
+```
+|- ascend_pytorch_profiling
+    |- * _ascend_pt
+        |- ASCEND_PROFILER_OUTPUT
+            |- trace_view.json
+        |- FRAMEWORK
+        |- PROF_XXX
+        |- profiler_info.json
+    |- * _ascend_pt
+```
+
+For details on the Profiler configuration interface, see the official documentation: [Ascend PyTorch Profiler Data Collection and Analysis](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/modeldevpt/ptmigr/AImpug_0067.html).
+
 ### Sub-feature Overview
 | Tool name                                                    | Description                                                  |
 | ------------------------------------------------------------ | ------------------------------------------------------------ |
diff --git a/profiler/cluster_analyse/README.md b/profiler/cluster_analyse/README.md
index 1202542c48..f3a8b71cb9 100644
--- a/profiler/cluster_analyse/README.md
+++ b/profiler/cluster_analyse/README.md
@@ -2,7 +2,7 @@
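 cluster_analyse (the cluster analysis tool) analyzes cluster data in cluster scenarios. It currently focuses on communication-domain-based per-iteration time analysis, communication time analysis, and communication matrix analysis, in order to locate slow cards, slow nodes, and slow links.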
 
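 ## Performance Data Collection
-The cluster tuning tool currently mainly supports cluster data collected with Ascend Pytorch Profiler.
+The cluster tuning tool currently mainly supports cluster data collected with Ascend PyTorch Profiler. For the collection method, see [Profiling Data Collection](https://gitee.com/ascend/att/tree/master/profiler); this tool only needs NPU performance data collected with Ascend PyTorch Profiler.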
 
 Data of at least level L1 is required.
 ```python
diff --git a/profiler/compare_tools/README.md b/profiler/compare_tools/README.md
index c1d643fdc5..17d26d07e2 100644
--- a/profiler/compare_tools/README.md
+++ b/profiler/compare_tools/README.md
@@ -23,6 +23,8 @@ pip3 install numpy
 
 ### Performance Data Collection
 
+Before using this tool, collect GPU or NPU performance data; the tool then performs the performance comparison analysis.
+
 #### GPU Performance Data Collection
 
 Collect GPU performance data with PyTorch Profiler; see [torch.profiler](https://pytorch.org/docs/stable/profiler.html).
@@ -64,7 +66,7 @@ pytorch profiler数据目录结构如下：
 ```
 
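 #### NPU Performance Data Collection
-Collect NPU performance data with Ascend PyTorch Profiler. The collection parameters are configured the same way as for GPU; see [Ascend PyTorch Profiler Data Collection and Analysis](https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/modeldevpt/ptmigr/AImpug_0067.html).
+Collect NPU performance data with Ascend PyTorch Profiler (the counterpart of PyTorch Profiler). The collection parameters are configured the same way as for GPU; for details, see [Profiling Data Collection](https://gitee.com/ascend/att/tree/master/profiler).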
 
 In the GPU performance data collection code, replace torch.profiler with torch_npu.profiler.
 
diff --git a/profiler/merge_profiling_timeline/README.md b/profiler/merge_profiling_timeline/README.md
index 546e6c55a5..5075f6bc2f 100644
--- a/profiler/merge_profiling_timeline/README.md
+++ b/profiler/merge_profiling_timeline/README.md
@@ -7,7 +7,7 @@ merge_profiling_timeline（合并大json工具）支持合并Profiling的timelin
 
 ### Performance Data Collection
 
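-Collect performance data with msprof, then copy the collected performance data from all nodes into a single directory in the current environment; the following assumes the data is in /home/test/cann_profiling.
+Collect performance data with Ascend PyTorch Profiler or the E2E performance collection tool. E2E profiling will be deprecated and is not recommended. For the Ascend PyTorch Profiler collection method, see [Profiling Data Collection](https://gitee.com/ascend/att/tree/master/profiler). Copy the collected performance data from all nodes into a single directory in the current environment; the following assumes the data is in /home/test/cann_profiling.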
 
 An example E2E Profiling data directory structure is as follows:
 
-- 
Gitee