diff --git a/profiler/msprof_analyze/cluster_analyse/README.md b/profiler/msprof_analyze/cluster_analyse/README.md
index 4ac6f674fc3293a5a434e423ce8a748478846f74..ddac0672c1a4b6aab56410caf8c3635764ca1c97 100644
--- a/profiler/msprof_analyze/cluster_analyse/README.md
+++ b/profiler/msprof_analyze/cluster_analyse/README.md
@@ -108,8 +108,10 @@ experimental_config = torch_npu.profiler._ExperimentalConfig(
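 | ep_load_balance              | Summary analysis of MoE load information in cluster scenarios. Input performance data must be based on ascend_pytorch_profiler_{rank_id}.db files. The output deliverable cluster_analysis.db gains the EPTokensSummary and TopEPTokensInfo analysis tables. | No |
 | slow_rank                    | Summary analysis of fast and slow ranks for communication operators in cluster scenarios. Input performance data must be based on ascend_pytorch_profiler_{rank_id}.db files. The output deliverable cluster_analysis.db shows, for each rank, the number of slow-rank impacts computed by the current fast/slow rank statistics algorithm. | No |
 | mstx2commop                  | Generates communication operator information from mstx marker data in cluster scenarios. Input performance data must be based on ascend_pytorch_profiler_{rank_id}.db files. The output deliverable ascend_pytorch_profiler_{rank_id}.db gains the COMMUNICATION_OP and STRING_IDS analysis tables. | No |
-| cluster_time_summary         | Fine-grained breakdown of iteration time in cluster scenarios. See the [usage guide](../docs/cluster_time_summary.md). | No |
-| cluster_time_compare_summary | Comparison of iteration time between clusters. See the [usage guide](../docs/cluster_time_summary.md). | No |
+| cluster_time_summary         | Fine-grained breakdown of iteration time in cluster scenarios. See the [Cluster Fine-Grained Time Analysis and Comparison Guide](../docs/cluster_time_summary.md). | No |
+| cluster_time_compare_summary | Comparison of iteration time between clusters. See the [Cluster Fine-Grained Time Analysis and Comparison Guide](../docs/cluster_time_summary.md). | No |
+| p2p_pairing                  | Generates a global association index for P2P operators in cluster scenarios. Input performance data must be based on ascend_pytorch_profiler_{rank_id}.db files. The index is written as a new field, `opConnectionId`, in the `COMMUNICATION_OP` table of the original ascend_pytorch_profiler_{rank_id}.db file. | No |
+| pp_chart                     | Analyzes the marker data in instrumented ascend_pytorch_profiler_{rank_id}.db files and reconstructs the pp pipeline chart. See the [pp Pipeline Chart Collection and Analysis Guide](../docs/pp_chart.md). | No |
 | Custom analysis parameters   | Similar in function to cann_api_sum, compute_op_sum, hccl_sum, and related parameters. Users can define their own set of analysis rules for performance data, which requires a detailed understanding of the analysis rules. For development guidance, see "[Custom Analysis Rule Development Guide](#自定义分析规则开发指导)". | No |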
    
 
diff --git a/profiler/msprof_analyze/docs/img/1F1B.png b/profiler/msprof_analyze/docs/img/1F1B.png
new file mode 100644
index 0000000000000000000000000000000000000000..bf3849a328cee32ae19c811ad4a8acd7a0ca3288
Binary files /dev/null and b/profiler/msprof_analyze/docs/img/1F1B.png differ
diff --git a/profiler/msprof_analyze/docs/img/DualPipeV.png b/profiler/msprof_analyze/docs/img/DualPipeV.png
new file mode 100644
index 0000000000000000000000000000000000000000..7c33ee843bc9971b9ad8d5e8c7c766800ac320d2
Binary files /dev/null and b/profiler/msprof_analyze/docs/img/DualPipeV.png differ
diff --git a/profiler/msprof_analyze/docs/img/pp_chart_effect.png b/profiler/msprof_analyze/docs/img/pp_chart_effect.png
new file mode 100644
index 0000000000000000000000000000000000000000..fed613f39a8d5a3f6fc226501b7e4114273ad9be
Binary files /dev/null and b/profiler/msprof_analyze/docs/img/pp_chart_effect.png differ
diff --git a/profiler/msprof_analyze/docs/img/pp_chart_operation_steps.png b/profiler/msprof_analyze/docs/img/pp_chart_operation_steps.png
new file mode 100644
index 0000000000000000000000000000000000000000..64b5704cdf167c27d135c311759ef092574ba6ff
Binary files /dev/null and b/profiler/msprof_analyze/docs/img/pp_chart_operation_steps.png differ
diff --git a/profiler/msprof_analyze/docs/pp_chart.md b/profiler/msprof_analyze/docs/pp_chart.md
new file mode 100644
index 0000000000000000000000000000000000000000..244ad4bb7c3a4b7b7def90043caed9f75b5dff9c
--- /dev/null
+++ b/profiler/msprof_analyze/docs/pp_chart.md
@@ -0,0 +1,137 @@
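+# pp Pipeline Chart Collection and Analysis Guide
+
+## Introduction
+A pp pipeline chart visualizes the actual pipeline schedule inside a pp (pipeline parallel) domain, so that global communication and the key forward and backward time costs can be analyzed. For the pp parallel strategies used by transformer models, such as 1F1B and DualPipe, no visualization is currently available. This section describes how to collect pp pipeline chart data, analyze it with the msprof-analyze tool, and display it with the MindStudio Insight tool.
+
+The theoretical charts for 1F1B and DualPipeV are shown below: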
+
+![1F1B](img/1F1B.png)
+![DualPipeV](img/DualPipeV.png)
+
+
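+## Procedure
+
+To view a pp pipeline chart, follow the three steps below.
+
+### 1. Profiling data collection
+
+Forward and backward data must be collected through the mstx interface, so first locate the forward and backward functions in your code. The markers ultimately appear in the Ascend Hardware layer of the performance data timeline.
+
+Tip: if you only care about the pp pipeline chart, set the collection parameter profiler_level to Level_none. If you also care about the forward and backward passes, communication, and the association between send and recv, set profiler_level to Level1 or higher.
+
+
+**Constraints:**
+* When collecting data, set the profiling export format export_type to db.
+* The code below is only an instrumentation example. Locate the forward and backward functions in your own code precisely, then instrument them with a decorator as shown below.
+
+* If the project uses the Megatron framework, instrument it as described in Scenario 1. If the project uses the MindSpeed framework, first check whether DualPipeV is enabled; if it is, instrument it as described in Scenario 2. If you cannot tell which case applies and the project contains both instrumentation-related core files, add the corresponding instrumentation logic at the instrumentation points in both files so that all possible scenarios are covered.
+
+**Scenario 1:**
+
+1. Traditional pipeline (DualPipe not enabled): add the following code to ```megatron/core/pipeline_parallel/schedules.py```, after the definition of the ```backward_step``` function: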
+```python
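+import torch_npu
+
+
+def step_wrapper(func, msg: str):
+    """Wrap func so that each call is enclosed in an mstx range named msg."""
+    def wrapper(*args, **kwargs):
+        new_msg = {"name": msg}
+        # Open an mstx range on the current stream; the message is the stringified dict.
+        mstx_state_step_range_id = torch_npu.npu.mstx.range_start(str(new_msg), torch_npu.npu.current_stream())
+        out = func(*args, **kwargs)
+        # Close the range if one was opened.
+        if mstx_state_step_range_id is not None:
+            torch_npu.npu.mstx.range_end(mstx_state_step_range_id)
+            mstx_state_step_range_id = None
+        return out
+    return wrapper
+
+# Rebind the schedule functions so that every call is recorded.
+forward_step = step_wrapper(forward_step, "forward_step")
+backward_step = step_wrapper(backward_step, "backward_step")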
+```
+
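+2. Save the script above and run training. After training completes, a performance data directory is generated under the xxx directory for later analysis with the mstt tool.
+
+**Scenario 2:**
+
+1. DualPipeV: locate the forward and backward code, then add the following code to ```mindspeed/core/pipeline_parallel/dualpipev/dualpipev_schedules.py```, before the definition of the ```forward_backward_pipeline_with_cutinhalf``` function: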
+```python
+import torch_npu
+
+
+def step_wrapper(func, msg: str):
+    """Wrap func so that each call is enclosed in an mstx range carrying pp chart info."""
+    def wrapper(*args, **kwargs):
+        new_msg = {"name": msg}
+        # A forward step that receives extra_block_kwargs is recorded under the overlap name.
+        if msg == "forward_step_with_model_graph" and kwargs.get("extra_block_kwargs") is not None:
+            new_msg["name"] = "forward_backward_overlaping"
+        if "current_microbatch" in kwargs:
+            new_msg["current_microbatch"] = kwargs["current_microbatch"]
+        # Skip the marker when WeightGradStore has nothing cached to pop.
+        # The string must match the name passed to step_wrapper below.
+        if msg == "WeightGradStore.pop" and len(WeightGradStore.cache) == 0:
+            mstx_state_step_range_id = None
+        else:
+            mstx_state_step_range_id = torch_npu.npu.mstx.range_start(str(new_msg), torch_npu.npu.current_stream())
+        out = func(*args, **kwargs)
+        # Close the range if one was opened.
+        if mstx_state_step_range_id is not None:
+            torch_npu.npu.mstx.range_end(mstx_state_step_range_id)
+            mstx_state_step_range_id = None
+        return out
+    return wrapper
+
+# Rebind the schedule functions so that every call is recorded.
+forward_step_with_model_graph = step_wrapper(forward_step_with_model_graph, "forward_step_with_model_graph")
+forward_step_no_model_graph = step_wrapper(forward_step_no_model_graph, "forward_step_no_model_graph")
+backward_step_with_model_graph = step_wrapper(backward_step_with_model_graph, "backward_step_with_model_graph")
+backward_step = step_wrapper(backward_step, "backward_step")
+WeightGradStore.pop = step_wrapper(WeightGradStore.pop, "WeightGradStore.pop")
+```
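+
+The wrappers above pass `str(new_msg)` as the marker message, so each marker carries the Python `repr` of a dict rather than JSON. The following is a minimal sketch of what that string looks like and one way to read it back. `build_marker_message` is a hypothetical helper written for this illustration, and using `ast.literal_eval` is an assumption; how msprof-analyze itself parses the message is not described here.
+
+```python
+import ast
+
+
+def build_marker_message(name, current_microbatch=None):
+    """Hypothetical helper: build the marker string as str() of a dict, like the wrapper does."""
+    msg = {"name": name}
+    if current_microbatch is not None:
+        msg["current_microbatch"] = current_microbatch
+    return str(msg)
+
+
+text = build_marker_message("forward_step_no_model_graph", current_microbatch=3)
+print(text)
+
+# The string uses single quotes, so json.loads would reject it;
+# ast.literal_eval safely evaluates the dict literal instead.
+parsed = ast.literal_eval(text)
+print(parsed["name"], parsed["current_microbatch"])
+```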
+
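+In addition, when collecting profiling data with MindSpeed but without MindSpeed-LLM, add the following metadata code after the prof definition (```prof = torch_npu.profiler.profile(...)```):
+```python
+import json  # needed for json.dumps below; place it with the file's other imports
+
+prof.add_metadata('pp_info', json.dumps(
+    {
+        'pp_type': 'dualpipev',
+        'microbatch_num': 10,
+    }
+))
+# Replace 10 with the actual value, computed as:
+# microbatch_num = global_batch_size // micro_batch_size // data_parallel_size
+```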
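+As a worked example of the `microbatch_num` formula, the sketch below computes the value from the three batch settings. `compute_microbatch_num` is a hypothetical helper name, and the divisibility check is an added assumption rather than something the formula itself requires.
+
+```python
+def compute_microbatch_num(global_batch_size, micro_batch_size, data_parallel_size):
+    """Hypothetical helper: microbatch_num = global_batch_size // micro_batch_size // data_parallel_size."""
+    samples_per_microstep = micro_batch_size * data_parallel_size
+    # Assumption: reject settings that do not divide evenly instead of silently flooring.
+    if global_batch_size % samples_per_microstep != 0:
+        raise ValueError(
+            "global_batch_size must be divisible by micro_batch_size * data_parallel_size"
+        )
+    return global_batch_size // micro_batch_size // data_parallel_size
+
+
+# 160 samples per step, 2 per micro batch, 8 data-parallel replicas.
+microbatch_num = compute_microbatch_num(160, 2, 8)
+print(microbatch_num)
+```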
+If you use MindSpeed-LLM, add the metadata code in ```mindspeed-llm/training/training.py```, after ```prof.add_metadata_json('distributed_args'...)```:
+```python
+# Requires `import json` if the file does not already import it.
+prof.add_metadata('pp_info', json.dumps(
+    {
+        'pp_type': args.schedules_method,
+        'microbatch_num': args.global_batch_size // args.micro_batch_size // args.data_parallel_size
+    }
+))
+```
+
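+2. Save the script above and run training. After training completes, an xxx_ascend_pt performance data directory is generated under the xxx directory for later analysis with the mstt tool.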
+
+### 2. Analysis with msprof-analyze
+
+**Command line:**
+```
+msprof-analyze cluster -m pp_chart -d ./cluster_data
+```
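+**Parameters:**  
+* `-d`: path to the cluster data collected after the instrumentation in step 1
+* Other parameters: the same as those supported by the cluster analysis feature; see the [parameter list](../cluster_analyse/README.md)  
+
+**Output data:**  
+* Location: a new table, StepTaskInfo, is added to each rank's ASCEND_PROFILER_OUTPUT/ascend_pytorch_profiler_{rank_id}.db
+* Table name: StepTaskInfo
+
+You do not need to understand the fields of this table; MindStudio Insight can display it directly.
+
+**Field description:**
+
+| Field | Type | Meaning |
+| ------ | ---- | ---- |
+| name    | TEXT    | Forward/backward information; the name shown on the color block in the pp pipeline chart |
+| startNs | INTEGER | Start time of the forward/backward task on the device |
+| endNs   | INTEGER | End time of the forward/backward task on the device |
+| type    | INTEGER | Type; different types are shown in different colors |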
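+
+Although MindStudio Insight displays this table directly, it can also be inspected by hand. The sketch below reads StepTaskInfo with Python's standard `sqlite3` module and derives a per-task duration from the fields listed above. It runs against an in-memory database filled with made-up rows (the names, timestamps, and `type` values are illustrative assumptions only); for real data, connect to a rank's ascend_pytorch_profiler_{rank_id}.db instead. `load_step_tasks` is a hypothetical helper name.
+
+```python
+import sqlite3
+
+
+def load_step_tasks(conn):
+    """Hypothetical helper: return StepTaskInfo rows ordered by start time, with durations."""
+    rows = conn.execute(
+        "SELECT name, startNs, endNs, type FROM StepTaskInfo ORDER BY startNs"
+    ).fetchall()
+    return [
+        {"name": name, "start_ns": start, "end_ns": end, "duration_ns": end - start, "type": task_type}
+        for name, start, end, task_type in rows
+    ]
+
+
+# Stand-in database with the documented columns and made-up rows.
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE StepTaskInfo (name TEXT, startNs INTEGER, endNs INTEGER, type INTEGER)")
+conn.executemany(
+    "INSERT INTO StepTaskInfo VALUES (?, ?, ?, ?)",
+    [("backward_step", 2000, 3500, 1), ("forward_step", 1000, 1800, 0)],
+)
+
+tasks = load_step_tasks(conn)
+for task in tasks:
+    print(task["name"], task["duration_ns"])
+```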
+
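+### 3. Display in MindStudio Insight
+For detailed installation and usage instructions for MindStudio Insight, see the [*MindStudio Insight User Guide*](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html).
+
+Import the performance data analyzed by the mstt tool into MindStudio Insight, click generate on the Summary page, and configure it as shown in the screenshot below:
+
+![Operation steps](img/pp_chart_operation_steps.png)
+
+The following shows the result after pp_chart completes the pp pipeline chart analysis:
+
+![Display result](img/pp_chart_effect.png)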
\ No newline at end of file