diff --git a/tutorials/source_en/debug/profiler.md b/tutorials/source_en/debug/profiler.md
index 6db35690ce773ea2432650794cb5a0281e963758..e265f2d6aa3a77f70b76964d0b18fda8e9301d5c 100644
--- a/tutorials/source_en/debug/profiler.md
+++ b/tutorials/source_en/debug/profiler.md
@@ -22,10 +22,66 @@ There are four ways to collect training performance data, and the following desc
 ### Method 1: mindspore.Profiler Interface Enabling
 
-Add the MindSpore Profiler related interfaces in the training script, see [MindSpore Profiler parameter details](https://www.mindspore.cn/docs/en/r2.6.0rc1/api_python/mindspore/mindspore.Profiler.html) for details.
+Add the MindSpore Profiler related interfaces in the training script. Users can refer to [MindSpore Profiler parameter details](https://www.mindspore.cn/docs/en/r2.6.0rc1/api_python/mindspore/mindspore.Profiler.html) and [_ExperimentalConfig parameter details](https://www.mindspore.cn/docs/en/r2.6.0rc1/api_python/mindspore/mindspore.profiler._ExperimentalConfig.html) to configure parameters such as profiler_level according to their data requirements.
 
-The interface supports two collection modes: CallBack mode and custom for loop mode, and supports both Graph and PyNative modes.
+The interface supports two collection modes: custom for loop mode and CallBack mode, and supports both Graph and PyNative modes.
 
+#### Example Collection in a Custom for Loop Mode
+
+In custom for loop mode, users can enable the Profiler by configuring the schedule parameter.
+
+A sample is as follows:
+
+```python
+import mindspore
+from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, HostSystem
+
+# Define the number of training steps
+steps = 15
+
+# Define the training network
+net = Net()
+
+# Configure the extensible parameters
+experimental_config = mindspore.profiler._ExperimentalConfig(
+    profiler_level=ProfilerLevel.Level0,
+    aic_metrics=AicoreMetrics.AiCoreNone,
+    l2_cache=False,
+    mstx=False,
+    data_simplification=False,
+    host_sys=[HostSystem.CPU, HostSystem.MEM]
+)
+
+# Initialize the profiler
+with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
+                                schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=1,
+                                                                     repeat=1, skip_first=0),
+                                # Online parsing can be disabled by setting analyse_flag=False in tensorboard_trace_handler
+                                on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
+                                profile_memory=False,
+                                experimental_config=experimental_config) as prof:
+    for step in range(steps):
+        train(net)
+        # Call step to mark the step boundary for collection
+        prof.step()
+```
+
+- schedule: After schedule is enabled, the kernel_details.csv file written to disk contains a Step ID column. According to the schedule configuration in this sample, skip_first skips 0 steps, wait waits for 0 steps, and warmup warms up for 0 steps. Since active is 1, collection starts from step 0 and lasts for 1 step. The Step ID is therefore 0, indicating that the 0th step is collected.
+- on_trace_ready: The directory where the profiler data is written is specified through the tensorboard_trace_handler parameter of on_trace_ready. tensorboard_trace_handler parses the performance data by default. If the user does not configure tensorboard_trace_handler, the data is written by default to the '/data' folder at the same level as the current script and can be parsed with the off-line parsing function; see [Method 4: Off-line Parsing](https://www.mindspore.cn/tutorials/en/r2.6.0rc1/debug/profiler.html#method-4-off-line-parsing).
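+
+The mapping from a schedule configuration to the collected Step IDs can be reproduced with a few lines of plain Python. The helper below is only an illustrative sketch of the arithmetic described above; it is not part of the MindSpore API and, for simplicity, assumes repeat > 0:
+
+```python
+# Illustrative helper (not a MindSpore interface): enumerate the global step
+# indices that fall into the active phase of each repeat.
+def collected_steps(total_steps, skip_first, wait, warmup, active, repeat):
+    cycle = wait + warmup + active
+    collected = []
+    for r in range(repeat):
+        start = skip_first + r * cycle + wait + warmup
+        collected.extend(s for s in range(start, start + active) if s < total_steps)
+    return collected
+
+# The sample above collects only step 0:
+print(collected_steps(15, skip_first=0, wait=0, warmup=0, active=1, repeat=1))     # [0]
+# The worked example discussed below collects steps 25-29 and 45-49:
+print(collected_steps(100, skip_first=10, wait=10, warmup=5, active=5, repeat=2))
+```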
+
+For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/r2.6.0rc1/docs/sample_code/profiler/for_loop_profiler.py).
+
+**The principle of configuring the schedule parameters is as follows:**
+
+As illustrated in the following figure, schedule has 5 configurable parameters: skip_first, wait, warmup, active, and repeat. skip_first indicates that the first skip_first steps are skipped; wait represents the waiting phase, which skips wait steps; warmup represents the warm-up phase, which skips warmup steps; active indicates that active steps are collected; repeat indicates the number of repetitions, where one repeat consists of wait+warmup+active steps. After all steps in a repeat are executed, the callback function configured via on_trace_ready is executed to parse the performance data. For detailed descriptions of each parameter, please refer to the [schedule API](https://www.mindspore.cn/docs/en/r2.6.0rc1/api_python/mindspore/mindspore.profiler.schedule.html).
+
+![schedule.png](../../source_zh_cn/debug/images/schedule.png)
+
+For example: if there are 100 steps (0-99) in model training and the schedule is configured as `schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2)`, the Profiler will first skip the first 10 steps (0-9). Starting from step 10, the first repeat will wait for 10 steps (10-19), warm up for 5 steps (20-24), and finally collect performance data for 5 steps (25-29). The second repeat will again wait for 10 steps (30-39), warm up for 5 steps (40-44), and finally collect performance data for 5 steps (45-49).
+
+> - In single-card scenarios, the Profiler generates multiple performance data files in the same directory based on the repeat count. Each repeat corresponds to a folder containing the performance data collected from all active steps in that repeat. In multi-card scenarios, each card generates performance data independently, and the data of each card is divided into multiple parts based on the repeat count. When repeat is configured as 0, the actual number of repetitions is determined by the total number of steps: the wait-warmup-active cycle repeats until all steps are completed.
+> - The schedule needs to be used together with the [mindspore.profiler.profile.step](https://www.mindspore.cn/docs/en/r2.6.0rc1/api_python/mindspore/mindspore.profiler.profile.html#mindspore.profiler.profile.step) interface. If schedule is configured but mindspore.profiler.profile.step is not called to collect data, all collected data belongs to step 0. In that case, performance data files are generated only if step 0 falls into the active phase (that is, wait, warmup, and skip_first are all set to 0).
+
 #### CallBack Mode Collection Example
 
 ```python
@@ -58,49 +114,6 @@ class StopAtStep(mindspore.Callback):
 
 For the complete case, refer to [CallBack mode collection complete code example](https://gitee.com/mindspore/docs/blob/r2.6.0rc1/docs/sample_code/profiler/call_back_profiler.py).
 
-#### Example Collection in a Custom for Loop Mode
-
-In custom for loop mode, users can enable Profiler through setting schedule and on_trace_ready parameters.
-
-For example, if you want to collect the performance data of the first two steps, you can use the following configuration to collect.
-
-Sample as follows:
-
-```python
-import mindspore
-from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics
-
-# Define model training times
-steps = 15
-
-# Define the training model network
-net = Net()
-
-# Configure the extensibility parameters
-experimental_config = mindspore.profiler._ExperimentalConfig(
-    profiler_level=ProfilerLevel.Level0,
-    aic_metrics=AicoreMetrics.AiCoreNone,
-    l2_cache=False,
-    mstx=False,
-    data_simplification=False)
-
-# Initialize profile
-with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
-                                schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
-                                                                     repeat=1, skip_first=2),
-                                on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
-                                profile_memory=False,
-                                experimental_config=experimental_config) as prof:
-    for step in range(steps):
-        train(net)
-        # Call step collection
-        prof.step()
-```
-
-After the function is enabled, kernel_details.csv in disk drive data contains a column of Step ID information. According to the schedule configuration, skip_first skips 2 steps, wait 1 step, warmup 1 step, and collection starts from the 4th step. Then the fourth and fifth steps are collected, so the Step ID is 4 and 5, indicating that the fourth and fifth steps are collected.
-
-For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/r2.6.0rc1/docs/sample_code/profiler/for_loop_profiler.py).
-
 ### Method 2: Dynamic Profiler Enabling
 
 During training, if users want to modify the configuration file and complete the collection task under the new configuration without interrupting the training process, they can use the mindspore.profiler.DynamicProfilerMonitor interface. This interface requires a JSON configuration file, which must be named "profiler_config.json"; if it is not configured, a default JSON configuration file is generated.
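+
+As a rough sketch of how this interface is typically wired into a training loop (cfg_path, output_path, and the step() call are assumed here based on the DynamicProfilerMonitor API description; Net and train are the same placeholders as in the examples above):
+
+```python
+from mindspore.profiler import DynamicProfilerMonitor
+
+# Placeholders as in the earlier samples
+net = Net()
+
+# cfg_path is the directory that holds profiler_config.json (assumed usage);
+# output_path is where the collected performance data is written
+profiler = DynamicProfilerMonitor(cfg_path="./cfg_dir", output_path="./data")
+
+for step in range(15):
+    train(net)
+    # Mark the step boundary; the monitor polls the JSON config during training,
+    # so collection can be reconfigured without interrupting the training process
+    profiler.step()
+```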
diff --git a/tutorials/source_zh_cn/debug/images/schedule.png b/tutorials/source_zh_cn/debug/images/schedule.png
new file mode 100644
index 0000000000000000000000000000000000000000..e2b8817f811cfe1f52a4fe56fc2e2d4097967238
Binary files /dev/null and b/tutorials/source_zh_cn/debug/images/schedule.png differ
diff --git a/tutorials/source_zh_cn/debug/profiler.md b/tutorials/source_zh_cn/debug/profiler.md
index c7740504b278464ee2da36b0ebc008ad9f273583..be8c4d9d51c27b5b9ca729d726cf40a9f90b2287 100644
--- a/tutorials/source_zh_cn/debug/profiler.md
+++ b/tutorials/source_zh_cn/debug/profiler.md
@@ -22,9 +22,67 @@
 ### Method 1: Enabling via the mindspore.profiler.profile Interface
 
-Add the MindSpore profile related interfaces in the training script; for a detailed introduction to the profile interface, refer to [MindSpore profile parameter details](https://www.mindspore.cn/docs/zh-CN/r2.6.0rc1/api_python/mindspore/mindspore.profiler.profile.html).
+Add the MindSpore profile related interfaces in the training script. Users can refer to [MindSpore profile parameter details](https://www.mindspore.cn/docs/zh-CN/r2.6.0rc1/api_python/mindspore/mindspore.profiler.profile.html) and [_ExperimentalConfig extensible parameter details](https://www.mindspore.cn/docs/zh-CN/r2.6.0rc1/api_python/mindspore/mindspore.profiler._ExperimentalConfig.html) to configure parameters such as the performance data collection level according to their data requirements.
 
-The interface supports two collection modes: CallBack mode and custom for loop mode, and supports both Graph and PyNative modes.
+The interface supports two collection modes: custom for loop mode and CallBack mode, and supports both Graph and PyNative modes.
+
+#### Example Collection in a Custom for Loop Mode
+
+In custom for loop mode, users can enable the Profiler by configuring the schedule parameter.
+
+A sample is as follows:
+
+```python
+import mindspore
+from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, HostSystem
+
+# Define the number of training steps
+steps = 15
+
+# Define the training network
+net = Net()
+
+# Configure the extensible parameters
+experimental_config = mindspore.profiler._ExperimentalConfig(
+    profiler_level=ProfilerLevel.Level0,
+    aic_metrics=AicoreMetrics.AiCoreNone,
+    l2_cache=False,
+    mstx=False,
+    data_simplification=False,
+    host_sys=[HostSystem.CPU, HostSystem.MEM]
+)
+
+# Initialize the profiler
+with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
+                                schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=1,
+                                                                     repeat=1, skip_first=0),
+                                # Online parsing can be disabled by setting analyse_flag=False in tensorboard_trace_handler
+                                on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
+                                profile_memory=False,
+                                experimental_config=experimental_config) as prof:
+    for step in range(steps):
+        train(net)
+        # Call step to mark the step boundary for collection
+        prof.step()
+```
+
+- schedule: After schedule is enabled, the kernel_details.csv file written to disk contains a Step ID column. According to the schedule configuration in this sample, skip_first skips 0 steps, wait waits for 0 steps, and warmup warms up for 0 steps. Since active is 1, collection starts from step 0 and lasts for 1 step. The Step ID is therefore 0, indicating that the 0th step is collected.
+- on_trace_ready: The directory where the profiler data is written is specified through the tensorboard_trace_handler parameter of on_trace_ready. tensorboard_trace_handler parses the performance data by default. If the user does not configure tensorboard_trace_handler, the data is written by default to the '/data' folder at the same level as the current script and can be parsed with the off-line parsing function; see [Method 4: Off-line Parsing](https://www.mindspore.cn/tutorials/zh-CN/r2.6.0rc1/debug/profiler.html#%E6%96%B9%E5%BC%8F%E5%9B%9B-%E7%A6%BB%E7%BA%BF%E8%A7%A3%E6%9E%90).
+
+For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/r2.6.0rc1/docs/sample_code/profiler/for_loop_profiler.py).
+
+**The principle of configuring the schedule parameters is as follows:**
+
+As illustrated in the following figure, schedule has 5 configurable parameters: skip_first, wait, warmup, active, and repeat. skip_first indicates that the first skip_first steps are skipped; wait represents the waiting phase, which skips wait steps; warmup represents the warm-up phase, which skips warmup steps; active indicates that active steps are collected; repeat indicates the number of repetitions, where one repeat consists of wait+warmup+active steps. After all steps in a repeat are executed, the callback function configured via on_trace_ready is executed to parse the performance data. For detailed descriptions of each parameter, please refer to the [schedule API documentation](https://www.mindspore.cn/docs/zh-CN/r2.6.0rc1/api_python/mindspore/mindspore.profiler.schedule.html).
+
+![schedule.png](./images/schedule.png)
+
+For example: if there are 100 steps (0-99) in model training and the schedule is configured as `schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2)`, the Profiler will first skip the first 10 steps (0-9).
+Starting from step 10, the first repeat will wait for 10 steps (10-19), warm up for 5 steps (20-24), and finally collect performance data for 5 steps (25-29). The second repeat will again wait for 10 steps (30-39), warm up for 5 steps (40-44), and finally collect performance data for 5 steps (45-49).
+
+> - In single-card scenarios, the Profiler generates multiple performance data files in the same directory based on the repeat count. Each repeat corresponds to a folder containing the performance data collected from all active steps in that repeat. In multi-card scenarios, each card generates performance data independently, and the data of each card is divided into multiple parts based on the repeat count. When repeat is configured as 0, the actual number of repetitions is determined by the total number of steps: the wait-warmup-active cycle repeats until all steps are completed.
+> - The schedule needs to be used together with the [mindspore.profiler.profile.step](https://www.mindspore.cn/docs/zh-CN/r2.6.0rc1/api_python/mindspore/mindspore.profiler.profile.html#mindspore.profiler.profile.step) interface. If schedule is configured but mindspore.profiler.profile.step is not called to collect data, all data in the profiler collection interval belongs to step 0. In that case, performance data files are generated only if step 0 falls into the active phase (that is, wait, warmup, and skip_first are all set to 0).
 
 #### CallBack Mode Collection Example
 
 ```python
@@ -58,49 +116,6 @@ class StopAtStep(mindspore.Callback):
 
 For the complete case, refer to [CallBack mode collection complete code example](https://gitee.com/mindspore/docs/blob/r2.6.0rc1/docs/sample_code/profiler/call_back_profiler.py).
 
-#### Example Collection in a Custom for Loop Mode
-
-In custom for loop mode, users can enable the Profiler by setting the schedule and on_trace_ready parameters.
-
-For example, if a user wants to collect the performance data of the first two steps, the following schedule configuration can be used.
-
-A sample is as follows:
-
-```python
-import mindspore
-from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics
-
-# Define the number of training steps
-steps = 15
-
-# Define the training network
-net = Net()
-
-# Configure the extensible parameters
-experimental_config = mindspore.profiler._ExperimentalConfig(
-    profiler_level=ProfilerLevel.Level0,
-    aic_metrics=AicoreMetrics.AiCoreNone,
-    l2_cache=False,
-    mstx=False,
-    data_simplification=False)
-
-# Initialize the profiler
-with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
-                                schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
-                                                                     repeat=1, skip_first=2),
-                                on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
-                                profile_memory=False,
-                                experimental_config=experimental_config) as prof:
-    for step in range(steps):
-        train(net)
-        # Call step collection
-        prof.step()
-```
-
-After it is enabled, kernel_details.csv in the data written to disk contains a Step ID column. According to the schedule configuration, skip_first skips 2 steps, wait waits for 1 step, and warmup warms up for 1 step, so collection starts from the 4th step. Since active is 2, steps 4 and 5 are collected; the Step IDs are therefore 4 and 5, indicating that the 4th and 5th steps are collected.
-
-For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/r2.6.0rc1/docs/sample_code/profiler/for_loop_profiler.py).
-
 ### Method 2: Dynamic Profiler Enabling
 
 During training, if users want to modify the configuration file and complete the collection task under the new configuration without interrupting the training process, they can use the mindspore.profiler.DynamicProfilerMonitor interface. This interface requires a JSON configuration file, which must be named "profiler_config.json"; if it is not configured, a default JSON configuration file is generated.
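+
+For orientation, the following sketch writes a minimal profiler_config.json. The start_step and stop_step fields shown here are assumptions based on the DynamicProfilerMonitor documentation; the default file generated by the interface is the authoritative reference for the full set of options:
+
+```python
+import json
+
+# Illustrative only: start_step/stop_step delimit the collection interval
+# (assumed field names; see the generated default profiler_config.json).
+config = {"start_step": 2, "stop_step": 5}
+with open("./cfg_dir/profiler_config.json", "w") as f:
+    json.dump(config, f, indent=4)
+```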