diff --git a/docs/sample_code/profiler/for_loop_profiler.py b/docs/sample_code/profiler/for_loop_profiler.py
index bacdaabc353a4f684c16fbd39e580ec62fd6d758..447034a831aa558723ccfe758c703080d7ae6161 100644
--- a/docs/sample_code/profiler/for_loop_profiler.py
+++ b/docs/sample_code/profiler/for_loop_profiler.py
@@ -64,7 +64,7 @@ if __name__ == "__main__":
     with mindspore.profiler.profile(
         activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
         schedule=mindspore.profiler.schedule(
-            wait=1, warmup=1, active=2, repeat=1, skip_first=2
+            wait=0, warmup=0, active=1, repeat=1, skip_first=0
         ),
         on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
         profile_memory=False,
diff --git a/tutorials/source_en/debug/profiler.md b/tutorials/source_en/debug/profiler.md
index a6ddd557171afea04fb6b31ddbf461cae0a4edb7..2a41ac285967271c3b78ec0ad0236086959de2d8 100644
--- a/tutorials/source_en/debug/profiler.md
+++ b/tutorials/source_en/debug/profiler.md
@@ -62,7 +62,18 @@ For the complete case, refer to [CallBack mode collection complete code example]
 
 In custom for loop mode, users can enable Profiler through setting schedule and on_trace_ready parameters.
 
-For example, if you want to collect the performance data of the first two steps, you can use the following configuration to collect.
+Five parameters can be configured in the schedule, namely skip_first,
+wait, warmup, active, and repeat. Here, skip_first means that the first skip_first steps are skipped; wait is the waiting phase,
+which skips wait steps; warmup is the warm-up phase, which skips warmup steps; active means that performance data is collected for active steps;
+repeat is the number of times the cycle is repeated. One repeat consists of wait + warmup + active steps.
+After all the steps within a repeat have been executed, the performance data is parsed by the callback function configured through on_trace_ready.
+
+For example, suppose the model training consists of 100 steps and the schedule is configured as schedule = schedule(skip_first=10,
+wait=10, warmup=5, active=5, repeat=2), meaning that the first 10 steps are skipped.
+Starting from the 11th step, the first repeat waits for 10 steps, warms up for 5 steps,
+and finally collects the performance data of the 26th to 30th steps (5 steps in total).
+The second repeat then waits for another 10 steps, warms up for 5 steps, and finally collects the
+performance data of the 46th to 50th steps (5 steps in total).
 
 Sample as follows:
 
@@ -86,8 +97,8 @@ experimental_config = mindspore.profiler._ExperimentalConfig(
 
 # Initialize profile
 with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
-                                schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
-                                                                     repeat=1, skip_first=2),
+                                schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=1,
+                                                                     repeat=1, skip_first=0),
                                 on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
                                 profile_memory=False,
                                 experimental_config=experimental_config) as prof:
@@ -97,7 +108,9 @@ with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivi
         prof.step()
 ```
 
-After the function is enabled, kernel_details.csv in disk drive data contains a column of Step ID information. According to the schedule configuration, skip_first skips 2 steps, wait 1 step, warmup 1 step, and collection starts from the 4th step. Then the fourth and fifth steps are collected, so the Step ID is 4 and 5, indicating that the fourth and fifth steps are collected.
+After the function is enabled, the kernel_details.csv file in the saved data contains a Step ID column. According to the schedule configuration, skip_first skips 0 steps, wait waits 0 steps, and warmup warms up 0 steps. Since active is 1, collection starts from step 0 and 1 step is collected. The Step ID is therefore 0, indicating that step 0 is collected.
+
+> The on-disk output path of the profiler data is specified through the tensorboard_trace_handler parameter of on_trace_ready. tensorboard_trace_handler parses the performance data by default. If the user does not configure tensorboard_trace_handler, the data is written by default to the '/data' folder in the directory at the same level as the current script; it can then be parsed with the off-line parsing function, which is described in [Method 4: Off-line Parsing](https://www.mindspore.cn/tutorials/en/master/debug/profiler.html#method-4-off-line-parsing).
 
 For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/for_loop_profiler.py).
diff --git a/tutorials/source_zh_cn/debug/images/schedule.png b/tutorials/source_zh_cn/debug/images/schedule.png
new file mode 100644
index 0000000000000000000000000000000000000000..ddcc5c57e2dd5d8c2676a624c4b980e4afadec89
Binary files /dev/null and b/tutorials/source_zh_cn/debug/images/schedule.png differ
diff --git a/tutorials/source_zh_cn/debug/profiler.md b/tutorials/source_zh_cn/debug/profiler.md
index c718b434a7c7db2e0e561a7b68d3ea39ed460970..11f2ae778e730713b6710f1763e9488c02aa88e0 100644
--- a/tutorials/source_zh_cn/debug/profiler.md
+++ b/tutorials/source_zh_cn/debug/profiler.md
@@ -62,7 +62,15 @@ class StopAtStep(mindspore.Callback):
 
 In custom for loop mode, users can enable Profiler by setting the schedule and on_trace_ready parameters.
 
-For example, if you want to collect the performance data of the first two steps, you can use the schedule configured as follows.
+As shown in the figure below, five parameters can be configured in the schedule: skip_first, wait, warmup, active, and repeat. Here, skip_first means that the first skip_first steps are skipped; wait is the waiting phase,
+which skips wait steps; warmup is the warm-up phase, which skips warmup steps; active means that active steps are collected; repeat is the number of times the cycle is repeated. One repeat consists of wait + warmup + active steps.
+After all the steps within a repeat have been executed, the callback function configured through on_trace_ready is executed to parse the performance data.
+
+![schedule.png](./images/schedule.png)
+
+For example, suppose the model training consists of 100 steps and the schedule is configured as schedule = schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2), meaning that the first 10 steps are skipped.
+Starting from the 11th step, the first repeat waits for 10 steps, warms up for 5 steps, and finally collects the performance data of the 26th to 30th steps (5 steps in total).
+The second repeat then waits for another 10 steps, warms up for 5 steps, and finally collects the performance data of the 46th to 50th steps (5 steps in total).
 
 Sample as follows:
 
@@ -86,8 +94,8 @@ experimental_config = mindspore.profiler._ExperimentalConfig(
 
 # Initialize profile
 with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
-                                schedule=mindspore.profiler.schedule(wait=1, warmup=1, active=2,
-                                                                     repeat=1, skip_first=2),
+                                schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=1,
+                                                                     repeat=1, skip_first=0),
                                 on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
                                 profile_memory=False,
                                 experimental_config=experimental_config) as prof:
@@ -97,7 +105,9 @@ with mindspore.profiler.profile(activities=[ProfilerActivi
         prof.step()
 ```
 
-After the function is enabled, the kernel_details.csv file in the saved data contains a Step ID column. According to the schedule configuration, skip_first skips 2 steps, wait waits 1 step, warmup warms up 1 step, and collection starts from the 4th step; since active is 2, the 4th and 5th steps are collected, so the Step ID is 4 and 5, indicating that the 4th and 5th steps are collected.
+After the function is enabled, the kernel_details.csv file in the saved data contains a Step ID column. According to the schedule configuration, skip_first skips 0 steps, wait waits 0 steps, and warmup warms up 0 steps. Since active is 1, collection starts from step 0 and 1 step is collected. The Step ID is therefore 0, indicating that step 0 is collected.
+
+> The on-disk output path of the profiler data is specified through the tensorboard_trace_handler parameter of on_trace_ready. tensorboard_trace_handler parses the performance data by default. If the user does not configure tensorboard_trace_handler, the data is written by default to the '/data' folder in the directory at the same level as the current script; it can then be parsed with the off-line parsing function, which is described in [Method 4: Off-line Parsing](https://www.mindspore.cn/tutorials/zh-CN/master/debug/profiler.html#%E6%96%B9%E5%BC%8F%E5%9B%9B-%E7%A6%BB%E7%BA%BF%E8%A7%A3%E6%9E%90).
 
 For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/for_loop_profiler.py).
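To make the schedule arithmetic in the new text easy to verify, here is a minimal standalone sketch in plain Python. It is not part of the MindSpore API; the helper name `collected_steps` and its signature are illustrative only. It maps a schedule configuration to the 1-based step ordinals whose performance data is collected:

```python
# Sketch of the schedule semantics described in the docs above; the helper
# name and structure are illustrative, not MindSpore code.

def collected_steps(skip_first, wait, warmup, active, repeat, total_steps):
    """Return the 1-based step ordinals collected by the given schedule."""
    cycle = wait + warmup + active  # one repeat = wait + warmup + active steps
    steps = []
    for r in range(repeat):
        # Steps consumed before this repeat's active phase: the initial skip,
        # the completed cycles, and this cycle's wait and warmup phases.
        start = skip_first + r * cycle + wait + warmup
        steps.extend(range(start + 1, start + active + 1))
    return [s for s in steps if s <= total_steps]

# The example from the text: 100 training steps with
# schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2).
print(collected_steps(skip_first=10, wait=10, warmup=5, active=5, repeat=2,
                      total_steps=100))
# [26, 27, 28, 29, 30, 46, 47, 48, 49, 50]
```

Run as-is, it prints the 26th to 30th and 46th to 50th steps, matching the worked example. With the configuration used in the updated samples (wait=0, warmup=0, active=1, repeat=1, skip_first=0), the same arithmetic yields only the very first step, consistent with the single Step ID of 0 reported in kernel_details.csv (the Step ID column counts from 0).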