diff --git a/tutorials/source_en/debug/profiler.md b/tutorials/source_en/debug/profiler.md
index b9cf26b15dbb478ed669c3b4784e3b61922c012f..7834b2c26331666161a1822ead570ae2df98a900 100644
--- a/tutorials/source_en/debug/profiler.md
+++ b/tutorials/source_en/debug/profiler.md
@@ -26,54 +26,9 @@ Add the MindSpore Profiler related interfaces in the training script, see [MindS
 
-The interface supports two collection modes: CallBack mode and custom for loop mode, and supports both Graph and PyNative modes.
+The interface supports two collection modes: custom for loop mode and CallBack mode, and supports both Graph and PyNative modes.
 
-#### CallBack Mode Collection Example
-
-```python
-import mindspore
-
-class StopAtStep(mindspore.Callback):
-    def __init__(self, start_step, stop_step):
-        super(StopAtStep, self).__init__()
-        self.start_step = start_step
-        self.stop_step = stop_step
-        experimental_config = mindspore.profiler._ExperimentalConfig()
-        self.profiler = mindspore.profiler.profile(start_profile=False, experimental_config=experimental_config,
-                                                   schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=self.stop_step - self.start_step + 1, repeat=1, skip_first=0),
-                                                   on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"))
-
-    def on_train_step_begin(self, run_context):
-        cb_params = run_context.original_args()
-        step_num = cb_params.cur_step_num
-        if step_num == self.start_step:
-            self.profiler.start()
-
-    def on_train_step_end(self, run_context):
-        cb_params = run_context.original_args()
-        step_num = cb_params.cur_step_num
-        if self.start_step <= step_num <= self.stop_step:
-            self.profiler.step()
-        if step_num == self.stop_step:
-            self.profiler.stop()
-```
-
-For the complete case, refer to [CallBack mode collection complete code example](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/call_back_profiler.py).
-
 #### Example Collection in a Custom for Loop Mode
 
-In custom for loop mode, users can enable Profiler through setting schedule and on_trace_ready parameters.
-
-There are five parameters that can be configured in the schedule, namely: skip_first,
-wait, warmup, active, and repeat. Here, "skip_first" indicates skipping the previous "skip_first" steps; "wait" indicates the waiting stage.
-Skip the wait steps; "warmup" indicates the preheating stage. Skip the warmup steps. "active" indicates collecting active steps;
-"repeat" indicates the number of times the execution is repeated. Among them, one repeat includes wait+warmup+active steps.
-After all the steps within a repeat have been executed, the performance data will be parsed by the callback function configured through on_strace_ready.
-
-For example: The model training consists of 100 steps. The schedule is configured as schedule = schedule(skip_first=10,
-wait=10, warmup=5, active=5, repeat=2), indicating that the first 10 steps are skipped.
-Starting from the 11th step, in the first repeat, 10 steps will be waited for, 5 steps of preheating will be executed,
-and finally the performance data of a total of 5 steps from the 26th to the 30th will be collected.
-In the second repeat, it will continue to wait for 10 steps, perform 5 steps of preheating, and finally collect the
-performance data of a total of 5 steps from step 46 to step 50.
+In custom for loop mode, users can enable Profiler by configuring the schedule parameter.
 
 Sample as follows:
 
@@ -109,12 +64,93 @@ with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivi
         prof.step()
 ```
 
-After the function is enabled, kernel_details.csv in disk drive data contains a column of Step ID information. According to the schedule configuration, skip_first skips 0 steps, wait 0 step, warmup 0 step, and collection starts from the step of 0. Then the step of 0 are collected, so the Step ID is 0, indicating that step of 0 are collected.
-
-> The disk loading path of profiler is specified through the tensorboard_trace_handler parameter of on_trace_ready. tensorboard_trace_handler will parse the performance data by default. If the user does not configure tensorboard_trace_handler, the data will be written to the '/data' folder in the same-level directory of the current script by default. The performance data can be parsed through the off-line parsing function. The off-line parsing function can be referred to in [Method 4: Off-line Parsing](https://www.mindspore.cn/tutorials/en/master/debug/profiler.html#method-4-off-line-parsing).
+- schedule: After schedule is enabled, the kernel_details.csv file in the data saved to disk contains a Step ID column. According to the schedule configuration in the sample, skip_first skips 0 steps, wait waits for 0 steps, and warmup warms up for 0 steps. Since active is 1, collection starts from step 0 and lasts for 1 step, so the Step ID is 0, indicating that the 0th step is collected.
+- on_trace_ready: The path where Profiler saves its data is specified through the tensorboard_trace_handler parameter of on_trace_ready, and tensorboard_trace_handler parses the performance data by default. If tensorboard_trace_handler is not configured, the data is written by default to the '/data' folder at the same level as the current script; it can then be parsed with the off-line parsing function, see [Method 4: Off-line Parsing](https://www.mindspore.cn/tutorials/en/master/debug/profiler.html#method-4-off-line-parsing).
 
 For the complete case, refer to [custom for loop collection complete code example](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/for_loop_profiler.py).
 
+**The principle of configuring schedule parameters is as follows:**
+
+As illustrated in the following figure, schedule has five configurable parameters: skip_first, wait, warmup, active, and repeat. Among them, skip_first indicates skipping the first skip_first steps; wait indicates the waiting phase, which skips wait steps; warmup indicates the warm-up phase, which skips warmup steps; active indicates collecting active steps; repeat indicates the number of repetitions, and one repeat consists of wait+warmup+active steps. After all steps in a repeat have been executed, the callback function configured via on_trace_ready is executed to parse the performance data. For detailed descriptions of each parameter, please refer to the [schedule API](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.profiler.schedule.html).
+
+![schedule.png](../../source_zh_cn/debug/images/schedule.png)
+
+For example: The model training consists of 100 steps, and schedule is configured as schedule = schedule(skip_first=10,
+wait=10, warmup=5, active=5, repeat=2), meaning that the first 10 steps are skipped.
+Starting from the 11th step, the first repeat waits for 10 steps, warms up for 5 steps,
+and finally collects the performance data of steps 26 to 30 (5 steps in total).
+The second repeat then waits for another 10 steps, warms up for 5 steps,
+and finally collects the performance data of steps 46 to 50 (5 steps in total).
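+
+The following is a minimal sketch of the configuration in this example (it reuses the profiler interfaces from the sample above; train_one_step is a hypothetical placeholder for a real training step):
+
+```python
+import mindspore
+from mindspore.profiler import ProfilerActivity
+
+def train_one_step():
+    ...  # placeholder for the real forward/backward computation
+
+with mindspore.profiler.profile(
+        activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
+        schedule=mindspore.profiler.schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2),
+        on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data")) as prof:
+    for _ in range(100):
+        train_one_step()
+        prof.step()  # marks the step boundary so that schedule can count steps
+```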
+
+> - Profiler generates multiple sets of performance data in the same directory based on the repeat count. Each repeat corresponds to a folder containing the performance data collected from all active steps in that repeat. When repeat is set to 0, the number of repetitions is determined by the total number of steps, and the wait-warmup-active cycle is repeated until all steps have been executed.
+> - schedule must be used together with the [mindspore.profiler.profile.step](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.profiler.profile.html#mindspore.profiler.profile.step) interface. If schedule is configured but the mindspore.profiler.profile.step interface is not called to collect data, all collected data belongs to step 0, so performance data files are generated only when step 0 falls in the active phase (that is, wait, warmup, and skip_first are all set to 0).
+
+#### CallBack Mode Collection Example
+
+```python
+import mindspore
+
+class StopAtStep(mindspore.Callback):
+    def __init__(self, start_step, stop_step):
+        super(StopAtStep, self).__init__()
+        self.start_step = start_step
+        self.stop_step = stop_step
+        experimental_config = mindspore.profiler._ExperimentalConfig()
+        self.profiler = mindspore.profiler.profile(start_profile=False, experimental_config=experimental_config,
+                                                   schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=self.stop_step - self.start_step + 1, repeat=1, skip_first=0),
+                                                   on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"))
+
+    def on_train_step_begin(self, run_context):
+        cb_params = run_context.original_args()
+        step_num = cb_params.cur_step_num
+        if step_num == self.start_step:
+            self.profiler.start()
+
+    def on_train_step_end(self, run_context):
+        cb_params = run_context.original_args()
+        step_num = cb_params.cur_step_num
+        if self.start_step <= step_num <= self.stop_step:
+            self.profiler.step()
+        if step_num == self.stop_step:
+            self.profiler.stop()
+```
+
+For the complete case, refer to [CallBack mode collection complete code example](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/call_back_profiler.py).
+
 ### Method 2: Dynamic Profiler Enabling
 
 Users can use the mindspore.profiler.DynamicProfilerMonitor interface to enable Profiler without interrupting the training process, modify the configuration file, and complete the collection task under the new configuration. This interface requires a JSON configuration file. The JSON file must be named "profiler_config.json", if not configured, a default JSON configuration file is generated.
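+
+The following is a minimal usage sketch (the cfg_path and output_path parameter names and the once-per-iteration step() call are assumptions based on the interface description above; train_one_step is a hypothetical placeholder for a real training step):
+
+```python
+import mindspore
+
+def train_one_step():
+    ...  # placeholder for the real training step
+
+# Assumption: cfg_path is the directory holding profiler_config.json,
+# and output_path is the directory where performance data is written.
+monitor = mindspore.profiler.DynamicProfilerMonitor(cfg_path="./cfg_dir", output_path="./prof_data")
+for _ in range(100):
+    train_one_step()
+    monitor.step()  # gives the monitor a chance to apply the (possibly modified) JSON config
+```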
diff --git a/tutorials/source_zh_cn/debug/images/schedule.png b/tutorials/source_zh_cn/debug/images/schedule.png
index ddcc5c57e2dd5d8c2676a624c4b980e4afadec89..e2b8817f811cfe1f52a4fe56fc2e2d4097967238 100644
Binary files a/tutorials/source_zh_cn/debug/images/schedule.png and b/tutorials/source_zh_cn/debug/images/schedule.png differ
diff --git a/tutorials/source_zh_cn/debug/profiler.md b/tutorials/source_zh_cn/debug/profiler.md
index 5b00c15842bfc84d055c15502515b1aeac2cf242..77ba45f3750ac2309ec06df40ac85589bc2100a2 100644
--- a/tutorials/source_zh_cn/debug/profiler.md
+++ b/tutorials/source_zh_cn/debug/profiler.md
@@ -24,53 +24,11 @@
 在训练脚本中添加MindSpore profile相关接口,profile接口详细介绍请参考[MindSpore profile参数详解](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.profiler.profile.html)。
 
-该接口支持两种采集方式:CallBack方式和自定义for循环方式,且在Graph和PyNative两种模式下都支持。
-
-#### CallBack方式采集样例
-
-```python
-import mindspore
-
-class StopAtStep(mindspore.Callback):
-    def __init__(self, start_step, stop_step):
-        super(StopAtStep, self).__init__()
-        self.start_step = start_step
-        self.stop_step = stop_step
-        experimental_config = mindspore.profiler._ExperimentalConfig()
-        self.profiler = mindspore.profiler.profile(start_profile=False, experimental_config=experimental_config,
-                                                   schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=self.stop_step - self.start_step + 1, repeat=1, skip_first=0),
-                                                   on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"))
-
-    def on_train_step_begin(self, run_context):
-        cb_params = run_context.original_args()
-        step_num = cb_params.cur_step_num
-        if step_num == self.start_step:
-            self.profiler.start()
-
-    def on_train_step_end(self, run_context):
-        cb_params = run_context.original_args()
-        step_num = cb_params.cur_step_num
-        if self.start_step <= step_num <= self.stop_step:
-            self.profiler.step()
-        if step_num == self.stop_step:
-            self.profiler.stop()
-```
-
-完整案例请参考[CallBack方式采集完整代码样例](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/call_back_profiler.py)。
+该接口支持两种采集方式:自定义for循环方式和CallBack方式,且在Graph和PyNative两种模式下都支持。
 
 #### 自定义for循环方式采集样例
 
-自定义for循环方式下,用户可以通过设置schedule以及on_trace_ready参数来使能Profiler。
-
-如下图,schedule中有5个参数可以配置,分别为:skip_first、wait、warmup、active、repeat。其中skip_first表示跳过前skip_first个step;wait表示等待阶段,
-跳过wait个step;warmup表示预热阶段,跳过warmup个step;active表示采集active个step;repeat表示重复执行次数。其中1个repeat包括wait+warmup+active个step。
-一个repeat内所有step执行完之后会执行通过on_strace_ready配置的回调函数解析性能数据。
-
-![schedule.png](./images/schedule.png)
-
-例如:模型训练共100个step,schedule配置为schedule = schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2),表示跳过前10个step,
-从第11个step开始,在第1个repeat中将等待10个step,执行5个step的预热,最终采集第26~第30个step(一共5个step)的性能数据,
-在第2个repeat中将继续等待10个step,执行5个step的预热,最终采集第46个~第50个step(一共5个step)的性能数据。
+自定义for循环方式下,用户可以通过配置schedule参数来使能Profiler。
 
 样例如下:
 
 ```python
@@ -106,12 +64,74 @@ with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivi
         prof.step()
 ```
 
-使能后,落盘数据中kernel_details.csv中包含了Step ID一列信息。根据schedule的配置,skip_first跳过0个step,wait等待0个step,warmup预热0个step。根据active为1,则从第0个step开始采集,采集1个step。因此Step ID为0,表示采集的是第0个step。
-
-> profiler的落盘路径是通过on_trace_ready的tensorboard_trace_handler参数指定的,tensorboard_trace_handler会默认解析性能数据,用户如果没有配置tensorboard_trace_handler,数据会默认落盘到当前脚本同级目录的'/data'文件夹下,可以通过离线解析功能解析性能数据,离线解析功能可参考[方式四:离线解析](https://www.mindspore.cn/tutorials/zh-CN/master/debug/profiler.html#%E6%96%B9%E5%BC%8F%E5%9B%9B-%E7%A6%BB%E7%BA%BF%E8%A7%A3%E6%9E%90)。
+- schedule:使能后,落盘数据中kernel_details.csv中包含了Step ID一列信息。根据样例中schedule的配置,skip_first跳过0个step,wait等待0个step,warmup预热0个step。根据active为1,则从第0个step开始采集,采集1个step。因此Step ID为0,表示采集的是第0个step。
+- on_trace_ready:profiler的落盘路径是通过on_trace_ready的tensorboard_trace_handler参数指定的,tensorboard_trace_handler会默认解析性能数据,用户如果没有配置tensorboard_trace_handler,数据会默认落盘到当前脚本同级目录的'/data'文件夹下,可以通过离线解析功能解析性能数据,离线解析功能可参考[方式四:离线解析](https://www.mindspore.cn/tutorials/zh-CN/master/debug/profiler.html#%E6%96%B9%E5%BC%8F%E5%9B%9B-%E7%A6%BB%E7%BA%BF%E8%A7%A3%E6%9E%90)。
 
 完整案例参考[自定义for循环采集完整代码样例](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/for_loop_profiler.py)。
 
+**schedule参数配置原理如下:**
+
+如下图,schedule中有5个参数可以配置,分别为:skip_first、wait、warmup、active、repeat。其中skip_first表示跳过前skip_first个step;wait表示等待阶段,
+跳过wait个step;warmup表示预热阶段,跳过warmup个step;active表示采集active个step;repeat表示重复执行次数。其中1个repeat包括wait+warmup+active个step。
+一个repeat内所有step执行完之后,会执行on_trace_ready配置的回调函数解析性能数据。各个参数的详细介绍请参考[schedule API文档](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.profiler.schedule.html)。
+
+![schedule.png](./images/schedule.png)
+
+例如:模型训练共100个step,schedule配置为schedule = schedule(skip_first=10, wait=10, warmup=5, active=5, repeat=2),表示跳过前10个step,
+从第11个step开始,在第1个repeat中将等待10个step,执行5个step的预热,最终采集第26~第30个step(一共5个step)的性能数据,
+在第2个repeat中将继续等待10个step,执行5个step的预热,最终采集第46~第50个step(一共5个step)的性能数据。
+
+> - profiler根据repeat次数在同一目录下生成多份性能数据。每个repeat对应一个文件夹,包含该repeat中所有active step采集到的性能数据。当repeat配置为0时,表示重复执行的具体次数由总step数确定,不断重复wait-warmup-active直到所有step执行完毕。
+> - schedule需要配合[mindspore.profiler.profile.step](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.profiler.profile.html#mindspore.profiler.profile.step)接口使用,如果配置了schedule而没有调用mindspore.profiler.profile.step接口进行数据采集,则profiler数据采集区间的所有数据都属于第0个step,因此只有在第0个step对应active(wait、warmup、skip_first都配置为0)时,才会生成性能数据文件。
+
+#### CallBack方式采集样例
+
+```python
+import mindspore
+
+class StopAtStep(mindspore.Callback):
+    def __init__(self, start_step, stop_step):
+        super(StopAtStep, self).__init__()
+        self.start_step = start_step
+        self.stop_step = stop_step
+        experimental_config = mindspore.profiler._ExperimentalConfig()
+        self.profiler = mindspore.profiler.profile(start_profile=False, experimental_config=experimental_config,
+                                                   schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=self.stop_step - self.start_step + 1, repeat=1, skip_first=0),
+                                                   on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"))
+
+    def on_train_step_begin(self, run_context):
+        cb_params = run_context.original_args()
+        step_num = cb_params.cur_step_num
+        if step_num == self.start_step:
+            self.profiler.start()
+
+    def on_train_step_end(self, run_context):
+        cb_params = run_context.original_args()
+        step_num = cb_params.cur_step_num
+        if self.start_step <= step_num <= self.stop_step:
+            self.profiler.step()
+        if step_num == self.stop_step:
+            self.profiler.stop()
+```
+
+完整案例请参考[CallBack方式采集完整代码样例](https://gitee.com/mindspore/docs/blob/master/docs/sample_code/profiler/call_back_profiler.py)。
+
 ### 方式二:动态profiler使能
 
 在训练过程中,如果用户想要在不中断训练流程的前提下,修改配置文件并完成新配置下的采集任务,可以使用mindspore.profiler.DynamicProfilerMonitor接口使能。该接口需要配置一个JSON文件,该JSON文件的命名必须为"profiler_config.json",如果不配置则会生成一个默认的JSON配置文件。
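+
+以下为一个最小使用示意(仅为带有假设的示例:其中cfg_path、output_path参数名以及每个迭代调用一次step()的方式,均为基于上文接口说明的假设;train_one_step为占位的训练函数):
+
+```python
+import mindspore
+
+def train_one_step():
+    ...  # 占位:实际的训练步骤
+
+# 假设:cfg_path为存放profiler_config.json的目录,
+# output_path为性能数据的落盘目录。
+monitor = mindspore.profiler.DynamicProfilerMonitor(cfg_path="./cfg_dir", output_path="./prof_data")
+for _ in range(100):
+    train_one_step()
+    monitor.step()  # 使monitor有机会应用(可能已被修改的)JSON配置
+```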