diff --git a/docs/mindformers/docs/source_en/appendix/conf_files.md b/docs/mindformers/docs/source_en/appendix/conf_files.md
index d7da74374bdde4ad3bcb7f3d62fb55f808888ebd..afd834fd98c2b36975f6334e173cefc7f443f987 100644
--- a/docs/mindformers/docs/source_en/appendix/conf_files.md
+++ b/docs/mindformers/docs/source_en/appendix/conf_files.md
@@ -192,6 +192,9 @@ MindFormers provides encapsulated Callbacks function class, mainly to achieve to
  | global_batch_size | Set the number of global batch data samples in `MFLossMonitor`. If this parameter is not set, the system automatically calculates the number of global batch data samples based on the dataset size and parallel strategy | int |
  | gradient_accumulation_steps | Set the number of gradient accumulation steps in `MFLossMonitor`. If this parameter is not set, the value of this parameter is the same as that of `gradient_accumulation_steps` in [Model Training Configuration](#model-training-configuration) | int |
  | check_for_nan_in_loss_and_grad | Whether to enable overflow detection in `MFLossMonitor`. After overflow detection is enabled, the training exits if overflow occurs during model training. The default value is `False` | bool |
+ | enable_stress_detect | Whether to enable stress detection in `MFLossMonitor`. The default value is `False` | bool |
+ | per_detect_steps | Set the interval, in steps, between stress detections in `MFLossMonitor`. It cannot exceed `steps_per_epoch`. The default value is `None` | int |
+ | detect_num | Set the number of consecutive stress detections in `MFLossMonitor`. The default value is `None` | int |
 
 2. SummaryMonitor
diff --git a/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md b/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
index f553e3900575fc7c621f8b0c967a2edc2ffac280..7eed661a6f5b47c87fd58a2cc44975084d0ef3fa 100644
--- a/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
+++ b/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
@@ -192,6 +192,9 @@ MindFormers provides encapsulated Callbacks function classes, mainly implementing, during model training
  | global_batch_size | Set the number of global batch data samples in `MFLossMonitor`. If this parameter is not configured, it is calculated automatically based on the dataset size and parallel strategy | int |
  | gradient_accumulation_steps | Set the number of gradient accumulation steps in `MFLossMonitor`. If this parameter is not configured, it is the same as `gradient_accumulation_steps` in [Model Training Configuration](#模型训练配置) | int |
  | check_for_nan_in_loss_and_grad | Whether to enable overflow detection in `MFLossMonitor`. When enabled, training exits if overflow occurs during model training. The default value is `False` | bool |
+ | enable_stress_detect | Whether to enable online hardware precision stress testing in `MFLossMonitor`. The default value is `False` | bool |
+ | per_detect_steps | Set the interval, in steps, between online hardware precision stress tests in `MFLossMonitor`. This value cannot exceed `steps_per_epoch`. The default value is `None` | int |
+ | detect_num | Set the number of consecutive online hardware precision stress tests in `MFLossMonitor`. The default value is `None` | int |
 
 2. SummaryMonitor
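For readers of this change, the three new parameters could be set together roughly as follows. This is a minimal sketch, not a configuration taken from the patch: it assumes `MFLossMonitor` is declared under the `callbacks` list with a `type` key, as in typical MindFormers YAML files, and the values `100` and `2` are purely illustrative. It also assumes `steps_per_epoch` is at least 100 for the run, since `per_detect_steps` may not exceed it.

```yaml
# Hypothetical fragment; placement under `callbacks` and all values are assumptions.
callbacks:
  - type: MFLossMonitor
    enable_stress_detect: True   # default is False
    per_detect_steps: 100        # interval in steps; must not exceed steps_per_epoch
    detect_num: 2                # number of consecutive stress detections
```

Both `per_detect_steps` and `detect_num` default to `None`, so a configuration that enables stress detection would presumably set them explicitly.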