From 3a8f348f48a2e8057b10251f0d35f64e017f9043 Mon Sep 17 00:00:00 2001
From: k30067541 <1045916357@qq.com>
Date: Fri, 29 Nov 2024 15:51:10 +0800
Subject: [PATCH] 【】Add documentation for the online chip hardware precision stress-test configuration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/mindformers/docs/source_en/appendix/conf_files.md    | 3 +++
 docs/mindformers/docs/source_zh_cn/appendix/conf_files.md | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/docs/mindformers/docs/source_en/appendix/conf_files.md b/docs/mindformers/docs/source_en/appendix/conf_files.md
index d7da74374b..afd834fd98 100644
--- a/docs/mindformers/docs/source_en/appendix/conf_files.md
+++ b/docs/mindformers/docs/source_en/appendix/conf_files.md
@@ -192,6 +192,9 @@ MindFormers provides encapsulated Callbacks function class, mainly to achieve to
  | global_batch_size | Set the number of global batch data samples in `MFLossMonitor`. If this parameter is not set, the system automatically calculates the number of global batch data samples based on the dataset size and parallel strategy | int |
  | gradient_accumulation_steps | Set the number of gradient accumulation steps in `MFLossMonitor`. If this parameter is not set, the value of this parameter is the same as that of `gradient_accumulation_steps` in [Model Training Configuration](#model-training-configuration) | int |
  | check_for_nan_in_loss_and_grad | Whether to enable overflow detection in `MFLossMonitor`. After overflow detection is enabled, the training exits if overflow occurs during model training. The default value is `False` | bool |
+ | enable_stress_detect | Whether to enable online hardware precision stress detection in `MFLossMonitor`. The default value is `False` | bool |
+ | per_detect_steps | Set the interval, in steps, between stress detections in `MFLossMonitor`. It cannot exceed `steps_per_epoch`. The default value is `None` | int |
+ | detect_num | Set the number of consecutive stress detections in `MFLossMonitor`. The default value is `None` | int |

 2. SummaryMonitor

diff --git a/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md b/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
index f553e39005..7eed661a6f 100644
--- a/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
+++ b/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
@@ -192,6 +192,9 @@ MindFormers provides encapsulated Callbacks function classes, mainly to implement during model train
  | global_batch_size | Set the number of global batch data samples in `MFLossMonitor`. If this parameter is not set, it is calculated automatically from the dataset size and parallel strategy | int |
  | gradient_accumulation_steps | Set the number of gradient accumulation steps in `MFLossMonitor`. If this parameter is not set, it is the same as `gradient_accumulation_steps` in [Model Training Configuration](#模型训练配置) | int |
  | check_for_nan_in_loss_and_grad | Set whether to enable overflow detection in `MFLossMonitor`. When enabled, training exits if overflow occurs during model training. The default value is `False` | bool |
+ | enable_stress_detect | Set whether to enable online hardware precision stress testing in `MFLossMonitor`. The default value is `False` | bool |
+ | per_detect_steps | Set the interval, in steps, between online hardware precision stress tests in `MFLossMonitor`. The value cannot exceed `steps_per_epoch`. The default value is `None` | int |
+ | detect_num | Set the number of consecutive online hardware precision stress tests in `MFLossMonitor`. The default value is `None` | int |

 2. SummaryMonitor

-- 
Gitee
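
Reviewer note: the three new `MFLossMonitor` options interact, and the only constraint the patch states is that `per_detect_steps` cannot exceed `steps_per_epoch`. The sketch below shows one way that rule could be checked before training starts. It is a minimal illustration and not MindFormers code: the helper name `check_stress_detect_config` is hypothetical, and the requirement that `per_detect_steps` and `detect_num` be positive integers when detection is enabled is an assumption, not something the patch documents.

```python
from typing import Optional


def check_stress_detect_config(
    enable_stress_detect: bool = False,
    per_detect_steps: Optional[int] = None,
    detect_num: Optional[int] = None,
    steps_per_epoch: Optional[int] = None,
) -> bool:
    """Return True if stress detection would be active with these settings.

    Hypothetical helper for illustration only; the defaults mirror the
    documented defaults (False, None, None).
    """
    if not enable_stress_detect:
        # Detection is off, so the other two options are not inspected.
        return False

    # Assumption (not stated in the patch): both values must be positive
    # integers once detection is enabled.
    for name, value in (
        ("per_detect_steps", per_detect_steps),
        ("detect_num", detect_num),
    ):
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive int, got {value!r}")

    # Documented rule: per_detect_steps cannot exceed steps_per_epoch.
    if steps_per_epoch is not None and per_detect_steps > steps_per_epoch:
        raise ValueError(
            f"per_detect_steps ({per_detect_steps}) exceeds "
            f"steps_per_epoch ({steps_per_epoch})"
        )
    return True
```

Called as `check_stress_detect_config(True, per_detect_steps=100, detect_num=2, steps_per_epoch=1000)`, the helper accepts the settings; raising `per_detect_steps` above `steps_per_epoch` makes it raise `ValueError`.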