diff --git a/docs/mindformers/docs/source_zh_cn/feature/monitor.md b/docs/mindformers/docs/source_zh_cn/feature/monitor.md
index 2667cb4cf8b1fab57a71e0dc60e1edcced13d0d8..0f6cb63d8d218895dc9d6e7753712f43a3fdc226 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/monitor.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/monitor.md
@@ -27,6 +27,10 @@ monitor_config:
     device_local_norm_format: ['log', 'tensorboard']
     optimizer_state_format: null
     weight_state_format: null
+    weight_stable_rank_format: null
+    weight_eigenvalue_format: null
+    weight_aggregation: False
+    experts_abstract: False
     throughput_baseline: null
     print_struct: False
     check_for_global_norm: False
@@ -57,6 +61,10 @@ callbacks:
 | monitor_config.device_local_norm_format | Sets how the metric `device_local_norm` is recorded | str or list[str] |
 | monitor_config.optimizer_state_format | Sets how the metric `optimizer_state` is recorded | str or list[str] |
 | monitor_config.weight_state_format | Sets how the metric `weight L2-norm` is recorded | str or list[str] |
+| monitor_config.weight_stable_rank_format | Sets how the metric `weight stable_rank` is recorded | str or list[str] |
+| monitor_config.weight_eigenvalue_format | Sets how the metric `maximum weight eigenvalue` is recorded | str or list[str] |
+| monitor_config.weight_aggregation | Sets whether weights are first aggregated through communication before the metrics `weight_stable_rank` and `weight_eigenvalue` are computed | bool |
+| monitor_config.experts_abstract | For MoE models, sets how the metrics `weight_stable_rank` and `weight_eigenvalue` are displayed in the log: whether they are shown in full (for MoE models, these two metrics cannot be displayed in TensorBoard) | bool |
 | monitor_config.throughput_baseline | Sets the baseline value for the metric `throughput linearity`; must be a positive number. The metric is written to both TensorBoard and the log. Defaults to `null` when unset, which means the metric is not monitored | int or float |
 | monitor_config.print_struct | Sets whether to print the names of all trainable parameters of the model. If `True`, the names of all trainable parameters are printed at the start of the first step, and training exits after that step ends. Defaults to `False` | bool |
 | monitor_config.check_for_global_norm | Sets whether to enable anomaly detection for the metric `global norm`. Defaults to `False` | bool |
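
The diff names the two new metrics but does not define them. As a rough illustration only, the sketch below assumes that `weight_stable_rank` follows the common definition ‖W‖_F² / ‖W‖₂² and that the "maximum eigenvalue" is the largest eigenvalue of WᵀW. MindFormers may compute either quantity differently. The function names are hypothetical and are not part of the MindFormers API.

```python
import math
import random


def frobenius_sq(w):
    """Squared Frobenius norm: the sum of all squared entries of w."""
    return sum(x * x for row in w for x in row)


def top_eigenvalue_gram(w, iters=200, tol=1e-12):
    """Largest eigenvalue of W^T W, estimated by power iteration.

    ASSUMPTION: this is one plausible reading of "maximum weight
    eigenvalue"; it is not taken from the MindFormers source.
    """
    n_rows, n_cols = len(w), len(w[0])
    # Seeded random start, so the result is reproducible and the start
    # vector is almost surely not orthogonal to the top eigenvector.
    rng = random.Random(0)
    v = [rng.uniform(0.5, 1.5) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    eig = 0.0
    for _ in range(iters):
        # u = W v; with ||v|| = 1 the Rayleigh quotient v^T (W^T W) v is ||u||^2.
        u = [sum(w[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        new_eig = sum(x * x for x in u)
        # z = W^T u = (W^T W) v is the next, unnormalized, iterate.
        z = [sum(w[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in z))
        if norm == 0.0:
            return new_eig
        v = [x / norm for x in z]
        converged = abs(new_eig - eig) <= tol * max(1.0, new_eig)
        eig = new_eig
        if converged:
            break
    return eig


def stable_rank(w):
    """Stable rank ||W||_F^2 / ||W||_2^2 (assumed definition); 0.0 for a zero matrix."""
    top = top_eigenvalue_gram(w)
    return frobenius_sq(w) / top if top > 0.0 else 0.0


if __name__ == "__main__":
    print(stable_rank([[1.0, 0.0], [0.0, 1.0]]))
    print(stable_rank([[1.0, 2.0], [2.0, 4.0]]))
```

Under this definition the stable rank is at most the true rank of the matrix and drops towards 1 as a single direction comes to dominate the weight.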