diff --git a/docs/mindformers/docs/source_en/feature/other_training_features.md b/docs/mindformers/docs/source_en/feature/other_training_features.md
index 680a54ce9206e8b5d1c79e4930d0218b8de99e75..e1e0272c36e68193550e5bc98ed847b2029642a6 100644
--- a/docs/mindformers/docs/source_en/feature/other_training_features.md
+++ b/docs/mindformers/docs/source_en/feature/other_training_features.md
@@ -80,16 +80,19 @@ max_grad_norm: 1.0
 
 For MoE (Mixture of Experts), there are fragmented expert computation operations and communications. The GroupedMatmul operator merges multi-expert computations to improve the training performance of MoE. By invoking the GroupedMatmul operator, multiple expert computations are fused to achieve acceleration.
 
+Based on the computed routing strategy, the `token_dispatcher` dispatches different tokens (input subwords or sub-units) to different experts, compute units, or branches for independent processing. It relies primarily on `all_to_all` communication.
+
 ### Configuration and Usage
 
 #### YAML Parameter Configuration
 
-To enable GroupedMatmul in MoE scenarios, users only need to configure the `use_gmm` parameter under the moe_config section in the configuration file and set it to `True`:
+To enable GroupedMatmul in MoE scenarios, users only need to set the `use_gmm` option to `True` under the `moe_config` section in the configuration file. If the fused `token_permute` operator is required, also set `use_fused_ops_permute` to `True`:
 
 ```yaml
 moe_config:
   ...
   use_gmm: True
+  use_fused_ops_permute: True
   ...
 ```
 
diff --git a/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md b/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
index 72be28f1e1e00cfa58c81819e4df7a46492d0159..1f5fef8b111d4677df936e36e2f3db928413ab34 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
@@ -80,16 +80,19 @@ runner_wrapper:
 
 For single-card multi-expert computation in MoE, there are fragmented expert computation operations and communications. The GroupedMatmul operator merges multi-expert computations to improve the single-card multi-expert training performance of MoE. By invoking the GroupedMatmul operator, multiple expert computations are fused to achieve acceleration.
 
+Based on the computed routing strategy, the `token_dispatcher` dispatches different tokens (input subwords or sub-units) to different experts, compute units, or branches for independent processing. This module is built mainly on `all_to_all` communication.
+
 ### Configuration and Usage
 
 #### YAML Parameter Configuration
 
-To enable GroupedMatmul in MoE scenarios, users only need to set the `use_gmm` option to `True` under the `moe_config` section in the configuration file:
+To enable GroupedMatmul in MoE scenarios, users only need to set the `use_gmm` option to `True` under the `moe_config` section in the configuration file. If the fused `token_permute` operator is required, also set `use_fused_ops_permute` to `True`:
 
 ```yaml
 moe_config:
   ...
   use_gmm: True
+  use_fused_ops_permute: True
   ...
 ```
 
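
For context on the permute → grouped matmul → unpermute pattern that `use_gmm` and `use_fused_ops_permute` accelerate, the sketch below illustrates the idea at the NumPy level. It is a conceptual illustration only, not MindFormers code; the variable names, shapes, and random routing are assumptions made for the example.

```python
# Conceptual NumPy sketch (not MindFormers API): tokens are grouped ("permuted")
# by their routed expert so each expert's projection runs on one contiguous
# slice -- the layout a grouped/fused matmul exploits -- and the results are
# then restored ("unpermuted") to the original token order.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts = 8, 4, 2
tokens = rng.standard_normal((num_tokens, hidden))
expert_ids = rng.integers(0, num_experts, size=num_tokens)      # router decision per token
expert_w = rng.standard_normal((num_experts, hidden, hidden))   # one weight matrix per expert

# Permute: sort tokens so that tokens routed to the same expert are contiguous.
order = np.argsort(expert_ids, kind="stable")
grouped = tokens[order]
counts = np.bincount(expert_ids, minlength=num_experts)

# Grouped matmul: one matmul per expert group instead of one per token.
outputs = np.empty_like(grouped)
start = 0
for e, n in enumerate(counts):
    outputs[start:start + n] = grouped[start:start + n] @ expert_w[e]
    start += n

# Unpermute: restore the original token order.
restored = np.empty_like(outputs)
restored[order] = outputs
```

In distributed MoE training, the dispatch stage additionally uses `all_to_all` communication so that each device receives the tokens routed to its local experts; the single-process sketch above omits that step.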