diff --git a/docs/mindformers/docs/source_en/feature/other_training_features.md b/docs/mindformers/docs/source_en/feature/other_training_features.md
index 680a54ce9206e8b5d1c79e4930d0218b8de99e75..e1e0272c36e68193550e5bc98ed847b2029642a6 100644
--- a/docs/mindformers/docs/source_en/feature/other_training_features.md
+++ b/docs/mindformers/docs/source_en/feature/other_training_features.md
@@ -80,16 +80,19 @@ max_grad_norm: 1.0
 
 For MoE (Mixture of Experts), there are fragmented expert computation operations and communications. The GroupedMatmul operator merges multi-expert computations to improve the training performance of MoE. By invoking the GroupedMatmul operator, multiple expert computations are fused to achieve acceleration.
 
+Based on the computed routing strategy, the `token_dispatcher` dispatches different tokens (input subwords or sub-units) to different experts, compute units, or branches for independent processing. It relies primarily on `all_to_all` communication.
+
 ### Configuration and Usage
 
 #### YAML Parameter Configuration
 
-To enable GroupedMatmul in MoE scenarios, users only need to configure the `use_gmm` parameter under the moe_config section in the configuration file and set it to `True`:
+To enable GroupedMatmul in MoE scenarios, users only need to set the `use_gmm` option to `True` under the `moe_config` section in the configuration file. If the fused `token_permute` operator is required, also set `use_fused_ops_permute` to `True`:
 
 ```yaml
 moe_config:
   ...
   use_gmm: True
+  use_fused_ops_permute: True
   ...
 ```
 
diff --git a/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md b/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
index 72be28f1e1e00cfa58c81819e4df7a46492d0159..1f5fef8b111d4677df936e36e2f3db928413ab34 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
@@ -80,16 +80,19 @@ runner_wrapper:
 
 For single-card multi-expert computation in MoE, there are fragmented expert computation operations and communications. The GroupedMatmul operator merges multi-expert computations to improve the single-card multi-expert training performance of MoE. By invoking the GroupedMatmul operator, multiple expert computations are fused to achieve acceleration.
 
+Based on the computed routing strategy, the `token_dispatcher` dispatches different tokens (input subwords or sub-units) to different experts, compute units, or branches for independent processing. This module is built mainly on `all_to_all` communication.
+
 ### Configuration and Usage
 
 #### YAML Parameter Configuration
 
-To enable GroupedMatmul in MoE scenarios, users only need to set the `use_gmm` option to `True` under the `moe_config` section in the configuration file:
+To enable GroupedMatmul in MoE scenarios, users only need to set the `use_gmm` option to `True` under the `moe_config` section in the configuration file. If the fused `token_permute` operator is required, also set `use_fused_ops_permute` to `True`:
 
 ```yaml
 moe_config:
   ...
   use_gmm: True
+  use_fused_ops_permute: True
   ...
 ```
 
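
For context on the permute → grouped matmul → unpermute pattern that `use_gmm` and `use_fused_ops_permute` accelerate, the sketch below illustrates the idea at the NumPy level. It is a conceptual illustration only, not MindFormers code; the variable names, shapes, and random routing are assumptions made for the example.

```python
# Conceptual NumPy sketch (not MindFormers API): tokens are grouped ("permuted")
# by their routed expert so each expert's projection runs on one contiguous
# slice -- the layout a grouped/fused matmul exploits -- and the results are
# then restored ("unpermuted") to the original token order.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts = 8, 4, 2
tokens = rng.standard_normal((num_tokens, hidden))
expert_ids = rng.integers(0, num_experts, size=num_tokens)      # router decision per token
expert_w = rng.standard_normal((num_experts, hidden, hidden))   # one weight matrix per expert

# Permute: sort tokens so that tokens routed to the same expert are contiguous.
order = np.argsort(expert_ids, kind="stable")
grouped = tokens[order]
counts = np.bincount(expert_ids, minlength=num_experts)

# Grouped matmul: one matmul per expert group instead of one per token.
outputs = np.empty_like(grouped)
start = 0
for e, n in enumerate(counts):
    outputs[start:start + n] = grouped[start:start + n] @ expert_w[e]
    start += n

# Unpermute: restore the original token order.
restored = np.empty_like(outputs)
restored[order] = outputs
```

In distributed MoE training, the dispatch stage additionally uses `all_to_all` communication so that each device receives the tokens routed to its local experts; the single-process sketch above omits that step.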