From c2a3ac13cfbbcbb015e3f18497e3fb2c9707a424 Mon Sep 17 00:00:00 2001
From: shen_haochen
Date: Wed, 27 Aug 2025 17:08:09 +0800
Subject: [PATCH] doc for MS_DISABLE_LCCL_KERNEL_LIST

---
 .../source_en/parallel/dynamic_cluster.md     | 23 +++++++++++++++++
 .../source_zh_cn/parallel/dynamic_cluster.md  | 25 ++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/tutorials/source_en/parallel/dynamic_cluster.md b/tutorials/source_en/parallel/dynamic_cluster.md
index c0961e182f..71cb0dd897 100644
--- a/tutorials/source_en/parallel/dynamic_cluster.md
+++ b/tutorials/source_en/parallel/dynamic_cluster.md
@@ -100,6 +100,29 @@ The relevant environment variables:
 1 for yes, other values for no. The default is no.
 The LCCL communication library currently only supports single-machine multi-card scenario and must be executed when the graph compilation level is O0.
+
+ MS_DISABLE_LCCL_KERNELS_LIST
+ Specifies the blacklist of LCCL operators that are not enabled.
+ String
+ Valid operator names, with multiple operators separated by commas (',').
+
+ Takes effect only when using the LCCL communication library.
+ Currently supported LCCL operators:
+
+ Notes:
+ - Operator names are case-sensitive
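+ - There should be no spaces when multiple operators are separated by commas
+
+
 MS_TOPO_TIMEOUT
 Cluster networking phase timeout time in seconds.
diff --git a/tutorials/source_zh_cn/parallel/dynamic_cluster.md b/tutorials/source_zh_cn/parallel/dynamic_cluster.md
index 0b19e0be8f..0179d12b7e 100644
--- a/tutorials/source_zh_cn/parallel/dynamic_cluster.md
+++ b/tutorials/source_zh_cn/parallel/dynamic_cluster.md
@@ -100,7 +100,30 @@ The MindSpore **dynamic cluster** feature, by **reusing the Parameter Server mode training architecture*
 1 means enabled, 0 means disabled. The default is 0.
 The LCCL communication library currently supports only single-machine multi-card scenarios and must run when the graph compilation level is O0.
-
+
+ MS_DISABLE_LCCL_KERNELS_LIST
+ Specifies the list of operators for which LCCL is not enabled.
+ String
+ Valid operator names, with multiple operators separated by ','.
+
+ Takes effect only when the LCCL communication library is in use.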
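+ Operators currently supported by LCCL:
+
+ Notes:
+ - Operator names are case-sensitive
+ - There must be no spaces when multiple operators are separated by ','
+
+
+
 MS_TOPO_TIMEOUT
 Cluster networking phase timeout, in seconds.
 Integer
-- 
Gitee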
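
The formatting rules documented in this patch (comma-separated, case-sensitive, no spaces) can be sketched as a small helper. This is an illustration only, not part of the patch: `build_lccl_disable_list` is a hypothetical helper, and `OpNameA` / `OpNameB` are placeholder names, since the list of supported LCCL operators is not shown here. The variable name follows the spelling used in the documentation body (`MS_DISABLE_LCCL_KERNELS_LIST`), which differs from the commit subject.

```python
import os


def build_lccl_disable_list(operator_names):
    """Join operator names into a comma-separated value with no spaces.

    Hypothetical helper for illustration; it is not a MindSpore API.
    """
    cleaned = []
    for name in operator_names:
        # Reject empty names, embedded whitespace, and embedded commas,
        # because the documented format forbids spaces between entries.
        if not name or "," in name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid operator name: {name!r}")
        # Names are case-sensitive, so they are kept exactly as given.
        cleaned.append(name)
    return ",".join(cleaned)


# Placeholder names (assumption): substitute real LCCL operator names.
value = build_lccl_disable_list(["OpNameA", "OpNameB"])
os.environ["MS_DISABLE_LCCL_KERNELS_LIST"] = value
print(value)
```

According to the documentation text, the variable takes effect only when the LCCL communication library is in use, so setting it otherwise should have no effect.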