From 80533c18abeb45c24a9f0336259f6ef6d4616d52 Mon Sep 17 00:00:00 2001
From: huan <3174348550@qq.com>
Date: Mon, 9 Jun 2025 13:53:22 +0800
Subject: [PATCH] fix incorrect links in mindformers docs
---
.../docs/source_en/advanced_development/dev_migration.md | 2 +-
.../advanced_development/precision_optimization.md | 6 +++---
docs/mindformers/docs/source_en/env_variables.md | 2 +-
docs/mindformers/docs/source_en/feature/safetensors.md | 2 +-
.../docs/source_en/guide/supervised_fine_tuning.md | 2 +-
.../docs/source_zh_cn/advanced_development/dev_migration.md | 2 +-
.../advanced_development/precision_optimization.md | 2 +-
docs/mindformers/docs/source_zh_cn/env_variables.md | 2 +-
.../docs/source_zh_cn/guide/supervised_fine_tuning.md | 2 +-
9 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md
index 2fe7a74370..42d6b8110a 100644
--- a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md
+++ b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md
@@ -46,7 +46,7 @@ All tokenizer classes must be inherited from the PretrainedTokenizer or Pretrain
### Preparing a Weight and a Dataset
-If a PyTorch-based model weight already exists, you can convert the weight to that in the MindSpore format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html).
+If a PyTorch-based model weight already exists, you can convert it to a MindSpore-format weight by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#weight-format-conversion).
For details about how to prepare a dataset, see [Dataset](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html) or the model document, for example, [Llama2 Description Document > Dataset Preparation](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87).
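For illustration, a conversion command typically takes the following shape (a minimal sketch: the `convert_weight.py` entry point, the `--model` value, and the paths are placeholders here; the exact script and arguments are described on the linked Weight Conversion page):

```shell
# Sketch only: convert an existing PyTorch weight to a MindSpore-format checkpoint.
# Script name, --model value, and paths are placeholders; see the Weight Conversion page.
python convert_weight.py \
  --model llama \
  --input_path /path/to/torch_checkpoint_dir \
  --output_path /path/to/ms_checkpoint/llama.ckpt
```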
diff --git a/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
index a749593416..dab574a711 100644
--- a/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
+++ b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
@@ -146,7 +146,7 @@ After setting the environment variables, start the program training to get the c
### Other Introductions
-In addition to the full amount of operator Dump introduced above, the tool also supports partial data Dump, overflow Dump, specified-condition Dump and so on. Limited to space, interested users can refer to [Dump function debugging](https://www.mindspore.cn/tutorials/en/master/debug/dump.html) for configuration and use. In addition, the msprobe precision debugging tool is provided. msprobe is a tool package under the precision debugging component of the MindStudio Training Tools suite. It mainly includes functions such as precision pre-check, overflow detection, and precision comparison. For more information, refer to [msprobe User Guide](https://gitee.com/ascend/mstt/tree/master/debug/precision_tools/msprobe).
+In addition to the full operator Dump introduced above, the tool also supports partial data Dump, overflow Dump, specified-condition Dump, and so on. Due to space limitations, interested users can refer to [Dump function debugging](https://www.mindspore.cn/tutorials/en/master/debug/dump.html) for configuration and usage. In addition, the msprobe precision debugging tool is provided. msprobe is a tool package under the precision debugging component of the MindStudio Training Tools suite. It mainly includes functions such as precision pre-check, overflow detection, and precision comparison. For more information, refer to the [msprobe User Guide](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe).
## Generalized Processes for Precision Positioning
@@ -187,7 +187,7 @@ Since features such as model parallelism, flow parallelism, sequence parallelism
#### Weight Conversion
-During training, MindSpore is loaded with the same weights as PyTorch. In case of pre-training scenarios, you can use PyTorch to save an initialized weight and then convert it to MindSpore weights. Because MindSpore weight names differ from PyTorch, the essence of weight conversion is to change the names in the PyTorch weight dict to MindSpore weight names to support MindSpore loading. Refer to [weight conversion guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html) for weight conversion.
+During training, MindSpore loads the same weights as PyTorch. In pre-training scenarios, you can use PyTorch to save an initialized weight and then convert it to a MindSpore weight. Because MindSpore weight names differ from PyTorch's, the essence of weight conversion is to change the names in the PyTorch weight dict to MindSpore weight names so that MindSpore can load them. Refer to the [weight conversion guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#weight-format-conversion) for how to perform the conversion.
Both MindSpore and PyTorch support `bin` format data, loading the same dataset for training ensures consistency from step to step.
@@ -254,7 +254,7 @@ By comparing the loss and local norm of the first step (step1) and the second st
#### Comparison of Step1 Losses
-After fixing the weights, dataset, and randomness, the difference in the loss value of the first step of training is compared. The loss value of the first step is obtained from the forward computation of the network. If the difference with the benchmark loss is large, it can be determined that there is an precision difference in the forward computation, which may be due to the model structure is not aligned, and the precision of the operator is abnormal. The tensor values of each layer of MindSpore and PyTorch can be obtained by printing or Dump tool. Currently, the tool does not have automatic comparison function, users need to manually identify the correspondence for comparison. For the introduction of MindSpore Dump tool, please refer to [Introduction of Precision Debugging Tools](#introduction-to-precision-debugging-tools), and for the use of PyTorch Dump tool, please refer to [Function Explanation of Precision Tools](https://gitee.com/ascend/mstt/blob/master/debug/precision_tools/msprobe/docs/05.data_dump_PyTorch.md).
+After fixing the weights, dataset, and randomness, compare the difference in the loss value of the first training step. The loss value of the first step is obtained from the forward computation of the network. If it differs significantly from the benchmark loss, it can be determined that there is a precision difference in the forward computation, which may be because the model structure is not aligned or the operator precision is abnormal. The tensor values of each layer of MindSpore and PyTorch can be obtained by printing or with the Dump tool. Currently, the tool does not provide an automatic comparison function, so users need to manually identify the correspondence between layers for comparison. For an introduction to the MindSpore Dump tool, please refer to [Introduction of Precision Debugging Tools](#introduction-to-precision-debugging-tools); for the use of the PyTorch Dump tool, please refer to [Function Explanation of Precision Tools](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md).
Find the correspondence of layers through PyTorch api_stack_dump.pkl file, and MindSpore statistic.csv file, and initially determine the degree of difference between input and output through max, min, and L2Norm. If you need further comparison, you can load the corresponding npy data for detailed comparison.
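For a quick manual check of the dumped data mentioned above, a comparison along the following lines can be used (a minimal sketch: the `.npy` file names are placeholders, and the real files come from the MindSpore and PyTorch Dump tools linked above):

```shell
# Sketch only: compare one dumped MindSpore tensor with its PyTorch counterpart.
# File names are placeholders; both arrays must have the same shape.
python -c "
import numpy as np
ms = np.load('ms_layer0_output.npy')
pt = np.load('pt_layer0_output.npy')
print('max abs diff:', np.abs(ms - pt).max())
print('L2 norm diff:', np.linalg.norm(ms - pt))
"
```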
diff --git a/docs/mindformers/docs/source_en/env_variables.md b/docs/mindformers/docs/source_en/env_variables.md
index c23acc5123..82af089a3c 100644
--- a/docs/mindformers/docs/source_en/env_variables.md
+++ b/docs/mindformers/docs/source_en/env_variables.md
@@ -40,4 +40,4 @@ The following environment variables are supported by MindSpore Transformers.
| **MS_ENABLE_FA_FLATTEN** | on | Controls whether support FlashAttention flatten optimization. | `on`: Enable FlashAttention flatten optimization;<br>`off`: Disable FlashAttention flatten optimization. | Provide a fallback mechanism for models that have not yet been adapted to FlashAttention flatten optimization. |
| **EXPERIMENTAL_KERNEL_LAUNCH_GROUP** | NA | Control whether to support the batch parallel submission of operators. If supported, enable the parallel submission and configure the number of parallel submissions. | `thread_num`: The number of concurrent threads is not recommended to be increased. The default value is 2;<br>`kernel_group_num`: Total number of operator groups, `kernel_group_num/thread_num` groups per thread, default is `8`. | This feature will continue to evolve in the future, and the subsequent behavior may change. Currently, only the `deepseek` reasoning scenario is supported, with certain performance optimization, but other models using this feature may deteriorate, and users need to use it with caution, as follows: `export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`. |
| **FORCE_EAGER** | False | Control whether to disable jit mode. | `False`: Enable jit mode;<br>`True`: Do not enable jit mode. | Jit compiles functions into a callable MindSpore graph, sets FORCE_EAGER to False to enable jit mode, which can generate performance benefits. Currently, only inference mode is supported. |
-| **MS_ENABLE_TFT** | NA | Enable [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) feature. Turn on TTP, UCE, ARF or TRE feature. | The value of the environment variable can be:"{TTP:1,UCE:1,ARF:1,TRE:1}", when using a certain feature, the corresponding field can be configured as "1". | Usage can refer to [High Availability](https://www.mindspore.cn/mindformers/docs/en/dev/function/high_availability.html). |
+| **MS_ENABLE_TFT** | NA | Enable [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) feature. Turn on the TTP, UCE, ARF, or TRE feature. | The value of the environment variable can be: "{TTP:1,UCE:1,ARF:1,TRE:1}"; when using a certain feature, the corresponding field can be configured as "1". | For usage, refer to [High Availability](https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html). |
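For illustration, the variable can be exported before launching training; the value below is the full format shown in the table, and only the fields that are needed have to be kept (a minimal sketch, independent of any particular launch script):

```shell
# Enable MindIO TFT features; keep only the fields you need, each set to "1".
export MS_ENABLE_TFT="{TTP:1,UCE:1,ARF:1,TRE:1}"
```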
diff --git a/docs/mindformers/docs/source_en/feature/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md
index 8863707d88..bc28fe03d4 100644
--- a/docs/mindformers/docs/source_en/feature/safetensors.md
+++ b/docs/mindformers/docs/source_en/feature/safetensors.md
@@ -15,7 +15,7 @@ There are two main types of Safetensors files: complete weights files and distri
Safetensors complete weights can be obtained in two ways:
1. Download directly from Huggingface.
-2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html#safetensors-weight-merging).
+2. Generate them with the [merge script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#distributed-weight-slicing-and-merging) after MindSpore Transformers distributed training.
Huggingface Safetensors example catalog structure is as follows:
diff --git a/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md b/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md
index ae22b57881..893f4eae8a 100644
--- a/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md
+++ b/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md
@@ -243,7 +243,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
--run_mode finetune" 8
```
-When the distributed strategy of the weights does not match the distributed strategy of the model, the weights need to be transformed. The load weight path should be set to the upper path of the directory named with `rank_0`, and the weight auto transformation function should be enabled by setting `--auto_trans_ckpt True` . For a more detailed description of the scenarios and usage of distributed weight transformation, please refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html).
+When the distributed strategy of the weights does not match the distributed strategy of the model, the weights need to be transformed. The load weight path should be set to the parent directory of the directory named `rank_0`, and the automatic weight transformation function should be enabled by setting `--auto_trans_ckpt True`. For a more detailed description of the scenarios and usage of distributed weight transformation, please refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#distributed-weight-slicing-and-merging).
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md
index 4efe088164..fc2cfe2ab9 100644
--- a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md
@@ -46,7 +46,7 @@ MindSpore Transformers提供了[PretrainedTokenizer](https://www.mindspore.cn/mi
### 准备权重和数据集
-如已有基于PyTorch的模型权重,可以参考[权重转换文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)将权重转换为MindSpore格式的权重。
+如已有基于PyTorch的模型权重,可以参考[权重转换文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html#%E6%9D%83%E9%87%8D%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2)将权重转换为MindSpore格式的权重。
数据集的准备可以参考[数据集文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html),或参考模型文档,如[Llama2说明文档——数据集准备](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)。
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md b/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md
index 914af36745..9d49686110 100644
--- a/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md
@@ -187,7 +187,7 @@ export MINDSPORE_DUMP_CONFIG=${JSON_PATH}
#### 权重转换
-训练过程中,MindSpore与PyTorch加载同一份权重。若是预训练场景,可以使用PyTorch保存一个初始化权重后,转换为MindSpore权重。因为MindSpore的权重名称与PyTorch有差异,权重转换的本质是将PyTorch权重dict中的名字改为MindSpore权重名字,以支持MindSpore加载。权重转换参考[权重转换指导](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。
+训练过程中,MindSpore与PyTorch加载同一份权重。若是预训练场景,可以使用PyTorch保存一个初始化权重后,转换为MindSpore权重。因为MindSpore的权重名称与PyTorch有差异,权重转换的本质是将PyTorch权重dict中的名字改为MindSpore权重名字,以支持MindSpore加载。权重转换参考[权重转换指导](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html#%E6%9D%83%E9%87%8D%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2)。
MindSpore与PyTorch均支持`bin`格式数据,加载相同的数据集进行训练,保证每个step一致。
diff --git a/docs/mindformers/docs/source_zh_cn/env_variables.md b/docs/mindformers/docs/source_zh_cn/env_variables.md
index f3ed1b7a4c..e58a500de1 100644
--- a/docs/mindformers/docs/source_zh_cn/env_variables.md
+++ b/docs/mindformers/docs/source_zh_cn/env_variables.md
@@ -40,4 +40,4 @@
| **MS_ENABLE_FA_FLATTEN** | on | 控制 是否支持 FlashAttention flatten 优化。 | `on`:启用 FlashAttention flatten 优化;<br>`off`: 禁用 FlashAttention flatten 优化。 | 对于还未适配FlashAttention flatten 优化的模型提供回退机制。 |
| **EXPERIMENTAL_KERNEL_LAUNCH_GROUP** | NA | 控制是否支持算子批量并行下发,支持开启并行下发,并配置并行数 | `thread_num`: 并发线程数,一般不建议增加,默认值为`2`;<br>`kernel_group_num`: 算子分组总数量,每线程`kernel_group_num/thread_num`个组,默认值为`8`。 | 该特性后续还会继续演进,后续行为可能会有变更,当前仅支持`deepseek`推理场景,有一定的性能优化,但是其他模型使用该特性可能会有劣化,用户需要谨慎使用,使用方法如下:`export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`。 |
| **FORCE_EAGER** | False | 控制是否**不开启**jit模式。 | `False`: 开启jit模式;<br>`True`: 不开启jit模式。 | Jit将函数编译成一张可调用的MindSpore图,设置FORCE_EAGER为False开启jit模式,可以获取性能收益,当前仅支持推理模式。 |
-| **MS_ENABLE_TFT** | NA | 使能 [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) 特性,表示启用 TTP、UCE、ARF 或 TRE 功能。 | 取值为"{TTP:1,UCE:1,ARF:1,TRE:1}",使用某一功能时,可将对应字段配置为"1"。 | 使用方式可以参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/high_availability.html)。 |
+| **MS_ENABLE_TFT** | NA | 使能 [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) 特性,表示启用 TTP、UCE、ARF 或 TRE 功能。 | 取值为"{TTP:1,UCE:1,ARF:1,TRE:1}",使用某一功能时,可将对应字段配置为"1"。 | 使用方式可以参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/high_availability.html)。 |
diff --git a/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md b/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md
index 5dcdb51211..4ba7b69452 100644
--- a/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md
+++ b/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md
@@ -243,7 +243,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
--run_mode finetune" 8
```
-当权重的分布式策略和模型的分布式策略不一致时,需要对权重进行切分转换。加载权重路径应设置为以 `rank_0` 命名的目录的上一层路径,同时开启权重自动切分转换功能 `--auto_trans_ckpt True` 。关于分布式权重切分转换的场景和使用方式的更多说明请参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html)。
+当权重的分布式策略和模型的分布式策略不一致时,需要对权重进行切分转换。加载权重路径应设置为以 `rank_0` 命名的目录的上一层路径,同时开启权重自动切分转换功能 `--auto_trans_ckpt True` 。关于分布式权重切分转换的场景和使用方式的更多说明请参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html#%E6%9D%83%E9%87%8D%E5%88%87%E5%88%86%E4%B8%8E%E5%90%88%E5%B9%B6)。
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
--
Gitee