From 673502357aac5deedd4a0a15d32180ecb82539c6 Mon Sep 17 00:00:00 2001
From: SaiYao
Date: Mon, 24 Nov 2025 15:06:00 +0800
Subject: [PATCH] [Weights] Update the range of models supported for online
 loading of HuggingFace weights
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 .../docs/source_en/feature/safetensors.md     | 12 ++++++------
 .../docs/source_zh_cn/feature/safetensors.md  | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/mindformers/docs/source_en/feature/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md
index e817c12a17..de08ae6a63 100644
--- a/docs/mindformers/docs/source_en/feature/safetensors.md
+++ b/docs/mindformers/docs/source_en/feature/safetensors.md
@@ -162,12 +162,12 @@ MindSpore Transformers supports training, inference, and resumable training in a

 ### Configuration Description

-| Parameter names | Descriptions |
-| ------------------- |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| load_checkpoint | The path to the folder where the weights are preloaded.<br>
- In case of full weights, fill in the path to the folder where the slices/individual weight files are located.
Note: Huggingface safetensors weights loading is supported (currently only Llama series models are supported). During the online loading process, a copy of the converted MindSpore safetensors weights file is saved to `/output/ms_safetensors`.
- In case of distributed weights, they need to be stored in `model_dir/rank_x/xxx.safetensors` format, with the folder path filled in as `model_dir`. | -| load_ckpt_format | The format of the loaded model weights, optionally `ckpt`, `safetensors`, defaults to `ckpt`.
Loading weights in `safetensors` format needs to change this configuration to `safetensors`. | -| use_parallel | Whether to load in parallel. | -| auto_trans_ckpt | Whether to enable the online slicing function.
- If loading weight is full weight:
a. when `use_parallel: True`, it is judged as distributed loading, `auto_trans_ckpt: True` needs to be set synchronously to turn on online slicing.
b. When `use_parallel: False`, it is judged as single card loading, you need to set `auto_trans_ckpt: False` synchronously to disable the online slicing function.
- If loading weight is distributed weight:
a. Without changing the original slicing strategy, you need to set `auto_trans_ckpt: False` to load directly according to the original slicing strategy.
b. To change the original slicing strategy, set `auto_trans_ckpt: True` and configure `src_strategy_path_or_dir` to be the original slicing strategy file path.
When the task is pulled up, the weights are merged online into full weights, which are sliced and loaded according to the parallelism strategy set in the configuration file. The online merged weights are saved in the current directory under the `/output/unified_checkpoint` file. | +| Parameter names | Descriptions | +|-------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| load_checkpoint | The path to the folder where the weights are preloaded. Supports MindSpore Safetensors and Hugging Face Safetensors.
For MindSpore Safetensors:
- If it is a complete weight, fill in the folder path containing the sliced/single weight files.<br>
- If it is a distributed weight, it must be stored in the `model_dir/rank_x/xxx.safetensors` format, and the folder path should be filled in as `model_dir`.<br>
For Hugging Face Safetensors:
- Supports directly loading model weights downloaded from Hugging Face (currently limited to the Mcore-architecture [Qwen3](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3) and [Qwen3-MoE](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3_moe) series models).<br>
- During loading, the weights are automatically converted to MindSpore Safetensors, and a copy of the converted weight file is saved to `/output/ms_safetensors`. |
+| load_ckpt_format | The format of the model weights to load, either `ckpt` or `safetensors`; defaults to `ckpt`.<br>
To load weights in `safetensors` format, this configuration must be set to `safetensors`. |
+| use_parallel | Whether to load in parallel. |
+| auto_trans_ckpt | Whether to enable the online slicing function.<br>
- If the loaded weights are complete weights:<br>
a. When `use_parallel: True`, loading is treated as distributed, and `auto_trans_ckpt: True` must also be set to enable online slicing.<br>
b. When `use_parallel: False`, loading is treated as single-card, and `auto_trans_ckpt: False` must also be set to disable online slicing.<br>
- If the loaded weights are distributed weights:<br>
a. To keep the original slicing strategy, set `auto_trans_ckpt: False` to load directly according to that strategy.<br>
b. To change the original slicing strategy, set `auto_trans_ckpt: True` and configure `src_strategy_path_or_dir` as the path to the original slicing strategy file.<br>
When the task is launched, the weights are merged online into complete weights, which are then sliced and loaded according to the parallel strategy set in the configuration file. The merged complete weights are saved under `/output/unified_checkpoint` in the current directory. |

 ### Complete Weight Loading

diff --git a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md
index bf2b08b3f1..95bc09dc1d 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md
@@ -162,12 +162,12 @@ MindSpore Transformers supports training, inference, and resumable training in all single-card and multi-card scenarios

 ### Configuration Description

-| Parameter names | Descriptions |
-| ---------------- | ------------------------------------------------------------ |
-| load_checkpoint | The folder path of the preloaded weights.<br>
- If it is a complete weight, fill in the folder path containing the sliced/single weight files.<br>
Note: Loading Huggingface safetensors weights is supported (currently only Llama series models are supported). During online loading, a copy of the converted MindSpore safetensors weight file is saved to `/output/ms_safetensors`.<br>
- If it is a distributed weight, it must be stored in the `model_dir/rank_x/xxx.safetensors` format, and the folder path should be filled in as `model_dir`. |
-| load_ckpt_format | The format of the model weights to load, either `ckpt` or `safetensors`; defaults to `ckpt`.<br>
To load weights in `safetensors` format, this configuration must be set to `safetensors`. |
-| use_parallel | Whether to load in parallel. |
-| auto_trans_ckpt | Whether to enable the online slicing function.<br>
- If the loaded weights are complete weights:<br>
a. When `use_parallel: True`, loading is treated as distributed, and `auto_trans_ckpt: True` must also be set to enable online slicing.<br>
b. When `use_parallel: False`, loading is treated as single-card, and `auto_trans_ckpt: False` must also be set to disable online slicing.<br>
- If the loaded weights are distributed weights:<br>
a. To keep the original slicing strategy, set `auto_trans_ckpt: False` to load directly according to that strategy.<br>
b. To change the original slicing strategy, set `auto_trans_ckpt: True` and configure `src_strategy_path_or_dir` as the path to the original slicing strategy file.<br>
When the task is launched, the weights are merged online into complete weights, which are then sliced and loaded according to the parallel strategy set in the configuration file. The merged complete weights are saved under `/output/unified_checkpoint` in the current directory. |
+| Parameter names | Descriptions |
+|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| load_checkpoint | The folder path where the preloaded weights are located. Supports MindSpore Safetensors and Hugging Face Safetensors.<br>
For MindSpore Safetensors:<br>
- If it is a complete weight, fill in the folder path containing the sliced/single weight files.<br>
- If it is a distributed weight, it must be stored in the `model_dir/rank_x/xxx.safetensors` format, and the folder path should be filled in as `model_dir`.<br>
For Hugging Face Safetensors:<br>
- Supports directly loading model weights downloaded from Hugging Face (currently limited to the Mcore-architecture [Qwen3](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3) and [Qwen3-MoE](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3_moe) series models).<br>
- During loading, the weights are automatically converted to MindSpore Safetensors, and a copy of the converted weight file is saved to `/output/ms_safetensors`. |
+| load_ckpt_format | The format of the model weights to load, either `ckpt` or `safetensors`; defaults to `ckpt`.<br>
To load weights in `safetensors` format, this configuration must be set to `safetensors`. |
+| use_parallel | Whether to load in parallel. |
+| auto_trans_ckpt | Whether to enable the online slicing function.<br>
- If the loaded weights are complete weights:<br>
a. When `use_parallel: True`, loading is treated as distributed, and `auto_trans_ckpt: True` must also be set to enable online slicing.<br>
b. When `use_parallel: False`, loading is treated as single-card, and `auto_trans_ckpt: False` must also be set to disable online slicing.<br>
- If the loaded weights are distributed weights:<br>
a. To keep the original slicing strategy, set `auto_trans_ckpt: False` to load directly according to that strategy.<br>
b. To change the original slicing strategy, set `auto_trans_ckpt: True` and configure `src_strategy_path_or_dir` as the path to the original slicing strategy file.<br>
When the task is launched, the weights are merged online into complete weights, which are then sliced and loaded according to the parallel strategy set in the configuration file. The merged complete weights are saved under `/output/unified_checkpoint` in the current directory. |

 ### Complete Weight Loading

--
Gitee
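For quick reference, the loading options documented in the tables above typically combine as in the following sketch of a MindSpore Transformers YAML configuration. This is a minimal sketch, not a complete config: the paths are placeholders, and only the keys named in the tables are shown.

```yaml
# Case 1: complete weights (e.g. safetensors downloaded from Hugging Face),
# loaded across multiple cards with online slicing enabled.
load_checkpoint: '/path/to/hf_model_dir'   # placeholder: folder holding the *.safetensors files
load_ckpt_format: 'safetensors'            # must be 'safetensors' when loading safetensors weights
use_parallel: True                         # distributed loading
auto_trans_ckpt: True                      # complete weights + use_parallel => enable online slicing

# Case 2: already-distributed weights, re-sliced under a new parallel strategy.
# load_checkpoint: '/path/to/model_dir'    # placeholder: contains rank_x/xxx.safetensors subfolders
# load_ckpt_format: 'safetensors'
# use_parallel: True
# auto_trans_ckpt: True                    # set True because the slicing strategy is being changed
# src_strategy_path_or_dir: '/path/to/original_strategy'  # placeholder: original slicing strategy file
```

Per the tables, Case 1 saves a converted copy of the Hugging Face weights under `/output/ms_safetensors`, and Case 2 saves the online-merged complete weights under `/output/unified_checkpoint`.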