diff --git a/docs/mindformers/docs/source_en/feature/ckpt.md b/docs/mindformers/docs/source_en/feature/ckpt.md
index e9a91256b0139d2c21b480f55ff139f9c19cf2d1..98c3acbf99d8bb933058d1e9c3516475b9487245 100644
--- a/docs/mindformers/docs/source_en/feature/ckpt.md
+++ b/docs/mindformers/docs/source_en/feature/ckpt.md
@@ -40,7 +40,7 @@ python convert_weight.py [-h] --model MODEL [--reversed] --input_path INPUT_PATH
### Conversion Example
-Assume that you have downloaded the [Llama2 model weight](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD) and saved it in the `/home/user/torch_weights` path, to convert it to the MindSpore Transformers weight and save it in the `/home/user/ms_weights` path, run the following command:
+Assume that you have downloaded the [Llama3.1 model weights](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD) and saved them in the `/home/user/torch_weights` path. To convert them to MindSpore Transformers weights and save them in the `/home/user/ms_weights` path, run the following command:
```bash
python convert_weight.py --model llama --input_path /home/user/torch_weights --output_path /home/user/ms_weights/llama.ckpt
@@ -50,21 +50,13 @@ After the preceding steps are performed, the HuggingFace weight is successfully
### Supported Models
-| Parameter Value | Supported models |
-|-----------|---------------------------------------------|
-| llama | Llama2, Llama3, Llama3.1, CodeLlama |
-| baichuan2 | Baichuan2 |
-| glm-n | GLM2, GLM3, GLM3-32K, GLM4 |
-| cogvlm2 | CogVLM2-Video, CogVLM2-Image |
-| qwen | Qwen, Qwen1.5, Qwen2 |
-| qwenvl | QwenVL |
-| internlm | InternLM |
-| internlm2 | InternLM2 |
-| yi | Yi |
-| mixtral | Mixtral |
-| deepseek | DeepSeekCoder, DeepSeekCoder1.5, DeepSeekV2 |
-| gpt | GPT2 |
-| whisper | Whisper |
+| Parameter Value | Supported models |
+|-----------------|------------------------------|
+| llama | Llama3.1 |
+| glm-n | GLM4 |
+| qwen | Qwen2.5 |
+| mixtral | Mixtral |
+| deepseek | DeepSeekV3 |
### Developing Weight Conversion for Unsupported Models
@@ -147,13 +139,13 @@ When a model loads a weight, it automatically checks whether the weight is match
Parameters in the `yaml` file related to **automatic weight conversion** are described as follows:
-| Parameter | Description |
-| ------------------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| load_checkpoint | Absolute path or folder path of the pre-loaded weights.<br>- For a complete set of weights, set this parameter to an absolute path.<br>- For a distributed weight, set this parameter to the folder path. The distributed weight must be stored in the `model_dir/rank_x/xxx.ckpt` format. The folder path is `model_dir`.<br>**If there are multiple CKPT files in the rank_x folder, the last CKPT file in the file name sequence is used for conversion by default.** |
-| src_strategy_path_or_dir | Path of [the distributed strategy file](#offline-conversion-configuration) corresponding to the pre-loaded weights.<br>- If the pre-loaded weights are a complete set of weights, leave this parameter **blank**.<br>- If the pre-loaded weights are distributed and pipeline parallelism is used when the pre-loaded weights are saved, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.<br>- If the pre-loaded weights are distributed and pipeline parallelism is not used when the pre-load weights are saved, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. |
-| auto_trans_ckpt | Specifies whether to enable automatic weight conversion. The value True indicates that it is enabled. The default value is False. |
-| transform_process_num | Number of processes used for automatic weight conversion. The default value is 1.<br>- If transform_process_num is set to 1, only rank_0 is used for weight conversion. Other processes wait until the conversion ends.<br>- If transform_process_num is larger than 1, **multiple processes conduct conversion**. For example, for an 8-device task, if transform_process_num is set to 2, rank_0 is used for converting the weights of slices rank_0, rank_1, rank_2, and rank_3, and rank_4 is used for converting the weights of slices rank_4, rank_5, rank_6, and rank_7, and other processes wait until rank_0 and rank_4 complete the conversion.<br>**Note**:<br>1. A larger value of transform_process_num indicates a shorter conversion time and **a larger host memory occupied by the conversion**. If the host memory is insufficient, decrease the value of transform_process_num.<br>2. The value of transform_process_num must be a number that can be exactly divided by and cannot exceed that of NPUs. |
-| transform_by_rank | Specifies whether to use the mindspore.transform_checkpoint_by_rank API for weight conversion.<br>- If transform_process_num is larger than 1, the value is automatically set to `True`.<br>- If transform_process_num is set to 1, if the target weight is a distributed weight, the mindspore.transform_checkpoint_by_rank API is cyclically called to convert the weight of each rank slice in serial mode.<br>- If transform_process_num is set to 1, if the target weight is a complete weight, the value is automatically set to `False`, and the mindspore.transform_checkpoints API is called for weight conversion. |
+| Parameter | Description |
+|---------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| load_checkpoint           | Absolute path or folder path of the pre-loaded weights.<br>- For a complete set of weights, set this parameter to an absolute path.<br>- For a distributed weight, set this parameter to the folder path. The distributed weight must be stored in the `model_dir/rank_x/xxx.ckpt` format. The folder path is `model_dir`.<br>**If there are multiple CKPT files in the rank_x folder, the last CKPT file in the file name sequence is used for conversion by default.** |
+| src_strategy_path_or_dir  | Path of [the distributed strategy file](#offline-conversion-configuration) corresponding to the pre-loaded weights.<br>- If the pre-loaded weights are a complete set of weights, leave this parameter **blank**.<br>- If the pre-loaded weights are distributed and pipeline parallelism is used when they are saved, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.<br>- If the pre-loaded weights are distributed and pipeline parallelism is not used when they are saved, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. |
+| auto_trans_ckpt           | Specifies whether to enable automatic weight conversion. The value True indicates that it is enabled. The default value is False. |
+| transform_process_num     | Number of processes used for automatic weight conversion. The default value is 1.<br>- If transform_process_num is set to 1, only rank_0 performs the conversion, and the other processes wait until the conversion ends.<br>- If transform_process_num is larger than 1, **multiple processes perform the conversion**. For example, for an 8-device task, if transform_process_num is set to 2, rank_0 converts the weight slices of rank_0, rank_1, rank_2, and rank_3, rank_4 converts the weight slices of rank_4, rank_5, rank_6, and rank_7, and the other processes wait until rank_0 and rank_4 complete the conversion.<br>**Note**:<br>1. A larger value of transform_process_num indicates a shorter conversion time and **a larger host memory footprint during the conversion**. If the host memory is insufficient, decrease the value of transform_process_num.<br>2. The value of transform_process_num must exactly divide the number of NPUs and cannot exceed it. |
+| transform_by_rank         | Specifies whether to use the mindspore.transform_checkpoint_by_rank API for weight conversion.<br>- If transform_process_num is larger than 1, the value is automatically set to `True`.<br>- If transform_process_num is set to 1 and the target weight is a distributed weight, the mindspore.transform_checkpoint_by_rank API is called cyclically to convert the weight slice of each rank in serial mode.<br>- If transform_process_num is set to 1 and the target weight is a complete weight, the value is automatically set to `False`, and the mindspore.transform_checkpoints API is called for weight conversion. |
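+
+For example, a minimal sketch of these options in a training `yaml` file might look as follows (the paths are illustrative placeholders, not real files):
+
+```yaml
+# Load a complete (non-distributed) checkpoint and let automatic weight
+# conversion slice it according to the current distributed strategy.
+load_checkpoint: "/path/to/llama3_1_8b.ckpt"   # absolute path of the complete weight
+src_strategy_path_or_dir: ""                   # blank because the source weight is complete
+auto_trans_ckpt: True                          # enable automatic weight conversion
+transform_process_num: 2                       # 2 parallel conversion processes; must divide the NPU count
+```
+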
#### YAML Configurations in Different Scenarios
@@ -221,15 +213,15 @@ When using offline conversion, you can manually configure conversion parameters
Parameters in the `yaml` file related to **offline weight conversion** are described as follows:
-| Parameter | Description |
-| ----------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| src_checkpoint | Absolute path or folder path of the source weight.<br>- For **a complete set of weights**, set this parameter to an **absolute path**.<br>- For **distributed weights**, set this parameter to the **folder path**. The distributed weights must be stored in the `model_dir/rank_x/xxx.ckpt` format. The folder path is `model_dir`.<br>**If there are multiple CKPT files in the rank_x folder, the last CKPT file in the file name sequence is used for conversion by default.** |
-| src_strategy_path_or_dir | Path of the distributed strategy file corresponding to the source weight.<br>- For a complete set of weights, leave it **blank**.<br>- For distributed weights, if pipeline parallelism is used, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.<br>- For distributed weights, if pipeline parallelism is not used, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. |
-| dst_checkpoint | Path of the folder that stores the target weight. |
-| dst_strategy | Path of the distributed strategy file corresponding to the target weight.<br>- For a complete set of weights, leave it **blank**.<br>- For distributed weights, if pipeline parallelism is used, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.<br>- For distributed weights, if pipeline parallelism is not used, set this parameter to any **ckpt_strategy_rank_x.ckpt** path.|
-| prefix | Prefix name of the saved target weight. The weight is saved as {prefix}rank_x.ckpt. The default value is checkpoint_. |
-| world_size | Total number of slices of the target weight. Generally, the value is dp \* mp \* pp. |
-| process_num | Number of processes used for offline weight conversion. The default value is 1.<br>- If process_num is set to 1, **a single process is used for conversion**.<br>- If process_num is larger than 1, **multi-process conversion** is used. For example, if the target weight for conversion is the distributed weight of eight GPUs and process_num is set to 2, two processes are started to convert the weights of slices rank_0, rank_1, rank_2, and rank_3 and slices rank_4, rank_5, rank_6, and rank_7, respectively. |
+| Parameter | Description |
+|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| src_checkpoint           | Absolute path or folder path of the source weight.<br>- For **a complete set of weights**, set this parameter to an **absolute path**.<br>- For **distributed weights**, set this parameter to the **folder path**. The distributed weights must be stored in the `model_dir/rank_x/xxx.ckpt` format. The folder path is `model_dir`.<br>**If there are multiple CKPT files in the rank_x folder, the last CKPT file in the file name sequence is used for conversion by default.** |
+| src_strategy_path_or_dir | Path of the distributed strategy file corresponding to the source weight.<br>- For a complete set of weights, leave it **blank**.<br>- For distributed weights, if pipeline parallelism is used, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.<br>- For distributed weights, if pipeline parallelism is not used, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. |
+| dst_checkpoint           | Path of the folder that stores the target weight. |
+| dst_strategy             | Path of the distributed strategy file corresponding to the target weight.<br>- For a complete set of weights, leave it **blank**.<br>- For distributed weights, if pipeline parallelism is used, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.<br>- For distributed weights, if pipeline parallelism is not used, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. |
+| prefix                   | Prefix name of the saved target weight. The weight is saved as {prefix}rank_x.ckpt. The default value is checkpoint_. |
+| world_size               | Total number of slices of the target weight. Generally, the value is dp \* mp \* pp. |
+| process_num              | Number of processes used for offline weight conversion. The default value is 1.<br>- If process_num is set to 1, **a single process is used for conversion**.<br>- If process_num is larger than 1, **multi-process conversion** is used. For example, if the target weight is a distributed weight for eight devices and process_num is set to 2, two processes are started to convert the weight slices of rank_0, rank_1, rank_2, and rank_3 and of rank_4, rank_5, rank_6, and rank_7, respectively. |
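+
+As an illustration of how these parameters fit together, a sketch with placeholder values might look as follows (see the next subsection for the actual offline conversion configuration):
+
+```yaml
+# Convert a complete checkpoint into a distributed weight with 8 slices.
+src_checkpoint: "/path/to/llama3_1_8b.ckpt"    # complete source weight, absolute path
+src_strategy_path_or_dir: ""                   # blank because the source weight is complete
+dst_checkpoint: "/path/to/dst_checkpoint_dir"  # folder that stores the converted target weight
+dst_strategy: "/path/to/merged_strategy.ckpt"  # strategy file of the target distributed weight
+prefix: "checkpoint_"                          # target files are saved as {prefix}rank_x.ckpt
+world_size: 8                                  # dp * mp * pp of the target weight
+process_num: 2                                 # 2 processes convert 4 slices each
+```
+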
#### Offline Conversion Configuration
diff --git a/docs/mindformers/docs/source_en/feature/quantization.md b/docs/mindformers/docs/source_en/feature/quantization.md
index 0cd91eeed5d0899616178d1a80ab68e835c2c11d..4755352753cb34342f250a83e273a42874968f75 100644
--- a/docs/mindformers/docs/source_en/feature/quantization.md
+++ b/docs/mindformers/docs/source_en/feature/quantization.md
@@ -16,4 +16,3 @@ Currently, only the following models are supported, and the supported models are
|-----------------------------------------------------------------------------------------------------------------------------------|
| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/predict_deepseek3_671b.yaml) |
| [DeepSeek-R1](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b.yaml) |
-| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_13b_ptq.yaml) |
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/feature/resume_training.md b/docs/mindformers/docs/source_en/feature/resume_training.md
index ec61b7934d5c2212ef1434069e7a20dc7ba316d2..51726a0643e517b284fdaff1171b5b104d2e8614 100644
--- a/docs/mindformers/docs/source_en/feature/resume_training.md
+++ b/docs/mindformers/docs/source_en/feature/resume_training.md
@@ -14,27 +14,27 @@ MindSpore Transformers supports **step-level resumable training**, which allows
You can modify the configuration file to control resumable training. The main parameters are as follows. For details about other parameters, see the description of CheckpointMonitor.
-| Parameter | Description |
-|------------------|---------------------------------------------------------------------|
-| load_checkpoint | Weight path loaded during resumable training. The path can be a folder path (used to load distributed weights) or a specific weight file path. The default value is an empty string, indicating that no weight is loaded (required for resumable training). |
-| resume_training | Specifies whether to enable resumable training. You can set it to `True` or specify a weight file name. If the value is `True`, the system automatically resumes the training from the last interruption. The default value is `False`. |
-| load_ckpt_async  | Determines whether to load model weights and compile in parallel (this configuration does not take effect when auto_trans_ckpt is set to true). The default value is False (serial execution).<br>When it is `True`, the parallel capability of loading ckpt weights and building model is enabled to reduce the overall time resume training. |
+| Parameter | Description |
+|------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| load_checkpoint | Weight path loaded during resumable training. The path can be a folder path (used to load distributed weights) or a specific weight file path. The default value is an empty string, indicating that no weight is loaded (required for resumable training). |
+| resume_training | Specifies whether to enable resumable training. You can set it to `True` or specify a weight file name. If the value is `True`, the system automatically resumes the training from the last interruption. The default value is `False`. |
+| load_ckpt_async  | Determines whether to load model weights and compile the model in parallel (this configuration does not take effect when auto_trans_ckpt is set to true). The default value is False (serial execution).<br>When it is `True`, checkpoint weight loading and model building run in parallel to reduce the overall time of resuming training. |
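+
+A minimal sketch of these options in the training `yaml` file might look as follows (the weight folder path is an illustrative placeholder):
+
+```yaml
+# Resume training from the latest checkpoints saved under the weight folder.
+load_checkpoint: "./checkpoint"   # folder that contains the rank_x checkpoint directories
+resume_training: True             # automatically resume from the last interruption
+load_ckpt_async: True             # load weights and build the model in parallel
+```
+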
Based on the input parameters, there are four cases.
-| load_checkpoint | resume_training | Description | Recommended or Not|
-|-----------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|
-| Weight file path | True | Resumes a training based on the weights specified by load_checkpoint. | √ |
-| Weight file path | Weight file name | The file name specified by resume_training is invalid. A training is resumed based on the weights specified by load_checkpoint. | × |
-| Weight folder path | True | **Scenario 1: Single-node system, multi-node system+shared directory, or ModelArts**<br>1. Resumes the training based on the weights recorded in meta.json files and supports fault recovery.<br>2. Resumes the training based on the latest weight of all ranks if the meta.json file of any rank is missing.<br>**Scenario 2: Multi-node+non-shared directory**<br>Resumes the training based on the latest weight of all ranks.| √ |
-| Weight folder path | Weight file name | Resumes the training based on the weights specified by resume_training. | √ |
+| load_checkpoint | resume_training | Description | Recommended or Not |
+|---------------------|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|
+| Weight file path    | True              | Resumes training based on the weights specified by load_checkpoint. | √ |
+| Weight file path    | Weight file name  | The file name specified by resume_training does not take effect; training is resumed based on the weights specified by load_checkpoint. | × |
+| Weight folder path  | True              | **Scenario 1: Single-node system, multi-node system+shared directory, or ModelArts**<br>1. Resumes the training based on the weights recorded in meta.json files and supports fault recovery.<br>2. Resumes the training based on the latest weight of all ranks if the meta.json file of any rank is missing.<br>**Scenario 2: Multi-node+non-shared directory**<br>Resumes the training based on the latest weight of all ranks. | √ |
+| Weight folder path | Weight file name | Resumes the training based on the weights specified by resume_training. | √ |
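+
+For the last case in the table, a sketch might look as follows (the folder path and file name are illustrative, reusing the checkpoint naming shown later in this example):
+
+```yaml
+# Resume from one specific checkpoint file under the weight folder.
+load_checkpoint: "./checkpoint"                   # weight folder path
+resume_training: "llama3_1_8b_rank_0-15_2.ckpt"   # resume from this weight file name
+```
+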
In addition, you can modify the following parameters in the configuration file to use related functions.
-| Parameter | Description |
-|------------------|-------------------------------------------------------------------------------------------------------------|
-| ignore_data_skip | Specifies whether to ignore the mechanism of skipping data during resumable training and read the dataset from the beginning instead. This parameter is used when the dataset is changed during resumable training. If this parameter is set to `True`, no data is skipped. The default value is `False`. |
-| data_skip_steps | Number of steps skipped for the dataset. This parameter is used when the training is interrupted again after being resumed because the dataset or `global batch size` is changed. You need to manually set this parameter to configure the number of steps skipped for the new dataset. If the `global batch size` is changed, you need to divide and round down its value by the scaling coefficient and then specify the result as the value of this parameter.|
+| Parameter | Description |
+|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| ignore_data_skip | Specifies whether to ignore the mechanism of skipping data during resumable training and read the dataset from the beginning instead. This parameter is used when the dataset is changed during resumable training. If this parameter is set to `True`, no data is skipped. The default value is `False`. |
+| data_skip_steps  | Number of steps to skip for the dataset. This parameter is used when the training is interrupted again after being resumed and the dataset or `global batch size` has been changed. You need to manually set this parameter to the number of steps to skip for the new dataset. If the `global batch size` is changed, divide the number of steps to skip by the scaling coefficient, round down, and specify the result as the value of this parameter. |
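+
+For example, a sketch under the assumption that 10 steps had been consumed at the old `global batch size` before the interruption and the `global batch size` is then doubled (scaling coefficient 2):
+
+```yaml
+# The changed global batch size invalidates the recorded skip count,
+# so configure it manually: floor(10 / 2) = 5 steps are skipped.
+ignore_data_skip: False   # keep skipping the data that was already consumed
+data_skip_steps: 5        # 10 steps at the old global batch size = 5 steps at the doubled one
+```
+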
#### Fault Recovery Mechanism
@@ -44,12 +44,12 @@ If `resume_training` is set to `True`, the system automatically resumes training
### Example of Distributed Training
-The following example shows how to enable resumable training in single-device and multi-device environments. The example is based on the `llama2_7b` model.
-For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/pretrain_llama2_7b.yaml).
+The following example shows how to enable resumable training in single-device and multi-device environments. The example is based on the Llama3.1-8B model.
+For related configuration files, see [research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml).
#### Complete Training
-1. Modify `configs/llama2/pretrain_llama2_7b.yaml`.
+1. Modify `research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml`.
Configure the parallelism as required.
@@ -67,7 +67,7 @@ For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](ht
callbacks:
...
- type: CheckpointMonitor
- prefix: "llama2_7b"
+ prefix: "llama3_1_8b"
save_checkpoint_steps: 10
keep_checkpoint_max: 3
integrated_save: False
@@ -75,12 +75,12 @@ For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](ht
...
```
-2. Prepare a dataset. The following uses [wikitext2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87) as an example to describe how to start four-device distributed training.
+2. Prepare a dataset. The following uses the [alpaca dataset](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%95%B0%E6%8D%AE%E9%9B%86%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87) as an example to describe how to start four-device distributed training.
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --train_dataset /path/to/wikitext2-llama2.mindrecord \
+ --config research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml \
+ --train_dataset /path/to/alpaca-fastchat8192.mindrecord \
--run_mode train \
--use_parallel True" 4
```
@@ -89,9 +89,9 @@ For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](ht
```text
checkpoint/rank_0
- ├── llama2_7b_rank_0-10_2.ckpt
- ├── llama2_7b_rank_0-15_2.ckpt
- ├── llama2_7b_rank_0-20_2.ckpt
+ ├── llama3_1_8b_rank_0-10_2.ckpt
+ ├── llama3_1_8b_rank_0-15_2.ckpt
+ ├── llama3_1_8b_rank_0-20_2.ckpt
└── meta.json
```
@@ -108,8 +108,8 @@ For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](ht
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --train_dataset /path/to/wikitext2-llama2.mindrecord \
+ --config research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml \
+ --train_dataset /path/to/alpaca-fastchat8192.mindrecord \
--run_mode train \
--use_parallel True" 4
```
@@ -168,12 +168,12 @@ If `global batch size` is changed (for example, doubled) when a training is resu
If some weight files are missing, the system automatically restores the files based on the latest available weight.
-1. Delete the `llama2_7b_rank_0-20_2.ckpt` file from the `rank_3` directory. The folder structure after the deletion is as follows:
+1. Delete the `llama3_1_8b_rank_0-20_2.ckpt` file from the `rank_3` directory. The folder structure after the deletion is as follows:
```text
checkpoint/rank_3
- ├── llama2_7b_rank_0-10_2.ckpt
- ├── llama2_7b_rank_0-15_2.ckpt
+ ├── llama3_1_8b_rank_0-10_2.ckpt
+ ├── llama3_1_8b_rank_0-15_2.ckpt
└── meta.json
```
@@ -188,8 +188,8 @@ If some weight files are missing, the system automatically restores the files ba
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --train_dataset /path/to/wikitext2-llama2.mindrecord \
+ --config research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml \
+ --train_dataset /path/to/alpaca-fastchat8192.mindrecord \
--run_mode train \
--use_parallel True" 4
```
diff --git a/docs/mindformers/docs/source_zh_cn/feature/ckpt.md b/docs/mindformers/docs/source_zh_cn/feature/ckpt.md
index 29bda97b5622701189220ebca264f1078e9b95c4..ea73cea3d76187323f9b1956fddee37b4d57ae69 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/ckpt.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/ckpt.md
@@ -40,7 +40,7 @@ python convert_weight.py [-h] --model MODEL [--reversed] --input_path INPUT_PATH
### 转换示例
-假设用户已经下载了[Llama2模型的权重](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD),并保存在路径`/home/user/torch_weights`中,用户希望将其转换为MindSpore Transformers权重并保存在路径`/home/user/ms_weights`中,可以使用以下命令:
+假设用户已经下载了[Llama3.1模型的权重](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD),并保存在路径`/home/user/torch_weights`中,用户希望将其转换为MindSpore Transformers权重并保存在路径`/home/user/ms_weights`中,可以使用以下命令:
```bash
python convert_weight.py --model llama --input_path /home/user/torch_weights --output_path /home/user/ms_weights/llama.ckpt
@@ -50,21 +50,13 @@ python convert_weight.py --model llama --input_path /home/user/torch_weights --o
### 已支持模型
-| 参数取值 | 支持模型 |
-|-----------|-------------------------------------------|
-| llama | Llama2、Llama3、Llama3.1、CodeLlama |
-| baichuan2 | Baichuan2 |
-| glm-n | GLM2、GLM3、GLM3-32K、GLM4 |
-| cogvlm2 | CogVLM2-Video、CogVLM2-Image |
-| qwen | Qwen、Qwen1.5、Qwen2 |
-| qwenvl | QwenVL |
-| internlm | InternLM |
-| internlm2 | InternLM2 |
-| yi | Yi |
-| mixtral | Mixtral |
-| deepseek | DeepSeekCoder、DeepSeekCoder1.5、DeepSeekV2 |
-| gpt | GPT2 |
-| whisper | Whisper |
+| 参数取值 | 支持模型 |
+|----------|------------------------------|
+| llama | Llama3.1 |
+| glm-n | GLM4 |
+| qwen | Qwen2.5 |
+| mixtral | Mixtral |
+| deepseek | DeepSeekV3 |
### 未支持模型权重转换开发
diff --git a/docs/mindformers/docs/source_zh_cn/feature/quantization.md b/docs/mindformers/docs/source_zh_cn/feature/quantization.md
index 76d2c1a3b3a5dc84868835733ce69fffdf7ecad9..6f4b96a8837af5b39de7bdd3521f8fdcd3af57ea 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/quantization.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/quantization.md
@@ -16,4 +16,3 @@ MindSpore Transformers 集成 MindSpore Golden Stick 工具组件,提供统一
|-----------------------------------------------------------------------------------------------------------------------------------|
| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/predict_deepseek3_671b.yaml) |
| [DeepSeek-R1](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b.yaml) |
-| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_13b_ptq.yaml) |
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/feature/resume_training.md b/docs/mindformers/docs/source_zh_cn/feature/resume_training.md
index f9762887e3e6174842eab9859ccccaffecc81a59..3ef3704f4ed026b436868b0085faaa177023ebc2 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/resume_training.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/resume_training.md
@@ -44,12 +44,11 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中
### 分布式训练示例
-以下示例演示了如何在单卡和多卡环境中启动断点续训。示例基于`llama2_7b`
-模型,相关配置文件[configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/pretrain_llama2_7b.yaml)。
+以下示例演示了如何在单卡和多卡环境中启动断点续训。示例基于 `llama3.1 8b` 模型,相关配置文件[research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)。
#### 完整训练
-1. 修改`configs/llama2/pretrain_llama2_7b.yaml`:
+1. 修改`research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml`:
根据需要设置并行配置:
@@ -67,7 +66,7 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中
callbacks:
...
- type: CheckpointMonitor
- prefix: "llama2_7b"
+ prefix: "llama3_1_8b"
save_checkpoint_steps: 10
keep_checkpoint_max: 3
integrated_save: False
@@ -75,23 +74,23 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中
...
```
-2. 准备数据集,此处以[wikitext2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)为例,启动4卡分布式训练:
+2. 准备数据集,此处以 [alpaca 数据集](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%95%B0%E6%8D%AE%E9%9B%86%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)为例,启动4卡分布式训练:
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --train_dataset /path/to/wikitext2-llama2.mindrecord \
+ --config research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml \
+ --train_dataset /path/to/alpaca-fastchat8192.mindrecord \
--run_mode train \
--use_parallel True" 4
```
- 在第四次保存完毕后,结束进程,此时`checkpoint`下的`rank_0`文件夹结构为:
+ 在第四次保存完毕后,结束进程,此时 `checkpoint` 下的 `rank_0` 文件夹结构为:
```text
checkpoint/rank_0
- ├── llama2_7b_rank_0-10_2.ckpt
- ├── llama2_7b_rank_0-15_2.ckpt
- ├── llama2_7b_rank_0-20_2.ckpt
+ ├── llama3_1_8b_rank_0-10_2.ckpt
+ ├── llama3_1_8b_rank_0-15_2.ckpt
+ ├── llama3_1_8b_rank_0-20_2.ckpt
└── meta.json
```
@@ -108,8 +107,8 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --train_dataset /path/to/wikitext2-llama2.mindrecord \
+ --config research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml \
+ --train_dataset /path/to/alpaca-fastchat8192.mindrecord \
--run_mode train \
--use_parallel True" 4
```
@@ -168,12 +167,12 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中
当部分权重文件缺失时,系统会自动基于上一个可用的权重进行恢复。
-1. 删除`rank_3`下的`llama2_7b_rank_0-20_2.ckpt`文件。删除后文件夹结构应为:
+1. 删除`rank_3`下的`llama3_1_8b_rank_0-20_2.ckpt`文件。删除后文件夹结构应为:
```text
checkpoint/rank_3
- ├── llama2_7b_rank_0-10_2.ckpt
- ├── llama2_7b_rank_0-15_2.ckpt
+ ├── llama3_1_8b_rank_0-10_2.ckpt
+ ├── llama3_1_8b_rank_0-15_2.ckpt
└── meta.json
```
@@ -188,8 +187,8 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --train_dataset /path/to/wikitext2-llama2.mindrecord \
+ --config research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml \
+ --train_dataset /path/to/alpaca-fastchat8192.mindrecord \
--run_mode train \
--use_parallel True" 4
```