diff --git a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md index 42d6b8110a02ca80f5f9e648b668256c68402439..76331cf273bba369560980e69bd7aaa4e0047ba2 100644 --- a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md +++ b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md @@ -93,7 +93,7 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l `register_path` is set to `research/llama3_1` (path of the directory where the external code is located). For details about how to prepare the model weight, see [Llama3.1 Description Document > Model Weight Download](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD). -For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama2-7B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml). +For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama3_1-8B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml). After all the preceding basic elements are prepared, you can refer to other documents in the MindSpore Transformers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/precision_optimization.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html). @@ -122,18 +122,16 @@ The differences are as follows: - In Llama3-8B, the special word metaindex is modified. Therefore, `bos_token_id` is set to `128000`, `eos_token_id` is set to `128001`, and `pad_token_id` is set to `128002`. - In Llama3-8B, the value of **theta** in the rotation position code is changed to **500000**. Therefore, `theta` is set to `500000`. -After modifying the corresponding content in the `YAML` file of Llama2-7B, you can obtain the [Llama3-8B configuration file](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_8b/finetune_llama3_8b.yaml). +After modifying the corresponding content in the `YAML` file of Llama2-7B, you can obtain the Llama3-8B configuration file. #### Tokenizer -Llama3-8B re-implements the tokenizer. According to the official implementation, PretrainedTokenizer is inherited from MindSpore Transformers to implement Llama3Tokenizer, which is written in [llama3_tokenizer.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_tokenizer.py). +Llama3-8B re-implements the tokenizer. According to the official implementation, PretrainedTokenizer is inherited from MindSpore Transformers to implement Llama3Tokenizer. 
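The sketch below illustrates that inheritance pattern in outline only. The import path, the registration decorator usage, and the method bodies are assumptions for illustration; the real Llama3 tokenizer additionally wraps the official BPE model file and its special-token handling.

```python
# Minimal sketch of a custom tokenizer for an external (research) model.
# Import paths and method bodies are illustrative assumptions, not the
# actual Llama3 implementation in MindSpore Transformers.
from mindformers.models.tokenization_utils import PreTrainedTokenizer  # assumed path
from mindformers.tools.register import MindFormerRegister, MindFormerModuleType


@MindFormerRegister.register(MindFormerModuleType.TOKENIZER)
class Llama3Tokenizer(PreTrainedTokenizer):
    """Hypothetical BPE tokenizer mirroring the official Llama3 tokenizer."""

    def __init__(self, vocab_file, bos_token="<|begin_of_text|>",
                 eos_token="<|end_of_text|>", **kwargs):
        self.vocab_file = vocab_file  # path to the official BPE model file
        super().__init__(bos_token=bos_token, eos_token=eos_token, **kwargs)

    def _tokenize(self, text, **kwargs):
        # Split raw text into BPE sub-tokens (the official code delegates to a BPE engine).
        raise NotImplementedError

    def _convert_token_to_id(self, token):
        # Map a sub-token string to its vocabulary index.
        raise NotImplementedError

    def _convert_id_to_token(self, index):
        # Map a vocabulary index back to its sub-token string.
        raise NotImplementedError
```

Once the class is registered and the external code directory is passed through `register_path`, the tokenizer can typically be referenced from the YAML `processor` section by its registered type name.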
#### Weight Conversion -The parameters of Llama3-8B are the same as those of Llama2-7B. Therefore, the weight conversion process of Llama2-7B can be reused. For details, see [Llama3 Document > Weight Conversion](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E8%BD%AC%E6%8D%A2). +The parameters of Llama3-8B are the same as those of Llama2-7B. Therefore, the weight conversion process of Llama2-7B can be reused. #### Dataset Processing -The tokenizer of Llama3-8B is different from that of Llama2-7B. Therefore, you need to replace the tokenizer of Llama3-8B to preprocess data based on the dataset processing script of Llama2-7B. For details, see [conversation.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_conversation.py) and [llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_preprocess.py). - -For details about the implementation of Llama3 in MindSpore Transformers, see [Llama3 folder](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) in the MindSpore Transformers repository. For details about how to use Llama3 in MindSpore Transformers, see [LLama3 documents](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md). +The tokenizer of Llama3-8B is different from that of Llama2-7B. Therefore, you need to replace the tokenizer of Llama3-8B to preprocess data based on the dataset processing script of Llama2-7B. diff --git a/docs/mindformers/docs/source_en/feature/dataset.md b/docs/mindformers/docs/source_en/feature/dataset.md index 86a2c3ca070e2df787ae83dee5d62bd60038d806..88001ff761d6339b9c0a2fa91d325d5d5de5dff1 100644 --- a/docs/mindformers/docs/source_en/feature/dataset.md +++ b/docs/mindformers/docs/source_en/feature/dataset.md @@ -226,7 +226,7 @@ The following explains how to configure and use Megatron datasets in the configu 3. Start Model Pre-training After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the model pre-training task. - Here, we take the [Llama3 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md) as an example. + Here, we take the [Llama3_1 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md) as an example. ## HuggingFace Datasets @@ -593,8 +593,6 @@ Users can define custom data handlers to apply various preprocessing logic to th - ADGEN Dataset Sample - Modify the task configuration file [run_glm3_6b_finetune_2k_800T_A2_64G.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/glm3/run_glm3_6b_finetune_2k_800T_A2_64G.yaml). - Modify the following parameters: ```yaml @@ -745,7 +743,7 @@ The [datasets_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/to MindRecord is an efficient data storage and reading module provided by MindSpore. It reduces disk IO and network IO overhead, resulting in a better data loading experience. For more detailed feature introductions, refer to the [documentation](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.mindrecord.html). Here, we only cover how to use MindRecord in MindSpore Transformers model training tasks. -The following example uses `qwen2-0.5b` fine-tuning to explain related functionalities. +The following example uses `qwen2_5-0.5b` fine-tuning to explain related functionalities. 
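Before walking through the concrete steps, the sketch below shows the general shape of producing a MindRecord file with `mindspore.mindrecord.FileWriter`. The schema fields, file name, and sample values are placeholders for illustration and are not the output of the actual `qwen2_5-0.5b` preprocessing script.

```python
# Write tokenized samples into a MindRecord file (placeholder schema and data).
import numpy as np
from mindspore.mindrecord import FileWriter

# Each record stores variable-length int32 arrays of token ids and labels.
schema = {
    "input_ids": {"type": "int32", "shape": [-1]},
    "labels": {"type": "int32", "shape": [-1]},
}

writer = FileWriter(file_name="finetune_data.mindrecord", shard_num=1, overwrite=True)
writer.add_schema(schema, "finetune_schema")

# In a real preprocessing script these arrays come from the model tokenizer.
samples = [
    {"input_ids": np.array([1, 2, 3, 4], dtype=np.int32),
     "labels": np.array([2, 3, 4, 5], dtype=np.int32)},
]
writer.write_raw_data(samples)
writer.commit()
```

The resulting `.mindrecord` file (together with its index file) is what the dataset path in the `data_loader` section of the training configuration points to.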
### Data Preprocessing @@ -786,11 +784,11 @@ The following example uses `qwen2-0.5b` fine-tuning to explain related functiona ### Model Fine-tuning -Following the above data preprocessing steps, you can generate a MindRecord dataset for fine-tuning the `qwen2-0.5b` model. Below is an introduction on how to use the generated data file to start the model fine-tuning task. +Following the above data preprocessing steps, you can generate a MindRecord dataset for fine-tuning the `qwen2_5-0.5b` model. Below is an introduction on how to use the generated data file to start the model fine-tuning task. 1. Modify the model configuration file - The `qwen2-0.5b` model fine-tuning uses the [finetune_qwen2_0.5b_32k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/qwen2_0_5b/finetune_qwen2_0.5b_32k.yaml) configuration file. Modify the dataset section as follows: + The `qwen2_5-0.5b` model fine-tuning uses the [finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml) configuration file. Modify the dataset section as follows: ```yaml train_dataset: &train_dataset @@ -808,7 +806,7 @@ Following the above data preprocessing steps, you can generate a MindRecord data 2. Start Model Fine-tuning - After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the fine-tuning task. Here, we take the [Qwen2 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/README.md) as an example. + After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the fine-tuning task. Here, we take the [Qwen2_5 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/README.md) as an example. ### Multi-source Datasets diff --git a/docs/mindformers/docs/source_en/feature/parallel_training.md b/docs/mindformers/docs/source_en/feature/parallel_training.md index 14d19251edeaf7db64eb66d1b71b8cf0f1ee6f3e..286fa31691937a51ac9d3142c9b31dad8f6ab1fd 100644 --- a/docs/mindformers/docs/source_en/feature/parallel_training.md +++ b/docs/mindformers/docs/source_en/feature/parallel_training.md @@ -157,7 +157,7 @@ For more information on configuring distributed parallel parameters, see the [Mi ## MindSpore Transformers Distributed Parallel Application Practices -In the [Llama3-70B fine-tuning configuration](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_70b/finetune_llama3_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: +In the [Llama3_1-70B fine-tuning configuration](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: - **Data parallelism**: No additional data parallelism is enabled (`data_parallel: 1`). - **Model parallelism**: A model is sliced into eight parts, which are computed on different devices (`model_parallel: 8`). 
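The sketch below shows how such a strategy maps onto the `parallel_config` section of a YAML file. Only `data_parallel` and `model_parallel` follow the description above; the remaining values are illustrative assumptions rather than the actual contents of the Llama3_1-70B configuration.

```yaml
# Illustrative parallel_config sketch; values other than data_parallel and
# model_parallel are assumptions, not the real Llama3_1-70B settings.
parallel_config:
  data_parallel: 1        # no additional data parallelism
  model_parallel: 8       # slice the model into 8 parts across devices
  pipeline_stage: 8       # assumed pipeline slicing across nodes
  micro_batch_num: 16     # assumed number of micro-batches fed into the pipeline
  use_seq_parallel: True  # sequence parallelism, required with fine-grained multi-copy
  vocab_emb_dp: True      # assumed: shard the embedding table along the data-parallel dimension
```

The product of the parallel dimensions (for example `data_parallel × model_parallel × pipeline_stage`) should match the total number of devices used for training.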
@@ -168,4 +168,4 @@ In the [Llama3-70B fine-tuning configuration](https://gitee.com/mindspore/mindfo > Sequential parallelism must be turned on at the same time that fine-grained multicopy parallelism is turned on. -With the preceding configurations, the distributed training on Llama3-70B can effectively utilize hardware resources in a multi-node multi-device environment to implement efficient and stable model training. +With the preceding configurations, the distributed training on Llama3_1-70B can effectively utilize hardware resources in a multi-node multi-device environment to implement efficient and stable model training. diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md index fc2cfe2ab98727886724b2cef09e883c7d4d87f6..48442dd5b480b426fa01a7a661402e8b28aaf918 100644 --- a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md +++ b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md @@ -93,7 +93,7 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l 其中设置了`register_path`为外挂代码所在目录的路径`research/llama3_1`,模型权重的准备参考[Llama3.1说明文档——模型权重下载](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD)。 -配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama2-7B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml)。 +配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama3_1-8B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)。 在准备完上述所有基本要素之后,可以参考MindSpore Transformers使用教程中的其余文档进行模型训练、微调、推理等流程的实践。后续模型调试调优可以参考[大模型精度调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/precision_optimization.html)和[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。 @@ -122,18 +122,16 @@ Llama3-8B与Llama2-7B拥有相同的模型结构,只有部分模型参数、 - Llama3-8B修改了特殊词元索引,修改`bos_token_id`为`128000`、`eos_token_id`为`128001`、`pad_token_id`为`128002`。 - Llama3-8B修改了旋转位置编码中的theta值为500000,修改`theta`为`500000`。 -修改Llama2-7B的`YAML`配置文件中的对应内容即可得到[Llama3-8B的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_8b/finetune_llama3_8b.yaml)。 +修改Llama2-7B的`YAML`配置文件中的对应内容即可得到Llama3-8B的配置文件。 #### 分词器 -Llama3-8B重新实现了分词器。对照官方的实现,继承MindSpore Transformers中的PretrainedTokenizer实现Llama3Tokenizer,编写在[llama3_tokenizer.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_tokenizer.py)中。 +Llama3-8B重新实现了分词器。对照官方的实现,继承MindSpore Transformers中的PretrainedTokenizer实现Llama3Tokenizer。 #### 权重转换 -Llama3-8B的参数命名和Llama2-7B一致,因此可以复用Llama2-7B的权重转换流程,参考[Llama3文档的权重转换章节](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E8%BD%AC%E6%8D%A2)。 +Llama3-8B的参数命名和Llama2-7B一致,因此可以复用Llama2-7B的权重转换流程。 #### 数据集处理 -由于Llama3-8B的分词器与Llama2-7B不同,因此Llama3-8B需要在Llama2-7B的数据集处理脚本的基础上,替换Llama3-8B的分词器对数据进行预处理,参考[conversation.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_conversation.py)和[llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_preprocess.py)。 - -关于MindSpore Transformers中Llama3的具体实现,可以参考MindSpore 
Transformers仓库中[Llama3的文件夹](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3)。关于MindSpore Transformers中Llama3的使用,可以参考[LLama3的说明文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md)。 +由于Llama3-8B的分词器与Llama2-7B不同,因此Llama3-8B需要在Llama2-7B的数据集处理脚本的基础上,替换Llama3-8B的分词器对数据进行预处理。 diff --git a/docs/mindformers/docs/source_zh_cn/feature/dataset.md b/docs/mindformers/docs/source_zh_cn/feature/dataset.md index bbce4057ccea16cd07934e30406b6367b8a16153..c548acb989c79572c5fb73831185cc482be8e8da 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/dataset.md +++ b/docs/mindformers/docs/source_zh_cn/feature/dataset.md @@ -216,7 +216,7 @@ MindSpore Transformers推荐用户使用Megatron数据集进行模型预训练 3. 启动模型预训练 - 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型预训练任务,这里以[Llama3模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md)为例。 + 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型预训练任务,这里以[Llama3_1模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md)为例。 ## HuggingFace数据集 @@ -579,8 +579,6 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" - ADGEN 数据集示例 - 修改任务配置文件 [run_glm3_6b_finetune_2k_800T_A2_64G.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/glm3/run_glm3_6b_finetune_2k_800T_A2_64G.yaml)。 - 修改如下参数: ```yaml @@ -729,7 +727,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁盘IO、网络IO开销,从而获得更好的数据加载体验,更多具体功能介绍可参考[文档](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.mindrecord.html),这里仅对如何在MindSpore Transformers模型训练任务中使用MindRecord进行介绍。 -下面以`qwen2-0.5b`进行微调为示例进行相关功能说明。 +下面以`qwen2_5-0.5b`进行微调为示例进行相关功能说明。 ### 数据预处理 @@ -770,11 +768,11 @@ MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁 ### 模型微调 -参考上述数据预处理流程可生成用于`qwen2-0.5b`模型微调的MindRecord数据集,下面介绍如何使用生成的数据文件启动模型微调任务。 +参考上述数据预处理流程可生成用于`qwen2_5-0.5b`模型微调的MindRecord数据集,下面介绍如何使用生成的数据文件启动模型微调任务。 1. 修改模型配置文件 - `qwen2-0.5b`模型微调使用[finetune_qwen2_0.5b_32k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/qwen2_0_5b/finetune_qwen2_0.5b_32k.yaml)配置文件,修改其中数据集部分配置: + `qwen2_5-0.5b`模型微调使用[finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml)配置文件,修改其中数据集部分配置: ```yaml train_dataset: &train_dataset @@ -792,7 +790,7 @@ MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁 2. 
启动模型微调 - 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型微调任务,这里以[Qwen2模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/README.md)为例。 + 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型微调任务,这里以[Qwen2_5模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/README.md)为例。 ### 多源数据集 diff --git a/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md index 7fe00f2d6da397fc1e50f97a4831b872d2fc9af7..d57bcb4a23802cb6f2ab984cd9a05907fdeb9b53 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md +++ b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md @@ -214,7 +214,7 @@ parallel: 参数说明: -- enable_parallel_optimizer:是否开启优化器并行,默认为Fasle。 +- enable_parallel_optimizer:是否开启优化器并行,默认为`False`。 关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 @@ -243,7 +243,7 @@ model_config: ## MindSpore Transformers 分布式并行应用实践 -在官网提供的[Llama3-70B微调配置](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_70b/finetune_llama3_70b.yaml#)文件中,使用了多种分布式并行策略,以提升多机多卡环境中的训练效率。以下是该配置文件中涉及的主要并行策略和关键参数: +在官网提供的[Llama3_1-70B微调配置](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#)文件中,使用了多种分布式并行策略,以提升多机多卡环境中的训练效率。以下是该配置文件中涉及的主要并行策略和关键参数: - **数据并行**:未启用额外的数据并行(`data_parallel: 1`)。 - **模型并行**:模型被切分成8个部分,在不同设备上计算(`model_parallel: 8`)。 @@ -254,4 +254,4 @@ model_config: > 开启细粒度多副本并行的同时必须开启序列并行。 -通过以上配置,Llama3-70B的分布式训练在多机多卡环境中可以有效利用硬件资源,实现高效、稳定的模型训练。 +通过以上配置,Llama3_1-70B的分布式训练在多机多卡环境中可以有效利用硬件资源,实现高效、稳定的模型训练。
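As a rough illustration of launching the distributed task once the configuration is ready, a single-node command along the following lines is commonly used; the launcher script arguments and the YAML path are placeholders and should be checked against the corresponding model README (multi-node launches pass additional node and address arguments).

```bash
# Illustrative single-node launch with 8 devices; the YAML path is a placeholder.
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config path/to/finetune_config.yaml --run_mode finetune" 8
```

The device count passed to the launcher should equal the product of the configured parallel dimensions, so multi-node setups such as the 70B fine-tuning described above require correspondingly more devices.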