diff --git a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md index 42d6b8110a02ca80f5f9e648b668256c68402439..76331cf273bba369560980e69bd7aaa4e0047ba2 100644 --- a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md +++ b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md @@ -93,7 +93,7 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l `register_path` is set to `research/llama3_1` (path of the directory where the external code is located). For details about how to prepare the model weight, see [Llama3.1 Description Document > Model Weight Download](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD). -For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama2-7B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml). +For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama3_1-8B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml). After all the preceding basic elements are prepared, you can refer to other documents in the MindSpore Transformers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/precision_optimization.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html). @@ -122,18 +122,16 @@ The differences are as follows: - In Llama3-8B, the special word metaindex is modified. Therefore, `bos_token_id` is set to `128000`, `eos_token_id` is set to `128001`, and `pad_token_id` is set to `128002`. - In Llama3-8B, the value of **theta** in the rotation position code is changed to **500000**. Therefore, `theta` is set to `500000`. -After modifying the corresponding content in the `YAML` file of Llama2-7B, you can obtain the [Llama3-8B configuration file](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_8b/finetune_llama3_8b.yaml). +After modifying the corresponding content in the `YAML` file of Llama2-7B, you can obtain the Llama3-8B configuration file. #### Tokenizer -Llama3-8B re-implements the tokenizer. According to the official implementation, PretrainedTokenizer is inherited from MindSpore Transformers to implement Llama3Tokenizer, which is written in [llama3_tokenizer.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_tokenizer.py). +Llama3-8B re-implements the tokenizer. According to the official implementation, PretrainedTokenizer is inherited from MindSpore Transformers to implement Llama3Tokenizer. 
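The sketch below illustrates that inheritance pattern in outline only. The import path, the registration decorator usage, and the method bodies are assumptions for illustration; the real Llama3 tokenizer additionally wraps the official BPE model file and its special-token handling.

```python
# Minimal sketch of a custom tokenizer for an external (research) model.
# Import paths and method bodies are illustrative assumptions, not the
# actual Llama3 implementation in MindSpore Transformers.
from mindformers.models.tokenization_utils import PreTrainedTokenizer  # assumed path
from mindformers.tools.register import MindFormerRegister, MindFormerModuleType


@MindFormerRegister.register(MindFormerModuleType.TOKENIZER)
class Llama3Tokenizer(PreTrainedTokenizer):
    """Hypothetical BPE tokenizer mirroring the official Llama3 tokenizer."""

    def __init__(self, vocab_file, bos_token="<|begin_of_text|>",
                 eos_token="<|end_of_text|>", **kwargs):
        self.vocab_file = vocab_file  # path to the official BPE model file
        super().__init__(bos_token=bos_token, eos_token=eos_token, **kwargs)

    def _tokenize(self, text, **kwargs):
        # Split raw text into BPE sub-tokens (the official code delegates to a BPE engine).
        raise NotImplementedError

    def _convert_token_to_id(self, token):
        # Map a sub-token string to its vocabulary index.
        raise NotImplementedError

    def _convert_id_to_token(self, index):
        # Map a vocabulary index back to its sub-token string.
        raise NotImplementedError
```

Once the class is registered and the external code directory is passed through `register_path`, the tokenizer can typically be referenced from the YAML `processor` section by its registered type name.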
#### Weight Conversion -The parameters of Llama3-8B are the same as those of Llama2-7B. Therefore, the weight conversion process of Llama2-7B can be reused. For details, see [Llama3 Document > Weight Conversion](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E8%BD%AC%E6%8D%A2). +The parameters of Llama3-8B are the same as those of Llama2-7B. Therefore, the weight conversion process of Llama2-7B can be reused. #### Dataset Processing -The tokenizer of Llama3-8B is different from that of Llama2-7B. Therefore, you need to replace the tokenizer of Llama3-8B to preprocess data based on the dataset processing script of Llama2-7B. For details, see [conversation.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_conversation.py) and [llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_preprocess.py). - -For details about the implementation of Llama3 in MindSpore Transformers, see [Llama3 folder](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) in the MindSpore Transformers repository. For details about how to use Llama3 in MindSpore Transformers, see [LLama3 documents](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md). +The tokenizer of Llama3-8B is different from that of Llama2-7B. Therefore, you need to replace the tokenizer of Llama3-8B to preprocess data based on the dataset processing script of Llama2-7B. diff --git a/docs/mindformers/docs/source_en/feature/dataset.md b/docs/mindformers/docs/source_en/feature/dataset.md index 86a2c3ca070e2df787ae83dee5d62bd60038d806..88001ff761d6339b9c0a2fa91d325d5d5de5dff1 100644 --- a/docs/mindformers/docs/source_en/feature/dataset.md +++ b/docs/mindformers/docs/source_en/feature/dataset.md @@ -226,7 +226,7 @@ The following explains how to configure and use Megatron datasets in the configu 3. Start Model Pre-training After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the model pre-training task. - Here, we take the [Llama3 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md) as an example. + Here, we take the [Llama3_1 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md) as an example. ## HuggingFace Datasets @@ -593,8 +593,6 @@ Users can define custom data handlers to apply various preprocessing logic to th - ADGEN Dataset Sample - Modify the task configuration file [run_glm3_6b_finetune_2k_800T_A2_64G.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/glm3/run_glm3_6b_finetune_2k_800T_A2_64G.yaml). - Modify the following parameters: ```yaml @@ -745,7 +743,7 @@ The [datasets_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/to MindRecord is an efficient data storage and reading module provided by MindSpore. It reduces disk IO and network IO overhead, resulting in a better data loading experience. For more detailed feature introductions, refer to the [documentation](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.mindrecord.html). Here, we only cover how to use MindRecord in MindSpore Transformers model training tasks. -The following example uses `qwen2-0.5b` fine-tuning to explain related functionalities. +The following example uses `qwen2_5-0.5b` fine-tuning to explain related functionalities. 
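Before walking through the concrete steps, the sketch below shows the general shape of producing a MindRecord file with `mindspore.mindrecord.FileWriter`. The schema fields, file name, and sample values are placeholders for illustration and are not the output of the actual `qwen2_5-0.5b` preprocessing script.

```python
# Write tokenized samples into a MindRecord file (placeholder schema and data).
import numpy as np
from mindspore.mindrecord import FileWriter

# Each record stores variable-length int32 arrays of token ids and labels.
schema = {
    "input_ids": {"type": "int32", "shape": [-1]},
    "labels": {"type": "int32", "shape": [-1]},
}

writer = FileWriter(file_name="finetune_data.mindrecord", shard_num=1, overwrite=True)
writer.add_schema(schema, "finetune_schema")

# In a real preprocessing script these arrays come from the model tokenizer.
samples = [
    {"input_ids": np.array([1, 2, 3, 4], dtype=np.int32),
     "labels": np.array([2, 3, 4, 5], dtype=np.int32)},
]
writer.write_raw_data(samples)
writer.commit()
```

The resulting `.mindrecord` file (together with its index file) is what the dataset path in the `data_loader` section of the training configuration points to.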
### Data Preprocessing @@ -786,11 +784,11 @@ The following example uses `qwen2-0.5b` fine-tuning to explain related functiona ### Model Fine-tuning -Following the above data preprocessing steps, you can generate a MindRecord dataset for fine-tuning the `qwen2-0.5b` model. Below is an introduction on how to use the generated data file to start the model fine-tuning task. +Following the above data preprocessing steps, you can generate a MindRecord dataset for fine-tuning the `qwen2_5-0.5b` model. Below is an introduction on how to use the generated data file to start the model fine-tuning task. 1. Modify the model configuration file - The `qwen2-0.5b` model fine-tuning uses the [finetune_qwen2_0.5b_32k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/qwen2_0_5b/finetune_qwen2_0.5b_32k.yaml) configuration file. Modify the dataset section as follows: + The `qwen2_5-0.5b` model fine-tuning uses the [finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml) configuration file. Modify the dataset section as follows: ```yaml train_dataset: &train_dataset @@ -808,7 +806,7 @@ Following the above data preprocessing steps, you can generate a MindRecord data 2. Start Model Fine-tuning - After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the fine-tuning task. Here, we take the [Qwen2 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/README.md) as an example. + After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the fine-tuning task. Here, we take the [Qwen2_5 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/README.md) as an example. ### Multi-source Datasets diff --git a/docs/mindformers/docs/source_en/feature/parallel_training.md b/docs/mindformers/docs/source_en/feature/parallel_training.md index 14d19251edeaf7db64eb66d1b71b8cf0f1ee6f3e..286fa31691937a51ac9d3142c9b31dad8f6ab1fd 100644 --- a/docs/mindformers/docs/source_en/feature/parallel_training.md +++ b/docs/mindformers/docs/source_en/feature/parallel_training.md @@ -157,7 +157,7 @@ For more information on configuring distributed parallel parameters, see the [Mi ## MindSpore Transformers Distributed Parallel Application Practices -In the [Llama3-70B fine-tuning configuration](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_70b/finetune_llama3_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: +In the [Llama3_1-70B fine-tuning configuration](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: - **Data parallelism**: No additional data parallelism is enabled (`data_parallel: 1`). - **Model parallelism**: A model is sliced into eight parts, which are computed on different devices (`model_parallel: 8`). 
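The sketch below shows how such a strategy maps onto the `parallel_config` section of a YAML file. Only `data_parallel` and `model_parallel` follow the description above; the remaining values are illustrative assumptions rather than the actual contents of the Llama3_1-70B configuration.

```yaml
# Illustrative parallel_config sketch; values other than data_parallel and
# model_parallel are assumptions, not the real Llama3_1-70B settings.
parallel_config:
  data_parallel: 1        # no additional data parallelism
  model_parallel: 8       # slice the model into 8 parts across devices
  pipeline_stage: 8       # assumed pipeline slicing across nodes
  micro_batch_num: 16     # assumed number of micro-batches fed into the pipeline
  use_seq_parallel: True  # sequence parallelism, required with fine-grained multi-copy
  vocab_emb_dp: True      # assumed: shard the embedding table along the data-parallel dimension
```

The product of the parallel dimensions (for example `data_parallel × model_parallel × pipeline_stage`) should match the total number of devices used for training.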
@@ -168,4 +168,4 @@ In the [Llama3-70B fine-tuning configuration](https://gitee.com/mindspore/mindfo > Sequential parallelism must be turned on at the same time that fine-grained multicopy parallelism is turned on. -With the preceding configurations, the distributed training on Llama3-70B can effectively utilize hardware resources in a multi-node multi-device environment to implement efficient and stable model training. +With the preceding configurations, the distributed training on Llama3_1-70B can effectively utilize hardware resources in a multi-node multi-device environment to implement efficient and stable model training. diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md index fc2cfe2ab98727886724b2cef09e883c7d4d87f6..48442dd5b480b426fa01a7a661402e8b28aaf918 100644 --- a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md +++ b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md @@ -93,7 +93,7 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l 其中设置了`register_path`为外挂代码所在目录的路径`research/llama3_1`,模型权重的准备参考[Llama3.1说明文档——模型权重下载](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD)。 -配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama2-7B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml)。 +配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama3_1-8B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)。 在准备完上述所有基本要素之后,可以参考MindSpore Transformers使用教程中的其余文档进行模型训练、微调、推理等流程的实践。后续模型调试调优可以参考[大模型精度调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/precision_optimization.html)和[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。 @@ -122,18 +122,16 @@ Llama3-8B与Llama2-7B拥有相同的模型结构,只有部分模型参数、 - Llama3-8B修改了特殊词元索引,修改`bos_token_id`为`128000`、`eos_token_id`为`128001`、`pad_token_id`为`128002`。 - Llama3-8B修改了旋转位置编码中的theta值为500000,修改`theta`为`500000`。 -修改Llama2-7B的`YAML`配置文件中的对应内容即可得到[Llama3-8B的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_8b/finetune_llama3_8b.yaml)。 +修改Llama2-7B的`YAML`配置文件中的对应内容即可得到Llama3-8B的配置文件。 #### 分词器 -Llama3-8B重新实现了分词器。对照官方的实现,继承MindSpore Transformers中的PretrainedTokenizer实现Llama3Tokenizer,编写在[llama3_tokenizer.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_tokenizer.py)中。 +Llama3-8B重新实现了分词器。对照官方的实现,继承MindSpore Transformers中的PretrainedTokenizer实现Llama3Tokenizer。 #### 权重转换 -Llama3-8B的参数命名和Llama2-7B一致,因此可以复用Llama2-7B的权重转换流程,参考[Llama3文档的权重转换章节](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E8%BD%AC%E6%8D%A2)。 +Llama3-8B的参数命名和Llama2-7B一致,因此可以复用Llama2-7B的权重转换流程。 #### 数据集处理 -由于Llama3-8B的分词器与Llama2-7B不同,因此Llama3-8B需要在Llama2-7B的数据集处理脚本的基础上,替换Llama3-8B的分词器对数据进行预处理,参考[conversation.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_conversation.py)和[llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_preprocess.py)。 - -关于MindSpore Transformers中Llama3的具体实现,可以参考MindSpore 
Transformers仓库中[Llama3的文件夹](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3)。关于MindSpore Transformers中Llama3的使用,可以参考[LLama3的说明文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md)。 +由于Llama3-8B的分词器与Llama2-7B不同,因此Llama3-8B需要在Llama2-7B的数据集处理脚本的基础上,替换Llama3-8B的分词器对数据进行预处理。 diff --git a/docs/mindformers/docs/source_zh_cn/feature/dataset.md b/docs/mindformers/docs/source_zh_cn/feature/dataset.md index bbce4057ccea16cd07934e30406b6367b8a16153..c548acb989c79572c5fb73831185cc482be8e8da 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/dataset.md +++ b/docs/mindformers/docs/source_zh_cn/feature/dataset.md @@ -216,7 +216,7 @@ MindSpore Transformers推荐用户使用Megatron数据集进行模型预训练 3. 启动模型预训练 - 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型预训练任务,这里以[Llama3模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/README.md)为例。 + 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型预训练任务,这里以[Llama3_1模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md)为例。 ## HuggingFace数据集 @@ -579,8 +579,6 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" - ADGEN 数据集示例 - 修改任务配置文件 [run_glm3_6b_finetune_2k_800T_A2_64G.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/glm3/run_glm3_6b_finetune_2k_800T_A2_64G.yaml)。 - 修改如下参数: ```yaml @@ -729,7 +727,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁盘IO、网络IO开销,从而获得更好的数据加载体验,更多具体功能介绍可参考[文档](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.mindrecord.html),这里仅对如何在MindSpore Transformers模型训练任务中使用MindRecord进行介绍。 -下面以`qwen2-0.5b`进行微调为示例进行相关功能说明。 +下面以`qwen2_5-0.5b`进行微调为示例进行相关功能说明。 ### 数据预处理 @@ -770,11 +768,11 @@ MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁 ### 模型微调 -参考上述数据预处理流程可生成用于`qwen2-0.5b`模型微调的MindRecord数据集,下面介绍如何使用生成的数据文件启动模型微调任务。 +参考上述数据预处理流程可生成用于`qwen2_5-0.5b`模型微调的MindRecord数据集,下面介绍如何使用生成的数据文件启动模型微调任务。 1. 修改模型配置文件 - `qwen2-0.5b`模型微调使用[finetune_qwen2_0.5b_32k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/qwen2_0_5b/finetune_qwen2_0.5b_32k.yaml)配置文件,修改其中数据集部分配置: + `qwen2_5-0.5b`模型微调使用[finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml)配置文件,修改其中数据集部分配置: ```yaml train_dataset: &train_dataset @@ -792,7 +790,7 @@ MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁 2. 
启动模型微调 - 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型微调任务,这里以[Qwen2模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2/README.md)为例。 + 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型微调任务,这里以[Qwen2_5模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/README.md)为例。 ### 多源数据集 diff --git a/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md index 7fe00f2d6da397fc1e50f97a4831b872d2fc9af7..d57bcb4a23802cb6f2ab984cd9a05907fdeb9b53 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md +++ b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md @@ -214,7 +214,7 @@ parallel: 参数说明: -- enable_parallel_optimizer:是否开启优化器并行,默认为Fasle。 +- enable_parallel_optimizer:是否开启优化器并行,默认为`False`。 关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 @@ -243,7 +243,7 @@ model_config: ## MindSpore Transformers 分布式并行应用实践 -在官网提供的[Llama3-70B微调配置](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_70b/finetune_llama3_70b.yaml#)文件中,使用了多种分布式并行策略,以提升多机多卡环境中的训练效率。以下是该配置文件中涉及的主要并行策略和关键参数: +在官网提供的[Llama3_1-70B微调配置](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#)文件中,使用了多种分布式并行策略,以提升多机多卡环境中的训练效率。以下是该配置文件中涉及的主要并行策略和关键参数: - **数据并行**:未启用额外的数据并行(`data_parallel: 1`)。 - **模型并行**:模型被切分成8个部分,在不同设备上计算(`model_parallel: 8`)。 @@ -254,4 +254,4 @@ model_config: > 开启细粒度多副本并行的同时必须开启序列并行。 -通过以上配置,Llama3-70B的分布式训练在多机多卡环境中可以有效利用硬件资源,实现高效、稳定的模型训练。 +通过以上配置,Llama3_1-70B的分布式训练在多机多卡环境中可以有效利用硬件资源,实现高效、稳定的模型训练。
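As a rough illustration of launching the distributed task once the configuration is ready, a single-node command along the following lines is commonly used; the launcher script arguments and the YAML path are placeholders and should be checked against the corresponding model README (multi-node launches pass additional node and address arguments).

```bash
# Illustrative single-node launch with 8 devices; the YAML path is a placeholder.
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config path/to/finetune_config.yaml --run_mode finetune" 8
```

The device count passed to the launcher should equal the product of the configured parallel dimensions, so multi-node setups such as the 70B fine-tuning described above require correspondingly more devices.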