diff --git a/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md b/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md index 17450d31a580ffb0d3e64411ccb37cbf0f1d3cb4..aa5bf298dbbc70716df128040b968b5eae184e72 100644 --- a/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md +++ b/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md @@ -24,8 +24,8 @@ Before locating the operator accuracy problem, we should first eliminate the int * Generalized structure (Llama2 as an example) -| Key parameters | Descriptions | CheckList | -| ----------------- | ------------------------------------------------------------ |------------------------------------------------------------------------------------------------------------------------------------| +| Key parameters | Descriptions | CheckList | +| ----------------- | ------------------------- |---------------------------------| | num_layers | transformer layers | Check for alignment with benchmarks | | num_heads | The number of attention heads in transformer | Check for alignment with benchmarks | | hidden_size | Transformer hidden layer size | Check for alignment with benchmarks | @@ -33,7 +33,7 @@ Before locating the operator accuracy problem, we should first eliminate the int | Attention | Attention module in transformer |
- Check that the following structures and calculations are aligned: the attention module may use different structures such as MQA, GQA, or MHA.
- Sparse computation modes: causal attention, sliding window attention (SWA), etc.
- Whether the matrix of wq/wk/wv has a fusion computation. | | normalization | Regularization functions, common structures are LayerNorm, RMSNorm | Check for alignment with benchmarks | | normal_eps | Regularized epsilon parameters | Check for alignment with benchmarks | -| dropout | dropout in the network | Currently, when MindSpore opens Dropout, recalculation cannot be enabled; if precision comparison is carried out, it is recommended that both sides be closed to reduce the random factor. | +| dropout | Dropout in the network | Currently, when MindSpore opens Dropout, recalculation cannot be enabled; if precision comparison is carried out, it is recommended that both sides be closed to reduce the random factor. | | activation function | Common activation functions ReLU/GeLU/FastGeLU/SwigLU etc. | Check for alignment with benchmarks | | fusion computation | Common fusion operators include FA, ROPE, Norm, SwigLU; some users will fuse Wq, Wk, Wv for computation | When comparing accuracy on the same hardware, if fusion algorithms are used, they need to be consistent. When comparing accuracy on different hardware, focus on checking whether there are differences in the fusion calculation. | | position code | / | Check the way to use positional coding: absolute/relative positional coding. | @@ -76,14 +76,14 @@ Before locating the operator accuracy problem, we should first eliminate the int ### Mixed-precision CheckList -| Key parameters | Descriptions | CheckList | -| ----------------- | ------------------------------------------------------------ |------------------------------------------------------------------------------------------------------------------------------------| +| Key parameters | Descriptions | CheckList | +| ----------------- | ----------------------------------------- |---------------------------------------| | compute_dtype | Compute accuracy | Keep alignment with benchmarks | | layernorm_compute_type | layerNorm/RMSNorm compute precision | Megatron is not configurable, need to check that implementations are consistent. | | softmax_compute_type | When MindSpore uses FlashAttention, the internal Softmax fix is calculated with FA. | Megatron is not configurable, needs to check if the implementation is consistent. |MindSpore | Calculation of weights | accuracy calculation for each weight such as, Embedding, lm_head, type of calculation is configurable only for small arithmetic splicing implementations | Megatron is not configurable, need to check that implementations are consistent. | | rotary_dtype | Calculation accuracy of rotary position encoding | Since MindFormers weight initialization needs to be set to fp32, and the usual calculation precision is bf16/fp16, it is necessary to check whether the weight data type is converted to bf16/fp16 before weight calculation. | -| bias add | Bias in the linear layer | If bias is present, Linear layer checks consistency in the computational accuracy of add. | +| bias add | bias in the linear layer | If bias is present, Linear layer checks consistency in the computational accuracy of add. 
| | residual add | sum of residuals | Check that the accuracy of the calculation of the residuals is consistent with the benchmarks | | loss | Loss Calculation Module | Check that the accuracy of the calculation of the entire loss module is consistent with the benchmarks | | Operator High Precision Mode | Ascend Calculator supports high precision mode | Method: context.set_context(ascend_config= {"ge_options":{ "global":{ "ge.opSelectImplmode":"high_precision" } } }) | @@ -469,7 +469,7 @@ After completing the single card training, start the multi-card training test: s ![loss6](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss6.png) -To verify that this error is within reasonable limits, the deterministic computation was turned off and the GPU experiment was run twice repeatedly. The red line in the figure is the curve of MindSpore training, and the blue and green lines are the curves of the first and second GPU training, respectively. At the training instability around 7K steps, the curve of MindSpore training is right between the curves of the two GPU trainings, indicating that the error is within a reasonable range and the problem is finally solved. +To verify that this error is within reasonable limits, the deterministic computation was turned off and the GPU experiment was run twice repeatedly. The red line in the figure is the curve of MindSpore training, and the blue and green lines are the curves of the first and second GPU training, respectively. At the training instability around 7000 steps, the curve of MindSpore training is right between the curves of the two GPU trainings, indicating that the error is within a reasonable range and the problem is finally solved. ![loss7](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss7.png) diff --git a/docs/mindformers/docs/source_en/function/distributed_parallel.md b/docs/mindformers/docs/source_en/function/distributed_parallel.md index 462cb9c57974b55f2615e949e9d26bd60cc2a658..1fd7e87c80d659a91eb8cc69523263854de9d54a 100644 --- a/docs/mindformers/docs/source_en/function/distributed_parallel.md +++ b/docs/mindformers/docs/source_en/function/distributed_parallel.md @@ -1,3 +1,51 @@ -# Distributed Parallel +# Distributed Parallelism -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/distributed_parallel.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/distributed_parallel.md) + +## Parallel Modes and Application Scenarios + +Large-scale deep learning model training requires robust computing power, especially in the case of a large dataset and a complex model architecture. As such, a single device usually cannot meet this requirement. To solve this problem, MindSpore provides a set of powerful parallelism strategies for configuration. You can use flexible parallelism strategies to greatly improve training efficiency and reduce computing resource consumption. 
+ +MindSpore offers parallel modes including data parallelism, model parallelism, pipeline parallelism, and sequence parallelism. They can be used independently or combined as a hybrid parallelism strategy to meet different model training requirements. By adopting proper parallelism strategies, you can leverage the computing resources of multiple devices, significantly improving the training efficiency. + +In actual applications, different parallelism strategies apply to different scenarios. + +- **Data parallelism**: applies to a simple model with a lot of data. +- **Model parallelism**: applies to a model with a huge number of parameters that a single device cannot accommodate. +- **Pipeline parallelism**: applies to ultra-large-scale model training that requires multi-device computing. +- **Sequence parallelism**: applies to a model with input of long sequences, reducing the GPU memory usage of a single device. +- **Multi-copy parallelism**: uses sequential scheduling algorithm to control the parallelism of fine-grained multi-branch operations, improving the overlap of computing and communications. +- **Optimizer parallelism**: distributes computing tasks of optimizers to multiple devices to reduce memory usage and improve training efficiency. + +> The parallelism strategy configuration in the YAML file provided by the repository has been optimized. Currently, you are recommended to use semi-automatic parallelism for optimal performance and stability. + +## Parallelism Features Supported by MindFormers + +MindFormers supports multiple parallelism features. You can use these features to optimize the training of different model architectures and hardware configurations. The following table outlines these parallelism features and provides links to the details in the MindSpore documentation. + +| **Parallelism Feature** | **Description** | +|-----------------------------------|---------------------------------------------------------------------------------| +| **[Data parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/data_parallel.html)** | Splits data to multiple devices and trains the data on each device at the same time. This mode applies to training a simple model with a lot of data. | +| **[Model parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/operator_parallel.html)** | Distributes model parameters to multiple devices. This mode applies to the scenario where a single device cannot accommodate the entire model. | +| **[Pipeline parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/pipeline_parallel.html)** | Divides an ultra-large model into multiple phases with each running on different devices for efficient training. | +| **[Optimizer parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/optimizer_parallel.html)** | Distributes the optimizer computation to multiple devices to reduce memory usage and improve training efficiency. | +| **[Sequence parallelism](https://gitee.com/mindspore/mindformers/blob/dev/docs/feature_cards/Long_Sequence_Training.md)** | Slices the LayerNorm and Dropout inputs at the Transformer layer by sequence to reduce the GPU memory pressure of a single device. This mode applies to a model for processing long sequence inputs. 
| +| **Context parallelism** | Slices all inputs and output activations by sequence to further reduce the GPU memory usage of the model for processing long sequence inputs.| +| **[Multi-copy parallelism](https://www.mindspore.cn/docs/en/master/model_train/parallel/pipeline_parallel.html#mindspore-interleaved-pipeline-scheduler)** | Implements fine-grained parallel control among multiple copies to optimize performance and resource utilization. This mode is suitable for efficient training of models with large specifications. | + +For details about how to configure distributed parallel parameters, see [MindFormers Configuration Description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html). + +## MindFormers Distributed Parallel Application Practices + +In the [Llama3-70B fine-tuning configuration](https://gitee.com/kong_de_shu/mindformers/blob/dev/research/llama3/finetune_llama3_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: + +- **Data parallelism**: No additional data parallelism is enabled (`data_parallel: 1`). +- **Model parallelism**: A model is sliced into eight parts, which are computed on different devices (`model_parallel: 8`). +- **Pipeline parallelism**: A model is divided into eight pipeline phases, which run on different devices in sequence (`pipeline_stage: 8`). +- **Sequence parallelism**: After it is enabled (`use_seq_parallel: True`), the inputs of LayerNorm and Dropout at the Transformer layer are sliced by sequence. In this way, each device only needs to process part of LayerNorm and Dropout, reducing the model GPU memory usage. +- **Multi-copy parallelism**: Sequential scheduling algorithm is used to control the parallelism of fine-grained multi-branch operations (`fine_grain_interleave: 2`), improving the overlap of computing and communications. +- **Optimizer parallelism**: The calculation of optimizers is distributed to multiple devices to reduce memory usage (`enable_parallel_optimizer: True`). + +> Note: Sequential parallelism must be turned on at the same time that fine-grained multicopy parallelism is turned on. + +With the preceding configurations, the distributed training on Llama3-70B can effectively utilize hardware resources in a multi-node multi-device environment to implement efficient and stable model training. 
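For quick reference, the key parameters above could be grouped in the YAML file roughly as shown in the following sketch. This is only an illustrative excerpt based on the parameter names listed in this section; the exact field nesting (for example, where `fine_grain_interleave` and `enable_parallel_optimizer` are placed) is an assumption and should be checked against the referenced `finetune_llama3_70b.yaml`.

```yaml
# Illustrative sketch only: the grouping of fields is assumed, not copied from the actual file.
parallel:
  enable_parallel_optimizer: True   # optimizer parallelism
parallel_config:
  data_parallel: 1                  # no additional data parallelism
  model_parallel: 8                 # model sliced into eight parts
  pipeline_stage: 8                 # eight pipeline stages
  use_seq_parallel: True            # sequence parallelism for LayerNorm/Dropout inputs
  fine_grain_interleave: 2          # multi-copy (fine-grained interleaving) parallelism; placement may differ
```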
diff --git a/docs/mindformers/docs/source_en/function/resume_training.md b/docs/mindformers/docs/source_en/function/resume_training.md index 987a5c7d36c4c2c9a9462a9dc168031b084b4793..5431ff89e6e549933173614d83e45ee43779e333 100644 --- a/docs/mindformers/docs/source_en/function/resume_training.md +++ b/docs/mindformers/docs/source_en/function/resume_training.md @@ -1,3 +1,279 @@ -# Resumable Training After Breakpoint +# Weight Saving and Resumable Training -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/resume_training.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/resume_training.md) + +## Weight Saving + +### Overview + +To train a deep learning model, saving the weights of the model is a critical step. The weight saving function enables you to store model parameters at any training stage so that you can resume training, evaluation, or deployment after the training is interrupted or completed. By saving the weights, you can also reproduce the experiment results in different environments. + +### Directory Structure + +During training, MindFormers generates two weight saving folders in the output directory: `checkpoint` and `checkpoint_network`. + +| Folder | Description | +|--------------------|-----------------------------------------------------| +| checkpoint | Stores the weights, optimizer status, steps, and epoches to the ckpt file for **resuming training**. | +| checkpoint_network | Stores only weight parameters in the ckpt file. This folder applies to **pre-trained weight** loading or **inference and evaluation** but not for resuming training.| + +#### `checkpoint` Directory Structure + +The weight file in the `checkpoint` folder is saved in the following format: + +```text +checkpoint + ├── rank_0 + ├── meta.json + └── {prefix}-{epoch}_{step}.ckpt + ... + └── rank_x + ├── meta.json + └── {prefix}-{epoch}_{step}.ckpt +``` + +| File | Description | +|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| meta.json | Record the `epoch`, `step`, and name of the last saved weight. Each rank process maintains an independent `meta.json` file. | +| {prefix}-{epoch}_{step}.ckpt | Saved weight file. `prefix` contains the rank_id information in the `{prefix}-{epoch}_{step}.ckpt` format. If a file with the same prefix already exists, the system automatically adds a suffix. When data offloading is enabled, the `epoch` location is calculated as follows: $\frac{CurrentTotalStepNumber}{SinkSize} = \frac{((CurrentEpoch-1)*StepsPerEpoch+CurrentStepInEpoch)}{SinkSize}$. `step` is fixed to `sink_size`.| + +#### Directory Structure of `checkpoint_network` + +```text +checkpoint + ├── rank_0 + └── {prefix}-{epoch}_{step}.ckpt + ... 
+ └── rank_x + └── {prefix}-{epoch}_{step}.ckpt +``` + +| File | Description | +|------------------------------|-------------------------------------------------------------------------------------------------------| +| {prefix}-{epoch}_{step}.ckpt | Saved weight file. `prefix` contains the rank_id information in the `{prefix}-{epoch}_{step}.ckpt` format. If a file with the same prefix already exists, the system automatically adds a suffix. The naming rule when data offloading is enabled is the same as the preceding naming rule.| + +### Configuration and Usage + +#### YAML Parameters + +You can modify the configuration file to control weight saving. The main parameters are as follows. + +| Parameter | Description | +|-----------------------|-----------------------------------| +| save_checkpoint_steps | Number of steps taken each time a weight is saved. If this parameter is not set, no weight is saved. | +| keep_checkpoint_max | Maximum number of weight files that can be saved at the same time. If the number of weight files reaches the upper limit, the earliest weight file will be deleted when the latest weight file is saved.| + +You can modify the fields under `CheckpointMonitor` in the `yaml` configuration file to control the weight saving behavior. For example: + +```yaml +callbacks: + ... + - type: CheckpointMonitor + prefix: "llama2_7b" + save_checkpoint_steps: 500 + keep_checkpoint_max: 3 + ... +``` + +In the preceding example, the weights are saved every 500 steps. A maximum of three weights can be saved at the same time. + +## Resumable Training + +### Overview + +MindFormers supports **step-level resumable training**, which allows the checkpoints of a model to be saved during training. If the training is interrupted, you can load a saved checkpoint to resume the training. This feature is crucial for processing large-scale training tasks, and can effectively reduce time and resource waste caused by unexpected interruptions. In addition, to resume a training where the dataset remains unchanged but the `Global Batch Size` is changed, for example, when the cluster is changed or the configuration is modified, this tool supports automatic scaling of the number of resumable training steps and skipped data steps in the same proportion. + +### Configuration and Usage + +#### YAML Parameters + +You can modify the configuration file to control resumable training. The main parameters are as follows. For details about other parameters, see the description of CheckpointMonitor. + +| Parameter | Description | +|------------------|---------------------------------------------------------------------| +| load_checkpoint | Weight path loaded during resumable training. The path can be a folder path (used to load distributed weights) or a specific weight file path. The default value is an empty string, indicating that no weight is loaded. | +| resume_training | Specifies whether to enable resumable training. You can set it to `True` or specify a weight file name. If the value is `True`, the system automatically resumes the training from the last interruption. The default value is `False`. | + +Based on the input parameters, there are four cases. 
+ +| load_checkpoint | resume_training | Description | Recommended or Not| +|-----------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------| +| Weight file path | True | Resumes a training based on the weights specified by load_checkpoint. | √ | +| Weight file path | Weight file name | The file name specified by resume_training is invalid. A training is resumed based on the weights specified by load_checkpoint. | × | +| Weight folder path | True | **Scenario 1: Single-node system, multi-node system+shared directory, or ModelArts**
1. Resumes the training based on the weights recorded in meta.json files and supports fault recovery.
2. Resumes the training based on the latest weight of all ranks if the meta.json file of any rank is missing.
**Scenario 2: Multi-node+non-shared directory**
Resumes the training based on the latest weight of all ranks.| √ | +| Weight folder path | Weight file name | Resumes the training based on the weights specified by resume_training. | √ | + +In addition, you can modify the following parameters under the `trainer` field in the configuration file to use related functions. + +| Parameter | Description | +|------------------|-------------------------------------------------------------------------------------------------------------| +| ignore_data_skip | Specifies whether to ignore the mechanism of skipping data during resumable training and read the dataset from the beginning instead. This parameter is used when the dataset is changed during resumable training. If this parameter is set to `True`, no data is skipped. The default value is `False`. | +| data_skip_steps | Number of steps skipped for the dataset. This parameter is used when the training is interrupted again after being resumed because the dataset or `global batch size` is changed. You need to manually set this parameter to configure the number of steps skipped for the new dataset. If the `global batch size` is changed, you need to divide and round down its value by the scaling coefficient and then specify the result as the value of this parameter.| + +#### Fault Recovery Mechanism + +If `resume_training` is set to `True`, the system automatically resumes training based on the weights recorded in `meta.json`. If the weight file of a rank is missing or damaged, the system rolls back to the latest available weight for recovery. + +> In a distributed environment, resumable training requires that the weights of all nodes be in the same shared directory. You can use the `SHARED_PATHS` environment variable to set the shared path. + +### Example of Distributed Training + +The following example shows how to enable resumable training in single-device and multi-device environments. The example is based on the `llama2_7b` model. +For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/pretrain_llama2_7b.yaml). + +#### Complete Training + +1. Modify `configs/llama2/pretrain_llama2_7b.yaml`. + + Configure the parallelism as required. + + ```yaml + parallel_config: + data_parallel: 1 + model_parallel: 2 + pipeline_stage: 2 + micro_batch_num: 2 + ``` + + Configure the model weight saving as required. + + ```yaml + callbacks: + ... + - type: CheckpointMonitor + prefix: "llama2_7b" + save_checkpoint_steps: 10 + keep_checkpoint_max: 3 + integrated_save: False + async_save: False + ... + ``` + +2. Prepare a dataset. The following uses [wikitext2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87) as an example to describe how to start four-device distributed training. + + ```shell + bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/pretrain_llama2_7b.yaml \ + --train_dataset /path/to/wikitext2-llama2.mindrecord \ + --run_mode train \ + --use_parallel True" 4 + ``` + + After the fourth saving is complete, end the process. The structure of the `rank_0` folder under `checkpoint` is as follows: + + ```text + checkpoint/rank_0 + ├── llama2_7b_rank_0-10_2.ckpt + ├── llama2_7b_rank_0-15_2.ckpt + ├── llama2_7b_rank_0-20_2.ckpt + └── meta.json + ``` + +#### Resumable Training + +1. Modify the configuration and specify the resumable training weight file. 
+ + ```yaml + load_checkpoint: './output/checkpoint' + resume_training: True + ``` + +2. Resume training. + + ```shell + bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/pretrain_llama2_7b.yaml \ + --train_dataset /path/to/wikitext2-llama2.mindrecord \ + --run_mode train \ + --use_parallel True" 4 + ``` + + If the initial number of steps is `42`, the training is resumed successfully. The saved weight file contains the information about step `40`. The default value of `sink_size` is `2`, indicating that the information is printed every two steps. Therefore, the initial number of steps is `42`. + +#### Resumable Training with the Dataset Changed + +There are three main scenarios where the dataset is changed in resumable training. You need to modify the configuration file in each scenario. The following describes each case one by one, and describes in detail which step of the basic resumable training process needs to be modified, and how to modify a specific configuration to achieve an expected effect. + +**Scenario 1: Training resumed with a new dataset (but not skipping trained steps)** + +In this scenario, when the new dataset is used, the model training starts from scratch without skipping any data or steps. In this case, you need to set the configuration file **to ignore the previous data progress** so that the model can be trained from scratch based on the new dataset. + +- **Configuration modification**: You need to set `ignore_data_skip` based on the first step of the basic resumable training process. Set `ignore_data_skip` to `True`, indicating that no data is skipped. + + ```yaml + load_checkpoint: './output/checkpoint' + resume_training: True + trainer: + ignore_data_skip: True + ``` + +- **Expected result**: The model is trained from scratch based on the new dataset without skipping any steps. + +**Scenario 2: Training resumed with a new dataset, skipping trained steps** + +In this case, the model has been partially trained based on the new dataset (for example, `2` steps have been performed before the training is interrupted), and the training is expected to continue from the last interruption. In this case, you must manually specify the number of steps to be skipped. + +- **Configuration modification**: You need to set `ignore_data_skip` and `data_skip_steps` based on the first step of the basic resumable training process. Set `ignore_data_skip` to `False` and use `data_skip_steps` to specify the number of trained steps to skip (for example, `2`). + + ```yaml + load_checkpoint: './output/checkpoint' + resume_training: True + trainer: + ignore_data_skip: False + data_skip_steps: 2 + ``` + +- **Expected result**: The model skips the first `2` steps and continues the training from step `3` based on the new dataset. + +**Scenario 3: Training resumed with a new dataset and `global batch size` changed** + +If `global batch size` is changed (for example, doubled) when a training is resumed based on a new dataset, you need to scale the number of steps that have been performed when manually specifying the number of steps to be skipped. Specifically, the number of skipped steps needs to be divided and rounded down based on the scaling coefficient. For example, if the value of `global batch size` is changed to `2` times of the original value, the number of steps that need to be skipped is halved. + +- **Configuration modification**: Adjust `data_skip_steps` based on Scenario 2. Set `data_skip_steps` to the number of steps after scaling. 
For example, if `global batch size` is changed to `2` times of the original value, the number of steps to be skipped is changed to `1` (rounded down). + + ```yaml + load_checkpoint: './output/checkpoint' + resume_training: True + trainer: + ignore_data_skip: False + data_skip_steps: 1 + ``` + +- **Expected result**: The model adjusts the number of skipped steps based on the new setting of `global batch size` and continues the training from the specified position. + +#### Fault Recovery Example + +If some weight files are missing, the system automatically restores the files based on the latest available weight. + +1. Delete the `llama2_7b_rank_0-20_2.ckpt` file from the `rank_3` directory. The folder structure after the deletion is as follows: + + ```text + checkpoint/rank_3 + ├── llama2_7b_rank_0-10_2.ckpt + ├── llama2_7b_rank_0-15_2.ckpt + └── meta.json + ``` + +2. Modify the configuration to enable fault recovery. + + ```yaml + load_checkpoint: './output/checkpoint' + resume_training: True + ``` + +3. Start distributed training. + + ```shell + bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/pretrain_llama2_7b.yaml \ + --train_dataset /path/to/wikitext2-llama2.mindrecord \ + --run_mode train \ + --use_parallel True" 4 + ``` + + If the initial number of steps is `32`, the training is resumed successfully. Because the weight of the information in step `40` under `rank_3` is deleted, the weight saved last time, that is, the weight of the information in step `30`, is automatically used. The default value of `sink_size` is `2`, indicating that information is printed every two steps. Therefore, the initial number of steps is `32`. + +### Precautions + +- **Data offloading**: You must enable data offloading and configure `sink_mode=True` for distributed resumable training. +- **Weight file check**: Ensure that the weights loaded for resumable training are the ones saved when the training is interrupted instead of in the entire training process. Otherwise, an error is reported. diff --git a/docs/mindformers/docs/source_en/function/transform_weight.md b/docs/mindformers/docs/source_en/function/transform_weight.md index e2e53aa86dae4ca3f192f6ace7e2198f101e2c46..38f186b67494d83fa655f59509bccb40aae61826 100644 --- a/docs/mindformers/docs/source_en/function/transform_weight.md +++ b/docs/mindformers/docs/source_en/function/transform_weight.md @@ -1,3 +1,387 @@ -# Distributed Weight Segmentation And Combination +# Distributed Weight Slicing and Merging -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/transform_weight.md) +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/transform_weight.md) + +## Overview + +In a current distributed training and inference environment, if a pre-trained weight does not match a distributed strategy, the pre-trained weight needs to be converted to adapt to the corresponding distributed strategy. MindFormers provides a set of weight conversion tools to meet the requirements in different scenarios. This tool can be used to slice a single-device weight into multi-device weights, convert between multi-device weights, and merge multi-device weights into a single-device weight. 
You can select [Automatic Conversion](#automatic-conversion) or [Offline Conversion](#offline-conversion) as required so that a model can quickly switch between different distributed scenarios. + +In addition, MindFormers supports [LoRA Weight Merging](#lora-weight-merging) to facilitate the deployment of models fine-tuned using LoRA. + +## Automatic Conversion + +When a model loads a weight, it automatically checks whether the weight matches the distributed slicing strategy of the current model. If they do not match, the weight is automatically converted. + +### Parameters + +Parameters in the `yaml` file related to **automatic weight conversion** are described as follows: + +| Parameter | Description | +| ------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| load_checkpoint | Absolute path or folder path of the pre-loaded weights.
- For a complete set of weights, set this parameter to an absolute path.
- For a distributed weight, set this parameter to the folder path. The distributed weight must be stored in the `model_dir/rank_x/xxx.ckpt` format. The folder path is `model_dir`.
**If the rank_x folder contains multiple CKPT files, the one whose file name sorts last is used for conversion by default.** | +| src_strategy | Path of the distributed strategy file corresponding to the pre-loaded weights.
- If the pre-loaded weights are a complete set of weights, leave this parameter **blank**.
- If the pre-loaded weights are distributed and pipeline parallelism is used when the pre-loaded weights are saved, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.
- If the pre-loaded weights are distributed and pipeline parallelism is not used when the pre-loaded weights are saved, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. | +| auto_trans_ckpt | Specifies whether to enable automatic weight conversion. The value True indicates that it is enabled. The default value is False. | +| transform_process_num | Number of processes used for automatic weight conversion. The default value is 1.
- If transform_process_num is set to 1, only rank_0 is used for weight conversion. Other processes wait until the conversion ends.
- If transform_process_num is larger than 1, **multiple processes conduct the conversion**. For example, for an 8-device task with transform_process_num set to 2, rank_0 converts the weights of slices rank_0, rank_1, rank_2, and rank_3, rank_4 converts the weights of slices rank_4, rank_5, rank_6, and rank_7, and the other processes wait until rank_0 and rank_4 complete the conversion.
**Note**:
1. A larger value of transform_process_num indicates a shorter conversion time and **a larger host memory occupied by the conversion**. If the host memory is insufficient, decrease the value of transform_process_num.
2. The value of transform_process_num must exactly divide the number of NPUs and cannot exceed it.| +| transform_by_rank | Specifies whether to use the mindspore.transform_checkpoint_by_rank API for weight conversion.
- If transform_process_num is larger than 1, the value is automatically set to `True`.
- If transform_process_num is set to 1 and the target weight is a distributed weight, the mindspore.transform_checkpoint_by_rank API is called cyclically to convert the weight of each rank slice in serial mode.
- If transform_process_num is set to 1, if the target weight is a complete weight, the value is automatically set to `False`, and the mindspore.transform_checkpoints API is called for weight conversion. | + +### YAML Configurations in Different Scenarios + +#### Slicing a Single-Device Weight into Multi-Device Weights + +```yaml +# load_checkpoint: specifies path of the pre-trained weight file. +load_checkpoint: "/worker/llama3_8b/llama3_8b.ckpt" + +# auto_trans_ckpt: specifies whether to enable automatic conversion. +auto_trans_ckpt: True +``` + +#### Conversion Between Multi-Device Weights + +```yaml +# load_checkpoint: specifies the path of the multi-device weight folder. +load_checkpoint: "/worker/checkpoint/llama3-8b-2layer-dp2mp2pp2" + +# src_strategy_path_or_dir: specifies the path of the distributed strategy file. +src_strategy_path_or_dir: "/worker/checkpoint/llama3-8b-2layer-dp2mp2pp2/strategy/merged_ckpt_strategy.ckpt" + +# auto_trans_ckpt: specifies whether to enable automatic conversion. +auto_trans_ckpt: True +``` + +#### Merging Multi-Device Weights into a Single-Device Weight + +```yaml +# load_checkpoint: specifies the path of the multi-device weight folder. +load_checkpoint: "/worker/checkpoint/llama3-8b-2layer-dp1mp2pp2" + +# src_strategy_path_or_dir: specifies the path of the distributed strategy file. +src_strategy_path_or_dir: "/worker/checkpoint/llama3-8b-2layer-dp1mp2pp2/strategy/merged_ckpt_strategy.ckpt" + +# auto_trans_ckpt: specifies whether to enable automatic conversion. +auto_trans_ckpt: True + +# use_parallel: Set it to False. +use_parallel: False +``` + +#### Enabling Multi-Process Conversion (Optional) + +```yaml +# transform_process_num: specifies the number of processes involved in the conversion. +transform_process_num: 2 +``` + +### Precautions + +- **Multi-process conversion**: Set the `transform_process_num` parameter to enable multi-process conversion. Pay attention to the memory usage. If a memory overflow occurs, you are advised to reduce the number of processes. + +- **Automatic weight conversion**: After this function is enabled, the system deletes the old `strategy` and `transformed_checkpoint` folders from the `output` directory and saves the output of the current task. After the conversion task is complete, you are advised to move the `strategy` and `transformed_checkpoint` folders to a user-defined directory to prevent them from being deleted by mistake in subsequent operations. + +- **Distributed strategy file saving**: The distributed strategy file is saved in the `output/strategy` folder. If **pipeline parallelism** is enabled, the system automatically merges all `ckpt_strategy_rank_x.ckpt` files into a `merged_ckpt_strategy.ckpt` file. If pipeline parallelism is not enabled, the MERGE operation is not performed. + +## Offline Conversion + +The offline conversion function is designed to meet your requirements for manually converting weights. With offline conversion, you can convert model weights in an independent environment. Offline conversion supports multiple weight conversion scenarios, including slicing a single-device weight into multi-device weights, converting between multi-device weights, and merging multi-device weights into a single-device weight. + +When using offline conversion, you can manually configure conversion parameters as required to ensure that the conversion process is flexible and controllable. This function is especially suitable for model deployment and optimization in a strictly controlled computing environment. 
+ +### Parameters + +Parameters in the `yaml` file related to **offline weight conversion** are described as follows: + +| Parameter | Description | +| ----------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| src_checkpoint | Absolute path or folder path of the source weight.
- For **a complete set of weights**, set this parameter to an **absolute path**.
- For **distributed weights**, set this parameter to the **folder path**. The distributed weights must be stored in the `model_dir/rank_x/xxx.ckpt` format. The folder path is `model_dir`.
**If the rank_x folder contains multiple CKPT files, the one whose file name sorts last is used for conversion by default.** | +| src_strategy | Path of the distributed strategy file corresponding to the source weight.
- For a complete set of weights, leave it **blank**.
- For distributed weights, if pipeline parallelism is used, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.
- For distributed weights, if pipeline parallelism is not used, set this parameter to any **ckpt_strategy_rank_x.ckpt** path. | +| dst_checkpoint | Path of the folder that stores the target weight. | +| dst_strategy | Path of the distributed strategy file corresponding to the target weight.
- For a complete set of weights, leave it **blank**.
- For distributed weights, if pipeline parallelism is used, set this parameter to the **merged strategy file path** or **distributed strategy folder path**.
- For distributed weights, if pipeline parallelism is not used, set this parameter to any **ckpt_strategy_rank_x.ckpt** path.| +| prefix | Prefix name of the saved target weight. The weight is saved as {prefix}rank_x.ckpt. The default value is checkpoint_. | +| world_size | Total number of slices of the target weight. Generally, the value is dp \* mp \* pp. | +| process_num | Number of processes used for offline weight conversion. The default value is 1.
- If process_num is set to 1, **a single process is used for conversion**.
- If process_num is larger than 1, **multi-process conversion** is used. For example, if the target weight for conversion is the distributed weight of eight GPUs and process_num is set to 2, two processes are started to convert the weights of slices rank_0, rank_1, rank_2, and rank_3 and slices rank_4, rank_5, rank_6, and rank_7, respectively. | + +### Offline Conversion Configuration + +#### Single-Process Conversion + +Use [mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py) to perform single-process conversion on the loaded weight. + +**Run the command.** + +```shell +python transform_checkpoint.py \ + --src_checkpoint=/worker/checkpoint/llama3-8b-2layer/rank_0/llama3_8b.ckpt \ + --dst_checkpoint=/worker/transform_ckpt/llama3_8b_1to8/ \ + --dst_strategy=/worker/mindformers/output/strategy/ +``` + +**Precautions**: + +If no target strategy file is available during offline conversion, you can set `only_save_strategy: True` to generate a strategy file and run the task once to obtain it. + +#### Multi-Process Conversion + +Use [mindformers/tools/ckpt_transform/transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.sh) to perform multi-process conversion on the loaded weight. + +**Run the command.** + +```shell +bash transform_checkpoint.sh \ + /worker/checkpoint/llam3-8b-2layer/rank_0/llama3_8b.ckpt \ + None \ + /worker/transform_ckpt/llama3_8b_1to8/ \ + /worker/mindformers/output/strategy/ \ + 8 2 +``` + +**Precautions**: + +- When the [transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.sh) script is used, `8` indicates the number of target devices, and `2` indicates that two processes are used for conversion. +- If no target strategy file is available, you can set `only_save_strategy: True` to generate a strategy file. + +#### Parameter Configuration Examples + +- **Save the strategy file.** + + ```yaml + only_save_strategy: True + ``` + +- **Configure the dataset.** + + ```yaml + train_dataset: &train_dataset + data_loader: + type: MindDataset + dataset_dir: "/worker/dataset/wiki103/" + shuffle: True + ``` + +- **Configure an 8-device distributed strategy.** + + ```yaml + parallel_config: + data_parallel: 2 + model_parallel: 2 + pipeline_stage: 2 + micro_batch_num: 2 + ``` + +- **Configure the model.** + + ```yaml + model: + model_config: + seq_length: 512 + num_layers: 2 + ``` + +## Special Scenarios + +### Multi-Node Multi-Device Training on Physical Machines + +Training a large-scale model usually needs a cluster of servers. In the multi-node multi-device scenario, if there is a shared disk between servers, the automatic conversion function can be used. Otherwise, only offline conversion can be used. The following example is a training that uses two servers and 16 GPUs. + +#### Scenario 1: A shared disk exists between servers. + +If there is a shared disk between servers, you can use MindFormers to automatically convert a weight before multi-node multi-device training. Assume that `/data` is the shared disk between the servers and the MindFormers project code is stored in the `/data/mindformers` directory. + +- **Single-process conversion** + + In single-process conversion mode, you only need to set the path of the pre-trained weight in the configuration file and enable automatic weight conversion. 
+ + **Configure the parameter.** + + ```yaml + # Set the path of the pre-trained weight file to an absolute path. + load_checkpoint: "/worker/checkpoint/llama3-8b/rank_0/llama3_8b.ckpt" + + # Set auto_trans_ckpt to True to enable automatic weight conversion. + auto_trans_ckpt: True + + # Set the dataset path. + train_dataset: &train_dataset + data_loader: + type: MindDataset + dataset_dir: "/worker/dataset/wiki103/" + shuffle: True + + # Configure the 16-device distributed strategy (for reference only). + parallel_config: + data_parallel: 2 + model_parallel: 4 + pipeline_stage: 2 + micro_batch_num: 2 + vocab_emb_dp: True + gradient_aggregation_group: 4 + micro_batch_interleave_num: 1 + ``` + +- **Multi-process conversion (optional)** + + To accelerate weight conversion, you can choose the multi-process conversion mode by setting the `transform_process_num` parameter. + + **Configure the parameter.** + + ```yaml + # Use two processes for conversion. + transform_process_num: 2 + ``` + + **Start a task.** + + Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh) to start the task. + + ```shell + # First server (main node) + bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config {CONFIG_PATH} \ + --run_mode train" \ + 16 8 ${ip} ${port} 0 output/msrun_log False 300 + # Second server (subnode) + bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config {CONFIG_PATH} \ + --run_mode train" \ + 16 8 ${ip} ${port} 1 output/msrun_log False 300 + ``` + +#### Scenario 2: No shared disk exists between servers. + +If there is no shared disk between servers, you need to use the offline weight conversion tool to convert the weight. The following steps describe how to perform offline weight conversion and start a multi-node multi-device training task. + +- **Obtain the distributed policy file.** + + Before offline weight conversion, you need to obtain the distributed strategy file of each node. + + **Configure the parameter.** + + ```yaml + # Set **only_save_strategy** to **True** to obtain the distributed strategy file. + only_save_strategy: True + + # Set the dataset path. + train_dataset: &train_dataset + data_loader: + type: MindDataset + dataset_dir: "/worker/dataset/wikitext_2048/" + shuffle: True + + # Configure the 16-device distributed strategy (for reference only). + parallel_config: + data_parallel: 2 + model_parallel: 4 + pipeline_stage: 2 + micro_batch_num: 2 + vocab_emb_dp: True + gradient_aggregation_group: 4 + micro_batch_interleave_num: 1 + ``` + + The strategy file of each node is stored in the corresponding `output/strategy` directory. For example, node 0 stores the `ckpt_strategy_rank_0-7.ckpt` file, and node 1 stores the `ckpt_strategy_rank_8-15.ckpt` file. Then, you need to integrate the strategy files of all nodes on the same server to facilitate subsequent operations. + +- **Offline weight conversion** + + On the server where all strategy files are stored, use [mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py) to perform offline weight conversion. 
+ + **Single-process conversion** + + ```shell + python mindformers/tools/ckpt_transform/transform_checkpoint.py \ + --src_checkpoint=/worker/checkpoint/llama3-8b/rank_0/llama_7b.ckpt \ + --dst_checkpoint=./output/llama3_8b_dp2mp4pp2 \ + --dst_strategy=./output/strategy + ``` + + **Multi-process conversion (optional)** + + ```shell + # Use two processes for conversion. + bash mindformers/tools/ckpt_transform/transform_checkpoint.sh \ + /worker/checkpoint/llama3-8b/rank_0/llama_7b.ckpt \ + None \ + ./output/llama3_8b_dp2mp4pp2 \ + ./output/strategy \ + 16 2 + ``` + +- **Copy the weights to other nodes.** + + Copy the distributed weights that have been converted to respective nodes. Node 0 requires only the weights of slices from `rank_0` to `rank_7`, and node 1 requires only the weights of slices from `rank_8` to `rank_15`. + +- **Set the parameter.** + + ```yaml + # Set the pre-trained weight path to model_dir, the distributed weight folder path. + load_checkpoint: "/worker/checkpoint/llama3_8b_dp2mp4pp2" + + # Change only_save_strategy to False. + only_save_strategy: False + ``` + +### ModelArts training + +Training in ModelArts is similar to multi-node multi-device training on physical machines. Automatic weight conversion can also be enabled. You can set `auto_trans_ckpt=True` in the hyperparameters of a training task to enable automatic weight conversion and set `transform_process_num > 1` to enable multi-process conversion. + +**Note**: If the number of NPUs on the server node in the ModelArts resource pool is not 8, you need to set `npu_num_per_node = the number of NPUs on the node`. For example, if each node is configured with 16 NPUs, `npu_num_per_node=16` should be set. + +## LoRA Weight Merging + +### Overview + +The basic principle of low-rank adaptation (LoRA) is to parameterize the original model with low-rank weights. The core process of merging LoRA weights is to calculate the parameters of the LoRA branches and add them to the corresponding model parameters, which makes the parameter list of the final weight file the same as that of the original model and excludes additional LoRA parameters. This operation does not affect the inference result. Therefore, the model after merging still has the same performance as the original model during inference. +For details about the principles and implementation of LoRA, see the following resources: + +- Paper: [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) +- GitHub: [https://github.com/microsoft/LoRA](https://github.com/microsoft/LoRA) + +### Instructions + +Use the [LoRA weight merging script](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/transform_ckpt_lora.py) provided by MindFormers to merge LoRA weights as follows: + +```shell +python mindformers/tools/transform_ckpt_lora.py \ + --src_ckpt_strategy src_strategy_path_or_dir \ + --src_ckpt_path_or_dir src_ckpt_path_or_dir \ + --dst_ckpt_dir dst_ckpt_dir \ + --prefix "checkpoint_" \ + --lora_scaling lora_alpha/lora_rank +``` + +#### Parameters + +- **src_ckpt_strategy**: specifies the path of the distributed strategy file corresponding to the source weight. The file is stored in the `output/strategy/` directory by default after the training task is started. If the source is a complete set of weights, you do not need to set this parameter. 
If the source contains distributed weights, set this parameter based on the following conditions: + - **Pipeline parallelism enabled for the source weights**: Weight conversion is based on the merging strategy file. Set the parameter to the path of the distributed strategy folder. The script automatically merges all `ckpt_strategy_rank_x.ckpt` files in the folder into `merged_ckpt_strategy.ckpt` in the folder. If `merged_ckpt_strategy.ckpt` already exists, set the parameter to the path of the file. + - **Pipeline parallelism not enabled for the source weights**: Weight conversion can be based on any strategy file. Set the parameter to the path of any `ckpt_strategy_rank_x.ckpt` file. + + **Note**: If a `merged_ckpt_strategy.ckpt` already exists in the strategy folder and is still transferred to the folder path, the script deletes the old `merged_ckpt_strategy.ckpt` and then merges files into a new `merged_ckpt_strategy.ckpt` for weight conversion. Therefore, ensure that the folder has enough write permission. Otherwise, an error will be reported. +- **src_ckpt_path_or_dir**: specifies the path of the source weight. For distributed weights, set the parameter to the path of the folder where the source weights are located. The source weights must be stored in the `model_dir/rank_x/xxx.ckpt` format, and the folder path must be set to `model_dir`. If the source is a complete set of weights, set the parameter to an absolute path. +- **dst_ckpt_dir**: specifies the path for storing the target weight, which must be a user-defined path of an empty folder. The target weight is saved in the `model_dir/rank_x/xxx.ckpt` format. +- **prefix**: name prefix of the target weight file. The default value is "checkpoint_", indicating that the target weight is saved in the `model_dir/rank_x/checkpoint_x.ckpt` format. +- **lora_scaling**: combination coefficient of the LoRA weight. The default value is `lora_alpha/lora_rank`. The two parameters are used for LoRA model configuration and need to be calculated. + +### Examples + +#### Scenario 1: There is a complete set of weights for LoRA parameters. + +If the weight file before merging is a complete one, you can set the parameters as follows (directly enter the path of the complete set of weights): + +```shell +python mindformers/tools/transform_ckpt_lora.py \ + --src_ckpt_path_or_dir .../xxx/xxx.ckpt \ + --dst_ckpt_dir dst_ckpt_dir \ + --prefix "checkpoint_" \ + --lora_scaling lora_alpha/lora_rank +``` + +#### Scenario 2: There are distributed weights for LoRA parameters. + +If the weight file before merging contains distributed weights, you can set the parameters as follows (enter the path of the distributed weight folder and the path of the distributed strategy folder). The obtained weights are automatically merged into a complete weight file. 
+ +```shell +python mindformers/tools/transform_ckpt_lora.py \ + --src_ckpt_strategy .../xxx/mindformers/output/strategy/ \ + --src_ckpt_path_or_dir .../xxx/model_dir \ + --dst_ckpt_dir dst_ckpt_dir \ + --prefix "checkpoint_" \ + --lora_scaling lora_alpha/lora_rank +``` diff --git a/docs/mindformers/docs/source_en/function/weight_conversion.md b/docs/mindformers/docs/source_en/function/weight_conversion.md index 4387fed39e7dbe61fd35a9dee12c6df01ddc0aa3..f4179098c6beea2a0b24f3bffb730210934a5712 100644 --- a/docs/mindformers/docs/source_en/function/weight_conversion.md +++ b/docs/mindformers/docs/source_en/function/weight_conversion.md @@ -1,3 +1,135 @@ -# Weight Conversion +# Weight Format Conversion -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/weight_conversion.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md) + +## Overview + +MindFormers provides a unified weight conversion tool that allows model weights to convert between the HuggingFace and MindFormers formats. This helps you: + +- Convert a HuggingFace weight to a MindFormers one for fine-tuning, evaluation, or inference on MindFormers. +- Convert the weights trained or fine-tuned using MindFormers to HuggingFace weights and uses them on other frameworks. + +## Conversion Procedure + +To perform weight conversion, clone the complete HuggingFace repository of the model to be converted locally, and execute the `mindformers/convert_weight.py` script. This script automatically converts the HuggingFace model weight file into a weight file applicable to MindFormers. If you want to convert a MindFormers weight to a HuggingFace one, set +`reversed` to `True`. + +```shell +python convert_weight.py [-h] --model MODEL [--reversed] --input_path INPUT_PATH --output_path OUTPUT_PATH [--dtype DTYPE] [--n_head N_HEAD] [--hidden_size HIDDEN_SIZE] [--layers LAYERS] [--is_pretrain IS_PRETRAIN] [--telechat_type TELECHAT_TYPE] +``` + +### Parameters + +- model: model name. +- reversed: converts a MindFormers weight to the HuggingFace one. +- input_path: path of the HuggingFace weight folder, which points to the downloaded weight file. +- output_path: path for storing the MindFormers weight file after conversion. +- dtype: weight data type after conversion. +- n_head: takes effect only for the BLOOM model. Set this parameter to `16` when `bloom_560m` is used and to `32` when `bloom_7.1b` is used. +- hidden_size: takes effect only for the BLOOM model. Set this parameter to `1024` when `bloom_560m` is used and to `4096` when `bloom_7.1b` is used. +- layers: number of layers to be converted. This parameter takes effect only for the GPT2 and WizardCoder models. +- is_pretrain: converts the pre-trained weight. This parameter takes effect only for the Swin model. +- telechat_type: version of the TeleChat model. This parameter takes effect only for the TeleChat model. + +## Conversion Example + +Assume that you have downloaded the [Llama2 model weight](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD) and saved it in the `/home/user/torch_weights` path. 
To convert it to the MindFormers weight and save it in the `/home/user/ms_weights` path, run the following command: + +```bash +python convert_weight.py --model llama2 --input_path /home/user/torch_weights --output_path /home/user/ms_weights/llama.ckpt +``` + +After the preceding steps are performed, the HuggingFace weight is successfully converted to a MindFormers weight, facilitating model training or inference on MindFormers. + +## Supported Models + +- Baichuan +- BLIP +- BLOOM +- CodeGeeX2 +- CogVLM2 +- DeepSeek +- GLM +- GLM-n +- GPT +- InternLM +- InternLM2 +- knowlm +- Llama +- MAE +- Mixtral +- Qwen +- Qwen2 +- Qwen2-MoE +- Qwen-VL +- Skywork +- Swin +- TeleChat +- ViT +- WizardCoder +- Yi + +## Developing Weight Conversion for Unsupported Models + +1. Add the `convert_weight.py` and `convert_reversed.py` files to the extended model directory. +2. Compile the `convert_pt_to_ms` and `convert_ms_to_pt` weight conversion functions in the files. The function parameters are `input_path`, `output_path`, `dtype`, and an additional parameter `**kwargs`. +3. Add the extended model name and conversion function import paths to the `convert_map` and `reversed_convert_map` dictionaries in the `convert_weight.py` file in the MindFormers root directory. +4. Call the `parser.add_argument()` method in the `main` function to add the additional parameter. + +## Example of Developing Model Weight Conversion + +Llama is used as an example. To convert a HuggingFace weight to a MindFormers one, define the `convert_pt_to_ms` function in [convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py). + +```python +def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs): + """convert hf weight to ms.""" + print(f"Trying to convert huggingface checkpoint in '{input_path}'.", flush=True) + try: + from transformers import LlamaForCausalLM + except: + raise ImportError(f"Failed to load huggingface checkpoint. Please make sure transformers is available.") + + try: + model_hf = LlamaForCausalLM.from_pretrained(os.path.dirname(input_path)) + except Exception as e: + print(f"Do not find huggingface checkpoint in '{os.path.dirname(input_path)}', Error {e.message}.", flush=True) + return False + ckpt_list = [] + for name, value in model_hf.state_dict().items(): + name = name_replace(name) + if name == 'norm.weight': + name = 'norm_out.weight' + if name[:7] == 'layers.': + name = name[7:] + + print(f'\rprocessing parameter: {name} {value.shape} ', end='', flush=True) + ckpt_list.append({'name': name, 'data': pt2ms(value, dtype)}) + + ms.save_checkpoint(ckpt_list, output_path) + print(f"\rConvert huggingface checkpoint finished, the mindspore checkpoint is saved in '{output_path}'.", + flush=True) + return True +``` + +To convert a MindFormers weight to a HuggingFace one, define the `convert_ms_to_pt` function in [convert_reversed.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_reversed.py). 
+ +```python +def convert_ms_to_pt(input_path, output_path, dtype=None, **kwargs): + """convert ms weight to hf.""" + print(f"Trying to convert mindspore checkpoint in '{input_path}'.", flush=True) + model_ms = ms.load_checkpoint(input_path) + + state_dict = {} + for name, value in model_ms.items(): + name = name_replace(name) + print(f'\rprocessing parameter: {name} {value.shape} ', end='', flush=True) + if is_lora_param(name): + name = name.replace('.tk_delta_lora_a', '.lora_A.weight') + name = name.replace('.tk_delta_lora_b', 'lora_B.weight') + state_dict[name] = ms2pt(value, dtype) + + torch.save(state_dict, output_path) + print(f"\rConvert mindspore checkpoint finished, the huggingface checkpoint is saved in '{output_path}'.", + flush=True) + return True +``` \ No newline at end of file diff --git a/docs/mindformers/docs/source_en/quick_start/source_code_start.md b/docs/mindformers/docs/source_en/quick_start/source_code_start.md index d6c1663f71415a7e4f27c2064adc752d58b572a9..fe505dd9e4a8bf65a6203bd0f7e13af84ab6679a 100644 --- a/docs/mindformers/docs/source_en/quick_start/source_code_start.md +++ b/docs/mindformers/docs/source_en/quick_start/source_code_start.md @@ -20,7 +20,7 @@ Word list download link: [tokenizer.model](https://ascend-repo-modelzoo.obs.cn-e 2. Data Preprocessing - 1. Execute [mindformers/tools/dataset_preprocess/llama/alpaca_converter.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/alpaca_converter.py), and use the fastchat tool to add prompts templates to convert the raw dataset into a multi-round conversation format. + 1. Execute [mindformers/tools/dataset_preprocess/llama/alpaca_converter.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/alpaca_converter.py), and use the fastchat tool to add prompt templates to convert the raw dataset into a multi-round conversation format. ```shell diff --git a/docs/mindformers/docs/source_en/usage/dev_migration.md b/docs/mindformers/docs/source_en/usage/dev_migration.md index 65da489baa1ddff1896d69114aad51a7bc526d11..dfe2ce8e3390dc3932c1d58b2da08ec9af4f7a96 100644 --- a/docs/mindformers/docs/source_en/usage/dev_migration.md +++ b/docs/mindformers/docs/source_en/usage/dev_migration.md @@ -1,3 +1,137 @@ # Development Migration -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/dev_migration.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/dev_migration.md) + +This document describes how to develop and build foundation models based on MindFormers and complete basic adaptation to start the training and inference processes. + +## Building a Foundation Model Based on MindFormers + +The basic components of a foundation model in MindFormers include the configurations, models, and tokenizers for large language models (LLMs). In addition, to use the run_mindformer.py unified script to start the training or inference process, you need to prepare the `YAML` configuration file for training or inference. + +### Writing Configurations + +A model configuration is an instance that contains all information about a model. 
The `__init__` methods of all models in MindFormers receive a model configuration instance as the input parameter. All submodules of the model are initialized based on the information contained in the configuration instance.

MindFormers provides the [PretrainedConfig](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PretrainedConfig.html) class, which provides some common configuration methods. The configuration classes of all models should inherit from the PretrainedConfig class. Developers only need to define the configuration parameters required to build their foundation model. Foundation models of the Transformer type have configuration parameters such as `seq_length`, `hidden_size`, `num_layers`, and `num_heads`, and foundation models of the text type have `vocab_size` in addition.

For details, see the configuration class [LlamaConfig](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaConfig.html) of the Llama model in MindFormers.

> If your model is similar to a model in the library, you can reuse the same configurations as that model.

### Writing a Model

The MindFormers foundation model is developed based on the MindSpore framework. If your model has been implemented based on PyTorch, see [MindSpore Network Construction](https://www.mindspore.cn/docs/en/master/migration_guide/model_development/model_and_cell.html). Developers only need to pay attention to the implementation of the model network.

MindFormers provides the [PretrainedModel](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PreTrainedModel.html) class, which is responsible for storing model configurations and provides the methods for loading and saving models. All model classes must inherit from the PretrainedModel class, and the model inputs must be the same. That is, the input parameters of the `construct` method must be the same for all models. For details about the input parameters and their meanings, see the Llama model class [LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaForCausalLM.html) in MindFormers. In addition, the model class must implement some abstract methods of the base class, including:

- `prepare_inputs_for_generation`: method for building the inputs for model inference.
- `prepare_inputs_for_predict_layout`: method for building virtual inputs used to load distributed model weights.

For specific meanings, refer to the descriptions in [LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaForCausalLM.html).

> If your model structure is similar to that of a model in the library, you can reuse the model.

### Writing a Tokenizer (for LLMs)

A tokenizer is used to process the input and output of LLMs and is required in the LLM workflow.

MindFormers provides the [PretrainedTokenizer](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PreTrainedTokenizer.html) and [PretrainedTokenizerFast](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PreTrainedTokenizerFast.html) classes, which are implemented in pure Python and backed by the Rust library, respectively. The latter has the following features:

- Faster batch processing.
- Additional methods for mapping between text strings and the token space, for example, obtaining the index of the token that contains a given character or the character span corresponding to a given token.
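Either base class can be used as the parent of a custom tokenizer. As a rough reference, the following is a minimal sketch of a slow (pure Python) tokenizer skeleton. The class name, the `sentencepiece`-based vocabulary handling, and the file name `tokenizer.model` are illustrative assumptions; the method set follows the HuggingFace-style interface that MindFormers mirrors, so the exact signatures of your tokenizer may differ.

```python
import sentencepiece as spm

from mindformers.models import PreTrainedTokenizer


class MyLlmTokenizer(PreTrainedTokenizer):
    """Minimal slow-tokenizer skeleton (illustrative sketch, not a library API)."""

    vocab_files_names = {"vocab_file": "tokenizer.model"}

    def __init__(self, vocab_file, unk_token="<unk>", **kwargs):
        # The vocabulary is assumed to be a sentencepiece model file.
        self.sp_model = spm.SentencePieceProcessor(model_file=vocab_file)
        super().__init__(unk_token=unk_token, **kwargs)

    @property
    def vocab_size(self):
        # Total number of tokens known to the tokenizer.
        return self.sp_model.get_piece_size()

    def get_vocab(self):
        return {self._convert_id_to_token(i): i for i in range(self.vocab_size)}

    def _tokenize(self, text):
        # Split raw text into sub-word token strings.
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token):
        return self.sp_model.piece_to_id(token)

    def _convert_id_to_token(self, index):
        return self.sp_model.id_to_piece(index)
```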
+ +All tokenizer classes must be inherited from the PretrainedTokenizer or PretrainedTokenizerFast class. For details, see [LlamaTokenizer](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaTokenizer.html) and [LlamaTokenizerFast](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaTokenizerFast.html). + +> If your tokenizer is similar to that in the library, you can reuse that in the library. + +### Preparing a Weight and a Dataset + +If a PyTorch-based model weight already exists, you can convert the weight to that in the MindSpore format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html). + +For details about how to prepare a dataset, see [Dataset](https://www.mindspore.cn/mindformers/docs/en/dev/function/dataset.html) or the model document, for example, [Llama2 Description Document > Dataset Preparation](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87). + +### Preparing a `YAML` File + +MindFormers uses a `YAML` file to configure all parameters required by a task, including model parameters, training parameters (such as optimizer, learning rate, and dataset), inference parameters (such as tokenizer), distributed parallel parameters, and context environment parameters. + +The code of the customized model is not in the MindFormers library, and the customized module in the code is not registered with MindFormers. Therefore, the customized model cannot be automatically instantiated. The code is also called external code (for example, the code in the `research` directory). Therefore, you need to add the `auto_register` configuration item for automatically registering any module to the corresponding module configuration in the `YAML` file and set the configuration items to the relative import paths of the API to be registered. When the run_mindformer.py script is executed to start the task, you need to add the input parameter `--register_path` of the registration path and set it to the relative path of the directory where the external code is located. + +For example, in the `YAML` file [`research/llama3_1/predict_llama3_1_8b.yaml`](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/predict_llama3_1_8b.yaml) of the Llama3.1-8B model inference in the `research` directory, the configuration item `auto_register` is added for automatic registration to register the customized `Llama3Tokenizer` in [`research/llama3_1/llama3_1_tokenizer.py`](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_tokenizer.py). + +```yaml +... +processor: + return_tensors: ms + tokenizer: + model_max_length: 8192 + vocab_file: "/path/tokenizer.json" + pad_token: "<|reserved_special_token_0|>" + type: Llama3Tokenizer + auto_register: llama3_1_tokenizer.Llama3Tokenizer + type: LlamaProcessor +... +``` + +The relative import path `auto_register: llama3_1_tokenizer.Llama3Tokenizer` of `Llama3Tokenizer` is configured under `tokenizer`. + +Run the following command to start the inference job: + +```bash +python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --load_checkpoint path/to/llama3_1_8b.ckpt --register_path research/llama3_1 --predict_data "hello" +``` + +**Parameters** + +| Parameter | Description | +|:---------------:|:--------------| +| config | Path of the `YAML` file.| +| load_checkpoint | Loaded weight path. 
| register_path | Path of the directory where the external code is located. |
| predict_data | Input data for inference. |

`register_path` is set to `research/llama3_1` (path of the directory where the external code is located). For details about how to prepare the model weight, see [Llama3.1 Description Document > Model Weight Download](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD).

For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html). When writing a configuration file, you can refer to an existing configuration file in the library, for example, the [Llama2-7B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml).

After all the preceding basic elements are prepared, you can refer to other documents in the MindFormers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/acc_optimize/acc_optimize.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html).

### Contributing Models to the MindFormers Open Source Repository

You can contribute models to the MindFormers open source repository for developers to research and use. For details, see [MindFormers Contribution Guidelines](https://www.mindspore.cn/mindformers/docs/en/dev/faq/mindformers_contribution.html).

## MindFormers Model Migration Practice

### Migration from Llama2-7B to Llama3-8B

Llama3-8B and Llama2-7B have the same model structure but different model parameters, tokenizers, and weights.

#### Model Configurations

The following compares the model configurations between Llama2-7B and Llama3-8B.

![model_config_comparison](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/docs/mindformers/docs/source_zh_cn/usage/image/model_config_comparison.png)

The differences are as follows:

- The sequence length of Llama3-8B is 8192. Therefore, `seq_length` is set to `8192`.
- Llama3-8B uses GQA with 8 key-value heads. Therefore, `n_kv_head` is set to `8`.
- The size of the Llama3-8B vocabulary is 128,256. Therefore, `vocab_size` is set to `128256`.
- Llama3-8B expands the hidden layer size of the feed-forward network to 14,336. Therefore, `intermediate_size` is set to `14336`.
- Llama3-8B changes the IDs of the special tokens. Therefore, `bos_token_id` is set to `128000`, `eos_token_id` is set to `128001`, and `pad_token_id` is set to `128002`.
- Llama3-8B changes the value of **theta** in the rotary position embedding to **500000**. Therefore, `theta` is set to `500000`.

After modifying the corresponding content in the `YAML` file of Llama2-7B, you can obtain the [Llama3-8B configuration file](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/finetune_llama3_8b_8k_800T_A2_64G.yaml).

#### Tokenizer

Llama3-8B re-implements the tokenizer. Following the official implementation, Llama3Tokenizer is implemented by inheriting from the MindFormers PretrainedTokenizer class and is written in [llama3_tokenizer.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3_tokenizer.py).
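After migrating the tokenizer, a quick sanity check is to load it directly and confirm that the vocabulary size and special token IDs match the configuration values listed above. The snippet below is a hedged sketch: the constructor argument and attribute names follow common MindFormers tokenizer conventions and may differ slightly from the actual `llama3_tokenizer.py` implementation.

```python
# Run from the research/llama3 directory so that llama3_tokenizer.py is importable.
from llama3_tokenizer import Llama3Tokenizer

tokenizer = Llama3Tokenizer(vocab_file="/path/to/tokenizer.model")

# Expected values according to the configuration comparison above (assumed attribute names):
# vocab_size=128256, bos_token_id=128000, eos_token_id=128001, pad_token_id=128002.
print(tokenizer.vocab_size)
print(tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)
print(tokenizer("Hello, MindFormers!")["input_ids"])
```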
+ +#### Weight Conversion + +The parameters of Llama3-8B are the same as those of Llama2-7B. Therefore, the weight conversion process of Llama2-7B can be reused. For details, see [Llama3 Document > Weight Conversion](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E8%BD%AC%E6%8D%A2). + +#### Dataset Processing + +The tokenizer of Llama3-8B is different from that of Llama2-7B. Therefore, you need to replace the tokenizer of Llama3-8B to preprocess data based on the dataset processing script of Llama2-7B. For details, see [conversation.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/conversation.py) and [llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama_preprocess.py). + +For details about the implementation of Llama3 in MindFormers, see [Llama3 folder](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) in the MindFormers repository. For details about how to use Llama3 in MindFormers, see [LLama3 documents](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3.md). diff --git a/docs/mindformers/docs/source_en/usage/evaluation.md b/docs/mindformers/docs/source_en/usage/evaluation.md index ce069ecf07881b2d1a1357994de0311547f5a768..c4beafe9e66380b4e171cd3a0927bfd7b759f04b 100644 --- a/docs/mindformers/docs/source_en/usage/evaluation.md +++ b/docs/mindformers/docs/source_en/usage/evaluation.md @@ -1,3 +1,115 @@ # Evaluation -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/evaluation.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/evaluation.md) + +## Harness Evaluation + +### Introduction + +[LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) is an open-source language model evaluation framework that provides evaluation of more than 60 standard academic datasets, supports multiple evaluation modes such as HuggingFace model evaluation, PEFT adapter evaluation, and vLLM inference evaluation, and supports customized prompts and evaluation metrics, including the evaluation tasks of the loglikelihood, generate_until, and loglikelihood_rolling types. +After MindFormers is adapted based on the Harness evaluation framework, the MindFormers model can be loaded for evaluation. + +### Installation + +```shell +pip install lm_eval==0.4.3 +``` + +### Usage + +Run the [eval_with_harness.py](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/benchmarks/eval_with_harness.py) script. 
+ +#### Viewing a Dataset Evaluation Task + +```shell +#!/bin/bash + +python toolkit/benchmarks/eval_with_harness.py --tasks list +``` + +#### Starting the Single-Device Evaluation Script + +```shell +#!/bin/bash + +python toolkit/benchmarks/eval_with_harness.py --model mf --model_args "pretrained=MODEL_DIR,device_id=0" --tasks TASKS +``` + +#### Starting the Multi-Device Parallel Evaluation Script + +```shell +#!/bin/bash + +export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 + +bash mindformers/scripts/msrun_launcher.sh "toolkit/benchmarks/eval_with_harness.py \ + --model mf \ + --model_args pretrained=MODEL_DIR,use_parallel=True,tp=1,dp=4 \ + --tasks TASKS \ + --batch_size 4" 4 +``` + +You can set multiple device numbers through the environment variable ASCEND_RT_VISIBLE_DEVICES. + +#### Evaluation Parameters + +Harness parameters + +| Parameter | Type | Description | Required| +|---------------|-----|---------------------------|------| +| --model | str | The value must be **mf**, indicating the MindFormers evaluation policy.| Yes | +| --model_args | str | Model and evaluation parameters. For details, see "MindFormers model parameters." | Yes | +| --tasks | str | Dataset name. Multiple datasets can be specified and separated by commas (,). | Yes | +| --batch_size | int | Number of batch processing samples. | No | +| --num_fewshot | int | Number of few-shot samples. | No | +| --limit | int | Number of samples for each task. This parameter is mainly used for function tests. | No | + +MindFormers model parameters + +| Parameter | Type | Description | Required| +|--------------|------|-----------------------------------|------| +| pretrained | str | Model directory. | Yes | +| use_past | bool | Specifies whether to enable incremental inference. This parameter must be enabled for evaluation tasks of the generate_until type.| No | +| device_id | int | Device ID. | No | +| use_parallel | bool | Specifies whether to enable the parallel policy. | No | +| dp | int | Data parallelism. | No | +| tp | int | Model parallelism. | No | + +#### Preparations Before Evaluation + +1. Create a model directory MODEL_DIR. +2. Store the MindFormers weight, YAML file, and tokenizer file in the model directory. For details about how to obtain the weight and files, see the README file of the MindFormers model. +3. Configure the yaml file. + +YAML configuration references: + +```yaml +run_mode: 'predict' +model: + model_config: + use_past: True + checkpoint_name_or_path: "model.ckpt" +processor: + tokenizer: + vocab_file: "tokenizer.model" +``` + +### Evaluation Example + +```shell +#!/bin/bash + +python toolkit/benchmarks/eval_with_harness.py --model mf --model_args "pretrained=./llama3-8b,use_past=True" --tasks gsm8k + +``` + +The evaluation result is as follows. Filter indicates the output mode of the matching model, Metric indicates the evaluation metric, Value indicates the evaluation score, and Stderr indicates the score error. + +| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr | +|-------|--------:|------------------|-------:|-------------|---|--------|---|--------| +| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.5034 | ± | 0.0138 | +| | | strict-match | 5 | exact_match | ↑ | 0.5011 | ± | 0.0138 | + +### Features + +For details about all Harness evaluation tasks, see [Viewing a Dataset Evaluation Task](#viewing-a-dataset-evaluation-task). 
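Multiple datasets can also be evaluated in a single run by passing a comma-separated task list, as described in the Harness parameter table above. The task names below are illustrative; check the output of `--tasks list` for the names available in your lm_eval version.

```shell
#!/bin/bash

python toolkit/benchmarks/eval_with_harness.py \
  --model mf \
  --model_args "pretrained=./llama3-8b,use_past=True" \
  --tasks gsm8k,mmlu \
  --batch_size 4
```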
\ No newline at end of file diff --git a/docs/mindformers/docs/source_en/usage/inference.md b/docs/mindformers/docs/source_en/usage/inference.md index f1dc1e12c3e8ca6829511b028ad8833040bc6a55..412bf306e6c28f7400bf23d9c550f1e20174a6a3 100644 --- a/docs/mindformers/docs/source_en/usage/inference.md +++ b/docs/mindformers/docs/source_en/usage/inference.md @@ -1,3 +1,176 @@ # Inference -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/inference.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/inference.md) + +## Overview + +MindFormers provides the foundation model inference capability. You can write a script to call the high-level pipeline API or run the unified script run_mindformer to start inference. In inference mode, you can easily set and execute model inference tasks using the pipeline API. The pipeline API simplifies the overall process from data preparation to model inference. The modular design allows users to define each phase of data processing and inference through configuration files or APIs. In addition, users can customize data processing logic and inference policies based on requirements. If the unified script run_mindformer is used, you can directly start the system through the configuration file without writing code. + +The following table lists the features supported by MindFormers text generation and inference. + +|Feature|Concept|Function| +|:------------|:---------------------------------------|:-----------------------------------------------------| +|[Incremental inference](#incremental-inference)|Incremental inference indicates that the model can generate text step by step instead of generating all content at a time.|You can accelerate the text generation speed when the text_generator method is called to generate autoregressive text. use_past is set to True in the YAML file by default to enable incremental inference.| +|[Batch inference](#multi-device-multi-batch-inference)|Batch inference is a method of processing multiple input samples at the same time.|You can input multiple samples to perform inference in batches. When the computing power of a single batch is insufficient, multi-batch inference can improve the inference throughput.| +|[Stream inference](#stream-inference)|Stream inference is a processing method that allows a model to start to output a result after receiving a part of an input, instead of waiting for the entire input sequence to be completely received.|With the Streamer class provided, when the text_generator method is called to generate text, you can view each generated word in real time without waiting for all results to be generated.| +|[Distributed inference](#multi-device-inference)|Distributed inference is a method of distributing computing tasks on multiple compute nodes for execution.|For models that cannot be deployed on a single device, you need to split the models using the multi-device distributed model before inference.| + +## Procedure + +Based on actual operations, the inference process can be divided into the following steps: + +1. **Selecting a model to be inferred:** + Select a model based on the required inference task. 
For example, select Llama2 for text generation. + +2. **Preparing the model weight:** + Download the weight of the corresponding model from the HuggingFace model library and convert the model to the CKPT format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html). + +3. **Executing inference tasks:** + Call the pipeline API or use the unified script **run_mindformer** to execute inference tasks. + +## Inference Based on the Pipeline API + +An inference task process can be generated based on the customized text of the pipeline API. Single-device inference and multi-device inference are supported. For details about how to use the pipeline API to start a task and output the result, see the following implementation. For details about the parameters, see [the pipeline API document](https://gitee.com/mindspore/mindformers/blob/dev/docs/api/api_python/mindformers.pipeline.rst). + +### Incremental Inference + +```python +from mindformers import build_context +from mindformers import AutoModel, AutoTokenizer, pipeline, TextStreamer + +# Construct the input content. +inputs = ["I love Beijing, because", "LLaMA is a", "Huawei is a company that"] + +# Initialize the environment. +build_context({'context': {'mode': 0}, 'parallel': {}, 'parallel_config': {}}) + +# Instantiate a tokenizer. +tokenizer = AutoTokenizer.from_pretrained('llama2_7b') + +# Instantiate a model. +# Modify the path to the local weight path. +model = AutoModel.from_pretrained('llama2_7b', checkpoint_name_or_path="path/to/llama2_7b.ckpt", use_past=True) + +# Start a non-stream inference task in the pipeline. +text_generation_pipeline = pipeline(task="text_generation", model=model, tokenizer=tokenizer) +outputs = text_generation_pipeline(inputs, max_length=512, do_sample=False, top_k=3, top_p=1) +for output in outputs: + print(output) +``` + +Save the example to **pipeline_inference.py**, modify the path for loading the weight, and run the **pipeline_inference.py** script. + +```shell +python pipeline_inference.py +``` + +The inference result is as follows: + +```text +'text_generation_text': [I love Beijing, because it is a city that is constantly constantly changing. I have been living here for ......] +'text_generation_text': [LLaMA is a large-scale, open-source, multimodal, multilingual, multitask, and multimodal pretrained language model. It is ......] +'text_generation_text': [Huawei is a company that has been around for a long time. ......] +``` + +### Stream Inference + +```python +from mindformers import build_context +from mindformers import AutoModel, AutoTokenizer, pipeline, TextStreamer + +# Construct the input content. +inputs = ["I love Beijing, because", "LLaMA is a", "Huawei is a company that"] + +# Initialize the environment. +build_context({'context': {'mode': 0}, 'parallel': {}, 'parallel_config': {}}) + +# Instantiate a tokenizer. +tokenizer = AutoTokenizer.from_pretrained('llama2_7b') + +# Instantiate a model. +# Modify the path to the local weight path. +model = AutoModel.from_pretrained('llama2_7b', checkpoint_name_or_path="path/to/llama2_7b.ckpt", use_past=True) + +# Start a stream inference task in the pipeline. 
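# TextStreamer prints each token as soon as it is generated, so the output appears word by word instead of after the whole result is ready.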
+streamer = TextStreamer(tokenizer) +text_generation_pipeline = pipeline(task="text_generation", model=model, tokenizer=tokenizer, streamer=streamer) +_ = text_generation_pipeline(inputs, max_length=512, do_sample=False, top_k=3, top_p=1) +``` + +Save the example to **pipeline_inference.py**, modify the path for loading the weight, and run the **pipeline_inference.py** script. + +```shell +python pipeline_inference.py +``` + +The inference result is as follows: + +```text +'text_generation_text': [I love Beijing, because it is a city that is constantly constantly changing. I have been living here for ......] +'text_generation_text': [LLaMA is a large-scale, open-source, multimodal, multilingual, multitask, and multimodal pretrained language model. It is ......] +'text_generation_text': [Huawei is a company that has been around for a long time. ......] +``` + +## Inference Based on the run_mindformer Script + +For single-device inference, you can directly run [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py). For multi-device inference, you need to run [scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh). Take Llama2 as an example. You are advised to configure the [predict_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_7b.yaml) file. + +## Single-Device Inference + +```shell +python run_mindformer.py \ +--config configs/llama2/predict_llama2_7b.yaml \ +--run_mode predict \ +--use_parallel False \ +--load_checkpoint path/to/checkpoint.ckpt \ +--predict_data 'I love Beijing, because' +``` + +## Multi-Device Inference + +```shell +bash scripts/msrun_launcher.sh "python run_mindformer.py \ +--config configs/llama2/predict_llama2_7b.yaml \ +--run_mode predict \ +--use_parallel True \ +--auto_trans_ckpt True \ +--load_checkpoint path/to/checkpoint.ckpt \ +--predict_data 'I love Beijing, because'" \ +2 +``` + +## Multi-Device Multi-Batch Inference + +```shell +bash scripts/msrun_launcher.sh "python run_mindformer.py \ +--config configs/llama2/predict_llama2_7b.yaml \ +--run_mode predict \ +--predict_batch_size 4 \ +--use_parallel True \ +--auto_trans_ckpt True \ +--load_checkpoint path/to/checkpoint.ckpt \ +--predict_data path/to/input_predict_data.txt" \ +2 +``` + +The following table describes the input parameters for script execution. +|Parameter|Description| +|:---------------------------------|:-------------------------------------------------------------------------| +|config|Path of the YAML file.| +|run_mode|Running mode. Set it to **predict** for inference.| +|predict_batch_size|Size of inferences in batches.| +|use_parallel|Specifies whether to use the multi-device inference.| +|auto_trans_ckpt|For multi-device inference, set this parameter to **True**, indicating automatic weight segmentation. The default value is **False**.| +|load_checkpoint|Loaded weight path.| +|predict_data|Input data for inference. For multi-batch inference, the path of the TXT file containing the input data needs to be specified.| +|2|In the multi-device inference command, **2** indicates the number of devices used for inference.| + +The results of running the preceding single-device and multi-device inference commands are as follows: + +```text +'text_generation_text': [I love Beijing, because it is a city that is constantly constantly changing. I have been living here for ......] 
+``` + +## More Information + +For more inference examples of different models, see [the models supported by MindFormers](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html). diff --git a/docs/mindformers/docs/source_en/usage/parameter_efficient_fine_tune.md b/docs/mindformers/docs/source_en/usage/parameter_efficient_fine_tune.md index 9a9c833c824da699cff6a4a50d3d4c8031318eb6..a5804df5413f451132ca0ef90ae29c2200336052 100644 --- a/docs/mindformers/docs/source_en/usage/parameter_efficient_fine_tune.md +++ b/docs/mindformers/docs/source_en/usage/parameter_efficient_fine_tune.md @@ -1,3 +1,98 @@ -# Low-Parameter Fine-Tuning +# Parameter-Efficient Fine-Tuning (PEFT) -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/parameter_efficient_fine_tune.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/parameter_efficient_fine_tune.md) + +## Overview + +In the fine-tuning process of a deep learning model, all weights of the model need to be updated, which causes a large amount of computing resource consumption. Low-rank adaptation (LoRA) is a technology that significantly reduces the number of parameters required for fine-tuning by decomposing a partial weight matrix of a model into low-rank matrices. With Huawei Ascend AI processors, MindSpore deep learning framework, and MindFormers foundation model suite, LoRA can be used for PEFT of large-scale pretrained models (such as Llama2), providing efficient and flexible model customization capabilities. + +## LoRA Principles + +LoRA achieves a significant reduction in the number of parameters by decomposing the weight matrix of the original model into two low-rank matrices. For example, assuming that the size of a weight matrix **W** is *m* x *n*, the matrix is decomposed into two low-rank matrices **A** and **B** by using LoRA, where the size of **A** is *m* x *r*, and the size of **B** is *r* x *n* (*r* is far less than *m* and *n*). In the fine-tuning process, only the two low-rank matrices are updated without changing other parts of the original model. +This method not only greatly reduces the computing overhead of fine-tuning, but also retains the original performance of the model. It is especially suitable for model optimization in environments with limited data volume and computing resources. For details, see [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685). + +## LoRA Fine-Tuning Process + +The following key steps are required for LoRA PEFT: + +1. **Pretrained model weight loading**: Load base weights from the pretrained model. These weights are parameters obtained after the model is trained on a large-scale dataset. + +2. **Dataset preparation**: Select and prepare a dataset for fine-tuning. The dataset must be related to the target task, and the format must match the input format of the model. + +3. **Fine-tuning parameter settings**: Set fine-tuning parameters, including the learning rate, optimizer type, and batch size. + +4. **LoRA parameter settings**: Set the **pet_config** parameter at the key layer (such as the attention layer) of the model and adjust the low-rank matrix to update the model parameters. 
+ +5. **Fine-tuning process startup**: Use the set parameters and datasets to start the fine-tuning process in a distributed environment. + +6. **Evaluation and saving**: During or after fine-tuning, evaluate the model performance, and save the fine-tuned model weight. + +## Using MindFormers for LoRA PEFT of Llama2 + +In the distributed environment of Ascend AI processors, the MindFormers suite can be used to easily implement the LoRA PEFT. The following shows the core configuration part of the LoRA fine-tuning of the Llama2 model and details the **pet_config** parameters. + +### YAML File Example + +For details about the complete YAML file, see [the Llama2 LoRA fine-tuning YAML file](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/lora_llama2_7b.yaml). + +```yaml +# model config +model: + model_config: + type: LlamaConfig + batch_size: 1 + seq_length: 4096 + hidden_size: 4096 + num_layers: 32 + num_heads: 32 + vocab_size: 32000 + compute_dtype: "float16" + pet_config: + pet_type: lora + lora_rank: 16 + lora_alpha: 16 + lora_dropout: 0.05 + target_modules: '.*wq|.*wk|.*wv|.*wo' + arch: + type: LlamaForCausalLM +``` + +### pet_config Parameters + +In **model_config**, **pet_config** is the core setting part of LoRA fine-tuning and is used to specify LoRA parameters. The parameters are described as follows: + +- **pet_type**: specifies that the type of the parameter-efficient tuning (PET) is LoRA. The LoRA module is inserted in the key layer of the model to reduce the number of parameters required for fine-tuning. +- **lora_rank**: specifies the rank value of a low-rank matrix. A smaller rank value indicates fewer parameters that need to be updated during fine-tuning, reducing occupation of computing resources. The value **16** is a common equilibrium point, which significantly reduces the number of parameters while maintaining the model performance. +- **lora_alpha**: specifies the scaling ratio for weight update in the LoRA module. This value determines the amplitude and impact of weight update during fine-tuning. The value **16** indicates that the scaling amplitude is moderate, stabilizing the training process. +- **lora_dropout**: specifies the dropout probability in the LoRA module. Dropout is a regularization technique used to reduce overfitting risks. The value **0.05** indicates that there is a 5% probability that some neuron connections are randomly disabled during training. This is especially important when the data volume is limited. +- **target_modules**: specifies the weight matrices to which LoRA applies in the model by using a regular expression. In Llama, the configuration here applies LoRA to the Query (WQ), Key (WK), Value (WV), and Output (WO) matrices in the self-attention mechanism of the model. These matrices play a key role in the Transformer structure. After LoRA is inserted, the model performance can be maintained while the number of parameters is reduced. + +By configuring these parameters, LoRA can effectively reduce the computing resource usage during fine-tuning while maintaining the high performance of the model. + +### Examples of LoRA Fine-Tuning for Llama2-7B + +MindFormers provides [the LoRA fine-tuning examples](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#lora%E5%BE%AE%E8%B0%83) of Llama2-7B. For details about the dataset used during fine-tuning, see [dataset downloading](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). + +Take Llama2-7B as an example. 
You can run the following **msrun** startup script to perform 8-device distributed fine-tuning. + +```shell +bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/lora_llama2_7b.yaml \ + --train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord \ + --load_checkpoint /{path}/llama2_7b.ckpt \ + --auto_trans_ckpt False \ + --use_parallel True \ + --run_mode finetune" 8 +``` + +If the weight can be loaded only after conversion, the weight loading path must be set to the upper-layer path of **rank_0** and the automatic weight conversion function must be enabled (**--auto_trans_ckpt** is set to **True**). + +```shell +bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/lora_llama2_7b.yaml \ + --train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord \ + --load_checkpoint /{path}/checkpoint/ \ + --auto_trans_ckpt True \ + --use_parallel True \ + --run_mode finetune" 8 +``` diff --git a/docs/mindformers/docs/source_en/usage/pre_training.md b/docs/mindformers/docs/source_en/usage/pre_training.md index 9c08b40b136acb7c6a1207d17ffe5db55267d999..a07e99f0c028882e99028c052e9246be81e567d3 100644 --- a/docs/mindformers/docs/source_en/usage/pre_training.md +++ b/docs/mindformers/docs/source_en/usage/pre_training.md @@ -1,3 +1,87 @@ -# Pre-training +# Pretraining -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/pre_training.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/pre_training.md) + +## Overview + +Pretraining refers to training a model on large-scale unlabeled data, so that the model can comprehensively capture a wide range of features of a language. A pretrained model can learn knowledge at the vocabulary, syntax, and semantic levels. After fine-tuning, the knowledge is applied in downstream tasks to optimize the performance of specific tasks. The objective of the MindFormers framework pretraining is to help developers quickly and conveniently build and train pretrained models based on the Transformer architecture. + +## Procedure + +Based on actual operations, the basic pretraining process can be divided into the following steps: + +1. **Preparing a dataset:** + Prepare a large-scale unlabeled text dataset for pretraining. Such datasets contain a large amount of text from multiple sources, such as networks, books, and articles. The diversity and scale of datasets have a great impact on the generalization capability of models. + +2. **Selecting a model architecture:** + Select a proper model architecture to build a pretrained model based on task requirements and computing resources. + +3. **Pretraining:** + Perform pretraining with the prepared large-scale dataset and use the configured model architecture and training configuration to perform long-time training to generate the final pretrained model weight. + +4. **Saving a model:** + After the training is complete, save the model weight to the specified location. + +## MindFormers-based Pretraining Practice + +Currently, MindFormers supports mainstream foundation models in the industry. 
In this practice, Llama2-7B and Llama3-70B are used to demonstrate [Single-Node Training](#single-node-training) and [Multi-Node Training](#multi-node-training), respectively. + +### Preparing a Dataset + +| Dataset | Applicable Model | Applicable Phase | Download Link | +|:--------|:----------:|:--------:|:-------------------------------------------------------------------------------:| +| Wikitext2 | Llama2-7B | Pretrain | [Link](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/dataset/wikitext-2/wikitext-2-v1.zip) | +| Wiki103 | Llama3-70B | Pretrain | [Link](https://dagshub.com/DagsHub/WIkiText-103/src/main/dataset/tokens) | + +### Data Preprocessing + +For details about how to process the Llama2-7B and Llama3-70B datasets, see [the Wikitext2 data preprocessing](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md) and [the Wiki103 data preprocessing](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3.md), respectively. + +## Executing a Pretrained Task + +### Single-Node Training + +Take Llama2-7B as an example. Specify the configuration file [pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/pretrain_llama2_7b.yaml) and start the [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py) script in msrun mode to perform 8-device distributed training. The startup command is as follows: + +```bash +bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/pretrain_llama2_7b.yaml \ + --train_dataset_dir /{path}/wiki4096.mindrecord \ + --use_parallel True \ + --run_mode train" 8 + + # Parameters: + config: model configuration file, which is stored in the config directory of the MindFormers code repository. + train_dataset_dir: path of the training dataset. + use_parallel: specifies whether to enable parallelism. + run_mode: running mode. The value can be train, finetune, or predict (inference). + ``` + +After the task is executed, the **checkpoint** folder is generated in the **mindformers/output** directory, and the model file is saved in this folder. + +### Multi-Node Training + +Take Llama3-70B as an example. Use the [pretrain_llama3_70b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/pretrain_llama3_70b.yaml) configuration file to run [run_llama3.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/run_llama3.py) in msrun mode to perform 8-node 64-device pretraining. To perform distributed training on a multi-node multi-device script, you need to run the script on different nodes and set the **MASTER_ADDR** parameter to the IP address of the primary node. The IP addresses of all nodes are the same, and only the values of **NODE_RANK** are different for different nodes. For details about the parameter positions, see [msrun Launching Guide](https://www.mindspore.cn/docs/en/master/model_train/parallel/msrun_launcher.html). + +```shell +# Node 0: Set the IP address of node 0 to the value of MASTER_ADDR, which is used as the IP address of the primary node. There are 64 devices in total with 8 devices for each node. +# Change the value of node_num for nodes 0 to 7 in sequence. For example, if there are eight nodes, the value of node_num ranges from 0 to 7. 
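# The positional arguments that follow the quoted command are assumed to be, in order: total number of devices (64),
# number of devices per node (8), master node address, master port (8118), rank of the current node,
# log directory, whether to join and print worker logs, and the cluster timeout in seconds.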
+bash scripts/msrun_launcher.sh "run_llama3.py \ + --config pretrain_llama3_70b.yaml \ + --train_dataset dataset_dir + --use_parallel True \ + --run_mode train" \ + 64 8 {MASTER_ADDR} 8118 {node_num} output/msrun_log False 300 + + # Parameters: + config: model configuration file, which is stored in the config directory of the MindFormers code repository. + train_dataset_dir: path of the training dataset. + use_parallel: specifies whether to enable parallelism. + run_mode: running mode. The value can be train, finetune, or predict (inference). +``` + +**Note**: During multi-node distributed training, some performance problems may occur. To ensure the efficiency and stability of the training process, you are advised to optimize and adjust the performance by referring to [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html). + +## More Information + +For more training examples of different models, see [the models supported by MindFormers](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html). diff --git a/docs/mindformers/docs/source_en/usage/quantization.md b/docs/mindformers/docs/source_en/usage/quantization.md index 6a2d5b77d7602c99b50065aec8911cf12d739ece..8d30412672091458f6e05d2b1c0ee25b26a36b48 100644 --- a/docs/mindformers/docs/source_en/usage/quantization.md +++ b/docs/mindformers/docs/source_en/usage/quantization.md @@ -1,3 +1,256 @@ # Quantization -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/quantization.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/quantization.md) + +## Overview + +Quantization is an important technology for compressing foundation models. It converts floating-point parameters in a model into low-precision integer parameters to compress the parameters. As the parameters and specifications of a model increase, quantization can effectively reduce the model storage space and loading time during model deployment, improving the model inference performance. + +MindFormers integrates the MindSpore Golden Stick tool component to provide a unified quantization inference process, facilitating out-of-the-box use. + +## Auxiliary Installation + +Before using the quantization inference function, install MindSpore Golden Stick. For details, see [Installation](https://gitee.com/mindspore/golden-stick/blob/master/README.md/#documents). + +Download the source code and go to the `golden_stick` directory. + +```bash +bash build.sh +pip install output/mindspore_gs-0.6.0-py3-none-any.whl +``` + +Run the following commands to verify the installation: + +```bash +pip show mindspore_gs + +# Name: mindspore_gs +# Version: 0.6.0 +# Summary: A MindSpore model optimization algorithm set.. +# Home-page: https://www.mindspore.cn +# Author: The MindSpore Authors +# Author-email: contact@mindspore.cn +# License: Apache 2.0 +``` + +## Procedure + +Based on actual operations, quantization may be decomposed into the following steps: + +1. **Selecting a model:** + Select a language model. Currently, the Llama2_13B and Llama2_70B models support quantization. + +2. 
**Downloading the model weights:** + Download the weight of the corresponding model from the HuggingFace model library and convert the model to the CKPT format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html). + +3. **Converting the quantization model weight:** + Run the conversion script `quant_ckpt.py` in the mindspore_gs library to convert the original weight in step 2 to the quantization weight. + +4. **Preparing the quantization configuration file:** + Use the built-in quantization inference configuration file of MindFormers that matches the model. The quantization-related configuration item is `model.model_config.quantization_config`. + + The following uses the `llama2_13b_rtn` quantization model as an example. The default quantization configuration is as follows: + + ```yaml + quantization_config: + quant_method: 'rtn' + weight_dtype: 'int8' + activation_dtype: None + kvcache_dtype: None + outliers_suppression: None + modules_to_not_convert: ['lm_head'] + algorithm_args: {} + ``` + + | Parameter | Attribute| Description | Type | Value Range | + | ---------------------- | ---- |:----------------------------------------------| --------- |------------------| + | quant_method | Required| Supported quantization algorithm. Currently, only the RTN, Smooth_Quant, and PTQ algorithms are supported. | str | rtn/smooth_quant/ptq | + | weight_dtype | Required| Quantized weight type. Currently, only int8 is supported. | str | int8/None | + | activation_dtype | Required| Activation type of the parameter. **None** indicates that the original computing type (**compute_dtype**) of the network remains unchanged. | str | int8/None | + | kvcache_dtype | Optional| KVCache quantization type. If the value is **None** or not specified, the original KVCache data type remains unchanged. | str | int8/None | + | outliers_suppression | Optional| Algorithm type used for abnormal value suppression. Currently, only smooth suppression is supported. | str | smooth/None | + | modules_to_not_convert | Required| Layer that is not quantized. | List[str] | / | + | algorithm_args | Required| Configurations of different algorithm types for connecting to the MindSpore Golden Stick. For example, **alpha** is set to **0.5** for the Smooth_Quant algorithm.| Dict | / | + +5. **Executing inference tasks:** + Implement the inference script based on the `generate` API and run the script to obtain the inference result. + +## Using the RTN Quantization Algorithm to Perform A16W8 Quantization Inference Based on the Llama2_13B Model + +### Selecting a Model + +In this practice, the Llama2-13B model is used for single-device quantization inference. + +In this practice, `AutoModel.from_pretrained()` is used to instantiate a model by specifying the models or weight path. You need to create a storage directory in advance. + +```shell +mkdir /data/tutorial/llama2_13b_rtn_a16w8_dir +``` + +> Note: Currently, the AutoModel.from_pretrained() API does not support instantiation by specifying parameters based on the quantized model name. + +Directory structure of a single device + +```shell +llama2_13b_rtn_a16w8_dir + ├── predict_llama2_13b_rtn.yaml + └── llama2_13b_rtn_a16w8.ckpt +``` + +### Downloading the Model Weights + +MindFormers provides pretrained weights and vocabulary files that have been converted for pretraining, fine-tuning, and inference. 
You can also download the official HuggingFace weights and perform the operations in [Converting Model Weights](#converting-model-weights) before using these weights. + +You can download the vocabulary at [tokenizer.model](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/tokenizer.model). + +| Model | MindSpore Weight | HuggingFace Weight | +|:----------------|:----------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------:| +| llama2-13b | [llama2-13b-fp16.ckpt](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/llama2-13b-fp16.ckpt) | [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | + +> Note: All weights of Llama2 need to be obtained by [submitting an application](https://ai.meta.com/resources/models-and-libraries/llama-downloads) to Meta. If necessary, apply for the weights by yourself. + +### Converting Model Weights + +Go to the root directory `golden-stick` of the mindspore_gs library and run the quantization weight conversion script. + +```bash +python example/ptq/quant_ckpt.py -c /path/to/predict_llama2_13b.yaml -s /path/to/boolq/dev.jsonl -t boolq -a rtn-a16w8 > log_rtn_a16w8_quant 2>& +``` + +Set `load_checkpoint` in `predict_llama2_13b.yaml` to the path for storing the original weight downloaded in the previous step. + +During the conversion, `boolq` is used to verify the datasets. You can download it at [the boolq dataset link](https://github.com/svinkapeppa/boolq). After the download is complete, specify the path for storing `dev.jsonl` in the preceding script. + +Run the script to copy the generated quantization weight file to the `llama2_13b_rtn_a16w8_dir` directory. + +```shell +cp output/rtn-a16w8_ckpt/rank_0/rtn-a16w8.ckpt /data/tutorial/llama2_13b_rtn_a16w8_dir/llama2_13b_rtn_a16w8.ckpt +``` + +### Preparing the Quantization Configuration File + +The configuration file [predict_llama2_13b_rtn.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_13b_rtn.yaml) is provided in MindFormers. You need to copy it to the `llama2_13b_rtn_a16w8_dir` directory. + +```shell +cp configs/llama2/predict_llama2_13b_rtn.yaml /data/tutorial/llama2_13b_rtn_a16w8_dir +``` + +### Executing Inference Tasks + +1. **Script instances** + + Replace the [run_llama2_generate.py](https://gitee.com/mindspore/mindformers/blob/dev/scripts/examples/llama2/run_llama2_generate.py) script in MindFormers with the following code. + + In this practice, the quantization model is instantiated based on the `AutoModel.from_pretrained()` API. You need to modify the parameters in the API to the created directory. + + You can call the `generate` API to obtain the inference result. For details about the parameters, see the [AutoModel](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers/mindformers.AutoModel.html) and [generate](https://www.mindspore.cn/mindformers/docs/en/dev/generation/mindformers.generation.GenerationMixin.html) API documents. 
+ + ```python + """llama2 predict example.""" + import argparse + import os + + import mindspore as ms + from mindspore import Tensor, Model + from mindspore.common import initializer as init + + from mindformers import AutoModel + from mindformers import MindFormerConfig, logger + from mindformers.core.context import build_context + from mindformers.core.parallel_config import build_parallel_config + from mindformers.models.llama import LlamaTokenizer + from mindformers.trainer.utils import transform_and_load_checkpoint + + + def main(config_path, use_parallel, load_checkpoint): + # Construct the input content. + inputs = ["I love Beijing, because", + "LLaMA is a", + "Huawei is a company that"] + batch_size = len(inputs) + + # Generate model configurations based on the YAML file. + config = MindFormerConfig(config_path) + config.use_parallel = use_parallel + device_num = os.getenv('MS_WORKER_NUM') + logger.info(f"Use device number: {device_num}, it will override config.model_parallel.") + config.parallel_config.model_parallel = int(device_num) if device_num else 1 + config.parallel_config.data_parallel = 1 + config.parallel_config.pipeline_stage = 1 + config.load_checkpoint = load_checkpoint + + # Initialize the environment. + build_context(config) + build_parallel_config(config) + model_name = config.trainer.model_name + + # Instantiate a tokenizer. + tokenizer = LlamaTokenizer.from_pretrained(model_name) + # Instantiate the model. + network = AutoModel.from_pretrained("/data/tutorial/llama2_13b_rtn_a16w8_dir", + download_checkpoint=False) + model = Model(network) + + # Load weights. + if config.load_checkpoint: + logger.info("----------------Transform and load checkpoint----------------") + seq_length = config.model.model_config.seq_length + input_ids = Tensor(shape=(batch_size, seq_length), dtype=ms.int32, init=init.One()) + infer_data = network.prepare_inputs_for_predict_layout(input_ids) + transform_and_load_checkpoint(config, model, network, infer_data, do_predict=True) + + inputs_ids = tokenizer(inputs, max_length=config.model.model_config.seq_length, padding="max_length")["input_ids"] + + outputs = network.generate(inputs_ids, + max_length=config.model.model_config.max_decode_length, + do_sample=config.model.model_config.do_sample, + top_k=config.model.model_config.top_k, + top_p=config.model.model_config.top_p) + for output in outputs: + print(tokenizer.decode(output)) + + + if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument('--config_path', default='predict_llama2_7b.yaml', type=str, + help='model config file path.') + parser.add_argument('--use_parallel', action='store_true', + help='if run model prediction in parallel mode.') + parser.add_argument('--load_checkpoint', type=str, + help='load model checkpoint path or directory.') + + args = parser.parse_args() + main( + args.config_path, + args.use_parallel, + args.load_checkpoint + ) + ``` + +2. **Startup script** + + MindFormers provides a quick inference script for the `Llama2` model, supporting single-device, multi-device, and multi-batch inferences. + + ```shell + # Script usage + bash scripts/examples/llama2/run_llama2_predict.sh PARALLEL CONFIG_PATH CKPT_PATH DEVICE_NUM + # Parameters + PARALLEL: specifies whether to use multi-device inference. 'single' indicates single-device inference, and 'parallel' indicates multi-device inference. + CONFIG_PATH: model configuration file path. + CKPT_PATH: path of the model weight file. + DEVICE_NUM: number of used devices. 
This parameter takes effect only when multi-device inference is enabled. + ``` + + Single-Device Inference + + ```bash + bash scripts/examples/llama2/run_llama2_predict.sh single /data/tutorial/llama2_13b_rtn_a16w8_dir/predict_llama2_13b_rtn.yaml /data/tutorial/llama2_13b_rtn_a16w8_dir/llama2_13b_rtn_a16w8.ckpt + ``` + + The inference result is as follows: + + ```text + 'text_generation_text': [I love Beijing, because it is a city that is constantly constantly changing. I have been living here for ......] + 'text_generation_text': [LLaMA is a large-scale, open-source, multimodal, multilingual, multitask, and multimodal pretrained language model. It is ......] + 'text_generation_text': [Huawei is a company that has been around for a long time. ......] + ``` diff --git a/docs/mindformers/docs/source_en/usage/sft_tuning.md b/docs/mindformers/docs/source_en/usage/sft_tuning.md index 47101f6a6bddb1b4577dd5b791ac60079921d3e7..4b16db7f75bb1610232bea081b18470e73bcaf6f 100644 --- a/docs/mindformers/docs/source_en/usage/sft_tuning.md +++ b/docs/mindformers/docs/source_en/usage/sft_tuning.md @@ -1,3 +1,161 @@ -# SFT-Tuning +# Supervised Fine-Tuning (SFT) -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/sft_tuning.md) \ No newline at end of file +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/sft_tuning.md) + +## Overview + +SFT uses supervised learning: a model pretrained on a source dataset has its parameters fine-tuned on a new dataset to obtain a new model that performs better on the new task. + +## Process + +SFT consists of the following steps: + +- **Pretraining:** + A neural network model is trained on a large-scale dataset. For example, an LLM is trained on a large amount of unlabeled text data. The objective of the pretraining phase is to equip the model with general knowledge and understanding capabilities. +- **Fine-tuning:** + Based on the target task, the obtained pretrained model is fine-tuned by using the new training dataset. During fine-tuning, all or some parameters of the original model are optimized through backpropagation so that the model performs better on the target task. +- **Evaluation:** + After fine-tuning, a new model is obtained. It can be evaluated on the evaluation dataset of the target task to obtain its performance metrics. + +In practice, SFT can be broken down into the following steps: + +1. **Selecting a pretrained model:** + Select a pretrained language model, for example, GPT-2 or Llama2. The pretrained model is trained on a large text corpus to learn a general representation of a language. +2. **Downloading the model weights:** + For the selected pretrained model, download the pretrained weights from the HuggingFace model library. +3. **Converting model weights:** + Convert the downloaded HuggingFace weights to the format required by the target framework, for example, the CKPT format supported by MindSpore. +4. **Preparing a dataset:** + Select a dataset for fine-tuning tasks based on the fine-tuning objective. 
For LLMs, the fine-tuning dataset is data that contains text and labels, for example, the alpaca dataset. When using a dataset, you need to preprocess the corresponding data. For example, when using the MindSpore framework, you need to convert the dataset to the MindRecord format. +5. **Performing a fine-tuning task:** + Use the dataset of the fine-tuning task to train the pre-trained model and update the model parameters. If all parameters are fine-tuned, all parameters are updated. After the fine-tuning task is complete, a new model can be obtained. + +## MindFormers-based Full-Parameter Fine-Tuning Practice + +### Selecting a Pretrained Model + +MindFormers supports mainstream foundation models in the industry. This practice uses the Llama2-7B model for SFT as an example. + +### Downloading the Model Weights + +MindFormers provides pretrained weights and vocabulary files that have been converted for pretraining, fine-tuning, and inference. You can also download the official HuggingFace weights and convert model weights before using these weights. + +You can download the vocabulary at [tokenizer.model](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/tokenizer.model). + +| Model | MindSpore Weight | HuggingFace Weight | +|:----------|:------------------------:| :----------------------: | +| Llama2-7B | [Link](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/llama2_7b.ckpt) | [Link](https://huggingface.co/meta-llama/Llama-2-7b-hf) | + +> All weights of Llama2 need to be obtained by [submitting an application](https://ai.meta.com/resources/models-and-libraries/llama-downloads) to Meta. If necessary, apply for the weights by yourself. + +### Converting Model Weights + +Take the [Llama2-7B model](https://huggingface.co/meta-llama/Llama-2-7b-hf/tree/main) as an example. The original HuggingFace weight file contains the following information:
+ +- `config.json`: main configuration information of the model architecture.
+- `generation_config.json`: configuration information related to text generation.
+- `safetensors file`: model weight file.
+- `model.safetensors.index.json`: JSON index file that maps model parameters to the safetensors weight shards.
+- `bin file`: PyTorch model weight file.
+- `pytorch_model.bin.index.json`: JSON index file that maps model parameters to the PyTorch bin weight shards.
+- `tokenizer.json`: tokenizer vocabulary configuration file.
+- `tokenizer.model`: tokenizer of the model.
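+
+Before converting, it can help to confirm that the download is complete. The following is a minimal shell sketch, not part of MindFormers, that checks a locally downloaded directory (the path `/path/to/Llama-2-7b-hf` is a placeholder) for the files listed above:
+
+```bash
+# Hypothetical local download directory; replace with your own path.
+HF_DIR=/path/to/Llama-2-7b-hf
+
+# Check the small configuration and tokenizer files.
+for f in config.json generation_config.json tokenizer.json tokenizer.model; do
+    if [ -f "$HF_DIR/$f" ]; then echo "$f: found"; else echo "$f: missing"; fi
+done
+
+# The weights themselves may be stored as safetensors or PyTorch bin shards;
+# count whichever format is present.
+ls "$HF_DIR"/*.safetensors "$HF_DIR"/*.bin 2>/dev/null | wc -l
+```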
+ +MindFormers provides a weight conversion script. You can run the conversion script [convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/convert_weight.py) to convert the HuggingFace weights to the complete CKPT weights. + +```bash +python convert_weight.py --model llama --input_path TORCH_CKPT_DIR --output_path {path}/MS_CKPT_NAME +``` + +Parameters: + +```commandline +model: model name. For details about other models, see the model description document. +input_path: path of the folder where the HuggingFace weight is downloaded. +output_path: path for storing the converted MindSpore weight file. +``` + +### Preparing a Dataset + +MindFormers provides **WikiText2** as the pretraining dataset and **alpaca** as the fine-tuning dataset. + +| Dataset | Applicable Model | Applicable Phase | Download Link | +|:----------|:-------------------------------------:|:---------:| :--------------------------------------------------------------------------------------------------------------------------------------------------------------: | +| alpaca | Llama2-7B
Llama2-13B
Llama2-70B | Fine-tuning | [Link](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) | + +The following uses the alpaca dataset as an example. After downloading the dataset, you need to preprocess it. For details about how to download the `tokenizer.model` used in preprocessing, see the model weight download. + +**alpaca Data Preprocessing** + +1. Run the [alpaca_converter.py script](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/alpaca_converter.py) in MindFormers to convert the dataset into the multi-round dialog format. + + ```bash + python alpaca_converter.py \ + --data_path /{path}/alpaca_data.json \ + --output_path /{path}/alpaca-data-conversation.json + ``` + + Parameters: + + ```commandline + data_path: path of the file to be downloaded. + output_path: path for storing output files. + ``` + +2. Run the [llama_preprocess.py script](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/llama_preprocess.py) in MindFormers to convert the data into the MindRecord format. This operation depends on the fastchat tool package to parse the prompt template. You need to install fastchat 0.2.13 or later and Python 3.9 in advance. + + ```bash + python llama_preprocess.py \ + --dataset_type qa \ + --input_glob /{path}/alpaca-data-conversation.json \ + --model_file /{path}/tokenizer.model \ + --seq_length 4096 \ + --output_file /{path}/alpaca-fastchat4096.mindrecord + ``` + + Parameters: + + ```commandline + dataset_type: type of the data to be preprocessed. + input_glob: path of the converted alpaca file. + model_file: path of the tokenizer.model file. + seq_length: sequence length of the output data. + output_file: path for storing output files. + ``` + +### Performing a Fine-tuning Task + +#### Single-Node Training + +Take Llama2-7B as an example. Run the startup script **msrun** to perform 8-device distributed training. The startup command is as follows: + +```bash +bash scripts/msrun_launcher.sh "run_mindformer.py \ + --config configs/llama2/finetune_llama2_7b.yaml \ + --load_checkpoint /{path}/llama2_7b.ckpt \ + --train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord \ + --use_parallel True \ + --run_mode finetune" 8 +``` + +Parameters: + +```commandline +config: model configuration file, which is stored in the config directory of the MindFormers code repository. +load_checkpoint: path of the checkpoint file. +train_dataset_dir: path of the training dataset. +use_parallel: specifies whether to enable parallelism. +run_mode: running mode. The value can be train, finetune, or predict (inference). +``` + +After the task is executed, the **checkpoint** folder is generated in the **mindformers/output** directory, and the model file is saved in this folder. + +#### Multi-Node Training + +The multi-node multi-device fine-tuning task is similar to the pretrained task. You can refer to the multi-node multi-device pretraining command and modify the command as follows: + +1. Add the input parameter `--load_checkpoint /{path}/llama2_7b.ckpt` to the startup script to load the pretrained weights. +2. Set `--train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord` in the startup script to load the fine-tuning dataset. +3. Set `--run_mode finetune` in the startup script. **run_mode** indicates the running mode, whose value can be **train**, **finetune**, or **predict** (inference). 
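+
+Putting the three modifications above together, a two-node (16-device) launch might look like the following sketch. The trailing node arguments (total worker number, local worker number, master address, master port, node rank, log directory, join flag, and cluster timeout) are assumed to follow the `msrun_launcher.sh` usage shown in the pretraining guide; the address `192.168.1.1`, port `8118`, and `/{path}` entries are placeholders to replace with your own values.
+
+```bash
+# Run on node 0 (the master node); on node 1, change the node rank from 0 to 1.
+bash scripts/msrun_launcher.sh "run_mindformer.py \
+ --config configs/llama2/finetune_llama2_7b.yaml \
+ --load_checkpoint /{path}/llama2_7b.ckpt \
+ --train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord \
+ --use_parallel True \
+ --run_mode finetune" \
+ 16 8 192.168.1.1 8118 0 output/msrun_log False 300
+```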
+ +After the task is executed, the **checkpoint** folder is generated in the **mindformers/output** directory, and the model file is saved in this folder. diff --git a/docs/mindformers/docs/source_zh_cn/function/transform_weight.md b/docs/mindformers/docs/source_zh_cn/function/transform_weight.md index 5c23244da4e3ac434b571fe3570148cee324dc13..5263725b844d14d37986a5099f8303a4f4f61878 100644 --- a/docs/mindformers/docs/source_zh_cn/function/transform_weight.md +++ b/docs/mindformers/docs/source_zh_cn/function/transform_weight.md @@ -90,14 +90,14 @@ transform_process_num: 2 **离线权重转换**相关`yaml`参数说明如下: -| 参数名称 | 说明 | -| ----------------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| 参数名称 | 说明 | +| ----------------- |-----------------------------| | src_checkpoint | 源权重的绝对路径或文件夹路径。
- 如果是**完整权重**,则填写**绝对路径**;
- 如果是**分布式权重**,则填写**文件夹路径**,分布式权重须按照`model_dir/rank_x/xxx.ckpt`格式存放,文件夹路径填写为`model_dir`。
**如果rank_x文件夹下存在多个ckpt,将会使用文件名默认排序最后的ckpt文件用于转换。** | | src_strategy | 源权重对应的分布式策略文件路径。
- 如果是完整权重,则**不填写**;
- 如果是分布式权重,且使用了流水线并行,则填写**合并的策略文件路径**或**分布式策略文件夹路径**;
- 如果是分布式权重,且未使用流水线并行,则填写任一**ckpt_strategy_rank_x.ckpt**路径; | -| dst_checkpoint | 保存目标权重的文件夹路径。 | -| dst_strategy | 目标权重对应的分布式策略文件路径。
- 如果是完整权重,则**不填写**;
- 如果是分布式权重,且使用了流水线并行,则填写**合并的策略文件路径**或**分布式策略文件夹路径**;
- 如果是分布式权重,且未使用流水线并行,则填写任一**ckpt_strategy_rank_x.ckpt**路径; | -| prefix | 目标权重保存的前缀名,权重保存为”{prefix}rank_x.ckpt”,默认”checkpoint_”。 | -| world_size | 目标权重的切片总数,一般等于dp \* mp \* pp。 | +| dst_checkpoint | 保存目标权重的文件夹路径。 | +| dst_strategy | 目标权重对应的分布式策略文件路径。
- 如果是完整权重,则**不填写**;
- 如果是分布式权重,且使用了流水线并行,则填写**合并的策略文件路径**或**分布式策略文件夹路径**;
- 如果是分布式权重,且未使用流水线并行,则填写任一**ckpt_strategy_rank_x.ckpt**路径; | +| prefix | 目标权重保存的前缀名,权重保存为”{prefix}rank_x.ckpt”,默认”checkpoint_”。 | +| world_size | 目标权重的切片总数,一般等于dp \* mp \* pp。 | | process_num | 离线权重转换使用的进程数,默认为1。
- 如果process_num = 1,使用**单进程转换**;
- 如果process_num > 1,使用**多进程转换**,比如转换的目标权重为8卡分布式权重,process_num=2时,会启动两个进程分别负责rank_0/1/2/3和rank_4/5/6/7切片权重的转换; | ### 离线转换配置说明 diff --git a/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md b/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md index 22a13ba0f37ef004bf85dbda86ed4964760ba7bc..b1e4b8b576755533bb3f8930ff2d05c5c4df6af9 100644 --- a/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md +++ b/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md @@ -74,13 +74,13 @@ python convert_weight.py --model llama2 --input_path /home/user/torch_weights -- ## 未支持模型权重转换开发 1. 在扩展模型目录下新增`convert_weight.py`及`convert_reversed.py`文件。 -2. 在文件中分别编写`conver_pt_to_ms`及`conver_ms_to_pt`权重转换函数,函数参数为`input_path`、`output_path`、`dtype`及额外参数`**kwargs`。 +2. 在文件中分别编写`convert_pt_to_ms`及`convert_ms_to_pt`权重转换函数,函数参数为`input_path`、`output_path`、`dtype`及额外参数`**kwargs`。 3. 在MindFormers根目录下`convert_weight.py`文件中的`convert_map`和`reversed_convert_map`字典中加入扩展模型名称及转换函数引入路径。 4. 额外参数在`main`函数中通过调用`parser.add_argument()`方法新增。 ## 模型权重转换开发示例 -此处以Llama为例。如若希望转换HuggingFace权重至MindFormers权重,需在[convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py)内定义`conver_pt_to_ms`函数: +此处以Llama为例。如若希望转换HuggingFace权重至MindFormers权重,需在[convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py)内定义`convert_pt_to_ms`函数: ```python def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs): diff --git a/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md b/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md index cd30f4fc531c149b23069ea826f04f555b89c43c..a86aa58bd4e1b2f9bedac180bf0016f900f0bf03 100644 --- a/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md +++ b/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md @@ -134,4 +134,4 @@ Llama3-8B的参数命名和Llama2-7B一致,因此可以复用Llama2-7B的权 由于Llama3-8B的分词器与Llama2-7B不同,因此Llama3-8B需要在Llama2-7B的数据集处理脚本的基础上,替换Llama3-8B的分词器对数据进行预处理,参考[conversation.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/conversation.py)和[llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama_preprocess.py)。 -关于MindFormers中Llama3的具体实现,可以参考MindFormers仓库中[Llama3的文件夹](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3)。关于MindFormers中Llama3的使用,可以参考[LLama3的说明文档](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3)。 \ No newline at end of file +关于MindFormers中Llama3的具体实现,可以参考MindFormers仓库中[Llama3的文件夹](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3)。关于MindFormers中Llama3的使用,可以参考[LLama3的说明文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3/llama3.md)。 \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/usage/evaluation.md b/docs/mindformers/docs/source_zh_cn/usage/evaluation.md index 0cce423eeb3a9780a3c0b6444241ef873aab0469..f7f9bb2ae33968f849e82655d117a4da8123ebc9 100644 --- a/docs/mindformers/docs/source_zh_cn/usage/evaluation.md +++ b/docs/mindformers/docs/source_zh_cn/usage/evaluation.md @@ -6,8 +6,7 @@ ### 基本介绍 -[LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) -是一个开源语言模型评测框架,提供60多种标准学术数据集的评测,支持HuggingFace模型评测、PEFT适配器评测、vLLM推理评测等多种评测方式,支持自定义prompt和评测指标,包含loglikelihood、generate_until、loglikelihood_rolling三种类型的评测任务。基于Harness评测框架对MindFormers进行适配后,支持加载MindFormers模型进行评测。 +[LM Evaluation 
Harness](https://github.com/EleutherAI/lm-evaluation-harness)是一个开源语言模型评测框架,提供60多种标准学术数据集的评测,支持HuggingFace模型评测、PEFT适配器评测、vLLM推理评测等多种评测方式,支持自定义prompt和评测指标,包含loglikelihood、generate_until、loglikelihood_rolling三种类型的评测任务。基于Harness评测框架对MindFormers进行适配后,支持加载MindFormers模型进行评测。 ### 安装 diff --git a/docs/mindformers/docs/source_zh_cn/usage/parameter_efficient_fine_tune.md b/docs/mindformers/docs/source_zh_cn/usage/parameter_efficient_fine_tune.md index 7fe177d522068ec765e7b6e87d55c64b31e7a8bd..693ff6e2a03d7ba730883ab5159876f32705de3e 100644 --- a/docs/mindformers/docs/source_zh_cn/usage/parameter_efficient_fine_tune.md +++ b/docs/mindformers/docs/source_zh_cn/usage/parameter_efficient_fine_tune.md @@ -21,7 +21,7 @@ LoRA通过将原始模型的权重矩阵分解为两个低秩矩阵来实现参 3. **配置微调参数**:设置微调相关的参数,包括学习率、优化器类型、批次大小(batch size)等。 -4. **配置LoRA参数**:在模型的关键层(如注意力层)中配置 pet config 参数,通过调整低秩矩阵来实现模型的参数更新。 +4. **配置LoRA参数**:在模型的关键层(如注意力层)中配置 pet_config 参数,通过调整低秩矩阵来实现模型的参数更新。 5. **启动微调过程**:利用设置好的参数和数据集,在分布式环境中启动微调过程。 @@ -96,5 +96,3 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ --use_parallel True \ --run_mode finetune" 8 ``` - - diff --git a/docs/mindformers/docs/source_zh_cn/usage/quantization.md b/docs/mindformers/docs/source_zh_cn/usage/quantization.md index 38cae49d40a37945c39bb8cf959c2ae6058f6397..9d4229f7c12bd2511cb8b28db914e73e566c3500 100644 --- a/docs/mindformers/docs/source_zh_cn/usage/quantization.md +++ b/docs/mindformers/docs/source_zh_cn/usage/quantization.md @@ -108,6 +108,7 @@ MindFormers提供已经转换完成的预训练权重、词表文件用于预训 | llama2-13b | [llama2-13b-fp16.ckpt](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/llama2-13b-fp16.ckpt) | [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | > 注:Llama2的所有权重都需要通过向Meta[提交申请](https://ai.meta.com/resources/models-and-libraries/llama-downloads)来获取,如有需要请自行申请。 + ### 模型权重转换 进入mindspore_gs库根目录`golden-stick`,执行量化权重转换脚本 diff --git a/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md b/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md index 28b978e7e86f55d925e1e4dcada9cfe7646ee415..bc1172ef82f3c69dce510df6e255d36141cd9473 100644 --- a/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md +++ b/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md @@ -145,7 +145,7 @@ config: 模型的配置文件,文件在MindFormers代码仓中con load_checkpoint: checkpoint文件的路径 train_dataset_dir: 训练数据集路径 use_parallel: 是否开启并行 -run_mode: 运行模式,train:训练,fintune:微调,predict:推理 +run_mode: 运行模式,train:训练,finetune:微调,predict:推理 ``` 任务执行完成后,在mindformers/output目录下,会生成checkpoint文件夹,同时模型文件会保存在该文件夹下。 @@ -156,7 +156,7 @@ run_mode: 运行模式,train:训练,fintune:微调,predict 1. 增加启动脚本入参`--load_checkpoint /{path}/llama2_7b.ckpt`加载预训练权重。 2. 设置启动脚本中的`--train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord`加载微调数据集。 -3. 设置启动脚本中的`--run_mode finetune`,run_mode表示运行模式,train:训练,fintune:微调,predict:推理。 +3. 
设置启动脚本中的`--run_mode finetune`,run_mode表示运行模式,train:训练,finetune:微调,predict:推理。 任务执行完成后,在mindformers/output目录下,会生成checkpoint文件夹,同时模型文件会保存在该文件夹下。 diff --git a/docs/mindspore/source_zh_cn/orange_pi/index.rst b/docs/mindspore/source_zh_cn/orange_pi/index.rst index ca44e2107ac554a630d1982c3aa80a7f0d8a6332..27af000e2ee8a489577794d245b27a4ed2504094 100644 --- a/docs/mindspore/source_zh_cn/orange_pi/index.rst +++ b/docs/mindspore/source_zh_cn/orange_pi/index.rst @@ -1,7 +1,7 @@ 香橙派开发 =============== -[OrangePi AIpro(香橙派 AIpro)](http://www.orangepi.cn/index.html)采用昇腾AI技术路线,具体为4核64位处理器和AI处理器,集成图形处理器,目前支持8-12TOPS和20TOPS AI算力,其中8-12TOPS算力开发板拥有12GB/24GB LPDDR4X,20TOPS算力开发板拥有8GB/16GB LPDDR4X。两种算力的开发板均可以外接32GB/64GB/256GB eMMC模块,支持双4K高清输出。 +[OrangePi AIpro(香橙派 AIpro)](http://www.orangepi.cn/index.html) 采用昇腾AI技术路线,具体为4核64位处理器和AI处理器,集成图形处理器,目前支持8-12TOPS和20TOPS AI算力,其中8-12TOPS算力开发板拥有12GB/24GB LPDDR4X,20TOPS算力开发板拥有8GB/16GB LPDDR4X。两种算力的开发板均可以外接32GB/64GB/256GB eMMC模块,支持双4K高清输出。 目前已实现OrangePi AIpro开发板的系统镜像预置昇思MindSpore AI框架,并在后续版本迭代中持续演进,当前已支持MindSpore官网教程涵盖的全部网络模型。OrangePi AIpro开发板向开发者提供的官方系统镜像有openEuler版本预ubuntu版本,两个镜像版本均已预置昇思MindSpore,便于用户体验软硬协同优化后带来的高效开发体验。同时,欢迎开发者自定义配置MindSpore和CANN运行环境。