From ecc324d4bbf05a393fa637f6fdf7161ada3f775c Mon Sep 17 00:00:00 2001 From: huan <3174348550@qq.com> Date: Tue, 8 Jul 2025 10:39:46 +0800 Subject: [PATCH] update en files 2.6.0 --- .../advanced_development/dev_migration.md | 26 +- .../performance_optimization.md | 6 +- .../precision_optimization.md | 6 +- .../docs/source_en/env_variables.md | 4 +- .../docs/source_en/faq/feature_related.md | 2 +- .../docs/source_en/feature/ckpt.md | 20 +- .../docs/source_en/feature/configuration.md | 18 +- .../docs/source_en/feature/dataset.md | 24 +- .../docs/source_en/feature/evaluation.md | 8 +- .../feature/load_huggingface_config.md | 4 +- .../docs/source_en/feature/logging.md | 6 +- .../source_en/feature/memory_optimization.md | 2 +- .../docs/source_en/feature/monitor.md | 2 +- .../source_en/feature/parallel_training.md | 20 +- .../docs/source_en/feature/quantization.md | 6 +- .../docs/source_en/feature/resume_training.md | 4 +- .../docs/source_en/feature/safetensors.md | 28 +-- .../skip_data_and_ckpt_health_monitor.md | 6 +- .../docs/source_en/feature/start_tasks.md | 6 +- .../docs/source_en/feature/tokenizer.md | 2 +- .../feature/training_hyperparameters.md | 26 +- .../docs/source_en/guide/deployment.md | 10 +- .../docs/source_en/guide/inference.md | 10 +- .../docs/source_en/guide/pre_training.md | 22 +- .../source_en/guide/supervised_fine_tuning.md | 18 +- docs/mindformers/docs/source_en/index.rst | 60 ++--- .../docs/source_en/installation.md | 2 +- .../docs/source_en/introduction/models.md | 10 +- .../docs/source_en/introduction/overview.md | 8 +- .../accuracy_comparison.md | 12 +- .../advanced_development/dev_migration.md | 26 +- .../performance_optimization.md | 6 +- .../precision_optimization.md | 6 +- .../docs/source_zh_cn/env_variables.md | 4 +- .../convert_ckpt_to_megatron.md | 2 +- .../example/distilled/distilled.md | 8 +- .../docs/source_zh_cn/faq/feature_related.md | 2 +- .../docs/source_zh_cn/feature/ckpt.md | 20 +- .../source_zh_cn/feature/configuration.md | 18 +- .../docs/source_zh_cn/feature/dataset.md | 24 +- .../docs/source_zh_cn/feature/evaluation.md | 8 +- .../feature/load_huggingface_config.md | 4 +- .../docs/source_zh_cn/feature/logging.md | 6 +- .../feature/memory_optimization.md | 2 +- .../docs/source_zh_cn/feature/monitor.md | 2 +- .../source_zh_cn/feature/parallel_training.md | 20 +- .../docs/source_zh_cn/feature/quantization.md | 6 +- .../source_zh_cn/feature/resume_training.md | 4 +- .../docs/source_zh_cn/feature/safetensors.md | 26 +- .../skip_data_and_ckpt_health_monitor.md | 6 +- .../docs/source_zh_cn/feature/start_tasks.md | 6 +- .../docs/source_zh_cn/feature/tokenizer.md | 2 +- .../feature/training_hyperparameters.md | 26 +- .../docs/source_zh_cn/guide/deployment.md | 10 +- .../docs/source_zh_cn/guide/inference.md | 10 +- .../docs/source_zh_cn/guide/pre_training.md | 22 +- .../guide/supervised_fine_tuning.md | 18 +- docs/mindformers/docs/source_zh_cn/index.rst | 62 ++--- .../docs/source_zh_cn/installation.md | 2 +- .../docs/source_zh_cn/introduction/models.md | 10 +- .../source_zh_cn/introduction/overview.md | 8 +- .../ms_infer/llm_inference_overview.md | 21 +- .../model_infer/ms_infer/model_dev.md | 10 +- .../model_infer/ms_infer/parallel.md | 232 +++++++++--------- .../model_infer/ms_infer/quantization.md | 2 +- .../model_infer/ms_infer/weight_prepare.md | 14 +- .../model_infer/ms_infer/weight_split.md | 2 +- .../source_en/parallel/split_technique.md | 8 +- .../ms_infer/llm_inference_overview.md | 16 +- .../source_zh_cn/parallel/split_technique.md | 8 +- 70 files 
changed, 533 insertions(+), 534 deletions(-) diff --git a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md index 3f1fe03f63..fe70a8c306 100644 --- a/docs/mindformers/docs/source_en/advanced_development/dev_migration.md +++ b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md @@ -12,9 +12,9 @@ The basic components of a foundation model in MindSpore Transformers include the A model configuration is an instance that contains all information about a model. The `__init__` methods of all models in MindSpore Transformers receive a model configuration instance as the input parameter. All submodules of the model are initialized based on the information contained in the configuration instance. -MindSpore Transformers provides the [PretrainedConfig](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PretrainedConfig.html) class, which provides some common configuration methods. The configuration classes of all models should be inherited from the PretrainedConfig class. Developers only need to define all configuration parameters that help build foundation models. Foundation models of the Transformer type have configuration parameters such as `seq_length`, `hidden_size`, `num_layers`, and `num_heads`, and foundation models of the text type have `vocab_size` in addition. +MindSpore Transformers provides the [PretrainedConfig](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.PretrainedConfig.html) class, which provides some common configuration methods. The configuration classes of all models should be inherited from the PretrainedConfig class. Developers only need to define all configuration parameters that help build foundation models. Foundation models of the Transformer type have configuration parameters such as `seq_length`, `hidden_size`, `num_layers`, and `num_heads`, and foundation models of the text type have `vocab_size` in addition. -For details, see the configuration class [LlamaConfig](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaConfig.html) of the Llama model in MindSpore Transformers. +For details, see the configuration class [LlamaConfig](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.LlamaConfig.html) of the Llama model in MindSpore Transformers. > If your model is similar to a model in the library, you can reuse the same configurations as the model. @@ -22,12 +22,12 @@ For details, see the configuration class [LlamaConfig](https://www.mindspore.cn/ The MindSpore Transformers foundation model is developed based on the MindSpore framework. Developers only need to pay attention to the implementation of the model network. -MindSpore Transformers provides the [PretrainedModel](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PreTrainedModel.html) class, which is responsible for storage model configurations and processing the methods of loading and saving models. All model classes must be inherited from the PretrainedModel class, and the model input must be the same. That is, the input parameters of the `construct` method of the model must be the same. For details about the input parameters and meanings, see the Llama model class [LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaForCausalLM.html) in MindSpore Transformers. 
In addition, the model class must implement some abstract methods of the base class, including: +MindSpore Transformers provides the [PretrainedModel](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.PreTrainedModel.html) class, which is responsible for storage model configurations and processing the methods of loading and saving models. All model classes must be inherited from the PretrainedModel class, and the model input must be the same. That is, the input parameters of the `construct` method of the model must be the same. For details about the input parameters and meanings, see the Llama model class [LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.LlamaForCausalLM.html) in MindSpore Transformers. In addition, the model class must implement some abstract methods of the base class, including: - `prepare_inputs_for_generation`: method for building input for model inference. - `prepare_inputs_for_predict_layout`: method for building virtual input for the distributed loading model weight. -For specific meanings, refer to the descriptions in [LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaForCausalLM.html). +For specific meanings, refer to the descriptions in [LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.LlamaForCausalLM.html). > If your model structure is similar to that of a model in the library, you can reuse the model. @@ -35,20 +35,20 @@ For specific meanings, refer to the descriptions in [LlamaForCausalLM](https://w A tokenizer is used to process input and output of LLMs. It is required in the workflow of LLMs. -MindSpore Transformers provides the [PretrainedTokenizer](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PreTrainedTokenizer.html) and [PretrainedTokenizerFast](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.PreTrainedTokenizerFast.html) classes, which use Python only and use the Rust library, respectively. The features of the latter one are as follows: +MindSpore Transformers provides the [PretrainedTokenizer](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.PreTrainedTokenizer.html) and [PretrainedTokenizerFast](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.PreTrainedTokenizerFast.html) classes, which use Python only and use the Rust library, respectively. The features of the latter one are as follows: - Faster batch processing. - Additional methods for mapping between text strings and lexical spaces. For example, the indexes of the lexical element containing a given character or the character spans corresponding to the given lexical element are obtained. -All tokenizer classes must be inherited from the PretrainedTokenizer or PretrainedTokenizerFast class. For details, see [LlamaTokenizer](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaTokenizer.html) and [LlamaTokenizerFast](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaTokenizerFast.html). +All tokenizer classes must be inherited from the PretrainedTokenizer or PretrainedTokenizerFast class. For details, see [LlamaTokenizer](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.LlamaTokenizer.html) and [LlamaTokenizerFast](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.LlamaTokenizerFast.html). 
> If your tokenizer is similar to that in the library, you can reuse that in the library. ### Preparing a Weight and a Dataset -If a PyTorch-based model weight already exists, you can convert the weight to that in the MindSpore format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#weight-format-conversion). +If a PyTorch-based model weight already exists, you can convert the weight to that in the MindSpore format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html#weight-format-conversion). -For details about how to prepare a dataset, see [Dataset](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html) or the model document, for example, [Llama2 Description Document > Dataset Preparation](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87). +For details about how to prepare a dataset, see [Dataset](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html) or the model document, for example, [Llama2 Description Document > Dataset Preparation](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87). ### Preparing a `YAML` Configuration File @@ -56,7 +56,7 @@ MindSpore Transformers uses a `YAML` file to configure all parameters required b The code of the customized model is not in the MindSpore Transformers library, and the customized module in the code is not registered with MindSpore Transformers. Therefore, the customized model cannot be automatically instantiated. The code is also called external code (for example, the code in the `research` directory). Therefore, you need to add the `auto_register` configuration item for automatically registering any module to the corresponding module configuration in the `YAML` file and set the configuration items to the relative import paths of the API to be registered. When the run_mindformer.py script is executed to start the task, you need to add the input parameter `--register_path` of the registration path and set it to the relative path of the directory where the external code is located. -For example, in the `YAML` file [`research/llama3_1/predict_llama3_1_8b.yaml`](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml) of the Llama3.1-8B model inference in the `research` directory, the configuration item `auto_register` is added for automatic registration to register the customized `Llama3Tokenizer` in [`research/llama3_1/llama3_1_tokenizer.py`](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_tokenizer.py). +For example, in the `YAML` file [`research/llama3_1/predict_llama3_1_8b.yaml`](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml) of the Llama3.1-8B model inference in the `research` directory, the configuration item `auto_register` is added for automatic registration to register the customized `Llama3Tokenizer` in [`research/llama3_1/llama3_1_tokenizer.py`](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_tokenizer.py). ```yaml ... @@ -91,15 +91,15 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l | register_path | Path of the directory where the external code is located. | | predict_data | Input data for inference. 
| -`register_path` is set to `research/llama3_1` (path of the directory where the external code is located). For details about how to prepare the model weight, see [Llama3.1 Description Document > Model Weight Download](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD). +`register_path` is set to `research/llama3_1` (path of the directory where the external code is located). For details about how to prepare the model weight, see [Llama3.1 Description Document > Model Weight Download](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD). -For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama3_1-8B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml). +For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama3_1-8B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml). -After all the preceding basic elements are prepared, you can refer to other documents in the MindSpore Transformers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/precision_optimization.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html). +After all the preceding basic elements are prepared, you can refer to other documents in the MindSpore Transformers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/advanced_development/precision_optimization.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/advanced_development/performance_optimization.html). ### Contributing Models to the MindSpore Transformers Open Source Repository -You can contribute models to the MindSpore Transformers open source repository for developers to research and use. For details, see [MindSpore Transformers Contribution Guidelines](https://www.mindspore.cn/mindformers/docs/en/dev/contribution/mindformers_contribution.html). +You can contribute models to the MindSpore Transformers open source repository for developers to research and use. For details, see [MindSpore Transformers Contribution Guidelines](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/contribution/mindformers_contribution.html). 
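To make the configuration-class pattern described in the Model Configuration part of this guide more concrete, below is a minimal sketch. It assumes `PretrainedConfig` can be imported from `mindformers.models`, as the linked API reference suggests; the class name, `model_type`, and default values are illustrative only and are not taken from the repository.

```python
# Minimal sketch of a custom model configuration; names and defaults are illustrative.
from mindformers.models import PretrainedConfig


class MyTextModelConfig(PretrainedConfig):
    """Configuration for a hypothetical Transformer-type text model."""

    model_type = "my_text_model"

    def __init__(self, seq_length=4096, hidden_size=4096, num_layers=32,
                 num_heads=32, vocab_size=32000, **kwargs):
        super().__init__(**kwargs)
        # Every submodule of the model is built from these fields.
        self.seq_length = seq_length
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_heads = num_heads
        self.vocab_size = vocab_size
```

A model class would then receive an instance of such a configuration in its `__init__` and build all of its submodules from these fields.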
## MindSpore Transformers Model Migration Practice diff --git a/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md b/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md index f1a51a4292..afe429b54e 100644 --- a/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md +++ b/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md @@ -64,7 +64,7 @@ Parallelism strategies are usually classified into various parallel modes: In practice, multiple parallel strategies and multiple optimizations, such as using optimizer parallelism and recomputation, are usually employed to reduce the model's use of memory and improve training efficiency. Parallel strategy design is closely related to the efficiency of the model, and it is crucial to identify one or more sets of better parallel strategies before model tuning. -For details, refer to [Parallel Strategy Guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html). +For details, refer to [Parallel Strategy Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/parallel_training.html). For models with different parameter count specifications, the following parallel strategy can be selected: @@ -277,7 +277,7 @@ Click anywhere on the timeline page tree or graphical pane can be performed usin #### IR Graph -In the [MindSpore Transformers configuration file](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html), just turn on save_graphs, and the runtime will output some intermediate files ending with the .ir suffix generated during the graph compilation process, which we call IR files. By default, a directory of graphs will be generated in the current task execution directory, and all IR graphs will be saved in this. It is a relatively intuitive and easy to understand document describing the structure of the model in text format, which can be viewed directly with text editing software. Refer to [Config Configuration Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html) for the meaning of the configuration items, and the configuration method is as follows: +In the [MindSpore Transformers configuration file](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html), just turn on save_graphs, and the runtime will output some intermediate files ending with the .ir suffix generated during the graph compilation process, which we call IR files. By default, a directory of graphs will be generated in the current task execution directory, and all IR graphs will be saved in this. It is a relatively intuitive and easy to understand document describing the structure of the model in text format, which can be viewed directly with text editing software. Refer to [Config Configuration Description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html) for the meaning of the configuration items, and the configuration method is as follows: ```yaml context: @@ -598,7 +598,7 @@ For the bottleneck points analyzed above, we can apply the following optimizatio 2. 
Embedding parameter configuration optimizer parallelism: large vocabulary occupies too much memory, and the optimizer parallelism of vocabulary weights needs additional configuration, which effectively alleviates the problem of insufficient memory in the first stage; - An introduction to the use of optimizer parallelism can be found in [MindSpore Optimizer Parallelism Documentation](https://www.mindspore.cn/tutorials/en/r2.7.0rc1/parallel/optimizer_parallel.html). In addition, the Llama model has additional configurations for optimizers in the embedding layer, the `parallel_optimizer` in the [LlamaConfig API documentation](https://www.mindspore.cn/mindformers/docs/en/dev/models/mindformers.models.LlamaConfig.html#mindformers.models.LlamaConfig) controls the parallelism of the embedding optimizer; + An introduction to the use of optimizer parallelism can be found in [MindSpore Optimizer Parallelism Documentation](https://www.mindspore.cn/tutorials/en/r2.7.0rc1/parallel/optimizer_parallel.html). In addition, the Llama model has additional configurations for optimizers in the embedding layer, the `parallel_optimizer` in the [LlamaConfig API documentation](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/models/mindformers.models.LlamaConfig.html#mindformers.models.LlamaConfig) controls the parallelism of the embedding optimizer; A sample configuration is shown below: ```yaml diff --git a/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md index 930171c259..f3da95bb46 100644 --- a/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md +++ b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md @@ -187,7 +187,7 @@ Since features such as model parallelism, flow parallelism, sequence parallelism #### Weight Conversion -During training, MindSpore is loaded with the same weights as PyTorch. In case of pre-training scenarios, you can use PyTorch to save an initialized weight and then convert it to MindSpore weights. Because MindSpore weight names differ from PyTorch, the essence of weight conversion is to change the names in the PyTorch weight dict to MindSpore weight names to support MindSpore loading. Refer to [weight conversion guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#weight-format-conversion) for weight conversion. +During training, MindSpore is loaded with the same weights as PyTorch. In case of pre-training scenarios, you can use PyTorch to save an initialized weight and then convert it to MindSpore weights. Because MindSpore weight names differ from PyTorch, the essence of weight conversion is to change the names in the PyTorch weight dict to MindSpore weight names to support MindSpore loading. Refer to [weight conversion guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html#weight-format-conversion) for weight conversion. Both MindSpore and PyTorch support `bin` format data, loading the same dataset for training ensures consistency from step to step. 
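As a rough sketch of the key-renaming idea described in the weight conversion paragraph above — assuming the standard `torch.load` and `mindspore.save_checkpoint` interfaces, with a purely hypothetical `name_map`; every model needs its own PyTorch-to-MindSpore name correspondence table:

```python
# Illustrative only: rename PyTorch state-dict keys to MindSpore parameter names
# and save them as a MindSpore checkpoint. The name_map entries are placeholders.
import torch
import mindspore as ms


def convert_pt_names_to_ms(pt_path, ms_path, name_map):
    state_dict = torch.load(pt_path, map_location="cpu")
    ms_params = []
    for pt_name, value in state_dict.items():
        ms_name = name_map.get(pt_name, pt_name)  # fall back to the original name
        ms_params.append({"name": ms_name, "data": ms.Tensor(value.float().numpy())})
    ms.save_checkpoint(ms_params, ms_path)
```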
@@ -226,7 +226,7 @@ The training process fixes randomness and turns on deterministic computation in # Original code ``` -* MindSpore code, in [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py), the new seed_all method is added and called in the main method, adding the method as follows: +* MindSpore code, in [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py), the new seed_all method is added and called in the main method, adding the method as follows: ```python import numpy as np @@ -337,7 +337,7 @@ def get_parameters(self): return params ``` -For MindSpore Transformers loading gradient, refer to [mindformers/wrapper/wrapper.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/wrapper/wrapper.py) implementation. Note that users need to find the correspondence between MindSpore Transformers and PyTorch gradient. Refer to the following modified code: +For MindSpore Transformers loading gradient, refer to [mindformers/wrapper/wrapper.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/wrapper/wrapper.py) implementation. Note that users need to find the correspondence between MindSpore Transformers and PyTorch gradient. Refer to the following modified code: ```python class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell): diff --git a/docs/mindformers/docs/source_en/env_variables.md b/docs/mindformers/docs/source_en/env_variables.md index 6123e6e945..e7569d4de5 100644 --- a/docs/mindformers/docs/source_en/env_variables.md +++ b/docs/mindformers/docs/source_en/env_variables.md @@ -14,7 +14,7 @@ The following environment variables are supported by MindSpore Transformers. | **ASCEND_LAUNCH_BLOCKING** | 0 | training or online inference scenarios, this environment variable can be used to control whether synchronization mode is activated during operator execution. | `1`: synchronized mode is mandatory;
`0`: synchronized mode is optional. | Since operators execute asynchronously by default during NPU model training, the error stack printed when an operator reports an error is not the actual call stack. When set to `1`, synchronized mode is forced, which prints the correct call stack and makes it easier to debug and locate problems in the code. Leaving it at `0` keeps asynchronous execution, which is more efficient. | | **TE_PARALLEL_COMPILER** | 8 | The number of threads used to compile operators in parallel. Parallel compilation is enabled when the value is greater than 1. | Takes a positive integer; maximum is number of CPU cores\*80%/number of Ascend AI processors, value range 1~32, default value is 8. | When the network model is large, parallel compilation of operators can be turned on by configuring this environment variable;<br>
setting it to `1` uses single-threaded compilation, which makes debugging easier. | | **CPU_AFFINITY** | 0 | Turn on the CPU affinity switch, ensuring that each process or thread is bound to a single CPU core to improve performance. | `1`: turn on the CPU affinity switch;<br>
`0`: turn off the CPU affinity switch. | CPU affinity is turned off by default for **optimized resource utilization** and **energy saving**. | -| **MS_MEMORY_STATISTIC** | 0 | Memory Statistics. | `1`: turn on memory statistics;
`0`: turn off memory statistics. | During memory analysis, basic memory usage can be counted. You can refer to [Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html) for details. | +| **MS_MEMORY_STATISTIC** | 0 | Memory Statistics. | `1`: turn on memory statistics;
`0`: turn off memory statistics. | During memory analysis, basic memory usage can be counted. You can refer to [Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/advanced_development/performance_optimization.html) for details. | | **MINDSPORE_DUMP_CONFIG** | NA | Specify the path to the configuration file that the [cloud-side Dump function](https://www.mindspore.cn/tutorials/en/r2.7.0rc1/debug/dump.html) or [end-side Dump function](https://www.mindspore.cn/lite/docs/en/r2.7.0rc1/tools/benchmark_tool.html#dump) depends on. | File path, support relative path and absolute path. | | **GLOG_v** | 3 | Controls the level of MindSpore logs. | `0`: DEBUG
`1`: INFO
`2`: WARNING
`3`: ERROR, indicates that an error has occurred during program execution, an error log is output, and the program may not be terminated;<br>
`4`: CRITICAL, indicates that an exception has occurred in the execution of the program, and the execution of the program will be terminated. | | **ASCEND_GLOBAL_LOG_LEVEL** | 3 | Controls the logging level of CANN. | `0`: DEBUG
`1`: INFO
`2`: WARNING
`3`: ERROR
`4`: NULL, no log is output. | @@ -40,4 +40,4 @@ The following environment variables are supported by MindSpore Transformers. | **MS_ENABLE_FA_FLATTEN** | on | Controls whether support FlashAttention flatten optimization. | `on`: Enable FlashAttention flatten optimization;
`off`: Disable FlashAttention flatten optimization. | Provides a fallback mechanism for models that have not yet been adapted to FlashAttention flatten optimization. | | **EXPERIMENTAL_KERNEL_LAUNCH_GROUP** | NA | Control whether to support the batch parallel submission of operators. If supported, enable parallel submission and configure the number of parallel submissions. | `thread_num`: The number of concurrent threads; increasing it is not recommended. The default value is 2;<br>
`kernel_group_num`: Total number of operator groups, with `kernel_group_num/thread_num` groups per thread; the default is `8`. | This feature will continue to evolve, and its behavior may change in the future. Currently, only the `deepseek` inference scenario is supported, where it brings some performance gains; other models may see performance degradation when using this feature, so use it with caution, for example: `export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`. | | **FORCE_EAGER** | False | Control whether to disable jit mode. | `False`: Enable jit mode;<br>
`True`: Do not enable jit mode. | Jit compiles functions into a callable MindSpore graph, sets FORCE_EAGER to False to enable jit mode, which can generate performance benefits. Currently, only inference mode is supported. | -| **MS_ENABLE_TFT** | NA | Enable [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) feature. Turn on TTP, UCE, HCCE, ARF, TRE or TSP feature. | The value of the environment variable can be:"{TTP:1,UCE:1,HCCE:1,ARF:1,TRE:1,TSP:1}", when using a certain feature, the corresponding field can be configured as "1". | Usage can refer to [High Availability](https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html). | +| **MS_ENABLE_TFT** | NA | Enable [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) feature. Turn on TTP, UCE, HCCE, ARF, TRE or TSP feature. | The value of the environment variable can be:"{TTP:1,UCE:1,HCCE:1,ARF:1,TRE:1,TSP:1}", when using a certain feature, the corresponding field can be configured as "1". | Usage can refer to [High Availability](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/high_availability.html). | diff --git a/docs/mindformers/docs/source_en/faq/feature_related.md b/docs/mindformers/docs/source_en/faq/feature_related.md index 45e07fe5fa..7a346f3cbb 100644 --- a/docs/mindformers/docs/source_en/faq/feature_related.md +++ b/docs/mindformers/docs/source_en/faq/feature_related.md @@ -10,7 +10,7 @@ A: The official download link is not available, please follow the community Issu ## Q: How Do I Generate a Model Sharding Strategy File? -A: The model sharding strategy file documents the sharding strategy for model weights in distributed scenarios and is generally used when slicing weights offline. Configure `only_save_strategy: True` in the network `yaml` file, and then start the distributed task normally, then the distributed strategy file can be generated in the `output/strategy/` directory. For details, please refer to the [Tutorial on Slicing and Merging Distributed Weights](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html). +A: The model sharding strategy file documents the sharding strategy for model weights in distributed scenarios and is generally used when slicing weights offline. Configure `only_save_strategy: True` in the network `yaml` file, and then start the distributed task normally, then the distributed strategy file can be generated in the `output/strategy/` directory. For details, please refer to the [Tutorial on Slicing and Merging Distributed Weights](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html).
diff --git a/docs/mindformers/docs/source_en/feature/ckpt.md b/docs/mindformers/docs/source_en/feature/ckpt.md index 5f669fc610..7153720925 100644 --- a/docs/mindformers/docs/source_en/feature/ckpt.md +++ b/docs/mindformers/docs/source_en/feature/ckpt.md @@ -6,7 +6,7 @@ Ckpt is a common file format used to save model training status in the deep learning framework. It contains model parameters, optimizer status, and training progress. It is used to restore training or fine-tune models. This document describes how MindSpore Transformers supports conversion , slice and merge. -> The ckpt format is planned to offline. The safetensors format is recommended for weights. Safetensors is a reliable and portable machine learning model storage format from Huggingface for storing Tensors securely and with fast storage (zero copies). For details, see [Safetensors Weights](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html). +> The ckpt format is planned to offline. The safetensors format is recommended for weights. Safetensors is a reliable and portable machine learning model storage format from Huggingface for storing Tensors securely and with fast storage (zero copies). For details, see [Safetensors Weights](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html). ## Weight Format Conversion @@ -40,7 +40,7 @@ python convert_weight.py [-h] --model MODEL [--reversed] --input_path INPUT_PATH ### Conversion Example -Assume that you have downloaded the [Llama2 model weight](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD) and saved it in the `/home/user/torch_weights` path, to convert it to the MindSpore Transformers weight and save it in the `/home/user/ms_weights` path, run the following command: +Assume that you have downloaded the [Llama2 model weight](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD) and saved it in the `/home/user/torch_weights` path, to convert it to the MindSpore Transformers weight and save it in the `/home/user/ms_weights` path, run the following command: ```bash python convert_weight.py --model llama --input_path /home/user/torch_weights --output_path /home/user/ms_weights/llama.ckpt @@ -75,7 +75,7 @@ After the preceding steps are performed, the HuggingFace weight is successfully ### Example of Developing Model Weight Conversion -Llama is used as an example. To convert a HuggingFace weight to a MindSpore Transformers one, define the `convert_pt_to_ms` function in [convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py). +Llama is used as an example. To convert a HuggingFace weight to a MindSpore Transformers one, define the `convert_pt_to_ms` function in [convert_weight.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/convert_weight.py). ```python def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs): @@ -108,7 +108,7 @@ def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs): return True ``` -To convert a MindSpore Transformers weight to a HuggingFace one, define the `convert_ms_to_pt` function in [convert_reversed.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_reversed.py). 
+To convert a MindSpore Transformers weight to a HuggingFace one, define the `convert_ms_to_pt` function in [convert_reversed.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/convert_reversed.py). ```python def convert_ms_to_pt(input_path, output_path, dtype=None, **kwargs): @@ -241,7 +241,7 @@ If there is currently no distributed strategy file, it can be quickly generated **Single-Process Conversion** -Use [mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py) to perform single-process conversion on the loaded weight. +Use [mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.py) to perform single-process conversion on the loaded weight. **Run the command.** @@ -254,7 +254,7 @@ python transform_checkpoint.py \ **Multi-Process Conversion** -Use [mindformers/tools/ckpt_transform/transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.sh) to perform multi-process conversion on the loaded weight. +Use [mindformers/tools/ckpt_transform/transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.sh) to perform multi-process conversion on the loaded weight. **Run the command.** @@ -269,7 +269,7 @@ bash transform_checkpoint.sh \ **Precautions**: -- When the [transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.sh) script is used, `8` indicates the number of target devices, and `2` indicates that two processes are used for conversion. +- When the [transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.sh) script is used, `8` indicates the number of target devices, and `2` indicates that two processes are used for conversion. ### Special Scenarios @@ -325,7 +325,7 @@ If there is a shared disk between servers, you can use MindSpore Transformers to **Start a task.** - Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh) to start the task. + Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh) to start the task. ```shell # First server (main node) @@ -376,7 +376,7 @@ If there is no shared disk between servers, you need to use the offline weight c - **Offline weight conversion** - On the server where all strategy files are stored, use [mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py) to perform offline weight conversion. + On the server where all strategy files are stored, use [mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.py) to perform offline weight conversion. 
**Single-process conversion** @@ -431,7 +431,7 @@ For details about the principles and implementation of LoRA, see the following r #### Instructions -Use the [LoRA weight merging script](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/transform_ckpt_lora.py) provided by MindSpore Transformers to merge LoRA weights as follows: +Use the [LoRA weight merging script](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/transform_ckpt_lora.py) provided by MindSpore Transformers to merge LoRA weights as follows: ```shell python mindformers/tools/transform_ckpt_lora.py \ diff --git a/docs/mindformers/docs/source_en/feature/configuration.md b/docs/mindformers/docs/source_en/feature/configuration.md index 9e143280d8..2a481cd565 100644 --- a/docs/mindformers/docs/source_en/feature/configuration.md +++ b/docs/mindformers/docs/source_en/feature/configuration.md @@ -19,9 +19,9 @@ The basic configuration is mainly used to specify MindSpore random seeds and rel | seed | Set the global seed. For details, refer to [mindspore.set_seed](https://www.mindspore.cn/docs/en/r2.7.0rc1/api_python/mindspore/mindspore.set_seed.html). | int | | run_mode | Set the running mode of the model: `train`, `finetune`, `eval` or `predict`. | str | | output_dir | Set the path where log, checkpoint, strategy, etc. files are saved. | str | -| load_checkpoint | File or folder paths for loading weights. Currently there are 3 application scenarios
1. Support for passing in full weight file paths.
2. Support for passing in offline sliced weight folder paths.
3. Support for passing in folder paths containing lora weights and base weights
Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html) for the ways of obtaining various weights. | str | -| auto_trans_ckpt | Enable distributed weight auto slicing and merging. Refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html). | bool | -| resume_training | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html#resumable-training). | bool | +| load_checkpoint | File or folder paths for loading weights. Currently there are 3 application scenarios
1. Support for passing in full weight file paths.
2. Support for passing in offline sliced weight folder paths.
3. Support for passing in folder paths containing lora weights and base weights
Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html) for the ways of obtaining various weights. | str | +| auto_trans_ckpt | Enable distributed weight auto slicing and merging. Refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html). | bool | +| resume_training | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/resume_training.html#resumable-training). | bool | | load_ckpt_format | The format of loading checkpoint, either `ckpt` or `safetensors`. | str | | remove_redundancy | Whether the checkpoint has removed redundancy while loading checkpoint. The default value is `False`. | bool | | train_precision_sync | Switching on or off deterministic computation of the training process. The default value is `None`. | Optional[bool] | @@ -149,7 +149,7 @@ When starting model training, in addition to model-related parameters, you also ### Parallel Configuration -In order to improve the performance of the model, it is usually necessary to configure the parallelism strategy for the model in large-scale cluster usage scenarios. For details, please refer to [Distributed Parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html), the parallel configuration in MindSpore Transformers is as follows. +In order to improve the performance of the model, it is usually necessary to configure the parallelism strategy for the model in large-scale cluster usage scenarios. For details, please refer to [Distributed Parallelism](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/parallel_training.html), the parallel configuration in MindSpore Transformers is as follows. | Parameters | Descriptions | Types | |-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| @@ -183,7 +183,7 @@ In order to improve the performance of the model, it is usually necessary to con ### Model Optimization Configuration -1. MindSpore Transformers provides recomputation-related configurations to reduce the memory footprint of the model during training, see [Recomputation](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html#recomputation) for details. +1. MindSpore Transformers provides recomputation-related configurations to reduce the memory footprint of the model during training, see [Recomputation](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/advanced_development/performance_optimization.html#recomputation) for details. | Parameters | Descriptions | Types | |----------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------| @@ -195,7 +195,7 @@ In order to improve the performance of the model, it is usually necessary to con | recompute_config.select_recompute_exclude | Disable recomputation for the specified operator, valid only for the Primitive operators. | bool/list | | recompute_config.select_comm_recompute_exclude | Disable communication recomputation for the specified operator, valid only for the Primitive operators. | bool/list | -2. 
MindSpore Transformers provides fine-grained activations SWAP-related configurations to reduce the memory footprint of the model during training, see [Fine-Grained Activations SWAP](https://www.mindspore.cn/mindformers/docs/en/dev/feature/memory_optimization.html#fine-grained-activations-swap) for details. +2. MindSpore Transformers provides fine-grained activations SWAP-related configurations to reduce the memory footprint of the model during training, see [Fine-Grained Activations SWAP](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/memory_optimization.html#fine-grained-activations-swap) for details. | Parameters | Descriptions | Types | |----------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------| @@ -290,7 +290,7 @@ MindSpore Transformers provides model evaluation function, and also supports mod ### Profile Configuration -MindSpore Transformers provides Profile as the main tool for model performance tuning, please refer to [Performance Tuning Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html) for more details. The following is the Profile related configuration. +MindSpore Transformers provides Profile as the main tool for model performance tuning, please refer to [Performance Tuning Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/advanced_development/performance_optimization.html) for more details. The following is the Profile related configuration. | Parameters | Descriptions | Types | |-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------| @@ -310,7 +310,7 @@ MindSpore Transformers provides Profile as the main tool for model performance t ### Metric Monitoring Configuration -The metric monitoring configuration is primarily used to configure methods to record metrics during training, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html) for more details.Below is a description of the common metric monitoring configuration options in MindSpore Transformers: +The metric monitoring configuration is primarily used to configure methods to record metrics during training, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/monitor.html) for more details.Below is a description of the common metric monitoring configuration options in MindSpore Transformers: | Parameters | Descriptions | Types | |--------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------| @@ -333,7 +333,7 @@ The metric monitoring configuration is primarily used to configure methods to re ### TensorBoard Configuration -The TensorBoard configuration is primarily used to configure parameters related to TensorBoard during training, allowing for real-time monitoring and visualization of training metrics, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html) for more details. 
Below is a description of the common TensorBoard configuration options in MindSpore Transformers: +The TensorBoard configuration is primarily used to configure parameters related to TensorBoard during training, allowing for real-time monitoring and visualization of training metrics, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/monitor.html) for more details. Below is a description of the common TensorBoard configuration options in MindSpore Transformers: | Parameters | Descriptions | Types | |--------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------| diff --git a/docs/mindformers/docs/source_en/feature/dataset.md b/docs/mindformers/docs/source_en/feature/dataset.md index 216f634184..c68289a6ed 100644 --- a/docs/mindformers/docs/source_en/feature/dataset.md +++ b/docs/mindformers/docs/source_en/feature/dataset.md @@ -16,7 +16,7 @@ The following sections will explain how to generate `.bin` and `.idx` files, as ### Data Preprocessing -MindSpore Transformers provides a data preprocessing script, [preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py), which is used to convert raw text data in `json` format into `.bin` and `.idx` files. +MindSpore Transformers provides a data preprocessing script, [preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py), which is used to convert raw text data in `json` format into `.bin` and `.idx` files. If the raw text data is not in `json` format, users need to preprocess and convert it into the appropriate format themselves. @@ -73,7 +73,7 @@ The following example demonstrates how to convert the `wikitext-103` dataset int 4. Generate `.bin` and `.idx` data files - Run the data preprocessing script [preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py) to convert the original text data into corresponding token IDs using the model's tokenizer. + Run the data preprocessing script [preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py) to convert the original text data into corresponding token IDs using the model's tokenizer. 
The script accepts the following parameters: @@ -107,7 +107,7 @@ The following example demonstrates how to convert the `wikitext-103` dataset int --seq-length 8192 ``` - Take outer tokenizer class [Llama3Tokenizer](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_tokenizer.py) for example, make sure **local** MindSpore Transformers repository has 'research/llama3_1/llama3_1_tokenizer.py', and execute the following command to preprocess the dataset: + Take outer tokenizer class [Llama3Tokenizer](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_tokenizer.py) for example, make sure **local** MindSpore Transformers repository has 'research/llama3_1/llama3_1_tokenizer.py', and execute the following command to preprocess the dataset: ```shell python mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py \ @@ -207,7 +207,7 @@ The following explains how to configure and use Megatron datasets in the configu | pad | Token ID of the pad token in the dataset | | data_path | List, every two consecutive elements (number, string) are considered as a dataset, represent ratio of the dataset and the path to its bin file excluding `.bin` suffix respectively. The sum of datasets' ratios should be equal to 1. | - In addition, the Megatron dataset also depends on configurations such as `input_columns`, `construct_args_key`, and `full_batch`. For more details, refer to the [configuration file documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html). + In addition, the Megatron dataset also depends on configurations such as `input_columns`, `construct_args_key`, and `full_batch`. For more details, refer to the [configuration file documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html). Here, we only explain how to configure them in different scenarios: @@ -247,7 +247,7 @@ The following explains how to configure and use Megatron datasets in the configu 3. Start Model Pre-training After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the model pre-training task. - Here, we take the [Llama3_1 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md) as an example. + Here, we take the [Llama3_1 model documentation](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/README.md) as an example. ## HuggingFace Datasets @@ -414,7 +414,7 @@ When packing is configured, the dataset returns an `actual_seq_len` column. For prefetch_size: 1 ``` - 1. For parameter descriptions in `train_dataset`, please refer to the [documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html). + 1. For parameter descriptions in `train_dataset`, please refer to the [documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html). 2. `AlpacaInstructDataHandler` is an online processing script developed for the `alpaca` dataset. If using a different dataset, you need to implement a custom data handler by referring to the [Custom Data Handler](#custom-data-handler) guide. @@ -495,7 +495,7 @@ Users can define custom data handlers to apply various preprocessing logic to th - alpaca Dataset Sample - Modify the task configuration file [finetune_qwen2_5_0_5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml). 
+ Modify the task configuration file [finetune_qwen2_5_0_5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml). Modify the following parameters: @@ -531,7 +531,7 @@ Users can define custom data handlers to apply various preprocessing logic to th prefetch_size: 1 ``` - The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). + The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). Custom data handler: @@ -645,7 +645,7 @@ Users can define custom data handlers to apply various preprocessing logic to th seed: 0 ``` - The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). + The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). Custom adgen_handler: @@ -724,7 +724,7 @@ Using the above configuration file to process the `alpaca` dataset will execute In addition to supporting online dataset loading and processing, `CommonDataLoader` also supports offline dataset processing and saving. -The [datasets_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/data_preprocess/huggingface/datasets_preprocess.py) script can be used to process Huggingface datasets offline and save them. +The [datasets_preprocess.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/toolkit/data_preprocess/huggingface/datasets_preprocess.py) script can be used to process Huggingface datasets offline and save them. - Parameter Description @@ -809,7 +809,7 @@ Following the above data preprocessing steps, you can generate a MindRecord data 1. Modify the model configuration file - The `qwen2_5-0.5b` model fine-tuning uses the [finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml) configuration file. Modify the dataset section as follows: + The `qwen2_5-0.5b` model fine-tuning uses the [finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml) configuration file. Modify the dataset section as follows: ```yaml train_dataset: &train_dataset @@ -827,7 +827,7 @@ Following the above data preprocessing steps, you can generate a MindRecord data 2. Start Model Fine-tuning - After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the fine-tuning task. Here, we take the [Qwen2_5 model documentation](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/README.md) as an example. + After modifying the dataset and parallel-related configurations in the model configuration file, you can refer to the model documentation to launch the fine-tuning task. Here, we take the [Qwen2_5 model documentation](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/README.md) as an example. 
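Before launching fine-tuning, a quick sanity check like the sketch below can confirm that the generated MindRecord file is readable; the file path and column name are assumptions and should match the actual `dataset_dir` and `input_columns` values used in the configuration.

```python
# Sanity-check sketch: read one record from the generated MindRecord file.
# The path and column name below are placeholders for your own dataset.
import mindspore.dataset as ds

dataset = ds.MindDataset(dataset_files="./alpaca-fastchat8192.mindrecord",
                         columns_list=["input_ids"], shuffle=False)
for record in dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
    print(record["input_ids"].shape)  # token IDs of one packed sample
    break
```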
### Multi-source Datasets diff --git a/docs/mindformers/docs/source_en/feature/evaluation.md b/docs/mindformers/docs/source_en/feature/evaluation.md index fc79a41948..3dc3f4ba43 100644 --- a/docs/mindformers/docs/source_en/feature/evaluation.md +++ b/docs/mindformers/docs/source_en/feature/evaluation.md @@ -44,9 +44,9 @@ pip install -e . 1. Create a new directory with e.g. the name `model_dir` for storing the model yaml files. 2. Place the model inference yaml configuration file (predict_xxx_.yaml) in the directory created in the previous step. The directory location of the reasoning yaml configuration file for different models refers to [model library](../introduction/models.md). - 3. Configure the yaml file. If the model class, model Config class, and model Tokenzier class in yaml use cheat code, that is, the code files are in [research](https://gitee.com/mindspore/mindformers/tree/dev/research) directory or other external directories, it is necessary to modify the yaml file: under the corresponding class `type` field, add the `auto_register` field in the format of `module.class`. (`module` is the file name of the script where the class is located, and `class` is the class name. If it already exists, there is no need to modify it.). + 3. Configure the yaml file. If the model class, model Config class, and model Tokenzier class in yaml use cheat code, that is, the code files are in [research](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research) directory or other external directories, it is necessary to modify the yaml file: under the corresponding class `type` field, add the `auto_register` field in the format of `module.class`. (`module` is the file name of the script where the class is located, and `class` is the class name. If it already exists, there is no need to modify it.). - Using [predict_1lama3_1_8b. yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml) configuration as an example, modify some of the configuration items as follows: + Using [predict_1lama3_1_8b. yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml) configuration as an example, modify some of the configuration items as follows: ```yaml run_mode: 'predict' # Set inference mode @@ -71,13 +71,13 @@ pip install -e . #### Evaluation Example -Execute the script of [run_harness.sh](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/benchmarks/run_harness.sh) to evaluate. +Execute the script of [run_harness.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/toolkit/benchmarks/run_harness.sh) to evaluate. The following table lists the parameters of the script of `run_harness.sh`: | Parameter | Type | Description | Required | |---------------|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| -| `--register_path`| str | The absolute path of the directory where the cheat code is located. For example, the model directory under the [research](https://gitee.com/mindspore/mindformers/tree/dev/research) directory. | No(The cheat code is required) | +| `--register_path`| str | The absolute path of the directory where the cheat code is located. For example, the model directory under the [research](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research) directory. 
| No(The cheat code is required) | | `--model` | str | The value must be `mf`, indicating the MindSpore Transformers evaluation policy. | Yes | | `--model_args` | str | Model and evaluation parameters. For details, see MindSpore Transformers model parameters. | Yes | | `--tasks` | str | Dataset name. Multiple datasets can be specified and separated by commas (,). | Yes | diff --git a/docs/mindformers/docs/source_en/feature/load_huggingface_config.md b/docs/mindformers/docs/source_en/feature/load_huggingface_config.md index b5d8990594..2375220662 100644 --- a/docs/mindformers/docs/source_en/feature/load_huggingface_config.md +++ b/docs/mindformers/docs/source_en/feature/load_huggingface_config.md @@ -26,7 +26,7 @@ This feature only involves the model and inference configurations, with the rele - pretrained_model_dir: The directory path where the Hugging Face model configuration is located; - model_config: Model configuration fields specific to MindSpore Transformers; -- generation_config: Parameters related to text generation. Optional configuration, increase if customization is needed. For the configuration items, refer to [GenerationConfig](https://www.mindspore.cn/mindformers/docs/en/dev/generation/mindformers.generation.GenerationConfig.html). +- generation_config: Parameters related to text generation. Optional configuration, increase if customization is needed. For the configuration items, refer to [GenerationConfig](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/generation/mindformers.generation.GenerationConfig.html). ```yaml pretrained_model_dir: "./local/qwen3" @@ -59,7 +59,7 @@ generation_config: ### Initiating Tasks -Refer to [Using run_mindformer.py to initiate inference tasks](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html#using-run-mindformer-once-to-start-the-inference-script). +Refer to [Using run_mindformer.py to initiate inference tasks](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/inference.html#using-run-mindformer-once-to-start-the-inference-script). ## Frequently Asked Questions diff --git a/docs/mindformers/docs/source_en/feature/logging.md b/docs/mindformers/docs/source_en/feature/logging.md index 3470582022..214232777c 100644 --- a/docs/mindformers/docs/source_en/feature/logging.md +++ b/docs/mindformers/docs/source_en/feature/logging.md @@ -46,7 +46,7 @@ By default, MindSpore Transformers specifies the file output path as `./output` If you need to re-specify the output log folder, you can modify the configuration in yaml. 
-Taking [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L2) as an example, the following configuration can be made: +Taking [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L2) as an example, the following configuration can be made: ```yaml output_dir: './output' # path to save logs/checkpoint/strategy @@ -54,13 +54,13 @@ output_dir: './output' # path to save logs/checkpoint/strategy #### Specifying Output Directory for Single-Card Tasks -In addition to specifying the yaml file configuration, MindSpore Transformers also supports [run_mindformer In the one-click start script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html#run-mindformer-one-click-start-script), +In addition to specifying the yaml file configuration, MindSpore Transformers also supports [run_mindformer In the one-click start script](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/start_tasks.html#run-mindformer-one-click-start-script), use the `--output_dir` start command to specify the log output path. > If the output path is configured here, it will overwrite the configuration in the yaml file! #### Distributed Task Specifies the Output Directory -If the model training requires multiple servers, use the [distributed task launch script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html#distributed-task-pull-up-script) to start the distributed training task. +If the model training requires multiple servers, use the [distributed task launch script](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/start_tasks.html#distributed-task-pull-up-script) to start the distributed training task. If shared storage is set, you can also specify the input parameter `LOG_DIR` in the startup script to specify the log output path of the Worker and Scheduler, and output the logs of all machine nodes to one path for unified observation. diff --git a/docs/mindformers/docs/source_en/feature/memory_optimization.md b/docs/mindformers/docs/source_en/feature/memory_optimization.md index 1c95dc4e40..d64d8ce413 100644 --- a/docs/mindformers/docs/source_en/feature/memory_optimization.md +++ b/docs/mindformers/docs/source_en/feature/memory_optimization.md @@ -14,7 +14,7 @@ Recomputation can significantly reduce activation memory usage during training b Users can enable recomputation by adding a `recompute_config` module to the YAML configuration file used for model training. -Taking the [DeepSeek-V3 pre-training's YAML file](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L113) as an example, it could be configured as follows: +Taking the [DeepSeek-V3 pre-training's YAML file](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L113) as an example, it could be configured as follows: ```yaml # recompute config diff --git a/docs/mindformers/docs/source_en/feature/monitor.md b/docs/mindformers/docs/source_en/feature/monitor.md index 230e59aa4f..ad2005463e 100644 --- a/docs/mindformers/docs/source_en/feature/monitor.md +++ b/docs/mindformers/docs/source_en/feature/monitor.md @@ -256,4 +256,4 @@ All configuration names and descriptions are listed below: > 2. Configuration parameters set by the user in the training configuration file `yaml`; > 3. 
Default configuration parameters during training. > -> Refer to [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html) for all configurable parameters. \ No newline at end of file +> Refer to [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html) for all configurable parameters. \ No newline at end of file diff --git a/docs/mindformers/docs/source_en/feature/parallel_training.md b/docs/mindformers/docs/source_en/feature/parallel_training.md index c480f6e454..a647daef8f 100644 --- a/docs/mindformers/docs/source_en/feature/parallel_training.md +++ b/docs/mindformers/docs/source_en/feature/parallel_training.md @@ -40,7 +40,7 @@ Parameter description: - data_parallel: The number of parallel data sharding, which is set to 1 by default, is configured based on user requirements. -For the configuration method of distributed parallel parameters, see the parallel configuration section in the [MindSpore Transformers Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For the configuration method of distributed parallel parameters, see the parallel configuration section in the [MindSpore Transformers Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). ### Model Parallelism @@ -59,7 +59,7 @@ Parameter description: - model_parallel: The number of parallel shards of the model, which is set to 1 by default, is configured according to user requirements. -For the configuration method of distributed parallel parameters, see the parallel configuration section in the [MindSpore Transformers Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For the configuration method of distributed parallel parameters, see the parallel configuration section in the [MindSpore Transformers Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). ### Sequence parallelism @@ -78,7 +78,7 @@ Parameter description: - use_seq_parallel:Whether to enable sequence parallelism, which is Fasle by default. -For the configuration method of distributed parallel parameters, see the parallel configuration section in the [MindSpore Transformers Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For the configuration method of distributed parallel parameters, see the parallel configuration section in the [MindSpore Transformers Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). ### Long Sequence Parallelism @@ -109,7 +109,7 @@ Parameter Descriptions: - use_ring_attention: Whether to enable Ring Attention, default is False. - context_parallel: The number of sequence parallel slices, default is 1, configure according to user requirements. -For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). 
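To see why raising `context_parallel` reduces per-device activation memory, the plain-arithmetic sketch below splits a sequence across context-parallel ranks. It is only an illustration of the partitioning idea, not the sharding logic inside MindSpore Transformers; the helper name and the example numbers are assumptions for this page.

```python
# Illustrative sketch only: how a long sequence is conceptually partitioned
# across context-parallel ranks. Plain arithmetic, not framework code.

def sequence_shard_sizes(seq_length: int, context_parallel: int) -> list:
    """Split seq_length as evenly as possible across context_parallel ranks."""
    base, remainder = divmod(seq_length, context_parallel)
    return [base + (1 if rank < remainder else 0) for rank in range(context_parallel)]

if __name__ == "__main__":
    # e.g. a 131072-token sample with context_parallel=8 leaves 16384 tokens of
    # activations per device, which is what makes very long sequences fit in memory.
    print(sequence_shard_sizes(131072, 8))
```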
#### Ulysses Sequence Parallelism @@ -140,7 +140,7 @@ Parameter Descriptions: - enable_alltoall: Generate alltoall communication operator, default is False, when the parameter is not enabled, it will be replaced by a combination of other operators such as allgather. See MindSpore `set_auto_parallel_context` [interface documentation](https://www.mindspore.cn/docs/en/r2.7.0rc1/api_python/mindspore/mindspore.set_auto_parallel_context.html). We expect to be able to directly input allto_all communication operators when we enable the Ulysses scenario, so we turn this configuration item on. - context_parallel_algo: Set to `ulysses_cp` to enable Ulysses sequence parallelism. -For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). #### Hybrid Sequence Parallelism @@ -166,7 +166,7 @@ Parameter Descriptions: - context_parallel_algo: hybrid sequence parallelism is turned on when set to `hybrid_cp`. - ulysses_degree_in_cp: the number of parallel slices of the Ulysses sequence. -For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). ### Pipeline Parallelism @@ -199,7 +199,7 @@ Notes: - Currently, only Llama and DeepSeek series models are supported. - Using Megatron's multi-source datasets for training is not yet supported. -For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html), specifically the section on parallel configuration. +For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html), specifically the section on parallel configuration. ### Optimizer parallelism @@ -218,7 +218,7 @@ Parameter Descriptions: - enable_parallel_optimizer:Whether to enable optimizer parallelism, which is Fasle by default. -For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html), specifically the section on parallel configuration. +For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html), specifically the section on parallel configuration. ### Multi-replica Parallelism @@ -241,11 +241,11 @@ Notes: - Currently, only Llama and DeepSeek series models are supported. 
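Taken together, the parallel dimensions configured in this section generally have to multiply out to the number of available cards. The small sketch below expresses that sanity check; it is based on the usual `data_parallel × model_parallel × context_parallel × pipeline_stage` constraint and is an illustrative helper, not a framework API.

```python
# Rough sanity-check sketch (assumption: the product of the parallel dimensions
# equals the number of cards, which is the usual constraint for these configs).

def required_devices(parallel_config: dict) -> int:
    return (
        parallel_config.get("data_parallel", 1)
        * parallel_config.get("model_parallel", 1)
        * parallel_config.get("context_parallel", 1)
        * parallel_config.get("pipeline_stage", 1)
    )

if __name__ == "__main__":
    cfg = {"data_parallel": 1, "model_parallel": 8, "context_parallel": 1, "pipeline_stage": 8}
    print(required_devices(cfg))  # 64 cards would be needed for this example configuration
```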
-For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html), specifically the section on parallel configuration. +For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html), specifically the section on parallel configuration. ## MindSpore Transformers Distributed Parallel Application Practices -In the [Llama3_1-70B fine-tuning configuration](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: +In the [Llama3_1-70B fine-tuning configuration](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#) file provided on the official website, multiple distributed parallelism strategies are used to improve the training efficiency in the multi-node multi-device environment. The main parallelism strategies and key parameters involved in the configuration file are as follows: - **Data parallelism**: No additional data parallelism is enabled (`data_parallel: 1`). - **Model parallelism**: A model is sliced into eight parts, which are computed on different devices (`model_parallel: 8`). diff --git a/docs/mindformers/docs/source_en/feature/quantization.md b/docs/mindformers/docs/source_en/feature/quantization.md index ba6d50d831..a29c07e841 100644 --- a/docs/mindformers/docs/source_en/feature/quantization.md +++ b/docs/mindformers/docs/source_en/feature/quantization.md @@ -14,6 +14,6 @@ Currently, only the following models are supported, and the supported models are | Supported Model | |-----------------------------------------------------------------------------------------------------------------------------------| -| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/predict_deepseek3_671b.yaml) | -| [DeepSeek-R1](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b.yaml) | -| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_13b_ptq.yaml) | \ No newline at end of file +| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/predict_deepseek3_671b.yaml) | +| [DeepSeek-R1](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b.yaml) | +| [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/llama2/predict_llama2_13b_ptq.yaml) | \ No newline at end of file diff --git a/docs/mindformers/docs/source_en/feature/resume_training.md b/docs/mindformers/docs/source_en/feature/resume_training.md index 93eddba0f6..6dd9ed77ba 100644 --- a/docs/mindformers/docs/source_en/feature/resume_training.md +++ b/docs/mindformers/docs/source_en/feature/resume_training.md @@ -45,7 +45,7 @@ If `resume_training` is set to `True`, the system automatically resumes training ### Example of Distributed Training The following example shows how to enable resumable training in single-device and multi-device 
environments. The example is based on the `llama2_7b` model. -For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/pretrain_llama2_7b.yaml). +For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/llama2/pretrain_llama2_7b.yaml). #### Complete Training @@ -75,7 +75,7 @@ For related configuration files, see [configs/llama2/pretrain_llama2_7b.yaml](ht ... ``` -2. Prepare a dataset. The following uses [wikitext2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87) as an example to describe how to start four-device distributed training. +2. Prepare a dataset. The following uses [wikitext2](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87) as an example to describe how to start four-device distributed training. ```shell bash scripts/msrun_launcher.sh "run_mindformer.py \ diff --git a/docs/mindformers/docs/source_en/feature/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md index 3e2ba5b937..8d501255c4 100644 --- a/docs/mindformers/docs/source_en/feature/safetensors.md +++ b/docs/mindformers/docs/source_en/feature/safetensors.md @@ -15,7 +15,7 @@ There are two main types of Safetensors files: complete weights files and distri Safetensors complete weights can be obtained in two ways: 1. Download directly from Huggingface. -2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html#distributed-weight-slicing-and-merging). +2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html#distributed-weight-slicing-and-merging). Huggingface Safetensors example catalog structure is as follows: @@ -69,7 +69,7 @@ qwen2_7b In the training process of deep learning models, saving the model weights is a crucial step. The weight saving function allows us to store the model parameters at any stage of training, so that users can restore, continue training, evaluate or deploy after training is interrupted or completed. At the same time, by saving weights, experimental results can be reproduced in different environments. -Currently, MindSpore TransFormer supports reading and saving weight files in the [safetensors](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html) format. +Currently, MindSpore TransFormer supports reading and saving weight files in the [safetensors](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html) format. ### Directory Structure @@ -119,7 +119,7 @@ Users can control the weight saving behavior by modifying the configuration file Users can modify the fields under `CheckpointMonitor` in the `yaml` configuration file to control the weight saving behavior. 
-Taking [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L206) as an example, the following configuration can be made: +Taking [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L206) as an example, the following configuration can be made: ```yaml # callbacks @@ -152,7 +152,7 @@ The main parameters concerning the preservation of the weight configuration are | remove_redundancy | Whether redundancy is removed when saving model weights. | (bool, optional) - Default: `False` . | | save_network_params | Whether to additionally save only network parameters. | (bool, optional) - Whether to additionally save only network parameters. Default: `False` . | -If you want to know more about CheckpointMonitor, you can refer to [CheckpointMonitor API documentation](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.CheckpointMonitor.html). +If you want to know more about CheckpointMonitor, you can refer to [CheckpointMonitor API documentation](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.CheckpointMonitor.html). ## Weight Loading @@ -264,7 +264,7 @@ parallel_config: # Configuring a 16-card dist **Initiating tasks**: -Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh) to initiate tasks. +Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh) to initiate tasks. ```shell # The first server (master node) @@ -358,7 +358,7 @@ auto_trans_ckpt: False # Distributed weight loading **4. Initiating tasks**: -Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh) to initiate tasks. +Use [mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh) to initiate tasks. ```shell # The first server (master node) @@ -456,7 +456,7 @@ In the current distributed training and inference environment, when users need t #### Usage Directions -Use the [safetensors weights merging script](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/safetensors/unified_safetensors.py) provided by MindSpore Transformers to perform safetensors weight merging as follows. The format of the merged weights is [complete-weights](#complete-weights). +Use the [safetensors weights merging script](https://gitee.com/mindspore/mindformers/blob/r1.6.0/toolkit/safetensors/unified_safetensors.py) provided by MindSpore Transformers to perform safetensors weight merging as follows. The format of the merged weights is [complete-weights](#complete-weights). 
```shell python toolkit/safetensors/unified_safetensors.py \ @@ -573,7 +573,7 @@ callbacks: ### Examples of Training Tasks -If you use the full weighted multicard online fine-tuning, take the Qwen2.5-7B model as an example and modify the configuration item [finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): +If you use the full weighted multicard online fine-tuning, take the Qwen2.5-7B model as an example and modify the configuration item [finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): ```yaml # Modified configuration @@ -589,7 +589,7 @@ callbacks: checkpoint_format: safetensors # Save weights file format ``` -If you use distributed weights multicard online fine-tuning, take the Qwen2.5-7B model as an example, modify the configuration item [finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): +If you use distributed weights multicard online fine-tuning, take the Qwen2.5-7B model as an example, modify the configuration item [finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): ```yaml # Modified configuration @@ -617,11 +617,11 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ After the task is executed, a checkpoint folder is generated in the mindformers/output directory, while the model files are saved in that folder. -For more details, please refer to [Introduction to SFT fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html) and [Introduction to Pre-training](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html). +For more details, please refer to [Introduction to SFT fine-tuning](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/supervised_fine_tuning.html) and [Introduction to Pre-training](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/pre_training.html). ### Example of an Inference Task -If you use complete weighted multicard online inference, take the Qwen2.5-7B model as an example, and modify the configuration item [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): +If you use complete weighted multicard online inference, take the Qwen2.5-7B model as an example, and modify the configuration item [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): ```yaml # Modified configuration @@ -634,7 +634,7 @@ parallel_config: pipeline_stage: 1 ``` -If you use distributed weighted multicard online inference, take the Qwen2.5-7B model as an example, modify the configuration item [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): +If you use distributed weighted multicard online inference, take the Qwen2.5-7B model as an example, modify the configuration item [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): ```yaml # Modified configuration @@ -664,7 +664,7 @@ The results of executing the above single-card inference and multi-card inferenc 'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......] 
``` -For more details, please refer to: [Introduction to Inference](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html) +For more details, please refer to: [Introduction to Inference](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/inference.html) ### Examples of Resumable Training after Breakpoint Tasks @@ -700,4 +700,4 @@ callbacks: checkpoint_format: safetensors # Save weights file format ``` -For more details, please refer to: [Introduction to Breakpoints](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html). +For more details, please refer to: [Introduction to Breakpoints](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/resume_training.html). diff --git a/docs/mindformers/docs/source_en/feature/skip_data_and_ckpt_health_monitor.md b/docs/mindformers/docs/source_en/feature/skip_data_and_ckpt_health_monitor.md index 3b71cd3b50..e526bfeaf8 100644 --- a/docs/mindformers/docs/source_en/feature/skip_data_and_ckpt_health_monitor.md +++ b/docs/mindformers/docs/source_en/feature/skip_data_and_ckpt_health_monitor.md @@ -10,7 +10,7 @@ Please refer to [Checkpoint Health Monitor](#checkpoint-health-monitor) for the > - The combination of data skipping function and health monitoring function can effectively solve the problem of data anomalies caused by abnormal global norm during the training process. Before use, please train normally for a period of time to determine the threshold of the global norm that needs to be set, the threshold of the number of consecutive anomalies, and the threshold of the embedding norm. > - Please note that training will only be interrupted when there are consecutive exceptions. If there is only one instance where it returns to normal, the cumulative count will be cleared. Therefore, please control the threshold setting. -> - The data skipping function cannot be used in conjunction with the quick fault recovery function. Refer to the process level rescheduling recovery function in the [high availability feature](https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html). +> - The data skipping function cannot be used in conjunction with the quick fault recovery function. Refer to the process level rescheduling recovery function in the [high availability feature](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/high_availability.html). ## Skipping Data @@ -53,7 +53,7 @@ monitor_config: ### Conversion Example -Assuming Llama3.1-8B is taken as an example, use [finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml) to add parameters according to the above [Configuration](#usage), please refer to the [Llama3.1-8B Document](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md) for the remaining steps. Start training: +Assuming Llama3.1-8B is taken as an example, use [finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml) to add parameters according to the above [Configuration](#usage), please refer to the [Llama3.1-8B Document](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md) for the remaining steps. 
Start training: ```shell bash scripts/msrun_launcher.sh "run_mindformer.py \ @@ -155,7 +155,7 @@ parallel_config: ### Conversion Example -Assuming Llama3.1-8B is taken as an example, use [finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml) to add parameters and modify according to the above [Configuration](#usage-1), please refer to the [Llama3.1-8B Document](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md) for the remaining steps. Start training: +Assuming Llama3.1-8B is taken as an example, use [finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml) to add parameters and modify according to the above [Configuration](#usage-1), please refer to the [Llama3.1-8B Document](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md) for the remaining steps. Start training: ```shell bash scripts/msrun_launcher.sh "run_mindformer.py \ diff --git a/docs/mindformers/docs/source_en/feature/start_tasks.md b/docs/mindformers/docs/source_en/feature/start_tasks.md index fc8c60c799..49727c9834 100644 --- a/docs/mindformers/docs/source_en/feature/start_tasks.md +++ b/docs/mindformers/docs/source_en/feature/start_tasks.md @@ -22,7 +22,7 @@ In the root directory of the MindSpore Transformers code, execute the `run_mindf | `--device_id` | Set the execution device ID. The value must be within the range of available devices. | int, optional | pre-train/finetune/predict | | `--device_target` | Set the backend execution device. MindSpore Transformers is only supported on `Ascend` devices. | str, optional | pre-train/finetune/predict | | `--run_mode` | Set the running mode of the model: `train`, `finetune` or `predict`. | str, optional | pre-train/finetune/predict | -| `--load_checkpoint` | File or folder paths for loading weights. For detailed usage, please refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html) | str, optional | pre-train/finetune/predict | +| `--load_checkpoint` | File or folder paths for loading weights. For detailed usage, please refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html) | str, optional | pre-train/finetune/predict | | `--use_parallel` | Whether use parallel mode. | bool, optional | pre-train/finetune/predict | | `--output_dir` | Set the path where log, checkpoint, strategy, etc. files are saved. | str, optional | pre-train/finetune/predict | | `--register_path` | The absolute path of the directory where the external code is located. For example, the model directory under the research directory. | str, optional | pre-train/finetune/predict | @@ -33,7 +33,7 @@ In the root directory of the MindSpore Transformers code, execute the `run_mindf | Parameters | Parameter Descriptions | Value Description | Applicable Scenarios | |:----------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|-----------------------------| | `--src_strategy_path_or_dir` | The strategy of load_checkpoint. | str, optional | pre-train/finetune/predict | -| `--auto_trans_ckpt` | Enable online weight automatic conversion. 
Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html). | bool, optional | pre-train/finetune/predict | +| `--auto_trans_ckpt` | Enable online weight automatic conversion. Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html). | bool, optional | pre-train/finetune/predict | | `--transform_process_num` | The number of processes responsible for checkpoint transform. | int, optional | pre-train/finetune/predict | | `--only_save_strategy` | Whether to only save the strategy files. | bool, optional, when it is `true`, the task exits directly after saving the strategy file. | pre-train/finetune/predict | @@ -42,7 +42,7 @@ In the root directory of the MindSpore Transformers code, execute the `run_mindf | Parameters | Parameter Descriptions | Value Description | Applicable Scenarios | |:--------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|----------------------| | `--train_dataset_dir` | Dataset directory of data loader to pre-train/finetune. | str, optional | pre-train/finetune | -| `--resume_training` | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html#resumable-training). | bool, optional | pre-train/finetune | +| `--resume_training` | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/resume_training.html#resumable-training). | bool, optional | pre-train/finetune | | `--epochs` | Train epochs. | int, optional | pre-train/finetune | | `--batch_size` | The sample size of the batch data. | int, optional | pre-train/finetune | | `--gradient_accumulation_steps` | The number of gradient accumulation steps. | int, optional | pre-train/finetune | diff --git a/docs/mindformers/docs/source_en/feature/tokenizer.md b/docs/mindformers/docs/source_en/feature/tokenizer.md index e862b21f48..92eb9bd987 100644 --- a/docs/mindformers/docs/source_en/feature/tokenizer.md +++ b/docs/mindformers/docs/source_en/feature/tokenizer.md @@ -38,7 +38,7 @@ The inference process takes the Qwen3 model as an example. 1. Modify the yaml configuration - Qwen3 model configuration file [predict_qwen3 yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/qwen3/predict_qwen3.yaml) needs to be modified The places are as follows: + Qwen3 model configuration file [predict_qwen3 yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/qwen3/predict_qwen3.yaml) needs to be modified The places are as follows: ```yaml use_legacy: False diff --git a/docs/mindformers/docs/source_en/feature/training_hyperparameters.md b/docs/mindformers/docs/source_en/feature/training_hyperparameters.md index f7c6c1657a..acf3e4ddd5 100644 --- a/docs/mindformers/docs/source_en/feature/training_hyperparameters.md +++ b/docs/mindformers/docs/source_en/feature/training_hyperparameters.md @@ -24,7 +24,7 @@ Setting the learning rate too high can prevent the model from converging, while Users can utilize the learning rate by adding an `lr_schedule` module to the YAML configuration file used for model training. 
-Taking the [DeepSeek-V3 pre-training's YAML file](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L31) as an example, it could be configured as follows: +Taking the [DeepSeek-V3 pre-training's YAML file](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L31) as an example, it could be configured as follows: ```yaml # lr schedule @@ -39,14 +39,14 @@ lr_schedule: Different learning rates require different configuration parameters. MindSpore Transformers currently supports the following learning rates: -1. [Constant Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.ConstantWarmUpLR.html) -2. [Linear with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.LinearWithWarmUpLR.html) -3. [Cosine with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.CosineWithWarmUpLR.html) -4. [Cosine with Restarts and Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.CosineWithRestartsAndWarmUpLR.html) -5. [Polynomial with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.PolynomialWithWarmUpLR.html) -6. [The cosine annealing part of SGDR](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.CosineAnnealingLR.html) -7. [Set the learning rate of each parameter group using a cosine annealing schedule](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.CosineAnnealingWarmRestarts.html) -8. [Learning Rate Wise Layer](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.LearningRateWiseLayer.html) +1. [Constant Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.ConstantWarmUpLR.html) +2. [Linear with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.LinearWithWarmUpLR.html) +3. [Cosine with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.CosineWithWarmUpLR.html) +4. [Cosine with Restarts and Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.CosineWithRestartsAndWarmUpLR.html) +5. [Polynomial with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.PolynomialWithWarmUpLR.html) +6. [The cosine annealing part of SGDR](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.CosineAnnealingLR.html) +7. [Set the learning rate of each parameter group using a cosine annealing schedule](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.CosineAnnealingWarmRestarts.html) +8. 
[Learning Rate Wise Layer](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.LearningRateWiseLayer.html) Taking the cosine warm-up learning rate (CosineWithWarmUpLR) as an example, the main parameters that need to be paid attention to are listed in the following table: @@ -73,7 +73,7 @@ lr_schedule: total_steps: 20 # -1 means it will load the total steps of the dataset ``` -For more details about the learning rate API (such as `type` configuration names and introductions to learning rate algorithms), please refer to the related links in the [MindSpore Transformers API Documentation: Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers.core.html#learning-rate). +For more details about the learning rate API (such as `type` configuration names and introductions to learning rate algorithms), please refer to the related links in the [MindSpore Transformers API Documentation: Learning Rate](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/mindformers.core.html#learning-rate). ## Optimizer @@ -83,7 +83,7 @@ An optimizer is an algorithmic choice used for optimizing neural network weights Selecting the right optimizer is crucial for the convergence speed and final performance of the model. Different optimizers employ various strategies to adjust the learning rate and other hyperparameters to accelerate the training process, improve convergence, and avoid local optima. -Currently, MindSpore Transformers only supports the [AdamW optimizer](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers.core.html#optimizer). +Currently, MindSpore Transformers only supports the [AdamW optimizer](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/mindformers.core.html#optimizer). ### Configuration and Usage @@ -91,7 +91,7 @@ Currently, MindSpore Transformers only supports the [AdamW optimizer](https://ww Users can use the optimizer by adding an `optimizer` module to the YAML configuration file for model training. -Taking the [DeepSeek-V3 pre-training's YAML file](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L24) as an example, it could be configured like this: +Taking the [DeepSeek-V3 pre-training's YAML file](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L24) as an example, it could be configured like this: ```yaml # optimizer @@ -103,4 +103,4 @@ optimizer: #### Key Parameters Introduction -For the main parameters of optimizer configuration, see the relevant link in [MindSpore Transformers API Documentation: Optimizer](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.AdamW.html#mindformers.core.AdamW). \ No newline at end of file +For the main parameters of optimizer configuration, see the relevant link in [MindSpore Transformers API Documentation: Optimizer](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/core/mindformers.core.AdamW.html#mindformers.core.AdamW). 
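Since this page only lists the yaml fields, a compact numeric sketch may help connect them to what the optimizer actually computes. The function below applies the textbook AdamW update rule for a single scalar parameter; the `betas` and `eps` defaults mirror the yaml example above, while the `weight_decay` value is just an illustrative assumption. It is not MindSpore Transformers' internal `AdamW` implementation.

```python
import math

# Textbook AdamW update for one scalar parameter, to show what the betas / eps /
# weight_decay fields control. Sketch only; weight_decay=0.01 is an assumed value.

def adamw_step(param, grad, m, v, step, lr, betas=(0.9, 0.95), eps=1e-8, weight_decay=0.01):
    """Return updated (param, m, v) after one AdamW step (step starts at 1)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** step)               # bias correction
    v_hat = v / (1 - beta2 ** step)
    param -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)  # decoupled decay
    return param, m, v

if __name__ == "__main__":
    p, m, v = 0.5, 0.0, 0.0
    for t in range(1, 4):
        p, m, v = adamw_step(p, grad=0.1, m=m, v=v, step=t, lr=2.2e-4)
        print(t, p)
```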
\ No newline at end of file diff --git a/docs/mindformers/docs/source_en/guide/deployment.md b/docs/mindformers/docs/source_en/guide/deployment.md index 36b35fa97c..2e65cc8ca9 100644 --- a/docs/mindformers/docs/source_en/guide/deployment.md +++ b/docs/mindformers/docs/source_en/guide/deployment.md @@ -8,7 +8,7 @@ MindIE, full name Mind Inference Engine, is a high-performance inference framewo MindSpore Transformers are hosted in the model application layer MindIE LLM, and large models in MindSpore Transformers can be deployed through MindIE Service. -The model support for MindIE inference can be found in [model repository](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html). +The model support for MindIE inference can be found in [model repository](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/introduction/models.html). ## Environment Setup @@ -16,7 +16,7 @@ The model support for MindIE inference can be found in [model repository](https: 1. Install MindSpore Transformers - Refer to [MindSpore Transformers Official Installation Guide](https://www.mindspore.cn/mindformers/docs/en/dev/installation.html) for installation. + Refer to [MindSpore Transformers Official Installation Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/installation.html) for installation. 2. Install MindIE @@ -86,9 +86,9 @@ processor: merges_file: "/path/to/mf_model/qwen1_5_72b/merges.txt" # merges file absolute path ``` -For model weight downloading and conversions, refer to the [Weight Format Conversion Guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html). +For model weight downloading and conversions, refer to the [Weight Format Conversion Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html). -Required files and configurations may vary from model to model. Refer to the model-specific inference sections in [Model Repository](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html) for details. +Required files and configurations may vary from model to model. Refer to the model-specific inference sections in [Model Repository](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/introduction/models.html) for details. ### Starting MindIE @@ -346,4 +346,4 @@ The validation is successful with the following returned inference result: ## Model List -Examples of MindIE inference for other models can be found in the introduction documentation for each model in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html). \ No newline at end of file +Examples of MindIE inference for other models can be found in the introduction documentation for each model in [Model Library](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/introduction/models.html). \ No newline at end of file diff --git a/docs/mindformers/docs/source_en/guide/inference.md b/docs/mindformers/docs/source_en/guide/inference.md index 49d8ee0356..8b588ef7e7 100644 --- a/docs/mindformers/docs/source_en/guide/inference.md +++ b/docs/mindformers/docs/source_en/guide/inference.md @@ -19,7 +19,7 @@ Depending on the required inference task, different models are chosen, e.g. for Currently, the inference weights can be loaded online to perform inference with the complete weights. The weights can be obtained through the following two methods: 1. Download the complete open-source weights of the corresponding model from the Hugging Face model library. -2. 
Pre-trained or fine-tuned distributed weights through [merger](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html#weight-merging) Generate a complete weight. +2. Pre-trained or fine-tuned distributed weights through [merger](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html#weight-merging) Generate a complete weight. ### 3. Executing Inference Tasks @@ -27,7 +27,7 @@ Use the unified script `run_mindformer` to execute inference tasks. ## Inference Based on the run_mindformer Script -For single-device inference, you can directly run [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py). For multi-device inference, you need to run [scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh). +For single-device inference, you can directly run [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py). For multi-device inference, you need to run [scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh). The arguments to run_mindformer.py are described below: @@ -45,7 +45,7 @@ The arguments to run_mindformer.py are described below: msrun_launcher.sh includes the run_mindformer.py command and the number of inference cards as two parameters. -The following will describe the usage of single and multi-card inference using `Qwen2.5-7B` as an example, with the recommended configuration of the [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml) file. +The following will describe the usage of single and multi-card inference using `Qwen2.5-7B` as an example, with the recommended configuration of the [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml) file. ### Configuration Modification @@ -76,7 +76,7 @@ processor: merges_file: "path/to/merges.txt" ``` -For specific configuration instructions, please refer to [yaml Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). +For specific configuration instructions, please refer to [yaml Configuration Instructions](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html). ### Single-Device Inference @@ -152,4 +152,4 @@ Inference results are viewed in the same way as multi-card inference. ## More Information -For more inference examples of different models, see [the models supported by MindSpore Transformers](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html). +For more inference examples of different models, see [the models supported by MindSpore Transformers](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/introduction/models.html). diff --git a/docs/mindformers/docs/source_en/guide/pre_training.md b/docs/mindformers/docs/source_en/guide/pre_training.md index 9ab5c08985..edb39a7122 100644 --- a/docs/mindformers/docs/source_en/guide/pre_training.md +++ b/docs/mindformers/docs/source_en/guide/pre_training.md @@ -12,23 +12,23 @@ Based on actual operations, the basic pretraining process can be divided into th ### 1. 
Preparing a dataset - The pretraining phase of MindSpore Transformers currently supports datasets in both [Megatron format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html#megatron-dataset) and [MindRecord format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html#mindrecord-dataset). Users can prepare the data according to the specific requirements of their tasks. + The pretraining phase of MindSpore Transformers currently supports datasets in both [Megatron format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html#megatron-dataset) and [MindRecord format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html#mindrecord-dataset). Users can prepare the data according to the specific requirements of their tasks. ### 2. Configuring File Preparation - The pretraining task in MindSpore Transformers is managed through a unified [configuration file](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html), allowing users to flexibly adjust various [training hyperparameters](https://www.mindspore.cn/mindformers/docs/en/dev/feature/training_hyperparameters.html). In addition, pretraining performance can be further optimized using features such as [distributed parallel training](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html), [memory optimization](https://www.mindspore.cn/mindformers/docs/en/dev/feature/memory_optimization.html), and [other training features](https://www.mindspore.cn/mindformers/docs/en/dev/feature/other_training_features.html). + The pretraining task in MindSpore Transformers is managed through a unified [configuration file](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html), allowing users to flexibly adjust various [training hyperparameters](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/training_hyperparameters.html). In addition, pretraining performance can be further optimized using features such as [distributed parallel training](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/parallel_training.html), [memory optimization](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/memory_optimization.html), and [other training features](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/other_training_features.html). ### 3. Launching the Training Task - MindSpore Transformers provides a convenient [one-click script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html) to launch the pretraining task. During training, users can monitor the progress using [logging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/logging.html) and [visualization tools](https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html). + MindSpore Transformers provides a convenient [one-click script](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/start_tasks.html) to launch the pretraining task. During training, users can monitor the progress using [logging](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/logging.html) and [visualization tools](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/monitor.html). ### 4. Saving a model - Checkpoint files can be saved during training or after completion. 
Currently, MindSpore Transformers supports saving models in [Ckpt format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html) or [Safetensors format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html), which can be used for later tasks such as resuming training or fine-tuning. + Checkpoint files can be saved during training or after completion. Currently, MindSpore Transformers supports saving models in [Ckpt format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html) or [Safetensors format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html), which can be used for later tasks such as resuming training or fine-tuning. ### 5. Fault Recovery - To handle unexpected interruptions during training, MindSpore Transformers includes [high availability features](https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html) such as final-state saving and automatic recovery. It also supports [resuming training from checkpoints](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html), improving training stability. + To handle unexpected interruptions during training, MindSpore Transformers includes [high availability features](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/high_availability.html) such as final-state saving and automatic recovery. It also supports [resuming training from checkpoints](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/resume_training.html), improving training stability. ## MindSpore Transformers-based Pretraining Practice @@ -44,7 +44,7 @@ Currently, MindSpore Transformers supports Megatron dataset, which is typically ### Data Preprocessing -For dataset processing, refer to [Megatron Dataset - Data Preprocessing](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html#data-preprocessing). +For dataset processing, refer to [Megatron Dataset - Data Preprocessing](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html#data-preprocessing). - Generate Megatron BIN Format Files @@ -78,11 +78,11 @@ For dataset processing, refer to [Megatron Dataset - Data Preprocessing](https:/ ### Single-Node Training -Specify the configuration file [pretrain_deepseek3_671b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml) and start the [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py) script in msrun mode to perform 8-device distributed training. +Specify the configuration file [pretrain_deepseek3_671b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml) and start the [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py) script in msrun mode to perform 8-device distributed training. -The default configuration includes large values for parameters such as the number of layers and hidden dimensions, which are intended for large-scale multi-node distributed training. It cannot be directly used for pretraining on a single machine. You will need to modify the configuration as described in [DeepSeek-V3 - Configuration Modification](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/README.md#%E4%BF%AE%E6%94%B9%E9%85%8D%E7%BD%AE). 
+The default configuration includes large values for parameters such as the number of layers and hidden dimensions, which are intended for large-scale multi-node distributed training. It cannot be directly used for pretraining on a single machine. You will need to modify the configuration as described in [DeepSeek-V3 - Configuration Modification](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/README.md#%E4%BF%AE%E6%94%B9%E9%85%8D%E7%BD%AE). -For detailed instructions on launching the training task, refer to [Launch Task](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/README.md#%E6%8B%89%E8%B5%B7%E4%BB%BB%E5%8A%A1). The launch command is as follows: +For detailed instructions on launching the training task, refer to [Launch Task](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/README.md#%E6%8B%89%E8%B5%B7%E4%BB%BB%E5%8A%A1). The launch command is as follows: ```shell cd $MINDFORMERS_HOME @@ -116,8 +116,8 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ > The example code below assumes the **master node IP** is `192.168.1.1` and the current node's **Rank** is `0`. In actual execution, please set `master_ip` to the real **IP address** of the master node, and set `node_rank` to the **Rank** index of the current node. -**Note**: During multi-node distributed training, some performance problems may occur. To ensure the efficiency and stability of the training process, you are advised to optimize and adjust the performance by referring to [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html). +**Note**: During multi-node distributed training, some performance problems may occur. To ensure the efficiency and stability of the training process, you are advised to optimize and adjust the performance by referring to [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/advanced_development/performance_optimization.html). ## More Information -For more training examples of different models, see [the models supported by MindFormers](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html). +For more training examples of different models, see [the models supported by MindFormers](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/introduction/models.html). diff --git a/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md b/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md index dac8297929..9bc5ccf9b7 100644 --- a/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md +++ b/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md @@ -14,27 +14,27 @@ Combining practical operations, SFT fine-tuning can be broken down into the foll ### 1. Weight Preparation -Before fine-tuning, the weight files of the pre-trained model need to be prepared. MindSpore Transformers supports loading [safetensors weights](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html), enabling direct loading of model weights downloaded from the Hugging Face model hub. +Before fine-tuning, the weight files of the pre-trained model need to be prepared. MindSpore Transformers supports loading [safetensors weights](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html), enabling direct loading of model weights downloaded from the Hugging Face model hub. ### 2. 
Dataset Preparation -MindSpore Transformers currently supports datasets in [Hugging Face format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html#huggingface-datasets) and [MindRecord format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html#mindrecord-dataset) for the fine-tuning phase. Users can prepare data according to task requirements. +MindSpore Transformers currently supports datasets in [Hugging Face format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html#huggingface-datasets) and [MindRecord format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html#mindrecord-dataset) for the fine-tuning phase. Users can prepare data according to task requirements. ### 3. Configuration File Preparation -Fine-tuning tasks are uniformly controlled through [configuration files](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html), allowing users to flexibly adjust [model training hyperparameters](https://www.mindspore.cn/mindformers/docs/en/dev/feature/training_hyperparameters.html). Additionally, fine-tuning performance can be optimized using [distributed parallel training](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html), [memory optimization features](https://www.mindspore.cn/mindformers/docs/en/dev/feature/memory_optimization.html), and [other training features](https://www.mindspore.cn/mindformers/docs/en/dev/feature/other_training_features.html). +Fine-tuning tasks are uniformly controlled through [configuration files](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/configuration.html), allowing users to flexibly adjust [model training hyperparameters](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/training_hyperparameters.html). Additionally, fine-tuning performance can be optimized using [distributed parallel training](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/parallel_training.html), [memory optimization features](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/memory_optimization.html), and [other training features](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/other_training_features.html). ### 4. Launching the Training Task -MindSpore Transformers provides a [one-click startup script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html) to initiate fine-tuning tasks. During training, [logs](https://www.mindspore.cn/mindformers/docs/en/dev/feature/logging.html) and [visualization tools](https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html) can be used to monitor the training process. +MindSpore Transformers provides a [one-click startup script](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/start_tasks.html) to initiate fine-tuning tasks. During training, [logs](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/logging.html) and [visualization tools](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/monitor.html) can be used to monitor the training process. ### 5. Model Saving -Checkpoints are saved during training, or model weights are saved to a specified path upon completion. Currently, weights can be saved in [Safetensors format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html) or [Ckpt format](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html), which can be used for resumed training or further fine-tuning. 
+Checkpoints are saved during training, or model weights are saved to a specified path upon completion. Currently, weights can be saved in [Safetensors format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html) or [Ckpt format](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html), which can be used for resumed training or further fine-tuning. ### 6. Fault Recovery -To handle exceptions such as training interruptions, MindSpore Transformers offers [high-availability features](https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html) like last-state saving and automatic recovery, as well as [checkpoint-based resumed training](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html), enhancing training stability. +To handle exceptions such as training interruptions, MindSpore Transformers offers [high-availability features](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/high_availability.html) like last-state saving and automatic recovery, as well as [checkpoint-based resumed training](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/resume_training.html), enhancing training stability. ## Full-Parameter Fine-Tuning with MindSpore Transformers @@ -44,7 +44,7 @@ MindSpore Transformers currently supports mainstream large-scale models in the i ### Downloading Model Weights -MindSpore Transformers supports loading Hugging Face model weights, enabling direct loading of weights downloaded from the Hugging Face model hub. For details, refer to [MindSpore Transformers-Safetensors Weights](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html). +MindSpore Transformers supports loading Hugging Face model weights, enabling direct loading of weights downloaded from the Hugging Face model hub. For details, refer to [MindSpore Transformers-Safetensors Weights](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/safetensors.html). | Model Name | Hugging Face Weight Download Link | | :---------- | :---------------------------------------------------: | @@ -52,7 +52,7 @@ MindSpore Transformers supports loading Hugging Face model weights, enabling dir ### Dataset Preparation -MindSpore Transformers supports online loading of Hugging Face datasets. For details, refer to [MindSpore Transformers-Dataset-Hugging Face Dataset](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html#huggingface-datasets). +MindSpore Transformers supports online loading of Hugging Face datasets. For details, refer to [MindSpore Transformers-Dataset-Hugging Face Dataset](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html#huggingface-datasets). This guide uses [llm-wizard/alpaca-gpt4-data](https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data) as the fine-tuning dataset. @@ -144,7 +144,7 @@ After task completion, a checkpoint folder will be generated in the mindformers/ #### Multi-Node Training -Multi-Node, multi-NPU fine-tuning tasks are similar to launching pre-training. Refer to [multi-node, multi-NPU pre-training commands](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html#multi-node-training). +Multi-Node, multi-NPU fine-tuning tasks are similar to launching pre-training. Refer to [multi-node, multi-NPU pre-training commands](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/pre_training.html#multi-node-training). 
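For orientation, the sketch below shows the general shape of such a multi-node launch on a two-node, 16-NPU cluster. It is an illustrative assumption rather than a verbatim command from this guide: the configuration path is a placeholder, and the positional arguments of `msrun_launcher.sh` (total workers, local workers, master address, port, node rank, log directory, join flag, cluster timeout) should be checked against the script in your local repository.

```shell
# Hypothetical two-node launch; run the same command on every node, changing only the node rank.
# Node 0 (master, NPUs 0-7):
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config <path/to/finetune_config>.yaml \
  --use_parallel True \
  --run_mode finetune" \
  16 8 192.168.1.1 8118 0 output/msrun_log False 300

# Node 1 (NPUs 8-15): same command with the node rank argument set to 1.
```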
First, modify the configuration file, adjusting settings based on the number of nodes: diff --git a/docs/mindformers/docs/source_en/index.rst b/docs/mindformers/docs/source_en/index.rst index 7dc8a5d003..1b0ad3dc85 100644 --- a/docs/mindformers/docs/source_en/index.rst +++ b/docs/mindformers/docs/source_en/index.rst @@ -11,7 +11,7 @@ Based on MindSpore's built-in parallel technology and component-based design, th - Support for configurable development of task components. Any module can be enabled by unified configuration, including model network, optimizer, learning rate policy, etc.; - Provide real-time visualization of training accuracy/performance monitoring indicators. -Users can refer to `Overall Architecture `_ and `Model Library `_ to get a quick overview of the MindSpore Transformers system architecture, and the list of supported foundation models. +Users can refer to `Overall Architecture `_ and `Model Library `_ to get a quick overview of the MindSpore Transformers system architecture, and the list of supported foundation models. If you have any suggestions for MindSpore Transformers, please contact us via `issue `_ and we will handle them promptly. @@ -20,10 +20,10 @@ Full-process Developing with MindSpore Transformers MindSpore Transformers supports one-click start of single/multi-card training, fine-tuning, and inference processes for any task, which makes the execution of deep learning tasks more efficient and user-friendly by simplifying the operation, providing flexibility, and automating the process. Users can learn from the following explanatory documents: -- `Pretraining `_ -- `Supervised Fine-Tuning `_ -- `Inference `_ -- `Service Deployment `_ +- `Pretraining `_ +- `Supervised Fine-Tuning `_ +- `Inference `_ +- `Service Deployment `_ Code repository address: @@ -34,75 +34,75 @@ MindSpore Transformers provides a wealth of features throughout the full-process - General Features: - - `Start Tasks `_ + - `Start Tasks `_ One-click start for single-device, single-node and multi-node tasks. - - `Ckpt Weights `_ + - `Ckpt Weights `_ Supports conversion, slice and merge weight files in ckpt format. - - `Safetensors Weights `_ + - `Safetensors Weights `_ Supports saving and loading weight files in safetensors format. - - `Configuration File `_ + - `Configuration File `_ Supports the use of `YAML` files to centrally manage and adjust configurable items in tasks. - - `Loading Hugging Face Model Configurations `_ + - `Loading Hugging Face Model Configurations `_ Supports plug-and-play loading of Hugging Face community model configurations for seamless integration. - - `Logging `_ + - `Logging `_ Introduction of logs, including log structure, log saving, and so on. - - `Using Tokenizer `_ + - `Using Tokenizer `_ Introduction of tokenizer, supports the Hugging Face Tokenizer for use in reasoning and datasets. - Training Features: - - `Dataset `_ + - `Dataset `_ Supports multiple types and formats of datasets. - - `Model Training Hyperparameters `_ + - `Model Training Hyperparameters `_ Flexibly configure hyperparameter settings for large model training. - - `Training Metrics Monitoring `_ + - `Training Metrics Monitoring `_ Provides visualization services for the training phase of large models for monitoring and analyzing various indicators and information during the training process. 
- - `Resumable Training After Breakpoint `_ + - `Resumable Training After Breakpoint `_ Supports step-level resumable training after breakpoint, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training. - - `Training High Availability (Beta) `_ + - `Training High Availability (Beta) `_ Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature). - - `Parallel Training `_ + - `Parallel Training `_ One-click configuration of multi-dimensional hybrid distributed parallel allows models to run efficiently in clusters up to 10,000 cards. - - `Training Memory Optimization `_ + - `Training Memory Optimization `_ Supports fine-grained recomputation and activations swap, to reduce peak memory overhead during model training. - - `Other Training Features `_ + - `Other Training Features `_ Supports gradient accumulation and gradient clipping, etc. - Inference Features: - - `Evaluation `_ + - `Evaluation `_ Supports the use of third-party open-source evaluation frameworks and datasets for large-scale model ranking evaluations. - - `Quantization `_ + - `Quantization `_ Integrated MindSpore Golden Stick toolkit to provides a unified quantization inference process. @@ -111,29 +111,29 @@ Advanced developing with MindSpore Transformers - Diagnostics and Optimization - - `Precision Optimization `_ - - `Performance Optimization `_ + - `Precision Optimization `_ + - `Performance Optimization `_ - Model Development - - `Development Migration `_ + - `Development Migration `_ Environment Variables ------------------------------------ -- `Environment Variables Description `_ +- `Environment Variables Description `_ Contribution Guide ------------------------------------ -- `MindSpore Transformers Contribution Guide `_ -- `Modelers Contribution Guide `_ +- `MindSpore Transformers Contribution Guide `_ +- `Modelers Contribution Guide `_ FAQ ------------------------------------ -- `Model-Related `_ -- `Function-Related `_ +- `Model-Related `_ +- `Function-Related `_ .. toctree:: :glob: diff --git a/docs/mindformers/docs/source_en/installation.md b/docs/mindformers/docs/source_en/installation.md index 680b418212..c59beee591 100644 --- a/docs/mindformers/docs/source_en/installation.md +++ b/docs/mindformers/docs/source_en/installation.md @@ -23,7 +23,7 @@ Historical version matching relationship: ## Installing Dependent Software -1. Install Firmware and Driver: Download the firmware and driver package through the [Confirming Version Matching Relationship](https://www.mindspore.cn/mindformers/docs/en/dev/installation.html#confirming-version-matching-relationship) to download the installation package, and refer to the [Ascend official tutorial](https://www.hiascend.com/en/document) for installation. +1. Install Firmware and Driver: Download the firmware and driver package through the [Confirming Version Matching Relationship](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/installation.html#confirming-version-matching-relationship) to download the installation package, and refer to the [Ascend official tutorial](https://www.hiascend.com/en/document) for installation. 2. Install CANN and MindSpore: Follow the [Manual Installation](https://www.mindspore.cn/install/en) section on the MindSpore website for installation. 
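Once the driver, CANN, and MindSpore packages are in place, a quick smoke test can confirm that the Python stack is importable before moving on. This is a minimal sketch rather than an official installation step: `mindspore.run_check()` is MindSpore's built-in installation check, and the second line simply assumes the installed `mindformers` package exposes `__version__`.

```shell
# Minimal post-installation smoke test.
python -c "import mindspore; mindspore.run_check()"               # checks that MindSpore and the CANN stack work together
python -c "import mindformers; print(mindformers.__version__)"    # confirms MindSpore Transformers is importable
```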
diff --git a/docs/mindformers/docs/source_en/introduction/models.md b/docs/mindformers/docs/source_en/introduction/models.md index 96d34d423a..636efdc98f 100644 --- a/docs/mindformers/docs/source_en/introduction/models.md +++ b/docs/mindformers/docs/source_en/introduction/models.md @@ -6,11 +6,11 @@ The following table lists models supported by MindFormers. | Model | Specifications | Model Type | Latest Version | |:--------------------------------------------------------------------------------------------------------|:------------------------------|:----------------:|:----------------------:| -| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | Sparse LLM | In-development version, 1.5.0 | -| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version, 1.5.0 | -| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version, 1.5.0 | -| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 | -| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/dev/research/telechat2) | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 | +| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/deepseek3) | 671B | Sparse LLM | In-development version, 1.5.0 | +| [GLM4](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version, 1.5.0 | +| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/llama3_1) | 8B/70B | Dense LLM | In-development version, 1.5.0 | +| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 | +| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/telechat2) | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 | | [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | Dense LLM | 1.5.0 | | [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 | | [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 | diff --git a/docs/mindformers/docs/source_en/introduction/overview.md b/docs/mindformers/docs/source_en/introduction/overview.md index 02fa113dba..4a654b0c9b 100644 --- a/docs/mindformers/docs/source_en/introduction/overview.md +++ b/docs/mindformers/docs/source_en/introduction/overview.md @@ -7,7 +7,7 @@ The overall architecture formed by MindSpore Transformers and the end-to-end AI 1. At the hardware level, MindSpore Transformers supports users running large models on Ascend servers; 2. At the software level, MindSpore Transformers implements the big model-related code through the Python interface provided by MindSpore and performs data computation by the operator libraries provided by the supporting software package of the Ascend AI processor; 3. The basic functionality features currently supported by MindSpore Transformers are listed below: - 1. 
Supports tasks such as running training and inference for large models [distributed parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html), with parallel capabilities including data parallelism, model parallelism, ultra-long sequence parallelism; - 2. Supports [model weight conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html), [distributed weight splitting and combination](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html), and different format of [dataset loading](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html) and [resumable training after breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html); - 3. Support 25+ large models [pretraining](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html), [fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html), [inference](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html) and [evaluation] (https://www.mindspore.cn/mindformers/docs/en/dev/feature/evaluation.html). Meanwhile, it also supports [quantization](https://www.mindspore.cn/mindformers/docs/en/dev/feature/quantization.html), and the list of supported models can be found in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html); -4. MindSpore Transformers supports users to carry out model service deployment function through [MindIE](https://www.mindspore.cn/mindformers/docs/en/dev/guide/deployment.html), and also supports the use of [MindX]( https://www.hiascend.com/software/mindx-dl) to realize large-scale cluster scheduling; more third-party platforms will be supported in the future, please look forward to it. + 1. Supports [distributed parallelism](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/parallel_training.html) for large model training and inference tasks, with parallel capabilities including data parallelism, model parallelism, and ultra-long sequence parallelism; + 2. Supports [model weight conversion](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html), [distributed weight splitting and combination](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/ckpt.html), different formats of [dataset loading](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/dataset.html), and [resumable training after breakpoint](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/resume_training.html); + 3. Supports [pretraining](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/pre_training.html), [fine-tuning](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/supervised_fine_tuning.html), [inference](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/inference.html) and [evaluation](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/evaluation.html) for 25+ large models. Meanwhile, it also supports [quantization](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/feature/quantization.html), and the list of supported models can be found in [Model Library](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/introduction/models.html); +4.
MindSpore Transformers supports users to carry out model service deployment function through [MindIE](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/guide/deployment.html), and also supports the use of [MindX]( https://www.hiascend.com/software/mindx-dl) to realize large-scale cluster scheduling; more third-party platforms will be supported in the future, please look forward to it. diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md b/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md index b25b5f0ba4..e21ac82889 100644 --- a/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md +++ b/docs/mindformers/docs/source_zh_cn/advanced_development/accuracy_comparison.md @@ -37,7 +37,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架,具备高度 - **Megatron-LM**:参考 [Megatron-LM 文档](https://github.com/NVIDIA/Megatron-LM/tree/core_r0.12.0?tab=readme-ov-file#setup) -- **MindSpore Transformers**:参考 [MindSpore Transformers 文档](https://gitee.com/mindspore/mindformers/blob/dev/README_CN.md) +- **MindSpore Transformers**:参考 [MindSpore Transformers 文档](https://gitee.com/mindspore/mindformers/blob/r1.6.0/README_CN.md) ## 3. 精度对比流程 @@ -252,7 +252,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架,具备高度 - 重计算配置 - MindSpore Transformers 重计算配置逻辑与 Megatron-LM 差异较大,参考[重计算配置](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/memory_optimization.html#%E9%87%8D%E8%AE%A1%E7%AE%97)使能即可。 + MindSpore Transformers 重计算配置逻辑与 Megatron-LM 差异较大,参考[重计算配置](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/memory_optimization.html#%E9%87%8D%E8%AE%A1%E7%AE%97)使能即可。 | Megatron-LM | 含义 | MindSpore Transformers | 含义 | |--------------------------------|-----------------------|------------------------|--------------------------| @@ -264,7 +264,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架,具备高度 | `checkpoint-activations` | 是否启用激活值检查点机制以减少显存 | 不支持配置 | | | `moe-layer-recompute` | MoE 层启用重计算 | 不支持配置 | | -**注意**:两个框架还有其他训练相关性较小的配置,MindSpore Transformer 详情参考[配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html),Megatron-LM 可通过执行命令`torchrun --nproc_per_node=1 pretrain_gpt.py --help`查看。 +**注意**:两个框架还有其他训练相关性较小的配置,MindSpore Transformer 详情参考[配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html),Megatron-LM 可通过执行命令`torchrun --nproc_per_node=1 pretrain_gpt.py --help`查看。 ### 3.2 数据集对齐 @@ -282,7 +282,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架,具备高度 - 生成Megatron BIN格式文件 - 将数据集文件`wiki.train.tokens`和分词模型文件`tokenizer.json`放置在`../dataset`下,并参照[Megatron数据集-数据预处理](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#%E6%95%B0%E6%8D%AE%E9%A2%84%E5%A4%84%E7%90%86)制作`data.json`文件。 + 将数据集文件`wiki.train.tokens`和分词模型文件`tokenizer.json`放置在`../dataset`下,并参照[Megatron数据集-数据预处理](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#%E6%95%B0%E6%8D%AE%E9%A2%84%E5%A4%84%E7%90%86)制作`data.json`文件。 使用以下命令将数据集文件转换为BIN格式文件。 @@ -358,7 +358,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架,具备高度 1. 
生成 MinSpore Transformers 初始权重 - 参照[callbacks 配置](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html#callbacks%E9%85%8D%E7%BD%AE)通过修改 `example.yaml` 文件并执行[查看结果](#34-查看结果)中提供的命令,即可通过预训练在`example.yaml`中的`output_dir`的`checkpoints`下获得一份初始权重,修改内容如下: + 参照[callbacks 配置](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html#callbacks%E9%85%8D%E7%BD%AE)通过修改 `example.yaml` 文件并执行[查看结果](#34-查看结果)中提供的命令,即可通过预训练在`example.yaml`中的`output_dir`的`checkpoints`下获得一份初始权重,修改内容如下: ```yaml # Before (example.yaml) @@ -385,7 +385,7 @@ Megatron-LM 是一个面向大规模训练任务的成熟框架,具备高度 2. MindSpore Transformers to Megatron-LM - 为了将 MindSpore Transformers 的权重精确映射为 Megatron-LM 可加载的等价权重,我们提供了转换权重脚本,执行权重转换脚本即可获得等价权重。详情可查看[转换模型权重为Megatron模型权重的实践案例](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/example/convert_ckpt_to_megatron/convert_ckpt_to_megatron.html) + 为了将 MindSpore Transformers 的权重精确映射为 Megatron-LM 可加载的等价权重,我们提供了转换权重脚本,执行权重转换脚本即可获得等价权重。详情可查看[转换模型权重为Megatron模型权重的实践案例](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/example/convert_ckpt_to_megatron/convert_ckpt_to_megatron.html) 注意: diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md index c477e428df..56ffc36e86 100644 --- a/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md +++ b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md @@ -12,9 +12,9 @@ MindSpore Transformers中大模型的基本组成包含配置、模型、分词 模型配置是一个实例,包含模型的所有信息。MindSpore Transformers中所有模型的`__init__`方法都接收一个模型配置的实例作为入参,模型的所有子模块都通过这个配置实例中所包含的信息来初始化。 -MindSpore Transformers提供了[PretrainedConfig](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.PretrainedConfig.html)类,负责提供一些配置的通用方法。所有模型的配置类都应该继承于PretrainedConfig类,开发者只需关心定义所有帮助构建大模型的配置参数:Transformer类大模型通常都拥有`seq_length`、`hidden_size`、`num_layers`、`num_heads`等配置参数,文本类的大模型通常还有`vocab_size`等。 +MindSpore Transformers提供了[PretrainedConfig](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.PretrainedConfig.html)类,负责提供一些配置的通用方法。所有模型的配置类都应该继承于PretrainedConfig类,开发者只需关心定义所有帮助构建大模型的配置参数:Transformer类大模型通常都拥有`seq_length`、`hidden_size`、`num_layers`、`num_heads`等配置参数,文本类的大模型通常还有`vocab_size`等。 -可以参考MindSpore Transformers中Llama模型的配置类[LlamaConfig](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.LlamaConfig.html)。 +可以参考MindSpore Transformers中Llama模型的配置类[LlamaConfig](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.LlamaConfig.html)。 > 如果您的模型与库内的模型非常相似,可以复用与该模型相同的配置。 @@ -22,12 +22,12 @@ MindSpore Transformers提供了[PretrainedConfig](https://www.mindspore.cn/mindf MindSpore Transformers的大模型基于MindSpore框架进行开发,其中开发者只需要关心模型网络本身的实现。 -MindSpore Transformers提供了[PretrainedModel](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.PreTrainedModel.html)类,负责存储模型配置并处理加载、保存模型的方法。所有模型的类都应该继承于PretrainedModel类,并且模型的输入应该是统一的,即模型的`construct`方法的入参应该一致,具体入参和含义可以参考MindSpore Transformers中的Llama模型类[LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.LlamaForCausalLM.html)。同时,模型类必须实现基类的一些抽象方法,包括: +MindSpore Transformers提供了[PretrainedModel](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.PreTrainedModel.html)类,负责存储模型配置并处理加载、保存模型的方法。所有模型的类都应该继承于PretrainedModel类,并且模型的输入应该是统一的,即模型的`construct`方法的入参应该一致,具体入参和含义可以参考MindSpore 
Transformers中的Llama模型类[LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.LlamaForCausalLM.html)。同时,模型类必须实现基类的一些抽象方法,包括: - `prepare_inputs_for_generation`:为模型推理构建输入的方法。 - `prepare_inputs_for_predict_layout`:为分布式加载模型权重构建虚拟输入的方法。 -关于它们的具体含义,可以参考[LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.LlamaForCausalLM.html)中的描述。 +关于它们的具体含义,可以参考[LlamaForCausalLM](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.LlamaForCausalLM.html)中的描述。 > 如果您的模型结构与库内的模型非常相似,可以复用该模型的实现。 @@ -35,20 +35,20 @@ MindSpore Transformers提供了[PretrainedModel](https://www.mindspore.cn/mindfo 分词器(Tokenizer)的作用是处理大语言模型的输入与输出。它在大语言模型的工作流程中是必需的。 -MindSpore Transformers提供了[PretrainedTokenizer](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.PreTrainedTokenizer.html)类和[PretrainedTokenizerFast](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.PreTrainedTokenizerFast.html)类,分别是纯Python的实现和使用Rust库的实现。后者实现的区别是: +MindSpore Transformers提供了[PretrainedTokenizer](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.PreTrainedTokenizer.html)类和[PretrainedTokenizerFast](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.PreTrainedTokenizerFast.html)类,分别是纯Python的实现和使用Rust库的实现。后者实现的区别是: - 在进行批量处理时速度显著提高; - 额外包含一些在文本字符串和词元空间映射的方法(例如,获取包含给定字符的词元的索引或与给定词元相对应的字符跨度) -所有分词器的类应该继承于PretrainedTokenizer类或PretrainedTokenizerFast类,具体实现可以参考[LlamaTokenizer](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.LlamaTokenizer.html)和[LlamaTokenizerFast](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.LlamaTokenizerFast.html)。 +所有分词器的类应该继承于PretrainedTokenizer类或PretrainedTokenizerFast类,具体实现可以参考[LlamaTokenizer](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.LlamaTokenizer.html)和[LlamaTokenizerFast](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.LlamaTokenizerFast.html)。 > 如果您的分词器与库内的分词器非常相似,可以复用该分词器的实现。 ### 准备权重和数据集 -如已有基于PyTorch的模型权重,可以参考[权重转换文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html#%E6%9D%83%E9%87%8D%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2)将权重转换为MindSpore格式的权重。 +如已有基于PyTorch的模型权重,可以参考[权重转换文档](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html#%E6%9D%83%E9%87%8D%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2)将权重转换为MindSpore格式的权重。 -数据集的准备可以参考[数据集文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html),或参考模型文档,如[Llama2说明文档——数据集准备](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)。 +数据集的准备可以参考[数据集文档](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html),或参考模型文档,如[Llama2说明文档——数据集准备](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)。 ### 准备`YAML`配置文件 @@ -56,7 +56,7 @@ MindSpore Transformers使用`YAML`配置文件配置一个任务所需的所有 由于自定义模型的代码不在MindSpore Transformers库内,代码中的自定义模块没有注册在MindSpore Transformers中,因而不能被自动实例化。这些代码也称为外挂代码(如`research`目录下代码)。因此需要在编写的`YAML`配置文件中的对应模块配置下添加自动注册任意模块的配置项`auto_register`,设置为要注册的API接口的相对导入路径。后续在执行run_mindformer.py脚本拉起任务时添加注册路径的入参`--register_path`,设置为外挂代码所在目录的相对路径。 
-例如,`research`目录下的Llama3.1-8B模型的推理`YAML`配置文件[`research/llama3_1/predict_llama3_1_8b.yaml`](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml)中,添加了自动注册的配置项`auto_register`,以注册[`research/llama3_1/llama3_1_tokenizer.py`](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_tokenizer.py)中自定义的`Llama3Tokenizer`: +例如,`research`目录下的Llama3.1-8B模型的推理`YAML`配置文件[`research/llama3_1/predict_llama3_1_8b.yaml`](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml)中,添加了自动注册的配置项`auto_register`,以注册[`research/llama3_1/llama3_1_tokenizer.py`](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_tokenizer.py)中自定义的`Llama3Tokenizer`: ```yaml ... @@ -91,15 +91,15 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l | register_path | 外挂代码所在目录的路径 | | predict_data | 推理的输入数据 | -其中设置了`register_path`为外挂代码所在目录的路径`research/llama3_1`,模型权重的准备参考[Llama3.1说明文档——模型权重下载](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD)。 +其中设置了`register_path`为外挂代码所在目录的路径`research/llama3_1`,模型权重的准备参考[Llama3.1说明文档——模型权重下载](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD)。 -配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama3_1-8B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)。 +配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama3_1-8B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)。 -在准备完上述所有基本要素之后,可以参考MindSpore Transformers使用教程中的其余文档进行模型训练、微调、推理等流程的实践。后续模型调试调优可以参考[大模型精度调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/precision_optimization.html)和[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。 +在准备完上述所有基本要素之后,可以参考MindSpore Transformers使用教程中的其余文档进行模型训练、微调、推理等流程的实践。后续模型调试调优可以参考[大模型精度调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/advanced_development/precision_optimization.html)和[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/advanced_development/performance_optimization.html)。 ### 将模型贡献给MindSpore Transformers开源仓库 -可以参考[MindSpore Transformers贡献指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/contribution/mindformers_contribution.html),将模型贡献到MindSpore Transformers的开源仓库,供广大开发者研究和使用。 +可以参考[MindSpore Transformers贡献指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/contribution/mindformers_contribution.html),将模型贡献到MindSpore Transformers的开源仓库,供广大开发者研究和使用。 ## MindSpore Transformers大模型迁移实践 diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md b/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md index efa61e860f..ef57007c61 100644 --- a/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md +++ b/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md @@ -64,7 +64,7 @@ $$ 在实际应用中,通常会采用多种并行策略和优化手段,例如使用优化器并行和重计算等方式,以减少模型对内存的使用并提高训练效率。并行策略设计与模型的效率密切相关,因此在模型调优之前先确定一组或多组较优的并行策略,是至关重要的。 
-详细介绍参考文档[并行策略指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html)。 +详细介绍参考文档[并行策略指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/parallel_training.html)。 对于不同的参数量规格的模型,可参考以下并行策略选择方向: @@ -277,7 +277,7 @@ MindStudio Insight工具以时间线(Timeline)的形式呈现全流程在线 #### IR 图 -在[MindSpore Transformers配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)中,只需要开启save_graphs,运行时会输出一些图编译过程中生成的.ir后缀的中间文件,这些被称为IR文件。默认情况下,这些文件会保存在当前执行目录下的graph目录中。IR文件是一种比较直观易懂的文本格式文件,用于描述模型结构的文件,可以直接用文本编辑软件查看。配置项含义参考[Config配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html),配置方法如下: +在[MindSpore Transformers配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html)中,只需要开启save_graphs,运行时会输出一些图编译过程中生成的.ir后缀的中间文件,这些被称为IR文件。默认情况下,这些文件会保存在当前执行目录下的graph目录中。IR文件是一种比较直观易懂的文本格式文件,用于描述模型结构的文件,可以直接用文本编辑软件查看。配置项含义参考[Config配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html),配置方法如下: ```yaml context: @@ -598,7 +598,7 @@ recompute_config: 2. embedding参数配置优化器并行:大词表占用内存过多,且词表权重的优化器并行需额外配置,配置后有效缓解首个stage显存不足问题; - 优化器并行使用介绍可参考[MindSpore优化器并行文档](https://www.mindspore.cn/docs/zh-CN/r2.7.0rc1/features/parallel/optimizer_parallel.html);此外,Llama模型还对embedding层的优化器有额外配置,[LlamaConfig API文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/models/mindformers.models.LlamaConfig.html#mindformers.models.LlamaConfig)中的`parallel_optimizer`项即为控制embedding优化器并行的控制项; + 优化器并行使用介绍可参考[MindSpore优化器并行文档](https://www.mindspore.cn/docs/zh-CN/r2.7.0rc1/features/parallel/optimizer_parallel.html);此外,Llama模型还对embedding层的优化器有额外配置,[LlamaConfig API文档](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/models/mindformers.models.LlamaConfig.html#mindformers.models.LlamaConfig)中的`parallel_optimizer`项即为控制embedding优化器并行的控制项; 配置样例如下: ```yaml diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md b/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md index 0810ec9697..aad5a8037f 100644 --- a/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md +++ b/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md @@ -187,7 +187,7 @@ export MINDSPORE_DUMP_CONFIG=${JSON_PATH} #### 权重转换 -训练过程中,MindSpore与PyTorch加载同一份权重。若是预训练场景,可以使用PyTorch保存一个初始化权重后,转换为MindSpore权重。因为MindSpore的权重名称与PyTorch有差异,权重转换的本质是将PyTorch权重dict中的名字改为MindSpore权重名字,以支持MindSpore加载。权重转换参考[权重转换指导](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html#%E6%9D%83%E9%87%8D%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2)。 +训练过程中,MindSpore与PyTorch加载同一份权重。若是预训练场景,可以使用PyTorch保存一个初始化权重后,转换为MindSpore权重。因为MindSpore的权重名称与PyTorch有差异,权重转换的本质是将PyTorch权重dict中的名字改为MindSpore权重名字,以支持MindSpore加载。权重转换参考[权重转换指导](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html#%E6%9D%83%E9%87%8D%E6%A0%BC%E5%BC%8F%E8%BD%AC%E6%8D%A2)。 MindSpore与PyTorch均支持`bin`格式数据,加载相同的数据集进行训练,保证每个step一致。 @@ -226,7 +226,7 @@ MindSpore与PyTorch均支持`bin`格式数据,加载相同的数据集进行 # 原始代码 ``` -* MindSpore代码,在[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py)中,新增seed_all方法,并在main方法中调用,添加方法如下: +* MindSpore代码,在[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py)中,新增seed_all方法,并在main方法中调用,添加方法如下: ```python import numpy as np @@ -337,7 +337,7 @@ def get_parameters(self): return params ``` -MindSpore 
Transformers加载梯度参考[mindformers/wrapper/wrapper.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/wrapper/wrapper.py)实现。注意,需要用户自行找到MindSpore Transformers与PyTorch梯度的对应关系,参考如下修改代码: +MindSpore Transformers加载梯度参考[mindformers/wrapper/wrapper.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/wrapper/wrapper.py)实现。注意,需要用户自行找到MindSpore Transformers与PyTorch梯度的对应关系,参考如下修改代码: ```python class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell): diff --git a/docs/mindformers/docs/source_zh_cn/env_variables.md b/docs/mindformers/docs/source_zh_cn/env_variables.md index 4607479244..218196855f 100644 --- a/docs/mindformers/docs/source_zh_cn/env_variables.md +++ b/docs/mindformers/docs/source_zh_cn/env_variables.md @@ -14,7 +14,7 @@ | **ASCEND_LAUNCH_BLOCKING** | 0 | 训练或在线推理场景,可通过此环境变量控制算子执行时是否启动同步模式。 | `1`:强制算子采用同步模式运行;
`0`:不强制算子采用同步模式运行。 | 由于 NPU 模型训练时默认算子异步执行,导致算子执行过程中出现报错时,打印的报错堆栈信息并不是实际的调用栈信息。当设置为`1`时,强制算子采用同步模式运行,这样能够打印正确的调用栈信息,从而更容易地调试和定位代码中的问题。设置为`0`时有更高的运算效率。 | | **TE_PARALLEL_COMPILER** | 8 | 算子最大并行编译进程数,当大于 1 时开启并行编译。 | 取值为正整数;最大不超过 cpu 核数\*80%/昇腾 AI 处理器个数,取值范围 1~32,默认值是 8。 | 网络模型较大时,可通过配置此环境变量开启算子的并行编译功能;
设置为`1`时为单线程编译,在调试时,可以简化难度。 | | **CPU_AFFINITY** | 0 | 启动 CPU 亲和性开关,启动该选项可以确保每个进程或线程绑定到一个 CPU 核心上,以提高性能。 | `1`:开启 CPU 亲和性开关;
`0`:关闭 CPU 亲和性开关。 | 出于**优化资源利用** 以及**节能** 的考虑,CPU 亲和性默认关闭。 | -| **MS_MEMORY_STATISTIC** | 0 | 内存统计。 | `1`:开启内存统计功能;
`0`:关闭内存统计功能。 | 在内存分析时,可以统计内存的基本使用情况。具体可以参考[调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。 | +| **MS_MEMORY_STATISTIC** | 0 | 内存统计。 | `1`:开启内存统计功能;
`0`:关闭内存统计功能。 | 在内存分析时,可以统计内存的基本使用情况。具体可以参考[调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/advanced_development/performance_optimization.html)。 | | **MINDSPORE_DUMP_CONFIG** | | 指定 [云侧 Dump 功能](https://www.mindspore.cn/tutorials/zh-CN/r2.7.0rc1/debug/dump.html) 或 [端侧 Dump 功能](https://www.mindspore.cn/lite/docs/zh-CN/r2.7.0rc1/tools/benchmark_tool.html#dump功能) 所依赖的配置文件的路径 | 文件路径,支持相对路径与绝对路径。 | | | **GLOG_v** | 3 | 控制 MindSpore 日志的级别。 | `0`:DEBUG;
`1`:INFO;
`2`:WARNING;
`3`:ERROR:表示程序执行出现报错,输出错误日志,程序可能不会终止;
`4`:CRITICAL,表示程序执行出现异常,将会终止执行程序。 | | | **ASCEND_GLOBAL_LOG_LEVEL** | 3 | 控制 CANN 的日志级别。 | `0`:DEBUG;
`1`:INFO;
`2`:WARNING;
`3`:ERROR;
`4`:NULL,不输出日志。 | | @@ -40,4 +40,4 @@ | **MS_ENABLE_FA_FLATTEN** | on | 控制 是否支持 FlashAttention flatten 优化。 | `on`:启用 FlashAttention flatten 优化;
`off`: 禁用 FlashAttention flatten 优化。 | 对于还未适配FlashAttention flatten 优化的模型提供回退机制。 | | **EXPERIMENTAL_KERNEL_LAUNCH_GROUP** | NA | 控制是否支持算子批量并行下发,支持开启并行下发,并配置并行数 | `thread_num`: 并发线程数,一般不建议增加,默认值为`2`;
`kernel_group_num`: 算子分组总数量,每线程`kernel_group_num/thread_num`个组,默认值为`8`。 | 该特性后续还会继续演进,后续行为可能会有变更,当前仅支持`deepseek`推理场景,有一定的性能优化,但是其他模型使用该特性可能会有劣化,用户需要谨慎使用,使用方法如下:`export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`。 | | **FORCE_EAGER** | False | 控制是否**不开启**jit模式。 | `False`: 开启jit模式;
`True`: 不开启jit模式。 | Jit将函数编译成一张可调用的MindSpore图,设置FORCE_EAGER为False开启jit模式,可以获取性能收益,当前仅支持推理模式。 | -| **MS_ENABLE_TFT** | NA | 使能 [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) 特性,表示启用 TTP、UCE、HCCE、ARF、TRE 或 TSP 功能。 | 取值为"{TTP:1,UCE:1,HCCE:1,ARF:1,TRE:1,TSP:1}",使用某一功能时,可将对应字段配置为"1"。 | 使用方式可以参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/high_availability.html)。 | +| **MS_ENABLE_TFT** | NA | 使能 [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) 特性,表示启用 TTP、UCE、HCCE、ARF、TRE 或 TSP 功能。 | 取值为"{TTP:1,UCE:1,HCCE:1,ARF:1,TRE:1,TSP:1}",使用某一功能时,可将对应字段配置为"1"。 | 使用方式可以参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/high_availability.html)。 | diff --git a/docs/mindformers/docs/source_zh_cn/example/convert_ckpt_to_megatron/convert_ckpt_to_megatron.md b/docs/mindformers/docs/source_zh_cn/example/convert_ckpt_to_megatron/convert_ckpt_to_megatron.md index 7b81230d98..4fbb4bf550 100644 --- a/docs/mindformers/docs/source_zh_cn/example/convert_ckpt_to_megatron/convert_ckpt_to_megatron.md +++ b/docs/mindformers/docs/source_zh_cn/example/convert_ckpt_to_megatron/convert_ckpt_to_megatron.md @@ -21,7 +21,7 @@ 使用 MindSpore Transformers 保存的safetensors权重进行转换。 > - 当前仅支持由SelfAttention和MLP组成的类GPT模型权重转换(如GPT、Qwen等),暂不支持MLA和MoE。 -> - 仅支持未分布式切分的完整权重。如为分布式权重,请先参考[权重合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html#%E6%9D%83%E9%87%8D%E5%90%88%E5%B9%B6)进行合并。 +> - 仅支持未分布式切分的完整权重。如为分布式权重,请先参考[权重合并](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html#%E6%9D%83%E9%87%8D%E5%90%88%E5%B9%B6)进行合并。 ## 权重转换步骤 diff --git a/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md b/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md index 6570259c52..ee15486c26 100644 --- a/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md +++ b/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md @@ -14,7 +14,7 @@ ### 1.1 环境 -安装方式请参考[MindSpore Transformers安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html)。 +安装方式请参考[MindSpore Transformers安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/installation.html)。 并将本案例的[distilled](https://gitee.com/mindspore/docs/tree/r2.7.0rc1/docs/mindformers/docs/source_zh_cn/example/distilled/distilled)文件夹,复制到MindSpore Transformers源码根目录下。 @@ -227,7 +227,7 @@ python toolkit/data_preprocess/huggingface/datasets_preprocess.py \ 最后在`packed_data`中可以找到处理后的数据集,格式为arrow。 -更多数据集处理的教程请参考[MindSpore Transformers官方文档-数据集](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AEhandler)。 +更多数据集处理的教程请参考[MindSpore Transformers官方文档-数据集](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AEhandler)。 ##### 选项 2: 使用完成转换的数据 @@ -277,7 +277,7 @@ train_dataset: &train_dataset ...... ``` -其余参数配置的解释可以参考[MindSpore Transformers官方文档-SFT微调](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/supervised_fine_tuning.html)。 +其余参数配置的解释可以参考[MindSpore Transformers官方文档-SFT微调](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/supervised_fine_tuning.html)。 ## 2. 
启动微调 @@ -297,7 +297,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py --config distilled/finetune_qw 日志记录在`output/msrun_log`目录下,例如可以通过`tail -f output/msrun_log/worker_7.log`指令查看worker 7的日志信息。 微调完成后,输出的`safetensors`权重文件在`output/checkpoint`目录下。 -更多safetensors权重的内容请参考[MindSpore Transformers官方文档-Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html)。 +更多safetensors权重的内容请参考[MindSpore Transformers官方文档-Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html)。 ## 3. 执行推理 diff --git a/docs/mindformers/docs/source_zh_cn/faq/feature_related.md b/docs/mindformers/docs/source_zh_cn/faq/feature_related.md index 41ba7998aa..e3e3d9d740 100644 --- a/docs/mindformers/docs/source_zh_cn/faq/feature_related.md +++ b/docs/mindformers/docs/source_zh_cn/faq/feature_related.md @@ -10,7 +10,7 @@ A: 官方下载链接失效,请关注社区Issue [#IBV35D](https://gitee.com/m ## Q: 如何生成模型切分策略文件? -A: 模型切分策略文件记录了模型权重在分布式场景下的切分策略,一般在离线权重切分时使用。在网络`yaml`文件中配置`only_save_strategy: True`,然后正常启动分布式任务,便可在`output/strategy/`目录下生成分布式策略文件,详细介绍请参阅[分布式权重切分与合并教程](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 +A: 模型切分策略文件记录了模型权重在分布式场景下的切分策略,一般在离线权重切分时使用。在网络`yaml`文件中配置`only_save_strategy: True`,然后正常启动分布式任务,便可在`output/strategy/`目录下生成分布式策略文件,详细介绍请参阅[分布式权重切分与合并教程](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。
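A minimal sketch of the workflow described in this answer is shown below: append the `only_save_strategy` switch to a copy of the task's `yaml` file, launch the distributed job as usual, and collect the generated files from `output/strategy/`. The config path, NPU count, and exact launch arguments here are illustrative assumptions, not values taken from this FAQ.

```shell
# Sketch: generate distributed sharding strategy files only, without keeping training results.
cp configs/llama2/pretrain_llama2_7b.yaml strategy_only.yaml   # hypothetical config path; use your own task YAML
echo "only_save_strategy: True" >> strategy_only.yaml          # top-level switch described in the answer above

# Launch the distributed task as usual (single node, 8 NPUs in this sketch).
bash scripts/msrun_launcher.sh "run_mindformer.py --config strategy_only.yaml --use_parallel True --run_mode train" 8

ls output/strategy/   # the generated strategy files land here
```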
diff --git a/docs/mindformers/docs/source_zh_cn/feature/ckpt.md b/docs/mindformers/docs/source_zh_cn/feature/ckpt.md index 4ded58449d..7127462958 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/ckpt.md +++ b/docs/mindformers/docs/source_zh_cn/feature/ckpt.md @@ -6,7 +6,7 @@ ckpt是深度学习框架中用于保存模型训练状态的通用文件格式,包含模型参数、优化器状态和训练进度等信息,主要用于恢复训练或微调模型,本文主要介绍MindSpore Transformers如何支持该文件格式的转换和切分。 -> 已计划日落ckpt格式,使用权重更推荐使用safetensors格式。Safetensors 是 Huggingface 推出的一种可靠、易移植的机器学习模型存储格式,用于安全地存储Tensor,而且存储速度较快。详细参考文档[Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html)。 +> 已计划日落ckpt格式,使用权重更推荐使用safetensors格式。Safetensors 是 Huggingface 推出的一种可靠、易移植的机器学习模型存储格式,用于安全地存储Tensor,而且存储速度较快。详细参考文档[Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html)。 ## 权重格式转换 @@ -40,7 +40,7 @@ python convert_weight.py [-h] --model MODEL [--reversed] --input_path INPUT_PATH ### 转换示例 -假设用户已经下载了[Llama2模型的权重](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD),并保存在路径`/home/user/torch_weights`中,用户希望将其转换为MindSpore Transformers权重并保存在路径`/home/user/ms_weights`中,可以使用以下命令: +假设用户已经下载了[Llama2模型的权重](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/llama2.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD),并保存在路径`/home/user/torch_weights`中,用户希望将其转换为MindSpore Transformers权重并保存在路径`/home/user/ms_weights`中,可以使用以下命令: ```bash python convert_weight.py --model llama --input_path /home/user/torch_weights --output_path /home/user/ms_weights/llama.ckpt @@ -75,7 +75,7 @@ python convert_weight.py --model llama --input_path /home/user/torch_weights --o ### 模型权重转换开发示例 -此处以Llama为例。如若希望转换HuggingFace权重至MindSpore Transformers权重,需在[convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py)内定义`convert_pt_to_ms`函数: +此处以Llama为例。如若希望转换HuggingFace权重至MindSpore Transformers权重,需在[convert_weight.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/convert_weight.py)内定义`convert_pt_to_ms`函数: ```python def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs): @@ -108,7 +108,7 @@ def convert_pt_to_ms(input_path, output_path, dtype=None, **kwargs): return True ``` -而若是希望转换MindSpore Transformers权重至HuggingFace权重,则需在[convert_reversed.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_reversed.py)内定义`convert_ms_to_pt`函数: +而若是希望转换MindSpore Transformers权重至HuggingFace权重,则需在[convert_reversed.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/convert_reversed.py)内定义`convert_ms_to_pt`函数: ```python def convert_ms_to_pt(input_path, output_path, dtype=None, **kwargs): @@ -241,7 +241,7 @@ MindSpore每次运行分布式任务后都会在`output/strategy`文件夹下生 **单进程转换** -使用[mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py)对载入权重进行单进程转换。 +使用[mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.py)对载入权重进行单进程转换。 **运行命令**: @@ -254,7 +254,7 @@ python transform_checkpoint.py \ **多进程转换** -使用[mindformers/tools/ckpt_transform/transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.sh)对载入权重进行多进程转换。 
+使用[mindformers/tools/ckpt_transform/transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.sh)对载入权重进行多进程转换。 **运行命令**: @@ -269,7 +269,7 @@ bash transform_checkpoint.sh \ **注意事项**: -- 使用[transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.sh)脚本时,参数`8`表示目标设备数,参数`2`表示使用2个进程进行转换。 +- 使用[transform_checkpoint.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.sh)脚本时,参数`8`表示目标设备数,参数`2`表示使用2个进程进行转换。 ### 特殊场景 @@ -325,7 +325,7 @@ bash transform_checkpoint.sh \ **启动任务:** - 使用[mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh)进行任务启动。 + 使用[mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh)进行任务启动。 ```shell # 第一台服务器(主节点) @@ -376,7 +376,7 @@ bash transform_checkpoint.sh \ - **离线权重转换** - 在保存有所有策略文件的服务器上,使用[mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py)进行离线权重转换。 + 在保存有所有策略文件的服务器上,使用[mindformers/tools/ckpt_transform/transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.py)进行离线权重转换。 **单进程转换:** @@ -431,7 +431,7 @@ LoRA(Low-Rank Adaptation)的基本原理是对原始模型的参数进行低 #### 使用说明 -使用MindSpore Transformers提供的[LoRA权重合并脚本](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/transform_ckpt_lora.py),按照如下方式进行LoRA权重合并。 +使用MindSpore Transformers提供的[LoRA权重合并脚本](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/transform_ckpt_lora.py),按照如下方式进行LoRA权重合并。 ```shell python mindformers/tools/transform_ckpt_lora.py \ diff --git a/docs/mindformers/docs/source_zh_cn/feature/configuration.md b/docs/mindformers/docs/source_zh_cn/feature/configuration.md index 198326ad4c..0668f43f69 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/configuration.md +++ b/docs/mindformers/docs/source_zh_cn/feature/configuration.md @@ -19,9 +19,9 @@ MindSpore Transformers提供的`YAML`文件中包含对于不同功能的配置 | seed | 设置全局种子,详情可参考[mindspore.set_seed](https://www.mindspore.cn/docs/zh-CN/r2.7.0rc1/api_python/mindspore/mindspore.set_seed.html)。 | int | | run_mode | 设置模型的运行模式,可选`train`、`finetune`、`eval`或`predict`。 | str | | output_dir | 设置保存log、checkpoint、strategy等文件的路径。 | str | -| load_checkpoint | 加载权重的文件或文件夹路径,目前有3个应用场景:
1. 支持传入完整权重文件路径。
2. 支持传入离线切分后的权重文件夹路径。
3. 支持传入包含lora权重和base权重的文件夹路径。
各种权重的获取途径可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 | str | -| auto_trans_ckpt | 是否开启分布式权重自动切分与合并功能,详情可参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 | bool | -| resume_training | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool | +| load_checkpoint | 加载权重的文件或文件夹路径,目前有3个应用场景:
1. 支持传入完整权重文件路径。
2. 支持传入离线切分后的权重文件夹路径。
3. 支持传入包含lora权重和base权重的文件夹路径。
各种权重的获取途径可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。 | str | +| auto_trans_ckpt | 是否开启分布式权重自动切分与合并功能,详情可参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。 | bool | +| resume_training | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool | | load_ckpt_format | 加载的模型权重的格式,可选`ckpt`、`safetensors`。 | str | | remove_redundancy | 加载的模型权重是否去除了冗余。默认值为`False`。 | bool | | train_precision_sync | 训练确定性计算开关。默认值为`None` 。 | Optional[bool] | @@ -149,7 +149,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/ ### 并行配置 -为了提升模型的性能,在大规模集群的使用场景中通常需要为模型配置并行策略,详情可参考[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html),MindSpore Transformers中的并行配置如下。 +为了提升模型的性能,在大规模集群的使用场景中通常需要为模型配置并行策略,详情可参考[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/parallel_training.html),MindSpore Transformers中的并行配置如下。 | 参数 | 说明 | 类型 | |-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| @@ -183,7 +183,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/ ### 模型优化配置 -1. MindSpore Transformers提供重计算相关配置,以降低模型在训练时的内存占用,详情可参考[重计算](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html#重计算)。 +1. MindSpore Transformers提供重计算相关配置,以降低模型在训练时的内存占用,详情可参考[重计算](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/advanced_development/performance_optimization.html#重计算)。 | 参数 | 说明 | 类型 | |----------------------------------------------------|-------------------------------|-----------------| @@ -195,7 +195,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/ | recompute_config.select_recompute_exclude | 关闭指定算子的重计算,只对Primitive算子有效。 | bool/list | | recompute_config.select_comm_recompute_exclude | 关闭指定算子的通讯重计算,只对Primitive算子有效。 | bool/list | -2. MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/memory_optimization.html#%E7%BB%86%E7%B2%92%E5%BA%A6%E6%BF%80%E6%B4%BB%E5%80%BCswap)。 +2. 
MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/memory_optimization.html#%E7%BB%86%E7%B2%92%E5%BA%A6%E6%BF%80%E6%B4%BB%E5%80%BCswap)。 | 参数 | 说明 | 类型 | |------|-----|-----| @@ -290,7 +290,7 @@ MindSpore Transformers提供模型评估功能,同时支持模型边训练边 ### Profile配置 -MindSpore Transformers提供Profile作为模型性能调优的主要工具,详情可参考[性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。以下是Profile相关配置。 +MindSpore Transformers提供Profile作为模型性能调优的主要工具,详情可参考[性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/advanced_development/performance_optimization.html)。以下是Profile相关配置。 | 参数 | 说明 | 类型 | |-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------|------| @@ -310,7 +310,7 @@ MindSpore Transformers提供Profile作为模型性能调优的主要工具,详 ### 指标监控配置 -指标监控配置主要用于配置训练过程中各指标的记录方式,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/monitor.html)。以下是MindSpore Transformers中通用的指标监控配置项说明: +指标监控配置主要用于配置训练过程中各指标的记录方式,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/monitor.html)。以下是MindSpore Transformers中通用的指标监控配置项说明: | 参数名称 | 说明 | 类型 | |--------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|---------------| @@ -333,7 +333,7 @@ MindSpore Transformers提供Profile作为模型性能调优的主要工具,详 ### TensorBoard配置 -TensorBoard配置主要用于配置训练过程中与TensorBoard相关的参数,便于在训练过程中实时查看和监控训练信息,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/monitor.html)。以下是MindSpore Transformers中通用的TensorBoard配置项说明: +TensorBoard配置主要用于配置训练过程中与TensorBoard相关的参数,便于在训练过程中实时查看和监控训练信息,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/monitor.html)。以下是MindSpore Transformers中通用的TensorBoard配置项说明: | 参数名称 | 说明 | 类型 | |-------------------------------------------|----------------------------------------------------------|------| diff --git a/docs/mindformers/docs/source_zh_cn/feature/dataset.md b/docs/mindformers/docs/source_zh_cn/feature/dataset.md index 25c14ef627..bb96767342 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/dataset.md +++ b/docs/mindformers/docs/source_zh_cn/feature/dataset.md @@ -16,7 +16,7 @@ Megatron数据集是为大规模分布式语言模型预训练场景设计的一 ### 数据预处理 -MindSpore Transformers提供了数据预处理脚本[preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py)用于将`json`格式的原始文本预料转换成`.bin`或`.idx`文件。如果用户的原始文本不是`json`格式,需要自行将数据处理成对应格式的文件。 +MindSpore Transformers提供了数据预处理脚本[preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py)用于将`json`格式的原始文本预料转换成`.bin`或`.idx`文件。如果用户的原始文本不是`json`格式,需要自行将数据处理成对应格式的文件。 下面是`json`格式文件的示例: @@ -70,7 +70,7 @@ MindSpore Transformers提供了数据预处理脚本[preprocess_indexed_dataset. 4. 
生成`.bin`或`.idx`数据文件 - 执行数据预处理脚本[preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py)可以将原始文本数据通过模型的tokenizer转换为对应的token id。 + 执行数据预处理脚本[preprocess_indexed_dataset.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py)可以将原始文本数据通过模型的tokenizer转换为对应的token id。 该脚本参数如下: @@ -104,7 +104,7 @@ MindSpore Transformers提供了数据预处理脚本[preprocess_indexed_dataset. --seq-length 8192 ``` - 以外部tokenizer类[Llama3Tokenizer](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_tokenizer.py)为例,确保**本地**mindformers仓库下存在'research/llama3_1/llama3_1_tokenizer.py',执行如下命令处理数据集: + 以外部tokenizer类[Llama3Tokenizer](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_tokenizer.py)为例,确保**本地**mindformers仓库下存在'research/llama3_1/llama3_1_tokenizer.py',执行如下命令处理数据集: ```shell python mindformers/tools/dataset_preprocess/preprocess_indexed_dataset.py \ @@ -200,7 +200,7 @@ MindSpore Transformers推荐用户使用Megatron数据集进行模型预训练 | pad | 数据集中pad的token id | | data_path | 列表,每连续两个列表元素(数字,字符串)被视作一个数据集,分别表示该数据集的采样占比和数据集bin文件去掉后缀`.bin`的路径,所有数据集的占比之和应当为1 | - 此外,Megatron数据集还依赖`input_columns`、`construct_args_key`、`full_batch`等配置,具体可参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html),这里仅说明在不同场景如何配置: + 此外,Megatron数据集还依赖`input_columns`、`construct_args_key`、`full_batch`等配置,具体可参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html),这里仅说明在不同场景如何配置: - 当`create_compressed_eod_mask=True`时: @@ -237,7 +237,7 @@ MindSpore Transformers推荐用户使用Megatron数据集进行模型预训练 3. 启动模型预训练 - 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型预训练任务,这里以[Llama3_1模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md)为例。 + 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型预训练任务,这里以[Llama3_1模型文档](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/README.md)为例。 ## HuggingFace数据集 @@ -401,7 +401,7 @@ train_dataset: &train_dataset prefetch_size: 1 ``` - 1. `train_dataset`中参数说明可参考[文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html); + 1. `train_dataset`中参数说明可参考[文档](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html); 2. 
`AlpacaInstructDataHandler`是针对`alpaca`数据集开发的在线处理脚本,如果使用其他数据集,用户需要参考[自定义数据handler](#自定义数据handler)完成自定义数据处理的功能实现。 @@ -482,7 +482,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" - alpaca 数据集示例 - 修改任务配置文件 [finetune_qwen2_5_0_5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml)。 + 修改任务配置文件 [finetune_qwen2_5_0_5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml)。 修改如下参数: @@ -517,7 +517,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" prefetch_size: 1 ``` - 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 的 “模型训练配置” 和 “模型评估配置”。 + 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 的 “模型训练配置” 和 “模型评估配置”。 自定义数据 handler: @@ -630,7 +630,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" seed: 0 ``` - 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 的 “模型训练配置” 和 “模型评估配置”。 + 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 的 “模型训练配置” 和 “模型评估配置”。 自定义 adgen_handler: @@ -708,7 +708,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64" `CommonDataLoader`除了支持数据集在线加载与处理,还支持离线处理数据集并进行保存。 -使用[datasets_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/data_preprocess/huggingface/datasets_preprocess.py)脚本可以离线处理 HuggingFace 数据集并进行保存。 +使用[datasets_preprocess.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/toolkit/data_preprocess/huggingface/datasets_preprocess.py)脚本可以离线处理 HuggingFace 数据集并进行保存。 - 参数说明 @@ -793,7 +793,7 @@ MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁 1. 修改模型配置文件 - `qwen2_5-0.5b`模型微调使用[finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml)配置文件,修改其中数据集部分配置: + `qwen2_5-0.5b`模型微调使用[finetune_qwen2_5_0.5b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml)配置文件,修改其中数据集部分配置: ```yaml train_dataset: &train_dataset @@ -811,7 +811,7 @@ MindRecord是MindSpore提供的高效数据存储/读取模块,可以减少磁 2. 启动模型微调 - 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型微调任务,这里以[Qwen2_5模型文档](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/README.md)为例。 + 修改模型配置文件中数据集以及并行相关配置项之后,即可参考模型文档拉起模型微调任务,这里以[Qwen2_5模型文档](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/README.md)为例。 ### 多源数据集 diff --git a/docs/mindformers/docs/source_zh_cn/feature/evaluation.md b/docs/mindformers/docs/source_zh_cn/feature/evaluation.md index 4bef33361a..c0d45f9ad3 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/evaluation.md +++ b/docs/mindformers/docs/source_zh_cn/feature/evaluation.md @@ -44,9 +44,9 @@ pip install -e . 1. 创建一个新目录,例如名称为`model_dir`,用于存储模型yaml文件。 2. 在上个步骤创建的目录中,放置模型推理yaml配置文件(predict_xxx_.yaml)。不同模型的推理yaml配置文件所在目录位置,请参考[模型库](../introduction/models.md)。 - 3. 配置yaml文件。如果yaml中模型类、模型Config类、模型Tokenzier类使用了外挂代码,即代码文件在[research](https://gitee.com/mindspore/mindformers/tree/dev/research)目录或其他外部目录下,需要修改yaml文件:在相应类的`type`字段下,添加`auto_register`字段,格式为“module.class”(其中“module”为类所在脚本的文件名,“class”为类名。如果已存在,则不需要修改)。 + 3. 
配置yaml文件。如果yaml中模型类、模型Config类、模型Tokenzier类使用了外挂代码,即代码文件在[research](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research)目录或其他外部目录下,需要修改yaml文件:在相应类的`type`字段下,添加`auto_register`字段,格式为“module.class”(其中“module”为类所在脚本的文件名,“class”为类名。如果已存在,则不需要修改)。 - 以[predict_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml)配置为例,对其中的部分配置项进行如下修改: + 以[predict_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml)配置为例,对其中的部分配置项进行如下修改: ```yaml run_mode: 'predict' # 设置推理模式 @@ -71,13 +71,13 @@ pip install -e . #### 评测样例 -执行脚本[run_harness.sh](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/benchmarks/run_harness.sh)进行评测。 +执行脚本[run_harness.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/toolkit/benchmarks/run_harness.sh)进行评测。 run_harness.sh脚本参数配置如下表: | 参数 | 类型 | 参数介绍 | 是否必须 | |------------------|-----|------------------------------------------------------------------------------------------------|------| -| `--register_path`| str | 外挂代码所在目录的绝对路径。比如[research](https://gitee.com/mindspore/mindformers/tree/dev/research)目录下的模型目录 | 否(外挂代码必填) | +| `--register_path`| str | 外挂代码所在目录的绝对路径。比如[research](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research)目录下的模型目录 | 否(外挂代码必填) | | `--model` | str | 需设置为 `mf` ,对应为MindSpore Transformers评估策略 | 是 | | `--model_args` | str | 模型及评估相关参数,见下方模型参数介绍 | 是 | | `--tasks` | str | 数据集名称。可传入多个数据集,使用逗号(,)分隔 | 是 | diff --git a/docs/mindformers/docs/source_zh_cn/feature/load_huggingface_config.md b/docs/mindformers/docs/source_zh_cn/feature/load_huggingface_config.md index 05d25526e5..f2e415f176 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/load_huggingface_config.md +++ b/docs/mindformers/docs/source_zh_cn/feature/load_huggingface_config.md @@ -26,7 +26,7 @@ - pretrained_model_dir:Hugging Face 模型配置所在的目录路径; - model_config:MindSpore Transformers 自有的模型配置字段; -- generation_config:文本生成相关的参数。可选配置,如需自定义则增加。其下的配置项可以参考[GenerationConfig](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/generation/mindformers.generation.GenerationConfig.html)。 +- generation_config:文本生成相关的参数。可选配置,如需自定义则增加。其下的配置项可以参考[GenerationConfig](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/generation/mindformers.generation.GenerationConfig.html)。 ```yaml pretrained_model_dir: "./local/qwen3" @@ -59,7 +59,7 @@ generation_config: ### 拉起任务 -参考[使用run_mindformer.py启动推理任务](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/inference.html#%E4%BD%BF%E7%94%A8-run-mindformer-%E4%B8%80%E9%94%AE%E5%90%AF%E5%8A%A8%E8%84%9A%E6%9C%AC%E6%8E%A8%E7%90%86)。 +参考[使用run_mindformer.py启动推理任务](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/inference.html#%E4%BD%BF%E7%94%A8-run-mindformer-%E4%B8%80%E9%94%AE%E5%90%AF%E5%8A%A8%E8%84%9A%E6%9C%AC%E6%8E%A8%E7%90%86)。 ## 常见问题FAQ diff --git a/docs/mindformers/docs/source_zh_cn/feature/logging.md b/docs/mindformers/docs/source_zh_cn/feature/logging.md index 4f1a32b316..009adce0a2 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/logging.md +++ b/docs/mindformers/docs/source_zh_cn/feature/logging.md @@ -46,7 +46,7 @@ MindSpore Transformers 默认会在训练的 yaml 文件中指定文件输出路 如果需要重新指定输出的日志文件夹,可以在 yaml 中修改配置。 -以 [`DeepSeek-V3` 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L2) 为例,可做如下配置: +以 [`DeepSeek-V3` 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L2) 为例,可做如下配置: 
```yaml output_dir: './output' # path to save logs/checkpoint/strategy @@ -54,12 +54,12 @@ output_dir: './output' # path to save logs/checkpoint/strategy #### 单卡任务指定输出目录 -除了 yaml 文件配置来指定,MindSpore Transformers 还支持在 [run_mindformer 一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#run-mindformer%E4%B8%80%E9%94%AE%E5%90%AF%E5%8A%A8%E8%84%9A%E6%9C%AC) 中,使用 `--output_dir` 启动命令对日志输出路径做指定。 +除了 yaml 文件配置来指定,MindSpore Transformers 还支持在 [run_mindformer 一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#run-mindformer%E4%B8%80%E9%94%AE%E5%90%AF%E5%8A%A8%E8%84%9A%E6%9C%AC) 中,使用 `--output_dir` 启动命令对日志输出路径做指定。 > 如果在这里配置了输出路径,将会覆盖 yaml 文件中的配置! #### 分布式任务指定输出目录 -如果模型训练需要用到多台服务器,使用[分布式任务拉起脚本 msrun_launcher.sh](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#%E5%88%86%E5%B8%83%E5%BC%8F%E4%BB%BB%E5%8A%A1%E6%8B%89%E8%B5%B7%E8%84%9A%E6%9C%AC) 来启动分布式训练任务。 +如果模型训练需要用到多台服务器,使用[分布式任务拉起脚本 msrun_launcher.sh](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#%E5%88%86%E5%B8%83%E5%BC%8F%E4%BB%BB%E5%8A%A1%E6%8B%89%E8%B5%B7%E8%84%9A%E6%9C%AC) 来启动分布式训练任务。 在设置了共享存储的情况下,还可以在启动脚本中指定入参 `LOG_DIR` 来指定 Worker 以及 Scheduler 的日志输出路径,将所有机器节点的日志都输出到一个路径下,方便统一观察。 \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md index 25fef61e31..0eaaa7bc24 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md +++ b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md @@ -14,7 +14,7 @@ 用户可通过在模型训练的 yaml 配置文件中新增 `recompute_config` 模块来使用重计算。 -以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L113) 为例,可做如下配置: +以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L113) 为例,可做如下配置: ```yaml # recompute config diff --git a/docs/mindformers/docs/source_zh_cn/feature/monitor.md b/docs/mindformers/docs/source_zh_cn/feature/monitor.md index bbcaac7846..036efe3f36 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/monitor.md +++ b/docs/mindformers/docs/source_zh_cn/feature/monitor.md @@ -256,4 +256,4 @@ expert_load(图中为3个MoE层的各自16个专家的负载变化曲线): > 2. 用户在训练配置文件 `yaml` 中设置的配置参数; > 3. 
训练默认的配置参数。 > -> 可配置的所有参数请参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。 \ No newline at end of file +> 可配置的所有参数请参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html)。 \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md index ff93bcfd6f..c8fd60988a 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md +++ b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md @@ -40,7 +40,7 @@ parallel_config: - data_parallel:数据并行切分数量,默认为1,根据用户需求配置。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ### 模型并行 @@ -59,7 +59,7 @@ parallel_config: - model_parallel:模型并行切分数量,默认为1,根据用户需求配置。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ### 序列并行 @@ -78,7 +78,7 @@ parallel_config: - use_seq_parallel:是否开启序列并行,默认为Fasle。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ### 长序列并行 @@ -109,7 +109,7 @@ parallel_config: - use_ring_attention:是否开启Ring Attention,默认为False。 - context_parallel:序列并行切分数量,默认为1,根据用户需求配置。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 #### Ulysses序列并行 @@ -140,7 +140,7 @@ parallel_config: - enable_alltoall:生成alltoall通信算子,默认为False,不启用时将会由allgather等其他算子组合完成等价替代,可参考MindSpore `set_auto_parallel_context`[接口文档](https://www.mindspore.cn/docs/zh-CN/r2.7.0rc1/api_python/mindspore/mindspore.set_auto_parallel_context.html);启用Ulysses方案时我们期望能够直接插入alltoall通信算子,因此将该配置项打开。 - context_parallel_algo:设置为`ulysses_cp`开启Ulysses序列并行。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 #### 混合序列并行 @@ -166,7 +166,7 @@ parallel_config: - context_parallel_algo:设置为`hybrid_cp`时开启混合序列并行。 - ulysses_degree_in_cp:Ulysses序列并行切分数量。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ### 流水线并行 @@ -199,7 +199,7 @@ parallel_config: - 目前仅支持Llama和DeepSeek系列模型。 - 目前暂不支持使用Megatron的多源数据集进行训练的场景。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore 
Transformers配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ### 优化器并行 @@ -218,7 +218,7 @@ parallel: - enable_parallel_optimizer:是否开启优化器并行,默认为`False`。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ### 多副本并行 @@ -241,11 +241,11 @@ model_config: - 目前仅支持Llama和Qwen系列模型。 -关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。 +关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html) 中的并行配置章节下的具体内容。 ## MindSpore Transformers 分布式并行应用实践 -在官网提供的[Llama3_1-70B微调配置](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#)文件中,使用了多种分布式并行策略,以提升多机多卡环境中的训练效率。以下是该配置文件中涉及的主要并行策略和关键参数: +在官网提供的[Llama3_1-70B微调配置](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_70b/finetune_llama3_1_70b.yaml#)文件中,使用了多种分布式并行策略,以提升多机多卡环境中的训练效率。以下是该配置文件中涉及的主要并行策略和关键参数: - **数据并行**:未启用额外的数据并行(`data_parallel: 1`)。 - **模型并行**:模型被切分成8个部分,在不同设备上计算(`model_parallel: 8`)。 diff --git a/docs/mindformers/docs/source_zh_cn/feature/quantization.md b/docs/mindformers/docs/source_zh_cn/feature/quantization.md index 92ed1e10c8..b32898a0ca 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/quantization.md +++ b/docs/mindformers/docs/source_zh_cn/feature/quantization.md @@ -14,6 +14,6 @@ MindSpore Transformers 集成 MindSpore Golden Stick 工具组件,提供统一 | 支持的模型 | |-----------------------------------------------------------------------------------------------------------------------------------| -| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/predict_deepseek3_671b.yaml) | -| [DeepSeek-R1](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b.yaml) | -| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_13b_ptq.yaml) | \ No newline at end of file +| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/predict_deepseek3_671b.yaml) | +| [DeepSeek-R1](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek_r1_671b/predict_deepseek_r1_671b.yaml) | +| [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/llama2/predict_llama2_13b_ptq.yaml) | \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/feature/resume_training.md b/docs/mindformers/docs/source_zh_cn/feature/resume_training.md index d6eaaff24d..25419e9e5a 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/resume_training.md +++ b/docs/mindformers/docs/source_zh_cn/feature/resume_training.md @@ -45,7 +45,7 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中 ### 分布式训练示例 以下示例演示了如何在单卡和多卡环境中启动断点续训。示例基于`llama2_7b` -模型,相关配置文件[configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/pretrain_llama2_7b.yaml)。 +模型,相关配置文件[configs/llama2/pretrain_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/llama2/pretrain_llama2_7b.yaml)。 #### 完整训练 @@ -75,7 +75,7 @@ MindSpore Transformers支持**step级断点续训**功能,允许在训练中 ... ``` -2. 
准备数据集,此处以[wikitext2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)为例,启动4卡分布式训练: +2. 准备数据集,此处以[wikitext2](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)为例,启动4卡分布式训练: ```shell bash scripts/msrun_launcher.sh "run_mindformer.py \ diff --git a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md index fceb494912..907279f460 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md +++ b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md @@ -70,7 +70,7 @@ qwen2_7b 在深度学习模型的训练过程中,保存模型的权重是至关重要的一步。权重保存功能使得我们能够在训练的任意阶段存储模型的参数,以便用户在训练中断或完成后进行恢复、继续训练、评估或部署。同时还可以通过保存权重的方式,在不同环境下复现实验结果。 -目前,MindSpore Transformers 支持 [safetensors](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html) 格式的权重文件读取和保存。 +目前,MindSpore Transformers 支持 [safetensors](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html) 格式的权重文件读取和保存。 ### 目录结构 @@ -120,7 +120,7 @@ output 用户可修改 `yaml` 配置文件中 `CheckpointMonitor` 下的字段来控制权重保存行为。 -以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L206) 为例,可做如下配置: +以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L206) 为例,可做如下配置: ```yaml # callbacks @@ -152,7 +152,7 @@ callbacks: | remove_redundancy | 保存模型权重时是否去除冗余。 | (bool, 可选) - 默认值: `False` 。 | | save_network_params | 是否仅额外保存网络参数。 | (bool, 可选) - 是否仅额外保存网络参数。默认值: `False` 。 | -如果您想了解更多有关 CheckpointMonitor 的知识,可以参考 [CheckpointMonitor API 文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.CheckpointMonitor.html)。 +如果您想了解更多有关 CheckpointMonitor 的知识,可以参考 [CheckpointMonitor API 文档](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.CheckpointMonitor.html)。 ## 权重加载 @@ -264,7 +264,7 @@ parallel_config: # 配置16卡分布式策略 **启动任务**: -使用[mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh)进行任务启动。 +使用[mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh)进行任务启动。 ```shell # 第一台服务器(主节点) @@ -358,7 +358,7 @@ auto_trans_ckpt: False # 分布式权重加载, **4.启动任务**: -使用[mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh)进行任务启动。 +使用[mindformers/scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh)进行任务启动。 ```shell # 第一台服务器(主节点) @@ -456,7 +456,7 @@ generation: #### 使用说明 -使用MindSpore Transformers提供的[safetensors权重合并脚本](https://gitee.com/mindspore/mindformers/blob/dev/toolkit/safetensors/unified_safetensors.py),按照如下方式进行safetensors权重合并。合并后的权重格式为[完整权重](#完整权重)。 +使用MindSpore Transformers提供的[safetensors权重合并脚本](https://gitee.com/mindspore/mindformers/blob/r1.6.0/toolkit/safetensors/unified_safetensors.py),按照如下方式进行safetensors权重合并。合并后的权重格式为[完整权重](#完整权重)。 ```shell python toolkit/safetensors/unified_safetensors.py \ @@ -573,7 +573,7 @@ callbacks: ### 训练任务示例 -若使用完整权重多卡在线微调,以Qwen2.5-7B模型为例,修改配置项[finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): 
+若使用完整权重多卡在线微调,以Qwen2.5-7B模型为例,修改配置项[finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): ```yaml # 修改后的配置 @@ -589,7 +589,7 @@ callbacks: checkpoint_format: safetensors # 保存权重文件格式 ``` -若使用分布式权重多卡在线微调,以Qwen2.5-7B模型为例,修改配置项[finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): +若使用分布式权重多卡在线微调,以Qwen2.5-7B模型为例,修改配置项[finetune_qwen2_5_7b_8k.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/finetune_qwen2_5_7b_8k.yaml): ```yaml # 修改后的配置 @@ -617,11 +617,11 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ 任务执行完成后,在mindformers/output目录下,会生成checkpoint文件夹,同时模型文件会保存在该文件夹下。 -更多详情请参考:[SFT微调介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/supervised_fine_tuning.html)、[预训练介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/pre_training.html) +更多详情请参考:[SFT微调介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/supervised_fine_tuning.html)、[预训练介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/pre_training.html) ### 推理任务示例 -若使用完整权重多卡在线推理,以Qwen2.5-7B模型为例,修改配置项[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): +若使用完整权重多卡在线推理,以Qwen2.5-7B模型为例,修改配置项[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): ```yaml # 修改后的配置 @@ -634,7 +634,7 @@ parallel_config: pipeline_stage: 1 ``` -若使用分布式权重多卡在线推理,以Qwen2.5-7B模型为例,修改配置项[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): +若使用分布式权重多卡在线推理,以Qwen2.5-7B模型为例,修改配置项[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml): ```yaml # 修改后的配置 @@ -664,7 +664,7 @@ bash scripts/msrun_launcher.sh "python run_mindformer.py \ 'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......] 
``` -更多详情请参考:[推理介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/inference.html) +更多详情请参考:[推理介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/inference.html) ### 断点续训任务示例 @@ -700,5 +700,5 @@ callbacks: checkpoint_format: safetensors # 保存权重文件格式 ``` -更多详情请参考:[断点续训介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html)。 +更多详情请参考:[断点续训介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/resume_training.html)。 diff --git a/docs/mindformers/docs/source_zh_cn/feature/skip_data_and_ckpt_health_monitor.md b/docs/mindformers/docs/source_zh_cn/feature/skip_data_and_ckpt_health_monitor.md index 30017495e1..005548c13f 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/skip_data_and_ckpt_health_monitor.md +++ b/docs/mindformers/docs/source_zh_cn/feature/skip_data_and_ckpt_health_monitor.md @@ -10,7 +10,7 @@ > - 数据跳过功能和健康监测功能二者结合,能有效解决训练过程中异常 global norm 带来的数据异常问题。使用前请先正常训练一段时间,从而确定需要设定的 global norm 的阈值、连续异常次数的阈值以及 embedding norm 的阈值。 > - 只有连续出现异常时才会中断训练,如果中途出现一次恢复正常,则会清空累计次数,所以请把控阈值的设定。 -> - 数据跳过功能不能与故障快速恢复功能同时使用。参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/high_availability.html)中的进程级重调度恢复功能。 +> - 数据跳过功能不能与故障快速恢复功能同时使用。参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/high_availability.html)中的进程级重调度恢复功能。 ## 数据跳过 @@ -53,7 +53,7 @@ monitor_config: ### 使用示例 -假设以Llama3.1-8B为例子,使用的[finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)按照上述[配置](#使用方法)添加参数,其余步骤请参考[Llama3.1-8B文档](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md)。开启训练: +假设以Llama3.1-8B为例子,使用的[finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)按照上述[配置](#使用方法)添加参数,其余步骤请参考[Llama3.1-8B文档](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md)。开启训练: ```shell bash scripts/msrun_launcher.sh "run_mindformer.py \ @@ -155,7 +155,7 @@ parallel_config: ### 使用示例 -假设以Llama3.1-8B为例子,使用的[finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)按照上述[配置](#使用方法-1)添加参数和修改,其余步骤请参考[Llama3.1-8B文档](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md)。开启训练: +假设以Llama3.1-8B为例子,使用的[finetune_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/llama3_1/llama3_1_8b/finetune_llama3_1_8b.yaml)按照上述[配置](#使用方法-1)添加参数和修改,其余步骤请参考[Llama3.1-8B文档](https://gitee.com/lanshaozuishuai/mindformers/blob/dev/research/llama3_1/README.md)。开启训练: ```shell bash scripts/msrun_launcher.sh "run_mindformer.py \ diff --git a/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md b/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md index 61dde630aa..8f64639cc7 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md +++ b/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md @@ -22,7 +22,7 @@ MindSpore Transformers提供了一键启动脚本`run_mindformer.py`和分布式 | `--device_id` | 设置执行设备ID,其值必须在可用设备范围内。 | int,可选 | 预训练/微调/推理 | | `--device_target` | 设置后端执行设备,MindSpore Transformers仅支持在`Ascend`设备上运行。 | str,可选 | 预训练/微调/推理 | | `--run_mode` | 设置模型的运行模式,可选`train`、`finetune`或`predict`。 | str,可选 | 预训练/微调/推理 | -| `--load_checkpoint` | 加载的权重文件或文件夹路径,详细使用方式参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 | str,可选 | 预训练/微调/推理 | +| `--load_checkpoint` | 
加载的权重文件或文件夹路径,详细使用方式参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。 | str,可选 | 预训练/微调/推理 | | `--use_parallel` | 是否开启并行模式。 | bool,可选 | 预训练/微调/推理 | | `--output_dir` | 设置保存日志、权重、切分策略等文件的路径。 | str,可选 | 预训练/微调/推理 | | `--register_path` | 外挂代码所在目录的绝对路径。比如research目录下的模型目录。 | str,可选 | 预训练/微调/推理 | @@ -33,7 +33,7 @@ MindSpore Transformers提供了一键启动脚本`run_mindformer.py`和分布式 | 参数 | 参数说明 | 取值说明 | 适用场景 | |:----------------------------:|:-------------------------------------------------------------------------------------------------------------------|--------------------------------|-----------| | `--src_strategy_path_or_dir` | 权重的策略文件路径。 | str,可选 | 预训练/微调/推理 | -| `--auto_trans_ckpt` | 是否开启在线权重自动转换功能,详情可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 | bool,可选 | 预训练/微调/推理 | +| `--auto_trans_ckpt` | 是否开启在线权重自动转换功能,详情可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。 | bool,可选 | 预训练/微调/推理 | | `--transform_process_num` | 负责权重转换的进程数。 | int,可选 | 预训练/微调/推理 | | `--only_save_strategy` | 是否仅保存切分策略文件。 | bool,可选,为`true`时任务在保存策略文件后直接退出 | 预训练/微调/推理 | @@ -42,7 +42,7 @@ MindSpore Transformers提供了一键启动脚本`run_mindformer.py`和分布式 | 参数 | 参数说明 | 取值说明 | 适用场景 | |:-------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------|---------|--------| | `--train_dataset_dir` | 预训练/微调的数据集目录。 | str,可选 | 预训练/微调 | -| `--resume_training` | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool,可选 | 预训练/微调 | +| `--resume_training` | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool,可选 | 预训练/微调 | | `--epochs` | 训练轮次。 | int,可选 | 预训练/微调 | | `--gradient_accumulation_steps` | 梯度累积步数。 | int,可选 | 预训练/微调 | | `--batch_size` | 批处理数据的样本数。 | int,可选 | 预训练/微调 | diff --git a/docs/mindformers/docs/source_zh_cn/feature/tokenizer.md b/docs/mindformers/docs/source_zh_cn/feature/tokenizer.md index 8bbce9a729..a18284c270 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/tokenizer.md +++ b/docs/mindformers/docs/source_zh_cn/feature/tokenizer.md @@ -38,7 +38,7 @@ MindSpore Transformers原有的Tokenizer组件与Hugging Face Tokenizer的功能 1. 
修改yaml配置 - Qwen3模型的配置文件[predict_qwen3.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/qwen3/predict_qwen3.yaml)需要修改的地方如下: + Qwen3模型的配置文件[predict_qwen3.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/qwen3/predict_qwen3.yaml)需要修改的地方如下: ```yaml use_legacy: False diff --git a/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md b/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md index cb445f9448..43a1688887 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md +++ b/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md @@ -19,7 +19,7 @@ MindSpore Transformers 提供了如下几类超参数的配置方式。 #### YAML 参数配置 用户可通过在模型训练的 yaml 配置文件中新增 `lr_schedule` 模块来使用学习率。 -以 [`DeepSeek-V3` 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L31) 为例,可做如下配置: +以 [`DeepSeek-V3` 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L31) 为例,可做如下配置: ```yaml # lr schedule @@ -34,14 +34,14 @@ lr_schedule: 各学习率需配置的参数不同,MindSpore Transformers 目前支持了以下学习率: -1. [恒定预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.ConstantWarmUpLR.html) -2. [线性预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.LinearWithWarmUpLR.html) -3. [余弦预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.CosineWithWarmUpLR.html) -4. [余弦重启与预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.CosineWithRestartsAndWarmUpLR.html) -5. [带有预热阶段的多项式衰减学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.PolynomialWithWarmUpLR.html) -6. [SGDR 的余弦退火部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.CosineAnnealingLR.html) -7. [使用余弦退火调度设置每个参数组的学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.CosineAnnealingWarmRestarts.html) -8. [学习率分层模块](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.LearningRateWiseLayer.html) +1. [恒定预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.ConstantWarmUpLR.html) +2. [线性预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.LinearWithWarmUpLR.html) +3. [余弦预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.CosineWithWarmUpLR.html) +4. [余弦重启与预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.CosineWithRestartsAndWarmUpLR.html) +5. [带有预热阶段的多项式衰减学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.PolynomialWithWarmUpLR.html) +6. [SGDR 的余弦退火部分](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.CosineAnnealingLR.html) +7. [使用余弦退火调度设置每个参数组的学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.CosineAnnealingWarmRestarts.html) +8. 
[学习率分层模块](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.LearningRateWiseLayer.html) 以余弦预热学习率(CosineWithWarmUpLR)为例,需要关注的主要参数如下表所列: @@ -68,7 +68,7 @@ lr_schedule: total_steps: 20 # -1 means it will load the total steps of the dataset ``` -更多关于学习率 API 的介绍(如 `type` 的配置名称、学习率算法的介绍),可参见 [MindSpore Transformers API 文档:学习率部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/mindformers.core.html#%E5%AD%A6%E4%B9%A0%E7%8E%87) 的相关链接。 +更多关于学习率 API 的介绍(如 `type` 的配置名称、学习率算法的介绍),可参见 [MindSpore Transformers API 文档:学习率部分](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/mindformers.core.html#%E5%AD%A6%E4%B9%A0%E7%8E%87) 的相关链接。 ## 优化器 @@ -78,7 +78,7 @@ lr_schedule: 选择合适的优化器对模型的收敛速度和最终性能有着至关重要的影响。不同的优化器通过不同的方法调整学习率和其他超参数来加速训练过程、改善收敛性并避免局部最优解。 -当前,MindSpore Transformers 只支持 [AdamW 优化器](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/mindformers.core.html#%E4%BC%98%E5%8C%96%E5%99%A8)。 +当前,MindSpore Transformers 只支持 [AdamW 优化器](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/mindformers.core.html#%E4%BC%98%E5%8C%96%E5%99%A8)。 ### 配置与使用 @@ -86,7 +86,7 @@ lr_schedule: 用户可通过在模型训练的 yaml 配置文件中新增 `optimizer` 模块来使用学习率。 -以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L24) 为例,可做如下配置: +以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L24) 为例,可做如下配置: ```yaml # optimizer @@ -98,4 +98,4 @@ optimizer: #### 主要配置参数介绍 -有关优化器配置的主要参数,可参见 [MindSpore Transformers API 文档:优化器部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.AdamW.html#mindformers.core.AdamW) 的相关链接。 +有关优化器配置的主要参数,可参见 [MindSpore Transformers API 文档:优化器部分](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/core/mindformers.core.AdamW.html#mindformers.core.AdamW) 的相关链接。 diff --git a/docs/mindformers/docs/source_zh_cn/guide/deployment.md b/docs/mindformers/docs/source_zh_cn/guide/deployment.md index 1d27ca29ba..5f1e21a9f5 100644 --- a/docs/mindformers/docs/source_zh_cn/guide/deployment.md +++ b/docs/mindformers/docs/source_zh_cn/guide/deployment.md @@ -8,7 +8,7 @@ MindIE,全称Mind Inference Engine,是基于昇腾硬件的高性能推理 MindSpore Transformers承载在模型应用层MindIE LLM中,通过MindIE Service可以部署MindSpore Transformers中的大模型。 -MindIE推理的模型支持度可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)。 +MindIE推理的模型支持度可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/introduction/models.html)。 ## 环境搭建 @@ -16,7 +16,7 @@ MindIE推理的模型支持度可参考[模型库](https://www.mindspore.cn/mind 1. 安装MindSpore Transformers - 参考[MindSpore Transformers官方安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html)进行安装。 + 参考[MindSpore Transformers官方安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/installation.html)进行安装。 2. 
安装MindIE @@ -86,9 +86,9 @@ processor: merges_file: "/path/to/mf_model/qwen1_5_72b/merges.txt" # merges文件绝对路径 ``` -模型权重下载和转换可参考 [权重格式转换指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 +模型权重下载和转换可参考 [权重格式转换指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。 -不同模型的所需文件和配置可能会有差异,详情参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)中具体模型的推理章节。 +不同模型的所需文件和配置可能会有差异,详情参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/introduction/models.html)中具体模型的推理章节。 ### 启动MindIE @@ -346,4 +346,4 @@ curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content ## 模型列表 -其他模型的MindIE推理示例可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)中的各模型的介绍文档。 \ No newline at end of file +其他模型的MindIE推理示例可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/introduction/models.html)中的各模型的介绍文档。 \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/guide/inference.md b/docs/mindformers/docs/source_zh_cn/guide/inference.md index 44dc5a37ab..2174551ecc 100644 --- a/docs/mindformers/docs/source_zh_cn/guide/inference.md +++ b/docs/mindformers/docs/source_zh_cn/guide/inference.md @@ -19,7 +19,7 @@ MindSpore Transformers 提供了大模型推理能力,用户可以执行 `run_ 目前推理权重可以在线加载完整权重进行推理,权重可以通过以下两种方式获得: 1. 从Hugging Face模型库中下载相应模型的开源的完整权重。 -2. 预训练或者微调后的分布式权重,通过[合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html#%E6%9D%83%E9%87%8D%E5%90%88%E5%B9%B6)生成一个完整权重。 +2. 预训练或者微调后的分布式权重,通过[合并](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html#%E6%9D%83%E9%87%8D%E5%90%88%E5%B9%B6)生成一个完整权重。 ### 3. 执行推理任务 @@ -27,7 +27,7 @@ MindSpore Transformers 提供了大模型推理能力,用户可以执行 `run_ ## 使用 run_mindformer 一键启动脚本推理 -单卡推理可以直接执行[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py)脚本,多卡推理需要借助[scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/dev/scripts/msrun_launcher.sh)来启动。 +单卡推理可以直接执行[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py)脚本,多卡推理需要借助[scripts/msrun_launcher.sh](https://gitee.com/mindspore/mindformers/blob/r1.6.0/scripts/msrun_launcher.sh)来启动。 run_mindformer.py的参数说明如下: @@ -45,7 +45,7 @@ run_mindformer.py的参数说明如下: msrun_launcher.sh包括run_mindformer.py命令和推理卡数两个参数。 -下面将以 `Qwen2.5-7B` 为例介绍单卡和多卡推理的用法,推荐配置为[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml)文件。 +下面将以 `Qwen2.5-7B` 为例介绍单卡和多卡推理的用法,推荐配置为[predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml)文件。 ### 配置修改 @@ -76,7 +76,7 @@ processor: merges_file: "path/to/merges.txt" ``` -具体配置说明均可参考[yaml配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。 +具体配置说明均可参考[yaml配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html)。 ### 单卡推理 @@ -152,4 +152,4 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ ## 更多信息 -更多关于不同模型的推理示例,请访问[MindSpore Transformers 已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)。 \ No newline at end of file +更多关于不同模型的推理示例,请访问[MindSpore Transformers 已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/introduction/models.html)。 \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/guide/pre_training.md b/docs/mindformers/docs/source_zh_cn/guide/pre_training.md index adb1d101e0..bd1b28f541 
100644 --- a/docs/mindformers/docs/source_zh_cn/guide/pre_training.md +++ b/docs/mindformers/docs/source_zh_cn/guide/pre_training.md @@ -12,23 +12,23 @@ ### 1. 数据集准备 -MindSpore Transformers 预训练阶段当前已支持[Megatron 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#megatron%E6%95%B0%E6%8D%AE%E9%9B%86)和[MindRecord格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#mindrecord%E6%95%B0%E6%8D%AE%E9%9B%86)的数据集。用户可根据任务需求完成数据准备。 +MindSpore Transformers 预训练阶段当前已支持[Megatron 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#megatron%E6%95%B0%E6%8D%AE%E9%9B%86)和[MindRecord格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#mindrecord%E6%95%B0%E6%8D%AE%E9%9B%86)的数据集。用户可根据任务需求完成数据准备。 ### 2. 配置文件准备 -预训练任务通过[配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)统一控制,用户可灵活调整[模型训练超参数](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/training_hyperparameters.html)。另外可以通过[分布式并行训练](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html)、[内存优化特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/memory_optimization.html)以及[其它训练特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/other_training_features.html)对预训练性能进行调优。 +预训练任务通过[配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html)统一控制,用户可灵活调整[模型训练超参数](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/training_hyperparameters.html)。另外可以通过[分布式并行训练](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/parallel_training.html)、[内存优化特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/memory_optimization.html)以及[其它训练特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/other_training_features.html)对预训练性能进行调优。 ### 3. 启动训练任务 -MindSpore Transformers 提供[一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/start_tasks.html)启动预训练任务。训练过程中可结合[日志](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/logging.html)与[可视化工具](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/monitor.html)监控训练情况。 +MindSpore Transformers 提供[一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/start_tasks.html)启动预训练任务。训练过程中可结合[日志](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/logging.html)与[可视化工具](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/monitor.html)监控训练情况。 ### 4. 模型保存 -在中间保存检查点或训练完成后,模型权重将保存至指定路径。当前支持保存为[Ckpt 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)或[Safetensors 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html),后续可以使用保存的权重进行续训或微调等。 +在中间保存检查点或训练完成后,模型权重将保存至指定路径。当前支持保存为[Ckpt 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)或[Safetensors 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html),后续可以使用保存的权重进行续训或微调等。 ### 5. 
故障恢复 -为应对训练中断等异常情况,MindSpore Transformers 具备临终保存、自动恢复等[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/high_availability.html),并支持[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html),提升训练稳定性。 +为应对训练中断等异常情况,MindSpore Transformers 具备临终保存、自动恢复等[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/high_availability.html),并支持[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/resume_training.html),提升训练稳定性。 ## 基于 MindSpore Transformers 的预训练实践 @@ -44,7 +44,7 @@ MindSpore Transformers 目前已经支持加载 Megatron 数据集,该数据 ### 数据预处理 -数据集处理可参考[Megatron数据集-数据预处理](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#%E6%95%B0%E6%8D%AE%E9%A2%84%E5%A4%84%E7%90%86) +数据集处理可参考[Megatron数据集-数据预处理](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#%E6%95%B0%E6%8D%AE%E9%A2%84%E5%A4%84%E7%90%86) - 生成Megatron BIN格式文件 @@ -78,11 +78,11 @@ MindSpore Transformers 目前已经支持加载 Megatron 数据集,该数据 ### 单机训练 -通过指定模型路径和配置文件[pretrain_deepseek3_671b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml)以msrun的方式启动[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py)脚本,进行8卡分布式训练。 +通过指定模型路径和配置文件[pretrain_deepseek3_671b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml)以msrun的方式启动[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py)脚本,进行8卡分布式训练。 -默认配置中的模型层数、隐藏维度等参数较大,适用于多机大规模分布式训练,无法直接在单机环境启动预训练,需要参考[DeepSeek-V3-修改配置](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/README.md#%E4%BF%AE%E6%94%B9%E9%85%8D%E7%BD%AE)修改配置文件。 +默认配置中的模型层数、隐藏维度等参数较大,适用于多机大规模分布式训练,无法直接在单机环境启动预训练,需要参考[DeepSeek-V3-修改配置](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/README.md#%E4%BF%AE%E6%94%B9%E9%85%8D%E7%BD%AE)修改配置文件。 -启动详细介绍详见[拉起任务](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/README.md#%E6%8B%89%E8%B5%B7%E4%BB%BB%E5%8A%A1),启动命令如下: +启动详细介绍详见[拉起任务](https://gitee.com/mindspore/mindformers/blob/r1.6.0/research/deepseek3/README.md#%E6%8B%89%E8%B5%B7%E4%BB%BB%E5%8A%A1),启动命令如下: ```shell cd $MINDFORMERS_HOME @@ -116,8 +116,8 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ > 此处样例代码假设主节点为`192.168.1.1`、当前Rank序号为`0`。实际执行时请将`master_ip`设置为实际的主节点IP地址;将`node_rank`设置为当前节点的Rank序号。 -**注意**: 在多机分布式训练的过程中,可能会遇到一些性能问题。为了确保训练过程的高效性和稳定性,建议参考[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html),进行必要的性能优化和调整。 +**注意**: 在多机分布式训练的过程中,可能会遇到一些性能问题。为了确保训练过程的高效性和稳定性,建议参考[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/advanced_development/performance_optimization.html),进行必要的性能优化和调整。 ## 更多信息 -更多关于不同模型的训练示例,请访问[MindSpore Transformers已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)。 \ No newline at end of file +更多关于不同模型的训练示例,请访问[MindSpore Transformers已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/introduction/models.html)。 \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md b/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md index 23872ea499..6184d32fe2 100644 --- a/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md +++ b/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md @@ -14,27 +14,27 @@ MindSpore Transformers支持全参微调和LoRA高效微调两种SFT微调方式 ### 1. 
权重准备 -在微调之前,需要准备好预训练模型的权重文件。MindSpore Transformers提供加载 [safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html)的能力,支持直接加载从 Hugging Face模型库中下载的模型权重。 +在微调之前,需要准备好预训练模型的权重文件。MindSpore Transformers提供加载 [safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html)的能力,支持直接加载从 Hugging Face模型库中下载的模型权重。 ### 2. 数据集准备 -MindSpore Transformers微调阶段当前已支持[Hugging Face格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#huggingface%E6%95%B0%E6%8D%AE%E9%9B%86)以及[MindRecord格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#mindrecord%E6%95%B0%E6%8D%AE%E9%9B%86)的数据集。用户可根据任务需求完成数据准备。 +MindSpore Transformers微调阶段当前已支持[Hugging Face格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#huggingface%E6%95%B0%E6%8D%AE%E9%9B%86)以及[MindRecord格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#mindrecord%E6%95%B0%E6%8D%AE%E9%9B%86)的数据集。用户可根据任务需求完成数据准备。 ### 3. 配置文件准备 -微调任务通过[配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)统一控制,用户可灵活调整[模型训练超参数](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/training_hyperparameters.html)。另外可以通过[分布式并行训练](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html)、[内存优化特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/memory_optimization.html)以及[其它训练特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/other_training_features.html)对微调性能进行调优。 +微调任务通过[配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/configuration.html)统一控制,用户可灵活调整[模型训练超参数](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/training_hyperparameters.html)。另外可以通过[分布式并行训练](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/parallel_training.html)、[内存优化特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/memory_optimization.html)以及[其它训练特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/other_training_features.html)对微调性能进行调优。 ### 4. 启动训练任务 -MindSpore Transformers提供[一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/start_tasks.html)启动微调任务。训练过程中可结合[日志](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/logging.html)与[可视化工具](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/monitor.html)监控训练情况。 +MindSpore Transformers提供[一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/start_tasks.html)启动微调任务。训练过程中可结合[日志](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/logging.html)与[可视化工具](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/monitor.html)监控训练情况。 ### 5. 模型保存 -训练过程中保存检查点或训练完成后,模型权重将保存至指定路径。当前支持保存为[Safetensors 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html)或[Ckpt 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html),后续可以使用保存的权重进行续训或微调等。 +训练过程中保存检查点或训练完成后,模型权重将保存至指定路径。当前支持保存为[Safetensors 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html)或[Ckpt 格式](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html),后续可以使用保存的权重进行续训或微调等。 ### 6. 
故障恢复 -为应对训练中断等异常情况,MindSpore Transformers具备临终保存、自动恢复等[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/high_availability.html),并支持[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html),提升训练稳定性。 +为应对训练中断等异常情况,MindSpore Transformers具备临终保存、自动恢复等[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/high_availability.html),并支持[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/resume_training.html),提升训练稳定性。 ## 使用MindSpore Transformers进行全参微调 @@ -44,7 +44,7 @@ MindSpore Transformers目前已经支持业界主流大模型,该实践流程 ### 下载模型权重 -MindSpore Transformers提供加载Hugging Face模型权重的能力,支持直接加载从Hugging Face模型库中下载的模型权重。详细信息可以参考[MindSpore Transformers-Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html)。 +MindSpore Transformers提供加载Hugging Face模型权重的能力,支持直接加载从Hugging Face模型库中下载的模型权重。详细信息可以参考[MindSpore Transformers-Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/safetensors.html)。 | 模型名称 | Hugging Face权重下载链接 | | :--------- | :--------------------------------------------: | @@ -52,7 +52,7 @@ MindSpore Transformers提供加载Hugging Face模型权重的能力,支持直 ### 数据集准备 -MindSpore Transformers提供在线加载Hugging Face数据集的能力,详细信息可以参考[MindSpore Transformers-数据集-Hugging Face数据集](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#huggingface%E6%95%B0%E6%8D%AE%E9%9B%86)。 +MindSpore Transformers提供在线加载Hugging Face数据集的能力,详细信息可以参考[MindSpore Transformers-数据集-Hugging Face数据集](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html#huggingface%E6%95%B0%E6%8D%AE%E9%9B%86)。 本实践流程以[llm-wizard/alpaca-gpt4-data](https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data)作为微调数据集为例。 @@ -146,7 +146,7 @@ run_mode: 运行模式,train:训练,finetune:微调,predict #### 多机训练 -多机多卡微调任务与启动预训练类似,可参考[多机多卡的预训练命令](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/pre_training.html#%E5%A4%9A%E6%9C%BA%E8%AE%AD%E7%BB%83)。 +多机多卡微调任务与启动预训练类似,可参考[多机多卡的预训练命令](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/pre_training.html#%E5%A4%9A%E6%9C%BA%E8%AE%AD%E7%BB%83)。 首先对配置文件进行修改,这里需要针对不同的机器数量进行设置: diff --git a/docs/mindformers/docs/source_zh_cn/index.rst b/docs/mindformers/docs/source_zh_cn/index.rst index 0b695efd24..35f88e336b 100644 --- a/docs/mindformers/docs/source_zh_cn/index.rst +++ b/docs/mindformers/docs/source_zh_cn/index.rst @@ -11,7 +11,7 @@ MindSpore Transformers套件基于MindSpore内置的多维混合并行技术和 - 支持任务组件配置化开发。任意模块可通过统一配置进行使能,包括模型网络、优化器、学习率策略等; - 提供训练精度/性能监控指标实时可视化能力等。 -用户可以参阅 `整体架构 `_ 和 `模型库 `_ ,快速了解MindSpore Transformers的系统架构,以及所支持的大模型清单。 +用户可以参阅 `整体架构 `_ 和 `模型库 `_ ,快速了解MindSpore Transformers的系统架构,以及所支持的大模型清单。 如果您对MindSpore Transformers有任何建议,请通过 `issue `_ 与我们联系,我们将及时处理。 @@ -37,18 +37,18 @@ MindSpore Transformers提供了统一的一键启动脚本,支持一键启动 @@ -64,75 +64,75 @@ MindSpore Transformers功能特性说明 - 通用功能: - - `启动任务 `_ + - `启动任务 `_ 单卡、单机和多机任务一键启动。 - - `Ckpt权重 `_ + - `Ckpt权重 `_ 支持ckpt格式的权重文件转换及切分功能。 - - `Safetensors权重 `_ + - `Safetensors权重 `_ 支持safetensors格式的权重文件保存及加载功能。 - - `配置文件 `_ + - `配置文件 `_ 支持使用 `YAML` 文件集中管理和调整任务中的可配置项。 - - `加载Hugging Face模型配置 `_ + - `加载Hugging Face模型配置 `_ 支持加载Hugging Face社区模型配置即插即用,无缝对接。 - - `日志 `_ + - `日志 `_ 日志相关介绍,包括日志结构、日志保存等。 - - `使用Tokenizer `_ + - `使用Tokenizer `_ Tokenizer相关介绍,支持在Hugging Face Tokenizer在推理、数据集中使用。 - 训练功能: - - `数据集 `_ + - `数据集 `_ 支持多种类型和格式的数据集。 - - `训练超参数 `_ + - `训练超参数 `_ 灵活配置大模型训练的超参数配置。 - - `训练指标监控 `_ + - `训练指标监控 `_ 提供大模型训练阶段的可视化服务,用于监控和分析训练过程中的各种指标和信息。 - - `断点续训 `_ + - `断点续训 `_ 支持step级断点续训,有效减少大规模训练时意外中断造成的时间和资源浪费。 - - `训练高可用(Beta) `_ + - `训练高可用(Beta) `_ 
提供大模型训练阶段的高可用能力,包括临终 CKPT 保存、UCE 故障容错恢复和进程级重调度恢复功能(Beta特性)。 - - `分布式训练 `_ + - `分布式训练 `_ 一键配置多维混合分布式并行,让模型在上至万卡的集群中高效训练。 - - `训练内存优化 `_ + - `训练内存优化 `_ 支持细粒度选择重计算和细粒度激活值SWAP,用于降低模型训练的峰值内存开销。 - - `其它训练特性 `_ + - `其它训练特性 `_ 支持梯度累积、梯度裁剪等特性。 - 推理功能 - - `评测 `_ + - `评测 `_ 支持使用第三方开源评测框架和数据集进行大模型榜单评测。 - - `量化 `_ + - `量化 `_ 集成 MindSpore Golden Stick 工具组件,提供统一量化推理流程开箱即用。 @@ -141,33 +141,33 @@ MindSpore Transformers功能特性说明 - 调试调优 - - `精度调优 `_ - - `性能调优 `_ + - `精度调优 `_ + - `性能调优 `_ - 模型开发 - - `开发迁移 `_ + - `开发迁移 `_ - 精度对比 - - `与 Megatron-LM 比对训练精度 `_ + - `与 Megatron-LM 比对训练精度 `_ 环境变量 ------------------------------------ -- `环境变量说明 `_ +- `环境变量说明 `_ 贡献指南 ------------------------------------ -- `MindSpore Transformers贡献指南 `_ -- `魔乐社区贡献指南 `_ +- `MindSpore Transformers贡献指南 `_ +- `魔乐社区贡献指南 `_ FAQ ------------------------------------ -- `模型相关 `_ -- `功能相关 `_ +- `模型相关 `_ +- `功能相关 `_ .. toctree:: :glob: diff --git a/docs/mindformers/docs/source_zh_cn/installation.md b/docs/mindformers/docs/source_zh_cn/installation.md index 0337af3984..03457e7448 100644 --- a/docs/mindformers/docs/source_zh_cn/installation.md +++ b/docs/mindformers/docs/source_zh_cn/installation.md @@ -23,7 +23,7 @@ ## 安装依赖软件 -1. 安装固件与驱动:通过[版本匹配关系](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html#%E7%A1%AE%E8%AE%A4%E7%89%88%E6%9C%AC%E5%8C%B9%E9%85%8D%E5%85%B3%E7%B3%BB)中的固件与驱动链接下载安装包,参考[昇腾官方教程](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Ubuntu&Software=cannToolKit)进行安装。 +1. 安装固件与驱动:通过[版本匹配关系](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/installation.html#%E7%A1%AE%E8%AE%A4%E7%89%88%E6%9C%AC%E5%8C%B9%E9%85%8D%E5%85%B3%E7%B3%BB)中的固件与驱动链接下载安装包,参考[昇腾官方教程](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Ubuntu&Software=cannToolKit)进行安装。 2. 
安装CANN和MindSpore:按照MindSpore官网的[手动安装](https://www.mindspore.cn/install/)章节进行安装。 diff --git a/docs/mindformers/docs/source_zh_cn/introduction/models.md b/docs/mindformers/docs/source_zh_cn/introduction/models.md index 7f7d97b177..9cbcf79569 100644 --- a/docs/mindformers/docs/source_zh_cn/introduction/models.md +++ b/docs/mindformers/docs/source_zh_cn/introduction/models.md @@ -6,11 +6,11 @@ | 模型名 | 支持规格 | 模型类型 | 最新支持版本 | |:--------------------------------------------------------------------------------------------------------|:------------------------------|:--------:|:----------:| -| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本、1.5.0 | -| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本、1.5.0 | -| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本、1.5.0 | -| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | 稠密LLM | 在研版本、1.5.0 | -| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/dev/research/telechat2) | 7B/35B/115B | 稠密LLM | 在研版本、1.5.0 | +| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/deepseek3) | 671B | 稀疏LLM | 在研版本、1.5.0 | +| [GLM4](https://gitee.com/mindspore/mindformers/blob/r1.6.0/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本、1.5.0 | +| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本、1.5.0 | +| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | 稠密LLM | 在研版本、1.5.0 | +| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/r1.6.0/research/telechat2) | 7B/35B/115B | 稠密LLM | 在研版本、1.5.0 | | [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | 稠密LLM | 1.5.0 | | [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 | | [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 | diff --git a/docs/mindformers/docs/source_zh_cn/introduction/overview.md b/docs/mindformers/docs/source_zh_cn/introduction/overview.md index f756896a2f..4d85126207 100644 --- a/docs/mindformers/docs/source_zh_cn/introduction/overview.md +++ b/docs/mindformers/docs/source_zh_cn/introduction/overview.md @@ -7,9 +7,9 @@ MindSpore Transformers与昇思MindSpore、昇腾Ascend的端到端AI软硬件 1. 在硬件层面,MindSpore Transformers支持用户在Ascend服务器上运行大模型; 2. 在软件层面,MindSpore Transformers通过MindSpore提供的Python接口实现大模型相关代码,并由昇腾AI处理器配套软件包提供的算子库进行数据运算; 3. MindSpore Transformers目前支持的基础功能特性如下: - 1. 支持大模型[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html)运行训练和推理等任务,并行能力包括数据并行、模型并行、超长序列并行等; - 2. 支持[模型权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)、[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)、不同格式[数据集加载](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html)以及[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html)等功能; - 3. 
支持25+大模型[预训练](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/pre_training.html)、[微调](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/supervised_fine_tuning.html)、[推理](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/inference.html)和[评测](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/evaluation.html)等功能,同时支持对模型参数进行[量化](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/quantization.html),具体支持模型列表可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html); -4. MindSpore Transformers支持用户通过[MindIE](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/deployment.html)进行模型服务化部署功能,同时支持使用[MindX](https://www.hiascend.com/software/mindx-dl)实现大规模集群调度;后续将支持更多第三方平台,敬请期待。 + 1. 支持大模型[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/parallel_training.html)运行训练和推理等任务,并行能力包括数据并行、模型并行、超长序列并行等; + 2. 支持[模型权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)、[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)、不同格式[数据集加载](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/dataset.html)以及[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/resume_training.html)等功能; + 3. 支持25+大模型[预训练](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/pre_training.html)、[微调](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/supervised_fine_tuning.html)、[推理](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/inference.html)和[评测](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/evaluation.html)等功能,同时支持对模型参数进行[量化](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/quantization.html),具体支持模型列表可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/introduction/models.html); +4. MindSpore Transformers支持用户通过[MindIE](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/guide/deployment.html)进行模型服务化部署功能,同时支持使用[MindX](https://www.hiascend.com/software/mindx-dl)实现大规模集群调度;后续将支持更多第三方平台,敬请期待。 ![/overall_architecture](./images/overall_architecture.png) diff --git a/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md b/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md index fe5ba5d4b2..3d883b9bf6 100644 --- a/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md +++ b/tutorials/source_en/model_infer/ms_infer/llm_inference_overview.md @@ -5,6 +5,7 @@ ## Background At the end of 2022, with the release of OpenAI's ChatGPT, a new research direction emerged in the AI domain, that is, large language models based on the transformer structure. These models exhibited capabilities beyond expectations and achieved impressive results in various tests, quickly becoming the research focus of AI. + One significant research direction in large language models is improving their cost-effectiveness in practical applications. - A large language model usually has tens of billions of parameters, and the computation workload for a single model inference process is extremely high and requires massive compute resources. As a result, AI service providers find that the cost of a large language model inference is very high and cannot be effectively applied to real-world scenarios. @@ -103,7 +104,7 @@ pip install mindspore pip install mindformers ``` -You can also install the Python package that adapts to your environment by referring to the official installation document. 
For details, see [MindSpore Installation](https://www.mindspore.cn/install/en) and [MindFormers Installation](https://www.mindspore.cn/mindformers/docs/en/dev/installation.html). +You can also install the Python package that adapts to your environment by referring to the official installation document. For details, see [MindSpore Installation](https://www.mindspore.cn/install/en) and [MindFormers Installation](https://www.mindspore.cn/mindformers/docs/en/r1.6.0/installation.html). If you wish to use model quantization to enhance inference performance, you need to install the mindspore_gs package. For details, see [Installing MindSpore Golden Stick](https://www.mindspore.cn/golden_stick/docs/en/master/install.html). @@ -124,7 +125,7 @@ After downloading, you will need to convert the Hugging Face weight format to Mi python convert_weight.py --torch_ckpt_path "/path/to/huggingface_ckpt/" --mindspore_ckpt_path "/path/to/mindspore_ckpt" ``` -You can obtain the conversion script from [convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py). +You can obtain the conversion script from [convert_weight.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/convert_weight.py). For details, see [Large Language Model Weights Obtaining and Preparation](./weight_prepare.md). @@ -145,7 +146,7 @@ config = "/path/to/llama2_7b.yaml" model = AutoModel.from_config(config) ``` -In this code, tokenizer.model is a file downloaded along with the weights from the Hugging Face official website, containing the token mapping table, while config is the model configuration file from MindFormers, which includes the relevant parameters for running the Llama2 model. You can obtain the sample from [predict_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_7b.yaml) (Note: Change the CKPT weight path to the actual weight path). For details, see [Llama 2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#-18). +In this code, tokenizer.model is a tokenizer file downloaded along with the weights from the Hugging Face official website, containing the token mapping table, while config is the model configuration file from MindFormers, which includes the relevant parameters for running the Llama2 model. You can obtain the sample from [predict_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.5.0/configs/llama2/predict_llama2_7b.yaml) (Note: Change the CKPT weight path to the actual weight path). For details, see [Llama 2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md#-18). In addition, if you have special requirements for the model or have a deep understanding of deep learning, you can build your own model. For details, see [Model Development](./model_dev.md). @@ -203,17 +204,17 @@ Once the model is constructed, you can utilize the model object for text generat Note: Each inference step involves postprocessing, specifically selecting generated tokens from the token probability distribution. The simplest way to obtain the highest probability token is by using argmax. The MindFormers model incorporates this processing into the generate API. If you build a large language model yourself, you will need to implement this part separately. -In addition to utilizing the capabilities provided by the MindFormers model suite, you can also build your own preprocessing and postprocessing. 
Given the complexity of the logic, you may refer to the relevant implementations in MindFormers. For details, see [llama_tokenzier.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/llama_tokenizer.py) and [text_generator.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/generation/text_generator.py). +In addition to utilizing the capabilities provided by the MindFormers model suite, you can also build your own preprocessing and postprocessing. Given the complexity of the logic, you may refer to the relevant implementations in MindFormers. For details, see [llama_tokenzier.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/llama_tokenizer.py) and [text_generator.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/generation/text_generator.py). ### Model Parallelism For large language models with many model parameters, such as Llama2-70B and Qwen2-72B, the parameter scale usually exceeds the memory capacity of a GPU or NPU. Therefore, multi-device parallel inference is required. MindSpore large language model inference can shard the original large language model into N parallel models so that they can be executed on multiple devices in parallel. This not only enables inference for super-large models but also enhances performance by leveraging more resources from the multiple devices. The model scripts provided by the MindFormers model suite can be used to shard a model into multi-device models for execution. You can perform the following steps to deploy the model on multiple devices. -- **Weight sharding**: Because the original weight files are too large, when executing on multiple devices, the overall weight needs to be sharded into multiple weights for each device and passed to the model process corresponding to each device. You can use the script in the MindFormers model suite to perform weight sharding. For details, see [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/ckpt.html). +1. **Weight sharding**: Because the original weight files are too large, when executing on multiple devices, the overall weight needs to be sharded into multiple weights for each device and passed to the model process corresponding to each device. You can use the script in the MindFormers model suite to perform weight sharding. For details, see [Weight Conversion](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/feature_cards/Transform_Ckpt.md). Here is an example of how to shard the Llama2-7B model for parallel execution on two devices. - - **Generating a target parallel strategy file** When MindSpore performs sharding, you need to specify the sharding mode. The information is stored in the parallel strategy file and can be generated using the [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py) script. Open the YAML file corresponding to the Llama2-7B model and modify the following configuration: + 1. **Generating a target parallel strategy file** When MindSpore performs sharding, you need to specify the sharding mode. The information is stored in the parallel strategy file and can be generated using the [run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.5.0/run_mindformer.py) script. Open the YAML file corresponding to the Llama2-7B model and modify the following configuration: - Set only_save_strategy to True, indicating that generating parallel sharding strategy files is enabled. 
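The YAML changes described in this step can also be applied with a short script. The following is a minimal sketch that assumes PyYAML is available and that the configuration uses the standard MindFormers keys (`only_save_strategy`, `parallel_config`); the config path is a placeholder:

```python
# Sketch: toggle the strategy-generation switch and the parallel layout in the
# Llama2-7B YAML before running run_mindformer.py. Assumes PyYAML is installed;
# the path below is a placeholder.
import yaml

config_path = "/path/to/llama2_7b.yaml"

with open(config_path, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

cfg["only_save_strategy"] = True              # only generate the parallel sharding strategy files
cfg["parallel_config"]["data_parallel"] = 1   # data parallelism is usually 1 for inference
cfg["parallel_config"]["model_parallel"] = 2  # shard the model across 2 devices

with open(config_path, "w", encoding="utf-8") as f:
    yaml.dump(cfg, f, default_flow_style=False, allow_unicode=True)
```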
@@ -229,7 +230,7 @@ For large language models with many model parameters, such as Llama2-70B and Qwe msrun is a parallel execution tool provided by MindSpore. The input_data parameter can accept any content to ensure that the model process can be executed properly. After the program is executed, the strategy directory is generated in the output directory, that is, the parallel sharding strategy file for two-device parallel inference. - - **Sharding model weight CKPT file**: Call the conversion script to shard and generate the weight CKPT files. For details, see [transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py). + 2. **Sharding model weight CKPT file**: Call the conversion script to shard and generate the weight CKPT files. For details, see [transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/r1.5.0/mindformers/tools/ckpt_transform/transform_checkpoint.py). Run the following command to shard the weight into two-device parallel weights: @@ -240,7 +241,7 @@ For large language models with many model parameters, such as Llama2-70B and Qwe Here, src_checkpoint is the path to the source CKPT file. In the example, full sharding is used. Therefore, the source strategy file does not need to be passed. However, the path must point to the CKPT file, not to a directory. dst_checkpoint is the target directory of the sharding result. After the sharding is complete, two subdirectories rank_0 and rank_1 are generated to store the weight CKPT files of different devices. dst_strategy is the path of the strategy file generated in the previous step. -- **Model adaptation**: When the MindSpore large language model is running on multiple devices, model parallelism is usually used. Therefore, the original model needs to be sharded based on the number of devices. For example, the matrix multiplication of [1024, 4096] and [4096, 2048] can be sharded into two matrix multiplications of [1024, 4096] and [4096, 1024] respectively. Different sharding may bring different parallel computing performance. The MindFormers model provides proven excellent sharding solution for MindSpore model and uses the MindSpore parallel framework for sharding. The following is part of the sharding code in the model: +2. **Model adaptation**: When the MindSpore large language model is running on multiple devices, model parallelism is usually used. Therefore, the original model needs to be sharded based on the number of devices. For example, the matrix multiplication of [1024, 4096] and [4096, 2048] can be sharded into two matrix multiplications of [1024, 4096] and [4096, 1024] respectively. Different sharding may bring different parallel computing performance. The MindFormers model provides proven excellent sharding solution for MindSpore model and uses the MindSpore parallel framework for sharding. The following is part of the sharding code in the model: ```python if not (_get_parallel_mode() in (ParallelMode.AUTO_PARALLEL,) and _is_sharding_propagation()): @@ -275,9 +276,9 @@ For large language models with many model parameters, such as Llama2-70B and Qwe - Change parallel_config.model_parallel to the required number of parallel devices. data_parallel is usually set to 1 in the inference scenario. No additional configuration is required. - For details about the network script code, see [llama.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/llama.py). 
+ For details about the network script code, see [llama.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/llama.py). -- **Model inference**: Different from single-device inference, multi-device inference requires multiple processes to be started at the same time for parallel inference. Therefore, compared with directly running a script, multi-device inference requires multiple groups of related processes to be run at a time. The MindSpore framework provides the msrun parallel running tool. The usage method is as follows. +3. **Model inference**: Different from single-device inference, multi-device inference requires multiple processes to be started at the same time for parallel inference. Therefore, compared with directly running a script, multi-device inference requires multiple groups of related processes to be run at a time. The MindSpore framework provides the msrun parallel running tool. The usage method is as follows. ```shell msrun --worker_num=2 --local_worker_num=2 run_mindformer.py --config "/path/to/llama2_7b.yaml" --input_data "hello" diff --git a/tutorials/source_en/model_infer/ms_infer/model_dev.md b/tutorials/source_en/model_infer/ms_infer/model_dev.md index e4084ad88c..16d8793560 100644 --- a/tutorials/source_en/model_infer/ms_infer/model_dev.md +++ b/tutorials/source_en/model_infer/ms_infer/model_dev.md @@ -4,7 +4,7 @@ ## Large Language Model Backbone Network -Currently, the backbone networks of mainstream large language models are mainly based on the transformer structure. The most important part is the computation of the self-attention mechanism. The following figure uses the Llama2 large language model as an example to describe the backbone network structure. +Currently, the backbone networks of mainstream large language models are mainly based on the Transformer structure. The most important part is the computation of the self-attention mechanism. The following figure uses the Llama2 large language model as an example to describe the backbone network structure. ![LLAMA network structure](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.0rc1/tutorials/source_zh_cn/model_infer/ms_infer/images/llm_llama_network_arch.png) @@ -12,15 +12,15 @@ The core layer of Llama2 consists of the following parts: - **Embedding**: converts the index corresponding to each token into a vector to implement feature dispersion. Similar to onehot vectorization, the embedding weight is involved in the training process to better adapt to the context semantics in the language model. The process is implemented by an Embedding operator. -- **DecodeLayer**: transformer structure, which is the key for the computation of the large language model. Generally, multi-layer computation is performed based on different configurations. Each layer is actually a transformer structure, which is one of the cores of the foundation language model. +- **DecodeLayer**: Transformer structure, which is the key for the computation of the large language model. Generally, multi-layer computation is performed based on different configurations. Each layer is actually a Transformer structure, which is one of the cores of the foundation language model. -- **RmsNorm&Linear**: outputs linear normalization layer. After computation of the transformer structure, the result is normalized to the same dimension as the model vocabulary, and the probability distribution of each token is returned. +- **RmsNorm&Linear**: outputs linear normalization layer. 
After computation of the Transformer structure, the result is normalized to the same dimension as the model vocabulary, and the probability distribution of each token is returned. -To build a network with MindSpore large language model inference, you can assemble the operators provided by MindSpore. The following is an example to describe how to build a typical transformer model. +To build a network with MindSpore large language model inference, you can assemble the operators provided by MindSpore. The following is an example to describe how to build a typical Transformer model. ## TransformerModel -In a typical transformer model, each layer consists of the normalization, attention, residual connection, and multi-layer perception (MLP). Both attention and MLP meet the requirements of two continuous matrix multiplications. +In a typical Transformer model, each layer consists of the normalization, attention, residual connection, and multi-layer perception (MLP). Both attention and MLP meet the requirements of two continuous matrix multiplications. 1. Attention diff --git a/tutorials/source_en/model_infer/ms_infer/parallel.md b/tutorials/source_en/model_infer/ms_infer/parallel.md index ce1e15c9d4..681d4b5a61 100644 --- a/tutorials/source_en/model_infer/ms_infer/parallel.md +++ b/tutorials/source_en/model_infer/ms_infer/parallel.md @@ -2,8 +2,9 @@ [![](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.0rc1/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r2.7.0rc1/tutorials/source_en/model_infer/ms_infer/parallel.md) -In recent years, with the rapid development of deep learning technologies, especially the emergence of large-scale pre-trained models (such as ChatGPT, LLaMA, and Pangu), the AI field has made significant progress. However, as model sizes continue to expand, the computing resources required by these large models, particularly GPU memory, are growing exponentially. For example, the Pangu model with 71 billion parameters requires approximately 142 GB of GPU memory at half-precision (FP16). In addition, the increasing sequence length of large models places immense pressure on GPU memory. -The constraints of GPU memory not only affect model loading but also limit batch sizes. Smaller batch sizes may lead to decreased inference efficiency, consequently impacting the overall throughput of the system. +In recent years, with the rapid development of deep learning technologies, especially the emergence of large-scale pre-trained models (such as ChatGPT, LLaMA, and Pangu), the AI field has made significant progress. However, as model sizes continue to expand, the computing resources required by these large models, particularly GPU memory, are growing exponentially. For example, the Pangu model with 71 billion parameters requires approximately 142 GB of GPU memory at half-precision (FP16). + +In addition, the increasing sequence length of large models places immense pressure on GPU memory. The constraints of GPU memory not only affect model loading but also limit batch sizes. Smaller batch sizes may lead to decreased inference efficiency, consequently impacting the overall throughput of the system. The pressure on GPU memory makes it challenging for a single device to complete inference tasks within a reasonable time frame, and parallel computing has become a key strategy to address this challenge. 
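To put the memory figures above into perspective, a rough estimate only needs the parameter count and the bytes per parameter. The sketch below counts the weights alone (the KV cache and activations add further overhead) and shows how sharding across devices lowers the per-device requirement:

```python
# Back-of-the-envelope weight-memory estimate for half-precision (FP16) inference.
# Only the model weights are counted; the KV cache and activations need extra memory.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2, num_devices: int = 1) -> float:
    """Approximate per-device weight memory in GB."""
    return num_params * bytes_per_param / num_devices / 1e9

print(f"71B parameters, single device : {weight_memory_gb(71e9):.0f} GB")                 # ~142 GB
print(f"71B parameters, 8-way parallel: {weight_memory_gb(71e9, num_devices=8):.0f} GB")  # ~18 GB per device
```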
@@ -80,120 +81,121 @@ Starting with the original implementation of [nn.Dense](https://www.mindspore.cn `ColumnParallelLinear` class calculates and initializes the sharded weights' shape based on the number of devices used for model parallelism. Column-wise means to shard `out_channels`. During the model's forward pass, MatMul is called to compute the parallel results. Finally, an `AllGather` operation can be optionally performed on the parallel results to obtain the complete output. The MindSpore training and inference integrated framework supports enabling `infer_boost`. This parameter activates the high-performance self-developed operator library within the MindSpore framework. To enable this mode, you need to: - - Set variables. - - ```python - from mindspore import set_context - set_context(jit_config={"jit_level": 'O0', "infer_boost": 'on'}) - ``` - - - Set system environment variables. - - ```bash - export ASCEND_HOME_PATH={$ascend_custom_path} - ``` - - For example, if there are 2 devices for model parallelism, set environment variables, initialize the communication group, and configure the model parameter `config` as follows: - - ```python - from mindspore import nn, Parameter, ops, Tensor - from mindspore.common import dtype as mstype - from mindspore.communication import init - from mindspore.common.initializer import initializer - import numpy as np - - from mindspore import set_context - set_context(jit_config={"jit_level": 'O0', "infer_boost": 'on'}) - - TP_GROUP_NAME='tp' - TP_SIZE = 2 - COMMUN_HELPER = CommunicationHelper(group_name=TP_GROUP_NAME, size=TP_SIZE) - - init() - COMMUN_HELPER.create_tensor_model_parallel_group() - - config = ConfigHelper(batch_size=64, - vocab_size=32000, - num_layers=4, - seq_length=2048, - hidden_size=1024, - ffn_hidden_size=4096, - dtype=mstype.float16, - num_heads=8, - has_bias=False) - ``` - - Column-wise MatMul module - - ```python - class ColumnParallelLinear(nn.Cell): - def __init__(self, - in_channels, - out_channels, - weight_init=None, - bias_init=None, - has_bias=True, - dtype=mstype.float32): - super().__init__() - self.in_channels = in_channels - self.out_channels = out_channels - self.has_bias = has_bias - self.tensor_parallel_group_size = COMMUN_HELPER.get_tensor_model_parallel_group_size() - self.out_channels_per_partition = out_channels // self.tensor_parallel_group_size - self.dtype = dtype - weight_shape = (self.out_channels_per_partition, self.in_channels) - self.weight = Parameter(initializer(weight_init, weight_shape, self.dtype), name="weight") - if self.has_bias: - self.bias = Parameter(initializer(bias_init, (self.out_channels_per_partition), self.dtype), name="bias") - self.bias_add = ops.Add() - self.matmul = ops.BatchMatMul(transpose_b=True) - self.cast = ops.Cast() - - def construct(self, x): - origin_dtype = x.dtype - x = self.cast(x, self.dtype) - out = self.matmul(x, self.weight) - if self.has_bias: - out = self.bias_add( - out, self.cast(self.bias, self.dtype) - ) - out = self.cast(out, origin_dtype) - return out - ``` - - The output of column-wise MatMul is parallel. To obtain a complete output, use `GatherLastDim`. 
- - ```python - class GatherLastDim(nn.Cell): - def __init__(self): - super().__init__() - self.all_gather = ops.AllGather(group=COMMUN_HELPER.get_tensor_model_parallel_group()) - self.world_size = COMMUN_HELPER.get_tensor_model_parallel_group_size() - self.split = ops.Split(axis=0, output_num=self.world_size) - - def construct(self, input_): - output = self.all_gather(input_) - tensor_list = self.split(output) - output = ops.cat(tensor_list, axis=-1) - return output - ``` - - Inference of column-wise MatMul: - ```python - column_parallel_linear = ColumnParallelLinear(in_channels=config.hidden_size, - out_channels=config.hidden_size, - weight_init='normal', - dtype=config.dtype, - has_bias=False) - input_x = Tensor(np.random.randn(config.batch_size, config.seq_length, config.hidden_size).astype(np.float32)) - out_parallel = column_parallel_linear(input_x) - print(out_parallel.shape) - - gather_last_dim = GatherLastDim() - out = gather_last_dim(out_parallel) - print(out.shape) - ``` + 1. Set variables. + + ```python + from mindspore import set_context + set_context(jit_config={"jit_level": 'O0', "infer_boost": 'on'}) + ``` + + 2. Set system environment variables. + + ```bash + export ASCEND_HOME_PATH={$ascend_custom_path} + ``` + + For example, if there are 2 devices for model parallelism, set environment variables, initialize the communication group, and configure the model parameter `config` as follows: + + ```python + from mindspore import nn, Parameter, ops, Tensor + from mindspore.common import dtype as mstype + from mindspore.communication import init + from mindspore.common.initializer import initializer + import numpy as np + + from mindspore import set_context + set_context(jit_config={"jit_level": 'O0', "infer_boost": 'on'}) + + TP_GROUP_NAME='tp' + TP_SIZE = 2 + COMMUN_HELPER = CommunicationHelper(group_name=TP_GROUP_NAME, size=TP_SIZE) + + init() + COMMUN_HELPER.create_tensor_model_parallel_group() + + config = ConfigHelper(batch_size=64, + vocab_size=32000, + num_layers=4, + seq_length=2048, + hidden_size=1024, + ffn_hidden_size=4096, + dtype=mstype.float16, + num_heads=8, + has_bias=False) + ``` + + Column-wise MatMul module + + ```python + class ColumnParallelLinear(nn.Cell): + def __init__(self, + in_channels, + out_channels, + weight_init=None, + bias_init=None, + has_bias=True, + dtype=mstype.float32): + super().__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.has_bias = has_bias + self.tensor_parallel_group_size = COMMUN_HELPER.get_tensor_model_parallel_group_size() + self.out_channels_per_partition = out_channels // self.tensor_parallel_group_size + self.dtype = dtype + weight_shape = (self.out_channels_per_partition, self.in_channels) + self.weight = Parameter(initializer(weight_init, weight_shape, self.dtype), name="weight") + if self.has_bias: + self.bias = Parameter(initializer(bias_init, (self.out_channels_per_partition), self.dtype), name="bias") + self.bias_add = ops.Add() + self.matmul = ops.BatchMatMul(transpose_b=True) + self.cast = ops.Cast() + + def construct(self, x): + origin_dtype = x.dtype + x = self.cast(x, self.dtype) + out = self.matmul(x, self.weight) + if self.has_bias: + out = self.bias_add( + out, self.cast(self.bias, self.dtype) + ) + out = self.cast(out, origin_dtype) + return out + ``` + + The output of column-wise MatMul is parallel. To obtain a complete output, use `GatherLastDim`. 
+ + ```python + class GatherLastDim(nn.Cell): + def __init__(self): + super().__init__() + self.all_gather = ops.AllGather(group=COMMUN_HELPER.get_tensor_model_parallel_group()) + self.world_size = COMMUN_HELPER.get_tensor_model_parallel_group_size() + self.split = ops.Split(axis=0, output_num=self.world_size) + + def construct(self, input_): + output = self.all_gather(input_) + tensor_list = self.split(output) + output = ops.cat(tensor_list, axis=-1) + return output + ``` + + Inference of column-wise MatMul: + + ```python + column_parallel_linear = ColumnParallelLinear(in_channels=config.hidden_size, + out_channels=config.hidden_size, + weight_init='normal', + dtype=config.dtype, + has_bias=False) + input_x = Tensor(np.random.randn(config.batch_size, config.seq_length, config.hidden_size).astype(np.float32)) + out_parallel = column_parallel_linear(input_x) + print(out_parallel.shape) + + gather_last_dim = GatherLastDim() + out = gather_last_dim(out_parallel) + print(out.shape) + ``` 3. Row-wise MatMul diff --git a/tutorials/source_en/model_infer/ms_infer/quantization.md b/tutorials/source_en/model_infer/ms_infer/quantization.md index 8dea229046..73f860ae6a 100644 --- a/tutorials/source_en/model_infer/ms_infer/quantization.md +++ b/tutorials/source_en/model_infer/ms_infer/quantization.md @@ -60,7 +60,7 @@ rtn.convert(net) ms.save_checkpoint(net.parameters_dict(), './simplenet_rtn.ckpt') ``` -1. Use [nn.Cell](https://www.mindspore.cn/docs/en/r2.0/api_python/nn/mindspore.nn.Cell.html) to define the network. After training the model, obtain the floating-point weight of the model, and then load the floating-point weight during inference. The above example simplifies the process by directly creating a network and using the initial floating-point weight for quantization. +1. Use [nn.Cell](https://www.mindspore.cn/docs/en/r2.7.0rc1/api_python/nn/mindspore.nn.Cell.html) to define the network. After training the model, obtain the floating-point weight of the model, and then load the floating-point weight during inference. The above example simplifies the process by directly creating a network and using the initial floating-point weight for quantization. 2. Use PTQConfig to set mode to quantization mode, set the backend to Ascend, and perform 8-bit quantization on the weight. For details, see [PTQConfig Configuration Description](#ptqconfig-configuration-description). 3. Use the apply API to convert the network into a pseudo-quantized network and collect statistics on the quantized object based on the configuration in `PTQConfig`. 4. Use the convert API to quantize the pseudo-quantized network in the previous step to obtain the quantized network. diff --git a/tutorials/source_en/model_infer/ms_infer/weight_prepare.md b/tutorials/source_en/model_infer/ms_infer/weight_prepare.md index 0e86f9ce66..a38b082eef 100644 --- a/tutorials/source_en/model_infer/ms_infer/weight_prepare.md +++ b/tutorials/source_en/model_infer/ms_infer/weight_prepare.md @@ -46,11 +46,7 @@ In the preceding information, pytorch_model-00001-of-00002.bin and pytorch_model To convert Hugging Face weight files into MindSpore weight files, perform the following steps: -1. Load the Hugging Face weight files into a list of PyTorch tensors. -2. Convert the PyTorch tensor list into a list of MindSpore tensors. -3. Save the MindSpore tensor list as a MindSpore CKPT weight file. 
- -- **Install the Python dependency package**: Since the conversion involves both Hugging Face and MindSpore, you need to install the respective Python packages, primarily including transformers, torch, and mindspore. +1. **Install the Python dependency package**: Since the conversion involves both Hugging Face and MindSpore, you need to install the respective Python packages, primarily including transformers, torch, and mindspore. ```shell pip install torch @@ -58,7 +54,7 @@ To convert Hugging Face weight files into MindSpore weight files, perform the fo pip install transformers ``` -- **Load the Hugging Face model**: Use the transformers library to load the Llama2 weight files and model, and retrieve the list of weights which is actually a list of PyTorch tensor objects. +2. **Load the Hugging Face model**: Use the transformers library to load the Llama2 weight files and model, and retrieve the list of weights which is actually a list of PyTorch tensor objects. ```python import os @@ -74,9 +70,9 @@ To convert Hugging Face weight files into MindSpore weight files, perform the fo print(f"name: {name}") ``` -Executing this python code will load the weights of Llama2 and print out the names of each weight, indicating that the model has been successfully loaded. + Executing this python code will load the weights of Llama2 and print out the names of each weight, indicating that the model has been successfully loaded. -- **Converting torch.Tensor to mindspore.Tensor**: Use NumPy as an intermediary to convert the PyTorch tensor objects into MindSpore tensor objects. In addition to the data, the names of the MindSpore weights differ from those in Hugging Face, so a mapping relationship needs to be recorded. +3. **Converting torch.Tensor to mindspore.Tensor**: Use NumPy as an intermediary to convert the PyTorch tensor objects into MindSpore tensor objects. In addition to the data, the names of the MindSpore weights differ from those in Hugging Face, so a mapping relationship needs to be recorded. - Weight name mapping: Replace the Hugging Face weight names with the MindSpore weight names. @@ -133,7 +129,7 @@ Executing this python code will load the weights of Llama2 and print out the nam print(ckpt_list) ``` -- **Saving a MindSpore CKPT weight file**: Call the MindSpore API to save tensors as a CKPT weight file. +4. **Saving a MindSpore CKPT weight file**: Call the MindSpore API to save tensors as a CKPT weight file. ```python ms_ckpt_path="/path/to/mindspore/ckpt" diff --git a/tutorials/source_en/model_infer/ms_infer/weight_split.md b/tutorials/source_en/model_infer/ms_infer/weight_split.md index a2b5349a8e..c1a87d7dad 100644 --- a/tutorials/source_en/model_infer/ms_infer/weight_split.md +++ b/tutorials/source_en/model_infer/ms_infer/weight_split.md @@ -151,6 +151,6 @@ def save_strategy_file(state_dict, strategy_file_name): raise e ``` -After the parallel strategy file of the inference network is obtained, the training weight can be converted into the weight required for inference according to the method of executing distributed checkpoint transformation. +After the parallel strategy file of the inference network is obtained, the training weight can be converted into the weight required for inference according to the distributed checkpoint transformation method. For details about the end-to-end weight sharding code project, see [Weight Sharding](https://gitee.com/mindspore/docs/blob/r2.7.0rc1/docs/sample_code/infer_code/param_split.py). 
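The weight preparation steps above boil down to a torch -> NumPy -> MindSpore round trip. The following condensed sketch assumes the Llama2-7B Hugging Face checkpoint has already been downloaded; the paths are placeholders, and the single name-mapping entry is illustrative only; a real conversion needs the full mapping table:

```python
# Condensed sketch of the Hugging Face -> MindSpore weight conversion flow:
# torch.Tensor -> numpy -> mindspore.Tensor -> .ckpt file.
import torch
import mindspore as ms
from transformers import LlamaForCausalLM

hf_model = LlamaForCausalLM.from_pretrained("/path/to/llama-2-7b-hf")  # placeholder path

# Illustrative single entry; a real conversion maps every weight name.
name_map = {"model.embed_tokens.weight": "model.tok_embeddings.embedding_weight"}

ckpt_list = []
for hf_name, tensor in hf_model.state_dict().items():
    ms_name = name_map.get(hf_name, hf_name)
    # NumPy acts as the bridge between the two frameworks.
    ckpt_list.append({"name": ms_name, "data": ms.Tensor(tensor.to(torch.float16).numpy())})

ms.save_checkpoint(ckpt_list, "/path/to/mindspore_ckpt/llama2_7b.ckpt")  # placeholder path
```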
diff --git a/tutorials/source_en/parallel/split_technique.md b/tutorials/source_en/parallel/split_technique.md index a35bab6d78..781b4cd084 100644 --- a/tutorials/source_en/parallel/split_technique.md +++ b/tutorials/source_en/parallel/split_technique.md @@ -32,7 +32,7 @@ Users working with strategy propagation need to have some understanding not only ## Configuring Code Samples -Taking the encapsulated class [RowParallelLinear](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/experimental/graph/tensor_parallel/layers.py) in MindFormers as an example: +Taking the encapsulated class [RowParallelLinear](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/experimental/graph/tensor_parallel/layers.py) in MindFormers as an example: @@ -78,7 +78,7 @@ class RowParallelLinear(nn.Cell):
-The other example is [CoreAttention](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/experimental/graph/transformer/transformer.py). Configure it as above: +The other example is [CoreAttention](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/experimental/graph/transformer/transformer.py). Configure it as above:
@@ -117,7 +117,7 @@ class CoreAttention(nn.Cell):
-Check the example of [FlashAttention](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/modules/flash_attention.py): +Check the example of [FlashAttention](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/modules/flash_attention.py):
@@ -159,7 +159,7 @@ class FlashAttention(Cell):
-If classes that are open source and already paired with a strategy in MindFormers are used directly, the external network does not need to configure the shard strategy for the operator again, e.g., [LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/llama.py). +If classes that are open source and already paired with a strategy in MindFormers are used directly, the external network does not need to configure the shard strategy for the operator again, e.g., [LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/llama.py).
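For reference, the pattern shared by the classes above is to attach an operator-level strategy with `shard()` inside the cell. The fragment below is a minimal illustration rather than the configuration used in those classes; the data-parallel (dp) and model-parallel (mp) values are placeholders, and the strategy only takes effect in graph mode under (semi-)auto-parallel:

```python
# Minimal illustration of attaching a shard strategy to a primitive, in the same
# spirit as the MindFormers classes referenced above. dp/mp values are placeholders.
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore.common.initializer import initializer

class ShardedDense(nn.Cell):
    """Toy dense layer whose MatMul carries an explicit shard strategy."""
    def __init__(self, in_channels, out_channels, dp=2, mp=4):
        super().__init__()
        self.weight = ms.Parameter(initializer("normal", (out_channels, in_channels), ms.float16), name="weight")
        self.matmul = ops.MatMul(transpose_b=True)
        # Activations are sharded along the batch axis (dp); the weight is
        # sharded along its output axis (mp).
        self.matmul.shard(((dp, 1), (mp, 1)))

    def construct(self, x):
        return self.matmul(x, self.weight)
```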
diff --git a/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md b/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md index 96ba4bd94b..1e14cb04c5 100644 --- a/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md +++ b/tutorials/source_zh_cn/model_infer/ms_infer/llm_inference_overview.md @@ -104,7 +104,7 @@ pip install mindspore pip install mindformers ``` -同时,用户也可以参考官方安装文档来安装自己环境适配的Python包,具体见[MindSpore安装](https://www.mindspore.cn/install)和[MindFormers安装](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html)。 +同时,用户也可以参考官方安装文档来安装自己环境适配的Python包,具体见[MindSpore安装](https://www.mindspore.cn/install)和[MindFormers安装](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/installation.html)。 如果用户需要使用模型量化能力提升模型推理性能,还需要安装mindspore_gs包,具体可以参考[MindSpore GoldenStick安装](https://www.mindspore.cn/golden_stick/docs/zh-CN/master/install.html)。 @@ -125,7 +125,7 @@ git clone https://huggingface.co/daryl149/llama-2-7b-hf python convert_weight.py --torch_ckpt_path "/path/to/huggingface_ckpt/" --mindspore_ckpt_path "/path/to/mindspore_ckpt" ``` -具体转换脚本可以在[convert_weight.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/convert_weight.py)获取。 +具体转换脚本可以在[convert_weight.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/convert_weight.py)获取。 详细教程见[大语言模型权重获取和准备](./weight_prepare.md)。 @@ -146,7 +146,7 @@ config = "/path/to/llama2_7b.yaml" model = AutoModel.from_config(config) ``` -其中,tokenizer.model是从Hugging Face官网下载的权重文件中附带的tokenizer文件,里面记录了tokens的映射表;config是MindFormers的模型配置文件,其中包含了Llama2模型运行的相关参数,样例可以在[predict_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/predict_llama2_7b.yaml)获取(注意:需要将ckpt权重路径改为实际的权重路径)。更详细的教程可以在[Llama 2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#-18)获取。 +其中,tokenizer.model是从Hugging Face官网下载的权重文件中附带的tokenizer文件,里面记录了tokens的映射表;config是MindFormers的模型配置文件,其中包含了Llama2模型运行的相关参数,样例可以在[predict_llama2_7b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.6.0/configs/llama2/predict_llama2_7b.yaml)获取(注意:需要将ckpt权重路径改为实际的权重路径)。更详细的教程可以在[Llama 2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#-18)获取。 此外,如果用户对于模型有自己的特殊需求,或者对深度学习有较深认识,也可以选择自己构建模型,详细教程见[从零构建大语言模型推理网络](./model_dev.md)。 @@ -204,17 +204,17 @@ model = AutoModel.from_config(config) 注意:每轮推理实际都会有一部分后处理,即从token概率分布中选择生成的token,最简单的可以通过argmax计算获取概率最大的token,MindFormers的模型将此处理包含在了generate接口中,如果用户自己构建大语言模型,此部分需要单独实现。 -除了使用MindFormers模型套件提供的模型能力外,用户也可以自己构建前处理和后处理,由于其逻辑比较复杂,用户可以参考MindFormers的相关实现进行实现。具体见[llama_tokenzier.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/llama_tokenizer.py)和[text_generator.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/generation/text_generator.py)。 +除了使用MindFormers模型套件提供的模型能力外,用户也可以自己构建前处理和后处理,由于其逻辑比较复杂,用户可以参考MindFormers的相关实现进行实现。具体见[llama_tokenzier.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/llama_tokenizer.py)和[text_generator.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/generation/text_generator.py)。 ### 模型并行 对于模型参数比较多的大语言模型,如Llama2-70B、Qwen2-72B,由于其参数规模通常会超过一张GPU或者NPU的内存容量,因此需要采用多卡并行推理,MindSpore大语言模型推理支持将原始大语言模型切分成N份可并行的子模型,使其能够分别在多卡上并行执行,在实现超大模型推理同时,也利用多卡中更多的资源提升性能。MindFormers模型套件提供的模型脚本天然支持将模型切分成多卡模型执行,用户可以通过以下步骤在多卡上部署模型。 -1. 
**权重切分**:由于原来的权重文件太大,多卡执行时,需要将整体权重切分成每张卡上的多份权重,分别传给每张卡对应的模型进程。用户可以使用MindFormers模型套件中的脚本来进行权重切分。具体可以参考[权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/ckpt.html)。 +1. **权重切分**:由于原来的权重文件太大,多卡执行时,需要将整体权重切分成每张卡上的多份权重,分别传给每张卡对应的模型进程。用户可以使用MindFormers模型套件中的脚本来进行权重切分。具体可以参考[权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.6.0/feature/ckpt.html)。 下面以Llama2-7B大语言模型为例,简单描述一下将模型切分为2卡并行的操作: - 1. **生成目标并行策略文件**:MindSpore进行切分,需要用户告诉框架切分的方式,这个信息存储在并行切分策略文件中,可以通过[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/dev/run_mindformer.py)脚本进行生成。打开Llama2-7B模型对应的yaml文件,修改以下配置: + 1. **生成目标并行策略文件**:MindSpore进行切分,需要用户告诉框架切分的方式,这个信息存储在并行切分策略文件中,可以通过[run_mindformer.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/run_mindformer.py)脚本进行生成。打开Llama2-7B模型对应的yaml文件,修改以下配置: - 将only_save_strategy配置改为True,表示生成并行切分策略文件。 @@ -230,7 +230,7 @@ model = AutoModel.from_config(config) msrun为MindSpore提供的并行执行工具,input_data参数可以任意传入内容,传入是为了保证模型流程能够正常执行,这段程序执行完,会在output目录下生成strategy目录,即是切分成2卡执行的并行切分策略文件。 - 2. **切分模型权重ckpt文件**:调用转换脚本,切分并生成权重ckpt文件,具体参考[transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/ckpt_transform/transform_checkpoint.py)。 + 2. **切分模型权重ckpt文件**:调用转换脚本,切分并生成权重ckpt文件,具体参考[transform_checkpoint.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/tools/ckpt_transform/transform_checkpoint.py)。 执行如下命令,利用脚本将权重切分成2卡并行权重: @@ -276,7 +276,7 @@ model = AutoModel.from_config(config) - 将parallel_config.model_parallel改为需要的并行卡数,data_parallel在推理场景下通常配置为1,不需要额外配置。 - 具体的网络脚本代码可以参考[llama.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/llama.py)。 + 具体的网络脚本代码可以参考[llama.py](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/llama.py)。 3. **模型推理**:和单卡推理不同,多卡推理需要同时启动多个进程来并行进行推理,因此在启动模型推理是,相比于直接运行脚本,多卡推理需要一次运行多组相关进程。MindSpore框架为用户提供了msrun的并行运行工具,具体使用方法如下: diff --git a/tutorials/source_zh_cn/parallel/split_technique.md b/tutorials/source_zh_cn/parallel/split_technique.md index 10afd730f7..2a1bac3fcb 100644 --- a/tutorials/source_zh_cn/parallel/split_technique.md +++ b/tutorials/source_zh_cn/parallel/split_technique.md @@ -32,7 +32,7 @@ ## 配置代码样例 -以MindFormers中封装的类[RowParallelLinear](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/experimental/graph/tensor_parallel/layers.py)为例: +以MindFormers中封装的类[RowParallelLinear](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/experimental/graph/tensor_parallel/layers.py)为例: @@ -78,7 +78,7 @@ class RowParallelLinear(nn.Cell):
-另一个例子是[CoreAttention](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/experimental/graph/transformer/transformer.py),根据上述原则配置: +另一个例子是[CoreAttention](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/experimental/graph/transformer/transformer.py),根据上述原则配置:
@@ -117,7 +117,7 @@ class CoreAttention(nn.Cell):
-再看[FlashAttention](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/modules/flash_attention.py)的例子: +再看[FlashAttention](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/modules/flash_attention.py)的例子:
@@ -159,7 +159,7 @@ class FlashAttention(Cell):
-若直接使用MindFormers中开源且已经配好策略的类,则外部网络无需对算子再配置shard策略,如[LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/llama/llama.py)。 +若直接使用MindFormers中开源且已经配好策略的类,则外部网络无需对算子再配置shard策略,如[LlamaForCausalLM](https://gitee.com/mindspore/mindformers/blob/r1.6.0/mindformers/models/llama/llama.py)。
-- Gitee