diff --git a/README.md b/README.md
index 7cd1b0665774d2f91854b34e1326b0b7317bf6f1..7c8418bd62aea9604a899528cf46072b5b9ba6bc 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ docs
| |
| ├───mindspore // MindSpore Documents
| |
-| ├───mindformers // MindSpore Transformer Documents
+| ├───mindformers // MindSpore Transformers Documents
| |
| ├───probability // MindSpore Probability Documents
| |
diff --git a/docs/lite/docs/source_zh_cn/infer/runtime_java.md b/docs/lite/docs/source_zh_cn/infer/runtime_java.md
index 8d7547dc79509dfd848920918eaada2383bd12b5..beb279cd62449e4a8962845736bbddf6503c6160 100644
--- a/docs/lite/docs/source_zh_cn/infer/runtime_java.md
+++ b/docs/lite/docs/source_zh_cn/infer/runtime_java.md
@@ -42,7 +42,7 @@ Android项目中使用MindSpore Lite,可以选择采用[C++ API](https://www.m
When `Gradle` is used as the build tool, first move the `mindspore-lite-{version}.aar` file into the target module's `libs` directory, then add the local reference directory under `repositories` in the target module's `build.gradle`, and finally add the AAR dependency under `dependencies`, as shown below.
-> Note: mindspore-lite-{version} is the AAR file name; replace {version} with the corresponding version information.
+> mindspore-lite-{version} is the AAR file name; replace {version} with the corresponding version information.
```groovy
repositories {
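    // The remainder is a sketch assumed from the description above:
    // point a flatDir repository at the module-local libs directory
    // that holds the copied AAR.
    flatDir {
        dirs 'libs'
    }
}

dependencies {
    // {version} is a placeholder for the actual release number.
    implementation(name: "mindspore-lite-{version}", ext: "aar")
}
```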
diff --git a/docs/lite/docs/source_zh_cn/train/train_lenet.md b/docs/lite/docs/source_zh_cn/train/train_lenet.md
index 084004c2df0be9961d65306310faa8fd82e8c2ed..4c7d1ca44dc9563cd73f247695b813d0deb838ad 100644
--- a/docs/lite/docs/source_zh_cn/train/train_lenet.md
+++ b/docs/lite/docs/source_zh_cn/train/train_lenet.md
@@ -2,7 +2,7 @@
[](https://gitee.com/mindspore/docs/blob/master/docs/lite/docs/source_zh_cn/train/train_lenet.md)
-> Note: MindSpore has unified the device-edge-cloud inference API. If you want to continue using the MindSpore Lite standalone API for on-device training, refer to [this document](https://www.mindspore.cn/lite/docs/zh-CN/r1.3/quick_start/train_lenet.html).
+> MindSpore has unified the device-edge-cloud inference API. If you want to continue using the MindSpore Lite standalone API for on-device training, refer to [this document](https://www.mindspore.cn/lite/docs/zh-CN/r1.3/quick_start/train_lenet.html).
## Overview
diff --git a/docs/mindformers/docs/source_en/_templates/classtemplate.rst b/docs/mindformers/docs/source_en/_templates/classtemplate.rst
index 2fd50a2bcc6e973d1ec797312661aea6a105b62a..fd31c0a0d2f22639cddff38a9e828b89004e20d0 100644
--- a/docs/mindformers/docs/source_en/_templates/classtemplate.rst
+++ b/docs/mindformers/docs/source_en/_templates/classtemplate.rst
@@ -98,7 +98,7 @@
{{ fullname | underline }}
.. autoclass:: {{ name }}
- :exclude-members: from_dict, from_model_config, to_dict, update
+ :exclude-members: from_dict, from_model_config, to_dict, update, from_pretrained
:members:
{% elif fullname=="mindformers.generation.GenerationMixin" %}
@@ -112,7 +112,7 @@
{{ fullname | underline }}
.. autoclass:: {{ name }}
- :exclude-members: add_flags_custom, prepare_inputs_for_generation, prepare_inputs_for_predict_layout, construct
+ :exclude-members: add_flags_custom, prepare_inputs_for_generation, prepare_inputs_for_predict_layout, construct, convert_map_dict, convert_name, convert_weight_dict
:members:
{% elif fullname=="mindformers.models.ChatGLM3Tokenizer" %}
@@ -203,7 +203,7 @@
{{ fullname | underline }}
.. autoclass:: {{ name }}
- :exclude-members: auto_register
+ :exclude-members: auto_register, get_instance_type_from_cfg
:members:
{% elif fullname=="mindformers.Trainer" %}
@@ -224,7 +224,14 @@
{{ fullname | underline }}
.. autoclass:: {{ name }}
- :exclude-members: epoch_begin, epoch_end, step_begin, step_end, on_train_epoch_begin, on_train_step_begin, on_train_step_end
+ :exclude-members: epoch_begin, epoch_end, step_begin, step_end, on_train_epoch_begin, on_train_step_begin, on_train_step_end, abnormal_global_norm_check
+ :members:
+
+{% elif fullname=="mindformers.dataset.CausalLanguageModelDataset" %}
+{{ fullname | underline }}
+
+.. autoclass:: {{ name }}
+ :exclude-members: perform_token_counting, construct
:members:
{% elif fullname in ["mindformers.AutoModelForCausalLM", "mindformers.AutoModelForZeroShotImageClassification", "mindformers.AutoModel"] %}
diff --git a/docs/mindformers/docs/source_en/advanced_development/api.rst b/docs/mindformers/docs/source_en/advanced_development/api.rst
new file mode 100644
index 0000000000000000000000000000000000000000..f0accd105687587ec6e9a0ad6dce6c895bb0b8ff
--- /dev/null
+++ b/docs/mindformers/docs/source_en/advanced_development/api.rst
@@ -0,0 +1,17 @@
+API
+===========
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+
+ ../mindformers
+ ../mindformers.core
+ ../mindformers.dataset
+ ../mindformers.generation
+ ../mindformers.models
+ ../mindformers.modules
+ ../mindformers.pet
+ ../mindformers.pipeline
+ ../mindformers.tools
+ ../mindformers.wrapper
diff --git a/docs/mindformers/docs/source_en/usage/dev_migration.md b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md
similarity index 91%
rename from docs/mindformers/docs/source_en/usage/dev_migration.md
rename to docs/mindformers/docs/source_en/advanced_development/dev_migration.md
index 429be46ea731e2d97102e38fca9fdc2697deea84..2fe7a74370262fec65f08b1c34b701c7a77b0d40 100644
--- a/docs/mindformers/docs/source_en/usage/dev_migration.md
+++ b/docs/mindformers/docs/source_en/advanced_development/dev_migration.md
@@ -1,6 +1,6 @@
# Development Migration
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/dev_migration.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/advanced_development/dev_migration.md)
This document describes how to develop and build foundation models based on MindSpore Transformers and complete basic adaptation to start the training and inference processes.
@@ -46,9 +46,9 @@ All tokenizer classes must be inherited from the PretrainedTokenizer or Pretrain
### Preparing a Weight and a Dataset
-If a PyTorch-based model weight already exists, you can convert the weight to that in the MindSpore format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html).
+If a PyTorch-based model weight already exists, you can convert it to the MindSpore format by referring to [Weight Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html).
-For details about how to prepare a dataset, see [Dataset](https://www.mindspore.cn/mindformers/docs/en/dev/function/dataset.html) or the model document, for example, [Llama2 Description Document > Dataset Preparation](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87).
+For details about how to prepare a dataset, see [Dataset](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html) or the model document, for example, [Llama2 Description Document > Dataset Preparation](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87).
### Preparing a `YAML` Configuration File
@@ -93,13 +93,13 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l
`register_path` is set to `research/llama3_1` (path of the directory where the external code is located). For details about how to prepare the model weight, see [Llama3.1 Description Document > Model Weight Download](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD).
-For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama2-7B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml).
+For details about the configuration file and configurable items, see [Configuration File Descriptions](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html). When compiling a configuration file, you can refer to an existing configuration file in the library, for example, [Llama2-7B fine-tuning configuration file](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml).
-After all the preceding basic elements are prepared, you can refer to other documents in the MindSpore Transformers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/acc_optimize/acc_optimize.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html).
+After all the preceding basic elements are prepared, you can refer to other documents in the MindSpore Transformers tutorial to perform model training, fine-tuning, and inference. For details about subsequent model debugging and optimization, see [Large Model Accuracy Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/precision_optimization.html) and [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html).
### Contributing Models to the MindSpore Transformers Open Source Repository
-You can contribute models to the MindSpore Transformers open source repository for developers to research and use. For details, see [MindSpore Transformers Contribution Guidelines](https://www.mindspore.cn/mindformers/docs/en/dev/faq/mindformers_contribution.html).
+You can contribute models to the MindSpore Transformers open source repository for developers to research and use. For details, see [MindSpore Transformers Contribution Guidelines](https://www.mindspore.cn/mindformers/docs/en/dev/contribution/mindformers_contribution.html).
## MindSpore Transformers Model Migration Practice
@@ -111,7 +111,7 @@ Llama3-8B and Llama2-7B have the same model structure but different model parame
The following compares the model configurations between Llama2-7B and Llama3-8B.
-
+
The differences are as follows:
diff --git a/docs/mindformers/docs/source_en/perf_optimize/images/cast.png b/docs/mindformers/docs/source_en/advanced_development/images/cast.png
similarity index 100%
rename from docs/mindformers/docs/source_en/perf_optimize/images/cast.png
rename to docs/mindformers/docs/source_en/advanced_development/images/cast.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/general_process.png b/docs/mindformers/docs/source_en/advanced_development/images/general_process.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/general_process.png
rename to docs/mindformers/docs/source_en/advanced_development/images/general_process.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/local_norm.png b/docs/mindformers/docs/source_en/advanced_development/images/local_norm.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/local_norm.png
rename to docs/mindformers/docs/source_en/advanced_development/images/local_norm.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss1.png b/docs/mindformers/docs/source_en/advanced_development/images/loss1.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss1.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss1.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss2.png b/docs/mindformers/docs/source_en/advanced_development/images/loss2.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss2.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss2.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss3.png b/docs/mindformers/docs/source_en/advanced_development/images/loss3.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss3.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss3.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss4.png b/docs/mindformers/docs/source_en/advanced_development/images/loss4.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss4.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss4.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss5.png b/docs/mindformers/docs/source_en/advanced_development/images/loss5.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss5.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss5.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss6.png b/docs/mindformers/docs/source_en/advanced_development/images/loss6.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss6.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss6.png
diff --git a/docs/mindformers/docs/source_en/acc_optimize/image/loss7.png b/docs/mindformers/docs/source_en/advanced_development/images/loss7.png
similarity index 100%
rename from docs/mindformers/docs/source_en/acc_optimize/image/loss7.png
rename to docs/mindformers/docs/source_en/advanced_development/images/loss7.png
diff --git a/docs/mindformers/docs/source_en/perf_optimize/images/mstx.png b/docs/mindformers/docs/source_en/advanced_development/images/mstx.png
similarity index 100%
rename from docs/mindformers/docs/source_en/perf_optimize/images/mstx.png
rename to docs/mindformers/docs/source_en/advanced_development/images/mstx.png
diff --git a/docs/mindformers/docs/source_zh_cn/usage/image/multi_modal.png b/docs/mindformers/docs/source_en/advanced_development/images/multi_modal.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/usage/image/multi_modal.png
rename to docs/mindformers/docs/source_en/advanced_development/images/multi_modal.png
diff --git a/docs/mindformers/docs/source_en/perf_optimize/images/reshape.png b/docs/mindformers/docs/source_en/advanced_development/images/reshape.png
similarity index 100%
rename from docs/mindformers/docs/source_en/perf_optimize/images/reshape.png
rename to docs/mindformers/docs/source_en/advanced_development/images/reshape.png
diff --git a/docs/mindformers/docs/source_en/perf_optimize/images/silu_mul.png b/docs/mindformers/docs/source_en/advanced_development/images/silu_mul.png
similarity index 100%
rename from docs/mindformers/docs/source_en/perf_optimize/images/silu_mul.png
rename to docs/mindformers/docs/source_en/advanced_development/images/silu_mul.png
diff --git a/docs/mindformers/docs/source_en/perf_optimize/images/studio.png b/docs/mindformers/docs/source_en/advanced_development/images/studio.png
similarity index 100%
rename from docs/mindformers/docs/source_en/perf_optimize/images/studio.png
rename to docs/mindformers/docs/source_en/advanced_development/images/studio.png
diff --git a/docs/mindformers/docs/source_en/usage/multi_modal.md b/docs/mindformers/docs/source_en/advanced_development/multi_modal_dev.md
similarity index 99%
rename from docs/mindformers/docs/source_en/usage/multi_modal.md
rename to docs/mindformers/docs/source_en/advanced_development/multi_modal_dev.md
index e3bee6fd68eb8182ac2032e410967c6dc33fe2de..de695523d8a4326fac0a67b1ec5701041f7f3468 100644
--- a/docs/mindformers/docs/source_en/usage/multi_modal.md
+++ b/docs/mindformers/docs/source_en/advanced_development/multi_modal_dev.md
@@ -1,6 +1,6 @@
# Multimodal Model Development
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/multi_modal.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/advanced_development/multi_modal_dev.md)
Multimodal models refer to artificial intelligence models capable of processing and combining information from different modalities (such as text, images, audio, video, etc.) for learning and inference. Traditional single-modality models typically focus on a single type of data, such as text classification models handling only text data or image recognition models handling only image data. In contrast, multimodal models integrate data from different sources to accomplish more complex tasks, enabling them to understand and generate richer and more comprehensive content.
@@ -66,7 +66,7 @@ During the training and inference of multimodal models, the data processing modu
Below is a flowchart of the multimodal data processing. The custom modules in the diagram need to be implemented by the user according to their specific requirements, while other modules can be directly invoked.
-
+
Then, using the [CogVLM2-Video model data preprocessing module](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/cogvlm2/cogvlm2_processor.py) as an example, we will introduce the functionality of the components of the multimodal data processing module.
@@ -322,7 +322,7 @@ Parameter Explanation:
After implementing the multimodal dataset, data processing modules, and multimodal model construction, you can start model pre-training, fine-tuning, inference, and other tasks by using the model configuration file. This requires creating the corresponding model configuration file.
-For specific model configuration files, refer to [predict_cogvlm2_video_llama3_chat_13b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/predict_cogvlm2_video_llama3_chat_13b.yaml) and [finetune_cogvlm2_video_llama3_chat_13b_lora.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/finetune_cogvlm2_video_llama3_chat_13b_lora.yaml), which correspond to model inference and fine-tuning, respectively. For the meaning of specific parameters, refer to the [configuration file documentation](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+For specific model configuration files, refer to [predict_cogvlm2_video_llama3_chat_13b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/predict_cogvlm2_video_llama3_chat_13b.yaml) and [finetune_cogvlm2_video_llama3_chat_13b_lora.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/finetune_cogvlm2_video_llama3_chat_13b_lora.yaml), which correspond to model inference and fine-tuning, respectively. For the meaning of specific parameters, refer to the [configuration file documentation](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
In the user-defined configuration file, sections such as `model`, `processor`, and `train_dataset` need to correspond to the user's custom **dataset**, **data processing module**, and **multimodal model**.
diff --git a/docs/mindformers/docs/source_en/perf_optimize/perf_optimize.md b/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md
similarity index 98%
rename from docs/mindformers/docs/source_en/perf_optimize/perf_optimize.md
rename to docs/mindformers/docs/source_en/advanced_development/performance_optimization.md
index 28be670f2cbc1ef75e4e7909cf0c8e0e039dda3b..96b262ee10fffd67f68ed1594d532f32d262152b 100644
--- a/docs/mindformers/docs/source_en/perf_optimize/perf_optimize.md
+++ b/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md
@@ -1,6 +1,6 @@
# Large Model Performance Optimization Guide
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/perf_optimize/perf_optimize.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/advanced_development/performance_optimization.md)
## Overview
@@ -64,7 +64,7 @@ Parallelism strategies are usually classified into various parallel modes:
In practice, multiple parallel strategies and multiple optimizations, such as using optimizer parallelism and recomputation, are usually employed to reduce the model's use of memory and improve training efficiency. Parallel strategy design is closely related to the efficiency of the model, and it is crucial to identify one or more sets of better parallel strategies before model tuning.
-For details, refer to [Parallel Strategy Guide](https://www.mindspore.cn/mindformers/docs/en/dev/function/distributed_parallel.html).
+For details, refer to [Parallel Strategy Guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html).
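As a hedged illustration, such a combination might look as follows in a MindSpore Transformers configuration file (key names follow the configuration file conventions; the values are illustrative, not a recommendation):

```yaml
use_parallel: True
parallel_config:
  data_parallel: 2    # replicate the model and split batches across devices
  model_parallel: 4   # split weight matrices within each layer
  pipeline_stage: 2   # split the layers into pipeline stages
recompute_config:
  recompute: True     # trade recomputation for activation memory
```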
For models with different parameter count specifications, the following parallel strategy can be selected:
@@ -277,7 +277,7 @@ Click anywhere on the timeline page tree or graphical pane can be performed usin
#### IR Graph
-In the [MindSpore Transformers configuration file](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html), just turn on save_graphs, and the runtime will output some intermediate files ending with the .ir suffix generated during the graph compilation process, which we call IR files. By default, a directory of graphs will be generated in the current task execution directory, and all IR graphs will be saved in this. It is a relatively intuitive and easy to understand document describing the structure of the model in text format, which can be viewed directly with text editing software. Refer to [Config Configuration Description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html) for the meaning of the configuration items, and the configuration method is as follows:
+In the [MindSpore Transformers configuration file](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html), simply turn on save_graphs, and the runtime will output intermediate files with the .ir suffix that are generated during graph compilation, which we call IR files. By default, a graphs directory is created in the current task execution directory and all IR graphs are saved there. An IR file is a relatively intuitive, easy-to-understand text description of the model structure and can be viewed directly with a text editor. Refer to [Config Configuration Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html) for the meaning of the configuration items; the configuration method is as follows:
```yaml
context:
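  # A minimal sketch assumed from the surrounding text: turn on IR graph
  # saving; the output path is illustrative.
  save_graphs: True
  save_graphs_path: "./graph"
```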
diff --git a/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
similarity index 79%
rename from docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md
rename to docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
index c1f02eb42438f2a653fbc75589795d5f5a245daf..a749593416d77845adf856739b5d5464ea38dc67 100644
--- a/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md
+++ b/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md
@@ -1,24 +1,24 @@
-# Large Model Accuracy Optimization Guide
+# Large Model Precision Optimization Guide
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/acc_optimize/acc_optimize.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/advanced_development/precision_optimization.md)
-## Overview and Scenarios of Accuracy Issues
+## Overview and Scenarios of Precision Issues
### Descriptions
-As the Ascend AI processor (hereinafter referred to as NPU) is widely used in deep learning, the MindSpore framework, which is developed natively based on the Ascend NPU, shows better performance advantages. During large-scale cluster training, the performance improvement will greatly save users the cost of large model development. Therefore, more and more users are gradually migrating their original training models to MindSpore. However, due to the differences in hardware and framework usage, users may encounter accuracy problems after completing the model migration.
+As the Ascend AI processor (hereinafter referred to as NPU) is widely used in deep learning, the MindSpore framework, developed natively for the Ascend NPU, shows clear performance advantages. In large-scale cluster training, the performance improvement greatly reduces the cost of large model development for users. Therefore, more and more users are migrating their training models to MindSpore. However, due to differences in hardware and framework usage, users may encounter precision problems after completing the model migration.
-This paper summarizes the common accuracy problems in the training process of large models and general accuracy problem localization methods, and seeks to help users quickly troubleshoot accuracy problems and shorten the time for model accuracy problem localization. When starting the work on large model accuracy optimization, you should have the basic knowledge of large model. To avoid dispersion, this document will not explain the basic concepts related to large models and focus on the introduction of accuracy optimization.
+This document summarizes the common precision problems in large model training and general methods for localizing them, seeking to help users quickly troubleshoot precision problems and shorten the time needed to localize them. Before starting large model precision optimization, you should have basic knowledge of large models. To stay focused, this document does not explain basic concepts related to large models and concentrates on introducing precision optimization.
### Categorized Summary of Common Problems
-Various accuracy problems often occur in large model training, and the common problems include that the loss fails to converge, the loss converges poorly, the loss fails to converge at the late stage of training, the accuracy overflows, and the loss can not be fitted to the benchmark in the process of descending. There can be a variety of reasons for these accuracy problems, including the structure of the model, the dataset, the hyperparameters, the precision of the forward and reverse computation, the calculation of the optimizer, the floating-point computational accuracy, and randomness.
+Various precision problems often occur in large model training. Common ones include the loss failing to converge, the loss converging poorly, the loss failing to converge in the late stage of training, precision overflow, and the loss failing to fit the benchmark as it decreases. These problems can have a variety of causes, including the model structure, the dataset, the hyperparameters, the precision of forward and backward computation, the optimizer calculation, floating-point computational precision, and randomness.
-When accuracy problems occur, the problem can be analyzed from the reasons for these accuracy problems. A quick troubleshooting based on CheckList is performed first, followed by parameter and weight alignment, fixed randomness and turning on deterministic calculations. Then the base problem is troubleshooted, and finally the anomalous step is troubleshooted by long stable training. At the current stage, this paper mainly introduces the general method of accuracy localization for the scenarios with accuracy benchmarks, and the content of accuracy problem localization without accuracy benchmarks will be added successively.
+When precision problems occur, they can be analyzed in light of these causes. First perform a quick check against the CheckList, then align parameters and weights, fix randomness, and turn on deterministic computation. Next troubleshoot the basic problems, and finally locate anomalous steps through long stable training. At the current stage, this document mainly introduces the general method of precision localization for scenarios with precision benchmarks; content on localization without precision benchmarks will be added later.
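For example, randomness can be fixed with a sketch along these lines (API names per current MindSpore releases; verify against the version in use):

```python
import numpy as np
import mindspore as ms

ms.set_seed(42)                     # fix MindSpore's global random seed
np.random.seed(42)                  # fix NumPy randomness used in data pipelines
ms.set_context(deterministic="ON")  # force deterministic operator implementations
```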
-## Accuracy Problems Location CheckList
+## Precision Problems Location CheckList
-Before locating the operator accuracy problem, we should first eliminate the interference of other non-operator factors. Combined with the previous precision positioning cases, the CheckList before precision positioning is summarized. In order to easier locate the problems, users can first carry out quick troubleshooting according to the CheckList.
+Before locating operator precision problems, first eliminate the interference of other, non-operator factors. Based on previous precision localization cases, the CheckList below summarizes what to verify before precision localization. To make problems easier to locate, users can first perform a quick check against this CheckList.
### Network Structure CheckList
@@ -34,7 +34,7 @@ Before locating the operator accuracy problem, we should first eliminate the int
| Regularization function | Regularization functions; common structures are LayerNorm and RMSNorm | MindSpore Transformers uses a fixed regularization function that cannot be changed by configuration. In Megatron it can be customized via the normalization option; check for consistency. |
| rms_norm_eps | Regularization epsilon parameter | Corresponds to the Megatron layernorm_epsilon parameter; check for consistency. |
| dropout | dropout in the network | Currently, when MindSpore enables dropout, recomputation cannot be enabled; for precision comparison, it is recommended to disable dropout on both sides to reduce randomness.|
-| Fusion computation | Common fusion operators include FA, ROPE, Norm, SwigLU; some users will fuse Wq, Wk, Wv for computation | 1. For accuracy comparison under the same hardware, if fusion algorithms are used, they should be consistent. 2. When comparing accuracy on different hardware, focus on checking whether there is any difference in the calculation of the fusion calculation part. |
+| Fusion computation | Common fusion operators include FA, ROPE, Norm, SwigLU; some users fuse Wq, Wk, Wv for computation | 1. For precision comparison on the same hardware, if fusion operators are used, they should be consistent. 2. When comparing precision across different hardware, focus on checking whether the fused computation differs. |
#### MOE Structure
@@ -74,14 +74,14 @@ Before locating the operator accuracy problem, we should first eliminate the int
| **Key parameters** | **Descriptions** | **CheckList** |
| ----------------- | ----------------------------------------- |---------------------------------------|
-| compute_dtype | Compute accuracy | Megatron set `-bf16: true` to BF16, otherwise FP16. |
+| compute_dtype | Compute precision | In Megatron, `-bf16: true` sets BF16; otherwise FP16 is used. |
| layernorm_compute_type | LayerNorm/RMSNorm compute precision | Megatron is not configurable; check that the implementations are consistent. |
| softmax_compute_type | When MindSpore uses FA, the internal Softmax is fixed to be computed within FA; the computation type is configurable only for the small-operator concatenation implementation | Megatron is not configurable; check that the implementation is consistent. |
-| rotary_dtype | Calculation accuracy of rotary position encoding | Megatron is not configurable, needs to check if the implementation is consistent. |
-| Calculation of weights | accuracy calculation for each weight such as, Embedding, lm_head | Since MindSpore Transformers weight initialization needs to be set to FP32, and the usual calculation precision is BF16/FP16, it is necessary to check whether the weight data type is converted to BF16/FP16 before weight calculation.|
-| bias add | bias in the linear layer | If bias is present, Linear layer checks consistency in the computational accuracy of add. |
-| residual add | sum of residuals | Check that the accuracy of the calculation of the residuals is consistent with the benchmarks |
-| loss | Loss Calculation Module | Check that the accuracy of the calculation in the entire loss module is consistent with the benchmarks |
+| rotary_dtype | Computation precision of rotary position encoding | Megatron is not configurable; check that the implementation is consistent. |
+| Calculation of weights | Computation precision of each weight, such as Embedding and lm_head | Since MindSpore Transformers weight initialization must be set to FP32 while the usual computation precision is BF16/FP16, check whether the weight data type is converted to BF16/FP16 before the weight computation.|
+| bias add | bias in the linear layer | If bias is present, Linear layer checks consistency in the computational precision of add. |
+| residual add | Sum of residuals | Check that the computation precision of the residual connections is consistent with the benchmark |
+| loss | Loss calculation module | Check that the computation precision of the entire loss module is consistent with the benchmark |
| Operator High Precision Mode | Ascend operators support a high precision mode | Method: `context.set_context(ascend_config= {"ge_options":{ "global":{ "ge.opSelectImplmode":"high_precision" } } })` |
### Parallel Strategy CheckList
@@ -108,9 +108,9 @@ Before locating the operator accuracy problem, we should first eliminate the int
| Version Check | Check whether the versions of MindSpore, MindSpore Transformers and CANN are compatible; it is recommended to use the latest compatible versions. |
| Differences with Open Source | MindSpore Transformers has supported the mainstream open source LLM models, and has been more fully tested. If you are developing based on the open source models in MindSpore Transformers, you can focus on checking the differences with the open source models in MindSpore Transformers. |
-## Introduction to Accuracy Debugging Tools
+## Introduction to Precision Debugging Tools
-In accuracy localization, MindSpore's Dump tool is mainly used. For details, please refer to [Dump Function Debugging](https://www.mindspore.cn/tutorials/en/master/debug/dump.html).
+In precision localization, MindSpore's Dump tool is mainly used. For details, please refer to [Dump Function Debugging](https://www.mindspore.cn/tutorials/en/master/debug/dump.html).
MindSpore's Dump tool is enabled by configuring a JSON file. It dumps all operator data in the network, saving tensors and statistics in the statistic.csv table. The following gives a JSON example of a full operator Dump:
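A minimal sketch of such a file (field values are illustrative; see the Dump debugging documentation linked above for the full schema):

```json
{
    "common_dump_settings": {
        "op_debug_mode": 0,
        "dump_mode": 0,
        "path": "/absolute_path/dump_output",
        "net_name": "Net",
        "iteration": "all",
        "saved_data": "full",
        "input_output": 0,
        "kernels": [],
        "support_device": [0, 1, 2, 3, 4, 5, 6, 7]
    },
    "e2e_dump_settings": {
        "enable": true,
        "trans_flag": true
    }
}
```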
@@ -146,24 +146,24 @@ After setting the environment variables, start the program training to get the c
### Other Introductions
-In addition to the full amount of operator Dump introduced above, the tool also supports partial data Dump, overflow Dump, specified-condition Dump and so on. Limited to space, interested users can refer to [Dump function debugging](https://www.mindspore.cn/tutorials/en/master/debug/dump.html) for configuration and use. In addition, the msprobe precision debugging tool is provided. msprobe is a tool package under the precision debugging component of the MindStudio Training Tools suite. It mainly includes functions such as precision pre-check, overflow detection, and precision comparison. For more information, refer to [msprobe User Guide](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe).
+In addition to the full operator Dump introduced above, the tool also supports partial data Dump, overflow Dump, Dump under specified conditions, and so on. For reasons of space, interested users can refer to [Dump function debugging](https://www.mindspore.cn/tutorials/en/master/debug/dump.html) for configuration and usage. In addition, the msprobe precision debugging tool is provided. msprobe is a tool package under the precision debugging component of the MindStudio Training Tools suite. It mainly includes functions such as precision pre-check, overflow detection, and precision comparison. For more information, refer to [msprobe User Guide](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe).
-## Generalized Processes for Accuracy Positioning
+## Generalized Processes for Precision Positioning
-Quickly troubleshoot the problem by using the [Accuracy Problems Location CheckList](#accuracy-problems-location-checklist) section. If the accuracy problem still exists after completing the CheckList and there is no obvious direction, you can narrow down the scope of the problem by using the accuracy location generic process in this section for further troubleshooting. The current generalized process is mainly for benchmarked scenarios, and the following section will take the scenario of comparing the accuracy of GPU+PyTorch and Ascend+MindSpore as an example to introduce the accuracy localization process.
+Quickly troubleshoot the problem by using the [Precision Problems Location CheckList](#precision-problems-location-checklist) section. If the precision problem still exists after completing the CheckList and there is no obvious direction, you can narrow down its scope by following the general precision localization process in this section. The current process mainly targets benchmarked scenarios; the following takes comparing the precision of GPU+PyTorch and Ascend+MindSpore as an example to introduce the precision localization process.
There are two main ideas for problem positioning:
* Simplify the training scenario based on a single card/single machine and reproduce the problem with a small-scale model.
-* Fix the random factor and compare the loss difference with the benchmark during training to locate the cause of the accuracy difference.
+* Fix the random factor and compare the loss difference with the benchmark during training to locate the cause of the precision difference.
The training process of the model can be decomposed into the following stages: data input, forward computation, loss, backward computation, gradient, optimizer weight update, and the next step. The following describes how to troubleshoot each stage of training, following the flow in the figure below.
-
+
### Stage 1: Pre-training Preparation
-Conducting accuracy comparison between GPU+PyTorch and Ascend+MindSpore requires simplifying the scenario and fixing the randomness before reproducing the problem. There are three main parts as follows:
+Conducting precision comparison between GPU+PyTorch and Ascend+MindSpore requires simplifying the scenario and fixing the randomness before reproducing the problem. There are three main parts as follows:
* Align parameters, downsize the model, and reproduce the problem on a single card/single machine;
@@ -187,7 +187,7 @@ Since features such as model parallelism, flow parallelism, sequence parallelism
#### Weight Conversion
-During training, MindSpore is loaded with the same weights as PyTorch. In case of pre-training scenarios, you can use PyTorch to save an initialized weight and then convert it to MindSpore weights. Because MindSpore weight names differ from PyTorch, the essence of weight conversion is to change the names in the PyTorch weight dict to MindSpore weight names to support MindSpore loading. Refer to [weight conversion guide](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html) for weight conversion.
+During training, MindSpore loads the same weights as PyTorch. For pre-training scenarios, you can use PyTorch to save an initialized weight and then convert it to a MindSpore weight. Because MindSpore weight names differ from PyTorch's, the essence of weight conversion is to change the names in the PyTorch weight dict to MindSpore weight names so that MindSpore can load them. Refer to the [weight conversion guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html) for weight conversion.
Both MindSpore and PyTorch support `bin` format data; loading the same dataset for training ensures step-to-step consistency.
@@ -254,7 +254,7 @@ By comparing the loss and local norm of the first step (step1) and the second st
#### Comparison of Step1 Losses
-After fixing the weights, dataset, and randomness, the difference in the loss value of the first step of training is compared. The loss value of the first step is obtained from the forward computation of the network. If the difference with the benchmark loss is large, it can be determined that there is an accuracy difference in the forward computation, which may be due to the model structure is not aligned, and the accuracy of the operator is abnormal. The tensor values of each layer of MindSpore and PyTorch can be obtained by printing or Dump tool. Currently, the tool does not have automatic comparison function, users need to manually identify the correspondence for comparison. For the introduction of MindSpore Dump tool, please refer to [Introduction of Accuracy Debugging Tools](#introduction-to-accuracy-debugging-tools), and for the use of PyTorch Dump tool, please refer to [Function Explanation of Accuracy Tools](https://gitee.com/ascend/mstt/blob/master/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md).
+After fixing the weights, dataset, and randomness, compare the difference in the loss value of the first training step. The first-step loss comes from the forward computation of the network. If it differs greatly from the benchmark loss, there is a precision difference in the forward computation, which may be because the model structure is not aligned or an operator's precision is abnormal. The tensor values of each layer in MindSpore and PyTorch can be obtained by printing or with the Dump tool. Currently the tool has no automatic comparison function, so users need to identify the correspondence manually for comparison. For an introduction to the MindSpore Dump tool, see [Introduction to Precision Debugging Tools](#introduction-to-precision-debugging-tools); for the PyTorch Dump tool, see [Function Explanation of Precision Tools](https://gitee.com/ascend/mstt/blob/master/debug/accuracy_tools/msprobe/docs/05.data_dump_PyTorch.md).
Find the correspondence between layers through the PyTorch api_stack_dump.pkl file and the MindSpore statistic.csv file, and initially gauge the difference between inputs and outputs via max, min, and L2Norm. If further comparison is needed, load the corresponding npy data for a detailed comparison.
@@ -299,9 +299,9 @@ def get_parameters(self):
Below is an example of a local norm comparison, comparing the local norm values corresponding to the weights.
-
+
-It can be found that in the scenario shown in this figure, the local norm value of model.tok_embeddings.embedding_weight has a large difference, which can be focused on troubleshooting the implementation of the Embedding and the calculation accuracy, etc.
+It can be found that in the scenario shown in this figure, the local norm value of model.tok_embeddings.embedding_weight differs greatly from the benchmark, so troubleshooting can focus on the implementation and computation precision of the Embedding.
The local norm value only serves as a preliminary judgment of whether the backward computation is correct; to compare the backward computation in depth, use the Dump tool to compare the MindSpore and PyTorch backward values layer by layer.
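As a hedged sketch (not code from the repository), the per-weight local norm can be computed from dumped gradients along these lines, where `grads` is a hypothetical mapping from parameter name to gradient array:

```python
import numpy as np

def local_norms(grads):
    """Map each parameter name to the L2 norm of its gradient."""
    return {name: float(np.linalg.norm(g)) for name, g in grads.items()}

def diff_report(ms_norms, ref_norms, rel_tol=0.1):
    """Print weights whose local norm diverges from the benchmark."""
    for name, v in ms_norms.items():
        ref = ref_norms.get(name)
        if ref and abs(v - ref) / abs(ref) > rel_tol:
            print(f"{name}: {v:.4f} vs benchmark {ref:.4f}")
```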
@@ -377,7 +377,7 @@ Before the training of weight update, it is necessary to confirm the benchmark e
With the learning rate set > 0, the weights are updated and the long stability test is performed. At a certain step, a large difference in the loss appears, after which the training loss begins to diverge, as shown in the figure:
-
+
In this scenario, troubleshooting can target the training before and after the abrupt change, and the following checks can be tried:
@@ -391,50 +391,50 @@ In this scenario, the training before and after the mutation can be targeted for
In the long stability test, it is also possible for the loss to fit well early in training but to differ greatly at convergence late in training, as shown in the figure:
-
+
In this scenario, troubleshooting can be done from the following perspectives:
* Examine whether the parameters are aligned: focus on parameters related to the optimizer, such as the optimizer type, learning rate, and weight decay. Whether the learning rate changes consistently during training can be compared by plotting, and it also needs to be confirmed that the weights subject to weight decay are consistent with the benchmark.
-* Mixed accuracy checking: through the Dump tool, carefully check whether the mixed accuracy is consistent with the benchmark in the calculation process;
+* Mixed precision checking: use the Dump tool to carefully check whether the mixed precision in the computation process is consistent with the benchmark;
-* If there is a difference in the loss at convergence, but the difference is small, such as less than 1%, the accuracy acceptance can be performed by evaluating the downstream tasks.
+* If the loss difference at convergence is small, for example less than 1%, precision acceptance can be performed by evaluating downstream tasks.
#### Scenario Expansion
-After completing the single-card alignment, gradually expand from single-card to multi-card testing and cluster testing; model size and related features such as model parallelism, flow parallelism, optimizer parallelism are added as appropriate. Gradually expand from simple scenarios to actual training scenarios, so as to troubleshoot the impact of the added features on the accuracy.
+After completing the single-card alignment, gradually expand from single-card to multi-card and cluster testing; model size and related features such as model parallelism, pipeline parallelism, and optimizer parallelism are added as appropriate. Gradually expand from simple scenarios to the actual training scenario, so as to troubleshoot the impact of each added feature on precision.
-### Large Model Migration Accuracy Standard
+### Large Model Migration Precision Standard
-Accuracy standard for large model migration refers to the accuracy standard set for key indicators to ensure that the model accuracy before and after migration is basically the same after migrating the models trained by other third-party hardware or frameworks to MindSpore and Ascend Hardware. It is summarized based on the actual migration scenarios of MindSpore's large models for developers' reference. Since the accuracy of large models is strongly related to the application domain, model structure, number of parameters, and hyperparameters, and is not fully interpretable, there is no complete and unified mandatory standard. Therefore, this standard is only used as a reference standard to help users make a basic judgment on the accuracy of model migration.
+The precision standard for large model migration is a set of criteria on key indicators for ensuring that model precision is basically the same before and after migrating a model trained on third-party hardware or frameworks to MindSpore and Ascend hardware. It is summarized from actual MindSpore large model migration scenarios for developers' reference. Since large model precision is strongly related to the application domain, model structure, parameter count, and hyperparameters, and is not fully interpretable, there is no complete and unified mandatory standard. Therefore, this standard serves only as a reference to help users make a basic judgment on the precision of model migration.
-#### Accuracy Standard Specifications
+#### Precision Standard Specifications
1. Relative discrepancy is uniformly described as a percentage (x.x%) and absolute discrepancy is uniformly described as a decimal (0.xx);
-2. If the accuracy fluctuations of the third-party model training no longer meet this accuracy standard, the original model should be adequately tested and the standard should be relaxed in accordance with the fluctuations of the original model;
+2. If the precision fluctuations of the third-party model training no longer meet this precision standard, the original model should be adequately tested and the standard relaxed in accordance with the original model's fluctuations;
#### Default Configuration
| Classes | Default Values | Descriptions |
|--------------------|------|-------------------------------|
| Dataset | [pretrain] wikitext-103 [sft] alpaca | |
-| Accuracy mode | BF16 | Mixed-accuracy configurations are consistent, and distinguish between actual FP32/FP16/BF16 configurations for each API in the network. |
+| Precision mode | BF16 | Mixed-precision configurations are consistent, distinguishing the actual FP32/FP16/BF16 configuration of each API in the network. |
| Parallel method | Data parallel | The parallelism can be adjusted according to the computational resources. |
| Cluster size | Stand-alone 8 cards | Can be adjusted according to the computational resources. |
-| checkpoint | [pretrain] Script initialization by default [sft]Loading pre-training weights | ckpt has a large impact on the accuracy metrics, prioritizing weights with small fluctuations in loss and a clear downward trend in overall loss.|
-|determinism|Turn on|The accuracy indicator determination phase can turn off determinism. The comparison phase needs to turn on determinism in order to minimize random error interference.|
+| checkpoint | [pretrain] Script initialization by default [sft] Loading pre-training weights | ckpt has a large impact on the precision metrics; prioritize weights with small loss fluctuations and a clear overall downward loss trend.|
+| determinism | Turn on | The precision indicator determination phase can turn off determinism; the comparison phase needs determinism turned on to minimize random error interference.|
-#### Accuracy Standard Indicator
+#### Precision Standard Indicator
* Test Standard
    1. Unless otherwise specified by the user, the default is continuous observation for 5000 steps or 12 hours; the number of steps can be reduced according to available resources, but it is not recommended to go below 1000 steps.
2. Load the same weights, keep all hyperparameters configured the same, and turn off all randomness.
- 3. The fluctuation of indicators such as loss is greatly influenced by the model, weights, and hyperparameters, and the combination with smooth loss fluctuation is preferred as a benchmark to reduce the judgment of random fluctuation on the accuracy results.
- 4. The randomness of the third-party model was adequately tested by repeating the experiment at least 2 times with determinism turned off and observing the range of fluctuations in the accuracy metrics.
+ 3. The fluctuation of indicators such as loss is greatly influenced by the model, weights, and hyperparameters; a combination with smooth loss fluctuation is preferred as the benchmark, to reduce the impact of random fluctuation on judging the precision results.
+ 4. Test the randomness of the third-party model adequately by repeating the experiment at least twice with determinism turned off and observing the fluctuation range of the precision metrics.
-* loss Accuracy Standard
+* Loss Precision Standard
1. The absolute error of first loss is less than 0.005, or the relative error is less than 0.5%.
2. The average absolute error is less than 0.01, or the average relative error is less than 1%.
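As a hedged sketch, the two criteria above could be checked automatically as follows (per-step loss sequences are assumed to be aligned and of equal length):

```python
def loss_within_standard(ms_loss, ref_loss):
    """Check the first-loss and mean-loss criteria (0.005/0.5% and 0.01/1%)."""
    abs_err = [abs(a - b) for a, b in zip(ms_loss, ref_loss)]
    rel_err = [e / abs(b) for e, b in zip(abs_err, ref_loss)]
    first_ok = abs_err[0] < 0.005 or rel_err[0] < 0.005       # criterion 1
    mean_ok = (sum(abs_err) / len(abs_err) < 0.01
               or sum(rel_err) / len(rel_err) < 0.01)         # criterion 2
    return first_ok and mean_ok
```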
@@ -445,13 +445,13 @@ Accuracy standard for large model migration refers to the accuracy standard set
### Case Details
-This section will introduce the completion of accuracy ranking based on the above accuracy localization process with practical examples.
+This section walks through precision troubleshooting with a practical example, based on the precision localization process described above.
#### Problem Phenomenon
Training the model on a 128-card cluster and comparing Ascend+MindSpore training with GPU+PyTorch training reveals that the convergence loss late in training is about 0.1 higher than GPU+PyTorch. As shown in the figure, convergence is not as expected:
-
+
The red line is the Ascend+MindSpore training curve and the blue line is the GPU+PyTorch training curve.
@@ -461,32 +461,32 @@ Before locating the problem, check against the CheckList to confirm that there i
First, the loss alignment of step1 is confirmed to be OK. Comparing the local norm of step1 and calculating the difference between each weight's local norm value and the benchmark, it is found that the local norm value of the Embedding weight differs greatly from the benchmark.
-
+
The reason for this is that MindSpore Transformers uses FP32 for weight initialization, and FP32 precision is used for both forward and backward Embedding calculations, while PyTorch forward and backward calculations are BF16, which leads to differences in the calculated local norm values.
-Once the computational accuracy is aligned, the exhaustive optimizer computation is also fine, and the long stable training alignment starts.
+Once the computational precision is aligned and the optimizer computation has also been checked and found correct, the long stable training alignment starts.
Long stable training troubleshooting is extended from single-card to multi-card experiments. First, the learning rate is set to 0, i.e., the weights are not updated. The loss difference of each step's forward computation is around 0.001, so the forward computation error is as expected. The difference in the global norm of each step is about 0.05, so the backward computation difference is not significant. It is initially judged that the model migration code is correct, the model structure is consistent, and the forward and backward computation differences are not significant.
-
+
Weight updates are then re-enabled for single-card training: set learning rate = 1e-5 and train 1k steps. The loss in late convergence shows a steady 0.1 difference, reproducing the problem.
-
+
Problem troubleshooting is then performed, and the following problems are identified:
-* Identify inconsistencies in computational accuracy during training through Dump file exclusion, and harmonize inconsistencies.
+* Inconsistencies in computational precision during training were identified by examining Dump files, and the inconsistencies were unified.
* The weight decay implementation was inconsistent: weight decay is applied to all weights in the user's PyTorch network, whereas bias weights and one-dimensional weights in MindSpore Transformers have no weight decay by default.
-After fixing the problem, experiment again, train 10,000 steps, the loss difference fluctuates around the 0 axis and is less than 0.03, the accuracy meets the expectation, and the single-card accuracy is aligned.
+After fixing these problems, the experiment was run again for 10,000 steps; the loss difference fluctuates around the 0 axis and stays below 0.03. Precision meets expectations, and single-card precision is aligned.
After completing single-card training, start the multi-card training test: set learning rate = 1e-5 and train 1,000 steps. Convergence is consistent late in training, but there is a stable 0.05 error in the middle stage of training.
-
+
To verify that this error is within reasonable limits, deterministic computation was turned off and the GPU experiment was repeated twice. The red line in the figure is the MindSpore training curve, and the blue and green lines are the curves of the first and second GPU runs, respectively. At the training instability around 7,000 steps, the MindSpore curve lies right between the two GPU curves, indicating that the error is within a reasonable range; the problem is finally resolved.
-
+
diff --git a/docs/mindformers/docs/source_en/usage/pretrain_gpt.md b/docs/mindformers/docs/source_en/advanced_development/pretrain_gpt.md
similarity index 99%
rename from docs/mindformers/docs/source_en/usage/pretrain_gpt.md
rename to docs/mindformers/docs/source_en/advanced_development/pretrain_gpt.md
index ff52ba7dcb2149248241f9b5e6da458253607a56..1c24ba53a06c651a4f1317f199683ba3ca884a41 100644
--- a/docs/mindformers/docs/source_en/usage/pretrain_gpt.md
+++ b/docs/mindformers/docs/source_en/advanced_development/pretrain_gpt.md
@@ -1,6 +1,6 @@
# Dynamic Graph Parallelism
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/pretrain_gpt.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/advanced_development/pretrain_gpt.md)
## Overview
diff --git a/docs/mindformers/docs/source_en/faq/mindformers_contribution.md b/docs/mindformers/docs/source_en/contribution/mindformers_contribution.md
similarity index 98%
rename from docs/mindformers/docs/source_en/faq/mindformers_contribution.md
rename to docs/mindformers/docs/source_en/contribution/mindformers_contribution.md
index 670eb56fd291cf20c00ab8e292bd85a6af7f8cc9..24ca3cf2088ba5147ea380e1450c0bc694a6b074 100644
--- a/docs/mindformers/docs/source_en/faq/mindformers_contribution.md
+++ b/docs/mindformers/docs/source_en/contribution/mindformers_contribution.md
@@ -1,6 +1,6 @@
# MindSpore Transformers Contribution Guidelines
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/faq/mindformers_contribution.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/contribution/mindformers_contribution.md)
## Contributing Code to MindSpore Transformers
diff --git a/docs/mindformers/docs/source_en/faq/modelers_contribution.md b/docs/mindformers/docs/source_en/contribution/modelers_contribution.md
similarity index 98%
rename from docs/mindformers/docs/source_en/faq/modelers_contribution.md
rename to docs/mindformers/docs/source_en/contribution/modelers_contribution.md
index e94dfab412aa63f200982353e472802812f07b50..bf7b6585a578ce08be3301972c2ec4b24f32472d 100644
--- a/docs/mindformers/docs/source_en/faq/modelers_contribution.md
+++ b/docs/mindformers/docs/source_en/contribution/modelers_contribution.md
@@ -1,6 +1,6 @@
# Modelers Contribution Guidelines
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/faq/modelers_contribution.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/contribution/modelers_contribution.md)
## Upload a Model to the Modelers Community
diff --git a/docs/mindformers/docs/source_en/appendix/env_variables.md b/docs/mindformers/docs/source_en/env_variables.md
similarity index 94%
rename from docs/mindformers/docs/source_en/appendix/env_variables.md
rename to docs/mindformers/docs/source_en/env_variables.md
index b34c37e0c9c89134b7d22f341c78c1c5caefc2fd..c23acc5123497e032286e9e16cf6a3135da3802a 100644
--- a/docs/mindformers/docs/source_en/appendix/env_variables.md
+++ b/docs/mindformers/docs/source_en/env_variables.md
@@ -1,6 +1,6 @@
# Environment Variable Descriptions
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/appendix/env_variables.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/env_variables.md)
The following environment variables are supported by MindSpore Transformers.
@@ -14,7 +14,7 @@ The following environment variables are supported by MindSpore Transformers.
| **ASCEND_LAUNCH_BLOCKING** | 0 | In training or online inference scenarios, this environment variable controls whether synchronous mode is activated during operator execution. | `1`: synchronous mode is mandatory; `0`: synchronous mode is not enforced. | Since operators execute asynchronously by default during NPU model training, the error stack printed when an operator fails does not reflect the actual call stack. Setting it to `1` forces synchronous mode, which prints the correct call stack and makes it easier to debug and locate problems in the code. Setting it to `0` keeps asynchronous execution, which is more efficient. |
| **TE_PARALLEL_COMPILER** | 8 | The number of threads used to compile operators in parallel. Parallel compilation is enabled when greater than 1. | Takes a positive integer; the maximum is the number of CPU cores\*80%/number of Ascend AI processors, value range 1~32, default value is 8. | When the network model is large, parallel operator compilation can be turned on by configuring this environment variable; setting it to `1` uses single-threaded compilation, which simplifies debugging. |
| **CPU_AFFINITY** | 0 | Turn on the CPU affinity switch, thus ensuring that each process or thread is bound to a single CPU core to improve performance. | `1`: turn on the CPU affinity switch; `0`: turn off the CPU affinity switch. | CPU affinity is turned off by default for **optimized resource utilization** and **energy saving**. |
-| **MS_MEMORY_STATISTIC** | 0 | Memory Statistics. | `1`: turn on memory statistics; `0`: turn off memory statistics. | During memory analysis, basic memory usage can be counted. You can refer to [Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html) for details. |
+| **MS_MEMORY_STATISTIC** | 0 | Memory Statistics. | `1`: turn on memory statistics; `0`: turn off memory statistics. | During memory analysis, basic memory usage can be counted. You can refer to [Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html) for details. |
| **MINDSPORE_DUMP_CONFIG** | NA | Specify the path to the configuration file that the [cloud-side Dump function](https://www.mindspore.cn/tutorials/en/master/debug/dump.html) or [end-side Dump function](https://www.mindspore.cn/lite/docs/en/master/tools/benchmark_tool.html#dump) depends on. | File path, support relative path and absolute path. |
| **GLOG_v** | 3 | Controls the level of MindSpore logs. | `0`: DEBUG `1`: INFO `2`: WARNING `3`: ERROR: indicates that an error has been reported in the execution of the program, an error log is output, and the program may not be terminated; `4`: CRITICAL, indicates that an exception has occurred in the execution of the program, and the execution of the program will be terminated. |
| **ASCEND_GLOBAL_LOG_LEVEL** | 3 | Controls the logging level of CANN. | `0`: DEBUG `1`: INFO `2`: WARNING `3`: ERROR `4`: NULL, no log is output. |
@@ -40,4 +40,4 @@ The following environment variables are supported by MindSpore Transformers.
| **MS_ENABLE_FA_FLATTEN** | on | Controls whether FlashAttention flatten optimization is supported. | `on`: enable FlashAttention flatten optimization; `off`: disable FlashAttention flatten optimization. | Provides a fallback mechanism for models that have not yet been adapted to FlashAttention flatten optimization. |
| **EXPERIMENTAL_KERNEL_LAUNCH_GROUP** | NA | Controls whether batch parallel submission of operators is supported. If supported, enables parallel submission and configures the number of parallel submissions. | `thread_num`: the number of concurrent threads; increasing it is not recommended; the default value is `2`. `kernel_group_num`: total number of operator groups, with `kernel_group_num/thread_num` groups per thread; the default is `8`. | This feature will continue to evolve, and its behavior may change in the future. Currently, only the `deepseek` inference scenario is supported, with certain performance optimization; other models using this feature may see degradation, so users need to use it with caution, as follows: `export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`. |
| **FORCE_EAGER** | False | Controls whether to disable jit mode. | `False`: enable jit mode; `True`: do not enable jit mode. | JIT compiles functions into a callable MindSpore graph. Setting FORCE_EAGER to False enables jit mode, which can yield performance benefits. Currently, only inference mode is supported. |
-| **MS_ENABLE_TFT** | NA | Enable [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) feature. Turn on TTP, UCE or ARF feature. | The value of the environment variable can be:"{TTP:1,UCE:1,ARF:1}", when using a certain feature, the corresponding field can be configured as "1". | Usage can refer to [High Availability](https://www.mindspore.cn/mindformers/docs/en/dev/function/high_availability.html). |
\ No newline at end of file
+| **MS_ENABLE_TFT** | NA | Enable the [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) feature, i.e. turn on the TTP, UCE, ARF or TRE feature. | The value of the environment variable can be: "{TTP:1,UCE:1,ARF:1,TRE:1}"; when using a certain feature, configure the corresponding field as "1". | Usage can refer to [High Availability](https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html). |
diff --git a/docs/mindformers/docs/source_en/faq/func_related.md b/docs/mindformers/docs/source_en/faq/feature_related.md
similarity index 88%
rename from docs/mindformers/docs/source_en/faq/func_related.md
rename to docs/mindformers/docs/source_en/faq/feature_related.md
index 7c61f181338b9d303b65143fe71cafdf78adcb98..cdf536564efa1325434f4d3efddcc9e004e42d5e 100644
--- a/docs/mindformers/docs/source_en/faq/func_related.md
+++ b/docs/mindformers/docs/source_en/faq/feature_related.md
@@ -1,6 +1,6 @@
-# Function-Related
+# Feature-Related FAQ
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/faq/func_related.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/faq/feature_related.md)
## Q: The WikiText dataset download link is not available.
@@ -10,7 +10,7 @@ A: The official download link is not available, please follow the community Issu
## Q: How Do I Generate a Model Sharding Strategy File?
-A: The model sharding strategy file documents the sharding strategy for model weights in distributed scenarios and is generally used when slicing weights offline. Configure `only_save_strategy: True` in the network `yaml` file, and then start the distributed task normally, then the distributed strategy file can be generated in the `output/strategy/` directory. For details, please refer to the [Tutorial on Slicing and Merging Distributed Weights](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html).
+A: The model sharding strategy file records the sharding strategy for model weights in distributed scenarios and is generally used when slicing weights offline. Configure `only_save_strategy: True` in the network `yaml` file and start the distributed task normally; the distributed strategy file will then be generated in the `output/strategy/` directory. For details, please refer to the [Tutorial on Slicing and Merging Distributed Weights](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html).
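+
+For example, a minimal sketch of the yaml change (all other configuration items are omitted):
+
+```yaml
+only_save_strategy: True   # the task only generates strategy files under output/strategy/
+```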
diff --git a/docs/mindformers/docs/source_en/faq/model_related.md b/docs/mindformers/docs/source_en/faq/model_related.md
index 157196c3327fd485d84d8b37c87a4ad29d4e112c..07bc072e30283ffdd98dad4a4a61c3d70986bdce 100644
--- a/docs/mindformers/docs/source_en/faq/model_related.md
+++ b/docs/mindformers/docs/source_en/faq/model_related.md
@@ -1,4 +1,4 @@
-# Model-Related
+# Model-Related FAQ
[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/faq/model_related.md)
diff --git a/docs/mindformers/docs/source_en/appendix/conf_files.md b/docs/mindformers/docs/source_en/feature/configuration.md
similarity index 98%
rename from docs/mindformers/docs/source_en/appendix/conf_files.md
rename to docs/mindformers/docs/source_en/feature/configuration.md
index e8b7abba33c3abe0c355a9411fe9914173b9b9ad..9b46c7089a39b3b4bff9e48953f9d6aa735ded63 100644
--- a/docs/mindformers/docs/source_en/appendix/conf_files.md
+++ b/docs/mindformers/docs/source_en/feature/configuration.md
@@ -1,6 +1,6 @@
# Configuration File Descriptions
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/appendix/conf_files.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/configuration.md)
## Overview
@@ -19,9 +19,9 @@ The basic configuration is mainly used to specify MindSpore random seeds and rel
| seed | Set the global seed. For details, refer to [mindspore.set_seed](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.set_seed.html). | int |
| run_mode | Set the running mode of the model: `train`, `finetune`, `eval` or `predict`. | str |
| output_dir | Set the path where log, checkpoint, strategy, etc. files are saved. | str |
-| load_checkpoint | File or folder paths for loading weights. Currently there are 3 application scenarios 1. Support for passing in full weight file paths. 2. Support for passing in offline sliced weight folder paths. 3. Support for passing in folder paths containing lora weights and base weights Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html) for the ways of obtaining various weights. | str |
-| auto_trans_ckpt | Enable distributed weight auto slicing and merging. Refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html). | bool |
-| resume_training | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/function/resume_training.html#resumable-training). | bool |
+| load_checkpoint | File or folder paths for loading weights. There are currently 3 application scenarios: 1. passing in full weight file paths; 2. passing in offline sliced weight folder paths; 3. passing in folder paths containing lora weights and base weights. Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html) for the ways of obtaining various weights. | str |
+| auto_trans_ckpt | Enable distributed weight auto slicing and merging. Refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html). | bool |
+| resume_training | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html#resumable-training). | bool |
| load_ckpt_format | The format of loading checkpoint, either `ckpt` or `safetensors`. | str |
| remove_redundancy | Whether the checkpoint has removed redundancy while loading checkpoint. The default value is `False`. | bool |
| train_precision_sync | Switching on or off deterministic computation of the training process. The default value is `None`. | Optional[bool] |
@@ -140,7 +140,7 @@ When starting model training, in addition to model-related parameters, you also
### Parallel Configuration
-In order to improve the performance of the model, it is usually necessary to configure the parallelism strategy for the model in large-scale cluster usage scenarios. For details, please refer to [Distributed Parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/function/distributed_parallel.html), the parallel configuration in MindSpore Transformers is as follows.
+In order to improve the performance of the model, it is usually necessary to configure the parallelism strategy for the model in large-scale cluster usage scenarios. For details, please refer to [Distributed Parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html). The parallel configuration in MindSpore Transformers is as follows.
| Parameters | Descriptions | Types |
|-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
@@ -174,7 +174,7 @@ In order to improve the performance of the model, it is usually necessary to con
### Model Optimization Configuration
-1. MindSpore Transformers provides recomputation-related configurations to reduce the memory footprint of the model during training, see [Recomputation](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html#recomputation) for details.
+1. MindSpore Transformers provides recomputation-related configurations to reduce the memory footprint of the model during training, see [Recomputation](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html#recomputation) for details.
| Parameters | Descriptions | Types |
|----------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------|
@@ -186,7 +186,7 @@ In order to improve the performance of the model, it is usually necessary to con
| recompute_config.select_recompute_exclude | Disable recomputation for the specified operator, valid only for the Primitive operators. | bool/list |
| recompute_config.select_comm_recompute_exclude | Disable communication recomputation for the specified operator, valid only for the Primitive operators. | bool/list |
-2. MindSpore Transformers provides fine-grained activations SWAP-related configurations to reduce the memory footprint of the model during training, see [Fine-Grained Activations SWAP](https://www.mindspore.cn/mindformers/docs/en/dev/function/fine_grained_activations_swap.html) for details.
+2. MindSpore Transformers provides fine-grained activations SWAP-related configurations to reduce the memory footprint of the model during training, see [Fine-Grained Activations SWAP](https://www.mindspore.cn/mindformers/docs/en/dev/feature/memory_optimization.html#fine-grained-activations-swap) for details.
| Parameters | Descriptions | Types |
|----------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------|
@@ -280,7 +280,7 @@ MindSpore Transformers provides model evaluation function, and also supports mod
### Profile Configuration
-MindSpore Transformers provides Profile as the main tool for model performance tuning, please refer to [Performance Tuning Guide](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html) for more details. The following is the Profile related configuration.
+MindSpore Transformers provides Profile as the main tool for model performance tuning, please refer to [Performance Tuning Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html) for more details. The following is the Profile related configuration.
| Parameters | Descriptions | Types |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|
@@ -300,7 +300,7 @@ MindSpore Transformers provides Profile as the main tool for model performance t
### Metric Monitoring Configuration
-The metric monitoring configuration is primarily used to configure methods to record metrics during training, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/dev/function/monitor.html) for more details.Below is a description of the common metric monitoring configuration options in MindSpore Transformers:
+The metric monitoring configuration is primarily used to configure methods to record metrics during training; please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html) for more details. Below is a description of the common metric monitoring configuration options in MindSpore Transformers:
| Parameters | Descriptions | Types |
|-----------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
@@ -320,7 +320,7 @@ The metric monitoring configuration is primarily used to configure methods to re
### TensorBoard Configuration
-The TensorBoard configuration is primarily used to configure parameters related to TensorBoard during training, allowing for real-time monitoring and visualization of training metrics, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/dev/function/monitor.html) for more details. Below is a description of the common TensorBoard configuration options in MindSpore Transformers:
+The TensorBoard configuration is primarily used to configure parameters related to TensorBoard during training, allowing for real-time monitoring and visualization of training metrics, please refer to [Training Metrics Monitoring](https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html) for more details. Below is a description of the common TensorBoard configuration options in MindSpore Transformers:
| Parameters | Descriptions | Types |
|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|
diff --git a/docs/mindformers/docs/source_en/function/dataset.md b/docs/mindformers/docs/source_en/feature/dataset.md
similarity index 99%
rename from docs/mindformers/docs/source_en/function/dataset.md
rename to docs/mindformers/docs/source_en/feature/dataset.md
index 9a4ac14ed082c26128558d9bf5e167db0097e5fb..86a2c3ca070e2df787ae83dee5d62bd60038d806 100644
--- a/docs/mindformers/docs/source_en/function/dataset.md
+++ b/docs/mindformers/docs/source_en/feature/dataset.md
@@ -1,6 +1,6 @@
# Dataset
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/dataset.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/dataset.md)
MindSpore Transformers currently supports multiple types of dataset loading methods, covering common open-source and custom scenarios. Specifically, it includes:
@@ -186,7 +186,7 @@ The following explains how to configure and use Megatron datasets in the configu
| eod | Token ID of the EOD token in the dataset |
| pad | Token ID of the pad token in the dataset |
- In addition, the Megatron dataset also depends on configurations such as `input_columns`, `construct_args_key`, and `full_batch`. For more details, refer to the [configuration file documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html).
+ In addition, the Megatron dataset also depends on configurations such as `input_columns`, `construct_args_key`, and `full_batch`. For more details, refer to the [configuration file documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html).
Here, we only explain how to configure them in different scenarios:
@@ -279,7 +279,7 @@ HuggingFace datasets support online and offline loading of datasets from both th
#### Dataset Loading Process
-
+
The online dataset loading and processing functionality is primarily implemented through `CommonDataLoader`. The data loading part can be customized via configuration files, with detailed configuration instructions available in the [dataloader parameter description](#dataloader-parameter-description). The online loading module requires users to implement customizations for different datasets. For example, the `AlpacaInstructDataHandler` class can be used to preprocess the `alpaca` dataset. For more information, please refer to [Custom Data Handler](#custom-data-handler).
@@ -393,7 +393,7 @@ When packing is configured, the dataset returns an `actual_seq_len` column. For
prefetch_size: 1
```
- 1. For parameter descriptions in `train_dataset`, please refer to the [documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html).
+ 1. For parameter descriptions in `train_dataset`, please refer to the [documentation](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html).
2. `AlpacaInstructDataHandler` is an online processing script developed for the `alpaca` dataset. If using a different dataset, you need to implement a custom data handler by referring to the [Custom Data Handler](#custom-data-handler) guide.
@@ -510,7 +510,7 @@ Users can define custom data handlers to apply various preprocessing logic to th
prefetch_size: 1
```
- The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+ The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
Custom data handler:
@@ -626,7 +626,7 @@ Users can define custom data handlers to apply various preprocessing logic to th
seed: 0
```
- The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+ The rest of the parameters can be described in "model training configuration" and "model evaluation configuration" in [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
Custom adgen_handler:
diff --git a/docs/mindformers/docs/source_en/usage/evaluation.md b/docs/mindformers/docs/source_en/feature/evaluation.md
similarity index 98%
rename from docs/mindformers/docs/source_en/usage/evaluation.md
rename to docs/mindformers/docs/source_en/feature/evaluation.md
index 909c34cc63cc2e873959297df507e4c276d8650a..3984f1e007cf54e3969dd1d405ec496bd8567d5a 100644
--- a/docs/mindformers/docs/source_en/usage/evaluation.md
+++ b/docs/mindformers/docs/source_en/feature/evaluation.md
@@ -1,6 +1,6 @@
# Evaluation
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/evaluation.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/evaluation.md)
## Harness Evaluation
@@ -43,7 +43,7 @@ pip install -e .
#### Preparations Before Evaluation
1. Create a new directory with e.g. the name `model_dir` for storing the model yaml files.
- 2. Place the model inference yaml configuration file (predict_xxx_.yaml) in the directory created in the previous step. The directory location of the reasoning yaml configuration file for different models refers to [model library](../start/models.md).
+ 2. Place the model inference yaml configuration file (predict_xxx_.yaml) in the directory created in the previous step. The directory location of the reasoning yaml configuration file for different models refers to [model library](../introduction/models.md).
3. Configure the yaml file. If the model class, model Config class, and model Tokenzier class in yaml use cheat code, that is, the code files are in [research](https://gitee.com/mindspore/mindformers/tree/dev/research) directory or other external directories, it is necessary to modify the yaml file: under the corresponding class `type` field, add the `auto_register` field in the format of `module.class`. (`module` is the file name of the script where the class is located, and `class` is the class name. If it already exists, there is no need to modify it.).
Using [predict_1lama3_1_8b. yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml) configuration as an example, modify some of the configuration items as follows:
@@ -58,7 +58,7 @@ pip install -e .
auto_register: llama3_tokenizer.Llama3Tokenizer
```
- For detailed instructions on each configuration item, please refer to the [configuration description](../appendix/conf_files.md).
+ For detailed instructions on each configuration item, please refer to the [configuration description](../feature/configuration.md).
 4. If you use the `ceval-valid`, `mmlu`, `cmmlu`, `race`, and `lambada` datasets for evaluation, you need to set `use_flash_attention` to `False`. Using `predict_llama3_1_8b.yaml` as an example, modify the yaml as follows:
```yaml
@@ -211,7 +211,7 @@ The currently adapted models and supported evaluation datasets are shown in the
Find the requirements.txt (VLMEvalKit/requirements.txt) file in the downloaded code and modify it to the following content:
- ```txt
+ ```text
gradio==4.40.0
huggingface_hub==0.24.2
imageio==2.35.1
@@ -362,7 +362,7 @@ For OpenEuler systems follow the steps below to install:
#### Preparations Before Evaluation
1. Create a new directory, for example named `model_dir`, to store the model yaml file;
-2. Place the model inference yaml configuration file (predict_xxx_. yaml) in the directory created in the previous step. For details, Please refer to the inference content of description documents for each model in the [model library](../start/models.md);
+2. Place the model inference yaml configuration file (predict_xxx.yaml) in the directory created in the previous step. For details, please refer to the inference content of the description documents for each model in the [model library](../introduction/models.md);
3. Configure the yaml file.
Using [predict_cogvlm2_image_llama3_chat_19b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/predict_cogvlm2_image_llama3_chat_19b.yaml) configuration as an example:
@@ -378,7 +378,7 @@ For OpenEuler systems follow the steps below to install:
vocab_file: "/{path}/tokenizer.model" # Specify the tokenizer file path
```
- Configure the yaml file. Refer to [configuration description](../appendix/conf_files.md).
+ Configure the yaml file. Refer to [configuration description](../feature/configuration.md).
4. The MMBench-Video dataset evaluation requires the use of the GPT-4 Turbo model for evaluation and scoring. Please prepare the corresponding API Key in advance and put it in the VLMEvalKit/.env file as follows:
```text
diff --git a/docs/mindformers/docs/source_en/function/high_availability.md b/docs/mindformers/docs/source_en/feature/high_availability.md
similarity index 73%
rename from docs/mindformers/docs/source_en/function/high_availability.md
rename to docs/mindformers/docs/source_en/feature/high_availability.md
index 72b5cc64b48e10964c9d55bada11a17da1d78928..bf7d8b4fa6b6fe89a5d429bfac39ae7cb5b8487c 100644
--- a/docs/mindformers/docs/source_en/function/high_availability.md
+++ b/docs/mindformers/docs/source_en/feature/high_availability.md
@@ -1,52 +1,60 @@
# High Availability
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/high_availability.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/high_availability.md)
## Overview
-MindSpore Transformers high availability provides the following three functions:
+MindSpore Transformers high availability provides the following four functions:
- **End-of-life CKPT**: It is mainly aimed at accelerating the fault recovery in the training process of large models. This feature verifies the integrity and consistency of the intermediate state data after a fault occurs during the training process and generates an end-of-life CheckPoint data, which can be used to recover the training and reduce the loss of training iterations caused by the fault.
- **UCE Fault-tolerant Recovery**: It mainly focuses on the detection of UCE faults in on-chip memory during the training process of large models, and accomplishes online repair to reach Step-level recomputation.
-- **Process-Level Rescheduling Recovery**: Instead of pulling up the entire cluster again after an anomaly in training occurs, simply restart or replace it on a node-by-node basis to complete the repair and continue training.
+- **TRE Training Result Exception Recovery**: It mainly focuses on the detection of value exceptions in loss, global norm, etc. during the training process of large models, and accomplishes online repair to reach Step-level recomputation.
+- **ARF Process-Level Rescheduling Recovery**: Instead of pulling up the entire cluster again after an anomaly in training occurs, simply restart or replace it on a node-by-node basis to complete the repair and continue training.
-The high availability feature is currently only supported in the MindSpore Ascend back-end graph schema; this feature also needs to support Step-level recovery, so only a sink_size of 1 is supported when configuring data sinking.
+Constraints and dependencies of the high availability functions:
-The high availability feature is based on the existence of a replica relationship between the two cards so that when one of the cards fails, it can be recovered from the other card, and therefore there will be two copies of redundancy in both the weights and the optimizer, which will take up more video memory. To ensure this redundancy relationship, data parallelism must be turned on to ensure that there are two cards with the same weights, and also if optimizer parallelism is turned on, it must be ensured that there are two cards with the same optimizer state.
+| | End-of-life CKPT | UCE | ARF | TRE |
+| - | - | - | - | - |
+| Depends on MindIO | Yes | Yes | Yes | No |
+| Replica relationship between cards | Yes | Yes | Yes | No |
+| Sink Size is 1 | Yes | Yes | Yes | No |
-All three functions can be turned on at the same time or individually. When these three functions are turned on in combination, the order in which they take effect is: UCE Fault Tolerance Recovery -> Process-Level Rescheduling Recovery -> End-of-Life CKPT, and if one of the functions can be recovered, the next function will not be executed. The end-of-life CKPT function serves as a final safeguard, and the entire training process exits upon completion of this function, so it will be turned on by default when the other two functions are turned on.
+These four high availability functions are currently only supported in graph mode on the MindSpore Ascend backend, in order to support Step-level recovery.
-The end-of-life CKPT saving of the Checkpoint file and the renewal of training from that file use the existing MindSpore Transformers capabilities in the same way, except that the end-of-life CKPT relies on the strategy file, so that folder needs to be configured for both the training and the renewal of the training.
+The replica relationship between cards is used to make sure that when one of the cards fails, it can be recovered from the other card. It requires that there be at least two redundant copies of both the weights and the optimizer state. To ensure this redundancy, data parallelism must be turned on so that there are two cards with the same weights, and if optimizer parallelism is turned on, it must also be ensured that there are two cards with the same optimizer state.
-When an exception triggers an end-of-life CheckPoint save, if de-redundant saving is not turned on, only one card in each data parallel field saves the CheckPoint, and the rest of the cards do not save the CheckPoint. Therefore, when resuming training, it is also necessary to enable the high availability feature in order to resume, otherwise the other cards will not be able to find the available CheckPoint and will report an error exit. Users can determine whether a CheckPoint is triggered by the end-of-life CKPT feature by calculating whether the number of CheckPoints saved by the distribution is less than the number of clusters.
+When the End-of-life CKPT, UCE and ARF functions are turned on in combination, the order in which they take effect is: UCE -> ARF -> End-of-life CKPT; if one of the functions recovers successfully, the remaining functions will not be executed. The end-of-life CKPT function serves as a final safeguard, and the entire training process exits upon its completion, so it is turned on by default when the UCE or ARF functions are turned on.
## Instructions for Use
-The high availability feature switch is enabled by an environment variable, and the switch is not set separately in the YAML configuration file, but the YAML file needs to be able to configure the weights and optimizer states to be the same for both cards, as detailed in the [Replica Relationships Configuration](#replica-relationships-configuration) section of this document.
+The high availability feature switch is enabled by an environment variable; the switch is not set separately in the YAML configuration file. For high availability functions which depend on the replica relationship between cards, the YAML file needs to configure the weights and optimizer states to be the same for two cards, as detailed in the [Replica Relationships Configuration](#replica-relationships-configuration) section of this document.
-The high availability feature relies on the user to install the MindIO TFT SDK package. Please refer to [Install MindIO TFT SDK on compute nodes](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft011.html).
+For high availability functions which depend on MindIO, the user needs to install the MindIO TFT SDK package. Please refer to [Install MindIO TFT SDK on compute nodes](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft011.html).
### Environment Variable Configuration
```shell
export MINDIO_FOR_MINDSPORE=1
-export MS_ENABLE_TFT="{TTP:1,UCE:1,ARF:1}"
+export MS_ENABLE_TFT="{TTP:1,UCE:1,ARF:1,TRE:1}"
export MS_TFT_IP=127.0.0.1
export MS_TFT_PORT=30051
```
- `MINDIO_FOR_MINDSPORE`: Enabling MindIO TFT SDK to support MindSpore
-- `MS_ENABLE_TFT`: Indicates that the TTP, UCE and ARF functions are enabled. If you want to enable only one of these functions, set the corresponding value to 1.
+- `MS_ENABLE_TFT`: Indicates that the TTP, UCE, ARF and TRE functions are enabled. If you want to enable only one of these functions, set the corresponding value to 1.
- **TTP (Try To Persist)**: End-of-life CKPT function
- **UCE (Uncorrectable Memory Error)**: UCE fault tolerance recovery
- **ARF (Air Refuelling)**: Process-level rescheduling recovery function
+ - **TRE (Training Result Error)**: Training result exception recovery
- When UCE or ARF is enabled, TTP is enabled by default.
+  - The TRE function cannot be used together with the UCE or ARF features.
+  - TRE does not depend on MindIO. It is not necessary to configure the MindIO-related environment variables MINDIO_FOR_MINDSPORE, MS_TFT_IP, and MS_TFT_PORT to enable only the TRE feature.
- `MS_TFT_IP` and `MS_TFT_PORT` represent the IP and port number of TFT Controller respectively, no default value, need to be specified by user. If the Controller is started by MindSpore Transformers, the IP and port number of the rank0 node in the user's cluster are configured. If the Controller is started by the user, configure the IP and port number of the Controller.
### YAML Configuration
-The YAML configuration consists of two parts: the end-of-life CKPT saving and recovery configuration and the highly available replica relationship configuration.
+The YAML configuration consists of two parts: the end-of-life CKPT saving and recovery configuration, and the configuration of the replica relationship between cards.
#### Saving and Restoring Configurations
@@ -90,7 +98,7 @@ The end-of-life CheckPoint preservation and recovery capabilities are used for i
#### Replica Relationships Configuration
-The key to the three functions of high availability is to configure the weight and optimizer copy redundancy relationship. The core of the configuration is that the dimension of the data parallel domain is greater than 2, and if you overlay the optimizer parallelism, you need to ensure that the number of copies of the optimizer is greater than 2 at the same time. So the configuration is divided into two categories, with the optimizer parallelism and without the optimizer parallelism. The following is an example of how to configure 8 cards.
+The key to the end-of-life CheckPoint, UCE and ARF functions of high availability is to configure the weight and optimizer copy redundancy relationship. The core of the configuration is that the dimension of the data parallel domain is greater than 2, and if you overlay the optimizer parallelism, you need to ensure that the number of copies of the optimizer is greater than 2 at the same time. So the configuration is divided into two categories, with the optimizer parallelism and without the optimizer parallelism. The following is an example of how to configure 8 cards.
- **Without the Optimizer Parallelism**
@@ -120,7 +128,7 @@ The key to the three functions of high availability is to configure the weight a
pipeline_stage: 1
```
-#### Examples
+#### End-of-life CheckPoint Examples
This section demonstrates the use of the end-of-life CKPT using Llama2-13B training as an example.
diff --git a/docs/mindformers/docs/source_zh_cn/function/image/TrainingStateMonitor_log.png b/docs/mindformers/docs/source_en/feature/images/TrainingStateMonitor_log.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/function/image/TrainingStateMonitor_log.png
rename to docs/mindformers/docs/source_en/feature/images/TrainingStateMonitor_log.png
diff --git a/docs/mindformers/docs/source_zh_cn/function/image/adam_m_norm.png b/docs/mindformers/docs/source_en/feature/images/adam_m_norm.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/function/image/adam_m_norm.png
rename to docs/mindformers/docs/source_en/feature/images/adam_m_norm.png
diff --git a/docs/mindformers/docs/source_zh_cn/function/image/commondataloader.png b/docs/mindformers/docs/source_en/feature/images/commondataloader.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/function/image/commondataloader.png
rename to docs/mindformers/docs/source_en/feature/images/commondataloader.png
diff --git a/docs/mindformers/docs/source_zh_cn/function/image/local_loss&local_norm.png b/docs/mindformers/docs/source_en/feature/images/local_loss&local_norm.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/function/image/local_loss&local_norm.png
rename to docs/mindformers/docs/source_en/feature/images/local_loss&local_norm.png
diff --git a/docs/mindformers/docs/source_zh_cn/function/image/tensorboard_scalar.png b/docs/mindformers/docs/source_en/feature/images/tensorboard_scalar.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/function/image/tensorboard_scalar.png
rename to docs/mindformers/docs/source_en/feature/images/tensorboard_scalar.png
diff --git a/docs/mindformers/docs/source_zh_cn/function/image/tensorboard_text.png b/docs/mindformers/docs/source_en/feature/images/tensorboard_text.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/function/image/tensorboard_text.png
rename to docs/mindformers/docs/source_en/feature/images/tensorboard_text.png
diff --git a/docs/mindformers/docs/source_en/feature/infer_function.rst b/docs/mindformers/docs/source_en/feature/infer_function.rst
new file mode 100644
index 0000000000000000000000000000000000000000..d6f8b3d5615d797af8f8f1f13ba6d2d7b3dccd47
--- /dev/null
+++ b/docs/mindformers/docs/source_en/feature/infer_function.rst
@@ -0,0 +1,9 @@
+Infer Function
+================
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+
+ evaluation
+ quantization
diff --git a/docs/mindformers/docs/source_en/function/logs.md b/docs/mindformers/docs/source_en/feature/logging.md
similarity index 85%
rename from docs/mindformers/docs/source_en/function/logs.md
rename to docs/mindformers/docs/source_en/feature/logging.md
index 4e11068b14df3d797ae0618453a42c5300f3ee01..63da65ff41bc97c33a7d05a8ddfc0964fbcd2c2a 100644
--- a/docs/mindformers/docs/source_en/function/logs.md
+++ b/docs/mindformers/docs/source_en/feature/logging.md
@@ -1,12 +1,12 @@
# Logs
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/logs.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/logging.md)
## Logs Saving
### Overview
-MindSpore TransFormers will write the model's training configuration, training steps, loss, throughput and other information into the log. Developers can specify the path for log storage.
+MindSpore Transformers will write the model's training configuration, training steps, loss, throughput and other information into the log. Developers can specify the path for log storage.
### Training Log Directory Structure
@@ -40,7 +40,7 @@ output
### Configuration and Usage
-By default, MindSpore TransFormer specifies the file output path as `./output` in the training yaml file. If you start the training task under the `mindformers` path, the log output generated by the training will be saved under `mindformers/output` by default.
+By default, MindSpore Transformers specifies the file output path as `./output` in the training yaml file. If you start the training task under the `mindformers` path, the log output generated by the training will be saved under `mindformers/output` by default.
#### YAML Parameter Configuration
@@ -54,13 +54,13 @@ output_dir: './output' # path to save logs/checkpoint/strategy
#### Specifying Output Directory for Single-Card Tasks
-In addition to specifying the yaml file configuration, MindSpore TransFormer also supports [run_mindformer In the one-click start script](https://www.mindspore.cn/mindformers/docs/en/dev/function/start_tasks.html#run-mindformer-one-click-start-script),
+In addition to specifying the yaml file configuration, MindSpore Transformers also supports the [run_mindformer one-click start script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html#run-mindformer-one-click-start-script);
use the `--output_dir` start command to specify the log output path.
> If the output path is configured here, it will overwrite the configuration in the yaml file!
#### Distributed Task Specifies the Output Directory
-If the model training requires multiple servers, use the [distributed task launch script](https://www.mindspore.cn/mindformers/docs/en/dev/function/start_tasks.html#distributed-task-pull-up-script) to start the distributed training task.
+If the model training requires multiple servers, use the [distributed task launch script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html#distributed-task-pull-up-script) to start the distributed training task.
If shared storage is set, you can also specify the input parameter `LOG_DIR` in the startup script to specify the log output path of the Worker and Scheduler, and output the logs of all machine nodes to one path for unified observation.
diff --git a/docs/mindformers/docs/source_en/function/other_features.md b/docs/mindformers/docs/source_en/feature/memory_optimization.md
similarity index 50%
rename from docs/mindformers/docs/source_en/function/other_features.md
rename to docs/mindformers/docs/source_en/feature/memory_optimization.md
index 12100a2e31d0f0636ab8310d2bd90099fa1c2e9a..b38779b9f1016c8c8aeb1e57115854e70684f957 100644
--- a/docs/mindformers/docs/source_en/function/other_features.md
+++ b/docs/mindformers/docs/source_en/feature/memory_optimization.md
@@ -1,10 +1,6 @@
-# Other features
+# Memory Optimization Features
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/other_features.md)
-
-During the large-scale training of deep learning models, challenges such as memory limitations, effective utilization of computational resources, and synchronization issues in distributed training are encountered. To address these challenges, training optimization algorithms are employed to enhance training efficiency, accelerate convergence, and improve the final model performance.
-
-MindSpore Transformer provides optimization algorithms like Recomputation, Gradient Accumulation, and Gradient Clipping for use during training.
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/memory_optimization.md)
## Recomputation
@@ -48,7 +44,7 @@ recompute_config:
The log will print the recalculation strategy information after normalizing the input format:
-```log
+```text
INFO - Formative layer_recompute: [[2, 1, 0, 0, 0], [1, 0, 0, 0, 0]]
INFO - Formative select_recompute: {'feed_forward\.w1\.activation\.silu': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'feed_forward\.mul': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'feed_forward\.w1\.matmul': [[1, 0, 0, 0, 0], [2, 1, 0, 0, 0]], 'feed_forward\.w3\.matmul': [[1, 1, 0, 0, 0], [1, 0, 0, 0, 0]]}
INFO - Formative select_comm_recompute: {'ffn_norm\.norm': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'attention_norm\.norm': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]]}
@@ -72,70 +68,273 @@ The main parameters for recomputation configuration are listed in the following
| mp_comm_recompute | Model parallel communication recomputation, whether to recompute communication operators in model parallelism. | (bool, optional) - After turning on, in automatic parallelism or semi-automatic parallelism mode, specify whether to recompute the communication operations introduced by model parallelism in the cell. Default value: `True`. |
| recompute_slice_activation | Slice recomputation, whether to slice the cell output that will be kept in memory. | (bool, optional) - Default value: `False`. |
-## Gradient Accumulation
+## Fine-Grained Activations SWAP
### Overview
-MindSpore supported the gradient accumulation implementation interface `mindspore.nn.wrap.cell_wrapper.GradAccumulationCell` in versions after 2.1.1, which provides the gradient accumulation capability by splitting MiniBatch. MindSpore Transformer encapsulates it into a unified training process and enables it through yaml configuration. For the principle of gradient accumulation and the ability of framework measurement, please refer to [MindSpore Document: Gradient Accumulation](https://www.mindspore.cn/tutorials/en/master/parallel/distributed_gradient_accumulation.html).
+In traditional large-scale model training tasks, the memory resources of computing cards often become a bottleneck. Although adopting larger-scale model parallel (mp) and pipeline parallel (pp) can alleviate the memory pressure on individual computing cards to some extent, it requires larger-scale cluster resources, and excessive communication can significantly reduce the model's Model FLOPs Utilization (MFU). Under limited cluster resources, recomputation is another effective method to mitigate memory pressure. It reduces the memory footprint of activations by discarding the storage of activation values during the forward propagation phase and recomputing the required activation values during gradient backpropagation. However, since recomputation introduces additional computational overhead, this method also significantly decreases the MFU of model training.
-### Configuration and Usage
+Against this backdrop, fine-grained activations SWAP can provide a third effective approach to reduce memory usage while offering greater end-to-end performance advantages. Specifically, SWAP offloads activations that need to be stored long-term to the host side during the forward propagation phase and prefetches them back to the device side in advance when they are needed during backpropagation. In terms of resource utilization, fine-grained activations SWAP leverages D2H/H2D bandwidth, which can overlap with computation tasks and D2D communication tasks during training, thereby masking the overhead of memory transfers.
-#### YAML Parameter Configuration
+The fine-grained activations SWAP technology offers high flexibility in usage. During the forward propagation phase of large model training, multiple activations of varying data sizes are generated, and users can selectively swap specific activations at operator granularity. When the model type or configuration changes, users can flexibly adjust the corresponding SWAP strategy to minimize memory overhead and achieve optimal performance.
+
+### Instructions for Use
+
+#### Constraint Scenarios
+
+- Only supports static graph O0/O1 mode
+- Compatible with Llama-family dense models; MoE sparse models will be supported in future updates
+- Somas does not support heterogeneity, so the following needs to be set in the configuration file:
+
+ ```yaml
+ context:
+  memory_optimize_level: O0
+ ```
+
+- When pipeline parallelism is disabled, the lazy_inline scenario must be enabled by setting the environment variable:
+
+ ```bash
+ ENABLE_LAZY_INLINE_NO_PIPELINE=1
+ ```
+
+- Only supports the Ascend backend
+
+#### API Instructions
+
+Fine-grained activations SWAP is enabled through the `swap_config` field in YAML configuration, which includes four functional interfaces: `swap`, `default_prefetch`, `layer_swap`, and `op_swap`. These interfaces allow users to flexibly enable SWAP for specific layers or specific operators within layers.
+
+> MindSpore framework currently decouples memory offloading and memory release. When activations are offloaded from the device side to the host side, the memory space occupied on the device side is not immediately released even after all data has been transferred. An explicit release operation is required instead. Before triggering the memory release, the system checks whether the activation offloading is complete. If not, the process will wait in place until the offloading finishes.
+
+| Configuration Item | Type | Description |
+|:--:|:--:|:---|
+| swap | bool | Default False. When set to False, all four functional interfaces are disabled. When set to True, activations SWAP is enabled, and the system checks whether layer_swap and op_swap are None. If both are None, the default SWAP strategy is applied, which enables SWAP for the flash_attention operator across all layers. If either layer_swap or op_swap has a non-None value, the default policy is overridden, and SWAP is enabled according to the configurations in layer_swap and op_swap. |
+| default_prefetch | int | Default 1; only takes effect when swap=True, layer_swap=None, and op_swap=None. It controls the timing of releasing memory in the forward phase and starting prefetch in the backward phase under the default SWAP strategy. A larger `default_prefetch` delays memory release during the forward phase, keeping device memory occupied by activations locked for an extended period after offloading and preventing reuse by other data blocks; it also starts prefetching from host to device earlier during the backward phase, applying memory pressure prematurely. A smaller `default_prefetch` releases memory earlier in the forward phase but may introduce idle waiting for copy operations to complete; in addition, delayed prefetch in the backward phase may cause computation stalls if prefetching is not finished before the activation is used, impacting end-to-end performance. This interface allows users to fine-tune memory release and prefetch timing for optimal memory efficiency and performance. |
+| layer_swap | list | Default None. When set to None, this interface is inactive. When the type is List, this interface contains several list elements of the Dict type. Each Dict element contains two keys: `backward_prefetch`, and `layers`, and provides the prefetch opportunity and layer index for enabling swap. |
+| op_swap | list | Default None. When set to None, this interface is inactive. When the type is List, this interface contains several list elements of the Dict type. Each Dict element contains three keys: `op_name`, `backward_prefetch`, and `layers`, and provides the prefetch opportunity, operator name, and layer index for enabling swap. |
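+
+As an illustrative sketch (the layer indices and prefetch values are assumptions, and the operator name follows the pattern shown in the case log at the end of this document):
+
+```yaml
+swap_config:
+  swap: True
+  layer_swap:
+    - backward_prefetch: 20                  # prefetch opportunity for whole layers
+      layers: [0, 1]                         # enable SWAP for layers 0 and 1
+  op_swap:
+    - op_name: 'attention.flash_attention'   # operator name pattern (assumed)
+      backward_prefetch: 10
+      layers: [2, 3]                         # enable SWAP for this operator in layers 2 and 3
+```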
+
+#### Used together with Recomputation
+
+Fine-Grained Activations SWAP and Recomputation have coupling effects:
-To enable gradient accumulation, users only need to configure the `gradient_accumulation_steps` item under the `runner_config` item in the configuration file and set it to the required number of gradient accumulation steps:
+1. If any operator has both recomputation and SWAP enabled simultaneously, recomputation will take effect while SWAP will not.
+2. For any operator with SWAP enabled, if its output is used by an operator with recomputation enabled, then SWAP for that operator will not take effect.
+3. The YAML configuration interface for recomputation only supports enabling recomputation for a specific number of layers sequentially from front to back, rather than selecting specific layers or specific operators within layers. This means when using both SWAP and recomputation together, SWAP can only be enabled for later layers or operators within later layers, preventing full utilization of SWAP's benefits. Therefore, when and only when `swap=True`, the recomputation interface functionality will be adjusted as shown in the table below.
+
+| Interface Name | Original Functionality | Functionality When Enabling SWAP |
+|:--:|:---|:---|
+| recompute | Determine the number of layers with recomputation enabled in each pipeline stage. | Pipeline stage-agnostic, only accepts bool/list type inputs. When bool type: enables recomputation for all layers; when list type: uses layer indices to enable recomputation for specific layers. |
+| select_recompute | Determine the number of layers with recomputation enabled for specific operators in each pipeline stage. | Pipeline stage-agnostic, for each operator's key-value pair, only accepts bool/list type inputs. When bool type: enables recomputation for all layers; when list type: uses layer indices to enable recomputation for specific layers. |
+| select_comm_recompute | Determine the number of layers with recomputation enabled for communication operators in each pipeline stage. | Pipeline stage-agnostic, only accepts bool/list type inputs. When bool type: enables recomputation for all layers; when list type: uses layer indices to enable recomputation for specific layers. |
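+
+As a hypothetical illustration of the adjusted semantics (the layer indices are placeholders), the sketch below enables full recomputation for layers 0 and 1 via a list, per-operator recomputation of `feed_forward` for layers 2 and 3, and communication recomputation for all layers via a bool:
+
+```yaml
+recompute_config:
+  recompute: [0, 1]             # list: full recomputation only for layers 0 and 1
+  select_recompute:
+    'feed_forward': [2, 3]      # per-operator list: recompute feed_forward in layers 2 and 3
+  select_comm_recompute: True   # bool: enable communication recomputation for all layers
+swap_config:
+  swap: True                    # the adjusted semantics apply only when swap=True
+```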
+
+### Cases of Fine-Grained Activations SWAP
+
+This section demonstrates the usage of fine-grained activations SWAP using Llama2-7B training as an example.
+
+#### Environment Preparation
+
+Download MindSpore Transformers and prepare a pre-training dataset, such as wikitext.
+
+#### Case 1: Default SWAP Strategy
+
+Modify and supplement the recomputation and SWAP configurations in YAML as follows:
```yaml
-# runner config
-runner_config:
-...
-gradient_accumulation_steps: 4
-...
+context:
+ memory_optimize_level: "O0"
+model:
+ model_config:
+ num_layers: 4
+recompute_config:
+ recompute: False
+ select_recompute: False
+ select_comm_recompute: False
+swap_config:
+ swap: True
+ default_prefetch: 10
```
-#### Key Parameters Introduction
+Execute the following script from the repository root directory to launch single-node 8-NPU training; the user needs to pass the YAML file path as the script's argument:
-| Parameter | Description | Value Description |
-|-----------------------------|----------------------------------------------------------------------------------------------|---------------------------------------|
-| gradient_accumulation_steps | The number of steps to accumulate gradients before performing backpropagation. Default: `1`. | (int, required) - Default value: `1`. |
+```bash
+export GLOG_v=1
+export MS_MEMORY_STATISTIC=1
+export ENABLE_LAZY_INLINE_NO_PIPELINE=1
+YAML_FILE=$1 # User specifies the YAML file path.
+ROOT_PATH=`pwd`
-#### Other Ways to Use Gradient Accumulation
+bash ./scripts/msrun_launcher.sh "run_mindformer.py \
+ --config ${ROOT_PATH}/${YAML_FILE} \
+ --run_mode train \
+ --use_parallel True" \
+ 8 8 8118 0 output/msrun False 300
+```
-In addition to the configuration file, when launching the `run_mindformer.py` script, you can specify the `--gradient_accumulation_steps` argument to use the gradient accumulation feature.
+After training completes, execute the command `cat output/msrun/worker_0.log | grep 'attention.flash_attention'` to check the execution status of the default SWAP strategy:
-#### Usage Restrictions of Gradient Accumulation
+```text
+-INFO - Set op_swap at layer 0: attention.flash_attention, value=10
+-INFO - Set op_swap at layer 1: attention.flash_attention, value=10
+-INFO - Set op_swap at layer 2: attention.flash_attention, value=10
+-INFO - Set op_swap at layer 3: attention.flash_attention, value=10
+```
-> Enabling gradient accumulation will increase memory overhead. Please pay attention to memory management to prevent Out Of Memory.
+The default SWAP strategy is executed successfully.
-1. Since the implementation of `GradAccumulationCell` relies on parallel features, gradient accumulation is currently only supported in **semi-automatic parallel mode**;
-2. In addition, in the pipeline parallel scenario, the meaning of gradient accumulation is the same as micro_batch and will not take effect. Please configure the `micro_batch_num` item to increase the training batch_size.
+#### Case 2: Select Specific Layers to Enable SWAP
-## Gradient Clipping
+Modify and supplement the recomputation and SWAP configurations in YAML as follows:
-### Overview
+```yaml
+context:
+ memory_optimize_level: "O0"
+model:
+ model_config:
+ num_layers: 4
+recompute_config:
+ recompute: False
+ select_recompute: False
+ select_comm_recompute: False
+swap_config:
+ swap: True
+ layer_swap:
+ - backward_prefetch: 20
+ layers: [0,3]
+```
-The gradient clipping algorithm can avoid the situation where the reverse gradient is too large and the optimal solution is skipped.
+Execute the following script from the repository root directory to launch single-node 8-NPU training; the user needs to pass the YAML file path as the script's argument:
-### Configuration and Usage
+```bash
+export GLOG_v=1
+export MS_MEMORY_STATISTIC=1
+export ENABLE_LAZY_INLINE_NO_PIPELINE=1
+YAML_FILE=$1 # User specifies the YAML file path.
+ROOT_PATH=`pwd`
-#### YAML Parameter Configuration
+bash ./scripts/msrun_launcher.sh "run_mindformer.py \
+ --config ${ROOT_PATH}/${YAML_FILE} \
+ --run_mode train \
+ --use_parallel True" \
+ 8 8 8118 0 output/msrun False 300
+```
+
+After training completes, execute the command `cat output/msrun/worker_0.log | grep 'Set layer swap at'` to check the execution status of the layer-wise SWAP strategy:
+
+```text
+-INFO - Set layer swap at layer 0 and value is: 20
+-INFO - Set layer swap at layer 3 and value is: 20
+```
+
+The strategy of enabling SWAP for specific layers is executed successfully.
-In MindSpore TransFormers, the default training process `MFTrainOneStepCell` integrates gradient clipping logic.
+#### Case 3: Select Specific Operators within Layers to Enable SWAP
-You can use the following example to enable gradient clipping:
+Modify and supplement the recomputation and SWAP configurations in YAML as follows:
```yaml
-# wrapper cell config
-runner_wrapper:
-type: MFTrainOneStepCell
-...
-use_clip_grad: True
-max_grad_norm: 1.0
-...
+context:
+ memory_optimize_level: "O0"
+model:
+ model_config:
+ num_layers: 4
+recompute_config:
+ recompute: False
+ select_recompute: False
+ select_comm_recompute: False
+swap_config:
+ swap: True
+ op_swap:
+ - op_name: 'attention'
+ backward_prefetch: 20
+ layers: [0,1,2]
+ - op_name: 'attention'
+ backward_prefetch: 10
+ layers: [3]
+ - op_name: 'feed_forward'
+ backward_prefetch: 15
+ layers: [1,2]
```
-#### Key Parameters Introduction
+Execute the following script from the repository root directory to launch single-node 8-NPU training; the user needs to pass the YAML file path as the script's argument:
+
+```bash
+export GLOG_v=1
+export MS_MEMORY_STATISTIC=1
+export ENABLE_LAZY_INLINE_NO_PIPELINE=1
+YAML_FILE=$1 # User specifies the YAML file path.
+ROOT_PATH=`pwd`
+
+bash ./scripts/msrun_launcher.sh "run_mindformer.py \
+ --config ${ROOT_PATH}/${YAML_FILE} \
+ --run_mode train \
+ --use_parallel True" \
+ 8 8 8118 0 output/msrun False 300
+```
+
+After training completes, execute the command `cat output/msrun/worker_0.log | grep 'Set op_swap at layer'` to check the execution status of the operator-wise SWAP strategy:
+
+```text
+-INFO - Set op_swap at layer 0: .attention, value=20
+-INFO - Set op_swap at layer 1: .attention, value=20, .feed_forward, value=15
+-INFO - Set op_swap at layer 2: .attention, value=20, .feed_forward, value=15
+-INFO - Set op_swap at layer 3: .attention, value=10
+```
+
+The strategy of enabling SWAP for specific operators within layers is executed successfully.
+
+#### Case 4: Use Fine-Grained Activations SWAP together with Recomputation
+
+Modify and supplement the recomputation and SWAP configurations in YAML as follows:
+
+```yaml
+context:
+ memory_optimize_level: "O0"
+model:
+ model_config:
+ num_layers: 4
+recompute_config:
+ recompute: False
+ select_recompute:
+ 'feed_forward': [0,3]
+ select_comm_recompute: False
+swap_config:
+ swap: True
+ op_swap:
+ - op_name: 'attention'
+ backward_prefetch: 20
+ layers: [0,1,2]
+ - op_name: 'attention'
+ backward_prefetch: 10
+ layers: [3]
+ - op_name: 'feed_forward'
+ backward_prefetch: 15
+ layers: [1,2]
+```
+
+Execute the following script from the repository root directory to launch single-node 8-NPU training; the user needs to pass the YAML file path as the script's argument:
+
+```bash
+export GLOG_v=1
+export MS_MEMORY_STATISTIC=1
+export ENABLE_LAZY_INLINE_NO_PIPELINE=1
+YAML_FILE=$1 # User specifies the YAML file path.
+ROOT_PATH=`pwd`
+
+bash ./scripts/msrun_launcher.sh "run_mindformer.py \
+ --config ${ROOT_PATH}/${YAML_FILE} \
+ --run_mode train \
+ --use_parallel True" \
+ 8 8 8118 0 output/msrun False 300
+```
+
+After training completes, execute the command `cat output/msrun/worker_0.log | grep 'Set op_swap at layer' -C 1` to check the execution status of SWAP combined with recomputation:
+
+```text
+-INFO - Set select recompute at layer 0: feed_forward
+-INFO - Set op_swap at layer 0: .attention, value=20
+-INFO - Set op_swap at layer 1: .attention, value=20, .feed_forward, value=15
+-INFO - Set op_swap at layer 2: .attention, value=20, .feed_forward, value=15
+-INFO - Set select recompute at layer 3: feed_forward
+-INFO - Set op_swap at layer 3: .attention, value=10
+```
-| Parameter | Description | Value Description |
-|---------------|----------------------------------------------------------------------------------------|-------------------------------------------|
-| use_clip_grad | Controls whether gradient clipping is enabled during training, default value: `False`. | (bool, optional) - Default: `False`. |
-| max_grad_norm | Controls the maximum norm value of gradient clipping, default value: `1.0`. | (float, optional) - Default: `1.0`. |
\ No newline at end of file
+The strategy of enabling fine-grained activations SWAP together with recomputation is executed successfully.
diff --git a/docs/mindformers/docs/source_en/function/monitor.md b/docs/mindformers/docs/source_en/feature/monitor.md
similarity index 97%
rename from docs/mindformers/docs/source_en/function/monitor.md
rename to docs/mindformers/docs/source_en/feature/monitor.md
index dbeeed4d6f5284d4a751c5203998a3918a054ba2..b39f26dd2d3e5efd9a89d703ad238646aa210682 100644
--- a/docs/mindformers/docs/source_en/function/monitor.md
+++ b/docs/mindformers/docs/source_en/feature/monitor.md
@@ -1,6 +1,6 @@
# Training Metrics Monitoring
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/monitor.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/monitor.md)
MindSpore Transformers supports TensorBoard as a visualization tool for monitoring and analyzing various metrics and information during training. TensorBoard is a standalone visualization library that requires the user to manually install it, and it provides an interactive way to view loss, precision, learning rate, gradient distribution, and a variety of other things in training. After the user configures TensorBoard in the training `yaml` file, the event file is generated and updated in real time during the training of the large model, and the training data can be viewed via commands.
@@ -117,7 +117,7 @@ The names and descriptions of the metrics monitored by `MFLossMonitor` are liste
In Tensorboard SCALARS page, the above metrics (assumed to be named `scalar_name`) have drop-down tabs for `scalar_name` and `scalar_name-vs-samples`, except for the last two. A line plot of this scalar versus the number of training iterations is shown under `scalar_name`, and a line plot of this scalar versus the number of samples is shown under `scalar_name-vs-samples`. An example of a plot of learning rate `learning-rate` is shown below:
-
+
#### TrainingStateMonitor Monitoring Metrics
@@ -138,23 +138,23 @@ Depending on the specific settings, the above metrics will be displayed in the T
**Example of logging effect**
-
+
**Example of tensorboard visualization**
adam_m_norm
-
+
local_loss and local_norm
-
+
### Description of Text Data Visualization
On the TEXT page, a tab exists for each training configuration where the values for that configuration are recorded. This is shown in the following figure:
-
+
All configuration names and descriptions are listed below:
@@ -223,4 +223,4 @@ All configuration names and descriptions are listed below:
> 2. Configuration parameters set by the user in the training configuration file `yaml`;
> 3. Default configuration parameters during training.
>
-> Refer to [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html) for all configurable parameters.
\ No newline at end of file
+> Refer to [Configuration File Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html) for all configurable parameters.
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/feature/other_training_features.md b/docs/mindformers/docs/source_en/feature/other_training_features.md
new file mode 100644
index 0000000000000000000000000000000000000000..cf89ea6e4ef5d73f5ea27500dff6a9ad326b12e1
--- /dev/null
+++ b/docs/mindformers/docs/source_en/feature/other_training_features.md
@@ -0,0 +1,75 @@
+# Other Training Features
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/other_training_features.md)
+
+Large-scale training of deep learning models faces challenges such as memory limitations, effective utilization of computational resources, and synchronization in distributed training. To address these challenges, training optimization algorithms are employed to enhance training efficiency, accelerate convergence, and improve final model performance.
+
+MindSpore Transformers provides optimization algorithms like Recomputation, Gradient Accumulation, and Gradient Clipping for use during training.
+
+## Gradient Accumulation
+
+### Overview
+
+MindSpore versions later than 2.1.1 provide the gradient accumulation interface `mindspore.nn.wrap.cell_wrapper.GradAccumulationCell`, which implements gradient accumulation by splitting a MiniBatch. MindSpore Transformers encapsulates it into the unified training process and enables it through YAML configuration. For the principle of gradient accumulation and the framework's capability, please refer to [MindSpore Document: Gradient Accumulation](https://www.mindspore.cn/tutorials/en/master/parallel/distributed_gradient_accumulation.html).
+
+### Configuration and Usage
+
+#### YAML Parameter Configuration
+
+To enable gradient accumulation, users only need to configure the `gradient_accumulation_steps` item under the `runner_config` item in the configuration file and set it to the required number of gradient accumulation steps:
+
+```yaml
+# runner config
+runner_config:
+  ...
+  gradient_accumulation_steps: 4
+  ...
+```
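+
+As a rough sketch of the effect (the `batch_size` value below is a hypothetical placeholder): with this configuration, gradients from 4 consecutive MiniBatches are accumulated before each parameter update, so the effective batch size per optimizer step is `batch_size × gradient_accumulation_steps` per card (further multiplied by the data-parallel size in distributed training).
+
+```yaml
+runner_config:
+  batch_size: 2                   # samples per MiniBatch on each card (placeholder value)
+  gradient_accumulation_steps: 4  # accumulate 4 MiniBatches, then update once
+                                  # effective batch size per update: 2 × 4 = 8 per card
+```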
+
+#### Key Parameters Introduction
+
+| Parameter | Description | Value Description |
+|-----------------------------|----------------------------------------------------------------------------------------------|---------------------------------------|
+| gradient_accumulation_steps | The number of steps over which gradients are accumulated before backpropagation is performed. | (int, optional) - Default value: `1`. |
+
+#### Other Ways to Use Gradient Accumulation
+
+In addition to the configuration file, when launching the `run_mindformer.py` script, you can specify the `--gradient_accumulation_steps` argument to use the gradient accumulation feature.
+
+#### Usage Restrictions of Gradient Accumulation
+
+> Enabling gradient accumulation increases memory overhead. Please pay attention to memory management to prevent out-of-memory errors.
+
+1. Since the implementation of `GradAccumulationCell` relies on parallel features, gradient accumulation is currently only supported in **semi-automatic parallel mode**;
+2. In addition, in the pipeline parallel scenario, gradient accumulation has the same meaning as micro-batching and will not take effect; please configure the `micro_batch_num` item instead to increase the effective training batch size, as sketched below.
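+
+A minimal sketch of this alternative (the values in `parallel_config` are hypothetical placeholders):
+
+```yaml
+parallel_config:
+  pipeline_stage: 2     # pipeline parallelism enabled
+  micro_batch_num: 4    # plays the role of gradient accumulation in this scenario
+```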
+
+## Gradient Clipping
+
+### Overview
+
+Gradient clipping prevents an excessively large backward gradient from causing the optimization to skip over the optimal solution.
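+
+Concretely, clipping by global norm (stated here as standard background, not necessarily the exact internal implementation) rescales the gradient vector $g$ whenever its L2 norm exceeds the threshold:
+
+$$g \leftarrow g \cdot \min\left(1, \frac{\text{max\_grad\_norm}}{\lVert g \rVert_2}\right)$$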
+
+### Configuration and Usage
+
+#### YAML Parameter Configuration
+
+In MindSpore Transformers, the default training process `MFTrainOneStepCell` integrates gradient clipping logic.
+
+You can use the following example to enable gradient clipping:
+
+```yaml
+# wrapper cell config
+runner_wrapper:
+  type: MFTrainOneStepCell
+  ...
+  use_clip_grad: True
+  max_grad_norm: 1.0
+  ...
+```
+
+#### Key Parameters Introduction
+
+| Parameter | Description | Value Description |
+|---------------|----------------------------------------------------------------------------------------|-------------------------------------------|
+| use_clip_grad | Controls whether gradient clipping is enabled during training, default value: `False`. | (bool, optional) - Default: `False`. |
+| max_grad_norm | Controls the maximum norm value of gradient clipping, default value: `1.0`. | (float, optional) - Default: `1.0`. |
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/function/distributed_parallel.md b/docs/mindformers/docs/source_en/feature/parallel_training.md
similarity index 95%
rename from docs/mindformers/docs/source_en/function/distributed_parallel.md
rename to docs/mindformers/docs/source_en/feature/parallel_training.md
index ec76fd1ab85c6c3f18ce2ba3551edd6cf7a3c72f..f279c54adb3f82965db266fb7742ffa99125a407 100644
--- a/docs/mindformers/docs/source_en/function/distributed_parallel.md
+++ b/docs/mindformers/docs/source_en/feature/parallel_training.md
@@ -1,6 +1,6 @@
-# Distributed Parallelism
+# Distributed Parallelism Training
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/distributed_parallel.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/parallel_training.md)
## Parallel Modes and Application Scenarios
@@ -33,7 +33,7 @@ MindSpore Transformers supports multiple parallelism features. You can use these
| **[Long sequence parallelism](#long-sequence-parallelism)** | Slices all inputs and output activations by sequence to further reduce the GPU memory usage of the model for processing long sequence inputs.|
| **[Multi-copy parallelism](https://www.mindspore.cn/docs/en/master/features/parallel/pipeline_parallel.html#mindspore-interleaved-pipeline-scheduler)** | Implements fine-grained parallel control among multiple copies to optimize performance and resource utilization. This mode is suitable for efficient training of models with large specifications. |
-For details about how to configure distributed parallel parameters, see [MindSpore Transformers Configuration Description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+For details about how to configure distributed parallel parameters, see [MindSpore Transformers Configuration Description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
## Introduction to Parallel Characterization
@@ -64,7 +64,7 @@ Parameter Descriptions:
- use_ring_attention: Whether to enable Ring Attention, default is False.
- context_parallel: The number of sequence parallel slices, default is 1, configure according to user requirements.
-For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
#### Ulysses Sequence Parallelism
@@ -95,7 +95,7 @@ Parameter Descriptions:
- enable_alltoall: Generate alltoall communication operator, default is False, when the parameter is not enabled, it will be replaced by a combination of other operators such as allgather. See MindSpore `set_auto_parallel_context` [interface documentation](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.set_auto_parallel_context.html). We expect to be able to directly input allto_all communication operators when we enable the Ulysses scenario, so we turn this configuration item on.
- context_parallel_algo: Set to `ulysses_cp` to enable Ulysses sequence parallelism.
-For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
#### Hybrid Sequence Parallelism
@@ -121,7 +121,7 @@ Parameter Descriptions:
- context_parallel_algo: hybrid sequence parallelism is turned on when set to `hybrid_cp`.
- ulysses_degree_in_cp: the number of parallel slices of the Ulysses sequence.
-For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/appendix/conf_files.html).
+For configuration method of distributed parallel parameters, refer to the contents of the Parallel Configuration section in [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html).
### Pipeline Parallelism
@@ -153,7 +153,7 @@ Notes:
- Currently, only Llama and DeepSeek series models are supported.
- Using Megatron's multi-source datasets for training is not yet supported.
-For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html), specifically the section on parallel configuration.
+For more information on configuring distributed parallel parameters, see the [MindSpore Transformers configuration description](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html), specifically the section on parallel configuration.
## MindSpore Transformers Distributed Parallel Application Practices
@@ -166,6 +166,6 @@ In the [Llama3-70B fine-tuning configuration](https://gitee.com/kong_de_shu/mind
- **Multi-copy parallelism**: Sequential scheduling algorithm is used to control the parallelism of fine-grained multi-branch operations (`fine_grain_interleave: 2`), improving the overlap of computing and communications.
- **Optimizer parallelism**: The calculation of optimizers is distributed to multiple devices to reduce memory usage (`enable_parallel_optimizer: True`).
-> Note: Sequential parallelism must be turned on at the same time that fine-grained multicopy parallelism is turned on.
+> Sequential parallelism must be turned on at the same time that fine-grained multi-copy parallelism is turned on.
With the preceding configurations, the distributed training on Llama3-70B can effectively utilize hardware resources in a multi-node multi-device environment to implement efficient and stable model training.
diff --git a/docs/mindformers/docs/source_en/usage/quantization.md b/docs/mindformers/docs/source_en/feature/quantization.md
similarity index 97%
rename from docs/mindformers/docs/source_en/usage/quantization.md
rename to docs/mindformers/docs/source_en/feature/quantization.md
index 809bbe31ed7b634ce0e6d0188a1852f8a3a1239c..0cd91eeed5d0899616178d1a80ab68e835c2c11d 100644
--- a/docs/mindformers/docs/source_en/usage/quantization.md
+++ b/docs/mindformers/docs/source_en/feature/quantization.md
@@ -1,6 +1,6 @@
# Quantization
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/quantization.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/quantization.md)
## Overview
diff --git a/docs/mindformers/docs/source_en/function/resume_training.md b/docs/mindformers/docs/source_en/feature/resume_training.md
similarity index 99%
rename from docs/mindformers/docs/source_en/function/resume_training.md
rename to docs/mindformers/docs/source_en/feature/resume_training.md
index 6c9778ea8e69ebb84f476909ec148a7f8893ce0e..ec61b7934d5c2212ef1434069e7a20dc7ba316d2 100644
--- a/docs/mindformers/docs/source_en/function/resume_training.md
+++ b/docs/mindformers/docs/source_en/feature/resume_training.md
@@ -1,6 +1,6 @@
# Resumable Training After Breakpoint
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/resume_training.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/resume_training.md)
## Resumable Training
diff --git a/docs/mindformers/docs/source_en/function/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md
similarity index 97%
rename from docs/mindformers/docs/source_en/function/safetensors.md
rename to docs/mindformers/docs/source_en/feature/safetensors.md
index 3fdba20e49e22cbdc19eae185754b60613e4b8a4..69d9ddf182ebafabffc307baeeb07a3331abaef1 100644
--- a/docs/mindformers/docs/source_en/function/safetensors.md
+++ b/docs/mindformers/docs/source_en/feature/safetensors.md
@@ -1,6 +1,6 @@
# Safetensors Weights
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/safetensors.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/safetensors.md)
## Overview
@@ -15,7 +15,7 @@ There are two main types of Safetensors files: complete weights files and distri
Safetensors complete weights can be obtained in two ways:
1. Download directly from Huggingface.
-2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html#safetensors-weight-merging).
+2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html#safetensors-weight-merging).
Huggingface Safetensors example catalog structure is as follows:
@@ -106,7 +106,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
After the task is executed, a checkpoint folder is generated in the mindformers/output directory, while the model files are saved in that folder.
-For more details, please refer to: [Introduction to Pre-training](https://www.mindspore.cn/mindformers/docs/en/dev/usage/pre_training.html).
+For more details, please refer to: [Introduction to Pre-training](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html).
### Examples of Fine-tuning Tasks
@@ -154,7 +154,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
After the task is executed, a checkpoint folder is generated in the mindformers/output directory, while the model files are saved in that folder.
-For more details, please refer to [Introduction to SFT fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/usage/sft_tuning.html)
+For more details, please refer to [Introduction to SFT fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html)
### Example of an Inference Task
@@ -201,7 +201,7 @@ The results of executing the above single-card inference and multi-card inferenc
'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......]
```
-For more details, please refer to: [Introduction to Inference](https://www.mindspore.cn/mindformers/docs/en/dev/usage/inference.html)
+For more details, please refer to: [Introduction to Inference](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html)
### Examples of Resumable Training after Breakpoint Tasks
@@ -237,9 +237,9 @@ callbacks:
checkpoint_format: safetensors # Save weights file format
```
-In large cluster scale scenarios, to avoid the online merging process taking too long to occupy the training resources, it is recommended to [merge the complete weights](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html#safetensors-weight-merging) with the original distributed weights file offline, and then pass it in. There is no need to pass in the path of the source slicing strategy file.
+In large cluster scale scenarios, to avoid the online merging process taking too long to occupy the training resources, it is recommended to [merge the complete weights](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html#safetensors-weight-merging) with the original distributed weights file offline, and then pass it in. There is no need to pass in the path of the source slicing strategy file.
-For more details, please refer to: [Resumable Training](https://www.mindspore.cn/mindformers/docs/en/dev/function/resume_training.html).
+For more details, please refer to: [Resumable Training](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html).
## Weight Saving
@@ -247,7 +247,7 @@ For more details, please refer to: [Resumable Training](https://www.mindspore.cn
In the training process of deep learning models, saving the model weights is a crucial step. The weight saving function allows us to store the model parameters at any stage of training, so that users can restore, continue training, evaluate or deploy after training is interrupted or completed. At the same time, by saving weights, experimental results can be reproduced in different environments.
-Currently, MindSpore TransFormer supports reading and saving weight files in the [safetensors](https://www.mindspore.cn/mindformers/docs/en/dev/function/safetensors.html) format.
+Currently, MindSpore Transformers supports reading and saving weight files in the [safetensors](https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html) format.
### Directory Structure
diff --git a/docs/mindformers/docs/source_en/function/start_tasks.md b/docs/mindformers/docs/source_en/feature/start_tasks.md
similarity index 96%
rename from docs/mindformers/docs/source_en/function/start_tasks.md
rename to docs/mindformers/docs/source_en/feature/start_tasks.md
index d5dfa3cb9782c3d8c9f9c2e66a12b97dd65141ba..e77498056b5976940fd0811d0cc0dc3b34fcb99f 100644
--- a/docs/mindformers/docs/source_en/function/start_tasks.md
+++ b/docs/mindformers/docs/source_en/feature/start_tasks.md
@@ -1,6 +1,6 @@
# Start Tasks
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/start_tasks.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/start_tasks.md)
## Overview
@@ -22,7 +22,7 @@ In the root directory of the MindSpore Transformers code, execute the `run_mindf
| `--device_id` | Set the execution device ID. The value must be within the range of available devices. | int, optional | pre-train/finetune/predict |
| `--device_target` | Set the backend execution device. MindSpore Transformers is only supported on `Ascend` devices. | str, optional | pre-train/finetune/predict |
| `--run_mode` | Set the running mode of the model: `train`, `finetune` or `predict`. | str, optional | pre-train/finetune/predict |
-| `--load_checkpoint` | File or folder paths for loading weights. For detailed usage, please refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html) | str, optional | pre-train/finetune/predict |
+| `--load_checkpoint` | File or folder paths for loading weights. For detailed usage, please refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html) | str, optional | pre-train/finetune/predict |
| `--use_parallel` | Whether use parallel mode. | bool, optional | pre-train/finetune/predict |
| `--output_dir` | Set the path where log, checkpoint, strategy, etc. files are saved. | str, optional | pre-train/finetune/predict |
| `--register_path` | The absolute path of the directory where the external code is located. For example, the model directory under the research directory. | str, optional | pre-train/finetune/predict |
@@ -33,7 +33,7 @@ In the root directory of the MindSpore Transformers code, execute the `run_mindf
| Parameters | Parameter Descriptions | Value Description | Applicable Scenarios |
|:----------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|-----------------------------|
| `--src_strategy_path_or_dir` | The strategy of load_checkpoint. | str, optional | pre-train/finetune/predict |
-| `--auto_trans_ckpt` | Enable online weight automatic conversion. Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html). | bool, optional | pre-train/finetune/predict |
+| `--auto_trans_ckpt` | Enable online weight automatic conversion. Refer to [Weight Conversion Function](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html). | bool, optional | pre-train/finetune/predict |
| `--transform_process_num` | The number of processes responsible for checkpoint transform. | int, optional | pre-train/finetune/predict |
| `--only_save_strategy` | Whether to only save the strategy files. | bool, optional, when it is `true`, the task exits directly after saving the strategy file. | pre-train/finetune/predict |
@@ -42,7 +42,7 @@ In the root directory of the MindSpore Transformers code, execute the `run_mindf
| Parameters | Parameter Descriptions | Value Description | Applicable Scenarios |
|:--------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|----------------------|
| `--train_dataset_dir` | Dataset directory of data loader to pre-train/finetune. | str, optional | pre-train/finetune |
-| `--resume_training` | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/function/resume_training.html#resumable-training). | bool, optional | pre-train/finetune |
+| `--resume_training` | Enable resumable training after breakpoint. For details, refer to [Resumable Training After Breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html#resumable-training). | bool, optional | pre-train/finetune |
| `--epochs` | Train epochs. | int, optional | pre-train/finetune |
| `--batch_size` | The sample size of the batch data. | int, optional | pre-train/finetune |
| `--gradient_accumulation_steps` | The number of gradient accumulation steps. | int, optional | pre-train/finetune |
diff --git a/docs/mindformers/docs/source_en/feature/training_function.rst b/docs/mindformers/docs/source_en/feature/training_function.rst
new file mode 100644
index 0000000000000000000000000000000000000000..4a27467211b6d00785eae976ea5d27f0ca3b2f82
--- /dev/null
+++ b/docs/mindformers/docs/source_en/feature/training_function.rst
@@ -0,0 +1,15 @@
+Training Function
+======================
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+
+ dataset
+ training_hyperparameters
+ monitor
+ resume_training
+ parallel_training
+ high_availability
+ memory_optimization
+ other_training_features
diff --git a/docs/mindformers/docs/source_en/function/training_hyperparameters.md b/docs/mindformers/docs/source_en/feature/training_hyperparameters.md
similarity index 89%
rename from docs/mindformers/docs/source_en/function/training_hyperparameters.md
rename to docs/mindformers/docs/source_en/feature/training_hyperparameters.md
index 05c4d093772b397042ceab2d71b6c5292bdfd7cf..7b8ebe4a9ebcfd2289d7dd2d9051803e4e24a819 100644
--- a/docs/mindformers/docs/source_en/function/training_hyperparameters.md
+++ b/docs/mindformers/docs/source_en/feature/training_hyperparameters.md
@@ -1,6 +1,6 @@
# Model Training Hyperparameters Configuration
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/training_hyperparameters.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/training_hyperparameters.md)
## Overview
@@ -8,7 +8,7 @@ Hyperparameters significantly affect model performance, with different settings
Choices regarding these parameters influence aspects such as training speed, convergence, capacity, and generalization ability. They are not learned directly from the training data but are determined by developers based on experience, experiments, or tuning processes.
-MindSpore Transformer offers several categories of hyperparameter configuration methods.
+MindSpore Transformers offers several categories of hyperparameter configuration methods.
## Learning Rate
@@ -39,7 +39,7 @@ lr_schedule:
#### Key Parameters Introduction
-Different learning rates require different configuration parameters. MindSpore Transformer currently supports the following learning rates:
+Different learning rates require different configuration parameters. MindSpore Transformers currently supports the following learning rates:
1. [Constant Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.ConstantWarmUpLR.html)
2. [Linear with Warm Up Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.LinearWithWarmUpLR.html)
@@ -75,7 +75,7 @@ lr_schedule:
total_steps: 20 # -1 means it will load the total steps of the dataset
```
-For more details about the learning rate API (such as `type` configuration names and introductions to learning rate algorithms), please refer to the related links in the [MindSpore TransFormer API Documentation: Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers.core.html#learning-rate).
+For more details about the learning rate API (such as `type` configuration names and introductions to learning rate algorithms), please refer to the related links in the [MindSpore Transformers API Documentation: Learning Rate](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers.core.html#learning-rate).
## Optimizer
@@ -85,7 +85,7 @@ An optimizer is an algorithmic choice used for optimizing neural network weights
Selecting the right optimizer is crucial for the convergence speed and final performance of the model. Different optimizers employ various strategies to adjust the learning rate and other hyperparameters to accelerate the training process, improve convergence, and avoid local optima.
-Currently, MindSpore Transformer only supports the [AdamW optimizer](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers.core.html#optimizer).
+Currently, MindSpore Transformers only supports the [AdamW optimizer](https://www.mindspore.cn/mindformers/docs/en/dev/mindformers.core.html#optimizer).
### Configuration and Usage
@@ -105,4 +105,4 @@ optimizer:
#### Key Parameters Introduction
-For the main parameters of optimizer configuration, see the relevant link in [MindSpore TransFormer API Documentation: Optimizer](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.AdamW.html#mindformers.core.AdamW).
\ No newline at end of file
+For the main parameters of optimizer configuration, see the relevant link in [MindSpore Transformers API Documentation: Optimizer](https://www.mindspore.cn/mindformers/docs/en/dev/core/mindformers.core.AdamW.html#mindformers.core.AdamW).
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/function/transform_weight.md b/docs/mindformers/docs/source_en/feature/transform_weight.md
similarity index 99%
rename from docs/mindformers/docs/source_en/function/transform_weight.md
rename to docs/mindformers/docs/source_en/feature/transform_weight.md
index ea60d771ebe050f198a19135204a7d6be7245cdc..e075dba564d59270177752df70299a095f671d49 100644
--- a/docs/mindformers/docs/source_en/function/transform_weight.md
+++ b/docs/mindformers/docs/source_en/feature/transform_weight.md
@@ -1,6 +1,6 @@
# Distributed Weight Slicing and Merging
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/transform_weight.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/transform_weight.md)
## Overview
diff --git a/docs/mindformers/docs/source_en/function/weight_conversion.md b/docs/mindformers/docs/source_en/feature/weight_conversion.md
similarity index 99%
rename from docs/mindformers/docs/source_en/function/weight_conversion.md
rename to docs/mindformers/docs/source_en/feature/weight_conversion.md
index caaf18f1adc0acf4a266825c35be703566137652..c61bcbe880ded0adcbb54ed248838df60e0bcff7 100644
--- a/docs/mindformers/docs/source_en/function/weight_conversion.md
+++ b/docs/mindformers/docs/source_en/feature/weight_conversion.md
@@ -1,6 +1,6 @@
# Weight Format Conversion
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/weight_conversion.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/weight_conversion.md)
## Overview
diff --git a/docs/mindformers/docs/source_en/function/fine_grained_activations_swap.md b/docs/mindformers/docs/source_en/function/fine_grained_activations_swap.md
deleted file mode 100644
index 51a400555b5f9738ac6b8a95361ff36ac6f2bd47..0000000000000000000000000000000000000000
--- a/docs/mindformers/docs/source_en/function/fine_grained_activations_swap.md
+++ /dev/null
@@ -1,272 +0,0 @@
-# Fine-Grained Activations SWAP
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/function/fine_grained_activations_swap.md)
-
-## Overview
-
-In traditional large-scale model training tasks, the memory resources of computing cards often become a bottleneck. Although adopting larger-scale model parallel (mp) and pipeline parallel (pp) can alleviate the memory pressure on individual computing cards to some extent, it requires larger-scale cluster resources, and excessive communication can significantly reduce the model's Model FLOPs Utilization (MFU). Under limited cluster resources, recomputation is another effective method to mitigate memory pressure. It reduces the memory footprint of activations by discarding the storage of activation values during the forward propagation phase and recomputing the required activation values during gradient backpropagation. However, since recomputation introduces additional computational overhead, this method also significantly decreases the MFU of model training.
-
-Against this backdrop, fine-grained activations SWAP can provide a third effective approach to reduce memory usage while offering greater end-to-end performance advantages. Specifically, SWAP offloads activations that need to be stored long-term to the host side during the forward propagation phase and prefetches them back to the device side in advance when they are needed during backpropagation. In terms of resource utilization, fine-grained activations SWAP leverages D2H/H2D bandwidth, which can overlap with computation tasks and D2D communication tasks during training, thereby masking the overhead of memory transfers.
-
-The fine-grained activations SWAP technology offers high flexibility in usage. During the forward propagation phase of large model training, multiple activations of varying data sizes are generated, allowing users to swap specific activations at the granularity of the operator selectively. When the model type or configuration changes, users can flexibly adjust the corresponding SWAP strategy to minimize memory overhead and achieve optimal performance.
-
-## Instrunction for Use
-
-### Constraint Scenarios
-
-- Only support static graph O0/O1 mode
-- Compatible with LLama-family dense models, MoE sparse models to be supported in future updates
-- Somas does not support heterogeneity and needs to be set in the configuration file:
-
- ```yaml
- context:
- memory_optimize_level=O0
- ```
-
-- When pipeline parallelism is disabled, the lazy_inline scenario must be enabled by setting the environment variable:
-
- ```bash
- ENABLE_LAZY_INLINE_NO_PIPELINE=1
- ```
-
-- Only support Ascend backend
-
-### Instruction for API
-
-Fine-grained activations SWAP is enabled through the `swap_config` field in YAML configuration, which includes four functional interfaces: `swap`, `default_prefetch`, `layer_swap`, and `op_swap`. These interfaces allow users to flexibly enable SWAP for specific layers or specific operators within layers.
-
-> MindSpore framework currently decouples memory offloading and memory release. When activations are offloaded from the device side to the host side, the memory space occupied on the device side is not immediately released even after all data has been transferred. An explicit release operation is required instead. Before triggering the memory release, the system checks whether the activation offloading is complete. If not, the process will wait in place until the offloading finishes.
-
-| Configuration Item | Type | Description |
-|:--:|:--:|:---|
-| swap | bool | Default False. When set to False, all four functional interfaces are disabled. When set to True, activations SWAP is enabled, and the system checks whether layer_swap and op_swap are None. If both are None, the default SWAP strategy is applied, which enables SWAP for the flash_attention operator across all layers. If either layer_swap or op_swap has a non-None value, the default policy is overridden, and SWAP is enabled according to the configurations in layer_swap and op_swap. |
-| default_prefetch | int | Default 1 and only takes effect when swap=True, layer_swap=None, and op_swap=None. It controls the timing of releasing memory in forward phase and starting prefetch in backward phase of the default SWAP strategy. A larger `default_prefetch` delays memory release during the forward phase, keeping device memory occupied by activations locked for an extended period after offloading, preventing reuse by other data blocks. It also starts earlier prefetching from host to device during the backward phase, applying memory pressure prematurely. A smaller `default_prefetch` releases memory earlier in the forward phase but may introduce idle waiting for copy operations to complete. Additionally, delayed prefetch in the backward phase may cause computation stalls if prefetching isn't finished before activation usage, impacting end-to-end performance. This interface allows users to fine-tune memory release and prefetch timing for optimal memory efficiency and performance.|
-| layer_swap | list | Default None. When set to None, this interface is inactive. When the type is List, this interface contains several list elements of the Dict type. Each Dict element contains two keys: `backward_prefetch`, and `layers`, and provides the prefetch opportunity and layer index for enabling swap. |
-| op_swap | list | Default None. When set to None, this interface is inactive. When the type is List, this interface contains several list elements of the Dict type. Each Dict element contains three keys: `op_name`, `backward_prefetch`, and `layers`, and provides the prefetch opportunity, operator name, and layer index for enabling swap. |
-
-### Used together with Recomputation
-
-Fine-Grained Activations SWAP and Recomputation have coupling effects:
-
-1. If any operator has both recomputation and SWAP enabled simultaneously, recomputation will take effect while SWAP will not.
-2. For any operator with SWAP enabled, if its output is used by an operator with recomputation enabled, then SWAP for that operator will not take effect.
-3. The YAML configuration interface for recomputation only supports enabling recomputation for a specific number of layers sequentially from front to back, rather than selecting specific layers or specific operators within layers. This means when using both SWAP and recomputation together, SWAP can only be enabled for later layers or operators within later layers, preventing full utilization of SWAP's benefits. Therefore, when and only when `swap=True`, the recomputation interface functionality will be adjusted as shown in the table below.
-
-| Interface Name | Original Functionality | Functionality When Enabling SWAP |
-|:--:|:---|:---|
-| recompute | Determine the number of layers with recomputation enabled in each pipeline stage. | Pipeline stage-agnostic, only accepts bool/list type inputs. When bool type: enables recomputation for all layers; when list type: uses layer indices to enable recomputation for specific layers. |
-| select_recompute | Determine the number of layers with recomputation enabled for specific operators in each pipeline stage. | Pipeline stage-agnostic, for each operator's key-value pair, only accepts bool/list type inputs. When bool type: enables recomputation for all layers; when list type: uses layer indices to enable recomputation for specific layers. |
-| select_comm_recompute | Determine the number of layers with recomputation enabled for communication operators in each pipeline stage. | Pipeline stage-agnostic, only accepts bool/list type inputs. When bool type: enables recomputation for all layers; when list type: uses layer indices to enable recomputation for specific layers. |
-
-## Cases of Fine-Grained Activations SWAP
-
-This section demonstrates the usage of fine-grained activations SWAP using Llama2-7B training as an example.
-
-### Environmental Preparation
-
-Download Mindformers, and prepare the pre-training dataset, such as wikitext.
-
-### Case 1: Default SWAP Strategy
-
-Modify and supplement the recomputation and SWAP configurations in YAML as follows:
-
-```yaml
-context:
- memory_optimize_level: "O0"
-model:
- model_config:
- num_layers: 4
-recompute_config:
- recompute: False
- select_recompute: False
- select_comm_recompute: False
-swap_config:
- swap: True
- default_prefetch: 10
-```
-
-Execute the following script to launch single-node 8-NPU training, with the script's execution path being the root directory, requiring the user to specify the YAML file path(machine_ip needs to fill in the local environment IP address):
-
-```bash
-export GLOG_v=1
-export MS_MEMORY_STATISTIC=1
-export ENABLE_LAZY_INLINE_NO_PIPELINE=1
-YAML_FILE=$1 # User specifies the YAML file path.
-ROOT_PATH=`pwd`
-
-bash ./scripts/msrun_launcher.sh "run_mindformer.py \
- --config ${ROOT_PATH}/${YAML_FILE} \
- --run_mode train \
- --use_parallel True" \
- 8 8 8118 0 output/msrun False 300
-```
-
-After training completes, execute the command `cat output/msrun/worker_0.log | grep 'attention.flash_attention'` to check the execution status of the default SWAP strategy:
-
-```text
--INFO - Set op_swap at layer 0: attention.flash_attention, value=10
--INFO - Set op_swap at layer 1: attention.flash_attention, value=10
--INFO - Set op_swap at layer 2: attention.flash_attention, value=10
--INFO - Set op_swap at layer 3: attention.flash_attention, value=10
-```
-
-The default SWAP strategy is executed successfully.
-
-### Case 2: Select Specific Layers to Enable SWAP
-
-Modify and supplement the recomputation and SWAP configurations in YAML as follows:
-
-```yaml
-context:
- memory_optimize_level: "O0"
-model:
- model_config:
- num_layers: 4
-recompute_config:
- recompute: False
- select_recompute: False
- select_comm_recompute: False
-swap_config:
- swap: True
- layer_swap:
- - backward_prefetch: 20
- layers: [0,3]
-```
-
-Execute the following script to launch single-node 8-NPU training, with the script's execution path being the root directory, requiring the user to specify the YAML file path(machine_ip needs to fill in the local environment IP address):
-
-```bash
-export GLOG_v=1
-export MS_MEMORY_STATISTIC=1
-export ENABLE_LAZY_INLINE_NO_PIPELINE=1
-YAML_FILE=$1 # User specifies the YAML file path.
-ROOT_PATH=`pwd`
-
-bash ./scripts/msrun_launcher.sh "run_mindformer.py \
- --config ${ROOT_PATH}/${YAML_FILE} \
- --run_mode train \
- --use_parallel True" \
- 8 8 8118 0 output/msrun False 300
-```
-
-After training completes, execute the command `cat output/msrun/worker_0.log | grep 'Set layer swap at'` to check the execution status of the default SWAP strategy:
-
-```text
--INFO - Set layer swap at layer 0 and value is: 20
--INFO - Set layer swap at layer 3 and value is: 20
-```
-
-The strategy of enabling SWAP for specific layers is executed successfully.
-
-### Case 3: Select Specific Operators within Layers to Enable SWAP
-
-Modify and supplement the recomputation and SWAP configurations in YAML as follows:
-
-```yaml
-context:
- memory_optimize_level: "O0"
-model:
- model_config:
- num_layers: 4
-recompute_config:
- recompute: False
- select_recompute: False
- select_comm_recompute: False
-swap_config:
- swap: True
- op_swap:
- - op_name: 'attention'
- backward_prefetch: 20
- layers: [0,1,2]
- - op_name: 'attention'
- backward_prefetch: 10
- layers: [3]
- - op_name: 'feed_forward'
- backward_prefetch: 15
- layers: [1,2]
-```
-
-Execute the following script to launch single-node 8-NPU training, with the script's execution path being the root directory, requiring the user to specify the YAML file path(machine_ip needs to fill in the local environment IP address):
-
-```bash
-export GLOG_v=1
-export MS_MEMORY_STATISTIC=1
-export ENABLE_LAZY_INLINE_NO_PIPELINE=1
-YAML_FILE=$1 # User specifies the YAML file path.
-ROOT_PATH=`pwd`
-
-bash ./scripts/msrun_launcher.sh "run_mindformer.py \
- --config ${ROOT_PATH}/${YAML_FILE} \
- --run_mode train \
- --use_parallel True" \
- 8 8 8118 0 output/msrun False 300
-```
-
-After training completes, execute the command `cat output/msrun/worker_0.log | grep 'Set op_swap at layer'` to check the execution status of the default SWAP strategy:
-
-```text
--INFO - Set op_swap at layer 0: .attention, value=20
--INFO - Set op_swap at layer 1: .attention, value=20, .feed_forward, value=15
--INFO - Set op_swap at layer 2: .attention, value=20, .feed_forward, value=15
--INFO - Set op_swap at layer 3: .attention, value=10
-```
-
-The strategy of enabling SWAP for specific operators within layers is executed successfully.
-
-### Case 4: Use Fine-Grained Activations SWAP together with Recomputation
-
-Modify and supplement the recomputation and SWAP configurations in YAML as follows:
-
-```yaml
-context:
- memory_optimize_level: "O0"
-model:
- model_config:
- num_layers: 4
-recompute_config:
- recompute: False
- select_recompute:
- 'feed_forward': [0,3]
- select_comm_recompute: False
-swap_config:
- swap: True
- op_swap:
- - op_name: 'attention'
- backward_prefetch: 20
- layers: [0,1,2]
- - op_name: 'attention'
- backward_prefetch: 10
- layers: [3]
- - op_name: 'feed_forward'
- backward_prefetch: 15
- layers: [1,2]
-```
-
-Execute the following script to launch single-node 8-NPU training, with the script's execution path being the root directory, requiring the user to specify the YAML file path(machine_ip needs to fill in the local environment IP address):
-
-```bash
-export GLOG_v=1
-export MS_MEMORY_STATISTIC=1
-export ENABLE_LAZY_INLINE_NO_PIPELINE=1
-YAML_FILE=$1 # User specifies the YAML file path.
-ROOT_PATH=`pwd`
-
-bash ./scripts/msrun_launcher.sh "run_mindformer.py \
- --config ${ROOT_PATH}/${YAML_FILE} \
- --run_mode train \
- --use_parallel True" \
- 8 8 8118 0 output/msrun False 300
-```
-
-After training completes, execute the command `cat output/msrun/worker_0.log | grep 'Set op_swap at layer' -C 1` to check the execution status of the default SWAP strategy:
-
-```text
--INFO - Set select recompute at layer 0: feed_forward
--INFO - Set op_swap at layer 0: .attention, value=20
--INFO - Set op_swap at layer 1: .attention, value=20, .feed_forward, value=15
--INFO - Set op_swap at layer 2: .attention, value=20, .feed_forward, value=15
--INFO - Set select recompute at layer 3: feed_forward
--INFO - Set op_swap at layer 3: .attention, value=10
-```
-
-The strategy of enabling fine-grained activations SWAP together with recomputation is executed successfully.
diff --git a/docs/mindformers/docs/source_en/usage/mindie_deployment.md b/docs/mindformers/docs/source_en/guide/deployment.md
similarity index 97%
rename from docs/mindformers/docs/source_en/usage/mindie_deployment.md
rename to docs/mindformers/docs/source_en/guide/deployment.md
index d7a477c997db1993f0d7d98912628ac1635d0884..223f7f3630d0b57fde6793bdf800533a9349cb84 100644
--- a/docs/mindformers/docs/source_en/usage/mindie_deployment.md
+++ b/docs/mindformers/docs/source_en/guide/deployment.md
@@ -1,6 +1,6 @@
# Service Deployment
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/mindie_deployment.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/guide/deployment.md)
## Introduction
@@ -8,7 +8,7 @@ MindIE, full name Mind Inference Engine, is a high-performance inference framewo
MindSpore Transformers is hosted in the model application layer MindIE LLM, and large models in MindSpore Transformers can be deployed through MindIE Service.
-The model support for MindIE inference can be found in [model repository](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html).
+The models supported by MindIE inference can be found in the [model repository](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html).
## Environment Setup
@@ -16,7 +16,7 @@ The model support for MindIE inference can be found in [model repository](https:
1. Install MindSpore Transformers
- Refer to [MindSpore Transformers Official Installation Guide](https://www.mindspore.cn/mindformers/docs/en/dev/quick_start/install.html) for installation.
+ Refer to [MindSpore Transformers Official Installation Guide](https://www.mindspore.cn/mindformers/docs/en/dev/installation.html) for installation.
2. Install MindIE
@@ -86,9 +86,9 @@ processor:
merges_file: "/path/to/mf_model/qwen1_5_72b/merges.txt" # merges file absolute path
```
-For model weight downloading and conversions, refer to the [Weight Format Conversion Guide](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html).
+For model weight downloading and conversions, refer to the [Weight Format Conversion Guide](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html).
-Required files and configurations may vary from model to model. Refer to the model-specific inference sections in [Model Repository](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html) for details.
+Required files and configurations may vary from model to model. Refer to the model-specific inference sections in [Model Repository](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html) for details.
### Starting MindIE
@@ -346,4 +346,4 @@ The validation is successful with the following returned inference result:
## Model List
-Examples of MindIE inference for other models can be found in the introduction documentation for each model in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html).
\ No newline at end of file
+Examples of MindIE inference for other models can be found in the introduction documentation for each model in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html).
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/usage/inference.md b/docs/mindformers/docs/source_en/guide/inference.md
similarity index 98%
rename from docs/mindformers/docs/source_en/usage/inference.md
rename to docs/mindformers/docs/source_en/guide/inference.md
index 7dbb41f52187ee933e6a1c6f0606468f43ed84de..aaacee7c152773e2b43ed5e04d88784ce10ae41c 100644
--- a/docs/mindformers/docs/source_en/usage/inference.md
+++ b/docs/mindformers/docs/source_en/guide/inference.md
@@ -1,6 +1,6 @@
# Inference
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/inference.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/guide/inference.md)
## Overview
@@ -22,8 +22,8 @@ Model weights can be categorized into two types: complete weights and distribute
Complete weights can be obtained in two ways:
-1. After downloading the open source weights of the corresponding model from the HuggingFace model library, refer to [Weight Format Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html) to convert them to the ckpt format.
-2. Pre-trained or fine-tuned distributed weights are used to generate a complete weight by [merging](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html).
+1. After downloading the open source weights of the corresponding model from the HuggingFace model library, refer to [Weight Format Conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html) to convert them to the ckpt format.
+2. Generate a complete weight by [merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html) pre-trained or fine-tuned distributed weights.
#### 2.2 Distributed Weights
@@ -35,7 +35,7 @@ If the inference uses a weight slicing that is different from the model slicing
2. Weights from eight-card training are used for inference on two cards;
3. Already-sliced distributed weights are used for inference on a single card, and so on.
-The command samples in the following contents are all used in the way of online autoslicing. It is recommended to use online autoslicing by setting the command parameters `--auto_trans_ckpt` to `-True` and `-src_strategy_path_or_dir` to the weighted slicing strategy file or directory path (which is saved by default after training under `./output/strategy`) are automatically sliced in the inference task. Details can be found in [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html).
+The command samples in the following contents all use online automatic slicing. It is recommended to enable it by setting the command parameter `--auto_trans_ckpt` to `True` and setting `--src_strategy_path_or_dir` to the weight slicing strategy file or directory path (saved by default under `./output/strategy` after training), so that the weights are sliced automatically during the inference task. Details can be found in [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html).
> Since both the training and inference tasks use `./output` as the default output path, when using the strategy file output by the training task as the source weight strategy file for the inference task, you need to move the strategy file directory under the default output path to another location to avoid it being emptied by the process of the inference task, for example:
>
@@ -173,7 +173,7 @@ Multi-card multi-batch inference is initiated in the same way as [multi-card inf
The `input_predict_data.txt` file contains one input per line, with the number of questions equal to `predict_batch_size`, for example in the following format:
-```txt
+```text
I love Beijing, because
I love Beijing, because
I love Beijing, because
@@ -358,4 +358,4 @@ Thanks, sir.
## More Information
-For more inference examples of different models, see [the models supported by MindSpore Transformers](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html).
+For more inference examples of different models, see [the models supported by MindSpore Transformers](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html).
diff --git a/docs/mindformers/docs/source_en/usage/pre_training.md b/docs/mindformers/docs/source_en/guide/pre_training.md
similarity index 96%
rename from docs/mindformers/docs/source_en/usage/pre_training.md
rename to docs/mindformers/docs/source_en/guide/pre_training.md
index 55d538ff89bec54df22adc27d47870238ca118bd..a0a3a3ba35586db30f0d801606189cdd84e13194 100644
--- a/docs/mindformers/docs/source_en/usage/pre_training.md
+++ b/docs/mindformers/docs/source_en/guide/pre_training.md
@@ -1,6 +1,6 @@
# Pretraining
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/pre_training.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/guide/pre_training.md)
## Overview
@@ -82,8 +82,8 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
run_mode: running mode. The value can be train, finetune, or predict (inference).
```
-**Note**: During multi-node distributed training, some performance problems may occur. To ensure the efficiency and stability of the training process, you are advised to optimize and adjust the performance by referring to [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/perf_optimize/perf_optimize.html).
+**Note**: During multi-node distributed training, some performance problems may occur. To ensure the efficiency and stability of the training process, you are advised to optimize and adjust the performance by referring to [Large Model Performance Optimization Guide](https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html).
## More Information
-For more training examples of different models, see [the models supported by MindFormers](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html).
+For more training examples of different models, see [the models supported by MindSpore Transformers](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html).
diff --git a/docs/mindformers/docs/source_en/usage/sft_tuning.md b/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md
similarity index 98%
rename from docs/mindformers/docs/source_en/usage/sft_tuning.md
rename to docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md
index 0f8d2f58b87b3207e0312f484fd96f1e940ecc3f..f51952873dd2bf668686a3b7f1dc4125d6fee9dc 100644
--- a/docs/mindformers/docs/source_en/usage/sft_tuning.md
+++ b/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md
@@ -1,6 +1,6 @@
# Supervised Fine-Tuning (SFT)
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/usage/sft_tuning.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/guide/supervised_fine_tuning.md)
## Overview
@@ -179,7 +179,7 @@ After the task is executed, the **checkpoint** folder is generated in the **mind
#### Multi-Node Training
-The multi-node multi-device fine-tuning task is similar to the pretrained task. You can refer to the [multi-node multi-device pretraining command](https://www.mindspore.cn/mindformers/docs/en/dev/usage/pre_training.html#multi-node-training) and modify the command as follows:
+The multi-node multi-device fine-tuning task is similar to the pretraining task. You can refer to the [multi-node multi-device pretraining command](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html#multi-node-training) and modify the command as follows:
1. Add the input parameter `--load_checkpoint /{path}/llama2_7b.ckpt` to the startup script to load the pretrained weights.
2. Set `--train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord` in the startup script to load the fine-tuning dataset.
@@ -243,7 +243,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
--run_mode finetune" 8
```
-When the distributed strategy of the weights does not match the distributed strategy of the model, the weights need to be transformed. The load weight path should be set to the upper path of the directory named with `rank_0`, and the weight auto transformation function should be enabled by setting `--auto_trans_ckpt True` . For a more detailed description of the scenarios and usage of distributed weight transformation, please refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html).
+When the distributed strategy of the weights does not match the distributed strategy of the model, the weights need to be transformed. Set the load weight path to the parent directory of the directory named `rank_0`, and enable automatic weight transformation by setting `--auto_trans_ckpt True`. For a more detailed description of the scenarios and usage of distributed weight transformation, please refer to [Distributed Weight Slicing and Merging](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html).
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
diff --git a/docs/mindformers/docs/source_en/index.rst b/docs/mindformers/docs/source_en/index.rst
index bc039e3e82ee318dc96c5ce772ddcf478856c742..92b3f33a9534f8b5f9f1c154c914f604bd5e7f10 100644
--- a/docs/mindformers/docs/source_en/index.rst
+++ b/docs/mindformers/docs/source_en/index.rst
@@ -1,196 +1,208 @@
MindSpore Transformers Documentation
=====================================
-MindSpore Transformers (also known as MindFormers) is a MindSpore-native foundation model suite designed to provide full-flow development capabilities for foundation model training, fine-tuning, evaluating, inference and deploying, providing the industry mainstream Transformer class of pre-trained models and SOTA downstream task applications, and covering a rich range of parallel features, with the expectation of helping users to easily realize large model training and innovative research and development.
+MindSpore Transformers is a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and aims to help users easily realize the full process of large model development.
-Users can refer to `Overall Architecture `_ and `Model Library `_ to get a quick overview of the MindSpore Transformers system architecture, and the list of supported functional features and foundation models. Further, refer to the `Installation `_ and `Quick Start `_ to get started with MindSpore Transformers.
+Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:
+
+- One-click launch of single-card or multi-card pre-training, fine-tuning, inference, and deployment processes for large models;
+- Rich multi-dimensional hybrid parallel capabilities for flexible and easy-to-use personalized configuration;
+- System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;
+- Configurable development of task components, where any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration;
+- Real-time visualization of training accuracy and performance monitoring metrics.
+
+Users can refer to `Overall Architecture <https://www.mindspore.cn/mindformers/docs/en/dev/introduction/overview.html>`_ and `Model Library <https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html>`_ to get a quick overview of the MindSpore Transformers system architecture, and the list of supported foundation models.
If you have any suggestions for MindSpore Transformers, please contact us via `issue <https://gitee.com/mindspore/mindformers/issues>`_ and we will handle them promptly.
-MindSpore Transformers supports one-click start of single/multi-card training, fine-tuning, evaluation, and inference processes for any task, which makes the execution of deep learning tasks more efficient and user-friendly by simplifying the operation, providing flexibility, and automating the process. Users can learn from the following explanatory documents:
+Full-Process Development with MindSpore Transformers
+-------------------------------------------------------------------------------------------
+
+MindSpore Transformers supports one-click start of single-card or multi-card training, fine-tuning, and inference processes for any task, which makes the execution of deep learning tasks more efficient and user-friendly by simplifying operations, providing flexibility, and automating processes. Users can learn more from the following documents:
-- `Development Migration `_
-- `Pretraining `_
-- `SFT Tuning `_
-- `Evaluation `_
-- `Inference `_
-- `Quantization `_
-- `Service Deployment `_
-- `Multimodal Model Development `_
+- `Pretraining <https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html>`_
+- `Supervised Fine-Tuning <https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html>`_
+- `Inference <https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html>`_
+- `Service Deployment <https://www.mindspore.cn/mindformers/docs/en/dev/guide/deployment.html>`_
Code repository address: <https://gitee.com/mindspore/mindformers>
-Flexible and Easy-to-Use Personalized Configuration with MindSpore Transformers
+Feature Description of MindSpore Transformers
-------------------------------------------------------------------------------------------
-With its powerful feature set, MindSpore Transformers provides users with flexible and easy-to-use personalized configuration options. Specifically, it comes with the following key features:
+MindSpore Transformers provides a wealth of features throughout the full process of large model development. Users can learn about these features via the following links:
+
+- General Features:
-1. `Start Tasks `_
+  - `Start Tasks <https://www.mindspore.cn/mindformers/docs/en/dev/feature/start_tasks.html>`_
One-click start for single-device, single-node and multi-node tasks.
-2. `Weight Format Conversion `_
+  - `Weight Format Conversion <https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html>`_
+
+ Provides a unified weight conversion tool that converts model weights between the formats used by HuggingFace and MindSpore Transformers.
+
+  - `Distributed Weight Slicing and Merging <https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html>`_
+
+ Weights in different distributed scenarios are flexibly sliced and merged.
+
+  - `Safetensors Weights <https://www.mindspore.cn/mindformers/docs/en/dev/feature/safetensors.html>`_
+
+ Supports saving and loading weight files in safetensors format.
+
+  - `Configuration File <https://www.mindspore.cn/mindformers/docs/en/dev/feature/configuration.html>`_
+
+ Supports the use of `YAML` files to centrally manage and adjust configurable items in tasks.
- Provides a unified weight conversion tool that converts model weights between the formats used by HuggingFace and MindSpore Transformers.
+  - `Logging <https://www.mindspore.cn/mindformers/docs/en/dev/feature/logging.html>`_
-3. `Distributed Weight Slicing and Merging `_
+ Introduction of logs, including log structure, log saving, and so on.
- Weights in different distributed scenarios are flexibly sliced and merged.
+- Training Features:
-4. `Distributed Parallel `_
+  - `Dataset <https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html>`_
- One-click configuration of multi-dimensional hybrid distributed parallel allows models to run efficiently in clusters up to 10,000 cards.
+ Supports multiple types and formats of datasets.
-5. `Dataset `_
+  - `Model Training Hyperparameters <https://www.mindspore.cn/mindformers/docs/en/dev/feature/training_hyperparameters.html>`_
- Support multiple types and formats of datasets.
+ Flexibly configure hyperparameter settings for large model training.
-6. `Model Training Hyperparameters Configuration `_
+  - `Training Metrics Monitoring <https://www.mindspore.cn/mindformers/docs/en/dev/feature/monitor.html>`_
- Provides an introduction and examples of hyperparameter configuration for large model training.
+ Provides visualization services for the training phase of large models for monitoring and analyzing various indicators and information during the training process.
-7. `Other features `_
+  - `Resumable Training After Breakpoint <https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html>`_
- Introduce features such as gradient accumulation and gradient clipping.
+ Supports step-level resumable training after breakpoint, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training.
-8. `Logs `_
+  - `Training High Availability (Beta) <https://www.mindspore.cn/mindformers/docs/en/dev/feature/high_availability.html>`_
- Introduction of logs, including log structure, log saving, and so on.
+ Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature).
-9. `Resumable Training After Breakpoint `_
+  - `Parallel Training <https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html>`_
- Supports step-level resumable training after breakpoint, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training.
+      One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.
-10. `Training Metrics Monitoring `_
+  - `Training Memory Optimization <https://www.mindspore.cn/mindformers/docs/en/dev/feature/memory_optimization.html>`_
- Provides visualization services for the training phase of large models for monitoring and analyzing various indicators and information during the training process.
+      Supports fine-grained recomputation and activations SWAP to reduce peak memory overhead during model training.
-11. `Training High Availability `_
+  - `Other Training Features <https://www.mindspore.cn/mindformers/docs/en/dev/feature/other_training_features.html>`_
- Provide high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery.
+ Supports gradient accumulation and gradient clipping, etc.
-12. `Safetensors Weights `_
+- Inference Features:
- Support the function of saving and loading weight files in safetensors format.
+  - `Evaluation <https://www.mindspore.cn/mindformers/docs/en/dev/feature/evaluation.html>`_
-13. `Fine-Grained Activations SWAP `_
+      Supports the use of third-party open-source evaluation frameworks and datasets for large model ranking evaluations.
- Support fine-grained selection of specific activations to enable SWAP and reduce peak memory overhead during model training.
+  - `Quantization <https://www.mindspore.cn/mindformers/docs/en/dev/feature/quantization.html>`_
-Deep Optimizing with MindSpore Transformers
----------------------------------------------
+      Integrates the MindSpore Golden Stick toolkit to provide a unified quantization inference process.
-- `Precision Optimizing `_
-- `Performance Optimizing `_
+Advanced Development with MindSpore Transformers
+-------------------------------------------------
-Appendix
+- Diagnostics and Optimization
+
+  - `Precision Optimization <https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/precision_optimization.html>`_
+  - `Performance Optimization <https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/performance_optimization.html>`_
+
+- Model Development
+
+  - `Development Migration <https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/dev_migration.html>`_
+  - `Multimodal Model Development <https://www.mindspore.cn/mindformers/docs/en/dev/advanced_development/multi_modal_dev.html>`_
+
+Environment Variables
+------------------------------------
+
+- `Environment Variables Description <https://www.mindspore.cn/mindformers/docs/en/dev/env_variables.html>`_
+
+Contribution Guide
------------------------------------
-- `Environment Variables Descriptions `_
-- `Configuration File Descriptions `_
+- `MindSpore Transformers Contribution Guide <https://www.mindspore.cn/mindformers/docs/en/dev/contribution/mindformers_contribution.html>`_
+- `Modelers Contribution Guide <https://www.mindspore.cn/mindformers/docs/en/dev/contribution/modelers_contribution.html>`_
FAQ
------------------------------------
- `Model-Related <https://www.mindspore.cn/mindformers/docs/en/dev/faq/model_related.html>`_
-- `Function-Related `_
-- `MindSpore Transformers Contribution Guide `_
-- `Modelers Contribution Guide `_
+- `Function-Related <https://www.mindspore.cn/mindformers/docs/en/dev/faq/feature_related.html>`_
.. toctree::
:glob:
:maxdepth: 1
- :caption: Start
+ :caption: Introduction
:hidden:
- start/overview
- start/models
+ introduction/overview
+ introduction/models
.. toctree::
:glob:
:maxdepth: 1
- :caption: Quick Start
+ :caption: Installation
:hidden:
- quick_start/install
- quick_start/source_code_start
+ installation
.. toctree::
:glob:
:maxdepth: 1
- :caption: Usage Tutorials
+ :caption: Full-process Guide to Large Models
:hidden:
- usage/dev_migration
- usage/multi_modal
- usage/pre_training
- usage/sft_tuning
- usage/evaluation
- usage/inference
- usage/quantization
- usage/mindie_deployment
- usage/pretrain_gpt
+ guide/pre_training
+ guide/supervised_fine_tuning
+ guide/inference
+ guide/deployment
.. toctree::
:glob:
:maxdepth: 1
- :caption: Function Description
+ :caption: Features
:hidden:
- function/start_tasks
- function/weight_conversion
- function/transform_weight
- function/distributed_parallel
- function/dataset
- function/training_hyperparameters
- function/other_features
- function/logs
- function/resume_training
- function/monitor
- function/high_availability
- function/safetensors
- function/fine_grained_activations_swap
+ feature/start_tasks
+ feature/weight_conversion
+ feature/transform_weight
+ feature/safetensors
+ feature/configuration
+ feature/logging
+ feature/training_function
+ feature/infer_function
.. toctree::
:glob:
:maxdepth: 1
- :caption: Precision Optimization
+ :caption: Advanced Development
:hidden:
- acc_optimize/acc_optimize
+ advanced_development/precision_optimization
+ advanced_development/performance_optimization
+ advanced_development/dev_migration
+ advanced_development/multi_modal_dev
+ advanced_development/api
.. toctree::
:glob:
:maxdepth: 1
- :caption: Performance Optimization
- :hidden:
-
- perf_optimize/perf_optimize
-
-.. toctree::
- :maxdepth: 1
- :caption: API
+ :caption: Environment Variables
:hidden:
- mindformers
- mindformers.core
- mindformers.dataset
- mindformers.generation
- mindformers.models
- mindformers.modules
- mindformers.pet
- mindformers.pipeline
- mindformers.tools
- mindformers.wrapper
+ env_variables
.. toctree::
:glob:
:maxdepth: 1
- :caption: Appendix
+ :caption: Contribution Guide
:hidden:
- appendix/env_variables
- appendix/conf_files
+ contribution/mindformers_contribution
+ contribution/modelers_contribution
.. toctree::
:glob:
@@ -199,6 +211,4 @@ FAQ
:hidden:
faq/model_related
- faq/func_related
- faq/mindformers_contribution
- faq/modelers_contribution
\ No newline at end of file
+ faq/feature_related
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/quick_start/install.md b/docs/mindformers/docs/source_en/installation.md
similarity index 92%
rename from docs/mindformers/docs/source_en/quick_start/install.md
rename to docs/mindformers/docs/source_en/installation.md
index 6f0ccb6b1615a78f13b4386c27b76bf29e143ea2..68ea9d2c4319f738acad38eebbceea010438cb33 100644
--- a/docs/mindformers/docs/source_en/quick_start/install.md
+++ b/docs/mindformers/docs/source_en/installation.md
@@ -1,6 +1,6 @@
# Installation
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/quick_start/install.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/installation.md)
## Confirming Version Matching Relationship
@@ -23,7 +23,7 @@ Historical version matching relationship:
## Installing Dependent Software
-1. Install Firmware and Driver: Download the firmware and driver package through the [Confirming Version Matching Relationship](https://www.mindspore.cn/mindformers/docs/en/dev/quick_start/install.html#confirming-version-matching-relationship) to download the installation package, and refer to the [Ascend official tutorial](https://www.hiascend.com/en/document) for installation.
+1. Install Firmware and Driver: Download the firmware and driver installation package according to the [Confirming Version Matching Relationship](https://www.mindspore.cn/mindformers/docs/en/dev/installation.html#confirming-version-matching-relationship), and refer to the [Ascend official tutorial](https://www.hiascend.com/en/document) for installation.
2. Install CANN and MindSpore: Use the officially provided Docker image (CANN, MindSpore are already included in the image, no need to install them manually) or follow the [Manual Installation](https://www.mindspore.cn/install/en) section on the MindSpore website for installation.
diff --git a/docs/mindformers/docs/source_en/start/models.md b/docs/mindformers/docs/source_en/introduction/models.md
similarity index 44%
rename from docs/mindformers/docs/source_en/start/models.md
rename to docs/mindformers/docs/source_en/introduction/models.md
index 3f49dc31ef50ba23179044b438de6c2e11e87c10..adad824a6d0b3919b007164c024214747bbeb6f4 100644
--- a/docs/mindformers/docs/source_en/start/models.md
+++ b/docs/mindformers/docs/source_en/introduction/models.md
@@ -1,55 +1,61 @@
# Models
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/start/models.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/introduction/models.md)
The following table lists models supported by MindFormers.
| Model | Specifications | Model Type | Latest Version |
|:--------------------------------------------------------------------------------------------------------|:------------------------------|:----------------:|:----------------------:|
-| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/codellama.md) | 34B | Dense LLM | In-development version |
-| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_image.md) | 19B | MM | In-development version |
-| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_video.md) | 13B | MM | In-development version |
-| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | Sparse LLM | In-development version |
-| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek2) | 236B | Sparse LLM | In-development version |
-| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek1_5) | 7B | Dense LLM | In-development version |
-| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek) | 33B | Dense LLM | In-development version |
-| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version |
-| [GLM3-32K](https://gitee.com/mindspore/mindformers/tree/dev/research/glm32k) | 6B | Dense LLM | In-development version |
-| [GLM3](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm3.md) | 6B | Dense LLM | In-development version |
-| [InternLM2](https://gitee.com/mindspore/mindformers/tree/dev/research/internlm2) | 7B/20B | Dense LLM | In-development version |
-| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version |
-| [Llama3](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) | 8B/70B | Dense LLM | In-development version |
-| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md) | 7B/13B/70B | Dense LLM | In-development version |
-| [Mixtral](https://gitee.com/mindspore/mindformers/tree/dev/research/mixtral) | 8x7B | Sparse LLM | In-development version |
-| [Qwen2](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | In-development version |
-| [Qwen1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen1_5) | 7B/14B/72B | Dense LLM | In-development version |
-| [Qwen-VL](https://gitee.com/mindspore/mindformers/tree/dev/research/qwenvl) | 9.6B | MM | In-development version |
-| [Whisper](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/whisper.md) | 1.5B | MM | In-development version |
-| [Yi](https://gitee.com/mindspore/mindformers/tree/dev/research/yi) | 6B/34B | Dense LLM | In-development version |
-| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | Dense LLM | 1.3.2 |
-| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | Dense LLM | 1.3.2 |
-| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | Dense LLM | 1.3.2 |
-| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | Dense LLM | 1.3.2 |
-| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | Dense LLM | 1.3.2 |
-| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | Dense LLM | 1.1.0 |
-| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | Dense LLM | 1.1.0 |
-| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | Dense LLM | 1.0 |
-| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
-| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
-| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
-| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | Dense LLM | 1.0 |
-| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | Dense LLM | 1.0 |
-| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | Dense LLM | 1.0 |
-| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | Dense LLM | 1.0 |
-| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
-| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | Dense LLM | 1.0 |
-| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | Dense LLM | 1.0 |
-| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
-| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | Dense LLM | 1.0 |
-| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
-| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | Dense LLM | 1.0 |
-| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
-| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | Dense LLM | 1.0 |
-| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | Dense LLM | 0.8 |
+| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3) | 671B | Sparse LLM | In-development version, 1.5.0 |
+| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | Dense LLM | In-development version, 1.5.0 |
+| [Llama3.1](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1) | 8B/70B | Dense LLM | In-development version, 1.5.0 |
+| [Qwen2.5](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
+| [TeleChat2](https://gitee.com/mindspore/mindformers/blob/dev/research/telechat2) | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
+| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | Dense LLM | 1.5.0 |
+| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 |
+| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 |
+| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek2) | 236B | Sparse LLM | 1.5.0 |
+| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek1_5) | 7B | Dense LLM | 1.5.0 |
+| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek) | 33B | Dense LLM | 1.5.0 |
+| [GLM3-32K](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/glm32k) | 6B | Dense LLM | 1.5.0 |
+| [GLM3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/glm3.md) | 6B | Dense LLM | 1.5.0 |
+| [InternLM2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/internlm2) | 7B/20B | Dense LLM | 1.5.0 |
+| [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | 3B | Dense LLM | 1.5.0 |
+| [Llama3.2-Vision](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/mllama.md) | 11B | MM | 1.5.0 |
+| [Llama3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/llama3) | 8B/70B | Dense LLM | 1.5.0 |
+| [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md) | 7B/13B/70B | Dense LLM | 1.5.0 |
+| [Mixtral](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/mixtral) | 8x7B | Sparse LLM | 1.5.0 |
+| [Qwen2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | 1.5.0 |
+| [Qwen1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen1_5) | 7B/14B/72B | Dense LLM | 1.5.0 |
+| [Qwen-VL](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
+| [TeleChat](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/telechat) | 7B/12B/52B | Dense LLM | 1.5.0 |
+| [Whisper](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/whisper.md) | 1.5B | MM | 1.5.0 |
+| [Yi](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yi) | 6B/34B | Dense LLM | 1.5.0 |
+| [YiZhao](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yizhao) | 12B | Dense LLM | 1.5.0 |
+| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | Dense LLM | 1.3.2 |
+| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | Dense LLM | 1.3.2 |
+| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | Dense LLM | 1.3.2 |
+| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | Dense LLM | 1.3.2 |
+| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | Dense LLM | 1.3.2 |
+| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | Dense LLM | 1.1.0 |
+| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | Dense LLM | 1.1.0 |
+| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | Dense LLM | 1.0 |
+| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
+| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
+| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
+| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | Dense LLM | 1.0 |
+| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | Dense LLM | 1.0 |
+| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | Dense LLM | 1.0 |
+| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | Dense LLM | 1.0 |
+| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
+| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | Dense LLM | 1.0 |
+| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | Dense LLM | 1.0 |
+| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
+| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | Dense LLM | 1.0 |
+| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
+| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | Dense LLM | 1.0 |
+| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
+| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | Dense LLM | 1.0 |
+| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | Dense LLM | 0.8 |
* ***LLM:*** *Large Language Model;* ***MM:*** *Multi-Modal*
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_en/start/overview.md b/docs/mindformers/docs/source_en/introduction/overview.md
similarity index 44%
rename from docs/mindformers/docs/source_en/start/overview.md
rename to docs/mindformers/docs/source_en/introduction/overview.md
index 4a4ec0fda939f58d62ce61ffddfed0508f46b0ef..322b841f5d97a392216e580cc5ad6f5cb24c22d5 100644
--- a/docs/mindformers/docs/source_en/start/overview.md
+++ b/docs/mindformers/docs/source_en/introduction/overview.md
@@ -1,13 +1,13 @@
# Overall Structure
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/start/overview.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/introduction/overview.md)
The overall architecture formed by MindSpore Transformers and the end-to-end AI hardware and software ecosystem of MindSpore and Ascend is as follows:
1. At the hardware level, MindSpore Transformers supports users running large models on Ascend servers;
2. At the software level, MindSpore Transformers implements the big model-related code through the Python interface provided by MindSpore and performs data computation by the operator libraries provided by the supporting software package of the Ascend AI processor;
3. The basic functionality features currently supported by MindSpore Transformers are listed below:
- 1. Supports tasks such as running training and inference for large models [distributed parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/function/distributed_parallel.html), with parallel capabilities including data parallelism, model parallelism, ultra-long sequence parallelism;
- 2. Supports [model weight conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html), [distributed weight splitting and combination](https://www.mindspore.cn/mindformers/docs/en/dev/function/transform_weight.html), and different format of [dataset loading](https://www.mindspore.cn/mindformers/docs/en/dev/function/dataset.html) and [resumable training after breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/function/resume_training.html);
- 3. Support 25+ large models [pretraining](https://www.mindspore.cn/mindformers/docs/en/dev/usage/pre_training.html), [fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/usage/sft_tuning.html), [inference](https://www.mindspore.cn/mindformers/docs/en/dev/usage/inference.html) and [evaluation] (https://www.mindspore.cn/mindformers/docs/en/dev/usage/evaluation.html). Meanwhile, it also supports [quantization](https://www.mindspore.cn/mindformers/docs/en/dev/usage/quantization.html), and the list of supported models can be found in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html);
-4. MindSpore Transformers supports users to carry out model service deployment function through [MindIE](https://www.mindspore.cn/mindformers/docs/en/dev/usage/mindie_deployment.html), and also supports the use of [MindX]( https://www.hiascend.com/software/mindx-dl) to realize large-scale cluster scheduling; more third-party platforms will be supported in the future, please look forward to it.
+    1. Supports running large model training and inference tasks with [distributed parallelism](https://www.mindspore.cn/mindformers/docs/en/dev/feature/parallel_training.html), with parallel capabilities including data parallelism, model parallelism, and ultra-long sequence parallelism;
+    2. Supports [model weight conversion](https://www.mindspore.cn/mindformers/docs/en/dev/feature/weight_conversion.html), [distributed weight splitting and combination](https://www.mindspore.cn/mindformers/docs/en/dev/feature/transform_weight.html), [dataset loading](https://www.mindspore.cn/mindformers/docs/en/dev/feature/dataset.html) in different formats, and [resumable training after breakpoint](https://www.mindspore.cn/mindformers/docs/en/dev/feature/resume_training.html);
+    3. Supports [pretraining](https://www.mindspore.cn/mindformers/docs/en/dev/guide/pre_training.html), [fine-tuning](https://www.mindspore.cn/mindformers/docs/en/dev/guide/supervised_fine_tuning.html), [inference](https://www.mindspore.cn/mindformers/docs/en/dev/guide/inference.html), and [evaluation](https://www.mindspore.cn/mindformers/docs/en/dev/feature/evaluation.html) of 25+ large models, as well as [quantization](https://www.mindspore.cn/mindformers/docs/en/dev/feature/quantization.html); the list of supported models can be found in [Model Library](https://www.mindspore.cn/mindformers/docs/en/dev/introduction/models.html);
+4. MindSpore Transformers supports model service deployment through [MindIE](https://www.mindspore.cn/mindformers/docs/en/dev/guide/deployment.html), and also supports large-scale cluster scheduling with [MindX](https://www.hiascend.com/software/mindx-dl); more third-party platforms will be supported in the future.
diff --git a/docs/mindformers/docs/source_en/quick_start/source_code_start.md b/docs/mindformers/docs/source_en/quick_start/source_code_start.md
deleted file mode 100644
index 0f6145e4f1035ba1a0d68a6c9e9d847c2783780e..0000000000000000000000000000000000000000
--- a/docs/mindformers/docs/source_en/quick_start/source_code_start.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# Calling Source Code to Start
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/quick_start/source_code_start.md)
-
-This section shows how to use MindSpore Transformers to quickly launch a LoRA low-parameter fine-tuning task based on the Llama2-7B model. To use other models and tasks via MindSpore Transformers, please read the corresponding [model documentation](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html).
-
-## Preparing Weights File
-
-MindSpore Transformers provides pre-trained weights and word list files that have been converted for pre-training, fine-tuning, and inference. Users can also download the official HuggingFace weights and use them after converting the model weights; this document does not cover the conversion in detail, but you can refer to the [Llama2 documentation](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md) and [weight conversion](https://www.mindspore.cn/mindformers/docs/en/dev/function/weight_conversion.html) for more details. Please download the `MindSpore` weights (the converted `.ckpt` file) and the `tokenizer.model` file for subsequent processing.
-
-| Model Name | MindSpore Weights | HuggingFace Weights |
-| ------ | ------ | ------ |
-| Llama2-7B | [llama2_7b.ckpt](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/llama2_7b.ckpt) | [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) |
-
-Word list download link: [tokenizer.model](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/tokenizer.model)
-
-## Preparing Dataset
-
-1. The dataset file alpaca_data.json used in the fine-tuning process can be obtained at [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca).
-
-2. Data Preprocessing
-
- The following command needs to be executed in the MindSpore Transformers code root directory, and replaces {path} below with the local path where the dataset files are stored.
-
- 1. Execute [mindformers/tools/dataset_preprocess/llama/alpaca_converter.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/alpaca_converter.py), and add prompt templates to convert the raw dataset into a multi-round conversation format.
-
- ```shell
- python mindformers/tools/dataset_preprocess/llama/alpaca_converter.py \
- --data_path /{path}/alpaca_data.json \
- --output_path /{path}/alpaca-data-conversation.json
- ```
-
- **Parameter descriptions**
-
- - data_path: Input the path to the downloaded file.
- - output_path: Save path of the output file.
-
- 2. Execute [mindformers/tools/dataset_preprocess/llama/llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/llama_preprocess.py), and generate MindRecord data and convert data with prompt templates to MindRecord format.
-
- ```shell
- python mindformers/tools/dataset_preprocess/llama/llama_preprocess.py \
- --dataset_type qa \
- --input_glob /{path}/alpaca-data-conversation.json \
- --model_file /{path}/tokenizer.model \
- --seq_length 4096 \
- --output_file /{path}/alpaca-fastchat4096.mindrecord
- ```
-
- **Parameter descriptions**
-
- - dataset_type: Preprocessed data types. The options include "wiki" and "qa."
- - "wiki" is used to process the Wikitext2 dataset, which is suitable for the pre-training and evaluation stages.
- - "qa" is used to process the Alpaca dataset, converting it into a question-answer format, which is suitable for the fine-tuning stage.
- For other dataset conversion scripts, please refer to the corresponding [model documentation](https://www.mindspore.cn/mindformers/docs/en/dev/start/models.html).
- - input_glob: Path to the converted alpaca file.
- - model_file: Path to the model tokenizer.model file.
- - seq_length: Sequence length of the output data.
- - output_file: Save path of the output file.
-
-    3. The console outputs the following, indicating that the format conversion was successful.
-
- ```shell
- # Console outputs
- Transformed 52002 records.
- Transform finished, output files refer: {path}/alpaca-fastchat4096.mindrecord
- ```
-
-## Initiating Fine-tuning
-
-In the MindSpore Transformers code root directory, execute the following command to launch the fine-tuning task:
-
-```shell
-bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/lora_llama2_7b.yaml \
- --train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord \
- --load_checkpoint /{path}/llama2_7b.ckpt \
- --auto_trans_ckpt True \
- --use_parallel True \
- --run_mode finetune" 8
-```
-
-**Command Explanation:**
-
-- `scripts/msrun_launcher.sh`: Script for launching distributed tasks.
-- `"run_mindformer.py ..."`: Parameter string for the Python task executed on each card, including:
- - `run_mindformer.py`: One-click startup script.
- - `--config`: Specifies the task configuration file path, e.g., `configs/llama2/lora_llama2_7b.yaml`.
- - `--train_dataset_dir`: Specifies the dataset path, e.g., `/{path}/alpaca-fastchat4096.mindrecord`.
- - `--load_checkpoint`: Specifies the checkpoint file path, e.g., `/{path}/llama2_7b.ckpt`.
- - `--auto_trans_ckpt True`: Enables automatic checkpoint partitioning.
- - `--use_parallel True`: Enables distributed task execution.
- - `--run_mode finetune`: Sets the run mode to fine-tuning.
-- `8`: Sets the task to run on 8 NPUs.
-
-When the following log appears on the console:
-
-```shell
-Start worker process with rank id:0, log file:output/msrun_log/worker_0.log. Environment variable [RANK_ID=0] is exported.
-Start worker process with rank id:1, log file:output/msrun_log/worker_1.log. Environment variable [RANK_ID=1] is exported.
-Start worker process with rank id:2, log file:output/msrun_log/worker_2.log. Environment variable [RANK_ID=2] is exported.
-Start worker process with rank id:3, log file:output/msrun_log/worker_3.log. Environment variable [RANK_ID=3] is exported.
-Start worker process with rank id:4, log file:output/msrun_log/worker_4.log. Environment variable [RANK_ID=4] is exported.
-Start worker process with rank id:5, log file:output/msrun_log/worker_5.log. Environment variable [RANK_ID=5] is exported.
-Start worker process with rank id:6, log file:output/msrun_log/worker_6.log. Environment variable [RANK_ID=6] is exported.
-Start worker process with rank id:7, log file:output/msrun_log/worker_7.log. Environment variable [RANK_ID=7] is exported.
-```
-
-This indicates that the fine-tuning task has started; progress can be monitored through the logs in the `output/msrun_log/` directory.
-
-For more details on Llama2 and additional startup approaches, please refer to the `Llama2` [README](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#llama-2) documentation.
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/api.rst b/docs/mindformers/docs/source_zh_cn/advanced_development/api.rst
new file mode 100644
index 0000000000000000000000000000000000000000..f0accd105687587ec6e9a0ad6dce6c895bb0b8ff
--- /dev/null
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/api.rst
@@ -0,0 +1,17 @@
+API
+===========
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+
+ ../mindformers
+ ../mindformers.core
+ ../mindformers.dataset
+ ../mindformers.generation
+ ../mindformers.models
+ ../mindformers.modules
+ ../mindformers.pet
+ ../mindformers.pipeline
+ ../mindformers.tools
+ ../mindformers.wrapper
diff --git a/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md
similarity index 88%
rename from docs/mindformers/docs/source_zh_cn/usage/dev_migration.md
rename to docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md
index b25bd538337f569c5643c0f6128b1425b460db94..12df6e202715ed62a373bf606a80db75dfe41797 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md
@@ -1,6 +1,6 @@
# 开发迁移
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/dev_migration.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/advanced_development/dev_migration.md)
本文档将指导用户如何基于MindSpore Transformers开发构建一个大模型,并完成最基本的适配,以拉起训练和推理流程。
@@ -46,9 +46,9 @@ MindSpore Transformers提供了[PretrainedTokenizer](https://www.mindspore.cn/mi
### 准备权重和数据集
-如已有基于PyTorch的模型权重,可以参考[权重转换文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)将权重转换为MindSpore格式的权重。
+如已有基于PyTorch的模型权重,可以参考[权重转换文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)将权重转换为MindSpore格式的权重。
-数据集的准备可以参考[数据集文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/dataset.html),或参考模型文档,如[Llama2说明文档——数据集准备](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)。
+数据集的准备可以参考[数据集文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html),或参考模型文档,如[Llama2说明文档——数据集准备](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#%E6%95%B0%E6%8D%AE%E5%8F%8A%E6%9D%83%E9%87%8D%E5%87%86%E5%A4%87)。
### 准备`YAML`配置文件
@@ -93,13 +93,13 @@ python run_mindformer.py --config research/llama3_1/predict_llama3_1_8b.yaml --l
其中设置了`register_path`为外挂代码所在目录的路径`research/llama3_1`,模型权重的准备参考[Llama3.1说明文档——模型权重下载](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/README.md#%E6%A8%A1%E5%9E%8B%E6%9D%83%E9%87%8D%E4%B8%8B%E8%BD%BD)。
-配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama2-7B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml)。
+配置文件的详细内容及可配置项可以参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。在实际编写配置文件时,也可以参考库内已有的配置文件,例如[Llama2-7B微调的配置文件](https://gitee.com/mindspore/mindformers/blob/dev/configs/llama2/finetune_llama2_7b.yaml)。
-在准备完上述所有基本要素之后,可以参考MindSpore Transformers使用教程中的其余文档进行模型训练、微调、推理等流程的实践。后续模型调试调优可以参考[大模型精度调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/acc_optimize/acc_optimize.html)和[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/perf_optimize/perf_optimize.html)。
+在准备完上述所有基本要素之后,可以参考MindSpore Transformers使用教程中的其余文档进行模型训练、微调、推理等流程的实践。后续模型调试调优可以参考[大模型精度调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/precision_optimization.html)和[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。
### 将模型贡献给MindSpore Transformers开源仓库
-可以参考[MindSpore Transformers贡献指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/faq/mindformers_contribution.html),将模型贡献到MindSpore Transformers的开源仓库,供广大开发者研究和使用。
+可以参考[MindSpore Transformers贡献指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/contribution/mindformers_contribution.html),将模型贡献到MindSpore Transformers的开源仓库,供广大开发者研究和使用。
## MindSpore Transformers大模型迁移实践
@@ -111,7 +111,7 @@ Llama3-8B与Llama2-7B拥有相同的模型结构,只有部分模型参数、
以下对比了Llama2-7B和Llama3-8B的模型配置:
-
+
其中的区别有:
diff --git a/docs/mindformers/docs/source_zh_cn/perf_optimize/images/cast.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/cast.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/perf_optimize/images/cast.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/cast.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/general_process.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/general_process.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/general_process.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/general_process.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/local_norm.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/local_norm.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/local_norm.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/local_norm.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss1.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss1.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss1.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss1.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss2.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss2.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss2.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss2.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss3.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss3.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss3.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss3.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss4.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss4.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss4.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss4.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss5.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss5.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss5.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss5.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss6.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss6.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss6.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss6.png
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss7.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/loss7.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/image/loss7.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/loss7.png
diff --git a/docs/mindformers/docs/source_zh_cn/usage/image/model_config_comparison.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/model_config_comparison.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/usage/image/model_config_comparison.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/model_config_comparison.png
diff --git a/docs/mindformers/docs/source_zh_cn/perf_optimize/images/mstx.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/mstx.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/perf_optimize/images/mstx.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/mstx.png
diff --git a/docs/mindformers/docs/source_zh_cn/advanced_development/images/multi_modal.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/multi_modal.png
new file mode 100644
index 0000000000000000000000000000000000000000..9ad095bf884e6f052deea77f0bd4725284fede2d
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/advanced_development/images/multi_modal.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/perf_optimize/images/reshape.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/reshape.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/perf_optimize/images/reshape.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/reshape.png
diff --git a/docs/mindformers/docs/source_zh_cn/perf_optimize/images/silu_mul.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/silu_mul.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/perf_optimize/images/silu_mul.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/silu_mul.png
diff --git a/docs/mindformers/docs/source_zh_cn/perf_optimize/images/studio.png b/docs/mindformers/docs/source_zh_cn/advanced_development/images/studio.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/perf_optimize/images/studio.png
rename to docs/mindformers/docs/source_zh_cn/advanced_development/images/studio.png
diff --git a/docs/mindformers/docs/source_zh_cn/usage/multi_modal.md b/docs/mindformers/docs/source_zh_cn/advanced_development/multi_modal_dev.md
similarity index 99%
rename from docs/mindformers/docs/source_zh_cn/usage/multi_modal.md
rename to docs/mindformers/docs/source_zh_cn/advanced_development/multi_modal_dev.md
index 89a61f83bbdea2a0e701131e9b02ddbafed1fc34..49289610aea9e0c5e04cf783fa59fca2c237ef1a 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/multi_modal.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/multi_modal_dev.md
@@ -1,6 +1,6 @@
# 多模态理解模型开发
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/multi_modal.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/advanced_development/multi_modal_dev.md)
多模态理解模型(Multimodal Model)是指能够处理并结合来自不同模态(如文字、图像、音频、视频等)的信息进行学习和推理的人工智能模型。
传统的单一模态模型通常只关注单一数据类型,如文本分类模型只处理文本数据,图像识别模型只处理图像数据。而多模态理解模型则通过融合不同来源的数据来完成更复杂的任务,从而能够理解和生成更加丰富、全面的内容。
@@ -67,7 +67,7 @@ print(dataset_loader[0])
下图是多模态数据的处理流程图,图中的自定义模块需要用户根据实际需求实现,其他模块直接调用即可。
-
+
下面以[CogVLm2-Video模型数据预处理模块](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/cogvlm2/cogvlm2_processor.py)为例,介绍多模态数据处理模块中各组成部分的功能。
@@ -324,7 +324,7 @@ class MultiModalForCausalLM(BaseXModalToTextModel):
在实现多模态数据集、数据处理模块以及多模态理解模型构建之后,就可以通过模型配置文件启动模型预训练、微调、推理等任务,为此需要构建对应的模型配置文件。
-具体模型配置文件可参考[predict_cogvlm2_video_llama3_chat_13b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/predict_cogvlm2_video_llama3_chat_13b.yaml)和[finetune_cogvlm2_video_llama3_chat_13b_lora.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/finetune_cogvlm2_video_llama3_chat_13b_lora.yaml)分别对应模型推理和微调,其中参数具体含义可查阅[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html)。
+具体模型配置文件可参考[predict_cogvlm2_video_llama3_chat_13b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/predict_cogvlm2_video_llama3_chat_13b.yaml)和[finetune_cogvlm2_video_llama3_chat_13b_lora.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/finetune_cogvlm2_video_llama3_chat_13b_lora.yaml)分别对应模型推理和微调,其中参数具体含义可查阅[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。
在用户自定义的配置文件中`model`、`processor`、`train_dataset`等部分内容需要对应用户自定义的**数据集**、**数据处理模块**以及**多模态理解模型**进行设置。
diff --git a/docs/mindformers/docs/source_zh_cn/perf_optimize/perf_optimize.md b/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md
similarity index 98%
rename from docs/mindformers/docs/source_zh_cn/perf_optimize/perf_optimize.md
rename to docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md
index c51b5d980892860f7b7b8df9ed82f562b818d309..eea05bf4b514cfe237f368afd3892bf870dbb5ef 100644
--- a/docs/mindformers/docs/source_zh_cn/perf_optimize/perf_optimize.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md
@@ -1,6 +1,6 @@
# 大模型性能调优指南
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/perf_optimize/perf_optimize.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/advanced_development/performance_optimization.md)
## 概述
@@ -64,7 +64,7 @@ $$
在实际应用中,通常会采用多种并行策略和优化手段,例如使用优化器并行和重计算等方式,以减少模型对内存的使用并提高训练效率。并行策略设计与模型的效率密切相关,因此在模型调优之前先确定一组或多组较优的并行策略,是至关重要的。
-详细介绍参考文档[并行策略指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/distributed_parallel.html)。
+详细介绍参考文档[并行策略指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html)。
对于不同的参数量规格的模型,可参考以下并行策略选择方向:
@@ -277,7 +277,7 @@ MindStudio Insight工具以时间线(Timeline)的形式呈现全流程在线
#### IR 图
-在[MindSpore Transformers配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html)中,只需要开启save_graphs,运行时会输出一些图编译过程中生成的.ir后缀的中间文件,这些被称为IR文件。默认情况下,这些文件会保存在当前执行目录下的graph目录中。IR文件是一种比较直观易懂的文本格式文件,用于描述模型结构的文件,可以直接用文本编辑软件查看。配置项含义参考[Config配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html),配置方法如下:
+在[MindSpore Transformers配置文件](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)中,只需要开启save_graphs,运行时会输出一些图编译过程中生成的.ir后缀的中间文件,这些被称为IR文件。默认情况下,这些文件会保存在当前执行目录下的graph目录中。IR文件是一种比较直观易懂的文本格式文件,用于描述模型结构的文件,可以直接用文本编辑软件查看。配置项含义参考[Config配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html),配置方法如下:
```yaml
context:
diff --git a/docs/mindformers/docs/source_zh_cn/acc_optimize/acc_optimize.md b/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md
similarity index 98%
rename from docs/mindformers/docs/source_zh_cn/acc_optimize/acc_optimize.md
rename to docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md
index dcf6f800f55b025e5998b0fc17d50154c4f32a63..9e7c34524d171cba72567f4b6bb652ccca604066 100644
--- a/docs/mindformers/docs/source_zh_cn/acc_optimize/acc_optimize.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md
@@ -1,6 +1,6 @@
# 大模型精度调优指南
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/acc_optimize/acc_optimize.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/advanced_development/precision_optimization.md)
## 精度问题概述和场景
@@ -159,7 +159,7 @@ export MINDSPORE_DUMP_CONFIG=${JSON_PATH}
模型的训练过程可以分解为如下过程:数据输入、前向计算、loss、反向计算、梯度、优化器权重更新、下一个step。下面将结合如下图的流程,介绍如何对训练各阶段进行排查。
-
+
### 阶段1:训练前准备
@@ -187,7 +187,7 @@ export MINDSPORE_DUMP_CONFIG=${JSON_PATH}
#### 权重转换
-训练过程中,MindSpore与PyTorch加载同一份权重。若是预训练场景,可以使用PyTorch保存一个初始化权重后,转换为MindSpore权重。因为MindSpore的权重名称与PyTorch有差异,权重转换的本质是将PyTorch权重dict中的名字改为MindSpore权重名字,以支持MindSpore加载。权重转换参考[权重转换指导](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)。
+训练过程中,MindSpore与PyTorch加载同一份权重。若是预训练场景,可以使用PyTorch保存一个初始化权重后,转换为MindSpore权重。因为MindSpore的权重名称与PyTorch有差异,权重转换的本质是将PyTorch权重dict中的名字改为MindSpore权重名字,以支持MindSpore加载。权重转换参考[权重转换指导](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)。
MindSpore与PyTorch均支持`bin`格式数据,加载相同的数据集进行训练,保证每个step一致。
@@ -299,7 +299,7 @@ def get_parameters(self):
下图是local norm对比的示例,对比权重对应的local norm值。
-
+
可发现在该图示的场景下,model.tok_embeddings.embedding_weight的local norm值差异较大,可重点排查Embedding的实现及计算精度等。
@@ -377,7 +377,7 @@ class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):
设置learning rate > 0,权重更新,进行长稳测试。训练至某个step出现loss差异较大的现象,之后训练loss开始发散,如图所示:
-
+
在该场景下,可针对突变前后的训练进行排查,可尝试如下排查方式:
@@ -391,7 +391,7 @@ class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):
长稳测试中,还可能出现训练前期拟合较好,后期收敛loss出现较大差异,如图所示:
-
+
在该场景下,可从如下角度进行排查:
@@ -451,7 +451,7 @@ class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):
在128卡集群下训练模型,使用 Ascend+MindSpore 训练与 GPU+PyTorch 训练进行对比,发现训练后期收敛的loss比 GPU+PyTorch 高0.1左右。如图所示,收敛不符合预期:
-
+
红色线为 Ascend+MindSpore 训练曲线,蓝色线为 GPU+PyTorch 训练曲线。
@@ -461,7 +461,7 @@ class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):
首先step1的loss对齐确认没问题。对比step1的local norm,计算每个权重的local norm值与标杆的差异,发现Embedding权重的local norm值与标杆的差异大。
-
+
排查原因为MindSpore Transformers使用FP32进行权重初始化,前向计算及反向计算Embedding时均使用FP32精度计算;而PyTorch的前向及反向计算均为BF16,由此导致了计算出来的local norm值存在差异。
@@ -469,11 +469,11 @@ class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):
长稳训练排查将由单卡实验扩展到多卡实验,先设置learning rate=0,即权重不更新。前向计算每个step的loss差异在0.001左右,前向计算误差符合预期。反向计算每个step的global norm差异在0.05左右,反向计算差异不大;初步判断模型迁移代码正确,模型结构一致,前反向计算差异不大。
-
+
再权重更新,单卡训练,设置learning rate=1e-5,训练1千step。收敛后期loss有稳定的0.1的差异,复现问题。
-
+
进行问题排查。识别如下问题:
@@ -485,8 +485,8 @@ class MFTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):
完成单卡训练后,启动多卡训练测试:设置learning rate=1e-5,训练1千step。训练后期收敛一致,但训练中期存在稳定的0.05误差。
-
+
为验证该误差在合理范围内,关闭确定性计算,重复跑两次GPU实验。图中红线为MindSpore训练的曲线,蓝色、绿色线分别是第一次、第二次GPU训练的曲线。在7千step左右训练不稳定处,MindSpore训练的曲线正处于两次GPU训练的曲线之间,说明误差处于合理范围内,问题最终解决。
-
+
diff --git a/docs/mindformers/docs/source_zh_cn/usage/pretrain_gpt.md b/docs/mindformers/docs/source_zh_cn/advanced_development/pretrain_gpt.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/usage/pretrain_gpt.md
rename to docs/mindformers/docs/source_zh_cn/advanced_development/pretrain_gpt.md
index d6b4d966c0ce76560e6fdedadde0ab24444c15b6..079deedec16b452e5f15a34675129275de7d9e09 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/pretrain_gpt.md
+++ b/docs/mindformers/docs/source_zh_cn/advanced_development/pretrain_gpt.md
@@ -1,505 +1,505 @@
-# 动态图并行
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/pretrain_gpt.md)
-
-## 概述
-
-本教程演示如何使用MindSpore Transformers动态图并行框架训练GPT模型,此框架支持张量并行、流水线并行、序列并行等并行场景,还有支持使用分布式优化器动态学习率等场景,帮助开发者快速、便捷地构建和训练基于动态图并行框架的GPT预训练模型。
-
-## 操作实践
-
-下面基于Ascend平台,进行GPT模型训练。
-
-### 样例代码参考
-
-目录结构如下:
-
-```text
-└─ gpt
- ├─ pretrain_gpt.py
- ├─ pretrain_gpt.sh
- └─ pretrain_gpt_7B.yaml
- ...
-```
-
-其中,`pretrain_gpt.py`是环境配置、模型对象创建及训练的脚本。`pretrain_gpt.sh`是启动执行脚本。`pretrain_gpt_7B.yaml`是配置项。
-
-### 模型结构
-
-GPT以`Transformer`模型为主要架构,网络结构主要围绕`Transformer`的基本构建块构建。
-
-在模型中,初始化五个参数,`config`是模型配置项(在yaml文件的`model_config`中),`num_tokentypes`指定embedding的类型,`parallel_output`用来确认是否输出每一个并行Tensor的输出,`pre_process`和`post_process`分别指定是否为第一阶段和最后一阶段。
-
-调用的`get_language_model`是一个基于`Transformer`模型的接口,详情请看`get_language_model`的api文档。
-
-注意:数据集返回值要与模型定义的前向过程所需要的参数相对应。
-
-```python
-from mindformers.experimental.parallel_core.pynative.transformer.module import Module
-from mindformers.experimental.parallel_core.pynative.transformer.language_model import get_language_model
-from mindformers.experimental.parallel_core.pynative.transformer import ParallelLMLogits
-from mindformers.experimental.parallel_core.pynative.training.loss_func import VocabParallelCrossEntropy
-
-
-class AttnMaskType(enum.Enum):
- padding = 1
- causal = 2
- no_mask = 3
- padding_causal = 4
-
-
-attn_mask_type_mapping = {
- "padding": AttnMaskType.padding,
- "causal": AttnMaskType.causal,
-}
-
-
-class GPTModel(Module):
- def __init__(self,
- config,
- num_tokentypes=0,
- parallel_output=True,
- pre_process=True,
- post_process=True):
- super().__init__(config=config,\
- share_embeddings_and_output_weights=not config.untie_embeddings_and_output_weights)
-
- self.parallel_output = parallel_output
- self.pre_process = pre_process
- self.post_process = post_process
- self.untie_embeddings_and_output_weights = config.untie_embeddings_and_output_weights
- self.fp16_lm_cross_entropy = config.fp16_lm_cross_entropy
-
- self.set_model_key()
- encoder_attn_mask_type = None
- if config.encoder_attn_mask_type is not None:
- encoder_attn_mask_type = attn_mask_type_mapping.get(config.encoder_attn_mask_type)
- if encoder_attn_mask_type is None:
- raise ValueError(f"encoder_attn_mask_type must be one of {attn_mask_type_mapping.keys()}, but got"
- f"{config.encoder_attn_mask_type}")
-
- self.language_model, self._language_model_key = get_language_model(
- config=config,
- num_tokentypes=num_tokentypes,
- add_pooler=False,
- encoder_attn_mask_type=encoder_attn_mask_type,
- pre_process=self.pre_process,
- post_process=self.post_process)
-
- if self.post_process:
- self.parallel_lm_logits = ParallelLMLogits(config=config,
- bias=False,
- compute_dtype=config.compute_dtype)
- self.loss = VocabParallelCrossEntropy()
-
- if not config.untie_embeddings_and_output_weights:
- self.initialize_word_embeddings()
-
- def set_input_tensor(self, input_tensor):
- """ set input_tensor to model """
- self.language_model.set_input_tensor(input_tensor)
-
- def set_model_key(self):
- """ set model key for differentiate PipelineCell process """
- self.model_key = "gpt3"
-
- def construct(self, input_ids, position_ids, attention_mask, loss_mask,
- retriever_input_ids=None,
- retriever_position_ids=None,
- retriever_attn_mask=None,
- labels=None, tokentype_ids=None, inference_params=None):
- """ gpt model forward """
- # use RoPE
- position_ids = None
- retriever_input_ids = None
- retriever_position_ids = None
- retriever_attn_mask = None
- lm_output = self.language_model(
- input_ids,
- position_ids,
- attention_mask,
- retriever_input_ids=retriever_input_ids,
- retriever_position_ids=retriever_position_ids,
- retriever_attn_mask=retriever_attn_mask,
- inference_params=inference_params)
- if self.post_process:
- return post_language_model_processing(
- self.parallel_lm_logits, self.loss,
- lm_output, labels,
- self.language_model.output_layer.weight if\
- self.untie_embeddings_and_output_weights else self.shared_embedding_or_output_weight(),
- self.parallel_output,
- self.fp16_lm_cross_entropy,
- loss_mask)
- else:
- return lm_output
-```
-
-当`post_process`为`True`时,需要对语言模型的输出`lm_output`进行后处理,输出损失和预测结果。
-
-```python
-import mindspore.common.dtype as mstype
-
-def post_language_model_processing(parallel_lm_logits, loss_fn, lm_output, labels, logit_weights,
- parallel_output, fp16_lm_cross_entropy, loss_mask):
- """ gpt model post process forward """
- output = parallel_lm_logits(lm_output, logit_weights, parallel_output)
-
- if labels is None:
- return output
-
- labels = labels
- loss_mask = loss_mask.reshape(-1)
-
- if fp16_lm_cross_entropy:
- if output.dtype != mstype.float16:
- raise ValueError(f"When fp16_lm_cross_entropy=True, output should be float16, but got {output.dtype}")
- loss = loss_fn(output, labels, loss_mask)
- else:
- loss = loss_fn(output.astype(mstype.float32), labels)
- token_nums = loss_mask.sum()
- loss_mask = loss_mask.astype(mstype.float32)
- loss = ops.sum(loss * loss_mask.float()) / loss_mask.sum()
- return loss, output, token_nums
-```
-
-### 动态图并行训练配置
-
-动态图并行的配置项通过yaml文件来读取,并分为不同种类,包括训练配置、并行配置、模型配置等,接下来简单介绍一下大模型训练需要的基本配置。
-
-#### 配置训练参数(training_config)
-
-```yaml
-training_config:
- seed: 42 # 固定随机性用的种子
- output_dir: './output' # 输出目录,用于储存checkpoints和日志等
- training_iters: 10 # 训练迭代次数
- log_interval: 1 # 日志打印的频率
- save_interval: null # 储存checkpoints的频率
- loss_scale: 4096 # loss scale的初始值
- grad_clip_kwargs:
- grad_clip_type: "ClipGlobalNorm" # 梯度裁剪的方法,可选:"ClipGlobalNorm"或者"GradClipByValue"
- clip_value: 1.0
- loss_reduction: "mean" # loss reduction的方法,可选:"mean"或者"sum"
- loss_func_kwargs:
- loss_func_type: "VocabParallelCrossEntropy" # 损失函数,可选: "VocabParallelCrossEntropy"或者"CrossEntropyLoss"
- use_distributed_optimizer: True # 是否使用分布式优化器
-```
-
-#### 配置并行模式(parallel_config)
-
-```yaml
-parallel_config:
- tensor_model_parallel_size: 1 # 张量并行
- pipeline_model_parallel_size: 1 # 流水线并行
- expert_model_parallel_size: 1 # 专家并行
- virtual_pipeline_model_parallel_size: null # 虚拟流水线并行
- sequence_parallel: False # 序列并行
-```
-
-#### 配置模型参数(gpt_config)
-
-```yaml
-model_config:
- params_dtype: "float32" # 参数初始化类型
- compute_dtype: "bfloat16" # 计算时使用的类型
- position_embedding_type: 'rope' # 位置编码的类型,可选:"rope"或者"absolute"
- untie_embeddings_and_output_weights: True # embedding层和head层是否不共享权重
- # 配置GPT 7B模型
- num_layers: 6 # Transformer层数
- hidden_size: 4096 # 隐藏层的大小
- ffn_hidden_size: 11008 # 前馈神经网络隐藏层大小
- num_attention_heads: 32 # 注意力头的数量
-```
-
-GPT模型当前有三种不同规格的配置:7B、13B和70B。
-
-```yaml
-7B:
- num_layers: 32
- hidden_size: 4096
- ffn_hidden_size: 11008
- num_attention_heads: 32
-13B:
- num_layers: 40
- hidden_size: 5120
- ffn_hidden_size: 13824
- num_attention_heads: 40
-70B:
- num_layers: 80
- hidden_size: 8192
- ffn_hidden_size: 28672
- num_attention_heads: 64
- group_query_attention: True
- num_query_groups: 8
-```
-
-#### 数据集配置(dataset_config)
-
-```yaml
-dataset_config:
- batch_size: 1 # 一次迭代从数据集中取出的数据大小
- micro_batch_num: 2 # 微批次个数
- dataset_dir: './dataset' # 数据集所在目录
- shuffle: False # 是否打乱顺序
-```
-
-#### 优化器配置(optimizer_config)
-
-```yaml
-optimizer_config:
- optimizer_type: "AdamW" # 优化器类型,可选:"AdamW", "Adam", "SGD", "Came", "mint.AdamW"及"SpeedAdamW"
- betas: # 优化器输入参数
- - 0.9
- - 0.95
- eps: 1.e-8
- learning_rate: 1.25e-6 # 初始学习率
- weight_decay: 1.e-1 # 权重衰减系数
- learning_rate_scheduler_kwargs: # 学习率调整策略
- warmup_steps: 200
- decay_steps: 2000
- use_cosine: True
- end_learning_rate: 1.25e-7
-```
-
-### 模型训练配置解析
-
-在pretrain_gpt.py里对传入的yaml配置文件进行解析,可以得到训练配置、模型配置、优化器配置、并行策略配置以及数据集配置。
-
-```python
-import argparse
-from mindformers.experimental.parallel_core.pynative.config import (
- init_configs_from_yaml
-)
-
-def get_arg_parser():
- """get argument parser"""
- parser = argparse.ArgumentParser(description="Train gpt model")
- parser.add_argument("--config_path", type=str, default="pretrain_gpt.yaml", help="The path to the config file.")
- parser.add_argument("--run_cmd", type=str, default="", help="running cmd.")
- parser.add_argument("--model_type", type=str, default="gpt_config", help="Input model config.")
- return parser
-parser = get_arg_parser()
-args = parser.parse_args()
-
-all_config = init_configs_from_yaml(args.config_path)
-
-training_config = all_config.training_config
-model_config = all_config.model_config
-optimizer_config = all_config.optimizer_config
-parallel_config = all_config.parallel_config
-dataset_config = all_config.dataset_config
-```
-
-### 通信配置
-
-通过set_context接口可以指定运行模式、运行设备、运行卡号等。并行脚本还需指定并行模式`parallel_mode`为数据并行模式,并通过init根据不同的设备需求初始化HCCL、NCCL或者MCCL通信。指定平台:设置`device_target`为`Ascend`。调试阶段可以使用`set_context(pynative_synchronize=True)`开启同步模式,更准确地定位报错位置。
-
-```python
-import mindspore as ms
-
-
-def set_parallel_context(parallel_config):
- init()
- initialize_model_parallel(
- tensor_model_parallel_size=parallel_config.tensor_model_parallel_size,
- pipeline_model_parallel_size=parallel_config.pipeline_model_parallel_size,
- virtual_pipeline_model_parallel_size=parallel_config.virtual_pipeline_model_parallel_size,
- )
- logger.info(
- f"dp {get_data_parallel_world_size()} | "
- f"pp {parallel_config.pipeline_model_parallel_size} | "
- f"tp {parallel_config.tensor_model_parallel_size} | "
- f"sp {parallel_config.sequence_parallel} | "
- f"vpp {parallel_config.virtual_pipeline_model_parallel_size}"
- )
-
-
-def set_seed(seed):
- # set global seed, np seed, and dataset seed
- ms.set_seed(seed)
- # set rng seed
- ms.manual_seed(seed)
-
-
-ms.set_context(mode=ms.PYNATIVE_MODE)
-ms.set_device(device_target="Ascend")
-set_parallel_context(parallel_config)
-set_seed(training_config.seed)
-```
-
-### 创建网络对象
-
-从模型库获取GPT模型,根据配置文件创建网络模型对象。通过`set_weight_decay`来为不同参数设置不同的权重衰减系数,这个函数会将参数分为两组,一组应用特定的权重衰减值,另一组权重衰减为`0`,然后返回一个包含参数分组信息的列表,赋值给`group_params`变量。调用`get_optimizer`函数,传入`optimizer_config`(优化器配置)、`training_config`(训练配置)、`group_params`(前面得到的参数分组信息)、`network_with_loss`(包含模型和损失的对象)以及一个梯度归约操作(从`training_config.loss_reduction`获取),返回一个优化器对象,并赋值给`optimizer`变量。
-创建一个`TrainOneStepCell`对象,它通常用于在训练过程中执行一步优化。传入`network_with_loss`、`optimizer`及配置作为参数,并将其赋值给train_one_step_cell变量。
-
-完整的创建网络对象代码:
-
-```python
-from mindformers.experimental.parallel_core.pynative.optimizer import get_optimizer
-from mindformers.experimental.parallel_core.pynative.training import get_model
-from mindformers.experimental.parallel_core.pynative.training import TrainOneStepCell
-from mindformers.experimental.parallel_core.models import GPTModel
-
-
-def decay_filter(x):
- return "norm" not in x.name.lower() and "bias" not in x.name.lower()
-
-
-def set_weight_decay(params, weight_decay=1e-1):
- decay_params = list(filter(decay_filter, params))
- other_params = list(filter(lambda x: not decay_filter(x), params))
- group_params = []
- if decay_params:
- group_params.append({"params": decay_params, "weight_decay": weight_decay})
- if other_params:
- group_params.append({"params": other_params, "weight_decay": 0.0})
- return group_params
-
-
-def model_provider_func(pre_process=True, post_process=True):
- network_with_loss = GPTModel(
- model_config, pre_process=pre_process, post_process=post_process
- )
- return network_with_loss
-
-network_with_loss = get_model(model_provider_func, training_config)
-
-group_params = set_weight_decay(network_with_loss.trainable_params(), optimizer_config.weight_decay)
-optimizer = get_optimizer(
- optimizer_config,
- training_config,
- group_params,
- network_with_loss,
- grad_allreduce_op=training_config.loss_reduction
-)
-
-train_one_step_cell = TrainOneStepCell(network_with_loss, optimizer, None, training_config, model_config)
-```
-
-### 加载数据集及执行训练
-
-```python
-from dataset import get_dataset
-from mindformers.experimental.parallel_core.pynative.training import train
-
-train_dataset_iterator, val_dataset_iterator = get_dataset(dataset_config)
-train(
- train_one_step_cell,
- train_dataset_iterator,
- training_config,
- val_dataset_iterator,
- metrics,
- evaluation,
-)
-```
-
-### 运行训练脚本
-
-```bash
-bash pretrain_gpt.sh xx.yaml
-```
-
-若不指定xx.yaml,则默认为pretrain_gpt_7B.yaml。
-
-训练脚本`pretrain_gpt.sh`详细解析如下:
-
-#### 设置环境变量
-
-`HCCL_BUFFSIZE=200`设置两个NPU之间共享数据的缓存区大小为200M;`HCCL_EXEC_TIMEOUT=600`设置设备间执行时同步的等待时间为10分钟。`ASCEND_RT_VISIBLE_DEVICES`指定了可见的设备编号,这里设置为设备`0`号卡。
-
-```bash
-export HCCL_BUFFSIZE=200
-export HCCL_EXEC_TIMEOUT=600
-export ASCEND_RT_VISIBLE_DEVICES='0'
-```
-
-#### 设置端口号
-
-```bash
-port=8828
-```
-
-如果之前的配置异常退出,可以使用如下代码进行清理。
-
-```bash
-PIDS=$(sudo lsof -i :$port | awk 'NR>1 {print $2}')
-if [ -n "$PIDS" ]; then
- for pid in $PIDS; do
- kill -9 $pid
- echo "Killed process $pid"
- done
-else
- echo "No processes found listening on port $port."
-fi
-```
-
-#### 设置日志存储路径
-
-获取当前脚本所在的目录路径并存储在`project_dir`变量中,同时设置日志路径变量`log_path="msrun_log"`。先删除名为`msrun_log`的目录(如果存在),然后重新创建这个目录。
-
-```bash
-project_dir=$(cd "$(dirname "$0")" || exit; pwd)
-log_path="msrun_log"
-
-rm -rf "${log_path}"
-mkdir "${log_path}"
-```
-
-#### 设置可用设备数量
-
-```bash
-# 计算设备数量
-IFS=',' read -r -a devices <<< "$ASCEND_RT_VISIBLE_DEVICES"
-work_num=${#devices[@]}
-```
-
-#### 获取配置文件
-
-尝试从命令行参数中获取配置文件路径,如果没有提供命令行参数,则使用默认的配置文件 "pretrain_gpt_7B.yaml"。
-
-```bash
-config_path=$1
-if [ -z "$config_path" ]; then
- config_path="pretrain_gpt_7B.yaml"
-fi
-```
-
-#### 以msrun模式执行训练脚本
-
-```bash
-msrun --worker_num "$work_num" --local_worker_num="$work_num" --master_port=$port --log_dir="$log_path" --join=True --cluster_time_out=300 pretrain_gpt.py --config_path="${config_path}"
-```
-
-#### 运行结果
-
-接下来通过命令调用对应的脚本。
-
-```bash
-bash pretrain_gpt.sh
-```
-
-执行完后,日志文件保存到`output`目录下,其中部分文件目录结构如下:
-
-```text
-└─ output
- └─ log
- ├─ rank_0
- | ├─ info.log
- | └─ error.log
- ├─ rank_1
- | ├─ info.log
- | └─ error.log
- ...
-```
-
-关于Loss部分结果保存在`output/log/rank_*/info.log`中,示例如下:
-
-```text
-train: Epoch:0, Step:5, Loss: 10.341485, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1403.24 ms
-train: Epoch:0, Step:6, Loss: 10.38118, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1378.19 ms
-train: Epoch:0, Step:7, Loss: 10.165115, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1370.32 ms
-train: Epoch:0, Step:8, Loss: 10.039211, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1386.89 ms
-train: Epoch:0, Step:9, Loss: 10.040031, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1475.95 ms
-...
-```
+# 动态图并行
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/advanced_development/pretrain_gpt.md)
+
+## 概述
+
+本教程演示如何使用MindSpore Transformers动态图并行框架训练GPT模型。该框架支持张量并行、流水线并行、序列并行等并行场景,还支持分布式优化器、动态学习率等特性,帮助开发者快速、便捷地构建和训练基于动态图并行框架的GPT预训练模型。
+
+## 操作实践
+
+下面基于Ascend平台,进行GPT模型训练。
+
+### 样例代码参考
+
+目录结构如下:
+
+```text
+└─ gpt
+ ├─ pretrain_gpt.py
+ ├─ pretrain_gpt.sh
+ └─ pretrain_gpt_7B.yaml
+ ...
+```
+
+其中,`pretrain_gpt.py`是负责环境配置、模型对象创建及训练的脚本,`pretrain_gpt.sh`是启动执行脚本,`pretrain_gpt_7B.yaml`是配置文件。
+
+### 模型结构
+
+GPT以`Transformer`模型为主要架构,网络结构主要围绕`Transformer`的基本构建块构建。
+
+模型初始化时传入五个参数:`config`是模型配置项(在yaml文件的`model_config`中);`num_tokentypes`指定embedding的类型;`parallel_output`用来确定是否分别输出每一个并行Tensor的输出;`pre_process`和`post_process`分别指定当前是否为流水线的第一阶段和最后一阶段。
+
+调用的`get_language_model`是一个基于`Transformer`模型的接口,详情请参阅`get_language_model`的API文档。
+
+注意:数据集返回值要与模型定义的前向过程所需要的参数相对应。
+
+```python
+import enum
+
+from mindformers.experimental.parallel_core.pynative.transformer.module import Module
+from mindformers.experimental.parallel_core.pynative.transformer.language_model import get_language_model
+from mindformers.experimental.parallel_core.pynative.transformer import ParallelLMLogits
+from mindformers.experimental.parallel_core.pynative.training.loss_func import VocabParallelCrossEntropy
+
+
+class AttnMaskType(enum.Enum):
+ padding = 1
+ causal = 2
+ no_mask = 3
+ padding_causal = 4
+
+
+attn_mask_type_mapping = {
+ "padding": AttnMaskType.padding,
+ "causal": AttnMaskType.causal,
+}
+
+
+class GPTModel(Module):
+ def __init__(self,
+ config,
+ num_tokentypes=0,
+ parallel_output=True,
+ pre_process=True,
+ post_process=True):
+ super().__init__(config=config,\
+ share_embeddings_and_output_weights=not config.untie_embeddings_and_output_weights)
+
+ self.parallel_output = parallel_output
+ self.pre_process = pre_process
+ self.post_process = post_process
+ self.untie_embeddings_and_output_weights = config.untie_embeddings_and_output_weights
+ self.fp16_lm_cross_entropy = config.fp16_lm_cross_entropy
+
+ self.set_model_key()
+ encoder_attn_mask_type = None
+ if config.encoder_attn_mask_type is not None:
+ encoder_attn_mask_type = attn_mask_type_mapping.get(config.encoder_attn_mask_type)
+ if encoder_attn_mask_type is None:
+                raise ValueError(f"encoder_attn_mask_type must be one of {attn_mask_type_mapping.keys()}, but got "
+ f"{config.encoder_attn_mask_type}")
+
+ self.language_model, self._language_model_key = get_language_model(
+ config=config,
+ num_tokentypes=num_tokentypes,
+ add_pooler=False,
+ encoder_attn_mask_type=encoder_attn_mask_type,
+ pre_process=self.pre_process,
+ post_process=self.post_process)
+
+ if self.post_process:
+ self.parallel_lm_logits = ParallelLMLogits(config=config,
+ bias=False,
+ compute_dtype=config.compute_dtype)
+ self.loss = VocabParallelCrossEntropy()
+
+ if not config.untie_embeddings_and_output_weights:
+ self.initialize_word_embeddings()
+
+ def set_input_tensor(self, input_tensor):
+ """ set input_tensor to model """
+ self.language_model.set_input_tensor(input_tensor)
+
+ def set_model_key(self):
+ """ set model key for differentiate PipelineCell process """
+ self.model_key = "gpt3"
+
+ def construct(self, input_ids, position_ids, attention_mask, loss_mask,
+ retriever_input_ids=None,
+ retriever_position_ids=None,
+ retriever_attn_mask=None,
+ labels=None, tokentype_ids=None, inference_params=None):
+ """ gpt model forward """
+ # use RoPE
+ position_ids = None
+ retriever_input_ids = None
+ retriever_position_ids = None
+ retriever_attn_mask = None
+ lm_output = self.language_model(
+ input_ids,
+ position_ids,
+ attention_mask,
+ retriever_input_ids=retriever_input_ids,
+ retriever_position_ids=retriever_position_ids,
+ retriever_attn_mask=retriever_attn_mask,
+ inference_params=inference_params)
+ if self.post_process:
+ return post_language_model_processing(
+ self.parallel_lm_logits, self.loss,
+ lm_output, labels,
+ self.language_model.output_layer.weight if\
+ self.untie_embeddings_and_output_weights else self.shared_embedding_or_output_weight(),
+ self.parallel_output,
+ self.fp16_lm_cross_entropy,
+ loss_mask)
+ else:
+ return lm_output
+```
+
+当`post_process`为`True`时,需要对语言模型的输出`lm_output`进行后处理,输出损失和预测结果。
+
+```python
+import mindspore.common.dtype as mstype
+from mindspore import ops
+
+def post_language_model_processing(parallel_lm_logits, loss_fn, lm_output, labels, logit_weights,
+ parallel_output, fp16_lm_cross_entropy, loss_mask):
+ """ gpt model post process forward """
+ output = parallel_lm_logits(lm_output, logit_weights, parallel_output)
+
+ if labels is None:
+ return output
+
+ loss_mask = loss_mask.reshape(-1)
+
+ if fp16_lm_cross_entropy:
+ if output.dtype != mstype.float16:
+ raise ValueError(f"When fp16_lm_cross_entropy=True, output should be float16, but got {output.dtype}")
+ loss = loss_fn(output, labels, loss_mask)
+ else:
+ loss = loss_fn(output.astype(mstype.float32), labels)
+ token_nums = loss_mask.sum()
+ loss_mask = loss_mask.astype(mstype.float32)
+    loss = ops.sum(loss * loss_mask) / loss_mask.sum()
+ return loss, output, token_nums
+```
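+
+上文提到,数据集返回值要与`construct`的入参一一对应。下面给出一个极简的示意(仅为说明字段对应关系的草图,字段形状与取值均为假设,实际请以所用数据集为准):
+
+```python
+import numpy as np
+
+seq_len = 8
+
+# 一条训练样本需要提供construct所需的位置参数:
+# input_ids, position_ids, attention_mask, loss_mask(以及labels等关键字参数)
+sample = {
+    "input_ids": np.random.randint(0, 32000, (seq_len,), dtype=np.int32),   # token id序列
+    "position_ids": np.arange(seq_len, dtype=np.int32),                     # 使用RoPE时模型内部会将其置None
+    "attention_mask": np.tril(np.ones((seq_len, seq_len), dtype=np.int32)), # 因果(下三角)掩码
+    "loss_mask": np.ones((seq_len,), dtype=np.float32),                     # 1表示该位置参与loss计算
+    "labels": np.random.randint(0, 32000, (seq_len,), dtype=np.int32),      # 右移一位的目标token
+}
+print({k: v.shape for k, v in sample.items()})
+```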
+
+### 动态图并行训练配置
+
+动态图并行的配置项通过yaml文件来读取,并分为不同种类,包括训练配置、并行配置、模型配置等,接下来简单介绍一下大模型训练需要的基本配置。
+
+#### 配置训练参数(training_config)
+
+```yaml
+training_config:
+ seed: 42 # 固定随机性用的种子
+ output_dir: './output' # 输出目录,用于储存checkpoints和日志等
+ training_iters: 10 # 训练迭代次数
+ log_interval: 1 # 日志打印的频率
+ save_interval: null # 储存checkpoints的频率
+ loss_scale: 4096 # loss scale的初始值
+ grad_clip_kwargs:
+ grad_clip_type: "ClipGlobalNorm" # 梯度裁剪的方法,可选:"ClipGlobalNorm"或者"GradClipByValue"
+ clip_value: 1.0
+ loss_reduction: "mean" # loss reduction的方法,可选:"mean"或者"sum"
+ loss_func_kwargs:
+ loss_func_type: "VocabParallelCrossEntropy" # 损失函数,可选: "VocabParallelCrossEntropy"或者"CrossEntropyLoss"
+ use_distributed_optimizer: True # 是否使用分布式优化器
+```
+
+#### 配置并行模式(parallel_config)
+
+```yaml
+parallel_config:
+ tensor_model_parallel_size: 1 # 张量并行
+ pipeline_model_parallel_size: 1 # 流水线并行
+ expert_model_parallel_size: 1 # 专家并行
+ virtual_pipeline_model_parallel_size: null # 虚拟流水线并行
+ sequence_parallel: False # 序列并行
+```
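+
+各并行维度共同决定数据并行大小:在不考虑专家并行的情况下,数据并行大小dp满足`world_size = tp * pp * dp`。下面是一个简单的推算示意(8卡、tp=2、pp=2均为假设值):
+
+```python
+world_size = 8   # 集群总卡数(假设值)
+tp = 2           # tensor_model_parallel_size
+pp = 2           # pipeline_model_parallel_size
+
+assert world_size % (tp * pp) == 0, "总卡数必须能被tp*pp整除"
+dp = world_size // (tp * pp)  # 数据并行大小
+print(f"dp = {dp}")           # dp = 2
+```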
+
+#### 配置模型参数(gpt_config)
+
+```yaml
+model_config:
+ params_dtype: "float32" # 参数初始化类型
+ compute_dtype: "bfloat16" # 计算时使用的类型
+ position_embedding_type: 'rope' # 位置编码的类型,可选:"rope"或者"absolute"
+ untie_embeddings_and_output_weights: True # embedding层和head层是否不共享权重
+ # 配置GPT 7B模型
+ num_layers: 6 # Transformer层数
+ hidden_size: 4096 # 隐藏层的大小
+ ffn_hidden_size: 11008 # 前馈神经网络隐藏层大小
+ num_attention_heads: 32 # 注意力头的数量
+```
+
+GPT模型当前有三种不同规格的配置:7B、13B和70B。
+
+```yaml
+7B:
+ num_layers: 32
+ hidden_size: 4096
+ ffn_hidden_size: 11008
+ num_attention_heads: 32
+13B:
+ num_layers: 40
+ hidden_size: 5120
+ ffn_hidden_size: 13824
+ num_attention_heads: 40
+70B:
+ num_layers: 80
+ hidden_size: 8192
+ ffn_hidden_size: 28672
+ num_attention_heads: 64
+ group_query_attention: True
+ num_query_groups: 8
+```
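+
+可以用粗略公式估算上述规格的参数量:每层约有`4*h*h`(注意力投影)加`3*h*ffn`(SwiGLU前馈)个参数,再加上embedding。以下估算仅为量级示意(词表大小为假设值,未计入norm等小项;70B使用GQA,注意力部分需另行修正):
+
+```python
+def estimate_params(num_layers, hidden, ffn_hidden, vocab=32000, untie=True):
+    """粗略估算decoder-only模型参数量(忽略bias与norm等小项)"""
+    attn = 4 * hidden * hidden                    # Q/K/V/输出投影
+    mlp = 3 * hidden * ffn_hidden                 # SwiGLU的三个投影矩阵
+    embed = vocab * hidden * (2 if untie else 1)  # 输入embedding(及独立的输出头)
+    return num_layers * (attn + mlp) + embed
+
+print(f"7B : {estimate_params(32, 4096, 11008) / 1e9:.2f}B")
+print(f"13B: {estimate_params(40, 5120, 13824) / 1e9:.2f}B")
+```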
+
+#### 数据集配置(dataset_config)
+
+```yaml
+dataset_config:
+ batch_size: 1 # 一次迭代从数据集中取出的数据大小
+ micro_batch_num: 2 # 微批次个数
+ dataset_dir: './dataset' # 数据集所在目录
+ shuffle: False # 是否打乱顺序
+```
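+
+`batch_size`与`micro_batch_num`共同决定全局批大小:`global_batch_size = batch_size * micro_batch_num * dp`。以下为推算示意(dp沿用前文并行配置的推导,数值为假设):
+
+```python
+batch_size = 1       # 每个micro batch的样本数
+micro_batch_num = 2  # 梯度累积的micro batch个数
+dp = 2               # 数据并行大小(假设值)
+
+global_batch_size = batch_size * micro_batch_num * dp
+print(f"global_batch_size = {global_batch_size}")  # 4
+```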
+
+#### 优化器配置(optimizer_config)
+
+```yaml
+optimizer_config:
+ optimizer_type: "AdamW" # 优化器类型,可选:"AdamW", "Adam", "SGD", "Came", "mint.AdamW"及"SpeedAdamW"
+ betas: # 优化器输入参数
+ - 0.9
+ - 0.95
+ eps: 1.e-8
+ learning_rate: 1.25e-6 # 初始学习率
+ weight_decay: 1.e-1 # 权重衰减系数
+ learning_rate_scheduler_kwargs: # 学习率调整策略
+ warmup_steps: 200
+ decay_steps: 2000
+ use_cosine: True
+ end_learning_rate: 1.25e-7
+```
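+
+上述学习率策略(线性warmup后余弦衰减到`end_learning_rate`)可以用如下草图直观理解(仅为示意实现,实际计算以框架内置的学习率调度器为准):
+
+```python
+import math
+
+def lr_at_step(step, lr=1.25e-6, end_lr=1.25e-7, warmup_steps=200, decay_steps=2000):
+    """warmup + 余弦衰减学习率示意"""
+    if step < warmup_steps:
+        return lr * step / warmup_steps                       # 线性warmup
+    progress = min((step - warmup_steps) / decay_steps, 1.0)  # 衰减进度,最大为1
+    return end_lr + 0.5 * (lr - end_lr) * (1 + math.cos(math.pi * progress))
+
+for s in (0, 100, 200, 1200, 2200, 5000):
+    print(s, f"{lr_at_step(s):.3e}")
+```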
+
+### 模型训练配置解析
+
+在pretrain_gpt.py里对传入的yaml配置文件进行解析,可以得到训练配置、模型配置、优化器配置、并行策略配置以及数据集配置。
+
+```python
+import argparse
+from mindformers.experimental.parallel_core.pynative.config import (
+ init_configs_from_yaml
+)
+
+def get_arg_parser():
+ """get argument parser"""
+ parser = argparse.ArgumentParser(description="Train gpt model")
+ parser.add_argument("--config_path", type=str, default="pretrain_gpt.yaml", help="The path to the config file.")
+ parser.add_argument("--run_cmd", type=str, default="", help="running cmd.")
+ parser.add_argument("--model_type", type=str, default="gpt_config", help="Input model config.")
+ return parser
+parser = get_arg_parser()
+args = parser.parse_args()
+
+all_config = init_configs_from_yaml(args.config_path)
+
+training_config = all_config.training_config
+model_config = all_config.model_config
+optimizer_config = all_config.optimizer_config
+parallel_config = all_config.parallel_config
+dataset_config = all_config.dataset_config
+```
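+
+解析完成后,各配置对象可按属性方式访问,字段与前文yaml中的配置项一一对应,例如:
+
+```python
+# 字段名以前文yaml示例为准
+print(training_config.training_iters)              # 10
+print(model_config.num_layers)                     # 6
+print(parallel_config.tensor_model_parallel_size)  # 1
+```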
+
+### 通信配置
+
+通过set_context接口可以指定运行模式、运行设备、运行卡号等。并行初始化通过`init`完成,它会根据不同的设备需求初始化HCCL、NCCL或者MCCL通信,随后由`initialize_model_parallel`按并行配置建立各通信组。指定平台:设置`device_target`为`Ascend`。调试阶段可以使用`set_context(pynative_synchronize=True)`开启同步模式,更准确地定位报错位置。
+
+```python
+import mindspore as ms
+from mindspore.communication import init
+
+# 以下两个导入路径为示意,具体以所安装的MindSpore Transformers版本为准
+from mindformers.experimental.parallel_core.pynative.parallel_state import (
+    initialize_model_parallel,
+    get_data_parallel_world_size,
+)
+from mindformers.tools.logger import logger
+
+
+def set_parallel_context(parallel_config):
+ init()
+ initialize_model_parallel(
+ tensor_model_parallel_size=parallel_config.tensor_model_parallel_size,
+ pipeline_model_parallel_size=parallel_config.pipeline_model_parallel_size,
+ virtual_pipeline_model_parallel_size=parallel_config.virtual_pipeline_model_parallel_size,
+ )
+ logger.info(
+ f"dp {get_data_parallel_world_size()} | "
+ f"pp {parallel_config.pipeline_model_parallel_size} | "
+ f"tp {parallel_config.tensor_model_parallel_size} | "
+ f"sp {parallel_config.sequence_parallel} | "
+ f"vpp {parallel_config.virtual_pipeline_model_parallel_size}"
+ )
+
+
+def set_seed(seed):
+ # set global seed, np seed, and dataset seed
+ ms.set_seed(seed)
+ # set rng seed
+ ms.manual_seed(seed)
+
+
+ms.set_context(mode=ms.PYNATIVE_MODE)
+ms.set_device(device_target="Ascend")
+set_parallel_context(parallel_config)
+set_seed(training_config.seed)
+```
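+
+调试阶段如需更准确的报错调用栈,可按下面的方式开启同步模式(定位完问题后建议关闭,以免影响性能):
+
+```python
+import mindspore as ms
+
+# pynative_synchronize=True时算子同步执行,报错堆栈可准确指向出错位置
+ms.set_context(mode=ms.PYNATIVE_MODE, pynative_synchronize=True)
+```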
+
+### 创建网络对象
+
+从模型库获取GPT模型,根据配置文件创建网络模型对象。`set_weight_decay`用于为不同参数设置不同的权重衰减系数:它将参数分为两组,一组应用指定的权重衰减值,另一组权重衰减为`0`,并返回包含参数分组信息的列表,赋值给`group_params`变量。随后调用`get_optimizer`函数,传入`optimizer_config`(优化器配置)、`training_config`(训练配置)、`group_params`(参数分组信息)、`network_with_loss`(包含模型和损失的对象)以及梯度归约操作(从`training_config.loss_reduction`获取),得到优化器对象并赋值给`optimizer`变量。
+最后创建一个`TrainOneStepCell`对象,用于在训练过程中执行单步训练,传入`network_with_loss`、`optimizer`及相关配置作为参数,并将其赋值给`train_one_step_cell`变量。
+
+完整的创建网络对象代码:
+
+```python
+from mindformers.experimental.parallel_core.pynative.optimizer import get_optimizer
+from mindformers.experimental.parallel_core.pynative.training import get_model
+from mindformers.experimental.parallel_core.pynative.training import TrainOneStepCell
+from mindformers.experimental.parallel_core.models import GPTModel
+
+
+def decay_filter(x):
+ return "norm" not in x.name.lower() and "bias" not in x.name.lower()
+
+
+def set_weight_decay(params, weight_decay=1e-1):
+ decay_params = list(filter(decay_filter, params))
+ other_params = list(filter(lambda x: not decay_filter(x), params))
+ group_params = []
+ if decay_params:
+ group_params.append({"params": decay_params, "weight_decay": weight_decay})
+ if other_params:
+ group_params.append({"params": other_params, "weight_decay": 0.0})
+ return group_params
+
+
+def model_provider_func(pre_process=True, post_process=True):
+ network_with_loss = GPTModel(
+ model_config, pre_process=pre_process, post_process=post_process
+ )
+ return network_with_loss
+
+network_with_loss = get_model(model_provider_func, training_config)
+
+group_params = set_weight_decay(network_with_loss.trainable_params(), optimizer_config.weight_decay)
+optimizer = get_optimizer(
+ optimizer_config,
+ training_config,
+ group_params,
+ network_with_loss,
+ grad_allreduce_op=training_config.loss_reduction
+)
+
+train_one_step_cell = TrainOneStepCell(network_with_loss, optimizer, None, training_config, model_config)
+```
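+
+构建完成后,可以快速检查参数分组是否符合预期(仅为示意):
+
+```python
+# 两个分组:应用权重衰减的参数与不衰减(norm、bias)的参数
+for group in group_params:
+    print(len(group["params"]), group["weight_decay"])
+```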
+
+### 加载数据集及执行训练
+
+```python
+from dataset import get_dataset
+from mindformers.experimental.parallel_core.pynative.training import train
+
+train_dataset_iterator, val_dataset_iterator = get_dataset(dataset_config)
+
+# metrics与evaluation分别为评估指标与评估逻辑,可按需自定义;此处为占位示意,不做验证评估时可传入None
+metrics = None
+evaluation = None
+
+train(
+ train_one_step_cell,
+ train_dataset_iterator,
+ training_config,
+ val_dataset_iterator,
+ metrics,
+ evaluation,
+)
+```
+
+### 运行训练脚本
+
+```bash
+bash pretrain_gpt.sh xx.yaml
+```
+
+若不指定`xx.yaml`,则默认使用`pretrain_gpt_7B.yaml`。
+
+训练脚本`pretrain_gpt.sh`详细解析如下:
+
+#### 设置环境变量
+
+`HCCL_BUFFSIZE=200`设置两个NPU之间共享数据的缓冲区大小为200MB;`HCCL_EXEC_TIMEOUT=600`设置设备间执行时同步的等待时间为600秒(10分钟)。`ASCEND_RT_VISIBLE_DEVICES`指定了可见的设备编号,这里设置为设备`0`号卡。
+
+```bash
+export HCCL_BUFFSIZE=200
+export HCCL_EXEC_TIMEOUT=600
+export ASCEND_RT_VISIBLE_DEVICES='0'
+```
+
+#### 设置端口号
+
+```bash
+port=8828
+```
+
+如果之前的训练进程异常退出导致端口被占用,可以使用如下代码清理占用该端口的进程。
+
+```bash
+PIDS=$(sudo lsof -i :$port | awk 'NR>1 {print $2}')
+if [ -n "$PIDS" ]; then
+ for pid in $PIDS; do
+ kill -9 $pid
+ echo "Killed process $pid"
+ done
+else
+ echo "No processes found listening on port $port."
+fi
+```
+
+#### 设置日志存储路径
+
+获取当前脚本所在的目录路径并存储在`project_dir`变量中,同时设置日志路径变量`log_path="msrun_log"`。先删除名为`msrun_log`的目录(如果存在),然后重新创建这个目录。
+
+```bash
+project_dir=$(cd "$(dirname "$0")" || exit; pwd)
+log_path="msrun_log"
+
+rm -rf "${log_path}"
+mkdir "${log_path}"
+```
+
+#### 设置可用设备数量
+
+```bash
+# 计算设备数量
+IFS=',' read -r -a devices <<< "$ASCEND_RT_VISIBLE_DEVICES"
+work_num=${#devices[@]}
+```
+
+#### 获取配置文件
+
+尝试从命令行参数中获取配置文件路径,如果没有提供命令行参数,则使用默认的配置文件 "pretrain_gpt_7B.yaml"。
+
+```bash
+config_path=$1
+if [ -z "$config_path" ]; then
+ config_path="pretrain_gpt_7B.yaml"
+fi
+```
+
+#### 以msrun模式执行训练脚本
+
+```bash
+msrun --worker_num "$work_num" --local_worker_num="$work_num" --master_port=$port --log_dir="$log_path" --join=True --cluster_time_out=300 pretrain_gpt.py --config_path="${config_path}"
+```
+
+#### 运行结果
+
+接下来通过命令调用对应的脚本。
+
+```bash
+bash pretrain_gpt.sh
+```
+
+执行完后,日志文件保存到`output`目录下,其中部分文件目录结构如下:
+
+```text
+└─ output
+ └─ log
+ ├─ rank_0
+ | ├─ info.log
+ | └─ error.log
+ ├─ rank_1
+ | ├─ info.log
+ | └─ error.log
+ ...
+```
+
+Loss等训练结果记录在`output/log/rank_*/info.log`中,示例如下:
+
+```text
+train: Epoch:0, Step:5, Loss: 10.341485, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1403.24 ms
+train: Epoch:0, Step:6, Loss: 10.38118, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1378.19 ms
+train: Epoch:0, Step:7, Loss: 10.165115, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1370.32 ms
+train: Epoch:0, Step:8, Loss: 10.039211, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1386.89 ms
+train: Epoch:0, Step:9, Loss: 10.040031, Finite_grads: True, Loss_scale: 4096.0, Learning_rate: (1.250000e-06,1.250000e-06,), Time: 1475.95 ms
+...
+```
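+
+如需快速查看loss收敛趋势,可以用一小段脚本从`info.log`中提取loss值(正则按上述日志格式编写,仅为示意):
+
+```python
+import re
+from glob import glob
+
+pattern = re.compile(r"Step:(\d+), Loss: ([\d.]+)")
+for path in sorted(glob("output/log/rank_*/info.log")):
+    with open(path, encoding="utf-8") as f:
+        losses = [(int(m.group(1)), float(m.group(2)))
+                  for m in (pattern.search(line) for line in f) if m]
+    print(path, losses[:5])
+```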
diff --git a/docs/mindformers/docs/source_zh_cn/conf.py b/docs/mindformers/docs/source_zh_cn/conf.py
index 07db1388c60b45566ee22fec403a5304143ad455..2750c952ea59c8acd59758a5807e74312634c025 100644
--- a/docs/mindformers/docs/source_zh_cn/conf.py
+++ b/docs/mindformers/docs/source_zh_cn/conf.py
@@ -227,8 +227,8 @@ if os.path.exists('./mindformers.experimental.rst'):
if os.path.exists('./experimental'):
shutil.rmtree('./experimental')
-if os.path.exists('./usage/pretrain_gpt.md'):
- os.remove('./usage/pretrain_gpt.md')
+if os.path.exists('advanced_development/pretrain_gpt.md'):
+ os.remove('advanced_development/pretrain_gpt.md')
with open('./index.rst', 'r+', encoding='utf-8') as f:
ind_content = f.read()
diff --git a/docs/mindformers/docs/source_zh_cn/faq/mindformers_contribution.md b/docs/mindformers/docs/source_zh_cn/contribution/mindformers_contribution.md
similarity index 98%
rename from docs/mindformers/docs/source_zh_cn/faq/mindformers_contribution.md
rename to docs/mindformers/docs/source_zh_cn/contribution/mindformers_contribution.md
index ec45c60534ed4d4c9060f8aabf12044739a7ccc1..471635e079156329a1bf90c1b0fb9d8ed5ab882d 100644
--- a/docs/mindformers/docs/source_zh_cn/faq/mindformers_contribution.md
+++ b/docs/mindformers/docs/source_zh_cn/contribution/mindformers_contribution.md
@@ -1,6 +1,6 @@
# MindSpore Transformers贡献指南
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/faq/mindformers_contribution.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/contribution/mindformers_contribution.md)
## 贡献代码至MindSpore Transformers
diff --git a/docs/mindformers/docs/source_zh_cn/faq/modelers_contribution.md b/docs/mindformers/docs/source_zh_cn/contribution/modelers_contribution.md
similarity index 98%
rename from docs/mindformers/docs/source_zh_cn/faq/modelers_contribution.md
rename to docs/mindformers/docs/source_zh_cn/contribution/modelers_contribution.md
index 52a65a934767db9c1f41336aea880a4c294b25c7..923b46ad679b72f8103e6b7ddd24dd0c19e2bf08 100644
--- a/docs/mindformers/docs/source_zh_cn/faq/modelers_contribution.md
+++ b/docs/mindformers/docs/source_zh_cn/contribution/modelers_contribution.md
@@ -1,6 +1,6 @@
# 魔乐社区贡献指南
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/faq/modelers_contribution.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/contribution/modelers_contribution.md)
## 上传模型至魔乐社区
diff --git a/docs/mindformers/docs/source_zh_cn/appendix/env_variables.md b/docs/mindformers/docs/source_zh_cn/env_variables.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/appendix/env_variables.md
rename to docs/mindformers/docs/source_zh_cn/env_variables.md
index b6c507dc8d200598e065308e10ee401db26b542a..f3ed1b7a4c0e283191f44ca0595050bfb29e3423 100644
--- a/docs/mindformers/docs/source_zh_cn/appendix/env_variables.md
+++ b/docs/mindformers/docs/source_zh_cn/env_variables.md
@@ -1,6 +1,6 @@
# 环境变量说明
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/appendix/env_variables.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/env_variables.md)
以下是 MindSpore Transformers 支持的环境变量。
@@ -14,7 +14,7 @@
| **ASCEND_LAUNCH_BLOCKING** | 0 | 训练或在线推理场景,可通过此环境变量控制算子执行时是否启动同步模式。 | `1`:强制算子采用同步模式运行; `0`:不强制算子采用同步模式运行。 | 由于 NPU 模型训练时默认算子异步执行,导致算子执行过程中出现报错时,打印的报错堆栈信息并不是实际的调用栈信息。当设置为`1`时,强制算子采用同步模式运行,这样能够打印正确的调用栈信息,从而更容易地调试和定位代码中的问题。设置为`0`时有更高的运算效率。 |
| **TE_PARALLEL_COMPILER** | 8 | 算子最大并行编译进程数,当大于 1 时开启并行编译。 | 取值为正整数;最大不超过 cpu 核数\*80%/昇腾 AI 处理器个数,取值范围 1~32,默认值是 8。 | 网络模型较大时,可通过配置此环境变量开启算子的并行编译功能; 设置为`1`时为单线程编译,在调试时,可以简化难度。 |
| **CPU_AFFINITY** | 0 | 启动 CPU 亲和性开关,启动该选项可以确保每个进程或线程绑定到一个 CPU 核心上,以提高性能。 | `1`:开启 CPU 亲和性开关; `0`:关闭 CPU 亲和性开关。 | 出于**优化资源利用** 以及**节能** 的考虑,CPU 亲和性默认关闭。 |
-| **MS_MEMORY_STATISTIC** | 0 | 内存统计。 | `1`:开启内存统计功能; `0`:关闭内存统计功能。 | 在内存分析时,可以统计内存的基本使用情况。具体可以参考[调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/perf_optimize/perf_optimize.html)。 |
+| **MS_MEMORY_STATISTIC** | 0 | 内存统计。 | `1`:开启内存统计功能; `0`:关闭内存统计功能。 | 在内存分析时,可以统计内存的基本使用情况。具体可以参考[调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。 |
| **MINDSPORE_DUMP_CONFIG** | | 指定 [云侧 Dump 功能](https://www.mindspore.cn/tutorials/zh-CN/master/debug/dump.html) 或 [端侧 Dump 功能](https://www.mindspore.cn/lite/docs/zh-CN/master/tools/benchmark_tool.html#dump功能) 所依赖的配置文件的路径 | 文件路径,支持相对路径与绝对路径。 | |
| **GLOG_v** | 3 | 控制 MindSpore 日志的级别。 | `0`:DEBUG; `1`:INFO; `2`:WARNING; `3`:ERROR:表示程序执行出现报错,输出错误日志,程序可能不会终止; `4`:CRITICAL,表示程序执行出现异常,将会终止执行程序。 | |
| **ASCEND_GLOBAL_LOG_LEVEL** | 3 | 控制 CANN 的日志级别。 | `0`:DEBUG; `1`:INFO; `2`:WARNING; `3`:ERROR; `4`:NULL,不输出日志。 | |
@@ -40,4 +40,4 @@
| **MS_ENABLE_FA_FLATTEN** | on | 控制 是否支持 FlashAttention flatten 优化。 | `on`:启用 FlashAttention flatten 优化; `off`: 禁用 FlashAttention flatten 优化。 | 对于还未适配FlashAttention flatten 优化的模型提供回退机制。 |
| **EXPERIMENTAL_KERNEL_LAUNCH_GROUP** | NA | 控制是否支持算子批量并行下发,支持开启并行下发,并配置并行数 | `thread_num`: 并发线程数,一般不建议增加,默认值为`2`; `kernel_group_num`: 算子分组总数量,每线程`kernel_group_num/thread_num`个组,默认值为`8`。 | 该特性后续还会继续演进,后续行为可能会有变更,当前仅支持`deepseek`推理场景,有一定的性能优化,但是其他模型使用该特性可能会有劣化,用户需要谨慎使用,使用方法如下:`export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8"`。 |
| **FORCE_EAGER** | False | 控制是否**不开启**jit模式。 | `False`: 开启jit模式; `True`: 不开启jit模式。 | Jit将函数编译成一张可调用的MindSpore图,设置FORCE_EAGER为False开启jit模式,可以获取性能收益,当前仅支持推理模式。 |
-| **MS_ENABLE_TFT** | NA | 使能 [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) 特性,表示启用 TTP、UCE 或 ARF 功能。 | 取值为"{TTP:1,UCE:1,ARF:1}",使用某一功能时,可将对应字段配置为"1"。 | 使用方式可以参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/high_availability.html)。 |
\ No newline at end of file
+| **MS_ENABLE_TFT** | NA | 使能 [MindIO TFT](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft001.html) 特性,表示启用 TTP、UCE、ARF 或 TRE 功能。 | 取值为"{TTP:1,UCE:1,ARF:1,TRE:1}",使用某一功能时,可将对应字段配置为"1"。 | 使用方式可以参考[高可用特性](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/high_availability.html)。 |
diff --git a/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md b/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md
index f2f6ff3a48110ded8a774d27b841375f70be5dc5..d2df88f00428c4e35e7a0710a6b4e538706b49f1 100644
--- a/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md
+++ b/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md
@@ -1,5 +1,7 @@
# 使用DeepSeek-R1进行模型蒸馏的实践案例
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/example/distilled/distilled.md)
+
本案例参考OpenR1-Qwen-7B,旨在指导用户基于MindSpore框架和MindSpore Transformers大模型套件,使用DeepSeek-R1对Qwen2.5-Math-7B模型进行知识蒸馏和微调,以提升其在数学推理任务上的性能。案例涵盖了从环境配置、数据生成、预处理到模型微调和推理测试的完整流程。通过以下步骤,您可以了解如何利用DeepSeek-R1生成推理数据、过滤错误数据、处理数据集,并最终对模型进行微调以解决复杂的数学问题。
蒸馏流程:
@@ -12,7 +14,7 @@
### 1.1 环境
-安装方式请参考[MindSpore Transformers安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/quick_start/install.html)。
+安装方式请参考[MindSpore Transformers安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html)。
并将本案例的[distilled](https://gitee.com/mindspore/docs/tree/master/docs/mindformers/docs/source_zh_cn/example/distilled/distilled)文件夹,复制到MindSpore Transformers源码根目录下。
@@ -43,7 +45,7 @@ mindformers
- **使用OpenR1-Math-220K数据集**:
- **选项1: 使用原始数据离线处理**:适合需要自定义数据处理或学习处理流程的用户。包括预处理和Packing。请从[选项1: 使用原始数据离线处理](#选项-1-使用原始数据离线处理)开始。
- - **选项2: 使用已处理好的数据**:适合希望快速开始训练的用户。案例提供预处理好的OpenR1-Math-220K数据集。请从[选项2: 使用已处理好的数据](#选项-2-使用已经处理好的数据)开始。
+ - **选项2: 使用已处理好的数据**:适合希望快速开始训练的用户。案例提供预处理好的OpenR1-Math-220K数据集。请从[选项2: 使用已处理好的数据](#选项-2-使用完成转换的数据)开始。
#### 1.3.1 从零开始生成数据集
@@ -51,7 +53,7 @@ mindformers
> 生成数据集流程仅作为示例,如需生成高质量数据集,建议参考[OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)的数据集生成流程。
-- 安装依赖
+1. 安装依赖
执行以下命令安装所需依赖:
@@ -59,11 +61,11 @@ mindformers
pip install datasets tqdm aiofiles aiohttp uvloop math_verify
```
-- 本地部署Deepseek-R1
+2. 本地部署Deepseek-R1
参考[MindSpore-Lab/DeepSeek-R1 | 魔乐社区](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1)在本地部署DeepSeek-R1推理服务,或是使用公开的API服务。
-- 生成数据
+3. 生成数据
**目标**:利用DeepSeek-R1模型为数学问题生成Chain-of-Thought(CoT)推理数据,用于后续的数据蒸馏。
@@ -88,22 +90,20 @@ mindformers
--max-concurrent 100
```
-**参数说明:**
-
-- **作用**:调用DeepSeek-R1推理服务,基于“AI-MO/NuminaMath-1.5”数据集中的数学问题(`problem`列)生成推理路径。
-- **关键参数**:
+ - **作用**:调用DeepSeek-R1推理服务,基于[AI-MO/NuminaMath-1.5](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5)数据集中的数学问题(`problem`列)生成推理路径。
+ - **参数说明**:
- - **--model**: 推理服务的模型名,需要和服务化配置文件 `config.json` 中的 `modelName` 一致。
- - **--dataset-name**:种子数据集名称,配置为HuggingFace Datasets名称或本地的数据集路径。
- - **--output-file**:输出CoT数据文件的文件名。
- - **--prompt-column**:种子数据集中提示词的列名,使用此列的数据进行CoT数据生成。
- - **--uuid-column**:种子数据集中uuid的列名,使用此列计算哈希值去重数据。
- - **--api-addr**:推理服务api的地址,配置为 `ip:port` 。
- - **--num-generations**:对于种子数据集中每个问题生成CoT数据的数量。
- - **--max-tokens**:生成的CoT数据的最大Token数。
- - **--max-concurrent**:请求的最大并发数量。
+ - **`--model`**: 推理服务的模型名,需要和服务化配置文件 `config.json` 中的 `modelName` 一致。
+ - **`--dataset-name`**:种子数据集名称,配置为HuggingFace Datasets名称或本地的数据集路径。
+ - **`--output-file`**:输出CoT数据文件的文件名。
+ - **`--prompt-column`**:种子数据集中提示词的列名,使用此列的数据进行CoT数据生成。
+ - **`--uuid-column`**:种子数据集中uuid的列名,使用此列计算哈希值去重数据。
+ - **`--api-addr`**:推理服务api的地址,配置为 `ip:port` 。
+ - **`--num-generations`**:对于种子数据集中每个问题生成CoT数据的数量。
+ - **`--max-tokens`**:生成的CoT数据的最大Token数。
+ - **`--max-concurrent`**:请求的最大并发数量。
-- 拒绝采样
+4. 拒绝采样
**目标**:过滤掉推理数据中的错误或不准确的CoT数据,确保数据质量。
@@ -113,15 +113,13 @@ mindformers
--dst /path/to/numinamath_r1_generations_filtered.jsonl
```
-**参数说明:**
-
-- **作用**:使用`math_verify`库验证`numinamath_r1_generations.jsonl`中的推理路径,剔除错误的CoT数据。
-- **关键参数**:
+ - **作用**:使用`math_verify`库验证`numinamath_r1_generations.jsonl`中的推理路径,剔除错误的CoT数据。
+ - **参数说明**:
- - **--src**:输入的CoT数据文件路径。
- - **--dst**:输出的过滤后的CoT数据文件路径。
+ - **`--src`**:输入的CoT数据文件路径。
+ - **`--dst`**:输出的过滤后的CoT数据文件路径。
-- 数据集预处理
+5. 数据集预处理
跳转到[选项-1-使用原始数据离线处理](#选项-1-使用原始数据离线处理)的中的**步骤一**,并将生成的CoT数据转换为MindSpore Transformers支持的格式。
@@ -142,7 +140,7 @@ mindformers
**适用场景**:适合希望使用高质量预蒸馏数据集进行微调的用户。
-如果使用OpenR1-Math-220K数据集(已经过DeepSeek-R1蒸馏)进行微调,我们提供[详细制作流程](#选项-1-使用原始数据离线处理)以及[转换后的数据集](#选项-2使用完成转换的数据)。
+如果使用OpenR1-Math-220K数据集(已经过DeepSeek-R1蒸馏)进行微调,我们提供[详细制作流程](#选项-1-使用原始数据离线处理)以及[转换后的数据集](#选项-2-使用完成转换的数据)。
##### 选项 1: 使用原始数据离线处理
@@ -191,14 +189,12 @@ mindformers
--register_path distilled/
```
-**参数说明:**
+ - **作用**:将原始数据集转换为MindSpore Transformers支持的格式。
+ - **参数说明**:
-- **作用**:将原始数据集转换为MindSpore Transformers支持的格式。
-- **关键参数**:
-
- - **--config**:数据预处理的配置文件路径。
- - **--save_path**:转换后数据集的保存文件夹路径。
- - **--register_path**:注册路径,为当前目录下的`distilled/`文件夹。
+ - **`--config`**:数据预处理的配置文件路径。
+ - **`--save_path`**:转换后数据集的保存文件夹路径。
+ - **`--register_path`**:注册路径,为当前目录下的`distilled/`文件夹。
步骤二、**数据集Packing**
@@ -222,20 +218,18 @@ python toolkit/data_preprocess/huggingface/datasets_preprocess.py \
--register_path distilled
```
-**参数说明:**
-
- **作用**:将处理好的数据集进行packing,减少微调时的数据加载时间。
-- **关键参数**:
+- **参数说明**:
- - **--config**:数据集packing的配置文件路径。
- - **--save_path**:packing后数据集的保存路径
- - **--register_path**:注册数据集的路径。
+ - **`--config`**:数据集packing的配置文件路径。
+ - **`--save_path`**:packing后数据集的保存路径
+ - **`--register_path`**:注册数据集的路径。
最后在`packed_data`中可以找到处理后的数据集,格式为arrow。
-更多数据集处理的教程请参考[MindSpore Transformers官方文档-数据集](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/dataset.html#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AEhandler)。
+更多数据集处理的教程请参考[MindSpore Transformers官方文档-数据集](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AEhandler)。
-##### 选项 2:使用完成转换的数据
+##### 选项 2: 使用完成转换的数据
我们在[魔乐社区](https://modelers.cn/models/MindSpore-Lab/OpenR1-Qwen-7B/tree/main/dataset/packing)提供packing处理后可以直接用于模型训练的数据,格式为arrow。此时[#1.4 YAML配置](#14-yaml配置)中的`path`需要修改为下载后的数据集路径。
@@ -283,7 +277,7 @@ train_dataset: &train_dataset
......
```
-其余参数配置的解释可以参考[MindSpore Transformers官方文档-SFT微调](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/sft_tuning.html)。
+其余参数配置的解释可以参考[MindSpore Transformers官方文档-SFT微调](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/supervised_fine_tuning.html)。
## 2. 启动微调
@@ -303,7 +297,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py --config distilled/finetune_qw
日志记录在`output/msrun_log`目录下,例如可以通过`tail -f output/msrun_log/worker_7.log`指令查看worker 7的日志信息。
微调完成后,输出的`safetensors`权重文件在`output/checkpoint`目录下。
-更多safetensors权重的内容请参考[MindSpore Transformers官方文档-Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/safetensors.html)。
+更多safetensors权重的内容请参考[MindSpore Transformers官方文档-Safetensors权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html)。
## 3. 执行推理
@@ -325,4 +319,4 @@ bash scripts/msrun_launcher.sh "run_mindformer.py --config distilled/finetune_qw
| OpenR1-Qwen-7B (MindSpore Transformers) | 90.0 |
| OpenThinker-7B | 89.6 |
-> 注:上表第三行为本案例实验结果,该结果由本地实测得到。
+> 上表第三行为本案例实验结果,该结果由本地实测得到。
diff --git a/docs/mindformers/docs/source_zh_cn/faq/func_related.md b/docs/mindformers/docs/source_zh_cn/faq/feature_related.md
similarity index 85%
rename from docs/mindformers/docs/source_zh_cn/faq/func_related.md
rename to docs/mindformers/docs/source_zh_cn/faq/feature_related.md
index 556d17948d0fd8b5dc823fcf0af0ee267bc29c09..01ce3ec5915b6905e0ac3da540ac646ef981edbf 100644
--- a/docs/mindformers/docs/source_zh_cn/faq/func_related.md
+++ b/docs/mindformers/docs/source_zh_cn/faq/feature_related.md
@@ -1,6 +1,6 @@
-# 功能相关
+# 功能相关 FAQ
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/faq/func_related.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/faq/feature_related.md)
## Q: WikiText数据集下载链接失效。
@@ -10,7 +10,7 @@ A: 官方下载链接失效,请关注社区Issue [#IBV35D](https://gitee.com/m
## Q: 如何生成模型切分策略文件?
-A: 模型切分策略文件记录了模型权重在分布式场景下的切分策略,一般在离线权重切分时使用。在网络`yaml`文件中配置`only_save_strategy: True`,然后正常启动分布式任务,便可在`output/strategy/`目录下生成分布式策略文件,详细介绍请参阅[分布式权重切分与合并教程](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html#%E7%A6%BB%E7%BA%BF%E8%BD%AC%E6%8D%A2%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E)。
+A: 模型切分策略文件记录了模型权重在分布式场景下的切分策略,一般在离线权重切分时使用。在网络`yaml`文件中配置`only_save_strategy: True`,然后正常启动分布式任务,便可在`output/strategy/`目录下生成分布式策略文件,详细介绍请参阅[分布式权重切分与合并教程](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html#%E7%A6%BB%E7%BA%BF%E8%BD%AC%E6%8D%A2%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E)。
diff --git a/docs/mindformers/docs/source_zh_cn/faq/model_related.md b/docs/mindformers/docs/source_zh_cn/faq/model_related.md
index e0e376cfef5612d0f08a70994f67a5a8ac94425a..d4ce7e1361cd823e4da45ac81a8a6236187a150c 100644
--- a/docs/mindformers/docs/source_zh_cn/faq/model_related.md
+++ b/docs/mindformers/docs/source_zh_cn/faq/model_related.md
@@ -1,4 +1,4 @@
-# 模型相关
+# 模型相关 FAQ
[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/faq/model_related.md)
diff --git a/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
rename to docs/mindformers/docs/source_zh_cn/feature/configuration.md
index 96dcd64e21d364544666419aee87967de8c91596..b027cb38b36bb2952a529cf59f0ae923f03e9670 100644
--- a/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
@@ -1,6 +1,6 @@
# 配置文件说明
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/appendix/conf_files.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/configuration.md)
## 概述
@@ -19,9 +19,9 @@ MindSpore Transformers提供的`YAML`文件中包含对于不同功能的配置
| seed | 设置全局种子,详情可参考[mindspore.set_seed](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.set_seed.html)。 | int |
| run_mode | 设置模型的运行模式,可选`train`、`finetune`、`eval`或`predict`。 | str |
| output_dir | 设置保存log、checkpoint、strategy等文件的路径。 | str |
-| load_checkpoint | 加载权重的文件或文件夹路径,目前有3个应用场景: 1. 支持传入完整权重文件路径。 2. 支持传入离线切分后的权重文件夹路径。 3. 支持传入包含lora权重和base权重的文件夹路径。 各种权重的获取途径可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)。 | str |
-| auto_trans_ckpt | 是否开启分布式权重自动切分与合并功能,详情可参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html)。 | bool |
-| resume_training | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool |
+| load_checkpoint | 加载权重的文件或文件夹路径,目前有3个应用场景: 1. 支持传入完整权重文件路径。 2. 支持传入离线切分后的权重文件夹路径。 3. 支持传入包含lora权重和base权重的文件夹路径。 各种权重的获取途径可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)。 | str |
+| auto_trans_ckpt | 是否开启分布式权重自动切分与合并功能,详情可参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html)。 | bool |
+| resume_training | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool |
| load_ckpt_format | 加载的模型权重的格式,可选`ckpt`、`safetensors`。 | str |
| remove_redundancy | 加载的模型权重是否去除了冗余。默认值为`False`。 | bool |
| train_precision_sync | 训练确定性计算开关。默认值为`None` 。 | Optional[bool] |
@@ -140,7 +140,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/
### 并行配置
-为了提升模型的性能,在大规模集群的使用场景中通常需要为模型配置并行策略,详情可参考[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/distributed_parallel.html),MindSpore Transformers中的并行配置如下。
+为了提升模型的性能,在大规模集群的使用场景中通常需要为模型配置并行策略,详情可参考[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html),MindSpore Transformers中的并行配置如下。
| 参数 | 说明 | 类型 |
|-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
@@ -174,7 +174,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/
### 模型优化配置
-1. MindSpore Transformers提供重计算相关配置,以降低模型在训练时的内存占用,详情可参考[重计算](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/perf_optimize/perf_optimize.html#重计算)。
+1. MindSpore Transformers提供重计算相关配置,以降低模型在训练时的内存占用,详情可参考[重计算](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html#重计算)。
| 参数 | 说明 | 类型 |
|----------------------------------------------------|-------------------------------|-----------------|
@@ -186,7 +186,7 @@ Context配置主要用于指定[mindspore.set_context](https://www.mindspore.cn/
| recompute_config.select_recompute_exclude | 关闭指定算子的重计算,只对Primitive算子有效。 | bool/list |
| recompute_config.select_comm_recompute_exclude | 关闭指定算子的通讯重计算,只对Primitive算子有效。 | bool/list |
-2. MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/fine_grained_activations_swap.html)。
+2. MindSpore Transformers提供细粒度激活值SWAP相关配置,以降低模型在训练时的内存占用,详情可参考[细粒度激活值SWAP](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/memory_optimization.html#%E7%BB%86%E7%B2%92%E5%BA%A6%E6%BF%80%E6%B4%BB%E5%80%BCswap)。
| 参数 | 说明 | 类型 |
|------|-----|-----|
@@ -280,7 +280,7 @@ MindSpore Transformers提供模型评估功能,同时支持模型边训练边
### Profile配置
-MindSpore Transformers提供Profile作为模型性能调优的主要工具,详情可参考[性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/perf_optimize/perf_optimize.html)。以下是Profile相关配置。
+MindSpore Transformers提供Profile作为模型性能调优的主要工具,详情可参考[性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html)。以下是Profile相关配置。
| 参数 | 说明 | 类型 |
|-----------------------|-------------------------------------------------------------------------------------------------------------------------------------------|------|
@@ -300,7 +300,7 @@ MindSpore Transformers提供Profile作为模型性能调优的主要工具,详
### 指标监控配置
-指标监控配置主要用于配置训练过程中各指标的记录方式,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/monitor.html)。以下是MindSpore Transformers中通用的指标监控配置项说明:
+指标监控配置主要用于配置训练过程中各指标的记录方式,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/monitor.html)。以下是MindSpore Transformers中通用的指标监控配置项说明:
| 参数名称 | 说明 | 类型 |
|-----------------------------------------|----------------------------------------------------------------------------------------------------------------------------|---------------|
@@ -320,7 +320,7 @@ MindSpore Transformers提供Profile作为模型性能调优的主要工具,详
### TensorBoard配置
-TensorBoard配置主要用于配置训练过程中与TensorBoard相关的参数,便于在训练过程中实时查看和监控训练信息,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/monitor.html)。以下是MindSpore Transformers中通用的TensorBoard配置项说明:
+TensorBoard配置主要用于配置训练过程中与TensorBoard相关的参数,便于在训练过程中实时查看和监控训练信息,详情可参考[训练指标监控](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/monitor.html)。以下是MindSpore Transformers中通用的TensorBoard配置项说明:
| 参数名称 | 说明 | 类型 |
|-------------------------------------------|---------------------------------------------------------|------|
diff --git a/docs/mindformers/docs/source_zh_cn/function/dataset.md b/docs/mindformers/docs/source_zh_cn/feature/dataset.md
similarity index 98%
rename from docs/mindformers/docs/source_zh_cn/function/dataset.md
rename to docs/mindformers/docs/source_zh_cn/feature/dataset.md
index 1b81eebd99018196cc19595fa768b5989f33c7f3..bbce4057ccea16cd07934e30406b6367b8a16153 100644
--- a/docs/mindformers/docs/source_zh_cn/function/dataset.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/dataset.md
@@ -1,6 +1,6 @@
# 数据集
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/dataset.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/dataset.md)
MindSpore Transformers目前支持多种类型的数据集加载方式,涵盖常用开源与自定义场景。具体包括:
@@ -179,7 +179,7 @@ MindSpore Transformers推荐用户使用Megatron数据集进行模型预训练
| eod | 数据集中eod的token id |
| pad | 数据集中pad的token id |
- 此外,Megatron数据集还依赖`input_columns`、`construct_args_key`、`full_batch`等配置,具体可参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html),这里仅说明在不同场景如何配置:
+ 此外,Megatron数据集还依赖`input_columns`、`construct_args_key`、`full_batch`等配置,具体可参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html),这里仅说明在不同场景如何配置:
- 当`create_compressed_eod_mask=True`时:
@@ -269,7 +269,7 @@ HuggingFace数据集可实现HuggingFace社区以及魔乐开源社区中的数
#### 数据集加载流程
-
+
在线数据集加载与处理功能主要通过`CommonDataLoader`实现,其中数据加载部分可通过配置文件进行自定义配置,具体配置内容可参考[dataloader参数说明](#dataloader参数说明),在线加载模块需要用户针对不同数据集进行自定义实现,如通过`AlpacaInstructDataHandler`类可实现对`alpaca`数据集进行预处理,具体实现过程可参考[自定义数据handler](#自定义数据handler)。
@@ -380,7 +380,7 @@ train_dataset: &train_dataset
prefetch_size: 1
```
- 1. `train_dataset`中参数说明可参考[文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html);
+ 1. `train_dataset`中参数说明可参考[文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html);
2. `AlpacaInstructDataHandler`是针对`alpaca`数据集开发的在线处理脚本,如果使用其他数据集,用户需要参考[自定义数据handler](#自定义数据handler)完成自定义数据处理的功能实现。
@@ -496,7 +496,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64"
prefetch_size: 1
```
- 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 的 “模型训练配置” 和 “模型评估配置”。
+ 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 的 “模型训练配置” 和 “模型评估配置”。
自定义数据 handler:
@@ -611,7 +611,7 @@ export MS_DEV_RUNTIME_CONF="aclnn_cache_queue_length:64"
seed: 0
```
- 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 的 “模型训练配置” 和 “模型评估配置”。
+ 其余参数介绍可以参考 [配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 的 “模型训练配置” 和 “模型评估配置”。
自定义 adgen_handler:
diff --git a/docs/mindformers/docs/source_zh_cn/usage/evaluation.md b/docs/mindformers/docs/source_zh_cn/feature/evaluation.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/usage/evaluation.md
rename to docs/mindformers/docs/source_zh_cn/feature/evaluation.md
index 84e8edd5ec801550e3af17439dbda49d2a831ef7..445ce54fdb402aa87fb023bd9a179c27f7bb1d41 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/evaluation.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/evaluation.md
@@ -1,6 +1,6 @@
# 评测
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/evaluation.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/evaluation.md)
## Harness评测
@@ -43,7 +43,7 @@ pip install -e .
#### 评测前准备
1. 创建一个新目录,例如名称为`model_dir`,用于存储模型yaml文件。
- 2. 在上个步骤创建的目录中,放置模型推理yaml配置文件(predict_xxx_.yaml)。不同模型的推理yaml配置文件所在目录位置,请参考[模型库](../start/models.md)。
+ 2. 在上个步骤创建的目录中,放置模型推理yaml配置文件(predict_xxx_.yaml)。不同模型的推理yaml配置文件所在目录位置,请参考[模型库](../introduction/models.md)。
3. 配置yaml文件。如果yaml中模型类、模型Config类、模型Tokenzier类使用了外挂代码,即代码文件在[research](https://gitee.com/mindspore/mindformers/tree/dev/research)目录或其他外部目录下,需要修改yaml文件:在相应类的`type`字段下,添加`auto_register`字段,格式为“module.class”(其中“module”为类所在脚本的文件名,“class”为类名。如果已存在,则不需要修改)。
以[predict_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml)配置为例,对其中的部分配置项进行如下修改:
@@ -58,7 +58,7 @@ pip install -e .
auto_register: llama3_tokenizer.Llama3Tokenizer
```
- 关于每个配置项的详细说明请参考[配置文件说明](../appendix/conf_files.md)。
+ 关于每个配置项的详细说明请参考[配置文件说明](../feature/configuration.md)。
4. 如果使用`ceval-valid`、`mmlu`、`cmmlu`、`race`、`lambada`数据集进行评测,需要将`use_flash_attention`设置为`False`,以`predict_llama3_1_8b.yaml`为例,修改yaml如下:
```yaml
@@ -211,7 +211,7 @@ Harness评测支持单机单卡、单机多卡、多机多卡场景,每种场
在下载好的代码中,找到requirements.txt(VLMEvalKit/requirements.txt)文件,修改成如下内容:
- ```txt
+ ```text
gradio==4.40.0
huggingface_hub==0.24.2
imageio==2.35.1
@@ -362,7 +362,7 @@ OpenEuler系统按照如下步骤安装:
#### 评测前准备
1. 创建一个新目录,例如名称为`model_dir`,用于存储模型yaml文件;
-2. 在上个步骤创建的目录中放置模型推理yaml配置文件(predict_xxx_.yaml),不同模型的推理yaml配置文件的目录位置参考[模型库](../start/models.md)各模型说明文档中的模型文件树;
+2. 在上个步骤创建的目录中放置模型推理yaml配置文件(predict_xxx_.yaml),不同模型的推理yaml配置文件的目录位置参考[模型库](../introduction/models.md)各模型说明文档中的模型文件树;
3. 配置yaml配置文件。
以[predict_cogvlm2_image_llama3_chat_19b.yaml](https://gitee.com/mindspore/mindformers/blob/dev/configs/cogvlm2/predict_cogvlm2_image_llama3_chat_19b.yaml)配置为例:
@@ -378,7 +378,7 @@ OpenEuler系统按照如下步骤安装:
vocab_file: "/{path}/tokenizer.model" # 指定tokenizer文件路径
```
- 配置yaml文件,参考[配置文件说明](../appendix/conf_files.md)。
+ 配置yaml文件,参考[配置文件说明](../feature/configuration.md)。
4. MMbench-Video数据集评测需要使用GPT-4 Turbo模型进行评测打分,请提前准备好相应的API Key,并放在VLMEvalKit/.env文件中,内容如下所示:
```text
@@ -475,7 +475,7 @@ source toolkit/benchmarks/run_vlmevalkit.sh \
下载[Video-Bench中的答案数据](https://huggingface.co/spaces/LanguageBind/Video-Bench/resolve/main/file/ANSWER.json)。
-> 注:Video-Bench中的文本数据按照“egs/VideoBench/Eval_QA”(目录至少两层,且最后一层是`Eval_QA`)的路径格式进行存储;Video-Bench中的视频数据按照“egs/VideoBench/Eval_video”(目录至少两层,且最后一层是`Eval_video`)的路径格式进行存储。
+> Video-Bench中的文本数据按照“egs/VideoBench/Eval_QA”(目录至少两层,且最后一层是`Eval_QA`)的路径格式进行存储;Video-Bench中的视频数据按照“egs/VideoBench/Eval_video”(目录至少两层,且最后一层是`Eval_video`)的路径格式进行存储。
### 评测
diff --git a/docs/mindformers/docs/source_zh_cn/function/high_availability.md b/docs/mindformers/docs/source_zh_cn/feature/high_availability.md
similarity index 76%
rename from docs/mindformers/docs/source_zh_cn/function/high_availability.md
rename to docs/mindformers/docs/source_zh_cn/feature/high_availability.md
index bf6f64b70a39ad9f04fa8fdff4d53662e275b61b..e8acb538c89e773bd3c61ed39a9741454e0ca922 100644
--- a/docs/mindformers/docs/source_zh_cn/function/high_availability.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/high_availability.md
@@ -1,52 +1,59 @@
# 高可用特性
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/high_availability.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/high_availability.md)
## 概述
-MindSpore Transformers 高可用特性提供了如下三个功能:
+MindSpore Transformers 高可用特性提供了如下四个功能:
- **临终 CKPT 功能**:主要针对大模型训练过程中的故障恢复加速,该特性在训练过程中发生故障后,校验中间状态数据的完整性和一致性,生成一次临终 CheckPoint 数据,恢复训练时能够通过该 CheckPoint 数据恢复,减少故障造成的训练迭代损失。
- **UCE 故障容错恢复功能**:主要是针对大模型训练过程中片上内存的 UCE 故障检测,并完成在线修复,达到 Step 级重计算。
-- **进程级重调度恢复功能**:训练发生异常后,不需要重新拉起整个集群,只需以节点为单位进行重启或替换,完成修复并继续训练。
+- **TRE 训练结果异常恢复功能**:主要是针对大模型训练过程中出现的 loss 或 global norm 等值异常进行检测,并完成在线修复,达到 Step 级重计算。
+- **ARF 进程级重调度恢复功能**:训练发生异常后,不需要重新拉起整个集群,只需以节点为单位进行重启或替换,完成修复并继续训练。
-高可用特性目前只支持 MindSpore Ascend 后端的图模式;该特性同时需要支持Step级别恢复,因此配置数据下沉时只支持sink_size 为 1。
+这几个高可用特性的**约束**和**依赖**如下:
-高可用特性的基础是两张卡存在副本关系,这样当其中一张卡发生故障时,可从另外一张卡恢复,因此权重和优化器都会存在两份冗余,会占用更多的显存。为保证这种冗余关系,必须开启数据并行,保证有两张卡权重一致,同时如果开启了优化器并行,也必须确保存在两张卡的优化器状态一致。
+| | 临终 CKPT | UCE | ARF | TRE |
+| - | - | - | - | - |
+| 依赖MindIO组件 | Yes | Yes | Yes | No |
+| 卡间存在副本关系 | Yes | Yes | Yes | No |
+| Sink Size 为 1 | Yes | Yes | Yes | No |
-三个功能可同时开启,也可以单独开启。组合开启这三个功能时,依次生效的顺序是:UCE故障容错恢复 -> 进程级重调度恢复 -> 临终 CKPT ,如果其中一个功能可以恢复,就不会执行下一个功能。临终 CKPT 功能作为最后的保障,完成该功能后整个训练进程会退出,所以在另外两个功能开启时会默认开启。
+目前这四个高可用特性只支持Ascend后端上图模式的Step级别恢复。
-临终 CKPT 保存 Checkpoint 文件以及通过该文件进行续训均使用现有 MindSpore Transformers 的能力,在使用方式上一致,只是临终 CKPT 依赖于strategy文件,因此在训练和续训时均需要配置该文件夹。
+卡间存在副本关系的目的是,当其中一张卡发生故障时,可从另外一张卡恢复,这要求权重和优化器状态都存在至少两份冗余。为保证这种冗余关系,必须开启数据并行,保证有两张卡权重一致;同时如果开启了优化器并行,也必须确保存在两张卡的优化器状态一致。
-当异常触发临终的 CheckPoint 保存时,如果未开启去冗余保存,每个数据并行域只有一张卡保存了 CheckPoint,其余卡不会保存 CheckPoint;所以在恢复训练时,同样需要使能高可用特性才能恢复,否则其他卡无法找到可用的 CheckPoint,会报错退出。用户可通过计算分布式保存的 CheckPoint 数量是否为小于集群数量,来判断该 CheckPoint 是否由临终 CKPT 功能触发。
+组合开启临终 CKPT、UCE 和 ARF 这三个功能时,依次生效的顺序是:UCE -> ARF -> 临终 CKPT,如果其中一个功能可以恢复,就不会执行下一个功能。临终 CKPT 功能作为最后的保障,完成该功能后整个训练进程会退出,所以在 UCE 或 ARF 功能开启时,会默认开启临终 CKPT。
## 使用说明
-高可用特性开关由环境变量使能,YAML 配置文件中不单独设置开关,但 YAML 文件需要能配置出两张卡的权重和优化器状态一致,详见本文档中的[副本关系配置](#副本关系配置)章节。
+高可用特性开关由环境变量使能,YAML 配置文件中不单独设置开关。但对于要求卡间存在副本关系的高可用特性,YAML 文件需要能配置出两张卡的权重和优化器状态一致,详见本文档中的[副本关系配置](#副本关系配置)章节。
-高可用特性依赖用户安装 MindIO TFT SDK 包,详细请参考[在计算节点安装 MindIO TFT SDK](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft011.html)。
+依赖MindIO组件的高可用特性需用户安装 MindIO TFT SDK 包,详细请参考[在计算节点安装 MindIO TFT SDK](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/ref/mindiottp/mindiotft011.html)。
### 环境变量配置
```shell
export MINDIO_FOR_MINDSPORE=1
-export MS_ENABLE_TFT="{TTP:1,UCE:1,ARF:1}"
+export MS_ENABLE_TFT="{TTP:1,UCE:1,ARF:1,TRE:1}"
export MS_TFT_IP=127.0.0.1
export MS_TFT_PORT=30051
```
- `MINDIO_FOR_MINDSPORE`:使能 MindIO TFT SDK 支持 MindSpore
-- `MS_ENABLE_TFT`:表示启用 TTP、UCE 和 ARF 功能,如果只想启用其中的某一个功能,则将对应的值设置为 1 即可。
+- `MS_ENABLE_TFT`:表示启用 TTP、UCE、ARF 和 TRE 功能,如果只想启用其中的某一个功能,则将对应的值设置为 1 即可。
- **TTP (Try To Persist)**:临终 CKPT 功能
- **UCE (Uncorrectable Memory Error)**:UCE 故障容错恢复功能
- **ARF (Air Refuelling)**:进程级重调度恢复功能
+ - **TRE (Training Result Error)**:TRE 训练结果异常恢复功能
- 开启 UCE 或者 ARF 功能时,默认开启 TTP 功能
-
+ - 目前 TRE 功能不可以与 UCE 或 ARF 功能同时使用
+ - TRE 功能不依赖 MindIO 组件,若只使能TRE特性,无需配置 MindIO 相关的环境变量 MINDIO_FOR_MINDSPORE、MS_TFT_IP 和 MS_TFT_PORT
- `MS_TFT_IP` 和 `MS_TFT_PORT` 分别表示 TFT Controller 的 IP 和端口号,无默认值,需要用户指定。如果由 MindSpore Transformers 启动 Controller,则配置用户集群中 rank0 节点的 IP 和端口号。如果用户自行启动 Controller,则配置 Controller 的 IP 和端口号。
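+
+例如,若只需使能不依赖 MindIO 的 TRE 功能,环境变量可简化为如下示意(按上文说明,只保留需要的开关即可):
+
+```shell
+# 仅使能 TRE 训练结果异常恢复功能,无需配置 MINDIO_FOR_MINDSPORE、MS_TFT_IP、MS_TFT_PORT
+export MS_ENABLE_TFT="{TRE:1}"
+```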
### YAML 配置
-YAML配置包含两部分:临终 CKPT 的保存及恢复配置和高可用的副本关系配置。
+YAML配置包含两部分:临终 CKPT 的保存及恢复配置和卡间副本关系配置。
#### 保存及恢复配置
@@ -90,7 +97,7 @@ YAML配置包含两部分:临终 CKPT 的保存及恢复配置和高可用的
#### 副本关系配置
-高可用的三个功能的关键是配置出权重和优化器的副本冗余关系,配置的核心是数据并行域的维度大于 2,如果叠加优化器并行,需要同时保证优化器的副本数大于 2。所以配置分两类,开启优化器并行和不开启优化器并行。下面以 8 卡为例,介绍如何配置。
+高可用的临终 CKPT、UCE 和 ARF 这三个功能的关键是配置出权重和优化器的副本冗余关系,配置的核心是数据并行域的维度大于 2,如果叠加优化器并行,需要同时保证优化器的副本数大于 2。所以配置分两类,开启优化器并行和不开启优化器并行。下面以 8 卡为例,介绍如何配置。
- **不开启优化器并行**
@@ -120,7 +127,7 @@ YAML配置包含两部分:临终 CKPT 的保存及恢复配置和高可用的
pipeline_stage: 1
```
-#### 示例
+#### 临终 CKPT 使用示例
本章节以 Llama2-13B 训练为例演示临终 CKPT 的使用。
diff --git a/docs/mindformers/docs/source_zh_cn/feature/images/TrainingStateMonitor_log.png b/docs/mindformers/docs/source_zh_cn/feature/images/TrainingStateMonitor_log.png
new file mode 100644
index 0000000000000000000000000000000000000000..f98cbe0cd819576782d60eb731d62c298a692d71
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/feature/images/TrainingStateMonitor_log.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/feature/images/adam_m_norm.png b/docs/mindformers/docs/source_zh_cn/feature/images/adam_m_norm.png
new file mode 100644
index 0000000000000000000000000000000000000000..f8ece7816ed7b404e7f748a002e7d5b4bdfda00f
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/feature/images/adam_m_norm.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/feature/images/commondataloader.png b/docs/mindformers/docs/source_zh_cn/feature/images/commondataloader.png
new file mode 100644
index 0000000000000000000000000000000000000000..ba434972960609f6ddb16c2e30702d00e6717061
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/feature/images/commondataloader.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/feature/images/local_loss&local_norm.png b/docs/mindformers/docs/source_zh_cn/feature/images/local_loss&local_norm.png
new file mode 100644
index 0000000000000000000000000000000000000000..3478ae69cf82cfde253adf375be364b743ae7df1
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/feature/images/local_loss&local_norm.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/feature/images/tensorboard_scalar.png b/docs/mindformers/docs/source_zh_cn/feature/images/tensorboard_scalar.png
new file mode 100644
index 0000000000000000000000000000000000000000..143fc0812e918394dc4e55a5a1e1c14dd4b73dc7
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/feature/images/tensorboard_scalar.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/feature/images/tensorboard_text.png b/docs/mindformers/docs/source_zh_cn/feature/images/tensorboard_text.png
new file mode 100644
index 0000000000000000000000000000000000000000..6857618c9cca67aac064a24d0122bdca3e7706b9
Binary files /dev/null and b/docs/mindformers/docs/source_zh_cn/feature/images/tensorboard_text.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/feature/infer_function.rst b/docs/mindformers/docs/source_zh_cn/feature/infer_function.rst
new file mode 100644
index 0000000000000000000000000000000000000000..d521b7e16bee0deafbfd14a063cbf7f2291d9361
--- /dev/null
+++ b/docs/mindformers/docs/source_zh_cn/feature/infer_function.rst
@@ -0,0 +1,9 @@
+推理功能
+===========
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+
+ evaluation
+ quantization
diff --git a/docs/mindformers/docs/source_zh_cn/function/logs.md b/docs/mindformers/docs/source_zh_cn/feature/logging.md
similarity index 82%
rename from docs/mindformers/docs/source_zh_cn/function/logs.md
rename to docs/mindformers/docs/source_zh_cn/feature/logging.md
index 97ecfd02cc2a2b182c8d0d833752747b95f4ab3a..bee9ee07b59873eccd672e96d75f8ce87c5e91fe 100644
--- a/docs/mindformers/docs/source_zh_cn/function/logs.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/logging.md
@@ -1,12 +1,12 @@
# 日志
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/logs.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/logging.md)
## 日志保存
### 概述
-MindSpore TransFormers 会将模型的训练配置、训练步数、Loss、吞吐率等信息写入日志中,开发者可以自行指定日志存储的路径。
+MindSpore Transformers 会将模型的训练配置、训练步数、Loss、吞吐率等信息写入日志中,开发者可以自行指定日志存储的路径。
### 训练日志的目录结构
@@ -40,7 +40,7 @@ output
### 配置与使用
-MindSpore TransFormer 默认会在训练的 yaml 文件中指定文件输出路径为 `./output` 。如果在 `mindformers` 路径下启动训练任务,则训练产生的日志输出将默认保存在 `mindformers/output` 下。
+MindSpore Transformers 默认会在训练的 yaml 文件中指定文件输出路径为 `./output` 。如果在 `mindformers` 路径下启动训练任务,则训练产生的日志输出将默认保存在 `mindformers/output` 下。
#### YAML 参数配置
@@ -54,12 +54,12 @@ output_dir: './output' # path to save logs/checkpoint/strategy
#### 单卡任务指定输出目录
-除了 yaml 文件配置来指定,MindSpore TransFormer 还支持在 [run_mindformer 一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#run-mindformer%E4%B8%80%E9%94%AE%E5%90%AF%E5%8A%A8%E8%84%9A%E6%9C%AC) 中,使用 `--output_dir` 启动命令对日志输出路径做指定。
+除了 yaml 文件配置来指定,MindSpore Transformers 还支持在 [run_mindformer 一键启动脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#run-mindformer%E4%B8%80%E9%94%AE%E5%90%AF%E5%8A%A8%E8%84%9A%E6%9C%AC) 中,使用 `--output_dir` 启动命令对日志输出路径做指定。
> 如果在这里配置了输出路径,将会覆盖 yaml 文件中的配置!
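+
+一个通过启动命令指定输出路径的示意(配置文件路径为假设值):
+
+```shell
+python run_mindformer.py --config /path/to/task_config.yaml --output_dir ./my_output
+```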
#### 分布式任务指定输出目录
-如果模型训练需要用到多台服务器,使用[分布式任务拉起脚本 msrun_launcher.sh](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#%E5%88%86%E5%B8%83%E5%BC%8F%E4%BB%BB%E5%8A%A1%E6%8B%89%E8%B5%B7%E8%84%9A%E6%9C%AC) 来启动分布式训练任务。
+如果模型训练需要用到多台服务器,使用[分布式任务拉起脚本 msrun_launcher.sh](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/start_tasks.html?highlight=%E6%97%A5%E5%BF%97#%E5%88%86%E5%B8%83%E5%BC%8F%E4%BB%BB%E5%8A%A1%E6%8B%89%E8%B5%B7%E8%84%9A%E6%9C%AC) 来启动分布式训练任务。
在设置了共享存储的情况下,还可以在启动脚本中指定入参 `LOG_DIR` 来指定 Worker 以及 Scheduler 的日志输出路径,将所有机器节点的日志都输出到一个路径下,方便统一观察。
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/function/fine_grained_activations_swap.md b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
similarity index 58%
rename from docs/mindformers/docs/source_zh_cn/function/fine_grained_activations_swap.md
rename to docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
index 82892826659bd9f57e6577ca81905ce688b7e1a6..6455de1cb073a427fe3921834033263b5d9e42ef 100644
--- a/docs/mindformers/docs/source_zh_cn/function/fine_grained_activations_swap.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md
@@ -1,8 +1,76 @@
-# 细粒度激活值SWAP
+# 训练内存优化特性
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/fine_grained_activations_swap.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/memory_optimization.md)
-## 概述
+## 重计算
+
+### 概述
+
+重计算可以显著降低训练时的激活内存,但会额外增加一些计算。关于重计算的原理和框架侧能力可参考 [MindSpore 教程文档:重计算](https://www.mindspore.cn/tutorials/zh-CN/master/parallel/recompute.html)。
+
+### 配置与使用
+
+#### YAML 参数配置
+
+用户可通过在模型训练的 yaml 配置文件中新增 `recompute_config` 模块来使用重计算。
+
+以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L113) 为例,可做如下配置:
+
+```yaml
+# recompute config
+recompute_config:
+ recompute: [3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 2, 0]
+ select_recompute: False
+ parallel_optimizer_comm_recompute: True
+ mp_comm_recompute: True
+ recompute_slice_activation: True
+```
+
+如果需要将选择重计算配置到某几个特定层,可以使用 tuple 的方式进行配置。
+
+例如:一个网络有48层, `pp_interleave_num` 为 `2` , `pipeline_stage` 为 `5` ,offset设为 `[[0,1,1,1,1],[1,1,1,1,0]]` ,重计算配置如下:
+
+```yaml
+# recompute config
+recompute_config:
+ recompute: [[2,1,0,0,0],[1,0,0,0,0]]
+ select_recompute:
+ 'feed_forward\.w1\.activation\.silu': True
+ 'feed_forward\.mul': True
+ 'feed_forward\.w1\.matmul': [[1,0,0,0,0],[2,1,0,0,0]]
+ 'feed_forward\.w3\.matmul': [2,1,0,0,0]
+ select_comm_recompute: ['ffn_norm\.norm','attention_norm\.norm']
+```
+
+在日志中会打印将输入格式规范化后的重计算策略信息:
+
+```text
+INFO - Formative layer_recompute: [[2, 1, 0, 0, 0], [1, 0, 0, 0, 0]]
+INFO - Formative select_recompute: {'feed_forward\.w1\.activation\.silu': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'feed_forward\.mul': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'feed_forward\.w1\.matmul': [[1, 0, 0, 0, 0], [2, 1, 0, 0, 0]], 'feed_forward\.w3\.matmul': [[1, 1, 0, 0, 0], [1, 0, 0, 0, 0]]}
+INFO - Formative select_comm_recompute: {'ffn_norm\.norm': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'attention_norm\.norm': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]]}
+```
+
+随后会打印每一层重计算的配置方式。
+
+> 1. 如果某一层同时配置了完全重计算与选择重计算,则按完全重计算生效。
+> 2. 在一维整数型 list 或 tuple 中的整数可以替换为 True 或 False,代表对所有层启用或关闭重计算。
+
+#### 主要配置参数介绍
+
+有关重计算配置的主要参数如下表所列:
+
+| 参数 | 描述 | 取值说明 |
+|-----------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| recompute | (按层)完全重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple。 配置为 bool 类型时,对所有层开启或关闭完全重计算; 配置为整数型 list 或 tuple 时,代表每个 `pipeline_stage` 中有多少层开启完全重计算, `pp_interleave_num > 1` 时开启的重计算层数会均匀分配到各 interleave 中; 配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启完全重计算。 |
+| select_recompute | (按算子)选择重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple,字符串的 list 或 tuple,以及 dict。 默认选择重计算算子为 `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` 。 配置为 bool 类型时,对所有层开启或关闭默认算子的选择重计算; 配置为整数型 list 或 tuple 时,代表每个 `pipeline_stage` 中有多少层开启默认算子的选择重计算, `pp_interleave_num > 1` 时开启的选择重计算层数会均匀分配到各 interleave 中; 配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启默认算子的选择重计算。 配置为字符串 list 或 tuple 时,代表对哪些算子开启选择重计算,算子名通过正则表达式匹配,层级关系通过 `'\\.'` 分割; 配置为 dict 时,key 值对应算子名,value 值对应选择重计算的配置方式,这种配法可以对每个算子精细配置重计算策略。 |
+| select_comm_recompute | (按算子)选择通信重计算。 | 配置方式与 **select_recompute** 相同,默认选择通信重计算算子为 `['.*\\.norm']` 。一般仅对 layer_norm 或类似层进行配置。 |
+| parallel_optimizer_comm_recompute | 优化器并行通信重计算。在优化器并行下,是否重计算 AllGather 通信。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由优化器并行引入的 AllGather 通信是否重计算。 默认值: `False` 。 |
+| mp_comm_recompute | 模型并行通信重计算,在模型并行下,是否重计算通信算子。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由模型并行引入的通信操作是否重计算。默认值: `True` 。 |
+| recompute_slice_activation | 切片重计算,是否对将保留在内存中的 Cell 输出进行切片。 | (bool, 可选) - 默认值: `False` 。 |
+
+## 细粒度激活值SWAP
+
+### 概述
在传统大模型训练任务中,计算卡的显存资源常常成为训练瓶颈,采用更大规模的模型并行(model parallel, mp)和流水线并行(pipeline parallel, pp)切分策略,虽然能一定程度上缓解单张计算卡的显存压力,但需要更大规模的集群资源,且引入过多的通信会极大地降低模型的MFU(Model FLOPs Utilization)。在集群资源有限的情况下,重计算是另一个缓解内存压力的有效手段,其通过放弃存储正向传播阶段的激活值,并在梯度反向回传时重新计算所需激活值,来降低激活值的显存占用,由于重计算需引入额外的计算开销,因此该方法同样会显著降低模型训练的MFU。
@@ -10,9 +78,9 @@
细粒度激活值SWAP技术具备较高的使用灵活度。大模型训练的正向传播阶段,将产生数据量大小不同的若干激活值,用户可按需选择特定的激活值进行SWAP,且选择激活值的粒度为算子级。当模型类型或规格改变时,用户可灵活调整对应的SWAP策略,以追求最低的内存开销和最优的性能。
-## 使用说明
+### 使用说明
-### 约束场景
+#### 约束场景
- 仅支持静态图O0/O1模式
- 支持Llama系稠密模型,后续演进支持MoE稀疏模型
@@ -31,7 +99,7 @@
- 仅支持Ascend后端
-### 接口说明
+#### 接口说明
细粒度激活值SWAP特性通过YAML配置`swap_config`字段使能,包括`swap`、`default_prefetch`、`layer_swap`、`op_swap`四个功能接口,用户可通过此接口灵活选择特定层或特定层的特定算子使能激活值SWAP功能。
@@ -44,7 +112,7 @@
| layer_swap | List | 默认值None。当为None时,本接口不生效;当为List类型时,本接口包含若干Dict类型的列表元素,每个Dict类型元素包含`backward_prefetch`与`layers`两个键,提供使能SWAP的预取时机(即开始搬回操作的时机)和对应的层索引。 |
| op_swap | List | 默认值None。当为None时,本接口不生效;当为List类型时,本接口包含若干Dict类型的列表元素,每个Dict类型元素包含`op_name`、`backward_prefetch`与`layers`三个键,提供使能SWAP的预取时机和对应的算子名、层索引。 |
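+
+结合上表,一个使能细粒度激活值SWAP的配置示意如下(其中层索引、预取时机与算子名均为示意取值,实际取值需结合模型与显存情况调整):
+
+```yaml
+swap_config:
+  swap: True                # 总开关
+  layer_swap:               # 对第 0、1 层使能SWAP(示意)
+    - backward_prefetch: 1
+      layers: [0, 1]
+  op_swap:                  # 对第 2、3 层的特定算子使能SWAP(算子名为假设值)
+    - op_name: 'flash_attention'
+      backward_prefetch: 2
+      layers: [2, 3]
+```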
-### 混合重计算
+#### 混合重计算
细粒度激活值SWAP与重计算存在耦合:
@@ -58,15 +126,15 @@
| select_recompute | 确定各pipeline stage中特定算子使能重计算的层数 | 不感知pipeline stage,对于每个算子的键值对,仅接受bool/list类型入参。当为bool类型时,所有层使能重计算;当为list类型时,列表元素为层索引,按索引选择特定层使能重计算 |
| select_comm_recompute | 确定各pipeline stage中通信算子使能重计算的层数 | 不感知pipeline stage,仅接受bool/list类型入参。当为bool类型时,所有层使能重计算;当为list类型时,列表元素为层索引,按索引选择特定层使能重计算 |
-## 使用示例
+### 使用示例
本章节以 Llama2-7B 训练为例,演示细粒度激活值SWAP特性的使用。
-### 环境准备
+#### 环境准备
下载 MindSpore Transformers,并准备预训练数据集,如wikitext等。
-### 示例一:默认SWAP策略
+#### 示例一:默认SWAP策略
在YAML中修改补充重计算与SWAP配置,主要配置参数如下:
@@ -112,7 +180,7 @@ bash ./scripts/msrun_launcher.sh "run_mindformer.py \
默认SWAP策略执行成功。
-### 示例二:选择特定层使能SWAP
+#### 示例二:选择特定层使能SWAP
在YAML中修改补充重计算与SWAP配置,主要配置参数如下:
@@ -158,7 +226,7 @@ bash ./scripts/msrun_launcher.sh "run_mindformer.py \
选择特定层使能SWAP的策略执行成功。
-### 示例三:选择特定层的特定算子使能SWAP
+#### 示例三:选择特定层的特定算子使能SWAP
在YAML中修改补充重计算与SWAP配置,主要配置参数如下:
@@ -213,7 +281,7 @@ bash ./scripts/msrun_launcher.sh "run_mindformer.py \
选择特定层的特定算子使能SWAP成功。
-### 示例四:细粒度激活值SWAP与重计算混用
+#### 示例四:细粒度激活值SWAP与重计算混用
在YAML中修改补充重计算与SWAP配置,主要配置参数如下:
diff --git a/docs/mindformers/docs/source_zh_cn/function/monitor.md b/docs/mindformers/docs/source_zh_cn/feature/monitor.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/function/monitor.md
rename to docs/mindformers/docs/source_zh_cn/feature/monitor.md
index 57b478433142ce067166ce9e45c03f6d8745b71c..fb1c95840f3f63883a5ba578ddb72fdf5bce6da7 100644
--- a/docs/mindformers/docs/source_zh_cn/function/monitor.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/monitor.md
@@ -1,6 +1,6 @@
# 训练指标监控
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/monitor.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/monitor.md)
MindSpore Transformers 支持 TensorBoard 作为可视化工具,用于监控和分析训练过程中的各种指标和信息。TensorBoard 是一个独立的可视化库,需要用户手动安装,它提供了一种交互式的方式来查看训练中的损失、精度、学习率、梯度分布等多种内容。用户在训练`yaml`文件中配置 TensorBoard 后,在大模型训练过程中会实时生成并更新事件文件,可以通过命令查看训练数据。
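+
+其 `yaml` 配置大致形如下面的示意(字段名与取值仅供参考,完整配置项以下文及配置文件说明为准):
+
+```yaml
+tensorboard:
+  tensorboard_dir: './worker/tensorboard'  # 事件文件保存目录(示意)
+  tensorboard_queue_size: 10               # 事件写入队列容量(示意)
+```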
@@ -117,7 +117,7 @@ TensorBoard 2.18.0 at http://0.0.0.0:6006/ (Press CTRL+C to quit)
在 Tensorboard 的 SCALARS 页面中,上述指标(假设名为 `scalar_name`)除了最后两个,其他都存在 `scalar_name` 和 `scalar_name-vs-samples` 两个下拉标签页。其中 `scalar_name` 下展示了该标量随训练迭代步数进行变化的折线图; `scalar_name-vs-samples` 下展示了该标量随样本数进行变化的折线图。如下图所示为学习率`learning-rate`的曲线图示例:
-
+
#### TrainingStateMonitor监控指标
@@ -138,23 +138,23 @@ TensorBoard 2.18.0 at http://0.0.0.0:6006/ (Press CTRL+C to quit)
**日志效果示例**
-
+
**tensorboard可视化效果示例**
adam_m_norm
-
+
local_loss与local_norm
-
+
### 文本数据可视化说明
在 TEXT 页面中,每个训练配置存在一个标签页,其中记录了该配置的值。如下图所示:
-
+
所有配置名和说明如下:
@@ -223,4 +223,4 @@ local_loss与local_norm
> 2. 用户在训练配置文件 `yaml` 中设置的配置参数;
> 3. 训练默认的配置参数。
>
-> 可配置的所有参数请参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html)。
\ No newline at end of file
+> 可配置的所有参数请参考[配置文件说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html)。
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md b/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
new file mode 100644
index 0000000000000000000000000000000000000000..b679e7deba5009119df3504d0d9576f65f950a70
--- /dev/null
+++ b/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md
@@ -0,0 +1,75 @@
+# 其它训练特性
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/other_training_features.md)
+
+在大规模的深度学习模型训练中,会遇到诸如内存限制、计算资源的有效利用、分布式训练中的同步问题等挑战,需要使用训练优化算法来提高训练效率、加速收敛速度以及改善最终模型性能。
+
+MindSpore Transformers 提供了梯度累积、梯度裁剪等训练优化算法,可供开发者进行训练时使用。
+
+## 梯度累积
+
+### 概述
+
+MindSpore 在 2.1.1 之后的版本中增加了 `mindspore.nn.wrap.cell_wrapper.GradAccumulationCell` 这一梯度累积实现接口,通过拆分 MiniBatch 的形式提供了梯度累加的能力,MindSpore Transformers 将其封装进了统一的训练流程,通过 yaml 配置进行使能。关于梯度累积的原理和框架侧的能力可以参考 [MindSpore 文档:梯度累加](https://www.mindspore.cn/tutorials/zh-CN/master/parallel/distributed_gradient_accumulation.html)。
+
+### 配置与使用
+
+#### YAML 参数配置
+
+用户在需要开启梯度累积的场景下,只需在配置文件中的 `runner_config` 项下配置 `gradient_accumulation_steps` 项,设置为所需的梯度累积步数即可:
+
+```yaml
+# runner config
+runner_config:
+ ...
+ gradient_accumulation_steps: 4
+ ...
+```
+
+#### 主要配置参数介绍
+
+| 参数 | 描述 | 取值说明 |
+|-----------------------------|---------------------------------|------------------------|
+| gradient_accumulation_steps | 在执行反向传播前,累积梯度的步数。 | (int, 必选) - 默认值: `1` 。 |
+
+#### 其他方式使用梯度累积
+
+除配置文件外,当采用 `run_mindformer.py` 脚本启动时,可指定 `--gradient_accumulation_steps` 入参来使用梯度累积功能。
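+
+一个通过启动脚本入参开启梯度累积的示意(配置文件路径为假设值):
+
+```shell
+python run_mindformer.py \
+  --config /path/to/pretrain_config.yaml \
+  --run_mode train \
+  --gradient_accumulation_steps 4
+```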
+
+#### 梯度累积使用限制
+
+> 开启梯度累积会增大内存开销,请注意内存管理,防止发生内存溢出(OOM)。
+
+1. 由于 `GradAccumulationCell` 的实现依赖并行特性,梯度累积当前仅支持在**半自动并行模式**下使用;
+2. 此外,在 pipeline 并行场景下,梯度累积的含义与 micro_batch 相同,该配置将不会生效,请配置 `micro_batch_num` 项以增大训练 batch_size。
+
+## 梯度裁剪
+
+### 概述
+
+梯度裁剪算法可以避免反向梯度过大,跳过最优解的情况。
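+
+以常见的按全局范数裁剪为例(此为一般做法的示意,具体裁剪逻辑以 `MFTrainOneStepCell` 源码为准),当梯度的全局 L2 范数超过 `max_grad_norm` 时,所有梯度会按比例缩放:
+
+```latex
+g \leftarrow g \cdot \min\left(1, \frac{\text{max\_grad\_norm}}{\lVert g \rVert_2}\right)
+```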
+
+### 配置与使用
+
+#### YAML 参数配置
+
+在 MindSpore Transformers 中,默认的训练流程 `MFTrainOneStepCell` 中集成了梯度裁剪逻辑。
+
+可使用如下示例,以开启梯度裁剪:
+
+```yaml
+# wrapper cell config
+runner_wrapper:
+ type: MFTrainOneStepCell
+ ...
+ use_clip_grad: True
+ max_grad_norm: 1.0
+ ...
+```
+
+#### 主要配置参数介绍
+
+| 参数 | 描述 | 取值说明 |
+|---------------|-------------------|----------------------------|
+| use_clip_grad | 控制在训练过程中是否开启梯度裁剪。 | (bool, 可选) - 默认值: `False` 。 |
+| max_grad_norm | 控制梯度裁剪的最大 norm 值。 | (float, 可选) - 默认值: `1.0` 。 |
diff --git a/docs/mindformers/docs/source_zh_cn/function/distributed_parallel.md b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md
similarity index 95%
rename from docs/mindformers/docs/source_zh_cn/function/distributed_parallel.md
rename to docs/mindformers/docs/source_zh_cn/feature/parallel_training.md
index f3d679f1364eff25a820011be344c05a918b7704..96a9adcfdc741afba7ef4e6e1648e2728cddb467 100644
--- a/docs/mindformers/docs/source_zh_cn/function/distributed_parallel.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md
@@ -1,6 +1,6 @@
-# 分布式并行
+# 分布式并行训练
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/distributed_parallel.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/parallel_training.md)
## 并行模式与应用场景
@@ -33,7 +33,7 @@ MindSpore Transformers 支持多种并行特性,开发者可以利用这些特
| **[长序列并行](#长序列并行)** | 设计用于处理长序列输入的模型,对所有的input输入和所有的输出activation在sequence维度上进行切分,对于超长序列输入场景进一步减少显存占用。 |
| **[多副本并行](https://www.mindspore.cn/docs/zh-CN/master/features/parallel/pipeline_parallel.html#mindspore%E4%B8%AD%E7%9A%84interleaved-pipeline%E8%B0%83%E5%BA%A6)** | 用于在多个副本之间实现精细的并行控制,优化性能和资源利用率,适合大规格模型的高效训练。 |
-关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 中的并行配置章节下的具体内容。
+关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。
## 并行特性介绍
@@ -64,7 +64,7 @@ parallel_config:
- use_ring_attention:是否开启Ring Attention,默认为False。
- context_parallel:序列并行切分数量,默认为1,根据用户需求配置。
-关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 中的并行配置章节下的具体内容。
+关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。
#### Ulysses序列并行
@@ -95,7 +95,7 @@ parallel_config:
- enable_alltoall:生成alltoall通信算子,默认为False,不启用时将会由allgather等其他算子组合完成等价替代,可参考MindSpore `set_auto_parallel_context`[接口文档](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.set_auto_parallel_context.html);启用Ulysses方案时我们期望能够直接插入alltoall通信算子,因此将该配置项打开。
- context_parallel_algo:设置为`ulysses_cp`开启Ulysses序列并行。
-关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 中的并行配置章节下的具体内容。
+关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。
#### 混合序列并行
@@ -121,7 +121,7 @@ parallel_config:
- context_parallel_algo:设置为`hybrid_cp`时开启混合序列并行。
- ulysses_degree_in_cp:Ulysses序列并行切分数量。
-关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 中的并行配置章节下的具体内容。
+关于分布式并行参数的配置方法,参见 [MindSpore Transformers 配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。
### 流水线并行
@@ -153,7 +153,7 @@ parallel_config:
- 目前仅支持Llama和DeepSeek系列模型。
- 目前暂不支持使用Megatron的多源数据集进行训练的场景。
-关于分布式并行参数的配置方法,参见 [MindSpore Transformers配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/appendix/conf_files.html) 中的并行配置章节下的具体内容。
+关于分布式并行参数的配置方法,参见 [MindSpore Transformers配置说明](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/configuration.html) 中的并行配置章节下的具体内容。
## MindSpore Transformers 分布式并行应用实践
@@ -166,6 +166,6 @@ parallel_config:
- **多副本并行**:通过执行序调度算法控制细粒度多分支的并行(`fine_grain_interleave: 2`),提高计算与通信的相互掩盖。
- **优化器并行**:优化器计算分散到多个设备上,以减少内存占用(`enable_parallel_optimizer: True`)。
-> 注意:开启细粒度多副本并行的同时必须开启序列并行。
+> 开启细粒度多副本并行的同时必须开启序列并行。
通过以上配置,Llama3-70B的分布式训练在多机多卡环境中可以有效利用硬件资源,实现高效、稳定的模型训练。
diff --git a/docs/mindformers/docs/source_zh_cn/usage/quantization.md b/docs/mindformers/docs/source_zh_cn/feature/quantization.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/usage/quantization.md
rename to docs/mindformers/docs/source_zh_cn/feature/quantization.md
index c439651c502c144809483f8eeaefb606a8f73a6b..76d2c1a3b3a5dc84868835733ce69fffdf7ecad9 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/quantization.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/quantization.md
@@ -1,6 +1,6 @@
# 量化
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/quantization.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/quantization.md)
## 概述
diff --git a/docs/mindformers/docs/source_zh_cn/function/resume_training.md b/docs/mindformers/docs/source_zh_cn/feature/resume_training.md
similarity index 99%
rename from docs/mindformers/docs/source_zh_cn/function/resume_training.md
rename to docs/mindformers/docs/source_zh_cn/feature/resume_training.md
index f91c66a89932a4e4a76ede2e10fb490a8e27fecc..f9762887e3e6174842eab9859ccccaffecc81a59 100644
--- a/docs/mindformers/docs/source_zh_cn/function/resume_training.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/resume_training.md
@@ -1,6 +1,6 @@
# 模型断点续训
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/resume_training.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/resume_training.md)
## 断点续训
diff --git a/docs/mindformers/docs/source_zh_cn/function/safetensors.md b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md
similarity index 96%
rename from docs/mindformers/docs/source_zh_cn/function/safetensors.md
rename to docs/mindformers/docs/source_zh_cn/feature/safetensors.md
index f4d3a76795215a1b90a8328d55432f88244aff68..6e33d1143ddfb88a87e38306258ec1ba639de253 100644
--- a/docs/mindformers/docs/source_zh_cn/function/safetensors.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md
@@ -1,6 +1,6 @@
# Safetensors权重
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/safetensors.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/safetensors.md)
## 概述
@@ -15,7 +15,7 @@ Safetensors文件主要分为两种类型:完整权重文件和分布式权重
Safetensors完整权重可通过以下两种方式获取:
1. 直接从Huggingface上下载。
-2. 通过MindSpore Transformers分布式训练后,通过[合并脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html#safetensors%E6%9D%83%E9%87%8D%E7%A6%BB%E7%BA%BF%E5%90%88%E5%B9%B6)生成完整权重。
+2. 通过MindSpore Transformers分布式训练后,通过[合并脚本](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html#safetensors%E6%9D%83%E9%87%8D%E7%A6%BB%E7%BA%BF%E5%90%88%E5%B9%B6)生成完整权重。
Huggingface Safetensors示例目录结构:
@@ -106,7 +106,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
任务执行完成后,在mindformers/output目录下,会生成checkpoint文件夹,同时模型文件会保存在该文件夹下。
-更多详情请参考:[预训练介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/pre_training.html)
+更多详情请参考:[预训练介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/pre_training.html)
### 微调任务示例
@@ -154,7 +154,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
任务执行完成后,在mindformers/output目录下,会生成checkpoint文件夹,同时模型文件会保存在该文件夹下。
-更多详情请参考:[SFT微调介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/sft_tuning.html)
+更多详情请参考:[SFT微调介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/supervised_fine_tuning.html)
### 推理任务示例
@@ -201,7 +201,7 @@ bash scripts/msrun_launcher.sh "python run_mindformer.py \
'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......]
```
-更多详情请参考:[推理介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/inference.html)
+更多详情请参考:[推理介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/inference.html)
### 断点续训任务示例
@@ -237,9 +237,9 @@ callbacks:
checkpoint_format: safetensors # 保存权重文件格式
```
-大集群规模场景下,避免在线合并过程耗时过长占用训练资源,推荐将原分布式权重文件离线[合并完整权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html#safetensors%E6%9D%83%E9%87%8D%E7%A6%BB%E7%BA%BF%E5%90%88%E5%B9%B6)后传入,无需传入源切分策略文件路径。
+大集群规模场景下,避免在线合并过程耗时过长占用训练资源,推荐将原分布式权重文件离线[合并完整权重](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html#safetensors%E6%9D%83%E9%87%8D%E7%A6%BB%E7%BA%BF%E5%90%88%E5%B9%B6)后传入,无需传入源切分策略文件路径。
-更多详情请参考:[断点续训介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/resume_training.html)。
+更多详情请参考:[断点续训介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html)。
## 权重保存
@@ -247,7 +247,7 @@ callbacks:
在深度学习模型的训练过程中,保存模型的权重是至关重要的一步。权重保存功能使得我们能够在训练的任意阶段存储模型的参数,以便用户在训练中断或完成后进行恢复、继续训练、评估或部署。同时还可以通过保存权重的方式,在不同环境下复现实验结果。
-目前,MindSpore TransFormer 支持 [safetensors](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/safetensors.html) 格式的权重文件读取和保存。
+目前,MindSpore Transformers 支持 [safetensors](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/safetensors.html) 格式的权重文件读取和保存。
### 目录结构
diff --git a/docs/mindformers/docs/source_zh_cn/function/start_tasks.md b/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md
similarity index 96%
rename from docs/mindformers/docs/source_zh_cn/function/start_tasks.md
rename to docs/mindformers/docs/source_zh_cn/feature/start_tasks.md
index bef2b6993416d0077949237243084f775cde92a3..f4e983a0676a7894d86a9b4002cfefab13b1f624 100644
--- a/docs/mindformers/docs/source_zh_cn/function/start_tasks.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md
@@ -1,6 +1,6 @@
# 启动任务
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/start_tasks.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/start_tasks.md)
## 概述
@@ -22,7 +22,7 @@ MindSpore Transformers提供了一键启动脚本`run_mindformer.py`和分布式
| `--device_id` | 设置执行设备ID,其值必须在可用设备范围内。 | int,可选 | 预训练/微调/推理 |
| `--device_target` | 设置后端执行设备,MindSpore Transformers仅支持在`Ascend`设备上运行。 | str,可选 | 预训练/微调/推理 |
| `--run_mode` | 设置模型的运行模式,可选`train`、`finetune`或`predict`。 | str,可选 | 预训练/微调/推理 |
-| `--load_checkpoint` | 加载的权重文件或文件夹路径,详细使用方式参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)。 | str,可选 | 预训练/微调/推理 |
+| `--load_checkpoint` | 加载的权重文件或文件夹路径,详细使用方式参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)。 | str,可选 | 预训练/微调/推理 |
| `--use_parallel` | 是否开启并行模式。 | bool,可选 | 预训练/微调/推理 |
| `--output_dir` | 设置保存日志、权重、切分策略等文件的路径。 | str,可选 | 预训练/微调/推理 |
| `--register_path` | 外挂代码所在目录的绝对路径。比如research目录下的模型目录。 | str,可选 | 预训练/微调/推理 |
@@ -33,7 +33,7 @@ MindSpore Transformers提供了一键启动脚本`run_mindformer.py`和分布式
| 参数 | 参数说明 | 取值说明 | 适用场景 |
|:----------------------------:|:-------------------------------------------------------------------------------------------------------------------|--------------------------------|-----------|
| `--src_strategy_path_or_dir` | 权重的策略文件路径。 | str,可选 | 预训练/微调/推理 |
-| `--auto_trans_ckpt` | 是否开启在线权重自动转换功能,详情可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)。 | bool,可选 | 预训练/微调/推理 |
+| `--auto_trans_ckpt` | 是否开启在线权重自动转换功能,详情可参考[权重转换功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)。 | bool,可选 | 预训练/微调/推理 |
| `--transform_process_num` | 负责权重转换的进程数。 | int,可选 | 预训练/微调/推理 |
| `--only_save_strategy` | 是否仅保存切分策略文件。 | bool,可选,为`true`时任务在保存策略文件后直接退出 | 预训练/微调/推理 |
@@ -42,7 +42,7 @@ MindSpore Transformers提供了一键启动脚本`run_mindformer.py`和分布式
| 参数 | 参数说明 | 取值说明 | 适用场景 |
|:-------------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------|---------|--------|
| `--train_dataset_dir` | 预训练/微调的数据集目录。 | str,可选 | 预训练/微调 |
-| `--resume_training` | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool,可选 | 预训练/微调 |
+| `--resume_training` | 是否开启断点续训功能,详情可参考[断点续训功能](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html#%E6%96%AD%E7%82%B9%E7%BB%AD%E8%AE%AD)。 | bool,可选 | 预训练/微调 |
| `--epochs` | 训练轮次。 | int,可选 | 预训练/微调 |
| `--gradient_accumulation_steps` | 梯度累积步数。 | int,可选 | 预训练/微调 |
| `--batch_size` | 批处理数据的样本数。 | int,可选 | 预训练/微调 |
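+
+结合上述入参,一个单卡微调任务的启动示意如下(yaml 路径与数据集路径均为假设值):
+
+```shell
+python run_mindformer.py \
+  --config /path/to/finetune_config.yaml \
+  --run_mode finetune \
+  --train_dataset_dir /path/to/dataset \
+  --use_parallel False
+```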
diff --git a/docs/mindformers/docs/source_zh_cn/feature/training_function.rst b/docs/mindformers/docs/source_zh_cn/feature/training_function.rst
new file mode 100644
index 0000000000000000000000000000000000000000..63c9692fce030a030ec8849f319cd0a8aa63f853
--- /dev/null
+++ b/docs/mindformers/docs/source_zh_cn/feature/training_function.rst
@@ -0,0 +1,15 @@
+训练功能
+===========
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+
+ dataset
+ training_hyperparameters
+ monitor
+ resume_training
+ parallel_training
+ high_availability
+ memory_optimization
+ other_training_features
diff --git a/docs/mindformers/docs/source_zh_cn/function/training_hyperparameters.md b/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md
similarity index 86%
rename from docs/mindformers/docs/source_zh_cn/function/training_hyperparameters.md
rename to docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md
index 00fb536df9b897573d959068eb6d29c4172d979d..34304e255e615af362a67be165af7ec8db97eb71 100644
--- a/docs/mindformers/docs/source_zh_cn/function/training_hyperparameters.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md
@@ -1,10 +1,10 @@
# 模型训练超参数配置
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/training_hyperparameters.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/training_hyperparameters.md)
超参数对模型的性能有着重要影响,不同的超参数设置可能导致模型表现的巨大差异。参数的选择会影响到模型的训练速度、收敛性、容量和泛化能力等方面,且它们并非通过训练数据直接学习得到,而是由开发者根据经验、实验或调优过程来确定。
-MindSpore Transformer 提供了如下几类超参数的配置方式。
+MindSpore Transformers 提供了如下几类超参数的配置方式。
## 学习率
@@ -32,7 +32,7 @@ lr_schedule:
#### 主要配置参数介绍
-各学习率需配置的参数不同,MindSpore Transformer 目前支持了以下学习率:
+各学习率需配置的参数不同,MindSpore Transformers 目前支持了以下学习率:
1. [恒定预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.ConstantWarmUpLR.html)
2. [线性预热学习率](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.LinearWithWarmUpLR.html)
@@ -68,7 +68,7 @@ lr_schedule:
total_steps: 20 # -1 means it will load the total steps of the dataset
```
-更多关于学习率 API 的介绍(如 `type` 的配置名称、学习率算法的介绍),可参见 [MindSpore TransFormer API 文档:学习率部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/mindformers.core.html#%E5%AD%A6%E4%B9%A0%E7%8E%87) 的相关链接。
+更多关于学习率 API 的介绍(如 `type` 的配置名称、学习率算法的介绍),可参见 [MindSpore Transformers API 文档:学习率部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/mindformers.core.html#%E5%AD%A6%E4%B9%A0%E7%8E%87) 的相关链接。
## 优化器
@@ -78,7 +78,7 @@ lr_schedule:
选择合适的优化器对模型的收敛速度和最终性能有着至关重要的影响。不同的优化器通过不同的方法调整学习率和其他超参数来加速训练过程、改善收敛性并避免局部最优解。
-当前,MindSpore Transformer 只支持 [AdamW 优化器](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/mindformers.core.html#%E4%BC%98%E5%8C%96%E5%99%A8)。
+当前,MindSpore Transformers 只支持 [AdamW 优化器](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/mindformers.core.html#%E4%BC%98%E5%8C%96%E5%99%A8)。
### 配置与使用
@@ -98,4 +98,4 @@ optimizer:
#### 主要配置参数介绍
-有关优化器配置的主要参数,可参见 [MindSpore TransFormer API 文档:优化器部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.AdamW.html#mindformers.core.AdamW) 的相关链接。
+有关优化器配置的主要参数,可参见 [MindSpore Transformers API 文档:优化器部分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/core/mindformers.core.AdamW.html#mindformers.core.AdamW) 的相关链接。
diff --git a/docs/mindformers/docs/source_zh_cn/function/transform_weight.md b/docs/mindformers/docs/source_zh_cn/feature/transform_weight.md
similarity index 99%
rename from docs/mindformers/docs/source_zh_cn/function/transform_weight.md
rename to docs/mindformers/docs/source_zh_cn/feature/transform_weight.md
index 0081faf9b901485924d022ca01aed7fac4ddf7fc..cc03b0c4b4b7ba12725bb5b124234fe799b402db 100644
--- a/docs/mindformers/docs/source_zh_cn/function/transform_weight.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/transform_weight.md
@@ -1,6 +1,6 @@
# 分布式权重切分与合并
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/transform_weight.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/transform_weight.md)
## 概述
diff --git a/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md b/docs/mindformers/docs/source_zh_cn/feature/weight_conversion.md
similarity index 99%
rename from docs/mindformers/docs/source_zh_cn/function/weight_conversion.md
rename to docs/mindformers/docs/source_zh_cn/feature/weight_conversion.md
index 13e5e81e3539df2263088e5d109f50031307b018..19eacbc26074a5fbeae250922c06fb3325aa6feb 100644
--- a/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/weight_conversion.md
@@ -1,6 +1,6 @@
# 权重格式转换
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/weight_conversion.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/feature/weight_conversion.md)
## 概述
diff --git a/docs/mindformers/docs/source_zh_cn/full-process_1.png b/docs/mindformers/docs/source_zh_cn/full-process_1.png
index dbb6a24333a105f779396fc342b049c72938e5c8..27e14e5bb14815a03be6ab6fffe290dd995a8c5f 100644
Binary files a/docs/mindformers/docs/source_zh_cn/full-process_1.png and b/docs/mindformers/docs/source_zh_cn/full-process_1.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/full-process_2.png b/docs/mindformers/docs/source_zh_cn/full-process_2.png
index 27e14e5bb14815a03be6ab6fffe290dd995a8c5f..f422ae1f15ee0285eb9d37da52f096835bd98f93 100644
Binary files a/docs/mindformers/docs/source_zh_cn/full-process_2.png and b/docs/mindformers/docs/source_zh_cn/full-process_2.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/full-process_3.png b/docs/mindformers/docs/source_zh_cn/full-process_3.png
index f422ae1f15ee0285eb9d37da52f096835bd98f93..4356392871de33da27839693b25238e103097f64 100644
Binary files a/docs/mindformers/docs/source_zh_cn/full-process_3.png and b/docs/mindformers/docs/source_zh_cn/full-process_3.png differ
diff --git a/docs/mindformers/docs/source_zh_cn/full-process_4.png b/docs/mindformers/docs/source_zh_cn/full-process_4.png
deleted file mode 100644
index d438149f0718f823a8da83b7fec5f679281b2b8c..0000000000000000000000000000000000000000
Binary files a/docs/mindformers/docs/source_zh_cn/full-process_4.png and /dev/null differ
diff --git a/docs/mindformers/docs/source_zh_cn/full-process_5.png b/docs/mindformers/docs/source_zh_cn/full-process_5.png
deleted file mode 100644
index 4356392871de33da27839693b25238e103097f64..0000000000000000000000000000000000000000
Binary files a/docs/mindformers/docs/source_zh_cn/full-process_5.png and /dev/null differ
diff --git a/docs/mindformers/docs/source_zh_cn/function/other_features.md b/docs/mindformers/docs/source_zh_cn/function/other_features.md
deleted file mode 100644
index 63dac73faf5852d244ae849862a59070c287fae1..0000000000000000000000000000000000000000
--- a/docs/mindformers/docs/source_zh_cn/function/other_features.md
+++ /dev/null
@@ -1,141 +0,0 @@
-# 其它特性
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/function/other_features.md)
-
-在大规模的深度学习模型训练中,会遇到诸如:内存限制、计算资源的有效利用、分布式训练中的同步问题等挑战,需要使用训练优化算法来提高训练效率、加速收敛速度以及改善最终模型性能。
-
-MindSpore TransFormer 提供了重计算、梯度累积、梯度裁剪等训练优化算法,可供开发者进行训练时使用。
-
-## 重计算
-
-### 概述
-
-重计算可以显著降低训练时的激活内存,但会额外增加一些计算。关于重计算的原理和框架测能力可参考 [MindSpore 教程文档:重计算](https://www.mindspore.cn/tutorials/zh-CN/master/parallel/recompute.html)。
-
-### 配置与使用
-
-#### YAML 参数配置
-
-用户可通过在模型训练的 yaml 配置文件中新增 `recompute_config` 模块来使用重计算。
-
-以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3/deepseek3_671b/pretrain_deepseek3_671b.yaml#L113) 为例,可做如下配置:
-
-```yaml
-# recompute config
-recompute_config:
- recompute: [3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 2, 0]
- select_recompute: False
- parallel_optimizer_comm_recompute: True
- mp_comm_recompute: True
- recompute_slice_activation: True
-```
-
-如果需要对选择重计算配置到某几个特定层进行,可以使用 tuple 的方式进行配置。
-
-例如:一个网络有48层, `pp_interleave_num` 为 `2` , `pipeline_stage` 为 `5` ,offset设为 `[[0,1,1,1,1],[1,1,1,1,0]]` ,重计算配置如下:
-
-```yaml
-# recompute config
-recompute_config:
- recompute: [[2,1,0,0,0],[1,0,0,0,0]]
- select_recompute:
- 'feed_forward\.w1\.activation\.silu': True
- 'feed_forward\.mul': True
- 'feed_forward\.w1\.matmul': [[1,0,0,0,0],[2,1,0,0,0]]
- 'feed_forward\.w3\.matmul': [2,1,0,0,0]
- select_comm_recompute: ['ffn_norm\.norm','attention_norm\.norm']
-```
-
-在日志中会打印将输入格式规范化后的重计算策略信息:
-
-```log
-INFO - Formative layer_recompute: [[2, 1, 0, 0, 0], [1, 0, 0, 0, 0]]
-INFO - Formative select_recompute: {'feed_forward\.w1\.activation\.silu': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'feed_forward\.mul': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'feed_forward\.w1\.matmul': [[1, 0, 0, 0, 0], [2, 1, 0, 0, 0]], 'feed_forward\.w3\.matmul': [[1, 1, 0, 0, 0], [1, 0, 0, 0, 0]]}
-INFO - Formative select_comm_recompute: {'ffn_norm\.norm': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]], 'attention_norm\.norm': [[4, 5, 5, 5, 5], [5, 5, 5, 5, 4]]}
-```
-
-随后会打印每一层重计算的配置方式。
-
-> 1. 如果某一层同时配置了完全重计算与选择重计算,则按完全重计算生效。
-> 2. 在一维整数型 list 或 tuple 中的整数可以替换为 True 或 False,代表对所有层启用或关闭重计算。
-
-#### 主要配置参数介绍
-
-有关重计算配置的主要参数如下表所列:
-
-| 参数 | 描述 | 取值说明 |
-|-----------------------------------|----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| recompute | (按层)完全重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple。 配置为 bool 类型时,对所有层开启或关闭完全重计算; 配置为整数型 list 或 tuple 时,代表每个 `pipline_stage` 中有多少层开启完全重计算, `pp_interleave_num > 1` 时开启的重计算层数会均匀分配到各 interleave 中; 配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启完全重计算。 |
-| select_recompute | (按算子)选择重计算。 | 可配置为 bool,整数型的 list 或 tuple,或二维 list 或 tuple,字符串的 list 或 tuple,以及 dict。 默认选择重计算算子为 `['feed_forward\\.mul', 'feed_forward\\.w1\\.activation\\.silu']` 。 配置为 bool 类型时,对所有层开启或关闭默认算子的选择重计算; 配置为整数型 list 或 tuple 时,代表每个 `pipline_stage` 中有多少层开启默认算子的选择重计算, `pp_interleave_num > 1` 时开启的选择重计算层数会均匀分配到各 interleave 中; 配置为整数型二维 list 或 tuple 时,代表每个 mini stage 中有多少层开启默认算子的选择重计算。 配置为字符串 list 或 tuple 时,代表对哪些算子开启选择重计算,算子名通过正则表达式匹配,层级关系通过 `'\\.'` 分割; 配置为 dict 时,key 值对应算子名,value 值对应选择重计算的配置方式,这种配法可以对每个算子精细配置重计算策略。 |
-| select_comm_recompute | (按算子)选择通信重计算。 | 配置方式与 **select_recompute** 相同,默认选择通信重计算算子为 `['.*\\.norm']` 。一般仅对 layer_norm 或类似层进行配置。 |
-| parallel_optimizer_comm_recompute | 优化器并行通信重计算。在优化器并行下,是否重计算 AllGather 通信。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由优化器并行引入的 AllGather 通信是否重计算。 默认值: `False` 。 |
-| mp_comm_recompute | 模型并行通信重计算,在模型并行下,是否重计算通信算子。 | (bool, 可选) - 开启后在自动并行或半自动并行模式下,指定 Cell 内部由模型并行引入的通信操作是否重计算。默认值: `True` 。 |
-| recompute_slice_activation | 切片重计算,是否对将保留在内存中的 Cell 输出进行切片。 | (bool, 可选) - 默认值: `False` 。 |
-
-## 梯度累积
-
-### 概述
-
-MindSpore 在 2.1.1 之后的版本中增加了 `mindspore.nn.wrap.cell_wrapper.GradAccumulationCell` 这一梯度累积实现接口,通过拆分 MiniBatch 的形式提供了梯度累加的能力,MindSpore Transformer 将其封装进了统一的训练流程,通过 yaml 配置进行使能。关于梯度累积的原理和框架测的能力可以参考 [MindSpore 文档:梯度累加](https://www.mindspore.cn/tutorials/zh-CN/master/parallel/distributed_gradient_accumulation.html)。
-
-### 配置与使用
-
-#### YAML 参数配置
-
-用户在需要开启梯度累积的场景下,只需在配置文件中的 `runner_config` 项下配置 `gradient_accumulation_steps` 项,设置为所需的梯度累积步数即可:
-
-```yaml
-# runner config
-runner_config:
- ...
- gradient_accumulation_steps: 4
- ...
-```
-
-#### 主要配置参数介绍
-
-| 参数 | 描述 | 取值说明 |
-|-----------------------------|---------------------------------|------------------------|
-| gradient_accumulation_steps | 在执行反向传播前,累积梯度的步数。 | (int, 可选) - 默认值: `1` 。 |
-
-#### 其他方式使用梯度累积
-
-除配置文件外,当采用 `run_mindformer.py` 脚本启动时,可指定 `--gradient_accumulation_steps` 入参来使用梯度累积功能。
-
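-例如,下面的启动命令(示意,其中配置文件路径为假设值)在拉起 8 卡训练任务的同时开启 4 步梯度累积:
-
-```shell
-bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/pretrain_llama2_7b.yaml \
- --use_parallel True \
- --run_mode train \
- --gradient_accumulation_steps 4" 8
-```
-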
-#### 梯度累积使用限制
-
-> 开启梯度累积会增大内存开销,请注意内存管理,防止发生内存溢出(OOM)。
-
-1. 由于 `GradAccumulationCell` 的实现依赖并行特性,梯度累积当前仅支持在**半自动并行模式**下使用;
-2. 此外,在 pipeline 并行场景下,梯度累积含义与 micro_batch 相同,将不会生效,请配置 `micro_batch_num` 项以增大训练 batch_size,配置示例见下。
-
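-在 pipeline 并行场景下,可参考如下配置片段(仅为示意,字段层级以实际模型配置文件为准),通过增大 `micro_batch_num` 来增大训练 batch_size:
-
-```yaml
-# parallel config
-parallel_config:
-  ...
-  pipeline_stage: 2
-  micro_batch_num: 4
-  ...
-```
-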
-## 梯度裁剪
-
-### 概述
-
-梯度裁剪算法可以避免反向梯度过大而跳过最优解的情况。
-
-### 配置与使用
-
-#### YAML 参数配置
-
-在 MindSpore Transformers 中,默认的训练流程 `MFTrainOneStepCell` 中集成了梯度裁剪逻辑。
-
-可参考如下配置示例开启梯度裁剪:
-
-```yaml
-# wrapper cell config
-runner_wrapper:
- type: MFTrainOneStepCell
- ...
- use_clip_grad: True
- max_grad_norm: 1.0
- ...
-```
-
-#### 主要配置参数介绍
-
-| 参数 | 描述 | 取值说明 |
-|---------------|-------------------|----------------------------|
-| use_clip_grad | 控制在训练过程中是否开启梯度裁剪。 | (bool, 可选) - 默认值: `False` 。 |
-| max_grad_norm | 控制梯度裁剪的最大 norm 值。 | (float, 可选) - 默认值: `1.0` 。 |
diff --git a/docs/mindformers/docs/source_zh_cn/usage/mindie_deployment.md b/docs/mindformers/docs/source_zh_cn/guide/deployment.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/usage/mindie_deployment.md
rename to docs/mindformers/docs/source_zh_cn/guide/deployment.md
index b6fe3c78832ee03636bba7570f475bc80ecbdc9c..4e381adaa7c42a42aff3d6ea77936478f5352880 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/mindie_deployment.md
+++ b/docs/mindformers/docs/source_zh_cn/guide/deployment.md
@@ -1,6 +1,6 @@
# 服务化部署
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/mindie_deployment.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/guide/deployment.md)
## MindIE介绍
@@ -8,7 +8,7 @@ MindIE,全称Mind Inference Engine,是基于昇腾硬件的高性能推理
MindSpore Transformers承载在模型应用层MindIE LLM中,通过MindIE Service可以部署MindSpore Transformers中的大模型。
-MindIE推理的模型支持度可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)。
+MindIE推理的模型支持度可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)。
## 环境搭建
@@ -16,7 +16,7 @@ MindIE推理的模型支持度可参考[模型库](https://www.mindspore.cn/mind
1. 安装MindSpore Transformers
- 参考[MindSpore Transformers官方安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/quick_start/install.html)进行安装。
+ 参考[MindSpore Transformers官方安装指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html)进行安装。
2. 安装MindIE
@@ -86,9 +86,9 @@ processor:
merges_file: "/path/to/mf_model/qwen1_5_72b/merges.txt" # merges文件绝对路径
```
-模型权重下载和转换可参考 [权重格式转换指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)。
+模型权重下载和转换可参考 [权重格式转换指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)。
-不同模型的所需文件和配置可能会有差异,详情参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)中具体模型的推理章节。
+不同模型的所需文件和配置可能会有差异,详情参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)中具体模型的推理章节。
### 启动MindIE
@@ -346,4 +346,4 @@ curl -w "\ntime_total=%{time_total}\n" -H "Accept: application/json" -H "Content
## 模型列表
-其他模型的MindIE推理示例可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)中的各模型的介绍文档。
\ No newline at end of file
+其他模型的MindIE推理示例可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)中的各模型的介绍文档。
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/usage/inference.md b/docs/mindformers/docs/source_zh_cn/guide/inference.md
similarity index 97%
rename from docs/mindformers/docs/source_zh_cn/usage/inference.md
rename to docs/mindformers/docs/source_zh_cn/guide/inference.md
index da923bcf70a6a464c186f2fd0bc1ebb1a1881a2f..1336b1d2b033a9f986fc566500d0cf4771c575a5 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/inference.md
+++ b/docs/mindformers/docs/source_zh_cn/guide/inference.md
@@ -1,6 +1,6 @@
# 推理
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/inference.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/guide/inference.md)
## 概述
@@ -22,8 +22,8 @@ MindSpore Transformers 提供了大模型推理能力,用户可以执行 `run_
完整权重可以通过以下两种方式获得:
-1. 从HuggingFace模型库中下载相应模型的开源权重后,参考[权重格式转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)将其转换为ckpt格式。
-2. 预训练或者微调后的分布式权重,通过[合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html)生成一个完整权重。
+1. 从HuggingFace模型库中下载相应模型的开源权重后,参考[权重格式转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)将其转换为ckpt格式。
+2. 预训练或者微调后的分布式权重,通过[合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html)生成一个完整权重。
#### 2.2 分布式权重
@@ -35,7 +35,7 @@ MindSpore Transformers 提供了大模型推理能力,用户可以执行 `run_
2. 8卡训练的权重在2卡上推理;
3. 已经切分好的分布式权重在单卡上推理等。
-下文的命令示例均采用了在线自动切分的方式,通过设置参数 `--auto_trans_ckpt` 为 `True` 和 `--src_strategy_path_or_dir` 为权重的切分策略文件或目录路径(预训练或者微调后,默认保存在`./output/strategy`下)在推理任务中自动完成切分。更多用法可参考[分布式权重的合并和切分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html)。
+下文的命令示例均采用了在线自动切分的方式,通过设置参数 `--auto_trans_ckpt` 为 `True` 和 `--src_strategy_path_or_dir` 为权重的切分策略文件或目录路径(预训练或者微调后,默认保存在`./output/strategy`下)在推理任务中自动完成切分。更多用法可参考[分布式权重的合并和切分](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html)。
> 由于训练和推理任务都使用 `./output` 作为默认输出路径,当使用训练任务所输出的策略文件,作为推理任务的源权重策略文件时,需要将默认输出路径下的策略文件目录移动到其他位置,避免被推理任务的进程清空,如:
>
@@ -173,7 +173,7 @@ bash scripts/msrun_launcher.sh "python run_mindformer.py \
`input_predict_data.txt`文件的内容和格式是每一行都是一个输入,问题的个数与`predict_batch_size`一致,可以参考以下格式:
-```txt
+```text
I love Beijing, because
I love Beijing, because
I love Beijing, because
@@ -358,4 +358,4 @@ Thanks, sir.
## 更多信息
-更多关于不同模型的推理示例,请访问[MindSpore Transformers 已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)。
\ No newline at end of file
+更多关于不同模型的推理示例,请访问[MindSpore Transformers 已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)。
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/usage/pre_training.md b/docs/mindformers/docs/source_zh_cn/guide/pre_training.md
similarity index 96%
rename from docs/mindformers/docs/source_zh_cn/usage/pre_training.md
rename to docs/mindformers/docs/source_zh_cn/guide/pre_training.md
index 6fc6dafb39eb9a65454a4ecd92f6be4407b21205..21cd0945fd565eca57c7152effe576618795a8a8 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/pre_training.md
+++ b/docs/mindformers/docs/source_zh_cn/guide/pre_training.md
@@ -1,6 +1,6 @@
# 预训练
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/pre_training.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/guide/pre_training.md)
## 概述
@@ -82,8 +82,8 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
run_mode: 运行模式,train:训练,finetune:微调,predict:推理
```
-**注意**: 在多机分布式训练的过程中,可能会遇到一些性能问题。为了确保训练过程的高效性和稳定性,建议参考[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/perf_optimize/perf_optimize.html),进行必要的性能优化和调整。
+**注意**: 在多机分布式训练的过程中,可能会遇到一些性能问题。为了确保训练过程的高效性和稳定性,建议参考[大模型性能调优指南](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/advanced_development/performance_optimization.html),进行必要的性能优化和调整。
## 更多信息
-更多关于不同模型的训练示例,请访问[MindSpore Transformers已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)。
\ No newline at end of file
+更多关于不同模型的训练示例,请访问[MindSpore Transformers已支持模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html)。
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md b/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md
similarity index 98%
rename from docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md
rename to docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md
index 6ce5dc27a4ecfbd8db718730a0754e18b401647e..5dcdb51211aa016f0a54c644f9aff8d4a1344272 100644
--- a/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md
+++ b/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md
@@ -1,6 +1,6 @@
# SFT微调
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/usage/sft_tuning.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/guide/supervised_fine_tuning.md)
## 概述
@@ -179,7 +179,7 @@ run_mode: 运行模式,train:训练,finetune:微调,predict
#### 多机训练
-多机多卡微调任务与启动预训练类似,可参考[多机多卡的预训练命令](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/pre_training.html#%E5%A4%9A%E6%9C%BA%E8%AE%AD%E7%BB%83),并对命令进行如下修改:
+多机多卡微调任务与启动预训练类似,可参考[多机多卡的预训练命令](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/pre_training.html#%E5%A4%9A%E6%9C%BA%E8%AE%AD%E7%BB%83),并对命令进行如下修改:
1. 增加启动脚本入参`--load_checkpoint /{path}/llama2_7b.ckpt`加载预训练权重。
2. 设置启动脚本中的`--train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord`加载微调数据集。
@@ -243,7 +243,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \
--run_mode finetune" 8
```
-当权重的分布式策略和模型的分布式策略不一致时,需要对权重进行切分转换。加载权重路径应设置为以 `rank_0` 命名的目录的上一层路径,同时开启权重自动切分转换功能 `--auto_trans_ckpt True` 。关于分布式权重切分转换的场景和使用方式的更多说明请参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html)。
+当权重的分布式策略和模型的分布式策略不一致时,需要对权重进行切分转换。加载权重路径应设置为以 `rank_0` 命名的目录的上一层路径,同时开启权重自动切分转换功能 `--auto_trans_ckpt True` 。关于分布式权重切分转换的场景和使用方式的更多说明请参考[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html)。
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
diff --git a/docs/mindformers/docs/source_zh_cn/index.rst b/docs/mindformers/docs/source_zh_cn/index.rst
index 463d5dff8a34c8dda574949584cfa9aa2818f37b..abc125a1763c11ca6619f2d986d34e4e9b21ebee 100644
--- a/docs/mindformers/docs/source_zh_cn/index.rst
+++ b/docs/mindformers/docs/source_zh_cn/index.rst
@@ -1,13 +1,24 @@
MindSpore Transformers 文档
=========================================
-MindSpore Transformers(也称MindFormers)是一个MindSpore原生的大模型套件,旨在提供大模型训练、微调、评估、推理、部署等全流程开发能力,提供业内主流的Transformer类预训练模型和SOTA下游任务应用,涵盖丰富的并行特性,期望帮助用户轻松地实现大模型训练和创新研发。
+MindSpore Transformers的目标是构建一个大模型预训练、微调、推理、部署的全流程开发套件,提供业内主流的Transformer类大语言模型(Large Language Models, LLMs)和多模态理解模型(Multimodal Models, MMs),期望帮助用户轻松地实现大模型全流程开发。
-用户可以参阅 `整体架构 `_ 和 `模型库 `_ ,快速了解MindSpore Transformers的系统架构,及所支持的功能特性和大模型清单。进一步地,可参考 `安装 `_ 和 `快速启动 `_ 章节,上手探索MindSpore Transformers。
+MindSpore Transformers套件基于MindSpore内置的多维混合并行技术和组件化设计,具备如下特点:
+
+- 一键启动模型单卡或多卡预训练、微调、推理、部署流程;
+- 提供丰富的多维混合并行能力,可灵活易用地进行个性化配置;
+- 大模型训推系统级深度优化,原生支持超大规模集群高效训推,故障快速恢复;
+- 支持任务组件配置化开发。任意模块可通过统一配置进行使能,包括模型网络、优化器、学习率策略等;
+- 提供训练精度/性能监控指标实时可视化能力等。
+
+用户可以参阅 `整体架构 `_ 和 `模型库 `_ ,快速了解MindSpore Transformers的系统架构,以及所支持的大模型清单。
如果您对MindSpore Transformers有任何建议,请通过 `issue `_ 与我们联系,我们将及时处理。
-MindSpore Transformers支持一键启动任意任务的单卡/多卡训练、微调、评估、推理流程,它通过简化操作、提供灵活性和自动化流程,使得深度学习任务的执行变得更加高效和用户友好,用户可以通过以下说明文档进行学习:
+使用MindSpore Transformers进行大模型全流程开发
+-----------------------------------------------------
+
+MindSpore Transformers提供了统一的一键启动脚本,支持一键启动任意任务的单卡/多卡训练、微调、推理流程。它通过简化操作、提供灵活性和自动化流程,使深度学习任务的执行更加高效和用户友好。用户可以通过以下说明文档进行学习:
.. raw:: html
@@ -22,40 +33,22 @@ MindSpore Transformers支持一键启动任意任务的单卡/多卡训练、微
@@ -63,185 +56,191 @@ MindSpore Transformers支持一键启动任意任务的单卡/多卡训练、微
代码仓地址:
-使用MindSpore Transformers进行灵活易用的个性化配置
+MindSpore Transformers功能特性说明
-----------------------------------------------------
-MindSpore Transformers以其强大的功能集,为用户提供了灵活易用的个性化配置选项。其关键特性包括:
-1. `启动任务 `_
+
+
+- 通用功能:
+
+ - `启动任务 `_
单卡、单机和多机任务一键启动。
-2. `权重格式转换 `_
+ - `权重格式转换 `_
- 提供统一的权重转换工具,能够将模型权重在HuggingFace所使用的格式与MindSpore Transformers所使用的格式之间相互转换。
+ 提供统一的权重转换工具,能够将模型权重在HuggingFace所使用的格式与MindSpore Transformers所使用的格式之间相互转换。
-3. `分布式权重切分与合并 `_
+ - `分布式权重切分与合并 `_
- 不同分布式场景下的权重灵活地进行切分与合并。
+ 不同分布式场景下的权重灵活地进行切分与合并。
-4. `分布式并行 `_
+ - `Safetensors权重 `_
- 一键配置多维混合分布式并行,让模型在上至万卡的集群中高效运行。
+ 支持safetensors格式的权重文件保存及加载功能。
-5. `数据集 `_
+ - `配置文件 `_
- 支持多种类型和格式的数据集。
+      支持使用 ``YAML`` 文件集中管理和调整任务中的可配置项。
-6. `模型训练超参数配置 `_
+ - `日志 `_
- 提供大模型训练的超参数配置介绍和示例。
+ 日志相关介绍,包括日志结构、日志保存等。
-7. `其它特性 `_
+- 训练功能:
- 介绍梯度累积、梯度裁剪等特性。
+ - `数据集 `_
-8. `日志 `_
+ 支持多种类型和格式的数据集。
- 日志相关介绍,包括日志结构、日志保存等。
+ - `训练超参数 `_
-9. `模型断点续训 `_
+ 灵活配置大模型训练的超参数配置。
- 支持step级断点续训,有效减少大规模训练时意外中断造成的时间和资源浪费。
+ - `训练指标监控 `_
-10. `训练指标监控 `_
+ 提供大模型训练阶段的可视化服务,用于监控和分析训练过程中的各种指标和信息。
- 提供大模型训练阶段的可视化服务,用于监控和分析训练过程中的各种指标和信息。
+ - `断点续训 `_
-11. `训练高可用 `_
+ 支持step级断点续训,有效减少大规模训练时意外中断造成的时间和资源浪费。
- 提供大模型训练阶段的高可用能力,包括临终 CKPT 保存、UCE 故障容错恢复和进程级重调度恢复功能。
+ - `训练高可用(Beta) `_
-12. `Safetensors权重 `_
+ 提供大模型训练阶段的高可用能力,包括临终 CKPT 保存、UCE 故障容错恢复和进程级重调度恢复功能(Beta特性)。
- 支持safetensors格式的权重文件保存及加载功能。
+ - `分布式训练 `_
-13. `细粒度激活值SWAP `_
+ 一键配置多维混合分布式并行,让模型在上至万卡的集群中高效训练。
- 支持细粒度地选择特定激活值使能SWAP,用于降低模型训练的峰值内存开销。
+ - `训练内存优化 `_
-使用MindSpore Transformers进行深度调优
+ 支持细粒度选择重计算和细粒度激活值SWAP,用于降低模型训练的峰值内存开销。
+
+ - `其它训练特性 `_
+
+ 支持梯度累积、梯度裁剪等特性。
+
+- 推理功能:
+
+ - `评测 `_
+
+ 支持使用第三方开源评测框架和数据集进行大模型榜单评测。
+
+ - `量化 `_
+
+      集成 MindSpore Golden Stick 工具组件,提供开箱即用的统一量化推理流程。
+
+使用MindSpore Transformers进行高阶开发
--------------------------------------
-- `精度调优 `_
-- `性能调优 `_
+- 调试调优
+
+ - `精度调优 `_
+ - `性能调优 `_
+
+- 模型开发
-附录
+ - `开发迁移 `_
+ - `多模态理解模型开发 `_
+
+环境变量
------------------------------------
-- `环境变量说明 `_
-- `配置文件说明 `_
+- `环境变量说明 `_
+
+贡献指南
+------------------------------------
+
+- `MindSpore Transformers贡献指南 `_
+- `魔乐社区贡献指南 `_
FAQ
------------------------------------
- `模型相关 `_
-- `功能相关 `_
-- `MindSpore Transformers贡献指南 `_
-- `魔乐社区贡献指南 `_
+- `功能相关 `_
.. toctree::
:glob:
:maxdepth: 1
- :caption: 开始
+ :caption: 介绍
:hidden:
- start/overview
- start/models
+ introduction/overview
+ introduction/models
.. toctree::
:glob:
:maxdepth: 1
- :caption: 快速入门
+ :caption: 安装
:hidden:
- quick_start/install
- quick_start/source_code_start
+ installation
.. toctree::
:glob:
:maxdepth: 1
- :caption: 使用教程
+ :caption: 大模型全流程指南
:hidden:
- usage/dev_migration
- usage/multi_modal
- usage/pre_training
- usage/sft_tuning
- usage/evaluation
- usage/inference
- usage/quantization
- usage/mindie_deployment
- usage/pretrain_gpt
+ guide/pre_training
+ guide/supervised_fine_tuning
+ guide/inference
+ guide/deployment
.. toctree::
:glob:
:maxdepth: 1
- :caption: 功能说明
+ :caption: 功能特性
:hidden:
- function/start_tasks
- function/weight_conversion
- function/transform_weight
- function/distributed_parallel
- function/dataset
- function/training_hyperparameters
- function/other_features
- function/logs
- function/resume_training
- function/monitor
- function/high_availability
- function/safetensors
- function/fine_grained_activations_swap
+ feature/start_tasks
+ feature/weight_conversion
+ feature/transform_weight
+ feature/safetensors
+ feature/configuration
+ feature/logging
+ feature/training_function
+ feature/infer_function
.. toctree::
:glob:
:maxdepth: 1
- :caption: 精度调优
+ :caption: 高阶开发
:hidden:
- acc_optimize/acc_optimize
+ advanced_development/precision_optimization
+ advanced_development/performance_optimization
+ advanced_development/dev_migration
+ advanced_development/multi_modal_dev
+ advanced_development/api
.. toctree::
:glob:
:maxdepth: 1
- :caption: 性能调优
+ :caption: 优秀实践
:hidden:
- perf_optimize/perf_optimize
+ example/distilled/distilled
.. toctree::
:glob:
:maxdepth: 1
- :caption: 案例
- :hidden:
-
- example/distilled
-
-.. toctree::
- :maxdepth: 1
- :caption: API参考
+ :caption: 环境变量
:hidden:
- mindformers
- mindformers.core
- mindformers.dataset
- mindformers.generation
- mindformers.models
- mindformers.modules
- mindformers.pet
- mindformers.pipeline
- mindformers.tools
- mindformers.wrapper
+   env_variables
.. toctree::
:glob:
:maxdepth: 1
- :caption: 附录
+ :caption: 贡献指南
:hidden:
- appendix/env_variables
- appendix/conf_files
+ contribution/mindformers_contribution
+ contribution/modelers_contribution
.. toctree::
:glob:
@@ -250,6 +249,4 @@ FAQ
:hidden:
faq/model_related
- faq/func_related
- faq/mindformers_contribution
- faq/modelers_contribution
+ faq/feature_related
diff --git a/docs/mindformers/docs/source_zh_cn/quick_start/install.md b/docs/mindformers/docs/source_zh_cn/installation.md
similarity index 89%
rename from docs/mindformers/docs/source_zh_cn/quick_start/install.md
rename to docs/mindformers/docs/source_zh_cn/installation.md
index e8ed38382326e5a7e69a0598a775337fe1d5d3ea..9a87b3b89d3ecfce4d8209cf33be8871be7bb96a 100644
--- a/docs/mindformers/docs/source_zh_cn/quick_start/install.md
+++ b/docs/mindformers/docs/source_zh_cn/installation.md
@@ -1,6 +1,6 @@
# 安装
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/quick_start/install.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/installation.md)
## 确认版本匹配关系
@@ -23,7 +23,7 @@
## 安装依赖软件
-1. 安装固件与驱动:通过[版本匹配关系](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/quick_start/install.html#%E7%A1%AE%E8%AE%A4%E7%89%88%E6%9C%AC%E5%8C%B9%E9%85%8D%E5%85%B3%E7%B3%BB)中的固件与驱动链接下载安装包,参考[昇腾官方教程](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Ubuntu&Software=cannToolKit)进行安装。
+1. 安装固件与驱动:通过[版本匹配关系](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/installation.html#%E7%A1%AE%E8%AE%A4%E7%89%88%E6%9C%AC%E5%8C%B9%E9%85%8D%E5%85%B3%E7%B3%BB)中的固件与驱动链接下载安装包,参考[昇腾官方教程](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Ubuntu&Software=cannToolKit)进行安装。
2. 安装CANN和MindSpore:使用官方提供的Docker镜像(镜像中已包含CANN、MindSpore,无需手动安装)或者按照MindSpore官网的[手动安装](https://www.mindspore.cn/install/)章节进行安装。
diff --git a/docs/mindformers/docs/source_zh_cn/start/image/overall_architecture.png b/docs/mindformers/docs/source_zh_cn/introduction/images/overall_architecture.png
similarity index 100%
rename from docs/mindformers/docs/source_zh_cn/start/image/overall_architecture.png
rename to docs/mindformers/docs/source_zh_cn/introduction/images/overall_architecture.png
diff --git a/docs/mindformers/docs/source_zh_cn/start/models.md b/docs/mindformers/docs/source_zh_cn/introduction/models.md
similarity index 44%
rename from docs/mindformers/docs/source_zh_cn/start/models.md
rename to docs/mindformers/docs/source_zh_cn/introduction/models.md
index 2ee22214b7fc66a914432506d37ebe45d8234af5..d0fb9be38ca532190f799b23faaf8ef2f2214330 100644
--- a/docs/mindformers/docs/source_zh_cn/start/models.md
+++ b/docs/mindformers/docs/source_zh_cn/introduction/models.md
@@ -1,55 +1,61 @@
# 模型库
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/start/models.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/introduction/models.md)
当前MindSpore Transformers全量的模型列表如下:
-| 模型名 | 支持规格 | 模型类型 | 最新支持版本 |
-|:--------------------------------------------------------------------------------------------------------|:------------------------------|:------------:|:------:|
-| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/codellama.md) | 34B | 稠密LLM | 在研版本 |
-| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_image.md) | 19B | MM | 在研版本 |
-| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/cogvlm2_video.md) | 13B | MM | 在研版本 |
-| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本 |
-| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek2) | 236B | 稀疏LLM | 在研版本 |
-| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek1_5) | 7B | 稠密LLM | 在研版本 |
-| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/tree/dev/research/deepseek) | 33B | 稠密LLM | 在研版本 |
-| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本 |
-| [GLM3-32K](https://gitee.com/mindspore/mindformers/tree/dev/research/glm32k) | 6B | 稠密LLM | 在研版本 |
-| [GLM3](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm3.md) | 6B | 稠密LLM | 在研版本 |
-| [InternLM2](https://gitee.com/mindspore/mindformers/tree/dev/research/internlm2) | 7B/20B | 稠密LLM | 在研版本 |
-| [Llama3.1](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本 |
-| [Llama3](https://gitee.com/mindspore/mindformers/tree/dev/research/llama3) | 8B/70B | 稠密LLM | 在研版本 |
-| [Llama2](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md) | 7B/13B/70B | 稠密LLM | 在研版本 |
-| [Mixtral](https://gitee.com/mindspore/mindformers/tree/dev/research/mixtral) | 8x7B | 稀疏LLM | 在研版本 |
-| [Qwen2](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | 稠密/稀疏LLM | 在研版本 |
-| [Qwen1.5](https://gitee.com/mindspore/mindformers/tree/dev/research/qwen1_5) | 7B/14B/72B | 稠密LLM | 在研版本 |
-| [Qwen-VL](https://gitee.com/mindspore/mindformers/tree/dev/research/qwenvl) | 9.6B | MM | 在研版本 |
-| [Whisper](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/whisper.md) | 1.5B | MM | 在研版本 |
-| [Yi](https://gitee.com/mindspore/mindformers/tree/dev/research/yi) | 6B/34B | 稠密LLM | 在研版本 |
-| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | 稠密LLM | 1.3.2 |
-| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | 稠密LLM | 1.3.2 |
-| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | 稠密LLM | 1.3.2 |
-| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | 稠密LLM | 1.3.2 |
-| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | 稠密LLM | 1.3.2 |
-| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | 稠密LLM | 1.1.0 |
-| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | 稠密LLM | 1.1.0 |
-| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | 稠密LLM | 1.0 |
-| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
-| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | 稠密LLM | 1.0 |
-| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
-| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | 稠密LLM | 1.0 |
-| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | 稠密LLM | 1.0 |
-| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | 稠密LLM | 1.0 |
-| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | 稠密LLM | 1.0 |
-| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
-| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | 稠密LLM | 1.0 |
-| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | 稠密LLM | 1.0 |
-| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
-| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | 稠密LLM | 1.0 |
-| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
-| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | 稠密LLM | 1.0 |
-| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
-| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | 稠密LLM | 1.0 |
-| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | 稠密LLM | 0.8 |
+| 模型名 | 支持规格 | 模型类型 | 最新支持版本 |
+|:--------------------------------------------------------------------------------------------------------|:------------------------------|:--------:|:----------:|
+| [DeepSeek-V3](https://gitee.com/mindspore/mindformers/blob/dev/research/deepseek3) | 671B | 稀疏LLM | 在研版本、1.5.0 |
+| [GLM4](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm4.md) | 9B | 稠密LLM | 在研版本、1.5.0 |
+| [Llama3.1](https://gitee.com/mindspore/mindformers/blob/dev/research/llama3_1) | 8B/70B | 稠密LLM | 在研版本、1.5.0 |
+| [Qwen2.5](https://gitee.com/mindspore/mindformers/blob/dev/research/qwen2_5) | 0.5B/1.5B/7B/14B/32B/72B | 稠密LLM | 在研版本、1.5.0 |
+| [TeleChat2](https://gitee.com/mindspore/mindformers/blob/dev/research/telechat2) | 7B/35B/115B | 稠密LLM | 在研版本、1.5.0 |
+| [CodeLlama](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/codellama.md) | 34B | 稠密LLM | 1.5.0 |
+| [CogVLM2-Image](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_image.md) | 19B | MM | 1.5.0 |
+| [CogVLM2-Video](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/cogvlm2_video.md) | 13B | MM | 1.5.0 |
+| [DeepSeek-V2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek2) | 236B | 稀疏LLM | 1.5.0 |
+| [DeepSeek-Coder-V1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek1_5) | 7B | 稠密LLM | 1.5.0 |
+| [DeepSeek-Coder](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/deepseek) | 33B | 稠密LLM | 1.5.0 |
+| [GLM3-32K](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/glm32k) | 6B | 稠密LLM | 1.5.0 |
+| [GLM3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/glm3.md) | 6B | 稠密LLM | 1.5.0 |
+| [InternLM2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/internlm2) | 7B/20B | 稠密LLM | 1.5.0 |
+| [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | 3B | 稠密LLM | 1.5.0 |
+| [Llama3.2-Vision](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/mllama.md) | 11B | MM | 1.5.0 |
+| [Llama3](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/llama3) | 8B/70B | 稠密LLM | 1.5.0 |
+| [Llama2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama2.md) | 7B/13B/70B | 稠密LLM | 1.5.0 |
+| [Mixtral](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/mixtral) | 8x7B | 稀疏LLM | 1.5.0 |
+| [Qwen2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2) | 0.5B/1.5B/7B/57B/57B-A14B/72B | 稠密/稀疏LLM | 1.5.0 |
+| [Qwen1.5](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen1_5) | 7B/14B/72B | 稠密LLM | 1.5.0 |
+| [Qwen-VL](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwenvl) | 9.6B | MM | 1.5.0 |
+| [TeleChat](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/telechat) | 7B/12B/52B | 稠密LLM | 1.5.0 |
+| [Whisper](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/whisper.md) | 1.5B | MM | 1.5.0 |
+| [Yi](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yi) | 6B/34B | 稠密LLM | 1.5.0 |
+| [YiZhao](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/yizhao) | 12B | 稠密LLM | 1.5.0 |
+| [Baichuan2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/baichuan2/baichuan2.md) | 7B/13B | 稠密LLM | 1.3.2 |
+| [GLM2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/glm2.md) | 6B | 稠密LLM | 1.3.2 |
+| [GPT2](https://gitee.com/mindspore/mindformers/blob/r1.3.0/docs/model_cards/gpt2.md) | 124M/13B | 稠密LLM | 1.3.2 |
+| [InternLM](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/internlm/internlm.md) | 7B/20B | 稠密LLM | 1.3.2 |
+| [Qwen](https://gitee.com/mindspore/mindformers/blob/r1.3.0/research/qwen/qwen.md) | 7B/14B | 稠密LLM | 1.3.2 |
+| [CodeGeex2](https://gitee.com/mindspore/mindformers/blob/r1.1.0/docs/model_cards/codegeex2.md) | 6B | 稠密LLM | 1.1.0 |
+| [WizardCoder](https://gitee.com/mindspore/mindformers/blob/r1.1.0/research/wizardcoder/wizardcoder.md) | 15B | 稠密LLM | 1.1.0 |
+| [Baichuan](https://gitee.com/mindspore/mindformers/blob/r1.0/research/baichuan/baichuan.md) | 7B/13B | 稠密LLM | 1.0 |
+| [Blip2](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/blip2.md) | 8.1B | MM | 1.0 |
+| [Bloom](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/bloom.md) | 560M/7.1B/65B/176B | 稠密LLM | 1.0 |
+| [Clip](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/clip.md) | 149M/428M | MM | 1.0 |
+| [CodeGeex](https://gitee.com/mindspore/mindformers/blob/r1.0/research/codegeex/codegeex.md) | 13B | 稠密LLM | 1.0 |
+| [GLM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/glm.md) | 6B | 稠密LLM | 1.0 |
+| [iFlytekSpark](https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark.md) | 13B | 稠密LLM | 1.0 |
+| [Llama](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/llama.md) | 7B/13B | 稠密LLM | 1.0 |
+| [MAE](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/mae.md) | 86M | MM | 1.0 |
+| [Mengzi3](https://gitee.com/mindspore/mindformers/blob/r1.0/research/mengzi3/mengzi3.md) | 13B | 稠密LLM | 1.0 |
+| [PanguAlpha](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/pangualpha.md) | 2.6B/13B | 稠密LLM | 1.0 |
+| [SAM](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/sam.md) | 91M/308M/636M | MM | 1.0 |
+| [Skywork](https://gitee.com/mindspore/mindformers/blob/r1.0/research/skywork/skywork.md) | 13B | 稠密LLM | 1.0 |
+| [Swin](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/swin.md) | 88M | MM | 1.0 |
+| [T5](https://gitee.com/mindspore/mindformers/blob/r1.0/docs/model_cards/t5.md) | 14M/60M | 稠密LLM | 1.0 |
+| [VisualGLM](https://gitee.com/mindspore/mindformers/blob/r1.0/research/visualglm/visualglm.md) | 6B | MM | 1.0 |
+| [Ziya](https://gitee.com/mindspore/mindformers/blob/r1.0/research/ziya/ziya.md) | 13B | 稠密LLM | 1.0 |
+| [Bert](https://gitee.com/mindspore/mindformers/blob/r0.8/docs/model_cards/bert.md) | 4M/110M | 稠密LLM | 0.8 |
* ***LLM:*** *大语言模型(Large Language Model);* ***MM:*** *多模态(Multi-Modal)*
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/start/overview.md b/docs/mindformers/docs/source_zh_cn/introduction/overview.md
similarity index 38%
rename from docs/mindformers/docs/source_zh_cn/start/overview.md
rename to docs/mindformers/docs/source_zh_cn/introduction/overview.md
index bff2bc6594b2554c4261d05ae1ce047d56fb8ae9..b1e03fb2d0d674e1808caeb6196452064a0a3e45 100644
--- a/docs/mindformers/docs/source_zh_cn/start/overview.md
+++ b/docs/mindformers/docs/source_zh_cn/introduction/overview.md
@@ -1,15 +1,15 @@
# 整体架构
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/start/overview.md)
+[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/introduction/overview.md)
MindSpore Transformers与昇思MindSpore、昇腾Ascend的端到端AI软硬件生态,形成的整体架构如下:
1. 在硬件层面,MindSpore Transformers支持用户在Ascend服务器上运行大模型;
2. 在软件层面,MindSpore Transformers通过MindSpore提供的Python接口实现大模型相关代码,并由昇腾AI处理器配套软件包提供的算子库进行数据运算;
3. MindSpore Transformers目前支持的基础功能特性如下:
- 1. 支持大模型[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/distributed_parallel.html)运行训练和推理等任务,并行能力包括数据并行、模型并行、超长序列并行等;
- 2. 支持[模型权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)、[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/transform_weight.html)、不同格式[数据集加载](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/dataset.html)以及[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/resume_training.html)等功能;
- 3. 支持25+大模型[预训练](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/pre_training.html)、[微调](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/sft_tuning.html)、[推理](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/inference.html)和[评测](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/evaluation.html)等功能,同时支持对模型参数进行[量化](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/quantization.html),具体支持模型列表可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html);
-4. MindSpore Transformers支持用户通过[MindIE](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/usage/mindie_deployment.html)进行模型服务化部署功能,同时支持使用[MindX](https://www.hiascend.com/software/mindx-dl)实现大规模集群调度;后续将支持更多第三方平台,敬请期待。
+ 1. 支持大模型[分布式并行](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/parallel_training.html)运行训练和推理等任务,并行能力包括数据并行、模型并行、超长序列并行等;
+ 2. 支持[模型权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/weight_conversion.html)、[分布式权重切分与合并](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/transform_weight.html)、不同格式[数据集加载](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/dataset.html)以及[断点续训](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/resume_training.html)等功能;
+ 3. 支持25+大模型[预训练](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/pre_training.html)、[微调](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/supervised_fine_tuning.html)、[推理](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/inference.html)和[评测](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/evaluation.html)等功能,同时支持对模型参数进行[量化](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/feature/quantization.html),具体支持模型列表可参考[模型库](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/introduction/models.html);
+4. MindSpore Transformers支持用户通过[MindIE](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/guide/deployment.html)进行模型服务化部署功能,同时支持使用[MindX](https://www.hiascend.com/software/mindx-dl)实现大规模集群调度;后续将支持更多第三方平台,敬请期待。
-
+
diff --git a/docs/mindformers/docs/source_zh_cn/quick_start/source_code_start.md b/docs/mindformers/docs/source_zh_cn/quick_start/source_code_start.md
deleted file mode 100644
index 21a5294850229b2d7c8c0dddb8b68603fe3097ff..0000000000000000000000000000000000000000
--- a/docs/mindformers/docs/source_zh_cn/quick_start/source_code_start.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# 快速启动
-
-[](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/quick_start/source_code_start.md)
-
-本节展示如何使用MindSpore Transformers快速拉起一个基于 Llama2-7B 模型的LoRA低参微调任务。如果想要通过MindSpore Transformers使用其他模型和任务,请阅读对应的[模型文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)。
-
-## 准备权重文件
-
-MindSpore Transformers提供已经转换完成的预训练权重、词表文件用于预训练、微调和推理,用户也可以下载HuggingFace官方权重经过模型权重转换后进行使用。为了方便起见,这里不对原始权重的转换过程做过多展开,有需要请参考[Llama2文档](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#模型权重转换)以及[权重转换](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/function/weight_conversion.html)了解更多细节。这里请直接下载转换后的`MindSpore`权重,即`.ckpt`文件,以及`tokenizer.model`文件进行后续的处理。
-
-| 模型名称 | MindSpore权重 | HuggingFace权重 |
-| ------ | ------ | ------ |
-| Llama2-7B | [llama2_7b.ckpt](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/llama2_7b.ckpt) | [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) |
-
-词表下载链接:[tokenizer.model](https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/MindFormers/llama2/tokenizer.model)
-
-## 准备数据集
-
-1. 微调过程中使用的数据集文件alpaca_data.json在[Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)下载获得。
-
-2. 数据预处理。
-
- 需要在MindSpore Transformers代码根目录下执行以下操作,并将下文中的{path}替换成存放数据集文件的本地路径。
-
- 1. 执行[mindformers/tools/dataset_preprocess/llama/alpaca_converter.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/alpaca_converter.py),添加prompt模板,将原始数据集转换为多轮对话格式。
-
- ```shell
- python mindformers/tools/dataset_preprocess/llama/alpaca_converter.py \
- --data_path /{path}/alpaca_data.json \
- --output_path /{path}/alpaca-data-conversation.json
- ```
-
- **参数说明**
-
- - data_path: 输入下载的文件路径。
- - output_path: 输出文件的保存路径。
-
- 2. 执行[mindformers/tools/dataset_preprocess/llama/llama_preprocess.py](https://gitee.com/mindspore/mindformers/blob/dev/mindformers/tools/dataset_preprocess/llama/llama_preprocess.py),生成MindRecord数据,将带有prompt模板的数据转换为MindRecord格式。
-
- ```shell
- python mindformers/tools/dataset_preprocess/llama/llama_preprocess.py \
- --dataset_type qa \
- --input_glob /{path}/alpaca-data-conversation.json \
- --model_file /{path}/tokenizer.model \
- --seq_length 4096 \
- --output_file /{path}/alpaca-fastchat4096.mindrecord
- ```
-
- **参数说明**
-
- - dataset_type: 预处理数据类型。选项包括 "wiki" 和 "qa" 两种。
- - "wiki" 用于处理 Wikitext2 数据集,该数据集适用于预训练和评测阶段。
- - "qa" 用于处理 alpaca 数据集,将该数据集转换为问答格式,该数据集适用于微调阶段。
- 其他的数据集转换脚本请参考对应的[模型文档](https://www.mindspore.cn/mindformers/docs/zh-CN/dev/start/models.html)。
- - input_glob: 转换后的alpaca的文件路径。
- - model_file: 模型tokenizer.model文件路径。
- - seq_length: 输出数据的序列长度。
- - output_file: 输出文件的保存路径。
-
- 3. 控制台输出如下内容,证明格式转换成功。
-
- ```shell
- # 控制台输出
- Transformed 52002 records.
- Transform finished, output files refer: {path}/alpaca-fastchat4096.mindrecord
- ```
-
-## 启动微调
-
-在MindSpore Transformers代码根目录下,执行如下命令拉起微调任务:
-
-```shell
-bash scripts/msrun_launcher.sh "run_mindformer.py \
- --config configs/llama2/lora_llama2_7b.yaml \
- --train_dataset_dir /{path}/alpaca-fastchat4096.mindrecord \
- --load_checkpoint /{path}/llama2_7b.ckpt \
- --auto_trans_ckpt True \
- --use_parallel True \
- --run_mode finetune" 8
-```
-
-**命令说明:**
-
-- `scripts/msrun_launcher.sh`:分布式任务拉起脚本。
-- `"run_mindformer.py ..."`:每张卡上执行的Python任务的参数字符串,其中参数包括:
- - `run_mindformer.py`:一键启动脚本。
- - `--config`:指定任务配置文件路径 `configs/llama2/lora_llama2_7b.yaml` 。
- - `--train_dataset_dir`:指定数据集路径 `/{path}/alpaca-fastchat4096.mindrecord` 。
- - `--load_checkpoint`:指定权重文件路径 `/{path}/llama2_7b.ckpt` 。
- - `--auto_trans_ckpt True`:打开权重自动切分功能。
- - `--use_parallel True`:设置为分布式任务。
- - `--run_mode finetune`:设定运行模式为微调。
-- `8`:设置任务使用8张NPU。
-
-当控制台出现如下日志时:
-
-```shell
-Start worker process with rank id:0, log file:output/msrun_log/worker_0.log. Environment variable [RANK_ID=0] is exported.
-Start worker process with rank id:1, log file:output/msrun_log/worker_1.log. Environment variable [RANK_ID=1] is exported.
-Start worker process with rank id:2, log file:output/msrun_log/worker_2.log. Environment variable [RANK_ID=2] is exported.
-Start worker process with rank id:3, log file:output/msrun_log/worker_3.log. Environment variable [RANK_ID=3] is exported.
-Start worker process with rank id:4, log file:output/msrun_log/worker_4.log. Environment variable [RANK_ID=4] is exported.
-Start worker process with rank id:5, log file:output/msrun_log/worker_5.log. Environment variable [RANK_ID=5] is exported.
-Start worker process with rank id:6, log file:output/msrun_log/worker_6.log. Environment variable [RANK_ID=6] is exported.
-Start worker process with rank id:7, log file:output/msrun_log/worker_7.log. Environment variable [RANK_ID=7] is exported.
-```
-
-说明微调任务已拉起,微调进度可在`output/msrun_log/`目录下查看。
-
-关于Llama2更多细节,以及更多的启动方式,请具体参考`Llama2` 的 [README](https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/llama2.md#llama-2)文档获取更多支持。
diff --git a/docs/mindpandas/docs/source_en/mindpandas_configuration.md b/docs/mindpandas/docs/source_en/mindpandas_configuration.md
index 41fcf3b27d5bb0f8bc0f5fb69286be1ee48e4452..8c4dddb19857afc2debcee82c3eb5577bf9ee59c 100644
--- a/docs/mindpandas/docs/source_en/mindpandas_configuration.md
+++ b/docs/mindpandas/docs/source_en/mindpandas_configuration.md
@@ -70,7 +70,7 @@ df_mean = df.mean()
When MindSpore Pandas is installed, the built-in distributed compute engine has also been installed synchronously, which can be accessed using the command `yrctl` in the console.
-> Note: In multi-process mode, please make sure that the cluster you start is only for your personal use. Using a cluster together with others may lead to potential security risks.
+> In multi-process mode, please make sure that the cluster you start is only for your personal use. Using a cluster together with others may lead to potential security risks.
```shell
$ yrctl
@@ -117,7 +117,7 @@ Succeeded to start!
After the cluster is deployed, you need to set a multi-process backend to run in the Python script. The method is to call the `set_concurrency_mode` interface, set the `mode` to `"multiprocess"`.
-> Note: We recommend calling `set_concurrency_mode` immediately after `import mindpandas` to set the concurrency mode. Switching the parallel mode while the script is running may cause the program failure.
+> We recommend calling `set_concurrency_mode` immediately after `import mindpandas` to set the concurrency mode. Switching the parallel mode while the script is running may cause the program to fail.
```python
import mindpandas as pd
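
A minimal end-to-end sketch of the flow described above (assumptions: `set_concurrency_mode` takes the mode as its argument, and the cluster was already started with `yrctl`):

```python
import mindpandas as pd

# Switch to the multiprocess backend once, immediately after import,
# before any DataFrame is created.
pd.set_concurrency_mode("multiprocess")

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(df.mean())  # subsequent operations run on the distributed backend
```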
diff --git a/docs/mindpandas/docs/source_zh_cn/mindpandas_configuration.md b/docs/mindpandas/docs/source_zh_cn/mindpandas_configuration.md
index fec21fb55481c6f049a809aa5b7cbd664679c84a..df1880b32f15571e3b86ac945f4ada31600ab38e 100644
--- a/docs/mindpandas/docs/source_zh_cn/mindpandas_configuration.md
+++ b/docs/mindpandas/docs/source_zh_cn/mindpandas_configuration.md
@@ -70,7 +70,7 @@ df_mean = df.mean()
安装MindSpore Pandas时,内置的分布式计算引擎也已经同步安装完成,可以在控制台使用指令`yrctl`访问。
-> 注意:多进程模式下请确保您启动的集群仅由您个人使用,与他人共同使用一个集群可能导致潜在的安全风险。
+> 多进程模式下请确保您启动的集群仅由您个人使用,与他人共同使用一个集群可能导致潜在的安全风险。
```shell
$ yrctl
@@ -117,7 +117,7 @@ Succeeded to start!
集群部署完成后,在Python脚本中需要设置使用多进程后端运行。方法是调用`set_concurrency_mode`接口,设置`mode`为`"multiprocess"`。
-> 注意:我们建议在`import mindpandas`之后马上调用`set_concurrency_mode`进行并行模式的设置。在脚本运行过程中切换并行模式将可能导致程序出错。
+> 我们建议在`import mindpandas`之后马上调用`set_concurrency_mode`进行并行模式的设置。在脚本运行过程中切换并行模式将可能导致程序出错。
```python
import mindpandas as pd
diff --git a/docs/mindspore/source_en/conf.py b/docs/mindspore/source_en/conf.py
index 0404298a10a66c91f2f014ef696704489fea1f48..82cdba21565710c31382b9961c3a19d9a4b49b5f 100644
--- a/docs/mindspore/source_en/conf.py
+++ b/docs/mindspore/source_en/conf.py
@@ -450,6 +450,7 @@ def linkcode_resolve(domain, info):
('mint.select_ext_view', 'mint.select', 'select_ext_view', 'select'),
('mint.transpose_ext_view', 'mint.transpose', 'transpose_ext_view', 'transpose'),
('mint.nn.functional.im2col_ext', 'mint.nn.functional.unfold', 'im2col_ext', 'unfold'),
+ ('mint.nn.functional.inplace_threshold', 'mint.nn.functional.threshold_', 'inplace_threshold', 'threshold_'),
]
fullname = info["module"] + '.' + name
for i in spec_tp:
@@ -466,8 +467,12 @@ def linkcode_resolve(domain, info):
if name1.split('.')[-1] + '_doc.yaml' not in ops_yaml_list:
if name.split('.')[-1].lower() + '_doc.yaml' in ops_yaml_list:
name1 = name.lower()
+ else:
+ return None
# 根据yaml文件名查询文件是否存在,分别再处理
- if name1.split('.')[-1] + '_doc.yaml' not in ops_yaml_list:
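+    # mint接口优先匹配带 _ext 后缀的yaml文档文件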
+ if name1.split('.')[-1] + '_ext_doc.yaml' in ops_yaml_list and '.mint.' in fullname:
+ py_source_rel = ops_yaml + name1.split('.')[-1] + '_ext_doc.yaml'
+ elif name1.split('.')[-1] + '_doc.yaml' not in ops_yaml_list:
# 新增查找_ext后缀文件
if name1.split('.')[-1] + '_ext_doc.yaml' in ops_yaml_list:
py_source_rel = ops_yaml + name1.split('.')[-1] + '_ext_doc.yaml'
diff --git a/docs/mindspore/source_en/faq/implement_problem.md b/docs/mindspore/source_en/faq/implement_problem.md
index da745e7645c3ddbd2d5752f04055854a248eb978..c94208dee437ea2e8ed75ada5f47ee1ef0ffe3e4 100644
--- a/docs/mindspore/source_en/faq/implement_problem.md
+++ b/docs/mindspore/source_en/faq/implement_problem.md
@@ -299,7 +299,7 @@ A: This issue is a memory shortage problem caused by too much memory usage, whic
- Set the value of `batch_size` too large. Solution: Reduce the value of `batch_size`.
- Introduce the abnormally large `parameter`, for example, a single data shape is [640,1024,80,81]. The data type is float32, and the single data size is over 15G. In this way, the two data with the similar size are added together, and the memory occupied is over 3*15G, which easily causes `Out of Memory`. Solution: Check the `shape` of the parameter. If it is abnormally large, the shape can be reduced.
-- If the following operations cannot solve the problem, you can raise the problem on the [official forum](https://www.hiascend.com/forum/forum-0106101385921175002-1.html), and there are dedicated technical personnels for help.
+- If the preceding operations cannot solve the problem, you can raise it on the [official forum](https://discuss.mindspore.cn/), where dedicated technical personnel will help.
diff --git a/docs/mindspore/source_en/features/compile/graph_optimization.md b/docs/mindspore/source_en/features/compile/graph_optimization.md
index 1c46d7c7280f6325a7b9648fbb3047db8f9b1f8b..d399edd364fe1e49dd6243fd7cd70c80ef5f0951 100644
--- a/docs/mindspore/source_en/features/compile/graph_optimization.md
+++ b/docs/mindspore/source_en/features/compile/graph_optimization.md
@@ -54,7 +54,7 @@ out = func(m)
The MindSpore graph compiler converts Python programs into computational graphs, which consist of multiple subgraphs. The algebraic operations in the source code are converted into operator calls within the subgraph, and it can be seen that the PrimFunc_Add operator is called once.
-```txt
+```text
%para1_x:
subgraph @1_func_14() {
@@ -68,7 +68,7 @@ subgraph @1_func_14() {
By arithmetic simplify, the PrimFunc_Add operator can be directly removed to simplify the computational graph structure, reducing `x + 0` to `x`.
-```txt
+```text
%para1_x:
subgraph @1_func_14() {
@@ -128,7 +128,7 @@ out = f1(a, b, c)
First, MindSpore's graph compiler converts the Python program into a computational graph. The function calls in the Python program are converted into calls between calculation graphs, and the original calculation graph is similar to the following. The main graph `f1` calls the subgraph `f2` twice.
-```txt
+```text
# Params:
%para1_a:
%para2_b:
@@ -155,7 +155,7 @@ subgraph @f1() {
With inlining, the subgraph `f2` can be expanded and merged into the main graph `f1`.
-```txt
+```text
subgraph @f1() {
# First-time subgraph inlining
%0 = PrimFunc_Mul(%para1_a, Float32(0.5)) # Repeated computation
@@ -173,7 +173,7 @@ subgraph @f1() {
Before inlining, the compiler might not detect repeated operations in the two calls to subgraph `f2` (as subgraphs are often treated as black boxes). After inlining, the compiler clearly sees `x * 0.5` calculated twice, enabling optimizations like **CSE** (Common Subexpression Elimination) to reduce redundant computations.
-```txt
+```text
subgraph @f1() {
%0 = PrimFunc_Mul(%para1_a, Float32(0.5)) # CSE merges redundant computations
@@ -256,7 +256,7 @@ In MindSpore's graph mode, the purpose and techniques of redundancy elimination
The MindSpore graph compiler will convert the Python code decorated with `@jit` into the MindIR representation through static analysis and eliminate the redundant computation `c = x * y`. The resulting MindIR is as follows:
- ```txt
+ ```text
# Params:
%para1_x:
%para2_y:
@@ -298,7 +298,7 @@ In MindSpore's graph mode, the purpose and techniques of redundancy elimination
The MindSpore graph compiler will convert the Python code decorated with `@jit` into the MindIR representation through static analysis and eliminate the redundant control flow branch `1 < 0`. The resulting MindIR is as follows:
- ```txt
+ ```text
# Params:
%para1_x:
%para2_y:
diff --git a/docs/mindspore/source_zh_cn/conf.py b/docs/mindspore/source_zh_cn/conf.py
index bf3e6e28ce4136441ff5c43ff96cdea5c6c65ca6..ed27fcea87ae13095c8c4fe6a8f300152a0423da 100644
--- a/docs/mindspore/source_zh_cn/conf.py
+++ b/docs/mindspore/source_zh_cn/conf.py
@@ -548,6 +548,7 @@ def linkcode_resolve(domain, info):
('mint.select_ext_view', 'mint.select', 'select_ext_view', 'select'),
('mint.transpose_ext_view', 'mint.transpose', 'transpose_ext_view', 'transpose'),
('mint.nn.functional.im2col_ext', 'mint.nn.functional.unfold', 'im2col_ext', 'unfold'),
+ ('mint.nn.functional.inplace_threshold', 'mint.nn.functional.threshold_', 'inplace_threshold', 'threshold_'),
]
fullname = modname + '.' + name
for i in spec_tp:
@@ -564,8 +565,12 @@ def linkcode_resolve(domain, info):
if name1.split('.')[-1] + '_doc.yaml' not in ops_yaml_list:
if name.split('.')[-1].lower() + '_doc.yaml' in ops_yaml_list:
name1 = name.lower()
+ else:
+ return None
# 根据yaml文件名查询文件是否存在,分别再处理
- if name1.split('.')[-1] + '_doc.yaml' not in ops_yaml_list:
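+    # mint接口优先匹配带 _ext 后缀的yaml文档文件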
+ if name1.split('.')[-1] + '_ext_doc.yaml' in ops_yaml_list and '.mint.' in fullname:
+ py_source_rel = ops_yaml + name1.split('.')[-1] + '_ext_doc.yaml'
+ elif name1.split('.')[-1] + '_doc.yaml' not in ops_yaml_list:
# 新增查找_ext后缀文件
if name1.split('.')[-1] + '_ext_doc.yaml' in ops_yaml_list:
py_source_rel = ops_yaml + name1.split('.')[-1] + '_ext_doc.yaml'
diff --git a/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png b/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png
index 0c4d1a957cc1fafe486a8dd6307fdaf97c7b4f2f..4eac2b7dc1f8cb09383e7bb457ea58cda9da1118 100644
Binary files a/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png and b/docs/mindspore/source_zh_cn/design/images/all_scenarios_intro.png differ
diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png b/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png
index 42dc30c2e043f21a078b4c21d55cebe9deb366e8..9a7b5304d3bbabdf469df9628a30e8ce7fa85444 100644
Binary files a/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png and b/docs/mindspore/source_zh_cn/design/images/auto_gradient_forward2.png differ
diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png b/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png
index af52cb4a31f76532b1b4aa9940353cbb5eb11525..f7858b7d55cbca730624d2c78d2dc876951d3c44 100644
Binary files a/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png and b/docs/mindspore/source_zh_cn/design/images/auto_gradient_foward.png differ
diff --git a/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png b/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png
index bfc6c1a95d712982739c10f5a827517994c94acd..2076fa3322c873fe402b0ff9003eb51486d63056 100644
Binary files a/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png and b/docs/mindspore/source_zh_cn/design/images/auto_gradient_jvp.png differ
diff --git a/docs/mindspore/source_zh_cn/faq/implement_problem.md b/docs/mindspore/source_zh_cn/faq/implement_problem.md
index 9b6a19606dd387ef861d551395ed12f2c1ab0407..930a09b6ea24160e93e5226b35330928f0a6d86a 100644
--- a/docs/mindspore/source_zh_cn/faq/implement_problem.md
+++ b/docs/mindspore/source_zh_cn/faq/implement_problem.md
@@ -299,7 +299,7 @@ A: 此问题属于内存占用过多导致的内存不够问题,可能原因
- `batch_size`的值设置过大。解决办法: 将`batch_size`的值设置减小。
- 引入了异常大的`Parameter`,例如单个数据shape为[640,1024,80,81],数据类型为float32,单个数据大小超过15G,这样差不多大小的两个数据相加时,占用内存超过3*15G,容易造成`Out of Memory`。解决办法: 检查参数的`shape`,如果异常过大,减少shape。
-- 如果以上操作还是未能解决,可以上[官方论坛](https://www.hiascend.com/forum/forum-0106101385921175002-1.html)发帖提出问题,将会有专门的技术人员帮助解决。
+- 如果以上操作还是未能解决,可以上[官方论坛](https://discuss.mindspore.cn/)发帖提出问题,将会有专门的技术人员帮助解决。
diff --git a/docs/mindspore/source_zh_cn/features/compile/graph_optimization.md b/docs/mindspore/source_zh_cn/features/compile/graph_optimization.md
index 4516f7f752761a31fcf2d82f206309ff44a11631..34e24d8071e26923f820416feb01c9e226fe779b 100644
--- a/docs/mindspore/source_zh_cn/features/compile/graph_optimization.md
+++ b/docs/mindspore/source_zh_cn/features/compile/graph_optimization.md
@@ -54,7 +54,7 @@ out = func(m)
MindSpore图编译器会把 Python 程序转换为计算图,计算图由多个子图构成。源程序中的代数运算,转换为子图内部的算子调用,可以看到 PrimFunc_Add 算子调用了一次。
-```txt
+```text
%para1_x:
subgraph @1_func_14() {
@@ -68,7 +68,7 @@ subgraph @1_func_14() {
通过代数化简,可以直接删除 PrimFunc_Add 算子,简化计算图结构,将 `x + 0` 简化成 `x`。
-```txt
+```text
%para1_x:
subgraph @1_func_14() {
@@ -128,7 +128,7 @@ out = f1(a, b, c)
首先,MindSpore 的计算图编译器会把 Python 程序转换为计算图。而 Python 程序中的函数调用,会转换为计算图之间的调用,得到类似于下面的原始计算图。其中,主图 f1 调用了 2 次子图 f2。
-```txt
+```text
# Params:
%para1_a:
%para2_b:
@@ -155,7 +155,7 @@ subgraph @f1() {
通过 inline,可以将子图 f2 展开,合并到主图 f1。
-```txt
+```text
subgraph @f1() {
# 第一次子图inline
%0 = PrimFunc_Mul(%para1_a, Float32(0.5)) # 重复计算步骤
@@ -173,7 +173,7 @@ subgraph @f1() {
在 inline 将子图展开之前,编译器可能无法识别到两次调用子图 f2 中的重复操作(此时子图通常被当作黑盒处理)。而通过 inline 将子图展开后,此时编译器可以清晰看到`x * 0.5`被计算了两次,就可以触发编译器进一步的优化:**公共子表达式消除** (CSE, Common Subexpression Elimination),这样就降低了计算量。
-```txt
+```text
subgraph @f1() {
%0 = PrimFunc_Mul(%para1_a, Float32(0.5)) # CSE合并重复计算
@@ -256,7 +256,7 @@ MindSpore 图模式下冗余消除的目的及使用的技术也类似。与传
MindSpore 图编译器会通过静态分析将 `@jit` 修饰的 Python 代码转换为 MindIR 的表示形式并消除其中冗余的 `c = x * y` 的计算,最终生成的 MindIR 如下:
- ```txt
+ ```text
# Params:
%para1_x:
%para2_y:
@@ -298,7 +298,7 @@ MindSpore 图模式下冗余消除的目的及使用的技术也类似。与传
MindSpore 图编译器会通过静态分析将 `@jit` 修饰的 Python 代码转换为 MindIR 的表示形式并消除其中冗余的控制流分支 `1 < 0` 的代码,最终生成的 MindIR 如下:
- ```txt
+ ```text
# Params:
%para1_x:
%para2_y:
diff --git a/docs/vllm_mindspore/docs/requirements.txt b/docs/vllm_mindspore/docs/requirements.txt
index fabeca22d66d2f21d9ccda77b5d7225ddfc3113b..9e952e9a95cffffd0351396a3066726a3f81857f 100644
--- a/docs/vllm_mindspore/docs/requirements.txt
+++ b/docs/vllm_mindspore/docs/requirements.txt
@@ -8,4 +8,4 @@ jieba
descriptastorus == 2.6.0
sympy
tqdm
-sphinx-rtd-theme == 1.0.0
+sphinx-rtd-theme == 1.0.0
\ No newline at end of file
diff --git a/docs/vllm_mindspore/docs/source_en/arch.png b/docs/vllm_mindspore/docs/source_en/arch.png
new file mode 100644
index 0000000000000000000000000000000000000000..fc3b524ca3487ae92431c58157175b4ddcb42725
Binary files /dev/null and b/docs/vllm_mindspore/docs/source_en/arch.png differ
diff --git a/docs/vllm_mindspore/docs/source_en/conf.py b/docs/vllm_mindspore/docs/source_en/conf.py
new file mode 100644
index 0000000000000000000000000000000000000000..f6864f092fea0e8274bc20873bb76de694da5908
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/conf.py
@@ -0,0 +1,266 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import glob
+import os
+import shutil
+import sys
+import IPython
+import re
+import sphinx
+sys.path.append(os.path.abspath('../_ext'))
+import sphinx.ext.autosummary.generate as g
+from sphinx.ext import autodoc as sphinx_autodoc
+
+# -- Project information -----------------------------------------------------
+
+project = 'vLLM MindSpore'
+copyright = 'MindSpore'
+author = 'vLLM MindSpore'
+
+# The full version, including alpha/beta/rc tags
+release = 'master'
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+myst_enable_extensions = ["dollarmath", "amsmath"]
+
+
+myst_heading_anchors = 5
+extensions = [
+ 'sphinx.ext.autodoc',
+ 'sphinx.ext.autosummary',
+ 'sphinx.ext.doctest',
+ 'sphinx.ext.intersphinx',
+ 'sphinx.ext.todo',
+ 'sphinx.ext.coverage',
+ 'sphinx.ext.napoleon',
+ 'sphinx.ext.viewcode',
+ 'myst_parser',
+ 'nbsphinx',
+ 'sphinx.ext.mathjax',
+ 'IPython.sphinxext.ipython_console_highlighting'
+]
+
+source_suffix = {
+ '.rst': 'restructuredtext',
+ '.md': 'markdown',
+}
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+mathjax_path = 'https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/mathjax/MathJax-3.2.2/es5/tex-mml-chtml.js'
+
+mathjax_options = {
+ 'async':'async'
+}
+
+smartquotes_action = 'De'
+
+exclude_patterns = []
+
+pygments_style = 'sphinx'
+
+autodoc_inherit_docstrings = False
+
+autosummary_generate = True
+
+autosummary_generate_overwrite = False
+
+# -- Options for HTML output -------------------------------------------------
+
+# Reconstruction of sphinx auto generated document translation.
+language = 'zh_CN'
+locale_dirs = ['../../../../resource/locale/']
+gettext_compact = False
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+
+html_search_language = 'zh'
+
+html_search_options = {'dict': '../../../resource/jieba.txt'}
+
+sys.path.append(os.path.abspath('../../../../resource/sphinx_ext'))
+# import anchor_mod
+import nbsphinx_mod
+
+# Example configuration for intersphinx: refer to the Python standard library.
+intersphinx_mapping = {
+ 'python': ('https://docs.python.org/', '../../../../resource/python_objects.inv'),
+ 'numpy': ('https://docs.scipy.org/doc/numpy/', '../../../../resource/numpy_objects.inv'),
+}
+
+from sphinx import directives
+with open('../_ext/overwriteobjectiondirective.txt', 'r', encoding="utf8") as f:
+ exec(f.read(), directives.__dict__)
+
+from sphinx.ext import viewcode
+with open('../_ext/overwriteviewcode.txt', 'r', encoding="utf8") as f:
+ exec(f.read(), viewcode.__dict__)
+
+with open('../_ext/customdocumenter.txt', 'r', encoding="utf8") as f:
+ exec(f.read(), sphinx_autodoc.__dict__)
+
+# Modify regex for sphinx.ext.autosummary.generate.find_autosummary_in_lines.
+gfile_abs_path = os.path.abspath(g.__file__)
+autosummary_re_line_old = r"autosummary_re = re.compile(r'^(\s*)\.\.\s+autosummary::\s*')"
+autosummary_re_line_new = r"autosummary_re = re.compile(r'^(\s*)\.\.\s+(ms[a-z]*)?autosummary::\s*')"
+with open(gfile_abs_path, "r+", encoding="utf8") as f:
+ data = f.read()
+ data = data.replace(autosummary_re_line_old, autosummary_re_line_new)
+ exec(data, g.__dict__)
+
+# Modify default signatures for autodoc.
+autodoc_source_path = os.path.abspath(sphinx_autodoc.__file__)
+autodoc_source_re = re.compile(r'stringify_signature\(.*?\)')
+get_param_func_str = r"""\
+import re
+import inspect as inspect_
+
+def get_param_func(func):
+ try:
+ source_code = inspect_.getsource(func)
+ if func.__doc__:
+ source_code = source_code.replace(func.__doc__, '')
+ all_params_str = re.findall(r"def [\w_\d\-]+\(([\S\s]*?)(\):|\) ->.*?:)", source_code)
+ all_params = re.sub("(self|cls)(,|, )?", '', all_params_str[0][0].replace("\n", "").replace("'", "\""))
+ return all_params
+ except:
+ return ''
+
+def get_obj(obj):
+ if isinstance(obj, type):
+ return obj.__init__
+
+ return obj
+"""
+
+with open(autodoc_source_path, "r+", encoding="utf8") as f:
+ code_str = f.read()
+ code_str = autodoc_source_re.sub('"(" + get_param_func(get_obj(self.object)) + ")"', code_str, count=0)
+ exec(get_param_func_str, sphinx_autodoc.__dict__)
+ exec(code_str, sphinx_autodoc.__dict__)
+
+# Copy source files of chinese python api from mindscience repository.
+from sphinx.util import logging
+logger = logging.getLogger(__name__)
+
+# copy_path = 'docs/api_python/mindchemistry'
+# src_dir = os.path.join(os.getenv("VLLM_PATH"), copy_path)
+
+copy_list = []
+
+present_path = os.path.dirname(__file__)
+
+# for i in os.listdir(src_dir):
+# if os.path.isfile(os.path.join(src_dir,i)):
+# if os.path.exists('./'+i):
+# os.remove('./'+i)
+# shutil.copy(os.path.join(src_dir,i),'./'+i)
+# copy_list.append(os.path.join(present_path,i))
+# else:
+# if os.path.exists('./'+i):
+# shutil.rmtree('./'+i)
+# shutil.copytree(os.path.join(src_dir,i),'./'+i)
+# copy_list.append(os.path.join(present_path,i))
+
+# add view
+import json
+
+with open('../../../../tools/generate_html/daily.json', 'r+', encoding='utf-8') as f:
+ version_inf = json.load(f)
+
+# if os.getenv("VLLM_PATH").split('/')[-1]:
+# copy_repo = os.getenv("VLLM_PATH").split('/')[-1]
+# else:
+# copy_repo = os.getenv("VLLM_PATH").split('/')[-2]
+
+# branch = [version_inf[i]['branch'] for i in range(len(version_inf)) if version_inf[i]['name'] == copy_repo][0]
+# docs_branch = [version_inf[i]['branch'] for i in range(len(version_inf)) if version_inf[i]['name'] == 'tutorials'][0]
+
+# re_view = f"\n.. image:: https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/{docs_branch}/" + \
+# f"resource/_static/logo_source_en.svg\n :target: https://gitee.com/mindspore/{copy_repo}/blob/{branch}/"
+
+# for cur, _, files in os.walk(present_path):
+# for i in files:
+# flag_copy = 0
+# if i.endswith('.rst'):
+# for j in copy_list:
+# if j in cur:
+# flag_copy = 1
+# break
+# if os.path.join(cur, i) in copy_list or flag_copy:
+# try:
+# with open(os.path.join(cur, i), 'r+', encoding='utf-8') as f:
+# content = f.read()
+# new_content = content
+# if '.. include::' in content and '.. automodule::' in content:
+# continue
+# if 'autosummary::' not in content and "\n=====" in content:
+# re_view_ = re_view + copy_path + cur.split(present_path)[-1] + '/' + i + \
+# '\n :alt: 查看源文件\n\n'
+# new_content = re.sub('([=]{5,})\n', r'\1\n' + re_view_, content, 1)
+# print("re_view_")
+# print(re_view_)
+# if new_content != content:
+# f.seek(0)
+# f.truncate()
+# f.write(new_content)
+# except Exception:
+# print(f'打开{i}文件失败')
+
+
+# import vllm_mindspore
+
+sys.path.append(os.path.abspath('../../../../resource/search'))
+import search_code
+
+sys.path.append(os.path.abspath('../../../../resource/custom_directives'))
+from custom_directives import IncludeCodeDirective
+from myautosummary import MsPlatformAutoSummary, MsCnPlatformAutoSummary
+
+rst_files = set([i.replace('.rst', '') for i in glob.glob('./**/*.rst', recursive=True)])
+
+def setup(app):
+ app.add_directive('msplatformautosummary', MsPlatformAutoSummary)
+ app.add_directive('mscnplatformautosummary', MsCnPlatformAutoSummary)
+ app.add_directive('includecode', IncludeCodeDirective)
+ app.add_config_value('rst_files', set(), False)
+
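+# Extract the most recent "## <version>" section of the release notes into RELEASE.md.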
+src_release = "./release_notes/release_notes.md"
+des_release = "./RELEASE.md"
+with open(src_release, "r", encoding="utf-8") as f:
+ data = f.read()
+if len(re.findall("\n## (.*?)\n",data)) > 1:
+ content = re.findall("(## [\s\S\n]*?)\n## ", data)
+else:
+ content = re.findall("(## [\s\S\n]*)", data)
+
+with open(des_release, "w", encoding="utf-8") as p:
+ p.write("# Release Notes"+"\n\n")
+ p.write(content[0])
+
+os.makedirs(os.path.join(present_path, "../build_en/html/"), exist_ok=True)
+shutil.copy(os.path.join(present_path, "arch.png"), os.path.join(present_path, "../build_en/html/"))
diff --git a/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md b/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md
new file mode 100644
index 0000000000000000000000000000000000000000..2800ac98226112506e0a1c1042167737084c8cdc
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md
@@ -0,0 +1,95 @@
+# Contribution Guidelines
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/developer_guide/contributing.md)
+
+## Contributor License Agreement
+
+Before submitting code to the MindSpore community, you need to sign the Contributor License Agreement (CLA). Individual contributors should refer to the [ICLA Online Document](https://www.mindspore.cn/icla).
+
+## Quick Start
+
+- Fork the repository on [Gitee](https://gitee.com/mindspore/vllm-mindspore).
+- Refer to [README.md](https://gitee.com/mindspore/vllm-mindspore/blob/master/README.md) and the installation page for project information and build instructions.
+
+## Supporting New Models
+
+To add support for a new model in the vLLM MindSpore code repository, please note the following:
+
+- **Follow file format and location specifications.** Model code files should be placed under the `vllm_mindspore/model_executor` directory, organized in corresponding subfolders by model type.
+- **Implement models using MindSpore interfaces with jit static graph support.** Model definitions in vLLM MindSpore must be implemented using MindSpore interfaces. Since MindSpore's static graph mode offers performance advantages, models should support execution via @jit static graphs. For reference, see the [Qwen2.5](https://gitee.com/mindspore/vllm-mindspore/blob/master/vllm_mindspore/model_executor/models/qwen2.py) implementation.
+- **Register new models in vLLM MindSpore.** After implementing the model structure, register it in vLLM MindSpore by adding it to `_NATIVE_MODELS` in `vllm_mindspore/model_executor/models/registry.py`; a sketch follows this list.
+- **Write unit tests.** New models must include corresponding unit tests. Refer to the [Qwen2.5 testcases](https://gitee.com/mindspore/vllm-mindspore/blob/master/tests/st/python/test_vllm_qwen_7b.py) for examples.
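+
+For illustration, a hypothetical `_NATIVE_MODELS` entry might look like the sketch below; the mapping format shown here is an assumption, so consult the actual definition in `registry.py`:
+
+```python
+# Hypothetical sketch of registering a model in
+# vllm_mindspore/model_executor/models/registry.py.
+# The mapping format (architecture name -> (module name, class name))
+# is assumed; check the real _NATIVE_MODELS definition in the repository.
+_NATIVE_MODELS = {
+    "Qwen2ForCausalLM": ("qwen2", "Qwen2ForCausalLM"),
+    # A new model would be added similarly (hypothetical names):
+    # "MyModelForCausalLM": ("my_model", "MyModelForCausalLM"),
+}
+```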
+
+## Contribution Process
+
+### Code Style
+
+Follow these guidelines for community code review, maintenance, and development.
+
+- **Coding Standards:** Use vLLM community code checking tools: yapf, codespell, ruff, isort, and mypy; example invocations follow this list. For more details, see the [Toolchain Usage Guide](https://gitee.com/mindspore/vllm-mindspore/blob/master/codecheck_toolkits/README.md).
+- **Unit Testing Guidelines:** vLLM MindSpore uses the [pytest](http://www.pytest.org/en/latest/) framework. Test names should clearly reflect their purpose.
+- **Refactoring Guidelines:** Developers are encouraged to refactor code to eliminate [code smells](https://en.wikipedia.org/wiki/Code_smell). All code, including refactored code, must adhere to coding and testing standards.
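+
+As a rough sketch, the checks can be run locally as follows; the exact arguments and configuration are assumptions, so refer to the Toolchain Usage Guide for the project's actual settings:
+
+```shell
+# Run the community code checkers over the source tree (sketch).
+yapf --in-place --recursive vllm_mindspore/
+codespell vllm_mindspore/
+ruff check vllm_mindspore/
+isort vllm_mindspore/
+mypy vllm_mindspore/
+```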
+
+### Fork-Pull Development Model
+
+- **Fork the vLLM MindSpore Repository:** Before submitting code, fork the project to your own repository, and keep your fork consistent with the vLLM MindSpore repository during parallel development; a sync sketch follows this list.
+
+- **Clone the Remote Repository:** Use git to pull the source code:
+
+ ```shell
+ # On Gitee:
+ git clone https://gitee.com/{insert_your_forked_repo}/vllm-mindspore.git
+ git remote add upstream https://gitee.com/mindspore/vllm-mindspore.git
+ ```
+
+- **Local Development:** To avoid branch inconsistencies, switch to a new branch:
+
+ ```shell
+ git checkout -b {new_branch_name} origin/master
+ ```
+
+ For version branches or downstream development, fix upstream bugs before modifying code.
+- **Push Changes to Remote Repository:** After updating the code, push changes:
+
+ ```shell
+ git add .
+ git status # Check update status.
+ git commit -m "Your commit title"
+ git commit -s --amend # Add detailed commit description.
+ git push origin {new_branch_name}
+ ```
+
+- **Create a Pull Request to vLLM MindSpore:** Compare and create a PR between your branch and the vLLM MindSpore master branch. After submission, manually trigger CI checks with `/retest` in the comments. PRs should be merged into upstream master promptly to minimize merge risks.
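+
+To keep a fork consistent with upstream, the `upstream` remote added above can be used, for example:
+
+```shell
+# Sketch: sync the fork's local master with upstream before branching.
+git fetch upstream
+git checkout master
+git rebase upstream/master
+git push origin master
+```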
+
+### Reporting Issues
+
+To contribute by reporting issues, follow these guidelines:
+
+- Specify your environment versions (vLLM MindSpore, MindFormers, MindSpore, OS, Python, etc.).
+- Indicate whether it's a bug report or feature request.
+- Label the issue type for visibility on the issue board.
+- Describe the problem and expected resolution.
+- Provide detailed reproduction steps.
+- Add special notes for reviewers.
+
+**Issue Notes:**
+
+- **Comment first when working on an issue** to let others know that you are starting to fix it.
+- **For long-unresolved issues**, verify the problem before attempting a fix.
+- **If you resolve your own reported issue**, notify others before closing it.
+
+### Submitting PRs
+
+- For major new features, include a design proposal.
+- After consensus via issue discussion and design review, develop in your fork and submit a PR.
+- Each PR requires at least two LGTM labels from reviewers (excluding the PR author).
+- After thorough discussion, the PR will be merged, abandoned, or rejected based on the outcome.
+
+**PR Notes:**
+
+- Avoid unrelated changes.
+- Maintain clean commit history.
+- Keep your branch synchronized with master.
+- For bug-fix PRs, ensure all related issues are referenced.
+
+Thank you for your interest in contributing to vLLM MindSpore. We welcome and value all forms of collaboration.
diff --git a/docs/vllm_mindspore/docs/source_en/faqs/faqs.md b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
new file mode 100644
index 0000000000000000000000000000000000000000..2d27c2c672f8675fbb0af87e8b1da5adb3708eb2
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/faqs/faqs.md
@@ -0,0 +1,86 @@
+# Frequently Asked Questions
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/faqs/faqs.md)
+
+## Model-related Issues
+
+### Git-LFS Installation
+
+1. Obtain the corresponding [git-lfs installation package](https://github.com/git-lfs/git-lfs/releases/tag/v3.0.1).
+2. Download and install:
+
+ ```shell
+ mkdir git-lfs
+ cd git-lfs
+ wget https://github.com/git-lfs/git-lfs/releases/download/v3.0.1/git-lfs-linux-arm64-v3.0.1.tar.gz --no-check-certificate
+ tar zxvf git-lfs-linux-arm64-v3.0.1.tar.gz
+ bash install.sh
+ ```
+
+3. Verify successful installation:
+
+ ```shell
+ git lfs install
+ ```
+
+ If `Git LFS initialized.` is returned, the installation was successful.
+
+## Deployment-related Issues
+
+### Model Fails to Load During Offline/Online Inference
+
+- Key error message:
+
+ ```text
+ raise ValueError(f"{config.load_checkpoint} is not a valid path to load checkpoint ")
+ ```
+
+- Solution:
+ 1. Check if the model path exists and is valid;
+ 2. If the model path exists and the model files are in `safetensors` format, confirm whether the yaml file contains the `load_ckpt_format: "safetensors"` field:
+ 1. Print the path of the yaml file used by the model:
+
+ ```bash
+ echo $MINDFORMERS_MODEL_CONFIG
+ ```
+
+ 2. Check the yaml file. If the `load_ckpt_format` field is missing, add it:
+
+ ```text
+ load_ckpt_format: "safetensors"
+ ```
+
+### `aclnnNonzeroV2` Related Error When Starting Online Service
+
+- Key error message:
+
+ ```text
+ RuntimeError: Call aclnnNonzeroV2 failed, detail:E39999: Inner Error
+ ```
+
+ Check whether the CANN and MindSpore versions are correctly matched.
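+
+As a quick check, one can verify the installed MindSpore version and that the Ascend backend is reachable, for example:
+
+```python
+# Print the MindSpore version and run MindSpore's built-in self-check.
+import mindspore
+print(mindspore.__version__)
+mindspore.run_check()
+```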
+
+### `resolve_transformers_fallback` Import Error When Running Qwen3
+
+- Key error message:
+
+ ```text
+ ImportError: cannot import name 'resolve_transformers_fallback' from 'vllm.model_executor.model_loader.utils'
+ ```
+
+ Try switching `vllm` to version `0.7.3`.
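+
+For example:
+
+```bash
+pip install vllm==0.7.3
+```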
+
+### `torch` Not Found When Importing `vllm_mindspore`
+
+- Key error message:
+
+ ```text
+ importlib.metadata.PackageNotFoundError: No package metadata was found for torch
+ ```
+
+    Execute the following commands to uninstall the torch-related components:
+
+ ```bash
+ pip uninstall torch
+ pip uninstall torchvision
+ ```
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
new file mode 100644
index 0000000000000000000000000000000000000000..f088336aa7a953454ca0bf95d4963a7d81a83cf8
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md
@@ -0,0 +1,194 @@
+# Installation Guide
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/installation/installation.md)
+
+This document describes the steps to install the vLLM MindSpore environment. Three installation methods are provided:
+
+- [Docker Installation](#docker-installation): Suitable for quick deployment scenarios.
+- [Pip Installation](#pip-installation): Suitable for scenarios requiring specific versions.
+- [Source Code Installation](#source-code-installation): Suitable for incremental development of vLLM MindSpore.
+
+## Version Compatibility
+
+- OS: Linux-aarch64
+- Python: 3.9 / 3.10 / 3.11
+- Software version compatibility
+
+ | Software | Version | Corresponding Branch |
+ | -------- | ------- | -------------------- |
+ | [CANN](https://www.hiascend.com/developer/download/community/result?module=cann) | 8.1 | - |
+ | [MindSpore](https://www.mindspore.cn/install/) | 2.7 | master |
+ | [MSAdapter](https://git.openi.org.cn/OpenI/MSAdapter) | 0.2 | master |
+ | [MindSpore Transformers](https://gitee.com/mindspore/mindformers) | 1.6 | br_infer_deepseek_os |
+ | [Golden Stick](https://gitee.com/mindspore/golden-stick) | 1.1.0 | r1.1.0 |
+ | [vLLM](https://github.com/vllm-project/vllm) | 0.8.3 | v0.8.3 |
+ | [vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.2 | master |
+
+## Environment Setup
+
+This section introduces three installation methods, namely [Docker Installation](#docker-installation), [Pip Installation](#pip-installation), and [Source Code Installation](#source-code-installation), as well as a [Quick Verification](#quick-verification) example for checking the installation.
+
+### Docker Installation
+
+We recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps:
+
+#### Pulling the Image
+
+Execute the following command to pull the vLLM MindSpore Docker image:
+
+```bash
+docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+```
+
+During the pull process, users will see the progress of each layer. After the pull completes, check the image by executing the following command:
+
+```bash
+docker images
+```
+
+#### Creating a Container
+
+After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then execute the following command to create the container:
+
+```bash
+export DOCKER_NAME=vllm-mindspore-container # your container name
+export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+
+docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
+ --device=/dev/davinci0 \
+ --device=/dev/davinci1 \
+ --device=/dev/davinci2 \
+ --device=/dev/davinci3 \
+ --device=/dev/davinci4 \
+ --device=/dev/davinci5 \
+ --device=/dev/davinci6 \
+ --device=/dev/davinci7 \
+ --device=/dev/davinci_manager \
+ --device=/dev/devmm_svm \
+ --device=/dev/hisi_hdc \
+ -v /usr/local/sbin/:/usr/local/sbin/ \
+ -v /var/log/npu/slog/:/var/log/npu/slog \
+ -v /var/log/npu/profiling/:/var/log/npu/profiling \
+ -v /var/log/npu/dump/:/var/log/npu/dump \
+ -v /var/log/npu/:/usr/slog \
+ -v /etc/hccn.conf:/etc/hccn.conf \
+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+ -v /usr/local/dcmi:/usr/local/dcmi \
+ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+ -v /etc/ascend_install.info:/etc/ascend_install.info \
+ -v /etc/vnpu.cfg:/etc/vnpu.cfg \
+ --shm-size="250g" \
+ ${IMAGE_NAME} \
+ bash
+```
+
+If the container is created successfully, its ID will be returned. Users can also check the container by executing the following command:
+
+```bash
+docker ps
+```
+
+#### Entering the Container
+
+After [creating the container](#creating-a-container), users can start and enter the container using the environment variable `DOCKER_NAME`:
+
+```bash
+docker exec -it $DOCKER_NAME bash
+```
+
+### Pip Installation
+
+Install vLLM MindSpore with pip by executing the following command:
+
+```bash
+pip install vllm_mindspore
+```
+
+### Source Code Installation
+
+- **CANN Installation**
+ For CANN installation methods and environment configuration, please refer to [CANN Community Edition Installation Guide](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit). If you encounter any issues during CANN installation, please consult the [Ascend FAQ](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html) for troubleshooting.
+
+ The default installation path for CANN is `/usr/local/Ascend`. After completing CANN installation, configure the environment variables with the following commands:
+
+ ```bash
+ LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package
+ source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
+ export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
+ ```
+
+- **vLLM Prerequisites Installation**
+  For vLLM environment configuration and installation methods, please refer to the [vLLM Installation Guide](https://docs.vllm.ai/en/v0.8.3/getting_started/installation/cpu.html). vLLM requires `gcc/g++ >= 12.3.0`, which can be installed with the following command:
+
+ ```bash
+ yum install -y gcc gcc-c++
+ ```
+
+- **vLLM MindSpore Installation**
+  To install vLLM MindSpore, users need to pull the vLLM MindSpore source code and then run the following commands to install the dependencies:
+
+ ```bash
+ git clone https://gitee.com/mindspore/vllm-mindspore.git
+ cd vllm-mindspore
+ bash install_depend_pkgs.sh
+ ```
+
+ Compile and install vLLM MindSpore:
+
+ ```bash
+ pip install .
+ ```
+
+  After executing the above commands, a `mindformers-dev` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. Add this folder to the environment variables:
+
+ ```bash
+  export MF_PATH=$(pwd)/install_depend_pkgs/mindformers-dev
+ export PYTHONPATH=$MF_PATH:$PYTHONPATH
+ ```
+
+  If MindSpore Transformers was compiled and installed from the `br_infer_deepseek_os` branch, a `mindformers-os` folder will be generated in the `vllm-mindspore/install_depend_pkgs` directory. In this case, adjust the `MF_PATH` environment variable to:
+
+ ```bash
+  export MF_PATH=$(pwd)/install_depend_pkgs/mindformers-os
+ export PYTHONPATH=$MF_PATH:$PYTHONPATH
+ ```
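+
+  As a quick sanity check of the installation, one can, for example, verify that the package imports:
+
+  ```bash
+  python -c "import vllm_mindspore; print('vllm_mindspore imported successfully')"
+  ```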
+
+### Quick Verification
+
+To verify the installation, run a simple offline inference test with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct):
+
+```python
+import vllm_mindspore # Add this line at the top of the script.
+from vllm import LLM, SamplingParams
+
+# Sample prompts.
+prompts = [
+ "I am",
+ "Today is",
+ "Llama is"
+]
+
+# Create a sampling params object.
+sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
+
+# Create an LLM
+llm = LLM(model="Qwen2.5-7B-Instruct")
+# Generate texts from the prompts. The output is a list of RequestOutput objects
+# that contain the prompt, generated text, and other information.
+outputs = llm.generate(prompts, sampling_params)
+# Print the outputs.
+for output in outputs:
+ prompt = output.prompt
+ generated_text = output.outputs[0].text
+ print(f"Prompt: {prompt!r}. Generated text: {generated_text!r}")
+```
+
+If successful, the output will resemble:
+
+```text
+Prompt: 'I am'. Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'
+Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the teacher has'
+Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
+```
+
+Alternatively, refer to the [Quick Start](../quick_start/quick_start.md) guide for [online serving](../quick_start/quick_start.md#online-serving) verification.
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
new file mode 100644
index 0000000000000000000000000000000000000000..edf49a6dbb417fcd469be38ab9a5c63815a83db2
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md
@@ -0,0 +1,235 @@
+# Quick Start
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/quick_start/quick_start.md)
+
+This document provides a quick guide to deploying vLLM MindSpore with [docker](https://www.docker.com/), using the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model as an example. Users can quickly experience the serving and inference capabilities of vLLM MindSpore through [offline inference](#offline-inference) and [online serving](#online-serving). For more information about installation, please refer to the [Installation Guide](../installation/installation.md).
+
+## Docker Installation
+
+In this section, we recommend using Docker to deploy the vLLM MindSpore environment. The following sections describe the deployment steps:
+
+### Pulling the Image
+
+Pull the vLLM MindSpore docker image by executing the following command:
+
+```bash
+docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+```
+
+During the pull process, users will see the progress of each layer of the docker image. Users can verify the image by executing the following command:
+
+```bash
+docker images
+```
+
+### Creating a Container
+
+After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, and create the container by running:
+
+```bash
+export DOCKER_NAME=vllm-mindspore-container # your container name
+export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+
+docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
+ --device=/dev/davinci0 \
+ --device=/dev/davinci1 \
+ --device=/dev/davinci2 \
+ --device=/dev/davinci3 \
+ --device=/dev/davinci4 \
+ --device=/dev/davinci5 \
+ --device=/dev/davinci6 \
+ --device=/dev/davinci7 \
+ --device=/dev/davinci_manager \
+ --device=/dev/devmm_svm \
+ --device=/dev/hisi_hdc \
+ -v /usr/local/sbin/:/usr/local/sbin/ \
+ -v /var/log/npu/slog/:/var/log/npu/slog \
+ -v /var/log/npu/profiling/:/var/log/npu/profiling \
+ -v /var/log/npu/dump/:/var/log/npu/dump \
+ -v /var/log/npu/:/usr/slog \
+ -v /etc/hccn.conf:/etc/hccn.conf \
+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+ -v /usr/local/dcmi:/usr/local/dcmi \
+ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+ -v /etc/ascend_install.info:/etc/ascend_install.info \
+ -v /etc/vnpu.cfg:/etc/vnpu.cfg \
+ --shm-size="250g" \
+ ${IMAGE_NAME} \
+ bash
+```
+
+If the container is created successfully, its ID will be returned. Users can verify the creation by executing the following command:
+
+```bash
+docker ps
+```
+
+### Entering the Container
+
+After [creating the container](#creating-a-container), use the environment variable `DOCKER_NAME` to start and enter the container by executing the following command:
+
+```bash
+docker exec -it $DOCKER_NAME bash
+```
+
+## Using the Service
+
+After deploying the environment, users need to prepare the model files before running the model. Refer to the [Downloading Model](#downloading-model) section for guidance. After [setting environment variables](#setting-environment-variables), users can run the model via [offline inference](#offline-inference) or [online serving](#online-serving).
+
+### Downloading Model
+
+Users can download the model using either the [Python Tool](#downloading-with-python-tool) or the [git-lfs Tool](#downloading-with-git-lfs-tool).
+
+#### Downloading with Python Tool
+
+Execute the following Python script to download the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) weights and files from [Hugging Face](https://huggingface.co/):
+
+```python
+from huggingface_hub import snapshot_download
+
+snapshot_download(
+ repo_id="Qwen/Qwen2.5-7B-Instruct",
+ local_dir="/path/to/save/Qwen2.5-7B-Instruct",
+ local_dir_use_symlinks=False
+)
+```
+
+`local_dir` is the model save path specified by the user. Please ensure the disk space is sufficient.
+
+#### Downloading with git-lfs Tool
+
+Execute the following command to check if [git-lfs](https://git-lfs.com) is available:
+
+```bash
+git lfs install
+```
+
+If available, the following output will be displayed:
+
+```text
+Git LFS initialized.
+```
+
+If the tool is unavailable, please install [git-lfs](https://git-lfs.com) first. Refer to the [FAQ](../../faqs/faqs.md) section for guidance on [git-lfs installation](../../faqs/faqs.md#git-lfs-installation).
+
+Once confirmed, download the weights by executing the following command:
+
+```bash
+git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
+```
+
+### Setting Environment Variables
+
+Before launching the model, users need to set the following environment variables:
+
+```bash
+export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
+export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
+export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining device memory used for KV cache allocation.
+export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
+```
+
+Here is an explanation of these environment variables:
+
+- `ASCEND_TOTAL_MEMORY_GB`: The memory size of each card. Users can check it with `npu-smi info`; the value corresponds to `HBM-Usage(MB)` in the query results.
+- `vLLM_MODEL_BACKEND`: The backend of the model to run. Supported models and backends for vLLM MindSpore are listed in the [Model Support List](../../user_guide/supported_models/models_list/models_list.md).
+- `vLLM_MODEL_MEMORY_USE_GB`: The memory reserved for model loading; the remaining device memory is allocated to the KV cache (for example, with `ASCEND_TOTAL_MEMORY_GB=64` and `vLLM_MODEL_MEMORY_USE_GB=32`, roughly 32 GB per card remains for the KV cache). Adjust this value if an insufficient-memory error occurs during model loading.
+- `MINDFORMERS_MODEL_CONFIG`: The model configuration file.
+
+### Offline Inference
+
+Taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, users can perform offline inference with the following Python script:
+
+```python
+import vllm_mindspore # Add this line at the top of the script.
+from vllm import LLM, SamplingParams
+
+# Sample prompts.
+prompts = [
+ "I am",
+ "Today is",
+ "Llama is"
+]
+
+# Create a sampling params object.
+sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
+
+# Create an LLM
+llm = LLM(model="Qwen2.5-7B-Instruct")
+# Generate texts from the prompts. The output is a list of RequestOutput objects
+# that contain the prompt, generated text, and other information.
+outputs = llm.generate(prompts, sampling_params)
+# Print the outputs.
+for output in outputs:
+ prompt = output.prompt
+ generated_text = output.outputs[0].text
+ print(f"Prompt: {prompt!r}. Generated text: {generated_text!r}")
+```
+
+If offline inference runs successfully, similar results will be obtained:
+
+```text
+Prompt: 'I am'. Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'
+Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the teacher has'
+Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
+```
+
+### Online Serving
+
+vLLM MindSpore supports online serving deployment with the OpenAI API protocol. Using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, the following sections introduce how to [start the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results.
+
+#### Starting the Service
+
+Use the model `Qwen/Qwen2.5-7B-Instruct` and start the vLLM service with the following command:
+
+```bash
+python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-7B-Instruct"
+```
+
+If the service starts successfully, similar output will be obtained:
+
+```text
+INFO: Started server process [6363]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+```
+
+Additionally, performance metrics will be logged, such as:
+
+```text
+Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
+```
+
+#### Sending Requests
+
+Use the following command to send a request, where `prompt` is the model input:
+
+```bash
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 15, "temperature": 0}'
+```
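+
+Equivalently, a request can be sent from Python; below is a minimal sketch using the `requests` library with the same endpoint and payload:
+
+```python
+import requests
+
+# Query the OpenAI-compatible completions endpoint started above.
+resp = requests.post(
+    "http://localhost:8000/v1/completions",
+    json={
+        "model": "Qwen/Qwen2.5-7B-Instruct",
+        "prompt": "I am",
+        "max_tokens": 15,
+        "temperature": 0,
+    },
+)
+print(resp.json()["choices"][0]["text"])
+```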
+
+If the request is processed successfully, the following inference result will be returned:
+
+```text
+{
+ "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
+  "created":1747398389,
+ "model":"Qwen2.5-7B-Instruct",
+ "choices":[
+ {
+ "index":0,
+      "text":"trying to create a virtual environment for my Python project, but I am encountering some",
+ "logprobs":null,
+ "finish_reason":"length",
+ "stop_reason":null,
+ "prompt_logprobs":null
+ }
+ ],
+ "usage":{
+ "prompt_tokens":2,
+ "total_tokens":17,
+ "completion_tokens":15,
+ "prompt_tokens_details":null
+ }
+}
+```
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
new file mode 100644
index 0000000000000000000000000000000000000000..007bdab99ef7f181c9102280ec54feca98e1a3b3
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -0,0 +1,203 @@
+# NPU Single-Node Multi-Card Inference (Qwen2.5-32B)
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md)
+
+This document introduces the single-node multi-card inference process with vLLM MindSpore. Taking the [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) model as an example, users can configure the environment through the [Docker Installation](#docker-installation) section or the [Installation Guide](../../installation/installation.md#installation-guide), and then [download the model weights](#downloading-model-weights). After [setting environment variables](#setting-environment-variables), users can perform [online inference](#online-inference) to experience single-node multi-card inference capabilities.
+
+## Docker Installation
+
+In this section, we recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps for Docker deployment:
+
+### Pulling the Image
+
+Pull the vLLM MindSpore Docker image by executing the following command:
+
+```bash
+docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+```
+
+During the pull process, users will see the progress of each layer. After successful completion, users can also check the image by running:
+
+```bash
+docker images
+```
+
+### Creating a Container
+
+After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then create the container:
+
+```bash
+export DOCKER_NAME=vllm-mindspore-container # your container name
+export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+
+docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
+ --device=/dev/davinci0 \
+ --device=/dev/davinci1 \
+ --device=/dev/davinci2 \
+ --device=/dev/davinci3 \
+ --device=/dev/davinci4 \
+ --device=/dev/davinci5 \
+ --device=/dev/davinci6 \
+ --device=/dev/davinci7 \
+ --device=/dev/davinci_manager \
+ --device=/dev/devmm_svm \
+ --device=/dev/hisi_hdc \
+ -v /usr/local/sbin/:/usr/local/sbin/ \
+ -v /var/log/npu/slog/:/var/log/npu/slog \
+ -v /var/log/npu/profiling/:/var/log/npu/profiling \
+ -v /var/log/npu/dump/:/var/log/npu/dump \
+ -v /var/log/npu/:/usr/slog \
+ -v /etc/hccn.conf:/etc/hccn.conf \
+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+ -v /usr/local/dcmi:/usr/local/dcmi \
+ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+ -v /etc/ascend_install.info:/etc/ascend_install.info \
+ -v /etc/vnpu.cfg:/etc/vnpu.cfg \
+ --shm-size="250g" \
+ ${IMAGE_NAME} \
+ bash
+```
+
+After successful creation, the container ID will be returned. Verify the container by running:
+
+```bash
+docker ps
+```
+
+### Entering the Container
+
+After [creating the container](#creating-a-container), start and enter the container using the predefined `DOCKER_NAME`:
+
+```bash
+docker exec -it $DOCKER_NAME bash
+```
+
+## Downloading Model Weights
+
+Users can download the model using either [Python Tools](#downloading-with-python-tool) or [git-lfs Tools](#downloading-with-git-lfs-tool).
+
+### Downloading with Python Tool
+
+Execute the following Python script to download the [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) weights and files from [Hugging Face](https://huggingface.co/):
+
+```python
+from huggingface_hub import snapshot_download
+snapshot_download(
+ repo_id="Qwen/Qwen2.5-32B-Instruct",
+ local_dir="/path/to/save/Qwen2.5-32B-Instruct",
+ local_dir_use_symlinks=False
+)
+```
+
+`local_dir` is the user-specified path to save the model. Ensure sufficient disk space is available.
+
+### Downloading with git-lfs Tool
+
+Run the following command to verify if [git-lfs](https://git-lfs.com) is available:
+
+```bash
+git lfs install
+```
+
+If available, the following output will be displayed:
+
+```text
+Git LFS initialized.
+```
+
+If unavailable, install [git-lfs](https://git-lfs.com) first. Refer to the [FAQ](../../../faqs/faqs.md) section for [git-lfs installation](../../../faqs/faqs.md#git-lfs-installation) guidance.
+
+Once confirmed, execute the following command to download the weights:
+
+```bash
+git clone https://huggingface.co/Qwen/Qwen2.5-32B-Instruct
+```
+
+## Setting Environment Variables
+
+For [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct), the following environment variables configure memory allocation, backend, and model-related YAML files:
+
+```bash
+#set environment variables
+export ASCEND_TOTAL_MEMORY_GB=64 # Use `npu-smi info` to check the memory.
+export vLLM_MODEL_BACKEND=MindFormers # Use MindFormers as the model backend.
+export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Adjust based on the model's maximum usage, with the remaining allocated for KV cache.
+export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model YAML file.
+```
+
+Here is an explanation of these environment variables:
+
+- `ASCEND_TOTAL_MEMORY_GB`: The memory size of each compute card. Query using `npu-smi info`, corresponding to `HBM-Usage(MB)` in the results.
+- `vLLM_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
+- `vLLM_MODEL_MEMORY_USE_GB`: Memory reserved for model loading. Adjust this if encountering insufficient memory.
+- `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2_5). For Qwen2.5-32B, the YAML file is [predict_qwen2_5_32b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2_5/predict_qwen2_5_32b_instruct.yaml).
+
+Users can check memory usage with `npu-smi info` and set the NPU cards for inference using the following example (assuming cards 4,5,6,7 are used):
+
+```bash
+export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7
+```
+
+## Online Inference
+
+vLLM MindSpore supports online serving deployment with the OpenAI API protocol. Using [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example, the following sections introduce how to [start the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results.
+
+### Starting the Service
+
+Use the model `Qwen/Qwen2.5-32B-Instruct` and start the vLLM service with the following command:
+
+```bash
+export TENSOR_PARALLEL_SIZE=4
+export MAX_MODEL_LEN=1024
+python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-32B-Instruct" --trust_remote_code --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_MODEL_LEN
+```
+
+Here, `TENSOR_PARALLEL_SIZE` specifies the number of NPU cards, and `MAX_MODEL_LEN` sets the maximum model sequence length (prompt plus generated tokens).
+
+If the service starts successfully, similar output will be obtained:
+
+```text
+INFO: Started server process [6363]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+```
+
+Additionally, performance metrics will be logged, such as:
+
+```text
+Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
+```
+
+### Sending Requests
+
+Use the following command to send a request, where `prompt` is the model input:
+
+```bash
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
+```
+
+If processed successfully, the inference result will be:
+
+```text
+{
+ "id":"cmpl-11fe2898c77d4ff18c879f57ae7aa9ca","object":"text_completion",
+  "created":1748568696,
+ "model":"Qwen2.5-32B-Instruct",
+ "choices":[
+ {
+ "index":0,
+ "text":"trying to create a virtual environment in Python using venv, but I am encountering some issues with setting",
+ "logprobs":null,
+ "finish_reason":"length",
+ "stop_reason":null,
+ "prompt_logprobs":null
+ }
+ ],
+ "usage":{
+ "prompt_tokens":2,
+ "total_tokens":22,
+ "completion_tokens":20,
+ "prompt_tokens_details":null
+ }
+}
+```
diff --git a/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
new file mode 100644
index 0000000000000000000000000000000000000000..3b4b64ed5db81db388bab9747a4f6bac20442d72
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -0,0 +1,237 @@
+# Single NPU Inference (Qwen2.5-7B)
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md)
+
+This document introduces the single-NPU inference process with vLLM MindSpore. Taking the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model as an example, users can configure the environment through the [Docker Installation](#docker-installation) or the [Installation Guide](../../installation/installation.md#installation-guide), and [download model weights](#downloading-model-weights). After [setting environment variables](#setting-environment-variables), users can perform [offline inference](#offline-inference) and [online inference](#online-inference) to experience single-NPU inference capabilities.
+
+## Docker Installation
+
+In this section, we recommend using Docker for quick deployment of the vLLM MindSpore environment. Below are the steps for Docker deployment:
+
+### Pulling the Image
+
+Pull the vLLM MindSpore Docker image by executing the following command:
+
+```bash
+docker pull hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest
+```
+
+During the pull process, users will see the progress of each layer. After successful completion, users can also check the image by running:
+
+```bash
+docker images
+```
+
+### Creating a Container
+
+After [pulling the image](#pulling-the-image), set `DOCKER_NAME` and `IMAGE_NAME` as the container and image names, then create the container:
+
+```bash
+export DOCKER_NAME=vllm-mindspore-container # your container name
+export IMAGE_NAME=hub.oepkgs.net/oedeploy/openeuler/aarch64/mindspore:latest # your image name
+
+docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
+ --device=/dev/davinci0 \
+ --device=/dev/davinci1 \
+ --device=/dev/davinci2 \
+ --device=/dev/davinci3 \
+ --device=/dev/davinci4 \
+ --device=/dev/davinci5 \
+ --device=/dev/davinci6 \
+ --device=/dev/davinci7 \
+ --device=/dev/davinci_manager \
+ --device=/dev/devmm_svm \
+ --device=/dev/hisi_hdc \
+ -v /usr/local/sbin/:/usr/local/sbin/ \
+ -v /var/log/npu/slog/:/var/log/npu/slog \
+ -v /var/log/npu/profiling/:/var/log/npu/profiling \
+ -v /var/log/npu/dump/:/var/log/npu/dump \
+ -v /var/log/npu/:/usr/slog \
+ -v /etc/hccn.conf:/etc/hccn.conf \
+ -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
+ -v /usr/local/dcmi:/usr/local/dcmi \
+ -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
+ -v /etc/ascend_install.info:/etc/ascend_install.info \
+ -v /etc/vnpu.cfg:/etc/vnpu.cfg \
+ --shm-size="250g" \
+ ${IMAGE_NAME} \
+ bash
+```
+
+After successful creation, the container ID will be returned. Verify the container by running:
+
+```bash
+docker ps
+```
+
+### Entering the Container
+
+After [creating the container](#creating-a-container), start and enter the container using the predefined `DOCKER_NAME`:
+
+```bash
+docker exec -it $DOCKER_NAME bash
+```
+
+## Downloading Model Weights
+
+Users can download the model using either the [Python Tool](#downloading-with-python-tool) or the [git-lfs Tool](#downloading-with-git-lfs-tool).
+
+### Downloading with Python Tool
+
+Execute the following Python script to download the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) weights and files from [Hugging Face](https://huggingface.co/):
+
+```python
+from huggingface_hub import snapshot_download
+snapshot_download(
+ repo_id="Qwen/Qwen2.5-7B-Instruct",
+ local_dir="/path/to/save/Qwen2.5-7B-Instruct",
+ local_dir_use_symlinks=False
+)
+```
+
+`local_dir` is the user-specified model save path. Ensure sufficient disk space is available.
+
+### Downloading with git-lfs Tool
+
+Run the following command to check if [git-lfs](https://git-lfs.com) is available:
+
+```bash
+git lfs install
+```
+
+If available, the following output will be displayed:
+
+```text
+Git LFS initialized.
+```
+
+If unavailable, install [git-lfs](https://git-lfs.com) first. Refer to the [FAQ](../../../faqs/faqs.md) section for [git-lfs installation](../../../faqs/faqs.md#git-lfs-installation) guidance.
+
+Once confirmed, download the weights by executing the following command:
+
+```bash
+git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
+```
+
+## Setting Environment Variables
+
+For [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), the following environment variables configure memory allocation, backend, and model-related YAML files:
+
+```bash
+#set environment variables
+export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
+export vLLM_MODEL_BACKEND=MindFormers # use MindFormers as model backend.
+export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining device memory used for KV cache allocation.
+export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
+```
+
+Here is an explanation of these variables:
+
+- `ASCEND_TOTAL_MEMORY_GB`: The memory size of each compute card. Query using `npu-smi info`, corresponding to `HBM-Usage(MB)` in the results.
+- `vLLM_MODEL_BACKEND`: The model backend. Currently supported models and backends are listed in the [Model Support List](../../../user_guide/supported_models/models_list/models_list.md).
+- `vLLM_MODEL_MEMORY_USE_GB`: Memory reserved for model loading. Adjust this if encountering insufficient memory.
+- `MINDFORMERS_MODEL_CONFIG`: Model configuration file. User can find the corresponding YAML file in the [MindSpore Transformers repository](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2_5). For Qwen2.5-7B, the YAML file is [predict_qwen2_5_7b_instruct.yaml](https://gitee.com/mindspore/mindformers/blob/r1.5.0/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml).
+
+Users can check memory usage with `npu-smi info` and set the compute card for inference using:
+
+```bash
+export NPU_VISIBLE_DEVICES=0
+export ASCEND_RT_VISIBLE_DEVICES=$NPU_VISIBLE_DEVICES
+```
+
+## Offline Inference
+
+Taking [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, users can perform offline inference with the following Python code:
+
+```python
+import vllm_mindspore # Add this line at the top of the script.
+from vllm import LLM, SamplingParams
+
+# Sample prompts.
+prompts = [
+ "I am",
+ "Today is",
+ "Llama is"
+]
+
+# Create a sampling params object.
+sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
+
+# Create an LLM
+llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
+# Generate texts from the prompts.
+outputs = llm.generate(prompts, sampling_params)
+# Print the outputs.
+for output in outputs:
+ prompt = output.prompt
+ generated_text = output.outputs[0].text
+ print(f"Prompt: {prompt!r}. Generated text: {generated_text!r}")
+```
+
+If offline inference runs successfully, similar results will be obtained:
+
+```text
+Prompt: 'I am'. Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'
+Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the teacher has'
+Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
+```
+
+## Online Inference
+
+vLLM MindSpore supports online serving deployment with the OpenAI API protocol. Using [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example, the following sections introduce how to [start the service](#starting-the-service) and [send requests](#sending-requests) to obtain inference results.
+
+### Starting the Service
+
+Use the model `Qwen/Qwen2.5-7B-Instruct` and start the vLLM service with the following command:
+
+```bash
+python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-7B-Instruct"
+```
+
+If the service starts successfully, similar output will be obtained:
+
+```text
+INFO: Started server process [6363]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+```
+
+Additionally, performance metrics will be logged, such as:
+
+```text
+Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
+```
+
+### Sending Requests
+
+Use the following command to send a request, where `prompt` is the model input:
+
+```bash
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 15, "temperature": 0}'
+```
+
+If the request is processed successfully, the following inference result will be returned:
+
+```text
+{
+ "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
+  "created":1747398389,
+ "model":"Qwen2.5-7B-Instruct",
+ "choices":[
+ {
+ "index":0,
+      "text":"trying to create a virtual environment for my Python project, but I am encountering some",
+ "logprobs":null,
+ "finish_reason":"length",
+ "stop_reason":null,
+ "prompt_logprobs":null
+ }
+ ],
+ "usage":{
+ "prompt_tokens":2,
+ "total_tokens":17,
+ "completion_tokens":15,
+ "prompt_tokens_details":null
+ }
+}
+```
diff --git a/docs/vllm_mindspore/docs/source_en/index.rst b/docs/vllm_mindspore/docs/source_en/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..451a5a84fc09651efab9f4f47bc694de49c6cfdf
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/index.rst
@@ -0,0 +1,143 @@
+vLLM MindSpore
+=========================================
+
+Overview
+-----------------------------------------------------
+vLLM MindSpore (`vllm-mindspore`) is a plugin brewed by the `MindSpore community <https://gitee.com/mindspore>`_ , which aims to integrate MindSpore LLM inference capabilities into `vLLM <https://github.com/vllm-project/vllm>`_ . With vLLM MindSpore, the technical strengths of MindSpore and vLLM are organically combined to provide a full-stack, open-source, high-performance, easy-to-use LLM inference solution.
+
+vLLM, an open-source, community-driven project initiated by the Sky Computing Lab at UC Berkeley, has been widely used in academic research and industry applications. On the basis of the Continuous Batching scheduling mechanism and PagedAttention key-value cache management, vLLM provides a rich set of inference service features, including speculative inference, prefix caching, and Multi-LoRA. vLLM also supports a wide range of open-source large models, including Transformer-based models (e.g., LLaMa), Mixture-of-Experts models (e.g., DeepSeek), embedding models (e.g., E5-Mistral), and multi-modal models (e.g., LLaVA). Because vLLM uses PyTorch to build large models and manage storage resources, it cannot deploy large models built upon MindSpore.
+
+The vLLM MindSpore plugin aims to integrate MindSpore large models into vLLM and enable the deployment of MindSpore-based LLM inference services. It follows these design principles:
+
+- Interface compatibility: support the native APIs and service deployment interfaces of vLLM to avoid adding new configuration files or interfaces, reducing user learning costs and ensuring ease of use.
+- Minimal invasive modifications: minimize invasive modifications to the vLLM code to ensure system maintainability and evolvability.
+- Component decoupling: minimize and standardize the coupling between MindSpore large model components and vLLM service components to facilitate the integration of various MindSpore large model suites.
+
+On the basis of the above design principles, vLLM MindSpore adopts the system architecture shown in the figure below, and implements the integration between vLLM and MindSpore across the following categories of components:
+
+- Service components: vLLM MindSpore maps PyTorch API calls in service components including LLMEngine and Scheduler to MindSpore capabilities, inheriting support for service functions like Continuous Batching and PagedAttention.
+- Model components: vLLM MindSpore registers or replaces model components including models, network layers, and custom operators, and integrates MindSpore Transformers, MindSpore One, and other MindSpore large model suites, as well as custom large models, into vLLM.
+
+.. raw:: html
+
+   <img src="arch.png" alt="vLLM MindSpore architecture">
+
+vLLM MindSpore uses the plugin mechanism recommended by the vLLM community to register its capabilities. In the future, we expect to work with the vLLM community to support the integration of inference capabilities of third-party AI frameworks, including PaddlePaddle and JAX, following the principles described in `[RFC] Multi-framework support for vllm `_ .
+
+Code: `vLLM MindSpore repository <https://gitee.com/mindspore/vllm-mindspore>`_
+
+Prerequisites
+-----------------------------------------------------
+
+- Hardware: Atlas 800I A2 inference series or Atlas 800T A2 training series, with the necessary drivers installed and access to the Internet
+- Operating System: openEuler or Ubuntu Linux
+- Software:
+
+ * Python >= 3.9, < 3.12
+ * CANN >= 8.0.0.beta1
+ * MindSpore (matched with the vllm-mindspore version)
+ * vLLM (matched with the vllm-mindspore version)
+
+Getting Started
+-----------------------------------------------------
+Please refer to `Quick Start <./getting_started/quick_start/quick_start.html>`_ and `Installation <./getting_started/installation/installation.html>`_ for more details.
+
+Contributing
+-----------------------------------------------------
+Please read `CONTRIBUTING <./developer_guide/contributing.html>`_ for details on setting up development environments, testing functions, and submitting PRs.
+
+We welcome and value any form of contribution and cooperation. Please use `Issues <https://gitee.com/mindspore/vllm-mindspore/issues>`_ to inform us of any bugs you encounter, or to submit your feature requests, improvement suggestions, and technical solutions.
+
+Branch
+-----------------------------------------------------
+The vllm-mindspore repository contains the main branch, development branch, and version branches:
+
+- **main**: the main branch, compatible with the MindSpore master branch and vLLM v0.7.3; its quality is continuously monitored through Ascend-MindSpore CI.
+- **develop**: the development branch for adapting vLLM features, forked from the main branch when a new vLLM version is released. Once the adapted features are stable, they are merged into the main branch. The current development branch is adapting vLLM v0.8.3.
+- **rX.Y.Z**: version branches used for archiving releases; each is forked from the main branch after the adaptation of a given vLLM version is completed.
+
+The following are the version branches:
+
+.. list-table::
+ :header-rows: 1
+
+ * - Branch
+ - Status
+ - Notes
+ * - master
+ - Maintained
+ - Compatible with vLLM v0.7.3, and CI commitment for MindSpore master branch
+ * - develop
+ - Maintained
+ - Compatible with vLLM v0.8.3
+ * - r0.1
+ - Unmaintained
+     - Only documentation fixes are allowed
+ * - r0.2
+ - Maintained
+ - Compatible with vLLM v0.7.3, and CI commitment for MindSpore 2.6.0
+
+SIG
+-----------------------------------------------------
+- Welcome to join vLLM MindSpore SIG to participate in the co-construction of open-source projects and industrial cooperation: https://www.mindspore.cn/community/SIG
+- SIG meetings: every other Friday or Saturday evening, 20:00 - 21:00 (UTC+8)
+
+License
+-----------------------------------------------------
+Apache License 2.0, as found in the `LICENSE <https://gitee.com/mindspore/vllm-mindspore/blob/master/LICENSE>`_ file.
+
+
+.. toctree::
+ :glob:
+ :maxdepth: 2
+ :caption: Quick Start
+ :hidden:
+
+ getting_started/quick_start/quick_start
+ getting_started/installation/installation
+ getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU
+ getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+ :caption: User Guide
+ :hidden:
+
+ user_guide/supported_models/models_list/models_list
+ user_guide/supported_features/features_list/features_list
+ user_guide/supported_features/operations/npu_ops
+ user_guide/supported_features/quantization/quantization
+ user_guide/supported_features/profiling/profiling
+ user_guide/supported_features/benchmark/benchmark
+ user_guide/environment_variables/environment_variables
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+ :caption: Developer Guide
+ :hidden:
+
+ developer_guide/contributing
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+ :caption: FAQ
+ :hidden:
+
+ faqs/faqs
+
+.. toctree::
+ :glob:
+ :maxdepth: 1
+ :caption: RELEASE NOTES
+ :hidden:
+
+ release_notes/release_notes
\ No newline at end of file
diff --git a/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md b/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md
new file mode 100644
index 0000000000000000000000000000000000000000..e8dc13f686149d9ac4066029e039c40b1c38dba1
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md
@@ -0,0 +1,23 @@
+# Release Notes
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/release_notes/release_notes.md)
+
+## vLLM MindSpore 0.3.0 Release Notes
+
+The following are the key new features and models supported in the vLLM MindSpore plugin version 0.3.0.
+
+### New Features
+
+- 0.8.3 V1 Architecture Basic Features, including chunked prefill and automatic prefix caching;
+- V0 Multi-step Scheduling;
+- V0 Chunked Prefill;
+- V0 Automatic Prefix Caching;
+- V0 DeepSeek MTP (Multi-Token Prediction);
+- GPTQ Quantization;
+- SmoothQuant Quantization.
+
+### New Models
+
+- DeepSeek-V3/R1
+- Qwen2.5-0.5B/1.5B/7B/14B/32B/72B
+- Qwen3-0.6B/1.7B/4B/8B/14B/32B
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md b/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md
new file mode 100644
index 0000000000000000000000000000000000000000..99d210e577a876d81ee83849724ee445b9080e18
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md
@@ -0,0 +1,20 @@
+# Environment Variable List
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/environment_variables/environment_variables.md)
+
+| Environment Variable | Required for Basic Scenarios | Function |
+|----------------------|-----------------------------|----------|
+| `export vLLM_MODEL_BACKEND=MINDFORMER_MODELS` | Running MindSpore Transformers models | Distinguishes between MindSpore Transformers and vLLM MindSpore native models (default: native models) |
+| `export PYTHONPATH=/xxx/mindformers-dev/:$PYTHONPATH` | Running models in MindSpore Transformers Research directory | MindSpore Transformers must be installed from source, as research directory code is not packaged into whl files |
+| `export MINDFORMERS_MODEL_CONFIG=/xxx.yaml` | Running MindSpore Transformers models | Configuration file for MindSpore Transformers models |
+| `export MS_JIT_MODULES="vllm_mindspore,research"` | Versions later than v0.7.3 | Specifies the modules that require JIT static compilation in static graph mode; corresponds to the top-level module names in imports |
+| `export GLOO_SOCKET_IFNAME=enp189s0f0` | Ray multi-machine | Used for inter-server communication in Ray multi-machine scenarios |
+| `export TP_SOCKET_IFNAME=enp189s0f0` | Ray multi-machine | Required for RPC in Ray multi-machine scenarios |
+| `export HCCL_OP_EXPANSION_MODE=AIV` | Multi-machine | Multi-machine optimization configuring communication algorithm orchestration for acceleration |
+| `export HCCL_EXEC_TIMEOUT=7200` | Multi-machine | Multi-machine optimization controlling device synchronization timeout (seconds, default: 1836) |
+| `export RUN_MODE="predict"` | Basic inference workflow (system default) | Configures network execution mode (predict mode enables optimizations) |
+| `export DEVICE_NUM_PER_NODE=16` | Multi-machine checkpoint splitting | Required for automatic weight splitting functionality (default: 8 NPUs/server) |
+| `export vLLM_USE_NPU_ADV_STEP_FLASH_OP="on"` | MSS (Multi-step scheduler) custom operators | Toggle for custom operators in MSS functionality |
+| `export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1` | Ray multi-machine | Enables Ray dependency in vLLM MindSpore |
+| `export MS_JIT=0` | Quantization scenarios (post v0.7.3) | 0: Disables JIT compilation, executing network scripts in dynamic graph (PyNative) mode |
+| `export FORCE_EAGER="true"` | Quantization scenarios (post v0.7.3) | Forces eager (dynamic graph) execution |
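+
+As an illustration, a typical combination for serving a MindSpore Transformers model could look as follows; the paths are placeholders to be replaced with real ones:
+
+```bash
+# Illustrative combination drawn from the table above (placeholder paths).
+export vLLM_MODEL_BACKEND=MINDFORMER_MODELS           # use MindSpore Transformers models
+export MINDFORMERS_MODEL_CONFIG=/path/to/model.yaml   # model YAML configuration
+export PYTHONPATH=/path/to/mindformers/:$PYTHONPATH   # source install of MindSpore Transformers
+```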
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
new file mode 100644
index 0000000000000000000000000000000000000000..d7eff67b57d152816b0215ad6f5dcd2194f11cb6
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md
@@ -0,0 +1,118 @@
+# Benchmark
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/benchmark/benchmark.md)
+
+The benchmark tool of vLLM MindSpore is inherited from vLLM; refer to the [vLLM Benchmark](https://github.com/vllm-project/vllm/blob/main/benchmarks/README.md) documentation for more details. This document introduces [Online Benchmark](#online-benchmark) and [Offline Benchmark](#offline-benchmark); users can follow these steps to run performance tests.
+
+## Online Benchmark
+
+For single-NPU inference, we take [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example. Prepare the environment by following the guide [Single-NPU Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#online-inference), then start the online service with the following command:
+
+```bash
+vllm-mindspore serve Qwen/Qwen2.5-7B-Instruct --device auto --disable-log-requests
+```
+
+For multi-NPU inference, we take [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example. Prepare the environment by following the guide [Single-Node Multi-NPU Inference (Qwen2.5-32B)](../../../getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md#online-inference), then start the online service with the following command:
+
+```bash
+export TENSOR_PARALLEL_SIZE=4
+export MAX_MODEL_LEN=1024
+python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-32B-Instruct" --trust_remote_code --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_MODEL_LEN
+```
+
+If the service starts successfully, output similar to the following will be printed:
+
+```text
+INFO: Started server process [21349]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+```
+
+Clone the vLLM repository and import the vLLM MindSpore plugin to reuse the benchmark tools:
+
+```bash
+git clone https://github.com/vllm-project/vllm.git
+cd vllm
+sed -i '1i import vllm_mindspore' benchmarks/benchmark_serving.py
+```
+
+Execute the test script:
+
+```bash
+# download dataset
+# wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
+
+# single-NPU, taking Qwen2.5-7B as an example:
+python3 benchmarks/benchmark_serving.py \
+ --backend openai-chat \
+ --endpoint /v1/chat/completions \
+ --model Qwen/Qwen2.5-7B-Instruct \
+ --dataset-name sharegpt \
+ --dataset-path /ShareGPT_V3_unfiltered_cleaned_split.json \
+ --num-prompts 10
+
+# multi-NPU, taking Qwen2.5-32B as an example:
+python3 benchmarks/benchmark_serving.py \
+ --backend openai-chat \
+ --endpoint /v1/chat/completions \
+ --model Qwen/Qwen2.5-32B-Instruct \
+ --dataset-name sharegpt \
+ --dataset-path /ShareGPT_V3_unfiltered_cleaned_split.json \
+ --num-prompts 10
+```
+
+If the test runs successfully, the following results will be returned:
+
+```text
+============ Serving Benchmark Result ============
+Successful requests: ....
+Benchmark duration (s): ....
+Total input tokens: ....
+Total generated tokens: ....
+Request throughput (req/s): ....
+Output token throughput (tok/s): ....
+Total Token throughput (tok/s): ....
+---------------Time to First Token----------------
+Mean TTFT (ms): ....
+Median TTFT (ms): ....
+P99 TTFT (ms): ....
+-----Time per Output Token (excl. 1st token)------
+Mean TPOT (ms): ....
+Median TPOT (ms): ....
+P99 TPOT (ms): ....
+---------------Inter-token Latency----------------
+Mean ITL (ms): ....
+Median ITL (ms): ....
+P99 ITL (ms): ....
+==================================================
+```
+
+## Offline Benchmark
+
+For the offline performance benchmark, we take [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as an example. Prepare the environment by following the guide [Single-NPU Inference (Qwen2.5-7B)](../../../getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md#offline-inference).
+
+Clone the vLLM repository and import the vLLM MindSpore plugin to reuse the benchmark tools:
+
+```bash
+git clone https://github.com/vllm-project/vllm.git
+cd vllm
+sed -i '1i import vllm_mindspore' benchmarks/benchmark_throughput.py
+```
+
+Run the test script with the following command:
+
+```bash
+python3 benchmarks/benchmark_throughput.py \
+ --model Qwen/Qwen2.5-7B-Instruct \
+ --dataset-name sonnet \
+ --dataset-path benchmarks/sonnet.txt \
+ --num-prompts 10
+```
+
+If the test runs successfully, the following results will be returned:
+
+```text
+Throughput: ... requests/s, ... total tokens/s, ... output tokens/s
+Total num prompt tokens: ...
+Total num output tokens: ...
+```
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md
new file mode 100644
index 0000000000000000000000000000000000000000..b75f8f133fb5425dd5a79af93a6e47eada49ca06
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md
@@ -0,0 +1,36 @@
+# Supported Features List
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/features_list/features_list.md)
+
+The features supported by vLLM MindSpore are consistent with the community version of vLLM. For feature descriptions and usage, please refer to the [vLLM Official Documentation](https://docs.vllm.ai/en/latest/).
+
+The following are the features supported by vLLM MindSpore:
+
+| **Features** | **vLLM V0** | **vLLM V1** |
+|-----------------------------------|--------------------|--------------------|
+| Chunked Prefill | √ | √ |
+| Automatic Prefix Caching | √ | √ |
+| Multi-step scheduler | √ | × |
+| DeepSeek MTP | √ | WIP |
+| Async output | √ | √ |
+| Quantization | √ | √ |
+| LoRA | WIP | WIP |
+| Tensor Parallel | √ | √ |
+| Pipeline Parallel | WIP | WIP |
+| Expert Parallel | × | √ |
+| Data Parallel | × | √ |
+| Prefill Decode Disaggregation | × | √ |
+| Multi Modality | WIP | WIP |
+| Prompt adapter | × | WIP |
+| Speculative decoding | × | WIP |
+| LogProbs | × | WIP |
+| Prompt logProbs | × | WIP |
+| Best of | × | × |
+| Beam search | × | WIP |
+| Guided Decoding | × | WIP |
+| Pooling | × | × |
+| Enc-dec | × | × |
+
+- √: Feature aligned with the community version of vLLM.
+- ×: Currently unsupported; alternative solutions are recommended.
+- WIP: Under development or planned for future implementation.
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md
new file mode 100644
index 0000000000000000000000000000000000000000..4e8392a75eae6a7bec60b1df429e7f3b77d4f1ae
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md
@@ -0,0 +1,105 @@
+# Custom Operator Integration
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/operations/npu_ops.md)
+
+This document introduces how to integrate a new custom operator into the vLLM MindSpore project, taking the **`adv_step_flash`** operator as an example. The following sections focus on the integration process; for details on operator implementation itself, refer to the official MindSpore tutorial: [Dynamic Graph Custom Operator Integration](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder.html).
+
+For further development, additional features can be extended based on project requirements; implementation details can be referenced from the [MindSpore custom operator documentation](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder.html).
+
+## File Structure
+
+The directory `vllm_mindspore/ops` contains the declarations and implementations of the operators:
+
+```text
+vllm_mindspore/ops/
+├── ascendc/
+│ ├── adv_step_flash.h // AscendC AdvStepFlash operator declaration
+│ ├── adv_step_flash.c // AscendC AdvStepFlash operator implementation
+│ └── ...
+├── module/
+│ ├── module.h // Common module registration header
+│ ├── module.cpp // Common module registration implementation
+│ ├── adv_step_flash.cpp // Integration layer code (Python interface registration)
+│ └── ...
+```
+
+- **`ops/ascendc/`**: Contains AscendC custom operator implementation code.
+- **`ops/module/`**: Contains operator integration layer code, including common module registration (`module.h`, `module.cpp`) and operator-specific integration (e.g., `adv_step_flash.cpp`).
+
+## Integration Process
+
+To integrate a custom operator, users need to create the [Operator Interface Declaration](#operator-interface-declaration) and [Operator Implementation](#operator-implementation) in the `ops/ascendc/` directory, add the [Operator Integration](#operator-integration) code in `ops/module/`, and then perform [Operator Compilation and Testing](#operator-compilation-and-testing).
+
+### Operator Interface Declaration
+
+Create a header file (e.g., `my_custom_op.h`) in `ops/ascendc/` to declare the operator function and related interfaces:
+
+```cpp
+#ifndef VLLM_MINDSPORE_OPS_ASCENDC_MY_CUSTOM_OP_H
+#define VLLM_MINDSPORE_OPS_ASCENDC_MY_CUSTOM_OP_H
+
+extern void MyCustomOpKernelEntry(uint32_t blockDims, void *l2ctrl, void *aclStream,
+ uint8_t *input, uint8_t *output, int32_t param1, int32_t param2);
+
+#endif // VLLM_MINDSPORE_OPS_ASCENDC_MY_CUSTOM_OP_H
+```
+
+### Operator Implementation
+
+Create an implementation file (e.g., `my_custom_op.c`) in `ops/ascendc/` for the core logic:
+
+```cpp
+#include "my_custom_op.h"
+#include "kernel_operator.h"
+
+extern "C" __global__ __aicore__ void my_custom_op_impl(GM_ADDR input, GM_ADDR output,
+ int32_t param1, int32_t param2) {
+ // AscendC operation implement
+}
+
+#ifndef __CCE_KT_TEST__
+void MyCustomOpKernelEntry(uint32_t blockDims, void *l2ctrl, void *aclStream,
+ uint8_t *input, uint8_t *output, int32_t param1, int32_t param2) {
+  my_custom_op_impl<<<blockDims, l2ctrl, aclStream>>>(input, output, param1, param2);
+}
+#endif
+```
+
+### Operator Integration
+
+Create an integration file (e.g., `my_custom_op.cpp`) in `module/`. Users can refer to `adv_step_flash.cpp` for more details about the integration:
+
+```cpp
+#include "ms_extension.h"
+#include "ascendc/my_custom_op.h"
+#include "module/module.h"
+
+void MyCustomOpPythonInterface(int32_t param1, int32_t param2,
+ BaseTensorPtr input, BaseTensorPtr output) {
+ ...
+}
+
+MS_EXTENSION_MODULE(my_custom_op) {
+ m.def("my_custom_op", &MyCustomOpPythonInterface, "My custom operator",
+ pybind11::arg("param1"), pybind11::arg("param2"),
+ pybind11::arg("input"), pybind11::arg("output"));
+}
+```
+
+### Operator Compilation and Testing
+
+1. **Code Integration**: Merge the code into the vLLM MindSpore project.
+2. **Project Compilation**: Build and install the whl package containing the custom operator (a build sketch follows this list).
+3. **Operator Testing**: Invoke the operator in Python:
+
+    ```python
+    from vllm_mindspore import npu_ops
+    import numpy as np
+    import mindspore as ms
+
+    # Call the custom operator registered above; the tensors are renamed to
+    # avoid shadowing the Python builtin `input`.
+    input_np = np.array([1, 2, 3], dtype=np.int32)
+    input_tensor = ms.Tensor(input_np)
+    output_tensor = ms.Tensor(np.zeros_like(input_np))  # pre-allocated output buffer
+
+    npu_ops.my_custom_op(10, 20, input_tensor, output_tensor)
+    print("Output:", output_tensor)
+    ```
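+
+For step 2, the following is a minimal sketch assuming the standard Python packaging flow used elsewhere in this project; the exact build entry may differ per repository setup:
+
+```bash
+# Build and install the package, including the custom operator sources,
+# from the repository root.
+cd vllm-mindspore
+pip install .
+```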
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/op_detail.png b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/op_detail.png
new file mode 100644
index 0000000000000000000000000000000000000000..cab7a5dcc9b4146d6375efba9a947c73c2f162b9
Binary files /dev/null and b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/op_detail.png differ
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/op_total.png b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/op_total.png
new file mode 100644
index 0000000000000000000000000000000000000000..a8dbc1f6548dafd6f0a8a1b777499560a18b9095
Binary files /dev/null and b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/op_total.png differ
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md
new file mode 100644
index 0000000000000000000000000000000000000000..8f31d5809fba8e9680b8009b9d48cf433557bd08
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md
@@ -0,0 +1,89 @@
+# Profiling Methods
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/profiling.md)
+
+vLLM MindSpore supports the `mindspore.Profiler` module to track the performance of vLLM MindSpore workers. Users can follow the [Collecting Profiling Data](#collecting-profiling-data) section to gather data and then analyze it according to [Analyzing Profiling Data](#analyzing-profiling-data). Additionally, users can inspect the model's IR graph through [Graph Data Dump](#graph-data-dump) to analyze and debug the model structure.
+
+## Collecting Profiling Data
+
+To enable profiling data collection, set the `VLLM_TORCH_PROFILER_DIR` environment variable to the directory where the profiling results should be saved. For multi-machine inference, this variable must be set on each machine before inference:
+
+```bash
+export VLLM_TORCH_PROFILER_DIR=/path/to/save/vllm_profile
+```
+
+After setting the variable, run the following command to launch the vLLM MindSpore service, taking [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as an example:
+
+```bash
+export TENSOR_PARALLEL_SIZE=4
+export MAX_MODEL_LEN=1024
+python3 -m vllm_mindspore.entrypoints vllm.entrypoints.openai.api_server --model "Qwen/Qwen2.5-32B-Instruct" --trust_remote_code --tensor-parallel-size $TENSOR_PARALLEL_SIZE --max-model-len $MAX_MODEL_LEN
+```
+
+If the service starts successfully, you will see output similar to the following, indicating that the `start_profile` and `stop_profile` requests are being monitored:
+
+```text
+INFO 05-15 12:03:07 [launcher.py:31] Route: /start_profile, Methods: POST
+INFO 05-15 12:03:07 [launcher.py:31] Route: /stop_profile, Methods: POST
+INFO: Started server process [212135]
+INFO: Waiting for application startup.
+INFO: Application startup complete.
+```
+
+Once the service is running, users can send the following requests to collect profiling data:
+
+```shell
+# Request to start profiling
+curl -X POST http://127.0.0.1:8000/start_profile
+
+# Request for inference
+curl http://localhost:8000/v1/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+        "model": "Qwen/Qwen2.5-32B-Instruct",
+ "prompt": "San Francisco is a",
+ "max_tokens": 7,
+ "temperature": 0
+ }'
+
+# Request to stop profiling
+curl -X POST http://127.0.0.1:8000/stop_profile
+```
+
+When the log displays content similar to the following, it indicates that profiling data collection for one worker is complete:
+
+```text
+Parsing: [####################] 3/3 Done
+```
+
+## Analyzing Profiling Data
+
+The directory specified by `VLLM_TORCH_PROFILER_DIR` contains the profiling results, with subdirectories named with the `ascend_ms` suffix. Each subdirectory stores the profiling results for one worker. The files in these subdirectories can be referenced for performance analysis, as described in [Ascend Performance Tuning](https://www.mindspore.cn/tutorials/en/master/debug/profiler.html).
+
+Users can select a subdirectory to analyze the performance of a single worker:
+
+- `op_statistic.csv`: Overall operator statistics.
+
+ 
+
+- `kernel_details.csv`: Detailed execution data for each operator.
+
+ 
+
+- `trace_view.json`: System-wide execution data. This file can be uploaded to the [Perfetto UI](https://ui.perfetto.dev/) for visual inspection of system execution. Clicking on a process in the left sidebar displays trace event information for all threads under that process:
+
+ 
+
+    - **MindSpore Process Details**: Shows operator dispatch during graph execution.
+
+ 
+
+ - **Ascend Process Details**: Shows the actual execution of Ascend operators, which can be correlated with the operators dispatched in the MindSpore process.
+
+ 
+
+## Graph Data Dump
+
+Refer to the [MindSpore Dump Documentation](https://www.mindspore.cn/tutorials/en/master/debug/dump.html). First, create the dump JSON configuration file, then set the `MINDSPORE_DUMP_CONFIG` environment variable to the absolute path of that file. After inference completes, the graph data can be collected.
+
+The dump results include the IR graph. Additionally, by configuring `dump_mode` in the JSON file, users can choose to dump execution data for all operators or only for specified operators.
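+
+A minimal sketch of such a configuration, assuming the `common_dump_settings` schema described in the Dump documentation (all field values below are illustrative):
+
+```bash
+# Write an illustrative dump configuration and point MindSpore at it.
+cat > /path/to/dump_config.json << 'EOF'
+{
+    "common_dump_settings": {
+        "op_debug_mode": 0,
+        "dump_mode": 0,
+        "path": "/absolute/path/to/dump_output",
+        "net_name": "Net",
+        "iteration": "all",
+        "saved_data": "tensor",
+        "input_output": 0,
+        "kernels": [],
+        "support_device": [0, 1, 2, 3, 4, 5, 6, 7]
+    }
+}
+EOF
+export MINDSPORE_DUMP_CONFIG=/path/to/dump_config.json
+```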
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_1.png b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..83f6d0123bfdbf36f388accf6f95c00bfce51248
Binary files /dev/null and b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_1.png differ
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_2.png b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..69d3bbd4cf661c7f67bc93b20ab080cf77b8916b
Binary files /dev/null and b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_2.png differ
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_total.png b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_total.png
new file mode 100644
index 0000000000000000000000000000000000000000..8e260cfd42ba6dee53f825282cfdd7dff8c3f9b9
Binary files /dev/null and b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/profiling/trace_total.png differ
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md
new file mode 100644
index 0000000000000000000000000000000000000000..9663e0f88261275306c9947c95d219d0ce35f360
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md
@@ -0,0 +1,131 @@
+# Quantization Methods
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_features/quantization/quantization.md)
+
+This document introduces model quantization and quantized inference. Quantization reduces inference resource usage at a minor cost in precision, improving inference performance and enabling deployment on more devices. Given the large scale of LLMs, post-training quantization has become the mainstream approach to model quantization. For details, refer to [Post-Training Quantization Introduction](https://gitee.com/mindspore/golden-stick/blob/master/mindspore_gs/ptq/README_CN.md).
+
+In this document, the [Creating Quantized Models](#creating-quantized-models) section introduces the post-training quantization steps, using [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) as an example, and the [Quantized Model Inference](#quantized-model-inference) section explains how to perform inference with quantized models.
+
+## Creating Quantized Models
+
+We use the [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) network as an example to introduce A8W8 quantization with the SmoothQuant algorithm.
+
+### Quantizing Networks with MindSpore Golden Stick
+
+We employ [MindSpore Golden Stick's PTQ algorithm](https://gitee.com/mindspore/golden-stick/blob/master/mindspore_gs/ptq/ptq/README_CN.md) for SmoothQuant quantization of Qwen3-8B. For detailed methods, refer to [Qwen3-SmoothQuant Quantization Example](todo).
+
+#### Downloading Qwen3-8B Weights
+
+Users can download the weights using huggingface-cli:
+
+```bash
+huggingface-cli download --resume-download Qwen/Qwen3-8B --local-dir Qwen3-8B-bf16
+```
+
+Alternatively, use [other download methods](../../../getting_started/quick_start/quick_start.md#download-model).
+
+#### Loading the Network with MindSpore Transformers
+
+Load the network using [MindSpore Transformers](https://gitee.com/mindspore/mindformers) with the following script:
+
+```python
+from mindformers import AutoModel
+from mindformers import AutoTokenizer
+
+network = AutoModel.from_pretrained("Qwen3-8B-bf16")
+tokenizer = AutoTokenizer.from_pretrained("Qwen3-8B-bf16")
+```
+
+#### Preparing the CEval Dataset
+
+Download the CEval dataset to the `ceval` directory with the following structure:
+
+```text
+ceval
+ ├── dev
+ ├── test
+ └── val
+```
+
+Create a dataset handle using MindSpore:
+
+```python
+from mindspore.dataset import GeneratorDataset
+
+# NOTE: `source` must be an iterable or callable yielding the declared columns;
+# replace "ceval" with such an object built from the downloaded dataset.
+ds = GeneratorDataset(source="ceval", column_names=["subjects", "input_ids", "labels"])
+```
+
+#### Performing Post-Training Quantization with Golden Stick
+
+Use the following Python script for post-training quantization:
+
+```python
+from collections import OrderedDict
+
+import mindspore as ms
+from mindspore import dtype as msdtype
+from mindspore_gs.common import BackendTarget
+from mindspore_gs.ptq import PTQ, PTQConfig, PTQMode, OutliersSuppressionType, QuantGranularity, PrecisionRecovery
+
+# PTQMode.QUANTIZE performs calibration; PTQMode.DEPLOY loads quantized weights.
+quant_mode = PTQMode.QUANTIZE
+
+cfg = PTQConfig(mode=quant_mode, backend=BackendTarget.ASCEND, weight_quant_dtype=msdtype.int8,
+ act_quant_dtype=msdtype.int8, outliers_suppression=OutliersSuppressionType.SMOOTH,
+ opname_blacklist=['lm_head'])
+w2_config = PTQConfig(mode=quant_mode, backend=BackendTarget.ASCEND, weight_quant_dtype=msdtype.int8,
+ act_quant_dtype=msdtype.int8,
+ outliers_suppression=OutliersSuppressionType.NONE,
+ precision_recovery=PrecisionRecovery.NONE,
+ act_quant_granularity=QuantGranularity.PER_TOKEN,
+ weight_quant_granularity=QuantGranularity.PER_CHANNEL)
+layer_policies = OrderedDict({r'.*\.w2.*': w2_config})
+ptq = PTQ(config=cfg, layer_policies=layer_policies)
+# Requires the MindSpore Transformers root directory on PYTHONPATH (see below).
+from research.qwen3.qwen3_transformers import Qwen3ParallelTransformerLayer
+ptq.decoder_layer_types.append(Qwen3ParallelTransformerLayer)
+ptq.apply(network, ds)
+ptq.convert(network)
+ms.save_checkpoint(network.parameters_dict(), "Qwen3-8B-A8W8", format="safetensors",
+ choice_func=lambda x: "key_cache" not in x and "value_cache" not in x and "float_weight" not in x)
+```
+
+Before calibration, add the MindSpore Transformers root directory to the `PYTHONPATH` environment variable, and verify that the Qwen3-related classes can be imported successfully.
+
+### Downloading Quantized Weights
+
+We have uploaded the quantized Qwen3-8B weights to the [Modelers community](https://modelers.cn): [MindSpore-Lab/Qwen3-8B-A8W8](https://modelers.cn/models/MindSpore-Lab/Qwen3-8B-A8W8). Refer to the [Modelers community documentation](https://modelers.cn/docs/zh/openmind-hub-client/0.9/basic_tutorial/download.html) to download the weights locally.
+
+## Quantized Model Inference
+
+After obtaining the Qwen3-8B SmoothQuant weights, ensure they are stored in the relative path `Qwen3-8B-A8W8`.
+
+### Offline Inference
+
+Refer to the [Installation Guide](../../../getting_started/installation/installation.md) to set up the vLLM MindSpore environment. Once ready, use the following Python code for offline inference:
+
+```python
+import vllm_mindspore # Add this line at the top of the script
+from vllm import LLM, SamplingParams
+
+# Sample prompts
+prompts = [
+ "I am",
+ "Today is",
+ "Llama is"
+]
+
+# Create sampling parameters
+sampling_params = SamplingParams(temperature=0.0, top_p=0.95)
+
+# Initialize LLM
+llm = LLM(model="Qwen3-8B-A8W8", quantization='SmoothQuant')
+# Generate text
+outputs = llm.generate(prompts, sampling_params)
+# Print results
+for output in outputs:
+ prompt = output.prompt
+ generated_text = output.outputs[0].text
+ print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
+Successful execution will yield inference results like:
+
+```text
+Prompt: 'I am', Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'
+Prompt: 'Today is', Generated text: ' the 100th day of school. To celebrate, the teacher has'
+Prompt: 'Llama is', Generated text: ' a 100% natural, biodegradable, and compostable alternative'
+```
diff --git a/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md
new file mode 100644
index 0000000000000000000000000000000000000000..097ed295ccc70f52b59f35c17cb6c823e718ddc1
--- /dev/null
+++ b/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md
@@ -0,0 +1,21 @@
+# Supported Model List
+
+[](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/user_guide/supported_models/models_list/models_list.md)
+
+| Model | Supported | Download Link | Backend |
+|-------| --------- | ------------- | ------- |
+| Qwen2.5 | √ | [Qwen2.5-7B](https://modelers.cn/models/AI-Research/Qwen2.5-7B), [Qwen2.5-32B](https://modelers.cn/models/AI-Research/Qwen2.5-32B), etc. | MINDFORMER_MODELS |
+| Qwen3 | √ | [Qwen3-8B](https://modelers.cn/models/MindSpore-Lab/Qwen3-8B), [Qwen3-32B](https://modelers.cn/models/MindSpore-Lab/Qwen3-32B), etc. | MINDFORMER_MODELS |
+| DeepSeek V3 | √ | [DeepSeek-V3](https://modelers.cn/models/MindSpore-Lab/DeepSeek-V3), etc. | MINDFORMER_MODELS |
+| DeepSeek R1 | √ | [DeepSeek-R1](https://modelers.cn/models/MindSpore-Lab/DeepSeek-R1), [Deepseek-R1-W8A8](https://modelers.cn/models/MindSpore-Lab/DeepSeek-r1-w8a8), etc. | MINDFORMER_MODELS |
+
+The "Backend" refers to the source of the model, which can be either from MindSpore Transformers or vLLM MindSpore native models. It is specified using the environment variable `vLLM_MODEL_BACKEND`:
+
+- If the model source is MindSpore Transformers, the value is `MINDFORMER_MODELS`;
+- If the model source is vLLM MindSpore, the value is `NATIVE_MODELS`.
+
+By default, the backend is set to `NATIVE_MODELS`. To change the model backend, use the following command:
+
+```bash
+export vLLM_MODEL_BACKEND=MINDFORMER_MODELS
+```
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md b/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md
index deab07166d628d82fc4cea03b58ce6fe3e846b0c..7815a11f1a09561bf0569f3d395713ae7f9af9de 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/faqs/faqs.md
@@ -58,6 +58,7 @@
RuntimeError: Call aclnnNonzeroV2 failed, detail:E39999: Inner Error
```
+- 解决思路:
请检查CANN与MindSpore的配套关系是否正确。
### 执行Qwen3时,报vLLM相关的`resolve_transformers_fallback`导入错误
@@ -68,6 +69,7 @@
ImportError: cannot import name 'resolve_transformers_fallback' from 'vllm.model_executor.model_loader.utils'
```
+- 解决思路:
请尝试将`vllm`切换为`v0.7.3`版本。
### `import vllm_mindspore`时找不到`torch`
@@ -78,6 +80,7 @@
importlib.metadata.PackageNotFoundError: No package metadata was found for torch
```
+- 解决思路:
请执行以下命令,下载torch相关组件:
```bash
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
index 46ec5290d90c88e19247874fdd100d9d941a6190..d5f0defb14458e2fb3446496174dae89ff05bd1b 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/installation/installation.md
@@ -22,7 +22,7 @@
|[MindSpore Transformers](https://gitee.com/mindspore/mindformers)|1.6 | br_infer_deepseek_os |
|[Golden Stick](https://gitee.com/mindspore/golden-stick)|1.1.0 | r1.1.0 |
|[vLLM](https://github.com/vllm-project/vllm) | 0.8.3 | v0.8.3 |
- |[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.2 | develop |
+ |[vLLM MindSpore](https://gitee.com/mindspore/vllm-mindspore) | 0.2 | master |
## 配置环境
@@ -106,8 +106,8 @@ pip install vllm_mindspore
### 源码安装
-- 安装CANN与MindSpore
- CANN与mindspore的环境配套与安装方法,请参考[MindSpore安装教程](https://www.mindspore.cn/install)与[CANN社区版软件安装](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit)。
+- **CANN安装**
+ CANN安装方法与环境配套,请参考[CANN社区版软件安装](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=openEuler&Software=cannToolKit),若用户在安装CANN过程中遇到问题,可参考[昇腾常见问题](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html)进行解决。
CANN默认安装路径为`/usr/local/Ascend`。用户在安装CANN完毕后,使用如下命令,为CANN配置环境变量:
@@ -117,43 +117,14 @@ pip install vllm_mindspore
export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
```
- 用户安装之后,可以通过以下命令,校验CANN与MindSpore是否安装成功:
+- **vLLM前置依赖安装**
+ vLLM的环境配置与安装方法,请参考[vLLM安装教程](https://docs.vllm.ai/en/v0.8.3/getting_started/installation/cpu.html)。其依赖`gcc/g++ >= 12.3.0`版本,可通过以下命令完成安装:
```bash
- python -c "import mindspore;mindspore.set_context(device_target='Ascend');mindspore.run_check();exit()"
+ yum install -y gcc gcc-c++
```
- 若执行后返回以下结果,则MindSpore已安装成功:
-
- ```text
- The result of multiplication calculation is correct, MindSpore has been installed on platform [Ascend] successfully!
- ```
-
- 若用户在安装CANN与MindSpore过程中遇到问题,可参考[MindSpore常见问题](https://www.mindspore.cn/docs/zh-CN/r2.6.0/faq/)与[昇腾常见问题](https://www.hiascend.com/document/detail/zh/AscendFAQ/ProduTech/CANNFAQ/cannfaq_000.html)进行解决。
-
-- 安装vLLM
- vLLM的环境配置与安装方法,请参考[vLLM安装教程](https://docs.vllm.ai/en/v0.8.3/getting_started/installation/cpu.html)。其依赖`gcc/g++ >= 12.3.0`的版本,在准备好该依赖后,执行以下命令拉取vLLM源码:
-
- ```bash
- git clone https://github.com/vllm-project/vllm.git vllm_source
- cd vllm_source
- ```
-
- 安装vLLM CPU后端所需Python依赖包:
-
- ```bash
- pip install --upgrade pip
- pip install "cmake>=3.26" wheel packaging ninja "setuptools-scm>=8" numpy
- pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
- ```
-
- 最后,编译安装vLLM CPU:
-
- ```bash
- VLLM_TARGET_DEVICE=cpu python setup.py install
- ```
-
-- 安装vLLM MindSpore
+- **vLLM MindSpore安装**
安装vLLM MindSpore,需要在拉取vLLM MindSpore源码后,执行以下命令,安装依赖包:
@@ -169,6 +140,15 @@ pip install vllm_mindspore
pip install .
```
+ 上述命令执行完毕之后,将在`vllm-mindspore/install_depend_pkgs`目录下生成`mindformers-dev`文件夹,将其加入到环境变量中:
+
+ ```bash
+  export MF_PATH=$(pwd)/install_depend_pkgs/mindformers-dev
+ export PYTHONPATH=$MF_PATH:$PYTHONPATH
+ ```
+
+  若MindSpore Transformers是由`br_infer_deepseek_os`分支编译安装,则会在`vllm-mindspore/install_depend_pkgs`目录下生成`mindformers-os`文件夹,此时环境变量`MF_PATH`需调整为`$(pwd)/install_depend_pkgs/mindformers-os`。
+
### 快速验证
用户可以创建一个简单的离线推理场景,验证安装是否成功。下面以[Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) 为例,用户可以使用如下Python脚本,进行模型的离线推理:
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
index 16e5663c23438fdcae4f47ad5917b7967c7a031c..aca28df54978cbbce3a4afa8a545a3e077771e84 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/quick_start/quick_start.md
@@ -125,7 +125,7 @@ git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct
```bash
export ASCEND_TOTAL_MEMORY_GB=64 # Please use `npu-smi info` to check the memory.
export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
-export vLLM_MODEL_MEMORY_USE_GB=16 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
+export vLLM_MODEL_MEMORY_USE_GB=32 # Memory reserved for model execution. Set according to the model's maximum usage, with the remaining environment used for kvcache allocation
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
```
@@ -202,24 +202,24 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0
#### 发送请求
-使用如下命令发送请求。其中`$PROMPT`为模型输入:
+使用如下命令发送请求。其中`prompt`字段为模型输入:
```bash
PROMPT="I am"
-curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "$PROMPT", "max_tokens": 120, "temperature": 0}'
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
```
若请求处理成功,将获得以下的推理结果:
```text
{
- "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
- "create":1747398389,
+ "id":"cmpl-bac2b14c726b48b9967bcfc724e7c2a8","object":"text_completion",
+ "create":1748485893,
"model":"Qwen2.5-7B-Instruct",
"choices":[
{
"index":0,
- "trying to create a virtual environment for my Python project, but I am encountering some",
+ "trying to create a virtual environment for my Python project, but I am encountering some issues with setting up",
"logprobs":null,
"finish_reason":"length",
"stop_reason":null,
@@ -228,8 +228,8 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d
],
"usage":{
"prompt_tokens":2,
- "total_tokens":17,
- "completion_tokens":15,
+ "total_tokens":22,
+ "completion_tokens":20,
"prompt_tokens_details":null
}
}
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_multiNode/deepseek_r1_671b_w8a8_tp16_multi_node.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_multiNode/deepseek_r1_671b_w8a8_tp16_multi_node.md
index 22a74c910e2d043f8b4ca8214b3edd904ac35189..af563414a8a993bc6ba58f27887779c7999a34ca 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_multiNode/deepseek_r1_671b_w8a8_tp16_multi_node.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/deepseek_multiNode/deepseek_r1_671b_w8a8_tp16_multi_node.md
@@ -86,7 +86,7 @@ git clone https://modelers.cn/MindSpore-Lab/DeepSeek-R1-W8A8.git
分别在主从节点配置如下环境变量:
-> 注:环境变量必须设置在 Ray 创建集群前,且当环境有变更时,需要通过 `ray stop` 将主从节点集群停止,并重新创建集群,否则环境变量将不生效。
+> 环境变量必须设置在 Ray 创建集群前,且当环境有变更时,需要通过 `ray stop` 将主从节点集群停止,并重新创建集群,否则环境变量将不生效。
```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
index 5484b56c404d568dbbec91c4afb4ffbbee223285..4e4fbdc76af5f1d84109fc08b5eec8ffaaa52710 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_32b_multiNPU/qwen2.5_32b_multiNPU.md
@@ -171,25 +171,23 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0
### 发送请求
-使用如下命令发送请求。其中`$PROMPT`为模型输入:
+使用如下命令发送请求。其中`prompt`字段为模型输入:
```bash
-PROMPT="请介绍Qwen2.5-32B模型"
-MAX_TOKEN=64
-curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "$PROMPT", "max_tokens": $MAX_TOKEN, "temperature": 0}'
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen2.5-32B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
```
若请求处理成功,将获得以下的推理结果:
```text
{
- "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
- "create":1747398389,
+ "id":"cmpl-11fe2898c77d4ff18c879f57ae7aa9ca","object":"text_completion",
+ "create":1748568696,
"model":"Qwen2.5-32B-Instruct",
"choices":[
{
"index":0,
- "text":"的使用方法\nQwen2.5-32B 是一个大型的自然语言处理模型,通常用于生成文本、回答问题、进行对话等任务。以下是使用 Qwen2.5-32B 模型的一般步骤:\n\n### 1. 环境准备\n-",
+ "text":"trying to create a virtual environment in Python using venv, but I am encountering some issues with setting",
"logprobs":null,
"finish_reason":"length",
"stop_reason":null,
@@ -197,9 +195,9 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d
}
],
"usage":{
- "prompt_tokens":12,
- "total_tokens":76,
- "completion_tokens":64,
+ "prompt_tokens":2,
+ "total_tokens":22,
+ "completion_tokens":20,
"prompt_tokens_details":null
}
}
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
index 72ca0cc4cf8532445a51add3cfc9123532679382..75cafa056664a2a4f2ae587ee45c5f73633fe648 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
+++ b/docs/vllm_mindspore/docs/source_zh_cn/getting_started/tutorials/qwen2.5_7b_singleNPU/qwen2.5_7b_singleNPU.md
@@ -206,25 +206,23 @@ Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg gereration throughput: 0.0
#### 发送请求
-使用如下命令发送请求。其中`$PROMPT`为模型输入:
+使用如下命令发送请求。其中`prompt`字段为模型输入:
```bash
-PROMPT="I am"
-MAX_TOKEN=120
-curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "$PROMPT", "max_tokens": $MAX_TOKEN, "temperature": 0}'
+curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "I am", "max_tokens": 20, "temperature": 0}'
```
若请求处理成功,将获得以下的推理结果:
```text
{
- "id":"cmpl-5e6e314861c24ba79fea151d86c1b9a6","object":"text_completion",
- "create":1747398389,
+ "id":"cmpl-bac2b14c726b48b9967bcfc724e7c2a8","object":"text_completion",
+ "create":1748485893,
"model":"Qwen2.5-7B-Instruct",
"choices":[
{
"index":0,
- "trying to create a virtual environment for my Python project, but I am encountering some",
+ "trying to create a virtual environment for my Python project, but I am encountering some issues with setting up",
"logprobs":null,
"finish_reason":"length",
"stop_reason":null,
@@ -233,8 +231,8 @@ curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d
],
"usage":{
"prompt_tokens":2,
- "total_tokens":17,
- "completion_tokens":15,
+ "total_tokens":22,
+ "completion_tokens":20,
"prompt_tokens_details":null
}
}
diff --git a/docs/vllm_mindspore/docs/source_zh_cn/index.rst b/docs/vllm_mindspore/docs/source_zh_cn/index.rst
index 82db28882a851b46832039034a8691bd37c3059b..9aff69b749ff52266e31c61b1a5927d8a5cfc36a 100644
--- a/docs/vllm_mindspore/docs/source_zh_cn/index.rst
+++ b/docs/vllm_mindspore/docs/source_zh_cn/index.rst
@@ -28,7 +28,7 @@ vLLM MindSpore插件以将MindSpore大模型接入vLLM,并实现服务化部
-vLLM MindSpore采用vLLM社区推荐的插件机制,实现能力注册。未来期望遵循 `RPC Multi-framework support for vllm `_ 所述原则,推动上游vLLM社区通过抽象和解耦AI框架,支持接入包括PaddlePaddle、JAX等多类型AI框架推理能力。
+vLLM MindSpore采用vLLM社区推荐的插件机制,实现能力注册。未来期望遵循 `RPC Multi-framework support for vllm `_ 所述原则。
代码仓地址:
@@ -124,7 +124,7 @@ Apache 许可证 2.0,如 `LICENSE 注意,检测结果的输出值为`list`格式,其中元素值格式为`(文件名, 报错单元, 报错单元行, 报错码, 报错信息)`
+> 检测结果的输出值为`list`格式,其中元素值格式为`(文件名, 报错单元, 报错单元行, 报错码, 报错信息)`
## 环境准备
diff --git a/tutorials/source_en/beginner/introduction.md b/tutorials/source_en/beginner/introduction.md
index e71e0663fda95680120813f397d78ad4dd7ed86e..0acb1e3db4238ff29068c75090455ad541cd14f1 100644
--- a/tutorials/source_en/beginner/introduction.md
+++ b/tutorials/source_en/beginner/introduction.md
@@ -52,4 +52,4 @@ Welcome every developer to the MindSpore community and contribute to this all-sc
- [MindSpore GitHub](https://github.com/mindspore-ai/mindspore): MindSpore code image of Gitee. Developers who are accustomed to using GitHub can learn MindSpore and view the latest code implementation here.
-- **MindSpore forum**: We are dedicated to serving every developer. You can find your voice in MindSpore, regardless of whether you are an entry-level developer or a master. Let's learn and grow together. ([Learn more](https://www.hiascend.com/forum/forum-0106101385921175002-1.html))
+- **MindSpore forum**: We are dedicated to serving every developer. You can find your voice in MindSpore, regardless of whether you are an entry-level developer or a master. Let's learn and grow together. ([Learn more](https://discuss.mindspore.cn/))
diff --git a/tutorials/source_en/compile/static_graph.md b/tutorials/source_en/compile/static_graph.md
index 17637f1e4a808f1e15005e23d51b472e03683a40..b90993c9cc9f7f94786c01f016a99906b8df1fc1 100644
--- a/tutorials/source_en/compile/static_graph.md
+++ b/tutorials/source_en/compile/static_graph.md
@@ -1146,6 +1146,334 @@ The results are as follows:
ret:(Tensor(shape=[1], dtype=Int64, value= [1]), Tensor(shape=[1], dtype=Int64, value= [1]))
```
+### View and In-place Operations
+
+Graph Mode supports view operations and in-place operations on tensors, along with gradient calculations.
+
+#### Supported View Operations
+
+View operations create new tensors that share the same underlying data storage as the original tensors but have different shapes or layouts. In other words, the view operation does not copy data, but rather interprets the existing data from a different perspective, avoiding unnecessary memory allocation and data copying.
+
+`Ascend` devices are supported. When compiling with `mindspore.jit`, both `jit_level="O0"` and `jit_level="O1"` are supported.
+
+View operations and gradient computations in Graph Mode produce results consistent with PyNative Mode.
+
+Example:
+
+```python
+import numpy as np
+import mindspore
+from mindspore import nn, mint
+from mindspore import grad
+
+
+class Net(nn.Cell):
+ def construct(self, x):
+ out = mint.narrow(x, 1, 1, 2)
+ return out
+
+
+net = Net()
+np_x = np.arange(9).reshape(3, 3).astype(np.float32)
+x = mindspore.tensor(np_x)
+
+pynative_out = net(x)
+pynative_grad_out = grad(net)(x)
+
+net.construct = mindspore.jit(net.construct, backend='ms_backend')
+graph_out = net(x)
+graph_grad_out = grad(net)(x)
+
+assert (graph_out == pynative_out).all()
+assert (graph_grad_out == pynative_grad_out).all()
+```
+
+#### Supported In-place Operations
+
+In-place operations modify the input tensor directly without creating new tensors, which reduces memory overhead, especially for high-dimensional data.
+
+The following Tensor and Parameter examples illustrate in-place operations and their differentiation in Graph Mode.
+
+- Tensor usage
+
+ ```python
+ import mindspore
+ from mindspore import nn
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, x, y):
+ x.add_(y)
+ return x
+
+
+ x = mindspore.tensor(2, dtype=mindspore.int32)
+ y = mindspore.tensor(3, dtype=mindspore.int32)
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend='ms_backend')
+ graph_out = net(x, y)
+ graph_grad_out = grad(net)(x, y)
+ print("graph_out: ", graph_out)
+ print("graph_grad_out: ", graph_grad_out)
+ ```
+
+ Graph Mode output:
+
+ ```text
+ graph_out: 5
+ graph_grad_out: 1
+ ```
+
+  Graph Mode ensures correct gradient computation for in-place operations, because global information about the graph is available for optimization.
+
+ In PyNative Mode, modifications to the forward tensor will affect backpropagation, which may cause errors in automatic differentiation, so use it with caution.
+
+ ```python
+ import mindspore
+ from mindspore import nn
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, x, y):
+ x.add_(y)
+ return x
+
+
+ x = mindspore.tensor(2, dtype=mindspore.int32)
+ y = mindspore.tensor(3, dtype=mindspore.int32)
+ net = Net()
+ pynative_out = net(x, y)
+ pynative_grad_out = grad(net)(x, y)
+ print("pynative_out: ", pynative_out)
+ print("pynative_grad_out: ", pynative_grad_out)
+ ```
+
+ PyNative Mode error:
+
+ ```text
+ pynative_out: 5
+ RuntimeError: A leaf tensor that requires grad is being used in an inplace operator, InplaceAddExt, which is forbidden!
+ ```
+
+- Parameter usage
+
+ ```python
+ import mindspore
+  from mindspore import nn, Parameter, ParameterTuple
+ from mindspore import dtype as mstype
+ from mindspore import ops
+
+
+ class GradOfAllInputsAndParams(nn.Cell):
+ def __init__(self, net):
+ super(GradOfAllInputsAndParams, self).__init__()
+ self.net = net
+ self.params = ParameterTuple(net.trainable_params())
+ self.grad_op = ops.GradOperation(get_all=True, get_by_list=True)
+
+ def construct(self, x, y):
+ gradient_function = self.grad_op(self.net, self.params)
+ return gradient_function(x, y)
+
+
+ class Net(nn.Cell):
+ def __init__(self):
+ super(Net, self).__init__()
+ self.param1 = Parameter(mindspore.tensor([1], dtype=mstype.float32), name="param1")
+ self.param2 = Parameter(mindspore.tensor([1], dtype=mstype.float32), name="param2")
+
+ def construct(self, x, y):
+ out = self.param1 + self.param2 + x + y
+ out = out * x
+ return out.add_(y)
+
+
+ x = mindspore.tensor([1], dtype=mstype.float32)
+ y = mindspore.tensor([2], dtype=mstype.float32)
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend='ms_backend')
+ graph_out = net(x, y)
+ graph_grad_out = GradOfAllInputsAndParams(net)(x, y)
+ print("graph_out: ", graph_out)
+ print("graph_grad_out: ", graph_grad_out)
+ ```
+
+ Output:
+
+ ```text
+ graph_out: [7.]
+ graph_grad_out: ((Tensor(shape=[1], dtype=Float32, value= [ 6.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00])), (Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00])))
+ ```
+
+#### Supported View + In-place Scenarios
+
+Combining view and in-place operations improves memory efficiency and computational speed, especially for large tensors or resource-constrained environments.
+
+- Explicit view + in-place
+
+ Example:
+
+ ```python
+ import numpy as np
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops
+
+
+ class ViewOut(nn.Cell):
+ def __init__(self):
+ super(ViewOut, self).__init__()
+ self.transpose = ops.operations.TransposeView()
+ self.assign = ops.operations.Assign()
+
+ @mindspore.jit
+ def construct(self, x):
+ x = self.transpose(x, (0, 1, 2))
+ self.assign(x, x * 2)
+ return x * 3
+
+
+ x1 = mindspore.tensor(np.array([[[1, 0, 0, 0], [0, 0, 0, 0], [-1, -1, 0, -1]],
+ [[0, -1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]]), mindspore.int32)
+ net = ViewOut()
+ net.construct = mindspore.jit(net.construct, backend='ms_backend')
+ out_graph = net(x1)
+
+ x2 = mindspore.tensor(np.array([[[1, 0, 0, 0], [0, 0, 0, 0], [-1, -1, 0, -1]],
+ [[0, -1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]]), mindspore.int32)
+ x2.transpose((0, 1, 2))
+ x2 += x2
+ z = x2 * 3
+ assert np.allclose(out_graph.asnumpy(), z.asnumpy(), rtol=10e-4, atol=10e-4)
+ ```
+
+- Tensor indexing scenario
+
+  Enabled via the `MS_DEV_TENSOR_INDEX_BOOST` environment variable (see the [MindSpore environment variables documentation](https://www.mindspore.cn/docs/en/master/api_python/env_var_list.html)).
+
+  `Ascend` devices are supported. When compiling with `mindspore.jit`, both `jit_level="O0"` and `jit_level="O1"` are supported.
+
+ Example:
+
+ ```python
+ import numpy as np
+ import mindspore
+
+
+ @mindspore.jit(capture_mode='ast', jit_level="O0", backend="ms_backend")
+ def func(ms_x):
+ ms_x[slice(0, 1)] = -1
+ return ms_x
+
+
+ np_x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
+ ms_x = mindspore.tensor(np_x)
+
+ ms_output = func(ms_x)
+ np_x[slice(0, 1)] = -1
+ assert np.allclose(np_x, ms_output.asnumpy())
+ ```
+
+#### Gradient Support for View + In-place
+
+- Supported cases
+
+  Gradient propagation relies on the connections between graph nodes, which view and in-place operations can alter.
+
+  When the computation graph contains view and in-place operators, backpropagation is supported as long as gradients can be propagated correctly through the graph-node representation. Otherwise, the scenario is currently unsupported, and a corresponding error message is raised.
+
+ Example:
+
+ ```python
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops, mint
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, x):
+ y = ops.abs(x)
+ y_viewed = mint.select(y, 0, 0)
+ y_viewed.add_(mindspore.tensor(-1, dtype=mindspore.float32))
+ return y
+
+
+ x = mindspore.tensor([[0, 1], [2, 3]], dtype=mindspore.float32)
+ net = Net()
+ out_expect = grad(net)(x)
+ net.construct = mindspore.jit(net.construct, backend="ms_backend")
+ out_jit = grad(net)(x)
+ assert (out_expect.asnumpy() == out_jit.asnumpy()).all()
+ ```
+
+- Unsupported cases (throw errors)
+
+ 1. Gradient required for non-modified inputs of in-place ops.
+
+ ```python
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops, mint
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, input_tensor):
+ input_abs = ops.abs(input_tensor)
+ m = mint.select(input_abs, 0, 0)
+ n = mint.select(input_abs, 0, 1)
+ m.mul_(n)
+ return input_abs
+
+
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend="ms_backend")
+ out_jit = grad(net)(mindspore.tensor([3, 4]))
+ ```
+
+ Error:
+
+ ```text
+ RuntimeError: When performing an in-place operation on an object generated by a view operation, it is currently not supported to compute gradients for the other inputs of this in-place operator.
+ ```
+
+ Both `m` and `n` are view outputs derived from `input_abs`. In this in-place operation, gradient propagation is required for both inputs `m` and `n`. However, since `n` is not being modified, the current implementation does not support this scenario, and therefore the operation is intercepted with an error.
+
+ 2. Returning view results or depending on view results.
+
+ ```python
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops, mint
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, input_tensor):
+ input_abs = ops.abs(input_tensor)
+ x = mint.select(input_abs, 0, 0)
+ y = mint.select(input_abs, 0, 1)
+ x.add_(2)
+ y.add_(3)
+ return x
+
+
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend="ms_backend")
+ out_jit = grad(net)(mindspore.tensor([3, 4]))
+ ```
+
+ Error:
+
+ ```text
+ RuntimeError: The current view inplace differentiation scenario is not supported.
+ ```
+
+ The network's return value `x` is a view output derived from `input_abs`. The current implementation does not support this scenario, and therefore the operation is intercepted with an error.
+
## Syntax Constraints of Basic Syntaxes
The execution graph in graph mode is converted from source code, and not all Python syntax can support it. The following describes some of the
diff --git a/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md b/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md
index b0519e206848e1457fe5532881edde323fb7b062..87de3e5117c22037165776caa9d3277d81b69ea6 100644
--- a/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md
+++ b/tutorials/source_en/custom_program/operation/op_customopbuilder_function.md
@@ -227,7 +227,7 @@ Here the custom operator is called in the script via `self.my_ops.mul(x, y)`, wh
Run the above script to get the results:
-```txt
+```text
out: 12.0
grads[0]: (Tensor(shape=[], dtype=Float32, value= 6), Tensor(shape=[], dtype=Float32, value= 4))
grads[1]: (Tensor(shape=[], dtype=Float32, value= 6),)
diff --git a/tutorials/source_zh_cn/beginner/introduction.ipynb b/tutorials/source_zh_cn/beginner/introduction.ipynb
index 5ed3cde13283776e8f9ac8c77aac186f43e0ff99..83353916c608310ed417ac56780a58b015229d85 100644
--- a/tutorials/source_zh_cn/beginner/introduction.ipynb
+++ b/tutorials/source_zh_cn/beginner/introduction.ipynb
@@ -67,7 +67,7 @@
"\n",
" - [MindSpore GitHub](https://github.com/mindspore-ai/mindspore):Gitee的MindSpore代码镜像,习惯用GitHub的开发者可以在这里进行MindSpore的学习,查看最新代码实现!\n",
"\n",
- "- **昇思MindSpore Forum**: we strive to serve every developer well. In 昇思MindSpore, beginners and experts alike can find kindred spirits to learn and grow with! ([Learn more](https://www.hiascend.com/forum/forum-0106101385921175002-1.html))"
+ "- **昇思MindSpore Forum**: we strive to serve every developer well. In 昇思MindSpore, beginners and experts alike can find kindred spirits to learn and grow with! ([Learn more](https://discuss.mindspore.cn/))"
]
}
],
diff --git a/tutorials/source_zh_cn/compile/static_graph.md b/tutorials/source_zh_cn/compile/static_graph.md
index 2a683c3807216f745354feba95b033eeb5fd7a94..e01d7ba6a286c75a773cb3f05ca63ab6c005ca70 100644
--- a/tutorials/source_zh_cn/compile/static_graph.md
+++ b/tutorials/source_zh_cn/compile/static_graph.md
@@ -1056,6 +1056,334 @@ print('ret:{}'.format(ret))
ret:(Tensor(shape=[1], dtype=Int64, value= [1]), Tensor(shape=[1], dtype=Int64, value= [1]))
```
+### View and In-place Features
+
+Graph Mode supports view and in-place operations on Tensors, as well as differentiation through them.
+
+#### Support for View Operations
+
+A view operation creates a new tensor that shares the data storage of the original tensor but has a different shape or arrangement. In other words, a view does not copy data; it interprets the existing data from a different perspective, avoiding unnecessary memory allocation and data copying.
+
+View operations are supported on `Ascend` devices; when compiling with `mindspore.jit`, both `jit_level=O0` and `jit_level=O1` are supported.
+
+View operations and their differentiation produce the same results in Graph Mode and PyNative Mode.
+
+An example is shown below:
+
+```python
+import numpy as np
+import mindspore
+from mindspore import nn, mint
+from mindspore import grad
+
+
+class Net(nn.Cell):
+ def construct(self, x):
+ out = mint.narrow(x, 1, 1, 2)
+ return out
+
+
+net = Net()
+np_x = np.arange(9).reshape(3, 3).astype(np.float32)
+x = mindspore.tensor(np_x)
+
+pynative_out = net(x)
+pynative_grad_out = grad(net)(x)
+
+net.construct = mindspore.jit(net.construct, backend='ms_backend')
+graph_out = net(x)
+graph_grad_out = grad(net)(x)
+
+assert (graph_out == pynative_out).all()
+assert (graph_grad_out == pynative_grad_out).all()
+```
+
+#### Support for In-place Operations
+
+An in-place operation modifies the content of the input tensor directly, without creating a new tensor. Its advantage is memory savings: especially when processing high-dimensional data, it can significantly reduce extra memory overhead.
+
+The following Tensor and Parameter examples illustrate how Graph Mode supports in-place operations and their differentiation.
+
+- Tensor example
+
+ ```python
+ import mindspore
+ from mindspore import nn
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, x, y):
+ x.add_(y)
+ return x
+
+
+ x = mindspore.tensor(2, dtype=mindspore.int32)
+ y = mindspore.tensor(3, dtype=mindspore.int32)
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend='ms_backend')
+ graph_out = net(x, y)
+ graph_grad_out = grad(net)(x, y)
+ print("graph_out: ", graph_out)
+ print("graph_grad_out: ", graph_grad_out)
+ ```
+
+    Graph Mode outputs the correct result:
+
+ ```text
+ graph_out: 5
+ graph_grad_out: 1
+ ```
+
+    In Graph Mode, more global information is available, so the correctness of differentiation through in-place operations can be guaranteed.
+
+    In PyNative Mode, modifying a forward tensor affects backpropagation and can raise errors during automatic differentiation, so in-place operations should be used with caution there; a workaround sketch is given after these examples.
+
+ ```python
+ import mindspore
+ from mindspore import nn
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, x, y):
+ x.add_(y)
+ return x
+
+
+ x = mindspore.tensor(2, dtype=mindspore.int32)
+ y = mindspore.tensor(3, dtype=mindspore.int32)
+ net = Net()
+ pynative_out = net(x, y)
+ pynative_grad_out = grad(net)(x, y)
+ print("pynative_out: ", pynative_out)
+ print("pynative_grad_out: ", pynative_grad_out)
+ ```
+
+    In PyNative Mode, differentiation fails with the following error:
+
+ ```text
+ pynative_out: 5
+ RuntimeError: A leaf tensor that requires grad is being used in an inplace operator, InplaceAddExt, which is forbidden!
+ ```
+
+- Parameter example
+
+ ```python
+ import mindspore
+    from mindspore import nn, Parameter, ParameterTuple
+ from mindspore import dtype as mstype
+ from mindspore import ops
+
+
+ class GradOfAllInputsAndParams(nn.Cell):
+ def __init__(self, net):
+ super(GradOfAllInputsAndParams, self).__init__()
+ self.net = net
+ self.params = ParameterTuple(net.trainable_params())
+ self.grad_op = ops.GradOperation(get_all=True, get_by_list=True)
+
+ def construct(self, x, y):
+ gradient_function = self.grad_op(self.net, self.params)
+ return gradient_function(x, y)
+
+
+ class Net(nn.Cell):
+ def __init__(self):
+ super(Net, self).__init__()
+ self.param1 = Parameter(mindspore.tensor([1], dtype=mstype.float32), name="param1")
+ self.param2 = Parameter(mindspore.tensor([1], dtype=mstype.float32), name="param2")
+
+ def construct(self, x, y):
+ out = self.param1 + self.param2 + x + y
+ out = out * x
+ return out.add_(y)
+
+
+ x = mindspore.tensor([1], dtype=mstype.float32)
+ y = mindspore.tensor([2], dtype=mstype.float32)
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend='ms_backend')
+ graph_out = net(x, y)
+ graph_grad_out = GradOfAllInputsAndParams(net)(x, y)
+ print("graph_out: ", graph_out)
+ print("graph_grad_out: ", graph_grad_out)
+ ```
+
+    The correct result is output:
+
+ ```text
+ graph_out: [7.]
+ graph_grad_out: ((Tensor(shape=[1], dtype=Float32, value= [ 6.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00])), (Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00]), Tensor(shape=[1], dtype=Float32, value= [ 1.00000000e+00])))
+ ```
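+
+    As a quick check: the network computes `out = (param1 + param2 + x + y) * x + y`, so the gradient with respect to `x` is `param1 + param2 + 2 * x + y = 6`, the gradient with respect to `y` is `x + 1 = 2`, and the gradient with respect to each parameter is `x = 1`, matching the printed values.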
+
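+The PyNative restriction on in-place updates of leaf tensors can usually be avoided by operating on a copy of the input. The following minimal sketch assumes that `mint.clone` creates a differentiable, non-leaf copy that is safe to modify in place:
+
+```python
+import mindspore
+from mindspore import nn, mint
+from mindspore import grad
+
+
+class Net(nn.Cell):
+    def construct(self, x, y):
+        # Clone the leaf input first; the in-place add then modifies
+        # the copy instead of the leaf tensor.
+        out = mint.clone(x)
+        out.add_(y)
+        return out
+
+
+x = mindspore.tensor(2.0, dtype=mindspore.float32)
+y = mindspore.tensor(3.0, dtype=mindspore.float32)
+net = Net()
+print("pynative_out: ", net(x, y))
+print("pynative_grad_out: ", grad(net)(x, y))
+```
+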
+#### Support for View Inplace Scenarios
+
+Combining view and in-place operations appropriately can significantly improve memory efficiency and computation speed. This is useful when processing large tensors, deploying in resource-constrained environments, and running compute-intensive workloads. The view inplace scenarios supported by Graph Mode are described below.
+
+- Explicit view inplace scenario
+
+    An example is shown below:
+
+ ```python
+ import numpy as np
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops
+
+
+ class ViewOut(nn.Cell):
+ def __init__(self):
+ super(ViewOut, self).__init__()
+ self.transpose = ops.operations.TransposeView()
+ self.assign = ops.operations.Assign()
+
+ @mindspore.jit
+ def construct(self, x):
+            x = self.transpose(x, (0, 1, 2))  # view of x: no data copy
+            self.assign(x, x * 2)             # in-place update through the view
+            return x * 3
+
+
+ x1 = mindspore.tensor(np.array([[[1, 0, 0, 0], [0, 0, 0, 0], [-1, -1, 0, -1]],
+ [[0, -1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]]), mindspore.int32)
+ net = ViewOut()
+ net.construct = mindspore.jit(net.construct, backend='ms_backend')
+ out_graph = net(x1)
+
+ x2 = mindspore.tensor(np.array([[[1, 0, 0, 0], [0, 0, 0, 0], [-1, -1, 0, -1]],
+ [[0, -1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]]), mindspore.int32)
+    x2.transpose((0, 1, 2))  # identity transpose; its result is unused, mirroring the view op
+    x2 += x2                 # eager equivalent of the in-place assign (x * 2)
+    z = x2 * 3
+ assert np.allclose(out_graph.asnumpy(), z.asnumpy(), rtol=10e-4, atol=10e-4)
+ ```
+
+- Tensor indexing scenario
+
+    When `MS_DEV_TENSOR_INDEX_BOOST` is enabled, Tensor indexing is implemented with view and in-place operators, which improves the execution efficiency of indexing operations; see [Environment Variables](https://www.mindspore.cn/docs/zh-CN/master/api_python/env_var_list.html#%E5%9B%BE%E7%BC%96%E8%AF%91%E6%89%A7%E8%A1%8C) for details. A further sketch with integer indexing follows after this example.
+
+    This is supported on `Ascend` devices; when compiling with `mindspore.jit`, both `jit_level=O0` and `jit_level=O1` are supported.
+
+    An example is shown below:
+
+ ```python
+ import numpy as np
+ import mindspore
+
+
+ @mindspore.jit(capture_mode='ast', jit_level="O0", backend="ms_backend")
+ def func(ms_x):
+ ms_x[slice(0, 1)] = -1
+ return ms_x
+
+
+ np_x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
+ ms_x = mindspore.tensor(np_x)
+
+ ms_output = func(ms_x)
+ np_x[slice(0, 1)] = -1
+ assert np.allclose(np_x, ms_output.asnumpy())
+ ```
+
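+Indexing with integers is expected to go through the same view and in-place machinery. The following sketch is an assumption extending the example above (it is not a documented case) and mirrors the slice-based example with an integer tuple index:
+
+```python
+import numpy as np
+import mindspore
+
+
+@mindspore.jit(capture_mode='ast', jit_level="O0", backend="ms_backend")
+def func(ms_x):
+    # Assign through an integer tuple index.
+    ms_x[1, 2] = -1
+    return ms_x
+
+
+np_x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
+ms_x = mindspore.tensor(np_x)
+
+ms_output = func(ms_x)
+np_x[1, 2] = -1
+assert np.allclose(np_x, ms_output.asnumpy())
+```
+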
+#### Limited Support for Differentiation in View Inplace Scenarios
+
+- Supported view inplace differentiation
+
+    In Graph Mode automatic differentiation, gradient propagation relies on the connections between graph nodes. View and in-place operators modify data in place, which changes these connections and thus affects how gradients propagate.
+
+    When a computation graph contains view and in-place operators, differentiation of the view inplace scenario is supported as long as gradients can still be propagated correctly through the graph-node representation; otherwise it is currently unsupported, and a corresponding error is reported.
+
+    An example is shown below:
+
+ ```python
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops, mint
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, x):
+ y = ops.abs(x)
+ y_viewed = mint.select(y, 0, 0)
+ y_viewed.add_(mindspore.tensor(-1, dtype=mindspore.float32))
+ return y
+
+
+ x = mindspore.tensor([[0, 1], [2, 3]], dtype=mindspore.float32)
+ net = Net()
+ out_expect = grad(net)(x)
+ net.construct = mindspore.jit(net.construct, backend="ms_backend")
+ out_jit = grad(net)(x)
+ assert (out_expect.asnumpy() == out_jit.asnumpy()).all()
+ ```
+
+- View inplace differentiation errors
+
+    1. When an input of an in-place operator that is not modified needs to propagate gradients, the current implementation does not support it and a `RuntimeError` is thrown; a workaround sketch is given after this list. An example is shown below:
+
+ ```python
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops, mint
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, input_tensor):
+ input_abs = ops.abs(input_tensor)
+ m = mint.select(input_abs, 0, 0)
+ n = mint.select(input_abs, 0, 1)
+ m.mul_(n)
+ return input_abs
+
+
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend="ms_backend")
+ out_jit = grad(net)(mindspore.tensor([3, 4]))
+ ```
+
+        The resulting error is:
+
+ ```text
+ RuntimeError: When performing an in-place operation on an object generated by a view operation, it is currently not supported to compute gradients for the other inputs of this in-place operator.
+ ```
+
+        Both `m` and `n` are view outputs of `input_abs`. Gradients propagate through both inputs `m` and `n` of the in-place operation, but `n` is not modified. The current implementation does not support this scenario, so it is intercepted and an error is reported.
+
+    2. When the network's return value is the result of a view operation, or depends on the result of a view operation, the current implementation does not support it and a `RuntimeError` is thrown. An example is shown below:
+
+ ```python
+ import mindspore
+ import mindspore.nn as nn
+ from mindspore import ops, mint
+ from mindspore import grad
+
+
+ class Net(nn.Cell):
+ def construct(self, input_tensor):
+ input_abs = ops.abs(input_tensor)
+ x = mint.select(input_abs, 0, 0)
+ y = mint.select(input_abs, 0, 1)
+ x.add_(2)
+ y.add_(3)
+ return x
+
+
+ net = Net()
+ net.construct = mindspore.jit(net.construct, backend="ms_backend")
+ out_jit = grad(net)(mindspore.tensor([3, 4]))
+ ```
+
+        The resulting error is:
+
+ ```text
+ RuntimeError: The current view inplace differentiation scenario is not supported.
+ ```
+
+        The network's return value `x` is a view output of `input_abs`; the current implementation does not support this scenario, so it is intercepted and an error is reported.
+
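+For the first error case above, the pattern typically becomes supported when the unmodified operand carries no gradient, for example when the view is updated in place with a constant and the base tensor is returned, as in the supported example earlier. The following minimal sketch rests on that assumption:
+
+```python
+import mindspore
+import mindspore.nn as nn
+from mindspore import ops, mint
+from mindspore import grad
+
+
+class Net(nn.Cell):
+    def construct(self, input_tensor):
+        input_abs = ops.abs(input_tensor)
+        m = mint.select(input_abs, 0, 0)
+        # The scalar operand needs no gradient, so only the modified
+        # view participates in differentiation.
+        m.mul_(2.0)
+        return input_abs
+
+
+net = Net()
+net.construct = mindspore.jit(net.construct, backend="ms_backend")
+out_jit = grad(net)(mindspore.tensor([3.0, 4.0], dtype=mindspore.float32))
+print(out_jit)
+```
+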
## Syntax Constraints of Basic Syntaxes
The execution graph in graph mode is converted from source code, and not all Python syntax is supported. The following describes some of the syntax constraints under the basic syntaxes. For more network compilation issues, see [Network Compilation](https://www.mindspore.cn/docs/zh-CN/master/faq/network_compilation.html).
diff --git a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md
index 5a5be19907f69042d546270bed4fc877c68f80d7..07a50f209c8c65a6b46b7f1b5454011ab830d635 100644
--- a/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md
+++ b/tutorials/source_zh_cn/custom_program/operation/op_customopbuilder_function.md
@@ -227,7 +227,7 @@ print('grads[1]:', grads[1])
Run the above script to get the results:
-```txt
+```text
out: 12.0
grads[0]: (Tensor(shape=[], dtype=Float32, value= 6), Tensor(shape=[], dtype=Float32, value= 4))
grads[1]: (Tensor(shape=[], dtype=Float32, value= 6),)
diff --git a/tutorials/source_zh_cn/cv/vit.ipynb b/tutorials/source_zh_cn/cv/vit.ipynb
index a4f4c42ac18dabca4c7565c2804aa658865f0211..5434e1b25a674721c1fa38f4f6132fe5e7753449 100644
--- a/tutorials/source_zh_cn/cv/vit.ipynb
+++ b/tutorials/source_zh_cn/cv/vit.ipynb
@@ -47,7 +47,7 @@
"\n",
"下面将通过代码实例来详细解释基于ViT实现ImageNet分类任务。\n",
"\n",
- "> Note: this tutorial takes too long to run on a CPU, so running it on a CPU is not recommended."
+ "> This tutorial takes too long to run on a CPU, so running it on a CPU is not recommended."
]
},
{