From 710f12106abd7997396aeaa3256a2765213ea9c3 Mon Sep 17 00:00:00 2001
From: Xinrui Chen
Date: Thu, 18 Sep 2025 21:19:47 +0800
Subject: [PATCH] [MindFormers] Update overview.md

---
 .../docs/source_en/introduction/overview.md   | 104 +++++++++++++-----
 .../source_zh_cn/introduction/overview.md     | 100 ++++++++++++-----
 2 files changed, 148 insertions(+), 56 deletions(-)

diff --git a/docs/mindformers/docs/source_en/introduction/overview.md b/docs/mindformers/docs/source_en/introduction/overview.md
index 171efb5bad..d681e390bf 100644
--- a/docs/mindformers/docs/source_en/introduction/overview.md
+++ b/docs/mindformers/docs/source_en/introduction/overview.md
@@ -8,50 +8,96 @@ The overall architecture of MindSpore Transformers is as follows:
 
 ![/overall_architecture](./images/overall_architecture.png)
 
-The northbound API of MindSpore Transformers supports users integrating into their own training and inference platforms or open-source components, supporting Ascend's own technology stack while also actively embracing the open-source community, as follows:
+MindSpore Transformers both supports Ascend's proprietary technology stack and actively embraces the open-source community. Users may integrate it into their own training and inference platforms or open-source components, as detailed below:
 
-1. Training platforms: MindCluster, third-party platforms
-2. Service components: vLLM
-3. Communities: Modelers, Hugging Face
+1. Training platforms: [MindCluster](http://hiascend.com/software/mindcluster), third-party platforms
+2. Service components: [vLLM](https://www.mindspore.cn/mindformers/docs/en/master/guide/deployment.html)
+3. Communities: [Modelers](https://modelers.cn/), [Hugging Face](https://huggingface.co/)
 
 MindSpore Transformers Southbound is based on MindSpore+Ascend's large-scale model technology stack, leveraging the MindSpore framework combined with CANN to optimize Ascend hardware for compatibility, providing a high-performance model training and inference experience.
 
-MindSpore Transformers is mainly divided into the following modules:
+MindSpore Transformers is primarily divided into the following modules:
 
-1. Large model training and inference unified scheduling API: Provides a unified launcher script msrun_launcher.sh to uniformly execute the distributed training and inference processes of all models within the scheduling suite.
-2. Registration/configuration layer: Implements a factory class by interface type to enable the high-level interface layer to initialize the corresponding task interface and model interface according to the configuration.
-3. Large model library: Implements a high-performance large model library and basic Transformer interfaces, supporting both user-configurable custom model construction and custom development to accommodate various development scenarios.
-4. Dataset: Implements data loading encapsulation for large model training and fine-tuning tasks, natively supporting Hugging Face Datasets, Megatron datasets, and MindSpore's native MindRecord data support.
-5. Training Components: Implements the basic interfaces for the training process, including learning rate strategies, optimizers, training callbacks, and TrainOneStepWrapper interfaces.
-6. Tool Layer: Independent tool scripts currently provide data preprocessing, Hugging Face weight conversion, and benchmarking tool scripts.
-7. DFX (Design for X): Implements high-availability features such as fault diagnosis and fault monitoring to reduce the cost of recovering from training failures.
+1. Unified Training and Inference Scheduling: Provides the launch script `msrun_launcher.sh` to centrally execute and schedule the distributed training and inference processes for all models within the suite.
+2. Registration/Configuration Layer: Implements factory-like functionality by interface type, enabling higher-level interface layers to initialize corresponding task interfaces and model interfaces based on configuration (a conceptual sketch follows this list).
+3. Large Model Library: Offers a high-performance large model repository alongside foundational Transformer interfaces. This supports both user-configured model construction and custom development, catering to diverse development scenarios.
+4. Dataset: Encapsulates data loading interfaces for large model training and fine-tuning tasks, supporting Hugging Face datasets, Megatron datasets, and MindSpore's MindRecord datasets.
+5. Training Components: Provides foundational interfaces for training workflows, including learning rate strategies, optimizers, training callbacks, and training wrapper interfaces.
+6. Utility Layer: Offers data preprocessing tools, Hugging Face weight conversion utilities, and evaluation scripting tools.
+7. DFX (Design for X): Implements high-availability features such as fault diagnosis and monitoring, reducing the cost of recovery from training failures.
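+
+The registration mechanism can be pictured as a small name-to-class registry that a configuration drives. The following is a minimal, self-contained sketch of that idea; the class and function names are illustrative and are not the actual MindSpore Transformers interfaces:
+
+```python
+from typing import Callable, Dict, Type
+
+_MODEL_REGISTRY: Dict[str, Type] = {}
+
+def register_model(name: str) -> Callable[[Type], Type]:
+    """Record a model class under the name used in configuration files."""
+    def wrapper(cls: Type) -> Type:
+        _MODEL_REGISTRY[name] = cls
+        return cls
+    return wrapper
+
+@register_model("toy_gpt")
+class ToyGPTModel:
+    def __init__(self, hidden_size: int, num_layers: int):
+        self.hidden_size = hidden_size
+        self.num_layers = num_layers
+
+def build_from_config(config: dict):
+    """Look up the registered class named in the config and instantiate it."""
+    cls = _MODEL_REGISTRY[config["model_type"]]
+    return cls(**config["model_args"])
+
+model = build_from_config(
+    {"model_type": "toy_gpt", "model_args": {"hidden_size": 1024, "num_layers": 24}}
+)
+print(type(model).__name__, model.num_layers)  # ToyGPTModel 24
+```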
 
 ## Model Architecture
 
-MindSpore Transformers has adopted a brand-new model architecture in version 1.6.0 and later. In the previous architecture (labeled as Legacy), each model had its own set of model code, making maintenance and optimization challenging. The new architecture (labeled as Mcore) employs a layered abstraction and modular implementation for large-scale general-purpose Transformer architectures, encompassing lower-level foundational layers such as Linear, Embedding, and Norm, as well as upper-level components like MoELayer, TransformerBlock, and the unified model interface GPTModel (General PreTrained Model). All modular interfaces are deeply optimized for parallelism leveraging MindSpore’s parallel computing capabilities, providing high-performance, out-of-the-box interfaces. All highly encapsulated and integrated interfaces support flexible combination through the ModuleSpec mechanism for model construction.
+MindSpore Transformers adopted a completely new model architecture after version 1.6.0. The original architecture (labeled Legacy) required separate model code implementations for each model, making maintenance and optimization challenging. The new architecture (labeled Mcore) employs layered abstraction and modular implementation for large models based on the general Transformer architecture. This encompasses foundational layers such as Linear, Embedding, and Norm, alongside higher-level components including MoELayer, TransformerBlock, and the unified model interface GPTModel (General PreTrained Model). All modular interfaces leverage MindSpore's parallel capabilities for deep parallel optimization, providing high-performance, ready-to-use interfaces, and support flexible model construction through the ModuleSpec mechanism.
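+
+The sketch below illustrates the spec-driven idea with a stand-in `ModuleSpec` dataclass and toy layers; it mirrors the mechanism conceptually and does not reproduce the actual Mcore interfaces:
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, Type
+
+@dataclass
+class ModuleSpec:
+    module: Type                                   # class to instantiate
+    params: Dict[str, Any] = field(default_factory=dict)
+    submodules: Dict[str, "ModuleSpec"] = field(default_factory=dict)
+
+def build_module(spec: ModuleSpec):
+    """Recursively instantiate a module and its submodules from specs."""
+    children = {name: build_module(sub) for name, sub in spec.submodules.items()}
+    return spec.module(**spec.params, **children)
+
+class ToyNorm:
+    def __init__(self, eps: float = 1e-5):
+        self.eps = eps
+
+class ToyAttention:
+    def __init__(self, num_heads: int):
+        self.num_heads = num_heads
+
+class ToyTransformerLayer:
+    def __init__(self, input_norm, attention):
+        self.input_norm = input_norm
+        self.attention = attention
+
+# Swapping a submodule only requires editing the spec, not the layer code.
+layer_spec = ModuleSpec(
+    module=ToyTransformerLayer,
+    submodules={
+        "input_norm": ModuleSpec(ToyNorm, params={"eps": 1e-6}),
+        "attention": ModuleSpec(ToyAttention, params={"num_heads": 16}),
+    },
+)
+layer = build_module(layer_spec)
+print(type(layer.attention).__name__, layer.attention.num_heads)  # ToyAttention 16
+```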
 
 ## Training Capabilities
 
-MindSpore Transformer training offers a range of efficient and user-friendly features, as well as ecosystem collaboration capabilities, to assist users in achieving simplicity, efficiency, and stability during the pre-training and fine-tuning phases of large models. External capabilities include:
+MindSpore Transformers delivers efficient, stable, and user-friendly large-model training capabilities, covering both pre-training and fine-tuning scenarios while balancing performance and ecosystem compatibility. Core capabilities include:
 
-- Multi-dimensional hybrid parallelism, including data parallelism, model parallelism, optimizer parallelism, pipeline parallelism, sequence parallelism, context parallelism, and MoE expert parallelism;
-- Support for directly loading Megatron-LM multi-source mixed datasets during the pre-training phase, avoiding data migration issues across platforms and frameworks;
-- In the fine-tuning phase, it integrates Hugging Face ecosystem capabilities, supporting the use of Hugging Face SFT datasets, Hugging Face Tokenizer for data preprocessing, reading Hugging Face model configurations to instantiate models, and loading native Hugging Face Safetensors weights. Combined with zero-code, configuration-enabled low-parameter fine-tuning capabilities, it achieves efficient and convenient fine-tuning;
-- Supports automatic weight splitting and loading in distributed environments, eliminating the need for manual weight conversion during distributed strategy switching debugging, cluster scaling, and other scenarios, thereby facilitating efficient debugging and training;
-- Provides user-friendly and highly available features such as training status monitoring, fault recovery, anomaly skipping, and resume training from breakpoints, supporting testability, maintainability, and reliability during pre-training/fine-tuning processes;
-- Encapsulates high-performance basic interfaces, with interface design aligned with Megatron-LM and computational accuracy meeting standards. Combined with tutorials and documentation related to model migration and accuracy comparison, as well as the Cell-level dump tool provided by the Ascend toolchain, it achieves low-threshold, high-efficiency model migration and construction.
+**Multi-Dimensional Hybrid Parallel Training**
+
+Supports flexible combinations of multiple parallelization strategies, including data parallelism, model parallelism, optimizer parallelism, pipeline parallelism, sequence parallelism, context parallelism, and MoE expert parallelism, enabling efficient distributed training for large-scale models.
+
+**Support for Mainstream Open-Source Ecosystems**
+
+Pre-training phase: Direct loading of Megatron-LM multi-source hybrid datasets is supported, reducing data migration costs across platforms and frameworks.
+
+Fine-tuning phase: Deep integration with the Hugging Face ecosystem, supporting:
+
+- Use of Hugging Face SFT datasets;
+- Data preprocessing via Hugging Face Tokenizer;
+- Model instantiation by reading Hugging Face model configurations;
+- Loading of native Hugging Face Safetensors weights.
+
+Combined with zero-code, configuration-driven low-parameter fine-tuning, this enables efficient and streamlined fine-tuning (a data-preparation sketch follows).
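+
+On the Hugging Face side, the data preparation that MindSpore Transformers consumes can look like the sketch below; the dataset name, tokenizer, and column names are placeholders, and the YAML configuration that wires the result into a fine-tuning task is not shown:
+
+```python
+from datasets import load_dataset
+from transformers import AutoTokenizer
+
+# Placeholder SFT dataset and tokenizer identifiers.
+dataset = load_dataset("tatsu-lab/alpaca", split="train[:100]")
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
+
+def tokenize_example(example):
+    # Concatenate instruction and response into one training text.
+    text = example["instruction"] + "\n" + example["output"]
+    return tokenizer(text, truncation=True, max_length=1024)
+
+tokenized = dataset.map(tokenize_example, remove_columns=dataset.column_names)
+print(tokenized[0]["input_ids"][:8])
+```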
+
+**Model Weight Usability**
+
+Supports automatic weight partitioning and loading in distributed environments, eliminating the need for manual weight conversion. This significantly reduces debugging complexity during distributed strategy switching and cluster scaling operations, thereby enhancing training agility.
+
+**High Availability Training Assurance**
+
+Provides training status monitoring, rapid fault recovery, anomaly skipping, and resume-from-breakpoint capabilities, enhancing the testability, maintainability, and reliability of training tasks and ensuring stable operation during extended training runs.
+
+**Low-Threshold Model Migration**
+
+- Encapsulates high-performance foundational interfaces aligned with Megatron-LM design;
+- Provides model migration guides and accuracy comparison tutorials;
+- Supports the Ascend toolchain's Cell-level dump debugging capabilities;
+- Enables low-threshold, high-efficiency model migration and construction.
 
 ## Inference Capabilities
 
-MindSpore Transformers inference integrates with third-party open-source components, providing developers with richer inference deployment, quantization, and evaluation capabilities:
+MindSpore Transformers establishes an inference framework centered on "northbound ecosystem integration and southbound deep optimization". By leveraging open-source components, it delivers efficient and user-friendly deployment, quantization, and evaluation capabilities, thereby accelerating the development and application of large-model inference:
+
+**Northbound Ecosystem Integration**
+
+- **Hugging Face Ecosystem Reuse**
+
+  Supports direct loading of Hugging Face open-source model configuration files, weights, and tokenizers, enabling one-click inference startup directly from the configuration and lowering migration and deployment barriers.
+
+- **Integration with vLLM Service Framework**
+
+  Supports integration with the vLLM service framework for service-oriented inference deployment. Supports core features including Continuous Batch, Prefix Cache, and Chunked Prefill, significantly enhancing throughput and resource utilization (see the sketch after this list).
+
+- **Support for Quantization Inference**
+
+  Leveraging quantization algorithms provided by the MindSpore Golden-Stick quantization suite, Legacy models already support A16W8, A8W8, and A8W4 quantization inference; Mcore models are expected to support A8W8 and A8W4 quantization inference in the next release.
+
+- **Support for Open-Source Benchmark Evaluation**
+
+  Using the AISbench evaluation suite, models deployed via vLLM can be assessed across over 20 mainstream benchmarks, including CEval, GSM8K, and AIME.
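+
+As a minimal illustration of the vLLM integration, the offline API below follows standard vLLM usage; it assumes the vLLM MindSpore plugin and an Ascend environment are already set up, and the model identifier is a placeholder:
+
+```python
+from vllm import LLM, SamplingParams
+
+# Placeholder model; any Hugging Face-style model directory works the same way.
+llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
+params = SamplingParams(temperature=0.7, max_tokens=128)
+
+outputs = llm.generate(["What is MindSpore Transformers?"], params)
+print(outputs[0].outputs[0].text)
+```
+
+For service-oriented deployment, the same model can instead be exposed through vLLM's OpenAI-compatible server and queried over HTTP.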
+
+**Southbound Deep Optimization**
+
+- **Multi-Level Pipeline Operator Dispatch**
+
+  Leveraging the MindSpore framework's runtime capabilities, operator scheduling is decomposed into three pipelined tasks (InferShape, Resize, and Launch) on the host side. This fully utilizes host multi-threading parallelism to improve operator dispatch efficiency and reduce inference latency.
+
+- **Dynamic-Static Hybrid Execution Mode**
 
-- Supports direct loading and use of Hugging Face open-source configurations, weights, and tokenizers, enabling one-click inference startup;
-- Supports integration with vLLM service frameworks, enabling service-based inference deployment. Supports service-based features such as Continuous Batch, Prefix Cache, and Chunked Prefill;
-- Through the MindSpore Golden-Stick quantization suite, Legacy models can achieve A16W8, A8W8, and A8W4 quantization inference, while Mcore models are expected to support A8W8 and A8W4 quantization inference in the next version;
-- Through the AISbench evaluation suite, MindSpore Transformers models integrated with vLLM service-oriented architecture can achieve CEval, GSM8K, AIME, and other 20+ mainstream benchmark evaluations.
+  By default, the PyNative programming mode is combined with JIT compilation to compile models into static computation graphs for accelerated inference. One-click switching to the PyNative dynamic graph mode is supported for development and debugging.
 
-The Southbound API of MindSpore Transformers relies on the inference optimization capabilities provided by the MindSpore framework to achieve high-performance inference in southbound:
+- **Ascend High-Performance Operator Acceleration**
 
-- Relying on the multi-level pipeline dispatch feature provided by the framework Runtime, the operator scheduling is split into three tasks—InferShape, Resize, and Launch—on the host side for pipeline dispatch, fully utilizing the host's multi-threading resources to improve operator dispatch efficiency and achieve inference acceleration;
-- By default, it uses the PyNative programming mode + JIT (just-in-time) compilation technology to compile the model into a static computation graph for inference acceleration. It can also be switched to the PyNative dynamic graph mode with a single click for convenient development and debugging;
-- MindSpore Transformers supports the use of ACLNN, ATB, and MindSpore-provided inference acceleration/fusion operators to achieve more efficient inference performance on the Ascend platform.
\ No newline at end of file
+  Supports the use of inference acceleration and fusion operators provided by ACLNN, ATB, and MindSpore, achieving more efficient inference performance on Ascend platforms.
\ No newline at end of file
diff --git a/docs/mindformers/docs/source_zh_cn/introduction/overview.md b/docs/mindformers/docs/source_zh_cn/introduction/overview.md
index 223366e233..bebf7dfc45 100644
--- a/docs/mindformers/docs/source_zh_cn/introduction/overview.md
+++ b/docs/mindformers/docs/source_zh_cn/introduction/overview.md
@@ -8,50 +8,96 @@ MindSpore Transformers 整体架构如下:
 
 ![/overall_architecture](./images/overall_architecture.png)
 
-MindSpore Transformers 北向支持用户集成在自有训推平台或者开源组件中,支持昇腾自有技术栈外也积极拥抱开源社区,具体如下:
+MindSpore Transformers 北向既支持昇腾自有技术栈,也积极拥抱开源社区。用户可将其集成在自有训推平台或者开源组件中,具体如下:
 
-1. 训练平台:MindCluster、第三方平台
-2. 服务化组件:vLLM
-3. 社区:魔乐社区、Hugging Face
+1. 训练平台:[MindCluster](http://hiascend.com/software/mindcluster)、第三方平台
+2. 服务化组件:[vLLM](https://www.mindspore.cn/mindformers/docs/zh-CN/master/guide/deployment.html)
+3. 社区:[魔乐社区](https://modelers.cn/)、[Hugging Face](https://huggingface.co/)
 
 MindSpore Transformers 南向基于昇思+昇腾的大模型技术栈,利用昇思框架结合 CANN 对昇腾硬件进行亲和优化,提供高性能的模型训推体验。
 
-对于 MindSpore Transformers 本身,主要分为如下模块:
+MindSpore Transformers 主要分为如下模块:
 
-1. 大模型训练、推理统一调度入口层:提供统一的运行工具脚本 msrun_launcher.sh,统一执行与调度套件内所有模型的分布式训推流程。
+1. 训推统一调度:提供启动脚本 `msrun_launcher.sh`,统一执行与调度套件内所有模型的分布式训推流程。
 2. 注册/配置层:按接口类型实现类工厂,使能高阶接口层按配置初始化对应的任务接口、模型接口。
-3. 大模型模型库:实现高性能大模型库以及基础 Transformer 接口,即可支持用户配置化构建自有模型,也可自定义开发,可满足不同开发场景。
-4. 数据集:实现大模型训练、微调任务的数据加载封装,可原生支持 Hugging Face Datasets、Megatron 数据集以及 MindSpore 原生 MindRecord 的数据支持。
-5. 训练组件:实现训练流程的基础接口,包含学习率策略、优化器、训练回调以及 TrainOneStepWrapper 等接口。
-6. 工具层:独立工具脚本,目前提供数据预处理、Hugging Face 权重互转、评测工具脚本。
+3. 大模型模型库:提供高性能大模型库以及基础 Transformer 接口,既可支持用户配置化构建自有模型,也可自定义开发,可满足不同开发场景。
+4. 数据集:封装大模型训练、微调任务的数据加载接口,可支持 Hugging Face 数据集、Megatron 数据集以及 MindSpore 的 MindRecord 数据集。
+5. 训练组件:提供训练流程的基础接口,包含学习率策略、优化器、训练回调以及训练包装接口等。
+6. 工具层:提供数据预处理、Hugging Face 权重互转、评测工具脚本。
 7. DFX(Design for X):实现故障诊断、故障监测等高可用特性,降低训练故障恢复成本。
 
 ## 模型架构
 
-MindSpore Transformers 在 1.6.0 版本之后应用了全新的模型架构,原有架构(标记为 Legacy)各模型单独有一份模型代码,较难维护与优化。新架构(标记为 Mcore)对通用 Transformer 架构大模型进行分层抽象与模块化实现,涉及下层的基础层,如 Linear、Embedding、Norm 等,以及上层的 MoELayer、TransformerBlock 和模型统一接口 GPTModel(General PreTrained Model)等。所有模块化接口基于 MindSpore 提供的并行能力,进行了深度并行优化,对外提供开箱即用的高性能接口。所有高度封装集成的接口支持通过 ModuleSpec 机制自由组合进行模型搭建。
+MindSpore Transformers 在 1.6.0 版本之后应用了全新的模型架构,原有架构(标记为 Legacy)各模型单独实现一份模型代码,较难维护与优化。新架构(标记为 Mcore)对通用 Transformer 架构大模型进行分层抽象与模块化实现,涉及下层的基础层,如 Linear、Embedding、Norm 等,以及上层的 MoELayer、TransformerBlock 和模型统一接口 GPTModel(General PreTrained Model)等。所有模块化接口基于 MindSpore 提供的并行能力,进行了深度并行优化,对外提供开箱即用的高性能接口,支持通过 ModuleSpec 机制自由组合进行模型搭建。
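+
+下面以一个自定义的 `ModuleSpec` 数据类示意“按规格组装模型”的思想,其中的类名均为示意,并非 MindSpore Transformers 的实际接口:
+
+```python
+from dataclasses import dataclass, field
+from typing import Any, Dict, Type
+
+@dataclass
+class ModuleSpec:
+    module: Type                                   # 需要实例化的模块类
+    params: Dict[str, Any] = field(default_factory=dict)
+    submodules: Dict[str, "ModuleSpec"] = field(default_factory=dict)
+
+def build_module(spec: ModuleSpec):
+    """按照 spec 递归实例化模块及其子模块。"""
+    children = {name: build_module(sub) for name, sub in spec.submodules.items()}
+    return spec.module(**spec.params, **children)
+
+class ToyAttention:
+    def __init__(self, num_heads: int):
+        self.num_heads = num_heads
+
+class ToyTransformerLayer:
+    def __init__(self, attention):
+        self.attention = attention
+
+# 替换子模块只需修改 spec,而无需改动层的实现代码。
+layer = build_module(ModuleSpec(
+    module=ToyTransformerLayer,
+    submodules={"attention": ModuleSpec(ToyAttention, params={"num_heads": 16})},
+))
+print(layer.attention.num_heads)  # 16
+```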
 
 ## 训练能力
 
-MindSpore Transformer 训练提供了一系列高效易用特性以及生态协同能力,协助用户在大模型的预训练和微调环节实现简洁易用、高效稳定。对外能力涵盖:
+MindSpore Transformers 提供高效、稳定、易用的大模型训练能力,覆盖预训练和微调场景,兼顾性能与生态兼容性。核心能力包括:
 
-- 多维混合并行,包含数据并行、模型并行、优化器并行、流水线并行、序列并行、上下文并行、MoE 专家并行等;
-- 预训练阶段支持直接加载 Megatron-LM 多源混合数据集,避免跨平台和框架的数据集迁移问题;
-- 微调阶段接入 Hugging Face 生态能力,支持使用 Hugging Face SFT 数据集,支持使用 Hugging Face Tokenizer 实现数据预处理,支持读取 Hugging Face 模型配置实例化模型,支持加载原生 Hugging Face Safetensors 权重。配合零代码、配置化使能低参微调的能力,实现高效便捷微调;
-- 支持分布式权重自动切分加载,在分布式策略切换调试、集群扩缩容等场景下,无需手动转换权重,助力高效调试与训练;
-- 提供训练状态监控、故障快恢、异常跳过、断点续训等易用性和高可用特性,支持预训练/微调过程中的可测试性、可维护性和可靠性;
-- 封装了高性能基础接口,接口设计与 Megatron-LM 对齐,计算精度对齐达标。结合模型迁移和精度比对相关的教程文档,以及昇腾工具链提供的 Cell 级 dump 工具,实现低门槛、高效率的模型迁移与构建。
+**多维混合并行训练**
+
+支持数据并行、模型并行、优化器并行、流水线并行、序列并行、上下文并行及 MoE 专家并行等多种并行策略的灵活组合,满足大规模模型的高效分布式训练。
+
+**主流开源生态支持**
+
+预训练阶段:支持直接加载 Megatron-LM 多源混合数据集,减少跨平台和框架的数据集迁移成本;
+
+微调阶段:深度接入 Hugging Face 生态,支持:
+
+- 使用 Hugging Face SFT 数据集;
+- 使用 Hugging Face Tokenizer 进行数据预处理;
+- 读取 Hugging Face 模型配置实例化模型;
+- 加载原生 Hugging Face Safetensors 权重;
+
+配合零代码、配置化使能低参微调的能力,实现高效便捷微调。
+
+**模型权重易用性**
+
+支持分布式权重自动切分与加载,无需手动转换权重,显著降低在分布式策略切换、集群扩缩容等场景下的调试复杂度,提升训练敏捷性。
+
+**训练高可用保障**
+
+提供训练状态监控、故障快恢、异常跳过、断点续训等特性,提升训练任务的可测试性、可维护性和可靠性,保障长周期训练稳定运行。
+
+**模型低门槛迁移**
+
+- 封装了高性能基础接口,接口设计与 Megatron-LM 对齐;
+- 提供模型迁移指南和精度比对教程;
+- 支持昇腾工具链 Cell 级 dump 调试能力;
+- 实现低门槛、高效率的模型迁移与构建。
 
 ## 推理能力
 
-MindSpore Transformers 推理北向对接第三方开源组件,为开发者提供更丰富的推理部署、量化和评测能力:
+MindSpore Transformers 构建了“北向生态融合、南向深度优化”的推理体系,配合开源组件提供高效易用的部署、量化、评测能力,助力大模型推理的开发与应用:
+
+**北向生态融合**
+
+- **Hugging Face 生态复用**
+
+  支持直接加载使用 Hugging Face 开源模型的配置文件、权重和 Tokenizer,实现配置即用、一键启动推理,降低迁移与部署门槛。
+
+- **对接 vLLM 服务化框架**
+
+  支持对接 vLLM 服务化框架,实现推理服务化部署。支持 Continuous Batch、Prefix Cache、Chunked Prefill 等核心特性,显著提升吞吐与资源利用率(列表后附一个简单的调用示意)。
+
+- **支持量化推理**
+
+  依托 MindSpore Golden-Stick 量化套件提供的量化算法,Legacy 模型已支持 A16W8、A8W8、A8W4 量化推理;Mcore 模型预计在下一版本中支持 A8W8 与 A8W4 量化推理。
+
+- **支持开源榜单评测**
+
+  通过 AISbench 评测套件,可对基于 vLLM 部署的模型进行评测,覆盖 CEval、GSM8K、AIME 等 20+ 主流榜单。
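+
+下面给出一个基于 vLLM 离线接口的最小调用示意(假设已安装 vLLM 及相应的 MindSpore 插件并配置好昇腾环境,模型标识仅为占位示例):
+
+```python
+from vllm import LLM, SamplingParams
+
+# 模型标识仅为示例,可替换为任意 Hugging Face 格式的模型目录。
+llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
+params = SamplingParams(temperature=0.7, max_tokens=128)
+
+outputs = llm.generate(["介绍一下 MindSpore Transformers。"], params)
+print(outputs[0].outputs[0].text)
+```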
+
+**南向深度优化**
+
+- **算子多级流水下发**
+
+  依靠 MindSpore 框架 Runtime 运行时能力,在 Host 侧将算子调度拆分成 InferShape、Resize 和 Launch 三个任务进行流水线式下发,充分发挥 Host 多线程并行优势,提升算子下发效率,降低推理延迟。
+
+- **动静结合的执行模式**
 
-- 支持直接加载使用 Hugging Face 开源配置、权重和 tokenizer,一键启动推理;
-- 支持对接 vLLM 服务化框架,实现推理服务化部署。支持 Continuous Batch、Prefix Cache、Chunked Prefill 等服务化特性;
-- 通过 MindSpore Golden-Stick 量化套件,Legacy模型可以实现A16W8、A8W8、A8W4量化推理,Mcore 模型预计下版本支持A8W8、A8W4量化推理;
-- 通过 AISbench 评测套件,接入 vLLM 服务化的 MindSpore Transformers 模型,可以实现CEval、GSM8K、AIME 等 20+ 主流榜单评测。
+  默认采用 PyNative 编程模式 + JIT 即时编译技术,将模型编译成静态计算图进行推理加速;同时支持一键切换至 PyNative 动态图模式便于开发调试。
 
-南向依靠 MindSpore 框架提供的推理优化能力,实现高性能推理:
+- **昇腾高性能算子加速**
 
-- 依靠框架 Runtime 运行时提供的多级流水下发特性,在 host 侧将算子调度拆分成 InferShape、Resize 和 Launch 三个任务流水下发,充分利用 host 多线程资源,提升算子下发效率,从而实现推理加速;
-- 默认采用 PyNative 编程模式 + JIT 即时编译技术,将模型编译成静态计算图进行推理加速,也可以一键切换 PyNative 动态图模式便于开发调试;
-- MindSpore Transformers 支持使用 ACLNN、ATB和 MindSpore 提供的推理加速/融合算子,在昇腾底座上实现更加高效的推理性能。
+  支持使用 ACLNN、ATB 和 MindSpore 提供的推理加速与融合算子,在昇腾底座上实现更加高效的推理性能。
\ No newline at end of file
-- Gitee