# MindSpeed-MM

**Repository Path**: Yulv-git/MindSpeed-MM

## Basic Information

- **Project Name**: MindSpeed-MM
- **Description**: Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://gitee.com/ascend/MindSpeed-MM
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 242
- **Created**: 2025-04-14
- **Last Updated**: 2025-04-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

MindSpeed-MM is Ascend's multimodal large-model suite for large-scale distributed training. It supports both multimodal generation and multimodal understanding, and aims to provide an end-to-end multimodal training solution for Huawei [Ascend chips](https://www.hiascend.com/), including preset mainstream industry models, data engineering, distributed training and acceleration, and pre-training, fine-tuning, and online inference tasks.

---

## 🔥🔥🔥Latest News

* [Apr. 3, 2025]: 🚀 MindSpeed-MM supports the Qwen2.5VL-32B model 【Prototype】
* [Mar. 27, 2025]: 🚀 MindSpeed-MM supports the Wan2.1-1.3B/14B models 【Prototype】
* [Mar. 26, 2025]: 🚀 MindSpeed-MM supports the Qwen2.5VL-3B/7B/72B models 【Prototype】
* [Feb. 20, 2025]: 🚀 MindSpeed-MM supports the InternVL2.5-78B model 【Prototype】
* [Feb. 18, 2025]: 🚀 MindSpeed-MM supports the HunyuanVideo model
* [Feb. 17, 2025]: 🔥 MindSpeed-MM supports Mindspeed-Core & Megatron 0.8.0
* [Feb. 15, 2025]: 🚀 MindSpeed-MM supports the Sana model
* [Feb. 06, 2025]: 🚀 MindSpeed-MM supports PP and VPP for the OpenSoraPlan 1.3 model
* [Jan. 24, 2025]: 🚀 MindSpeed-MM supports the CogVideoX 1.5 5B model
* [Jan. 22, 2025]: 🎉 MindSpeed-MM supports the video modality for Qwen2VL
* [Jan. 16, 2025]: 🎉 MindSpeed-MM supports wfvae training
* [Dec. 30, 2024]: 🔥 MindSpeed-MM version 1.0.0 released
* [Dec. 19, 2024]: 🎉 MindSpeed-MM generation models support distributed inference
* [Dec. 16, 2024]: 🚀 MindSpeed-MM supports the Qihoo-T2X model
* [Dec. 05, 2024]: 🎉 MindSpeed-MM understanding models support LoRA fine-tuning
* [Dec. 03, 2024]: 🚀 MindSpeed-MM supports the SD3.5 model
* [Nov. 30, 2024]: 🎉 MindSpeed-MM supports multimodal understanding evaluation
* [Nov. 22, 2024]: 🚀 MindSpeed-MM supports the CogVideoX-5B-t2v & i2v models
* [Nov. 13, 2024]: 🚀 MindSpeed-MM supports the OpenSoraPlan 1.3-i2v model
* [Nov. 06, 2024]: 🚀 MindSpeed-MM supports the FLUX model
* [Oct. 30, 2024]: 🚀 MindSpeed-MM supports the OpenSoraPlan 1.3-t2v model
* [Oct. 21, 2024]: 🚀 MindSpeed-MM supports the InternVL2-8B and Qwen2VL-7B models
* [Oct. 16, 2024]: 🌱 MindSpeed-MM first version 1.0.RC3 released

> Note: **Prototype** features have not been fully validated and may be unstable or contain bugs; **beta** denotes features not intended for commercial use.

---

## Overview of supported features

| Model \ Feature | [TP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/tensor-parallel.md) | [TP-SP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/sequence-parallel.md) | [VPP](docs/features/virtual_pipeline_parallel.md) | [PP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/pipeline-parallel.md) | CP | [Distributed Optimizer](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/distributed-optimizer.md) | [Recomputation](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/recomputation.md) | [LoRA](./docs/features/lora_finetune.md) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Wan2.1 | | | | | CP (Ulysses) | ✔ | ✔ | Prototype |
| HunyuanVideo | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | |
| CogVideoX series-T2V | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | Prototype |
| CogVideoX series-I2V | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | Prototype |
| Opensora1.2 | | | | | DSP | ✔ | ✔ | |
| OpensoraPlan1.3-T2V | ✔ | ✔ | ✔ | ✔ | CP (Ulysses) | ✔ | ✔ | |
| OpensoraPlan1.3-I2V | ✔ | ✔ | ✔ | ✔ | CP (Ulysses) | ✔ | ✔ | |
| InternVL2-2B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-8B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-26B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-76B | | | ✔ | ✔ | | ✔ | ✔ | |
| Qwen2VL-2B | | | | | | ✔ | ✔ | ✔ |
| Qwen2VL-7B | ✔ | | | ✔ | | ✔ | ✔ | ✔ |
| Qwen2VL-72B | ✔ | | | ✔ | | ✔ | ✔ | ✔ |
| Qwen2.5VL-3B | | | | | | ✔ | | |
| Qwen2.5VL-7B | ✔ | | | ✔ | | ✔ | | |
| Qwen2.5VL-32B | ✔ | | | ✔ | | ✔ | | |
| Qwen2.5VL-72B | ✔ | | | ✔ | | ✔ | | |

Notes:

* TP: [Tensor Parallel](https://arxiv.org/abs/1909.08053)
* TP-SP: [Tensor Parallel with Sequence Parallel](https://arxiv.org/abs/2205.05198)
* VPP: [Virtual Pipeline Parallel](https://arxiv.org/abs/2104.04473)
* PP: [Pipeline Parallel](https://arxiv.org/abs/2104.04473)
* DSP: [Dynamic Sequence Parallel](https://arxiv.org/abs/2403.10266)
* CP (Ulysses): [Context Parallel](https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html) by leveraging [Deepspeed Ulysses](https://arxiv.org/abs/2309.14509) with Sequence Parallel
* CP (Ring Attention): Context Parallel with [Ring Attention](https://arxiv.org/abs/2310.01889)
* Distributed Optimizer: [Zero Redundancy Optimizer](https://arxiv.org/abs/1910.02054) (ZeRO)
* Recomputation: Reducing Activation [Recomputation](https://arxiv.org/abs/2205.05198)
* LoRA: [Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)

---

## Features and models under development

* 【New model】 JanusPro
* 【Model feature】 CogVideoX: PP
* 【Model feature】 OpensoraPlan1.3: CP (Ring Attention)
* 【Model feature】 Qwen2VL: VPP, CP (Ulysses & Ring Attention)
* 【Model feature】 InternVL2: TP, CP (Ulysses & Ring Attention)
* 【Base feature】 Distrain

---

## Version maintenance policy

MindSpeed-MM versions go through the following five maintenance phases:

| **Status** | **Duration** | **Description** |
| ------------------- | -------- | -------------------------------------------------------------------- |
| Planning | 1-3 months | Plan features |
| Development | 3 months | Develop features |
| Maintained | 6-12 months | Merge all resolved issues and publish releases. Different MindSpeed-MM versions follow different maintenance policies: regular versions are maintained for 6 months and long-term support versions for 12 months |
| Unmaintained | 0-3 months | Merge all resolved issues; no dedicated maintainers and no releases |
| End of life (EOL) | N/A | The branch no longer accepts any changes |

Maintenance policy for released MindSpeed-MM versions:

| **MindSpeed-MM version** | **Maintenance policy** | **Current status** | **Release date** | **Next status** | **EOL date** |
|-----------------|-----------|--------|------------|-----------------------|-----------|
| 2.0.0 | Regular version | Maintained | 2025/03/30 | Expected to become unmaintained from 2025/09/30 | |
| 1.0.0 | Regular version | Maintained | 2024/12/30 | Expected to become unmaintained from 2025/06/30 | |
| 1.0.RC3 | Regular version | Maintained | 2024/09/30 | Expected to become unmaintained from 2025/03/30 | |

---

## Compatible versions and supported models

【Measured performance of the current version (hardware: Atlas 900 A2 PODc)】

For every supported model in the lists below, the model's **README** file provides usage instructions covering the detailed training, inference, and fine-tuning workflows.

Hyperlinks in the **Model** column point to each model's folder; hyperlinks in the **Parameters** column point to the model's community resources.

**Certification**: 【Pass】 marks models that have passed testing; 【Test】 marks models still under test.

Units: Samples per Second (SPS); Frames per Second (FPS); Tokens per Second (TPS)

(Note: SPS and FPS report cluster throughput here; TPS reports single-device throughput.)

**Affinity scenario** means adjusting a small amount of structure or parameters so that the model suits Ascend better and performs better.

**A3** refers to the Atlas A3 training series hardware.

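MindSpeed-MM model list

| Model task | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Multimodal generation | Wan2.1-T2V | 1.3B | Pre-training | 1x8 | BF16 | 0.770 (SPS) | 0.960 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 1.3B | LoRA fine-tuning | 1x8 | BF16 | 0.954 (SPS) | 1.042 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 14B | Pre-training | 1x8 | BF16 | 0.160 (SPS) | 0.160 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.174 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-I2V | 1.3B | Pre-training | 1x8 | BF16 | 0.76 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.1-I2V | 14B | Pre-training | 1x8 | BF16 | 0.130 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.1-I2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.173 (SPS) | 【Test】 |
| Multimodal generation | HunyuanVideo | 13B | Pre-training | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | 【Test】 |
| Multimodal generation | OpenSora 1.0 | 5.5B | Pre-training | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | 【Pass】 |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pre-training | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | 【Test】 |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pre-training | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pre-training | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pre-training | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-T2V | 5B | Pre-training | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-I2V | 5B | Pre-training | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | Pre-training | 1x8 | BF16 | 1.88 (SPS) | 2.09 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.89 (SPS) | 3.03 (SPS) | 【Test】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | Pre-training | 1x8 | BF16 | 1.81 (SPS) | 2.01 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 3.44 (SPS) | 3.92 (SPS) | 【Test】 |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | 【Contributed by Qihoo 360】 |
| Multimodal generation | SDXL | 3.5B | Pre-training | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | 【Pass】 |
| Multimodal generation | SDXL | 3.5B | Pre-training | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | 【Pass】 |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | 【Pass】 |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | 【Pass】 |
| Multimodal generation | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | 【Pass】 |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | 【Test】 |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | 【Test】 |
| Multimodal understanding | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 26B | Fine-tuning | 1x8 | BF16 | 3.31 (SPS) | 3.26 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 【Test】 |
| Multimodal understanding | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | / | / | 【Test】 |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 【Pass】 |
| Multimodal understanding | Qwen2.5-VL | 3B | Fine-tuning | 1x8 | BF16 | 23.77 (SPS) | 21.79 (SPS) | 【Test】 |
| Multimodal understanding | Qwen2.5-VL | 7B | Fine-tuning | 1x8 | BF16 | 14.20 (SPS) | 12.67 (SPS) | 【Test】 |
| Multimodal understanding | Qwen2.5-VL | 32B | Fine-tuning | 2x8 | BF16 | 249.94 (TPS) | / | 【Test】 |
| Multimodal understanding | Qwen2.5-VL | 72B | Fine-tuning | 8x8 | BF16 | / | / | 【Test】 |
| Speech recognition | Whisper | 1.5B | Pre-training | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | 【Test】 |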
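
The NPU and reference columns of the model list can be compared with a little arithmetic. The sketch below is illustrative only and is not part of MindSpeed-MM: the `Row` class and the helper names are hypothetical, and it assumes that a cluster entry such as `4x8` means 4 nodes with 8 devices each, which the README does not state explicitly. It relies on the note that SPS and FPS are cluster throughput while TPS is single-device throughput. The three sample rows are copied from the model list.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of the model list (hypothetical helper type)."""
    model: str
    cluster: str   # cluster column, e.g. "1x8" or "4x8 (A3)"
    npu: float     # NPU performance column
    ref: float     # reference performance column
    unit: str      # "SPS", "FPS" or "TPS"


def device_count(cluster: str) -> int:
    # Assumption: "NxM" means N nodes with M devices per node.
    # A trailing hardware tag such as "(A3)" is ignored.
    nodes, per_node = cluster.split()[0].lower().split("x")
    return int(nodes) * int(per_node)


def relative_throughput(row: Row) -> float:
    # NPU throughput divided by the reference figure;
    # a value above 1.0 means the NPU figure is higher.
    return row.npu / row.ref


def per_device_throughput(row: Row) -> float:
    # TPS is already reported per device; SPS and FPS are cluster-wide,
    # so they are divided by the assumed number of devices.
    if row.unit == "TPS":
        return row.npu
    return row.npu / device_count(row.cluster)


ROWS = [
    Row("Wan2.1-T2V 1.3B pre-training", "1x8", 0.770, 0.960, "SPS"),
    Row("InternVL 2.0 2B fine-tuning", "1x8", 33.77, 22.46, "SPS"),
    Row("Qwen2-VL 72B fine-tuning", "4x8 (A3)", 261.25, 257.63, "TPS"),
]

if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.model}: {relative_throughput(row):.2f}x reference, "
              f"{per_device_throughput(row):.3f} {row.unit} per device")
```

Rows whose reference column is `/` have no ratio and are best skipped rather than treated as zero.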

---

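Other multimodal large models adapted to Ascend

| Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | 【Pass】 |
| PLLaVA | 7B | Pre-training | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | 【Pass】 |
| PLLaVA | 7B | Pre-training | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | 【Pass】 |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s)/50-200steps | 847 (s)/50-200steps | 【Pass】 |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s)/50-200steps | 490 (s)/50-200steps | 【Pass】 |
| HunYuanDiT | 1.5B | Pre-training | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | 【Pass】 |
| InternVL 1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | 【Pass】 |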
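
Most rows in this second table report time per step rather than throughput, so a lower number is better and the units differ between rows. The sketch below is a hedged illustration with hypothetical helper names: it normalizes the `s/it`, `s/step`, and `ms/step` entries to seconds per step and computes a speedup as reference time divided by NPU time. The miniCPM-V rows, which report a total over steps 50-200, and the InternVL 1.5 row, which reports FPS, are deliberately left out.

```python
# Conversion factors from the units used in the table to seconds per step.
UNIT_TO_SECONDS = {"s/it": 1.0, "s/step": 1.0, "ms/step": 1e-3}


def seconds_per_step(value: float, unit: str) -> float:
    # Normalize a time-per-step figure to seconds.
    return value * UNIT_TO_SECONDS[unit]


def speedup_vs_reference(npu: float, ref: float, unit: str) -> float:
    # Reference time divided by NPU time: above 1.0 means the NPU run
    # needs less time per step than the reference.
    return seconds_per_step(ref, unit) / seconds_per_step(npu, unit)


# (model, NPU value, reference value, unit), copied from the table.
TIMED_ROWS = [
    ("CogVLM-2 8B fine-tuning", 3.9, 3.3, "s/it"),
    ("PLLaVA 7B pre-training (BF16)", 0.841, 0.935, "s/step"),
    ("HunYuanDiT 1.5B pre-training", 1099.5, 1059.3, "ms/step"),
]

if __name__ == "__main__":
    for model, npu, ref, unit in TIMED_ROWS:
        print(f"{model}: {seconds_per_step(npu, unit):.4f} s/step, "
              f"speedup {speedup_vs_reference(npu, ref, unit):.2f}x")
```

Keeping the direction explicit (time versus throughput) avoids reading a ratio below 1.0 the wrong way round when both tables are analyzed together.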

---

## Image-to-video: OpensoraPlan 1.3 I2V

Input image. Prompt: A rocket ascends slowly into the sky

## Image-to-video: Wan 2.1 I2V

Input image. Prompt: An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.

## Text-to-image: Flux T2I

Prompt: A cat holding a sign that says hello world

Prompt: A cat holding a sign that says MindSpeed

## Understanding models: InternVL2 & Qwen2VL

Input image for both models:

Input text for both models: Please describe the image shortly

InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

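Input text for InternVL2: 请简短描述这张照片 (English: Please briefly describe this photo)

InternVL2 inference result: 这张图片展示了一个宁静的湖泊，湖面平静，反射着天空和周围景物的影像。湖的中央有一个木制码头，延伸到湖中，码头上有几根柱子支撑。湖的远端是一片茂密的森林，树木高大，覆盖着茂密的绿色植被。森林的尽头是一座高耸的山峰，山峰上覆盖着积雪，显得格外壮丽。天空中有一些云朵，但整体上是晴朗的，阳光从云层中透出，照亮了整个湖面和周围的景色。这张图片整体给人一种宁静、祥和的感觉，仿佛是一个远离尘嚣的世外桃源 (English: This image shows a tranquil lake whose calm surface reflects the sky and the surrounding scenery. In the middle of the lake, a wooden dock supported by several posts extends into the water. At the far end of the lake is a dense forest of tall trees covered in thick green vegetation. Beyond the forest rises a towering, snow-covered peak that looks especially magnificent. There are some clouds in the sky, but it is clear overall, with sunlight breaking through the clouds and lighting up the lake and its surroundings. The image as a whole conveys peace and serenity, like a secluded paradise far from the bustle of the world)

Input text for Qwen2VL: 请用中文简短描述这张照片 (English: Please briefly describe this photo in Chinese)

Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上，背景是连绵的山脉和茂密的森林。天空多云，整体色调偏冷，给人一种宁静和自然的感觉。 (English: This image shows a wooden dock extending onto a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.)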

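---

## MindSpeed-MM tool library

### Ascend profiling collection tool

MindSpeed-MM integrates the Ascend profiling collection tool to support analysis of how a model runs. Following its configuration, the tool collects key information such as operators and device memory. It supports both dynamic and static collection modes to help developers analyze model bottlenecks, and either mode can be chosen to suit the scenario. For details, see the profiling section of the [README](./mindspeed_mm/tools/README.md).

### MindStudio Insight performance analysis tool

For performance tuning of large models in cluster scenarios, we recommend MindStudio Insight, a visual tuning tool. MindStudio Insight provides visualizations including a Timeline view, communication analysis, and compute-time analysis, so that users can analyze potential performance bottlenecks and decide how to eliminate or reduce them. For usage, see the [MindStudio Insight User Guide](https://www.hiascend.com/document/detail/zh/mindstudio/70RC3/msinsightug/msascendinsightug/Insight_userguide_0002.html); to download it, see [MindStudio Insight](https://support.huawei.com/enterprise/zh/ascend-computing/mindstudio-pid-251913966/software/262029358?idAbsPath=fixnode01%7C23710424%7C251366513%7C22892968%7C251913966).

---

## MindSpeed-MM FAQ

For related FAQs, see: [FAQ](./docs/FAQ.md)

---

## Acknowledgments

MindSpeed-MM is jointly contributed by the following Huawei departments:

* Computing Product Line
* Public Development Department
* 2012 Laboratories
* Huawei Cloud

MindSpeed-MM ecosystem contributors:

* 360 AI Research
* The OpenSoraPlan team at Peking University
* Infrastructure Center, WeChat Technical Architecture Department

Thanks for every PR from the community. Contributions to MindSpeed-MM are welcome.

---

## About MindSpeed-MM

1. [A multimodal suite for large-scale distributed training](https://mp.weixin.qq.com/s/Qiw_qThKA72T0lLOSpjkKw)
2. [Powered by Ascend computing power, Open-Sora Plan achieves cinematic-quality video generation](https://mp.weixin.qq.com/s/KY2tLthhre-SRbuWka3c2w)
3. [MindSpeed-MM supports mainstream multimodal understanding models, with greatly improved performance](https://mp.weixin.qq.com/s/3pZRy24ITyKl3nGc33Sq7w)
4. [Trained natively on Ascend: Sun Yat-sen University and 360 jointly build Qihoo-T2X, a new paradigm for multimodal tasks](https://mp.weixin.qq.com/s/zQAy_hbL9cR3c8-NO6lKnA)
5. [Getting hands-on with the SOTA video generation model Wan2.1 on Ascend MindSpeed MM](https://mp.weixin.qq.com/s/g2ShV2F6YpoVAniw6CBN_w)
6. [SOTA multimodal understanding models out of the box: best practices for Qwen2.5-VL support in MindSpeed MM](https://mp.weixin.qq.com/s/ac7RUWw79stunwQIyC-ykQ)

---

## Security statement

[MindSpeed MM security statement](https://gitee.com/ascend/MindSpeed-MM/blob/master/docs/SECURITYNOTE.md)

---

## Disclaimer

### To MindSpeed-MM users

1. The models provided by MindSpeed-MM are for your non-commercial use only.
2. For each model, the MindSpeed-MM platform only suggests datasets that can be used for training; Huawei does not provide any datasets. If you use these datasets for training, take particular care to comply with the license of each dataset. Huawei assumes no liability for any infringement dispute arising from your use of the datasets.
3. If you find any problem while using MindSpeed-MM models (including but not limited to functional and compliance problems), please submit an issue on Gitee; we will review and resolve it promptly.

### To dataset owners

If you do not want your dataset to be mentioned in connection with the models in MindSpeed-MM, or would like the description of your dataset there to be updated, please submit an issue on Gitee. We will remove or update the description of your dataset as requested in your issue. We sincerely thank you for your understanding of and contribution to MindSpeed-MM.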