# MindSpeed-MM

**Repository Path**: wangzw1022/MindSpeed-MM

## Basic Information

- **Project Name**: MindSpeed-MM
- **Description**: Ascend multimodal large-model suite.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://gitee.com/ascend/MindSpeed-MM
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 232
- **Created**: 2024-10-17
- **Last Updated**: 2025-02-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

MindSpeed-MM is an Ascend multimodal large-model suite for large-scale distributed training. It supports both multimodal generation and multimodal understanding, and aims to provide an end-to-end multimodal training solution for Huawei [Ascend chips](https://www.hiascend.com/). It includes preset mainstream industry models, data engineering, distributed training and acceleration, and pre-training, fine-tuning, and online inference tasks.

---

## 🔥🔥🔥Latest News

* [Dec. 19, 2024]: 🎉 MindSpeed-MM generation models support distributed inference
* [Dec. 16, 2024]: 🚀 MindSpeed-MM supports the Qihoo-T2X model
* [Dec. 05, 2024]: 🎉 MindSpeed-MM understanding models support LoRA fine-tuning
* [Dec. 03, 2024]: 🚀 MindSpeed-MM supports the SD3.5 model
* [Nov. 30, 2024]: 🎉 MindSpeed-MM supports multimodal understanding evaluation
* [Nov. 22, 2024]: 🚀 MindSpeed-MM supports the CogVideoX-5B-t2v & i2v models
* [Nov. 13, 2024]: 🚀 MindSpeed-MM supports the OpenSoraPlan 1.3-i2v model
* [Nov. 06, 2024]: 🚀 MindSpeed-MM supports the FLUX model
* [Oct. 30, 2024]: 🚀 MindSpeed-MM supports the OpenSoraPlan 1.3-t2v model
* [Oct. 21, 2024]: 🚀 MindSpeed-MM supports the InternVL2-8B and Qwen2VL-7B models
* [Oct. 16, 2024]: 🌱 First MindSpeed-MM version, 1.0.RC3, released

---

## Overview of Supported Features

| Model \ Feature | [TP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/tensor-parallel.md) | [TP-SP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/sequence-parallel.md) | [VPP](docs/features/virtual_pipeline_parallel.md) | [PP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/pipeline-parallel.md) | CP | [Distributed Optimizer](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/distributed-optimizer.md) | [Recomputation](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/recomputation.md) | [LoRA](./docs/features/lora_finetune.md) |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| CogVideoX-T2V | ✔ | | | | CP (Ulysses) | ✔ | ✔ | |
| CogVideoX-I2V | ✔ | | | | CP (Ulysses) | ✔ | ✔ | |
| Opensora1.2 | | | | | DSP | ✔ | ✔ | |
| OpensoraPlan1.3-T2V | ✔ | ✔ | | ✔ | CP (Ulysses) | ✔ | ✔ | |
| OpensoraPlan1.3-I2V | ✔ | ✔ | | ✔ | CP (Ulysses) | ✔ | ✔ | |
| InternVL2-2B | | | ✔ | ✔ | | ✔ | ✔ | ✔ |
| InternVL2-8B | | | ✔ | ✔ | | ✔ | ✔ | ✔ |
| InternVL2-76B | | | ✔ | ✔ | | ✔ | ✔ | ✔ |
| Qwen2VL-2B | | | | ✔ | | ✔ | ✔ | ✔ |
| Qwen2VL-7B | | | | ✔ | | ✔ | ✔ | ✔ |
| Qwen2VL-72B | | | | ✔ | | ✔ | ✔ | ✔ |

Notes:

* TP: [Tensor Parallel](https://arxiv.org/abs/1909.08053)
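* TP-SP: [Tensor Parallel with Sequence Parallel](https://arxiv.org/abs/2205.05198)
* VPP: [Virtual Pipeline Parallel](https://arxiv.org/abs/2104.04473)
* PP: [Pipeline Parallel](https://arxiv.org/abs/2104.04473)
* DSP: [Dynamic Sequence Parallel](https://arxiv.org/abs/2403.10266)
* CP (Ulysses): [Context Parallel](https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html) by leveraging [DeepSpeed Ulysses](https://arxiv.org/abs/2309.14509) with Sequence Parallel
* CP (Ring Attention): Context Parallel with [Ring Attention](https://arxiv.org/abs/2310.01889)
* Distributed Optimizer: [Zero Redundancy Optimizer](https://arxiv.org/abs/1910.02054) (ZeRO)
* Recomputation: Reducing Activation [Recomputation](https://arxiv.org/abs/2205.05198)
* LoRA: [Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)

---

## Features and Models Under Development

* [New model] CogVideoX 1.5: [5B](https://huggingface.co/THUDM/CogVideoX1.5-5B)
* [New model] MiniCPM-V 2.6: [8B](https://huggingface.co/openbmb/MiniCPM-V-2_6)
* [New model] WF-VAE: [WF-VAE](https://arxiv.org/abs/2411.17459) training
* [Model feature] CogVideoX: PP, TP+SP
* [Model feature] OpensoraPlan1.3: PP, CP (Ring Attention)
* [Model feature] Qwen2VL: TP, VPP, CP (Ulysses & Ring Attention)
* [Model feature] InternVL2: TP, CP (Ulysses & Ring Attention)
* [Base feature] 10M ultra-long-sequence demo
* [Base feature] Distributed inference
* [Base feature] Distrain

---

## Version Maintenance Policy

MindSpeed-MM versions go through the following five maintenance stages:

| **Status** | **Duration** | **Description** |
| ------------------- | -------- |----------------------------------------------------------------------|
| Planning | 1-3 months | Features are planned. |
| Development | 3 months | Features are developed. |
| Maintained | 6-12 months | All resolved issues are merged and releases are published. The maintenance policy depends on the version type: regular versions are maintained for 6 months and long-term support versions for 12 months. |
| Unmaintained | 0-3 months | All resolved issues are merged, but there are no dedicated maintainers and no releases. |
| End of life (EOL) | N/A | The branch no longer accepts any changes. |

Maintenance policy for released MindSpeed-MM versions:

| **MindSpeed-MM version** | **Maintenance policy** | **Current status** | **Release date** | **Next status** | **EOL date** |
|-----------------|-----------|--------|------------|-----------------------|-----------|
| 1.0.RC3 | Regular version | Maintained | 2024/09/30 | Expected to become unmaintained from 2025/03/30 | |

---

## Compatible Versions and Supported Models

[Measured performance of the current version (hardware: Atlas 900 A2 PODc)]

For each supported model listed below, the model's `README` file provides usage instructions covering the detailed training, inference, and fine-tuning workflows.

Hyperlinks in the `Model` column point to each model's folder; hyperlinks in the `Parameters` column point to the model's community resources.

In the `Certification` column, 【Pass】 marks models that have passed testing and 【Test】 marks models still under test.

Throughput units: Samples per Second (SPS), Frames per Second (FPS), and Tokens per Second (TPS).

`Affinity scenario` (`亲和场景`) means that a small number of structural or parameter adjustments were made so that the model is better suited to Ascend and performs better.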

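MindSpeed-MM model list

| Task category | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Multimodal generation | OpenSora 1.0 | 5.5B | Pre-training | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | 【Pass】 |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pre-training | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pre-training | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pre-training | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pre-training | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-T2V | 5B | Pre-training | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-T2V | Affinity scenario | Pre-training | 1x8 | BF16 | 0.92 (SPS) | 0.96 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-I2V | 5B | Pre-training | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-I2V | Affinity scenario | Pre-training | 1x8 | BF16 | 0.92 (SPS) | 0.96 (SPS) | 【Pass】 |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | 【Contributed by Qihoo 360】 |
| Multimodal generation | SDXL | 3.5B | Pre-training | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | 【Pass】 |
| Multimodal generation | SDXL | 3.5B | Pre-training | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | 【Pass】 |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 17.08 (FPS) | 17.51 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 2B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | 【Pass】 |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | 【Pass】 |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | 【Pass】 |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | 【Pass】 |
| Multimodal understanding | Intern-VL-2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | 【Pass】 |
| Multimodal understanding | Intern-VL-2.0 | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | 【Pass】 |
| Multimodal understanding | Intern-VL-2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 【Test】 |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 8x16 | BF16 | / | / | 【Test】 |
| Speech recognition | Whisper | 1.5B | Pre-training | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | 【Pass】 |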
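The NPU and reference columns are easier to compare as a ratio. The following is a minimal sketch, not part of MindSpeed-MM: the helper names `relative_throughput` and `summarize` are hypothetical, and the numbers are a handful of rows copied from the table above. It assumes both columns use the same unit within a row and that higher is better, which holds for SPS, FPS, and TPS.

```python
# Illustrative only: a few (NPU, reference) throughput pairs copied from the
# model list above. Higher is better for SPS / FPS / TPS.
RESULTS = {
    "OpenSora 1.0 (SPS)": (3.18, 2.04),
    "OpenSora 1.2 (SPS)": (7.31, 8.15),
    "CogVideoX-T2V 5B (SPS)": (0.37, 0.46),
    "Flux 12B (FPS)": (55.23, 53.65),
    "Whisper 1.5B (SPS)": (93.38, 109.23),
}


def relative_throughput(npu: float, reference: float) -> float:
    """Return NPU throughput as a fraction of the reference throughput."""
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return npu / reference


def summarize(results: dict) -> dict:
    """Map each row name to its NPU/reference ratio (hypothetical helper)."""
    return {
        name: relative_throughput(npu, ref)
        for name, (npu, ref) in results.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize(RESULTS).items():
        # A ratio above 100% means the NPU figure exceeds the reference figure.
        print(f"{name}: {ratio:.1%} of reference")
```

Rows whose performance cells are `/` have no published figures and are left out of such a comparison.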

---

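Other multimodal large models adapted to Ascend

| Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | 【Pass】 |
| PLLaVA | 7B | Pre-training | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | 【Pass】 |
| PLLaVA | 7B | Pre-training | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | 【Pass】 |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s)/50-200steps | 847 (s)/50-200steps | 【Pass】 |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s)/50-200steps | 490 (s)/50-200steps | 【Pass】 |
| HunYuanDiT | 1.5B | Pre-training | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | 【Pass】 |
| Intern-VL-1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | 【Pass】 |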
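Most rows in this second table report time per step rather than throughput, so lower is better and the units differ between rows. The sketch below is illustrative only; `UNIT_TO_SECONDS`, `to_seconds`, and `speedup` are hypothetical names. It normalizes a few rows to seconds and reports reference time divided by NPU time, so a value above 1 means the NPU run is faster. The Intern-VL-1.5 row is excluded because it is reported in FPS.

```python
# Illustrative only: (NPU, reference, unit) copied from the table above.
# These are time-per-step metrics, so LOWER is better.
STEP_TIMES = {
    "CogVLM-2 8B": (3.9, 3.3, "s/it"),
    "PLLaVA 7B BF16": (0.841, 0.935, "s/step"),
    "PLLaVA 7B FP32": (0.935, 1.08, "s/step"),
    "HunYuanDiT 1.5B": (1099.5, 1059.3, "ms/step"),
}

# Conversion factors from each unit in the table to seconds per step.
UNIT_TO_SECONDS = {"s/it": 1.0, "s/step": 1.0, "ms/step": 1e-3}


def to_seconds(value: float, unit: str) -> float:
    """Convert a step time in the given unit to seconds per step."""
    return value * UNIT_TO_SECONDS[unit]


def speedup(npu: float, reference: float, unit: str) -> float:
    """Reference time divided by NPU time; above 1 means the NPU is faster."""
    return to_seconds(reference, unit) / to_seconds(npu, unit)


if __name__ == "__main__":
    for name, (npu, ref, unit) in STEP_TIMES.items():
        print(f"{name}: {to_seconds(npu, unit):.3f} s/step on NPU, "
              f"speedup vs. reference {speedup(npu, ref, unit):.2f}x")
```

Inverting the ratio relative to a throughput comparison keeps the reading consistent: in both cases a value above 1 favors the NPU.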

---

## Image-to-Video: OpensoraPlan 1.3 I2V

Input image with prompt: A rocket ascends slowly into the sky

## Text-to-Video: OpensoraPlan 1.3 T2V

Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures

Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee

## Text-to-Image: Flux T2I

Prompt: A cat holding a sign that says hello world

Prompt: A cat holding a sign that says MindSpeed

## Understanding Models: InternVL2 & Qwen2VL

Input image for both models:

Input text for both models: Please describe the image shortly

InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for InternVL2: 请简短描述这张照片 (English: "Please briefly describe this photo")

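InternVL2 inference result: 这张图片展示了一个宁静的湖泊，湖面平静，反射着天空和周围景物的影像。湖的中央有一个木制码头，延伸到湖中，码头上有几根柱子支撑。湖的远端是一片茂密的森林，树木高大，覆盖着茂密的绿色植被。森林的尽头是一座高耸的山峰，山峰上覆盖着积雪，显得格外壮丽。天空中有一些云朵，但整体上是晴朗的，阳光从云层中透出，照亮了整个湖面和周围的景色。这张图片整体给人一种宁静、祥和的感觉，仿佛是一个远离尘嚣的世外桃源

(English: This picture shows a tranquil lake whose calm surface reflects the sky and the surrounding scenery. In the middle of the lake, a wooden dock supported by several posts extends into the water. At the far end of the lake is a dense forest of tall trees covered in lush green vegetation. Beyond the forest rises a towering, snow-covered peak that looks especially magnificent. There are some clouds in the sky, but it is clear overall, with sunlight breaking through the clouds and illuminating the lake and its surroundings. The picture as a whole conveys a sense of peace and serenity, like a paradise far from the bustle of the world.)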

Input text for Qwen2VL: 请用中文简短描述这张照片 (English: "Please briefly describe this photo in Chinese")

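Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上，背景是连绵的山脉和茂密的森林。天空多云，整体色调偏冷，给人一种宁静和自然的感觉。

(English: This picture shows a wooden dock extending over a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of calm and nature.)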

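---

## MindSpeed-MM Tool Library

### Ascend Profiling Collection Tool

MindSpeed-MM integrates the Ascend profiling collection tool to support analysis of how a model runs. Driven by configuration, the tool collects key information such as operators and device memory. It supports both dynamic and static collection modes, so developers can pick the one that fits their scenario when analyzing model bottlenecks. For details, see the profiling section of the [README](./mindspeed_mm/tools/README.md).

### MindStudio Insight Performance Analysis Tool

For performance tuning in large-model cluster scenarios, we recommend MindStudio Insight, a visual tuning tool. MindStudio Insight provides visualizations including a Timeline view, communication analysis, and compute-time analysis, which help users identify potential performance bottlenecks and decide how to eliminate or reduce them. For usage, see the [MindStudio Insight Operation Guide](https://www.hiascend.com/document/detail/zh/mindstudio/70RC3/msinsightug/msascendinsightug/Insight_userguide_0002.html); the tool can be downloaded from [MindStudio Insight](https://support.huawei.com/enterprise/zh/ascend-computing/mindstudio-pid-251913966/software/262029358?idAbsPath=fixnode01%7C23710424%7C251366513%7C22892968%7C251913966).

---

## MindSpeed-MM FAQ

For related FAQs, see [FAQ](./docs/FAQ.md).

---

## Acknowledgements

MindSpeed-MM is jointly contributed by the following Huawei departments:

* Computing Product Line
* Public Development Department
* 2012 Laboratories
* Huawei Cloud

MindSpeed-MM ecosystem contributors:

* 360 AI Research

Thanks for every PR from the community. Contributions to MindSpeed-MM are welcome.

---

## MindSpeed-MM Related Introductions

1. [A multimodal suite for large-scale distributed training](https://mp.weixin.qq.com/s/Qiw_qThKA72T0lLOSpjkKw)
2. [Powered by Ascend computing power, Open-Sora Plan achieves cinema-grade video generation](https://mp.weixin.qq.com/s/KY2tLthhre-SRbuWka3c2w)
3. [MindSpeed-MM supports mainstream multimodal understanding models with major performance gains](https://mp.weixin.qq.com/s/3pZRy24ITyKl3nGc33Sq7w)

---

## Security Statement

[MindSpeed MM Security Statement](https://gitee.com/ascend/MindSpeed-MM/blob/master/docs/SECURITYNOTE.md)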