# MindSpeed-MM

**Repository Path**: Yulv-git/MindSpeed-MM

## Basic Information

- **Project Name**: MindSpeed-MM
- **Description**: Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://gitee.com/ascend/MindSpeed-MM
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 226
- **Created**: 2025-04-14
- **Last Updated**: 2025-04-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
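| Model task | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|---|
| Multimodal generation | Wan2.1-T2V | 1.3B | Pretraining | 1x8 | BF16 | 0.770 (SPS) | 0.960 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 1.3B | LoRA fine-tuning | 1x8 | BF16 | 0.954 (SPS) | 1.042 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 14B | Pretraining | 1x8 | BF16 | 0.160 (SPS) | 0.160 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.174 (SPS) | 【Test】 |
| Multimodal generation | Wan2.1-I2V | 1.3B | Pretraining | 1x8 | BF16 | 0.76 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.1-I2V | 14B | Pretraining | 1x8 | BF16 | 0.130 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.1-I2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.173 (SPS) | 【Test】 |
| Multimodal generation | HunyuanVideo | 13B | Pretraining | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | 【Test】 |
| Multimodal generation | OpenSora 1.0 | 5.5B | Pretraining | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | 【Pass】 |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pretraining | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | 【Test】 |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pretraining | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pretraining | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pretraining | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-T2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-I2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | Pretraining | 1x8 | BF16 | 1.88 (SPS) | 2.09 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.89 (SPS) | 3.03 (SPS) | 【Test】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | Pretraining | 1x8 | BF16 | 1.81 (SPS) | 2.01 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 3.44 (SPS) | 3.92 (SPS) | 【Test】 |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | 【Contributed by Qihoo 360】 |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | 【Pass】 |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | 【Pass】 |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | 【Pass】 |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | 【Pass】 |
| Multimodal generation | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | 【Pass】 |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | 【Test】 |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | 【Test】 |
| Multimodal understanding | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 26B | Fine-tuning | 1x8 | BF16 | 3.31 (SPS) | 3.26 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 【Test】 |
| Multimodal understanding | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | / | / | 【Test】 |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 【Pass】 |
| Multimodal understanding | Qwen2.5-VL | 3B | Fine-tuning | 1x8 | BF16 | 23.77 (SPS) | 21.79 (SPS) | 【Test】 |
| Multimodal understanding | Qwen2.5-VL | 7B | Fine-tuning | 1x8 | BF16 | 14.20 (SPS) | 12.67 (SPS) | 【Test】 |
| Multimodal understanding | Qwen2.5-VL | 32B | Fine-tuning | 2x8 | BF16 | 249.94 (TPS) | / | 【Test】 |
| Multimodal understanding | Qwen2.5-VL | 72B | Fine-tuning | 8x8 | BF16 | / | / | 【Test】 |
| Speech recognition | Whisper | 1.5B | Pretraining | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | 【Test】 |
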
| Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | 【Pass】 |
| PLLaVA | 7B | Pretraining | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | 【Pass】 |
| PLLaVA | 7B | Pretraining | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | 【Pass】 |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s)/50-200steps | 847 (s)/50-200steps | 【Pass】 |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s)/50-200steps | 490 (s)/50-200steps | 【Pass】 |
| HunYuanDiT | 1.5B | Pretraining | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | 【Pass】 |
| InternVL 1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | 【Pass】 |

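
The two tables report performance in different kinds of units, so the NPU and reference columns cannot always be compared the same way. The sketch below is one possible way to turn a row into a single relative figure. It assumes that SPS, FPS and TPS are throughput units (higher is better) and that s/it, s/step and ms/step are time-per-step units (lower is better); the README does not define these units, and the helper name `relative_performance` is hypothetical, not part of MindSpeed-MM.

```python
def relative_performance(npu: float, reference: float,
                         higher_is_better: bool = True) -> float:
    """Return NPU performance relative to the reference (1.0 means parity).

    Assumption: throughput units (SPS/FPS/TPS) are higher-is-better, while
    time-per-step units (s/it, s/step, ms/step) are lower-is-better.
    Values above 1.0 mean the NPU figure is better than the reference.
    """
    if npu <= 0 or reference <= 0:
        raise ValueError("both values must be positive")
    return npu / reference if higher_is_better else reference / npu


# A few rows copied from the tables above: (label, NPU, reference, higher_is_better)
rows = [
    ("Wan2.1-T2V 1.3B pretraining (SPS)", 0.770, 0.960, True),
    ("OpenSora 1.0 5.5B pretraining (SPS)", 3.18, 2.04, True),
    ("PLLaVA 7B BF16 pretraining (s/step)", 0.841, 0.935, False),
]

for label, npu, ref, higher in rows:
    print(f"{label}: {relative_performance(npu, ref, higher):.3f}x of reference")
```

Rows whose reference column is `/` have no published reference value and should be skipped rather than passed to the helper.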
- [image] Input image; Prompt: A rocket ascends slowly into the sky
- [image] Input image; Prompt: An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.
- [image] Prompt: A cat holding a sign that says hello world
- [image] Prompt: A cat holding a sign that says MindSpeed

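Input image for both models: [image]

Input text for both models: Please describe the image shortly

InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for InternVL2: 请简短描述这张照片 (Please briefly describe this photo)

InternVL2 inference result: 这张图片展示了一个宁静的湖泊,湖面平静,反射着天空和周围景物的影像。湖的中央有一个木制码头,延伸到湖中,码头上有几根柱子支撑。 湖的远端是一片茂密的森林,树木高大,覆盖着茂密的绿色植被。森林的尽头是一座高耸的山峰,山峰上覆盖着积雪,显得格外壮丽。 天空中有一些云朵,但整体上是晴朗的,阳光从云层中透出,照亮了整个湖面和周围的景色。 这张图片整体给人一种宁静、祥和的感觉,仿佛是一个远离尘嚣的世外桃源

English translation: This image shows a tranquil lake with a calm surface that reflects the sky and the surrounding scenery. In the middle of the lake is a wooden dock extending into the water, supported by several posts. At the far end of the lake is a dense forest of tall trees covered in lush green vegetation. Beyond the forest rises a towering mountain peak covered in snow, which looks especially magnificent. There are some clouds in the sky, but overall it is clear, with sunlight breaking through the clouds and lighting up the whole lake and the surrounding scenery. Overall, the image conveys a sense of tranquility and peace, as if it were an idyllic retreat far from the bustle of the world

Input text for Qwen2VL: 请用中文简短描述这张照片 (Please briefly describe this photo in Chinese)

Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上,背景是连绵的山脉和茂密的森林。天空多云,整体色调偏冷,给人一种宁静和自然的感觉。

English translation: This image shows a wooden dock extending over a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.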