# MindSpeed-MM

**Repository Path**: wangzw1022/MindSpeed-MM

## Basic Information

- **Project Name**: MindSpeed-MM
- **Description**: Ascend multimodal large-model suite.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://gitee.com/ascend/MindSpeed-MM
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 232
- **Created**: 2024-10-17
- **Last Updated**: 2025-02-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

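| Task category | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|---|
| Multimodal generation | OpenSora 1.0 | 5.5B | Pretraining | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | Pass |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pretraining | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | Pass |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pretraining | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | Pass |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pretraining | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | Pass |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pretraining | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | Pass |
| Multimodal generation | CogVideoX-T2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | Pass |
| Multimodal generation | CogVideoX-T2V | Affinity scenario | Pretraining | 1x8 | BF16 | 0.92 (SPS) | 0.96 (SPS) | Pass |
| Multimodal generation | CogVideoX-I2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | Pass |
| Multimodal generation | CogVideoX-I2V | Affinity scenario | Pretraining | 1x8 | BF16 | 0.92 (SPS) | 0.96 (SPS) | Pass |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | Contributed by Qihoo 360 |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | Pass |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | Pass |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 17.08 (FPS) | 17.51 (FPS) | Pass |
| Multimodal generation | SD3.5 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | Pass |
| Multimodal generation | SD3.5 | 2B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | Pass |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | Pass |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | Pass |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | Pass |
| Multimodal understanding | Intern-VL-2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | Pass |
| Multimodal understanding | Intern-VL-2.0 | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | Pass |
| Multimodal understanding | Intern-VL-2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | Test |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | Pass |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | Pass |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 8x16 | BF16 | / | / | Test |
| Speech recognition | Whisper | 1.5B | Pretraining | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | Pass |
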
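
One way to read the table above is to divide each NPU figure by its reference figure. The snippet below is a minimal sketch of that arithmetic, not part of MindSpeed-MM. The helper names `relative_throughput` and `summarize` are hypothetical, and the rows are a hand-copied subset of the table. It assumes SPS and FPS are throughput units, so a ratio above 1.0 means the NPU run was faster than the reference.

```python
# Rows copied by hand from the table above: (model, NPU, reference).
# Assumption: SPS/FPS are throughput units, so higher is better.
ROWS = [
    ("OpenSora 1.0", 3.18, 2.04),
    ("OpenSora 1.2", 7.31, 8.15),
    ("SDXL (BF16)", 29.92, 30.65),
    ("Flux", 55.23, 53.65),
    ("Whisper", 93.38, 109.23),
]


def relative_throughput(npu: float, reference: float) -> float:
    """Return NPU throughput as a multiple of the reference throughput."""
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return npu / reference


def summarize(rows):
    """Return (model, ratio) pairs, with the ratio rounded to 3 decimals."""
    return [
        (name, round(relative_throughput(npu, ref), 3))
        for name, npu, ref in rows
    ]


if __name__ == "__main__":
    for name, ratio in summarize(ROWS):
        print(f"{name:<14} {ratio:.3f}x of reference")
```

The same ratio does not carry over directly to rows reported in time-per-step units such as s/it or ms/step, where lower is better. For those rows the ratio would need to be inverted.
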
| Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | Pass |
| PLLaVA | 7B | Pretraining | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | Pass |
| PLLaVA | 7B | Pretraining | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | Pass |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s)/50-200 steps | 847 (s)/50-200 steps | Pass |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s)/50-200 steps | 490 (s)/50-200 steps | Pass |
| HunYuanDiT | 1.5B | Pretraining | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | Pass |
| Intern-VL-1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | Pass |

- Input image
- Prompt: A rocket ascends slowly into the sky
- Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures
- Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee
- Prompt: A cat holding a sign that says hello world
- Prompt: A cat holding a sign that says MindSpeed

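Both models received the same input image.

Input text for both models: Please describe the image shortly

InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for InternVL2: 请简短描述这张照片 (Please briefly describe this photo)

InternVL2 inference result (translated from Chinese): This image shows a tranquil lake with a calm surface that reflects the sky and the surrounding scenery. In the middle of the lake is a wooden dock extending into the water, supported by several posts. At the far end of the lake is a dense forest of tall trees covered in thick green vegetation. Beyond the forest rises a towering mountain peak covered in snow, which looks especially magnificent. There are some clouds in the sky, but it is clear overall, and sunlight breaks through the clouds to light up the whole lake and the surrounding landscape. Overall, the image conveys a sense of peace and serenity, like a secluded paradise far from the bustle of the world

Input text for Qwen2VL: 请用中文简短描述这张照片 (Please briefly describe this photo in Chinese)

Qwen2VL inference result (translated from Chinese): This image shows a wooden dock extending over a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.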