A Multimodal Large-Model Suite for Large-Scale Distributed Training: Multimodal Generation and Understanding

2025-03-26 06:36

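Summary: 一飞开源 (Yifei Open Source) introduces creative, novel, fun, and practical open-source applications, systems, software, hardware, and technologies. It is an open-source technology community platform for exploring, discovering, sharing, using, and discussing them, committed to building a vibrant open-source community and a new open-source ecosystem together.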

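MindSpeed-MM is a multimodal large-model suite for Ascend, built for large-scale distributed training. It supports both multimodal generation and multimodal understanding and aims to provide an end-to-end multimodal training solution for Huawei Ascend chips. Its features include preset mainstream industry models, data engineering, distributed training and acceleration, and pretraining, fine-tuning, and online inference tasks.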

Prompt: A rocket ascends slowly into the sky

Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures

Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee

Prompt: A cat holding a sign that says hello world

Prompt: A cat holding a sign that says MindSpeed

Input text for both models: Please describe the image shortly

InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for InternVL2: 请简短描述这张照片 ("Please briefly describe this photo")

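InternVL2 inference result: 这张图片展示了一个宁静的湖泊,湖面平静,反射着天空和周围景物的影像。湖的中央有一个木制码头,延伸到湖中,码头上有几根柱子支撑。 湖的远端是一片茂密的森林,树木高大,覆盖着茂密的绿色植被。森林的尽头是一座高耸的山峰,山峰上覆盖着积雪,显得格外壮丽。 天空中有一些云朵,但整体上是晴朗的,阳光从云层中透出,照亮了整个湖面和周围的景色。 这张图片整体给人一种宁静、祥和的感觉,仿佛是一个远离尘嚣的世外桃源

(English translation: This image shows a tranquil lake whose calm surface reflects the sky and the surrounding scenery. In the middle of the lake a wooden dock extends into the water, supported by several posts. At the far end of the lake is a dense forest of tall trees covered in thick green vegetation. Beyond the forest rises a towering, snow-covered mountain peak that looks especially magnificent. There are some clouds in the sky, but overall it is clear, and sunlight shines through the clouds, illuminating the whole lake and its surroundings. Overall, the image conveys a sense of peace and serenity, like a paradise far from the bustle of the world.)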

Input text for Qwen2VL: 请用中文简短描述这张照片 ("Please briefly describe this photo in Chinese")

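Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上,背景是连绵的山脉和茂密的森林。天空多云,整体色调偏冷,给人一种宁静和自然的感觉。

(English translation: This image shows a wooden dock extending onto a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a quiet, natural feeling.)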

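| Model \ Feature | TP | TP-SP | VPP | PP | CP | Distributed Optimizer | Recomputation | LoRA |
|---|---|---|---|---|---|---|---|---|
| HunyuanVideo | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | |
| CogVideoX series-T2V | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | |
| CogVideoX series-I2V | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | |
| Opensora1.2 | | | | | DSP | ✔ | ✔ | |
| OpensoraPlan1.3-T2V | ✔ | ✔ | ✔ | ✔ | CP (Ulysses) | ✔ | ✔ | |
| OpensoraPlan1.3-I2V | ✔ | ✔ | ✔ | ✔ | CP (Ulysses) | ✔ | ✔ | |
| InternVL2-2B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-8B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-26B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-76B | | | ✔ | ✔ | | ✔ | ✔ | |
| Qwen2VL-2B | ✔ | | | | | ✔ | ✔ | ✔ |
| Qwen2VL-7B | ✔ | | | | | ✔ | ✔ | ✔ |
| Qwen2VL-72B | ✔ | | | | | ✔ | ✔ | ✔ |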
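As a rough illustration of how the parallelism columns in the table above combine (the abbreviations are expanded in the notes that follow), the sketch below assumes the Megatron-style convention in which the total device count equals the product of the tensor, pipeline, context, and data parallel sizes. The function name and the example sizes are hypothetical and are not taken from MindSpeed-MM; each model's README remains the reference for the settings it actually uses.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return how many data-parallel replicas remain after model parallelism.

    Assumption (Megatron-style convention): world_size = tp * pp * cp * dp.
    Under the same convention, VPP subdivides each pipeline stage into
    virtual chunks and TP-SP reuses the tensor-parallel group, so neither
    changes the device count.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # Hypothetical example: a single node with 8 devices, split 2-way
    # across tensor parallelism and 2-way across pipeline parallelism.
    print(data_parallel_size(world_size=8, tp=2, pp=2))
```

With these illustrative sizes, two data-parallel replicas remain, so the global batch is split across two copies of the model.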

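Notes:

TP: Tensor Parallel
TP-SP: Tensor Parallel with Sequence Parallel
VPP: Virtual Pipeline Parallel
PP: Pipeline Parallel
DSP: Dynamic Sequence Parallel
CP (Ulysses): Context Parallel by leveraging DeepSpeed Ulysses with Sequence Parallel
CP (Ring Attention): Context Parallel with Ring Attention
Distributed Optimizer: Zero Redundancy Optimizer (ZeRO)
Recomputation: Reducing Activation Recomputation
LoRA: Low-Rank Adaptation

| Status | Duration | Description |
|---|---|---|
| Planning | 1-3 months | Features are planned |
| Development | 3 months | Features are developed |
| Maintained | 6-12 months | All resolved issues are merged and versions are released. The maintenance policy depends on the MindSpeed-MM version: regular versions are maintained for 6 months and long-term support versions for 12 months |
| Unmaintained | 0-3 months | All resolved issues are merged; there are no dedicated maintainers and no version releases |
| End of life (EOL) | N/A | The branch no longer accepts any changes |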

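Maintenance policy for released MindSpeed-MM versions:

| MindSpeed-MM version | Maintenance policy | Current status | Release date | Next status | EOL date |
|---|---|---|---|---|---|
| 2.0.0 | Regular version | Maintained | 2025/03/30 | Expected to be unmaintained from 2025/09/30 | |
| 1.0.0 | Regular version | Maintained | 2024/12/30 | Expected to be unmaintained from 2025/06/30 | |
| 1.0.RC3 | Regular version | Maintained | 2024/09/30 | Expected to be unmaintained from 2025/03/30 | |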
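The dates in the release table follow from the stated maintenance windows (6 months for regular versions, 12 months for long-term support versions). The sketch below reproduces that arithmetic; the helper names and the `"regular"`/`"lts"` keys are hypothetical labels chosen for illustration, not part of MindSpeed-MM.

```python
import calendar
from datetime import date

# Maintenance windows in months, as stated in the lifecycle description.
# The dictionary keys are illustrative labels, not official identifiers.
MAINTENANCE_MONTHS = {"regular": 6, "lts": 12}


def add_months(start: date, months: int) -> date:
    """Add whole calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def unmaintained_from(release: date, kind: str = "regular") -> date:
    """Return the date on which a release is expected to leave maintenance."""
    return add_months(release, MAINTENANCE_MONTHS[kind])


if __name__ == "__main__":
    # Release dates taken from the table above.
    for version, released in [
        ("2.0.0", date(2025, 3, 30)),
        ("1.0.0", date(2024, 12, 30)),
        ("1.0.RC3", date(2024, 9, 30)),
    ]:
        print(version, unmaintained_from(released).isoformat())
```

For the three regular versions listed, this yields 2025-09-30, 2025-06-30, and 2025-03-30, matching the "Next status" column.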

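【Measured performance of the current version (hardware: Atlas 900 A2 PODc)】

For each supported model in the lists below, the model's README file provides usage instructions that cover the detailed training, inference, and fine-tuning workflows.

Hyperlinks in the Model column point to each model's folder; hyperlinks in the Parameters column point to the model's community resources.

Certification: 【Pass】 marks models that have passed testing; 【Test】 marks models still under test.

SPS: samples per second; FPS: frames per second; TPS: tokens per second.

An affinity scenario is one in which a small number of structures or parameters are adjusted so that the model is better suited to Ascend and performs better.

A3 refers to the Atlas A3 training series hardware.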

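MindSpeed-MM model list

| Task category | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|---|
| Multimodal generation | HunyuanVideo | 13B | Pretraining | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | 【Test】 |
| Multimodal generation | OpenSora 1.0 | 5.5B | Pretraining | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | 【Pass】 |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pretraining | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pretraining | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pretraining | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pretraining | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-T2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX-I2V | 5B | Pretraining | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | Pretraining | 1x8 | BF16 | 1.88 (SPS) | 2.09 (SPS) | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | Pretraining | 1x8 | BF16 | 1.81 (SPS) | 2.01 (SPS) | 【Pass】 |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | 【Contributed by Qihoo 360】 |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | 【Pass】 |
| Multimodal generation | SDXL | 3.5B | Pretraining | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | 【Pass】 |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | 【Pass】 |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | 【Pass】 |
| Multimodal generation | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | 【Pass】 |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | 【Test】 |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | 【Test】 |
| Multimodal understanding | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 26B | Fine-tuning | 1x8 | BF16 | 3.31 (SPS) | 3.26 (SPS) | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 【Test】 |
| Multimodal understanding | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | / | / | 【Test】 |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 【Pass】 |
| Speech recognition | Whisper | 1.5B | Pretraining | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | 【Test】 |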
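One way to read the NPU and reference columns side by side is as a ratio. The sketch below computes it for a few rows copied from the model list; it assumes throughput metrics (SPS, FPS, TPS), where higher is better, so a ratio above 1 means the NPU figure exceeds the reference figure. The function name is hypothetical, and for time-per-step metrics the ratio would have to be inverted.

```python
def relative_performance(npu: float, reference: float) -> float:
    """Return NPU throughput divided by reference throughput.

    Valid only for throughput metrics (SPS, FPS, TPS), where higher is better.
    """
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return npu / reference


# (NPU performance, reference performance) pairs copied from the model list.
ROWS = {
    "OpenSora 1.0 (SPS)": (3.18, 2.04),
    "CogVideoX-T2V (SPS)": (0.37, 0.46),
    "Qwen2-VL 72B (TPS)": (261.25, 257.63),
}

if __name__ == "__main__":
    for name, (npu, reference) in ROWS.items():
        print(f"{name}: {relative_performance(npu, reference):.2f}x of reference")
```

Rows marked "/" carry no measurements and cannot be compared this way.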

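Other multimodal large models adapted to Ascend

| Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Certification |
|---|---|---|---|---|---|---|---|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | 【Pass】 |
| PLLaVA | 7B | Pretraining | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | 【Pass】 |
| PLLaVA | 7B | Pretraining | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | 【Pass】 |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s)/50-200steps | 847 (s)/50-200steps | 【Pass】 |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s)/50-200steps | 490 (s)/50-200steps | 【Pass】 |
| HunYuanDiT | 1.5B | Pretraining | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | 【Pass】 |
| InternVL 1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | 【Pass】 |

Computing Product Line, Public Development Department, 2012 Laboratories, Huawei Cloud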

MindSpeed-MM ecosystem contributors:

360 AI Research, Peking University OpenSoraPlan team

Source: 一飞开源 (Yifei Open Source)
