
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Le Xue·
Manli Shu·
Anas Awadalla·
Jun Wang·
An Yan·
Senthil Purushwalkam·
Honglu Zhou·
Viraj Prabhu·
Yutong Dai·
Michael S Ryoo·
Shrikant Kendre·
Jieyu Zhang·
Shaoyen Tseng·
Gustavo A Lujan-Moreno·
Matthew L Olson·
Musashi Hinck·
David Cobbley·
Vasudev Lal·
Can Qin·
Shu Zhang·
Chia-Chih Chen·
Ning Yu·
Juntao Tan·
Tulika Manoj Awalgaonkar·
Shelby Heinecke·
Huan Wang·
Yejin Choi·
Ludwig Schmidt·
Zeyuan Chen·
Silvio Savarese·
Juan Carlos Niebles·
Caiming Xiong·
Ran Xu

ABSTRACT

This paper introduces BLIP-3, an open framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. We release 4B and 14B models, including both the pre-trained base models and the instruction fine-tuned ones. Our models undergo rigorous evaluation across a range of tasks, including both single- and multi-image benchmarks, and demonstrate competitive performance among open-source LMMs of similar size, with the ability to comprehend interleaved image-text inputs. Our training code, models, and all datasets used in this work, including the three large-scale datasets we create and the preprocessed ones, will be open-sourced to better support the research community.
