← Home

Quick answer

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can...

Claim

LLaVA-OneVision: Easy Visual Task Transfer

Bo Li·
Yuanhan Zhang·
Dong Guo·
Renrui Zhang·
Feng Li·
Hao Zhang·
Kaichen Zhang·
Peiyuan Zhang·
Yanwei Li·
Ziwei Liu·
Chunyuan Li

ABSTRACT

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.

Review Snapshot

Explore ratings

0.0
★★★★★
0 ratings
5 star
0%
4 star
0%
3 star
0%
2 star
0%
1 star
0%

Recommendation

0%

recommend this content.

Review this content

Share your opinion to help other learners triage faster.

Write a review

Invite a reviewer

Invite someone by email to share an invited review for LLaVA-OneVision: Easy Visual Task Transfer.

Author Inquiries

Public questions about this content. Attendemia will route your question to the author. Vote on the most important ones. No guarantee of response.
Post an inquiry
Sort by: Most helpful