Paper • 2025-05-20

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation

Authors

Rui Tian, Mingfei Gao, Mingze Xu, Jiaming Hu, Jiasen Lu, Zuxuan Wu, Yinfei Yang, Afshin Dehghan

ABSTRACT

We introduce UniGen, a unified multimodal large language model (MLLM) capable of image understanding and generation. We study the full training pipeline of UniGen from a data-centric perspective, including multi-stage pre-training, supervised fine-tuning, and direct preference optimization. More importantly, we propose a new Chain-of-Thought Verification (CoT-V) strategy for test-time scaling, which significantly boosts UniGen's image generation quality using a simple Best-of-N test-time strategy. Specifically, CoT-V enables UniGen to act as both image generator and verifier at test time, assessing the semantic alignment between a text prompt and its generated image in a step-by-step CoT manner. Trained entirely on open-source datasets across all stages, UniGen achieves state-of-the-art performance on a range of image understanding and generation benchmarks, with a final score of 0.78 on GenEval and 85.19 on DPG-Bench. Through extensive ablation studies, our work provides actionable insights and addresses key challenges in the full life cycle of building unified MLLMs, contributing meaningful directions for future research.
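
The Best-of-N selection with a verifier described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `best_of_n`, `toy_generate`, and `toy_verify` are hypothetical names, the "images" are stand-in sets of tags, and scoring by the fraction of per-word checks passed is an assumption made only to mimic a step-by-step verification. In UniGen, per the abstract, the same model plays both the generator and the verifier roles.

```python
from typing import Callable, Set, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], Set[str]],
    verify: Callable[[str, Set[str]], float],
    n: int = 4,
) -> Tuple[Set[str], float]:
    """Sample n candidates, score each with the verifier, return the best one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt, seed) for seed in range(n)]
    scores = [verify(prompt, candidate) for candidate in candidates]
    # max() returns the first index with the highest score, so ties go to the earliest sample.
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]


# Hypothetical stand-ins: each "image" is just the set of concepts it depicts.
TOY_OUTPUTS = [
    {"dog"},
    {"red", "ball"},
    {"dog", "red", "ball"},
    {"dog", "ball"},
]


def toy_generate(prompt: str, seed: int) -> Set[str]:
    """Pretend generator: returns a fixed candidate per seed (ignores the prompt)."""
    return TOY_OUTPUTS[seed % len(TOY_OUTPUTS)]


def toy_verify(prompt: str, image: Set[str]) -> float:
    """Pretend step-by-step verifier (assumption): one yes/no check per prompt word."""
    checks = prompt.lower().split()
    passed = [word in image for word in checks]
    return sum(passed) / len(checks)


image, score = best_of_n("dog red ball", toy_generate, toy_verify, n=4)
print(sorted(image), score)
```

In this toy run the third candidate is selected because it passes all three checks. A larger `n` gives the verifier more candidates to choose from, at the cost of proportionally more generation and verification calls.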
