Paper · 2025-10-17

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Authors

Qingyu Shi · Jinbin Bai · Zhuoran Zhao · Wenhao Chai · Kaidong Yu · Jianzong Wu · Shuangyong Song · Yunhai Tong · Xiangtai Li · Xuelong Li · Shuicheng Yan

ABSTRACT

Unified generation models aim to handle diverse tasks across modalities, such as text generation, image generation, and vision-language reasoning, within a single architecture and decoding paradigm. Autoregressive unified models suffer from slow inference due to sequential decoding, and non-autoregressive unified models suffer from weak generalization due to limited pretrained backbones. We introduce Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities. Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder, enabling flexible and high-quality multimodal generation under a unified architecture. Empirical results show that Muddit achieves competitive or superior performance compared to significantly larger autoregressive models in both quality and efficiency. The work highlights the potential of purely discrete diffusion, when equipped with strong visual priors, as a scalable and effective backbone for unified generation.
