
Mercury: Ultra-Fast Diffusion-based Language Models with Coarse-to-Fine Parallel Refinement

Authors
Zhiwei Liu · Yuan He · Hao Peng · Zhiqiang Chen

ABSTRACT

Diffusion LLMs have recently emerged as a promising alternative to autoregressive models. They address two traditional limitations of autoregressive decoding: multi-token prediction (generating multiple tokens at each step) and flexible generation order. In this work, we present Mercury, a family of diffusion-based language models optimized for ultra-fast parallel refinement. Mercury leverages a coarse-to-fine refinement process that significantly reduces the computation required per token. Our results show that Mercury achieves inference speeds far exceeding those of autoregressive LLMs while maintaining quality comparable to leading autoregressive models, making it well suited to real-time generation tasks.
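The coarse-to-fine parallel refinement described above can be illustrated with a toy sketch: start from a fully masked sequence and, at each step, predict all masked positions in parallel, committing only the most confident guesses so that several tokens are produced per model call. This is a generic masked-diffusion decoding pattern, not Mercury's actual implementation; `toy_predict`, the vocabulary, and the confidence schedule are all illustrative stand-ins.

```python
import math
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_predict(tokens):
    """Stand-in for a diffusion LM denoiser: for every masked position,
    return a (token, confidence) guess. A real model would produce these
    in a single parallel forward pass over the whole sequence."""
    preds = {}
    for i, t in enumerate(tokens):
        if t == MASK:
            preds[i] = (random.choice(VOCAB), random.random())
    return preds

def coarse_to_fine_decode(length, steps):
    """Begin fully masked; each step commits the most confident fraction
    of the remaining masked positions, so multiple tokens are generated
    per step instead of one token per forward pass."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_predict(tokens)
        if not preds:
            break
        # Commit the top-k most confident predictions this step, pacing
        # the schedule so everything is unmasked by the final step.
        remaining_steps = steps - step
        k = max(1, math.ceil(len(preds) / remaining_steps))
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    return tokens

out = coarse_to_fine_decode(length=8, steps=4)
```

With 4 refinement steps over 8 positions, the loop makes 4 model calls instead of the 8 an autoregressive decoder would need, which is the source of the speedup the abstract refers to.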
