Recently, diffusion LLMs have emerged as a promising alternative to autoregressive models. Diffusion LLMs address the sequential, token-by-token bottleneck of autoregressive decoding with two advances: multi-token prediction (generating multiple tokens at each step) and a flexible generation order. In this work, we present Mercury, a family of diffusion-based language models optimized for ultra-fast parallel refinement. Mercury leverages a coarse-to-fine refinement process that significantly reduces the computational cost per token. Our results show that Mercury achieves inference speeds far exceeding those of autoregressive LLMs while maintaining quality comparable to leading autoregressive models, making it well suited to real-time generation tasks.
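To make the parallel-refinement idea concrete, here is a minimal toy sketch of coarse-to-fine decoding: the sequence starts fully masked, a model proposes a token and a confidence for every masked position at once, and the most confident proposals are committed each step. This is an illustrative assumption, not Mercury's actual architecture; `toy_model`, `coarse_to_fine_decode`, and the confidence scheme are hypothetical names standing in for a real diffusion LM.

```python
import random

MASK = "<mask>"

def toy_model(tokens, vocab):
    # Hypothetical stand-in for a diffusion LM: for each masked position,
    # propose a token and a confidence score. A real model would run a
    # transformer over the whole partially masked sequence in parallel.
    proposals = {}
    for i, t in enumerate(tokens):
        if t == MASK:
            proposals[i] = (random.choice(vocab), random.random())
    return proposals

def coarse_to_fine_decode(length, vocab, tokens_per_step=2, seed=0):
    """Parallel refinement: start fully masked, then at each step commit
    the `tokens_per_step` most confident proposals until none remain."""
    random.seed(seed)
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        proposals = toy_model(tokens, vocab)
        # Commit the most confident proposals in parallel (coarse-to-fine):
        # early steps fix the "easy" positions, later steps refine the rest.
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in best[:tokens_per_step]:
            tokens[i] = tok
        steps += 1
    return tokens, steps

out, steps = coarse_to_fine_decode(8, ["a", "b", "c"], tokens_per_step=2)
# 8 masked positions committed 2 at a time finish in 4 refinement steps,
# versus 8 steps for one-token-at-a-time autoregressive decoding.
```

The speedup in this sketch comes purely from committing several tokens per model call; the paper's claimed throughput gains additionally depend on the model and hardware, which this toy loop does not capture.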
Mercury: Ultra-Fast Diffusion-based Language Models with Coarse-to-Fine Parallel Refinement.