Quick answer
AI Summary: By leveraging structured sparsity in Monarch matrices, this model reduces the quadratic complexity of video attention to near-linear.
The quadratic complexity of attention severely limits the context scalability of Video Diffusion Transformers (DiTs). We find that the sparse spatio-temporal attention patterns in Video DiTs can be naturally represented by the Monarch matrix—a class of structured matrices with flexible sparsity. We propose VMonarch, an attention mechanism that adaptively captures intra-frame and inter-frame correlations using Monarch factorization. We introduce a recomputation strategy for stability and an online entropy algorithm fused into FlashAttention for fast updates. VMonarch reduces attention FLOPs by 17.5x and achieves a speedup of over 5x for long videos while maintaining generation quality.
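The abstract does not spell out the Monarch factorization itself. As a rough illustration of the structured sparsity it relies on, here is a minimal NumPy sketch of a Monarch matrix–vector product in the general form M = P·L·P·R, with block-diagonal factors L and R and a transpose permutation P. The function name `monarch_matvec` and its argument layout are hypothetical, not taken from the paper:

```python
import numpy as np

def monarch_matvec(L_blocks, R_blocks, x):
    """Multiply a Monarch matrix M = P @ L @ P @ R by a vector x.

    L_blocks, R_blocks: shape (m, m, m) -- m diagonal blocks of size m x m,
    so the implicit dense matrix is n x n with n = m*m. P is the permutation
    that transposes an (m, m) grid (a perfect shuffle). Cost is ~2*n**1.5
    multiplies instead of n**2 for a dense matvec.
    """
    m = L_blocks.shape[0]
    # Block-diagonal R: block b acts on the contiguous chunk x[b*m:(b+1)*m].
    z = np.einsum("bij,bj->bi", R_blocks, x.reshape(m, m))
    # Permutation P: transpose the (m, m) grid.
    z = z.T.copy()
    # Block-diagonal L on the permuted chunks.
    y = np.einsum("bij,bj->bi", L_blocks, z)
    # Apply P again (the transpose permutation is its own inverse).
    return y.T.reshape(-1)
```

Because each factor touches only m-sized blocks, the product costs two batched m×m matmuls (about 2·n^1.5 FLOPs), which is the source of the near-linear scaling the abstract claims for attention over long token sequences.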
MonarchRT: Efficient Attention for Real-Time Video Generation