
MonarchRT: Efficient Attention for Real-Time Video Generation

Authors

Krish Agarwal · Zhuoming Chen · Cheng Luo · Beidi Chen

ABSTRACT

The quadratic complexity of attention severely limits the context scalability of Video Diffusion Transformers (DiTs). We find that the sparse spatio-temporal attention patterns in Video DiTs can be naturally represented by the Monarch matrix—a class of structured matrices with flexible sparsity. We propose VMonarch, an attention mechanism that adaptively captures intra-frame and inter-frame correlations using Monarch factorization. We introduce a recomputation strategy for stability and an online entropy algorithm fused into FlashAttention for fast updates. VMonarch reduces attention FLOPs by 17.5x and achieves a speedup of over 5x for long videos while maintaining generation quality.
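The Monarch factorization the abstract builds on can be illustrated with a minimal NumPy sketch. This is a generic two-factor Monarch multiply (block-diagonal factors interleaved with a fixed "riffle" permutation), not the paper's attention kernel; the block counts, permutation convention, and function name here are assumptions for illustration:

```python
import numpy as np

def monarch_matmul(x, L_blocks, R_blocks):
    """Multiply vector x by a Monarch matrix M = P^T @ L @ P @ R,
    where L and R are block-diagonal and P is a fixed reshape/transpose
    permutation. Cost is O(n * sqrt(n)) rather than O(n^2) for dense M."""
    b = len(R_blocks)           # number of R blocks; each is m x m
    m = R_blocks[0].shape[0]    # block size, so n = b * m
    # Block-diagonal R: each block acts on a contiguous chunk of x.
    y = np.concatenate([R_blocks[i] @ x[i * m:(i + 1) * m] for i in range(b)])
    # Permutation P: reshape to (b, m), transpose, flatten (a riffle).
    y = y.reshape(b, m).T.reshape(-1)
    # Block-diagonal L: m blocks, each b x b, on the permuted vector.
    z = np.concatenate([L_blocks[i] @ y[i * b:(i + 1) * b] for i in range(m)])
    # Undo the permutation (apply P^T).
    return z.reshape(m, b).T.reshape(-1)
```

Because each factor is block-diagonal, the multiply touches only b·m² + m·b² entries instead of n², which is the kind of structured sparsity the abstract describes exploiting for spatio-temporal attention.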
