Quick answer
AI Summary: Formulates diffusion alignment as a variational expectation-maximization loop that alternates test-time search with model refinement, optimizing reward while preserving sample diversity.
Diffusion alignment aims to optimize diffusion models for downstream objectives. While existing reinforcement learning (RL) based methods achieve success, they often suffer from reward over-optimization and mode collapse. We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates alignment as an iterative process alternating between test-time search (E-step), which generates diverse reward-aligned samples, and model refinement (M-step), which updates the model on those samples. We demonstrate that DAV optimizes reward while preserving diversity on both continuous and discrete tasks, achieving a superior trade-off between aesthetic scores and naturalness compared to KL-regularized baselines.
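To make the E-step/M-step alternation concrete, here is a minimal sketch on a toy 1-D problem. Everything in it is an illustrative assumption rather than DAV's actual procedure: `ToyModel` is a Gaussian stand-in for a diffusion model, `reward` is a made-up objective, the E-step is approximated by simple exponential reward tilting of candidate samples, and the M-step by a weighted maximum-likelihood refit.

```python
# Hypothetical sketch of an alternating E-step / M-step alignment loop.
# None of these names come from the DAV paper; they only illustrate the
# alternation the abstract describes.
import numpy as np

rng = np.random.default_rng(0)

class ToyModel:
    """Gaussian stand-in for a diffusion model (assumption, not DAV's model)."""
    def __init__(self, mu=0.0, sigma=1.0):
        self.mu, self.sigma = mu, sigma

    def sample(self, n):
        # Draw n candidate samples from the current model.
        return rng.normal(self.mu, self.sigma, size=n)

    def fit_weighted(self, x, w):
        # M-step: weighted maximum-likelihood refit on E-step samples.
        w = w / w.sum()
        self.mu = float(np.dot(w, x))
        self.sigma = max(1e-3, float(np.sqrt(np.dot(w, (x - self.mu) ** 2))))

def reward(x):
    # Hypothetical downstream reward: prefer samples near 2.0.
    return -(x - 2.0) ** 2

model = ToyModel()
beta = 1.0  # inverse temperature: trades reward against diversity
for step in range(20):
    # E-step: test-time search, approximated here by sampling candidates
    # and tilting them toward high reward.
    x = model.sample(512)
    r = reward(x)
    w = np.exp(beta * (r - r.max()))  # max-subtracted for numerical stability
    # M-step: refine the model on the reward-weighted samples.
    model.fit_weighted(x, w)

print(f"aligned mean ~ {model.mu:.2f}, std ~ {model.sigma:.2f}")
```

In this sketch the temperature `beta` plays the role of the reward/diversity knob: larger values concentrate the model on high-reward samples, smaller values keep it closer to the base distribution. How DAV actually implements the search and refinement steps is specified in the paper, not here.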