
Quick answer

AI Summary: Frames diffusion alignment as variational expectation-maximization, alternating between test-time search for reward-aligned samples and model refinement, and shows this optimizes reward while preserving sample diversity.

Title

Diffusion Alignment as Variational Expectation-Maximization

Authors

Zijing Ou · Jacob Si · Junyi Zhu · Yingzhen Li

ABSTRACT

Diffusion alignment aims to optimize diffusion models for downstream objectives. While existing methods based on reinforcement learning (RL) achieve success, they often suffer from reward over-optimization and mode collapse. We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates alignment as an iterative process alternating between test-time search (E-step) to generate diverse reward-aligned samples and model refinement (M-step) using those samples. We demonstrate that DAV optimizes reward while preserving diversity for both continuous and discrete tasks, achieving a superior trade-off between aesthetic scores and naturalness compared to KL-regularized baselines.
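To make the E-step/M-step alternation concrete, here is a minimal toy sketch of the loop the abstract describes. It is not the paper's method or code: the 1-D Gaussian "generator", the reward function, and all names are illustrative stand-ins, with importance reweighting standing in for test-time search and a weighted maximum-likelihood refit standing in for model refinement.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Hypothetical downstream reward: prefer samples near 2.0.
    return -((x - 2.0) ** 2)

# Parameters of a 1-D Gaussian standing in for the diffusion model.
mu, sigma = 0.0, 1.0

for step in range(50):
    # E-step (test-time search): draw candidates from the current model
    # and reweight them toward the reward, approximating a
    # reward-aligned posterior over samples.
    candidates = rng.normal(mu, sigma, size=256)
    logits = reward(candidates)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # M-step (model refinement): refit the generator to the weighted,
    # reward-aligned samples via weighted maximum likelihood.
    mu = np.sum(weights * candidates)
    sigma = np.sqrt(np.sum(weights * (candidates - mu) ** 2) + 1e-6)

print(f"aligned mean ~ {mu:.2f}, std ~ {sigma:.2f}")
```

Because the E-step searches rather than gradient-ascends on the reward, the refit in the M-step stays anchored to samples the model itself can produce, which is the intuition behind avoiding reward over-optimization and mode collapse.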

Review Snapshot

5.0 ★★★★★ (1 rating)
5 star: 100% · 4 star: 0% · 3 star: 0% · 2 star: 0% · 1 star: 0%

Recommendation: 100% recommend this content.

