Quick answer
AI Summary: Formulates diffusion alignment as a variational expectation-maximization loop that alternates test-time search with model refinement, optimizing reward while preserving sample diversity.
Diffusion alignment aims to optimize diffusion models for downstream objectives. While existing reinforcement learning (RL) based methods achieve success, they often suffer from reward over-optimization and mode collapse. We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates alignment as an iterative process alternating between test-time search (E-step), which generates diverse reward-aligned samples, and model refinement (M-step), which updates the model on those samples. We demonstrate that DAV optimizes reward while preserving diversity on both continuous and discrete tasks, achieving a superior trade-off between aesthetic scores and naturalness compared to KL-regularized baselines.
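To make the E-step/M-step alternation concrete, here is a minimal sketch on a toy 1-D problem. Everything in it is an illustrative assumption rather than DAV's actual procedure: `ToyModel` is a Gaussian stand-in for a diffusion model, `reward` is a made-up objective, the E-step is approximated by simple exponential reward tilting of candidate samples, and the M-step by a weighted maximum-likelihood refit.

```python
# Hypothetical sketch of an alternating E-step / M-step alignment loop.
# None of these names come from the DAV paper; they only illustrate the
# alternation the abstract describes.
import numpy as np

rng = np.random.default_rng(0)

class ToyModel:
    """Gaussian stand-in for a diffusion model (assumption, not DAV's model)."""
    def __init__(self, mu=0.0, sigma=1.0):
        self.mu, self.sigma = mu, sigma

    def sample(self, n):
        # Draw n candidate samples from the current model.
        return rng.normal(self.mu, self.sigma, size=n)

    def fit_weighted(self, x, w):
        # M-step: weighted maximum-likelihood refit on E-step samples.
        w = w / w.sum()
        self.mu = float(np.dot(w, x))
        self.sigma = max(1e-3, float(np.sqrt(np.dot(w, (x - self.mu) ** 2))))

def reward(x):
    # Hypothetical downstream reward: prefer samples near 2.0.
    return -(x - 2.0) ** 2

model = ToyModel()
beta = 1.0  # inverse temperature: trades reward against diversity
for step in range(20):
    # E-step: test-time search, approximated here by sampling candidates
    # and tilting them toward high reward.
    x = model.sample(512)
    r = reward(x)
    w = np.exp(beta * (r - r.max()))  # max-subtracted for numerical stability
    # M-step: refine the model on the reward-weighted samples.
    model.fit_weighted(x, w)

print(f"aligned mean ~ {model.mu:.2f}, std ~ {model.sigma:.2f}")
```

In this sketch the temperature `beta` plays the role of the reward/diversity knob: larger values concentrate the model on high-reward samples, smaller values keep it closer to the base distribution. How DAV actually implements the search and refinement steps is specified in the paper, not here.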