AI Summary: Introduces GLIDE, a diffusion model that popularized 'classifier-free guidance' to generate photorealistic images from text and pioneered natural language image inpainting and editing.
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that classifier-free guidance yields higher-quality images that better capture the text prompts. We present GLIDE, a 3.5 billion parameter text-guided diffusion model. Human evaluators overwhelmingly prefer GLIDE over DALL-E 1. Furthermore, we demonstrate that GLIDE can be fine-tuned to perform powerful, zero-shot image editing (inpainting) via natural language prompts.
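The classifier-free guidance mentioned above combines the model's unconditional and text-conditional noise predictions at each sampling step, extrapolating toward the conditional prediction. A minimal sketch of that update rule follows; the function name and toy arrays are illustrative, not from the paper's codebase:

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and text-conditional noise predictions.

    A guidance_scale of 1.0 recovers the plain conditional model;
    larger values trade sample diversity for fidelity to the prompt.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for the diffusion model's two noise predictions
eps_uncond = np.array([0.1, -0.2, 0.3])
eps_cond = np.array([0.2, -0.1, 0.1])

# With scale > 1, each component is pushed past the conditional prediction
guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=3.0)
print(guided)  # → [ 0.4  0.1 -0.3]
```

In a real sampler this blended prediction replaces the model's raw output at every denoising step, which is what trades diversity for fidelity.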