AI Summary: The foundational paper that introduced reinforcement learning from human preferences (now commonly called RLHF), demonstrating that AI agents can learn complex, hard-to-specify behaviors, such as backflips, purely from pairwise human comparisons.
For many complex real-world tasks, writing down a mathematical reward function is difficult, and optimizing a misspecified reward leads to misaligned behavior. We explore a method for solving reinforcement learning tasks without a programmatic reward function, relying instead on human feedback. Our algorithm asks humans to compare two short video clips of the agent's behavior and select the preferred one. We train a reward predictor on these preferences and simultaneously train an RL policy to optimize the predicted reward. The method solves complex tasks in Atari and simulated robotics environments, such as teaching a simulated robot to do a backflip, using only a few thousand bits of human feedback.
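The core of the reward-predictor step can be sketched in a few lines. The paper fits the reward model by treating each human comparison as a Bradley-Terry outcome: the probability that clip 1 is preferred is the softmax of the two clips' summed predicted rewards, and the model is trained with cross-entropy against the human's choice. The sketch below is illustrative, not the paper's implementation: it uses a hypothetical linear reward model over observation features, a synthetic "human" whose preferences come from a hidden true weight vector `w_true`, and plain SGD with NumPy instead of the neural networks and Atari/MuJoCo observations used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, clip_len = 4, 10            # feature dimension and clip length (arbitrary)
w_true = rng.normal(size=dim)    # hidden reward the synthetic "human" uses
w = np.zeros(dim)                # learned reward-model parameters

def clip_score(clip, w):
    # Summed predicted reward over a clip of observations, shape (T, dim).
    return (clip @ w).sum()

def pref_prob(c1, c2, w):
    # Bradley-Terry: P[clip 1 preferred] = sigmoid(score(c1) - score(c2)).
    diff = np.clip(clip_score(c2, w) - clip_score(c1, w), -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(diff))

# SGD on the cross-entropy between predicted and labeled preferences.
lr = 0.1
for step in range(500):
    c1 = rng.normal(size=(clip_len, dim))
    c2 = rng.normal(size=(clip_len, dim))
    # The synthetic human deterministically prefers the higher-true-reward clip.
    label = 1.0 if clip_score(c1, w_true) > clip_score(c2, w_true) else 0.0
    p = pref_prob(c1, c2, w)
    # Analytic gradient of the cross-entropy loss for the linear model.
    grad = (p - label) * (c1.sum(axis=0) - c2.sum(axis=0))
    w -= lr * grad
```

After training, `w` should rank fresh clip pairs the same way `w_true` does; in the full method this learned reward is then handed to an off-the-shelf RL algorithm as if it were the environment's reward.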