
Quick answer

AI Summary: The foundational paper that introduced deep reinforcement learning from human preferences, the approach now known as RLHF (reinforcement learning from human feedback), demonstrating that AI agents can learn complex, hard-to-define behaviors such as backflips purely from pairwise human comparisons.

Deep reinforcement learning from human preferences

Paul F Christiano · Jan Leike · Tom Brown · Miljan Martic · Shane Legg · Dario Amodei

ABSTRACT

For many complex real-world tasks, it is difficult to write down a mathematical reward function, and optimizing a poorly specified reward can produce misaligned behavior. We explore a method for solving reinforcement learning tasks without a programmatic reward function, relying instead on human feedback. Our algorithm shows humans two short video clips of the agent's behavior and asks them to select the one they prefer. We train a reward predictor on these preferences while simultaneously training an RL policy to optimize the predicted reward. The method solves complex tasks in Atari games and simulated robotics environments, such as teaching a simulated robot to do a backflip, using only a few thousand bits of human feedback.
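
The reward-predictor half of this loop fits human comparisons with a Bradley-Terry style cross-entropy loss: the probability that clip A is preferred is a sigmoid of the difference between the summed predicted rewards of the two clips. Below is a minimal PyTorch sketch of that preference-fitting step, not the authors' code: the names (RewardNet, preference_loss), the toy MLP, and the tensor shapes are illustrative assumptions, and the concurrent policy-training half of the loop is omitted.

```python
# Minimal sketch (assumed names, not the authors' code) of fitting a
# reward predictor to pairwise human preferences via Bradley-Terry.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Toy MLP mapping an (observation, action) vector to a scalar reward."""
    def __init__(self, input_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # per-timestep rewards: (batch, steps)

def preference_loss(reward_net: RewardNet,
                    clip_a: torch.Tensor,
                    clip_b: torch.Tensor,
                    pref_a: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on pairwise preferences.

    clip_a, clip_b: (batch, steps, input_dim) trajectory segments.
    pref_a: (batch,) 1.0 where the human preferred clip_a, else 0.0.
    P(a preferred) = exp(sum r_a) / (exp(sum r_a) + exp(sum r_b)),
    i.e. a sigmoid of the difference of summed predicted rewards.
    """
    sum_a = reward_net(clip_a).sum(dim=1)  # total predicted reward, clip a
    sum_b = reward_net(clip_b).sum(dim=1)  # total predicted reward, clip b
    return nn.functional.binary_cross_entropy_with_logits(sum_a - sum_b, pref_a)

# One gradient step on a batch of 16 labeled clip pairs (synthetic data).
model = RewardNet(input_dim=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
a = torch.randn(16, 25, 8)
b = torch.randn(16, 25, 8)
labels = torch.randint(0, 2, (16,)).float()
loss = preference_loss(model, a, b, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the full method this loss is minimized concurrently with an off-the-shelf RL algorithm (the paper uses TRPO and A3C) that treats the predictor's output as the reward signal, with fresh comparisons elicited as the policy's behavior changes.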

Review Snapshot


4.6 / 5 average (5 ratings)
5 star: 60%
4 star: 40%
3 star: 0%
2 star: 0%
1 star: 0%

Recommendation

100% of reviewers recommend this content.

