AI Summary: A foundational paper applying Reinforcement Learning from Human Feedback (RLHF) to language models, showing that training on human preference judgments substantially improves the quality of generated text summaries.

Learning to summarize from human feedback

Nisan Stiennon · Long Ouyang · Jeffrey Wu · Daniel Ziegler · Ryan Lowe · Chelsea Voss · Alec Radford · Dario Amodei · Paul Christiano

ABSTRACT

We show that it is possible to significantly improve the quality of text summaries generated by large language models by training them with reinforcement learning from human feedback. We collect a dataset of human preferences between pairs of summaries generated by our models, and train a reward model to predict these preferences. We then use Proximal Policy Optimization (PPO) to fine-tune the language model to maximize this reward. Our models trained with human feedback significantly outperform models trained via supervised fine-tuning, with human evaluators strongly preferring the RL-optimized summaries. This demonstrates a scalable approach for aligning models with complex human values.
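As a rough illustration of the training pipeline the abstract describes, here is a minimal PyTorch sketch of the two key signals: the pairwise loss used to fit the reward model to human preference comparisons, and a KL-penalized reward of the kind PPO then maximizes. The function names, the penalty coefficient beta, and the batching are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F


def reward_model_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss for the reward model.

    Given scalar reward-model scores for the summary the human preferred
    and for the one they rejected, minimize
        -log sigmoid(r_preferred - r_rejected),
    which pushes the model to score preferred summaries higher.
    """
    return -F.logsigmoid(r_preferred - r_rejected).mean()


def kl_shaped_reward(reward: torch.Tensor,
                     logprob_policy: torch.Tensor,
                     logprob_sft: torch.Tensor,
                     beta: float = 0.05) -> torch.Tensor:
    """Reward signal handed to PPO (beta is an illustrative value).

    The learned reward is penalized by
        beta * (log pi_policy(y|x) - log pi_sft(y|x)),
    keeping the fine-tuned policy close to the supervised baseline so it
    does not drift into degenerate text that merely games the reward model.
    """
    return reward - beta * (logprob_policy - logprob_sft)


# Illustrative usage with random scores for a batch of 4 comparison pairs.
scores_preferred = torch.randn(4)
scores_rejected = torch.randn(4)
loss = reward_model_loss(scores_preferred, scores_rejected)
print(loss.item())  # scalar to log or backpropagate through
```

The KL term reflects the paper's design choice of regularizing the RL policy toward the supervised fine-tuned model; without it, the policy can exploit weaknesses in the reward model and produce disfluent text that scores well but reads poorly.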

