
Summary

This paper applies reinforcement learning from human feedback (RLHF) to GPT-3 to create InstructGPT, bridging the gap between raw text prediction and helpful, conversational AI.

Training language models to follow instructions with human feedback

Long Ouyang · Jeffrey Wu · Xu Jiang · Diogo Almeida · Carroll Wainwright · Pamela Mishkin · Chong Zhang · Sandhini Agarwal · Katarina Slama · Alex Ray · John Schulman · Jacob Hilton · Fraser Kelton · Luke Miller · Maddie Simens · Amanda Askell · Peter Welinder · Paul Christiano · Jan Leike · Ryan Lowe

ABSTRACT

Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. We use reinforcement learning from human feedback (RLHF) to fine-tune GPT-3 to follow a broad class of written instructions. The resulting InstructGPT models are much better at following instructions than GPT-3, while making up facts less often and showing small decreases in toxic output generation.
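
To make the RLHF recipe concrete, below is a minimal sketch of the two learned stages the abstract describes: fitting a reward model from pairwise human preferences (a Bradley-Terry logistic loss), then pushing a policy toward high reward under a KL penalty against the reference policy, which is the objective InstructGPT optimizes with PPO. Everything here is illustrative rather than from the paper: the toy response features, the preference pairs, and the learning rates are assumptions, and the discrete four-response "policy" stands in for a full token-level language model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

# Toy setup: four candidate responses to one fixed prompt, each given a
# hypothetical 8-d feature vector (a stand-in for a learned embedding
# of the prompt/response pair).
features = rng.normal(size=(4, 8))

# Stage 1 stand-in (supervised fine-tuning): a fixed reference policy
# pi_SFT over the candidate responses.
ref_logits = rng.normal(size=4)
pi_sft = softmax(ref_logits)

# Stage 2: fit a linear reward model r(y) = w . phi(y) from pairwise
# human preferences via the Bradley-Terry (logistic) likelihood.
prefs = [(0, 1), (0, 2), (3, 1), (0, 3)]  # (preferred, rejected) pairs
w = np.zeros(8)
for _ in range(500):
    grad = np.zeros(8)
    for win, lose in prefs:
        margin = (features[win] - features[lose]) @ w
        p = 1.0 / (1.0 + np.exp(-margin))          # P(labeler prefers `win`)
        grad += (1.0 - p) * (features[win] - features[lose])
    w += 0.1 * grad                                # ascend the log-likelihood
reward = features @ w

# Stage 3: policy-gradient ascent on the KL-regularized objective
#   E_pi[ r(y) ] - beta * KL(pi || pi_SFT),
# i.e. a per-sample reward of r(y) - beta * log(pi(y) / pi_SFT(y)),
# which the paper maximizes with PPO at the token level.
beta, lr = 0.1, 0.05
logits = ref_logits.copy()
for _ in range(500):
    pi = softmax(logits)
    adv = reward - beta * (np.log(pi) - np.log(pi_sft))
    adv -= pi @ adv            # baseline: center the advantage
    logits += lr * pi * adv    # exact objective gradient for a discrete policy

print("rewards: ", np.round(reward, 2))
print("pi_SFT:  ", np.round(pi_sft, 2))
print("pi_tuned:", np.round(softmax(logits), 2))
```

On this toy problem the tuned policy shifts probability mass toward the responses the reward model scores highest, while the KL term keeps it anchored to the reference policy instead of collapsing onto a single output; InstructGPT does the same at scale, with human labelers supplying the preference pairs.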

Review Snapshot

Average rating: 4.2 / 5 (5 ratings)
5 star: 40% · 4 star: 40% · 3 star: 20% · 2 star: 0% · 1 star: 0%

Recommendation: 100% of reviewers recommend this content.
