Quick answer
AI Summary: The foundational paper that successfully applied Reinforcement Learning from Human Feedback (RLHF) to language models, significantly improving their ability to generate high-quality, human-preferred text summaries.
We show that it is possible to significantly improve the quality of text summaries generated by large language models by training them with reinforcement learning from human feedback. We collect a dataset of human preferences between pairs of summaries generated by our models, and train a reward model to predict these preferences. We then use Proximal Policy Optimization (PPO) to fine-tune the language model to maximize this reward. Our models trained with human feedback significantly outperform models trained via supervised fine-tuning, with human evaluators strongly preferring the RL-optimized summaries. This demonstrates a scalable approach for aligning models with complex human values.
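For readers who want the mechanics behind the abstract, here is a minimal PyTorch sketch of the two training signals it describes: the pairwise preference loss used to fit the reward model, and the shaped reward handed to PPO. All names (`reward_model`, `beta`, the tensor shapes) are illustrative assumptions, not the authors' code; the KL penalty against the supervised baseline comes from the full paper, where it keeps the RL policy from drifting too far from the fine-tuned model.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_tokens, rejected_tokens):
    """Bradley-Terry style loss: the reward model should score the
    human-preferred summary above the rejected one."""
    r_chosen = reward_model(chosen_tokens)      # shape: (batch,)
    r_rejected = reward_model(rejected_tokens)  # shape: (batch,)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def shaped_reward(r_rm, logprob_policy, logprob_sft, beta=0.05):
    """Reward passed to PPO: reward-model score minus a KL penalty
    that keeps the policy close to the supervised baseline.
    `beta` is an illustrative coefficient, not the paper's value."""
    kl = logprob_policy - logprob_sft   # per-token KL estimate, (batch, seq)
    return r_rm - beta * kl.sum(dim=-1)

if __name__ == "__main__":
    # Toy check with a dummy scorer standing in for a trained reward model.
    reward_model = lambda toks: toks.float().mean(dim=-1)
    chosen = torch.randint(0, 100, (4, 16))
    rejected = torch.randint(0, 100, (4, 16))
    print("preference loss:", preference_loss(reward_model, chosen, rejected).item())
```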