Learning to summarize from human feedback
Paper • Sep 2, 2020 • arXiv • Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
We show that it is possible to significantly improve the quality of text summaries generated by large language models by training them with reinforcement learning from human feedback. We collect a ...