Scaling Laws for Reward Model Overoptimization
Paper • Oct 19, 2022 • arXiv • Leo Gao, John Schulman, Jacob Hilton
When a policy is optimized against a learned reward model, it eventually exploits errors in that reward model, and performance on the true underlying objective declines. This phenomenon, known as ...