Trust Region Policy Optimization
Paper • Feb 19, 2015 • arXiv • John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical ...