Quick answer

AI Summary: Introduces Proximal Policy Optimization (PPO), a reinforcement learning algorithm that aims for a favorable balance of ease of implementation, sample complexity, and ease of tuning.

Proximal Policy Optimization Algorithms

John Schulman · Filip Wolski · Prafulla Dhariwal · Alec Radford · Oleg Klimov

ABSTRACT

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a 'surrogate' objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call Proximal Policy Optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically).
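
The abstract does not spell out the 'surrogate' objective. The best-known variant from the paper is the clipped form, which takes the minimum of the unclipped and clipped probability-ratio terms so that repeated minibatch updates gain nothing from pushing the policy far from the one that collected the data. The following is a minimal pure-Python sketch of that objective, not the authors' implementation. The function name `clipped_surrogate`, the parameter name `clip_eps`, and its default of 0.2 are illustrative assumptions.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logp   -- log-probabilities of the taken actions under the current policy
    old_logp   -- log-probabilities of the same actions under the data-collecting policy
    advantages -- advantage estimates for those actions
    clip_eps   -- clipping range (hypothetical default, chosen for illustration)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two candidate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# When the policy has not changed, every ratio is 1 and the objective
# reduces to the mean advantage.
unchanged = clipped_surrogate([-1.0, -2.0], [-1.0, -2.0], [1.0, -0.5])

# With a positive advantage, a ratio of about 2 is capped at 1 + clip_eps.
capped = clipped_surrogate([math.log(2.0)], [0.0], [1.0])
```

In practice the log-probabilities would come from a differentiable policy network, and the negative of this value would be minimized with a stochastic gradient optimizer over several epochs of minibatches drawn from the same batch of sampled data.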
