Quick answer

AI Summary: The paper describes OpenAI Five, a large-scale multi-agent reinforcement learning system that defeated the human world champions at the complex esports game Dota 2 using heavily scaled Proximal Policy Optimization (PPO).

Dota 2 with Large Scale Deep Reinforcement Learning

Christopher Berner · Greg Brockman · Brooke Chan · Vicki Cheung · Przemysław Dębiak · Christy Dennison · David Farhi · Quirin Fischer · Shariq Hashme · Chris Hesse · Ilya Sutskever · et al.

ABSTRACT

We present OpenAI Five, a system of five neural networks that learned to play the highly complex, imperfect-information esports game Dota 2 entirely through self-play. Dota 2 involves long time horizons, partially observed states, and high-dimensional, continuous action spaces, making it a grand challenge for AI. By scaling Proximal Policy Optimization (PPO) to an unprecedented 256 GPUs and 128,000 CPU cores, OpenAI Five reached expert-level performance and ultimately defeated OG, the world champion esports team. The system demonstrates that general reinforcement learning algorithms can scale to solve extremely complex, real-time multi-agent environments without hand-crafted heuristics.
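For readers unfamiliar with PPO, the core of the algorithm named in the abstract is a clipped surrogate objective. The following is a minimal sketch of that objective in plain Python, not OpenAI Five's training code. The function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for illustration; the abstract does not state the hyperparameters that were used.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range (0.2 is an illustrative assumption).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that generated the data in a single update: with a positive advantage, a ratio above `1 + clip_eps` earns no extra credit, while with a negative advantage the unclipped penalty still applies in full.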
