Quick answer
AI Summary: Details OpenAI Five, a massive multi-agent RL system that defeated world champion human players in the complex esports game Dota 2 using heavily scaled Proximal Policy Optimization.
We present OpenAI Five, a system of five neural networks that learned to play Dota 2, a highly complex, imperfect-information esports game, through self-play. Dota 2 involves long time horizons, partially observed states, and high-dimensional, continuous action spaces, which makes it a grand challenge for AI. By scaling Proximal Policy Optimization (PPO) to 256 GPUs and 128,000 CPU cores, OpenAI Five reached expert-level performance and ultimately defeated the world champion esports team OG. The system demonstrates that general-purpose reinforcement learning algorithms can scale to extremely complex, real-time multi-agent environments while relying on learned behavior rather than extensive hand-crafted heuristics.
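For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines. This is a minimal, standard-library illustration of the general PPO clipping rule, not OpenAI Five's training code; the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math

def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current policy and the policy that collected the data (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range; 0.2 is an illustrative default, an assumption here.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that generated the data in a single update, which is one reason the method tolerates very large batches collected by many parallel workers.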