Quick answer
AI Summary: Presents IMPALA, a scalable, distributed reinforcement learning architecture that uses the V-trace algorithm to efficiently train single agents across massive computer clusters.
AI Summary: Presents IMPALA, a scalable, distributed reinforcement learning architecture that uses the V-trace algorithm to efficiently train single agents across massive computer clusters.
Scaling reinforcement learning algorithms to utilize thousands of machines efficiently is crucial for tackling complex, visually rich environments. We introduce IMPALA (Importance Weighted Actor-Learner Architecture), a distributed agent architecture that decouples the acting and learning processes to achieve massive throughput. Unlike A3C, which uses gradients to communicate between workers, IMPALA actors communicate trajectories of experience to centralized learners. We introduce the V-trace off-policy actor-critic algorithm to correct for the policy lag caused by this decoupled architecture. IMPALA achieves data throughput that is orders of magnitude higher than previous systems, successfully learning a single policy capable of playing all 30 DMLab-30 levels and 57 Atari games.
Share your opinion to help other learners triage faster.
Write a reviewInvite someone by email to share an invited review for IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.