Topic: Reinforcement Learning

Short answer

This page shows the most relevant public items for Reinforcement Learning, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

Causally Robust Reward Learning from Reason-Augmented Preference Feedback
Paper • Mar 4, 2026 • arXiv • Minjune Hwang, Yigit Korkmaz, Daniel Seita, Erdem Bıyık
Reward learning from human preferences often suffers from spurious correlations, leading agents to develop brittle and misaligned behaviors. The authors present a framework that integrates causal i...
Agentic Alignment: Inverse Reinforcement Learning from Swarm Behavior
Paper • Dec 22, 2025 • arXiv • Percy Liang, Thomas K. V., Eleanor Rigby
Aligning multi-agent systems via traditional human feedback is intractable due to the sheer volume and speed of agent-to-agent interactions. We introduce a novel alignment framework utilizing Inver...
Minecraft as a Turing Test: Evaluating Open-Ended Agentic AI
Paper • Jul 15, 2025 • arXiv • Kevin Zhu, Lara Croft, Julian Bao
Evaluating the long-horizon planning and adaptability of Agentic AI in the real world is fraught with safety and cost limitations. We establish Minecraft as the premier sandbox for open-ended agent...
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Paper • Jun 7, 2017 • arXiv • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inhere...
Hindsight Experience Replay
Paper • Jul 5, 2017 • arXiv • Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba
Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay (HER) which allows sample-efficient lear...
Concrete Problems in AI Safety
Paper • Jun 21, 2016 • arXiv • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such poten...
Emergent Tool Use From Multi-Agent Autocurricula
Paper • Sep 17, 2019 • arXiv • Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
We demonstrate that simple multi-agent competition can drive the emergence of highly complex, intelligent behaviors without explicit human design. We train agents using reinforcement learning to pl...
Deep reinforcement learning from human preferences
Paper • Jun 12, 2017 • arXiv • Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei
For many complex real-world tasks, defining a mathematical reward function is difficult, leading to misaligned AI behavior when optimized. We explore a method for solving reinforcement learning tas...
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Video
Paper • Jun 23, 2022 • arXiv • Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
Training agents to perform complex, long-horizon tasks typically requires massive amounts of heavily annotated data or prohibitive amounts of reinforcement learning trial-and-error. We introduce Vi...
Dota 2 with Large Scale Deep Reinforcement Learning
Paper • Dec 13, 2019 • arXiv • Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Ilya Sutskever, et al.
We present OpenAI Five, a system of five neural networks that learned to play the highly complex, imperfect-information esports game Dota 2 entirely through self-play. Dota 2 involves long time hor...
Proximal Policy Optimization Algorithms
Paper • Jul 20, 2017 • arXiv • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a 'surrogate' objective...
Human-level performance in 3D multiplayer games with population-based reinforcement learning
Paper • May 31, 2019 • Science • Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel
Multiplayer video games represent a significant frontier for AI research, requiring real-time, high-dimensional sensory processing, spatial navigation, and team-based coordination. We report an AI ...
Magnetic control of tokamak plasmas through deep reinforcement learning
Paper • Feb 16, 2022 • Nature • Jonas Degrave, Federico Felici, Jonas Buchli, Martin Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Jung, Abbas Abdolmaleki, Demis Hassabis, Martin Riedmiller
Nuclear fusion represents a clean, virtually limitless energy source, but sustaining the necessary plasma states inside a tokamak reactor requires complex, high-frequency magnetic control. Traditio...
Mastering the game of Go without human knowledge
Paper • Oct 18, 2017 • Nature • David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. We introduce AlphaGo Zero, an AI that achieves superhuman pe...
Agent57: Outperforming the Atari Human Benchmark
Paper • Mar 31, 2020 • arXiv • Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell
Atari 2600 games have been a long-standing benchmark in the reinforcement learning community. While previous algorithms have achieved superhuman performance on average, they consistently fail on a ...
Discovering faster matrix multiplication algorithms with reinforcement learning
Paper • Oct 5, 2022 • Nature • Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Pushmeet Kohli
Matrix multiplication is a fundamental computational task, heavily utilized in neural networks, scientific computing, and graphics. Despite its ubiquity, finding optimal algorithms for matrix multi...
Mastering the game of Go with deep neural networks and tree search
Paper • Jan 27, 2016 • Nature • David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Demis Hassabis
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and move...
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Paper • Feb 9, 2018 • arXiv • Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu
Scaling reinforcement learning algorithms to utilize thousands of machines efficiently is crucial for tackling complex, visually rich environments. We introduce IMPALA (Importance Weighted Actor-Le...
Mastering Atari, Go, chess and shogi by planning with a learned model
Paper • Dec 23, 2020 • Nature • Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, David Silver
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challengi...
Human-level control through deep reinforcement learning
Paper • Feb 26, 2015 • Nature • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Demis Hassabis
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural networ...

FAQ

What does this Reinforcement Learning page rank?

It ranks public content for Reinforcement Learning using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the Reinforcement Learning topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in Reinforcement Learning?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.
