Topic: Reinforcement Learning

Short answer

This page shows the most relevant public items for Reinforcement Learning, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

  1. Causally Robust Reward Learning from Reason-Augmented Preference Feedback

    Paper | Mar 4, 2026 | arXiv | Minjune Hwang, Yigit Korkmaz, Daniel Seita, Erdem Bıyık

    Reward learning from human preferences often suffers from spurious correlations, leading agents to develop brittle and misaligned behaviors. The authors present a framework that integrates causal i...

  2. Agentic Alignment: Inverse Reinforcement Learning from Swarm Behavior

    Paper | Dec 22, 2025 | arXiv | Percy Liang, Thomas K. V., Eleanor Rigby

    Aligning multi-agent systems via traditional human feedback is intractable due to the sheer volume and speed of agent-to-agent interactions. We introduce a novel alignment framework utilizing Inver...

  3. Minecraft as a Turing Test: Evaluating Open-Ended Agentic AI

    Paper | Jul 15, 2025 | arXiv | Kevin Zhu, Lara Croft, Julian Bao

    Evaluating the long-horizon planning and adaptability of Agentic AI in the real world is fraught with safety and cost limitations. We establish Minecraft as the premier sandbox for open-ended agent...

  4. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

    Paper | Jun 7, 2017 | arXiv | Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

    We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inhere...

  5. Hindsight Experience Replay

    Paper | Jul 5, 2017 | arXiv | Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba

    Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay (HER) which allows sample-efficient lear...

  6. Concrete Problems in AI Safety

    Paper | Jun 21, 2016 | arXiv | Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

    Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper, we discuss one such poten...

  7. Emergent Tool Use From Multi-Agent Autocurricula

    Paper | Sep 17, 2019 | arXiv | Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch

    We demonstrate that simple multi-agent competition can drive the emergence of highly complex, intelligent behaviors without explicit human design. We train agents using reinforcement learning to pl...

  8. Deep reinforcement learning from human preferences

    Paper | Jun 12, 2017 | arXiv | Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei

    For many complex real-world tasks, defining a mathematical reward function is difficult, leading to misaligned AI behavior when optimized. We explore a method for solving reinforcement learning tas...

  9. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Video

    Paper | Jun 23, 2022 | arXiv | Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

    Training agents to perform complex, long-horizon tasks typically requires massive amounts of heavily annotated data or prohibitive amounts of reinforcement learning trial-and-error. We introduce Vi...

  10. Dota 2 with Large Scale Deep Reinforcement Learning

    Paper | Dec 13, 2019 | arXiv | Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Ilya Sutskever, et al.

    We present OpenAI Five, a system of five neural networks that learned to play the highly complex, imperfect-information esports game Dota 2 entirely through self-play. Dota 2 involves long time hor...

  11. Proximal Policy Optimization Algorithms

    Paper | Jul 20, 2017 | arXiv | John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

    We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a 'surrogate' objective...

  12. Human-level performance in 3D multiplayer games with population-based reinforcement learning

    Paper | May 31, 2019 | Science | Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

    Multiplayer video games represent a significant frontier for AI research, requiring real-time, high-dimensional sensory processing, spatial navigation, and team-based coordination. We report an AI ...

  13. Magnetic control of tokamak plasmas through deep reinforcement learning

    Paper | Feb 16, 2022 | Nature | Jonas Degrave, Federico Felici, Jonas Buchli, Martin Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Jung, Abbas Abdolmaleki, Demis Hassabis, Martin Riedmiller

    Nuclear fusion represents a clean, virtually limitless energy source, but sustaining the necessary plasma states inside a tokamak reactor requires complex, high-frequency magnetic control. Traditio...

  14. Mastering the game of Go without human knowledge

    Paper | Oct 18, 2017 | Nature | David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis

    A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. We introduce AlphaGo Zero, an AI that achieves superhuman pe...

  15. Agent57: Outperforming the Atari Human Benchmark

    Paper | Mar 31, 2020 | arXiv | Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

    Atari 2600 games have been a long-standing benchmark in the reinforcement learning community. While previous algorithms have achieved superhuman performance on average, they consistently fail on a ...

  16. Mastering the game of Go with deep neural networks and tree search

    Paper | Jan 27, 2016 | Nature | David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Demis Hassabis

    The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and move...

  17. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

    Paper | Feb 9, 2018 | arXiv | Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu

    Scaling reinforcement learning algorithms to utilize thousands of machines efficiently is crucial for tackling complex, visually rich environments. We introduce IMPALA (Importance Weighted Actor-Le...

  18. Mastering Atari, Go, chess and shogi by planning with a learned model

    Paper | Dec 23, 2020 | Nature | Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, David Silver

    Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challengi...

  19. Human-level control through deep reinforcement learning

    Paper | Feb 26, 2015 | Nature | Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Demis Hassabis

    We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural networ...

FAQ

What does this Reinforcement Learning page rank?

It ranks public content for Reinforcement Learning using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the Reinforcement Learning topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in Reinforcement Learning?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.