Improving Language Understanding by Generative Pre-Training · Paper · Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Suts… · 6/11/2018
Proximal Policy Optimization Algorithms · Paper · John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radf… · 7/20/2017
Hindsight Experience Replay · Paper · Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schnei… · 7/5/2017
Deep reinforcement learning from human preferences · Paper · Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, S… · 6/12/2017
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments · Paper · Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, I… · 6/7/2017
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World · Paper · Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojci… · 3/20/2017
Evolution Strategies as a Scalable Alternative to Reinforcement Learning · Paper · Tim Salimans, Jonathan Ho, Xi Chen, Ilya Sutskever · 3/10/2017
Concrete Problems in AI Safety · Paper · Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christia… · 6/21/2016
Trust Region Policy Optimization · Paper · John Schulman, Sergey Levine, Pieter Abbeel, Michael Jord… · 2/19/2015