Topic: cs.LG

Short answer

This page shows the most relevant public items for cs.LG, ranked by trend activity and review signals. Use the weekly view for fast changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.

Diffusion Alignment as Variational Expectation-Maximization
Paper • Feb 13, 2026 • arXiv • Zijing Ou, Jacob Si, Junyi Zhu, Yingzhen Li
Diffusion alignment aims to optimize diffusion models for downstream objectives. While existing methods based on RL achieve success, they often suffer from reward over-optimization and mode collaps...
Scaling Laws for Reward Model Overoptimization
Paper • Oct 19, 2022 • arXiv • Leo Gao, John Schulman, Jacob Hilton
When optimizing a policy against a learned reward model, the policy eventually exploits errors in the reward model, leading to a decline in the true underlying objective. This phenomenon, known as ...
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Paper • Jun 7, 2017 • arXiv • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inhere...
Hindsight Experience Replay
Paper • Jul 5, 2017 • arXiv • Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba
Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay (HER) which allows sample-efficient lear...
Scaling Laws for Autoregressive Generative Modeling
Paper • Oct 28, 2020 • arXiv • Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
Building upon previous work establishing scaling laws for language models, we investigate whether similar power-law scaling relationships hold across other data modalities. We train autoregressive ...
Asymmetric self-play for automatic goal discovery in robotic manipulation
Paper • Jan 14, 2021 • arXiv • OpenAI Robotics Team
Training robots to solve a wide variety of manipulation tasks typically requires massive amounts of human-engineered reward functions and goal specifications. We introduce a method for automatic go...
Let's Verify Step by Step
Paper • May 31, 2023 • arXiv • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
Large language models often struggle with multi-step logical reasoning, frequently hallucinating incorrect steps that invalidate the final answer. To improve reasoning capabilities, we compare two ...
Generating Long Sequences with Sparse Attention
Paper • Apr 23, 2019 • arXiv • Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
Transformers are powerful sequence models, but their self-attention mechanism scales quadratically with sequence length, making them computationally prohibitive for long inputs like high-resolution...
Emergent Tool Use From Multi-Agent Autocurricula
Paper • Sep 17, 2019 • arXiv • Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
We demonstrate that simple multi-agent competition can drive the emergence of highly complex, intelligent behaviors without explicit human design. We train agents using reinforcement learning to pl...
Deep reinforcement learning from human preferences
Paper • Jun 12, 2017 • arXiv • Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei
For many complex real-world tasks, defining a mathematical reward function is difficult, leading to misaligned AI behavior when optimized. We explore a method for solving reinforcement learning tas...
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Video
Paper • Jun 23, 2022 • arXiv • Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
Training agents to perform complex, long-horizon tasks typically requires massive amounts of heavily annotated data or prohibitive amounts of reinforcement learning trial-and-error. We introduce Vi...
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Paper • Dec 14, 2023 • arXiv • Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu
As AI models become increasingly capable, we will eventually face the challenge of superalignment: how can humans supervise AI systems that are much smarter than them? To study this empirically tod...
Consistency Models
Paper • Mar 2, 2023 • arXiv • Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever
Diffusion models have achieved significant success in image, audio, and video generation, but they depend on an iterative generation process that causes slow sampling and precludes real-time applic...
Learning Dexterous In-Hand Manipulation
Paper • Jul 30, 2018 • arXiv • Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba
We demonstrate that reinforcement learning algorithms can be used to learn highly dexterous, in-hand manipulation policies that successfully transfer to the real world. We train a policy to control...
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
Paper • Jan 6, 2022 • arXiv • Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, Vedant Misra
We demonstrate a striking phenomenon in the training dynamics of neural networks on small algorithmic datasets: networks that initially severely overfit the training data can, after continued train...
Improved Denoising Diffusion Probabilistic Models
Paper • Feb 18, 2021 • arXiv • Alex Nichol, Prafulla Dhariwal
Denoising diffusion probabilistic models (DDPMs) have recently demonstrated high-quality image generation, but they suffer from notoriously slow sampling times and sub-optimal log-likelihoods. We p...
Dota 2 with Large Scale Deep Reinforcement Learning
Paper • Dec 13, 2019 • arXiv • Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Ilya Sutskever, et al.
We present OpenAI Five, a system of five neural networks that learned to play the highly complex, imperfect-information esports game Dota 2 entirely through self-play. Dota 2 involves long time hor...
Learning to summarize from human feedback
Paper • Sep 2, 2020 • arXiv • Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
We show that it is possible to significantly improve the quality of text summaries generated by large language models by training them with reinforcement learning from human feedback. We collect a ...
Solving Rubik's Cube with a Robot Hand
Paper • Oct 15, 2019 • arXiv • Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Peter Welinder, Lilian Weng, Wojciech Zaremba, Lei Zhang
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. We use reinforcement learning to train a policy to sol...
Evaluating Large Language Models Trained on Code
Paper • Jul 7, 2021 • arXiv • Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Wojciech Zaremba, Ilya Sutskever, et al.
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copi...

FAQ

What does this cs.LG page rank?

It ranks public content for cs.LG using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the cs.LG topic page on Attendemia and is written to make sense without reading other sections of the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in cs.LG?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.
