Topic: cs.LG

Track this topic after sign-in.

Short answer

This page shows the most relevant public items for cs.LG, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

Weekly Monthly All time

← Back to home

Minimax M2.5: Scaling RL for Industrial-Grade Agentic AI
Paper • Feb 16, 2026 • arXiv • MiniMax Research Team
Training agents for industrial-scale deployment requires extreme stability and data throughput. We present Minimax M2.5, a model trained using a novel asynchronous RL architecture designed to proce...
MASPO: Robust and Sample-Efficient LLM Reasoning via Unified Policy Optimization
Paper • Feb 19, 2026 • arXiv • Xiaoliang Fu, Jiaye Lin, Yangyi Fang
Policy optimization for Large Language Models often suffers from gradient instability and reward signal unreliability, particularly in mathematical and verifiable reasoning tasks. We introduce MASP...
Fast KV Compaction via Attention Matching
Paper • Feb 18, 2026 • arXiv • Adam Zweiger, Xinghong Fu, Han Guo, MIT Team
Large Language Models struggle with memory overhead during long-context inference due to the linear growth of the Key-Value (KV) cache. We propose Attention Matching (AM), a framework for high-qual...
KLong: Training LLM Agents for Extremely Long-horizon Tasks
Paper • Feb 19, 2026 • arXiv • Yue Liu, Zhiyuan Hu, Flood Sung
Current LLM agents frequently fail in tasks requiring hundreds of steps due to error accumulation and context overflow. We introduce KLong, an agentic framework that utilizes 'Trajectory-Splitting ...
Identifying Intervenable and Interpretable Features via Orthogonality Regularization
Paper • Feb 17, 2026 • arXiv • Moritz Miller, Florent Draye, Bernhard Schölkopf
This paper addresses the fundamental challenge of 'feature disentanglement' in modern deep learning. We propose an Orthogonality Regularization technique to identify features that are both interpre...
Trust Region Policy Optimization
Paper • Feb 19, 2015 • arXiv • John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical ...
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Paper • Dec 8, 2021 • arXiv • Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Eleni Elia, Danilo J. Rezende, Vinyals, Simonyan
Language modelling provides a step towards intelligent communication systems by harnessing large datasets and expressive models. We provide an analysis of Transformer-based language model architect...
Asynchronous Methods for Deep Reinforcement Learning
Paper • Feb 4, 2016 • arXiv • Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present as...
A Generalist Agent
Paper • May 12, 2022 • arXiv • Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Nando de Freitas
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato,...
Stateful Memory Compaction for Mitigating Context Rot in Long-Horizon Agents
Paper • Feb 21, 2026 • arXiv • Zhixuan Fang, Yisong Yue, Anima Anandkumar
Long-horizon autonomous agents suffer from 'context rot,' a phenomenon where the accumulation of irrelevant interaction history severely degrades the model's reasoning capabilities over time. We pr...
KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning
Paper • Feb 4, 2026 • arXiv • Fan Yang, Rui Meng, Trudi Di Qi, Ali Ezzati, Yuxin Wen
Direct Preference Optimization (DPO) and its variants have revolutionized LLM alignment, yet they struggle when the preferred choice requires deep, multi-step reasoning. We introduce KEPO, a framew...
Towards On-Policy SFT: Distribution Discriminant Theory and its Applications
Paper • Feb 13, 2026 • arXiv • Miaosen Zhang, Xu Yang, Qi Dai, Chong Luo
Supervised fine-tuning (SFT) is efficient but often yields inferior generalization compared to RL, a gap driven by RL's use of on-policy data. We propose a framework to bridge this chasm by enablin...
Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Models
Paper • Feb 8, 2026 • arXiv • Fan Yang, Rui Meng, Trudi Di Qi, Yuxin Wen
We argue that while replacing data with model parameters characterizes the present of Federated Learning (FL), replacing parameters with preferences represents a more scalable and privacy-preservin...
STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction
Paper • Feb 12, 2026 • arXiv • Xiaoxiao Wang, Chunxiao Li, Junying Wang, Zicheng Zhang
The paper introduces STAR (STatistical and Agentic Reasoning), a novel framework designed to predict the performance of large models across diverse benchmarks from limited observations. Existing st...
MEM1: A Constant-Memory RL Framework for Long-Horizon Language Agents
Paper • Feb 12, 2026 • arXiv • Yurong Chen, Yu He, Michael I. Jordan, Fan Yao
Modern language agents must operate over long-horizon, multi-turn interactions, but most rely on full-context prompting which leads to unbounded memory growth. We introduce MEM1, an end-to-end rein...
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
Paper • Feb 8, 2026 • arXiv • Hao Li, Richard Peng, Sanjit Singh, Gregory D. Lyng
SkillRL is a framework that enables LLM agents to learn high-level, reusable behavioral patterns from past experiences. While traditional memory-based methods store redundant and noisy raw trajecto...
Mercury: Ultra-Fast Diffusion-based Language Models with Coarse-to-Fine Parallel Refinement
Paper • Feb 13, 2026 • arXiv • Zhiwei Liu, Yuan He, Hao Peng, Zhiqiang Chen
Recently, diffusion LLMs have emerged as a promising alternative to autoregressive models. Diffusion LLMs address traditional limitations with two advances, including multi-token prediction (i.e., ...

← PreviousPage 3Next →

FAQ

What does this cs.LG page rank?

It ranks public content for cs.LG using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to cs.LG topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references. This guidance is specific to cs.LG topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How can I discover organizations active in cs.LG?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines. This guidance is specific to cs.LG topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity. This guidance is specific to cs.LG topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

Topic: cs.LG

Short answer

Minimax M2.5: Scaling RL for Industrial-Grade Agentic AI

MASPO: Robust and Sample-Efficient LLM Reasoning via Unified Policy Optimization

Fast KV Compaction via Attention Matching

KLong: Training LLM Agents for Extremely Long-horizon Tasks

Identifying Intervenable and Interpretable Features via Orthogonality Regularization

Trust Region Policy Optimization

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Asynchronous Methods for Deep Reinforcement Learning

A Generalist Agent

Stateful Memory Compaction for Mitigating Context Rot in Long-Horizon Agents

KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning

Towards On-Policy SFT: Distribution Discriminant Theory and its Applications

Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Models

STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction

MEM1: A Constant-Memory RL Framework for Long-Horizon Language Agents

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Mercury: Ultra-Fast Diffusion-based Language Models with Coarse-to-Fine Parallel Refinement

Top Entities In This Topic

Related Topics

FAQ

What does this cs.LG page rank?

How should I use weekly vs monthly vs all-time?

How can I discover organizations active in cs.LG?

Can I follow this topic for updates?