Topic: Reinforcement Learning

Short answer

This page lists the most relevant public items for Reinforcement Learning, ranked by trend activity and review signal. Use the weekly view for fast-moving changes, the monthly view for more stable patterns, and the all-time view for evergreen picks.

Minimax M2.5: Scaling RL for Industrial-Grade Agentic AI
Paper • Feb 16, 2026 • arXiv • MiniMax Research Team
Training agents for industrial-scale deployment requires extreme stability and data throughput. We present Minimax M2.5, a model trained using a novel asynchronous RL architecture designed to proce...

MASPO: Robust and Sample-Efficient LLM Reasoning via Unified Policy Optimization
Paper • Feb 19, 2026 • arXiv • Xiaoliang Fu, Jiaye Lin, Yangyi Fang
Policy optimization for Large Language Models often suffers from gradient instability and reward signal unreliability, particularly in mathematical and verifiable reasoning tasks. We introduce MASP...

KLong: Training LLM Agents for Extremely Long-horizon Tasks
Paper • Feb 19, 2026 • arXiv • Yue Liu, Zhiyuan Hu, Flood Sung
Current LLM agents frequently fail in tasks requiring hundreds of steps due to error accumulation and context overflow. We introduce KLong, an agentic framework that utilizes 'Trajectory-Splitting ...

