Quick answer

AI Summary: KLong addresses the 'memory rot' that occurs when agents attempt tasks lasting several hours. It uses a training method that breaks long trajectories into manageable segments during fine-tuning, allowing the model to learn from intermediate checkpoints rather than a single monolithic trajectory.

Claim

KLong: Training LLM Agents for Extremely Long-horizon Tasks

Authors
Yue Liu · Zhiyuan Hu · Flood Sung

Abstract

Current LLM agents frequently fail at tasks requiring hundreds of steps due to error accumulation and context overflow. We introduce KLong, an agentic framework that uses 'Trajectory-Splitting Supervised Fine-Tuning' (TS-SFT) and progressive reinforcement learning to master extremely long-horizon tasks. KLong (106B) achieves state-of-the-art results on PaperBench, outperforming models ten times its size by maintaining high reasoning fidelity across thousands of tokens. This work provides a scalable method for training agents that can independently conduct scientific research and complex project management.
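To make the TS-SFT idea concrete, the following is a minimal sketch of trajectory splitting, under assumptions not stated in the abstract: the function name `split_trajectory`, the segment length, and the overlap are all hypothetical illustration choices, not the paper's actual parameters. The point is only the mechanism described above: a long trajectory is cut into shorter, overlapping segments so that each fine-tuning example fits in context and ends at an intermediate checkpoint.

```python
def split_trajectory(steps, segment_len=50, overlap=10):
    """Split a long list of agent steps into overlapping segments.

    Each segment becomes one SFT training example. The overlap carries
    enough preceding context for the model to learn the transition into
    each intermediate checkpoint. Parameter values are illustrative.
    """
    if segment_len <= overlap:
        raise ValueError("segment_len must exceed overlap")
    segments = []
    start = 0
    while start < len(steps):
        # Slice one context-window-sized chunk of the trajectory.
        segments.append(steps[start:start + segment_len])
        if start + segment_len >= len(steps):
            break  # final segment reaches the end of the trajectory
        # Advance by less than a full segment so chunks overlap.
        start += segment_len - overlap
    return segments


# Example: a 120-step trajectory yields three segments of at most 50 steps,
# with every original step covered by at least one segment.
trajectory = [f"step_{i}" for i in range(120)]
chunks = split_trajectory(trajectory)
```

In this toy run, the segments start at steps 0, 40, and 80, so consecutive training examples share a 10-step overlap of context.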

Review Snapshot

4.3 ★★★★ (6 ratings)

5 star: 50%
4 star: 33%
3 star: 17%
2 star: 0%
1 star: 0%

Recommendation

100% recommend this content.
