
Summary

Introduces a framework and benchmark for evaluating multi-agent swarms on complex, repository-level software engineering tasks, showing that role-based agent collaboration outperforms monolithic code generation.

SWE-agent-2.0: Benchmarking Multi-Agent Swarms on Full-Stack Software Engineering

Carlos E. Jimenez · John Yang · Alexander Wettig · Kilian Lieder · Shunyu Yao · Karthik Narasimhan · Ofir Press

ABSTRACT

As single-agent coding assistants plateau in their ability to handle repository-scale refactoring, multi-agent swarms have emerged as the new standard for autonomous software engineering. We introduce SWE-agent-2.0, a framework and evaluation suite designed to benchmark distributed agent swarms on real-world GitHub issues. By assigning distinct roles (Architect, Developer, Tester) within a sandboxed Unix environment, our reference swarm achieves a 42% resolution rate on the SWE-bench-Lite dataset, demonstrating that specialized agentic collaboration significantly outperforms monolithic code generation.
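The abstract describes a pipeline of specialized roles (Architect, Developer, Tester) that pass a task through distinct stages. A minimal sketch of that pattern is below; the role names follow the abstract, but every class, field, and function name here is an illustrative assumption, not the SWE-agent-2.0 API.

```python
from dataclasses import dataclass

# Hypothetical sketch of a role-based swarm: each agent handles one
# stage of resolving a GitHub issue. Names are illustrative only.

@dataclass
class Task:
    issue: str
    plan: str = ""
    patch: str = ""
    tests_passed: bool = False

class Architect:
    def run(self, task: Task) -> Task:
        # Decompose the GitHub issue into a repair plan.
        task.plan = f"plan for: {task.issue}"
        return task

class Developer:
    def run(self, task: Task) -> Task:
        # Turn the plan into a candidate patch.
        task.patch = f"patch implementing {task.plan}"
        return task

class Tester:
    def run(self, task: Task) -> Task:
        # Validate the patch; a real swarm would run the repository's
        # test suite inside the sandboxed Unix environment.
        task.tests_passed = bool(task.patch)
        return task

def run_swarm(issue: str) -> Task:
    task = Task(issue=issue)
    for agent in (Architect(), Developer(), Tester()):
        task = agent.run(task)
    return task

result = run_swarm("fix off-by-one in pagination")
print(result.tests_passed)  # True once all three roles complete
```

The design choice this illustrates is separation of concerns: each role sees only the artifact produced by the previous role, which is what lets specialized agents outperform a single monolithic generator.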

Review Snapshot

4.6 / 5 from 5 ratings (60% five-star, 40% four-star). 100% of reviewers recommend this content.

