
Summary

Introduces a framework and benchmark for evaluating multi-agent swarms on complex, repository-level software engineering tasks, showing that role-based agent collaboration outperforms monolithic code generation.

SWE-agent-2.0: Benchmarking Multi-Agent Swarms on Full-Stack Software Engineering

Carlos E. Jimenez · John Yang · Alexander Wettig · Kilian Lieder · Shunyu Yao · Karthik Narasimhan · Ofir Press

ABSTRACT

As single-agent coding assistants plateau in their ability to handle repository-scale refactoring, multi-agent swarms have emerged as the new standard for autonomous software engineering. We introduce SWE-agent-2.0, a framework and evaluation suite designed to benchmark distributed agent swarms on real-world GitHub issues. By assigning distinct roles (Architect, Developer, Tester) within a sandboxed Unix environment, our reference swarm achieves a 42% resolution rate on the SWE-bench-Lite dataset, demonstrating that specialized agentic collaboration significantly outperforms monolithic code generation.
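The abstract describes a pipeline of specialized roles (Architect, Developer, Tester) that pass a task through distinct stages. A minimal sketch of that pattern is below; the role names follow the abstract, but every class, field, and function name here is an illustrative assumption, not the SWE-agent-2.0 API.

```python
from dataclasses import dataclass

# Hypothetical sketch of a role-based swarm: each agent handles one
# stage of resolving a GitHub issue. Names are illustrative only.

@dataclass
class Task:
    issue: str
    plan: str = ""
    patch: str = ""
    tests_passed: bool = False

class Architect:
    def run(self, task: Task) -> Task:
        # Decompose the GitHub issue into a repair plan.
        task.plan = f"plan for: {task.issue}"
        return task

class Developer:
    def run(self, task: Task) -> Task:
        # Turn the plan into a candidate patch.
        task.patch = f"patch implementing {task.plan}"
        return task

class Tester:
    def run(self, task: Task) -> Task:
        # Validate the patch; a real swarm would run the repository's
        # test suite inside the sandboxed Unix environment.
        task.tests_passed = bool(task.patch)
        return task

def run_swarm(issue: str) -> Task:
    task = Task(issue=issue)
    for agent in (Architect(), Developer(), Tester()):
        task = agent.run(task)
    return task

result = run_swarm("fix off-by-one in pagination")
print(result.tests_passed)  # True once all three roles complete
```

The design choice this illustrates is separation of concerns: each role sees only the artifact produced by the previous role, which is what lets specialized agents outperform a single monolithic generator.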

Review Snapshot

4.6 / 5 from 5 ratings (60% five-star, 40% four-star). 100% of reviewers recommend this content.

