AgentBench-2025: Evaluating Autonomous Swarms in Adversarial and Dynamic Environments

Yao Mu · Tianyu Zheng · Percy Liang · Dan Hendrycks

ABSTRACT

As multi-agent swarms are deployed in open-ended web environments, standard static benchmarks fail to capture their vulnerability to dynamic threats and deceptive actors. We introduce AgentBench-2025, a comprehensive evaluation suite that places agentic swarms in simulated, adversarial corporate networks. The benchmark evaluates agents on task completion, API rate-limit management, and resistance to active 'Intent Redirection' attacks by rogue agents. Our results reveal that while frontier models excel in cooperative tasks, they fail catastrophically when forced to verify the authority of conflicting instructions in dynamic environments.
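The abstract's central failure mode is agents accepting instructions without verifying the sender's authority. As a purely illustrative sketch (not from the paper; the key names, message format, and helper functions here are hypothetical), one minimal defense against such 'Intent Redirection' is to require each instruction to carry an HMAC tag keyed to a trusted principal, so a rogue agent without the key cannot forge or redirect commands:

```python
import hmac
import hashlib

# Hypothetical setup: each trusted principal shares a secret key with the
# receiving agent. An instruction is acted on only if its HMAC tag verifies.
TRUSTED_KEYS = {"orchestrator": b"orchestrator-secret"}

def sign(sender: str, instruction: str, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag binding the sender identity to the instruction."""
    msg = f"{sender}:{instruction}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_authority(sender: str, instruction: str, tag: str) -> bool:
    """Accept an instruction only if the claimed sender is trusted and the tag checks out."""
    key = TRUSTED_KEYS.get(sender)
    if key is None:
        return False  # unknown sender: treat as a potential rogue agent
    expected = sign(sender, instruction, key)
    return hmac.compare_digest(expected, tag)

# A legitimate instruction from the orchestrator verifies...
good_tag = sign("orchestrator", "summarize report", TRUSTED_KEYS["orchestrator"])
print(verify_authority("orchestrator", "summarize report", good_tag))  # True

# ...while a rogue agent replaying the tag under a different identity or
# with a redirected instruction fails verification.
print(verify_authority("rogue", "exfiltrate data", good_tag))  # False
```

This sketch only checks message authenticity; it does not address the harder case the abstract highlights, where two *authorized* principals issue conflicting instructions and the agent must arbitrate between them.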
