AgentBench-2025: Evaluating Autonomous Swarms in Adversarial and Dynamic Environments
AI Summary: Introduces a dynamic, adversarial benchmark for multi-agent systems, revealing critical vulnerabilities in how current models handle authority and deception.
As multi-agent swarms are deployed in open-ended web environments, standard static benchmarks fail to capture their vulnerability to dynamic threats and deceptive actors. We introduce AgentBench-2025, a comprehensive evaluation suite that places agentic swarms in simulated, adversarial corporate networks. The benchmark evaluates agents on task completion, API rate-limit management, and resistance to active 'Intent Redirection' attacks by rogue agents. Our results reveal that while frontier models excel in cooperative tasks, they fail catastrophically when forced to verify the authority of conflicting instructions in dynamic environments.