Topic: Awesome List: ai-agent-papers-2026

Short answer

This page shows the most relevant public items for Awesome List: ai-agent-papers-2026, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

Views: Weekly · Monthly · All time

  1. Toward Architecture-Aware Evaluation Metrics for LLM Agents

    Paper · Jan 27, 2026 · arxiv.org · Débora Souza, Patrícia Machado

    LLM-based agents are becoming central to software engineering tasks, yet evaluating them remains fragmented and largely model-centric. Existing studies overlook how architectural components, such a...

  2. DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle

    Paper · Jan 27, 2026 · arxiv.org · Yuheng Tang, Kaijie Zhu, Bonan Ruan, Chuqi Zhang, Michael Yang, Hongwei Li, Suyue Guo, Tianneng Shi, Zekun Li, Christopher Kruegel, Giovanni Vigna, Dawn Song, William Yang Wang, Lun Wang, Yangruibo Ding, Zhenkai Liang, Wenbo Guo

    Despite demonstrating extraordinary capabilities in code generation and software issue resolution, AI agents' capabilities across the full software DevOps cycle remain unknown. Different from pur...

  3. Who Writes the Docs in SE 3.0? Agent vs. Human Documentation Pull Requests

    Paper · Jan 28, 2026 · arxiv.org · Kazuma Yamasaki, Joseph Ayobami Joshua, Tasha Settewong, Mahmoud Alfadel, Kazumasa Shimari, Kenichi Matsumoto

    As software engineering moves toward SE3.0, AI agents are increasingly used to carry out development tasks and contribute changes to software projects. It is therefore important to understand the e...

  4. Interpreting Emergent Extreme Events in Multi-Agent Systems

    Paper · Jan 28, 2026 · arxiv.org · Ling Tang, Jilin Mei, Dongrui Liu, Chen Qian, Dawei Cheng, Jing Shao, Xia Hu

    Large language model-powered multi-agent systems have emerged as powerful tools for simulating complex human-like systems. The interactions within these systems often lead to extreme events whose o...

  5. Agent Benchmarks Fail Public Sector Requirements

    Paper · Jan 28, 2026 · arxiv.org · Jonathan Rystrøm, Chris Schmitz, Karolina Korgul, Jan Batzner, Chris Russell

    Deploying Large Language Model-based agents (LLM agents) in the public sector requires assuring that they meet the stringent legal, procedural, and structural requirements of public-sector institut...

  6. The Quiet Contributions: Insights into AI-Generated Silent Pull Requests

    Paper · Jan 28, 2026 · arxiv.org · S M Mahedy Hasan, Md Fazle Rabbi, Minhaz Zibran

    We present the first empirical study of AI-generated pull requests that are 'silent,' meaning no comments or discussions accompany them. This absence of any comments or discussions associated with ...

  7. JAF: Judge Agent Forest

    Paper · Jan 29, 2026 · arxiv.org · Sahil Garg, Brad Cheezum, Sridhar Dutta, Vishal Agarwal

    Judge agents are fundamental to agentic AI frameworks: they provide automated evaluation, and enable iterative self-refinement of reasoning processes. We introduce JAF: Judge Agent Forest, a framew...

  8. TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI

    Paper · Jan 30, 2026 · arxiv.org · Roham Koohestani, Ateş Görpelioğlu, Egor Klimov, Burcu Kulahcioglu Ozkan, Maliheh Izadi

    Agentic AI systems act through tools and evolve their behavior over long, stochastic interaction traces. This setting complicates assurance, because behavior depends on nondeterministic environment...

  9. Benchmarking Agents in Insurance Underwriting Environments

    Paper · Jan 31, 2026 · arxiv.org · Amanda Dsouza, Ramya Ramakrishnan, Charles Dickens, Bhavishya Pohani, Christopher M Glaze

    As AI agents integrate into enterprise applications, their evaluation demands benchmarks that reflect the complexity of real-world operations. Instead, existing benchmarks overemphasize open-domain...

  10. HumanStudy-Bench: Towards AI Agent Design for Participant Simulation

    Paper · Jan 31, 2026 · arxiv.org · Xuan Liu, Haoyang Shang, Zizhang Liu, Xinyan Liu, Yunze Xiao, Yiwen Tu, Haojian Jin

    Large language models (LLMs) are increasingly used as simulated participants in social science experiments, but their behavior is often unstable and highly sensitive to design choices. Prior evalua...

FAQ

What does this Awesome List: ai-agent-papers-2026 page rank?

It ranks public content for Awesome List: ai-agent-papers-2026 using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the Awesome List: ai-agent-papers-2026 topic page on Attendemia and is written to make sense without reading other sections of the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in Awesome List: ai-agent-papers-2026?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.