Topic: AI Engineering

Short answer

This page shows the most relevant public items for AI Engineering, ranked by trend activity and review signal. Use weekly for fast changes, monthly for more stable patterns, and all-time for evergreen picks.

  1. JAF: Judge Agent Forest

    Paper | Jan 29, 2026 | arxiv.org | Sahil Garg, Brad Cheezum, Sridhar Dutta, Vishal Agarwal

    Judge agents are fundamental to agentic AI frameworks: they provide automated evaluation, and enable iterative self-refinement of reasoning processes. We introduce JAF: Judge Agent Forest, a framew...

  2. TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI

    Paper | Jan 30, 2026 | arxiv.org | Roham Koohestani, Ateş Görpelioğlu, Egor Klimov, Burcu Kulahcioglu Ozkan, Maliheh Izadi

    Agentic AI systems act through tools and evolve their behavior over long, stochastic interaction traces. This setting complicates assurance, because behavior depends on nondeterministic environment...

  3. Benchmarking Agents in Insurance Underwriting Environments

    Paper | Jan 31, 2026 | arxiv.org | Amanda Dsouza, Ramya Ramakrishnan, Charles Dickens, Bhavishya Pohani, Christopher M Glaze

    As AI agents integrate into enterprise applications, their evaluation demands benchmarks that reflect the complexity of real-world operations. Instead, existing benchmarks overemphasize open-domain...

  4. HumanStudy-Bench: Towards AI Agent Design for Participant Simulation

    Paper | Jan 31, 2026 | arxiv.org | Xuan Liu, Haoyang Shang, Zizhang Liu, Xinyan Liu, Yunze Xiao, Yiwen Tu, Haojian Jin

    Large language models (LLMs) are increasingly used as simulated participants in social science experiments, but their behavior is often unstable and highly sensitive to design choices. Prior evalua...

  5. TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents

    Paper | Feb 6, 2026 | arxiv.org | Yibing Liu, Chong Zhang, Zhongyi Han, Hansong Liu, Yong Wang, Yang Yu, Xiaoyan Wang, Yilong Yin

    We address the problem of runtime trajectory anomaly detection, a critical capability for enabling trustworthy LLM agents. Current safety measures predominantly focus on static input/output filteri...

  6. JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks

    Paper | Feb 6, 2026 | arxiv.org | Lanbo Lin, Jiayao Liu, Tianyuan Yang, Li Cai, Yuanwu Xu, Lei Wei, Sicong Xie, Guannan Zhang

    Evaluating agentic AI on open-ended professional tasks faces a fundamental dilemma between rigor and flexibility. Static rubrics provide rigorous, reproducible assessment but fail to accommodate di...

  7. AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

    Paper | Feb 6, 2026 | arxiv.org | Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach

    LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced from sta...

  8. Agentic Uncertainty Reveals Agentic Overconfidence

    Paper | Feb 6, 2026 | arxiv.org | Jean Kaddour, Srijan Patel, Gbètondji Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner

    Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success probability estimates before, during, and after task execution. All results exhibit agen...

  9. From Features to Actions: Explainability in Traditional and Agentic AI Systems

    Paper | Feb 6, 2026 | arxiv.org | Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza

    Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structu...

  10. SimpleMem: Efficient Lifelong Memory for LLM Agents

    Paper | Jan 29, 2026 | arxiv.org | Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao

    To support long-term interaction in complex environments, LLM agents require memory systems that manage historical experiences. Existing approaches either retain full interaction histories via pass...

  11. MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

    Paper | Jan 6, 2026 | arxiv.org | Dongming Jiang, Yi Li, Guanpeng Li, Bingzhe Li

    Memory-Augmented Generation (MAG) extends Large Language Models with external memory to support long-context reasoning, but existing approaches largely rely on semantic similarity over monolithic m...

  12. Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents

    Paper | Jan 20, 2026 | arxiv.org | Dehao Tao, Guoliang Ma, Yongfeng Huang, Minghu Jiang

    Human-agent dialogues often exhibit topic continuity (a stable thematic frame that evolves through temporally adjacent exchanges), yet most large language model (LLM) agent memory systems fail to pres...

FAQ

What does this AI Engineering page rank?

It ranks public content for AI Engineering using recent discussion, review, and engagement signals so you can triage faster. This guidance is specific to the AI Engineering topic page on Attendemia and is written so it still makes sense without reading other sections on the page.

How should I use weekly vs monthly vs all-time?

Use weekly for fast-moving updates, monthly for stable trend confirmation, and all-time for evergreen references.

How can I discover organizations active in AI Engineering?

Use the linked entities section to jump to labs, companies, and experts connected to this topic and explore their timelines.

Can I follow this topic for updates?

Yes. Use the follow button on this page to subscribe and track new high-signal activity.