Aggregation Queries over Unstructured Text: Benchmark and Agentic Method

Haojia Zhu · Qinyuan Xu · Haoyu Li · Yuxi Liu · Hanchen Qiu · Jiaoyan Chen · Jiahui Jin

ABSTRACT

Aggregation querying over free text is a long-standing yet underexplored problem. Unlike ordinary question answering, aggregate queries require exhaustive evidence collection: systems must "find all," not merely "find one." Existing paradigms such as Text-to-SQL and Retrieval-Augmented Generation (RAG) fail to achieve this completeness. In this work, we formalize entity-level aggregation querying over text in a corpus-bounded setting with a strict completeness requirement. To enable principled evaluation, we introduce AGGBench, a benchmark designed to evaluate completeness-oriented aggregation over a realistic large-scale corpus. To accompany the benchmark, we propose DFA (Disambiguation-Filtering-Aggregation), a modular agentic baseline that decomposes aggregation querying into interpretable stages and exposes key failure modes related to ambiguity, filtering, and aggregation. Empirical results show that DFA consistently improves aggregation evidence coverage over strong RAG and agentic baselines. The data and code are available at https://anonymous.4open.science/r/DFA-A4C1.
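To make the "find all" requirement concrete, the following is a minimal toy sketch of a three-stage disambiguate, filter, aggregate flow together with an evidence-coverage score. It is not the authors' DFA implementation: the function names, the alias table, the substring matching, the four-document corpus, and the definition of coverage as recall over gold evidence documents are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str


def disambiguate(query_term: str, aliases: dict[str, set[str]]) -> set[str]:
    """Stage 1 (toy): expand a query term into the surface forms it may take."""
    return {query_term.lower()} | {a.lower() for a in aliases.get(query_term, set())}


def filter_docs(corpus: list[Doc], surface_forms: set[str]) -> list[Doc]:
    """Stage 2 (toy): scan the whole corpus, keeping every doc that mentions any form."""
    return [d for d in corpus if any(f in d.text.lower() for f in surface_forms)]


def aggregate(evidence: list[Doc]) -> int:
    """Stage 3 (toy): a COUNT aggregate over the distinct evidence documents."""
    return len({d.doc_id for d in evidence})


def evidence_coverage(found: list[Doc], gold_ids: set[str]) -> float:
    """Assumed metric: fraction of gold evidence documents that were collected."""
    if not gold_ids:
        return 1.0
    return len({d.doc_id for d in found} & gold_ids) / len(gold_ids)


# Hypothetical corpus and gold labels, invented for this example.
corpus = [
    Doc("d1", "Acme Corp opened a plant in Ohio."),
    Doc("d2", "ACME announced a second plant in Texas."),
    Doc("d3", "Globex expanded its office in Ohio."),
    Doc("d4", "Acme Corporation built a plant in Nevada."),
]
aliases = {"Acme": {"Acme Corp", "Acme Corporation"}}
gold = {"d1", "d2", "d4"}

forms = disambiguate("Acme", aliases)
evidence = filter_docs(corpus, forms)
count = aggregate(evidence)
coverage = evidence_coverage(evidence, gold)

# A "find one" style result: keeping only the first hit loses most of the evidence.
top1 = evidence[:1]
top1_coverage = evidence_coverage(top1, gold)
```

In this toy setting the exhaustive scan recovers all three gold documents, while the single-hit variant covers only one of them, so any count computed from it would be wrong. That gap is the kind of completeness failure the abstract attributes to retrieval-style pipelines.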
