Deep research agents write long reports, but existing evaluations often judge only how fluent they sound and how many links they include, not whether the facts are still true today or the reasoning actually holds.
SAGE is a new benchmark that tests how well AI research agents can find scientific papers when the questions require multi-step reasoning.
Re-TRAC is a new way for AI search agents to learn from each attempt: after every try, the agent writes a clean summary of what happened, then uses that summary to do better on the next try.
The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.
The paper shows that language models with a search tool often look up too much information, which wastes compute and can make answers worse on questions that have no answer.