DREAM: Deep Research Evaluation with Agentic Metrics
Intermediate · Elad Ben Avraham, Changhao Li et al. · Feb 21 · arXiv
Deep research agents write long reports, but older evaluations often judge only how fluent the reports sound and whether they include links, not whether the facts are still true today or the reasoning actually holds.
#deep research agents  #agentic evaluation  #capability parity