Papers (2)

#web retrieval

CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era

Zhengqing Yuan, Kaiwen Shi et al. · Feb 26 · arXiv

The paper tackles a new integrity problem in science: large language models sometimes invent realistic-looking citations that do not exist.

#citation verification #hallucinated citations #scholarly integrity


DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Intermediate

Yibo Wang, Lei Wang et al. · Jan 14 · arXiv

The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.

#deep research agents #agentic evaluation #persona-driven tasks
