How I Study AI - Learn AI Papers & Lectures the Easy Way

#agentic evaluation

DREAM: Deep Research Evaluation with Agentic Metrics

Intermediate
Elad Ben Avraham, Changhao Li et al. | Feb 21 | arXiv

Deep research agents write long reports, but old tests often judge only how smooth they sound and whether they add links, not whether the facts are true today or the logic really holds.

#deep research agents #agentic evaluation #capability parity

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Intermediate
Yibo Wang, Lei Wang et al. | Jan 14 | arXiv

The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.

#deep research agents #agentic evaluation #persona-driven tasks

Towards a Science of Scaling Agent Systems

Beginner
Yubin Kim, Ken Gu et al. | Dec 9 | arXiv

Multi-agent AI teams are not automatically better; their success depends on matching the team's coordination style to the job's structure.

#multi-agent systems #agentic evaluation #scaling laws