The paper introduces DeepResearchEval, a fully automated way to build realistic deep research tasks and to grade long research reports from AI systems.
Multi-agent AI teams are not automatically better; their success depends on matching the team's coordination style to the job's structure.