Papers (2)

#hybrid reward

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration

This paper trains AI models to look things up on the web while reasoning and to fix their own mistakes mid-thought, instead of starting over from scratch.

#search-integrated reasoning · #reinforcement learning · #credit assignment


Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Intermediate

Changze Lv, Jie Zhou et al. · Feb 3 · arXiv

DeepResearch agents write long, evidence-based reports, but training and grading them is hard because there is no single "right answer" to score against.

#DeepResearch · #query-specific rubrics · #human preference learning
