Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Intermediate · Jiajie Zhang, Xin Lv et al. · Jan 9 · arXiv
The paper addresses a central problem in training web-searching AI agents: rewarding only the final answer encourages agents to take shortcuts and sometimes hallucinate.
#deep search agents #reinforcement learning #rubric rewards