BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
Intermediate · Shiyu Liu, Yongjing Yin et al. · Jan 16 · arXiv
RL-trained search agents often sound confident even when they don't know, which can mislead people.
#agentic search #reinforcement learning #boundary awareness