InfoPO: Information-Driven Policy Optimization for User-Centric Agents
Intermediate · Fanqi Kong, Jiayi Zhang et al. · Feb 28 · arXiv
Many real-life requests to AI helpers are vague, so agents must ask good questions before acting.
#Information-driven RL · #Turn-level credit assignment · #Counterfactual masking