AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search
Intermediate · Zefang Zong, Dingwei Chen et al. · Jan 8 · arXiv
AT$^2$PO is a method for training AI agents that work over multiple turns, for example issuing a web search query, reading the result, and trying again.
#Agentic Reinforcement Learning #Turn-level Optimization #Tree Search