InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
Intermediate · Yuchen Yan, Liang Jiang et al. · Feb 6 · arXiv
Long chains of thought make AI models reason better, but they also make them slower, more expensive to run, and limited by the context window.
#Iterative reasoning #Reinforcement learning for LLMs #Trajectory-level optimization