Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
Intermediate · Futing Wang, Jianhao Yan et al. · Feb 12 · arXiv
The paper uses length-incentivized reinforcement learning to train language models to explore more candidate ideas within their chain of thought, so they can solve harder problems.
#In-Context Exploration #Test-Time Scaling #Chain-of-Thought