Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
Intermediate · Zhenwen Liang, Sidi Lu et al. · Dec 17 · arXiv
This paper teaches large language models (LLMs) to explore smarter by listening to their own gradients, the directions in which they would update, rather than chasing random variety.
#gradient-guided reinforcement learning #GRL #GRPO