Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Intermediate · Zhiyuan Hu, Yucheng Wang et al. · Jan 13 · arXiv
The paper addresses a common problem in training LLM reasoners with reinforcement learning: models settle on one favored solution strategy and stop exploring other ways to solve a problem.
#Uniqueness-Aware Reinforcement Learning #LLM reasoning #strategy clustering