Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Beginner | Tong Wu, Yang Liu et al. | Dec 8 | arXiv
This paper teaches a language model to think along several paths at the same time instead of one step after another.
#parallel reasoning #reinforcement learning for LLMs #self-distillation