Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
Intermediate · Xiao Liang, Zhong-Zhi Li et al. · Feb 2 · arXiv
The paper trains language models to solve hard problems by first decomposing them into smaller subproblems and then solving those subproblems, rather than reasoning only in one long chain of thought.
#divide-and-conquer reasoning #chain-of-thought #reinforcement learning