The paper trains language models to solve hard problems by first breaking them into smaller subproblems and then solving each one, rather than reasoning in a single long chain of thought.
Supervised fine-tuning (SFT) often improves a model on a new task while degrading skills it already had; this paper identifies a key cause of that degradation and proposes a fix.