The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?
A second finding: when the learning rate is tuned carefully, plain LoRA fine-tuning works about as well as the fancier recent variants.
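To make that claim concrete, here is a minimal sketch of the kind of comparison it implies: vanilla LoRA swept over a few learning rates. Everything below is an illustrative assumption, not the paper's setup — the base model (GPT-2), the LoRA hyperparameters, the toy dataset, and the learning-rate grid are all placeholders using the Hugging Face `peft` and `transformers` APIs.

```python
# A hedged sketch: plain LoRA fine-tuning with a learning-rate sweep.
# Model, data, and hyperparameters are placeholders, not the paper's.
import torch
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

class ToyDataset(torch.utils.data.Dataset):
    """A tiny stand-in corpus so the sketch runs end to end."""
    def __init__(self, tokenizer, texts):
        self.ids = [tokenizer(t, return_tensors="pt")["input_ids"].squeeze(0)
                    for t in texts]
    def __len__(self):
        return len(self.ids)
    def __getitem__(self, i):
        # Causal LM: labels are the input ids themselves.
        return {"input_ids": self.ids[i], "labels": self.ids[i].clone()}

def train_plain_lora(learning_rate, dataset):
    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
    lora_cfg = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,   # assumed, untuned values
        target_modules=["c_attn"],               # GPT-2's attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)      # only LoRA params are trainable
    args = TrainingArguments(
        output_dir=f"lora-lr-{learning_rate:g}",
        learning_rate=learning_rate,
        num_train_epochs=1,
        per_device_train_batch_size=1,           # batch of 1 avoids padding logic
        report_to="none",
    )
    Trainer(model=model, args=args, train_dataset=dataset).train()

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")
    data = ToyDataset(tok, ["Step 1: add the tens.", "Step 2: add the ones."])
    # The sweep is the point: the claim is that plain LoRA, tuned over
    # learning rates like these, matches more elaborate variants.
    for lr in (1e-4, 3e-4, 1e-3):
        train_plain_lora(lr, data)
```

The design choice worth noting is that nothing here is LoRA-variant-specific: the only knob being searched is the learning rate, which is exactly the control the finding says matters most.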