This paper shows that many reasoning failures in AI are caused by just a few distracting words in the prompt, not by the problems being too hard.
This paper builds DiRL, a fast and careful way to finish training diffusion language models so they reason better.
This paper asks how best to use expert step-by-step solutions (expert trajectories) in the training that follows pretraining, when large AI models are taught to reason.