This paper presents DiRL, a fast and careful post-training method for diffusion language models, designed to make them reason better.
The paper asks how best to use expert step-by-step solutions (expert trajectories) when teaching large AI models to reason after pretraining.