Longer explanations are not always better; the shape of thinking matters.
UniT teaches one multimodal model to think in steps with pictures and words, so it can check its own work and fix mistakes as it goes.
This paper teaches a language model to write fast GPU kernels (small programs built for speed) in Triton, using reinforcement learning that rewards meaningful speedups rather than correctness alone.