This paper teaches a language model to write fast GPU kernels (tiny speed programs) in Triton using reinforcement learning that really cares about meaningful speed, not just being correct.
Large language models usually get only a final thumbs-up or thumbs-down at the end of their answer, which is too late to fix mistakes made in the middle.