The paper introduces UCoder, a way to teach a code-generating AI to get better at writing code without any outside data at all, not even unlabeled code: the model improves using only what it generates itself.
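A rough sketch of what a loop like that could look like, assuming an execution-filtered self-training setup. The function names, task format, and the stubbed model call below are illustrative stand-ins, not UCoder's actual pipeline:

```python
# Hedged sketch of a generic self-improvement loop: the model writes
# solutions, its own tests filter them by execution, and only verified
# pairs are kept as new training data. All names here are hypothetical.
import subprocess
import sys
import tempfile

def sample_from_model(prompt: str) -> str:
    """Placeholder for an LLM call; a real loop would query the current model."""
    return "def add(a, b):\n    return a + b\n"

def passes_tests(solution: str, test_code: str) -> bool:
    """Run the candidate against self-generated tests in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return result.returncode == 0

def self_improvement_round(tasks, finetune):
    """One round: sample, filter by execution, train on the survivors."""
    kept = []
    for prompt, tests in tasks:
        candidate = sample_from_model(prompt)
        if passes_tests(candidate, tests):
            kept.append((prompt, candidate))  # execution-verified pseudo-label
    finetune(kept)  # update the model on its own verified outputs
    return kept

if __name__ == "__main__":
    demo = [("Write add(a, b).", "assert add(2, 3) == 5")]
    kept = self_improvement_round(demo, finetune=lambda data: None)
    print(f"kept {len(kept)} verified pair(s)")
```

The key idea the sketch captures is that execution is the supervisor: no human labels or external code are needed because passing tests decides what counts as training data.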
JustRL shows that a small, simple reinforcement learning (RL) recipe, held fixed for the whole run, can make a 1.5B-parameter language model much better at math, no fancy tricks required.
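Simple math-RL recipes like this typically reward a sampled solution 1 if its final answer matches the reference and 0 otherwise, then normalize rewards within a group of samples for the same problem. Here is an illustrative sketch of that group-relative advantage step; it is not JustRL's code, and the group size and reward values are invented:

```python
# Hedged sketch of a GRPO-style advantage: rewards are normalized within
# the group of solutions sampled for one prompt. Numbers are illustrative.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (group_size,), binary correctness of each sample."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One math problem, 8 sampled solutions, 3 with the correct final answer:
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))
# Correct samples get a positive advantage, incorrect ones negative,
# with no learned value model needed, which keeps the recipe small.
```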
Different programming languages improve at different rates as data and compute grow when training code AI models, so treating them all the same wastes compute and lowers performance.
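One way to see why this matters: if each language follows its own power law relating compute to loss, the fitted exponents tell you where extra compute pays off most. The sketch below fits per-language power laws to synthetic data; the numbers and exponents are invented for illustration and are not the paper's fits:

```python
# Hedged sketch: fit loss ~ a * C^(-b) separately per language on log-log
# axes. The (compute, loss) points below are synthetic, not real results.
import numpy as np

def fit_power_law(compute: np.ndarray, loss: np.ndarray):
    """Fit log(loss) = log(a) - b*log(C); return (a, b)."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return np.exp(intercept), -slope

compute = np.array([1e18, 1e19, 1e20, 1e21])
# Invented example: one language gains from compute faster than the other.
observed = {
    "lang_A": 3.0 * compute ** -0.08,
    "lang_B": 3.0 * compute ** -0.03,
}
for lang, loss in observed.items():
    a, b = fit_power_law(compute, loss)
    print(f"{lang}: loss ~ {a:.2f} * C^-{b:.3f}")
# Different exponents b mean splitting the budget equally across
# languages is suboptimal: compute should follow the steeper curve.
```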