The paper teaches language models using extra 'language homework' made from the same raw text so they learn grammar and meaning, not just next-word guessing.
Large reasoning models can often find the right math answer in their 'head' before finishing their written steps, but this works best in languages with lots of training data like English and Chinese.