The paper shows that changing the language a model 'thinks in' (its language of thought) can make its English answers more varied with little loss in quality.
Large reasoning models can often find the right math answer in their 'head' before finishing their written steps, but this works best in languages with lots of training data like English and Chinese.
Large language models often sound confident even when they are wrong, and existing ways to catch mistakes are slow or not very accurate.