The paper asks a simple question: if a language model is trained with RLVR to get better at step-by-step reasoning, do its text embeddings also improve? The short answer is no.
Similarity tells you whether two models seem to think about things the same way, but it doesn't tell you whether that thinking stays sturdy when the world wiggles, that is, when the inputs are slightly perturbed.
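To make that distinction concrete, here is a minimal sketch, not taken from the paper: it measures cross-model "similarity" as agreement between two embedding models' pairwise similarity structures, and "sturdiness" as how much each embedding drifts when the input picks up a small typo. The model names (`all-MiniLM-L6-v2`, `all-mpnet-base-v2`), the sentences, and the typo perturbation are all illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_sims(embs):
    # Flattened upper triangle of the cosine-similarity matrix:
    # how a single model relates the sentences to one another.
    n = len(embs)
    return np.array([cosine(embs[i], embs[j])
                     for i in range(n) for j in range(i + 1, n)])

sentences = [
    "The cat sat on the mat.",
    "A feline rested on the rug.",
    "Stock prices fell sharply today.",
    "The market dropped at the opening bell.",
]
# A crude "world wiggle": inject a typo into every sentence.
perturbed = [s.replace("the", "teh").replace("The", "Teh") for s in sentences]

model_a = SentenceTransformer("all-MiniLM-L6-v2")   # assumed example model
model_b = SentenceTransformer("all-mpnet-base-v2")  # assumed example model

emb_a = model_a.encode(sentences)
emb_b = model_b.encode(sentences)

# "Similarity": do the two models organize the sentences the same way?
agreement = np.corrcoef(pairwise_sims(emb_a), pairwise_sims(emb_b))[0, 1]

# "Sturdiness": how much does each model's embedding move under the typo wiggle?
drift_a = np.mean([cosine(c, p) for c, p in zip(emb_a, model_a.encode(perturbed))])
drift_b = np.mean([cosine(c, p) for c, p in zip(emb_b, model_b.encode(perturbed))])

print(f"cross-model agreement: {agreement:.2f}")
print(f"robustness to typos:   A={drift_a:.2f}  B={drift_b:.2f}")
```

The point of the sketch is that the first number and the last two measure different things: two models can agree closely on how sentences relate while still differing in how much their embeddings drift under a small perturbation.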