The paper asks a simple question: if a language model gets better at step-by-step reasoning through RLVR (reinforcement learning with verifiable rewards), do its text embeddings improve as well? The short answer is no.
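To make "comparing embeddings before and after RLVR" concrete, here is a minimal sketch, not the paper's actual evaluation: it mean-pools the last hidden states of a base checkpoint and an RLVR-tuned checkpoint (both model IDs below are placeholders) and scores a toy sentence pair with cosine similarity.

```python
# Minimal sketch: compare mean-pooled embeddings from a base vs. an
# RLVR-tuned checkpoint on a toy sentence pair. Model IDs are placeholders,
# not the checkpoints or benchmarks used in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

def embed(model_name: str, sentences: list[str]) -> torch.Tensor:
    """Mean-pool the last hidden states to get one vector per sentence."""
    tok = AutoTokenizer.from_pretrained(model_name)
    if tok.pad_token is None:                     # decoder-only tokenizers often lack one
        tok.pad_token = tok.eos_token
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tok(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state     # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)      # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # padding-aware mean

sentences = ["A cat sits on the mat.", "A feline rests on a rug."]

for name in ["org/base-model", "org/base-model-rlvr"]:   # placeholder model IDs
    emb = embed(name, sentences)
    sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
    print(f"{name}: cosine similarity = {sim.item():.3f}")
```

A full evaluation would of course run both checkpoints over standard embedding benchmarks rather than a single pair; the point here is only the mechanics of extracting and comparing embeddings.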
The paper shows that large sequence models (such as transformers) implicitly encode longer-horizon goals in their hidden activations, even though they are trained one step at a time.
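As a toy illustration of that idea, and not the paper's setup, the following sketch trains a tiny GRU (a simpler sequence model than a transformer, used here for brevity) purely on one-step next-token prediction over sequences whose entire future is fixed by a hidden step size, then fits a linear probe to check whether that "goal" can be read off an early hidden state. The task, sizes, and architecture are all illustrative assumptions.

```python
# Toy "goal probe": the model only ever sees a one-step objective, yet its
# hidden state ends up encoding the quantity that determines the whole future.
import torch
import torch.nn as nn

torch.manual_seed(0)
V, G, T, H = 10, 5, 12, 64        # vocab size, #goals, sequence length, hidden dim

def make_batch(n):
    """token_t = (start + goal * t) mod V, so the future is set by `goal`."""
    start = torch.randint(0, V, (n,))
    goal = torch.randint(1, G + 1, (n,))
    t = torch.arange(T)
    seqs = (start[:, None] + goal[:, None] * t[None, :]) % V
    return seqs, goal - 1                         # goal labels in 0..G-1

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, H)
        self.rnn = nn.GRU(H, H, batch_first=True)
        self.head = nn.Linear(H, V)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))              # (B, T, H)
        return self.head(h), h

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Train with a purely one-step objective: predict token t+1 from the prefix.
for _ in range(2000):
    x, _ = make_batch(64)
    logits, _ = model(x)
    loss = loss_fn(logits[:, :-1].reshape(-1, V), x[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Linear probe: can an early hidden state recover the hidden goal?
x, goal = make_batch(2000)
with torch.no_grad():
    _, h = model(x)
feats = h[:, 3].detach()                          # hidden state after 4 tokens
probe = nn.Linear(H, G)
popt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(500):
    ploss = loss_fn(probe(feats), goal)
    popt.zero_grad(); ploss.backward(); popt.step()

acc = (probe(feats).argmax(-1) == goal).float().mean()
print(f"probe accuracy for the hidden goal: {acc.item():.2f}")  # near 1.0 if encoded
```

High probe accuracy in this toy setting would mirror the paper's qualitative claim: information about where the sequence is heading is present in the activations even though no training signal ever asked for more than the next step.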