The paper finds that popular RLVR (reinforcement learning with verifiable rewards) methods for training language and vision-language models carry hidden biases toward certain answer lengths, which can hurt learning.
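One plausible mechanism for such a bias (not confirmed as this paper's specific finding) is per-response length normalization in GRPO-style objectives. The toy sketch below uses hypothetical rewards and lengths to show how averaging the loss over a response's tokens dilutes the learning signal on long answers:

```python
# A minimal toy sketch (assumptions: GRPO-style per-response length
# normalization; the advantages and lengths are made-up values) showing
# how the same-magnitude reward yields a different per-token learning
# signal depending on answer length.

responses = [
    {"name": "short correct", "length": 20, "advantage": +1.0},
    {"name": "long incorrect", "length": 400, "advantage": -1.0},
]

for r in responses:
    # With a per-response mean over tokens, each token's share of the
    # advantage shrinks as the response grows.
    per_token_signal = r["advantage"] / r["length"]
    print(f'{r["name"]:>15}: advantage={r["advantage"]:+.1f}, '
          f'per-token signal={per_token_signal:+.4f}')

# The long incorrect answer is punished ~20x more weakly per token than
# the short correct answer is rewarded, so the objective quietly favors
# certain lengths rather than treating all answers evenly.
```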
Long texts make language models slow because, for every new word they write, they must store and re-read a growing memory called the KV cache.
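To make that concrete, here is a minimal numpy sketch of one attention head during decoding; the dimensions and random vectors are toy stand-ins for a real model's projections:

```python
# Toy sketch of KV-cached decoding: each new token attends over the
# whole cached history, so per-token work and KV memory both grow with
# the sequence length.

import numpy as np

d = 8                      # head dimension (toy value)
rng = np.random.default_rng(0)
k_cache, v_cache = [], []  # the "KV cache": one key/value per past token

def decode_step():
    # Random vectors stand in for projecting the new token to q, k, v.
    q, k, v = (rng.standard_normal(d) for _ in range(3))
    k_cache.append(k)              # cache grows by one entry per token
    v_cache.append(v)
    K = np.stack(k_cache)          # shape (t, d): all past keys
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)    # new token re-reads EVERY cached key
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ V                # weighted sum over all cached values

for t in range(5):
    decode_step()
    print(f"step {t}: cache holds {len(k_cache)} key/value pairs")
```

Because the cache and the per-step re-reading both scale with context length, long inputs strain memory and slow down every subsequent token.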
TTCS lets a model teach itself at test time: it first makes easier practice questions similar to the real hard question, then learns from them before attempting it.
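A minimal sketch of that loop is below; the helper functions (generate_easier_variants, solve_and_verify, finetune_on) are hypothetical placeholders for model calls, illustrating the idea rather than the paper's implementation:

```python
# Sketch of a TTCS-style test-time loop: make easier variants of the
# hard question, keep the solutions that verify, briefly train on them,
# then attempt the real question. All helpers are stubs.

def generate_easier_variants(hard_question: str, n: int = 4) -> list[str]:
    # Placeholder: ask the model for simpler questions that exercise
    # the same underlying skill as the hard one.
    return [f"easier variant {i} of: {hard_question}" for i in range(n)]

def solve_and_verify(question: str) -> tuple[str, bool]:
    # Placeholder: sample an answer and check it (e.g., with a
    # verifier); here every attempt is pretended to succeed.
    return f"answer to {question}", True

def finetune_on(examples: list[tuple[str, str]]) -> None:
    # Placeholder: a brief gradient update on the verified solutions.
    print(f"updating model on {len(examples)} practice solutions")

def answer_with_ttcs(hard_question: str) -> str:
    practice = []
    for q in generate_easier_variants(hard_question):
        answer, ok = solve_and_verify(q)
        if ok:                      # keep only solutions that check out
            practice.append((q, answer))
    finetune_on(practice)           # learn from the practice problems
    return solve_and_verify(hard_question)[0]  # now try the real one

print(answer_with_ttcs("the real hard question"))
```

The verification step matters: training only on practice answers that check out keeps the self-teaching loop from reinforcing the model's own mistakes.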