The paper fixes a big flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.
TTCS is a way for a model to teach itself during the test by first making easier practice questions that are similar to the real hard question and then learning from them.
This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.