This paper fixes a major flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.
This paper teaches a computer to find the same object as seen from two very different cameras, like a body camera (first-person) and a room camera (third-person).
Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.
Locas is a new kind of add-on memory for language models that learns during use but touches none of the model’s original weights.
Loop-ViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.
TTCS lets a model teach itself at test time: it first makes easier practice questions that are similar to the real hard question, then learns from them.
This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.
Robots usually learn by copying many demonstrations, which is expensive and makes them brittle when things change.