Papers (8)

#test-time training

Tool Verification for Test-Time Reinforcement Learning

Ruotong Liao, Nikolai Röhrich et al. · Mar 2 · arXiv

This paper addresses a key flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.

#test-time reinforcement learning #verification-weighted voting #tool verification

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Intermediate

Shannan Yan, Leqi Zheng et al. · Feb 22 · arXiv

This paper teaches a computer to find the same object in views from two very different cameras, such as a body camera (first-person) and a room camera (third-person).

#cross-view correspondence #egocentric to exocentric #binary segmentation

Reinforced Fast Weights with Next-Sequence Prediction

Intermediate

Hee Seung Hwang, Xindi Wu et al. · Feb 18 · arXiv

Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.

#fast weight models #next-sequence prediction #reinforcement learning for LMs

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Intermediate

Sidi Lu, Zhenwen Liang et al. · Feb 4 · arXiv

Locas is a new kind of add-on memory for language models that learns during use but touches none of the model’s original weights.

#Locas #parametric memory #test-time training

LoopViT: Scaling Visual ARC with Looped Transformers

Intermediate

Wen-Jie Shu, Xuerui Qiu et al. · Feb 2 · arXiv

LoopViT is a vision model that thinks in loops, so it can take more steps on hard puzzles and stop early on easy ones.

#ARC-AGI #visual reasoning #Looped Transformer

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Intermediate

Chengyi Yang, Zhishang Xiang et al. · Jan 30 · arXiv

TTCS lets a model teach itself at test time: it first generates easier practice questions similar to the hard target question, then learns from them.

#test-time training #test-time reinforcement learning #curriculum learning

Learning to Discover at Test Time

Intermediate

Mert Yuksekgonul, Daniel Koceja et al. · Jan 22 · arXiv

This paper shows how to keep training a language model while it is solving one hard, real problem, so it can discover a single, truly great answer instead of many average ones.

#test-time training #reinforcement learning #entropic objective

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Intermediate

Zechen Bai, Chen Gao et al. · Dec 16 · arXiv

Robots usually learn by copying many demonstrations, which is expensive and makes them brittle when things change.

#EVOLVE-VLA #test-time training #vision-language-action