Papers3

#test-time reinforcement learning

Tool Verification for Test-Time Reinforcement Learning

Ruotong Liao, Nikolai Röhrich et al.Mar 2arXiv

The paper fixes a big flaw in test-time reinforcement learning (TTRL): when many wrong answers agree, the model rewards the mistake and gets stuck.

#test-time reinforcement learning#verification-weighted voting#tool verification

Not triaged yet

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Intermediate

Chengyi Yang, Zhishang Xiang et al.Jan 30arXiv

TTCS is a way for a model to teach itself during the test by first making easier practice questions that are similar to the real hard question and then learning from them.

#test-time training#test-time reinforcement learning#curriculum learning

Not triaged yet

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Intermediate

Zhiyuan Hu, Yunhai Hu et al.Jan 14arXiv

This paper introduces MATTRL, a way for multiple AI agents to learn from their own conversations at test time using short, reusable text notes instead of retraining their weights.

#multi-agent systems#test-time reinforcement learning#experience retrieval

Not triaged yet