This paper teaches AI agents to learn new reusable skills and get better over time by using reinforcement learning, not just prompts.
JustRL shows that a simple, stable recipe for reinforcement learning (RL) can make a 1.5B-parameter language model much better at math without fancy tricks.
Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.
This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.
This paper teaches robots to move their camera to a better spot before answering a question about what they see.
This paper asks how best to use expert step-by-step solutions (expert trajectories) when teaching large, already-pretrained AI models to reason.
This paper asks whether reinforcement learning (RL) can improve the generation of 3D models from text and shows that it can, provided the training and rewards are designed carefully.
SPARK teaches AI to grade its own steps without needing the right answers written down anywhere.
ReVSeg teaches an AI to segment objects in videos by thinking step-by-step instead of guessing everything at once.