Robots usually learn by copying many demonstrations, which is expensive and makes them brittle when things change.
Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.
SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, much as people skim or rewind long videos.
The paper introduces Nemotron-Cascade, a step-by-step (cascaded) reinforcement learning recipe that trains an AI across domains like alignment, instructions, math, coding, and software engineering—one at a time.
This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.
This paper teaches robots to move their camera to a better spot before answering a question about what they see.
QwenLong-L1.5 is a training recipe that helps AI read and reason over very long documents by improving the data it learns from, the way it is trained, and how it remembers important information.
DentalGPT is a special AI that looks at dental images and text together and explains what it sees like a junior dentist.
The paper asks how to best use expert step-by-step solutions (expert trajectories) when teaching big AI models to reason after pretraining.
This paper asks whether reinforcement learning (RL) can improve generating 3D models from text, and shows that the answer is yes if the training and rewards are designed carefully.
Role-playing agents need to juggle several goals at once, like staying in character, following instructions, and using the right tone.
The paper shows that video AIs do not need long, human-like chains of thought to reason well.