The paper asks how best to use expert step-by-step solutions (expert trajectories) when teaching large, already pretrained AI models to reason.
This paper asks whether reinforcement learning (RL) can improve the generation of 3D models from text, and shows that it can, provided the training and rewards are designed carefully.
SPARK teaches an AI to grade its own steps without needing the correct answers to be provided.
ReVSeg teaches an AI to segment objects in videos by thinking step-by-step instead of guessing everything at once.