Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.
This paper builds DiRL, a fast and careful way to post-train diffusion language models so they reason better.
This paper adds a tiny but powerful step called Early Knowledge Alignment (EKA) to multi-step retrieval systems so the model takes a quick, smart look at relevant information before it starts planning.
This paper teaches AI agents to learn new reusable skills and get better over time by using reinforcement learning, not just prompts.
JustRL shows that a simple, stable recipe for reinforcement learning (RL) can make a 1.5B-parameter language model much better at math without fancy tricks.
Zoom-Zero helps AI answer questions about videos by first finding the right moment and then zooming in to double-check tiny details.
This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.
This paper teaches robots to move their camera to a better spot before answering a question about what they see.
The paper asks how best to use expert step-by-step solutions (expert trajectories) when big AI models are taught to reason after pretraining.
This paper asks whether reinforcement learning (RL) can improve the generation of 3D models from text and shows that the answer is yes, provided the training and rewards are designed carefully.
SPARK teaches AI to grade its own steps without needing the right answers written down anywhere.
ReVSeg teaches an AI to segment objects in videos by thinking step by step instead of guessing everything at once.