Search is not the same as research; real research needs planning, checking many sources, fixing mistakes, and writing a clear report.
This paper builds DiRL, a fast and careful post-training method that helps diffusion language models reason better.
This paper teaches robots to move their camera to a better spot before answering a question about what they see.
This paper asks how best to use expert step-by-step solutions (expert trajectories) when teaching large AI models to reason after pretraining.