The paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.
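To make the contrast concrete, here is a minimal Python sketch of the two judging modes, assuming a hypothetical `ask` callable that sends a prompt to the model and returns its text reply; the prompt wording and function names are illustrative, not the paper's.

```python
from typing import Callable

def judge_alone(ask: Callable[[str], str], question: str, answer: str) -> bool:
    """Pointwise: the model grades one answer in isolation."""
    verdict = ask(
        f"Question: {question}\nAnswer: {answer}\n"
        "Is this answer correct? Reply YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

def judge_head_to_head(ask: Callable[[str], str], question: str,
                       answer_a: str, answer_b: str) -> str:
    """Pairwise: the model sees both of its answers and picks one."""
    verdict = ask(
        f"Question: {question}\n"
        f"Answer A: {answer_a}\nAnswer B: {answer_b}\n"
        "Which answer is better? Reply A or B."
    )
    return "A" if verdict.strip().upper().startswith("A") else "B"
```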
This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.
MemSifter is a smart helper that picks out just the right memories for a big AI, so the model doesn't have to read through everything it has stored.
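A toy sketch of the sifting idea, under the assumption that relevance can be scored per memory; the word-overlap scorer and the `sift` name below are made up for illustration (a real system would likely use embeddings).

```python
# Score each stored memory against the current query and keep only the
# top few, so the main model's prompt stays short.

def score(query: str, memory: str) -> float:
    """Crude relevance score: word overlap between query and memory."""
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / (len(q) + 1e-9)

def sift(query: str, memories: list[str], k: int = 3) -> list[str]:
    """Return the k memories most relevant to the query."""
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]

memories = [
    "User's dog is named Biscuit.",
    "User prefers metric units.",
    "User asked about Paris hotels last week.",
]
print(sift("What was my dog called?", memories, k=1))
```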
CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.
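One plausible shape for that "challenging and correct" guarantee is a generate-then-verify loop: keep a synthesized conversation only if replaying its tool calls reproduces the recorded results. The sketch below is a guess at that shape, not CoVe's actual pipeline; every name in it is illustrative.

```python
def replay_tool_call(name: str, args: dict):
    """Hypothetical executor: run the named tool with the given arguments."""
    tools = {"add": lambda a, b: a + b}
    return tools[name](**args)

def is_verified(conversation: list[dict]) -> bool:
    """A conversation passes only if every recorded tool result reproduces."""
    for turn in conversation:
        if turn["role"] == "tool_call":
            if replay_tool_call(turn["name"], turn["args"]) != turn["result"]:
                return False
    return True

convo = [
    {"role": "user", "content": "What is 2 + 3?"},
    {"role": "tool_call", "name": "add", "args": {"a": 2, "b": 3}, "result": 5},
    {"role": "assistant", "content": "2 + 3 = 5."},
]
print(is_verified(convo))  # True: safe to keep as a training example
```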
FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.
This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or just help them pick better from answers they already know?
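A common way to probe this distinction (not necessarily the paper's exact protocol) is to compare pass@1 with pass@k: if RL lifts the former but not the latter, it mostly taught the model to pick better among answers it could already produce.

```python
from typing import Callable

def pass_at_k(sample_answers: Callable[[str, int], list[str]],
              is_correct: Callable[[str, str], bool],
              prompt: str, k: int) -> bool:
    """True if at least one of k sampled answers is correct.

    Lifting pass@1 while pass@k stays flat suggests better *selection*;
    lifting pass@k too suggests genuinely new capability.
    """
    return any(is_correct(prompt, ans) for ans in sample_answers(prompt, k))
```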
SWE-rebench V2 is a giant, language-agnostic automated pipeline that turns real GitHub pull requests into safe, runnable software tasks for training AI coding agents.
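The core trick, sketched loosely below with made-up field names, is that a merged pull request already contains everything a task needs: a pre-merge commit to start from, new tests to hide as the grader, and a known-good solution.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SWETask:
    repo_url: str             # where the agent clones from
    base_commit: str          # repo state just before the PR merged
    test_patch: str           # the PR's added/changed tests, hidden from the agent
    fail_to_pass: list[str]   # tests that must flip from failing to passing

def grade(task: SWETask, run_tests: Callable[[list[str]], dict[str, bool]]) -> bool:
    """A solution counts only if every target test now passes."""
    results = run_tests(task.fail_to_pass)
    return all(results[name] for name in task.fail_to_pass)
```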
EmbodMocap is a low-cost, portable way to capture people moving inside real places using just two iPhones, so computers and robots can learn from real life instead of studios.
This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.
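A toy version of the contrast, assuming a hypothetical `generate` sampler and using majority voting as the selection rule (the paper's actual rule may differ): the n samples are independent, so they can run at the same time, unlike one long serial chain.

```python
from collections import Counter
from typing import Callable

def parallel_search(generate: Callable[[str], str], prompt: str, n: int = 8) -> str:
    """Draw n independent answers and return the most common one."""
    answers = [generate(prompt) for _ in range(n)]  # independent: parallelizable
    return Counter(answers).most_common(1)[0][0]
```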
GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.
LongVideo-R1 is a smart video-watching agent that jumps to the right moments in long videos instead of scanning everything.
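Here is an illustrative coarse-to-fine version of that "jump, don't scan" behavior, assuming a hypothetical `score_frame` function that rates how relevant the frame at a timestamp is to the question; the probing scheme is a sketch, not the agent's actual policy.

```python
def locate(score_frame, question: str, start: float, end: float,
           probes: int = 5, rounds: int = 3) -> float:
    """Coarse-to-fine search for the most question-relevant timestamp."""
    for _ in range(rounds):
        step = (end - start) / (probes - 1)
        times = [start + i * step for i in range(probes)]
        best = max(times, key=lambda t: score_frame(question, t))
        # zoom into a window around the best probe instead of scanning it all
        start, end = max(start, best - step), min(end, best + step)
    return best
```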
PyVision-RL teaches vision-language models to act like curious agents that think in multiple steps and use Python tools to inspect images and videos.
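A hedged sketch of that multi-step loop: at each turn the model either emits Python to inspect the image (which gets executed, with the output fed back) or a final answer. `ask_model` and `run_python` are hypothetical stand-ins, and the `FINAL:` convention is made up for illustration.

```python
def agent_loop(ask_model, run_python, image_path: str, question: str,
               max_steps: int = 5) -> str:
    """Think-act loop: run model-written Python on the image until it answers."""
    history = f"Image: {image_path}\nQuestion: {question}\n"
    for _ in range(max_steps):
        reply = ask_model(history)
        if reply.startswith("FINAL:"):       # the model decides it is done
            return reply.removeprefix("FINAL:").strip()
        output = run_python(reply)           # e.g. crop, zoom, count, measure
        history += f"\nCode:\n{reply}\nOutput:\n{output}\n"
    return "no answer within the step budget"
```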