KARL is a smart search helper that learns to look up information step by step and explain answers using the facts it finds.
This paper shows that when a model compares two of its own answers head-to-head, it picks the right one more often than when it judges each answer alone.
This paper teaches long-horizon AI agents to remember everything exactly without stuffing their entire memory into the model's context at once.
This paper teaches AI to name things in pictures very specifically (like “golden retriever” instead of just “dog”) without making more mistakes.
MemSifter is a smart helper that picks out the right memories for a big AI, so the big AI doesn't have to read through everything itself.
MMR-Life is a new test (benchmark) that checks how AI understands everyday situations using several real photos at once.
CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.
FireRed-OCR turns a general vision-language model into a careful document reader that follows strict rules, so its outputs are usable in the real world.
This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them choose better among answers they already know?
CHIMERA is a small (about 9,000 examples) but very carefully built synthetic dataset that teaches AI to solve hard problems step by step.
SWE-rebench V2 is a giant, language-agnostic automated pipeline that turns real GitHub pull requests into safe, runnable software tasks for training AI coding agents.
SLATE is a new way to teach AI to think step by step while using a search engine, giving feedback at each step instead of only at the end.