Robots often learn good hand motions during training but get confused when the scene or the instructions change even a little at test time.
AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.
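The paper's exact recipe isn't reproduced here, but the general idea of distilling a debate into one model can be sketched in a few lines. Everything below (the `generate` stub, the round count, the prompt wording) is a hypothetical illustration under assumed helpers, not AgentArk's actual method.

```python
# Minimal sketch of distilling multi-agent debate into one model.
# `generate` is a hypothetical stand-in for any LLM call; the round
# count and prompt wording are invented for this example.

def generate(model: str, prompt: str) -> str:
    """Placeholder for a real LLM call (an API or a local model)."""
    raise NotImplementedError

def run_debate(question: str, agents: list[str], rounds: int = 2) -> list[str]:
    """Collect a transcript where each agent answers, then revises
    after reading what the other agents said."""
    answers = [generate(a, question) for a in agents]
    transcript = list(answers)
    for _ in range(rounds):
        revised = []
        for i, agent in enumerate(agents):
            others = "\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (f"Question: {question}\n"
                      f"Other agents said:\n{others}\n"
                      "Revise your answer and point out any mistakes you see.")
            revised.append(generate(agent, prompt))
        answers = revised
        transcript.extend(answers)
    return transcript

def build_training_pair(question: str, transcript: list[str]) -> dict:
    """Turn one debate transcript into one supervised example, so a
    single student model learns to produce debate-style reasoning
    in a single forward pass, with no debate at answer time."""
    return {"input": question, "target": "\n".join(transcript)}
```

The key point is the last function: the expensive multi-model debate happens only during data generation, and the student is fine-tuned to imitate its outcome directly.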
Parallel-Probe is a simple add-on that runs many AI “thought paths” at once and stops them early once they already agree.
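To make “stop early when they agree” concrete, here is a rough sketch of that pattern. The `step_chain` and `extract_answer` stubs, the quorum threshold, and the step budget are all assumptions made for this illustration, not Parallel-Probe's real design.

```python
from collections import Counter

def step_chain(chain: str) -> str:
    """Placeholder: extend one reasoning chain by a small step."""
    raise NotImplementedError

def extract_answer(chain: str) -> str | None:
    """Placeholder: pull a candidate answer out of a chain, if one exists yet."""
    raise NotImplementedError

def parallel_probe(question: str, n_paths: int = 8,
                   max_steps: int = 50, quorum: float = 0.75) -> str:
    """Run n_paths reasoning chains in lockstep and stop as soon as a
    large enough fraction of them already give the same answer."""
    chains = [question] * n_paths
    for _ in range(max_steps):
        chains = [step_chain(c) for c in chains]
        answers = [a for a in (extract_answer(c) for c in chains)
                   if a is not None]
        if answers:
            best, count = Counter(answers).most_common(1)[0]
            if count / n_paths >= quorum:  # early agreement: stop here
                return best
    # no early consensus: fall back to a plain majority vote
    final = [extract_answer(c) or "" for c in chains]
    return Counter(final).most_common(1)[0][0]
```

The saving comes from the early return: easy questions where the paths converge quickly never pay for the full step budget.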
AutoFigure is an AI system that reads long scientific texts and then thinks, plans, and draws clear, good-looking figures—like a careful student who makes a neat, accurate poster from a long chapter.
This paper builds an AI team that can make real full-stack websites (frontend, backend, and database) from plain English instructions.
This paper introduces 3DiMo, a new way to control how people move in generated videos while still letting text flexibly steer the camera movement.
SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.
Reasoning Cache (RC) is a new way for AI to think in steps: it writes some thoughts, makes a short summary, throws away the long thoughts, and then keeps going using only the summary.
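The summarize-and-discard loop in that sentence is easy to picture in code. Below is a minimal sketch of that loop; `think`, `summarize`, and `is_done` are hypothetical stand-ins for LLM calls, and the round limit is invented for the example.

```python
def think(context: str) -> str:
    """Placeholder: generate the next chunk of step-by-step reasoning."""
    raise NotImplementedError

def summarize(thoughts: str) -> str:
    """Placeholder: compress a chunk of reasoning into a short summary."""
    raise NotImplementedError

def is_done(summary: str) -> bool:
    """Placeholder: check whether the summary already contains an answer."""
    raise NotImplementedError

def reasoning_cache(question: str, max_rounds: int = 10) -> str:
    """Think in chunks, but carry forward only a running summary:
    write thoughts -> summarize -> discard the long thoughts -> continue."""
    summary = ""
    for _ in range(max_rounds):
        context = f"Question: {question}\nNotes so far: {summary}"
        thoughts = think(context)  # long, detailed reasoning for this round
        summary = summarize(summary + "\n" + thoughts)  # short replacement
        # the full `thoughts` string is dropped here; only `summary` persists
        if is_done(summary):
            break
    return summary
```

The point of the design is that the context passed to `think` stays short no matter how long the overall reasoning runs, because only the summary survives each round.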
LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.
MemGUI-Bench is a new test that checks how well phone-controlling AI agents can remember important information both during a task and across different tries.
This paper builds ID-MoCQA, a new two-step (multi-hop) quiz set about Indonesian culture that makes AI connect clues before answering.
The paper asks a simple question: when an AI sees a picture and some text but the instructions say 'only trust the picture,' how does it decide which one to follow?