Parallel-Probe is a simple add-on that lets many AI “thought paths” run at once and stop early once they already agree.
AutoFigure is an AI system that reads long scientific texts and then thinks, plans, and draws clear, good-looking figures—like a careful student who makes a neat, accurate poster from a long chapter.
This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.
This paper introduces 3DiMo, a new way to control how people move in generated videos while keeping camera movement flexibly controllable through text.
SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.
AOrchestra is like a smart conductor that builds the right mini-helpers (sub-agents) on demand to solve big, multi-step tasks.
LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.
This paper builds ID-MoCQA, a new two-step (multi-hop) quiz set about Indonesian culture that makes AI connect clues before answering.
The paper asks a simple question: when an AI sees a picture and some text but the instructions say 'only trust the picture,' how does it decide which one to follow?
This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single 'right answer' to score against.
CL-bench is a new test that checks whether AI can truly learn new things from the information you give it right now, not just from what it memorized before.