This paper builds an AI team that can make real full‑stack websites (frontend, backend, and database) from plain English instructions.
This paper introduces 3DiMo, a new way to control how people move in generated videos while still letting text prompts steer the camera freely.
SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what’s in front, behind, bigger, or reachable.
LIVE is a new way to train video-making AIs so their mistakes don’t snowball over long videos.
This paper builds ID-MoCQA, a new two-step (multi-hop) quiz set about Indonesian culture that makes AI connect clues before answering.
The paper asks a simple question: when an AI sees a picture and some text but the instructions say “only trust the picture,” how does it decide which one to follow?
This paper teaches AI to look things up on the web and fix its own mistakes mid-thought instead of starting over from scratch.
DeepResearch agents write long, evidence-based reports, but teaching and grading them is hard because there is no single “right answer” to score against.
HY3D-Bench is a complete, open-source “starter kit” for making and studying high-quality 3D objects.
HySparse is a new way for AI models to pay attention that mixes a few full attention layers with many fast, memory‑saving sparse layers.
The paper shows that using information from many layers of a language model (not just one) helps text-to-image diffusion transformers follow prompts much better.
A-RAG lets the AI choose how to search, what to read, and when to stop, instead of following a fixed recipe.