OmniGAIA is a new test that checks whether an AI can watch videos, look at images, listen to audio, and use web and code tools over several steps to reach a verified answer.
Large multimodal models (LMMs) can look at pictures and read text, but they still miss tricky cases, like tiny chart labels or multi-step math.
This paper tests whether AI can realistically guess what a specific social media user would comment after seeing a new post.
This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.
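To make the contrast concrete, here is a minimal Python sketch, with `ask_model` and `score` as hypothetical stand-ins for a model call and an answer verifier: the parallel version fans out several short, independent attempts and keeps the best-scoring one, while the chain version must wait on each step before taking the next.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a single LLM call.
    return f"answer derived from: {prompt[:30]}..."

def score(answer: str) -> float:
    # Hypothetical answer verifier (e.g., a reward or checking model).
    return random.random()

def sequential_chain(question: str, steps: int = 8) -> str:
    # One long chain: each step must wait for the previous one.
    thought = question
    for _ in range(steps):
        thought = ask_model(f"Continue this reasoning:\n{thought}")
    return thought

def parallel_search(question: str, branches: int = 8) -> str:
    # Many short, independent attempts launched at the same time;
    # the best-scoring answer wins.
    prompts = [f"Attempt {i}, answer directly:\n{question}" for i in range(branches)]
    with ThreadPoolExecutor(max_workers=branches) as pool:
        answers = list(pool.map(ask_model, prompts))
    return max(answers, key=score)

print(parallel_search("What is the tallest mountain on Mars?"))
```

The structural point: independent attempts scale with available compute, while a dependent chain cannot be sped up no matter how many workers you have.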
dLLM is a single, open-source toolbox that standardizes how diffusion language models are trained, run, and tested.
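The "single toolbox" claim is easiest to picture as one interface with the same three verbs for every model. The class below is a hypothetical sketch of that idea, not dLLM's actual API:

```python
class DiffusionLM:
    """Hypothetical unified interface; not dLLM's real API."""

    def __init__(self, checkpoint: str):
        self.checkpoint = checkpoint

    def train(self, texts: list[str], steps: int = 1000) -> None:
        # One training entry point, whichever diffusion LM is loaded.
        print(f"train {self.checkpoint}: {steps} steps on {len(texts)} texts")

    def generate(self, prompt: str, denoise_steps: int = 64) -> str:
        # One sampling entry point: start masked, iteratively unmask.
        return f"{prompt} [...{denoise_steps}-step denoised continuation]"

    def evaluate(self, benchmark: str) -> dict:
        # One eval harness shared by all models, so scores are comparable.
        return {"model": self.checkpoint, "benchmark": benchmark}

# The payoff of standardization: swapping checkpoints changes nothing else.
lm = DiffusionLM("some-masked-diffusion-lm")
lm.train(["hello world"], steps=10)
print(lm.generate("Once upon a time"))
```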
Different transformers may have very different weights, but they often hide the same tiny "engine" inside that actually does the task.
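A toy worked example of this claim, in plain numpy: two "networks" whose individual weight matrices look completely unrelated can still compose into the identical input-output map, i.e., hide the same engine.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "engine": one fixed linear map the task actually needs.
engine = rng.normal(size=(4, 4))

# Two miniature networks: each routes the engine through a different
# invertible basis change, so their individual weights look unrelated.
M = rng.normal(size=(4, 4))
W1_a, W2_a = M, engine @ np.linalg.inv(M)
N = rng.normal(size=(4, 4))
W1_b, W2_b = N, engine @ np.linalg.inv(N)

x = rng.normal(size=4)
# Different weights, same behavior: both compositions equal the engine.
assert np.allclose(W2_a @ (W1_a @ x), W2_b @ (W1_b @ x))
```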
The paper introduces CMDM, a new way to make computer-generated human motions that feel smooth over time and match the meaning of a text prompt.
This paper makes training giant AI models faster and lighter on memory by inventing a new way to split tensors called RaggedShard.
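The name suggests shards of unequal size. As a loose illustration only (the paper's actual scheme may differ), the sketch below splits a weight tensor into uneven row blocks sized to each device's memory budget, rather than into equal slices:

```python
import numpy as np

def ragged_shard(tensor: np.ndarray, budgets: list[float]) -> list[np.ndarray]:
    """Illustrative only: split rows proportionally to per-device budgets.

    Unlike equal sharding, shard sizes may differ, so devices with more
    free memory hold larger pieces of the tensor.
    """
    total = sum(budgets)
    rows = tensor.shape[0]
    # Convert budgets to row counts, giving any remainder to the last device.
    counts = [int(rows * b / total) for b in budgets]
    counts[-1] += rows - sum(counts)
    splits = np.cumsum(counts)[:-1]
    return np.split(tensor, splits, axis=0)

weights = np.zeros((100, 512))
shards = ragged_shard(weights, budgets=[1.0, 1.0, 2.0])  # third device gets half
assert [s.shape[0] for s in shards] == [25, 25, 50]
```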
Solaris is a new AI that can imagine future video for two Minecraft players at the same time, keeping both players' camera views consistent with each other.
The paper builds an automated pipeline that translates AI benchmarks and datasets into many languages while keeping questions and answers correctly connected.
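The tricky part is keeping each answer attached to its question across languages. A minimal sketch, with `translate` as a hypothetical machine-translation call: every record carries a stable ID, and the answer stays an index that is copied along, never re-derived from translated text.

```python
def translate(text: str, lang: str) -> str:
    # Hypothetical stand-in for a machine-translation call.
    return f"[{lang}] {text}"

def translate_benchmark(records: list[dict], lang: str) -> list[dict]:
    out = []
    for rec in records:
        out.append({
            "id": rec["id"],                      # stable key links back to the source item
            "question": translate(rec["question"], lang),
            "choices": [translate(c, lang) for c in rec["choices"]],
            "answer_index": rec["answer_index"],  # copied as-is, so the link never breaks
        })
    return out

records = [{"id": "q1", "question": "2+2?", "choices": ["3", "4"], "answer_index": 1}]
print(translate_benchmark(records, "de"))
```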
GUI-Libra is a training recipe that helps computer-using AI agents both think carefully and click precisely on screens.
WoG (World Guidance) teaches a robot to imagine just the right bits of the near future and use those bits to pick better actions.
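Read as pseudocode (every function below is hypothetical, with toy stand-in bodies): predict only the task-relevant features of the near future under each candidate action, then pick the action whose imagined future scores best.

```python
import random

def predict_future_features(state, action) -> list[float]:
    # Hypothetical learned world model: returns only a few task-relevant
    # numbers about the predicted near future, not whole future frames.
    return [random.random() for _ in range(3)]

def goal_score(features: list[float]) -> float:
    # Hypothetical measure of how promising an imagined future looks.
    return sum(features)

def choose_action(state, candidate_actions):
    # World guidance: rank each candidate action by the partial future
    # it is imagined to lead to, then act on the best one.
    return max(candidate_actions,
               key=lambda a: goal_score(predict_future_features(state, a)))

print(choose_action(state=None, candidate_actions=["left", "right", "grasp"]))
```

Predicting only "the right bits" keeps the world model cheap: scoring a handful of features per action is far lighter than rendering and comparing full future videos.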