APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.
Tests (benchmarks) are widely used to judge how smart AI models are, but not all tests are good tests.
FOCUSUI makes computer-using AI faster while keeping it accurate by looking only at the important parts of a screen.
ATLAS is a system that picks the best mix of AI models and helper tools for each question, instead of using just one model or a fixed tool plan.
The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.
DiffCoT treats a model’s step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.
Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.
This paper teaches a computer agent to grow a toolbox of skills that are real, runnable programs, not just text ideas.
EpiQAL is a new benchmark that tests how well AI models answer population-level disease questions using real research papers.
ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.
The paper teaches language models using extra 'language homework' made from the same raw text so they learn grammar and meaning, not just next-word guessing.
Mixture-of-Experts (MoE) language models don’t split cleanly into domain specialists; instead, a small, stable group of experts gets chosen again and again across many subjects.