Tests (benchmarks) are the standard way to judge how smart AI models are, but not all tests are good tests.
FOCUSUI makes computer-using AI faster and still accurate by looking only at the important parts of a screen.
The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.
DiffCoT treats a model’s step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.
This paper teaches a computer agent to grow a toolbox of skills that are real, runnable programs, not just text ideas.
EpiQAL is a new benchmark that tests how well AI models answer population-level disease questions using real research papers.
Mixture-of-Experts (MoE) language models don’t split cleanly into domain specialists; instead, a small, stable group of experts gets chosen again and again across many subjects.
InfiniDepth is a new way to predict depth that treats an image as a smooth, continuous surface, so you can ask for the depth at any location, not just at the fixed pixels of a grid.
LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.
Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.
This paper introduces SOP, a system that lets many real robots learn new skills online at the same time while keeping one shared brain (policy).
MMFormalizer is a new system that turns problems with pictures and words (like physics scenes or geometry diagrams) into strict, checkable math statements and proofs.