MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens using plain language.
SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.
Coding agents that fix software rely on feedback, but unit tests give only pass/fail signals that are often noisy or missing.
TimeBill is a way to help big AI models finish their answers on time without ruining answer quality.
The paper shows that when vision-language models write captions, only a small set of uncertain words (about 20%) act like forks that steer the whole sentence.
This paper introduces Knot Forcing, a way to make talking-head videos that look great while being generated live, frame by frame.
The paper shows that many AI systems work best when a small 'compressor' model first shrinks long text into a short, info-packed summary and a bigger 'predictor' model then reasons over that summary.
This paper teaches AI to notice not just what is in a picture, but how the picture looks and feels to people.
HiStream makes 1080p video generation much faster by removing repeated work across space, time, and steps.
Streamo is a real-time video assistant that knows when to stay quiet, when to wait, and when to speak, all while a video is still playing.
DreaMontage is a new AI method that makes long, single-shot videos that feel smooth and connected, even when the only guidance you give it is scattered images or short clips that belong in the middle of the video.
Large Multimodal Models (LMMs) are great at reading text and looking at pictures, but they usually do most of their thinking in words, which limits deep visual reasoning.