Yume1.5 is a model that turns text or a single image into a living, explorable video world you can move through using the keyboard.
SciEvalKit is a new open-source toolkit that tests AI on real scientific skills, not just trivia or simple Q&A.
SpotEdit is a training‑free way to edit only the parts of an image that actually change, instead of re-generating the whole picture.
MAI-UI is a family of AI agents that can see, understand, and control phone and computer screens using plain language.
SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.
Coding agents that fix software rely on feedback, but unit tests give only pass/fail signals that are often noisy or missing.
TimeBill is a way to help big AI models finish their answers on time without ruining answer quality.
The paper shows that when vision-language models write captions, only a small set of uncertain words (about 20%) act like forks that steer the whole sentence.
This paper introduces Knot Forcing, a way to make talking-head videos that look great while being generated live, frame by frame.
The paper shows that many AI systems work best when a small 'compressor' model first shrinks long text into a short, info-packed summary and a bigger 'predictor' model then reasons over that summary.
This paper teaches AI to notice not just what is in a picture, but how the picture looks and feels to people.
HiStream makes 1080p video generation much faster by removing repeated work across space, time, and steps.