CubeComposer is a new AI method that turns a normal forward-facing video into a full 360° VR video at true 4K quality without using super-resolution upscaling.
NOVA is a new video editor that lets you change a few key frames (sparse control) while carefully preserving the original motion and background details (dense synthesis).
HiFi-Inpaint is a new AI method that fills a missing area in a photo of a person by inserting a specific product, while keeping tiny details like logos, textures, and small text crisp.
Kiwi-Edit is a new video editor that follows your text instructions and also copies the look of a reference picture you give it.
The paper argues that to build an AI that truly understands and simulates the real world, it must be consistent in three ways at once: across different senses (modal), across 3D space (spatial), and across time (temporal).
Modern image generators can still make strange mistakes like extra fingers or melted faces, and today’s vision-language models (VLMs) often miss them.
Mobile-O is a small but smart AI that can both understand pictures and make new images, and it runs right on your phone.
DreamID-Omni is one model that can create, edit, and animate human-centered videos with matching voices, all in sync.
Stroke3D lets you draw simple 2D stick-figure strokes plus a short text, and it builds a ready-to-animate 3D model with a skeleton and textures.
MOVA is an open-source AI that makes videos and sounds at the same time so mouths, actions, and noises match perfectly.
This paper teaches talking avatars not just to speak, but to look around their scene and handle nearby objects exactly as a text instruction says.
SkyReels-V3 is a single AI model that can make videos in three ways: from reference images, by extending an existing video, and by creating talking avatars from audio.