FOFPred is a new AI model that reads one or two images plus a short instruction like “move the bottle left to right,” and then predicts how every pixel will move in the moments that follow.
Big video generators (diffusion models) create great videos but are too slow because they clean up noise in hundreds of tiny steps.
VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.
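The "one frame at a time" idea above is the same trick language models use: predict the next item from everything generated so far. Here is a minimal toy sketch of that autoregressive loop (not VideoAR's actual code; the frame-update rule is a made-up stand-in for a real neural network):

```python
# Toy sketch of autoregressive video generation: each frame is predicted
# from the frames before it, the way a language model predicts the next
# word from the earlier words. Frames here are just integers.

def predict_next_frame(history):
    # Hypothetical toy rule: the next frame is the last frame plus one.
    # A real model would run a neural network over the whole history
    # (and refine each frame from coarse to fine).
    return history[-1] + 1

def generate_video(first_frame, num_frames):
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(predict_next_frame(frames))
    return frames

video = generate_video(first_frame=0, num_frames=5)
# → [0, 1, 2, 3, 4]
```

The point of the sketch is the loop shape: each step depends only on what came before, so frames can be streamed out one by one instead of generated all at once.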
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.
VINO is a single AI model that can make and edit both images and videos by listening to text and looking at reference pictures and clips at the same time.
LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.
Over++ is a video AI that adds realistic effects like shadows, splashes, dust, and smoke between a foreground and a background without changing the original footage.
EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”
Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.
Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.
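The slowness mentioned in several of these summaries comes from the same place: a diffusion model must run its network once per clean-up step, so cost grows linearly with the step count. A minimal toy sketch (the "denoising" here is a made-up nudge toward a target, standing in for a real network call):

```python
import numpy as np

def denoise_step(x, step, total_steps):
    # Toy "clean-up" step: nudge the noisy image a fraction of the way
    # toward a target. A real diffusion model runs a large neural network
    # here, which is why hundreds of steps are so expensive.
    target = np.zeros_like(x)  # stand-in for the model's prediction
    alpha = 1.0 / (total_steps - step)  # cover a share of the remaining gap
    return x + alpha * (target - x)

def generate(total_steps, size=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(size, size))  # start from pure noise
    for step in range(total_steps):  # one network call per step
        x = denoise_step(x, step, total_steps)
    return x

# Same final result, but the first run does 250x more work:
slow = generate(total_steps=1000)
fast = generate(total_steps=4)
```

Both runs end at the target; the only difference is how many times the (expensive) per-step computation is executed, which is why cutting hundreds of steps down to a handful gives real-time systems like the ones above.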