Big video generators (diffusion models) create great videos but are slow, because they turn noise into footage through hundreds of tiny clean-up steps, each a full run of the network.
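To see where the time goes, here is a minimal sketch of the generic denoising loop that diffusion samplers run. Everything in it is illustrative: the stand-in `denoiser`, the toy update rule, and the step count are assumptions for this sketch, not any particular paper's code.

```python
import torch

def sample(denoiser, num_steps=500, shape=(1, 3, 64, 64)):
    # Start from pure noise, then repeatedly ask the network to
    # remove a little of it: one full forward pass per step.
    x = torch.randn(shape)
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t)       # predicted noise at step t
        x = x - eps / num_steps    # toy update; real samplers follow a noise schedule
    return x

dummy = lambda x, t: 0.1 * x       # stand-in for a big video/image network
out = sample(dummy)
print(out.shape)  # torch.Size([1, 3, 64, 64]) -- after 500 separate model calls
```

The cost is the loop itself: every one of those hundreds of steps is a full forward pass, which is exactly what few-step methods try to cut.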
VideoAR is a new way to make videos with AI that writes each frame like a story, one step at a time, while painting details from coarse to fine.
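As a rough picture of that recipe (not VideoAR's actual code), here is a toy loop that generates frames one after another, each conditioned on the frames so far, and refines every frame from coarse to fine. The `predict` function and the scale list are made up for illustration.

```python
import torch
import torch.nn.functional as F

def generate_video(predict, num_frames=16, scales=(8, 16, 32)):
    # Frame by frame, like a story: each new frame sees all past frames.
    # Within a frame, detail is added from coarse to fine resolutions.
    frames = []
    for _ in range(num_frames):
        frame = torch.zeros(3, scales[0], scales[0])
        for size in scales:  # coarse -> fine refinement
            frame = F.interpolate(frame.unsqueeze(0), size=(size, size)).squeeze(0)
            frame = frame + predict(frame, frames)  # model fills in detail at this scale
        frames.append(frame)
    return torch.stack(frames)

dummy = lambda frame, past: 0.1 * torch.randn_like(frame)  # stand-in predictor
print(generate_video(dummy).shape)  # torch.Size([16, 3, 32, 32])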
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.
Over++ is a video AI that adds realistic interaction effects like shadows, splashes, dust, and smoke when compositing a foreground onto a background, without altering either original clip.
EasyV2V is a simple but powerful system that edits videos by following plain-language instructions like “make the shirt blue starting at 2 seconds.”
Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.
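To make that concrete, here is a minimal sketch of the standard noise-prediction objective the summary is describing, with a stand-in network and a toy noise schedule (both assumptions of this sketch). The key point is visible in the code: the loss only rewards reconstructing noise, and never mentions objects or layout.

```python
import torch
import torch.nn.functional as F

def denoising_loss(model, x0, num_steps=1000):
    # One standard diffusion training step: corrupt clean latents x0
    # with random noise, then score the model on predicting that noise.
    t = torch.randint(0, num_steps, (x0.shape[0],))
    noise = torch.randn_like(x0)
    alpha = 1.0 - t.float().div(num_steps).view(-1, 1, 1, 1)  # toy linear schedule
    x_t = alpha.sqrt() * x0 + (1 - alpha).sqrt() * noise      # noisy version of x0
    pred = model(x_t, t)
    # The target is the injected noise itself: nothing in this loss
    # talks about objects or layout, which is why semantics come slowly.
    return F.mse_loss(pred, noise)

dummy = lambda x, t: torch.zeros_like(x)   # stand-in for a real U-Net / transformer
latents = torch.randn(4, 4, 32, 32)        # pretend VAE latents
print(denoising_loss(dummy, latents))
```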
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it does it in one quick pass.
Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image.
DuetSVG is a new AI that learns to make SVG graphics by generating an image and the matching SVG code together, like sketching first and then tracing neatly.
Diffusion models sometimes copy training images too closely, which can be a privacy and copyright problem.
Wan-Move is a new way to control how things move in AI-generated videos by guiding motion directly inside the model’s hidden features.