ReCo is a new way to edit videos just by telling the computer in words what to change, with no extra masks needed.
FlashPortrait makes talking-portrait videos that keep a person's identity steady for as long as you want, whether that is minutes or even hours.
Robots learn best from what they would actually see: a first-person (egocentric) view. Yet most AI models are trained on third-person videos and get confused by the mismatch.
Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.
Spatia is a video generator that keeps a live 3D map of the scene (a point cloud) as its memory while it generates each new frame.
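To make that idea concrete, here is a minimal sketch, not Spatia's actual code: every class and method name below is a hypothetical stand-in. Each generated frame is unprojected into 3D points that grow the map, and the next frame is conditioned on that map.

```python
import numpy as np

class PointCloudMemory:
    """Running 3D map of the scene, grown as frames are generated.
    (Illustrative sketch only; names are hypothetical, not Spatia's API.)"""

    def __init__(self):
        self.points = np.empty((0, 3))  # (N, 3) world-space points

    def update(self, depth, intrinsics, pose):
        # Unproject the new frame's depth map into 3D camera-space points.
        h, w = depth.shape
        v, u = np.mgrid[0:h, 0:w]
        fx, fy, cx, cy = intrinsics
        x = (u - cx) / fx * depth
        y = (v - cy) / fy * depth
        cam_pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
        # Move them into world space with the camera pose and append to the map.
        world_pts = cam_pts @ pose[:3, :3].T + pose[:3, 3]
        self.points = np.concatenate([self.points, world_pts])

def generate_video(model, prompt, num_frames):
    memory = PointCloudMemory()
    frames = []
    for _ in range(num_frames):
        # Condition each new frame on the accumulated 3D memory,
        # so previously seen geometry stays consistent.
        frame, depth, pose = model.next_frame(prompt, memory.points)  # hypothetical
        memory.update(depth, model.intrinsics, pose)                  # hypothetical
        frames.append(frame)
    return frames
```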
This paper fixes a common problem in video-making AIs: tiny mistakes snowball over time (often called drift) and ruin long videos.
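A toy simulation (purely illustrative, not from the paper) shows why small errors are fatal over long horizons: when each frame is generated from the previous generated frame, even a ~1% per-frame error compounds instead of averaging out.

```python
import numpy as np

# Toy illustration of drift in autoregressive video generation:
# each step consumes its own slightly-wrong output, so errors compound.
rng = np.random.default_rng(0)
state = 1.0
for frame in range(300):                   # ~10 seconds of video at 30 fps
    state *= 1.0 + rng.normal(0.0, 0.01)   # 1% noise per generated frame
print(f"relative drift after 300 frames: {abs(state - 1.0):.1%}")
```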
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.
Seedance 1.5 pro is a single model that makes video and sound together, so lips, music, and actions match naturally.
KlingAvatar 2.0 is a system that makes long, sharp, lifelike talking-person videos that follow audio, images, and text instructions all at once.
This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.
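Here is a rough sketch of what that training step could look like, assuming a frozen feature encoder and a standard noise-prediction objective; `encoder` and `denoiser` are placeholders rather than the paper's code, and the noise schedule is simplified for illustration.

```python
import torch
import torch.nn.functional as F

# Instead of encoding images with a VAE, encode them with a frozen vision
# foundation model (e.g. DINOv3) and run diffusion on those feature tokens.
def train_step(encoder, denoiser, images, text_emb, num_steps=1000):
    with torch.no_grad():
        feats = encoder(images)            # (B, T, D) frozen VFM tokens, no VAE
    t = torch.randint(0, num_steps, (feats.shape[0],), device=feats.device)
    noise = torch.randn_like(feats)
    # Simplified linear alpha schedule, for illustration only.
    alpha = (1 - t.float() / num_steps).view(-1, 1, 1)
    noisy = alpha.sqrt() * feats + (1 - alpha).sqrt() * noise
    pred = denoiser(noisy, t, text_emb)    # predicts the added noise
    return F.mse_loss(pred, noise)
```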
VideoCoF is a new way to edit videos that first figures out WHERE to edit and then performs the edit, like thinking before acting.
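In sketch form, the two stages might look like the loop below; `model.locate` and `model.apply_edit` are hypothetical names, and the mask blending is just one simple way to keep untouched pixels intact.

```python
def edit_video(model, video, instruction):
    """Two-stage 'locate, then edit' loop (illustrative sketch only;
    model.locate / model.apply_edit are hypothetical names)."""
    edited = []
    for frame in video:  # frame: float array of shape (H, W, 3)
        # Stage 1: decide WHERE to edit -- a soft mask in [0, 1].
        mask = model.locate(frame, instruction)           # (H, W)
        # Stage 2: generate the edit, then blend with the original so
        # pixels outside the predicted region stay exactly as they were.
        candidate = model.apply_edit(frame, instruction)
        edited.append(mask[..., None] * candidate + (1 - mask[..., None]) * frame)
    return edited
```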
Saber is a new way to make videos that match a text description while keeping the look of people or objects from reference photos, without needing special triplet datasets.