CoDance is a new way to animate many characters in one picture using just one pose video, even if the picture and the video do not line up perfectly.
VINO is a single AI model that can make and edit both images and videos by listening to text and looking at reference pictures and clips at the same time.
SemanticGen is a new way to make videos that starts by planning in a small, high-level 'idea space' (semantic space) and then adds the tiny visual details later.
Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.
EgoX turns a regular third-person video into a first-person video that looks like it was filmed from the actor’s eyes.