This paper shows how to make text-to-video models create clearer, steadier, and more on-topic videos without using any human-labeled ratings.
CoDance is a new way to animate many characters in one picture using just one pose video, even if the picture and the video do not line up perfectly.
Big video generators (diffusion models) create great videos but are slow, because they build each video by running hundreds of tiny clean-up (denoising) steps.
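To see why those steps make sampling slow: a diffusion model starts from pure noise and repeatedly calls an expensive denoising network, once per step, so total cost grows linearly with the step count. Below is a minimal toy sketch of that loop; the `denoise_step` function is a stand-in invented for illustration, not any real model's code.

```python
import random

def denoise_step(x, t):
    # Stand-in for one call to the denoising network.
    # In a real diffusion model, this is a large neural-net forward pass,
    # which is why hundreds of steps make generation slow.
    return 0.98 * x + random.gauss(0, 0.01)

def sample(num_steps):
    # Start from pure noise and repeatedly "clean it up".
    random.seed(0)
    x = random.gauss(0, 1)
    calls = 0
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)
        calls += 1  # cost grows linearly with num_steps
    return x, calls

_, slow_calls = sample(1000)  # a typical many-step sampler
_, fast_calls = sample(4)     # a few-step (e.g. distilled) sampler
print(slow_calls, fast_calls)
```

The speed-up papers in this area mostly attack that loop: fewer steps (distillation, better solvers) means proportionally fewer network calls.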
APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.
LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.
This paper is about getting the words you type into a generator to turn into the right pictures and videos more reliably.
This paper introduces BiCo, a one-shot way to mix ideas from images and videos by tightly tying each visual idea to the exact words in a prompt.