MOVA is an open-source AI that generates video and sound at the same time, so mouths, actions, and noises stay in sync.
This paper teaches talking avatars not just to speak, but also to look around their scene and handle nearby objects as a text instruction directs.
This paper shows a simple, one-model way to dub videos that keeps the new voice and the lip movements naturally in sync.
This paper builds a real-time talking-listening head avatar that reacts naturally to your words, tone, nods, and smiles in about half a second.
KlingAvatar 2.0 is a system that makes long, sharp, lifelike talking-person videos that follow audio, images, and text instructions all at once.