This paper shows a simple, one-model way to dub videos so that the new voice and the speaker's lip movements stay naturally in sync.
Qwen3-TTS is a family of text-to-speech models that can speak 10+ languages, clone a new voice from just 3 seconds of audio, and follow detailed style instructions in real time.