Ex-Omni is a new open-source AI system that can understand text or speech and then talk back while moving a 3D face in sync with the voice.
Qwen3-TTS is a family of text-to-speech models that can talk in 10+ languages, clone a new voice from just 3 seconds, and follow detailed style instructions in real time.
The paper shows how to control accents in text-to-speech (TTS) by mixing simple, linguistics-based sound-change rules with speaker embeddings.
Digital humans used to just copy motions; this paper makes them think, speak, and move in sync like real people.