Proact-VL is an AI that talks about video and knows not only what to say but also when to say it, like a great sports commentator.
Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.
Big reasoning AIs think in many steps, which is slow and costly.
LiveTalk turns slow, many-step video diffusion into a fast, 4-step, real-time system for talking avatars that listen, think, and respond with synchronized video.
This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.
VideoSSM is a new way to make long, stable, and lively videos by giving the model two kinds of memory: a short-term window and a long-term state-space memory.
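The two-memory idea in the VideoSSM summary can be illustrated with a toy sketch. This is not the paper's architecture: the class name `HybridMemory`, the scalar `decay`, and the list-of-floats "frames" are all assumptions made for illustration. It only shows the general pattern of keeping a few recent items exactly while folding older ones into a fixed-size running state with a linear, state-space-style update.

```python
from collections import deque


class HybridMemory:
    """Toy two-part memory: an exact recent window plus a compressed running state.

    Hypothetical illustration only; a real model would use learned
    transition matrices and feature tensors, not a scalar and float lists.
    """

    def __init__(self, window_size=3, decay=0.9):
        # Short-term memory: the last `window_size` frames, kept exactly.
        self.window = deque(maxlen=window_size)
        # Long-term memory: a fixed-size summary of everything evicted so far.
        self.state = None
        # Assumed scalar standing in for a learned state transition.
        self.decay = decay

    def update(self, frame):
        # If the window is full, the oldest frame is about to be dropped,
        # so fold it into the long-term state first.
        if len(self.window) == self.window.maxlen:
            evicted = self.window[0]
            if self.state is None:
                self.state = [0.0] * len(evicted)
            # Linear recurrence: h <- decay * h + (1 - decay) * x
            self.state = [
                self.decay * h + (1.0 - self.decay) * x
                for h, x in zip(self.state, evicted)
            ]
        # deque(maxlen=...) discards the oldest entry automatically.
        self.window.append(frame)

    def context(self):
        # What a generator would condition on: recent frames plus the summary.
        return list(self.window), self.state


mem = HybridMemory(window_size=2, decay=0.5)
for value in [1.0, 2.0, 3.0, 4.0]:
    mem.update([value])
recent, summary = mem.context()
print(recent, summary)
```

The long-term state stays the same size no matter how many frames have passed, which is the property that makes this kind of memory attractive for long sequences; the window keeps fine detail for the most recent frames only.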