Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.
VideoSSM is a new way to make long, stable, and lively videos by giving the model two kinds of memory: a short-term window and a long-term state-space memory.