Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.
Qwen3โASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, plus they can tell you when each word was spoken.