Qwen3-ASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, and they can also tell you when each word was spoken.
HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.