Qwen3-ASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, plus they can tell you when each word was spoken.
PRiSM is a new open-source benchmark that checks how well speech models hear and write down tiny speech sounds called phones.