Qwen3‑ASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, and that can also tell you when each word was spoken.
This paper builds a single system that listens to child–adult conversations and writes down what was said, who said it, and exactly when each person spoke.