Papers (2)

#language identification

Qwen3-ASR Technical Report

Qwen3-ASR is a family of speech models that transcribe speech in 52 languages and dialects and can also tell you when each word was spoken.

#ASR #forced alignment #timestamps


PRiSM: Benchmarking Phone Realization in Speech Models

Beginner

Shikhar Bharadwaj, Chin-Jou Li et al. · Jan 20 · arXiv

PRiSM is a new open-source benchmark that checks how well speech models hear and write down tiny speech sounds called phones.

#phone recognition #phonetic transcription #PFER
