Papers (3)

#speaker diarization

VIBEVOICE-ASR Technical Report

Zhiliang Peng, Jianwei Yu et al. · Jan 26 · arXiv

VIBEVOICE-ASR is a single-pass system that listens to up to 60 minutes of audio at once and outputs who spoke, when they spoke, and what they said in one stream.

#long-form ASR #speaker diarization #timestamping


End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions

Intermediate

Anfeng Xu, Tiantian Feng et al. · Jan 25 · arXiv

This paper builds a single end-to-end system that listens to child–adult conversations and writes down what was said, whether the child or the adult said it, and when each person spoke.

#end-to-end ASR #speaker diarization #child speech


MOSS Transcribe Diarize Technical Report

Beginner

MOSI.AI et al. · Jan 4 · arXiv

This paper introduces MOSS Transcribe Diarize, a single model that writes down what people say in a conversation, tells who said each part, and marks the exact times, all in one pass.

#speaker diarization #speech recognition #end-to-end SATS
