Papers (4)

#RoPE

Voxtral Realtime

Alexander H. Liu, Andy Ehrenberg et al. · Feb 11 · arXiv

Voxtral Realtime is a speech-to-text model that types what you say almost instantly, while keeping accuracy close to the best offline systems.

#streaming ASR, #real-time transcription, #causal audio encoder


CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

Beginner

Haoran Li, Sucheng Ren et al. · Feb 5 · arXiv

The paper introduces CoPE (clipped RoPE), a simple change to how models encode token positions that helps them handle long contexts.

#CoPE, #RoPE, #Rotary Positional Embedding


Ministral 3

Beginner

Alexander H. Liu, Kartik Khandelwal et al. · Jan 13 · arXiv

Ministral 3 is a new family of small-but-mighty AI language models (3B, 8B, 14B) that learn from a larger model using a step-by-step tutoring method called Cascade Distillation.

#Cascade Distillation, #Model pruning, #Logit distillation


Next-Embedding Prediction Makes Strong Vision Learners

Beginner

Sihan Xu, Ziqiao Ma et al. · Dec 18 · arXiv

This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch’s embedding in an image sequence, just like language models predict the next word.

#self-supervised learning, #vision transformer, #autoregression
