The paper introduces CoPE, a simple change to how models track word positions that makes long documents much easier for them to understand.
Ministral 3 is a new family of small-but-mighty AI language models (3B, 8B, 14B) that learn from a larger model using a step-by-step tutoring method called Cascade Distillation.
This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patchβs embedding in an image sequence, just like language models predict the next word.