Ministral 3 is a new family of small-but-mighty AI language models (3B, 8B, 14B) that learn from a larger model using a step-by-step tutoring method called Cascade Distillation.
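The summary above describes the students learning from a teacher model; the cascade specifics aren't spelled out here, but the core of any such distillation step can be sketched as a soft-label loss where the small model matches the large model's smoothed output distribution (function and temperature names below are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: the student matches the teacher's
    temperature-smoothed output distribution via KL divergence."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitude stays comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 positions over a 10-token vocabulary.
teacher = torch.randn(4, 10)
student = torch.randn(4, 10)
loss = distillation_loss(student, teacher)
```

A higher temperature softens the teacher's distribution, exposing more of its "dark knowledge" about which wrong answers are almost right.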
This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch's embedding in an image sequence, just like language models predict the next word.
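The next-patch idea can be sketched in a few lines: treat an image as a sequence of patch embeddings and train a causally masked model to regress each next embedding from the patches before it. This is a minimal stand-in (module choice and dimensions are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Stand-in patch embeddings: batch of 2 images, 16 patches each, 64-dim.
patch_dim, seq_len, batch = 64, 16, 2
patches = torch.randn(batch, seq_len, patch_dim)

# A single causally masked transformer layer as the predictor.
predictor = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=4, batch_first=True)
causal_mask = torch.triu(torch.ones(seq_len - 1, seq_len - 1), diagonal=1).bool()

# Predict patch t+1 from patches 0..t, mirroring next-token prediction in LMs,
# but with a regression loss on embeddings instead of a softmax over a vocabulary.
pred = predictor(patches[:, :-1], src_mask=causal_mask)
loss = nn.functional.mse_loss(pred, patches[:, 1:])
```

Predicting a continuous embedding rather than a discrete token is the key difference from language modeling, which is why a regression loss replaces cross-entropy here.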
Large language models usually line words up in fixed position slots, which can waste model capacity and make it harder to pick out the important parts of a long or noisy text.