Big idea: use a small, already-trained model to help a bigger model learn good habits early, so the big one trains faster and ends up smarter.
This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patchβs embedding in an image sequence, just like language models predict the next word.