Metric Anything is a new way to teach AI real, ruler-like distances (metric depth) from very mixed and noisy 3D data.
This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patchβs embedding in an image sequence, just like language models predict the next word.