Papers (2)



#vision transformer

MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources

Intermediate

Baorui Ma, Jiahui Yang et al. · Jan 29 · arXiv

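MetricAnything is a way to pretrain models for metric depth estimation, that is, predicting real, ruler-like distances rather than only relative depth, by learning from a large mix of heterogeneous and noisy 3D data sources.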

#metric depth estimation #sparse metric prompt #monocular depth


Next-Embedding Prediction Makes Strong Vision Learners

Beginner

Sihan Xu, Ziqiao Ma et al. · Dec 18 · arXiv

This paper introduces NEPA, a simple way to train vision models: the image is treated as a sequence of patches, and the model learns to predict the embedding of the next patch from the patches before it, much as a language model predicts the next word.

#self-supervised learning #vision transformer #autoregression
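To make the next-embedding idea behind NEPA concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The linear `embed` function, the running-mean `causal_predict` (a hypothetical stand-in for a causal transformer), and the negative-cosine loss are all illustrative assumptions. What the sketch does show is the autoregressive structure: each prediction uses only patches up to position t and is scored against the embedding at position t + 1.

```python
import math
from typing import List

Vector = List[float]


def embed(patch: Vector, weights: List[Vector]) -> Vector:
    # Hypothetical linear patch embedding: one output dimension per weight row.
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity with a small epsilon to avoid division by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def causal_predict(prefix: List[Vector]) -> Vector:
    # Placeholder for a causal model: it sees only embeddings at positions <= t.
    # Here it simply returns their mean (an assumption, not NEPA's architecture).
    dim = len(prefix[0])
    return [sum(e[d] for e in prefix) / len(prefix) for d in range(dim)]


def next_embedding_loss(patches: List[Vector], weights: List[Vector]) -> float:
    # Average (1 - cosine) between each predicted and actual next-patch embedding.
    if len(patches) < 2:
        raise ValueError("need at least two patches to predict a next embedding")
    embs = [embed(p, weights) for p in patches]
    losses = []
    for t in range(len(embs) - 1):
        pred = causal_predict(embs[: t + 1])
        target = embs[t + 1]  # a real trainer would typically stop gradients here
        losses.append(1.0 - cosine(pred, target))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Two orthogonal patches: the prediction from the first misses the second entirely.
    print(next_embedding_loss([[1.0, 0.0], [0.0, 1.0]], identity))  # 1.0
```

A repetitive sequence of identical patches drives the loss toward 0, while an unpredictable next patch pushes it toward 1; swapping `causal_predict` for a masked-attention transformer would be the natural next step.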
