πŸŽ“How I Study AIHISA
πŸ“–Read
πŸ“„PapersπŸ“°Blogs🎬Courses
πŸ’‘Learn
πŸ›€οΈPathsπŸ“šTopicsπŸ’‘Concepts🎴Shorts
🎯Practice
πŸ“Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers2

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#ADE20K

Locality-Attending Vision Transformer

Intermediate
Sina Hajimiri, Farzad Beizaee et al.Mar 5arXiv

Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).

#Vision Transformer#self-attention#segmentation

Next-Embedding Prediction Makes Strong Vision Learners

Beginner
Sihan Xu, Ziqiao Ma et al.Dec 18arXiv

This paper introduces NEPA, a very simple way to teach vision models by having them predict the next patch’s embedding in an image sequence, just like language models predict the next word.

#self-supervised learning#vision transformer#autoregression