How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#patch tokens

Locality-Attending Vision Transformer

Intermediate
Sina Hajimiri, Farzad Beizaee et al. | Mar 5 | arXiv

Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).

#Vision Transformer #self-attention #segmentation

What matters for Representation Alignment: Global Information or Spatial Structure?

Intermediate
Jaskirat Singh, Xingjian Leng et al. | Dec 11 | arXiv

This paper asks whether generation training benefits more from an encoder's big-picture meaning (global semantics) or from how features are arranged across space (spatial structure).

#representation alignment #REPA #iREPA