Locality-Attending Vision Transformer
Intermediate · Sina Hajimiri, Farzad Beizaee et al. · Mar 5 · arXiv
Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).
#Vision Transformer #self-attention #segmentation