VidEoMT: Your ViT is Secretly Also a Video Segmentation Model
Intermediate · Narges Norouzi, Idil Esen Zulfikar et al. · Feb 19 · arXiv
VidEoMT shows that a single, well-trained Vision Transformer (ViT) can segment and track objects in videos without dedicated tracking modules.
#Video Segmentation #Vision Transformer #Encoder-only