SAMTok: Representing Any Mask with Two Words
Intermediate | Yikang Zhou, Tao Zhang et al. | Jan 22 | arXiv
SAMTok turns any object's mask in an image into just two special "words" so language models can handle pixels like they handle text.
#SAMTok #mask tokenizer #residual vector quantization