SAMTok: Representing Any Mask with Two Words
Intermediate | Yikang Zhou, Tao Zhang et al. | Jan 22 | arXiv
SAMTok turns any object's mask in an image into just two special "words", so language models can handle pixels like they handle text.
#SAMTok #mask tokenizer #residual vector quantization
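
To make the "two words" idea concrete, here is a minimal sketch of residual vector quantization with two codebooks, where each stage contributes one discrete index. This is an illustration only, not SAMTok's implementation: the summary does not say that the two words map to two quantization stages, so that mapping is an assumption. The mask embedding `z`, the codebook sizes, and the `<mask_s_k>` token format are likewise hypothetical.

```python
import numpy as np


def rvq_encode(z, codebooks):
    """Greedy residual VQ: pick one index per codebook, stage by stage."""
    residual = np.asarray(z, dtype=np.float64)
    indices = []
    for codebook in codebooks:
        # Squared distance from the current residual to every code.
        distances = np.sum((codebook - residual) ** 2, axis=1)
        k = int(np.argmin(distances))
        indices.append(k)
        # The next stage quantizes whatever this stage failed to explain.
        residual = residual - codebook[k]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the embedding as the sum of the selected codes."""
    return sum(codebook[k] for codebook, k in zip(codebooks, indices))


rng = np.random.default_rng(0)
dim, size = 8, 16  # hypothetical embedding width and codebook size

# Two random codebooks stand in for learned ones (assumption).
codebooks = [
    rng.normal(size=(size, dim)),
    0.1 * rng.normal(size=(size, dim)),
]

# Stand-in for a mask embedding produced by some mask encoder (assumption).
z = rng.normal(size=dim)

tokens = rvq_encode(z, codebooks)
z_hat = rvq_decode(tokens, codebooks)

# Hypothetical special-token spelling for the two indices.
words = [f"<mask_{stage}_{k}>" for stage, k in enumerate(tokens)]
print(words)
```

With two codebooks, every embedding maps to exactly two indices, which a language model could treat as two extra vocabulary entries. Decoding sums the two selected codes to give an approximate embedding.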