Towards Pixel-Level VLM Perception via Simple Points Prediction
Intermediate | Tianhui Song, Haoyu Lu et al. | Jan 27 | arXiv
SimpleSeg teaches a multimodal language model to outline objects by writing down a list of points, like connecting the dots, instead of using a special segmentation decoder.
#SimpleSeg  #multimodal large language model  #decoder-free segmentation
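
To make the "connecting the dots" idea concrete, here is a minimal sketch of how a textual list of points could be turned back into a pixel mask. The output format `"(x, y), (x, y), ..."` and the helper names `parse_points`, `point_in_polygon`, and `points_to_mask` are assumptions made for illustration, not SimpleSeg's actual format or code; the fill rule is standard even-odd ray casting.

```python
import re

def parse_points(text):
    """Parse a string like "(1, 1), (5, 1)" into a list of (x, y) floats.
    The textual format is an assumption, not the paper's actual format."""
    pairs = re.findall(
        r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)", text
    )
    return [(float(x), float(y)) for x, y in pairs]

def point_in_polygon(px, py, polygon):
    """Even-odd ray casting: count edge crossings of a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the outline
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def points_to_mask(text, width, height):
    """Rasterize the outline into a height x width grid of 0/1 values,
    testing each pixel at its center (x + 0.5, y + 0.5)."""
    polygon = parse_points(text)
    if len(polygon) < 3:  # fewer than three points cannot enclose an area
        return [[0] * width for _ in range(height)]
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, polygon) else 0
         for x in range(width)]
        for y in range(height)
    ]

# Hypothetical model output: a rectangle outlined by four corner points.
model_output = "(1, 1), (5, 1), (5, 4), (1, 4)"
mask = points_to_mask(model_output, width=6, height=5)
for row in mask:
    print("".join(str(v) for v in row))
```

The point of the sketch is that the mask is recovered by ordinary geometry on the text output, so no learned segmentation decoder is needed at this step.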