See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
Intermediate · Shuoshuo Zhang, Yizhen Zhang et al. · Dec 26 · arXiv
The paper teaches vision-language models (AIs that look and read) to focus on the relevant parts of an image, without needing extra tools when answering.
#BiPS #perceptual shaping #vision-language models