This paper teaches AI to pay attention better by training where it focuses, not just what it says.
The paper teaches vision-language models (AIs that look at pictures and read text) to pay attention to the right parts of a picture, without needing extra tools while they are answering.