This paper teaches multimodal AI models not just to read pictures, but also to imagine and think with pictures inside their heads.
FIGR is a new way for AI to "think by drawing," using code to build clean, editable diagrams while it reasons.
The paper teaches vision-language models (AIs that both look and read) to focus on the right parts of a picture without needing extra tools when answering.