This paper teaches multimodal AI models not just to read pictures, but also to imagine and think with pictures inside their heads.
FIGR is a new way for AI to "think by drawing," using code to build clean, editable diagrams while it reasons.
The paper teaches vision-language models (AIs that both look and read) to focus on the right parts of a picture without needing extra tools when answering.