How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1)

#multimodal RL

Thinking with Images via Self-Calling Agent

Intermediate
Wenxi Yang, Yuzhong Zhao et al. | Dec 9 | arXiv

This paper teaches a vision-language model to think about images by talking to copies of itself, using only words to plan and decide.

#Self-Calling Chain-of-Thought #sCoT #interleaved multimodal chain-of-thought