The paper introduces LT-Tuning, a way for AI models to “think silently” using special hidden tokens instead of writing every step out loud.
Large Multimodal Models (LMMs) handle both text and images well, but they usually do most of their reasoning in words, which limits how deeply they can reason about what they see.