Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
Intermediate | Yubo Wang, Juntian Zhang et al. | Jan 11 | arXiv
This paper introduces Laser, a method that lets vision-language models reason in their latent (hidden) space before generating text, so they take in the whole “forest” before picking out the “trees.”
#Latent reasoning, #Dynamic Windowed Alignment Learning, #Dynamic Semantic Windows