Forest Before Trees: Latent Superposition for Efficient Visual Reasoning
Intermediate · Yubo Wang, Juntian Zhang et al. · Jan 11 · arXiv
This paper introduces Laser, a new way for vision-language models to think in their hidden space before speaking, so they see the whole “forest” before picking out the “trees.”
#Latent reasoning, #Dynamic Windowed Alignment Learning, #Dynamic Semantic Windows