Papers (2)

#Gating Mechanism

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

Intermediate

Shoubin Yu, Yue Zhang et al. · Feb 9 · arXiv

Visual spatial reasoning often fails when a model sees only a single image and must imagine how the scene looks from new viewpoints.

#Adaptive Test-Time Scaling #World Models #Visual Spatial Reasoning

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

Intermediate

Jing Zuo, Lingzhou Mu et al. · Jan 20 · arXiv

FantasyVLN trains a robot to follow language instructions while observing its surroundings, using step-by-step (chain-of-thought) reasoning during training but not at test time.

#Vision-and-Language Navigation #Chain-of-Thought #Multimodal CoT