FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
Intermediate · Jing Zuo, Lingzhou Mu et al. · Jan 20 · arXiv
FantasyVLN trains a navigation agent to follow natural-language instructions from what it sees, using multimodal chain-of-thought (step-by-step) reasoning during training but not at test time.
#Vision-and-Language Navigation #Chain-of-Thought #Multimodal CoT