Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
Intermediate · Shengchao Zhou, Yuxin Chen et al. · Dec 23 · arXiv
The paper tackles a major blind spot in vision-language models: understanding how objects move and relate to one another in 3D space over time, a capability known as dynamic spatial reasoning (DSR).
#dynamic spatial reasoning · #vision-language models · #4D understanding