How Much 3D Do Video Foundation Models Encode?
Intermediate | Zixuan Huang, Xiang Li et al. | Dec 23 | arXiv
This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?
#video foundation models, #3D awareness, #temporal reasoning