The paper introduces VGGT-Det, a new way to detect 3D objects indoors from many photos without needing sensor-provided camera poses or depth maps.
DreamWorld is a new way to generate videos that not only look realistic but also follow common-sense rules about motion, spatial layout, and meaning.
V-DPM is a new method for turning a short video into a dynamic 3D scene, capturing both the shape and the motion of everything in it.
This paper trains a single AI model to understand both which way an object is facing (orientation) and how it turns between views (rotation).
InfiniteVGGT is a streaming 3D vision system that can run on live video indefinitely without exhausting its memory.
This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?