InfiniteVGGT is a streaming 3D vision system that can keep processing live video indefinitely without running out of memory.
This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?
RadarGen is a tool that learns to generate realistic car radar point clouds just from multiple camera views.
This paper teaches a video-understanding AI to think in 3D plus time (4D) so it can answer questions about specific objects moving in videos.
This paper teaches a vision-language model to first find objects in real 3D space (not just 2D pictures) and then reason about where things are.
Pixels are the raw material of images, and this paper shows that a model can learn strong vision skills by predicting pixels directly rather than by comparing hidden features.
D4RT is a new AI model that turns regular videos into moving 3D scenes (4D) quickly and accurately.