Utonia is a single encoder (one shared "brain") that learns from many kinds of 3D point clouds: indoor rooms, outdoor streets, tiny toys, and even city maps.
WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.
The paper tackles a big blind spot in vision-language models: understanding how objects move and relate in 3D over time (dynamic spatial reasoning, or DSR).
CRISP turns an ordinary phone video of a person into a clean 3D world and a virtual human that can move through it without violating physics.
D4RT is a new AI model that turns regular videos into moving 3D scenes (4D) quickly and accurately.