Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.
ShapeR builds clean, correctly sized 3D objects from messy, casual videos shot on phones or smart glasses by combining images, camera poses, sparse SLAM points, and short text captions.
This paper shows how to generate long, camera-controlled videos much faster by creating only a few well-chosen keyframes with diffusion, then filling in the rest by rendering from a 3D scene.
GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.
Pixels are the raw stuff of images, and this paper shows you can learn great vision skills by predicting pixels directly, not by comparing fancy hidden features.