RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.
WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.
ShapeR builds clean, correctly sized 3D objects from messy, casual phone or glasses videos by using images, camera poses, sparse SLAM points, and short text captions together.
This paper shows how to make long, camera-controlled videos much faster by generating only a few smart keyframes with diffusion, then filling in the rest using a 3D scene and rendering.
GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.
Pixels are the raw stuff of images, and this paper shows you can learn great vision skills by predicting pixels directly, not by comparing fancy hidden features.