Papers7

#3D reconstruction

RealWonder: Real-Time Physical Action-Conditioned Video Generation

RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.

#action-conditioned video generation#physics simulation#optical flow

Not triaged yet

WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

Intermediate

Yisu Zhang, Chenjie Cao et al.Mar 2arXiv

WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.

#video diffusion models#camera control#3D reconstruction

Not triaged yet

Think3D: Thinking with Space for Spatial Reasoning

Beginner

Zaibin Zhang, Yuhan Wu et al.Jan 19arXiv

Think3D lets AI models stop guessing from flat pictures and start exploring real 3D space, like walking around a room in a video game.

#Think3D#spatial reasoning#3D reconstruction

Not triaged yet

ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

Intermediate

Yawar Siddiqui, Duncan Frost et al.Jan 16arXiv

ShapeR builds clean, correctly sized 3D objects from messy, casual phone or glasses videos by using images, camera poses, sparse SLAM points, and short text captions together.

#ShapeR#3D reconstruction#object-centric

Not triaged yet

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

Intermediate

Jieying Chen, Jeffrey Hu et al.Jan 14arXiv

This paper shows how to make long, camera-controlled videos much faster by generating only a few smart keyframes with diffusion, then filling in the rest using a 3D scene and rendering.

#camera-controlled video generation#sparse keyframes#3D reconstruction

Not triaged yet

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Intermediate

Yi-Chuan Huang, Hao-Jen Chien et al.Dec 31arXiv

GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.

#3D reconstruction#outpainting#multi-view diffusion

Not triaged yet

In Pursuit of Pixel Supervision for Visual Pre-training

Intermediate

Lihe Yang, Shang-Wen Li et al.Dec 17arXiv

Pixels are the raw stuff of images, and this paper shows you can learn great vision skills by predicting pixels directly, not by comparing fancy hidden features.

#pixel supervision#masked autoencoders#MAE redesign

Not triaged yet