Papers (7)

#ControlNet

WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

Yisu Zhang, Chenjie Cao et al. · Mar 2 · arXiv

WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.

#video diffusion models #camera control #3D reconstruction

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Intermediate

Linxi Xie, Lisong C. Sun et al. · Feb 20 · arXiv

This paper builds a "generated reality" system that lets AI-made videos react to your real head and hand movements in VR.

#generated reality #hand pose conditioning #video diffusion transformer

Efficient Text-Guided Convolutional Adapter for the Diffusion Model

Intermediate

Aryan Das, Koushik Biswas et al. · Feb 16 · arXiv

This paper introduces Nexus Adapters, tiny helper networks that let a diffusion model follow both a text prompt and a structure map (like edges or depth) at the same time.

#Nexus Adapter #text-guided adapter #cross-attention

Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

Beginner

Nate Gillman, Yinghua Zhou et al. · Jan 9 · arXiv

Goal Force lets you tell a video model what physical result you want (like “make this ball move left with a strong push”), instead of relying only on vague text or a final picture.

#goal force #force vector control #visual planning
