WorldStereo is a method that turns a single photo (or a panorama) into a short set of camera-guided videos and then reconstructs a consistent 3D scene from them.
This paper builds a "generated reality" system that lets AI-made videos react to your real head and hand movements in VR.
This paper introduces Nexus Adapters, tiny helper networks that let a diffusion model follow both a text prompt and a structure map (like edges or depth) at the same time.
Goal Force lets you tell a video model what physical result you want (like “make this ball move left with a strong push”) instead of relying on vague text or a final picture.
Spatia is a video generator that keeps a live 3D map of the scene (a point cloud) as its memory while making videos.
Steer3D lets you change a 3D object just by typing what you want, like “add a roof rack,” and it applies the edit in one quick pass.
LongVie 2 is a video world model that can generate controllable videos 3–5 minutes long while keeping the look and motion steady over time.