This paper introduces Nexus Adapters, tiny helper networks that let a diffusion model follow both a text prompt and a structure map (like edges or depth) at the same time.
ShapeR builds clean, correctly sized 3D objects from messy, casual phone or glasses videos by using images, camera poses, sparse SLAM points, and short text captions together.