Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.
This paper teaches a vision-language model to first find objects in real 3D space (not just 2D pictures) and then reason about where things are.