This paper shows a simple way to make image-generating AIs (diffusion Transformers) produce clearer, more accurate pictures by letting the model guide itself from the inside.
WorldWarp is a new method that turns a single photo plus a planned camera path into a long, steady, 3D-consistent video.
This paper shows that strong image-understanding features alone are not enough to generate great images; you also need strong pixel-level detail.
This paper shows that big, powerful vision features can be turned into a small, easy-to-use latent code for image generation with just one attention layer.