This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.
DiffThinker turns hard picture-based puzzles into an image-to-image drawing task instead of a long texting task.
This paper shows that we can turn big, smart vision features into a small, easy-to-use code for image generation with just one attention layer.
This paper shows a new way to teach an autoencoder to shape its hidden space (the 'latent space') to look like any distribution we want, not just a simple bell curve.