The Sphere Encoder is a new way to make images fast by teaching an autoencoder to place all images evenly on a big imaginary sphere and then decode random spots on that sphere back into pictures.
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
This paper shows a simple way to make image-generating AIs (diffusion Transformers) produce clearer, more accurate pictures by letting the model guide itself from the inside.