DINO-SAE is a new autoencoder that keeps both the meaning of an image (semantics) and tiny textures (fine details) at the same time.
This paper tackles dataset distillation by giving a clear, math-backed way to keep only the most useful bits of data, so models can learn well from far fewer images.
RecTok is a new visual tokenizer that teaches the whole training path of a diffusion model (the forward flow) to be smart about image meaning, not just the starting latent features.