Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Intermediate · Shengbang Tong, Boyang Zheng et al. · Jan 22 · arXiv
Before this work, most text-to-image models relied on VAEs, which compress images into small latent codes, and struggled with slow training and overfitting on high-quality fine-tuning sets.
#Representation Autoencoder #RAE #Variational Autoencoder