Towards Scalable Pre-training of Visual Tokenizers for Generation
Intermediate · Jingfeng Yao, Yuda Song et al. · Dec 15 · arXiv
The paper tackles a paradox: visual tokenizers that achieve high-fidelity pixel reconstruction often produce worse images when used for generation.
#visual tokenizer #latent space #Vision Transformer