Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
Intermediate · Zhixiang Wei, Yi Li et al. · Jan 27 · arXiv
Youtu-VL is a vision-language model that learns to predict both words and small pieces of the image (visual tokens), not just words.
#Vision-Language Models #Unified Autoregressive Supervision #Visual Tokenization