๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers1

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Image-Text Vocabulary

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Intermediate
Zhixiang Wei, Yi Li et al.Jan 27arXiv

Youtu-VL is a new kind of vision-language model that learns to predict both words and tiny image pieces, not just words.

#Vision-Language Models#Unified Autoregressive Supervision#Visual Tokenization