Youtu-VL is a new kind of vision-language model that learns to predict both words and tiny image pieces, not just words.
This paper introduces YaPO, a way to gently nudge a language model's hidden thoughts so it behaves better without retraining it.