How I Study AI - Learn AI Papers & Lectures the Easy Way

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Intermediate

Xiaomin Yu, Yi Xin et al. · Feb 2 · arXiv

This paper gives a precise description of the Modality Gap, the tendency for image and text features that mean the same thing to still sit in different regions of the model's shared embedding space, and proposes a subspace-alignment training method to close it.

#Modality Gap · #Multimodal Large Language Models · #Contrastive Learning
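To make "sit in different regions" concrete, one common way to quantify the gap, assumed here rather than taken from this paper, is the distance between the centroid of the L2-normalized image embeddings and the centroid of the L2-normalized text embeddings. The function names and toy vectors below are hypothetical; this is a sketch of the measurement, not the paper's alignment method.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm, as is typical for contrastive embeddings."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def modality_gap(image_embs, text_embs):
    """Return (gap_vector, gap_norm): the difference between the centroids of
    normalized image and text embeddings, and its Euclidean length."""
    img_c = centroid([normalize(v) for v in image_embs])
    txt_c = centroid([normalize(v) for v in text_embs])
    gap = [a - b for a, b in zip(img_c, txt_c)]
    return gap, math.sqrt(sum(g * g for g in gap))

# Hypothetical 2-D embeddings: each image/text pair agrees on the y-axis
# (the shared meaning), but images lean toward +x and texts toward -x.
images = [[1.0, 1.0], [1.0, -1.0]]
texts = [[-1.0, 1.0], [-1.0, -1.0]]
gap, size = modality_gap(images, texts)
print(gap, round(size, 4))
```

The whole gap lies along the x-axis, a direction that separates the two modalities without carrying meaning; a nonzero centroid distance like this is the symptom an alignment method would aim to shrink.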

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Intermediate

Zhixiang Wei, Yi Li et al. · Jan 27 · arXiv

Youtu-VL is a vision-language model trained to predict both text tokens and visual tokens (small pieces of the image), rather than text tokens alone.

#Vision-Language Models · #Unified Autoregressive Supervision · #Visual Tokenization
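The difference between predicting "only words" and "words and image pieces" can be sketched as a choice of which positions receive next-token loss. In the minimal sketch below, which assumes a single shared vocabulary and uses hypothetical token IDs, logits, and an `is_visual` flag, a text-only setup skips visual targets, while unified supervision also scores them. It illustrates the general idea only and is not Youtu-VL's actual objective or tokenizer.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target index (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def next_token_loss(logits_seq, targets, is_visual, supervise_visual):
    """Average loss over a mixed text/image sequence.

    logits_seq[i] is the model's prediction for targets[i];
    is_visual[i] marks targets that are visual tokens.
    """
    total, count = 0.0, 0
    for logits, tgt, vis in zip(logits_seq, targets, is_visual):
        if vis and not supervise_visual:
            continue  # text-only training: image tokens are context, not targets
        total += cross_entropy(logits, tgt)
        count += 1
    return total / count

# Hypothetical 4-token vocabulary: IDs 0-1 are text, IDs 2-3 are visual.
logits_seq = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
targets = [0, 2, 3]
is_visual = [False, True, True]

text_only = next_token_loss(logits_seq, targets, is_visual, supervise_visual=False)
unified = next_token_loss(logits_seq, targets, is_visual, supervise_visual=True)
print(text_only, unified)
```

With `supervise_visual=True`, gradients would also flow from how well the model predicts image tokens, which is the extra learning signal that unified vision-language supervision refers to.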

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

Intermediate

Abdelaziz Bounhar, Rania Hossam Elmohamady Elbadry et al. · Jan 13 · arXiv

This paper introduces YaPO, a way to nudge a language model's hidden activations with learnable sparse steering vectors so it adapts to a target domain without retraining the model itself.

#Activation Steering · #Sparse Autoencoder · #Preference Optimization
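As a rough picture of "nudging hidden activations", activation steering adds a scaled direction to a layer's hidden state, h' = h + alpha * v. The sketch below assumes v is built from a sparse code over a sparse-autoencoder decoder, so only a few decoder rows contribute. The decoder, code, and `alpha` are hypothetical hand-set values; how YaPO actually learns the sparse vector is not shown.

```python
def steering_vector(sparse_code, decoder):
    """Map a sparse latent code to a dense direction: a weighted sum of active decoder rows."""
    dim = len(decoder[0])
    v = [0.0] * dim
    for i, coeff in enumerate(sparse_code):
        if coeff != 0.0:  # only active latents contribute
            for j in range(dim):
                v[j] += coeff * decoder[i][j]
    return v

def steer(hidden, sparse_code, decoder, alpha=1.0):
    """Return hidden + alpha * steering vector; model weights are left untouched."""
    v = steering_vector(sparse_code, decoder)
    return [h + alpha * x for h, x in zip(hidden, v)]

# Hypothetical decoder with 4 latents over a 3-dimensional hidden space.
decoder = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [0.5, 0.5, 0.0]]
hidden = [0.2, -0.1, 0.4]
code = [0.0, 0.0, 0.0, 2.0]  # sparse: only latent 3 is active
steered = steer(hidden, code, decoder, alpha=0.5)
print(steered)
```

In a real model, a step like `steer` would typically run inside a hook on one layer's output (for example, a PyTorch forward hook), so the base weights stay frozen and only the small steering vector changes behavior.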
