Papers (5)

#Contrastive Learning

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Penguin-VL shows that small vision-language models (2B and 8B parameters) can perform strongly when given a better vision encoder, not just a larger model.

#Vision Language Model #LLM-based Vision Encoder #Contrastive Learning

OpenAutoNLU: Open Source AutoML Library for NLU

Beginner

Grigory Arshinov, Aleksandr Boriskin et al. · Mar 2 · arXiv

OpenAutoNLU is a simple, open-source tool that automatically builds text understanding models for you.

#AutoML #Natural Language Understanding #Text Classification

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Intermediate

Xiaomin Yu, Yi Xin et al. · Feb 2 · arXiv

This paper gives a precise description of the Modality Gap, where image and text features that mean the same thing still sit in different regions of the model’s embedding space, and proposes a way to fix it.

#Modality Gap #Multimodal Large Language Models #Contrastive Learning

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Intermediate

Letian Zhang, Sucheng Ren et al. · Jan 21 · arXiv

OpenVision 3 is a single vision encoder that learns one set of image tokens that work well for both understanding images (like answering questions) and generating images (like making new pictures).

#Unified Visual Encoder #VAE #Vision Transformer

Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

Intermediate

Yuqing Li, Jiangnan Li et al. · Dec 19 · arXiv

Humans keep a big-picture memory (a “mindscape”) when reading long texts; this paper teaches AI to do the same.

#Retrieval-Augmented Generation #Mindscape #Hierarchical Summarization
