DREAM is a single model that both understands images (like CLIP) and makes images from text (like top text-to-image models).
This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them pick the best of the answers they already know?
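To make the "smarter vs. better picking" distinction concrete, here is a minimal Python sketch, not from the paper: the 0.3 success rate and the base_model_answers helper are invented for illustration. If RL only improves selection, pass@1 climbs toward pass@k while pass@k itself stays flat; a genuinely smarter model raises pass@k too.

```python
import random

random.seed(0)

def base_model_answers(k):
    """Hypothetical stand-in: the base model samples k answers,
    each independently correct with probability 0.3."""
    return [random.random() < 0.3 for _ in range(k)]

def pass_at_k(trials, k):
    """Fraction of questions where at least one of k samples is correct."""
    return sum(any(base_model_answers(k)) for _ in range(trials)) / trials

# "Smarter" would mean pass@k rises: the model solves questions it could
# never solve before, even with many tries.
# "Better picking" means pass@1 rises toward pass@k: RL only surfaces an
# answer the model could already sample.
trials = 10_000
print("pass@1:", pass_at_k(trials, 1))  # ~0.30
print("pass@8:", pass_at_k(trials, 8))  # ~0.94, the ceiling of what it "knows"
```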
MAEB is a giant, fair report card for audio AI that tests 50+ models on 30 speech, music, environmental-sound, and audio–text tasks in 100+ languages.
Decoder-only language models can be great at making user profiles (embeddings), but how we let them look at the sequence (the attention mask) changes how smart those profiles are.
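A rough sketch of what that masking choice means, using toy tensors rather than any real model (the shapes and mean pooling here are illustrative assumptions): a causal mask lets each token see only its past, while a bidirectional mask lets every token see the whole sequence before the states are pooled into a profile.

```python
import torch

# Toy sketch: the "attention mask" here is the token-to-token visibility
# pattern inside self-attention, not the padding mask.
seq_len, dim = 6, 8
scores = torch.randn(seq_len, seq_len)  # raw attention scores
values = torch.randn(seq_len, dim)      # token value vectors

# Causal mask: token i may only attend to tokens <= i (standard decoder).
causal = torch.ones(seq_len, seq_len).tril().bool()
causal_attn = torch.softmax(scores.masked_fill(~causal, float("-inf")), dim=-1)

# Bidirectional mask: every token sees the whole sequence, so early tokens'
# representations can also reflect later behavior, often yielding richer profiles.
bidir_attn = torch.softmax(scores, dim=-1)

# Mean-pool the attended token states into a single profile embedding.
profile_causal = (causal_attn @ values).mean(dim=0)
profile_bidir = (bidir_attn @ values).mean(dim=0)
print(profile_causal.shape, profile_bidir.shape)  # torch.Size([8]) twice
```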
This paper builds a big, fair playground (a benchmark) to test many EEG foundation models side by side under the same rules.
This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.
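As an illustration of what "neatly separate object attributes" means in practice, here is a toy latent-traversal probe (the decode function is a made-up stand-in, not the paper's VAE): sweep one latent coordinate, hold the rest fixed, and check whether only one visual attribute changes.

```python
import torch

latent_dim = 16

def decode(z):
    """Hypothetical stand-in for a VAE decoder: z -> image."""
    torch.manual_seed(0)                  # fixed fake weights
    w = torch.randn(latent_dim, 3 * 32 * 32)
    return (z @ w).reshape(-1, 3, 32, 32)

z = torch.zeros(1, latent_dim)            # start at the latent origin
traversal = []
for v in torch.linspace(-3, 3, 7):        # sweep one coordinate
    z_step = z.clone()
    z_step[0, 5] = v                      # vary only dimension 5
    traversal.append(decode(z_step))

# If latents are semantically disentangled, these 7 decoded images differ in
# exactly one attribute; entangled latents change many attributes at once.
print(torch.cat(traversal).shape)         # torch.Size([7, 3, 32, 32])
```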
This paper asks whether generation training benefits more from an encoder’s big-picture meaning (global semantics) or from how features are arranged across space (spatial structure).
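One cheap way to see the contrast the question hinges on (a sketch with random tensors, not the paper's experiment): shuffling an encoder's patch grid destroys spatial structure but leaves the pooled global summary untouched, so a generator that still trains well on shuffled features is leaning on global semantics.

```python
import torch

# Toy ViT-style encoder output: a grid of patch features [num_patches, dim].
# "Global semantics" is the pooled summary; "spatial structure" is how those
# features are laid out on the grid.
num_patches, dim = 196, 768               # 14x14 grid, ViT-B width
feats = torch.randn(num_patches, dim)     # stand-in encoder output

global_vec = feats.mean(dim=0)            # global semantics: one vector

# Shuffling patches keeps the global summary (the mean is order-invariant)
# but destroys the spatial arrangement.
shuffled = feats[torch.randperm(num_patches)]
assert torch.allclose(shuffled.mean(dim=0), global_vec, atol=1e-5)
print(global_vec.shape)                   # torch.Size([768])
```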