Papers (9)

#InfoNCE

DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval

DARE is a new way for AI assistants to find the right R functions by also looking at what the data looks like, not just the words in the question.

#distribution-aware retrieval #RPKB #RCodingAgent


InfoNCE Induces Gaussian Distribution

Intermediate

Roy Betser, Eyal Gofer et al. · Feb 27 · arXiv

The paper shows that when we train with the popular InfoNCE contrastive loss, the learned features start to behave like they come from a Gaussian (bell-shaped) distribution.

#InfoNCE #contrastive learning #Gaussian embeddings
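
For readers new to the loss, here is a minimal sketch of InfoNCE in its common generic form, not the paper's own setup. It assumes `keys[i]` is the positive for `queries[i]` and that every other key in the batch acts as a negative. The function name `info_nce` and the default `temperature` are illustrative choices.

```python
import math


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss; keys[i] is assumed to be the positive for queries[i]."""

    def normalize(vec):
        # Scale to unit length so dot products are cosine similarities.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]

    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of query i to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over positives and negatives.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the positive key.
        total += log_denominator - logits[i]
    return total / len(q)
```

The loss is small when each query is most similar to its own key and grows when positives are mismatched, which is the pressure that shapes the learned embeddings.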


How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning

Intermediate

Jiahao Yuan, Yike Xu et al. · Feb 11 · arXiv

Decoder-only language models can be great at building user profiles (embeddings), but the attention mask, which controls which parts of the sequence each token is allowed to see, changes how good those profiles are.

#decoder-only LLM #attention masking #causal attention


Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

Intermediate

Xiaomin Yu, Yi Xin et al. · Feb 2 · arXiv

This paper finds a precise way to describe and fix the Modality Gap, which is when image and text features that mean the same thing still sit in different regions of the model’s embedding space.

#Modality Gap #Multimodal Large Language Models #Contrastive Learning


Do Reasoning Models Enhance Embedding Models?

Intermediate

Wun Yu Chan, Shaojin Chen et al. · Jan 29 · arXiv

The paper asks a simple question: if a language model becomes better at step-by-step reasoning (using RLVR), do its text embeddings also get better? The short answer is no.

#text embeddings #RLVR #contrastive learning


CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

Intermediate

Tsung-Hsiang Chou, Chen-Jui Yu et al. · Jan 22 · arXiv

This paper introduces CGPT, a way to help computers find the right tables by building smarter mini-versions of tables and training with tough practice questions.

#table retrieval #synthetic query generation #cluster-guided partial tables


Action100M: A Large-scale Video Action Dataset

Intermediate

Delong Chen, Tejaswi Kasarla et al. · Jan 15 · arXiv

Action100M is a gigantic video dataset with about 100 million labeled action moments built automatically from 1.2 million instructional videos.

#Action100M #open-vocabulary action recognition #hierarchical temporal segmentation


CPPO: Contrastive Perception for Vision Language Policy Optimization

Intermediate

Ahmad Rezaei, Mohsen Gholami et al. · Jan 1 · arXiv

CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.

#CPPO #Contrastive Perception Loss #Vision-Language Models


Relational Visual Similarity

Intermediate

Thao Nguyen, Sicheng Mo et al. · Dec 8 · arXiv

Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.
