Papers (14)

#cosine similarity

DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval

DARE is a new way for AI assistants to find the right R functions by also looking at what the data looks like, not just the words in the question.

#distribution-aware retrieval #RPKB #RCodingAgent

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models

Intermediate

Arnas Uselis, Andrea Dittadi et al. · Feb 27 · arXiv

The paper asks a simple question: what must a vision model’s internal pictures (embeddings) look like if it can recognize new mixes of things it already knows?

#compositional generalization #linear representation hypothesis #orthogonal representations

Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Beginner

You Li, Chi Chen et al. · Feb 26 · arXiv

The paper asks a simple question: do the model’s invisible “imagination tokens” actually help it reason about images?

#multimodal large language model #visual reasoning #latent visual reasoning

Multi-Vector Index Compression in Any Modality

Beginner

Hanxiang Qin, Alexander Martin et al. · Feb 24 · arXiv

Searching through videos, images, and long documents is powerful but gets very expensive when every tiny piece is stored separately.

#multi-vector retrieval #late interaction #index compression

Reinforced Fast Weights with Next-Sequence Prediction

Intermediate

Hee Seung Hwang, Xindi Wu et al. · Feb 18 · arXiv

Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.

#fast weight models #next-sequence prediction #reinforcement learning for LMs

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Intermediate

Anton Korznikov, Andrey Galichin et al. · Feb 15 · arXiv

Sparse autoencoders (SAEs) are popular for explaining what large language models are doing, but this paper shows they often don’t learn real, meaningful features.

#sparse autoencoders #interpretability #dictionary learning

VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval

Intermediate

Issar Tzachor, Dvir Samuel et al. · Feb 8 · arXiv

VidVec shows that video-capable multimodal language models already hide strong matching signals between videos and sentences inside their middle layers.

#video–text retrieval #multimodal large language models #intermediate layer embeddings

Semantic Search over 9 Million Mathematical Theorems

Intermediate

Luke Alexander, Eric Leonen et al. · Feb 5 · arXiv

This paper builds a Google-for-theorems: a semantic search engine that finds exact theorems, lemmas, and propositions instead of just entire papers.

#semantic theorem search #mathematical information retrieval #dense retrieval

LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs

Beginner

Benno Krojer, Shravan Nayak et al. · Jan 31 · arXiv

LatentLens is a simple, training-free way to translate what a model "sees" in image patches into clear words and phrases.

#LatentLens #visual tokens #contextual embeddings

IVRA: Improving Visual-Token Relations for Robot Action Policy with Training-Free Hint-Based Guidance

Beginner

Jongwoo Park, Kanchana Ranasinghe et al. · Jan 22 · arXiv

IVRA is a simple, training-free add-on that helps robot brains keep the 2D shape of pictures while following language instructions.

#Vision-Language-Action #affinity map #training-free guidance

CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval

Intermediate

Tsung-Hsiang Chou, Chen-Jui Yu et al. · Jan 22 · arXiv

This paper introduces CGPT, a way to help computers find the right tables by building smarter mini-versions of tables and training with tough practice questions.

#table retrieval #synthetic query generation #cluster-guided partial tables

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

Intermediate

Shuai Yuan, Yantai Yang et al. · Jan 5 · arXiv

InfiniteVGGT is a streaming 3D vision system that can keep working forever on live video without running out of memory.

#InfiniteVGGT #rolling memory #causal attention
