DARE is a new way for AI assistants to find the right R functions by considering what the data looks like, not just the words in the question.
The paper asks a simple question: what must a vision model’s internal pictures (embeddings) look like if it can recognize new mixes of things it already knows?
The paper asks a simple question: do the model’s invisible “imagination tokens” actually help it reason about images?
Searching through videos, images, and long documents is powerful but gets very expensive when every tiny piece is stored separately.
Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.
Sparse autoencoders (SAEs) are popular for explaining what large language models are doing, but this paper shows they often don’t learn real, meaningful features.
VidVec shows that video-capable multimodal language models already hide strong matching signals between videos and sentences inside their middle layers.
This paper builds a Google-for-theorems: a semantic search engine that finds exact theorems, lemmas, and propositions instead of just entire papers.
LatentLens is a simple, training-free way to translate what a model “sees” in image patches into clear words and phrases.
IVRA is a simple, training-free add-on that helps robot brains keep the 2D shape of pictures while following language instructions.
This paper introduces CGPT, a way to help computers find the right tables by building smarter mini-versions of tables and training with tough practice questions.
InfiniteVGGT is a streaming 3D vision system that can run indefinitely on live video without running out of memory.