The paper discovers that popular RLVR methods for training language and vision-language models secretly prefer certain answer lengths, which can hurt learning.
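A minimal sketch of one way such a length preference can creep in, not the paper's specific finding: how a single sequence-level reward is spread over tokens changes how strongly long versus short answers are pushed. All names and numbers here are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): spreading one scalar reward over
# tokens by summing vs. averaging log-probs gives long and short answers very
# different per-token learning signals.
def per_token_weight(reward: float, length: int, mode: str) -> float:
    """Weight each token's log-prob gradient receives from one scalar reward."""
    if mode == "sum":    # reward applied to the summed log-prob: constant per token
        return reward
    if mode == "mean":   # reward applied to the mean log-prob: shrinks with length
        return reward / length
    raise ValueError(mode)

for length in (16, 256):
    for mode in ("sum", "mean"):
        w = per_token_weight(reward=1.0, length=length, mode=mode)
        print(f"length={length:4d} mode={mode:4s} per-token weight={w:.4f}")
# Under "mean", each token of a correct 256-token answer is pushed 16x less than
# each token of a correct 16-token answer, so shorter answers are implicitly favored.
```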
This paper builds a Google-for-theorems: a semantic search engine that finds exact theorems, lemmas, and propositions instead of just entire papers.
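To make the idea concrete, here is a minimal retrieval sketch, not the paper's system: index individual theorem statements instead of whole papers and rank them by embedding similarity. The library and model name are assumptions chosen for illustration.

```python
# Hedged sketch: semantic search at theorem granularity rather than paper granularity.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding library

statements = [
    "Theorem (Banach): every contraction on a complete metric space has a unique fixed point.",
    "Lemma: a bounded monotone sequence of real numbers converges.",
    "Proposition: the composition of two continuous functions is continuous.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")                 # assumed model choice
index = model.encode(statements, normalize_embeddings=True)     # (n, d), unit-norm rows

def search(query: str, k: int = 2):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                                          # cosine similarity
    return [(statements[i], float(scores[i])) for i in np.argsort(-scores)[:k]]

print(search("fixed point of a contraction mapping"))
```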
This paper builds SocialVeil, a testing world where AI chat agents must talk to each other even when their communication is messy rather than perfect.
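For flavor, a toy version of the idea, not SocialVeil's actual protocol: a noisy channel sits between two agents and corrupts messages before delivery, so the agents must cope with imperfect communication. The corruption rule here (randomly dropping words) is just an assumed example.

```python
# Hedged sketch of a "messy communication" channel between chat agents.
import random

def noisy_channel(message: str, drop_prob: float = 0.2, seed: int | None = None) -> str:
    """Randomly drop words from a message to simulate degraded communication."""
    rng = random.Random(seed)
    kept = [w for w in message.split() if rng.random() > drop_prob]
    return " ".join(kept) if kept else "[garbled]"

sent = "meet me at the north gate at noon and bring the map"
received = noisy_channel(sent, drop_prob=0.3, seed=0)
print("sent:    ", sent)
print("received:", received)
```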
Locas is a new kind of add-on memory for language models that learns during use but touches none of the model’s original weights.
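A minimal sketch of the general pattern under assumptions, not Locas itself: a small trainable key-value memory bolted onto a frozen base model, where only the memory's parameters receive updates during use and the original weights stay untouched.

```python
# Hedged sketch: trainable add-on memory over a frozen base model.
import torch
import torch.nn as nn

class AddOnMemory(nn.Module):
    def __init__(self, d_model: int, n_slots: int = 64):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.values = nn.Parameter(torch.zeros(n_slots, d_model))

    def forward(self, h: torch.Tensor) -> torch.Tensor:          # h: (batch, d_model)
        attn = torch.softmax(h @ self.keys.T, dim=-1)             # soft lookup over memory slots
        return h + attn @ self.values                             # residual memory read

base = nn.Linear(32, 32)                                          # stand-in for the frozen LM
for p in base.parameters():
    p.requires_grad_(False)                                       # original weights never change

memory = AddOnMemory(d_model=32)
optimizer = torch.optim.Adam(memory.parameters(), lr=1e-3)        # only the memory learns

x = torch.randn(4, 32)
loss = ((memory(base(x)) - torch.randn(4, 32)) ** 2).mean()       # dummy objective
loss.backward()
optimizer.step()
```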
This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.
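A hedged illustration of the contrast, not the paper's exact metric: score uncertainty by aggregating per-turn uncertainties over the whole dialogue instead of reading off only the final turn. The entropy numbers below come from made-up token distributions.

```python
# Hedged sketch: whole-conversation vs. final-answer uncertainty.
import numpy as np

def turn_entropy(token_probs: np.ndarray) -> float:
    """Mean token-level entropy for one turn; token_probs has shape (tokens, vocab)."""
    p = np.clip(token_probs, 1e-12, 1.0)
    return float((-p * np.log(p)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
turns = [rng.dirichlet(np.ones(50), size=20) for _ in range(4)]   # 4 turns, 20 tokens each

per_turn = [turn_entropy(t) for t in turns]
final_only = per_turn[-1]                    # "just the final answer" view
whole_dialogue = float(np.mean(per_turn))    # conversation-level view

print("final-turn uncertainty:    ", round(final_only, 3))
print("whole-dialogue uncertainty:", round(whole_dialogue, 3))
```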
This paper asks a simple question with big consequences: can today’s AI models actively explore a new space and build a trustworthy internal map of it?
This paper teaches AI to pay attention better by training its focus, not just its words.
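One way to picture "training its focus", offered as an assumption rather than the paper's recipe: add an auxiliary loss on the model's attention weights next to the usual next-token loss, so training shapes where the model looks as well as what it outputs.

```python
# Hedged sketch: auxiliary supervision on attention maps alongside the token loss.
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """KL divergence from a target focus distribution to the model's attention.

    attn, target: (batch, queries, keys); each row is a probability distribution.
    """
    return F.kl_div(attn.clamp_min(1e-9).log(), target, reduction="batchmean")

attn = torch.softmax(torch.randn(2, 4, 8), dim=-1)     # model attention (dummy values)
target = torch.softmax(torch.randn(2, 4, 8), dim=-1)   # where it *should* focus (dummy values)
token_loss = torch.tensor(1.7)                         # stand-in for the usual LM loss

total = token_loss + 0.1 * attention_supervision_loss(attn, target)
print(float(total))
```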
The paper shows that the popular PPO method for training language models is unfair to rare words and too gentle with very common words, which makes learning slow and unstable.
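A small numeric illustration of one plausible mechanism, stated as an assumption rather than the paper's analysis: PPO's probability ratio reacts very differently to the same absolute change for rare versus common tokens, so rare tokens slam into the clip range almost immediately while very common tokens barely register an update.

```python
# Hedged sketch: how PPO's ratio clipping treats rare vs. common tokens.
clip_eps = 0.2

def clipped_ratio(p_old: float, p_new: float, eps: float = clip_eps) -> tuple[float, float]:
    r = p_new / p_old
    return r, min(max(r, 1 - eps), 1 + eps)

for p_old in (0.001, 0.5):                  # a rare token vs. a very common token
    p_new = p_old + 0.01                    # the same small absolute improvement
    r, r_clip = clipped_ratio(p_old, p_new)
    print(f"p_old={p_old:<6} ratio={r:6.2f} after clipping={r_clip:4.2f}")
# The rare token's ratio jumps to 11.0 and is clipped to 1.2 (its update is truncated),
# while the common token's ratio is 1.02 and its update is tiny.
```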
The paper shows how to train a language model with special extra hints (privileged information) during practice so it can still do well later without any hints.
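A minimal sketch of the generic pattern under assumptions, not the paper's exact algorithm: a training-time pass that sees the privileged hints produces a target, and a hint-free pass is trained to match it, so inference needs no hints at all.

```python
# Hedged sketch: learning with privileged information via a hinted pass and a hint-free pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 8)                                   # stand-in for the LM head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 8)                                      # normal input features
hint = torch.randn(4, 8)                                   # privileged information

with torch.no_grad():
    teacher_logits = model(torch.cat([x, hint], dim=-1))   # training-time pass with hints

student_logits = model(torch.cat([x, torch.zeros_like(hint)], dim=-1))  # pass without hints

# Distillation-style loss: the hint-free pass learns to reproduce the hinted one.
loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                F.softmax(teacher_logits, dim=-1), reduction="batchmean")
loss.backward()
optimizer.step()
print(float(loss))
```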
Horizon-LM flips the usual training setup by keeping all long-term model state in the computer's main (CPU) RAM and using the GPU only as a fast, temporary calculator.
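A hedged sketch of that general idea, not Horizon-LM's implementation: every layer's weights stay resident in CPU RAM, and layers are streamed onto the GPU one at a time just long enough to do their math.

```python
# Hedged sketch: CPU-resident weights, GPU borrowed per layer as a temporary calculator.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

layers = [nn.Linear(256, 256) for _ in range(8)]   # long-term state lives in CPU RAM

def streamed_forward(x: torch.Tensor) -> torch.Tensor:
    x = x.to(device)
    for layer in layers:
        layer.to(device)          # borrow the GPU for this layer only
        x = torch.relu(layer(x))
        layer.to("cpu")           # return the weights to CPU RAM
    return x.cpu()

print(streamed_forward(torch.randn(2, 256)).shape)
```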
Rigging 3D characters is a bottleneck: making bones and skin weights by hand is slow and tricky, and past automatic tools often guess the skin weights poorly.
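For background on what skin weights do (standard linear blend skinning, not this paper's rigging method): each vertex is deformed by a weighted mix of bone transforms, which is why poorly guessed weights directly produce bad deformations.

```python
# Standard linear blend skinning sketch; the mesh and weights below are toy values.
import numpy as np

def lbs(rest_vertices, bone_transforms, skin_weights):
    """rest_vertices: (V, 3); bone_transforms: (B, 4, 4); skin_weights: (V, B), rows sum to 1."""
    v_h = np.concatenate([rest_vertices, np.ones((len(rest_vertices), 1))], axis=1)  # (V, 4)
    per_bone = np.einsum("bij,vj->vbi", bone_transforms, v_h)   # each bone's transform of each vertex
    blended = np.einsum("vb,vbi->vi", skin_weights, per_bone)   # weighted mix per vertex
    return blended[:, :3]

verts = np.array([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
bones = np.stack([np.eye(4), np.eye(4)])
bones[1, :3, 3] = [1.0, 0.0, 0.0]                               # second bone translated along x
weights = np.array([[1.0, 0.0], [0.3, 0.7]])                    # per-vertex skin weights
print(lbs(verts, bones, weights))
```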
OmniSIFT is a new way to shrink (compress) audio and video tokens so omni-modal language models can think faster without forgetting important details.
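A generic token-compression sketch to show the flavor, not OmniSIFT's actual algorithm: merge adjacent, highly similar audio or video tokens into one, so the language model has far fewer tokens to attend over while redundant content is folded together rather than lost.

```python
# Hedged sketch: merging near-duplicate neighboring tokens to shorten a modality stream.
import torch

def merge_similar_neighbors(tokens: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """tokens: (seq_len, dim). Returns a shorter (<= seq_len) sequence."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            sim = torch.cosine_similarity(tokens[i], tokens[i + 1], dim=0)
            if sim > threshold:                        # redundant neighbors: keep one merged token
                out.append((tokens[i] + tokens[i + 1]) / 2)
                i += 2
                continue
        out.append(tokens[i])                          # distinct token: keep as-is
        i += 1
    return torch.stack(out)

video_tokens = torch.randn(32, 64)
video_tokens[10:14] = video_tokens[10]                 # a run of near-duplicate frames
print(merge_similar_neighbors(video_tokens).shape)     # fewer than 32 tokens remain
```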