Before this work, AI agents often stopped to run safety checks at every single step, which made them slow and still easy to trick in sneaky ways.
ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.
The paper finds that popular reinforcement learning with verifiable rewards (RLVR) methods for training language and vision-language models have a hidden bias toward certain answer lengths, which can hurt learning.
This paper builds a Google for theorems: a semantic search engine that finds specific theorems, lemmas, and propositions instead of just whole papers.
This paper builds SocialVeil, a testing world where AI chat agents must talk to each other even when communication is messy rather than perfect.
This paper says we should measure an AI agent’s uncertainty across its whole conversation, not just on one final answer.
This paper teaches AI to pay attention better by training its focus, not just its words.
The paper shows that the popular PPO method for training language models is unfair to rare words and too gentle with very common words, which makes learning slow and unstable.
The paper shows how to train a language model with special extra hints (privileged information) during practice so it can still do well later without any hints.
Horizon-LM flips the usual training setup by keeping all of the model’s long-lived state in the computer’s main memory (CPU RAM) and using the GPU only as a fast, temporary calculator.
OmniSIFT is a new way to shrink (compress) audio and video tokens so omni-modal language models can think faster without forgetting important details.
Large language models can quietly pick up hidden preferences from training data that looks harmless.