SwimBird is a multimodal AI that can switch how it thinks: only in text, only in vision (with hidden picture-like thoughts), or a mix of both.
DFlash is a new way to make big language models answer much faster without changing the final answers.
InterPrior is a new brain for simulated humans and humanoid robots that can move, balance, and use objects by following simple goals instead of step-by-step instructions.
V-Retrver is a new way for AI to search across text and images by double-checking tiny visual details instead of only guessing from words.
This paper tackles a big problem in long video generation: models either forget what happened earlier or slowly drift off-topic over time.
RISE-Video is a new test that checks whether video-making AIs follow hidden world rules, not just make pretty pictures.
SAGE is a new test for how well AI research agents find scientific papers when questions require multi-step reasoning.
The paper studies a simple way to train giant language models with reinforcement learning by replacing a hard-to-compute term (the log-partition function) with something easy: the mean reward.
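In KL-regularized RL fine-tuning, the hard term usually takes the form log Z = log E[exp(r/beta)], while the mean reward gives the cheap stand-in E[r]/beta. The sketch below is purely illustrative (not code from the paper; the function names and reward distribution are my own assumptions) and just compares the two quantities numerically, showing that the substitute lower-bounds the true term by Jensen's inequality:

```python
import math
import random

def log_partition_mc(rewards, beta):
    # Monte Carlo estimate of the hard term: log Z = log E[exp(r / beta)].
    # Uses the log-sum-exp trick for numerical stability.
    scaled = [r / beta for r in rewards]
    m = max(scaled)
    return m + math.log(sum(math.exp(v - m) for v in scaled) / len(scaled))

def mean_reward_substitute(rewards, beta):
    # The cheap stand-in: just the mean reward, E[r] / beta.
    return sum(rewards) / (len(rewards) * beta)

random.seed(0)
rewards = [random.gauss(1.0, 0.2) for _ in range(20000)]  # hypothetical reward samples
beta = 1.0

lz = log_partition_mc(rewards, beta)        # expensive: needs exp() over many samples
mr = mean_reward_substitute(rewards, beta)  # cheap: a single average

# Jensen's inequality guarantees log E[exp(r/beta)] >= E[r]/beta,
# and the gap shrinks as the reward distribution concentrates.
print(f"log-partition ~ {lz:.4f}, mean-reward substitute ~ {mr:.4f}")
```

For a constant reward the two terms coincide exactly, which is one intuition for why the swap can work well when rewards have low variance.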
This paper teaches a language model to write fast GPU kernels (small, performance-critical programs) in Triton using reinforcement learning that rewards meaningful speedups, not just correctness.
BABE is a new benchmark that tests if AI can read real biology papers and reason from experiments like a scientist, not just recall facts.
Large language models are great at words, but they struggle to predict what will happen after they act in a changing world.
Large language models are usually trained to get good at one kind of reasoning, but real life needs them to be good at many things at once.