This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.
RL-trained search agents often sound confident even when they don't actually know the answer, which can mislead the people relying on them.
Cities are full of places defined by people, like schools and parks, which are hard to see clearly from space without extra clues.
SkinFlow is a 7B-parameter vision–language model that diagnoses skin conditions by routing only the most useful visual information to its language model, rather than simply scaling up.
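To make the routing idea concrete, here is a minimal PyTorch sketch of keeping only the highest-relevance visual tokens before they reach the language model; the scorer head, the top-k rule, and all names here are illustrative assumptions, not SkinFlow's actual design.

```python
import torch

def route_visual_tokens(visual_tokens: torch.Tensor,
                        scorer: torch.nn.Linear,
                        keep: int = 64) -> torch.Tensor:
    """Forward only the `keep` most relevant visual tokens.

    visual_tokens: (num_tokens, dim) features from the vision encoder.
    scorer: a learned head, e.g. torch.nn.Linear(dim, 1), that maps each
        token to a relevance score (hypothetical stand-in).
    Returns (keep, dim): the pruned token set passed to the language model.
    """
    scores = scorer(visual_tokens).squeeze(-1)   # (num_tokens,) relevance
    top = torch.topk(scores, keep).indices       # indices of the best tokens
    return visual_tokens[top]                    # discard the rest
```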
TranslateGemma is a family of open machine translation models fine-tuned from Gemma 3 to translate many languages more accurately.
Agents often act like tourists without a map: they react to what they see now and miss long-term consequences.
The paper introduces Multiplex Thinking, a new way for AI to think by sampling several likely next words at once and blending them into a single super-token.
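To picture the blending step, here is a minimal PyTorch sketch: take the top-k candidate tokens, renormalize their probabilities, and mix their embeddings into one input vector for the next step. The function and the top-k weighting are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def multiplex_step(logits: torch.Tensor,
                   embedding: torch.nn.Embedding,
                   k: int = 4) -> torch.Tensor:
    """Blend the k most likely next tokens into one 'super-token'.

    logits: (vocab_size,) next-token logits from the model.
    Returns (embed_dim,): a probability-weighted mixture of the top-k
    token embeddings, fed back in as the next step's input.
    """
    topk = torch.topk(logits, k)                  # k candidate tokens
    weights = F.softmax(topk.values, dim=-1)      # renormalize over the k
    cand = embedding(topk.indices)                # (k, embed_dim)
    return (weights.unsqueeze(-1) * cand).sum(0)  # weighted blend
```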
VLingNav is a robot navigation system that sees, reads instructions, and acts, while deciding when to think hard and when to just move.
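One way to read "deciding when to think hard" is an uncertainty gate, sketched below: if the fast policy's action distribution is high-entropy, hand control to a slower reasoning step. The entropy test, the threshold, and both callables are hypothetical stand-ins, not VLingNav's published mechanism.

```python
import math

def act(fast_policy, slow_reasoner, observation, instruction,
        entropy_threshold: float = 1.0):
    """Move immediately when confident; reason explicitly when not."""
    probs = fast_policy(observation, instruction)   # dict: action -> prob
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    if entropy > entropy_threshold:
        # Uncertain: spend extra compute on deliberate reasoning.
        return slow_reasoner(observation, instruction)
    # Confident: just take the most likely action.
    return max(probs, key=probs.get)
```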
MegaFlow is a new system that helps thousands of AI agents practice and test big, messy tasks (like fixing real software bugs) all at once without crashing or wasting money.
The paper shows that piling extra text into an AI's context, even harmless extra text, can badly confuse it, sometimes costing up to 80% of its accuracy.
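A finding like this can be checked with a simple harness that pads each question with irrelevant filler and compares accuracy; the sketch below assumes a `model(prompt)` callable and (question, answer) pairs, and only loosely mirrors the paper's setup.

```python
def accuracy(model, tasks, distractor: str = "") -> float:
    """Exact-match accuracy, optionally with distractor text prepended."""
    correct = 0
    for question, answer in tasks:
        prompt = f"{distractor}\n\n{question}" if distractor else question
        if model(prompt).strip() == answer:
            correct += 1
    return correct / len(tasks)

# A large gap between the two numbers means harmless extra text is
# genuinely confusing the model:
# drop = accuracy(model, tasks) - accuracy(model, tasks, distractor=filler)
```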
Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.
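The self-play loop can be sketched roughly as below; `proposer`, `solver`, `policy_update`, and the difficulty-shaped rewards are hypothetical stand-ins for Dr. Zero's actual training interface.

```python
def self_play_round(proposer, solver, policy_update, n_attempts: int = 8):
    # 1. Proposer invents a search question with a checkable answer.
    question, gold_answer = proposer.generate_task()

    # 2. Solver attempts the task several times using web search.
    successes = sum(solver.answer(question) == gold_answer
                    for _ in range(n_attempts))
    pass_rate = successes / n_attempts

    # 3. Solver is rewarded for correctness; Proposer is rewarded for
    #    tasks that are hard but solvable (pass rate near 0.5), keeping
    #    the curriculum at the edge of the Solver's ability.
    policy_update(solver, reward=pass_rate)
    policy_update(proposer, reward=1.0 - 2.0 * abs(pass_rate - 0.5))
```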
Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.