This paper teaches a humanoid robot to find and pick up many different objects in new places using plain-language requests like “grab the orange mug.”
Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.
This paper teaches AI agents to make smart choices about when to explore for more information and when to act right away.
SAW-Bench is a new test that checks if AI can understand the world from a first-person view, like wearing smart glasses.
Accuracy alone can make AI agents look good on paper while still failing in real life; this paper shows how to measure reliability properly.
Long-horizon AI assistants can grab old, low-quality, or conflicting memories and then answer with too much confidence, which is dangerous.
AI models that make CAD designs used to learn mostly from simple “draw-then-extrude” examples, so they struggled with real, complex parts.
The paper shows that AI agents can learn to cooperate simply by playing against many different kinds of opponents and figuring them out on the fly, without hardcoding how those opponents learn.
DeepVision-103K is a new 103,000-example picture-and-text math dataset designed to help AI think better using rewards that can be checked automatically.
MAEB is a giant, fair report card for audio AI that tests 50+ models on 30 tasks across speech, music, environmental sounds, and audio–text tasks in 100+ languages.
SAM 3D Body (3DB) is a model that turns a single photo of a person into a full 3D mesh of the body, hands, and feet with state-of-the-art accuracy.
Big idea: Make image-making AIs stop, think, check, and fix their own work so they get better at both creating pictures and understanding them.