A2Eval is a two-agent system that automatically builds and runs fair tests for robotics-focused vision-language models, cutting wasted evaluation work while keeping results trustworthy.
RoboBrain 2.5 teaches robots to see depth precisely and to track their progress through a task over time, so plans turn into safe, accurate actions.
Robots usually think in words and pictures, but their hands need exact motions, so there is a gap between understanding and doing.
QuantiPhy is a new test that checks whether AI models can measure real-world physical quantities from videos with actual numbers rather than guesses.
This paper teaches robots to move their camera to a better spot before answering a question about what they see.
Vision-Language-Action (VLA) models are robots’ “see–think–do” brains that connect cameras (vision), words (language), and motors (action).
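To make the "see–think–do" framing concrete, here is a minimal, purely illustrative Python sketch of the contract a VLA model exposes: a camera frame and an instruction go in, a motor command comes out. The names (ToyVLAPolicy, Action, act) are hypothetical and not taken from any of the papers above.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Action:
    """A low-level motor command: joint velocities plus a gripper state."""
    joint_velocities: List[float]
    gripper_open: bool


class ToyVLAPolicy:
    """Illustrative stand-in for a Vision-Language-Action model.

    A real VLA model encodes the camera image and the instruction with
    learned networks; this toy version only shows the input/output
    contract: (vision, language) in, motor action out.
    """

    def act(self, image: np.ndarray, instruction: str) -> Action:
        # "See": a real model would extract visual features from the frame.
        assert image.ndim == 3, "expected an HxWxC camera frame"

        # "Think": a real model would fuse visual features with the words.
        wants_grasp = "pick" in instruction.lower()

        # "Do": emit a motor command the robot's controller can execute.
        return Action(joint_velocities=[0.0] * 7, gripper_open=not wants_grasp)


if __name__ == "__main__":
    policy = ToyVLAPolicy()
    frame = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy camera frame
    print(policy.act(frame, "pick up the red block"))
```

The point of the sketch is only the interface: whatever architecture a specific VLA paper uses, it ultimately maps pixels plus words to motor-level actions.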