SpatiaLab is a new test that checks if vision-language models (VLMs) can understand real-world spatial puzzles, like what's in front, behind, bigger, or reachable.
SAGE is a smart video-watching agent that decides when to answer quickly and when to take multiple steps, just like how people skim or rewind long videos.