Robots learn better when they think about how things move over time, not by redrawing every pixel of a video.
HERBench is a new test that checks if video AI models can combine several clues spread across time, not just guess from one frame or language priors.