This paper builds a gigantic library of video puzzles (VBVR) so AI can practice not just making pretty videos, but actually thinking through what happens over time.
TSRBench is a giant test that checks if AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.