The paper asks a simple question: do the model’s invisible “imagination tokens” actually help it reason about images?
TSRBench is a large benchmark that checks whether AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.