This paper says we should test AI the way real life works: by letting it ask questions, gather clues, and make smart moves step by step under a limited budget.
MMR-Life is a new test (benchmark) that checks how AI understands everyday situations using several real photos at once.
TSRBench is a giant test that checks if AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.