This paper says we should test AI the way real life works: by letting it ask questions, gather clues, and make smart moves step by step under a limited budget.
The paper teaches AI models to plan their thinking time like a smart test-taker who has to finish several questions before the bell rings.