Interactive Benchmarks
Beginner · Baoqing Yue, Zihan Zhu et al. · Mar 5 · arXiv
This paper says we should test AI the way real life works: by letting it ask questions, gather clues, and make smart moves step by step under a limited budget.
#interactive benchmarks #information acquisition #budgeted reasoning