FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights
Intermediate · Zhen Wang, Fan Bai et al. · Feb 2 · arXiv
FIRE-Bench is a new benchmark that tests whether AI agents can rediscover real scientific insights from start to finish, working through each step of the research process instead of just guessing the final answer.
#FIRE-Bench · #scientific agents · #rediscovery benchmark