AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
Intermediate | Alisia Lupidi, Bhavul Gauri et al. | Feb 6 | arXiv
AIRS-Bench is a new test suite that checks whether AI research agents can do real machine learning research from start to finish, not just answer questions.
#AIRS-Bench #AI research agents #LLM agents