BABE: Biology Arena BEnchmark
Intermediate · Junting Zhou, Jin Chen et al. · Feb 5 · arXiv
BABE is a new benchmark that tests whether AI can read real biology papers and reason from experiments like a scientist, rather than just recall facts.
#BABE Benchmark #Experimental Reasoning #Causal Reasoning