Papers2

#multimodal benchmark

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

Chongyang Gao, Diji Yang et al.Feb 23arXiv

CFE-BENCH is a new, teacher-verified "Classroom Final Exam" for AI that uses real college STEM problems to test deep, step-by-step reasoning.

#CFE-BENCH#variable-based verification#reasoning flow

Not triaged yet

TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models

Beginner

Fangxu Yu, Xingang Guo et al.Jan 26arXiv

TSRBench is a giant test that checks if AI models can understand and reason about data that changes over time, like heartbeats, stock prices, and weather.

#time series reasoning#multimodal benchmark#perception

Not triaged yet