Papers (2)

#Spearman Correlation

A2Eval: Agentic and Automated Evaluation for Embodied Brain

A2Eval is a two-agent system that automatically builds and runs fair tests for robot-style vision-language models, cutting wasted work while keeping results trustworthy.

#Embodied AI #Vision-Language Models #Agentic Evaluation


Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Intermediate

Ming Li, Han Chen et al., Dec 21, arXiv

This paper asks a simple question with big impact: Can AI tell which test questions are hard for humans?

#Item Difficulty Prediction #Item Response Theory #Rasch Model
