This paper teaches AI teams to get better by scoring every move they make, not just the final answer.
This paper turns rebuttal writing from ‘just write some text’ into ‘make a plan with proof, then write.’
AgentEHR is a new, realistic test that asks AI agents to read messy hospital records and make full clinical decisions, not just look up facts.
LongVideoAgent is a team of three AIs that work together to answer questions about hour‑long TV episodes without missing small details.