The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.
SAGE is a two-agent system that automatically writes tough, multi-step search questions and checks them by actually trying to solve them.
The paper tackles understanding very long first-person videos (spanning days to a week) by giving an AI a smarter memory and better tools.
The paper shows that when we give an AI lots of extra text, even harmless extra text, it can get badly confused, sometimes losing up to 80% of its accuracy.
Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.
This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.
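
To make the "logic map" idea concrete, here is a minimal sketch of what such a memory could look like as a data structure. It is only an illustration and not the paper's implementation: the names `Event`, `EventGraph`, `add_cause`, `timeline`, and `explain` are hypothetical, and the rule that a cause may not come after its effect is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Event:
    """One remembered event (hypothetical schema for illustration)."""
    event_id: str
    description: str
    timestamp: float  # assumed: any comparable notion of "when"


@dataclass
class EventGraph:
    """Events linked by time (timestamps) and by cause (directed edges)."""
    events: Dict[str, Event] = field(default_factory=dict)
    # Maps an effect's id to the ids of its direct causes.
    causes: Dict[str, Set[str]] = field(default_factory=dict)

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event

    def add_cause(self, cause_id: str, effect_id: str) -> None:
        # Assumption for this sketch: a cause cannot happen after its effect.
        if self.events[cause_id].timestamp > self.events[effect_id].timestamp:
            raise ValueError("a cause cannot come after its effect")
        self.causes.setdefault(effect_id, set()).add(cause_id)

    def timeline(self) -> List[str]:
        """All event ids in time order (the time links)."""
        ordered = sorted(self.events.values(), key=lambda e: e.timestamp)
        return [e.event_id for e in ordered]

    def explain(self, event_id: str) -> List[str]:
        """Every direct or indirect cause of an event, in time order."""
        seen: Set[str] = set()
        stack = [event_id]
        while stack:
            current = stack.pop()
            for cause_id in self.causes.get(current, set()):
                if cause_id not in seen:
                    seen.add(cause_id)
                    stack.append(cause_id)
        return sorted(seen, key=lambda i: self.events[i].timestamp)


# Usage: three events connected by cause-and-time links.
memory = EventGraph()
memory.add_event(Event("e1", "left keys on the desk", 1.0))
memory.add_event(Event("e2", "got locked out", 2.0))
memory.add_event(Event("e3", "called a locksmith", 3.0))
memory.add_cause("e1", "e2")
memory.add_cause("e2", "e3")
print(memory.explain("e3"))
```

A flat list of notes could only return the three descriptions; with the links in place, asking why the locksmith was called walks back through the chain of causes.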