ExStrucTiny is a new test (benchmark) that checks if AI can pull many connected facts from all kinds of documents and neatly put them into JSON, even when the question style and schema change.
The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.
The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.
AuditDM is a friendly 'auditor' model that hunts for where vision-language models get things wrong and then creates the right practice to fix them.