ExStrucTiny is a new test (benchmark) that checks if AI can pull many connected facts from all kinds of documents and neatly put them into JSON, even when the question style and schema change.
This paper introduces Causal-JEPA (C-JEPA), a world model that learns by hiding entire objects in its memory and forcing itself to predict them from other objects.
The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.
The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.
Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.
AuditDM is a friendly 'auditor' model that hunts for where vision-language models get things wrong and then creates the right practice to fix them.