ExStrucTiny is a new benchmark that checks whether AI can pull many connected facts from all kinds of documents and put them neatly into JSON, even when the question style and schema change.
GutenOCR turns a general vision-language model into a single, smart OCR front-end that can read, find, and point to text on a page using simple prompts.
STEP3-VL-10B is a small (10-billion-parameter) open multimodal model that sees images and reads text, yet scores on par with much larger models.
This paper builds a tough new test called O3-BENCH to check if AI can truly think with images, not just spot objects.
Long texts are expensive for AI to read because each extra token costs a lot of compute and memory.