Papers (2)

#bounding box localization

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

Mathieu Sibue, Andres Muñoz Garza et al. · Feb 12 · arXiv

ExStrucTiny is a new benchmark that tests whether AI models can extract many related facts from a wide range of document images and output them as JSON, even when the query style and the schema change.

#structured information extraction #document understanding #vision-language models


LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

Intermediate

Said Taghadouini, Adrien Cavaillès et al. · Jan 20 · arXiv

LightOnOCR-2-1B is a single, compact AI model that reads PDF pages and scans and turns them into clean, well-ordered text without using fragile multi-step OCR pipelines.

#LightOnOCR-2-1B #end-to-end OCR #vision-language model
