GutenOCR: A Grounded Vision-Language Front-End for Documents
Intermediate · Hunter Heidenreich, Ben Elliott et al. · Jan 20 · arXiv
GutenOCR turns a general vision-language model into a single OCR front-end that can read, locate, and point to text on a page in response to simple prompts.
#grounded OCR #vision-language model #document understanding