Papers3

#Vision-Language Model

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Dianyi Wang, Ruihang Li et al.Feb 12arXiv

DeepGen 1.0 is a small 5B-parameter model that can both make new images and smartly edit existing ones from text instructions.

#Unified multimodal model#Stacked Channel Bridging#Think tokens

Not triaged yet

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Intermediate

Yufeng Zhong, Lei Chen et al.Jan 29arXiv

OCRVerse is a new AI model that can read both plain text in documents and the visual structures in charts, webpages, and science plots, all in one system.

#Holistic OCR#Vision-Language Model#Supervised Fine-Tuning

Not triaged yet

Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

Beginner

Surapon Nonesung, Natapong Nitarach et al.Jan 21arXiv

Typhoon OCR is an open, lightweight vision-language model that reads Thai and English documents and returns clean, structured text.

#Thai OCR#Vision-Language Model#Document Layout Reconstruction

Not triaged yet