How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#Vision-Language Model

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

Beginner
Dianyi Wang, Ruihang Li et al. · Feb 12 · arXiv

DeepGen 1.0 is a lightweight 5B-parameter model that both generates new images and edits existing ones from text instructions.

#Unified multimodal model #Stacked Channel Bridging #Think tokens

Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

Beginner
Surapon Nonesung, Natapong Nitarach et al. · Jan 21 · arXiv

Typhoon OCR is an open, lightweight vision-language model that reads Thai and English documents and returns clean, structured text.

#Thai OCR #Vision-Language Model #Document Layout Reconstruction