πŸŽ“How I Study AIHISA
πŸ“–Read
πŸ“„PapersπŸ“°Blogs🎬Courses
πŸ’‘Learn
πŸ›€οΈPathsπŸ“šTopicsπŸ’‘Concepts🎴Shorts
🎯Practice
πŸ“Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers2

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#bounding boxes

BBQ-to-Image: Numeric Bounding Box and Qolor Control in Large-Scale Text-to-Image Models

Beginner
Eliran Kachlon, Alexander Visheratin et al.Feb 24arXiv

BBQ is a text-to-image model that lets you place objects exactly where you want using numeric bounding boxes and color them with exact RGB values.

#text-to-image#bounding boxes#RGB control

Not triaged yet

GutenOCR: A Grounded Vision-Language Front-End for Documents

Intermediate
Hunter Heidenreich, Ben Elliott et al.Jan 20arXiv

GutenOCR turns a general vision-language model into a single, smart OCR front-end that can read, find, and point to text on a page using simple prompts.

#grounded OCR#vision-language model#document understanding

Not triaged yet