How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (7)

#diffusion model

Unified Thinker: A General Reasoning Modular Core for Image Generation

Intermediate
Sashuai Zhou, Qiang Zhou et al. · Jan 6 · arXiv

Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.

#reasoning-aware image generation #structured planning #edit-only prompt

VIBE: Visual Instruction Based Editor

Intermediate
Grigorii Alekseenko, Aleksandr Gordeev et al. · Jan 5 · arXiv

VIBE is a tiny but mighty image editor that listens to your words and changes pictures while keeping the original photo intact unless you ask otherwise.

#instruction-based image editing #vision-language model #diffusion model

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Intermediate
Hau-Shiang Shiu, Chin-Yang Lin et al. · Dec 29 · arXiv

This paper makes diffusion-based video super-resolution (VSR) practical for live, low-latency use by removing the need for future frames and cutting denoising from ~50 steps down to just 4.

#video super-resolution #diffusion model #latent diffusion

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Intermediate
Tianyuan Qu, Lei Ke et al. · Dec 18 · arXiv

RePlan is a plan-then-execute system that first figures out exactly where to edit in a picture and then makes clean changes there.

#instruction-based image editing #vision–language model (VLM) #diffusion model

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

LongCat-Image Technical Report

Intermediate
Meituan LongCat Team, Hanghang Ma et al. · Dec 8 · arXiv

LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.

#LongCat-Image #diffusion model #text-to-image

RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards

Intermediate
Junyan Ye, Leiqi Zhu et al. · Nov 29 · arXiv

RealGen is a new way to make computer-made pictures look so real that they can fool expert detectors and even careful judges.

#photorealistic text-to-image #detector-guided rewards #reinforcement learning