How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers tagged #text-to-image (12 results, all levels, source: arXiv)

Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Intermediate
Tianhe Wu, Ruibin Li et al. · Feb 3 · arXiv

The paper tackles a big problem in fast image generators: distillation makes them quick, but they lose variety and keep producing similar pictures.

#diffusion distillation · #distribution matching distillation · #mode collapse
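
The paper builds on distribution matching distillation (DMD), so it may help to see what a base DMD update looks like in code. This is a minimal sketch of the general technique, not this paper's diversity-preserving variant; `real_score`, `fake_score`, and the simple noising step are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dmd_surrogate_loss(student_sample, real_score, fake_score, t):
    """Sketch of one distribution-matching-distillation step: `real_score` is
    the frozen teacher's score model, `fake_score` tracks the student's own
    output distribution (both assumed callables on a noisy image and time)."""
    noisy = student_sample + t * torch.randn_like(student_sample)  # perturb the student's output
    with torch.no_grad():
        grad = fake_score(noisy, t) - real_score(noisy, t)  # direction that shrinks KL(student || teacher)
    # Surrogate whose gradient w.r.t. student_sample equals `grad`, so
    # backprop nudges the student's output distribution toward the teacher's.
    return 0.5 * F.mse_loss(student_sample, (student_sample - grad).detach(), reduction="sum")
```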

Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

Beginner
Zengbin Wang, Xuecai Hu et al. · Jan 28 · arXiv

Text-to-image models draw pretty pictures, but often put things in the wrong places or miss how objects interact.

#text-to-image · #spatial intelligence · #occlusion

AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

Intermediate
Dongjie Cheng, Ruifeng Yuan et al. · Jan 25 · arXiv

AR-Omni is a single autoregressive model that can take in and produce text, images, and speech without extra expert decoders.

#autoregressive modeling · #multimodal large language model · #any-to-any generation

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Intermediate
Chengzhuo Tong, Mingkun Chang et al. · Jan 15 · arXiv

This paper turns a video model into a step-by-step visual thinker that makes one final, high-quality picture from a text prompt.

#Chain-of-Frame · #visual reasoning · #text-to-image

Unified Thinker: A General Reasoning Modular Core for Image Generation

Intermediate
Sashuai Zhou, Qiang Zhou et al. · Jan 6 · arXiv

Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.

#reasoning-aware image generation · #structured planning · #edit-only prompt
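
To make the "thinking vs. drawing" split concrete, here is a rough sketch of the general plan-then-generate pattern the summary describes. The planner and image-model interfaces and the prompt wording are hypothetical, not Unified Thinker's actual API.

```python
def plan_then_draw(prompt: str, planner, image_model):
    """Illustrative two-stage pipeline: a reasoning model rewrites a complex
    instruction into explicit steps, and only that plan reaches the image model."""
    # Stage 1 ("thinking"): turn the request into concrete, doable steps
    # (objects, attributes, layout, edits) before any pixels exist.
    plan = planner(
        "Rewrite this image request as an explicit, step-by-step drawing plan:\n" + prompt
    )
    # Stage 2 ("drawing"): the generator conditions on the clean plan, not the raw prompt.
    return image_model(plan)
```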

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Intermediate
Kaixin Ding, Yang Zhou et al. · Dec 18 · arXiv

Alchemist is a smart data picker for training text-to-image models that learns which pictures and captions actually help the model improve.

#meta-gradient · #data selection · #text-to-image
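
One way to picture "learning which pictures and captions actually help" is a gradient-alignment heuristic, a common ingredient in meta-gradient data selection. This is an illustration of the general idea, not Alchemist's exact scoring rule; `loss_fn` and the batches are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def gradient_alignment_score(model, loss_fn, candidate_batch, holdout_batch):
    """Score a candidate training example by how well its gradient lines up with
    the gradient on a trusted hold-out set: a high score means training on the
    candidate should also reduce the hold-out loss."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_cand = torch.autograd.grad(loss_fn(model, candidate_batch), params)
    g_hold = torch.autograd.grad(loss_fn(model, holdout_batch), params)
    flat = lambda grads: torch.cat([g.reshape(-1) for g in grads])
    return F.cosine_similarity(flat(g_cand), flat(g_hold), dim=0)
```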

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Intermediate
Bozhou Li, Sihan Yang et al. · Dec 17 · arXiv

This paper is about building a text encoder whose embeddings make the prompts you type turn into the right pictures and videos more reliably.

#diffusion models · #text encoder · #multimodal large language model

Few-Step Distillation for Text-to-Image Generation: A Practical Guide

Intermediate
Yifan Pu, Yizeng Han et al. · Dec 15 · arXiv

Big text-to-image models make amazing pictures but are slow because they take hundreds of tiny steps to turn noise into an image; this guide collects practical recipes for distilling them down to just a few steps.

#text-to-image · #diffusion models · #few-step generation
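
To see why step count dominates speed, here is a generic Euler-style sampling loop; the same loop runs with 100+ steps for a base model or a handful for a distilled one. The `denoiser` (assumed to predict a velocity toward the image) and the linear time schedule are assumptions for illustration, not the guide's specific recipe.

```python
import torch

@torch.no_grad()
def euler_sample(denoiser, shape, num_steps=4, device="cpu"):
    """Generic sampler: cost is roughly one model call per step, which is why
    distilling from hundreds of steps down to a few gives a large speedup."""
    x = torch.randn(shape, device=device)                         # start from pure noise
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=device)   # 1 = noise, 0 = image
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        v = denoiser(x, t)                                        # predicted direction toward the data
        x = x + (t_next - t) * v                                  # one Euler step
    return x
```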

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Intermediate
Minglei Shi, Haolin Wang et al. · Dec 12 · arXiv

This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.

#text-to-image · #diffusion transformer · #flow matching
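
A rough sketch of what "train the diffusion model directly on vision-foundation-model features" can look like with a flow-matching objective. The frozen encoder, the `dit` model, and the tensor shapes are assumptions for illustration; the paper's actual losses and architecture differ in detail.

```python
import torch
import torch.nn.functional as F

def feature_space_flow_matching_loss(dit, vision_encoder, image, text_emb):
    """One training step: regress the straight-line velocity between Gaussian
    noise and the frozen encoder's features, instead of VAE latents."""
    with torch.no_grad():
        z1 = vision_encoder(image)                    # target features from a frozen foundation model
    z0 = torch.randn_like(z1)                         # source: pure noise
    t = torch.rand(z1.shape[0], device=z1.device)     # one random time per sample
    t_ = t.view(-1, *([1] * (z1.dim() - 1)))          # broadcast t over feature dims
    zt = (1.0 - t_) * z0 + t_ * z1                    # point on the noise -> feature path
    target_v = z1 - z0                                # constant velocity of that straight path
    pred_v = dit(zt, t, text_emb)                     # text-conditioned diffusion transformer
    return F.mse_loss(pred_v, target_v)
```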

CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models

Intermediate
Tong Zhang, Carlos Hinojosa et al. · Dec 11 · arXiv

Diffusion models sometimes copy training images too closely, which can be a privacy and copyright problem.

#diffusion models · #memorization mitigation · #latent feature injection

LongCat-Image Technical Report

Intermediate
Meituan LongCat Team, Hanghang Ma et al. · Dec 8 · arXiv

LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.

#LongCat-Image · #diffusion model · #text-to-image

TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Intermediate
Zhenglin Cheng, Peng Sun et al. · Dec 3 · arXiv

TwinFlow is a new way to make big image models draw great pictures in just one step instead of 40–100 steps.

#TwinFlow · #one-step generation · #twin trajectories