🎓How I Study AIHISA
đź“–Read
📄Papers📰Blogs🎬Courses
đź’ˇLearn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers9

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#image editing

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Intermediate
Dianyi Wang, Chaofan Ma et al.Feb 2arXiv

UniReason is a single, unified model that plans with world knowledge before making an image and then edits its own result to fix mistakes, like a student drafting and revising an essay.

#unified multimodal model#world knowledge reasoning#text-to-image generation

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate
Fu-Yun Wang, Han Zhang et al.Feb 1arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL#flow matching#reinforcement learning

Alterbute: Editing Intrinsic Attributes of Objects in Images

Intermediate
Tal Reiss, Daniel Winter et al.Jan 15arXiv

Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.

#intrinsic attribute editing#visual named entities#identity preservation

Unified Thinker: A General Reasoning Modular Core for Image Generation

Intermediate
Sashuai Zhou, Qiang Zhou et al.Jan 6arXiv

Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.

#reasoning-aware image generation#structured planning#edit-only prompt

ProEdit: Inversion-based Editing From Prompts Done Right

Intermediate
Zhi Ouyang, Dian Zheng et al.Dec 26arXiv

ProEdit is a training-free, plug-and-play method that fixes a common problem in image and video editing: the model clings too hard to the original picture and refuses to change what you asked for.

#ProEdit#inversion-based editing#KV-mix

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate
Zhihang Liu, Xiaoyi Bao et al.Dec 15arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization#multimodal large language model#diffusion model

LongCat-Image Technical Report

Intermediate
Meituan LongCat Team, Hanghang Ma et al.Dec 8arXiv

LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.

#LongCat-Image#diffusion model#text-to-image

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Intermediate
Xin He, Longhui Wei et al.Dec 4arXiv

EMMA is a single AI model that can understand images, write about them, create new images from text, and edit images—all in one unified system.

#EMMA#unified multimodal architecture#32x autoencoder

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Intermediate
Bowen Ping, Chengyou Jia et al.Dec 2arXiv

This paper teaches image models to keep things consistent across multiple pictures—like the same character, art style, and story logic—using reinforcement learning (RL).

#consistent image generation#pairwise reward modeling#reinforcement learning