Papers (10)

#image editing

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Xiangyan Qu, Zhenlong Yuan et al. · Feb 24 · arXiv

This paper speeds up and improves AI image editing by spending more test-time compute on hard edits and less on easy ones, much like a smart coach who focuses on what needs the most work.

#adaptive test-time scaling #image chain-of-thought #image editing

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Intermediate

Dianyi Wang, Chaofan Ma et al. · Feb 2 · arXiv

UniReason is a single, unified model that plans with world knowledge before making an image and then edits its own result to fix mistakes, like a student drafting and revising an essay.

#unified multimodal model #world knowledge reasoning #text-to-image generation

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate

Fu-Yun Wang, Han Zhang et al. · Feb 1 · arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL #flow matching #reinforcement learning

Alterbute: Editing Intrinsic Attributes of Objects in Images

Intermediate

Tal Reiss, Daniel Winter et al. · Jan 15 · arXiv

Alterbute is a diffusion-based method that changes an object's intrinsic attributes (color, texture, material, shape) in a photo while keeping the object's identity and the scene intact.

#intrinsic attribute editing #visual named entities #identity preservation

Unified Thinker: A General Reasoning Modular Core for Image Generation

Intermediate

Sashuai Zhou, Qiang Zhou et al. · Jan 6 · arXiv

Unified Thinker separates “thinking” (planning) from “drawing” (image generation) so complex instructions get turned into clear, doable steps before any pixels are painted.

#reasoning-aware image generation #structured planning #edit-only prompt

ProEdit: Inversion-based Editing From Prompts Done Right

Intermediate

Zhi Ouyang, Dian Zheng et al. · Dec 26 · arXiv

ProEdit is a training-free, plug-and-play method that fixes a common problem in image and video editing: the model clings too hard to the original picture and refuses to change what you asked for.

#ProEdit #inversion-based editing #KV-mix

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Intermediate

Zhihang Liu, Xiaoyi Bao et al. · Dec 15 · arXiv

ShowTable is a new way for AI to turn a data table into a beautiful, accurate infographic using a think–make–check–fix loop.

#creative table visualization #multimodal large language model #diffusion model

LongCat-Image Technical Report

Intermediate

Meituan LongCat Team, Hanghang Ma et al. · Dec 8 · arXiv

LongCat-Image is a small (6B) but mighty bilingual image generator that turns text into high-quality, realistic pictures and can also edit images very well.

#LongCat-Image #diffusion model #text-to-image

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Intermediate

Xin He, Longhui Wei et al. · Dec 4 · arXiv

EMMA is a single AI model that can understand images, write about them, create new images from text, and edit images—all in one unified system.

#EMMA #unified multimodal architecture #32x autoencoder

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Intermediate

Bowen Ping, Chengyou Jia et al. · Dec 2 · arXiv

This paper teaches image models to keep things consistent across multiple pictures—like the same character, art style, and story logic—using reinforcement learning (RL).

#consistent image generation #pairwise reward modeling #reinforcement learning
