Papers (6)

#PickScore

Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO

Text-to-image models trained with GRPO used to assign the same final reward to every sampling step, which is like giving the whole team the same grade no matter who did what.

#TurningPoint-GRPO #GRPO #Flow Matching
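The "same grade for the whole team" baseline described in the summary above can be sketched roughly as follows. This is a minimal illustration of group-relative reward normalization with one shared value per trajectory, not the paper's method; the function names, the reward values, and the step count are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(final_rewards, eps=1e-8):
    """Normalize each sample's final reward against its own group (GRPO-style)."""
    mu = mean(final_rewards)
    sigma = pstdev(final_rewards)
    # eps avoids division by zero when every sample in the group scores the same
    return [(r - mu) / (sigma + eps) for r in final_rewards]


def broadcast_to_steps(advantages, num_steps):
    """Sparse-reward baseline: every step of a trajectory gets the same advantage."""
    return [[a] * num_steps for a in advantages]


# Hypothetical final scores for 4 images sampled from one prompt
rewards = [0.2, 0.5, 0.8, 0.5]
adv = group_relative_advantages(rewards)
per_step = broadcast_to_steps(adv, num_steps=3)

for sample_id, row in enumerate(per_step):
    print(sample_id, [round(a, 3) for a in row])
```

Each row holds one repeated value, so an early step and a late step of the same trajectory receive identical credit. That is the sparsity the paper's title refers to.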

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate

Fu-Yun Wang, Han Zhang et al. · Feb 1 · arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL #flow matching #reinforcement learning

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Intermediate

Haoyou Deng, Keyu Yan et al. · Jan 28 · arXiv

DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.

#DenseGRPO #flow matching #GRPO

FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

Intermediate

Xijie Huang, Chengming Xu et al. · Jan 5 · arXiv

This paper makes video editing easier by teaching an AI to spread changes from the first frame across the whole video smoothly and accurately.

#First-Frame Propagation #Video Editing #FFP-300K

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Intermediate

Shengjun Zhang, Zhang Zhang et al. · Jan 1 · arXiv

This paper shows that when teaching image generators with reinforcement learning, only a few early, very noisy steps actually help the model learn what people like.

#E-GRPO #Group Relative Policy Optimization #Flow Matching

Position: Universal Aesthetic Alignment Narrows Artistic Expression

Intermediate

Wenqi Marshall Guo, Qingyun Qian et al. · Dec 9 · arXiv

The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.

#universal aesthetic alignment #aesthetic pluralism #reward models