This paper shows that when training image generators with reinforcement learning, only a few early, high-noise steps actually help the model learn what people like.
This paper introduces DERL, a two-level learning system that automatically builds better reward functions for reinforcement learning agents.
SPARK teaches an AI model to grade its own steps without needing reference answers.