PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.
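To make "trained together using the same rewards" concrete, here is a toy sketch. It is not PromptRL's actual algorithm. Two small categorical policies stand in for the prompt rewriter and the image model. A hypothetical `reward` function scores each (rewrite, image) pair. A REINFORCE update with a moving-average baseline then adjusts both policies using that single shared scalar. The language model, the flow-based generator, and any real reward model are not modeled here, and all names are illustrative.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    # Draw one action index according to the given probabilities.
    return rng.choices(range(len(probs)), weights=probs)[0]


def reward(rewrite_id, image_id):
    # Hypothetical shared reward: only the pair (rewrite 2, drawing mode 1) scores well.
    return 1.0 if (rewrite_id, image_id) == (2, 1) else 0.0


def train_jointly(steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    rewriter_logits = [0.0] * 3   # toy stand-in for the prompt-rewriting language model
    generator_logits = [0.0] * 2  # toy stand-in for the image model
    baseline = 0.0                # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        p_r = softmax(rewriter_logits)
        p_g = softmax(generator_logits)
        a_r = sample(p_r, rng)
        a_g = sample(p_g, rng)
        r = reward(a_r, a_g)      # one scalar shared by both policies
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: the gradient of log softmax w.r.t. the logits is one_hot(a) - p.
        for i in range(len(rewriter_logits)):
            rewriter_logits[i] += lr * advantage * ((1.0 if i == a_r else 0.0) - p_r[i])
        for i in range(len(generator_logits)):
            generator_logits[i] += lr * advantage * ((1.0 if i == a_g else 0.0) - p_g[i])
    return softmax(rewriter_logits), softmax(generator_logits)


p_rewriter, p_generator = train_jointly()
```

Because both policies receive the same reward, a good rewrite is only reinforced when the image model also does well on it. This coupling is the point of training the two together.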
This paper organizes the ways AI agents learn and improve into a single framework with four categories: A1, A2, T1, and T2.