Talk2Move is a training recipe that lets an image editor move, rotate, and resize the exact object you mention using plain text, while keeping the rest of the picture stable.
Autoregressive (AR) image models make pictures by choosing tokens one-by-one, but they were judged only on picking likely tokens, not on how good the final picture looks in pixels.
This paper teaches AI models to reason better by first copying only good examples and later learning from mistakes too.