This paper turns a popular image-guidance trick (Classifier-Free Guidance) into a feedback-control problem, just like keeping a car steady in its lane.
Humans keep a big-picture memory (a βmindscapeβ) when reading long things; this paper teaches AI to do the same.