DiffCoT treats a modelβs step-by-step thinking (Chain-of-Thought) like a messy draft that can be cleaned up over time, not something fixed forever.
This paper introduces Self-E, a text-to-image model that learns from scratch and can generate good pictures in any number of steps, from just a few to many.