CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
Intermediate · Chengzhuo Tong, Mingkun Chang et al. · Jan 15 · arXiv
This paper repurposes a video model as a step-by-step visual reasoner that produces a single final, high-quality image from a text prompt.
#Chain-of-Frame · #visual reasoning · #text-to-image