Chain-of-Thought (CoT) reasoning lets a model work through a problem step by step, but it is slow because the model must generate many intermediate tokens autoregressively, one at a time.
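The cost can be made concrete with a minimal sketch of greedy autoregressive decoding. It assumes a hypothetical `toy_next_token` function that stands in for a real model's forward pass. It shows that each new token needs its own model call and depends on every earlier token, so the calls cannot run in parallel. A long chain of reasoning therefore takes proportionally more sequential steps than a short answer.

```python
from typing import Callable, List, Tuple


# Hypothetical stand-in for a language model: maps the current token sequence
# to the next token id. A real LLM would run a full forward pass here.
def toy_next_token(tokens: List[int]) -> int:
    return (sum(tokens) + len(tokens)) % 50  # arbitrary deterministic rule


def generate(prompt: List[int], num_new_tokens: int,
             next_token: Callable[[List[int]], int]) -> Tuple[List[int], int]:
    """Greedy autoregressive decoding; returns the tokens and the number of model calls."""
    tokens = list(prompt)
    calls = 0
    for _ in range(num_new_tokens):
        # Each step reads all previous tokens, so step t must wait for step t-1.
        tokens.append(next_token(tokens))
        calls += 1
    return tokens, calls


if __name__ == "__main__":
    _, direct_calls = generate([1, 2, 3], 5, toy_next_token)    # short direct answer
    _, cot_calls = generate([1, 2, 3], 200, toy_next_token)     # long chain of thought
    print(direct_calls, cot_calls)
```

Under these assumptions, a 5-token direct answer takes 5 sequential model calls, while a 200-token chain of thought takes 200.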
This paper shows that a single attention layer is enough to turn large, semantically rich vision features into a compact latent code that an image generator can use easily.
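The summary does not say how the attention layer is configured. The sketch below is therefore one plausible reading, not the paper's method. It assumes learned-query cross-attention: `M` latent queries attend over `N` feature tokens of width `D` and produce an `M x d` code. The class name, the random (untrained) weights, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class OneLayerAttentionCompressor:
    """Hypothetical single cross-attention layer: M learned queries read N feature tokens."""

    def __init__(self, feat_dim: int, num_latents: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(feat_dim)
        # In a trained model these would be learned; here they are random placeholders.
        self.queries = rng.normal(size=(num_latents, latent_dim))
        self.w_k = rng.normal(scale=scale, size=(feat_dim, latent_dim))
        self.w_v = rng.normal(scale=scale, size=(feat_dim, latent_dim))

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        # feats: (N, feat_dim) patch features, assumed to come from a vision encoder.
        k = feats @ self.w_k                                   # (N, latent_dim)
        v = feats @ self.w_v                                   # (N, latent_dim)
        scores = self.queries @ k.T / np.sqrt(k.shape[-1])     # (M, N) scaled dot products
        attn = softmax(scores, axis=-1)                        # each latent's weights over inputs
        return attn @ v                                        # (M, latent_dim) compact code


if __name__ == "__main__":
    comp = OneLayerAttentionCompressor(feat_dim=1024, num_latents=16, latent_dim=32)
    feats = np.random.default_rng(1).normal(size=(256, 1024))
    z = comp(feats)
    print(feats.shape, "->", z.shape, "compression:", feats.size // z.size)
```

With these illustrative sizes, 256 tokens of width 1024 (262,144 numbers) become a 16 x 32 code (512 numbers), a 512x reduction. A learned-query design fixes the code size independently of `N`. Other single-layer variants, such as self-attention followed by a linear projection, would also fit the summary's description.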