Latent diffusion models generate high-quality images, but they are slow to learn what a scene means, because their training objective mostly rewards removing noise rather than understanding objects and layouts.
Normalizing Flows are invertible models: they learn a mapping that turns real images into simple noise, and the same mapping can be run backward exactly to turn noise into images.
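To make the "into noise and back again" idea concrete, here is a minimal sketch of one invertible step, assuming an affine coupling design (a common building block for normalizing flows, not necessarily the one used here). The names `coupling_forward`, `coupling_inverse`, `toy_scale`, and `toy_shift` are hypothetical, and the two toy functions stand in for what would normally be learned neural networks.

```python
import math


def toy_scale(x_a):
    # Hypothetical stand-in for a learned network that outputs log-scales.
    return [math.tanh(v) for v in x_a]


def toy_shift(x_a):
    # Hypothetical stand-in for a learned network that outputs shifts.
    return [0.5 * v for v in x_a]


def coupling_forward(x, scale_fn, shift_fn):
    """Map data x to latent z; also return log|det J| of the mapping."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even number of dimensions")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s = scale_fn(x_a)  # depends only on the untouched half
    t = shift_fn(x_a)
    z_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]
    log_det = sum(s)  # the Jacobian is triangular, so log|det| is sum(s)
    return x_a + z_b, log_det


def coupling_inverse(z, scale_fn, shift_fn):
    """Map latent z back to data x by undoing the affine step exactly."""
    if len(z) % 2 != 0:
        raise ValueError("this sketch assumes an even number of dimensions")
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    s = scale_fn(z_a)  # z_a equals x_a, so s and t are recomputed exactly
    t = shift_fn(z_a)
    x_b = [(zb - ti) * math.exp(-si) for zb, si, ti in zip(z_b, s, t)]
    return z_a + x_b


def log_likelihood(x):
    """Exact log-density of x under a standard normal prior on z."""
    z, log_det = coupling_forward(x, toy_scale, toy_shift)
    log_pz = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_pz + log_det


x = [0.3, -1.2, 0.8, 2.0]  # a tiny made-up "image" with four values
z, log_det = coupling_forward(x, toy_scale, toy_shift)
x_rec = coupling_inverse(z, toy_scale, toy_shift)
```

Here `x_rec` matches `x` up to floating-point error, which is the invertibility property in action. Because the change of variables is tracked through `log_det`, the same forward pass also gives an exact log-likelihood, which is what a flow is typically trained to maximize.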