See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
Intermediate · Jaehyun Park, Minyoung Ahn et al. · Feb 24 · arXiv
Modern image generators can still make strange mistakes like extra fingers or melted faces, and today's vision-language models (VLMs) often miss them.
#visual artifacts #structural artifacts #diffusion transformer