Similarity-based image–text models such as CLIP can be fooled by “half-truths”: adding one plausible but wrong detail to a caption can make it score as more similar to an image, not less.
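A minimal sketch of how such a score is computed, assuming CLIP-style scoring by cosine similarity between an image embedding and a caption embedding. The three-dimensional vectors below are made up for illustration; they are not real CLIP outputs and do not explain why the effect happens. They only show what "fooled" means: the caption with a wrong detail gets the higher score.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings with anonymous dimensions (not real CLIP outputs).
# In real models, adding a phrase shifts many dimensions at once.
image = [1.0, 1.0, 0.2]          # photo of a dog on grass, no frisbee in sight
true_caption = [1.0, 0.3, 0.0]   # "a dog on grass"
half_truth = [1.0, 0.9, 0.3]     # "a dog on grass catching a frisbee" (frisbee is wrong)

score_true = cosine_similarity(image, true_caption)
score_half = cosine_similarity(image, half_truth)

print(f"true caption: {score_true:.3f}")
print(f"half-truth:   {score_half:.3f}")
# The model is "fooled" if the partly wrong caption outscores the accurate one.
print("fooled:", score_half > score_true)
```

With these toy numbers the half-truth outscores the accurate caption, which is the failure mode described above: a similarity score rewards overall overlap and has no built-in penalty for a single false detail.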
LatentMorph teaches an image-making AI to quietly think in its head while it draws, instead of stopping to write out its thoughts in words.