How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing
Intermediate · Huanyu Zhang, Xuehai Bai et al. · Feb 2 · arXiv
VIBE is a new benchmark that tests how well image-editing AI models follow visual instructions such as arrows, boxes, and sketches, rather than text alone.
#visual instruction following #image editing benchmark #deictic grounding