GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation
Intermediate · Rang Li, Lei Li et al. · Dec 19 · arXiv
Visual grounding is when an AI finds the exact thing in a picture that a sentence is talking about, and this paper shows today's big vision-language AIs are not as good at it as we thought.
#visual grounding #multimodal large language models #benchmark