Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Intermediate · Qihua Dong, Kuo Yang et al. · Feb 27 · arXiv
This paper introduces Ref-Adv, a new test of whether multimodal large language models can truly match tricky referring expressions to the right object in an image.
#Referring Expression Comprehension #Visual Grounding #Multimodal Large Language Models