Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition
Intermediate · Jinlong Ma, Yu Zhang et al. · Feb 4 · arXiv
The paper teaches multimodal large language models (MLLMs) to stop relying on text alone or images alone and instead to cross-check both modalities before answering.
#GMNER #Multimodal Large Language Models #Modality Bias