From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
Intermediate | Hongrui Jia, Chaoya Jiang et al. | Feb 26 | arXiv
Large multimodal models (LMMs) can look at pictures and read text, but they still miss tricky cases, like tiny chart labels or multi-step math.
#Large Multimodal Models #Diagnostic-driven Progressive Evolution #Reinforcement Learning