Training Data Efficiency in Multimodal Process Reward Models
Intermediate · Jinyuan Li, Chengsong Huang et al. · Feb 4 · arXiv
Multimodal Process Reward Models (MPRMs) teach AI to judge each step of a picture-and-text reasoning process, not just the final answer.
#Multimodal Process Reward Model #Process Supervision #Monte Carlo Annotation