Multimodal Process Reward Models (MPRMs) teach AI to judge each step of a picture-and-text reasoning process, not just the final answer.
The paper teaches AI agents better by grading not just their final answers, but also how they think and use tools along the way.