Self-Improving VLM Judges Without Human Annotations
Intermediate · Inna Wanyin Lin, Yushi Hu et al. · Dec 2 · arXiv
The paper shows how a vision-language model (VLM) can train itself to be a fair judge of answers about images without using any human preference labels.
#vision-language model #VLM judge #reward model