Puzzle Curriculum GRPO for Vision-Centric Reasoning
Intermediate · Ahmadreza Jeddi, Hakki Can Karaimer et al. · Dec 16 · arXiv
This paper teaches vision-language models to reason about pictures using puzzles instead of expensive human labels.
#vision-language models #reinforcement learning #group-relative policy optimization