CPPO is a new way to fine‑tune vision‑language models so they see pictures more accurately before they start to reason.
This paper teaches a language model to think along several paths at the same time instead of one step after another.