This paper teaches AI models not only how to solve problems but also how to recognize when their own answers might be wrong.
The paper introduces UCoder, a method for teaching a code-generating AI to improve without any external datasets, not even unlabeled code.
The paper shows how a vision-language model (VLM) can train itself to be a fair judge of answers about images without using any human preference labels.