PhyCritic is a judge model that checks other AI models' answers about the physical world, like cooking steps, robot actions, or driving choices.
Reward models are like scorekeepers that tell AI which answers people like more, and this paper builds the first big test for scorekeepers that judge both pictures and words together.