PRISM is a new method that helps AI models reason through hard problems by checking each intermediate step, not just the final answer.
LLM judges are cheap but biased; without calibration they can completely flip which model looks best.
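
To make the ranking flip concrete, here is a minimal sketch of one standard correction (the Rogan-Gladen estimator), which may or may not be the calibration the source has in mind. It assumes a small human-labeled subset has been used to estimate, for each model, how often the judge passes a truly good answer (TPR) and how often it passes a truly bad one (FPR). The model names, pass rates, and error rates are all made-up illustrative numbers, and `corrected_rate` and `rank` are hypothetical helpers.

```python
def corrected_rate(observed: float, tpr: float, fpr: float) -> float:
    """Rogan-Gladen correction: recover the true pass rate from a noisy judge.

    The judge's observed pass rate satisfies
        observed = tpr * true_rate + fpr * (1 - true_rate),
    so solving for true_rate gives (observed - fpr) / (tpr - fpr).
    """
    if tpr <= fpr:
        raise ValueError("judge must be better than chance (tpr > fpr)")
    estimate = (observed - fpr) / (tpr - fpr)
    # Sampling noise can push the estimate slightly outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


def rank(scores: dict[str, float]) -> list[str]:
    """Return model names ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative numbers only (assumptions, not measurements).
# Fraction of each model's answers the LLM judge marked as correct:
judge_pass_rate = {"model_a": 0.70, "model_b": 0.74}

# Judge (TPR, FPR) per model, as would be estimated against human labels.
# Here the judge is assumed to be far more lenient toward model_b's bad answers.
judge_error_rates = {"model_a": (0.85, 0.10), "model_b": (0.90, 0.40)}

calibrated = {
    name: corrected_rate(judge_pass_rate[name], *judge_error_rates[name])
    for name in judge_pass_rate
}

raw_ranking = rank(judge_pass_rate)
calibrated_ranking = rank(calibrated)

print("raw ranking:       ", raw_ranking)
print("calibrated ranking:", calibrated_ranking)
```

In this toy setup the uncalibrated judge prefers `model_b`, while the corrected estimates prefer `model_a`. The flip only happens because the judge's error rates differ between the two models; a judge that is equally biased toward every model would shift the scores but leave the ordering unchanged.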