This paper asks a simple question: does reinforcement learning (RL) truly make medical vision-language models (VLMs) smarter, or does it just help them pick better among answers they already know?
The paper shows a fast, training-free way to boost an LLM's step-by-step reasoning by smartly reusing the model's own probabilities.