The paper introduces SIN-Bench, a new way to test AI models that read long scientific papers by forcing them to show exactly where their answers come from.
The paper tackles a major problem in training web-searching AI agents: rewarding only the final answer leads agents to cut corners and sometimes hallucinate.