This paper asks a simple question: do tests written by AI coding agents actually help them fix real software bugs, or do they just look helpful?
ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.