ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.
SERA is a new, low-cost way to train coding agents that learn the style and quirks of your own codebase.
Coding agents waste most of their tokens just reading giant files, which makes them slow and expensive.
SWE-EVO is a new benchmark that checks whether AI coding agents can upgrade real software projects over many steps, not just fix one small bug.