SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Minh V. T. Thai, Tue Le et al. · Dec 20 · arXiv
SWE-EVO is a new benchmark that checks whether AI coding agents can carry out long, multi-step upgrades of real software projects, not just fix a single small bug.
#SWE-EVO #software evolution #coding agents