BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
Intermediate | Guoxin Chen, Fanzhe Meng et al. | Mar 3 | arXiv
BeyondSWE is a new benchmark that tests code agents on harder, more realistic tasks than single-repo bug fixing.
#BeyondSWE #code agents #software engineering benchmark