Papers (4)

#coding agents

ContextBench: A Benchmark for Context Retrieval in Coding Agents

ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.

#context retrieval #coding agents #software engineering benchmarks

SERA: Soft-Verified Efficient Repository Agents

Intermediate

Ethan Shen, Danny Tormoen et al., Jan 28, arXiv

SERA is a new, low-cost way to train coding agents that learn the style and idiosyncrasies of your own codebase.

#SERA #Soft-Verified Generation #soft verification

SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents

Intermediate

Yuhang Wang, Yuling Shi et al., Jan 23, arXiv

Coding agents waste most of their tokens just reading giant files, which makes them slow and expensive.

#SWE-Pruner #context pruning #coding agents

SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

Intermediate

Minh V. T. Thai, Tue Le et al., Dec 20, arXiv

SWE-EVO is a new test (benchmark) that checks if AI coding agents can upgrade real software projects over many steps, not just fix one small bug.

#SWE-EVO #software evolution #coding agents
