Papers (2)

#agent trajectories

Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents

This paper asks a simple question: do tests written by AI coding agents actually help them fix real software bugs, or do they just look helpful?

#LLM agents #agent-written tests #software engineering agents

ContextBench: A Benchmark for Context Retrieval in Coding Agents

Intermediate

Han Li, Letian Zhu et al. · Feb 5 · arXiv

ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.

#context retrieval #coding agents #software engineering benchmarks
