Papers (4)

#agentic coding

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

This paper shows that code-writing AI agents can take an existing math problem and automatically turn it into a new, harder one while keeping it solvable.

#code agents #multi-agent systems #mathematical reasoning

Not triaged yet

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Intermediate

Yusong Lin, Haiyang Wang et al., Feb 11, arXiv

CLI-Gym is a new way to generate many realistic command-line repair tasks for AI agents by safely breaking, and then repairing, software environments inside containers.

#agentic coding #command line interface #Dockerfile

Not triaged yet

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Intermediate

Qixing Zhou, Jiacheng Zhang et al., Feb 11, arXiv

FeatureBench is a new benchmark that tests AI coding agents on building real software features, not just on fixing small bugs.

#FeatureBench #agentic coding #execution-based evaluation

Not triaged yet

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Intermediate

Zimu Lu, Houxing Ren et al., Feb 3, arXiv

This paper builds a multi-agent AI system that can create real full-stack websites (frontend, backend, and database) from plain-English instructions.

#agentic coding #multi-agent systems #full-stack development

Not triaged yet