Papers (2)

#unit tests

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Yusong Lin, Haiyang Wang et al. · Feb 11 · arXiv

CLI-Gym is a new way to create large numbers of realistic environment-repair tasks for AI agents by safely breaking and then repairing software environments inside containers.

#agentic coding #command line interface #Dockerfile

Not triaged yet

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

Intermediate

Qixing Zhou, Jiacheng Zhang et al. · Feb 11 · arXiv

FeatureBench is a new benchmark that tests AI coding agents on building real software features, not just fixing small bugs.

#FeatureBench #agentic coding #execution-based evaluation

Not triaged yet