CLI-Gym is a new way to create lots of realistic computer-fixing tasks for AI by safely breaking and then repairing software environments inside containers.
This paper teaches code AIs to work more like real software engineers by training them partway through their learning (mid-training) on real development workflows.
SAGE is a two-agent system that automatically writes tough, multi-step search questions and checks them by actually trying to solve them.
The paper introduces UCoder, a way to teach a code-generating AI to get better without using any outside datasets, not even unlabeled code.