Papers (8)

#software engineering agents

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Shuang Sun, Huatong Song et al. · Feb 3 · arXiv

SWE-World lets code-fixing AI agents practice and learn without heavy Docker containers by using learned models that simulate the execution environment and its test results.

#SWE-World · #software engineering agents · #Docker-free training

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Intermediate

Mouxiang Chen, Lei Zhang et al. · Feb 2 · arXiv

SWE-Universe is a factory-like pipeline that turns real GitHub pull requests into safe, reproducible coding environments with automatic verifiers.

#SWE-Universe · #software engineering agents · #pull requests

Closing the Loop: Universal Repository Representation with RPG-Encoder

Intermediate

Jane Luo, Chengyu Yin et al. · Feb 2 · arXiv

The paper introduces RPG-Encoder, which encodes an entire code repository as a single unified map (a Repository Planning Graph) that combines meaning (semantics) with structure (dependencies).

#Repository Planning Graph · #RPG-Encoder · #semantic lifting

MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Intermediate

Chuanzhe Guo, Jingjing Wu et al. · Jan 30 · arXiv

This paper introduces MEnvAgent, a team of cooperating AI agents that automatically sets up the right executable environments for code projects across many programming languages.

#environment construction · #software engineering agents · #Fail-to-Pass (F2P)
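The Fail-to-Pass (F2P) tag refers to a common check for verifiable software engineering tasks: a test counts as F2P if it fails on the original code and passes once the fix is applied, while tests that already passed should keep passing. The Python sketch below illustrates that check, assuming test outcomes are available as a mapping from test name to pass/fail. The function names and the exact acceptance rule are illustrative assumptions, not MEnvAgent's actual pipeline.

```python
def fail_to_pass(before: dict[str, bool], after: dict[str, bool]) -> set[str]:
    """Tests that failed before the fix and pass after it (F2P)."""
    return {
        name for name, passed in after.items()
        if passed and before.get(name) is False
    }


def broken_tests(before: dict[str, bool], after: dict[str, bool]) -> set[str]:
    """Tests that passed before the fix but fail (or vanish) after it."""
    return {
        name for name, passed in before.items()
        if passed and not after.get(name, False)
    }


def is_verifiable(before: dict[str, bool], after: dict[str, bool]) -> bool:
    # Hypothetical acceptance rule: the fix must flip at least one test
    # from failing to passing and must not break any passing test.
    return bool(fail_to_pass(before, after)) and not broken_tests(before, after)


if __name__ == "__main__":
    before = {"test_parse": False, "test_io": True}
    after = {"test_parse": True, "test_io": True}
    print(sorted(fail_to_pass(before, after)))  # ['test_parse']
    print(is_verifiable(before, after))         # True
```

In this framing, an environment that cannot reproduce the F2P signal (for example, a missing dependency makes every test fail both before and after the fix) would be rejected as unverifiable.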

daVinci-Dev: Agent-native Mid-training for Software Engineering

Intermediate

Ji Zeng, Dayuan Fu et al. · Jan 26 · arXiv

This paper teaches code models to work more like real software engineers through agentic mid-training: an intermediate training stage, between pre-training and fine-tuning, that uses data drawn from real development workflows.

#agentic mid-training · #agent-native data · #contextually-native trajectories

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Intermediate

Mike A. Merrill, Alexander G. Shaw et al. · Jan 17 · arXiv

Terminal-Bench 2.0 is a challenging benchmark that measures how well AI agents solve realistic, professional tasks by running commands in a computer terminal.

#Terminal-Bench · #command line interface · #Docker containers

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

Intermediate

Lei Zhang, Mouxiang Chen et al. · Jan 12 · arXiv

MegaFlow is a distributed orchestration system that lets thousands of AI agents train and be evaluated on large, complex tasks (such as fixing real software bugs) concurrently, reliably, and cost-efficiently.

#agent orchestration · #distributed systems · #event-driven architecture

SWE-RM: Execution-free Feedback For Software Engineering Agents

Intermediate

KaShun Shum, Binyuan Hui et al. · Dec 26 · arXiv

Coding agents that fix software rely on feedback, but unit tests give only coarse pass/fail signals that are often noisy or unavailable. SWE-RM instead provides execution-free feedback through a reward model.

#execution-free feedback · #reward model · #software engineering agents