Papers (5)

#code agents

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

This paper shows that code-writing AI agents can take an existing math problem and automatically turn it into a new, harder one while keeping it solvable.

#code agents · #multi-agent systems · #mathematical reasoning


BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?

Intermediate

Guoxin Chen, Fanzhe Meng et al. · Mar 3 · arXiv

BeyondSWE is a new benchmark that tests code agents on harder, more realistic tasks than single-repo bug fixing.

#BeyondSWE · #code agents · #software engineering benchmark


MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences

Beginner

Qihao Wang, Ziming Cheng et al. · Jan 11 · arXiv

MemGovern teaches code agents to learn from past human fixes on GitHub by turning messy discussions into clean, reusable "experience cards."

#MemGovern · #experience governance · #agentic search


Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Intermediate

Junru Lu, Jiarui Qin et al. · Dec 31 · arXiv

Youtu-LLM is a small (1.96B-parameter) language model trained from scratch to think, plan, and act like an agent rather than just imitating larger models.

#lightweight LLM · #agentic mid-training · #trajectory data


DeepCode: Open Agentic Coding

Beginner

Zongwei Li, Zhonghang Li et al. · Dec 8 · arXiv

DeepCode is an AI coding system that turns long, complicated papers into full, working code repositories.

#agentic coding · #document-to-code · #information-flow management
