Papers (8)

#SWE-bench

GLM-5: from Vibe Coding to Agentic Engineering

GLM-5 Team, Aohan Zeng et al. · Feb 17 · arXiv

GLM-5 is a new open-weight AI model that moves from 'vibe coding' (prompting the model to write code) to 'agentic engineering' (letting the model plan, build, test, and fix software on its own).

#GLM-5 #Agentic Engineering #DeepSeek Sparse Attention

CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

Intermediate

Yusong Lin, Haiyang Wang et al. · Feb 11 · arXiv

CLI-Gym is a new way to create lots of realistic computer-fixing tasks for AI by safely breaking and then repairing software environments inside containers.

#agentic coding #command line interface #Dockerfile

ContextBench: A Benchmark for Context Retrieval in Coding Agents

Intermediate

Han Li, Letian Zhu et al. · Feb 5 · arXiv

ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.

#context retrieval #coding agents #software engineering benchmarks

Closing the Loop: Universal Repository Representation with RPG-Encoder

Intermediate

Jane Luo, Chengyu Yin et al. · Feb 2 · arXiv

The paper introduces RPG-Encoder, a way to turn a whole code repository into one clear map that mixes meaning (semantics) with structure (dependencies).

#Repository Planning Graph #RPG-Encoder #semantic lifting

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Intermediate

Mohan Jiang, Dayuan Fu et al. · Feb 2 · arXiv

Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.

#long-horizon agency #pull request chains #software evolution

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

Intermediate

Jie Yang, Honglin Guo et al. · Jan 16 · arXiv

ABC-Bench is a new test that checks if AI coding agents can really do backend work from start to finish, not just write a few lines of code.

#ABC-Bench #agentic backend coding #end-to-end API testing

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Intermediate

Weixun Wang, XiaoXiao Xu et al. · Dec 31 · arXiv

This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.

#agentic LLMs #reinforcement learning #IPA

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Intermediate

Boxin Wang, Chankyu Lee et al. · Dec 15 · arXiv

The paper introduces Nemotron-Cascade, a step-by-step (cascaded) reinforcement learning recipe that trains an AI one domain at a time, across domains like alignment, instruction following, math, coding, and software engineering.

#Cascaded Reinforcement Learning #RLHF #Instruction-Following RL
