GLM-5 is a new open-weight AI model that moves from 'vibe coding' (prompting the model to write code) to 'agentic engineering' (letting the model plan, build, test, and fix software on its own).
CLI-Gym is a new way to create lots of realistic computer-fixing tasks for AI by safely breaking and then repairing software environments inside containers.
ContextBench is a new benchmark that checks not just whether a coding AI fixes a bug, but whether it found and used the right pieces of code along the way.
The paper introduces RPG-Encoder, a way to turn a whole code repository into one clear map that mixes meaning (semantics) with structure (dependencies).
Long tasks trip up most AIs because they lose track of goals and make small mistakes that snowball over many steps.
ABC-Bench is a new test that checks if AI coding agents can really do backend work from start to finish, not just write a few lines of code.
This paper builds an open, end-to-end ecosystem (ALE) that lets AI agents plan, act, and fix their own mistakes across many steps in real computer environments.
The paper introduces Nemotron-Cascade, a step-by-step (cascaded) reinforcement learning recipe that trains an AI across domains like alignment, instructions, math, coding, and software engineering, one at a time.