Before this work, computer-using AI agents were trained mostly by imitating previously recorded examples, and they struggled with long, multi-step tasks on real computers.
Small AI models often stumble when a tool call fails: they get stuck repeating bad calls instead of fixing the mistake.
The paper shows that first training a language model with a “reward-shaped” next-token objective can make later reinforcement learning (RL) work much better.
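
The summary above does not spell out what the “reward-shaped” objective looks like, so the following is only a minimal sketch of one possible reading: an ordinary next-token negative log-likelihood in which each step's loss is scaled by a reward weight. The function names (`log_softmax`, `reward_weighted_nll`), the per-step `rewards` list, and the normalization by total weight are all assumptions made for illustration, not the paper's actual formulation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reward_weighted_nll(logits_per_step, target_ids, rewards):
    """Next-token negative log-likelihood with each step scaled by a reward weight.

    Hypothetical illustration: a weight of 0 drops a step from the loss,
    and larger weights make the model imitate that step more strongly.
    """
    total = 0.0
    weight_sum = 0.0
    for logits, target, reward in zip(logits_per_step, target_ids, rewards):
        log_probs = log_softmax(logits)
        total += -reward * log_probs[target]
        weight_sum += abs(reward)
    # Normalize by total weight so the scale is comparable across sequences.
    return total / weight_sum if weight_sum > 0 else 0.0


# Two steps over a two-token vocabulary; the second step has reward 0,
# so only the first step contributes to the loss.
loss = reward_weighted_nll(
    logits_per_step=[[0.0, 0.0], [5.0, 0.0]],
    target_ids=[0, 1],
    rewards=[1.0, 0.0],
)
```

With all rewards set to 1, this sketch reduces to the usual average next-token loss, which is one way to see how a reward signal could be folded into standard language-model training before RL begins.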