Before this work, computer-using AIs mostly copied old examples and struggled with long step-by-step tasks on real computers.
Computer-using agents usually click like a woodpecker, but they struggle to drag smoothly like a human hand; this paper addresses that gap.
SmartSnap teaches an agent not only to finish a phone task but also to prove it with a few perfect snapshots it picks itself.
This paper builds Step-GUI, a pair of small-but-strong GUI agent models (4B and 8B parameters) that can use phones and computers by looking at screenshots and following instructions.