Step-GUI Technical Report
Intermediate · Haolong Yan, Jia Wang et al. · Dec 17 · arXiv
This paper builds Step-GUI, a pair of small-but-strong GUI agent models (4B/8B) that can use phones and computers by looking at screenshots and following instructions.
#GUI automation #multimodal large language models #trajectory-level calibration