FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
Intermediate | Mingyu Ouyang, Kevin Qinghong Lin et al. | Jan 7 | arXiv
FocusUI makes computer-using AI faster while preserving accuracy by processing only the important parts of a screen.
#UI grounding #vision-language models #visual token pruning