ExStrucTiny is a new test (benchmark) that checks if AI can pull many connected facts from all kinds of documents and neatly put them into JSON, even when the question style and schema change.
This paper builds Step-GUI, a pair of small-but-strong GUI agent models (4B/8B) that can use phones and computers by looking at screenshots and following instructions.
This paper builds A4-Agent, a smart three-part helper that figures out where to touch or use an object just from a picture and a written instruction, without any extra training.