How I Study AI - Learn AI Papers & Lectures the Easy Way

ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

Mathieu Sibue, Andres Muñoz Garza et al. · Feb 12 · arXiv

ExStrucTiny is a new test (benchmark) that checks whether AI can pull many connected facts from all kinds of document images and neatly organize them into JSON, even when the question style and the schema change.

#structured information extraction #document understanding #vision-language models

Step-GUI Technical Report

Intermediate

Haolong Yan, Jia Wang et al. · Dec 17 · arXiv

This paper builds Step-GUI, a pair of small but strong GUI agent models (4B and 8B parameters) that can operate phones and computers by looking at screenshots and following instructions.

#GUI automation #multimodal large language models #trajectory-level calibration

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

Intermediate

Zixin Zhang, Kanghao Chen et al. · Dec 16 · arXiv

This paper builds A4-Agent, a smart three-part helper that figures out where to touch or use an object just from a picture and a written instruction, without any extra training.

#affordance prediction #zero-shot learning #vision-language models