How I Study AI - Learn AI Papers & Lectures the Easy Way

ObjEmbed: Towards Universal Multimodal Object Embeddings

Intermediate

Shenghao Fu, Yukun Su et al. · Feb 2 · arXiv

ObjEmbed teaches an AI to understand not just whole pictures, but each object inside them, and to link those objects to the right words.

#object embeddings · #IoU embedding · #visual grounding

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

Intermediate

Zixin Zhang, Kanghao Chen et al. · Dec 16 · arXiv

This paper builds A4-Agent, a three-part agent that figures out where to touch or use an object from just a picture and a written instruction, without any extra training.

#affordance prediction · #zero-shot learning · #vision-language models

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

Intermediate

Yulu Gan, Ligeng Zhu et al. · Dec 11 · arXiv

FoundationMotion is a fully automatic pipeline that turns raw videos into detailed motion data, captions, and quizzes about how things move.

#motion understanding · #spatio-temporal reasoning · #video question answering
