๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers28

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#vision-language models

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

Intermediate
Zixin Zhang, Kanghao Chen et al.Dec 16arXiv

This paper builds A4-Agent, a smart three-part helper that figures out where to touch or use an object just from a picture and a written instruction, without any extra training.

#affordance prediction#zero-shot learning#vision-language models

FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

Intermediate
Yulu Gan, Ligeng Zhu et al.Dec 11arXiv

FoundationMotion is a fully automatic pipeline that turns raw videos into detailed motion data, captions, and quizzes about how things move.

#motion understanding#spatio-temporal reasoning#video question answering

From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models

Intermediate
Zongzhao Li, Xiangzhe Kong et al.Dec 11arXiv

The paper defines Microscopic Spatial Intelligence (MiSI) as the skill AI needs to understand tiny 3D things like molecules from 2D pictures and text, just like scientists do.

#microscopic spatial intelligence#vision-language models#orthographic projection

M3DR: Towards Universal Multilingual Multimodal Document Retrieval

Intermediate
Adithya S Kolavi, Vyoman JainDec 3arXiv

The paper introduces M3DR, a way for computers to find the right document image no matter which of 22 languages the query or the document uses.

#multilingual retrieval#multimodal retrieval#document image search
123