๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐Ÿ“Daily Log๐ŸŽฏPrompts๐Ÿง Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Set-of-Marks

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

Intermediate
Qihua Dong, Kuo Yang et al.Feb 27arXiv

This paper builds a new test called Ref-Adv to check if AI can truly match tricky sentences to the right thing in a picture.

#Referring Expression Comprehension#Visual Grounding#Multimodal Large Language Models

WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

Intermediate
Hao Bai, Alexey Taymanov et al.Jan 5arXiv

WebGym is a giant practice world (almost 300,000 tasks) that lets AI web agents learn on real, ever-changing websites instead of tiny, fake ones.

#WebGym#visual web agents#vision-language models

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Intermediate
Chiao-An Yang, Ryo Hachiuma et al.Dec 18arXiv

This paper teaches a video-understanding AI to think in 3D plus time (4D) so it can answer questions about specific objects moving in videos.

#4D perception#multimodal large language models#perceptual distillation