πŸŽ“How I Study AIHISA
πŸ“–Read
πŸ“„PapersπŸ“°Blogs🎬Courses
πŸ’‘Learn
πŸ›€οΈPathsπŸ“šTopicsπŸ’‘Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#visual question answering

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Intermediate
Yu Zeng, Wenxuan Huang et al.Feb 2arXiv

The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.

#multimodal large language model#visual question answering#vision deep research

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Intermediate
Wenxuan Huang, Yu Zeng et al.Jan 29arXiv

The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.

#multimodal deep research#visual question answering#ReAct reasoning

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Beginner
Dasol Choi, Guijin Son et al.Jan 7arXiv

Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.

#vision-language models#under-specified queries#query explicitation

Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

Intermediate
Qihao Liu, Chengzhi Mao et al.Dec 18arXiv

AuditDM is a friendly 'auditor' model that hunts for where vision-language models get things wrong and then creates the right practice to fix them.

#AuditDM#model auditing#cross-model divergence