How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)


MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

Beginner
Jiachun Li, Shaoping Huang et al. · Mar 2 · arXiv

MMR-Life is a new test (benchmark) that checks how AI understands everyday situations using several real photos at once.

#multimodal reasoning · #multi-image understanding · #real-life benchmark
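For readers new to benchmarks: a multi-image benchmark like this boils down to an evaluation loop where the model sees every photo of a scene at once. Below is a minimal sketch; the `Item` fields and the `model.answer(images, question)` call are assumptions for illustration, not MMR-Life's actual data format or API.

```python
# Hypothetical sketch of a multi-image QA evaluation loop.
# The Item fields and model interface are assumptions, not MMR-Life's API.
from dataclasses import dataclass

@dataclass
class Item:
    images: list[str]   # paths to several photos of one real-life scene
    question: str
    answer: str         # gold answer

def evaluate(model, items: list[Item]) -> float:
    """Exact-match accuracy over multi-image questions."""
    correct = 0
    for item in items:
        # The model sees ALL images for the scene at once, not one by one.
        prediction = model.answer(item.images, item.question)
        correct += prediction.strip().lower() == item.answer.strip().lower()
    return correct / len(items)
```

The key design point is in the loop: the question is only answerable from the set of images together, which is what separates multi-image reasoning from ordinary single-image VQA.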

Thinking with Drafting: Optical Decompression via Logical Reconstruction

Beginner
Jingxuan Wei, Honghao He et al. · Feb 12 · arXiv

The paper tackles a common problem in AI: models can read pictures and text well, but they often get the logic behind them wrong.

#Thinking with Drafting · #optical decompression · #visual algebra

When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Beginner
Jiacheng Hou, Yining Sun et al. · Feb 10 · arXiv

Modern image editors can now follow visual prompts like arrows and scribbles, which opens a new way for attackers to hide harmful instructions inside images.

#vision-centric jailbreak · #image editing safety · #visual prompts
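To make "visual prompt" concrete, here is a toy sketch that draws an arrow and a scribbled circle onto an image with PIL. To a vision-centric editor, marks like these act as instructions, which is exactly the untrusted input channel the paper studies. This is illustrative only, not the paper's attack code.

```python
# Minimal sketch of a "visual prompt": an instruction drawn directly onto
# the image rather than typed as text. Illustrative only; not attack code.
from PIL import Image, ImageDraw

canvas = Image.new("RGB", (512, 512), "white")
draw = ImageDraw.Draw(canvas)

# An arrow pointing at a region, plus a scribbled circle around it:
# to a vision-centric editor, these marks *are* the prompt.
draw.line([(100, 400), (250, 260)], fill="red", width=5)        # arrow shaft
draw.polygon([(250, 260), (230, 290), (265, 285)], fill="red")  # arrow head
draw.ellipse([(220, 200), (340, 320)], outline="red", width=5)  # scribbled circle

canvas.save("visual_prompt_example.png")
```

Because these marks travel inside the image itself, text-level safety filters never see them, which is why the paper treats this as a new jailbreak surface.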

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Beginner
Honglin Lin, Chonghan Qin et al. · Jan 17 · arXiv

The paper studies how to make and judge scientific images that are not just pretty but scientifically correct.

#scientific image synthesis · #text-to-image (T2I) · #programmatic diagram generation
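The "programmatic diagram generation" tag refers to building figures from explicit quantities rather than raw pixels. A toy example, not the paper's pipeline: because the plot below is generated from numbers, its axes, units, and labels are correct by construction, unlike a purely pixel-based text-to-image render.

```python
# Toy instance of programmatic diagram generation: the figure is built
# from explicit quantities, so labels and units cannot drift from the data.
# (Illustrative only; not the paper's pipeline.)
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 0.02, 500)            # time in seconds
v = 5.0 * np.sin(2 * np.pi * 50 * t)     # 50 Hz sine wave, 5 V amplitude

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(t * 1e3, v)
ax.set_xlabel("time (ms)")
ax.set_ylabel("voltage (V)")
ax.set_title("50 Hz sine wave, 5 V amplitude")
fig.savefig("sine_diagram.png", dpi=150, bbox_inches="tight")
```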

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Beginner
Jinyang Wu, Guocheng Zhai et al. · Jan 7 · arXiv

ATLAS is a system that picks the best mix of AI models and helper tools for each question, instead of using just one model or a fixed tool plan.

#ATLAS · #LLM routing · #tool augmentation
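As a rough picture of what per-question routing means, here is a minimal sketch in Python. The route names, keyword heuristic, and handlers are invented for illustration; ATLAS's actual orchestration is far more sophisticated than a keyword lookup.

```python
# Minimal sketch of per-question routing over heterogeneous models and
# tools. Names and heuristics are invented for illustration, not ATLAS's.
from typing import Callable

# Each route pairs a capability tag with a handler (model or tool).
ROUTES: dict[str, Callable[[str], str]] = {
    "math":    lambda q: f"[calculator tool] {q}",
    "code":    lambda q: f"[code model] {q}",
    "general": lambda q: f"[general LLM] {q}",
}

def route(question: str) -> str:
    """Pick a handler from shallow keyword cues (a stand-in for a learned router)."""
    lowered = question.lower()
    if any(k in lowered for k in ("integral", "solve", "compute")):
        return ROUTES["math"](question)
    if any(k in lowered for k in ("python", "bug", "function")):
        return ROUTES["code"](question)
    return ROUTES["general"](question)

print(route("Compute the integral of x^2 from 0 to 1"))
```

The contrast the paper draws is with the two ends this sketch sits between: always calling one model, or always running one fixed tool plan.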

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Beginner
Dasol Choi, Guijin Son et al. · Jan 7 · arXiv

Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.

#vision-language models · #under-specified queries · #query explicitation
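"Query explicitation" means rewriting a vague question into a fully specified one before the model answers. A minimal sketch of that idea, assuming a two-step rewrite-then-answer flow; the prompt wording and helper name are hypothetical, not the paper's method.

```python
# Sketch of "query explicitation": rewrite a vague image-grounded question
# into an explicit one before the VLM answers. The prompt and two-step
# flow are assumptions for illustration, not the paper's method.
def explicitate(vague_query: str, image_description: str) -> str:
    """Build a rewriting prompt for an LLM; the LLM call itself is stubbed out."""
    return (
        "Rewrite the user's question so it is fully specified.\n"
        f"Image context: {image_description}\n"
        f"User question: {vague_query}\n"
        "Explicit question:"
    )

prompt = explicitate("Is this okay to eat?", "a photo of leftover rice in a fridge")
print(prompt)  # would be sent to an LLM, whose rewrite then goes to the VLM
```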