How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)


MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning

Beginner
Jiachun Li, Shaoping Huang et al. · Mar 2 · arXiv

MMR-Life is a new test (benchmark) that checks how AI understands everyday situations using several real photos at once.

#multimodal reasoning · #multi-image understanding · #real-life benchmark
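For readers new to benchmarks: a multi-image benchmark like this boils down to an evaluation loop where the model sees every photo of a scene at once. Below is a minimal sketch; the `Item` fields and the `model.answer(images, question)` call are assumptions for illustration, not MMR-Life's actual data format or API.

```python
# Hypothetical sketch of a multi-image QA evaluation loop.
# The Item fields and model interface are assumptions, not MMR-Life's API.
from dataclasses import dataclass

@dataclass
class Item:
    images: list[str]   # paths to several photos of one real-life scene
    question: str
    answer: str         # gold answer

def evaluate(model, items: list[Item]) -> float:
    """Exact-match accuracy over multi-image questions."""
    correct = 0
    for item in items:
        # The model sees ALL images for the scene at once, not one by one.
        prediction = model.answer(item.images, item.question)
        correct += prediction.strip().lower() == item.answer.strip().lower()
    return correct / len(items)
```

The key design point is in the loop: the question is only answerable from the set of images together, which is what separates multi-image reasoning from ordinary single-image VQA.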

Thinking with Drafting: Optical Decompression via Logical Reconstruction

Beginner
Jingxuan Wei, Honghao He et al. · Feb 12 · arXiv

The paper tackles a common problem in AI: models can read pictures and text well, but they often get the logic behind them wrong.

#Thinking with Drafting · #optical decompression · #visual algebra

When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Beginner
Jiacheng Hou, Yining Sun et al. · Feb 10 · arXiv

Modern image editors can now follow visual prompts like arrows and scribbles, which opens a new way for attackers to hide harmful instructions inside images.

#vision-centric jailbreak · #image editing safety · #visual prompts
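To make "visual prompt" concrete, here is a toy sketch that draws an arrow and a scribbled circle onto an image with PIL. To a vision-centric editor, marks like these act as instructions, which is exactly the untrusted input channel the paper studies. This is illustrative only, not the paper's attack code.

```python
# Minimal sketch of a "visual prompt": an instruction drawn directly onto
# the image rather than typed as text. Illustrative only; not attack code.
from PIL import Image, ImageDraw

canvas = Image.new("RGB", (512, 512), "white")
draw = ImageDraw.Draw(canvas)

# An arrow pointing at a region, plus a scribbled circle around it:
# to a vision-centric editor, these marks *are* the prompt.
draw.line([(100, 400), (250, 260)], fill="red", width=5)        # arrow shaft
draw.polygon([(250, 260), (230, 290), (265, 285)], fill="red")  # arrow head
draw.ellipse([(220, 200), (340, 320)], outline="red", width=5)  # scribbled circle

canvas.save("visual_prompt_example.png")
```

Because these marks travel inside the image itself, text-level safety filters never see them, which is why the paper treats this as a new jailbreak surface.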

Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Beginner
Honglin Lin, Chonghan Qin et al. · Jan 17 · arXiv

The paper studies how to make and judge scientific images that are not just pretty but scientifically correct.

#scientific image synthesis · #text-to-image (T2I) · #programmatic diagram generation
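The "programmatic diagram generation" tag refers to building figures from explicit quantities rather than raw pixels. A toy example, not the paper's pipeline: because the plot below is generated from numbers, its axes, units, and labels are correct by construction, unlike a purely pixel-based text-to-image render.

```python
# Toy instance of programmatic diagram generation: the figure is built
# from explicit quantities, so labels and units cannot drift from the data.
# (Illustrative only; not the paper's pipeline.)
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 0.02, 500)            # time in seconds
v = 5.0 * np.sin(2 * np.pi * 50 * t)     # 50 Hz sine wave, 5 V amplitude

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(t * 1e3, v)
ax.set_xlabel("time (ms)")
ax.set_ylabel("voltage (V)")
ax.set_title("50 Hz sine wave, 5 V amplitude")
fig.savefig("sine_diagram.png", dpi=150, bbox_inches="tight")
```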

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

Beginner
Jinyang Wu, Guocheng Zhai et al. · Jan 7 · arXiv

ATLAS is a system that picks the best mix of AI models and helper tools for each question, instead of using just one model or a fixed tool plan.

#ATLAS · #LLM routing · #tool augmentation
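As a rough picture of what per-question routing means, here is a minimal sketch in Python. The route names, keyword heuristic, and handlers are invented for illustration; ATLAS's actual orchestration is far more sophisticated than a keyword lookup.

```python
# Minimal sketch of per-question routing over heterogeneous models and
# tools. Names and heuristics are invented for illustration, not ATLAS's.
from typing import Callable

# Each route pairs a capability tag with a handler (model or tool).
ROUTES: dict[str, Callable[[str], str]] = {
    "math":    lambda q: f"[calculator tool] {q}",
    "code":    lambda q: f"[code model] {q}",
    "general": lambda q: f"[general LLM] {q}",
}

def route(question: str) -> str:
    """Pick a handler from shallow keyword cues (a stand-in for a learned router)."""
    lowered = question.lower()
    if any(k in lowered for k in ("integral", "solve", "compute")):
        return ROUTES["math"](question)
    if any(k in lowered for k in ("python", "bug", "function")):
        return ROUTES["code"](question)
    return ROUTES["general"](question)

print(route("Compute the integral of x^2 from 0 to 1"))
```

The contrast the paper draws is with the two ends this sketch sits between: always calling one model, or always running one fixed tool plan.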

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

Beginner
Dasol Choi, Guijin Son et al. · Jan 7 · arXiv

Real people often ask vague questions with pictures, and today’s vision-language models (VLMs) struggle with them.

#vision-language models · #under-specified queries · #query explicitation
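"Query explicitation" means rewriting a vague question into a fully specified one before the model answers. A minimal sketch of that idea, assuming a two-step rewrite-then-answer flow; the prompt wording and helper name are hypothetical, not the paper's method.

```python
# Sketch of "query explicitation": rewrite a vague image-grounded question
# into an explicit one before the VLM answers. The prompt and two-step
# flow are assumptions for illustration, not the paper's method.
def explicitate(vague_query: str, image_description: str) -> str:
    """Build a rewriting prompt for an LLM; the LLM call itself is stubbed out."""
    return (
        "Rewrite the user's question so it is fully specified.\n"
        f"Image context: {image_description}\n"
        f"User question: {vague_query}\n"
        "Explicit question:"
    )

prompt = explicitate("Is this okay to eat?", "a photo of leftover rice in a fridge")
print(prompt)  # would be sent to an LLM, whose rewrite then goes to the VLM
```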