Papers7

#multi-hop reasoning

MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning

Chenhao Zhang, Yazhe Niu et al.Feb 11arXiv

Pictures can hide deeper meanings, like a wilted plant meaning someone feels burned out; most AI models miss these hints.

#image metaphor understanding#image implication#visual reinforcement learning

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models

Intermediate

Yu Zeng, Wenxuan Huang et al.Feb 2arXiv

The paper introduces VDR-Bench, a new test with 2,000 carefully built questions that truly require both seeing (images) and reading (web text) to find answers.

#multimodal large language model#visual question answering#vision deep research

SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback

Intermediate

Fangyuan Xu, Rujun Han et al.Jan 26arXiv

SAGE is a two-agent system that automatically writes tough, multi-step search questions and checks them by actually trying to solve them.

#deep search#agentic data generation#execution feedback

Agentic Very Long Video Understanding

Intermediate

Aniket Rege, Arka Sadhu et al.Jan 26arXiv

The paper tackles understanding super long, first‑person videos (days to a week) by giving an AI a smarter memory and better tools.

#entity scene graph#agentic planning#long-horizon video understanding

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

Intermediate

Seongyun Lee, Yongrae Jo et al.Jan 12arXiv

The paper shows that when we give AI lots of extra text, even harmless extra text, it can get badly confused—sometimes losing up to 80% of its accuracy.

#NoisyBench#Rationale-Aware Reward#RARE

Dr. Zero: Self-Evolving Search Agents without Training Data

Intermediate

Zhenrui Yue, Kartikeya Upasani et al.Jan 11arXiv

Dr. Zero is a pair of AI agents (a Proposer and a Solver) that teach each other to do web-search-based reasoning without any human-written training data.

#Dr. Zero#self-evolution#proposer-solver

Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning

Intermediate

Yuyang Hu, Jiongnan Liu et al.Jan 8arXiv

This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.

#event-centric memory#Event Graph#logic-aware retrieval