How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (14)

Tag: #long-horizon reasoning

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Intermediate
Zhaochen Su, Jincheng Gao et al. · Feb 26 · arXiv

AgentVista is a new test (benchmark) that checks whether AI agents can solve tough, real-life picture-based problems by using multiple tools over many steps.

#AgentVista · #multimodal agents · #visual grounding

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Intermediate
Qianben Chen, Tianrui Qin et al. · Feb 26 · arXiv

This paper shows that letting an AI search many places at the same time (in parallel) can beat making it think in long, slow chains.

#agentic search · #parallel evidence acquisition · #plan refinement
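The core contrast in this summary — issuing many searches at once instead of one long sequential chain — can be sketched in a few lines. Everything here is illustrative: `fetch_evidence` is a hypothetical stand-in for a real search-API call, not anything from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_evidence(query: str) -> str:
    # Hypothetical stand-in for a real web-search or retrieval call.
    return f"evidence for: {query}"


def parallel_search(queries: list[str]) -> list[str]:
    # Fire all queries concurrently rather than reasoning through a long
    # serial chain and searching one step at a time.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(fetch_evidence, queries))


results = parallel_search(["query A", "query B", "query C"])
```

The agent can then refine its plan once, over all the gathered evidence, instead of re-thinking after every single lookup.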

Free(): Learning to Forget in Malloc-Only Reasoning Models

Intermediate
Yilun Zheng, Dongyang Ma et al. · Feb 8 · arXiv

LLMs can think for many steps, but when they keep every step forever, the extra tokens turn into noise and make answers worse, not better.

#Free()LM · #self-forgetting · #context pruning
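The "forgetting" idea in the blurb — dropping stale reasoning steps so they stop polluting the context — reduces to a small pruning rule. This is a minimal sketch of context pruning in general, not the paper's actual method; the function name and the keep-the-most-recent policy are assumptions.

```python
def prune_context(steps: list[str], keep_last: int = 2) -> list[str]:
    # Keep only the most recent reasoning steps; everything older is
    # "freed", so stale tokens stop crowding the context window.
    return steps[-keep_last:] if len(steps) > keep_last else steps


history = ["step 1", "step 2", "step 3", "step 4"]
pruned = prune_context(history, keep_last=2)
```

A learned variant would decide *which* steps to free rather than always dropping the oldest, but the interface is the same: context in, smaller context out.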

AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

Beginner
Xianyang Liu, Shangding Gu et al. · Feb 5 · arXiv

AgenticPay is a safe playground where AI agents practice buying and selling by talking, not just by typing numbers.

#multi-agent negotiation · #language-mediated bargaining · #LLM agents

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Intermediate
Ian Wu, Yuxiao Qu et al. · Feb 3 · arXiv

Reasoning Cache (RC) is a new way for AI to think in steps: it writes some thoughts, makes a short summary, throws away the long thoughts, and then keeps going using only the summary.

#Reasoning Cache · #iterative decoding · #summary-conditioned reasoning
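The loop described in the summary — think a little, compress, discard, continue from the summary — can be sketched as follows. This is a toy illustration under my own assumptions: `run` and the f-string "summarizer" are hypothetical stand-ins for LLM calls, not the paper's implementation.

```python
def run(problem: str, rounds: int = 3, burst: int = 4) -> str:
    summary = problem
    for r in range(rounds):
        # 1) Generate a short burst of thoughts, conditioned ONLY on the
        #    current summary (never on the full history).
        thoughts = [f"{summary} -> step {i}" for i in range(burst)]
        # 2) Compress the burst into a fresh, fixed-size summary
        #    (stand-in for an LLM summarizer call).
        summary = f"summary(round {r}: {len(thoughts)} thoughts)"
        # 3) The long thoughts are discarded here; only `summary` carries on.
    return summary


final = run("What is 2 + 2?")
```

Because the carried state stays small, the context never grows with the number of reasoning rounds, which is what makes short-horizon RL applicable to long-horizon problems.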

Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience

Intermediate
Zhongxiang Sun, Qipeng Wang et al. · Jan 30 · arXiv

Deep search agents can plan and browse the web in many steps, but they often fail because they don’t notice when their own thinking drifts off-track.

#deep search agents · #metacognition · #consistency monitoring

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Intermediate
Wenxuan Huang, Yu Zeng et al. · Jan 29 · arXiv

The paper tackles a real problem: one-shot image or text searches often miss the right evidence (low hit-rate), especially in noisy, cluttered pictures.

#multimodal deep research · #visual question answering · #ReAct reasoning

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Intermediate
Yaorui Shi, Shugui Liu et al. · Jan 29 · arXiv

MemOCR is a new way for AI to remember long histories by turning important notes into a picture with big, bold parts for key facts and tiny parts for details.

#MemOCR · #visual memory · #adaptive information density

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Intermediate
Minghao Yan, Bo Peng et al. · Jan 15 · arXiv

PACEvolve is a new recipe that helps AI agents improve their ideas step by step over long periods without getting stuck.

#evolutionary search · #LLM agents · #context management

Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning

Intermediate
Yuyang Hu, Jiongnan Liu et al. · Jan 8 · arXiv

This paper turns an AI agent’s memory from a flat list of notes into a logic map of events connected by cause-and-time links.

#event-centric memory · #Event Graph · #logic-aware retrieval
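A memory of events connected by cause-and-time links is, structurally, a small directed graph. The sketch below shows one plausible shape for such a structure; the class names, fields, and the backward cause-walk are my own illustrative choices, not the paper's design.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    time: int                                      # logical timestamp
    causes: list[str] = field(default_factory=list)  # names of causing events


class EventGraph:
    def __init__(self) -> None:
        self.events: dict[str, Event] = {}

    def add(self, event: Event) -> None:
        self.events[event.name] = event

    def chain_to(self, name: str) -> list[str]:
        # Walk cause links backwards, then return the causal chain in
        # time order — a "logic map" lookup instead of a flat-list scan.
        seen, stack = set(), [name]
        while stack:
            n = stack.pop()
            if n in seen or n not in self.events:
                continue
            seen.add(n)
            stack.extend(self.events[n].causes)
        return sorted(seen, key=lambda n: self.events[n].time)


g = EventGraph()
g.add(Event("rain", time=1))
g.add(Event("wet road", time=2, causes=["rain"]))
g.add(Event("crash", time=3, causes=["wet road"]))
chain = g.chain_to("crash")
```

Retrieval over such a graph can follow causal edges directly, rather than hoping a similarity search over flat notes happens to surface the right antecedents.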

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Intermediate
Junbo Li, Peng Zhou et al. · Dec 18 · arXiv

Turn-PPO is a new way to train chatty AI agents that act over many steps, by judging each conversation turn as one whole action instead of judging every single token.

#Turn-PPO · #multi-turn reinforcement learning · #agentic LLMs
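"One advantage per turn instead of per token" can be illustrated with a standard discounted-return computation over turn-level rewards. This is a generic sketch of turn-level advantage estimation with a mean baseline — a simplification I chose for illustration, not Turn-PPO's exact estimator.

```python
def turn_level_advantages(turn_rewards: list[float], gamma: float = 0.99) -> list[float]:
    # Discounted return per conversation turn, computed back-to-front.
    returns: list[float] = []
    g = 0.0
    for r in reversed(turn_rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    # Mean-return baseline; every token in a turn shares that turn's
    # single advantage value, instead of getting its own estimate.
    baseline = sum(returns) / len(returns)
    return [ret - baseline for ret in returns]


# Three turns, reward only at the end, no discounting:
advantages = turn_level_advantages([0.0, 0.0, 1.0], gamma=1.0)
```

With `gamma=1.0` every turn has the same return, so the advantages are all zero — the degenerate case that shows why the baseline matters.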

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Intermediate
Jingzhe Ding, Shengda Long et al. · Dec 14 · arXiv

NL2Repo-Bench is a new benchmark that tests if coding agents can build a whole Python library from just one long natural-language document and an empty folder.

#NL2Repo-Bench · #autonomous coding agents · #long-horizon reasoning