Papers (16)

#long-horizon planning

HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

The Viet Bui, Wenjun Li et al. · Mar 5 · arXiv

HiMAP-Travel is a team-based AI planner that splits a long trip into daily chunks so it can follow tough rules like budgets without drifting off course.

#hierarchical planning #multi-agent systems #constraint drift


Experiential Reinforcement Learning

Intermediate

Taiwei Shi, Sihao Chen et al. · Feb 15 · arXiv

This paper teaches AI models to learn like good students: try, think about what went wrong, fix it, and remember the fix.

#Experiential Reinforcement Learning #self-reflection #distillation


DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Intermediate

Chenlong Deng, Mengjie Deng et al. · Feb 11 · arXiv

Most image search systems judge each photo by itself, which fails when clues are split across many photos taken over time.

#context-aware image retrieval #multimodal agents #visual history exploration


EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

Intermediate

Xavier Hu, Jinxiang Xia et al. · Feb 10 · arXiv

EcoGym is a new open test playground where AI agents run small businesses over many days to see if they can plan well for the long term.

#EcoGym #long-horizon planning #LLM agents


ProAct: Agentic Lookahead in Interactive Environments

Intermediate

Yangbin Yu, Mingyu Yang et al. · Feb 5 · arXiv

ProAct teaches AI agents to think ahead accurately without needing expensive search every time they act.

#ProAct #GLAD #MC-Critic


Steering LLMs via Scalable Interactive Oversight

Intermediate

Enyu Zhou, Zhiheng Xi et al. · Feb 4 · arXiv

The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can’t always explain exactly what they want or check the results well.

#scalable oversight #interactive alignment #requirement elicitation


DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents

Beginner

Nikita Gupta, Riju Chatterjee et al. · Jan 28 · arXiv

DeepSearchQA is a new test with 900 real-world style questions that checks if AI agents can find complete lists of answers, not just one fact.

#DeepSearchQA #agentic information retrieval #systematic collation


DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Intermediate

Yinger Zhang, Shutong Jiang et al. · Jan 26 · arXiv

DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that fit time and money limits.

#long-horizon planning #agentic tool use #global constrained optimization


Rethinking Video Generation Model for the Embodied World

Beginner

Yufan Deng, Zilin Pan et al. · Jan 21 · arXiv

Robots need videos that not only look pretty but also follow real-world physics and finish the task asked of them.

#robot video generation #embodied AI #benchmark


Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Intermediate

Chi-Pin Huang, Yunze Man et al. · Jan 14 · arXiv

Fast-ThinkAct teaches a robot to plan with a few tiny hidden "thought tokens" instead of long paragraphs, making it much faster while staying smart.

#Vision-Language-Action #latent reasoning #verbalizable planning


User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale

Intermediate

Jungho Cho, Minbyul Jeong et al. · Jan 13 · arXiv

The paper builds a new way to create realistic, long conversations between people and AI that use tools like databases.

#multi-turn dialogue generation #tool use #user simulation


ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking

Beginner

Qiang Zhang, Boli Chen et al. · Jan 10 · arXiv

ArenaRL teaches AI agents by comparing their answers against each other, like a sports tournament, instead of giving each answer a single noisy score.

#ArenaRL #reinforcement learning #relative ranking

