TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios
Intermediate · Yuanzhe Shen, Zisu Huang et al. · Feb 2 · arXiv
TRIP-Bench is a new benchmark that checks whether AI travel agents can plan realistic trips over many chat turns while following strict rules and adapting to changing user requests.
#TRIP-Bench #long-horizon agents #multi-turn interaction