TRIP-Bench is a new benchmark that tests whether AI travel agents can plan realistic trips over many chat turns while following strict rules and adapting to changing user requests.
Small AI models often stumble when a tool call fails and then get stuck repeating bad calls instead of fixing the mistake.