TRIP-Bench is a new benchmark that checks whether AI travel agents can plan real trips over many chat turns while following strict rules and adapting to changing user requests.
DeepPlanning is a new benchmark that tests whether AI can make long, realistic plans that stay within time and budget limits.