TRIP-Bench is a new benchmark that checks whether AI travel agents can plan real-world trips over many chat turns while following strict rules and adapting to changing user requests.
When a model is trained on several rewards at once, a popular method called GRPO can accidentally squash different combinations of rewards into the same learning signal, so training can no longer tell which rewards a response actually earned.
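To make the "squashing" concrete, here is a minimal sketch. It assumes a GRPO-style advantage in which each rollout's reward components are summed into one number and that number is then normalized within its group (subtract the group mean, divide by the group standard deviation). The two binary rewards and the function name `group_normalized_advantages` are hypothetical and chosen only for illustration; this is not taken from any particular paper or library.

```python
from statistics import mean, pstdev


def group_normalized_advantages(reward_vectors, eps=1e-6):
    """Hypothetical GRPO-style advantage: sum each rollout's reward components,
    then normalize the sums within the group (mean-centered, std-scaled)."""
    totals = [sum(r) for r in reward_vectors]
    mu = mean(totals)
    sigma = pstdev(totals)
    return [(t - mu) / (sigma + eps) for t in totals]


# Hypothetical rollouts scored by two binary rewards, e.g. (format_ok, answer_ok).
mixed_group = [(1, 0), (0, 1), (0, 0), (1, 1)]
adv = group_normalized_advantages(mixed_group)
# (1, 0) and (0, 1) earned different rewards but get the same advantage.
print(adv[0] == adv[1])

# Across groups: a rollout that earns both rewards vs. one that earns only one
# ends up with the same normalized advantage once each group is normalized.
both_rewards = group_normalized_advantages([(0, 0), (1, 1)])
one_reward = group_normalized_advantages([(0, 0), (1, 0)])
print([round(a, 3) for a in both_rewards])
print([round(a, 3) for a in one_reward])
```

Both printed lists come out as `[-1.0, 1.0]`: after summing and normalizing, the learning signal no longer records which reward (or how many rewards) a rollout satisfied.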