CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.
TRIP-Bench is a new test that checks whether AI travel agents can plan real trips over many chat turns while following strict rules and adapting to users' changing requests.