The paper tackles a common problem: people can ask AI to do big, complex tasks, but they can’t always explain exactly what they want or check the results well.
The paper addresses a common problem in training AI reasoners: models get stuck on the same favorite solution style and stop exploring new ways to solve problems.
The paper addresses a big problem in training web-searching AI agents: rewarding only the final answer pushes agents to cut corners and sometimes hallucinate.
COMPASS is a new framework that turns a company’s rules into thousands of test questions to check whether chatbots follow those rules.