This paper shows that many AI reasoning failures are caused by just a few distracting words in the prompt, not by the problems being too hard.
AT2PO is a new way to train AI agents that work over several turns, for example by asking the web a question, reading the result, and trying again.