CoVe is a way to create training conversations for AI agents that use tools, while guaranteeing the conversations are both challenging and correct.
Accuracy alone can make AI agents look good on paper while still failing in real life; this paper shows how to measure reliability properly.
This paper fixes a common problem in reasoning AIs called Lazy Reasoning, where the model rambles instead of making a good plan.
The paper shows a new way to teach AI assistants how to use tools in many-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.