The paper presents a new way to teach AI assistants to use tools in multi-step conversations by mining ordinary text on the internet for step-by-step “how-to” knowledge.
Large language models often sound confident even when they are wrong, and existing ways to catch these mistakes are either slow or not very accurate.
Large language models can say things that sound right but are not actually supported by the given document; this is called a faithfulness hallucination, for example a summary that adds a statistic the source document never mentions.
The paper shows how a vision-language model (VLM) can train itself to be a fair judge of answers about images without using any human preference labels.