The paper addresses a common mistake in training language models for multi-part tasks: giving the same reward signal to every token, even when different parts of the output serve different goals.
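To make the contrast concrete, here is a small illustrative sketch, not the paper's actual method: it compares giving one reward to every token with giving each labeled part of the output its own reward. The `Segment` class, the part labels, and the numbers are all invented for illustration.

```python
# Illustrative sketch (not the paper's implementation): per-segment token
# rewards instead of one uniform reward for the whole sequence.
from dataclasses import dataclass

@dataclass
class Segment:
    name: str           # hypothetical label, e.g. "reasoning" or "final_answer"
    token_ids: list     # token positions belonging to this segment
    reward: float       # reward judged against this segment's own goal

def uniform_token_rewards(num_tokens: int, sequence_reward: float) -> list:
    """The criticized setup: every token receives the same signal."""
    return [sequence_reward] * num_tokens

def segmented_token_rewards(num_tokens: int, segments: list) -> list:
    """Each token inherits the reward of the segment it belongs to."""
    rewards = [0.0] * num_tokens
    for seg in segments:
        for t in seg.token_ids:
            rewards[t] = seg.reward
    return rewards

# Toy example: a 6-token output where tokens 0-3 are reasoning, 4-5 the answer.
segments = [
    Segment("reasoning", [0, 1, 2, 3], reward=0.2),
    Segment("final_answer", [4, 5], reward=1.0),
]
print(uniform_token_rewards(6, 0.6))         # [0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
print(segmented_token_rewards(6, segments))  # [0.2, 0.2, 0.2, 0.2, 1.0, 1.0]
```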
QuCo-RAG is a new way to decide when an AI should look things up while it writes, using signals drawn from its training data rather than its own, often unreliable, confidence.
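Below is a rough sketch of the idea, not QuCo-RAG's real algorithm: a toy decision rule that retrieves when the thing being written about barely appears in the training data, contrasted with the usual confidence-based trigger. `corpus_counts`, the thresholds, and the entity names are all placeholders.

```python
# Illustrative sketch (not QuCo-RAG's actual mechanism): trigger retrieval
# from corpus-derived statistics rather than the model's own confidence.

def should_retrieve_by_confidence(token_prob: float, threshold: float = 0.5) -> bool:
    """The approach being replaced: retrieve when the model's own
    next-token probability is low, a signal that can be miscalibrated."""
    return token_prob < threshold

def should_retrieve_by_corpus(entity: str, corpus_counts: dict, min_count: int = 10) -> bool:
    """Hypothetical alternative: retrieve when the entity about to be written
    was rarely seen in the training data, so the model likely lacks reliable
    knowledge about it. `corpus_counts` stands in for whatever corpus
    statistic the paper actually uses."""
    return corpus_counts.get(entity, 0) < min_count

# Toy usage
corpus_counts = {"Paris": 125_000, "Llanfairpwllgwyngyll": 3}
print(should_retrieve_by_corpus("Paris", corpus_counts))                 # False: well covered
print(should_retrieve_by_corpus("Llanfairpwllgwyngyll", corpus_counts))  # True: rare, look it up
```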