KARL is a smart search helper that learns to look up information step by step and explain answers using the facts it finds.
The paper studies a simple way to train large language models with reinforcement learning: it replaces a hard-to-compute term (the log-partition function) with something easy to compute, the mean reward.
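To make the substitution concrete, here is a minimal sketch rather than the paper's actual procedure. It assumes the standard KL-regularized setup, in which the log-partition term for a prompt is log E[exp(r / beta)] over sampled responses. The names `scaled_log_partition`, `mean_reward`, and `beta`, as well as the reward values, are hypothetical and chosen only for illustration.

```python
import math


def scaled_log_partition(rewards, beta):
    """Return beta * log(mean(exp(r / beta))) using a stable log-sum-exp.

    Assumption: this is the usual KL-regularized form of the log-partition
    term, estimated from sampled rewards for a single prompt.
    """
    m = max(rewards)  # subtract the max so exp() cannot overflow
    total = sum(math.exp((r - m) / beta) for r in rewards)
    return m + beta * math.log(total / len(rewards))


def mean_reward(rewards):
    """The cheap stand-in: a plain average of the sampled rewards."""
    return sum(rewards) / len(rewards)


# Hypothetical 0/1 rewards for five sampled responses to one prompt.
rewards = [0.0, 1.0, 0.0, 1.0, 1.0]

for beta in (0.1, 1.0, 10.0, 100.0):
    exact = scaled_log_partition(rewards, beta)
    cheap = mean_reward(rewards)
    print(f"beta={beta:>6}: scaled log-partition={exact:.4f}, mean reward={cheap:.4f}")
```

Under this assumption the scaled log-partition term is never below the mean reward (by Jensen's inequality) and approaches it as beta grows, which is one way to see why the mean reward can serve as an easy replacement.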