AgentArk teaches one language model to think like a whole team of models that debate, so it can solve tough problems quickly without running a long, expensive debate at answer time.
The paper discovers a tiny, specialized group of neurons inside large language models (LLMs) that acts like the reward system in the human brain.
When language models are trained with RL using right-or-wrong rewards, learning can stall on 'saturated' problems that the model almost always solves.