The paper identifies a tiny, specialized group of neurons inside large language models (LLMs) that acts like the reward system in the human brain.
AI agents often act very sure of themselves even when they are wrong, especially on long, multi-step tasks.
This paper studies how tool-using AI agents express how sure they are, and it finds a split: some tools make them too sure of themselves, while others help them stay honest.