The paper identifies a tiny, special group of neurons inside large language models (LLMs) that act like the reward system in the human brain.
This survey turns model understanding into a step-by-step repair toolkit called Locate, Steer, and Improve.
The paper shows that when an LLM is trained with spurious (misleading) rewards in reinforcement learning with verifiable rewards (RLVR), it can score higher by memorizing answers instead of reasoning.
The paper shows that top reasoning AIs don’t just think longer; they act like a tiny team inside their heads, with different voices that ask questions, disagree, and then agree.
Large language models (LLMs) are good at many math problems but often mess up simple counting when the list gets long.
This paper shows a new way (called RISE) to find and control how AI models think without needing any human-made labels.