Large language models don’t map out a full step-by-step plan before they start reasoning; they mostly plan only a few steps ahead.
Benign fine-tuning meant to make language models more helpful can accidentally make them overshare private information.
This survey turns work on understanding models’ inner workings into a step-by-step repair toolkit called Locate, Steer, and Improve.
Large reasoning models can often find the right math answer in their “head” before finishing their written steps, but this works best in languages with lots of training data, such as English and Chinese.
Large language models (LLMs) don’t act as a single brain; inside, each layer and module quietly makes its own mini-decisions called internal policies.