This paper teaches language models not just to get the final answer right but to think in a way others can reliably follow.
Large language models get more capable as they get bigger, but every extra weight has to be stored, so memory use grows in step with model size.
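
To make the memory cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes weight memory is simply the parameter count times the bytes used per parameter. The function name `weight_memory_gb`, the 7-billion-parameter example, and the listed precisions are illustrative assumptions rather than figures from the paper, and the estimate ignores activations, optimizer state, and caches.

```python
# Bytes per parameter for a few common numeric formats (illustrative assumption).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate memory needed just to store the weights, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at each precision.
    for dtype in BYTES_PER_PARAM:
        print(f"7e9 parameters in {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

Under these assumptions, a hypothetical 7-billion-parameter model needs about 14 GB for its weights at 16-bit precision, and doubling the parameter count doubles that figure.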