Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 3: Architectures, Hyperparameters
Beginner: Language modeling means predicting the next token (a token is a small piece of text like a word or subword) given all the tokens before it. If you can estimate this next-token probability well, you can generate text by sampling one token at a time and appending it to the history. This step-by-step sampling turns probabilities into full sentences or paragraphs. Good models assign high probability to likely tokens and low probability to unlikely ones.
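The sample-append loop above can be sketched with a toy model. This is a minimal illustration, not the lecture's architecture: the hand-written bigram table and the token names (`<s>`, `</s>`, etc.) are assumptions standing in for a learned neural model's next-token distribution.

```python
import random

# Toy "language model": a hand-made bigram table mapping each token to a
# probability distribution over possible next tokens. A real model would
# learn these conditional probabilities from data instead.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},    # <s> marks the start of a sequence
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "</s>": 0.3}, # </s> marks the end of a sequence
    "dog": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}

def sample_next(token, rng):
    """Sample one next token from the model's conditional distribution."""
    dist = BIGRAM[token]
    r = rng.random()
    cum = 0.0
    for tok, p in dist.items():
        cum += p
        if r < cum:
            return tok
    return tok  # fallback for floating-point rounding at the boundary

def generate(rng, max_len=10):
    """Generate text token by token: sample, append, repeat until </s>."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        tokens.append(sample_next(tokens[-1], rng))
    # Strip the boundary markers before returning the generated text.
    return tokens[1:-1] if tokens[-1] == "</s>" else tokens[1:]

print(" ".join(generate(random.Random(0))))
```

Note that this toy model conditions only on the previous token; the models covered in the lecture condition on the entire history, but the sampling loop itself is the same.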
#language modeling #next-token prediction #embedding