Stanford CME295 Transformers & LLMs
Course Content

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 1 - Transformer
Decision trees are simple, powerful models that make predictions by asking a sequence of yes/no questions about input features. They work for both classification (like spam vs. not spam) and regression (like house price prediction). The tree’s structure—what to split on and when—comes directly from the data, which is why we call them non-parametric.
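The yes/no-question idea above can be sketched as a single tree split: try candidate thresholds on one feature and keep the one with the lowest Gini impurity. The feature, labels, and data values below are toy illustrations, not from the lecture.

```python
# Minimal sketch of one decision-tree split: pick the threshold on a
# single feature that best separates two classes by Gini impurity.
# Toy data: feature = "number of exclamation marks", label 1 = spam.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Try each midpoint between sorted feature values; return the
    threshold with the lowest weighted impurity of the two sides."""
    pairs = sorted(zip(xs, ys))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

xs = [0, 1, 2, 4, 5, 7]
ys = [0, 0, 0, 1, 1, 1]
t, score = best_split(xs, ys)
print(t, score)  # threshold 3.0 separates the classes perfectly (impurity 0)
```

A real tree repeats this greedy search recursively on each side of the split, which is how the structure "comes directly from the data."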

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 2 - Transformer-Based Models & Tricks
Principal Component Analysis (PCA) is a method to turn high-dimensional data into a smaller set of numbers while keeping as much useful information as possible. The lecture explains three equivalent views of PCA: best low-dimensional representation, directions of maximum variance, and best reconstruction after projection. These views all lead to the same solution using eigenvectors and eigenvalues of a certain matrix built from the data.
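The maximum-variance view can be sketched in 2-D: the leading principal component is the top eigenvector of the covariance matrix, which for a symmetric 2x2 matrix has a closed form. The data below are toy points on the line y = 2x, chosen so the answer is easy to check; they are not from the lecture.

```python
import math

# Sketch of 2-D PCA: the first principal component is the top
# eigenvector of the covariance matrix. For a symmetric 2x2 matrix
# [[a, b], [b, c]] the eigendecomposition is closed-form, so no
# linear-algebra library is needed. Toy data on the line y = 2x.

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2 * x for x in xs]

# Center the data and build the (population) covariance matrix.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
a = sum((x - mx) ** 2 for x in xs) / len(xs)                     # var(x)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)   # cov(x, y)
c = sum((y - my) ** 2 for y in ys) / len(ys)                     # var(y)

# Largest eigenvalue of [[a, b], [b, c]].
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# Corresponding (normalized) eigenvector, proportional to (b, lam - a).
norm = math.hypot(b, lam - a)
v = (b / norm, (lam - a) / norm)
print(lam, v)  # variance captured by the component, and its direction
```

Because the data lie exactly on y = 2x, the component points along (1, 2)/√5 and captures all of the variance, matching the "directions of maximum variance" view.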

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 3 - Transformers & Large Language Models
Artificial Intelligence (AI) is the science of making machines do tasks that would need intelligence if a person did them. Today’s AI mostly focuses on specific tasks like recognizing faces or recommending products, which is called narrow AI. A future goal is general AI, which would do any thinking task a human can, but it does not exist yet.

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 4 - LLM Training
This lecture explains how we train neural networks by minimizing a loss function with optimization methods. It starts with gradient descent and stochastic gradient descent (SGD), showing how parameters are updated by stepping opposite to the gradient. Mini-batches make training faster and add helpful noise that can help the optimizer escape poor local minima in the loss landscape.
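The update rule above can be sketched on a toy one-parameter loss, contrasting full-batch gradient descent with SGD's one-example-at-a-time updates. The dataset, learning rates, and step counts are arbitrary illustration choices, not values from the lecture.

```python
import random

# Sketch of gradient descent vs. SGD on the loss
# L(w) = mean_i (w - y_i)^2, whose minimizer is the mean of the y_i.

ys = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy "dataset"; the minimizer is mean = 3

def full_grad(w):
    # Gradient of the full loss: 2 * mean(w - y_i).
    return 2 * sum(w - y for y in ys) / len(ys)

# Full-batch gradient descent: step opposite to the exact gradient.
w = 0.0
for _ in range(200):
    w -= 0.1 * full_grad(w)

# SGD: one randomly sampled example per step gives a noisy but
# cheap gradient estimate; a decaying step size damps the noise.
random.seed(0)
w_sgd = 0.0
for t in range(1, 2001):
    y = random.choice(ys)
    w_sgd -= (0.1 / t ** 0.5) * 2 * (w_sgd - y)

print(w, w_sgd)  # both land near the minimizer w* = 3; SGD retains some noise
```

The residual jitter in `w_sgd` is the "helpful noise" the summary mentions: on non-convex losses it can knock the iterate out of a poor local minimum.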

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 5 - LLM Tuning
Regularization is a method to prevent overfitting by adding a penalty for model complexity. Overfitting happens when a model memorizes training data, including noise, and performs poorly on new data. By discouraging overly complex patterns, regularization helps the model generalize better.

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 6 - LLM Reasoning
Machine learning lets computers learn patterns from data instead of following hand-written rules. Instead of writing instructions like “pointy ears = cat,” we feed in many labeled examples and let the computer discover which features matter. This makes ML flexible and powerful for messy, real-world problems where rules are hard to write. Arthur Samuel’s classic definition captures this: computers learn without being explicitly programmed.

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 7 - Agentic LLMs
This lecture explains L1 regularization, also called LASSO, as a way to prevent overfitting by adding a penalty to the loss that depends on the absolute values of model weights. Overfitting means a model memorizes the training data but fails on new data. By penalizing large weights, L1 helps the model focus on the strongest, most useful features.
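The effect of the absolute-value penalty can be sketched with the soft-thresholding operator, which is the closed-form solution for a single weight under an L1 penalty. The weight values and penalty strength below are toy illustrations.

```python
# Sketch of the L1 (LASSO) penalty via soft-thresholding:
# argmin_w 0.5*(w - w0)^2 + lam*|w|  =  sign(w0) * max(|w0| - lam, 0).
# Unlike an L2 penalty, L1 drives weak weights to exactly zero,
# which is why it concentrates the model on the strongest features.

def soft_threshold(w0, lam):
    if w0 > lam:
        return w0 - lam
    if w0 < -lam:
        return w0 + lam
    return 0.0   # weights inside [-lam, lam] are zeroed out -> sparsity

weights = [3.0, 0.4, -0.2, -2.5]
lam = 0.5
print([soft_threshold(w, lam) for w in weights])  # [2.5, 0.0, 0.0, -2.0]
```

The two small weights become exactly zero while the strong ones are merely shrunk, illustrating how L1 "helps the model focus on the strongest, most useful features."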

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation
Kernel methods turn simple linear algorithms into powerful non-linear ones. Instead of drawing only straight lines to separate data, they let us curve and bend the boundary by working in a higher-dimensional feature space. This keeps training simple while unlocking complex patterns.
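The "higher-dimensional feature space" idea can be verified numerically: the degree-2 polynomial kernel in 2-D equals an ordinary dot product after an explicit feature map, so a linear separator in that 3-D space bends into a curve in the original plane. The specific points below are toy values.

```python
import math

# Sketch of the kernel trick: the polynomial kernel k(x, z) = (x . z)^2
# on 2-D inputs equals a plain dot product in the 3-D feature space
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), computed without ever forming
# phi explicitly.

def kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
k_implicit = kernel(x, z)          # cheap: stays in 2-D
k_explicit = dot(phi(x), phi(z))   # expensive: maps to 3-D first
print(k_implicit, k_explicit)      # the two values agree
```

Because the kernel gives the same number without constructing the feature space, a linear algorithm that only needs inner products can "work in a higher-dimensional feature space" at essentially no extra cost.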

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 - Recap & Current Trends
The lecture explains what deep learning is and why it changed how we build intelligent systems. In the past, engineers wrote step-by-step rules (like detecting corners and lines) to identify objects in images. These hand-built rules often broke when lighting, angle, or season changed. Deep learning replaces these hand-crafted rules with models that learn directly from data.