Stanford CME295 Transformers & LLMs
Course Content

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 1 - Transformer
Decision trees are simple, powerful models that make predictions by asking a sequence of yes/no questions about input features. They work for both classification (like spam vs. not spam) and regression (like house price prediction). The tree’s structure—what to split on and when—comes directly from the data, which is why we call them non-parametric.
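The yes/no-question idea above can be sketched as a single tree split: try candidate thresholds on one feature and keep the one with the lowest Gini impurity. The feature, labels, and data values below are toy illustrations, not from the lecture.

```python
# Minimal sketch of one decision-tree split: pick the threshold on a
# single feature that best separates two classes by Gini impurity.
# Toy data: feature = "number of exclamation marks", label 1 = spam.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Try each midpoint between sorted feature values; return the
    threshold with the lowest weighted impurity of the two sides."""
    pairs = sorted(zip(xs, ys))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

xs = [0, 1, 2, 4, 5, 7]
ys = [0, 0, 0, 1, 1, 1]
t, score = best_split(xs, ys)
print(t, score)  # threshold 3.0 separates the classes perfectly (impurity 0)
```

A real tree repeats this greedy search recursively on each side of the split, which is how the structure "comes directly from the data."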

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 2 - Transformer-Based Models & Tricks
Principal Component Analysis (PCA) is a method to turn high-dimensional data into a smaller set of numbers while keeping as much useful information as possible. The lecture explains three equivalent views of PCA: best low-dimensional representation, directions of maximum variance, and best reconstruction after projection. These views all lead to the same solution using eigenvectors and eigenvalues of a certain matrix built from the data.
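The maximum-variance view can be sketched in 2-D: the leading principal component is the top eigenvector of the covariance matrix, which for a symmetric 2x2 matrix has a closed form. The data below are toy points on the line y = 2x, chosen so the answer is easy to check; they are not from the lecture.

```python
import math

# Sketch of 2-D PCA: the first principal component is the top
# eigenvector of the covariance matrix. For a symmetric 2x2 matrix
# [[a, b], [b, c]] the eigendecomposition is closed-form, so no
# linear-algebra library is needed. Toy data on the line y = 2x.

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2 * x for x in xs]

# Center the data and build the (population) covariance matrix.
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
a = sum((x - mx) ** 2 for x in xs) / len(xs)                     # var(x)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)   # cov(x, y)
c = sum((y - my) ** 2 for y in ys) / len(ys)                     # var(y)

# Largest eigenvalue of [[a, b], [b, c]].
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# Corresponding (normalized) eigenvector, proportional to (b, lam - a).
norm = math.hypot(b, lam - a)
v = (b / norm, (lam - a) / norm)
print(lam, v)  # variance captured by the component, and its direction
```

Because the data lie exactly on y = 2x, the component points along (1, 2)/√5 and captures all of the variance, matching the "directions of maximum variance" view.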

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 3 - Transformers & Large Language Models
Artificial Intelligence (AI) is the science of making machines do tasks that would need intelligence if a person did them. Today’s AI mostly focuses on specific tasks like recognizing faces or recommending products, which is called narrow AI. A future goal is general AI, which would do any thinking task a human can, but it does not exist yet.

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 4 - LLM Training
This lecture explains how we train neural networks by minimizing a loss function with optimization methods. It starts with gradient descent and stochastic gradient descent (SGD), showing how parameters are updated by stepping opposite to the gradient. Mini-batches make training faster and add helpful noise that can help the optimizer escape poor local minima in the loss landscape.
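The update rule above can be sketched on a toy one-parameter loss, contrasting full-batch gradient descent with SGD's one-example-at-a-time updates. The dataset, learning rates, and step counts are arbitrary illustration choices, not values from the lecture.

```python
import random

# Sketch of gradient descent vs. SGD on the loss
# L(w) = mean_i (w - y_i)^2, whose minimizer is the mean of the y_i.

ys = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy "dataset"; the minimizer is mean = 3

def full_grad(w):
    # Gradient of the full loss: 2 * mean(w - y_i).
    return 2 * sum(w - y for y in ys) / len(ys)

# Full-batch gradient descent: step opposite to the exact gradient.
w = 0.0
for _ in range(200):
    w -= 0.1 * full_grad(w)

# SGD: one randomly sampled example per step gives a noisy but
# cheap gradient estimate; a decaying step size damps the noise.
random.seed(0)
w_sgd = 0.0
for t in range(1, 2001):
    y = random.choice(ys)
    w_sgd -= (0.1 / t ** 0.5) * 2 * (w_sgd - y)

print(w, w_sgd)  # both land near the minimizer w* = 3; SGD retains some noise
```

The residual jitter in `w_sgd` is the "helpful noise" the summary mentions: on non-convex losses it can knock the iterate out of a poor local minimum.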

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 5 - LLM Tuning
Regularization is a method to prevent overfitting by adding a penalty for model complexity. Overfitting happens when a model memorizes training data, including noise, and performs poorly on new data. By discouraging overly complex patterns, regularization helps the model generalize better.

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 6 - LLM Reasoning
Machine learning lets computers learn patterns from data instead of following hand-written rules. Instead of writing instructions like “pointy ears = cat,” we feed in many labeled examples and let the computer discover which features matter. This makes ML flexible and powerful for messy, real-world problems where rules are hard to write. Arthur Samuel’s classic definition captures this: computers learn without being explicitly programmed.

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 7 - Agentic LLMs
This lecture explains L1 regularization, also called LASSO, as a way to prevent overfitting by adding a penalty to the loss that depends on the absolute values of model weights. Overfitting means a model memorizes the training data but fails on new data. By penalizing large weights, L1 helps the model focus on the strongest, most useful features.
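The effect of the absolute-value penalty can be sketched with the soft-thresholding operator, which is the closed-form solution for a single weight under an L1 penalty. The weight values and penalty strength below are toy illustrations.

```python
# Sketch of the L1 (LASSO) penalty via soft-thresholding:
# argmin_w 0.5*(w - w0)^2 + lam*|w|  =  sign(w0) * max(|w0| - lam, 0).
# Unlike an L2 penalty, L1 drives weak weights to exactly zero,
# which is why it concentrates the model on the strongest features.

def soft_threshold(w0, lam):
    if w0 > lam:
        return w0 - lam
    if w0 < -lam:
        return w0 + lam
    return 0.0   # weights inside [-lam, lam] are zeroed out -> sparsity

weights = [3.0, 0.4, -0.2, -2.5]
lam = 0.5
print([soft_threshold(w, lam) for w in weights])  # [2.5, 0.0, 0.0, -2.0]
```

The two small weights become exactly zero while the strong ones are merely shrunk, illustrating how L1 "helps the model focus on the strongest, most useful features."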

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation
Kernel methods turn simple linear algorithms into powerful non-linear ones. Instead of drawing only straight lines to separate data, they let us curve and bend the boundary by working in a higher-dimensional feature space. This keeps training simple while unlocking complex patterns.
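The "higher-dimensional feature space" idea can be verified numerically: the degree-2 polynomial kernel in 2-D equals an ordinary dot product after an explicit feature map, so a linear separator in that 3-D space bends into a curve in the original plane. The specific points below are toy values.

```python
import math

# Sketch of the kernel trick: the polynomial kernel k(x, z) = (x . z)^2
# on 2-D inputs equals a plain dot product in the 3-D feature space
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), computed without ever forming
# phi explicitly.

def kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
k_implicit = kernel(x, z)          # cheap: stays in 2-D
k_explicit = dot(phi(x), phi(z))   # expensive: maps to 3-D first
print(k_implicit, k_explicit)      # the two values agree
```

Because the kernel gives the same number without constructing the feature space, a linear algorithm that only needs inner products can "work in a higher-dimensional feature space" at essentially no extra cost.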

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 - Recap & Current Trends
The lecture explains what deep learning is and why it changed how we build intelligent systems. In the past, engineers wrote step-by-step rules (like detecting corners and lines) to identify objects in images. These hand-built rules often broke when lighting, angle, or season changed. Deep learning replaces these hand-crafted rules with models that learn directly from data.