How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (15)

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation [ML]

Intermediate
Stanford

Kernel methods turn simple linear algorithms into powerful non-linear ones. Instead of drawing only straight lines to separate data, they let us curve and bend the boundary by working in a higher-dimensional feature space. This keeps training simple while unlocking complex patterns.

#kernel methods · #kernel trick · #feature map
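The kernel trick described above can be made concrete with a small numerical sketch. This is an illustration under an assumed degree-2 polynomial kernel (not an example from the lecture): evaluating the kernel in the original low-dimensional space gives exactly the inner product in the explicit higher-dimensional feature space, so the feature space never has to be built.

```python
import numpy as np

def feature_map(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# The two computations agree: phi(x) . phi(z) == k(x, z)
lhs = np.dot(feature_map(x), feature_map(z))
rhs = poly_kernel(x, z)
```

A linear algorithm that only touches data through inner products can therefore swap in `poly_kernel` and behave as if it were trained on the curved feature space.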
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 2 - Transformer-Based Models & Tricks [ML]

Intermediate
Stanford

Principal Component Analysis (PCA) is a method to turn high-dimensional data into a smaller set of numbers while keeping as much useful information as possible. The lecture explains three equivalent views of PCA: best low-dimensional representation, directions of maximum variance, and best reconstruction after projection. All three views lead to the same solution: the eigenvectors and eigenvalues of the covariance matrix of the centered data.

#principal component analysis · #pca · #dimensionality reduction
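The "directions of maximum variance" view above can be sketched in a few lines of NumPy: center the data, build the covariance matrix, and keep the eigenvectors with the largest eigenvalues. The synthetic data below is made up for illustration.

```python
import numpy as np

# Synthetic 3-D data with variance concentrated along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]        # sort directions by variance explained
components = eigvecs[:, order[:2]]       # top-2 principal directions
Z = Xc @ components                      # the 2-D representation
```

The same `components` also give the best reconstruction after projection (`Z @ components.T`), matching the third view from the lecture.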
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of Experts [LLM]

Intermediate
Stanford Online

The lecture explains why simply making language models bigger (more parameters) helped for years, but also why data size and training time matter just as much. From BERT in 2018 to GPT‑2, GPT‑3, PaLM, Chinchilla, and Llama 2, the trend shows performance rises when models are scaled correctly with enough data and compute.

#mixture of experts · #sparse activation · #conditional computation
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 12: Evaluation [LLM]

Intermediate
Stanford Online

Evaluation tells us how good a language model really is. There are two big ways to judge models: intrinsic (measure the model directly) and extrinsic (measure it through real tasks). Intrinsic is fast and clean but might not reflect real-world usefulness. Extrinsic is realistic and practical but slow and complicated to run.

#language model evaluation · #perplexity · #intrinsic evaluation
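The standard intrinsic metric mentioned in the tags, perplexity, can be sketched directly: it is the exponential of the average negative log-probability the model assigns to each true next token. The probabilities below are made up for illustration, not from any real model.

```python
import math

# Model's (hypothetical) probability for each actual next token in a sequence.
token_probs = [0.5, 0.25, 0.125, 0.25]

# Average negative log-likelihood per token, then exponentiate.
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)  # → 4.0
```

A perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step; lower is better, which is what makes it a fast "measure the model directly" metric.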
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 11: Scaling Laws 2 [LLM]

Intermediate
Stanford Online

Scaling laws relate a model’s log loss (how surprised it is by the next token) to three knobs: number of parameters (N), dataset size (D), and compute budget (C). As you increase N, D, and C, loss usually drops smoothly. But this only holds when you keep many other things steady and consistent.

#scaling laws · #log loss · #perplexity
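The "three knobs" relationship in the summary is commonly written as a parametric loss surface over parameters N and data tokens D. The sketch below uses that common functional form with made-up constants (these are illustrative values, not fitted coefficients from the lecture): loss falls smoothly as either knob grows.

```python
# Common parametric form for loss as a function of model size N and
# data tokens D (constants are invented for illustration):
#   L(N, D) = E + A / N**alpha + B / D**beta
def predicted_loss(N, D, E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
    return E + A / N**alpha + B / D**beta

# Scaling both knobs up lowers the predicted loss, but never below the
# irreducible term E.
small = predicted_loss(N=1e8, D=1e10)
large = predicted_loss(N=1e10, D=1e12)
```

The caveat from the summary shows up here too: the fitted constants only mean something if everything else (tokenizer, architecture family, training setup) is held steady across the runs used to fit them.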
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1 [Deep Learning]

Intermediate
Stanford Online

This lesson teaches two big ways to train neural networks on many GPUs: data parallelism and model parallelism. Data parallelism copies the whole model to every GPU and splits the dataset into equal shards, then averages gradients to take one update step. Model parallelism splits the model itself across GPUs and passes activations forward and gradients backward between them.

#data parallelism · #model parallelism · #parameter server
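The data-parallel recipe in the summary — copy the model, shard the data, average gradients, take one step — can be sketched on a toy linear-regression "model" with NumPy standing in for the GPUs. Everything here (the model, the 4-way shard count, the learning rate) is an invented illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)                             # shared model parameters (replicated on each "GPU")
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])     # targets from a hidden true weight vector

shards = np.array_split(np.arange(64), 4)   # split the dataset into 4 equal shards

def grad_on_shard(w, idx):
    """MSE gradient computed locally on one worker's shard."""
    err = X[idx] @ w - y[idx]
    return 2 * X[idx].T @ err / len(idx)

grads = [grad_on_shard(w, idx) for idx in shards]  # each worker computes its gradient
avg_grad = np.mean(grads, axis=0)                  # the all-reduce / averaging step
w = w - 0.1 * avg_grad                             # one synchronized update on every replica
```

Because every replica applies the same averaged gradient, all copies of `w` stay identical — which is exactly what distinguishes data parallelism from model parallelism, where each GPU holds a different slice of the model.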
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 10: Inference [LLM]

Intermediate
Stanford Online

This session explains how to use a trained language model to produce outputs, a phase called inference. It covers three task types—conditional generation, open-ended generation, and classification—each with different input/output shapes that affect decoding choices. The lecture then dives into decoding methods, which are strategies to choose the next token step by step. Finally, it discusses how to evaluate generated text using human judgments and automatic metrics, along with their trade-offs.

#inference · #decoding · #greedy decoding
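Two of the decoding strategies mentioned above can be sketched for a single step. The 4-token vocabulary and logits are made up: greedy decoding always takes the argmax, while temperature sampling reshapes the distribution before drawing a token.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -1.0])     # hypothetical next-token logits

def softmax(x):
    e = np.exp(x - x.max())                  # subtract max for numerical stability
    return e / e.sum()

greedy_token = int(np.argmax(logits))        # deterministic: always the top logit

def sample_with_temperature(logits, T, rng):
    probs = softmax(logits / T)              # T < 1 sharpens, T > 1 flattens
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
sampled = sample_with_temperature(logits, T=1.5, rng=rng)
```

Greedy suits classification-like tasks with one right answer; sampling with a higher temperature suits open-ended generation, matching the task-type distinction in the summary.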
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 9: Scaling Laws 1 [LLM]

Intermediate
Stanford Online

Scaling laws are empirical rules that show how a model’s loss (error) drops as you grow model size, data, or compute. They take a power-law form: Loss = A × N^(-α), where N can be parameters, data tokens, or compute, and α is the scaling exponent. This lets us predict how bigger models might perform without training them.

#scaling laws · #power law · #language models
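The power-law form from the summary, Loss = A × N^(-α), is a straight line on a log-log plot, so two "measurements" are enough to recover the exponent. The constants below are invented for illustration.

```python
import math

A, alpha = 10.0, 0.25  # made-up constants for the power law

def loss(N):
    return A * N ** (-alpha)

# Recover the exponent from two measured points: the slope in log-log space.
N1, N2 = 1e6, 1e9
recovered_alpha = -(math.log(loss(N2)) - math.log(loss(N1))) / (math.log(N2) - math.log(N1))
```

This is the mechanism behind the summary's claim that small-scale runs can predict large-model performance: fit A and α on cheap models, then extrapolate `loss(N)` to sizes you have not trained.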
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 15: Alignment - SFT/RLHF [RLHF]

Intermediate
Stanford Online

Alignment means teaching a pre-trained language model to act the way people want: safe, helpful, and harmless. A pre-trained model is like a bag of knowledge with no idea how to use it, so it may hallucinate or say unsafe things. Alignment adds an outer layer of behavior so the model answers clearly, avoids harm, and respects user intent.

#alignment · #sft · #rlhf
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 16: Alignment - RL 1 [RLHF]

Intermediate
Stanford Online

This session introduces alignment for language models and why next‑token prediction alone is not enough. When models only learn to guess the next word, they can hallucinate facts, produce toxic or biased text, and follow tricky prompts the wrong way. Alignment aims to make models helpful, honest, and harmless so they do what people actually want. The lecture lays out a practical recipe to achieve this with RLHF (Reinforcement Learning from Human Feedback).

#alignment · #rlhf · #reward model
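The reward-model step of the RLHF recipe above is usually trained on human preference pairs with a Bradley-Terry-style loss: minimize -log sigmoid(r_chosen - r_rejected), so the model learns to score the human-preferred answer higher. The scores below are made up; this is a sketch of the loss, not of a full training loop.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen, r_rejected):
    """Loss on one human preference pair; small when the reward model agrees."""
    return -math.log(sigmoid(r_chosen - r_rejected))

agree = preference_loss(r_chosen=2.0, r_rejected=-1.0)     # model ranks like the human: small loss
disagree = preference_loss(r_chosen=-1.0, r_rejected=2.0)  # model ranks the wrong way: large loss
```

Once trained, this reward model supplies the scalar signal that the RL step optimizes, turning "humans preferred answer A over B" into a differentiable objective.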
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 17: Alignment - RL 2 [RLHF]

Intermediate
Stanford Online

This session continues alignment with reinforcement learning for language models. It recaps reward hacking—when a model chases the reward in the wrong way, like writing very long answers if reward is tied to word count. The RLHF pipeline is reviewed: pre-train a model, gather human preference data, train a reward model, then fine-tune the policy using RL with a safety constraint. The main focus is how to optimize the policy while staying close to the original model using techniques like KL penalties, PPO, and DPO.

#rlhf · #ppo · #kl divergence
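The "stay close to the original model" idea above is typically implemented by subtracting a KL penalty from the reward-model score, estimated per token from the log-probability ratio between the policy and the frozen reference model. The numbers below are invented; this sketches the shaped reward, not PPO itself.

```python
def shaped_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Reward-model score minus beta times a per-token KL estimate.

    The KL term is approximated by log(pi(t)) - log(pi_ref(t)) for the
    sampled token t; beta controls how strongly drift is punished.
    """
    kl_term = logp_policy - logp_ref
    return reward - beta * kl_term

# Same raw reward, but the policy that drifted further from the
# reference model receives less effective reward.
close = shaped_reward(reward=1.0, logp_policy=-2.0, logp_ref=-2.1)
drifted = shaped_reward(reward=1.0, logp_policy=-1.0, logp_ref=-4.0)
```

This penalty is one defense against the reward hacking recapped in the summary: a policy cannot chase the reward model into behaviors the reference model finds wildly improbable without paying the KL cost.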
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 6: Kernels, Triton [LLM]

Intermediate
Stanford Online

Modern language models are expensive to run because they perform many matrix multiplications. The main cost comes from both compute and moving data in and out of GPU memory. Optimizing the low-level code that runs these operations can make inference and training much faster and cheaper.

#triton · #gpu kernel · #cuda
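The compute-versus-memory trade-off in the summary is often quantified as arithmetic intensity: FLOPs performed per byte moved to and from memory. The sketch below estimates it for a matrix multiply under simple assumptions (fp16 storage at 2 bytes per element, each matrix read or written exactly once, ignoring caches).

```python
def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n] under idealized memory traffic."""
    flops = 2 * m * n * k                                    # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

# Large matmuls do enormous compute per byte (compute-bound);
# skinny, elementwise-scale ops do almost none (memory-bound).
big = matmul_intensity(4096, 4096, 4096)
tiny = matmul_intensity(1, 4096, 1)
```

Memory-bound ops are why kernel-level optimization pays off: fusing elementwise steps into a neighboring matmul (the kind of thing Triton kernels are written for) removes round trips to GPU memory that the math itself never needed.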