How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (43)

Stanford CS329H: ML from Human Preferences | Autumn 2024 | Model-based Preference Optimization
Basics · Beginner · Stanford

Decision trees are flowchart-like models used to predict a class (like yes/no) by asking a series of questions about features. You start at the root and follow branches based on answers until you reach a leaf with a class label. Each internal node tests one attribute, each branch is an outcome of that test, and each leaf gives the prediction.

#decision-tree #entropy #information-gain
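The entropy and information gain named in the tags are what drive a decision tree's choice of questions. A minimal sketch in plain Python (toy yes/no data, not course code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction from splitting `labels` into `groups`."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Toy split on one feature: a perfect split removes all uncertainty,
# a useless split removes none.
labels = ["yes", "yes", "no", "no"]
perfect = [["yes", "yes"], ["no", "no"]]
useless = [["yes", "no"], ["yes", "no"]]
print(information_gain(labels, perfect))   # 1.0
print(information_gain(labels, useless))   # 0.0
```

A tree-building algorithm such as ID3 scores every candidate split this way and keeps the one with the highest gain.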
Stanford CS329H: Machine Learning from Human Preferences | Autumn 2024 | Mechanism Design
ML · Beginner · Stanford

This lesson explains the core pieces of machine learning: data (X and Y), models f(x;θ), loss functions that measure mistakes, and optimizers that adjust θ to reduce the loss. It divides learning into supervised (with labels), unsupervised (without labels), and reinforcement learning (with rewards). The focus here is on supervised learning, especially regression and classification, plus a short intro to k-means clustering.

#supervised-learning #unsupervised-learning #regression
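The data/model/loss/optimizer loop described above fits in a few lines of plain Python. A sketch with hypothetical 1-D data (not course code): the model is f(x; θ) = θ·x, the loss is mean squared error, and the optimizer is plain gradient descent.

```python
# Hypothetical 1-D linear regression: model f(x; theta) = theta * x.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]          # generated by the "true" theta = 2

theta = 0.0                        # initial parameter
lr = 0.01                          # learning rate

for _ in range(500):
    # Gradient of the mean squared error L(theta) = mean((theta*x - y)^2)
    grad = sum(2 * (theta * x - y) * x for x, y in zip(X, Y)) / len(X)
    theta -= lr * grad             # optimizer step: move against the gradient

print(round(theta, 3))             # converges near 2.0
```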
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 2: PyTorch, Resource Accounting
Deep Learning · Beginner · Stanford Online

This session teaches two essentials for building language models: PyTorch basics and resource accounting. PyTorch is a library for working with tensors (multi‑dimensional arrays) and can run on CPU or GPU. You learn how to create tensors, perform math (including matrix multiplies), reshape, index/slice, and use automatic differentiation to compute gradients for training.

#pytorch #tensor #autograd
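Those PyTorch basics can be shown in a few lines (a generic illustration assuming `torch` is installed, not the lecture's code):

```python
import torch

# Tensors and math: create, matrix-multiply, reshape, index.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # identity matrix
c = a @ b                                     # matrix multiply
flat = c.reshape(-1)                          # reshape 2x2 -> 1-D of 4
first_row = c[0]                              # indexing/slicing

# Autograd: y = sum(x^2) has gradient dy/dx = 2x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()                                  # fills x.grad
print(x.grad)                                 # tensor([2., 4., 6.])
```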
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 3: Architectures, Hyperparameters
LLM · Beginner · Stanford Online

Language modeling means predicting the next token (a token is a small piece of text like a word or subword) given all tokens before it. If you can estimate this next-token probability well, you can generate text by sampling one token at a time and appending it to the history. This step-by-step sampling turns probabilities into full sentences or paragraphs. Good models assign high probability to likely next tokens and low probability to unlikely ones.

#language-modeling #next-token-prediction #embedding
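The step-by-step sampling described above can be sketched with a made-up bigram table standing in for a trained model (purely illustrative):

```python
import random

# Hypothetical next-token distributions: each token maps to the
# probabilities of what comes next.
probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"sat": 0.5, "ran": 0.5},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def generate(start, rng):
    """Sample one token at a time, appending each to the history."""
    tokens = [start]
    while tokens[-1] != "<eos>":
        dist = probs[tokens[-1]]
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(nxt)
    return tokens

print(generate("the", random.Random(0)))   # e.g. ['the', 'cat', 'sat', '<eos>']
```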
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 1: Overview and Tokenization
NLP · Beginner · Stanford Online

This session introduces a brand-new course on building language models from scratch. You learn what language modeling is, where it’s used (speech recognition, translation, text generation, classification), and how different modeling families work. The class emphasizes implementing models yourself in Python and PyTorch, plus how to train and evaluate them.

#language-modeling #tokenization #n-gram
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of Experts
LLM · Intermediate · Stanford Online

The lecture explains why simply making language models bigger (more parameters) helped for years, but also why data size and training time matter just as much. From BERT in 2018 to GPT‑2, GPT‑3, PaLM, Chinchilla, and Llama 2, the trend shows performance rises when models are scaled correctly with enough data and compute.

#mixture-of-experts #sparse-activation #conditional-computation
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 12: Evaluation
LLM · Intermediate · Stanford Online

Evaluation tells us how good a language model really is. There are two big ways to judge models: intrinsic (measure the model directly) and extrinsic (measure it through real tasks). Intrinsic is fast and clean but might not reflect real-world usefulness. Extrinsic is realistic and practical but slow and complicated to run.

#language-model-evaluation #perplexity #intrinsic-evaluation
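The classic intrinsic metric named in the tags, perplexity, is the exponential of the average negative log-probability per token. A generic sketch (standard formula, not the lecture's code):

```python
from math import exp, log

def perplexity(token_probs):
    """token_probs: the model's probability for each observed token."""
    nll = -sum(log(p) for p in token_probs) / len(token_probs)
    return exp(nll)

# A model that always gives the right token probability 1/4 has
# perplexity 4: it is as "surprised" as a uniform 4-way guess.
print(perplexity([0.25, 0.25, 0.25, 0.25]))   # ≈ 4.0
print(perplexity([1.0, 1.0, 1.0]))            # 1.0, never surprised
```

Lower perplexity means the model assigns higher probability to the text it actually sees, which is why it is fast and clean to measure but says nothing directly about downstream usefulness.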
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 11: Scaling Laws 2
LLM · Intermediate · Stanford Online

Scaling laws relate a model’s log loss (how surprised it is by the next token) to three knobs: number of parameters (N), dataset size (D), and compute budget (C). As you increase N, D, and C, loss usually drops smoothly. But this only holds when you keep many other things steady and consistent.

#scaling-laws #log-loss #perplexity
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 5: GPUs
Deep Learning · Beginner · Stanford Online

GPUs (Graphics Processing Units) are critical for deep learning because they run thousands of simple math operations at the same time. Language models like Transformers rely on huge numbers of matrix multiplications, which are perfect for parallel processing. CPUs have a few strong cores for complex, step-by-step tasks, while GPUs have many simpler cores for doing lots of math in parallel. Using GPUs correctly can make training and inference dramatically faster.

#gpu #cuda #pytorch
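The usual PyTorch idiom for that CPU/GPU choice (a standard pattern, assumes `torch` is installed; falls back to CPU when no GPU is present):

```python
import torch

# Pick the GPU when available, otherwise the CPU; the same code
# then runs on either device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(256, 128, device=device)
b = torch.randn(128, 64, device=device)
c = a @ b          # the kind of matmul GPUs parallelize across many cores

print(c.shape, c.device)
```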
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1
Deep Learning · Intermediate · Stanford Online

This lesson teaches two big ways to train neural networks on many GPUs: data parallelism and model parallelism. Data parallelism copies the whole model to every GPU and splits the dataset into equal shards, then averages gradients to take one update step. Model parallelism splits the model itself across GPUs and passes activations forward and gradients backward between them.

#data-parallelism #model-parallelism #parameter-server
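The gradient-averaging step of data parallelism can be sketched in a single process, with list shards standing in for GPUs (illustrative only, hypothetical 1-D model):

```python
# Each "GPU" computes the gradient on its own shard; the update uses
# the average, which (for equal-size shards) equals the gradient over
# the full batch.
def grad(theta, shard):
    # gradient of mean squared error for f(x) = theta * x on one shard
    return sum(2 * (theta * x - y) * x for x, y in shard) / len(shard)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]              # split the batch across 2 "GPUs"

theta = 0.0
per_gpu = [grad(theta, s) for s in shards]  # computed in parallel in reality
avg = sum(per_gpu) / len(per_gpu)           # the all-reduce (average) step
theta -= 0.01 * avg                         # one synchronized update

print(avg, grad(0.0, data))                 # equal shards: these two match
```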
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 10: Inference
LLM · Intermediate · Stanford Online

This session explains how to use a trained language model to produce outputs, a phase called inference. It covers three task types—conditional generation, open-ended generation, and classification—each with different input/output shapes that affect decoding choices. The lecture then dives into decoding methods, which are strategies to choose the next token step by step. Finally, it discusses how to evaluate generated text using human judgments and automatic metrics, along with their trade-offs.

#inference #decoding #greedy-decoding
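The simplest decoding method from the tags, greedy decoding, picks the single highest-probability token at every step. A sketch with a hypothetical toy model standing in for a trained LM:

```python
def next_token_probs(history):
    # Hypothetical stand-in for a trained model's output distribution.
    table = {
        (): {"the": 0.7, "a": 0.3},
        ("the",): {"cat": 0.5, "dog": 0.4, "<eos>": 0.1},
        ("the", "cat"): {"sat": 0.8, "<eos>": 0.2},
        ("the", "cat", "sat"): {"<eos>": 1.0},
    }
    return table[tuple(history)]

def greedy_decode(max_len=10):
    history = []
    for _ in range(max_len):
        dist = next_token_probs(history)
        tok = max(dist, key=dist.get)   # greedy: argmax, no sampling
        if tok == "<eos>":
            break
        history.append(tok)
    return history

print(greedy_decode())   # ['the', 'cat', 'sat']
```

Greedy decoding is deterministic and cheap, which is why other methods (sampling, beam search) exist for open-ended generation, where always taking the argmax tends to produce repetitive text.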
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 9: Scaling Laws 1
LLM · Intermediate · Stanford Online

Scaling laws are empirical rules that show how a model’s loss (error) drops as you grow model size, data, or compute. They take a power-law form: Loss = A × N^(-α), where N can be parameters, data tokens, or compute, and α is the scaling exponent. This lets us predict how bigger models might perform without training them.

#scaling-laws #power-law #language-models
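The power-law form can be tried out directly. In the sketch below, A and α are made-up illustrative constants, not fitted values from the lecture:

```python
# Loss = A * N**(-alpha): a power law in N (parameters, tokens, or compute).
def predicted_loss(n_params, A=10.0, alpha=0.1):
    return A * n_params ** (-alpha)

# Doubling N shrinks the loss by the same constant factor 2**(-alpha),
# no matter how big the model already is -- the signature of a power law.
small = predicted_loss(1e8)
big = predicted_loss(2e8)
print(big / small)   # ≈ 0.933, i.e. 2 ** -0.1
```

This constant-ratio property is what lets researchers fit the curve on small models and extrapolate to budgets they have not trained yet.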