How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (7)

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 4 - LLM Training

Deep Learning · Beginner · Stanford

This lecture explains how we train neural networks by minimizing a loss function using optimization methods. It starts with gradient descent and stochastic gradient descent (SGD), showing how we update parameters by stepping opposite to the gradient. Mini-batches make training faster and add helpful noise that can help the optimizer escape poor regions of the loss landscape, such as local minima. A minimal mini-batch SGD sketch follows the tags below.

#gradient-descent #stochastic-gradient-descent #mini-batch
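To make the update rule concrete, here is a minimal mini-batch SGD sketch in PyTorch. The linear model, synthetic data, learning rate, and batch size are illustrative assumptions, not taken from the lecture:

```python
import torch

# Synthetic linear-regression data (illustrative, not from the lecture)
X = torch.randn(1000, 10)
true_w = torch.randn(10, 1)
y = X @ true_w + 0.1 * torch.randn(1000, 1)

w = torch.zeros(10, 1, requires_grad=True)  # parameters we will learn
lr, batch_size = 0.1, 32                    # assumed hyperparameters

for step in range(500):
    idx = torch.randint(0, X.shape[0], (batch_size,))  # sample a mini-batch
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()         # mean squared error
    loss.backward()                                    # gradient of loss w.r.t. w
    with torch.no_grad():
        w -= lr * w.grad                               # step opposite the gradient
        w.grad.zero_()                                 # reset for the next step
```

Full-batch gradient descent would use all 1000 examples every step; sampling only 32 gives faster, noisier updates.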
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 - Recap & Current Trends

Deep Learning · Beginner · Stanford

The lecture explains what deep learning is and why it changed how we build intelligent systems. In the past, engineers wrote step-by-step rules (like detecting corners and lines) to identify objects in images. These hand-built rules often broke when lighting, angle, or season changed. Deep learning replaces these hand-crafted rules with models that learn directly from data.

#deep-learning #neural-networks #machine-learning
Stanford CS230 | Autumn 2025 | Lecture 1: Introduction to Deep Learning

Deep Learning · Beginner · Stanford Online

This lecture kicks off Stanford CS230 and explains what deep learning is: a kind of machine learning that uses multi-layer neural networks to learn complex patterns. Andrew Ng highlights its strength in understanding images, language, and speech by learning layered features like edges, textures, and objects. The message is that deep learning’s power comes from big data, flexible architectures, and non-linear functions that let models represent complex relationships. A tiny multi-layer network sketch appears below.

#deep-learning #neural-network #perceptron
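As a sketch of what "multi-layer" and "non-linear" mean in practice, here is a tiny PyTorch network; the layer sizes are illustrative assumptions, not from the lecture:

```python
import torch.nn as nn

# Stacked linear layers with ReLU non-linearities between them.
# Without the ReLUs, the stack would collapse into a single linear map;
# the non-linearity is what lets the model represent complex relationships.
mlp = nn.Sequential(
    nn.Linear(784, 128),  # e.g. a flattened 28x28 image -> hidden features
    nn.ReLU(),
    nn.Linear(128, 64),   # deeper layers combine earlier features
    nn.ReLU(),
    nn.Linear(64, 10),    # scores for 10 classes
)
```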
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 2: Pytorch, Resource Accounting

Deep Learning · Beginner · Stanford Online

This session teaches two essentials for building language models: PyTorch basics and resource accounting. PyTorch is a library for working with tensors (multi-dimensional arrays) and can run on CPU or GPU. You learn how to create tensors, perform math (including matrix multiplies), reshape, index/slice, and use automatic differentiation to compute gradients for training. A short example of these basics appears below.

#pytorch #tensor #autograd
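The basics from the session in one small sketch; the shapes and values are illustrative:

```python
import torch

# Create tensors and do math, including a matrix multiply
a = torch.randn(3, 4)
b = torch.randn(4, 2)
c = a @ b                    # matrix multiply -> shape (3, 2)

# Reshape and index/slice
flat = c.reshape(6)          # same 6 values, viewed as one row
first_row = c[0, :]          # slicing works like NumPy

# Automatic differentiation
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()                 # computes dy/dx
print(x.grad)                # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2
```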
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 5: GPUs

Deep Learning · Beginner · Stanford Online

GPUs (Graphics Processing Units) are critical for deep learning because they run thousands of simple math operations at the same time. Language models like Transformers rely on huge numbers of matrix multiplications, which are perfect for parallel processing. CPUs have a few strong cores for complex, step-by-step tasks, while GPUs have many simpler cores for doing lots of math in parallel. Using GPUs correctly can make training and inference dramatically faster. A small example of offloading a matrix multiply to the GPU appears below.

#gpu #cuda #pytorch
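A small sketch of moving work onto a GPU in PyTorch; the matrix sizes are illustrative, and the code falls back to CPU when no GPU is present:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# One large matrix multiply: millions of independent multiply-adds,
# exactly the kind of work a GPU's many simple cores run in parallel.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.time()
c = a @ b
if device == "cuda":
    torch.cuda.synchronize()  # GPU kernels run asynchronously; wait before timing
print(f"matmul on {device}: {time.time() - start:.4f}s")
```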
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1

Deep Learning · Intermediate · Stanford Online

This lesson teaches two big ways to train neural networks on many GPUs: data parallelism and model parallelism. Data parallelism copies the whole model to every GPU and splits the dataset into equal shards, then averages gradients to take one update step. Model parallelism splits the model itself across GPUs and passes activations forward and gradients backward between them. A minimal data-parallel sketch appears below.

#data-parallelism #model-parallelism #parameter-server
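A minimal data-parallel sketch using PyTorch's DistributedDataParallel; it assumes a launch like `torchrun --nproc_per_node=4 train.py`, and the model, data, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")               # one process per GPU
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")
torch.cuda.set_device(device)

model = torch.nn.Linear(10, 1).to(device)    # full model copy on every GPU
ddp = DDP(model, device_ids=[rank])
opt = torch.optim.SGD(ddp.parameters(), lr=0.1)

for step in range(10):
    x = torch.randn(32, 10, device=device)   # each rank draws its own data shard
    y = torch.randn(32, 1, device=device)
    loss = ((ddp(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                          # DDP all-reduces (averages) gradients
    opt.step()                               # every replica takes the same update

dist.destroy_process_group()
```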
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 8: Parallelism 2

Deep Learning · Intermediate · Stanford Online

This session explains how to speed up and scale training when one GPU or a simple setup is not enough. It reviews data parallelism (split data across devices) and pipeline parallelism (split model across devices), then dives into practical fixes for their main bottlenecks. The key tools are gradient accumulation, virtual batch size, and interleaved pipeline stages. You’ll learn the trade-offs between memory use, communication overhead, and idle time. A gradient-accumulation sketch appears below.

#data-parallelism #pipeline-parallelism #model-parallelism
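A minimal sketch of gradient accumulation, which simulates a larger "virtual" batch when memory only fits small micro-batches; the model, sizes, and step counts are illustrative assumptions:

```python
import torch

model = torch.nn.Linear(10, 1)                 # illustrative model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4                                # 4 micro-batches of 8 -> virtual batch of 32

opt.zero_grad()
for micro_step in range(400):
    x = torch.randn(8, 10)                     # micro-batch that fits in memory
    y = torch.randn(8, 1)
    loss = ((model(x) - y) ** 2).mean()
    (loss / accum_steps).backward()            # scale so accumulated grads average out
    if (micro_step + 1) % accum_steps == 0:
        opt.step()                             # one optimizer update per virtual batch
        opt.zero_grad()
```

The trade-off: memory stays at micro-batch size, but each virtual batch costs `accum_steps` forward/backward passes before a single update.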