How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (2)

Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1

Deep Learning · Intermediate · Stanford Online

This lesson teaches two big ways to train neural networks on many GPUs: data parallelism and model parallelism. Data parallelism copies the whole model to every GPU and splits the dataset into equal shards, then averages gradients to take one update step. Model parallelism splits the model itself across GPUs and passes activations forward and gradients backward between them.

#data parallelism · #model parallelism · #parameter server
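The gradient-averaging step the summary describes can be sketched in plain NumPy. This is a toy sketch, not the lecture's code: a linear model with mean-squared-error loss stands in for the network, and a simple mean over per-shard gradients stands in for the all-reduce across GPUs. The function names (`local_gradient`, `data_parallel_step`) are illustrative.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean-squared error for a linear model y_hat = X @ w,
    computed on one data shard (what each replica would compute locally)."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)

def data_parallel_step(w, X, y, num_workers, lr=0.1):
    """One synchronous data-parallel update: split the batch into equal
    shards, compute each shard's gradient, average them (a stand-in for
    the all-reduce step), and apply a single update to the shared weights."""
    X_shards = np.array_split(X, num_workers)
    y_shards = np.array_split(y, num_workers)
    grads = [local_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    avg_grad = np.mean(grads, axis=0)  # every "GPU" ends up with this same value
    return w - lr * avg_grad
```

When the shards are equal-sized, the averaged gradient is exactly the full-batch gradient, which is why every replica stays in sync after the update.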
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 8: Parallelism 2

Deep Learning · Intermediate · Stanford Online

This session explains how to speed up and scale training when one GPU or a simple setup is not enough. It reviews data parallelism (split data across devices) and pipeline parallelism (split model across devices), then dives into practical fixes for their main bottlenecks. The key tools are gradient accumulation, virtual batch size, and interleaved pipeline stages. You'll learn the trade-offs between memory use, communication overhead, and idle time.

#data parallelism · #pipeline parallelism · #model parallelism
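Gradient accumulation, one of the key tools this session covers, can be sketched in a few lines of NumPy. This is an illustrative sketch with a toy linear model and made-up function names (`grad_mse`, `accumulate_and_step`): several small forward/backward passes are run, their scaled gradients are summed, and one optimizer step is applied, so the effective ("virtual") batch size is the sum of the micro-batches without ever holding the full batch in memory.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean-squared error for a linear model on one micro-batch."""
    return 2 * X.T @ (X @ w - y) / len(y)

def accumulate_and_step(w, micro_batches, lr=0.05):
    """Gradient accumulation: accumulate gradients over several micro-batches,
    then take a single update step, mimicking one large-batch step."""
    acc = np.zeros_like(w)
    k = len(micro_batches)
    for X, y in micro_batches:
        # Scale each micro-batch gradient by 1/k so the accumulated value
        # matches the mean-loss gradient of the combined (virtual) batch.
        acc += grad_mse(w, X, y) / k
    return w - lr * acc
```

With equal-sized micro-batches, the result is identical to one step on the concatenated batch, which is the point: you trade extra compute passes for a smaller activation-memory footprint.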