How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (7)

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 4 - LLM Training

Deep Learning · Beginner · Stanford

This lecture explains how we train neural networks by minimizing a loss function using optimization methods. It starts with gradient descent and stochastic gradient descent (SGD), showing how we update parameters by stepping opposite to the gradient. Mini-batches make training faster and add helpful noise that can help the optimizer escape poor regions of the loss landscape, such as local minima. A minimal mini-batch SGD sketch follows the tags below.

#gradient-descent #stochastic-gradient-descent #mini-batch
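To make the update rule concrete, here is a minimal mini-batch SGD sketch in PyTorch. The linear model, synthetic data, learning rate, and batch size are illustrative assumptions, not taken from the lecture:

```python
import torch

# Synthetic linear-regression data (illustrative, not from the lecture)
X = torch.randn(1000, 10)
true_w = torch.randn(10, 1)
y = X @ true_w + 0.1 * torch.randn(1000, 1)

w = torch.zeros(10, 1, requires_grad=True)  # parameters we will learn
lr, batch_size = 0.1, 32                    # assumed hyperparameters

for step in range(500):
    idx = torch.randint(0, X.shape[0], (batch_size,))  # sample a mini-batch
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()         # mean squared error
    loss.backward()                                    # gradient of loss w.r.t. w
    with torch.no_grad():
        w -= lr * w.grad                               # step opposite the gradient
        w.grad.zero_()                                 # reset for the next step
```

Full-batch gradient descent would use all 1000 examples every step; sampling only 32 gives faster, noisier updates.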
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 - Recap & Current Trends

Deep Learning · Beginner · Stanford

The lecture explains what deep learning is and why it changed how we build intelligent systems. In the past, engineers wrote step-by-step rules (like detecting corners and lines) to identify objects in images. These hand-built rules often broke when lighting, angle, or season changed. Deep learning replaces these hand-crafted rules with models that learn directly from data.

#deep-learning #neural-networks #machine-learning
Stanford CS230 | Autumn 2025 | Lecture 1: Introduction to Deep Learning

Deep Learning · Beginner · Stanford Online

This lecture kicks off Stanford CS230 and explains what deep learning is: a kind of machine learning that uses multi-layer neural networks to learn complex patterns. Andrew Ng highlights its strength in understanding images, language, and speech by learning layered features like edges, textures, and objects. The message is that deep learning’s power comes from big data, flexible architectures, and non-linear functions that let models represent complex relationships. A tiny multi-layer network sketch appears below.

#deep-learning #neural-network #perceptron
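As a sketch of what "multi-layer" and "non-linear" mean in practice, here is a tiny PyTorch network; the layer sizes are illustrative assumptions, not from the lecture:

```python
import torch.nn as nn

# Stacked linear layers with ReLU non-linearities between them.
# Without the ReLUs, the stack would collapse into a single linear map;
# the non-linearity is what lets the model represent complex relationships.
mlp = nn.Sequential(
    nn.Linear(784, 128),  # e.g. a flattened 28x28 image -> hidden features
    nn.ReLU(),
    nn.Linear(128, 64),   # deeper layers combine earlier features
    nn.ReLU(),
    nn.Linear(64, 10),    # scores for 10 classes
)
```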
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 2: Pytorch, Resource Accounting

Deep Learning · Beginner · Stanford Online

This session teaches two essentials for building language models: PyTorch basics and resource accounting. PyTorch is a library for working with tensors (multi-dimensional arrays) and can run on CPU or GPU. You learn how to create tensors, perform math (including matrix multiplies), reshape, index/slice, and use automatic differentiation to compute gradients for training. A short example of these basics appears below.

#pytorch #tensor #autograd
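The basics from the session in one small sketch; the shapes and values are illustrative:

```python
import torch

# Create tensors and do math, including a matrix multiply
a = torch.randn(3, 4)
b = torch.randn(4, 2)
c = a @ b                    # matrix multiply -> shape (3, 2)

# Reshape and index/slice
flat = c.reshape(6)          # same 6 values, viewed as one row
first_row = c[0, :]          # slicing works like NumPy

# Automatic differentiation
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()                 # computes dy/dx
print(x.grad)                # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2
```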
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 5: GPUs

Deep Learning · Beginner · Stanford Online

GPUs (Graphics Processing Units) are critical for deep learning because they run thousands of simple math operations at the same time. Language models like Transformers rely on huge numbers of matrix multiplications, which are perfect for parallel processing. CPUs have a few strong cores for complex, step-by-step tasks, while GPUs have many simpler cores for doing lots of math in parallel. Using GPUs correctly can make training and inference dramatically faster. A small example of offloading a matrix multiply to the GPU appears below.

#gpu #cuda #pytorch
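A small sketch of moving work onto a GPU in PyTorch; the matrix sizes are illustrative, and the code falls back to CPU when no GPU is present:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# One large matrix multiply: millions of independent multiply-adds,
# exactly the kind of work a GPU's many simple cores run in parallel.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

start = time.time()
c = a @ b
if device == "cuda":
    torch.cuda.synchronize()  # GPU kernels run asynchronously; wait before timing
print(f"matmul on {device}: {time.time() - start:.4f}s")
```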
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1

Deep Learning · Intermediate · Stanford Online

This lesson teaches two big ways to train neural networks on many GPUs: data parallelism and model parallelism. Data parallelism copies the whole model to every GPU and splits the dataset into equal shards, then averages gradients to take one update step. Model parallelism splits the model itself across GPUs and passes activations forward and gradients backward between them. A minimal data-parallel sketch appears below.

#data-parallelism #model-parallelism #parameter-server
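A minimal data-parallel sketch using PyTorch's DistributedDataParallel; it assumes a launch like `torchrun --nproc_per_node=4 train.py`, and the model, data, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")               # one process per GPU
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")
torch.cuda.set_device(device)

model = torch.nn.Linear(10, 1).to(device)    # full model copy on every GPU
ddp = DDP(model, device_ids=[rank])
opt = torch.optim.SGD(ddp.parameters(), lr=0.1)

for step in range(10):
    x = torch.randn(32, 10, device=device)   # each rank draws its own data shard
    y = torch.randn(32, 1, device=device)
    loss = ((ddp(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                          # DDP all-reduces (averages) gradients
    opt.step()                               # every replica takes the same update

dist.destroy_process_group()
```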
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 8: Parallelism 2

Deep Learning · Intermediate · Stanford Online

This session explains how to speed up and scale training when one GPU or a simple setup is not enough. It reviews data parallelism (split data across devices) and pipeline parallelism (split model across devices), then dives into practical fixes for their main bottlenecks. The key tools are gradient accumulation, virtual batch size, and interleaved pipeline stages. You’ll learn the trade-offs between memory use, communication overhead, and idle time. A gradient-accumulation sketch appears below.

#data-parallelism #pipeline-parallelism #model-parallelism
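A minimal sketch of gradient accumulation, which simulates a larger "virtual" batch when memory only fits small micro-batches; the model, sizes, and step counts are illustrative assumptions:

```python
import torch

model = torch.nn.Linear(10, 1)                 # illustrative model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4                                # 4 micro-batches of 8 -> virtual batch of 32

opt.zero_grad()
for micro_step in range(400):
    x = torch.randn(8, 10)                     # micro-batch that fits in memory
    y = torch.randn(8, 1)
    loss = ((model(x) - y) ** 2).mean()
    (loss / accum_steps).backward()            # scale so accumulated grads average out
    if (micro_step + 1) % accum_steps == 0:
        opt.step()                             # one optimizer update per virtual batch
        opt.zero_grad()
```

The trade-off: memory stays at micro-batch size, but each virtual batch costs `accum_steps` forward/backward passes before a single update.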