How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (43)

Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 15: Alignment - SFT/RLHF

Intermediate
Stanford Online

Alignment means teaching a pre-trained language model to act the way people want: safe, helpful, and harmless. A pre-trained model is like a bag of knowledge with no idea how to use it, so it may hallucinate or say unsafe things. Alignment adds an outer layer of behavior so the model answers clearly, avoids harm, and respects user intent.

#alignment #sft #rlhf
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 16: Alignment - RL 1

Intermediate
Stanford Online

This session introduces alignment for language models and why next‑token prediction alone is not enough. When models only learn to guess the next word, they can hallucinate facts, produce toxic or biased text, and follow tricky prompts the wrong way. Alignment aims to make models helpful, honest, and harmless so they do what people actually want. The lecture lays out a practical recipe to achieve this with RLHF (Reinforcement Learning from Human Feedback).

#alignment #rlhf #reward model
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 17: Alignment - RL 2

Intermediate
Stanford Online

This session continues alignment with reinforcement learning for language models. It recaps reward hacking: when a model chases the reward in the wrong way, such as writing very long answers when reward is tied to word count. The RLHF pipeline is reviewed: pre-train a model, gather human preference data, train a reward model, then fine-tune the policy with RL under a safety constraint. The main focus is how to optimize the policy while staying close to the original model, using techniques such as KL penalties, PPO, and DPO.

#rlhf #ppo #kl divergence
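The "stay close to the original model" idea from this lecture can be sketched as KL-penalized reward shaping. This is a minimal toy in pure Python, not the lecture's code; the function name and the illustrative numbers are assumptions.

```python
import math  # not strictly needed here, kept for log-probability context

def shaped_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    """KL-penalized reward used in RLHF fine-tuning (sketch).

    The policy is rewarded by the reward model but penalized for
    drifting from the reference (pre-RL) model:
        r = r_RM - beta * (log pi(y|x) - log pi_ref(y|x))
    """
    kl_term = logp_policy - logp_ref
    return rm_score - beta * kl_term

# A response the policy now likes far more than the reference model does
# pays a KL penalty, pulling it back toward the original behavior.
r_close = shaped_reward(rm_score=2.0, logp_policy=-5.0, logp_ref=-5.0)
r_drift = shaped_reward(rm_score=2.0, logp_policy=-2.0, logp_ref=-5.0)
```

The `beta` coefficient is the knob the lecture discusses: larger values keep the policy closer to the reference model at the cost of optimizing the reward less aggressively.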
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 6: Kernels, Triton

Intermediate
Stanford Online

Modern language models are expensive to run because they perform many matrix multiplications. The main cost comes from both compute and moving data in and out of GPU memory. Optimizing the low-level code that runs these operations can make inference and training much faster and cheaper.

#triton #gpu kernel #cuda
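The compute-versus-memory-traffic cost mentioned in this summary is often quantified as arithmetic intensity (FLOPs per byte moved). A rough back-of-the-envelope sketch, assuming fp16 operands and that each matrix is read or written from GPU memory exactly once:

```python
def matmul_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte moved for an (m, k) @ (k, n) matmul.

    Assumes each matrix touches GPU memory once (fp16 = 2 bytes/elem);
    real kernels differ, but the trend is what matters.
    """
    flops = 2 * m * n * k  # one multiply + one add per output contribution
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Large square matmuls do lots of math per byte moved (compute-bound);
# skinny matmuls (e.g. batch-1 inference) move data for little math
# (memory-bound), which is where fused kernels help most.
big = matmul_arithmetic_intensity(4096, 4096, 4096)
skinny = matmul_arithmetic_intensity(1, 4096, 4096)
```

Comparing this ratio against a GPU's FLOPs-per-byte-of-bandwidth ratio tells you whether a kernel is limited by compute or by memory movement, which is the motivation for writing custom kernels in Triton.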
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 13: Data 1

Intermediate
Stanford Online

This class explains why data is the most important part of building language models. You learn where text data comes from (books, the web, and human feedback) and what each source is good and bad at. The instructor stresses that most of your time in real projects goes into finding, collecting, cleaning, and filtering data, not model code.

#common crawl #c4 #mc4
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 8: Parallelism 2

Intermediate
Stanford Online

This session explains how to speed up and scale training when one GPU or a simple setup is not enough. It reviews data parallelism (split data across devices) and pipeline parallelism (split model across devices), then dives into practical fixes for their main bottlenecks. The key tools are gradient accumulation, virtual batch size, and interleaved pipeline stages. You’ll learn the trade‑offs between memory use, communication overhead, and idle time.

#data parallelism #pipeline parallelism #model parallelism
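Gradient accumulation, one of the tools named above, can be sketched in a few lines of pure Python. This is a toy with hand-written gradient lists, not a real training loop; the function name is an assumption.

```python
def accumulate_gradients(micro_batch_grads):
    """Average per-micro-batch gradients to emulate one large batch.

    A 'virtual' batch of size accum_steps * micro_batch_size is built
    by summing gradients over accum_steps small forward/backward passes
    before taking a single optimizer step, trading extra passes for
    lower peak memory.
    """
    accum_steps = len(micro_batch_grads)
    total = [0.0] * len(micro_batch_grads[0])
    for grad in micro_batch_grads:  # one gradient vector per micro-batch
        for i, g in enumerate(grad):
            total[i] += g
    # Divide once at the end so the result matches the full-batch mean.
    return [t / accum_steps for t in total]

# Four micro-batches of per-parameter gradients -> one averaged update.
update = accumulate_gradients([[1.0, 2.0], [3.0, 2.0], [1.0, 0.0], [3.0, 4.0]])
```

The trade-off the lecture covers shows up even here: more accumulation steps mean a bigger effective batch with the same memory, but more sequential passes before each optimizer step.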
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 14: Data 2

Intermediate
Stanford Online

The lecture explains why rare words are a core challenge in language modeling. Most corpora follow Zipf’s law, where a few words appear very often and a huge number appear very rarely. Rare words make probability estimates unreliable and inflate vocabulary size, which increases memory and slows training and inference.

#zipf's law #rare words #unknown token
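The Zipf's-law claim above (a few very frequent words, a long sparse tail) is easy to demonstrate with a synthetic rank-frequency distribution. The vocabulary size and token count below are illustrative assumptions, not figures from the lecture.

```python
def zipf_counts(vocab_size, total_tokens):
    """Assign counts proportional to 1/rank (Zipf's law), so a handful
    of word types dominate and most of the vocabulary sits in a tail."""
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    z = sum(weights)  # harmonic normalizer
    return [total_tokens * w / z for w in weights]

counts = zipf_counts(vocab_size=10_000, total_tokens=1_000_000)

# Share of all tokens covered by the 100 most frequent types versus the
# bottom half of the vocabulary: rare words get so few observations
# that their probability estimates are noisy.
top_100_share = sum(counts[:100]) / 1_000_000
bottom_half_share = sum(counts[5_000:]) / 1_000_000
```

Under this distribution the top 100 types cover more than half the corpus while the bottom 5,000 types together cover only a small fraction, which is exactly why subword tokenization and unknown-token handling matter.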