RLHF alignment means teaching a pre-trained language model to act the way people want: safe, helpful, and harmless. A pre-trained model is like a bag of knowledge with no idea how to use it, so it may hallucinate or say unsafe things. Alignment adds an outer layer of behavior so the model answers clearly, avoids harm, and respects user intent.

This session introduces alignment for language models and explains why next‑token prediction alone is not enough. Models trained only to predict the next word can hallucinate facts, produce toxic or biased text, and comply with adversarial prompts they should refuse. Alignment aims to make models helpful, honest, and harmless so they do what people actually want. The lecture lays out a practical recipe for achieving this with RLHF (Reinforcement Learning from Human Feedback).
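At the core of the RLHF recipe is a reward model trained on human preference pairs (a preferred answer vs. a rejected one). A minimal sketch of the standard pairwise (Bradley–Terry) loss, in plain Python with hypothetical scalar reward values standing in for a real reward model's outputs:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    answer higher than the rejected one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward-model scores for one preference pair:
loss_correct_ranking = preference_loss(2.0, -1.0)  # preferred scored higher
loss_wrong_ranking = preference_loss(-1.0, 2.0)    # preferred scored lower
print(loss_correct_ranking < loss_wrong_ranking)   # True
```

Minimizing this loss over many human-labeled pairs pushes the reward model to rank answers the way people do; that learned reward then guides the RL fine-tuning step.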