How I Study AI - Learn AI Papers & Lectures the Easy Way

🎬 AI Lectures (15)

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation [ML]

Intermediate
Stanford

Kernel methods turn simple linear algorithms into powerful non-linear ones. Instead of drawing only straight lines to separate data, they let us curve and bend the boundary by working in a higher-dimensional feature space. This keeps training simple while unlocking complex patterns.

#kernel methods · #kernel trick · #feature map
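The kernel trick described above can be made concrete with a small numerical sketch. This is an illustration under an assumed degree-2 polynomial kernel (not an example from the lecture): evaluating the kernel in the original low-dimensional space gives exactly the inner product in the explicit higher-dimensional feature space, so the feature space never has to be built.

```python
import numpy as np

def feature_map(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# The two computations agree: phi(x) . phi(z) == k(x, z)
lhs = np.dot(feature_map(x), feature_map(z))
rhs = poly_kernel(x, z)
```

A linear algorithm that only touches data through inner products can therefore swap in `poly_kernel` and behave as if it were trained on the curved feature space.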
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 2 - Transformer-Based Models & Tricks [ML]

Intermediate
Stanford

Principal Component Analysis (PCA) is a method to turn high-dimensional data into a smaller set of numbers while keeping as much useful information as possible. The lecture explains three equivalent views of PCA: best low-dimensional representation, directions of maximum variance, and best reconstruction after projection. All three views lead to the same solution: the eigenvectors and eigenvalues of the covariance matrix of the centered data.

#principal component analysis · #pca · #dimensionality reduction
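The "directions of maximum variance" view above can be sketched in a few lines of NumPy: center the data, build the covariance matrix, and keep the eigenvectors with the largest eigenvalues. The synthetic data below is made up for illustration.

```python
import numpy as np

# Synthetic 3-D data with variance concentrated along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]        # sort directions by variance explained
components = eigvecs[:, order[:2]]       # top-2 principal directions
Z = Xc @ components                      # the 2-D representation
```

The same `components` also give the best reconstruction after projection (`Z @ components.T`), matching the third view from the lecture.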
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of Experts [LLM]

Intermediate
Stanford Online

The lecture explains why simply making language models bigger (more parameters) helped for years, but also why data size and training time matter just as much. From BERT in 2018 to GPT‑2, GPT‑3, PaLM, Chinchilla, and Llama 2, the trend shows performance rises when models are scaled correctly with enough data and compute.

#mixture of experts · #sparse activation · #conditional computation
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 12: Evaluation [LLM]

Intermediate
Stanford Online

Evaluation tells us how good a language model really is. There are two big ways to judge models: intrinsic (measure the model directly) and extrinsic (measure it through real tasks). Intrinsic is fast and clean but might not reflect real-world usefulness. Extrinsic is realistic and practical but slow and complicated to run.

#language model evaluation · #perplexity · #intrinsic evaluation
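The standard intrinsic metric mentioned in the tags, perplexity, can be sketched directly: it is the exponential of the average negative log-probability the model assigns to each true next token. The probabilities below are made up for illustration, not from any real model.

```python
import math

# Model's (hypothetical) probability for each actual next token in a sequence.
token_probs = [0.5, 0.25, 0.125, 0.25]

# Average negative log-likelihood per token, then exponentiate.
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)  # → 4.0
```

A perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step; lower is better, which is what makes it a fast "measure the model directly" metric.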
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 11: Scaling Laws 2 [LLM]

Intermediate
Stanford Online

Scaling laws relate a model’s log loss (how surprised it is by the next token) to three knobs: number of parameters (N), dataset size (D), and compute budget (C). As you increase N, D, and C, loss usually drops smoothly. But this only holds when you keep many other things steady and consistent.

#scaling laws · #log loss · #perplexity
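The "three knobs" relationship in the summary is commonly written as a parametric loss surface over parameters N and data tokens D. The sketch below uses that common functional form with made-up constants (these are illustrative values, not fitted coefficients from the lecture): loss falls smoothly as either knob grows.

```python
# Common parametric form for loss as a function of model size N and
# data tokens D (constants are invented for illustration):
#   L(N, D) = E + A / N**alpha + B / D**beta
def predicted_loss(N, D, E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
    return E + A / N**alpha + B / D**beta

# Scaling both knobs up lowers the predicted loss, but never below the
# irreducible term E.
small = predicted_loss(N=1e8, D=1e10)
large = predicted_loss(N=1e10, D=1e12)
```

The caveat from the summary shows up here too: the fitted constants only mean something if everything else (tokenizer, architecture family, training setup) is held steady across the runs used to fit them.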
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1 [Deep Learning]

Intermediate
Stanford Online

This lesson teaches two big ways to train neural networks on many GPUs: data parallelism and model parallelism. Data parallelism copies the whole model to every GPU and splits the dataset into equal shards, then averages gradients to take one update step. Model parallelism splits the model itself across GPUs and passes activations forward and gradients backward between them.

#data parallelism · #model parallelism · #parameter server
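The data-parallel recipe in the summary — copy the model, shard the data, average gradients, take one step — can be sketched on a toy linear-regression "model" with NumPy standing in for the GPUs. Everything here (the model, the 4-way shard count, the learning rate) is an invented illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)                             # shared model parameters (replicated on each "GPU")
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])     # targets from a hidden true weight vector

shards = np.array_split(np.arange(64), 4)   # split the dataset into 4 equal shards

def grad_on_shard(w, idx):
    """MSE gradient computed locally on one worker's shard."""
    err = X[idx] @ w - y[idx]
    return 2 * X[idx].T @ err / len(idx)

grads = [grad_on_shard(w, idx) for idx in shards]  # each worker computes its gradient
avg_grad = np.mean(grads, axis=0)                  # the all-reduce / averaging step
w = w - 0.1 * avg_grad                             # one synchronized update on every replica
```

Because every replica applies the same averaged gradient, all copies of `w` stay identical — which is exactly what distinguishes data parallelism from model parallelism, where each GPU holds a different slice of the model.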
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 10: Inference [LLM]

Intermediate
Stanford Online

This session explains how to use a trained language model to produce outputs, a phase called inference. It covers three task types—conditional generation, open-ended generation, and classification—each with different input/output shapes that affect decoding choices. The lecture then dives into decoding methods, which are strategies to choose the next token step by step. Finally, it discusses how to evaluate generated text using human judgments and automatic metrics, along with their trade-offs.

#inference · #decoding · #greedy decoding
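Two of the decoding strategies mentioned above can be sketched for a single step. The 4-token vocabulary and logits are made up: greedy decoding always takes the argmax, while temperature sampling reshapes the distribution before drawing a token.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -1.0])     # hypothetical next-token logits

def softmax(x):
    e = np.exp(x - x.max())                  # subtract max for numerical stability
    return e / e.sum()

greedy_token = int(np.argmax(logits))        # deterministic: always the top logit

def sample_with_temperature(logits, T, rng):
    probs = softmax(logits / T)              # T < 1 sharpens, T > 1 flattens
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
sampled = sample_with_temperature(logits, T=1.5, rng=rng)
```

Greedy suits classification-like tasks with one right answer; sampling with a higher temperature suits open-ended generation, matching the task-type distinction in the summary.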
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 9: Scaling Laws 1 [LLM]

Intermediate
Stanford Online

Scaling laws are empirical rules that show how a model’s loss (error) drops as you grow model size, data, or compute. They take a power-law form: Loss = A × N^(-α), where N can be parameters, data tokens, or compute, and α is the scaling exponent. This lets us predict how bigger models might perform without training them.

#scaling laws · #power law · #language models
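The power-law form from the summary, Loss = A × N^(-α), is a straight line on a log-log plot, so two "measurements" are enough to recover the exponent. The constants below are invented for illustration.

```python
import math

A, alpha = 10.0, 0.25  # made-up constants for the power law

def loss(N):
    return A * N ** (-alpha)

# Recover the exponent from two measured points: the slope in log-log space.
N1, N2 = 1e6, 1e9
recovered_alpha = -(math.log(loss(N2)) - math.log(loss(N1))) / (math.log(N2) - math.log(N1))
```

This is the mechanism behind the summary's claim that small-scale runs can predict large-model performance: fit A and α on cheap models, then extrapolate `loss(N)` to sizes you have not trained.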
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 15: Alignment - SFT/RLHF [RLHF]

Intermediate
Stanford Online

Alignment means teaching a pre-trained language model to act the way people want: safe, helpful, and harmless. A pre-trained model is like a bag of knowledge with no idea how to use it, so it may hallucinate or say unsafe things. Alignment adds an outer layer of behavior so the model answers clearly, avoids harm, and respects user intent.

#alignment · #sft · #rlhf
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 16: Alignment - RL 1 [RLHF]

Intermediate
Stanford Online

This session introduces alignment for language models and why next‑token prediction alone is not enough. When models only learn to guess the next word, they can hallucinate facts, produce toxic or biased text, and follow tricky prompts the wrong way. Alignment aims to make models helpful, honest, and harmless so they do what people actually want. The lecture lays out a practical recipe to achieve this with RLHF (Reinforcement Learning from Human Feedback).

#alignment · #rlhf · #reward model
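The reward-model step of the RLHF recipe above is usually trained on human preference pairs with a Bradley-Terry-style loss: minimize -log sigmoid(r_chosen - r_rejected), so the model learns to score the human-preferred answer higher. The scores below are made up; this is a sketch of the loss, not of a full training loop.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen, r_rejected):
    """Loss on one human preference pair; small when the reward model agrees."""
    return -math.log(sigmoid(r_chosen - r_rejected))

agree = preference_loss(r_chosen=2.0, r_rejected=-1.0)     # model ranks like the human: small loss
disagree = preference_loss(r_chosen=-1.0, r_rejected=2.0)  # model ranks the wrong way: large loss
```

Once trained, this reward model supplies the scalar signal that the RL step optimizes, turning "humans preferred answer A over B" into a differentiable objective.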
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 17: Alignment - RL 2 [RLHF]

Intermediate
Stanford Online

This session continues alignment with reinforcement learning for language models. It recaps reward hacking—when a model chases the reward in the wrong way, like writing very long answers if reward is tied to word count. The RLHF pipeline is reviewed: pre-train a model, gather human preference data, train a reward model, then fine-tune the policy using RL with a safety constraint. The main focus is how to optimize the policy while staying close to the original model using techniques like KL penalties, PPO, and DPO.

#rlhf · #ppo · #kl divergence
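The "stay close to the original model" idea above is typically implemented by subtracting a KL penalty from the reward-model score, estimated per token from the log-probability ratio between the policy and the frozen reference model. The numbers below are invented; this sketches the shaped reward, not PPO itself.

```python
def shaped_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Reward-model score minus beta times a per-token KL estimate.

    The KL term is approximated by log(pi(t)) - log(pi_ref(t)) for the
    sampled token t; beta controls how strongly drift is punished.
    """
    kl_term = logp_policy - logp_ref
    return reward - beta * kl_term

# Same raw reward, but the policy that drifted further from the
# reference model receives less effective reward.
close = shaped_reward(reward=1.0, logp_policy=-2.0, logp_ref=-2.1)
drifted = shaped_reward(reward=1.0, logp_policy=-1.0, logp_ref=-4.0)
```

This penalty is one defense against the reward hacking recapped in the summary: a policy cannot chase the reward model into behaviors the reference model finds wildly improbable without paying the KL cost.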
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 6: Kernels, Triton [LLM]

Intermediate
Stanford Online

Modern language models are expensive to run because they perform many matrix multiplications. The main cost comes from both compute and moving data in and out of GPU memory. Optimizing the low-level code that runs these operations can make inference and training much faster and cheaper.

#triton · #gpu kernel · #cuda
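The compute-versus-memory trade-off in the summary is often quantified as arithmetic intensity: FLOPs performed per byte moved to and from memory. The sketch below estimates it for a matrix multiply under simple assumptions (fp16 storage at 2 bytes per element, each matrix read or written exactly once, ignoring caches).

```python
def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n] under idealized memory traffic."""
    flops = 2 * m * n * k                                    # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

# Large matmuls do enormous compute per byte (compute-bound);
# skinny, elementwise-scale ops do almost none (memory-bound).
big = matmul_intensity(4096, 4096, 4096)
tiny = matmul_intensity(1, 4096, 1)
```

Memory-bound ops are why kernel-level optimization pays off: fusing elementwise steps into a neighboring matmul (the kind of thing Triton kernels are written for) removes round trips to GPU memory that the math itself never needed.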