RLHF alignment means teaching a pre-trained language model to act the way people want: safe, helpful, and harmless. A pre-trained model is like a bag of knowledge with no idea how to use it, so it may hallucinate or say unsafe things. Alignment adds an outer layer of behavior so the model answers clearly, avoids harm, and respects user intent.

This session introduces alignment for language models and explains why next‑token prediction alone is not enough. Models trained only to predict the next word can hallucinate facts, produce toxic or biased text, and comply with adversarial prompts they should refuse. Alignment aims to make models helpful, honest, and harmless so they do what people actually want. The lecture lays out a practical recipe for achieving this with RLHF (Reinforcement Learning from Human Feedback).
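At the core of the RLHF recipe is a reward model trained on human preference pairs (a preferred answer vs. a rejected one). A minimal sketch of the standard pairwise (Bradley–Terry) loss, in plain Python with hypothetical scalar reward values standing in for a real reward model's outputs:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    answer higher than the rejected one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward-model scores for one preference pair:
loss_correct_ranking = preference_loss(2.0, -1.0)  # preferred scored higher
loss_wrong_ranking = preference_loss(-1.0, 2.0)    # preferred scored lower
print(loss_correct_ranking < loss_wrong_ranking)   # True
```

Minimizing this loss over many human-labeled pairs pushes the reward model to rank answers the way people do; that learned reward then guides the RL fine-tuning step.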