RLHF turns human preferences between two model outputs into training signals using a probabilistic model of choice, such as the Bradley-Terry model.
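As a rough illustration of how a pairwise preference can become a training signal, the sketch below assumes a Bradley-Terry choice model over scalar reward scores. The function names and the scalar rewards are hypothetical, chosen only for this example; real reward models produce these scores with a neural network.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen output beats the rejected one.

    Assumption: each output has a scalar reward, and the probability of
    preferring one output is a sigmoid of the reward difference.
    """
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed human preference."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# The loss is small when the reward model agrees with the human label
# and large when it disagrees.
loss_agree = preference_loss(2.0, 0.0)
loss_disagree = preference_loss(0.0, 2.0)
```

Minimizing this loss over many labeled pairs pushes the reward of preferred outputs above that of rejected ones.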
Grokking is when a model suddenly starts to generalize well long after it has already memorized the training set.
Cross-entropy measures how well a proposed distribution Q predicts outcomes actually generated by a true distribution P.
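A minimal sketch of the definition for discrete distributions, H(P, Q) = -sum over x of P(x) log Q(x), measured in nats. The example distributions below are made up for illustration.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in nats for two discrete distributions.

    Terms with P(x) == 0 contribute nothing and are skipped.
    Note: math.log raises ValueError if Q(x) == 0 where P(x) > 0,
    which corresponds to an infinite cross-entropy.
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative distributions over three outcomes (assumed values).
p = [0.5, 0.25, 0.25]       # "true" distribution P
q_close = [0.4, 0.3, 0.3]   # a proposal Q that is near P
q_far = [0.1, 0.1, 0.8]     # a proposal Q that is far from P

entropy_p = cross_entropy(p, p)      # equals the entropy of P
ce_close = cross_entropy(p, q_close)
ce_far = cross_entropy(p, q_far)
```

Cross-entropy is smallest when Q equals P, where it reduces to the entropy of P; worse proposals give larger values.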
A loss landscape is the "terrain" of a model's loss as you move through parameter space; valleys are good solutions and peaks are bad ones.
Convex optimization studies minimizing convex functions over convex sets, where every local minimum is guaranteed to be a global minimum.
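One way to see the local-equals-global property in practice: on a convex function, plain gradient descent reaches the same minimizer from any starting point. The sketch below assumes the toy objective f(x) = (x - 3)^2, with a step size and iteration count picked only for illustration.

```python
def grad(x: float) -> float:
    """Derivative of the convex function f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0: float, lr: float = 0.1, steps: int = 200) -> float:
    """Run fixed-step gradient descent from x0 and return the final iterate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Very different starting points all end up at the single minimizer x = 3.
minima = [gradient_descent(x0) for x0 in (-10.0, 0.0, 25.0)]
```

On a non-convex function the same procedure could stop in different valleys depending on where it starts.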