0-1 loss directly measures classification error but is discontinuous and non-convex, making optimization computationally hard.
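A small sketch of the contrast: the 0-1 loss is piecewise constant in the classifier's score (zero gradient almost everywhere), while a convex surrogate such as the hinge loss upper-bounds it and can be minimized with gradient methods. The data here is illustrative.

```python
import numpy as np

# 0-1 loss: counts misclassifications. It is flat almost everywhere,
# so it provides no gradient signal for optimization.
def zero_one_loss(y_true, scores):
    preds = np.sign(scores)
    return np.mean(preds != y_true)

# Hinge loss: a convex, subdifferentiable upper bound on the 0-1 loss,
# which is what makes gradient-based training tractable.
def hinge_loss(y_true, scores):
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y = np.array([1, -1, 1, -1])          # labels in {-1, +1}
s = np.array([0.8, 0.3, -0.2, -1.5])  # raw classifier scores
print(zero_one_loss(y, s))  # → 0.5  (two of four signs are wrong)
print(hinge_loss(y, s))     # → 0.675
```

Minimizing the surrogate does not always minimize the 0-1 loss, but for well-behaved (classification-calibrated) surrogates it does so in the limit.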
Empirical Risk Minimization (ERM) chooses a model that minimizes the average loss on the training data.
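ERM can be made concrete with a toy finite hypothesis class: one-dimensional threshold classifiers, where the learner simply returns the threshold with the lowest average training loss. The class and dataset below are illustrative choices, not a standard benchmark.

```python
import numpy as np

# ERM over a finite hypothesis class of thresholds: h_t(x) = sign(x - t).
# The learner returns the hypothesis minimizing the empirical risk,
# i.e. the average 0-1 loss on the training set.
def erm_threshold(X, y, thresholds):
    best_t, best_risk = None, float("inf")
    for t in thresholds:
        preds = np.where(X > t, 1, -1)
        risk = np.mean(preds != y)  # empirical risk of h_t
        if risk < best_risk:
            best_t, best_risk = t, risk
    return best_t, best_risk

X = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([-1, -1, 1, 1])
t, risk = erm_threshold(X, y, thresholds=np.linspace(0, 1, 11))
print(t, risk)  # a threshold between 0.4 and 0.6 attains zero training error
```

With a larger or infinite hypothesis class, the same principle applies, but the minimization is done by numerical optimization rather than exhaustive search.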
Stochastic Gradient Descent (SGD) updates model parameters using small random subsets (mini-batches) of the training data, so each update is cheap in computation and memory compared with a full-batch gradient step.
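A minimal sketch of mini-batch SGD on linear regression with squared loss; the data, learning rate, and batch size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + small noise.
X = rng.normal(size=(1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=1000)

# Mini-batch SGD: each step estimates the gradient from a random batch,
# so an update costs O(batch size) rather than O(dataset size).
w = np.zeros(3)
lr, batch = 0.1, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch)  # sample a random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch    # gradient of the batch MSE
    w -= lr * grad

print(w)  # converges close to w_true
```

The gradient estimate is noisy, but it is unbiased, which is why SGD still converges to (a neighborhood of) the minimizer.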
Statistical learning theory explains why a model that fits training data can still predict well on unseen data by relating true risk to empirical risk plus a complexity term.
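One standard instance of this decomposition, for a finite hypothesis class $\mathcal{H}$ and $m$ i.i.d. training samples (via Hoeffding's inequality and a union bound): with probability at least $1-\delta$,

```latex
R(h) \;\le\; \widehat{R}(h) \;+\; \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}}
\qquad \text{for all } h \in \mathcal{H},
```

where $R(h)$ is the true risk, $\widehat{R}(h)$ the empirical risk, and the square-root term is the complexity penalty; for infinite classes, $\ln|\mathcal{H}|$ is replaced by a capacity measure such as the VC dimension or Rademacher complexity.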
PAC learning formalizes when a learner can probably (with probability at least 1−δ) and approximately (error at most ε) succeed using a polynomial number of samples.
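In the realizable setting with a finite hypothesis class, a standard sample-complexity bound makes "polynomial number of samples" concrete: any learner that outputs a hypothesis consistent with the training data is PAC with

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)
```

samples, i.e. with probability at least $1-\delta$ its true error is at most $\varepsilon$; the dependence on $1/\varepsilon$, $\ln(1/\delta)$, and $\ln|\mathcal{H}|$ is polynomial, as the definition requires.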