Knowledge distillation loss blends standard hard-label cross-entropy with a soft match to the teacher's temperature-softened output distribution, typically measured by KL divergence and mixed via a weighting coefficient.
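A minimal NumPy sketch of that blend, assuming illustrative hyperparameters `T` (temperature) and `alpha` (mixing weight) and a KL-divergence soft term, as in standard distillation:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; subtracting the max is for numerical stability
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL match to the teacher's
    temperature-softened distribution. T and alpha are illustrative choices."""
    # Hard term: cross-entropy of the student's (T=1) prediction vs. the true label
    p_student = softmax(student_logits)
    hard_ce = -np.log(p_student[hard_label])
    # Soft term: KL(teacher || student) at temperature T
    q_teacher = softmax(teacher_logits, T)
    q_student = softmax(student_logits, T)
    soft_kl = np.sum(q_teacher * (np.log(q_teacher) - np.log(q_student)))
    # T**2 rescales the soft term so its gradient magnitude matches the hard term
    return alpha * hard_ce + (1 - alpha) * (T ** 2) * soft_kl
```

When student and teacher logits coincide, the soft term vanishes and only the hard cross-entropy remains.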
Maximum Likelihood Estimation (MLE) picks parameters that make the observed data most probable under a chosen probabilistic model.
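As a small worked instance, consider MLE for a Bernoulli coin with model P(x=1) = p: the log-likelihood sum_i [x_i log p + (1 - x_i) log(1 - p)] is maximized in closed form at the sample mean. A sketch (the data and helper names are illustrative):

```python
import numpy as np

def bernoulli_log_likelihood(p, flips):
    # Log-probability of the observed 0/1 flips under parameter p
    flips = np.asarray(flips, dtype=float)
    return float(np.sum(flips * np.log(p) + (1 - flips) * np.log(1 - p)))

def bernoulli_mle(flips):
    # Closed-form maximizer of the log-likelihood above: the sample mean
    return float(np.mean(flips))

flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
p_hat = bernoulli_mle(flips)  # 7 heads out of 10 flips -> 0.7
```

Evaluating the log-likelihood at `p_hat` against any other candidate (say 0.5 or 0.9) confirms the closed-form estimate is the maximizer for this data.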