Groups
Knowledge distillation loss blends the standard hard-label cross-entropy with a soft-target term that matches the student's temperature-softened output distribution to the teacher's; the temperature controls how much probability mass the soft targets spread over non-argmax classes.
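A minimal sketch of this combined loss, assuming PyTorch; the function name `distillation_loss` and the default `temperature` and `alpha` values are illustrative choices, not from the original.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Hard-label term: ordinary cross-entropy against ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft term: KL divergence between the temperature-softened teacher
    # and student distributions (teacher as target probabilities,
    # student as log-probabilities, per F.kl_div's convention).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # T^2 keeps gradient magnitudes comparable across temperatures

    # Blend the two terms; alpha weights the hard-label contribution.
    return alpha * hard_loss + (1 - alpha) * soft_loss
```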
Cross-entropy loss measures how well predicted probabilities match the true labels, penalizing confident but wrong predictions heavily.
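A short worked example, again assuming PyTorch; the specific logits and labels are made up for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])   # unnormalized scores for 3 classes
labels = torch.tensor([0])                   # index of the true class

# Built-in cross-entropy: softmax followed by negative log-likelihood.
loss = F.cross_entropy(logits, labels)

# Equivalent manual computation: -log p(true class).
manual = -F.log_softmax(logits, dim=-1)[0, labels[0]]
assert torch.allclose(loss, manual)
```

Because the loss is the negative log of the probability assigned to the true class, assigning high probability to an incorrect class drives that probability toward zero and the loss toward infinity, which is why confident mistakes are penalized so heavily.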