Knowledge distillation loss blends standard hard-label cross-entropy with a soft-target term that matches the student's temperature-softened output distribution to the teacher's.
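A minimal sketch of such a loss, assuming PyTorch; the names (`student_logits`, `teacher_logits`, `T`, `alpha`) are hypothetical, not from the original:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft match to the teacher.

    T: temperature that softens both distributions (hypothetical default).
    alpha: weight on the soft (distillation) term (hypothetical default).
    """
    # Hard-label term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, targets)

    # Soft term: KL divergence between temperature-softened distributions.
    # F.kl_div expects log-probabilities for the input and probabilities
    # for the target. The T**2 factor keeps the soft term's gradient
    # magnitude comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)

    return (1 - alpha) * hard + alpha * soft
```

A higher temperature spreads the teacher's probability mass over more classes, exposing the "dark knowledge" in its relative rankings of wrong answers.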
Cross-entropy measures how well a proposed distribution Q predicts outcomes actually generated by a true distribution P.
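For reference, the standard discrete-case definition:

H(P, Q) = -\sum_{x} P(x) \log Q(x)

It decomposes as H(P, Q) = H(P) + D_{KL}(P \| Q), so it is minimized exactly when Q = P, at which point it equals the entropy of P.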