Rethinking Selective Knowledge Distillation
Intermediate · Almog Tavor, Itay Ebenspanger et al. · Feb 1 · arXiv
The paper studies how to train a smaller student language model from a larger teacher by distilling only the most useful parts of the teacher's signal rather than all of it.
#knowledge distillation #selective distillation #student entropy