On-Policy Self-Distillation for Reasoning Compression
Beginner · Hejian Sang, Yuanda Xu et al. · Mar 5 · arXiv
Reasoning models often talk too much, and those extra words can actually make them less accurate.
#on-policy self-distillation, #reasoning compression, #conciseness instruction