Spilled Energy in Large Language Models
Intermediate | Adrian Robert Minut, Hazem Dewidar et al. | Feb 21 | arXiv
The paper treats the final layer of a Large Language Model (the softmax over tokens) as an Energy-Based Model, which makes it possible to measure a new signal the authors call spilled energy.
#spilled energy #energy-based models #marginal energy
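
The energy-based reading of a softmax layer mentioned in the summary can be sketched as follows. This is a minimal illustration of the standard correspondence only, assuming the common convention that a token's energy is its negative logit and that the marginal energy is the negative log-partition function. The function names are hypothetical, and the sketch does not compute spilled energy itself, since its definition is not given in this summary.

```python
import numpy as np

def token_energies(logits):
    # Assumed convention: the energy of each token is its negative logit.
    return -np.asarray(logits, dtype=float)

def marginal_energy(logits):
    # Marginal (free) energy: -log sum_y exp(-E(y)) = -logsumexp(logits).
    # Subtracting the max first keeps the exponentials numerically stable.
    logits = np.asarray(logits, dtype=float)
    m = logits.max()
    return -(m + np.log(np.exp(logits - m).sum()))

def softmax_from_energies(logits):
    # p(y) = exp(-E(y)) / Z, and since E_marginal = -log Z,
    # this equals exp(-E(y) + E_marginal).
    return np.exp(-token_energies(logits) + marginal_energy(logits))

# Hypothetical logits for a three-token vocabulary.
logits = np.array([2.0, 0.5, -1.0])
probs = softmax_from_energies(logits)
```

Under these assumptions, `probs` matches the usual softmax of `logits`, which is the sense in which the output layer already defines an energy-based model over tokens.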