VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
Intermediate · Ying Nie, Kai Han et al. · Dec 16 · arXiv
Large language models get smarter when they get bigger, but storing all those extra weights eats tons of memory.
#VersatileFFN · #parameter efficiency · #virtual experts