SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning
Intermediate · Qifan Yu, Xinyu Ma et al. · Feb 2 · arXiv
This paper shows how to safely make a neural network wider in the middle of training without destabilizing it.
#Progressive Learning #Width Expansion #RMS scale