This paper shows how to safely make a neural network wider in the middle of training without destabilizing it (no loss spikes or divergence).
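For intuition, here is a minimal sketch of one standard way to widen a hidden layer without changing the function the network computes (Net2Net-style unit duplication); this is an illustrative assumption, not necessarily the paper's actual recipe, and the function name `widen_hidden_layer` is hypothetical.

```python
# Sketch (assumed Net2Net-style growth, not necessarily the paper's method):
# widen the hidden layer of y = W2 @ relu(W1 @ x + b1) + b2 by duplicating
# existing units and splitting their outgoing weights among the copies,
# so the network computes the same function immediately after growth.
import numpy as np

def widen_hidden_layer(W1, b1, W2, new_width, rng=None):
    """Grow the hidden layer from W1.shape[0] units to `new_width` units."""
    rng = np.random.default_rng() if rng is None else rng
    old_width = W1.shape[0]
    assert new_width >= old_width
    # Each new slot points at an existing unit; the first old_width slots map to themselves.
    mapping = np.concatenate([
        np.arange(old_width),
        rng.integers(0, old_width, size=new_width - old_width),
    ])
    # Incoming weights and biases are copied directly.
    W1_new = W1[mapping]          # (new_width, in_dim)
    b1_new = b1[mapping]          # (new_width,)
    # Outgoing weights are divided by the number of copies of each original unit,
    # so the summed contribution of the copies equals the original contribution.
    counts = np.bincount(mapping, minlength=old_width)
    W2_new = W2[:, mapping] / counts[mapping]  # (out_dim, new_width)
    return W1_new, b1_new, W2_new

# Quick check that the function is preserved after widening.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
x = rng.normal(size=4)
W1n, b1n, W2n = widen_hidden_layer(W1, b1, W2, new_width=12, rng=rng)
y_old = W2 @ np.maximum(W1 @ x + b1, 0) + b2
y_new = W2n @ np.maximum(W1n @ x + b1n, 0) + b2
assert np.allclose(y_old, y_new)
```

Because the widened network starts out computing exactly the same function, training can continue from where it left off instead of restarting from a perturbed state.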
The paper shows that large language models often end up with weight magnitudes determined by the training hyperparameters rather than by the data, which quietly hurts performance.
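As a hedged illustration of how hyperparameters can end up fixing the weight scale (an assumed mechanism for AdamW-style training, not necessarily the paper's analysis): normalized updates have size roughly the learning rate per coordinate, while decoupled weight decay shrinks the weights each step, and the two balance at a norm that depends only on the hyperparameters.

```latex
% Steady-state weight-norm sketch under AdamW-style updates
% (illustrative assumption, not necessarily the paper's derivation).
% Assume a per-step update of magnitude ~ eta per coordinate, roughly
% orthogonal to w, plus decoupled weight decay w <- (1 - eta*lambda) w:
\[
\lVert w_{t+1}\rVert^2 \;\approx\; (1-\eta\lambda)^2\,\lVert w_t\rVert^2 \;+\; \eta^2 d .
\]
% Setting \lVert w_{t+1}\rVert = \lVert w_t\rVert and dropping the
% (eta*lambda)^2 term gives an equilibrium norm set by eta, lambda, and the
% parameter count d -- not by the data:
\[
\lVert w_{\mathrm{eq}}\rVert \;\approx\; \sqrt{\frac{\eta\, d}{2\lambda}},
\qquad
\text{per-weight RMS} \;\approx\; \sqrt{\frac{\eta}{2\lambda}} .
\]
```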