Spectral Condition for $ฮผ$P under Width-Depth Scaling
IntermediateChenyu Zheng, Rongzhen Wang et al.Feb 28arXiv
Big AI models keep getting wider (more neurons per layer) and deeper (more layers), which often makes training unstable and hyperparameters hard to reuse.
#maximal update parametrization#ฮผP#spectral condition