How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)


Spectral Condition for $\mu$P under Width-Depth Scaling

Intermediate
Chenyu Zheng, Rongzhen Wang et al. · Feb 28 · arXiv

Big AI models keep getting wider (more neurons per layer) and deeper (more layers), which often makes training unstable and keeps hyperparameters from transferring across model sizes.

#maximal update parametrization · #μP · #spectral condition

SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

Intermediate
Qifan Yu, Xinyu Ma et al. · Feb 2 · arXiv

This paper shows how to safely make a neural network wider in the middle of training without destabilizing it.

#Progressive Learning · #Width Expansion · #RMS scale

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Intermediate
Maksim Velikanov, Ilyas Chahed et al. · Jan 8 · arXiv

The paper shows that in big language models, weight magnitudes often end up set by training hyperparameters (such as weight decay) rather than by the data, which quietly hurts performance.

#learnable multipliers · #weight decay · #noise–WD equilibrium