DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers
Intermediate · Ionut-Vlad Modoranu, Philip Zmushko et al. · Feb 2 · arXiv
Shampoo is a second-order optimizer that can train models better than AdamW, but it has been slow in practice because it must compute expensive inverse matrix roots.
#Shampoo optimizer  #second-order optimization  #inverse matrix roots