This paper shows how to train large language models faster and more cheaply by using a 4-bit floating-point number format (NVFP4) without giving up much accuracy.
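To make the 4-bit idea concrete, here is a minimal NumPy sketch of the block-scaling scheme behind NVFP4: each block of 16 values shares a scale factor, and the scaled values snap to the small grid of magnitudes a 4-bit E2M1 float can represent. The function names and rounding details are my own illustration, not the paper's code, and a real kernel would also store the per-block scale in FP8.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign is a separate bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(block: np.ndarray) -> np.ndarray:
    """Quantize one block: scale so the largest magnitude maps to 6.0
    (the biggest E2M1 value), snap each value to the grid, rescale."""
    amax = np.abs(block).max()
    if amax == 0.0:
        return np.zeros_like(block)
    scale = amax / 6.0
    scaled = block / scale
    # Snap each magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx] * scale

def quantize_nvfp4(x: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Apply per-block quantization over a flat tensor."""
    out = x.copy()
    for i in range(0, len(x), block_size):
        out[i:i + block_size] = quantize_nvfp4_block(x[i:i + block_size])
    return out

weights = np.random.randn(64).astype(np.float32)
print("mean abs error:", np.abs(weights - quantize_nvfp4(weights)).mean())
```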
Training large AI models uses a lot of memory because most low-precision methods still keep a separate full-precision copy of the weights, called master weights, which the optimizer updates while the low-precision copy is used for the actual computation.
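To see why master weights dominate memory, here is some illustrative back-of-envelope arithmetic (my own numbers, not figures from the paper), assuming FP32 master weights, Adam's two FP32 moment buffers, and an NVFP4 copy costing 4 bits per value plus one FP8 scale per 16-value block:

```python
def training_memory_gb(n_params: float) -> dict:
    """Rough bytes-per-parameter for mixed-precision training with
    FP32 master weights and Adam. Illustrative only: it ignores
    activations, gradients, and other buffers."""
    GB = 1e9
    return {
        "fp32 master weights (4 B/param)": n_params * 4 / GB,
        "adam moments (2 x fp32, 8 B/param)": n_params * 8 / GB,
        "nvfp4 working weights (0.5 B + 1/16 B scale)": n_params * 0.5625 / GB,
    }

for name, gb in training_memory_gb(7e9).items():  # e.g. a 7B-param model
    print(f"{name}: {gb:.1f} GB")
```

On this arithmetic, the FP32 master copy and optimizer state together cost about 12 bytes per parameter, roughly twenty times the size of the 4-bit working weights, which is why keeping or dropping master weights matters so much.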