This paper shows how to train large language models faster and at lower cost by using NVFP4, a 4-bit floating-point number format, without losing much accuracy.
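To make the 4-bit idea concrete, here is a minimal sketch of block-scaled 4-bit rounding in plain Python. It is an illustration under stated assumptions, not the paper's training recipe: the function name `fake_quantize_fp4` is hypothetical, and the per-block scale is kept in full precision for simplicity, whereas NVFP4 itself stores block scales in a compact 8-bit format.

```python
import math

# Magnitudes a 4-bit E2M1 float can represent (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quantize_fp4(values, block_size=16):
    """Round values to the 4-bit grid with one scale per block, then scale back.

    Hypothetical helper for illustration only. Assumption: the block scale is
    an ordinary Python float rather than a low-precision stored scale.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Map the block's largest magnitude onto the largest grid value (6.0).
        amax = max(abs(v) for v in block)
        scale = amax / FP4_GRID[-1] if amax > 0 else 1.0
        for v in block:
            # Pick the nearest representable magnitude, then restore the sign.
            mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
            out.append(math.copysign(mag, v) * scale)
    return out


print(fake_quantize_fp4([6.0, 2.4, -0.7, 0.0], block_size=4))
# prints [6.0, 2.0, -0.5, 0.0]
```

Each value snaps to one of only eight magnitudes, which is why the per-block scale matters: it stretches that small grid to fit whatever range the block actually covers.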
AI programs called LLMs can now help write kernels, the small, highly optimized pieces of code that make GPUs run AI models efficiently.