SageBwd: A Trainable Low-bit Attention
Beginner · Jintao Zhang, Marco Chen et al. · Mar 2 · arXiv
SageBwd makes the Transformer's attention both fast and trainable by performing most of its large matrix multiplications in 8-bit precision instead of full precision.
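To see why 8-bit multiplication can stand in for full precision, here is a minimal, illustrative sketch of symmetric per-tensor INT8 quantization applied to one attention-style matmul (Q times K transposed). This is a generic quantization demo, not SageBwd's actual kernel; the function name `quantize_int8` and the toy shapes are assumptions for illustration.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: return int8 values and a scale.

    Illustrative helper, not part of SageBwd itself.
    """
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy Q and K blocks (FP32), as in one attention head.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8)).astype(np.float32)
K = rng.standard_normal((4, 8)).astype(np.float32)

# Quantize, multiply in integer arithmetic, then rescale the result.
Qq, sq = quantize_int8(Q)
Kq, sk = quantize_int8(K)
S_int8 = (Qq.astype(np.int32) @ Kq.T.astype(np.int32)) * (sq * sk)

# The rescaled INT8 product closely tracks the full-precision Q @ K^T.
err = np.abs(S_int8 - Q @ K.T).max()
print(f"max abs error: {err:.3f}")
```

The integer product is exact; all quantization error comes from rounding the inputs to 256 levels, which is why the rescaled result stays close to the FP32 matmul while the heavy arithmetic runs in cheap INT8 ops.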
#SageBwd #low-bit attention #INT8 training