FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
Intermediate · Zhaopeng Qiu, Shuang Yu et al. · Jan 26 · arXiv
The paper shows how to speed up reinforcement learning (RL) for large language models (LLMs) by using 8-bit floating-point (FP8) precision without destabilizing training.
#FP8 quantization #LLM reinforcement learning #KV-cache