How I Study AI - Learn AI Papers & Lectures the Easy Way

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Intermediate

Haocheng Xi, Charlie Ruan et al. · Jan 20 · arXiv

Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.

#FP8 quantization #on-policy reinforcement learning #precision flow
