How I Study AI - Learn AI Papers & Lectures the Easy Way

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

Zhaopeng Qiu, Shuang Yu et al.Jan 26arXiv

The paper shows how to speed up reinforcement learning (RL) for large language models (LLMs) by making numbers smaller (FP8) without breaking training.

#FP8 quantization#LLM reinforcement learning#KV-cache

Not triaged yet

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Intermediate

Haocheng Xi, Charlie Ruan et al.Jan 20arXiv

Reinforcement learning (RL) for large language models is slow because the rollout (text generation) stage can take more than 70% of training time, especially for long, step-by-step answers.

#FP8 quantization#on-policy reinforcement learning#precision flow

Not triaged yet

Kling-Omni Technical Report

Intermediate

Kling Team, Jialu Chen et al.Dec 18arXiv

Kling-Omni is a single, unified model that can understand text, images, and videos together and then make or edit high-quality videos from those mixed instructions.

#multimodal visual language#MVL#prompt enhancer

Not triaged yet

Papers3

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Kling-Omni Technical Report