Papers2

#Policy gradient

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

This paper trains a language model to write fast GPU kernels (small, performance-critical programs) in Triton, using reinforcement learning that rewards real speedups rather than correctness alone.

#Triton kernels #Reinforcement learning #Policy gradient


PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

Intermediate

Jiarui Yao, Ruida Wang et al. · Jan 15 · arXiv

Large language models usually get only a final thumbs-up or thumbs-down at the end of their answer, which is too late to fix mistakes made in the middle.

#Process Reward Learning #PRL #Reasoning LLMs
