Papers (3)

#Test-time scaling

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Intermediate

Qiyuan Zhang, Yufei Wang et al. · Mar 2 · arXiv

Longer explanations are not always better; the shape of thinking matters.

#Generative Reward Models #Chain-of-Thought #Breadth-CoT

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Intermediate

Leon Liangyu Chen, Haoyu Ma et al. · Feb 12 · arXiv

UniT teaches one multimodal model to think in steps with pictures and words, so it can check its own work and fix mistakes as it goes.

#Unified multimodal model #Chain-of-thought #Test-time scaling

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Intermediate

Wei Liu, Jiawei Xu et al. · Feb 5 · arXiv

This paper teaches a language model to write fast GPU kernels (small, performance-critical programs) in Triton, using reinforcement learning that rewards meaningful speedups rather than correctness alone.

#Triton kernels #Reinforcement learning #Policy gradient
