How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#Test-time scaling

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Intermediate
Qiyuan Zhang, Yufei Wang et al. | Mar 2 | arXiv

Longer explanations are not always better; the shape of thinking matters.

#Generative Reward Models #Chain-of-Thought #Breadth-CoT

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Intermediate
Leon Liangyu Chen, Haoyu Ma et al. | Feb 12 | arXiv

UniT teaches one multimodal model to think in steps with pictures and words, so it can check its own work and fix mistakes as it goes.

#Unified multimodal model #Chain-of-thought #Test-time scaling

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

Intermediate
Wei Liu, Jiawei Xu et al. | Feb 5 | arXiv

This paper teaches a language model to write fast GPU kernels (small, performance-critical programs) in Triton, using reinforcement learning that rewards meaningful speedups, not just correctness.

#Triton kernels #Reinforcement learning #Policy gradient