๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers1

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#privileged supervision

Self-Hinting Language Models Enhance Reinforcement Learning

Intermediate
Baohao Liao, Hanze Dong et al.Feb 3arXiv

When rewards are rare, a popular training method for language models (GRPO) often stops learning because every try in a group gets the same score, so there is nothing to compare.

#reinforcement learning#GRPO#self-hinting