How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#chain-of-thought (CoT)

Likelihood-Based Reward Designs for General LLM Reasoning

Beginner
Ariel Kwiatkowski, Natasha Butt et al. | Feb 3 | arXiv

Binary right/wrong rewards for training reasoning in large language models are hard to design and often too sparse to learn from.

#log-likelihood reward, #chain-of-thought (CoT), #reinforcement learning for LLMs


Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Intermediate
Shaotian Yan, Kaiyuan Liu et al. | Jan 14 | arXiv

The paper introduces DASD-4B-Thinking, a small (4B-parameter) open-source reasoning model that scores comparably to much larger models on hard math, science, and coding benchmarks.

#sequence-level distillation, #divergence-aware sampling, #temperature-scheduled learning
