How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (7)

#Reinforcement Learning with Verifiable Rewards

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Intermediate
Qiyuan Zhang, Yufei Wang et al. · Mar 2 · arXiv

Longer explanations are not always better; the shape of thinking matters.

#Generative Reward Models #Chain-of-Thought #Breadth-CoT

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

Intermediate
Yuanda Xu, Hejian Sang et al. · Feb 24 · arXiv

The paper shows that when training reasoning AIs with reinforcement learning, treating every wrong answer the same makes the AI overconfident in some bad paths and less diverse overall.

#ACE #Reinforcement Learning with Verifiable Rewards #GRPO

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Intermediate
Xin Xu, Clive Bai et al. · Feb 12 · arXiv

This paper shows a simple way to turn many 'too-easy' questions into harder, still-checkable ones so that AI keeps learning instead of stalling.

#Reinforcement Learning with Verifiable Rewards #Compositional prompts #Sequential Prompt Composition

Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Intermediate
Shiting Huang, Zecheng Li et al. · Feb 10 · arXiv

The paper teaches large language models to do what good students do: find where they went wrong, turn that lesson into a rule, and remember it for next time.

#Reinforcement Learning with Verifiable Rewards #RLVR #Meta-Experience Learning

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Intermediate
Ximing Lu, David Acuna et al. · Jan 30 · arXiv

Golden Goose turns messy internet text into clean multiple-choice puzzles that models can learn from with automatic rewards.

#Reinforcement Learning with Verifiable Rewards #Golden Goose #GooseReason-0.7M

Exploring Reasoning Reward Model for Agents

Intermediate
Kaixuan Fan, Kaituo Feng et al. · Jan 29 · arXiv

The paper teaches AI agents better by grading not just their final answers, but also how they think and use tools along the way.

#Agentic Reinforcement Learning #Reasoning Reward Model #Process Supervision

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

Intermediate
Said Taghadouini, Adrien Cavaillès et al. · Jan 20 · arXiv

LightOnOCR-2-1B is a single, compact AI model that reads PDF pages and scans and turns them into clean, well-ordered text without using fragile multi-step OCR pipelines.

#LightOnOCR-2-1B #end-to-end OCR #vision-language model