How I Study AI - Learn AI Papers & Lectures the Easy Way

#Group Relative Policy Optimization

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

Intermediate
Shengjun Zhang, Zhang Zhang et al. · Jan 1 · arXiv

This paper shows that when fine-tuning flow-based image generators with reinforcement learning, only a few early, very noisy (high-entropy) sampling steps actually help the model learn what people prefer.

#E-GRPO #Group Relative Policy Optimization #Flow Matching
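To make "high-entropy steps" concrete: if each sampling step's stochastic transition is modeled as an isotropic Gaussian with noise scale sigma, its differential entropy is 0.5 · d · log(2πe·σ²), so the early, noisiest steps carry the most entropy. The sketch below ranks steps by that quantity and keeps the top k. It is a hypothetical illustration under that Gaussian assumption, not the paper's selection rule; the function names and the linear noise schedule are made up for the example.

```python
import math

def gaussian_entropy(sigma: float, dim: int = 1) -> float:
    """Differential entropy (nats) of an isotropic Gaussian with std `sigma` in `dim` dimensions."""
    return 0.5 * dim * math.log(2 * math.pi * math.e * sigma ** 2)

def top_entropy_steps(sigmas, k):
    """Hypothetical helper: indices of the k steps whose transition noise has the highest entropy."""
    entropies = [gaussian_entropy(s) for s in sigmas]
    ranked = sorted(range(len(sigmas)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])  # return in sampling order

# Hypothetical schedule: noise std shrinks as sampling moves from pure noise toward the image.
sigmas = [1.0 - t / 10 for t in range(10)]  # 1.0, 0.9, ..., 0.1
print(top_entropy_steps(sigmas, k=3))  # → [0, 1, 2]
```

Because Gaussian entropy grows monotonically with sigma, this ranking is the same as ranking by noise scale; computing entropy explicitly just makes the "high-entropy" criterion visible.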

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Intermediate
Peter Chen, Xiaopeng Li et al. · Dec 18 · arXiv

The paper studies why two opposite-sounding tricks in RL for reasoning, adding random (spurious) rewards and reducing randomness (entropy), can both appear to help large language models reason better.

#RLVR #Group Relative Policy Optimization #ratio clipping
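Both papers build on Group Relative Policy Optimization. A minimal sketch of its two core ingredients follows, assuming the commonly used formulation: each sampled response's reward is normalized against the mean and standard deviation of its group, and the policy update uses a PPO-style clipped probability ratio. The function names and the `clip_eps=0.2` default are illustrative assumptions, the KL penalty used in many GRPO setups is omitted, and this is not code from either paper.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by its group's mean and (population) std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style term: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    return min(ratio * advantage, clipped * advantage)

# Four sampled answers to one prompt, scored by a verifiable 0/1 reward.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 4) for a in adv])          # → [1.0, -1.0, -1.0, 1.0]
print(round(clipped_surrogate(1.5, 1.0), 4))  # ratio clipped to 1.2 → 1.2
```

One consequence of group normalization: if every answer in a group gets the same reward, all advantages are zero and that group contributes no learning signal. Clipping, in turn, caps how far a single update can push the probability of a positively rewarded answer.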