🎓How I Study AIHISA
đź“–Read
📄Papers📰Blogs🎬Courses
đź’ˇLearn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers2

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#exploration–exploitation

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Intermediate
Rujiao Long, Yang Li et al.Dec 19arXiv

Reasoning Palette gives a language or vision-language model a tiny hidden “mood” (a latent code) before it starts answering, so it chooses a smarter plan rather than just rolling dice on each next word.

#Reasoning Palette#latent contextualization#VAE

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Intermediate
Peter Chen, Xiaopeng Li et al.Dec 18arXiv

The paper studies why two opposite-sounding tricks in RL for reasoning—adding random (spurious) rewards and reducing randomness (entropy)—can both seem to help large language models think better.

#RLVR#Group Relative Policy Optimization#ratio clipping