How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)


Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

Intermediate
Kirill Pavlenko, Alexander Golubev et al. · Feb 10 · arXiv

The paper fixes a common mistake in training language models for multi-part tasks: assigning the same advantage to every token, even when different parts of the output pursue different goals.

#Blockwise Advantage Estimation #Outcome-Conditioned Baseline #Group Relative Policy Optimization
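The blockwise idea can be sketched in a few lines. This is a minimal illustration under the assumption of GRPO-style group-relative baselines; the function names, reward values, and two-block setup are invented for the example, not taken from the paper:

```python
def blockwise_advantages(block_rewards):
    """Blockwise variant of group-relative advantages: each block of the
    output (e.g. one block per objective) gets its own baseline, so tokens
    in different blocks receive different learning signals.

    block_rewards[i][b] is the reward of sample i on block/objective b.
    """
    n_samples = len(block_rewards)
    n_blocks = len(block_rewards[0])
    advantages = []
    for i in range(n_samples):
        advantages.append([
            # subtract the group mean *per block*, not per whole sequence
            block_rewards[i][b]
            - sum(block_rewards[j][b] for j in range(n_samples)) / n_samples
            for b in range(n_blocks)
        ])
    return advantages

# Group of 3 sampled completions, each scored on 2 objectives
# (block 0: formatting reward, block 1: correctness reward).
group = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
print(blockwise_advantages(group))
```

A sequence-level baseline would give every token of a sample the same advantage; here sample 0 gets a positive signal on the formatting block and a negative one on the correctness block.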

iGRPO: Self-Feedback-Driven LLM Reasoning

Beginner
Ali Hatamizadeh, Shrimai Prabhumoye et al. · Feb 9 · arXiv

This paper teaches a language model to improve its own math answers by first writing several drafts and then learning to beat its best draft.

#iGRPO #GRPO #Reinforcement Learning
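A toy sketch of the "beat your best draft" signal described above. Everything here is invented for illustration (the paper's actual objective is GRPO-based); `draft_rewards` and the integer scores stand in for a verifiable math checker:

```python
def draft_rewards(scores, best_so_far):
    """Reward each draft by how much it improves on the best previous draft,
    and return the updated best score.

    scores: checker scores for the drafts in the current round (e.g. number
    of test cases or checks passed, so the arithmetic stays exact).
    """
    rewards = [s - best_so_far for s in scores]
    return rewards, max([best_so_far] + scores)

scores_round1 = [2, 5, 4]                 # checker scores for 3 drafts
rewards1, best = draft_rewards(scores_round1, best_so_far=0)
# rewards1 == [2, 5, 4], best == 5

scores_round2 = [6, 3, 5]                 # next round must beat 5
rewards2, best = draft_rewards(scores_round2, best_so_far=best)
# rewards2 == [1, -2, 0], best == 6
```

The key property is that a draft only earns positive reward by exceeding the model's own best prior attempt, so the target keeps rising as training progresses.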

Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition

Intermediate
Jinlong Ma, Yu Zhang et al. · Feb 4 · arXiv

The paper teaches multimodal large language models (MLLMs) to stop guessing from just text or just images and instead check both together before answering.

#GMNER #Multimodal Large Language Models #Modality Bias

Reinforcement Learning via Self-Distillation

Intermediate
Jonas Hübotter, Frederike Lübeck et al. · Jan 28 · arXiv

The paper teaches large language models to learn from detailed feedback (like error messages) instead of only a simple pass/fail score.

#Self-Distillation #Reinforcement Learning with Rich Feedback #SDPO

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Intermediate
Changle Qu, Sunhao Dai et al. · Jan 15 · arXiv

MatchTIR teaches AI agents to judge each tool call step-by-step instead of giving the same reward to every step.

#Tool-Integrated Reasoning #Credit Assignment #Bipartite Matching
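A hypothetical sketch of per-step credit via bipartite matching: match each predicted tool call to at most one reference call so that total similarity is maximized, then reward matched steps individually. The toy `similarity` function and brute-force search over permutations are stand-ins for the paper's actual scoring and assignment solver (brute force is fine for small call lists, and this sketch assumes there are at least as many reference calls as predictions):

```python
from itertools import permutations

def similarity(pred, ref):
    """Toy similarity: 1 point if tool names match, 0.5 more if args match."""
    return (pred["tool"] == ref["tool"]) * 1.0 + (pred.get("args") == ref.get("args")) * 0.5

def match_rewards(preds, refs):
    """Per-step rewards for preds under the best one-to-one matching to refs."""
    best = None
    for perm in permutations(range(len(refs)), len(preds)):
        rewards = [similarity(p, refs[j]) for p, j in zip(preds, perm)]
        if best is None or sum(rewards) > sum(best):
            best = rewards
    return best

preds = [{"tool": "search", "args": "cats"}, {"tool": "calc", "args": "2+2"}]
refs  = [{"tool": "calc",   "args": "2+2"}, {"tool": "search", "args": "dogs"}]
print(match_rewards(preds, refs))  # [1.0, 1.5] — each step scored on its own match
```

Contrast this with outcome-only RL, which would hand both steps the same scalar; here the second call earns more credit because both its tool and its arguments line up with a reference call.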