How I Study AI - Learn AI Papers & Lectures the Easy Way

Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

Kirill Pavlenko, Alexander Golubev et al. · Feb 10 · arXiv

The paper fixes a common mistake in training language models on multi-part tasks: assigning the same reward signal to every token, even when different parts of the output serve different objectives.

#Blockwise Advantage Estimation · #Outcome-Conditioned Baseline · #Group Relative Policy Optimization

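To make the contrast concrete, here is a minimal NumPy sketch, assuming a completion split into blocks that each receive their own verifiable reward, and a GRPO-style group-relative normalization (subtract the group mean, divide by the group standard deviation). The function names are hypothetical and this is not the authors' implementation; in particular, it does not model the paper's outcome-conditioned baseline.

```python
import numpy as np

def group_relative(rewards, eps=1e-6):
    """GRPO-style normalization: (r - group mean) / (group std + eps)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def uniform_token_advantages(block_rewards, block_lengths):
    """Uniform scheme (hypothetical helper): sum each sample's block rewards
    into one scalar and broadcast the same advantage to every token."""
    totals = [sum(r) for r in block_rewards]
    adv = group_relative(totals)
    return [np.full(sum(lens), a) for a, lens in zip(adv, block_lengths)]

def blockwise_token_advantages(block_rewards, block_lengths):
    """Blockwise scheme (hypothetical helper): normalize each block's reward
    across the group, then give every token its own block's advantage."""
    R = np.asarray(block_rewards, dtype=float)  # shape: (group_size, num_blocks)
    A = np.stack([group_relative(R[:, b]) for b in range(R.shape[1])], axis=1)
    # np.repeat expands each block advantage to that block's token count
    return [np.repeat(A[i], block_lengths[i]) for i in range(R.shape[0])]

rewards = [[1, 0], [0, 1]]  # per-block rewards for a group of 2 samples
lengths = [[2, 3], [1, 2]]  # number of tokens in each block
print(uniform_token_advantages(rewards, lengths))
print(blockwise_token_advantages(rewards, lengths))
```

In this toy group the summed rewards tie, so the uniform scheme gives every token an advantage of zero and nothing is learned; the blockwise scheme still pushes up the block that succeeded and pushes down the block that failed in each sample.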

Agentic Uncertainty Quantification

Intermediate

Jiaxin Zhang, Prafulla Kumar Choubey et al. · Jan 22 · arXiv

In long, multi-step AI agent tasks, an early mistake can compound over later steps, snowballing into what the authors call the Spiral of Hallucination.

#Agentic Uncertainty Quantification · #Spiral of Hallucination · #Dual-Process Architecture

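As a rough illustration of why early errors matter, assume (a simplification for this sketch, not the paper's model) that every step of an agent's task fails independently with the same probability. Even then, the chance that the whole chain succeeds decays geometrically with its length. The snowball effect described above, where one error makes later errors more likely, would only make this worse. The function name is hypothetical.

```python
def chain_success_probability(step_error_rate: float, num_steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps
    that each fail with the same probability step_error_rate."""
    return (1.0 - step_error_rate) ** num_steps

# A 5% per-step error rate over increasingly long tasks
for n in (1, 10, 20, 50):
    print(n, round(chain_success_probability(0.05, n), 3))
```

Under this assumption, a task of 20 steps with a 5% per-step error rate succeeds only about 36% of the time.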

EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs

Intermediate

Jewon Yeom, Jaewon Sok et al. · Jan 11 · arXiv

This paper teaches AI models not just how to solve problems but also how to tell when their own answers might be wrong.

#EPICAR · #calibration · #epistemic uncertainty

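One standard way to check whether a model "knows what it doesn't know" is calibration: answers given with confidence p should be correct about a fraction p of the time. The sketch below computes expected calibration error (ECE) with equal-width confidence bins. It is a generic metric shown for illustration, an assumption of this sketch rather than a description of how EpiCaR itself is trained or evaluated.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE: weighted average gap between accuracy and mean confidence per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)  # 1.0 if the answer was right
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    # Bins are (0, 0.1], (0.1, 0.2], ..., (0.9, 1.0]; a confidence of 0 goes to bin 0
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(num_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece

# An overconfident model: always 90% sure, but right only half the time
print(expected_calibration_error([0.9] * 4, [1, 0, 1, 0]))
```

A well-calibrated model drives ECE toward zero; a model that is confidently wrong scores high.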
