How I Study AI - Learn AI Papers & Lectures the Easy Way

Privileged Information Distillation for Language Models

Intermediate

Emiliano Penaloza, Dheeraj Vattikonda et al. · Feb 4 · arXiv

The paper shows how to train a language model with special extra hints (privileged information) during practice so it can still do well later without any hints.

#Privileged Information · #Knowledge Distillation · #π-Distill

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Intermediate

Hang Yan, Xinyu Che et al. · Feb 2 · arXiv

This paper studies how AI agents get better while they are working, not just whether they finish the job.

#Test-Time Improvement · #LLM agents · #trajectory analysis

Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

Intermediate

Ming Chen, Sheng Tang et al. · Dec 6 · arXiv

The paper shows that making a model write a number as a sequence of digits and then grading the whole number at the end works better than grading each digit separately.

#decoding-based regression · #sequence-level reward · #reinforcement learning
