P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
Intermediate · Pinyi Zhang, Ting-En Lin et al. · Feb 12 · arXiv
This paper introduces P-GenRM, a personalized generative reward model that judges AI answers using a custom scorecard built just for each user and situation.
#personalized reward modeling #generative reward model #evaluation chain