One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment
Intermediate | Hongru Cai, Yongqi Li et al. | Jan 26 | arXiv
Large language models are often aligned to a single, one-size-fits-all set of preferences, but individual users' preferences differ, which motivates personalized alignment.
#personalized alignment #reward modeling #meta-learning