Evaluating Parameter Efficient Methods for RLVR
Intermediate · Qingyu Yin, Yulun Wu et al. · Dec 29 · arXiv
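The paper asks which parameter-efficient fine-tuning (PEFT) methods, lightweight add-ons such as LoRA that train only a small set of extra parameters, work best when language models are trained with reinforcement learning with verifiable rewards (RLVR), where each answer earns a yes/no reward that can be checked automatically.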
#RLVR #parameter-efficient fine-tuning #LoRA