The paper asks which small, add-on training tricks (PEFT) work best when we teach language models with yes/no rewards we can check (RLVR).
The short answer: when you tune the learning rate carefully, plain old LoRA fine-tuning works about as well as the fancy new variants.
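To make "plain old LoRA with a carefully tuned learning rate" concrete, here is a minimal sketch (not the paper's code). It assumes the Hugging Face `peft` and `transformers` libraries and uses an illustrative model name and rank settings that are not taken from the paper; the point is only that the adapter setup is standard and the learning rate is swept rather than left at a default.

```python
# Minimal sketch of standard LoRA fine-tuning with a learning-rate sweep.
# Model name, rank, and the lr grid are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")  # placeholder model

# "Plain old" LoRA: low-rank adapters on the attention projections, nothing fancy.
lora_cfg = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # which weight matrices get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# "Tune the learning rate carefully": sweep a small grid instead of trusting a default,
# training only the adapter parameters each time and keeping the best-performing run.
for lr in (1e-5, 3e-5, 1e-4, 3e-4):
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    # ... run the RLVR training loop with this optimizer and compare results ...
```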