Papers3

#parameter-efficient fine-tuning

The paper asks a simple question: which kind of step-by-step reasoning helps small language models learn best, and why?

Not triaged yet

When you tune the learning rate carefully, plain old LoRA fine-tuning works about as well as the fancy newer LoRA variants.

Not triaged yet

The paper asks which small, add-on training methods (parameter-efficient fine-tuning, PEFT) work best when we teach language models with yes/no rewards we can check automatically (reinforcement learning with verifiable rewards, RLVR).

Not triaged yet