How I Study AI - Learn AI Papers & Lectures the Easy Way

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

Intermediate

Shengrui Li, Fei Zhao et al. · Jan 31 · arXiv

Training big language models works best when you mix the right kinds of data (general, math, code), but finding the best mix used to be slow and very expensive.

#data mixture optimization · #model merging · #weighted model merging

Papers (1)

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training