Revisiting Parameter Server in LLM Post-Training
Intermediate · Xinyi Wan, Penghui Qi et al. · Jan 27 · arXiv
Large language model (LLM) post-training suffers from uneven workloads across GPUs because some text sequences are much longer than others.
#On-Demand Communication #Fully Sharded Data Parallel #Parameter Server