How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)


Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

Beginner
Jinrui Zhang, Chaodong Xiao et al. · Feb 12 · arXiv

Training large language models usually requires expensive, tightly connected GPU clusters that most people do not have; this paper lays out a memory-efficient, decentralized way to pretrain without them (a rough sketch of the core idea follows the tags).

#decentralized LLM pretraining · #mixture-of-experts (MoE) · #sparse expert synchronization
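The tags point at mixture-of-experts pretraining with sparse expert synchronization. The PyTorch sketch below is only a loose illustration of that general idea, not the paper's actual algorithm: it skips gradient synchronization for experts a node does not host, so slow decentralized links carry less traffic. The `.experts.<id>.` parameter-naming convention and the synchronization policy are assumptions made for the example.

```python
import torch
import torch.distributed as dist

def sync_gradients_sparsely(model: torch.nn.Module, local_expert_ids: set[int]) -> None:
    """Loose illustration of sparse expert synchronization (not the paper's method):
    shared (non-expert) gradients are averaged across all nodes as usual, while an
    expert's gradients are only synchronized if this node actually trains that expert."""
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        if ".experts." in name:
            # Hypothetical naming scheme: "...experts.<id>.<weight name>"
            expert_id = int(name.split(".experts.")[1].split(".")[0])
            if expert_id not in local_expert_ids:
                continue  # expert not hosted here: skip the all-reduce entirely
        if dist.is_initialized():
            dist.all_reduce(param.grad, op=dist.ReduceOp.AVG)
```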

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

Intermediate
Zhaopeng Qiu, Shuang Yu et al. · Jan 26 · arXiv

The paper shows how to speed up reinforcement learning (RL) for large language models (LLMs) by computing with lower-precision FP8 numbers without destabilizing training (a rough sketch of FP8 quantization follows the tags).

#FP8 quantization · #LLM reinforcement learning · #KV-cache
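For readers new to FP8, the sketch below shows a generic per-tensor scaling recipe for quantizing to the E4M3 format (assuming PyTorch 2.1+ with `torch.float8_e4m3fn`). It illustrates the basic "make the numbers smaller" step only; the actual FP8-RL stack, its scaling strategy, and its stability tricks are described in the paper.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8_e4m3(x: torch.Tensor):
    """Per-tensor scaling: map the largest |value| onto the FP8 range,
    cast to FP8, and keep the scale for dequantization later."""
    scale = x.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    x_fp8 = (x / scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Cast back to float32 and undo the scaling."""
    return x_fp8.to(torch.float32) * scale

x = torch.randn(4, 8)
x_fp8, scale = quantize_fp8_e4m3(x)
print((x - dequantize(x_fp8, scale)).abs().max())  # small quantization error
```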