How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#gpt-oss-120b

Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts

Intermediate
Xuan-Phi Nguyen, Shrey Pandit et al. · Jan 23 · arXiv

Mixture-of-Experts (MoE) models often send far more tokens to a few “favorite” experts, which overloads some GPUs while others sit idle.

#Mixture-of-Experts #Expert Parallelism #Least-Loaded Expert Parallelism
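
To make the imbalance in the summary above concrete, here is a toy sketch that counts how many routed tokens land on each GPU when experts are pinned to GPUs. It is only an illustration of the problem, not the paper's method: the routing list, the expert-to-GPU placement, and the helper name `gpu_loads` are all made up for this example.

```python
from collections import Counter

def gpu_loads(token_to_expert, expert_to_gpu, num_gpus):
    """Count routed tokens per GPU under a fixed expert placement (hypothetical helper)."""
    loads = [0] * num_gpus
    for expert, count in Counter(token_to_expert).items():
        loads[expert_to_gpu[expert]] += count
    return loads

# Assumed routing: 8 tokens, 4 experts, and expert 0 is the "favorite".
token_to_expert = [0, 0, 0, 0, 0, 1, 2, 3]
# Assumed static placement: experts 0 and 1 on GPU 0, experts 2 and 3 on GPU 1.
expert_to_gpu = {0: 0, 1: 0, 2: 1, 3: 1}

loads = gpu_loads(token_to_expert, expert_to_gpu, num_gpus=2)
print(loads)  # [6, 2]
```

With this made-up routing, GPU 0 handles three times as many tokens as GPU 1, so GPU 1 finishes early and waits.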

Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision

Intermediate
Wei Du, Shubham Toshniwal et al. · Dec 17 · arXiv

Nemotron-Math is a giant math dataset with 7.5 million step-by-step solutions created in three thinking styles and with or without Python help.

#mathematical reasoning #long-context fine-tuning #multi-mode supervision