The Weak Law of Large Numbers (WLLN) says that the sample average of independent, identically distributed (i.i.d.) random variables with finite mean converges in probability to the true mean: for any tolerance, the probability that the sample average deviates from the mean by more than that tolerance goes to zero as the sample size grows.
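A minimal simulation illustrates the statement. The code below (an assumed setup, not from the source) averages i.i.d. Uniform(0, 1) draws, whose true mean is 0.5, and shows the sample average tightening around it as n grows.

```python
import random

def sample_mean(n, seed=0):
    # Average of n i.i.d. Uniform(0, 1) draws; the true mean is 0.5.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# As n grows, the sample average concentrates near the true mean 0.5.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

With 100,000 draws the sample average typically lands within a fraction of a percent of 0.5, while with 10 draws it can miss by a wide margin.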
Data parallelism splits the training data into shards, one per worker; each worker holds a replica of the model, computes gradients on its shard in parallel, and the gradients are averaged (e.g., via an all-reduce) before the shared update is applied.
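A toy sketch of one data-parallel step, assuming a one-parameter linear model with squared-error loss (the model, shard layout, and function names are illustrative, not from the source). Each "worker" computes a gradient on its own shard; the gradients are averaged, standing in for the all-reduce a real system would perform, and a single update is applied to the shared weight.

```python
def grad(w, batch):
    # Gradient of the mean squared error 0.5 * (w*x - y)**2 over a batch,
    # for a linear model y = w * x.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, shards, lr=0.1):
    # Each worker computes a gradient on its shard (in parallel in practice);
    # the average gradient plays the role of an all-reduce, and one update
    # is applied to the shared model weight.
    grads = [grad(w, shard) for shard in shards]
    avg = sum(grads) / len(grads)
    return w - lr * avg

# Data generated by the true weight w = 2, split into two shards.
shards = [[(1, 2), (2, 4)], [(3, 6), (4, 8)]]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, shards)
```

After a few dozen steps the shared weight converges to the true value 2, exactly as a single-worker run on the pooled data would, since averaging per-shard gradients recovers the full-batch gradient when shards are equal-sized.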