Groups
Category
Data parallelism splits the training data across workers that compute gradients in parallel on a shared model.
Stratified sampling reduces Monte Carlo variance by dividing the domain into non-overlapping regions (strata) and sampling within each region.