Data parallelism splits the training data across workers that compute gradients in parallel on a shared model.
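The mechanics described above (split the batch, compute gradients in parallel, average them into one update to the shared model) can be sketched in a minimal single-process simulation; the names (`worker_grad`, `data_parallel_step`) and the 1-D linear model are illustrative assumptions, not any particular framework's API.

```python
def worker_grad(w, shard):
    # Gradient of mean squared error for a 1-D linear model y = w * x,
    # computed by one simulated worker on its shard of the data.
    n = len(shard)
    return sum(2 * (w * x - y) * x for x, y in shard) / n

def data_parallel_step(w, data, num_workers, lr=0.01):
    # Split the training data across workers; each worker sees a
    # different shard but the same shared model parameter w.
    shards = [data[i::num_workers] for i in range(num_workers)]
    # Workers compute their local gradients in parallel (sequential here).
    grads = [worker_grad(w, s) for s in shards if s]
    # Average the gradients (the "all-reduce" step) and apply a single
    # update to the shared model.
    g = sum(grads) / len(grads)
    return w - lr * g

# Toy data generated from y = 3x; repeated steps drive w toward 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, data, num_workers=4)
```

In a real system each shard's gradient is computed on a separate device and the averaging is done with a collective communication primitive, but the arithmetic is the same: one synchronized update per step, equivalent to a single large-batch gradient step.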
Distributed algorithm theory studies how multiple independent computers can cooperate correctly and efficiently despite delays and failures.