Data parallelism splits the training data across workers, each of which computes gradients on its own replica of the model; the per-worker gradients are then averaged (e.g., via all-reduce) to keep the replicas synchronized.
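A minimal sketch of the idea, assuming a linear model with a mean-squared-error loss and equally sized shards: averaging the per-shard gradients recovers exactly the full-batch gradient, which is why each replica ends up taking the same update step.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # toy dataset: 8 samples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)               # model parameters, replicated on every worker

def grad(Xb, yb, w):
    # Gradient of mean squared error for a linear model on one batch.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Single-worker reference: gradient over the full batch.
g_full = grad(X, y, w)

# Data parallelism: 4 workers each compute the gradient on their shard,
# then the gradients are averaged (the all-reduce step).
shards = np.array_split(np.arange(len(y)), 4)
g_workers = [grad(X[idx], y[idx], w) for idx in shards]
g_avg = np.mean(g_workers, axis=0)

# With equal shard sizes, the averaged gradient matches the full-batch one.
assert np.allclose(g_full, g_avg)
```

Note the equal-shard-size assumption: with unequal shards, the average would have to be weighted by shard size to match the full-batch gradient.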
The kernel (lazy) regime keeps neural network parameters close to their initialization, making training equivalent to kernel regression with a fixed kernel such as the Neural Tangent Kernel (NTK).
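The equivalence can be made concrete by a first-order Taylor expansion of the network around its initialization $\theta_0$; in the lazy regime the parameters stay close enough to $\theta_0$ that this linearization remains accurate throughout training:

$$
f(x;\theta) \;\approx\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
$$

so the model is linear in $\theta$ with fixed features $\nabla_\theta f(x;\theta_0)$, and gradient descent on it is kernel regression with the kernel

$$
K(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta_0),\; \nabla_\theta f(x';\theta_0) \big\rangle,
$$

which is the (empirical) Neural Tangent Kernel at initialization.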