Groups
Layer Normalization rescales and recenters each sample across its feature dimensions, making it independent of batch size.
The Central Limit Theorem (CLT) says that the sum or average of many independent, identically distributed variables with finite variance becomes approximately normal (Gaussian).