Groups
Category
Level
Softmax turns arbitrary real-valued scores (logits) into probabilities that sum to one.
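As a minimal sketch of the definition above, softmax can be written in plain Python by exponentiating each logit and dividing by the sum of the exponentials. The function name `softmax` and the sample logits are illustrative assumptions, not taken from the source. Subtracting the maximum logit before exponentiating is a common numerical-stability step that leaves the result unchanged.

```python
import math


def softmax(logits):
    """Map a list of real-valued scores to probabilities that sum to one."""
    if not logits:
        raise ValueError("logits must be non-empty")
    # Shift by the maximum so math.exp never overflows on large logits;
    # the shift cancels out in the ratio below.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative input only: larger logits receive larger probabilities.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The output preserves the ordering of the logits, and every entry is strictly positive, so no class is assigned exactly zero probability.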
Scaling laws say that model loss typically follows a power law, decreasing predictably as you increase parameters, data, or compute.
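To make the power-law shape concrete, the sketch below evaluates one commonly used form, loss = a * N^(-alpha) + c, over a range of model sizes. The functional form with an irreducible term `c`, the function name `power_law_loss`, and all constants are hypothetical choices for illustration, not fitted values from any real model.

```python
def power_law_loss(n, a=10.0, alpha=0.1, irreducible=1.0):
    """Hypothetical power-law loss curve: a * n**(-alpha) + irreducible.

    n is the quantity being scaled (for example parameter count).
    All default constants are made up for illustration.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return a * n ** (-alpha) + irreducible


# Each step multiplies the scaled quantity by 10.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [power_law_loss(n) for n in sizes]

for n, loss in zip(sizes, losses):
    print(f"N = {n:.0e}  loss = {loss:.4f}")
```

Under this assumed form, every tenfold increase in N multiplies the reducible part of the loss (loss minus the irreducible term) by the same factor, 10^(-alpha), which is what makes the improvement predictable.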