This paper shows that features learned with the popular InfoNCE contrastive loss come to behave as though drawn from a Gaussian (bell-shaped) distribution.
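For context, this is the standard form of the InfoNCE objective (a common formulation, not necessarily the exact variant the paper analyzes; the encoder $f$, temperature $\tau$, and negative count $N$ are generic notation, not taken from the paper): given an anchor $x$, a positive sample $x^{+}$, and negatives $x_{1}^{-}, \dots, x_{N}^{-}$,

$$
\mathcal{L}_{\text{InfoNCE}}
= -\,\mathbb{E}\!\left[
\log \frac{\exp\!\big(f(x)^{\top} f(x^{+}) / \tau\big)}
{\exp\!\big(f(x)^{\top} f(x^{+}) / \tau\big)
+ \sum_{i=1}^{N} \exp\!\big(f(x)^{\top} f(x_{i}^{-}) / \tau\big)}
\right]
$$

i.e. a softmax cross-entropy that pulls the anchor toward its positive and pushes it away from the negatives.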
This paper shows how to obtain strong text embeddings from decoder-only language models without any additional training.
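The paper's exact recipe is not described here; as a generic illustration of the training-free idea, here is a minimal sketch that pools the final hidden states of a pretrained decoder-only model (GPT-2 via Hugging Face `transformers`; the model choice and mean pooling are assumptions for this example, not the paper's method):

```python
# Minimal sketch: training-free sentence embeddings from a decoder-only LM
# by mean-pooling its final hidden states (illustrative, not the paper's recipe).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModel.from_pretrained("gpt2")
model.eval()

sentences = ["A man is playing guitar.", "Someone plays an instrument."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, dim)

# Mean-pool over real tokens only, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, dim)
emb = torch.nn.functional.normalize(emb, dim=-1)    # unit-length vectors

# Cosine similarity between the two sentences.
print((emb[0] @ emb[1]).item())
```

No gradient updates are involved anywhere: the embedding quality comes entirely from the pretrained model plus the pooling choice.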