This paper tests whether an AI model can realistically predict the comment a specific social media user would write in response to a new post.
Different programming languages scale differently when training AI models on code, so treating them all the same wastes compute and lowers performance.
This paper introduces the Transformer, a model that understands and generates sequences (such as sentences) using attention alone, without RNNs or CNNs.
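
For context, the Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, which the paper defines directly. The NumPy sketch below illustrates that formula for a single attention head; the function name, tensor shapes, and the self-attention usage at the end are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head.

    Q, K: (seq_len, d_k) query and key matrices; V: (seq_len, d_v) values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) scaled similarity scores
    # Row-wise softmax, shifted by the row max for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of value rows

# Tiny usage example: self-attention (Q = K = V) over 4 tokens of dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In the full model this operation is run in parallel over several heads and stacked with feed-forward layers, which is what lets the architecture drop recurrence and convolutions entirely.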