ThreadWeaver teaches a language model to split big problems into smaller parts it can solve at the same time, like teammates working in parallel.
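The "teammates working in parallel" idea follows a fork-and-join pattern, sketched loosely below in Python. This is an analogy, not ThreadWeaver's actual mechanism, which the summary does not describe. The task (summing a list) and the helpers `split_problem` and `solve_part` are hypothetical stand-ins for the subproblems a model would propose and solve.

```python
from concurrent.futures import ThreadPoolExecutor

def split_problem(numbers):
    # Hypothetical splitter: break one big task (summing a long list)
    # into two independent chunks that do not depend on each other.
    mid = len(numbers) // 2
    return [numbers[:mid], numbers[mid:]]

def solve_part(part):
    # Hypothetical sub-solver: each "teammate" handles one chunk on its own.
    return sum(part)

def solve_in_parallel(numbers):
    parts = split_problem(numbers)
    # Fork: run the independent sub-solvers at the same time.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_results = list(pool.map(solve_part, parts))
    # Join: combine the partial answers into one final answer.
    return sum(partial_results)

print(solve_in_parallel(list(range(1, 101))))  # → 5050
```

The split only pays off when the parts are truly independent; the join step is where their results are merged back together.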
The paper introduces the Transformer, a model that processes and generates sequences (such as sentences) using only attention, with no recurrent (RNN) or convolutional (CNN) layers.
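At the Transformer's core is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, where each query scores every key, and the scores become weights for averaging the values. The plain-Python sketch below shows only that step. It leaves out multi-head projections, masking, and positional encodings, and the function names are illustrative.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output = attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Identical keys give equal weights, so the output is the mean of the values.
print(scaled_dot_product_attention([[1, 0]], [[1, 0], [1, 0]], [[2, 0], [0, 4]]))
# → [[1.0, 2.0]]
```

No recurrence is involved: every query attends to every key in one step, which is what lets the Transformer drop RNNs.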
This paper teaches computers to understand words by also looking at the smaller pieces inside words, like 'un-', 'play', and '-ing'.
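The sketch below uses fastText-style character n-grams, one common way to capture pieces inside words. The summary does not say which method the paper uses, so treat this as an assumption. The word is wrapped in boundary markers, cut into all 3- to 6-character pieces, and represented as the sum of its pieces' vectors. The helper names and the toy vectors are hypothetical.

```python
def char_ngrams(word, n_min=3, n_max=6):
    # Boundary markers let prefixes ("<un") and suffixes ("ing>")
    # be told apart from the same letters in the middle of a word.
    marked = f"<{word}>"
    grams = [marked[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(marked) - n + 1)]
    # Keep the whole marked word as its own piece too; dedupe, keep order.
    return list(dict.fromkeys(grams + [marked]))

def word_vector(word, piece_vectors, dim):
    # Hypothetical lookup: a word's vector is the sum of its known pieces.
    vec = [0.0] * dim
    for piece in char_ngrams(word):
        piece_vec = piece_vectors.get(piece)
        if piece_vec is None:
            continue  # pieces without a stored vector contribute nothing
        vec = [a + b for a, b in zip(vec, piece_vec)]
    return vec

toy_vectors = {"<un": [1.0, 0.0], "play": [0.0, 2.0]}  # made-up values
print(word_vector("unplaying", toy_vectors, 2))  # → [1.0, 2.0]
print(word_vector("replay", toy_vectors, 2))     # → [0.0, 2.0]
```

Because "replay" shares the piece "play" with "unplaying", it still gets a meaningful vector even if "replay" itself never appeared during training.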