This paper makes training giant AI models faster and lighter on memory by inventing a new way to split tensors called RaggedShard.
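The core idea of splitting a big tensor into unevenly sized pieces can be sketched in a few lines. The summary does not describe how RaggedShard actually partitions tensors, so `ragged_shard` below is a hypothetical illustration: it cuts a weight matrix into row shards of different sizes, e.g. to match each device's free memory.

```python
import numpy as np

def ragged_shard(tensor, row_counts):
    """Split a 2-D weight tensor into uneven ("ragged") row shards.

    `row_counts` says how many rows each shard gets, so shards can be
    sized differently per device. This is only a toy sketch of uneven
    sharding, not the paper's actual RaggedShard scheme.
    """
    assert sum(row_counts) == tensor.shape[0]
    shards, start = [], 0
    for n in row_counts:
        shards.append(tensor[start:start + n])
        start += n
    return shards

weights = np.arange(24, dtype=np.float32).reshape(8, 3)
# Uneven split across three hypothetical devices.
shards = ragged_shard(weights, [3, 1, 4])
print([s.shape for s in shards])  # [(3, 3), (1, 3), (4, 3)]
```

Concatenating the shards back together recovers the original tensor, which is the basic invariant any sharding layout has to preserve.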
Trinity is a family of open language models that are huge on the inside but only wake up a few 'experts' for each word, so they are fast and affordable to run.
Step 3.5 Flash is a huge but efficient AI that keeps 196 billion total parameters but only wakes up about 11 billion per token, so it stays smart while running fast.
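The "only wake up a few experts per token" trick both summaries describe is sparse mixture-of-experts routing. The sketch below is a generic top-k router, not the actual Trinity or Step 3.5 Flash code: a small gate scores all experts, but only the k winners ever run, which is why per-token compute stays small while total parameters stay huge.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Toy top-k mixture-of-experts layer (illustrative only).

    For each token, score every expert with a linear gate, keep the
    k best, softmax their scores, and run ONLY those k experts.
    """
    logits = x @ gate_w                       # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, topk[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()              # softmax over the k winners
        for w, e in zip(weights, topk[t]):
            out[t] += w * experts[e](x[t])    # only k experts execute per token
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 4, 8, 3
# Hypothetical experts: each is just a small linear map here.
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n_experts)]
y = topk_moe_forward(rng.normal(size=(tokens, d)),
                     rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (3, 4)
```

With 8 experts and k=2, each token pays for a quarter of the expert compute; the real models scale the same idea to hundreds of billions of parameters.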
This paper shows how to safely make a neural network wider in the middle of training without it freaking out.
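The summary does not say which widening recipe the paper uses, so here is the classic function-preserving trick (Net2Net-style) as a stand-in: duplicate existing hidden units and split their outgoing weights among the copies, so the wider network computes exactly the same function and training continues without a loss spike.

```python
import numpy as np

def widen_layer(w_in, w_out, new_width, rng):
    """Net2Net-style widening of one hidden layer (a sketch, not the
    paper's own method). Grows the layer to `new_width` units while
    keeping the network's output identical."""
    old = w_in.shape[1]
    idx = rng.integers(0, old, size=new_width - old)   # units to duplicate
    w_in2 = np.concatenate([w_in, w_in[:, idx]], axis=1)
    # Divide each duplicated unit's outgoing weights by its copy count
    # so the summed contribution is unchanged.
    counts = np.bincount(idx, minlength=old) + 1
    w_out2 = np.concatenate([w_out / counts[:, None],
                             (w_out / counts[:, None])[idx]], axis=0)
    return w_in2, w_out2

rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=(5, 6)), rng.normal(size=(6, 4))
x = rng.normal(size=(2, 5))
w1b, w2b = widen_layer(w1, w2, 9, rng)
before = np.maximum(x @ w1, 0) @ w2    # original 6-unit hidden layer
after = np.maximum(x @ w1b, 0) @ w2b   # widened 9-unit hidden layer
print(np.allclose(before, after))  # True
```

Because the widened network starts as an exact copy of the old one's function, the optimizer sees no sudden jump in the loss, which is the "without it freaking out" part.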