How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)

#Muon Optimizer

veScale-FSDP: Flexible and High-Performance FSDP at Scale

Intermediate
Zezhou Wang, Youjie Li et al. · Feb 25 · arXiv

This paper makes training giant AI models faster and more memory-efficient by introducing RaggedShard, a new way to split tensors across devices.

#FSDP #ZeRO #RaggedShard

Arcee Trinity Large Technical Report

Intermediate
Varun Singh, Lucas Krauss et al. · Feb 19 · arXiv

Trinity is a family of open language models that are huge on the inside but activate only a few 'experts' for each token, so they are fast and affordable to run.

#Mixture-of-Experts #SMEBU #Gated Attention

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Intermediate
Ailin Huang, Ang Li et al. · Feb 11 · arXiv

Step 3.5 Flash is a large, efficient AI model with 196 billion parameters in total, yet it activates only about 11 billion of them per token, so it stays both smart and fast.

#Sparse Mixture-of-Experts #Sliding-Window Attention #Head-wise Gated Attention
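Both Trinity and Step 3.5 Flash rely on the idea that a token "wakes up" only a few experts. The sketch below illustrates one common form of top-k routing in plain Python; it is a toy under stated assumptions, not either paper's router. The router scores are passed in directly (in a real model a learned layer computes them from the token), the 8 toy experts just scale their input, and `top_k_route` and `moe_forward` are hypothetical names. This variant applies softmax over the k chosen experts only; other designs apply it over all experts before selecting.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalize over the chosen experts only, so their weights sum to 1.
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(token: Vector, router_logits: List[float],
                experts: List[Callable[[Vector], Vector]], k: int) -> Vector:
    out = [0.0] * len(token)
    for idx, weight in top_k_route(router_logits, k):
        expert_out = experts[idx](token)  # only the k routed experts are ever called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Hypothetical toy setup: 8 "experts", where expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # stand-in for a learned router's scores

print(top_k_route(logits, k=2))
print(moe_forward([1.0, 2.0], logits, experts, k=2))
```

Here only 2 of the 8 experts run for the token. By the same logic, activating about 11 billion of 196 billion parameters means roughly 5.6% of the weights do work for any single token, which is where the speed and cost savings come from.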

SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

Intermediate
Qifan Yu, Xinyu Ma et al. · Feb 2 · arXiv

This paper shows how to safely make a neural network wider partway through training without destabilizing it, by balancing signal preservation against symmetry breaking.

#Progressive Learning #Width Expansion #RMS scale
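The tension named in the SPARKLING title can be seen in a classic Net2Net-style widening step. This is a swapped-in illustration, not the paper's method. The sketch assumes a tiny two-layer linear network with no activation, and `widen` is a hypothetical helper: it clones one hidden unit and splits that unit's outgoing weights in half between the original and the copy, so the output is unchanged. It can also add Gaussian noise to the copy's incoming weights.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def forward(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    # Two-layer linear net: hidden = W1 x, output = W2 hidden (no activation, for clarity).
    return matvec(w2, matvec(w1, x))

def widen(w1: Matrix, w2: Matrix, unit: int, noise: float,
          rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Add one hidden unit by cloning `unit` (Net2Net-style, hypothetical helper)."""
    # Incoming weights: copy the row, optionally perturbed to break symmetry.
    new_row = [w + rng.gauss(0.0, noise) for w in w1[unit]]
    w1_new = [row[:] for row in w1] + [new_row]
    # Outgoing weights: split the cloned unit's column in half between original and copy,
    # so their combined contribution matches the original when noise == 0.
    w2_new = []
    for row in w2:
        half = row[unit] / 2.0
        new = row[:]
        new[unit] = half
        new.append(half)
        w2_new.append(new)
    return w1_new, w2_new

rng = random.Random(0)
w1 = [[1.0, 0.5], [-0.5, 2.0], [0.25, 1.0]]  # 3 hidden units, 2 inputs
w2 = [[1.0, -1.0, 0.5]]                      # 1 output
x = [2.0, 1.0]

before = forward(w1, w2, x)
exact = forward(*widen(w1, w2, unit=1, noise=0.0, rng=rng), x)
noisy = forward(*widen(w1, w2, unit=1, noise=0.1, rng=rng), x)
print(before, exact, noisy)
```

With zero noise the output is preserved exactly. However, the original unit and its copy have identical incoming and outgoing weights, so they receive identical gradients and never diverge. Adding noise breaks that tie but slightly changes the output. Managing this trade-off is the kind of balance the paper's title refers to.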