How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)


Transformers converge to invariant algorithmic cores

Intermediate
Joshua S. Schiffman · Feb 26 · arXiv

Different transformers may have very different weights, but they often hide the same tiny "engine" inside that actually does the task.

#algorithmic cores · #mechanistic interpretability · #transformers

On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking

Intermediate
Jianliang He, Leda Wang et al. · Feb 18 · arXiv

This paper explains, in detail, how a simple two-layer neural network learns to add numbers on a clock (modular addition) by building and combining wave-like patterns called Fourier features.

#modular addition · #Fourier features · #discrete Fourier transform
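The clock-addition trick behind those Fourier features can be sketched in a few lines (this is an illustration of the idea, not the paper's code): the complex wave for a times the complex wave for b equals the wave for a + b, so a Fourier transform of the product reads off the answer mod p.

```python
import numpy as np

p = 12  # "clock" size
a, b = 7, 9
k = np.arange(p)

# Wave-like encodings of a and b: one complex wave per frequency k.
wave_a = np.exp(2j * np.pi * k * a / p)
wave_b = np.exp(2j * np.pi * k * b / p)

# Multiplying the waves frequency-by-frequency adds the angles,
# so the product encodes (a + b) mod p.
product = wave_a * wave_b

# A Fourier transform of the product spikes at index (a + b) mod p.
scores = np.fft.fft(product).real
predicted = int(np.argmax(scores))
print(predicted, (a + b) % p)  # 4 4
```

The network described in the paper is said to build these wave patterns in its first layer and combine them in the second; the snippet above only shows why wave multiplication implements addition on a clock.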

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Intermediate
Maksim Velikanov, Ilyas Chahed et al. · Jan 8 · arXiv

The paper shows that in big language models, weight magnitudes often end up pinned by training hyperparameters such as weight decay rather than by the data, which quietly hurts performance.

#learnable multipliers · #weight decay · #noise–WD equilibrium
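A hypothetical sketch of the core idea (names and details here are illustrative, not the paper's API): pair each weight matrix with a trainable scalar multiplier that is excluded from weight decay, so the layer's effective scale is learned from gradients instead of being fixed at the equilibrium between gradient noise and weight decay.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)) / np.sqrt(16)  # matrix weights (decayed)
g = 1.0                                          # learnable multiplier (no decay)

def forward(x):
    # Effective weight is g * W: the scalar frees the overall scale.
    return (g * W) @ x

# One toy SGD step on loss = 0.5 * ||y||^2, showing that g gets its
# own gradient while weight decay touches only W.
x = rng.standard_normal(16)
y = forward(x)
grad_g = y @ (W @ x)          # dLoss/dg
grad_W = g * np.outer(y, x)   # dLoss/dW
lr, wd = 0.01, 0.1
W = W - lr * (grad_W + wd * W)  # decay shrinks W toward zero
g = g - lr * grad_g             # multiplier is free of decay
```

Because g carries no decay term, the product g * W can settle at whatever magnitude the data favors, even while W itself is held small by weight decay.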

Visualizing the Loss Landscape of Neural Nets

Intermediate
Hao Li, Zheng Xu et al. · Dec 28 · arXiv

Training a neural network is like finding the lowest spot in a giant, bumpy landscape called the loss landscape.

#loss landscape visualization · #filter normalization · #sharpness flatness
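The paper's filter-normalization recipe can be sketched as follows (a minimal toy, assuming a flat weight array per filter rather than real network tensors): draw a random direction, rescale each filter of the direction to the norm of the matching filter in the trained weights, then plot the loss along that direction.

```python
import numpy as np

def filter_normalized_direction(weights, rng):
    """weights: (num_filters, filter_size). Return a random direction
    whose rows are rescaled to the norms of the matching weight rows."""
    d = rng.standard_normal(weights.shape)
    d *= np.linalg.norm(weights, axis=1, keepdims=True) / (
        np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
    return d

def loss(w):
    # Toy quadratic stand-in for a network's loss surface.
    return float(np.sum(w ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3))          # "trained" weights (toy)
d = filter_normalized_direction(w, rng)  # scale-matched perturbation

# 1-D slice of the landscape: loss along w + alpha * d.
alphas = np.linspace(-1, 1, 5)
curve = [loss(w + a * d) for a in alphas]
```

Matching each filter's norm removes the scale ambiguity of networks with normalization layers, so sharpness comparisons along such slices become meaningful; a 2-D version uses two such directions.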