Transformer expressiveness studies what kinds of sequence-to-sequence mappings a Transformer can represent or approximate.
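As a small, concrete illustration of "representing a sequence-to-sequence mapping", here is a minimal sketch. It is not a construction taken from the expressiveness literature. A single causal self-attention head over scalar tokens can represent the prefix-average map: when the query weight is zero, every attention score is equal, so softmax puts uniform weight on each visible position. The function `causal_attention` and the scalar weights `wq`, `wk`, `wv` are hypothetical simplifications. Real Transformers use vector embeddings, multiple heads, residual connections, and MLP blocks.

```python
import math

def causal_attention(xs, wq, wk, wv):
    """Hypothetical single-head causal self-attention over scalar tokens.

    Position i attends to positions 0..i with weights softmax(q_i * k_j)
    and returns the weighted sum of the values v_j = wv * x_j.
    """
    out = []
    for i, xi in enumerate(xs):
        q = wq * xi
        prefix = xs[: i + 1]
        scores = [q * (wk * xj) for xj in prefix]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum((e / z) * (wv * xj) for e, xj in zip(exps, prefix)))
    return out

# With wq = 0 every score is 0, so attention is uniform over the prefix and
# the head computes running means: approximately 2, 3, 4, 5 for this input.
seq = [2.0, 4.0, 6.0, 8.0]
print(causal_attention(seq, wq=0.0, wk=1.0, wv=1.0))
```

Other weight settings give other mappings from the same architecture. For example, a large positive `wq * wk` makes each position attend mostly to the largest token in its prefix, so the head behaves like a soft prefix-maximum.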
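The kernel (lazy) regime is the training regime in which neural network parameters stay close to their initialization. The network then behaves approximately like its first-order linearization around the initial parameters, so training is approximately equivalent to kernel regression with a fixed kernel, typically the Neural Tangent Kernel (NTK) evaluated at initialization.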
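The following is a minimal sketch of the lazy regime for a toy model. It is an illustrative assumption, not a reference implementation. The model is a hypothetical two-layer network without biases, f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x), with scalar input and standard-normal initialization. The sketch computes the empirical (finite-width) NTK as the inner product of parameter gradients at initialization. It then predicts with the linearized model trained to convergence on squared loss, which is ridgeless kernel regression on the initial residuals: f_lin(x) = f0(x) + K(x, X) K(X, X)^{-1} (y - f0(X)). All function names are hypothetical.

```python
import math
import random

def init_params(width, seed=0):
    # Hypothetical toy net f(x) = (1/sqrt(m)) * sum_j a_j * tanh(w_j * x).
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return a, w

def f(params, x):
    a, w = params
    return sum(aj * math.tanh(wj * x) for aj, wj in zip(a, w)) / math.sqrt(len(a))

def grad(params, x):
    """Gradient of f(x) with respect to (a_1..a_m, w_1..w_m)."""
    a, w = params
    s = math.sqrt(len(a))
    ga = [math.tanh(wj * x) / s for wj in w]
    gw = [aj * x * (1.0 - math.tanh(wj * x) ** 2) / s for aj, wj in zip(a, w)]
    return ga + gw

def ntk(params, x1, x2):
    """Empirical NTK at initialization: dot product of parameter gradients."""
    return sum(g1 * g2 for g1, g2 in zip(grad(params, x1), grad(params, x2)))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def lazy_predict(params, xs, ys, x_test):
    """Linearized model after training to convergence on squared loss."""
    K = [[ntk(params, xi, xj) for xj in xs] for xi in xs]
    residual = [y - f(params, x) for x, y in zip(xs, ys)]
    alpha = solve(K, residual)  # K(X, X)^{-1} (y - f0(X))
    return f(params, x_test) + sum(ntk(params, x_test, xi) * al for xi, al in zip(xs, alpha))

params = init_params(width=256)
xs = [-2.0, -0.5, 0.7, 1.5]
ys = [math.sin(x) for x in xs]
# At the training inputs the prediction reproduces sin(x) up to round-off;
# 1.0 is a held-out input.
for x in xs + [1.0]:
    print(x, lazy_predict(params, xs, ys, x), math.sin(x))
```

The kernel is computed once at initialization and never updated, which is exactly what "lazy" means here. Outside this regime, the gradients, and hence the kernel, change during training.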