The paper shows that conditioning text-to-image diffusion transformers on features drawn from many layers of a language model, rather than a single layer, markedly improves how faithfully the generated images follow the prompt.
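To make the idea concrete, here is a minimal sketch of fusing hidden states from every layer of a text encoder into one conditioning signal for a diffusion transformer. This is not the paper's actual method: the learned-softmax layer weighting, module name, and tensor shapes are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiLayerTextConditioner(nn.Module):
    """Hypothetical sketch: fuse hidden states from ALL text-encoder layers
    into one conditioning sequence, instead of using only the final layer."""
    def __init__(self, num_layers: int, hidden_dim: int, cond_dim: int):
        super().__init__()
        # One learnable scalar per layer; softmax turns them into mixing weights.
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(hidden_dim, cond_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, batch, seq_len, hidden_dim),
        # e.g. a language model's per-layer outputs stacked along dim 0.
        w = torch.softmax(self.layer_logits, dim=0)            # (num_layers,)
        fused = torch.einsum("l,lbsh->bsh", w, hidden_states)  # weighted sum over layers
        return self.proj(fused)  # (batch, seq_len, cond_dim) conditioning tokens

# Toy usage with assumed sizes: 12 layers, 768-dim states, 1024-dim conditioning.
cond = MultiLayerTextConditioner(num_layers=12, hidden_dim=768, cond_dim=1024)
h = torch.randn(12, 2, 77, 768)  # stacked per-layer hidden states
print(cond(h).shape)             # torch.Size([2, 77, 1024])
```

A learned per-layer weight is just one plausible fusion choice; concatenation or cross-attention over all layers would illustrate the same multi-layer idea.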
Big AI models used to improve by growing wider or reading longer inputs, but the gains from both of those scaling tricks are slowing down.
The paper introduces the Transformer, a model that understands and generates sequences (like sentences) using only attention, without RNNs or CNNs.
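Since the summary hinges on "only attention", here is a minimal sketch of the scaled dot-product attention the paper defines, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The toy shapes are illustrative, and multi-head projections, masking, and the rest of the architecture are omitted.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    # How strongly each query position attends to each key position,
    # scaled by sqrt(d_k) to keep the softmax from saturating.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each row becomes attention weights summing to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))  # 4 tokens, d_k = 8 (toy sizes)
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

Setting Q = K = V to the same token matrix, as above, is self-attention: each position mixes in information from every other position in one step, which is what lets the Transformer drop recurrence and convolution entirely.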