How I Study AI - Learn AI Papers & Lectures the Easy Way

SageBwd: A Trainable Low-bit Attention

Beginner

Jintao Zhang, Marco Chen et al. · Mar 2 · arXiv

SageBwd is a way to make the Transformer's attention both fast and trainable by doing most big multiplications in 8-bit instead of full precision.

#SageBwd #low-bit attention #INT8 training

Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Beginner

Yifan Zhou, Zeqi Xiao et al. · Dec 18 · arXiv

This paper introduces Log-linear Sparse Attention (LLSA), a new way for Diffusion Transformers to focus only on the most useful information using a smart, layered search.

#Log-linear Sparse Attention #Hierarchical Top-K #Hierarchical KV Enrichment

LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Beginner

Chenkai Xu, Yijie Jin et al. · Dec 18 · arXiv

This paper speeds up diffusion language models (dLLMs) by changing the order in which they fill in missing words.

#Diffusion LLM #Parallel decoding #Token Filling Order
