๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐Ÿ“Daily Log๐ŸŽฏPrompts๐Ÿง Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#straight-through estimator

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

Intermediate
Hyochan Chong, Dongkyu Kim et al.Feb 6arXiv

NanoQuant is a new way to shrink large language models down to 1-bit and even less than 1-bit per weight without retraining on huge datasets.

#post-training quantization#sub-1-bit quantization#binary LLMs

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Intermediate
Yue Ding, Yiyan Ji et al.Feb 4arXiv

OmniSIFT is a new way to shrink (compress) audio and video tokens so omni-modal language models can think faster without forgetting important details.

#Omni-LLM#token compression#modality-asymmetric

Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Beginner
Zecheng Tang, Quantong Qiu et al.Jan 24arXiv

Transformers slow down on very long inputs because standard attention looks at every token pair, which is expensive.

#elastic attention#sparse attention#full attention