Papers (2)

#Long-Context Modeling

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Transformers are powerful but slow because standard self-attention compares every token with every other token, so its cost grows quadratically with sequence length and becomes prohibitive for long sequences.
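To make the scaling difference concrete, here is a minimal NumPy sketch. It shows generic kernelized linear attention, not the MHLA method from the paper; the `elu(x) + 1` feature map and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard attention: builds an explicit (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n, n): every token vs. every token
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (n, d_v)

def feature_map(x):
    """elu(x) + 1: a positive feature map (an assumed, illustrative choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Non-causal linear attention: never materializes an (n, n) matrix."""
    fq, fk = feature_map(q), feature_map(k)        # (n, d)
    kv = fk.T @ v                                  # (d, d_v) summary of all keys and values
    z = fk.sum(axis=0)                             # (d,) normalizer
    return (fq @ kv) / (fq @ z)[:, None]           # cost is linear in n

def linear_attention_quadratic(q, k, v):
    """Same result as linear_attention, computed the slow (n, n) way."""
    fq, fk = feature_map(q), feature_map(k)
    weights = fq @ fk.T                            # (n, n)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 16, 4))          # n = 16 tokens, d = 4
out_fast = linear_attention(q, k, v)
out_slow = linear_attention_quadratic(q, k, v)
```

Reordering the matrix products is what removes the n-by-n intermediate: `fk.T @ v` has a size that depends only on the feature dimension, so doubling the sequence length doubles the work instead of quadrupling it.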

#Multi-Head Linear Attention #Linear Attention #Self-Attention


MiMo-V2-Flash Technical Report

Intermediate

Xiaomi LLM-Core Team et al., Jan 6, arXiv

MiMo-V2-Flash is a large but efficient language model that uses a Mixture-of-Experts design to reason well while staying fast.

#Mixture-of-Experts #Sliding Window Attention #Global Attention
