Papers (3)

#Grouped-Query Attention

Arcee Trinity Large Technical Report

Varun Singh, Lucas Krauss et al. · Feb 19 · arXiv

Trinity is a family of open language models that are huge on the inside but activate only a few "experts" for each token, so they are fast and affordable to run.

#Mixture-of-Experts #SMEBU #Gated Attention

MiMo-V2-Flash Technical Report

Intermediate

Xiaomi LLM-Core Team et al. · Jan 6 · arXiv

MiMo-V2-Flash is a giant but efficient language model that uses a team-of-experts (Mixture-of-Experts) design to reason well while staying fast.

#Mixture-of-Experts #Sliding Window Attention #Global Attention

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Intermediate

NVIDIA et al. · Dec 23 · arXiv

Nemotron 3 Nano is a new open language model that combines two architectures (Mamba and Transformer) and adds a team of specialized experts (MoE), so it reasons better while running much faster.

#Mixture-of-Experts #Mamba-2 #Transformer
