Papers (9)

#RoPE

Utonia: Toward One Encoder for All Point Clouds

Yujia Zhang, Xiaoyang Wu et al. · Mar 3 · arXiv

Utonia is a single brain (encoder) that learns from many kinds of 3D point clouds, like indoor rooms, outdoor streets, tiny toys, and even city maps.

#Utonia #point cloud #self-supervised learning


Arcee Trinity Large Technical Report

Intermediate

Varun Singh, Lucas Krauss et al. · Feb 19 · arXiv

Trinity is a family of open language models that are huge on the inside but only wake up a few 'experts' for each word, so they are fast and affordable to run.

#Mixture-of-Experts #SMEBU #Gated Attention


FASA: Frequency-aware Sparse Attention

Intermediate

Yifei Wang, Yueqi Wang et al. · Feb 3 · arXiv

FASA is a training-free method that makes large language models faster and lighter on memory by keeping only the most useful past tokens during decoding.

#FASA #Frequency-aware sparse attention #KV cache compression


Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts

Intermediate

Yingfa Chen, Zhen Leng Thai et al. · Jan 29 · arXiv

This paper shows how to turn a big Transformer model into a faster hybrid model that mixes attention and RNN layers using far less training data (about 2.3B tokens).

#hybrid attention #RNN attention hybrid #linear attention


OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Intermediate

Pengze Zhang, Yanze Wu et al. · Jan 20 · arXiv

OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).

#spatio-temporal video transfer #identity transfer #style transfer


K-EXAONE Technical Report

Intermediate

Eunbi Choi, Kibong Choi et al. · Jan 5 · arXiv

K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.

#Mixture-of-Experts #Hybrid Attention #Sliding Window Attention


Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Intermediate

Zeyuan Allen-Zhu · Dec 19 · arXiv

The paper introduces Canon layers, tiny add-ons that let nearby words share information directly, like passing notes along a row of desks.

#Canon layers #horizontal information flow #transformer architecture


RePo: Language Models with Context Re-Positioning

Intermediate

Huayang Li, Tianyu Zhao et al. · Dec 16 · arXiv

Large language models usually line words up in fixed order slots, which can waste mental energy and make it harder to find the important parts of a long or noisy text.

#context re-positioning #positional encoding #self-attention


Group Representational Position Encoding

Intermediate

Yifan Zhang, Zixiang Chen et al. · Dec 8 · arXiv

GRAPE is a new way to tell Transformers where each word is in a sentence by using neat math moves called group actions.

#GRAPE #positional encoding #group actions
