Papers (4)

#edge deployment

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Abdelrahman Shaker, Ahmed Heakl et al. · Feb 23 · arXiv

Mobile-O is a compact unified multimodal model that can both understand images and generate new ones, and it runs directly on a mobile device.

#Mobile-O #unified multimodal model #on-device AI

AnyDepth: Depth Estimation Made Easy

Intermediate

Zeyu Ren, Zeyu Zhang et al. · Jan 6 · arXiv

AnyDepth is a new, simple way for a computer to estimate how far away things are in a picture from a single image (monocular depth estimation).

#monocular depth estimation #zero-shot depth #Simple Depth Transformer

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Intermediate

Hongyuan Tao, Bencheng Liao et al. · Dec 9 · arXiv

InfiniteVL is a vision-language model that mixes two ideas: local focus with Sliding Window Attention and long-term memory with a linear module called Gated DeltaNet.

#InfiniteVL #linear attention #Gated DeltaNet

Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in $\{\pm 1, \pm i\}$

Intermediate

Feiyu Wang, Xinyu Tan et al. · Dec 2 · arXiv

Fairy2i turns any pre-trained real-valued Transformer layer into an exactly equivalent complex-valued form, so the layer's outputs are unchanged before quantization is applied.

#LLM quantization #complex-valued neural networks #widely-linear transformation
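
To make the "exactly equivalent complex form" idea in the Fairy2i summary concrete, here is a minimal sketch of the general widely-linear identity: any real linear map on a pair of real vectors (a, b) can be rewritten as y = U x + W conj(x) with x = a + i b. This is a textbook identity, not necessarily the paper's exact construction; the function names and the 2x2 block layout (R11, R12, R21, R22) are assumptions made for illustration, and no quantization step is shown.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def real_to_widely_linear(R11, R12, R21, R22):
    """Hypothetical helper: build complex matrices (U, W) such that, for
    x = a + i*b, U x + W conj(x) = (R11 a + R12 b) + i (R21 a + R22 b)."""
    rows, cols = len(R11), len(R11[0])
    U = [[complex(R11[r][c] + R22[r][c], R21[r][c] - R12[r][c]) / 2
          for c in range(cols)] for r in range(rows)]
    W = [[complex(R11[r][c] - R22[r][c], R21[r][c] + R12[r][c]) / 2
          for c in range(cols)] for r in range(rows)]
    return U, W


def widely_linear_apply(U, W, x):
    """Apply y = U x + W conj(x) to a complex vector x."""
    ux = matvec(U, x)
    wx = matvec(W, [z.conjugate() for z in x])
    return [p + q for p, q in zip(ux, wx)]


def real_apply(R11, R12, R21, R22, a, b):
    """Apply the original real block map to the real vectors (a, b)."""
    c = [p + q for p, q in zip(matvec(R11, a), matvec(R12, b))]
    d = [p + q for p, q in zip(matvec(R21, a), matvec(R22, b))]
    return c, d


def max_reconstruction_error(rows=3, cols=4, seed=0):
    """Compare the real map and its complex rewrite on random data."""
    rng = random.Random(seed)

    def rand_mat():
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    R11, R12, R21, R22 = rand_mat(), rand_mat(), rand_mat(), rand_mat()
    a = [rng.uniform(-1, 1) for _ in range(cols)]
    b = [rng.uniform(-1, 1) for _ in range(cols)]

    c, d = real_apply(R11, R12, R21, R22, a, b)
    U, W = real_to_widely_linear(R11, R12, R21, R22)
    y = widely_linear_apply(U, W, [complex(p, q) for p, q in zip(a, b)])
    return max(abs(yk - complex(ck, dk)) for yk, ck, dk in zip(y, c, d))


if __name__ == "__main__":
    # The error should be at floating-point rounding level.
    print(max_reconstruction_error())
```

The rewrite only reparameterizes the layer, which is why the outputs match up to rounding; any accuracy change would come from a later step that restricts the entries of U and W to a small set such as $\{\pm 1, \pm i\}$.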
