Papers (23)

#Mixture-of-Experts

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Timer-S1 is a huge time-series model (8.3B parameters, only 0.75B used per step) that predicts the future by thinking step-by-step inside one forward pass.

#time series forecasting#foundation models#Mixture-of-Experts

Memory Caching: RNNs with Growing Memory

Beginner

Ali Behrouz, Zeman Li et al. · Feb 27 · arXiv

Recurrent neural networks (RNNs) are fast but forgetful because they squeeze everything they’ve seen into a tiny, fixed memory.

#Memory Caching#Recurrent Neural Networks#Attention

LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding

Intermediate

Alexander Samarin, Sergei Krutikov et al. · Feb 27 · arXiv

Speculative decoding speeds up big language models by letting a small helper model guess several next words and having the big model check them all at once.

#speculative decoding#acceptance rate#LK losses
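
The draft-then-verify loop described in the summary above can be sketched as follows. This is a minimal illustration and not the paper's method: it assumes greedy decoding and accepts a drafted token only when it exactly matches the big model's choice, whereas real systems typically compare probabilities and score all drafted positions in one batched pass. The names `speculative_step`, `draft_next`, `target_next`, and the toy models are hypothetical.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: a model is a function from a context to its next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(
    prefix: List[Token], draft_next: NextToken, target_next: NextToken, k: int
) -> Tuple[List[Token], int]:
    """Run one draft-then-verify round; return (new tokens, drafted tokens accepted)."""
    # 1. The small draft model guesses k tokens one after another.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The big target model checks each guess in order.
    #    (Done sequentially here for clarity; assumed to be one batched pass in practice.)
    out: List[Token] = []
    ctx = list(prefix)
    for n_accepted, tok in enumerate(proposed):
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest of the draft.
            out.append(expected)
            return out, n_accepted
        out.append(tok)
        ctx.append(tok)

    # Every guess was accepted, so the target contributes one extra token.
    out.append(target_next(ctx))
    return out, k


# Toy models: the target counts upward mod 10; the draft agrees except after a 3.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


tokens, accepted = speculative_step([0], toy_draft, toy_target, k=4)
acceptance_rate = accepted / 4
```

In this toy run three of the four drafted tokens are accepted. The acceptance rate is the quantity that the paper's title says it optimizes directly; the higher it is, the more tokens each big-model check yields.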

Arcee Trinity Large Technical Report

Intermediate

Varun Singh, Lucas Krauss et al. · Feb 19 · arXiv

Trinity is a family of open language models that are huge on the inside but only wake up a few 'experts' for each word, so they are fast and affordable to run.

#Mixture-of-Experts#SMEBU#Gated Attention

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Intermediate

SII-OpenMOSS Team, Donghua Yu et al. · Feb 9 · arXiv

MOVA is an open-source model that generates video and audio together, so that lip movements, actions, and sounds stay in sync.

#video-audio generation#lip synchronization#dual-tower architecture

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Intermediate

Mouxiang Chen, Lei Zhang et al. · Feb 2 · arXiv

SWE-Universe is a factory-like system that turns real GitHub pull requests into safe, repeatable coding practice worlds with automatic checkers.

#SWE-Universe#software engineering agents#pull requests

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate

Le Zhang, Yixiong Xiao et al. · Jan 28 · arXiv

OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.

#GUI agent#UI grounding#navigation policy

Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts

Intermediate

Xuan-Phi Nguyen, Shrey Pandit et al. · Jan 23 · arXiv

Mixture-of-Experts (MoE) models often send far more tokens to a few “favorite” experts, which overloads some GPUs while others sit idle.

#Mixture-of-Experts#Expert Parallelism#Least-Loaded Expert Parallelism
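
The imbalance described in the summary above can be made concrete with a small sketch. It illustrates only the problem, not the paper's Least-Loaded Expert Parallelism technique, and it assumes plain top-k routing with one expert per GPU. The function names and the router scores are made up for illustration.

```python
from collections import Counter
from typing import List


def top_k_route(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return ranked[:k]


def expert_load(token_scores: List[List[float]], k: int) -> Counter:
    """Count how many token assignments each expert receives."""
    load: Counter = Counter()
    for scores in token_scores:
        load.update(top_k_route(scores, k))
    return load


# Hypothetical router scores for 6 tokens over 4 experts, skewed toward expert 0.
scores = [
    [0.9, 0.5, 0.1, 0.2],
    [0.8, 0.1, 0.6, 0.2],
    [0.7, 0.6, 0.1, 0.3],
    [0.9, 0.2, 0.3, 0.4],
    [0.6, 0.7, 0.1, 0.2],
    [0.8, 0.3, 0.2, 0.1],
]

load = expert_load(scores, k=2)
mean_load = sum(load.values()) / 4          # 12 assignments spread over 4 experts
imbalance = max(load.values()) / mean_load  # how much busier the hottest expert is
```

With these scores, expert 0 receives 6 of the 12 assignments while experts 2 and 3 receive 1 each. If each expert lived on its own GPU, the busiest device would do twice the average work while two others sat nearly idle.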

LongCat-Flash-Thinking-2601 Technical Report

Beginner

Meituan LongCat Team, Anchun Gui et al. · Jan 23 · arXiv

LongCat-Flash-Thinking-2601 is a huge 560-billion-parameter Mixture-of-Experts model built to act like a careful helper that can use tools, browse, code, and solve multi-step tasks.

#Agentic reasoning#Mixture-of-Experts#Asynchronous reinforcement learning

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Intermediate

Yu Xu, Hongbin Yan et al. · Jan 12 · arXiv

TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models using clear task hints, so the right “mini-experts” handle the right parts of an image job.

#Task-Aware Gating#Mixture-of-Experts#Unified Image Generation

Solar Open Technical Report

Intermediate

Sungrae Park, Sanghoon Kim et al. · Jan 11 · arXiv

Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.

#Solar Open#Mixture-of-Experts#bilingual LLM

Token-Level LLM Collaboration via FusionRoute

Intermediate

Nuoya Xiong, Yuhang Zhou et al. · Jan 8 · arXiv

Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.

#FusionRoute#token-level collaboration#expert routing
