How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (18)

#Mixture-of-Experts

SWE-Universe: Scale Real-World Verifiable Environments to Millions

Intermediate
Mouxiang Chen, Lei Zhang et al. · Feb 2 · arXiv

SWE-Universe is a factory-like system that turns real GitHub pull requests into safe, repeatable coding practice worlds with automatic checkers.

#SWE-Universe #software engineering agents #pull requests
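To make the "verifiable environment" idea concrete, here is a hypothetical sketch of a PR-derived task: the problem statement comes from the pull request, and the automatic checker is simply the tests the PR touched. The PRTask class, its field names, and the pytest invocation are assumptions for illustration, not SWE-Universe's actual schema.

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class PRTask:
    repo_url: str
    base_commit: str            # repo state before the fix was merged
    problem_statement: str      # what the agent is asked to accomplish
    test_files: list = field(default_factory=list)  # tests added/changed by the PR

    def check(self, workdir: str) -> bool:
        """Automatic checker: run the PR's tests against the agent's patched checkout."""
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", *self.test_files],
            cwd=workdir, capture_output=True,
        )
        return result.returncode == 0

task = PRTask(
    repo_url="https://github.com/example/project",   # placeholder repo
    base_commit="abc123",
    problem_statement="Fix the crash when the config file is empty.",
    test_files=["tests/test_config.py"],
)
# After an agent edits a checkout of the repo, task.check(path_to_checkout)
# reports pass/fail with no human in the loop.
```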

OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution

Intermediate
Le Zhang, Yixiong Xiao et al. · Jan 28 · arXiv

OmegaUse is a new AI that can use phones and computers by looking at screenshots and deciding where to click, type, or scroll—much like a careful human user.

#GUI agent #UI grounding #navigation policy
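A minimal, purely hypothetical agent loop in this spirit: observe a screenshot, ask a policy for a grounded action (click, type, or scroll), apply it, and repeat. The helpers take_screenshot, policy, and apply_action are stand-ins, not OmegaUse's API.

```python
import random
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click" | "type" | "scroll" | "done"
    x: int = 0
    y: int = 0
    text: str = ""

def take_screenshot():
    """Stand-in: would capture pixels of the current screen."""
    return object()

def policy(screenshot, goal):
    """Stand-in for the vision-language policy mapping (screenshot, goal) to an action."""
    return random.choice([
        Action("click", x=120, y=340),
        Action("type", text="dark mode"),
        Action("scroll", y=-300),
        Action("done"),
    ])

def apply_action(action):
    """Stand-in: would drive the OS or browser with the chosen action."""
    print("executing", action)

def run_agent(goal, max_steps=10):
    for _ in range(max_steps):
        act = policy(take_screenshot(), goal)
        if act.kind == "done":
            break
        apply_action(act)

run_agent("open settings and enable dark mode")
```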

Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts

Intermediate
Xuan-Phi Nguyen, Shrey Pandit et al. · Jan 23 · arXiv

Mixture-of-Experts (MoE) models often send far more tokens to a few “favorite” experts, which overloads some GPUs while others sit idle.

#Mixture-of-Experts #Expert Parallelism #Least-Loaded Expert Parallelism
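A small NumPy simulation of that failure mode and of a least-loaded-style fix (the capacity rule below is an illustrative assumption, not the paper's algorithm): skewed router logits pile tokens onto two experts, while a per-expert capacity with fallback to the least-loaded expert evens the load.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts = 4096, 8

# Skewed router scores: two "favorite" experts get systematically higher logits.
bias = np.array([2.0, 1.5, 0.0, 0.0, 0.0, 0.0, -0.5, -0.5])
logits = rng.normal(size=(num_tokens, num_experts)) + bias

# Vanilla top-1 routing: argmax expert per token -> heavily imbalanced load.
naive_load = np.bincount(logits.argmax(axis=1), minlength=num_experts)

# Least-loaded-style assignment (illustrative): walk experts in preference order
# and take the first one still under a per-expert capacity.
capacity = int(1.25 * num_tokens / num_experts)
load = np.zeros(num_experts, dtype=int)
for t in range(num_tokens):
    for e in np.argsort(-logits[t]):          # preferred experts first
        if load[e] < capacity:
            load[e] += 1
            break
    else:                                     # everyone full: least-loaded expert
        load[load.argmin()] += 1

print("naive load per expert:   ", naive_load)
print("balanced load per expert:", load)
```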

LongCat-Flash-Thinking-2601 Technical Report

Beginner
Meituan LongCat Team, Anchun Gui et al. · Jan 23 · arXiv

LongCat-Flash-Thinking-2601 is a huge 560-billion-parameter Mixture-of-Experts model built to act like a careful helper that can use tools, browse, code, and solve multi-step tasks.

#Agentic reasoning #Mixture-of-Experts #Asynchronous reinforcement learning

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Intermediate
Yu Xu, Hongbin Yan et al. · Jan 12 · arXiv

TAG-MoE is a new way to steer Mixture-of-Experts (MoE) models using clear task hints, so the right “mini-experts” handle the right parts of an image job.

#Task-Aware Gating #Mixture-of-Experts #Unified Image Generation
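A hedged PyTorch sketch of task-aware gating in general (shapes, names, and the concatenation scheme are illustrative assumptions, not TAG-MoE's design): the router sees both the token representation and a learned task embedding, so the task hint can steer which experts are selected.

```python
import torch
import torch.nn as nn

class TaskAwareGate(nn.Module):
    def __init__(self, d_model=256, num_experts=8, num_tasks=4, top_k=2):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.router = nn.Linear(2 * d_model, num_experts)
        self.top_k = top_k

    def forward(self, x, task_id):
        # x: (batch, seq, d_model); task_id: (batch,)
        task = self.task_emb(task_id)[:, None, :].expand_as(x)
        logits = self.router(torch.cat([x, task], dim=-1))
        weights, experts = logits.softmax(dim=-1).topk(self.top_k, dim=-1)
        return weights, experts   # which experts handle each token, and with what weight

gate = TaskAwareGate()
x = torch.randn(2, 5, 256)                 # token features for two samples
task_id = torch.tensor([0, 3])             # two different image-generation tasks
weights, experts = gate(x, task_id)
print(experts.shape, weights.shape)        # torch.Size([2, 5, 2]) twice
```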

Solar Open Technical Report

Intermediate
Sungrae Park, Sanghoon Kim et al. · Jan 11 · arXiv

Solar Open is a giant bilingual AI (102 billion parameters) that focuses on helping underserved languages like Korean catch up with English-level AI quality.

#Solar Open #Mixture-of-Experts #bilingual LLM

Token-Level LLM Collaboration via FusionRoute

Intermediate
Nuoya Xiong, Yuhang Zhou et al. · Jan 8 · arXiv

Big all-in-one language models are powerful but too expensive to run everywhere, while small specialists are cheaper but narrow.

#FusionRoute #token-level collaboration #expert routing
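An illustrative sketch of token-level collaboration in general, not FusionRoute's method: at each decoding step a cheap signal (here, the specialist's entropy, standing in for a learned router) decides whether to trust the small specialist or consult the expensive generalist. Both models are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = 50

def small_specialist(prefix):
    """Stand-in for a cheap specialist LM: peaky next-token distribution."""
    return rng.dirichlet(np.full(vocab, 0.05))

def large_generalist(prefix):
    """Stand-in for the expensive general-purpose LM."""
    return rng.dirichlet(np.full(vocab, 0.5))

def route_and_decode(prefix, steps=20, entropy_threshold=2.5):
    big_calls = 0
    for _ in range(steps):
        p = small_specialist(prefix)
        entropy = -(p * np.log(p + 1e-12)).sum()
        if entropy > entropy_threshold:       # specialist unsure: ask the generalist
            p = large_generalist(prefix)
            big_calls += 1
        prefix.append(int(p.argmax()))        # greedy pick of the next token
    return prefix, big_calls

tokens, big_calls = route_and_decode([])
print(f"generated {len(tokens)} tokens, large model consulted {big_calls} times")
```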

DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation

Intermediate
Guanzhi Deng, Bo Li et al. · Jan 8 · arXiv

Mixture-of-Experts (MoE) models use many small specialist networks and only activate a few per token, but classic LoRA fine-tuning gives every expert the same rank, wasting parameters on the wrong experts.

#DR-LoRA #Mixture-of-Experts #Low-Rank Adaptation
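A hedged sketch of the core idea, per-expert LoRA adapters whose ranks differ by expert; the class name, the fixed rank list, and the single shared base weight are simplifications for illustration, not DR-LoRA's allocation rule.

```python
import torch
import torch.nn as nn

class PerExpertLoRA(nn.Module):
    def __init__(self, d_in=512, d_out=512, ranks=(2, 2, 8, 16)):
        super().__init__()
        # One frozen base weight stands in for the experts' frozen weights.
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # One low-rank A/B pair per expert, each with its own rank budget.
        self.A = nn.ParameterList([nn.Parameter(torch.randn(r, d_in) * 0.01) for r in ranks])
        self.B = nn.ParameterList([nn.Parameter(torch.zeros(d_out, r)) for r in ranks])

    def forward(self, x, expert_id):
        delta = self.B[expert_id] @ self.A[expert_id]   # rank-r update for this expert
        return self.base(x) + x @ delta.T

layer = PerExpertLoRA()
x = torch.randn(4, 512)
print(layer(x, expert_id=3).shape)   # expert 3 carries the largest rank budget (16)
```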

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Intermediate
Yan Wang, Yitao Xu et al. · Jan 6 · arXiv

Mixture-of-Experts (MoE) language models don’t split cleanly into domain specialists; instead, a small, stable group of experts gets chosen again and again across many subjects.

#Mixture-of-Experts #Standing Committee #Sparse routing
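A toy analysis showing how such a "standing committee" would be detected (the routing here is simulated, not taken from a real MoE): count how often each expert is selected per domain, then keep the experts that are over-selected in every domain.

```python
import numpy as np

rng = np.random.default_rng(2)
num_experts, tokens_per_domain = 16, 10_000
domains = ["code", "math", "news", "fiction"]

# Simulated routing: every domain shares a bias toward experts 0-2 (the committee),
# plus a small domain-specific preference for one extra expert.
committee_bias = np.zeros(num_experts)
committee_bias[:3] = 2.0

selection_rate = {}
for i, domain in enumerate(domains):
    domain_bias = np.zeros(num_experts)
    domain_bias[3 + i] = 1.0
    logits = rng.normal(size=(tokens_per_domain, num_experts)) + committee_bias + domain_bias
    chosen = logits.argmax(axis=1)
    selection_rate[domain] = np.bincount(chosen, minlength=num_experts) / tokens_per_domain

# Experts selected above the uniform rate in *every* domain form the committee.
uniform = 1.0 / num_experts
committee = [e for e in range(num_experts)
             if all(selection_rate[d][e] > uniform for d in domains)]
print("standing committee experts:", committee)
```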

MiMo-V2-Flash Technical Report

Intermediate
Xiaomi LLM-Core Team et al. · Jan 6 · arXiv

MiMo-V2-Flash is a giant but efficient language model that uses a team-of-experts design to think well while staying fast.

#Mixture-of-Experts #Sliding Window Attention #Global Attention

K-EXAONE Technical Report

Intermediate
Eunbi Choi, Kibong Choi et al. · Jan 5 · arXiv

K-EXAONE is a super-sized language model that speaks six languages and can read very long documents (up to 256,000 tokens) without forgetting important details.

#Mixture-of-Experts #Hybrid Attention #Sliding Window Attention

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Intermediate
Ang Lv, Jin Ma et al. · Dec 29 · arXiv

Mixture-of-Experts (MoE) models use many small specialist networks (experts) and a router to pick which experts handle each token, but the router isn’t explicitly taught what each expert is good at.

#Mixture-of-Experts #expert-router coupling #auxiliary loss
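A hedged sketch of one way to couple router and experts with an auxiliary loss (the paper's exact formulation may differ): alongside the usual task loss, the router is trained toward whichever expert currently predicts each token's target best.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, num_experts, vocab = 64, 4, 100
experts = nn.ModuleList([nn.Linear(d, vocab) for _ in range(num_experts)])
router = nn.Linear(d, num_experts)

x = torch.randn(32, d)                   # token representations
y = torch.randint(0, vocab, (32,))       # next-token targets

# Teaching signal: which expert would have had the lowest loss on each token.
with torch.no_grad():
    per_expert_loss = torch.stack(
        [F.cross_entropy(e(x), y, reduction="none") for e in experts], dim=1
    )                                     # (32, num_experts)
    best_expert = per_expert_loss.argmin(dim=1)

# Main loss: route each token to its top-1 expert and train that expert on it.
route_logits = router(x)
chosen = route_logits.argmax(dim=1)
main_loss = torch.stack(
    [F.cross_entropy(experts[int(chosen[i])](x[i:i + 1]), y[i:i + 1]) for i in range(len(x))]
).mean()

# Auxiliary coupling loss: explicitly teach the router which expert was best.
aux_loss = F.cross_entropy(route_logits, best_expert)

(main_loss + 0.1 * aux_loss).backward()
print(float(main_loss), float(aux_loss))
```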