๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐Ÿ“Daily Log๐ŸŽฏPrompts๐Ÿง Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)


GLM-5: from Vibe Coding to Agentic Engineering

Intermediate
GLM-5 Team, Aohan Zeng et al. · Feb 17 · arXiv

GLM-5 is a new open-weight AI model that moves from 'vibe coding' (prompting the model to write code) to 'agentic engineering' (letting the model plan, build, test, and fix software on its own).

#GLM-5#Agentic Engineering#DeepSeek Sparse Attention

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Intermediate
Ailin Huang, Ang Li et al. · Feb 11 · arXiv

Step 3.5 Flash is a large but efficient AI model: it holds 196 billion total parameters but activates only about 11 billion per token, so it reasons well while staying fast.

#Sparse Mixture-of-Experts#Sliding-Window Attention#Head-wise Gated Attention
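The entry above describes sparse expert activation: most parameters sit idle, and a router picks a few experts per token. The paper's actual architecture isn't reproduced here; as a rough illustration, here is a minimal top-k MoE routing sketch in NumPy (all sizes, the `topk_moe` name, and the routing rule are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_moe(x, gate_w, experts, k=2):
    """Sparse MoE forward pass: each token runs only its top-k experts.

    x:       (tokens, d) activations
    gate_w:  (d, E) router weights
    experts: list of E callables, each mapping (1, d) -> (1, d)
    """
    scores = x @ gate_w                            # (tokens, E) router logits
    idx = np.argsort(scores, axis=-1)[:, -k:]      # top-k expert ids per token
    top = np.take_along_axis(scores, idx, axis=-1)
    w = np.exp(top - top.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                  # softmax over the k chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # dense loops for clarity, not speed
        for slot in range(k):
            e = idx[t, slot]
            out[t] += w[t, slot] * experts[e](x[t:t + 1])[0]
    return out

d, E = 8, 4
gate_w = rng.normal(size=(d, E))
# Toy "experts": each is just a random linear map here.
experts = [(lambda W: (lambda h: h @ W))(rng.normal(size=(d, d))) for _ in range(E)]
x = rng.normal(size=(5, d))
y = topk_moe(x, gate_w, experts, k=2)  # only 2 of 4 experts fire per token
```

With k=2 of 4 experts active, compute per token scales with the chosen experts rather than the full parameter count, which is the same idea behind 11B active out of 196B total.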

Scaling Embeddings Outperforms Scaling Experts in Language Models

Intermediate
Hong Liu, Jiaqi Zhang et al. · Jan 29 · arXiv

The paper shows that growing the embedding part of a language model (especially with n-grams) can beat adding more MoE experts once you pass a certain sparsity 'sweet spot.'

#N-gram Embedding#Mixture-of-Experts (MoE)#Embedding Scaling
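One common way to "grow the embedding part with n-grams" is to add a hashed n-gram table on top of the ordinary token table. The sketch below is a generic illustration of that idea, not the paper's method; every name and size (`bigram_emb`, bucket count, the hashing scheme) is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, ngram_buckets = 100, 16, 512

tok_emb = rng.normal(size=(vocab, d))             # standard token embedding table
bigram_emb = rng.normal(size=(ngram_buckets, d))  # extra hashed-bigram table

def embed(ids):
    """Token embedding plus a hashed bigram embedding at each position."""
    out = tok_emb[ids].copy()
    for i in range(1, len(ids)):
        # Hash the (previous, current) token pair into a fixed-size bucket table.
        bucket = hash((ids[i - 1], ids[i])) % ngram_buckets
        out[i] += bigram_emb[bucket]
    return out

ids = [3, 17, 42, 17, 42]
vecs = embed(ids)  # positions 2 and 4 share the bigram (17, 42), so they match
```

The appeal is that this adds parameters only on the lookup side: the bigram table costs memory but almost no extra compute per token, unlike adding more MoE experts.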

MiMo-V2-Flash Technical Report

Intermediate
Xiaomi LLM-Core Team et al. · Jan 6 · arXiv

MiMo-V2-Flash is a giant but efficient language model that uses a team-of-experts design to think well while staying fast.

#Mixture-of-Experts#Sliding Window Attention#Global Attention
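The tags above pair sliding-window attention (each token looks only at its recent neighbors) with occasional global attention layers. As a generic sketch of that contrast, not MiMo-V2-Flash's actual layers (all shapes and the single-head simplification are assumptions):

```python
import numpy as np

def sliding_window_mask(n, window):
    """Causal local mask: each query attends only to the last `window` keys."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

def attention(q, k, v, mask):
    """Single-head scaled dot-product attention under a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)      # block positions outside the mask
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))

local = attention(q, k, v, sliding_window_mask(n, window=3))  # cheap local layer
glob = attention(q, k, v, np.tril(np.ones((n, n), bool)))     # full causal layer
```

The local mask keeps each row's attention to at most `window` entries, so its cost grows linearly with sequence length, while the occasional full-causal layers preserve long-range information flow.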

NVIDIA Nemotron 3: Efficient and Open Intelligence

Intermediate
NVIDIA et al. · Dec 24 · arXiv

Nemotron 3 is a new family of open AI models (Nano, Super, Ultra) built to think better while running faster and cheaper.

#Nemotron 3#Mixture-of-Experts#LatentMoE