How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (7)

#long-context modeling

Reinforced Fast Weights with Next-Sequence Prediction

Intermediate
Hee Seung Hwang, Xindi Wu et al. | Feb 18 | arXiv

Fast weight models remember context with a tiny, fixed memory, but standard next-token training teaches them to think only one word ahead.

#fast weight models #next-sequence prediction #reinforcement learning for LMs

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Intermediate
MiniCPM Team, Wenhao An et al. | Feb 12 | arXiv

MiniCPM-SALA is a 9B-parameter language model that mixes two kinds of attention—sparse and linear—to read very long texts quickly and accurately.

#long-context modeling #sparse attention #linear attention

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Intermediate
Shuo Chen, Cong Wei et al. | Feb 5 | arXiv

The paper fixes a big problem in long video generation: models either forget what happened or slowly drift off-topic over time.

#autoregressive video generation #long-context modeling #distribution matching distillation

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Intermediate
Sidi Lu, Zhenwen Liang et al. | Feb 4 | arXiv

Locas is a new kind of add-on memory for language models that learns during use but touches none of the model’s original weights.

#Locas #parametric memory #test-time training

MOSS Transcribe Diarize Technical Report

Beginner
MOSI.AI et al. | Jan 4 | arXiv

This paper introduces MOSS Transcribe Diarize, a single model that writes down what people say in a conversation, tells who said each part, and marks the exact times—all in one go.

#speaker diarization #speech recognition #end-to-end SATS

SWE-RM: Execution-free Feedback For Software Engineering Agents

Intermediate
KaShun Shum, Binyuan Hui et al. | Dec 26 | arXiv

Coding agents used to fix software rely on feedback; unit tests give only pass/fail signals that are often noisy or missing.

#execution-free feedback #reward model #software engineering agents

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

Intermediate
Jingdi Lei, Di Zhang et al. | Dec 14 | arXiv

Standard attention is slow for long texts because it compares every word with every other word, which takes quadratic time.

#error-free linear attention #rank-1 matrix exponential #continuous-time dynamics