How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)

#sliding-window attention

Solaris: Building a Multiplayer Video World Model in Minecraft

Intermediate
Georgy Savva, Oscar Michel et al. · Feb 25 · arXiv

Solaris is a new AI model that predicts future video for two Minecraft players at the same time, keeping both players' camera views consistent with each other.

#multiplayer world model #video diffusion transformer #Minecraft dataset

Voxtral Realtime

Beginner
Alexander H. Liu, Andy Ehrenberg et al. · Feb 11 · arXiv

Voxtral Realtime is a speech-to-text model that transcribes what you say almost instantly, while keeping accuracy close to that of the best offline systems.

#streaming ASR #real-time transcription #causal audio encoder

SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

Intermediate
Tongcheng Fang, Hanling Zhang et al. · Jan 23 · arXiv

Videos are made of very long sequences of tokens. Regular (full) attention compares every pair of tokens, so its cost grows quadratically with sequence length, which makes it slow and expensive.

#SALAD #sparse attention #linear attention
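
To make the cost of full attention concrete, here is a minimal pure-Python sketch that compares it with one common sparse pattern, a symmetric sliding window. It assumes a 1-D token sequence and a hypothetical window radius `w`; the helper names are illustrative only. This is not SALAD's actual sparsity pattern, and SALAD's linear-attention branch is not shown.

```python
import math

def full_attention_pairs(n: int) -> int:
    # Full attention scores every (query, key) pair: n * n entries.
    return n * n

def sliding_window_mask(n: int, w: int) -> list[list[bool]]:
    # Hypothetical symmetric window: query i may attend to key j only if |i - j| <= w.
    return [[abs(i - j) <= w for j in range(n)] for i in range(n)]

def masked_attention(q, k, v, mask):
    # q, k: n vectors of size d; v: n vectors of size d_v; mask[i][j] allows key j for query i.
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores; disallowed keys get -inf so their softmax weight is 0.
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(len(k))
        ]
        m = max(scores)  # finite, because each token can always see itself
        weights = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(wt * v[j][c] for j, wt in enumerate(weights)) / total
                    for c in range(len(v[0]))])
    return out

if __name__ == "__main__":
    n, w = 1024, 64
    mask = sliding_window_mask(n, w)
    kept = sum(sum(row) for row in mask)
    print(full_attention_pairs(n), kept)  # 1048576 127936
```

With 1,024 tokens and a radius of 64, the window keeps about 12% of the pairs that full attention would score; the saving grows as sequences get longer, because the windowed cost is roughly linear in `n`.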

VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

Beginner
Yifei Yu, Xiaoshan Wu et al. · Dec 4 · arXiv

VideoSSM is a new way to make long, stable, and lively videos by giving the model two kinds of memory: a short-term window and a long-term state-space memory.

#autoregressive video diffusion #state-space model #hybrid memory
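
The two-memory idea can be illustrated with a toy sketch. It assumes that the "short-term window" is a fixed-size buffer of recent frame features kept exactly. It also assumes that the "long-term state-space memory" is a simple per-channel linear recurrence, h_t = A·h_{t-1} + (1 - A)·x_t, into which each frame is folded as it leaves the window. The class `HybridMemory` and its parameters `window_size` and `decay` are hypothetical; this is not VideoSSM's actual architecture.

```python
from collections import deque

class HybridMemory:
    """Hypothetical hybrid memory: exact recent frames plus a compressed long-term state."""

    def __init__(self, dim: int, window_size: int, decay: float = 0.9):
        assert window_size >= 1, "the short-term window must hold at least one frame"
        self.window_size = window_size
        self.window = deque(maxlen=window_size)  # short-term: last few frames, kept exactly
        self.state = [0.0] * dim                 # long-term: fixed-size recurrent state
        self.decay = decay                       # A in h_t = A * h_{t-1} + (1 - A) * x_t

    def push(self, frame: list[float]) -> None:
        # If the window is full, the oldest frame is about to be evicted;
        # fold it into the long-term state before it is dropped.
        if len(self.window) == self.window_size:
            evicted = self.window[0]
            self.state = [self.decay * h + (1 - self.decay) * x
                          for h, x in zip(self.state, evicted)]
        self.window.append(frame)  # deque(maxlen=...) discards the oldest frame automatically

    def context(self) -> list[list[float]]:
        # What a generator would condition on: the compressed state plus the recent window.
        # Its size stays constant no matter how many frames have been generated.
        return [list(self.state)] + [list(f) for f in self.window]

if __name__ == "__main__":
    mem = HybridMemory(dim=1, window_size=2, decay=0.5)
    for value in [1.0, 2.0, 3.0, 4.0]:
        mem.push([value])
    print(mem.context())  # [[1.25], [3.0], [4.0]]
```

The design choice the sketch highlights is the trade-off between the two memories. The window keeps recent frames in full detail, which helps with short-range motion. The recurrent state summarizes everything older in constant memory, so the conditioning context does not grow with video length.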