How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)

#token compression

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Intermediate
Yue Ding, Yiyan Ji et al. | Feb 4 | arXiv

OmniSIFT is a new way to shrink (compress) audio and video tokens so omni-modal language models can think faster without forgetting important details.

#Omni-LLM #token compression #modality-asymmetric

DeepSeek-OCR 2: Visual Causal Flow

Intermediate
Haoran Wei, Yaofeng Sun et al. | Jan 28 | arXiv

DeepSeek-OCR 2 teaches a computer to "read" pictures of documents in a smarter order, more like how people read.

#DeepSeek-OCR 2 #DeepEncoder V2 #visual tokens

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Intermediate
Haowei Zhang, Shudong Yang et al. | Jan 21 | arXiv

HERMES is a training-free way to make video-language models understand live, streaming video quickly and accurately.

#HERMES #KV cache #hierarchical memory

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Intermediate
HyperAI Team, Yuchen Liu et al. | Dec 16 | arXiv

HyperVL is a small but smart model that understands images and text, designed to run fast on phones and tablets.

#HyperVL #on-device multimodal #edge AI

Rethinking Chain-of-Thought Reasoning for Videos

Intermediate
Yiwu Zhong, Zi-Yuan Hu et al. | Dec 10 | arXiv

The paper shows that video AI models do not need long, human-like chains of thought to reason well.

#video reasoning #chain-of-thought #concise reasoning