How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (6)

#multimodal LLM

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Intermediate
Lulu Hu, Wenhu Xiao et al. · Mar 5 · arXiv

Multimodal AI models handle text, images, and audio, but these signals arrive at very different magnitudes, which breaks standard low-bit compression methods.

#post‑training quantization #multimodal LLM #channel‑wise smoothing
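The summary says the modalities' signal magnitudes differ too much for standard low-bit quantization, and the tags point to channel-wise smoothing as the fix. A minimal sketch of that general idea (SmoothQuant-style scale migration between activations and weights; MASQuant's modality-aware details are not given here, so the `alpha` heuristic and the simple per-tensor quantizer below are assumptions, not the paper's method):

```python
import numpy as np

def smooth_and_quantize(x, w, alpha=0.5, n_bits=8):
    """Channel-wise smoothing before low-bit quantization (generic sketch).

    x: activations, shape (tokens, channels)
    w: weights, shape (channels, out_features)
    """
    # Per-channel smoothing scale: migrates outlier magnitude from the
    # activations into the weights, so both become easier to quantize.
    s = (np.abs(x).max(axis=0) ** alpha) / (np.abs(w).max(axis=1) ** (1 - alpha))
    s = np.clip(s, 1e-5, None)
    x_s, w_s = x / s, w * s[:, None]  # x @ w == x_s @ w_s exactly

    def quantize(t):
        # Symmetric per-tensor quantization to n_bits, then dequantize.
        scale = np.abs(t).max() / (2 ** (n_bits - 1) - 1)
        return np.round(t / scale) * scale

    return quantize(x_s) @ quantize(w_s)
```

Because the smoothing is an exact reparameterization, any accuracy loss comes only from the rounding step, and shrinking per-channel outliers shrinks that loss.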

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Intermediate
Jintao Tong, Shilin Yan et al. · Feb 5 · arXiv

SwimBird is a multimodal AI that can switch how it thinks: only in text, only in vision (with hidden picture-like thoughts), or a mix of both.

#SwimBird #switchable reasoning #hybrid autoregressive

Reinforced Attention Learning

Intermediate
Bangzheng Li, Jianmo Ni et al. · Feb 4 · arXiv

This paper teaches AI to pay attention better by training its focus, not just its words.

#Reinforced Attention Learning #attention policy #multimodal LLM

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Intermediate
Qian Chen, Jinlan Fu et al. · Jan 20 · arXiv

FutureOmni is the first benchmark that tests if multimodal AI models can predict what happens next from both sound and video, not just explain what already happened.

#multimodal LLM #audio-visual reasoning #future forecasting

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Intermediate
Shuming Liu, Mingchen Zhuge et al. · Jan 8 · arXiv

The paper asks a simple question: do video AIs really need to “think out loud” every time, or can they answer quickly most of the time and think deeply only when needed?

#video reasoning #adaptive reasoning #early exit
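The "answer quickly most of the time, think deeply only when needed" idea in the summary is a form of confidence-gated early exit. A generic sketch of that control flow (the gating rule, the `confidence_fn`, and the two-model split are illustrative assumptions; the paper's actual "thinking once, answering twice" mechanism is not described in this one-line summary):

```python
def adaptive_answer(question, fast_model, slow_model, confidence_fn, threshold=0.8):
    """Answer with a cheap pass first; fall back to slow reasoning only
    when the confidence estimate for the fast answer is below threshold."""
    answer = fast_model(question)
    if confidence_fn(question, answer) >= threshold:
        return answer, "fast"          # early exit: no chain-of-thought needed
    return slow_model(question), "slow"  # hard case: run the deep reasoning pass
```

The appeal is that the expensive reasoning pass is amortized over only the hard inputs, so average latency drops while accuracy on hard cases is preserved.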

CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion

Intermediate
Moritz Böhle, Amélie Royer et al. · Dec 22 · arXiv

CASA is a new way to mix images and text inside a language model that keeps speed and memory low while keeping accuracy high.

#CASA #cross-attention #self-attention