๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#ASR

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding

Intermediate
Jiaming Zhou, Xuxin Cheng et al.Jan 30arXiv

DIFFA-2 is a new audio AI that listens to speech, sounds, and music and answers questions about them using a diffusion-style language model instead of the usual step-by-step (autoregressive) method.

#Diffusion language models#Audio understanding#Large audio language model

Qwen3-ASR Technical Report

Intermediate
Xian Shi, Xiong Wang et al.Jan 29arXiv

Qwen3โ€‘ASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, plus they can tell you when each word was spoken.

#ASR#forced alignment#timestamps

AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

Intermediate
Dongjie Cheng, Ruifeng Yuan et al.Jan 25arXiv

AR-Omni is a single autoregressive model that can take in and produce text, images, and speech without extra expert decoders.

#autoregressive modeling#multimodal large language model#any-to-any generation

OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

Intermediate
Xiaojun Jia, Jie Liao et al.Dec 6arXiv

OmniSafeBench-MM is a one-stop, open-source test bench that fairly compares how multimodal AI models get tricked (jailbroken) and how well different defenses stop that.

#multimodal large language models#jailbreak attacks#safety alignment