DIFFA-2 is a new audio AI that listens to speech, sounds, and music and answers questions about them, using a diffusion-style language model instead of the usual step-by-step (autoregressive) approach.
Qwen3-ASR is a family of speech models that hear, understand, and write down speech in 52 languages and dialects, plus they can tell you when each word was spoken.
AR-Omni is a single autoregressive model that can take in and produce text, images, and speech without relying on separate expert decoders.
OmniSafeBench-MM is a one-stop, open-source test bench that fairly compares how multimodal AI models can be tricked (jailbroken) and how well different defenses stop those attacks.