🎓How I Study AIHISA

📖Read

📄Papers 📰Blogs 🎬Courses

💡Learn

🛤️Paths 📚Topics 💡Concepts 🎴Shorts

🎯Practice

📝Daily Log 🎯Prompts 🧠Review

Search Settings

How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

All Beginner Intermediate Advanced

All Sources arXiv

#multimodal diffusion

UM-Text: A Unified Multimodal Model for Image Understanding and Visual Text Editing

Lichen Ma, Xiaolong Fu et al.Jan 13arXiv

UM-Text is a single AI that understands both your words and your picture to add or change text in images so it looks like it truly belongs there.

#visual text editing#multimodal diffusion#Visual Language Model

Apollo: Unified Multi-Task Audio-Video Joint Generation

Jun Wang, Chunyu Qiang et al.Jan 7arXiv

APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.

#audio-video generation#multimodal diffusion#single-tower transformer

REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Giorgos Petsangourakis, Christos Sgouropoulos et al.Dec 18arXiv

Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.

#latent diffusion#REGLUE#representation entanglement

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Team Seedance, Heyi Chen et al.Dec 15arXiv

Seedance 1.5 pro is a single model that makes video and sound together at the same time, so lips, music, and actions match naturally.

#audio-visual generation#diffusion transformer#cross-modal synchronization