๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
๐ŸงฉProblems๐ŸŽฏPrompts๐Ÿง Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers2

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#alignment loss

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Intermediate
Kai Liu, Jungang Li et al.Dec 28arXiv

JavisGPT is a single AI that can both understand sounding videos (audio + video together) and also create new ones that stay in sync.

#multimodal large language model#audio-video synchronization#SyncFusion

REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Intermediate
Giorgos Petsangourakis, Christos Sgouropoulos et al.Dec 18arXiv

Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.

#latent diffusion#REGLUE#representation entanglement