🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#MLLM

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Intermediate
Pengze Zhang, Yanze Wu et al.Jan 20arXiv

OmniTransfer is a single system that learns from a whole reference video, not just one image, so it can copy how things look (identity and style) and how they move (motion, camera, effects).

#spatio-temporal video transfer#identity transfer#style transfer

Exploring MLLM-Diffusion Information Transfer with MetaCanvas

Intermediate
Han Lin, Xichen Pan et al.Dec 12arXiv

MetaCanvas lets a multimodal language model (MLLM) sketch a plan inside the generator’s hidden canvas so diffusion models can follow it patch by patch.

#MetaCanvas#MLLM#Diffusion Transformer

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Intermediate
Le Thien Phuc Nguyen, Zhuoran Yu et al.Dec 1arXiv

This paper introduces AV-SpeakerBench, a new test that checks if AI can truly see, hear, and understand who is speaking, what they say, and when they say it in real videos.

#audiovisual reasoning#speaker attribution#temporal grounding