🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers3

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#Fréchet Video Distance

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

Intermediate
Kai Liu, Yanhao Zheng et al.Feb 22arXiv

JavisDiT++ is a new AI that makes short videos and matching sounds from a text prompt, keeping sight and sound in sync.

#joint audio-video generation#multimodal diffusion transformer#modality-specific mixture-of-experts

SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling

Intermediate
Yufan He, Pengfei Guo et al.Dec 29arXiv

SurgWorld teaches surgical robots using videos plus text, then guesses the missing robot moves so we can train good policies without collecting tons of real robot-action data.

#surgical world model#SATA dataset#inverse dynamics model

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Intermediate
Yang Fei, George Stoica et al.Dec 12arXiv

The paper teaches a video generator to move things realistically by borrowing motion knowledge from a strong video tracker.

#video diffusion#structure-preserving motion#SAM2