🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers4

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#multi-task learning

Apollo: Unified Multi-Task Audio-Video Joint Generation

Intermediate
Jun Wang, Chunyu Qiang et al.Jan 7arXiv

APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.

#audio-video generation#multimodal diffusion#single-tower transformer

Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks

Beginner
Atsuki Yamaguchi, Maggie Mi et al.Jan 6arXiv

The paper teaches language models using extra 'language homework' made from the same raw text so they learn grammar and meaning, not just next-word guessing.

#language model pretraining#causal language modeling#linguistic competence

Towards Scalable Pre-training of Visual Tokenizers for Generation

Intermediate
Jingfeng Yao, Yuda Song et al.Dec 15arXiv

The paper tackles a paradox: visual tokenizers that get great pixel reconstructions often make worse images when used for generation.

#visual tokenizer#latent space#Vision Transformer

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Beginner
Jiehui Huang, Yuechen Zhang et al.Dec 8arXiv

UnityVideo is a single, unified model that learns from many kinds of video information at once—like colors (RGB), depth, motion (optical flow), body pose, skeletons, and segmentation—to make smarter, more realistic videos.

#multimodal video generation#multi-task learning#dynamic noise scheduling