How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (49)

#flow matching

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Intermediate
Qiyuan Zhang, Biao Gong et al. · Jan 16 · arXiv

This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.

#physics-aware video generation #rigid body motion #reinforcement learning

HeartMuLa: A Family of Open Sourced Music Foundation Models

Intermediate
Dongchao Yang, Yuxin Xie et al. · Jan 15 · arXiv

HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.

#music generation #audio tokenizer #residual vector quantization

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Intermediate
Chengzhuo Tong, Mingkun Chang et al. · Jan 15 · arXiv

This paper turns a video model into a step-by-step visual thinker that makes one final, high-quality picture from a text prompt.

#Chain-of-Frame #visual reasoning #text-to-image

Apollo: Unified Multi-Task Audio-Video Joint Generation

Intermediate
Jun Wang, Chunyu Qiang et al. · Jan 7 · arXiv

APOLLO is a single, unified model that can make video and audio together or separately, and it keeps them tightly in sync.

#audio-video generation #multimodal diffusion #single-tower transformer

ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing

Beginner
Hengjia Li, Liming Jiang et al. · Jan 6 · arXiv

ThinkRL-Edit teaches an image editor to think first and draw second, which makes tricky, reasoning-heavy edits much more accurate.

#reasoning-centric image editing #reinforcement learning #chain-of-thought

DreamStyle: A Unified Framework for Video Stylization

Intermediate
Mengtian Li, Jinshu Chen et al. · Jan 6 · arXiv

DreamStyle is a single video-stylizing model that can follow text, copy a style image, or continue from a stylized first frame—without switching tools.

#video stylization #image-to-video (I2V) #token-specific LoRA

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Intermediate
Loïc Magne, Anas Awadalla et al. · Jan 4 · arXiv

NitroGen is a vision-to-action AI that learns to play many video games by watching 40,000 hours of gameplay videos from over 1,000 titles with on-screen controller overlays.

#NitroGen #generalist gaming agent #behavior cloning

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Intermediate
Xu Guo, Fulong Ye et al. · Jan 4 · arXiv

DreamID-V is a new AI method that swaps faces in videos while keeping the body movements, expressions, lighting, and background steady and natural.

#video face swapping #image face swapping #diffusion transformer

ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands

Intermediate
Siyuan Hu, Kevin Qinghong Lin et al. · Dec 31 · arXiv

Computers usually click like a woodpecker, but they struggle to drag smoothly like a human hand; this paper fixes that.

#GUI automation #continuous control #flow matching

FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

Intermediate
Jibin Song, Mingi Kwon et al. · Dec 31 · arXiv

FlowBlending is a simple way to speed up video diffusion models by smartly choosing when to use a big model and when a small one is enough.

#FlowBlending #stage-aware sampling #video diffusion

PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

Intermediate
Yuanhao Cai, Kunpeng Li et al. · Dec 31 · arXiv

This paper teaches text-to-video models to follow real-world physics, so people, balls, water, glass, and fire act the way they should.

#text-to-video generation #physical consistency #direct preference optimization

GR-Dexter Technical Report

Intermediate
Ruoshi Wen, Guangzeng Chen et al. · Dec 30 · arXiv

GR-Dexter is a full package—new robot hands, a smart AI brain, and lots of carefully mixed data—that lets a two-handed robot follow language instructions to do long, tricky tasks.

#vision-language-action #dexterous manipulation #bimanual robotics