🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers46

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#flow matching

Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Intermediate
Tianhe Wu, Ruibin Li et al.Feb 3arXiv

The paper solves a big problem in fast image generators: they got quick, but they lost variety and kept making similar pictures.

#diffusion distillation#distribution matching distillation#mode collapse

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Intermediate
Zehong Ma, Ruihan Xu et al.Feb 2arXiv

PixelGen is a new image generator that works directly with pixels and uses what-looks-good-to-people guidance (perceptual loss) to improve quality.

#pixel diffusion#perceptual loss#LPIPS

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Intermediate
FSVideo Team, Qingyu Chen et al.Feb 2arXiv

FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.

#FSVideo#image-to-video#video diffusion transformer

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate
Fu-Yun Wang, Han Zhang et al.Feb 1arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL#flow matching#reinforcement learning

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Intermediate
Anthony Chen, Naomi Ken Korem et al.Jan 29arXiv

This paper shows a simple, one-model way to dub videos that makes the new voice and the lips move together naturally.

#video dubbing#audio-visual diffusion#joint generation

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Intermediate
Haoyou Deng, Keyu Yan et al.Jan 28arXiv

DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.

#DenseGRPO#flow matching#GRPO

Self-Refining Video Sampling

Intermediate
Sangwon Jang, Taekyung Ki et al.Jan 26arXiv

This paper shows how a video generator can improve its own videos during sampling, without extra training or outside checkers.

#video generation#flow matching#denoising autoencoder

LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

Intermediate
Shijie Lian, Bin Yu et al.Jan 21arXiv

Robots often learn a bad habit called the vision shortcut: they guess the task just by looking, and ignore the words you tell them.

#Vision-Language-Action#Bayesian decomposition#Latent Action Queries

TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers

Intermediate
Bin Yu, Shijie Lian et al.Jan 20arXiv

TwinBrainVLA is a robot brain with two halves: a frozen generalist that keeps world knowledge safe and a trainable specialist that learns to move precisely.

#Vision-Language-Action#catastrophic forgetting#Asymmetric Mixture-of-Transformers

ShapeR: Robust Conditional 3D Shape Generation from Casual Captures

Intermediate
Yawar Siddiqui, Duncan Frost et al.Jan 16arXiv

ShapeR builds clean, correctly sized 3D objects from messy, casual phone or glasses videos by using images, camera poses, sparse SLAM points, and short text captions together.

#ShapeR#3D reconstruction#object-centric

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Intermediate
Qiyuan Zhang, Biao Gong et al.Jan 16arXiv

This paper teaches video-making AIs to follow real-world physics, so rolling balls roll right and collisions look believable.

#physics-aware video generation#rigid body motion#reinforcement learning

HeartMuLa: A Family of Open Sourced Music Foundation Models

Intermediate
Dongchao Yang, Yuxin Xie et al.Jan 15arXiv

HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.

#music generation#audio tokenizer#residual vector quantization
1234