🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
📝Daily Log🎯Prompts🧠Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers64

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#flow matching

VLANeXt: Recipes for Building Strong VLA Models

Intermediate
Xiao-Ming Wu, Bin Fan et al.Feb 20arXiv

This paper studies Vision–Language–Action (VLA) robots under one fair setup to find which design choices truly matter.

#Vision-Language-Action#robot manipulation#flow matching

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Intermediate
Hila Manor, Rinon Gal et al.Feb 17arXiv

This paper teaches image models to copy a change shown in one image pair and apply it to a new image, like saying 'hat added here, add a similar hat there.'

#visual analogy learning#LoRA#LoRA basis

World Action Models are Zero-shot Policies

Intermediate
Seonghyeon Ye, Yunhao Ge et al.Feb 17arXiv

DreamZero is a robot brain that learns actions by predicting short videos of the future and the matching moves at the same time.

#World Action Models#DreamZero#video diffusion

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning

Intermediate
Jintao Zhang, Kai Jiang et al.Feb 13arXiv

Video generators are slow because attention looks at everything, which takes a lot of time.

#sparse attention#Top-k masking#Top-p masking

ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation

Intermediate
Zihan Yang, Shuyuan Tu et al.Feb 9arXiv

ArcFlow is a new way to make text-to-image models draw great pictures in only 2 steps instead of 50, giving about a 40× speed boost.

#ArcFlow#few-step distillation#non-linear flow

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Intermediate
SII-OpenMOSS Team, Donghua Yu et al.Feb 9arXiv

MOVA is an open-source AI that makes videos and sounds at the same time so mouths, actions, and noises match perfectly.

#video-audio generation#lip synchronization#dual-tower architecture

Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Intermediate
Tianhe Wu, Ruibin Li et al.Feb 3arXiv

The paper solves a big problem in fast image generators: they got quick, but they lost variety and kept making similar pictures.

#diffusion distillation#distribution matching distillation#mode collapse

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Intermediate
Zehong Ma, Ruihan Xu et al.Feb 2arXiv

PixelGen is a new image generator that works directly with pixels and uses what-looks-good-to-people guidance (perceptual loss) to improve quality.

#pixel diffusion#perceptual loss#LPIPS

FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space

Intermediate
FSVideo Team, Qingyu Chen et al.Feb 2arXiv

FSVideo is a new image-to-video generator that runs about 42× faster than popular open-source models while keeping similar visual quality.

#FSVideo#image-to-video#video diffusion transformer

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Intermediate
Fu-Yun Wang, Han Zhang et al.Feb 1arXiv

PromptRL teaches a language model to rewrite prompts while a flow-based image model learns to draw, and both are trained together using the same rewards.

#PromptRL#flow matching#reinforcement learning

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Intermediate
Anthony Chen, Naomi Ken Korem et al.Jan 29arXiv

This paper shows a simple, one-model way to dub videos that makes the new voice and the lips move together naturally.

#video dubbing#audio-visual diffusion#joint generation

DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment

Intermediate
Haoyou Deng, Keyu Yan et al.Jan 28arXiv

DenseGRPO teaches image models using lots of small, timely rewards instead of one final score at the end.

#DenseGRPO#flow matching#GRPO
12345