How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (11)

#latent diffusion

DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

Intermediate
Mingshuang Luo, Shuang Liang et al. · Jan 29 · arXiv

DreamActor-M2 is a new way to make a still picture move by copying motion from a video while keeping the character’s look the same.

#character image animation #spatiotemporal in-context learning #video diffusion

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Intermediate
Moo Jin Kim, Yihuai Gao et al. · Jan 22 · arXiv

Cosmos Policy teaches robots to act by fine-tuning a powerful video model in just one training stage, without changing the model’s architecture.

#video diffusion #robot policy learning #visuomotor control

UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation

Intermediate
Ruiheng Zhang, Jingfeng Yao et al. · Jan 16 · arXiv

UniX is a new medical AI that both understands chest X-rays (writes accurate reports) and generates chest X-ray images (high visual quality) without making the two jobs fight each other.

#UniX #autoregressive branch #diffusion branch

Boosting Latent Diffusion Models via Disentangled Representation Alignment

Intermediate
John Page, Xuesong Niu et al. · Jan 9 · arXiv

This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement.

#Send-VAE #semantic disentanglement #latent diffusion

LTX-2: Efficient Joint Audio-Visual Foundation Model

Intermediate
Yoav HaCohen, Benny Brazowski et al. · Jan 6 · arXiv

LTX-2 is an open-source model that makes video and sound together from a text prompt, so the picture and audio match in time and meaning.

#text-to-video #text-to-audio #audiovisual generation

Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

Intermediate
Hau-Shiang Shiu, Chin-Yang Lin et al. · Dec 29 · arXiv

This paper makes diffusion-based video super-resolution (VSR) practical for live, low-latency use by removing the need for future frames and cutting denoising from ~50 steps down to just 4.

#video super-resolution #diffusion model #latent diffusion

RadarGen: Automotive Radar Point Cloud Generation from Cameras

Intermediate
Tomer Borreda, Fangqiang Ding et al. · Dec 19 · arXiv

RadarGen is a tool that learns to generate realistic car radar point clouds just from multiple camera views.

#automotive radar #radar point cloud generation #latent diffusion

REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Intermediate
Giorgos Petsangourakis, Christos Sgouropoulos et al. · Dec 18 · arXiv

Latent diffusion models are great at making images but learn the meaning of scenes slowly because their training goal mostly teaches them to clean up noise, not to understand objects and layouts.

#latent diffusion #REGLUE #representation entanglement

Towards Scalable Pre-training of Visual Tokenizers for Generation

Intermediate
Jingfeng Yao, Yuda Song et al. · Dec 15 · arXiv

The paper tackles a paradox: visual tokenizers that achieve excellent pixel reconstruction often produce worse images when used for generation.

#visual tokenizer #latent space #Vision Transformer

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Intermediate
Minglei Shi, Haolin Wang et al. · Dec 12 · arXiv

This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.

#text-to-image #diffusion transformer #flow matching

StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

Intermediate
Tjark Behrens, Anton Obukhov et al. · Dec 11 · arXiv

StereoSpace turns a single photo into a full 3D-style stereo pair without ever estimating a depth map.

#stereo generation #monocular-to-stereo #diffusion models