How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (10)

Filtered by tag: #DINOv2

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Intermediate
Zehong Ma, Ruihan Xu et al. · Feb 2 · arXiv

PixelGen is a new image generator that works directly with pixels and uses what-looks-good-to-people guidance (perceptual loss) to improve quality (sketch below).

#pixel diffusion · #perceptual loss · #LPIPS
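
A minimal sketch of the perceptual-loss idea in pixel space, using the off-the-shelf lpips package (LPIPS with a VGG backbone). The 0.1 weight and the way the term is mixed with a plain pixel loss are assumptions here, not PixelGen's exact recipe.

# Sketch: pixel-space reconstruction loss plus an LPIPS perceptual ("looks right to people") term.
# Assumes the `torch` and `lpips` packages; the 0.1 weight is an arbitrary choice.
import torch
import lpips

perceptual = lpips.LPIPS(net="vgg")  # frozen VGG-based perceptual metric

def training_loss(pred_img, target_img, lam=0.1):
    """pred_img, target_img: (N, 3, H, W) tensors scaled to [-1, 1]."""
    pixel_term = torch.nn.functional.mse_loss(pred_img, target_img)
    percep_term = perceptual(pred_img, target_img).mean()
    return pixel_term + lam * percep_term

# Toy usage with random images
pred = torch.rand(2, 3, 64, 64) * 2 - 1
target = torch.rand(2, 3, 64, 64) * 2 - 1
print(training_loss(pred, target))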

iFSQ: Improving FSQ for Image Generation with 1 Line of Code

Intermediate
Bin Lin, Zongjian Li et al. · Jan 23 · arXiv

This paper fixes a hidden flaw in a popular image tokenizer (FSQ) with a simple one-line change to its activation function (sketch of plain FSQ below).

#image generation · #finite scalar quantization · #iFSQ
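
For reference, plain FSQ bounds each latent channel with tanh and rounds it to a few levels with a straight-through gradient; the blurb does not say which line iFSQ changes or to what, so the sketch below shows only the vanilla version.

# Vanilla finite scalar quantization (FSQ). iFSQ's one-line fix targets the
# activation step; the exact change is not reproduced here.
import torch

def fsq(z, levels=7):
    """z: (..., d) real-valued latents -> latents snapped to a fixed integer grid."""
    half = (levels - 1) / 2.0                 # odd `levels` keeps the grid symmetric
    bounded = torch.tanh(z) * half            # activation: squash into (-half, half)
    rounded = torch.round(bounded)            # snap to integer grid points
    # straight-through: forward pass uses `rounded`, gradients flow through `bounded`
    return rounded + (bounded - bounded.detach())

z = torch.randn(4, 16, requires_grad=True)
q = fsq(z)
q.sum().backward()                            # gradients still reach z
print(q.unique().numel(), "distinct code values (at most 7 here)")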

Boosting Latent Diffusion Models via Disentangled Representation Alignment

Intermediate
John Page, Xuesong Niu et al. · Jan 9 · arXiv

This paper shows that the best VAEs for image generation are the ones whose latents neatly separate object attributes, a property called semantic disentanglement (toy illustration below).

#Send-VAE · #semantic disentanglement · #latent diffusion
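
The paper's actual disentanglement metric is not given in this blurb; as a loose, synthetic illustration of what "latents neatly separate attributes" can mean, the toy check below correlates each latent dimension with each known attribute and asks whether every attribute lines up with only a few dimensions.

# Toy illustration of semantic disentanglement on synthetic data (not the paper's metric):
# a disentangled code routes each attribute into its own latent dimension.
import numpy as np

rng = np.random.default_rng(0)
n, d_latent, d_attr = 1000, 8, 3

attrs = rng.normal(size=(n, d_attr))                  # e.g. color, shape, pose
mix_disentangled = np.eye(d_attr, d_latent)           # each attribute -> one latent dim
mix_entangled = rng.normal(size=(d_attr, d_latent))   # attributes smeared across dims

for name, mix in [("disentangled", mix_disentangled), ("entangled", mix_entangled)]:
    latents = attrs @ mix + 0.1 * rng.normal(size=(n, d_latent))
    corr = np.abs(np.corrcoef(attrs.T, latents.T)[:d_attr, d_attr:])   # attr x latent
    concentration = (corr.max(axis=1) / (corr.sum(axis=1) + 1e-8)).mean()
    print(f"{name}: concentration = {concentration:.2f} (higher = cleaner separation)")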

Orient Anything V2: Unifying Orientation and Rotation Understanding

Intermediate
Zehan Wang, Ziang Zhang et al. · Jan 9 · arXiv

This paper teaches an AI model to understand both which way an object is facing (orientation) and how it turns between views (rotation), all in one system; the geometry linking the two is sketched below.

#object orientation · #rotational symmetry · #relative rotation
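
This is not the paper's code, just the standard geometry connecting the two tasks in the summary: if the model predicts an orientation (a rotation matrix) for the same object in two views, the relative rotation between the views is one matrix times the transpose of the other.

# Relative rotation from two per-view orientations: R_rel = R2 @ R1.T
import numpy as np

def rot_z(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

R1 = rot_z(30)         # object facing 30 degrees in view 1
R2 = rot_z(75)         # object facing 75 degrees in view 2
R_rel = R2 @ R1.T      # how the object turned between the views

angle = np.rad2deg(np.arccos((np.trace(R_rel) - 1) / 2))
print(f"relative rotation: {angle:.1f} degrees")   # -> 45.0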

DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies

Intermediate
Renke Wang, Zhenyu Zhang et al. · Jan 5 · arXiv

DiffProxy turns tricky multi-camera photos of a person into a clean 3D body and hands by first painting a precise 'map' on each pixel and then fitting a standard body model to that map (toy fitting loop below).

#human mesh recovery · #SMPL-X · #dense correspondence
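
The real pipeline fits SMPL-X to diffusion-generated dense correspondences; the toy loop below shrinks that idea to fitting a made-up four-parameter "body model" to noisy 3D targets by gradient descent. Every name here is a hypothetical stand-in.

# Toy stand-in for "fit a parametric body model to a dense map": optimize a few
# parameters so a template point cloud matches noisy 3D targets.
import torch

torch.manual_seed(0)
TEMPLATE = torch.randn(500, 3)                 # fixed 'template body' point cloud

def toy_body_model(params):
    """Hypothetical stand-in for SMPL-X: params = (scale, tx, ty, tz) -> 3D points."""
    scale, trans = params[0], params[1:]
    return TEMPLATE * scale + trans

# Dense targets, standing in for the per-pixel 3D 'map' painted by the diffusion model
true_params = torch.tensor([1.3, 0.2, -0.1, 0.5])
targets = toy_body_model(true_params) + 0.01 * torch.randn_like(TEMPLATE)

params = torch.tensor([1.0, 0.0, 0.0, 0.0], requires_grad=True)
opt = torch.optim.Adam([params], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((toy_body_model(params) - targets) ** 2).mean()
    loss.backward()
    opt.step()

print(params.detach())                         # should land near [1.3, 0.2, -0.1, 0.5]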

MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

Intermediate
Xiaokun Sun, Zeyu Cai et al. · Jan 1 · arXiv

MorphAny3D is a training-free way to smoothly change one 3D object into another, even if they are totally different (like a bee into a biplane); a generic latent-interpolation sketch follows below.

#3D morphing · #Structured Latent · #SLAT
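
How MorphAny3D blends structured latents is not spelled out in this blurb; the sketch below only shows the generic building block such training-free methods lean on: spherically interpolating between two latent codes and decoding every intermediate code. decode_to_mesh is a hypothetical placeholder.

# Generic latent-space morphing building block (not MorphAny3D's actual algorithm):
# slerp between a source and a target latent, decoding each intermediate step.
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between latent vectors z0 and z1 at fraction t."""
    z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if omega < 1e-6:                           # nearly identical directions
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def decode_to_mesh(z):
    """Hypothetical decoder placeholder."""
    return f"<mesh from latent, norm {np.linalg.norm(z):.2f}>"

z_bee, z_biplane = np.random.randn(64), np.random.randn(64)   # source / target latents
for t in np.linspace(0, 1, 5):                                 # 5 morph frames
    print(f"t={t:.2f}", decode_to_mesh(slerp(z_bee, z_biplane, t)))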

REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Intermediate
Giorgos Petsangourakis, Christos Sgouropoulos et al. · Dec 18 · arXiv

Latent diffusion models are great at making images but learn the meaning of scenes slowly, because their training goal mostly teaches them to clean up noise rather than to understand objects and layouts; REGLUE tackles this by entangling the latents with global and local semantic features (sketch of such an alignment term below).

#latent diffusion · #REGLUE · #representation entanglement
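
A common way such alignment terms are written (an assumption here, not necessarily REGLUE's exact loss) is a cosine-similarity penalty pulling the denoiser's intermediate features toward features from a frozen semantic encoder, both per patch ("local") and pooled over the image ("global").

# Generic global + local representation-alignment term added to a diffusion loss.
import torch
import torch.nn.functional as F

def alignment_loss(denoiser_feats, teacher_feats):
    """Both inputs: (B, num_patches, D), already projected to a shared width D."""
    local = 1 - F.cosine_similarity(denoiser_feats, teacher_feats, dim=-1).mean()
    global_ = 1 - F.cosine_similarity(denoiser_feats.mean(1),
                                      teacher_feats.mean(1), dim=-1).mean()
    return local + global_

student = torch.randn(2, 196, 768, requires_grad=True)   # denoiser features
teacher = torch.randn(2, 196, 768)                        # frozen encoder features
loss = alignment_loss(student, teacher)
loss.backward()
print(loss.item())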

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Intermediate
Minghui Lin, Pengxiang Ding et al. · Dec 10 · arXiv

Robots often act like goldfish with short memories; HiF-VLA fixes this by letting them use motion to remember the past and predict the future (toy motion-vector sketch below).

#Vision-Language-Action · #motion vectors · #temporal reasoning
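
HiF-VLA's actual motion representation is not specified in this blurb; as a loose illustration of "motion as memory", the toy routine below turns two consecutive frames into a coarse grid of block-matching motion vectors that a policy could consume as extra context.

# Toy block-matching motion vectors (illustration only, not HiF-VLA's representation).
import numpy as np

def motion_vectors(prev, curr, block=8, search=4):
    h, w = prev.shape
    vecs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = curr[y:y + block, x:x + block]
            # start from "no motion", then look for a strictly better match nearby
            best = np.abs(ref - prev[y:y + block, x:x + block]).sum()
            best_dv = (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        err = np.abs(ref - prev[yy:yy + block, xx:xx + block]).sum()
                        if err < best:
                            best, best_dv = err, (dy, dx)
            vecs[by, bx] = best_dv
    return vecs          # (H/block, W/block, 2): where each block came from in `prev`

prev = np.zeros((32, 32)); prev[8:16, 8:16] = 1.0     # a bright square...
curr = np.zeros((32, 32)); curr[10:18, 11:19] = 1.0   # ...that moved down 2, right 3
print(motion_vectors(prev, curr)[1, 1])               # -> [-2 -3]: came from 2 up, 3 left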

One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Intermediate
Yuan Gao, Chen Chen et al. · Dec 8 · arXiv

This paper shows that we can turn big, smart vision features into a small, easy-to-use code for image generation with just one attention layer (shape-level sketch below).

#Feature Auto-Encoder · #FAE · #Self-Supervised Learning
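
A shape-level sketch of "one attention layer turns frozen encoder tokens into a compact code". The learnable-query design, the dimensions, and the names are assumptions, not necessarily the paper's FAE architecture.

# One cross-attention layer compressing frozen vision-encoder tokens into a short code.
import torch
import torch.nn as nn

class OneLayerAdapter(nn.Module):
    def __init__(self, feat_dim=1024, code_dim=256, n_codes=32, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_codes, code_dim))   # learned query tokens
        self.proj_in = nn.Linear(feat_dim, code_dim)                  # match widths
        self.attn = nn.MultiheadAttention(code_dim, n_heads, batch_first=True)

    def forward(self, encoder_tokens):                # (B, N_patches, feat_dim), frozen
        kv = self.proj_in(encoder_tokens)
        q = self.queries.expand(encoder_tokens.size(0), -1, -1)
        code, _ = self.attn(q, kv, kv)                # (B, n_codes, code_dim)
        return code

frozen_feats = torch.randn(2, 256, 1024)              # e.g. tokens from a pretrained ViT
print(OneLayerAdapter()(frozen_feats).shape)          # torch.Size([2, 32, 256])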

SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Intermediate
Elisabetta Fedele, Francis Engelmann et al. · Dec 5 · arXiv

SpaceControl lets you steer a powerful 3D generator with simple shapes you draw, without retraining the model (generic test-time guidance sketch below).

#3D generative modeling · #test-time guidance · #latent space intervention
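
A generic test-time guidance pattern (placeholders throughout, not SpaceControl's actual procedure): between sampling steps, nudge the latent down the gradient of an energy measuring how far the decoded coarse shape is from the user's sketch. The decoder and energy below are toy stand-ins.

# Toy latent-space intervention: steer a 'generator latent' toward a user-drawn blob
# by gradient steps on a soft occupancy-matching energy.
import torch

def toy_decode_occupancy(center, grid=16, radius=0.4):
    """Hypothetical decoder: 3-d latent 'center' -> soft occupancy grid in [0, 1]."""
    axes = [torch.linspace(-1, 1, grid)] * 3
    coords = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1)   # (g, g, g, 3)
    dist = (coords - center).norm(dim=-1)
    return torch.sigmoid(8 * (radius - dist))

target = toy_decode_occupancy(torch.tensor([0.5, 0.5, 0.5]))   # the user's coarse sketch

z = torch.zeros(3, requires_grad=True)            # stand-in for the generator's latent
for _ in range(100):                              # in practice interleaved with denoising
    energy = ((toy_decode_occupancy(z) - target) ** 2).mean()
    grad, = torch.autograd.grad(energy, z)
    with torch.no_grad():
        z -= 0.05 * grad / (grad.norm() + 1e-8)   # small normalized step toward the sketch

print(z.detach())                                 # drifts toward roughly [0.5, 0.5, 0.5]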