Papers4

#Latent Diffusion

Guiding a Diffusion Transformer with the Internal Dynamics of Itself

This paper shows a simple way to make image-generating AIs (diffusion Transformers) produce clearer, more accurate pictures by letting the model guide itself from the inside.

#Internal Guidance#Diffusion Transformer#Intermediate Supervision

Not triaged yet

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Beginner

Hanyang Kong, Xingyi Yang et al.Dec 22arXiv

WorldWarp is a new method that turns a single photo plus a planned camera path into a long, steady, 3D-consistent video.

#Novel View Synthesis#3D Gaussian Splatting#Spatio-Temporal Diffusion

Not triaged yet

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

Beginner

Shilong Zhang, He Zhang et al.Dec 19arXiv

This paper shows that great image understanding features alone are not enough for making great images; you also need strong pixel-level detail.

#Pixel–Semantic VAE#Semantic Regularization#Off-Manifold Generation

Not triaged yet

One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Intermediate

Yuan Gao, Chen Chen et al.Dec 8arXiv

This paper shows that we can turn big, smart vision features into a small, easy-to-use code for image generation with just one attention layer.

#Feature Auto-Encoder#FAE#Self-Supervised Learning

Not triaged yet