Papers (7)

#PSNR

SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

Yasaman Haghighi, Alexandre Alahi · Feb 27 · arXiv

SenCache speeds up video diffusion models by reusing cached outputs from earlier steps only when the model's output is predicted to change very little.

#diffusion models #video generation #caching

Unified Latents (UL): How to train your latents

Intermediate

Jonathan Heek, Emiel Hoogeboom et al. · Feb 19 · arXiv

Unified Latents (UL) is a way to learn the hidden code (latents) for images and videos by training three parts together: an encoder, a diffusion prior, and a diffusion decoder.

#Unified Latents #diffusion prior #diffusion decoder

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Intermediate

Haocheng Xi, Shuo Yang et al. · Feb 3 · arXiv

Auto-regressive video models make videos one chunk at a time but run out of GPU memory because the KV-cache grows with history.

#Quant VideoGen (QVG) #KV-cache quantization #2-bit quantization

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Intermediate

Yi-Chuan Huang, Hao-Jen Chien et al. · Dec 31 · arXiv

GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.

#3D reconstruction #outpainting #multi-view diffusion

Robust and Calibrated Detection of Authentic Multimedia Content

Intermediate

Sarim Hashmi, Abdelrahman Elsayed et al. · Dec 17 · arXiv

Deepfakes are getting so good that simple yes/no detectors are failing, especially when attackers add tiny, invisible changes.

#Authenticity Index #calibrated resynthesis #reconstruction-free inversion

Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets

Intermediate

Jialong Zuo, Haoyou Deng et al. · Dec 17 · arXiv

This paper checks whether a popular text-to-image model called Nano Banana Pro can fix messy photos without any extra training.

#low-level vision #zero-shot restoration #generative models

SS4D: Native 4D Generative Model via Structured Spacetime Latents

Intermediate

Zhibing Li, Mengchen Zhang et al. · Dec 16 · arXiv

SS4D is a new AI model that turns a short single-camera video into a full 3D object that moves over time (that’s 4D), and it does this in about 2 minutes.

#4D generation #structured spacetime latents #temporal attention
