How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (5)

#PSNR

Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Intermediate
Haocheng Xi, Shuo Yang et al. · Feb 3 · arXiv

Auto-regressive video models make videos one chunk at a time but run out of GPU memory because the KV-cache grows with history.

#Quant VideoGen (QVG) #KV-cache quantization #2-bit quantization

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Intermediate
Yi-Chuan Huang, Hao-Jen Chien et al. · Dec 31 · arXiv

GaMO is a new way to rebuild 3D scenes from just a few photos by expanding each photo’s edges (outpainting) instead of inventing whole new camera views.

#3D reconstruction #outpainting #multi-view diffusion

Robust and Calibrated Detection of Authentic Multimedia Content

Intermediate
Sarim Hashmi, Abdelrahman Elsayed et al. · Dec 17 · arXiv

Deepfakes are getting so good that simple yes/no detectors are failing, especially when attackers add tiny, imperceptible changes.

#Authenticity Index #calibrated resynthesis #reconstruction-free inversion

Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets

Intermediate
Jialong Zuo, Haoyou Deng et al. · Dec 17 · arXiv

This paper tests whether a popular text-to-image model called Nano Banana Pro can fix messy photos without any extra training.

#low-level vision #zero-shot restoration #generative models

SS4D: Native 4D Generative Model via Structured Spacetime Latents

Intermediate
Zhibing Li, Mengchen Zhang et al. · Dec 16 · arXiv

SS4D is a new AI model that turns a short single-camera video into a full 3D object that moves over time (that’s 4D), and it does this in about 2 minutes.

#4D generation #structured spacetime latents #temporal attention