Papers (5)

#DINOv3

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Hun Chang, Byunghee Cha et al. · Jan 30 · arXiv

DINO-SAE is a new autoencoder that preserves both an image's meaning (semantics) and its tiny textures (fine details).

#DINO-SAE #spherical manifold #cosine similarity alignment

C-RADIOv4 (Tech Report)

Intermediate

Mike Ranzinger, Greg Heinrich et al. · Jan 24 · arXiv

C-RADIOv4 is a single vision model that learns from several expert models at once, keeping their best skills while staying fast.

#C-RADIOv4 #agglomerative vision models #multi-teacher distillation

AnyDepth: Depth Estimation Made Easy

Intermediate

Zeyu Ren, Zeyu Zhang et al. · Jan 6 · arXiv

AnyDepth is a new, simple approach to monocular depth estimation: telling how far away things are in a scene from just one image.

#monocular depth estimation #zero-shot depth #Simple Depth Transformer

Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Intermediate

Yi Zhou, Xuechao Zou et al. · Dec 28 · arXiv

Co2S is a new way to train segmentation models with very few labels by letting two different students (CLIP and DINOv3) learn together and correct each other.

#semi-supervised segmentation #remote sensing #pseudo-label drift

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Intermediate

Minglei Shi, Haolin Wang et al. · Dec 12 · arXiv

This paper shows that a large text-to-image diffusion model can be trained directly on the features of a vision foundation model (like DINOv3) without using a VAE.

#text-to-image #diffusion transformer #flow matching