How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (7)

#DINOv3

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Intermediate
Tilemachos Aravanis, Vladan Stojnić et al. | Feb 26 | arXiv

This paper teaches an AI to segment any object you name (open-vocabulary) much better by adding a few example pictures with pixel labels and smart retrieval.

#open-vocabulary segmentation, #vision-language models, #retrieval-augmented

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Intermediate
Shannan Yan, Leqi Zheng et al. | Feb 22 | arXiv

This paper teaches a computer to find the same object when seen from two very different cameras, like a body camera (first-person) and a room camera (third-person).

#cross-view correspondence, #egocentric to exocentric, #binary segmentation

DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Intermediate
Hun Chang, Byunghee Cha et al. | Jan 30 | arXiv

DINO-SAE is a new autoencoder that keeps both the meaning of an image (semantics) and tiny textures (fine details) at the same time.

#DINO-SAE, #spherical manifold, #cosine similarity alignment

C-RADIOv4 (Tech Report)

Intermediate
Mike Ranzinger, Greg Heinrich et al. | Jan 24 | arXiv

C-RADIOv4 is a single vision model that learns from several expert models at once and keeps their best skills while staying fast.

#C-RADIOv4, #agglomerative vision models, #multi-teacher distillation

AnyDepth: Depth Estimation Made Easy

Intermediate
Zeyu Ren, Zeyu Zhang et al. | Jan 6 | arXiv

AnyDepth is a new, simple way for a computer to tell how far away things are in a picture using just one image (monocular depth).

#monocular depth estimation, #zero-shot depth, #Simple Depth Transformer

Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Intermediate
Yi Zhou, Xuechao Zou et al. | Dec 28 | arXiv

Co2S is a new way to train segmentation models with very few labels by letting two different students (CLIP and DINOv3) learn together and correct each other.

#semi-supervised segmentation, #remote sensing, #pseudo-label drift

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Intermediate
Minglei Shi, Haolin Wang et al. | Dec 12 | arXiv

This paper shows you can train a big text-to-image diffusion model directly on the features of a vision foundation model (like DINOv3) without using a VAE.

#text-to-image, #diffusion transformer, #flow matching