🎓How I Study AIHISA
📖Read
📄Papers📰Blogs🎬Courses
💡Learn
🛤️Paths📚Topics💡Concepts🎴Shorts
🎯Practice
🧩Problems🎯Prompts🧠Review
Search
How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers7

AllBeginnerIntermediateAdvanced
All SourcesarXiv
#CLIP

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Intermediate
Letian Zhang, Sucheng Ren et al.Jan 21arXiv

OpenVision 3 is a single vision encoder that learns one set of image tokens that work well for both understanding images (like answering questions) and generating images (like making new pictures).

#Unified Visual Encoder#VAE#Vision Transformer

Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Intermediate
Yi Zhou, Xuechao Zou et al.Dec 28arXiv

Co2S is a new way to train segmentation models with very few labels by letting two different students (CLIP and DINOv3) learn together and correct each other.

#semi-supervised segmentation#remote sensing#pseudo-label drift

StoryMem: Multi-shot Long Video Storytelling with Memory

Intermediate
Kaiwen Zhang, Liming Jiang et al.Dec 22arXiv

StoryMem is a new way to make minute‑long, multi‑shot videos that keep the same characters, places, and style across many clips.

#StoryMem#Memory-to-Video#multi-shot video generation

Towards Scalable Pre-training of Visual Tokenizers for Generation

Intermediate
Jingfeng Yao, Yuda Song et al.Dec 15arXiv

The paper tackles a paradox: visual tokenizers that get great pixel reconstructions often make worse images when used for generation.

#visual tokenizer#latent space#Vision Transformer

Position: Universal Aesthetic Alignment Narrows Artistic Expression

Intermediate
Wenqi Marshall Guo, Qingyun Qian et al.Dec 9arXiv

The paper shows that many AI image generators are trained to prefer one popular idea of beauty, even when a user clearly asks for something messy, dark, blurry, or emotionally heavy.

#universal aesthetic alignment#aesthetic pluralism#reward models

Relational Visual Similarity

Intermediate
Thao Nguyen, Sicheng Mo et al.Dec 8arXiv

Most image-similarity tools only notice how things look (color, shape, class) and miss deeper, human-like connections.

#relational similarity#visual analogy#anonymous captions

SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Intermediate
Elisabetta Fedele, Francis Engelmann et al.Dec 5arXiv

SpaceControl lets you steer a powerful 3D generator with simple shapes you draw, without retraining the model.

#3D generative modeling#test-time guidance#latent space intervention