How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#video foundation models

VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval

Intermediate
Issar Tzachor, Dvir Samuel et al. · Feb 8 · arXiv

VidVec shows that the intermediate layers of video-capable multimodal language models already carry strong signals for matching videos to sentences.

#video–text retrieval #multimodal large language models #intermediate layer embeddings

How Much 3D Do Video Foundation Models Encode?

Intermediate
Zixuan Huang, Xiang Li et al. · Dec 23 · arXiv

This paper asks a simple question: do video AI models trained only on 2D videos secretly learn about 3D worlds?

#video foundation models #3D awareness #temporal reasoning