How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (3)

#text-to-video retrieval

Unified Vision-Language Modeling via Concept Space Alignment

Intermediate
Yifu Qiu, Paul-Ambroise Duquenne et al. · Mar 1 · arXiv

The paper builds v-Sonar, a bridge that maps images and videos into Sonar, the same meaning-space used for text, so all modalities “speak” the same language.

#v-Sonar #OmniSONAR #concept space alignment

RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval

Intermediate
Tyler Skow, Alexander Martin et al. · Feb 2 · arXiv

RANKVIDEO is a video-native reasoning reranker that helps search engines find the right videos for a text query by looking directly at each video’s visuals and audio, not just its text captions.

#text-to-video retrieval #video-native reranking #multimodal reasoning

Action100M: A Large-scale Video Action Dataset

Intermediate
Delong Chen, Tejaswi Kasarla et al. · Jan 15 · arXiv

Action100M is a very large video dataset with about 100 million labeled action moments, built automatically from 1.2 million instructional videos.

#Action100M #open-vocabulary action recognition #hierarchical temporal segmentation