How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (2)

#video-language models

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

Intermediate
Sangyun Chung, Se Yeon Kim et al. · Jan 29 · arXiv

Multimodal AI models can mix up what they see and what they hear, making things up across senses; this is called cross-modal hallucination.

#multimodal large language models #cross-modal hallucination #contrastive decoding
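The tags point at contrastive decoding as the mitigation mechanism. A minimal sketch of the general idea, not MAD's actual algorithm: compare the model's next-token logits given the full multimodal input against logits with a modality ablated, and down-weight tokens the model would produce anyway without that modality (the hallucination-prone ones). The function name, the ablation strategy, and the `alpha` weight are all illustrative assumptions.

```python
import numpy as np

def contrastive_decode(logits_full, logits_ablated, alpha=1.0):
    """Generic contrastive-decoding sketch (not the paper's exact method).

    logits_full:    next-token logits with all modalities present
    logits_ablated: logits with one modality removed/masked (assumption:
                    a simple ablation stands in for the paper's strategy)
    alpha:          contrast strength; higher values penalize tokens the
                    model predicts even without the ablated modality
    """
    full = np.asarray(logits_full, dtype=float)
    ablated = np.asarray(logits_ablated, dtype=float)
    # Boost tokens that genuinely depend on the full input; suppress
    # tokens whose score survives the ablation (likely hallucinations).
    adjusted = (1.0 + alpha) * full - alpha * ablated
    return int(np.argmax(adjusted))
```

For example, if token 0 scores highly even with the modality removed while token 1's score depends on the full input, the contrast flips the greedy choice from token 0 to token 1.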

Factorized Learning for Temporally Grounded Video-Language Models

Intermediate
Wenzheng Zeng, Difei Gao et al. · Dec 30 · arXiv

This paper teaches video-language models to first find when the evidence appears in a video and then answer using that evidence, instead of mixing both steps together.

#temporal grounding #video-language models #evidence tokens
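The two-step factorization described above can be sketched as a toy pipeline: stage one selects the temporally grounded frames, stage two answers conditioned only on that evidence. This is an illustrative assumption about the structure, not the paper's model; `relevance` scores and `answer_fn` stand in for learned components.

```python
def ground_then_answer(frames, relevance, answer_fn, top_k=2):
    """Factorized pipeline sketch: ground first, then answer.

    frames:    the video's frames (any per-frame representation)
    relevance: per-frame relevance scores to the question (assumption:
               a learned grounding head would produce these)
    answer_fn: answering component that sees ONLY the grounded evidence
    """
    # Stage 1: temporal grounding — pick the top-k most relevant frames,
    # then restore their original temporal order.
    by_score = sorted(range(len(frames)), key=lambda i: relevance[i], reverse=True)
    idx = sorted(by_score[:top_k])
    evidence = [frames[i] for i in idx]
    # Stage 2: answer conditioned only on the grounded evidence, so the
    # two steps are not entangled in a single forward pass.
    return idx, answer_fn(evidence)

# Toy usage: frames 1 and 2 are most relevant, so only they reach the answerer.
ground_then_answer(["a", "b", "c", "d"], [0.1, 0.9, 0.8, 0.2],
                   lambda ev: " ".join(ev))  # → ([1, 2], "b c")
```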