Papers (2)

#Vision-Language Models

Adapting Vision-Language Models for E-commerce Understanding at Scale

Matteo Nulli, Vladimir Orshulevich et al. · Feb 12 · arXiv

This paper shows a simple, repeatable way to adapt general Vision-Language Models (VLMs) so that they understand e-commerce items substantially better without losing their general skills.

#Vision-Language Models #E-commerce adaptation #Attribute extraction

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

Beginner

Song Wang, Lingdong Kong et al. · Dec 30 · arXiv

Robots like cars and drones see the world with many different sensors (cameras, LiDAR, radar, and even event cameras), and this paper shows a clear roadmap for teaching them to understand space by learning from all of these together.

#Spatial Intelligence #Multi-Modal Pre-Training #Self-Supervised Learning