How I Study AI - Learn AI Papers & Lectures the Easy Way

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Intermediate

Tilemachos Aravanis, Vladan Stojnić et al.Feb 26arXiv

This paper teaches an AI to segment any object you name (open-vocabulary) much better by adding a few example pictures with pixel labels and smart retrieval.

#open-vocabulary segmentation#vision-language models#retrieval-augmented

Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation

Intermediate

Runpei Dong, Ziyan Li et al.Feb 18arXiv

This paper teaches a humanoid robot to find and pick up many different objects in new places using plain-language requests like 'grab the orange mug.'

#humanoid loco-manipulation#end-effector tracking#open-vocabulary perception

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation

Intermediate

Zhe Huang, Hao Wen et al.Dec 30arXiv

Multimodal Large Language Models (MLLMs) often hallucinate on videos by trusting words and common sense more than what the frames really show.

#multimodal large language model#video understanding#visual hallucination

Papers3

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation