This paper makes an AI much better at segmenting any object you name (open-vocabulary segmentation) by giving it a few example images with pixel-level labels, picked through smart retrieval.
People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good—or even better—when you give them a few examples in the prompt (in-context learning).
Most people on Earth speak more than one language and often switch languages within the same chat, but AI tools aren't tested well on this real-world behavior.
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.