This paper teaches an AI to segment any object you name (open-vocabulary segmentation) much better by adding a few example pictures with pixel labels and smart retrieval.
People often pick CLIP-like models for image labeling, but this paper shows that large multimodal models (LMMs) can be just as good, or even better, when you give them a few examples in the prompt (in-context learning).
Most people on Earth speak more than one language and often switch languages in the same chat, but AI tools aren’t tested well on this real behavior.
Giving large language models a few good examples and step-by-step instructions can make them much better at spotting feelings in text.
This paper teaches a camera to fix nighttime colors by combining a smart rule-based color trick (SGP-LRD) with a learning-by-trying helper (reinforcement learning).
IC-Effect is a new way to add special effects to existing videos by following a text instruction while keeping everything else unchanged.