This paper makes an AI much better at segmenting any object you name (open-vocabulary segmentation) by adding a few example pictures with pixel-level labels and smart retrieval.
ObjEmbed teaches an AI to understand not just whole pictures, but each object inside them, and to link those objects to the right words.