Composing Concepts from Images and Videos via Concept-prompt Binding
Intermediate · Xianghao Kong, Zeyu Zhang et al. · Dec 10 · arXiv
This paper introduces BiCo, a one-shot method for composing visual concepts from images and videos by binding each concept to the specific prompt tokens that describe it.
#BiCo #concept binding #token-level composition