AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
Intermediate · Changwoo Baek, Jouwon Song et al. · Mar 1 · arXiv
Big picture: Vision-language models look at hundreds of image pieces (tokens), which makes them slow and sometimes prone to confident mistakes known as hallucinations.
#visual token pruning #attention-based pruning #diversity-based pruning