VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
Intermediate · Hanxun Yu, Wentong Li et al. · Jan 30 · arXiv
VisionTrim makes multimodal (image-and-text) AI models run much faster without retraining, by keeping only the most informative visual tokens and merging the rest.
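The summary does not say how tokens are scored or merged. The sketch below is a rough illustration of the general keep-then-merge idea, not VisionTrim's actual algorithm. It assumes each visual token already has an importance score (for example, an attention-derived score; this is an assumption). It keeps the top `keep` tokens and averages each dropped token into its most similar kept token by cosine similarity. `trim_tokens` and `_cosine` are hypothetical names.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def trim_tokens(tokens: List[List[float]], scores: List[float], keep: int) -> List[List[float]]:
    """Hypothetical generic pruning-plus-merging step (NOT VisionTrim's method).

    Keeps the `keep` highest-scoring tokens (in their original order) and
    merges every dropped token into its most similar kept token by averaging.
    """
    if not 0 < keep <= len(tokens):
        raise ValueError("keep must be in 1..len(tokens)")
    # Rank token indices by score, highest first (stable on ties).
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(order[:keep])  # restore original positional order
    dropped = order[keep:]
    # Running sum and count per kept token, so merging becomes a mean.
    sums = {i: list(tokens[i]) for i in kept_idx}
    counts = {i: 1 for i in kept_idx}
    for j in dropped:
        # Assign the dropped token to the kept token it is most similar to.
        target = max(kept_idx, key=lambda i: _cosine(tokens[j], tokens[i]))
        sums[target] = [s + x for s, x in zip(sums[target], tokens[j])]
        counts[target] += 1
    return [[s / counts[i] for s in sums[i]] for i in kept_idx]
```

For example, `trim_tokens([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 0.8]], [0.9, 0.1, 0.8, 0.2], 2)` keeps tokens 0 and 2 and folds each low-scoring token into its nearest neighbor, which yields about `[[0.95, 0.05], [0.0, 0.9]]`. Averaging rather than discarding keeps some information from pruned tokens while still shrinking the sequence the language model has to process.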
#vision token compression #training-free acceleration #multimodal large language model