VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
Intermediate | Hongbo Zhao, Meng Wang et al. | Dec 17 | arXiv
Long texts are expensive for AI models to process because every additional token adds compute and memory cost.
#vision‑text compression #VTCBench #vision‑language models