Proxy Compression for Language Modeling
Intermediate · Lin Zheng, Xinyu Li et al. · Feb 4 · arXiv
Most language models are trained on compressed tokens, which makes training fast but ties the model to a specific tokenizer.
#proxy compression · #byte-level language modeling · #tokenizer-free inference