MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
Intermediate · Yitian Gong, Kuangwei Chen et al. · Feb 11 · arXiv
This paper introduces MOSS-Audio-Tokenizer, an audio tokenizer that converts sound into compact discrete tokens, much as a text tokenizer breaks sentences into words.
#audio tokenizer #causal transformer #residual vector quantization