Papers (3)

#audio tokenizer

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

Yitian Gong, Kuangwei Chen et al. · Feb 11 · arXiv

This paper builds a new audio tokenizer, called MOSS-Audio-Tokenizer, that converts sound into discrete tokens, much as text tokenizers split sentences into subword units.

#audio tokenizer #causal transformer #residual vector quantization


HeartMuLa: A Family of Open Sourced Music Foundation Models

Intermediate

Dongchao Yang, Yuxin Xie et al. · Jan 15 · arXiv

HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.

#music generation #audio tokenizer #residual vector quantization


Towards Interactive Intelligence for Digital Humans

Intermediate

Yiyi Cai, Xuangeng Chu et al. · Dec 15 · arXiv

Digital humans used to just copy motions; this paper makes them think, speak, and move in sync like real people.

#interactive intelligence #digital human #multimodal avatar
