MIBURI is a system that makes a talking digital character move its body and face expressively in real time while it speaks.
This paper builds a new audio tokenizer, called MOSS-Audio-Tokenizer, that turns sound into small discrete tokens, much as text tokenizers turn sentences into words.
SAMTok turns any object’s mask in an image into just two special “words” so language models can handle pixels like they handle text.
HeartMuLa is a family of open-source music AI models that can understand and generate full songs with clear lyrics and strong musical structure.