How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (4)

Filtered by tag: #text-to-speech

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Intermediate
Haoyu Zhang, Zhipeng Li et al. · Feb 6 · arXiv

Ex-Omni is a new open-source AI system that can understand text or speech and then talk back while moving a 3D face in sync with the voice.

#omni-modal LLM · #3D facial animation · #lip-sync

Qwen3-TTS Technical Report

Intermediate
Hangrui Hu, Xinfa Zhu et al. · Jan 22 · arXiv

Qwen3-TTS is a family of text-to-speech models that can talk in 10+ languages, clone a new voice from just 3 seconds, and follow detailed style instructions in real time.

#Qwen3-TTS · #text-to-speech · #voice cloning

Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis

Intermediate
Thanathai Lertpetchpun, Yoonjeong Lee et al. · Jan 20 · arXiv

The paper shows how to control accents in text-to-speech (TTS) by combining simple, linguistics-based sound-change (phonological) rules with speaker embeddings; a toy illustration of those two control knobs follows this entry.

#text-to-speech · #accent control · #phonological rules
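The recipe in that summary is easy to sketch. Below is a minimal, self-contained Python toy of the general idea, not the paper's implementation: context-dependent rewrite rules push a phoneme sequence toward a target accent, while a speaker embedding (here just a plain vector we interpolate) supplies the voice identity. The rule set, function names, and embedding values are all hypothetical.

```python
import re

# Toy accent-conversion rules over a space-separated, ARPAbet-like phoneme
# string. These two rules are illustrative only, not the paper's rule set.
NON_RHOTIC_RULES = [
    (r"(?<= )R(?= |$)", ""),  # drop standalone /r/ tokens, as in non-rhotic "car" -> "cah"
    (r"\bAE\b", "AA"),        # TRAP -> PALM-like vowel shift
]

def apply_rules(phonemes: str, rules) -> str:
    """Apply each context-dependent rewrite rule, then tidy whitespace."""
    for pattern, replacement in rules:
        phonemes = re.sub(pattern, replacement, phonemes)
    return " ".join(phonemes.split())

def blend_speaker_embeddings(e_src, e_tgt, alpha):
    """Interpolate two (hypothetical) speaker embeddings; alpha=1 is fully target."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(e_src, e_tgt)]

# "park the car" in a rough ARPAbet transcription
accented = apply_rules("P AA R K DH AH K AA R", NON_RHOTIC_RULES)
print(accented)  # -> "P AA K DH AH K AA"

# A real TTS front end would feed `accented` plus the blended embedding
# to the synthesizer; here we just show the two control knobs.
embedding = blend_speaker_embeddings([0.0, 1.0], [1.0, 0.0], alpha=0.5)
print(embedding)  # -> [0.5, 0.5]
```

Per its title, the paper's focus is quantifying how such rules interact with what the speaker embedding already encodes about accent; the toy above only makes the two control knobs concrete.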

Towards Interactive Intelligence for Digital Humans

Intermediate
Yiyi Cai, Xuangeng Chu et al. · Dec 15 · arXiv

Digital humans used to just copy motions; this paper makes them think, speak, and move in sync like real people.

#interactive intelligence · #digital human · #multimodal avatar