Blog (4)
- MELS-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-To-Speech System via Disentangled Style Tokens
- Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
- [INTERSPEECH 2024 Series #3] High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model