Blog(6)

[INTERSPEECH 2025 Series #6] Efficient Streaming TTS Acoustic Model with Depthwise RVQ Decoding Strategies in a Mamba Framework
Recent advancements in neural codec-based text-to-speech (TTS) systems have revolutionized speech synthesis quality, achieving remarkable naturalness and fidelity.
MELS-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-To-Speech System via Disentangled Style Tokens
In the fast-moving field of neural text-to-speech (TTS), the effort to produce human-like speech has made remarkable progress. Recent advances have enabled TTS systems that not only mimic human speech but also capture emotional nuance and linguistic diversity.
Robust neural codec language modeling with phoneme position prediction for zero-shot TTS
Large language models (LLMs) have exhibited impressive in-context learning abilities [1]. Inspired by these successes, recent studies [2-5] have extended LLM applications to text-to-speech (TTS) systems by representing speech through discrete acoustic codes.
