Blog(3)
In the swiftly progressing domain of neural text-to-speech (TTS) systems, the quest for creating human-like speech has witnessed remarkable strides. Recent advancements have opened avenues for TTS systems capable of not only mimicking human speech but also encapsulating the nuances of emotions and linguistic diversity.
In recent years, text-to-speech (TTS) has improved remarkably with the emergence of various end-to-end TTS models [1, 2, 3]. With these advanced models, TTS has expanded from models built with a professional voice actor to personalized TTS.
Recently, personalized AI systems have gained significant attention. In the TTS field, zero-shot text-to-speech (ZS-TTS) systems [1-7] enable users to create their own TTS systems that replicate their voices with just one utterance, without further training.
Publications(3)
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge
Author: Sangjun Park, Kihyun Choo, Joohyung Lee, Anton V. Porov, Konstantin Osipov, June Sig Sung
Published: Annual Conference of the International Speech Communication Association (INTERSPEECH)
Date: 2022-09-18
Part-of-Speech Models Compression Methods for On-device Grapheme-to-phoneme Conversion
Author: Marek Kubis, Pawel Skorzewski, Marcin Lewandowski et al.
Published: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Date: 2022-05-22
Semi-Supervised Transfer Learning for Language Expansion of End-to-End Speech Recognition Models to Low-Resource Languages
Author: Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim
Published: IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
Date: 2021-12-17