Automatic Speech Recognition (ASR) systems on smart devices have traditionally relied on server-based models: the device sends audio data to the server and receives text hypotheses once the server-side model completes decoding.
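As a rough illustration of that round trip, the sketch below posts an audio file to a server-side decoder over HTTP and reads back the text hypothesis. The endpoint URL and the response field name are hypothetical placeholders, not part of any system described here.

```python
# Minimal sketch of the server-based ASR round trip; the endpoint and
# response schema are hypothetical, for illustration only.
import requests

def transcribe(wav_path: str, url: str = "https://asr.example.com/decode") -> str:
    """Send raw audio to a server-side ASR model and return its text hypothesis."""
    with open(wav_path, "rb") as f:
        audio_bytes = f.read()
    # The server decodes the full utterance before returning a hypothesis.
    response = requests.post(url, data=audio_bytes,
                             headers={"Content-Type": "audio/wav"})
    response.raise_for_status()
    return response.json()["hypothesis"]  # hypothetical response field
```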
Deep-learning-based audio signal processing is increasingly popular for solving the cocktail party problem [1-5]. A noisy signal, formed by mixing a clean signal with noise, is paired with that clean signal to train a speech enhancement or source separation model [6-7].
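That pairing can be sketched in a few lines: scale the noise so the mixture hits a target signal-to-noise ratio, then use (noisy, clean) as a training example. The function name and the SNR-based mixing recipe below are illustrative assumptions, not the specific procedure from the cited works.

```python
import numpy as np

def make_training_pair(clean: np.ndarray, noise: np.ndarray, snr_db: float):
    """Mix a clean signal with noise at a target SNR; return a (noisy, clean) pair."""
    noise = noise[: len(clean)]  # assumes the noise clip is at least as long
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale the noise so the mixture has the requested SNR (in dB):
    # clean_power / (scale**2 * noise_power) == 10 ** (snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = clean + scale * noise
    return noisy, clean  # model input, training target
```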
In today's world of virtual meetings, conferences, and multimedia, automatic speech translation offers a wide variety of applications. Traditional offline speech translation models used a cascade of speech recognition and text translation. In our prior work [1], we developed efficient techniques for end-to-end speech translation that outperform traditional cascaded approaches.
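To make the contrast concrete, the sketch below sets a cascaded pipeline next to an end-to-end model. All three model functions are trivial hypothetical stubs standing in for real components, not the systems from the post.

```python
# Hypothetical stand-in models, stubbed so the example is self-contained.
def asr_model(audio: bytes) -> str:
    return "bonjour le monde"          # stub: speech -> source-language text

def mt_model(text: str) -> str:
    return {"bonjour le monde": "hello world"}.get(text, text)  # stub MT

def st_model(audio: bytes) -> str:
    return "hello world"               # stub: speech -> target-language text

def cascaded_translate(audio: bytes) -> str:
    # Two stages; recognition errors propagate into the translation step.
    return mt_model(asr_model(audio))

def end_to_end_translate(audio: bytes) -> str:
    # A single model maps source speech directly to target-language text.
    return st_model(audio)
```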