Publications

Leveraging Self-Supervised Speech Representations for Domain Adaptation in Speech Enhancement

Published

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Date

2024.04.14

Abstract

The performance of deep-learning-based speech enhancement (SE) approaches can degrade significantly due to mismatches between training and testing conditions. A realistic scenario is that an SE model trained solely on parallel noisy-clean utterances from a source domain fails to perform well on a target (new) domain because of unseen noise types and acoustic environments. However, often only noisy data are available from the target domain, as collecting clean samples in a new domain can be challenging. It is therefore worth studying unsupervised domain-adaptive SE techniques that exploit the knowledge available from the source domain, with its paired noisy-clean data, together with the unpaired noisy data of the target domain, for improved SE. In this paper, we present a novel adaptation framework that leverages self-supervised learning (SSL) speech representations. SSL models are pre-trained on large amounts of unlabeled speech data to extract representations rich in phonetic and acoustic information. We show that, by taking advantage of SSL features, effective adaptation to the target domain is possible. To our knowledge, this is the first attempt to apply SSL models to unsupervised domain adaptation in the context of SE.
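
As a rough illustration of the setup the abstract describes (not the authors' actual method, which the abstract does not specify), the sketch below combines a supervised loss on source-domain paired data with a generic SSL feature-alignment loss on unpaired target-domain noisy data. The frozen wav2vec 2.0 backbone from torchaudio, the toy enhancer architecture, the moment-matching adaptation loss, and the 0.1 loss weight are all illustrative assumptions.

    # Illustrative sketch only: supervised SE on source data plus a generic
    # SSL feature-alignment term on unpaired target noisy data.
    import torch
    import torch.nn as nn
    import torchaudio

    # Frozen SSL feature extractor (wav2vec 2.0 base, 16 kHz input).
    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    ssl_model = bundle.get_model().eval()
    for p in ssl_model.parameters():
        p.requires_grad_(False)

    class MaskEnhancer(nn.Module):
        """Toy time-domain enhancer: predicts a soft mask over the waveform."""
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, hidden, 9, padding=4), nn.ReLU(),
                nn.Conv1d(hidden, hidden, 9, padding=4), nn.ReLU(),
                nn.Conv1d(hidden, 1, 9, padding=4), nn.Sigmoid(),
            )

        def forward(self, wav):  # wav: (batch, samples)
            return wav * self.net(wav.unsqueeze(1)).squeeze(1)

    def ssl_features(wav):
        # extract_features returns (list of per-layer features, lengths);
        # use the last layer, shape (batch, frames, dim).
        feats, _ = ssl_model.extract_features(wav)
        return feats[-1]

    enhancer = MaskEnhancer()
    opt = torch.optim.Adam(enhancer.parameters(), lr=1e-4)

    # Stand-in batches; replace with real 16 kHz audio loaders.
    sr = int(bundle.sample_rate)
    src_noisy = torch.randn(2, sr)  # paired source-domain data
    src_clean = torch.randn(2, sr)
    tgt_noisy = torch.randn(2, sr)  # unpaired target-domain data

    # Supervised SE loss on the source domain (paired noisy-clean).
    sup_loss = nn.functional.l1_loss(enhancer(src_noisy), src_clean)

    # Unsupervised adaptation term on the target domain: pull the mean SSL
    # features of enhanced target speech toward those of source clean speech
    # (simple moment matching; a stand-in for the paper's adaptation loss).
    with torch.no_grad():
        clean_ref = ssl_features(src_clean).mean(dim=(0, 1))
    adapt_loss = nn.functional.mse_loss(
        ssl_features(enhancer(tgt_noisy)).mean(dim=(0, 1)), clean_ref)

    loss = sup_loss + 0.1 * adapt_loss  # weighting is an arbitrary choice
    opt.zero_grad()
    loss.backward()
    opt.step()

Because the SSL model's parameters are frozen, gradients from the adaptation term flow only into the enhancer, so the target-domain noisy data can shape the enhancer without any clean target labels.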