Publications

Prototypical speaker-interference loss for target voice separation using non-parallel audio samples

Published

Annual Conference of the International Speech Communication Association (INTERSPEECH)

Date

2022.09.18

Abstract

In this paper, we propose a new prototypical loss function for training neural network models for target voice separation. Conventional methods use paired parallel audio samples of the target speaker with and without an interfering speaker or noise, and minimize the spectrographic mean squared error (MSE) between the clean and enhanced target speaker audio. Motivated by the use of contrastive loss in speaker recognition tasks, we previously proposed a speaker representation loss that uses representative samples from the target speaker in addition to the conventional MSE loss. In this work, we propose a prototypical speaker-interference (PSI) loss that makes use of representative samples from the target speaker, the interfering speaker, and the interfering noise, to better utilize any non-parallel data that may be available. The performance of the proposed loss function is evaluated using VoiceFilter, a popular framework for target voice separation. Experimental results show that the proposed PSI loss significantly improves the PESQ scores of the enhanced target speaker audio.
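
The sketch below illustrates the general idea described in the abstract: combining the conventional spectrogram MSE with prototype-based terms that pull the enhanced output toward a target-speaker prototype and push it away from interfering-speaker and noise prototypes built from non-parallel representative samples. This is not the paper's exact formulation; it assumes a PyTorch setup, a pretrained speaker encoder `embed_fn`, and hypothetical weights `alpha` and `beta`.

```python
import torch
import torch.nn.functional as F


def psi_loss(enhanced_spec, clean_spec, embed_fn,
             target_protos, interferer_protos, noise_protos,
             alpha=1.0, beta=1.0):
    """Illustrative PSI-style loss (assumed formulation, not the paper's).

    enhanced_spec, clean_spec: (batch, freq, time) magnitude spectrograms
    embed_fn: callable mapping a spectrogram batch to (batch, dim) embeddings
    *_protos: (num_samples, dim) embeddings of non-parallel representative
              samples; their mean serves as the class prototype
    alpha, beta: hypothetical weights for attraction and repulsion terms
    """
    # Conventional parallel-data term: MSE between clean and enhanced spectrograms.
    mse = F.mse_loss(enhanced_spec, clean_spec)

    # Embed the enhanced output and build prototypes from representative samples.
    emb = F.normalize(embed_fn(enhanced_spec), dim=-1)             # (batch, dim)
    target_proto = F.normalize(target_protos.mean(dim=0), dim=-1)
    interf_proto = F.normalize(interferer_protos.mean(dim=0), dim=-1)
    noise_proto = F.normalize(noise_protos.mean(dim=0), dim=-1)

    # Attract the enhanced embedding to the target-speaker prototype ...
    attract = 1.0 - (emb @ target_proto)                           # (batch,)
    # ... and repel it from the interfering-speaker and noise prototypes.
    repel = F.relu(emb @ interf_proto) + F.relu(emb @ noise_proto)

    return mse + alpha * attract.mean() + beta * repel.mean()
```

In a VoiceFilter-style training loop, such a loss could replace the plain spectrogram MSE, with the prototype embeddings precomputed once per target speaker, interfering speaker, and noise type from whatever non-parallel samples are available.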