An MVDR-Embedded U-Net Beamformer for Effective and Robust Multichannel Speech Enhancement
Published
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Abstract
In multichannel speech enhancement (SE) systems based on beamforming, deep neural networks (DNNs) are often used to estimate the beamformer weights directly. This approach, however, may not generalize well to unseen acoustic conditions. Alternatively, DNNs can predict time-frequency (T-F) masks for speech and noise, which are then used by a statistical beamformer. This approach is robust, but its performance is constrained by the latter component, which relies on certain modeling assumptions, e.g., the covariance-based modeling in the minimum-variance distortionless-response (MVDR) beamformer. In this paper, we propose a novel integration of the two methodologies by introducing an intra-MVDR module embedded in the U-Net architecture, which combines the merits of both, i.e., effectiveness and robustness. Simulation results show that the proposed MVDR-embedded U-Net yields SE improvements that cannot be achieved by simply enlarging the networks of the baseline approaches.
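For readers unfamiliar with the mask-based statistical beamforming baseline mentioned above, the following is a minimal NumPy sketch of a conventional mask-driven MVDR beamformer. It is not the proposed intra-MVDR module. It assumes a multichannel STFT `Y` of shape (frequency, time, microphones) and real-valued speech/noise masks of shape (frequency, time), for example predicted by a DNN. Speech and noise spatial covariance matrices are estimated as mask-weighted averages of the outer products y y^H. The weights then follow the widely used reference-channel (Souden-style) MVDR formulation w(f) = Phi_N^{-1} Phi_S u / trace(Phi_N^{-1} Phi_S), where u selects a reference microphone. The function names, the diagonal-loading constant, and the reference-channel choice are illustrative assumptions.

```python
import numpy as np


def masked_covariance(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency.

    Y:    complex STFT, shape (F, T, M)
    mask: real mask in [0, 1], shape (F, T)
    Returns Phi of shape (F, M, M), Phi[f] = sum_t m y y^H / sum_t m.
    """
    # Weighted sum over time frames of the outer products y(f,t) y(f,t)^H
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    # Normalize by the total mask weight in each frequency bin
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-8):
    """Reference-channel MVDR: w = Phi_N^{-1} Phi_S u / trace(Phi_N^{-1} Phi_S).

    phi_s, phi_n: speech / noise covariances, shape (F, M, M)
    ref: index of the (assumed) reference microphone
    Returns weights of shape (F, M).
    """
    M = phi_n.shape[-1]
    # Small diagonal loading keeps the noise covariance invertible
    phi_n = phi_n + eps * np.eye(M)[None]
    # Batched solve gives Phi_N^{-1} Phi_S for every frequency bin
    numer = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numer, axis1=1, axis2=2)[:, None]
    # Selecting column `ref` is equivalent to multiplying by the one-hot vector u
    return numer[:, :, ref] / (trace + eps)


def apply_beamformer(w, Y):
    """Beamformer output s_hat(f, t) = w(f)^H y(f, t), shape (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

When the speech covariance is rank one, Phi_S = d d^H, these weights satisfy w^H d = d_ref, i.e., the target is passed without distortion at the reference microphone while noise power is minimized. The fixed covariance-based modeling in these steps is the kind of assumption the abstract identifies as limiting the mask-based approach.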