TFPSNet Time-Frequency Domain Path Scanning Network for Speech Separation
Deep learning techniques have accomplished a big step forward on speech separation task. The current leading methods are based on the time-domain audio separation network (TasNet) . TasNet uses a learnable encoder and decoder to replace the fixed T-F domain transformation. It takes waveform inputs and directly reconstructs sources, and computes time-domain loss with utterance-level permutation invariant training (uPIT). Several approaches are proposed based on TasNet framework, such as the Conv-TasNet  , the dual-path recurrent neural network (DPRNN) , the dual-path Transformer network (DPTNet) , RNN-free transformer-based neural network (SepFormer)  , a self-attentive network with a novel sandglass-shape, namely Sandglasset .