Publications

Joint Embedding Learning and Latent Subspace Probing for Cross-Domain Few-Shot Keyword Spotting

Published

International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date

2024.03.18

Abstract

Probing classifiers (PCs) have become a notable approach for exploring the properties of deep neural network (DNN) models in tasks such as natural language processing (NLP) and computer vision (CV). In this approach, a PC is trained to predict some property (e.g., semantics in NLP) from features learned by pre-trained models. If the PC performs well, it is concluded that the model has learned information relevant to that property. Recent studies have demonstrated various methodological limitations of this approach, such as classifier selection, and have focused on designing better PCs. Instead, we study how to improve the feature spaces used to train PCs for cross-domain few-shot keyword spotting (FKWS). To this end, we propose a Latent Subspace Probing (LSP) framework that jointly learns an embedding of features onto latent subspaces of the model together with the PCs employed on those subspaces. We apply LSP to FKWS, aiming to identify and improve the keyword-discrimination property of models, and we quantitatively and qualitatively explore the discrimination properties of the learned keyword embeddings with PCs. The results show that our proposed methods outperform conventional and state-of-the-art methods by up to 8% and 10% in 1-shot and 5-shot recognition tasks, respectively.
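
To make the joint-learning idea in the abstract concrete, below is a minimal, hypothetical sketch of training a subspace embedding and a probing classifier together on top of frozen pre-trained features. All names, dimensions, and the frozen-backbone setup are illustrative assumptions for exposition, not the paper's actual architecture or training recipe.

```python
# Hypothetical sketch: jointly learning a latent-subspace embedding and a
# probing classifier over frozen pre-trained features. Module names,
# dimensions, and hyperparameters are assumptions, not from the paper.
import torch
import torch.nn as nn

class LatentSubspaceProbe(nn.Module):
    """Projects frozen backbone features onto a low-dimensional latent
    subspace and classifies keywords from that subspace."""
    def __init__(self, feat_dim=512, subspace_dim=64, num_keywords=12):
        super().__init__()
        self.projection = nn.Linear(feat_dim, subspace_dim, bias=False)  # learned subspace
        self.probe = nn.Linear(subspace_dim, num_keywords)               # probing classifier

    def forward(self, features):
        z = self.projection(features)      # embed features into the latent subspace
        return self.probe(z), z            # keyword logits and subspace embedding

# Joint optimization: the subspace projection and the probe are updated together,
# while the pre-trained feature extractor is assumed to stay frozen.
model = LatentSubspaceProbe()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

features = torch.randn(32, 512)           # placeholder frozen-backbone features
labels = torch.randint(0, 12, (32,))      # placeholder keyword labels

logits, _ = model(features)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

In this toy setup, a high probe accuracy would indicate that the learned subspace retains keyword-discriminative information, which mirrors the probing interpretation described in the abstract.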