
MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset

Published

Annual Conference of the International Speech Communication Association (INTERSPEECH)

Date

2023.08.20

Abstract

The main purpose of this work is to create a comprehensive audio testset that can be used to evaluate custom keyword spotting (KWS) models and to benchmark different KWS solutions. We also propose a set of requirements that should be followed when creating testsets to evaluate custom KWS models. We consider multiple versions of the problem: text-based and audio-based keyword spotting, as well as offline and online (streaming) modes. Our testset, named MOCKS, is based on the LibriSpeech and Mozilla Common Voice datasets. We used automatically generated alignments to extract parts of the recordings, which were split into keywords and test samples. The resulting testset contains almost 50,000 keywords. It contains audio data in English, French, German, Italian, and Spanish, and can easily be extended to other languages. MOCKS has been made publicly available to the research community. Initial KWS experiments run on MOCKS suggest that it can serve as a challenging testset for future research.
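To illustrate the general idea of cutting keywords out of recordings with word-level alignments, here is a minimal Python sketch. It is not the MOCKS pipeline: the `AlignedWord` record, the helpers `find_phrase`, `cut_segment`, and `label_trials`, and the rule "positive if the keyword text appears in the test transcript" are all hypothetical simplifications chosen for illustration. It assumes alignments give start and end times in seconds and that audio is available as a plain sequence of samples.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class AlignedWord:
    """Hypothetical record for one aligned word (times in seconds)."""
    word: str
    start: float
    end: float


def find_phrase(alignment: Sequence[AlignedWord],
                phrase: Sequence[str]) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first contiguous occurrence of `phrase`, or None."""
    target = [w.lower() for w in phrase]
    words = [a.word.lower() for a in alignment]
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            # Span runs from the first word's start to the last word's end.
            return alignment[i].start, alignment[i + n - 1].end
    return None


def cut_segment(waveform: Sequence[float], sample_rate: int,
                start: float, end: float) -> List[float]:
    """Slice the samples between `start` and `end` seconds."""
    first = int(round(start * sample_rate))
    last = int(round(end * sample_rate))
    return list(waveform[first:last])


def label_trials(phrase: Sequence[str],
                 transcripts: Dict[str, str]) -> List[Tuple[str, bool]]:
    """Pair a keyword with each test utterance.

    Simplified (assumed) labeling rule: a trial is positive when the keyword
    text occurs as whole words in the utterance transcript.
    """
    target = " ".join(w.lower() for w in phrase)
    return [(utt_id, f" {target} " in f" {text.lower()} ")
            for utt_id, text in transcripts.items()]


if __name__ == "__main__":
    alignment = [AlignedWord("turn", 0.10, 0.35), AlignedWord("on", 0.35, 0.50),
                 AlignedWord("the", 0.50, 0.60), AlignedWord("lights", 0.60, 1.05)]
    span = find_phrase(alignment, ["the", "lights"])
    audio = [0.0] * 32000  # 2 s of silence at 16 kHz, a stand-in for a real recording
    keyword_audio = cut_segment(audio, 16000, *span)
    print(span, len(keyword_audio))
    print(label_trials(["the", "lights"],
                       {"a": "Switch off the lights now", "b": "The light is on"}))
```

With a 16 kHz recording, the phrase "the lights" aligned at 0.50 to 1.05 s yields an 8,800-sample keyword clip; utterance "a" is labeled positive and "b" negative. A real testset would likely need stricter rules (for example, handling phonetically similar negatives), which this sketch does not attempt.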