MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset
Published
Annual Conference of the International Speech Communication Association (INTERSPEECH)
Abstract
The main purpose of this work is to create a comprehensive
audio testset that can be used to evaluate custom keyword spotting
(KWS) models and to benchmark different KWS solutions.
We also propose a set of requirements that should be followed
while creating testsets to evaluate custom KWS models. We
consider multiple versions of the problem: text and audio-based
keyword spotting, as well as offline and online (streaming)
modes. Our testset, named MOCKS, is based on the LibriSpeech
and Mozilla Common Voice datasets. We used automatically
generated alignments to extract parts of the recordings, which
were split into keywords and test samples. The resulting testset
contains almost 50,000 keywords. It covers audio data in English,
French, German, Italian, and Spanish, but can be easily
extended to other languages. MOCKS has been made publicly
available to the research community. Initial KWS experiments
run on MOCKS suggest that it can serve as a challenging testset
for future research.
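
To illustrate the extraction step mentioned above, the following is a minimal sketch of how word-level alignments could be used to cut a keyword segment out of a recording. It is not the authors' pipeline: the `WordAlignment` structure, the `extract_phrase` helper, the 16 kHz sample rate, and the example timings are all hypothetical, and the audio is represented as a plain list of samples to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordAlignment:
    """Hypothetical alignment record: one word with its time span in seconds."""
    word: str
    start: float
    end: float


def extract_phrase(
    samples: Sequence[int],
    sample_rate: int,
    alignment: List[WordAlignment],
    first: int,
    last: int,
) -> Tuple[str, Sequence[int]]:
    """Return the text and the audio samples covering words first..last (inclusive)."""
    if not 0 <= first <= last < len(alignment):
        raise IndexError("word index out of range")
    # Convert alignment times (seconds) into sample offsets.
    begin = int(round(alignment[first].start * sample_rate))
    end = int(round(alignment[last].end * sample_rate))
    text = " ".join(w.word for w in alignment[first:last + 1])
    return text, samples[begin:end]


# Stand-in data (assumed values): two seconds of silence and three aligned words.
sample_rate = 16000
samples = [0] * (2 * sample_rate)
alignment = [
    WordAlignment("open", 0.10, 0.45),
    WordAlignment("the", 0.45, 0.60),
    WordAlignment("door", 0.60, 1.05),
]

# Cut out a single-word keyword and a two-word phrase.
keyword_text, keyword_audio = extract_phrase(samples, sample_rate, alignment, 0, 0)
phrase_text, phrase_audio = extract_phrase(samples, sample_rate, alignment, 1, 2)
```

In this sketch the same helper serves both roles: one extracted segment can act as a keyword and another as a test sample in which the keyword is searched for. How segments are actually assigned to the two roles in MOCKS is not specified here.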