A Generative Model for Speech Segmentation and Obfuscation for Remote Health Monitoring
Published
IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences
Abstract
The prevalence of smart devices has enabled remote health monitoring outside of conventional clinical settings and has reduced healthcare delivery costs. Passive audio recording is an essential component of remote health monitoring; however, it poses major privacy issues for subjects in uncontrolled environments such as their homes. Existing voice activity detection and speech classification methodologies can identify sound events and obfuscate human speech. However, they produce frequent false positives (>94%) when distinguishing human speech from other sound events, their performance is limited to a controlled environment for a specific application, and they require large amounts of labeled data for training. In this paper, we present a novel speech privacy preservation methodology that uses generative adversarial networks to segment human speech in a recorded audio stream and generate human-like random speech to replace the original segment. We implemented our methodology and evaluated it on standard datasets of speech and environmental sounds, as well as cough samples collected in our internal mobile health study. Compared to current methodologies, our experimental results show much lower speech segmentation true positive rates of 17% and 14% on the environmental sound and cough datasets, respectively. Moreover, the randomly generated audio samples used to obfuscate speech are shown to be nearly indistinguishable from human speech (less than 0.9% error in spectral attributes).
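The core obfuscation step summarized above — locating contiguous speech regions in a recording and replacing each with synthetic audio of matching length — can be sketched as follows. This is a minimal illustration in plain Python, not the paper's implementation: the names `speech_runs`, `obfuscate`, and the `generate` callback are hypothetical stand-ins for the GAN-based segmenter and generator.

```python
def speech_runs(mask):
    """Return (start, end) index pairs of contiguous True runs in a
    boolean mask, where True marks samples classified as speech."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                      # a speech segment begins
        elif not flag and start is not None:
            runs.append((start, i))        # a speech segment ends
            start = None
    if start is not None:                  # segment runs to end of audio
        runs.append((start, len(mask)))
    return runs

def obfuscate(audio, mask, generate):
    """Replace each detected speech segment with synthetic samples of
    equal length; `generate(n)` stands in for the GAN generator."""
    out = list(audio)
    for start, end in speech_runs(mask):
        out[start:end] = generate(end - start)
    return out
```

Non-speech samples pass through untouched, so sound events of clinical interest (e.g. coughs) are preserved while speech content is destroyed.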