Freq2Time: Weakly Supervised Learning of Camera-Based RPPG from Heart Rate
Published in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Abstract
Camera-based vital sign measurement has rapidly improved over recent years due to innovations in algorithm robustness. However, for deep learning systems to be trustworthy and generalizable, current solutions require large training datasets with a diverse set of skin tones, lighting conditions, camera sensors, motion, and coverage of the physiological ranges. Collecting such diverse data is challenging because it requires simultaneous capture of a physiological ground truth. Many modern deep learning frameworks for rPPG even require a PPG waveform that is time-synchronized with the video. We present a weakly supervised learning framework for rPPG that requires only soft frequency labels. Since the pulse rate changes more slowly than the time signal and requires windowing, we find that synchronization with the video is not required. Relaxing the synchronization requirement opens opportunities for collecting diverse and personalized training data in the future, where wearables could periodically send frequency labels to a collection device wirelessly. Finally, we show that our networks trained only with pulse rates are still capable of accurately predicting the rPPG time signal, enabling higher-level biomarkers such as heart rate variability.
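To make the weak-supervision idea concrete, below is a minimal PyTorch sketch of one plausible frequency-domain training objective: the predicted waveform's power spectrum is compared against a Gaussian soft label centered at the reference heart rate. The function name `soft_frequency_loss` and all hyperparameters (the label width `sigma_bpm`, the 40-180 BPM band) are illustrative assumptions, not the paper's actual formulation.

```python
import torch

def soft_frequency_loss(pred_signal, hr_bpm, fps=30.0, sigma_bpm=2.0,
                        min_bpm=40.0, max_bpm=180.0, eps=1e-8):
    """Cross-entropy between the normalized power spectrum of a predicted
    rPPG waveform and a Gaussian soft label centered at the reference
    heart rate. Illustrative sketch only, not the paper's exact loss.

    pred_signal: (batch, T) predicted rPPG time signal
    hr_bpm:      (batch,)   reference heart rate in beats per minute
    """
    _, T = pred_signal.shape
    # Remove the DC component so it does not dominate the spectrum.
    x = pred_signal - pred_signal.mean(dim=1, keepdim=True)
    power = torch.fft.rfft(x, dim=1).abs() ** 2                # (batch, T//2+1)
    freqs_bpm = torch.fft.rfftfreq(T, d=1.0 / fps,
                                   device=x.device) * 60.0     # bin centers in BPM
    # Restrict to the plausible heart-rate band.
    band = (freqs_bpm >= min_bpm) & (freqs_bpm <= max_bpm)
    power, freqs_bpm = power[:, band], freqs_bpm[band]
    # Normalize the spectrum into a distribution over frequency bins.
    pred_dist = power / (power.sum(dim=1, keepdim=True) + eps)
    # Gaussian soft label: tolerant of small errors in the HR reading.
    target = torch.exp(-0.5 * ((freqs_bpm[None, :] - hr_bpm[:, None]) / sigma_bpm) ** 2)
    target = target / target.sum(dim=1, keepdim=True)
    return -(target * torch.log(pred_dist + eps)).sum(dim=1).mean()
```

Because the magnitude spectrum is invariant to a time shift of the waveform, a loss of this form is insensitive to an offset between the video and the wearable's heart rate reading, which is exactly the synchronization relaxation the abstract describes.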
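The claim that the predicted time signal supports higher-level biomarkers can be illustrated with a standard HRV computation. The sketch below derives inter-beat intervals via simple peak detection and computes RMSSD; the use of `scipy.signal.find_peaks` and the spacing heuristic are assumptions for illustration, not the paper's evaluation pipeline.

```python
import numpy as np
from scipy.signal import find_peaks

def rmssd_from_rppg(signal, fps=30.0, max_hr_bpm=180.0):
    """RMSSD heart rate variability (ms) from a predicted rPPG waveform.
    A simple illustration; robust HRV pipelines filter and interpolate first.
    """
    # Peaks must be at least one minimum inter-beat interval apart.
    min_distance = int(fps * 60.0 / max_hr_bpm)
    peaks, _ = find_peaks(signal, distance=min_distance)
    if len(peaks) < 3:
        return float("nan")  # not enough beats to form successive differences
    # Inter-beat intervals in milliseconds.
    ibis_ms = np.diff(peaks) / fps * 1000.0
    # Root mean square of successive inter-beat interval differences.
    return float(np.sqrt(np.mean(np.diff(ibis_ms) ** 2)))
```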