SRC-B's Person-identification System in ICASSP 2023 Challenge

By Jinting Wu Samsung R&D Institute China - Beijing
By Mei Tu Samsung R&D Institute China - Beijing

Background

Recently, monitoring physical and mental health states with the help of physiological signals and daily behaviors has attracted widespread attention. The development of wearable devices provides a low-cost, portable, and convenient way for this application. Sensors such as accelerometers, gyroscopes, and heart rate monitors embedded into wearable devices can collect multimodal data to analyze users’ movement, sleep, or psychological stress status.

However, research on the diagnosis of mental illness using physiological signals is still being explored. Despite various studies of psychotic conditions in neurobiology and neurophysiology, physiological features that can accurately diagnose psychotic symptomatology have not yet been discovered. Another problem is that users’ daily behavior habits and physiological characteristics differ. Therefore, the signals are individually biased, and this problem affects the accuracy and robustness of advanced features such as disease monitoring and mental state detection.

The ICASSP 2023 Person Identification and Relapse Detection from Continuous Recordings of Biosignals (e-Prevention) challenge provides a dataset that contains long-term continuous recordings of biosignals and attempts to solve this problem by mining large, user-diverse data [1]. It also proposes two downstream tasks: track 1 is the identification of the wearer of the smartwatch, and track 2 is the detection of relapses in patients in the psychotic spectrum.

In this blog, we present our recent work, which ranked first in track 1 of the e-Prevention challenge. During wearer identification, the valid physiological data collected by a wearable device may be short, and the signals contain a large number of missing and abnormal values. To address this, we divide the valid data into multiple short-term segments. Each segment is passed through a 1D-CNN to produce an independent prediction, and the predictions are combined by voting to obtain the user ID. Based on this framework, we train multiple base classifiers by changing the segment length and the number of signal channels, and use an ensemble model to obtain the final user ID. Experimental results on the evaluation set demonstrate the effectiveness of our system.

Dataset

There are a total of 46 participants in the dataset of the e-Prevention challenge [1]. All participants were provided with a Samsung Gear S3 smartwatch that monitored the user’s linear and angular acceleration (m/s² and deg/s², sampled at 20 Hz), heart rate variability and RR intervals (sampled at 5 Hz), sleeping schedule, and steps. However, only the mean values per 5 seconds of the data are provided in the challenge. The training, validation, and test sets contain 2304, 495, and 521 days, respectively.

Method

Our method uses eight-channel signals, comprising accelerometer, gyroscope, heart rate, and R-R interval data, to capture each user's personalized activities and behaviors. Step information is not used.

First, outlier values are filtered out and replaced. The valid ranges of the accelerometer, gyroscope, and heart rate are set to [-19.6, 19.6], [-573, 573], and [40, 180], respectively. All invalid values in a day are replaced with the mean value for that day. Note that the R-R interval values at the times when the heart rate value is abnormal are replaced in the same way.
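The outlier handling above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function and key names are hypothetical, and it assumes the daily mean is computed over the valid samples only, which the text does not specify.

```python
from statistics import fmean

# Valid ranges quoted in the text; the key names are illustrative.
VALID_RANGES = {"acc": (-19.6, 19.6), "gyr": (-573.0, 573.0), "hr": (40.0, 180.0)}


def invalid_mask(values, low, high):
    """Mark samples outside the closed interval [low, high] (NaN is also marked)."""
    return [not (low <= v <= high) for v in values]


def replace_with_day_mean(values, mask):
    """Replace masked samples with the mean of the unmasked samples of the same day.

    Assumption: the daily mean is taken over valid samples only.
    """
    valid = [v for v, bad in zip(values, mask) if not bad]
    if not valid:
        return list(values)  # nothing valid to average; leave the day untouched
    day_mean = fmean(valid)
    return [day_mean if bad else v for v, bad in zip(values, mask)]


def clean_heart_channels(hr, rr):
    """Clean heart rate, and replace R-R values wherever heart rate is abnormal."""
    low, high = VALID_RANGES["hr"]
    mask = invalid_mask(hr, low, high)
    return replace_with_day_mean(hr, mask), replace_with_day_mean(rr, mask)
```

For example, `clean_heart_channels([60.0, 0.0, 80.0, 250.0], [1.0, 9.0, 0.8, 9.0])` replaces the second and fourth samples in both channels, because the heart rate is out of range at those positions.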

All data are normalized according to the distribution of the training data. Then, these data are divided into 30-minute segments [2]. Segments with missing timestamps are discarded.
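Since the challenge provides one mean value per 5 seconds, a 30-minute segment holds 30 × 60 / 5 = 360 samples. The sketch below shows one way to normalize and segment a day. It assumes z-score normalization with training-set statistics, non-overlapping windows, and that "missing timestamps" means a gap between consecutive samples; none of these details are stated in the text, and all names are hypothetical.

```python
SAMPLE_PERIOD_S = 5                        # the challenge provides 5-second means
SEGMENT_LEN = 30 * 60 // SAMPLE_PERIOD_S   # 360 samples per 30-minute segment


def normalize(values, train_mean, train_std):
    """Standardize one channel with statistics computed on the training set."""
    return [(v - train_mean) / train_std for v in values]


def split_into_segments(timestamps, samples,
                        segment_len=SEGMENT_LEN, period=SAMPLE_PERIOD_S):
    """Cut a day into non-overlapping segments, dropping any with a timestamp gap.

    timestamps: seconds since the start of the day, one per sample
    samples:    per-sample feature rows aligned with timestamps
    """
    segments = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        ts = timestamps[start:start + segment_len]
        # A segment is kept only if consecutive samples are exactly one period apart.
        if all(b - a == period for a, b in zip(ts, ts[1:])):
            segments.append(samples[start:start + segment_len])
    return segments
```

Passing `segment_len=120` gives the 10-minute variant used for the fallback model described later.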

For training, we design a neural network with multiple 1D convolutional layers [3], shown in Fig. 1. Each convolutional layer is followed by a batch normalization layer and a ReLU activation. The intermediate features output by the convolutional blocks are concatenated and passed through multiple fully connected (FC) layers. Each FC layer except the last is followed by a ReLU activation, and the last layer outputs the logits for the 46 identities. During validation, we predict every segment of a day and then vote over these segments to obtain the predicted user ID for that day.
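The per-day voting step can be illustrated as follows. This sketch assumes hard majority voting over per-segment predicted labels (the text does not say whether labels or scores are pooled), and `predict_day` is a hypothetical name.

```python
from collections import Counter


def predict_day(segment_predictions):
    """Return the most frequent per-segment user ID for one day, or None if empty.

    Ties go to the ID that appears first, which is an arbitrary choice here.
    """
    if not segment_predictions:
        return None
    return Counter(segment_predictions).most_common(1)[0][0]
```

With per-segment predictions `[3, 7, 3, 12]`, the day is assigned to user 3. A day with no usable segment yields `None`, which is the case the fallback model later in this post is meant to cover.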

Figure 1.   Architecture of our network

In order to use sleep information and solve the problem of abnormal heart rate data collection, we use different data to train multiple models:

A) All 30-minute segments in the training set are used.

B) Training segments are divided into two groups based on their sleep states, and one model is trained on each group. During validation, the predicted user ID is given by voting over both sleep and awake segments.

C) Because a small number of validation samples and test samples only include invalid heart rate values, six-channel signals which include accelerometer and gyroscope data are used for training.
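The data selection for variants B and C might look like the sketch below. The channel order, the per-segment sleep flag, and the model callables are all assumptions made for illustration; the text only states that B uses separate sleep and awake models and that C drops the heart-related channels.

```python
from collections import Counter

# Hypothetical channel order for the eight-channel input.
CHANNELS = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z", "hr", "rr"]
MOTION_IDX = [i for i, name in enumerate(CHANNELS)
              if name.startswith(("acc", "gyr"))]


def motion_only(segment):
    """Model C input: keep the six accelerometer and gyroscope channels per sample."""
    return [[row[i] for i in MOTION_IDX] for row in segment]


def predict_day_by_sleep_state(segments, asleep_flags, sleep_model, awake_model):
    """Model B: route each segment to the model for its sleep state, then pool votes.

    sleep_model and awake_model are callables mapping a segment to a user ID.
    """
    votes = Counter()
    for segment, asleep in zip(segments, asleep_flags):
        model = sleep_model if asleep else awake_model
        votes[model(segment)] += 1
    return votes.most_common(1)[0][0] if votes else None
```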

After obtaining the three daily results of these models, we combine them by weighted voting to obtain the final daily result y and improve the recognition accuracy, where the weights w_A, w_B, and w_C are 0.4, 0.35, and 0.25, respectively.

However, several samples are so short that this ensemble model cannot give a prediction. To predict them, we train a model D using 10-minute segments. If a sample is shorter than 10 minutes, its prediction is assigned at random.
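The weighted vote and the fallback described above can be sketched as follows. This is one plausible reading, assuming each model adds its weight to the identity it predicts for the day and the identity with the largest total wins. The function name, the use of `None` for "no prediction", and user IDs numbered 0 to 45 are assumptions.

```python
import random
from collections import defaultdict

WEIGHTS = {"A": 0.4, "B": 0.35, "C": 0.25}  # weights quoted in the text


def ensemble_day(predictions, fallback=None, num_users=46, rng=random):
    """Combine daily predictions from models A, B, and C by weighted voting.

    predictions: dict mapping model name to a predicted user ID, or None when
                 that model could not score the day (no 30-minute segment).
    fallback:    model D's prediction from 10-minute segments, or None.
    """
    scores = defaultdict(float)
    for name, user_id in predictions.items():
        if user_id is not None:
            scores[user_id] += WEIGHTS[name]
    if scores:
        return max(scores, key=scores.get)
    if fallback is not None:
        return fallback
    return rng.randrange(num_users)  # too short even for model D: random guess
```

Under this reading, model A decides the outcome unless B and C agree with each other, since 0.35 + 0.25 = 0.6 exceeds 0.4 while neither B nor C outweighs A alone.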

Experiments

The networks are trained with the cross-entropy loss, which is minimized with the Adam optimizer. We train for 50 epochs with a batch size of 128 and a learning rate of 1e-4.
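For reference, the standard softmax cross-entropy over a mini-batch can be written as below. The symbols are notation introduced here rather than taken from the paper: z_k^{(i)} is the logit of identity k for segment i, c_i is that segment's true identity, and N is the batch size.

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{\exp\left(z^{(i)}_{c_i}\right)}{\sum_{k=1}^{46} \exp\left(z^{(i)}_{k}\right)},
  \qquad N = 128
```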

The validation accuracy of the above models and the ensemble model is shown in Table 1. The accuracy on the test set is 95.00%, which ranked 1st in track 1 of the e-Prevention challenge.

Table 1.  Accuracy on the validation set

Conclusion

In this blog, we describe a person identification system submitted to track 1 of the ICASSP 2023 e-Prevention challenge. Physiological signals are divided into short-term segments and fed into a 1D-CNN for prediction. On this basis, multiple base networks with different input lengths or different signal channels are trained, and user IDs are predicted through the ensemble model. On the test set, the system reaches 95.00% accuracy and ranks first in track 1.

Link to the paper

https://ieeexplore.ieee.org/document/10095917

References

1. Zlatintsi, Athanasia, et al. "E-Prevention: Advanced Support System for Monitoring and Relapse Prevention in Patients with Psychotic Disorders Analyzing Long-Term Multimodal Data from Wearables and Video Captures." Sensors 22.19 (2022): 7544.

2. Retsinas, George, et al. "Person identification using deep convolutional neural networks on short-term signals from wearable sensors." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.

3. Yıldırım, Özal, et al. "Arrhythmia detection using deep convolutional neural network with long duration ECG signals." Computers in Biology and Medicine 102 (2018): 411-420.