Anomaly Detection-Aided Active Learning
Published
IEEE Global Communications Conference (GLOBECOM)
Abstract
This study proposes an adaptive human-in-the-loop process that accelerates the creation of AI models from scratch by prioritizing the most uncertain samples as the top candidates for annotation (active learning) and discarding samples that are adverse to training according to their abnormality score (anomaly detection). The integrated system combines ensembles of gradient-boosted decision trees (GBDT) for uncertainty quantification with a One-Class Support Vector Machine (OC-SVM) for anomaly detection. It was validated on a large and noisy dataset while building a regression model that estimates the extracellular-to-total body water ratio (ECW/TBW) from smartwatch data. A total of 549 human subjects participated in 2196 measurement tests. Factors contributing to ECW/TBW, such as users' anthropometric parameters and bioimpedance, were distributed over a wide range of values. The fully automatic planning system provides a 42% reduction in the number of data labels necessary for training a regression model, from 1310 to 541 experiments compared to the classic active learning approach. The proposed approach therefore significantly reduces both the monetary cost and the duration of data collection. The article shows that the different types of uncertainty (total, data, knowledge) are not correlated with abnormality, which motivates the use of the proposed approach. The regression model's performance in estimating the ECW/TBW ratio is also evaluated.
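The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the ensemble and OC-SVM outputs are combined, so everything here is an assumption made for illustration: the function name `select_for_annotation`, the input layout, the use of variance across ensemble members as a proxy for knowledge uncertainty, and the convention that a higher abnormality score means a more abnormal sample. The ensemble predictions and abnormality scores are taken as precomputed inputs, and the values are toy data, not results from the study.

```python
from statistics import pvariance


def select_for_annotation(ensemble_preds, abnormality, max_abnormality, batch_size):
    """Pick the unlabeled samples to annotate next (illustrative sketch).

    ensemble_preds:  one list of predictions per ensemble member, each of
                     length n_samples (hypothetical input layout).
    abnormality:     one abnormality score per sample; higher means more
                     abnormal (assumed convention).
    max_abnormality: samples scoring above this threshold are discarded.
    batch_size:      maximum number of samples to return.
    """
    candidates = []
    for i in range(len(abnormality)):
        # Anomaly detection step: drop samples flagged as abnormal.
        if abnormality[i] > max_abnormality:
            continue
        # Active learning step: disagreement between ensemble members
        # serves as the uncertainty estimate for this sample.
        member_preds = [preds[i] for preds in ensemble_preds]
        candidates.append((pvariance(member_preds), i))
    # Most uncertain first; ties are broken by the lower sample index.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [i for _, i in candidates[:batch_size]]


# Toy example: 3 ensemble members, 5 unlabeled samples.
preds = [
    [0.38, 0.40, 0.35, 0.39, 0.50],
    [0.38, 0.42, 0.45, 0.39, 0.30],
    [0.38, 0.41, 0.40, 0.40, 0.40],
]
scores = [0.1, 0.2, 0.3, 0.1, 0.9]  # sample 4 is the most abnormal
picked = select_for_annotation(preds, scores, max_abnormality=0.5, batch_size=2)
print(picked)  # [2, 1]
```

In this toy run, sample 4 has the largest ensemble disagreement but is excluded because its abnormality score exceeds the threshold. A purely uncertainty-driven selection would have chosen it first, which is the failure mode the anomaly filter is meant to avoid.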