Wearable devices are expected to deliver increasingly accurate health monitoring while remaining compact, lightweight, and power efficient. However, combining multiple sensing modalities often introduces additional hardware complexity, battery consumption, and robustness challenges.
Among physiological sensors, photoplethysmography (PPG) remains one of the primary technologies for heart rate monitoring. However, PPG is highly sensitive to motion artifacts, prone to temporary signal degradation, and draws more power during continuous tracking. Accelerometer (ACC) sensors, in contrast, are significantly more robust and energy efficient, but they do not directly measure cardiovascular activity [1-3].
This raises an important research question: can one physiological sensor learn representations that are usually provided by another modality?
Our recent work, presented at ICASSP 2026, explores this question through a lightweight cross-modal virtual sensing framework for wearable devices. Instead of treating physiological sensors as isolated data sources, we investigated whether a model trained on synchronized sensor streams could learn the latent relationships between them and compensate for missing or degraded signals.
Modern wearable devices increasingly operate under strict hardware constraints. Compact form factors such as smart rings, earbuds, or lightweight fitness wearables may not always support a full sensor stack. Even when optical sensors are available, motion-heavy activities can severely reduce signal quality.
Traditional multimodal systems rely on explicit sensor fusion: combining ACC and PPG simultaneously to improve robustness. While effective, such approaches still assume that all sensors remain available and reliable during inference.
In our work, we explored a different direction: virtual sensing.
The core idea is to reconstruct or infer physiological information from an alternative modality when the primary signal becomes unavailable, corrupted, or intentionally disabled for power saving.
To demonstrate this concept, we investigated two complementary tasks:
This creates a bidirectional cross-modal framework where one modality can partially compensate for another depending on device constraints and sensing conditions.
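The compensation idea can be made concrete with a small source-selection policy. The sketch below is illustrative only and is not taken from the paper: the `SensorState` fields, the signal-quality index, and the `min_quality` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class HrSource(Enum):
    PPG = "ppg"                  # direct optical measurement
    VIRTUAL_ACC = "virtual_acc"  # heart rate inferred from the accelerometer only


@dataclass
class SensorState:
    ppg_enabled: bool   # False when the optical sensor is switched off to save power
    ppg_quality: float  # hypothetical signal-quality index in [0, 1]


def select_hr_source(state: SensorState, min_quality: float = 0.5) -> HrSource:
    """Pick the heart rate source for the current window.

    Assumption: PPG is preferred whenever it is on and clean enough;
    otherwise the virtual (ACC-only) estimate takes over.
    """
    if state.ppg_enabled and state.ppg_quality >= min_quality:
        return HrSource.PPG
    return HrSource.VIRTUAL_ACC
```

A real device would likely add hysteresis so the source does not flip on every window, but the basic shape is the same: the primary signal is used when it is trustworthy, and the alternative modality fills in when it is unavailable, corrupted, or disabled.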
The experiments were conducted using synchronized wearable recordings collected during structured high-intensity interval training (HIIT) sessions. The dataset included:
Unlike controlled laboratory recordings, these sessions contained rapid transitions between sprint and recovery phases, strong wrist movement, and varying physiological responses across participants.
This created a particularly challenging scenario for robust wearable heart rate estimation.
The proposed framework uses a shared lightweight temporal encoder trained across different sensing directions. As illustrated in Figure 1, the framework learns shared latent representations across physiological modalities.
Figure 1. High-level cross-modal virtual sensing framework for wearable physiological inference.
The system supports:
A key aspect of the approach is that the model learns relationships between synchronized modalities during training, while remaining capable of operating with only a single modality during inference.
This allows the framework to support:
Figure 2. Cross-modal spectral reconstruction pipeline with adaptive attention and temporal modeling.
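The "train with both modalities, infer with one" pattern can be sketched with a forward pass only. This is a minimal NumPy illustration under loud assumptions, not the architecture from the paper: the window length, latent size, dense projections, and the latent-alignment loss are all placeholders standing in for the actual temporal encoder and training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

WINDOW = 64  # samples per window (assumed)
LATENT = 16  # shared latent size (assumed)

# Modality-specific input projections feeding one shared encoder (all hypothetical).
W_acc = rng.normal(scale=0.1, size=(3 * WINDOW, LATENT))  # 3-axis ACC, flattened
W_ppg = rng.normal(scale=0.1, size=(WINDOW, LATENT))      # single-channel PPG
W_shared = rng.normal(scale=0.1, size=(LATENT, LATENT))   # shared encoder weights
w_head = rng.normal(scale=0.1, size=(LATENT,))            # heart rate regression head


def encode(x, w_in):
    """Project one modality into the shared latent space."""
    return np.tanh(np.tanh(x @ w_in) @ W_shared)


def training_loss(acc, ppg, hr_target):
    """Training sees both synchronized modalities."""
    z_acc, z_ppg = encode(acc, W_acc), encode(ppg, W_ppg)
    align = np.mean((z_acc - z_ppg) ** 2)                # pull the two latents together
    hr_err = np.mean((z_acc @ w_head - hr_target) ** 2)  # supervise HR from ACC latents
    return align + hr_err


def infer_from_acc(acc):
    """Inference needs only the ACC stream; the PPG branch is not touched."""
    return encode(acc, W_acc) @ w_head
```

The point of the sketch is the asymmetry between the two functions: `training_loss` requires paired windows, while `infer_from_acc` runs with the optical branch absent.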
One of the main challenges was the severe level of motion corruption present in wearable physiological signals during high-intensity exercise.
PPG signals become unstable under rapid wrist movement, while accelerometer data contains large amounts of non-cardiac motion noise. As a result, extracting heart-rate-related information from ACC alone becomes extremely difficult.
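To see why motion is such a problem, consider a naive baseline that simply picks the strongest frequency in the plausible heart rate band. This is a generic spectral peak picker, not the proposed framework; the 40 to 220 BPM band and the 1 BPM search step are assumptions made for the example.

```python
import math


def dominant_hr_bpm(signal, fs_hz, lo_bpm=40, hi_bpm=220):
    """Return the BPM whose frequency carries the most spectral power in the HR band."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset

    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        f = bpm / 60.0  # candidate frequency in Hz
        # Direct evaluation of the Fourier transform at this single frequency.
        re = sum(x * math.cos(2 * math.pi * f * i / fs_hz) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * f * i / fs_hz) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

On a clean synthetic pulse this baseline recovers the pulse rate. Once a stronger periodic motion component at a different cadence is added, the peak moves to the motion cadence instead, which is the failure mode that a learned, motion-aware representation is meant to avoid.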
Another important constraint was computational efficiency.
The framework was designed for real-time wearable deployment, meaning that latency, parameter count, and memory footprint had to remain minimal while still preserving meaningful physiological representations.
Balancing:
became one of the central engineering challenges throughout development.
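The kind of budget arithmetic behind this trade-off can be sketched as follows. The layer shapes are hypothetical and do not describe the actual model; they only show how parameter count translates into weight storage at different numeric precisions.

```python
def conv1d_params(in_ch, out_ch, kernel):
    """Weights plus biases of a standard 1D convolution layer."""
    return in_ch * out_ch * kernel + out_ch


def footprint_kib(n_params, bytes_per_param):
    """Storage needed for the weights alone, in KiB."""
    return n_params * bytes_per_param / 1024


# Hypothetical three-layer temporal encoder: (in_channels, out_channels, kernel_size).
HYPOTHETICAL_LAYERS = [(4, 16, 7), (16, 32, 5), (32, 32, 3)]

total_params = sum(conv1d_params(*layer) for layer in HYPOTHETICAL_LAYERS)
fp32_kib = footprint_kib(total_params, 4)  # 32-bit floating point weights
int8_kib = footprint_kib(total_params, 1)  # 8-bit quantized weights
```

For these illustrative shapes the encoder has 6,160 parameters, about 24 KiB in float32 and about 6 KiB after 8-bit quantization. Activation buffers and runtime overhead come on top of that and are not modeled here.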
The experiments demonstrated that cross-modal learning can significantly improve physiological inference under partial sensing conditions (see Table 1).
Table 1. Heart rate estimation performance across different sensing configurations.
The proposed virtual sensing approaches substantially improved ACC-only heart rate estimation and narrowed the gap between single-modality inference and full multimodal fusion, despite operating under severe motion conditions.
Key observations included:
The framework also demonstrated that attention-based refinement can transform noisy motion representations into more physiologically meaningful latent structures.
Importantly, the goal was not to literally reconstruct raw optical signals, but rather to learn latent physiological representations that preserve heart-rate-related dynamics across modalities.
Figure 3 demonstrates how cross-modal refinement transforms noisy accelerometer representations into physiologically meaningful structures.
Figure 3. Attention-based refinement suppresses motion artifacts and enhances HR-related spectral structure. Top: raw ACC/PPG signals; bottom: denoised ACC representations using attention and VAE-based refinement.
Although this work focused on wearable heart rate estimation, the broader concept extends beyond a single sensing task.
Cross-modal physiological learning opens possibilities for:
Future wearable systems may increasingly rely on virtual sensing approaches, where available modalities dynamically compensate for unavailable ones instead of depending on fixed sensor configurations.
This direction becomes particularly relevant for next-generation compact devices where battery capacity, physical size, and sensing hardware remain highly constrained.
Our work explored how models trained on synchronized wearable sensor streams can learn latent physiological relationships through cross-modal training.
By enabling one modality to partially infer another, the proposed framework demonstrates a step toward more adaptive, robust, and hardware-efficient wearable AI systems.
Rather than relying solely on explicit sensor fusion, future wearable devices may increasingly use learned physiological priors to maintain reliable monitoring under real-world constraints.
1. Fedorin, V. Pohribnyi, D. Sverdlov, and I. Krasnoshchok, “Lightweight neural network based model for real-time precise HR monitoring during high intensity workout using consumer smartwatches,” IEEE EMBC, 2022.
2. Fedorin, A. Smielova, M. Nastenko, and I. Krasnoshchok, “From Sprint to Recovery: LSTM-Powered Heart Rate Recovery Forecasting in HIIT Sessions,” IEEE EMBC, 2024.
3. Fedorin, K. Slyusarenko, V. Pohribnyi, J. Yoon, G. Park, and H. Kim, “Heart Rate Trend Forecasting During High-Intensity Interval Training Using Consumer Wearable Devices,” ACM MobiCom, 2021.