When One Sensor Learns Another: Cross-Modal AI for Wearables

By Illia Fedorin, Margaryta Nastenko, and Oleh Semchuk, Samsung R&D Institute Ukraine

Wearable devices are expected to deliver increasingly accurate health monitoring while remaining compact, lightweight, and power efficient. However, combining multiple sensing modalities often introduces additional hardware complexity, battery consumption, and robustness challenges.

Among physiological sensors, photoplethysmography (PPG) remains one of the primary technologies for heart rate monitoring. At the same time, PPG is highly sensitive to motion artifacts, prone to temporary signal degradation, and comparatively power hungry during continuous tracking. Accelerometer (ACC) sensors, in contrast, are significantly more robust and energy efficient, but they do not directly measure cardiovascular activity [1-3].

This raises an important research question: can one physiological sensor learn representations that are usually provided by another modality?

Our recent work, presented at ICASSP 2026, explores this concept through a lightweight cross-modal virtual sensing framework for wearable devices. Instead of treating physiological sensors as isolated data sources, we investigated whether a model trained on synchronized sensor streams could learn the latent relationships between them and use those relationships to compensate for missing or degraded signals.

Motivation: Towards Virtual Physiological Sensing

Modern wearable devices increasingly operate under strict hardware constraints. Compact form factors such as smart rings, earbuds, or lightweight fitness wearables may not always support a full sensor stack. Even when optical sensors are available, motion-heavy activities can severely reduce signal quality.

Traditional multimodal systems rely on explicit sensor fusion: combining ACC and PPG simultaneously to improve robustness. While effective, such approaches still assume that all sensors remain available and reliable during inference.

In our work, we explored a different direction: virtual sensing.

The core idea is to reconstruct or infer physiological information from an alternative modality when the primary signal becomes unavailable, corrupted, or intentionally disabled for power saving.

To demonstrate this concept, we investigated two complementary tasks:

reconstructing virtual PPG-related representations from accelerometer signals,

generating pseudo-motion embeddings from optical signals for motion-aware denoising.

This creates a bidirectional cross-modal framework where one modality can partially compensate for another depending on device constraints and sensing conditions.

High-Intensity Wearable Data Collection

The experiments were conducted using synchronized wearable recordings collected during structured high-intensity interval training (HIIT) sessions. The dataset included:

132 workout logs,

3-axis accelerometer signals,

4-channel PPG signals,

reference heart rate from an ECG-grade chest device.
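As a rough illustration of what one synchronized sample in such a dataset might look like, the sketch below defines a small record type that checks the channel counts listed above. The class name, field names, and example values are hypothetical and are not taken from the study's data format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SyncedSample:
    """One time-aligned sample (hypothetical layout, for illustration only)."""
    timestamp_s: float
    acc_xyz: Sequence[float]       # 3-axis accelerometer reading
    ppg_channels: Sequence[float]  # 4-channel PPG reading
    ref_hr_bpm: float              # reference heart rate from the chest device

    def __post_init__(self):
        # Reject samples whose channel counts do not match the described setup.
        if len(self.acc_xyz) != 3:
            raise ValueError("expected 3 accelerometer axes")
        if len(self.ppg_channels) != 4:
            raise ValueError("expected 4 PPG channels")


# Made-up values, only to show how the record is constructed.
sample = SyncedSample(0.0, (0.01, -0.98, 0.12), (1.00, 1.10, 0.90, 1.05), 142.0)
```

Keeping all modalities and the reference label in one record makes it harder for the streams to drift out of alignment during preprocessing.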

Unlike controlled laboratory recordings, these sessions contained rapid transitions between sprint and recovery phases, strong wrist movement, and varying physiological responses across participants.

This created a particularly challenging scenario for robust wearable heart rate estimation.

Cross-Modal Virtual Sensing Framework

The proposed framework uses a shared lightweight temporal encoder trained across different sensing directions. As illustrated in Figure 1, the framework learns shared latent representations across physiological modalities.

Figure 1. High-level cross-modal virtual sensing framework for wearable physiological inference.

The system supports:

ACC → virtual PPG reconstruction,

PPG → pseudo-motion embedding generation,

modality-aware denoising,

single-modality real-time inference.

A key aspect of the approach is that the model learns relationships between synchronized modalities during training, while remaining capable of operating with only a single modality during inference.
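The split between paired training and single-modality inference can be illustrated with a deliberately tiny stand-in. The sketch below fits a one-dimensional linear map from an ACC-derived feature to a PPG-derived feature using paired data, then predicts from ACC alone. The linear model, the class name, and the numbers are assumptions chosen for brevity; the framework described here uses a learned temporal encoder, not a linear fit.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class LinearCrossModalMap:
    """Toy ACC -> PPG-feature mapping (hypothetical stand-in for a neural encoder)."""
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, acc_feats, ppg_feats):
        # Training: both modalities are available and time-aligned.
        mean_a, mean_p = fmean(acc_feats), fmean(ppg_feats)
        var_a = sum((a - mean_a) ** 2 for a in acc_feats)
        cov = sum((a - mean_a) * (p - mean_p)
                  for a, p in zip(acc_feats, ppg_feats))
        self.slope = cov / var_a if var_a else 0.0
        self.intercept = mean_p - self.slope * mean_a
        return self

    def predict(self, acc_feat):
        # Inference: only the accelerometer feature is required.
        return self.slope * acc_feat + self.intercept


# Synthetic paired features (not real data).
paired_acc = [0.0, 1.0, 2.0, 3.0]
paired_ppg = [1.0, 3.0, 5.0, 7.0]

model = LinearCrossModalMap().fit(paired_acc, paired_ppg)
virtual_ppg_feat = model.predict(4.0)  # no PPG input needed here
```

The point of the example is the interface: `fit` sees both modalities, while `predict` takes one.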

This allows the framework to support:

sensor dropout scenarios,

degraded sensing conditions,

reduced hardware configurations,

low-power wearable deployment.
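One way such scenarios could be handled at runtime is a small policy that picks an inference path per window. The sketch below is an assumption about how a device might make that choice; the function name, the quality score, and the 0.5 threshold are hypothetical and do not come from the published framework.

```python
from enum import Enum


class InferencePath(Enum):
    PPG_DIRECT = "ppg_direct"            # use the optical signal as-is
    ACC_VIRTUAL_PPG = "acc_virtual_ppg"  # infer PPG-related features from ACC


def choose_path(ppg_available: bool,
                ppg_quality: float,
                low_power_mode: bool,
                min_quality: float = 0.5) -> InferencePath:
    """Pick an inference path for the current window (illustrative policy)."""
    # Fall back to ACC when the optical sensor is absent, switched off to
    # save power, or too degraded to trust.
    if not ppg_available or low_power_mode or ppg_quality < min_quality:
        return InferencePath.ACC_VIRTUAL_PPG
    return InferencePath.PPG_DIRECT


path_clean = choose_path(ppg_available=True, ppg_quality=0.9, low_power_mode=False)
path_noisy = choose_path(ppg_available=True, ppg_quality=0.2, low_power_mode=False)
path_ring = choose_path(ppg_available=False, ppg_quality=0.0, low_power_mode=False)
```

A real device would likely add hysteresis so the path does not flip on every window, but that is outside the scope of this sketch.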

Figure 2. Cross-modal spectral reconstruction pipeline with adaptive attention and temporal modeling.

Technical Challenges

One of the main challenges was the severe level of motion corruption present in wearable physiological signals during high-intensity exercise.

PPG signals become unstable under rapid wrist movement, while accelerometer data contains large amounts of non-cardiac motion noise. As a result, extracting heart-rate-related information from ACC alone becomes extremely difficult.
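To see why motion noise is so damaging, consider a naive spectral estimator that reports the strongest frequency in the heart-rate band. The sketch below scans a plain DFT over 40 to 220 BPM on a synthetic window. The 25 Hz sampling rate, the 8 s window, the band limits, and the signal amplitudes are all assumptions made for illustration, not parameters of the study.

```python
import math


def dominant_bpm(signal, fs_hz, lo_bpm=40, hi_bpm=220):
    """Return the BPM whose frequency carries the most power (naive DFT scan)."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        omega = 2.0 * math.pi * (bpm / 60.0) / fs_hz  # radians per sample
        re = sum(x * math.cos(omega * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(omega * i) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


FS_HZ = 25.0    # assumed sampling rate
N_SAMPLES = 200  # 8 s window at 25 Hz

# Synthetic components: a weak 2.5 Hz (150 BPM) cardiac-like oscillation and
# a much stronger 1.5 Hz (90 cycles/min) arm-swing-like oscillation.
cardiac = [0.2 * math.sin(2 * math.pi * 2.5 * i / FS_HZ) for i in range(N_SAMPLES)]
motion = [1.0 * math.sin(2 * math.pi * 1.5 * i / FS_HZ) for i in range(N_SAMPLES)]
mixed = [c + m for c, m in zip(cardiac, motion)]

cardiac_only_bpm = dominant_bpm(cardiac, FS_HZ)
mixed_bpm = dominant_bpm(mixed, FS_HZ)
```

In this synthetic case the estimate on `mixed` follows the stronger motion component instead of the cardiac one, which is the failure mode a learned, motion-aware model has to overcome.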

Another important constraint was computational efficiency.

The framework was designed for real-time wearable deployment, meaning that latency, parameter count, and memory footprint had to remain minimal while still preserving meaningful physiological representations.
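Parameter count and weight memory can be budgeted with simple arithmetic before any training happens. The sketch below does this for a stack of 1D convolution layers; the layer sizes are hypothetical and are not the architecture used in this work.

```python
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int,
                  bias: bool = True) -> int:
    """Number of trainable parameters in a standard 1D convolution layer."""
    weights = in_channels * out_channels * kernel_size
    return weights + (out_channels if bias else 0)


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Storage needed for the weights alone (activations are not included)."""
    return n_params * bytes_per_param


# Hypothetical encoder: (in_channels, out_channels, kernel_size) per layer.
layers = [(3, 16, 5), (16, 32, 5), (32, 32, 3)]

total_params = sum(conv1d_params(*layer) for layer in layers)
float32_bytes = weight_bytes(total_params, 4)
int8_bytes = weight_bytes(total_params, 1)  # e.g. after 8-bit quantization
```

The same bookkeeping extends to other layer types, and it makes the trade-off between model width and on-device memory explicit early in the design.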

Balancing robustness, efficiency, cross-modal generalization, and real-time inference became one of the central engineering challenges throughout development.

Results

The experiments demonstrated that cross-modal learning can significantly improve physiological inference under partial sensing conditions (see Table 1).

Table 1. Heart rate estimation performance across different sensing configurations.

The proposed virtual sensing approaches substantially improved ACC-only heart rate estimation and narrowed the gap between single-modality inference and full multimodal fusion, despite operating under severe motion conditions.
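The evaluation metric is not restated in this article, so the sketch below assumes mean absolute error (MAE) in beats per minute, a common choice for heart rate estimation. The function name and the numbers are synthetic placeholders for illustration only; they are not results from the study.

```python
def mean_absolute_error_bpm(predicted, reference):
    """Average absolute difference between predicted and reference heart rate."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Synthetic placeholder values, NOT measurements from the experiments.
reference_bpm = [150.0, 160.0, 170.0]
estimate_a_bpm = [140.0, 150.0, 185.0]
estimate_b_bpm = [148.0, 163.0, 171.0]

mae_a = mean_absolute_error_bpm(estimate_a_bpm, reference_bpm)
mae_b = mean_absolute_error_bpm(estimate_b_bpm, reference_bpm)
```

Comparing configurations then reduces to comparing one number per configuration against the same chest-device reference.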

Key observations included:

strong improvement over raw ACC-only estimation,

near fusion-level performance in several configurations,

stable real-time inference,

robust operation under high-motion conditions,

efficient deployment characteristics suitable for wearable hardware.

The framework also demonstrated that attention-based refinement can transform noisy motion representations into more physiologically meaningful latent structures.

Importantly, the goal was not to literally reconstruct raw optical signals, but rather to learn latent physiological representations that preserve heart-rate-related dynamics across modalities.

Figure 3 demonstrates how cross-modal refinement transforms noisy accelerometer representations into physiologically meaningful structures.

Figure 3. Attention-based refinement suppresses motion artifacts and enhances HR-related spectral structure. Top: raw ACC/PPG signals; bottom: denoised ACC representations using attention and VAE-based refinement.
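The general idea of attention-style reweighting over a spectrum can be sketched in a few lines. In the example below, the attention scores are supplied by hand, whereas in a trained model they would be produced by the network; the function names, the three-bin spectrum, and the scores are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_spectrum(spectrum, scores):
    """Reweight spectral bins by attention weights and renormalize to sum to 1."""
    weights = softmax(scores)
    weighted = [w * p for w, p in zip(weights, spectrum)]
    norm = sum(weighted)
    return [v / norm for v in weighted] if norm else weighted


# Toy spectrum: the middle bin (motion-like) dominates before refinement.
raw_spectrum = [0.1, 0.7, 0.2]
# Hand-picked scores that suppress the middle bin and favor the last one.
attention_scores = [0.0, -2.0, 2.0]

refined = refine_spectrum(raw_spectrum, attention_scores)
strongest_bin = max(range(len(refined)), key=refined.__getitem__)
```

After reweighting, the strongest bin moves from the motion-dominated bin to the one the scores favor, which mirrors the qualitative effect shown in the figure.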

Beyond Heart Rate Monitoring

Although this work focused on wearable heart rate estimation, the broader concept extends beyond a single sensing task.

Cross-modal physiological learning opens possibilities for:

adaptive sensing systems,

reduced sensor stacks,

fault-tolerant wearable inference,

low-power monitoring,

and more flexible multimodal health devices.

Future wearable systems may increasingly rely on virtual sensing approaches, where available modalities dynamically compensate for unavailable ones instead of depending on fixed sensor configurations.

This direction becomes particularly relevant for next-generation compact devices where battery capacity, physical size, and sensing hardware remain highly constrained.

Conclusion

Our work explored how synchronized wearable sensors can learn latent physiological relationships through cross-modal training.

By enabling one modality to partially infer another, the proposed framework demonstrates a step toward more adaptive, robust, and hardware-efficient wearable AI systems.

Rather than relying solely on explicit sensor fusion, future wearable devices may increasingly use learned physiological priors to maintain reliable monitoring under real-world constraints.

Related Publications

ICASSP 2026: Learning Cross-Modal Physiological Signals on Wearables,
https://ieeexplore.ieee.org/document/11462938/

Information Fusion (extended work): Virtual PPG Reconstruction from Accelerometer Data via Adaptive Denoising and Cross-Modal Fusion,
https://www.sciencedirect.com/science/article/abs/pii/S1566253525008437

References

1. I. Fedorin, V. Pohribnyi, D. Sverdlov, and I. Krasnoshchok, “Lightweight neural network based model for real-time precise HR monitoring during high intensity workout using consumer smartwatches,” IEEE EMBC, 2022.
2. I. Fedorin, A. Smielova, M. Nastenko, and I. Krasnoshchok, “From Sprint to Recovery: LSTM-Powered Heart Rate Recovery Forecasting in HIIT Sessions,” IEEE EMBC, 2024.
3. I. Fedorin, K. Slyusarenko, V. Pohribnyi, J. Yoon, G. Park, and H. Kim, “Heart Rate Trend Forecasting During High-Intensity Interval Training Using Consumer Wearable Devices,” ACM MobiCom, 2021.
