
[INTERSPEECH 2025 Series #4] Low Complex IIR Adaptive Hear-Through Ambient Filtering for Overcoming Practical Constraints in Earbuds

By M L N S Karthik Samsung R&D Institute India-Bangalore
By Rishabh Gupta Samsung R&D Institute India-Bangalore

Interspeech is one of the premier international conferences dedicated to advancing and disseminating research in the field of speech science and technology. It serves as a global platform where researchers, engineers, and industry professionals can share cutting-edge innovations, methodologies, and applications related to speech communication.

In this blog series, we introduce some of our research papers presented at INTERSPEECH 2025. The full list is below.

#1. Beyond Hard Sharing: Efficient Multi-Task Speech-to-Text Modeling with Supervised Mixture of Experts (AI Center-Seoul)

#2. Significance of Time-Frequency Preprocessing for Automatic Ultrasonic Vocalization Classification in Autism Spectrum Disorder Model Detection (Samsung R&D Institute Poland)

#3. Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting (AI Center-Seoul)

#4. Low Complex IIR Adaptive Hear-Through Ambient Filtering for Overcoming Practical Constraints in Earbuds (Samsung R&D Institute India-Bangalore)

#5. SPCODEC: Split and Prediction for Neural Speech Codec (Samsung R&D Institute China-Beijing)

#6. Efficient Streaming TTS Acoustic Model with Depthwise RVQ Decoding Strategies in a Mamba Framework (AI Center-Seoul)

#7. Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes (AI Center-Cambridge)

#8. A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions (Samsung R&D Institute China-Nanjing)

#9. Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS (Samsung R&D Institute India-Bangalore)

Introduction

The latest earbuds are equipped with advanced features such as immersive audio, active noise control (ANC) and ambient mode to enhance the wearing comfort and situational awareness of users [1, 2]. External ambient sound is attenuated by passive filtering due to the physical structure of the earbuds [2-5]. The goal of ambient mode is to ensure that ambient sound played back over the earbuds is perceptually similar to the user’s natural open ear listening experience. Ambient mode enhances the user’s situational awareness in everyday listening scenarios such as conversations, ambient music and pet sounds, and in emergency situations involving alarms, sirens or horns. In this blog, we refer to the techniques for ambient mode and the corresponding filters as hear-through techniques and hear-through equalization (HT EQ) filters, respectively.

Minimizing the computational complexity of hear-through techniques is important because it reduces processing delay, which in turn improves the naturalness of ambient mode. Larger processing delays can lead to unnatural listening experiences, such as sonic artifacts due to comb filtering [2-4]. While the fixed filters analyzed in previous studies [4,5] have lower processing delay, they can cause a mismatch between the modelled and desired open ear responses across users and across different fittings for the same user. Adaptive Finite Impulse Response (FIR) filters can provide a closer match to the target open ear response while maintaining stability, but they can lead to higher computational complexity and larger processing delays [3,6,7]. Adaptive Infinite Impulse Response (IIR) filters, on the other hand, can reduce computational complexity, memory usage and processing delay, but they can become unstable and perform poorly in certain scenarios. Most previous studies [4-7, 10-12] have not systematically analyzed adaptive IIR filtering methods that overcome these limitations to achieve superior ambient mode performance in complex acoustic environments. Moreover, most previous studies have evaluated performance by computing the error signal at the internal microphone located near the earbud’s loudspeaker, whereas the ideal location is the user’s eardrum. Since placing the error microphone at the user’s eardrum is not practically feasible, equivalent circuit models such as lumped parameter models have been proposed in previous studies [9]. However, their practical use in ambient mode applications on earbuds is limited by their lack of generalizability across frequency ranges, users and fittings.

Our Contributions

In this blog post, we focus on virtual-sensing-based adaptive IIR filtering techniques that reduce computational complexity and processing delay while improving HT EQ performance.

Our major contributions are as follows:

    
1.
The stability of IIR filter is improved using an equation error model [8], which converts the output transfer functions from pole–zero form to all-zero form
    
2.
The proposed method improves the estimation of ear canal response and ambient mode performance compared to other state-of-the-art techniques. The proposed method selects the ear canal response from a stored database utilizing the dominant frequency estimate of the leakage signal to ensure that the response selected from the database matches closely to the user’s response.
    
3.
We propose a novel method of integrating the adaptive IIR filtering with remote microphone techniques to design an end-to-end HT EQ system with low complexity and enhanced performance.
    
4.
We propose a method for online secondary path modelling based on user’s earbud fitting estimate to improve the secondary path estimation for each user fitting.


Proposed Method

The proposed algorithm follows a systematic approach, as shown in Figure 1. The input signal $x(n)$ is captured at the reference microphone, and the generated ambient signal at the error microphone, $d'(n)$, is given by

$d'(n)=p_e(n)+y'(n)$         (1)

where $p_e(n)$ denotes the residual ambient sound signal that remains after passive attenuation by the earbuds, and $y'(n)$ denotes the compensating signal generated by our proposed HT EQ filtering system. Ideally, the generated ambient signal and the open ear ambient signal should be identical, i.e., $d'(n)=d(n)$, with $d(n)$ being the desired open ear response. However, to make ambient mode work well for multiple users, we aim for perceptual similarity between the generated ambient signal and the open ear ambient signal.
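
Substituting the ideal condition $d'(n)=d(n)$ into Eq. (1) makes the target for the compensating signal explicit. This is only a rearrangement of Eq. (1) under that idealized equality, not an additional result from the paper:

$y'(n)=d(n)-p_e(n)$

In other words, the HT EQ filter ideally has to supply the difference between the open ear signal and the residual that leaks through the earbud.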

Figure 1. Block diagram of the proposed HT EQ filter

Remote Microphone Technique

To design the HT EQ filter, we propose to use a remote-microphone-based virtual sensing technique to estimate the target ambient signal at the eardrum of the user. In Figure 1, $S(z)$ is the secondary path between the earbud’s speaker and the internal microphone, $S_v(z)$ is the virtual secondary path between the earbud’s speaker and the error microphone at the eardrum, $P(z)$ is the primary path between the ambient sound source and the internal microphone, and $\hat{T}(z)$ indicates the open ear response. The mean squared error is minimized using the gradient descent method to obtain the optimal coefficients of the proposed adaptive IIR filters. Readers can refer to our full paper for detailed derivations of the filter update equations.
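
As a rough illustration of the gradient-descent update mentioned above, the sketch below adapts a small equation-error IIR filter with a plain LMS rule. It is a generic textbook-style formulation, not the update equations from the full paper: the secondary path $S(z)$, the virtual path $S_v(z)$ and the open ear response are all left out, and the function name `equation_error_lms`, the step size `mu` and the toy target system are assumptions made purely for illustration.

```python
import random


def equation_error_lms(x, d, num_b, num_a, mu):
    """Adapt feedforward (b) and feedback (a) weights with an equation-error LMS rule.

    The feedback taps are driven by past samples of the desired signal d(n)
    instead of the filter's own past output, so the error is linear in the
    weights and the update has the same form as for an FIR (all-zero) filter.
    """
    b = [0.0] * num_b
    a = [0.0] * num_a
    for n in range(len(x)):
        # Regressors: x(n), x(n-1), ... and d(n-1), d(n-2), ...
        x_taps = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_b)]
        d_taps = [d[n - k] if n - k >= 0 else 0.0 for k in range(1, num_a + 1)]
        estimate = sum(bk * xk for bk, xk in zip(b, x_taps))
        estimate += sum(ak * dk for ak, dk in zip(a, d_taps))
        error = d[n] - estimate
        # Gradient-descent step on the instantaneous squared error.
        b = [bk + mu * error * xk for bk, xk in zip(b, x_taps)]
        a = [ak + mu * error * dk for ak, dk in zip(a, d_taps)]
    return b, a


# Hypothetical toy target: d(n) = 0.5 d(n-1) + x(n) + 0.3 x(n-1), white noise input.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = []
for n in range(len(x)):
    prev_d = d[n - 1] if n > 0 else 0.0
    prev_x = x[n - 1] if n > 0 else 0.0
    d.append(0.5 * prev_d + x[n] + 0.3 * prev_x)

b_hat, a_hat = equation_error_lms(x, d, num_b=2, num_a=1, mu=0.01)
print(b_hat, a_hat)
```

In this noise-free toy case the weights settle close to the true values (1.0 and 0.3 for `b`, 0.5 for `a`). With measurement noise on $d(n)$, equation-error estimates are generally biased, which is one of the trade-offs of this formulation.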

Generation of Database of the Virtual Paths

Due to differences in anthropometric features and earbud fittings across individuals, we need to model the transfer function between the internal microphone (the approximate error microphone location) and the eardrum microphone (the ideal microphone location). We propose a two-stage method for virtual sensing, which consists of a database creation stage followed by an implementation stage. In the first stage, impulse responses for different earbud fittings are measured using a dummy head, and the dominant frequency of each measured response is stored in a database on the earbuds together with the ear canal length and the virtual path. In the implementation stage, the dominant frequency of the estimated leakage signal is used to estimate the approximate ear canal length, and the virtual path that matches it most closely is selected for further processing, as shown in Fig. 1. Considering the memory constraints and the construction of earbuds, we propose storing virtual paths of length 128 for 10 different ear canal characteristics on the earbuds.
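
The lookup step described above can be sketched as a nearest-match search keyed on dominant frequency. This is only a minimal sketch: the field names, the three database entries, the 16 kHz sample rate and the zero-filled virtual paths are placeholders invented for illustration, and the peak-picking DFT stands in for whatever dominant frequency estimator the actual system uses.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest DFT magnitude bin, excluding DC."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def select_virtual_path(database, freq_hz):
    """Pick the stored entry whose dominant frequency is closest to freq_hz."""
    return min(database, key=lambda entry: abs(entry["dominant_freq_hz"] - freq_hz))


# Placeholder database: values are NOT measured data. The post stores 10 entries
# with virtual paths of length 128; only three dummy entries are used here.
database = [
    {"dominant_freq_hz": f, "ear_canal_length_mm": length, "virtual_path": [0.0] * 128}
    for f, length in [(2500.0, 30.0), (3000.0, 27.0), (3500.0, 24.0)]
]

sample_rate = 16000  # assumed sample rate
leak = [math.sin(2 * math.pi * 3100.0 * t / sample_rate) for t in range(256)]
f0 = dominant_frequency(leak, sample_rate)
entry = select_virtual_path(database, f0)
print(f0, entry["dominant_freq_hz"], entry["ear_canal_length_mm"])
```

For the 3.1 kHz test tone, the search returns the placeholder entry stored at 3000 Hz. With 10 stored paths of length 128, such a database holds 1280 coefficients plus the per-entry keys.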

Experiments and Results

To evaluate the performance of the proposed algorithm in real-world scenarios, complex indoor and outdoor acoustic environments were simulated. In the outdoor environment, the user listens to siren sounds from a vehicle moving at a constant speed of 15 km/h from an azimuth of 0° to 150°, together with a diffuse wind noise signal and a speech signal from a speaker situated at 75° azimuth. In the indoor environment, the user listens to music from a loudspeaker located at 60°, together with diffuse construction noise and the distant sound of a car passing by on the road outside the room from 0° to 150° azimuth at 15 km/h. In all cases, the elevation of the directional sources is taken as 0°.

Figure 2. MSE performance comparison for proposed adaptive IIR filter with state-of-the-art adaptive FIR and fixed IIR filters for (a) white noise, (b) simulated outdoor and (c) simulated indoor environments

Figure 3. Pole plot of the proposed IIR filter, which remains stable with all poles inside the unit circle for the white noise signal and the simulated indoor and outdoor acoustic environment signals

Fig. 2(a), 2(b) and 2(c) show the MSE plots for white noise signals and for the simulated outdoor and indoor acoustic environments, respectively, where the blue lines mark the switching of the direction of arrival (DoA) for the car and siren sounds. As observed from Figure 2, the proposed technique is able to track the changes in the sound signals with lower MSE than other state-of-the-art adaptive FIR and fixed IIR filtering methods for both indoor and outdoor simulated environments [4, 10-12]. Fig. 3 shows the pole plot for the proposed method evaluated using the simulated acoustic environment signals and the white noise signal, demonstrating stability with all poles lying inside the unit circle for all three scenarios.
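
The stability condition read off the pole plot (all poles inside the unit circle) can also be checked numerically without computing the poles. The sketch below uses the classical step-down (Schur-Cohn) recursion on the denominator coefficients. It is a generic check, not code from the paper, and the function name `is_stable` and the example coefficients are illustrative assumptions.

```python
def is_stable(denominator):
    """Return True if all roots of A(z) = a0 + a1 z^-1 + ... + aN z^-N lie
    strictly inside the unit circle (step-down / Schur-Cohn test)."""
    a0 = denominator[0]
    coeffs = [c / a0 for c in denominator[1:]]  # monic form, leading 1 dropped
    while coeffs:
        k = coeffs[-1]  # reflection coefficient of the current order
        if abs(k) >= 1.0:
            return False
        m = len(coeffs)
        # Reduce the polynomial order by one.
        coeffs = [(coeffs[i] - k * coeffs[m - 2 - i]) / (1.0 - k * k)
                  for i in range(m - 1)]
    return True


# Illustrative denominators (not taken from the paper).
print(is_stable([1.0, -1.2728, 0.81]))  # pole radius 0.9 -> True
print(is_stable([1.0, -2.1, 0.2]))      # poles at 2.0 and 0.1 -> False
```

If the filter is written with feedback weights $a_k$ multiplying past outputs, the denominator passed to this check would be $[1, -a_1, \dots, -a_{L_a}]$ under that sign convention.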

Furthermore, to verify the effectiveness of the proposed method for different user fittings, we evaluated the power spectral density (PSD) error at the eardrum, as shown in Fig. 4. The PSD error is computed between the estimated and target open ear responses using a white noise signal of length 10 seconds. From Fig. 4, we can observe that the proposed virtual-sensing-based technique performs similarly to the state-of-the-art lumped parameter model [9] up to 3 kHz. Above 3 kHz, it has significantly lower PSD error than the lumped-parameter-based ear canal model, by more than 20 dB at certain frequencies between 4 and 8 kHz. When our adaptive IIR filtering method is combined with the proposed ear canal modelling technique, the performance above 3 kHz improves slightly compared to the filtering method without ear modelling. The maximum benefit of our proposed ear canal modelling technique was observed in the frequency region close to 5.5 kHz, with about 3 dB to 6 dB lower PSD error.

Figure 4. PSD performance at the eardrum compared to open ear response for different fittings
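
One way to compute a PSD error like the one plotted in Fig. 4 is to take the per-bin ratio of two averaged periodograms in dB. The post does not spell out the exact definition or estimator, so the sketch below is an assumption: it uses Bartlett averaging with non-overlapping segments, and the names `averaged_psd` and `psd_error_db`, the segment length and the test signals are all hypothetical.

```python
import cmath
import math
import random


def averaged_psd(signal, segment_len):
    """Average the periodograms of non-overlapping segments (Bartlett's method)."""
    num_segments = len(signal) // segment_len
    psd = [0.0] * (segment_len // 2 + 1)
    for s in range(num_segments):
        seg = signal[s * segment_len:(s + 1) * segment_len]
        for k in range(len(psd)):
            acc = sum(seg[t] * cmath.exp(-2j * math.pi * k * t / segment_len)
                      for t in range(segment_len))
            psd[k] += abs(acc) ** 2 / (segment_len * num_segments)
    return psd


def psd_error_db(estimated, target, segment_len=64):
    """Per-bin PSD ratio in dB between an estimated and a target signal."""
    p_est = averaged_psd(estimated, segment_len)
    p_tgt = averaged_psd(target, segment_len)
    return [10.0 * math.log10(pe / pt) for pe, pt in zip(p_est, p_tgt)]


# Sanity check: halving the amplitude lowers the PSD by about 6.02 dB in every bin.
random.seed(1)
target = [random.gauss(0.0, 1.0) for _ in range(1024)]
estimated = [0.5 * v for v in target]
errors = psd_error_db(estimated, target)
print(round(errors[0], 2))  # -6.02
```

A value near 0 dB in a bin means the estimated response matches the target at that frequency, while larger magnitudes indicate a larger mismatch.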

Observations based on computational complexity: The computational complexity of all the algorithms is shown in Table 1. Here, $L_v$, $L_s$, $L_w$, $L_a$, and $L_b$ denote the length of the virtual path, the length of the secondary path, the number of taps of the FIR adaptive filters, and the numbers of poles and zeros of the proposed IIR filter, respectively. From Table 1, the proposed approach requires 34% fewer multiply-accumulate (MAC) operations than state-of-the-art adaptive FIR methods such as Filtered-x Least Mean Square (FxLMS), Filtered-x Block based LMS (FxBLMS), and Filtered-x Block based total least squares (FxBTLS) [10-12].

Table 1. Computational Complexity of Proposed and State-of-the-art Algorithms
$L_v=128$, $L_s=128$, $L_w=256$, $L_a=30$, and $L_b=30$

Conclusion

Our proposed method reduces the computational complexity by 34% in terms of MAC operations, while improving ambient mode performance by about 10-20 dB for white noise signals and simulated indoor and outdoor acoustic environments over existing state-of-the-art adaptive FIR and fixed IIR filtering techniques. We also demonstrate that our proposed technique improves ear canal modelling performance over existing lumped parameter methods above 3 kHz. In the future, we plan to use a larger database of responses measured for multiple users with different earbud fittings, together with further evaluation of adaptive IIR filter techniques in dynamic, complex real-world acoustic environments.

References

1. "Hearable devices—Global market trajectory & analytics", Apr. 2021.

2. V. Valimaki, A. Franck, J. Ramo, H. Gamper and L. Savioja, "Assisted listening using a headset: Enhancing audio perception in real augmented and virtual environments", IEEE Signal Process. Mag., vol. 32, no. 2, pp. 92-99, Mar. 2015. W.-S. Gan, J. He, R. Ranjan and R. Gupta, "Natural and augmented listening for VR and AR/MR", Proc. ICASSP Tutorial, Apr. 2018.

3. R. Gupta, R. Ranjan, J. He, W.S. Gan and S. Peksi, "Acoustic transparency in hearables for augmented reality audio: Hear-through techniques review and challenges", Proc. AES Int. Conf. AVAR, pp. 3-7, Aug. 2020.

4. J. Ramo and V. Valimaki, "An allpass hear-through headset", Proc. 22nd Eur. Signal Process. Conf. (EUSIPCO), pp. 1123-1127, Sep. 2014.

5. J. Ramo and V. Valimaki, "Digital augmented reality audio headset", J. Electr. Comput. Eng., vol. 2012, pp. 1-13, Oct. 2012.

6. V. Patel, J. Cheer and S. Fontana, "Design and implementation of an active noise control headphone with directional hear-through capability", IEEE Trans. Consum. Electron., vol. 66, no. 1, pp. 32-40, Feb. 2020.

7. R. Gupta, R. Ranjan, J. He and W.S. Gan, "Parametric hear through equalization for augmented reality audio", Proc. IEEE 44th Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 1587-1591, May 2019.

8. C. Y. Ho, K. K. Shyu, C. Y. Chang and S. M. Kuo, "Development of equation-error adaptive IIR-filter-based active noise control system", Applied Acoustics, vol. 163, 107226, 2020.

9. M. Hiipakka, "Measurement apparatus and modelling techniques of ear canal acoustics", Ph.D. thesis, Helsinki University of Technology, Espoo, Finland.

10. C. R. Huang, C. Y. Chang and S. M. Kuo, "Time-Shift Modeling-Based Hear-Through System for In-Ear Headphones," in IEEE Transactions on Consumer Electronics, vol. 68, no. 3, pp. 273-280, Aug. 2022, doi: 10.1109/TCE.2022.3190422.

11. B. K. Mohanty and P. K. Meher, "A high-performance energy-efficient architecture for FIR adaptive filter based on new distributed arithmetic formulation of block LMS algorithm", IEEE Transactions on Signal Processing, vol. 61, no. 4, pp. 921-932, 2012.

12. Y. Chen and H. Zhao, "Improved robust total least squares adaptive filter algorithms using hyperbolic secant function", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 9, pp. 3944-3948, 2022.