
FFT-Based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images

By Elena Camuffo, Samsung R&D Institute United Kingdom
By Umberto Michieli, Samsung R&D Institute United Kingdom
By Mete Ozay, Samsung R&D Institute United Kingdom

In this blog, we introduce our new work [1], published at the ICASSP 2024 conference.

Introduction


In recent years, home robot technologies have improved enormously thanks to advances in the vision systems installed on them. Technologies such as object detection and recognition help robotic devices avoid obstacles and recognize objects in their environment. However, these systems are prone to failure when they face challenging situations such as lighting changes or adverse weather conditions [2, 3, 4, 5, 6].

To bring robust vision models to real-world, on-device applications, these robustness challenges must be addressed. In this work, we tackle the problem by considering a set of visual corruptions frequently found in natural images acquired in real-world conditions.

We propose FROST, a novel FFT-based RObust STatistics selection method built around a codebook that maps FFT features to corruption-specific statistics.

The desiderata of our setup are as follows:

1. Robustness to image corruptions at test time, useful for mobile and robotic applications in the wild;
2. Improved accuracy against competitors on ImageNet-C, especially in the case of severe corruptions, while preserving accuracy on clean data;
3. Usability with different models and architectures, with optimized BatchNorm layer parameters and very limited storage requirements.

Our Method: FROST


The proposed method (FROST) consists of two main steps:

1. Determine which corruption is affecting the image by extracting high-frequency FFT amplitude features and matching them against corruption-specific prototypical features representing each corruption type.
2. Use a codebook to map each corruption type to a specific pre-trained set of normalization statistics, which is plugged into the model to improve its robustness.


These two steps are performed at both training and test time. In detail:

1. At training time (Fig. 1, top), FROST extracts high-frequency amplitudes from corrupted images, aggregates them across images with the same corruption, and builds a set of per-corruption feature prototypes. It then estimates corruption-specific (Corr-S) and corruption-generic (Corr-G) normalization layer parameters starting from a pre-trained model.
2. At test time (Fig. 1, bottom), FROST identifies the corruption type present in each test image and uses a codebook to map it to the normalization layer parameters that minimize the recognition error. These parameters come from either the corruption-generic or a corruption-specific model, depending on the model's confidence.

Figure 1. At training time, we construct (i) corruption-specific prototypes using high-frequency FFT features and (ii) corruption-specific feature normalization statistics. At inference time, we extract FFT features and perform inference via prototype matching to select the most suitable statistics. If images are not corrupted enough or the model is unsure, the pre-trained standard model is used.
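To make the test-time step concrete, below is a minimal sketch of the selection logic. The names (prototypes, codebook, corr_g_stats, and the threshold tau) are our own illustrative placeholders, not the actual implementation, which is available in the repository linked at the end of this post.

import numpy as np

def select_statistics(feat, prototypes, codebook, corr_g_stats, tau):
    # feat:         high-frequency FFT feature vector of a test image
    # prototypes:   dict {corruption type: prototype feature vector}
    # codebook:     dict {corruption type: corruption-specific (Corr-S) statistics}
    # corr_g_stats: corruption-generic (Corr-G) fallback statistics
    # tau:          distance threshold standing in for the confidence test
    dists = {c: np.linalg.norm(feat - p) for c, p in prototypes.items()}
    best = min(dists, key=dists.get)
    # Weakly corrupted or ambiguous images fall back to the generic statistics.
    if dists[best] > tau:
        return corr_g_stats
    return codebook[best]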

The choice of high-frequency FFT amplitude features stems from the observation that different corruptions behave differently in the high-frequency amplitude spectrum (Fig. 2) and tend to form well-defined clusters. Other recent approaches also rely on the FFT transform [2, 3].
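As an illustration of the kind of feature involved, here is a hedged NumPy sketch of extracting high-frequency FFT amplitudes from a grayscale image; the cutoff fraction is our own illustrative parameter, not a value from the paper.

import numpy as np

def high_freq_fft_features(img, cutoff=0.25):
    # img: 2-D grayscale image array (H, W); cutoff: fraction of the spectrum
    # around the center (low frequencies) to discard -- an illustrative value.
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(img)))  # centered amplitude spectrum
    h, w = amplitude.shape
    cy, cx = h // 2, w // 2
    ry, rx = int(cy * cutoff), int(cx * cutoff)
    mask = np.ones_like(amplitude, dtype=bool)
    mask[cy - ry:cy + ry, cx - rx:cx + rx] = False  # drop the low-frequency core
    return amplitude[mask]  # flattened high-frequency amplitudes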

Figure 2. t-SNE projection of FFT features. After grouping indistinguishable clusters, the ground-truth partitioning is very close to the k-means partitioning.

Some recent approaches also adopt normalization layers to improve model robustness [4, 5]. In our case, we exploit normalization statistics specialized on a single corruption each, showing that such parameters behave very differently depending on the corruption type. Normalization parameters chosen via FFT features prove to be very close to corruption-specific parameters selected by hand (Fig. 3).

Figure 3. Average variation of normalization layer parameters, obtained by aggregating them according to similar corruptions. Our selection is very close to the corruption-specific case.
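A minimal PyTorch sketch of how corruption-specific statistics could be estimated is given below: running the frozen pre-trained model in train() mode over images of a single corruption refreshes the BatchNorm running statistics. This is our illustration of the idea, not the authors' recipe; in particular, FROST also optimizes the learnable normalization parameters, which we omit here.

import copy
import torch
import torch.nn as nn

@torch.no_grad()
def estimate_corr_stats(pretrained_model, corrupted_loader):
    # corrupted_loader yields (images, labels) for one corruption type only.
    model = copy.deepcopy(pretrained_model)
    model.train()  # in train mode, BatchNorm updates its running mean/variance
    for images, _ in corrupted_loader:
        model(images)  # forward passes alone refresh the running statistics
    # Keep only the normalization layers' state: a tiny fraction of the model.
    return {f"{name}.{key}": value.clone()
            for name, module in model.named_modules()
            if isinstance(module, nn.BatchNorm2d)
            for key, value in module.state_dict().items()}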

Experimental Analyses


We evaluate our approach on the ImageNet-C [6] dataset, which contains ImageNet [7] images perturbed with different corruption types at several severity levels.

As our model, we use a generic classification architecture pre-trained on ImageNet with suitable data augmentation (we use AugMix [8] + DA [9] + HA [2]); FROST is then applied on top of it.
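Applying a selected set of statistics to the pre-trained model then amounts to a partial state-dict load, sketched below under the assumption that selected_stats has the form produced by the training-time sketch above.

def plug_in_statistics(model, selected_stats):
    # strict=False overwrites only the normalization entries present in the dict,
    # leaving all other pre-trained weights untouched.
    model.load_state_dict(selected_stats, strict=False)
    model.eval()
    return model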

Following prior work, we use the Corruption Error (CE) as evaluation metric, computed on all severity levels and on severe corruptions only; results are shown in Tables 1 and 2, respectively.
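For reference, CE as defined in [6] normalizes a model's per-corruption errors, summed over severities, by those of AlexNet, and mCE averages CE over corruption types. A short sketch (the input arrays are placeholders):

import numpy as np

def corruption_error(err_model, err_alexnet):
    # err_model, err_alexnet: top-1 error arrays of shape
    # (num_corruptions, num_severities), following the protocol of [6].
    ce = err_model.sum(axis=1) / err_alexnet.sum(axis=1)  # per-corruption CE
    return ce, ce.mean()  # (CE per corruption, mCE)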

Our approach outperforms the other methods, with gains of up to 37.1% mCE on all corruptions and 40.9% mCE on severe corruptions. In particular, compared with the augmentation baseline (AugMix + DA + HA), FROST improves the mCE by up to 10.3% on all corruptions and 19.6% on severe corruptions.

Several ablation studies are available in our full paper, where we show that:

•	FROST improves baseline models trained with data augmentation, yielding a larger gain than when applied to a model trained without data augmentation;
•	FROST is applicable on top of any architecture and dataset.

Table 1. Analysis of the Corruption Error (CE) ↓ for ResNet50 on ImageNet-C. Error ↓ is the model's error on clean images. mCE ↓ is the classification error averaged over all 5 corruption severity levels of ImageNet-C. Codebook results, in terms of accuracy ↑, are reported in green.

Table 2. Same as Table 1, but with results averaged only over the strongest corruption levels (4 and 5).

Our code is available at: https://github.com/SamsungLabs/FROST

Conclusion


We proposed a new approach for robust classification of severely corrupted images.

FROST features:

•	Improved robustness of classification models in the presence of corrupted images;
•	Minimal training requirements (just 0.01% of the network parameters; see the parameter-count sketch below);
•	Portability: it can be applied on top of any model and dataset;
•	Limited data and computation time required for training.
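As a rough, self-contained illustration of why normalization statistics are cheap to store, one can count the BatchNorm parameters of a torchvision ResNet50. Note that this counts every BatchNorm affine term and so only bounds the footprint; the 0.01% figure reported in the paper refers to the parameters FROST actually trains.

import torch.nn as nn
from torchvision.models import resnet50

model = resnet50()
total = sum(p.numel() for p in model.parameters())
norm = sum(p.numel() for m in model.modules()
           if isinstance(m, nn.BatchNorm2d) for p in m.parameters())
print(f"BatchNorm affine parameters: {norm} / {total} ({norm / total:.2%})")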

Bibliography


[1] E. Camuffo, U. Michieli and M. Ozay, “FFT-Based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images,” in ICASSP, 2024.
[2] M. K. Yucel, R. G. Cinbis and P. Duygulu, “HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness,” in ICCV, 2023.
[3] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, 1965.
[4] M. Zhang, S. Levine and C. Finn, “MEMO: Test Time Robustness via Adaptation and Augmentation,” in NeurIPS, 2022.
[5] H. Lim, B. Kim, J. Choo and S. Choi, “TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation,” in ICLR, 2023.
[6] D. Hendrycks and T. Dietterich, “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations,” in ICLR, 2019.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR, 2009.
[8] D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer and B. Lakshminarayanan, “AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty,” in ICLR, 2020.
[9] D. Hendrycks, S. Basart, N. Mu, S. Kadavath, F. Wang, E. Dorundo, R. Desai, T. Zhu, S. Parajuli, M. Guo, D. Song, J. Steinhardt and J. Gilmer, “The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization,” in ICCV, 2021.