In this blog post, we introduce our new work [1], published at the ICASSP 2024 conference.
In recent years, home robot technologies have improved greatly thanks to progress in the vision systems installed on them. Technologies such as object detection and recognition help robotic devices avoid obstacles and recognize objects in the environment. However, these systems are prone to failure in challenging situations such as lighting changes or adverse weather conditions [2, 3, 4, 5, 6].
Deploying robust vision models in real-world, on-device applications requires addressing these failure modes. In this work, we tackle the problem by considering a set of visual distortions that frequently appear in natural pictures acquired in real-world contexts.
We propose FROST, a novel FFT-based RObust STatistics selection method that uses a codebook to map FFT features to corruption-specific statistics.
The desiderata of our setup are as follows:
The proposed method (FROST) is composed of two main steps:
These two steps are performed at both training and test time. In detail:
Figure 1. At training time, we construct (i) corruption-specific prototypes using high-frequency FFT features and (ii) corruption-specific feature normalization statistics. At inference time, we extract FFT features and perform inference via prototype matching to select the most suitable statistics. If images are not corrupted enough or the model is unsure, the pre-trained standard model is used.
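The prototype matching described in Figure 1 can be illustrated with a minimal sketch. It is not the released FROST implementation: we assume here that a prototype is the mean feature vector of one corruption type, that matching uses Euclidean distance, and that the fallback to the pre-trained model is triggered by a hypothetical distance threshold `max_distance`. The function names are ours.

```python
import numpy as np

def build_prototypes(features, labels):
    """Average the feature vectors of each corruption type into one prototype.

    features: array of shape (num_images, feature_dim)
    labels:   list of corruption names, one per image
    """
    names = sorted(set(labels))
    labels = np.asarray(labels)
    protos = np.stack([features[labels == n].mean(axis=0) for n in names])
    return names, protos

def select_statistics(feature, names, protos, max_distance):
    """Return the name of the closest prototype.

    Returns None when no prototype is within max_distance (an assumed
    fallback rule), meaning the pre-trained standard model should be used.
    """
    dists = np.linalg.norm(protos - feature, axis=1)
    best = int(np.argmin(dists))
    if dists[best] > max_distance:
        return None
    return names[best]
```

The returned name would then act as the key into a codebook of corruption-specific normalization statistics; a `None` result corresponds to the case in which the image is not corrupted enough or the match is uncertain.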
We use high-frequency FFT amplitude features because different corruptions behave differently in the high-frequency amplitude spectrum (Fig. 2), where they tend to form well-defined clusters. Other recent approaches also rely on the FFT [2, 3].
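As an illustration of what a high-frequency FFT amplitude feature could look like, here is a small NumPy sketch. The grayscale input, the circular cutoff `cutoff`, and the log scaling are our assumptions for the example; the exact feature definition used by FROST is given in the paper and code, not in this post.

```python
import numpy as np

def high_freq_fft_features(image, cutoff=0.25):
    """Flattened log-amplitudes of the 2D FFT outside a low-frequency disc.

    image:  2D grayscale array
    cutoff: radius of the discarded disc, in cycles per pixel
            (each axis spans -0.5 to 0.5); an assumed hyperparameter
    """
    # Amplitude spectrum with the zero frequency moved to the center.
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = image.shape
    # Frequency coordinates matching the shifted spectrum.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    # Keep only the components farther from the center than the cutoff.
    return np.log1p(amplitude[radius > cutoff])
```

A constant image has no energy outside the zero frequency, so its feature vector is essentially all zeros, whereas additive noise raises the high-frequency amplitudes across the board. Differences of this kind are what make corruption types separable in this feature space.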
Figure 2. t-SNE projection of FFT features. When indistinguishable clusters are grouped together, the ground-truth partitioning is very close to the k-means partitioning.
Some recent approaches also adapt normalization layers to improve model robustness [4, 5]. In our case, we use normalization statistics specialized for a single corruption each, and show that these parameters behave very differently depending on the corruption type. The normalization parameters chosen via FFT features turn out to be very close to the corruption-specific parameters selected by hand (Fig. 3).
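To make the idea of corruption-specific normalization statistics concrete, the sketch below stores one set of per-channel statistics per corruption and swaps it in at inference time. It assumes batch-norm style mean and variance over activations of shape (N, C, H, W) and omits the learnable scale and shift; the function names and the dictionary-based bank are hypothetical, not the FROST API.

```python
import numpy as np

def estimate_stats(activations):
    """Per-channel mean and variance of activations shaped (N, C, H, W)."""
    return activations.mean(axis=(0, 2, 3)), activations.var(axis=(0, 2, 3))

def normalize(x, stats, eps=1e-5):
    """Batch-norm style normalization of x (N, C, H, W) with fixed statistics."""
    mean, var = stats
    return (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + eps)

def normalize_with_bank(x, bank, key, default_stats):
    """Use the statistics stored for `key`, or the default ones if absent.

    bank: dict mapping a corruption name to its (mean, var) pair
    key:  corruption name chosen by the selection step, or None
    """
    return normalize(x, bank.get(key, default_stats))
```

When the activations of corrupted images are normalized with statistics estimated on clean data, the output is no longer zero-mean and unit-variance; this mismatch is what the specialized statistics are meant to remove.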
Figure 3. Average variation of normalization-layer parameters, aggregated over similar corruptions. Ours is very close to the Corruption-Specific case.
We evaluate our approach on the ImageNet-C [6] dataset, which contains ImageNet [7] images altered by different types of corruption.
We use a generic classification architecture as the model. It is pre-trained on ImageNet with suitable data augmentation (AugMix [8] + DA [9] + HA [2]), and FROST is then applied on top.
Following other works, we use the Corruption Error (CE) as the evaluation metric. Tables 1 and 2 show the results on all severity levels and on severe corruptions only, respectively.
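For readers unfamiliar with the metric, the following sketch computes CE and mCE from per-severity error rates. The ImageNet-C benchmark defines CE as the sum of a model's errors over severity levels divided by the corresponding sum for an AlexNet baseline, with mCE being the mean over corruption types. Since this post does not state whether the reported values are normalized, the baseline is optional here; without it, the function returns the plain average error. The function names are ours.

```python
def corruption_error(model_errors, baseline_errors=None):
    """CE for one corruption type.

    model_errors:    error rates of the model, one per severity level
    baseline_errors: error rates of a reference model at the same levels;
                     if None, the unnormalized mean error is returned
    """
    total = sum(model_errors)
    if baseline_errors is None:
        return total / len(model_errors)
    return total / sum(baseline_errors)

def mean_corruption_error(per_corruption, baselines=None):
    """Average CE over corruption types.

    per_corruption: dict mapping a corruption name to its per-severity errors
    baselines:      optional dict with the same keys for the reference model
    """
    ces = [
        corruption_error(errs, None if baselines is None else baselines[name])
        for name, errs in per_corruption.items()
    ]
    return sum(ces) / len(ces)
```

Restricting the evaluation to severe corruptions amounts to passing only the errors measured at severity levels 4 and 5.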
Our approach outperforms the other methods, improving mCE by up to 37.1% on all corruptions and by up to 40.9% on severe corruptions. Compared with the baseline model trained with the same augmentations (AugMix + DA + HA), FROST improves mCE by up to 10.3% on all corruptions and by up to 19.6% on severe corruptions only.
Several ablation studies are available in our full paper, where we show that:
Table 1. Analysis of the Corruption Error (CE) ↓ for ResNet50 on ImageNet-C. Error ↓ is the error of the model on clean images. mCE ↓ is the classification error averaged over all 5 corruption levels of ImageNet-C. Codebook results, in terms of accuracy ↑, are reported in green.
Table 2. Same as Tab. 1, but with results averaged only over the strongest corruption levels (4 and 5).
Our code is available at: https://github.com/SamsungLabs/FROST
We proposed a new approach for robust classification of severely corrupted images.
FROST features: