In this blog post, we introduce our new work [1], published at the ACM MMSys'24 conference.
I have a challenge for you: I am going to show you a very noisy image. Would you be able to recognize its content confidently? No? Neither can AI models.
Like humans, they find it challenging to understand the content of corrupted images and, sadly, they tend to be much more sensitive to the problem than we are.
We show an example in the figure below: to a human, the images look identical, and they would have no issue identifying the content (elephant). However, we subtly modified the second image so that a target AI model completely fails to recognize the animal, predicting hamster instead.
Figure 1a. Prediction: African Elephant
Figure 1b. Prediction: Hamster
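For readers curious how such an imperceptible modification can be produced, below is a minimal sketch using the classic Fast Gradient Sign Method (FGSM). This is a generic, illustrative attack and not necessarily the exact procedure behind Figure 1b; the file name, class index and perturbation budget are placeholders.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

# Illustrative FGSM attack: an almost invisible perturbation that flips the prediction.
# Generic example only; not necessarily the exact procedure used for Figure 1.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

def logits(pixels):
    # Normalise inside the graph so the gradient reaches raw pixel space.
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    return model((pixels - mean) / std)

image = TF.to_tensor(Image.open("elephant.jpg").convert("RGB").resize((224, 224)))
image = image.unsqueeze(0).requires_grad_(True)
target = torch.tensor([386])  # ImageNet class "African elephant"

loss = F.cross_entropy(logits(image), target)
loss.backward()

epsilon = 2.0 / 255.0  # roughly two grey levels per channel
adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()
print(logits(adversarial).argmax(dim=1))  # often no longer the elephant class
```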
This is a serious problem for AI systems embedded in everyday devices, since common natural corruptions also lead to misclassifications, even if to a lesser degree.
In our paper, referred to as SyMPIE [1], we tackle the issue of AI model robustness efficiently, designing a modular system that cleans the input images before feeding them to the AI models. Our approach improves visual appearance and classification accuracy at the same time: in the next figure, we show the results of applying our strategy to the modified image.
Figure 2. Prediction: African Elephant
Notice how the colours are more saturated and the contrast is higher in the processed image compared to the input. The differences are not easy to spot, but AI models are highly sensitive to them.
The desiderata of our setup are as follows: the enhancement module should be lightweight and efficient, modular enough to be plugged in front of any downstream model, and trainable without paired clean-corrupted data.
As the name suggests, our System for Modular Parametric Image Enhancement (SyMPIE) is a parametric image enhancer built with modularity in mind.
In Figure 3, we show a before/after schematic representation of a multimedia system making use of our architecture, which is inserted between the input and the AI model to clean the former.
Figure 3. An illustration of our modular system (SyMPIE) for efficient image enhancement, targeting increased model robustness to corruption in different multimedia tasks. SyMPIE contains two modules, namely the NEM and the DWM.
Figure 4 shows a more detailed view of the SyMPIE architecture and of the two modules that comprise it: Noise Estimation Module (NEM) and Differentiable Warping Module (DWM).
Figure 4. A detailed scheme of our modules working together to enhance the content of an image. The Noise Estimation Module (NEM) receives a corrupted input and predicts a triple of parameters (Cs, Cm, K). These parameters are used by the Differentiable Warping Module (DWM) to enhance the image using parametric operators.
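To make the pipeline concrete, here is a minimal PyTorch sketch of the NEM-to-DWM layout. The network sizes and the interpretation of (Cs, Cm, K) used here, a per-channel colour offset, a 3x3 channel-mixing matrix and a small spatial filtering kernel, are our own illustrative assumptions; the paper and repository define the actual operators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseEstimationModule(nn.Module):
    """Tiny CNN mapping a corrupted image to enhancement parameters (illustrative sizes)."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 3 values for Cs, 9 for Cm, kernel_size^2 for the spatial kernel K
        self.head = nn.Linear(32, 3 + 9 + kernel_size * kernel_size)
        self.kernel_size = kernel_size

    def forward(self, x):
        p = self.head(self.backbone(x))
        c_s = p[:, :3]                                        # per-channel offset
        c_m = p[:, 3:12].reshape(-1, 3, 3)                    # channel-mixing matrix
        k = p[:, 12:].reshape(-1, 1, self.kernel_size, self.kernel_size)
        return c_s, c_m, k


class DifferentiableWarpingModule(nn.Module):
    """Applies the predicted parametric operators to the input image."""
    def forward(self, x, c_s, c_m, k):
        b, c, h, w = x.shape
        # Colour correction: channel mixing followed by a per-channel offset.
        x = torch.einsum("bij,bjhw->bihw", c_m, x) + c_s.view(b, 3, 1, 1)
        # Spatial filtering with the predicted kernel, shared across channels.
        k = F.softmax(k.reshape(b, -1), dim=1).view_as(k)     # keep brightness stable
        x = F.conv2d(x.reshape(1, b * c, h, w),
                     k.repeat_interleave(c, dim=0),
                     padding=k.shape[-1] // 2, groups=b * c)
        return x.view(b, c, h, w).clamp(0.0, 1.0)


nem, dwm = NoiseEstimationModule(), DifferentiableWarpingModule()
corrupted = torch.rand(2, 3, 224, 224)
enhanced = dwm(corrupted, *nem(corrupted))   # ready to feed the downstream model
print(enhanced.shape)
```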
The core idea behind SyMPIE is that direct estimation of warping parameters is more computationally efficient than cleaning the input in a black-box approach.
We confirm this in Table 1, where we report the Floating Point Operations (FLOPs) needed for a single inference of the various methods. The results show that our method is at least 10x faster than the competitors, even in the worst case.
Table 1. Computational complexity of our method compared to other input-level image enhancement strategies.
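As a rough way to reproduce this kind of comparison, one can count FLOPs with an off-the-shelf tool such as fvcore. The tiny enhancer below is only a stand-in for the NEM, so the numbers will not match Table 1 exactly.

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

# One possible way to measure the FLOPs of an input-level enhancer.
dummy = torch.rand(1, 3, 224, 224)

tiny_enhancer = nn.Sequential(                      # stand-in for SyMPIE's NEM
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 21),
)
print("enhancer GFLOPs:", FlopCountAnalysis(tiny_enhancer, dummy).total() / 1e9)
# Repeat the measurement on a full image-to-image denoiser to appreciate the gap.
```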
Unlike other denoising approaches, SyMPIE does not require paired clean-corrupted data for training: it is trained end-to-end by exploiting a frozen downstream network.
The image enhancement objective is an emergent property of downstream task optimization, since cleaner images lead to better results on the frozen model.
In Figure 5, we show a schematic representation of the training strategy employed for SyMPIE, while a more detailed description is provided in Algorithm 1.
Figure 5. An overview of the training procedure of our modular system.
We employ a two-step procedure exploiting the exponential moving average (EMA) of our modules to avoid a common failure mode of denoisers, namely collapse upon iterated application.
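Below is a rough sketch of the training loop depicted in Figure 5, reusing the NEM/DWM sketches above. The two-step EMA scheme is our simplified reading of the procedure: an EMA copy of the enhancer performs the first pass, the learnable copy performs the second, and only the downstream classification loss on corrupted images drives the update, with no clean-corrupted pairs. The data loader, learning rate and EMA decay are placeholders, and input normalisation for the classifier is omitted for brevity.

```python
import copy
import torch
import torch.nn.functional as F
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen downstream classifier: it is never updated.
classifier = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
classifier = classifier.eval().to(device)
for p in classifier.parameters():
    p.requires_grad_(False)

nem = NoiseEstimationModule().to(device)          # learnable enhancer
dwm = DifferentiableWarpingModule().to(device)    # parameter-free warping
ema_nem = copy.deepcopy(nem)                      # EMA copy used for the first pass
for p in ema_nem.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(nem.parameters(), lr=1e-4)
ema_decay = 0.999

def enhance(module, images):
    return dwm(images, *module(images))

for images, labels in loader:                     # corrupted images + class labels (placeholder loader)
    images, labels = images.to(device), labels.to(device)

    with torch.no_grad():                         # first pass: EMA enhancer
        once = enhance(ema_nem, images)
    twice = enhance(nem, once)                    # second pass: learnable enhancer

    loss = F.cross_entropy(classifier(twice), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():                         # update the EMA copy
        for p_ema, p in zip(ema_nem.parameters(), nem.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
```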
We evaluate our approach on multiple image classification and semantic segmentation benchmarks, achieving a consistent improvement of around 5%.
In particular, we used the ImageNetC, ImageNetC-Bar and VizWiz datasets for the evaluation on corrupted image classification. We also tested the architecture on an additional benchmark we developed by adding multiple corruptions from ImageNetC to the same image, which we call ImageNetC-Mixed.
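For reference, stacking two ImageNetC-style corruptions on the same image can be done with the imagecorruptions package, in the spirit of ImageNetC-Mixed. The corruption names and severities below are placeholders; the paper lists the exact combinations we used.

```python
import numpy as np
from PIL import Image
from imagecorruptions import corrupt

# Apply two corruptions in sequence to the same image (illustrative choices).
image = np.array(Image.open("elephant.jpg").convert("RGB").resize((224, 224)))

mixed = np.uint8(corrupt(image, corruption_name="gaussian_noise", severity=3))
mixed = np.uint8(corrupt(mixed, corruption_name="motion_blur", severity=2))

Image.fromarray(mixed).save("elephant_mixed.png")
```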
For semantic segmentation, we used the Cityscapes, ACDC and DarkZurich datasets, investigating a common domain adaptation task: clean-to-adverse-weather adaptation.
Table 2 shows the main results of our system, confirming an average improvement of 5%. Note that we trained SyMPIE only once with RN50-V2 as the downstream classifier, and used it as-is in all other experiments.
Table 2. Results for the image classification task on the ImageNetC dataset (higher is better).
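Because SyMPIE sits entirely in front of the downstream network, the enhancer trained with RN50-V2 can be reused with any other classifier at test time. Here is a small sketch, reusing the modules defined above and a hypothetical checkpoint path:

```python
import torch
import torchvision.models as models

# Load a trained enhancer (hypothetical checkpoint file) and plug it in front
# of a different classifier than the one used during training.
state = torch.load("sympie_rn50v2.pth", map_location="cpu")
nem.load_state_dict(state)

classifier = models.mobilenet_v3_large(weights="IMAGENET1K_V2").eval()

with torch.no_grad():
    images = torch.rand(4, 3, 224, 224)          # a batch of corrupted images
    preds = classifier(dwm(images, *nem(images))).argmax(dim=1)
```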
In Tables 3 and 4, we show the results attained on the ImageNetC-Mixed and VizWiz datasets; again, one can appreciate the improvement brought by our system.
Table 3. Accuracy on the ImageNetC-Mixed dataset with ResNet50.
Table 4. Results on the VizWiz dataset with ResNet50-V2.
To the best of our knowledge, VizWiz is the only real-world benchmark providing corrupted images. Remarkably, even though most corruptions in the dataset cannot be effectively modeled by our approach, SyMPIE improves the performance even on the clean data. This suggests that the images considered clean by the designers of the dataset suffer from corruptions as well, which our method is able to mitigate.
To further investigate the effect of applying our system to images showing corruptions not modeled by the current implementation, we studied the performance on the ImageNetC-Bar dataset, which we show in Table 5.
Table 5. Quantitative results on the ImageNetC-Bar dataset with ResNet50.
The improvement here is similar to the one observed on the VizWiz dataset.
Finally, we report the results for semantic segmentation in the following figure and table.
Table 6. Quantitative results for semantic segmentation using a DeepLabV2 [7] architecture with the ResNet50 backbone.
Figure 6. Quantitative results on the ACDC semantic segmentation benchmark.
SyMPIE improves the accuracy by 4% even in this completely different task.
In this work, we introduced a small and efficient modular system to enhance the images fed as input to multimedia systems. Our SyMPIE improves performance in a variety of scenarios and tasks at a fraction of the cost of competing approaches, thanks to its explicit parameter estimation and warping.
Link to the paper on arXiv: https://arxiv.org/abs/2402.18402
Link to the open-source code: https://github.com/SamsungLabs/SyMPIE