Image restoration has witnessed significant progress over the decades, especially with the advent of various deep learning networks. However, single models with different architectures or random initialization states exhibit deviations between their predictions and the ground-truths, leading to sub-optimal restoration results.
To alleviate this problem, ensemble learning, a traditional but influential machine learning technique, has been applied to image restoration. However, most ensemble methods in image restoration focus on training-stage ensembling, which requires the ensemble strategy to be determined while training multiple models, sacrificing the flexibility to change models and the convenience of plug-and-play usage [1].
Post-training ensemble methods are needed but challenging. Unlike classification and regression, image restoration predictions are matrices in which each pixel is correlated with the others and takes values from 0 to 255. As a result, traditional methods such as bagging and boosting either require enormous computational resources for the restoration task or fail to generalize well due to the imbalance between the number of candidates and the feature dimension. As an alternative, Jiang et al. propose a post-training ensemble algorithm for super-resolution that optimizes a maximum a posteriori problem with a reconstruction constraint [2]. However, this constraint requires an explicit expression of the degradation process, which is extremely difficult to define for restoration tasks beyond super-resolution. These issues leave researchers in image restoration with weighted averaging as their primary choice [3].
To this end, we formulate the ensemble of restoration models using Gaussian mixture models (GMMs), where ensemble weights can be efficiently learned via the expectation maximization (EM) algorithm and stored in a lookup table (LUT) for subsequent inference. Our method does not require training or prior knowledge of the base models and degradation processes, making it applicable to various image restoration tasks.
Given a test set with numerous pairs of input images and ground-truths, suppose we have M pre-trained base models for image restoration. The widely used averaging ensemble in image restoration assigns equal weights to all samples and pixels. A recent method from the NTIRE 2023 competition assigns weights inversely proportional to the mean squared error between the predictions and their average. However, both adopt globally constant weights for all pixels and samples, neglecting that the performance of base models may fluctuate across different patterns and samples. Alternatively, we start from the perspective of GMMs and assign range-specific weights based on the EM algorithm.
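For concreteness, here is a minimal sketch of the two baseline weighting schemes in our own notation, where f_m(x) denotes the prediction of the m-th base model and \bar{y} the average prediction; the exact NTIRE formulation may differ slightly:

```latex
% Averaging ensemble: uniform weights across the M base models
\hat{y} \;=\; \frac{1}{M}\sum_{m=1}^{M} f_m(x)

% Inverse-MSE weighting (sketch): a model's weight shrinks as its
% prediction deviates from the average prediction \bar{y}
w_m \;\propto\; \frac{1}{\mathrm{MSE}\!\left(f_m(x),\, \bar{y}\right)},
\qquad
\hat{y} \;=\; \sum_{m=1}^{M} \frac{w_m}{\sum_{k=1}^{M} w_k}\, f_m(x)
```

In both schemes the weights are shared by every pixel of every test image, which is exactly the limitation our range-specific weights address.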
Suppose we have a reference set with N pairs of input images and ground-truths. We assume the reference set and the test set are sampled from the same data distribution. Under a Gaussian prior, the estimation error of a model on an image can be assumed to follow a zero-mean Gaussian distribution. The observed ground-truth can then be considered to follow a multivariate Gaussian whose mean equals the prediction.
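In symbols (our notation, a sketch of the assumption stated above, with y the vectorized ground-truth and f_m(x) the m-th model's prediction):

```latex
% Zero-mean Gaussian estimation error of the m-th model on image x:
y - f_m(x) \;=\; \epsilon_m, \qquad \epsilon_m \sim \mathcal{N}(0,\, \Sigma_m)

% Equivalently, the observed ground-truth is Gaussian around the prediction:
y \;\sim\; \mathcal{N}\!\left(f_m(x),\, \Sigma_m\right)
```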
We can cast the ensemble problem as a weighted average of Gaussian variables and estimate the weights by maximum likelihood estimation. However, solving the sample-wise mixture of Gaussians is infeasible because the covariance matrices differ across samples and are thus hard to estimate. Moreover, the number of prediction samples is much smaller than the feature dimension, making the covariance matrices singular.
Instead, we concatenate the reference set into a single Gaussian-distributed sample. Since the data samples can be considered i.i.d., the covariance matrix of the concatenated sample is diagonal.
However, the covariance matrix is still singular due to the imbalance between the number of prediction samples and the feature dimension, so directly mixing the multivariate Gaussians remains infeasible. We therefore categorize pixels into small bins of mutually exclusive ranges, such that the pixels within each range can be considered to follow a univariate Gaussian distribution according to the central limit theorem.
The reference set is thereby separated into a number of bin sets, and the ground-truth pixels inside each of them form a solvable univariate GMM. We then introduce a latent variable z representing the probability that a pixel belongs to the m-th Gaussian component, which plays the role of the ensemble weight for the m-th base model. The ensemble weights can be estimated via maximum likelihood from the observed ground-truths. We formulate the GMMs for estimating the range-specific ensemble weights; the complete and detailed formulas can be found in the paper, and a sketch is given below.
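As a hedged sketch of that formulation (our reading; see the paper for the exact parameterization), the ground-truth pixels y_i in a given bin set are modeled as a univariate GMM whose mixing coefficients are the range-specific ensemble weights:

```latex
p(y_i) \;=\; \sum_{m=1}^{M} \pi_m \, \mathcal{N}\!\left(y_i \,\middle|\, \mu_{m,i},\, \sigma_m^2\right),
\qquad \sum_{m=1}^{M} \pi_m = 1
```

where we take \mu_{m,i} to be the m-th model's prediction at pixel i, \sigma_m^2 the variance of that model's errors within the bin, and \pi_m the ensemble weight of the m-th base model recovered via EM.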
For each bin set, we estimate the ensemble weights by maximizing the log-likelihood: an E-step estimates the posterior distribution, and an M-step obtains the maximum likelihood estimates. Thanks to the separation into bin sets, we have prior knowledge of the mean and variance of each model, which can be estimated directly from the reference set. A minimal sketch of this loop is given below.
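Below is a minimal, hypothetical NumPy sketch of this EM loop under the assumptions above: component means are the per-pixel predictions, variances are pre-estimated from the bin, and only the mixing weights are updated. Function and variable names are ours, not from the paper's code.

```python
import numpy as np

def em_bin_weights(preds, gt, sigma2, n_iters=50, tol=1e-6):
    """Estimate ensemble weights for one bin set via EM (sketch).

    preds:  (M, P) predictions of M base models for the P pixels in the bin.
    gt:     (P,)   ground-truth pixel values.
    sigma2: (M,)   pre-estimated error variance of each model in this bin.
    Returns pi: (M,) mixing weights, interpreted as ensemble weights.
    """
    M, P = preds.shape
    pi = np.full(M, 1.0 / M)  # start from uniform (averaging) weights
    for _ in range(n_iters):
        # E-step: responsibility of component m for each pixel,
        # gamma[m, p] proportional to pi[m] * N(gt[p] | preds[m, p], sigma2[m])
        log_lik = (-0.5 * (gt[None, :] - preds) ** 2 / sigma2[:, None]
                   - 0.5 * np.log(2 * np.pi * sigma2[:, None]))
        log_post = np.log(pi)[:, None] + log_lik
        log_post -= log_post.max(axis=0, keepdims=True)  # numerical stability
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=0, keepdims=True)
        # M-step: closed-form update of the mixing weights
        pi_new = gamma.mean(axis=1)
        if np.abs(pi_new - pi).max() < tol:
            pi = pi_new
            break
        pi = pi_new
    return pi
```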
We store the range-specific weights estimated on the reference set in a LUT. At inference time for a test sample, we obtain the prediction of each of the M base models, partition the input pixels of the multiple models into bins, retrieve the estimated range-wise weights from the LUT based on each key of the bin set, and compute the aggregated ensemble. The details of the main algorithm can be found in the paper; a sketch of the inference path follows.
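A hypothetical sketch of that inference path, assuming the LUT maps a bin key (here, the joint bin indices of the M predictions at a pixel) to the weights learned above; this key design is our guess, not the paper's exact scheme.

```python
import numpy as np

def bin_key(pixel_preds, bin_width=32):
    """Quantize the M model predictions at one pixel into a LUT key."""
    return tuple(int(v) // bin_width for v in pixel_preds)

def ensemble_with_lut(preds, lut, bin_width=32):
    """preds: (M, H, W) predictions of M base models for one test image.
    lut: dict mapping bin keys to (M,) weight vectors from the reference set.
    """
    M, H, W = preds.shape
    out = np.zeros((H, W), dtype=np.float64)
    uniform = np.full(M, 1.0 / M)
    for i in range(H):
        for j in range(W):
            px = preds[:, i, j]
            w = lut.get(bin_key(px, bin_width), uniform)  # fall back to averaging
            out[i, j] = float(w @ px)                     # weighted ensemble
    return np.clip(out, 0, 255)
```

In practice the LUT would be populated by running em_bin_weights over every bin set of the reference set once; pixels whose key was never seen fall back to plain averaging.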
Figure 1. A visual comparison on an image from Manga109 for super-resolution. "HR & LR" means high-resolution and bicubic-upscaled low-resolution images. The second row of (c)-(g) shows error maps.
Benchmarks. We evaluate our ensemble method on three image restoration tasks: super-resolution, deblurring, and deraining. For super-resolution, we use Set5, Set14, BSDS100, Urban100, and Manga109 as benchmarks. For deblurring, we use GoPro, HIDE, RealBlur-J, and RealBlur-R. For deraining, we adopt Rain100H, Rain100L, Test100, Test1200, and Test2800.
Base Models. To evaluate the generalization of ensemble methods against model choices, we employ a wide variety of base models, including CNNs, ViTs, MLPs and Mambas. For image super-resolution, we use SwinIR, SRFormer, and MambaIR. We choose MPRNet, DGUNet, and Restormer for deblurring, as well as MPRNet, MAXIM, and Restormer for deraining.
Baselines. We utilize regression algorithms as baselines, including bagging, AdaBoost, random forests (RForest), gradient boosting decision trees (GBDT), and histogram-based gradient boosting decision trees (HGBT). Averaging is also a commonly used ensemble baseline. A recent method proposed by team ZZPM in the NTIRE 2023 competition is also included for comparison. Additionally, we adopt RefESR [2] for the image super-resolution ensemble.
We present quantitative comparisons with existing ensemble methods for super-resolution in Tab. 1. Our method, which learns range-specific weights, can recognize and alleviate the performance biases of base models and consistently performs well in all cases.
Table 1. Ensemble results on the task of image super-resolution. The categories "Base", "Regr.", and "IR." in the first column denote base models, regression-based ensemble methods, and ensemble methods designed for image restoration, respectively.
More detailed quantitative comparisons for deblurring and deraining can be found in the paper.
Figure 2. A visual comparison on an image from GoPro for the task of image deblurring. "GT & LQ" means ground-truth and low-quality blurry images. The second row of (c)-(g) shows error maps.
Figure 3. A visual comparison of ensembles on an image from Test100 for image deraining. "GT & LQ" means ground-truth and low-quality rainy images. The second row of (c)-(g) shows error maps.
We also provide qualitative visual comparisons in Figs. 1, 2, and 3. In Fig. 1, our method with bin width b = 32, which learns fine-grained range-wise weights, successfully recovers the pattern. In Fig. 2, only our method obtains an ensemble with sharp edges and accurate colors. In Fig. 3, MPRNet removes the stripe patterns on the ground together with the rain streaks, while ours alleviates this issue.
In this paper, we propose an ensemble algorithm for image restoration based on GMMs. We partition the pixels of predictions and ground-truths into separate bins of exclusive ranges and formulate the ensemble problem using GMMs over each bin. The GMMs are solved on a reference set, and the estimated ensemble weights are stored in a lookup table for the ensemble inference on the test set. Our algorithm outperforms regression-based ensemble methods as well as commonly used averaging strategies. It is training-free, model-agnostic, and thus suitable for plug-and-play usage.
https://neurips.cc/virtual/2024/poster/93407
[1] Lingfeng Wang, Zehao Huang, Yongchao Gong, and Chunhong Pan. Ensemble based deep networks for image super-resolution. Pattern Recognition, 2017.
[2] Junjun Jiang, Yi Yu, Zheng Wang, Suhua Tang, Ruimin Hu, and Jiayi Ma. Ensemble super-resolution with a reference dataset. IEEE Transactions on Cybernetics, 2019.
[3] Z. Zhang, S. Zhang, R. Wu, et al. NTIRE 2024 challenge on bracketing image restoration and enhancement: Datasets, methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6153-6166, 2024.