White balance is an essential step in the camera imaging process; it ensures that the colors of the captured images are correct; see Figure 1 for an example. White balance involves estimating the scene illumination, which is then used to correct the image colors. Recent state-of-the-art methods rely on a single input image to estimate the scene illumination and typically involve training models with large numbers of parameters on large datasets. In this blog, we show how to overcome such challenges by using two input images from two cameras, simultaneously, to estimate the scene illumination. Our approach enables us to use a significantly more efficient neural network, only a few hundred parameters, to achieve state-of-the-art accuracy in illuminant estimation and white balance.
Figure 1. The role of white balance in correcting image colors.
Whenever we use our camera to capture an image, there is an important factor affecting the colors of the final image we get, which is scene illumination, or simply, the color of the light source at the scene. Let’s assume we want to capture an image of a scene, such as the one shown in Figure 1, using our camera. This scene is illuminated by a specific illumination, which can be represented by three color components---namely, red (R), green (G), and blue (B), which is the standard way to represent colors. When we image the scene with our camera, there are many operations happening inside the camera. As shown in Figure 2, the first output we can get from the camera is the rawRGB (or RAW) image, which is the direct digital readings off the imaging sensor. This rawRGB image then gets processed by an image signal processor (ISP) that is responsible for producing the final image we usually save to file as a JPEG image.
Figure 2. The camera imaging pipeline
The problem facing the camera is that the rawRGB image suffers from a color cast that is caused by the scene illumination and the spectral sensitivity of the imaging sensor. This color cast, as can be seen in Figure 1, results in inaccurate image colors, and hence, needs to be fixed. But before we fix the color cast, first we need to know about color constancy.
The human visual system can usually compensate for illumination differences and can recognize object colors accurately under different illuminations, which is known as human color constancy. However, cameras can’t perform color constancy on their own. When imaging a scene with a camera, the scene illumination affects the captured image’s colors directly, resulting in a color cast related to the color of the illumination. For example, if we capture an image of a white sheet of paper under a red light, without removing the red color cast, the image would appear red, not white. So, it is critical to undo the color cast caused by the scene illumination in order to obtain the correct colors in the captured image. On board a camera, this is typically done computationally, and hence, it is called computational color constancy.
The first step towards solving the color constancy problem is to estimate the color of the scene illumination. One the scene illumination is estimated, it can be used to correct the colors of the captured image and remove the color cast, as shown in Figure 3.
Figure 3. The separation of the white balance process into illuminant estimation and color correction.
Most existing methods use a single image from a single camera to estimate the scene illumination. Conversely, we use two images from two cameras to get a better estimation of the scene illumination. But before going into the details of our method, we need to go over an important theory that represents the foundation of our two-camera illuminant estimation method.
Our method is inspired by the chromagenic color constancy theory . To understand chromagenic color constancy, let’s assume we capture two images of the same scene under the same illumination using the same camera. We capture the first image normally. For the second image, we place a spectral color filter in front of the camera that results in changing the spectral response of the imaging sensor. This process imitates capturing the two images with two cameras having different spectral sensitivities.
We can approximate a mapping between the two images using a 3 x 3 linear transformation matrix T1 that maps the first image to the second. The chromagenic color constancy theory indicates that if the scene illumination changes and we capture another pair of images in the same way, the transformation matrix would change as well. In other words, if the illuminants are different, then the transformations between the image pairs will be also different. This relationship is illustrated in Figure 4.
Figure 4. The chromagenic color constancy concept.
Our method extends the chromagenic color constancy concept to two-camera systems in modern smartphones. Such cameras typically have different spectral sensitivities, that is, they respond to incident light differently, and hence, it is suitable to assume that the chromagenic color constancy idea applies to these cameras.
Our method operates as follows. We start by capturing two images from the two cameras. Since the two cameras are likely to have different optics, the two images would probably exhibit different views. So, we have to warp and crop one of the images to match the other. Then, we down-sample both images to minimize errors from misaligned pixels between the images. Then, we compute a 3 x 3 color transformation matrix. We then use this 3 x 3 matrix as the only input to a lightweight neural network that estimates the scene illuminant. Our method is illustrated in Figure 5.
Figure 5. Our two-camera illuminant estimation method.
We used three types of datasets to train and evaluate our method. The first dataset, is a radiometric dataset that consists of synthetic image pairs generated by integrating object reflectance spectra and illumination spectra using two different spectral sensitivity functions from two cameras . The second dataset is based on the images from two cameras from the NUS color constancy dataset . We collected a third dataset that consists of 156 image pairs captured with two cameras of the Samsung Galaxy S20 Ultra. All images contain the color rendition chart and provided in rawRGB and sRGB color spaces. Examples from the three dataset are shown in Figure 6.
Figure 6. Datasets used for training and evaluation.
Now, let’s have a look at the distribution of the ground truth illuminants from the two cameras from each of the three datasets, as shown in Figure 7. We can see that the two cameras have different distributions of illuminants, which indicates that they have different spectral sensitivities.
Figure 7. Distribution of the ground truth illuminants of the two cameras from three datasets.
We train and test our illuminant estimation neural network on the three proposed datasets. Our method achieves state-of-the-art performance in terms of mean angular error between estimated and ground truth illuminants, as shown in Figure 8.
Figure 8. Results of our method compared to existing methods.
It is worth noting that our method uses two images while other methods use one image. However, our method is significantly more lightweight and more efficient. We use only a few hundred parameters to outperform methods that use millions of parameters.
So, the take-home message is that using two images instead of one allows us to achieve state-of-the-art performance in illuminant estimation with an extremely lightweight neural network. Most smartphones nowadays have two or more cameras, such as the Galaxy S20 and S21 series, which makes our method easy to deploy to such smartphones in order to achieve better illumination estimation accuracy, which results in higher quality photos.
 Finlayson, G. et al. "Colour constancy using the chromagenic constraint." IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
 Barnard, K. et al. "A comparison of computational color constancy algorithms. I: Methodology and experiments with synthesized data." IEEE Transactions on Image Processing 11.9 (2002): 972-984.
 Cheng, D. et al. "Illuminant estimation for color constancy: why spatial-domain methods work and the role of the color distribution." JOSA A 31.5 (2014): 1049-1058.