Sensor-Independent Illumination Estimation for DNN Models

Mahmoud Afifi1 and Michael S. Brown1,2

1York University, 2Samsung Research


Abstract

While modern deep neural networks (DNNs) achieve state-of-the-art results for illuminant estimation, it is currently necessary to train a separate DNN for each type of camera sensor. This means that when a camera manufacturer uses a new sensor, an existing DNN model must be re-trained with training images captured by the new sensor. This paper addresses this problem by introducing a novel sensor-independent illuminant estimation framework. Our method learns a sensor-independent working space that can be used to canonicalize the RGB values of any arbitrary camera sensor. Our learned space retains the linear property of the original sensor raw-RGB space and allows a single DNN model trained in this working space to be applied to unseen camera sensors. We demonstrate the effectiveness of this approach on several different camera sensors and show that it provides performance on par with state-of-the-art methods that were trained per sensor.

Figure
(A) Traditional learning-based illuminant estimation methods train or fine-tune a model per camera sensor. (B) Our method can be trained on images captured by different camera sensors and generalizes well to unseen camera sensors.

Solution

We introduce a sensor-independent learning framework for illuminant estimation. The idea is similar to the color space conversion process applied onboard cameras, which maps sensor-specific RGB values to a perceptual color space, namely CIE XYZ. This conversion estimates a color space transform (CST) matrix that maps white-balanced sensor-specific raw-RGB images to CIE XYZ. It is applied onboard cameras after the illuminant estimation and white-balance step, and it relies on the estimated scene illuminant to compute the CST matrix. Our solution, instead, is to learn a new space that is used before the illuminant estimation step.
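For context, the sketch below (Python/NumPy, with illustrative function and variable names of our own) shows the conventional onboard order of operations: the CST can only be applied after the illuminant has been estimated and the image has been white balanced.

```python
import numpy as np

# Sketch of the conventional onboard order of operations (not our method):
# the CST is applied only AFTER white balance, so it depends on the
# estimated scene illuminant. Function and variable names are illustrative.

def white_balance(raw_rgb, illuminant):
    """Diagonal (von Kries-style) correction by the estimated illuminant."""
    illuminant = np.asarray(illuminant, dtype=np.float64)
    illuminant = illuminant / illuminant[1]          # normalize to the green channel
    return raw_rgb / illuminant[np.newaxis, np.newaxis, :]

def apply_cst(wb_rgb, cst_matrix):
    """Map a white-balanced raw-RGB image (HxWx3) to CIE XYZ with a 3x3 CST."""
    h, w, _ = wb_rgb.shape
    return (wb_rgb.reshape(-1, 3) @ np.asarray(cst_matrix).T).reshape(h, w, 3)

# Usage (hypothetical inputs):
#   xyz_img = apply_cst(white_balance(raw_img, est_illuminant), cst_matrix)
```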

Figure
A scene captured by two different camera sensors results in different ground truth illuminants due to the different camera sensor responses. We learn a device-independent working space that reduces the difference between the ground truth illuminants of the same scene.


Specifically, we design a novel unsupervised deep learning framework that learns how to map each input image, captured by an arbitrary camera sensor, to a non-perceptual, sensor-independent working space. Mapping input images to this space allows us to train our model on training sets captured by different camera sensors, achieving good accuracy and generalizing well to unseen camera sensors.

Figure
Our proposed method consists of two networks: (i) a sensor mapping network and (ii) an illuminant estimation network. The two networks are trained jointly in an end-to-end manner to learn an image-specific mapping matrix (produced by the sensor mapping network) and the scene illuminant in the learned space (produced by the illuminant estimation network). The final estimated illuminant is obtained by mapping the resulting illuminant from our learned space back to the input image's camera-specific raw space.
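A minimal PyTorch-style sketch of this two-network design is given below. The layer sizes, the thumbnail input, and the identity offset on the predicted matrix are placeholders of our own, not the paper's exact architecture; the estimate is mapped back to the camera raw space with the inverse of the learned matrix.

```python
import torch
import torch.nn as nn

class SensorIndependentIE(nn.Module):
    """Sketch of the two-network design described above. Layer sizes, the
    thumbnail input, and the identity offset on the predicted matrix are our
    own placeholders, not the paper's exact architecture."""

    def __init__(self, thumb_size=64):
        super().__init__()
        in_dim = 3 * thumb_size * thumb_size
        # (i) sensor mapping network: predicts an image-specific 3x3 matrix
        self.mapping_net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, 9))
        # (ii) illuminant estimation network: predicts the illuminant in the learned space
        self.illum_net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, 3))

    def forward(self, thumb):  # thumb: (B, 3, thumb_size, thumb_size) raw-RGB thumbnail
        b = thumb.shape[0]
        # Predict the mapping matrix; adding the identity keeps it well-conditioned
        # early in training (a stabilizing choice of this sketch).
        m = self.mapping_net(thumb.flatten(1)).view(b, 3, 3) + torch.eye(3, device=thumb.device)
        mapped = torch.einsum('bij,bjhw->bihw', m, thumb)    # image in the learned space
        illum = self.illum_net(mapped.flatten(1))            # illuminant in the learned space
        # Map the estimate back to the camera-specific raw space via the inverse matrix.
        illum_raw = torch.einsum('bij,bj->bi', torch.linalg.inv(m), illum)
        return illum_raw, m
```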

Results

We used a leave-one-out cross-validation scheme to evaluate our method on all cameras in the NUS 8-Camera, Gehler-Shi, and Cube+ datasets. We further tested our trained models on the Cube+ challenge and the INTEL-TUT dataset. Read our paper for more results.
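The tables below report angular error between the estimated and ground truth illuminants. For reference, a minimal NumPy sketch of this standard metric is shown here (our own illustration, not code from the paper).

```python
import numpy as np

def angular_error(est, gt):
    """Angle (in degrees) between estimated and ground truth illuminant vectors."""
    est, gt = np.asarray(est, np.float64), np.asarray(gt, np.float64)
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: angular_error([0.9, 1.0, 0.7], [1.0, 1.0, 0.65])  # roughly 4 degrees
```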

Figure
Qualitative results of our method. (A) Input raw-RGB images. (B) After mapping images in (A) to the learned space. (C) After correcting images in (A) based on our estimated illuminants. (D) Corrected by ground truth illuminants.

Angular errors on NUS 8-Camera and Gehler-Shi datasets. For more information, please read our paper.
Methods                          | NUS 8-Camera                       | Gehler-Shi
                                 | Mean  Median  Best 25%  Worst 25%  | Mean  Median  Best 25%  Worst 25%
Avg. sensor-independent methods  | 4.26  3.25    0.99      9.43       | 5.10  4.03    1.91      10.77
Avg. sensor-dependent methods    | 2.40  1.64    0.50      5.75       | 2.62  1.75    0.50      5.95
Ours                             | 2.05  1.50    0.52      4.48       | 2.77  1.93    0.55      6.53

Angular errors on Cube/Cube+ datasets. For more information, please read our paper.
Methods                          | Cube                               | Cube+
                                 | Mean  Median  Best 25%  Worst 25%  | Mean  Median  Best 25%  Worst 25%
Avg. sensor-independent methods  | 3.57  2.47    0.64      8.30       | 4.98  3.32    0.82      11.77
Avg. sensor-dependent methods    | 1.54  0.92    0.26      3.85       | 2.04  1.02    0.25      5.58
Ours                             | 1.98  1.36    0.40      4.64       | 2.14  1.44    0.44      5.06

Files
Paper | Supplementary Materials | Presentation (PPSX | PPTX) | Source Code

BibTeX
@inproceedings{Afifi2019Sensor,
  title     = {Sensor-Independent Illumination Estimation for DNN Models},
  author    = {Afifi, Mahmoud and Brown, Michael S.},
  booktitle = {British Machine Vision Conference (BMVC)},
  year      = {2019},
}

This page contains files that could be protected by copyright. They are provided here for reasonable academic fair use.
Copyright © 2019