Real-world photographic color image denoisers using modern deep learning architectures

Anand Bhat
May 7, 2021


Introduction

Most image denoising techniques focus on the removal of AWGN (additive white Gaussian noise). Usually, the noise is added synthetically and various techniques are used to remove it. But with the advancements in deep learning, the focus has shifted toward designing denoising architectures for real-world noisy color images. Real-world noisy images are obtained from different cameras with different settings or under low-light conditions. A corresponding clean image is obtained with a low camera ISO setting or in bright-light conditions. With clean and noisy image pairs, we can train deep convolutional architectures to denoise the images. The denoising effect is visible to the naked eye. I use the PSNR and SSIM metrics to measure denoiser performance.

Business Problem

High-quality images are not always guaranteed in photography. Sometimes images are corrupted due to low-light conditions or a slow camera shutter speed. Images can also be corrupted during transmission and compression. Denoising these low-quality images so that they match images captured under ideal conditions is a problem in very high demand.

How to map it to the DL problem?

We have image pairs: one noisy image and one clean (ground-truth) image. We train a convolutional architecture to remove the noise. This is not a classification problem: in classification, 'X' is the feature set and 'Y' is a binary or categorical value. In image denoising, 'X' is the noisy image and 'Y' is the true, clean image. We use squared loss as the loss function, since we operate at the pixel level and try to minimize the total pixel-level loss. Any modern optimizer such as Adadelta or Adam can be used.
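As a minimal sketch of this setup (the three-layer `denoiser` below is only a placeholder, and `x_train`/`y_train` are assumed to be arrays of noisy/clean patches), the Keras training configuration would look roughly like this:

```python
import tensorflow as tf

# Placeholder fully convolutional denoiser; any of the architectures discussed
# below could be dropped in here instead.
denoiser = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu",
                           input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(3, 3, padding="same"),
])

# X = noisy patches, Y = clean (ground-truth) patches; pixel-wise squared loss.
denoiser.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                 loss="mse")

# denoiser.fit(x_train, y_train, batch_size=8, epochs=50,
#              validation_data=(x_val, y_val))
```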

Measuring Metrics:

PSNR: The peak signal-to-noise ratio, in decibels, between two images. This ratio is used as a quality measurement between the original and a reconstructed image. The higher the PSNR, the better the quality of the reconstructed image.

The mean squared error (MSE) and the peak signal-to-noise ratio (PSNR) are used to compare image quality. The MSE represents the cumulative squared error between the reconstructed and the original image, whereas PSNR represents a measure of the peak error. The lower the MSE, the lower the error.

PSNR = 10 * log10(R^2 / MSE)

R = maximum possible pixel value (255 for 8-bit images)

MSE = mean squared error between the clean and noisy images
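As a quick illustration (assuming 8-bit images loaded as NumPy arrays), PSNR can be computed directly from this formula; TensorFlow's `tf.image.psnr` gives the same result:

```python
import numpy as np

def psnr(clean, noisy, max_val=255.0):
    """Peak signal-to-noise ratio in dB for images with pixel range [0, max_val]."""
    mse = np.mean((clean.astype(np.float64) - noisy.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```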

SSIM: The structural similarity index measure is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. SSIM measures the similarity between two images. The SSIM index is a full-reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as the reference.

image credits: Wikipedia
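A minimal way to measure SSIM (assuming image tensors scaled to the [0, 255] range) is TensorFlow's built-in helper:

```python
import tensorflow as tf

def ssim_score(clean, denoised, max_val=255.0):
    """Mean SSIM over a batch of images shaped (N, H, W, C)."""
    clean = tf.convert_to_tensor(clean, dtype=tf.float32)
    denoised = tf.convert_to_tensor(denoised, dtype=tf.float32)
    return tf.reduce_mean(tf.image.ssim(clean, denoised, max_val=max_val))
```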

Source of data

I collected the 'RENOIR' and 'NIND' datasets from the links below. Credit goes to the people who prepared these datasets; information about the contributors and the datasets can be found at these links.

Renoir | NIND | NIND research paper

I initially collected around 600 images from these sources. The images averaged about 30 MB each, with dimensions larger than 2500*2500. Since it is very difficult to fit such images into RAM while training, I first resized them to 256*256 and trained the models. But I later found that resizing is not a good idea, as it introduces its own noise and information loss. So I instead cut the original images into patches, which loses no information. For example, if the image size is 2560*2560, I cut it into 100 patches of 256*256, so a single image produces 100 training images. In this way, I prepared a dataset of 3,791 images for training and 577 images for testing.
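A rough sketch of this patch-cutting step (images assumed to be loaded as NumPy arrays; the helper name is my own):

```python
import numpy as np

def cut_into_patches(image, patch_size=256):
    """Split an (H, W, C) image into non-overlapping patch_size x patch_size tiles.
    Border pixels that do not fill a complete patch are dropped."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

# A 2560x2560 image yields (2560 // 256) ** 2 = 100 patches of 256x256.
```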

Data augmentation is applied to the dataset by flipping and rotating the images.
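The flips and rotations can be done with simple NumPy operations, as long as the same transform is applied to both images of a pair (a sketch under that assumption):

```python
import numpy as np

def augment_pair(noisy, clean):
    """Return the original noisy/clean pair plus flipped and rotated copies.
    Both images get the same transform so the pair stays pixel-aligned."""
    pairs = [(noisy, clean),
             (np.fliplr(noisy), np.fliplr(clean)),   # horizontal flip
             (np.flipud(noisy), np.flipud(clean))]   # vertical flip
    for k in (1, 2, 3):                              # 90/180/270 degree rotations
        pairs.append((np.rot90(noisy, k), np.rot90(clean, k)))
    return pairs
```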

An example of a noisy and clean image

Different architectures/models

Samsung-MRDNet

This architecture was used by the Samsung team in the NTIRE 2020 challenge.

Related papers
https://arxiv.org/pdf/2005.04117.pdf. This paper presents more than 10 architectures used for real-world image denoising as part of the 2020 competition. I am using the architecture that won third place.

The research paper I am using the architecture from can be found at the link below:
https://openaccess.thecvf.com/content_CVPRW_2020/papers/w31/Bao_Real_Image_Denoising_Based_on_Multi-Scale_Residual_Dense_Block_and_CVPRW_2020_paper.pdf

Explanation

MRDN is described in the paper 'Real Image Denoising Based on Multi-scale Residual Dense Block'. The architecture was proposed by the Samsung SLSI MSL team in the NTIRE 2020 Challenge on Real Image Denoising.

The Multi-scale Residual Dense Network (MRDN) is based on a new basic module, the Multi-scale Residual Dense Block (MRDB), as shown in Fig. 2 (a). MRDB combines multi-scale features from the ASPP with features from the traditional residual dense block (RDB). As shown in Fig. 2 (b), the ASPP contains four parallel network blocks: Conv 1×1, Conv Rate 6, Conv Rate 12, and pooling. Conv Rate 6 and Conv Rate 12 denote 3×3 dilated convolutions with dilation rates of 6 and 12, respectively. Conv Rate 6, Conv Rate 12, and image pooling capture the multi-scale features of the block input well. The features output from the ASPP are concatenated and compressed before being combined with the features from the RDB. To obtain a seamless local residual connection, this concatenated feature is compressed with another Conv 1×1 before an element-wise adder. The output of the MRDB preserves the same number of channels as its input to avoid an exponential increase in complexity.

With the MRDB as a building module, MRDN constructs the network in a similar way to the residual dense network (RDN), cascading the MRDBs with dense connections. Specifically, the outputs of the MRDBs are concatenated and compressed with a Conv 1×1, and a global residual connection is adopted to obtain clean features.
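The sketch below is my own hedged Keras approximation of a single MRDB following that description; the filter counts, growth rate, number of dense layers, and the fixed 256×256 patch size are assumptions, not the exact values from the paper:

```python
from tensorflow.keras import layers

def aspp(x, filters=64):
    """ASPP branch: Conv 1x1, dilated 3x3 convs (rates 6 and 12), and image pooling,
    concatenated and compressed with a 1x1 conv. Assumes a fixed patch size."""
    c1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    c6 = layers.Conv2D(filters, 3, dilation_rate=6, padding="same", activation="relu")(x)
    c12 = layers.Conv2D(filters, 3, dilation_rate=12, padding="same", activation="relu")(x)
    pool = layers.GlobalAveragePooling2D(keepdims=True)(x)
    pool = layers.Conv2D(filters, 1, activation="relu")(pool)
    pool = layers.UpSampling2D(size=(x.shape[1], x.shape[2]))(pool)  # back to H x W
    merged = layers.Concatenate()([c1, c6, c12, pool])
    return layers.Conv2D(filters, 1, padding="same")(merged)         # compress ASPP output

def mrdb(x, growth=32, n_dense=4):
    """Multi-scale residual dense block: RDB-style dense convolutions plus an ASPP
    branch, compressed with a 1x1 conv and added back to the block input."""
    feats = [x]
    for _ in range(n_dense):
        inp = feats[0] if len(feats) == 1 else layers.Concatenate()(feats)
        feats.append(layers.Conv2D(growth, 3, padding="same", activation="relu")(inp))
    dense = layers.Concatenate()(feats)
    merged = layers.Concatenate()([dense, aspp(x)])
    merged = layers.Conv2D(x.shape[-1], 1, padding="same")(merged)   # match input channels
    return layers.Add()([x, merged])                                 # local residual connection
```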

Model created using Keras library

GitHub - find the complete code here

Result

In the diagram below, the denoising effect can be seen in the image predicted by the above model.

Model MWRCAnet

The above architecture for denoising was proposed by the Baidu Research Vision and HITVPC&HUAWEI teams.

https://arxiv.org/pdf/2005.04117.pdf. This paper presents more than 10 architectures used for real-world image denoising as part of the 2020 competition. I am using the architecture that won second place, shown above. It includes a special block called the residual channel attention block. Read more about it here.
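A rough Keras sketch of a residual channel attention block as I understand it from the referenced papers (the reduction ratio and filter count are assumptions, and the block input is assumed to already have `filters` channels):

```python
from tensorflow.keras import layers

def residual_channel_attention_block(x, filters=64, reduction=16):
    """Two 3x3 convs followed by a channel-attention gate, added back to the input."""
    feat = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    feat = layers.Conv2D(filters, 3, padding="same")(feat)

    # Channel attention: global average pool -> bottleneck -> per-channel sigmoid gate.
    gate = layers.GlobalAveragePooling2D(keepdims=True)(feat)
    gate = layers.Conv2D(filters // reduction, 1, activation="relu")(gate)
    gate = layers.Conv2D(filters, 1, activation="sigmoid")(gate)

    feat = layers.Multiply()([feat, gate])
    return layers.Add()([x, feat])
```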

Model created using Keras library

Result

In the diagram below, the denoising effect can be seen in the image predicted by the above model.

Model EDSR (Enhanced Deep Residual Network):

Reference for the above architecture: https://arxiv.org/pdf/1707.02921.pdf
Concept: This network was originally developed to increase the quality of resized images when upscaling them back to a higher resolution. I modified the architecture for denoising photographic images.
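A hedged sketch of how I understand this modification: EDSR-style residual blocks without batch normalization, with the super-resolution upsampling tail removed so the output has the same size as the input (the filter count, block count, and residual scaling here are assumptions):

```python
from tensorflow.keras import Input, Model, layers

def edsr_res_block(x, filters=64, scaling=0.1):
    """EDSR residual block: conv-ReLU-conv with no batch norm and a scaled residual."""
    out = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(filters, 3, padding="same")(out)
    out = layers.Lambda(lambda t: t * scaling)(out)
    return layers.Add()([x, out])

def build_edsr_denoiser(num_blocks=16, filters=64):
    """EDSR adapted for denoising: no upsampling, so the output matches the input size."""
    inp = Input(shape=(None, None, 3))
    head = layers.Conv2D(filters, 3, padding="same")(inp)
    x = head
    for _ in range(num_blocks):
        x = edsr_res_block(x, filters)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Add()([x, head])                    # global residual connection
    out = layers.Conv2D(3, 3, padding="same")(x)   # 3-channel denoised output
    return Model(inp, out)
```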

Model code using Keras library

Result

In the diagram below, the denoising effect can be seen in the image predicted by the above model.

PSNR improvements

As shown in the above diagram, the mwrcanet architecture shows the highest improvement in PSNR.

SSIM improvement

As shown in the above diagram, samsung_mrdnet shows the highest improvement in terms of SSIM.

Things I tried:

  • I tried various initial learning rates with the Adam optimizer; 0.0001 worked best.
  • Tried three different architectures, each based on different research.
  • Initially, I used the images after resizing them, but resizing causes information loss. So I cut the original images into patches and used those for training, which improved the results considerably.
  • For example, for a 3000*3000 image I obtained 100 patches of 300*300 from a single image, so no information is lost to resizing.
  • As the MRDN model was overfitting, I used regularization and dropout.
  • Used newer concepts such as the PReLU activation and DWT/IWT (discrete and inverse wavelet transforms) with the mwrcanet model; a wavelet-transform sketch follows this list.
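For the wavelet transforms, the sketch below is my own simplified Haar-style DWT/IWT on image tensors, written for illustration only; it is not the exact implementation used in mwrcanet:

```python
import tensorflow as tf

def dwt_haar(x):
    """2D Haar DWT on an (N, H, W, C) tensor: returns LL, LH, HL, HH subbands of
    shape (N, H/2, W/2, C) packed along the channel axis (H and W must be even)."""
    x00 = x[:, 0::2, 0::2, :]
    x01 = x[:, 0::2, 1::2, :]
    x10 = x[:, 1::2, 0::2, :]
    x11 = x[:, 1::2, 1::2, :]
    ll = (x00 + x01 + x10 + x11) / 2.0
    lh = (-x00 - x01 + x10 + x11) / 2.0
    hl = (-x00 + x01 - x10 + x11) / 2.0
    hh = (x00 - x01 - x10 + x11) / 2.0
    return tf.concat([ll, lh, hl, hh], axis=-1)

def iwt_haar(y):
    """Inverse of dwt_haar: rebuilds the full-resolution (N, H, W, C) tensor."""
    ll, lh, hl, hh = tf.split(y, 4, axis=-1)
    x00 = (ll - lh - hl + hh) / 2.0
    x01 = (ll - lh + hl - hh) / 2.0
    x10 = (ll + lh - hl - hh) / 2.0
    x11 = (ll + lh + hl + hh) / 2.0
    s = tf.shape(ll)
    n, h2, w2, c = s[0], s[1], s[2], s[3]
    top = tf.reshape(tf.stack([x00, x01], axis=3), [n, h2, w2 * 2, c])     # even rows
    bottom = tf.reshape(tf.stack([x10, x11], axis=3), [n, h2, w2 * 2, c])  # odd rows
    return tf.reshape(tf.stack([top, bottom], axis=2), [n, h2 * 2, w2 * 2, c])
```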

Conclusion

Good results were obtained for all three models. In terms of PSNR, mwrcanet beat every other architecture. In terms of SSIM, Samsung-MRDN beat every other architecture. To the human eye, the results produced by the mwrcanet architecture are very close to the clean image. The results obtained from the modified EDSR architecture are also very good and close to the top architectures, and I consider it the baseline model.

Further scope

  1. Here, all three color channels are fed to the model at the same time and a denoised image is obtained. We could instead feed each channel separately, obtain the corresponding denoised channel, and then combine the results. That way we could either learn separate weights per channel, or feed every channel through a single architecture, which triples the number of training samples.
  2. I cut the original images into patches, but I have not recombined them. One can denoise each patch and then stitch the patches back together to produce one large image; a sketch of this is shown after this list.
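A minimal sketch of that stitching step (the inverse of the patch cutting shown earlier; non-overlapping patches in row-major order are assumed):

```python
import numpy as np

def combine_patches(patches, image_height, image_width):
    """Reassemble non-overlapping square tiles, given in row-major order,
    into a single (image_height, image_width, C) image."""
    patch = patches[0].shape[0]
    channels = patches[0].shape[2]
    cols = image_width // patch
    image = np.zeros((image_height, image_width, channels), dtype=patches[0].dtype)
    for idx, tile in enumerate(patches):
        row, col = divmod(idx, cols)
        image[row * patch:(row + 1) * patch, col * patch:(col + 1) * patch] = tile
    return image
```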

Watch the video below for the application demo.

References

https://arxiv.org/pdf/1707.02921.pdf

Applied ai course

https://arxiv.org/pdf/2005.04117.pdf

Image super-resolution

https://openaccess.thecvf.com/content_CVPRW_2020/papers/w31/Bao_Real_Image_Denoising_Based_on_Multi-Scale_Residual_Dense_Block_and_CVPRW_2020_paper.pdf

