Description

Recent years have seen great interest and tremendous advances in image and video restoration and enhancement. A large number of solutions have been proposed, ranging from handcrafted designs to fully learned and generative models. Gradually, the focus in image restoration has shifted from improved fidelity of the results to improved perceptual quality. At the same time, the studied corruptions have moved from standard synthetic/artificial corruptions in controlled environments to fully realistic, in-the-wild settings, a fertile soil for developing semi-supervised and unsupervised solutions.

This tutorial covers the current state of the art in image and video restoration and enhancement, with applications to autonomous driving and smartphone cameras. It will also convey the importance of restoration and enhancement for subsequent higher-level computer vision tasks.

Talks

Radu Timofte

ETH Zurich

Abstract: Image and video restoration and enhancement tasks can be seen as image-to-image and video-to-video translations, respectively. We will review the image-to-image and video-to-video literature, going through the fully supervised setting (when corresponding pairs of images/videos are available) as well as the semi-supervised and unsupervised settings (when only unpaired images/videos are available). We will review representative image-to-image translation architectures such as pix2pix, CycleGAN, ComboGAN, StarGAN, MUNIT, and SMIT, as well as very recent video-to-video translation methods including 3D CycleGAN, RecycleGAN and UVIT.

Bio: Radu Timofte is a lecturer and research group leader in the Computer Vision Laboratory at ETH Zurich, Switzerland. He obtained a PhD degree in Electrical Engineering at the KU Leuven, Belgium in 2013, the MSc at the Univ. of Eastern Finland in 2007, and the Dipl. Eng. at the Technical Univ. of Iasi, Romania in 2006. He serves as a reviewer for top journals (such as TPAMI, TIP, IJCV, TNNLS, TCSVT, CVIU, PR) and conferences (ICCV, CVPR, ECCV, NeurIPS, ICLR), as an area editor for Elsevier's CVIU journal (from 2017), and as an associate editor for SIAM’s SIIMS (from 2020). He served as area chair for ACCV 2018 and ICCV 2019, and as SPC for IJCAI 2019 and 2020. He received a NIPS 2017 best reviewer award. His work has received several awards, including a best scientific paper award at ICPR 2012, the honorable mention award at FG 2017, and the best student paper award at BMVC 2019, and his team has won a number of challenges, including traffic sign detection (IJCNN 2013) and apparent age estimation (ICCV 2015). He is co-founder of Merantix and co-organizer of the NTIRE, CLIC, PIRM and AIM events. His current research interests include deep learning, augmented perception, domain translation, image/video compression, manipulation, restoration and enhancement.

Shuhang Gu

ETH Zurich

Abstract: Deep neural networks have achieved great success in a wide range of image restoration tasks. Given paired training data, image restoration networks can be trained to map low-quality images to high-quality ones. Until very recently, the most common approach was to train the network by minimizing the l2 loss between the output and the target image. This is a natural choice if the PSNR index is taken as the ultimate quality metric, since minimizing the l2 distance directly maximizes PSNR, and PSNR is the most commonly used measure for image restoration. However, substantial research in image quality assessment has shown that the PSNR metric aligns poorly with human perceptual quality. That is, a higher PSNR index does not necessarily imply higher perceptual quality, and vice versa. Directly maximizing PSNR by minimizing the l2 loss generally leads to blurry results, due to its close relation to the arithmetic mean estimator. The development of alternative image quality measures that correlate better with human perception is therefore an important field of research. Another branch of approaches exploits recent developments in Generative Adversarial Networks (GANs), aiming to generate photo-realistic output images that are indistinguishable from real high-quality images. This tutorial will review recent progress on both fidelity-oriented and perceptual image restoration.
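
As a rough illustration of the link between the l2 loss and PSNR described above, the following sketch computes both on flat lists of pixel values. The helper names (mse, psnr, expected_l2), the toy pixel values, and the 8-bit peak of 255 are assumptions made for this example and are not taken from the tutorial material.

```python
import math


def mse(output, target):
    """Mean squared error (the l2 loss, averaged) between two flat pixel lists."""
    if len(output) != len(target) or not output:
        raise ValueError("inputs must be non-empty and equally sized")
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


def psnr(output, target, peak=255.0):
    """PSNR in dB; it decreases monotonically as the mean squared error grows."""
    error = mse(output, target)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


def expected_l2(estimate, plausible_values):
    """Average squared error of one estimate against several plausible targets."""
    return sum((estimate - v) ** 2 for v in plausible_values) / len(plausible_values)


# Toy data (hypothetical): a lower l2 error gives a higher PSNR.
target = [10.0, 50.0, 200.0, 255.0]
close_output = [12.0, 49.0, 198.0, 250.0]
far_output = [30.0, 80.0, 160.0, 200.0]
print(psnr(close_output, target) > psnr(far_output, target))

# If a pixel could plausibly be black or white, the l2-optimal guess is the
# arithmetic mean (mid-gray), which is one way to see why l2 training blurs.
plausible = [0.0, 255.0]
blurry = sum(plausible) / len(plausible)
print(expected_l2(blurry, plausible) < expected_l2(0.0, plausible))
```

The second part is only a one-pixel caricature of the blur effect: when several sharp outputs are equally plausible, the l2 loss rewards their average rather than any one of them.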

Bio: Shuhang Gu received the B.E. degree from the School of Astronautics, Beijing University of Aeronautics and Astronautics, China, in 2010, the M.E. degree from the Institute of Pattern Recognition and Artificial Intelligence, Huazhong University of Science and Technology, China, in 2013, and the Ph.D. degree from the Department of Computing, The Hong Kong Polytechnic University, in 2017. He currently holds a post-doctoral position at ETH Zurich, Switzerland. His research interests include image restoration, enhancement and compression.

Martin Danelljan

ETH Zurich

Abstract: Many image enhancement and restoration tasks, such as image denoising and super-resolution, suffer from the unavailability or scarcity of true ground-truth data. This severely complicates the training and evaluation of methods in real settings. Instead of addressing these problems directly, the primary focus of research has been on artificially generated paired data. For example, in the case of image denoising, paired data is most commonly obtained by adding white Gaussian noise to clean images. Similarly, bicubic downsampling is applied in the context of super-resolution to obtain the corresponding low-resolution image. However, these image degradation techniques only serve as coarse approximations of their real counterparts. In reality, the degradation process is far more complex and often unknown. For example, bicubic downsampling significantly alters the image characteristics by reducing noise and other high-frequency content present in real images. Image enhancement and restoration methods trained in such artificial conditions therefore cannot be expected to generalize to the real setting. We will study the problems induced by the artificial setting when models are applied to real data. We will review methods for semi-supervised and unsupervised image enhancement and restoration, with a particular focus on the problem of real-world super-resolution, that aim at reducing or eliminating the need for artificially created paired data.
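
The artificial pairing described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the pipeline used in any of the reviewed methods: the image is a hypothetical constant gray patch, the function names are invented for the example, and plain 2x2 box averaging stands in for bicubic downsampling.

```python
import random
import statistics


def add_gaussian_noise(image, sigma, rng):
    """Return a copy of the image with white Gaussian noise added to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]


def box_downsample(image, factor):
    """Downsample by averaging factor x factor blocks (a stand-in for bicubic)."""
    height = len(image) // factor
    width = len(image[0]) // factor
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [
                image[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def noise_std(noisy, clean):
    """Standard deviation of the pixel-wise difference between two images."""
    diffs = [n - c for nr, cr in zip(noisy, clean) for n, c in zip(nr, cr)]
    return statistics.pstdev(diffs)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [[128.0] * 64 for _ in range(64)]  # hypothetical flat gray patch
noisy = add_gaussian_noise(clean, sigma=20.0, rng=rng)  # synthetic denoising pair

low_res_noisy = box_downsample(noisy, 2)  # synthetic low-resolution input
low_res_clean = box_downsample(clean, 2)

std_before = noise_std(noisy, clean)
std_after = noise_std(low_res_noisy, low_res_clean)
print(std_before, std_after)
```

Averaging four independent noise samples roughly halves the noise standard deviation, so the synthetic low-resolution image is noticeably cleaner than the source it came from. This is the kind of mismatch with real low-resolution images that the abstract points to.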

Bio: Martin Danelljan is a postdoctoral researcher at ETH Zurich, Switzerland. He received his Ph.D. degree from Linköping University, Sweden in 2018. His Ph.D. thesis was awarded the biannual Best Nordic Thesis Prize at SCIA 2019. His main research interests are online and meta-learning methods for visual tracking and video object segmentation, deep probabilistic models for image generation, and machine learning with no or limited supervision. His research in the field of visual tracking, in particular, has attracted much attention. In 2014, he won the Visual Object Tracking (VOT) Challenge and the OpenCV State-of-the-Art Vision Challenge. Furthermore, he achieved top ranks in the VOT2016 and VOT2017 challenges. He received the best paper award at ICPR 2016 and the best student paper award at BMVC 2019.

Dengxin Dai

ETH Zurich

Abstract: Adverse weather or illumination conditions create visibility problems for both people and the sensors that power automated systems \cite{vision:atmosphere}. While sensors and downstream vision algorithms are constantly getting better, their performance is mainly benchmarked on clear-weather images. However, in many outdoor applications, including autonomous driving, the ability to robustly cope with "bad" weather conditions is absolutely essential. One typical example of adverse weather conditions is fog, which degrades the visibility of a scene significantly. The denser the fog is, the more severe this problem becomes. During the past years, the community has made tremendous progress on image dehazing (defogging) and image enhancement to increase the visibility of foggy and nighttime images. The last few years have also witnessed a leap in semantic object recognition, and a great deal of effort has been devoted specifically to semantic road scene understanding. However, the extension of these techniques to adverse weather/illumination conditions has not received due attention, despite its importance in outdoor applications. This tutorial will present recent techniques for extending state-of-the-art semantic understanding algorithms from clear weather conditions to adverse weather/illumination conditions, especially foggy and nighttime driving scenarios.

Bio: Dengxin Dai is a Lecturer and Group Leader working with the Computer Vision Lab at ETH Zurich. In 2016, he obtained his PhD in Computer Vision at ETH Zurich. Since then he has been the Team Leader of TRACE-Zurich, working on Autonomous Driving within the R&D project "TRACE: Toyota Research on Automated Cars in Europe". His research interests lie in autonomous driving, robust perception in adverse weather and illumination conditions, automotive sensors and computer vision under limited supervision. He has been an organizer of the CVPR'19 Workshop Vision for All Seasons: Bad Weather and Nighttime and the ICCV'19 workshop Autonomous Driving. He has been a program committee member of several major computer vision conferences and received multiple outstanding reviewer awards. He is a guest editor for the IJCV special issue Vision for All Seasons and is an area chair for WACV 2020.

Zhiwu Huang

ETH Zurich

Abstract: Many image enhancement and restoration tasks, such as image denoising and super-resolution, suffer from the unavailability or scarcity of true paired data. This severely complicates the training and evaluation of methods in real settings. Instead of addressing these problems directly, the primary focus of research has been on artificially generated paired data. For example, in the case of image denoising, paired data is most commonly obtained by adding white Gaussian noise to clean images. Similarly, bicubic downsampling is applied in the context of super-resolution to obtain the corresponding low-resolution image. However, these image degradation techniques only serve as coarse approximations of their real counterparts. In reality, the degradation process is far more complex and often unknown. For example, bicubic downsampling significantly alters the image characteristics by reducing noise and other high-frequency content present in real images. Image enhancement and restoration methods trained in such artificial conditions therefore cannot be expected to generalize to the real setting. Here, we will study the problems induced by the artificial setting when models are applied to real data. We will review methods for semi-supervised and unsupervised image enhancement and restoration that aim at reducing or eliminating the need for artificially created paired data. Methods such as Noise2Noise \cite{noise2noise}, Cycle-in-Cycle \cite{cycleincycle}, and degradation learning \cite{deggan} will be covered.
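
The intuition usually given for training on noisy targets, as in Noise2Noise, is that an l2-trained estimate converges to the mean of its targets, so zero-mean noise in the targets averages out. The sketch below shows only this one-parameter intuition under stated assumptions: a single hypothetical pixel value, Gaussian noise, and a hand-written gradient descent loop. It is not the Noise2Noise method itself, which trains a full network on pairs of noisy images.

```python
import random


def fit_constant_l2(targets, steps=200, learning_rate=0.1):
    """Fit one scalar to the targets by gradient descent on the mean l2 loss."""
    estimate = 0.0
    for _ in range(steps):
        # Gradient of mean((estimate - t)^2) with respect to the estimate.
        gradient = sum(2.0 * (estimate - t) for t in targets) / len(targets)
        estimate -= learning_rate * gradient
    return estimate


rng = random.Random(1)  # fixed seed so the sketch is repeatable
clean_value = 100.0  # hypothetical clean pixel, never shown to the fit
noisy_targets = [clean_value + rng.gauss(0.0, 25.0) for _ in range(5000)]

estimate = fit_constant_l2(noisy_targets)
print(estimate)
```

Although every target is corrupted, the fitted value lands close to the clean one, because the minimizer of the l2 loss is the sample mean and the noise has zero mean.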

Bio: Zhiwu Huang is currently a postdoctoral researcher in the Computer Vision Lab, ETH Zurich, Switzerland. He received the PhD degree from Institute of Computing Technology, Chinese Academy of Sciences in 2015. His main research interest is in human-focussed video analysis with Riemannian manifold networks and Wasserstein generative models.

Robby T. Tan

Yale-NUS College & NUS

Abstract: Rain produces undesirable visual artefacts that can significantly impair visibility, causing many computer vision systems, such as self-driving cars, surveillance systems and autonomous drones, to break down. Rain introduces artefacts in the form of rain streaks, rain accumulation or veiling effect (visually similar to mist or fog), and raindrops adhered to the camera lens or a car's windscreen. In this tutorial, we will discuss how to restore the background information degraded by these rain artefacts and their compound problems, and thus enhance the visibility of the scene. To deal with rain streaks and rain accumulation, we will briefly discuss how the conventional non-deep-learning methods evolved, and focus more on how the more recent deep learning methods work. Most of the current deep-learning-based methods are trained in a supervised manner, provided with ground-truth data. However, obtaining real ground-truth data is extremely difficult. Therefore, existing methods rely on rendered synthetic data. The problem with this approach is that synthetic data differs significantly from real data in terms of degradation complexity, background variations, lighting variations, etc. Hence, to resolve the problem of rain streaks and rain accumulation properly, we need to go beyond synthetic training. Aside from rain streaks and rain accumulation, raindrops adhered to a glass window or camera lens can severely hamper the visibility of a background scene and degrade an image considerably. Some non-deep-learning methods have been proposed to deal with adherent raindrops, but the results are inadequate. The problem is intractable for two reasons: first, the regions occluded by raindrops are not given; second, the information about the background scene in the occluded regions is, for the most part, completely lost. To resolve the problem, a state-of-the-art method applies an attentive generative network using adversarial training. The main idea is to inject visual attention into both the generative and discriminative networks. During training, the visual attention learns about raindrop regions and their surroundings. Hence, by injecting this information, the generative network will pay more attention to the raindrop regions and the surrounding structures, and the discriminative network will be able to assess the local consistency of the restored regions.

Bio: Robby T. Tan is an Associate Professor at Yale-NUS College and also at the Electrical and Computer Engineering Department, NUS. Before coming to Singapore, he was an Assistant Professor at Utrecht University in the Netherlands, a research associate at Imperial College London, and a research scientist at NICTA/Australian National University. He received his PhD degree in Computer Science from the University of Tokyo, Japan. He has organized the Emerging Topics on Image Restoration and Enhancement (IREw) workshop in conjunction with ACCV 2014, and a Workshop on Vision for All Seasons: Bad Weather and Nighttime in conjunction with CVPR 2019. He was an area chair for ACCV 2010 and ACCV 2018. He also served as publication chair for ECCV 2016 and regularly serves as a program committee member for CVPR/ICCV/ECCV. His work on dehazing at CVPR 2008 is regarded as pioneering work in the single-image dehazing literature. His research focuses on bad weather/nighttime and physics-based vision.