23 October 2022, Tel Aviv, Israel

AIM 2022

Advances in Image Manipulation workshop

in conjunction with ECCV 2022

Join the AIM 2022 workshop online via Zoom for live talks, Q&A, and interaction.

The event starts on 23.10.2022 at 02:00 EDT / 06:00 UTC / 09:00 Israel time / 14:00 China time.
Check the AIM 2022 schedule.
The recording of the whole AIM 2022 event:


Sponsors







Call for papers

Image manipulation is a key computer vision task, aiming at the restoration of degraded image content, the filling in of missing information, or the transformation and/or manipulation needed to achieve a desired target (with respect to perceptual quality, content, or the performance of applications working on such images). Recent years have witnessed an increased interest from the vision and graphics communities in these fundamental topics of research. Not only has there been a constantly growing flow of related papers, but substantial progress has also been achieved.

Each step forward eases the use of images by people or computers for the fulfillment of further tasks, as image manipulation serves as an important frontend. Not surprisingly, then, there is an ever-growing range of applications in fields such as surveillance, the automotive industry, electronics, remote sensing, and medical image analysis. The emergence and ubiquitous use of mobile and wearable devices offer another fertile ground for additional applications and faster methods.

This workshop aims to provide an overview of the new trends and advances in these areas. Moreover, it will offer an opportunity for academic and industrial attendees to interact and explore collaborations.

This workshop builds upon the success of the Advances in Image Manipulation (AIM) workshops at ICCV 2021, ECCV 2020, and ICCV 2019, the Mobile AI (MAI) workshops at CVPR 2022 and CVPR 2021, the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018, the Workshop and Challenge on Learned Image Compression (CLIC) editions at CVPR 2018, CVPR 2019, CVPR 2020, CVPR 2021, and CVPR 2022, and the New Trends in Image Restoration and Enhancement (NTIRE) editions at CVPR 2017, 2018, 2019, 2020, 2021, and 2022 and at ACCV 2016. Moreover, it relies on the people associated with the PIRM, CLIC, MAI, AIM, and NTIRE events, such as organizers, PC members, distinguished speakers, authors of published papers, challenge participants, and winning teams.

Papers addressing topics related to image/video manipulation, restoration and enhancement are invited. The topics include, but are not limited to:

  • Image-to-image translation
  • Video-to-video translation
  • Image/video manipulation
  • Perceptual manipulation
  • Image/video generation and hallucination
  • Image/video quality assessment
  • Image/video semantic segmentation
  • Perceptual enhancement
  • Multimodal translation
  • Depth estimation
  • Saliency and gaze estimation
  • Image/video inpainting
  • Image/video deblurring
  • Image/video denoising
  • Image/video upsampling and super-resolution
  • Image/video filtering
  • Image/video de-hazing, de-raining, de-snowing, etc.
  • Demosaicing
  • Image/video compression
  • Removal of artifacts, shadows, glare and reflections, etc.
  • Image/video enhancement: brightening, color adjustment, sharpening, etc.
  • Style transfer
  • Hyperspectral imaging
  • Underwater imaging
  • Aerial and satellite imaging
  • Methods robust to changing weather conditions / adverse outdoor conditions
  • Image/video manipulation on mobile devices
  • Image/video restoration and enhancement on mobile devices
  • Studies and applications of the above.

Important dates



Challenges (all deadlines at 23:59 CET)

  • Site online: May 23, 2022
  • Release of train data and validation data: May 24, 2022
  • Validation server online: June 1, 2022
  • Final test data release, validation server closed: July 23, 2022
  • Test phase submission deadline: July 30, 2022
  • Fact sheets, code/executable submission deadline: July 30, 2022
  • Preliminary test results release to the participants: August 2, 2022
  • Paper submission deadline for entries from the challenges: August 14, 2022 (EXTENDED)

Workshop (all deadlines at 23:59 CET)

  • Paper submission deadline: July 31, 2022 (EXTENDED)
  • Paper submission deadline (only for methods from the AIM 2022 and Mobile AI 2022 challenges and papers reviewed elsewhere): August 14, 2022 (EXTENDED)
  • Paper decision notification: August 15, 2022
  • Camera-ready deadline: August 22, 2022
  • Workshop day: October 23, 2022 (VIRTUAL)

Submit



Instructions and Policies
Format and paper length

A paper submission has to be in English, in PDF format, and at most 14 pages (excluding references) in single-column ECCV style. The paper format must follow the same guidelines as for all ECCV 2022 submissions.
AIM 2022 and ECCV 2022 author guidelines

Double-blind review policy

The review process is double blind. Authors do not know the names of the chair/reviewers of their papers. Reviewers do not know the names of the authors.

Dual submission policy

Dual submission is not allowed. If a paper is also submitted to ECCV and accepted, it cannot be published at both ECCV and the workshop.

Submission site

https://cmt3.research.microsoft.com/AIMWC2022/

Proceedings

Accepted and presented papers will be published after the conference in the ECCV Workshops proceedings, together with the ECCV 2022 main conference papers.

Author Kit

The author kit provides a LaTeX2e template for paper submissions.
Please refer to the example for detailed formatting instructions.

People



Organizers (TBU)

  • Radu Timofte, University of Wurzburg and ETH Zurich
  • Andrey Ignatov, AI Benchmark and ETH Zurich
  • Ren Yang, ETH Zurich
  • Marcos V. Conde, University of Wurzburg
  • Furkan Kınlı, Özyeğin University


PC Members (TBU)

  • Codruta Ancuti, UPT
  • Boaz Arad, Ben-Gurion University of the Negev
  • Siavash Arjomand Bigdeli, DTU
  • Michael S. Brown, York University
  • Jianrui Cai, The Hong Kong Polytechnic University
  • Chia-Ming Cheng, MediaTek
  • Cheng-Ming Chiang, MediaTek
  • Sunghyun Cho, Samsung
  • Marcos V. Conde, University of Wurzburg
  • Chao Dong, SIAT
  • Weisheng Dong, Xidian University
  • Touradj Ebrahimi, EPFL
  • Paolo Favaro, University of Bern
  • Graham Finlayson, University of East Anglia
  • Corneliu Florea, University Politehnica of Bucharest
  • Bastian Goldluecke, University of Konstanz
  • Shuhang Gu, OPPO & University of Sydney
  • Christine Guillemot, INRIA
  • Felix Heide, Princeton University & Algolux
  • Chiu Man Ho, OPPO
  • Hiroto Honda, Mobility Technologies Co Ltd.
  • Andrey Ignatov, ETH Zurich
  • Aggelos Katsaggelos, Northwestern University
  • Jan Kautz, NVIDIA
  • Furkan Kınlı, Özyeğin University
  • Christian Ledig, University of Bamberg
  • Seungyong Lee, POSTECH
  • Kyoung Mu Lee, Seoul National University
  • Juncheng Li, The Chinese University of Hong Kong
  • Yawei Li, ETH Zurich
  • Stephen Lin, Microsoft Research
  • Guo Lu, Beijing Institute of Technology
  • Kede Ma, City University of Hong Kong
  • Vasile Manta, Technical University of Iasi
  • Rafal Mantiuk, University of Cambridge
  • Zibo Meng, OPPO
  • Yusuke Monno, Tokyo Institute of Technology
  • Hajime Nagahara, Osaka University
  • Vinay P. Namboodiri, University of Bath
  • Federico Perazzi, Bending Spoons
  • Fatih Porikli, Qualcomm CR&D
  • Antonio Robles-Kelly, Deakin University
  • Aline Roumy, INRIA
  • Christopher Schroers, Disney Research | Studios
  • Nicu Sebe, University of Trento
  • Eli Shechtman, Creative Intelligence Lab at Adobe Research
  • Gregory Slabaugh, Queen Mary University of London
  • Sabine Süsstrunk, EPFL
  • Yu-Wing Tai, Kuaishou Technology & HKUST
  • Robby T. Tan, Yale-NUS College
  • Masayuki Tanaka, Tokyo Institute of Technology
  • Hao Tang, ETH Zurich
  • Qi Tian, Huawei Cloud & AI
  • Radu Timofte, University of Wurzburg & ETH Zurich
  • George Toderici, Google
  • Luc Van Gool, ETH Zurich & KU Leuven
  • Longguang Wang, National University of Defense Technology
  • Yingqian Wang, National University of Defense Technology
  • Gordon Wetzstein, Stanford University
  • Ming-Hsuan Yang, University of California at Merced & Google
  • Ren Yang, ETH Zurich
  • Wenjun Zeng, Microsoft Research
  • Kai Zhang, ETH Zurich
  • Yulun Zhang, ETH Zurich
  • Jun-Yan Zhu, Carnegie Mellon University
  • Wangmeng Zuo, Harbin Institute of Technology

Invited Talks



Sabine Süsstrunk

EPFL

Title: Uncovering local semantics in CNNs and GANs

Abstract: Automatically localizing similar semantic concepts within an image or a set of images enables many applications, such as image segmentation, localization, and image editing. We propose Deep Feature Factorization (DFF), a method capable of detecting hierarchical cluster structures in the feature space of a convolutional neural network (CNN). These clusters are visualized as heat maps, which highlight semantically matching regions across a set of images, revealing what the network 'perceives' as similar. Analogous structures can also be found in generative adversarial networks (GANs). Focusing on StyleGAN, we introduce two simple and effective methods for making local, semantically aware edits to a GAN output image. Our methods require neither supervision from an external model nor complex spatial morphing operations. Semantic editing is demonstrated on a variety of scenes, and even on real photographs using GAN inversion.
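For readers unfamiliar with the mechanics behind such heat maps, the minimal sketch below shows one way to factorize CNN activations of an image set into semantic spatial factors, using non-negative matrix factorization over VGG19 features with PyTorch and scikit-learn. The backbone, layer choice, and number of factors (k = 4) are illustrative assumptions, not the speaker's exact setup.

```python
# Minimal sketch (assumed setup): factorize CNN activations of a set of images
# into k spatial "concept" heat maps via non-negative matrix factorization.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.decomposition import NMF
from PIL import Image

def concept_heatmaps(image_paths, k=4):
    cnn = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
    prep = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                      T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
    feats = []
    with torch.no_grad():
        for path in image_paths:
            x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
            feats.append(cnn(x)[0])          # (C, h, w), non-negative ReLU features
    C, h, w = feats[0].shape
    # Stack all spatial positions of all images into one (N*h*w, C) matrix.
    A = torch.cat([f.reshape(C, -1).t() for f in feats], dim=0).numpy()
    W = NMF(n_components=k, init="nndsvda", max_iter=400).fit_transform(A)
    # Slice the factor weights back into per-image (h, w, k) heat maps.
    return [W[i * h * w:(i + 1) * h * w].reshape(h, w, k)
            for i in range(len(image_paths))]
```

Because ReLU activations are non-negative, the factorization yields additive spatial factors that can be upsampled to image resolution and overlaid as heat maps, with the same factor tending to respond to semantically matching regions across the image set.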

Bio: Sabine Süsstrunk is Full Professor and Director of the Image and Visual Representation Lab in the School of Computer and Communication Sciences (IC) at the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Her main research areas are computational imaging, computer vision, machine learning, and computational image quality and aesthetics. She is President of the Swiss Science Council SSC, Founding Member and Member of the Board of the EPFL-WISH (Women in Science and Humanities) Foundation, board member of the SRG SSR (Swiss Radio and Television Corporation), and Co-Founder and board member of Largo Films Ltd. Sabine is a Fellow of IEEE, IS&T, and the Swiss Academy of Engineering Sciences (SATW).

Felix Heide

Princeton University & Algolux

Title: The Differentiable Camera: Designing Cameras to Detect the Invisible

Abstract: Although today's cameras fuel diverse applications, from personal photography to self-driving vehicles, they are designed in a compartmentalized fashion, where the optics, sensor, image processing pipeline, and vision models are often devised in isolation: a camera design is decided by intermediate metrics describing optical performance, signal-to-noise ratio, and image quality, even though only object detection scores may matter for the camera's application. In this talk, I will present a differentiable camera architecture, including compound optics, sensing and exposure control, image processing, and downstream vision models. This architecture allows us to learn cameras akin to neural networks, entirely guided by downstream loss functions. Learned cameras move computation to the optics, with entirely different optical stacks for different vision tasks (beating existing stacks such as Tesla's Autopilot). The approach allows us to learn entirely new domain-specific cameras that perform imaging and vision tasks jointly, and to learn active illumination together with the image pipeline, achieving accurate dense depth and vision tasks in heavy fog, snow, and rain (beating scanning lidar methods). Finally, I will describe an approach that makes the scene itself differentiable, allowing us to backpropagate gradients through the entire capture and processing chain in an inverse rendering fashion. As such, the proposed novel breed of learned cameras brings unprecedented capabilities in optical design, imaging, and vision.

Bio: Felix Heide is an Assistant Professor at Princeton University and Co-Founder and Chief Technology Officer of self-driving vehicle startup Algolux. He is researching the theory and application of computational imaging. As such, Felix's work lies at the intersection of optics, machine learning, optimization, computer graphics, and computer vision. Felix received his Ph.D. from the University of British Columbia. He obtained his MSc from the University of Siegen and was a postdoc at Stanford University. His doctoral dissertation won the Alain Fournier Ph.D. Dissertation Award and the SIGGRAPH outstanding doctoral dissertation award. He won an NSF CAREER Award and Sony Young Faculty Award 2021. He co-founded the autonomous driving startup Algolux, and his research is directly used by Waymo, Cruise, and Google as part of their autonomous vehicle programs.

Gordon Wetzstein

Stanford University

Title: Efficient Neural Scene Representation, Rendering, and Generation

Abstract: Neural radiance fields and scene representation networks offer unprecedented capabilities for photorealistic scene representation, view interpolation, and many other tasks. In this talk, we discuss expressive scene representation network architectures, efficient neural rendering approaches, and generalization strategies that allow us to generate photorealistic, multi-view-consistent humans or cats using state-of-the-art 3D GANs.

Bio: Gordon Wetzstein is an Associate Professor of Electrical Engineering and, by courtesy, of Computer Science at Stanford University. He is the leader of the Stanford Computational Imaging Lab and a faculty co-director of the Stanford Center for Image Systems Engineering. At the intersection of computer graphics and vision, artificial intelligence, computational optics, and applied vision science, Prof. Wetzstein's research has a wide range of applications in next-generation imaging, wearable computing, and neural rendering systems. Prof. Wetzstein is the recipient of numerous awards, including an NSF CAREER Award, an Alfred P. Sloan Fellowship, an ACM SIGGRAPH Significant New Researcher Award, a Presidential Early Career Award for Scientists and Engineers (PECASE), an SPIE Early Career Achievement Award, an Electronic Imaging Scientist of the Year Award, an Alain Fournier Ph.D. Dissertation Award as well as many Best Paper and Demo Awards.

Bjorn Ommer

University of Munich

Title: Stable Diffusion++: Democratizing Visual Synthesis

Abstract: The ultimate goal of computer vision is models that can understand our (visual) world. Recent deep generative models for visual synthesis open up new avenues towards scene understanding by providing accurate representations of both the rich details and the diversity featured by large, heterogeneous image datasets. Still, they exhibit specific limitations that restrict their applicability and performance, especially in complex tasks such as high-resolution image synthesis. We will discuss a solution, latent diffusion models, a.k.a. "Stable Diffusion", that significantly improves the efficiency of diffusion models. Now billions of training samples can be summarized in compact representations of just a few gigabytes, so that the approach runs even on consumer GPUs, making high-quality visual synthesis accessible to everyone. We will then discuss ongoing work that shows what we had better NOT learn to represent with generative models, proposing a semi-parametric approach instead.

Bio: Björn Ommer is a full professor at the University of Munich, where he heads the Machine Vision and Learning Group. Before that, he was a full professor in the Department of Mathematics and Computer Science at Heidelberg University. He received his diploma in computer science from the University of Bonn and his PhD from ETH Zurich. Thereafter, he was a postdoc in the vision group of Jitendra Malik at UC Berkeley. Björn serves as an associate editor for IEEE T-PAMI. His research interests include semantic scene understanding, visual synthesis and retrieval, self-supervised metric and representation learning, and explainable AI. Moreover, he applies this basic research in interdisciplinary projects within the digital humanities and the life sciences.

Bo Zhu

Meta Reality Labs

Title: Practical considerations for on-device denoising

Abstract: This talk addresses some practical considerations of ML-based denoising in on-device contexts. We will discuss rapid and inexpensive methods for photon transfer curve characterization without the need for flat-field illuminators and integrating spheres, the importance of variance stabilization transforms in quantized, computationally efficient models, and the compression of raw Bayer data for offloaded inference scenarios.
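As a small, self-contained illustration of the variance-stabilization point, the sketch below applies a generalized Anscombe-style transform to raw sensor values before handing them to a quantized denoiser. The gain and read-noise constants and the `denoiser` callable are placeholder assumptions for the sketch, not parameters or models from the talk.

```python
# Minimal sketch: variance-stabilizing transform (generalized Anscombe) applied
# to raw Bayer data before an on-device quantized denoiser. GAIN and READ_NOISE
# are illustrative placeholders, not measured camera parameters.
import numpy as np

GAIN = 2.2          # electrons per DN (assumed, e.g. from a photon transfer curve fit)
READ_NOISE = 1.6    # read noise in electrons (assumed)

def vst_forward(raw_dn):
    """Map shot + read noise to approximately unit, signal-independent variance."""
    electrons = raw_dn * GAIN
    return 2.0 * np.sqrt(np.maximum(electrons + 3.0 / 8.0 + READ_NOISE ** 2, 0.0))

def vst_inverse(stabilized):
    """Simple algebraic inverse (closed-form unbiased inverses exist, but this
    keeps the sketch short); returns values back in DN."""
    electrons = (stabilized / 2.0) ** 2 - 3.0 / 8.0 - READ_NOISE ** 2
    return electrons / GAIN

def denoise_raw(raw_dn, denoiser):
    """`denoiser` stands in for any quantized model expecting inputs whose
    noise level does not depend on scene brightness."""
    z = vst_forward(raw_dn.astype(np.float32))
    z_hat = denoiser(z)
    return vst_inverse(z_hat)
```

After stabilization the noise is roughly constant across brightness levels, which makes the input range easier to cover with the limited dynamic range of a low-bit quantized network than raw data whose shot-noise variance grows with signal.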

Bio: Bo Zhu is a Research Scientist Manager of AI at Meta Reality Labs, leading a computer vision research team to advance imaging performance in AR & VR devices. Prior to joining Meta, Bo was cofounder and CTO of BlinkAI, a spinoff from ML image reconstruction research (Nature, 2018) he led as a postdoc at Harvard, whose AI video denoising products were commercially deployed on tens of millions of devices globally (including DXOMark’s #1 ranked camera phone in 2021, Mi 11 Ultra) and awarded 2021 Product of the Year at Embedded Vision Summit. He obtained his S.B., M.Eng., and Ph.D. from MIT and performed research at the Martinos Center for Biomedical Imaging and CSAIL.

Pratul Srinivasan

Google Research

Title: Neural Inverse Rendering

Abstract: Recent progress in neural 3D object and scene representations such as Neural Radiance Fields (NeRFs) are bringing us closer to tackling the longstanding inverse rendering problem of jointly estimating geometry, materials, and lighting from observed images. In this talk, I will discuss how current neural field approaches to inverse rendering can be thought of as lying on a spectrum of simulation vs. precomputation, and how this suggests potentially interesting and fruitful research directions.

Bio: Pratul Srinivasan is a research scientist at Google Research, where he works on problems at the intersection of computer vision and graphics. His recent work has focused on view synthesis, inverse rendering, and 3D reconstruction. He completed his PhD at UC Berkeley in 2020, where he was advised by Ravi Ramamoorthi and Ren Ng and supported by an NSF Graduate Fellowship. His research has been recognized by the David J. Sakrison Memorial Prize from UC Berkeley in 2020, the ACM Doctoral Dissertation Award Honorable Mention in 2022, the ECCV Best Paper Honorable Mention Award in 2020, the ICCV Best Paper Honorable Mention Award in 2021, and the CVPR Best Student Paper Honorable Mention Award in 2022.


All accepted AIM workshop papers have an oral presentation.
All accepted AIM workshop papers are published by Springer under the book title "European Conference on Computer Vision Workshops (ECCVW)".




Papers (PDF, supplementary material) are available at https://eccv2022.ecva.net/


[06:10 UTC] 09:10 Realistic Bokeh Effect Rendering on Mobile GPUs, Mobile AI & AIM 2022 challenge: Report (# 369 )
Andrey Ignatov (ETH Zurich)*; Radu Timofte (University of Wurzburg & ETH Zurich) et al.
[video][poster][page]
[06:20 UTC] 09:20 Adaptive Mask-Based Pyramid Network for Realistic Bokeh Rendering (# 343 )
Kostas Georgiadis (CERTH/ITI); Albert Saà-Garriga (Samsung R&D UK); Mehmet Kerim Yücel (Samsung R&D UK)*; Anastasios Drosou (CERTH-ITI); Bruno Manganelli (Samsung Research UK)
[video][poster][page]
[06:25 UTC] 09:25 Bokeh-Loss GAN: Multi-stage Adversarial Training for Realistic Edge-Aware Bokeh (# 356 )
Brian J Lee (SenseBrain Technology Limited)*; Fei Lei (Tetras.AI); Huaijin Chen (SenseBrain Technology Limited); Alexis Baudron (SenseBrain Technology Limited)
[video][poster][page]
[06:30 UTC] 09:30 Style Adaptive Semantic Image Editing with Transformers (# 329 )
Edward A Günther (ETH Zürich); Rui Gong (ETH Zurich)*; Luc Van Gool (ETH Zurich)
[video][poster]
[06:35 UTC] 09:35 Third Time’s the Charm? Image and Video Editing with StyleGAN3 (# 330 )
Yuval Alaluf (Tel Aviv University)*; Or Patashnik (Tel Aviv University); Zongze Wu (Hebrew University of Jerusalem); Asif Zamir (Tel Aviv University); Eli Shechtman (Adobe Research, US); Dani Lischinski (The Hebrew University of Jerusalem); Daniel Cohen-Or (Tel Aviv University)
[video][poster][project]
[06:40 UTC] 09:40 CNSNet: A Cleanness-Navigated-Shadow Network for Shadow Removal (# 331 )
Qianhao Yu (University of Science and Technology of China); Naishan Zheng (University of Science and Technology of China); Jie Huang (University of Science and Technology of China); Feng Zhao (University of Science and Technology of China)*
[video][poster]
[06:45 UTC] 09:45 Unifying Conditional and Unconditional Semantic Image Synthesis with OCO-GAN (# 332 )
Marlène Careil (Facebook/Télécom Paris)*; Stéphane Lathuilière (Telecom-Paris); Camille Couprie (FAIR); Jakob Verbeek (Facebook)
[video][poster]
[07:30 UTC] 10:30 Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report (# 366 )
Andrey Ignatov (ETH Zurich)*; Grigory Malivenko (AI); Radu Timofte (University of Wurzburg & ETH Zurich) et al.
[video][poster][page]
[07:40 UTC] 10:40 Hybrid Transformer Based Feature Fusion for Self-Supervised Monocular Depth Estimation (# 336 )
Snehal Singh Tomar (Indian Institute of Technology Madras)*; Maitreya Suin (Indian Institute of Technology Madras); Rajagopalan N Ambasamudram (Indian Institute of Technology Madras)
[video][poster][slides]
[07:45 UTC] 10:45 LiteDepth: Digging into Fast and Accurate Depth Estimation on Mobile Devices (# 348 )
Zhenyu Li (Harbin Institute of Technology)*; Zehui Chen (University of Science and Technology of China); Jialei Xu (Harbin Institute of Technology); Xianming Liu (Harbin Institute of Technology); Junjun Jiang (Harbin Institute of Technology)
[video][poster][project]
[07:50 UTC] 10:50 U-Shape Transformer for Underwater Image Enhancement (# 335 )
Lintao Peng (Beijing Institute of Technology); Chunli Zhu (Beijing Institute of Technology); Liheng Bian (Beijing Institute of Technology)*
[video][poster]
[07:55 UTC] 10:55 Multi-Patch Learning: Looking More Pixels in the Training Phase (# 351 )
Lei Li (ByteDance Inc.)*; Jingzhu Tang (Xidian University); Ming Cheng (ByteDance Inc.); Shijie Zhao (ByteDance Inc.); Junlin Li (ByteDance Inc.); Li Zhang (ByteDance Inc.)
[video][poster]
[08:30 UTC] 11:30 Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report (# 365 )
Andrey Ignatov (ETH Zurich)*; Radu Timofte (University of Wurzburg & ETH Zurich) et al.
[video][poster][page]
[08:40 UTC] 11:40 Residual Feature Distillation Channel Spatial Attention Network for ISP on Smartphone (# 357 )
Yaqi Wu (Harbin Institute of Technology)*; Jas Zheng (Zhejiang University); Zhihao Fan (University of Shanghai for Science and Technology); Xun Wu (School of Software, Tsinghua University); Feng Zhang (AIS)
[video][poster]
[08:45 UTC] 11:45 Real-Time Under-Display Cameras Image Restoration and HDR on Mobile Devices (# 372 )
Marcos V. Conde (University of Würzburg)*; Florin-Alexandru Vasluianu (Computer Vision Lab, University of Wurzburg); Sabari Nathan (Couger Inc, Tokyo); Radu Timofte (University of Wurzburg & ETH Zurich)
[video][poster][page]
[08:50 UTC] 11:50 CEN-HDR: Computationally Efficient Neural Network for Real-Time High Dynamic Range Imaging (# 340 )
Steven Tel (University of Burgundy, France)*; Barth Heyrman (University of Burgundy, France); Dominique Ginhac (Le2i - University of Burgundy, France)
[video][poster]
[08:55 UTC] 11:55 MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning (# 370 )
Andrey Ignatov (ETH Zurich)*; Anastasia Sycheva (Altersis Performance); Radu Timofte (University of Wurzburg & ETH Zurich); Yu Tseng (MediaTek); Yu-Syuan Xu (MediaTek); Po-Hsiang Yu (MediaTek); Cheng-Ming Chiang (MediaTek Inc.); Hsien-Kai Kuo (MediaTek); Min-Hung Chen (Microsoft); Chia-Ming Cheng (MediaTek); Luc Van Gool (ETH Zurich)
[video][poster][page]
[09:30 UTC] 12:30 Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report (# 363 )
Marcos V. Conde (University of Würzburg)*; Radu Timofte (University of Wurzburg & ETH Zurich) et al.
[video][page]
[09:40 UTC] 12:40 Learned Reverse ISP with Soft Supervision (# 347 )
Beiji Zou (Central South University); Yue Zhang (Central South University)*
[video][poster][slides]
[09:50 UTC] 12:50 RISPNet: A Network for Reversed Image Signal Processing (# 344 )
Xiaoyi Dong (Institute of Automation, Chinese Academy of Sciences)*; Yu Zhu (School of Computer Science and Engineering, Anhui University); Chenghua Li (Institute of Automation, Chinese Academy of Sciences); Peisong Wang (Institute of Automation, Chinese Academy of Sciences); Jian Cheng (Chinese Academy of Sciences, China)
[video][poster]
[09:55 UTC] 12:55 Reversing Image Signal Processors by Reverse Style Transferring (# 360 )
Furkan Osman Kınlı (Özyeğin University)*; Barış Özcan (Özyeğin University); Furkan Kirac (Ozyegin University)
[video][slides]
[10:00 UTC] 13:00 Overexposure Mask Fusion: Generalizable Reverse ISP Multi-step Refinement (# 361 )
Jinha Kim (MIT); Jun Jiang (SenseBrain Research (USA))*; Jinwei Gu (SenseBrain)
[video][poster][page]
[11:30 UTC] 14:30 Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report (# 367 )
Andrey Ignatov (ETH Zurich)*; Radu Timofte (University of Wurzburg & ETH Zurich); Maurizio Denna (Synaptics); Abdel Younes (Synaptics) et al.
[video][page]
[11:40 UTC] 14:40 Efficient Image Super-Resolution Using Vast-Receptive-Field Attention (# 333 )
Lin Zhou (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Haoming Cai (University of Maryland, College Park)*; Jinjin Gu (The University of Sydney); Zheyuan Li (SIAT); Yingqi Liu (Shenzhen Institute of Advanced Technology); Xiangyu Chen (University of Macau; SIAT); Yu Qiao (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Chao Dong (SIAT)
[video][poster][slides][page]
[11:45 UTC] 14:45 Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution (# 352 )
Ziwei Luo (Uppsala University)*; Youwei Li (Megvii); Lei Yu (Megvii); Qi Wu (Megvii); Zhihong Wen (Megvii Technology); Haoqiang Fan (Megvii Inc. (Face++)); Shuaicheng Liu (UESTC; Megvii)
[video][poster][slides]
[11:50 UTC] 14:50 Real-Time Channel Mixing Net for Mobile Image Super-Resolution (# 353 )
Garas Gendy (Shanghai Jiao Tong University); Nabil Sabor (Assiut University); Jingchao Hou (Shanghai Jiao Tong University); Guanghui He (Shanghai Jiao Tong University)*
[video][poster][YouTube]
[11:55 UTC] 14:55 DSR: Towards Drone Image Super-Resolution (# 339 )
Xiaoyu Lin (EPFL)*; Baran Ozaydin (EPFL); Vidit Vidit (EPFL); Majed El Helou (EPFL); Sabine Süsstrunk (EPFL)
[video][poster]
[12:00 UTC] 15:00 Evaluating Image Super-resolution Performance on Mobile Devices: An Online Benchmark (# 328 )
Xindong Zhang (The Hong Kong Polytechnic University); Hui Zeng (OPPO); Lei Zhang (The Hong Kong Polytechnic University, Hong Kong, China)*
[video][poster]
[12:05 UTC] 15:05 Image Super-Resolution with Deep Variational Autoencoders (# 341 )
Darius Chira (DTU)*; Ilian O Haralampiev (DTU); Ole Winther (DTU and KU); Andrea Dittadi (Technical University of Denmark); Valentin V.D.J. Liévin (Technical University of Denmark)
[video][poster][slides]
[12:10 UTC] 15:10 XCAT – Lightweight Quantized Single Image Super-Resolution Using Heterogeneous Group Convolutions and Cross Concatenation (# 346 )
Mustafa Ayazoglu (Aselsan Research)*; Bahri Batuhan Bilecen (Aselsan Research)
[video][poster][slides][YouTube]
[12:15 UTC] 15:15 RCBSR: Re-Parameterization Convolution Block for Super-Resolution (# 350 )
Si Gao (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation); Chengjian Zheng (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation)*; Xiaofeng Zhang (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation); Shaoli Liu (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation); Biao Wu (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation); Kaidi Lu (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation); Diankai Zhang (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation); Ning Wang (State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation)
[video][poster]
[12:20 UTC] 15:20 Unified Transformer Network for Multi-Weather Image Restoration (# 338 )
Ashutosh C Kulkarni (Indian Institute of Technology, Ropar)*; Shruti S Phutke (Indian Institute of Technology Ropar); Subrahmanyam Murala (IIT Ropar)
[video][poster]
[12:25 UTC] 15:25 MSSNet: Multi-Scale-Stage Network for Single Image Deblurring (# 349 )
Kiyeon Kim (POSTECH); Seungyong Lee (POSTECH); Sunghyun Cho (POSTECH)*
[video][poster][page]
[12:30 UTC] 15:30 Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report (# 368 )
Andrey Ignatov (ETH Zurich)*; Radu Timofte (University of Wurzburg & ETH Zurich); Cheng-Ming Chiang (MediaTek Inc.); Hsien-Kai Kuo (MediaTek); Yu-Syuan Xu (MediaTek); My Lee (MediaTek); Allen Lu (MediaTek); Chia-Ming Cheng (MediaTek); Chih-Cheng Chen (MediaTek); Jia-Ying Yong (MediaTek); Hong-Han Shuai (National Yang Ming Chiao Tung University); Wen-Huang Cheng (National Yang Ming Chiao Tung University) et al.
[video][page]
[12:40 UTC] 15:40 Sliding Window Recurrent Network for Efficient Video Super-Resolution (# 354 )
Wenyi Lian (Uppsala University)*; Wenjing Lian (Northeastern University)
[video][poster][slides]
[12:45 UTC] 15:45 EESRNet: A Network for Energy Efficient Super-Resolution (# 355 )
Shijie Yue (North China University of Technology)*; Chenghua Li (Institute of Automation, Chinese Academy of Sciences); Zhengyang Zhuge (Institute of Automation, Chinese Academy of Sciences); Ruixia Song (North China University of Technology)
[video][poster][slides]
[12:50 UTC] 15:50 Light Field Angular Super-Resolution via Dense Correspondence Field Reconstruction (# 342 )
Yu Mo (National University of Defense Technology); Yingqian Wang (National University of Defense Technology)*; Longguang Wang (National University of Defense Technology); Jungang Yang (National University of Defense Technology); Wei An (National University of Defense Technology)
[video]
[12:55 UTC] 15:55 Towards Real-World Video Deblurring by Exploring Blur Formation Process (# 337 )
Mingdeng Cao (Tsinghua University); Zhihang Zhong (The University of Tokyo); Yanbo Fan (Tencent AI Lab); Jiahao Wang (Tsinghua University); Yong Zhang (Tencent AI Lab); Jue Wang (Tencent AI Lab); Yujiu Yang (Tsinghua University)*; Yinqiang Zheng (The University of Tokyo)
[video][poster][slides]
[13:30 UTC] 16:30 AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results (# 371 )
Ren Yang (ETH Zurich)*; Radu Timofte (University of Wurzburg & ETH Zurich) et al.
[video][page]
[13:40 UTC] 16:40 CIDBNet: A Consecutively-Interactive Dual-Branch Network for JPEG Compressed Image Super-Resolution (# 345 )
Xiaoran Qin (Institute of Automation, Chinese Academy of Sciences); Yu Zhu (School of Computer Science and Engineering, Anhui University); Chenghua Li (Institute of Automation, Chinese Academy of Sciences)*; Peisong Wang (Institute of Automation, Chinese Academy of Sciences); Jian Cheng (Chinese Academy of Sciences, China)
[video][poster][page]
[13:45 UTC] 16:45 HST: Hierarchical Swin Transformer for Compressed Image Super-Resolution (# 358 )
Bingchen Li (University of Science and Technology of China); Xin Li (University of Science and Technology of China); Yiting Lu (University of Science and Technology of China); Sen Liu (University of Science and Technology of China); Ruoyu Feng (University of Science and Technology of China); Zhibo Chen (University of Science and Technology of China)*
[video][poster]
[13:50 UTC] 16:50 Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration (# 359 )
Marcos V. Conde (University of Würzburg)*; CHOI Ui-Jin (MegastudyEdu); Maxime Burchi (University of Würzburg); Radu Timofte (University of Wurzburg & ETH Zurich)
[video][page]
[14:30 UTC] 17:30 AIM 2022 Challenge on Instagram Filter Removal: Methods and Results (# 364 )
Furkan Osman Kınlı (Özyeğin University)*; Sami Menteş (Ozyegin University); Barış Özcan (Özyeğin University); Furkan Kirac (Ozyegin University); Radu Timofte (University of Wurzburg & ETH Zurich) et al.
[video][slides][page]
[14:40 UTC] 17:40 CAIR: Fast and Lightweight Multi-Scale Color Attention Network for Instagram Filter Removal (# 362 )
Woon-Ha Yeo (Sahmyook University)*; Wang-Taek Oh (Sahmyook University); Kyung-Su Kang (Sahmyook University); Young-Il Kim (Sahmyook University); Han-Cheol Ryu (Sahmyook University)
[video][poster][slides]
[14:45 UTC] 17:45 Unsupervised Scene Sketch to Photo Synthesis (# 334 )
Jiayun Wang (UC Berkeley / ICSI)*; Sangryul Jeon (UC Berkeley); Stella X Yu (UC Berkeley / ICSI); Xi Zhang (Amazon); Himanshu Arora (Amazon); Yu Lou (Amazon)
[video][slides][project]