The 2nd Workshop on

Visual Understanding by Learning from Web Data 2018

18th June, 2018

Salt Lake City, Utah

in conjunction with CVPR 2018

==> WebVision2018 challenge results are released! <==

The recent success of deep learning has shown that a deep architecture in conjunction with abundant quantities of labeled training data is the most promising approach for most vision tasks. However, annotating a large-scale dataset for training such deep neural networks is costly and time-consuming, even with the availability of scalable crowdsourcing platforms like Amazon’s Mechanical Turk. As a result, there are relatively few public large-scale datasets (e.g., ImageNet and Places2) from which it is possible to learn generic visual representations from scratch.

Thus, it is unsurprising that there is continued interest in developing novel deep learning systems that train on low-cost data for image and video recognition. Among different solutions, crawling data from the Internet and using the web as a source of supervision for learning deep representations has shown promising performance for a variety of important computer vision applications. However, the datasets and tasks differ in various ways, which makes it difficult to fairly evaluate different solutions and to identify the key issues when learning from web data.

This workshop aims to promote advances in learning state-of-the-art visual models directly from the web, and to bring together computer vision researchers interested in this field. To this end, we release a large-scale web image dataset named WebVision for visual understanding by learning from web data. The dataset consists of 16 million web images crawled from the Internet for 5,000 visual concepts. A validation set consisting of around 250K images with human annotation will be provided for the convenience of algorithm development.

Based on this dataset, we also organize the second Challenge on Visual Understanding by Learning from Web Data. The final results will be announced at the workshop, and the winners will be invited to present their approaches at the workshop. An invited paper track will also be included in the workshop.

News 12.06.2018: Challenge results are released! Congratulations to the winners, and thanks to all teams for their participation.

News 07.06.2018: The room number for the WebVision workshop has been confirmed: Room 150-DEF. See you there!

News 06.06.2018: The submission deadline has been extended to 10th June 2018. View details here.

News 18.03.2018: The WebVision 2018 challenge has been launched. Click here to participate in the challenge!

News 05.12.2017: The workshop website is now online.

Important Dates

Challenge Launch Date	March 18, 2018
Challenge Submissions Deadline	~~June 8, 2018~~ June 10, 2018
Challenge Award Notification	~~June 10, 2018~~ June 12, 2018
Workshop date (co-located with CVPR'18)	June 18, 2018

All deadlines are at 23:59 Pacific Standard Time.

Workshop Overview

The WebVision workshop contains a challenge track and a poster track:

WebVision Challenge Track

Researchers are invited to participate in the WebVision challenge, which aims to advance the area of learning useful knowledge and effective representations from noisy web images and meta information. The challenge is based on the WebVision 2018 dataset, which is composed of training, validation, and test sets. The training set is downloaded from the web without any human annotation. The validation and test sets are human annotated; the labels of the validation data are provided and those of the test data are withheld. To imitate the setting of learning from web data, participants are required to learn their models solely on the training set and submit classification results on the test set. In this sense, the validation data and labels may be used only to tune hyper-parameters and cannot be used to learn the model weights.
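The split rule above can be illustrated with a minimal sketch. Everything in it is a toy assumption made for illustration and is not part of the challenge toolkit: the 1-D two-class data, the 30% label-flip rate standing in for web label noise, the nearest-centroid model, and the hypothetical `trim` hyper-parameter. The point it shows is only the protocol: model weights are fit on the noisy training set, and the clean validation set is used solely to choose a hyper-parameter.

```python
import random

def make_split(n, noise_rate, rng):
    """Toy 1-D two-class data; a fraction of labels is flipped to mimic label noise."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        x = rng.gauss(2.0 * label, 1.0)  # class 0 centered at 0, class 1 at 2
        if rng.random() < noise_rate:
            label = 1 - label  # corrupt the label, keep the feature
        data.append((x, label))
    return data

def fit_centroids(train, trim):
    """Learn the model weights (class centroids) from the training set only.

    `trim` is a hypothetical hyper-parameter: the fraction of the smallest and
    largest samples dropped from each class before averaging.
    """
    centroids = {}
    for c in (0, 1):
        xs = sorted(x for x, y in train if y == c)
        k = int(len(xs) * trim)
        kept = xs[k:len(xs) - k] if k > 0 else xs
        centroids[c] = sum(kept) / len(kept)
    return centroids

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_split(2000, 0.3, rng)  # noisy labels, no human annotation
val = make_split(500, 0.0, rng)     # clean labels, human annotated

# The validation set only selects the hyper-parameter; it never enters fit_centroids.
best_trim, best_acc = None, -1.0
for trim in (0.0, 0.1, 0.2, 0.3):
    acc = accuracy(fit_centroids(train, trim), val)
    if acc > best_acc:
        best_trim, best_acc = trim, acc

final_model = fit_centroids(train, best_trim)
print(f"selected trim={best_trim}, validation accuracy={best_acc:.3f}")
```

Under this reading of the rule, concatenating `val` into `train` before calling `fit_centroids` would be the step that is not allowed.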

The WebVision dataset provides the web images and their corresponding meta information (e.g., query, title, and comments); more information can be found at the dataset page. Learning from web data poses several challenges, such as:

Label Noise: we can infer pseudo-labels for each instance from web metadata. Such labels are inherently noisy due to inconsistency in the metadata, weak because they typically tag concepts at a coarser granularity than required, and incomplete because they are not reliably present.
Better use of meta and cross-modal information: current approaches neither fully exploit the semantic richness of the available metadata nor take advantage of the cross-modal information (e.g., audio and video) present in most web content. Addressing this requires us to consider research issues such as knowledge mining from unstructured data, joint representation learning from both images and language, joint models for audio-visual data, etc.
Transfer Learning: the computer vision community recognizes that it is important to generalize the knowledge learnt from one dataset to new domains and new tasks. Therefore, effectively adapting the visual representations learned from the WebVision dataset to new tasks with low sample complexity is a key issue of significant theoretical interest and practical importance.

Participants are encouraged to design new methods to solve these challenges.

Poster Track

A poster session will be held at the workshop. The goal is to provide a stimulating space for researchers to share their work with scientific peers. We welcome researchers to submit their recent work on any topic related to learning from web data.

Submission to the poster paper track does not require participation in the challenge track.
The submission can be published or unpublished work, but it has to be publicly accessible. We recommend that authors upload their paper to arXiv, but any other publicly accessible link is also acceptable.
There is no requirement on paper format and no page limit. We recommend the CVPR formatting style with 4-8 pages.
The submission will be reviewed by workshop chairs. Accepted papers will be presented at the poster session at the workshop. Note that all accepted papers will be linked on the workshop website, but will NOT appear in the CVPR workshop proceedings.
Poster papers are reviewed on a rolling basis until all places are filled. Acceptance notification will be sent out once the decision has been made. We encourage people to submit as early as possible. For papers submitted before May 15, 2018, the acceptance notification will be sent out by May 30, 2018 at the latest.
How to submit? For poster paper submission, please send an email titled "[WebVision2018 Poster Paper Submission] Your Name - Your Paper Title" to webvisionworkshop@gmail.com. The email should contain the following information:

Paper Title
Author List
Keywords
Name of Main Contact
Email of Main Contact
Affiliation of Main Contact
Paper URL

You can also choose to attach your submission to the email, and we will link it on the workshop website upon acceptance.
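As a rough sketch of the submission format described above, the snippet below assembles the subject line and the seven requested fields with Python's standard `email.message.EmailMessage`. The helper name `build_submission_email`, the "Field: value" body layout, the example values, and the choice to use the main contact's name for "Your Name" in the subject are all assumptions for illustration; the organizers only specify the subject pattern, the recipient address, and the list of fields.

```python
from email.message import EmailMessage

# Field names copied from the submission instructions above.
REQUIRED_FIELDS = [
    "Paper Title",
    "Author List",
    "Keywords",
    "Name of Main Contact",
    "Email of Main Contact",
    "Affiliation of Main Contact",
    "Paper URL",
]

def build_submission_email(fields):
    """Build (but do not send) a poster submission email from a dict of fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    msg = EmailMessage()
    msg["To"] = "webvisionworkshop@gmail.com"
    # Assumption: "Your Name" in the subject is the main contact's name.
    msg["Subject"] = (
        "[WebVision2018 Poster Paper Submission] "
        f"{fields['Name of Main Contact']} - {fields['Paper Title']}"
    )
    # Assumption: one "Field: value" line per requested item.
    msg.set_content("\n".join(f"{name}: {fields[name]}" for name in REQUIRED_FIELDS))
    return msg

# Hypothetical example values, not a real submission.
example = {
    "Paper Title": "A Hypothetical Paper Title",
    "Author List": "Jane Doe, John Roe",
    "Keywords": "web data, label noise",
    "Name of Main Contact": "Jane Doe",
    "Email of Main Contact": "jane.doe@example.org",
    "Affiliation of Main Contact": "Example University",
    "Paper URL": "https://example.org/paper.pdf",
}

msg = build_submission_email(example)
print(msg["Subject"])
print(msg.get_content())
```

The message is only constructed here; sending it and attaching a PDF are left out.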

Workshop Schedule

Date: June 18th, 2018

Location: Room 150-DEF

Start Time	Event
8:30	Opening Remarks
8:40	Invited Talk: Human-machine Collaboration for Large-scale Image Annotation, Prof. Vittorio Ferrari (Google Research & Univ of Edinburgh)
9:20	Dataset Update and Challenge Overview
10:00	Coffee Break
10:30	Participant Presentation by Vibranium (Computer Vision Department Technology of Baidu, and Beihang University)
10:50	Participant Presentation by Overfit (University of Electronic Science and Technology of China, and SenseTime Research)
11:10	Poster Session
	Webly Supervised Learning Meets Zero-Shot Learning: A Hybrid Approach for Fine-Grained Classification, Li Niu, Ashok Veeraraghavan, Ashutosh Sabharwal
	Unsupervised Large-Scale World Locations Dataset, Carlos Roig, David Varas, Genis Floriach, Joan Espadaler, Issey Masuda, Manuel Sarmiento, Juan Carlos Riveiro, Elisenda Bou-Balust
	Creating Capsule Wardrobes From Fashion Images, Wei-Lin Hsiao, Kristen Grauman
	Min-Entropy Latent Model for Weakly Supervised Object Detection, Fang Wan, Pengxu Wei, Jianbin Jiao, Zhenjun Han, Qixiang Ye
	Adversarial Complementary Learning for Weakly Supervised Object Localization, Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, Thomas S. Huang
	Multimodal Visual Concept Learning With Weakly Supervised Techniques, Giorgos Bouritsas, Petros Koutras, Athanasia Zlatintsi, Petros Maragos
	Learning Pixel-Level Semantic Affinity With Image-Level Supervision for Weakly Supervised Semantic Segmentation, Jiwoon Ahn, Suha Kwak
	Cross-Domain Weakly-Supervised Object Detection Through Progressive Domain Adaptation, Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, Kiyoharu Aizawa
	Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos, De-An Huang, Shyamal Buch, Lucio Dery, Animesh Garg, Li Fei-Fei, Juan Carlos Niebles
	NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning, Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall
	A Generative Adversarial Approach for Zero-Shot Learning From Noisy Texts, Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, Ahmed Elgammal
	Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning, Weifeng Ge, Sibei Yang, Yizhou Yu
	Weakly Supervised Instance Segmentation Using Class Peak Response, Yanzhao Zhou, Yi Zhu, Qixiang Ye, Qiang Qiu, Jianbin Jiao
	Learning Facial Action Units From Web Images With Scalable Weakly Supervised Clustering, Kaili Zhao, Wen-Sheng Chu, Aleix M. Martinez
	Lunch Break
14:00	Invited Talk: Learning Single-Image 3D from the Web, Prof. Jia Deng (Univ of Michigan)
14:40	Invited Talk: Learning from Web Data and Adapting Beyond It, Dr. Boqing Gong (Tencent AI Lab / ICSI, UC Berkeley)
15:20	Participant Presentation by ACRV_ANU (Australian National University, and Australian Center of Excellence for Robotic Vision)
15:40	Award Session & Closing Remarks

Speakers

Prof. Vittorio Ferrari

Prof. Jia Deng

Prof. Boqing Gong

People

General Chairs

Jesse Berent

Abhinav Gupta

Rahul Sukthankar

Luc Van Gool

Program Chairs

Wen Li

Limin Wang

Wei Li

Eirikur Agustsson