The recent success of deep learning has shown that a deep architecture in conjunction with abundant quantities of labeled training data is the most promising approach for most vision tasks. However, annotating a large-scale dataset for training such deep neural networks is costly and time-consuming, even with the availability of scalable crowdsourcing platforms like Amazon’s Mechanical Turk. As a result, there are relatively few public large-scale datasets (e.g., ImageNet and Places2) from which it is possible to learn generic visual representations from scratch.

Thus, it is unsurprising that there is continued interest in developing novel deep learning systems that train on low-cost data for image and video recognition. Among the different solutions, crawling data from the Internet and using the web as a source of supervision for learning deep representations has shown promising performance in a variety of important computer vision applications. However, the datasets and tasks differ in various ways, which makes it difficult to evaluate different solutions fairly and to identify the key issues in learning from web data.

This workshop aims to promote advances in learning state-of-the-art visual models directly from the web, and to bring together computer vision researchers interested in this field. To this end, we release a large-scale web image dataset named WebVision for visual understanding by learning from web data. The dataset consists of 2.4 million web images crawled from the Internet for 1,000 visual concepts. A validation set of 50K human-annotated images is also provided to facilitate algorithm development.

Based on this dataset, we also organize the first Challenge on Visual Understanding by Learning from Web Data. The final results will be announced at the workshop, and the winners will be invited to present their approaches there. An invited paper track will also be included in the workshop.

News 18.08.2017: WebVision2017 Photos have been uploaded

News 10.08.2017: Slides of talks, presentations, and the workshop have been uploaded. See workshop schedule for the links

News 22.07.2017: Prof. Jitendra Malik is unable to give a talk due to a scheduling conflict. We are happy to welcome Dr. Chen Sun from Google to give a talk instead

News 09.07.2017: Challenge Results are released!

News 23.06.2017: Test phase has started!

News 16.05.2017: Google meta information updated

News 18.04.2017: Test images released

News 04.04.2017: Original training images released

News 01.04.2017: Development kit released

News 22.03.2017: README.txt added and Flickr & Google Metadata updated because of missing q1632.json files

News 07.03.2017: The workshop website is now online. The dataset and challenge development kit will be released soon!

Workshop Schedule

8:30-8:40 Opening Remarks, Rahul Sukthankar (Google Research & CMU)
8:40-9:30 Invited Talk: Learning from Web-scale Image Data for Visual Recognition, Chen Sun (Google Research)
9:30-10:00 Database Overview and Challenge Overview, Wen Li & Limin Wang (ETH Zurich)
10:00-10:20 Coffee Break
10:20-10:40 Participant Presentation by Malong AI Research
10:40-11:00 Participant Presentation by SHTU_SIST
11:00-12:00 Poster Session
  1. Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing, Yu-Chuan Su, Kristen Grauman
  2. Self-supervised learning of visual features through embedding images into text topic spaces, Lluis Gomez*, Yash Patel*, Marçal Rusiñol, Dimosthenis Karatzas, C.V. Jawahar
  3. Learning to Learn from Noisy Web Videos, Serena Yeung, Vignesh Ramanathan, Olga Russakovsky, Liyue Shen, Greg Mori, Li Fei-Fei
  4. Learning from Noisy Labels with Distillation, Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li
  5. On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products, Xintong Han*, Bharat Singh*, Vlad I. Morariu, Larry S. Davis
  6. Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition, Christian Rupprecht, Ansh Kapil, Nan Liu, Lamberto Ballan, Federico Tombari
  7. Webly-supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames, Chuang Gan, Chen Sun, Lixin Duan, and Boqing Gong
  8. Webly Supervised Semantic Segmentation, Bin Jin, Maria V. Ortiz Segovia, Sabine Süsstrunk
  9. Weakly Supervised Semantic Segmentation Using Web-Crawled Videos, Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han
  10. Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning From Web Data, Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid
  11. Few-Shot Object Recognition From Machine-Labeled Web Images, Zhongwen Xu, Linchao Zhu, Yi Yang
  12. WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation, Thibaut Durand, Taylor Mordan, Nicolas Thome, Matthieu Cord
  13. Learning From Noisy Large-Scale Datasets With Minimal Supervision, Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie
  14. Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-Tuning, Weifeng Ge, Yizhou Yu
  15. Link the Head to the “Beak”: Zero Shot Learning From Noisy Text Description at Part Precision, Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, Ahmed Elgammal
14:00-14:50 Invited Talk: Exploiting Noisy Web Data for Large-scale Visual Recognition, Lamberto Ballan (Stanford University & University of Padova)
14:50-15:10 Participant Presentation by VISTA
15:10-15:30 Coffee Break
15:30-15:50 Participant Presentation by CRCV
15:50-16:40 Invited Talk: Towards Web-scale Video Understanding, Olga Russakovsky (Princeton University)
16:40-17:00 Award Session & Closing Remarks

Workshop Overview

The WebVision workshop contains a challenge track and a poster track:

WebVision Challenge Track

Researchers are invited to participate in the WebVision challenge, which aims to advance the area of learning useful knowledge and effective representations from noisy web images and meta information. The knowledge and representations can then be used to solve vision problems. In particular, we organize two tasks to evaluate the learned knowledge and representations: (1) the WebVision Image Classification Task, and (2) the Pascal VOC Transfer Learning Task. The second task builds upon the first. Researchers can participate in the first task only, or in both tasks.

WebVision Image Classification Task

The WebVision dataset is composed of training, validation, and test sets. The training set is downloaded from the web without any human annotation. The validation and test sets are human annotated; the labels of the validation data are provided, while those of the test data are withheld. To imitate the setting of learning from web data, participants are required to learn their models solely on the training set and to submit classification results on the test set. Accordingly, the validation data and labels may be used only to tune hyper-parameters and must not be used to learn the model weights.
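The split protocol above can be illustrated with a minimal sketch. It assumes a deliberately tiny stand-in problem: the "model" is a one-dimensional threshold classifier, and the helper names (`fit_threshold`, `accuracy`, `select_hyperparameter`) and the toy data are hypothetical, not part of the WebVision development kit. The point is only the separation of roles: model parameters are fitted on the (noisy) training set, and the validation labels are used solely to pick a hyper-parameter.

```python
def fit_threshold(train_set, offset):
    """Fit the model on TRAINING data only: midpoint of the class means,
    shifted by the hyper-parameter `offset`."""
    zeros = [x for x, y in train_set if y == 0]
    ones = [x for x, y in train_set if y == 1]
    midpoint = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2.0
    return midpoint + offset


def accuracy(threshold, labeled_set):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for x, y in labeled_set if (1 if x > threshold else 0) == y)
    return correct / len(labeled_set)


def select_hyperparameter(candidates, train_set, val_set):
    """Return (best_offset, best_val_accuracy, fitted_threshold).
    Validation labels only rank the candidates; they never enter fitting."""
    best = None
    for offset in candidates:
        model = fit_threshold(train_set, offset)   # uses training data only
        acc = accuracy(model, val_set)             # validation used for scoring
        if best is None or acc > best[1]:
            best = (offset, acc, model)
    return best


# Toy data (hypothetical): (feature, label) pairs; the last training label is noisy.
TRAIN = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1), (0.85, 0)]
VAL = [(0.3, 0), (0.45, 0), (0.55, 1), (0.7, 1)]

best_offset, best_acc, best_model = select_hyperparameter([-0.2, -0.1, 0.0], TRAIN, VAL)
print(best_offset, best_acc)
```

In a real entry the fitted model would then be applied to the unlabeled test images to produce the submission; the validation images themselves would not be added to the training data.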

Pascal VOC Transfer Learning Task

This task is designed to verify how well the knowledge and representations learned from the WebVision training set carry over to a new task. Hence, participants are required to submit results to the first task and to transfer only the models learned in the first task. We choose the image classification task of Pascal VOC to test the transfer learning performance. Participants may exploit different ways of transferring the knowledge learned in the first task to perform image classification on Pascal VOC, for example, treating the learned models as feature extractors and learning an SVM classifier on the extracted features. The evaluation protocol strictly follows that of the previous Pascal VOC challenges.
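As an illustration of the feature-extractor-plus-SVM option mentioned above, here is a minimal sketch in plain Python. It is not the challenge's reference pipeline. The assumptions are: the feature vectors are hand-written toy values standing in for activations of a frozen network trained on WebVision; one binary linear SVM is trained per class (one-vs-rest), since Pascal VOC images can carry several labels; and the SVM is fitted with a simple stochastic subgradient method on the hinge loss rather than with a dedicated solver. All function names are hypothetical.

```python
import random


def train_linear_svm(features, labels, epochs=100, lr=0.1, lam=0.01, seed=0):
    """Binary linear SVM (labels +1/-1) trained by stochastic subgradient
    descent on the L2-regularised hinge loss. Returns (weights, bias)."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The L2 regulariser shrinks the weights at every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move towards classifying x correctly.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(features, label_sets, classes):
    """One binary SVM per class; the features themselves stay fixed."""
    models = {}
    for c in classes:
        y = [1 if c in labels else -1 for labels in label_sets]
        models[c] = train_linear_svm(features, y)
    return models


def score(models, x):
    """Per-class decision values for one feature vector."""
    return {c: sum(wj * xj for wj, xj in zip(w, x)) + b
            for c, (w, b) in models.items()}


# Toy stand-ins for features produced by a frozen, WebVision-trained network.
TOY_FEATURES = [[2.0, 1.5], [1.5, 2.2], [2.5, 1.8],
                [-2.0, -1.0], [-1.5, -2.0], [-2.2, -1.7]]
TOY_LABELS = [{"cat"}, {"cat"}, {"cat"}, {"dog"}, {"dog"}, {"dog"}]

models = train_one_vs_rest(TOY_FEATURES, TOY_LABELS, ["cat", "dog"])
print(score(models, [2.0, 2.0]))
```

Only the per-class SVM weights are learned on the target task here; the feature extractor is left untouched, which keeps the transferred knowledge restricted to what was learned in the first task. The per-class decision values can then be ranked to compute a per-class average precision.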

The WebVision dataset provides web images together with their corresponding meta information (e.g., query, title, and comments); more information can be found on the dataset page. Learning from web data poses several challenges, such as:

  1. Label Noise: we can infer pseudo-labels for each instance from web metadata. Such labels are inherently noisy due to inconsistency in the metadata, weak because they typically tag concepts at a coarser granularity than required, and incomplete because they are not reliably present.
  2. Better use of meta and cross-modal information: current approaches neither fully exploit the semantic richness of the available metadata nor take advantage of much of the cross-modal information (e.g., audio and video) present in most web content. Addressing this requires us to consider research issues such as knowledge mining from unstructured data, joint representation learning from both images and language, joint models for audio-visual data, etc.
  3. Transfer Learning: the computer vision community recognizes that it is important to generalize the knowledge learnt from one dataset to new domains and new tasks. Therefore, effectively adapting the visual representations learned from the WebVision dataset to new tasks with low sample complexity is a key issue of significant theoretical interest and practical importance.
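
To make the label-noise challenge in item 1 concrete, the following sketch derives pseudo-labels from a query string in the metadata and measures how often they disagree with a small human-checked subset. The record layout (`image_id` and `query` keys), the query-to-concept table, and the toy values are all hypothetical and do not reflect the actual WebVision metadata format.

```python
def pseudo_labels_from_queries(records, query_to_concept):
    """Map each image to a concept id via its (normalised) query string.
    Images whose query matches no concept get no pseudo-label, which is
    one way the labels end up incomplete."""
    labels = {}
    for record in records:
        concept = query_to_concept.get(record["query"].strip().lower())
        if concept is not None:
            labels[record["image_id"]] = concept
    return labels


def estimate_noise_rate(pseudo, verified):
    """Fraction of human-verified images whose pseudo-label is wrong.
    Returns None when the two label sets share no image."""
    shared = [image_id for image_id in verified if image_id in pseudo]
    if not shared:
        return None
    wrong = sum(1 for image_id in shared if pseudo[image_id] != verified[image_id])
    return wrong / len(shared)


# Hypothetical metadata records and concept table.
QUERY_TO_CONCEPT = {"tabby cat": 0, "jaguar": 1}
RECORDS = [
    {"image_id": "img1", "query": "Tabby Cat"},
    {"image_id": "img2", "query": "Jaguar "},     # actually shows a car
    {"image_id": "img3", "query": "jaguar"},
    {"image_id": "img4", "query": "my holiday"},  # matches no concept
]
VERIFIED = {"img1": 0, "img2": 2, "img3": 1}      # small human-checked subset

pseudo = pseudo_labels_from_queries(RECORDS, QUERY_TO_CONCEPT)
noise_rate = estimate_noise_rate(pseudo, VERIFIED)
print(pseudo, noise_rate)
```

The ambiguous query "jaguar" shows why such labels are noisy: the same query string retrieves both the animal and unrelated content, and nothing in the query alone distinguishes them.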

Participants are encouraged to design new methods to address these challenges.

Poster Track

A poster session will be held at the workshop. The goal is to provide a stimulating space for researchers to share their work with scientific peers. We welcome submissions of recent work on any topic related to learning from web data.

  • Submission to the poster paper track does not require participation in the challenge track.
  • The submission can be published or unpublished work, but must be publicly accessible. We recommend that authors upload their papers to arXiv, but any other publicly accessible link is also acceptable.
  • There is no requirement on paper format and no page limit. We recommend the CVPR formatting style with 4-8 pages.
  • The submission will be reviewed by workshop chairs. Accepted papers will be presented at the poster session at the workshop. Note that all accepted papers will be linked on the workshop website, but will NOT appear in the CVPR workshop proceedings.
  • Poster papers are reviewed on a rolling basis until all places are filled. Acceptance notifications will be sent out once a decision has been made. We encourage people to submit as early as possible. For papers submitted before April 15, 2017, the acceptance notification will be sent out by April 30, 2017 at the latest. For the remaining submissions, the acceptance notification will be sent by June 30, 2017 at the latest.
  • How to submit? For poster paper submission, please send an email titled "[WebVision2017 Poster Paper Submission] Your Name - Your Paper Title" to the workshop organizers. The email should contain the following information:
    • Paper Title
    • Author List
    • Keywords
    • Name of Main Contact
    • Email of Main Contact
    • Affiliation of Main Contact
    • Paper URL
  • You can also choose to attach your submission to the email, and we will link it on the workshop website upon acceptance.

Important Dates

Challenge Submissions Deadline June 30, 2017
Challenge Award Notification July 10, 2017
Paper Submission Deadline July 2, 2017
Paper Acceptance Notification July 3, 2017
Paper Camera-Ready Deadline July 15, 2017
Workshop date (co-located with CVPR'17) July 26, 2017

All deadlines are at 23:59 Pacific Standard Time.


Invited Speakers

Chen Sun
Lamberto Ballan
Olga Russakovsky


General Chairs

Jesse Berent
Abhinav Gupta
Rahul Sukthankar
Luc Van Gool

Program Chairs

Wen Li
Limin Wang
Wei Li
Eirikur Agustsson