*****************************************************************************************
Database Overview
*****************************************************************************************

The WebVision database contains more than 2.5 million web images, crawled from Flickr and Google Image Search using the same 1,000 semantic concepts as the popular ImageNet ILSVRC 2012 dataset. The main purpose of this dataset is to advance research on learning visual recognition models with minimal human supervision. Images were retrieved through the image search engines of Flickr and Google using 1,631 queries generated from those 1,000 semantic concepts.

The WebVision database contains three splits: the training set, the validation set, and the test set.

For the training set, both web images and meta information are provided. The label of each training image is determined by the semantic concept used to generate its query. In other words, the training set contains no human annotation. Labels may therefore be noisy, which is one of the main issues when learning from web data.

We also provide a validation set and a test set for the convenience of algorithm development, each of which contains 50,000 images (50 images per category). Both sets are manually annotated. The labels for the validation set are publicly available, and the labels for the test set are withheld. Users can submit their prediction results to our evaluation server to obtain the classification accuracy on the test set.

*****************************************************************************************
File Structure
*****************************************************************************************

===========
Dataset Info
===========
The “info” folder contains information on the three splits of the WebVision dataset.
It contains the following files:

-- filelists: train_filelist_all.txt, train_filelist_flickr.txt, train_filelist_google.txt, val_filelist.txt, and test_filelist.txt are, respectively, the filelists for all training images, Flickr training images, Google training images, validation images, and test images. The file “train_filelist_all.txt” is a concatenation of “train_filelist_google.txt” and “train_filelist_flickr.txt”. For all filelists except test_filelist.txt, each line contains the path of an image and its label. For test_filelist.txt, only the file path is provided.

-- query information: synset_words.txt, queries_flickr.txt, queries_google.txt, and synset_query_map.txt. The file “synset_words.txt” lists the 1,000 semantic concepts used to build this database, which are the same as those of the ImageNet ILSVRC 2012 dataset. The files queries_flickr.txt and queries_google.txt list the queries used to search for images on Flickr and Google respectively; each contains 1,632 queries (one query was removed afterward). For a query with multiple words, the Flickr version and the Google version might differ, because the two search engines interpret queries differently. The file “synset_query_map.txt” contains the query_id to concept_id mapping.

-- meta lists: train_meta_list_all.txt, train_meta_list_flickr.txt, and train_meta_list_google.txt give, respectively, the meta files and indices for all training images, Flickr training images, and Google training images. The file “train_meta_list_all.txt” is a concatenation of “train_meta_list_google.txt” and “train_meta_list_flickr.txt”. Each line corresponds to the image on the same line of the corresponding filelist. Each line contains two items: the meta file, and the index of the corresponding image's meta information within that meta file, starting from 1.
For example, “google/q0001.json 2” means that the image was retrieved by the 1st query and that its meta information is stored in the 2nd item of the json file “google/q0001.json”.

===========
Training Images
===========
We provide two versions of the training images: the original images, and resized images whose shorter side is 256 pixels. The resized version has a much smaller file size, so it is easier to download. The two versions share the same file structure.

The training images are organized as ./source_name/query_id/filename.jpg. For example, “./flickr/q0763/12345679.jpg” denotes an image from the 763rd query on Flickr. The source name is either “flickr” or “google”. The query id has the fixed form “qxxxx”, where “xxxx” ranges from 0001 to 1632. The image name may vary.

Note that we changed the file extension to “.jpg” for all images for convenience, but some images may actually be stored in a different format. Take care of this when reading images. All images have been verified to be readable with the “imread” function in MATLAB R2016b.

===========
Meta Info
===========
The meta information is saved as json files. For each source (Flickr and Google), the meta information is organized into separate files according to the queries. The meta files are organized as ./source_name/query_id.json, where the source name is either “flickr” or “google”, and the query id has the fixed form “qxxxx”, where “xxxx” ranges from 0001 to 1632. Note that, for Flickr, not all queries have corresponding json files, because some queries returned no images.

For Flickr, each item contains:
- 'id': image id, which is used as the image filename
- 'title': image title
- 'description': image description
- 'tags': image tags, annotated by Flickr users. The machine tags are removed.
- 'url': image url
- 'width' and 'height': original image width and height
- 'datetaken': date the image was taken
- 'rank': the rank of this image in the image list returned by the Flickr photo search engine
- 'latitude' and 'longitude': latitude and longitude
- 'views': number of views

For Google, each item contains:
- 'id': image id
- 'title': image title
- 'description': image description
- 'url': image url
- 'site': site name the image was searched from
- 'width' and 'height': image width and height
- 'rank': the rank of this image in the image list returned by the Google Image search engine
- 'page_url': page url the image was searched from
- 'site_url': site url the image was searched from

Note that not all meta items exist for every image.
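
===========
Example: Looking Up Meta Info
===========
The following Python sketch shows one way to resolve a meta list entry such as “google/q0001.json 2” to its meta record. It is not part of the official tools. The function names (parse_meta_list_line, query_number, load_meta_item) and the meta_root argument are hypothetical, and the code assumes that each json file holds a top-level array of items, which the description above suggests but does not state explicitly.

```python
import json
from pathlib import Path


def parse_meta_list_line(line):
    """Split a meta list line such as 'google/q0001.json 2' into its two items."""
    meta_file, index = line.rsplit(maxsplit=1)
    return meta_file, int(index)


def query_number(meta_file):
    """Return the query number encoded in a name of the form 'source/qxxxx.json'."""
    # Path(...).stem is 'qxxxx'; drop the leading 'q' and parse the digits.
    return int(Path(meta_file).stem[1:])


def load_meta_item(meta_root, meta_file, index):
    """Return the meta record at the given 1-based index of a meta file.

    Assumption: the json file contains a top-level array of items.
    """
    with open(Path(meta_root) / meta_file, encoding="utf-8") as f:
        items = json.load(f)
    # Indices in the meta lists start from 1, Python lists start from 0.
    return items[index - 1]
```

Because not all meta items exist for every image, read optional fields with item.get('description') rather than item['description'], so that a missing key yields None instead of raising an error.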