YouTube-Objects dataset

A large-scale database of object videos from YouTube

Alessandro Prest, Christian Leistner, Javier Civera, Cordelia Schmid, Vittorio Ferrari

Overview

The YouTube-Objects dataset is composed of videos collected from YouTube by querying for the names of 10 object classes. It contains between 9 and 24 videos for each class. The duration of each video varies between 30 seconds and 3 minutes. The videos are weakly annotated, i.e. we ensure that each video contains at least one object of the corresponding class.

In addition to the videos, this release also includes several materials from our paper [1]:

Bounding-box annotations. For evaluation purposes, we annotated the object location in a few hundred video frames for each class (see sec. 6.1 of [1]).
Point tracks and motion segments. As produced by [2].
Tubes. Spatio-temporal bounding-boxes as described in section 3.2 of [1]. We include all candidate tubes (yellow in the fig. above) as well as the tube automatically selected by our method (blue).
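To make the notion of a tube concrete, the following minimal Python sketch models it as one bounding box per frame and compares a tube's box against an annotated box on the same frame. Everything here is an illustrative assumption: the `Tube` class, the `(x_min, y_min, x_max, y_max)` box layout, and the use of intersection-over-union as the overlap measure are not taken from the released MATLAB code or from the evaluation protocol in [1], and the on-disk format of the released tubes is described only in the README.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Tube:
    """Hypothetical container for a spatio-temporal tube: one box per frame."""
    boxes: Dict[int, Box] = field(default_factory=dict)

    def add(self, frame: int, box: Box) -> None:
        """Store the box covering the object in the given frame."""
        self.boxes[frame] = box

    def span(self) -> Tuple[int, int]:
        """Return the first and last frame index covered by the tube."""
        frames = sorted(self.boxes)
        return frames[0], frames[-1]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With such a structure, a selected tube could be scored on each annotated frame by looking up `tube.boxes[frame]` and passing it to `box_iou` together with the annotated box.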

Important Notice

These videos were downloaded from the internet and may be subject to copyright. We do not own the copyright of the videos and provide them only for non-commercial research purposes.

Dataset release download

The dataset contains a total of 570,000 frames. As demonstrated in [1], the quality of the video frames plays a crucial role in the performance of an object detector trained on them. To eliminate possible confusion when decoding the videos and in the frame numbering, we release individual video frames after decompression and after shot partitioning. This way, you obtain an exact copy of the dataset as we used it in our experiments [1].

As the total download size amounts to 89 GB, we have partitioned the dataset by object class. The following table contains the URLs of the different archives and MATLAB code to access the data. For installation instructions we refer to the README file included in the source code archive.

Filename	Description	Release Date	Size
code.tar.gz	MATLAB source code to access the YouTube-Objects dataset.	17 June 2012	1 MB
aeroplane.tar.gz		17 June 2012	2.0 GB
bird.tar.gz		17 June 2012	3.0 GB
boat.tar.gz		17 June 2012	7.6 GB
car.tar.gz		17 June 2012	1.7 GB
cat.tar.gz		17 June 2012	5.2 GB
cow.tar.gz		17 June 2012	6.1 GB
dog.tar.gz		17 June 2012	19.5 GB
horse.tar.gz		17 June 2012	14.7 GB
motorbike.tar.gz		17 June 2012	4.3 GB
train.tar.gz		17 June 2012	21.1 GB
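Since the archives are large, it can help to tally the sizes of the classes you need before downloading. The Python sketch below copies the sizes from the table and sums them for a chosen subset; the function names, the decimal interpretation of GB (10^9 bytes), and the headroom factor of 2 for holding an archive next to its extracted contents are all assumptions for illustration, not part of the release.

```python
import shutil

# Archive sizes in GB, copied from the table above.
ARCHIVE_SIZES_GB = {
    "aeroplane": 2.0,
    "bird": 3.0,
    "boat": 7.6,
    "car": 1.7,
    "cat": 5.2,
    "cow": 6.1,
    "dog": 19.5,
    "horse": 14.7,
    "motorbike": 4.3,
    "train": 21.1,
}


def download_size_gb(classes):
    """Return the summed archive size in GB for the requested classes."""
    return round(sum(ARCHIVE_SIZES_GB[name] for name in classes), 1)


def has_room_for(classes, target_dir=".", headroom=2.0):
    """Check free disk space against the archive size times a headroom factor.

    Assumptions: 1 GB = 1e9 bytes, and headroom=2.0 leaves space for the
    archives and their extracted contents side by side.
    """
    needed_bytes = download_size_gb(classes) * headroom * 1e9
    return shutil.disk_usage(target_dir).free >= needed_bytes
```

For example, `download_size_gb(["car", "cat"])` adds 1.7 GB and 5.2 GB, and `has_room_for(["car", "cat"], "/data")` would compare twice that amount against the free space under a hypothetical `/data` directory.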

Related publications and software

[1] A. Prest, C. Leistner, J. Civera, C. Schmid and V. Ferrari.
Learning Object Class Detectors from Weakly Annotated Video
Computer Vision and Pattern Recognition (CVPR), 2012.

[2] T. Brox, J. Malik.
Object segmentation by long term analysis of point trajectories
European Conference on Computer Vision (ECCV), 2010.

Acknowledgements

This work was partially funded by the QUAERO project supported by OSEO, French State agency for innovation, the European integrated projects AXES and RoboEarth, DPI2009-07130, SNSF IZK0Z2-136096, CAIDGA IT 26/10 and a Google Research Award.

University of Edinburgh (CALVIN), ETH Zurich (CALVIN), INRIA Grenoble (LEAR)