LLD - Large Logo Dataset

Alexander Sage, Eirikur Agustsson, Radu Timofte, Luc Van Gool


LLD - Large Logo Dataset - v0.1

The following is a preliminary version of the Large Logo Dataset (LLD), a dataset of 500k+ logos crawled from the internet. The project is a work in progress, but we are releasing this preliminary version now.

The goal of this project is to explore to what extent artificial intelligence can solve the creative task of designing logos.

To this end, we train Generative Adversarial Networks (GANs) on our proposed dataset and obtain very promising results.

The dataset currently consists of 548,210 favicons crawled from the Alexa 1M websites list on April 7th, 2017.

The main features of LLD 0.1 are:

  • Standardized resolution: 32 x 32 pixels
  • Training-friendly format: a sequence of binary Python pickle files, each containing 100,000 logos (except for the last one) as NumPy arrays, with the data in random order. The arrays have the standardized shape (32, 32, 3), ready for use in TensorFlow or similar machine learning frameworks.
  • Single-file format: Single PNG files for maximum flexibility of use
  • LLD 0.1 is available for download now
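
Since each pickle file holds 8-bit images of shape (32, 32, 3), feeding them to a training loop usually needs only a rescaling and batching step. The following is a minimal sketch of that step, not the sample script shipped with the dataset. The function names, the [-1, 1] scaling range, and the batch size are assumptions chosen for illustration, and random data stands in for a loaded file.

```python
import numpy as np

def to_float(images_uint8):
    """Rescale uint8 pixels in [0, 255] to float32 in [-1, 1] (an assumed convention)."""
    return images_uint8.astype(np.float32) / 127.5 - 1.0

def minibatches(images, batch_size, rng):
    """Yield shuffled minibatches; the last batch may be smaller."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        yield images[order[start:start + batch_size]]

# Random stand-in for one loaded file; real data would come from a .pkl file.
rng = np.random.default_rng(0)
fake_logos = rng.integers(0, 256, size=(250, 32, 32, 3), dtype=np.uint8)

batches = list(minibatches(to_float(fake_logos), batch_size=64, rng=rng))
```

Each element of `batches` is a float32 array that could be handed to a framework-specific training step.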

The favicon dataset is provided in two versions:

  • Full version containing all favicons we were able to crawl, with duplicates removed. Size: 548,210 images
  • Clean version, optimized for use with GANs: here we attempted to remove all non-logo-like images, especially natural images such as faces, as well as very complex and empty logos. Size: 486,377 images

The LLD v1 will contain:

  • Crawled high-resolution logos (> 100x100 px) of more than 100,000 websites
  • Associated meta-information
  • Training-friendly formats
  • Our GAN models for high resolution logo generation

We will release LLD v1 as soon as we can!


Here's a sneak peek at 400 random logos from our v0.1 dataset:



Preliminary Results for Synthesizing Logos

This dataset presents many challenges for current state-of-the-art GAN architectures. In particular, we found that both DCGAN and BEGAN collapse on this dataset unless the resolution is reduced to around 10x10 pixels.

Over the last few months, we have been working on techniques to make GAN training work on such multi-modal datasets. Our preliminary results are promising, and we are already synthesizing 32x32 px logos such as the following 400 samples from our model:

We can interpolate in the latent space to verify that the learned space is smooth:
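
The page does not say which interpolation scheme was used. As a minimal sketch, assuming plain linear interpolation between two latent vectors, the path could be built as below. The latent size of 100 and all names are hypothetical and are not taken from the authors' model.

```python
import numpy as np

def interpolate(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    weights = np.linspace(0.0, 1.0, steps)[:, None]  # shape (steps, 1)
    return (1.0 - weights) * z_start + weights * z_end

rng = np.random.default_rng(0)
latent_dim = 100  # hypothetical latent size, not taken from the authors' model
z_a = rng.standard_normal(latent_dim)
z_b = rng.standard_normal(latent_dim)

path = interpolate(z_a, z_b, steps=8)  # shape (8, 100)
# Each row of `path` would be passed to a trained generator to render one image.
```

If the generated images change gradually along `path`, that suggests the learned space is smooth between the two endpoints.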

Stay tuned! We will be releasing more information in an upcoming arXiv submission.



Details on data acquisition and .pkl file format

The logos in this dataset were crawled by parsing HTML pages with scrapy to look for the favicon declaration, as described in the Wikipedia article "Favicon". If this failed, we attempted to download from the default URL http://url/favicon.ico.
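
The lookup logic described above can be sketched without a crawler. The example below is not the authors' scrapy code: it uses Python's standard html.parser instead, and the class name, function name, and example.com URLs are hypothetical. It looks for a link tag whose rel attribute contains "icon" and otherwise falls back to /favicon.ico at the site root.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FaviconFinder(HTMLParser):
    """Collect href values of <link> tags whose rel attribute contains 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "icon" in rel and attributes.get("href"):
            self.hrefs.append(attributes["href"])

def favicon_url(page_url, html):
    """Return the declared favicon URL, or the default /favicon.ico location."""
    finder = FaviconFinder()
    finder.feed(html)
    if finder.hrefs:
        return urljoin(page_url, finder.hrefs[0])
    return urljoin(page_url, "/favicon.ico")

declared = favicon_url(
    "http://example.com/index.html",
    '<html><head><link rel="shortcut icon" href="/img/logo.ico"></head></html>',
)
fallback = favicon_url("http://example.com/index.html", "<html><head></head></html>")
```

Here `declared` resolves the relative href against the page URL, while `fallback` points at the site root because the page declares no icon.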

After downloading, all logos were converted directly to RGB at 32 x 32 pixels, and all non-square logos were rejected. Some statistics:

  • Unreadable image files: 71,596
  • Non-square images: 36,401
  • Total logos saved: 662,273
  • of which duplicates removed: 114,063

After duplicate removal, this left 548,210 logos. By original resolution, the 662,273 saved logos (counted before duplicate removal) break down as follows:

  • Native 32 px: 158,881 (24%)
  • Downscaled: 148,132 (22%)
  • Upscaled: 355,260 (54%)
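
As a quick sanity check on the figures above, the short sketch below recomputes the totals and percentages. It shows that the three resolution categories sum to the number of saved logos before duplicate removal. The `categorize` helper is a hypothetical illustration of how an icon could be assigned to a category from its original size. It is not the authors' code.

```python
TARGET = 32  # side length used by LLD v0.1

def categorize(width, height):
    """Classify an icon by its original size relative to the 32 px target."""
    if width != height:
        return "non-square"  # such icons were rejected
    if width == TARGET:
        return "native"
    return "downscaled" if width > TARGET else "upscaled"

# Counts quoted above.
saved, duplicates = 662_273, 114_063
by_resolution = {"native": 158_881, "downscaled": 148_132, "upscaled": 355_260}

remaining = saved - duplicates  # logos left after duplicate removal
percentages = {k: round(100 * v / saved) for k, v in by_resolution.items()}
```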

Each .pkl file can be read with the Python module pickle (or cPickle in Python 2); see also the provided sample code. It contains a NumPy array of shape (number_of_icons, 32, 32, 3) and type uint8.
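
For readers on Python 3, a minimal loading sketch might look as follows. It is not the provided sample script. The `encoding="latin1"` argument is included on the assumption that the files may have been written under Python 2, which the mention of cPickle suggests but does not confirm, and it is harmless otherwise. The function name and the demo file are hypothetical stand-ins so that the sketch runs without the dataset.

```python
import os
import pickle
import tempfile

import numpy as np

def load_icons(path):
    """Load one LLD-style pickle file and check the documented shape and dtype."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read NumPy arrays pickled under Python 2.
        icons = pickle.load(f, encoding="latin1")
    assert icons.ndim == 4 and icons.shape[1:] == (32, 32, 3)
    assert icons.dtype == np.uint8
    return icons

# Stand-in file so the sketch runs without the dataset; the name is hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_icons.pkl")
with open(demo_path, "wb") as f:
    pickle.dump(np.zeros((5, 32, 32, 3), dtype=np.uint8), f, protocol=2)

icons = load_icons(demo_path)
```

Only unpickle files from a source you trust, since pickle can execute arbitrary code on load.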

License

Please note that this dataset is made available for academic research purposes only. All images were collected from the internet, and the copyright belongs to the original owners. If any of the images belongs to you and you would like it removed, please let us know and we will remove it from our dataset immediately.

Citation

Please cite the dataset if you use it:

@misc{sage2017logodataset,
author={Sage, Alexander and Agustsson, Eirikur and Timofte, Radu and Van Gool, Luc},
title = {LLD - Large Logo Dataset - version 0.1},
year = {2017}, 
howpublished = "\url{https://data.vision.ee.ethz.ch/cvl/lld}"}

Download

Low-Res Logos (32x32 px)

LLD v0.1, clean version (486,377 logos) (663MB)

LLD v0.1, full version (548,210 logos) (809MB)

Samples

A sample script for loading the data in Python

A sample of 5000 logos from the LLD v0.1 clean version (8MB)

High-Res Logos

Soon!

Metadata

Soon!

Trained Models & Code

Soon!
Last updated: 25 August 2017