Alexander Sage, Eirikur Agustsson, Radu Timofte, Luc Van Gool
Designing a logo for a new brand is a lengthy and tedious back-and-forth process between a designer and a client. In this paper we explore to what extent machine learning can solve the creative task of the designer. For this, we build a dataset -- LLD -- of 600k+ logos crawled from the world wide web. Training Generative Adversarial Networks (GANs) for logo synthesis on such multi-modal data is not straightforward and results in mode collapse for some state-of-the-art methods. We propose the use of synthetic labels obtained through clustering to disentangle and stabilize GAN training. We are able to generate a high diversity of plausible logos and we demonstrate latent space exploration techniques to ease the logo design task in an interactive manner. Moreover, we validate the proposed clustered GAN training on CIFAR 10, achieving state-of-the-art Inception scores when using synthetic labels obtained via clustering the features of an ImageNet classifier. GANs can cope with multi-modal data by means of synthetic labels achieved through clustering, and our results show the creative potential of such techniques for logo synthesis and manipulation.
On GitHub we provide the Python code for the logo generator as well as pre-trained models for generating logos based on our LLD dataset.
The following is the final version of the Large Logo Dataset (LLD), a dataset of 600k+ logos crawled from the internet. This dataset was introduced with our paper "Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks".
The dataset consists of two parts, crawled from the Alexa 1M websites list:
The main features of LLD-icon are:
The LLD-icon dataset is provided in two versions:
The logos in this dataset were crawled by parsing the HTML pages with scrapy to look for the favicon declaration as described on Wikipedia: Favicon. If this failed, we attempted to download the default URL http://url/favicon.ico.
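The lookup described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library instead of scrapy, so it is not the crawler that produced the dataset; the names `FaviconLinkParser` and `favicon_url` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FaviconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel attribute declares an icon."""

    def __init__(self):
        super().__init__()
        self.icon_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalize to "".
        attr_map = {name: (value or "") for name, value in attrs}
        rel_tokens = attr_map.get("rel", "").lower().split()
        # Matches rel="icon" and rel="shortcut icon".
        if "icon" in rel_tokens and attr_map.get("href"):
            self.icon_hrefs.append(attr_map["href"])


def favicon_url(page_url, html):
    """Return the declared favicon URL, or the default /favicon.ico location."""
    parser = FaviconLinkParser()
    parser.feed(html)
    if parser.icon_hrefs:
        # Resolve relative hrefs against the page URL.
        return urljoin(page_url, parser.icon_hrefs[0])
    # Fallback: no declaration found, try the conventional default path.
    return urljoin(page_url, "/favicon.ico")
```

The function only decides which URL to request; downloading the file and handling failed requests are left out of this sketch.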
After downloading, all logos were directly converted to RGB with 32 pixel size, rejecting all non-square logos. Some statistics:
This resulted in 548,210 logos, in the following resolution:
HDF5: Hierarchical Data Format containing images and labels in a single file designed for flexible and efficient I/O as well as portability. This data format uses CHW (Channels, Height, Width) dimension ordering.
PKL: The .pkl file is readable using the Python package pickle or cPickle (see also the provided sample code). It contains a NumPy array of shape (number_of_icons, 32, 32, 3) and type uint8.
FILES: Single image files in original format (mostly jpeg or png)
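The PKL and HDF5 versions use different dimension orderings, (N, 32, 32, 3) versus CHW, so converting between them is a common first step. The following is a minimal sketch and not the provided sample script; the function names are hypothetical, and `encoding="latin1"` is an assumption that only matters if the file was pickled under Python 2.

```python
import pickle

import numpy as np


def load_icons_pkl(path):
    """Load the icon array from a .pkl file; expected shape (N, 32, 32, 3), uint8."""
    with open(path, "rb") as f:
        # ASSUMPTION: encoding="latin1" lets Python 3 read arrays pickled by Python 2.
        icons = pickle.load(f, encoding="latin1")
    return np.asarray(icons)


def hwc_to_chw(icons):
    """Reorder (N, H, W, C) images to the (N, C, H, W) layout used by the HDF5 files."""
    return np.transpose(icons, (0, 3, 1, 2))


def chw_to_hwc(icons):
    """Reorder (N, C, H, W) images to (N, H, W, C), e.g. for plotting."""
    return np.transpose(icons, (0, 2, 3, 1))
```

Both transposes return views rather than copies, so they are cheap even for the full array; call `np.ascontiguousarray` on the result if a downstream library needs contiguous memory.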
Path | Description |
data | Image Data - array of shape (486377, 3, 32, 32) |
meta_data/names | Domain name corresponding to the logo in data with same index |
labels/AE_grayscale | Clustering labels acquired with AE-method (trained on grayscale images) as described in our paper. 100 cluster centers. |
labels/resnet/rc_32 | Clustering labels acquired with RC-method described in our paper. 32 cluster centers. |
labels/resnet/rc_64 | Labels for 64 RC-clusters. |
labels/resnet/rc_128 | Labels for 128 RC-clusters. |
Path | Description |
data | Image Data - array of shape (221369, 3, 32, 32) |
meta_data/names | Domain name corresponding to the logo in data with same index |
labels/resnet/rc_16 | Clustering labels acquired with RC-method described in our paper. 16 cluster centers. |
labels/resnet/rc_32 | Labels for 32 RC-clusters. |
labels/resnet/rc_64 | Labels for 64 RC-clusters. |
labels/resnet/rc_128 | Labels for 128 RC-clusters. |
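Because images, names and labels share the same index, the logos of a single cluster can be read without loading the whole file. The sketch below uses the dataset paths from the tables above with h5py; the function name `load_cluster` is hypothetical, and the label dataset is assumed to hold one integer per logo.

```python
import h5py
import numpy as np


def load_cluster(h5_path, label_path="labels/resnet/rc_64", cluster_id=0):
    """Return the images and domain names assigned to one cluster."""
    with h5py.File(h5_path, "r") as f:
        # ASSUMPTION: one integer label per logo; flatten in case of shape (N, 1).
        labels = f[label_path][()].reshape(-1)
        # flatnonzero yields increasing indices, as h5py fancy indexing requires.
        idx = np.flatnonzero(labels == cluster_id)
        images = f["data"][idx]            # shape (len(idx), 3, 32, 32)
        names = f["meta_data/names"][idx]  # raw stored strings, possibly bytes
    return images, names
```

The names are returned as stored; depending on how the file was written they may need decoding from bytes before printing.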
LLD-icon HDF5 (486,377 logos) [762MB]
LLD-icon-sharp HDF5 (221,369 logos) [300MB]
LLD-icon PKL (486,377 logos) [663MB]
LLD-icon PKL full data (548,210 logos) [809MB]
LLD-icon FILES (486,377 logos) [775MB]
LLD-icon FILES full data (548,210 logos) [915MB]
Indices for LLD-icon-sharp (pickle format) [1MB]
Domain names for LLD-icon
Domain names for LLD-icon full data
Script_HDF5.py Requirements: NumPy, H5py
Script_PKL.py Requirements: NumPy
LLD-icon sample (5000 logos) [8MB]
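The index file for LLD-icon-sharp can be combined with the LLD-icon array to obtain the sharp subset. The sketch below rests on two unverified assumptions: that the pickle holds a flat sequence of integer positions, and that these positions refer to the array it is applied to. Check both against the downloaded files. The function name `select_sharp` is hypothetical.

```python
import pickle

import numpy as np


def select_sharp(icons, indices_path):
    """Pick the LLD-icon-sharp subset out of an icon array by index."""
    with open(indices_path, "rb") as f:
        # ASSUMPTION: the file holds a flat sequence of integer positions.
        raw = pickle.load(f, encoding="latin1")
    sharp_idx = np.asarray(raw, dtype=np.int64).reshape(-1)
    return icons[sharp_idx]
```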
The main features of LLD-logo are:
The LLD-logo dataset is provided in two versions:
HDF5: Hierarchical Data Format containing images, labels, original domain names and additional Twitter metadata in a single file, designed for flexible and efficient I/O as well as portability. This data format uses CHW (Channels, Height, Width) dimension ordering and zero-padding to a uniform size of 400x400 pixels. The original image resolution is stored in the "shapes" dataset and can be used to easily remove the zero-padding.
FILES-PNG: Single image files in PNG format
FILES: Single image files in original format (mostly jpeg or png)
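Removing the zero-padding from an HDF5 entry amounts to a crop once the original height and width are known. The sketch below assumes the original image occupies the top-left corner of the 400x400 array and that the last two values of a "shapes" entry are height and width; neither is confirmed here, so verify both on a few images. The function names are hypothetical.

```python
import numpy as np


def remove_zero_padding(padded, height, width):
    """Crop a zero-padded (C, 400, 400) image back to (C, height, width).

    ASSUMPTION: the original image sits in the top-left corner of the array.
    """
    return padded[:, :height, :width]


def crop_from_shape(padded, shape):
    """Crop using one entry of the "shapes" dataset.

    ASSUMPTION: the last two values of `shape` are (height, width).
    """
    height, width = (int(v) for v in np.asarray(shape).reshape(-1)[-2:])
    return remove_zero_padding(padded, height, width)
```

If the crop of a known logo comes out shifted or partly black, the padding is placed differently than assumed and the slice offsets need adjusting.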
Path | Description |
data | Image Data - array of shape (122920, 3, 400, 400). Smaller images were zero-padded to 400x400 pixels. |
shapes | Image shapes - array of shape (122920, 1, 1, 1) containing the original image array shapes before zero padding. |
meta_data/names | Domain name corresponding to the logo in data with same index |
meta_data/ids | Twitter user-id corresponding to the logo in data with same index |
meta_data/user_object | Serialized tweepy user object (JSON format) corresponding to the logo in data with same index |
labels/ae_grayscale_64px/clusters_64 | Clustering labels acquired with AE-method (trained on grayscale images downsampled to 64 pixels) as described in our paper. 64 cluster centers. |
labels/resnet/rc_32 | Clustering labels acquired with RC-method described in our paper. 32 cluster centers. |
labels/resnet/rc_64 | Labels for 64 RC-clusters. |
labels/resnet/rc_128 | Labels for 128 RC-clusters. |
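Each entry of meta_data/user_object is a JSON string, so it can be decoded with the standard json module. This is a minimal sketch; the function name `parse_user_object` is hypothetical, and the keys available in the resulting dict depend on what was serialized.

```python
import json


def parse_user_object(raw):
    """Decode one serialized user object (JSON text or bytes) into a dict."""
    if isinstance(raw, bytes):
        # HDF5 string datasets are often returned as bytes.
        raw = raw.decode("utf-8")
    return json.loads(raw)
```

For example, `parse_user_object(entry).get("screen_name")` returns the account name if that field is present in the stored object, and `None` otherwise.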
LLD-logo HDF5 (122,920 logos) [13GB]
LLD-logo FILES-PNG (122,920 logos) [6.2GB]
LLD-logo-full_data FILES (182,998 logos) [5.7GB]
LLD-logo_metadata (pkl) [166MB]
A sample script for loading the data in Python
Script_LLD-logo.py Requirements: NumPy, H5py
A sample of 5000 logos from LLD-logo (PNG)
LLD-logo sample (500 logos) [27MB]
Please note that this dataset is made available for academic research purposes only. All the images are collected from the Internet, and the copyright belongs to the original owners. If any of the images belongs to you and you would like it removed, please inform us and we will remove it from our dataset immediately.
Please add a reference if you are using the dataset:
@misc{sage2017logodataset,
  author = {Sage, Alexander and Agustsson, Eirikur and Timofte, Radu and Van Gool, Luc},
  title = {LLD - Large Logo Dataset - version 0.1},
  year = {2017},
  howpublished = "\url{https://data.vision.ee.ethz.ch/cvl/lld}"
}