TIDE is a benchmark dataset for fine-grained semi-supervised object recognition, collected from public Flickr images. The first release of the dataset, in June 2012, contained 100K images of 15 objects. The second release, in June 2013, is more than twice as large, with approximately 214K images of 23 objects. Each object is provided with 200 bounding-box-annotated instances and an estimated 1000 unsupervised instances.
To evaluate the specificity of dataset labels, we propose a simple metric based on the average depth of the labels in the WordNet ontology. Intuitively, datasets with more specific labels will have a larger average WordNet depth. The table below shows popular vision datasets sorted by increasing specificity. Note that the MSRAMM dataset, with generic labels such as ``animal'' and ``athlete'', has a much lower specificity score than the Stanford Dogs dataset, with specific dog-breed labels like ``airedale'' and ``australian terrier''. Considering representative datasets, we define a dataset as ``fine grained'' if its average WordNet depth is greater than or equal to that of Oxford Buildings.
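As a concrete illustration of the metric, below is a minimal sketch using NLTK's WordNet interface. Mapping each label to its first noun synset and measuring depth with min_depth() are our assumptions for the sketch; the metric itself is just the mean depth over all labels.

```python
# Sketch of the label-specificity metric: mean WordNet depth of a
# dataset's labels. Picking the first noun synset per label is an
# assumption; any disambiguation rule could be substituted.
from nltk.corpus import wordnet as wn

def average_wordnet_depth(labels):
    """Mean depth in the WordNet hypernym hierarchy over all labels."""
    depths = []
    for label in labels:
        synsets = wn.synsets(label.replace(" ", "_"), pos=wn.NOUN)
        if synsets:
            # min_depth() is the length of the shortest hypernym path
            # from the synset up to the root of the ontology.
            depths.append(synsets[0].min_depth())
    return sum(depths) / len(depths)

print(average_wordnet_depth(["animal", "athlete"]))               # generic, shallow
print(average_wordnet_depth(["airedale", "australian terrier"]))  # specific, deep
```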
To evaluate the performance of a semi-supervised technique, we need enough example images for each label to sufficiently sample the space of object appearances. Empirically, the local-feature-matching technique may require a thousand instances to form a well connected model of an object viewed from many viewpoints. As shown in the plot below, among existing datasets with ``fine-grained'' labels, none has more than 300 examples for all labels. This is clearly insufficient. The TIDE dataset is the only one with enough examples per label to evaluate the proposed semi-supervised annotation method based on local-feature matching.
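To make the notion of a ``well connected model'' concrete, the sketch below treats the instances of one object as nodes in a graph whose edges are reliable local-feature matches; the model covers the viewpoint space only if this graph is connected. The match predicate here is a hypothetical placeholder, not the actual matching method.

```python
# Hypothetical sketch: an object model built by local-feature matching
# is usable for propagating annotations only if its match graph is
# connected. match(a, b) is a placeholder for a real feature matcher.
import networkx as nx

def is_well_connected(images, match):
    g = nx.Graph()
    g.add_nodes_from(range(len(images)))
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            if match(images[i], images[j]):
                g.add_edge(i, j)
    # With too few instances, views from distant viewpoints fail to
    # match and the graph splinters into disconnected components.
    return nx.is_connected(g)
```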
The TIDE dataset was collected from Flickr images and has the following properties:
The average ground-truth image for each of the 23 objects is shown below. Average images are a heuristic proposed for visually summarizing the diversity of appearances, and thus the degree of difficulty, of a dataset. If the average image for a dataset still looks like the object, that is a sign the dataset lacks real-world diversity.
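For reference, here is a minimal sketch of how such average images can be computed, assuming one directory of ground-truth crops per object; the file layout and output size are placeholders.

```python
# Pixel-wise mean over all ground-truth crops of one object. A sharp,
# object-like mean suggests low appearance diversity; a uniform blur
# suggests varied viewpoints and backgrounds. Paths are hypothetical.
import numpy as np
from PIL import Image
from pathlib import Path

def average_image(image_dir, size=(128, 128)):
    stack = [
        np.asarray(Image.open(p).convert("RGB").resize(size), dtype=np.float64)
        for p in sorted(Path(image_dir).glob("*.jpg"))
    ]
    return Image.fromarray(np.mean(stack, axis=0).astype(np.uint8))

average_image("tide/ground_truth/object_01").save("object_01_mean.jpg")
```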