Here we present a brief overview of our project. For more information, please refer to the Project Book in the Downloads section.

    Brief Overview    

    Outline    

Traditional approaches to automatic species recognition in particular, and to object recognition in general, concentrate on hand-engineering sophisticated image features that are highly tuned towards the task at hand. Our solution instead opts for a deep learning approach that learns the representation automatically, directly from the image data.
The proposed approach employs an off-the-shelf deep Convolutional Neural Network (CNN) to extract features directly from images of flowers. The CNN is pre-trained on a large collection of generic web images, and the learned representation is applied, using simple SVM classifiers, to the specific task of flower recognition. That is, we transfer recognition capabilities learned on a general domain to the specific challenge of flower identification. The resulting approach is an end-to-end supervised learning strategy that makes minimal assumptions about the contents of images. It is simple and general, leaves ample scope for improvement, and can easily be extended to many other recognition tasks.

    Method    

As with all supervised learning approaches, we first train a classifier on labeled data and then apply the learned classifier to new, unlabeled data to predict their labels.
In our system, training the classifier (also called the learning stage) proceeds as follows: the training module receives a class-labeled training dataset of flower images, extracts an image descriptor for each image using one of the layers of the pre-trained CNN, and uses the descriptors to construct a linear predictor for each class to be recognized, following a one-vs-rest strategy. These predictors are learned using Support Vector Machines (SVMs).
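To make this concrete, here is a minimal sketch of such a training stage in Python, assuming a recent torchvision model (ResNet-50) as the off-the-shelf feature extractor and scikit-learn's LinearSVC for the one-vs-rest linear predictors; the network choice and helper names are illustrative assumptions, not the exact components of our implementation.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import LinearSVC

# Pre-trained CNN with its final classification layer removed, so a
# forward pass yields a generic fixed-length image descriptor.
cnn = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
cnn.fc = torch.nn.Identity()
cnn.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_descriptor(path):
    """Encode one image as a CNN descriptor (one layer's activations)."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return cnn(img).squeeze(0).numpy()

def train_classifier(image_paths, labels, C=1.0):
    """Fit one-vs-rest linear SVM predictors on the CNN descriptors."""
    X = np.stack([extract_descriptor(p) for p in image_paths])
    clf = LinearSVC(C=C)  # LinearSVC trains one-vs-rest by default
    clf.fit(X, labels)
    return clf
```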
After the classifier has been trained, we can use it to identify new flower images as belonging to one of the classes in the training dataset. The recognition module receives a novel flower image, encodes it into an image descriptor using the pre-trained CNN in the same way as during training, and applies the learned classifiers to identify the flower species. The final classification is determined by the classifier with the most positive response over all flower classes. The image below summarizes the process.
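Continuing the sketch above (reusing extract_descriptor and the trained classifier), recognition amounts to encoding the query image and taking the class with the most positive SVM decision value:

```python
def recognize(clf, image_path):
    """Identify the flower class of a novel image.

    The one-vs-rest classifier returns one decision value per class;
    the prediction is the class with the most positive response.
    """
    x = extract_descriptor(image_path).reshape(1, -1)
    scores = clf.decision_function(x)  # shape: (1, n_classes)
    return clf.classes_[np.argmax(scores)]
```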
Simple data augmentation techniques (generating additional samples from each image by applying label-preserving transformations) are applied at both training and recognition time. At train time, the augmented samples are used as-is to enlarge the training set; at test time, the descriptors of the augmented copies of a test image are combined into a single descriptor.
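A minimal sketch of this scheme, continuing the code above and assuming horizontal mirroring as the label-preserving transformation and averaging as the way descriptors are combined (our pipeline may use other transformations):

```python
from PIL import Image, ImageOps

def augment(img):
    """Label-preserving variants: here just the image and its mirror;
    crops, small rotations etc. could be added the same way."""
    return [img, ImageOps.mirror(img)]

def test_descriptor(path):
    """Combine (here: average) the descriptors of the augmented copies
    of a test image into a single descriptor."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        descs = [cnn(preprocess(v).unsqueeze(0)).squeeze(0).numpy()
                 for v in augment(img)]
    return np.mean(descs, axis=0)
```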
The CNN used for extracting descriptors is pre-trained on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) dataset. ILSVRC contains 1.2 million labeled images, with roughly 1000 images in each of 1000 basic-level categories (disparate categories such as faces, motorbikes, cows and dogs). The images were collected from the web and labeled by human annotators using Amazon's Mechanical Turk crowd-sourcing tool.

    Results    

The table to the right shows the performance, measured as mean class accuracy, of our method (CNN-SVM) compared to other basic and state-of-the-art baselines on the Oxford 102 flowers dataset (we use only the test set for evaluation). All methods except ours use a segmentation of the flower from the background.
We achieve state-of-the-art performance, far surpassing the more sophisticated and highly tuned handcrafted methods for flower recognition, even without segmentation. The results highlight the effectiveness and generality of the learned deep representation: it has sufficient representational power to pick up the potentially subtle differences between very similar flowers, and it suggests that learning algorithms can design features better than humans can.
More generally, our results show that representations extracted from deep CNNs can achieve groundbreaking results in fine-grained recognition (which aims to distinguish subcategories of the same basic-level class) by transferring recognition capabilities learned on general domains with abundant labeled data to more specific domains and tasks where labeled samples are scarce.

    Dataset    

We use the Oxford 102 flowers dataset for training our classifier. It consists of 8189 images divided into 102 flower classes. The images were mostly collected from the web. The different species in the dataset can be viewed in the flower gallery to the left.
The dataset is divided into training, validation and test sets. The training and validation sets each consist of 10 images per class (1020 images each). The test set consists of the remaining 6149 images (with a minimum of 20 images per class).
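For reference, the official splits are distributed with the dataset as MATLAB files; a sketch of reading them in Python (the file and field names below are those distributed with the dataset, to the best of our knowledge, and should be checked against its README):

```python
from scipy.io import loadmat

# imagelabels.mat and setid.mat ship alongside the images.
labels = loadmat("imagelabels.mat")["labels"].ravel()  # class 1..102 per image
setid = loadmat("setid.mat")
train_ids = setid["trnid"].ravel()  # 1020 training image ids
val_ids = setid["valid"].ravel()    # 1020 validation image ids
test_ids = setid["tstid"].ravel()   # 6149 test image ids
# Image id i corresponds to the file "jpg/image_%05d.jpg" % i
```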
The validation set is used for tuning SVM parameters. Once the best parameter has been determined, the classifiers are trained (with the chosen parameter) on the combined training and validation sets.
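Concretely, the tuning amounts to a small grid search over the SVM regularization parameter C; a sketch, continuing the training code above with assumed descriptor matrices X_train, X_val and labels y_train, y_val (the grid values are illustrative, and since the validation set is balanced, plain accuracy coincides with mean class accuracy):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Pick C by validation accuracy, then retrain on train + validation.
C_grid = [0.01, 0.1, 1.0, 10.0]  # illustrative grid
best_C = max(
    C_grid,
    key=lambda C: LinearSVC(C=C).fit(X_train, y_train).score(X_val, y_val),
)
final_clf = LinearSVC(C=best_C).fit(
    np.vstack([X_train, X_val]),
    np.concatenate([y_train, y_val]),
)
```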
To measure performance, the recognition system is applied to the test set, and performance is reported as mean class accuracy.
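Mean class accuracy averages the per-class accuracies, so each of the 102 classes contributes equally regardless of how many test images it has. A small sketch:

```python
import numpy as np

def mean_class_accuracy(y_true, y_pred):
    """Per-class accuracy averaged over classes, so every class
    contributes equally regardless of its number of test images."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in classes])
```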
