
Fine-grained Bird Classification & Detection


Group members: Rahul Kumar (B12089), Ajay Kumar (B12039)
Mentor: Arnav Bhavsar

Problem Statement

We are given a set of images of different bird species, where each
image is labelled with the name of the class it belongs to. We build
a model that first detects the bird in a given image and then
classifies it into the corresponding class.

Data Set

We train our model on the Caltech-UCSD Birds image dataset.

There are 200 classes in total, with 15 features for each image.

Each image has:

1. Part-location labels

2. Class labels

3. A visibility attribute for each part location

Model Process

Our objective is to detect the bird in a given image and classify it into its
respective category, so the model process can be divided into two parts:
1) Bird Detection
First, we pass an image through the model, which detects the location of the bird
in the image.
2) Bird Classification
After detection, we classify the detected bird region into its respective class.

Bird Classification

We are given a set of images of different bird species, where each image is
labelled with the name of the class it belongs to. We build a model that
classifies the bird images by learning a fine and highly discriminative set of
features.

We have 200 classes, and each class has 30 images for training and 30 for
testing.

Classification
1. Classification model based on Filter Response
A filter is used to remove unwanted components from an image or to extract
features from it, e.g. the edges in the image.
Filter Bank: The Leung-Malik (LM) set consists of 48 filters (first and second
derivatives of Gaussians, Gaussians, and Laplacian of Gaussian (LoG) filters at
different scales and orientations).
Classifier modeling & testing:
1) In the learning stage, we compute the filter responses for the training images
of each class. Exemplar filter responses are chosen as cluster centres (via K-Means
clustering (Duda et al., 2001)), and the histogram of how often each exemplar
occurs in a training image is then used to form the model corresponding to that image.

Classification
2) In the classification stage, the same procedure is followed
to build the histogram corresponding to the given image.
This histogram is then compared with the models learnt
during training, and the image is classified on the basis of
the comparison. The Euclidean distance is used to measure
the distance between histograms.
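
The two stages above can be sketched in plain Python as follows. This is only a minimal illustration, assuming the cluster centres have already been learnt; the names `response_histogram` and `classify`, the two-dimensional toy responses, and the class labels are hypothetical and not taken from our implementation.

```python
import math

def nearest_centre(vec, centres):
    """Index of the cluster centre closest to vec (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centres[k])))

def response_histogram(responses, centres):
    """Normalised histogram of nearest-centre labels for one image."""
    hist = [0.0] * len(centres)
    for vec in responses:
        hist[nearest_centre(vec, centres)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def classify(hist, models):
    """Label of the model histogram nearest to hist in Euclidean distance."""
    return min(models, key=lambda label: math.dist(hist, models[label]))

# Toy data (hypothetical): two cluster centres and one model histogram per class.
centres = [[0.0, 0.0], [1.0, 1.0]]
models = {"sparrow": [0.8, 0.2], "warbler": [0.2, 0.8]}

# Filter responses of a test image, one vector per pixel.
test_responses = [[0.9, 1.1], [1.0, 0.8], [0.1, 0.0], [1.2, 0.9]]
test_hist = response_histogram(test_responses, centres)
predicted = classify(test_hist, models)
```

Here three of the four responses fall nearest to the second centre, so the test histogram lies closest to the "warbler" model.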

Classification
2. Classification model based on Histogram of Oriented Gradients
This is a difficult problem because birds in such categories can vary greatly in
appearance. Variations arise not only from changes in illumination and viewpoint
but also from other visual properties.
So we use not only shape information (HOG) but also color information at the local level.
HOG - In this method we build features by counting occurrences of gradient
orientations in localized portions of an image.
Algorithm:
1) Divide the image into small connected regions called 'cells', and for each cell compute the edge orientations
of the pixels within the cell.
2) Discretize each cell into angular bins according to the gradient orientation.
3) For each cell, every pixel contributes a weight to the angular bin corresponding to its gradient orientation. The
resulting histograms are used to represent the block.
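
A rough sketch of the per-cell step is given below. It assumes a grey-level cell stored as a 2-D list, 9 unsigned orientation bins over 0 to 180 degrees, central-difference gradients, and magnitude weighting; these choices follow common HOG practice and are assumptions, not values stated on these slides. The helper names are hypothetical.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell, weighted by gradient magnitude.

    `cell` is a 2-D list of grey levels; border pixels are skipped because
    central differences need both neighbours.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

def l2_normalise(vec, eps=1e-6):
    """L2-normalise a block descriptor; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
    return [v / norm for v in vec]

# A cell with a vertical edge: intensity rises from left to right.
cell = [[0, 0, 10, 10]] * 4
hist = cell_histogram(cell)
block = l2_normalise(hist)
```

For this cell every interior gradient points horizontally, so all the weight lands in the first bin.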

Color feature - Since different objects differ in color as well as shape, color can
be an important factor in classifying objects into their corresponding classes.
Algorithm:
1) Divide the image into small connected regions called cells.
2) For each cell, cluster its RGB pixels and use the cluster centers as features (via K-Means clustering (Duda et
al., 2001)).
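
The color step for a single cell might look like the sketch below. It is a plain Lloyd's k-means with a deterministic start (the first k pixels), chosen here only to keep the example reproducible; the number of clusters, the sorting of the centers, and the function names are assumptions for illustration.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; centers start at the first k points."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centers[j])))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(channel) / len(group) for channel in zip(*group)]
    return centers

def color_feature(cell_pixels, k=2):
    """Concatenate the sorted cluster centers of one cell's RGB pixels."""
    centers = sorted(kmeans(cell_pixels, k))
    return [value for center in centers for value in center]

# Toy cell (hypothetical): two reddish and two bluish pixels.
cell_pixels = [(250, 10, 10), (10, 10, 240), (240, 20, 0), (0, 0, 250)]
feature = color_feature(cell_pixels)
```

Sorting the centers gives the feature vector a fixed order, so that cells with the same colors produce the same vector regardless of how the clusters were initialised.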

Classification
3. Classification model based on CNN (Convolutional Neural Network)
Convolutional Neural Networks are very similar to ordinary neural networks: they
are made up of neurons that have learnable weights and biases.
For feature extraction we use a deep neural network based on Krizhevsky et al.'s
"ImageNet Classification with Deep Convolutional Neural Networks".
Input Image -> First Layer [Conv Layer -> Max Pooling] -> Second Layer [Conv Layer ->
Max Pooling] -> Third Layer [Conv Layer] -> Fourth Layer [Conv Layer] -> Fifth Layer [Conv
Layer -> Max Pooling] -> Sixth Layer [FC Layer] -> Seventh Layer [FC Layer] -> Eighth Layer [FC
Layer]
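
To see how the spatial size shrinks through the five convolutional stages, the short sketch below applies the usual output-size formula, (size + 2 * pad - kernel) // stride + 1, layer by layer. The kernel sizes, strides, paddings, the 227-pixel input, and the 256 output channels are assumptions taken from the commonly used configuration of Krizhevsky et al.'s network; they are not stated on these slides.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad); assumed values, see the note above.
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
]

def trace(input_size=227):
    """Return the spatial size after every layer for a square input."""
    sizes, size = {}, input_size
    for name, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        sizes[name] = size
    return sizes

sizes = trace()
# Number of inputs to the first FC layer, assuming 256 channels after conv5.
fc_inputs = sizes["pool5"] ** 2 * 256
```

Under these assumptions the feature map is 6 x 6 after the last pooling layer, which gives 9216 inputs to the first fully connected layer.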

Classification
Steps
1) First, train the network on the training images.
2) Extract a feature vector for each training image from the trained
network.
3) With a feature vector and a class label for each image, train a
multiclass SVM model.
4) For testing, first extract the feature vector from the CNN and then
classify it using the SVM model.

Accuracy for classification

Using Filter bank
Multiclass: 54.4%

Using HOG
2-class: 73.23%
Multiclass: 80.4%

Using CNN
Multiclass: 84.4%

Bird Detection

Detecting objects in images is a challenging task owing to their
variable appearance and the wide range of positions that they can
adopt.

So, first we need a robust feature set that allows the bird form
to be discriminated cleanly, even in cluttered backgrounds under
difficult illumination.

Any given bird image can be divided into two parts: one that
contains the bird and the other the background.

Bird detection thus becomes a two-class problem (Bird, Background).

#1. Template based matching

A technique for finding the small part of an image that matches the
template image.
We can divide the image into two parts: the bird part and the background
part.
Our dataset contains birds of different colors, so we need a
template that is not affected by color. We therefore use the
HOG feature, which gives shape information and is not affected by
color.

Training process (Template based)

a) Divide the dataset into train and test sets.

b) Take an image, extract the part of the image that contains the bird,
rescale it, and extract features using the histogram of oriented gradients (HOG).

c) Repeat step (b) for all training images and concatenate the extracted
features into a matrix that contains the bird features of all training
images.

d) To create a template, we can use either of these methods:

1) Average all the bird features and use the average feature as the template.

2) Cluster the bird features into different groups and
use their center features as templates.
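
Method 1 amounts to an element-wise mean over the rows of the feature matrix. The sketch below shows it on three toy feature vectors; the values and the name `mean_template` are hypothetical.

```python
def mean_template(features):
    """Element-wise mean of equally long feature vectors (one per training bird)."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

# Toy feature matrix: one row per training image.
bird_features = [[2, 8, 0], [4, 6, 2], [6, 4, 4]]
template = mean_template(bird_features)
```

Method 2 would replace the single mean with several cluster centers, giving one template per group of similar birds.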

Detection process (Template based)

a) Take a test image, move a sliding window of a particular size over it, and
extract the feature and position of each window.

b) Downscale the image and move the sliding window over it again. Repeat this
while the image is larger than the sliding window.

c) Apply template matching to the extracted features using the sum of absolute
differences.

d) Choose the sliding windows with the minimum error and apply greedy
non-maximum suppression to find the detection box for the bird in the image.
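
Steps (a) to (c) can be sketched as a search over an image pyramid. To keep the example short, raw pixel values stand in for the HOG features, the image is halved by dropping every second pixel, and only the single best window is returned; all three are simplifying assumptions, and the function names are hypothetical.

```python
def downscale(image):
    """Halve the image by keeping every second pixel (crude nearest-neighbour)."""
    return [row[::2] for row in image[::2]]

def sliding_windows(image, win, step):
    """Yield (x, y, patch) for every window position in the image."""
    for y in range(0, len(image) - win + 1, step):
        for x in range(0, len(image[0]) - win + 1, step):
            yield x, y, [row[x:x + win] for row in image[y:y + win]]

def sad(patch, template):
    """Sum of absolute differences between a patch and the template."""
    return sum(abs(a - b)
               for patch_row, template_row in zip(patch, template)
               for a, b in zip(patch_row, template_row))

def best_match(image, template, step=1):
    """Search every pyramid level; return (error, x, y, scale) of the best window.

    x and y are mapped back to the coordinates of the original image.
    """
    win, scale, best = len(template), 1, None
    while len(image) >= win and len(image[0]) >= win:
        for x, y, patch in sliding_windows(image, win, step):
            candidate = (sad(patch, template), x * scale, y * scale, scale)
            if best is None or candidate < best:
                best = candidate
        image, scale = downscale(image), scale * 2
    return best

# Toy 6x6 image with a bright 2x2 square whose top-left corner is at x=3, y=2.
template = [[9, 9], [9, 9]]
image = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        image[y][x] = 9
result = best_match(image, template)
```

The window with zero error is found at the original scale, at the position of the bright square.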

#2 HOG and SVM (Training process)

a) Divide the dataset into train and test sets.

b) Take an image and extract the parts of the image that contain the
bird and the background; put them into the bird class and the background class. Do
this for all training images.

c) Extract HOG features from all bird and background images
and create the training dataset and a label vector that contains their
respective class labels.

d) Train a linear SVM model.
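
For illustration, step (d) can be approximated by minimising the regularised hinge loss with simple subgradient steps, as in the sketch below. This is not the solver we used; in practice a library implementation such as scikit-learn's `LinearSVC` would be the usual choice. The learning rate, regularisation strength, epoch count, and two-dimensional toy features are all assumptions.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.001):
    """Minimise the L2-regularised hinge loss with per-sample subgradient steps.

    labels must be +1 (bird) or -1 (background).
    """
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regulariser always shrinks w slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # The hinge term only acts on samples inside the margin.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def score(w, b, x):
    """Signed distance-like score; positive means 'bird'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy features (hypothetical): two well separated groups.
bird = [[0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
background = [[0.1, 0.2], [0.2, 0.0], [0.0, 0.3]]
w, b = train_linear_svm(bird + background, [1] * 3 + [-1] * 3)
predictions = [1 if score(w, b, x) > 0 else -1 for x in bird + background]
```

The same `score` function is what would be evaluated on each sliding window at detection time.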

Testing or Detection (HOG and SVM)

a) Take a test image, move a sliding window of a particular size over it, and
extract the feature and position of each window.

b) Downscale the image and move the sliding window over it again. Repeat this
while the image is larger than the sliding window.

c) Extract features using HOG and calculate the weight (score) for each sliding window.

d) Apply greedy non-maximum suppression to all the sliding windows to find and
suppress overlapping windows.

e) Apply thresholding to find the detection box for the bird in the image.
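
Steps (d) and (e) might be sketched as follows. Boxes are (x1, y1, x2, y2) tuples, overlap is measured by intersection over union (IoU), and the IoU limit of 0.5 and score threshold of 0 are assumed values chosen for the example, not the ones used in our experiments.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap a better box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept

# Toy windows (hypothetical) with their classifier scores.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 20, 20)]
scores = [0.9, 0.8, 0.7, -0.3]

kept = nms(boxes, scores)                                   # step (d)
detections = [boxes[i] for i in kept if scores[i] > 0.0]    # step (e)
```

The second box is suppressed because it overlaps the first, and the last box survives suppression but is removed by the threshold.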

Results for detection process

Using Template matching: 69%

Using HOG + SVM: 88.5%

Reference Papers:
Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection.
Thomas Berg and Peter N. Belhumeur. Part-Based One-vs-One Features for Fine-Grained
Categorization, Face Verification, and Attribute Estimation.

Thank You
