
INDIAN STATISTICAL INSTITUTE

M. Stat. II Year (2018-19)

Course: PATTERN RECOGNITION

MINI PROJECT

You are required to work in groups of two to

1. solve a supervised pattern recognition problem of your choice,
2. write a report, which must be submitted together with the program code via e-mail
to pamita@isical.ac.in by 6 p.m. on November 5, 2018,
3. make a 10-minute presentation on the same in class during the last week of the
semester (that is, on November 6 and 8, 2018),

in accordance with the instructions given below.

Instructions
• Decide on a partner and identify a classification task based on data from the UCI Machine
Learning Repository. Use any dataset that has at least 10 numerical features and at
least 1000 instances (data points), excluding those with missing data. Such datasets are
available at https://goo.gl/zzeoQV and brief information about them is available at
https://goo.gl/DY6NXA. Detailed information is available on the individual page for the dataset
concerned.
• PLEASE ENSURE THAT THE SAME DATASET IS NOT USED BY MULTIPLE GROUPS. Groups must
work with distinct datasets, that is, on different classification problems.
• Fill in the relevant columns of the Google sheet https://goo.gl/9HD27q no later than noon on
Monday, October 15, 2018. This will help you to avoid duplication of datasets.
• Split the dataset into training and test sets of approximately equal size, in case separate
training and test sets are not provided.
• Each member of the group must independently implement four of the supervised
classification methods covered in the course. The results should be consolidated and
compared using appropriate tables and plots.
• The report and the presentation must contain sufficient details about the following:
(a) The underlying pattern recognition problem
(b) The features used
(c) The eight supervised classification techniques used
(d) Results in the form of training and test set errors, as well as the confusion matrix for the
test set
(e) Analysis of the comparative performance of the classification techniques

• Use R to implement the algorithms of your choice.
• All relevant R code should be submitted together with a properly
formulated report containing intermediate and final outputs (including plots) by the
deadline given above.
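
The workflow described in the instructions above (checking dataset eligibility, splitting into halves, and reporting training and test set errors with a test set confusion matrix) might be sketched in R as follows. This is only an illustration under stated assumptions: the data are synthetic placeholders rather than a UCI dataset, logistic regression via glm() is an assumed stand-in for whichever methods from the course a group actually selects, and the names dat, class and predict_class are hypothetical.

```r
# Synthetic stand-in for a UCI dataset (hypothetical; replace with the real data).
set.seed(1)
n <- 1200
p <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))
y <- factor(ifelse(X[, 1] - X[, 2] + rnorm(n) > 0, "pos", "neg"))
dat <- data.frame(X, class = y)

# Eligibility check: drop instances with missing data, then count what remains.
dat <- dat[complete.cases(dat), ]
num_features <- sum(sapply(dat[, names(dat) != "class"], is.numeric))
stopifnot(num_features >= 10, nrow(dat) >= 1000)

# Split into training and test sets of approximately equal size.
train_idx <- sample(nrow(dat), size = floor(nrow(dat) / 2))
train <- dat[train_idx, ]
test <- dat[-train_idx, ]

# One example classifier: logistic regression (an assumed choice, not prescribed).
fit <- glm(class ~ ., data = train, family = binomial)

# Hypothetical helper: convert fitted probabilities into class labels.
# glm() models the probability of the second factor level, here "pos".
predict_class <- function(model, newdata) {
  prob <- predict(model, newdata = newdata, type = "response")
  factor(ifelse(prob > 0.5, "pos", "neg"), levels = levels(dat$class))
}

train_pred <- predict_class(fit, train)
test_pred <- predict_class(fit, test)

# Misclassification rates on the training and test sets.
train_error <- mean(train_pred != train$class)
test_error <- mean(test_pred != test$class)

# Confusion matrix for the test set: rows are predictions, columns are truth.
conf_mat <- table(Predicted = test_pred, Actual = test$class)
print(conf_mat)
cat("Training error:", train_error, "\n")
cat("Test error:", test_error, "\n")
```

Repeating the last part for each classifier and collecting the error rates in one data frame would give the kind of comparison table the report asks for.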
_________________________________________
