
SGN-2556 Pattern Recognition

Computer exercise 1: Bayes classifier, 28.03.2006

Introduction

Bayesian decision theory is a fundamental statistical approach to the problem of pattern recognition. It makes the assumption that the decision problem is posed in probabilistic terms, and that all of the relevant probability values are known. To minimize the probability of error in a classification problem, one should always choose the state of nature that maximizes the posterior probability $P(\omega_i \mid x)$. Bayes' formula allows us to calculate such probabilities given the prior probabilities $P(\omega_i)$ and the class-conditional densities $p(x \mid \omega_i)$ for the different categories:

$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}. \qquad (1)$$
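For concreteness, here is a minimal one-dimensional numerical sketch of equation (1) in MATLAB; all the values (priors, Gaussian means, the observed x) are invented for illustration and have nothing to do with the exercise data:

    % Hypothetical 1-D example of Bayes' formula (all values invented).
    P1 = 0.6;  P2 = 0.4;                 % priors P(w1), P(w2)
    m1 = 0;    m2 = 2;   s = 1;          % class means and common std
    x  = 1.2;                            % observed feature value

    p1 = exp(-(x - m1)^2 / (2*s^2)) / (sqrt(2*pi)*s);   % p(x|w1)
    p2 = exp(-(x - m2)^2 / (2*s^2)) / (sqrt(2*pi)*s);   % p(x|w2)
    px = p1*P1 + p2*P2;                                 % evidence p(x)

    post1 = p1*P1 / px;                  % P(w1|x), equation (1)
    post2 = p2*P2 / px;                  % P(w2|x); post1 + post2 == 1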

One of the most useful and widely used ways to represent a pattern classifier is through discriminant functions

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i). \qquad (2)$$

The classifier is said to assign a feature vector $x$ to class $\omega_i$ if

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i. \qquad (3)$$

For the two-category case, a single discriminant function (decision boundary) is used:

$$g(x) \equiv g_1(x) - g_2(x). \qquad (4)$$

The following decision rule is used: decide $\omega_1$ if $g(x) > 0$; otherwise decide $\omega_2$. Thus, a two-category classifier (dichotomizer) can be viewed as a machine that computes a single discriminant function $g(x)$ and classifies $x$ according to the algebraic sign of the result. If the densities $p(x \mid \omega_i)$ are multivariate normal, equation (2) takes the form

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i), \qquad (5)$$

where $d$ is the dimension of the feature space. For the case when the covariance matrices $\Sigma_i$ are different for each category, the resulting discriminant functions are inherently quadratic:
$$g_i(x) = x^t W_i x + w_i^t x + w_{i0}, \qquad (6)$$

where

$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad (7)$$

$$w_i = \Sigma_i^{-1}\mu_i, \qquad (8)$$

and

$$w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i), \qquad (9)$$

where $\mu_i$ is the mean vector and $\Sigma_i$ the covariance matrix of class $\omega_i$.
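For reference, equations (6)-(9) can be evaluated with a few lines of MATLAB. The helper below (saved as quad_discriminant.m) is an illustrative sketch, not the provided bayes_classifier.m; it omits the class-independent $-\frac{d}{2}\ln 2\pi$ term of equation (5), which cancels in the comparison (3):

    function g = quad_discriminant(X, mu, Sigma, prior)
    % Quadratic discriminant of equations (6)-(9), one value per row of X.
    % X: n-by-2 samples, mu: 1-by-2 mean, Sigma: 2-by-2 covariance, prior: P(w_i).
    W  = -0.5 * inv(Sigma);                        % equation (7)
    w  = Sigma \ mu';                              % equation (8): inv(Sigma)*mu
    w0 = -0.5 * mu * (Sigma \ mu') ...
         - 0.5 * log(det(Sigma)) + log(prior);     % equation (9)
    g  = sum((X * W) .* X, 2) + X * w + w0;        % equation (6), row-wise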

Exercise tasks
1) Download the data_generator.m function from the web page and generate two artificial vectors, features and targets, of dimensions (3000, 2) and (3000, 1), respectively. features is the two-dimensional feature vector, belonging to two classes and consisting of 3000 samples. The corresponding target of each sample is specified in the targets vector.

2) Divide the features vector into a training dataset (e.g. features_train) consisting of 1000 samples and a test dataset (e.g. features_test) containing the rest of your data. The first dataset will be used to train the classifier, the second one to test its performance. Make sure that there is no overlap between these two sets. Divide the targets vector in the same way. Visualize the training data as a 2-D scatter plot using different colors for the different classes. Use the MATLAB scatter command (help scatter). A sketch of steps 1-2 is given after this list.

3) Derive a Bayesian classifier for the two-category case, assuming a general multivariate normal class-conditional probability density for each class (the covariance matrices are different for each category). Use the bayes_classifier.m function to calculate $g_i(x)$, $i = 1, 2$. NOTE that before using this function you have to separate the features_train vector into two distinct vectors, each containing the samples of only one class. Use the sample frequencies of the classes to calculate the prior probabilities $P(\omega_i)$. See the second sketch below.

4) Define the single decision boundary of the two-category classifier by the following rule:

$$g(x) \equiv g_1(x) - g_2(x), \qquad (10)$$

where $g_1(x)$ and $g_2(x)$ are the discriminant functions for each class obtained in 3).
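A minimal sketch of steps 1-2; the call signature of data_generator and the class labels (assumed here to be 1 and 2) should be checked against the actual file:

    % Step 1: generate the data (assumed interface of the provided generator).
    [features, targets] = data_generator();       % features: 3000x2, targets: 3000x1

    % Step 2: split into non-overlapping training and test sets.
    features_train = features(1:1000, :);
    targets_train  = targets(1:1000);
    features_test  = features(1001:end, :);
    targets_test   = targets(1001:end);

    % Scatter plot of the training data, one color per class.
    c1 = (targets_train == 1);                    % logical index of class 1
    scatter(features_train(c1, 1),  features_train(c1, 2),  25, 'b'); hold on;
    scatter(features_train(~c1, 1), features_train(~c1, 2), 25, 'r'); hold off;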

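For step 3, the preparation might look as follows. The commented-out call at the end is only indicative: the real argument list of bayes_classifier.m must be checked from its help text.

    % Step 3: separate the training data by class and estimate the priors.
    class1 = features_train(targets_train == 1, :);   % class 1 samples only
    class2 = features_train(targets_train ~= 1, :);   % class 2 samples only

    P1 = size(class1, 1) / size(features_train, 1);   % P(w1) from sample frequency
    P2 = 1 - P1;                                      % P(w2)

    mu1 = mean(class1);   Sigma1 = cov(class1);       % class 1 parameter estimates
    mu2 = mean(class2);   Sigma2 = cov(class2);       % class 2 parameter estimates

    % Indicative only -- check the real argument list with: help bayes_classifier
    % g1 = bayes_classifier(x, mu1, Sigma1, P1);
    % g2 = bayes_classifier(x, mu2, Sigma2, P2);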
5) Plot the decision boundary of the classifier superimposed on your training data points. The main steps of this procedure are indicated below:

i Evaluate the values of the function $g(x)$ over the range of your data (help meshgrid).

ii Plot a single contour of the matrix $g(x)$ at the zero level (help contour).

6) Classify the test data points of your dataset using the following decision rule: decide $\omega_1$ if $g(x) > 0$; otherwise decide $\omega_2$.

7) Calculate the empirical Bayes error rate, i.e. the percentage of misclassified points on the test set, by comparing the true targets_test values to the classified ones. Comment on your results. A sketch of steps 5-7 follows below.
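A minimal sketch of steps 5-7, reusing the illustrative quad_discriminant helper from above (again assuming class labels 1 and 2):

    % Step 5: evaluate g(x) = g1(x) - g2(x) on a grid covering the data.
    xr = linspace(min(features(:,1)), max(features(:,1)), 200);
    yr = linspace(min(features(:,2)), max(features(:,2)), 200);
    [X, Y] = meshgrid(xr, yr);
    pts = [X(:), Y(:)];
    G = quad_discriminant(pts, mu1, Sigma1, P1) ...
      - quad_discriminant(pts, mu2, Sigma2, P2);
    G = reshape(G, size(X));
    hold on; contour(X, Y, G, [0 0], 'k'); hold off;   % zero-level boundary

    % Step 6: classify the test points by the sign of g(x).
    g_test = quad_discriminant(features_test, mu1, Sigma1, P1) ...
           - quad_discriminant(features_test, mu2, Sigma2, P2);
    predicted = ones(size(g_test));                    % decide w1 where g(x) > 0
    predicted(g_test <= 0) = 2;                        % otherwise decide w2

    % Step 7: empirical error rate = percentage of misclassified test points.
    error_rate = 100 * mean(predicted ~= targets_test);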
