
Biometric Authentication:

A Machine Learning
Approach

by
Sun-Yuan Kung
Dept. of Electrical Engineering, Princeton University
kung@ee.princeton.edu
Man-Wai Mak
Centre for Multimedia Signal Processing,
Dept. of Electronic and Information Engineering,
The Hong Kong Polytechnic University
enmwmak@polyu.edu.hk
Shang-Hung Lin
IC Media Corporation
shanghung.lin@ic-media.com
(Modified on March 26, 2004)

CONTENTS

1 MATLAB PROJECTS
1.1 Matlab Project 1: GMM and RBF Networks for Speech Pattern Recognition
1.1.1 Introduction:
1.1.2 Objectives:
1.1.3 Procedures:
1.2 Matlab Project 2: SVM for Pattern Classification
1.2.1 Objectives:
1.2.2 Procedures:


Chapter 1

MATLAB PROJECTS

1.1 Matlab Project 1: GMM and RBF Networks for Speech Pattern Recognition

1.1.1 Introduction:

Gaussian Mixture Models (GMMs) and Radial Basis Function (RBF) networks are
two promising models for pattern classification. In this laboratory exercise, your
task is to develop pattern classification systems based on GMMs and RBF networks.
The systems should be able to recognize 10 vowels. You will use
Netlab (http://www.ncrg.aston.ac.uk/netlab/) together with Matlab to create the
GMMs and RBF networks.

1.1.2 Objectives:

You should complete the following tasks by the end of this laboratory exercise:
1. Create GMMs and RBF networks to represent 10 different classes.
2. Perform pattern classification using the created networks.
3. Compare the GMM-based system against the RBF network-based system in
terms of recognition accuracy, decision boundaries, training time, and recognition time.
4. Find the decision boundaries and plot them on a 2-D plane.

1.1.3 Procedures:

GMM-Based Classifier
1) Download the Netlab software from http://www.ncrg.aston.ac.uk/netlab and
save the m-files in your working directory. Download the training data and
testing data from M.W. Mak's home page
http://www.eie.polyu.edu.hk/mwmak/Book/gmmrbf.zip.
This file contains 2-D vowel data (in the data/ directory) that you will use
in this laboratory exercise. You will also find the following files in gmmrbf.zip:

load_pattern.m: a function for importing the 2D vowel training and test data.
train_gmm.m: a function for training a GMM.
train_all_gmm.m: a function for training all the GMMs.
train_rbf.m: a function for training an RBF network.
plot_all_gmm.m: a function for plotting the centers of all GMMs and the training data.
plot_gmm_contour.m: a function for plotting the equal-probability contour of a GMM.
plot_rbf.m: a function for plotting RBF centers and training data.
VowelClassifer.m: a program that calls the above functions to train 10 GMMs (or an RBF network with 10 outputs) and find the classification accuracy.
plot_boundary.m: a program for plotting the decision boundaries created by 10 GMMs or an RBF network.
gmm_example.m: an example program showing how to create and train a GMM-based classifier. It also suggests a method to plot the decision boundaries created by a GMM-based classifier.
Note that some of the files contain missing statements. You will need to fill in these statements in this lab exercise.
2) Run Matlab, go to File → Set Path and add the directory where Netlab was saved.
3) Read and run the file gmm_example.m in gmmrbf.zip to see how to use Netlab to create and train a GMM.
4) Import and save the training data, 2DVowel_train_pat.dat, to a 2-D array. The imported matrix should be 338 × 12 in size. The first 2 columns contain the training feature vectors in a 2-D input space and the 3rd to 12th columns indicate the class to which each pattern belongs. A Matlab function load_pattern.m is provided to help you import the training and test data.
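If you want to cross-check your use of load_pattern.m, the data can also be read directly, assuming 2DVowel_train_pat.dat is a plain ASCII matrix with the column layout described above (adjust the indices if load_pattern.m returns a different arrangement):

train_data = load('2DVowel_train_pat.dat');    % 338 x 12 matrix
X = train_data(:, 1:2);                        % 2-D feature vectors
T = train_data(:, 3:12);                       % 1-of-10 class indicators
[maxval, class_id] = max(T, [], 2);            % class index (1..10) of each pattern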
5) Create 10 GMMs to represent the 10 vowels using the data from the 10 different classes. It is recommended to separate the data into ten 2-D arrays. Set the number of centers to 2 and the covariance type to 'diag'. A model can be created by using
model_name = gmm(data_dimension, no_of_centers, covariance_type)
and initialized by
model_name = gmminit(model_name, data, options)

Section 1.1.

Matlab Project 1: GMM and RBF Networks for Speech Pattern Recognition

6) Then, use the EM algorithm, implemented in the function gmmem, to train the models.
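The following is a minimal sketch of Steps 5 and 6, assuming the training vectors X and class indices class_id produced in Step 4; the variable names are placeholders, and the provided train_gmm.m and train_all_gmm.m show the intended way to structure this:

options = zeros(1, 18);                        % Netlab options vector
options(14) = 20;                              % number of k-means/EM iterations (an arbitrary choice)

mixtures = cell(1, 10);                        % one GMM per vowel class
for k = 1:10
    class_data = X(class_id == k, :);          % 2-D vectors of class k
    mix = gmm(2, 2, 'diag');                   % 2-D input, 2 centres, diagonal covariances
    mix = gmminit(mix, class_data, options);   % initialise the centres with k-means
    mix = gmmem(mix, class_data, options);     % refine the model with the EM algorithm
    mixtures{k} = mix;
end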


7) Plot the imported training data together with the centers after EM training.
8) Now, import the test data, 2DVowel_test_pat.dat. This file is used for finding the classification rate of the GMMs you have just created. The file contains 333 data points, and again each point belongs to one of the 10 classes. The likelihood of a particular model λ_i for a data point x_t, i.e., p(x_t | λ_i), can be calculated by the function gmmprob. Each data point is classified to the class whose corresponding likelihood is the highest. The overall classification rate is calculated by:

Classification rate = (Number of correctly classified points / Total number of data points) × 100%
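A minimal sketch of this step, assuming the cell array mixtures of trained GMMs from the sketch above and the same column layout for the test file:

test_data = load('2DVowel_test_pat.dat');            % 333 x 12 matrix
Xtest = test_data(:, 1:2);
[maxval, true_class] = max(test_data(:, 3:12), [], 2);

probs = zeros(size(Xtest, 1), 10);
for k = 1:10
    probs(:, k) = gmmprob(mixtures{k}, Xtest);       % p(x_t | lambda_k) for every test point
end
[maxval, predicted] = max(probs, [], 2);             % pick the most likely class
rate = 100 * mean(predicted == true_class);
fprintf('GMM classification rate: %.2f%%\n', rate);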
9) Now, try different numbers of centers and different covariance types ('diag' or 'full') when creating the models. Find the combination that gives the highest classification rate. What is the optimal combination and what is the corresponding classification rate?
10) Plot the decision boundaries that separate the 10 classes. See gmm_example.m for an example; an alternative sketch is given below.
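gmm_example.m illustrates the intended method. As a rough alternative, the winning class can be evaluated on a grid covering roughly the input range shown in Fig. 1.1, and the boundaries drawn with contour; contour lines at half-integer levels fall on the boundaries of the label map, so this is only a quick approximation:

[xg, yg] = meshgrid(200:10:1200, 500:25:4000);       % grid over the input space
grid_pts = [xg(:), yg(:)];
gprob = zeros(size(grid_pts, 1), 10);
for k = 1:10
    gprob(:, k) = gmmprob(mixtures{k}, grid_pts);
end
[maxval, winner] = max(gprob, [], 2);                % winning class at each grid point
figure; hold on;
contour(xg, yg, reshape(winner, size(xg)), 0.5:1:10.5, 'k');
plot(X(:, 1), X(:, 2), '.');                         % overlay the training data
title('Decision boundaries of the GMM-based classifier');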
RBF Network-Based Classifier
1. In this part, you will repeat the previous procedure using an RBF network. Again, you should start by importing the training data and storing it in a 338 × 12 array.
2. After importing the data, you should separate it into two parts: the input data, which is 338 × 2 in size, and the desired outputs, which are 338 × 10 in size.
3. Instead of creating 10 different RBF networks, you can create one RBF network. To create an RBF network, you use the function rbf. To specify the
network architecture, you provide the number of inputs, the number of hidden
units, and the number of output units.
4. After that, initialise the RBF network by calling the function rbfsetbf. You need to specify a number of option fields, as you did for the GMMs. Before performing classification, call the function rbftrain to train the RBF network. You also need to specify the targets (the 338 × 10 matrix of desired outputs from Step 2), which contain the class information.
5. After training the network, import the test data and use the function rbffwd to perform classification. This function has two input arguments: the RBF network to be used for classification and a row vector. In this exercise, the row vector has two fields: the x location and the y location.
The output is again a row vector, and its size will be equal to the number
of outputs that you specify in Step 3. For each test vector, the class ID is
determined by selecting the output whose response to the test vector is the
largest.
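A minimal sketch covering Steps 3-5, assuming the same file layout as before; the number of hidden units (20) is an arbitrary starting point for Step 6:

train_data = load('2DVowel_train_pat.dat');
X = train_data(:, 1:2);  T = train_data(:, 3:12);    % inputs and 1-of-10 targets
test_data = load('2DVowel_test_pat.dat');
Xtest = test_data(:, 1:2);
[maxval, true_class] = max(test_data(:, 3:12), [], 2);

options = zeros(1, 18);
options(14) = 20;                                    % iterations for basis-function fitting and training

net = rbf(2, 20, 10, 'gaussian');                    % 2 inputs, 20 hidden units, 10 outputs
net = rbfsetbf(net, options, X);                     % place the basis functions on the training inputs
net = rbftrain(net, options, X, T);                  % train the output layer on the targets

Ytest = rbffwd(net, Xtest);                          % one row of 10 outputs per test vector
[maxval, predicted] = max(Ytest, [], 2);
rate = 100 * mean(predicted == true_class);
fprintf('RBF classification rate: %.2f%%\n', rate);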
6. Compute the classification rate on the whole test set. Try different numbers of hidden units and select the optimal one. What is the optimal number of hidden units and what is the corresponding classification rate? Compare and explain the classification performance of the RBF network relative to that of the GMMs.
7. Plot the decision boundaries. Fig. 1.1(b) shows an example of the decision
boundaries created by an RBF network. Compare the boundaries with those
of the GMMs.
8. Compare the GMM-based classifier against the RBFN-based classifier in terms
of classification accuracy, training time, and recognition time.
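For the timing comparisons in Step 8, wrapping the training and recognition calls in tic and toc is sufficient, e.g. for the RBF sketch above (wrap the gmmem loop and the gmmprob loop of the GMM sketches in the same way):

tic;
net = rbftrain(net, options, X, T);
rbf_train_time = toc;

tic;
Ytest = rbffwd(net, Xtest);
rbf_test_time = toc;
fprintf('RBF: training %.3f s, recognition %.3f s\n', rbf_train_time, rbf_test_time);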

[Figure 1.1 about here. Panel (a): "Decision Boundaries of GMM-Based Classifier"; panel (b): "Decision Boundaries of RBF-Based Classifier". Each panel shows the training data of the 10 classes, the trained centres, and the resulting decision boundaries over the input range 200-1200 (horizontal axis) by 500-4000 (vertical axis).]


Figure 1.1. Decision boundaries created by (a) 10 GMMs and (b) an RBF network with 10 outputs for the 2-D vowel problem.

1.2 Matlab Project 2: SVM for Pattern Classification

1.2.1 Objectives:

Use linear and non-linear support vector machines (SVMs) to classify 2-D data.

1.2.2 Procedures:

Linear SVMs
1) Go to http://www.cis.tugraz.at/igi/aschwaig/software.html to download the
software svm 251 and save the m-files to your working directory. Some
functions such as plotboundary, plotdata, and plotsv are included in
the end of the m-file demsvm1.m. Extract these functions to form new
m-files, e.g. plotboundary.m, plotdata.m and plotsv.m.
2) Open Matlab, go to File → Set Path and add the directory where svm_251 was saved. Alternatively, add the statement addpath svmdir, where svmdir is the directory storing svm_251, to your .m file.
3) If your system does not have the Matlab Optimization Toolbox installed, you may need to compile the files pr_loqo.c and loqo.c at Matlab's command prompt, as follows:
mex pr_loqo.c loqo.c
This will create a file named pr_loqo.dll. Then, find the line
workalpha = loqo(H, f, A, eqconstr, VLB, VUB, startVal, 1);
in svmtrain.m and change loqo to pr_loqo.
4) Input the following training data. X is a set of input data, 20 × 2 in size. Y contains the corresponding class labels, 20 × 1 in size.
X:  (2,7)  (3,6)  (2,5)  (3,5)  (3,3)  (2,2)  (5,1)  (6,2)  (8,1)  (6,4)
Y:   +1     +1     +1     +1     +1     +1     +1     +1     +1     +1

X:  (4,8)  (5,8)  (9,5)  (9,9)  (9,4)  (8,9)  (8,8)  (6,9)  (7,4)  (4,4)
Y:   -1     -1     -1     -1     -1     -1     -1     -1     -1     -1

X = [2, 7; 3, 6; 2, 5; 3, 5; 3, 3; 2, 2; 5, 1; 6, 2; 8, 1; 6, 4; 4, 8; 5, 8; 9, 5; ...
     9, 9; 9, 4; 8, 9; 8, 8; 6, 9; 7, 4; 4, 4];
Y = [+1; +1; +1; +1; +1; +1; +1; +1; +1; +1; -1; -1; -1; ...
     -1; -1; -1; -1; -1; -1; -1];
Plot a graph to show the data set using the command plotdata, as follows:
x1ran = [0, 10]; x2ran = [0, 10];   % data range
f1 = figure; plotdata(X, Y, x1ran, x2ran);
title('Data from class +1 (squares) and class -1 (crosses)');


5) Create a support vector machine classifier by using the function svm.


net = svm(nin, kernel, kernelpar, C, use2norm, qpsolver, qpsize)
Set nin to 2, as X contains 2-D data. Set kernel to 'linear' in order to use a linear SVM. Set kernelpar to [ ], as the linear kernel does not require any parameters. Set C to 100. Set use2norm to 0 so that the 1-norm is used (standard SVM). Set qpsolver to 'loqo' so that the C functions in pr_loqo.dll will be used for performing the quadratic optimization. You may leave the last parameter qpsize blank, i.e., its default value will be used.
After creating the support vector machine, train it by using the function svmtrain:
net = svmtrain(net, X, Y, alpha0, dodisplay)
Set alpha0 to [ ]. Set dodisplay to 2 to show the training data.
After carrying out the above steps, record the number of support vectors. Also, record the norm of the separating hyperplane and calculate the margin width (2/||w||) from net.normalw.
Plot the SVM using the commands plotboundary, plotdata, and plotsv as follows:
figure; plotboundary(net, x1ran, x2ran);
plotdata(X, Y, x1ran, x2ran); plotsv(net, X, Y);
normW = norm(net.normalw);
tstring = sprintf('Linear SVM, C = %.1f, #SV = %d, normW = %.2f', ...
                  C, length(net.svind), normW);
title(tstring, 'FontSize', 11);
6) Vary the value of C in the function svm and repeat Step 5, e.g. C = 0.1, C = 10, and C = 1000. For each value of C, plot the corresponding SVM and record the number of support vectors, the norm of the separating hyperplane, and the margin width. Discuss the change in the number of support vectors and the margin width as a result of varying C.
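One possible way to organize this sweep, reusing X, Y, x1ran, and x2ran from Step 4 and the helper m-files extracted in Step 1 (for a linear SVM the margin width is 2/||w||):

Cvals = [0.1, 10, 100, 1000];
for i = 1:length(Cvals)
    C = Cvals(i);
    net = svm(2, 'linear', [], C, 0, 'loqo');
    net = svmtrain(net, X, Y, [], 2);
    normW = norm(net.normalw);
    fprintf('C = %g: #SV = %d, ||w|| = %.3f, margin = %.3f\n', ...
            C, length(net.svind), normW, 2/normW);
    figure; plotboundary(net, x1ran, x2ran);
    plotdata(X, Y, x1ran, x2ran); plotsv(net, X, Y);
    title(sprintf('Linear SVM, C = %g', C));
end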
Non-Linear SVMs
1) Repeat the procedures described in the linear SVM above, but this time replace the linear kernel by a polynomial kernel:
net = svm(2, 'poly', 2, C, 0, 'loqo');   % 2nd-order polynomial kernel
net = svmtrain(net, X, Y, [ ], 2);
2) Vary the value of C in the function svm and repeat Step 1, e.g. C = 0.1, C = 10, and C = 1000. For each value of C, plot the corresponding SVM and record the number of support vectors. Discuss the change in the number of support vectors and the margin width as a result of varying C. Note that
you cannot obtain the margin width from net.normalw because the margin
width is different in different regions of the input space.
3) Repeat Step 1 and Step 2 above but use a polynomial kernel of degree 3:
net = svm(2, 'poly', 3, C, 0, 'loqo');   % 3rd-order polynomial kernel
net = svmtrain(net, X, Y, [ ], 2);
Explain your observations.
4) Repeat Step 1 and Step 2 above but use an RBF kernel:
net = svm(2, 'rbf', 8, C, 0, 'loqo');    % RBF kernel with 2*sigma^2 = 8
net = svmtrain(net, X, Y, [ ], 2);
Explain your observations.
The Role of Kernel Parameters
1) Edit the file svmtrain.m to implement the kernel function

K(x, x_i) = (1 + x^T x_i / σ^2)^p.

Set p = 2 and re-run the 20-point problem in Section 1.2.2 for different values of σ. How does σ affect the shape of the decision boundary for a fixed C and p? Discuss your observations. (A sketch of this kernel computation is given after Step 3 below.)
2) Vary the integer p but fix σ. How does p affect the shape of the decision boundary for a fixed C and σ? Discuss your observations.
3) Vary the kernel parameter σ of the RBF kernel and repeat the 20-point problem. How does σ affect the shape of the decision boundary for a fixed C? Discuss your observations.
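The kernel computation in Step 1 amounts to the following few lines. The function name and its interface are placeholders only; how it should be wired into svmtrain.m or svmkernel.m depends on the internals of svm_251, which are not reproduced here.

function K = poly_sigma_kernel(X1, X2, sigma, p)
% POLY_SIGMA_KERNEL  Kernel matrix with entries K(i,j) = (1 + x_i'*x_j/sigma^2)^p,
% where x_i is the i-th row of X1 and x_j is the j-th row of X2.
K = (1 + (X1 * X2') / sigma^2) .^ p;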
Invariance Properties of SVMs
1) Input the following Matlab code to your .m file to create a noisy XOR problem:
shift = 0;
scale = 1;
N = 50;     % points per cluster (value not specified here; any moderate N will do)
xmean = [1, 1; 1, -1; -1, 1; -1, -1];
xstd = 0.5;
randn('state', 0);
x = [xstd*randn(N,1)+xmean(1,1), xstd*randn(N,1)+xmean(1,2); ...
     xstd*randn(N,1)+xmean(2,1), xstd*randn(N,1)+xmean(2,2); ...
     xstd*randn(N,1)+xmean(3,1), xstd*randn(N,1)+xmean(3,2); ...
     xstd*randn(N,1)+xmean(4,1), xstd*randn(N,1)+xmean(4,2)];
x = x * scale + shift;
y = [-1*ones(N,1); ones(N,1); ones(N,1); -1*ones(N,1)];
x1range = [-2, 2] * scale + shift;
x2range = [-2, 2] * scale + shift;


2) Use linear SVMs, polynomial SVMs, and RBF SVMs to solve the noisy XOR
problem. For each case record the decision boundaries and the number of
support vectors. Note that you may need to edit the file plotdata.m to
remove the labels of the data points on the plots. Can the linear SVM solve
the XOR problem? Explain your observations.
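One possible way to organize Step 2, reusing the kernel settings from Section 1.2.2 (linear kernel, 2nd-order polynomial kernel, and RBF kernel with kernel parameter 8) and C = 100 as one arbitrary choice:

kernels    = {'linear', 'poly', 'rbf'};
kernelpars = {[], 2, 8};
C = 100;
for i = 1:length(kernels)
    net = svm(2, kernels{i}, kernelpars{i}, C, 0, 'loqo');
    net = svmtrain(net, x, y, [], 2);
    figure; plotboundary(net, x1range, x2range);
    plotdata(x, y, x1range, x2range); plotsv(net, x, y);
    title(sprintf('%s SVM on noisy XOR, #SV = %d', kernels{i}, length(net.svind)));
end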
3) Scale Invariance. Set scale = 5 to scale the training data (x_i → 5x_i) and repeat Step 2 using polynomial SVMs and RBF SVMs. Note that you need to change the polynomial kernel function to (1 + x^T x_i / σ^2)^2, where σ = 5 in this case. This can be done by editing the file svmkernel.m. You also need to scale the kernel parameter of the RBF kernel properly, as follows:
net = svm(2, 'rbf', (sigma*2^0.5*scale)^2, C, 0, 'loqo');
where sigma is the kernel parameter in

K(x, x_i) = exp(-||x - x_i||^2 / (2σ^2))

for the unscaled data. You may set sigma = 1 to obtain reasonably good results. Are all of the SVMs scale invariant? Explain your observations.
4) Change the kernel function in svmkernel.m to (σ^2 + x^T x_i)^2 and repeat Step 3. Does this kernel function lead to scale-invariant SVMs? If not, why?
5) Translation Invariance. Set shift = 12 to shift the training data (x_i → x_i + [12, 12]^T) and repeat Step 2. Which type of SVM is not translation invariant? Explain your observations.
6) Set scale = 5 and shift = 12 to investigate both the scale and translation invariance properties of different SVMs.
Verifying the Analytical Solutions
1) Create a 4-point XOR problem by entering the following Matlab code to your .m file:
C = 100;
shift = 0;
scale = 1;
x = [0, 0; 0, 1; 1, 0; 1, 1] * scale + shift;
y = [-1; 1; 1; -1];
x1range = [-1, 2] * scale + shift;
x2range = [-1, 2] * scale + shift;


2) Create a polynomial SVM to solve this XOR problem and show that the Lagrange multipliers and the bias are α_1 = 10/3, α_2 = 8/3, α_3 = 8/3, α_4 = 2, and b = -1. Determine the maximum value of the Lagrangian L(α). You may use the following code fragment to display the multipliers and the slack variables ξ_i:
[svDec, svOut] = svmfwd(net, net.sv);
svSlack = 1 - y(net.svind) .* svOut;
svAlpha = net.alpha(net.svind);
fprintf('svind      ai        xii\n');
for i = 1:length(net.svind)
    fprintf('%d      %.4f      %f\n', net.svind(i), svAlpha(i), svSlack(i));
end
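To determine the maximum value of the Lagrangian, recall that the dual objective is L(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j). A minimal check, assuming the 2nd-order polynomial kernel has the usual form (1 + x^T x_i)^2, which is the form consistent with the α values quoted above, and reusing x and y from Step 1:

alpha = [10/3; 8/3; 8/3; 2];                    % multipliers quoted in Step 2
K = (1 + x * x') .^ 2;                          % kernel matrix of the four XOR points
L_alpha = sum(alpha) - 0.5 * (alpha .* y)' * K * (alpha .* y);
fprintf('L(alpha) = %.4f\n', L_alpha);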
3) Set shift = 0.5 and repeat Step 2. Compare the decision boundary of the SVM obtained in Step 2 with the one obtained in this step. What is the implication of your result? Suggest a possible method to address the translation invariance issue.
4) Repeat Steps 2 and 3, but this time use an RBF SVM to solve the 4-point XOR problem. Based on your results, comment on the applicability of polynomial SVMs and RBF SVMs to real-world problems.
