
ICSP2006 Proceedings

Kernel-based Classifier for Iris Recognition


Shuai Shao, Mei Xie
Institute of Electrical Engineering, University of Electronic Science and Technology of China,
Chengdu 610054, P.R. China
Email: bakedcorn@163.com mxie@uestc.edu.cn

Abstract
Kernel-based nonlinear feature extraction and
classification algorithms are a popular new research
direction in machine learning and are widely used in
many fields. We first give an overview of kernel
Fisher discriminant analysis and the Support Vector
Machine, and then describe the multi-class
classification methods applied to them. The
performance of these two classification methods is
analyzed on the CASIA II database.

1. Introduction
To meet the increasing security requirements of
today's commercial society, personal identification is
becoming more and more important. Traditional
methods are often unreliable, so a new approach to
personal identification, named biometrics, has been
attracting more and more attention. Among biometric
traits, iris patterns are believed to be unique, due to
the complexity of the two underlying processes that
form them.
Flom and Safir first proposed the concept of
automated iris recognition in 1987 [1]. Since then, iris
recognition has received much attention from
researchers [2-4]. The first complete iris-based
recognition system was designed and patented by
J. Daugman [2]. He utilized Gabor filters with different
frequencies and phases to encode the structural
information of the iris and measured the resulting
codes with the Hamming distance. An algorithm
applying the 1-D wavelet transform at various levels to
the iris image was proposed by Boles and Boashash [3].
They used the projection of the iris texture into a
low-dimensional space as the feature and the Euclidean
distance as the matching rule. V. Dorairaj and
N. A. Schmid used PCA and ICA for encoding, with the
Euclidean distance and Hamming distance as measures [4].
In this paper, we propose a novel method for iris
recognition, where the wavelet transform and PCA are
used for feature extraction, while kernel Fisher
discriminant analysis (KFD) or SVM is used for
classification. To illustrate the potential of SVM and
KFD for iris recognition, we give experimental results
on the proposed method and a comparison between
existing methods and the proposed method.


The remainder of this paper is organized as follows.
In Section 2, we give an overview of the kernel idea,
followed by the kernel Fisher discriminant and SVM
classification algorithms. A series of experimental
results on the CASIA II iris database and a comparison
are described in Section 3. Section 4 presents the
conclusions.

2. Kernel-based Classifier
2.1 Kernel Idea
The kernel idea, an efficient nonlinear mapping
method, was first used in the Support Vector Machine.
Through a nonlinear mapping realized by the kernel
trick, the original space can be transformed into an
arbitrarily higher-dimensional feature space. According
to Cover's theorem on the separability of patterns,
patterns that are nonlinearly separable in the original
space become linearly separable in a higher-dimensional
space. The nonlinear mapping between the input space
and the feature space, whose computational cost may be
prohibitive, is never implemented explicitly under the
kernel idea [5, 7].

Let $X = [x_0, x_1, \dots, x_{M-1}]$, $x_i \in \mathbb{C}^{N \times 1}$,
be the training samples, and let $\Phi$ be the nonlinear
mapping function. In the new feature space $F$, the matrix
constructed from the mapped training samples is
$Q = [\Phi(x_0), \Phi(x_1), \dots, \Phi(x_{M-1})]$, and a dot
product over $F$ is

$\langle \Phi(x_i), \Phi(x_j) \rangle = k(x_i, x_j)$.   (1)

In this case, we need not care about the explicit form
of $\Phi$, because dot products are defined only
implicitly via the kernel function itself.
Three main types of kernel functions are widely
used in practice. They are polynomial kernels,
Gaussian RBF kernels, and sigmoid kernels.
Polynomial kernels:

$k(x, y) = ((x \cdot y) + \theta)^d$   (2)

Gaussian RBF kernels:

$k(x, y) = \exp\left\{ -\frac{\|x - y\|^2}{2\sigma^2} \right\}$   (3)

Sigmoid kernels:

$k(x, y) = \tanh(\kappa (x \cdot y) + \theta)$   (4)

where $d \in \mathbb{N}$, $\sigma > 0$, $\kappa > 0$.
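For concreteness, the three kernels of Eqs.(2)-(4) can be
evaluated over a whole sample set at once as a Gram matrix,
$K[i, j] = k(x_i, x_j)$ of Eq.(1). The following is a minimal
Python sketch; the helper name gram and its default parameter
values are our own, not part of the paper.

```python
import numpy as np

def gram(X1, X2, kind='rbf', d=3, sigma=1.0, kappa=1.0, theta=0.0):
    """Gram matrix K[i, j] = k(x_i, x_j) for the kernels of Eqs.(2)-(4).
    X1, X2: sample matrices of shape (n1, N) and (n2, N)."""
    dots = X1 @ X2.T
    if kind == 'poly':      # Eq.(2): ((x . y) + theta)^d
        return (dots + theta) ** d
    if kind == 'rbf':       # Eq.(3): exp(-||x - y||^2 / (2 sigma^2))
        sq = (X1 ** 2).sum(1)[:, None] + (X2 ** 2).sum(1)[None, :] - 2 * dots
        return np.exp(-sq / (2 * sigma ** 2))
    if kind == 'sigmoid':   # Eq.(4): tanh(kappa (x . y) + theta)
        return np.tanh(kappa * dots + theta)
    raise ValueError(f"unknown kernel: {kind}")
```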

2.2 Kernel Fisher Discriminant Analysis

Kernel Fisher discriminant analysis is the non-linear
extension of Fisher discriminant analysis [8]. Assume the
training data

$T_{xy} = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}, \quad y_i \in \{1, 2\},$

from which we obtain the data mapped into kernel space,

$T_{XY}^{\Phi} = \{(\Phi(x_1), y_1), \dots, (\Phi(x_n), y_n)\}, \quad y_i \in \{1, 2\}.$

The class separability function in kernel space is defined as

$J_F(w) = \frac{w^T S_B w}{w^T S_W w},$   (5)

where the between-class scatter matrix is

$S_B = (u_1 - u_2)(u_1 - u_2)^T, \quad u_y = \frac{1}{l_y} \sum_{i=1}^{l_y} \Phi(x_i), \; y \in \{1, 2\},$   (6)

and the within-class scatter matrix is

$S_W = S_1 + S_2, \quad S_y = \sum_{i=1}^{l_y} (\Phi(x_i) - u_y)(\Phi(x_i) - u_y)^T, \; y \in \{1, 2\}.$   (7)

Because $w$ can be denoted as a linear combination of the
mapped data, $w = \sum_i \alpha_i \Phi(x_i)$, the class
separability function can be written as

$J_F(\alpha) = \frac{\alpha^T K (J - \tilde{1}) K^T \alpha}{\alpha^T K (I - J) K^T \alpha},$   (8)

where $K$ is the kernel matrix, $\tilde{1}$ is the $k \times k$
matrix whose elements all equal $1/k$, and

$J_{ij} = \begin{cases} 1/k_y, & \text{if } y = y_i = y_j \\ 0, & \text{otherwise} \end{cases}, \quad y \in \{1, 2\}.$   (9)

The problem thus converts into training the parameter vector
$\alpha$ to maximize the function $J_F(\alpha)$.

2.3 Support Vector Machine

The SVM is an extension of the Optimal Hyperplane, which
maximizes the spacing (margin) between the two classes.
Suppose a hyperplane $H: w \cdot x + b = 0$ can classify the
dataset $\{(x_i, y_i), i = 1, \dots, N\}, \; y_i \in \{1, 2\}$
correctly. The samples belonging to $y_1$ closest to $H$ lie
on $H_1: w \cdot x + b = 1 - \xi_i$, and the samples belonging
to $y_2$ closest to $H$ lie on $H_2: w \cdot x + b = -1 + \xi_i$,
so that

$y_i (w \cdot x_i + b) \ge 1 - \xi_i.$   (10)

The Optimal Hyperplane can be obtained by minimizing the
following quadratic programming task:

$(w, b) = \arg\min_{w, b} \left( \frac{1}{2} w \cdot w + c \sum_{i=1}^{n} \xi_i^p \right), \quad c > 0, \; \xi_i > 0,$   (11)

where $c$ stands for the regularization constant and the
$\xi_i$ are slack variables used to handle non-separable
data [9]. Introducing Lagrange multipliers, the optimization
problem turns into the Lagrangian

$L = \frac{1}{2} \|w\|^2 + C \sum_i \xi_i + \sum_i \alpha_i [1 - \xi_i - y_i (w \cdot x_i + b)] - \sum_i \beta_i \xi_i.$   (12)

Computing the partial derivatives with respect to $w$, $b$
and $\xi_i$, we then get the optimal solution $\alpha^*$,
which maximizes

$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$   (13)

subject to the constraints

$\alpha_i \ge 0, \quad \alpha_i [1 - \xi_i - y_i (w \cdot x_i + b)] = 0, \quad \beta_i \ge 0, \quad \beta_i \xi_i = 0.$

After the computation above, we obtain the discriminant
function

$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^* \right),$   (14)

where $x$ is a new sample. For $\alpha_i > 0$, the
corresponding $x_i$ are support vectors. If we use a kernel
function to replace the dot product in Eq.(14), we obtain
the discriminant function of the kernel SVM [6]:

$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i k(x_i, x) + b^* \right).$   (15)
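As a concrete illustration of Section 2.2, the sketch below
trains a binary KFD classifier by treating Eq.(8) as a
generalized eigenvalue problem. This is a minimal sketch
under our own naming (rbf_kernel, train_kfd); the small
ridge term added for numerical stability is our assumption,
and an SVM counterpart would instead maximize the dual of
Eq.(13), typically with an off-the-shelf solver.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian RBF kernel of Eq.(3)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_kfd(X, y, sigma=1.0, ridge=1e-6):
    """Binary KFD: maximize Eq.(8) over alpha via a generalized eigenproblem.
    y takes values 1 and 2, as in the paper."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    one = np.full((n, n), 1.0 / n)              # the matrix 1~ of Eq.(8)
    J = np.zeros((n, n))                        # the matrix J of Eq.(9)
    for c in (1, 2):
        idx = np.where(y == c)[0]
        J[np.ix_(idx, idx)] = 1.0 / len(idx)
    B = K @ (J - one) @ K.T                     # between-class term
    W = K @ (np.eye(n) - J) @ K.T + ridge * np.eye(n)  # within-class term
    vals, vecs = eigh(B, W)                     # eigenvalues ascending
    alpha = vecs[:, -1]                         # maximizer of J_F(alpha)
    proj = K.T @ alpha                          # 1-D projections of training data
    thresh = 0.5 * (proj[y == 1].mean() + proj[y == 2].mean())
    sign = 1.0 if proj[y == 1].mean() > thresh else -1.0
    return alpha, thresh, sign

def predict_kfd(X_train, alpha, thresh, sign, X_new, sigma=1.0):
    p = rbf_kernel(X_train, X_new, sigma).T @ alpha
    return np.where(sign * (p - thresh) > 0, 1, 2)
```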

2.4 Multi-class Methods for the KFD and SVM Classifiers

Although KFD and SVM are effective classifiers, they are
binary classifiers: they can only separate data between two
classes and cannot handle multi-class data directly. To
perform multi-class classification, we must combine the
binary classifiers with a multi-class method. Suppose we
want to correctly separate sample data of $k$ classes,

$T_{xy} = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}, \quad y_i \in \{1, \dots, k\}.$

Two methods are presented in the following:


A. One-Against-All method (OAA): The OAA method transforms
the $k$-class problem into $k$ binary-class problems. In
each binary classification, the data belonging to the $i$th
class form one class ($+1$) and the data outside the $i$th
class form the other class ($-1$); the corresponding
discriminant function is denoted $f_i(x)$. For a new sample
$x$, we use these $k$ discriminant functions to estimate

$x \in \begin{cases} w_i, & \text{if only the } i\text{th output is } +1 \\ \text{reject}, & \text{otherwise} \end{cases}$   (16)

i.e., $x$ is classified into the $i$th class only if the
output of the $i$th function alone is $+1$; otherwise
recognition is rejected.

B. One-Against-One method (OAO): The OAO method transforms
the $k$-class problem into $k(k-1)/2$ binary-class problems.
Each binary classification separates the $i$th class from
the $j$th class ($i, j \in \{1, \dots, k\}$); the
corresponding discriminant function is denoted $f_{ij}(x)$.
We set up a label matrix $L$, initialized with every element
equal to 0, in which each row represents one class. If the
output of $f_{ij}(x)$ is $+1$, $L_{ij}$ is set to 1;
otherwise, $L_{ji}$ is set to 1. For a new sample $x$, we
use the $k(k-1)/2$ discriminant functions to estimate

$x \in \begin{cases} w_i, & \text{if only the } i\text{th row of } L \text{ has the most 1s} \\ \text{reject}, & \text{otherwise} \end{cases}$   (17)

i.e., $x$ is assigned to the $i$th class only if the $i$th
row of $L$ contains the greatest number of 1s; a voting
sketch is given below.
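To make the OAO voting rule of Eq.(17) concrete, the sketch
below builds the label matrix $L$ and rejects ties. The
interface is our own assumption: train_binary stands for any
caller-supplied binary trainer (a KFD or SVM from Sections
2.2-2.3) returning a decision function with a $\pm 1$ sign.

```python
import numpy as np
from itertools import combinations

def train_oao(X, y, train_binary):
    """One-Against-One: k(k-1)/2 binary classifiers, one per class pair."""
    classes = np.unique(y)
    models = {}
    for i, j in combinations(classes, 2):
        mask = (y == i) | (y == j)
        yb = np.where(y[mask] == i, 1, -1)   # class i -> +1, class j -> -1
        models[(i, j)] = train_binary(X[mask], yb)
    return classes, models

def predict_oao(classes, models, x):
    """Fill the label matrix L of Eq.(17) and vote by row sums."""
    k = len(classes)
    L = np.zeros((k, k))
    pos = {c: idx for idx, c in enumerate(classes)}
    for (i, j), f in models.items():
        if f(x) > 0:
            L[pos[i], pos[j]] = 1   # f_ij(x) = +1: vote for class i
        else:
            L[pos[j], pos[i]] = 1   # otherwise: vote for class j
    votes = L.sum(axis=1)
    winners = np.flatnonzero(votes == votes.max())
    # Eq.(17): accept only if a single row holds the most 1s, else reject
    return classes[winners[0]] if len(winners) == 1 else None
```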

3. Experiments on iris recognition

This section evaluates the experimental results of KFD and
SVM for iris recognition. Experiments are performed on the
CASIA II iris database (the second version) provided by the
Institute of Automation, Chinese Academy of Sciences. This
database includes 1200 images from 60 eyes. The iris image
obtained after preprocessing is shown in Fig.2.

In our experiment, we use the False Acceptance Rate (FAR)
and False Rejection Rate (FRR) to evaluate the proposed
method; a counting sketch is given below. We choose 10
samples from each class as the training dataset and the
remaining data as the testing dataset. Before classification,
we use the 2-level wavelet transform and PCA to reduce noise
and extract features.
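One plausible counting rule for FAR and FRR under the reject
option of Eqs.(16)-(17) is sketched below; the paper does not
spell out its exact bookkeeping, so this is an assumption
(reject is encoded as None, matching the voting sketch above).

```python
def far_frr(predicted, true_labels):
    """FAR: fraction of samples accepted as a wrong identity.
    FRR: fraction of genuine samples that are rejected.
    (A hypothetical counting rule; the paper does not define its own.)"""
    n = len(true_labels)
    false_accept = sum(p is not None and p != t
                       for p, t in zip(predicted, true_labels))
    false_reject = sum(p is None for p in predicted)
    return false_accept / n, false_reject / n
```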
Wavelet Transform: By the 2-level wavelet transform, an
image is decomposed into a set of sub-images that can be
used for encoding or other processing. One benefit of using
wavelets is to remove the direct-current component of an
image; another is to remove the high-frequency components,
which are generally noise. In this paper, we adopt the
bior1.1 wavelet. As shown in Fig.1, the LL2 sub-image
contains most of the useful information of the original
image, so we choose the LL2 sub-image as the basis of the
following process; a short decomposition sketch follows
Fig.1.

Fig.1 Sub-images of the 2-level wavelet transform
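The decomposition and LL2 selection can be sketched with
PyWavelets, assuming the bior1.1 wavelet named above; the
helper name extract_ll2 is ours.

```python
import numpy as np
import pywt

def extract_ll2(image):
    """2-level 2-D wavelet transform; keep the LL2 approximation sub-image."""
    coeffs = pywt.wavedec2(image, wavelet='bior1.1', level=2)
    ll2 = coeffs[0]   # coeffs = [LL2, (LH2, HL2, HH2), (LH1, HL1, HH1)]
    return np.asarray(ll2).ravel()   # flatten for the later PCA step
```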

Fig.2 Iris image: (a) original image; (b) normalized image;
(c) enhanced normalized image

PCA: Principal component analysis (PCA) is a powerful
technique for reducing a large set of correlated variables
to a smaller number of uncorrelated components, which allows
for the minimal mean-square reconstruction error of the
original data [5]. Given a set of centered training data
$x_k, k = 1, 2, \dots, n$, with $\sum_{k=1}^{n} x_k = 0$,
PCA diagonalizes the covariance matrix

$C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T$.   (18)

Here, we just need to solve the eigenvalue equation

$\lambda v = C v$,   (19)

where the eigenvectors $v$ form the orthonormal projection
matrix $W$, so a set of low-dimensional vectors is obtained:

$z = W^T x + b$.   (20)

The set of vectors $T_z = \{z_1, z_2, \dots, z_n\}$ is the
feature on which we perform classification.
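A direct rendering of Eqs.(18)-(20) as a sketch (a library
PCA routine would normally be used instead; the explicit
centering step below supplies the offset of Eq.(20), since
the derivation assumes centered data):

```python
import numpy as np

def pca_fit(X, m):
    """X: n samples in rows. Returns the projection matrix W and the mean."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center: sum_k x_k = 0
    C = (Xc.T @ Xc) / len(X)                      # covariance matrix, Eq.(18)
    vals, vecs = np.linalg.eigh(C)                # eigenvalue equation, Eq.(19)
    W = vecs[:, np.argsort(vals)[::-1][:m]]       # top-m eigenvectors
    return W, mu

def pca_project(X, W, mu):
    return (X - mu) @ W                           # z = W^T x, Eq.(20)
```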

3.1 Experimental Comparison of KFD and SVM with different
kernel functions

In this section, we enumerate a series of experimental
results to examine the performance of KFD and SVM. As shown
in Table.1, the KFD classifier with the sigmoid kernel
function and the OAO method has the best performance, with
a Total Error Rate of 1.00%. However, as Table.2 shows, when
using the SVM classifier, combining the Gaussian kernel
function with the OAO method gives the best result, with a
Total Error Rate of 0.42%.

In our experiments the OAO method spends less training time,
and both tables show better recognition results with the OAO
method than with the OAA method. Having these two advantages,
the OAO method is clearly superior to the OAA method.
Table.1 FAR and FRR of KFD with different kernel functions
and both OAA and OAO methods

Multi-class method   Kernel function   FAR     FRR      Total Error Rate
OAA                  Linear            0%      15.5%    15.5%
OAA                  Poly (d=3)        0%      6.67%    6.67%
OAA                  Gaussian          0%      2.08%    2.08%
OAA                  Sigmoid           0%      3.08%    3.08%
OAO                  Linear            0.42%   0.83%    1.25%
OAO                  Poly (d=3)        0.33%   3.75%    4.08%
OAO                  Gaussian          0.17%   1.75%    1.92%
OAO                  Sigmoid           0.42%   0.58%    1.00%
Table.2 FAR and FRR of SVM with different kernel functions
and both OAA and OAO methods

Multi-class method   Kernel function   FAR     FRR      Total Error Rate
OAA                  Linear            0.08%   2.83%    2.92%
OAA                  Poly (d=3)        0%      6.00%    6.00%
OAA                  Gaussian          0%      2.08%    2.08%
OAA                  Sigmoid           0.08%   3.08%    3.16%
OAO                  Linear            0.08%   0.67%    0.75%
OAO                  Poly (d=3)        0.50%   4.50%    5.00%
OAO                  Gaussian          0%      0.42%    0.42%
OAO                  Sigmoid           0.67%   2.08%    2.75%

By comparing the FAR, FRR and Total Error Rate in Table.1
and Table.2, the best performance belongs to SVM with the
Gaussian kernel function and the OAO multi-class method.

3.2 Experimental Comparison between KFD or SVM and other
methods

In this section, we use the Equal Error Rate (EER) to
evaluate the existing methods, where the EER is the point at
which FAR and FRR are numerically equal; a threshold-sweep
sketch for locating the EER follows Table.3. Compared with
existing methods, the kernel-based classifiers also show
encouraging performance, as Table.3 demonstrates. Therefore,
we can say that kernel-based classifiers perform well in
iris recognition.

Table.3 Comparison between existing methods and the proposed
method

Methods      Error Rate
Daugman's    4.08% (EER)
Zhu's        19.86% (EER)
PCA          3.84% (EER)
ICA          5.21% (EER)
KFD          0.42% (FAR), 0.58% (FRR)
SVM          0% (FAR), 0.42% (FRR)
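The EER can be located by sweeping a decision threshold
until the FAR and FRR curves cross. The generic sketch below
assumes separate lists of genuine and impostor match scores,
which the paper does not provide explicitly; it is an
illustration of the definition, not of the paper's pipeline.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep a threshold; the EER is where FAR (impostors accepted)
    equals FRR (genuines rejected)."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = 1.0, 0.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)
        frr = np.mean(genuine_scores < t)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), 0.5 * (far + frr)
    return eer
```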

4. Conclusion
In this paper, a new method for iris recognition is
proposed, in which the wavelet transform and PCA are used
for feature extraction and dimension reduction, while KFD or
SVM is used for classification. Experiments illustrate that,
on the basis of the wavelet transform and PCA, KFD and SVM
improve performance and outperform the existing methods for
iris recognition, with SVM using the Gaussian kernel
function and the OAO multi-class method performing best.

Acknowledgment
This work is supported by the National Natural
Science Foundation of China (No.60472046).

References
[1] L. Flom and A. Safir, "Iris Recognition System", U.S.
Patent No. 4641394, 1987.
[2] J. Daugman, "High Confidence Visual Recognition of
Persons by a Test of Statistical Independence", IEEE Trans.
on Pattern Analysis and Machine Intelligence, Vol. 15,
No. 11, pp. 1148-1161, 1993.
[3] W. W. Boles and B. Boashash, "A Human Identification
Technique Using Images of the Iris and Wavelet Transform",
IEEE Trans. on Signal Processing, Vol. 46, pp. 1185-1188,
1998.
[4] V. Dorairaj, N. A. Schmid and G. Fahmy, "Performance
Evaluation of Iris Based Recognition System Implementing
PCA and ICA Encoding Techniques", Biometric Technology for
Human Identification II, Proc. of SPIE Vol. 5779.
[5] N. Cristianini, Kernel Methods for General Pattern
Analysis, www.kernel-methods.net.
[6] B. Schölkopf and C. Burges, Advances in Kernel Methods:
Support Vector Learning, MIT Press, 1999.
[7] K.-R. Müller and S. Mika, "An Introduction to
Kernel-Based Learning Algorithms", IEEE Transactions on
Neural Networks, 12(2):181-201, 2001.
[8] S. Mika, G. Rätsch and J. Weston, "Fisher Discriminant
Analysis with Kernels", Neural Networks for Signal
Processing IX, pp. 41-48, IEEE, 1999.
