fusion techniques. Finally, the experimental results are
given in Section 6, and conclusions are drawn in the last
section.
2. System Structure
The multimodal biometric system is developed using two
traits, i.e., speech signal and palmprint, as shown in Figure
1. For speech signal and palmprint recognition, the
input is processed using MFCC and a 2D Gabor
filter algorithm, respectively. When the Gabor
filter is used, the matching score is calculated using the Hamming
distance; when MFCC is used, a GMM computes the score.
The modules based on the individual traits return a
matching score after comparing the database and query
feature vectors. The final score is generated by fusing
both scores using a fusion technique and is passed to
the decision module.
Figure 1. Block diagram of speech signal and palmprint multimodal
biometric system.
3. Feature Extraction using MFCC and
Gabor filter
3.1. Feature extraction using MFCC
Feature extraction is the first component in an automatic
speaker recognition system [3]. This phase consists of
transforming the speech signal into a set of feature vectors,
also called parameters. The aim of this transformation is
to obtain a new representation that is more compact,
less redundant, and better suited to statistical modeling
and distance computation. Most speech
parameterizations used in speaker recognition systems
rely on a cepstral representation of the speech signal [4].
Figure 2. Components of a speaker recognition system
The Mel-frequency Cepstral coefficients (MFCC) are
motivated by studies of the human peripheral auditory
system. First, the speech signal x(n) is divided into Q
short-time windows, which are converted into the spectral
domain by a Discrete Fourier Transform (DFT). The
magnitude spectrum of each time window is then
smoothed by a bank of triangular bandpass filters (Figure 3)
that emulate the critical-band processing of the human ear.
Figure 3. Mel filter bank
Each bandpass filter H(k, m) computes a
weighted average of its subband, which is then
logarithmically compressed:
X'(m) = \ln\left(\sum_{k=0}^{N-1} |X(k)| \, H(k, m)\right), \quad m = 1, \dots, M    (1)
where X(k) is the DFT of a time window of the signal x(n)
of length N; the index k, k = 0, . . . , N-1,
corresponds to the frequency f_k = k f_s / N, with f_s the
sampling frequency; the index m, m = 1, . . . , M with M <<
N, is the filter number; and the filters H(k, m) are
triangular filters defined by the center frequencies f_c(m)
(Sigurdsson et al., 2006). The log-compressed filter outputs
X'(m) are then decorrelated by using the Discrete Cosine
Transform (DCT):

c(l) = \sum_{m=1}^{M} X'(m) \cos\left(\frac{\pi l}{M}\left(m - \frac{1}{2}\right)\right)    (2)

where c(l) is the l-th MFCC of the considered time window.
Figure 4. Extraction of MFCC (pre-emphasis and windowing -> DFT -> Mel-frequency filter bank -> log|.| -> DCT -> MFCC)
For palmprint matching, the score is the normalized Hamming distance between two palmprint codes P and Q:

D = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} P_M(i,j) \cap Q_M(i,j) \cap \left(P_R(i,j) \otimes Q_R(i,j)\right) + P_M(i,j) \cap Q_M(i,j) \cap \left(P_I(i,j) \otimes Q_I(i,j)\right)}{2 \sum_{i=1}^{N}\sum_{j=1}^{N} P_M(i,j) \cap Q_M(i,j)}

where P_R (Q_R) and P_I (Q_I) denote the real and imaginary parts of P (Q), P_M (Q_M) are their masks of valid pixels, \cap denotes bitwise AND, and \otimes bitwise exclusive-OR.
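The Hamming-distance palmprint matching mentioned in Section 2 can be sketched in a few lines; this is a minimal illustration with hypothetical binary code matrices and masks, not the authors' implementation:

```python
import numpy as np

def hamming_distance(PR, PI, PM, QR, QI, QM):
    """Normalized Hamming distance between two palmprint codes.
    R/I are the real and imaginary bit planes, M the validity masks."""
    valid = PM & QM                                   # pixels valid in both codes
    mismatches = np.sum(valid & (PR ^ QR)) + np.sum(valid & (PI ^ QI))
    return mismatches / (2.0 * np.sum(valid))

# Hypothetical 32x32 codes; real palmprint codes come from the Gabor filter.
rng = np.random.default_rng(0)
code = rng.integers(0, 2, size=(2, 32, 32), dtype=np.uint8)
mask = np.ones((32, 32), dtype=np.uint8)
assert hamming_distance(code[0], code[1], mask, code[0], code[1], mask) == 0.0
```

Identical codes give D = 0, while unrelated random codes cluster around D = 0.5, so a small distance indicates a genuine match.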
There are several analytic formulae for the Mel scale used
to compute the center frequencies f_c(m). In this study we
use the following common mapping:
f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)    (3)
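The pipeline of Eqs. (1)-(3) can be sketched for a single frame as follows; this is a minimal numpy illustration, not the authors' implementation, and the filter count M = 20 and the 12 retained coefficients are assumed defaults rather than values from the paper:

```python
import numpy as np

def mel(f):
    """Common Mel mapping of Eq. (3)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    """Inverse of the Mel mapping."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(x, fs, M=20, L=12):
    """MFCCs of one windowed frame: |DFT| -> triangular Mel filter
    bank -> log (Eq. (1)) -> DCT (Eq. (2))."""
    N = len(x)
    mag = np.abs(np.fft.rfft(x))             # |X(k)|, k = 0 .. N/2
    freqs = np.arange(mag.size) * fs / N     # f_k = k * fs / N
    # M triangular filters with centers equally spaced on the Mel scale
    edges = inv_mel(np.linspace(mel(0.0), mel(fs / 2.0), M + 2))
    Xf = np.empty(M)
    for m in range(1, M + 1):
        lo, c, hi = edges[m - 1], edges[m], edges[m + 1]
        up = np.clip((freqs - lo) / (c - lo), 0.0, None)
        down = np.clip((hi - freqs) / (hi - c), 0.0, None)
        H = np.minimum(up, down)             # triangular H(k, m)
        Xf[m - 1] = np.log(mag @ H + 1e-12)  # Eq. (1), log-compressed
    # Eq. (2): DCT decorrelation, keep the first L coefficients
    ms = np.arange(1, M + 1)
    return np.array([np.sum(Xf * np.cos(np.pi * l / M * (ms - 0.5)))
                     for l in range(1, L + 1)])

# Illustration on a synthetic 25 ms frame of a 440 Hz tone at fs = 8 kHz.
fs, N = 8000, 200
t = np.arange(N) / fs
c = mfcc_frame(np.sin(2 * np.pi * 440 * t) * np.hamming(N), fs)
assert c.shape == (12,)
```

In a full system this is applied to every pre-emphasized, windowed frame, yielding one feature vector per frame for the GMM.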
3.2. The Gaussian Mixture Model
In this study, a Gaussian Mixture Model approach
proposed in [5] is used where speakers are modeled as a
mixture of Gaussian densities. The use of this model is
motivated by the interpretation that the Gaussian
components represent some general speaker-dependent
spectral shapes and the capability of Gaussian mixtures to
model arbitrary densities.
The Gaussian Mixture Model is a linear
combination of M Gaussian component densities, given
by the equation

p(x \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(x)    (4)
where x is a D-dimensional feature vector, the p_i, i = 1, . . . , M, are the mixture weights, and each component density b_i(x) is a D-variate Gaussian:

b_i(x) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right\}

with mean vector \mu_i and covariance matrix \Sigma_i.
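Scoring a feature vector under Eq. (4) can be sketched as follows, assuming diagonal covariance matrices (a common choice in speaker recognition); the model values below are hypothetical, not trained:

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance
    GMM: log p(x|lambda) = log sum_i p_i * b_i(x)."""
    x = np.asarray(x, dtype=float)
    D = x.size
    ll = []
    for p, mu, var in zip(weights, means, variances):
        # log b_i(x) for Sigma_i = diag(var)
        log_b = (-0.5 * D * np.log(2 * np.pi)
                 - 0.5 * np.sum(np.log(var))
                 - 0.5 * np.sum((x - mu) ** 2 / var))
        ll.append(np.log(p) + log_b)
    return float(np.logaddexp.reduce(ll))     # numerically stable log-sum-exp

# Hypothetical 2-component model in 2 dimensions (illustration only).
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
variances = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
assert gmm_log_likelihood([0.1, -0.1], weights, means, variances) > \
       gmm_log_likelihood([10.0, 10.0], weights, means, variances)
```

In practice the parameters are fitted with the EM algorithm on each speaker's training frames, and the per-frame log-likelihoods are summed to score an utterance.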
3.3. Feature extraction using Gabor filter
For palmprint feature extraction, the input image is convolved with a 2D Gabor filter of the form

G(x, y, \theta, u, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left\{-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right\} \exp\left\{2\pi i\,(u x \cos\theta + u y \sin\theta)\right\}

where u is the frequency of the sinusoidal wave, \theta the orientation of the function, and \sigma the standard deviation of the Gaussian envelope. The discrete filter is adjusted to zero DC:

\tilde{G}[x, y, \theta, u, \sigma] = G[x, y, \theta, u, \sigma] - \frac{\sum_{i=-n}^{n}\sum_{j=-n}^{n} G[i, j, \theta, u, \sigma]}{(2n+1)^{2}}

where (2n+1)^{2} is the size of the filter.
The raw scores of each stream are z-score normalized:

n' = \frac{n - \mu}{\sigma}    (9)

where n is any raw score, and \mu and \sigma are the mean and
standard deviation of the stream-specific scores,
computed on some development data.
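As a sketch of this z-score normalization (Eq. (9)); the development scores below are hypothetical:

```python
import numpy as np

def zscore_normalize(raw_scores, dev_scores):
    """Eq. (9): subtract the mean and divide by the standard deviation,
    both estimated on development data for that stream."""
    mu = np.mean(dev_scores)
    sigma = np.std(dev_scores)
    return (np.asarray(raw_scores, dtype=float) - mu) / sigma

# Hypothetical development scores for one matcher stream.
dev = [0.2, 0.4, 0.6, 0.8]
norm = zscore_normalize([0.5, 0.9], dev)
assert abs(np.mean(zscore_normalize(dev, dev))) < 1e-9  # zero mean on dev
```

After this step the two matchers' scores live on a comparable scale, which is what makes the weighted-sum fusion of the next section meaningful.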
5. Fusion
Multibiometric fusion refers to the fusion of multiple
biometric indicators. Such systems seek to improve the
speed and reliability (accuracy) of a biometric system by
integrating matching scores obtained from multiple
biometric sources.
5.1. Matcher Weighting using False Acceptance Rate
and False Rejection Rate (MW FAR/FRR)
This fusion technique, again, applies when only two
matchers are used. The performance of the individual
matchers determines the weights, so that smaller error
rates result in larger weights.
The performance of the system is measured by False
Acceptance Rate (FAR) and False Rejection Rate (FRR).
These two types of errors are computed at different
thresholds. The threshold that minimizes the absolute
difference between FAR and FRR on the development set
is then taken into consideration. The weights for the
respective matchers are computed as follows.
w_1 = \frac{1 - (FAR_1 + FRR_1)}{2 - (FAR_1 + FRR_1 + FAR_2 + FRR_2)}    (10)

and

w_2 = \frac{1 - (FAR_2 + FRR_2)}{2 - (FAR_1 + FRR_1 + FAR_2 + FRR_2)}    (11)
where FAR_1, FRR_1 and w_1 are the false acceptance rate,
false rejection rate and weight of one matcher, and
FAR_2, FRR_2 are the false acceptance and false rejection
rates of the other matcher with weight w_2. Note that
the weights (obtained on some development data) lie in the
interval [0, 1], with the constraint w_1 + w_2 = 1. The
fused score using different matchers is given as
f = w_1 x_1 + w_2 x_2    (12)

where x_m is the normalized score of matcher m and f is
the fused score.
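The weight computation of Eqs. (10)-(12) is only a few lines of code; a sketch with hypothetical development-set error rates (not values from the paper):

```python
def mw_far_frr_weights(far1, frr1, far2, frr2):
    """Eqs. (10)-(11): smaller total error -> larger weight; w1 + w2 = 1."""
    denom = 2.0 - (far1 + frr1 + far2 + frr2)
    w1 = (1.0 - (far1 + frr1)) / denom
    w2 = (1.0 - (far2 + frr2)) / denom
    return w1, w2

# Hypothetical development-set error rates for two matchers.
w1, w2 = mw_far_frr_weights(far1=0.02, frr1=0.03, far2=0.06, frr2=0.05)
assert abs(w1 + w2 - 1.0) < 1e-12 and w1 > w2

# Eq. (12): weighted-sum fusion of two normalized matcher scores.
f = w1 * 0.84 + w2 * 0.91      # x1, x2 are hypothetical scores
```

The fused score always lies between the two input scores, weighted toward the matcher with the smaller total error.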
5.2. Matcher Weighting based on Equal Error Rate
(MW - EER)
The matcher weights in this case depend on the Equal
Error Rates (EER) of the intended matchers for fusion.
These EERs are computed using the given development
data. The EER of matcher m is denoted E_m,
m = 1, 2, . . . , M, and the weight w_m associated with matcher
m is computed as

w_m = \frac{1/E_m}{\sum_{m=1}^{M} 1/E_m}    (13)
Note that 0 \le w_m \le 1. The weights are
inversely proportional to the corresponding errors of the
individual matchers, so less accurate
matchers receive lower weights than more accurate ones.
The fused score is calculated as

f = \sum_{m=1}^{M} w_m x_m    (14)
where f is the fused score, x_m is the normalized match
score from the m-th matcher, and w_m is the corresponding
weight.
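Eqs. (13)-(14) can be sketched as follows; the EERs below are hypothetical, not the paper's measured values:

```python
import numpy as np

def mw_eer_weights(eers):
    """Eq. (13): weights inversely proportional to the matchers' EERs,
    normalized so they sum to 1."""
    inv = 1.0 / np.asarray(eers, dtype=float)
    return inv / inv.sum()

def fuse(weights, scores):
    """Eq. (14): weighted sum of normalized match scores."""
    return float(np.dot(weights, scores))

# Hypothetical EERs for two matchers.
w = mw_eer_weights([0.01, 0.03])
assert abs(w.sum() - 1.0) < 1e-12 and w[0] > w[1]
```

Unlike MW-FAR/FRR, this rule extends directly to any number M of matchers, since each weight only depends on that matcher's EER.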
5.3. Logistic Regression (LR)
Another simple classification method that can be used for a
two-class problem (clients / impostors) is based
on the principles of logistic regression [11-14].
The Logistic Regression method classifies the data
using two functions, the logistic regression function and the
logit transformation, as follows:

E(Y \mid x) = \frac{e^{g(x)}}{1 + e^{g(x)}}    (15)
where E(Y|x) is the conditional probability of the binary
output variable Y given the M-dimensional input
vector x = (x_1, x_2, . . . , x_M), and g(x) is defined as:

g(x) = w_0 + w_1 x_1 + \dots + w_M x_M    (16)
where w_m is the weight for the m-th modality. Since
each w_m with m \ne 0 multiplies one of the M
modalities, it can be interpreted as the level of importance
of that modality in the fusion process. A high w_m indicates
an important modality, whilst a low w_m indicates a modality
that does not contribute a great deal. The parameters
(w_0, w_1, . . . , w_M) can be estimated with the
maximum likelihood approach. The outcome is then
compared with an optimal threshold calculated on the
development data.
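Evaluating Eqs. (15)-(16) on a pair of normalized match scores can be sketched as follows; the weights here are hypothetical placeholders for the maximum-likelihood estimates:

```python
import math

def g(x, w):
    """Logit of Eq. (16): g(x) = w0 + w1*x1 + ... + wM*xM."""
    return w[0] + sum(wm * xm for wm, xm in zip(w[1:], x))

def posterior(x, w):
    """Eq. (15): E(Y|x) = e^g(x) / (1 + e^g(x)), the logistic sigmoid."""
    z = g(x, w)
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical weights; in practice (w0, ..., wM) are fitted by maximum
# likelihood on development data, e.g. with sklearn's LogisticRegression.
w = [-3.0, 4.0, 2.5]
p = posterior([0.9, 0.8], w)       # two normalized match scores
assert 0.0 < p < 1.0
```

A claim is accepted when the posterior exceeds the optimal threshold chosen on the development set.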
5.4. Quadratic Discriminant Analysis (QDA)
This technique is similar to FLD but forms the
boundary between the two classes using a quadratic
equation given as [15]

h(x) = x^T A x + B^T x + C    (17)

For training data from two different classes,
distributed as N(\mu_i, S_i), i \in \{1, 2\}, the
transformation parameters A and B can be obtained on
the development data as:

A = -\frac{1}{2}\left(S_1^{-1} - S_2^{-1}\right)    (18)

B = S_1^{-1}\mu_1 - S_2^{-1}\mu_2    (19)

C is a constant that depends on the mean vectors and
covariance matrices and is computed as follows:

C = -\frac{1}{2}\ln\frac{|S_1|}{|S_2|} - \frac{1}{2}\left(\mu_1^T S_1^{-1}\mu_1 - \mu_2^T S_2^{-1}\mu_2\right)    (20)
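A sketch of the QDA boundary computation of Eqs. (17)-(20); the client / impostor score statistics below are hypothetical, for illustration only:

```python
import numpy as np

def qda_params(mu1, S1, mu2, S2):
    """Transformation parameters A, B, C of Eqs. (18)-(20)."""
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    A = -0.5 * (S1i - S2i)
    B = S1i @ mu1 - S2i @ mu2
    C = (-0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         - 0.5 * (mu1 @ S1i @ mu1 - mu2 @ S2i @ mu2))
    return A, B, C

def h(x, A, B, C):
    """Quadratic decision function of Eq. (17)."""
    return float(x @ A @ x + B @ x + C)

# Hypothetical client (1) and impostor (2) score distributions.
mu1, S1 = np.array([0.8, 0.7]), np.array([[0.01, 0.0], [0.0, 0.02]])
mu2, S2 = np.array([0.2, 0.3]), np.array([[0.04, 0.0], [0.0, 0.03]])
A, B, C = qda_params(mu1, S1, mu2, S2)
assert h(mu1, A, B, C) > h(mu2, A, B, C)   # clients score higher
```

With these definitions h(x) is the log-likelihood ratio of the two Gaussian class models, so fused score vectors with h(x) above the threshold are classified as clients.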
6. Experimental results
We evaluate the proposed multimodal system on a data
set including 720 pairs of images and speech signal from
120 subjects. The training database contains a speech
signals and palmprint images for each subject. Each
subject has 6 palm images and 6 different words taken at
different time intervals, which is stored in the database.
Before extracting features of palmprint, we locate
palmprint images to 128x128.
The multimodal system has been designed at
matching score level. At first experimental the individual
systems were developed and tested for FAR, FRR &
accuracy. In the last experiment both the traits are
combined at different fusion techniques and compared.
The results are found to be very encouraging and
promoting for the research in this field. The overall
accuracy of the system is more than 99% with EER less
than 1.21%. Table1 shows Accuracy in terms of EER
with different fusion techniques.
Figure 5. Accuracy vs. threshold curves for four different fusion
techniques
TABLE 1
Verification results in terms of EER (%), based on score-level fusion.

Modalities: palmprint (Gabor feature, Hamming-distance classifier) and
speech (MFCC feature, GMM classifier).

Fusion technique:   MW-FAR/FRR   MW-EER   LR     QDA
EER (%):            0.85         0.64     0.47   1.21
Figure 6. Comparison of modalities measured at FRR = 10^-2
7. Conclusion
Biometric systems are widely used to overcome the
limitations of traditional authentication methods. A unimodal
biometric system, however, can fail when the biometric data for
its particular trait is unreliable or unavailable. We therefore
combine the individual scores of two traits
(speech signal and palmprint) at the classifier
level and trait level to develop a multimodal biometric
system. The performance table shows that the multimodal
system performs better than unimodal
biometrics, with an accuracy of more than 97%.
8. References
[1] U. M. Bubeck, "Multibiometric authentication: An
overview of recent developments," in Term Project
CS574 Spring. San Diego State University, 2003.
[2] C. Sanderson and K. K. Paliwal, "Identity
Verification using Speech and Face Information,"
Digital Signal Processing, vol. 14, pp.449-480, 2004.
[3] G. Feng, K. Dong, D. Hu, and D. Zhang, "When Faces
Are Combined with Palmprints: A Novel Biometric
Fusion Strategy," in Proceedings of ICBA, pp. 701-707,
2004.
[4] G. Feng, K. Dong, D. Hu, and D. Zhang, "When Faces
Are Combined with Palmprints: A Novel Biometric
Fusion Strategy," ICBA, pp. 701-707, 2004.
[5] D. A. Reynolds, "Experimental Evaluation of
Features for Robust Speaker Identification," IEEE
Transactions on Speech and Audio Processing, vol. 2,
pp. 639-643, 1994.
[6] J.G. Daugman, High Confidence Visual Recognition
of Persons by a Test of Statistical Independence,
IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 15, no. 11, pp. 1148-1161, Nov.
1993.
[7] A. K. Jain, K. Nandakumar, and A. Ross, "Score
normalisation in multimodal biometric systems,"
Pattern Recognition, vol. 38, pp. 2270-2285, 2005.
[8] R. Snelick, U. Uludag, A. Mink, M. Indovina, and A.
K. Jain, "Large Scale Evaluation of Multimodal
Biometric Authentication Using State-of-the-Art
Systems," IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 27, pp. 450-455, 2005.
[9] N. Poh and S. Bengio, "A Study of the Effects of
Score Normalisation Prior to Fusion in Biometric
Authentication Tasks," IDIAP Research Report No.
IDIAP-RR 04-69, 2004.
[10] K. Nandakumar, "Integration of Multiple Cues in
Biometric Systems," M.S. Thesis, Michigan State
University, 2005.
[11] P. Verlinde, P. Druyts, G. Chollet, and M. Acheroy,
"A multi-level data fusion approach for gradually
upgrading the performances of identity verification
systems," Sensor Fusion: Architectures, Algorithms
and Application III, vol. 3719, pp. 14-25, 1999.
[12] B. D. Ripley, Pattern Recognition and Neural
Networks. Cambridge, U.K.: Cambridge University Press, 1996.
[13] D. W. Hosmer and S. Lemeshow, Applied Logistic
Regression. John Wiley & Sons, 1989.
[14] Y. So, "A Tutorial on Logistic Regression," SAS
Institute Inc, 1995.
[15] B. Flury, Common Principal Components and Related
Multivariate Models. USA: John Wiley and Sons,
1988.