
Novel Multi Algorithm based Speech and Face Recognition (Multimodal) System Design and Implementation

INTRODUCTION
Biometrics refers to authentication techniques that rely on measurable physiological
and behavioural characteristics that can be automatically verified. Depending on the application
context, a biometric system may operate in verification mode or identification mode. As
security breaches and transaction fraud increase, the need for highly secure
identification and personal verification techniques is becoming apparent. Biometric-based
solutions can provide for confidential transactions and personal data privacy. A multimodal
biometric system integrates several biometric systems to establish a person's identity.
A biometric recognition system can be used in two different modes: identification (1:N
matching) or verification (1:1 matching). Identification is the process of determining a
person's identity by comparing the person who is present against a database of biometric
patterns/templates. The system is pre-loaded with the biometric templates of multiple
individuals: during the enrolment stage, a biometric sample of each individual is processed,
encrypted and stored.
A pattern that is to be identified is matched against every known template, yielding
either a score or a distance that describes the similarity between the pattern and the
template. The system assigns the pattern to the person with the most similar biometric
template. To prevent impostor patterns (in this case, all patterns of persons not known to the
system) from being accepted as a known identity, the similarity has to exceed a certain level.
If this level is not reached, the pattern is rejected.
With verification, a person's identity is claimed a priori, so there is a single template
to compare against. The pattern being verified is compared with that person's individual
template only. As in identification, the system checks whether the similarity between pattern
and template is sufficient to grant access to the secured system or area.
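
The two modes described above can be sketched as follows. This is only an illustrative sketch: cosine similarity, the threshold value and the toy templates are assumptions made for the example and are not specified in this proposal.

```python
import math

def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(probe, templates, threshold):
    """1:N matching: return the most similar identity, or None if rejected."""
    best_id, best_score = None, -1.0
    for person_id, template in templates.items():
        score = similarity(probe, template)
        if score > best_score:
            best_id, best_score = person_id, score
    # Reject the pattern when even the best match is below the threshold.
    return best_id if best_score >= threshold else None

def verify(probe, claimed_id, templates, threshold):
    """1:1 matching: compare the probe with the claimed identity's template only."""
    template = templates.get(claimed_id)
    if template is None:
        return False
    return similarity(probe, template) >= threshold

# Hypothetical enrolled templates and probe pattern.
templates = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
probe = [0.9, 0.1, 0.2]
```

With these toy values, `identify(probe, templates, 0.9)` returns "alice", while `verify(probe, "bob", templates, 0.9)` is rejected because only bob's template is consulted.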

Statement of the problem:


Most biometric systems deployed in real-world applications are unimodal: they
rely on the evidence of a single source of information for authentication (e.g. fingerprint, face,
voice etc.). These systems are vulnerable to a variety of problems such as noisy data, intra-class
variations, inter-class similarities, non-universality and spoofing. These problems lead to a
considerably high false acceptance rate (FAR) and false rejection rate (FRR), limited
discrimination capability, an upper bound on performance and lack of permanence.
Some of the limitations of unimodal biometric systems can be overcome by
including multiple sources of information for establishing identity. Systems that integrate
two or more types of biometric systems are known as multimodal biometric systems.
These systems are more reliable due to the presence of multiple, independent biometrics, and
they are able to meet the stringent performance requirements imposed by various
applications. They address the problem of non-universality, since multiple traits ensure sufficient
population coverage. They also deter spoofing, since it would be difficult for an impostor to spoof
multiple biometric traits of a genuine user simultaneously.

Objectives of the proposed study:

The main purpose of the proposed system is to keep the error rate as low as possible
and to improve the performance of the system by achieving a good acceptance rate during
identification and authentication.

To replace the existing computationally intensive algorithms with multiple
computationally efficient algorithms and design these algorithms to be on par in
performance with the highly complex algorithms and procedures.

This replacement of complex procedures of multimodal biometrics with an optimized multi
algorithm approach is to make use of parallel architecture based signal processing
hardware to meet real time challenges.

LITERATURE SURVEY
Biometrics refers to the use of physiological or behavioural characteristics of a person to
authenticate his/her identity [1]. The increasing demand for enhanced security systems has led to
an unprecedented interest in biometric-based person authentication systems. Biometric systems
based on a single source of information are called unimodal systems. Although some unimodal
systems [2] have achieved considerable improvement in reliability and accuracy, they often suffer
from enrollment problems due to non-universal biometric traits, susceptibility to biometric
spoofing, or insufficient accuracy caused by noisy data [3].
A multi algorithm approach employs a single biometric sample acquired from a single sensor.
Two or more different algorithms process this acquired sample, and the individual results are
combined to obtain an overall recognition result. This approach is attractive from both an
application and a research point of view, because using a single sensor reduces the data
acquisition cost. The 2002 Face Recognition Vendor Test showed increased performance in 2D face
recognition when the results of different commercial recognition systems were combined [4].
Gokberk et al. [5] combined multiple algorithms for 3D face recognition. Xu et al. [6] also
combined different algorithmic approaches for 3D face recognition.
Many different ways of combining the face and voice modalities have been presented in
the literature [7]-[12], [17]-[18].
For speech, many classifier approaches, such as vector quantization (VQ), Bayesian
discriminant, dynamic time warping (DTW), Gaussian mixture model (GMM), hidden Markov
model (HMM) and neural network (NN), have been studied for speaker recognition. Among
these approaches, the GMM is reported to yield the best performance, especially for
text-independent applications [13]. The GMM is a powerful approach to modelling a speaker's
characteristics because of its flexibility in approximating the underlying probability
distribution in a high-dimensional space.
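
As a rough illustration of how a GMM can score speech features, the sketch below computes the average log-likelihood of a few feature frames under two speaker models and picks the better one. The diagonal covariances, the hand-set model parameters and the two-dimensional frames are all assumptions made for the example; a real system would train the models on extracted speech features.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of feature frames under a GMM."""
    total = 0.0
    for x in frames:
        terms = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
        ]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)

# Hypothetical, hand-set two-component models (not trained on real data).
speaker_a = {"weights": [0.6, 0.4],
             "means": [[0.0, 0.0], [1.0, 1.0]],
             "vars": [[1.0, 1.0], [0.5, 0.5]]}
speaker_b = {"weights": [0.5, 0.5],
             "means": [[4.0, 4.0], [5.0, 3.0]],
             "vars": [[1.0, 1.0], [1.0, 1.0]]}

frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.3]]  # toy feature frames
scores = {"A": gmm_avg_log_likelihood(frames, speaker_a),
          "B": gmm_avg_log_likelihood(frames, speaker_b)}
best = max(scores, key=scores.get)  # speaker whose model explains the frames best
```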
PCA computes uncorrelated components from the covariance matrix of the
original data by means of an orthogonal transform [15]. LDA searches for those vectors in the
underlying space that best discriminate among the classes, and also reduces the dimensionality
of the original data [16]. Many biometric systems use the Singular Value Decomposition
(SVD) method, which plays a vital role in analyzing biometric traits.
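
The link between PCA and the SVD can be illustrated with a short sketch. It assumes NumPy is available and uses random data in place of real face or speech features; the function names are hypothetical and chosen only for this example.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and the top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the principal axes (dimensionality reduction)."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))      # 20 hypothetical samples, 6 raw features
mean, components = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)   # 20 samples, 2 uncorrelated components
```

The projected components in `Z` are uncorrelated, which is the property of PCA stated above.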

Types of Biometrics:
Biometric systems can be classified into two types:
1. Unimodal Biometric System:
A unimodal biometric system employs a single biometric trait (either a physical or a
behavioural trait) to identify the user. Example: a biometric system based on face, iris,
palm print, voice, gait etc.
2. Multimodal Biometric System:
A biometric system that consolidates information from multiple sources is known as a
multimodal biometric system. For example:

Speech and Signature

Face and Iris

Face Recognition, Fingerprint verification and speaker verification.

Fingerprint and Hand Geometry.

Limitations of unimodal biometric systems:

Noise in sensed data: Noise in the sensed data may result from a defective or improperly
maintained sensor, e.g. a fingerprint image with a scar or a voice sample altered by a cold.

Intra-class variation: Caused by an individual who interacts incorrectly with the sensor;
this increases the False Reject Rate (FRR).

Inter-class similarities: Refers to the overlap of feature spaces corresponding to multiple
classes or individuals. This may increase the False Acceptance Rate (FAR) of the system.

Non-universality: The biometric system may not be able to acquire meaningful biometric data
from a subset of users.

Spoof attacks: Involve the deliberate manipulation of one's biometric traits in order to
avoid recognition. This type of attack is especially relevant when behavioural traits are used.

Multimodal Biometrics:
The term multimodal refers to the combination of two or more different biometric sources of a
person (like face and fingerprint) sensed by different sensors.

The Benefits of Multimodal Biometrics:


The multimodal biometric system exhibits a number of advantages compared to the
unimodal biometric system:

Since a multimodal biometric system acquires more than one type of information, it offers a
substantial improvement in matching accuracy compared to a unimodal system.

Multimodal biometric systems are capable of addressing the non-universality issue by
accommodating a large population of users.

Multimodal biometric systems are less sensitive to impostor attacks. It is very difficult to
spoof a legitimate user enrolled in a multimodal biometric system.

Multimodal biometric systems are less sensitive to noise in the sensed data, i.e. when the
information acquired from a single biometric trait is corrupted by noise, another trait of
the same user can be used to perform the verification.

These systems also help in continuously monitoring or tracking a person in situations
where a single biometric trait is not enough, for example tracking a person using face and
gait simultaneously.

Challenges in designing multimodal biometric systems:


Since a multimodal biometric system relies on multiple sources of information, combining the
information plays an important role in its design. The following are the
challenges involved in designing a multimodal biometric system.

Selecting the multimodal biometric sources is very challenging, as it depends upon the
application and the cost involved in acquiring them.

In a multimodal biometric system, the information acquired from different sources can be
processed either in sequence or in parallel. Hence it is challenging to decide on the
processing architecture to be employed, as it depends upon the application and the choice
of sources. Processing is generally complex in terms of memory and/or computation.

Information obtained from different biometric sources can be combined at four
different levels: sensor, feature, match score and decision level. The choice of fusion
level has a direct impact on the performance and cost involved in developing a
system. Thus, it is challenging to decide the level of fusion to be employed for the given
sources and application.

Given the biometric sources and the level of fusion, a number of techniques are available for
fusing the multiple sources of information. Hence, it is challenging to find the optimal
one for the given application.

Multi Algorithm Approach:

A multi algorithm approach employs a single biometric sample acquired from a single
sensor. Two or more different algorithms process this acquired sample. The individual
results are combined to obtain an overall recognition result.

This approach is attractive from both an application and a research point of view, because
using a single sensor reduces the data acquisition cost.
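
A minimal sketch of the multi algorithm idea is given below: one acquired sample is scored by two different matchers and the results are averaged. The two matchers (cosine similarity and a distance-based similarity) and the averaging rule are assumptions chosen for illustration; they are not the algorithms proposed in this study.

```python
import math

def cosine_matcher(sample, template):
    """Algorithm 1 (assumed): cosine similarity of the two vectors."""
    dot = sum(s * t for s, t in zip(sample, template))
    norm = math.sqrt(sum(s * s for s in sample)) * math.sqrt(sum(t * t for t in template))
    return dot / norm if norm else 0.0

def distance_matcher(sample, template):
    """Algorithm 2 (assumed): Euclidean distance mapped to a similarity in (0, 1]."""
    dist = math.sqrt(sum((s - t) ** 2 for s, t in zip(sample, template)))
    return 1.0 / (1.0 + dist)

def multi_algorithm_score(sample, template, matchers=(cosine_matcher, distance_matcher)):
    """Run every algorithm on the same sample and average the individual results."""
    scores = [matcher(sample, template) for matcher in matchers]
    return sum(scores) / len(scores)

template = [1.0, 2.0]                                          # hypothetical enrolled template
genuine = multi_algorithm_score([1.0, 2.0], template)          # same-person sample
impostor = multi_algorithm_score([2.0, 0.5], template)         # different-person sample
```

Only one sample is acquired per attempt; the extra cost is computational rather than in sensors.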

Multi Sample Approach:

Multi sample or multi instance approaches use multiple samples of the same biometric.
The same algorithm processes each of the samples, and the individual results are fused to
obtain an overall recognition result.

In comparison to the multi algorithm approach, multi sample has the advantage that using
multiple samples may overcome poor performance due to one sample that has
unfortunate properties. Acquiring multiple samples requires either multiple copies of the
sensor or the user's availability for a longer period of time.

Compared to multi algorithm, multi sample seems to require either higher expense for
sensors, greater cooperation from the user, or a combination of both.

Modes of Operation:
A multimodal system can operate in one of three different modes:

Serial mode: In the serial mode of operation, the output of one modality is typically used
to narrow down the number of possible identities before the next modality is used.
Therefore, multiple sources of information (e.g., multiple traits) do not have to be
acquired simultaneously. Further, a decision could be made before acquiring all the traits.
This can reduce the overall recognition time.

Parallel mode: In the parallel mode of operation, the information from multiple
modalities is used simultaneously in order to perform recognition.

Hierarchical mode: In the hierarchical scheme, individual classifiers are combined in a
treelike structure. This mode is relevant when the number of classifiers is large.
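
The serial mode can be sketched as a two-stage cascade in which face scores narrow down the candidates and the speech matcher is run only on the shortlist. The score values, the shortlist size and the function names are hypothetical and serve only to illustrate the idea.

```python
def serial_identify(face_scores, voice_matcher, shortlist_size):
    """Stage 1 keeps the best face candidates; stage 2 scores only those by voice."""
    shortlist = sorted(face_scores, key=face_scores.get, reverse=True)[:shortlist_size]
    voice_scores = {pid: voice_matcher(pid) for pid in shortlist}
    return max(voice_scores, key=voice_scores.get), shortlist

# Hypothetical similarity scores for four enrolled identities.
face_scores = {"alice": 0.81, "bob": 0.79, "carol": 0.30, "dave": 0.12}
voice_table = {"alice": 0.40, "bob": 0.92, "carol": 0.95, "dave": 0.10}

calls = []  # records which identities the second modality was evaluated for

def voice_matcher(person_id):
    calls.append(person_id)
    return voice_table[person_id]

winner, shortlist = serial_identify(face_scores, voice_matcher, shortlist_size=2)
```

Here the voice matcher runs for two identities instead of four, which is where the reduction in recognition time comes from. In the parallel mode, both modalities would be scored for every identity before a decision is made.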

Multimodal biometrics in terms of FAR & FRR:

FAR (false acceptance rate): the probability of an imposter being accepted as a genuine
individual.

FRR (false rejection rate): the probability of a genuine individual being rejected as an
imposter.
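
Both rates can be estimated from matcher scores at a chosen decision threshold. The sketch below assumes that higher scores mean greater similarity and uses made-up score lists; it is meant only to make the two definitions concrete.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Estimate FAR and FRR at a threshold (a score >= threshold means accept)."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr

# Hypothetical similarity scores from genuine and impostor attempts.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.30, 0.52, 0.71, 0.44, 0.20]

for threshold in (0.5, 0.7, 0.9):
    far, frr = far_frr(genuine, impostor, threshold)
    print(f"threshold={threshold:.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

Raising the threshold lowers FAR and raises FRR, so the two rates have to be traded off against each other for a given application.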

Applications:
The applications of biometrics can be divided into the following three main groups.

Commercial applications such as computer network login, electronic data security,
e-commerce, Internet access, ATM, credit card etc.

Government applications such as national ID card, correctional facility, driver's license,
social security, welfare disbursement, border control, and passport control.

Forensic applications such as criminal investigation, terrorist identification, parenthood
determination, and missing children.

Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and
passwords), government applications have used token-based systems (e.g., ID cards and badges),
and forensic applications have relied on human experts to match biometric features.

METHODOLOGY

Multi algorithm approach:

The multi algorithm approach employs a single biometric sample acquired from a
single sensor. Two or more different algorithms process this acquired sample.
The individual results are combined to obtain an overall recognition result. This
approach is attractive because using a single sensor reduces the data acquisition cost.
Parallel architecture approach:
In the parallel mode of operation, the information from multiple modalities is
used simultaneously in order to perform recognition.

Recognition System:

Any recognition system involves several stages, and the final output is the recognized
person or identity. The first task is data collection, which acquires the data for the
system. For the fusion of face and speech, a camera is used to photograph the person, and
at the same time a microphone may be used to capture his/her voice. The system is
therefore simple to use, since the image and the speech can be acquired simultaneously.

The next step is pre-processing, which is needed to remove noise and to highlight the
features. For the face, the input is an image that requires the application of noise
removal operators and binarization. For speech, the input is a signal that may be freed
from noise by applying noise removal filters.

The next task is segmentation. Here we segment the image and the speech signal. For the
image, the task involves applying a gradient mask, dilation, filling of holes, etc. For
speech, we segment each word of the spoken sentence. Feature extraction is then performed,
which also reduces dimensionality. The extracted features must lead to large inter-class
distances and small intra-class distances, and they must remain relatively constant when
the same face is photographed numerous times or the same person speaks on various occasions.
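
One simple way to segment speech into word-like units is to threshold the short-time energy of the signal. The sketch below is an assumption made for illustration (the proposal does not name a segmentation method); the frame length, the threshold and the toy signal are hypothetical.

```python
def segment_by_energy(signal, frame_len, energy_threshold):
    """Return (start, end) sample indices of runs of frames above the energy threshold."""
    segments = []
    start = None
    n_frames = len(signal) // frame_len
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len   # mean energy of the frame
        if energy > energy_threshold and start is None:
            start = i * frame_len                        # a segment begins
        elif energy <= energy_threshold and start is not None:
            segments.append((start, i * frame_len))      # the segment ends
            start = None
    if start is not None:                                # signal ended inside a segment
        segments.append((start, n_frames * frame_len))
    return segments

# Toy signal: silence, a short burst, silence, a longer burst, silence.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8 + [0.8, -0.8] * 6 + [0.0] * 4
segments = segment_by_energy(signal, frame_len=4, energy_threshold=0.01)
```

For this toy signal the two bursts are returned as the segments (8, 16) and (24, 36). Real speech would need noise-robust thresholds and smoothing across frames.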

Levels of fusion:
The information of the multimodal system can be fused at any of the four modules.

Fusion at the sensor level:

Here the raw data from different sensors are fused. We can use either samples of the same
biometric trait obtained from multiple compatible sensors, or multiple instances of the same
biometric trait obtained using a single sensor. Because the data are fused at a very early
stage, they carry a lot of information compared to the other fusion levels.

Fusion at the Feature Extraction Level:

The data or feature sets originating from multiple sensors or sources are fused together.
The features extracted from each sensor form a feature vector. These feature vectors are then
concatenated to form a single new vector. In feature level fusion, we can use either the same
feature extraction algorithm or different feature extraction algorithms on the different
modalities whose features have to be fused.
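
Feature level fusion by concatenation can be sketched as below. Scaling each vector to unit length before concatenation is an assumption added here so that neither modality dominates the fused vector; the feature values are made up.

```python
import math

def l2_normalize(vector):
    """Scale a feature vector to unit length (an assumed normalization step)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def fuse_features(face_features, speech_features):
    """Concatenate the normalized feature vectors into a single new vector."""
    return l2_normalize(face_features) + l2_normalize(speech_features)

# Hypothetical face and speech feature vectors of different lengths.
fused = fuse_features([3.0, 4.0], [0.0, 5.0, 12.0])
```

The fused vector has as many entries as the two inputs combined (five in this example), so feature level fusion increases the dimensionality that the matcher has to handle.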

Matcher Score Level:

Each system provides a matching score indicating the proximity of the feature vector to the
template vector. These scores can be combined to assert the veracity of the claimed identity.
Because the scores obtained from different matchers are not homogeneous, a score normalization
technique is applied to map them onto a common range. After the feature vectors, these scores
contain the richest information about the input.
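
Score normalization followed by fusion can be sketched as below. Min-max normalization and the weighted sum rule are assumed here as one common choice; the score ranges and the weight are hypothetical.

```python
def min_max_normalize(score, lowest, highest):
    """Map a raw matcher score onto the range [0, 1]."""
    return (score - lowest) / (highest - lowest)

def fuse_scores(face_score, speech_score, face_range, speech_range, face_weight=0.5):
    """Normalize both scores to a common range, then combine them with a weighted sum."""
    face_norm = min_max_normalize(face_score, *face_range)
    speech_norm = min_max_normalize(speech_score, *speech_range)
    return face_weight * face_norm + (1.0 - face_weight) * speech_norm

# Hypothetical ranges: a face matcher scoring in [0, 100] and a speech
# matcher producing scores in [-10, 10], higher meaning more similar.
fused = fuse_scores(80.0, 5.0, (0.0, 100.0), (-10.0, 10.0))
```

Here the face score normalizes to 0.8 and the speech score to 0.75, so the equally weighted fused score is 0.775. The fused score is then compared with a single decision threshold.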
Fusion at the Decision Level:
The final outputs of the multiple classifiers are combined. A majority vote scheme can be
used to make the final decision. Decision level fusion works with a very abstract level of
information, so it is less preferred in designing multimodal biometric systems.
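
A majority vote over accept/reject decisions can be sketched as below. Treating a tie as a rejection is an assumption made for the example.

```python
from collections import Counter

def majority_vote(decisions):
    """Accept only if strictly more than half of the classifiers accept."""
    counts = Counter(decisions)
    return counts[True] > len(decisions) / 2

# Hypothetical accept/reject outputs of three classifiers for one claim.
accepted = majority_vote([True, True, False])
```

Each classifier contributes only a single bit, which illustrates why this level carries the least information of the four.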

POSSIBLE OUTCOME

To achieve a multimodal and multi algorithm approach for the recognition of face and
speech.

Computationally efficient algorithms based on the multi algorithm approach for
multimodal biometrics.

To achieve procedures optimized for power efficiency and enhanced performance.

Improved FAR and FRR compared to those of the existing methodology.

REFERENCES
1. A. K. Jain, A. Ross and S. Prabhakar, "An introduction to biometric recognition", IEEE
Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 4-20, Jan. 2004.
2. Chander Kant and Rajender Nath, "Reducing Process-Time for Fingerprint Identification System",
International Journal of Biometrics and Bioinformatics, vol. 3, issue 1, pp. 1-9, 2009.
3. A. K. Jain and A. Ross, "Multibiometric systems", Communications of the ACM, vol. 47,
pp. 34-40, 2004.
4. Phillips, P. J., P. Grother, R. J. Michaels, D. M. Blackburn, E. Tabassi and J. M. Bone,
"FRVT 2002: overview and summary", March 2003.
5. Gokberk, B., A. A. Salah and L. Akarun, "Rank-Based Decision Fusion for 3D Shape-Based
Face Recognition", LNCS 3546: AVBPA, pp. 1019-1028, July 2005.
6. Xu, C., Y. Wang, T. Tan and L. Quan, "Automatic 3D face recognition combining global
geometric features with local shape variation information", Aut. Face and Gesture Recog.,
pp. 308-313, 2004.
7. Aleksic, P. S. and Katsaggelos, A. K., "Audio-visual biometrics", Proc. IEEE, vol. 94, no. 11,
pp. 2025-2044, Nov. 2006.
8. Sanderson, C., "Automatic person verification using speech and face information", Ph.D.
Thesis, Griffith University, Queensland, Australia, 2003.
9. Chetty, G. and Wagner, M., "Face-voice authentication based on 3D face models", Proc.
ACCV, pp. 559-568, Jan. 2006.
10. Chetty, G. and Wagner, M., "Speaking faces for face-voice speaker identity verification",
Proc. Interspeech, pp. 513-516, Sept. 2006.
11. Erzin, E., Yemez, Y. and Tekalp, A. M., "Multimodal speaker identification using an
adaptive classifier cascade based on modality reliability", IEEE Trans. Multimedia, vol. 7,
no. 5, pp. 840-852, Oct. 2005.
12. Chetty, G. and Wagner, M., "Audio-visual speaker verification based on hybrid fusion of
cross modal features", Proc. PReMI, pp. 469-478, Dec. 2007.
13. Sanderson, C., "Biometric person recognition: face, speech, and fusion", VDM Verlag, June
2008.
14. Reynolds, D. A., Quatieri, T. and Dunn, R., "Speaker verification using adapted Gaussian
mixture models", Digital Signal Processing, vol. 10, pp. 19-41, 2000.
15. M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience,
vol. 3, no. 1, pp. 71-86, 1991.
16. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, "Eigenfaces vs. Fisherfaces:
Recognition using class specific linear projection", IEEE Trans. Pattern Anal. Machine Intell.,
vol. 19, pp. 711-720, May 1997.
17. Ibiyemi T. S., Ogunsakin J. and Daramola S. A., "Bi-Modal Biometric Authentication by
Face Recognition and Signature Verification", International Journal of Computer Applications,
vol. 42, no. 20, pp. 17-21, 2012.
18. Ibiyemi T. S. and Akintola A. G., "Speaker Authentication and Speech Recognition
Enabled Telephone Auto-Dial in Yorùbá", International Journal of Science and Advanced
Technology, vol. 12, no. 4, pp. 88-187, 2012.
