
An Idea of Fusion Feature using Neural Networks

Abstract

In this paper, neural networks are introduced first. Neural networks have emerged as a field
of study within AI and engineering via the collaborative efforts of engineers, physicists,
mathematicians, computer scientists, and neuroscientists. Although the strands of research are
many, there is a basic underlying focus on pattern recognition and pattern generation. The
author then turns to information fusion in multimodal biometrics. Finally, I propose the idea of a
fusion feature using a multi-level neural network. The network has two input layers and two
hidden layers but only one output layer, an architecture that differs from others'. In the paper I
give details of the network, such as its architecture and its learning process.

Keywords: Neural networks; Multimodal biometrics; Information fusion; Fusion feature

1. Neural Networks

Neural networks have emerged as a field of study within AI and engineering via the collaborative
efforts of engineers, physicists, mathematicians, computer scientists, and neuroscientists. Although
the strands of research are many, there is a basic underlying focus on pattern recognition and
pattern generation, embedded within an overall focus on network architectures. Many neural
network methods can be viewed as generalizations of classical pattern-oriented techniques in
statistics and the engineering areas of signal processing, system identification, optimization, and
control theory. There are also ties to parallel processing, VLSI design, and numerical analysis.
A neural network is first and foremost a graph, with patterns represented in terms of
numerical values attached to the nodes of the graph and transformations between patterns
achieved via simple message-passing algorithms. Certain of the nodes in the graph are generally
distinguished as being input nodes or output nodes, and the graph as a whole can be viewed as a
representation of a multivariate function linking inputs to outputs. Numerical values (weights) are
attached to the links of the graph, parameterizing the input/output function and allowing it to be
adjusted via a learning algorithm.
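As a hedged illustration (not from the paper), the graph view above can be sketched in a few lines: numerical weights attached to the links parameterize the function linking input nodes to output nodes, and it is these weights that a learning algorithm would adjust. The tanh activation and the particular weight values are assumptions made only for this example.

```python
import math

def forward(x, W1, b1, W2, b2):
    # Hidden-node values: a nonlinear function of the weighted
    # messages arriving along the links from the input nodes.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Output-node values: weighted sums of the hidden-node messages.
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# 2 input nodes -> 2 hidden nodes -> 1 output node; the weights are
# the adjustable parameters of the input/output function.
W1 = [[0.5, -0.3], [0.8, 0.1]]; b1 = [0.0, 0.0]
W2 = [[1.0, -1.0]];             b2 = [0.1]
y = forward([1.0, 2.0], W1, b1, W2, b2)
```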
A broader view of a neural network architecture involves treating the network as a statistical
processor, characterized by making particular probabilistic assumptions about data. Patterns
appearing on the input nodes or the output nodes of a network are viewed as samples from
probability densities, and a network is viewed as a probabilistic model that assigns probabilities to
patterns. The problem of learning the weights of a network is thereby reduced to a problem in
statistics: that of finding weight values that appear probable in the light of the observed data.

Figure 1. A simple neural network

The links to statistics have proved important in practical applications of neural networks.
Real-world problems are often characterized by complexities such as missing data, mixtures of
qualitative and quantitative variables, regimes of qualitatively different functional relationships,
and highly nonuniform noise. Neural networks are complex statistical models with the flexibility
to address many of these complexities, but as with any flexible statistical model one must take
care that the complexity of the network is adjusted appropriately to the problem at hand. A
network that is too complex is not only hard to interpret but, by virtue of overfitting the random
components of the data, can perform worse on future data than a simpler model. This issue is
addressed via statistical techniques such as cross-validation, regularization, and averaging, as well
as the use of an increasingly large arsenal of Bayesian methods. Other practical statistical issues
that arise include the assignment of degrees of confidence to network outputs (“error bars”), the
active choice of data points (“active learning”), and the choice between different network
structures (“model selection”). Progress has been made on all these issues by applying and
developing statistical ideas.
The statistical approach also helps in understanding the capabilities and limitations of
network models and in extending their range. Neural networks can be viewed as members of the
class of statistical models known as “nonparametric,” and the general theory of nonparametric
statistics is available to analyze network behavior. It is also of interest to note that many neural
network architectures have close cousins in the nonparametric statistics literature; for example, the
popular multilayer perceptron network is closely related to a statistical model known as
“projection pursuit,” and the equally popular radial basis function network has close ties to kernel
regression and kernel density estimation.
A more thoroughgoing statistical approach, with close ties to “semiparametric” statistical
modeling, is also available in which not only the input and output nodes of a network but also the
intermediate (“hidden”) nodes are given probabilistic interpretations. The general notion of a
mixture model, or more generally a latent variable model, has proved useful in this regard; the
hidden units of a network are viewed as unobserved variables that have a parameterized
probabilistic relationship with the observed variables (i.e., the inputs, the outputs, or both). This
perspective has clarified the links between neural networks and a variety of graphical probabilistic
approaches in other fields; in particular, close links have been forged with hidden Markov models,
decision trees, factor analysis models, Markov random fields, and Bayesian belief networks. These
links have helped to provide new algorithms for updating the values of nodes and the values of the
weights in a network; in particular, EM algorithms and stochastic sampling methods such as Gibbs
sampling have been used with success. Neural networks have found a wide range of applications,
the majority of which are associated with problems in pattern recognition and control theory. Here
we give a small selection of examples, focusing on applications in routine use.
The problem of recognizing handwritten characters is a challenging one that has been widely
studied as a prototypical example of pattern recognition. Some of the most successful approaches
to this problem are based on neural network techniques and have resulted in several commercial
applications. Mass screening of medical images is another area in which neural networks have
been widely explored, where they form the basis for one of the leading systems for semi-automatic
interpretation of cervical smears. As a third example of pattern recognition we mention the
problem of verifying handwritten signatures, based on the dynamics of the signature captured
during the signing process, where the leading approach to this problem is again based on neural
networks.

2. Fusion Feature

Biometrics refers to identifying or verifying an individual based on the person's physiological or
behavioral characteristics in order to make a positive personal identification. It is inherently more reliable
and more capable than knowledge-based and token-based techniques in differentiating between an
authorized person and a fraudulent impostor, because the physiological or behavioral
characteristics are unique to every person. Biometrics provides a solution for the security
requirements of our electronically interconnected information society. In theory, collecting and
verifying biometric data is no problem, but in today's demanding real-world applications there are
many problems with biometric systems. One of them is that biometric data vary over time for one
and the same person and, to make it even worse, this variation is itself very variable from one
person to another. Most of the other problems are caused by extreme or constantly changing
surroundings and the nature of certain biometric measures.
A. Noise
Noisy biometric data, like a person having a cold (voice recognition), a simple cut on one's
finger (fingerprint scan), or different lighting conditions (face detection), are some examples of
noisy inputs. Other examples are misconfigured or improperly maintained sensors or inconvenient
ambient conditions, such as dirt on a fingerprint sensor or loud background noise during voice
recognition. The problem with noisy biometric data is that authorized personnel may get incorrectly
rejected (a false rejection, FR) if the noise affects the extracted features so much that no match can
be found in the biometric database. The other extreme would occur if noise changed the extracted
features in such a way that the resulting feature set matched another person.
B. Distinctiveness
While a biometric trait is expected to vary significantly across individuals, there may be large
similarities in the feature sets used to represent these traits. Thus, every biometric trait has a
theoretical upper bound in terms of discrimination capability.
C. Non-universality
The problem of non-universality arises when it is not possible to acquire certain biometric
traits from all users. That means that even though a person has a fingerprint, it may still be
impossible to acquire that trait because of the poor quality of the ridges that make up the
fingerprint.
Most of the problems and limitations of biometrics are imposed by unimodal biometric
systems, which rely on the evidence of only a single biometric trait. Some of these problems may
be overcome by multi biometric systems and an efficient fusion scheme to combine the
information presented in multiple biometric traits. It is evident that problems like non-universal
traits, distinctiveness, and security are easier to deal with if more biometric traits are present. So
if a person's fingerprint cannot be acquired by a sensor, other biometric methods, such as voice
recognition and retina scans, are taken into account and the resulting data are validated against
the biometric database. Spoofing of biometric data also becomes harder, since it is far easier to
spoof a single biometric trait, whereas with a multi-biometric system it would be necessary to
spoof several traits simultaneously.
In general there are three possible levels of fusion for combining two or more biometric
systems into a multi-biometric system:
A. Fusion at the feature extraction level
Feature sets are acquired from each sensor where each feature set is represented as a vector.
Then the vectors are concatenated which results in a new feature vector with higher
dimensionality representing a person’s identity in a different hyperspace.

Figure 2. Fusion at the feature extraction level
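A minimal sketch of feature-level fusion as described above. The two feature vectors and their dimensions are invented purely for illustration:

```python
import numpy as np

# Hypothetical feature vectors extracted by two sensors
# (e.g. fingerprint and face); values are placeholders.
fingerprint_feat = np.array([0.12, 0.85, 0.33])        # dimension 3
face_feat        = np.array([0.40, 0.07, 0.91, 0.25])  # dimension 4

# Fusion at the feature extraction level: concatenate the vectors
# into one higher-dimensional vector representing the identity
# in a different hyperspace.
fused = np.concatenate([fingerprint_feat, face_feat])
```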


B. Fusion at the matching score level
Each biometric system provides a matching score which indicates the proximity of the
feature vector with the template vector. Fusion at this level would mean combining the matching
scores in order to verify the claimed identity. In order to combine the matching scores reported by
the sensors, techniques such as logistic regression are used. The logistic regression model is
simply a non-linear transformation of the linear regression. The logistic distribution is an S-shaped
distribution function similar to the standard normal distribution, but it is easier to work with in
most applications because the probabilities are easier to calculate. These techniques attempt to
minimize the false rejection rate (FRR) for a given false acceptance rate (FAR).
Figure 3. Fusion at the matching score level
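A minimal sketch of score-level fusion with a logistic model, as described above. The weights and bias are assumed to have been fitted beforehand on labelled genuine/impostor score pairs; all numeric values here are invented for illustration:

```python
import math

def logistic(z):
    # The S-shaped logistic distribution function.
    return 1.0 / (1.0 + math.exp(-z))

def fuse_scores(scores, weights, bias):
    # Combine per-sensor matching scores into one acceptance
    # probability via logistic regression: a non-linear (logistic)
    # transformation of a linear combination of the scores.
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return logistic(z)

# Hypothetical fitted parameters and matching scores in [0, 1].
p = fuse_scores(scores=[0.82, 0.67], weights=[4.0, 3.0], bias=-4.5)
accept = p >= 0.5  # threshold chosen to trade FRR off against FAR
```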
C. Fusion at the decision level
The resulting feature vectors from each sensor are classified into one of two classes, reject or
accept. Afterwards a majority-vote scheme can be used to make the final decision.

Figure 4. Fusion at the decision level
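The decision-level scheme above reduces to a simple majority vote. The sensor decisions below are invented for illustration; an odd number of sensors avoids ties:

```python
from collections import Counter

def majority_vote(decisions):
    # Each sensor independently outputs 'accept' or 'reject';
    # the final decision is the class with the most votes.
    return Counter(decisions).most_common(1)[0][0]

final = majority_vote(['accept', 'reject', 'accept'])
```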

3. Fusion Feature using Neural Networks

Biometric systems that integrate information at an early stage of processing are believed to be
more effective than systems that perform integration at a later stage. Since the features
contain richer information about the input biometric data than the matching score or the decision of
a matcher, integration at the feature level should provide better recognition results than other levels
of integration. However, integration at the feature level is difficult to achieve in practice. First, the
relationship between the feature spaces of different biometric systems may not be known. Second,
concatenating two feature vectors may result in a feature vector with very large dimensionality,
leading to the “curse of dimensionality” problem. Finally, some feature vectors may not be
accessible at all. So very few researchers have studied integration at the feature level, and most of
them generally prefer fusion schemes applied after matching.
Now we give attention to the neural networks. A trained neural network can provide predictions
without understanding. The network is a black box and no attempt is made to understand the
rationale linking inputs and outputs. In this way, the neural network approach differs from other,
more analytical forms of decision making. Let us take a famous example to illustrate this. Neural
networks can be used for pattern recognition and in one such application a network was trained to
recognize tanks that were camouflaged in undergrowth of varying density. The training set
comprised a number of photographs containing undergrowth both with and without tanks. A neural
network was trained and validated successfully with an impressive prediction of tanks even when
they were almost unrecognizable in the most dense camouflage. Unfortunately, all photographs
without tanks were taken before noon; those with tanks were taken in the afternoon. The neural
network was trained to recognize the direction of shadows and not the tanks. Any tank before
noon would be immune from detection and empty undergrowth in the afternoon would be prone to
destruction.
The above example demonstrates that critical mistakes can be made when the information in
the training and validation sets is not complete and representative. Clearly, a similar situation
could occur which may advise against the use of a beneficial clinical treatment or suggest an
ineffective therapy. If pre-operative condition is not detailed, a treatment which is performed only
on the worst cases may be unfairly denounced when it could be most effective at improving the
original condition. Therefore, when applying neural networks, new practices should apply as far as
possible to include all relevant data, to seek to understand the reasons for the network predictions
and to be sceptical of these predictions, especially when they conflict with an alternative form of
clinical judgment.
Now we suppose that there are two feature spaces A and B in a multi-biometric system, in
which A may come from a fingerprint and B from a face, for example. We are asked to implement a
verification system using the feature spaces A and B. Thus each example in the system can be
denoted X = (a, b), where a ∈ A and b ∈ B. Because we do not know what the relationship
between A and B is, we choose a neural network to implement the system.

3.1 Architecture

The neural network has two input layers and two hidden layers, and only one output layer, as
shown in Fig. 5. For each example X = (a, b), a serves as the input signal to input layer 1,
whereas b serves as the input signal to input layer 2. Every neuron in an input layer connects with
every neuron in its hidden layer, and every neuron in a hidden layer connects with every neuron in
the output layer.
Figure 5. A neural network for fusion feature
For each neuron in the hidden layers, its net activation is

    net_j^l = Σ_{i=1}^{d} x_i^l w_{ji}^l + w_{j0}^l = Σ_{i=0}^{d} x_i^l w_{ji}^l    (1)

where the superscript l indexes the input branch, 1 for A and 2 for B. In Equ. 1, x_i^l is the i-th
element of the feature set and w_{ji}^l denotes the input-to-hidden-layer weight at hidden unit j.
Each hidden unit emits an output that is a nonlinear function of its activation, f(net^l), that is,

    y_j^l = f(net_j^l)    (2)

where f(·) is called the activation function; in many papers it is a sign function,

    f(net) = { 1 if net ≥ 0; −1 if net < 0 }    (3)

Each output unit similarly computes its net activation based on the hidden-unit signals of each
branch,

    net_k^l = Σ_{j=1}^{n_H} y_j^l w_{kj}^l + w_{k0}^l = Σ_{j=0}^{n_H} y_j^l w_{kj}^l    (4)

where the subscript k indexes units in the output layer and n_H denotes the number of hidden
units. An output unit computes a nonlinear function of its two net activations, emitting

    z_k = g(net_k^1, net_k^2)    (5)

where the activation function g(·) is a binary function. g(·) may resemble f(·), for example

    g(net^1, net^2) = { 1 if net^1 + net^2 ≥ 0; −1 if net^1 + net^2 < 0 }    (6)

When there are c output units, we can think of the network as computing c discriminant
functions z_k = ϕ_k(X), and we can classify the input according to which discriminant function is
largest.
In the verification system, given an example X, the network computes its c outputs and obtains
a series of 1s and −1s. We convert each −1 to 0 and read the resulting series of 1s and 0s as a
binary number, which is the example's id in the system. If this id is the same as the id the person
entered into the system, the system decides that he or she is the claimed person. Otherwise, it
decides that he or she is an intruder.
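A hedged NumPy sketch of the forward pass of Equ. 1-6 and of the id decoding described above. Bias terms are omitted for brevity, the weights are random placeholders rather than trained values, and the dimensions and claimed id are invented, so the decoded id is arbitrary:

```python
import numpy as np

def sign(net):
    # Activation function f of Equ. 3: 1 if net >= 0, else -1.
    return np.where(net >= 0, 1, -1)

def forward(a, b, W1, W2, V1, V2):
    # Two input branches feed separate hidden layers (Equ. 1-2);
    # each output unit combines the net activations of both
    # branches through the binary function g (Equ. 4-6).
    y1 = sign(W1 @ a)          # hidden layer of branch A
    y2 = sign(W2 @ b)          # hidden layer of branch B
    net1, net2 = V1 @ y1, V2 @ y2
    return sign(net1 + net2)   # shared output layer

def decode_id(z):
    # Map the series of 1/-1 outputs to a binary number (-1 -> 0).
    bits = ''.join('1' if zk == 1 else '0' for zk in z)
    return int(bits, 2)

rng = np.random.default_rng(0)
a, b = rng.normal(size=4), rng.normal(size=5)        # X = (a, b)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 5))
V1, V2 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
z = forward(a, b, W1, W2, V1, V2)
claimed_ok = decode_id(z) == 9   # compare against the claimed id
```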

3.2 Learning Process

From the last section, we know that the network combines two three-layer neural networks. So
we can think of the learning process of the network as having two steps: deciding the modification
value for each three-layer neural network and updating the weights of the sub-networks.
We assume that each element in the training dataset takes the form (X, n), where X is the
feature set of the element and n is the id of the element. At the beginning, all weights are equal to 0.
Learning Algorithm:
0. Start;
1. Take an element (X, n) from the training dataset;
2. Convert n to a binary number, a series of 1s and 0s noted n_i, i = 1, 2, ...; for each n_i, if
n_i = 0, then let n_i = −1;
3. Input X to the system and obtain z_i, i = 1, 2, ...;
4. For each unit in the output layer, if z_i ≠ n_i, then set e_i^1 = e_i^2 = (n_i − z_i)/2;
5. Apply the back-propagation algorithm to update each sub-network;
6. When all the sub-networks have converged, go to step 7; otherwise go to step 1;
7. End.
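Steps 2 and 4 of the learning algorithm can be sketched as follows. This is a sketch under the assumption, taken from step 4, that the output error is split equally between the two three-layer sub-networks; the id, bit width, and outputs are invented for illustration:

```python
def id_to_targets(n, width):
    # Step 2: convert id n to a binary string, then map 0 -> -1.
    bits = format(n, f'0{width}b')
    return [1 if b == '1' else -1 for b in bits]

def output_errors(targets, outputs):
    # Step 4: where z_i != n_i, split the error equally between
    # the two sub-networks: e_i^1 = e_i^2 = (n_i - z_i)/2.
    return [((n - z) / 2, (n - z) / 2) if z != n else (0.0, 0.0)
            for n, z in zip(targets, outputs)]

targets = id_to_targets(5, width=4)         # 5 -> 0101 -> [-1, 1, -1, 1]
errors  = output_errors(targets, [-1, -1, -1, 1])
```

The per-unit error pairs would then drive one back-propagation update in each sub-network (step 5).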
4. Conclusion

In this paper, I have tried to propose an idea for information fusion at the feature level using a
neural network. Neural networks were introduced first. Then I turned to information fusion in
multimodal biometrics. Finally, I proposed the idea of a fusion feature using a multi-level neural
network with two input layers, two hidden layers, and only one output layer. The idea still needs
to be validated, which is the next step I intend to pursue.

