Fault Tolerant Neural Network for ECG Signal Classification Systems

Mostefa MERAH (1,2), Abdelazziz OUAMRI (2), Amine NAIT-ALI (1), Mokhtar KECHE (2)
(1) LISSI, Université Paris 12, Laboratoire Images, Signaux & Systèmes Intelligents, EA 3956, 61 avenue du Général de Gaulle, 94010 Créteil, France
(2) LSI, Université MB, USTO, Laboratoire Signaux & Images, B.P. 1505, Oran El Mnouar, Algérie
mostefa.merah@u-pec.fr

Advances in Electrical and Computer Engineering, Volume 11, Number 3, 2011
Digital Object Identifier 10.4316/AECE.2011.03003

Abstract - The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN) to ECG classification systems. This ANN includes a penalization criterion which improves its performance in terms of robustness. Specifically, in this method, the ANN weights are penalized using the auto-prune method. Simulations performed on MIT-BIH ECG signals have shown that significant robustness improvements are obtained with regard to potential hardware artificial neuron failures. Moreover, we show that the proposed design achieves better generalization performance compared to the standard back-propagation algorithm.

Index Terms - Fault tolerance, artificial neural networks, hybrid backpropagation algorithms, medical diagnosis.
I. INTRODUCTION
The electrocardiogram (ECG) is one of the most important biological signals used by cardiologists for diagnostic purposes [1]. For instance, detecting abnormal
ECG beats is particularly important during any clinical
monitoring. For this purpose, several analysis methods and
automated arrhythmia detection have been developed during
the last decades [2]-[3]. These include heuristic approaches, expert systems, Markov models, self-organizing maps, and Artificial Neural Networks (ANNs) [4]. In this context,
designing systems based on these processing algorithms can
be performed either using software solutions or hardware
solutions. In case of software solutions, algorithms are
implemented on programmable devices. On the other hand,
hardware solutions can be employed for other requirements such as reducing the execution time. In fact, this can be achieved through the high degree of parallelism offered by some devices (e.g. FPGAs). When dealing with ANNs as
in our application and despite theoretical advances in the
ANN [5]-[6], one has to point out that model parameters
determination remains empirical. In fact, various
experiments have shown that the generalization ability is
related to the network topology as well as to the nominal
parameter values determination (i.e. synaptic weight,
activation threshold, etc) by some learning methods [7]-[8].
In fact, in real cases, these configurations could suffer from
disturbances due to: (1) network environment influences
(e.g. temperature, humidity...), (2) manufacturing
technology. Therefore, for high reliability applications such
as medical diagnostics, the risk is too high [9].
Consequently, some specific algorithms have been
developed in order to improve physical characteristics such
as fault tolerance. Moreover, this allows a low cost hardware
implementation. For this purpose, one possibility consists of using the prediction error as a tool that allows a fault tolerant neural network. As explained in [10], for each neural network node, a fault error is considered through a generalized prediction error. Another possibility is to include a probability of error in the training process for each neuron [11]. This improvement forces the network to distribute the calculations uniformly over the different neurons. A specific restriction is achieved either by modifying the neuron weight magnitudes through the addition of noise at the input of the first hidden layer (which tends to reduce the weight magnitudes of the output layer connections) [12], or by introducing a parameter that penalizes higher weight amplitudes during the gradient calculation phase [3]. Weigend [13] uses a specific procedure called the "Addition / Deletion Procedure" (ADP) to eliminate small neurons and add neurons in order to spread the weights over several neurons.

In this paper, we propose a new hardware algorithm
implementation of a cardiac ECG signal classification
technique. This implementation is based on a specific neural
network which employs a robust gradient back-propagation
algorithm. This algorithm is a modified version of the
learning algorithm proposed in [14]. Moreover, the
additional weight penalization criterion terms are obtained
using the auto-prune method [15]. In this method, the most important weights are identified by a test statistic T, as described later in this paper (Section 3). This process leads to a uniform distribution of the weights, which makes the neural network more robust. Simulation results show that the proposed algorithm significantly improves the neural network robustness even if some hidden neurons are destroyed. Consequently, better generalization performance is obtained compared to the standard back-propagation algorithm. This paper is organized as follows:

In Section 2, we define the new additional error term approach (for generalization improvement) added to the standard output error. This approach is based on the hybrid back-propagation algorithm described in [14]. It is improved in this paper (Section 3) through a robust back-propagation algorithm, which is our main contribution. Section 4 deals with the theoretical analysis of the tolerance coefficient, and results related to various simulations are
provided in Section 5. Finally, a conclusion concerning this work is given in Section 6.
II. THE ADDITIONAL ERROR TERM
As is well known, when using a neural network, the aim is to minimize the error between the real output Y and the desired one Yd, over a given learning data set. For this purpose, we define an energy E (squared error) by:
E = \sum_{i=0}^{A} (Yd_i - Y_i)^2    (1)
where A is the number of examples used in the learning process. The gradient back-propagation algorithm is a common method, generally employed to minimize this energy.
Here, we are interested in the altered neural network, which produces a function denoted by \tilde{Y} when a fault \Delta W_{ij} occurs on the hidden cell weights W_{ij}:

\tilde{Y}_j = f\left( \sum_{i=0}^{N_c} (W_{ij} + \Delta W_{ij}) a_i \right) = f\left( \sum_{i=0}^{N_c} \tilde{W}_{ij} a_i \right) = Y_j + \Delta Y_j    (2)
where \tilde{W}_{ij} denotes the damaged weight, \tilde{Y}_j the corresponding output, and a_i the corresponding pre-synaptic neural activation.
In this case we can measure the distance between the desired function Yd and the deteriorated function \tilde{Y} as follows:

\tilde{E} = \sum_{i=0}^{A} (Yd_i - \tilde{Y}_i)^2 \approx E + \sum_{i=0}^{A} (\Delta Y_i)^2 = E + \Delta E    (3)

In addition to the standard output error E, we obtain a new error term \Delta E. The final goal is to minimize both terms (E and \Delta E). Minimizing the second term \Delta E implies a corresponding \Delta Y minimization, and thereby reduces the sensitivity of the neural network output to fluctuations of its components.
For this purpose we apply the chain rule to an L-layer feed-forward neural network. According to [15], the sensitivity is given by:
\frac{\partial y_i}{\partial x_k} = \sum_{j_{L-1}} \sum_{j_{L-2}} \cdots \sum_{j_1} W^{(L)}_{i j_{L-1}} f'_{L-1}(a_{j_{L-1}}) W^{(L-1)}_{j_{L-1} j_{L-2}} f'_{L-2}(a_{j_{L-2}}) \cdots f'_1(a_{j_1}) W^{(1)}_{j_1 k}    (4)

where x_k and y_i denote the k-th element of the input vector and the i-th element of the output vector, respectively, W^{(l)}_{j_l j_{l-1}} denotes the synaptic interconnections of layer l, and f'_l(.) is the derivative of the sigmoid nonlinear function f_l(.) at the l-th hidden layer. The post-synaptic activation \hat{a}_{j_l} of the j_l-th neuron of the hidden layer is obtained from:
a_{j_l} = \sum_{j_{l-1}=1}^{N_{l-1}} W^{(l)}_{j_l j_{l-1}} \hat{a}_{j_{l-1}}    (5)

where:

\hat{a}_{j_l} = f_l(a_{j_l})    (6)
a_{j_l} is the corresponding pre-synaptic neural activation. For simplicity, it is assumed here that all neurons at the l-th layer have the same sigmoid function f_l. The summation in (4) concerns all hidden layer neurons.
From (4), the sensitivity may be reduced by saturating the hidden neuron activations:

\left| f'_l(a_{j_l}) \right| \ll 1    (7)
The additional term introduced in the back-propagation gradient algorithm in the next section will be based on this condition.
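To illustrate condition (7) numerically, the following sketch is given (illustrative only: the layer sizes, the tanh activations, the fault amplitude and the scaling factor are our own assumptions, not values from the paper). It injects a fault \Delta W on one hidden weight and compares the resulting output deviation |\Delta Y| of (2) when the hidden neurons work in their linear region and when they are saturated:

    import numpy as np

    def forward(x, W1, W2):
        a = W1 @ x                          # pre-synaptic hidden activations a_j
        return np.tanh(W2 @ np.tanh(a)), a  # network output and hidden activations

    rng = np.random.default_rng(0)
    x = rng.normal(size=17)                     # one input example (17 features)
    W1 = rng.normal(scale=0.1, size=(19, 17))   # hidden layer weights
    W2 = rng.normal(scale=0.1, size=(3, 19))    # output layer weights

    for scale, regime in [(1.0, "linear region"), (10.0, "saturated region")]:
        y, a = forward(x, scale * W1, W2)
        W1_faulty = scale * W1
        W1_faulty[0, 0] += 0.5                  # fault Delta W on a single hidden weight
        y_tilde, _ = forward(x, W1_faulty, W2)
        print(regime,
              " mean |f'(a_j)| =", round(float(np.mean(1 - np.tanh(a) ** 2)), 3),
              " |Delta Y| =", round(float(np.abs(y_tilde - y).sum()), 5))

The deviation is expected to shrink in the saturated case because each hidden derivative f'(a_j) appearing in (4) becomes small, which is exactly what the additional error term of the following sections encourages.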
III. ROBUST BACK PROPAGATION ALGORITHM
By analysing some simulation results obtained with both the classical back-propagation algorithm and the hybrid learning algorithm [14], we have noticed that several hidden cells have disproportionate weights. Therefore, if one of these neurons is destroyed, the neural network performance might significantly decrease. In order to prevent this problem, techniques such as the weight decay algorithm and the weight elimination algorithm have been proposed, respectively, in [16] and [17].
In this work, we will use the auto-prune algorithm [15].
The faulty output \tilde{Y}, introduced in (2), can be expressed as follows:

\tilde{Y}_j = f(\tilde{a}_j)    (8)

where:

\tilde{a}_j = \sum_{i=0}^{N} W_{ij} a_i + \sum_{i=0}^{N} \Delta W_{ij} a_i = a_j + \Delta a_j    (9)
The error appears in the weighted summation corresponding to the input of neuron j. For this reason, it is not possible to determine which weight is defective. If we consider the destruction of a single connection (e.g. the connection between neuron i_0 and neuron j), one obtains from (9) the following equality:

\Delta a_j = \Delta W_{i_0 j} \, a_{i_0}    (10)
In this case one can notice that the error \Delta a_j is proportional to the weight value. Hence, weight destructions will affect their corresponding outputs, especially for high-valued weights [18]. Some authors propose techniques to solve such a problem by introducing parameters that penalize high weight amplitudes in the gradient calculation. In [13], a procedure called the "Addition / Deletion Procedure" (ADP) is used to eliminate small neurons and add neurons in order to spread the weights over several neurons. Results obtained using these techniques were not satisfactory. In fact, poor performances of the deteriorated ANN are obtained.
In our work, we propose a solution to the problem of disproportionate weights in the ANN hidden layer. As evoked previously, it consists of using the auto-prune technique, generally used in ANN pruning methods. The auto-prune technique proceeds by removing the weights that have the least effect on the output, with the aim of improving generalization. In our case, the most significant weights will be penalized in order to obtain a
uniform distribution of the weights in the hidden neurons.
Consequently, the importance of the neural network weights is defined, according to the auto-prune method [19], [15], by a test statistic T, assuming that a weight tends to zero during the training process:
T(W^l_{ij}) = \mathrm{Log}\left( \frac{ \left( \sum_{s=1}^{N} \left( W^l_{ij} - \eta_l \frac{\partial E_s}{\partial W^l_{ij}} \right) \right)^2 }{ \sum_{s=1}^{N} \left( \eta_l \frac{\partial E_s}{\partial W^l_{ij}} - \frac{1}{N} \sum_{s=1}^{N} \eta_l \frac{\partial E_s}{\partial W^l_{ij}} \right)^2 } \right)    (11)
This measure does not assume that the error minimum has been reached; it can be computed at any time during training. In the above formula, the sums are computed over the examples s of the training set, \eta_l is the learning rate and N is the number of learning examples. A large value of T indicates a high importance of the connection with weight W^l_{ij}.
As has been highlighted, our goal is to penalize the most significant weights. This is achieved by:
If T(W^l_{ij}) > \alpha M_T, \quad \alpha \in [0, 1]    (12)
with:

M_T = \frac{1}{K} \sum_{k=1}^{K} T(W^l_{ij})_k    (13)

where K is the number of network connections.
Then:

W^l_{ij} = W^l_{ij} - \beta \overline{W}^l    (14)
where \beta is a positive constant called the penalty coefficient and \overline{W}^l represents the average value of the layer weights:

\overline{W}^l = \frac{1}{N_l M_l} \sum_{i=1}^{N_l} \sum_{j=1}^{M_l} W^l_{ij}    (15)

Thus, the \beta \overline{W}^l term will penalize the most significant weights.
Note that for small values of \beta, a negligible influence on the training process is observed. This is not the case for high values of \beta, where a long convergence time is to be expected. To overcome this problem, it has been proposed in [20] to carry out this penalization for a given number of learning cycles Ta. However, this method is not well defined (i.e. it does not specify when the first penalization is required, nor the number of learning cycles required between penalization phases). For this reason we propose:
1 - A periodic weight readjustment (penalization) after a specified periodic number of learning cycles NCA.
2 - Stopping the operation if the learning cycle number reaches Ta, or if a maximum number of validation failures occurs (the validation performance has increased more than MAX_FAIL times since the last time it decreased, when validation is used) [21].
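As an illustration of how (12)-(15) can be implemented at a penalization step, the following sketch is given (our own illustrative code: the function name, the per-layer averaging of M_T and the array-based interface are assumptions, and the test statistic T of (11) is supposed to be already computed):

    import numpy as np

    def penalize_weights(W, T, alpha=0.7, beta=0.2):
        """Penalize the most significant weights of one layer, eqs. (12)-(15).

        W : (N_l, M_l) weight matrix of a hidden layer
        T : array of the same shape holding the test statistic T(W_ij) of (11)
        """
        M_T = T.sum() / T.size        # mean importance, eq. (13) (here over this layer only)
        mask = T > alpha * M_T        # condition (12): select the most significant weights
        W_bar = W.mean()              # average weight value of the layer, eq. (15)
        W = W.copy()
        W[mask] -= beta * W_bar       # penalization step, eq. (14)
        return W

In the training loop this routine would only be called every NCA cycles and as long as the number of penalization phases has not reached Ta, following points 1 and 2 above.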
Our aim in the combined use of this penalization criterion and the hybrid algorithm is:
a - to speed up the neural network convergence;
b - to enhance its robustness and generalization performances.
For this purpose we rewrite the error to be minimized as follows:

E = E_0 + \sum_{i=1}^{L-1} \lambda_i E^i_a = \frac{1}{M} \sum_{s=1}^{M} E^s_0 + \sum_{i=1}^{L-1} \lambda_i \frac{1}{M} \sum_{s=1}^{M} E^{si}_a    (16)
with:

E^s_0 = \frac{1}{2 N_0} \sum_{i=1}^{N_0} (Yd^s_i - Y^s_i)^2    (17)
and:

E^{sl}_a = \frac{1}{N_l} \sum_{j_l=1}^{N_l} f'_l(a^s_{j_l})    (18)
E^s_0 and E^{sl}_a represent, respectively, the output error and the additional error term of the l-th hidden layer for the s-th learning example. Yd^s_i and Y^s_i are, respectively, the desired and current output values of the i-th neuron for the s-th learning example. \hat{a}^s_{j_l} is the post-synaptic value of the j_l-th element of hidden layer l. The \lambda_i values are the hidden layer error factors weighting the additional errors E^i_a with respect to the output error E_0. M, N_0 and N_l are, respectively, the number of learning examples, the number of neurons of the output layer and the number of neurons of the l-th hidden layer. The errors are normalized by these numbers.
In the case of a bipolar sigmoid (hyperbolic tangent) activation, the previous definition (18) of E^{sl}_a may take the following form:
E^{sl}_a = \frac{1}{2 N_l} \sum_{j_l=1}^{N_l} \left[ 1 - f_l(a^s_{j_l})^2 \right]    (19)
The minimization of the E^{sl}_a error leads the neural network towards the saturated (nonlinear) region of the activation functions, which improves the neural network robustness [14].
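The combined cost of (16), (17) and (19) can be evaluated as in the following sketch (illustrative; a single hidden layer, the hyperbolic tangent used in place of the bipolar sigmoid, and arbitrary array names are our assumptions):

    import numpy as np

    def combined_error(Yd, Y, A_hidden, lam=0.4):
        """Total error E of (16) for a network with one hidden layer.

        Yd, Y    : (M, N0) desired and actual outputs for the M learning examples
        A_hidden : (M, N1) pre-synaptic hidden activations a_j for the same examples
        lam      : hidden layer error factor lambda
        """
        M, N0 = Y.shape
        N1 = A_hidden.shape[1]
        E0 = np.sum((Yd - Y) ** 2) / (2 * N0 * M)               # output error, eq. (17), averaged over M
        Ea = np.sum(1 - np.tanh(A_hidden) ** 2) / (2 * N1 * M)  # additional error, eq. (19), averaged over M
        return E0 + lam * Ea

Driving Ea towards zero pushes the hidden activations into saturation, as required by condition (7).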
The network is trained by the steepest-descent error minimization algorithm. The update of the synaptic weights, using the penalization criterion of (14), for the s-th stored pattern can be expressed as follows:

\Delta W^s_{j_l j_{l-1}} = -\eta_l \frac{\partial E^s}{\partial W_{j_l j_{l-1}}} = \eta_l \, \delta^s_{j_l} \, \hat{a}^s_{j_{l-1}}    (20)
where \eta_l is the learning step and \delta^s_{j_l} is the E^s error sensitivity with respect to the activations a^s_{j_l} in layer l, given by the back-propagated error as:
\delta^s_{j_l} = \left( \sum_{j_{l+1}=1}^{N_{l+1}} \delta^s_{j_{l+1}} W_{j_{l+1} j_l} + \frac{\lambda_l}{M N_l} \hat{a}^s_{j_l} \right) f'_l(a^s_{j_l})    (21)
The output layer sensitivity \delta^s_{j_L} is defined by:

\delta^s_{j_L} = \frac{1}{M N_L} \left( Yd^s_{j_L} - Y^s_{j_L} \right) f'_L(a^s_{j_L})    (22)
Substituting (21) into (20) shows two weight correction terms: the back-propagated sensitivity (a modification of the standard back-propagation algorithm) and the hidden neuron tolerance gradient term (additional term), given by:

\frac{\lambda_l}{M N_l} \, \hat{a}^s_{j_{l-1}} \, \hat{a}^s_{j_l} \, f'_l(a^s_{j_l})    (23)
The derivative f' prevents the synaptic weights from growing or declining indefinitely [14].
To calculate the test statistic T, we can carry out some simplifications. Substituting (20) into (11), we obtain:
T(W^l_{ij}) = \mathrm{Log}\left( \frac{ \left( \sum_{s=1}^{N} \left( W^l_{ij} - \eta_l \, \delta^s_{j_l} \hat{a}^s_{i_{l-1}} \right) \right)^2 }{ \sum_{s=1}^{N} \left( \eta_l \, \delta^s_{j_l} \hat{a}^s_{i_{l-1}} - \frac{1}{N} \sum_{s=1}^{N} \eta_l \, \delta^s_{j_l} \hat{a}^s_{i_{l-1}} \right)^2 } \right)    (24)
By applying the relation:

\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}{N}    (25)
into (24), we get the following equation:

T(W^l_{ij}) = \mathrm{Log}\left( \frac{ N \left( N W^l_{ij} - \eta_l \sum_{s=1}^{N} \delta^s_{j_l} \hat{a}^s_{i_{l-1}} \right)^2 }{ N \sum_{s=1}^{N} \left( \eta_l \, \delta^s_{j_l} \hat{a}^s_{i_{l-1}} \right)^2 - \left( \sum_{s=1}^{N} \eta_l \, \delta^s_{j_l} \hat{a}^s_{i_{l-1}} \right)^2 } \right)    (26)
This formula makes it possible to simply accumulate the sums of \eta_l \delta^s_{j_l} \hat{a}^s_{i_{l-1}} and of its square during the training. We thus use it in our algorithm.
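The practical interest of (26) is that only two running sums per connection have to be accumulated during an epoch. A minimal sketch of this computation is given below (illustrative; it assumes the per-example update terms \eta_l \delta^s_{j_l} \hat{a}^s_{i_{l-1}} are available as a stacked array, and the function name is ours):

    import numpy as np

    def autoprune_T(W, updates, eps=1e-12):
        """Test statistic of eq. (26) computed from per-example update terms.

        W       : (n_out, n_in) weight matrix of one layer
        updates : (N, n_out, n_in) array of eta_l * delta_j^s * a_i^s for the N examples
        """
        N = updates.shape[0]
        sum_u = updates.sum(axis=0)           # running sum accumulated during training
        sum_u2 = (updates ** 2).sum(axis=0)   # running sum of the squared terms
        num = N * (N * W - sum_u) ** 2
        den = N * sum_u2 - sum_u ** 2 + eps   # eps guards against a zero variance
        return np.log(num / den + eps)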
IV. THE TOLERANCE COEFFICIENT TLR
For the theoretical analysis and the computation of the tolerance coefficient we proceed as follows. Consider the case of a modification \Delta x of the input vector x; the output can then be approximated as:
y_i(x + \Delta x) \approx y_i(x) + \sum_k \frac{\partial y_i}{\partial x_k} \Delta x_k = y_i(x) + \sum_j W^{(2)}_{ij} f'(a_j) \sum_k W^{(1)}_{jk} \Delta x_k    (27)
By minimizing the ratio \Delta y_i / y_i, we increase the network robustness with respect to noisy input values. y_i can be written as follows:
y_i = \sum_j W^{(2)}_{ij} f(a_j)    (28)
From (27) and (28) we calculate Tlr using the following equation:

Tlr = \frac{\Delta y_i}{y_i} = \sum_j \frac{f'(a_j)}{f(a_j)} \sum_k W^{(1)}_{jk} \Delta x_k    (29)
It is possible, as in (27), to approximate the output y_i using the hidden layer activity a, by:

y_i(a + \Delta a) \approx y_i(a) + \sum_k \frac{\partial y_i}{\partial a_k} \Delta a_k    (30)
In the same way, the hidden layer activity a_i may be approximated by:

a_i(x + \Delta x) \approx a_i(x) + \sum_k \frac{\partial a_i}{\partial x_k} \Delta x_k    (31)
From these equations, Tlr may be considered as a reference tolerance parameter. It can also be defined for each hidden layer neuron.
For a small disturbance we have |\Delta x_k| \ll |x_k|, and then:
Tlr < \sum_j \frac{f'(a_j)}{f(a_j)} \sum_k W^{(1)}_{kj} x_k = \sum_j Tlr^0(a_j), \qquad Tlr^0(a_j) = \frac{f'(a_j)}{f(a_j)} \, a_j    (32)
The (bipolar sigmoid) activation function is equal to:

f(a_j) = \frac{1 - e^{-a_j}}{1 + e^{-a_j}}    (33)
Its derivative is given by:

f'(a_j) = \frac{1 - f(a_j)^2}{2}    (34)

By substituting (33) and (34) into (32), we obtain the tolerance coefficient Tlr:

Tlr < \sum_j Tlr^0(a_j) = \sum_j \frac{2 a_j}{e^{a_j} - e^{-a_j}}    (35)

Tlr^0(a_j) generally has a maximum at a_j = 0 and two minima at a_j = \pm\infty. The Tlr^0(a_j) and f'(a_j) functions have similar functional forms, according to the literature [22].
For larger values of a_j, the value of Tlr^0(a_j) decreases exponentially. Consequently, this implies better robustness and generalization performances.
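This behaviour can be checked numerically with the reconstructed expression (35); the short sketch below (illustrative, over an arbitrary grid of activation values) prints the maximum at a_j = 0 and the exponential decay for large |a_j|:

    import numpy as np

    a = np.linspace(-6.0, 6.0, 13)        # grid of pre-synaptic activations a_j
    tlr0 = np.ones_like(a)                # Tlr0(0) = 1 is the limit value at a_j = 0
    nz = a != 0
    tlr0[nz] = 2 * a[nz] / (np.exp(a[nz]) - np.exp(-a[nz]))   # Tlr0(a_j) of eq. (35)
    for aj, t in zip(a, tlr0):
        print(f"a_j = {aj:+.1f}   Tlr0(a_j) = {t:.4f}")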
As far as an L-layer feed-forward neural network is concerned, the same result for the Tlr^0(a_j) computation can be obtained. According to the above results, the remaining Tlr^0(a_j) values of all hidden neuron layers may be deduced by analogy.
The proposed fault tolerant neural network (ROBBPT) can be summarized as follows:
Step 1. (Initialization)
  - Initialize the weights of the network.
  - Fix the training parameters.
  - Fix the tolerance parameters (\alpha, \lambda_l, \beta, Ta and Nca).
For each cycle do
Step 2. (Weights saturation)
  - Calculate the output error E^s_0 and the additional error term E^{sl}_a, and their corresponding gradient norms.
Step 3. (Weights penalization)
  - If the condition T(W^l_{ij}) > \alpha M_T is not satisfied, go to Step 4, otherwise continue.
  - If the iteration number equals the periodic learning cycle number Nca, or the learning cycle number Ta has been reached, go to Step 4; otherwise penalize all the weights W_{ij}.
Step 4. (Training the neural network)
  - Update all the weights and biases.
  - If the training stopping criterion is satisfied, STOP; otherwise go to Step 2.
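For readability, the four steps can also be organized as a conventional training loop. The skeleton below is only a structural sketch: the callables it receives (gradient computation, test statistic, penalization, weight update and stopping test) are hypothetical placeholders injected by the caller, and the penalization schedule follows the periodic readjustment described in Section 3 (every Nca cycles, at most Ta times).

    def train_robbpt(net, data, compute_grads, autoprune_T, penalize, update, stop,
                     alpha=0.7, beta=0.2, Ta=3, Nca=20, max_cycles=300):
        """Structural sketch of the ROBBPT training loop (Steps 1 to 4)."""
        rounds = 0                                      # number of penalization phases done
        for cycle in range(1, max_cycles + 1):
            grads, stats = compute_grads(net, data)     # Step 2: E0, Ea and their gradients
            if rounds < Ta and cycle % Nca == 0:        # Step 3: periodic penalization
                for layer in net.hidden_layers:
                    T = autoprune_T(layer.W, stats[layer])       # eq. (26)
                    layer.W = penalize(layer.W, T, alpha, beta)  # eqs. (12)-(15)
                rounds += 1
            update(net, grads)                          # Step 4: update weights and biases
            if stop(net, data):
                break
        return net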
V. EXPERIMENTAL RESULTS
In this study, all the algorithms are tested on real recordings from the MIT-BIH ECG database. The MIT-BIH arrhythmia database is made up of 48 ECG recordings (half an hour each), sampled at a frequency of 360 Hz and containing 116,137 normal and pathological beats (ventricular or atrial extrasystoles, left or right bundle branch block). This database includes 25 men aged between 25 and 89 years and 22 women aged between 22 and 89 years. Leads 1 and 2 are selected for all subsequent analyses, because they allow identifying the heart beats.
This work is a continuation of [23]. We distinguish three different stages: the data conditioning stage (sampling and filtering), the characteristics extraction stage (transformation and parameter setting) and the neural classification stage (data training and test set validation).
For the multi-layer perceptron (MLP), Matlab offers a variety of training algorithms based on the backpropagation principle, which vary in computational cost and convergence speed. Some of these algorithms are suitable for a classification task whereas others are more powerful for function approximation. The most powerful backpropagation classification algorithm in the case of an altered network is traingdx (gradient descent with momentum and adaptive learning rate backpropagation). Trainlm (Levenberg-Marquardt backpropagation) converges very quickly, and this apparently poses a problem of coherence in the results [23].
In order to reduce the ANN complexity, the size of the training example vectors was reduced by Singular Value Decomposition (SVD). Setting the preserved energy to 99.9% of the total energy (i.e. keeping the most significant components), the example vectors were reduced from 3600 initial elements to 17 elements [23].
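This reduction step can be sketched as follows (an illustrative NumPy version; the data matrix is synthetic and the helper name is ours, only the 99.9% energy threshold comes from the text above):

    import numpy as np

    def svd_reduce(X, energy=0.999):
        """Project the example vectors on the leading right singular vectors
        preserving the requested fraction of the total energy."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        preserved = np.cumsum(s ** 2) / np.sum(s ** 2)   # energy preserved per rank
        k = int(np.searchsorted(preserved, energy)) + 1  # smallest rank reaching the threshold
        return X @ Vt[:k].T, k                           # reduced examples and their new size

    X = np.random.randn(578, 3600)        # 578 learning examples of 3600 samples (synthetic)
    X_reduced, k = svd_reduce(X)
    print(X_reduced.shape, k)

On the real ECG training matrix this procedure yields the 17-element example vectors mentioned above; on the synthetic data of the sketch the retained rank is of course different.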
The proposed neural network classifier addresses a 3-class problem (pathological ECG signal, normal ECG signal and a rejection class). The input vector size is 17. There are 578 examples in the learning set (APR) and 265 in the generalization set (G); the test vectors are not taken into account during the training process. We use a fully connected three-layer feed-forward neural network. We present only the generalization results, which are the ones that matter in practice.
Notice that all the figures show an average of 10 runs. In what follows, we use the following notations:
"SBPT": training by the standard gradient back-propagation algorithm.
"HBPT": training by the merging of back-propagation and Hebbian learning rules algorithm [14].
"PBPT": training by the gradient back-propagation + penalization algorithm [21].
"ROBBPT": training by the robust gradient back-propagation (saturation + penalization) algorithm.
To show the efficiency of the proposed algorithm (ROBBPT) in terms of generalization and robustness, we compare it to the other algorithms: SBPT, HBPT and PBPT. The analysis of the ECG signal was carried out using jointly the Matlab environment and a program developed in the C# language for this purpose. This program allows us to visualize the different signals and also provides us with the number and the position of each signal (pathological ECG signal, normal ECG signal and a rejected one).
Note: all the following results regarding the ROBBPT algorithm were obtained using the following parameters: \alpha = 0.7, \lambda_i = 0.4, \beta = 0.2, Ta = 3 and Nca = 20, where \alpha, \lambda_i, \beta, Ta and Nca represent, respectively, the pruning coefficient, the network hidden layer error factor, the penalty coefficient, the learning cycle number and the periodic learning cycle number.
In the table below, a comparison of the SBPT and ROBBPT generalization performances is presented. These results correspond to a random destruction of d hidden cells (d = 0, 1, 2, ...). For each of these destruction cases, we give two generalization rates:
- the worst rate: the lowest rate obtained when trying all the configurations of d destroyed hidden cells;
- the average rate: calculated, for a given number d of destroyed hidden cells, by taking into account all possible combinations of cell destructions.
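These two rates can be obtained exhaustively as in the following sketch (illustrative; eval_rate is a hypothetical callable returning the generalization rate obtained when a given set of hidden neurons is removed):

    from itertools import combinations

    def destruction_rates(eval_rate, n_hidden, d):
        """Worst and average generalization rates with d destroyed hidden cells."""
        rates = [eval_rate(destroyed)                       # rate for one destruction pattern
                 for destroyed in combinations(range(n_hidden), d)]
        return min(rates), sum(rates) / len(rates)          # worst rate, average rate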
TABLE I. SBPT AND ROBBPT GENERALIZATION RESULTS FOR A RANDOM DESTRUCTION OF THE HIDDEN NEURONS

Worst rate
Destroyed cells rate   SBPT      ROBBPT
 0 %                   95.47 %   97.74 %
10 %                   65.00 %   84.2 %
20 %                   50.3 %    72.2 %
30 %                   45.6 %    68.1 %
40 %                   20.05 %   58.4 %
50 %                   08.12 %   34.2 %
60 %                   04.20 %   29.9 %

Average rate
Destroyed cells rate   SBPT      ROBBPT
 0 %                   95.47 %   97.74 %
10 %                   83.58 %   95.39 %
20 %                   70.14 %   81.8 %
30 %                   58.20 %   72.85 %
40 %                   45.3 %    61.49 %
50 %                   29.10 %   52.9 %
60 %                   16.25 %   42.5 %
To compare the convergence speed of the two algorithms, we plot the mean square error versus the learning epoch for SBPT and ROBBPT in Fig. 1. We notice that the proposed algorithm converges faster than the standard one.
Figure 1. Comparison of convergence speed between SBPT and ROBBPT algorithms (mean square error versus learning epoch)
Moreover, the global recognition rate versus learning epoch of the two algorithms, SBPT and ROBBPT, presented in Fig. 2, shows that a quite noticeable learning performance improvement is obtained by the use of the ROBBPT algorithm.
Figure 2. Global recognition rate versus learning epoch, comparison between SBPT and ROBBPT algorithms
The following figure shows the results of the tests carried out on the record MIT 100 from the MIT-BIH database, using the ROBBPT and SBPT algorithms. This record contains a normal cardiac rhythm.
Figure 3. Stages of the algorithms applied to MIT 100: (a) original signal; (b) signal classified by the SBPT algorithm with zero neuron destruction rate; (c) signal classified by the ROBBPT algorithm with zero neuron destruction rate
We notice from the previous figure that, after having been pre-processed, the MIT 100 signal of Fig. 3(a) was analyzed by the SBPT and ROBBPT algorithms with a zero neuron destruction rate. The ROBBPT algorithm, Fig. 3(c), classified the MIT 100 signal as normal, whereas the SBPT algorithm, Fig. 3(b), classified a fraction of the signal as pathological, which is wrong since the MIT 100 signal is a normal cardiac rhythm.
The increase in the generalization recognition rate was expected, 97.74 % for ROBBPT against 95.47 % for SBPT (see Table I for 0 % destroyed cells), due to the introduction of the additional error (see Section 2). This error is plotted together with the quadratic error in the following figure.
Figure 4. Errors evolution (the mean square error and the additional error) versus learning epoch
Fig. 5 shows that the proposed algorithm respects the saturation condition explained in Section 2. We notice that the mean values of the hidden layer neurons tend towards the saturation zone of the activation functions during the training process, which is not the case for the SBPT algorithm.
Figure 5. Comparison of the average neuron values during the training process between SBPT and ROBBPT
As demonstrated in Section 4, a very small hidden neuron tolerance coefficient enhances the neural network robustness. Fig. 6 shows that the average tolerance coefficient of the hidden layer neurons of the ROBBPT algorithm is lower than that of the SBPT algorithm, which confirms the robustness of the proposed algorithm.
Figure 6. Average tolerance value of the hidden neurons during the training process, comparison between SBPT and ROBBPT
In the following figure we notice that the most significant weight penalizations occur at the beginning of the training process. This may be due to the initial random weight initialization, which in our case is the Nguyen-Widrow initialization.
Figure 7. Number of penalized weights versus learning epoch
The test results presented in Fig. 8 were obtained using the record MIT 203 of the MIT-BIH database, which contains a pathological cardiac rhythm (ventricular tachycardia) with a wide, anarchic QRS complex (> 0.12 s). These results show that, for a 30% hidden layer neuron destruction rate, the network trained by the ROBBPT algorithm remains satisfactory compared to the one trained by the SBPT algorithm.
It is also noticed that for a 0% destruction rate the ROBBPT algorithm, Fig. 8(d), has classified the signal as pathological while the SBPT algorithm, Fig. 8(b), has classified a part of the signal as normal. Furthermore, we observe that for a 30% destruction rate the ROBBPT algorithm, Fig. 8(e), has correctly classified almost the totality of the signal, while the SBPT algorithm classification, Fig. 8(c), is completely erroneous.
Figure 8. Results of the classification of the SBPT and ROBBPT algorithms applied to the MIT 203 signal: (a) original signal; (b) signal classified by the SBPT algorithm with zero neuron destruction rate; (c) signal classified by the SBPT algorithm with 30% neuron destruction rate; (d) signal classified by the ROBBPT algorithm with zero neuron destruction rate; (e) signal classified by the ROBBPT algorithm with 30% neuron destruction rate
In addition, the tolerance parameter can be calculated for each hidden layer neuron. In Fig. 9, the values of the tolerance coefficients for each hidden layer neuron are illustrated for networks trained by the SBPT and ROBBPT algorithms.
Figure 9. Comparison of the tolerance values of the different hidden neurons (N1 to N19), between networks trained by SBPT and ROBBPT algorithms
Fig. 9 shows that the proposed algorithm improves the tolerance of all hidden layer neurons.
The following figure further illustrates this improvement. In this figure, the generalization rate is plotted versus different hidden neuron destruction rates, for the ROBBPT, HBPT, SBPT and PBPT algorithms. It can be noticed that ROBBPT outperforms the others.
Figure 10. Generalization rate versus the hidden neuron destruction rate, for the learning algorithms SBPT, HBPT, PBPT and ROBBPT
VI. CONCLUSION
In this work we have studied the application of a new fault tolerant Artificial Neural Network, called ROBBPT, to ECG classification. The simulation results that we have presented show that the proposed algorithm brings about a significant improvement in the neural network robustness and also provides good generalization against neuron destruction. These results were obtained from a comparative analysis carried out with the HBPT, SBPT, PBPT and ROBBPT algorithms, for different hidden neuron destruction rates. Generally speaking, we have shown that ROBBPT achieves a good classification of three different classes (normal, pathological, rejected).
One has to point out that ROBBPT can be extended to a multi-hidden-layer neural network (i.e. the training is also achieved through random destruction of the hidden layer neurons). Furthermore, this algorithm can also be applied in the case of input cell destruction.
A marked weakness of research in the area of fault-tolerant neural networks is that few results have really been applied to real-world applications, although intuitively they could at least benefit the design of robust VLSI neural circuits. Trying to apply the present results to real-world problems is therefore a very important issue for future work.
REFERENCES
[1] I. Splawski, J. Shen, K.W. Timothy, G.M. Vincent, M.H. Lehmann, M.T. Keating, Genomic structure of three long QT syndrome genes: KVLQT1, HERG, and KCNE1, Genomics, No. 50, Pp. 86-97, 1998, Available: http://dx.doi.org/10.1006/geno.1998.5361.
[2] CJ. James, CW. Hesse, Independent component analysis for
biomedical signals, Physiol Meas, No. 26, Pp.1539, 2005,
Available: http://dx.doi.org/10.1088/0967-3334/26/1/R02.
[3] C. Chui, K. Mehrotra, K.M. Chilukuri, R. Sanjay, Modifying
Training Algorithms for Improved Fault Tolerance, IEEE
International Conference on Neural Networks, Florida, Pp. 333-338,
1994, Available: http://dx.doi.org/10.1109/ICNN.1994.374185.
[4] A. Hyvarinen, E. Oja, Independent component analysis: algorithms
and applications, Neural Network, No. 13, Pp. 411-430, 2000,
Available: http://dx.doi.org/10.1016/S0893-6080(00)00026-5.
[5] T.Y. Kwok, D.Y. Yeung, Constructive Algorithms for Structure
Learning in Feedforward Neural Networks for Regression Problems,
IEEE Trans. Neural Networks, Vol. 8, No. 3, Pp. 630-645, May 1997,
Available: http://dx.doi.org/10.1109/72.572102.
[6] F.L. Luo, Applied Neural Networks for Signal Processing,
Cambridge Univ. Press, Cambridge, Mass., 1999.
[7] C. Campbell, Constructive learning techniques for designing neural
network systems, In CT Leondes, editor, Neural Network Systems
Technologies and Applications. Academic Press, 1997.
[8] M.L. Nasir, R.I. John, S.C. Bennett, Selecting the neural network
topology for student modelling of prediction of corporate bankruptcy,
Campus-Wide Information Systems, Vol. 18, No. 1, Pp. 13 22,
2001, Available: http://dx.doi.org/10.1108/10650740110364390.
[9] F. Blayo, Réseaux de neurones artificiels : du laboratoire au marché industriel, SAMOS (Statistiques Appliquées et Modélisation Stochastique), Université Paris 1, Panthéon-Sorbonne, 1998.
[10] S. John, C.L. Andrew , Prediction error of a fault tolerant neural
network, Neurocomputing, Vol. 72, No.3, Pp. 653-658, December
2008, Available: http://dx.doi.org/10.1016/j.neucom.2008.05.009.
[11] C.S. Lin, I.C. Wu, Maximizing Fault Tolerance in Multilayer Neural
Networks, IEEE International Conference on Neural Networks,
Florida, Pp. 419-424 , 1994, Available:
http://dx.doi.org/10.1109/ICNN.1994.374199.
[12] T. Kurita, H. Asoh, S. Umeyama, A. Hosomi, A Structural Learning by Adding Independent Noises to Hidden Units, IEEE International Conference on Neural Networks, Florida, Pp. 275-278, 1994, Available: http://dx.doi.org/10.1109/ICNN.1994.374174.
[13] A.S. Weigend, D.E. Rumelhart, B.A. Huberman, Generalization by Weight-Elimination applied to Currency Exchange Rate Prediction, IEEE International Joint Conference on Neural Networks, Vol. 1, Pp. 837-841, 1991, Available: http://dx.doi.org/10.1109/IJCNN.1991.155287.
[14] D.G. Jeong, S.Y. Lee, Merging back-propagation and Hebbian
learning rules for robust classifications, Neural Networks, Vol. 9, Pp.
1213-1222, 1996, Available: http://dx.doi.org/10.1016/0893-
6080(96)00042-1.
[15] W. Finnoff, F. Hergert, Improving model selection by nonconvergent methods, Neural Networks, Vol. 6, Pp. 771-783, 1993, Available: http://dx.doi.org/10.1016/S0893-6080(05)80122-4.
[16] A. Krogh, J.A. Hertz, A simple weight decay can improve generalization, Advances in Neural Information Processing Systems, San Mateo, CA, Morgan Kaufmann, Vol. 4, Pp. 950-957, 1992.
[17] Y. LeCun, J.S. Denker, S.A. Solla, Optimal brain damage, Advances in Neural Information Processing Systems, Morgan Kaufmann, Vol. 2, Pp. 598-605, 1990.
[18] B.E. Segee, M.J. Carter, Fault tolerance of pruned multilayer
networks, Digest IJCNN, Vol. 2, Pp. 447 452, 1991, Available:
http://dx.doi.org/10.1109/IJCNN.1991.155374.
[19] L. Prechelt, Connection pruning with static and adaptive pruning schedules, Fakultät für Informatik, Universität Karlsruhe, Germany, 8 Nov. 1995, Available: http://dx.doi.org/10.1016/S0925-2312(96)00054-9.
[20] N.C. Hammadi, I. Hideo, A Learning Algorithm for Fault Tolerant Feedforward Neural Networks, Chiba University, Chiba-shi, Japan, Pp. 263, 1996.
[21] M. Merah, B. Nacredine, Algorithme de rétro-propagation du gradient avec pénalisation des poids (R.P.G.P.), CNIE, USTO, 15-16 December 2002.
[22] S.Y. Jeong, S.Y. Lee, Adaptive learning algorithms to incorporate
additional functional constraints into neural networks,
Neurocomputing, Vol. 35, Pp. 73-90, 2000, Available:
http://dx.doi.org/10.1016/S0925-2312(00)00296-4.
[23] M. Merah, A. Ouamri, Analyse et traitements de l'ECG pour la conception d'une base d'apprentissage d'un R.N.A., The 3rd International Summer School on Signal Processing and its Applications, Jijel, Algeria, Pp. 08-12, July 2006.