Przemyslaw M. Szecowka
Faculty of Microsystem Electronics and Photonics
Wroclaw University of Technology
Wroclaw, Poland
przemyslaw.szecowka@pwr.wroc.pl
I. INTRODUCTION

The sensitivity of a network output F_i with respect to an input x_j, evaluated at the k-th training point, is defined as

s_ij^k = ∂F_i(x_1, ..., x_n) / ∂x_j    (1)
Generalization over the K training points can be done with one of three norms:

S_ij,max = max_{k=1,...,K} |s_ij^k|    (4)

S_ij,euclid = sqrt( (1/K) Σ_{k=1..K} (s_ij^k)^2 )    (5)

S_ij,abs = (1/K) Σ_{k=1..K} |s_ij^k|    (6)

The significance of an input j is then

Φ_j,avg = max_i { S_ij,avg }    (7)

where the maximum is taken over the network outputs and S_ij,avg stands for any of the norms (4)-(6).
…(the difference between these two definitions of norms has a multiplicative character and does not affect the generality of the considerations) and is normalized. Φ_j,avg represents the importance of an input j for the evaluation of the outputs. Further, in [8], this significance measure is used to prune a neural network by removing inputs.
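As a concrete reading of the three norms and of (7), the computation can be sketched in plain Python (the function and variable names are illustrative, not taken from [8]; the per-pattern derivatives are assumed to be already available):

```python
import math

def sensitivity_norms(derivs):
    """Generalize per-pattern derivatives s_ij^k (k = 1..K) into the
    three norms: maximum (4), Euclidean/RMS (5), average absolute (6)."""
    K = len(derivs)
    return {
        "max": max(abs(v) for v in derivs),
        "euclidean": math.sqrt(sum(v * v for v in derivs) / K),
        "absolute": sum(abs(v) for v in derivs) / K,
    }

def significance(avg_norms_per_output):
    """Eq. (7): significance of one input = maximum, over the network
    outputs, of the averaged norm computed for that input."""
    return max(avg_norms_per_output)
```

For example, `sensitivity_norms([0.2, -0.5, 0.1, 0.4])` gives 0.5 for the max norm and 0.3 for the average absolute norm; for any sample the three norms satisfy max >= euclidean >= absolute.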
ANALYSIS METHOD
The authors show in [6] and [7] that the sensitivity analysis method for neural networks can be ineffective when used to evaluate an input's importance, and thus ineffective for neural network pruning. Specifically, this can happen when inputs are dependent.
More formally, the authors claim that neural network pruning heuristics based on the significance measure Φ_j,avg defined in (7) can yield highly non-optimal results (i.e., networks with important inputs removed) in the cases with input dependencies. Moreover, the explanation of this phenomenon and more general conclusions are presented here, with focus on the unwanted impact of input dependency on the results of the sensitivity method.

∂F_i(x_1, ..., x_n) / ∂x_j = s_ij^(L) = F_i'(x^(L)) Σ_{m=1..k(L-1)} w_im^(L) s_mj^(L-1)    (2)

s_ij^(1) = F_i'(x^(1)) w_ij^(1)    (3)

where k(l) denotes the number of neurons in layer l.
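For a concrete check, the layer-by-layer recursion in the spirit of (2) and (3) can be sketched for a small sigmoid network in plain Python (the weight values below are illustrative) and verified against central finite differences:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, x):
    """weights: list of layer weight matrices W[l][i][j]; sigmoid units.
    Returns the activations of every layer (layer 0 = the inputs)."""
    acts = [list(x)]
    for W in weights:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * p for w, p in zip(row, prev))) for row in W])
    return acts

def sensitivities(weights, x):
    """Chain rule as in (2)-(3): S[i][j] = d output_i / d input_j.
    For the sigmoid, f'(net) = a * (1 - a) where a is the activation."""
    acts = forward(weights, x)
    # first layer (eq. 3): s_ij = f'(net_i) * w_ij
    S = [[acts[1][i] * (1 - acts[1][i]) * w for w in row]
         for i, row in enumerate(weights[0])]
    # deeper layers (eq. 2): s_ij = f'(net_i) * sum_m w_im * s_mj
    for l in range(1, len(weights)):
        a = acts[l + 1]
        S = [[a[i] * (1 - a[i]) * sum(weights[l][i][m] * S[m][j]
                                      for m in range(len(S)))
              for j in range(len(x))]
             for i in range(len(weights[l]))]
    return S
```

The recursion reproduces the numerically estimated partial derivatives of the network output with respect to each input.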
Formula (2) is valid for L > 1, where L is the number of layers in the neural network. After the partial derivatives are calculated for certain points of the input variables space, a generalization has to be undertaken to find the actual sensitivity of an output to an input. Three types of generalization are presented here with a slight modification.

Assume that the output is related to the inputs by

y = f(x_1, ..., x_n)    (8)

Assume also that one input can be expressed as a value of a function g of the remaining inputs:
x_i = g(x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)    (9)

In such a case, there is more than one function h, defined over the whole input space (not limited by (9)), such that h(x_1, ..., x_n) = f(x_1, ..., x_n) for all inputs satisfying (9). For example, for f(x_1, x_2) = (x_1 + x_2)/2 and the dependency x_2 = x_1 (so that y = f(x_1, x_1) = x_1), the function h can be any function of the form h(x_1, x_2) = a x_1 + b x_2, where a + b = 1. Figure 1 shows three of these functions.

Fig. 1. Three of the functions h(x_1, x_2) = a x_1 + b x_2, a + b = 1
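This ambiguity is easy to verify numerically; the three coefficient choices below are illustrative instances of h(x_1, x_2) = a x_1 + b x_2 with a + b = 1:

```python
# Three functions h(x1, x2) = a*x1 + b*x2 with a + b = 1 (illustrative choices).
h1 = lambda x1, x2: 1.0 * x1 + 0.0 * x2
h2 = lambda x1, x2: 0.5 * x1 + 0.5 * x2
h3 = lambda x1, x2: 0.0 * x1 + 1.0 * x2

# On the dependency manifold x2 = x1 all three coincide ...
for t in [0.0, 0.25, 0.5, 1.0]:
    assert h1(t, t) == h2(t, t) == h3(t, t) == t

# ... yet their partial derivatives with respect to x1 (the coefficient a)
# are 1.0, 0.5 and 0.0 respectively, so sensitivity analysis would rank
# the same input completely differently for each of them.
```

Off the line x2 = x1 the three functions disagree, which is exactly the freedom a trained network has in the empty regions of the input space.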
The consequences of the described fact for neural network training are significant.
Patterns produced by (8) and (9) do not densely cover all the regions of the input space. When such patterns are used to train a neural network, the values of the function implemented by the network are highly unpredictable for inputs belonging to the empty regions. It means that the neural network can implement any function, as long as this function has proper (or close) values for the inputs covered by patterns.
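The sparse coverage is easy to quantify; the sketch below uses the x_2 = sin(2x_1) dependency from the later examples and counts occupied cells of a 10 x 10 grid over [0, 1]^2 (the resolution is an arbitrary choice):

```python
import math, random

random.seed(0)

# Patterns generated under the dependency x2 = sin(2*x1) occupy a curve,
# not an area: count the grid cells that contain at least one of 1000 patterns.
cells = set()
for _ in range(1000):
    x1 = random.random()
    x2 = math.sin(2 * x1)          # dependent input, stays in [0, 1] here
    cells.add((min(int(x1 * 10), 9), min(int(x2 * 10), 9)))

coverage = len(cells) / 100.0      # fraction of the 100 grid cells touched
# 1000 independent uniform points would touch essentially all 100 cells;
# the curve touches only the cells it passes through.
```

The curve crosses only on the order of twenty cells, so most of the input space contains no training information at all.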
For example, in the cases of two inputs analyzed in this article, the patterns create a three-dimensional curve, as seen in Figs. 3-6. Every surface containing this curve represents a potential neural network function for these patterns.
Obviously, different functions yield different slopes (different partial derivatives), potentially over the whole domain. This fact is very important, considering that these partial derivatives are the only basis for the sensitivity analysis method to evaluate an input's significance.
The outcome of these considerations is that one cannot completely rely on sensitivity analysis results when evaluating the real significance of inputs in cases where dependencies between inputs are observed. These results are highly dependent on unwanted factors, such as the training method.
A. Inputs independent
In the first example, the case without dependencies between inputs is analyzed. Fig. 2 shows the graph of the neural network function and the patterns for this case. Both x_1 and x_2 values have been generated randomly in the interval [0, 1]. Since the patterns densely cover the whole input variables space, the function implemented by the neural network is very similar to the original f(x_1, x_2) in this space.
Table I shows the results of the sensitivity analysis method for the three different methods of generalization. For every norm, the sensitivities for both inputs are almost the same. This can be explained by the fact that the function (10) is symmetrical with respect to the plane described by the equation x_1 = x_2. Thus, in this case without dependency between inputs, the results of the sensitivity analysis are correct.
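This explanation can be checked numerically. Since (10) itself is given elsewhere, the sketch below uses an illustrative symmetric function in its place and compares the three norms, estimated by central differences over random patterns, for both inputs:

```python
import math, random

random.seed(2)

def f(x1, x2):
    # Illustrative symmetric stand-in for (10): f(x1, x2) = f(x2, x1).
    return math.sin(x1 + x2) + x1 * x2

def d(fun, j, x, h=1e-6):
    # Central finite-difference estimate of the partial derivative.
    xp = list(x); xp[j] += h
    xm = list(x); xm[j] -= h
    return (fun(*xp) - fun(*xm)) / (2 * h)

pts = [(random.random(), random.random()) for _ in range(5000)]
norms = []
for j in (0, 1):
    derivs = [d(f, j, p) for p in pts]
    K = len(derivs)
    norms.append({
        "max": max(abs(v) for v in derivs),
        "euclidean": math.sqrt(sum(v * v for v in derivs) / K),
        "absolute": sum(abs(v) for v in derivs) / K,
    })
```

For a symmetric function sampled over a symmetric pattern set, each of the three norms comes out (up to sampling noise) equal for the two inputs, matching the behavior reported in Table I.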
Fig. 2. Patterns (black points) and neural network function (grid) for the
case of independent inputs
TABLE I
SENSITIVITIES FOR THE CASE OF INDEPENDENT INPUTS

method    | input 1  | input 2
max       | 1.32478  | 1.310584
euclidean | 0.728889 | 0.710269
absolute  | 0.628355 | 0.616506

TABLE II
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = g(x_1))

method    | input 1  | input 2
max       | 2.176415 | 4.670336
euclidean | 0.981789 | 2.196103
absolute  | 0.854413 | 1.993096
x_2 = g(x_1) = sin(2x_1)

Fig. 4. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = sin(2x_1))

Note
that, even though both pattern sets were created using the same formula (10), the neural network function from Fig. 3 appears quite different than the one from Fig. 2. It is explained here as a consequence of the special patterns configuration, which, as stated earlier, is caused by the inputs dependency. In addition, the function which is implemented by the neural network in this case, as the solution of the approximation problem, is much less complicated than the original function (10) (which obviously is still a proper approximation in this case) and is probably easier to learn using the backpropagation algorithm.
Fig. 3. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = g(x_1))
…second input removed. Mean errors over the testing set are shown in Table V (the average over three trials was taken).
TABLE V
MEAN ERRORS OVER THE TESTING SET
0.0086

TABLE III
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = sin(2x_1))

method    | input 1  | input 2
max       | 0.611417 | 1.641876
euclidean | 0.198803 | 0.88746
absolute  | 0.116956 | 0.801896
E. 7-input system

[6] and [7] show the results of applying sensitivity analysis to systems with 4 and 6 (dependent) inputs. Here an additional experiment, for a 7-input system with input dependencies, is shown. The data was generated using the following equations:

y = f(x_1, ..., x_7)

x_5 = g(x_4)    (13)

g(x_1, x_5) = x_1 x_4 + random([0, 2))    (14)
It was also normalized to improve neural network training and sensitivity analysis results. Table VI presents the sensitivities of the output to the inputs, as well as the average errors (over three trials) of the networks with one input removed.
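The one-input-removed test behind Table VI can be sketched in a model-agnostic way; here an illustrative analytic function stands in for the trained network, and "removing" an input means clamping it to its mean value:

```python
import random

random.seed(1)

def model(x):
    # Illustrative stand-in for a trained network; three inputs here.
    return 0.8 * x[0] + 0.1 * x[1] * x[1] + 0.5 * x[2]

# Test patterns: uniform inputs on [0, 1], so every input has mean 0.5.
patterns = [[random.random() for _ in range(3)] for _ in range(2000)]

def removal_error(i):
    """Mean absolute error introduced by clamping input i to its mean."""
    total = 0.0
    for x in patterns:
        clamped = list(x)
        clamped[i] = 0.5
        total += abs(model(x) - model(clamped))
    return total / len(patterns)

errors = [removal_error(i) for i in range(3)]
# The input with the smallest contribution (here input index 1, with the
# small quadratic term) is the cheapest to remove.
```

Ranking inputs by this removal error gives a direct measure of real importance, which can then be compared against the sensitivity ranking, as done in Table VI.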
Fig. 5. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = sin(5x_1))
TABLE VI
SENSITIVITIES AND AVERAGE ERRORS

input number | 1      | 2      | 3      | 4      | 5      | 6      | 7
sensitivity  | 0.128  | 0.237  | 0.166  | 0.198  | 0.197  | 0.125  | 0.14
mean error   | 0.0128 | 0.0374 | 0.0298 | 0.0284 | 0.0177 | 0.0786 | 0.0337
TABLE IV
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = sin(4.7x_1) + x_1(x_1 + 1))

method    | input 1 | input 2
max       | 1.36    | 2.408
euclidean | 0.582   | 0.995
absolute  | 0.488   | 0.87
…on Table VI). It can be easily seen that the sensitivities for the inputs differ significantly from their real importance. More detailed analysis shows that the input with the smallest sensitivity, and thus the first candidate to be removed, is input 1. However, removing this input yields a relatively big drop in network performance. The input with the smallest
Fig. 6. (bar chart over input number)
IV. IS THE METHOD WORTH USING?
Considerations in this article show that sensitivity analysis results can be invalid in certain cases. The authors claim, however, that this does not make the method completely useless.

The example from the previous section, where the x_2 = sin(2x_1) dependency was used, shows relatively small values of sensitivity for the first input. On the other hand, it is known that removing the first input from the network would not necessarily be the best choice, since the second input is characterized by the minimal possible significance. However, even if not optimal, it can still be considered a reasonable choice. Low sensitivity for the first input means that the value of the output does not change significantly with changes in the first input for this particular neural network function. That in turn means that some constant value of this variable can be used every time to calculate the output value without significant error. This implies that the sensitivity analysis method can be useful (but may not be sufficient) in the process of evaluating the least significant inputs. This problem, however, needs more precise consideration.

V. CONCLUSIONS

REFERENCES

…, "… Networks", International Joint Conference on Neural Networks, Volume 3, pp. 1829-1833, 1999.
[5] J. J. Montano, A. Palmer, "Numeric sensitivity analysis applied to feedforward neural networks", Neural Computing & Applications 12, pp. 27-32, 2005.
[8] J. M. Zurada, A. Malinowski, S. Usui, "Perturbation method for deleting redundant inputs of perceptron networks", Neurocomputing 14, pp. 177-193, 1997.
[9] J. M. Zurada, A. Malinowski, I. Cloete, "Sensitivity analysis for pruning of training data in feedforward neural networks", Proc. of First Australian and New Zealand Conference on Intelligent Information Systems, Perth, Western Australia, December 1-3, pp. 288-292, 1993.
[10] J. M. Zurada, A. Malinowski, I. Cloete, "Sensitivity analysis for minimization of input data dimension for feedforward neural network", Proc. of IEEE International Symposium on Circuits and Systems, London, May 28-June 2, pp. 447-450, 1994.