Professional Documents
Culture Documents
01993 JEEE
0-7803-0614-7/93/$03.00 944
probable one. This is done by combining information Then, the functions h!, can be logically considered
given by heuristics. Based on some measurements or ob- as density of probability functions. Therefore, the
servations, heuristics evaluates the chances 1.0 be on the following relationship could be assumed
right path.
The following assumptions are made for modeling uncer- P,, = Pl,(Yt)
tainty
which links the conditional probability of a measure-
0 T h e characteristics which define uniquely a solution
(cf. Section 1) are represented by random variables
+
ment being between pl, and p, 6 given a particular
partial solution y,. If 6 is sudciently small, the fol-
or random vectors. Let X, be the random variable lowing approximation can be done
or vector receiving a value at level i. The X,’s are
discrete random variables or vectors Liking values +
P ( p t , 5 p 5 PI] ‘IK = yl)
in a finite set of size S, where S, 5 n;=,
nk. The
Characteristics defining a candidate partial solution
0
hIJ(PIJ(YI))6
.=[;:I
up to level i will be represented by the random vector Also, the a priori probabilities P ( x , = Z s k l X - 1 =
y l - l ) of level I are assumed t o he independent of the
heuristics and measurements at the previous levels
(< 0.
Given a measurement p, a typical heuristic rating func-
tion, h(p) may look like in Fig.1, where
Therefore, a complete solution will be a sample value
of the randorii vector Y N .
0 Suppose that the n, values {z,k(y,-l)’+ are all the
possible values the random variable f:,
can take,
t
given that the previous levels random vector K-1
takes the value y,-1. Then, i t is assumed that for
each level i, a heuristic gives an estimate of the fol-
lowing U priori probabilities
P ( X , = Ztklyt-1 = Y l - I ) = f:(Z,k,Yt-1)
2,k = Z,k(Y,-l)
with k = 1 , .. . , n,. Obviously,
1, tc-
hSJ(p)dp =
tions give a measurement < pl and > p4. If a is close t o
1, the heuristic is less assertive in its rejection of a can-
didate. Therefore, a should he somewhat proportional t o
the reliability of the heuristic.
945
2.2 Rating function Clearly, all the terms in the numerator are known
or can be computed at level i. However, the values
Since the expert system, in its search of a satisfying so- {Zii(W(i-l)k # yi-1)) are not necessarily known, since
lution, cannot materially estimate all the possible solu- only the ath defined by K-1 = yi-1 may have been ex-
tions, a heuristic search strategy is followed. In the sense plored. $herefore, most probably, the denominator can-
that, at any given step, the most promising path is cho- not be computed.
sen. To do so, a global rating function must be defined. It is logical to use these a posteriori probabilities (mea-
This global rating function must be able to compare can- suring the plausibility of a candidate solution or partial
didate partial solutions a t different level as well as a t the solution), for the design of a rating function. Therefore,
same level. Furthermore, this ratin function must obey the rating functions wdl take into account the conditional
to common sense by giving alwaystigher ratings to the probabilities { P ( X = yil{Pij = pij(yi)}, .. . ,{ P l j =
most probable paths. As a startin point, let's compute
the conditional probabilities for eaca candidates given the p l j (Yl)})}.
values returned by the heuristics
Level 1: Using Bayes' Theorem (cf. Papoulis [2], c h a p 2.2.1 Level rating function
ter 7.) and the fact that the heuristics are independent, Since the denominators of the conditional probabilities are
the probability that the characteristics of the solution de- identical for all the candidate solutions at the same level,
scribed by the random variable X1 takes the values Zlk i t is natural to choose the followin level rating function to
given the measurements {plJ(Zlk)}l j = 1,. . . ,m l compare candidate solutions at t%e same, level. Suppose
that y; is a candidate solution up to level I, (i= 1,. . . ,N)
P(X1 = Z l k ) { P l J = PlJ(21k))) =
n?=l1 mi ml
xr=l1
P(Plj(zlk)<P<Plj(flk )t61X1=flk) P(Xl=Zlk)
lima-o 1J ,,.l Ri(yi) = n h i j ( P i j ( y i ) ) . .. n h l j ( P l j ( Y 1 ) ) (fi(Yi). .. f i ( ~ 1 ) )
P ( p l i ( z l r ) ~ P ~ P l j ( Z l l ) + ~ I x ~P =
( X~l l=~z )l l ) j=l j=l
- nJmll ~ ~ J ( =I k ))
h l(P f l ( =1k )
"1
1lJz1h l J ( p l J ( z l r ) )
"1
-fl(zlr)
k-1 j=l
946
estimate of
M
Ethk,)(Y,) = maxhk,(P)lY.=:y,
EtfkHYI) = maxfk(Y)lY,=!,, is the global rating of yi. Generally, for numerical reasons
the logarithmic form will be used instead
This is a very cautious approach. Indeed, the can- N m r
didate partial solutions a t the lower levels are ad-
vantaged with respect to the candidates ,tt the higer
levels. Most probably, the expert system in its search
of the solution will backtrack more often to explore
paths at lower levels which have not been selected
earlier.
2. Expected value approach: The expected value is k=l ,=1
used for the heuristics and the a priori probabilities
to be estimated under the assumption that the can- This rating could he normalized and different weights
didate is the solution. Given the assumptiion that the could be attributed to the different levels to emphasize
heuristic rating functions h k j are density of proba- more critical levels
bility functions for the measurements and that the
rating functions fk are probabilities, the expected
values can be computed (when possible) the follow-
ing manner
E{hk3}(yt) = 1: h:,(P)lY,=Y, dP
nk
E{fk}(Ya) = fk2(rlk)lY,=y,
t=1
3 Fuzzy reasoning
The heuristics rate the candidate solutions based on some
measurements or observations. Very often, the range of
However, most often, E{fk}(yl) cannot bfecomputed the measured variable is more important than its exact
this way. Therefore, the only remaining option is value (see for example, the heuristic in Fig.1). The mea-
to use the mean values computed from several test surement can be often classified in simple cate ories de-
runs. This could be done also, when needed, to com- scribing how well it fits the hypothesis that t i e candi-
date is the solution t o the problem. For example, Good,
pute E{hk3}(y1). Suppose that yk,, k := 1,..., N , Average, and Bad could be the categories or linguistic
I = l , . .. , M , are the partial solutions found in M
different runs and that yI, is compatible with or equal terms describing how the observation fits the hypothe-
to y, V I , then sis. Furthermore, the transitions between these categories
(i.e., Good and Average or Average and Bad) are of-
M ten blurred. Therefore, there must be also a smooth tran-
1 sition in the rating of the different candidate solutions.
E{hk~}(yl) = ;i? x h k 3 ( P k ~ ( ! / k , ) ) Logically, fuzzy logics seem to be a promising approach
1=1 for handling this type of uncertainty and fuzziness.
947
3.1 Modelization the following rule
The following assumptions are made for modeling uncer- Economy rule: Cf,!(zik) 5 1 Vi,k
tainty and fuzziness (based on the terminology used in,
1
e.g., Bellman & Zadeh [3] or Mamdani [4].
0 The values which define uniquely a solution (cf. Sec- Furthermore, if it is assumed that the linguistic
tion 1) belonq to nonfuzzy support sets of universes terms span all the possible degrees of fit, then the
of discourse like, for example, the k-th characteris- following rule applies
tic or set of characteristics of the solution . Let Xk
be the set of possible values for the characteristics Completeness rule: f,!(Zik) 2 1 v i, li
of the solution receiving a value at level k. Then, 1
the xk’s are the nonfuzzy support sets, having a fi-
nite number, s k where sk 5 flfZl
n i , of elements, of Consequently, if the three rules apply
the universes of discourse “k-th characteristic or set
of characteristics of the solution”. Based on these
definitions, the characteristics defining a candidate
partial solution up to level k will be represented by
the nonfuzzy support set Finally, by default if the a priori degree of fit is un-
known, a constant membership function will be as-
sumed, i.e.,
Therefore, a partial solution up to level i will be an
element of Y , and a complete solution will be an el-
ement of YN.
k =
0 Suppose that the n, values
6“‘
lk Yi-l)},
1 , . . . , m i , are all the elements epending on yi-1
which belongs to the nonfuzzy support set XI,where
yi-1 is an element selected in the previous levels non-
fuzzy support set, X-1. Suppose that there exist M fi(zik,Yi-l) = [ 1, 0 , 0, ]
different categories or linguistic terms (e.g., Good,
Average and Bad, M = 3>, for describing to what 0 There are mi different rating functions at level a (one
degree z i k fits the Characteristics of the solution r e p per heuristic). Let hf,(p,j) be the j-th membership
resented by X i . These linguistic terms define Euzzy function of level i associated to the linguistic term I,
subsets of the universes of discourse “i-th character- where pi, is an observation depending on the charac-
istic or set of characteristics of the solution”. Then, teristics described by Y,. Let hij(pij) be the vector
for each level i , there exists M membership functions
(one per linguistic term) of M membership functions hfj(piJ). Given a candi-
date partial solution up to level i, yi E Y,, an obser-
vation pi,(yi) is made and the j-th heuristic returns
a fuzzy subset, hi3(p;,), describing how the observa-
tion p,,(yi) fits the hypothesis that yI is the solution
up to level i.
Without loss of generality, it can be assumed that
the. membership functions, hi,, are normalized, so
which define fuzzy subsets associated to the linguis- that
tic terms I , describing to what degree the values max hf, (p) = 1
(zik(P-1)) belongs or fits a priori the i-th char- P
acteristic or set of characteristics of the solution Furthermore, it is assumed that the Economy rule
represented by Xi. Let’s define the vector f i as applies
the combination of these M membership functions,
{f,!}, in one vector. For example, if the linguistic V i , j and V p
terms are (Good, Average, Bad), the fuzzy sub- Chfj(p) 5 1
1
set [21,z2,23] = f,(zi$ expresses that z i k belongs
a priori to the Good t category with a factor 21,
to the Average fit category with a factor 2 2 , and to Given a measurement p and assuming that M = 3,
the Bad fit category with a factor 23, for the char- with linguistic terms (Good, Average, Bad), a typical
acteristics of the solution represented by Xi. heuristic membership function, h ( p ) , with an overlapping
Without loss of generality, it can be assumed that the factor of SO%, may look like in Fi ure 3.1, where The
membership functions, f,!, are normalized, so that observation p belongs to the Gooi fit category with a
factor 1 around a reference value, p * , then slowly starts
to belong to the Average category and finally to the Bad
Normality rule: max f,!(z) = 1 category outside this area.
I
948
3.2.2 Level rating function
Now, the individual heuristic ratings must be combined
for each levels. Suppose that y, is a candidate solution up
.
I Bad Average Good AverageBad t o level i , the level rating function is the combination of
the m, heuristic rating functions h,, and of the a priori
rating function f, where all these rating functions repre-
sents fuzzy subsets. A classical approach in fuzzy logics
(see! e.g., Bellman & Zadeh [3 ) consists in taking the
b
minimum of the different mem ership functions, i.e., if
the fuzzy centroid defuzzification scheme is adopted
Figure 2: Heuristic fuzzy membership function, h ( p )
Once the rating membership function have been defined, i = 1 , .. . , N . This rating function does not take into
given an observation, p, given a heuristic with heuristic account the rating obtained at the previous levels (< i).
membership functions, h'(p), where 1 spans all the possi- Only the heuristics of the current level are combined in
ble linguistic terms, we can define the cumulated rating this manner. We propose different methods t o take into
membership function, gc(r), as account the ratings from the previous levels: either a
weighted sum of the different level ratings, or a global
!IC(.> = Csl(r)hl(P) product inference centroid of all the different levels, or even the minimum
of the different level ratings. Assuming that the minimum
I inference is used
gc(r) = min(g'(r), \h'(p)) minimini inference 1. Weighted sum level combination
I
F'
R = arg maxgc(r) =
4 arg maxy( g1(+'(P))
product in erence
E,:s'- min(gJ(r),hl(p)) dr
i minimum inference
Which method should be selected depends on the applica- 3. E x p e c t e d value approach: Ofcourse, if the prob-
tion. For example, if it is critical to find the exact solution
at each level and if the level have the same importance,
ability distributions of the observations Pkj, k = : +
1,. . . ,N, are known for a partial solution equal to yl (like
the minimum combination is the most suitable method.
However, if the different levels have different importance in the Bayesian approach), then the expected values could
and if the different level characteristics Xi, i = 1,.. . , N, +
be directly computed. So, let dkj(p)lyi, k = i 1,. . . , N ,
are poorly related, it is much better to use the weighted be the corresponding density of probability functions,
sum method. Finally, if the different level characteristics then
X i , I = 1 , . . . ,N are very interdependent, then the cen-
troid combination method is better. The corresponding
rating functions in the product inference case can be eas-
ily deduced from (2), (3),
As in the Bayesian approach, t).
and t ese level rating functions
are unable to compare candidate partial solutions at dif-
ferent levels. Based on these estimates, the global rating function is
defined the following way. Suppose that yi is a candidate
solution up to level i, and minimum inferences are used
1. W e i g h t e d sum level combination
3.2.3 Global rating function
To compare two candidate partial solutions at different
levels, the rating of the candidate at the lowest level
should be extrapolated up to the highest level. In other
words, an estimate of the rating that a candidate could
possibly obtained at a higher level must be found. There-
fore, to compare candidate solutions at every possible lev-
els the rating of every candidates should be estimated up
to the highest possible level, N . Assume that yi IS a can-
didate solution up to level i, then an estimate of
+ 4
must be found. Let's define E { h k , ) yi) and E { f k ) 6 y i ) ,
k = i 1,. . . , N as these estimates. hen, they coul be
computed different ways depending on the type of hehav-
ior which is desired for the expert system.
1. Best case approach: For each of the heuristics and
a priori ratings to be estimated, the maximum value is
always assumed. For example, if the linguistic terms are
(Good, A v e r a g e , Bad)
950
References
[I] N . J. Nilsson. Principles of artificial int,elligence. Mor-
gan Kaufmann Publishers, San Mateo, CA, 1980.
[Z] A . Papoulis. Probability, Random 'Variables, and
Stochastic Processes. McGraw-hill, New-York, NY,
1984.
[3] R. E. Bellman and L. A . Zadeh. Decision making in a
fuzzy environment. Management Science, 17:141-164,
1970.
[4] E. H. Mnmdani. Aplication of fuzzy logic to ap-
proximate reasoning using linguistic synthesis. ZEEE
Trans. on computers, C-26:1182-1191, 1977.
[5] B. Kosko. Fuzzy knowledge combination. Int. J . of
Intelligent St~sterrts,I:293-320, 1986.
95 1