
Fuzzy Min-Max Classification with Neural Networks

Patrick K. Simpson
General Dynamics Electronics Division
P.O. Box 85310; Mail Zone 7202-K
San Diego, CA 92186-5310
ABSTRACT
A feedforward neural network classifier that uses min-max vector pairs to define
classes is described. This two-layer neural network utilizes a supervised learning
rule to build a set of classes. Each node in the output layer of the network
represents a class. During recall each class node produces an output value that
represents the degree to which the input pattern fits within the represented class.
This fuzzy neural network is ideally suited to applications that have very little data
available to define classes. This paper provides a brief overview of fuzzy sets and
fuzzy pattern classification, a description of fuzzy min-max classification and its
neural network implementation, and an example of the classification operation.
1. FUZZY SETS AND FUZZY PATTERNS
Fuzzy sets were introduced by Zadeh (1965) as a means of representing inexact concepts.
Linguistic constructs such as many, few, often, and sometimes mean different things
depending on the situation and the observer. A fuzzy set, A, is a subset of the universe of
discourse, X, that ranges from no membership (the empty set) to full membership. A membership
function, mA(x), is used to describe the degree to which the object, x, belongs within the set A.
The degree, or grade, of membership in A ranges from 0 to 1, where 0 represents no membership
and 1 represents full membership. As an example, assume that A is the set of all people that are
young in the world X. The degree to which a 25 year old person belongs to A is greater than that
of a person twice that age.
From another perspective, a fuzzy set is a class that admits the possibility of partial membership.
Let X = {x} be the space of all objects of interest (the universe of discourse). Then the fuzzy set A
in X is a set of ordered pairs A = {(x, mA(x))}, x ∈ X, where mA(x) ∈ [0, 1] is defined as the
degree of membership of x in A (Kandel, 1986).
1.1. Operations on Fuzzy Sets
The power of fuzzy sets is their ability to represent and manipulate imprecise data. As in
traditional set theory, there is an entire suite of operations on fuzzy sets that allows this
manipulation. The basic operations are comparison, containment, union, intersection, and
complementation. Let X = {x1, x2, ..., xn} be a standard set. Assume Y and Z are each fuzzy
subsets of X. The above operations are defined as follows:
Comparison: Y and Z are said to be equal (Y = Z) iff mY(x) = mZ(x) for all x ∈ X.
Containment: Y ⊆ Z iff mY(x) ≤ mZ(x) for all x ∈ X.
Union: The union of Y and Z, denoted as Y ∪ Z, is defined as
Y ∪ Z = {max(mY(x), mZ(x)) | x ∈ X}.
Intersection: The intersection of Y and Z, denoted as Y ∩ Z, is defined as
Y ∩ Z = {min(mY(x), mZ(x)) | x ∈ X}.
Complement: The complement of Y, denoted as Y^c, is defined as
Y^c = {1 − mY(x) | x ∈ X}.
In addition to these operations, it is also useful in many instances to know the size of a fuzzy set.
The sigma-count, the sum of all the set membership values, provides this measure. Given the
fuzzy set Y, the sigma-count of Y is defined as

Σcount(Y) = Σ_{x ∈ X} mY(x).
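These operations translate directly into code. A minimal sketch in Python, assuming fuzzy sets over a common ordered universe are represented as lists of membership grades (the function names are mine, not the paper's):

```python
# Fuzzy sets over a common ordered universe of discourse, represented as
# lists of membership grades in [0, 1].

def fuzzy_union(y, z):
    # Y U Z: pointwise maximum of the membership grades.
    return [max(a, b) for a, b in zip(y, z)]

def fuzzy_intersection(y, z):
    # Y n Z: pointwise minimum of the membership grades.
    return [min(a, b) for a, b in zip(y, z)]

def fuzzy_complement(y):
    # Y^c: 1 - mY(x) for every element of the universe.
    return [1.0 - a for a in y]

def contains(y, z):
    # Y is contained in Z iff mY(x) <= mZ(x) for all x.
    return all(a <= b for a, b in zip(y, z))

def sigma_count(y):
    # Size of a fuzzy set: the sum of all its membership grades.
    return sum(y)

Y = [0.2, 0.5, 1.0, 0.0]
Z = [0.4, 0.3, 0.8, 0.1]
print(fuzzy_union(Y, Z))         # [0.4, 0.5, 1.0, 0.1]
print(fuzzy_intersection(Y, Z))  # [0.2, 0.3, 0.8, 0.0]
print(round(sigma_count(Y), 2))  # 1.7
```

Note that the intersection of two fuzzy sets is always contained in either operand, mirroring the crisp-set identity.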
1.2. Fuzzy Sets as Fuzzy Patterns
An n-dimensional fuzzy pattern is a pattern constructed from the membership values of an n-
element ordered fuzzy set. Consider the ordered fuzzy set Y ⊆ X, where Y = (y1, y2, ..., yn). The
fuzzy pattern for Y is considered here to be a vector constructed from the membership values of
each of the n corresponding fuzzy set elements. There are several ways to represent a fuzzy
pattern. Kosko (1990) describes fuzzy patterns as points in the n-dimensional unit hypercube. For
low dimensional pattern sets (n < 4), this is an intuitively pleasing way to represent the data. For
higher dimensional data, a histograph (a histogram with the bars replaced by a point at the
maximum value of each bar) is preferred as it makes the visualization of the data much easier. The
histograph representation of a fuzzy pattern (Figure 1) represents the fuzzy set elements as points
along the abscissa of the histograph and the degree of membership for each set element along the
ordinate of the histograph.
Figure 1: Histograph Representation of a Fuzzy Set
There are several illustrative examples of fuzzy patterns. One example is frequency spectra. Let
the universe of discourse be the n-dimensional frequency space where each frequency value
ranges from 0 to 1. Each frequency bin of the frequency spectrum represents the degree of
membership in that frequency bin's range of values. The fuzzy set is a collection of frequency bin
membership values and the fuzzy pattern is the frequency spectrum.
Another example of a fuzzy pattern is a grey scale image. Let the universe of discourse be all
possible n×p-dimensional images (spatial patterns) where each pixel ranges from 0 to 1. Then
each pixel of the image is an element of the fuzzy set and each pixel value represents the
membership in the range of pixel values. The fuzzy set is the collection of pixel values and the
fuzzy pattern is the image.
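As an illustration of the image example, a grey scale image can be converted to a fuzzy pattern simply by scaling each pixel into [0, 1]; the 8-bit pixel depth and row-major flattening here are my assumptions, not details from the paper:

```python
def image_to_fuzzy_pattern(image, max_value=255):
    # Flatten an n x p grey scale image (a list of rows) into a single
    # fuzzy pattern: each pixel's membership grade is its intensity
    # scaled into [0, 1].
    return [pixel / max_value for row in image for pixel in row]

image = [[0, 51, 102],
         [153, 204, 255]]
print(image_to_fuzzy_pattern(image))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```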
2. FUZZY MIN-MAX CLASSIFICATION
Pattern classification determines the class to which an input pattern belongs. Pattern classification
requires the class boundaries for each class to be defined. Traditional pattern classification is usually
performed by collecting data, extracting features, normalizing the features, and then attempting to
find a set of class boundaries that minimizes the classification error (Fukunaga, 1986).
Alternatively, a fuzzy class is a subset of the universe of discourse. Fuzzy pattern classification
seeks to find the subset of the universe of discourse that best represents a given pattern class.
Classification requires boundaries. Decision boundaries (Figure 2(a)) in most classifiers seek to
minimize the intraclass distances, x1 and x2, and maximize the interclass distance, y. Techniques
for creating the boundaries seek to find the mean and the variance of each class (Figure 2(b)). The
mean-variance approach results in a set of n-dimensional hyperspheres (or hyper-ellipsoids), one
for each class. Fuzzy min-max classification works from a different perspective. Class boundaries
are hyperboxes (boxes pack into cubes much better than spheres and ellipsoids). The min and max
points of a hyperbox are all that are required to completely define its size and shape. Hence, fuzzy
min-max classification seeks to find the min point and max point for each class (Figure 2(c)).
2.1. Class Overlap and Decision Boundaries
Figure 2: Overview of Classification Methods. (a) Classification problem. (b) Mean-variance solution. (c) Min-max solution.
Each n-dimensional hyperbox represents a class in the n-dimensional hypercube. The hyperboxes,
like the hyperspheres, can overlap. In traditional pattern classification, a Bayes theoretic boundary
is created that minimizes the misclassification between classes. Although the boundary can be
weighted according to the number of data points in each class and the relative importance of
misclassifying a data point, the end result is a hard decision that represents a willingness to live
with a fixed amount of misclassification. An alternate perspective is taken with fuzzy min-max
classification. If two classes overlap, a data point that falls within the overlap belongs to both
classes. Although at first glance this seems ludicrous, after some thought the motivations become
clear. There are many areas in pattern classification where patterns are not strictly of one class or
another; rather, they are a mixture of two classes. The overlap between the classes represents that
mixture. As an example, assume there is a two-class system: one class for circles and the other for
squares. Which class does an octagon belong to? It has a circular shape constructed from straight
lines. Is it a coarse circle or a sloppy square? The answer is both. There is a degree to which it
belongs to both classes.
2.2. Measuring the Degree Of Classification (DOC)
Fuzzy min-max classes are represented by a min point and a max point. Using the histograph
representation described above, one fuzzy set defines the max point and another defines the min
point (Figure 3(a)). If an input pattern falls completely between the max and min, then the input is
a member of the class with degree 1 (Figure 3(b)). If the input falls completely outside the min-max
boundaries, then the input is a member of the class with degree 0 (Figure 3(c)). If the input
pattern is neither completely within nor completely outside of the min-max class boundaries, then
the degree to which the input is a member of the class falls between 0 and 1 (Figure 3(d)).

Figure 3: Fuzzy Min-Max Degree Of Classification (DOC). (a) Min and max points define the class. (b) Input pattern has DOC = 1. (c) Input pattern has DOC = 0. (d) Input has 0 < DOC < 1.
Degree of Classification (DOC) is the measure of how well the kth input pattern, Ak, falls
between the min point of the jth class, Vj, and the max point of the same class, Wj. There are three
measures that have been derived to describe the degree of classification for min-max classes: (1)
subsethood/supersethood, (2) average underlap/overlap, and (3) biased underlap/overlap.
Subsethood/Supersethood: Kosko (1990) describes a method of measuring the degree to which a
fuzzy set Y is a superset (or subset) of another fuzzy set Z. The supersethood measure is a
measure of the amount of Y's underlap with Z normalized by Y's sigma-count. Underlap is the
measure of the amount of Y that is not a superset of Z. The subsethood measure is the
complement of the supersethood measure: supersethood(Y, Z) = 1 − subsethood(Y, Z), where Y and
Z are fuzzy sets. Using this measure, the degree of classification (DOC) is computed as follows:
DOC1(Ak, Vj, Wj) = [1 − (Σ_{i=1..n} max(0, vji − aki)) / (Σ_{i=1..n} aki)] · [1 − (Σ_{i=1..n} max(0, aki − wji)) / (Σ_{i=1..n} aki)]    (1)

where Ak = (ak1, ak2, ..., akn) is the kth n-dimensional fuzzy input pattern, k = 1, 2, ..., m; Vj =
(vj1, vj2, ..., vjn) is the jth class's min point; Wj = (wj1, wj2, ..., wjn) is the jth class's max point;
and j is the index of the class. This measure is clearly biased by the size of the input pattern. If the
sigma-count (size) of the input pattern is large, the sum of the violations in the numerator of the
two fractions has less of an effect than if the input pattern were small. Hence, points equidistant
from the class boundary on the top and bottom provide different responses.
Average Underlap/Overlap: To alleviate the bias created by the input pattern's size, an alternative
classification measure is introduced as a modification of (1) that normalizes by the pattern
dimensionality. This second degree of classification metric is defined as

DOC2(Ak, Vj, Wj) = [1 − (1/n) Σ_{i=1..n} max(0, vji − aki)] · [1 − (1/n) Σ_{i=1..n} max(0, aki − wji)]    (2)
This measure is a product of two complements: the complement of the average underlap and the
complement of the average overlap. Although this measure does eliminate the pattern size bias,
for large dimension patterns the average difference between the input and the class boundaries
becomes very small. Hence the relative difference between the classification responses becomes
very small. When using this method of measuring misclassification, the DOC values are close to
unity most of the time, making it difficult to discriminate class membership from the DOC2
values.
Biased Underlap/Overlap: As a compromise between DOC1 and DOC2, the biased underlap/
overlap is introduced. The biased underlap/overlap calculates the amount of underlap and divides
that value by the number of components of the fuzzy pattern that were less than the min point
(nsub). Similarly, the amount of overlap is divided by the number of components of the fuzzy
pattern that were greater than the max point (nsup). The resulting equation is

DOC3(Ak, Vj, Wj) = [1 − (Σ_{i=1..n} max(0, vji − aki)) / nsub] · [1 − (Σ_{i=1..n} max(0, aki − wji)) / nsup]    (3)

The average underlap and overlap are now biased by the number of components that were
outside the class boundaries. In practice, this measure has worked well for large dimension
patterns (n > 100).
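The three measures can be sketched compactly in code. This is a reading of equations (1)-(3) as sums of boundary violations; treating the underlap (or overlap) factor as 1 when no component violates the corresponding boundary (nsub or nsup = 0) is my own assumption for the degenerate case, which the text does not spell out:

```python
def _underlap(a, v):
    # Components of the input that fall below the class min point.
    return [max(0.0, vi - ai) for ai, vi in zip(a, v)]

def _overlap(a, w):
    # Components of the input that rise above the class max point.
    return [max(0.0, ai - wi) for ai, wi in zip(a, w)]

def doc1(a, v, w):
    # Subsethood/supersethood: violations normalized by the input's sigma-count.
    s = sum(a)
    return (1 - sum(_underlap(a, v)) / s) * (1 - sum(_overlap(a, w)) / s)

def doc2(a, v, w):
    # Average underlap/overlap: violations normalized by the dimensionality n.
    n = len(a)
    return (1 - sum(_underlap(a, v)) / n) * (1 - sum(_overlap(a, w)) / n)

def doc3(a, v, w):
    # Biased underlap/overlap: violations normalized by the number of
    # violating components (nsub, nsup); a factor of 1 when there are none.
    under, over = _underlap(a, v), _overlap(a, w)
    nsub = sum(1 for u in under if u > 0)
    nsup = sum(1 for o in over if o > 0)
    under_term = 1 - sum(under) / nsub if nsub else 1.0
    over_term = 1 - sum(over) / nsup if nsup else 1.0
    return under_term * over_term

v, w = [0.2, 0.2, 0.2], [0.6, 0.6, 0.6]
print(round(doc3([0.0, 0.4, 0.8], v, w), 2))  # 0.64
```

An input lying entirely inside the hyperbox scores 1 under all three measures; for large n, doc2 drifts toward unity even for poor fits, which is exactly the discrimination problem the biased measure addresses.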
3. FUZZY LOGIC AND NEURAL NETWORKS
A neural network is a distributed processing system that utilizes only local information to carry
out useful information processing tasks. Local processing, in the neural network sense, means that
all the information a processing element needs to compute an output value is available at the
abutting connections. Because of the local processing nature of neural networks, it is possible to
implement them in parallel, which can provide real-time processing. The fuzzy min-max classifier
can be implemented as a two-layer dual connection feedforward neural network. All of the input
necessary to compute a single valued output is available in the connections and the input pattern
Ak = (ak1, ak2, ..., akn).
3.1. Dual Connection Network Topology
Figure 4: Single and Dual Connection Neural Networks. (a) Single connection neural network. (b) Dual connection neural network.

Most feedforward neural networks have a single connection between any two processing
elements in the network (cf. Simpson, 1990). As an example, a two-layer feedforward fully
interconnected network has one connection from each FA processing element to each FB
processing element (Figure 4(a)). The connection from the ith FA processing element to the jth
FB processing element is wji. These connections are used to represent class exemplars or form a
span of the input space.
Recently, dual connection neural networks have been introduced (cf. Simpson, 1991) with
two connections between the processing elements. As an example, a two-layer feedforward fully
interconnected network has two connections from each FA processing element to each FB
processing element (Figure 4(b)). The two connections between the ith FA and the jth FB
processing elements are vji and wji. Previously, these dual connections have been used to represent the
mean (wji) and variance (vji) of a class. Here we will use them to store the
min and max points of a class. In this framework, the min point for a class is represented by the
values on the Vj connections (the min connections for the jth class), Vj = (vj1, vj2, ..., vjn), where vji is
the connection from the ith FA processing element to the jth FB processing element in a two-layer
feedforward neural network. Similarly, the max point for a class is represented by the values on the
Wj connections (the max connections for the jth class), Wj = (wj1, wj2, ..., wjn).
3.2. Fuzzy Min-Max Adaptation
The fuzzy min-max classifier is a supervised learning classifier. Each fuzzy pattern has a class
associated with it. Only the min and max points associated with that class are adjusted during the
presentation of that fuzzy pattern. The learning procedure utilizes the fuzzy union operation to
adjust the max point and the fuzzy intersection operation to adjust the min point. As a result, the
min point represents the set of lowest values among all the fuzzy patterns associated with a given
class. Similarly, the max point represents the set of highest values among all the fuzzy patterns
associated with a given class. Assuming the connection topology described in the previous
section, the min connections are adjusted using the equation

vji^new = vji^old ∩ aki = min(vji^old, aki)    (4)

for all i = 1, 2, ..., n, where j is the class associated with the fuzzy input pattern Ak = (ak1, ak2, ...,
akn), k = 1, 2, ..., m. The max connections are adjusted using the equation

wji^new = wji^old ∪ aki = max(wji^old, aki)    (5)

for all i = 1, 2, ..., n, where j is the class associated with the fuzzy input pattern Ak.
Two important aspects of these adaptation equations should be emphasized. First, the adaptation
does not require several presentations of the input patterns; rather, the learning is immediate.
Second, the learning equations are able to learn on-line. New data can be added to the system
without complete retraining. These qualities are very appealing in applications where it is difficult
to maintain a data set for retraining and training must be done quickly.
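Equations (4) and (5) reduce to a pointwise min and max against the stored class boundaries, which is why learning is immediate and on-line. A minimal sketch (the dict-of-classes storage is my choice for illustration, not the paper's network topology):

```python
def adapt(classes, pattern, label):
    # One-shot, on-line learning: fuzzy intersection (min) updates the min
    # point, fuzzy union (max) updates the max point of the labelled class.
    n = len(pattern)
    if label not in classes:
        # Initialization: min point at the FULL point (all 1s), max point
        # at the NULL point (all 0s), so the first pattern sets both.
        classes[label] = ([1.0] * n, [0.0] * n)
    v, w = classes[label]
    classes[label] = ([min(vi, ai) for vi, ai in zip(v, pattern)],
                      [max(wi, ai) for wi, ai in zip(w, pattern)])

classes = {}
adapt(classes, [0.2, 0.6, 0.4], 1)
adapt(classes, [0.4, 0.2, 0.6], 1)
print(classes[1])  # ([0.2, 0.2, 0.4], [0.4, 0.6, 0.6])
```

Adding a new pattern later only widens the affected hyperbox; no retraining pass over old data is needed.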
3.3. Fuzzy Class Initialization
Initially each class is empty. During the presentation of the first pattern for a class, the min and
max points should assume the same values as the input pattern. During successive pattern
presentations the min and max points will then separate (providing successive patterns are not
identical to the first pattern), with the max boundary increasing its values and the min boundary
decreasing its values. To achieve this type of learning behavior, the min points are initially set to
all 1s (the FULL point of the unit hypercube) and the max points are initially set to all 0s (the
NULL point of the unit hypercube).

Figure 5: The Six Fuzzy Patterns Used to Form Three Classes (Patterns A1 - A6)
3.4. Fuzzy Min-Max Recall
During recall, an input pattern, Ak, is presented to a neural network that has p classes. The
neural network produces a set of p values, one for each class, each representing the degree to
which the input pattern, Ak, fits within class j. Any one of the three equations described for the
degree of classification, (1)-(3), can be used for recall. The selection is problem dependent. The
following section gives an example of the recall process using equation (3).
4. AN EXAMPLE OF MIN-MAX
LEARNING AND CLASSIFICATION
4.1. Fuzzy Pattern Data Set
To illustrate how the fuzzy min-max classifier works, the following example is presented. A
data set of six 10-dimension patterns (Figure 5) is associated with 3 classes. The fuzzy patterns
and their associated classes are as follows:

A1 = (0.0, 0.2, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2)  Class 1
A2 = (0.4, 0.4, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.6)  Class 2
A3 = (0.2, 0.4, 0.6, 0.4, 0.2, 0.6, 0.8, 0.4, 0.2, 0.2)  Class 3
A4 = (0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2)  Class 1
A5 = (0.6, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6)  Class 2
A6 = (0.6, 0.4, 0.2, 0.4, 0.6, 0.4, 0.2, 0.4, 0.6, 0.4)  Class 3
4.2. Fuzzy Min-Max Class Formation
To store these patterns requires a 10×3 dual connection two-layer feedforward neural network.
The learning procedure described by equations (4) and (5) produces the following sets of min
and max connections (Figure 6) for this network:

Figure 6: The Three Classes Formed From A1 - A6. (Class 1 (V1, W1); Class 2 (V2, W2); Class 3 (V3, W3).)

Table 1: Degree Of Classification for Patterns B1 and B2
Class 1 min and max points:
V1 = (0.0, 0.2, 0.2, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2)
W1 = (0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2)
Class 2 min and max points:
V2 = (0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6)
W2 = (0.6, 0.4, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.6)
Class 3 min and max points:
V3 = (0.5, 0.3, 0.1, 0.3, 0.5, 0.3, 0.1, 0.3, 0.5, 0.3)
W3 = (0.6, 0.4, 0.2, 0.4, 0.6, 0.4, 0.2, 0.4, 0.6, 0.4)
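The class formation can be reproduced directly. A sketch that applies the one-pass learning rule of equations (4) and (5) to the six patterns; the Class 1 and Class 2 min and max points produced this way match the values listed above:

```python
patterns = [
    ([0.0, 0.2, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2], 1),  # A1
    ([0.4, 0.4, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.6], 2),  # A2
    ([0.2, 0.4, 0.6, 0.4, 0.2, 0.6, 0.8, 0.4, 0.2, 0.2], 3),  # A3
    ([0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2], 1),  # A4
    ([0.6, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6], 2),  # A5
    ([0.6, 0.4, 0.2, 0.4, 0.6, 0.4, 0.2, 0.4, 0.6, 0.4], 3),  # A6
]

classes = {}
for a, label in patterns:
    if label not in classes:
        # First pattern of a class sets both the min and the max point.
        classes[label] = (list(a), list(a))
    else:
        # Fuzzy intersection updates the min, fuzzy union updates the max.
        v, w = classes[label]
        classes[label] = ([min(x, y) for x, y in zip(v, a)],
                          [max(x, y) for x, y in zip(w, a)])

v1, w1 = classes[1]
print(v1)  # [0.0, 0.2, 0.2, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2]
print(w1)  # [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2]
```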
4.3. Classifying an Input Pattern
Two new input patterns will be used to demonstrate the classification of the fuzzy min-max
classifier. The two input patterns are

B1 = (0.0, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.2, 0.2)
B2 = (0.8, 0.6, 0.4, 0.2, 0.0, 0.2, 0.4, 0.6, 0.6, 0.6)

Figure 7 shows these patterns relative to the three classes formed. Using equation (3), DOC3(),
the degree to which each input pattern fits within each of the three previously created classes is
computed, yielding the classification values found in Table 1. It is interesting to note that B2
produced an almost identical response from Classes 2 and 3. Also, both B1 and B2 produced
equivalent DOCs for Class 3.
5. CONCLUSION
A two-layer feedforward supervised learning
neural network that classifies fuzzy patterns
has been described and illustrated. This neural
network can learn on-line and in a single pass
through the data set. The algorithm provides
an output value for each class that represents
the degree to which the input pattern falls
within the respective class boundaries. This
algorithm is currently being explored for
applications in pattern recognition and control.
Figure 7: Comparing the Input Patterns with the Classes Formed
(Input patterns shown as dashed lines)
Class 1 (V1, W1) with B1; Class 1 (V1, W1) with B2.
Class 2 (V2, W2) with B1; Class 2 (V2, W2) with B2.
Class 3 (V3, W3) with B1; Class 3 (V3, W3) with B2.
REFERENCES
Fukunaga, K. (1986). Statistical pattern recognition, in Handbook of Pattern Recognition and
Image Processing, T. Young and K. Fu, Eds., pp. 3-32. Academic Press: San Diego, CA.
Kandel, A. (1986). Fuzzy Mathematical Techniques with Applications, Addison-Wesley:
Reading, MA.
Kosko, B. (1990). Fuzziness vs. probability, International Journal of General Systems, Vol. 17,
pp. 211-240.
Simpson, P. (1990). Artificial Neural Systems: Foundations, Paradigms, Applications and
Implementations, Pergamon Press: Elmsford, NY.
Simpson, P. (1991). Foundations of neural networks, Chapter 1 in Neural Networks, C. Lau &
E. Sanchez-Sinencio, Eds., IEEE Press: Piscataway, NJ.
Zadeh, L. (1965). Fuzzy sets, Information and Control, Vol. 8, pp. 338-353.
