DOI 10.1007/s10489-006-0105-0
C Springer Science + Business Media, LLC 2006
244 Appl Intell (2006) 25:243–251
By using the so-called grey relational structure, new instances in a specific domain can be classified with high accuracy. Moreover, the total number of classes in a specific domain does not affect the complexity of the proposed approach. Forty classification problems are used for performance comparison. Experimental results show that the proposed approach yields higher performance than other methods that adopt one or both of the two similarity functions mentioned above, i.e., the Euclidean metric and the Value Difference Metric (VDM). Moreover, the proposed method yields higher performance than several other classification algorithms.

The rest of this paper is structured as follows. Some similarity functions used for IBL are introduced in Section 2. The concept of grey relational analysis is reviewed in Section 3. In Section 4, an instance-based learning approach based on grey relational structure is presented. In Section 5, experiments performed on forty datasets are reported. Finally, Section 6 gives our conclusions.

2 Similarity functions in IBL

This section reviews some similarity functions used for IBL. In most IBL systems, the Euclidean similarity function has been adopted. Let x and y be two instances with n attributes, denoted as x = (x(1), x(2), . . . , x(n)) and y = (y(1), y(2), . . . , y(n)). The Euclidean metric is defined as follows.

Eu(x, y) = ( Σ_{i=1..n} [x(i) − y(i)]^2 )^{1/2}.   (1)

An alternative similarity function, the Manhattan distance function, is defined as follows.

Ma(x, y) = Σ_{i=1..n} |x(i) − y(i)|.   (2)

Obviously, Euclidean-like distance functions are normally applicable to domains with numeric attributes. In general, prior to learning, each numeric attribute should be normalized, i.e., the distance for numeric attribute i is divided by the maximal difference or by the standard deviation of attribute i. In this manner, an attribute with a relatively larger range of values than the other attributes is prevented from dominating the distance. Accordingly, the distance for each attribute is bounded between zero and one. To deal with each symbolic attribute i, the distance between two values a and b of attribute i is set to zero if a and b are the same (i.e., the distance is minimal); otherwise the distance is set to one (i.e., the distance is maximal). Meanwhile, the distance is also set to one if a or b is unknown (i.e., missing). This results in a high dependence on symbolic attributes, or an unusually strong bias against missing attributes. A heterogeneous similarity function is thus defined for domains with both numeric and symbolic attributes. This distance function, also known as HOEM [39], was adopted in IB1, IB2 and IB3 [1].

In [34], another well-known distance function, namely the Value Difference Metric (VDM), was proposed to determine the 'nearness' between instances. For each symbolic attribute s of instances, the VDM between two values a and b is defined as follows.

vdm(a, b) = Σ_{i=1..C} | Nai/Na − Nbi/Nb |^k,   (3)

where Na is the number of training instances with value a for attribute s, and Nai is the number of training instances with value a for attribute s and output class i; C is the number of output classes in a specific domain, and k is usually set to 1 or 2. In Eq. (3), the two ratios are estimates of the probabilities of attribute values given the class (i.e., of the class conditional probabilities).

Generally, the VDM is mainly suitable for domains with symbolic attributes. Some discretization methods were thus incorporated with the VDM for dealing with numeric attributes, i.e., numeric attributes were discretized into symbolic attributes for the learning tasks.

In [39], three variants of the VDM, namely the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM) and the Windowed Value Difference Metric (WVDM), were proposed. The HVDM uses the VDM and the above-mentioned normalized distance (i.e., each distance for numeric attribute i is divided by 4 standard deviations of attribute i) to handle symbolic and numeric attributes, respectively. As for the IVDM and the WVDM, different discretization methods were incorporated with the original version of the VDM. These similarity functions are useful for applications with both numeric and symbolic input attributes. Further details regarding these three similarity functions are given in [39]. Similar to the VDM, however, their complexity increases with the number of classes in the learning tasks. In addition, the PEBLS learning system [32] introduces a variant of the VDM, called the modified VDM (MVDM). MVDM incorporates the VDM with a scheme that adjusts the weight of each attribute for learning. A review of various similarity functions can be found in [39].
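To make the metrics above concrete, the following sketch implements Eqs. (1)–(3) and an HOEM-style heterogeneous distance. The function names, the range-based normalization and the missing-value handling are our own illustrative choices, not code from [39]:

```python
import math

def euclidean(x, y):
    # Eq. (1): square root of the summed squared attribute differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Eq. (2): summed absolute attribute differences
    return sum(abs(a - b) for a, b in zip(x, y))

def vdm(a, b, values, labels, k=2):
    # Eq. (3): compare the class-conditional frequencies of two symbolic
    # values; `values` holds one symbolic attribute over the training set,
    # `labels` the corresponding output classes.
    n_a = sum(1 for v in values if v == a)
    n_b = sum(1 for v in values if v == b)
    total = 0.0
    for c in set(labels):
        n_ac = sum(1 for v, l in zip(values, labels) if v == a and l == c)
        n_bc = sum(1 for v, l in zip(values, labels) if v == b and l == c)
        total += abs(n_ac / n_a - n_bc / n_b) ** k
    return total

def hoem_like(x, y, numeric, ranges):
    # HOEM-style heterogeneous distance (sketch): range-normalized absolute
    # difference for numeric attributes, 0/1 overlap for symbolic ones; a
    # missing value (None) contributes the maximal per-attribute distance 1.
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:
            d = 1.0
        elif numeric[i]:
            d = abs(a - b) / ranges[i]
        else:
            d = 0.0 if a == b else 1.0
        total += d * d
    return math.sqrt(total)
```

For an HVDM-style variant, the per-attribute divisor would be four standard deviations rather than the attribute range.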
3 Grey relational analysis

Since its inception in 1984 [8], grey relational analysis [8–10] has been applied to a wide variety of application domains. As a measurement method, grey relational analysis is used to determine the relationships among a referential observation and the compared observations by calculating the grey relational coefficient (GRC) and the grey relational grade (GRG). Consider a set of m + 1 observations {x0, x1, x2, . . . , xm}, where x0 is the referential observation and x1, x2, . . . , xm are the compared observations. Each observation xe includes n attributes and is represented as xe = (xe(1), xe(2), . . . , xe(n)). The grey relational coefficient can be calculated as

GRC(x0(p), xi(p)) = (Δmin + ζ Δmax) / ( |x0(p) − xi(p)| + ζ Δmax ),   (4)

where Δmin = min_{∀j} min_{∀k} |x0(k) − xj(k)|, Δmax = max_{∀j} max_{∀k} |x0(k) − xj(k)|, ζ ∈ [0, 1] (usually ζ = 0.5), i, j = 1, 2, . . . , m, and k, p = 1, 2, . . . , n.

Here, GRC(x0(p), xi(p)) is considered as the similarity between x0(p) and xi(p). If GRC(x0(p), x1(p)) exceeds GRC(x0(p), x2(p)), then the similarity between x0(p) and x1(p) is larger than that between x0(p) and x2(p); otherwise the former is smaller than the latter. Moreover, if x0 and xi have the same value for attribute p, GRC(x0(p), xi(p)) (i.e., the similarity between x0(p) and xi(p)) will be one. By contrast, if x0 and xi differ greatly for attribute p, GRC(x0(p), xi(p)) will be close to zero. Similar methods for dealing with symbolic attributes are detailed in Section 4. Notably, changing the value of ζ affects neither the GRO nor the performance of the proposed learning approach. Restated, the original version of the GRC (as stated in [8]) is used to determine the similarity between two instances in the proposed learning approach.

Accordingly, the grey relational grade between instances x0 and xi is expressed as

GRG(x0, xi) = (1/n) Σ_{k=1..n} GRC(x0(k), xi(k)).   (5)

On the basis of the grey relational grade, the grey relational order (GRO) of the referential observation x0 can be obtained as

GRO(x0) = (y01, y02, . . . , y0m),   (6)

where GRG(x0, y01) ≥ GRG(x0, y02) ≥ · · · ≥ GRG(x0, y0m), y0r ∈ {x1, x2, . . . , xm}, r = 1, 2, . . . , m, and y0a ≠ y0b if a ≠ b.

The grey relational orders of observations x1, x2, x3, . . . , xm can be similarly obtained as follows.

GRO(xq) = (yq1, yq2, . . . , yqm),   (7)

where q = 1, 2, . . . , m, GRG(xq, yq1) ≥ GRG(xq, yq2) ≥ · · · ≥ GRG(xq, yqm), yqr ∈ {x0, x1, x2, . . . , xm}, yqr ≠ xq, r = 1, 2, . . . , m, and yqa ≠ yqb if a ≠ b.

In Eq. (7), the GRG between observations xq and yq1 exceeds those between xq and the other observations (yq2, yq3, . . . , yqm). That is, the difference between xq and yq1 is the smallest.

Grey relational analysis satisfies four principal axioms, namely (a) normality, (b) dual symmetry, (c) wholeness and (d) approachability [8–10, 29].

(a) Normality—GRG(x0, xi) takes a value between zero and one.
(b) Dual symmetry—If only two observations (x0 and x1) are made in the relational space, then GRG(x0, x1) = GRG(x1, x0).
(c) Wholeness—If three or more observations are made in the relational space, then GRG(x0, xi) often does not equal GRG(xi, x0), ∀i.
(d) Approachability—GRG(x0, xi) decreases as the difference between x0(p) and xi(p) increases (with the other values in Eqs. (4) and (5) held constant).

Based on these axioms, grey relational analysis offers some advantages. For example, it gives a normalized measuring function (normality)—a method for measuring the similarities or differences among observations—to analyze the relational structure. Also, grey relational analysis yields whole relational orders (wholeness) over the entire relational space. As stated in the following section, these properties are useful for instance-based learning.
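Eqs. (4) and (5) can be sketched for numeric attributes as follows (the names are ours; Δmin and Δmax are taken over all compared observations and all attributes, as in the definitions above, and ζ defaults to 0.5):

```python
def grey_relational_grades(x0, compared, zeta=0.5):
    # Eq. (4)/(5): per-attribute GRC values averaged into one GRG per
    # compared observation. Assumes the observations are not all
    # identical (so that Delta_max > 0).
    n = len(x0)
    diffs = [abs(x0[k] - xj[k]) for xj in compared for k in range(n)]
    d_min, d_max = min(diffs), max(diffs)   # Delta_min, Delta_max over all j, k
    grades = []
    for xi in compared:
        grc = [(d_min + zeta * d_max) / (abs(x0[k] - xi[k]) + zeta * d_max)
               for k in range(n)]           # Eq. (4) per attribute
        grades.append(sum(grc) / n)         # Eq. (5): mean GRC
    return grades
```

Sorting the compared observations by descending grade then yields the grey relational order (GRO).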
4 Instance-based learning based on grey relational structure

As defined in Eq. (7), the grey relational order of each instance xq is GRO(xq) = (yq1, yq2, . . . , yqm), where GRG(xq, yq1) ≥ GRG(xq, yq2) ≥ · · · ≥ GRG(xq, yqm), yqr ∈ {x0, x1, x2, . . . , xm}, yqr ≠ xq, r = 1, 2, . . . , m, and yqa ≠ yqb if a ≠ b.

Here, a graphical structure, called the k-level grey relational structure (k ≤ m), is defined as follows to describe the relationships among a referential instance xq and all other instances, where the total number of 'nearest' instances of referential instance xq (q = 0, 1, . . . , m) is restricted to k.

GRO*(xq, k) = (yq1, yq2, . . . , yqk),   (9)

where GRG(xq, yq1) ≥ GRG(xq, yq2) ≥ · · · ≥ GRG(xq, yqk), yqr ∈ {x0, x1, x2, . . . , xm}, yqr ≠ xq, r = 1, 2, . . . , k, and yqa ≠ yqb if a ≠ b.

That is, a directed graph, shown in Fig. 1, can be used to express the relational space, where each instance xq (q = 0, 1, . . . , m) as well as its k nearest instances (i.e., yqr, r = 1, 2, . . . , k) are represented by vertices, and each expression GRO*(xq, k) is represented by k directed edges (i.e., xq to yq1, xq to yq2, . . . , xq to yqk).

Here, the characteristics of the proposed k-level grey relational structure are described in detail. First, for each instance xq, k instances (vertices) are connected by the inward edges from instance xq. That is, these instances are the nearest neighbors (with the smallest differences) of instance xq, implying that they evidence the class label of instance xq according to the nearest neighbor rule [7, 12]. Also, in the one-level grey relational structure, instance yq1, with the largest similarity, is the nearest neighbor of instance xq. Thus, a new, unseen instance can be classified according to its nearest instance in the one-level grey relational structure or its nearest instances in the k-level grey relational structure. Obviously, for classifying a new, unseen instance i, only the k inward edges connected with instance i in the above k-level grey relational structure are needed. In other words, the k nearest neighbors of each unseen instance are considered for the learning tasks (i.e., pattern classification).

Fig. 1 k-level grey relational structure (each instance xq, q = 0, 1, . . . , m, is linked by directed edges to its k nearest instances yq1, yq2, . . . , yqk)

Next, an instance-based learning algorithm for pattern classification based on the k-level grey relational structure is detailed. Assume that we have a training set T of m labeled instances, denoted by T = {x1, x2, . . . , xm}, where each instance xe has n attributes and is denoted as xe = (xe(1), xe(2), . . . , xe(n)). For classifying a new, unseen instance x0, the proposed learning procedure is performed as follows.

Step 1. Calculate the grey relational coefficient (GRC) and the grey relational grade (GRG) between x0 and xi, for i = 1, 2, . . . , m. If attribute p of each instance xe is numeric, the value of GRC(x0(p), xi(p)) is calculated by Eq. (4). If attribute p of each instance xe is symbolic, the value of GRC(x0(p), xi(p)) is calculated as

GRC(x0(p), xi(p)) = 1, if x0(p) and xi(p) are the same;
GRC(x0(p), xi(p)) = 0, if x0(p) and xi(p) are different.

Accordingly, calculate the grey relational grade (GRG) between x0 and xi, for i = 1, 2, . . . , m, by Eq. (5).

Step 2. Calculate the grey relational order (GRO) of x0 based on the degree of GRG(x0, xi), where i = 1, 2, . . . , m.

Step 3. Construct the k-level grey relational structure according to the above grey relational order (GRO) of x0, where k ≤ m. Here, only the k inward edges connected with instance x0 are needed.

Step 4. Classify the new instance x0 by considering the class labels of instances y1, y2, . . . , yk with the majority voting method [7], where y1, y2, . . . , yk are the vertices connected by the k inward edges from x0 in the k-level grey relational structure (i.e., instances y1, y2, . . . , yk are the nearest neighbors of instance x0). Notably, the best choice of k used for pattern classification can be determined by cross validation [35].

As stated in Section 3, GRC(x0(p), xi(p)) can be treated as the similarity between x0(p) and xi(p). If x0 and xi have the same value for symbolic attribute p, GRC(x0(p), xi(p)) (i.e., the similarity between x0(p) and xi(p)) will be set to one. By contrast, if x0 and xi are different for symbolic attribute p, GRC(x0(p), xi(p)) will be set to zero. These settings are similar to those used in [1].
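Steps 1–4 can be sketched end-to-end as follows (our simplified reading: strings are treated as symbolic attributes with the 0/1 GRC rule, everything else as numeric; as noted in Step 4, the best k would in practice be chosen by cross validation):

```python
from collections import Counter

def grs_classify(x0, train_x, train_y, k=3, zeta=0.5):
    # Classify x0 via the k-level grey relational structure (sketch).
    n = len(x0)
    numeric = [isinstance(v, (int, float)) for v in x0]
    # Delta_min / Delta_max over all compared instances, numeric attributes
    diffs = [abs(x0[p] - xi[p]) for xi in train_x
             for p in range(n) if numeric[p]]
    d_min = min(diffs) if diffs else 0.0
    d_max = max(diffs) if diffs else 0.0
    grades = []
    for xi in train_x:                          # Step 1: GRC and GRG
        grc = []
        for p in range(n):
            if not numeric[p]:                  # symbolic: 0/1 rule
                grc.append(1.0 if x0[p] == xi[p] else 0.0)
            elif d_max == 0.0:                  # all numeric values coincide
                grc.append(1.0)
            else:                               # Eq. (4)
                grc.append((d_min + zeta * d_max)
                           / (abs(x0[p] - xi[p]) + zeta * d_max))
        grades.append(sum(grc) / n)             # Eq. (5)
    order = sorted(range(len(train_x)),         # Step 2: GRO of x0
                   key=lambda i: grades[i], reverse=True)
    votes = Counter(train_y[i] for i in order[:k])  # Steps 3-4: k edges, vote
    return votes.most_common(1)[0][0]
```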
As mentioned earlier, the similarity function presented here offers some advantages, including normality and wholeness (i.e., asymmetry). That is, this similarity function is appropriate for measuring the similarities or differences among observations, and it yields whole relational orders (wholeness) over the entire relational space, in which all instances (or patterns) in a specific domain are treated as various vectors.

In some application domains, instances may contain missing attribute values (for example, some datasets in the experiments in Section 5 contain missing attribute values). In this paper, to handle domains that contain missing attribute values, a method presented in [20] for missing attribute value prediction is applied prior to learning (that is, domains with missing attribute values in the experiments in Section 5 are first handled by the prediction method presented in [20]). In this missing attribute value prediction method, the nearest neighbors of an instance with missing attribute values can be determined. Accordingly, the valid attribute values derived from these nearest neighbors are used to predict the missing values. After predicting (estimating) missing attribute values with high accuracy, an imperfect dataset can be handled as a complete dataset in classification tasks. Finally, the proposed learning approach is applied for classification. Notably, any method used for dealing with missing attribute values probably biases the data.

Assume that we have a training set T of m labeled instances, denoted by T = {x1, x2, . . . , xm}, where each instance xe has n attributes and is denoted as xe = (xe(1), xe(2), . . . , xe(n)). For classifying a new, unseen instance x0 in typical instance-based learning methods (in which the Euclidean distance is used as the similarity function), the Euclidean distance between x0 and xa (1 ≤ a ≤ m) is calculated without considering the other training instances (i.e., without considering all xi, i ≠ a, 1 ≤ i ≤ m). By contrast, in the proposed learning approach, all training instances with n attributes are considered in determining the similarity (i.e., GRC and GRG) between x0 and xa (1 ≤ a ≤ m); this is the wholeness axiom [8] of grey relational analysis in Section 3. This consideration is the main difference between Euclidean-based similarity functions and grey relational analysis. In other words, in the proposed learning approach, a whole relational order (i.e., GRO) over all training instances is derived for classifying a new, unseen instance.

In addition, let m denote the number of compared instances and n the number of attributes. The time for classifying a new, unseen instance (including the time for calculating the grey relational order) is O(mn + m log m). The time for discovering the best value of k should also be included; these two parts give the overall time complexity of the proposed learning approach. As mentioned earlier, the complexity of using the VDM increases with the number of classes in a specific application domain, i.e., O(mnC), where C is the number of classes. This problem does not appear in the proposed learning approach.

In addition to dealing with classification tasks, the above k-level grey relational structure can be used for instance pruning or partial memory learning [21, 26, 40]. For example, an instance may not be connected by any inward edges from other instances in the k-level grey relational structure.

Table 1 Average accuracy (%) of classification for the proposed approach and other methods with HOEM (with k-nn), HVDM (with k-nn) and IVDM (with k-nn) [39], respectively. (k) indicates the best value for k using cross-validation on each classification problem

Dataset              HOEM    HVDM    IVDM    Proposed approach (k)
Allbp                94.89   95.05   95.29   95.48  (13)
Allhyper             97.09   97.00   97.20   97.35  (11)
Allhypo              90.31   90.16   96.11   92.74  (7)
Allrep               96.14   96.31   98.25   97.18  (7)
Australian           81.30   81.72   80.52   81.87  (13)
Autos                74.90   79.79   80.19   76.45  (1)
Breast cancer        70.90   66.73   66.73   70.90  (7)
Breast-w             95.54   95.24   95.74   96.78  (5)
Cpu                  68.04   68.51   65.39   70.09  (3)
Crx                  80.12   81.06   80.27   80.82  (11)
Dis                  98.20   98.42   98.24   97.77  (3)
Echoi                81.09   80.32   79.36   80.94  (7)
Glass                69.45   72.63   70.69   74.13  (3)
Hepatitis            79.40   80.45   81.47   80.84  (7)
Hypothyroid          93.58   93.60   98.06   98.24  (1)
Ionosphere           87.22   86.40   91.14   91.37  (1)
Iris                 94.67   94.67   94.67   95.33  (7)
Letter               96.25   96.01   96.10   95.30  (1)
Liver disorders      61.86   62.89   62.43   63.30  (19)
Mushroom             100.00  100.00  100.00  100.00 (1)
Pageblocks           96.17   96.24   96.34   96.43  (1)
Pimadiabetes         70.49   70.25   68.53   69.40  (17)
Satelliteimage       90.23   90.26   90.14   90.19  (5)
Satelliteimagetest   88.35   88.32   88.41   88.81  (1)
Segment              96.80   97.07   97.33   97.81  (1)
Shuttle              99.05   99.86   99.85   99.94  (1)
Shuttletest          98.88   98.76   98.87   98.99  (1)
Sick                 87.01   86.86   96.84   92.75  (5)
Sickeuthyroid        68.30   68.41   95.08   88.60  (7)
Sonar                85.88   87.20   84.24   86.01  (1)
Soybean              90.51   90.98   92.03   89.31  (1)
Soybeansmall         100.00  100.00  100.00  100.00 (1)
Sponge               84.29   84.38   84.28   85.37  (1)
Tae                  63.71   60.99   60.33   63.51  (1)
Vehicle              70.01   70.90   69.53   70.86  (5)
Voting               93.57   95.17   95.17   93.57  (5)
Vowel                98.52   98.67   98.52   98.95  (1)
Wine                 94.89   95.46   97.47   97.30  (13)
Yeast                53.37   53.72   53.21   53.44  (19)
Zoo                  95.45   95.34   96.43   96.04  (1)
Average              85.91   86.15   87.26   87.35
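The neighbor-based prediction of missing attribute values described in this section can be sketched as follows. This is a generic scheme under our own assumptions (mean-absolute-difference similarity over the shared numeric attributes, mean aggregation over k complete neighbors); the exact method of [20] may differ:

```python
def impute_missing(instances, k=3):
    # Replace None values in each incomplete instance with the mean of the
    # corresponding attribute over its k most similar complete instances.
    complete = [x for x in instances if None not in x]
    filled = []
    for x in instances:
        if None not in x:
            filled.append(list(x))
            continue
        def similarity(c):
            # negative mean absolute difference over the known attributes
            shared = [(a, b) for a, b in zip(x, c) if a is not None]
            return -sum(abs(a - b) for a, b in shared) / len(shared)
        neighbors = sorted(complete, key=similarity, reverse=True)[:k]
        filled.append([v if v is not None
                       else sum(c[i] for c in neighbors) / len(neighbors)
                       for i, v in enumerate(x)])
    return filled
```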
Table 2 The statistic analysis for the proposed approach and other learning methods with HOEM, HVDM and IVDM [39], respectively. A better or worse test result of X/Y under method S column indicates that the proposed learning approach performs better than method S in X cases

Table 3 Average accuracy (%) of classification for the proposed approach and other classification algorithms (the last column gives the proposed approach)

Allbp                95.25  94.82  35.04  44.04  95.93  93.89  96.89   97.29   95.48
Allhyper             97.25  97.21  97.93  87.14  97.57  95.68  98.61   98.64   97.35
Allhypo              92.14  95.75  98.21  91.68  96.68  95.00  99.11   99.43   92.74
Allrep               96.89  96.89  96.89  93.07  96.82  93.96  99.21   99.25   97.18
Australian           55.51  85.51  44.93  86.81  85.51  76.67  85.07   85.51   81.87
Autos                32.74  44.93  62.95  58.93  62.90  58.02  80.05   82.52   76.45
Breast cancer        70.30  70.69  69.94  67.13  65.39  74.42  74.14   75.18   70.90
Breast-w             65.52  91.70  88.42  96.00  91.84  96.00  94.42   95.28   96.78
Cpu                  57.90  65.10  61.24  55.57  69.86  68.98  67.93   66.05   70.09
Crx                  55.51  85.51  60.43  84.78  85.51  77.68  85.51   85.94   80.82
Dis                  98.39  98.39  48.61  55.21  98.25  95.14  98.75   99.14   97.77
Echoi                42.58  64.63  70.91  74.14  63.71  86.34  79.19   84.25   80.94
Glass                35.50  44.87  50.97  57.90  58.35  50.39  69.18   66.71   74.13
Hepatitis            79.38  81.29  65.16  83.88  80.00  82.58  83.87   83.23   80.84
Hypothyroid          95.23  97.38  95.51  48.61  97.91  97.85  99.08   99.24   98.24
Ionosphere           64.10  82.61  92.60  94.02  82.04  82.36  89.75   90.90   91.37
Iris                 33.33  66.67  93.33  96.67  93.33  96.00  93.33   95.33   95.33
Letter               4.07   7.09   22.25  61.23  17.25  64.11  71.24   88.23   95.30
Liver disorders      57.98  61.17  44.62  59.76  57.40  55.96  56.23   66.38   63.30
Mushroom             51.80  88.68  99.77  99.88  98.52  95.75  100.00  100.00  100.00
Pageblocks           89.77  93.13  91.39  87.01  93.55  90.08  95.85   96.00   96.43
Pimadiabetes         65.11  72.01  35.03  64.84  72.28  75.78  74.48   74.09   69.40
Satelliteimage       24.17  44.74  48.00  71.52  59.82  79.55  82.57   86.11   90.19
Satelliteimagetest   23.50  41.70  58.30  72.70  58.05  79.05  80.80   83.25   88.81
Segment              14.29  28.53  75.50  77.45  63.98  80.30  91.69   97.10   97.81
Shuttle              78.41  86.94  84.15  78.27  94.69  91.52  99.75   99.96   99.94
Shuttletest          79.16  86.77  87.61  82.68  94.65  92.88  99.70   99.90   98.99
Sick                 93.89  96.75  93.86  61.93  96.54  92.57  97.54   98.82   92.75
Sickeuthyroid        90.74  94.44  90.74  46.38  94.91  84.00  97.28   97.79   88.60
Sonar                53.38  71.60  59.57  55.76  63.93  65.93  72.55   74.07   86.01
Soybean              13.03  26.06  89.90  82.44  39.41  90.55  83.39   88.93   89.31
Soybeansmall         35.50  57.00  100.00 97.50  83.50  97.50  100.00  98.00   100.00
Sponge               16.07  41.25  80.36  75.89  44.82  84.46  73.04   66.07   85.37
Tae                  34.42  35.75  47.04  52.96  42.42  53.63  50.96   53.04   63.51
Vehicle              25.77  39.59  38.55  53.55  52.70  44.45  67.96   73.28   70.86
Voting               61.38  95.62  38.62  90.34  95.62  90.57  94.49   97.00   93.57
Vowel                9.09   17.58  36.67  60.20  34.34  67.78  67.07   80.81   98.95
Wine                 39.93  59.51  91.08  95.46  76.93  97.22  91.08   94.41   97.30
Yeast                31.20  40.70  35.85  50.27  40.30  58.09  56.88   54.39   53.44
Zoo                  40.64  60.45  94.09  94.09  73.36  95.18  89.18   92.09   96.04
Average              55.02  67.78  69.40  73.69  74.26  81.20  84.70   86.59   87.35
Table 4 The statistic analysis for the proposed approach and other various classification algorithms. A better or worse test result of X/Y under method S column indicates that the proposed learning approach performs better than method S in X cases

Average accuracy                 55.02  67.78  69.40  73.69  74.26  81.20  84.70  86.59  87.35
Better or worse test (B/W test)  37/3   31/9   33/6   35/5   30/10  32/8   21/17  17/21  −
Wilcoxon test                    99.50  99.50  99.50  99.50  99.50  99.50  94.05  54.35  −
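The B/W tallies and the Wilcoxon signed-ranks statistic reported in Tables 2 and 4 can be computed along the following lines (a textbook formulation with our own names; pairs with equal accuracy are dropped, which may differ from the paper's tie handling):

```python
def bw_test(acc_proposed, acc_other):
    # Better/worse tally: on how many datasets does the proposed
    # approach beat (or lose to) the other method?
    better = sum(1 for p, o in zip(acc_proposed, acc_other) if p > o)
    worse = sum(1 for p, o in zip(acc_proposed, acc_other) if p < o)
    return better, worse

def wilcoxon_w(acc_proposed, acc_other):
    # Wilcoxon signed-ranks statistic W: rank the absolute paired
    # differences (equal pairs dropped, tied magnitudes share the mean
    # rank), sum the ranks of each sign, and take the smaller sum.
    diffs = [p - o for p, o in zip(acc_proposed, acc_other) if p != o]
    ranked = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        for d in ranked[i:j]:
            ranks[abs(d)] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    w_pos = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_neg = sum(ranks[abs(d)] for d in diffs if d < 0)
    return min(w_pos, w_neg)
```

W is then compared against the critical value (or converted to a p-value) to decide significance, as done for the confidence levels listed in the Wilcoxon rows.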
In other words, this instance is rarely used in determining the class labels of other instances, implying that it is probably a good choice for instance pruning in a learning system.

5 Experimental results

In this section, experiments performed on forty data sets (from [5]) are reported to demonstrate the performance of the proposed learning approach. In the experiments, ten-fold cross validation [35] was used and applied ten times for each application domain. That is, the entire data set of each application domain was equally divided into ten parts in each trial; each part was used once for testing and the remaining parts were used for training. Accordingly, the average accuracy of classification was obtained.

Table 1 gives the performances (average classification accuracy) of the proposed approach (i.e., k-nn with GRG), the HOEM (with k-nn), the HVDM (with k-nn) and the IVDM (with k-nn) [39]. These distance functions used for comparison are mentioned in Section 2. As shown in Table 2, a statistical analysis, including the better or worse test (B/W test; for example, a better or worse test result of 29/7 under the HOEM column means that the proposed learning approach performs better than the HOEM in 29 cases) and the Wilcoxon Signed Ranks test [37] (i.e., the proposed approach is compared with the others), was done for the above methods with various distance functions. Here, the Wilcoxon Signed Ranks test was used to test the null hypothesis that the differences in classification accuracy between two methods are distributed symmetrically around zero (i.e., to determine whether one method is significantly better than another regarding classification accuracy). Over these forty application domains, the proposed approach reveals its superiority over the HOEM (with k-nn) and the HVDM (with k-nn). Meanwhile, the classification accuracy of the proposed approach is comparable to that of the IVDM (with k-nn).

Furthermore, various classification algorithms were also used for performance comparison, including DecisionStump [41], DecisionTable [27], HyperPipes [41], C4.5 [31], NaiveBayes [25], 1R [18] and VFI [41]. These methods are all available in [41]. Table 3 gives the performances (average accuracy of classification) of the above classification methods and the proposed approach. Here, "Baseline" means that the majority class is simply chosen for classification. Similarly, Table 4 gives the statistical analysis, including the better or worse test (B/W test; for example, a better or worse test result of 32/8 under the NaiveBayes column indicates that the proposed learning approach performs better than NaiveBayes in 32 cases) and the Wilcoxon Signed Ranks test [37] (i.e., the proposed approach is compared with the others), for comparing the above learning methods. As a result, the proposal presented here yields higher performance than some other classification algorithms.

6 Conclusions

In this paper, an instance-based learning approach based on grey relational structure is proposed. Grey relational analysis is used to precisely describe the entire relational structure of all instances in a specific application domain. By using the above-mentioned grey relational structure, new instances can be classified with high accuracy. Experiments performed on forty application domains are reported to demonstrate the performance of the proposed approach. The proposed approach yields higher performance than other methods that adopt the Euclidean metric, the Value Difference Metric (VDM), or both. Moreover, the proposal presented here yields higher performance than some other classification algorithms. For some domains with purely symbolic attributes in the experiments, instance-based learning approaches using the VDM perform better than the proposed learning approach, which is based on the grey relational structure. As pointed out earlier, the VDM is mainly applicable to domains with symbolic attributes. In future work, the VDM will be incorporated with the proposed metric to increase the performance of the corresponding instance-based learning approach.

Acknowledgments This work was supported in part by the National Digital Archive Program-Research & Development of Technology Division (NDAP-R & DTD), the National Science Council of Taiwan under grant NSC 94-2422-H-001-006, and by the Taiwan Information Security Center (TWISC), the National Science Council under grant NSC 94-3114-P-001-001-Y. In addition, the authors would like to thank the National Science Council of Taiwan for financially supporting this research under grant NSC 94-2213-E-022-006.
Chi-Chun Huang is currently Assistant Professor in the Department of Information Management at National Kaohsiung Marine University, Kaohsiung, Taiwan. He received the Ph.D. degree from the Department of Electronic Engineering at National Taiwan University of Science and Technology in 2003. His research includes intelligent Internet systems, grey theory, machine learning, neural networks and pattern recognition.

Hahn-Ming Lee is currently Professor in the Department of Computer Science and Information Engineering at National Taiwan University of Science and Technology, Taipei, Taiwan. He received the B.S. degree and Ph.D. degree from the Department of Computer Science and Information Engineering at National Taiwan University in 1984 and 1991, respectively. His research interests include intelligent Internet systems, fuzzy computing, neural networks and machine learning. He is a member of IEEE, TAAI, CFSA and IICM.