I. INTRODUCTION
Time-series classification is an active research topic in machine learning and data mining, with applications in finance, medicine, biometrics, chemistry, astronomy, robotics, networking, and industry [11]. The increasing interest in time-series classification has resulted in a plethora of approaches, ranging from neural [16] and Bayesian networks [22] to genetic algorithms, support vector machines [6], and frequent pattern mining [7], [1]. Nevertheless, despite its simplicity, the 1-nearest-neighbor (1-NN) classifier based on the dynamic time warping (DTW) distance [21] has been shown to be competitive with, if not superior to, many state-of-the-art time-series classification methods [20], [2], [13]. Due to its good performance, this method has been examined in depth (a thorough summary of results can be found in [18]) with the aim of improving its accuracy [19] and efficiency [12].
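As context, the 1-NN/DTW baseline can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper; the function and variable names are our own, and no warping-window constraint is applied:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D series (full warping path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def classify_1nn(train_series, train_labels, query):
    """Return the label of the DTW-nearest training series."""
    dists = [dtw(s, query) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```

In practice, DTW is usually computed with a Sakoe-Chiba band and lower-bounding (e.g., LB_Keogh) to make this baseline fast enough for large data sets.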
The wide acceptance of 1-NN in time-series classification has also been supported by its lack of parameters. Nevertheless, it is known that the choice of the parameter k in the k-NN classifier affects the bias-variance trade-off [9]. Smaller values of k may result in increased variance due to overfitting, whereas larger values of k increase the bias by capturing only global tendencies. Recent studies [17] have
The secondary model predicts the error of each primary k_i-NN classifier for a new time series t through k'-NN regression over the hold-out set:

    ê(t, k_i) = (1/k') Σ_{t_N ∈ N_{k'}(t)} e(t_N, k_i)

where N_{k'}(t) ⊆ D2 is the set of k'-NNs of t and e(t_N, k_i) is the associated error value of each t_N ∈ N_{k'}(t).
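The error-prediction step of the secondary models can be sketched as follows. This is a simplified illustration under our own naming: `errors` holds the stored hold-out error values e(t_N, k_i), and `dist` is any distance function (e.g., DTW):

```python
import numpy as np

def predict_errors(query, holdout, errors, k_prime, dist):
    """Estimate the error of each primary k_i-NN classifier for `query`
    by averaging the stored errors of its k' nearest hold-out series.

    holdout : list of hold-out time series (the set D2)
    errors  : array of shape (len(holdout), n_classifiers), where
              errors[j, i] is the error of the k_i-NN classifier on holdout[j]
    """
    d = np.array([dist(query, t) for t in holdout])
    nn = np.argsort(d)[:k_prime]        # indices of the k' nearest hold-out series
    return errors[nn].mean(axis=0)      # predicted error per primary classifier

def select_classifier(query, holdout, errors, k_prime, dist):
    """Index of the k_i-NN classifier with minimum predicted error."""
    return int(np.argmin(predict_errors(query, holdout, errors, k_prime, dist)))
```

The per-instance selection in `select_classifier` is what allows a different k to be used for each time series that has to be classified.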
Similar to the primary models, the secondary models also have to determine the value of k'. Nevertheless, the proposed approach has the advantage that, as will be asserted by our
4. Ratios other than the examined 5:4 gave similar results. In the case of leave-one-out, the training data was split according to a 5:4 proportion.
V. EXPERIMENTAL EVALUATION
A. Experimental Configuration
of the data sets. According to the findings in [17], time-series data sets usually have high intrinsic dimensionality and, thus, some of their instances tend to misclassify a surprisingly large number of other instances when the k-NN classifier (k = 1) is used. These instances are called bad hubs and are responsible for a very large fraction of the total error. For this reason, for each time series t in a data set, we measured two quantities: the badness B(t) of t and the goodness G(t) of t. B(t) (G(t), resp.) is the total number of time series in the data set that have t as their first nearest neighbor while having a different (the same, resp.) class label from t. For each data set, we sort all time series according to the quantity G(t) − B(t) in descending order. Then we change the label of the first p percent of time series in this ranking (p varies in the range 0–10%).5 Since the above procedure results in data sets that have stronger bad hubs and a less clear separation between classes, the comparison among the examined methods becomes more challenging and can better characterize the robustness of the methods.
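The badness/goodness ranking and label-flipping procedure above can be sketched as follows. This is our own minimal rendering; `nn1` is assumed to give each series' first nearest neighbor (e.g., computed under DTW):

```python
import numpy as np

def goodness_badness(labels, nn1):
    """G(t) and B(t): how many series have t as their first NN while
    carrying the same (G) or a different (B) class label.

    labels : array of class labels
    nn1    : nn1[j] = index of the first nearest neighbor of series j
    """
    n = len(labels)
    G = np.zeros(n, dtype=int)
    B = np.zeros(n, dtype=int)
    for j, t in enumerate(nn1):
        if labels[j] == labels[t]:
            G[t] += 1
        else:
            B[t] += 1
    return G, B

def flip_top_hubs(labels, nn1, p, extra_class):
    """Relabel the top p fraction of series ranked by G(t) - B(t) descending,
    turning the strongest good hubs into bad hubs."""
    G, B = goodness_badness(labels, nn1)
    order = np.argsort(-(G - B))             # descending G(t) - B(t)
    n_flip = int(round(p * len(labels)))
    new_labels = labels.copy()
    new_labels[order[:n_flip]] = extra_class  # assign the additional class
    return new_labels
```

Relabeling the strongest good hubs is what makes the corrupted data sets harder: a series that was the (correct) first nearest neighbor of many others now misclassifies all of them.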
          vs 1-NN              vs k-NN
          wins      losses     wins      losses
p = 1%    30 (20)   5 (1)      30 (15)   5 (1)
p = 5%    34 (29)   1 (0)      30 (9)    5 (1)
p = 10%   34 (31)   1 (0)      28 (14)   7 (1)
total     98 (80)   7 (1)      88 (38)   17 (3)

Table II
NUMBER OF IEP WINS / LOSSES AGAINST 1-NN AND k-NN (STATISTICALLY SIGNIFICANT CASES IN PARENTHESES).
As shown, in the vast majority of cases IEP outperforms its competitors, often significantly, whereas when it loses, the difference is usually non-significant. In several cases, the error of IEP is an order of magnitude lower (e.g.,
5. The time series whose labels were changed by this procedure are assigned to an additional class (not included in the original data set). To keep our experimental evaluation meaningful, the time series with changed labels were excluded from the test set.
D. Execution Time
To investigate the overhead produced by IEP in both the off-line and online computation, in Table III we report the execution times (on a Xeon 2.3 GHz processor) for three of the largest data sets: Wafer, Two-Patterns, and ChlorineConcentration. Please note that the off-line time refers to the time required for training the secondary regression level, which has to be performed only once. Online time refers to the actual time needed to classify a new time series. IEP has almost the same off-line time (reported in minutes) as k-NN. This is because training is dominated by the classification of the hold-out set D2 in both cases. Although the data sets in Table III are rather large, the resulting off-line time is reasonable in all cases. Regarding the online time, it is evident that IEP is able to maintain fast classification of new time series.
Dataset                 size     p = 1%                  p = 5%                  p = 10%
                                 IEP    1-NN   k-NN      IEP    1-NN   k-NN      IEP    1-NN   k-NN
50Words                  905    0.239  0.249  0.242     0.270  0.388  0.260     0.321  0.505  0.338
Adiac                    781    0.373  0.381  0.384     0.415  0.508  0.451     0.476  0.614  0.519
Car                      120    0.279  0.278  0.303     0.310  0.416  0.353     0.330  0.514  0.358
CBF                      930    0.004  0.106  0.047     0.043  0.328  0.057     0.139  0.496  0.174
ChlorineConcentration   4307    0.053  0.021  0.021     0.077  0.075  0.075     0.121  0.115  0.115
CinC                    1420    0.003  0.033  0.011     0.008  0.143  0.021     0.019  0.242  0.029
DiatomSizeReduction      322    0.006  0.031  0.038     0.010  0.141  0.049     0.021  0.276  0.055
ECG200                   200    0.136  0.171  0.156     0.150  0.313  0.134     0.201  0.442  0.141
ECGFiveDays              884    0.013  0.041  0.045     0.020  0.164  0.136     0.028  0.273  0.097
FaceFour                 112    0.063  0.108  0.072     0.075  0.234  0.112     0.099  0.386  0.198
FacesUCR                2250    0.029  0.059  0.039     0.044  0.193  0.046     0.070  0.316  0.083
Fish                     350    0.228  0.254  0.239     0.244  0.386  0.280     0.329  0.512  0.392
GunPoint                 200    0.010  0.036  0.061     0.016  0.162  0.176     0.043  0.258  0.207
Haptics                  463    0.490  0.582  0.532     0.540  0.681  0.553     0.632  0.774  0.622
InlineSkate              650    0.469  0.461  0.483     0.523  0.562  0.570     0.602  0.679  0.661
ItalyPowerDemand        1096    0.038  0.087  0.081     0.059  0.237  0.060     0.096  0.389  0.117
Lighting2                121    0.192  0.133  0.125     0.209  0.270  0.209     0.257  0.422  0.239
Lighting7                143    0.254  0.254  0.289     0.279  0.426  0.338     0.341  0.558  0.310
Mallat                  2400    0.014  0.055  0.018     0.019  0.178  0.034     0.048  0.308  0.067
MedicalImages           1141    0.212  0.228  0.234     0.228  0.339  0.256     0.248  0.457  0.277
Motes                   1272    0.059  0.090  0.078     0.073  0.206  0.107     0.093  0.316  0.148
OSULeaf                  442    0.320  0.287  0.292     0.363  0.402  0.345     0.407  0.523  0.383
Plane                    210    0.005  0.034  0.038     0.020  0.148  0.114     0.021  0.304  0.225
SonyAIBORobotS.          621    0.026  0.073  0.068     0.035  0.234  0.083     0.083  0.365  0.143
SonyAIBORobotS.II        980    0.034  0.063  0.067     0.037  0.212  0.119     0.071  0.362  0.166
StarLightCurves         9236    0.076  0.119  0.073     0.096  0.253  0.098     0.151  0.388  0.149
Symbols                 1020    0.023  0.061  0.031     0.029  0.196  0.036     0.066  0.332  0.069
SyntheticControl         600    0.020  0.076  0.017     0.028  0.227  0.058     0.068  0.355  0.096
SwedishLeaf             1125    0.170  0.206  0.197     0.189  0.328  0.216     0.247  0.461  0.257
Trace                    200    0.005  0.046  0.036     0.005  0.180  0.064     0.034  0.322  0.094
TwoLeadECG              1162    0.001  0.041  0.052     0.005  0.175  0.025     0.025  0.315  0.028
TwoPatterns             5000    0.001  0.065  0.007     0.014  0.236  0.019     0.068  0.384  0.079
Wafer                   7164    0.003  0.042  0.004     0.006  0.160  0.005     0.015  0.279  0.020
WordSynonyms             905    0.224  0.238  0.241     0.270  0.379  0.287     0.332  0.510  0.349
Yoga                    3300    0.071  0.099  0.114     0.085  0.223  0.115     0.123  0.332  0.190

Table I
CLASSIFICATION ERROR.
Figure 5. Average difference of the secondary models' performance (in RMSE) between using k=5 and k=10 for p = 5% noise, for each data set.

         Wafer              Two-Patterns       Chlorine
IEP      12.9 m / 0.22 s    19.8 m / 0.51 s    6.8 m / 0.23 s
k-NN     12.9 m / 0.06 s    19.8 m / 0.18 s    6.8 m / 0.04 s

Table III
EXECUTION TIMES: TRAINING TIME (OFFLINE) IN MINUTES AND PREDICTION TIME (ONLINE) IN SECONDS.
VI. CONCLUSION
We examined the problem of time-series classification based on the k-NN classifier and the DTW distance. Although the 1-NN classifier had been shown to be competitive with, if not superior to, many state-of-the-art time-series classification methods, we argued that in several cases we may not only consider k > 1 for the k-NN classifier, but also select k on an individual basis for each time series that has to be classified.
We proposed an IEP mechanism that considers a range of k-NN classifiers (for different k values) and uses secondary regression models to predict the error of each such classifier. The proposed approach selects, separately for each time series, the classifier with the minimum predicted error. This allows for adapting to characteristics that vary among the different regions of a data set and for overcoming the problem of selecting a single k value.
Our experimental evaluation used a large collection of real data sets. Our results indicate that the proposed method is more robust than and compares favorably against the two examined baselines, resulting in a significant reduction in classification error. Other advantageous properties of the proposed method are its low sensitivity to its parameters and the small overhead it adds in execution time.
It is important to state that the proposed IEP approach has several generic features. For the k-NN classifier, IEP can be employed for learning parameters other than k, such as the distance measure or the importance of nearest neighbors. More importantly, IEP is not limited to k-NN classification of time-series data, since it can be used in combination with other classification algorithms and data types whenever the complexity of the data requires such an individualized approach. Therefore, our future work involves the examination of IEP in the more general context of classification problems.
REFERENCES
[1] K. Buza, L. Schmidt-Thieme, Motif-based Classification of Time Series with Bayesian Networks and SVMs, Proc. of the 32nd Annual Conference of the Gesellschaft für Klassifikation (GfKl 2008), Springer Verlag, 2009.
[2] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, E. Keogh, Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures, VLDB '08, 2008.
[3] C. Domeniconi, D. Gunopulos, Adaptive Nearest Neighbor Classification using Support Vector Machines, NIPS, 2001.
[4] C. Domeniconi, J. Peng, D. Gunopulos, Locally Adaptive Metric Nearest-Neighbor Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
[5] N. Duffy, D. Helmbold, Boosting Methods for Regression, Machine Learning, 47, pp. 153–200, 2002.
[6] D. Eads, D. Hill, S. Davis, S. Perkins, J. Ma, R. Porter, J. Theiler, Genetic algorithms and support vector machines for time series classification, Proc. of the International Society for Optical Engineering (SPIE), volume 4787, pp. 74–85, 2002.
[7] P. Geurts, Pattern Extraction for Time Series Classification, PKDD, LNAI 2168, pp. 115–127, Springer Verlag, 2001.