
Correlation Based Dynamic Time Warping of

Multivariate Time Series

Zoltán Bankó, János Abonyi ∗


University of Pannonia, Department of Process Engineering P.O.Box. 158,
H-8200, Veszprem, Hungary

Abstract

The proper selection of a similarity measure is the cornerstone of any data
mining project. In recent years, Dynamic Time Warping (DTW) has replaced
Euclidean distance as the most widely used technique for univariate time
series comparison. However, the DTW similarity of multivariate time series
has not been examined in as much detail as it has been for univariate time
series. The direct comparison of the variables (even if DTW is applied)
ignores the hidden process, i.e. the correlation between the process
variables, and this hidden process often carries the real information. For
this reason, the Principal Component Analysis (PCA) similarity factor (SPCA)
is used in many real-time industrial applications, such as process
supervision in chemical plants.
This paper presents a new algorithm (CBDTW) for the comparison of multivariate
time series which combines the two approaches by generalizing DTW for the needs
of correlated multivariate time series. To preserve the correlation, the
multivariate time series have been segmented and SPCA has been selected as the
basis of DTW. The segments have been obtained by bottom-up segmentation using
special, PCA related costs. The new technique was evaluated on two databases,
the database of Signature Verification Competition 2004 and the commonly used
AUSLAN dataset. It has been shown that the presented segmentation and PCA
aided DTW method clearly outperforms the standard PCA similarity factor and
can compete with standard, Euclidean distance based multivariate DTW.

Key words: Dynamic Time Warping, multivariate time series, Principal
Component Analysis, segmentation, similarity

∗ Corresponding author.
Email address: abonyij@fmt.uni-pannon.hu (János Abonyi).
URL: www.fmt.uni-pannon.hu/softcomp (János Abonyi).

Preprint submitted to Elsevier 31 May 2009


1 Introduction

A time series is a sequence of values measured as a function of time. These
kinds of data are widely used in the field of process engineering [33], medicine
[37], bioinformatics [2], chemistry [3], finance [26] and even for tornado
prediction [25]. The increasing popularity of knowledge discovery and data
mining tasks for discrete data has indicated the growing need for similarly
efficient methods for time series databases. These tasks share a common
requirement: a similarity measure has to be defined between the elements of a
given database. Moreover, the results of the data mining methods, from simple
clustering (partitioning the data into coherent but not predefined subsets)
and classification (placing the data into predefined, labeled groups) to
complex decision-making systems, highly depend on the applied similarity
measure.

The similarity of multivariate time series can be approached from two different
perspectives. The first way is the application of metric based measures such as
Euclidean distance or their “warped” extensions, i.e. L2 norm based DTW or
Longest Common SubSequences (LCSS). These techniques are perfectly suitable
for univariate tasks like speech recognition, where the analyzed process is
represented by one variable only. At first sight, it may seem that these
methods can easily be generalized for the needs of multivariate time series;
however, their application for comparing time series where the process depends
on two or more variables is often not as effective as expected.

The reason for this unexpected inaccuracy is that a multivariate time series
is usually much more than a collection of univariate time series. The variable
of a univariate time series represents the process itself, thus two univariate
processes can be compared by using this one variable. However, two processes
represented by multivariate time series often cannot be compared by a direct
comparison of the variables, because the processes are described not only by
the variables but also by their relation. This relation is the correlation
between the variables and it can be treated as a hidden process which carries
the real information. A very good example of this hidden process based
approach is process monitoring in chemical plants. The monitoring of a
high-density polyethylene plant requires tracking more than 10 variables.
During monitoring, the tracked signals (polymer production intensity, hydrogen
input, ethylene input, etc.) are measured against the stored patterns to
detect any sign of malfunction. However, none of the monitored signals is
compared to its counterpart directly, because a deviation in one or more
signals does not automatically indicate a malfunction. Thus, Principal
Component Analysis (PCA) based similarity measures are used to compute the
hidden process and to do the comparison in real-time.

Although PCA considers the time series as a whole, it does not take into
account the alterations in the relationship between the variables. The main
goal of this paper is to construct a similarity measure which deals with
changes in the correlation structure of the variables as flexibly as DTW does.

1.1 Related works

The data mining community has been investigating different similarity measures
for univariate time series for a long time. The majority of these works have
focused on the Minkowski metrics (Lp norms), especially on the Euclidean
distance. Its popularity arose not only from its simple geometric
representation and fast computability, but also from the fact that the
Euclidean distance is preserved under orthogonal transformations such as the
Fourier and Wavelet Transforms and it satisfies the triangle inequality.
Technically speaking, the Euclidean distance allows the efficient indexing of
large time series databases, and the speed gained by indexing was the most
important consideration when a data mining application was developed in the
early 1990s.

Agrawal et al. presented the Fast Fourier Transform based indexing version of
Euclidean distance [5], which was improved in Reference [28] by using the last
few Fourier coefficients without storing them in the index. As the Wavelet
Transform started to replace the Fourier Transform in many areas, it was also
introduced for data mining purposes in Reference [27]. Finally, the GEMINI
framework [8] generalized the creation of indexing methods and provided
facilities for new, non-Euclidean distance based approaches which held out
the promise of higher accuracy. E. Keogh and his colleagues made big efforts
to overcome the problem of the computational time of Dynamic Time Warping,
and after some segmentation based solutions [17,20], exact indexing was
presented in Reference [15]. Based on this work, Vlachos et al. created an
indexing framework which supports several similarity measures for trajectory
datasets [36].

Increasing and ever cheaper computational power has made it possible to create
similarity measures without considering their indexing capabilities, for tasks
where the quality of the result is much more important than the speed of the
calculation. Non-linear matching techniques such as Dynamic Time Warping,
Longest Common SubSequences and Symbolic Approximation have been studied
extensively for these purposes. DTW proved its adaptability and superiority
over other similarity measures in a wide range of time series applications
from speech recognition [32] to fingerprint verification [34], and, as a final
proof of this, Yanikoglu won the Signature Verification Competition with a DTW
based algorithm in 2004 [1,21]. A year later Ratanamahatana et al. created a
database specific global constraint [29] and dispelled the common myths about
DTW [30]. For those who deal with univariate time series, it seemed that the
right similarity measure had been found.

Nowadays other new techniques are ready to be introduced to the time series
data mining community. Balasko et al. analyzed symbolic episode segmentation
based sequence alignment (a common tool for DNA sequence alignment) from the
viewpoint of process engineering [6]. In parallel with DTW studies,
multivariate monitoring and control schemes based on latent variable methods
have been receiving increasing attention from industrial practitioners. {...}
Several companies have enthusiastically adopted the methods and have reported
many success stories. Applications have been reported where multivariate
statistical process control, fault detection and diagnosis are achieved by
utilizing the latent variable space, for continuous and batch processes, as
well as for process transitions, for example start-ups and re-starts [23].
Seborg et al. have developed many modifications of the general PCA similarity
factor [13,10] for process engineering purposes and Yang [38] examined
standard and modified PCA similarity factors. More information on these PCA
related measures can be found in Appendix B.

Concluding from the above cited results, it is logical to combine the
strengths of DTW and the PCA similarity factor. In this paper a new and
intuitive method is proposed which is based on DTW aided by PCA and
segmentation, namely Correlation Based Dynamic Time Warping (CBDTW). The
coherent parts of a multivariate time series define segments; therefore,
segmentation can be applied to address the fact that the PCA similarity
factor does not take into account the alterations in the relationship between
the variables. These segments can be compared directly; however, DTW has been
used to improve the accuracy. This makes the presented similarity measure
invariant to phase shifts of the time axis and to differences in the number
of segments. It is also capable of compensating the “locally elastic” shifts
(local time warpings) of the time series. The presented algorithm has been
evaluated on two databases, the database of Signature Verification
Competition 2004 and the AUSLAN dataset which is widely used by the time
series data mining community [11]. It has been shown that CBDTW outperforms
the PCA similarity factor and can compete with Euclidean distance and
Euclidean distance based multivariate DTW in terms of accuracy.

The rest of the paper is organized as follows. Section 2, Appendix A and
Appendix B detail the theoretical background of the proposed similarity
measure, i.e. segmentation, Dynamic Time Warping and Principal Component
Analysis. In Section 3 the novel similarity measure is introduced in full
detail, while Section 4 conducts a detailed empirical comparison of the
presented method with other techniques. Finally, in Section 5, conclusions
and suggestions for future work are presented.

2 Theoretical background

Xn is an n-variable, m-element time series where xi is the ith variable and
xi(j) denotes its jth element:

Xn = [x1, x2, x3, . . . , xn],
xi = [xi(1), xi(2), . . . , xi(j), . . . , xi(m)]T        (1)

According to this notation, a multivariate time series can be represented by a
matrix in which each column represents a variable and each row represents a
sample of the multivariate time series at a given time:

                x1       x2      . . .   xn
[ Xn(1) ]   [ x1(1)   x2(1)   . . .   xn(1) ]
[ Xn(2) ] = [ x1(2)   x2(2)   . . .   xn(2) ]        (2)
[   ..  ]   [   ..      ..    . . .     ..  ]
[ Xn(m) ]   [ x1(m)   x2(m)   . . .   xn(m) ]

The similarity between Xn and Yn is denoted by s(Xn, Yn), where
0 ≤ s(Xn, Yn) ≤ 1, s(Xn, Yn) = s(Yn, Xn) and s(Xn, Xn) = 1. Obviously, the
similarity is nothing more than a real number between zero and one which is
intended to express the tightness of the connection between the processes
behind the time series. The closer this number is to one, the more similar
the processes are considered. In practice, the term dissimilarity or distance
(d(Xn, Yn)) is used instead of similarity. The properties of a distance are
the following:

• symmetric, i.e. d(Xn, Yn) = d(Yn, Xn)
• positive-definite, i.e. d(Xn, Yn) ≥ 0 and d(Xn, Yn) = 0 iff Xn = Yn
• 0 ≤ d(Xn, Yn) < ∞

If the value of the distance is given by a number ranging from zero to one,
it can be associated with similarity: s(Xn, Yn) = 1 − d(Xn, Yn). Please note
that the applied distance is not expected to satisfy the triangle inequality
if the distance is not used as the basis of indexing.

Getting valuable information from time series data by data mining methods
requires a good similarity measure; even so, the traditional approaches are
rarely precise enough for most applications. This is caused by the
brittleness of the conventional comparison methods. Distance measures such as
Euclidean distance are unable to handle distortions in the time axis, so these
distortions almost randomly affect the distance between time series. One
possible solution for this problem is the application of Dynamic Time
Warping, which can “warp” the original time series (nonlinearly dilate or
compress their time axes) to be as similar in shape to the query series as
possible. The algorithm of DTW is not discussed here; however, the interested
reader can find a short review of it in Appendix A.
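As a reference point for the later discussion, the core DTW recursion reviewed in Appendix A can be sketched as follows. This is a minimal, unconstrained quadratic-time variant with an absolute-difference base cost, not the paper's exact formulation:

```python
import numpy as np

def dtw_distance(x, y):
    """Unconstrained DTW between two univariate sequences.

    D[i, j] holds the minimal cumulative cost of aligning x[:i]
    with y[:j]; each step extends the warping path by a match,
    an insertion, or a deletion.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero, because the repeated value can be absorbed by the warping path, whereas the Euclidean distance between the interpolated sequences would not be.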

As was mentioned before, the time series should be segmented to consider the
alternation of the latent variable. Moreover, segmentation has another
advantage: it speeds up DTW, which is computationally expensive. In the next
section the basics of segmentation are introduced.

2.1 Segmentation

The ith segment of Xn is a set of consecutive time points, Si(a, b) =
[Xn(a); Xn(a + 1); . . . ; Xn(b)]. The c-segmentation of time series Xn is a
partition of Xn into c non-overlapping segments, S^c_Xn = [S1(1, a); S2(a +
1, b); . . . ; Sc(k, m)]. In other words, a c-segmentation splits Xn into c
disjoint time intervals, where 1 ≤ a and k ≤ m.

The segmentation problem can be framed in several ways [19], but its main
goal is always the same: finding homogeneous segments by the definition of
a cost function, cost(Si(a, b)). This function can be any arbitrary function
which maps from the space of multivariate time series to the space of the
non-negative real numbers. Usually, cost(Si(a, b)) is based on the distances
between the actual values of the time series and the values given by a simple
function f (a constant or linear function, or a polynomial of a higher but
limited degree) fitted to the data of each segment:

cost(Si(a, b)) = (1 / (b − a + 1)) Σ_{l=a..b} d(Xn(l), f(Xn(l)))        (3)

Thus, the segmentation algorithms simultaneously determine the parameters
of the models and the borders of the segments by minimizing the sum of the
costs of the individual segments:

cost(S^c_Xn) = Σ_{i=1..c} cost(Si(a, b))        (4)

The cost of a time series can be minimized by dynamic programming, which is
computationally intractable for many real datasets [12]. Consequently,
heuristic optimization techniques such as greedy top-down or bottom-up
techniques are frequently used to find good but suboptimal c-segmentations:

• Bottom-Up: Every element of Xn is handled as a segment. The costs of
merging adjacent segments are calculated and the two segments with the
minimum merging cost are merged. The merging cost calculation and the merging
are continued until some goal is reached.
• Top-Down: The whole Xn is handled as one segment. The cost of every
possible split is calculated and the one with the minimum cost is executed.
The splitting cost calculation and the splitting are continued recursively
until some goal is reached.
• Sliding Windows: The first segment is started with the first element of
Xn. This segment is grown until its cost exceeds a predefined value. The
next segment is started at the next element. The process is repeated until
the whole time series is segmented.

All of these segmentation methods have their own specific advantages and
drawbacks. For example, the sliding windows method is not able to divide a
sequence into a predefined number of segments, but it is the fastest method.
The applied method depends on the given task. In Reference [19] these
heuristic optimization techniques were examined in detail through the
application of Piecewise Linear Approximation. It can be said that if
real-time (on-line) segmentation is not required, the best result can be
reached by Bottom-Up segmentation.
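A minimal sketch of the Bottom-Up strategy described above may look like the following; the variance-around-the-mean cost is only an illustrative stand-in for the PCA related costs introduced later, and the function names are not from the paper:

```python
import numpy as np

def bottom_up_segment(X, num_segments, cost):
    """Greedy bottom-up segmentation of a multivariate series X
    (rows = time points). Starts from one segment per point and
    repeatedly merges the adjacent pair whose merged segment has
    the cheapest cost, until `num_segments` segments remain.
    `cost` maps a segment (a sub-array of rows) to a number.
    Returns the segment borders as (start, end) index pairs.
    """
    bounds = [(i, i + 1) for i in range(len(X))]
    while len(bounds) > num_segments:
        # cost of merging segment i with segment i + 1
        merge_costs = [cost(X[bounds[i][0]:bounds[i + 1][1]])
                       for i in range(len(bounds) - 1)]
        i = int(np.argmin(merge_costs))
        bounds[i] = (bounds[i][0], bounds[i + 1][1])
        del bounds[i + 1]
    return bounds

# an illustrative cost: total variance of the segment around its mean
var_cost = lambda S: float(((S - S.mean(axis=0)) ** 2).sum())
```

On a series with two clearly different regimes, the greedy merges recover the regime boundary without ever evaluating all possible c-segmentations, which is what makes the heuristic fast but suboptimal in general.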

3 Correlation Based Dynamic Time Warping of Multivariate Time Series

3.1 Drawbacks of DTW and PCA similarity factor

The distance between two multivariate time series has to be calculated in
the n-dimensional space. The traditional, Euclidean distance based DTW
between two sequences can be computed in two different ways. In the first
case, when the multidimensionality is considered, the points of the
n-dimensional space are paired and compared to each other rather than the
values of a given variable. Figure 1 shows that the effect of DTW is the
same as in the univariate case. This approach is useful when the number of
variables is only two or three, they are all measured on the same scale and
there are no big differences in their amplitudes and values. However, there
are two problems with this trajectory based approach: firstly, the variables
cannot be weighted

Fig. 1. Comparing trajectories of two signatures with Euclidean distance
based DTW (axes: X movement, Y movement, Time)
globally; and secondly, the differences in the scaling of the variables can
easily produce unwanted weighting, i.e. a change of 10% in variables with big
values counts more than a change of the same percentage in variables with
small values. To address this problem, z-normalization is usually used and/or
the DTW distance is computed between the same variables of the two
multivariate time series and the results are summarized with the help of a
weighting vector. The main difference between this method and the trajectory
based one is that the trajectory based method creates only one globally
optimal warping path while the second method creates warping paths for each
variable.
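The per-variable z-normalization mentioned above can be sketched as follows; the handling of constant variables is an illustrative choice, not prescribed by the paper:

```python
import numpy as np

def z_normalize(X):
    """Z-normalize each variable (column) of a multivariate series
    so that differences in scale do not act as implicit weights."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant variables untouched
    return (X - mu) / sigma
```

After this step every variable has zero mean and unit variance, so a 10% change in a large-valued variable no longer dominates the same relative change in a small-valued one.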

From the viewpoint of correlation, there is a serious problem with the
previously mentioned approaches. To demonstrate this, imagine two two-variable
time series, called X2 and Y2, which need to be compared to the constant
query Z2. If the first variable of X2 grows linearly while its second
variable decreases linearly, and the variables of Y2 do the same with the
same amplitude but in reverse order, then X2 and Y2 are at equal distances
from Z2 even with DTW. Admittedly, this example is artificial; however, it
clearly reveals the drawback of applying DTW to multivariate time series. 1

The handicap of PCA based methods is more obvious. They are not able to
handle the alternations of the correlation structure which affect the
hyperplanes; therefore, segmentation is required in most real-life
applications to create segments which are homogeneous from the viewpoint of
the correlation structure. However, the segmentation raises another problem:
although in many real-life

1 Other examples can be found in Reference [7].

applications a lot of variables must be simultaneously tracked and monitored,
most of the segmentation algorithms are used for the analysis of only one
time-variant variable [22]. Using only one variable for the segmentation of
multivariate time series is not precise enough when the correlation between
the variables is an important factor. Moreover, higher dimensional
segmentation problems, such as surface simplification [31], are much better
understood than their multivariate relative.

3.2 Proposed Algorithm and Evaluation Method

To overcome the above mentioned problems, the PCA based segmentation
presented in Reference [4] can be applied. The authors used the modified
Hotelling's T² statistic and the Q reconstruction error as measures of the
homogeneity of the segments, i.e. to construct the cost function. Figure 2
shows these two measures in the case of a 2-variable, 11-element time series.
The elements are represented by green ellipses and the two red arrows are the
principal components of the sequence. The black dot marks the intersection of
these axes, i.e. the center of the space defined by the principal components.
If the second principal component is ignored, two distances can be computed
for an element. The first one is the Euclidean distance between the original
data point and its mapping in the one-dimensional space. The blue arrow,
labeled Q, shows this distance. This reconstruction error can be computed for
the jth data point of the time series Xn as:

Q(j) = (Xn(j) − X̂n(j))(Xn(j) − X̂n(j))T,        (5)

where X̂n(j) is the jth predicted value of Xn. If the mean is subtracted from
Xn (its expected value is made zero), the value of Q(j) can be computed in
another way:

Q(j) = Xn(j)(I − U_{Xn,p} U^T_{Xn,p}) Xn(j)T,        (6)

where I is the identity matrix and U_{Xn,p} is the matrix of eigenvectors.
These eigenvectors belong to the most important p ≤ n eigenvalues of the
covariance matrix of Xn, thus they describe the hyperplanes. Please note that
this PCA based segmentation can be considered the natural extension of the
Piecewise Linear Approximation presented in Reference [17], i.e. both of them
define the cost function based on the Euclidean distance between the original
data and its mapping on the lower dimensional hyperplane.

The second measure which can be used to construct the cost function is the
modified Hotelling's T² statistic. This shows the distance of each element from the

Fig. 2. Measures that can be used for PCA model based segmentation


center of the data, hence it indicates the distribution of the projected
data. Its formula for the jth point is the following:

T²(j) = Yp(j) Yp(j)T,        (7)

where Yp(j) is the lower (p-)dimensional representation of Xn(j). The cost
functions can be defined as:

cost_Q(Si(a, b)) = (1 / (b − a + 1)) Σ_{j=a..b} Q(j)
cost_T²(Si(a, b)) = (1 / (b − a + 1)) Σ_{j=a..b} T²(j)        (8)
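A sketch of how the Q and T² based costs of Eq. (8) can be computed for a segment is given below; the SVD-based PCA fit and the function name are illustrative choices:

```python
import numpy as np

def pca_costs(S, p):
    """Average Q reconstruction error and modified T^2 statistic of
    a segment S (rows = time points, columns = variables) using the
    first p principal components, as in Eq. (8). The segment is
    mean-centred before the PCA model is fitted.
    """
    Sc = S - S.mean(axis=0)
    # right singular vectors = eigenvectors of the covariance matrix,
    # ordered by decreasing eigenvalue
    _, _, Vt = np.linalg.svd(Sc, full_matrices=False)
    U = Vt[:p].T                      # n x p loading matrix U_{Xn,p}
    Y = Sc @ U                        # p-dimensional scores Y_p
    residual = Sc - Y @ U.T           # part not captured by the model
    Q = (residual ** 2).sum(axis=1)   # Eq. (5)/(6), one value per point
    T2 = (Y ** 2).sum(axis=1)         # Eq. (7), no eigenvalue scaling
    return Q.mean(), T2.mean()
```

For a segment whose points lie exactly on a line, the average Q with p = 1 is zero, which is precisely the homogeneity property the segmentation cost rewards.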

Using one of the above mentioned PCA based segmentations, the Correlation
Based Dynamic Time Warping of multivariate time series can be realized. The
proposed method is intuitive yet simple:

• Segment the time series of the given database by correlation to create the
most homogeneous segments. The projection error or Hotelling's statistics
can be used as the basis of the cost function. This segmentation can be done
off-line in most cases.
• Segment the query time series according to the database.
• Calculate the distance between the query and the time series of the
database with Dynamic Time Warping. The base distance of DTW can be any
distance which arises from a covariance based similarity measure (e.g.
1 − sPCA). The size of matrix D̂ (the computation time of DTW) depends on
the number of the segments.
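The steps above can be sketched as follows. The spca function below uses one common (Krzanowski-style) form of the PCA similarity factor, the averaged squared cosines between the first p principal directions; the exact factor used in the paper is detailed in Appendix B, and the function names are illustrative:

```python
import numpy as np

def spca(S1, S2, p):
    """PCA similarity factor between two segments: the averaged
    squared cosines of the angles between their first p principal
    directions (one common form; see Appendix B)."""
    def loadings(S):
        _, _, Vt = np.linalg.svd(S - S.mean(axis=0), full_matrices=False)
        return Vt[:p].T
    U1, U2 = loadings(S1), loadings(S2)
    return float(np.trace(U1.T @ U2 @ U2.T @ U1)) / p

def cbdtw(segs_x, segs_y, p):
    """DTW over two lists of segments with 1 - s_PCA as base
    distance; matrix D has one cell per segment pair, so the cost
    of DTW depends on the number of segments, not of samples."""
    n, m = len(segs_x), len(segs_y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - spca(segs_x[i - 1], segs_y[j - 1], p)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Note that two segments with the same correlation structure but different amplitudes get a similarity of one, which is exactly the amplitude-invariance that the direct, point-wise comparison lacks.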

The validation of a similarity measure which is not optimized for a specific
database cannot be made theoretically; experimental evaluation must be made
on a wide range of different datasets. Unfortunately, there is no
Classification/Clustering Page for multivariate time series similar to the
one for univariate series [18]. So, to perform the validation of CBDTW, a
modified leave-one-out k nearest neighbor search algorithm was used.
Algorithm 1 shows its pseudocode. In the first for loop the precision array
is initialized, which is used to contain the values of precision at the given
values of recall. The precision is calculated for every item in the database
in the main for loop. The values of k (number of nearest neighbors) and r
(number of required items from the same class) are set for the actual time
series and the kNN search is performed in the while loop. If the number of
retrieved items from the same class (c) is equal to r, then the precision
array is updated and the values of k and r are incremented. If the number of
items from the same class is less than the required value (r), the algorithm
looks for more neighbors (increments the value of k). This continues until
the value of r exceeds r_items. For simplicity and according to Reference
[38], the value of r_items was chosen to be 10.

Using the precision array, a recall-precision graph can be plotted, which is
a common tool to measure and demonstrate the performance of Information
Retrieval (IR) systems [9]. The precision expresses the proportion of
relevant sequences among the retrieved sequences. Similarly, the recall is
the proportion of the relevant elements of the database retrieved by the k
nearest neighbor search. Note that the graph can be considered an extension
of the 1-nn search used in [18].

Before Algorithm 1 is executed, two important parameters have to be selected
for every dataset. The first one is the number of principal components (p),
because increasing the number of principal components decreases the
reconstruction error. If p equals the number of variables, the Q
reconstruction error becomes zero and the modified T² statistic becomes the
real distances in the whole dataset. On the other hand, if p is too small,
the reconstruction error will be large for the entire time series. In these
two extreme cases the segmentation is not based on the internal relationships
among the variables, so simple equidistant segments can be detected. To avoid
this, the value of p has to be selected carefully, i.e. the first p
eigenvalues should contain 95 − 99% of the total variance.

The other important parameter is the number of segments. It can be determined
by different techniques such as the method presented by Vasko et al. in
Reference [35]. This method is based on a permutation test which is used to
determine whether the increase of the model accuracy with the increase of the
Algorithm 1: The pseudocode of the modified leave-one-out k nearest neighbor
search for the Recall-Precision diagrams of Figure 5 and Figure 7
Input: N : the number of multivariate time series in the database
Input: k : the number of required nearest neighbors
Input: r_items: the number of relevant items
Output: precision: the array of the values of precision as the function of
recall
/* Create the result array */
for (i = 1; i ≤ r_items; i++) do
    precision[i] = 0;
end
/* Calculate the precision for every element in the dataset */
for (i = 1; i ≤ N; i++) do
    curr_item = ith item in the database;
    /* Number of nearest neighbors */
    k = 1;
    /* Number of requested relevant items */
    r = 1;
    while (r ≤ r_items) do
        /* Perform kNN search for curr_item; c is the number of
           items among the k retrieved items with the same label
           as curr_item */
        c = knnsearch(curr_item, k);
        if (c == r) then
            precision[r] = precision[r] + c/k;
            r = r + 1;
        end
        k = k + 1;
    end
end
for (i = 1; i ≤ r_items; i++) do
    precision[i] = precision[i]/N;
end
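A Python rendering of Algorithm 1 might look like the following; the representation of the database as a list of items with labels and an arbitrary pairwise dissimilarity dist is an assumption for illustration, and a guard is added so the loop terminates even if a class has fewer members than r_items:

```python
import numpy as np

def recall_precision(items, labels, dist, r_items=10):
    """Modified leave-one-out k-NN search of Algorithm 1.
    Returns precision[r] averaged over the database for
    r = 1 .. r_items relevant retrieved items."""
    N = len(items)
    precision = np.zeros(r_items)
    for i in range(N):
        # neighbours of item i, closest first, excluding i itself
        order = sorted((j for j in range(N) if j != i),
                       key=lambda j: dist(items[i], items[j]))
        k, r = 1, 1
        while r <= r_items and k <= len(order):
            # c: items among the k nearest with the same label as i
            c = sum(labels[j] == labels[i] for j in order[:k])
            if c == r:
                precision[r - 1] += c / k
                r += 1
            k += 1
    return precision / N
```

On a toy database where every item's nearest neighbours share its class, the returned precisions are all one, which is the upper-left corner of the recall-precision graph.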

number of segments is due to the underlying structure of the data or due to
the noise. In this paper a similar but much simpler method has been applied
for this purpose. It is based on the weighted modeling error, where the
weight is the number of the segments. To get a clearer picture, the relative
reduction of the modeling error is also used:

RR(S^c_Xn) = (cost(S^{c−1}_Xn) − cost(S^c_Xn)) / cost(S^{c−1}_Xn),        (9)

where RR(S^c_Xn) is the relative reduction of the error when c segments are
used instead of c − 1 segments.
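The relative reduction of Eq. (9) can be computed directly from the segmentation costs; the dictionary-based interface below is an illustrative choice:

```python
def relative_reductions(costs):
    """Eq. (9): relative reduction of the segmentation cost when c
    segments are used instead of c - 1. `costs` maps the number of
    segments c to cost(S^c_Xn); a value is produced for every c
    whose predecessor c - 1 is also present."""
    return {c: (costs[c - 1] - costs[c]) / costs[c - 1]
            for c in sorted(costs)
            if c - 1 in costs and costs[c - 1] > 0}
```

A large RR followed by small ones suggests that the extra segment captured real structure, while further segments only fit noise, so the number of segments can be chosen where RR levels off.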

4 Experimental Results

In the following, the detailed empirical comparison of the presented method
(CBDTW) with Euclidean distance (Euc), Euclidean distance based multivariate
DTW (MDTW, i.e. only one warping path is defined in the multidimensional
space of the time series and the Euclidean distance is used to calculate the
distance between the paired points), the PCA similarity factor (SPCA) and its
segmented version (SegPCA) is presented. To demonstrate the scalability of
the new method, different numbers of segments and hyperplanes have been
examined. Before the results are discussed, mention should be made of the
applied parameters to help the reproduction of the presented results.

Euclidean distance is not able to handle time series with different lengths,
and DTW tends to show its best when the sequences have the same length [30].
Hence, the time series have been interpolated to the average length in each
database for these two methods. Obviously, the PCA based methods do not
require such an action; however, the average length of the AUSLAN dataset is
only 57, so to prepare all of its time series for segmentation, linear
interpolation was used to obtain sequences with a length of 300. Other
important parameters for the segmentation based methods are the way of
segmentation and the applied cost function. The decision on framing the
segmentation was easy: according to Reference [19], the Bottom-Up algorithm
was used. The cost function also had an influence on the results; however,
the differences in the results do not affect the order of the similarity
measures. Thus, the results of segmentation based methods which used the
projection error as cost function were omitted for brevity. The distances
between the corresponding segments have arisen from the PCA similarity
factor, that is, 1 − sPCA.

The applied constraint on the warping path of DTW also has to be defined. In
References [29,30], E. Keogh and C. A. Ratanamahatana stated that “all the
evidence suggests that narrow constraints are necessary for accurate DTW ”,
which is obviously true for properly preprocessed datasets used in data
mining applications. However, sometimes there is no chance to do proper
preprocessing and compensate the initial/ending shifts of the time series in
real-time applications due to time or hardware limits. Moreover, the
difference between using the best warping path constraint (i.e., applying the
R–K Band [29]) and no constraint is often not significant even for
preprocessed datasets [18]. Thus, as a bulletproof solution, no constraint
has been applied. Please note that this decision does not affect the relation
between the results of multivariate DTW and the PCA based methods in the case
of the two applied datasets, but it makes it easier to reproduce the results.

Selecting databases for validation purposes is almost as hard as the creation
of a new and useful similarity measure. There is no argument that the best
data should come from the industrial world (production data, plant
supervision data, real-time sensor information provided for ECUs, etc.);
however, these kinds of data are rarely allowed to be published. Thus,
according to Reference [16], two datasets were selected for validation
purposes which are available on the Internet. An important aspect of the
selection of these databases was the decided difference between their hidden
processes, which can affect the efficiency of any PCA based similarity
measure. If the hidden process can be discovered easily and the correlation
of the variables does not vary within a class, then the PCA based similarity
measures are not as effective as conventional measures like Euclidean
distance or DTW.

4.1 SVC2004

Considering the above, the first selected database is the one created for the Signature Verification Competition 2004 (SVC2004) [1]. It has 40 sets of signature data; each set contains 20 genuine signatures from one contributor and 20 skilled forgeries from five other contributors. Although both genuine signatures and skilled forgeries are available, only the 800 genuine signatures were used for validation. In each signature file the signature is represented as a sequence of points, and the first line stores a single integer, the total number of points in the signature; from these integers an average length of 184 can be computed. The signatures were collected on a WACOM Intuos tablet, hence seven parameters of the pen could be measured during the enrollment process:

• X-coordinate - scaled cursor position along the x-axis
• Y-coordinate - scaled cursor position along the y-axis
• Time stamp - system time at which the event was posted (not used in this paper)
• Button status - current button status (0 for pen-up and 1 for pen-down) (not used in this paper)
• Azimuth - clockwise rotation of the cursor about the z-axis
• Altitude - angle upward toward the positive z-axis
• Pressure - adjusted state of the normal pressure

Please note, the time stamps and the button status were not used in this paper. The reason behind this decision is simple: they do not add much to the accuracy unless special handwriting recognition techniques are used. Furthermore, the usage of time stamps requires an extra interpolation step, which makes it harder to reproduce the results.

Fig. 3. Signature from the database of SVC2004: the x and y coordinates plotted against time (ms). The horizontal lines mark the major changes in covariance between the time series of the two coordinates x and y

This is an ideal dataset for any segmentation based method because the correlation alternates many times between the variables, as illustrated in Figure 3. However, the hidden process is not as hidden as one would expect: only one of the coordinates would suffice to represent the whole signature, due to the fact that the variables usually change in the same way at the same place for a given participant.

For the validation algorithm, the number of segments and the number of principal components have to be chosen in advance for the PCA based methods. The number of hyperplanes can be determined from the desired accuracy (loss of variance) of the PCA models. The first one, two, three and four eigenvalues describe 79.84%, 99.04%, 99.91% and 99.98% of the total variance, respectively. To demonstrate the dependency of the PCA based methods on the tightness of the representation, both 2- and 4-dimensional hyperplanes have been used for these techniques.

The number of segments has been determined by plotting the weighted cost and its relative reduction rate. Figure 4 shows cost_T2 as a function of the number of segments; from the viewpoint of relative error reduction, 8 segments already seem enough for a good segmentation. However, as Figure 3 shows, 8 segments are not enough to characterize a signature well (and to take full advantage of Dynamic Time Warping), so the number of segments was chosen to be 20.
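The knee-based choice described above can be automated. The following sketch is our own illustration, not part of the paper: the function name and the 5% threshold are assumptions, and the cost values are a made-up curve, not the SVC2004 data.

```python
import numpy as np

def pick_segment_count(costs, min_reduction=0.05):
    """Hypothetical helper: costs[k] is the segmentation cost obtained with
    k+1 segments (e.g. a cost_T2 curve like the one in Fig. 4). Returns the
    smallest segment count after which adding one more segment reduces the
    cost by less than `min_reduction` relative to the current cost."""
    costs = np.asarray(costs, dtype=float)
    # relative reduction achieved by going from k to k+1 segments
    rel = (costs[:-1] - costs[1:]) / costs[:-1]
    for k, r in enumerate(rel, start=2):  # k = segment count after the step
        if r < min_reduction:
            return k - 1
    return len(costs)

# A sharply decreasing cost curve that flattens after 3 segments:
print(pick_segment_count([100.0, 50.0, 25.0, 24.0, 23.9]))  # → 3
```

As the section notes, such a knee value is only a lower bound in practice; for SVC2004 the knee at 8 segments was overridden and 20 segments were used.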

Fig. 4. The cost_T2 and its relative reduction rate as a function of the number of segments in case of the SVC2004 dataset using two hyperplanes

The results of Algorithm 1 for this database are shown in Figure 5. In the light of the results on SVC2004, and considering the fact that the hidden process is not "hidden enough", it is not surprising that Euclidean distance and its warped version outperform all other methods. From the CBDTW point of view, the relations of the PCA based techniques are much more interesting: it is clearly seen that the third and fourth hyperplanes do not add any useful information to the first two, but they make the classification less effective because the PCA similarity factor weights the eigenvectors equally. The graph of SegPCA shows that this technique really outperforms the segmentation free version; however, the application of CBDTW improved the precision even further.

Fig. 5. The Recall-Precision graph of the SVC2004 dataset

4.2 AUSLAN

The high quality version of the Australian Sign Language dataset (AUSLAN), collected by Mohammed Waleed Kadous [14], has also been selected for validation purposes. It contains 95 signs, each with 27 examples, so the total number of signs is 2565. Two 5DT gloves as well as two Flock-of-Birds magnetic position trackers with a refresh rate of 100 Hz were used by a native signer. The signals of 11 channels were measured from each hand: the position trackers measured the x, y, z coordinates and the roll, pitch and yaw of each hand, while the gloves provided the bend data of the five fingers. Position and orientation were defined to 14-bit accuracy, giving position information with a typical positional error of less than one centimeter and an angle error of less than half a degree. Finger bend was measured with 8 bits per finger. The full dataset is available from the UCI KDD Archive [11]; the high quality version used in this publication was provided by E. J. Keogh [18].

AUSLAN is a much more complex dataset than SVC2004. It has 22 variables, and many of them are 0 most of the time; as a consequence, the underlying process is much more hidden than it was for SVC2004. In addition, the average length of the sequences in the database is only 57, so some preprocessing steps are needed. The Correlation Based Dynamic Time Warping method requires at least 10-20 segments for effective warping, and this cannot be guaranteed at this average length, because a segment has 0 cost_Q as long as its number of elements does not exceed the number of principal components. Hence, the sequences were linearly interpolated 2 to a sufficiently large fixed length, which was chosen to be 300. Please note, this interpolation was not performed for the comparison with the PCA similarity factor.
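The interpolation step can be sketched with NumPy's np.interp in place of Matlab's interp1; the function name and the rows-are-samples layout are our own assumptions.

```python
import numpy as np

def stretch_to_length(X, target_len=300):
    """Linearly resample a multivariate series (rows = time samples,
    columns = variables) to a fixed length, analogous to Matlab's interp1
    as used in the paper."""
    X = np.asarray(X, dtype=float)
    old_t = np.linspace(0.0, 1.0, X.shape[0])
    new_t = np.linspace(0.0, 1.0, target_len)
    # interpolate each variable independently along the (normalized) time axis
    return np.column_stack([np.interp(new_t, old_t, X[:, v])
                            for v in range(X.shape[1])])

short = np.random.default_rng(0).normal(size=(57, 22))  # AUSLAN-like shape
long_ = stretch_to_length(short)
print(long_.shape)  # → (300, 22)
```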

The first one, two, three and four eigenvectors describe 92.04%, 97.53%, 98.95% and 99.51% of the total variance; thus – as for the SVC2004 dataset – the first two and four eigenvectors were selected to provide the basis for the PCA based methods. The weighted error and its relative reduction rate can be seen in Figure 6. According to this, 20 segments seemed as appropriate as it was for SVC2004.

The results of the validation are shown in Figure 7. The high values of the PCA similarity factor based methods show that the correlation between the variables is harder to reveal than in the SVC2004 dataset, and that it characterizes the classes much more. Considering how many variables had to be tracked to observe the underlying processes (i.e., 22 variables had to be monitored to record the signs, which can be expressed almost as well using only 2 "hidden" variables), it is not surprising at all that Euclidean distance and its time warped extension do not perform as well as they did for SVC2004.

2 The interp1 function of Matlab was used.

Fig. 6. The cost_T2 and its relative reduction rate as a function of the number of segments in case of the AUSLAN dataset using two hyperplanes

CBDTW proved its superiority over all the other methods; however, it can be criticized for two reasons: the simple PCA similarity factor (using four eigenvectors) provides almost the same result, and the multivariate DTW can be fine-tuned by constraining its warping path to also provide almost as good results as SPCA 99 [38]. Thus, CBDTW has been executed again using different numbers of segments. The results can be seen in Figure 8.

Fig. 7. The Recall-Precision graph of the AUSLAN dataset

Fig. 8. The Recall-Precision graph of the AUSLAN dataset for CBDTW using different numbers of segments

5 Conclusion and Future Work


In this paper a novel similarity measure, CBDTW, has been presented for multivariate time series. The algorithm is based on covariance driven segmentation and Dynamic Time Warping. Two homogeneity measures have been introduced as cost functions for the segmentation, corresponding to the two typical applications of PCA models: the Q reconstruction error can be used to segment the time series according to the change of the correlation among the variables, while the modified Hotelling's T2 statistic segments the time series based on the drift of the center of the operating region. Dynamic Time Warping was utilized to compensate the time shifts and make the measure more accurate. The distances between the segments were calculated using the simple PCA similarity factor.

To prove that CBDTW can outperform the PCA similarity factor in any environment, it was tested on two datasets which differ from a correlation point of view. AUSLAN has 22 variables with a complex correlation structure; it was selected to simulate "typical" industrial data, i.e. data where a large number of variables exist and their correlation structure cannot be revealed without the application of PCA. The algorithm was also tested on the SVC2004 dataset, in which, contrary to AUSLAN, the correlation between the variables is obvious.

The recall-precision graphs showed the superiority of CBDTW over the PCA similarity factor – irrespective of the complexity of the hidden process – and it even outperforms Euclidean distance based Dynamic Time Warping when a high number of variables with a complex correlation structure has to be handled. The results showed that the proposed algorithm can replace the standard PCA similarity factor in many areas, such as distinguishing and clustering typical operational conditions and analyzing product grade transitions of process systems. Moreover, CBDTW can be used for data mining purposes when indexing capability is not required.

The results of the validations are more than promising; however, there is room for future research, and we intend to continue our work in two directions. First, the automation of parameter selection (type of the cost function, number of segments, constraining of the warping path, type of the basis similarity measure such as sEros or sλPCA, etc.) could simplify the adaptation of the algorithm and make its application easier. Second, there is no lower bounding function for the PCA similarity factor; we would like to make lower bounding possible by using new correlation based similarity measures.

6 Acknowledgements

The authors acknowledge the financial support of the Cooperative Research Centre (VIKKK, project 2004-I), the Hungarian Research Fund (OTKA 49534), the Bolyai János fellowship of the Hungarian Academy of Sciences and the Öveges fellowship. The authors also thank Prof. Eamonn J. Keogh for providing the AUSLAN dataset from the Time Series Data Mining Archive.

A Dynamic Time Warping

In the following, the DTW algorithm is reviewed for univariate time series x and y. To align these sequences, first a grid D of size n × m has to be defined, where each cell represents the distance between the corresponding elements of the two time series. In this step any application-dependent distance – like the L1 and L∞ norms – can be chosen, but generally the Euclidean distance is suggested because of its popularity and because it allows the efficient indexing of Dynamic Time Warping [15]. Considering this:

\[ D(i, j) = \sqrt{(x(i) - y(j))^2} \tag{A.1} \]

Using grid D, arbitrary mappings – called warping paths – can be created between x and y. The construction of a warping path [p(1), p(2), . . . , p(l)] has to be restricted by the following constraints:

• Boundary condition: the path has to start in D(1, 1) and end in D(n, m).
• Monotonicity: the path has to be monotonous, i.e. always heading from D(1, 1) towards D(n, m): if p(k) = D(i, j) and p(k + 1) = D(i′, j′), then i′ − i ≥ 0 and j′ − j ≥ 0.
• Continuity: the path has to be continuous: if p(k) = D(i, j) and p(k + 1) = D(i′, j′), then i′ − i ≤ 1 and j′ − j ≤ 1.

Thus, the length of a path is at least the length of the longer time series (max(m, n)), and its maximal length is the sum of the lengths of the two series minus one (m + n − 1). Although the above restrictions reduce the number of eligible warping paths, many of them still remain. To find the optimal warping path (which defines the DTW distance of the two time series), every warping path is assigned a cost, based on the sum of the values of the affected cells divided by a normalization constant K:

\[ d_{DTW}(x, y) = \min\left( \frac{\sqrt{\sum_{i=1}^{l} p(i)}}{K} \right) \tag{A.2} \]

The value of K depends on the application; in most cases it is the length of the path, but it can also be omitted. More information about the method of defining K and its significance can be found in [32]. Note that the Euclidean distance is a special case of DTW in which the two time series have the same length, the path is located on the diagonal of grid D and K = 1.

Obviously, the number of paths grows exponentially with the size of the time series. Fortunately, the optimal path can be found in O(nm) time with the help of dynamic programming using the cumulative distance matrix \( \hat{D} \). Cells of the cumulative matrix contain the sum of the value of the corresponding cell in matrix D and the minimum of the three cells from which the cell can be reached:

\[ \hat{D}(i, j) = \min\left( \hat{D}(i-1, j) + D(i, j),\; \hat{D}(i, j-1) + D(i, j),\; \hat{D}(i-1, j-1) + D(i, j) \right) \tag{A.3} \]

Fig. A.1. Cumulative distance matrix \( \hat{D} \) and the optimal warping path on it

The DTW distance between the two time series can be found in \( \hat{D}(n, m) \). Please note, the extension of DTW to multivariate time series is quite simple: only x and y have to be replaced by X_n and Y_n, and a distance has to be defined between their elements to fill matrix \( \hat{D} \).
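The recursion (A.3) can be sketched in a few lines of Python; this is a minimal, unconstrained implementation with our own naming, and the normalization constant K is omitted (K = 1).

```python
import numpy as np

def dtw_distance(x, y):
    """Unconstrained DTW distance between two univariate sequences,
    following Eqs. (A.1) and (A.3) with K = 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Local distance grid D(i, j) = sqrt((x(i) - y(j))^2) = |x(i) - y(j)|
    D = np.abs(np.subtract.outer(x, y))
    # Cumulative distance matrix, filled by dynamic programming
    C = np.full((n, m), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # minimum over the three cells from which (i, j) can be reached
            prev = min(C[i - 1, j] if i > 0 else np.inf,
                       C[i, j - 1] if j > 0 else np.inf,
                       C[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            C[i, j] = D[i, j] + prev
    return C[n - 1, m - 1]
```

Replacing the scalar local distance by a vector norm between the rows of X_n and Y_n gives the multivariate variant mentioned above.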

B PCA Similarity Factor

PCA is a well known and widely used multivariate statistical technique for reducing the dimensionality of data. Fundamentally, it is a linear transformation that projects the original (Gaussian distributed) data into a new coordinate system with minimal loss of information. In the multivariate case the information is the structure of the original data, i.e. the correlation between the variables and the alteration of the correlation structure. To create a projection based on this, PCA selects the coordinate axes of the new coordinate system one by one according to the greatest variance of any projection.

Fig. B.1. Dimension reduction with the help of PCA from 3 (filled dots) to 2 (empty dots) dimensions. Notice that the correlation between the dots is maximally preserved.

In the projected space, the first coordinate (first principal component) is the vector which has maximal variance in the original data space; the second coordinate (second principal component) has maximal variance among all vectors which are orthogonal to the first coordinate; the third coordinate is orthogonal to the first two and has maximal variance in the original data space, and so on. Technically speaking, the eigenvectors of the covariance matrix of the original data (in descending order of the corresponding eigenvalues) are selected as the axes of the new space. In most real multivariate processes the high-order principal components do not add any significant information to the low-order components, so they can be omitted when dimensionality reduction is performed.

Krzanowski [24] defined the PCA similarity factor to measure the similarity between different datasets by comparing their hyperplanes (the dimensionally reduced new coordinate systems). Let X_n and Y_n be two multivariate time series with n variables, and let U_{X_n,p} and U_{Y_n,p} denote the matrices of the eigenvectors which belong to the most important p ≤ n eigenvalues of the covariance matrices of X_n and Y_n, i.e. the two hyperplanes. The similarity between them is:

\[ s_{PCA}(X_n, Y_n) = \frac{\operatorname{tr}\left(U_{X_n,p}^T U_{Y_n,p} U_{Y_n,p}^T U_{X_n,p}\right)}{p} \tag{B.1} \]

The similarity factor has a geometrical explanation: it measures the similarity between the two hyperplanes by computing the squared cosine values between all combinations of the first p principal components from U_{X_n,p} and U_{Y_n,p}:

\[ s_{PCA}(X_n, Y_n) = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \cos^2 \Theta_{i,j}, \tag{B.2} \]

where \( \Theta_{i,j} \) is the angle between the ith principal component of X_n and the jth principal component of Y_n.
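Equations (B.1)-(B.2) can be sketched as a minimal NumPy implementation; the function and variable names are our own, and rows are assumed to be samples with columns as variables.

```python
import numpy as np

def s_pca(X, Y, p):
    """PCA similarity factor of Eq. (B.1): compare the p-dimensional
    principal hyperplanes of two multivariate series."""
    def top_eigvecs(Z, p):
        Z = Z - Z.mean(axis=0)
        # eigenvectors of the covariance matrix, by descending eigenvalue
        vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
        return vecs[:, np.argsort(vals)[::-1][:p]]
    Ux = top_eigvecs(X, p)
    Uy = top_eigvecs(Y, p)
    M = Ux.T @ Uy                        # entries are cos(Theta_ij)
    return float(np.trace(M @ M.T) / p)  # = (1/p) * sum of cos^2(Theta_ij)
```

For two identical series the factor is 1; the segment distance used by CBDTW in Section 4 is then simply 1 − s_PCA.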

The main advantage of this method is its optimal variable reduction property, which makes it ideal for tasks with numerous variables such as process engineering applications. The PCA similarity factor has also gained popularity because of its outstanding ability to recognize the direction of the changes in the relations between the variables, i.e. the "rotation" of the hyperplanes.

The PCA similarity factor weights all principal components equally, hence it may not capture the degree of similarity between the sequences well enough when only one or two principal components explain most of the variance. Thus, it was natural to define a modified PCA similarity factor that weights each principal component by its explained variance. M. C. Johannesmeyer [13] defined this modified PCA similarity factor by weighting each principal component with its eigenvalue:

\[ s_{PCA}^{\lambda}(X_n, Y_n) = \frac{\sum_{i=1}^{p}\sum_{j=1}^{p} \left(\lambda_i^{X_n} \lambda_j^{Y_n}\right) \cos^2 \Theta_{i,j}}{\sum_{i=1}^{p} \lambda_i^{X_n} \lambda_i^{Y_n}}, \tag{B.3} \]

where \( \lambda_i^{X_n} \) and \( \lambda_j^{Y_n} \) are the corresponding eigenvalues for the ith and jth principal components of X_n and Y_n.
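The eigenvalue-weighted factor of Eq. (B.3) differs from the sketch of s_PCA above only in the weighting; again, the naming and the rows-are-samples layout are our assumptions.

```python
import numpy as np

def s_lambda_pca(X, Y, p):
    """Eigenvalue-weighted PCA similarity factor of Eq. (B.3)."""
    def pca(Z, p):
        Z = Z - Z.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
        order = np.argsort(vals)[::-1][:p]
        return vals[order], vecs[:, order]
    lam_x, Ux = pca(X, p)
    lam_y, Uy = pca(Y, p)
    cos2 = (Ux.T @ Uy) ** 2                       # cos^2 of all p*p angles
    num = (np.outer(lam_x, lam_y) * cos2).sum()   # lambda_i^X lambda_j^Y weights
    den = np.dot(lam_x, lam_y)                    # sum_i lambda_i^X lambda_i^Y
    return float(num / den)
```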

This thought was developed further by K. Yang [38], who presented Eros (Extended Frobenius Norm), the logical extension of the PCA similarity factor and \( s_{PCA}^{\lambda} \). For two n-variable time series X_n and Y_n, and for the corresponding eigenvector matrices U_{X_n} and U_{Y_n}, the Eros similarity is defined as:

\[ s_{Eros}(X_n, Y_n) = \sum_{i=1}^{n} w_i \left| U_{X_n}(i)\, U_{Y_n}(i)^T \right| = \sum_{i=1}^{n} w_i \left| \cos \Theta_i \right|, \tag{B.4} \]

where \( \Theta_i \) is the angle between the two corresponding principal components and w is the weighting vector, which is based on the eigenvalues of the sequences in the data set.

Fig. B.2. Two corresponding principal components a and b

Generally speaking, Eros measures the similarity of two multivariate time series by comparing the angles between the corresponding principal components and using the aggregated eigenvalues as weights; hence it takes the variance of each principal component into account. It has to be noted that Eros always computes the acute angle between the two corresponding principal components (eigenvectors). Therefore, as illustrated in Figure B.2, when the angle (α) between the two corresponding eigenvectors is not acute, the similarity between them is computed by using the acute angle (β). More specifically, the inner product of the two unit vectors a and b in Figure B.2 yields cos α, while cos β = cos(π − α) = −cos α is needed. Therefore, the absolute value of the cosine of the angle between the corresponding eigenvectors is taken, so that cos α is computed when α ≤ π/2 and −cos α is computed when α > π/2.
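Eq. (B.4) can be sketched similarly; here the weight vector w, which the paper derives from eigenvalues aggregated over the whole dataset, is taken as a given normalized input (an assumption of this sketch, as is the naming).

```python
import numpy as np

def eros(X, Y, w):
    """Eros similarity of Eq. (B.4): weighted sum of |cos Theta_i| between
    corresponding principal components; w is a normalized weight vector."""
    def eigvecs(Z):
        Z = Z - Z.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
        return vecs[:, np.argsort(vals)[::-1]]
    Ux = eigvecs(X)
    Uy = eigvecs(Y)
    # |cos Theta_i| between the i-th principal components; the absolute
    # value ensures the acute angle is used, as Fig. B.2 illustrates
    cosines = np.abs(np.sum(Ux * Uy, axis=0))
    return float(np.dot(w, cosines))
```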

References

[1] Signature Verification Competition, http://www.cs.ust.hk/svc2004/.

[2] J. Aach, G. M. Church, Aligning gene expression time series with time warping algorithms, Bioinformatics 17.

[3] J. Abonyi, B. Feil, S. Nemeth, P. Arva, Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series, Fuzzy Sets and Systems 149 (1) (2005) 39–56.

[4] J. Abonyi, B. Feil, S. Nemeth, P. Arva, Principal component analysis based time series segmentation, in: IEEE International Conference on Computational Cybernetics, 2005.

[5] R. Agrawal, C. Faloutsos, A. N. Swami, Efficient similarity search in sequence databases, in: D. Lomet (ed.), Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO), Springer Verlag, Chicago, Illinois, 1993.

[6] B. Balaskó, Z. Bankó, J. Abonyi, Analyzing trends by symbolic episode representation and sequence alignment (2007).

[7] Z. Bankó, Correlation based dynamic time warping: A novel method for comparing multivariate time series, M.Sc. thesis, University of Pannonia, Hungary (2007).

[8] C. Faloutsos, M. Ranganathan, Y. Manolopoulos, Fast subsequence matching in time-series databases, in: Proceedings 1994 ACM SIGMOD Conference, Minneapolis, MN, 1994.

[9] W. B. Frakes, R. Baeza-Yates, Information Retrieval: Data Structures and Algorithms, Prentice-Hall, Upper Saddle River, NJ, USA, 1992.

[10] J. C. Gunther, J. S. Conner, D. E. Seborg, Fault detection and diagnosis in an industrial fed-batch cell culture process, Biotechnology Progress 23 (4) (2008) 851–857.

[11] S. Hettich, S. D. Bay, The UCI KDD Archive, http://kdd.ics.uci.edu, University of California, Department of Information and Computer Science (1999).

[12] J. Himberg, K. Korpiaho, H. Mannila, J. Tikanmaki, H. Toivonen, Time series segmentation for context recognition in mobile devices, in: ICDM, 2001.

[13] M. C. Johannesmeyer, Abnormal situation analysis using pattern recognition techniques and historical data, M.Sc. thesis, University of California, Santa Barbara, CA (1999).

[14] M. W. Kadous, Temporal classification: Extending the classification paradigm to multivariate time series, Ph.D. thesis, School of Computer Science & Engineering, University of New South Wales (2002).

[15] E. Keogh, Exact indexing of dynamic time warping, in: VLDB, 2002.

[16] E. Keogh, S. Kasetty, On the need for time series data mining benchmarks: a survey and empirical demonstration, in: KDD '02: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.

[17] E. Keogh, M. Pazzani, Scaling up dynamic time warping to massive datasets, in: J. M. Zytkow, J. Rauch (eds.), 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'99), vol. 1704, Springer, Prague, Czech Republic, 1999.

[18] E. Keogh, X. Xi, L. Wei, C. A. Ratanamahatana, The UCR time series classification/clustering homepage, http://www.cs.ucr.edu/~eamonn/time_series_data/, University of California, Riverside, Computer Science and Engineering Department (2006).

[19] E. J. Keogh, S. Chu, D. Hart, M. J. Pazzani, An online algorithm for segmenting time series, in: ICDM, 2001.

[20] E. J. Keogh, M. J. Pazzani, Scaling up dynamic time warping for datamining applications, in: Knowledge Discovery and Data Mining, 2000.

[21] A. Kholmatov, B. A. Yanikoglu, Biometric authentication using online signatures, in: ISCIS, 2004.

[22] S. Kivikunnas, Overview of process trend analysis methods and applications, in: ERUDIT Workshop on Applications in Pulp and Paper Industry, ERUDIT, 1998.

[23] T. Kourti, Application of latent variable methods to process control and multivariate statistical process control in industry, International Journal of Adaptive Control and Signal Processing 19 (4) (2005) 213–246.

[24] W. Krzanowski, Between-groups comparison of principal components, Journal of the American Statistical Association, 1979.

[25] A. McGovern, D. Rosendahl, A. Kruger, M. Beaton, R. Brown, K. Droegemeier, Understanding the formation of tornadoes through data mining, in: 5th Conference on Artificial Intelligence and its Applications to Environmental Sciences at the American Meteorological Society Annual Conference, 2007.

[26] T. Podding, C. Huber, Data mining for the detection of turning points in financial time series, in: IDA '99: Proceedings of the Third International Symposium on Advances in Intelligent Data Analysis, Springer-Verlag, 1999.

[27] K.-P. Chan, A. W.-C. Fu, Efficient time series matching by wavelets, in: ICDE, 1999.

[28] D. Rafiei, A. Mendelzon, Similarity-based queries for time series data, in: SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 1997.

[29] C. A. Ratanamahatana, E. J. Keogh, Making time-series classification more accurate using learned constraints, in: SDM, 2004.

[30] C. A. Ratanamahatana, E. J. Keogh, Three myths about dynamic time warping data mining, in: SDM, 2005.

[31] P. S. Heckbert, M. Garland, Survey of polygonal surface simplification algorithms, in: Proceedings of the 24th International Conference on Computer Graphics and Interactive Techniques, Multiresolution Surface Modeling Course, 1997.

[32] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, February 1978.

[33] A. Singhal, D. E. Seborg, Matching patterns from historical data using PCA and distance similarity factors, in: Proceedings of the American Control Conference, 2001.

[34] Z. M. K. Vajna, A fingerprint verification system based on triangular matching and dynamic time warping, IEEE Trans. Pattern Anal. Mach. Intell. 22 (11) (2000) 1266–1276.

[35] K. T. Vasko, H. T. T. Toivonen, Estimating the number of segments in time series data using permutation tests, in: ICDM '02: Proceedings of the 2002 IEEE International Conference on Data Mining, IEEE Computer Society, Washington, DC, USA, 2002.

[36] M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, E. Keogh, Indexing multi-dimensional time-series with support for multiple distance measures, in: KDD '03: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.

[37] H. J. L. M. Vullings, M. H. G. Verhaegen, H. B. Verbruggen, ECG segmentation using time-warping, in: IDA '97: Proceedings of the Second International Symposium on Advances in Intelligent Data Analysis, Reasoning about Data, Springer-Verlag, London, UK, 1997.

[38] K. Yang, C. Shahabi, A PCA-based similarity measure for multivariate time series, in: MMDB '04: Proceedings of the 2nd ACM International Workshop on Multimedia Databases, ACM Press, 2004.
