
Time Series (1)

Information Systems M
Prof. Paolo Ciaccia
http://www-db.deis.unibo.it/courses/SI-M/

Time series are everywhere


Time series, that is, sequences of observations (samples) made through time, are present in everyday life:
- Temperature, rainfalls, seismic traces
- Weblogs
- Stock prices
- EEG, ECG, blood pressure
- Enrolled students at the Engineering Faculty

This as well as many of the following figures/examples are taken from the tutorial given by Eamonn Keogh at SBBD 2002 (XVII Brazilian Symposium on Databases)

www.cs.ucr.edu/~eamonn/

Time series (1)

Sistemi Informativi M

Why is similarity search so important?


Similarity search can help you in:
- Looking for occurrences of known patterns
- Discovering unknown patterns
- Putting things together (clustering)
- Classifying new data
- Predicting/extrapolating future behaviors


Why is similarity search difficult?


Consider a large time series DB, e.g.:
- 1 hour of ECG data: 1 GByte
- Typical Weblog: 5 GBytes per week
- Space Shuttle DB: 158 GBytes
- MACHO Astronomical DB: 2 TBytes, updated with 3 GBytes a day (20 million stars recorded nightly for 4 years) http://wwwmacho.anu.edu.au/

First problem: large database size
Second problem: subjectivity of similarity evaluation


How to measure similarity


Given two time series s and q of equal length D, the usual way to measure their (dis-)similarity is based on the Euclidean distance:

    L2(s, q) = ( Σ_{t=0..D-1} (s_t - q_t)^2 )^(1/2)

However, with Euclidean distance we have to face two basic problems:
1. High dimensionality: (very) large D values
2. Sensitivity to the alignment of values

For problem 1 we need to define effective lower-bounding techniques that work in a (much) lower-dimensional space
For problem 2 we will introduce a new similarity criterion

Pre-processing
Directly comparing two time series (e.g., using Euclidean distance) might lead to counter-intuitive results
This is because Euclidean distance, as well as other measures, is very sensitive to some data features that may conflict with the subjective notion of similarity
Thus, a good idea is to pre-process time series in order to remove such features


Offset translation
Subtract from each sample the mean value of the series
[Figure: series q and s, and the distance d(q, s) before and after offset translation]

q = q - mean(q)
s = s - mean(s)


Amplitude scaling
Normalize the amplitude (divide by the standard deviation of the series)

[Figure: series q and s before and after amplitude scaling]

q = (q - mean(q)) / std(q)
s = (s - mean(s)) / std(s)
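Offset translation and amplitude scaling together amount to z-normalization. A minimal sketch in plain Python (the helper names `znorm` and `euclidean` are ours):

```python
import math

def znorm(series):
    """Z-normalize: subtract the mean, then divide by the standard deviation."""
    d = len(series)
    mean = sum(series) / d
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / d)
    return [(x - mean) / std for x in series]

def euclidean(s, q):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, q)))

# Two series with the same shape but different offset and amplitude
q = [1.0, 2.0, 3.0, 2.0, 1.0]
s = [10.0, 12.0, 14.0, 12.0, 10.0]

print(euclidean(q, s))                # large raw distance
print(euclidean(znorm(q), znorm(s)))  # ~0 after normalization
```

After normalization the two series coincide, so the counter-intuitive raw distance disappears.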


Noise removal
Average the values of each sample with those of its neighbors (smoothing)

[Figure: a noisy series before and after smoothing]

q = smooth(q)
s = smooth(s)
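The smoothing step can be sketched as a centered moving average; the window size and the edge handling below are our choices, not fixed by the slide:

```python
def smooth(series, w=3):
    """Replace each sample with the average of itself and its neighbors
    (centered moving average; at the edges only available samples are used)."""
    d = len(series)
    out = []
    for i in range(d):
        lo = max(0, i - w // 2)
        hi = min(d, i + w // 2 + 1)
        window = series[lo:hi]
        out.append(sum(window) / len(window))
    return out

noisy = [0.0, 2.0, -1.0, 3.0, 0.0, 2.0]
print(smooth(noisy))  # same length, with spikes damped
```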

Dimensionality reduction: DFT (1)


The first method for reducing the dimensionality of time series, proposed in [AFS93], was based on the Discrete Fourier Transform (DFT)
Recall: given a time series s, the Fourier coefficients are complex numbers (amplitude, phase), defined as:

    S_f = (1/√D) Σ_{t=0..D-1} s_t exp(-j 2π f t / D),    f = 0, ..., D-1

Parseval's theorem: the DFT preserves the energy of the signal:

    E(s) = Σ_{t=0..D-1} (s_t)^2 = E(S) = Σ_{f=0..D-1} |S_f|^2

where |S_f| = |Re(S_f) + j Img(S_f)| = √(Re(S_f)^2 + Img(S_f)^2) is the absolute value of S_f (its amplitude), which equals the Euclidean distance of S_f from the origin

[Figure: S_f in the complex plane (Re, Img axes), with |S_f| as its distance from the origin]
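The definition and Parseval's theorem can be checked directly with a naive O(D^2) DFT, using the unitary 1/√D normalization from the formula above (function name `dft` is ours):

```python
import cmath
import math

def dft(s):
    """Unitary DFT: S_f = (1/sqrt(D)) * sum_t s_t * exp(-j*2*pi*f*t/D)."""
    d = len(s)
    return [sum(s[t] * cmath.exp(-2j * math.pi * f * t / d) for t in range(d))
            / math.sqrt(d)
            for f in range(d)]

s = [3.0, 2.0, 4.0, 6.0, 5.0, 6.0, 2.0, 2.0]
S = dft(s)
energy_time = sum(x ** 2 for x in s)           # E(s)
energy_freq = sum(abs(c) ** 2 for c in S)      # E(S)
print(energy_time, energy_freq)                # equal up to rounding
```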

Dimensionality reduction: DFT (2)


The key observation is that the DFT is a linear transformation
Thus, we have:

    L2(s, q)^2 = Σ_{t=0..D-1} (s_t - q_t)^2 = E(s - q) = E(S - Q) = Σ_{f=0..D-1} |S_f - Q_f|^2 = L2(S, Q)^2

where |S_f - Q_f| is the Euclidean distance between S_f and Q_f in the complex plane
The above just says that the DFT also preserves the Euclidean distance

What can we gain from such a transformation?

[Figure: S_f and Q_f in the complex plane, with |S_f - Q_f| as their distance]


Dimensionality reduction: DFT (3)


The key observation is that, by keeping only a small set of Fourier coefficients, we can obtain a good approximation of the original signal
This is because most of the energy of many real-world signals concentrates in the low frequencies ([AFS93])
More precisely, the energy spectrum (|S_f|^2 vs. f) behaves as O(f^-b), b > 0:
- b = 2 (random walk or brown noise): used to model the behavior of stock movements and currency exchange rates
- b > 2 (black noise): suitable to model slowly varying natural phenomena (e.g., water levels of rivers)
- b = 1 (pink noise): according to Birkhoff's theory, musical scores follow this energy pattern

Thus, by only keeping the first few coefficients (D' << D), an effective dimensionality reduction can be obtained
Note: this is the basic idea used by well-known compression standards, such as JPEG (which is based on the Discrete Cosine Transform)

For what we have seen, this projection technique satisfies the lower-bounding (L-B) lemma
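The lower-bounding property can be verified numerically: since L2(s, q)^2 equals the sum of |S_f - Q_f|^2 over all frequencies, truncating the sum to the first D' coefficients can only decrease it. A sketch (the names `dft` and `reduced_dist` are ours):

```python
import cmath
import math

def dft(s):
    # Unitary DFT (1/sqrt(D) normalization), naive O(D^2) version
    d = len(s)
    return [sum(s[t] * cmath.exp(-2j * math.pi * f * t / d) for t in range(d))
            / math.sqrt(d)
            for f in range(d)]

def reduced_dist(S, Q, k):
    # Euclidean distance computed on the first k Fourier coefficients only
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(S[:k], Q[:k])))

s = [3.0, 2.0, 4.0, 6.0, 5.0, 6.0, 2.0, 2.0]
q = [2.0, 3.0, 5.0, 5.0, 6.0, 5.0, 3.0, 1.0]
true_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, q)))
S, Q = dft(s), dft(q)
for k in (2, 4, 8):
    # The reduced-space distance never overestimates the true distance
    assert reduced_dist(S, Q, k) <= true_dist + 1e-9
```

With k = D the two distances coincide, since the full DFT preserves the Euclidean distance.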

An example: EEG data


Sampling rate: 128 Hz

[Figure: the time series (4 secs, 512 points) and its energy spectrum]


Another example
128 points

data values:
0.4995 0.5264 0.5523 0.5761 0.5973 0.6153 0.6301 0.6420 0.6515 0.6596 0.6672 0.6751 0.6843 0.6954 0.7086 0.7240 0.7412 0.7595 0.7780 0.7956 0.8115 0.8247 0.8345 0.8407 0.8431 0.8423 0.8387

Fourier coefficients:
1.5698 1.0485 0.7160 0.8406 0.3709 0.4670 0.2667 0.1928 0.1635 0.1602 0.0992 0.1282 0.1438 0.1416 0.1400 0.1412 0.1530 0.0795 0.1013 0.1150 0.1801 0.1082 0.0812 0.0347 0.0052 0.0017 0.0002 ...

First 4 Fourier coefficients:
1.5698 1.0485 0.7160 0.8406 0.3709 0.4670 0.2667 0.1928

[Figure: the series s and its approximation s' obtained from the first 4 Fourier coefficients]


Comments on DFT

- Can be computed in O(D log D) time using the FFT (provided D is a power of 2)
- Difficult to use if one wants to deal with sequences of different lengths
- Not really amenable to deal with signals with spots (time-varying energy)
- An alternative to DFT is to use wavelets, which take a different perspective:
  - A signal can be represented as a sum of contributions, each at a different resolution level
  - The Discrete Wavelet Transform (DWT) can be computed in O(D) time
  - Wavelets are also used in the JPEG2000 standard for compressing images

Experimental results, however, show that the superiority of DWT w.r.t. DFT depends on the specific dataset

[Figure: two example series, one "good for wavelets, bad for Fourier" and one "good for Fourier, bad for wavelets"]

Haar Wavelets: the intuition


[Figure: a time series X, its DWT-based reconstruction X', and the Haar basis functions Haar 0, ..., Haar 7]

As with DFT, the time series is decomposed into a linear combination of base elements
The Haar DWT, applied to a series of length D = 2^n, pairs samples, stores their difference and passes their average to the next stage
This process is repeated recursively, which leads to 2^n - 1 difference values and one final average (which is 0 for normalized series)

    s = (3, 2, 4, 6, 5, 6, 2, 2)

    Differences        Averages
    (1, -2, -1, 0)     (2.5, 5, 5.5, 2)
    (-2.5, 3.5)        (3.75, 3.75)
    (0)                (3.75)
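The pairing scheme can be sketched as a short recursive function (the name `haar` is ours; we store the plain pairwise differences, so the output collects the overall average followed by the difference values, coarsest level first):

```python
def haar(series):
    """One-dimensional Haar decomposition: at each stage keep the pairwise
    differences and recurse on the pairwise averages (length must be 2^n)."""
    if len(series) == 1:
        return list(series)  # the final overall average
    avgs = [(series[i] + series[i + 1]) / 2 for i in range(0, len(series), 2)]
    diffs = [series[i] - series[i + 1] for i in range(0, len(series), 2)]
    return haar(avgs) + diffs  # coarse part first, then the details

s = [3, 2, 4, 6, 5, 6, 2, 2]
print(haar(s))
```

From the decomposition the original series can be rebuilt exactly, which is why only 2^n - 1 differences plus one average are needed.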


Dimensionality reduction: PAA


PAA (Piecewise Aggregate Approximation) [KCP+00, YF00] is a very simple, intuitive and fast, O(D), method to approximate time series
Its performance is comparable to that of DFT and DWT

We take a window of size W and segment our time series into D' = D/W pieces (sub-sequences), each of size W
For each piece, we compute the average of its values, i.e.:

    s'_i = (1/W) Σ_{t=(i-1)·W .. i·W-1} s_t,    i = 1, ..., D'

Our approximation is therefore s' = (s'_1, ..., s'_{D'})
We have √W · L2(s', q') ≤ L2(s, q) (arguments generalize those used for the global average example)

The same can be generalized to work with arbitrary Lp-norms [YF00]

[Figure: a series s and its PAA approximation s', with window size W]
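PAA fits in a few lines of code; the sketch below assumes W divides D (the name `paa` is ours) and checks the lower-bounding inequality on a small example:

```python
import math

def paa(series, w):
    """Piecewise Aggregate Approximation: average each window of size w
    (assumes w divides the series length)."""
    return [sum(series[i:i + w]) / w for i in range(0, len(series), w)]

s = [3.0, 2.0, 4.0, 6.0, 5.0, 6.0, 2.0, 2.0]
q = [2.0, 3.0, 5.0, 5.0, 6.0, 4.0, 3.0, 2.0]
w = 2
ps, pq = paa(s, w), paa(q, w)

# sqrt(W) * L2(s', q') lower-bounds the true Euclidean distance L2(s, q)
lower = math.sqrt(w) * math.sqrt(sum((a - b) ** 2 for a, b in zip(ps, pq)))
true = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, q)))
assert lower <= true + 1e-9
```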



Some results (thanks again to Eamonn Keogh!)


The graphs show the fraction of data (indexed by an R-tree) that must be retrieved from disk to answer a 1-NN query (mixed dataset)

[Figure: fraction of retrieved data vs. number of retained dimensions (1024 down to 16), one graph each for DFT, DWT, and PAA]

On most datasets, the three methods yield very similar results
The actual response time may vary, even for a given method, depending on the implementation

Example: computing the Euclidean distance


Consider a sequential 1-NN algorithm based on Euclidean distance, and let r be the lowest distance found so far
Two basic optimization techniques are:
1. Avoid taking the square root, i.e., use the squared Euclidean distance
   This does not influence the result
2. Early-terminate the evaluation of the distance on a series s if the so-far accumulated distance for s is ≥ r

[Figure: response time (seconds) of Euclid, Opt1, and Opt2 vs. number of objects]

Clearly, these optimizations are not peculiar to time series
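The two optimizations combine naturally if r is tracked as a squared distance. A sketch (function names are ours):

```python
def squared_dist_early_abandon(s, q, best_sq):
    """Squared Euclidean distance, abandoned as soon as the running sum
    reaches the best (squared) distance found so far."""
    acc = 0.0
    for a, b in zip(s, q):
        acc += (a - b) ** 2
        if acc >= best_sq:
            return None  # s cannot be the nearest neighbor
    return acc

def nn_search(query, dataset):
    """Sequential 1-NN scan using both optimizations (no sqrt, early abandon)."""
    best_sq, best = float("inf"), None
    for s in dataset:
        d2 = squared_dist_early_abandon(s, query, best_sq)
        if d2 is not None:
            best_sq, best = d2, s
    return best, best_sq

data = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
print(nn_search([0.0, 0.9, 0.1], data))
```

Since the accumulated sum only grows, abandoning at the threshold can never discard the true nearest neighbor.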

