Isabel Silva
Departamento de Engenharia Civil, Faculdade de Engenharia da Universidade do Porto
Centro de Investigação e Desenvolvimento em Matemática e Aplicações (CIDMA), Universidade de Aveiro
21 April 2010
Outline
Motivation
Principal Component Analysis for time series
◮ Classic Principal Component Analysis
◮ Weighted Principal Component Analysis
◮ Dynamic Principal Component Analysis
◮ Singular Spectrum Analysis / Multi-Channel Singular Spectrum Analysis
Illustration
Final remarks
Motivation
Multidimensional time
and space-time series
⇓
Dimensionality reduction
Principal Component Analysis for time series Seminário do Grupo de Probabilidades e Estatística 4 / 24
Isabel Silva Principal Component Analysis for Time Series
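\[ x_{ij} = y_{ij} - \bar{Y}_j = y_{ij} - \frac{1}{n} \sum_{i=1}^{n} y_{ij}, \quad i = 1, \dots, n; \; j = 1, \dots, T \]
\[ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} X_1 & X_2 & \cdots & X_T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nT} \end{bmatrix} \]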
\[ \operatorname{Var}(Z_j) = \lambda_j, \quad j = 1, \dots, T \]
Proportion of variance due to $Z_j$:
\[ \frac{\lambda_j}{\lambda_1 + \cdots + \lambda_T}, \quad j = 1, \dots, T \]
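A minimal sketch of these quantities, assuming the principal components $Z_j$ come from the eigendecomposition of the sample covariance matrix of the centred data. The toy data, the divisor n − 1 and the function name `classic_pca` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def classic_pca(Y):
    """PCA of an n x T data matrix Y (rows: series, columns: observation times)."""
    X = Y - Y.mean(axis=0)                 # x_ij = y_ij - mean of column j
    S = X.T @ X / (Y.shape[0] - 1)         # T x T sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = X @ eigvecs                        # scores: column j holds Z_j
    share = eigvals / eigvals.sum()        # proportion of variance due to Z_j
    return Z, eigvals, share

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 6)).cumsum(axis=1)   # 50 toy series observed at T = 6 times
Z, eigvals, share = classic_pca(Y)
```

The sample variance of each score column equals the corresponding eigenvalue, and the shares sum to one.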
\[ u_{ij} = \frac{1}{s_{jj}} (y_{ij} - \bar{Y}_j) \]
\[ u_{ij} = \sqrt{\omega_j}\, (y_{ij} - \bar{Y}_j), \quad i = 1, \dots, n; \; j = 1, \dots, T \]
Weights: $\omega_j$ such that $\omega_j \ge 0$ and $\sum_{j=1}^{T} \omega_j = 1$
⇓
Weighted covariance matrix of the data
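The weighting can be sketched as follows. This is an illustration under stated assumptions: the weights used here are hypothetical (the talk does not say how they are chosen), and the divisor n − 1 and the name `weighted_pca` are likewise assumptions.

```python
import numpy as np

def weighted_pca(Y, weights):
    """PCA of u_ij = sqrt(w_j) * (y_ij - mean_j) for an n x T matrix Y."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    U = np.sqrt(w) * (Y - Y.mean(axis=0))      # scale column j by sqrt(w_j)
    S_w = U.T @ U / (Y.shape[0] - 1)           # weighted covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S_w)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], S_w

rng = np.random.default_rng(1)
Y = rng.normal(size=(40, 5))
# Hypothetical weights that emphasise the most recent observation times.
w = np.arange(1, 6) / np.arange(1, 6).sum()
eigvals, eigvecs, S_w = weighted_pca(Y, w)
```

Entry (j, k) of the weighted covariance matrix is the ordinary covariance multiplied by $\sqrt{\omega_j \omega_k}$, so observation times with larger weights dominate the leading components.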
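Sample spectrum:
\[ \hat{f}(\nu) = \sum_{k=-(T-1)}^{T-1} \hat{\gamma}(k)\, e^{2\pi i \nu k}, \qquad \hat{\gamma}(k) = \frac{1}{n} \sum_{t=1}^{T-k} (x_{t+k} - \bar{x})(x_t - \bar{x}) \]
\[ \Updownarrow \]
Periodogram:
\[ I(\nu_k) = \left| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t\, e^{-2\pi i \nu_k t} \right|^2, \qquad \nu_k = k/T, \quad k = 0, \dots, T - 1 \]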
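As a small illustration, the periodogram can be evaluated at the frequencies $\nu_k = k/T$ with a fast Fourier transform. The sketch below assumes NumPy's `numpy.fft.fft`; the toy cosine series and the function name `periodogram` are illustrative.

```python
import numpy as np

def periodogram(x):
    """I(nu_k) = |T^{-1/2} sum_t x_t exp(-2 pi i nu_k t)|^2 at nu_k = k/T."""
    x = np.asarray(x, dtype=float)
    T = x.size
    # np.fft.fft indexes time from 0 instead of 1; this only changes the phase,
    # so the squared modulus is unaffected.
    I = np.abs(np.fft.fft(x)) ** 2 / T
    nu = np.arange(T) / T
    return nu, I

t = np.arange(1, 97)
x = np.cos(2 * np.pi * t / 12)          # toy series with period 12 (frequency 1/12)
nu, I = periodogram(x)
k_peak = np.argmax(I[: x.size // 2])    # search the frequencies below 1/2
```

For this toy series the largest ordinate below frequency 1/2 sits at $\nu_k = 1/12$, the frequency of the cosine.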
\[ y_{tj}(\nu) = e_j(\nu)^{*} X, \quad j = 1, \dots, T \]
\[ \operatorname{Var}(y_{tj}(\nu)) = \lambda_j(\nu) \]
Problem:
Appropriate choice of ν
Carry out a PCA on a suitably chosen lagged version of the original time series
Decompose the original series into a small number of independent and
interpretable components that can be considered as trend, oscillatory
components and structureless noise
No stationarity assumptions for the time series are needed
Basic SSA
Decomposition stage
◮ Embedding
◮ Singular Value Decomposition (SVD)
Reconstruction stage
◮ Grouping
◮ Diagonal averaging
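The embedding step can be sketched as follows, assuming the trajectory matrix stacks the K = n − L + 1 lagged vectors of length L as its rows (the K × L orientation of the numerical example in this talk). The function name `embed` and the toy series are illustrative.

```python
import numpy as np

def embed(y, L):
    """Return the K x L trajectory matrix whose i-th row is (y_i, ..., y_{i+L-1})."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 1 < L < n:
        raise ValueError("window length L must satisfy 1 < L < n")
    K = n - L + 1
    return np.array([y[i:i + L] for i in range(K)])

X = embed([1, 2, 3, 4, 5, 6], L=3)
# Rows: (1,2,3), (2,3,4), (3,4,5), (4,5,6); every anti-diagonal is constant.
```

The constant anti-diagonals (Hankel structure) are what the diagonal averaging step later exploits to map a matrix back to a series.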
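SVD
\[ S = X^{\top} X \;\longrightarrow\; \text{eigenvalues: } \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L \text{ and eigenvectors: } U_1, U_2, \dots, U_L \]
\[ d = \operatorname{rank}(X) = \max\{i : \lambda_i > 0\} \le L, \qquad V_i = X U_i / \sqrt{\lambda_i}, \quad i = 1, \dots, d \]
\[ X = X_1 + X_2 + \cdots + X_d, \qquad X_i = \sqrt{\lambda_i}\, V_i U_i^{\top}, \qquad (\lambda_i, U_i, V_i): \text{eigentriples} \]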
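A minimal sketch of the eigentriple construction, following the formulas above literally (eigendecomposition of $S = X^{\top} X$). The random matrix stands in for a K × L trajectory matrix, and the relative tolerance used to decide which eigenvalues count as positive is an assumption.

```python
import numpy as np

def eigentriples(X, tol=1e-10):
    """Eigentriples (lambda_i, U_i, V_i) of X from the eigendecomposition of S = X^T X."""
    S = X.T @ X
    lam, U = np.linalg.eigh(S)              # ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    d = int(np.sum(lam > tol * lam[0]))     # numerical rank of X
    lam, U = lam[:d], U[:, :d]
    V = X @ U / np.sqrt(lam)                # V_i = X U_i / sqrt(lambda_i)
    return lam, U, V

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 4))                 # stand-in for a K x L trajectory matrix
lam, U, V = eigentriples(X)
# Elementary matrices X_i = sqrt(lambda_i) V_i U_i^T
X_parts = [np.sqrt(lam[i]) * np.outer(V[:, i], U[:, i]) for i in range(lam.size)]
```

Summing the elementary matrices recovers X, and $\sqrt{\lambda_i}$ coincides with the i-th singular value of X.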
\[ X \approx X_{I_1} + \cdots + X_{I_M} \]
The contribution of the component $X_{I_k}$:
\[ \frac{\sum_{i \in I_k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} \]
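The contribution of each group is a ratio of eigenvalue sums. The eigenvalues and the grouping in this sketch are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def group_contributions(lam, groups):
    """Share of sum(lambda) carried by each index group I_k (0-based indices)."""
    lam = np.asarray(lam, dtype=float)
    return [lam[list(I)].sum() / lam.sum() for I in groups]

# Hypothetical eigenvalues lambda_1 >= ... >= lambda_5 and a grouping with M = 2.
lam = [80.0, 10.0, 6.0, 3.0, 1.0]
groups = [(0,), (1, 2)]          # I_1 = {1}, I_2 = {2, 3} in the 1-based notation
shares = group_contributions(lam, groups)
```

Here the first group carries 80/100 of the total and the second (10 + 6)/100; the indices left out of every group are treated as noise.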
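\[ L^{*} = \min\{L, K\}; \quad K^{*} = \max\{L, K\}; \quad x_{ij}^{*(k)} = x_{ij}^{(k)} \text{ if } L < K; \quad x_{ij}^{*(k)} = x_{ji}^{(k)} \text{ if } L \ge K \]
\[ \tilde{y}_t^{(k)} = \begin{cases} \dfrac{1}{t+1} \displaystyle\sum_{p=1}^{t+1} x_{p,\,t-p+2}^{*(k)}, & \text{if } 0 \le t < L^{*} - 1 \\[2ex] \dfrac{1}{L^{*}} \displaystyle\sum_{p=1}^{L^{*}} x_{p,\,t-p+2}^{*(k)}, & \text{if } L^{*} - 1 \le t < K^{*} \\[2ex] \dfrac{1}{n-t} \displaystyle\sum_{p=t-K^{*}+2}^{n-K^{*}+1} x_{p,\,t-p+2}^{*(k)}, & \text{if } K^{*} \le t < n \end{cases} \]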
\[ y = \tilde{X}_{I_1} + \cdots + \tilde{X}_{I_M} \iff y_t = \sum_{k=1}^{M} \tilde{y}_t^{(k)}, \quad t = 0, \dots, n - 1 \]
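In the diagonal averaging formula, each $\tilde{y}_t^{(k)}$ is the mean of the entries whose (1-based) row and column indices add up to t + 2, that is, of one anti-diagonal. The sketch below accumulates those means directly instead of coding the three cases, so it works for either orientation of the matrix; the function name `diagonal_average` is illustrative.

```python
import numpy as np

def diagonal_average(Xk):
    """Average each anti-diagonal i + j = t of a matrix into a series of length rows + cols - 1."""
    K, L = Xk.shape
    n = K + L - 1
    total = np.zeros(n)
    count = np.zeros(n)
    for i in range(K):
        for j in range(L):
            total[i + j] += Xk[i, j]    # 0-based indices: anti-diagonal t = i + j
            count[i + j] += 1
    return total / count

# For a trajectory (Hankel) matrix the averaging returns the series itself.
y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
X = np.array([y[i:i + 3] for i in range(4)])   # 4 x 3 trajectory matrix
y_back = diagonal_average(X)
```

Applied to a grouped matrix $X_{I_k}$, which is in general not Hankel, the same function gives the reconstructed component series.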
Illustration
Practical problems
Choice of the dimension L → L ≈ n/2, or depending on the periodicity of the data
Selection of M and of the way the indices are grouped
Rodrigues and de Carvalho (2008): L and M must be chosen carefully, since they can
compromise the results of the analysis
Dataset: Monthly average number of [...] (http://robjhyndman.com/TSDL//)
[Figure: monthly series, Jan 1963 to Dec 1976]
Illustration
Example (L = 4, K = 8 − 4 + 1 = 5, M = 1)
\[ y = \{501, 488, 504, 578, 545, 632, 728, 725\} \]
\[ X = \begin{bmatrix} 501 & 488 & 504 & 578 \\ 488 & 504 & 578 & 545 \\ 504 & 578 & 545 & 632 \\ 578 & 545 & 632 & 728 \\ 545 & 632 & 728 & 725 \end{bmatrix}, \qquad X_1 = X_{I_1} = \begin{bmatrix} 466.1 & 490.7 & 535.5 & 574.7 \\ 475.8 & 501.0 & 546.7 & 586.7 \\ 508.8 & 535.7 & 584.6 & 627.3 \\ 560.8 & 590.5 & 644.4 & 691.5 \\ 594.2 & 625.6 & 682.8 & 732.7 \end{bmatrix} \]
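This example can be retraced with a short script. The sketch assumes NumPy's `numpy.linalg.svd` in place of the eigendecomposition of $X^{\top} X$ (both give the same elementary matrices); the variable names are illustrative, and the computed $X_1$ should agree with the matrix shown above only up to rounding.

```python
import numpy as np

y = np.array([501, 488, 504, 578, 545, 632, 728, 725], dtype=float)
n, L = y.size, 4
K = n - L + 1                                    # K = 5

# Embedding: rows of X are the lagged vectors of length L.
X = np.array([y[i:i + L] for i in range(K)])

# SVD: the singular values s_i = sqrt(lambda_i) come out in decreasing order.
V, s, Ut = np.linalg.svd(X, full_matrices=False)
X1 = s[0] * np.outer(V[:, 0], Ut[0])             # first elementary matrix

# Diagonal averaging of X1 (M = 1): after reversing the row order, the
# anti-diagonal i + j = t of X1 is the diagonal with offset t - (K - 1).
y_rec = np.array([np.mean(X1[::-1].diagonal(t - (K - 1))) for t in range(n)])
residual = y - y_rec
contribution = s[0] ** 2 / np.sum(s ** 2)        # share of lambda_1
```

With M = 1 only the leading eigentriple is kept, so `y_rec` is a smooth trend-like version of the series and `residual` holds everything else.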
Illustration
[Figure: y and y_reconstructed for t = 1, ..., 8 (top); residual = y − y_reconstructed (bottom)]
Illustration
[Figure: Principal Components of the monthly number of occupied rooms (L = 12)]
Illustration
[Figure: normalized λi against i, for i = 1, ..., 12 (left) and for i = 2, ..., 12 (right)]
Illustration
The contribution of the components $X_{I_1}$: 97.96%, $X_{I_{2,3}}$: 1.42%, $X_{I_{4,5}}$: 0.32%
[Figure: y together with the reconstructions y_rec_PC1, y_rec_PC_2_3 and y_rec_PC_4_5 (three panels)]
Illustration
The contribution of the component $X_{I_{1,\dots,5}}$: 99.70%
[Figure: y and y_rec_PC_1_to_5 (top); residuals (bottom)]
Illustration
Choice of L
        Contribution (%) of
L       PC1      PC2     PC3     % var. PC1-PC3
12      97.96    0.71    0.71    99.38
24      97.96    0.71    0.71    99.38
36      97.95    0.72    0.71    99.38
80      97.92    0.74    0.72    99.38
6       98.55    0.86    0.33    99.73
Illustration
[Figure: Principal Components of the monthly number of occupied rooms (L = 6)]
Final remarks
PCA is a very popular and widely used tool for reducing the dimension of
high-dimensional data
Time series can be considered as variables or as measurements → observation
times are the variables
Classical PCA does not take into account dependence between observations
Results of the different PCA-based techniques are not directly comparable
Practical problems of SSA: choice of L and M
References
Brillinger, D. R., 2001. Time Series: Data Analysis and Theory. Classics in Applied Mathematics,
36, SIAM.
Golyandina, N., Nekrutkin, V. and Zhigljavsky, A., 2001. Analysis of Time Series Structure: SSA and
Related Techniques. Chapman & Hall/CRC Monographs on Statistics & Applied Probability.
Golyandina, N. and Stepanov, D., 2005. SSA-based approaches to analysis and forecast of
multidimensional time series. In Proceedings of the Fifth Workshop on Simulation, pp. 293–298.
Hassani, H., 2007. Singular Spectrum Analysis: Methodology and Comparison. Journal of Data
Science, Vol. 5, pp. 239–257.
Jolliffe, I. T., 2002. Principal Component Analysis. Springer, New York, 2nd ed.
Pinto da Costa, J., Silva, I. and Silva, M. E., 2009. Weighted principal components for time series.
Actas do XVII Congresso Anual da SPE (submitted).
Rodrigues, P. C. and De Carvalho, M., 2008. Monitoring Calibration of the Singular Spectrum
Analysis Method. In Proceedings of COMPSTAT’2008, Vol. 2, pp. 955–964, Physica-Verlag.
Shumway, R. and Stoffer, D., 2000. Time Series Analysis and Its Applications. Springer, New York.