
Isabel Silva Principal Component Analysis for Time Series

Principal Component Analysis for Time Series

Isabel Silva
Departamento de Engenharia Civil, Faculdade de Engenharia da Universidade do Porto
Centro de Investigação e Desenvolvimento em Matemática e Aplicações (CIDMA), Universidade de Aveiro

Seminário do Grupo de Probabilidades e Estatística

21 April 2010


Outline

Motivation
Principal Component Analysis for time series
◮ Classic Principal Component Analysis
◮ Weighted Principal Component Analysis
◮ Dynamic Principal Component Analysis
◮ Singular Spectrum Analysis / Multi-Channel Singular Spectrum Analysis

Illustration
Final remarks


Motivation

Multidimensional time and space-time series
⇓
Number of observations (T) > Number of series (n)
⇓
Dimensionality reduction

Principal Components Analysis (PCA)

T original variables (observation times) → linear transformation → M uncorrelated variables: Principal Components (PC)

M ≪ T: the PCs retain most of the variation present in the dataset [Jolliffe, 2002]


Classic Principal Component Analysis

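n measurements on T VARIABLES: $\{Y_1, Y_2, \ldots, Y_T\}$, $Y_j \in \mathbb{R}^n$, $j = 1, \ldots, T$
⇕
n time series, each one with T OBSERVATIONS: $\{y_1, y_2, \ldots, y_n\}$, $y_i \in \mathbb{R}^T$, $i = 1, \ldots, n$

$$x_{ij} = y_{ij} - \bar{Y}_j = y_{ij} - \frac{1}{n} \sum_{i=1}^{n} y_{ij}, \quad i = 1, \ldots, n; \; j = 1, \ldots, T$$

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} X_1 & X_2 & \cdots & X_T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nT} \end{bmatrix}$$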


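Classic Principal Component Analysis

Sample variance-covariance matrix (T × T) of X: $S = \frac{1}{n} X^T X$
⇓ Diagonalizing S
Eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_T > 0$ and eigenvectors $\upsilon_j$ with $\|\upsilon_j\| = 1$, $j = 1, \ldots, T$

jth Principal Component

$$Z_j = X\upsilon_j = \upsilon_{j1} X_1 + \upsilon_{j2} X_2 + \cdots + \upsilon_{jT} X_T, \quad j = 1, \ldots, T$$

$\mathrm{Var}(Z_j) = \lambda_j$, $j = 1, \ldots, T$
Proportion of variance due to $Z_j$: $\dfrac{\lambda_j}{\lambda_1 + \cdots + \lambda_T}$, $j = 1, \ldots, T$

Variables with different scales → initial data standardization
$$u_{ij} = \frac{1}{\sqrt{s_{jj}}} \, (y_{ij} - \bar{Y}_j), \quad \text{with } s_{jj} \text{ the sample variance of } Y_j$$
⇕
PCA uses the Pearson's correlation matrix of the original variables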
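The steps of classic PCA described above can be sketched numerically as follows. This is only an illustration under stated assumptions: it uses NumPy, the function name `classic_pca` is hypothetical, and the data are synthetic random walks generated for the example, not data from the talk.

```python
import numpy as np

def classic_pca(Y):
    """PCA of an (n x T) array: n series in rows, T observation times in columns."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    X = Y - Y.mean(axis=0)                  # x_ij = y_ij - mean of column j
    S = X.T @ X / n                         # (T x T) sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder so lambda_1 >= lambda_2 >= ...
    eigvals = np.clip(eigvals[order], 0.0, None)  # clear tiny negative round-off
    eigvecs = eigvecs[:, order]
    Z = X @ eigvecs                         # column j holds the jth principal component
    explained = eigvals / eigvals.sum()     # proportion of variance per component
    return Z, eigvals, eigvecs, explained

# Synthetic example (assumption): n = 6 series observed at T = 20 times.
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 20)).cumsum(axis=1)
Z, eigvals, eigvecs, explained = classic_pca(Y)
print(np.round(explained[:3], 4))
```

With T > n, as in the motivating setting, at most n − 1 eigenvalues are numerically nonzero after centring, which is why the sketch clips round-off noise at zero.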

Weighted Principal Component Analysis (WPCA) [Pinto da Costa, Silva and Silva, 2009]

$$u_{ij} = \omega_j (y_{ij} - \bar{Y}_j), \quad i = 1, \ldots, n; \; j = 1, \ldots, T$$
Weights: $\omega_j$, such that $\omega_j \geq 0$, $\sum_{j=1}^{T} \omega_j = 1$
⇓
Weighted matrix of covariances of the data

WPCA is capable of higher levels of compression of the data
No stationarity assumptions for the time series are needed
The objective of WPCA is descriptive, not inferential
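A minimal sketch of the weighting step, assuming that the weighted covariance matrix is formed from the weighted centred data $u_{ij}$ in the same way as S is formed from X. The function name `weighted_pca`, the linearly increasing weights and the synthetic data are all assumptions made for illustration; the talk does not specify how the weights are chosen.

```python
import numpy as np

def weighted_pca(Y, weights):
    """PCA on u_ij = w_j * (y_ij - mean_j) for an (n x T) array Y."""
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    n = Y.shape[0]
    U = (Y - Y.mean(axis=0)) * w            # weight each observation time (column)
    S_w = U.T @ U / n                       # weighted (T x T) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S_w)
    order = np.argsort(eigvals)[::-1]       # descending eigenvalues
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    return U @ eigvecs, eigvals, eigvecs

# Synthetic data and a hypothetical weight profile that favours recent times.
rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 20)).cumsum(axis=1)
T = Y.shape[1]
weights = np.arange(1, T + 1) / np.arange(1, T + 1).sum()
scores, eigvals, eigvecs = weighted_pca(Y, weights)
print(np.round(eigvals[:3] / eigvals.sum(), 4))
```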


Dynamic Principal Component Analysis (DPCA) [Brillinger, 2001]

PCA for stationary time series in the frequency domain
DPCA approximates a p-dimensional vector-valued time series $X_t$ by a set of k uncorrelated time series $Y_t$ that give the best approximation of $X_t$ in the mean-square-error (m.s.e.) sense
PCA at each frequency → uncorrelated principal component series
Inferential procedures

$$\sum_{k=-\infty}^{\infty} |\gamma(k)| < \infty \;\Rightarrow\; f(\nu) = \sum_{k=-\infty}^{\infty} \gamma(k) \, e^{2\pi i \nu k}, \quad -1/2 \leq \nu \leq 1/2$$

Sample spectrum:
$$\hat{f}(\nu) = \sum_{k=-(T-1)}^{T-1} \hat{\gamma}(k) \, e^{2\pi i \nu k}, \qquad \hat{\gamma}(k) = \frac{1}{T} \sum_{t=1}^{T-k} (x_{t+k} - \bar{x})(x_t - \bar{x})$$
⇕
Periodogram:
$$I(\nu_k) = \left| \frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t \, e^{-2\pi i \nu_k t} \right|^2, \quad \nu_k = k/T, \; k = 0, \ldots, T-1$$
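The link between the sample spectrum and the periodogram can be checked numerically at the nonzero Fourier frequencies. The sketch below is an illustration only: it assumes the sample autocovariance is normalized by the series length T, the function names are hypothetical, and the eight values are the short series used later in the illustration.

```python
import cmath

def periodogram(x):
    """I(nu_k) = |T^(-1/2) * sum_t x_t * exp(-2*pi*i*nu_k*t)|^2 at nu_k = k/T."""
    T = len(x)
    values = []
    for k in range(T):
        nu_k = k / T
        dft = sum(x[t - 1] * cmath.exp(-2j * cmath.pi * nu_k * t)
                  for t in range(1, T + 1))
        values.append(abs(dft) ** 2 / T)
    return values

def sample_autocovariance(x, k):
    """gamma_hat(k), normalized by T (assumption) and symmetric in k."""
    T = len(x)
    mean = sum(x) / T
    k = abs(k)
    return sum((x[t + k] - mean) * (x[t] - mean) for t in range(T - k)) / T

def sample_spectrum(x, nu):
    """f_hat(nu) = sum over |k| < T of gamma_hat(k) * exp(2*pi*i*nu*k)."""
    T = len(x)
    total = sum(sample_autocovariance(x, k) * cmath.exp(2j * cmath.pi * nu * k)
                for k in range(-(T - 1), T))
    return total.real   # the imaginary parts cancel because gamma_hat is symmetric

x = [501, 488, 504, 578, 545, 632, 728, 725]
pgram = periodogram(x)
for k in range(1, len(x)):
    print(k, round(pgram[k], 3), round(sample_spectrum(x, k / len(x)), 3))
```

At k = 0 the two quantities differ, because the periodogram of the uncentred series picks up the mean while the sample spectrum is built from centred data.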


Dynamic Principal Component Analysis (DPCA) [Brillinger, 2001]

DPCA [Shumway and Stoffer, 2000]
$X = [x_{ij}]$ ($i = 1, \ldots, n$; $j = 1, \ldots, T$): matrix with n (zero-mean) stationary time series
$\hat{f}(\nu)$: sample (T × T) spectral density matrix of X → complex-valued, nonnegative definite and Hermitian matrix

Let $(\lambda_1(\nu), e_1(\nu)), \ldots, (\lambda_T(\nu), e_T(\nu))$ be the (eigenvalue, eigenvector) pairs of $\hat{f}(\nu)$:
$$\lambda_1(\nu) \geq \cdots \geq \lambda_T(\nu) \geq 0, \qquad \|e_j(\nu)\| = 1, \quad j = 1, \ldots, T$$

jth principal component series at frequency ν:
$$y_{tj}(\nu) = e_j(\nu)^{*} X, \quad j = 1, \ldots, T$$
$$\mathrm{Var}(y_{tj}(\nu)) = \lambda_j(\nu)$$

Problem: appropriate choice of ν

Singular Spectrum Analysis (SSA) [Golyandina, Nekrutkin and Zhigljavsky, 2001]

Carry out a PCA on a suitably chosen lagged version of the original time series
Decompose the original series into a small number of independent and interpretable components that can be considered as trend and oscillatory components, plus a structureless noise
No stationarity assumptions for the time series are needed

Basic SSA
Decomposition stage
◮ Embedding
◮ Singular Value Decomposition (SVD)
Reconstruction stage
◮ Grouping
◮ Diagonal averaging


Singular Spectrum Analysis (SSA) [Golyandina, Nekrutkin and Zhigljavsky, 2001]


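Embedding
Time series: $y = \{y_0, y_1, \ldots, y_{n-1}\}$; L: window length ($1 < L < n$)
⇓ Trajectory matrix ($K \times L$, $K = n - L + 1$)

$$X = \begin{bmatrix} X_1 & X_2 & X_3 & \cdots & X_L \end{bmatrix} = \begin{bmatrix} y_0 & y_1 & y_2 & \cdots & y_{L-1} \\ y_1 & y_2 & y_3 & \cdots & y_L \\ y_2 & y_3 & y_4 & \cdots & y_{L+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{K-1} & y_K & y_{K+1} & \cdots & y_{n-1} \end{bmatrix}$$

SVD
$S = X^T X$ → eigenvalues: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_L$ and eigenvectors: $U_1, U_2, \ldots, U_L$

$d = \mathrm{rank}(X) = \max\{i : \lambda_i > 0\} \leq L$, $\quad V_i = X U_i / \sqrt{\lambda_i}$, $i = 1, \ldots, d$

$$X = X_1 + X_2 + \cdots + X_d, \qquad X_i = \sqrt{\lambda_i} \, V_i U_i^T, \qquad (\sqrt{\lambda_i}, U_i, V_i): \text{eigentriples}$$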
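The embedding and SVD steps can be sketched as below. This is an illustrative sketch, not the software used in the talk: it assumes NumPy, the normalization $V_i = X U_i / \sqrt{\lambda_i}$, hypothetical function names, and a relative tolerance for deciding the numerical rank. The short series is the one used in the illustration.

```python
import numpy as np

def trajectory_matrix(y, L):
    """K x L trajectory matrix whose row i is (y_i, ..., y_{i+L-1}), K = n - L + 1."""
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    return np.array([y[i:i + L] for i in range(K)])

def eigentriples(X, tol=1e-10):
    """Eigenvalues and eigenvectors of S = X^T X, plus V_i = X U_i / sqrt(lambda_i)."""
    S = X.T @ X
    lam, U = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]            # descending eigenvalues
    lam, U = lam[order], U[:, order]
    d = int(np.sum(lam > tol * lam[0]))      # numerical rank (tolerance is an assumption)
    lam, U = lam[:d], U[:, :d]
    V = X @ U / np.sqrt(lam)                 # scales column i by 1 / sqrt(lambda_i)
    return lam, U, V

y = [501, 488, 504, 578, 545, 632, 728, 725]
X = trajectory_matrix(y, L=4)
lam, U, V = eigentriples(X)
# Elementary matrices X_i = sqrt(lambda_i) * V_i * U_i^T
components = [np.sqrt(lam[i]) * np.outer(V[:, i], U[:, i]) for i in range(len(lam))]
print(X.shape, len(components))
```

Summing all elementary matrices recovers the trajectory matrix, which is a convenient sanity check before grouping.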

Singular Spectrum Analysis (SSA) [Golyandina, Nekrutkin and Zhigljavsky, 2001]


Grouping
M: number of PC → partition of $\{1, \ldots, d\}$ into M disjoint subsets $I_1, \ldots, I_M$, where $I_k = \{i_{k_1}, \ldots, i_{k_p}\}$
Construct the corresponding resultant matrix $X_{I_k} = X_{i_{k_1}} + \cdots + X_{i_{k_p}}$

$$X \approx X_{I_1} + \cdots + X_{I_M}$$
The contribution of the component $X_{I_k}$: $\dfrac{\sum_{i \in I_k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$

The grouping depends on the objective of the study
Inspection of the singular values ($\lambda_i$) and vectors ($U_i$, $V_i$)
Supplementary information can be used for the parameter choice [Hassani, 2007]:
◮ Periodicity in the dataset, periodogram analysis, pairwise scatterplots of singular vectors, ...
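The contribution formula above is simple to compute once the eigenvalues and a grouping are fixed. The sketch below uses made-up eigenvalues and a made-up grouping purely for illustration; the function name is hypothetical and indices are 0-based, unlike the 1-based indices in the formulas.

```python
def group_contributions(eigenvalues, groups):
    """Share of sum(lambda_i) carried by each group of 0-based eigentriple indices."""
    seen = set()
    for group in groups:
        for i in group:
            if i in seen:
                raise ValueError("groups must be disjoint")
            seen.add(i)
    total = sum(eigenvalues)
    return [sum(eigenvalues[i] for i in group) / total for group in groups]

# Hypothetical eigenvalues: one dominant trend and two pairs of close values.
eigenvalues = [90.0, 4.0, 3.5, 1.5, 1.0]
groups = [[0], [1, 2], [3, 4]]
shares = group_contributions(eigenvalues, groups)
print(shares)
```

Pairs of close eigenvalues are grouped together here because, as noted later in the illustration, a harmonic tends to produce two eigentriples with close singular values.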

Singular Spectrum Analysis (SSA) [Golyandina, Nekrutkin and Zhigljavsky, 2001]


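Diagonal Averaging
Transform $X_{I_k} = \left[ x_{ij}^{(k)} \right]_{i,j=1}^{L,K}$, $k = 1, \ldots, M$, into a new series $\tilde{X}_{I_k} = \{\tilde{y}_0^{(k)}, \ldots, \tilde{y}_{n-1}^{(k)}\}$

$\tilde{y}_t^{(k)}$ is obtained by averaging $x_{ij}^{(k)}$ over all $i, j$ such that $i + j = t + 2$, $t = 0, \ldots, n - 1$

$L^* = \min\{L, K\}$; $K^* = \max\{L, K\}$; $x_{ij}^{*(k)} = x_{ij}^{(k)}$ if $L < K$; $x_{ij}^{*(k)} = x_{ji}^{(k)}$ if $L \geq K$

$$\tilde{y}_t^{(k)} = \begin{cases} \dfrac{1}{t+1} \displaystyle\sum_{p=1}^{t+1} x_{p,\,t-p+2}^{*(k)}, & \text{if } 0 \leq t < L^* - 1 \\[2ex] \dfrac{1}{L^*} \displaystyle\sum_{p=1}^{L^*} x_{p,\,t-p+2}^{*(k)}, & \text{if } L^* - 1 \leq t < K^* \\[2ex] \dfrac{1}{n-t} \displaystyle\sum_{p=t-K^*+2}^{n-K^*+1} x_{p,\,t-p+2}^{*(k)}, & \text{if } K^* \leq t < n \end{cases}$$

$$y = \tilde{X}_{I_1} + \cdots + \tilde{X}_{I_M} \iff y_t = \sum_{k=1}^{M} \tilde{y}_t^{(k)}, \quad t = 0, \ldots, n - 1$$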
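The three-case formula amounts to averaging every anti-diagonal of the matrix. A minimal sketch in plain Python, with a hypothetical function name and 0-based indices (so the condition $i + j = t + 2$ becomes `i + j == t`):

```python
def diagonal_average(matrix):
    """Average the entries of a rectangular matrix over each anti-diagonal i + j = t."""
    rows, cols = len(matrix), len(matrix[0])
    n = rows + cols - 1                      # length of the resulting series
    sums = [0.0] * n
    counts = [0] * n
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]

# A small non-Hankel matrix: the anti-diagonals are (1), (2, 3), (4, 5), (6).
small = [[1, 2], [3, 4], [5, 6]]
averaged = diagonal_average(small)

# A trajectory (Hankel) matrix is constant along anti-diagonals,
# so diagonal averaging returns the original series.
y = [501, 488, 504, 578, 545, 632, 728, 725]
L = 4
hankel = [y[i:i + L] for i in range(len(y) - L + 1)]
recovered = diagonal_average(hankel)
print(averaged, recovered)
```

Counting the entries per anti-diagonal reproduces the three divisors $t + 1$, $L^*$ and $n - t$ without treating the cases separately, and it works for either orientation of the matrix.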

Singular Spectrum Analysis (SSA) [Golyandina, Nekrutkin and Zhigljavsky, 2001]

Multichannel SSA [Golyandina and Stepanov, 2005]


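Extension of SSA to p time series of length n:

$$\{y_1, \ldots, y_p\}, \quad \text{where } y_i = \{y_{i,0}, y_{i,1}, \ldots, y_{i,n-1}\}, \; i = 1, \ldots, p$$

Apply SSA to a large trajectory matrix ($K \times Lp$)

$$X = \begin{bmatrix} y_{1,0} & \cdots & y_{1,L-1} & y_{2,0} & \cdots & y_{2,L-1} & \cdots & y_{p,0} & \cdots & y_{p,L-1} \\ y_{1,1} & \cdots & y_{1,L} & y_{2,1} & \cdots & y_{2,L} & \cdots & y_{p,1} & \cdots & y_{p,L} \\ \vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ y_{1,K-1} & \cdots & y_{1,n-1} & y_{2,K-1} & \cdots & y_{2,n-1} & \cdots & y_{p,K-1} & \cdots & y_{p,n-1} \end{bmatrix}$$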
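The large trajectory matrix is the side-by-side concatenation of the p single-series trajectory matrices. A small sketch in plain Python; the function names and the two toy series are assumptions for illustration only.

```python
def trajectory_matrix(y, L):
    """K x L trajectory matrix of one series, K = len(y) - L + 1."""
    K = len(y) - L + 1
    return [list(y[i:i + L]) for i in range(K)]

def mssa_trajectory_matrix(series, L):
    """K x (L * p) matrix obtained by stacking p trajectory matrices horizontally."""
    lengths = {len(y) for y in series}
    if len(lengths) != 1:
        raise ValueError("all series must have the same length n")
    blocks = [trajectory_matrix(y, L) for y in series]
    K = len(blocks[0])
    # Row i is the concatenation of row i of every block.
    return [sum((block[i] for block in blocks), []) for i in range(K)]

# Two toy series of length n = 5 with window length L = 3, so K = 3.
series = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
X = mssa_trajectory_matrix(series, L=3)
for row in X:
    print(row)
```

After this step the SVD, grouping and diagonal averaging proceed as in the single-series case, with diagonal averaging applied to each K × L block separately.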


Illustration
Practical problems
Choice of the dimension L → L ≈ n/2, or depending on the periodicity of the data
Selection of M and of the way of grouping the indices
Rodrigues and de Carvalho (2008): L and M must be chosen carefully, since they can compromise the analysis results

Software: SSA - Matlab Tools for SSA (Eric Breitenberger) and ssa.m (Francisco Alonso)
Dataset: Monthly average number of occupied hotel rooms, from 1963 to 1976 (Source: Time Series Data Library, http://robjhyndman.com/TSDL//)

[Figure: number of occupied rooms (vertical axis, 400 to 1200) against month (Jan 1963 to Dec 1976)]


Illustration

Example (L = 4, K = 8 − 4 + 1 = 5, M = 1)

$$y = \{501, 488, 504, 578, 545, 632, 728, 725\}$$

$$X = \begin{bmatrix} 501 & 488 & 504 & 578 \\ 488 & 504 & 578 & 545 \\ 504 & 578 & 545 & 632 \\ 578 & 545 & 632 & 728 \\ 545 & 632 & 728 & 725 \end{bmatrix}, \qquad X_1 = X_{I_1} = \begin{bmatrix} 466.1 & 490.7 & 535.5 & 574.7 \\ 475.8 & 501.0 & 546.7 & 586.7 \\ 508.8 & 535.7 & 584.6 & 627.3 \\ 560.8 & 590.5 & 644.4 & 691.5 \\ 594.2 & 625.6 & 682.8 & 732.7 \end{bmatrix}$$

The contribution of the component $X_{I_1}$: 99.75%

$$\tilde{X}_{I_1} = \{466.1, 483.6, 515.1, 554.5, 589.0, 632.5, 687.1, 732.7\}$$


Illustration

[Figure: y and y_reconstructed against observation index 1 to 8 (top panel); residual = y − y_reconstructed (bottom panel)]
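The small example (L = 4, M = 1) can be retraced with a compact Basic SSA sketch. This is not the Matlab software cited in the talk: it is a NumPy illustration with a hypothetical function name, it takes the SVD of the trajectory matrix directly instead of diagonalizing $X^T X$, and it uses 0-based eigentriple indices. Under these assumptions the contribution of the first eigentriple should come out close to the 99.75% quoted in the example.

```python
import numpy as np

def basic_ssa(y, L, groups):
    """Embedding, SVD, grouping and diagonal averaging for a single series.

    `groups` is a list of lists of 0-based eigentriple indices. Returns one
    reconstructed series and one contribution share per group.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = n - L + 1
    X = np.array([y[i:i + L] for i in range(K)])        # K x L trajectory matrix
    V, s, Ut = np.linalg.svd(X, full_matrices=False)    # X = V diag(s) U^T
    lam = s ** 2                                        # eigenvalues of X^T X
    reconstructions, shares = [], []
    for idx in groups:
        X_group = (V[:, idx] * s[idx]) @ Ut[idx, :]     # sum of elementary matrices
        flipped = np.fliplr(X_group)                    # anti-diagonals become diagonals
        rec = np.array([flipped.diagonal(L - 1 - t).mean() for t in range(n)])
        reconstructions.append(rec)
        shares.append(lam[idx].sum() / lam.sum())
    return reconstructions, shares

y = [501, 488, 504, 578, 545, 632, 728, 725]
recs, shares = basic_ssa(y, L=4, groups=[[0]])
print(np.round(recs[0], 1))
print(round(100 * shares[0], 2))
```

Passing all four eigentriples as a single group, `groups=[[0, 1, 2, 3]]`, returns the original series, which is a quick way to validate the diagonal averaging step.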


Illustration
Principal Components of the monthly number of occupied rooms (L = 12)

[Figure: twelve panels, one per principal component, each plotted against time index (0 to 150)]


Illustration

Normalized singular values of the monthly number of occupied rooms


If n, L and K are sufficiently large, each harmonic produces two eigentriples with
close singular values

[Figure: normalized λi against i, two panels; left panel with vertical scale 0 to 100, right panel with vertical scale 0 to 3]


Illustration
The contribution of the components XI1: 97.96%, XI2_3: 1.42%, XI4_5: 0.32%

[Figure: three panels showing y together with the reconstructions y_rec_PC1, y_rec_PC_2_3 and y_rec_PC_4_5, against time index (20 to 160)]


Illustration
The contribution of the component XI1_5 : 99.70%

[Figure: y and y_rec_PC_1_to_5 against time index (20 to 160) (top panel); residuals (bottom panel)]


Illustration

Choice of L
Contribution (%) of each component

L     PC1     PC2    PC3    % var. PC1-PC3
12    97.96   0.71   0.71   99.38
24    97.96   0.71   0.71   99.38
36    97.95   0.72   0.71   99.38
80    97.92   0.74   0.72   99.38
6     98.55   0.86   0.33   99.73


Illustration
Principal Components of the monthly number of occupied rooms (L = 6)
[Figure: six panels, one per principal component, each plotted against time index (0 to 150)]


Final remarks

PCA is a very popular and widely used tool for reducing the dimension of high-dimensional data
Time series can be considered as variables or as measurements → here the observation times are the variables
Classical PCA does not take into account the dependence between observations
Results of the different PCA-based techniques are not directly comparable
Practical problems of SSA: choice of L and M


References

Brillinger, D. R., 2001. Time Series: Data Analysis and Theory. Classics in Applied Mathematics, 36, SIAM.
Golyandina, N., Nekrutkin, V. and Zhigljavsky, A., 2001. Analysis of Time Series Structure: SSA and related techniques. Chapman & Hall/CRC Monographs on Statistics & Applied Probability.
Golyandina, N. and Stepanov, D., 2005. SSA-based approaches to analysis and forecast of multidimensional time series. In Proceedings of the Fifth Workshop on Simulation, pp. 293-298.
Hassani, H., 2007. Singular Spectrum Analysis: Methodology and Comparison. Journal of Data Science, Vol. 5, pp. 239-257.
Jolliffe, I. T., 2002. Principal Component Analysis. Springer, New York, 2nd ed.
Pinto da Costa, J., Silva, I. and Silva, M. E., 2009. Weighted principal components for time series. Actas do XVII Congresso Anual da SPE (submitted).
Rodrigues, P. C. and De Carvalho, M., 2008. Monitoring Calibration of the Singular Spectrum Analysis Method. In Proceedings of COMPSTAT'2008, Vol. 2, pp. 955-964, Physica-Verlag.
Shumway, R. and Stoffer, D., 2000. Time Series Analysis and Its Applications. Springer, New York.
