Slide: 1
Overview
Time series data & forecasts
ARIMA models
Model diagnosis & testing
Slide: 2
Data
[Timeline diagram: sample y1 ... yn; backcasting (y0 and earlier); within-sample forecasts; ex-post forecasts beyond yn; ex-ante forecasts yN+1 ... yN+k over the forecasting period]
Slide: 3
Autoregressive AR(1):
yt = a0 + a1yt-1 + et
Slide: 4
White Noise
Variance is constant
Var(et) = E(et²) = σ²
Uncorrelated
Cov(et, et-j) = 0 for j ≠ 0 and all t
Slide: 5
Lag Operator
L^m yt = yt-m
So an AR(1) process can be represented as:
(1 − bL) yt = et
Invertibility
An AR(1) process can be represented as MA(∞):
If |b| < 1
yt = (1 − bL)^-1 et
yt = [1 + bL + (bL)² + ...] et
yt = et + b et-1 + b² et-2 + ...
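The expansion above can be checked numerically. A minimal sketch (not from the slides, with an illustrative b = 0.6) comparing the recursive AR(1) with its truncated MA(∞) representation:

```python
# Sketch: an AR(1) built recursively should match its truncated
# MA(infinity) expansion y_t = e_t + b*e_{t-1} + b^2*e_{t-2} + ...
import random

random.seed(0)
b = 0.6
n = 500
e = [random.gauss(0, 1) for _ in range(n)]

# Recursive AR(1): y_t = b*y_{t-1} + e_t, starting from y_0 = e_0
y_ar = [e[0]]
for t in range(1, n):
    y_ar.append(b * y_ar[t - 1] + e[t])

# MA expansion truncated at t terms: y_t = sum_{j=0}^{t} b^j * e_{t-j}
y_ma = [sum(b ** j * e[t - j] for j in range(t + 1)) for t in range(n)]

# The two representations agree up to floating-point error
max_diff = max(abs(a - m) for a, m in zip(y_ar, y_ma))
print(max_diff < 1e-9)  # True
```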
Slide: 6
Stationarity
Weak (covariance) stationarity
Mean and variance constant over time; autocovariances depend only on the lag
Strong stationarity
In addition, yt is normally distributed
Slide: 7
Stationary Series
[Chart: simulated stationary series ~ N(0,1)]
Slide: 8
E(yt) = a0 Σi=0..t-1 a1^i + a1^t y0
Slide: 9
Stationarity Considerations
A sample drawn from a recently started process may not be stationary
Hence many econometricians assume the process has been continuing for infinite time
This can be problematic
Slide: 10
Slide: 11
Var(yt) = E[(Σ et)²] = E[Σ et² + 2 Σt≠s et es] = nσ²
(the cross-product terms vanish because the et are uncorrelated)
Slide: 12
MA(1) process
yt = et + bet-1 = (1 + bL)et
Invertibility: |b| < 1
(1 + bL)^-1 yt = et
yt = −Σj=1..∞ (−b)^j yt-j + et
So an MA(1) process with |b| < 1 is an infinite autoregressive process
Similarly, an AR(1) process with |b| < 1 is invertible
i.e. can be represented as an infinite MA process
Slide: 13
MA(1) Process
[Chart: simulated MA(1) process]
Slide: 14
ARMA(1, 1) Process
yt = a1yt-1 + et + bet-1
[Chart: simulated ARMA(1,1) process yt = a yt-1 + et + b et-1]
Slide: 15
ARMA(p, q)
yt = φ1yt-1 + φ2yt-2 + ... + φpyt-p + et + θ1et-1 + θ2et-2 + ... + θqet-q
Φ(L) yt = Θ(L) et
Slide: 16
Unit Roots
Stationarity Condition
Roots of Φ(L) must lie outside the unit circle
|xi| > 1 for all roots xi
Invertibility Condition
Roots of Θ(L) must lie outside the unit circle
|zi| > 1 for all roots zi
Slide: 17
Autocorrelation
Sample autocovariance at lag s:
cs = (1/n) Σt=1..n-s (yt − ȳ)(yt+s − ȳ)
Sample autocorrelation: rs = cs / c0
Slide: 18
yt = Σ bi yt-i + et
Slide: 24
Calculating PACF
Slide: 25
PACF by Yule-Walker
φss = [ρs − Σj=1..s-1 φs-1,j ρs-j] / [1 − Σj=1..s-1 φs-1,j ρj]
φs,j = φs-1,j − φss φs-1,s-j,  for j = 1, 2, ..., s−1
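The recursion above can be sketched in code. A minimal illustration (not from the slides), assuming the autocorrelations ρs are already known; for an AR(1) with a = 0.7 the PACF should cut off after lag 1:

```python
# Sketch of the Durbin-Levinson (Yule-Walker) recursion: compute the
# PACF phi_ss from a sequence of autocorrelations rho[0..max_lag].
def pacf_from_acf(rho, max_lag):
    """rho[s] = autocorrelation at lag s (rho[0] must equal 1)."""
    phi_prev = {}   # phi_{s-1, j} from the previous step
    pacf = []
    for s in range(1, max_lag + 1):
        if s == 1:
            phi_ss = rho[1]
        else:
            num = rho[s] - sum(phi_prev[j] * rho[s - j] for j in range(1, s))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(1, s))
            phi_ss = num / den
        # Update phi_{s,j} = phi_{s-1,j} - phi_ss * phi_{s-1,s-j}
        phi_new = {j: phi_prev[j] - phi_ss * phi_prev[s - j] for j in range(1, s)}
        phi_new[s] = phi_ss
        phi_prev = phi_new
        pacf.append(phi_ss)
    return pacf

# AR(1) with a = 0.7 has rho_s = 0.7**s; its PACF cuts off after lag 1
a = 0.7
rho = [a ** s for s in range(11)]
pacf = pacf_from_acf(rho, 10)
print(round(pacf[0], 4))                        # 0.7
print(all(abs(p) < 1e-8 for p in pacf[1:]))     # True
```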
Slide: 26
Slide: 27
Lab:
Generate time series
Compute theoretical ACF
Yule-Walker equations
Estimate sample ACF
Autocorrelation function
How does pattern of ACF depend on parameters?
Slide: 28
Yule-Walker Equations
γ0 = E(yt yt) = a1E(yt-1 yt) + E(et yt) + b1E(et-1 yt)
= a1γ1 + σ² + b1E[et-1(a1yt-1 + et + b1et-1)]
= a1γ1 + σ² + b1(a1 + b1)σ²
Also γ1 = a1γ0 + b1σ², so solving:
γ0 = σ²(1 + b1² + 2a1b1) / (1 − a1²)
γ1 = σ²(a1 + b1)(1 + a1b1) / (1 − a1²)
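The closed form can be sanity-checked by substituting one Yule-Walker equation into the other. A minimal sketch (not from the slides), assuming σ² = 1 and illustrative parameter values a1 = b1 = 0.5:

```python
# Sketch (ARMA(1,1): y_t = a1*y_{t-1} + e_t + b1*e_{t-1}):
# the two Yule-Walker equations are
#   gamma0 = a1*gamma1 + sigma2*(1 + b1*(a1 + b1))
#   gamma1 = a1*gamma0 + b1*sigma2
# Substituting the second into the first gives gamma0 directly.
a1, b1, sigma2 = 0.5, 0.5, 1.0

gamma0 = sigma2 * (1 + b1 * (a1 + b1) + a1 * b1) / (1 - a1 ** 2)
gamma1 = a1 * gamma0 + b1 * sigma2

# Compare with the closed form from the derivation
closed_form = sigma2 * (1 + b1 ** 2 + 2 * a1 * b1) / (1 - a1 ** 2)
print(abs(gamma0 - closed_form) < 1e-12)  # True
```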
Copyright 1999-2011 Investment Analytics
Slide: 29
[Charts: theoretical vs estimated ACF, lags 1-20, for a1 = b1 = 0.5 and for a1 = 0.6, b1 = -0.95]
Slide: 30
[Charts: theoretical vs estimated autocorrelations, lags 1-20, for a1 = b1 = 0.5 and for a1 = 0.7, b1 = -0.3]
Slide: 31
ACF and PACF patterns:
White Noise — ACF: all ρs = 0; PACF: all φss = 0
AR(1), a > 0 — ACF: geometric decay, ρs = a^s; PACF: φ11 = ρ1, φss = 0 for s > 1
AR(1), a < 0 — ACF: oscillating decay, ρs = a^s; PACF: φ11 = ρ1, φss = 0 for s > 1
MA(1), b > 0 — ACF: +ve spike at lag 1, ρs = 0 for s > 1; PACF: oscillating decay, φ11 > 0
MA(1), b < 0 — ACF: -ve spike at lag 1, ρs = 0 for s > 1; PACF: geometric decay, φ11 < 0
ARMA(1,1), a > 0 — ACF: geometric decay from lag 1, sign ρ1 = sign(a+b); PACF: geometric decay from lag 1, φ11 = ρ1
ARMA(1,1), a < 0 — ACF: oscillating decay from lag 1, sign ρ1 = sign(a+b); PACF: oscillating decay from lag 1, φ11 = ρ1
Slide: 32
Box-Jenkins Methodology
Phase I - identification
Identify appropriate models
Slide: 33
Phase I - Identification
Data preparation
Transform data to stabilise variance
Difference data to obtain stationary series
Model selection
Examine data, ACF and PACF to identify
potential models
Slide: 34
Estimation
Estimate model parameters
Select best model using suitable criterion
Diagnostics
Check ACF/PACF of residuals
Do portmanteau test of residuals
Are residuals white noise?
If not, return to phase I (model selection)
Slide: 35
Two objectives
Minimize the sum of squared residuals
Can always be reduced by adding more parameters
Parsimony
Avoid excess parameterization
i.e. loss of degrees of freedom
Better forecasting performance
Solution
Penalize the likelihood for each additional term added to the model
Slide: 36
Likelihood Function
Assume yt ~ N(μ, σ²)
MLE Estimates
μ̂ = Σ yt / n
σ̂² = Σ (yt − μ̂)² / n
Slide: 37
Log-Likelihood
Ln L = (−n/2)[Ln(2π) + Ln(σ²)] − (1/2σ²) Σ(yt − bxt)²
MLE Estimates
σ̂² = Σ et² / n
b̂ = Σ(xt yt) / Σ xt²
Standard Error
sb = σ̂ / [Σ(xt − x̄)²]^1/2
Slide: 38
AIC = -2Ln(Likelihood) + 2m
≈ nLn(SSE) + 2m
BIC = -2Ln(Likelihood) + mLn(n)
≈ nLn(SSE) + mLn(n)
L is the likelihood function
SSE is the error sum of squares
n is the number of observations
m is the number of model parameters
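The SSE-based forms of these criteria are simple to compute. A minimal sketch (not from the slides; the likelihood constants are dropped, as is usual when comparing models fitted to the same data):

```python
# Sketch: SSE-based information criteria for model comparison.
import math

def aic(sse, n, m):
    # AIC ~ n*ln(SSE) + 2m: penalty of 2 per parameter
    return n * math.log(sse) + 2 * m

def bic(sse, n, m):
    # BIC ~ n*ln(SSE) + m*ln(n): heavier penalty once ln(n) > 2
    return n * math.log(sse) + m * math.log(n)

n, sse = 100, 12.5
# At equal SSE, the model with fewer parameters scores lower (better)
print(aic(sse, n, m=3) < aic(sse, n, m=5))  # True
# For n > e^2 (about 7.4), BIC penalizes parameters more than AIC
print(bic(sse, n, m=5) > aic(sse, n, m=5))  # True
```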
Slide: 39
Slide: 40
Sums of Squares
Due to Model = SSM
SSM = Σ(ŷt − ȳ)²
Slide: 41
Slide: 42
Coefficient of Determination
R2 = SSR/SST
How much of total variation is explained by
regression
Adjusted R²
Adjusted R² = 1 - (1 - R²)(n - 1) / (n - m - 1)
Idea: R² can always be increased by adding more variables
Penalize R² for loss of degrees of freedom
Useful for comparing models with different numbers of independent variables m
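The adjustment is a one-line formula. A minimal sketch (not from the slides) showing how extra variables lower adjusted R² at the same raw R²:

```python
# Sketch: adjusted R2 penalizes raw R2 for the degrees of freedom
# used by m independent variables in a sample of size n.
def adjusted_r2(r2, n, m):
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)

# Same raw R2, more variables -> lower adjusted R2
print(round(adjusted_r2(0.90, n=50, m=3), 4))   # 0.8935
print(round(adjusted_r2(0.90, n=50, m=10), 4))  # 0.8744
```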
Slide: 43
Diagnostic Checking
Test for:
Bias: non-zero mean
Heteroscedasticity (non-constant variance)
Non-normality of residuals
Slide: 44
[Chart: residual plot vs fitted values Ft]
Slide: 45
[Chart: residual plot vs fitted values Ft]
Slide: 46
[Chart: residual plot vs fitted values Ft]
Slide: 47
[Chart: ACF of residuals, lags 1-20]
Slide: 48
Durbin-Watson Test
DW = Σt=2..n (et − et-1)² / Σt=1..n et²
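The statistic is straightforward to compute from a residual series. A minimal sketch (not from the slides); DW is near 2 for uncorrelated residuals, near 0 for strong positive autocorrelation, and near 4 for strong negative autocorrelation:

```python
# Sketch: Durbin-Watson statistic for a list of residuals e.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

# Perfectly persistent residuals: successive differences are all zero
print(durbin_watson([1, 1, 1, 1, 1, 1]))              # 0.0
# Alternating residuals: strong negative autocorrelation, DW near 4
print(round(durbin_watson([1, -1, 1, -1, 1, -1]), 2))  # 3.33
```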
Slide: 49
Box-Pierce Q Statistic
Q = n Σs=1..h rs²
Usually h = 20 is selected
Used to test autocorrelations of residuals
If residuals are white noise then Q ~ χ²(h − m)
m is the number of model parameters (0 for raw data)
Slide: 50
Ljung-Box Statistic
Q* = n(n + 2) Σs=1..h rs² / (n − s)
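Both portmanteau statistics can be sketched side by side. A minimal illustration (not from the slides), assuming the sample autocorrelations rs of the residuals are already computed:

```python
# Sketch: Box-Pierce Q and the Ljung-Box small-sample refinement Q*,
# given residual autocorrelations r[1..h] (r[0] = 1 by definition).
def box_pierce(r, n, h):
    return n * sum(r[s] ** 2 for s in range(1, h + 1))

def ljung_box(r, n, h):
    return n * (n + 2) * sum(r[s] ** 2 / (n - s) for s in range(1, h + 1))

# Illustrative autocorrelations (hypothetical values, not real data)
r = [1.0, 0.10, -0.05, 0.02]
n, h = 100, 3
print(round(box_pierce(r, n, h), 3))             # 1.29
# Q* > Q in finite samples because of the (n+2)/(n-s) weights
print(ljung_box(r, n, h) > box_pierce(r, n, h))  # True
```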
Slide: 51
Jarque-Bera Test
J-B = n[Skewness²/6 + (Kurtosis − 3)²/24]
J-B ~ χ²(2)
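The statistic needs only the sample moments. A minimal sketch (not from the slides); kurtosis here is the raw fourth standardized moment, so it equals 3 for a normal distribution:

```python
# Sketch: Jarque-Bera statistic from sample skewness and kurtosis.
def jarque_bera(x):
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # variance
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

# A perfectly symmetric sample has zero skewness, so JB here
# reflects only the excess-kurtosis term
x = [-2, -1, 0, 1, 2]
print(jarque_bera(x) >= 0)  # True
```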
Slide: 52
[Chart: actual vs forecast]
Slide: 53
[Chart: ACF and PACF with upper/lower 95% confidence bands, lags 1-20]
Slide: 54
[Chart: ACF and PACF with upper/lower 95% confidence bands, lags 1-20]
Slide: 55
Forecast Function
E.g. AR(1) process: yt+1 = a0 + a1yt + et+1
Forecast function:
Et(yt+j) = E(yt+j | yt, yt-1, yt-2, ..., et, et-1, ...)
Et(yt+1) = a0 + a1yt
Et(yt+2) = a0 + a1Et(yt+1) = a0 + a0a1 + a1²yt
Et(yt+j) = a0(1 + a1 + a1² + ... + a1^(j-1)) + a1^j yt → a0/(1 − a1) as j → ∞
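The forecast function can be iterated directly. A minimal sketch (not from the slides, with illustrative values a0 = 1, a1 = 0.8, yt = 10) showing convergence to the unconditional mean:

```python
# Sketch: iterate the AR(1) forecast E_t(y_{t+j}) = a0 + a1*E_t(y_{t+j-1})
# and watch it converge to the long-run mean a0 / (1 - a1).
a0, a1, y_t = 1.0, 0.8, 10.0

forecasts = []
f = y_t
for j in range(1, 31):
    f = a0 + a1 * f
    forecasts.append(f)

long_run = a0 / (1 - a1)        # 5.0 here
print(forecasts[0])             # 9.0 (= a0 + a1*y_t)
print(abs(forecasts[-1] - long_run) < 0.01)  # True: converged by j = 30
```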
Slide: 56
Forecast Error
Slide: 57
Confidence Intervals
Forecast Variance
Var[ht(j)] = σ²[1 + a1² + a1⁴ + ... + a1^(2(j-1))] → σ²/(1 − a1²)
Forecast variance is an increasing function of j
In the limit, forecast variance converges to the variance of {yt}
Confidence Intervals
Var[ht(1)] = σ²
Hence the 95% confidence interval for yt+1 is a0 + a1yt ± 1.96σ
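The variance formula above is a finite geometric sum. A minimal sketch (not from the slides, with illustrative a1 = 0.8, σ² = 1):

```python
# Sketch: j-step AR(1) forecast-error variance
# Var[h_t(j)] = sigma2 * (1 + a1^2 + ... + a1^(2(j-1))),
# which converges to sigma2 / (1 - a1^2) as j grows.
a1, sigma2 = 0.8, 1.0

def forecast_var(j):
    return sigma2 * sum(a1 ** (2 * i) for i in range(j))

limit = sigma2 / (1 - a1 ** 2)
print(forecast_var(1))                      # 1.0: one-step variance is sigma2
print(forecast_var(2) > forecast_var(1))    # True: variance grows with horizon
print(abs(forecast_var(50) - limit) < 1e-4) # True: converges to series variance
```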
Slide: 58
Non-Stationarity
Slide: 59
ARIMA Models
ARIMA(p,d,q) models
Autoregressive Integrated Moving Average
d is the order of the differencing operator required to produce
stationarity (in the mean)
Slide: 60
Seasonal Models
Slide: 61
Explicitly in model
With AR and/or MA terms at lag S
Seasonal differencing
Difference series at lag S to achieve (seasonal) stationarity
E.g. for monthly seasonality form y*t = yt − yt-12
Model the resulting stationary series y*t in the usual way
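Seasonal differencing is a one-line transformation. A minimal sketch (not from the slides, on toy monthly data with an artificial seasonal pattern plus trend):

```python
# Sketch: seasonal differencing at lag S = 12, y*_t = y_t - y_{t-12}.
S = 12
# Toy monthly series: repeating 12-month pattern plus a linear trend
y = [10 + (t % 12) + 0.1 * t for t in range(36)]

y_star = [y[t] - y[t - S] for t in range(S, len(y))]

# Differencing at the seasonal lag removes the repeating pattern,
# leaving only the constant trend increment 0.1 * 12 = 1.2
print(all(abs(v - 1.2) < 1e-9 for v in y_star))  # True
```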
Slide: 62
[Chart: monthly WPI, Jan-60 to Jan-92]
Slide: 63
Data preparation
Clearly non-stationary in mean and variance
Consider ΔLn(WPI)
Slide: 64
Estimates (p-values in parentheses):

Model           a0               a1               a2                b1                b4               AIC     BIC     Adj. R²
AR(1)           0.0013 (0.04%)   0.0738 (0.00%)                                                       -497.3  -494.4  33.3%
AR(2)           0.0035 (0.52%)   0.4423 (0.00%)   0.2345 (0.46%)                                      -502.3  -496.6  36.4%
ARMA(1,(1,4))   0.0025 (5.96%)   0.7700 (0.03%)                    -0.4246 (3.48%)   0.3120 (0.07%)   -511.0  -502.6  42.7%
ARMA(2,(1,4))   0.0025 (6.25%)   0.7969 (0.02%)   -0.0238 (43.38%) -0.4411 (2.98%)   0.3132 (0.06%)   -509.0  -497.8  42.3%
Slide: 65
[Chart: actual vs forecast, ΔLn(WPI)]
Slide: 66
Regression Models
Linear models of form:
Yt = b0 + b1X1t + b2X2t + . . . + bmXmt + et
Slide: 67
Slide: 68
Regression Methods
Standard Method
Use all data
Problem: data dependent; structural change
Stepwise
Forward: start with minimal model, add variables
Backward: start with full model and eliminate variables
Estimate contribution of individual variables
Rolling / Recursive
Rolling: re-estimate regression over successive, overlapping, fixed-length periods
Recursive: re-estimate regression after adding each new period's data
Useful for ex-ante estimation & out-of-sample forecasting
Slide: 69
Slide: 70
[Chart: monthly series, 1960M2-1992M2]
Slide: 71
ANOVA

Coefficient   SE      t        p
14.338        3.424   4.188    0.003%
-0.024        0.010   -2.442   1.497%
-0.280        0.065   -4.321   0.002%
-0.007        0.003   -2.763   0.595%
-0.159        0.040   -3.941   0.009%

R² = 8.6%; Correl = 20.7%; F = 10.82 (p = 0.000%); DF = 461
Slide: 72
Residuals
[Scatter chart: residuals (errors) vs forecast excess returns]
Slide: 73
[Chart: cumulative returns, regression strategy vs buy & hold, 1960M2-1992M2]
Slide: 74
Conditional Mean = yt
yt+s = yt + Σi=1..s et+i
Et(yt+s) = yt + Et(Σ et+i) = yt
Slide: 75
Hence non-stationary
Covariance
E[(yt − y0)(yt-s − y0)] = E[(Σ ei)(et-s + et-s-1 + ... + e1)]
= E[(et-s)² + ... + (e1)²]
γt-s = (t − s)σ²
Slide: 76
Correlation: ρs = [(t − s)/t]^1/2
For small s, (t − s)/t ≈ 1
As s increases, ρs decays only very slowly
Identification Problem
Can't use the ACF to distinguish a unit root process (a1 = 1) from one in which a1 is close to 1
A random walk will mimic an AR(1) process with a near unit root
Slide: 77
Slide: 78
Appears stationary
ACF decays to zero
[Charts: simulated series and its estimated ACF, lags 1-20]
Slide: 79
Dickey-Fuller Methodology
Use Monte Carlo
Generate 10,000 unit root processes {yt}
Estimate parameter a1
Estimate confidence levels:
90% of estimates are less than 2.58 SE from 1
95% of estimates are less than 2.89 SE from 1
99% of estimates are less than 3.51 SE from 1
Test Example
Suppose we have a series whose estimated a1 is 2.95 SE below 1
Reject the hypothesis of a unit root at the 5% level
Slide: 80
Dickey-Fuller Tests
Unit Root Process: yt = a1yt-1 + et
Equivalent form
Δyt = γyt-1 + et
γ = a1 − 1
Test: γ = 0
Equivalent to testing a1 = 1
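The regression form above is easy to estimate. A minimal sketch (not from the slides): simulate a unit root process and estimate γ by no-intercept OLS. Note the resulting t-statistic must be compared with Dickey-Fuller critical values, not the usual t distribution:

```python
# Sketch: estimate gamma in the Dickey-Fuller regression
# Dy_t = gamma * y_{t-1} + e_t for a simulated random walk.
import random

random.seed(1)
n = 500

# Simulate a unit root process y_t = y_{t-1} + e_t
y = [0.0]
for _ in range(n):
    y.append(y[-1] + random.gauss(0, 1))

dy = [y[t] - y[t - 1] for t in range(1, len(y))]
ylag = y[:-1]

# No-intercept OLS: gamma_hat = sum(ylag * dy) / sum(ylag^2)
gamma_hat = sum(a * b for a, b in zip(ylag, dy)) / sum(a * a for a in ylag)
print(gamma_hat)  # typically close to 0 under a unit root
```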
Slide: 81
Test Procedure
Estimate g using OLS
Compute t-statistic
Divide OLS estimate by SE
Compare t-statistic with appropriate critical
value in Dickey-Fuller tables
Critical value depends on
sample size
form of model
confidence level
Slide: 82
Critical Values
Model: Δyt = a0 + γyt-1 + a2t + et
Hypothesis γ = 0: statistic ττ
Hypothesis γ = a2 = 0: statistic φ3
Hypothesis a0 = γ = a2 = 0: statistic φ2
Model: Δyt = a0 + γyt-1 + et
Hypothesis γ = 0: statistic τμ
Hypothesis a0 = γ = 0: statistic φ1
Model: Δyt = γyt-1 + et
Hypothesis γ = 0: statistic τ
Slide: 83
Joint Tests
Used to test joint hypotheses, e.g. a0 = γ = 0
Constructed like an ordinary F-test:
φi = {[RSS(restricted) − RSS(unrestricted)] / r} / {RSS(unrestricted) / (n − k)}
r = number of restrictions; n − k = degrees of freedom in the unrestricted model
Slide: 84
Extensions of Dickey-Fuller
AR(p) Process
yt = a0 + a1yt-1 + . . . + ap-2yt-p+2 + ap-1yt-p+1 + apyt-p + et
Add and subtract apyt-p+1
yt = a0 + a1yt-1 + . . . + ap-2yt-p+2 + (ap-1 + ap)yt-p+1 - ap Dyt-p+1 + et
Add and subtract (ap-1 + ap)yt-p+2
yt = a0 + a1yt-1 + . . . -(ap-1 + ap) Dyt-p+2 - ap Dyt-p+1 + et
Slide: 85
The AR(p) process can then be written as:
Δyt = a0 + γyt-1 + Σi=2..p βi Δyt-i+1 + et
where γ = −(1 − Σi=1..p ai) and βi = −Σj=i..p aj
Critical values:
No intercept or trend: τ
Intercept, no trend: τμ
Intercept and trend: ττ
Slide: 86
Slide: 87
Slide: 88
Phillips-Perron Tests
DX = det(XTX), the determinant of the regressor matrix X
s is the standard error of the regression
w is the number of estimated correlations
s̃²Tw = (1/T) Σi=1..T ui² + (2/T) Σs=1..w Σt=s+1..T ut ut-s
Copyright 1999-2011 Investment Analytics
Slide: 89
Slide: 90
PPP model
Et = pt - p*t + dt
Et is log of dollar price of foreign exchange
pt is log of US price levels
p*t is log of foreign price levels
dt represents deviation from PPP in period t
Testing PPP
Reject PPP if the series {dt} is non-stationary
Slide: 91
Slide: 92
Worksheet: PPP
Series of real Yen FX rates 1973-89
Slide: 93
Coefficient   Estimate   SE       t        p
a0            0.038      0.0203   1.881    6.14%
γ             -0.031     0.0173   -1.820   7.03%

m = 1, n = 202
Max Likelihood: AIC = -291.35, BIC = -288.04
DW = 2.03; R² = 1.6%; Adj. R² = 1.1%

ANOVA     DF     SS       MS
Model     1      0.0039   0.00388
Error     200    0.2340   0.00117
Total     201    0.2379
F = 3.31 (p = 7.03%)

Portmanteau Tests   Q(24)    p
Box-Pierce          26.83    26.32%
Ljung-Box           29.10    17.69%

T-Test: H0: γ = 0
Could reject at the 93% confidence level
Conclude series is stationary and PPP holds
Dickey-Fuller
Can't reject unit root hypothesis at the 95% level
Slide: 94
Simple methods
Exponential smoothing, etc.
Simple, low cost, often effective
Limitations
Questionable out-of-sample performance
Underlying model not articulated
ARIMA models
Staple of econometricians
Models articulated and testable
Limitations
Estimation is non-trivial
Problems with (near) random processes
Slide: 95