Univariate Time Series Models
In univariate time series models we attempt to predict a variable using only the information contained in its own past values (i.e. we let the data speak for themselves). A series y_t is covariance (weakly) stationary if:
1. E(y_t) = μ, t = 1, 2, ...
2. E(y_t - μ)² = σ² < ∞
3. E(y_{t1} - μ)(y_{t2} - μ) = γ_{t2-t1}, ∀ t1, t2
A stationary series with zero mean
Non-stationarity due to changing mean
Non-stationarity due to changing variance
Non-stationarity in autocorrelations as well as in variance
[Figure: A driftless random walk, X_t = X_{t-1} + N(0,9), simulated for 1000 observations]
Non-stationarity in autocorrelations as well as in mean and variance
[Figure: A random walk with drift, X_t = 0.2 + X_{t-1} + N(0,9), simulated for 1000 observations]
Non-stationarity due to mean and variance: real data
Source: Mukherjee et al. (1998), Econometrics and Data Analysis for Developing Countries
Log transformation to remove non-stationarity in variance
Why is stationarity required?
For a stochastic process Y_1, Y_2, ..., Y_T we would need to estimate:
T means: E(Y_1), E(Y_2), ..., E(Y_T)
T variances: V(Y_1), V(Y_2), ..., V(Y_T)
T(T-1)/2 covariances: Cov(Y_i, Y_j), i < j
In all, 2T + T(T-1)/2 = T(T+3)/2 parameters.
We only have T time series observations.
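The parameter-count arithmetic above is easy to verify with a couple of lines; Python is used here purely for illustration:

```python
def n_params(T):
    """Distinct parameters without stationarity: T means + T variances
    + T(T-1)/2 covariances."""
    return 2 * T + T * (T - 1) // 2

# For T = 100 observations: 200 + 4950 = 5150 = T(T+3)/2 parameters,
# far more than the 100 data points available.
print(n_params(100))  # 5150
```

Stationarity collapses this to a handful of parameters: one mean, one variance, and one autocovariance per lag.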
Univariate Time Series Models (cont'd)
So if the process is covariance stationary, all the variances are the same and all the covariances depend only on the difference between t1 and t2. The moments
E(y_t - E(y_t))(y_{t+s} - E(y_{t+s})) = γ_s, s = 0, 1, 2, ...
are known as the autocovariance function.
The covariances γ_s are known as autocovariances.
Joint Hypothesis Tests
We can also test the joint hypothesis that all m of the τ_k autocorrelation coefficients are simultaneously equal to zero using the Q-statistic developed by Box and Pierce:
Q = T Σ_{k=1}^{m} τ_k²
where T = sample size and m = maximum lag length.
The Q-statistic is asymptotically distributed as a χ²(m).
However, the Box-Pierce test has poor small-sample properties, so a variant has been developed, called the Ljung-Box statistic:
Q* = T(T+2) Σ_{k=1}^{m} τ_k²/(T-k) ~ χ²(m)
This statistic is very useful as a portmanteau (general) test of linear dependence in time series.
An ACF Example
Question:
Suppose that a researcher has estimated the first 5 autocorrelation coefficients using a series of length 100 observations, and found them to be (from lags 1 to 5):
0.207, -0.013, 0.086, 0.005, -0.022.
Test each of the individual coefficients for significance, and use both the Box-Pierce and Ljung-Box tests to establish whether they are jointly significant.
Solution:
A coefficient would be significant if it lay outside (-0.196, +0.196) at the 5% level, so only the first autocorrelation coefficient is significant.
Q = 5.09 and Q* = 5.26
Compared with a tabulated χ²(5) = 11.1 at the 5% level, the 5 coefficients are jointly insignificant. [p-value = P(χ²(5) > 5.09) ≈ 0.40]
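The numbers in the solution can be reproduced with a short script; this sketch (Python, for illustration) computes the confidence bound and both portmanteau statistics:

```python
taus = [0.207, -0.013, 0.086, 0.005, -0.022]
T = 100

# 95% bound for an individual autocorrelation: +/- 1.96/sqrt(T)
bound = 1.96 / T ** 0.5                      # ~0.196; only tau_1 exceeds it

# Box-Pierce: Q = T * sum of squared autocorrelations
Q = T * sum(tau ** 2 for tau in taus)

# Ljung-Box: Q* = T(T+2) * sum(tau_k^2 / (T-k))
Q_star = T * (T + 2) * sum(tau ** 2 / (T - k)
                           for k, tau in enumerate(taus, start=1))

print(round(Q, 2), round(Q_star, 2))  # 5.09 5.26
```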
Moving Average Processes
Some economic hypotheses lead to a moving average time series structure. Changes in the price of a stock from one day to the next behave as a series of uncorrelated random variables with zero mean and constant variance,
i.e. y_t = P_t - P_{t-1} = u_t, t = 1, 2, ..., T [u_t is an uncorrelated random variable]
The random component u_t reflects unexpected news, e.g. new information about the financial health of a corporation, a sudden rise or fall in the popularity of a product (due to reports of desirable or undesirable effects), the emergence of a new competitor, the revelation of a management scandal, etc.
But suppose that the full impact of any unexpected news is not completely absorbed by the market in one day. Then the price change the next day might be
y_{t+1} = u_{t+1} + θu_t
where u_{t+1} is the effect of new information received during day t+1 and θu_t reflects the continuing assessment of day t's news.
The equation above is a moving average process: the value of the economic variable y_{t+1} is a weighted combination of current and past period random disturbances.
Moving Average Processes
Example of an MA Problem
Solution
Var(X_t) = E[X_t - E(X_t)][X_t - E(X_t)]
but E(X_t) = 0, so
Var(X_t) = E[(X_t)(X_t)]
= E[(u_t + θ₁u_{t-1} + θ₂u_{t-2})(u_t + θ₁u_{t-1} + θ₂u_{t-2})]
= E[u_t² + θ₁²u_{t-1}² + θ₂²u_{t-2}² + cross-products]

Solution (cont'd)
So Var(X_t) = γ₀ = E[u_t² + θ₁²u_{t-1}² + θ₂²u_{t-2}²]   (why do the cross-products vanish?)
= σ² + θ₁²σ² + θ₂²σ²
= (1 + θ₁² + θ₂²)σ²
γ₁ = θ₁σ² + θ₁θ₂σ²
= (θ₁ + θ₁θ₂)σ²
Solution (cont'd)
γ₃ = E[X_t - E(X_t)][X_{t-3} - E(X_{t-3})]
= E[X_t][X_{t-3}]
= E[(u_t + θ₁u_{t-1} + θ₂u_{t-2})(u_{t-3} + θ₁u_{t-4} + θ₂u_{t-5})]
= 0
So γ_s = 0 for s > 2.

Solution (cont'd)
(iii) For θ₁ = -0.5 and θ₂ = 0.25, substituting these into the formulae above gives ρ₁ = -0.476 and ρ₂ = 0.190.
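These values follow from the autocovariances derived above (γ₀ = (1 + θ₁² + θ₂²)σ², γ₁ = (θ₁ + θ₁θ₂)σ², and, by the same argument, γ₂ = θ₂σ²); a quick check in Python:

```python
def ma2_acf(theta1, theta2):
    """Theoretical autocorrelations rho_1, rho_2 of an MA(2) process
    y_t = u_t + theta1*u_{t-1} + theta2*u_{t-2}."""
    gamma0 = 1 + theta1 ** 2 + theta2 ** 2   # variance in units of sigma^2
    rho1 = (theta1 + theta1 * theta2) / gamma0
    rho2 = theta2 / gamma0
    return rho1, rho2

rho1, rho2 = ma2_acf(-0.5, 0.25)
print(round(rho1, 3), round(rho2, 3))  # -0.476 0.19
```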
ACF Plot
[Figure: acf for the MA(2) example, lags 0-6]
Autoregressive Processes
Economic activity takes time to slow down and speed up; there is a built-in inertia in economic series. A simple process that characterizes this behaviour is the first-order autoregressive process
y_t = δ + θ₁y_{t-1} + u_t
where δ is an intercept parameter and it is assumed that -1 < θ₁ < 1.
u_t is an uncorrelated random error with mean zero and variance σ².
y_t is seen to comprise two parts (in addition to the intercept):
i. θ₁y_{t-1}, a carry-over component depending on last period's value of y;
ii. u_t, a new shock to the level of the economic variable in the current period.
Autoregressive Processes
or y_t = δ + Σ_{i=1}^{p} θ_i L^i y_t + u_t
or θ(L)y_t = δ + u_t, where θ(L) = 1 - (θ₁L + θ₂L² + ... + θ_pL^p).
The Stationary Condition for an AR Model
The condition for stationarity of a general AR(p) model is that the roots of the lag polynomial
1 - θ₁L - θ₂L² - ... - θ_pL^p = 0
all lie outside the unit circle, i.e. have absolute value greater than one.
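This root condition can be checked numerically; the sketch below (Python with NumPy, for illustration) forms the characteristic polynomial 1 - θ₁z - ... - θ_p z^p and tests whether every root lies outside the unit circle:

```python
import numpy as np

def is_stationary(thetas):
    """Check stationarity of an AR(p) process with coefficients
    theta_1..theta_p: all roots of 1 - theta_1 z - ... - theta_p z^p = 0
    must lie outside the unit circle."""
    # np.roots expects coefficients from the highest power down to the constant
    poly = [-th for th in reversed(thetas)] + [1.0]
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary([0.9]))        # True: the root is 1/0.9 ~ 1.11
print(is_stationary([1.0]))        # False: unit root (random walk)
print(is_stationary([0.5, 0.3]))   # True
```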
Wold's Decomposition Theorem
States that any stationary series can be decomposed into the sum of two unrelated processes, a purely deterministic part and a purely stochastic part, which will be an MA(∞). For a stationary AR(p) the stochastic part is
y_t = ψ(L)u_t
where ψ(L) = (1 - θ₁L - θ₂L² - ... - θ_pL^p)^(-1).
Sample AR Problem
Solution
(i) Unconditional mean: assume that -1 < θ₁ < 1, so that the AR(1) process is stationary.
Stationarity implies that the mean and variance are the same for all y_t, t = 1, 2, ...
E(y_t) = E(δ + θ₁y_{t-1})
= δ + θ₁E(y_t)
so E(y_t) - θ₁E(y_t) = δ
E(y_t) = δ/(1 - θ₁)
Solution (cont'd)
y_t = u_t + θ₁u_{t-1} + θ₁²u_{t-2} + ...
Var(y_t) = E[y_t - E(y_t)][y_t - E(y_t)]
but E(y_t) = 0, since we are setting δ = 0.
Var(y_t) = E[(y_t)(y_t)]
Solution (cont'd)
Var(y_t) = E[(u_t + θ₁u_{t-1} + θ₁²u_{t-2} + ...)(u_t + θ₁u_{t-1} + θ₁²u_{t-2} + ...)]
= E[u_t² + θ₁²u_{t-1}² + θ₁⁴u_{t-2}² + ... + cross-products]
= E[u_t² + θ₁²u_{t-1}² + θ₁⁴u_{t-2}² + ...]
= σ_u² + θ₁²σ_u² + θ₁⁴σ_u² + ...
= σ_u²(1 + θ₁² + θ₁⁴ + ...)
= σ_u²/(1 - θ₁²)
Solution (cont'd)
γ₁ = θ₁σ² + θ₁³σ² + θ₁⁵σ² + ...
= θ₁σ²/(1 - θ₁²)
(make a bivariate table to understand the product of the brackets)
Solution (cont'd)
γ₂ = θ₁²σ²/(1 - θ₁²)
Solution (cont'd)
Similarly, γ_s = θ₁^s σ²/(1 - θ₁²)
The acf can now be obtained by dividing the covariances by the variance:
Solution (cont'd)
ρ₀ = γ₀/γ₀ = 1
ρ₁ = γ₁/γ₀ = [θ₁σ²/(1 - θ₁²)] ÷ [σ²/(1 - θ₁²)] = θ₁
ρ₂ = γ₂/γ₀ = [θ₁²σ²/(1 - θ₁²)] ÷ [σ²/(1 - θ₁²)] = θ₁²
ρ₃ = θ₁³
...
ρ_s = θ₁^s
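The result ρ_s = θ₁^s can also be verified by simulation; a minimal sketch (Python with NumPy) simulates a long AR(1) with θ₁ = 0.5 and compares the sample autocorrelations with 0.5, 0.25 and 0.125:

```python
import numpy as np

rng = np.random.default_rng(0)
theta1, n = 0.5, 100_000

# Simulate an AR(1): y_t = theta1*y_{t-1} + u_t, with u_t ~ N(0,1)
u = rng.standard_normal(n)
y = np.empty(n)
y[0] = u[0]
for t in range(1, n):
    y[t] = theta1 * y[t - 1] + u[t]

def sample_acf(x, s):
    """Sample autocorrelation of x at lag s."""
    x = x - x.mean()
    return float(np.dot(x[:-s], x[s:]) / np.dot(x, x))

for s in (1, 2, 3):
    print(s, round(sample_acf(y, s), 3))  # close to 0.5, 0.25, 0.125
```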
The Partial Autocorrelation Function (denoted φ_kk)
φ_kk measures the correlation between y_t and y_{t-k} after removing the effects of y_{t-k+1}, y_{t-k+2}, ..., y_{t-1}.
The Partial Autocorrelation Function (denoted φ_kk) (cont'd)
The pacf is useful for telling the difference between an AR process and an ARMA process.
In the case of an AR(p), there are direct connections between y_t and y_{t-s} only for s ≤ p.
In the case of an MA(q), the process can be written as an AR(∞), so there are direct connections between y_t and all its previous values.
ARMA Processes
Combining the AR(p) and MA(q) models, an ARMA(p, q) process can be written φ(L)y_t = μ + θ(L)u_t
where φ(L) = 1 - φ₁L - φ₂L² - ... - φ_pL^p
and θ(L) = 1 + θ₁L + θ₂L² + ... + θ_qL^q
or y_t = μ + φ₁y_{t-1} + φ₂y_{t-2} + ... + φ_py_{t-p} + θ₁u_{t-1} + θ₂u_{t-2} + ... + θ_qu_{t-q} + u_t
with E(u_t) = 0; E(u_t²) = σ²; E(u_t u_s) = 0 for t ≠ s.
The Invertibility Condition
An MA(q) model is invertible if the roots of θ(z) = 0 all lie outside the unit circle, so that the process can be rewritten as a convergent AR(∞) representation.
Summary of the Behaviour of the acf for AR and MA Processes

Summary of the Behaviour of the ACF and PACF
Can you identify the appropriate ARIMA model from this pacf?

Does a first or second difference need to be performed?
Some sample acf and pacf plots for standard processes
The acf and pacf are not produced analytically from the relevant formulae for a model of that type, but rather are estimated using 100,000 simulated observations with disturbances drawn from a normal distribution.

[Figure: ACF and PACF for an MA(1) Model: y_t = 0.5u_{t-1} + u_t, lags 1-10]
[Figure: ACF and PACF for an MA(2) Model: y_t = 0.5u_{t-1} - 0.25u_{t-2} + u_t, lags 1-10]
[Figure: ACF and PACF for a slowly decaying AR(1) Model: y_t = 0.9y_{t-1} + u_t, lags 1-10]
[Figure: ACF and PACF for a more rapidly decaying AR(1) Model: y_t = 0.5y_{t-1} + u_t, lags 1-10]
[Figure: ACF and PACF for a more rapidly decaying AR(1) Model with negative coefficient: y_t = -0.5y_{t-1} + u_t, lags 1-10]
[Figure: ACF and PACF for a non-stationary model (i.e. a unit coefficient): y_t = y_{t-1} + u_t, lags 1-10]
[Figure: ACF and PACF for an ARMA(1,1) Model: y_t = 0.5y_{t-1} + 0.5u_{t-1} + u_t, lags 1-10]
Building ARMA Models
- The Box-Jenkins Approach
Box and Jenkins (1970) were the first to approach the task of estimating an ARMA model in a systematic manner. There are 3 steps to their approach:
1. Identification
2. Estimation
3. Model diagnostic checking
Step 1:
- Involves determining the order of the model.
- Uses graphical procedures.
- A better procedure is now available.
Building ARMA Models
- The Box-Jenkins Approach (cont'd)
Step 2:
- Estimation of the parameters.
- AR models can be estimated using least squares, while MA and mixed (ARMA/ARIMA) models are non-linear in the parameters and can be estimated iteratively using maximum likelihood.
Step 3:
- Model checking.
Estimation of ARIMA Models
Consider the MA(1) model. Box and Jenkins suggest a grid-search procedure.
Estimate μ̂ = X̄, and obtain a starting value for θ by equating the first sample and population autocorrelations, r₁ = θ/(1 + θ²). Working with deviations from the mean and assuming ε₀ = 0, compute the residuals by recursive substitution as follows:
ε₁ = X₁
ε_t = X_t - θε_{t-1}, t ≥ 2
Compute Σε_t² for each value of θ in a suitable range. Point estimates of the parameters are obtained where the error sum of squares is minimized.
If the ε_t are assumed normally distributed, the maximum likelihood estimates are the same as LS. The formula for the asymptotic distribution of the variances of the ML estimators can be applied to compute standard errors and confidence intervals. More complex models can be estimated similarly.
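The grid-search idea can be sketched in a few lines; the simulation below (Python, with illustrative values) recovers θ from an artificial zero-mean MA(1) series by minimizing the recursive sum of squared errors:

```python
import numpy as np

rng = np.random.default_rng(42)
true_theta, n = 0.5, 5_000

# Simulate a zero-mean MA(1): X_t = u_t + theta*u_{t-1}
u = rng.standard_normal(n + 1)
x = u[1:] + true_theta * u[:-1]

def sse(theta, x):
    """Sum of squared recursive residuals for a candidate theta,
    assuming eps_0 = 0 so that eps_t = X_t - theta*eps_{t-1}."""
    eps, total = 0.0, 0.0
    for xt in x:
        eps = xt - theta * eps
        total += eps ** 2
    return total

# Grid search over the invertible range of theta
grid = np.arange(-0.95, 0.96, 0.01)
theta_hat = min(grid, key=lambda th: sse(th, x))
print(round(float(theta_hat), 2))  # should be near 0.5
```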
Some More Recent Developments in ARMA Modelling
Identification would typically not be done using acfs.
Reasons:
- The variance of the estimators is inversely proportional to the number of degrees of freedom.
- Models which are profligate might be inclined to fit to data-specific features.
This gives the motivation for using information criteria, which embody 2 factors:
- a term which is a function of the RSS
- some penalty for adding extra parameters
The information criteria vary according to how stiff the penalty term is. The three most popular criteria are Akaike's (1974) information criterion (AIC), Schwarz's (1978) Bayesian information criterion (SBIC), and the Hannan-Quinn criterion (HQIC):
AIC = ln(σ̂²) + 2k/T
SBIC = ln(σ̂²) + (k/T) ln T
HQIC = ln(σ̂²) + (2k/T) ln(ln(T))
where k = p + q + 1 and T = sample size. So we minimize the IC subject to p ≤ p̄, q ≤ q̄.
SBIC embodies a stiffer penalty term than AIC.
Which IC should be preferred if they suggest different model orders?
SBIC is strongly consistent (but inefficient).
AIC is not consistent, and will typically pick bigger models.
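The three criteria are simple functions of the residual variance; a small sketch (Python, with hypothetical values σ̂² = 0.5, k = 3, T = 100 chosen purely for illustration):

```python
import math

def info_criteria(sigma2_hat, k, T):
    """AIC, SBIC and HQIC for a fitted ARMA model with residual variance
    sigma2_hat, k = p + q + 1 parameters and T observations."""
    aic = math.log(sigma2_hat) + 2 * k / T
    sbic = math.log(sigma2_hat) + k * math.log(T) / T
    hqic = math.log(sigma2_hat) + 2 * k * math.log(math.log(T)) / T
    return aic, sbic, hqic

# For any T with ln(T) > 2 (i.e. T > ~7.4), the SBIC penalty k*ln(T)/T
# exceeds the AIC penalty 2k/T, so SBIC punishes extra parameters harder.
aic, sbic, hqic = info_criteria(0.5, 3, 100)
print(aic < hqic < sbic)  # True
```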
ARIMA Models
Forecasting in Econometrics
Forecasting = prediction. It is an important test of the adequacy of a model, e.g.:
- Forecasting tomorrow's return on a particular share
- Forecasting the price of a house given its characteristics
- Forecasting the riskiness of a portfolio over the next year
- Forecasting the volatility of bond returns
The distinction between the two types of model (time series and structural) is somewhat blurred (e.g., VARs).
In-Sample Versus Out-of-Sample
Say we have some data, e.g. monthly KSE-100 index returns for 120 months: 1990M1-1999M12. We could use all of it to build the model, or keep some observations back.
Holding back part of the sample provides a good test of the model, since we have not used the information from 1999M1 onwards when we estimated the model parameters.
How to produce forecasts
Models include:
simple unweighted averages
exponentially weighted averages
ARIMA models
Non-linear models e.g. threshold models, GARCH, etc.
Forecasting with ARMA Models
Forecasting with MA Models

Forecasting with MA Models (cont'd)
f_{t,4} = E(y_{t+4} | Ω_t) = μ
f_{t,s} = E(y_{t+s} | Ω_t) = μ, ∀ s ≥ 4
Forecasting with AR Models
How can we test whether a forecast is accurate or not?
For example, say we predict that tomorrow's return on the FTSE will be 0.2, but the outcome is actually -0.4. Is this accurate? Define f_{t,s} as the forecast made at time t for s steps ahead (i.e. the forecast made for time t+s), and y_{t+s} as the realised value of y at time t+s.
Some of the most popular criteria for assessing the accuracy of time series forecasting techniques are:
MSE = (1/N) Σ_{t=1}^{N} (y_{t+s} - f_{t,s})²
MAE = (1/N) Σ_{t=1}^{N} |y_{t+s} - f_{t,s}|
Mean absolute percentage error: MAPE = (100/N) Σ_{t=1}^{N} |(y_{t+s} - f_{t,s}) / y_{t+s}|
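A minimal sketch of the three accuracy measures (Python; the two-point series is just an illustration):

```python
def mse(actual, forecast):
    """Mean squared error."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error."""
    return 100 / len(actual) * sum(abs((y - f) / y)
                                   for y, f in zip(actual, forecast))

actual, forecast = [2.0, 4.0], [1.0, 2.0]
print(mse(actual, forecast),
      mae(actual, forecast),
      mape(actual, forecast))  # 2.5 1.5 50.0
```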
Box-Jenkins Methodology Summarized
Illustrations of Box-Jenkins methodology - I (Pakistan GDP forecasting)
Pakistan's real GDP at 1980-81 factor cost (Rs million):
Year  GDP      Year  GDP      Year  GDP
1961  82085    1975  180404   1989  403948
1962  86693    1976  186479   1990  422484
1963  92737    1977  191717   1991  446005
1964  98902    1978  206746   1992  480413
1965  108259   1979  218258   1993  487782
1966  115517   1980  233345   1994  509091
1967  119831   1981  247831   1995  534861
1968  128097   1982  266572   1996  570157
1969  135972   1983  284667   1997  579865
1970  148343   1984  295977   1998  600125
1971  149900   1985  321751   1999  625223
1972  153018   1986  342224
1973  163262   1987  362110
1974  174712   1988  385416

[Figure: time series plot of Pakistan's real GDP, 1961-1999]
Pakistan GDP forecasting
Stationarity and Identification
The first difference of GDP still seems to have some trend, with high variability near the end of the sample. The first difference of log GDP appears to be relatively trendless.

[Figure: plots of the differenced GDP and log(GDP) series, 1965-1995]
Var(d(gdp,2)) = 72503121;  Var(d(log(gdp),2)) = 0.00074
Stationarity and Identification
The GDP series appears to have a very slowly decaying autocorrelation function, and a single spike at lag 1 possibly indicates that GDP is a random walk. First-differenced GDP has many significant autocorrelations, which can also be seen from the Ljung-Box stats and p-values.
Stationarity and Identification
The log of GDP has the same autocorrelation structure as GDP. The first difference of log(GDP) looks like white noise. Also look at the Q-stats and p-values.
Stationarity and Identification
Second differencing seems to be unnecessary, so we work with the first difference of log(GDP), i.e. d = 1. The ACF and PACF do not show any nice-looking theoretical pattern.
Stationarity and Identification
We will consider fitting several ARIMA(p,1,q) models:
ARIMA (p,d,q)   AIC     BIC
ARIMA (1,1,0)   -4.879  -4.792
ARIMA (4,1,0)   -4.932  -4.708
ARIMA (0,1,1)   -4.910  -4.824
ARIMA (0,1,4)   -5.370  -5.284
ARIMA (4,1,4)   -5.309  -5.174
ARIMA (5,1,5)   -5.249  -5.113
ARIMA (1,1,4)   -5.333  -5.202
ARIMA(0,1,4) is identified as the best model using the two model selection criteria. The smaller the values of the selection criteria, the better the in-sample fit.
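The selection step simply picks the minimum of each criterion; reproducing the table as a Python dict for illustration:

```python
# AIC/BIC values for the candidate ARIMA(p,1,q) models
candidates = {
    "ARIMA(1,1,0)": (-4.879, -4.792),
    "ARIMA(4,1,0)": (-4.932, -4.708),
    "ARIMA(0,1,1)": (-4.910, -4.824),
    "ARIMA(0,1,4)": (-5.370, -5.284),
    "ARIMA(4,1,4)": (-5.309, -5.174),
    "ARIMA(5,1,5)": (-5.249, -5.113),
    "ARIMA(1,1,4)": (-5.333, -5.202),
}

# Smallest value wins under each criterion
best_aic = min(candidates, key=lambda m: candidates[m][0])
best_bic = min(candidates, key=lambda m: candidates[m][1])
print(best_aic, best_bic)  # ARIMA(0,1,4) ARIMA(0,1,4)
```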
Estimation of the models
Estimation output of the two best fitting models:
(1 - L)y_t = (1 + θ₁L + θ₂L² + θ₃L³ + θ₄L⁴)ε_t
(1 - L)y_t = (1 - 0.104L + 0.165L² - 0.201L³ + 0.913L⁴)ε_t
Model Diagnostics
We look at the correlogram of the residuals of the estimated model. The residuals appear to be white noise. The p-values of the Q-stats of ARIMA(0,1,4) are smaller.
Forecasting: In-sample Estimation
To compare the out-of-sample performance of the competing forecasting models, we hold out the last few observations. In this case the out-of-sample performance will be compared using the 5-year hold-out sample 1995-1999.
Re-estimate the model using the sample 1961-1994.

[Figure: Observed GDP and fitted GDP from the ARIMA(0,1,4) model, 1965-1995]
Forecasting: In-sample Estimation
Similar underestimation is observed for the ARIMA(1,1,4) model.

[Figure: Observed GDP and fitted GDP from the ARIMA(1,1,4) model, 1965-1995]
We will select the forecasting model with the smallest
RMSE = √( Σ(Y - Ŷ)² / h )
where h is the length of the hold-out sample.
Out-of-sample forecast evaluation
Using the two competing models, the forecasts are generated as follows:
Airline Passenger Data
Stationarity and Identification
The time series plot indicates an upward trend with seasonality and increasing variability. A log transformation seems to stabilize the variance. Seasonality has to be modeled.

[Figure: Number of airline passengers in thousands, 1949-1961, and LOG(PASSENGERS)]
Airline Passenger Data
Stationarity and Identification
The first difference eliminates the trend, but seasonality is still evident. A seasonal difference Y_t - Y_{t-12} is also needed after the first difference. This is done in EViews as d(log(Y),1,12). Both trend and seasonality appear to be removed.

[Figure: D(LOG(PASSENGERS)) and D(LOG(PASSENGERS),1,12), 1949-1961]
Airline Passenger Data
Stationarity and Identification
Let's have a look at the ACF and PACF. The ACF and PACF of d(log(Y),1,12) indicate some significant values at lags 1 and 12. We will do further work on d(log(Y),1,12).
Airline Passenger Data: Identification
Airline Passenger Data: Estimation
All the coefficients in the estimated model AR(1) MA(1) SMA(12) are significant. The estimated model in compact form is
(1 - φ₁L)y_t = (1 + θL)(1 + θ_w L¹²)ε_t
where y_t = d(log(passengers),1,12).
Airline Passenger Data: Diagnostic Checks
After estimating the above model, use the EViews command ident resid. The residuals appear to be white noise.
Airline Passenger Data: Forecasting
Here is the graph of observed and fitted passenger numbers. The forecasts are given in the table below:

[Figure: Observed values of number of passengers and forecasts for 1961]

Month     Forecast
1961.01   442.3
1961.10   474.11
1961.11   412.15
1961.12   462.14