
Introduction to Time Series Regression and Forecasting (SW Chapter 12)

Time series data are data collected on the same observational unit at multiple time periods
- Aggregate consumption and GDP for a country (for example, 20 years of quarterly observations = 80 observations)
- Yen/$, pound/$ and Euro/$ exchange rates (daily data for 1 year = 365 observations)
- Cigarette consumption per capita for a state

Example #1 of time series data: US rate of inflation (figure)

Example #2: US rate of unemployment (figure)

Why use time series data?
- To develop forecasting models
  o What will the rate of inflation be next year?
- To estimate dynamic causal effects
  o If the Fed increases the Federal Funds rate now, what will be the effect on the rates of inflation and unemployment in 3 months? in 12 months?
  o What is the effect over time on cigarette consumption of a hike in the cigarette tax?
- Plus, sometimes you don't have any choice...
  o Rates of inflation and unemployment in the US can be observed only over time.

Time series data raise new technical issues
- Time lags
- Correlation over time (serial correlation or autocorrelation)
- Forecasting models that have no causal interpretation (specialized tools for forecasting):
  o autoregressive (AR) models
  o autoregressive distributed lag (ADL) models
- Conditions under which dynamic effects can be estimated, and how to estimate them
- Calculation of standard errors when the errors are serially correlated

Using Regression Models for Forecasting (SW Section 12.1)
Forecasting and estimation of causal effects are quite different objectives. For forecasting,
  o $\bar{R}^2$ matters (a lot!)
  o Omitted variable bias isn't a problem!
  o We will not worry about interpreting coefficients in forecasting models
  o External validity is paramount: the model estimated using historical data must hold into the (near) future

Introduction to Time Series Data and Serial Correlation (SW Section 12.2)
First we must introduce some notation and terminology.

Notation for time series data
- $Y_t$ = value of $Y$ in period $t$.
- Data set: $Y_1, \ldots, Y_T$ = $T$ observations on the time series random variable $Y$
- We consider only consecutive, evenly-spaced observations (for example, monthly, 1960 to 1999, no missing months) (else yet more complications...)

We will transform time series variables using lags, first differences, logarithms, & growth rates

Example: Quarterly rate of inflation at an annual rate
- CPI in the first quarter of 1999 (1999:I) = 164.87
- CPI in the second quarter of 1999 (1999:II) = 166.03
- Percentage change in CPI, 1999:I to 1999:II
  $$100 \times \frac{166.03 - 164.87}{164.87} = 100 \times \frac{1.16}{164.87} = 0.703\%$$
- Percentage change in CPI, 1999:I to 1999:II, at an annual rate = $4 \times 0.703 = 2.81\%$ (percent per year)
- Like interest rates, inflation rates are (as a matter of convention) reported at an annual rate.
- Using the logarithmic approximation to percent changes yields $4 \times 100 \times [\log(166.03) - \log(164.87)] = 2.80\%$
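The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative choices, not part of the course materials; the two CPI values are the ones quoted in the example.

```python
import math

def pct_change(prev, curr):
    """Percentage change from prev to curr."""
    return 100.0 * (curr - prev) / prev

def annualized_quarterly_inflation(prev, curr):
    """Quarterly percentage change expressed at an annual rate (times 4)."""
    return 4.0 * pct_change(prev, curr)

def annualized_log_inflation(prev, curr):
    """Logarithmic approximation: 400 * [log(curr) - log(prev)]."""
    return 400.0 * (math.log(curr) - math.log(prev))

cpi_1999q1 = 164.87  # CPI in 1999:I, from the example
cpi_1999q2 = 166.03  # CPI in 1999:II, from the example

quarterly = pct_change(cpi_1999q1, cpi_1999q2)
annual = annualized_quarterly_inflation(cpi_1999q1, cpi_1999q2)
annual_log = annualized_log_inflation(cpi_1999q1, cpi_1999q2)
print(round(quarterly, 3), round(annual, 2), round(annual_log, 2))
```

The exact and logarithmic versions agree to within about 0.01 percentage points here because the quarterly change is small.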

Example: US CPI inflation, its first lag, and its change (table)
CPI = Consumer Price Index (Bureau of Labor Statistics)

Autocorrelation
The correlation of a series with its own lagged values is called autocorrelation or serial correlation.
- The first autocorrelation of $Y_t$ is $\mathrm{corr}(Y_t, Y_{t-1})$
- The first autocovariance of $Y_t$ is $\mathrm{cov}(Y_t, Y_{t-1})$
- Thus
  $$\mathrm{corr}(Y_t, Y_{t-1}) = \frac{\mathrm{cov}(Y_t, Y_{t-1})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-1})}} = \rho_1$$
- These are population correlations: they describe the population joint distribution of $(Y_t, Y_{t-1})$

Sample autocorrelations
The $j$th sample autocorrelation is an estimate of the $j$th population autocorrelation:
$$\hat{\rho}_j = \frac{\widehat{\mathrm{cov}}(Y_t, Y_{t-j})}{\widehat{\mathrm{var}}(Y_t)}$$
where
$$\widehat{\mathrm{cov}}(Y_t, Y_{t-j}) = \frac{1}{T-j-1} \sum_{t=j+1}^{T} (Y_t - \bar{Y}_{j+1,T})(Y_{t-j} - \bar{Y}_{1,T-j})$$
where $\bar{Y}_{j+1,T}$ is the sample average of $Y_t$ computed over observations $t = j+1, \ldots, T$
  o Note: the summation is over $t = j+1$ to $T$ (why?)
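A minimal Python sketch of the $j$th sample autocorrelation follows. The autocovariance uses the divisor $T-j-1$ and the two subsample means from the formula; the variance estimator (full-sample variance with divisor $T-1$) is an assumption of this sketch, since the slide does not spell it out. The function name is illustrative.

```python
def sample_autocorrelation(y, j):
    """j-th sample autocorrelation: sample autocovariance / sample variance."""
    T = len(y)
    if not 0 < j < T - 1:
        raise ValueError("need 0 < j < T - 1")
    lead = y[j:]       # Y_t     for t = j+1, ..., T
    lag = y[:T - j]    # Y_{t-j} for the same t
    mean_lead = sum(lead) / len(lead)  # average over t = j+1, ..., T
    mean_lag = sum(lag) / len(lag)     # average over t = 1, ..., T-j
    cov = sum((a - mean_lead) * (b - mean_lag)
              for a, b in zip(lead, lag)) / (T - j - 1)
    # Assumption: full-sample variance with divisor T - 1.
    mean_all = sum(y) / T
    var = sum((v - mean_all) ** 2 for v in y) / (T - 1)
    return cov / var

trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(sample_autocorrelation(trend, 1))        # positive: smooth series
print(sample_autocorrelation(alternating, 1))  # negative: sign flips each period
```

The sum starts at $t = j+1$ because $Y_{t-j}$ does not exist for earlier $t$; in the code this is the slice `y[j:]` paired with `y[:T - j]`.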

Example: Autocorrelations of:
(1) the quarterly rate of U.S. inflation
(2) the quarter-to-quarter change in the quarterly rate of inflation

- The inflation rate is highly serially correlated ($\rho_1 = .85$)
- Last quarter's inflation rate contains much information about this quarter's inflation rate
- The plot is dominated by multiyear swings
- But there are still surprise movements!

More examples of time series & transformations (figures)

Stationarity: a key idea for external validity of time series regression
Stationarity says that the past is like the present and the future, at least in a probabilistic sense.

We'll focus on the case that $Y_t$ is stationary.

Autoregressions (SW Section 12.3)
A natural starting point for a forecasting model is to use past values of $Y$ (that is, $Y_{t-1}, Y_{t-2}, \ldots$) to forecast $Y_t$.
- An autoregression is a regression model in which $Y_t$ is regressed against its own lagged values.
- The number of lags used as regressors is called the order of the autoregression.
  o In a first order autoregression, $Y_t$ is regressed against $Y_{t-1}$
  o In a $p$th order autoregression, $Y_t$ is regressed against $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$.

The First Order Autoregressive (AR(1)) Model
The population AR(1) model is
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$$
- $\beta_0$ and $\beta_1$ do not have causal interpretations
- if $\beta_1 = 0$, $Y_{t-1}$ is not useful for forecasting $Y_t$
- The AR(1) model can be estimated by OLS regression of $Y_t$ against $Y_{t-1}$
- Testing $\beta_1 = 0$ v. $\beta_1 \neq 0$ provides a test of the hypothesis that $Y_{t-1}$ is not useful for forecasting $Y_t$
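To make the estimation step concrete, here is a minimal Python sketch that fits an AR(1) by OLS on simulated data and forms the t-statistic for $\beta_1 = 0$. Everything here is illustrative: the function names, the simulated series, and the use of the homoskedasticity-only standard error are assumptions of the sketch (the regressions in these notes report heteroskedasticity-robust standard errors instead).

```python
import math
import random

def fit_ar1(y):
    """OLS regression of Y_t on a constant and Y_{t-1}.

    Returns (b0, b1, se_b1). se_b1 is the homoskedasticity-only
    standard error, an assumption of this sketch.
    """
    x = y[:-1]  # regressor: Y_{t-1}
    z = y[1:]   # dependent variable: Y_t
    n = len(z)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    b0 = z_bar - b1 * x_bar
    ssr = sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    return b0, b1, se_b1

def simulate_ar1(beta0, beta1, T, seed=0):
    """Simulate a stationary AR(1) with standard normal errors."""
    rng = random.Random(seed)
    y = [beta0 / (1.0 - beta1)]  # start at the stationary mean
    for _ in range(T - 1):
        y.append(beta0 + beta1 * y[-1] + rng.gauss(0.0, 1.0))
    return y

y = simulate_ar1(beta0=1.0, beta1=0.5, T=2000, seed=42)
b0, b1, se_b1 = fit_ar1(y)
t_stat = b1 / se_b1  # compare |t| with 1.96 for a 5% two-sided test
print(b0, b1, t_stat)
```

With a true $\beta_1$ of 0.5 and 2000 observations, the t-statistic should be far above 1.96, so the hypothesis that $Y_{t-1}$ is not useful for forecasting $Y_t$ would be rejected.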

Example: AR(1) model of the change in inflation
Estimated using data from 1962:I - 1999:IV:
$$\widehat{\Delta Inf}_t = \underset{(0.14)}{0.02} - \underset{(0.106)}{0.211}\,\Delta Inf_{t-1}, \qquad \bar{R}^2 = 0.04$$
Is the lagged change in inflation a useful predictor of the current change in inflation?
- $|t| = .211/.106 = 1.99 > 1.96$
- Reject $H_0$: $\beta_1 = 0$ at the 5% significance level
- Yes, the lagged change in inflation is a useful predictor of the current change in inflation (but low $\bar{R}^2$!)

Example: AR(1) model of inflation - STATA
First, let STATA know you are using time series data

    generate time=q(1959q1)+_n-1;    _n is the observation no., so this command creates a new variable time that has a special quarterly date format
    format time %tq;                 Specify the quarterly date format
    sort time;                       Sort by time
    tsset time;                      Let STATA know that the variable time is the variable you want to indicate the time scale

Example: AR(1) model of inflation - STATA, ctd.

    . gen lcpi = log(cpi);                    variable cpi is already in memory
    . gen inf = 400*(lcpi[_n]-lcpi[_n-1]);    quarterly rate of inflation at an annual rate
    . corrgram inf , noplot lags(8);          computes first 8 sample autocorrelations

     LAG       AC       PAC        Q     Prob>Q
    -------------------------------------------
     1       0.8459   0.8466   116.64   0.0000
     2       0.7663   0.1742   212.97   0.0000
     3       0.7646   0.3188   309.48   0.0000
     4       0.6705  -0.2218   384.18   0.0000
     5       0.5914   0.0023   442.67   0.0000
     6       0.5538  -0.0231   494.29   0.0000
     7       0.4739  -0.0740   532.33   0.0000
     8       0.3670  -0.1698   555.3    0.0000

    gen inf = 400*(lcpi[_n]-lcpi[_n-1])
This syntax creates a new variable, inf, the "nth" observation of which is 400 times the difference between the nth observation on lcpi and the "(n-1)th" observation on lcpi, that is, the first difference of lcpi

Example: AR(1) model of inflation - STATA, ctd

Syntax: L.dinf is the first lag of dinf

    . reg dinf L.dinf if tin(1962q1,1999q4), r;

    Regression with robust standard errors             Number of obs =     152
                                                       F(  1,   150) =    3.96
                                                       Prob > F      =  0.0484
                                                       R-squared     =  0.0446
                                                       Root MSE      =  1.6619

    ------------------------------------------------------------------------------
                 |               Robust
            dinf |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            dinf |
              L1 |  -.2109525   .1059828    -1.99   0.048    -.4203645   -.0015404
           _cons |   .0188171   .1350643     0.14   0.889    -.2480572    .2856914
    ------------------------------------------------------------------------------

if tin(1962q1,1999q4) is STATA time series syntax for using only observations between 1962q1 and 1999q4 (inclusive). This requires defining the time scale first, as we did above

Forecasts and forecast errors
A note on terminology:
- A predicted value refers to the value of $Y$ predicted (using a regression) for an observation in the sample used to estimate the regression; this is the usual definition
- A forecast refers to the value of $Y$ forecasted for an observation not in the sample used to estimate the regression.
- Predicted values are "in sample"
- Forecasts are forecasts of the future, which cannot have been used to estimate the regression.

Forecasts: notation
- $Y_{t|t-1}$ = forecast of $Y_t$ based on $Y_{t-1}, Y_{t-2}, \ldots$, using the population (true unknown) coefficients
- $\hat{Y}_{t|t-1}$ = forecast of $Y_t$ based on $Y_{t-1}, Y_{t-2}, \ldots$, using the estimated coefficients, which were estimated using data through period $t-1$.
- For an AR(1),
  o $Y_{t|t-1} = \beta_0 + \beta_1 Y_{t-1}$
  o $\hat{Y}_{t|t-1} = \hat{\beta}_0 + \hat{\beta}_1 Y_{t-1}$, where $\hat{\beta}_0$ and $\hat{\beta}_1$ were estimated using data through period $t-1$.

Forecast errors
The one-period ahead forecast error is,
$$\text{forecast error} = Y_t - \hat{Y}_{t|t-1}$$
The distinction between a forecast error and a residual is the same as between a forecast and a predicted value:
- a residual is "in-sample"
- a forecast error is "out-of-sample": the value of $Y_t$ isn't used in the estimation of the regression coefficients

The root mean squared forecast error (RMSFE)
$$\text{RMSFE} = \sqrt{E[(Y_t - \hat{Y}_{t|t-1})^2]}$$
- The RMSFE is a measure of the spread of the forecast error distribution.
- The RMSFE is like the standard deviation of $u_t$, except that it explicitly focuses on the forecast error using estimated coefficients, not using the population regression line.
- The RMSFE is a measure of the magnitude of a typical forecasting "mistake"
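The RMSFE is a population quantity, so in practice it has to be estimated. One possible approach, sketched below in Python, is to re-estimate an AR(1) using only data through period $t-1$, forecast $Y_t$, and average the squared out-of-sample errors. This recursive scheme, the function names, and the simulated data are assumptions of the sketch, not a method stated in these notes.

```python
import math
import random

def ols_ar1(y):
    """OLS intercept and slope from regressing Y_t on Y_{t-1}."""
    x, z = y[:-1], y[1:]
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    b1 = (sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
          / sum((a - x_bar) ** 2 for a in x))
    return z_bar - b1 * x_bar, b1

def recursive_rmsfe(y, first_forecast):
    """Estimate the one-step-ahead RMSFE from out-of-sample forecast errors.

    For each t >= first_forecast (0-based index), the AR(1) is estimated
    on y[:t] only, so Y_t is never used to estimate the coefficients.
    """
    errors = []
    for t in range(first_forecast, len(y)):
        b0, b1 = ols_ar1(y[:t])        # data through period t-1 only
        forecast = b0 + b1 * y[t - 1]  # estimated forecast of Y_t
        errors.append(y[t] - forecast) # forecast error
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Simulated AR(1) with unit-variance errors (illustrative data only).
rng = random.Random(1)
y = [0.0]
for _ in range(299):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

rmsfe = recursive_rmsfe(y, first_forecast=200)
print(rmsfe)
```

Because the simulated errors have standard deviation 1, the estimate should land near 1, slightly inflated by the estimation error in the coefficients.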

Example: forecasting inflation using an AR(1)
AR(1) estimated using data from 1962:I - 1999:IV:
$$\widehat{\Delta Inf}_t = 0.02 - 0.211\,\Delta Inf_{t-1}$$
- $Inf_{1999:III} = 2.8$ (units are percent, at an annual rate)
- $Inf_{1999:IV} = 3.2$
- $\Delta Inf_{1999:IV} = 0.4$

So the forecast of $\Delta Inf_{2000:I}$ is,
$$\widehat{\Delta Inf}_{2000:I|1999:IV} = 0.02 - 0.211 \times 0.4 = -0.06 \approx -0.1$$
so
$$\widehat{Inf}_{2000:I|1999:IV} = Inf_{1999:IV} + \widehat{\Delta Inf}_{2000:I|1999:IV} = 3.2 - 0.1 = 3.1$$
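The forecast arithmetic can be reproduced with a few lines of Python. The coefficients and inflation values are those quoted in the example; the function and variable names are illustrative.

```python
def forecast_change(beta0_hat, beta1_hat, last_change):
    """One-step-ahead AR(1) forecast of the change in inflation."""
    return beta0_hat + beta1_hat * last_change

inf_1999q3 = 2.8  # percent, at an annual rate
inf_1999q4 = 3.2
d_inf_1999q4 = inf_1999q4 - inf_1999q3  # change in inflation, 0.4

# Forecast of the change in 2000:I, then of the level.
d_forecast = forecast_change(0.02, -0.211, d_inf_1999q4)
inf_forecast = inf_1999q4 + d_forecast
print(round(d_forecast, 2), round(inf_forecast, 1))
```

Without intermediate rounding the forecast change is about -0.064 and the forecast level about 3.14, which rounds to the 3.1 reported in the example.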

The pth order autoregressive model (AR(p))
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + u_t$$
- The AR(p) model uses $p$ lags of $Y$ as regressors
- The AR(1) model is a special case
- The coefficients do not have a causal interpretation
- To test the hypothesis that $Y_{t-2}, \ldots, Y_{t-p}$ do not further help forecast $Y_t$, beyond $Y_{t-1}$, use an F-test
- Use t- or F-tests to determine the lag order $p$
- Or, better, determine $p$ using an "information criterion" (see SW Section 12.5; we won't cover this)

Example: AR(4) model of inflation
$$\widehat{\Delta Inf}_t = \underset{(.12)}{.02} - \underset{(.10)}{.21}\,\Delta Inf_{t-1} - \underset{(.09)}{.32}\,\Delta Inf_{t-2} + \underset{(.09)}{.19}\,\Delta Inf_{t-3} - \underset{(.10)}{.04}\,\Delta Inf_{t-4}, \qquad \bar{R}^2 = 0.21$$
- F-statistic testing lags 2, 3, 4 is 6.43 (p-value < .001)
- $\bar{R}^2$ increased from .04 to .21 by adding lags 2, 3, 4
- Lags 2, 3, 4 (jointly) help to predict the change in inflation, above and beyond the first lag

Example: AR(4) model of inflation - STATA

    . reg dinf L(1/4).dinf if tin(1962q1,1999q4), r;

    Regression with robust standard errors             Number of obs =     152
                                                       F(  4,   147) =    6.79
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2073
                                                       Root MSE      =  1.5292

    ------------------------------------------------------------------------------
                 |               Robust
            dinf |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            dinf |
              L1 |  -.2078575     .09923    -2.09   0.038    -.4039592   -.0117558
              L2 |  -.3161319   .0869203    -3.64   0.000    -.4879068    -.144357
              L3 |   .1939669   .0847119     2.29   0.023     .0265565    .3613774
              L4 |  -.0356774   .0994384    -0.36   0.720    -.2321909    .1608361
           _cons |   .0237543   .1239214     0.19   0.848    -.2211434     .268652
    ------------------------------------------------------------------------------

NOTES
- L(1/4).dinf is a convenient way to say "use lags 1-4 of dinf as regressors"
- L1,...,L4 refer to the first, second,..., 4th lags of dinf

Example: AR(4) model of inflation - STATA, ctd.

    . dis "Adjusted Rsquared = " _result(8);     result(8) is the rbar-squared of the most recently run regression
    Adjusted Rsquared = .18576822

    . test L2.dinf L3.dinf L4.dinf;              L2.dinf is the second lag of dinf, etc.

     ( 1)  L2.dinf = 0.0
     ( 2)  L3.dinf = 0.0
     ( 3)  L4.dinf = 0.0

           F(  3,   147) =    6.43
                Prob > F =    0.0004

Note: some of the time series features of STATA differ between STATA v. 7 and STATA v. 8...

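Digression: we used $\Delta Inf$, not $Inf$, in the AR's. Why?
The AR(1) model of $Inf_t - Inf_{t-1}$ is an AR(2) model of $Inf_t$:
$$\Delta Inf_t = \beta_0 + \beta_1 \Delta Inf_{t-1} + u_t$$
or
$$Inf_t - Inf_{t-1} = \beta_0 + \beta_1 (Inf_{t-1} - Inf_{t-2}) + u_t$$
or
$$Inf_t = Inf_{t-1} + \beta_0 + \beta_1 Inf_{t-1} - \beta_1 Inf_{t-2} + u_t$$
so
$$Inf_t = \beta_0 + (1 + \beta_1) Inf_{t-1} - \beta_1 Inf_{t-2} + u_t$$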
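The algebra in this digression can be checked numerically. The Python sketch below computes $Inf_t$ once from the AR(1) in first differences and once from the implied AR(2) in levels; the two must coincide for any inputs. The function names are illustrative, and the numbers plugged in are the estimates and inflation values quoted in the forecasting example.

```python
def next_level_via_difference(beta0, beta1, inf_lag1, inf_lag2, u):
    """Inf_t obtained from the AR(1) model of the first difference."""
    d_inf = beta0 + beta1 * (inf_lag1 - inf_lag2) + u
    return inf_lag1 + d_inf

def next_level_via_ar2(beta0, beta1, inf_lag1, inf_lag2, u):
    """Inf_t obtained from the implied AR(2) model in levels."""
    return beta0 + (1.0 + beta1) * inf_lag1 - beta1 * inf_lag2 + u

beta0, beta1 = 0.02, -0.211
a = next_level_via_difference(beta0, beta1, 3.2, 2.8, 0.0)
b = next_level_via_ar2(beta0, beta1, 3.2, 2.8, 0.0)
print(a, b)

# The two lag coefficients of the implied AR(2) always sum to one.
coef_sum = (1.0 + beta1) + (-beta1)
print(coef_sum)
```

The levels coefficients $(1+\beta_1)$ and $-\beta_1$ sum to one whatever the value of $\beta_1$, which is the restriction that modeling the difference imposes on the AR(2) in levels.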

So why use $\Delta Inf_t$, not $Inf_t$?
AR(1) model of $\Delta Inf$: $\Delta Inf_t = \beta_0 + \beta_1 \Delta Inf_{t-1} + u_t$
AR(2) model of $Inf$: $Inf_t = \gamma_0 + \gamma_1 Inf_{t-1} + \gamma_2 Inf_{t-2} + v_t$
- When $Y_t$ is strongly serially correlated, the OLS estimator of the AR coefficient is biased towards zero.
- In the extreme case that the AR coefficient = 1, $Y_t$ isn't stationary: the $u_t$'s accumulate and $Y_t$ blows up.
- If $Y_t$ isn't stationary, the regression theory we are working with here breaks down
- Here, $Inf_t$ is strongly serially correlated, so to keep ourselves in a framework we understand, the regressions are specified using $\Delta Inf$
- For optional reading, see SW Section 12.6, 14.3, 14.4

Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag (ADL) Model (SW Section 12.4)
- So far we have considered forecasting models that use only past values of $Y$
- It makes sense to add other variables ($X$) that might be useful predictors of $Y$, above and beyond the predictive value of lagged values of $Y$:
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \cdots + \beta_p Y_{t-p} + \delta_1 X_{t-1} + \cdots + \delta_r X_{t-r} + u_t$$
- This is an autoregressive distributed lag (ADL) model
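As a sketch of how an ADL(p, r) forecast is formed once coefficients are available, the Python function below adds the autoregressive part and the distributed-lag part to the intercept. The function name, argument layout, and all numbers in the example call are hypothetical and chosen only for illustration.

```python
def adl_forecast(beta0, betas, deltas, y_lags, x_lags):
    """One-step-ahead forecast from an ADL(p, r) model.

    betas[i] multiplies Y_{t-1-i} and deltas[i] multiplies X_{t-1-i}.
    y_lags and x_lags list the most recent value first.
    """
    if len(y_lags) < len(betas) or len(x_lags) < len(deltas):
        raise ValueError("not enough lagged values for the model order")
    ar_part = sum(b * y for b, y in zip(betas, y_lags))   # lags of Y
    dl_part = sum(d * x for d, x in zip(deltas, x_lags))  # lags of X
    return beta0 + ar_part + dl_part

# Hypothetical ADL(2, 1) coefficients and data, for illustration only.
forecast = adl_forecast(
    beta0=1.0,
    betas=[0.5, -0.2],   # p = 2 lags of Y
    deltas=[-1.0],       # r = 1 lag of X
    y_lags=[2.0, 1.0],   # Y_{t-1}, Y_{t-2}
    x_lags=[0.5],        # X_{t-1}
)
print(forecast)
```

Setting `deltas` to an empty list reduces the function to an AR(p) forecast, which mirrors how the ADL model nests the pure autoregression.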

Example: lagged unemployment and inflation
- The "Phillips curve" says that if unemployment is above its equilibrium, or "natural," rate, then the rate of inflation will decrease.
- That is, $\Delta Inf_t$ should be related to lagged values of the unemployment rate, with a negative coefficient
- The rate of unemployment at which inflation neither increases nor decreases is often called the "non-accelerating inflation rate of unemployment": the NAIRU
- Is this relation found in US economic data?
- Can this relation be exploited for forecasting inflation?

The empirical "Phillips Curve" (figure)

The NAIRU is the value of $u$ for which $\Delta Inf = 0$

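Example: ADL(4,4) model of inflation
$$\widehat{\Delta Inf}_t = \underset{(.47)}{1.32} - \underset{(.09)}{.36}\,\Delta Inf_{t-1} - \underset{(.10)}{.34}\,\Delta Inf_{t-2} + \underset{(.08)}{.07}\,\Delta Inf_{t-3} - \underset{(.09)}{.03}\,\Delta Inf_{t-4}$$
$$\qquad - \underset{(.47)}{2.68}\,Unem_{t-1} + \underset{(.89)}{3.43}\,Unem_{t-2} - \underset{(.89)}{1.04}\,Unem_{t-3} + \underset{(.44)}{.07}\,Unem_{t-4}$$
$\bar{R}^2 = 0.35$, a big improvement over the AR(4), for which $\bar{R}^2 = .21$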

Example: dinf and unem - STATA

    . reg dinf L(1/4).dinf L(1/4).unem if tin(1962q1,1999q4), r;

    Regression with robust standard errors             Number of obs =     152
                                                       F(  8,   143) =    7.99
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3802
                                                       Root MSE      =   1.371

    ------------------------------------------------------------------------------
                 |               Robust
            dinf |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            dinf |
              L1 |  -.3629871   .0926338    -3.92   0.000    -.5460956   -.1798786
              L2 |  -.3432017    .100821    -3.40   0.001    -.5424937   -.1439096
              L3 |   .0724654   .0848729     0.85   0.395    -.0953022     .240233
              L4 |  -.0346026   .0868321    -0.40   0.691    -.2062428    .1370377
            unem |
              L1 |  -2.683394   .4723554    -5.68   0.000    -3.617095   -1.749692
              L2 |   3.432282    .889191     3.86   0.000     1.674625    5.189939
              L3 |  -1.039755   .8901759    -1.17   0.245    -2.799358     .719849
              L4 |   .0720316   .4420668     0.16   0.871    -.8017984    .9458615
           _cons |   1.317834   .4704011     2.80   0.006     .3879961    2.247672
    ------------------------------------------------------------------------------

Example: ADL(4,4) model of inflation - STATA, ctd.

    . dis "Adjusted Rsquared = " _result(8);
    Adjusted Rsquared = .34548812

    . test L2.dinf L3.dinf L4.dinf;

     ( 1)  L2.dinf = 0.0
     ( 2)  L3.dinf = 0.0
     ( 3)  L4.dinf = 0.0

           F(  3,   143) =    4.93               The extra lags of dinf are signif.
                Prob > F =    0.0028

    . test L1.unem L2.unem L3.unem L4.unem;

     ( 1)  L.unem = 0.0
     ( 2)  L2.unem = 0.0
     ( 3)  L3.unem = 0.0
     ( 4)  L4.unem = 0.0

           F(  4,   143) =    8.51               The lags of unem are significant
                Prob > F =    0.0000

The null hypothesis that the coefficients on the lags of the unemployment rate are all zero is rejected at the 1% significance level using the F-statistic

The test of the joint hypothesis that none of the $X$'s is a useful predictor, above and beyond lagged values of $Y$, is called a Granger causality test

"Causality" is an unfortunate term here: Granger causality simply refers to (marginal) predictive content.

Summary: Time Series Forecasting Models
- For forecasting purposes, it isn't important to have coefficients with a causal interpretation!
- Simple and reliable forecasts can be produced using AR(p) models; these are common "benchmark" forecasts against which more complicated forecasting models can be assessed
- Additional predictors ($X$'s) can be added; the result is an autoregressive distributed lag (ADL) model
- Stationary means that the models can be used outside the range of data for which they were estimated
- We now have the tools we need to estimate dynamic causal effects...
