
Autocorrelation



What is Autocorrelation?

One of the important assumptions of the CLRM is the absence of autocorrelation (serial correlation) among the error terms u_t of the population regression function. In the CLRM it is assumed that, conditional on the X's, the disturbances in two time periods are uncorrelated: Corr(u_t, u_s | X) = 0 for all t ≠ s. Equivalently, E(u_t u_s) = 0 when t ≠ s: the error term relating to one observation is not affected by the error term relating to any other observation. When this assumption is violated, the errors are correlated across time, which is known as the problem of serial correlation. For example, if u_{t-1} > 0 tends on average to be followed by u_t > 0, we have serial correlation: Corr(u_t, u_{t-1}) > 0.

What is Autocorrelation?

Example: 3-month T-bill rates (i3_t) are regressed on the rate of inflation (inf_t) and the deficit as a % of GDP (def_t) (Wooldridge, 2003): i3_t = β0 + β1 inf_t + β2 def_t + u_t. The assumption of homoskedasticity says that, conditional on X, the variance of u_t is the same for all t: Var(u_t | X) = Var(u_t) = σ², t = 1, 2, …, n. In this context, homoskedasticity implies that the unobservables (the u_t's) affecting interest rates have a constant variance over time. Autocorrelation in this context implies: if interest rates are high in period t−1, they are likely to be high in period t as well. The absence of autocorrelation assumes nothing about the correlation of the X's across time; e.g. it is not concerned with whether inf_t is correlated across time or not (it generally is).

What is Autocorrelation?

Example: consider quarterly data on output, labour and capital. If there is a labour strike in one quarter, it is expected to affect output in that quarter; but in the absence of autocorrelation, the effect of the strike is not expected to carry over to the next quarter. In cross-sectional analysis, if we assume random sampling, the errors u_i and u_h for two households i and h are independent. Thus serial correlation is a potential problem in time series data.


Why Serial Correlation?


Inertia or sluggishness can result in serial correlation: many time series (GDP, employment, production) exhibit business cycles, and successive observations are likely to be interdependent. Omitted variable bias arises when important variables are left out of the model. Example: demand for beef is a function of its price (X1_t), consumers' income (X2_t) and the price of chicken (X3_t): Y_t = β0 + β1 X1_t + β2 X2_t + β3 X3_t + u_t. If we run the regression without X3_t, Y_t = β0 + β1 X1_t + β2 X2_t + v_t, the error term becomes v_t = β3 X3_t + u_t and is expected to exhibit a systematic pattern.


Why Serial Correlation?

Incorrect functional form: serial correlation can occur if the regression is not correctly specified. E.g. let the marginal cost (MC) model be: MC_i = β0 + β1 Output_i + β2 Output_i² + u_i. If the squared term is not included, the estimated MC curve will be linear and the new disturbance, v_i = β2 Output_i² + u_i, will incorporate the systematic effect of squared output on MC. Cobweb pattern: for agricultural goods, supply reacts to price with a lag of one period: S_t = β0 + β1 P_{t-1} + u_t. In such cases the u's are not random, because if farmers produce more in period t, they might produce less in t+1.

Why Serial Correlation?

Models with lags: Cons_t = β0 + β1 Income_t + β2 Cons_{t-1} + u_t. If the lagged term is not included in the model, the error term will capture the effect of the lagged variable. Manipulation of data: if we generate data through manipulation (dividing quarterly data by 3 to get monthly data, interpolation, extrapolation), this can introduce a systematic pattern into the disturbances.


Consequences: Key Issues

OLS estimators are linear and unbiased, but they do not have the minimum variance among all linear unbiased estimators: OLS is not BLUE in the presence of autocorrelation. OLS standard errors and t-statistics are no longer valid; the t-statistics will often be too large. The usual F test and LM test are also invalid.


Gauss Markov in Time Series Analysis

Assumption 1: linear in parameters: y_t = β0 + β1 x_t1 + … + βk x_tk + u_t. Assumption 2: for each t, the expected value of the error term, given the explanatory variables for all time periods, is zero: E(u_t | X) = 0, t = 1, …, n. Assumption 3: no perfect collinearity: no explanatory variable is constant or a perfect linear combination of the others. Assumption 4: homoskedasticity: the variance of u_t (conditional on the X's) is the same for all t: Var(u_t | X) = Var(u_t) = σ². Assumption 5: no serial correlation: conditional on X, the errors in two different time periods are uncorrelated: Corr(u_t, u_s | X) = 0 for all t ≠ s.

Different Ways Errors are Generated

Assume the disturbances can be characterized as: u_t = ρu_{t-1} + e_t, t = 1, 2, …, n, where |ρ| < 1 is known as the coefficient of autocovariance and the e_t are uncorrelated random variables (satisfying the standard OLS assumptions) with zero mean and constant variance σ_e². This structure of the error term is known as a first-order autoregressive process, AR(1). The shift in u_t can be divided into two parts: (i) ρu_{t-1}, the systematic shift, and (ii) e_t, the random term. An AR(2) model is one like: u_t = ρ1 u_{t-1} + ρ2 u_{t-2} + e_t. A simulation of an AR(1) error process is sketched below.
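A minimal STATA sketch, not from the original slides, of how such an AR(1) error series can be generated; the value ρ = 0.7, the seed and the sample size are illustrative assumptions:

    * illustrative only: simulate u_t = 0.7*u_{t-1} + e_t, e_t ~ N(0,1)
    clear
    set obs 200
    set seed 12345
    gen t = _n
    tsset t
    gen e = rnormal(0, 1)            // white-noise innovations e_t
    gen u = e in 1                   // initialize u_1 = e_1
    replace u = 0.7*L.u + e in 2/L   // recursive AR(1) update
    corrgram u, lags(5)              // autocorrelations decay like 0.7^j

The sample autocorrelations should decay roughly geometrically (like ρ^j), matching the covariance structure Cov(u_t, u_{t+j}) = ρ^j σ² used below.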

Different Ways Errors are Generated

If the disturbance term follows a first-order moving average scheme, MA(1): u_t = v_t + λv_{t-1}, where v is a random disturbance term with the usual properties and |λ| < 1. In the case of ARMA(1,1): u_t = ρu_{t-1} + v_t + λv_{t-1}. Higher-order schemes can be considered as well. Now consider a bivariate model: y_t = β0 + β1 x_t + u_t. Assume that the sample average of x_t is zero (x̄ = 0) and that the disturbances follow an AR(1): u_t = ρu_{t-1} + e_t.


OLS Estimator in case of Autocorrelation

The OLS estimator of β1 is:

β̂1 = β1 + (1/SST_x) Σ_{t=1}^n x_t u_t,  where SST_x = Σ_{t=1}^n x_t²

Its variance is:

Var(β̂1) = (1/SST_x²) Var(Σ_{t=1}^n x_t u_t) = (1/SST_x²) [ Σ_{t=1}^n x_t² Var(u_t) + 2 Σ_{t=1}^{n-1} Σ_{j=1}^{n-t} x_t x_{t+j} E(u_t u_{t+j}) ]

= σ²/SST_x + 2(σ²/SST_x²) Σ_{t=1}^{n-1} Σ_{j=1}^{n-t} ρ^j x_t x_{t+j}

using Var(u_t) = σ² and Cov(u_t, u_{t+j}) = ρ^j σ². The first term on the RHS is Var(β̂1) when ρ = 0, i.e. the usual variance.

OLS Estimation in case of Autocorrelation


If the usual variance formula is used, the second term is ignored. In most cases (i) ρ > 0, so ρ^j > 0, and (ii) the independent variables are positively correlated over time, so x_t x_{t+j} is positive. As a result the second term on the RHS is positive, and the usual OLS variance estimator, σ²/SST_x, underestimates the true variance. If ρ < 0, then ρ^j > 0 when j is even and ρ^j < 0 when j is odd, so the sign of the second term is difficult to determine. In both cases, in the presence of serial correlation the usual variance estimator is biased for Var(β̂1); a small simulation of this is sketched below.
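A small Monte Carlo sketch of this claim on simulated data (all numbers here, ρ = 0.7 for the errors and 0.5 for x, are illustrative assumptions, not from the slides): the standard deviation of β̂1 across replications exceeds the average OLS standard error.

    capture program drop olssim
    program define olssim, rclass
        clear
        set obs 100
        gen t = _n
        tsset t
        gen x = rnormal() in 1
        replace x = 0.5*L.x + rnormal() in 2/L   // autocorrelated regressor
        gen u = rnormal() in 1
        replace u = 0.7*L.u + rnormal() in 2/L   // AR(1) errors
        gen y = 1 + 2*x + u
        regress y x
        return scalar b1  = _b[x]
        return scalar se1 = _se[x]
    end
    simulate b1=r(b1) se1=r(se1), reps(1000) seed(42) nodots: olssim
    summarize b1 se1   // sd(b1) exceeds mean(se1): OLS SEs are too small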


OLS Estimation in case of Autocorrelation

In addition, the standard error of β̂1 is an estimate of the standard deviation of β̂1; if we use the usual OLS standard error in the presence of serial correlation, it is no longer valid, and the t-statistics are not valid either: the usual t-statistic will be artificially large under autocorrelation. The F test and LM test are also not applicable. Under certain assumptions (stationary, weakly dependent data) the goodness-of-fit measures are still valid: as in the cross-sectional case, R² = 1 − (σ_u²/σ_y²), and over time the variances of the error term and the dependent variable do not change.


Testing For Serial Correlation

Graphical method: the absence of autocorrelation concerns the unknown population disturbance u_t, which cannot be observed; instead we draw inferences from the residuals û_t generated by OLS. A plot of the û_t can often be used for informal detection of autocorrelation. We can plot the residuals, or the standardized residuals (û_t/σ̂), against time: if the residuals exhibit a pattern, it is possible that u_t is non-random. We can also plot û_t against û_{t-1} and look for any systematic association. [insert/draw graph here and show it in STATA]
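A sketch of these plots in STATA (the variable names y, x1, x2 and the time variable year are hypothetical placeholders):

    tsset year                   // declare the time variable
    regress y x1 x2              // OLS regression
    predict uhat, residuals      // OLS residuals u-hat
    tsline uhat                  // residuals against time
    scatter uhat L.uhat          // u-hat_t against u-hat_{t-1}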


Testing for Serial Correlation: t test with Strictly Exogenous Regressors


Our model is: y_t = β0 + β1 x_t1 + … + βk x_tk + u_t. Assume the regressors are strictly exogenous (E(u_t | X) = 0, t = 1, …, n): the error term u_t is uncorrelated with the regressors in all time periods. This excludes models with lagged dependent variables; it is a large-sample test. Consider AR(1) errors: u_t = ρu_{t-1} + e_t, with E(e_t | u_{t-1}, u_{t-2}, …) = 0 and Var(e_t | u_{t-1}) = Var(e_t) = σ_e². The null hypothesis is H0: ρ = 0. If u_t could be observed, we could estimate ρ from the regression of u_t on u_{t-1} and use a t-test for the significance of ρ̂. Since we do not observe it, we replace u_t with the OLS residual û_t and use the t-test.

Testing for Serial Correlation: t test with Strictly Exogenous Regressors


Run an OLS regression of y_t on the X's and obtain the residuals û_t. Regress û_t on û_{t-1}, obtain ρ̂ and the corresponding t-statistic, and use this t-statistic to test the null H0: ρ = 0 (say at the 5% significance level), as sketched below. In very large data sets, serial correlation is sometimes found even with a small value of ρ̂. Beyond AR(1), the test can detect other forms of error structure: any correlation between adjacent errors will be picked up. But if adjacent errors are uncorrelated, this test fails to detect serial correlation.
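A sketch of the test in STATA (again with placeholder variable names):

    regress y x1 x2           // step 1: OLS of y on the x's
    predict uhat, residuals
    regress uhat L.uhat       // step 2: regress u-hat on its own lag
    * the t-statistic on L.uhat is the test of H0: rho = 0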

The Durbin-Watson (DW) Test


The assumptions of the Durbin-Watson test are: the regression model includes an intercept term; the X's are non-stochastic, or fixed in repeated sampling; the disturbances are generated by an AR(1) process; the model does not include a lagged dependent variable (autoregressive models); and there are no missing observations in the data. The Durbin-Watson d statistic is defined as:

d = Σ_{t=2}^n (û_t − û_{t-1})² / Σ_{t=2}^n û_t²

This is the ratio of the sum of squared differences in successive residuals to the residual sum of squares.

The Durbin-Watson (DW) Test

Expanding the numerator we get:

d = (Σ û_t² + Σ û_{t-1}² − 2 Σ û_t û_{t-1}) / Σ û_t²

DW and the ρ̂ obtained in the t-test are linked: d ≈ 2(1 − ρ̂), where ρ̂ = Σ û_t û_{t-1} / Σ û_t², if we assume the adjacent sums of squares are approximately equal. Since −1 ≤ ρ ≤ +1, we have 0 ≤ d ≤ 4. The exact distribution of the statistic is difficult to derive, so there is no unique critical value, only an upper and a lower bound.
The decision bands for d (H0: no positive autocorrelation; H0*: no negative autocorrelation):

0 < d < dL: reject H0 (positive autocorrelation)
dL ≤ d ≤ dU: zone of indecision
dU < d < 4 − dU: do not reject H0 or H0* (no autocorrelation)
4 − dU ≤ d ≤ 4 − dL: zone of indecision
4 − dL < d < 4: reject H0* (negative autocorrelation)

The Durbin-Watson Test


Run the OLS regression and obtain the residuals û_t. Compute d with the formula above. For the given sample size and number of explanatory variables, find dL and dU and draw a conclusion about the null. One problem arises when d falls in the indecision region. There are different variants of the DW test. Example, the static Phillips curve: inf_t = 1.42 + .468 unem_t. Regressing û_t on û_{t-1} gives ρ̂ = 0.573 with t = 4.93. DW ≈ 2(1 − ρ̂) = 0.854; with k = 1, n = 50, dL = 1.32, so d falls in the rejection region and we reject the null: there is autocorrelation.


The Durbin-Watson Test

Expectations-augmented Phillips curve:

Δinf_t = 3.03 − .543 unem_t

Regressing û_t on û_{t-1} gives ρ̂ = −0.036 with t = −.297. DW ≈ 2(1 − ρ̂) = 2.07; with k = 1, n = 50, dU = 1.59, so d falls in the non-rejection region and we do not reject the null: no autocorrelation. Alternatively in STATA: tsset the data with the time variable, run the OLS, predict the residuals and the lag of the residuals, regress û_t on û_{t-1} and obtain ρ̂. For the t-test, simply check the significance of ρ̂; for the DW test, type estat dwatson to get the DW statistic, as sketched below. The manual results and those from STATA may differ slightly.
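A sketch for the Phillips-curve example, assuming the variable names year, inf and unem from the Wooldridge (2003) data set:

    tsset year
    regress inf unem            // static Phillips curve
    estat dwatson               // DW statistic for this regression
    predict uhat, residuals
    regress uhat L.uhat         // rho-hat = .573 and its t-statistic
    * expectations-augmented version: difference the inflation rate
    regress D.inf unem
    estat dwatson               // DW near 2: no autocorrelation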

The Breusch-Godfrey Test for Higher Order Auto

Suppose the disturbance term u_t is generated by a q-th order autoregressive scheme: u_t = ρ1 u_{t-1} + ρ2 u_{t-2} + … + ρq u_{t-q} + e_t. The null is H0: ρ1 = ρ2 = … = ρq = 0. Run the OLS regression of y_t on x_t1, x_t2, …, x_tk and obtain the residuals û_t for t = 1, 2, …, n. Then regress û_t on all the x's (x_t1, x_t2, …, x_tk) and the lagged values of the estimated residuals (û_{t-1}, û_{t-2}, etc.) for t = (q+1), …, n. Compute the F test for the joint significance of the lagged û's. An alternative is the Lagrange Multiplier (LM) test: LM = (n − q)R²_û, where the R² is the one obtained in the second-stage regression of û.


Breusch-Godfrey Test for Higher Order Auto

Under the null, LM ~ χ²_q: if (n − q)R² exceeds the critical χ² value we reject the null, which means at least one ρ is significantly different from zero. The explanatory variables may contain lagged values of Y, and the test is applicable even if the errors follow a q-th order MA process: u_t = ε_t + λ1 ε_{t-1} + … + λq ε_{t-q}. With first-order autoregression (q = 1), it is known as Durbin's m test. One problem is that the lag length q is not specified by the test. Example: estimate a model of imports of barium chloride on several controls (Wooldridge, 2003) by OLS:

log(chnimp) = −17.80 + 3.12 log(chempi) + .196 log(gas) + .983 log(rtwex) + .060 befile6 − .032 affile6 − .565 afdec6

BG Test: Example

In STATA: estimate the model by OLS, obtain the predicted residuals and their lags of order 1, 2, 3. Run an OLS of û_t on all the x's and the lagged residuals (û_{t-1}, û_{t-2}, û_{t-3}). For the F test of joint significance of the lagged û's, type test with the three lagged residuals after the regression: F(3, 118) = 5.12, Prob > F = .0023, where the null is no serial correlation. We reject the null, so the errors follow an AR(3) scheme. For the BG test: LM = (n − q)R² = (131 − 3) × .1159 = 14.835, where the R² is obtained from the regression of the residual on all the X's and the lagged û's. The critical χ²(3) = 12.838, so we cannot accept the null of no serial correlation. Typing estat bgodfrey, lags(3) gives χ² = 14.768, prob > χ² = .0020: reject the null of no autocorrelation. The commands are sketched below.
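A sketch of both versions of the test, assuming the logged variable names (lchnimp, lchempi, lgas, lrtwex) and the time variable t from the Wooldridge (2003) barium data set:

    tsset t
    regress lchnimp lchempi lgas lrtwex befile6 affile6 afdec6
    predict uhat, residuals
    * manual version: u-hat on the x's and three of its own lags
    regress uhat lchempi lgas lrtwex befile6 affile6 afdec6 ///
        L.uhat L2.uhat L3.uhat
    test L.uhat L2.uhat L3.uhat        // F test of joint significance
    * built-in version (re-estimate the original model first):
    regress lchnimp lchempi lgas lrtwex befile6 affile6 afdec6
    estat bgodfrey, lags(3)            // LM (chi-squared) version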

Remedial Measures: Quasi-Differenced Method

Assume an AR(1) model with strictly exogenous regressors (no lagged dependent variable among the X's): u_t = ρu_{t-1} + e_t for all t = 1, …, n, with Var(u_t) = σ_e²/(1 − ρ²). Consider the model:

y_t = β0 + β1 x_t + u_t  (1)

For t ≥ 2: y_{t-1} = β0 + β1 x_{t-1} + u_{t-1}  (2)

Multiply (2) by ρ and subtract from (1):

y_t − ρy_{t-1} = (1 − ρ)β0 + β1(x_t − ρx_{t-1}) + e_t, t ≥ 2  (3)

i.e. y*_t = (1 − ρ)β0 + β1 x*_t + e_t, t ≥ 2  (4)

These are known as quasi-differenced data (with ρ = 1 they are differenced data), and the errors in (4) are not serially correlated (|ρ| < 1). When the structure of the autocorrelation (ρ) is known, we can estimate eq. (4), which satisfies all the Gauss-Markov assumptions.

Quasi-Differenced Method

The OLS estimators of (4) are not BLUE (they do not use the first observation), but they can easily be transformed to be. For t = 1:

y_1 = β0 + β1 x_1 + u_1  (5)

If we add (5) to (4), we still have serially uncorrelated errors, but Var(u_1) = σ_e²/(1 − ρ²) > σ_e² = Var(e_t). If we multiply (5) by (1 − ρ²)^(1/2) we get:

(1 − ρ²)^(1/2) y_1 = (1 − ρ²)^(1/2) β0 + β1 (1 − ρ²)^(1/2) x_1 + (1 − ρ²)^(1/2) u_1, i.e. ỹ_1 = (1 − ρ²)^(1/2) β0 + β1 x̃_1 + ũ_1  (6)

Here the error term has variance (1 − ρ²)Var(u_1) = σ_e². Using (6) together with (4) gives BLUE estimators and satisfies the Gauss-Markov assumptions. This is a form of GLS. For a given ρ it is easy to transform the data and perform OLS, as sketched below.
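A sketch of the transformation for a known ρ (the value 0.5 and the names y, x are hypothetical; the data are assumed tsset):

    scalar rho = 0.5
    gen ystar = y - rho*L.y                  // quasi-differenced y, t >= 2
    gen xstar = x - rho*L.x                  // quasi-differenced x, t >= 2
    replace ystar = sqrt(1 - rho^2)*y in 1   // eq. (6) treatment of t = 1
    replace xstar = sqrt(1 - rho^2)*x in 1
    regress ystar xstar
    * note: the transformed intercept is (1 - rho)*beta0 for t >= 2, so
    * the plain constant here only approximates the exact GLS intercept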

Feasible GLS with AR(1) Errors

The problem with GLS is that we usually do not know the exact value of ρ, but we can obtain an estimate of it; this method is known as feasible GLS (FGLS). Regress the OLS residuals on the lagged residuals to get ρ̂, and use this estimate in place of the actual ρ to obtain the quasi-differenced variables. Then apply OLS to:

ỹ_t = β0 x̃_{t,0} + β1 x̃_{t,1} + … + βk x̃_{t,k} + error_t  (7)

where x̃_{t,0} = (1 − ρ̂) for t ≥ 2 and x̃_{1,0} = (1 − ρ̂²)^(1/2) for t = 1. Thus the procedure is: run OLS of y_t on the x's and obtain û_t for t = 1, …, n; regress the estimated residuals on their lag and get ρ̂; run OLS on (7) and obtain the β's.

Feasible GLS with AR(1) Errors

When this procedure omits the first observation and uses the estimated ρ, it is known as Cochrane-Orcutt (CO) estimation; when the first observation is used, it is known as Prais-Winsten (PW) estimation. Both procedures follow an iterative scheme: once the FGLS estimates are found with ρ̂, we can compute a new set of residuals, obtain a new estimate of ρ, transform the data with the new estimate and estimate (7) by OLS again. This is repeated until the estimate of ρ changes very little from the previous estimate. The shortcoming of FGLS is that it lacks certain finite-sample properties: it is not unbiased (so not BLUE), but it is consistent under certain assumptions (weakly dependent data).

Example: CO Procedure

The t and F statistics are only approximately t and F distributed, due to the estimation error in ρ̂, but in most cases this is not a serious problem unless the sample size is small. Estimate the model of imports of barium chloride on several controls (Wooldridge, 2003) by OLS:

log(chnimp) = −17.80 + 3.12 log(chempi) + .196 log(gas) + .983 log(rtwex) + .060 befile6 − .032 affile6 − .565 afdec6

In STATA: tsset the data with the time variable and run the OLS; then use the prais command with the corc option for the CO iterative method, as sketched below. For the significant variables, the results do not differ much, but the CO standard errors are higher because they account for the serial correlation. The OLS standard errors understate the actual sampling variation and should be corrected when autocorrelation is present.
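A sketch for the barium example, assuming the same variable names as before:

    tsset t
    * Cochrane-Orcutt: iterative FGLS, first observation dropped
    prais lchnimp lchempi lgas lrtwex befile6 affile6 afdec6, corc
    * Prais-Winsten: same, but the first observation is retained
    prais lchnimp lchempi lgas lrtwex befile6 affile6 afdec6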

Differencing as a Remedy

Consider the model y_t = β0 + β1 x_t + u_t, t = 1, 2, …, n, where u_t follows an AR(1) scheme. With non-stationary data and random-walk models, using OLS in levels is misleading, and differencing is often used instead. Differencing leads to: Δy_t = β1 Δx_t + Δu_t, t = 2, …, n. First differencing is often a good strategy in the case of autocorrelation with a positive and large value of ρ, as it can be expected to eliminate the serial correlation. Example: i3_t = 1.25 + .613 inf_t + .700 def_t. In STATA, obtain û from this regression and also its lagged value, and estimate ρ̂ (= .529), significant at 1%, so there is serial correlation. Take first differences of all the variables and check the new ρ̂ (= .068), which is not significant: differencing has removed the autocorrelation. A sketch follows.
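A sketch in STATA, assuming the variable names year, i3, inf and def from the Wooldridge (2003) interest-rate data:

    tsset year
    regress i3 inf def            // levels regression
    predict uhat, residuals
    regress uhat L.uhat           // rho-hat = .529 (significant)
    regress D.i3 D.inf D.def      // first-differenced regression
    predict duhat, residuals
    regress duhat L.duhat         // new rho-hat = .068 (insignificant)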

Serial Correlation Robust SE after OLS


For a detailed discussion, see Wooldridge (2003). Basically we need a serial-correlation robust standard error (here "se(β̂1)" is the usual, incorrect OLS standard error):

se(β̂1) = ["se(β̂1)"/σ̂]² √v̂

v̂ = Σ_{t=1}^n â_t² + 2 Σ_{h=1}^g [1 − h/(g+1)] (Σ_{t=h+1}^n â_t â_{t-h}),  where â_t = r̂_t û_t

and the r̂_t are the residuals from regressing x_t1 on the other regressors. The procedure: estimate the model by OLS and obtain "se(β̂1)", σ̂ and the û's; compute the r̂_t's and form the â_t's; compute v̂; finally obtain se(β̂1) from the formula. Example: Puerto Rican wage data (Wooldridge, 2003). A sketch follows.
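Rather than computing v̂ by hand, STATA's newey command implements serial-correlation robust (Newey-West) standard errors after OLS; a minimal sketch with placeholder variable names, where the lag() option sets the truncation lag g:

    tsset year
    newey y x1 x2, lag(4)    // OLS point estimates, robust SEs with g = 4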
